Polyglot Persistence in 2026 — Choosing the Right Database for Every Job

Introduction

The first rule of database selection: there is no universal database.

PostgreSQL is excellent. It's ACID-compliant, it's fast, it handles most workloads. But a company using only PostgreSQL for OLTP, OLAP, full-text search, and caching is leaving performance on the table.

Polyglot persistence is the practice of using multiple databases, each for what it's best at. PostgreSQL for transactions. ClickHouse for analytics. Redis for caching. Elasticsearch for search.

The challenge: keeping them in sync without complexity spiraling.

This post covers the architecture of polyglot persistence in 2026.

Right Database for Right Use Case

OLTP (Online Transaction Processing)

Transactional reads and writes. User logs in, updates profile, makes a purchase.

Use: PostgreSQL, MySQL, MongoDB.

Why: ACID guarantees, arbitrary queries, proven consistency.

-- User updates profile (atomic)
BEGIN;
  UPDATE users SET name = 'John' WHERE id = 123;
  INSERT INTO audit_log VALUES (123, 'name_changed', now());
COMMIT;

OLAP (Online Analytical Processing)

Aggregations across millions of rows. Weekly revenue reports, user cohorts, funnel analysis.

Use: ClickHouse, Snowflake, BigQuery, DuckDB.

Why: Columnar storage, parallelism, instant queries on huge datasets.

-- ClickHouse: aggregate 1B rows in 100ms
SELECT
  toDate(timestamp) as day,
  COUNT() as event_count,
  SUM(amount) as total_revenue
FROM events
WHERE timestamp >= today() - INTERVAL 30 DAY
GROUP BY day
ORDER BY day DESC;

Search (Full-Text, Semantic)

Find documents by keyword or meaning.

Use: Elasticsearch, OpenSearch, PostgreSQL (pgvector for semantic), MongoDB (Atlas Search).

Why: Specialized indexing (inverted indices, vector indices), speed, relevance ranking.

# Elasticsearch: full-text search
POST /documents/_search
{
  "query": {
    "multi_match": {
      "query": "machine learning",
      "fields": ["title^2", "content"]
    }
  }
}

-- PostgreSQL pgvector: semantic search
-- query_embedding is a vector parameter supplied by the application
SELECT id, 1 - (embedding <=> query_embedding) as similarity
FROM documents
ORDER BY embedding <=> query_embedding
LIMIT 10;

Cache

Reduce database load, speed up response times.

Use: Redis, Memcached, in-process cache (Node.js).

Why: Sub-millisecond latency, simple API, automatic expiry.

// Redis cache-aside: check the cache first, fall back to the database
let user = await redis.get(`user:${userId}`);
if (user) {
  user = JSON.parse(user);
} else {
  user = await db.query('SELECT * FROM users WHERE id = ?', [userId]);
  await redis.setEx(`user:${userId}`, 3600, JSON.stringify(user)); // 1h TTL
}

Graph

Relationships, recommendations, social networks.

Use: Neo4j, Amazon Neptune, PostgreSQL (recursive CTEs).

Why: Specialized traversal, pattern matching, efficient multi-hop queries.

// Neo4j: find friends of friends (2 hops)
MATCH (me:User {id: 123})-[:FRIEND]-(friend)-[:FRIEND]-(fof)
WHERE NOT (me)-[:FRIEND]-(fof)
RETURN fof, count(*) as mutual_friends
ORDER BY mutual_friends DESC
LIMIT 10;

Time Series

Metrics, logs, sensor data. Trillions of data points.

Use: ClickHouse, TimescaleDB, Prometheus, InfluxDB.

Why: Compression, downsampling, retention policies, time-based queries.

-- TimescaleDB: automatic time-based partitioning
CREATE TABLE metrics (
  time TIMESTAMPTZ NOT NULL,
  host TEXT NOT NULL,
  cpu_usage FLOAT NOT NULL
);
SELECT create_hypertable('metrics', 'time');

-- Query with time predicates
SELECT time_bucket('1 hour', time) AS bucket, avg(cpu_usage)
FROM metrics
WHERE time > now() - INTERVAL '7 days'
GROUP BY bucket
ORDER BY bucket;

Vector Database (Specialized)

Large-scale AI similarity search. Millions of vectors, sub-millisecond latency.

Use: Pinecone, Weaviate, or Milvus, once you have >10M vectors.

Why: Specialized ANN indexing (HNSW, IVF-FLAT), distributed sharding.

For most startups, PostgreSQL pgvector or Redis vectors are sufficient. Specialized vector DBs are for search-as-product companies.

PostgreSQL as the "Good Enough" Option

PostgreSQL has extensions for almost everything:

  • pgvector: Vector search
  • pg_cron: Scheduled jobs
  • PostGIS: Geographic queries
  • pg_trgm: Fuzzy search
  • jsonb: Flexible schema
  • pgai: AI functions

For a startup with limited ops resources, all-PostgreSQL is pragmatic. You lose some performance, gain operational simplicity.

Tradeoff: PostgreSQL might use 200GB for what ClickHouse stores in 4GB. But you're managing one database, not three.

When You Actually Need Redis (Not Just Caching)

Cache is Redis's gateway drug. But Redis excels at:

  1. Sub-millisecond latency required: session-token lookups must return in under 1ms, while database queries typically take 5ms or more. Redis serves straight from memory.

  2. Pub/Sub messaging: Real-time notifications, chat, collaborative editing. PostgreSQL LISTEN/NOTIFY works for small clusters but doesn't scale.

  3. Leaderboards and rankings: Sorted sets are O(log N). SQL aggregations are slower.

  4. Rate limiting: Sliding windows with Redis are trivial. SQL queries are expensive.

For general caching, even PostgreSQL with good indexes is often sufficient. Use Redis when latency is critical.
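The sliding-window logic from point 4 can be sketched in-process. A minimal TypeScript version, assuming a single app server; in production, Redis holds the same state in a sorted set per key (ZADD to record a hit, ZREMRANGEBYSCORE to expire old ones, ZCARD to count) so every server shares one counter:

```typescript
// Sliding-window rate limiter: allow at most `limit` requests per `windowMs`.
// The array of timestamps stands in for a Redis sorted set.
class SlidingWindowLimiter {
  private hits = new Map<string, number[]>();

  constructor(private limit: number, private windowMs: number) {}

  allow(key: string, now: number = Date.now()): boolean {
    const windowStart = now - this.windowMs;
    // Drop timestamps outside the window (ZREMRANGEBYSCORE in Redis).
    const recent = (this.hits.get(key) ?? []).filter((t) => t > windowStart);
    if (recent.length >= this.limit) {
      this.hits.set(key, recent);
      return false; // over the limit
    }
    recent.push(now); // record this request (ZADD in Redis)
    this.hits.set(key, recent);
    return true;
  }
}
```

Redis makes the same bookkeeping atomic across app servers, which is exactly what the in-process version cannot do.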

When Elasticsearch/OpenSearch Wins

Full-text search on PostgreSQL (built-in tsvector, pg_trgm, or pgroonga) is functional but slow at scale.

Elasticsearch wins when:

  1. You search across millions of documents
  2. Relevance ranking matters (TF-IDF, BM25)
  3. Faceted search is needed (filter by category, date, author simultaneously)
  4. You need phrases, wildcards, fuzzy matching

Example: e-commerce search. 1M products, users search "nike running shoes". Elasticsearch returns results ranked by relevance, with facets (size, price, color) in milliseconds.

PostgreSQL full-text search is typically an order of magnitude slower for this workload.
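The faceted e-commerce query above can be written in the same query DSL as the earlier search example. A sketch, assuming a hypothetical `products` index with keyword-mapped `color` and numeric `price` fields:

```
# Elasticsearch: search with facets (terms and range aggregations)
POST /products/_search
{
  "query": {
    "match": { "name": "nike running shoes" }
  },
  "aggs": {
    "by_color": { "terms": { "field": "color" } },
    "by_price": {
      "range": {
        "field": "price",
        "ranges": [{ "to": 50 }, { "from": 50, "to": 100 }, { "from": 100 }]
      }
    }
  }
}
```

One request returns ranked hits plus the facet counts for the sidebar, which would take several separate GROUP BY queries in SQL.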

When ClickHouse Wins

PostgreSQL aggregations on large datasets are slow.

Query: "What's the average response time per hour over the last month?" with 1B rows.

PostgreSQL: 30–60 seconds
ClickHouse: 100–500ms

Use ClickHouse when:

  1. You aggregate >100M rows
  2. Data is time-series or event-based
  3. You need real-time dashboards
  4. Columns are accessed more often than rows

When a Vector DB is Necessary

Use Pinecone/Weaviate instead of PostgreSQL pgvector when:

  1. >10M vectors
  2. Sub-100ms latency required at scale
  3. Search is your core product feature
  4. You need distributed sharding

Most startups don't hit these constraints. Start with pgvector. Migrate if needed.
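To see why dedicated indexes matter at scale, here is what exact search does under the hood: a brute-force top-k scan, sketched in TypeScript with a hypothetical `Doc` record. This is O(N·d) per query, fine at thousands of vectors and untenable at tens of millions; ANN indexes like HNSW and IVF exist precisely to avoid this scan:

```typescript
// Exact nearest-neighbor search: score every document, sort, take top k.
type Doc = { id: string; embedding: number[] };

function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

function topK(docs: Doc[], query: number[], k: number): Doc[] {
  return docs
    .map((d) => ({ d, score: cosineSimilarity(d.embedding, query) }))
    .sort((x, y) => y.score - x.score) // best match first
    .slice(0, k)
    .map((s) => s.d);
}
```

An ANN index trades a little recall for sublinear query time; the brute-force version above is the accuracy baseline it approximates.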

Data Synchronization Between Databases

The hard part: keeping PostgreSQL, Redis, and Elasticsearch in sync.

Change Data Capture (CDC)

Capture changes from PostgreSQL's write-ahead log. Stream to other systems.

Tools: Debezium, PostgreSQL logical decoding, Kafka, AWS DMS.

# Run PostgreSQL with logical replication enabled (required for CDC)
docker run -e POSTGRES_PASSWORD=password postgres -c wal_level=logical

# Debezium connector configuration
{
  "name": "postgres-cdc",
  "config": {
    "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
    "database.hostname": "postgres",
    "database.port": "5432",
    "database.user": "postgres",
    "database.password": "password",
    "database.dbname": "mydb",
    "topic.prefix": "mydb",
    "table.include.list": "public.users,public.posts",
    "publication.name": "dbz_publication"
  }
}

Changes stream to Kafka. Consumers for Elasticsearch and Redis read the stream and update their own indexes.

Dual Writes (Avoid)

Simpler but error-prone: write to PostgreSQL, then write to Redis and Elasticsearch.

async function createUser(email: string, name: string) {
  // Write to PostgreSQL
  const user = await db.users.insert({ email, name });

  // Write to Redis
  await redis.set(`user:${user.id}`, JSON.stringify(user));

  // Write to Elasticsearch
  await elasticsearch.index({
    index: 'users',
    id: user.id,
    body: user,
  });

  return user;
}

Problem: if any write fails, systems diverge. Race conditions are common.

Use CDC instead. It's more complex but reliable.

Materialized Views

Pre-compute and store aggregations.

PostgreSQL:

CREATE MATERIALIZED VIEW user_stats AS
SELECT
  u.id,
  u.email,
  COUNT(p.id) as post_count,
  SUM(p.views) as total_views
FROM users u
LEFT JOIN posts p ON u.id = p.user_id
GROUP BY u.id, u.email;

-- Refresh periodically
REFRESH MATERIALIZED VIEW user_stats;

ClickHouse:

CREATE TABLE user_stats_aggregated (
  user_id UUID,
  email String,
  post_count UInt64,
  total_views UInt64
) ENGINE = SummingMergeTree()
ORDER BY (user_id, email);

-- Materialized view to aggregate
CREATE MATERIALIZED VIEW user_stats_mv
TO user_stats_aggregated
AS SELECT
  user_id,
  email,
  COUNT() as post_count,
  SUM(views) as total_views
FROM posts
GROUP BY user_id, email;

One source of truth (PostgreSQL), aggregations computed and stored elsewhere. Simple and reliable.

CQRS with Multiple Databases

Command Query Responsibility Segregation: separate read and write paths.

Write path:

  1. User creates order
  2. PostgreSQL: INSERT into orders (transactional)
  3. CDC: changes stream to Kafka
  4. ClickHouse consumer: INSERT into order_events (analytics)

Read path:

  1. Get user's orders (transactional): PostgreSQL
  2. Get analytics dashboard (aggregated): ClickHouse

// Command: write to PostgreSQL
export async function createOrder(userId: string, items: CartItem[]) {
  const order = await db.orders.insert({
    user_id: userId,
    items,
    status: 'pending',
  });

  // CDC takes it from here
  // Elasticsearch is updated, ClickHouse is updated, Redis cache is busted
  return order;
}

// Query: read from appropriate database
export async function getUserOrders(userId: string) {
  // Transactional read from PostgreSQL
  return await db.orders.find({ user_id: userId });
}

export async function getOrderAnalytics(userId: string) {
  // Analytics from ClickHouse (cached, denormalized)
  return await clickhouse.query(`
    SELECT
      toDate(created_at) as day,
      COUNT() as order_count,
      SUM(total) as revenue
    FROM orders
    WHERE user_id = ? AND created_at > today() - 30
    GROUP BY day
  `, [userId]);
}

Writes are normalized (PostgreSQL). Reads are denormalized (ClickHouse, Redis).

Operational Complexity Costs

Managing 5 databases costs more than managing 1:

  1. Monitoring: 5 systems to monitor
  2. Backups: 5 different backup strategies
  3. Scaling: 5 independent scaling decisions
  4. Deployment: Coordinate migrations across 5
  5. On-call: 5x failure modes

Start with PostgreSQL. Add databases only when:

  • Single database is a proven bottleneck
  • ROI is clear (time saved > ops cost)
  • You have operations expertise

For MVPs and small companies: one database is better.

For Series B+ companies operating at real scale: polyglot persistence makes sense.

Real-World Stack (2026)

Example: data analytics SaaS

Users (OLTP)
  ↓
PostgreSQL (primary data store)
  ↓  (CDC via Kafka)
  ├→ Redis (sessions, caching)
  ├→ Elasticsearch (document search)
  └→ ClickHouse (event analytics)

Queries:
  - GET /api/documents: PostgreSQL (transactional)
  - POST /search: Elasticsearch (full-text)
  - GET /dashboard: ClickHouse (analytics)
  - GET /user/session: Redis (cache)

Four databases, but each handles what it's best at. Synchronized via CDC.

Checklist

  • Define your access patterns (OLTP, OLAP, search, cache, graphs)
  • Identify bottlenecks (profile, benchmark, measure)
  • Choose primary database (usually PostgreSQL)
  • Add specialized databases only when justified
  • Set up CDC or materialized views for sync
  • Monitor all databases for divergence
  • Document data ownership (which DB is source of truth)
  • Plan disaster recovery (multiple systems)
  • Measure ops cost of each additional database
  • Re-evaluate annually

Conclusion

Polyglot persistence is powerful and necessary at scale. But it's also complex.

Start simple: PostgreSQL for everything. Add Redis when latency matters. Add ClickHouse when analytics is slow. Each addition should be justified by a real bottleneck, not premature optimization.

The best architecture is the one you understand. Complexity has a cost. Choose databases based on your constraints, not hype.

Use the right tool for every job, but don''t collect tools you don''t need. Balance performance, simplicity, and operational burden.