Polyglot Persistence in 2026 — Choosing the Right Database for Every Job

Introduction

The first rule of database selection: there is no universal database.

PostgreSQL is excellent. It's ACID-compliant, it's fast, it handles most workloads. But a company using only PostgreSQL for OLTP, OLAP, full-text search, and caching is leaving performance on the table.

Polyglot persistence is the practice of using multiple databases, each for what it's best at. PostgreSQL for transactions. ClickHouse for analytics. Redis for caching. Elasticsearch for search.

The challenge: keeping them in sync without complexity spiraling.

This post covers the architecture of polyglot persistence in 2026.

Right Database for Right Use Case

OLTP (Online Transaction Processing)

Transactional reads and writes. User logs in, updates profile, makes a purchase.

Use: PostgreSQL, MySQL, MongoDB.

Why: ACID guarantees, arbitrary queries, proven consistency.

-- User updates profile (atomic)
BEGIN;
  UPDATE users SET name = 'John' WHERE id = 123;
  INSERT INTO audit_log VALUES (123, 'name_changed', now());
COMMIT;

OLAP (Online Analytical Processing)

Aggregations across millions of rows. Weekly revenue reports, user cohorts, funnel analysis.

Use: ClickHouse, Snowflake, BigQuery, DuckDB.

Why: Columnar storage, parallelism, instant queries on huge datasets.

-- ClickHouse: aggregate 1B rows in 100ms
SELECT
  toDate(timestamp) as day,
  COUNT() as event_count,
  SUM(amount) as total_revenue
FROM events
WHERE timestamp >= today() - INTERVAL 30 DAY
GROUP BY day
ORDER BY day DESC;

Search (Full-Text, Semantic)

Find documents by keyword or meaning.

Use: Elasticsearch, OpenSearch, PostgreSQL (pgvector for semantic), MongoDB (Atlas Search).

Why: Specialized indexing (inverted indices, vector indices), speed, relevance ranking.

# Elasticsearch: full-text search
POST /documents/_search
{
  "query": {
    "multi_match": {
      "query": "machine learning",
      "fields": ["title^2", "content"]
    }
  }
}

-- PostgreSQL pgvector: semantic search
-- query_embedding is a vector parameter supplied by the application
SELECT id, 1 - (embedding <=> query_embedding) as similarity
FROM documents
ORDER BY embedding <=> query_embedding
LIMIT 10;

Cache

Reduce database load, speed up response times.

Use: Redis, Memcached, in-process cache (Node.js).

Why: Sub-millisecond latency, simple API, automatic expiry.

// Redis cache-aside: check the cache first, fall back to the database
let user = await redis.get(`user:${userId}`);
if (user) {
  user = JSON.parse(user);
} else {
  user = await db.query('SELECT * FROM users WHERE id = ?', [userId]);
  await redis.setEx(`user:${userId}`, 3600, JSON.stringify(user)); // 1h TTL
}

Graph

Relationships, recommendations, social networks.

Use: Neo4j, Amazon Neptune, PostgreSQL (recursive CTEs).

Why: Specialized traversal, pattern matching, efficient multi-hop queries.

// Neo4j: find friends of friends (2 hops)
MATCH (me:User {id: 123})-[:FRIEND]-(friend)-[:FRIEND]-(fof)
WHERE NOT (me)-[:FRIEND]-(fof)
RETURN fof, count(*) as mutual_friends
ORDER BY mutual_friends DESC
LIMIT 10;

Time Series

Metrics, logs, sensor data. Trillions of data points.

Use: ClickHouse, TimescaleDB, Prometheus, InfluxDB.

Why: Compression, downsampling, retention policies, time-based queries.

-- TimescaleDB: automatic time-based partitioning
CREATE TABLE metrics (
  time TIMESTAMPTZ NOT NULL,
  host TEXT NOT NULL,
  cpu_usage FLOAT NOT NULL
);
SELECT create_hypertable('metrics', 'time');

-- Query with time predicates
SELECT time_bucket('1 hour', time) AS bucket, avg(cpu_usage)
FROM metrics
WHERE time > now() - INTERVAL '7 days'
GROUP BY bucket
ORDER BY bucket;

Vector Database (Specialized)

Large-scale AI similarity search. Millions of vectors, sub-millisecond latency.

Use: Pinecone, Weaviate, or Milvus, once you have >10M vectors.

Why: Specialized ANN indexing (HNSW, IVF-FLAT), distributed sharding.

For most startups, PostgreSQL pgvector or Redis vectors are sufficient. Specialized vector DBs are for search-as-product companies.

PostgreSQL as the "Good Enough" Option

PostgreSQL has extensions for almost everything:

  • pgvector: Vector search
  • pg_cron: Scheduled jobs
  • PostGIS: Geographic queries
  • pg_trgm: Fuzzy search
  • jsonb: Flexible schema
  • pgai: AI functions

For a startup with limited ops resources, all-PostgreSQL is pragmatic. You lose some performance, gain operational simplicity.

Tradeoff: PostgreSQL might use 200GB for what ClickHouse stores in 4GB. But you're managing one database, not three.

When You Actually Need Redis (Not Just Caching)

Cache is Redis's gateway drug. But Redis excels at:

  1. Sub-millisecond latency required: session-token lookups must return in under 1ms, while database queries typically take 5ms or more. Redis serves straight from memory.

  2. Pub/Sub messaging: Real-time notifications, chat, collaborative editing. PostgreSQL LISTEN/NOTIFY works for small clusters but doesn't scale.

  3. Leaderboards and rankings: Sorted sets are O(log N). SQL aggregations are slower.

  4. Rate limiting: Sliding windows with Redis are trivial. SQL queries are expensive.

For general caching, even PostgreSQL with good indexes is often sufficient. Use Redis when latency is critical.
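The sliding-window logic from point 4 can be sketched in-process. A minimal TypeScript version, assuming a single app server; in production, Redis holds the same state in a sorted set per key (ZADD to record a hit, ZREMRANGEBYSCORE to expire old ones, ZCARD to count) so every server shares one counter:

```typescript
// Sliding-window rate limiter: allow at most `limit` requests per `windowMs`.
// The array of timestamps stands in for a Redis sorted set.
class SlidingWindowLimiter {
  private hits = new Map<string, number[]>();

  constructor(private limit: number, private windowMs: number) {}

  allow(key: string, now: number = Date.now()): boolean {
    const windowStart = now - this.windowMs;
    // Drop timestamps outside the window (ZREMRANGEBYSCORE in Redis).
    const recent = (this.hits.get(key) ?? []).filter((t) => t > windowStart);
    if (recent.length >= this.limit) {
      this.hits.set(key, recent);
      return false; // over the limit
    }
    recent.push(now); // record this request (ZADD in Redis)
    this.hits.set(key, recent);
    return true;
  }
}
```

Redis makes the same bookkeeping atomic across app servers, which is exactly what the in-process version cannot do.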

When Elasticsearch/OpenSearch Wins

Full-text search on PostgreSQL (built-in tsvector, pg_trgm, or pgroonga) is functional but slow at scale.

Elasticsearch wins when:

  1. You search across millions of documents
  2. Relevance ranking matters (TF-IDF, BM25)
  3. Faceted search is needed (filter by category, date, author simultaneously)
  4. You need phrases, wildcards, fuzzy matching

Example: e-commerce search. 1M products, users search "nike running shoes". Elasticsearch returns results ranked by relevance, with facets (size, price, color) in milliseconds.

PostgreSQL full-text search is typically an order of magnitude slower for this workload.
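The faceted e-commerce query above can be written in the same query DSL as the earlier search example. A sketch, assuming a hypothetical `products` index with keyword-mapped `color` and numeric `price` fields:

```
# Elasticsearch: search with facets (terms and range aggregations)
POST /products/_search
{
  "query": {
    "match": { "name": "nike running shoes" }
  },
  "aggs": {
    "by_color": { "terms": { "field": "color" } },
    "by_price": {
      "range": {
        "field": "price",
        "ranges": [{ "to": 50 }, { "from": 50, "to": 100 }, { "from": 100 }]
      }
    }
  }
}
```

One request returns ranked hits plus the facet counts for the sidebar, which would take several separate GROUP BY queries in SQL.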

When ClickHouse Wins

PostgreSQL aggregations on large datasets are slow.

Query: "What's the average response time per hour over the last month?" with 1B rows.

PostgreSQL: 30–60 seconds
ClickHouse: 100–500ms

Use ClickHouse when:

  1. You aggregate >100M rows
  2. Data is time-series or event-based
  3. You need real-time dashboards
  4. Columns are accessed more often than rows

When a Vector DB is Necessary

Use Pinecone/Weaviate instead of PostgreSQL pgvector when:

  1. >10M vectors
  2. Sub-100ms latency required at scale
  3. Search is your core product feature
  4. You need distributed sharding

Most startups don't hit these constraints. Start with pgvector. Migrate if needed.
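To see why dedicated indexes matter at scale, here is what exact search does under the hood: a brute-force top-k scan, sketched in TypeScript with a hypothetical `Doc` record. This is O(N·d) per query, fine at thousands of vectors and untenable at tens of millions; ANN indexes like HNSW and IVF exist precisely to avoid this scan:

```typescript
// Exact nearest-neighbor search: score every document, sort, take top k.
type Doc = { id: string; embedding: number[] };

function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

function topK(docs: Doc[], query: number[], k: number): Doc[] {
  return docs
    .map((d) => ({ d, score: cosineSimilarity(d.embedding, query) }))
    .sort((x, y) => y.score - x.score) // best match first
    .slice(0, k)
    .map((s) => s.d);
}
```

An ANN index trades a little recall for sublinear query time; the brute-force version above is the accuracy baseline it approximates.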

Data Synchronization Between Databases

The hard part: keeping PostgreSQL, Redis, and Elasticsearch in sync.

Change Data Capture (CDC)

Capture changes from PostgreSQL's write-ahead log. Stream to other systems.

Tools: Debezium, PostgreSQL logical decoding, Kafka, AWS DMS.

# Run PostgreSQL with logical replication enabled (required for CDC)
docker run -e POSTGRES_PASSWORD=password postgres -c wal_level=logical

# Debezium connector configuration
{
  "name": "postgres-cdc",
  "config": {
    "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
    "database.hostname": "postgres",
    "database.port": "5432",
    "database.user": "postgres",
    "database.password": "password",
    "database.dbname": "mydb",
    "topic.prefix": "mydb",
    "table.include.list": "public.users,public.posts",
    "publication.name": "dbz_publication"
  }
}

Changes stream to Kafka. Consumers for Elasticsearch and Redis read the stream and update their own indexes.

Dual Writes (Avoid)

Simpler but error-prone: write to PostgreSQL, then write to Redis and Elasticsearch.

async function createUser(email: string, name: string) {
  // Write to PostgreSQL
  const user = await db.users.insert({ email, name });

  // Write to Redis
  await redis.set(`user:${user.id}`, JSON.stringify(user));

  // Write to Elasticsearch
  await elasticsearch.index({
    index: 'users',
    id: user.id,
    body: user,
  });

  return user;
}

Problem: if any write fails, systems diverge. Race conditions are common.

Use CDC instead. It's more complex but reliable.

Materialized Views

Pre-compute and store aggregations.

PostgreSQL:

CREATE MATERIALIZED VIEW user_stats AS
SELECT
  u.id,
  u.email,
  COUNT(p.id) as post_count,
  SUM(p.views) as total_views
FROM users u
LEFT JOIN posts p ON u.id = p.user_id
GROUP BY u.id, u.email;

-- Refresh periodically
REFRESH MATERIALIZED VIEW user_stats;

ClickHouse:

CREATE TABLE user_stats_aggregated (
  user_id UUID,
  email String,
  post_count UInt64,
  total_views UInt64
) ENGINE = SummingMergeTree()
ORDER BY (user_id, email);

-- Materialized view to aggregate
CREATE MATERIALIZED VIEW user_stats_mv
TO user_stats_aggregated
AS SELECT
  user_id,
  email,
  COUNT() as post_count,
  SUM(views) as total_views
FROM posts
GROUP BY user_id, email;

One source of truth (PostgreSQL), aggregations computed and stored elsewhere. Simple and reliable.

CQRS with Multiple Databases

Command Query Responsibility Segregation: separate read and write paths.

Write path:

  1. User creates order
  2. PostgreSQL: INSERT into orders (transactional)
  3. CDC: changes stream to Kafka
  4. ClickHouse consumer: INSERT into order_events (analytics)

Read path:

  1. Get user's orders (transactional): PostgreSQL
  2. Get analytics dashboard (aggregated): ClickHouse

// Command: write to PostgreSQL
export async function createOrder(userId: string, items: CartItem[]) {
  const order = await db.orders.insert({
    user_id: userId,
    items,
    status: 'pending',
  });

  // CDC takes it from here
  // Elasticsearch is updated, ClickHouse is updated, Redis cache is busted
  return order;
}

// Query: read from appropriate database
export async function getUserOrders(userId: string) {
  // Transactional read from PostgreSQL
  return await db.orders.find({ user_id: userId });
}

export async function getOrderAnalytics(userId: string) {
  // Analytics from ClickHouse (cached, denormalized)
  return await clickhouse.query(`
    SELECT
      toDate(created_at) as day,
      COUNT() as order_count,
      SUM(total) as revenue
    FROM orders
    WHERE user_id = ? AND created_at > today() - 30
    GROUP BY day
  `, [userId]);
}

Writes are normalized (PostgreSQL). Reads are denormalized (ClickHouse, Redis).

Operational Complexity Costs

Managing 5 databases costs more than managing 1:

  1. Monitoring: 5 systems to monitor
  2. Backups: 5 different backup strategies
  3. Scaling: 5 independent scaling decisions
  4. Deployment: Coordinate migrations across 5
  5. On-call: 5x failure modes

Start with PostgreSQL. Add databases only when:

  • Single database is a proven bottleneck
  • ROI is clear (time saved > ops cost)
  • You have operations expertise

For MVPs and small companies: one database is better.

For Series B+ companies operating at real scale: polyglot persistence makes sense.

Real-World Stack (2026)

Example: data analytics SaaS

Users (OLTP)
  ↓
PostgreSQL (primary data store)
  ↓  (CDC via Kafka)
  ├→ Redis (sessions, caching)
  ├→ Elasticsearch (document search)
  └→ ClickHouse (event analytics)

Queries:
  - GET /api/documents: PostgreSQL (transactional)
  - POST /search: Elasticsearch (full-text)
  - GET /dashboard: ClickHouse (analytics)
  - GET /user/session: Redis (cache)

Four databases, but each handles what it's best at. Synchronized via CDC.

Checklist

  • Define your access patterns (OLTP, OLAP, search, cache, graphs)
  • Identify bottlenecks (profile, benchmark, measure)
  • Choose primary database (usually PostgreSQL)
  • Add specialized databases only when justified
  • Set up CDC or materialized views for sync
  • Monitor all databases for divergence
  • Document data ownership (which DB is source of truth)
  • Plan disaster recovery (multiple systems)
  • Measure ops cost of each additional database
  • Re-evaluate annually

Conclusion

Polyglot persistence is powerful and necessary at scale. But it's also complex.

Start simple: PostgreSQL for everything. Add Redis when latency matters. Add ClickHouse when analytics is slow. Each addition should be justified by a real bottleneck, not premature optimization.

The best architecture is the one you understand. Complexity has a cost. Choose databases based on your constraints, not hype.

Use the right tool for every job, but don''t collect tools you don''t need. Balance performance, simplicity, and operational burden.