Designing for 10x Growth — What Changes, What Doesn't, and What to Ignore

Introduction

The "design for scale" conversation in engineering is distorted by advice written for systems at Google, Stripe, or Netflix scale — systems serving millions of requests per second. Most teams designing for 10x growth are going from 1,000 to 10,000 users, or from 100 RPS to 1,000 RPS. At that scale, the right answers are often boring: add database indexes, introduce a caching layer, separate your read and write models, and make sure your connection pool is sized correctly. Kafka, Kubernetes, microservices, and distributed tracing are solutions to problems you don't have yet.

What Changes at 10x (and What Doesn't)

1,000 → 10,000 users: what actually needs to change

Usually DOES matter at 10x:
Database indexes on commonly queried columns
   (a query that takes 5ms at 1k rows can take seconds at 100k rows without an index)
Connection pool sizing
   (10 connections fine at 100 RPM, need 50+ at 1000 RPM)
N+1 query patterns
   (invisible at small scale, wrecks performance at larger scale)
Caching for expensive repeated reads
   (product catalog, user profile, config)
Read replicas to separate read load from write load
CDN for static assets

Usually does NOT matter yet at 10x:
Microservices
   (adds operational complexity you don't need)
Message queues for everything
   (synchronous calls are fine for most operations at this scale)
Kubernetes
   (ECS/Heroku/Render is fine for 10x, Kubernetes pays off at 100x)
Sharding
   (a single Postgres can handle much more than you think)
Global multi-region
   (latency is usually not your bottleneck at this scale)

Fix 1: Know Your Current Bottlenecks Before Designing for Scale

// Don't design for scale you don't have — measure what's slow now
// Your 10x-scale problems will show up in the same places that are slow today

async function findScalingBottlenecks(): Promise<BottleneckReport> {
  return {
    // Where does time go in a request?
    requestProfile: await sampleRequestTimings({
      sampleSize: 1000,
      breakdown: ['routing', 'auth', 'query', 'serialization', 'response'],
    }),

    // Which queries are slow?
    slowQueries: await db.query(`
      SELECT query, calls, mean_exec_time, total_exec_time,
             rows / NULLIF(calls, 0) as rows_per_call
      FROM pg_stat_statements
      ORDER BY total_exec_time DESC
      LIMIT 20
    `),

    // Which tables are growing fastest?
    tableGrowth: await db.query(`
      SELECT relname, n_live_tup, pg_size_pretty(pg_total_relation_size(oid)) as size
      FROM pg_stat_user_tables
      ORDER BY n_live_tup DESC
      LIMIT 10
    `),

    // Connection pool utilization
    poolUtilization: await getConnectionPoolStats(),
  }
}

// If 90% of request time is database queries: fix indexes and N+1s
// If 90% of request time is serialization: cache the serialized output
// If connection pool is saturated: add PgBouncer
// The 10x solution is usually more of the same fix, not a different architecture
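A sketch of what the `getConnectionPoolStats` summary above might compute. The counter names mirror node-postgres-style pools (`totalCount`, `idleCount`, `waitingCount`), with `max` as the pool's configured ceiling; the field names are an assumption, adapt them to your driver.

```typescript
// Counter shape assumed from node-postgres-style pools
type PoolCounters = { totalCount: number; idleCount: number; waitingCount: number; max: number }

type PoolStats = {
  utilization: number  // fraction of the configured max currently checked out
  saturated: boolean   // true when requests are queueing for a connection
}

function summarizePool(c: PoolCounters): PoolStats {
  const inUse = c.totalCount - c.idleCount
  return {
    utilization: c.max === 0 ? 0 : inUse / c.max,
    // waitingCount > 0 means callers are blocked waiting for a connection:
    // the clearest signal that the pool (not the database) is the bottleneck
    saturated: c.waitingCount > 0 || inUse >= c.max,
  }
}
```

Sustained utilization near 1.0 with a nonzero waiting count is the cue for a bigger pool or PgBouncer, not a new architecture.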

Fix 2: The Actual Scale Ladder for Most Applications

Most web applications scale like this, in order:

Step 1: Single server (0 → 10k users)
- Single database, application, and cache on manageable infrastructure
- Add indexes, fix N+1s, add Redis cache for hot reads
- Use RDS Multi-AZ for availability (not scale yet)

Step 2: Separate read load (10k → 100k users)
- Add read replicas for heavy read traffic
- Route reads to replica, writes to primary
- Consider read-write separation in app layer

Step 3: Horizontal application scaling (100k → 1M users)
- Stateless application servers behind load balancer
- Session in Redis (not in-memory)
- CDN for all static assets
- Database connection pooler (PgBouncer) critical here

Step 4: Database sharding or separation (1M+ users)
- Most never reach here on a single product
- Extract high-volume, low-complexity data to specialized stores
- User activity → time-series DB or data warehouse
- File storage → S3 (never in PostgreSQL)
- Search → Elasticsearch or Postgres full-text

The mistake: designing Step 4 when you're at Step 1
The cost: months of engineering, complex operations, harder hiring
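Step 2's read routing can live in a thin wrapper over two pools. A minimal sketch, assuming a generic `Queryable` driver interface; real routing also has to pin transactions and read-after-write reads to the primary.

```typescript
interface Queryable {
  query(sql: string, params?: unknown[]): Promise<unknown>
}

class ReadWriteRouter {
  constructor(private primary: Queryable, private replica: Queryable) {}

  // Naive verb-based routing; transactions and reads that must see a
  // just-committed write should be pinned to the primary instead
  pick(sql: string): Queryable {
    return sql.trimStart().toUpperCase().startsWith('SELECT')
      ? this.replica
      : this.primary
  }

  query(sql: string, params?: unknown[]): Promise<unknown> {
    return this.pick(sql).query(sql, params)
  }
}
```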

Fix 3: Stateless Services Enable Horizontal Scaling

// The most important preparation for horizontal scaling:
// Make your application stateless — no in-memory state between requests

// ❌ Stateful — prevents horizontal scaling
class InMemoryRateLimiter {
  private counts = new Map<string, number>()  // State lives in this process

  check(ip: string): boolean {
    const count = (this.counts.get(ip) ?? 0) + 1
    this.counts.set(ip, count)
    return count <= 100
  }
}
// Problem: 3 servers means 3 separate counters — limits are per-instance, not per-IP

// ✅ Stateless — scales horizontally
class RedisRateLimiter {
  async check(ip: string): Promise<boolean> {
    const count = await redis.incr(`rl:${ip}`)
    if (count === 1) await redis.expire(`rl:${ip}`, 60)
    return count <= 100
  }
}
// All servers share one Redis — rate limit is global across instances

// Same principle applies to:
// Sessions → Redis (not cookie-based JWT if you need revocation)
// Locks → Redis distributed locks (not Node.js mutex)
// Feature flags → external store, not in-memory config
// Queues → Redis/SQS/RabbitMQ (not in-process queue)
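The locks bullet follows the same pattern as the rate limiter. A sketch of a single-instance Redis lock (SET NX PX plus a token check on release); `RedisLike` is a stand-in interface rather than a specific client's API, and this is not a full Redlock implementation.

```typescript
// Stand-in for a Redis client; real clients expose SET key value NX PX ttl
interface RedisLike {
  set(key: string, val: string, nx: 'NX', px: 'PX', ttlMs: number): Promise<'OK' | null>
  get(key: string): Promise<string | null>
  del(key: string): Promise<number>
}

class RedisLock {
  constructor(private redis: RedisLike) {}

  // Returns a token on success, null if someone else holds the lock
  async acquire(key: string, ttlMs: number): Promise<string | null> {
    const token = Math.random().toString(36).slice(2)
    const ok = await this.redis.set(key, token, 'NX', 'PX', ttlMs)
    return ok === 'OK' ? token : null
  }

  // Only the holder's token releases, so a lock that expired and was
  // re-acquired elsewhere can't be deleted by the old holder
  // (production code does this check-and-delete atomically in a Lua script)
  async release(key: string, token: string): Promise<boolean> {
    if ((await this.redis.get(key)) !== token) return false
    await this.redis.del(key)
    return true
  }
}
```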

Fix 4: Database Choices That Don't Lock You In

-- Most scale problems are PostgreSQL problems, not "need a different database" problems
-- PostgreSQL can handle: 10TB of data, tens of thousands of client connections (pooled through PgBouncer),
-- 100M+ rows per table (with proper indexing)

-- What actually scales Postgres:
-- 1. Proper indexes (most important)
-- 2. Query analysis (EXPLAIN ANALYZE is your best friend)
-- 3. Partitioning for very large tables (10M+ rows)
-- 4. Read replicas for read-heavy workloads

-- Partitioning a large events table:
CREATE TABLE events (
  id UUID DEFAULT gen_random_uuid(),
  user_id UUID NOT NULL,
  event_type VARCHAR(100) NOT NULL,
  created_at TIMESTAMP NOT NULL DEFAULT NOW(),
  metadata JSONB
) PARTITION BY RANGE (created_at);

CREATE TABLE events_2025 PARTITION OF events
  FOR VALUES FROM ('2025-01-01') TO ('2026-01-01');

CREATE TABLE events_2026 PARTITION OF events
  FOR VALUES FROM ('2026-01-01') TO ('2027-01-01');

-- Partition pruning: queries with WHERE created_at > '2026-01-01'
-- only scan the 2026 partition, not the full table
-- This is often more impactful than sharding for time-series data
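One operational detail the DDL above implies: partitions must exist before rows for their range arrive, or inserts fail. A tiny hypothetical helper (the name and shape are mine) that a scheduled job could run to create next year's partition ahead of time:

```typescript
// Hypothetical helper: emits the DDL for one yearly range partition,
// matching the events_2025 / events_2026 pattern above
function yearlyPartitionDdl(table: string, year: number): string {
  return (
    `CREATE TABLE ${table}_${year} PARTITION OF ${table}\n` +
    `  FOR VALUES FROM ('${year}-01-01') TO ('${year + 1}-01-01');`
  )
}
```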

-- When you do need a different database:
-- Full-text search → Elasticsearch or Postgres pg_trgm
-- Time-series → TimescaleDB or InfluxDB
-- Cache → Redis (not relational DB)
-- But: exhausting Postgres first is almost always the right call

Fix 5: The 10x Checklist Before Designing New Architecture

Before adding architectural complexity for scale:

Performance baseline questions:
Have I profiled where request time is actually spent?
Do I know which queries are slow (EXPLAIN ANALYZE)?
Are there missing indexes on commonly filtered/sorted columns?
Are there N+1 queries I haven't fixed yet?
Is the connection pool sized correctly for current load?

Caching questions:
Which data is read frequently but changes slowly? (cache these)
Is static asset serving hitting my application servers? (use CDN)
Are expensive computations repeated on every request? (memoize/cache)

Infrastructure questions:
Is my application stateless? (can I add a second server?)
Are sessions stored in Redis? (not in-memory)
Is RDS Multi-AZ enabled? (reliability, not scale)

If all above are done and you're still hitting limits:
Now consider read replicas, caching layers, or horizontal scaling
Not microservices, Kafka, or sharding (yet)

Designing for Scale Checklist

  • ✅ Profiled current bottlenecks before designing new architecture
  • ✅ All commonly filtered/sorted columns have database indexes
  • ✅ N+1 query patterns eliminated
  • ✅ Connection pool sized with PgBouncer for high concurrency
  • ✅ Application is stateless — any state stored in Redis or database
  • ✅ Static assets served via CDN, not application server
  • ✅ Cache implemented for expensive, frequently-read data
  • ✅ Read replicas added when read/write ratio is skewed

Conclusion

Designing for 10x growth is mostly unglamorous work: fix the N+1 queries, add the missing indexes, move sessions to Redis, put static assets on a CDN, and size the connection pool correctly. This work reliably handles 10x growth without architectural complexity. The architectural changes — read replicas, queue-based async processing, horizontal scaling — come when the foundational work is done and a specific bottleneck demands them. The teams that introduce microservices, Kafka, and Kubernetes at 10k users spend their time on operational complexity instead of product work. The teams that resist complexity until it's genuinely needed ship more, hire more easily, and debug faster.