Maximize throughput and reduce latency without changing infrastructure.
This playbook documents practical patterns that scaled an API from 100 req/s → 50,000 req/s on the same machine and database.
Most Node.js performance problems come from doing unnecessary work.
- New DB connection per request
- Errors like "too many connections"
- High latency under load
- Connection spikes in DB metrics
- Requests failing under burst traffic
- Slow response times even with low CPU usage
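import { Pool } from 'pg';
// Connection settings come from the standard PG* environment variables unless passed here
export const pool = new Pool({
  max: 20,                       // upper bound on open connections
  idleTimeoutMillis: 30000,      // close clients that sit idle for 30 s
  connectionTimeoutMillis: 2000, // give up if no client can be obtained within 2 s
});
Use a shared pool across the app:
const result = await pool.query('SELECT * FROM users WHERE id = $1', [id]);
- Always use pooling for relational databases
- Recommended pool size: min(2 * CPU cores, DB max_connections / 4)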
- ~50–70% latency reduction
- Prevents connection exhaustion
- Too many connections → DB overload
- Too few → queueing inside the pool
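The sizing rule above is simple arithmetic, so it can be sketched as a small function. This is a minimal illustration, not part of `pg`: the helper name `recommendedPoolSize` and the floor of 1 are assumptions added here.

```typescript
// Sketch of the sizing rule: min(2 * CPU cores, DB max_connections / 4).
// `recommendedPoolSize` is a hypothetical helper, not a `pg` API.
function recommendedPoolSize(cpuCores: number, dbMaxConnections: number): number {
  const byCpu = 2 * cpuCores;
  const byDb = Math.floor(dbMaxConnections / 4);
  // Assumption: never go below 1, otherwise the pool could not serve any query.
  return Math.max(1, Math.min(byCpu, byDb));
}

// 8 cores against a database that allows 100 connections: min(16, 25) = 16
const poolMax = recommendedPoolSize(8, 100);
console.log(poolMax); // 16
```

The result would be passed as `max` when constructing the pool. On a 16-core machine with the same database the DB-side limit wins instead, giving 25.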
Sequential awaits for independent operations
const user = await getUser(id);
const orders = await getOrders(id);
const address = await getAddress(id);
- High latency with low CPU usage
- Multiple independent queries per request
const [user, orders, address] = await Promise.all([
  getUser(id),
  getOrders(id),
  getAddress(id),
]);
Use parallel execution when:
- Operations are independent
- No shared mutation/state dependency
- 2–3x faster response time (common case)
- Can overload downstream services (DB, APIs)
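The 2–3x figure follows from a simple latency model: sequential awaits cost the sum of the individual calls, while `Promise.all` costs roughly the slowest one. The sketch below works through that arithmetic. The timings are made-up illustrative numbers, not measurements from this API.

```typescript
// Latency model for independent calls (illustrative numbers, not measurements).
function sequentialLatency(callMs: number[]): number {
  // Awaiting one call after another: the total is the sum.
  return callMs.reduce((total, ms) => total + ms, 0);
}

function parallelLatency(callMs: number[]): number {
  // Promise.all settles when the slowest call finishes: the total is the max.
  return callMs.length === 0 ? 0 : Math.max(...callMs);
}

// Hypothetical timings in ms for getUser, getOrders, getAddress.
const calls = [40, 60, 50];
const speedup = sequentialLatency(calls) / parallelLatency(calls);
console.log(sequentialLatency(calls), parallelLatency(calls), speedup); // 150 60 2.5
```

The model ignores contention: if the three queries compete for the same pool or database, the parallel figure will be worse than the slowest single call.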
import pLimit from 'p-limit';
// Cap fan-out at 5 concurrent tasks; `tasks` must be an array of functions that return promises
const limit = pLimit(5);
await Promise.all(tasks.map(task => limit(task)));
- Same data fetched from DB on every request
- Examples: configs, permissions, feature flags
- High DB read volume for identical queries
- Low data volatility
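// Default import works for lru-cache v7 to v9; v10+ exports { LRUCache } instead
import LRU from 'lru-cache';
const cache = new LRU({
  max: 1000,
  ttl: 1000 * 60, // 60s
});
export async function getConfig(key: string) {
  const cached = cache.get(key);
  // Compare against undefined so cached falsy values (0, false, '') still count as hits
  if (cached !== undefined) return cached;
  const value = await fetchFromDB(key);
  cache.set(key, value);
  return value;
}
Use cache when: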
- Data changes infrequently
- Same query repeats frequently
- Up to 99% reduction in DB reads
- Stale data (eventual consistency)
- Memory usage grows with cache size
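To make the stale-data trade-off concrete, here is a minimal TTL cache sketch with a simulated clock. It is a hand-rolled illustration, not `lru-cache`'s API: `TtlCache`, `readConfig`, and the fake database variables are all hypothetical names introduced for this example.

```typescript
// Minimal TTL cache with an injectable clock (hypothetical; not lru-cache's API).
class TtlCache<V> {
  private entries = new Map<string, { value: V; expiresAt: number }>();
  private ttlMs: number;
  private now: () => number;

  constructor(ttlMs: number, now: () => number = Date.now) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  get(key: string): V | undefined {
    const entry = this.entries.get(key);
    if (entry === undefined) return undefined;
    if (this.now() >= entry.expiresAt) {
      this.entries.delete(key); // expired: force a fresh read
      return undefined;
    }
    return entry.value;
  }

  set(key: string, value: V): void {
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
  }
}

// Simulated clock and "database" so the behavior is visible without waiting.
let clock = 0;
let dbReads = 0;
let dbValue = "v1";
const cache = new TtlCache<string>(60000, () => clock);

function readConfig(key: string): string {
  const cached = cache.get(key);
  if (cached !== undefined) return cached;
  dbReads += 1;
  cache.set(key, dbValue);
  return dbValue;
}

const first = readConfig("flag"); // miss: reads the database
dbValue = "v2";                   // the source of truth changes
clock = 30000;
const stale = readConfig("flag"); // hit: still "v1", stale within the TTL
clock = 60000;
const fresh = readConfig("flag"); // expired: reads "v2"
console.log(first, stale, fresh, dbReads); // v1 v1 v2 2
```

Three requests cost two database reads here, and the second request sees a value that is up to one TTL old. The TTL is therefore the upper bound on staleness you are accepting.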
- Loading large datasets into memory
- High RAM usage / OOM crashes
- Memory spikes per request
- Node process crashes with "out of memory"
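import QueryStream from 'pg-query-stream';
import { pipeline } from 'stream/promises';
// Check out a dedicated client; the stream holds it until the query finishes
const client = await pool.connect();
try {
  const stream = client.query(new QueryStream('SELECT * FROM large_table'));
  await pipeline(
    stream,
    transformStream, // must serialize row objects (e.g. to JSON lines) before they reach res
    res
  );
} finally {
  client.release(); // always return the client to the pool
}
Use streaming when: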
- Response size > ~10MB
- Result set > ~10k rows
- Peak memory drops from gigabytes to megabytes
- Stable process under load
- More complex error handling
- Harder to paginate or retry
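The core idea behind streaming a large JSON response is to emit it piece by piece instead of building one giant string. A minimal sketch of that encoding step, using a generator, is below. `jsonArrayChunks` and `fakeRows` are hypothetical names, and the fake row source stands in for a database cursor.

```typescript
// Sketch: encode a JSON array chunk by chunk, never building the full string.
// `jsonArrayChunks` is a hypothetical helper name.
function* jsonArrayChunks<T>(rows: Iterable<T>): Generator<string> {
  yield "[";
  let isFirst = true;
  for (const row of rows) {
    // Emit a separator before every row except the first.
    yield (isFirst ? "" : ",") + JSON.stringify(row);
    isFirst = false;
  }
  yield "]";
}

// Rows are produced lazily, standing in for a database cursor.
function* fakeRows(count: number): Generator<{ id: number }> {
  for (let id = 1; id <= count; id++) yield { id };
}

// Collected here only to show the output; a server would write each chunk as it arrives.
const chunks = Array.from(jsonArrayChunks(fakeRows(3)));
console.log(chunks.join("")); // [{"id":1},{"id":2},{"id":3}]
```

Only one row is held in memory at a time. In a Node server, a generator like this could plausibly be wrapped with `Readable.from(...)` and piped to the response, which also explains the error-handling trade-off: a failure mid-stream happens after the opening `[` has already been sent.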
- Single Node.js process
- Only 1 CPU core used
- CPU usage capped at ~100% on multi-core machine
- Throughput not scaling with hardware
pm2 start app.js -i max
Or native cluster:
import cluster from 'cluster';
import os from 'os';
if (cluster.isPrimary) {
  const cores = os.cpus().length;
  for (let i = 0; i < cores; i++) cluster.fork();
} else {
  startServer();
}
- Always use clustering in production (unless using container autoscaling)
- Near-linear throughput scaling with CPU cores for CPU-bound work (up to ~8x on 8 cores)
- Requires stateless architecture
- In-memory cache is not shared (use Redis if needed)
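One consequence of clustering is easy to miss: each worker is a separate process with its own connection pool, so the database sees the per-worker `max` multiplied by the worker count. The sketch below works through that arithmetic. Both helper names and the budget of 40 connections are assumptions for illustration.

```typescript
// Each cluster worker is a separate process with its own pool, so the
// database sees workers * poolMax connections in total.
function totalConnections(poolMax: number, workers: number): number {
  return poolMax * workers;
}

// Hypothetical helper: split a fixed connection budget across workers.
function perWorkerPoolMax(connectionBudget: number, workers: number): number {
  if (workers < 1) throw new RangeError("workers must be >= 1");
  return Math.max(1, Math.floor(connectionBudget / workers));
}

// Unadjusted: 8 workers each keeping max: 20 can open up to 160 connections.
console.log(totalConnections(20, 8)); // 160
// With an assumed budget of 40 connections for this service:
console.log(perWorkerPoolMax(40, 8)); // 5
```

If the pool size was tuned for a single process, it is worth revisiting when clustering is turned on, or the "too many connections" errors can return.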
- Large responses
- High CPU time in JSON.stringify
- High CPU usage during response phase
- Profiling shows serialization bottleneck
import fastJson from 'fast-json-stringify';
const stringify = fastJson({
  type: 'object',
  properties: {
    id: { type: 'string' },
    name: { type: 'string' },
  },
});
res.send(stringify(data));
Use optimized serializers when:
- Large payloads
- Known schema
- Up to 4x faster serialization
- Requires schema definition
- Less flexible for dynamic data
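To see why a known schema helps and why it costs flexibility, here is a hand-written serializer for the same two-field shape. It only illustrates the general idea of fixing the property list ahead of time. It is not how `fast-json-stringify` is implemented, and `stringifyUser` and the sample object are hypothetical.

```typescript
interface User {
  id: string;
  name: string;
}

// The property list is fixed ahead of time, so nothing has to be discovered
// per call. Illustration only; not fast-json-stringify's generated code.
function stringifyUser(user: User): string {
  // JSON.stringify on each string handles quoting and escaping.
  return '{"id":' + JSON.stringify(user.id) + ',"name":' + JSON.stringify(user.name) + "}";
}

// The extra field is not in the schema, so it never reaches the output.
const user = { id: "42", name: 'Ada "Countess" Lovelace', passwordHash: "secret" };
const json = stringifyUser(user);
console.log(json); // {"id":"42","name":"Ada \"Countess\" Lovelace"}
```

Fields outside the schema are silently dropped in this sketch. That is the flexibility trade-off in practice: any new field has to be added to the schema before it appears in responses.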
- Large payloads → slow network transfer
- High response size
- Slow client response time despite fast backend
import compression from 'compression';
app.use(compression());
Enable compression when:
- Response size > ~1KB
- JSON-heavy APIs
- ~70% reduction in payload size
- CPU overhead for compression
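The rule above can be written as a small decision function. This is a sketch of the reasoning, not the middleware's internals: `shouldCompress`, the content-type pattern, and the sample sizes are assumptions. The 1024-byte default mirrors the ~1KB guideline, which also appears to match the `compression` middleware's documented `threshold` default.

```typescript
// Content types that usually compress well (assumed list; adjust per API).
const COMPRESSIBLE = /^(text\/|application\/(json|javascript|xml))/i;

// Hypothetical helper expressing the "> ~1KB and text-like" rule.
function shouldCompress(contentType: string, bodyBytes: number, thresholdBytes = 1024): boolean {
  // Tiny bodies gain little and still pay the CPU cost.
  if (bodyBytes < thresholdBytes) return false;
  // Images, video, and archives are typically compressed already.
  return COMPRESSIBLE.test(contentType);
}

console.log(shouldCompress("application/json; charset=utf-8", 50000)); // true
console.log(shouldCompress("application/json", 200));                  // false
console.log(shouldCompress("image/png", 50000));                       // false
```

Skipping small and already-compressed responses keeps the CPU overhead focused on the payloads where the size reduction is worth it.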
| Metric | Before | After |
|---|---|---|
| Throughput | 100 req/s | 50,000 req/s |
| Infrastructure | Same | Same |
| Memory | Unstable | Stable |
| DB Load | High | Optimized |
Before scaling infrastructure:
- DB connection pooling configured
- No unnecessary sequential awaits
- Hot paths cached
- Large payloads streamed
- All CPU cores utilized
- Serialization optimized (if needed)
- Compression enabled
If your Node.js API is slow, assume misuse before assuming limits of the runtime.
Most gains come from:
- Removing redundant work
- Reducing I/O
- Using hardware efficiently
Not from rewriting the system.