
@jgcmarins
Created April 16, 2026 19:48

Node.js API Performance Playbook

Goal

Maximize throughput and reduce latency without changing infrastructure.

This playbook documents practical patterns that scaled an API from 100 req/s → 50,000 req/s on the same machine and database.


Core Principle

Most Node.js performance problems come from doing unnecessary work.


1. Database Connection Pooling

Problem Pattern

  • New DB connection per request
  • Errors such as "too many connections"
  • High latency under load

Detection Signals

  • Connection spikes in DB metrics
  • Requests failing under burst traffic
  • Slow response times even with low CPU usage

Implementation

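import { Pool } from 'pg';

// One pool per process, created once at startup and shared by all requests.
// Connection settings come from the standard PG* environment variables
// (PGHOST, PGUSER, PGDATABASE, ...) unless passed here.
export const pool = new Pool({
  max: 20,                       // upper bound on open connections
  idleTimeoutMillis: 30000,      // close connections idle for 30 s
  connectionTimeoutMillis: 2000, // error if a connection can't be obtained within 2 s
});

Use a shared pool across the app:

// pool.query() checks out a client, runs the query, and returns the client to the pool
const result = await pool.query('SELECT * FROM users WHERE id = $1', [id]);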

Decision Rules

  • Always use pooling for relational databases

  • Recommended pool size:

    • min(2 * CPU cores, DB max_connections / 4)
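The sizing rule can be encoded as a small helper. `recommendedPoolSize` is hypothetical and not part of `pg`. The sketch assumes "CPU cores" means the cores visible to this process. It also assumes the database allows 100 connections, which is PostgreSQL's default `max_connections`.

```typescript
import { cpus } from 'node:os';

// Hypothetical helper encoding the rule above:
// min(2 * CPU cores, DB max_connections / 4), rounded down and never below 1.
function recommendedPoolSize(cpuCores: number, dbMaxConnections: number): number {
  const byCpu = 2 * cpuCores;
  const byDb = Math.floor(dbMaxConnections / 4);
  return Math.max(1, Math.min(byCpu, byDb));
}

// Assumptions: "CPU cores" = cores visible to this process, max_connections = 100
// (PostgreSQL's default; check yours with SHOW max_connections).
const poolMax = recommendedPoolSize(cpus().length, 100);
console.log(`pool max: ${poolMax}`);
```

With 8 cores and the default 100 connections, the CPU term wins (16). With `max_connections = 40`, the database term wins (10).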

Expected Impact

  • ~50–70% latency reduction
  • Prevents connection exhaustion

Trade-offs

  • Too many connections → DB overload
  • Too few → queueing inside the pool
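The "queueing inside the pool" trade-off is easier to see in a minimal generic pool. `TinyPool` is a hypothetical illustration, not how node-postgres is implemented. At most `max` resources exist. Once all are checked out, further callers wait in FIFO order until one is released.

```typescript
// Minimal generic pool (hypothetical, for illustration only).
class TinyPool<T> {
  private readonly max: number;
  private readonly create: () => T;
  private readonly idle: T[] = [];
  private readonly waiters: Array<(resource: T) => void> = [];
  private created = 0;

  constructor(max: number, create: () => T) {
    this.max = max;
    this.create = create;
  }

  acquire(): Promise<T> {
    const reused = this.idle.pop();
    if (reused !== undefined) return Promise.resolve(reused);
    if (this.created < this.max) {
      this.created++;
      return Promise.resolve(this.create());
    }
    // Pool exhausted: the caller queues here, which is the latency a too-small pool adds
    return new Promise<T>((resolve) => this.waiters.push(resolve));
  }

  release(resource: T): void {
    const next = this.waiters.shift();
    if (next) next(resource); // hand the resource straight to the oldest waiter
    else this.idle.push(resource);
  }

  get waiting(): number {
    return this.waiters.length;
  }
}
```

With `max = 2`, a third concurrent `acquire()` stays pending until one of the first two callers calls `release()`.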

2. Parallelizing Independent Async Operations

Problem Pattern

Sequential awaits for independent operations

const user = await getUser(id);
const orders = await getOrders(id);
const address = await getAddress(id);

Detection Signals

  • High latency with low CPU usage
  • Multiple independent queries per request

Implementation

const [user, orders, address] = await Promise.all([
  getUser(id),
  getOrders(id),
  getAddress(id),
]);
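To see where the gain comes from, the sketch below replaces `getUser`, `getOrders` and `getAddress` with hypothetical stand-ins that each take about 100 ms. Sequential awaits cost roughly the sum of the three delays. `Promise.all` costs roughly the slowest one.

```typescript
const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Hypothetical stand-ins for independent queries, ~100 ms each.
const getUser = async (id: string) => { await sleep(100); return { id }; };
const getOrders = async (id: string) => { await sleep(100); return [{ userId: id }]; };
const getAddress = async (id: string) => { await sleep(100); return { userId: id }; };

async function sequential(id: string): Promise<number> {
  const start = Date.now();
  await getUser(id);
  await getOrders(id);
  await getAddress(id);
  return Date.now() - start; // roughly 100 + 100 + 100 ms
}

async function parallel(id: string): Promise<number> {
  const start = Date.now();
  await Promise.all([getUser(id), getOrders(id), getAddress(id)]);
  return Date.now() - start; // roughly max(100, 100, 100) ms
}

async function main(): Promise<{ seq: number; par: number }> {
  const seq = await sequential('42');
  const par = await parallel('42');
  console.log(`sequential: ${seq} ms, parallel: ${par} ms`);
  return { seq, par };
}

main();
```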

Decision Rules

Use parallel execution when:

  • Operations are independent
  • No shared mutation/state dependency

Expected Impact

  • Latency drops from the sum of the calls to roughly the slowest one: 2–3x faster response time (common case)

Trade-offs

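  • Can overload downstream services (DB, APIs)
  • Promise.all rejects on the first failure; the other operations keep running, but their results are discarded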

⚠️ If needed, limit concurrency:

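import pLimit from 'p-limit';

// tasks must be functions returning promises (not already-started promises),
// so the limiter decides when each one starts
const limit = pLimit(5); // at most 5 in flight
const results = await Promise.all(tasks.map(task => limit(task)));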
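If adding a dependency is not an option, the same limiting idea can be sketched with plain promises. `mapWithLimit` is a hypothetical helper, not the p-limit API. It starts a fixed number of workers, and each worker pulls the next unstarted task until none are left. Results come back in the original order.

```typescript
// Hypothetical stdlib-only stand-in for p-limit.
async function mapWithLimit<T>(tasks: Array<() => Promise<T>>, limit: number): Promise<T[]> {
  if (!Number.isInteger(limit) || limit < 1) {
    throw new RangeError('limit must be a positive integer');
  }
  const results: T[] = new Array<T>(tasks.length);
  let next = 0;

  // Each worker runs one task at a time; `limit` workers means at most `limit` in flight.
  async function worker(): Promise<void> {
    while (next < tasks.length) {
      const index = next++; // no await between check and increment, so no task is taken twice
      results[index] = await tasks[index]();
    }
  }

  // If a task rejects, this rejects (like Promise.all); the other workers keep draining the list.
  await Promise.all(Array.from({ length: Math.min(limit, tasks.length) }, () => worker()));
  return results;
}
```

Usage mirrors the p-limit version: `await mapWithLimit(tasks, 5)`.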

3. In-Memory Caching for Hot Data

Problem Pattern

  • Same data fetched from DB on every request
  • Examples: configs, permissions, feature flags

Detection Signals

  • High DB read volume for identical queries
  • Low data volatility

Implementation

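import { LRUCache } from 'lru-cache'; // current releases export the class by name

const cache = new LRUCache({
  max: 1000,      // at most 1000 entries; least recently used are evicted first
  ttl: 1000 * 60, // entries expire after 60 s
});

export async function getConfig(key: string) {
  const cached = cache.get(key);
  // Compare with undefined so cached falsy values (0, '', false) still count as hits
  if (cached !== undefined) return cached;

  const value = await fetchFromDB(key); // your existing DB lookup
  cache.set(key, value);
  return value;
}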
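Here is how an LRU cache with a TTL behaves, sketched on top of `Map` (a hypothetical class, not a replacement for lru-cache). `Map` iterates in insertion order, so re-inserting a key on every read keeps the least recently used key first. That makes it the one to evict. The clock is injectable only so expiry can be tested.

```typescript
// Minimal LRU + TTL cache (hypothetical sketch).
class TtlLru<K, V> {
  private readonly entries = new Map<K, { value: V; expiresAt: number }>();
  private readonly max: number;
  private readonly ttlMs: number;
  private readonly now: () => number;

  constructor(max: number, ttlMs: number, now: () => number = Date.now) {
    this.max = max;
    this.ttlMs = ttlMs;
    this.now = now;
  }

  get(key: K): V | undefined {
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    if (entry.expiresAt <= this.now()) {
      this.entries.delete(key); // expired: treat as a miss
      return undefined;
    }
    this.entries.delete(key); // move to the end = most recently used
    this.entries.set(key, entry);
    return entry.value;
  }

  set(key: K, value: V): void {
    this.entries.delete(key);
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
    if (this.entries.size > this.max) {
      const oldest = this.entries.keys().next().value as K; // first key = least recently used
      this.entries.delete(oldest);
    }
  }
}
```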

Decision Rules

Use cache when:

  • Data changes infrequently
  • Same query repeats frequently

Expected Impact

  • Up to 99% reduction in DB reads

Trade-offs

  • Stale data (eventual consistency)
  • Memory usage grows with cache size

4. Streaming Large Payloads

Problem Pattern

  • Loading large datasets into memory
  • High RAM usage / OOM crashes

Detection Signals

  • Memory spikes per request
  • Node process crashes with "out of memory"

Implementation (PostgreSQL example)

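import { Transform } from 'stream';
import { pipeline } from 'stream/promises';
import QueryStream from 'pg-query-stream';

// A streamed query needs a dedicated client; release it when the stream ends or fails
const client = await pool.connect();
try {
  const rows = client.query(new QueryStream('SELECT * FROM large_table'));
  // QueryStream emits row objects; the response needs bytes, so write one JSON line per row
  const toNdjson = new Transform({
    writableObjectMode: true,
    transform(row, _encoding, callback) { callback(null, JSON.stringify(row) + '\n'); },
  });
  res.setHeader('Content-Type', 'application/x-ndjson');
  await pipeline(rows, toNdjson, res);
} finally {
  client.release();
}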
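The same pipeline shape can be tried without a database. In this sketch a hypothetical async generator (`fakeRows`) stands in for the query stream, and a counting `Writable` stands in for the HTTP response. Each row is serialized and written as it arrives, so no array of all rows is ever built.

```typescript
import { Readable, Transform, Writable } from 'node:stream';
import { pipeline } from 'node:stream/promises';

// Hypothetical row source: yields rows one at a time, like a DB cursor.
async function* fakeRows(count: number): AsyncGenerator<{ id: number; name: string }> {
  for (let i = 1; i <= count; i++) {
    yield { id: i, name: `user-${i}` };
  }
}

// Object-mode in, text out: one JSON document per line (NDJSON).
function toNdjson(): Transform {
  return new Transform({
    writableObjectMode: true,
    transform(row, _encoding, callback) {
      callback(null, JSON.stringify(row) + '\n');
    },
  });
}

// Stand-in for an HTTP response: counts lines instead of sending them.
async function streamRows(count: number): Promise<number> {
  let lines = 0;
  const sink = new Writable({
    write(chunk: Buffer, _encoding, callback) {
      lines += chunk.toString().split('\n').length - 1;
      callback();
    },
  });
  await pipeline(Readable.from(fakeRows(count)), toNdjson(), sink);
  return lines;
}

streamRows(10_000).then((n) => console.log(`streamed ${n} rows`));
```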

Decision Rules

Use streaming when:

  • Response size > ~10MB
  • Result set > ~10k rows

Expected Impact

  • Memory: GB → MB
  • Stable process under load

Trade-offs

  • More complex error handling
  • Harder to paginate or retry
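Part of why error handling gets harder is timing. Once the first chunk is written, the status code and headers are already on the wire, so a failure halfway through cannot become a clean 500. In the sketch below, a hypothetical source (`flakyRows`) fails after two rows. `pipeline` rejects, but the client has already received a partial body. In a real Node.js HTTP handler, `res.headersSent` tells you whether a normal error response is still possible.

```typescript
import { Readable, Writable } from 'node:stream';
import { pipeline } from 'node:stream/promises';

// Hypothetical source that fails part-way, e.g. a DB connection dropping mid-query.
async function* flakyRows(): AsyncGenerator<string> {
  yield '{"id":1}\n';
  yield '{"id":2}\n';
  throw new Error('connection lost');
}

async function sendStream(): Promise<{ sentBytes: number; error?: string }> {
  let sentBytes = 0;
  // Stand-in for the HTTP response: records how much was "sent".
  const res = new Writable({
    write(chunk: Buffer, _encoding, callback) {
      sentBytes += chunk.length;
      callback();
    },
  });
  try {
    await pipeline(Readable.from(flakyRows()), res);
    return { sentBytes };
  } catch (err) {
    // pipeline has already destroyed every stream; all that is left is to log it.
    // The client sees a truncated body, not an error status.
    return { sentBytes, error: (err as Error).message };
  }
}

sendStream().then((r) => console.log(r));
```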

5. Multi-Core Utilization (Cluster Mode)

Problem Pattern

  • Single Node.js process
  • Only 1 CPU core used

Detection Signals

  • Process CPU usage capped at ~100% (one core) on a multi-core machine
  • Throughput not scaling with hardware

Implementation (PM2)

pm2 start app.js -i max

Or native cluster:

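import cluster from 'cluster';
import os from 'os';

if (cluster.isPrimary) {
  // One worker per core; workers share the server's listening port
  const cores = os.cpus().length;
  for (let i = 0; i < cores; i++) cluster.fork();

  // Replace crashed workers so capacity does not silently shrink
  cluster.on('exit', (worker, code, signal) => {
    console.warn(`worker ${worker.process.pid} died (${signal ?? code}), restarting`);
    cluster.fork();
  });
} else {
  startServer(); // your app's HTTP server bootstrap (e.g. app.listen(...))
}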

Decision Rules

  • Always use clustering in production (unless using container autoscaling)

Expected Impact

  • Near-linear scaling with CPU cores for CPU-bound work (e.g., up to ~8x on 8 cores); I/O-bound APIs gain less and hit database limits first

Trade-offs

  • Requires stateless architecture
  • In-memory cache is not shared (use Redis if needed)
  • Each worker creates its own DB pool, so the database sees workers × pool max connections
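Because every cluster worker builds its own pool, the per-worker `max` should come from a total connection budget divided by the number of workers. `perWorkerPoolMax` is a hypothetical helper. The numbers assume `max_connections = 100`, with the app allowed a quarter of it.

```typescript
import { cpus } from 'node:os';

// Hypothetical helper: split one total connection budget across cluster workers.
function perWorkerPoolMax(totalBudget: number, workers: number): number {
  if (workers < 1) throw new RangeError('workers must be >= 1');
  // Never below 1; note that with more workers than budget, the total will exceed the budget
  return Math.max(1, Math.floor(totalBudget / workers));
}

// Assumed numbers: max_connections = 100, the app may use a quarter of it.
const workers = Math.max(1, cpus().length);
const budget = Math.floor(100 / 4);
const perWorker = perWorkerPoolMax(budget, workers);
console.log(`${workers} workers x ${perWorker} = ${workers * perWorker} connections (budget ${budget})`);
```

On 8 cores this gives 3 connections per worker (24 in total) instead of 8 × 20 = 160 with the `max: 20` from section 1.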

6. Optimized JSON Serialization

Problem Pattern

  • Large responses
  • High CPU time in JSON.stringify

Detection Signals

  • High CPU usage during response phase
  • Profiling shows serialization bottleneck

Implementation

import fastJson from 'fast-json-stringify';

const stringify = fastJson({
  type: 'object',
  properties: {
    id: { type: 'string' },
    name: { type: 'string' },
  },
});

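// stringify returns a string; set the JSON content type (Express would otherwise send text/html)
res.type('application/json').send(stringify(data));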
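The speed-up comes from knowing the shape in advance. fast-json-stringify compiles the schema into a specialized function. The hand-written serializer below is a hypothetical illustration of the idea, not the library's generated code. Property names are fixed when the function is written. Only the values are encoded per call, and fields not in the schema are dropped.

```typescript
interface User {
  id: string;
  name: string;
}

// Hypothetical specialized serializer for the two-field schema above.
function stringifyUser(user: User): string {
  // JSON.stringify on each string value handles quoting and escaping.
  return '{"id":' + JSON.stringify(user.id) + ',"name":' + JSON.stringify(user.name) + '}';
}

const user = { id: '42', name: 'Ada' };
console.log(stringifyUser(user)); // {"id":"42","name":"Ada"}
```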

Decision Rules

Use optimized serializers when:

  • Large payloads
  • Known schema

Expected Impact

  • Up to 4x faster serialization

Trade-offs

  • Requires schema definition
  • Less flexible for dynamic data
  • Properties not declared in the schema are omitted from the output

7. Response Compression

Problem Pattern

  • Large payloads → slow network transfer

Detection Signals

  • High response size
  • Slow client response time despite fast backend

Implementation

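import express from 'express';
import compression from 'compression';

const app = express();
app.use(compression()); // register before routes; responses under the 1kb default threshold stay uncompressed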

Decision Rules

Enable compression when:

  • Response size > ~1KB
  • JSON-heavy APIs

Expected Impact

  • ~70% reduction in payload size
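The actual ratio depends on the payload, so it is worth measuring on your own responses. This sketch uses Node's built-in `zlib` on a hypothetical, fairly repetitive JSON list. Repetitive list endpoints usually compress better than the ~70% figure, and small or high-entropy bodies compress worse.

```typescript
import { gzipSync } from 'node:zlib';

// Hypothetical payload: a list endpoint returning 1000 similar rows.
const rows = Array.from({ length: 1000 }, (_, i) => ({ id: i, status: 'active', country: 'BR' }));
const body = Buffer.from(JSON.stringify(rows));
const gzipped = gzipSync(body);

const saved = 1 - gzipped.length / body.length;
console.log(`${body.length} B -> ${gzipped.length} B (${(saved * 100).toFixed(0)}% smaller)`);
```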

Trade-offs

  • CPU overhead for compression

Final Outcome

| Metric         | Before    | After        |
|----------------|-----------|--------------|
| Throughput     | 100 req/s | 50,000 req/s |
| Infrastructure | Same      | Same         |
| Memory         | Unstable  | Stable       |
| DB Load        | High      | Optimized    |

Implementation Checklist

Before scaling infrastructure:

  • DB connection pooling configured
  • No unnecessary sequential awaits
  • Hot paths cached
  • Large payloads streamed
  • All CPU cores utilized
  • Serialization optimized (if needed)
  • Compression enabled

Final Note

If your Node.js API is slow, assume misuse before assuming limits of the runtime.

Most gains come from:

  • Removing redundant work
  • Reducing I/O
  • Using hardware efficiently

Not from rewriting the system.
