| title | Records-query perf — Phase 0 baseline measurements |
|---|---|
| type | baseline |
| domain | records-table |
| created | 2026-05-12 |
| owner | florian |
| target_workspace | ws=08cdaa0c-6a2c-4d1e-95a2-e08a9fbb9f3e (pk=510 "Wellapp test max", 1,245 active companies) |
| env | local Docker stack — Hasura v2.x on PostgreSQL 17.9 (staging-ISO dump restored 2026-05-12 00:15Z) |
| related_plan | docs/features/records-table/records-query-perf-refactor-plan-2026-05-12.md |
The 8s /v1/records/query is NOT a Hasura problem. Hasura itself runs the records.companies GraphQL in 2.1s direct. The remaining ~5.8s is API-side overhead — specifically, traced-http.helper.ts logging the Hasura response body through a sanitize-then-truncate pipeline that re-walks and re-stringifies the 100KB payload 3–4 times per call.
| Layer | Time |
|---|---|
| Express middleware / auth | ~30ms |
| resolveWorkspaceScope (parallel) | ~22ms |
| DataViewsService.query → Hasura (measured by API client) | 7,915ms |
| JSON response serialize + send | ~80ms |
| Total observed | 8,024ms |
Direct curl against Hasura: same GraphQL body, same variables, same workspace, same row count; bypasses the API entirely.
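If the baseline needs to be re-taken, a probe of roughly this shape will do it once the placeholder GraphQL document is swapped for the captured records.companies body. This is a hedged sketch, not the saved /tmp/profile-records-query.mjs: the query, limit, and admin-secret handling are assumptions; only the local Hasura URL comes from the API logs later in this doc.

```ts
// Hedged sketch (Node 18+, ESM / tsx). The GraphQL document, limit, and admin-secret
// handling are placeholders; swap in the captured records.companies body before use.
const HASURA_URL = "http://localhost:8081/v1/graphql";

const query = /* GraphQL */ `
  query CompaniesBaseline($limit: Int!) {
    companies(limit: $limit) {
      pk
      name
      # ...plus the ~19 relations selected by the real records query
    }
  }
`;

async function timeOnce(): Promise<{ ms: number; bytes: number }> {
  const started = performance.now();
  const res = await fetch(HASURA_URL, {
    method: "POST",
    headers: {
      "content-type": "application/json",
      // assumption: the local stack accepts the admin secret rather than the API's JWT
      "x-hasura-admin-secret": process.env.HASURA_ADMIN_SECRET ?? "",
    },
    body: JSON.stringify({ query, variables: { limit: 50 } }),
  });
  const text = await res.text();
  return { ms: performance.now() - started, bytes: Buffer.byteLength(text) };
}

const runs: { ms: number; bytes: number }[] = [];
for (let i = 0; i < 5; i++) runs.push(await timeOnce());

const times = runs.map((r) => r.ms);
console.log({
  min: Math.round(Math.min(...times)),
  avg: Math.round(times.reduce((a, b) => a + b, 0) / times.length),
  bytes: runs[0].bytes,
});
```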
baseline (all relations) min=2,134ms avg=2,322ms bytes=113,176
warm hit after 19 variants min=2,411ms (negligible cache effect)
Gap = ~5,800ms in the API process between axios.request() returning and res.json() flushing.
company_people 246ms (people=163k rows, indexed)
invoices 228ms (60k rows)
relations 209ms (company_relations — small, but issuer+receiver joins)
workspace 172ms (computed-field per row via 'workspace.object')
payment_means 170ms
sourceWorkspaceConnector 168ms
accounts 162ms
web_links 105ms (16KB payload, largest by bytes)
company_locations 80ms
company_phones 77ms
─────────────────────────────────
Sum top 10 (of 19) 1,617ms ≈ 76% of 2,134ms baseline
Smaller contributors: cards 62ms, subscriptions 62ms, company_financial 60ms, object (computed) 53ms, checks 47ms, journal_entry_lines 38ms, targetWorkspaceConnector 35ms, primary_contact 17ms, company_emails 8ms.
Note: checks, journal_entry_lines, and company_categories are empty tables in this staging-ISO snapshot; their cost is the empty-aggregate roundtrip only (~40–70ms each). On a populated workspace this cost would scale with row count.
web_links 16,047 bytes
object (computed) 11,819 bytes (per-row jsonb {pk, name, domain, logo_url, ...})
workspace 11,550 bytes (workspace.object computed per row)
company_locations 8,789 bytes
company_emails 7,458 bytes
company_media 5,950 bytes
company_phones 4,553 bytes
primary_contact 4,187 bytes
company_people 2,595 bytes
Full response ≈ 113KB.
first call 2,134ms
last call 2,411ms (after 19 different variants — plan cache likely cold-by-shape)
Plan-cache is not a meaningful contributor locally. Discard Phase 4 (cache warmup) from the refactor plan.
The API's outbound HTTP client (used for every Hasura call) logs request + response bodies via Winston. For each call:
```
logIncoming(service, status, url, duration, response.data, response.headers, ...)
// → sanitize(responseBody)   // recursive walk over entire nested object
// → safeSerialize(...)       // JSON.stringify pass #1 (try/catch test)
// → truncateBody(...)        // JSON.stringify pass #2 (length check)
// → winston format/print     // JSON.stringify pass #3 (console transport)
```

sanitize() walks every property of every nested object in the response looking for secret|password|token|api_key|apikey|authorization|access_token|client_secret. For a 50-row × 20-relation × 2-3-level-deep response, this is tens of thousands of recursive function calls + property descriptor lookups. V8 doesn't optimize this well — it scales superlinearly with depth.
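For scale, the sketch below shows the generic recursive key-scrubbing pattern this describes. It is illustrative only (the key list is the one above; nothing else is the actual traced-http.helper.ts code), but it makes the cost model concrete: one regex test and one recursive call per property, for every object in the tree.

```ts
// Illustrative shape of a recursive key-scrubbing sanitizer, not the actual helper.
const SENSITIVE_KEY =
  /secret|password|token|api_key|apikey|authorization|access_token|client_secret/i;

function sanitizeSketch(value: unknown): unknown {
  if (Array.isArray(value)) return value.map(sanitizeSketch);
  if (value !== null && typeof value === "object") {
    const out: Record<string, unknown> = {};
    // one regex test + one recursive call per property, for every object in the tree:
    // 50 rows × ~20 relations × nested rows adds up to tens of thousands of visits
    for (const [key, child] of Object.entries(value as Record<string, unknown>)) {
      out[key] = SENSITIVE_KEY.test(key) ? "[REDACTED]" : sanitizeSketch(child);
    }
    return out;
  }
  return value;
}
```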
sanitize runs before truncateBody: the full 100KB object is walked even though the log line is then truncated to 2,000 characters, so nearly all of that work is thrown away.
The Winston console transport then re-stringifies the result.
Measured contribution: ~5.8s for a single 110KB response. Reproduces on every records query.
Same pattern applies to all other outbound calls (Plaid, Attio, etc.) — but those return <10KB so the overhead is invisible. Only Hasura records-query trips it.
The original plan assumed Hasura was the bottleneck. Reality:
| Phase | Original assumption | Revised priority |
|---|---|---|
| Phase 0 — measure | gate for rest | DONE — this doc |
| Phase 0.5 — fix traced-http.helper.ts | not in original plan | NEW: highest priority. ~5.8s win locally, ~0 in prod |
| Phase 1 — view-driven query budget | biggest win | still high value. ~1s win on Hasura side by dropping unused relations |
| Phase 2 — collapse heavy relations into computed_fields | high | medium. company_people→people→media is 246ms — measurable but not catastrophic. Worth it if col-18 ships primary_contact swap regardless |
| Phase 3 — per-relation limit: 5 | medium | low. Sum of array-relation bytes ≈ 50KB; capping at 5 saves maybe 20-30KB on populated workspaces. Bigger gains in cell-level paint than total request time |
| Phase 4 — Hasura plan-cache warmup | conditional | drop entirely. Cache cold-warm delta is in noise |
`apps/api/src/helpers/traced-http.helper.ts`:

- Stringify the body first with a size cap, before `sanitize()`:

  ```ts
  function truncateBody(body: unknown): unknown {
    if (body == null) return body;
    const str = typeof body === "string" ? body : JSON.stringify(body);
    if (str.length > MAX_BODY_LOG_LENGTH) {
      return str.substring(0, MAX_BODY_LOG_LENGTH) + `... [truncated, total ${str.length} chars]`;
    }
    return body;
  }
  ```

  Then in `logIncoming`:

  ```ts
  // Truncate FIRST; only sanitize if the body is small enough to be worth logging in full
  const truncated = truncateBody(responseBody);
  const final = typeof truncated === "string"
    ? truncated // already a slice; nothing to sanitize
    : safeSerialize(sanitize(truncated as Record<string, unknown>), "body");
  ```

  Same in `logOutgoing`.

- OR add an environment-driven body-log threshold:

  ```ts
  const HTTP_LOG_BODY_MAX_BYTES = parseInt(process.env.HTTP_LOG_BODY_MAX_BYTES ?? "10000", 10);
  ```

  When the body length exceeds the threshold, log only `{ bytes: <n>, truncated: true }` — never sanitize or serialize. Default 10KB in dev, ~unlimited in prod (where Cloud Logging is async).
Either change is ~15 LoC. Both keep prod logging intact (small bodies still get full sanitized log entries).
- `/v1/records/query` p50 drops from ~8s to ~2.5s on local Docker stack (matching direct-Hasura baseline + small overhead)
- Sensitive-key redaction still works for small bodies (Plaid token, Stripe webhook, etc.)
- Prod log volume / shape unchanged for typical responses
Shipped in apps/api/src/helpers/traced-http.helper.ts (single file, ~30 LoC added):
- New `prepareBodyForLog()` helper. Single `JSON.stringify` size probe; bodies over `HTTP_LOG_BODY_MAX_BYTES` (default 10,000) log as `{ bytes: N, omitted: true }`. A sketch of the shape follows this list.
- Replaces inline `truncateBody(sanitize(...))` in both `logOutgoing` and `logIncoming`.
- `safeSerialize` no longer called in the inbound path (kept in `tracedSdkCall` only).
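For reference, a sketch of the shape described above. Only the constant name, the 10,000-byte default, and the `{ bytes: N, omitted: true }` marker come from this doc; the function body is illustrative, and the small-body branch stands in for the existing sanitize/safeSerialize path rather than reproducing it.

```ts
// Illustrative sketch only, not the shipped helper. One JSON.stringify pass acts as
// both the size probe and the serialization, so oversized bodies never reach sanitize().
const HTTP_LOG_BODY_MAX_BYTES = parseInt(process.env.HTTP_LOG_BODY_MAX_BYTES ?? "10000", 10);

function prepareBodyForLogSketch(body: unknown): unknown {
  if (body == null) return body;

  let serialized: string;
  try {
    serialized = typeof body === "string" ? body : JSON.stringify(body) ?? "";
  } catch {
    // circular or otherwise unserializable payloads: log a marker, never throw
    return { unserializable: true };
  }

  if (serialized.length > HTTP_LOG_BODY_MAX_BYTES) {
    // large bodies (e.g. the 113KB records response) are summarized, not walked
    return { bytes: serialized.length, omitted: true };
  }

  // small bodies continue through the existing sanitized logging path unchanged
  return body;
}
```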
3× curl /v1/records/query post-fix, same workspace (d8e8e830), same column set, same limit 50:
run 1 2,376ms 176,705 bytes downloaded
run 2 2,360ms 176,705
run 3 2,266ms 176,705
API log per-request:
⬇️ 200 hasura http://localhost:8081/v1/graphql (2,282ms)
⬅️ POST /v1/records/query 200 (2,340ms)
Delta: 8,024ms → 2,340ms — 71% reduction, 5,684ms saved per records query.
API overhead now ~58ms (was ~109ms). The 2,282ms Hasura call matches the direct-curl baseline (2,134ms ± noise — small drift from cold connection / different column set).
No behavioural regression observed. Records page renders identically; F-07 scope chip still resolves; PR-2132 +N navigation still works.
- Profile script: `/tmp/profile-records-query.mjs` (saved this session; promote to `scripts/profile-records-query.mjs` in PR-R1)
- Full tmux buffer with captured GraphQL bodies: `/tmp/well-dev-buffer.txt` (this session only)
- Migration applied this session: `apps/api/src/database/migrations/Migration20260512015500_records_page_sort_indexes.ts` — already applied to the developer's local DB; ready to ship as a separate small PR
- Does prod actually log Hasura response bodies, or does Winston route through `@google-cloud/logging-winston` (async, batched)? If prod is async, Phase 0.5 only helps local dev. Action: check `config/logger.ts` and confirm.
- Was this overhead worse before PR #2113 added `+28 -4` lines to `traced-http.helper.ts`? Quick `git log -p` check on the file.
- Should the env knob default differ between dev and prod, or just use one knob driven by `NODE_ENV`?