Prefer skills over documentation. Conventions live in the project's skills directory. This file captures cross-cutting rules only.
Skill Types
Skills follow a naming convention that signals how they're used:
codechange-* — Step-by-step guides for a specific type of code modification (e.g., adding a migration, creating an API endpoint). Invoked when performing that kind of change.
workflow-* — Multi-step orchestration guides for work that spans apps or PRs (e.g., adding a new primitive end-to-end). Invoked when planning or executing that workflow.
reference-* — Domain-specific knowledge that is too detailed for the project rules but reused across multiple skills. Not invoked directly — referenced as a dependency by codechange and workflow skills. Read the reference when a dependent skill lists it under ## Dependencies. Examples: observability conventions, cloud CLI command catalogs.
skillbuilder-* — Meta-skills for creating new skills of a given type.
Dependencies Between Skills
Codechange and workflow skills may declare a ## Dependencies section listing reference skills they depend on. When you invoke a skill that has dependencies, also read the referenced skills to inform your work. Reference skills are shared context — they avoid duplicating the same conventions across multiple codechange skills.
Feature Development Workflows
Single-PR one-off features (fixes, small updates, etc.) can be handled as one-offs. Use the appropriate skill(s); if the work isn't captured by any specific skill, look over whichever skills seem closest to glean best practices, but stay flexible.
Multi-PR features follow a phased approach:
Branch: feature/{name} from main
Plan: Create llm-usage-notes/features/{YYYY-MM-DD}-{name}/plan.md with high-level spec, ordered phases, which apps/skills each phase touches. Consult workflow skills (e.g., workflow-new-primitive) when applicable.
Phase0 (optional): Plan-only PR as RFC. Recommended for complex features.
Phase branches: feature/{name}/phase{N}/{app}-{slug} for each PR
{slug} matches codechange skill (e.g., add-migration, model) or is descriptive
Phase-specific planning goes in PR description under a collapsible <details> section
PR scope: One app per PR unless bundling is necessary
Iterate: PR feedback updates planning docs and skills
NEVER MERGE A PR: The user is responsible for merging the PR once approval has been granted, and should let you know when to update and advance to the next phase.
Pull Requests
All PRs MUST be created in draft mode. Use the /pr command, which handles description generation and runs gh pr create --draft. Never create a non-draft PR.
Healthcare Security
This is a healthcare system handling PHI. Security is a cross-cutting concern:
No PHI in logs. Patient data must be sanitized from all log output.
Encryption: PHI encrypted at rest and in transit.
Access controls: Practice-level data isolation. RBAC enforced on every endpoint.
Input validation: All external input validated and sanitized. SQL injection, XSS, and injection attacks prevented.
Sensitive data: Passwords hashed. API keys in environment variables, never in code. Secrets masked in logs.
Use the /review command for a structured security-aware code review.
General Tips
At the start of each session, confirm the nature of the work being done, then confirm (a) which branches already exist for that work and (b) which branch is currently checked out.
Make it a point to read PR feedback, and consider whether it may be worth it to update skills to capture what's being recommended more generally.
If a complex unit of work seems to deviate significantly from the guidance provided by existing skills, consider prompting the user to create a new skill.
Pre-commit self-check: Before presenting code or creating a commit, re-read the relevant project rules files and verify each convention was followed. Pay particular attention to styling conventions (sx prop names, spacing units, component choices) and truthiness checks (isDefined/isTruthy), which are easy to forget mid-implementation.
Coding Conventions
Plain Dates
APIs return plain dates as 2020-04-20T00:00:00Z. Browsers in timezones west of UTC render that instant as the previous local day (off-by-one bugs).
For plain date fields (birthdays, anniversaries):
Use moment.utc(dateOfBirth) or momentTz(dateOfBirth).tz('utc')
Never new Date(plainDate), parseISO(plainDate), or moment(plainDate)
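To see why the rule exists, compare UTC accessors with local ones on the same parsed value. This stdlib-only sketch illustrates the hazard; the moment calls above remain the project convention:

```typescript
// A plain date serialized as UTC midnight
const plainDate = '2020-04-20T00:00:00Z';

// new Date() parses this as an instant in time, not a calendar date
const parsed = new Date(plainDate);

// UTC accessors always recover the intended calendar date...
const utcDay = parsed.getUTCDate(); // 20, in every timezone

// ...but local accessors shift it: in UTC-5, getDate() returns 19
const localDay = parsed.getDate(); // timezone-dependent - the bug
```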
Truthiness
Use isDefined/isTruthy from common utils:
isDefined(value) — value !== null && value !== undefined
isTruthy(value) — truthy check that correctly handles false
isTruthy(obj?.length) not obj && obj.length > 0
if (!isDefined(data)) return null not if (!data) return <></>
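Minimal sketches of the two helpers, to make the semantics concrete. These are illustrative reimplementations; always import the real versions from common utils:

```typescript
// Illustrative reimplementations - use the common-utils versions in real code
function isDefined<T>(value: T | null | undefined): value is T {
  return value !== null && value !== undefined;
}

function isTruthy(value: unknown): boolean {
  return Boolean(value);
}

// Why the helpers matter in JSX: `obj && obj.length > 0` evaluates to the
// number 0 for an empty array, which React renders as a literal "0".
const items: string[] = [];
const show = isTruthy(items?.length); // false - safe in a conditional

// isDefined distinguishes "missing" from "falsy": false is defined
const flag = isDefined(false); // true
```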
IMPORTANT: Use CREATE INDEX CONCURRENTLY based on table size, not on how many rows will be indexed.
Even with a partial index like WHERE column IS NOT NULL that matches zero rows, PostgreSQL must still scan the entire table to evaluate the WHERE clause. On a large table, this scan:
Takes significant time (if SELECT COUNT(*) takes 30+ seconds, so will index creation)
Holds a write lock, blocking all INSERTs/UPDATEs during the scan
Rule of thumb: If the table is large (COUNT takes more than a few seconds), use CONCURRENTLY regardless of how many rows the index will contain.
Concurrent Index Pattern
```typescript
const TABLE_NAME = 'large_table';
const INDEX_NAME = 'large_table_column_idx';

// REQUIRED: disable transaction wrapper for CONCURRENTLY
export const config = { transaction: false };

export async function up(knex: Knex): Promise<void> {
  // Optional: extend timeout for very large tables
  await knex.raw(`set statement_timeout = '20min'`);

  await knex.raw(`
    CREATE INDEX CONCURRENTLY IF NOT EXISTS ${INDEX_NAME}
    ON ${TABLE_NAME} ("columnName")
    WHERE "columnName" IS NOT NULL
  `);
}

export async function down(knex: Knex): Promise<void> {
  await knex.raw(`DROP INDEX CONCURRENTLY IF EXISTS ${INDEX_NAME}`);
}
```
Key points:
export const config = { transaction: false } is required - CONCURRENTLY cannot run inside a transaction
Only do ONE thing per migration when using CONCURRENTLY (no transaction = no rollback safety)
The migration may run for minutes/hours while the index builds, but writes are not blocked
PostGIS / Geography Columns
```typescript
export const config = { transaction: false };

export async function up(knex: Knex): Promise<void> {
  // Check PostGIS is available
  const postgisCheck = await knex.raw(`
    SELECT EXISTS (
      SELECT 1 FROM pg_extension WHERE extname = 'postgis'
    ) AS exists
  `);
  if (!postgisCheck.rows[0].exists) {
    throw new Error('PostGIS extension not found. Contact #pod-dpi.');
  }

  // Add geography column (distances in meters, not degrees)
  await knex.raw(`
    ALTER TABLE ${TABLE_NAME}
    ADD COLUMN location GEOGRAPHY(Point, 4326) NULL
  `);

  // Partial spatial index with CONCURRENTLY (table is large)
  await knex.raw(`
    CREATE INDEX CONCURRENTLY ${TABLE_NAME}_location_gist_idx
    ON ${TABLE_NAME} USING GIST (location)
    WHERE location IS NOT NULL
  `);
}
```
GEOGRAPHY vs GEOMETRY:
GEOGRAPHY: Distances in meters, spheroidal calculations, better for "within X miles" queries
GEOMETRY: Distances in SRID units (degrees for 4326), faster but less intuitive
Column Types
| Knex Method | PostgreSQL Type | Notes |
| --- | --- | --- |
| `table.uuid()` | UUID | Use for IDs |
| `table.text()` | TEXT | Prefer over `string()` |
| `table.string(length)` | VARCHAR(length) | Use when length matters |
| `table.boolean()` | BOOLEAN | |
| `table.integer()` | INTEGER | |
| `table.decimal(precision, scale)` | NUMERIC(p,s) | For lat/long: `decimal(10, 7)` |
| `table.timestamp()` | TIMESTAMP | |
| `table.jsonb()` | JSONB | For structured data |
| `table.specificType('col', 'TEXT[]')` | TEXT[] | Arrays |
Zero Downtime Patterns
Adding a Non-Nullable Column
Migration: Create column as nullable
Code: Set column during all insertions
Migration: Backfill nulls, alter to non-nullable
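The three steps above could be sketched as Knex migrations like the following. Table and column names are hypothetical; check sibling migrations for the house style before implementing:

```typescript
import { type Knex } from 'knex';

// Migration 1: add the column as nullable
export async function up(knex: Knex): Promise<void> {
  await knex.schema.alterTable('my_table', (table) => {
    table.text('new_column').nullable();
  });
}

// ...deploy application code that sets new_column on every insert, then
// a later migration backfills and tightens the constraint:
//
// await knex.raw(`UPDATE my_table SET new_column = '' WHERE new_column IS NULL`);
// await knex.schema.alterTable('my_table', (table) => {
//   table.text('new_column').notNullable().alter();
// });
```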
Renaming a Column
Migration: Create new column
Code: Double-write to both columns
Migration: Backfill old values to new column
Code: Read from new column, stop writing old
Migration: Drop old column
Available Utilities
Import from ../utils:
| Utility | Purpose |
| --- | --- |
| `createBaseColumnsV3` | Add `id`, `created` (indexed), `updated` columns |
| `createUpdateTriggerV1` | Auto-update `updated` on row changes |
| `createStandardTableV1` | Combines base columns + trigger |
| `createConstraintSafelyV1` | Safe constraint creation |
| `createIndexSafelyV1` | Safe index creation |
| `dropTableV1` | Safe table dropping |
Testing
After writing the migration:
```shell
cd backend/primary

# Run migration
yarn db-migrate

# Rollback
yarn db-migrate-rollback

# Run again (verify idempotent)
yarn db-migrate
```
PR Guidelines
Migration-only PRs are preferred (easier review)
Migrations require approval from a special team
Coordinate with #pod-dpi for PostGIS or other extension changes
Add or modify routers and controllers in backend/primary
Adding/Modifying Routers and Controllers in backend/primary
This skill guides you through creating or modifying API endpoints in backend/primary. This includes creating new routes, adding controller handlers, and integrating with the Express app.
Philosophy: Spirit Over Letter
The patterns and examples in this skill are illustrative, not prescriptive. The actual implementation should be informed by:
Existing patterns in the codebase - Look at similar controllers first
The specific requirements - Don't implement what you don't need
Related context - Models, authorization modules, existing middleware
Conversations with the user - When trade-offs exist, discuss them
Dependencies
reference-observability-backend — Logging, metrics, tracing, and error handling conventions
Prerequisites
If the endpoint needs new data, ensure models exist (use codechange-primary-model first)
If authorization patterns are needed, understand which authorization module applies
Understand the HTTP contract (method, path, request/response shapes)
When Invoked
Load related context - Read app.ts, common.ts, and a similar controller
Study existing patterns - Find controllers in the same domain
Gather requirements - Confirm auth, validation, and response needs
Implement the change - Match existing patterns exactly
Write tests - Cover validation, success, and error cases
PR prep - Ensure consistent formatting
Loading Context
Before implementing, read these files to understand the conventions:
Authorization (can this user do this action?) happens inside the handler:
```typescript
async ({ params, user }) => {
  const accessible = await AuthOrganization.authorizeEntities({
    user,
    organizationIds: [entity.organizationId],
  });
  if (!accessible.has(entity.organizationId)) {
    throw new ForbiddenError('Access denied');
  }
  // proceed with operation
}
```
Why: Authentication middleware is reusable and declarative, but authorization logic often depends on the specific entity being accessed, so it must happen in the handler after loading the data.
Never Trust Request Data for Authorization
A common mistake: trusting organizationId or other access-control fields from the request body.
```typescript
// WRONG - trusts client-provided organizationId
const { organizationId } = body;
await Model.create({ organizationId, ...data });

// RIGHT - validate the user has access to that organization
const accessible = await AuthOrganization.authorizeEntities({
  user,
  organizationIds: [body.organizationId],
});
if (!accessible.has(body.organizationId)) {
  throw new ForbiddenError('User cannot create in this organization');
}
```
Why: Malicious clients can send any organizationId. Always verify access server-side.
Sensitive Data Should Not Leak
Models that contain secrets should have separate internal vs external query methods:
```typescript
// Internal use only - returns encrypted secrets
const account = await Model.findOneByIdInternal(id);

// External/API use - secrets stripped
const account = await Model.findById(id);
```
Why: Upstream code might forget to sanitize. The model layer should enforce this.
Response Patterns
The controller() function expects a response object with exactly one of:
Use the appropriate error class for proper HTTP status codes:
```typescript
import { NotFoundError, ForbiddenError, InvalidRequestError } from './common.js';

// 404 - Resource not found
throw new NotFoundError('Patient not found');

// 403 - User doesn't have permission
throw new ForbiddenError('Cannot access this organization');

// 400 - Bad request data
throw new InvalidRequestError('End date must be after start date');
```
Router Organization
Simple domains (single resource, few endpoints): One file at src/controllers/<domain>.ts
Routes are mounted in src/app.ts after global middleware:
```typescript
// Simple mounting
app.use('/allergies', allergyRouter);

// With pod ownership (for monitoring)
app.use('/feeds', owner(Pod.FlowStudio), feedRouter);

// Webhooks need raw body for signature verification
app.use(
  '/webhooks/stripe',
  express.raw({ type: 'application/json' }),
  stripeWebhooksRouter,
);
```
Common Reviewer Feedback
| Concern | What Reviewers Look For |
| --- | --- |
| Authorization | Don't trust client-provided IDs for access control |
| Role restrictions | Admin-only operations should check role, not just auth |
| Audit logging | Sensitive operations (secrets, PII) need audit logs (see reference-observability-backend) |
Testing
```typescript
describe('controllers/<domain>', () => {
  let app: express.Express;
  let agent: NotableAgent.NotableAgent;

  beforeEach(() => {
    app = express();
    app.use(express.json());
    app.use(markAuthChecked); // Bypass auth for unit tests
    agent = NotableAgent.agent(app);
  });

  describe('GET /:id', () => {
    it('returns entity when found', async () => {
      app.get('/:id', findById());
      const res = await agent.get('/valid-id');
      expect(res.status).toBe(200);
      expect(res.body).toMatchObject({ id: 'valid-id' });
    });

    it('returns 404 when not found', async () => {
      app.get('/:id', findById());
      const res = await agent.get('/nonexistent');
      expect(res.status).toBe(404);
    });
  });
});
```
Match Existing Style
Look at tests for similar controllers and match their patterns. Don't introduce new testing patterns unless there's a clear need.
PR Guidelines
Keep controller changes focused - don't mix route changes with unrelated model changes
Ensure all new endpoints have corresponding tests
Update any API documentation if it exists
If adding a new domain, follow the existing directory structure conventions
Format code consistently with existing files (the pre-commit hook will enforce this)
Create or update a model in backend/primary following Notable conventions
Create or Update a Primary Model
This skill guides you through creating or updating models in backend/primary/src/models/. Models are the data access layer - they handle CRUD operations and queries against the database.
Philosophy: Spirit Over Letter
The patterns and examples in this skill are illustrative, not prescriptive. The actual implementation should be informed by:
Existing patterns in the codebase - Look at similar models first
The specific requirements - Don't implement operations you don't need
The migration history - Migrations tell you what the schema actually looks like
Conversations with the user - When trade-offs exist, discuss them
Code examples below show one way something could work. Always check how similar things are done elsewhere in the codebase before implementing.
Dependencies
reference-observability-backend — Error classes and error handling conventions used in model throw paths
Prerequisites
Migration already merged that creates/modifies the underlying table
Clear understanding of the data model (fields, relationships, nullability)
Confirmation of which CRUD operations are needed
When Invoked
Load migration context - Find and read migrations related to this model (see below)
Study existing patterns - Find similar models in the codebase
Gather requirements - Confirm what operations are actually needed
Implement model - Match existing patterns in the codebase
Write tests - Cover the operations you implemented
PR prep - Remind user of review expectations
Step 1: Load Migration Context
Model PRs are typically followups to migration PRs. Before implementing model changes, load the relevant migrations into context:
```shell
# Find migrations for this table
ls backend/primary/db/migrations/**/ | xargs grep -l "table_name"

# Or search by feature name
git log --oneline --all -- backend/primary/db/migrations/ | grep "feature-keyword"
```
Read the migrations to understand:
Column names and types (migrations use snake_case, models use camelCase)
Nullability constraints
Foreign key relationships
Indexes (hint at common query patterns)
Any special column types (JSONB, geography, arrays)
Even if your current change isn't anticipated by a migration, the migration history provides valuable context about how the table has evolved.
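The snake_case-to-camelCase mapping mentioned above can be sketched as a small helper (the name is hypothetical; most model layers handle this conversion for you):

```typescript
// Converts a DB column name (snake_case) to a model field name (camelCase)
function toCamelCase(column: string): string {
  return column.replace(/_([a-z])/g, (_match, letter: string) =>
    letter.toUpperCase(),
  );
}

toCamelCase('date_of_birth'); // 'dateOfBirth'
toCamelCase('organization_id'); // 'organizationId'
```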
Step 2: Study Existing Patterns
Before writing code, find 2-3 similar models and understand their patterns:
```shell
# Find models in the same domain
ls backend/primary/src/models/

# Look at a model that's similar to what you're building
```
Pay attention to:
How types are defined (interfaces vs type aliases)
Which CRUD operations exist (not all models need all operations)
How queries are structured
Testing patterns in the corresponding .test.ts file
File Locations
| Purpose | Path |
| --- | --- |
| Models | `backend/primary/src/models/<name>.ts` |
| Model tests | `backend/primary/src/models/<name>.test.ts` |
| Common types | `backend/primary/src/models/common.ts` |
| Test fixtures | `backend/primary/src/test/fixtures/` |
| Test factories | `backend/primary/src/test/factory/` |
General Structure
Model files typically follow this ordering:
Imports - Knex, common types, db, errors
Constants - TABLE_NAME from TableName enum
Enums - Any enums specific to this model
Types - Base interface, exported model type, CreateValues, UpdateValues
Table accessor - Function returning typed Knex query builder
When you have both a throwing and non-throwing find function, the throwing version should call the non-throwing one:
```typescript
// findOneById should use findMaybeOneById, not duplicate the query
export async function findOneById(id: string): Promise<MyModel> {
  const found = await findMaybeOneById(id);
  if (!found) throw new NotFoundError(TABLE_NAME, id);
  return found;
}
```
Return Updated Entities
Update functions should return the modified record. This avoids extra database roundtrips and prevents read-after-write issues:
```typescript
// Good: returns the updated record
const updated = await MyModel.update(id, { status: 'active' });
console.log(updated.status); // 'active'

// Avoid: requires a second query to see the result
await MyModel.update(id, { status: 'active' });
const updated = await MyModel.findOneById(id); // extra roundtrip
```
Transaction Support
Mutating functions should accept an optional transaction parameter. This allows callers to compose operations atomically:
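A hedged sketch of the shape, with a stand-in executor type so it is self-contained. In the real model, `Executor` would be Knex / `Knex.Transaction` and `db` the shared connection:

```typescript
// Stand-in executor type for illustration only
type Executor = (table: string) => {
  update: (values: Record<string, unknown>) => Promise<number>;
};

const db: Executor = () => ({ update: async () => 1 });

// Mutating function accepts an optional transaction and falls back to
// the base connection, so callers can compose operations atomically.
async function deactivate(id: string, trx?: Executor): Promise<number> {
  const executor = trx ?? db;
  // the real code would also scope the update with .where({ id })
  return executor('my_table').update({ status: 'inactive' });
}
```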
When a function takes multiple IDs, use named parameters to prevent mix-ups:
```typescript
// Clear: parameter names are explicit
findByRelationship({ parentId: 'abc', childId: 'def' })

// Risky: easy to swap arguments
findByRelationship('abc', 'def') // which is which?
```
Only Build What You Need
Don't implement CRUD operations speculatively. If the feature only needs create and findOneById, don't add update, delete, and pagination "just in case."
Testing Principles
Use Existing Test Infrastructure
The codebase has fixtures, factories, and seed data. Before creating test data:
Check if a fixture already exists for what you need
Consider if a factory exists that can generate test entities
Only create new fixtures if your needs are genuinely different and reusable
Use inline test data for truly one-off cases
Don't cargo-cult specific fixture APIs from examples - explore what's available and use what fits.
Test Behavior, Not Implementation
Focus on:
Does create return the expected shape?
Does findOneById throw when the record doesn't exist?
Does update actually persist the changes?
Avoid:
Testing every possible field combination
Testing framework behavior (Knex works correctly)
Exhaustive negative test cases
Match Existing Test Style
Look at tests for similar models. Match their:
describe/test structure
Setup patterns (beforeAll, beforeEach)
Assertion style
Level of coverage
Common Reviewer Feedback
These are themes from actual PR reviews - things reviewers consistently check for:
| Concern | What Reviewers Look For |
| --- | --- |
| DRY | Are you duplicating query logic between functions? |
| Return values | Does `update` return the entity? |
| Transactions | Can operations be composed atomically? |
| Type inference | Are you forcing types that TypeScript could infer? |
| Scope | Did you only implement what's needed? |
| Consistency | Does this match patterns in similar models? |
TableName Enum
If you're creating a new model, add the table to the TableName enum in common.ts. The enum value should match the actual table name in the database (snake_case).
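For instance (entry name hypothetical):

```typescript
// In backend/primary/src/models/common.ts - the enum value matches the
// snake_case table name in the database.
enum TableName {
  // ...existing entries...
  CarePlan = 'care_plan',
}

TableName.CarePlan; // 'care_plan'
```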
PR Guidelines
Model PRs follow migration PRs - The migration should be merged first
Include tests for all implemented operations
Keep models focused on data access - Business logic belongs in controllers
Authorization happens in controllers - Models don't check permissions
Match existing patterns - Consistency is more important than cleverness
Guide for creating or modifying standalone CLI tools in backend/tools/
Scripts and Tools
This skill covers creating and maintaining standalone CLI tools in backend/tools/. These are internal utilities — data ingestion, reporting, infrastructure helpers — that run in isolation, typically on an ad-hoc or scheduled basis. They live outside the main application but follow the same coding standards.
Philosophy: Spirit Over Letter
The patterns here are illustrative, not prescriptive. The actual implementation should be informed by:
Existing patterns in the codebase — Look at sibling tools first
The specific requirements — Don't implement what you don't need
Related context — What data does this tool consume or produce?
Conversations with the user — When trade-offs exist, discuss them
Keep it minimal. Don't include commented-out defaults from tsc --init.
Workspace Registration
Add the tool to the root pnpm-workspace.yaml:
```yaml
packages:
  - backend/tools/<tool-name>
```
ESLint Configuration
Use typescript-eslint strict + stylistic type-checked. Copy from backend/tools/ncpdp-ingestion/eslint.config.js as the canonical starting point — it has been vetted through PR review. Key rules to preserve:
restrict-template-expressions with { allowNumber: true }
consistent-type-imports with { fixStyle: "inline-type-imports" }
no-unused-vars with { ignoreRestSiblings: true, argsIgnorePattern: "^_" }
return-await with "always" (matches monorepo convention)
Don't disable rules speculatively. Only disable a rule if you hit an actual violation that is justified.
Key Principles
Use Object Arguments for Functions
Functions that accept more than one parameter should take a single options object. This is a consistent pattern across the codebase and is enforced in PR review.
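A sketch of the convention (names illustrative):

```typescript
// Positional arguments are easy to transpose at the call site:
function formatRowPositional(name: string, city: string, state: string): string {
  return `${name} - ${city}, ${state}`;
}

// Preferred: a single options object makes every call site self-documenting
interface FormatRowOptions {
  name: string;
  city: string;
  state: string;
}

function formatRow({ name, city, state }: FormatRowOptions): string {
  return `${name} - ${city}, ${state}`;
}

formatRow({ name: 'Main St Pharmacy', city: 'Austin', state: 'TX' });
```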
When a tool consumes output from another tool or an external source, validate the input structure defensively. This makes errors obvious when tools are composed incorrectly.
```typescript
const requiredColumns = ["address", "city", "state", "zip"];
const missingColumns = requiredColumns.filter((col) => !headers.includes(col));
if (missingColumns.length > 0) {
  throw new Error(
    `Input CSV is missing required columns: ${missingColumns.join(", ")}. ` +
      `Expected output from the parse command.`,
  );
}
```
No Floating Promises
Since tools use ESM ("type": "module"), top-level await is supported natively. Use it instead of void to properly await the promise:
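A sketch of an ESM entrypoint using top-level await (`main` is a hypothetical name):

```typescript
// index.ts - ESM entrypoint sketch
async function main(): Promise<string> {
  // ...tool logic...
  return 'done';
}

// Avoid: void main(); - a floating promise whose rejection is swallowed.
// Top-level await (valid in ESM) lets a failure reject the process:
const result = await main();
```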
New dependencies require evaluation. Document in the PR description:
| Criterion | Details |
| --- | --- |
| Package | Name and version |
| License | Must be permissive (MIT, Apache-2.0, BSD) |
| Weekly Downloads | From npmjs.com — indicates community adoption |
| Existing Usage | Whether it's already used elsewhere in the monorepo |
| Justification | Why this package vs alternatives or building in-house |
Prefer packages already used in the monorepo. If yargs is used in error-reporting, use it in new tools too rather than introducing commander or meow.
Testing Principles
What to Test
Pure transformation functions — parsers, formatters, mappers
Edge cases in data handling — empty strings, missing fields, malformed input
Business logic — status determination, field selection logic
Input validation — verify correct errors for invalid inputs
What Not to Test
CLI argument parsing — yargs handles this; testing it adds no value
External API calls — mock at the boundary if needed, but don't test the API client itself
File I/O wiring — test the transformation logic, not that fs.writeFileSync works
Test Setup
Use vitest with no config file (defaults are sufficient for most tools):
```typescript
import { describe, it, expect } from "vitest";
import { myFunction } from "./module.js";

describe("myFunction", () => {
  it("handles the standard case", () => {
    expect(myFunction("input")).toBe("expected");
  });
});
```
For tests that need temp files, use os.tmpdir() with beforeEach/afterEach cleanup.
Match Existing Style
Look at backend/tools/ncpdp-ingestion/src/parse.test.ts for the canonical test style.
Common Reviewer Feedback
| Concern | What Reviewers Look For |
| --- | --- |
| Object arguments | Functions with >1 param should take an options object |
| Explicit checks | `=== ""` instead of falsy, `Array.isArray()` for arrays |
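The explicit-checks point can be illustrated with a small (hypothetical) function:

```typescript
// Explicit checks: compare against '' and use Array.isArray rather than
// relying on truthiness, which conflates '', 0, null, and undefined.
function classify(value: unknown): string {
  if (value === '') return 'empty string'; // not `!value`
  if (Array.isArray(value)) return 'array'; // not length/typeof tricks
  return typeof value;
}

classify(''); // 'empty string'
classify([]); // 'array'
classify(0); // 'number' - truthiness would have lumped this with ''
```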
Create a draft pull request with a structured description
Create Pull Request
Create a pull request for the current branch. All PRs MUST be created in draft mode.
Steps
Determine the base branch. The base is feature/{name} if it's part of a multi-PR stack using the feature/{name}/* convention. For other branches, the base is main.
Analyze ALL commits on the current branch since diverging from the base branch (git log and git diff <base>...HEAD). Look at every commit, not just the latest.
Draft a PR description using the template below.
Create the PR using gh pr create --draft.
Return the PR URL.
PR Description Guidelines
Use unambiguous, professional, and succinct language. No filler, no hyperbole.
Focus on what was changed and why, not implementation details.
Logging, metrics, tracing, and error handling conventions for backend services and CLI tools
Backend Observability Reference
Conventions and patterns for logging, metrics, tracing, and error handling across backend services (backend/primary/, backend/integration-proxy/, backend/tools/, etc.). This skill is a reference — it describes what exists and how to use it correctly.
NtblLogger
The monorepo's standard structured logger. Built on Winston, optimized for GCP Cloud Logging.
NtblLogger uses serialize-error internally to handle non-standard Error shapes.
Output Format
Controlled by environment:
| Environment | Format | Controlled By |
| --- | --- | --- |
| Local dev | Colorized, human-readable | `NODE_ENV=local` |
| GCP / production | Structured JSON for Cloud Logging | `USE_CLOUD_LOG_FORMAT=true` |
In production, Cloud Logging ingests the structured JSON automatically. Fields like severity, logging.googleapis.com/trace, and httpRequest are recognized natively.
Sensitive Data
NtblLogger supports AES-256-CBC encryption of log payloads when LOGGING_ENCRYPTION_KEY is set. Certain fields are always left unencrypted for auditability: messageId, apiInstanceId, ehrInstanceId, practiceId, organizationId.
For CLI tools that don't handle PHI, encryption is typically unnecessary.
CLI Tool Observability Pattern
Most CLI tools in backend/tools/ currently use raw console.log. The target pattern uses NtblLogger with a structured summary.
Why It Matters
When tools run in GCP (Cloud Run jobs, GKE CronJobs), structured JSON to stdout is automatically ingested by Cloud Logging. This enables:
Dashboards tracking run-over-run trends ("invalid phone count increased this month")
Alerts on anomalous failure rates
Searchable error context without grepping terminal output
Implementation Pattern
Create a logger — new NtblLogger({ exitOnError: true })
Log events with metadata — not just human-readable strings
Track stats — count successes, failures, and interesting edge cases
Emit a structured summary — a single log entry at the end with all counts
```typescript
const stats = { totalRows: 0, skippedRows: 0, invalidPhones: 0 };

// During processing
stats.totalRows += 1;
if (phone === '') {
  stats.invalidPhones += 1;
}

// At the end
logger.info('Parse complete', { summary: stats });
```
The summary log entry is the most important one — it's what dashboards and alerts key off.
Progress Logging
For long-running operations, log progress at a reasonable interval (per-batch, not per-row):
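A sketch of per-batch progress logging, with the logger and input stubbed so the shape is visible (the real tool would use NtblLogger):

```typescript
// Stub logger standing in for NtblLogger
const logger = {
  info: (message: string, meta: Record<string, unknown>) =>
    console.log(message, JSON.stringify(meta)),
};

const rows = Array.from({ length: 2500 }, (_, i) => i);
const BATCH_SIZE = 1000;

let processed = 0;
for (let i = 0; i < rows.length; i += 1) {
  processed += 1; // ...real per-row work here...
  if (processed % BATCH_SIZE === 0) {
    // One log line per batch, not per row
    logger.info('Progress', { processed, total: rows.length });
  }
}
```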
Metrics are exposed on a /metrics endpoint scraped by GKE PodMonitoring every 30 seconds.
Existing Metrics
| Metric | Type | Labels | Purpose |
| --- | --- | --- | --- |
| `http_response_count` | Counter | `hostName`, `method`, `path`, `status` | Request counting |
| `scheduling_performance` | Histogram | `organizationId`, `action` | Scheduling endpoint latency |
| `assistant_conversation_performance` | Histogram | `assistantId` | Assistant response times |
| `background_task_runtime` | Histogram | `title`, `status` | Background job duration |
| `feed_ingest` | Counter | `status`, `statusCode`, `feedId`, `source` | Feed processing |
| `primary_worker_queue_backlog_duration` | Gauge | `type` | Worker queue depth |
Adding a New Metric
Define in backend/primary/src/prometheus/:
```typescript
import { Histogram, Counter, Gauge } from 'prom-client';

// Histogram for measuring durations
export const myOperationDuration = new Histogram({
  name: 'my_operation_duration_seconds',
  help: 'Duration of my operation in seconds',
  labelNames: ['status'] as const,
  buckets: [0.1, 0.5, 1, 5, 10, 30, 60],
});

// Counter for counting events
export const myOperationTotal = new Counter({
  name: 'my_operation_total',
  help: 'Total number of my operations',
  labelNames: ['status', 'type'] as const,
});
```
Packages: @opentelemetry/sdk-node, @opentelemetry/api, plus per-library instrumentations
Configured in backend/primary/src/tracing.ts (loaded before all other imports). Exports to GCP Cloud Trace in production, Jaeger locally.
Trace Helpers
Located in backend/primary/src/utils/traceHelpers.ts:
```typescript
import {
  trace,
  getCurrentSpan,
  addAttributeToCurrentSpan,
  getGcpTraceId,
} from './utils/traceHelpers.js';

// Wrap a function in a traced span
const result = await trace(async () => doExpensiveWork(), 'doExpensiveWork');

// Add context to the current span
addAttributeToCurrentSpan('pharmacy.ncpdpId', ncpdpId);

// Get the current GCP trace ID (for linking to Cloud Trace UI)
const traceId = getGcpTraceId();
```
Auto-Instrumented Libraries
Express, HTTP, PostgreSQL (with query text in span attributes), Router, Net, and Winston are all auto-instrumented. You don't need to manually create spans for standard request handling.
When to Add Custom Spans
Add custom spans for:
Operations that cross service boundaries (external API calls not covered by HTTP instrumentation)
Long-running business logic where you want visibility into sub-steps
Background tasks that don't originate from HTTP requests
Error Handling
API Error Responses
Use returnError() from backend/primary/src/controllers/errorHelper.ts:
```typescript
import { returnError, ErrorType } from './errorHelper.js';

returnError({
  res,
  type: ErrorType.INVALID_REQUEST_ERROR,
  err,
  message: 'End date must be after start date',
  logPrefix: 'updateSchedule',
});
```
This:
Returns structured JSON { type, message, traceId } to the client
Logs the error with trace context
Records the exception on the active OTel span
In dev mode, includes __dev_error with the full serialized error
Error Classes
| Class | HTTP Status | Use When |
| --- | --- | --- |
| `NotFoundError` | 404 | Resource doesn't exist |
| `InvalidArgsError` | 400 | Bad request data |
| `ForbiddenError` | 403 | User lacks permission |
| `MethodNotImplementedError` | 501 | Endpoint stub |
Unhandled Errors
backend/primary/src/lifecycle.ts handles unhandledRejection and uncaughtException events. Known fatal errors (like DB connection failure) trigger graceful shutdown. Unknown exceptions force exit with Slack notification.
Health Checks
The health check verifies that the DB has users, request context exists, and no uncaught errors have occurred.
Kubernetes probes:
Readiness: /_notable/health-check, 10s initial delay, 60s period
Liveness: configured in helm values, 60s initial delay, 10s period
Request Context
AsyncLocalStorage-based context propagation in backend/primary/src/utils/requestContext.ts. Middleware injects user ID, org ID, impersonator info, and Sentry trace ID. Available throughout the request lifecycle via RequestContext.getContext().
The logger's getContext callback reads from this to attach request metadata to every log entry automatically.
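A self-contained sketch of the pattern. The `Ctx` field names are assumptions based on the description above; the real module lives at the path given.

```typescript
import { AsyncLocalStorage } from 'node:async_hooks';

// Minimal stand-in for the project's RequestContext module.
type Ctx = { userId?: string; orgId?: string; impersonatorId?: string };
const storage = new AsyncLocalStorage<Ctx>();

const RequestContext = {
  // Middleware wraps each request in run(); everything inside sees the context.
  run: <T>(ctx: Ctx, fn: () => T): T => storage.run(ctx, fn),
  getContext: (): Ctx | undefined => storage.getStore(),
};

// Deep in a handler, or in the logger's getContext callback:
function currentUserId(): string | undefined {
  return RequestContext.getContext()?.userId;
}
```

The payoff is that request metadata never has to be threaded through function arguments; any code on the async call path can read it.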
Error tracking, analytics, logging, and monitoring conventions for web applications
Frontend Observability Reference
Conventions and patterns for error tracking, analytics, logging, and performance monitoring across web applications (web/patient/, web/staff/, web/admin/, web/analyst/, web/assistant/). This skill is a reference — it describes what exists and how to use it correctly.
Sentry
Error tracking across all web apps. Each app has its own Sentry project (separate DSN) for isolation.
Initialization
Sentry is initialized in each app's src/index.tsx, production-only:
Production-only — import.meta.env.PROD gate prevents local dev noise
Release tagging — VITE_GIT_SHA links errors to specific deployments
App-specific DSN — each app reports to its own Sentry project
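The three bullets can be sketched as a pure function that decides whether and how to initialize; each app's `src/index.tsx` would pass the result to `Sentry.init()`. The `VITE_SENTRY_DSN` variable name is an assumption for illustration.

```typescript
// Self-contained sketch: decide whether to initialize Sentry and with what options.
type ViteEnv = { PROD: boolean; VITE_GIT_SHA?: string; VITE_SENTRY_DSN?: string };

function sentryInitOptions(env: ViteEnv): { dsn?: string; release?: string } | null {
  if (!env.PROD) return null; // production-only gate: no local dev noise
  return {
    dsn: env.VITE_SENTRY_DSN,  // app-specific Sentry project (var name assumed)
    release: env.VITE_GIT_SHA, // ties errors to a specific deployment
  };
}
```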
Performance Tracing
Only enabled in staff and assistant apps:
| App | Sample Rate | Integrations |
|-----|-------------|--------------|
| patient | None | Basic error reporting only |
| staff | 1% (0.01) | browserTracingIntegration() |
| admin | None | Basic error reporting only |
| analyst | None | Basic error reporting only |
| assistant | 50% (0.5) | browserTracingIntegration(), replayIntegration() |
The assistant app also captures session replays on 20% of error sessions (replaysOnErrorSampleRate: 0.2) for visual debugging context.
Which Package
Use @sentry/react for all web apps. All apps in this monorepo are React-based, and @sentry/react provides component profiling, React Router integration, and better error boundary support.
Error Boundaries
Use react-error-boundary for error boundaries. Do not write custom ErrorBoundary class components.
```typescript
import * as logger from '../logger';

try {
  await riskyOperation();
} catch (error) {
  logger.logError(error, 'Failed to parse address from answer');
}
```
logger.logError(error) — logs to console + sends to Sentry
logger.logError(error, 'context message') — also sends a Sentry message for the context string
Use logger.logError instead of bare console.error when the error should be tracked in Sentry. Use console.error for debug/development output that doesn't need tracking.
APIError
Thrown by the HTTP client when responses have status >= 400.
Retry Logic
tanstack-query retry function in most apps:
```typescript
function retry(failureCount: number, error: unknown): boolean {
  if (error instanceof APIError && error.status >= 400 && error.status < 500) {
    return false; // Don't retry client errors
  }
  return failureCount < 3; // Retry server errors up to 3 times
}
```
Error Reporting Gap
The API layer (vivaa-api) does not automatically report errors to Sentry. It throws APIError, but Sentry capture only happens if the calling component uses logger.logError() or an error boundary catches it. Be aware of this when handling API errors — explicitly report if the error matters for monitoring.
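One way to close the gap is a small wrapper that reports before rethrowing. This sketch uses local stand-ins for `APIError` and the logger so it is self-contained; the status threshold and function names are illustrative, not project conventions.

```typescript
// Stand-ins mirroring the modules described above.
class APIError extends Error {
  constructor(readonly status: number, message: string) { super(message); }
}
const logger = {
  logError(error: unknown, context?: string): void {
    console.error(context ?? '', error); // real logger also captures to Sentry
  },
};

// Explicitly report failures that matter for monitoring,
// then rethrow so callers (and tanstack-query retry) still see the error.
async function fetchWithReporting<T>(call: () => Promise<T>, label: string): Promise<T> {
  try {
    return await call();
  } catch (error) {
    if (error instanceof APIError && error.status >= 500) {
      logger.logError(error, label);
    }
    throw error;
  }
}
```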
PostHog Analytics (Staff Only)
Product analytics for tracking user behavior. Currently integrated only in the staff app.
Architecture
Provider: web/staff/src/utils/events/Provider.tsx
User identification: web/staff/src/utils/events/useIdentifyEffect.ts
API host: https://us.i.posthog.com
Key Configuration
```typescript
posthog.init(key, {
  autocapture: false,              // No automatic event capture
  capture_performance: false,      // No automatic performance metrics
  capture_pageview: false,         // No automatic page views
  capture_pageleave: false,        // No automatic page leave events
  disable_session_recording: true,
});
```
Everything is opt-in. Events are captured explicitly, not automatically.
PHI/PII Rules
Never include PHI or PII in analytics events. This is enforced by convention and code comments.
User identification includes:
user.id (internal UUID)
email (staff are internal users)
defaultOrganizationId and defaultOrganizationName
name (formatted user name)
Patient-facing apps must never identify users with email, address, zip code, health data, or financial data (SSN, partial SSN). Be aware that practice names can also leak information (e.g. "Cancer Institute") — avoid including them in patient-facing analytics context.
Restricted Imports
Direct imports of posthog-js are restricted via ESLint no-restricted-imports. Only the provider and designated files may import PostHog directly. This prevents accidental event capture across the codebase.
```typescript
// Only allowed in Provider.tsx and useIdentifyEffect.ts
// eslint-disable-next-line no-restricted-imports
import posthog from 'posthog-js';
```
LaunchDarkly Feature Flags (Staff Only)
Feature flag management, currently only in the staff app.
Architecture
Provider: web/staff/src/utils/flags/Provider.tsx
User identification: web/staff/src/utils/flags/useIdentifyEffect.ts
Structured code review with healthcare security checklist
Code Review
Review the current diff as a staff software engineer on a healthcare system. Prioritize security, performance, and reliability.
Directives
Report issues only. Do not mention what meets criteria or strengths.
Propose fixes. Suggest concrete changes with file paths and line numbers.
State uncertainty. If unsure about an issue or fix, say so explicitly.
Run tests. If adding or modifying tests, ensure they pass. Use yarn test src/path/to/test.ts (or yarn test-no-migrate src/path/to/test.ts in backend/primary if re-running all migrations is unnecessary).
Review Scope
1. Security (CRITICAL for Healthcare Data)
HIPAA: No patient data in logs. PHI encrypted at rest and in transit. Access controls enforced. Data minimization applied.
Authentication & Authorization: Token validation correct. RBAC and practice-level permissions enforced. API endpoints secured.
Data Validation: Input validated and sanitized. SQL injection prevented. FHIR resources validated. File uploads scanned.
Sensitive Data Handling: Passwords/secrets hashed. API keys managed securely. Sensitive data masked in logs.
2. Architecture & Design
Code structure (monorepo conventions, domain separation, DB abstraction).
Design patterns (error handling, service layer, transactions, separation of concerns).
Create a new codechange skill for a specific type of code modification
Codechange Skill Builder
This skill helps you create new codechange-* skills - structured guides for making specific types of changes to the codebase.
When to Create a Codechange Skill
Create a skill when:
A type of change is made repeatedly (models, controllers, primitives, etc.)
There are conventions and patterns that should be followed consistently
PR reviewers frequently give the same feedback on this type of change
New contributors would benefit from structured guidance
Naming Convention
codechange-<app>-<detail>
Examples:
codechange-primary-model - Models in backend/primary
codechange-primary-controller - Controllers in backend/primary
codechange-primary-router - Routers in backend/primary
codechange-staff-component - React components in web/staff
codechange-patient-primitive - Previsit primitives in web/patient
codechange-vivaa-api-types - Types in web/vivaa-api
When Invoked
Ask what type of change the skill should cover
Research the codebase - Find examples, look at PR history for feedback patterns
Identify key principles - What matters for this type of change?
Draft the skill - Following the structure below
Review with user - Refine based on their knowledge
Researching PR Feedback
Before writing a skill, look at merged PRs for this type of change:
```sh
# Search for merged PRs by keyword
gh search prs --repo VivaaHealth/vivaa --merged "<keyword>" --limit 20 --json number,title,author

# View a specific PR with reviews
gh pr view <number> --repo VivaaHealth/vivaa --json title,body,reviews,comments,files

# Get inline review comments (where the real feedback lives)
gh api repos/VivaaHealth/vivaa/pulls/<number>/comments --jq '.[] | {author: .user.login, path: .path, body: .body[0:500]}'
```
Look for patterns in:
What reviewers consistently ask for
Common mistakes that get corrected
Discussions about approach or design
Skill Structure
Create a new skill at codechange-<app>-<detail>/SKILL.md in the project's skills directory:
```markdown
---
name: codechange-<app>-<detail>
description: <One-line description>
---

# <Title>

<One paragraph explaining scope and when to use.>

## Philosophy: Spirit Over Letter

The patterns and examples in this skill are **illustrative, not prescriptive**. The actual implementation should be informed by:

1. **Existing patterns in the codebase** - Look at similar code first
2. **The specific requirements** - Don't implement what you don't need
3. **Related context** - Migrations, dependencies, prior art
4. **Conversations with the user** - When trade-offs exist, discuss them

## Prerequisites

- [ ] <What must be true before this change can be made>

## When Invoked

1. **Load related context** - <What to read first>
2. **Study existing patterns** - Find similar implementations
3. **Gather requirements** - Confirm what's actually needed
4. **Implement the change** - Match existing patterns
5. **Write tests** - Cover the functionality
6. **PR prep** - Review expectations

## Loading Context

<What context should be loaded and why>

## File Locations

| Purpose | Path |
|---------|------|
| ... | ... |

## Key Principles

<Focus on the "why" - each principle should explain reasoning>

### <Principle Name>

<Why this matters, not just what to do>

## Common Reviewer Feedback

| Concern | What Reviewers Look For |
|---------|-------------------------|
| ... | ... |

## Testing Principles

### What to Test

- <Behaviors that matter>

### What Not to Test

- <Anti-patterns>

### Match Existing Style

Look at tests for similar code and match their patterns.

## PR Guidelines

- <Relevant guidelines for this type of change>
```
Key Philosophy Points
Every codechange skill should emphasize:
Spirit over letter - Principles and reasoning, not just code to copy
Look at existing code first - The codebase is the source of truth
Load context before implementing - Migrations, dependencies, similar code
Don't over-implement - Only build what's actually needed
Test infrastructure exists - Explore fixtures/factories before creating new ones
Consistency matters - Match existing patterns even if you'd do it differently
After Creating the Skill
Test-drive it on an actual change
Refine based on what's missing or unclear
Consider whether it should live in the personal skills dir or the repo
Create a new workflow skill for planning domain-specific multi-phase features
Workflow Skill Builder
This skill helps you create new workflow-* skills—planning guides that inform how to structure a feature's phases for a specific domain.
What Workflow Skills Are For
Workflow skills are planning aids, not execution scripts. They inform the structure of features/{date}-{name}/plan.md by answering:
What phases does this type of feature need?
In what order? What are the dependencies?
Which codechange skills apply to each phase?
What domain-specific gotchas should the plan account for?
The canonical feature development workflow lives in the project rules. Workflow skills extend it for specific domains.
When to Create a Workflow Skill
Create a workflow skill when:
A type of feature consistently requires a specific sequence of phases
The phase structure is non-obvious (wouldn't be intuitive from the project rules alone)
There are domain-specific considerations that affect planning
The pattern recurs enough to warrant documentation
Don't create a workflow skill for:
General cross-app features (that's what the project rules describe)
One-off features that won't recur
Patterns that are obvious from the codechange skills alone
Naming Convention
workflow-<domain> or workflow-<action>-<subject>
Examples:
workflow-new-primitive - Adding a new previsit primitive
workflow-org-offboarding - Removing an organization
workflow-new-integration - Adding a new EHR integration
workflow-new-practice-preference - Adding a new practice preference
When Invoked
Understand the domain - What type of feature does this workflow cover?
Research PR history - Find examples of this workflow in merged PRs, paying careful attention to PR feedback for revealed institutional preferences
Map the phases - What apps are touched? In what order? Why?
Identify codechange skills - Which skills apply to each phase?
Document gotchas - What's non-obvious about this domain?
Draft the skill - Following the structure below
Review with user - Validate against their experience
Skill Structure
Create a new skill at workflow-<name>/SKILL.md in the project's skills directory:
```markdown
---
name: workflow-<name>
description: Planning guide for <domain-specific feature type>
---

# <Workflow Title>

This workflow skill informs feature planning for <description>. Use it when drafting your `features/{date}-{name}/plan.md` to understand what phases are needed.

## Overview

| Aspect | Detail |
|------------------|-------------------------|
| Apps Involved | <list of apps> |
| Number of Phases | <count after phase 0> |
| Key Dependencies | <what must exist first> |

## Phases for Plan

When planning this type of feature, include these phases:

### Phase 0: RFC (optional)

**Branch**: `feature/{name}/phase0/plan`
**Contains**: Only the plan document

Recommended for complex features. Define:

- <Key decisions for this domain>

### Phase 1: <Phase Name>

**Branch**: `feature/{name}/phase1/{app}-{slug}`
**Codechange skills**: `codechange-<app>-<aspect>`

<What this phase accomplishes>

**Key files:**

- <file 1>
- <file 2>

### Phase 2: <Phase Name>

...

## Phase Dependencies

Phase 0 ──► Phase 1 ──► Phase 2 ──► Phase 3

## Domain-Specific Considerations

<Gotchas, edge cases, things that aren't obvious>

## Example PRs

<Reference implementations with PR numbers>

## Rollback Considerations

<What to do if phases need to be reverted>
```
Key Principles
Inform planning, don't duplicate the project rules - Reference the canonical workflow, don't restate it
Focus on what's non-obvious - If the project rules would lead you to the right structure, you don't need a workflow skill
Map to codechange skills - Each phase should reference which codechange skills apply
Include real examples - PR numbers help future users understand the pattern
Branch naming follows the project rules - Use feature/{name}/phase{N}/{app}-{slug} convention
After Creating the Skill
Test-drive it on an actual feature plan
Identify missing codechange skills it should reference
Refine phase descriptions based on actual execution
Planning guide for adding a new previsit form primitive across all required apps
Adding a New Previsit Primitive
This workflow skill informs feature planning for adding a new previsit form primitive. Use it when drafting your features/{date}-{name}/plan.md to understand what phases are needed.
```json
{
  "name": "<tool-name>",
  "version": "1.0.0",
  "description": "<One-line description>",
  "main": "index.js",
  "type": "module",
  "private": true,
  "packageManager": "pnpm@10.28.2",
  "scripts": {
    "build": "tsc",
    "start": "tsx src/index.ts",
    "lint": "eslint .",
    "test": "vitest run",
    "test:watch": "vitest"
  }
}
```