Author: @futzlarson
Last active: March 15, 2026 15:32

Risk Adjustment — Lambda Timeout Analysis Report

Date: March 15, 2026
Triggered by: CloudWatch Alarm vapor-Curitics-RA-production-d-timeout-warning
Alarm threshold: 28,000ms (approaching 30s Lambda limit)
Datapoint that fired: 28,877ms at 14:53 UTC


TL;DR

All timeout issues originate from a single tenant (tenant_id: 3). The root cause is data volume growth outpacing query performance — primarily on the Performance Dashboard and Encounter Logs pages. Three DashboardService methods have no caching and re-execute heavy queries on every user interaction. The dashboard timeout is accelerating: 58 of 151 total events occurred in just the last 2 days.


How This Was Identified

The LogSlowRequests middleware fires PerformanceMonitor::logTimeout() for any request exceeding 25,000ms, which logs a CRITICAL to CloudWatch and captures a Sentry error (always, regardless of the ENABLE_PERFORMANCE_LOGGING flag).
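The check itself is simple. A plain-PHP sketch of the idea (class, constant, and log message are illustrative stand-ins, not the app's actual LogSlowRequests/PerformanceMonitor code):

```php
<?php
// Illustrative sketch only: measures request duration and flags anything
// over the threshold, unconditionally (not gated by a feature flag).
class SlowRequestLogger
{
    public const THRESHOLD_MS = 25000;

    /** Run the request through $next, then flag it if it was too slow. */
    public function handle(mixed $request, \Closure $next): mixed
    {
        $start = microtime(true);
        $response = $next($request);
        $durationMs = (microtime(true) - $start) * 1000;

        if ($durationMs > self::THRESHOLD_MS) {
            // In the real app this is a CRITICAL CloudWatch log + Sentry error.
            error_log(sprintf('Potential request timeout: %.2f ms', $durationMs));
        }

        return $response;
    }
}
```

The key property, visible in the sketch, is that the slow-path logging happens after the response is produced, so it adds no latency to the request itself.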

CloudWatch logs for the Lambda function confirmed the exact request:

"Potential request timeout" → method: GET, path: /, duration_ms: 28779.44
status_code: 200, user_id: 3236, tenant_id: 3
REPORT: Duration: 28877.69 ms ← exact datapoint that triggered the alarm

Sentry was then used to identify the full scope of timeout issues across the application.


Sentry Issue Summary

All 5 unresolved timeout issues — project: risk-adjustment, org: curitics-health.

| Sentry ID | Endpoint | Events | Avg Duration | Range | First Seen | Last Seen |
| --- | --- | --- | --- | --- | --- | --- |
| RISK-ADJUSTMENT-S0 | GET / (dashboard) | 151 | 30.2s | 25–39s | Feb 17 | Today |
| RISK-ADJUSTMENT-S4 | GET /logs/encounter-logs | 63 | 43.7s | 28–100s | Feb 17 | Mar 12 |
| RISK-ADJUSTMENT-SA | POST /livewire/update | 24 | ~30s | 25–40s | Feb 18 | Mar 13 |
| RISK-ADJUSTMENT-W6 | GET /awv-upload-documents | 1 | 35s | — | Mar 12 | Mar 12 |
| RISK-ADJUSTMENT-VK | GET /concurrent-review | 1 | 27s | — | Mar 9 | Mar 9 |

Critical observation: Every single event across all 5 issues is attributed to tenant_id: 3.


Issue Deep Dives

1. GET / — Performance Dashboard (151 events, ACCELERATING)

File: app/Filament/Pages/Dashboard.php
Widgets: app/Filament/Widgets/Dashboard/
Service: app/Services/DashboardService.php

On every page load, the Filament dashboard mounts all widgets simultaneously. Each widget calls DashboardService methods that run independent, heavy database queries. On initial load there is no Livewire lazy-loading — everything executes synchronously in the same request.

Time-of-day distribution (UTC): Flat across all 24 hours — no peak window. This rules out scheduled jobs and confirms the cause is data volume, not concurrent load.

Volume trend:

  • Prior to Mar 14: ~93 events over ~4 weeks
  • Mar 14–15 alone: 58 events in 2 days
  • The issue is worsening as tenant 3's dataset grows

Queries that fire on every dashboard load:

| DashboardService Method | Cached? | Notes |
| --- | --- | --- |
| getMemberTypeCounts() | ✅ 59 min | Safe |
| getAwvStatusSummary() | ✅ 59 min | Safe |
| getAverageRaf() | ✅ 30 min | Safe |
| getAverageStarScore() | ✅ 5 min | Safe |
| getHccConditionCoverageSummary() | ✅ 30 min | Safe |
| getMedexClaimSummaryQuery() | No cache | Returns a query builder; executed by ClaimsSnapshot widget on every request |
| getHedisGapTrackingQuery() | No cache | Most complex query — DISTINCT ON + multiple JOINs + LEFT JOIN subquery; executed by _GapTracker on every request |
| getProvidersWithMedexSummary() | No cache | Multi-JOIN with medex aggregation subquery; executed by _ProviderSummary on every request |

The three uncached methods are the primary bottleneck for the initial dashboard load.


2. GET /logs/encounter-logs — Encounter Log Resource (63 events, up to 100s)

File: app/Filament/Clusters/Logs/Resources/EncounterLogResource.php
List page: app/Filament/Clusters/Logs/Resources/EncounterLogResource/Pages/ListEncounterLogs.php
Model: EncounterLogSummary (encounter_log_summaries table)

Time-of-day distribution (UTC): Clusters at 16:00–18:00 UTC (9–11am Pacific) — this is business-hours driven, triggered by staff starting their day and loading the logs page.

Duration note: 50 of 63 events exceed 30 seconds (max: 100.5s). This page is on the web Lambda function (timeout: 600s), not the 30s function, but these durations are still completely unacceptable.

Root cause — N+1 queries on every row:

The ListEncounterLogs page never overrides the table query to eager-load relationships. Filament lazy-loads each relationship per rendered row. With the default page size of 25 rows:

  • provider_name → 1 query per row (25 queries)
  • created_by_name → 1 query per row (25 queries)
  • updated_by_name → 1 query per row (25 queries)

That's 75+ extra queries just to display the table, before counting the 7 relationship-based filter joins (tenant, status, provider ×2, concurrent review status, createdBy, updatedBy).

Fix: Add eager loading to the list page query:

// In ListEncounterLogs.php
use Illuminate\Database\Eloquent\Builder;

public function getTableQuery(): Builder
{
    return parent::getTableQuery()
        ->with(['provider', 'createdBy', 'updatedBy', 'status', 'member']);
}

3. POST /livewire/update — Dashboard Filter Interactions (24 events)

Affected widgets: _GapTracker, ClaimsSnapshot, _ProviderSummary
Environments affected: Both production and uat

Every dashboard filter change (provider, market, IPA, DOS year) dispatches a chart-filter-changed Livewire event. All widgets listen to this event and re-execute their queries simultaneously in a new POST /livewire/update request — with no debouncing and no caching.

Smoking gun from Sentry data:
On March 12, 2026 between 16:25–16:36 UTC, 10 timeout events fired in 11 minutes. That pattern is consistent with a single user clicking through the filter dropdowns: each click triggers the three uncached DashboardService methods in parallel, consuming ~30s per interaction.

The three uncached methods re-executed on every filter click:

  1. getHedisGapTrackingQuery() — Most dangerous. Complex subquery with:

    • DISTINCT ON (member_id) subquery
    • JOINs: hedis_gaps, member_market, markets, member_ipa, ipas
    • LEFT JOIN with aggregation subquery
    • GROUP BY on MasterHedisMeasure
    • No cache wrapper at all
  2. getMedexClaimSummaryQuery() — Returns a query builder (not a result), executed fresh by ClaimsSnapshot on every updateTable() call. No cache.

  3. getProvidersWithMedexSummary() — Complex provider query with:

    • Conditional JOINs: provider_market, markets, provider_ipa, ipas, member_provider, members
    • Subquery for medex_summary aggregation with raw SQL
    • No cache. Also called a second time in exportCsv().

Fix direction: Wrap each method in tenant+filter-aware cache keys (following the same pattern as the already-cached methods):

// Example key pattern
$cacheKey = "hedis_gap_tracking:{$tenantId}:{$providerId}:{$market}:{$ipa}:{$dosYear}";
Cache::remember($cacheKey, now()->addMinutes(5), fn() => /* query */);
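A plain-PHP illustration of how such composite keys can be built consistently across the three methods; the helper name, the filter list, and the 'all' placeholder are assumptions for the sketch, not the app's actual code:

```php
<?php
// Hypothetical helper: builds a tenant+filter cache key in the spirit of the
// pattern above. The real DashboardService key format may differ.
function dashboardCacheKey(string $metric, int $tenantId, array $filters): string
{
    // Missing filters collapse to 'all' so every key has the same fixed shape,
    // which keeps keys unambiguous (no collisions between filter combinations).
    $parts = array_map(
        fn (string $k) => (string) ($filters[$k] ?? 'all'),
        ['provider_id', 'market', 'ipa', 'dos_year']
    );

    return $metric . ':' . $tenantId . ':' . implode(':', $parts);
}
```

Because every filter slot is always present in the key, a cache flush or TTL expiry on one filter combination never affects another, and the same helper can serve all three uncached methods.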

Infrastructure Context

  • Function: vapor-Curitics-RA-production-d (Lambda, us-west-1)
  • Web timeout: 600s (vapor.yml); the alarmed vapor-Curitics-RA-production-d function runs under the ~30s limit that triggered the alarm
  • Queue timeout: 900s
  • Runtime: PHP 8.3.30, Laravel 11, Filament, Laravel Vapor
  • Database: PostgreSQL (RDS RA-Prod)
  • Cache: Redis (RA-Production)

Recommended Fixes (Priority Order)

Priority 1 — Cache the three uncached DashboardService methods

Impact: Eliminates dashboard load timeouts and all /livewire/update timeouts
Files: app/Services/DashboardService.php
Methods: getHedisGapTrackingQuery(), getMedexClaimSummaryQuery(), getProvidersWithMedexSummary()
Approach: Use tenant+filter composite cache keys with a 5–15 minute TTL, matching the pattern used by getAverageStarScore() and getHccConditionCoverageSummary()

Priority 2 — Eager-load relationships in EncounterLogResource

Impact: Eliminates N+1 queries — reduces encounter-logs page from 75+ queries to ~8
File: app/Filament/Clusters/Logs/Resources/EncounterLogResource/Pages/ListEncounterLogs.php
Change: Override getTableQuery() to add ->with(['provider', 'createdBy', 'updatedBy', 'status', 'member'])

Priority 3 — Enable slow query logging temporarily

Impact: Identifies the exact SQL causing the remaining slowness
Action: Set ENABLE_PERFORMANCE_LOGGING=true in production env vars
Note: This enables logSlowQuery() (threshold: 500ms) which logs to CloudWatch and sends to Sentry for queries >2s. Disable after diagnosis.

Priority 4 — Wire up Sentry user identity

Impact: Allows Sentry to track which users are affected (currently all show as null)
Action: Call Sentry\configureScope() with the authenticated user's details (e.g. from AppServiceProvider or an auth-aware middleware)
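A minimal wiring sketch, assuming the standard sentry-laravel API (\Sentry\configureScope and Scope::setUser); the exact placement and the tenant_id attribute are assumptions, not the app's confirmed code:

```php
use Illuminate\Support\Facades\Auth;
use Sentry\State\Scope;

// In AppServiceProvider::boot() (or an auth-aware middleware): attach the
// current user to every Sentry event so issues stop showing user: null.
\Sentry\configureScope(function (Scope $scope): void {
    if ($user = Auth::user()) {
        $scope->setUser([
            'id'        => $user->id,
            'tenant_id' => $user->tenant_id ?? null, // assumed attribute
        ]);
    }
});
```

With this in place, the tenant attribution that had to be dug out of CloudWatch logs above would be visible directly on each Sentry event.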


Monitoring

The existing LogSlowRequests middleware + PerformanceMonitor::logTimeout() setup is solid — it already caught this issue and reported it to both CloudWatch and Sentry. The CloudWatch alarm threshold of 28,000ms gives a 2-second buffer before the Lambda hard timeout.

Once fixes are deployed, the Sentry issues above can be resolved and the alarm should return to OK state within minutes of the first dashboard load completing under the threshold.
