On May 9, average request duration on import-prod jumped from ~10s to 3,240s (54 minutes), autoscaling grew the service from 2 to 7 tasks, and 500 errors spiked 20x. No code had been deployed to the import service. The root cause was a Transloadit outage, which exposed that the import service has no enforceable request-level timeout anywhere in the stack.
Agentic import was ruled out: its CPU usage was ~0.5%, and import-prod request volume was flat across the incident window.
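
To illustrate what an enforceable request-level timeout could look like, here is a minimal Python sketch. It assumes an asyncio-based handler, which may not match the import service's actual stack. The names `call_with_deadline`, `UpstreamTimeout`, `slow_upstream`, and `fast_upstream` are hypothetical, and the deadline value is shortened for demonstration rather than being a recommendation.

```python
import asyncio


class UpstreamTimeout(Exception):
    """Raised when an upstream dependency exceeds the request deadline."""


async def call_with_deadline(upstream_call, deadline_seconds):
    """Await upstream_call() but give up after deadline_seconds.

    asyncio.wait_for cancels the pending call on timeout, so the request
    stops waiting instead of hanging for as long as the upstream is stalled.
    """
    try:
        return await asyncio.wait_for(upstream_call(), timeout=deadline_seconds)
    except asyncio.TimeoutError:
        raise UpstreamTimeout(
            f"upstream exceeded {deadline_seconds}s deadline"
        ) from None


async def slow_upstream():
    # Hypothetical stand-in for a dependency that stalls during an outage.
    await asyncio.sleep(3600)
    return "done"


async def fast_upstream():
    # Hypothetical stand-in for a healthy dependency.
    return "done"


def run_demo():
    # Deadline shortened to 50 ms so the demo finishes quickly.
    fast = asyncio.run(call_with_deadline(fast_upstream, 0.05))
    try:
        asyncio.run(call_with_deadline(slow_upstream, 0.05))
        slow = "completed"
    except UpstreamTimeout:
        slow = "timed out"
    return fast, slow


if __name__ == "__main__":
    print(run_demo())
```

With a deadline like this in place, a stalled upstream call fails fast with an explicit error instead of holding a request open for the duration of the outage. Where the deadline should sit in the stack (load balancer, application server, or upstream client) is a separate design decision that this sketch does not settle.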