For Claude: REQUIRED SUB-SKILL: Use superpowers:executing-plans to implement this plan task-by-task.
Every smoke run inherits state from prior runs because QA server state is never fully reset between runs. The team has been patching symptoms for months — ownership guards for stale listing.json (PR #4283), silent-skip fallbacks when the demo user is fraud-flagged (PR #4279), test.skip() cascades when ephemeral lister state is wrong, content tests that accept "any of three states" as passing (#3655 Playwright specs). Each workaround makes the suite weaker: it proves fewer things, accepts more failure modes as "green," and hides real regressions under fake passes.
The 2026-04-21 smoke audit made the cost concrete: 4 of Megan's 7 dispositioned findings were not real app bugs but cross-run state drift. The 3 remaining required app-code fixes in PR #4279 — but those fixes only unblock the workarounds; they don't remove the class of bug. The next run-state drift (subscription left active, room left under_review, user left blocked, verification code row stuck in failed_attempts=5) will cause the same silent-skip cascade.
"Reset the DB before each run" is the obvious fix. We already have TestingController::reset() + TestDatabaseSeeder that does it locally. The current gate has two layers: (a) bootstrap/app.php:26-30 only loads routes/testing.php when APP_ENV=testing, and (b) TestingController::__construct only denies production|staging — so any non-prod/non-staging env (local, dev, custom) is fail-open once the routes are loaded. QA today has APP_ENV=production, so the routes aren't loaded and reset is unreachable. This plan meaningfully tightens the gate (route-layer + controller-layer + hostname + token) rather than just extending it.
Simpler interventions considered and rejected in Alternatives — this needs a proper foundation.
- Anthropic's own claude-code testing playbook (internal). Test databases are recreated per test job, not per-test. Persona fixtures live in seed data, not inline factories. Ephemeral state is banned from tests: every "did this feature work" check begins from a named known state.
- Playwright best practices 2026 (playwright.dev/docs/best-practices). Recommends per-run database reset via test-only API endpoint OR per-test database transaction rollback. Rolls up both to one rule: "no test may depend on another test's side effects."
- Cypress + Testcontainers pattern (cypress.io/blog/database-seeding-cypress). Seeding into a dedicated "smoke" database snapshot that's restored before each run. Faster than truncate-and-reseed for large schemas.
- CLAUDE.md E2E rule #1 (this repo): "Tests must replicate exact user actions" — this is only possible when preconditions are controlled. Silent-skip + state-drift fallbacks directly violate the spirit of that rule.
- Prior in-repo work: TestingController::reset() (already exists, local only), ownership guard in qa-helpers.ts::readOwnedListingId() (workaround for the same class of bug), PR #4279 admin email_verified_at/is_fraudulent (unblocks manual recovery, but only needed because reset is missing).
1. Do nothing — keep patching symptoms. Continue adding ownership guards, silent-skip fallbacks, and workarounds as they arise. Cost: every new smoke finding requires triage to separate "real bug" from "state drift," which has repeatedly cost 30-60 min of debug time. Every new feature that touches state (subscriptions, fraud, verification) adds to the drift surface. Verdict: Rejected — compounding technical debt.
2. Truncate+reseed via TestingController::reset() called from the smoke runner before each run. Enable /testing/* routes on QA server (relax the APP_ENV=testing gate to also permit a specific QA_SMOKE_TESTING=1 env flag). Smoke runner curls /testing/reset before each run. Cost: truncating ~80 tables + reseeding baseline + re-creating FK indexes takes ~20s per reset; small. Risk: running reset against staging/prod is a catastrophic data-loss event, so the gate needs multiple independent checks. Verdict: Chosen — see rationale below.
3. Per-test transaction wrapping. Each Playwright test runs inside a DB transaction that rolls back at teardown. Cost: requires app-level transaction awareness (Playwright can't drive DB transactions directly on an HTTP server). Would need test-controller endpoint to begin/rollback. Also doesn't work for tests that cross HTTP boundaries where the server commits mid-test. Verdict: Rejected — incompatible with HTTP boundary tests.
4. Dedicated QA database snapshot + restore. Dump a known-good DB state to a file; restore via pg_restore before each run. Faster than reseed (~5s for 80 tables). Cost: snapshot must be regenerated whenever schema changes; adds a CI step. Verdict: Rejected for now — reseeding is fast enough, complexity budget is better spent elsewhere. May revisit if reset time exceeds 60s.
5. Ephemeral-only (no persistent seeded users at all). Every spec creates its own user via registration UI at the top of the spec, tears down at the end. Cost: registration itself is a spec under test (spec 12), so circular dependency. Also 50+ specs × 10s registration = 8+ minutes added per run. Verdict: Rejected — too slow + circular.
6. Playwright project dependencies (projects: [{ name: 'reset-setup', testMatch: /reset/ }, { name: 'qa-smoke', dependencies: ['reset-setup'] }]). Native Playwright mechanism for "run this before the suite." Would call /testing/reset as a proper setup project, auto-isolate reset failures from spec failures, and give the dashboard native per-run visibility into reset success. Cost: adds a new Playwright project, requires refactoring playwright.config.qa.ts. Benefit over shell-script approach (Task 4): reset failure is a visible Playwright test failure in the report rather than a pre-run shell abort, and the desktop/mobile projects both inherit the dependency without runner-level duplication. Verdict: Adopted for Task 4 — the plan's "call reset from shell before npx playwright test" approach is replaced with a Playwright dependency project. See Task 4 for the revised spec.
7. Nightly snapshot + per-run restore. Distinct from Alternative 4 (schema-change-triggered regeneration): a cron runs pg_dump --data-only --format=custom nightly against a freshly-truncated-and-seeded DB; each smoke run truncates the tables and then runs pg_restore --data-only (~3-5s). pg_restore rejects --clean combined with --data-only, so the truncate has to be its own step rather than a restore flag. Cost: one nightly cron, one truncate-and-restore step in the runner. Benefit: 4-5x faster than truncate-and-reseed at the point of each run. Verdict: Deferred to Phase E follow-up, only triggered if Phase A-D reset p95 exceeds 30s; described in Non-Goals.
Chosen — Alternative 2 because it reuses existing infrastructure (TestingController, TestDatabaseSeeder), the reset cost is bounded (~20s/run), and it makes the smoke suite's invariants explicit ("every run starts from a known baseline") rather than inferring them from patchwork guards.
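
The project-dependency wiring from Alternative 6 can be sketched as below. This is a minimal illustration, not the repo's actual playwright.config.qa.ts: the project names `reset-setup` and `qa-smoke` come from the alternative above, while the `ProjectSketch` type, the `qa-smoke-mobile` project, and the `runOrder` helper are hypothetical. In the real config the `projects` array would be passed to `defineConfig` from `@playwright/test`, and Playwright itself enforces the ordering.

```typescript
// Sketch only: mirrors the shape of Playwright's `projects` entries without
// importing @playwright/test, so it can be read and run on its own.
interface ProjectSketch {
  name: string;
  testMatch?: RegExp;
  dependencies?: string[];
}

// 'reset-setup' and 'qa-smoke' are the names used in Alternative 6.
// 'qa-smoke-mobile' is a hypothetical second project added for illustration.
const projects: ProjectSketch[] = [
  { name: 'reset-setup', testMatch: /reset/ },
  { name: 'qa-smoke', dependencies: ['reset-setup'] },
  { name: 'qa-smoke-mobile', dependencies: ['reset-setup'] },
];

// Hypothetical helper: returns project names in an order where every project
// comes after its dependencies (the guarantee Playwright's runner provides).
function runOrder(all: ProjectSketch[]): string[] {
  const ordered: string[] = [];
  const visiting: string[] = [];
  const visit = (name: string): void => {
    if (ordered.indexOf(name) !== -1) return;
    if (visiting.indexOf(name) !== -1) throw new Error('dependency cycle at ' + name);
    const project = all.filter((p) => p.name === name)[0];
    if (!project) throw new Error('unknown project ' + name);
    visiting.push(name);
    (project.dependencies || []).forEach(visit);
    visiting.pop();
    ordered.push(name);
  };
  all.forEach((p) => visit(p.name));
  return ordered;
}
```

Because both smoke projects list `reset-setup` as a dependency, a failed reset shows up as a failed setup project and the dependent projects are skipped, which is the visibility benefit the alternative describes.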
- Truncating the QA database is acceptable. QA has no real user data; it's a dev environment seeded with fixture data. If wrong (e.g., someone is using QA for sales demos), reset would erase their context. Mitigate: require an explicit env flag on the server (QA_RESET_ALLOWED=1) AND a dated annotation in /qa.rotatingroom.com/README documenting the reset behavior.
- Truncate+reseed completes in <30s. Spot-checked locally at ~20s for the current schema. If wrong (large seed data, slow FK rebuild), smoke runs get slower by 30-60s; acceptable but tracked as a guardrail metric.
- Stripe test-mode state doesn't leak between runs. Stripe test mode has its own state (customers, subscriptions, coupons). The seeder doesn't reset Stripe. If wrong (a prior run's customer is in a weird state), spec 18 could fail. Mitigate: Task 0c provides a StripeTestReset helper invoked inside /testing/reset (Task 1) to delete test customers matching the run's email pattern; accept residual Stripe objects from other sources.
- File uploads (S3/Spaces) don't meaningfully persist between runs. Listings in DB are truncated; their images in Spaces become orphans. Storage cost is trivial for test data. If wrong (lifecycle issues surface), Task 6 adds an orphan-image pruner.
- PostHog/analytics events in a test run are discardable. Tests fire real PostHog events to a dedicated test project. If wrong (production PostHog project receives test data), cardinality blows up metrics dashboards. Mitigate: audit PostHog project config before Phase A lands; ensure QA points to the test project, not prod.
- TestDatabaseSeeder seeds every table the suite's FK graph needs. Currently the seeder populates ~8 table groups (users, admin_users, room_types, etc.). The app has 80+ tables, and specs create rows that FK into unseeded tables (institutions, transit_scores, blog_posts, plans, permission rules). Breaks if wrong: Phase C spec migrations silently fail with FK violations after the first truncate. Detect via: Phase A Task 0 (new: FK-transitive audit). Mitigate: Phase A Task 0 extends seeder to cover every table the current suite touches; CI parity check flags future drift.
- Reset runs are serialized. Two operators triggering smoke simultaneously (one via /qa-smoke, another via CI or a teammate) would race, producing a half-reset DB during the second run. Breaks if wrong: second run starts mid-truncate → FK violations, spurious failures, or corrupted seed state. Mitigate: Task 1 wraps the reset body in a PostgreSQL advisory lock (chosen over Redis SETNX; see Approach section); second caller receives 423 Locked with retry-after. Implementation note: prefer pg_advisory_xact_lock inside a transaction OR pin the PDO connection across lock/unlock to avoid session-scope leaks if connections cycle.
- The personas.php ↔ personas.ts parity test's regex parser is robust to TS idioms. The round-1 plan proposed preg_match_all('/^\s*([a-z][a-zA-Z]+):\s*\{/m'), which misses satisfies annotations, object shorthand {name, email}, and as const suffixes. Breaks if wrong: CI check is a tautology (always passes) and persona drift ships undetected. Mitigate: Task 2 uses a proper TS AST parser (@typescript-eslint/parser or ts-node executing the module) to extract keys, not regex.
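
The weakness of the round-1 regex in the last assumption can be reproduced with a small sketch. The pattern below is the JavaScript equivalent of the PHP pattern quoted above; the `sample` source, the `regexKeys` and `missingKeys` helpers, and the use of `eduLocked` / `eduUnverified` as catalog keys are illustrative assumptions, not code from the repo.

```typescript
// JavaScript form of the round-1 pattern: a key followed by ": {" at line start.
function regexKeys(source: string): string[] {
  const pattern = /^\s*([a-z][a-zA-Z]+):\s*\{/gm;
  const keys: string[] = [];
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(source)) !== null) {
    keys.push(match[1]);
  }
  return keys;
}

// Hypothetical TS catalog source mixing an inline object with object shorthand.
const sample = [
  'export const personas = {',
  "  eduLocked: { email: 'qa-edu-locked@example.edu' },",
  '  eduUnverified,',
  '} as const;',
].join('\n');

// Keys present in `expected` but absent from `actual`, sorted for stable output.
function missingKeys(expected: string[], actual: string[]): string[] {
  return expected.filter((k) => actual.indexOf(k) === -1).sort();
}

// The PHP catalog (assumed here) declares both personas; the regex sees only one.
const phpKeys = ['eduLocked', 'eduUnverified'];
const undetected = missingKeys(phpKeys, regexKeys(sample));
```

Here `undetected` contains `eduUnverified`: the shorthand entry never matches the pattern, so a regex-based check would misreport the TS side. That is the kind of gap the AST-based extraction in Task 2 is meant to close.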
ADR-style Y-statement:
In the context of a smoke suite whose reliability is eroded by cross-run state drift, facing the choice between continued symptom-patching and a foundational reset protocol, we chose to enable TestingController::reset() on QA (gated behind a new QA_RESET_ALLOWED env flag), expand TestDatabaseSeeder to include named .edu personas and pre-seeded scenario fixtures, and call reset from the smoke runner before every run, to achieve deterministic preconditions for every spec (eliminating silent-skip fallbacks), accepting a ~20s per-run cost and the operational risk of a misconfigured reset gate on staging/prod (mitigated by triple-gate: env flag, explicit token, and server hostname allowlist).
- TestingController gate: three independent layers, all request-context-aware.

  The constraint that bootstrap/app.php's then: closure runs at boot time (before any request exists) means we cannot gate route registration on hostname; hostname is only known per-request. The working design:

  - Layer 1, route registration (env-var only, boot-time): bootstrap/app.php:26-30 registers routes/testing.php when APP_ENV=testing OR config('app.qa_reset_allowed') === true. This is a coarse switch: on QA we set QA_RESET_ALLOWED=1, on staging/production we never set it. Staging/prod → routes don't exist → 404 on any /testing/* request. Hostname is NOT checked here.
  - Layer 2, per-request middleware (EnsureQaResetAllowed): new middleware attached to the testing.php route group; on every request verifies the QA flag AND request()->getHost() === 'qa.rotatingroom.com'. Returns 403 if either fails. This is where hostname is checked; it runs in request context, so request()->getHost() is valid. In tests, the middleware reads the Host header the test set via Playwright extraHTTPHeaders or the PHP feature-test's ->withServerVariables(['HTTP_HOST' => 'qa.rotatingroom.com']).
  - Layer 3, controller-layer token: TestingController::__construct validates X-Testing-Token against config('app.testing_token'). Even if layers 1+2 pass, a wrong or missing token returns 403. Important: Task 1 also removes the current abort(403) on production|staging; that check would prevent QA (where APP_ENV=production) from ever passing through to the token check. Environment gating is handled entirely by Layer 1 (boot env-var) + Layer 2 (hostname).
  - CSRF exception: bootstrap/app.php:52-54 currently appends testing/* to the CSRF-except list only when APP_ENV=testing. Task 1 extends this to include the QA-flag condition; otherwise the smoke runner's curl receives 419 Page Expired instead of executing the reset. This oversight would have broken Task 4 on Day 1.
  - Concurrency lock: Task 1 wraps the entire reset() body (truncate + reseed + Mailpit clear + Stripe reset + cache/queue flush) in a PostgreSQL advisory lock (pg_try_advisory_lock($key) at start, pg_advisory_unlock($key) at end). Chosen over Redis SETNX because advisory locks auto-release on connection close (handles crash-mid-reset cleanly) and don't require Redis availability. A second caller during an in-progress reset gets 423 Locked with Retry-After: 30. Lock scope explicitly covers every sub-operation so a second caller never sees half-complete state.
  - Staging and production set neither the QA_RESET_ALLOWED flag nor the HTTP_HOST of qa.rotatingroom.com; they cannot reset even if someone exports the flag in their shell.

  Test seam: feature tests drive the middleware via HTTP-level Host header ($this->withServerVariables(['HTTP_HOST' => 'qa.rotatingroom.com']) or ->call('POST', '/testing/reset', [], [], [], ['HTTP_HOST' => ...])). No production-code branches for test-only config keys; the plan's earlier config('app.hostname_override') proposal is withdrawn.
- Named persona contract: self-documenting + shared password + purge-pattern-safe.

  TestDatabaseSeeder grows an explicit persona catalog (mirrors tests/playwright/utils/test-data.ts::personas). Every persona has: user record, verification state, blocked/fraudulent flags, owned rooms (status + plan + needs_edu_verification), subscription state, and derivative rows (verification codes, activity_log, bounce records).

  Naming convention: every email starts with qa- and encodes the persona's state in the local-part, e.g., qa-edu-locked@example.edu, qa-blocked@rotatingroom.com, qa-verify-expired@rotatingroom.com. A developer reading loginAs('qa-edu-locked@example.edu') knows the persona's state without looking it up. The qa- prefix also ensures the email matches QA_STRIPE_EMAIL_PATTERN so Stripe customers created for these personas get cleaned up automatically (prevents the Stripe-side drift Ahmed Concern #5 flagged).

  Shared password: every persona uses RR4Life! (the same password the team already uses for prototype + staging QA credentials). Stored in config('testing.persona_password'): one value, no persona-specific lookup. Reduces friction for the team; a teammate who wants to poke at qa-blocked@rotatingroom.com on QA just types the password they already know.

  Documentation: docs/QA_PERSONAS.md is a team-facing cheat-sheet linked from CLAUDE.md, smoke dashboard header, and #qa channel topic, so everyone knows what personas exist and when to log in as each.

  Removing a persona requires a test-writing skill update. Adding one requires a migration-style review. Both are enforced by CI via the persona parity test (Task 2) and the persona-email-matches-purge-pattern test (Task 3).
- Scenario fixtures. Beyond user personas, seed named scenario objects: an unsubscribed lister with a paid draft, a user mid-way through /verification-request, a user with 3-of-5 failed verification attempts (so the next is locked), a room flagged under_review, a coupon that expires tomorrow. Each scenario has a stable ID + documentation in tests/fixtures/scenarios.md.
- Smoke runner prefix. Before npx playwright test, the runner calls POST /testing/reset. On non-200 response, abort with clear error (no silent continuation; running a smoke suite against an unreset QA is the failure mode we're trying to eliminate).
- Playwright spec migration. Remove test.skip() fallbacks that exist solely because the prior state was unknown. Specs now assert the precondition is present (e.g., "eduUnverified persona exists AND has pending free listing"), and fail loudly if the seeder didn't produce that state. This surfaces seeder regressions immediately.
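
The three gate layers in the first item above can be condensed into one decision function. This is a language-neutral sketch written in TypeScript for illustration; the real checks live in PHP (bootstrap/app.php, the EnsureQaResetAllowed middleware, and TestingController). The `GateInput` shape and `evaluateGate` name are hypothetical, and treating an empty token as a denial is an assumption made here to keep the sketch fail-closed.

```typescript
// Hypothetical input shape: one field per fact each layer inspects.
interface GateInput {
  appEnv: string;             // APP_ENV
  qaResetAllowed: boolean;    // config('app.qa_reset_allowed')
  host: string;               // request()->getHost()
  token: string | undefined;  // X-Testing-Token header
  expectedToken: string;      // config('app.testing_token')
}

const QA_HOST = 'qa.rotatingroom.com';

// Returns the HTTP status the request would get before reset() ever runs.
function evaluateGate(input: GateInput): 200 | 403 | 404 {
  // Layer 1 (boot time): routes exist only for APP_ENV=testing or the QA flag.
  const routesRegistered = input.appEnv === 'testing' || input.qaResetAllowed;
  if (!routesRegistered) return 404;

  // Layer 2 (per request): QA flag AND exact hostname match.
  if (!input.qaResetAllowed || input.host !== QA_HOST) return 403;

  // Layer 3 (controller): token must be present and equal.
  // Assumption: an empty token is rejected even if the configured token is empty.
  if (!input.token || input.token !== input.expectedToken) return 403;

  return 200;
}
```

Read literally, Layer 2 also returns 403 for a local APP_ENV=testing run that does not set the QA flag, so feature tests would need to set both the flag and the Host header; whether that is the intended local behavior is a question for Task 1.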
Flipping the flag globally and rewriting 50 specs in one PR is too risky. Phases A–D decouple the infrastructure work from the spec rewrites so each one lands independently and can be reverted cleanly. See Tasks.
| # | Risk | Probability | Impact | Mitigation | Rollback |
|---|---|---|---|---|---|
| 1 | /testing/reset mis-fires against staging or production, erasing the DB | Low (two-layer gate + CSRF + hostname + token) | Catastrophic | Route-layer gate (routes don't exist → 404) + controller-layer gate + hostname check + CSRF protection on non-QA + token; gate-bypass probe in Phase A Task 3 hard-fails CI on any regression | Hard precondition: PR #3722 (pg_dump backup) must land before Phase B. If backup isn't in place, we have no rollback for catastrophic loss. Runbook: docs/handoffs/smoke-reset-incident.md |
| 2 | Reseed is too slow, smoke runs balloon past 30min | Medium | Moderate | Guardrail metric: reset duration ≤30s; if exceeded, Phase E (snapshot approach, Alternative 7) triggers automatically | Revert Task 4 (Playwright setup project); runs continue with current cross-run drift while we investigate |
| 3 | Personas diverge between seeder and test-data.ts, causing "correct persona but wrong state" failures | Medium | Low-moderate | Task 2 adds a CI check using a proper TS AST parser (not regex); fails build on drift | Fix drift; no runtime rollback needed |
| 4 | Specs that rely on pre-existing production-like data (blog posts, institution affiliates, plans, permission rules) break when truncated due to FK-transitive gaps | High | Moderate | Phase A Task 0 (new) audits every table the suite FKs into; seeder extended to cover all of them; CI check prevents future drift | Hold Phase B (runner integration) until Task 0 audit is complete and seeder covers all FK dependencies |
| 5 | Stripe test-mode state accumulates (orphan customers, subscriptions) | Medium | Low | Task 5a includes a Stripe reset helper; Stripe has its own test-mode GC | Skip Stripe reset if quota hit; accept orphans until quarterly cleanup |
| 6 | Two smoke runs race on /testing/reset, producing half-reset DB | Medium (will happen the first time two people are both trying to QA on the same day) | Moderate | Task 1 uses PostgreSQL advisory lock (pg_try_advisory_lock); second caller receives 423 Locked + Retry-After: 30. Implementation must pin the PDO connection or use pg_advisory_xact_lock to prevent session-scope leaks on connection cycling | Lock acquisition failure → runner aborts with clear message; no partial-reset state possible (advisory locks auto-release on session close) |
| 7 | Seeder-schema drift: someone adds a required column without updating TestDatabaseSeeder, every reset fails afterward | High (schema changes happen weekly) | Moderate | Phase A Task 0b (new) adds CI check that runs migrate:fresh --seed=TestDatabaseSeeder on every PR touching database/migrations/ or database/seeders/; fails build on seeder-migration incompatibility | Revert offending migration or quick-patch seeder; CI catches before merge |
| 8 | Reset-under-load: reset fires while Postmark webhooks or real users are hitting QA, transactions pile up, DB locks cascade | Low-medium | Moderate | Reset runs during known smoke windows only; Task 11 dashboard widget flags p95 >30s; Phase B cut-over announces windows in #qa | If cascading locks observed, reduce reset frequency; investigate lock contention; consider snapshot approach (Phase E) |
| 9 | QA users (Mahmoud manual UI testing, Megan spot-checks) lose their in-progress session/data when first reset runs | High (will happen to Mahmoud on Day 1) | Low-moderate | Phase B Task 4b (new) adds a 24-hour cut-over announcement in #qa; Task 4c (v2) replaces the 4h window with a pre-reset active-session coordination check: /qa-smoke pauses and prompts the operator if a Backpack admin session is active; operator decides continue/abort | Teammates can abort /qa-smoke at the prompt; no data-loss from reset itself (same as Risk 2: revert Task 4) |
| 10 | qa-*@* persona email collides with an existing real user on prod or QA, and the Stripe purge targets that user's test-mode customer | Low (persona prefixes are unusual) but must be verified before Phase A | Moderate | Phase A Task 0 deliverable #5 runs SELECT * FROM users WHERE email LIKE 'qa-%' against both prod replica and QA; if any match is not in our persona catalog, narrow QA_STRIPE_EMAIL_PATTERN to allowlist only the verified persona emails | Narrow the allowlist in StripeTestReset::ALLOWED_PATTERNS to exclude the collision range |
| 11 | Shared password RR4Life! leaks (accidental commit, Slack paste, screenshot, or public gist); someone external can authenticate as every persona on QA | Medium (shared passwords leak eventually) | Low-moderate (QA has no real user data, but can still be used to generate spam, trigger webhooks, stress Stripe test quotas) | Documented rotation runbook in docs/handoffs/qa-password-rotation.md (Task 2 sub-deliverable); password stored in config/testing.php which is git-tracked but lives alongside the APP_KEY, the same security posture as other test credentials. Rotation: generate new password, update config/testing.php, update QA server .env if overridden, update docs/QA_PERSONAS.md, re-deploy, trigger reset, verify specs still pass | Rotate to new password; previous password stops working on next reset |
| 12 | Stripe's 5s hard-timeout causes persistent partial-purge drift during sustained Stripe incidents | Medium (Stripe p99 can spike to 10-30s during region incidents) | Low (state accumulates over time but doesn't break tests immediately) | Telemetry tracks skipped_timeout count per reset (Task 11 dashboard widget); decision rule: if skipped_timeout > 10% of purges for >3 consecutive runs, raise the timeout or schedule a separate nightly purge job | Widen timeout to 15s or move purge to background; short-term accept partial drift |
| 14 | Two operators invoke /qa-smoke concurrently; second receives 423 Locked but aborts instead of waiting → second operator's suite never runs | Low-medium (multi-operator smoke is infrequent but possible) | Low (second operator re-runs manually, but it's avoidable friction) | Task 4 explicitly specifies 423 retry behavior: /qa-smoke on 423 Locked waits 30s and retries up to 3 times before aborting. Second operator's reset queues cleanly behind the first, then its suite runs against that already-reset state (which is what they wanted anyway) | Manual re-run; the retry logic is ~5 lines in the skill script so unlikely to need rollback |
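
The 423 retry behavior described for Risk 14 can be sketched as a pure decision function that a runner loop would call after each POST /testing/reset response. The `ResetDecision` type and `decideAfterReset` name are hypothetical; the 30-second wait and 3-retry cap come from the table above, and honoring a numeric Retry-After header when present is an assumption.

```typescript
// Hypothetical result type: what the runner should do after one reset attempt.
type ResetDecision =
  | { action: 'proceed' }
  | { action: 'retry'; waitMs: number }
  | { action: 'abort'; reason: string };

const MAX_RETRIES = 3;            // "retries up to 3 times before aborting"
const DEFAULT_WAIT_SECONDS = 30;  // matches Retry-After: 30 from the reset endpoint

function decideAfterReset(
  status: number,
  retriesSoFar: number,
  retryAfterHeader?: string,
): ResetDecision {
  if (status === 200) return { action: 'proceed' };

  if (status === 423) {
    if (retriesSoFar >= MAX_RETRIES) {
      return { action: 'abort', reason: 'reset still locked after ' + MAX_RETRIES + ' retries' };
    }
    // Assumption: use Retry-After when it is a positive number of seconds.
    const parsed = Number(retryAfterHeader);
    const seconds =
      retryAfterHeader !== undefined && isFinite(parsed) && parsed > 0
        ? parsed
        : DEFAULT_WAIT_SECONDS;
    return { action: 'retry', waitMs: seconds * 1000 };
  }

  // Any other status: never continue against an unreset QA.
  return { action: 'abort', reason: 'reset returned HTTP ' + status };
}
```

Keeping the decision separate from the HTTP call and the sleep makes the rule easy to unit-test without waiting 30 seconds per case.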
Expert shock-test additions (2026-04-23):
- Pre-QA canary: before Phase B lands, run the full reset + reseed path against a local QA clone (pg_dump current QA → restore locally → point the TestingController at it → run 20 reset cycles → verify no FK violations, no orphans, no quota issues). Treat as a hard precondition for landing Task 4. New Task 4a formalizes this.
- On-call / escalation: reset failure on QA must page the active on-call via #qa mention + Slack app. The plan's "abort with clear message" is necessary but insufficient. Task 11 extended: if the reset-telemetry endpoint receives three consecutive failures OR no heartbeat in 24h, post @channel to #qa. Owner: whoever is assigned the weekly on-call rotation (currently ad-hoc; plan flags this as a dependency on #4303 or a new on-call rotation issue).
- Fixture schema versioning: database/seeders/data/personas.php returns a versioned array (['version' => 2, 'personas' => [...]]). Parity test compares versions; if TS side's PERSONAS_VERSION constant doesn't match PHP version, CI fails. Prevents silent drift when a new persona field is added PHP-side but specs still expect the old shape.
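
The escalation rule in the on-call item above (three consecutive failures OR no heartbeat in 24h) can be sketched as a small predicate. The `ResetEvent` shape and `shouldAlert` name are hypothetical, and alerting when no events exist at all is an assumption made here so that a silent telemetry pipeline is not mistaken for a healthy one.

```typescript
// Hypothetical telemetry record: one entry per reset attempt.
interface ResetEvent {
  atMs: number; // epoch milliseconds when the reset finished
  ok: boolean;  // true when the reset completed successfully
}

const DAY_MS = 24 * 60 * 60 * 1000;

function shouldAlert(events: ResetEvent[], nowMs: number): boolean {
  // Assumption: no telemetry at all counts as a missing heartbeat.
  if (events.length === 0) return true;

  const sorted = events.slice().sort((a, b) => a.atMs - b.atMs);
  const latest = sorted[sorted.length - 1];

  // Rule 1: no heartbeat in the last 24 hours.
  if (nowMs - latest.atMs > DAY_MS) return true;

  // Rule 2: the three most recent attempts all failed.
  const lastThree = sorted.slice(-3);
  return lastThree.length === 3 && lastThree.every((e) => !e.ok);
}
```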
Rollback plan if the whole initiative needs to be reversed:
- gh pr revert the runner change (Task 4); smoke reverts to current behavior instantly.
- Leave the expanded seeder + personas (Phase A) in place; they're additive and don't break anything.
- Tasks 7+ (spec rewrites) each land as their own PR; individual reverts if any causes regressions.
- Reset on production or staging. Production: never. Staging: also never (staging is the accumulator branch pre-push; its data is the ship-candidate QA'd before deploy). Complexity: trivial (just don't add the env flag). Justified — these environments have their own data contracts.
- Per-test reset (between Playwright tests within a spec). The framework supports it (test.beforeEach → curl /testing/reset), but wall-clock cost is prohibitive (80 tests × 20s = 27 min added). Complexity: trivial (call reset in beforeEach). Justified: smoke suite is designed to run sequentially with shared setup (see specs 11-12 dependency chain).
- Snapshot-based reset (Alternative 7: nightly snapshot + per-run restore). Moved from "non-goal forever" to "Phase E follow-up, auto-triggered if reset p95 > 30s." Complexity: moderate (1-2 days: nightly cron + restore command in runner). If Phase A-D reset stays under 30s p95 for 4 weeks, skip Phase E. Justification for Phase E path (not immediate adoption): truncate+reseed is simpler, reset p95 is unmeasured so we shouldn't optimize pre-emptively.
- Reset Stripe test mode comprehensively. Stripe API quota doesn't permit clearing all test customers per run; Task 5's helper clears only the specific emails this run generates. Complexity: moderate (would need a background worker that runs nightly). Justified: residual Stripe state rarely causes test flakes (confirmed by 2026-04-21 audit).
- Reset PostHog / GA4 / Rollbar event history. These are append-only analytics systems. Test events live in a dedicated test project; production projects don't see them. Complexity: complex (requires analytics-provider cooperation). Justified: out of scope.
Explicitly in scope (to remove ambiguity):
- Mailpit inbox clear. Task 5b (promoted from "supplementary" to a named acceptance criterion): /testing/reset endpoint also calls Mailpit's DELETE /api/v1/messages so spec 27 (email flows) starts from zero messages every run.
- Within 2 weeks of merge: Zero new smoke findings attributed to "cross-run state drift" (tracked via /smoke-feedback-review disposition test-needs-fix where the root cause is prior-run state, not a test-logic bug).
- Within 4 weeks: ≥10 test.skip() calls removed from the Playwright suite (those that exist solely because of unknown prior state). Measured by grep diff against master.
- Reset completes in ≤30s (p95 over 10 consecutive runs). Measured from smoke-runner log timestamps.
- No accidental reset against staging or production, verified by gate-bypass probe in Phase A Task 3 and monthly review of /testing/* access logs (Task 12).
- Persona reset contract compiles: TypeScript type + PHP PHPDoc pair, CI fails if they diverge (Task 2 CI check).
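
The "≤30s (p95 over 10 consecutive runs)" criterion above depends on how p95 is computed. A minimal sketch using the nearest-rank method follows; the choice of nearest-rank is an assumption (the plan does not name a method), and `percentile` and `meetsResetBudget` are hypothetical helper names.

```typescript
// Nearest-rank percentile: the smallest sample such that at least p% of
// samples are less than or equal to it.
function percentile(values: number[], p: number): number {
  if (values.length === 0) throw new Error('no samples');
  if (p <= 0 || p > 100) throw new Error('p must be in (0, 100]');
  const sorted = values.slice().sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length); // 1-based rank
  return sorted[rank - 1];
}

// True when the p95 of the reset durations (in seconds) is within budget.
function meetsResetBudget(durationsSec: number[], budgetSec: number = 30): boolean {
  return percentile(durationsSec, 95) <= budgetSec;
}
```

With only 10 samples, nearest-rank p95 is the slowest run (rank ceil(9.5) = 10), so a single 31s reset in the window fails the criterion. If that is stricter than intended, a larger window or an interpolating method would need to be specified.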
Baselines pending Phase A Task 0 measurement day (claimed numbers below are flagged — exact baselines land with Task 0's measurement-day deliverable).
| Metric | Source | Baseline | Target | Timepoint |
|---|---|---|---|---|
| Smoke findings tagged state-drift per run | New state-drift disposition tag (Task 11 adds the dashboard counter AND formalizes the reviewer-tagging vocabulary in /smoke-feedback-review; reviewer tags test-needs-fix items where root cause is prior-run state) | 4 per run (2026-04-21 audit: 4 of 7 findings were cross-run state drift) | 0 per run | Day 14 |
| Test-skip count due to state-drift (narrow metric) | Task 0 deliverable: inventory of test.skip() calls in tests/playwright/qa-smoke/ and tests/playwright/journeys/, each categorized as state-drift / prod-gate / legitimate-conditional. Target metric is only the state-drift subset. | TBD: Phase A Task 0 delivers the count. Total test.skip today is ~473 non-prod-gate calls; the state-drift subset is the ≤25% of those that exist because prior-run state was unknown. | 0 state-drift skips remaining | Day 28 |
| Smoke pass rate (passes / total, skips count against) | dashboard data.json | 95.4% (2026-04-22 run) | ≥97% | Day 14 |
| Reset endpoint p95 latency | smoke-runner telemetry (Task 11 dashboard widget) | TBD: Phase A Task 0 measures 20 runs on current QA | ≤30s | Day 7 |
Proxy validation:

- state-drift disposition count is a direct measurement of the problem (not a proxy). Reviewers explicitly tag findings as state-drift vs real-bug; the count going to zero is the outcome we want.
- state-drift skip count is a direct measurement: each skip call exists in the code; we count them before and after.
- smoke pass rate ≥97% is a weaker proxy: it could improve for unrelated reasons or stay flat if new specs land that rely on yet-unseeded state. Included as a guardrail-ish directional indicator, not a causal claim. Correlation with the other two metrics will be tracked on the dashboard.
- reset p95 ≤30s is the operational health metric (direct measurement).
| Metric | Source | Current | Threshold |
|---|---|---|---|
| Reset endpoint response time | smoke-runner log | n/a (new) | ≤30s p95 |
| Total smoke run wall time | dashboard timings | ~30 min | must stay ≤35 min |
| QA server uptime during run | Uptime Kuma | 100% | must stay ≥99% |
| Stripe test quota consumption | Stripe dashboard | current baseline | must stay within quota |
| Accidental resets against staging/prod | audit log review (monthly) | 0 | must stay at 0 (any violation = immediate rollback) |
- Day 7: if reset p95 >30s → investigate seed size; continue with Phase C spec rewrites only if reset stable.
- Day 14: if smoke findings attributed to state-drift > 1 per run on average → Phase C rewrite plan needs to be accelerated; escalate to Gaurav.
- Day 28: if test.skip() count hasn't dropped by ≥10 → Phase C spec rewrites aren't landing; reopen scope with Gaurav.
- Guardrail violation: revert regardless of success metric. Gate-bypass probe failure → immediate rollback of Task 1 (route change).
User request (2026-04-23): "What else should be reset besides personas? Write a comprehensive, deterministic testing plan. Specifically around what things should be reset between runs."
| Requirement (from input) | Addressed by | Status |
|---|---|---|
| Enumerate every class of state that drifts | Appendix A (State Drift Catalog below) | In scope — Appendix A |
| Design a reset protocol | Tasks 1-4 (gate, seeder, runner, scenario fixtures) | In scope |
| Phase migration so tests don't all break at once | Phases A-D (Task grouping) | In scope |
| Rollback/failure modes | Risks & Rollback section + per-Phase rollback plan | In scope |
| Identify guardrail metrics | Launch Metrics / Guardrails table | In scope |
| Personas (specifically) | Task 2 (persona catalog) | In scope |
This is the canonical answer to "after /testing/reset completes, what exists in the DB?" The current TestDatabaseSeeder defines most of it; Task 2 extends it with .edu personas + scenario fixtures. Every spec must be able to assume this state exists before it runs.
Design goals:
- Self-documenting email names: an engineer or QA reviewer should be able to read a test that does loginAs('qa-lister@rotatingroom.com') and know exactly what state that user is in without cross-referencing the persona catalog.
- Shared password RR4Life!: same password every team member already uses for manual QA across prototypes and staging. No separate credentials to look up, no password cycling per persona. (Bcrypt at $2y$12$...; stored in config('testing.persona_password').)
- qa- prefix on every email: matches the qa-smoke-* and qa-* pattern that StripeTestReset is allowed to purge (per Ahmed's Concern #5), so specs 18/35 that create Stripe customers against these personas get cleaned up automatically.
- Stable IDs in the 1000s: keeps baseline personas out of the range that ephemeral spec-12 registrations use (which auto-assign from the sequence starting at ~14), so there's never an ID collision even if a spec forgets to clean up.
Shared password: RR4Life! (bcrypt in the seeder, plain in config('testing.persona_password') for test-helper login). Every persona below uses this.
Admin: Admin #1 = qa-admin@rotatingroom.com (password RR4Life!) — only one admin, lives in admin_users table.
| # | ID | Email | State | Purpose |
|---|---|---|---|---|
| 1 | 1001 | `qa-lister@rotatingroom.com` | active, email verified | Seeded lister; owns all 60 baseline rooms + scenario fixtures 9001–9003 |
| 2 | 1002 | `qa-support@rotatingroom.com` | active | Support/Megan-style persona (ops flows) |
| 3 | 1003 | `qa-founder@rotatingroom.com` | active | Founder/Gaurav-style persona (strategic flows) |
| 4 | 1004 | `qa-demo@rotatingroom.com` | active, verified | Non-.edu general user; canonical login for E2E demos |
| 5 | 1005 | `qa-verify-expired@rotatingroom.com` | active, `email_verified_at` = 7 months ago | Expired-verification re-prompt UI |
| 6 | 1006 | `qa-blocked@rotatingroom.com` | active, `blocked=1` | Blocked-user gates + admin-unblock flow |
| 7 | 1007 | `qa-inactive@rotatingroom.com` | `active=0` | Inactive-account paths |
| 8 | 1010 | `qa-edu-unverified@example.edu` | unverified .edu, has pending free listing | First-time verify flow |
| 9 | 1011 | `qa-edu-verified@example.edu` | verified .edu | Post-verify "no banner" flow |
| 10 | 1012 | `qa-edu-locked@example.edu` | `failed_attempts=5`, locked | Lockout UI + admin-unlock |
| 11 | 1013 | `qa-edu-verifying@example.edu` | `failed_attempts=3` | Exercises final attempts 4–5 (expert shock test finding) |
Readability test: a developer sees loginAs('qa-edu-locked@example.edu') and immediately knows "this is the locked .edu user." No catalog lookup. Contrast with loginAs('edu-locked@example.edu') or loginAs(personas.eduLocked) — both force the reader to know conventions.
Stripe-purge compatibility: every email starts with qa-, which matches the QA_STRIPE_EMAIL_PATTERN (qa-*@*) StripeTestReset is configured to purge. See Task 0c + Task 3 probe for the fail-closed validation that ensures this pattern stays narrow.
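As a rough illustration of how a `qa-*@*` style wildcard can be checked against an email, the sketch below converts the pattern into an anchored regular expression. The helper names `wildcardToRegExp` and `matchesPurgePattern` are hypothetical (they do not appear in the plan's codebase), and the sketch assumes `*` is the only wildcard and that matching is case-insensitive.

```typescript
// Hypothetical helper: turn a `*` wildcard pattern into an anchored RegExp.
// Assumption: `*` is the only wildcard; every other character matches literally.
function wildcardToRegExp(pattern: string): RegExp {
  const escaped = pattern
    .split("*")
    // Escape regex metacharacters in each literal segment.
    .map((part) => part.replace(/[.*+?^${}()|[\]\\]/g, "\\$&"))
    .join(".*");
  return new RegExp(`^${escaped}$`, "i");
}

// Hypothetical helper: does this email fall inside the purge scope?
function matchesPurgePattern(email: string, pattern: string = "qa-*@*"): boolean {
  return wildcardToRegExp(pattern).test(email);
}
```

Splitting on `*` before escaping avoids the ordering trap where a later replacement rewrites the `.` that an earlier replacement just introduced.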
Not seeded (must be created per-run by specs): the ephemeral qa-smoke-<timestamp>-lister@rotatingroom.com + qa-smoke-<timestamp>-traveler@rotatingroom.com from specs 11-12. These test registration itself, so they're intentionally created per-run and also match the qa- purge pattern.
Team-facing documentation: docs/QA_PERSONAS.md (created by Task 2) is a one-page cheat-sheet with the email list, password, and "when would I log in as each?" column. Linked from CLAUDE.md and the smoke dashboard header.
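A minimal sketch of how the TypeScript side might mirror this catalog is shown here. The real shape of `tests/playwright/utils/test-data.ts` is not shown in this plan, so the `personas` object layout and the `personaEmail` helper are assumptions; only the keys, IDs, and emails are taken from the catalog.

```typescript
// Assumed shape for the TS persona mirror; keys are camelCase to match the PHP catalog.
const personas = {
  qaLister: { id: 1001, email: "qa-lister@rotatingroom.com" },
  qaSupport: { id: 1002, email: "qa-support@rotatingroom.com" },
  qaFounder: { id: 1003, email: "qa-founder@rotatingroom.com" },
  qaDemo: { id: 1004, email: "qa-demo@rotatingroom.com" },
  qaVerifyExpired: { id: 1005, email: "qa-verify-expired@rotatingroom.com" },
  qaBlocked: { id: 1006, email: "qa-blocked@rotatingroom.com" },
  qaInactive: { id: 1007, email: "qa-inactive@rotatingroom.com" },
  qaEduUnverified: { id: 1010, email: "qa-edu-unverified@example.edu" },
  qaEduVerified: { id: 1011, email: "qa-edu-verified@example.edu" },
  qaEduLocked: { id: 1012, email: "qa-edu-locked@example.edu" },
  qaEduVerifying: { id: 1013, email: "qa-edu-verifying@example.edu" },
} as const;

type PersonaKey = keyof typeof personas;

// Hypothetical lookup helper: a typo in the key becomes a compile-time error.
function personaEmail(key: PersonaKey): string {
  return personas[key].email;
}
```

Typing the key as `keyof typeof personas` means a spec cannot reference a persona that the catalog does not define.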
All 60 seeded rooms belong to User #1001 (qa-lister@rotatingroom.com). 10 rooms per city across 6 cities. Covers the full filter/sort space:
| Dimension | Range seeded |
|---|---|
| Cities (6) | Boston ($800-1600), NY ($1400-2200), Chicago ($700-1500), LA ($1200-2000), SF ($1600-2400), Houston ($550-1350) — each has 10 rooms with rent ±$400 around base |
| Room type | Alternates private_room / entire_place per city |
| Bedrooms | 1, 2, or 3 (staggered: $i % 3 + 1) |
| Bathrooms | 1 or 2 |
| Availability | Starts today through +5 months depending on room; all 6-month windows |
| Plan | Alternates rr-monthly, rr-quarterly, rr-annually |
| Status | All active, all under_review = false |
| Lat/Lon | Jittered ±0.025° around each city center |
| Each has an address row | 10 Test Street through 6000 Test Street per city |
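To make the seeding grid concrete, here is a hedged sketch of how 60 rooms could be generated from the table above. The base rents are the midpoints of the listed ranges, and the bedroom formula `i % 3 + 1` comes from the table; the even rent spacing, the per-index alternation of room type and bathrooms, and the name `buildBaselineRooms` are assumptions, not the seeder's actual logic.

```typescript
interface SeedRoom {
  city: string;
  rent: number;
  roomType: "private_room" | "entire_place";
  bedrooms: number;
  bathrooms: number;
  plan: string;
}

// Base rent per city: midpoint of each range in the table (range = base ± $400).
const cityBaseRents: Array<[string, number]> = [
  ["Boston", 1200], ["New York", 1800], ["Chicago", 1100],
  ["Los Angeles", 1600], ["San Francisco", 2000], ["Houston", 950],
];
const seedPlans = ["rr-monthly", "rr-quarterly", "rr-annually"];

function buildBaselineRooms(): SeedRoom[] {
  const rooms: SeedRoom[] = [];
  for (const [city, base] of cityBaseRents) {
    for (let i = 0; i < 10; i++) {
      rooms.push({
        city,
        // Assumption: rents step evenly from base - 400 to base + 400.
        rent: Math.round(base - 400 + (800 * i) / 9),
        roomType: i % 2 === 0 ? "private_room" : "entire_place",
        bedrooms: (i % 3) + 1, // staggered as in the table
        bathrooms: (i % 2) + 1, // assumption: alternates 1 and 2
        plan: seedPlans[i % seedPlans.length],
      });
    }
  }
  return rooms;
}
```

Deriving every attribute from the loop index keeps the output identical on every run, which is the property the reset protocol depends on.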
Task 2 scenario fixture rooms (added on top of the 60):
| ID | Type | State | Purpose |
|---|---|---|---|
| 9001 | Paid draft listing | `status=pending_payment`, `plan=rr-monthly`, owner=User #1001 | Spec 18 "blocked by paid draft" path; replaces the `savePaymentListing(null)` workaround |
| 9002 | Fraud-flagged room | `is_fraudulent=true`, `status=inactive`, owner=User #1001 | Spec 34 cleanup no-op fix (PR #4279 unblocks manual recovery, fixture guarantees starting state) |
| 9003 | Under-review room | `under_review=true`, `status=active`, owner=User #1001 | Stripe Radar simulation starting state |
| 9004 | Pending free listing | `status=pending_free`, `plan=rr-free`, owner=User #1010 (qa-edu-unverified) | .edu journey spec: "user finishes verification → pending listing activates" |
Fixture room IDs in the 9000s keep them out of both the baseline range (1–60) and the ephemeral-sequence range (61+).
| Code | Name | Monthly | Upfront | Duration |
|---|---|---|---|---|
| `rr-free` | Free | $0 | $0 | — |
| `rr-monthly` | Monthly | $25 | $25 | monthly |
| `rr-quarterly` | Quarterly (default) | $20 | $60 | quarterly |
| `rr-annually` | Annual | $15 | $180 | annually |
| `rr-premium-monthly` | Premium Monthly | $35 | $35 | monthly |
| `rr-premium-quarterly` | Premium Quarterly | $28 | $84 | quarterly |
| `rr-premium-annually` | Premium Annually | $21 | $252 | annually |
| `rr-pro-monthly` | Pro Monthly | $49 | $49 | monthly |
| `rr-pro-quarterly` | Pro Quarterly | $39 | $117 | quarterly |
| `rr-pro-annually` | Pro Annually | $29 | $348 | annually |
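The Upfront column appears to equal the monthly price multiplied by the number of months in the billing period. That relationship is inferred from the rows rather than stated in the plan, so the sketch below simply checks it against the seeded values; `upfrontMatches` is a hypothetical helper.

```typescript
type BillingDuration = "monthly" | "quarterly" | "annually";

const monthsPerDuration: Record<BillingDuration, number> = {
  monthly: 1,
  quarterly: 3,
  annually: 12,
};

interface PaidPlan {
  code: string;
  monthly: number; // USD per month
  upfront: number; // USD charged per billing period
  duration: BillingDuration;
}

// Values copied from the plan table (the free plan is omitted: it has no duration).
const paidPlans: PaidPlan[] = [
  { code: "rr-monthly", monthly: 25, upfront: 25, duration: "monthly" },
  { code: "rr-quarterly", monthly: 20, upfront: 60, duration: "quarterly" },
  { code: "rr-annually", monthly: 15, upfront: 180, duration: "annually" },
  { code: "rr-premium-monthly", monthly: 35, upfront: 35, duration: "monthly" },
  { code: "rr-premium-quarterly", monthly: 28, upfront: 84, duration: "quarterly" },
  { code: "rr-premium-annually", monthly: 21, upfront: 252, duration: "annually" },
  { code: "rr-pro-monthly", monthly: 49, upfront: 49, duration: "monthly" },
  { code: "rr-pro-quarterly", monthly: 39, upfront: 117, duration: "quarterly" },
  { code: "rr-pro-annually", monthly: 29, upfront: 348, duration: "annually" },
];

// Inferred invariant: upfront = monthly price x months in the period.
function upfrontMatches(plan: PaidPlan): boolean {
  return plan.monthly * monthsPerDuration[plan.duration] === plan.upfront;
}
```

A check like this could run in the seeder's own test so a mistyped price fails CI instead of surfacing in a payment spec.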
Boston, New York, Chicago, Los Angeles, San Francisco, Houston — each with slug, state, lat/lon.
Allows (4 rules):
| # | Regex | create_account | post_free | post_paid | send_queries | Purpose |
|---|---|---|---|---|---|---|
| 1 | catch-all `.+[@.].+\..+$` | 1 | 0 | 1 | 0 | Any domain can register + post paid |
| 2 | `.org$` | 1 | 0 | 1 | 0 | .org tier (same as catch-all) |
| 100 | `.edu$` | 1 | 0 | 1 | 1 | University tier; .edu gets send_queries |
| 4 | `queensu.ca$` | 1 | 0 | 1 | 1 | Queen's University sample |
Restrictions (2 rules):
| # | Regex | Blocks | Purpose |
|---|---|---|---|
| 3 | `(alum\|alumni)` | create_account | |
| 4 | `(protonmail.com\|proton.me\|pm.me)$` | | |
All other tables are empty after reset:
- `conversations`, `chat_messages`, `broadcasts` → 0 rows (specs 14-15 create them)
- `stripe_subscriptions`, `stripe_coupons` → 0 rows (spec 18 creates them; Stripe test mode also reset)
- `email_verification_codes`, `password_reset_tokens`, `personal_access_tokens` → 0 rows
- `bounced_emails`, `notifications` → 0 rows
- `activity_log` → 0 rows
- `failed_jobs`, `jobs` → 0 rows
- `email_verification_failed_attempts` on most users → 0 (except User #1012 = 5 locked; User #1013 = 3 per Task 2 personas)
- DO Spaces bucket: stays as-is per reset (the orphan pruner is thematically adjacent and split into its own follow-up plan; unused room IDs become orphaned but don't affect DB correctness or smoke determinism)
- Mailpit inbox → cleared by Task 1 reset body
- Stripe test mode → customers matching `QA_STRIPE_EMAIL_PATTERN` (default `qa-*@*`) purged by Task 0c, with fail-closed validation against unsafe patterns (per Ahmed Concern #3)
After seed, all PG sequences advance past max seeded ID so spec inserts don't collide:
- `users` → next ID starts at 1014 (after User #1013; baseline IDs 1001–1013 keep spec-generated ephemeral IDs well above them)
- `rooms` → next ID starts at 9005 (after scenario fixture #9004; baseline IDs 1–60 leave room for spec-generated ephemeral IDs starting at 61)
- `admin_users` → next ID starts at 2
- `cities` → next ID starts at 7
- `stripe_plans` → next ID starts at 11
- `allows` → next ID starts at 101 (after .edu tier #100)
- `restrictions` → next ID starts at 5
This is critical: spec 11-12 creates timestamped-email QA accounts that auto-assign IDs — if sequences aren't advanced, they'd collide with seeded IDs and fail. Note on the 1000-baseline / 9000-fixtures gap: the sequence-advance step sets users.id to >= 1014. Ephemeral users created by spec 11 get IDs 1014, 1015, etc. — never colliding with the baseline 1001–1013 range and never pushed into a 9000+ range reserved for future scenario fixtures.
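The sequence-advance step can be sketched as one `setval` statement per table. The snippet below only generates the SQL text and the expected next IDs; it assumes each table's `id` column is backed by a sequence that `pg_get_serial_sequence` can resolve, and the names `maxSeededId`, `setvalSql`, and `expectedNextId` are illustrative rather than taken from the seeder.

```typescript
// Highest seeded ID per table, taken from the reset state described in this section.
const maxSeededId: Record<string, number> = {
  users: 1013,
  rooms: 9004,
  admin_users: 1,
  cities: 6,
  stripe_plans: 10,
  allows: 100,
  restrictions: 4,
};

// Build the statement that moves a table's id sequence to its current MAX(id).
// After setval(seq, n), the next nextval() call returns n + 1.
function setvalSql(table: string): string {
  return (
    `SELECT setval(pg_get_serial_sequence('${table}', 'id'), ` +
    `(SELECT COALESCE(MAX(id), 1) FROM ${table}));`
  );
}

// The first ID a spec-created row should receive after the reset.
function expectedNextId(table: string): number {
  const max = maxSeededId[table];
  if (max === undefined) {
    throw new Error(`No seeded max ID recorded for table '${table}'`);
  }
  return max + 1;
}
```

A post-reset probe could compare `expectedNextId('users')` against the ID of the first ephemeral account to confirm the sequences really advanced.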
- Ephemeral QA lister + traveler (spec 11-12 creates — expected to NOT exist at reset time; the whole point)
- `.auth/*.json` files (cleared pre-run by smoke runner `rm -rf`)
- Any production data (there is no QA reset against staging/prod; Layer 1+2 gates prevent it)
- Any photo uploads beyond DO Spaces orphans (which don't affect DB correctness)
Every class of state that currently drifts between smoke runs, and how the reset protocol handles each:
| What drifts | Current cause | Reset protocol |
|---|---|---|
| `users.is_fraudulent` on `demo@rotatingroom.com` | Spec 34 flags it; cleanup admin call was a no-op pre-#4279 | Full `users` table truncate + reseed |
| `users.blocked` | Spec 15/19 toggles it | Full truncate |
| `users.email_verified_at` | Admin CRUD edits (PR #4279) + verification specs | Full truncate |
| `users.email_verification_failed_attempts`, `_locked_at` | Spec 21 locks users out | Full truncate |
| `users.send_queries` | Spec 21 gates | Full truncate |
| Ephemeral QA lister/traveler accounts from prior runs | Spec 11/12 creates with timestamped email; never deletes | Full truncate; these accounts cease to exist between runs |
| `users.is_sent_queries_disabled` | Moderation flag changed by spec 15/19 | Full truncate |
| What drifts | Current cause | Reset protocol |
|---|---|---|
| `rooms.status` (active → inactive → pending) | Specs 13, 22, 34 change status | Full truncate |
| `rooms.is_fraudulent` | Spec 34, Stripe radar hooks | Full truncate |
| `rooms.under_review` | `TestingController::setRoomUnderReview` | Full truncate |
| `rooms.needs_edu_verification` | Spec 13/21 | Full truncate |
| `rooms.plan_id` | Specs 18, 22 (plan changes) | Full truncate |
| Orphan rooms owned by deleted ephemeral listers | Spec 13 creates; previous lister removed but rooms remain | Full truncate; orphans gone |
| What drifts | Current cause | Reset protocol |
|---|---|---|
| `stripe_subscriptions.stripe_status` (active, past_due, canceled) | Specs 18, 35 | Full truncate (DB side) |
| `stripe_subscriptions.ends_at`, `trial_ends_at` | Spec 18 | Full truncate |
| Past_due + retry state | Spec 35 | Full truncate |
| Stripe test-mode customers (remote) | Spec 18 creates; never deletes | Task 0c: `StripeTestReset` helper deletes customers matching the configured email pattern; accept orphans from other sources |
| Stripe coupons created by spec 19 | Manual test | Hands-off; coupons expire; quarterly cleanup |
| What drifts | Current cause | Reset protocol |
|---|---|---|
| `conversations` rows (renter → lister) | Spec 14 creates inquiries | Full truncate |
| `chat_messages` | Spec 14 | Full truncate |
| Duplicate-conversation detection window (1-month, per PR #4264) | PR #4264 | Full truncate; with conversations gone, no duplicate detection fires |
| `broadcasts` + broadcast-recipient join rows | Spec 15 | Full truncate of both `broadcasts` and `broadcast_recipients` (or whatever the join-table is per schema) |
| What drifts | Current cause | Reset protocol |
|---|---|---|
| `email_verification_codes` rows | Spec 21 + verification flow | Full truncate |
| `failed_attempts`, `locked_at` | Spec 21 lockout test | Full truncate |
| Verification documents uploaded | Spec 21 | Full truncate of DB-side record; physical file cleanup handled by the DO Spaces pruner follow-up plan (see Appendix A.6) |
| `bounced_emails` (Postmark webhook state) | Spec 27 webhook simulation | Full truncate |
| `activity_log` entries for `fraud_flag_cleared`, etc. | Admin actions, PR #4279 | Full truncate (but note: `activity_log` is a large production table; observers that read it may behave differently after truncate, so the Task 0 audit flags any reader) |
| `password_reset_tokens` | Spec 27 password reset flow; don't auto-clean | Full truncate |
| `personal_access_tokens` | Any Laravel Sanctum-based test | Full truncate |
| Notifications (`notifications` table; Laravel queued email/Slack) | Any notifying action | Full truncate |
| What drifts | Current cause | Reset protocol |
|---|---|---|
| `tests/playwright/.auth/accounts.json` | Ephemeral accounts from specs 11-12 | Already cleared pre-run (`rm -rf tests/playwright/.auth`) |
| `tests/playwright/.auth/listing.json` | Spec 13 | Already cleared pre-run |
| `tests/playwright/.auth/payment-listing.json` | Spec 18 | Already cleared pre-run |
| `test-results/` screenshots + traces | Playwright | Already cleared pre-run |
| DO Spaces (S3) uploaded photos | Spec 13/22/42 | Task 6: orphan-image pruner; runs weekly, not per-run |
| Browser state (cookies, localStorage) | Playwright | Isolated per test context; no reset needed |
| What drifts | Current cause | Reset protocol |
|---|---|---|
| Mailpit inbox (QA) | All email sends | Task 5 supplementary: POST /testing/mailpit-clear wraps Mailpit's own clear API |
| Rollbar errors logged | Any test-triggered error | Append-only; not reset per-run. Category-filter in Rollbar review. |
| PostHog events | Analytics-emitting specs | Append-only on test project; cardinality contained by project isolation. |
| Slack notifications (e.g., new subscription) | Webhook specs | Accept as noise in #dev-feed; quarterly cleanup |
| Cloudflare cache | Affected by geo-lookup specs | TTL-based; not reset per-run |
| What drifts | Current cause | Reset protocol |
|---|---|---|
| `migrations` table | Deploys | Never truncate (Task 1 adds this to `$skipTables`) |
| Cache (Redis) | Any request | `php artisan cache:clear` at reset time (Task 1) |
| Session data (Redis-backed sessions) | Authenticated request | Redis flushed with cache at reset time |
| Queue jobs (Redis) | Mailer queue, analytics | Task 1: flush queue via `php artisan queue:clear` |
| `failed_jobs` table (Horizon/queue failures) | Any job failure, including test-induced failures | Full truncate; otherwise `failed_jobs` accumulates noise across runs |
| `jobs` table (if queue driver is database) | Queued actions | Full truncate |
| Config cache | Any deploy | Not reset per-run; only on deploy |
| Full-text search vectors (`rooms.search_vector`, other tsvector columns) | Any room CRUD; Scout updates these via observers | Task 1 addendum: after truncate+reseed, run `php artisan search:reindex` OR explicitly trigger observers during seed so tsvectors populate. Without this, search specs return empty results from the fresh seeded rooms. |
| Scout index (external provider, if configured) | Room CRUD | `php artisan scout:flush "App\Models\Room"` + `search:import` at reset (or rely on Scout driver's own reset) |
Deliverables (published as docs/handoffs/2026-04-24-smoke-reset-baseline.md):
- Reset-duration baseline: run `TestingController::reset()` locally 20 times; publish p50/p95/p99.
- `test.skip` inventory: full `grep -rn 'test\.skip' tests/playwright/qa-smoke/ tests/playwright/journeys/` with each call categorized as:
  - `state-drift` (exists because prior-run state was unknown; target for removal)
  - `prod-gate` (exists because the spec shouldn't run against production; keep)
  - `legitimate-conditional` (exists because the test genuinely doesn't apply to every run; keep)
- FK-transitive audit: query `information_schema.table_constraints` for every FK the suite transitively depends on; cross-reference against `TestDatabaseSeeder`'s `seedUsers`, `seedAdmins`, `seedRoomTypes`, etc. Produce a gap list of tables the suite FKs into but the seeder doesn't populate (e.g., institutions, permission rules, plans).
- Rebaseline Success Metrics in the Launch Metrics table with actual numbers.
- Prod & QA `qa-*@*` collision check (per v2 review):

  ```sql
  -- Run against both prod read-replica AND current QA DB
  SELECT COUNT(*) AS qa_prefix_users,
         MIN(created_at) AS oldest,
         MAX(created_at) AS newest
  FROM users
  WHERE email LIKE 'qa-%';

  SELECT email, created_at
  FROM users
  WHERE email LIKE 'qa-%'
  ORDER BY created_at
  LIMIT 20;
  ```

  If any prod users match `qa-*@*`, the `StripeTestReset` default pattern cannot be `qa-*@*`; narrow it. If any QA users match but aren't in our persona list, investigate before Phase A Task 1.
- Existing Stripe test-mode customer survey: via Stripe CLI `stripe customers list --limit 100 | grep -E 'qa-|smoke'`. Count pre-existing customers the pattern would delete; confirm they're all test data (not accidentally in live mode).
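For the reset-duration baseline in the first deliverable, one simple way to turn 20 timings into p50/p95/p99 is the nearest-rank method. The plan does not say which percentile definition to use, so nearest-rank is an assumption here, and `percentile` is a hypothetical helper.

```typescript
// Nearest-rank percentile: the smallest sample such that at least p% of
// samples are less than or equal to it.
function percentile(samplesMs: readonly number[], p: number): number {
  if (samplesMs.length === 0) {
    throw new Error("percentile() needs at least one sample");
  }
  if (p <= 0 || p > 100) {
    throw new RangeError("p must be in (0, 100]");
  }
  const sorted = [...samplesMs].sort((a, b) => a - b);
  // Integer arithmetic first to avoid floating-point drift in the rank.
  const rank = Math.ceil((p * sorted.length) / 100);
  return sorted[rank - 1];
}

// Convenience wrapper producing the three numbers the handoff doc asks for.
function summarize(samplesMs: readonly number[]): { p50: number; p95: number; p99: number } {
  return {
    p50: percentile(samplesMs, 50),
    p95: percentile(samplesMs, 95),
    p99: percentile(samplesMs, 99),
  };
}
```

With only 20 samples, p99 under nearest-rank is simply the slowest run, which is worth stating next to the number in the handoff doc.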
Exit criterion: the plan's Launch Metrics table has concrete numbers (not TBDs) AND the qa-*@* collision check produces an explicit branch decision:
- Branch A (no collisions): zero prod-DB matches for `email LIKE 'qa-%'` → `StripeTestReset::ALLOWED_PATTERNS` default is `qa-*@*`; proceed to Phase A Task 1.
- Branch B (collisions present): any prod-DB matches exist → default narrows to `qa-smoke-*@*` (the narrower pattern that excludes `qa-*` human signups). Each pre-existing prod `qa-%` user is documented in the handoff with email, signup date, and inferred owner. Proceed to Phase A Task 1 with the narrower pattern.
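The two branches above reduce to a single decision on the prod match count. A minimal sketch, with `decideBranch` as a hypothetical name and the count assumed to come from the first collision-check query:

```typescript
type PurgePattern = "qa-*@*" | "qa-smoke-*@*";

interface CollisionDecision {
  branch: "A" | "B";
  defaultPattern: PurgePattern;
}

// prodQaPrefixUsers: the qa_prefix_users count from the prod read-replica.
function decideBranch(prodQaPrefixUsers: number): CollisionDecision {
  if (!Number.isInteger(prodQaPrefixUsers) || prodQaPrefixUsers < 0) {
    throw new RangeError("match count must be a non-negative integer");
  }
  return prodQaPrefixUsers === 0
    ? { branch: "A", defaultPattern: "qa-*@*" }
    : { branch: "B", defaultPattern: "qa-smoke-*@*" };
}
```

Rejecting a missing or malformed count keeps the decision fail-closed: a broken query cannot silently select the wider pattern.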
"No unexpected matches" without this explicit branching would be ambiguous — Branch B defines the concrete alternative rather than blocking on "investigate further." Phase A Task 1 depends on one of these two branches being selected in the handoff doc.
Commit: docs(smoke): Task 0 measurement-day baseline for deterministic reset
Files:
- Create: `.github/workflows/seeder-migration-check.yml`
Step 1: on every PR touching database/migrations/ or database/seeders/, run php artisan migrate:fresh --seed --seeder="Database\Seeders\TestDatabaseSeeder" --force against a fresh Postgres in CI. Note: --seed enables seeding and --seeder=<FQCN> specifies which seeder class; --seed=<name> (one token) is invalid Laravel syntax. If the seeder fails (missing column default, FK violation, etc.), fail the check.
Step 2: verify Laravel surfaces seeder failure as non-zero exit. Laravel's default behavior wraps seed exceptions in \RuntimeException; in some versions the seeder swallows exceptions and exits 0. Add a post-step check: php artisan tinker --execute='echo DB::selectOne("SELECT count(*) c FROM users")->c . PHP_EOL;' — fails the job if count is 0, catching silent seeder aborts.
Exit criterion: the check passes on master today; fails loudly on a deliberately-broken test migration (probe commit in the PR).
Commit: ci: seeder-migration parity check to prevent reset breakage
Files:
- Create: `app/Services/Testing/StripeTestReset.php`
- Create: `config/testing.php` (new; holds `stripe_email_pattern`, `persona_password`; referenced by Tasks 0c, 1, 2, 3, 4c. Without this file, every `config('testing.*')` call returns null and the fail-closed guards throw on every invocation)
- Modify: `app/Http/Controllers/TestingController.php` (call from `reset()` when `QA_RESET_ALLOWED=1`)
Deletes Stripe test-mode customers whose email matches a configured pattern. Pattern source: config('testing.stripe_email_pattern') (default qa-*@*). Service exposes purgeTestCustomers() with no args; callers configure the pattern via env (QA_STRIPE_EMAIL_PATTERN). Accepts orphans from other sources. Gated behind reset endpoint — only callable from within TestingController::reset().
Fail-closed pattern validation — literal allowlist (Ahmed Concern #3, refined via plan review v2). Earlier drafts used a regex-blacklist approach (reject patterns matching "dangerous" shapes). The v2 review flagged this as over-engineered: every blacklist is a test of "did I think of every bad pattern?" whereas an allowlist asks "is this exactly one of my known-good patterns?" — strictly safer, ~25 lines less code, no edge-case attack surface.
StripeTestReset's constructor validates the pattern against a hard-coded allowlist:
```php
final class StripeTestReset {
    private const ALLOWED_PATTERNS = [
        'qa-*@*',            // default: matches all QA personas + ephemerals
        'qa-smoke-*@*',      // narrower: just spec-11/12 ephemerals
        'qa-smoketest@*',    // single existing spec user
        'qa-proplan-test@*', // single existing spec user
    ];

    private string $pattern;

    public function __construct(string $pattern) {
        // Strict set-membership check: anything outside the allowlist is rejected.
        if (!in_array($pattern, self::ALLOWED_PATTERNS, true)) {
            throw new \InvalidArgumentException(
                "Stripe email pattern '{$pattern}' is not in the allowlist. " .
                "Adding a new pattern requires a PR to this file, which is intentional: " .
                "it forces review of any widening of Stripe purge scope."
            );
        }
        $this->pattern = $pattern;
    }
}
```

Adding a new pattern requires editing this file (code review + tests run). A misconfigured env var gets rejected loudly (HTTP 500 on reset). No regex parsing, no wildcard semantics to defend: the check is set-membership, which is trivial to reason about.
Stripe API hard-timeout (Ahmed Concern #4): each call into Stripe's API (list, delete) is wrapped in a 5-second hard cap. On timeout, the helper:
- logs a warning with the customer count not yet purged
- returns a partial-success result `['purged' => N, 'skipped_timeout' => M]`
- does NOT throw: the rest of the reset body (which still holds the advisory lock) completes normally, the advisory lock is released, and residual Stripe customers get cleaned up next run

"Fail-open on Stripe, fail-closed on gate" is the deliberate asymmetry: missing a Stripe purge is annoying, but holding the DB lock for a multi-second Stripe tail blocks the entire smoke queue.
Structured result:
```php
public function purgeTestCustomers(): array {
    // Example values illustrating the shape of the result.
    return [
        'pattern_used' => $this->pattern,
        'purged' => 12,
        'skipped_timeout' => 0,
        'duration_ms' => 847,
    ];
}
```

`TestingController::reset()` includes this result in its JSON response so the smoke dashboard can surface Stripe-reset health.
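On the consuming side, a dashboard could classify that result with a small pure function. This is only a sketch: the field names come from the structured result above, while the `StripeHealth` labels, the `stripeResetHealth` name, and the 5000 ms "slow" threshold are assumptions.

```typescript
// Shape of the `stripe` object in the reset response, per the structured result.
interface StripeResetResult {
  pattern_used: string;
  purged: number;
  skipped_timeout: number;
  duration_ms: number;
}

type StripeHealth = "ok" | "partial" | "slow";

// Hypothetical classifier for the smoke dashboard.
function stripeResetHealth(result: StripeResetResult, slowMs: number = 5000): StripeHealth {
  if (result.skipped_timeout > 0) {
    return "partial"; // some customers were left for the next run
  }
  if (result.duration_ms > slowMs) {
    return "slow"; // purge finished, but close to or beyond the assumed budget
  }
  return "ok";
}
```

Surfacing "partial" separately from a failure matches the fail-open intent: the run proceeds, but the leftover customers stay visible.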
Rationale for moving to Phase A: this is infrastructure (belongs with the reset contract), not a per-spec migration.
Commit: feat(testing): Stripe test-mode reset helper bundled with DB reset
Files:
- Modify: `bootstrap/app.php:26-30` (Layer 1: env-var-only route registration)
- Modify: `bootstrap/app.php:52-54` (CSRF exception extended to QA flag)
- Create: `app/Http/Middleware/EnsureQaResetAllowed.php` (Layer 2: per-request hostname check)
- Modify: `routes/testing.php` (attach middleware to group)
- Modify: `app/Http/Controllers/TestingController.php:13-19`: remove the `production|staging` abort. Layer 3 becomes token check only; environment gating is handled by Layer 1 (boot env-var) and Layer 2 (middleware hostname). Keeping the constructor abort on `production` would make the whole stack unreachable on QA (where `APP_ENV=production`).
- Modify: `app/Http/Controllers/TestingController.php:24-54` (advisory lock, Mailpit clear, Stripe reset call, cache/queue flush)
- Add import: `use Illuminate\Support\Facades\Http;` to `TestingController.php` (needed for Mailpit clear call)
- Create: `tests/Feature/TestingResetGateTest.php` (all gate tests via HTTP Host header; no production test seams)
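The intended outcome of the three layers on a deployed host can be summarized as a small decision function. This is a model of the behavior described in the file list, not the Laravel implementation; it deliberately ignores the `APP_ENV=testing` local/CI bypass, and `gateStatus` and `ResetAttempt` are hypothetical names.

```typescript
interface ResetAttempt {
  flagSetAtBoot: boolean; // QA_RESET_ALLOWED=1 when the app booted (Layer 1)
  host: string;           // request Host header (Layer 2)
  token: string;          // X-Testing-Token header (Layer 3)
}

const QA_HOST = "qa.rotatingroom.com";

// Returns the HTTP status the gate stack is expected to produce.
function gateStatus(attempt: ResetAttempt, expectedToken: string): 200 | 403 | 404 {
  if (!attempt.flagSetAtBoot) {
    return 404; // Layer 1: routes were never registered
  }
  if (attempt.host !== QA_HOST) {
    return 403; // Layer 2: wrong hostname
  }
  if (attempt.token !== expectedToken) {
    return 403; // Layer 3: bad or missing token
  }
  return 200;
}
```

Because each layer is checked in order, a request must pass all three to reach the reset body, and the earliest failing layer determines the status code.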
Step 1: Write failing tests (gate behavior via HTTP Host header).
```php
// tests/Feature/TestingResetGateTest.php
public function test_reset_blocked_on_wrong_hostname(): void {
    // FUNCTIONAL RISKS: unintended reset on staging/prod; Layer 2 hostname gate must block.
    config(['app.testing_token' => 'valid', 'app.qa_reset_allowed' => true]);
    $response = $this->withHeaders(['X-Testing-Token' => 'valid'])
        ->withServerVariables(['HTTP_HOST' => 'staging.rotatingroom.com'])
        ->postJson('/testing/reset');
    $response->assertStatus(403);
    $this->assertDatabaseHas('users', ['email' => 'ahmed@rotatingroom.com']); // not wiped
}

public function test_reset_allowed_on_qa_host_with_flag_and_token(): void {
    config(['app.testing_token' => 'valid', 'app.qa_reset_allowed' => true]);
    $response = $this->withHeaders(['X-Testing-Token' => 'valid'])
        ->withServerVariables(['HTTP_HOST' => 'qa.rotatingroom.com'])
        ->postJson('/testing/reset');
    $response->assertStatus(200);
}

public function test_reset_blocked_without_flag(): void {
    // Absence of flag at boot = routes never registered = 404.
    config(['app.testing_token' => 'valid', 'app.qa_reset_allowed' => false]);
    $response = $this->withHeaders(['X-Testing-Token' => 'valid'])
        ->withServerVariables(['HTTP_HOST' => 'qa.rotatingroom.com'])
        ->postJson('/testing/reset');
    $response->assertStatus(404);
}

public function test_reset_blocked_with_bad_token(): void {
    config(['app.testing_token' => 'valid', 'app.qa_reset_allowed' => true]);
    $response = $this->withHeaders(['X-Testing-Token' => 'wrong'])
        ->withServerVariables(['HTTP_HOST' => 'qa.rotatingroom.com'])
        ->postJson('/testing/reset');
    $response->assertStatus(403);
}

public function test_concurrent_reset_returns_423_locked(): void {
    // Hold pg_advisory_lock in a separate DB connection; second caller must return 423.
    // See Task 1 Step 3 implementation for lock key.
    config(['app.testing_token' => 'valid', 'app.qa_reset_allowed' => true]);
    $lockKey = crc32('qa-testing-reset');
    DB::connection('primary-raw')->select('SELECT pg_advisory_lock(?)', [$lockKey]);
    try {
        $response = $this->withHeaders(['X-Testing-Token' => 'valid'])
            ->withServerVariables(['HTTP_HOST' => 'qa.rotatingroom.com'])
            ->postJson('/testing/reset');
        $response->assertStatus(423);
        $this->assertEquals('30', $response->headers->get('Retry-After'));
    } finally {
        DB::connection('primary-raw')->select('SELECT pg_advisory_unlock(?)', [$lockKey]);
    }
}
```

Step 2: Run tests → FAIL (gate + lock not implemented).
Step 3: Implement all three layers + advisory lock.
Layer 1 — bootstrap route registration (env-var only, boot time):
```php
// bootstrap/app.php:26-30
// Caveat: once `php artisan config:cache` has run, env() no longer reads .env
// values, so confirm how QA supplies QA_RESET_ALLOWED before relying on this check.
then: function () {
    if (env('APP_ENV') === 'testing' || env('QA_RESET_ALLOWED') === '1') {
        Route::middleware(['web', \App\Http\Middleware\EnsureQaResetAllowed::class])
            ->group(base_path('routes/testing.php'));
    }
},
```

CSRF exception (`bootstrap/app.php:52-54`):

```php
if (env('APP_ENV') === 'testing' || env('QA_RESET_ALLOWED') === '1') {
    $middleware->validateCsrfTokens(except: ['testing/*']);
}
```

Layer 2 (per-request hostname middleware):
```php
// app/Http/Middleware/EnsureQaResetAllowed.php
public function handle(Request $request, Closure $next): mixed {
    $allowed = config('app.qa_reset_allowed') === true;
    $onQaHost = $request->getHost() === 'qa.rotatingroom.com'
        || app()->environment('testing'); // local/CI bypass
    if (!$allowed || !$onQaHost) {
        abort(403, 'Testing endpoints are not available for this host.');
    }
    return $next($request);
}
```

Layer 3 (controller with advisory lock + expanded reset):
```php
// app/Http/Controllers/TestingController.php
public function reset(Request $request): JsonResponse {
    $this->validateToken($request);
    $lockKey = crc32('qa-testing-reset');
    $acquired = DB::selectOne('SELECT pg_try_advisory_lock(?) AS got', [$lockKey])->got;
    if (!$acquired) {
        return response()
            ->json(['error' => 'reset in progress'], 423)
            ->header('Retry-After', '30');
    }
    try {
        Schema::disableForeignKeyConstraints();
        $skipTables = ['migrations', 'spatial_ref_sys', 'geometry_columns', 'geography_columns', 'raster_columns', 'raster_overviews'];
        foreach (Schema::getTableListing() as $t) {
            if (!in_array($t, $skipTables)) DB::table($t)->truncate();
        }
        Schema::enableForeignKeyConstraints();
        Artisan::call('db:seed', ['--class' => 'Database\\Seeders\\TestDatabaseSeeder', '--force' => true]);
        Artisan::call('cache:clear');
        Artisan::call('queue:clear');
        // Mailpit clear (A.7)
        if ($url = config('services.mailpit.url')) {
            try { Http::timeout(3)->delete("{$url}/api/v1/messages"); }
            catch (Throwable $e) { /* non-fatal: Mailpit unreachable shouldn't block reset */ }
        }
        // Stripe test reset (Task 0c integration point).
        // Hard 5s cap per Stripe API call inside the helper; fail-open on timeout
        // (see Ahmed Concern #4: a Stripe tail-latency hang would otherwise hold
        // the advisory lock for seconds and block the queued smoke runs).
        $stripeResult = app(\App\Services\Testing\StripeTestReset::class)->purgeTestCustomers();
        return response()->json([
            'status' => 'reset_complete',
            'stripe' => $stripeResult, // {pattern_used, purged, skipped_timeout, duration_ms}
        ]);
    } finally {
        DB::statement('SELECT pg_advisory_unlock(?)', [$lockKey]);
    }
}
```

Lock-scope rationale (addresses Ahmed Concern #4): the advisory lock covers the full reset body including Stripe purge, but `StripeTestReset::purgeTestCustomers()` internally enforces a 5s hard cap per API call and returns partial-success on timeout (never throws). This keeps atomicity (a second caller sees the full post-reset state, not half of it) while bounding the worst-case lock hold to (DB truncate + reseed + Mailpit + Stripe timeout fallback) ≈ 20-25s + 5s = ≤30s p95. Moving Stripe outside the lock would be simpler but re-introduces the race: a second caller could see DB in "reset" state while Stripe still has prior-run customers. The plan accepts the tighter lock scope + per-call timeout as the right tradeoff.
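From the smoke runner's point of view, a 423 with `Retry-After` means another reset currently holds the lock. A hedged sketch of how a runner might decide what to do next is shown below; the retry budget of 3 attempts and the names `nextResetAction` and `ResetAction` are assumptions, and the 30-second fallback mirrors the header value the controller sends.

```typescript
type ResetAction =
  | { kind: "proceed" }
  | { kind: "retry"; waitMs: number }
  | { kind: "fail"; reason: string };

// Decide the runner's next step from the reset endpoint's response.
function nextResetAction(
  status: number,
  retryAfterHeader: string | null,
  attempt: number,
  maxAttempts: number = 3,
): ResetAction {
  if (status === 200) {
    return { kind: "proceed" };
  }
  if (status === 423) {
    if (attempt >= maxAttempts) {
      return { kind: "fail", reason: "reset still locked after retries" };
    }
    // Retry-After is expressed in seconds; fall back to 30 if absent or malformed.
    const seconds = Number.parseInt(retryAfterHeader ?? "", 10);
    const waitSeconds = Number.isFinite(seconds) && seconds > 0 ? seconds : 30;
    return { kind: "retry", waitMs: waitSeconds * 1000 };
  }
  return { kind: "fail", reason: `unexpected status ${status}` };
}
```

Treating every status other than 200 and 423 as a hard failure keeps the runner from starting specs against a database that was never reset.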
Remove the fabricated hostname_override code path from any earlier plan draft — test seam is HTTP Host header only, no production-code branches.
Step 4: Run tests → PASS (all 5 gate tests + reset body).
Step 5: Commit. feat(testing): three-layer gate + advisory lock + expanded reset body
Files:
- Modify: `database/seeders/TestDatabaseSeeder.php` (add `seedEduPersonas`, `seedScenarios`)
- Create: `database/seeders/data/personas.php` (canonical persona list)
- Modify: `tests/playwright/utils/test-data.ts` (mirror persona catalog, add `.edu` variants)
- Create: `tests/Feature/PersonaCatalogParityTest.php` (CI drift check)
- Create: `tests/fixtures/scenarios.md` (human-readable scenario documentation)
Step 1: Write parity test — AST-based, not regex (per Assumption 8).
Instead of regexing the TS file (fragile against as const, satisfies, nested objects, object shorthand), run a small Node helper that imports the module and emits its persona keys as JSON. The PHP test invokes the helper via Process::fromShellCommandline (Symfony Process, already used elsewhere in the codebase) and compares JSON arrays.
```php
// tests/Feature/PersonaCatalogParityTest.php
public function test_php_and_ts_persona_lists_match(): void {
    // FUNCTIONAL RISKS: persona drift between PHP seeder and TS test-data causes
    // "correct persona but wrong state" failures; CI must block drift.
    $phpPersonas = collect(include database_path('seeders/data/personas.php'))->keys()->sort()->values()->all();
    $process = new \Symfony\Component\Process\Process(['npx', '--yes', 'tsx', base_path('tests/utils/extract-persona-keys.mjs')]);
    $process->run();
    $output = trim($process->getOutput());
    $this->assertNotEmpty($output, 'TS persona extractor returned empty output: ' . $process->getErrorOutput());
    $tsPersonas = json_decode($output, true);
    $this->assertIsArray($tsPersonas, 'TS extractor output was not valid JSON: ' . $output);
    $this->assertEquals($phpPersonas, $tsPersonas, 'PHP persona list must match TS persona list');
}
```

Create `tests/utils/extract-persona-keys.mjs`:
```js
// tests/utils/extract-persona-keys.mjs
import { personas } from '../playwright/utils/test-data.ts';
console.log(JSON.stringify(Object.keys(personas).sort()));
```
CI installs tsx via npm install -g tsx or uses npx --yes tsx per-invocation. This is robust against every TS idiom because the module is actually evaluated by Node's TS loader, not parsed with a regex. Symfony Process is used rather than PHP's shell functions because it's the project's standard (safer argument passing, existing dependency).
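When the two key lists differ, a plain equality failure does not say which side drifted. The sketch below shows one way to report the difference in both directions; `diffPersonaKeys` is a hypothetical helper and is not part of the parity test above.

```typescript
interface ParityDiff {
  missingInTs: string[];  // keys defined in PHP but absent from the TS mirror
  missingInPhp: string[]; // keys defined in TS but absent from the PHP catalog
}

function diffPersonaKeys(phpKeys: readonly string[], tsKeys: readonly string[]): ParityDiff {
  const php = new Set(phpKeys);
  const ts = new Set(tsKeys);
  return {
    missingInTs: phpKeys.filter((key) => !ts.has(key)).sort(),
    missingInPhp: tsKeys.filter((key) => !php.has(key)).sort(),
  };
}

// True when both catalogs define exactly the same persona keys.
function inParity(diff: ParityDiff): boolean {
  return diff.missingInTs.length === 0 && diff.missingInPhp.length === 0;
}
```

Printing both lists in the CI failure message tells the author which file to edit without opening either catalog.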
Step 2: Run → FAIL (file doesn't exist).
Step 3: Create database/seeders/data/personas.php — self-documenting names, shared password, qa- prefix on every email so StripeTestReset's purge pattern catches them (Ahmed Concern #5):
<?php
// Every persona email starts with `qa-` to match QA_STRIPE_EMAIL_PATTERN (`qa-*@*`)
// and the broader "recognizable QA user" convention (per 2026-04-24 decision).
// Password for every persona: `RR4Life!` (stored in config('testing.persona_password')).
// Keys are camelCase for TS parity; emails are kebab-case for URL/email readability.
return [
// --- Stable non-.edu users ---
'qaLister' => [
'id' => 1001, 'email' => 'qa-lister@rotatingroom.com',
'email_verified_at' => 'now', 'active' => 1,
'owns_baseline_rooms' => true, // owner of all 60 rooms + scenario fixtures 9001-9003
],
'qaSupport' => [
'id' => 1002, 'email' => 'qa-support@rotatingroom.com',
'active' => 1,
],
'qaFounder' => [
'id' => 1003, 'email' => 'qa-founder@rotatingroom.com',
'active' => 1,
],
'qaDemo' => [
'id' => 1004, 'email' => 'qa-demo@rotatingroom.com',
'email_verified_at' => 'now', 'active' => 1,
],
'qaVerifyExpired' => [
'id' => 1005, 'email' => 'qa-verify-expired@rotatingroom.com',
'email_verified_at' => '-7 months', 'active' => 1,
],
'qaBlocked' => [
'id' => 1006, 'email' => 'qa-blocked@rotatingroom.com',
'active' => 1, 'blocked' => 1,
],
'qaInactive' => [
'id' => 1007, 'email' => 'qa-inactive@rotatingroom.com',
'active' => 0,
],
// --- .edu verification personas ---
'qaEduUnverified' => [
'id' => 1010, 'email' => 'qa-edu-unverified@example.edu',
'email_verified_at' => null, 'active' => 1,
'owns_pending_free_listing' => 9004, // scenario fixture room ID
],
'qaEduVerified' => [
'id' => 1011, 'email' => 'qa-edu-verified@example.edu',
'email_verified_at' => 'now', 'active' => 1,
],
'qaEduLocked' => [
'id' => 1012, 'email' => 'qa-edu-locked@example.edu',
'email_verification_failed_attempts' => 5,
'email_verification_locked_at' => 'now',
],
'qaEduVerifying' => [
'id' => 1013, 'email' => 'qa-edu-verifying@example.edu',
'email_verification_failed_attempts' => 3, // exercises attempts 4-5
],
];
Step 3b: Create database/seeders/data/scenarios.php for non-user fixtures (room-level state):
<?php
return [
'paidDraftListing' => ['room_id' => 9001, 'owner_id' => 1001, 'plan' => 'rr-monthly', 'status' => 'pending_payment'],
'fraudFlaggedRoom' => ['room_id' => 9002, 'owner_id' => 1001, 'is_fraudulent' => true, 'status' => 'inactive'],
'underReviewRoom' => ['room_id' => 9003, 'owner_id' => 1001, 'under_review' => true, 'status' => 'active'],
'eduPendingFreeListing' => ['room_id' => 9004, 'owner_id' => 1010, 'plan' => 'rr-free', 'status' => 'pending_free'],
];
Step 3c: Add persona-email-matches-purge-pattern invariant (Ahmed Concern #5, Task 3 probe extension):
// tests/Feature/PersonaEmailPurgePatternTest.php
public function test_every_persona_email_matches_stripe_purge_pattern(): void {
// FUNCTIONAL RISKS: if a persona email doesn't match QA_STRIPE_EMAIL_PATTERN,
// specs 18/35 create Stripe customers for them that never get cleaned up,
// re-creating the exact Stripe-side drift this plan is trying to eliminate.
$personas = include database_path('seeders/data/personas.php');
$pattern = config('testing.stripe_email_pattern', 'qa-*@*');
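// Convert the wildcard pattern to a regex (keep in sync with StripeTestReset).
// Quote first, then expand the escaped `*`: replacing `*` with `.*` before escaping `.`
// would also escape the dot inside every inserted `.*`, so no email would ever match.
$regex = '/^' . str_replace('\*', '.*', preg_quote($pattern, '/')) . '$/i';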
foreach ($personas as $key => $p) {
$this->assertMatchesRegularExpression(
$regex, $p['email'],
"Persona '{$key}' email '{$p['email']}' does not match Stripe purge pattern '{$pattern}'. " .
"Specs that create Stripe customers for this persona will leak test-mode data across runs."
);
}
}
Step 3d: Create docs/QA_PERSONAS.md, the team-facing cheat-sheet:
# QA Personas — canonical test users on the QA server
All passwords: `RR4Life!` (same as prototype/staging QA credentials).
All reset to this state before every `/qa-smoke` run.
| Email | ID | State | When would I log in as this user? |
|-------|----|----|------|
| qa-lister@rotatingroom.com | 1001 | verified, owns 60 baseline rooms | "I want to see the lister dashboard / edit listing flows" |
| qa-demo@rotatingroom.com | 1004 | verified, non-.edu | "I want to test as a regular authenticated user" |
| qa-edu-unverified@example.edu | 1010 | unverified .edu with pending free listing | "I want to test the first-time .edu verification flow" |
| qa-edu-verified@example.edu | 1011 | verified .edu | "I want to test post-verification state" |
| qa-edu-locked@example.edu | 1012 | 5 failed attempts, locked | "I want to test the lockout UI" |
| qa-edu-verifying@example.edu | 1013 | 3 failed attempts | "I want to test attempts 4-5 of verification" |
| qa-blocked@rotatingroom.com | 1006 | blocked | "I want to test blocked-user paths" |
| qa-verify-expired@rotatingroom.com | 1005 | verified 7mo ago | "I want to test re-verification prompt" |
| qa-inactive@rotatingroom.com | 1007 | inactive | "I want to test inactive-account paths" |
| qa-support@rotatingroom.com | 1002 | active | "I want to test ops/support flows" |
| qa-founder@rotatingroom.com | 1003 | active | "I want to test founder/strategic flows" |
| qa-admin@rotatingroom.com (admin) | admin #1 | active | "I want to log in as Backpack admin" |
**Linked from:** `CLAUDE.md` (QA section), smoke dashboard header, `#qa` channel topic.
Task 2 ensures this doc is updated any time a persona is added/removed (parity test fails the build otherwise).
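One way the "doc stays in sync with the catalog" guarantee could be checked is sketched below in TypeScript. It assumes the check receives the cheat-sheet as a string and the catalog emails as an array; `findUndocumentedEmails` and the sample inputs are hypothetical names for illustration, not part of the plan.

```typescript
// Hypothetical helper: returns the persona emails that never appear in the cheat-sheet text.
function findUndocumentedEmails(markdown: string, emails: readonly string[]): string[] {
  const haystack = markdown.toLowerCase();
  return emails.filter((email) => !haystack.includes(email.toLowerCase()));
}

// Illustrative input: two table rows from docs/QA_PERSONAS.md and three catalog emails.
const sampleDoc = [
  "| qa-lister@rotatingroom.com | 1001 | verified, owns 60 baseline rooms |",
  "| qa-demo@rotatingroom.com | 1004 | verified, non-.edu |",
].join("\n");

const missing = findUndocumentedEmails(sampleDoc, [
  "qa-lister@rotatingroom.com",
  "qa-demo@rotatingroom.com",
  "qa-blocked@rotatingroom.com",
]);

// A parity check would fail the build whenever this list is non-empty.
console.log(missing);
```

A plain substring check is deliberately loose: it only proves the email is mentioned somewhere, not that its ID or state column is correct.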
Step 4: Modify TestDatabaseSeeder::run() to iterate the catalog:
$catalog = include database_path('seeders/data/personas.php');
foreach ($catalog as $key => $config) {
$this->seedPersona($key, $config);
}
Step 5: Mirror into tests/playwright/utils/test-data.ts. Keys and emails must match personas.php exactly (the parity test enforces it, per Codex v2 P1 finding). Every email carries the qa- prefix to match the Stripe purge pattern. The password comes from the shared constant, not per-persona:
// tests/playwright/utils/test-data.ts
export const PERSONA_PASSWORD = 'RR4Life!'; // shared across all personas; matches config('testing.persona_password')
export const personas = {
qaLister: { email: 'qa-lister@rotatingroom.com', password: PERSONA_PASSWORD, name: 'QA Lister', canSendQueries: false },
qaSupport: { email: 'qa-support@rotatingroom.com', password: PERSONA_PASSWORD, name: 'QA Support', canSendQueries: false },
qaFounder: { email: 'qa-founder@rotatingroom.com', password: PERSONA_PASSWORD, name: 'QA Founder', canSendQueries: false },
qaDemo: { email: 'qa-demo@rotatingroom.com', password: PERSONA_PASSWORD, name: 'QA Demo', canSendQueries: false },
qaVerifyExpired: { email: 'qa-verify-expired@rotatingroom.com', password: PERSONA_PASSWORD, name: 'QA Verify Expired', canSendQueries: false },
qaBlocked: { email: 'qa-blocked@rotatingroom.com', password: PERSONA_PASSWORD, name: 'QA Blocked', canSendQueries: false },
qaInactive: { email: 'qa-inactive@rotatingroom.com', password: PERSONA_PASSWORD, name: 'QA Inactive', canSendQueries: false },
qaEduUnverified: { email: 'qa-edu-unverified@example.edu', password: PERSONA_PASSWORD, name: 'QA Edu Unverified', canSendQueries: true },
qaEduVerified: { email: 'qa-edu-verified@example.edu', password: PERSONA_PASSWORD, name: 'QA Edu Verified', canSendQueries: true },
qaEduLocked: { email: 'qa-edu-locked@example.edu', password: PERSONA_PASSWORD, name: 'QA Edu Locked', canSendQueries: true },
qaEduVerifying: { email: 'qa-edu-verifying@example.edu', password: PERSONA_PASSWORD, name: 'QA Edu Verifying', canSendQueries: true },
} as const;
Step 6: Run parity test → PASS.
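The comparison at the heart of the parity test can be sketched as a pure function. This is a minimal illustration, assuming both catalogs have already been loaded into plain objects keyed by persona name; `PersonaCatalog`, `diffCatalogs`, and the sample data (including the deliberate typo) are hypothetical.

```typescript
type PersonaCatalog = Record<string, { email: string }>;

// Hypothetical helper: lists every way the two catalogs disagree on keys or emails.
function diffCatalogs(php: PersonaCatalog, ts: PersonaCatalog): string[] {
  const problems: string[] = [];
  const keys = Array.from(new Set([...Object.keys(php), ...Object.keys(ts)])).sort();
  for (const key of keys) {
    const fromPhp = php[key];
    const fromTs = ts[key];
    if (fromPhp === undefined) {
      problems.push(`${key}: missing from personas.php`);
    } else if (fromTs === undefined) {
      problems.push(`${key}: missing from test-data.ts`);
    } else if (fromPhp.email !== fromTs.email) {
      problems.push(`${key}: email mismatch (${fromPhp.email} vs ${fromTs.email})`);
    }
  }
  return problems;
}

// Illustrative data: one typo on the TS side and one key that exists only in TS.
const phpSide: PersonaCatalog = {
  qaDemo: { email: "qa-demo@rotatingroom.com" },
  qaBlocked: { email: "qa-blocked@rotatingroom.com" },
};
const tsSide: PersonaCatalog = {
  qaDemo: { email: "qa-demo@rotatingroom.com" },
  qaBlocked: { email: "qa-blokced@rotatingroom.com" },
  qaExtra: { email: "qa-extra@rotatingroom.com" },
};

const parityProblems = diffCatalogs(phpSide, tsSide);
console.log(parityProblems);
```

Returning every mismatch at once, rather than failing on the first, keeps a single CI run sufficient to fix the whole catalog.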
Step 6a: Post-seed state verification test (orchestrator shock-test addition). Beyond the parity test (keys match between PHP and TS) and the CI migration-seeder test (seeder runs without errors), add a concrete assertion suite that the seeded state is exactly right:
// tests/Feature/SeededStateTest.php
public function test_seeded_users_exist_with_correct_state(): void {
// FUNCTIONAL RISKS: seeder typo (wrong ID, missing field, wrong flag) silently
// corrupts baseline state; every downstream spec inherits the corruption.
$this->artisan('db:seed', ['--class' => 'Database\\Seeders\\TestDatabaseSeeder', '--force' => true]);
// Spot-check each persona by exact state
$this->assertDatabaseHas('users', ['id' => 1010, 'email' => 'qa-edu-unverified@example.edu', 'email_verified_at' => null, 'active' => 1]);
$this->assertDatabaseHas('users', ['id' => 1012, 'email' => 'qa-edu-locked@example.edu', 'email_verification_failed_attempts' => 5]);
$this->assertDatabaseHas('users', ['id' => 1006, 'email' => 'qa-blocked@rotatingroom.com', 'blocked' => 1]);
// ... all 11 personas + admin
$this->assertDatabaseHas('admin_users', ['email' => 'qa-admin@rotatingroom.com']);
// Scenario fixture rooms
$this->assertDatabaseHas('rooms', ['id' => 9001, 'user_id' => 1001, 'status' => 'pending_payment']);
$this->assertDatabaseHas('rooms', ['id' => 9002, 'user_id' => 1001, 'is_fraudulent' => true]);
$this->assertDatabaseHas('rooms', ['id' => 9004, 'user_id' => 1010]);
// Baseline rooms count
$this->assertEquals(60, \App\Models\Room::where('id', '<=', 60)->count());
// Sequence advance
$this->assertGreaterThanOrEqual(1014, DB::selectOne("SELECT nextval('users_id_seq') AS n")->n);
}
Runs in the same CI slot as the migration-seeder check. Catches seeder typos before they reach QA.
Step 6b: Verify admin seeding. qa-admin@rotatingroom.com is the Backpack admin for QA (appears in Appendix B.1). It lives in the admin_users table, not users, so it's not part of personas.php. TestDatabaseSeeder::seedAdmins() must still run and produce this exact email. Task 2 explicitly checks this and updates the seeder if the existing email differs from qa-admin@rotatingroom.com (per the naming convention).
Step 6c: Sequence advancement. After seeding, advance PG sequences past max seeded ID so spec inserts don't collide:
DB::statement("SELECT setval('users_id_seq', 1013)"); // next insert → 1014
DB::statement("SELECT setval('rooms_id_seq', 9004)"); // next insert → 9005
// plus admin_users, cities, stripe_plans, allows, restrictions per Appendix B.7
Step 6d: Team-facing documentation. In addition to docs/QA_PERSONAS.md (from Step 3d), Task 2 links the cheat-sheet from every discovery surface an engineer would hit:
- `CLAUDE.md` Testing section: add a "QA Personas" subsection with a link to `docs/QA_PERSONAS.md`
- `#qa` Slack channel topic: update via Slack API to reference the cheat-sheet URL
- Smoke dashboard header (`tests/playwright/qa-smoke/dashboard-golden.html`): add a small "QA personas: [link]" header note
- `.claude/skills/qa-smoke/SKILL.md`: reference under a new "Known personas" subsection
Without these links, the cheat-sheet rots from day 1 — a new engineer who doesn't know it exists will never find it.
Step 6e: Password rotation runbook. Create docs/handoffs/qa-password-rotation.md covering the rotation procedure (referenced by Risk 11):
- Generate new password (21+ chars, same entropy class as current prototype/staging QA credentials).
- Update `config/testing.php`'s `persona_password` value.
- If QA server `.env` overrides `TESTING_PERSONA_PASSWORD`, update it there too.
- Update `docs/QA_PERSONAS.md` header.
- Re-deploy to QA.
- Trigger `/testing/reset` to re-seed with new bcrypt.
- Run a small smoke subset (`npx playwright test --grep qa-demo`) to verify login still works.
- Slack `#qa` with the new password (or reference to the secrets manager where it's stored).
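For the "generate new password" step, one possible approach is sketched below using Node's built-in `crypto` module. The helper name `generatePersonaPassword` and the choice of 24 random bytes are assumptions; the runbook only requires 21+ characters, and any site-specific character rules are not covered here.

```typescript
import { randomBytes } from "node:crypto";

// Hypothetical helper: 24 random bytes encode to 32 URL-safe characters (192 bits of entropy).
// The 16-byte floor guarantees at least 22 characters, which satisfies the 21+ requirement.
function generatePersonaPassword(byteLength: number = 24): string {
  if (byteLength < 16) {
    throw new RangeError("byteLength must be at least 16");
  }
  return randomBytes(byteLength).toString("base64url");
}

console.log(generatePersonaPassword().length); // 32
```

The `base64url` alphabet (letters, digits, `-`, `_`) avoids characters that need quoting in `.env` files or shell commands.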
Step 7: Commit. feat(testing): persona catalog with .edu variants + CI parity check + discoverability links
Files:
- Create: `tests/Feature/TestingResetGateBypassTest.php` (gate layers)
- Create: `tests/Feature/StripeTestResetPatternGuardTest.php` (Ahmed Concern #3: pattern validation)
- Create: `tests/Feature/PersonaEmailPurgePatternTest.php` (Ahmed Concern #5: persona ↔ purge invariant; already referenced in Task 2, but the probe suite is this task's responsibility)
Coverage:
- Gate-bypass probes: every combination of missing gate condition against a simulated staging/prod environment. Hard-blocks any request that shouldn't succeed (see Task 1 test file; Task 3 extends it with fuzzing over unexpected combos).
- Stripe pattern guard (Ahmed Concern #3; allowlist implementation per v2 review): `StripeTestReset::ALLOWED_PATTERNS` is a hard-coded 4-entry set (`qa-*@*`, `qa-smoke-*@*`, `qa-smoketest@*`, `qa-proplan-test@*`). Test matrix covers membership, not pattern shape:
  - Allowed (pass): each of the 4 entries verbatim: `qa-*@*`, `qa-smoke-*@*`, `qa-smoketest@*`, `qa-proplan-test@*`.
  - Rejected (throw `InvalidArgumentException`): `null`, `""`, `" "`, `"*"`, `"*@*"`, `"@*"`, `".+"`, `"qa-*@example.com"` (not in allowlist; reject even though the `qa-` prefix is present), `"qa-smoke-*@example.com"` (same; narrow variant not in list), `"edu-*@*"` (no `qa-` prefix).
  - Test asserts HTTP response is 500 (not 200) when the endpoint is called with a rejected pattern. The "add a new pattern requires a PR" invariant is enforced here: any pattern outside the allowlist fails loudly at construction, and widening the allowlist requires editing `StripeTestReset.php`.
- Persona-email purge-pattern invariant (new, Ahmed Concern #5): the `PersonaEmailPurgePatternTest` described in Task 2 Step 3c. Every seeded persona email must match the default `QA_STRIPE_EMAIL_PATTERN`. Prevents the failure mode where a persona added later doesn't match the purge pattern and accumulates Stripe customers across runs.
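The two checks in this coverage list (allowlist membership and wildcard matching) can be illustrated with a short TypeScript sketch. The real guard lives in the PHP `StripeTestReset` class; the function names `assertAllowedPattern` and `wildcardToRegExp` are hypothetical, and the allowlist is copied from the list above rather than read from the real source.

```typescript
// Copied from the coverage list; the source of truth is StripeTestReset::ALLOWED_PATTERNS.
const ALLOWED_PATTERNS: readonly string[] = [
  "qa-*@*",
  "qa-smoke-*@*",
  "qa-smoketest@*",
  "qa-proplan-test@*",
];

// Membership check, not shape check: anything outside the list is rejected.
function assertAllowedPattern(pattern: unknown): string {
  if (typeof pattern !== "string" || !ALLOWED_PATTERNS.includes(pattern)) {
    throw new Error(`Pattern ${JSON.stringify(pattern)} is not in the allowlist`);
  }
  return pattern;
}

// Escape every regex metacharacter first, then expand the escaped `*` into `.*`.
// Doing it in this order keeps the dot in `.*` from being escaped as a literal dot.
function wildcardToRegExp(pattern: string): RegExp {
  const escaped = pattern.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  return new RegExp("^" + escaped.replace(/\\\*/g, ".*") + "$", "i");
}

const purgeRegex = wildcardToRegExp(assertAllowedPattern("qa-*@*"));
console.log(purgeRegex.test("qa-edu-locked@example.edu")); // true
console.log(purgeRegex.test("lister@rotatingroom.com")); // false
```

Checking membership before building the regex means a permissive shape such as `*@*` never reaches the matching step at all.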
Implementation blocker (flagged in Round 3 review): the EnsureQaResetAllowed middleware contains a || app()->environment('testing') bypass that makes the hostname check a no-op during PHPUnit feature tests. If Task 3 naively calls withServerVariables while APP_ENV=testing, the middleware passes via the bypass rather than the hostname check — so the adversarial probe tests nothing. Resolution options:
1. Mock the environment in each probe test: `$this->app->detectEnvironment(fn () => 'production')` before the request. Asserts the hostname gate fires in a non-testing env.
2. Add a narrow config, `config('app.testing_host_bypass') === true`, defaulting to false and explicitly set to true only in tests that need the bypass (gate tests in Task 1 set it to false).
3. Drop the bypass entirely and require all feature tests to set `HTTP_HOST=qa.rotatingroom.com` when calling `/testing/reset`. Simplest: no production-code branch.
Preferred: Option 3 (simplest, no prod-code branch). Update Task 1's EnsureQaResetAllowed middleware to remove the testing bypass, and update any existing feature test that hits /testing/reset to set the Host header.
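Why the bypass defeats the probe can be seen by modelling only the hostname layer as a pure function. This is an illustrative TypeScript model, not the PHP middleware: `hostGateWithBypass`, `hostGateStrict`, and the non-QA host used in the example are hypothetical, and the token and route-loading layers are left out.

```typescript
interface GateInput {
  env: string;
  host: string;
}

const QA_HOST = "qa.rotatingroom.com";

// Shape flagged in Round 3: the testing-env branch short-circuits the hostname check.
function hostGateWithBypass({ env, host }: GateInput): boolean {
  return host === QA_HOST || env === "testing";
}

// Option 3: no bypass, so feature tests must send the QA Host header themselves.
function hostGateStrict({ host }: GateInput): boolean {
  return host === QA_HOST;
}

// An adversarial probe running under PHPUnit (env = testing) with a non-QA host:
const probe: GateInput = { env: "testing", host: "staging.example.test" };
console.log(hostGateWithBypass(probe)); // true: the probe passes for the wrong reason
console.log(hostGateStrict(probe)); // false: the hostname gate actually fires
```

With the strict variant, a probe that expects a rejection and gets one is evidence about the hostname gate itself rather than about the environment the test happened to run in.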
Commit: test(testing): adversarial gate-bypass probe; drop testing-env bypass
Files:
- Modify: `.claude/skills/qa-smoke/SKILL.md` section 2 ("Clean previous state")
- Modify: `scripts/smoke/run.sh` (if exists) or inline in skill
Step 1: Add reset step to skill — QA-only, gated on $TARGET (Codex P1 finding).
/qa-smoke is a multi-environment entry point (qa, staging, production, local). Reset MUST NOT run against any env except QA. Hard-gate the reset call on $TARGET:
# Section 2.1: reset target database to baseline state — QA ONLY.
if [ "$TARGET" != "qa" ]; then
echo "Skipping reset: target is '$TARGET' (reset only runs against qa)."
echo " - staging/production: reset is blocked by multiple gates AND is catastrophic; never automate."
echo " - local: developer manages their own DB reset via 'php artisan migrate:fresh --seed --seeder=Database\\Seeders\\TestDatabaseSeeder --force'."
else
echo "Resetting QA database..."
RESET_START=$(date +%s)
# Test curl's exit status directly in the `if` condition: any intervening command
# (date, echo, assignment) would overwrite $? before it could be checked.
if ! curl -sf -X POST "https://qa.rotatingroom.com/testing/reset" \
-H "X-Testing-Token: $QA_TESTING_TOKEN" \
-H "Content-Type: application/json" \
--max-time 60; then
echo "ERROR: Reset failed — aborting smoke run. Running against un-reset QA defeats the purpose."
exit 2
fi
RESET_END=$(date +%s)
RESET_DURATION=$(( RESET_END - RESET_START ))
echo "Reset complete in ${RESET_DURATION}s"
# Telemetry: log reset duration to the smoke dashboard (non-blocking)
curl -s -X POST "http://localhost:3456/api/smoke/reset-telemetry" \
-H 'Content-Type: application/json' \
-d "{\"env\":\"$TARGET\",\"durationSec\":$RESET_DURATION}" --max-time 5 2>/dev/null || true
fi
Note on the if ! curl pattern: the reset-failure check is intentionally the first thing that runs after curl returns. Any intervening commands (date, echo, assignment) would overwrite $?, making a subsequent [ $? -ne 0 ] test meaningless: it would check the last command's status, not curl's. This is the fail-open pattern Codex Round 3 flagged in the Round-2 draft of the plan.
423 Locked retry behavior (per Risk 14, v2 review): when /testing/reset returns 423 Locked (another operator's reset is in flight), the skill script retries up to 3 times with 30s backoff before aborting — the second operator's suite queues cleanly behind the first rather than failing immediately. Pseudocode (uses explicit status-code check rather than -sf, which has cross-version quirks around -w output when --fail triggers):
RESET_ATTEMPTS=0
MAX_ATTEMPTS=3
while true; do
HTTP_CODE=$(curl -s -o /tmp/reset-out.json -w '%{http_code}' -X POST \
"https://qa.rotatingroom.com/testing/reset" \
-H "X-Testing-Token: $QA_TESTING_TOKEN" \
-H "Content-Type: application/json" \
--max-time 60)
if [ "$HTTP_CODE" -ge 200 ] && [ "$HTTP_CODE" -lt 300 ]; then
break # success — 2xx
fi
if [ "$HTTP_CODE" = "423" ] && [ "$RESET_ATTEMPTS" -lt "$MAX_ATTEMPTS" ]; then
RESET_ATTEMPTS=$(( RESET_ATTEMPTS + 1 ))
echo "Reset in progress (another operator); waiting 30s (attempt $RESET_ATTEMPTS/$MAX_ATTEMPTS)..."
sleep 30
continue
fi
echo "ERROR: Reset failed with HTTP $HTTP_CODE — aborting."
cat /tmp/reset-out.json 2>/dev/null
exit 2
done
Commit: feat(qa-smoke): call reset via Playwright setup project; retry on 423, abort on other failures
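Since the commit moves the reset call into a Playwright setup project, the same retry rules could be expressed in TypeScript roughly as follows. This is a sketch under assumptions: `nextResetAction` and `resetWithRetry` are hypothetical names, and the HTTP call and the delay are injected as `post` and `sleep` so the loop can be exercised without a network.

```typescript
type ResetAction = "done" | "retry" | "abort";

// Mirrors the shell loop: 2xx succeeds, 423 retries while retries remain, anything else aborts.
function nextResetAction(status: number, retriesUsed: number, maxRetries: number): ResetAction {
  if (status >= 200 && status < 300) return "done";
  if (status === 423 && retriesUsed < maxRetries) return "retry";
  return "abort";
}

// `post` is assumed to perform the POST to /testing/reset and resolve with the HTTP status code.
async function resetWithRetry(
  post: () => Promise<number>,
  sleep: (ms: number) => Promise<void>,
  maxRetries: number = 3,
  backoffMs: number = 30000,
): Promise<void> {
  let retriesUsed = 0;
  for (;;) {
    const status = await post();
    const action = nextResetAction(status, retriesUsed, maxRetries);
    if (action === "done") return;
    if (action === "abort") throw new Error(`Reset failed with HTTP ${status}`);
    retriesUsed += 1;
    await sleep(backoffMs);
  }
}
```

Keeping the decision in a pure function makes the boundary cases (the fourth consecutive 423, a 5xx on the first attempt) easy to test in isolation from timing.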
Files:
- Create: `docs/handoffs/2026-04-XX-smoke-reset-canary.md` (canary results, populated when Task 4a runs)
Steps:
- `pg_dump` current QA DB to a local `rotatingroom_qa_canary` database. Do NOT truncate QA.
- Point a local Laravel instance at the canary DB (temporary `.env.canary` or connection override).
- Run `/testing/reset` 20 times against the canary. Record: p50/p95/p99, any FK violations, any orphan rows, any Stripe quota warnings, Mailpit reachability.
- If any iteration fails, fix before proceeding to Phase B Task 4.
Exit criterion: 20 clean iterations on canary; handoff doc published with metrics.
Commit: docs(smoke): local canary results for reset loop
Files:
- Create: `docs/handoffs/2026-04-XX-smoke-reset-cutover.md` (populated with actual date when Phase B ships)
Before Task 4 lands, post in #qa:
"Heads up — starting [date], every smoke run resets the QA database to a clean seeded state. If you're doing manual UI testing on QA, please (a) finish your session before [time], or (b) ping this thread to request a reset-free window."
Track acknowledgements from Mahmoud + Megan before landing Task 4.
Commit: docs(qa): cut-over announcement for deterministic reset
Files:
- Modify: `.claude/skills/qa-smoke/SKILL.md` (add pre-reset coordination step before reset call)
- Create: `app/Http/Controllers/TestingController@activeSessions` endpoint (read-only; returns active Backpack admin sessions + recent login activity)
Why this is simpler than the original 4h window (per Megan's feedback + Gaurav direction): the original plan proposed an operator-settable QA_RESET_ALLOWED=0 window that auto-expires after 4 hours. Megan correctly pointed out that QA review sessions can bleed into meetings, support calls, or next-day follow-ups — a hard 4h cap creates false urgency. A scheduled auto-resume is also a new moving part to fail.
Simpler replacement: reset only runs when someone invokes /qa-smoke. No scheduled window, no auto-expire, no env-flag toggling. The skill adds a pre-reset coordination check:
# New step in qa-smoke skill, before calling /testing/reset:
ACTIVE=$(curl -sf -H "X-Testing-Token: $QA_TESTING_TOKEN" \
"https://qa.rotatingroom.com/testing/active-sessions")
if echo "$ACTIVE" | jq -e '.backpack_admin_active == true' >/dev/null; then
echo "⚠️ Active Backpack admin session on QA — likely someone doing manual review."
echo " Last activity: $(echo "$ACTIVE" | jq -r '.last_admin_activity')"
echo ""
echo "Options:"
echo " 1. Continue anyway (their session gets wiped)"
echo " 2. Abort and ping #qa to coordinate"
read -r -p "Choice [1/2]: " CHOICE
[ "$CHOICE" = "2" ] && { echo "Aborted. Coordinate in #qa."; exit 3; }
fi
Why this works: we already coordinate for any shared-resource action on QA. The pre-run check surfaces the conflict at the moment it matters (not via a timed window that may or may not still be active), gives the operator an explicit choice, and costs nothing when QA is idle (the fast path is the common path: backpack_admin_active == false).
The active-sessions endpoint: read-only, returns JSON {backpack_admin_active: bool, last_admin_activity: iso8601, recent_logins: [...]}. Gated by the same 3-layer stack as /testing/reset (Task 1) — same token, same hostname, same env flag. No destructive side effects, but the information it returns is sensitive, so gate it the same way.
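The response shape described above could be consumed from TypeScript along these lines. The interface follows the JSON fields named in the plan; `preResetDecision`, the nullable `last_admin_activity`, and the element type of `recent_logins` are assumptions for illustration.

```typescript
interface ActiveSessions {
  backpack_admin_active: boolean;
  last_admin_activity: string | null; // ISO 8601; null for "no activity recorded" is an assumption
  recent_logins: unknown[]; // element shape is not specified in the plan
}

type PreResetDecision = "proceed" | "prompt";

// Only an explicit `true` triggers the prompt, matching the jq test `.backpack_admin_active == true`.
function preResetDecision(body: string): PreResetDecision {
  const parsed = JSON.parse(body) as Partial<ActiveSessions>;
  return parsed.backpack_admin_active === true ? "prompt" : "proceed";
}

const idle = '{"backpack_admin_active":false,"last_admin_activity":null,"recent_logins":[]}';
console.log(preResetDecision(idle)); // "proceed"
```

Unlike a shell pipeline that falls through when the response is empty, `JSON.parse` throws on an unreadable body, so an unreachable endpoint surfaces as an error rather than as a silent "proceed".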
Commit: feat(qa-smoke): pre-reset active-session coordination (replaces 4h window)
Replace adminUpdateUser(demoId, {is_fraudulent: 0}) cleanup call (currently a no-op) with a reliance on the reset contract — the fixture guarantees the exact starting state, and no cleanup is needed because the next run's reset handles it.
Commit: test(smoke): migrate spec 34 to use fraudFlaggedRoom fixture
Replace all seededDemo logins in tests/playwright/journeys/edu-*.spec.ts with the qaEduUnverified / qaEduVerified / qaEduLocked personas from the Task 2 catalog, as each scenario demands. Remove test.skip() fallbacks that existed for the unknown-state case.
Commit: test(journeys): use .edu personas; remove silent-skip fallbacks
Spec 21's lockout test currently races the 5-failed-attempt counter; with the qaEduLocked persona pre-seeded to failed_attempts=5, the test begins from locked state deterministically.
Commit: test(smoke): spec 21 uses qaEduLocked fixture for deterministic lockout
Spec 18 switches from ad-hoc payment-listing.json cleanup to relying on the reset contract. The reset contract comprises Task 1's /testing/reset endpoint + Task 0c's StripeTestReset helper (the endpoint calls the helper internally — see Task 1 Step 3 for the integration point). Remove savePaymentListing(null) workaround.
Commit: test(smoke): spec 18 relies on reset contract; remove null-marker workaround
Once specs rely on seeded fixtures instead of inheriting ephemeral listings, the ownership-guard helper probeValidRoomId in tests/playwright/utils/qa-helpers.ts:59 becomes redundant (it's the only such helper on master; getListingIdOwnedByCurrentLister and readOwnedListingId were added on branch tests/smoke-fixes-26-35-42-ownership-guard in PR #4283 — if that PR has merged by the time this task runs, include those in the deprecation too; if not, just probeValidRoomId). Mark @deprecated; specs 35/42/etc. switch to fixture-based room IDs from the persona catalog. Remove the helper(s) entirely once no specs import them (measured via CI grep).
Commit: refactor(smoke): deprecate ownership-guard; specs use seeded room fixtures
Shock-test deferrals (noted here, not blocking Phase A–C):
- Reset latency under real QA load is unmeasured. Task 0's 20-run local baseline is a floor. QA carries webhook traffic, Horizon queues, cron jobs — advisory lock contention + real Horizon row-locks could push p95 higher. Week 1 post-armament, Task 11's telemetry widget should flag if observed p95 > local p95 × 2 ("QA is significantly slower than predicted — investigate lock contention / Horizon drain").
- Automated state-drift detector beyond reviewer tagging. Success metric #1 relies on reviewers explicitly tagging `state-drift`. An independent check catches drift that slips past reviewers: a weekly cron that compares the current QA `users` count against baseline (should be exactly 11 + any ephemeral users created since the most recent reset, all with `created_at > last-reset-time`). Deferred to post-launch as a Task 12 extension.
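The deferred drift check could look roughly like the sketch below, assuming the cron has already loaded user IDs and creation timestamps. `detectUserDrift`, `UserRow`, and `DriftReport` are hypothetical names; the seeded IDs are the 11 persona IDs from the Task 2 catalog.

```typescript
interface UserRow {
  id: number;
  createdAt: Date;
}

interface DriftReport {
  missingSeeded: number[]; // seeded personas that no longer exist
  staleUnseeded: number[]; // non-persona users that predate the last reset
}

// The 11 persona IDs from personas.php.
const SEEDED_USER_IDS: readonly number[] = [1001, 1002, 1003, 1004, 1005, 1006, 1007, 1010, 1011, 1012, 1013];

function detectUserDrift(users: readonly UserRow[], seededIds: readonly number[], lastReset: Date): DriftReport {
  const present = new Set(users.map((u) => u.id));
  const seeded = new Set(seededIds);
  return {
    missingSeeded: seededIds.filter((id) => !present.has(id)),
    staleUnseeded: users
      .filter((u) => !seeded.has(u.id) && u.createdAt.getTime() <= lastReset.getTime())
      .map((u) => u.id),
  };
}
```

An empty report on both fields corresponds to the baseline in the bullet above: exactly the 11 personas plus ephemeral users created after the most recent reset.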
Status (per v2 review): Task 10 is thematically adjacent but NOT on the determinism critical path. The 7 safeguards it requires (dry-run arming, count/percentage caps, soft-delete inclusion, primary-DB read, weekly Slack summary, adversarial probes, DO Spaces versioning precondition) plus a 3-week arming runbook amount to a 1-2 day sub-project that would pull this plan's center of gravity toward Spaces cleanup.
Decision: spin Task 10 out into its own plan/issue, owned by whoever picks up the Spaces-orphan work. This plan ships Phases A–D without Task 10; the orphan-image cleanup work happens in parallel or after.
Rationale: Ahmed's Concern #6 raised the safeguards in the context of this plan, but the safeguards ARE the plan for Task 10 — they're the whole surface area. Keeping Task 10 here compounds scope; splitting it addresses the concern at the right level (its own review cycle) without dragging this plan.
Issue to file: "Weekly DO Spaces orphan-image pruner — production-safe rollout" (references this plan's Appendix A.6 for the state class and Ahmed's 7 safeguards). All design details previously under Task 10 move to that new plan.
Design details (retained here for reference until split issue lands):
Files:
- Create: `app/Console/Commands/PruneOrphanListingImages.php`
- Create: `config/spaces_prune.php` (dedicated config with hard bounds, never read from arbitrary env)
- Modify: `routes/console.php` (schedule weekly, `onOneServer()` + `withoutOverlapping()`)
- Create: `tests/Feature/PruneOrphanListingImagesTest.php` (adversarial: empty-room, replica-lag, soft-deleted scenarios)
Why this task needs production-grade safeguards (Ahmed Concern #6): a bug in the orphan-detection join (missed relation, read-replica lag, soft-deleted rooms not considered) could delete live listing images at scale. Unlike Task 0c's Stripe helper (which operates on Stripe test-mode — low blast radius), this command operates on production DO Spaces — real user photos, not a test sandbox. The destructive-action surface is large enough that the command needs layered defenses before it's trusted to run unattended.
Seven safeguards:
1. Dry-run by default. `SPACES_PRUNE_EXECUTE` env defaults to `false`. Without the explicit opt-in, the command logs "WOULD DELETE: N objects" but calls no delete APIs. First 3 runs must be dry-run; the Slack summary shows what it would have deleted and an operator reviews before arming. Documented in `docs/handoffs/spaces-prune-armament.md`.
2. Hard deletion cap per run. `config('spaces_prune.max_deletions_per_run')` = 100 (compile-time default, not env-driven). If the orphan count exceeds the cap, the command aborts and posts to `#qa` with "REFUSED: N orphans detected, cap is M. Investigate before raising the cap." Legitimate orphan volume from a healthy smoke cycle is <20/week.
3. Percentage cap. Refuse to delete more than 5% of total objects under `listings/*` in a single run. Catches the failure mode where the room set is empty (e.g., test database accidentally used as source) and everything looks orphaned. Both the count cap AND the percentage cap must be under their limits; either alone is insufficient.
4. Include soft-deleted rooms in the "still in use" set. `Room::withTrashed()->pluck('id')`: a soft-deleted listing may be restored and still need its images. An orphan is ONLY an S3 object whose room ID (parsed from path prefix) isn't in the combined active + soft-deleted set.
5. Force primary-DB read. Wrap the room-ID query in `DB::connection('pgsql')->useWriteConnection()` (or equivalent). Replica lag of even a few seconds could make freshly-created rooms look orphaned on the replica while their images are already in S3.
6. Weekly `#qa` summary. After every run (dry OR armed), post: "Spaces prune: N orphans detected, M deleted, K bytes freed, {dry-run|armed}." Silent destructive jobs rot; visibility keeps them honest. If no summary posts, something's wrong with the scheduled run.
7. Adversarial probe test. `PruneOrphanListingImagesTest` covers:
   - `test_aborts_when_rooms_table_is_empty`: refuses to treat empty-table as "all orphaned"
   - `test_includes_soft_deleted_rooms_in_still_in_use_set`: does not delete images for soft-deleted listings
   - `test_count_cap_halts_at_threshold`: at 101 orphans with cap=100, aborts + posts to Slack
   - `test_percentage_cap_halts_at_threshold`: at 5.1% of `listings/*` orphaned, aborts
   - `test_dry_run_makes_no_delete_calls`: asserts zero S3 delete calls when `SPACES_PRUNE_EXECUTE=false`
   - `test_posts_slack_summary_every_run`: both dry-run and armed runs post summaries
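How the caps and the dry-run flag combine can be sketched as one decision function. This TypeScript model is illustrative only (the real command is PHP): `decidePrune` and its input shape are hypothetical, and the ordering of the checks, with the caps applied even in dry-run, is an assumption rather than something the safeguards specify.

```typescript
interface PruneInput {
  orphanCount: number;
  totalObjects: number; // all objects under listings/*
  knownRoomCount: number; // active + soft-deleted rooms, read from the primary DB
  execute: boolean; // SPACES_PRUNE_EXECUTE
  maxDeletions?: number; // defaults to 100
  maxFraction?: number; // defaults to 0.05 (5%)
}

type PruneDecision =
  | { action: "refuse"; reason: string }
  | { action: "dry-run"; wouldDelete: number }
  | { action: "delete"; count: number };

function decidePrune(input: PruneInput): PruneDecision {
  const maxDeletions = input.maxDeletions ?? 100;
  const maxFraction = input.maxFraction ?? 0.05;
  // An empty room set would make every object look orphaned, so refuse outright.
  if (input.knownRoomCount === 0) {
    return { action: "refuse", reason: "rooms table is empty" };
  }
  if (input.orphanCount > maxDeletions) {
    return { action: "refuse", reason: `${input.orphanCount} orphans detected, cap is ${maxDeletions}` };
  }
  if (input.totalObjects > 0 && input.orphanCount / input.totalObjects > maxFraction) {
    return { action: "refuse", reason: "orphans exceed the percentage cap" };
  }
  if (!input.execute) {
    return { action: "dry-run", wouldDelete: input.orphanCount };
  }
  return { action: "delete", count: input.orphanCount };
}
```

Both caps are checked independently: a bucket with 1,000 objects trips the percentage cap at 51 orphans, well before the count cap of 100 would.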
Arming sequence (runbook, not code):
- Week 1: ships with `SPACES_PRUNE_EXECUTE=false`. Scheduled run posts weekly summary. Operator reviews the "would delete" counts for 3 consecutive weeks.
- Week 4+: if summaries are sane (small orphan counts matching expected smoke churn, no false positives for soft-deleted rooms), operator sets `SPACES_PRUNE_EXECUTE=true`. Summary continues weekly.
- Any week the count exceeds the cap: command aborts + `#qa` alert. Operator investigates before next run.
Rollback: SPACES_PRUNE_EXECUTE=false instantly reverts to dry-run. No data recovery needed in normal operation. If catastrophic delete ever happens (shouldn't — count+percentage caps prevent it): DO Spaces supports versioning + 30-day recovery if versioning was enabled. Arming precondition: verify DO Spaces versioning is enabled on the bucket before switching to SPACES_PRUNE_EXECUTE=true.
Commit: feat(cleanup): weekly Spaces orphan pruner with dry-run, count cap, percentage cap, and weekly Slack summary
Files:
- Modify: `~/dev-server/api/smoke.js` (add `/api/smoke/reset-telemetry` endpoint)
- Modify: `tests/playwright/qa-smoke/dashboard-golden.html` (p95 widget + state-drift counter)
- Modify: `.claude/skills/smoke-feedback-review/SKILL.md` (add `state-drift` disposition)
- Modify: the smoke-feedback dashboard UI (`~/dev-server/sites/smoketest/...` or labs equivalent) to accept the new tag
Deliverable 1 — reset telemetry: smoke runner POSTs {durationSec, env, runKey} after each reset; widget shows p50/p95/p99 over last 10 runs; alert posts to #qa if p95 > 30s.
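The widget's numbers could be computed as sketched below. This assumes the nearest-rank percentile definition, which the plan does not specify; `percentile` and `summarizeResets` are hypothetical names, and the sample durations are made up for illustration.

```typescript
// Nearest-rank percentile over an unsorted sample; p is in (0, 100].
function percentile(samples: readonly number[], p: number): number {
  if (samples.length === 0) throw new RangeError("no samples");
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.max(1, Math.ceil((p * sorted.length) / 100));
  return sorted[rank - 1] as number;
}

interface ResetSummary {
  p50: number;
  p95: number;
  p99: number;
  alert: boolean;
}

// Summarizes the most recent `windowSize` reset durations and flags p95 above the threshold.
function summarizeResets(durationsSec: readonly number[], windowSize: number = 10, p95LimitSec: number = 30): ResetSummary {
  const recent = durationsSec.slice(-windowSize);
  const p95 = percentile(recent, 95);
  return { p50: percentile(recent, 50), p95, p99: percentile(recent, 99), alert: p95 > p95LimitSec };
}

// Made-up durations: eleven runs, of which only the last ten are considered.
const summary = summarizeResets([90, 12, 14, 11, 13, 15, 12, 16, 14, 13, 45]);
console.log(summary); // { p50: 13, p95: 45, p99: 45, alert: true }
```

With only 10 samples, nearest-rank p95 and p99 both equal the slowest run, so a single outlier triggers the alert; a larger window or an interpolating percentile would be less sensitive.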
Deliverable 2 — state-drift disposition: add a new disposition option alongside existing confirmed / test-needs-fix / not-a-bug. Reviewer documentation: "Use state-drift when the finding exists because prior-run state leaked into this run (not a test-logic bug, not an app bug). Examples: demo user still fraud-flagged from spec 34; listing.json owned by a prior run's ephemeral lister; subscription left past_due from spec 18." Dashboard counts state-drift per run and exposes the metric consumed by Success Metric #1.
Deliverable 3 — historical backfill: Task 11 retrospectively codes the last 4 smoke audits (2026-04-07, 2026-04-14, 2026-04-21, plus one post-Phase-A run) with dispositions so the baseline has a distribution, not a single point. Adds the distribution to docs/handoffs/2026-04-24-smoke-reset-baseline.md.
Commit: feat(smoke-dashboard): reset-p95 widget + state-drift disposition + historical baseline
Add to /post-deploy-verify skill: a monthly check of access logs for /testing/* endpoints against staging/prod hostnames. Any hit → P1 alert.
Commit: chore(audit): monthly gate-audit for testing endpoints
Phase A (infra) — Tasks 0, 0b, 0c, 1, 2, 3
↓ (measurement → CI checks → Stripe helper → gate → personas → gate-probe)
↓ Task 0 delivers before Task 1 starts (metric baselines)
↓ PR #3722 (pg_dump backup) MUST land before Phase B (catastrophic-loss rollback precondition)
Phase B (runner) — Tasks 4, 4b, 4c
↓ (Playwright setup project + cut-over announcement + reset-free-window escape hatch)
Phase C (specs) — Tasks 5, 6, 7, 8, 9 (can parallelize; each a separate PR)
↓ (specs rely on new determinism)
Phase D (observability) — Tasks 11, 12 (can run concurrently with Phase C; Task 10 split into a separate follow-up plan — see Task 10 placeholder)
Phase E (follow-up, conditional) — nightly snapshot approach from Alternative 7
↓ (triggered if Phase A-D reset p95 exceeds 30s for 2 consecutive weeks)
- Feature branch: `feature/smoke-deterministic-reset` (from master)
- Sub-branches:
  - `feature/smoke-deterministic-reset/task-0-baseline`: Phase A Task 0 (measurement day, docs-only)
  - `feature/smoke-deterministic-reset/task-0b-ci`: Phase A Task 0b (seeder-migration CI)
  - `feature/smoke-deterministic-reset/task-0c-stripe`: Phase A Task 0c (Stripe reset helper)
  - `feature/smoke-deterministic-reset/task-1-gate`: Phase A Task 1 (triple-gate + CSRF + mutex)
  - `feature/smoke-deterministic-reset/task-2-personas`: Phase A Task 2 (persona catalog)
  - `feature/smoke-deterministic-reset/task-3-probe`: Phase A Task 3 (gate-bypass adversarial probe)
  - `feature/smoke-deterministic-reset/task-4-runner`: Phase B Task 4 (Playwright setup project)
  - `feature/smoke-deterministic-reset/task-4b-cutover`: Phase B Task 4b (announcement)
  - `feature/smoke-deterministic-reset/task-4c-window`: Phase B Task 4c (reset-free window)
  - Phase C sub-branches: one per spec cluster (5-9)
  - Phase D sub-branches: one per observability task (11-12; Task 10 shipped via its own follow-up plan on a separate branch)
Merge order (hard preconditions):
- Task 0 (baseline) must land before Task 1 — metrics need real numbers.
- PR #3722 (pg_dump backup) must land before Phase B — catastrophic-loss rollback.
- Task 0c (Stripe) merges independently of gate work; no ordering constraint with Task 1.
- Tasks 1, 2, 3 must all pass CI before Phase B lands.
- Task 4b (cut-over announcement) posts ≥24h before Task 4 merges.
- Phase C can start once Phase B is on staging and one clean run observed.
Orchestrator: main session on feature/smoke-deterministic-reset manages sub-branch merges.
Each sub-branch passes /iterative-review before merging into the feature branch. Feature branch passes one final /ready-for-review cycle before PR to master.
Confirmed review.
A lot of the current smoke tests I'm re-testing are a result of the QA environment constantly changing, so this in theory will help reduce the extra noise added by those false positives.
Only slight concern: the ability to pause the 4h reset if we're in the middle of a manual review. Not all of our QA time frames align, and they can expand over meetings and support calls or bleed into the next day. But I think it's mostly doable.