Status: Proposed. Findings from data pipeline investigation during V1 instrumentation work.
Companion addendum: Technical Approach Addendum (unified alerts, Snowflake integration path, CloudFront geo root cause)
Builds on: KPI Instrumentation Analysis which covers event model, state machines, funnel, and KPI definitions.
Deep-dive: Data Pipeline Investigation (full lineage traces, VGW data platform inventory, root cause analysis)
Roadmap: Rollout Observability Roadmap (phased ticket breakdown)
Geo events need to exist in two systems. They serve different audiences asking different questions at different timescales.
| Question | Who asks | Timescale |
|---|---|---|
| Is the verification failure rate spiking? | On-call engineer | Minutes |
| Is GeoComply SDK latency degrading? | On-call engineer | Minutes |
| Are users being signed out mid-session? | On-call engineer | Hours |
| Is the canary group reaching the lobby? | Feature owner | Hours |
| Is geo causing more friction than control? | Feature owner | Hours |
Loki handles rate, count, and quantile queries over sliding time windows and drives alert triggering. Aggregate rates are sufficient; no per-player precision is needed.
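For illustration, the kind of sliding-window LogQL a dashboard panel or alert rule would use. This is a hedged sketch: the stream labels, the `json` field name, and the event values are assumptions, not the actual instrumented schema.

```logql
# Verification failure rate over a 5-minute sliding window
# (hypothetical labels and event names)
sum(rate({app="gp-game-client"} | json | event="geo_verification_failed" [5m]))
  /
sum(rate({app="gp-game-client"} | json | event=~"geo_verification_(failed|succeeded)" [5m]))
```

An alert rule fires when this ratio crosses a threshold; note it is a blended aggregate with no notion of which players failed.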
Status: DONE. 9 events instrumented, 29-panel dashboard, 6 alert rules, P50/P95 latency tracking.
| Question | Who asks | Timescale | Why Loki can't answer it |
|---|---|---|---|
| D7 retention for players whose first session included geo? | Product manager | Weeks | Requires joining first event with session 7 days later |
| How many unique players failed verification this week? | Product manager | Days | Loki counts events, not distinct players |
| Did a specific player complete all funnel steps? | Support engineer | Per-session | Requires joining events by sessionId |
| What % of new signups never reach the lobby? | Growth team | Weeks | Requires correlating registration with absence of lobby events |
| Does geo friction affect high-value players differently? | Analytics team | Months | Requires joining geo events with player value segment |
| Geo completion rate by US state? | Compliance/legal | Months | No GROUP BY with distinct counts in LogQL |
| Are canary group players depositing less than control? | Product manager | Weeks | Requires joining geo assignment with transaction data |
Snowflake handles SQL queries with joins, window functions, distinct counts, and cross-domain data. Per-player precision is required.
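As a sketch of the Snowflake side: the "unique players who failed this week" question reduces to a `COUNT(DISTINCT …)` once geo events land as rows. The table and column names here are hypothetical, since no geo table exists in Snowflake yet.

```sql
-- Distinct players with a failed verification in the last 7 days.
-- CLEANSED.GEO_EVENTS is hypothetical; nothing like it exists today.
SELECT COUNT(DISTINCT player_id) AS players_failed
FROM CLEANSED.GEO_EVENTS
WHERE event_type = 'GEO_VERIFICATION_FAILED'
  AND event_time >= DATEADD('day', -7, CURRENT_TIMESTAMP());
```

This is exactly the query shape LogQL cannot express: it counts distinct players, not event lines.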
Status: NOT YET BUILT. Data pipeline investigation complete. Integration path identified.
- Loki answers: "What's happening?" (aggregate, real-time, operational)
- Snowflake answers: "What happened, to whom, and what was the impact?" (per-player, historical, analytical)
The same event (e.g., a verification failure) needs to exist in both. In Loki it increments a counter on a dashboard. In Snowflake it's a row tied to a specific player that can be joined with their registration date, deposit history, and future sessions to determine whether that failure caused them to churn.
Every metric today is a blended average. A 15% verification failure rate could mean 40% of new players fail (catastrophic) and 5% of returning players fail (normal). When the canary scales from 5% to 100%, 95% of active players hit geo for the first time. Without segmentation, dashboards become unreadable.
Two distinct problems were being conflated:
Geo-specific (belongs in geo instrumentation, queryable in Loki):
| Dimension | Type | Description |
|---|---|---|
| isFirstGeoVerification | boolean | Has this player ever completed geo verification? |
| geoVerifyCount | integer | How many successful verifications? |
| verificationSequence | integer | Nth attempt within this session (fresh vs retry) |
Source: Server-side flag in pok-user (we own it).
Platform-wide (belongs in LogManager or platform analytics, queryable in Snowflake):
| Dimension | Why it's not geo-specific |
|---|---|
| playerTenureDays | Useful on every event in the system |
| isNewAccount | Useful on every event in the system |
| daysSinceLastVisit | Session-level attribute |
| D1/D7/D30 retention | Cross-session, cross-day correlation |
| Unique player counts | Requires COUNT(DISTINCT) |
| Cohort analysis | Requires grouping by first-event date |
These require SQL (Snowflake), not LogQL (Loki).
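The retention rows above all reduce to self-joins over time, which LogQL cannot express. A hedged sketch of the D7 question, assuming a hypothetical GEO_EVENTS table joined against the existing LOGGED_IN events; the join key (authId) and column names are assumptions:

```sql
-- D7 retention for players whose first session included geo.
-- CLEANSED.GEO_EVENTS is hypothetical (not yet built);
-- CLEANSED.USER_EVENTSTORE_LOGGED_IN exists today.
WITH first_geo AS (
    SELECT authId, MIN(event_time) AS first_geo_at
    FROM CLEANSED.GEO_EVENTS
    GROUP BY authId
)
SELECT COUNT(DISTINCT l.authId) / COUNT(DISTINCT f.authId) AS d7_retention
FROM first_geo f
LEFT JOIN CLEANSED.USER_EVENTSTORE_LOGGED_IN l
    ON  l.authId = f.authId
    AND l.time >= DATEADD('day', 7, f.first_geo_at)
    AND l.time <  DATEADD('day', 8, f.first_geo_at);
```

The CTE finds each player's first geo event; the join looks for a login exactly one week later. Every row in the "Why Loki can't answer it" column is a variation on this join.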
- First-time players have lower friction tolerance (~50-80 units) vs returning players (~150-300 units). A verification failure that barely registers for a returning player is budget-destroying for a new one.
- Habituation to the geo step completes around visit 8-15. Track geoVerifyCount to verify empirically.
- The canary-to-100% transition creates a specific cohort: loyal players encountering new friction. Loss aversion predicts they'll react more negatively than brand-new players who never knew a frictionless flow.
Browser -> LogManager -> /log endpoint -> stdout -> CloudWatch -> Firehose -> Loki
Geo analytics events travel this path. Dashboards and alerts query Loki.
PostgreSQL Event Stores -> ECS Projectors -> S3 DLZ Buckets -> Snowpipe -> Snowflake
Five projector services run this pattern today:
| Projector | Source | S3 DLZ | Events |
|---|---|---|---|
| user-eventstore-snowflake-pm | aurora-pg-user | customer-dlz/user-eventstore/ | Registration, login, identity |
| game-eventstore-snowflake-pm | aurora-pg-game | game-dlz/v2_casino/ | Casino/slots |
| cdd-eventstore-snowflake | aurora-pg-cdd | customer-dlz/cdd-eventstore/ | KYC/AML |
| store-eventstore-snowflake-pm | aurora-pg-store | transaction-dlz/v2_store/ | Purchases |
| player-account-snowflake-pm | player-account DB | player-account-dlz/player-items/ | Wallet, items |
Additionally, 7 Kafka topics feed Snowflake via a Kafka Connect connector (connect-pok-events-{env}).
Geo events exist only in Loki. They do not reach Snowflake. The IDENTITY_LOGIN table in Snowflake has 5 geo columns (GEO_LOCATION_COUNTRY_CODE, GEO_LOCATION_SUBDIVISION_CODE, GEO_LOCATION_PROVIDER_NAME, GEO_LOCATION_PROVIDER_VERSION, GEO_LOCATION_SOURCE_TYPE) but all are hardcoded NULL. The schema was designed anticipating geo data that never arrived.
Three geo sources exist at login time. None reach Snowflake:
| Source | Available where | Why it's not in Snowflake |
|---|---|---|
| CloudFront headers (viewer_country, viewer_country_region) | Rendered into page globals, sent to Auth0 as query params, persisted to user_metadata.cookies, embedded in JWT | Filtered out by post-audit-login-event.ts cookie allow-list |
| Auth0 geoip (event.request.geoip) | Available in all Auth0 actions | Never read by any action |
| Raw IP (event.request.ip) | Sent in audit POST body | Stored but not geo-resolved |
The CloudFront geo data is captured, flows through Auth0, gets embedded in the JWT (used as jwtGeo in Loki events), but is filtered out by a cookie allow-list before reaching the user event store. A one-line fix to the allow-list would start flowing IP-based geo to Snowflake for all users.
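Once the allow-list fix lands and the IDENTITY_LOGIN geo columns stop arriving as NULL, the compliance question from the earlier table ("geo completion rate by US state") starts with a plain GROUP BY. A sketch, assuming the existing column names begin receiving CloudFront-derived values (the time filter is omitted because the table's timestamp column isn't documented here):

```sql
-- Logins by US state from CloudFront-derived geo.
-- Assumes GEO_LOCATION_* columns are populated after the allow-list fix;
-- today they are hardcoded NULL.
SELECT GEO_LOCATION_SUBDIVISION_CODE AS us_state,
       COUNT(*)                      AS logins
FROM CURATED.IDENTITY_LOGIN
WHERE GEO_LOCATION_COUNTRY_CODE = 'US'
GROUP BY 1
ORDER BY logins DESC;
```

This is IP-based geo, not GeoComply verification, so it establishes a baseline distribution rather than a compliance-grade answer.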
| Table | Key Fields | Use |
|---|---|---|
| CLEANSED.USER_EVENTSTORE_LOGGED_IN | email, platform, time, IP, authId, userAgent | Login events |
| CURATED.CUSTOMER_ATTRIBUTES_OUTPUT | registrationDate, lastLogInDate, valueSegmentTier | Player tenure |
| CURATED.ACCOUNT_ACTIVITY_SUMMARY | firstLoginDate, lastLoginDate, firstPlayDate | Retention fields |
| CURATED.IDENTITY_LOGIN | geo columns (all NULL) | Placeholder for geo data |
Retention baselines can be established from existing data before the canary scales.
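One way to start on that baseline now, using only the CURATED tables listed above. Treating "lastLoginDate at least 7 days after firstLoginDate" as a D7 proxy is an assumption about acceptable precision, not an agreed retention definition:

```sql
-- Weekly signup cohorts with a rough D7-return proxy from existing fields.
-- The proxy overcounts: it only proves the player returned on day 7 *or later*.
SELECT DATE_TRUNC('week', firstLoginDate) AS cohort_week,
       COUNT(*)                           AS players,
       COUNT_IF(DATEDIFF('day', firstLoginDate, lastLoginDate) >= 7)
                                          AS returned_d7_or_later
FROM CURATED.ACCOUNT_ACTIVITY_SUMMARY
GROUP BY 1
ORDER BY 1;
```

A proper D7 metric needs the per-login event table, but this gives a pre-canary reference point before any geo data exists.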
- Merge game client instrumentation PRs (Loki path)
- Merge dashboard + alerts PR (Grafana)
- Fix CloudFront geo allow-list in pok-auth0 (one-line change, establishes Snowflake baseline)
- Add geo fields to LOGGED_IN event in pok-user
- Validate dashboards with real traffic, tune alert thresholds
- Add isFirstGeoVerification server-side flag
- Segmented Loki dashboard panels (first-time vs repeat)
- New GEO_VERIFICATION_COMPLETED event type in pok-user event store
- Game client calls pok-user endpoint after verification (parallel to LogManager)
- Snowflake ingestion via existing pipeline (projector -> S3 -> Snowpipe)
- Retention baseline queries in Snowflake
- geoRolloutPhase dimension ("canary_5", "ga")
- Pre-populate first-geo flags for canary-period verifiers
- First-time user funnel dashboard with separate alert thresholds
All components are owned by our team except Snowflake task updates:
| Component | Owner |
|---|---|
| gp-game-client, pok-user, pok-auth0, pok-infra | Our team |
| pok-snowflake (IDENTITY_LOGIN task, new tables) | Data Engineering |