@sibljon
Last active April 29, 2026 23:13
Spec: Reporting Attribution from a Server-Side Caller (Spruce associateAttribution mutation)
<?php
/**
 * Spruce attribution reporting for l.sprucehealth.com.
 *
 * Wired into the click-resolution flow so each click on a pre-blessed link
 * lands a row in the Spruce `attributions` Looker model.
 *
 * Spec: https://gist.github.com/sibljon/bfd93706bd215806458f89686527bb2a
 */

const SPRUCE_ATTRIBUTION_ENDPOINT = 'https://msg-api.sprucehealth.com/graphql';
const SPRUCE_ATTRIBUTION_USER_AGENT = 'l-sprucehealth-com/1.0';
const SPRUCE_ATTRIBUTION_TIMEOUT_SECONDS = 5;
const SPRUCE_ATTRIBUTION_QUERY = <<<'GQL'
mutation AssociateAttribution($input: AssociateAttributionInput!) {
  associateAttribution(input: $input) {
    success
    errorCode
    errorMessage
  }
}
GQL;

/**
 * POST associateAttribution to msg-api, then relay any Set-Cookie response
 * headers back to the user (so the backend's `did` cookie minting works).
 *
 * Failures are logged and swallowed. Never throws. Never lets an attribution
 * problem affect the user-facing redirect.
 *
 * Call this BEFORE you call `header('Location: ...')` if the user has no
 * `did` cookie inbound (so we can relay the freshly-minted Set-Cookie).
 * Call it AFTER `fastcgi_finish_request()` if the user already has `did`
 * (fire-and-forget, no Set-Cookie needed).
 *
 * @param string $requestUrl Full URL the user hit, e.g. "https://l.sprucehealth.com/x123?utm_source=newsletter".
 * @param string $cookieHeader Raw inbound Cookie header ($_SERVER['HTTP_COOKIE'] ?? '').
 */
function spruceReportAttribution(string $requestUrl, string $cookieHeader): void {
    try {
        $parsed = parse_url($requestUrl);
        if (!is_array($parsed) || empty($parsed['host'])) {
            error_log('spruceReportAttribution: unparseable requestUrl');
            return;
        }
        $hostname = $parsed['host'];
        $pathname = $parsed['path'] ?? '/';
        $scheme = $parsed['scheme'] ?? 'https';
        $urlValue = $scheme . '://' . $hostname . $pathname;
        $rawQuery = $parsed['query'] ?? '';

        // Build values: synthetic `url` first, then one entry per inbound query
        // param. Split the raw query string ourselves rather than using
        // parse_str(), which collapses repeated keys (the spec requires us to
        // emit one entry per occurrence).
        $values = [['key' => 'url', 'value' => $urlValue]];
        if ($rawQuery !== '') {
            foreach (explode('&', $rawQuery) as $pair) {
                if ($pair === '') {
                    continue;
                }
                $eq = strpos($pair, '=');
                if ($eq === false) {
                    $values[] = ['key' => urldecode($pair), 'value' => ''];
                } else {
                    $values[] = [
                        'key' => urldecode(substr($pair, 0, $eq)),
                        'value' => urldecode(substr($pair, $eq + 1)),
                    ];
                }
            }
        }

        // JSON_INVALID_UTF8_SUBSTITUTE replaces invalid UTF-8 byte sequences
        // (e.g. a query value of "%FF" decoded to raw byte 0xFF) with U+FFFD
        // rather than failing the whole encode and dropping the row.
        $body = json_encode([
            'operationName' => 'associateAttribution',
            'query' => SPRUCE_ATTRIBUTION_QUERY,
            'variables' => [
                'input' => [
                    'values' => $values,
                    'origin' => $hostname,
                    'originDetails' => $pathname,
                ],
            ],
        ], JSON_UNESCAPED_SLASHES | JSON_INVALID_UTF8_SUBSTITUTE);
        if ($body === false) {
            error_log('spruceReportAttribution: json_encode failed: ' . json_last_error_msg());
            return;
        }

        $headers = [
            'Content-Type: application/json',
            'User-Agent: ' . SPRUCE_ATTRIBUTION_USER_AGENT,
        ];
        if ($cookieHeader !== '') {
            $headers[] = 'Cookie: ' . $cookieHeader;
        }

        $ch = curl_init(SPRUCE_ATTRIBUTION_ENDPOINT);
        curl_setopt_array($ch, [
            CURLOPT_POST => true,
            CURLOPT_POSTFIELDS => $body,
            CURLOPT_HTTPHEADER => $headers,
            CURLOPT_HEADER => true, // include response headers in output so we can read Set-Cookie
            CURLOPT_RETURNTRANSFER => true,
            CURLOPT_TIMEOUT => SPRUCE_ATTRIBUTION_TIMEOUT_SECONDS,
            CURLOPT_CONNECTTIMEOUT => SPRUCE_ATTRIBUTION_TIMEOUT_SECONDS,
            CURLOPT_FOLLOWLOCATION => false,
        ]);
        $response = curl_exec($ch);
        $httpStatus = (int) curl_getinfo($ch, CURLINFO_HTTP_CODE);
        $headerSize = (int) curl_getinfo($ch, CURLINFO_HEADER_SIZE);
        $curlErr = curl_error($ch);
        curl_close($ch);

        if ($response === false) {
            error_log(sprintf(
                'spruceReportAttribution: transport error origin=%s details=%s err=%s',
                $hostname, $pathname, $curlErr
            ));
            return;
        }

        $rawHeaders = (string) substr($response, 0, $headerSize);
        $rawBody = (string) substr($response, $headerSize);

        if ($httpStatus < 200 || $httpStatus >= 300) {
            error_log(sprintf(
                'spruceReportAttribution: http %d origin=%s details=%s body=%s',
                $httpStatus, $hostname, $pathname, substr($rawBody, 0, 500)
            ));
            // Don't relay Set-Cookie on a non-2xx response; those headers may
            // be from an error path we don't trust.
            return;
        }

        // Relay Set-Cookie response headers back to the user, but only if
        // we haven't already sent the user-facing response. In fire-and-forget
        // mode (after fastcgi_finish_request), headers_sent() will be true and
        // these calls are silently no-ops.
        if (!headers_sent()) {
            foreach (preg_split('/\r?\n/', $rawHeaders) as $line) {
                if (stripos($line, 'set-cookie:') === 0) {
                    header($line, false);
                }
            }
        }

        // Log GraphQL-layer failures (HTTP 200 with errors[] or success=false).
        $json = json_decode($rawBody, true);
        if (is_array($json)) {
            if (!empty($json['errors'])) {
                error_log(sprintf(
                    'spruceReportAttribution: graphql errors origin=%s details=%s errors=%s',
                    $hostname, $pathname, json_encode($json['errors'], JSON_UNESCAPED_SLASHES)
                ));
                return;
            }
            $payload = $json['data']['associateAttribution'] ?? null;
            if (is_array($payload) && ($payload['success'] ?? null) !== true) {
                error_log(sprintf(
                    'spruceReportAttribution: success=false origin=%s details=%s code=%s message=%s',
                    $hostname, $pathname,
                    $payload['errorCode'] ?? '',
                    $payload['errorMessage'] ?? ''
                ));
            }
        }
    } catch (\Throwable $t) {
        // Never let attribution problems affect the redirect.
        error_log('spruceReportAttribution: exception ' . $t->getMessage());
    }
}

/* --------------------------------------------------------------------------
 * Usage in the click-resolution handler.
 *
 * Adapt to your existing l.sprucehealth.com routing; this is the shape, not
 * a drop-in entry point.
 * -------------------------------------------------------------------------- */

// $destination = lookupPreBlessedLink($_SERVER['REQUEST_URI']); // your existing logic
// if ($destination === null) {
//     http_response_code(404);
//     exit;
// }
//
// // Hardcode the host rather than using $_SERVER['HTTP_HOST'], which is
// // attacker-controllable and would let a spoofed Host: header poison the
// // attribution row's `origin` and `originDetails` fields.
// $requestUrl = 'https://l.sprucehealth.com' . $_SERVER['REQUEST_URI'];
// $cookieHeader = $_SERVER['HTTP_COOKIE'] ?? '';
// $hasDid = isset($_COOKIE['did']) && $_COOKIE['did'] !== '';
//
// if ($hasDid) {
//     // Repeat click. Send the redirect first, then attribute in the
//     // background so the user gets to their destination immediately.
//     //
//     // NOTE: after fastcgi_finish_request(), error_log() writes only land if
//     // php.ini's `error_log` directive points at a file path. The default
//     // SAPI logger is detached once the request is finalized, so failure logs
//     // from this background task would be silently dropped. Verify ops has
//     // `error_log = /var/log/php-attribution.log` (or similar) configured.
//     header('Location: ' . $destination, true, 302);
//     if (function_exists('fastcgi_finish_request')) {
//         fastcgi_finish_request();
//     }
//     ignore_user_abort(true);
//     spruceReportAttribution($requestUrl, $cookieHeader);
//     exit;
// }
//
// // First click. The backend will mint a `did` cookie and Set-Cookie it; we
// // need to relay that header to the user before redirecting, otherwise this
// // first click is unattributable. Adds ~50–200 ms to the first click only.
// spruceReportAttribution($requestUrl, $cookieHeader);
// header('Location: ' . $destination, true, 302);
// exit;

Spec: Reporting Attribution from a Server-Side Caller

Audience: an implementing agent (or human) wiring up a server-side caller that needs to report attribution data to Spruce's backend. The first concrete use case is l.sprucehealth.com (the existing pre-blessed link shortener), but the spec is caller-agnostic — any backend service that handles a user's HTTP request and wants to record attribution can use it.

This spec covers the outbound call to associateAttribution only. The caller decides when to fire it (on a redirect, a page load, a click handler), what values to include, and how to handle its own user-facing response. This spec describes the contract on the wire.

What associateAttribution does

It records a row of attribution data keyed by a per-device identifier. Each row is a bag of {key, value} pairs (UTM params, referrer, partner promo codes, custom keys, etc.) plus an origin (hostname) and originDetails (path). Rows feed the Looker attributions model used to measure campaign performance.

The mutation is unauthenticated. The only identity signal is a per-device cookie called did. The backend handles cookie minting itself (see "Device-ID handling" below), so the caller just needs to relay cookies in both directions.

Endpoint

POST https://msg-api.sprucehealth.com/graphql
Content-Type: application/json

There's only one environment for this caller: production. (Spruce has dev/staging variants of msg-api, but they aren't relevant to a server-side caller running in production against the production attribution system.)

This is a server-to-server call. CORS does not apply — CORS is a browser-side enforcement mechanism, and your handler is a server. Don't set, read, or worry about Origin, Access-Control-Allow-*, preflight OPTIONS, or credentials headers.

Device-ID handling — the load-bearing part

The Spruce attribution system is keyed on a per-device cookie called did. The backend already knows how to mint and rotate this cookie. Your caller's job is just to be a faithful proxy for the cookie in both directions.

1. On the inbound user request, capture the Cookie header (everything, not just `did`).

2. When making the outbound POST to msg-api, attach those cookies as the
   Cookie request header.

3. On msg-api's response, capture every Set-Cookie response header.

4. On YOUR response back to the user, attach those Set-Cookie headers
   verbatim.

That's it. The backend (sprucehealth/backend/device/headers.go's ExtractSpruceHeaders) covers all three cases:

  • User has a did cookie: backend reads it from the forwarded Cookie header, no Set-Cookie is sent back. Your relay is a no-op on the response side.
  • User has no did cookie (first click): backend mints an opaque ~22-character token (16 random bytes, URL-safe base64-encoded) and emits Set-Cookie: did=<token>; Domain=sprucehealth.com; Path=/; Secure; HttpOnly; SameSite=Lax; Max-Age=315360000. Your caller forwards that header verbatim to the user, who is now bound to a device ID for ~10 years across all *.sprucehealth.com properties.
  • User has an S-Device-ID header (e.g. native apps): backend reads it directly. Doesn't apply to web callers like l.sprucehealth.com, but mentioned here for completeness.

Don't mint device IDs in your caller code. Don't Set-Cookie did from your caller. Don't transform or filter the backend's Set-Cookie headers — just relay them. Doing your own minting fragments device IDs across services and breaks attribution stitching.

If your runtime's HTTP-client library or web framework strips Set-Cookie from responses by default (some do), make sure you've explicitly opted out of that.
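
As a rough illustration of step 3 above, the sketch below pulls every Set-Cookie line out of a raw response header block (for example, what curl returns when CURLOPT_HEADER is enabled). The function name extractSetCookieLines is hypothetical and not part of the spec; it only isolates the relay logic so it can be tested without a live request.

```php
<?php
/**
 * Return every Set-Cookie header line from a raw HTTP response header
 * block, unmodified and in the order received.
 * (Hypothetical helper name; not defined by the spec.)
 *
 * @return string[]
 */
function extractSetCookieLines(string $rawHeaders): array
{
    $lines = [];
    foreach (preg_split('/\r?\n/', $rawHeaders) as $line) {
        // Header names are case-insensitive, and HTTP/2 responses arrive
        // lowercased, so match without regard to case.
        if (stripos($line, 'set-cookie:') === 0) {
            $lines[] = $line;
        }
    }
    return $lines;
}
```

To relay them, a caller would pass each returned line to header($line, false); the false argument keeps PHP from replacing an earlier Set-Cookie header with a later one.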

Cookie forwarding (inbound)

Forward all inbound cookies from the user's request to msg-api as a Cookie header on the outbound request. Don't filter to just did. Other cookies (e.g. _fbc, _fbp for Meta Ads attribution) are needed for future analytics paths the backend may add.

Recommended outbound headers

  • Content-Type: application/json (required)
  • User-Agent: <your-service-name>/<version> — set explicitly so attribution rows are recognizable (e.g. l-sprucehealth-com/1.0). The backend records this on the row.
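
A minimal sketch of assembling the outbound headers described above, assuming the header list is later handed to an HTTP client such as curl via CURLOPT_HTTPHEADER. The function name buildOutboundHeaders is an illustrative assumption.

```php
<?php
/**
 * Build the outbound request headers for the msg-api call.
 * (Hypothetical helper name; not defined by the spec.)
 *
 * @return string[]
 */
function buildOutboundHeaders(string $inboundCookieHeader, string $userAgent): array
{
    $headers = [
        'Content-Type: application/json',
        'User-Agent: ' . $userAgent,
    ];
    // Forward the whole inbound Cookie header untouched (not just `did`).
    // Omit the header entirely when the user sent no cookies.
    if ($inboundCookieHeader !== '') {
        $headers[] = 'Cookie: ' . $inboundCookieHeader;
    }
    return $headers;
}
```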

Body

{
  "operationName": "associateAttribution",
  "query": "mutation AssociateAttribution($input: AssociateAttributionInput!) { associateAttribution(input: $input) { success errorCode errorMessage } }",
  "variables": {
    "input": {
      "values": [
        {"key": "url",          "value": "<origin + pathname of the URL the user hit, no query string>"},
        {"key": "utm_source",   "value": "<...>"},
        {"key": "utm_medium",   "value": "<...>"},
        {"key": "utm_campaign", "value": "<...>"}
      ],
      "origin":        "<hostname of the URL the user hit>",
      "originDetails": "<pathname of the URL the user hit>"
    }
  }
}

The selection set is a deliberate subset of what marketing-website's attribution.js selects today (which also pulls attributionSuccessModal { ... }). Server-side callers don't render that UI, so omit it.
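
One way to produce that body in PHP is sketched below. The constant and function names are illustrative assumptions; the operationName, query text, and field names are copied from the body shown above. JSON_THROW_ON_ERROR is a choice made for this sketch, so a caller following the "never fail the user's request" rule would wrap the call in try/catch.

```php
<?php
// Query text copied from the spec's example body. Single quotes keep PHP
// from treating $input as a variable.
const ATTRIBUTION_MUTATION = 'mutation AssociateAttribution($input: AssociateAttributionInput!) { '
    . 'associateAttribution(input: $input) { success errorCode errorMessage } }';

/**
 * Encode the GraphQL request body.
 * (Hypothetical helper name; not defined by the spec.)
 *
 * @param array<int, array{key: string, value: string}> $values
 */
function buildAttributionBody(array $values, string $origin, string $originDetails): string
{
    return json_encode([
        'operationName' => 'associateAttribution',
        'query'         => ATTRIBUTION_MUTATION,
        'variables'     => [
            'input' => [
                'values'        => $values,
                'origin'        => $origin,
                'originDetails' => $originDetails,
            ],
        ],
        // Substitute invalid UTF-8 with U+FFFD rather than failing the encode.
    ], JSON_UNESCAPED_SLASHES | JSON_INVALID_UTF8_SUBSTITUTE | JSON_THROW_ON_ERROR);
}
```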

values rules

  • One entry per inbound query parameter the user sent. Include all of them; the backend filters and remaps as needed.
  • Include a synthetic url entry: <origin><pathname> of the URL the user actually hit on your service, with no query string. This mirrors what the browser-side code sends and is the convention the backend's analytics pipeline expects.
  • If your caller has additional context worth recording (e.g. for l.sprucehealth.com, the pre-blessed-link's resolved destination URL), add it as another {key, value} entry. Pick a stable key name and document it in your caller's code.
  • If a key appears multiple times in the query string, include each occurrence as a separate values entry. Don't dedupe, don't comma-join.
  • Keys and values are forwarded as the user sent them, decoded: ?utm_source=email%20blast becomes {"key": "utm_source", "value": "email blast"}.
  • Don't filter or rename keys. The backend handles:
    • Synonym mapping for browser privacy modes that strip utm_*: uca → utm_campaign, ume → utm_medium, uso → utm_source, uco → utm_content, ute → utm_term.
    • OAuth-leakage guards: state and code are stripped server-side.
    • fbclid → fbc auto-creation: if you send fbclid and not fbc, the backend generates fbc.
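
The rules above on repeated keys and decoding can be sketched as a small function. The name queryStringToValues is hypothetical. The sketch splits the raw query string by hand because PHP's parse_str() keeps only the last occurrence of a repeated key.

```php
<?php
/**
 * Turn a raw query string into the `values` entries, one per occurrence.
 * (Hypothetical helper name; not defined by the spec.)
 *
 * @return array<int, array{key: string, value: string}>
 */
function queryStringToValues(string $rawQuery): array
{
    $values = [];
    foreach (explode('&', $rawQuery) as $pair) {
        if ($pair === '') {
            continue; // skip empty segments such as a trailing "&"
        }
        // Split on the first "=" only, so values containing "=" survive.
        $parts = explode('=', $pair, 2);
        $values[] = [
            'key'   => urldecode($parts[0]),
            'value' => urldecode($parts[1] ?? ''), // bare key becomes an empty value
        ];
    }
    return $values;
}
```

For example, the query string utm_source=email%20blast&tag=a&tag=b yields three entries: utm_source with value "email blast", then tag twice with values "a" and "b".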

origin / originDetails

  • origin is the hostname of the URL the user hit on your service (e.g. l.sprucehealth.com).
  • originDetails is the pathname of the same URL (e.g. /x123).
  • Don't include the query string in either field; the query params are already in values.
  • If you leave either field empty, the backend falls back to header-derived defaults (Platform, DeviceType). Don't rely on this; set them explicitly.
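
A minimal sketch of deriving all three URL-based fields from one request URL, using PHP's parse_url(). The function name splitRequestUrl and the returned array keys are assumptions for illustration. The sketch ignores any port in the URL, which is assumed to be acceptable for a service on the default HTTPS port.

```php
<?php
/**
 * Derive origin, originDetails, and the synthetic `url` value.
 * Returns null when the URL has no host.
 * (Hypothetical helper name; not defined by the spec.)
 *
 * @return array{origin: string, originDetails: string, url: string}|null
 */
function splitRequestUrl(string $requestUrl): ?array
{
    $parsed = parse_url($requestUrl);
    if (!is_array($parsed) || empty($parsed['host'])) {
        return null;
    }
    $scheme = $parsed['scheme'] ?? 'https';
    $path   = $parsed['path'] ?? '/';
    return [
        'origin'        => $parsed['host'],
        'originDetails' => $path,
        // Scheme, host, and path only: the query string is deliberately dropped.
        'url'           => $scheme . '://' . $parsed['host'] . $path,
    ];
}
```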

Backend supplements (don't double-send)

The resolver automatically appends platform, os, ip_address, user_agent, organization_id from request context and headers before persisting. As long as your outbound HTTP client sends a sensible User-Agent and the user's IP is forwarded (CloudFront does this via X-Forwarded-For), those fields populate themselves. You don't need to put them in values.

Error handling

  • Timeout the outbound call at ~5 seconds. A hung msg-api should not block the user's response.
  • Don't fail the user's request because attribution failed. Log the error, swallow it, and continue. The user gets their redirect / page / response either way.
  • Network error, non-2xx HTTP, or data.associateAttribution.success === false all count as failure. Log all three with enough detail to debug (status code, error body excerpt, the originDetails you sent — but not raw cookie values or PII).
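
The failure cases listed above can be collapsed into one classification step, sketched below. The function name and the reason strings are hypothetical. Treating an unparseable body or a top-level GraphQL errors array as failures is an assumption that goes slightly beyond the three cases the spec names.

```php
<?php
/**
 * Classify the outcome of one attribution call.
 * Returns null on success, or a short reason string suitable for logging.
 * (Hypothetical helper name and reason strings; not defined by the spec.)
 */
function attributionFailureReason(bool $transportOk, int $httpStatus, string $rawBody): ?string
{
    if (!$transportOk) {
        return 'transport'; // timeout, DNS failure, connection reset, etc.
    }
    if ($httpStatus < 200 || $httpStatus >= 300) {
        return 'http_' . $httpStatus;
    }
    $json = json_decode($rawBody, true);
    if (!is_array($json)) {
        return 'unparseable_body'; // assumption: treat as a failure
    }
    if (!empty($json['errors'])) {
        return 'graphql_errors'; // assumption: treat as a failure
    }
    if (($json['data']['associateAttribution']['success'] ?? null) !== true) {
        return 'success_false';
    }
    return null;
}
```

Whatever the reason, the caller logs it and carries on with the user-facing response.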

Timing

If your runtime is a long-running server (Go, PHP-FPM, Node SSR, etc.), fire the call as a background task without awaiting it before returning the user-facing response. The user gets their redirect / page immediately; the attribution call completes ~50–200 ms later.

If you do this, make sure you've already captured the relevant inbound state (the cookies, the URL, the query params) into local variables before kicking off the background task — don't reach back into a request object that may be torn down once your handler returns. Acceptable risk for either timing mode: if the process is killed in the small in-flight window, that one call's row is lost.

If your runtime is serverless (Lambda, Cloudflare Worker without event.waitUntil, etc.) or otherwise can't keep an outgoing request alive past the response, await the call (with the 5 s timeout above) before returning. The latency cost is acceptable as a fallback.

Set-Cookie pass-through note for fire-and-forget mode: if you fire the attribution call after returning the user's response, you can't relay the response's Set-Cookie to the user — they've already gotten their bytes. This is fine for repeat clicks (the user already has did) but means first-click users won't get their cookie set on that first click. Two ways to handle:

  1. Accept it. The user will hit a Spruce property again at some point (sprucehealth.com, app.sprucehealth.com, help.sprucehealth.com, the next l.sprucehealth.com link), and that property will set the did cookie. The first click's attribution row gets a freshly-minted backend-side device ID that won't be linked to subsequent clicks. Some loss of stitching, no broken UX.
  2. Await on first click only. Detect "no did cookie inbound", await the attribution call in that case so you can capture the Set-Cookie, and fire-and-forget for repeat clicks. Adds roughly 50–200 ms (the duration of the attribution call) to the first click only.

Document which mode you chose in the code, near the call site.
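
Option 2 hinges on detecting whether the inbound request already carries a did cookie. Below is a minimal sketch that works from the captured raw Cookie header rather than a live request object. Both function names and the mode strings are hypothetical.

```php
<?php
/**
 * True when the raw Cookie header contains a non-empty `did` cookie.
 * (Hypothetical helper name; not defined by the spec.)
 */
function hasDidCookie(string $cookieHeader): bool
{
    foreach (explode(';', $cookieHeader) as $pair) {
        // Split on the first "=" so cookie values containing "=" stay whole.
        $parts = explode('=', trim($pair), 2);
        if ($parts[0] === 'did' && ($parts[1] ?? '') !== '') {
            return true;
        }
    }
    return false;
}

/**
 * Pick the timing mode: await on first click (no `did`) so the Set-Cookie
 * can be relayed, fire-and-forget on repeat clicks.
 */
function chooseAttributionMode(string $cookieHeader): string
{
    return hasDidCookie($cookieHeader) ? 'fire_and_forget' : 'await';
}
```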

Operational expectations

  • Logging: every call logs the originDetails, the count of values, and whether the call succeeded. Don't log raw cookie values, raw URLs (those may carry PII or campaign secrets), or the contents of values.
  • Metrics: count of calls, success rate, p50/p99 latency. Alert on success rate dropping below ~95 % over a 15-minute window.
  • No retries. If the call fails, log it and move on. Retrying creates duplicate rows; the analytics pipeline is more sensitive to that than to occasional missed rows.
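
A sketch of a per-call log line that satisfies the logging rule above: it records originDetails, the number of values, and the outcome, and nothing from the cookies, the full URL, or the values themselves. The function name and the line format are illustrative assumptions.

```php
<?php
/**
 * Format one log line per attribution call.
 * $failureReason is null on success, otherwise a short reason string.
 * (Hypothetical helper name and format; not defined by the spec.)
 */
function attributionLogLine(string $originDetails, int $valueCount, ?string $failureReason): string
{
    return sprintf(
        'attribution originDetails=%s values=%d outcome=%s',
        $originDetails,
        $valueCount,
        $failureReason === null ? 'ok' : 'fail:' . $failureReason
    );
}
```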

Acceptance criteria

A correct implementation:

  • Includes one values entry per inbound query parameter.
  • Includes a synthetic url value: <origin><pathname> of the URL the user hit, with no query string.
  • Produces multiple values entries when the same query-string key appears more than once (no deduping or comma-joining).
  • Sets origin to the hostname of the URL the user hit and originDetails to its pathname.
  • Forwards the inbound Cookie header verbatim on the outbound request.
  • Forwards every Set-Cookie header from msg-api's response verbatim on the response back to the user (when its own response hasn't already been sent).
  • Does not fail or surface an error to the user when the attribution call fails (timeout, network error, non-2xx, or data.associateAttribution.success === false).
  • Logs every failure with enough context to debug (status code, the originDetails, an error excerpt — never raw cookie values, full URLs, or values contents).

Out of scope

  • The user-facing behavior of the calling service (e.g. how l.sprucehealth.com resolves a pre-blessed link to a destination, or what response shape it returns). That belongs in the caller's own spec.
  • Authenticated-user attribution. The mutation is public; did is the only identity signal at this layer.
  • Rate limiting on the caller side. Spruce's edge handles abuse. If your caller is high-volume, add your own protections.
  • Idempotency. Do not retry. The mutation is not designed to be idempotent and retried calls would create duplicate rows.
