@xpe
Created April 22, 2026 20:53
A map of privacy-preserving telemetry, from randomized response to OHTTP

Generated by Claude Opus 4.7 on April 22, 2026.


When a vendor says "we don't collect any personally identifiable information," what did they actually build? The honest answer, across two decades of deployed systems, ranges from "nothing at all, they just stripped the obvious identifiers" to "a multi-party cryptographic protocol with published epsilon budgets and open-source code running in trusted enclaves." This survey is a field guide to the territory between those extremes — the deployed systems you will run into by name, and the formal guarantee types that those systems do or do not meet.

Three things make this landscape hard for a newcomer. First, the same marketing words — "anonymous," "aggregated," "privacy-preserving" — get used for mechanisms that differ by many orders of magnitude in strength. Second, the strongest guarantees are conditional on assumptions (non-collusion, trusted hardware, crypto hardness, client honesty) that are rarely highlighted by vendors. Third, the gap between the headline epsilon in an academic paper and the epsilon that ships per-user-per-day in a real product has repeatedly been embarrassing — most famously for Apple's local differential privacy, where reverse engineering found a per-day privacy budget an order of magnitude looser than academic convention, and effectively unbounded over a user's lifetime.

The goal here is to leave you with a map, a vocabulary, and a few reliable heuristics for parsing vendor claims — not a deep dive into any one system.

The vocabulary problem: Pfitzmann-Hansen as a decoder ring

The canonical technical vocabulary for privacy-by-data-minimization comes from Andreas Pfitzmann and Marit Hansen's "A Terminology for Talking About Privacy by Data Minimization" (v0.34, August 2010, TU Dresden). Its core distinctions are surprisingly useful when reading product announcements. Anonymity is the state of being not identifiable within a set of subjects (the "anonymity set"). Unlinkability of two items of interest (IOIs) means an attacker cannot sufficiently distinguish whether they are related or not — e.g., two telemetry reports from the same user. Undetectability means an attacker cannot sufficiently distinguish whether an IOI exists at all. Unobservability combines undetectability of the IOI with anonymity of the subject involved. Pseudonymity is the use of identifiers other than real names — the weakest category, and the one most vendor claims silently invoke.

When a vendor says "we anonymize your data," you should ask which property they actually mean. Stripping an IP and replacing it with a hashed device ID gives you pseudonymity, nothing more. A mixnet gives you unlinkability under assumptions about the mix network. Local differential privacy gives you a formal indistinguishability property (a cousin of unlinkability) at the record level. OHTTP gives you unlinkability between network identity and request content, conditional on non-collusion. These are not interchangeable.

Why identifier stripping fails: three canonical case studies

Every introduction to this field needs the re-identification trilogy. In 1997, Latanya Sweeney (then a graduate student, later CTO at the FTC) purchased $20 worth of Massachusetts voter-registration data and cross-referenced it with a "de-identified" Group Insurance Commission release of state employees' hospital records, producing the medical records of then-Governor William Weld, who had recently collapsed at a public event. Her subsequent work ("Simple Demographics Often Identify People Uniquely," Carnegie Mellon Data Privacy Working Paper 3, 2000) showed that 87% of the US population is uniquely identified by the combination of five-digit ZIP code, date of birth, and sex. A refined 2006 analysis lowered the figure to around 63%, but the point stands: ordinary demographics are near-identifiers.

In August 2006, AOL released twenty million search queries from 650,000 "anonymized" users as a research gesture; within days The New York Times identified AOL user 4417749 as Thelma Arnold, a 62-year-old woman in Lilburn, Georgia, from her queries alone. A month later, Netflix launched its $1M Prize with a release of "anonymized" movie ratings for 500,000 subscribers. Arvind Narayanan and Vitaly Shmatikov ("Robust De-Anonymization of Large Sparse Datasets," IEEE S&P 2008) showed that knowledge of as few as eight movie ratings — with dates approximate to ±14 days — uniquely identified 99% of Netflix users in the released data, enabling cross-reference with IMDb to recover political and sexual-orientation signals. These cases are the reason the field moved to formal guarantees at all.

k-anonymity and its successors: the syntactic dead end

Sweeney's k-anonymity ("k-Anonymity: A Model for Protecting Privacy," IJUFKS 2002) requires that each record be indistinguishable from at least k−1 others on "quasi-identifier" attributes, typically via generalization (ZIP → ZIP3) and suppression. It is intuitive and satisfies regulators, but it has three structural problems that drove a succession of patches to the definition.
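In code, the definition is nearly a one-liner: the k of a release is the size of the smallest equivalence class when records are grouped on the quasi-identifier attributes. A minimal sketch with hypothetical toy records:

```python
from collections import Counter

def k_anonymity(records, quasi_identifiers):
    """Return the k of a table: the size of the smallest equivalence
    class when records are grouped on the quasi-identifier attributes."""
    classes = Counter(tuple(r[a] for a in quasi_identifiers) for r in records)
    return min(classes.values())

# Toy table after generalization: 3-digit ZIP prefix, decade age bucket.
rows = [
    {"zip3": "021", "age": "30-39", "diagnosis": "flu"},
    {"zip3": "021", "age": "30-39", "diagnosis": "flu"},
    {"zip3": "021", "age": "30-39", "diagnosis": "flu"},
    {"zip3": "946", "age": "40-49", "diagnosis": "asthma"},
    {"zip3": "946", "age": "40-49", "diagnosis": "cancer"},
]
print(k_anonymity(rows, ["zip3", "age"]))  # 2: the weakest class has 2 rows
```

Note that the first class here is 3-anonymous yet every member shares the same diagnosis, which is exactly the homogeneity attack discussed below.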

The homogeneity attack exploits the case where all k records share a sensitive value — k-anonymity does not constrain sensitive-attribute diversity. Machanavajjhala, Kifer, Gehrke, and Venkitasubramaniam's l-diversity (ICDE 2006, TKDD 2007) requires each equivalence class to contain at least l "well-represented" sensitive values. This in turn fails against skew and similarity attacks (sensitive values close in meaning leak information), motivating t-closeness (Li, Li, Venkatasubramanian, ICDE 2007), which bounds the distance between the sensitive-attribute distribution in each class and the overall distribution. A deeper limitation — independent of any patch — is Aggarwal's curse of dimensionality ("On k-Anonymity and the Curse of Dimensionality," VLDB 2005): for high-dimensional data like web logs or genomic data, any useful k-anonymization destroys utility, because almost every record is unique on enough attributes that generalization has to be extreme.

The modern consensus is that k-anonymity-family definitions do not compose — intersecting two separate k-anonymous releases of overlapping populations can re-identify individuals — and that they are inadequate for the kinds of data (browsing, location, biometrics, LLM prompts) that dominate telemetry today. They persist in GDPR/HIPAA guidance and some healthcare pipelines, and as a cohort-level primitive they show up in threshold-release systems like STAR.

Differential privacy: the dominant formal family

Differential privacy (Dwork, McSherry, Nissim, Smith, "Calibrating Noise to Sensitivity in Private Data Analysis," TCC 2006) is the rare academic concept that became a deployed industry standard. The definition: a randomized mechanism M is (ε, δ)-differentially private if for any two databases differing in one record, the output distributions of M are within a factor of e^ε of each other, with an additive δ slack. Small ε is strong, small δ is strong; academic convention treats ε ≤ 1 as meaningful, ε ≈ 10 as weak, ε ≥ 20 as "cosmetic." The key property is that privacy is defined over the mechanism, not the data — it holds regardless of auxiliary information, which is exactly what k-anonymity fails to do.

Four deployment models matter:

Central DP assumes a trusted curator holding the raw data who adds calibrated noise (Laplace for ε-DP, Gaussian for (ε,δ)-DP) to query results. Low noise, strong trust assumption — appropriate when the curator is a statistics agency or a single organization analyzing its own data. The 2020 US Census uses this model.
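A minimal sketch of the central-model mechanics, assuming a counting query with L1 sensitivity 1 (the Census DAS itself uses discrete noise and heavy post-processing, so this is the shape of the idea, not that system):

```python
import random

def laplace_noise(scale, rng=random):
    # The difference of two i.i.d. Exponential(1/scale) draws is Laplace(0, scale).
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)

def private_count(true_count, epsilon, rng=random):
    # A counting query has L1 sensitivity 1 (one person changes the count
    # by at most 1), so ε-DP requires Laplace noise with scale 1/ε.
    return true_count + laplace_noise(1.0 / epsilon, rng)

rng = random.Random(0)
print(private_count(10_000, epsilon=1.0, rng=rng))  # ≈ 10,000, ± a few
```

At ε = 1 the noise standard deviation is √2 ≈ 1.4, negligible against a count of 10,000; the same mechanism at ε = 0.01 would add noise with standard deviation ≈ 141, which is why the budget choice dominates utility.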

Local DP (descended from Warner's 1965 randomized-response survey technique) adds noise on the client, before any server ever sees the data. No trusted curator needed, but the noise required to hide any single user's contribution is so large that either ε must be loose or the user base must be enormous. RAPPOR and Apple's keyboard telemetry live here.
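Warner's lineage is easy to demonstrate: each client reports its true bit with probability e^ε/(e^ε + 1) and lies otherwise, and the server debiases the aggregate. A sketch with illustrative parameters:

```python
import math, random

def randomized_response(truth: bool, epsilon: float, rng=random) -> bool:
    """Report the true bit with probability e^ε / (e^ε + 1), else flip it."""
    p_truth = math.exp(epsilon) / (math.exp(epsilon) + 1)
    return truth if rng.random() < p_truth else not truth

def debias(reports, epsilon):
    """Invert the noise: an unbiased estimate of the true 'yes' rate."""
    p = math.exp(epsilon) / (math.exp(epsilon) + 1)
    observed = sum(reports) / len(reports)
    return (observed - (1 - p)) / (2 * p - 1)

rng = random.Random(0)
eps = math.log(3)        # Warner's classic setting: tell the truth 3 times in 4
true_rate = 0.30
reports = [randomized_response(rng.random() < true_rate, eps, rng)
           for _ in range(500_000)]
print(debias(reports, eps))  # ≈ 0.30, even though every single report is noisy
```

The 500,000-client population is the point: the estimator's error shrinks like 1/√n, so at small n the same mechanism returns garbage, which is why pure LDP only works for enormous user bases.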

Shuffle-model DP splits the difference: clients add modest local noise, then a cryptographic shuffler (often a trusted enclave or a mix-like protocol) removes the link between records and senders. The anonymity provided by shuffling amplifies the local guarantee into something close to the central model — this is the crucial result of Erlingsson, Feldman, Mironov, Raghunathan, Talwar, and Thakurta's "Amplification by Shuffling" (SODA 2019), Balle et al.'s "Privacy Blanket of the Shuffle Model" (CRYPTO 2019), Cheu et al.'s "Distributed Differential Privacy via Shuffling" (EUROCRYPT 2019), and Feldman, McMillan, and Talwar's "Hiding Among the Clones" (FOCS 2021). Google's Prochlo / ESA (Bittau et al., SOSP 2017) was the first concrete deployment of this architecture, using SGX-based oblivious shuffling.

Variants for tighter composition: concentrated DP (Dwork-Rothblum 2016), zero-concentrated DP (Bun-Steinke 2016), and Rényi DP (Mironov 2017) replace the basic (ε,δ) definition with ones that compose additively under repeated queries. zCDP is what the Census Bureau ended up using for its 2020 deployment; Rényi DP with the moments accountant (Abadi et al. CCS 2016) is what powers almost all modern DP-SGD machine learning. The practical significance: under basic composition, running k queries each at ε degrades to k·ε; under advanced/RDP composition, it degrades more like √k·ε.
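The gap is easy to see numerically; a sketch using the advanced-composition bound ε' = ε·√(2k·ln(1/δ)) + k·ε·(e^ε − 1):

```python
import math

def basic_composition(eps, k):
    # k-fold composition of ε-DP mechanisms is (k·ε)-DP.
    return k * eps

def advanced_composition(eps, k, delta):
    # Advanced composition (Dwork-Rothblum-Vadhan form): the k-fold
    # composition is (ε', δ)-DP with ε' = ε·sqrt(2k·ln(1/δ)) + k·ε·(e^ε − 1).
    return eps * math.sqrt(2 * k * math.log(1 / delta)) + k * eps * (math.exp(eps) - 1)

k, eps = 1000, 0.1
print(basic_composition(eps, k))                 # k·ε = 100
print(advanced_composition(eps, k, delta=1e-6))  # ≈ 27.1
```

A thousand queries at ε = 0.1 cost ε = 100 under basic accounting but roughly ε ≈ 27 under advanced composition (and less still under RDP accounting), which is the entire reason the tighter definitions exist.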

The composition bookkeeping is where deployed systems get into trouble. Per-event ε means little if a user contributes thousands of events. Per-user-lifetime ε is the honest metric but it is almost never reported; systems that reset the budget daily or per-feature achieve a per-day ε that is a reasonable number and a lifetime ε that is effectively unbounded.

The Apple LDP controversy: what longitudinal composition actually looks like

Apple announced at WWDC 2016 that iOS 10 and macOS Sierra would use local differential privacy to collect frequency data for emoji, new words typed into QuickType, Safari crash/energy domains, Spotlight lookup hints, and Health data types. Apple used Count Mean Sketch and Hadamard CMS (later documented in "Learning with Privacy at Scale," Apple ML Journal, December 2017), which are technically sound mechanisms. What Apple did not initially publish was the ε values.

Jun Tang, Aleksandra Korolova, Xiaolong Bai, Xueqiang Wang, and XiaoFeng Wang (arXiv:1709.02753, September 2017) reverse-engineered /System/Library/PrivateFrameworks/DifferentialPrivacy.framework and the com.apple.dprivacyd daemon. The per-datum ε they found was 1 for Emoji/Lookup/Health and 2 for NewWords — in line with academic norms — but the combined per-day ε across features was 14 on macOS and 16 on iOS, and the budget reset daily with no lifetime cap. Frank McSherry, a DP co-inventor, told Wired that ε=14 "is not something I'd feel comfortable having my life rest on." Apple disputed the additive combination on the grounds of feature independence, but the substantive point — no cryptographically enforced lifetime budget, config files modifiable with root, no independent audit of the RNG or deletion policy — remained. Apple's December 2017 overview PDF published per-feature ε values (2 for Health, 4 for Emoji/Lookup/Safari, 8 for QuickType) and daily submission caps. Those numbers are still the public specification as of 2026; Apple has not published a lifetime policy.

This episode is the canonical example of how a formally correct mechanism can be deployed in a way that produces a vastly weaker real-world guarantee than the published per-event ε suggests. Any vendor DP claim should be read by first asking: per what, and for how long?

The 2020 Census: central DP at civilization scale

The US Census Bureau's 2020 Disclosure Avoidance System (DAS) is the largest central-DP deployment on record, and the cleanest case study in what formal privacy actually costs in the real world. The TopDown Algorithm ingests the census edited file, computes hierarchical noisy counts under zCDP accounting, and post-processes to enforce non-negativity and geographic consistency.

The final privacy-loss budget for the Redistricting Data (PL 94-171) release was approved in June 2021 at ε = 19.61 total, decomposed as ε = 17.14 for the persons file and ε = 2.47 for the housing-unit file, with δ = 10⁻¹⁰. These numbers sit far above the ε ≤ 1 academic convention and generated substantial controversy. Steven Ruggles and colleagues ("Differential Privacy and Census Data: Implications for Social and Economic Research," AEA Papers and Proceedings 2019) argued the noise introduced would distort counts for small populations, rural areas, and minority subgroups beyond usable thresholds. A 2024 rural-populations study found the final algorithm produced systematically worse accuracy for non-white populations than for non-Hispanic whites — a genuine equity problem from an ostensibly neutral mechanism. Lawsuits followed; the National Academies' 2023 Assessing the 2020 Census: Final Report documented the tradeoffs at length. John Abowd, then Census Chief Scientist, defended the approach in a special issue of Harvard Data Science Review (2022) on the grounds that the prior swapping-based disclosure avoidance provided no quantifiable protection and failed database-reconstruction attacks.

The Census case illustrates two honest facts about real DP: the ε budget is a political decision about the privacy-utility frontier, not a technical constant; and the invariants (total state population, housing-unit counts) that are held exact for legal reasons leak more than any chosen ε can hide. It also shows that shipping DP is possible at the scale of an entire country — the DAS code is open-source, the parameters are public, and the noise distributions are specified.

Deployed system survey, part 1: local DP telemetry

Google RAPPOR (Erlingsson, Pihur, Korolova, "RAPPOR: Randomized Aggregatable Privacy-Preserving Ordinal Response," ACM CCS 2014) was the first large-scale LDP deployment in a consumer product. Chrome used it from 2014 onward to collect string-valued statistics (hijacked homepages, default search engines, suspicious Windows registry values). Each client encodes its value in a Bloom filter, applies permanent randomized response to memoize a noisy version (bounding longitudinal ε), then applies instantaneous randomized response per report. The canonical Chrome configuration had ε∞ = ln(3) ≈ 1.1 for stable values, with per-report ε₁ ≈ 0.67 per hash. The big caveat is that the longitudinal guarantee only holds if the underlying value does not change — for correlated time series the bound degrades. Follow-up work (Wang, Blocki, Li, Jha, USENIX Security 2017) showed RAPPOR is strictly suboptimal versus Optimized Unary Encoding at the same ε. RAPPOR is effectively deprecated in Chrome today, superseded by Prochlo/ESA-style shuffle mechanisms and the Privacy Sandbox Aggregation Service. Reference code at github.com/google/rappor is public but unmaintained.
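The two-stage noise structure is the important idea; here is a simplified sketch with toy sizes and illustrative f, p, q parameters (the deployed configuration used larger Bloom filters and its own tuned values):

```python
import hashlib, random

K, H = 16, 2                      # Bloom-filter bits and hash functions (toy sizes)
F, P_IRR, Q_IRR = 0.5, 0.5, 0.75  # illustrative PRR/IRR parameters

def bloom(value: str):
    """Encode a string into an H-hash Bloom filter of K bits."""
    bits = [0] * K
    for i in range(H):
        digest = hashlib.sha256(f"{i}:{value}".encode()).digest()
        bits[digest[0] % K] = 1
    return bits

def permanent_rr(bits, rng):
    """PRR, computed once per (client, value) and memoized: each bit is
    forced to 1 with prob F/2, forced to 0 with prob F/2, else kept."""
    out = []
    for b in bits:
        u = rng.random()
        out.append(1 if u < F / 2 else (0 if u < F else b))
    return out

def instantaneous_rr(prr_bits, rng):
    """IRR, fresh per report: send 1 with prob Q if the PRR bit is 1, P if 0."""
    return [1 if rng.random() < (Q_IRR if b else P_IRR) else 0 for b in prr_bits]

rng = random.Random(7)
prr = permanent_rr(bloom("example.com"), rng)  # cached for the value's lifetime
report = instantaneous_rr(prr, rng)            # what actually goes on the wire
print(report)
```

Memoizing the PRR is what bounds the longitudinal ε: no matter how many IRR reports the server collects, they are all noisy views of the same fixed noisy string — which is also why the guarantee degrades if the underlying value changes.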

Apple's LDP (discussed above) remains the only mass-market keyboard and browser telemetry pipeline using local DP, with per-feature ε between 2 and 8, per-day budgets of 14–16, and no published lifetime policy. It is closed-source; the December 2017 technical overview PDF is still the primary document.

Deployed system survey, part 2: two-server MPC (Prio, DAP, Divvi Up)

Prio (Corrigan-Gibbs & Boneh, "Prio: Private, Robust, and Scalable Computation of Aggregate Statistics," NSDI 2017) replaces the trusted-curator assumption with a two-server non-collusion assumption: each client secret-shares its contribution between two aggregators, which sum their shares locally and combine only at the end. A Secret-Shared Non-Interactive Proof (SNIP) lets the servers jointly verify that each client's input is well-formed without learning the input — preventing a single malicious client from poisoning the aggregate. Privacy holds as long as at least one aggregator is honest.
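The core arithmetic is plain additive secret sharing; everything else in Prio (the SNIP validity proofs, the encodings for non-sum statistics) is layered on top of it. A minimal sketch:

```python
import secrets

MOD = 2**61 - 1  # a prime field large enough to hold the aggregate

def share(value: int):
    """Split a contribution into two additive shares mod MOD.
    Each share on its own is uniformly random and reveals nothing."""
    s1 = secrets.randbelow(MOD)
    s2 = (value - s1) % MOD
    return s1, s2

# Three clients, two non-colluding aggregators:
values = [3, 1, 4]
shares = [share(v) for v in values]
agg1 = sum(s1 for s1, _ in shares) % MOD  # aggregator 1 sees only its shares
agg2 = sum(s2 for _, s2 in shares) % MOD  # aggregator 2 sees only its shares
print((agg1 + agg2) % MOD)  # 8 — only the combined sum is ever revealed
```

The non-collusion assumption is visible in the code: either aggregator's view is a list of uniformly random field elements; only adding the two final sums recovers anything about the inputs. What this sketch omits is the SNIP, without which a single client could submit 10⁹ instead of 1 and silently poison the total.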

Mozilla's Firefox Origin Telemetry (2018–2019) was the first notable Prio deployment, measuring tracking-protection blocklist effectiveness. Mozilla ran both Prio servers themselves during the pilot, explicitly documenting that this "does not provide a meaningful privacy benefit" — a candid admission that the pilot validated integration and scale, not the non-collusion assumption. The pilot collected about 3 million values over six weeks of Firefox Nightly at 1% rollout. This system is now superseded by DAP.

Exposure Notifications Private Analytics (ENPA) was the largest Prio deployment to date: Apple/Google exposure-notification apps in 11+ US states plus DC used it for aggregate usage analytics, with Apple and Google as HPKE-encrypted ingestion points, ISRG as one aggregator, NIH/NCI as the other, and MITRE as the portal that combined shares and released aggregates to public-health authorities. ISRG self-reported processing over 12 billion metrics by late 2022. Privacy was strengthened with on-device Gaussian noise for (ε,δ)-DP in addition to Prio's MPC privacy. ENPA is the proof that two-server MPC works at nation-state scale.

ISRG Divvi Up (renamed from "ISRG Prio Services" in December 2021) is the production continuation, operated by the same organization that runs Let's Encrypt. Divvi Up implements the Distributed Aggregation Protocol (DAP), currently at draft-ietf-ppm-dap-16 (September 2025) — still an Internet-Draft, not an RFC, despite four years of revisions. DAP builds on VDAF (Verifiable Distributed Aggregation Functions, draft-irtf-cfrg-vdaf-18, January 2026), which generalizes Prio3 (for sums, histograms, count-vectors) and Poplar1 (for private heavy-hitters over bit strings, based on Boneh et al. "Lightweight Techniques for Private Heavy Hitters," IEEE S&P 2021). The open-source Rust implementation is Janus at github.com/divviup/janus; Cloudflare's alternative is Daphne. Production subscribers include Mozilla (Firefox telemetry), Horizontal (human-rights tech, first non-Mozilla production user August 2023), and Tinfoil Analytics. Divvi Up's USENIX PEPR 2026 talk describes "processing billions of contributions" from Firefox. DAP itself is DP-agnostic; (ε,δ)-DP is added per-task via draft-wang-ppm-differential-privacy.

Firefox Privacy-Preserving Attribution (PPA) shipped in Firefox 128 (July 2024) as a joint Mozilla/Meta proposal based on DAP, with ε = 1.0 default per the W3C Private Attribution First Public Working Draft (April 2025). It was enabled by default with no upfront disclosure, triggering a noyb GDPR complaint to the Austrian DPA in September 2024 and substantial community backlash. Mozilla's current position (support.mozilla.org) states PPA "was never activated and was later removed" — the intended pilot on two Mozilla-operated sites was delayed past usefulness, and the feature was withdrawn without ever being turned on in production.

Deployed system survey, part 3: threshold aggregation (STAR, Brave P3A)

STAR (Davidson, Snyder, Quirk, Genereux, Livshits, Haddadi, "Secret Sharing for Private Threshold Aggregation Reporting," ACM CCS 2022) provides a different flavor of guarantee: cryptographic κ-anonymity. A measurement value is revealed to the server if and only if at least κ clients submit the same value; otherwise the server learns nothing about it. The mechanism uses Shamir secret sharing keyed to a per-value tag derived via a VOPRF randomness server (preventing offline dictionary attacks on low-entropy inputs). The variant STARLite skips the randomness server entirely and is secure only when input distributions are inherently high-entropy. Constellation, Brave's extension, nests STAR layers for hierarchical reporting.
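A simplified sketch of the threshold mechanics, with the VOPRF replaced by a plain hash of the value (which is essentially the STARLite weakening the paper warns about for low-entropy inputs):

```python
import hashlib, random

PRIME = 2**61 - 1   # field for the Shamir shares
KAPPA = 3           # reveal threshold (Brave's P3A uses κ = 50)

def poly_from_value(value: str):
    """Derive a degree-(κ-1) polynomial deterministically from the value, so
    every client holding the same value samples points on the same curve.
    (Real STAR derives the seed via a VOPRF so servers can't brute-force it.)"""
    seed = int.from_bytes(hashlib.sha256(value.encode()).digest(), "big")
    rng = random.Random(seed)
    return [rng.randrange(PRIME) for _ in range(KAPPA)]  # coeffs[0] is the secret

def client_share(value: str, rng):
    """One client's contribution: the polynomial evaluated at a random point."""
    coeffs = poly_from_value(value)
    x = rng.randrange(1, PRIME)
    y = 0
    for c in reversed(coeffs):          # Horner's rule, mod PRIME
        y = (y * x + c) % PRIME
    return x, y

def recover(points):
    """Lagrange interpolation of f(0) from κ distinct points."""
    secret = 0
    for i, (xi, yi) in enumerate(points):
        num = den = 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * -xj % PRIME
                den = den * (xi - xj) % PRIME
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret

rng = random.Random(42)
shares = [client_share("popular.example", rng) for _ in range(KAPPA)]
# With κ shares of the same value, the server recovers the secret (in real
# STAR, a key that decrypts the value); with fewer, it learns nothing.
assert recover(shares) == poly_from_value("popular.example")[0]
```

The threshold is information-theoretic: κ−1 points of a degree-(κ−1) polynomial are consistent with every possible constant term, so an under-threshold value is hidden regardless of server compute.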

Brave's Privacy-Preserving Product Analytics (P3A) runs STAR/Constellation with κ = 50 per layer, enabled by default since Brave 1.62 (late 2023); legacy JSON fallbacks were fully removed in Brave 1.81 (June 2025). The randomness server (star-randsrv) runs inside an AWS Nitro Enclave with reproducible builds and remote attestation — a layered defense: even if Brave's infrastructure is compromised, the enclave protects the VOPRF key. Brave's Nebula variant adds (ε=1, δ=10⁻⁸) DP sampling at p = 0.105 on top of STAR to provide plausible deniability for undecodable tags, addressing a leakage channel identified in §4.6 of the STAR paper. Code is open at github.com/brave/sta-rs, /constellation, /constellation-processors, /star-randsrv (MPL-2.0).

The practical appeal of STAR versus DAP is the single-server trust model (the randomness server is a separate auxiliary entity, not a co-equal aggregator) and the clean κ-anonymity semantics — when an analyst asks "which values did at least 50 clients report?", the answer is definitionally accurate with no DP noise. The tradeoff is that STAR provides no protection for values that do clear the threshold, which is appropriate for some telemetry (popular tracker domains, common user-agent strings) but not for open-ended data.

Deployed system survey, part 4: TEE-backed aggregation (Privacy Sandbox)

Google's Privacy Sandbox Aggregation Service is the most significant deployed TEE-based privacy system. Clients (Chrome or Android) produce HPKE-encrypted aggregatable reports containing histogram contributions. The adtech batches these and submits them to the Aggregation Service running inside AWS Nitro Enclaves or GCP Confidential Space. The enclave attests its binary hash to two Coordinators (Google and Accenture), each holding half of the decryption-key material; only an allow-listed enclave binary can decrypt reports. The enclave sums contributions, adds discrete Laplace noise with scale L1/ε, and returns the aggregate. The L1 contribution budget per source is 2¹⁶ = 65,536, and published adtech documentation shows ε up to 64 during the 2023–24 origin trial — far above academic norms, making the noise effectively cosmetic for most queries. The spec is explicit that the privacy scope is per-source, not per-user; rate limits approximate user-level bounds.
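The arithmetic behind the "cosmetic" judgment is short: Laplace noise with scale b has standard deviation √2·b, so at the largest published ε the noise is a tiny fraction of the contribution budget. A sketch:

```python
import math

L1 = 2**16  # per-source contribution budget (65,536)

def laplace_stddev(epsilon):
    """Laplace noise with scale L1/ε has standard deviation sqrt(2)·L1/ε."""
    return math.sqrt(2) * L1 / epsilon

# Noise magnitude at a few ε values, against a budget of 65,536:
for eps in (1, 8, 64):
    print(eps, round(laplace_stddev(eps)))
```

At ε = 1 the noise standard deviation (≈ 92,682) exceeds the entire budget; at ε = 64 it is ≈ 1,448, about 2% of the budget, which is why queries run at the loose end of the range are barely perturbed.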

Google's formal analysis (Ghazi, Kamath, et al., "On the Differential Privacy and Interactivity of Privacy Sandbox Reports," PoPETs 2025) proves Individual DP under assumptions of TEE integrity and honest Coordinators. Those are real assumptions. Nitro Enclaves have been less attacked than SGX but share the overall class of concerns; TEE non-collusion is a meaningful threat model; the specific ε chosen by adtechs is loose. The source is at github.com/privacysandbox/aggregation-service (Apache 2.0).

The broader Privacy Sandbox story is a cautionary tale about deployment as product rather than research. Google announced third-party cookie deprecation in January 2020, repeatedly delayed, then walked it back in stages: July 2024 (no mandatory phase-out), April 2025 (no user-choice prompt either), and October 2025 (formal retirement of low-adoption APIs, retaining CHIPS, FedCM, Private State Tokens, ARA, and Aggregation Service). Google's own tests found ~20% publisher-revenue decline under Sandbox-only measurement, and UK CMA/ICO regulatory pressure reshaped the timeline. The Topics API (successor to the abandoned FLoC) remains active but has no formal ε — it classifies browsing into one of 469 taxonomy topics with a 5% random-noise rate, heuristic rather than formal privacy.

Deployed system survey, part 5: network-layer unlinkability

Oblivious HTTP (RFC 9458, January 2024) by Martin Thomson (Mozilla) and Christopher Wood (Cloudflare) codified a pattern that had been emerging for years: a four-party architecture (client, relay, gateway, target) in which the client HPKE-encrypts a binary-HTTP request to the gateway, the relay forwards the encrypted blob without seeing plaintext, and the gateway decrypts without seeing client IP. Unlinkability holds under non-collusion of relay and gateway. It explicitly does not protect against traffic analysis, timing correlation, or a colluding relay-gateway pair — the RFC itself notes that Prio and Tor provide stronger guarantees at higher cost. The supporting stack includes RFC 9180 (HPKE), RFC 9292 (Binary HTTP), and RFC 9540 (DNS-based key discovery).
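The knowledge separation is the whole point, and it can be modeled in a few lines. This sketch uses a hash-derived XOR keystream as a loud stand-in for HPKE; it is not real cryptography, only an illustration of what each party can see:

```python
import hashlib

def toy_seal(key: bytes, message: bytes) -> bytes:
    """Stand-in for HPKE seal/open: XOR against a SHAKE-derived keystream.
    NOT real cryptography; real OHTTP uses HPKE (RFC 9180) here."""
    stream = hashlib.shake_256(key).digest(len(message))
    return bytes(a ^ b for a, b in zip(message, stream))

toy_open = toy_seal  # XOR with the same keystream is its own inverse

GATEWAY_KEY = b"gateway-key-config"  # discovered via RFC 9540 in practice

# Client: encrypt the request for the gateway, hand the blob to the relay.
request = b"GET /telemetry-config"
blob = toy_seal(GATEWAY_KEY, request)

# Relay's view: the client's IP and an opaque blob, never the plaintext.
relay_view = {"client_ip": "203.0.113.7", "payload": blob}

# Gateway's view: the plaintext request and the relay's IP, never the client's.
gateway_view = {"peer": "relay.example",
                "request": toy_open(GATEWAY_KEY, relay_view["payload"])}
assert gateway_view["request"] == request and blob != request
```

The non-collusion assumption falls directly out of the two dictionaries: joining `relay_view` and `gateway_view` on the same transaction reconstructs (client IP, plaintext), which is exactly what a colluding relay-gateway pair gets.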

Within two years of the RFC, OHTTP has become the de facto network-layer privacy primitive for telemetry. Cloudflare's Privacy Gateway (2022) was the first commercial relay; the health app Flo uses it for its "Anonymous Mode." Mozilla's Firefox routes some telemetry and search-suggestion queries through Fastly's OHTTP relay paired with Divvi Up as the DAP aggregator (October 2023). Google's Enhanced Safe Browsing since March 2024 routes URL-hash-prefix lookups through a Fastly-operated OHTTP relay, protecting about 5 billion devices and 10 billion URLs per day — likely the largest OHTTP deployment in production. Apple's Private Cloud Compute (June 2024, powering Apple Intelligence cloud inference) routes requests through a third-party-operated OHTTP relay with RSA-Blind-Signature-based single-use tokens decoupling authorization from identity.

Apple iCloud Private Relay (iOS 15+, October 2021) predates the RFC and uses MASQUE over QUIC rather than OHTTP encapsulation, but achieves the same non-collusion property: Apple operates the ingress proxy, Cloudflare, Fastly, and Akamai operate egress proxies; Safari HTTP/HTTPS traffic (and opt-in NSURLSession traffic on iOS 17+) is protected, WebRTC and BSD-socket traffic is not. A 2023 TUM/APNIC measurement study (Sattler et al.) found that Akamai's AS36183 appears on both ingress and egress paths in some configurations, raising a concrete concern about single-operator observation — a reminder that non-collusion is only as strong as the operator diversity that actually ships.

Tor remains the strongest-guarantee deployed anonymity network (3-hop onion routing, ~4M daily users, BSD-licensed), but its latency and volunteer-relay bandwidth make it unsuitable for billion-scale telemetry, and its threat model explicitly excludes global passive adversaries. Mixnets — Chaum 1981 in lineage, Loopix (Piotrowska, Hayes, Elahi, Meiser, Danezis, USENIX Security 2017) in modern design, Nym Technologies' mainnet in deployment — provide unobservability against a global adversary via Sphinx packets, Poisson-mixing delays, and cover traffic. Nym's 2021 mainnet launch and 2024–25 NymVPN product ship real traffic, but the latency (seconds to minutes for high-anonymity modes) and bandwidth cost of cover traffic rule mixnets out of telemetry use cases. Katzenpost and Meson are academic and cryptocurrency-focused descendants at testnet scale.

Deployed system survey, part 6: anonymous credentials and Privacy Pass

Privacy Pass — Davidson, Goldberg, Sullivan, Tankersley, Valsorda, "Privacy Pass: Bypassing Internet Challenges Anonymously," PoPETs 2018 — was standardized in June 2024 as RFC 9576 (architecture), RFC 9577 (HTTP authentication scheme), and RFC 9578 (issuance protocols), with RFC 9474 (RSA Blind Signatures) and RFC 9497 (VOPRF) as crypto building blocks. The protocol lets a client prove "I am authorized" or "I am not a bot" at an Origin without the Origin being able to link the token to its issuance context. Issuance and redemption are cryptographically unlinkable under VOPRF or RSA blind-signature hardness plus non-collusion between Issuer and Origin.
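The unlinkability comes from the blinding factor. A textbook RSA blind signature with deliberately toy parameters (RFC 9474 specifies RSA-PSS padding and real key sizes; never use numbers like these in practice):

```python
import secrets
from math import gcd

p, q = 1009, 1013                 # toy primes
n, e = p * q, 65537
d = pow(e, -1, (p - 1) * (q - 1))

m = 123456 % n                    # in real life, a hash of the token

# Client blinds: m' = m · r^e mod n, for random r coprime to n.
while True:
    r = secrets.randbelow(n)
    if r > 1 and gcd(r, n) == 1:
        break
blinded = (m * pow(r, e, n)) % n

# Issuer signs the blinded message without ever learning m:
blind_sig = pow(blinded, d, n)    # (m · r^e)^d = m^d · r mod n

# Client unblinds: s = s' · r^(-1) mod n, leaving a standard signature on m.
sig = (blind_sig * pow(r, -1, n)) % n
assert pow(sig, e, n) == m        # verifies like any RSA signature
```

The Issuer's view at issuance time is only `blinded`, which is uniformly random given `r`; at redemption the Origin sees `(m, sig)` with no trace of `r`, so the two events are cryptographically unlinkable as long as Issuer and Origin do not collude on side channels.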

Cloudflare's 2017 browser extension was the first deployment (trade one CAPTCHA for 30 tokens). Apple's Private Access Tokens (iOS 16, September 2022) put Apple in the Attester role, verifying device genuineness and account status, with Cloudflare and Fastly as Issuers using RSA Blind Signatures. Chrome's Private State Tokens (renamed from Trust Tokens, 2023) use VOPRF for cross-site anti-fraud signaling without cross-site tracking. Kagi Search uses Privacy Pass for unlinkable subscription authentication. Ongoing IETF work covers rate-limited tokens, batched issuance, and Anonymous Rate-Limited Credentials (ARC).

The broader anonymous-credentials family — CL signatures, U-Prove, Idemix, IRMA on the legacy side, and BBS+ signatures (being standardized by IETF CFRG, adopted in W3C Verifiable Credentials, eIDAS 2.0, Hyperledger AnonCreds v2) on the modern side — provides multi-show presentation unlinkability with selective disclosure. These are the primitives that let "one submission per device per day" work without linking the submissions. Direct Anonymous Attestation (DAA/EPID) ships in billions of TPMs today, making it arguably the most widely deployed anonymous credential system by volume, though it is rarely marketed as such.

A legitimate concern flagged by Eric Rescorla and others: Privacy Pass entrenches gatekeeper Attesters (Apple for iOS, Google for Chrome, Cloudflare at the edge). The cryptographic guarantee is real; the web-openness tradeoff is real too.

Deployed system survey, part 7: TEEs and homomorphic encryption

Trusted Execution Environments — Intel SGX, Intel TDX, AMD SEV-SNP, AWS Nitro Enclaves, ARM CCA — provide a "trust primitive" rather than a formal privacy guarantee. A TEE claims that code running inside can attest its identity to a remote verifier and process data that neither the host OS, hypervisor, nor cloud operator can read. Privacy systems using TEEs include Google's Privacy Sandbox Aggregation Service, Signal's SGX-based contact discovery, Apple's Private Cloud Compute (using Apple Silicon with custom Secure Enclaves), Brave's STAR randomness server, and Prochlo's oblivious shuffler. The attraction is a "distributed trusted curator" — the TEE approximates the central-DP curator without any single party being trusted.

The concerns are serious. The transient-execution family (Spectre, Meltdown, Foreshadow CVE-2018-3615, Downfall, SGAxe, ÆPIC Leak, Plundervolt) has repeatedly broken SGX isolation. Attestation roots trust in the silicon vendor. The defense-in-depth argument is that a TEE compromise plus server compromise plus non-collusion failure is still hard. The honest statement is that TEEs are a real but imperfect primitive, appropriate as one layer in a multi-layer privacy system, not as a sole guarantee.

Homomorphic encryption, despite two decades of research and mature libraries (Microsoft SEAL, HElib, OpenFHE), is rarely deployed because of performance cost. The notable exceptions are Apple's 2024 rollouts using the BFV scheme: Live Caller ID Lookup in iOS 18 does Private Information Retrieval against a server database so the server learns neither the phone number nor the lookup result; and Enhanced Visual Search for Photos uses a Private Nearest Neighbor Search protocol called Wally (Apple's swift-homomorphic-encryption library, arXiv:2406.06761) to match image embeddings against a landmark database without revealing the embedding. Apple reports post-quantum 128-bit security via BFV parameters grounded in Ring-LWE hardness. These are interesting proofs of concept for HE-at-scale but remain narrow use cases; general telemetry through HE is still impractical.

Zero-knowledge proofs show up in privacy-preserving telemetry primarily as integrity, not statistical privacy: Prio's SNIPs validate that a client's input is well-formed; VDAF generalizes this as verifiability; Boneh et al.'s "Zero-Knowledge Proofs on Secret-Shared Data via Fully Linear PCPs" (CRYPTO 2019) is the theoretical basis for Prio3. Privacy Pass uses VOPRFs and blind signatures, which are ZK-adjacent. SNARKs and Bulletproofs dominate blockchain privacy but have not meaningfully entered telemetry pipelines.

Exposure Notifications: architectural privacy without DP

The Google/Apple Exposure Notifications (GAEN) framework deployed April 2020 is a separate case study — it provides architectural privacy (no location, no persistent identifiers, no cross-user graph) without claiming a DP-style formal guarantee. Devices rotate Temporary Exposure Keys every 24 hours, derive Rolling Proximity Identifiers every 10–20 minutes, and broadcast them via BLE. On positive diagnosis, users upload their 14-day TEK history to a health-authority backend, which publishes signed key exports that other devices download and match locally.
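The shape of the key schedule can be sketched as follows; HMAC-SHA256 stands in for the spec's HKDF and AES-128 derivations, so this shows the structure, not the actual algorithm:

```python
import hmac, hashlib, secrets

def derive_rpis(tek: bytes, day_start_interval: int):
    """Sketch of GAEN's key schedule shape: one 16-byte Temporary Exposure
    Key per day, one Rolling Proximity Identifier per 10-minute interval
    (144 per day). The real spec derives an RPIK via HKDF and encrypts the
    interval number with AES-128; HMAC-SHA256 here is a stand-in."""
    rpik = hmac.new(tek, b"EN-RPIK", hashlib.sha256).digest()[:16]
    return [
        hmac.new(rpik, i.to_bytes(4, "little"), hashlib.sha256).digest()[:16]
        for i in range(day_start_interval, day_start_interval + 144)
    ]

tek = secrets.token_bytes(16)  # rotated every 24 hours
rpis = derive_rpis(tek, day_start_interval=2_870_000)
print(len(rpis), len(rpis[0]))  # 144 16

# After a positive diagnosis the TEK itself is published, so anyone can
# re-derive all 144 RPIs — the post-upload linkability the attack analyses exploit.
assert derive_rpis(tek, 2_870_000) == rpis
```

The derivation being one-way is what makes pre-upload broadcasts unlinkable; the derivation being deterministic from the TEK is what makes post-upload broadcasts fully linkable. Both properties are visible in the dozen lines above.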

GAEN shipped in 30+ countries; peak European Federation Gateway traffic reached 700,000+ keys in a single day in March 2022, and cumulative downloads hit 206M before decommissioning by early 2023. Academic scrutiny found that the core cryptographic claims largely hold, but the system suffered real implementation failures. AppCensus's Joel Reardon documented in April 2021 that Android's GAEN implementation logged RPIs to system logs readable by hundreds of pre-installed apps. Leith and Farrell at Trinity College Dublin showed that Google Play Services leaked IMEI and SIM metadata alongside GAEN use. And post-upload TEK publication makes an infected user's entire 14-day beacon trail linkable by anyone with passive BLE receivers, as the DP-3T consortium's Paparazzi and Orwell attack analyses detailed. GAEN is a useful reminder that architectural privacy claims are only as strong as the implementation, and that cryptographic unlinkability does not imply side-channel resistance.

Comparison matrix across deployed systems

| System | Guarantee family | Parameter | Non-collusion / trust assumption | Auditability | Open source | Scale | Known issues |
|---|---|---|---|---|---|---|---|
| RAPPOR (Chrome, 2014) | ε-LDP | ε∞ = ln(3) ≈ 1.1; per-report ε₁ ≈ 0.67 | None (pure LDP) | Paper + code | Yes (Apache) | Hundreds of millions of clients | Deprecated; suboptimal vs OUE; longitudinal bound only if value static |
| Apple LDP (iOS/macOS, 2016+) | ε-LDP | Per-feature ε = 2–8; daily ε ≈ 14–16; lifetime unbounded | None (pure LDP) | RE-only (Tang et al. 2017) | No | All iOS/macOS devices with telemetry opt-in | No lifetime budget; ε-values not cryptographically enforced; opaque RNG |
| Prochlo/ESA (Google, 2017) | Shuffle-DP | Amplified per-deployment | SGX shuffler integrity | Paper | Partial | Internal | Seeded later systems; SGX attack surface |
| GAEN (Apple/Google, 2020) | Architectural (no identifier) | None formal | Health-authority honesty | Spec public; system code closed | Partial (apps yes, framework no) | 30+ countries; 200M+ installs | Post-upload TEK linkability; GMS log leakage |
| ENPA (GAEN analytics, 2020–22) | Prio MPC + (ε,δ)-LDP + shuffle amplification | Small single-digit ε per metric | ISRG vs NIH/NCI; Apple/Google can't decrypt | Paper + open code | Yes (Apache) | 12B+ metrics by late 2022 | Wound down with EN apps |
| Firefox Origin Telemetry (2018–19) | Prio MPC | No DP | Violated in pilot — Mozilla ran both servers | Acknowledged | Yes (MPL-2.0) | 3M values, Nightly at 1% | Pilot had no real non-collusion; superseded by DAP |
| iCloud Private Relay (2021+) | Unlinkability (IP↔destination) | None formal | Apple + Cloudflare/Fastly/Akamai | Spec only | No | Tens–hundreds of millions (iCloud+) | AS36183 appears both sides in some paths; disabled in some jurisdictions |
| Brave P3A STAR/Constellation (2023+) | Cryptographic κ-anonymity + (Nebula: ε=1 DP) | κ = 50 per layer; Nebula p=0.105, ε=1, δ=10⁻⁸ | Randomness server vs aggregation server; Nitro Enclave adds hardware | Open source + Nitro attestation | Yes (MPL-2.0) | All default-on Brave users since 1.62 | Leakage from undecodable-tag counts; trusts Brave infra |
| Privacy Sandbox Aggregation Service (Chrome/Android, 2023+) | (ε,δ)-DP in TEE | L1 = 2¹⁶; ε up to 64 per-source | Google + Accenture coordinators; Nitro/GCP CS TEEs | Reproducible builds, attestation | Yes (Apache) | Chrome, billions of users | Per-source not per-user; ε too loose for academic norms; Sandbox scope reduced Oct 2025 |
| Firefox PPA (Firefox 128, 2024) | DAP + (ε,δ)-DP | ε = 1.0 default | Mozilla + ISRG | Open source | Yes (MPL-2.0) | Never activated, removed | Default-on triggered noyb GDPR complaint |
| Divvi Up / DAP (ISRG, 2022+) | Prio3/Poplar1 MPC + optional DP | Per-task; PPA=1.0 | Leader + Helper (e.g., Mozilla + ISRG) | Open source | Yes (MPL-2.0) | "Billions of contributions" per USENIX PEPR 2026 | DAP still Internet-Draft after 16 revisions |
| Chrome Safe Browsing OHTTP (2024) | Unlinkability (URL hash ↔ IP) | None (k-anonymity in 4-byte prefixes) | Google + Fastly | Public API | Chromium yes | 5B devices, 10B URLs/day | Timing side-channels; 4-byte prefix k-anon is weak |
| Apple Private Cloud Compute (2024) | Confidential computing + attestation + OHTTP relay | None formal | Third-party relay + Apple Silicon attestation | Transparency log + VRE | Partial | All Apple Intelligence cloud inference | 3rd-party AI (ChatGPT handoff) not covered; no full OS open source |
| Privacy Pass / PAT (RFCs 9576–78, 2024) | Unlinkability (issue↔redeem) | None | Issuer + Origin; VOPRF/RSA-blind hardness | Standardized RFCs | Yes | 100s of millions (Apple PAT, Chrome PST) | Entrenches gatekeeper attesters |
| 2020 US Census DAS | Central zCDP | ε = 19.61 total; persons file ε = 17.14, housing ε = 2.47; δ = 10⁻¹⁰ | Trusted curator (Census Bureau) | Open source + published params | Yes | Entire US population | ε too loose by academic norms; accuracy impact on small/minority populations |
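The ε∞ = ln(3) in the RAPPOR row traces back to classic randomized response: with a truth probability of 3/4, no single report shifts an adversary's odds by more than a factor of 3. The sketch below is the simplified binary mechanism with its unbiased estimator — not RAPPOR's Bloom-filter encoding, but the same ε-LDP primitive underneath it.

```python
import math, random

# Simplified binary randomized response — not RAPPOR's Bloom-filter
# encoding, but the eps-LDP mechanism underlying its ln(3) bound.
def rr(bit, eps):
    p_truth = math.exp(eps) / (1 + math.exp(eps))
    return bit if random.random() < p_truth else 1 - bit

def estimate(reports, eps):
    """Unbiased population-frequency estimate from noisy reports."""
    p = math.exp(eps) / (1 + math.exp(eps))
    mean = sum(reports) / len(reports)
    return (mean - (1 - p)) / (2 * p - 1)

random.seed(0)
eps = math.log(3)  # truth probability 3/4, as in RAPPOR's eps_inf
true_rate = 0.3
reports = [rr(1 if random.random() < true_rate else 0, eps)
           for _ in range(200_000)]
est = estimate(reports, eps)
assert abs(est - true_rate) < 0.01  # accurate in aggregate only
```

Note the trade visible here: any individual report is 25% likely to be a lie, so it proves little about one user, while the aggregate converges — and the noise floor is why low-frequency values (the long tail) are unrecoverable under LDP.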

How to parse a vendor privacy claim into a formal rung

After this tour, three heuristic questions will get you surprisingly far when a vendor announces a new "privacy-preserving" feature.

First, what Pfitzmann-Hansen property is actually at stake? If the claim is about IP stripping or hashing, that's pseudonymity at best. If it's about aggregation cutoffs, that's a k-anonymity-flavored cohort guarantee. If it's about formal indistinguishability of my record from my neighbor's, that's DP. If it's about my traffic looking like everyone else's, that's network unlinkability or unobservability.

Second, what is the parameter, per what unit, over what time horizon? ε = 1 per event for a one-time release is strong; ε = 1 per event for a billion events is meaningless. The Apple lifetime-budget critique applies to any LDP system that resets its budget periodically. The Privacy Sandbox ε = 64 per-source cap with rate limits approximating a per-user bound is a specific formulation that deserves specific scrutiny. The Census's ε = 19.61 total for a one-time release is a different beast from a per-query ε.
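The budget arithmetic behind this heuristic is worth making concrete. Under naive sequential composition, per-event epsilons simply add; the advanced-composition bound trades a small δ for square-root-of-k growth, but only helps meaningfully in the small-ε regime. Illustrative numbers only, not any vendor's actual accounting:

```python
import math

eps_per_event = 1.0
events_per_day = 10
days = 365
k = events_per_day * days  # 3650 releases over one year

# Naive sequential composition: epsilons add linearly.
naive_eps = k * eps_per_event  # 3650 — no meaningful privacy left

# First term of the advanced-composition bound (Dwork-Rothblum-Vadhan):
# eps' ~ eps*sqrt(2k ln(1/delta)) + k*eps*(e^eps - 1). At eps = 1 the
# second term dominates, so this first term is only indicative here.
delta = 1e-9
adv_first_term = eps_per_event * math.sqrt(2 * k * math.log(1 / delta))

assert naive_eps == 3650.0
assert adv_first_term < naive_eps  # ~389: better, still far from eps = 1
```

This is the arithmetic Apple's daily-reset budgets avoid doing in public: a "per-day ε" with no lifetime cap is the `naive_eps` row in disguise.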

Third, what is the non-collusion or trust assumption, and is it actually met in deployment? Two aggregators run by the same organization is not a non-collusion guarantee (Mozilla's own Origin Telemetry pilot was candid about this). A TEE's guarantee is no stronger than the vendor's silicon. A two-hop relay architecture where the same CDN provider appears on both hops (as measurement found for some iCloud Private Relay paths) is a single-operator architecture in non-collusion clothing.

Emerging domains: LLM telemetry

The next frontier, only lightly explored in deployed systems, is LLM conversational telemetry. Prompts are high-entropy, multi-kilobyte, and often contain PII by construction — they are the opposite of the low-dimensional categorical frequencies that RAPPOR and Apple LDP were designed for. Apple's Private Cloud Compute is a confidential-computing + attestation answer for inference itself, not for telemetry about it. Apple's Wally / PNNS uses HE to avoid revealing embeddings for search. Differential privacy for LLM training data (DP-SGD, Abadi et al.) is a separate, well-established thread, but deployed privacy for the telemetry of LLM usage — which prompts work, which completions users edit, which features get used — has no mature formal answer. Expect the existing stack (OHTTP + Privacy Pass for rate-limiting + DAP for aggregates) to be stitched together for this use case over 2026–2027, with all the attendant composition caveats.

Conclusion: a small set of durable lessons

The landscape surveyed here is the result of two decades of learning, much of it hard. The biggest lesson is that formal guarantees are not self-executing — a correct mechanism deployed with sloppy composition bookkeeping (Apple's lifetime ε), with violated non-collusion (Mozilla's single-operator Prio pilot), with loose parameters chosen for business reasons (Privacy Sandbox's ε = 64), or with side-channel leakage (GAEN on Android) produces far less privacy than the paper promises. The second lesson is that guarantee families genuinely differ in strength and assumptions: LDP needs no trusted party but demands a loose ε for usable accuracy; two-server MPC is strong under non-collusion; TEEs collapse trust to the silicon vendor; mixnets give the strongest guarantees but no one uses them for telemetry because the latency is prohibitive; threshold aggregation gives clean semantics for popular values but nothing for the long tail.

The third lesson is that the honest deployments are the ones that document their assumptions — Census parameters are public, Divvi Up's code is on GitHub, Brave's STAR κ is in the wiki, the STAR paper acknowledges its leakage channel, Mozilla publicly noted that its two-server Prio pilot was not really two-party. The dishonest deployments share a common tell: the marketing says "anonymous" and the FAQ does not mention ε, the adversary model, the non-collusion assumption, or the composition behavior. A reader who internalizes the vocabulary of this survey — LDP, shuffle-DP, central DP, MPC with non-collusion, TEE attestation, OHTTP unlinkability, Privacy Pass unlinkability, threshold aggregation, the Pfitzmann-Hansen hierarchy — has most of what they need to ask the right next question of any new system.

The field is still young, the standards (DAP, VDAF) are still drafts, and the deployed parameters are still frequently embarrassing when reverse-engineered. The trajectory is nonetheless encouraging: OHTTP went from RFC to Safe-Browsing-scale deployment in under two years; ISRG's Divvi Up now processes billions of contributions; Apple ships homomorphic encryption in a consumer OS; the 2020 Census survived formal-privacy deployment at civilization scale. The remaining hard problems — honest lifetime accounting, independent audits, open implementations, and privacy for the high-dimensional data that now dominates — are tractable, if the industry chooses tractability over marketing.

xpe commented Apr 23, 2026

Comments welcome. I'm reading through this. No guarantees -- provenance shown above (Claude Opus 4.7 adaptive, web, research mode)