@xpe
Last active April 2, 2026 13:54
Drop “Indexed By” From the GP Definition

“Indexed by $X$” is unnecessary when defining a Gaussian process (GP)

One definition

A Gaussian process is a collection of random variables $\{f(x)\}_{x \in X}$, indexed by a set $X$, such that every finite subcollection has a joint Gaussian distribution.

The same definition, without “indexed by $X$”

A Gaussian process is a collection of random variables $\{f(x)\}_{x \in X}$ such that every finite subcollection has a joint Gaussian distribution.

Nothing was lost. The notation $\{f(x)\}_{x \in X}$ already says “for each $x \in X$, there is a random variable $f(x)$.” The phrase “indexed by a set $X$” restates this in words, adding no information.
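
Spelled out, the condition “every finite subcollection has a joint Gaussian distribution” can be written as follows. The mean function $m$, the vector $\mu$, and the matrix $K$ are notation assumed here for illustration; $k$ is the kernel (covariance function).

$$
\bigl(f(x_1), \dots, f(x_n)\bigr) \sim \mathcal{N}(\mu, K), \qquad \mu_i = m(x_i), \qquad K_{ij} = k(x_i, x_j)
$$

for every $n \ge 1$ and every choice of points $x_1, \dots, x_n \in X$. The subscripts $1, \dots, n$ only label the chosen points; they impose no order on $X$ itself.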

Why this matters

The phrase is not just redundant; it is actively misleading for the audience most likely to encounter this definition for the first time.

“Indexed” has a formal meaning in mathematics: an indexed family is a function from an index set, which can be any set at all, into some collection. But ML practitioners, applied statisticians, and engineers are unlikely to have that formal concept loaded. They will read “indexed” in its colloquial sense: sequential enumeration. First, second, third.

That interpretation is wrong. The kernel $k(x, x')$ is a pairwise relation; the elements of $X$ have no ordering that matters. But someone encountering GPs for the first time has no way to know this. They either (a) detour into looking up what “indexed family” means formally, or (b) proceed with the wrong mental model.

Both outcomes are strictly worse than just not including the phrase.
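
To make the “no ordering that matters” point concrete, here is a minimal sketch. It assumes a squared-exponential kernel and four arbitrary points in the plane; both are illustrative choices, and the names `rbf_kernel` and `gram_matrix` are hypothetical helpers, not part of any library. Listing the same points in a different order only permutes the rows and columns of the covariance matrix together, so the covariance between any two given points is unchanged.

```python
import math

def rbf_kernel(a, b, lengthscale=1.0):
    """Squared-exponential kernel between two points (an assumed choice of k)."""
    sq_dist = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-0.5 * sq_dist / lengthscale**2)

def gram_matrix(points):
    """Covariance matrix with K[i][j] = k(points[i], points[j])."""
    return [[rbf_kernel(p, q) for q in points] for p in points]

# Points in the plane: an index set with no natural "first, second, third".
points = [(0.0, 0.0), (1.0, 2.0), (-0.5, 0.3), (2.0, -1.0)]
K = gram_matrix(points)

# List the same points in a different order and rebuild the matrix.
perm = [2, 0, 3, 1]
shuffled = [points[i] for i in perm]
K_shuffled = gram_matrix(shuffled)

# Apply the same relabeling directly to the rows and columns of K.
K_relabeled = [[K[i][j] for j in perm] for i in perm]

n = len(points)
same_up_to_relabeling = all(
    math.isclose(K_shuffled[a][b], K_relabeled[a][b])
    for a in range(n)
    for b in range(n)
)
print(same_up_to_relabeling)
```

The list positions are bookkeeping for the code, not structure of $X$: the kernel only ever sees a pair of points.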

The tradeoff

The standard phrasing is written for an audience that already knows what an indexed family is, which is exactly the audience that does not need the definition. The audience that does need the definition is the one most likely to be confused by it.

That is a bad tradeoff, and it persists because conventions replicate without re-examination.


xpe commented Apr 2, 2026

This was written by Claude Opus 4.6 Extended. That said, the underlying frustration is mine; finding good teaching materials about GPs is hard.
