- Setting: we have some theorem goal $g$, a dataset of mathematical lemmas $D$, a set of actually useful lemmas $A$, and a set of predicted lemmas $P$.
- We want to provide a good measure of how "good" $P$ is with respect to the ground truth $A$.
- We begin by defining precision, which is the fraction of $P$ that was correct: $|P \cap A|/|P|$. Probabilistically, this can be seen as $P(\text{actual} \mid \text{predicted})$.
- Similarly, we define recall as the fraction of $A$ that was correctly predicted: $|P \cap A|/|A|$. Probabilistically, this is $P(\text{predicted} \mid \text{actual})$.
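To make this concrete, here is a minimal Python sketch (the lemma names are made up for illustration):

```python
# Precision: fraction of the predicted set that was correct, P(actual | predicted).
def precision(P: set, A: set) -> float:
    return len(P & A) / len(P)

# Recall: fraction of the actually useful set that was predicted, P(predicted | actual).
def recall(P: set, A: set) -> float:
    return len(P & A) / len(A)

A = {"lemma_a", "lemma_b", "lemma_c"}             # actually useful lemmas
P = {"lemma_a", "lemma_b", "lemma_x", "lemma_y"}  # predicted lemmas
print(precision(P, A))  # 2/4 = 0.5
print(recall(P, A))     # 2/3 ≈ 0.667
```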
- Let us now change the setting a little, where we swap the set $P$ for a *sequence* (a ranking) over the full universe of lemmas $D$.
- We can get a set back by truncating $P$ at some threshold. So we will define `precision@k` to be the precision of the set $P[:k]$.
- We note that recall as a function of $k$, `recall@k`, is non-decreasing: as we see more predictions, we can only accumulate more actually useful things. Note that recall has a fixed denominator (the number of actually useful things).
- Since `recall@k` is non-decreasing, we can build an inverse, `k@recall(r)`: for a given level of recall $r$, we map to the smallest $k$ (the smallest number of items we need to take from the ranking) that achieves that level of recall.
- This lets us define a precision-recall function: `precision@recall(r) = precision@k(k@recall(r))`.
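A small sketch of these functions, assuming the ranking is a Python list and the ground truth is a set (the function names are mine, chosen to mirror the notation above):

```python
def precision_at_k(P: list, A: set, k: int) -> float:
    # Precision of the truncated set P[:k].
    return len(set(P[:k]) & A) / k

def recall_at_k(P: list, A: set, k: int) -> float:
    # Recall of the truncated set P[:k]; note the fixed denominator |A|.
    return len(set(P[:k]) & A) / len(A)

def k_at_recall(P: list, A: set, r: float) -> int:
    # Inverse of recall@k: the smallest k achieving recall level r.
    for k in range(1, len(P) + 1):
        if recall_at_k(P, A, k) >= r:
            return k
    raise ValueError(f"recall level {r} is never reached")

def precision_at_recall(P: list, A: set, r: float) -> float:
    return precision_at_k(P, A, k_at_recall(P, A, r))
```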
- Suppose for a given goal $g$, we have 3 correct premises `a, b, c`. The universe has premises `a, b, c, x, y, z`. Our model predicts premises in the ranking `x, a, y, b, c, z`. Then we summarize this as follows:
| Rank | Val | Prec | Rec |
|------|-----|------|-----|
| 1    | x   | 0/1  | 0/3 |
| 2    | a   | 1/2  | 1/3 |
| 3    | y   | 1/3  | 1/3 |
| 4    | b   | 2/4  | 2/3 |
| 5    | c   | 3/5  | 3/3 |
| 6    | z   | 3/6  | 3/3 |
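The table can be reproduced mechanically; a quick sketch:

```python
A = {"a", "b", "c"}
P = ["x", "a", "y", "b", "c", "z"]

hits = 0
for k, val in enumerate(P, start=1):
    hits += val in A
    print(f"{k} | {val} | {hits}/{k} | {hits}/{len(A)}")
# 1 | x | 0/1 | 0/3
# 2 | a | 1/2 | 1/3
# ... (matches the table above)
```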
- Note that recall is monotonically non-decreasing, while precision both increases (`0/1 -> 1/2`) and decreases (`3/5 -> 3/6`).
- We introduce an auxiliary indicator function delta, $\delta(i) \equiv [\text{the lemma at rank } i \text{ is correct}]$. This lets us write the above quantities as follows:
- Let $s(n) \equiv \sum_{i=1}^n \delta(i)$.
- The total number of correct elements is $s(N)$, where $N$ is the total number of ranked premises (the size of the dataset).
- The precision at $k$ is given by $p(k) \equiv s(k)/k$. The recall at $k$ is given by $r(k) \equiv s(k)/s(N)$.
- Now note that the discrete difference $dr(k) = r(k) - r(k-1)$ equals $(s(k) - s(k-1))/s(N)$, which is $\delta(k)/s(N)$.
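In code, $\delta$, $s$, $p$, $r$, and the discrete difference look like this (a sketch, 1-indexed to match the ranks in the table):

```python
A = {"a", "b", "c"}
P = ["x", "a", "y", "b", "c", "z"]
N = len(P)

def delta(i: int) -> int:  # 1 if the lemma at rank i is correct, else 0
    return int(P[i - 1] in A)

def s(n: int) -> int:      # number of correct lemmas among the top n
    return sum(delta(i) for i in range(1, n + 1))

def p(k: int) -> float:    # precision@k
    return s(k) / k

def r(k: int) -> float:    # recall@k
    return s(k) / s(N)

# The discrete difference r(k) - r(k-1) equals delta(k)/s(N) at every rank.
assert all(abs((r(k) - r(k - 1)) - delta(k) / s(N)) < 1e-12
           for k in range(1, N + 1))
```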
- The best the `precision@recall` function can be is a flat line with `precision = 1` for all levels of recall.
- Deviation from this tells us how much worse our model is than the best model.
- So, let's compute the area under the `precision@recall` curve. This is going to be the average precision, $ap \equiv \int_{r=0}^1 p(r)\, dr$.
- Recall that the "best" precision will always have $p(r) = 1$. Thus the theoretical maximum of this value is $ap = \int_{r=0}^1 1\, dr = 1$. This gives us a good scale, where $0 \leq ap \leq 1$.
- We use recall as the integration variable to "standardize" across the size of the dataset.
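A sketch of this integral on the running example: recall only moves on the grid $j/s(N)$ (and here $s(N) = |A|$, because the ranking covers the whole universe), so the area is a sum of $s(N)$ rectangles of width $1/s(N)$.

```python
A = {"a", "b", "c"}
P = ["x", "a", "y", "b", "c", "z"]

def prec_at(k):      # precision of the top-k
    return sum(v in A for v in P[:k]) / k

def k_at_recall(j):  # smallest k whose top-k contains j correct lemmas
    hits = 0
    for k, v in enumerate(P, start=1):
        hits += v in A
        if hits >= j:
            return k

# Area under precision@recall: one rectangle of width 1/|A| per recall step.
ap = sum(prec_at(k_at_recall(j)) for j in range(1, len(A) + 1)) / len(A)
print(ap)  # (1/2 + 2/4 + 3/5) / 3 ≈ 0.533
```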
- Let's change variables to $k$. So we want to change $r$ into $r(k)$.
- This will change the bounds of integration.
- The lower limit is given by $0 = r(l)$, which solves to $l = 0$.
- The upper limit is given by $1 = r(u)$, which solves to $u = N$ (the size of the dataset).
- This also changes $dr$ to $dr(k)\, dk$.
- In the discrete case, we set $dk = 1$, and $dr(k)$ becomes $r(k) - r(k-1)$. This is $\sum_{i=1}^k \delta(i)/s(N) - \sum_{i=1}^{k-1} \delta(i)/s(N)$, which evaluates to $\delta(k)/s(N)$.
- This gives us the discrete calculation of $ap$ to be $ap \equiv \sum_{k=1}^N p(k)\, \delta(k)/s(N)$.
- This recovers the formula from Wikipedia.
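A direct transcription of the discrete formula (a sketch on the running example; it agrees with the area computed above):

```python
A = {"a", "b", "c"}
P = ["x", "a", "y", "b", "c", "z"]
s_N = sum(v in A for v in P)  # s(N): total number of correct lemmas

# ap = sum_{k=1}^{N} p(k) * delta(k) / s(N)
ap, hits = 0.0, 0
for k, v in enumerate(P, start=1):
    if v in A:                # delta(k) = 1
        hits += 1             # hits = s(k)
        ap += (hits / k) / s_N
print(ap)  # ≈ 0.533, same as before
```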
- I believe $ap@K$ to be an incoherent concept; recall that we chose to define average precision as the area under the precision-recall curve. This is a sensible quantity, because it's a normalized value (recall is between $0$ and $1$). We got the expression in terms of $k$ via a change of variables. We did not start at $k$! We started from $r$ and got to $k$.
- We can try to hack the expression for $ap$ to artificially create $ap@K$. Let's try.
- First, to go from $K$ to $r$, we find a number $r_K$ such that the recall at $K$ is $r_K$, i.e. $r(K) = r_K$.
- Next, we calculate $ap@K \equiv \int_0^{r_K} p(r)\, dr$.
- We must now find new lower and upper bounds in terms of $k$.
- The lower bound needs us to solve $0 = r(l)$, giving $l = 0$.
- The upper bound needs us to solve $r_K = r(u)$, giving $u = K$.
- We will next have to calculate $dr(k)\, dk$. Previously, we set $dk = 1$, and we calculated $dr(k) \equiv r(k) - r(k-1)$, which gives $dr(k) \equiv \delta(k)/s(N)$. But note that $s(N)$ counts correct documents over the full ranking, not just within the top-$K$. So let's use $s(K)$ instead --- a hack!
- Combining these, we get the formula $ap@K \equiv \int_{k=0}^{K} p(r(k))\, dr(k) = \sum_{k=1}^K p(k)\, \delta(k) / s(K)$.
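Transcribing this hacked formula (a sketch; note the denominator is $s(K)$, the number of correct items inside the top-$K$, which is exactly the hack):

```python
A = {"a", "b", "c"}
P = ["x", "a", "y", "b", "c", "z"]

def ap_at_K(K: int) -> float:
    ap, hits = 0.0, 0
    for k, v in enumerate(P[:K], start=1):
        if v in A:            # delta(k) = 1
            hits += 1         # hits = s(k), ending at s(K)
            ap += hits / k    # accumulate p(k)
    return ap / hits if hits else 0.0  # divide by s(K) --- the hack

print(ap_at_K(4))  # (1/2 + 2/4) / 2 = 0.5
print(ap_at_K(6))  # equals the full ap ≈ 0.533 when K = N
```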