@Jach
Created August 24, 2025 03:35
Notes on using Prometheus metrics in Common Lisp with the library https://github.com/deadtrickster/prometheus.cl
(ql:quickload "prometheus")
(ql:quickload "prometheus.formats.text")
(defpackage #:metrics-example
  (:use #:cl))
(in-package #:metrics-example)
; ensure we have an empty registry:
(setf prometheus:*default-registry* (prometheus:make-registry))
; metrics are stored in registries. For this example, we'll just use the default registry for everything.
; marshal is used to output all the metric data in a format prometheus expects, often exposed on /metrics
; it is essentially how metrics are "sampled", measured, or exposed.
(prometheus.formats.text:marshal)
(defvar some-gauge (prometheus:make-gauge :name "hello_gauge"
                                          :help "Some gauge"
                                          :value 3))
; make-gauge registered the new gauge in the default registry, so it will now be exposed:
(prometheus.formats.text:marshal)
#|
# TYPE hello_gauge gauge
# HELP hello_gauge Some gauge
hello_gauge 3
|#
; we can make another gauge and not store it as a var
(prometheus:make-gauge :name "gone_gauge"
                       :help "Gone gauge"
                       :value 0)
; but it will be present in the marshal
(prometheus.formats.text:marshal)
#|
# TYPE hello_gauge gauge
# HELP hello_gauge Some gauge
hello_gauge 3
# TYPE gone_gauge gauge
# HELP gone_gauge Gone gauge
gone_gauge 0
|#
; we can update the value of our some-gauge
(prometheus:gauge.set some-gauge 83)
(prometheus.formats.text:marshal)
#|
# TYPE hello_gauge gauge
# HELP hello_gauge Some gauge
hello_gauge 83
# TYPE gone_gauge gauge
# HELP gone_gauge Gone gauge
gone_gauge 0
|#
; gauges are good for numeric values that can change arbitrarily.
; the package also has default implementations for:
; counters (which can atomically increment), histograms, simple summaries (basically a combo of summing and counting), and full summaries (breaks things down into quantiles)
(defvar simple-sum (prometheus:make-simple-summary :name "simple_summary" :help "example summary"))
(prometheus.formats.text:marshal)
#| Now you see:
# TYPE simple_summary summary
# HELP simple_summary example summary
simple_summary_sum 0
simple_summary_count 0
|#
(prometheus:summary.observe simple-sum 42)
(prometheus.formats.text:marshal)
; now simple_summary_sum will be 42, and count will be 1.
(prometheus:summary.observe simple-sum 13)
(prometheus.formats.text:marshal)
; now sum will be 55, and count will be 2.
; there's also a "time" macro that I don't really recommend, but it's there:
(prometheus:summary.time simple-sum (sleep 1))
(prometheus.formats.text:marshal)
; now sum will be 1055 -- basically 1000ms were automatically added to the sum. thus the time macro can be used to turn a summary metric into a sum of execution time.
(prometheus:summary.time simple-sum (sleep 5))
(prometheus:summary.time simple-sum (sleep 2))
(prometheus:summary.time simple-sum (sleep 1))
(prometheus.formats.text:marshal)
; now the sum is 9055 -- so 9 seconds in total have been recorded across all four summary.time calls.
; this is more useful with a full summary rather than a simple summary, because the full summary will report quantiles automatically as extra labeled values:
(defvar duration-sum (prometheus:make-summary :name "durations_example" :help "durations example"))
(prometheus.formats.text:marshal)
#| With no data yet, we see:
# TYPE durations_example summary
# HELP durations_example durations example
durations_example{quantile="0.5"} NIL
durations_example{quantile="0.9"} NIL
durations_example{quantile="0.99"} NIL
durations_example_sum 0
durations_example_count 0
|#
; let's say we have our own time macro and we track time in floating point seconds, instead of integer milliseconds. then:
(prometheus:summary.observe duration-sum 2.5)
(prometheus.formats.text:marshal)
; now all three quantile values will be 2.5 instead of nil, sum will be 2.5, and count will be 1.
; let's add some more observations to see how that affects the quantiles
; 9 more "typical" values around 2–3
(dotimes (i 9)
  (prometheus:summary.observe duration-sum (+ 2 (/ i 10.0))))
; one larger value, intended to pull up the 0.9 quantile
(prometheus:summary.observe duration-sum 5.0)
; one "extreme" outlier, intended to mostly affect the 0.99 quantile
(prometheus:summary.observe duration-sum 20.0) ; you might notice this returns 49.1, which is the running sum of all observations
(format t "~a~%" (prometheus.formats.text:marshal))
#|
# TYPE durations_example summary
# HELP durations_example durations example
durations_example{quantile="0.5"} 2.5
durations_example{quantile="0.9"} 2.200000047683716
durations_example{quantile="0.99"} 2.299999952316284
durations_example_sum 49.099998474121094
durations_example_count 12
|#
; note that the 0.9 and 0.99 values above do not reflect the 5.0 and 20.0 observations: the quantiles are estimates, and with so few samples they can be well off.
; you can choose the quantiles when creating the summary by passing a :quantiles keyword argument.
; it is an association list of (quantile . error) pairs. the default quantiles are 0.5, 0.9, and 0.99, with respective errors of 0.05, 0.01, and 0.001.
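; a minimal sketch of a summary with custom quantiles, based only on the description above.
; the metric name and the quantile/error pairs are made up for illustration, and passing :registry nil
; assumes make-summary accepts it the same way make-gauge and make-counter do elsewhere in these notes.
(defvar custom-quantile-sum
  (prometheus:make-summary :name "custom_quantiles_example"
                           :help "summary with hand-picked quantiles"
                           :quantiles '((0.5 . 0.05) (0.95 . 0.01))
                           :registry nil)) ; unregistered, so the marshal output in the rest of these notes is unchanged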
; this is a good time to mention labels as they're used for the quantiles. Labels let you parameterize your metrics in custom ways.
; for example, you might have a single product_counter metric, but with a label of product_name, and now you can track separate counts for each product name without making separate metric objects for each one.
; you need to declare your label names ahead of time, however. i.e. this will raise an error, because simple-sum was created without any labels:
(prometheus:summary.observe simple-sum 10 :labels '("foo"))
; we could unregister and reregister a new one but let's just use the product count idea directly.
(defvar product-count (prometheus:make-counter :name "product_count" :help "counter for products" :labels '("product_name")))
(prometheus.formats.text:marshal)
#| You'll see it's registered, but no data yet
# TYPE product_count counter
# HELP product_count counter for products
|#
(prometheus:counter.inc product-count :labels '("apple"))
(prometheus:counter.inc product-count :labels '("banana"))
(prometheus:counter.inc product-count :labels '("banana"))
(prometheus.formats.text:marshal)
#|
# TYPE product_count counter
# HELP product_count counter for products
product_count{product_name="banana"} 2
product_count{product_name="apple"} 1
|#
; product_count{product_name="banana"} and product_count{product_name="apple"} are two separate time series derived from one metric definition.
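; to make the "one metric definition, many time series" idea concrete, here is a plain Common Lisp sketch that keeps
; one count per list of label values in a hash table. this only illustrates the concept and is not how prometheus.cl
; stores its data; sketch-labeled-counter and sketch-inc are hypothetical names.
(defun sketch-labeled-counter ()
  "Return a fresh table mapping a list of label values to a count."
  (make-hash-table :test #'equal)) ; equal, so two lists of equal strings are the same key
(defun sketch-inc (table label-values)
  "Increment and return the count stored under LABEL-VALUES."
  (incf (gethash label-values table 0)))
(let ((table (sketch-labeled-counter)))
  (sketch-inc table '("apple"))
  (sketch-inc table '("banana"))
  (sketch-inc table '("banana"))
  (list (gethash '("apple") table) (gethash '("banana") table)))
; -> (1 2)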
; you can also make your own collections of metrics, which is good for:
; * metrics whose value you want to always represent the instantaneous value of something at time of measurement (marshal)
; * code organization of related values
; the way you do it is to make a custom class inheriting from prometheus:collector, make an instance of it and register it to your metrics registry (or the default one), and write a custom prometheus:collect method.
; the collect method takes a callback argument. you call this callback with a prometheus metric that you want exposed for this particular collection.
; when marshalling happens, i.e. metrics are sampled, under the hood 'collect' is called for everything registered to the registry.
; let's start over with a fresh registry
(setf prometheus:*default-registry* (prometheus:make-registry))
(defclass my-collection (prometheus:collector)
  ((always-tracked :accessor .always-tracked
                   :initform (prometheus:make-counter :name "collects_count"
                                                      :help "Times the collect method has been called for this obj"
                                                      :value 0
                                                      :registry nil))))
; notice that we've passed an explicit :registry nil to prevent any automatic metric registration.
(defvar coll (make-instance 'my-collection))
; verify it is not present:
(prometheus.formats.text:marshal)
(prometheus:register coll)
; the above will produce an error -- we forgot to give our collection a name! this should be a required slot of the parent prometheus:collector and fail in the constructor tbh...
(setf coll (make-instance 'my-collection :name "my_collection"))
(prometheus:register coll)
; now the above succeeded, but if we try to marshal, we'll get an error because no collect method is defined for our class yet:
(prometheus.formats.text:marshal)
; let's make an empty one
(defmethod prometheus:collect ((c my-collection) cb)
  (declare (ignore cb)))
; and now we no longer error, and no new output is given:
(prometheus.formats.text:marshal)
; now let's expose our always-tracked metric, and also increment it:
(defmethod prometheus:collect ((c my-collection) cb)
  (prometheus:counter.inc (.always-tracked c))
  (funcall cb (.always-tracked c)))
(prometheus.formats.text:marshal)
; now it's there:
#|
# TYPE collects_count counter
# HELP collects_count Times the collect method has been called for this obj
collects_count 1
|#
(prometheus.formats.text:marshal)
; now the value is 2.
; we can add a new metric without storing it in the object
; just be careful to make sure its registry is nil so it doesn't get auto-registered to the default registry
(defmethod prometheus:collect ((c my-collection) cb)
  (prometheus:counter.inc (.always-tracked c))
  (funcall cb (.always-tracked c))
  (funcall cb (prometheus:make-gauge :name "temp_thingy" :help "This gauge is nothing but a memory" :value 128
                                     :registry nil)))
(prometheus.formats.text:marshal)
; count is 3, and we now see this new temp thing.
#|
# TYPE collects_count counter
# HELP collects_count Times the collect method has been called for this obj
collects_count 3
# TYPE temp_thingy gauge
# HELP temp_thingy This gauge is nothing but a memory
temp_thingy 128
|#
(defmethod prometheus:collect ((c my-collection) cb)
  (prometheus:counter.inc (.always-tracked c))
  (funcall cb (.always-tracked c)))
(prometheus.formats.text:marshal)
; count is 4, and the temp_thingy is gone from the output.
(defmethod prometheus:collect ((c my-collection) cb)
  (declare (ignore cb))
  (prometheus:counter.inc (.always-tracked c)))
(prometheus.formats.text:marshal)
; now the output is empty again, but we know the counter was still incremented.
(defmethod prometheus:collect ((c my-collection) cb)
  (prometheus:counter.inc (.always-tracked c))
  (funcall cb (.always-tracked c))
  (funcall cb (prometheus:make-gauge :name "temp_thingy" :help "This gauge is nothing but a memory" :value 129
                                     :registry nil)))
(prometheus.formats.text:marshal)
; now the counter is back, with value 6, and temp thingy is back with value 129.
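; the other use case mentioned earlier, always reporting the instantaneous value at marshal time, can be sketched the same way.
; clock-collector and the metric name are hypothetical; get-universal-time stands in for whatever live value you want to sample.
(defclass clock-collector (prometheus:collector) ())
(defmethod prometheus:collect ((c clock-collector) cb)
  ; build a fresh, unregistered gauge on every collect so the value is read at sampling time
  (funcall cb (prometheus:make-gauge :name "lisp_universal_time_seconds"
                                     :help "Universal time when the metrics were sampled"
                                     :value (get-universal-time)
                                     :registry nil)))
; to expose it, register an instance the same way as my-collection above
; (left commented out so the output in the rest of these notes is unaffected):
; (prometheus:register (make-instance 'clock-collector :name "clock_collector"))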
; the last metric type is a histogram. Histograms sort your observations into bucket counters. You specify your buckets with a list of numbers:
; (a b c)
; and values will get counted into buckets with ranges (-inf, a], (a, b], (b, c], and (c, inf)
(defvar histo (prometheus:make-histogram :name "histo" :help "histo example" :buckets '(10 20 30)))
(prometheus.formats.text:marshal)
#|
# TYPE histo histogram
# HELP histo histo example
histo_bucket{le="10"} 0
histo_bucket{le="20"} 0
histo_bucket{le="30"} 0
histo_bucket{le="+Inf"} 0
histo_sum 0
histo_count 0
|#
; in other words, our buckets are for values less than or equal (le) to 10, 20, 30, and +inf.
(prometheus:histogram.observe histo 3)
(prometheus.formats.text:marshal)
#|
# TYPE histo histogram
# HELP histo histo example
histo_bucket{le="10"} 1
histo_bucket{le="20"} 1
histo_bucket{le="30"} 1
histo_bucket{le="+Inf"} 1
histo_sum 3
histo_count 1
|#
; note that we got a count into all buckets! Because 3 is <= 10, 20, 30, and +inf.
(prometheus:histogram.observe histo 10.1)
(prometheus.formats.text:marshal)
; now we'll see every bucket except le=10 has a value of 2. The sum is 13.1.
; from this we can infer that one observation was le 10, and the other was gt 10.
(prometheus:histogram.observe histo 19.9)
(prometheus:histogram.observe histo 500)
(prometheus.formats.text:marshal)
; now le=20 and le=30 have a count of 3 (from the 19.9) and le=inf has a count of 4 (from the 19.9 and the 500). le=10 still is at 1.
; if you need to, you can get the specific counts into each bucket, not a running sum of them, with code like:
(loop for bucket across (prometheus:histogram-buckets (prometheus:get-metric histo nil))
      collect (list (prometheus:bucket-bound bucket) (prometheus:bucket-count bucket)))
; -> ((10 1) (20 2) (30 0) (#.SB-EXT:DOUBLE-FLOAT-POSITIVE-INFINITY 1))
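; to see how the per-bucket counts relate to the cumulative le counts in the text output, here is a plain Common Lisp
; sketch that needs no library. sketch-bucket-counts is a hypothetical helper, not part of prometheus.cl, and the
; keyword :+inf stands in for the final +Inf bucket.
(defun sketch-bucket-counts (bounds observations)
  "Return a list of (bound per-bucket-count cumulative-count) for ascending BOUNDS."
  (let* ((all-bounds (append bounds (list :+inf)))
         (per-bucket (make-list (length all-bounds) :initial-element 0)))
    (dolist (x observations)
      ; index of the first bound that is >= x, or the final +Inf bucket if there is none
      (incf (nth (or (position-if (lambda (b) (<= x b)) bounds)
                     (length bounds))
                 per-bucket)))
    (loop with running = 0
          for bound in all-bounds
          for n in per-bucket
          do (incf running n)
          collect (list bound n running))))
(sketch-bucket-counts '(10 20 30) '(3 10.1 19.9 500))
; -> ((10 1 1) (20 2 3) (30 0 3) (:+INF 1 4))
; the second column matches the per-bucket counts above, the third the le counts from marshal.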
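; there are a couple of helper functions for generating bucket bounds: linear ones that differ by a constant step, and exponential ones that differ by a constant factor.
; judging by the outputs below, the arguments are start, step (or factor), and count.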
(prometheus:generate-linear-buckets 0 5 10)
; -> (0 5 10 15 20 25 30 35 40 45)
; i.e. we'll have 10 buckets + 1 for the final le=+inf one, with the first representing values in the range (-inf, 0], the second (0, 5], the third (5, 10] and so on,
; each bucket range having a width of 5 until the final two of (40, 45] and (45, +inf)
(prometheus:generate-exponential-buckets 1 2 10)
; -> (1 2 4 8 16 32 64 128 256 512)
; i.e. starting at 1, successively multiply by 2 and collect the results. in this case we reach 2^9 as the final value.
; if you start at 2:
(prometheus:generate-exponential-buckets 2 2 10)
; -> (2 4 8 16 32 64 128 256 512 1024)
; you can almost think of it like the series 2^n from n=1 to n=10.
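; as a sanity check on the two generators, the same lists can be produced with plain loops.
; these are sketches of the arithmetic only, not the library's implementation, and the sketch- names are hypothetical.
(defun sketch-linear-buckets (start step count)
  "COUNT bounds beginning at START, each STEP larger than the last."
  (loop for i below count collect (+ start (* i step))))
(defun sketch-exponential-buckets (start factor count)
  "COUNT bounds beginning at START, each FACTOR times the last."
  (loop for i below count collect (* start (expt factor i))))
(sketch-linear-buckets 0 5 10)
; -> (0 5 10 15 20 25 30 35 40 45)
(sketch-exponential-buckets 1 2 10)
; -> (1 2 4 8 16 32 64 128 256 512)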
; one reason to prefer histograms vs. summaries with quantiles is that you can aggregate histogram counts across multiple instances / machines.
; a common use case is to track duration/latency distribution information. e.g.
(defvar latency-histo (prometheus:make-histogram :name "latency_ms"
                                                 :help "time to serve requests in milliseconds"
                                                 :buckets '(5 10 25 50 100 250 500 1000)))
(prometheus:histogram.observe latency-histo 3) ; very fast
(prometheus:histogram.observe latency-histo 40) ; typical
(prometheus:histogram.observe latency-histo 800) ; slow
(prometheus.formats.text:marshal)
#|
# TYPE latency_ms histogram
# HELP latency_ms time to serve requests in milliseconds
latency_ms_bucket{le="5"} 1
latency_ms_bucket{le="10"} 1
latency_ms_bucket{le="25"} 1
latency_ms_bucket{le="50"} 2
latency_ms_bucket{le="100"} 2
latency_ms_bucket{le="250"} 2
latency_ms_bucket{le="500"} 2
latency_ms_bucket{le="1000"} 3
latency_ms_bucket{le="+Inf"} 3
latency_ms_sum 843
latency_ms_count 3
|#
; prometheus itself can estimate percentiles from this data, e.g. a query like:
; histogram_quantile(0.95, sum(rate(latency_ms_bucket[5m])) by (le))
; will estimate the 95th percentile latency over the last 5 minutes.
; if we wanted to simulate that in lisp, we could:
(defun histogram-quantile (q histo)
  "Approximate the q-th quantile (e.g. 0.95) from a histogram metric.
Note that if we had the raw sorted data, e.g. 1000 samples, then the 90th percentile would simply
be the value at position 0.9 * 1000 = 900. That position is also known as the rank.
Since we only have histogram counts, we can only estimate the value at that rank.
Let total be the total histogram count and the target rank be q * total,
then walk the buckets with a cumulative count until we find the one where the cumulative count reaches the target rank.
Finally, linearly interpolate between the previous bound and that bucket's bound."
  (let* ((metric (prometheus:get-metric histo nil))
         (buckets (prometheus:histogram-buckets metric))
         (total (prometheus:histogram-count metric))
         (rank (* q total)))
    (loop with cumulative = 0
          for i from 0
          for bucket across buckets
          for bound = (prometheus:bucket-bound bucket)
          for count = (prometheus:bucket-count bucket)
          do (incf cumulative count)
          when (>= cumulative rank)
            do (let* ((prev-bound (if (zerop i) bound
                                      (prometheus:bucket-bound (aref buckets (1- i)))))
                      (prev-cum (- cumulative count))
                      (frac (/ (- rank prev-cum)
                               (max 1 count)))
                      ; linear interpolation between prev-bound and current bound
                      (value (+ prev-bound
                                (* frac (- bound prev-bound)))))
                 (return value)))))
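; The above helper function is not exactly what the Prometheus query would do, but it is close enough for illustration.
; For example, Prometheus's histogram_quantile assumes a lower bound of 0 for the lowest bucket (when its upper bound is positive)
; and returns the highest finite bound when the quantile falls in the +Inf bucket; the helper above does neither.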
(histogram-quantile 0.95 latency-histo)
; -> 924.99994 -- fairly close to the large outlier we had at 800ms
(histogram-quantile 0.5 latency-histo)
; -> 37.5 -- pretty close to the true median, which for the observed values (3 40 800) is 40.
; Note again that these are approximations bounded by bucket resolution, hence the overestimate in the 95th percentile.
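; the 0.95 estimate can be checked by hand. with total = 3 the target rank is 0.95 * 3 = 2.85; the cumulative count
; first reaches that in the (500, 1000] bucket, which holds 1 observation and has 2 observations below it.
; the same arithmetic with exact rationals, so there is no float rounding:
(let* ((rank (* 19/20 3))            ; 57/20, i.e. 2.85
       (prev-bound 500) (bound 1000) ; edges of the bucket that reaches the rank
       (prev-cum 2) (in-bucket 1)    ; observations below the bucket, and inside it
       (frac (/ (- rank prev-cum) in-bucket)))
  (+ prev-bound (* frac (- bound prev-bound))))
; -> 925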
; And if we tried lower percentiles, e.g.
(histogram-quantile 0.05 latency-histo)
(histogram-quantile 0.1 latency-histo)
(histogram-quantile 0.3 latency-histo)
; all the above return 5.0, i.e. the upper bound of the lowest bucket, because the helper does not interpolate within the first bucket.
(histogram-quantile 0.34 latency-histo)
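; at 0.34 the target rank (0.34 * 3 = 1.02) just exceeds the single observation in the lowest bucket, so the estimate moves into the (25, 50] bucket and we see 25.5.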
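; for comparison, here is a library-free variant that interpolates from 0 inside the lowest bucket, which is the
; convention Prometheus documents for histogram_quantile when the lowest upper bound is positive.
; sketch-quantile is a hypothetical name; it takes a plain list of (upper-bound per-bucket-count) pairs for the
; finite buckets only, and it is still a sketch rather than a faithful port of the Prometheus function.
(defun sketch-quantile (q buckets)
  "Estimate the Q quantile from ascending (upper-bound count) pairs, or NIL if no bucket reaches the rank."
  (let ((rank (* q (reduce #'+ buckets :key #'second))))
    (loop with cumulative = 0
          with prev-bound = 0 ; assumed lower edge of the lowest bucket
          for (bound n) in buckets
          do (incf cumulative n)
          when (and (plusp n) (>= cumulative rank))
            return (+ prev-bound
                      (* (/ (- rank (- cumulative n)) n)
                         (- bound prev-bound)))
          do (setf prev-bound bound))))
(defvar sketch-latency-buckets
  '((5 1) (10 0) (25 0) (50 1) (100 0) (250 0) (500 0) (1000 1))) ; the three latency observations above
(sketch-quantile 1/20 sketch-latency-buckets)
; -> 3/4 (i.e. 0.75, where the helper above returned 5.0)
(sketch-quantile 1/2 sketch-latency-buckets)
; -> 75/2 (i.e. 37.5, same as before)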