@loren-osborn
Created February 3, 2026 21:23

When Improving Test Coverage Raised a Different Question Entirely

I started this work with a fairly narrow, pragmatic goal.

I’ve been exploring an exact- and interval-oriented number format, heavily inspired by John Gustafson’s work on posits. Before taking that idea very far, I wanted a concrete proof of concept—something substantial enough to explore design tradeoffs and failure modes in practice.

After some research, I decided that building on top of the Stillwater Universal library made sense. It already implements a wide range of numeric representations, including elastic rationals, and it’s actively maintained.

As I integrated Universal into a new project, I ran into a handful of compiler and testing issues. Over the course of several weeks, that turned into multiple pull requests, some back-and-forth with the maintainer, and eventually those fixes were merged.

At that point, our conversations widened a bit.

I mentioned that I was interested in using elastic rationals to represent exact values. In that context, the maintainer noted that this part of the library had relatively little test coverage—not because it was neglected, but because it isn’t always obvious what meaningful tests should assert. In several cases, the correctness criteria themselves are subtle, and in some situations genuinely unclear. I took that as a signal to proceed carefully.

Before building anything substantial on top of that code, I decided to generate a test coverage report, particularly focused on the elastic rational implementation.

Over the last few years, I’ve spent a lot of time working in Go, where line-by-line HTML coverage reports are essentially built in. Getting a comparable view for a large C++ codebase took more effort, but after some wrestling with tooling, I eventually got there.

That’s where the experience changed.

Before I had the coverage report, I hadn’t really been looking at the code. I knew roughly where it lived. I might have skimmed a file or two. But once the code was color-coded—once parts of it were visibly exercised or untouched—it suddenly demanded attention. The coverage didn’t just show execution; it made the code feel legible in a new way.

There were, unsurprisingly, uncovered areas. That’s something that can be addressed over time. But what caught my attention wasn’t just what hadn’t been tested—it was what the code was doing once I began examining it closely.

Because I already have a mental model of how rational arithmetic is supposed to behave, certain implementation choices stood out quickly. In particular, some transcendental functions convert a rational value to a double, perform the operation, and then convert the result back. That’s a pragmatic approach, and in many cases a reasonable one—but it raises an important question about what the result is meant to represent.

Take square root as an example. For some rational numbers, the square root is exactly representable. For most, it isn’t. In principle, you could factor the numerator and denominator and check whether every prime appears to an even power—that is, whether both are perfect squares. If they are, an exact rational result exists and can be returned. If they aren’t, the result is genuinely irrational.
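To make the exact case concrete, here is a minimal sketch of that check. The names (`Rational`, `exact_sqrt`) are hypothetical and not part of Universal's API; it detects exactness by testing whether the reduced numerator and denominator are perfect squares, assuming nonnegative 64-bit values.

```cpp
#include <cmath>
#include <cstdint>
#include <numeric>
#include <optional>

struct Rational { std::int64_t num, den; };

// Exact integer square root, or nullopt if n is not a perfect square.
// The small scan around the floating-point estimate guards against
// rounding in std::sqrt.
static std::optional<std::int64_t> exact_isqrt(std::int64_t n) {
    if (n < 0) return std::nullopt;
    auto r = static_cast<std::int64_t>(
        std::llround(std::sqrt(static_cast<double>(n))));
    for (std::int64_t c = (r > 1 ? r - 1 : 0); c <= r + 1; ++c)
        if (c * c == n) return c;
    return std::nullopt;
}

// Returns the exact rational square root of x when one exists
// (i.e., both reduced terms are perfect squares), else nullopt.
std::optional<Rational> exact_sqrt(Rational x) {
    std::int64_t g = std::gcd(x.num, x.den);   // reduce to lowest terms first
    x.num /= g; x.den /= g;
    auto n = exact_isqrt(x.num);
    auto d = exact_isqrt(x.den);
    if (n && d) return Rational{*n, *d};       // exact result exists
    return std::nullopt;                       // genuinely irrational
}
```

For example, `exact_sqrt({9, 16})` yields 3/4, while `exact_sqrt({1, 2})` reports that no exact rational root exists.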

In that latter case, what should the function return?

Once a value has been converted to a double, precision has already been lost in a way that isn’t easily recoverable. Converting the result back into a rational representation may produce something that looks exact, but the fact that the value is only an approximation is no longer explicit.
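The mechanism behind that loss is worth spelling out: every finite double is exactly a dyadic rational p / 2^k, so converting a `std::sqrt` result back into a rational type always "succeeds"—it just encodes the nearest double to the true value, with nothing marking it as approximate. A small sketch (the `Dyadic` type and helpers are illustrative, not Universal's API):

```cpp
#include <cmath>
#include <cstdint>

struct Dyadic { std::int64_t p; int k; };   // represents p / 2^k exactly

// Decompose a finite positive double into its exact dyadic form.
Dyadic to_dyadic(double x) {
    int exp;
    double m = std::frexp(x, &exp);                         // x == m * 2^exp
    auto p = static_cast<std::int64_t>(std::ldexp(m, 53));  // 53-bit mantissa
    return {p, 53 - exp};                                   // x == p / 2^(53 - exp)
}

double from_dyadic(Dyadic d) {
    return std::ldexp(static_cast<double>(d.p), -d.k);
}
```

The round-trip through `Dyadic` is lossless—`from_dyadic(to_dyadic(s)) == s`—yet for `s = std::sqrt(2.0)` we still have `s * s != 2.0`. The rational looks exact; the approximation is silent.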

This isn’t really a testing problem.

It’s a question about how much epistemic information the API preserves. If an operation sometimes produces exact results and sometimes produces approximations, the type system needs some way to express that distinction—whether through a variant return, an explicit indicator of exactness, or some other mechanism.
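One way to sketch such a mechanism is a variant return type, so the exactness distinction lives in the type itself. Everything here (`SqrtResult`, `Approx`, `sqrt_of`) is a hypothetical illustration, not Universal's actual interface:

```cpp
#include <cmath>
#include <variant>

struct Rat    { long num, den; };
struct Approx { double value; };   // stand-in for an interval or flagged result

// The return type records whether exactness was preserved.
using SqrtResult = std::variant<Rat, Approx>;

static long exact_root(long n) {   // sqrt(n) if n is a perfect square, else -1
    long r = std::llround(std::sqrt(static_cast<double>(n)));
    return (r * r == n) ? r : -1;
}

// Assumes x is positive and already reduced to lowest terms.
SqrtResult sqrt_of(Rat x) {
    long n = exact_root(x.num), d = exact_root(x.den);
    if (n >= 0 && d >= 0) return Rat{n, d};                        // exact
    return Approx{std::sqrt(static_cast<double>(x.num) / x.den)};  // approximate
}
```

The design point is that callers cannot quietly treat an approximation as exact: `std::visit` (or `std::holds_alternative`) forces both cases to be acknowledged at the call site.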

Similar issues arise with logarithms and other transcendental functions. Occasionally they produce rational results. Often they don’t. Determining which case you’re in can be straightforward in some situations and extremely nontrivial in others. In the general case, it drifts toward symbolic propagation and simplification—an area where it isn’t always clear what is algorithmically feasible.
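A narrow slice of the logarithm case is still tractable: log2 of p/q is an integer exactly when p and q are both powers of two. The sketch below handles only that slice—the general rational-log question drifts into the symbolic territory described above:

```cpp
#include <cstdint>
#include <optional>

// Exponent k if n == 2^k, else nullopt. Uses the classic
// power-of-two bit test n & (n - 1) == 0.
static std::optional<int> log2_if_power(std::int64_t n) {
    if (n <= 0 || (n & (n - 1)) != 0) return std::nullopt;
    int k = 0;
    while (n > 1) { n >>= 1; ++k; }
    return k;
}

// Exact log2(p/q) when it is an integer: log2(p/q) = log2 p - log2 q.
std::optional<std::int64_t> exact_log2(std::int64_t p, std::int64_t q) {
    auto a = log2_if_power(p), b = log2_if_power(q);
    if (a && b) return *a - *b;
    return std::nullopt;   // not an integer power of two; likely irrational
}
```

So `exact_log2(8, 1)` gives 3 and `exact_log2(1, 4)` gives -2, while `exact_log2(3, 1)` correctly declines to answer exactly.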

Seen in that light, the earlier comment about testing made complete sense.

The difficulty with testing elastic rationals isn’t just a lack of coverage. It’s the absence of a reliable point of comparison. Floating-point arithmetic can’t consistently serve that role, because elastic rationals are often more precise than floating point. Comparing against a less precise system doesn’t validate correctness; it obscures it.

Test coverage can tell you where execution occurs. It can even prompt you to look closely at code you might otherwise ignore. But it can’t tell you whether the system itself knows what it’s claiming to compute—or whether its APIs preserve the difference between exactness, approximation, and uncertainty.

I set out to improve test coverage.

What I found instead was a set of unanswered questions about correctness, semantics, and how much meaning a numeric API is willing—or able—to carry.

That doesn’t make the code bad. It makes the problem space more demanding than it first appears.

And it explains why coverage alone was never going to be enough.

@loren-osborn (Author):

Thanks for adding that context — the layering explanation is really helpful, especially understanding where erational<> stalled and why.

I think that history actually reinforces why testing and coverage here are tricky rather than pointing to any simple fix. Seeing how far the mathlib shim gets you while leaving the deeper semantic questions unresolved matches very closely with what prompted the essay in the first place.

I appreciate you taking the time to share your thinking.
