- About me
- Proposal: Version constraint resolution in fpm
- Name: Emma Bastås
- Position: Pursuing a bachelor's in mathematics at Stockholm University.
- Bio: I've been programming as a hobby for some time, and I have some hobby projects listed on my GitHub and SourceHut profiles. I've also contributed to elm-format via GSoC 2021. On a more minor note, I implemented the Mustache template specification for Elm. In connection with this proposal I've also submitted a PR to fpm. If you want to read more about me, I introduced myself on the Fortran Discourse.
- Timezone: UTC+1
- Contact:
  - Email: [email protected]
  - GitHub/SourceHut/Fortran Discourse/etc.: @emmabastas
This proposal is based on an existing project idea, which I will refer back to at times in this document.
fpm lets a user conveniently depend on third-party Fortran packages, a huge step up for the Fortran package ecosystem. However, as the situation is right now, it's only possible to specify an exact version of a dependency. This limits the usability of external dependencies: if your package depends on `A` and `B`, which in turn depend on `C 1.0.0` and `C 1.0.1` respectively, then there's no way of resolving this. The solution is to add the ability to specify version constraints, so that `A` can depend on `C >=1.0.0, <2.0.0` and `B` can depend on `C >=1.0.1, <2.0.0`, letting fpm pick a version of `C` that is compatible with both `A` and `B`.
The outcome of this project is a pull request to fpm adding a suitable version constraint syntax to `fpm.toml` and a suitable version constraint resolution algorithm. This pull request is complete with documentation and tests.
There are many important design decisions when it comes to constraint resolution, and I think it's good to work these out as much and as early as possible. In the next sections I'll outline some considerations that I think are important, and which decisions I think are the right ones. Hopefully this prompts a fruitful discussion.
In this document I will write about semantic versioning, semver compatibility, flat and nested package hierarchies, backtracking resolution algorithms and multiple-versions-of-the-same-package algorithms. If some of these terms are unclear, I've tried clarifying them in https://gist.github.com/emmabastas/57349c14db195b0ccedb78ed747f2a4f
Here I outline what types of dependency constraints I think fpm package authors will want to express. It's based on two things:
- My anecdotal experience.
- A very small survey of what types of dependency constraints popular Cargo, NPM and PIP packages use.
Here are some use-cases that I think are common and that would be good to support. Note that this is only about use-cases, and not about constraint syntax or anything like that.
This is by far the most common constraint that I've encountered. It is of the form `>=1.2.3, <2.0.0`.
The scenario is that `your-package` depends on `A 1.0.0`, which exposes the `is_even` and `is_odd` functions. In the future `is_odd` is removed from `A` (use `!is_even` instead!) and so `A 2.0.0` is released. Even though this is a breaking change, `your-package` only ever used `is_even`, so it's still compatible with `A 2.0.0`, and we might want to express the constraint `>=1.0.0, <3.0.0`.
The scenario is that `your-package` depends on `A >=1.2.3, <2.0.0`. However, in `A 1.5.0` a grave typo breaks the API/introduces a serious bug/whatever. The package author never fixed this in `1.5.1` either, and so the fix isn't available until `A 1.6.0`. For this reason you want to express a constraint of the form `A >=1.2.3, <2.0.0, !1.5.*`.
Here I list some common issues that people run into w.r.t. dependency resolution, and give some of my opinions on how I think fpm should handle them.
Suppose `my-pkg` depends on

```
pkg-a >=1.0.0 <2.0.0
pkg-b >=1.0.0 <2.0.0
```

Our resolution algorithm picks the latest versions of `pkg-a` and `pkg-b`, which are `pkg-a 1.5.0` and `pkg-b 1.9.0`. Now, `pkg-a 1.5.0` depends on `common-utils >=1.0.0 <2.0.0` while `pkg-b 1.9.0` depends on `common-utils >=2.0.0 <3.0.0`. In this scenario there is no way to pick a single version of `common-utils` that satisfies all the constraints. What should we do?
One option is for resolution to simply fail; I think this is what happens with `pip`. It's then up to the user to manually override certain dependency constraints, as described here: https://stackoverflow.com/questions/70449073/pip-how-to-override-version-of-sub-dependency-in-requirements-txt
This option isn't ideal because it can be very frustrating for a user to have to do this.
We actually haven't exhausted all possibilities when it comes to finding a solution to our constraints: what if we consider older versions of `pkg-a` and `pkg-b`? If there's an older version of `pkg-b` that depends on `common-utils >=1.0.0, <2.0.0` then we have solved the constraints.
The problem with this is that we might select an ancient version of `pkg-b`, let's say `pkg-b 1.1.0`, in order to solve the constraints. This ancient version is probably nowhere near as good as `pkg-b 1.9.0`; there might even be old security issues that are fixed in recent versions of `pkg-b` but that were never backported to version `1.1.0`. See https://iscinumpy.dev/post/bound-version-constraints/#fixes-are-not-always-backported
Another problem, if the dependency graph is really big, is that finding a solution this way can take a really long time.
This is the solution employed by NPM and Cargo: simply install both a `1.*.*` version of `common-utils` that `pkg-a` can use and a `2.*.*` version for `pkg-b`. Doing it this way hinges on the possibility of nested package hierarchies in fpm/Fortran.
I think this is the best option in this scenario; however, there are some things to consider:
- Question: What about very narrow constraints? Imagine instead that `pkg-a` depends on `common-utils >=1.2.0, <1.3.0` and `pkg-b` depends on `common-utils >=1.3.0, <1.4.0`; should we then install both `common-utils 1.2.0` AND `1.3.0`? This type of constraint feels a bit unreasonable anyway, so maybe we shouldn't allow it? Is that too draconian?
- Question: How does this work when `my-pkg` exposes a type from a package `pkg-a` which it depends on? See Exposing types from dependencies. I think option (3) makes the scenario of exposing types from dependencies more common/complicated.
In this scenario `my-pkg` depends on `common-utils 1.0.0`, which exposes a `meters` and an `inches` datatype. `my-pkg` re-exposes the `meters` type via the exposed function `meter measure_distance(A x, B y);`.
Now `common-utils 2.0.0` is released, and it has prefixed all SI units, so now we have `si_meters`. Would it be a breaking change for `my-pkg` to update its dependency and expose `si_meter measure_distance(A x, B y)` instead?
I'm not really sure what a good way of dealing with this would be. There is a proposal for how Cargo should do it: https://internals.rust-lang.org/t/pre-rfc-superseding-public-private-dependencies/19708
Here are some other articles that I found a little helpful reading up on the topic:
- https://medium.com/@ruben.alapont/advanced-npm-working-with-peer-dependencies-b7c43aa852d8
- https://discourse.elm-lang.org/t/from-golang-package-versioning/1022
Here I will list more concretely what I want to implement and why.
In short, I'm proposing the following:
- Add the ability for fpm to install different versions of the same package (nested package hierarchies).
- Add a very limited form of version constraints (all constraints are of the form `>= a.b.c < (a+n).0.0` or `>= 0.a.b < 0.(a+n).0`).
- Implement minimal version selection, a simple and predictable constraint resolution algorithm.
I mentioned previously that I think multiple-versions-of-the-same-package is a better approach than backsolving when it comes to handling unsatisfiable constraints. However, right now fpm uses a flat package hierarchy (I think, am I wrong?), while my proposal hinges on a nested package hierarchy.
I propose that I modify the naming of dependencies in `build/dependencies/` such that dependencies are prefixed by a hash generated from `version` + `git url` + `git revision`. I might also need to modify the build process.
So, if my `build/dependencies/` looks like this right now:

```
fortran-regex
fortran-shlex
jonquil
M_CLI2
toml-f
```

then that would be changed to:

```
<HASH1>-fortran-regex
<HASH2>-fortran-shlex
<HASH3>-jonquil
<HASH4>-M_CLI2
<HASH5>-toml-f
```

This means that different versions of the same package can coexist in `build/dependencies/` as they will be prefixed by different hashes.
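To make the idea concrete, here is a minimal Python sketch of how such a directory name could be derived. This is an illustration only: the function name, the choice of SHA-256 and the 8-character truncation are my own assumptions, and the real implementation would be Fortran code inside fpm.

```python
# Hypothetical sketch: derive a "<HASH>-<name>" directory name for
# build/dependencies/ from version + git url + git revision, so that
# two versions of the same package can coexist side by side.
import hashlib

def dependency_dir_name(name: str, version: str,
                        git_url: str, git_rev: str) -> str:
    # Hash all the identifying inputs together; any change to the
    # version, url or revision yields a different directory name.
    key = f"{version}\n{git_url}\n{git_rev}".encode("utf-8")
    digest = hashlib.sha256(key).hexdigest()[:8]  # a short prefix suffices
    return f"{digest}-{name}"
```

The same inputs always produce the same name, so re-running a build reuses the already-fetched dependency directory.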
Add new syntax to the `dependencies.<package-name>.v` field using `parse_comp_set` from version-f. This is now valid:

```toml
dependencies.example-package.v = ">= 1.0.0 < 2"
```

This is also valid:

```toml
dependencies.example-package.v = ">= 1.0.0 < 3"
```

This is also valid:

```toml
dependencies.example-package.v = ">= 0.1.0 < 0.2"
```

However, this is not valid (because it's a niche/bad use-case):

```toml
dependencies.example-package.v = ">= 1.0.0 < 1.2.3"
```
It's not possible to use wildcards (`1.2.*`), it's not possible to exclude specific versions (`>= 1.0.0, < 2, != 1.2.3`), and it's not possible to specify an "or" constraint (`1.2.3 | 1.2.4 | 1.2.5`). Basically, I propose very restricted version constraint features/syntax.
We can always add more features afterwards if we find that my proposal isn't expressive enough (as in: people have real-world use-cases for more advanced version constraints); removing features is not as easy, and it impacts what sorts of resolution algorithms are available to us. See the next section for how this is relevant.
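To show how small the allowed grammar is, here is a hedged Python sketch of a validator for the proposed constraint form. In fpm the actual parsing would be done in Fortran with `parse_comp_set` from version-f; every name below is hypothetical.

```python
# Hypothetical sketch: accept only constraints of the form
#   >= a.b.c < (a+n)      (for a > 0)
#   >= 0.a.b < 0.(a+n)    (for 0.x packages)
# and reject everything else (wildcards, exclusions, "or"s, narrow ranges).
import re
from typing import Optional, Tuple

Version = Tuple[int, int, int]

def parse_version(s: str) -> Version:
    """Parse "1", "1.2" or "1.2.3" into a (major, minor, patch) triple."""
    parts = [int(p) for p in s.split(".")]
    while len(parts) < 3:
        parts.append(0)  # missing components default to zero
    return (parts[0], parts[1], parts[2])

def parse_constraint(s: str) -> Optional[Tuple[Version, Version]]:
    """Return (lower, upper) bounds, or None if the constraint is invalid."""
    m = re.fullmatch(r"\s*>=\s*([\d.]+)\s*<\s*([\d.]+)\s*", s)
    if not m:
        return None
    lo, hi = parse_version(m.group(1)), parse_version(m.group(2))
    if lo[0] > 0 and hi[0] > lo[0] and hi[1] == hi[2] == 0:
        return lo, hi  # >= a.b.c < (a+n).0.0
    if lo[0] == 0 and hi[0] == 0 and hi[1] > lo[1] and hi[2] == 0:
        return lo, hi  # >= 0.a.b < 0.(a+n).0
    return None
```

For example, `parse_constraint(">= 1.0.0 < 2")` yields the bounds `(1,0,0)` and `(2,0,0)`, while the disallowed `">= 1.0.0 < 1.2.3"` yields `None`.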
Wouldn't it be nice if dependencies not from the package registry could also have version constraints? So that something like:

```toml
[dependencies]
toml-f = { git = "https://github.com/toml-f/toml-f", tag = "v0.2.1" }
```

could become:

```toml
[dependencies]
toml-f = { git = "https://github.com/toml-f/toml-f", v = ">= 0.2.1 < 0.3" }
```
I propose we implement the Minimal Version Selection algorithm described in https://research.swtch.com/vgo-mvs, which touts reproducible installs, upgrades and downgrades without the need for a lock-file (!!). It does this by restricting the types of constraints that packages are allowed to express, turning the generally hard boolean satisfiability problem into a constrained one for which good algorithms exist.
The article ends with:
> More than anything else, I wanted to find a version selection algorithm that was understandable. Predictable. Boring. Where other systems instead seem to optimize for displays of raw flexibility and power, minimal version selection aims to be invisible. I hope it succeeds.
I like boring and predictable :-)
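To give a feel for how simple MVS is, here is a minimal Python sketch of the "rough build list" construction from the article: walk the requirement graph from the root and keep, for every package, the maximum of the minimum versions requested. This is a simplified illustration under my own naming, not fpm code and not the complete algorithm from the article.

```python
# Hypothetical sketch of Minimal Version Selection's build-list construction
# (after https://research.swtch.com/vgo-mvs). Each package version declares a
# list of (name, minimum version) requirements; MVS takes, per package, the
# maximum of all minimums mentioned in the reachable graph, and never jumps
# to a newer version than anyone asked for.
from typing import Dict, List, Tuple

Version = Tuple[int, int, int]
Pkg = Tuple[str, Version]

def mvs_build_list(root_deps: List[Pkg],
                   manifest: Dict[Pkg, List[Pkg]]) -> Dict[str, Version]:
    selected: Dict[str, Version] = {}
    visited = set()
    stack = list(root_deps)
    while stack:
        pkg = stack.pop()
        if pkg in visited:
            continue  # this exact (name, version) was already processed
        visited.add(pkg)
        name, version = pkg
        if version > selected.get(name, (0, 0, 0)):
            selected[name] = version  # keep the max of the requested minimums
        stack.extend(manifest.get(pkg, []))
    return selected
```

Note how there is no search or backtracking at all: the answer is a single deterministic pass over the graph, which is exactly where the reproducibility-without-a-lock-file property comes from.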
While the implementation would be quite simple, this approach is very different from the more standard "throw a SAT-solver at it", and since the algorithm isn't really battle-tested it might turn out to be a decision we'll regret (although golang seems to be committing to MVS, so if it's a bad choice we won't be the only ones regretting it). Further, if in the end we realize that we want more expressive constraint syntax (like not's and or's) then we need to throw away this algorithm; there's no good way of extending it without losing all of its nice features. If we went the traditional SAT-solver approach we wouldn't have this problem.
It's also worth mentioning that the supposed problems with the traditional SAT-solver approach might be overstated, and actually caused by other bad design decisions.
Now that I'm writing this, I realize that maybe minimal version selection is too daring. I'd be happy to implement a more traditional algorithm too, in which case I propose that I investigate using `libsolv` or some SAT-solver written in C for the resolution algorithm.
Elm's package manager is able to detect when the API of your package breaks and, as a result, when you want to publish your package it will automatically figure out what the new version should be for you. I wonder if it's possible to take this approach even further: is it possible for a package manager to figure out which versions of a package it can use simply by looking at how you're using the package API?
```toml
[dependencies]
example-package.v = "# 1.2.3"
```

which would mean: I want to use version `1.2.3` as a starting point, but any newer versions that don't change the specific parts of the API that I'm using are fine.
If I end up finishing the project ahead of schedule I would like to investigate this.
Project length: I will present this timeline as if the duration of my GSoC project will be the "standard 12 weeks"; however, I would prefer stretching it out by 2-3 weeks, as I think that would result in less stress for me juggling prior commitments with GSoC. But this can be discussed with my mentor later.
- Week 21: I have my last exam of the semester.
- Week 24: I present my bachelors thesis.
- Week 35: I will be traveling and unavailable.
I would like to use this time to set expectations about communication, check-ins etc. with my mentor, in addition to getting to know them.
When it comes to the technical aspect: I would like to talk to my mentor and the community about the specifics of the constraint resolution: what syntax to use for version constraints, what resolution algorithm to use, and try to anticipate potential problems and time-sinks. I have certain questions regarding best practices for C interop in Fortran, how Fortran dependencies are linked in the build process (related to Nested package hierarchies), etc., that maybe my mentor can help me figure out.
The community bonding period is towards the end of the semester, and as such my time and energy will be skewed towards school. I will communicate with my mentor and the community, do some reading, and discuss the project with my mentor if they have the time, but other than that I don't expect to spend much time working on GSoC.
I work on implementing Nested package hierarchies for fpm. At the end of week 26 I have a PR to fpm that implements nested package hierarchies.
4 weeks sounds a little excessive, but considering that weeks 23 and 24 will be full of thesis work for me I think it makes sense. It can also be expected that I'm still getting up to speed with Fortran at this point.
NOTE: This feature in particular is a bit of an unknown to me: I don't have a clear idea of what mechanisms in fpm would need change to implement this, so the time allotted for this feature might be revised during the community bonding period.
At the end of week 27 I have an algorithm, implemented in Fortran, that given some internal representation of a dependency graph selects an appropriate set of packages and versions. There is also some sort of test-suite with a bunch of different mock dependency graphs that the algorithm can be tested against.
It's no longer necessary to give a complete dependency graph to the algorithm; instead it's enough to supply it with the direct dependencies (i.e. not transitive dependencies), and the algorithm can then recursively parse package manifests and build up the dependency graph on its own.
This also includes implementing the version constraint parsing in `fpm.toml`.
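The recursive step can be sketched as follows in Python, where `fetch_manifest` is a stand-in for reading a package's `fpm.toml` from the registry or a git checkout; all names here are hypothetical illustrations, not fpm API.

```python
# Hypothetical sketch: expand a list of direct dependencies into the full
# dependency graph by recursively fetching each package's manifest.
from typing import Callable, Dict, List, Tuple

Pkg = Tuple[str, str]  # (name, version)

def build_dependency_graph(direct: List[Pkg],
                           fetch_manifest: Callable[[Pkg], List[Pkg]]
                           ) -> Dict[Pkg, List[Pkg]]:
    graph: Dict[Pkg, List[Pkg]] = {}
    stack = list(direct)
    while stack:
        pkg = stack.pop()
        if pkg in graph:
            continue  # manifest already fetched for this (name, version)
        deps = fetch_manifest(pkg)
        graph[pkg] = deps
        stack.extend(deps)  # transitive dependencies get expanded too
    return graph
```

The resulting graph is exactly the "internal representation" that the week-27 resolution algorithm consumes, so the two pieces compose naturally.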
Now it's starting to look like a complete feature. `fpm install`, `fpm update` and `fpm build` now integrate with the resolution algorithm.
Some unit tests will have to be added, some will have to be changed. There is probably lots of error handling and messaging to work on.
Work on adding unit tests and making existing unit tests pass. Add/change documentation, both online and in the CLI.
At this point I submit a PR that implements version constraint resolution 🎉
These three weeks can be used for a variety of things:
- If things fell behind schedule for whatever reason, we have some margins here.
- I continuously respond to concerns/requests on the PR I submitted in week 31, so that it may be merged at some point.
- With whatever time is free I work on Bonus feature: Beyond SemVer.
Even if I have the full 3 weeks to work on the bonus feature I don't expect to have a full PR or anything like that; instead I imagine that the deliverable would be a proof of concept or a blog post where I go into detail on this feature, my pitch for why I think it's great for the Fortran community, etc.
This week I will be traveling, so the final submission is sent in by me late week 34 or early week 35.
It could be nice to write a blog post about this whole journey.