- About me
- Proposal: Version constraint resolution in fpm
- Name: Emma Bastås
- Position: Pursuing a bachelor's in mathematics at Stockholm University.
- Bio: I've been programming as a hobby for some time, and I have some hobby projects listed on my GitHub and SourceHut profiles. I've also contributed to elm-format via GSoC 2021. On a more minor note, I implemented the Mustache template specification for Elm. In connection with this proposal I've also submitted a PR to fpm. If you want to read more about me, I introduced myself on the Fortran Discourse.
- Timezone: UTC+1
- Contact:
  - Email: [email protected]
  - GitHub/SourceHut/Fortran Discourse/etc.: @emmabastas
This proposal is based on an existing project idea, which I will refer back to at times in this document.
fpm lets a user conveniently depend on third-party Fortran packages, a huge step up for the Fortran package ecosystem. However, as the situation is right now, it's only possible to specify an exact version of a dependency. This limits the usability of external dependencies: if your package depends on `A` and `B`, which in turn depend on `C 1.0.0` and `C 1.0.1` respectively, then there's no way of resolving this. The solution is to add the ability to specify version constraints, so that `A` can depend on `C >=1.0.0, <2.0.0` and `B` can depend on `C >=1.0.1, <2.0.0`, letting fpm pick a version of `C` that is compatible with both `A` and `B`.
The outcome of this project is a pull request to fpm adding a suitable version constraint syntax to `fpm.toml` and a suitable version constraint resolution algorithm. This pull request is complete with documentation and tests.
There are many important design decisions when it comes to constraint resolution, and I think it's good to work these out as much and as early as possible. In the next sections I'll outline some considerations that I think are important, and which decisions I think are the right ones. Hopefully this prompts a fruitful discussion.
In this document I will write about semantic versioning, semver compatibility, flat and nested package hierarchies, backtracking resolution algorithms and multiple-versions-of-the-same-package algorithms. If some of these terms are unclear, I've tried clarifying them in https://gist.github.com/emmabastas/57349c14db195b0ccedb78ed747f2a4f
Here I outline what types of dependency constraints I think fpm package authors will want to express. It's based on two things:
- My anecdotal experience.
- A very small survey of what types of dependency constraints popular Cargo, NPM and PIP packages use.
Here are some use-cases that I think are common and that would be good to support. Note that this is only about use-cases, and not about constraint syntax or anything like that.
This is by far the most common constraint that I've encountered. It is of the form `>=1.2.3, <2.0.0`.
The scenario is that `your-package` depends on `A 1.0.0`, which exposes the `is_even` and `is_odd` functions. In the future `is_odd` is removed from `A` (use `!is_even` instead!) and so `A 2.0.0` is released. Even though this is a breaking change, `your-package` only ever used `is_even`, so it's still compatible with `A 2.0.0`, and we might want to express the constraint `>=1.0.0, <3.0.0`.
The scenario is that `your-package` depends on `A >=1.2.3, <2.0.0`. However, in `A 1.5.0` a grave typo breaks the API/introduces a serious bug/whatever. The package author never fixed this in `1.5.1` either, and so the fix isn't available until `A 1.6.0`. For this reason you want to express a constraint of the form `A >=1.2.3, <2.0.0, !1.5.*`.
Here I list some common issues that people run into w.r.t. dependency resolution, and give some of my opinions on how I think fpm should handle them.
Suppose `my-pkg` depends on

```
pkg-a >=1.0.0 <2.0.0
pkg-b >=1.0.0 <2.0.0
```

Our resolution algorithm picks the latest versions of `pkg-a` and `pkg-b`, which are `pkg-a 1.5.0` and `pkg-b 1.9.0`. Now, `pkg-a 1.5.0` depends on `common-utils >=1.0.0 <2.0.0` while `pkg-b 1.9.0` depends on `common-utils >=2.0.0 <3.0.0`. In this scenario there is no way to pick a single version of `common-utils` that satisfies all the constraints. What should we do?
One option is for resolution to simply fail; I think this is what happens with `pip`. It's then up to the user to manually override certain dependency constraints, as described here: https://stackoverflow.com/questions/70449073/pip-how-to-override-version-of-sub-dependency-in-requirements-txt
This option isn't ideal because it can be very frustrating for a user to have to do this.
We actually haven't exhausted all possibilities when it comes to finding a solution to our constraints: what if we consider older versions of `pkg-a` and `pkg-b`? If there's an older version of `pkg-b` that depends on `common-utils >=1.0.0, <2.0.0` then we have solved the constraints.
The problem with this is that we might select an ancient version of `pkg-b`, let's say `pkg-b 1.1.0`, in order to solve the constraints. This ancient version is probably nowhere near as good as `pkg-b 1.9.0`; there might even be old security issues that are fixed in recent versions of `pkg-b` but that were never backported to version `1.1.0`. See https://iscinumpy.dev/post/bound-version-constraints/#fixes-are-not-always-backported
Another problem, if the dependency graph is really big, is that finding a solution this way can take a really long time.
This is the solution employed by NPM and Cargo: simply install both a `1.*.*` version of `common-utils` that `pkg-a` can use and a `2.*.*` version for `pkg-b`. Doing it this way hinges on the possibility of nested package hierarchies in fpm/Fortran.
I think this is the best option in this scenario; however, there are some things to consider:
- Question: What about very narrow constraints? Imagine instead that `pkg-a` depends on `common-utils >=1.2.0, <1.3.0` and `pkg-b` depends on `common-utils >=1.3.0, <1.4.0`; should we then install both `common-utils 1.2.0` AND `1.3.0`? This type of constraint feels a bit unreasonable anyway, so maybe we shouldn't allow it? Is that too draconian?
- Question: How does this work when `my-pkg` exposes a type from a package `pkg-a` which it depends on? See Exposing types from dependencies. I think option (3) makes the scenario of exposing types from dependencies more common/complicated.
In this scenario `my-pkg` depends on `common-utils 1.0.0`, which exposes a `meters` and an `inches` datatype. `my-pkg` re-exposes the `meters` type via the exposed function `meter measure_distance(A x, B y);`.
Now `common-utils 2.0.0` is released, and it has prefixed all SI units, so now we have `si_meters`. Would it be a breaking change for `my-pkg` to update its dependency and expose `si_meter measure_distance(A x, B y)` instead?
I'm not really sure what a good way of dealing with this would be. There is a proposal for how Cargo should do it: https://internals.rust-lang.org/t/pre-rfc-superseding-public-private-dependencies/19708
Here are some other articles that I found a little helpful reading up on the topic:
- https://medium.com/@ruben.alapont/advanced-npm-working-with-peer-dependencies-b7c43aa852d8
- https://discourse.elm-lang.org/t/from-golang-package-versioning/1022
Here I will list more concretely what I want to implement and why.
In short, I'm proposing the following:
- Add the ability for fpm to install different versions of the same package (nested package hierarchies).
- Add a very limited form of version constraints (all constraints are of the form `>= a.b.c < (a+n).0.0` or `>= 0.a.b < 0.(a+n).0`).
- Implement minimal version selection, a simple and predictable constraint resolution algorithm.
I mentioned previously that I think multiple-versions-of-the-same-package is a better approach than backsolving when it comes to handling unsatisfiable constraints. However, right now fpm uses a flat package hierarchy (I think, am I wrong?), while my proposal hinges on a nested package hierarchy.
I propose that I modify the naming of dependencies in `build/dependencies/` such that dependencies are prefixed by a hash generated from `version` + `git url` + `git revision`. I might also need to modify the build process.
So, if my `build/dependencies/` looks like this right now:

```
fortran-regex
fortran-shlex
jonquil
M_CLI2
toml-f
```

then that would be changed to:

```
<HASH1>-fortran-regex
<HASH2>-fortran-shlex
<HASH3>-jonquil
<HASH4>-M_CLI2
<HASH5>-toml-f
```

This means that different versions of the same package can coexist in `build/dependencies/` as they will be prefixed by different hashes.
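To make the idea concrete, here is a minimal Python sketch of how such a directory name could be derived. This is an illustration only: the function name, the choice of SHA-256 and the 8-character truncation are my own assumptions, and the real implementation would be Fortran code inside fpm.

```python
# Hypothetical sketch: derive a "<HASH>-<name>" directory name for
# build/dependencies/ from version + git url + git revision, so that
# two versions of the same package can coexist side by side.
import hashlib

def dependency_dir_name(name: str, version: str,
                        git_url: str, git_rev: str) -> str:
    # Hash all the identifying inputs together; any change to the
    # version, url or revision yields a different directory name.
    key = f"{version}\n{git_url}\n{git_rev}".encode("utf-8")
    digest = hashlib.sha256(key).hexdigest()[:8]  # a short prefix suffices
    return f"{digest}-{name}"
```

The same inputs always produce the same name, so re-running a build reuses the already-fetched dependency directory.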
Add new syntax to the `dependencies.<package-name>.v` field using `parse_comp_set` from version-f. This is now valid:

```toml
dependencies.example-package.v = ">= 1.0.0 < 2"
```

This is also valid:

```toml
dependencies.example-package.v = ">= 1.0.0 < 3"
```

This is also valid:

```toml
dependencies.example-package.v = ">= 0.1.0 < 0.2"
```

However, this is not valid (because it's a niche/bad use-case):

```toml
dependencies.example-package.v = ">= 1.0.0 < 1.2.3"
```
It's not possible to use wildcards (`1.2.*`), it's not possible to exclude specific versions (`>= 1.0.0, < 2, != 1.2.3`), and it's not possible to specify an "or" constraint (`1.2.3 | 1.2.4 | 1.2.5`). Basically, I propose very restricted version constraint features/syntax.
We can always add more features afterwards if we find that my proposal isn't expressive enough (as in: people have real-world use-cases for more advanced version constraints); removing features is not as easy, and it impacts what sorts of resolution algorithms are available to us. See the next section for how this is relevant.
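To show how small the allowed grammar is, here is a hedged Python sketch of a validator for the proposed constraint form. In fpm the actual parsing would be done in Fortran with `parse_comp_set` from version-f; every name below is hypothetical.

```python
# Hypothetical sketch: accept only constraints of the form
#   >= a.b.c < (a+n)      (for a > 0)
#   >= 0.a.b < 0.(a+n)    (for 0.x packages)
# and reject everything else (wildcards, exclusions, "or"s, narrow ranges).
import re
from typing import Optional, Tuple

Version = Tuple[int, int, int]

def parse_version(s: str) -> Version:
    """Parse "1", "1.2" or "1.2.3" into a (major, minor, patch) triple."""
    parts = [int(p) for p in s.split(".")]
    while len(parts) < 3:
        parts.append(0)  # missing components default to zero
    return (parts[0], parts[1], parts[2])

def parse_constraint(s: str) -> Optional[Tuple[Version, Version]]:
    """Return (lower, upper) bounds, or None if the constraint is invalid."""
    m = re.fullmatch(r"\s*>=\s*([\d.]+)\s*<\s*([\d.]+)\s*", s)
    if not m:
        return None
    lo, hi = parse_version(m.group(1)), parse_version(m.group(2))
    if lo[0] > 0 and hi[0] > lo[0] and hi[1] == hi[2] == 0:
        return lo, hi  # >= a.b.c < (a+n).0.0
    if lo[0] == 0 and hi[0] == 0 and hi[1] > lo[1] and hi[2] == 0:
        return lo, hi  # >= 0.a.b < 0.(a+n).0
    return None
```

For example, `parse_constraint(">= 1.0.0 < 2")` yields the bounds `(1,0,0)` and `(2,0,0)`, while the disallowed `">= 1.0.0 < 1.2.3"` yields `None`.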
Wouldn't it be nice if dependencies not from the package registry could also have version constraints? So that something like:

```toml
[dependencies]
toml-f = { git = "https://github.com/toml-f/toml-f", tag = "v0.2.1" }
```

could become:

```toml
[dependencies]
toml-f = { git = "https://github.com/toml-f/toml-f", v = ">= 0.2.1 < 0.3" }
```
I propose we implement the Minimal Version Selection algorithm described in https://research.swtch.com/vgo-mvs, which touts reproducible installs, upgrades and downgrades without the need for a lock-file (!!). It does this by restricting the types of constraints that packages are allowed to express, turning the generally hard boolean satisfiability problem into a constrained one for which good algorithms exist.
The article ends with:
> More than anything else, I wanted to find a version selection algorithm that was understandable. Predictable. Boring. Where other systems instead seem to optimize for displays of raw flexibility and power, minimal version selection aims to be invisible. I hope it succeeds.
I like boring and predictable :-)
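To give a feel for how simple MVS is, here is a minimal Python sketch of the "rough build list" construction from the article: walk the requirement graph from the root and keep, for every package, the maximum of the minimum versions requested. This is a simplified illustration under my own naming, not fpm code and not the complete algorithm from the article.

```python
# Hypothetical sketch of Minimal Version Selection's build-list construction
# (after https://research.swtch.com/vgo-mvs). Each package version declares a
# list of (name, minimum version) requirements; MVS takes, per package, the
# maximum of all minimums mentioned in the reachable graph, and never jumps
# to a newer version than anyone asked for.
from typing import Dict, List, Tuple

Version = Tuple[int, int, int]
Pkg = Tuple[str, Version]

def mvs_build_list(root_deps: List[Pkg],
                   manifest: Dict[Pkg, List[Pkg]]) -> Dict[str, Version]:
    selected: Dict[str, Version] = {}
    visited = set()
    stack = list(root_deps)
    while stack:
        pkg = stack.pop()
        if pkg in visited:
            continue  # this exact (name, version) was already processed
        visited.add(pkg)
        name, version = pkg
        if version > selected.get(name, (0, 0, 0)):
            selected[name] = version  # keep the max of the requested minimums
        stack.extend(manifest.get(pkg, []))
    return selected
```

Note how there is no search or backtracking at all: the answer is a single deterministic pass over the graph, which is exactly where the reproducibility-without-a-lock-file property comes from.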
While the implementation would be quite simple, this approach is very different from the more standard "throw a SAT-solver at it", and since the algorithm isn't really battle-tested it might turn out to be a decision we'll regret (although golang seems to be committing to MVS, so if it's a bad choice we won't be the only ones regretting it). Further, if in the end we realize that we want more expressive constraint syntax (like not's and or's) then we need to throw away this algorithm; there's no good way of extending it without losing all of its nice features. If we went the traditional SAT-solver approach we wouldn't have this problem.
It's also worth mentioning that the supposed problems with the traditional SAT-solver approach might be overstated, and actually caused by other bad design decisions.
Now that I'm writing this, I realize that maybe minimal version selection is too daring. I'd be happy to implement a more traditional algorithm too, in which case I propose that I investigate using `libsolv` or some SAT-solver written in C for the resolution algorithm.
Elm's package manager is able to detect when the API of your package breaks and, as a result, when you want to publish your package it will automatically figure out what the new version should be for you. I wonder if it's possible to take this approach even further: is it possible for a package manager to figure out which versions of a package it can use simply by looking at how you're using the package API?
```toml
[dependencies]
example-package.v = "# 1.2.3"
```

which would mean: I want to use version `1.2.3` as a starting point, but any newer versions that don't change the specific parts of the API that I'm using are fine.
If I end up finishing the project ahead of schedule I would like to investigate this.
Project length: I will present this timeline as if the duration of my GSoC project will be the "standard 12 weeks"; however, I would prefer stretching it out by 2-3 weeks, as I think that would result in less stress for me juggling prior commitments with GSoC. But this can be discussed with my mentor later.
- Week 21: I have my last exam of the semester.
- Week 24: I present my bachelors thesis.
- Week 35: I will be traveling and unavailable.
I would like to use this time to set expectations about communication, check-ins etc. with my mentor, in addition to getting to know them.
When it comes to the technical aspect: I would like to talk to my mentor and the community about the specifics of the constraint resolution: what syntax to use for version constraints, what resolution algorithm to use, and try to anticipate potential problems and time-sinks. I have certain questions regarding best practices for C interop in Fortran, how Fortran dependencies are linked in the build process (related to Nested package hierarchies), etc., that maybe my mentor can help me figure out.
The community bonding period is towards the end of the semester, and as such my time and energy will be skewed towards school. I will communicate with my mentor and the community, do some reading, and discuss the project with my mentor if they have the time, but other than that I don't expect to spend much time working on GSoC.
I work on implementing Nested package hierarchies for fpm. At the end of week 26 I have a PR to fpm that implements nested package hierarchies.
4 weeks sounds a little excessive, but considering that weeks 23 and 24 will be full of thesis work for me I think it makes sense. It can also be expected that I'm still getting up to speed with Fortran at this point.
NOTE: This feature in particular is a bit of an unknown to me: I don't have a clear idea of what mechanisms in fpm would need change to implement this, so the time allotted for this feature might be revised during the community bonding period.
At the end of week 27 I have an algorithm, implemented in Fortran, that given some internal representation of a dependency graph selects an appropriate set of packages and versions. There is also some sort of test-suite with a bunch of different mock dependency graphs that the algorithm can be tested against.
It's no longer necessary to give a complete dependency graph to the algorithm; instead it's enough to supply it with the direct dependencies (i.e. not transitive dependencies), and the algorithm can then recursively parse package manifests and build up the dependency graph on its own.
This also includes implementing the version constraint parsing in `fpm.toml`.
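The recursive step can be sketched as follows in Python, where `fetch_manifest` is a stand-in for reading a package's `fpm.toml` from the registry or a git checkout; all names here are hypothetical illustrations, not fpm API.

```python
# Hypothetical sketch: expand a list of direct dependencies into the full
# dependency graph by recursively fetching each package's manifest.
from typing import Callable, Dict, List, Tuple

Pkg = Tuple[str, str]  # (name, version)

def build_dependency_graph(direct: List[Pkg],
                           fetch_manifest: Callable[[Pkg], List[Pkg]]
                           ) -> Dict[Pkg, List[Pkg]]:
    graph: Dict[Pkg, List[Pkg]] = {}
    stack = list(direct)
    while stack:
        pkg = stack.pop()
        if pkg in graph:
            continue  # manifest already fetched for this (name, version)
        deps = fetch_manifest(pkg)
        graph[pkg] = deps
        stack.extend(deps)  # transitive dependencies get expanded too
    return graph
```

The resulting graph is exactly the "internal representation" that the week-27 resolution algorithm consumes, so the two pieces compose naturally.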
Now it's starting to look like a complete feature. `fpm install`, `fpm update` and `fpm build` now integrate with the resolution algorithm.
Some unit tests will have to be added, some will have to be changed. There is probably lots of error handling and messaging to work on.
Work on adding unit tests and making existing unit tests pass. Add/change documentation, both online and in the CLI.
At this point I submit a PR that implements version constraint resolution 🎉
These three weeks can be used for a variety of things:
- If things fell behind schedule for whatever reason, we have some margins here.
- I continuously respond to concerns/requests on the PR I submitted in week 31, so that it may be merged at some point.
- With whatever time is free I work on Bonus feature: Beyond SemVer.
Even if I have the full 3 weeks to work on the bonus feature I don't expect to have a full PR or anything like that; instead I imagine that the deliverable would be a proof of concept or a blog post where I go into detail on this feature, my pitch for why I think it's great for the Fortran community, etc.
This week I will be traveling, so the final submission is sent in by me late week 34 or early week 35.
It could be nice to write a blog post about this whole journey.