The openshift/bpfman-operator repository builds three container images from the same codebase via Konflux:
- bpfman-operator (the operator binary)
- bpfman-agent (the agent DaemonSet binary)
- bpfman-operator-bundle (the OLM bundle)
A fourth image, bpfman (the daemon), is built from a separate repository (openshift/bpfman) but its pullspec is also consumed by the bundle.
Each component's Tekton push pipeline declares which file it owns via the
build.appstudio.openshift.io/build-nudge-files annotation. When Konflux
successfully builds a component, it opens a PR that updates a single .txt
file containing the image digest:
- hack/konflux/images/bpfman-operator.txt
- hack/konflux/images/bpfman-agent.txt
- hack/konflux/images/bpfman.txt
Each file contains exactly one line: a registry.redhat.io/...@sha256:...
pullspec.
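As a rough illustration of the one-line file format, the following sketch parses and checks a nudge file's content. The function name parse_nudge_file and the sample repository path are hypothetical; the only constraint taken from the text is a single digest-pinned pullspec per file.

```python
import re

# Assumed shape: <registry>/<repo>@sha256:<64 hex chars>. The exact repository
# path is not given in the text, so the pattern only constrains the digest.
PULLSPEC_RE = re.compile(r"^(?P<repo>[^\s@]+)@sha256:(?P<digest>[0-9a-f]{64})$")


def parse_nudge_file(text: str) -> tuple[str, str]:
    """Return (repository, digest) from the content of a one-line nudge file."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if len(lines) != 1:
        raise ValueError(f"expected exactly one pullspec line, found {len(lines)}")
    match = PULLSPEC_RE.match(lines[0])
    if match is None:
        raise ValueError(f"not a digest-pinned pullspec: {lines[0]!r}")
    return match["repo"], match["digest"]


if __name__ == "__main__":
    # Hypothetical sample content; a real file holds the digest Konflux nudged.
    sample = "registry.redhat.io/example/bpfman-agent@sha256:" + "ab" * 32 + "\n"
    repo, digest = parse_nudge_file(sample)
    print(repo, digest[:12])
```

Rejecting tag-based references such as :latest up front keeps a malformed nudge from silently reaching the bundle build.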
The bundle push pipeline
(bpfman-operator-bundle-ystream-push.yaml) has a CEL trigger that fires
when any of the .txt files (or bundle/, hack/openshift/, config/,
OPENSHIFT-VERSION) change on main. When a nudge PR merges,
Containerfile.bundle.openshift runs:
- update-bundle.py -- transforms the CSV with Red Hat branding, the operator pullspec (from bpfman-operator.txt), architecture labels, and version string.
- update-configmap.py -- stamps the agent and bpfman pullspecs (from bpfman-agent.txt and bpfman.txt) into the bundle's ConfigMap manifest (bundle/manifests/bpfman-config_v1_configmap.yaml).
At release time, validate-snapshot.py extracts the bundle image, parses
the CSV and ConfigMap, and checks that every sha256 digest matches the
corresponding component digest in the Konflux snapshot. If any mismatch is
found, the release is blocked.
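The release-time check can be approximated as follows. This is a simplified sketch, not the actual validate-snapshot.py: it assumes the bundle manifests are already extracted to text and that the snapshot has been reduced to a mapping of component name to pullspec. It tests set membership, where the real script matches each digest to its corresponding component. The name find_mismatches is hypothetical.

```python
import re

DIGEST_RE = re.compile(r"sha256:[0-9a-f]{64}")


def find_mismatches(bundle_texts: dict[str, str], snapshot: dict[str, str]) -> list[str]:
    """Return digests embedded in bundle manifests that match no snapshot component.

    bundle_texts maps manifest name -> raw manifest text (CSV, ConfigMap).
    snapshot maps component name -> digest-pinned pullspec.
    """
    expected = set()
    for pullspec in snapshot.values():
        expected.update(DIGEST_RE.findall(pullspec))
    mismatches = []
    for name, text in sorted(bundle_texts.items()):
        for digest in sorted(set(DIGEST_RE.findall(text))):
            if digest not in expected:
                mismatches.append(f"{name}: {digest}")
    return mismatches


if __name__ == "__main__":
    # Hypothetical digests: the bundle still carries the old agent digest.
    op_new = "sha256:" + "a" * 64
    agent_old = "sha256:" + "b" * 64
    agent_new = "sha256:" + "c" * 64
    snapshot = {
        "bpfman-operator": f"registry.example/operator@{op_new}",
        "bpfman-agent": f"registry.example/agent@{agent_new}",
    }
    bundle = {
        "csv.yaml": f"image: registry.example/operator@{op_new}",
        "configmap.yaml": f"agent: registry.example/agent@{agent_old}",
    }
    for line in find_mismatches(bundle, snapshot):
        print("MISMATCH", line)
```

A non-empty result corresponds to the case where the release is blocked.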
The three component builds and their nudge PRs are independent. They
complete and merge at different times. Each merge triggers a bundle rebuild,
but the bundle is built from whatever .txt files are on main at that
moment. This means:
- Agent builds and nudges bpfman-agent.txt. Bundle rebuilds with the new agent digest but the old operator and bpfman digests.
- Operator builds and nudges bpfman-operator.txt. Bundle rebuilds with the new operator digest but potentially the old agent digest (if that nudge hasn't merged yet, or merged in a different cycle).
The snapshot assembled from these components is not self-consistent: the
bundle references digests that do not match the component digests in the
same snapshot. validate-snapshot.py catches this and blocks the release.
The system eventually converges -- after all nudge PRs merge and the final
bundle rebuild runs, the snapshot becomes consistent -- but there is no
mechanism to ensure this happens atomically.
This is the core issue: Konflux does not provide a way to gate the bundle build until all parent component nudge PRs have merged. Each nudge is independent, each merge triggers a separate bundle rebuild, and only the last rebuild in a cycle produces a valid snapshot.
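The race can be replayed with a small model. This sketch makes two simplifying assumptions: the set of newest component digests is fixed for the cycle, and every nudge merge triggers exactly one bundle rebuild that reads main as-is. The function name simulate and the placeholder digest strings are hypothetical.

```python
def simulate(merge_order, latest, on_main):
    """Replay nudge merges; after each merge the bundle rebuilds from main.

    merge_order: component names in the order their nudge PRs merge.
    latest: component -> digest of the newest build (what the snapshot holds).
    on_main: component -> digest recorded in its .txt file before the cycle.
    Returns a list of (merged component, bundle is consistent) pairs.
    """
    main = dict(on_main)
    results = []
    for component in merge_order:
        main[component] = latest[component]  # the nudge PR merges
        bundle = dict(main)                  # bundle rebuild reads main as-is
        results.append((component, bundle == latest))
    return results


if __name__ == "__main__":
    # Placeholder digests; operator and agent were both rebuilt this cycle.
    latest = {"operator": "op-v2", "agent": "agent-v2", "bpfman": "daemon-v1"}
    on_main = {"operator": "op-v1", "agent": "agent-v1", "bpfman": "daemon-v1"}
    for component, consistent in simulate(["agent", "operator"], latest, on_main):
        print(f"after {component} nudge: consistent={consistent}")
```

With two rebuilt components, only the second rebuild yields a bundle whose digests match the snapshot, whichever order the nudges merge in.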
bpfman/bpfman-operator#498 ("Bootstrap Config CR from operator on startup") makes the following changes relevant to the downstream bundle:
- Removes the static Config CR manifest from the bundle. The file bundle/manifests/bpfman-config_v1_configmap.yaml (previously migrated to bundle/manifests/bpfman.io_v1alpha1_config.yaml by openshift/bpfman-operator commit e10e766) is deleted entirely. OLM rejects custom resource instances in bundles as UnsupportedResource, so the Config CR cannot be shipped this way.
- The operator bootstraps the Config CR on startup. Image references are read from environment variables BPFMAN_IMG and BPFMAN_AGENT_IMG on the operator deployment. Both are required; missing either is a fatal startup error. The deployment manifest (config/bpfman-operator-deployment/deployment.yaml) carries upstream defaults (quay.io/bpfman/bpfman:latest and quay.io/bpfman/bpfman-agent:latest).
- The deleted config/bpfman-deployment/ directory contained the kustomise overlay (config.yaml, kustomization.yaml.env) that was previously used by make patch-image-references. That Makefile target now patches the env vars directly on deployment.yaml via sed.
After merging PR #498 upstream and pulling it into
openshift/bpfman-operator:
- update-configmap.py / update-config.py has no target file. There is no standalone Config CR or ConfigMap manifest in the bundle to patch.
- The agent and bpfman pullspecs must instead be stamped into the CSV's deployment spec as env var values for BPFMAN_IMG and BPFMAN_AGENT_IMG. This is where the operator reads them at runtime.
- validate-snapshot.py must extract the image refs from the CSV deployment env vars instead of from a standalone manifest.
- Containerfile.bundle.openshift must be updated to call the revised script targeting the CSV rather than a removed manifest.
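One possible shape for the revised stamping step is sketched below. It assumes the CSV has already been parsed into a dict (for example with a YAML library, which is left out here to keep the example dependency-free) and that it follows the usual OLM layout of spec.install.spec.deployments. The function name stamp_image_env, the container name and the pullspecs are hypothetical; the actual script may be structured differently.

```python
def stamp_image_env(csv: dict, images: dict[str, str]) -> set[str]:
    """Overwrite matching env var values on every container in the CSV's deployments.

    csv is the parsed ClusterServiceVersion.
    images maps env var name -> digest-pinned pullspec.
    Returns the env var names that were stamped; raises if any was not found.
    """
    stamped = set()
    for deployment in csv["spec"]["install"]["spec"]["deployments"]:
        for container in deployment["spec"]["template"]["spec"]["containers"]:
            for env in container.get("env", []):
                if env.get("name") in images:
                    env["value"] = images[env["name"]]
                    stamped.add(env["name"])
    missing = set(images) - stamped
    if missing:
        raise KeyError(f"env vars not found in CSV: {sorted(missing)}")
    return stamped


if __name__ == "__main__":
    # Minimal stand-in for the CSV, carrying the upstream defaults.
    env = [
        {"name": "BPFMAN_IMG", "value": "quay.io/bpfman/bpfman:latest"},
        {"name": "BPFMAN_AGENT_IMG", "value": "quay.io/bpfman/bpfman-agent:latest"},
    ]
    containers = [{"name": "manager", "env": env}]
    deployment = {"name": "bpfman-operator", "spec": {"template": {"spec": {"containers": containers}}}}
    csv = {"spec": {"install": {"spec": {"deployments": [deployment]}}}}
    stamp_image_env(csv, {
        "BPFMAN_IMG": "registry.example/bpfman@sha256:" + "1" * 64,
        "BPFMAN_AGENT_IMG": "registry.example/bpfman-agent@sha256:" + "2" * 64,
    })
    print(env[0]["value"], env[1]["value"])
```

Failing when an expected env var is absent mirrors the operator's own behaviour, where a missing image variable is a fatal startup error.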
The substitution target changes (standalone manifest to CSV env vars) but the fundamental coordination problem is unchanged:
1. Agent builds. Konflux nudges bpfman-agent.txt. PR merges. Bundle rebuilds with the new agent digest but old operator/bpfman digests stamped into the CSV env vars.
2. Operator builds. Konflux nudges bpfman-operator.txt. PR merges. Bundle rebuilds again, now with both new, but only if step 1 has already merged.
3. If both nudge PRs are open simultaneously and merge in sequence, only the bundle built after the second merge is self-consistent.
The snapshot produced between steps 1 and 2 fails
validate-snapshot.py and cannot be released. This is the same race that
exists today; the upstream change does not make it worse, but it does not
fix it either.
The root cause is that Konflux treats each component nudge as an independent event. To produce a self-consistent snapshot on every bundle build, one of the following would be needed:
1. Atomic multi-component nudge. Konflux would wait for all components in an application (or a defined group) to complete their builds before raising a single nudge PR that updates all .txt files at once. The bundle would then rebuild exactly once with all digests current.
2. Snapshot-level gating. Rather than triggering the bundle build on each .txt file change, Konflux would only trigger it when all component digests in the snapshot are newer than those currently in the bundle. This is effectively a "quorum" gate.
3. Release-time validation only. Accept that intermediate bundle builds may be inconsistent. Rely on validate-snapshot.py (or Konflux Enterprise Contract policies) to block release of any snapshot where the bundle's embedded digests do not match the component digests. The system converges after the final nudge merges. The cost is wasted bundle builds and a slower release cadence.
Option 3 is what exists today. Options 1 and 2 would require changes to Konflux's nudging and build-triggering infrastructure.
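To make the quorum idea in option 2 concrete, here is a purely hypothetical sketch; Konflux offers no such hook today. Digests carry no ordering, so "newer" is approximated as "different from the digest in the last bundle". The set of components expected to change in a cycle is taken as an input, since the text does not say how it would be derived. The name bundle_gate_open is invented for illustration.

```python
def bundle_gate_open(expected: set[str], on_main: dict[str, str], in_bundle: dict[str, str]) -> bool:
    """True once every component expected to change has a fresh digest on main.

    expected: components rebuilt in this cycle (assumed to be known up front).
    on_main: component -> digest currently recorded in its .txt file on main.
    in_bundle: component -> digest embedded in the most recent bundle build.
    """
    return all(on_main[name] != in_bundle[name] for name in expected)


if __name__ == "__main__":
    # Placeholder digests for one cycle that rebuilds operator and agent.
    in_bundle = {"operator": "op-v1", "agent": "agent-v1", "bpfman": "daemon-v1"}
    expected = {"operator", "agent"}
    after_first_nudge = {"operator": "op-v1", "agent": "agent-v2", "bpfman": "daemon-v1"}
    after_second_nudge = {"operator": "op-v2", "agent": "agent-v2", "bpfman": "daemon-v1"}
    print(bundle_gate_open(expected, after_first_nudge, in_bundle))
    print(bundle_gate_open(expected, after_second_nudge, in_bundle))
```

Under these assumptions the gate stays closed after the first nudge and opens after the second, so the bundle would build once per cycle.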
The operator and agent are built from the same repository, which makes this a monorepo problem. Konflux has some monorepo awareness but it does not solve this case:
- PR-time group snapshots. When a PR targets a monorepo with multiple components, separate build pipelines trigger for each component. Konflux supports "group snapshot testing" that combines all updated component builds into a single snapshot for unified integration testing. This works at PR time.
- Post-merge: no grouping. After merge, each component build creates its own intermediate snapshot. The Konflux documentation explicitly acknowledges this gap: group snapshot testing "is unfortunately currently not directly available after the Pull Request is merged", so "individual build pipelines will result in intermediate Snapshots which will not contain all the changes until the final build pipelineRun completes."
- Recommended workaround. The documentation recommends creating a custom IntegrationTestScenario for push events that validates whether a snapshot contains all expected component updates, failing the test if incomplete. This is a hand-rolled quorum gate: the integration test checks "are all component digests in this snapshot consistent?" and blocks release until they are.
This is exactly the pattern validate-snapshot.py already implements.
The problem is not that invalid snapshots slip through to release --
they don't -- but that valid snapshots are slow to materialise. Every
intermediate (inconsistent) bundle build is wasted work, and the
release pipeline stalls until the final nudge lands and the last bundle
rebuild produces a consistent snapshot.
See: Managing Monorepo Applications
This problem has been actively investigated since October 2025 across a series of PRs in openshift/bpfman-operator. Every approach tried has either introduced new problems or only partially mitigated the race.
The idea was to route all component updates through the operator pipeline as a synchronisation point, so the bundle only rebuilds after the operator has incorporated all upstream changes.
- PR #1083 -- Added nudge file path triggers to component push pipelines so components would rebuild when their image references changed. This caused an infinite build loop: component builds, updates its own .txt file, which triggers itself again.
- PR #1090 -- Reverted #1083 to break the infinite loop.
- PR #1094 -- Removed nudge file path triggers from component push pipelines (kept them in pull-request pipelines only).
- PR #1097 -- Made the operator pipeline watch bpfman-agent.txt and bpfman.txt, and the bundle pipeline watch only bpfman-operator.txt. The operator becomes the synchronisation point: agent/daemon changes flow through the operator before reaching the bundle. This reduced the race window but did not eliminate it.
- PR #1100 -- Removed wasteful bundle validation triggers for component nudge files (pull-request pipelines only).
Attempted to fix the nudging topology and use Konflux annotations to collapse competing PRs.
- PR #1276 -- Changed agent and daemon to nudge the bundle directly (instead of going through the operator). Added the build-nudge-simple-branch: 'true' annotation to all components so competing nudge PRs targeting the same repo collapse into a single branch. This helped reduce duplicate PRs but did not solve the timing problem: the single branch still updates one .txt file at a time.
- PR #1282 -- Applied the same nudge configuration fix to z-stream pipelines.
- PR #1299 -- Deliberately triggered a race condition experiment by changing a file in cmd/ that triggers both agent and operator builds from the same commit. Goal: observe whether the simple-branch annotation prevents the mismatch. Result: the race still exists.
Since pipeline-level fixes could not eliminate the race, the next approach was to let inconsistent snapshots happen but block their release.
- PR #1393 (closed), then PR #1401 (merged) -- Added validate-snapshot.py as a Konflux IntegrationTestScenario. The script extracts the bundle image, parses the CSV and ConfigMap for embedded sha256 digests, and compares them against the component digests in the snapshot. Blocks release if any mismatch is found. At the time of implementation, the observed release failure rate due to inconsistent snapshots was ~70%.
- PR #1407 -- Scoped validation to push events only (not PR snapshots).
- PR #1415 -- Further scoped validation to bundle component snapshots only.
- PR #1436, PR #1437 -- Removed the snapshot validation pipeline and test-scripts integration test. This was a temporary measure while investigating release pipeline issues. The validation was correct but the integration with Konflux's release pipeline had its own problems.
Every mitigation tried has either:
- Introduced new problems (infinite build loops, over-triggering)
- Only narrowed the race window without closing it (operator as synchronisation point, simple-branch annotations)
- Correctly identified bad snapshots but not prevented the wasted work (snapshot validation)
The fundamental issue remains: Konflux does not offer a post-merge mechanism to atomically coordinate multiple component builds before triggering a dependent (bundle) build.
The coordination problem is inherent to how Konflux handles post-merge builds for multi-component applications. The upstream change (PR #498) shifts where image references are stamped (from a standalone bundle manifest to CSV deployment env vars) but does not change the fundamental issue: nudge PRs arrive independently, each triggers a bundle rebuild, and only the final rebuild in a cycle produces a releasable snapshot. Konflux does not currently offer post-merge group snapshots or atomic multi-component nudges that would eliminate this race.
This is not a configuration error. It has been actively investigated over five months with multiple approaches, none of which fully solve the problem without changes to Konflux's nudging and snapshot infrastructure.