Created
November 9, 2022 11:31
Hello!

I would like to ask for an exception that allows
https://github.com/kubernetes/kubernetes/pull/111023 to be merged into
1.26 after the code freeze.

Enhancement name: dynamic resource allocation
Enhancement status: alpha
SIG: Node, with Scheduling as participating SIG
k/enhancements repo issue #: #3063
PR #’s: #111023
Additional time needed (in days): 3 (= till Friday this week)

This is needed to give various code owners time to add their final
approval. Key stakeholders (Aldo for scheduling, Tim for architecture
and API) were essentially ready to approve yesterday, before the
code freeze, but some smaller recent changes still need to be
checked again and other reviewers need more time.

Reason this enhancement is critical for this milestone:

There is a lot of interest and momentum behind this feature right
now. Hardware vendors are ready to showcase it to customers, but
that will be harder when using the feature depends on building
a fork of Kubernetes from source. Not merging it now risks losing
this momentum.

Merging the feature also won't be easier for 1.27. It is now
fresh in the minds of reviewers; delaying until then would
mean that they need to familiarize themselves with it all over again.

Risks from adding code late:

All the new code is behind a feature gate, so the risk to the stability
of other features while the gate is disabled is low.

The core API gets changed. The new fields are also feature gated, so
without the feature gate they won't be visible and won't affect clients.
The API change in particular has been heavily scrutinized and changes were
made to support future extensions (for example, using a struct
instead of a plain string in one place), so the risk of making some
undesirable change is low to medium.

All code has unit tests that are passing. Test coverage (despite
the feature being alpha) is getting close to that of comparable code that is GA.
For example, the GA volumebinding scheduler plugin has 82%
statement coverage while the new plugin has 70%.

E2E tests also exist and are passing in a new, optional Prow
pre-merge job that runs for PRs touching the code.
The risk to test stability is low.

Risks from cutting enhancement:

The biggest risk is that if we don't merge now, we will lose the momentum
and then also won't get it merged in the future. This affects various
customer use cases where the currently possible workarounds do
not fully solve the problems that customers are having.

For example, Kevin Klues said in
https://github.com/kubernetes/kubernetes/pull/111023#discussion_r1017545708:

"the ability to share resources is one of the main features that
NVIDIA is excited about finally being able to support with DRA. We
already have an out-of-tree DRA driver running with this
functionality and have been communicating to customers that this is
the (long-awaited) path towards finally doing GPU-sharing "the right
way" in Kubernetes."

--
Best Regards
Patrick Ohly
Cloud Software Architect