Extend NoScaleUpInfo reporting by running simulation on skipped NodeGroups. #9346

Open
shaikenov wants to merge 1 commit into kubernetes:master from shaikenov:shaikenov-run-schedulablePodGroups-for-skipped-ngs

Conversation

@shaikenov
Contributor

This PR introduces the following changes:

  • run SchedulablePodGroups on skipped node groups (NGs) during the ScaleUp simulation to check whether the skipped NGs satisfy the predicates of the podEquivalenceGroups:
    • if a skipped NG satisfies the predicates of a pod group, it stays in the SkippedNodeGroups list associated with that pod group's pods.
    • otherwise the NG moves to RejectedNodeGroups.
  • since this change adds scale-up performance overhead, it is gated behind a feature flag.
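The partitioning described above can be sketched as follows. This is a minimal, hedged illustration: the types and names here (`nodeGroup`, `fitsPredicates`, `partitionSkipped`) are simplified stand-ins invented for this sketch, not the real Cluster Autoscaler types (`equivalence.PodGroup`, `status.NoScaleUpInfo`, etc.).

```go
package main

import "fmt"

// nodeGroup is a hypothetical, simplified stand-in for a skipped node group.
type nodeGroup struct {
	id string
	// fitsPredicates stands in for the result of the SchedulablePodGroups
	// predicate check for one pod equivalence group against this NG's template.
	fitsPredicates bool
}

// partitionSkipped re-checks each skipped node group against a pod group's
// predicates: passing NGs stay "skipped" (they could have helped but were
// skipped, e.g. due to backoff), while failing NGs are reported as "rejected".
func partitionSkipped(skipped []nodeGroup) (stillSkipped, rejected []string) {
	for _, ng := range skipped {
		if ng.fitsPredicates {
			stillSkipped = append(stillSkipped, ng.id)
		} else {
			rejected = append(rejected, ng.id)
		}
	}
	return stillSkipped, rejected
}

func main() {
	skipped := []nodeGroup{
		{id: "ng-backoff-fits", fitsPredicates: true},
		{id: "ng-backoff-too-small", fitsPredicates: false},
	}
	stillSkipped, rejected := partitionSkipped(skipped)
	fmt.Println(stillSkipped, rejected) // [ng-backoff-fits] [ng-backoff-too-small]
}
```

The point of the split is purely diagnostic: the user learns which skipped NGs could never have helped, without the extra check affecting the scale-up decision itself.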

What type of PR is this?

/kind feature

What this PR does / why we need it:

This change gives the user a better understanding of why a scale-up failed and improves overall observability. If an NG is in backoff but does not satisfy the predicates, the user will know right away instead of waiting until the NG becomes available and is considered again.

Which issue(s) this PR fixes:

Fixes #

Special notes for your reviewer:

Does this PR introduce a user-facing change?

Adds a new "scaleup-simulation-for-skipped-node-groups-enabled" flag, which enables an extra SchedulablePodGroups run for the skipped node groups during the ScaleUp simulation.

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:


@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. kind/feature Categorizes issue or PR as related to a new feature. labels Mar 11, 2026
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: shaikenov
Once this PR has been reviewed and has the lgtm label, please assign aleksandra-malinowska for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot
Contributor

Hi @shaikenov. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work.

Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Mar 11, 2026
@shaikenov shaikenov force-pushed the shaikenov-run-schedulablePodGroups-for-skipped-ngs branch from fd176b0 to c9e137d on March 13, 2026 11:11
@shaikenov
Contributor Author

/uncc elmiko
/cc MartynaGrotek

@k8s-ci-robot k8s-ci-robot removed the request for review from elmiko March 13, 2026 11:12
@k8s-ci-robot
Contributor

@shaikenov: GitHub didn't allow me to request PR reviews from the following users: MartynaGrotek.

Note that only kubernetes members and repo collaborators can review this PR, and authors cannot review their own PRs.


In response to this:

/uncc elmiko
/cc MartynaGrotek

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@shaikenov shaikenov marked this pull request as ready for review March 13, 2026 11:14
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Mar 13, 2026
@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Mar 20, 2026
@shaikenov shaikenov force-pushed the shaikenov-run-schedulablePodGroups-for-skipped-ngs branch from c9e137d to 015d03d on March 23, 2026 15:37
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Mar 23, 2026
Contributor

@norbertcyran norbertcyran left a comment


Overall looks good, I've left some nits and suggestions, but nothing major

// GetRemainingPods returns information about pods which CA is unable to help
// at this moment.
func (o *ScaleUpOrchestrator) GetRemainingPods(egs []*equivalence.PodGroup, nodeGroups []cloudprovider.NodeGroup, skipped map[string]status.Reasons, nodeInfos map[string]*framework.NodeInfo) []status.NoScaleUpInfo {
	if !o.autoscalingCtx.ScaleUpSimulationForSkippedNodeGroupsEnabled {
Contributor


nit: we had a discussion lately about putting more stuff into AutoscalingContext, and we agreed that we tend to overuse it: #9353 (comment)

I'd normally ask to avoid using autoscalingCtx to store that flag and instead pass it to the orchestrator via dependency injection. However, IIRC, the orchestrator has a weird interface that makes DI a little more complicated (because of the Initialize method). I remember having some issues with that in #8835. Therefore, I won't push on that, but I'd still suggest taking a look at whether injecting ScaleUpSimulationForSkippedNodeGroupsEnabled via DI would be a hassle.

Contributor Author


I agree that AutoscalingContext is huge, and DI does indeed appear to be very complex because of all the calls from Initialize. TBH, I do not think it is worth it; it would make the implementation more complex.

Side comment:
While I understand that it is better to avoid huge objects that hold a lot of things, such as AutoscalingContext, I personally do not see a better way to do it. We are adding a lot of flags that may be used in different parts of CA, and having one big object gives us a lot of flexibility there: you do not need to think twice about what to pass where, since the context object can be accessed everywhere. And as long as we have this object, I think it is better to use it than to avoid it.
On second thought, if a flag impacts only a particular area of CA, it might be worth injecting it into that area via DI, and flags that impact several parts of CA can go into AutoscalingContext. If that is what was meant in that discussion, I fully agree.
This is just a comment to hear your opinion; maybe I am missing something.
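For illustration, the two wiring styles being discussed can be contrasted with a minimal sketch. All names here (`autoscalingContext`, `orchestratorCtx`, `orchestratorDI`, `newOrchestratorDI`) are hypothetical simplifications, not the real Cluster Autoscaler code.

```go
package main

import "fmt"

// Style 1: the flag travels inside a broad context object that many
// components share (the pattern the PR currently uses).
type autoscalingContext struct {
	ScaleUpSimulationForSkippedNodeGroupsEnabled bool
	// ...many other fields in the real AutoscalingContext...
}

type orchestratorCtx struct{ ctx *autoscalingContext }

func (o *orchestratorCtx) enabled() bool {
	return o.ctx.ScaleUpSimulationForSkippedNodeGroupsEnabled
}

// Style 2: the flag is injected directly into the component that uses it,
// making the dependency explicit at construction time.
type orchestratorDI struct{ simulateSkippedNodeGroups bool }

func newOrchestratorDI(simulateSkippedNodeGroups bool) *orchestratorDI {
	return &orchestratorDI{simulateSkippedNodeGroups: simulateSkippedNodeGroups}
}

func (o *orchestratorDI) enabled() bool { return o.simulateSkippedNodeGroups }

func main() {
	a := &orchestratorCtx{ctx: &autoscalingContext{ScaleUpSimulationForSkippedNodeGroupsEnabled: true}}
	b := newOrchestratorDI(true)
	fmt.Println(a.enabled(), b.enabled()) // true true
}
```

The trade-off is the one described above: style 1 is convenient because the context is reachable everywhere, while style 2 keeps each component's dependencies visible but is harder to retrofit onto an interface that is initialized via a separate Initialize method.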

// This code here runs a simulation to see which pods can be scheduled on which node groups.
for _, nodeGroup := range validNodeGroups {
	schedulablePodGroups[nodeGroup.Id()] = o.SchedulablePodGroups(podEquivalenceGroups, nodeGroup, nodeInfos[nodeGroup.Id()])
}
Contributor


Have you considered running the simulations for the skipped node groups somewhere around here? I think it could be cleaner: with the current proposal, scheduling simulations get scattered across the orchestrator code, and the logic that was previously responsible only for processing the scale-up status now also does scheduling simulations.

We'd have to be extra careful, though, not to include the skipped node groups in bin packing. I haven't investigated it in depth, so feel free to discard this if it's not feasible.

Contributor Author


Exactly. I wanted to do the simulations towards the end of the ScaleUp call, because of the bin packing, and also because we need to somehow preserve the default behavior; doing it here did not seem feasible.
Another point: SchedulablePodGroups marks pods as schedulable, so it would require extra bookkeeping to preserve which pods were unschedulable before the "second" simulation and after it, and to manage all of that under the feature flag.

Also, the simulation we intend to run for the skipped node groups is not a "full" scale-up simulation, but only a predicate check, so I placed it only at the end, and we run it only for the non-schedulable pod groups.
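The bookkeeping concern mentioned above (the extra run flipping pods to schedulable as a side effect) can be sketched as a snapshot-and-restore pattern. This is a hypothetical simplification: `withRestoredState` and the `map[string]bool` representation of unschedulable pods are invented for this sketch and do not exist in the Cluster Autoscaler code.

```go
package main

import "fmt"

// withRestoredState runs a diagnostic simulation that may mutate the
// unschedulable-pod state, then restores the state so the extra run is
// observability-only and the real scale-up outcome is unchanged.
func withRestoredState(unschedulable map[string]bool, diagnosticRun func(map[string]bool)) {
	// Snapshot before the extra simulation.
	snapshot := make(map[string]bool, len(unschedulable))
	for pod, v := range unschedulable {
		snapshot[pod] = v
	}
	// The diagnostic run may flip pods to schedulable as a side effect.
	diagnosticRun(unschedulable)
	// Restore the snapshot so the side effects do not leak.
	for pod := range unschedulable {
		delete(unschedulable, pod)
	}
	for pod, v := range snapshot {
		unschedulable[pod] = v
	}
}

func main() {
	pods := map[string]bool{"pod-a": true, "pod-b": true}
	withRestoredState(pods, func(m map[string]bool) {
		m["pod-a"] = false // the simulation marked pod-a schedulable
	})
	fmt.Println(pods["pod-a"], pods["pod-b"]) // true true
}
```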

…roups.

This change introduces the following changes:
* run SchedulablePodGroups on skipped node groups (NGs) during the ScaleUp simulation to check whether the skipped NGs satisfy the predicates of the podEquivalenceGroups:
    * if a skipped NG satisfies the predicates of a pod group, it stays in the SkippedNodeGroups list associated with that pod group's pods.
    * otherwise the NG moves to RejectedNodeGroups.
* run the SchedulablePodGroups simulation even for the AllOrNothing or ExpansionOptionsFilteredOutReason cases, after marking all pods unschedulable: this can give a better idea of whether the simulation would have succeeded had some NGs not been skipped.
* since this change adds scale-up performance overhead, it is gated behind a feature flag.
This change gives the user a better understanding of why a scale-up failed. If an NG is in backoff but does not satisfy the predicates, the user will know right away instead of waiting until the NG becomes available and is considered again.
@shaikenov shaikenov force-pushed the shaikenov-run-schedulablePodGroups-for-skipped-ngs branch from 015d03d to 5b75b41 on March 30, 2026 08:27

Labels

area/cluster-autoscaler cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/feature Categorizes issue or PR as related to a new feature. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. release-note Denotes a PR that will be considered when it comes time to generate release notes. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
