Extend NoScaleUpInfo reporting by running simulation on skipped NodeGroups. #9346
Conversation
[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: shaikenov. The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files. Approvers can indicate their approval by writing /approve in a comment.
Hi @shaikenov. Thanks for your PR. I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test. Regular contributors should join the org to skip this step. Once the patch is verified, the new status will be reflected by the ok-to-test label. I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
Force-pushed from fd176b0 to c9e137d (compare)
/uncc elmiko
@shaikenov: GitHub didn't allow me to request PR reviews from the following users: MartynaGrotek. Note that only kubernetes members and repo collaborators can review this PR, and authors cannot review their own PRs.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
Force-pushed from c9e137d to 015d03d (compare)
norbertcyran left a comment:
Overall looks good, I've left some nits and suggestions, but nothing major
// GetRemainingPods returns information about pods which CA is unable to help
// at this moment.
func (o *ScaleUpOrchestrator) GetRemainingPods(egs []*equivalence.PodGroup, nodeGroups []cloudprovider.NodeGroup, skipped map[string]status.Reasons, nodeInfos map[string]*framework.NodeInfo) []status.NoScaleUpInfo {
	if !o.autoscalingCtx.ScaleUpSimulationForSkippedNodeGroupsEnabled {
nit: we recently had a discussion about putting more stuff into AutoscalingContext, and we agreed that we tend to overuse it: #9353 (comment)
I'd normally ask you to avoid using autoscalingCtx to store that flag and instead pass it to the orchestrator via dependency injection. However, IIRC, the orchestrator has a weird interface that makes DI a little more complicated (because of the Initialize method). I remember having some issues with that in #8835. Therefore, I won't push on it, but I'd still suggest checking whether injecting ScaleUpSimulationForSkippedNodeGroupsEnabled via DI would be a hassle.
I agree that AutoscalingContext seems to be huge, and DI indeed looks very complex with all the calls made from Initialize. TBH, I don't think it is worth it; it would make the implementation more complex.
Side comment:
While I understand that it is better to avoid huge objects with lots of things inside, such as AutoscalingContext, I personally don't see a better way to do it. We are adding a lot of flags which might be used in different parts of CA, and having one big object gives us much more flexibility there. You don't need to think twice about what to pass and where, since you have a context object that can be accessed everywhere. And as long as we have this object, I think it is better to use it than to avoid it.
On second thought, if a flag impacts only one particular area of CA, it might be worth injecting it into that area, and if a flag impacts several CA parts, we can put it in AutoscalingContext. If that is what was meant in that discussion, I fully agree.
This is just a comment to hear your opinion; maybe I am missing something.
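For illustration only, a minimal sketch of what construction-time injection of the flag could look like; the option struct, field, and constructor names below are invented for this example and are not the orchestrator's actual API, which still wires most dependencies via Initialize:

```go
// Hypothetical sketch of injecting the flag at construction time instead of
// reading it from AutoscalingContext. All names here are illustrative.
type ScaleUpOrchestratorOptions struct {
	// SimulateSkippedNodeGroups enables the extra predicate-check pass for
	// node groups that were skipped (e.g. because of backoff).
	SimulateSkippedNodeGroups bool
}

type ScaleUpOrchestrator struct {
	opts ScaleUpOrchestratorOptions
	// ...other dependencies would still be wired later via Initialize...
}

func NewScaleUpOrchestrator(opts ScaleUpOrchestratorOptions) *ScaleUpOrchestrator {
	return &ScaleUpOrchestrator{opts: opts}
}
```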
// This code here runs a simulation to see which pods can be scheduled on which node groups.
for _, nodeGroup := range validNodeGroups {
	schedulablePodGroups[nodeGroup.Id()] = o.SchedulablePodGroups(podEquivalenceGroups, nodeGroup, nodeInfos[nodeGroup.Id()])
}
Have you considered running the simulations for skipped node groups somewhere around here? I think it could be cleaner: with the current proposal, scheduling simulations get scattered over the orchestrator code, and the logic that was previously responsible only for processing the scale-up status now also does scheduling simulations.
We'd have to be extra careful, though, not to include skipped node groups in bin packing. I haven't investigated it in depth, so feel free to discard this if it's not feasible.
Exactly, here I wanted to do the simulations towards the end of the ScaleUp call, because of the bin packing and also because we need to somehow preserve the default behavior; doing it earlier did not seem feasible.
Another point: SchedulablePodGroups marks the pods as schedulable, so it would require something extra to keep track of the pods that were unschedulable before the "second" simulation as well as after it, and to manage all of that with feature flags.
Also, the simulation we intend to run for the skipped node groups is not a "full" scale-up simulation, but only a predicate check, so I placed it at the end and we run it only for the non-schedulable pod groups.
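As a rough, self-contained illustration of that design choice (a predicate-only check, run at the end, only for pod groups that are still unschedulable), assuming hypothetical stand-in types rather than the actual Cluster Autoscaler API:

```go
package sketch

// PredicateChecker is a hypothetical stand-in for the real predicate-checking
// dependency; it answers whether a pod group's pods would fit the template
// node of a node group.
type PredicateChecker interface {
	FitsNodeGroup(podGroupID, nodeGroupID string) bool
}

// skippedGroupsThatWouldFit sketches the late, predicate-only pass: for one
// pod group that is still unschedulable after the regular simulation, it
// returns the skipped node groups whose template node satisfies the pod's
// predicates. It deliberately does no bin packing and never marks any pod
// as schedulable.
func skippedGroupsThatWouldFit(podGroupID string, skippedNodeGroups []string, checker PredicateChecker) []string {
	var wouldFit []string
	for _, nodeGroupID := range skippedNodeGroups {
		if checker.FitsNodeGroup(podGroupID, nodeGroupID) {
			wouldFit = append(wouldFit, nodeGroupID)
		}
	}
	return wouldFit
}
```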
Extend NoScaleUpInfo reporting by running simulation on skipped NodeGroups.
This change introduces the following changes:
* run SchedulablePodGroups on skipped node groups (NGs) during the ScaleUp simulation to check whether the skipped NGs satisfy the predicates of the podEquivalenceGroups:
  * if a skipped NG satisfies a pod group's predicates, it stays in the SkippedNodeGroups list associated with that pod group's pods;
  * otherwise the NG moves to the RejectedNodeGroups.
* run the SchedulablePodGroups simulation even for the AllOrNothing or ExpansionOptionsFilteredOutReason cases, after marking all pods unschedulable: this gives us a better idea of whether the simulation would have succeeded if some NGs had not been skipped.
* since this change introduces scale-up performance overhead, it is gated behind a feature flag.
This change gives the user a better understanding of why a scale-up failed. If a NG is in backoff but does not satisfy the predicates, the user will know that right away, instead of waiting until the NG becomes available and is considered again.
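For the reporting side of the mechanism described above, here is a simplified, self-contained sketch of how a skipped NG could either stay in SkippedNodeGroups or move to RejectedNodeGroups based on the predicate check, with the feature flag preserving the previous behavior when disabled; the types and function below are stand-ins for illustration, not the PR's actual code:

```go
package sketch

// Reasons and NoScaleUpInfo are simplified stand-ins for the status types.
type Reasons interface{ Reasons() []string }

type NoScaleUpInfo struct {
	RejectedNodeGroups map[string]Reasons
	SkippedNodeGroups  map[string]Reasons
}

// classifySkippedNodeGroups keeps a skipped node group in SkippedNodeGroups
// only if the pod group's predicates are satisfied on it (i.e. it would have
// helped had it not been skipped); otherwise it is reported as rejected.
// When the feature is disabled, every skipped group keeps its original
// reason, preserving the previous behavior.
func classifySkippedNodeGroups(info *NoScaleUpInfo, skipped map[string]Reasons, predicatesPass func(nodeGroupID string) bool, rejectedReason Reasons, featureEnabled bool) {
	for nodeGroupID, skipReason := range skipped {
		if !featureEnabled || predicatesPass(nodeGroupID) {
			info.SkippedNodeGroups[nodeGroupID] = skipReason
		} else {
			info.RejectedNodeGroups[nodeGroupID] = rejectedReason
		}
	}
}
```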
Force-pushed from 015d03d to 5b75b41 (compare)
What type of PR is this?
/kind feature
What this PR does / why we need it:
This change gives the user a better understanding of why a scale-up failed and improves overall observability. If a NG is in backoff but does not satisfy the predicates, the user will know that right away, instead of waiting until the NG becomes available and is considered again.
Which issue(s) this PR fixes:
Fixes #
Special notes for your reviewer:
Does this PR introduce a user-facing change?
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.: