The golangci-lint tool gets stuck for a variety of reasons when
running in Prow CI. Enable verbose output in an attempt to make
debugging easier.
ref: https://golangci-lint.run/contributing/debug/
Since the `strategies` parameter doesn't exist anywhere in the code or docs, I'm removing it from the chart readme as a possible option.
It just makes things more confusing.
When the feature is enabled, each pod with the descheduler.alpha.kubernetes.io/request-evict-only
annotation will have its eviction API error examined for a specific
error code/reason and message. If matched, eviction of such a pod will be interpreted
as initiation of an eviction in background.
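For illustration, a pod opting into this behavior could carry the annotation as in the sketch below; only the annotation key comes from this change, while the pod name and image are placeholders.
```
apiVersion: v1
kind: Pod
metadata:
  name: migratable-workload   # hypothetical name
  annotations:
    # Ask the descheduler to treat a matching eviction API error as an
    # eviction started in background rather than a failure.
    descheduler.alpha.kubernetes.io/request-evict-only: ""
spec:
  containers:
  - name: app
    image: registry.k8s.io/pause:3.9
```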
* add ignoreNonPDBPods option
* take2
* add test
* poddisruptionbudgets are now used by defaultevictor plugin
* add poddisruptionbudgets to rbac
* review comments
* don't use GetPodPodDisruptionBudgets
* review comment, don't hide error
At the time of making this commit, the package `github.com/ghodss/yaml`
is no longer actively maintained.
`sigs.k8s.io/yaml` is a permanent fork of `ghodss/yaml` and is actively
maintained by Kubernetes SIG.
Signed-off-by: Eng Zer Jun <engzerjun@gmail.com>
* skip eviction when pod creation time is below the minPodAge threshold setting
In the default initialization phase of the descheduler, add a new
constraint to not evict pods whose age is below the minPodAge
threshold (see the configuration sketch after this list).
Added value:
- Avoid crazy pod movement when the autoscaler scales up and down.
- Avoid evicting pods when they are warming up.
- Decreases the overall cost of eviction as no pod will be evicted
before doing a significant amount of work.
- Guard against scheduling/descheduling loops in situations where
the descheduler has different node fit logic from the scheduler,
like not considering topology spread constraints.
* Use *time.Duration instead of uint for MinPodAge type
* Remove '(in minutes)' from default evictor configuration table
* make fmt
* Add explicit name for Duration field
* Use Duration.String()
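As a rough sketch, a DefaultEvictor configuration using this setting might look as follows; the `minPodAge` field comes from this change, while the surrounding v1alpha2 profile layout mirrors the examples later in this document and the 5m value is arbitrary.
```
apiVersion: "descheduler/v1alpha2"
kind: "DeschedulerPolicy"
profiles:
  - name: ProfileName
    pluginConfig:
    - name: "DefaultEvictor"
      args:
        # Never evict pods younger than 5 minutes (a Duration string).
        minPodAge: "5m"
    - name: "PodLifeTime"
      args:
        maxPodLifeTimeSeconds: 604800
    plugins:
      deschedule:
        enabled:
          - "PodLifeTime"
```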
Currently, all the plugins are run in a sequence.
No plugin executes evictions in parallel internally.
Yet, there's no guarantee a future plugin (e.g. a custom one)
will not attempt to evict pods in parallel.
When checking whether the node limit is exceeded, the pod eviction
never fails. Thus, the metric reporting a pod that fails
to be evicted due to node limit constraints is never emitted.
The error also allows a plugin to react to other limits getting
exceeded, e.g. the limit on the number of pods evicted per namespace.
Currently, the pod evictor is created during each descheduling cycle
to reset the internal counters and the fake client (in case a dry run is
configured). Instead, create the pod evictor once and reset only what's
needed. Later on, the pod evictor can then be extended with e.g. a cache
keeping track of eviction requests that are still in progress and
require more than a single descheduling cycle to complete.
When we cut a new release of descheduler, we have to update the Go version in multiple places,
which presents an opportunity to miss updating one.
Signed-off-by: Amir Alavi <amiralavi7@gmail.com>
in recent kubernetes 1.30, the code-gen flags were changed: --output-file-base -> --output-file, based on 144141734d#diff-beaa4412ca0edb2451061daa9570ce25858ec41951938fc60f17e2370462ad8e
Signed-off-by: Amir Alavi <amiralavi7@gmail.com>
Update the profiles documentation to reflect that only Deschedule and Balance plugins are
run, and that the order is: first the Deschedule extension points of all profiles, then
the Balance extension points of all profiles.
* Allow the use of existing policy configMap.
* Update charts/descheduler/templates/configmap.yaml
Co-authored-by: Amir Alavi <amiralavi7@gmail.com>
* Remove references to unused variable and update documentation regarding deschedulerPolicy
* Add missing newLine at EOF
* Update charts/descheduler/values.yaml
* remove trailing space
---------
Co-authored-by: Amir Alavi <amiralavi7@gmail.com>
* Check if Pod matches inter-pod anti-affinity of other pod on node as part of NodeFit()
* Add unit tests for checking inter-pod anti-affinity match in NodeFit()
* Export setPodAntiAffinity() helper func to test utils
* Add docs for inter-pod anti-affinity in README
* Refactor logic for inter-pod anti-affinity to use in multiple pkgs
* Move logic for finding match between pods with antiaffinity out of framework to reuse in other pkgs
* Move interpod antiaffinity funcs to pkg/utils/predicates.go
* Add unit tests for inter-pod anti-affinity check
* Test logic in GroupByNodeName
* Test NodeFit() case where pods matches inter-pod anti-affinity
* Test for inter-pod anti-affinity pods match terms, have label selector
* NodeFit inter-pod anti-affinity check returns early if affinity spec not set
* feat: profile name for pods_evicted metric
Support a new label "profile" for the "pods_evicted" metric to allow
understanding which profiles are evicting more pods, enabling better
observability.
* refactor: evictoptions improved observability
Send profile and strategy names for EvictOptions, allowing Evictors to
access observability information
* cleanup: remove unnecessary evictoption reference
* feat: evictoptions for nodeutilization
Explicit usage of options when invoking evictPods from the helper
function from nodeutilization for both highnodeutilization and
lownodeutilization
When the pod's creation timestamp is later than the current time (which
makes no sense in real life, but is possible when testing such a case),
the negative age converts to a huge number when cast to uint; the test
can still pass, but the result doesn't make sense.
* helm: ability to specify security context for pod
* Update charts/descheduler/templates/cronjob.yaml
Co-authored-by: Amir Alavi <amiralavi7@gmail.com>
* Update charts/descheduler/templates/deployment.yaml
Co-authored-by: Amir Alavi <amiralavi7@gmail.com>
---------
Co-authored-by: Amir Alavi <amiralavi7@gmail.com>
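A hedged sketch of the values this commit exposes; the `podSecurityContext` key name and the field values are assumptions, so check the chart's values.yaml for the exact name.
```
# values.yaml (sketch)
podSecurityContext:   # assumed key name
  runAsNonRoot: true
  runAsUser: 1000
  fsGroup: 1000
```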
Pods that don't pass the nodeFit condition currently log an
unsuppressable error message to logs. This changes the log level to info
as it's a normal operating condition.
Signed-off-by: Antoine Deschênes <antoine.deschenes@linux.com>
* Add handling for node eligibility
* Make tests buildable
* Update topologyspreadconstraint.go
* Updated test cases failing
* squashed changes for test case addition
corrected function name
refactored duplicate TopoConstraint check logic
Added more test cases for testing node eligibility scenario
Added 5 test cases for testing scenarios related to node eligibility
* topologySpreadConstraints e2e: `nodeTaintsPolicy` and `nodeAffinityPolicy` constraints
---------
Co-authored-by: Marc Power <marcpow@microsoft.com>
Co-authored-by: nitindagar0 <81955199+nitindagar0@users.noreply.github.com>
* feat: Implement preferredDuringSchedulingIgnoredDuringExecution for RemovePodsViolatingNodeAffinity
Now, the descheduler can detect and evict pods that are not optimally
allocated according to the "preferred..." node affinity. It only evicts
a pod if it can be scheduled on a node that scores higher in terms of
preferred node affinity than the current one.
This can be activated by enabling the RemovePodsViolatingNodeAffinity
plugin and passing "preferredDuringSchedulingIgnoredDuringExecution" in
the args.
For example, imagine we have a pod that prefers nodes with label "key1:
value1" with a weight of 10. If this pod is scheduled on a node that
doesn't have "key1: value1" as label but there's another node that has
this label and where this pod can potentially run, then the descheduler
will evict the pod.
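A minimal sketch of enabling this behavior; the `nodeAffinityType` args key is assumed from the plugin's existing `requiredDuringSchedulingIgnoredDuringExecution` option, and the v1alpha2 profile layout follows the examples later in this document.
```
apiVersion: "descheduler/v1alpha2"
kind: "DeschedulerPolicy"
profiles:
  - name: ProfileName
    pluginConfig:
    - name: "RemovePodsViolatingNodeAffinity"
      args:
        nodeAffinityType:
        - "preferredDuringSchedulingIgnoredDuringExecution"
    plugins:
      deschedule:
        enabled:
          - "RemovePodsViolatingNodeAffinity"
```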
Another effect of this commit is that the
RemovePodsViolatingNodeAffinity plugin will not remove pods that do not
fit on the current node for reasons other than violating the node
affinity. Before that, enabling this plugin could cause evictions of
pods that were running on tainted nodes without the necessary
tolerations.
This commit also fixes the wording of some tests from
node_affinity_test.go and some parameters and expectations of these
tests, which were wrong.
* Optimization on RemovePodsViolatingNodeAffinity
Before checking if a pod can be evicted or if it can be scheduled
somewhere else, we first check if it has the corresponding nodeAffinity
field defined. Otherwise, the pod is automatically discarded as a
candidate.
Apart from that, the method that calculates the weight that a pod
gives to a node based on its preferred node affinity has been
renamed to better reflect what it does.
1. Enable OTEL configuration and base framework
2. update generated conversion spec
3. enable docker based conversion and deep copy generate
4. fix broken unit tests
* use pod informers for listing pods in removepodsviolatingtopologyspreadconstraint and removepodsviolatinginterpodantiaffinity
Signed-off-by: Amir Alavi <amiralavi7@gmail.com>
* workaround in topologyspreadconstraint test to ensure that informer's index returns pods sorted by name
---------
Signed-off-by: Amir Alavi <amiralavi7@gmail.com>
Huge clusters with thousands of pods can quickly exceed the default
watch channel size of the fake clientset, causing the channel
to panic with "channel full".
* pod anti-affinity check among nodes
* avoid pod equality check with UID field
also add node equality check with Name for short-cut
* add test case for anti-affinity violation among different node
* reduce ListPodsOnANode call
* fix old code
* apply gofumpt -w -extra
move klog/v2 import entry to bottom according to master code
* fix plugin arg conversion when using multiple profiles with same plugin
Signed-off-by: Amir Alavi <amiralavi7@gmail.com>
* PR feedback to refactor validateDeschedulerConfiguration error handling
---------
Signed-off-by: Amir Alavi <amiralavi7@gmail.com>
* update helm chart to v0.27.0
* update manifest version and docs
* fix 1.27 release version from README.md
Co-authored-by: Mike Dame <mikedame@google.com>
---------
Co-authored-by: Mike Dame <mikedame@google.com>
The Evict extension point is not currently in use.
All DefaultEvictor plugin functionality is exposed through Filter and
PreEvictionFilter extension points instead.
Thus, no need to limit the number of evictors enabled.
* bump to k8s 1.27
Signed-off-by: Amir Alavi <amiralavi7@gmail.com>
* bump go version to 1.20.3
* bump k8s version and kine for e2e
---------
Signed-off-by: Amir Alavi <amiralavi7@gmail.com>
- Populate extension points automatically from plugin types
- Make a list of enabled extension points based on a profile
configuration
- Populate filter and pre-eviction filter handles from their
corresponding extension points
* Descheduling profile
* Fake plugin + profile unit testing
* Rename Profile config type into DeschedulerProfile
To avoid resemblance with profileImpl
* First run deschedule, then balance extension points
* Adding descheduler policy API Version option
in helm templates
* Updating comment for deschedulerPolicyAPIVersion
field
* Making v1alpha1 the default api version
* v1alpha2 docs
* remove internal toc (gh has this natively)
* fix typo and newlines
* name plugins with less confusing names
* add type column
* fix kv selector and nodeSelector desc
* group plugin types in a table
* link the deprecated doc
* warning signs
* add v1alpha2 registry based conversion
* test defaults, set our 1st explicit default
* fix typos and dates
* move pluginregistry to its own dir
* remove unused v1alpha2.Namespace type
* move migration code folders, remove switch
* validate internalPolicy a single time
* remove structured logs
* simplify return
* check for nil methods
* properly check before adding default evictor
* add TODO comment
* bump copyright year
* use plugin registry and prepare for conversion
* Register plugins explicitly to a registry
* check interface impl instead of struct var
* setup plugins at top level
* treat plugin type combinations
* pass registry as arg of V1alpha1ToInternal
* move registry yet another level up
* check interface type separately
* Remove log level from Errors
Every error printed via Errors is expected to be important and always
printable.
* Invoke first Deschedule and then Balance extension points (breaking change)
* Separate plugin arg conversion from pluginsMap
* Separate profile population from plugin execution
* Convert strategy params into profiles outside the main descheduling loop
Strategy params are static and do not change in time.
* Bump the internal DeschedulerPolicy to v1alpha2
Drop conversion from v1alpha1 to internal
* add tests to v1alpha1 to internal conversion
* add tests to strategyParamsToPluginArgs params wiring
* in v1alpha1 evictableNamespaces are still Namespaces
* add test passing in all params
Co-authored-by: Lucas Severo Alves <lseveroa@redhat.com>
--help is now a command, which means explicitly providing a command override in Kubernetes is no longer required. You can now simply provide the necessary arguments.
This commit changes the build_info metric labels:
- the AppVersion label will show the major+minor version
(for example 0.24.1), minor version numbers, and the commit hash
Signed-off-by: eminaktas <eminaktas34@gmail.com>
This does the following:
1. Enables RemovePodsHavingTooManyRestarts when using Helm by default (it is not currently)
2. Adds RemovePodsHavingTooManyRestarts to the values.yaml for clearer configs
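A values.yaml sketch of what this enables; the nesting follows the chart's `deschedulerPolicy.strategies` structure, and the threshold numbers are illustrative.
```
deschedulerPolicy:
  strategies:
    RemovePodsHavingTooManyRestarts:
      enabled: true
      params:
        podsHavingTooManyRestarts:
          podRestartThreshold: 100      # illustrative value
          includingInitContainers: true
```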
update go to 1.19 and helm kubernetes cluster to 1.25
bump -rc.0 to 1.25 GA
bump k8s utils library
bump golang-ci
use go 1.19 for helm github action
upgrade kubectl from 0.20 to 0.25
Signed-off-by: Amir Alavi <amiralavi7@gmail.com>
When an error is returned, a strategy either stops completely or starts
processing another node. Given the error can be transient, or
only one of the limits may be exceeded, it is fair to just skip a
pod that failed eviction and proceed to the next one instead.
In order to optimize the processing and stop earlier, it is more
practical to implement a check which will say when a limit was
exceeded.
The method uses the node object only to get the node name.
The node name can be retrieved from the pod object.
Some strategies might try to evict a pod in Pending state which
does not have the .spec.nodeName field set. In that case, skip
the check for the node limit.
Both LowNode and HighNode utilization strategies evict only as many pods
as there's free resources on other nodes. Thus, the resource fit test
is always true by definition.
Add taint exclusion to RemovePodsViolatingNodeTaints. This permits node
taints to be ignored by allowing users to specify ignored taint keys or
ignored taint key=value pairs.
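As a sketch, the exclusion could be configured as below; the `excludedTaints` parameter name matches this change's description, the taint keys/values are placeholders, and the layout follows the v1alpha2 examples later in this document.
```
apiVersion: "descheduler/v1alpha2"
kind: "DeschedulerPolicy"
profiles:
  - name: ProfileName
    pluginConfig:
    - name: "RemovePodsViolatingNodeTaints"
      args:
        excludedTaints:
        - dedicated=special-user  # ignore taints with key "dedicated" and value "special-user"
        - reserved                # ignore all taints with key "reserved"
    plugins:
      deschedule:
        enabled:
          - "RemovePodsViolatingNodeTaints"
```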
Currently, when the descheduler is running with --dry-run on, no strategy actually
evicts a pod, so every strategy always starts with a complete list of
pods. E.g. when the PodLifeTime strategy evicts a few pods, the RemoveDuplicatePods
strategy still takes into account even the pods eliminated by the PodLifeTime
strategy. This does not correspond to real-world scenarios, as the
same pod can be evicted multiple times. Instead, use a fake client and
evict/delete the pods from its cache so the strategies evict each pod
at most once, as would normally happen in a real cluster.
This patch adds a policy (evictFailedBarePods) to allow failed
pods without ownerReferences to be evicted. For backward compatibility,
the policy is disabled by default. Addresses #644.
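A sketch of the opt-in, in the v1alpha1 policy format of that time; only the `evictFailedBarePods` field comes from this change.
```
apiVersion: "descheduler/v1alpha1"
kind: "DeschedulerPolicy"
# Allow failed pods without ownerReferences to be evicted (off by default).
evictFailedBarePods: true
strategies:
  "PodLifeTime":
    enabled: true
```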
calcContainerRestarts sums over containers. The new language makes
that clear, avoiding potential confusion vs. an alternative that looked
for pods where a single container had passed the configured threshold.
For example, with three containers with 50 restarts and a threshold of
100, the actual "sum over containers" logic makes that pod a candidate
for descheduling, but the "largest single container restart count"
hypothetical would not have made it a candidate.
Also shifts labelSelector into the parameter table, because when it
was added in 29ade13ce7 (README and e2e-testcase add for
labelSelector, 2021-03-02, #510), it landed a few lines too high.
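For example, a configuration matching the sum-over-containers discussion above might look like this sketch; `podRestartThreshold` and `includingInitContainers` are the parameter names documented for this strategy, and the layout follows the v1alpha2 examples later in this document.
```
apiVersion: "descheduler/v1alpha2"
kind: "DeschedulerPolicy"
profiles:
  - name: ProfileName
    pluginConfig:
    - name: "RemovePodsHavingTooManyRestarts"
      args:
        # Restarts are summed across all containers in the pod, so three
        # containers with 50 restarts each (150 total) exceed this threshold.
        podRestartThreshold: 100
        includingInitContainers: true
    plugins:
      deschedule:
        enabled:
          - "RemovePodsHavingTooManyRestarts"
```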
RemoveDuplicates: take node taints, node affinity and node selector into account when computing the number of feasible nodes for the average occurrence of pods per node
Nodes with taints which are not tolerated by evicted pods will never run the
pods. The same holds for node affinity and node selector.
So increase the number of pods per feasible nodes to decrease the
number of evicted pods.
Always use structured logging. Therefore update klog.Errorf() to instead
use klog.ErrorS().
Here is an example of the new log message.
E0428 23:58:57.048912 586 descheduler.go:145] "skipping strategy" err="unknown strategy name" strategy=ASDFPodLifeTime
The master branch always represents the next release of the
descheduler. Therefore applying the descheduler k8s manifests
from the master branch is not considered stable. It is best for
users to install descheduler using the released tags.
Similar to ReplicaSet, ReplicationController, and Job pods, pods with a
StatefulSet metadata.ownerReference are considered for eviction.
Document this, so that it is clear to end users.
The resources.cpuRequest and resources.memoryRequest variables are
not valid in the helm chart values.yaml file. The correct variable
name for setting the requests and limits is resources.
Also, fixed white space alignment in the markdown table.
New metrics:
- build_info: Build info about descheduler, including Go version, Descheduler version, Git SHA, Git branch
- pods_evicted: Number of successfully evicted pods, by the result, by the strategy, by the namespace
For roughly the past year damemi has been the only active approver for
the descheduler. Therefore move the inactive approvers to emeritus
status. This will help clarify to contributors who should be assigned to
pull requests.
Previous to this change official descheduler container images only
supported the AMD64 hardware architecture. This change enables
building official descheduler container images for multiple
architectures.
The initially supported architectures are AMD64 and ARM64.
Prior to this commit the helm chart used to install the descheduler
CronJob did not set container requests or limits. This is considered
an anti-pattern when deploying applications on k8s.
Set descheduler container resources to make it a burstable pod. This
will ensure a high quality experience for end users when deploying
descheduler into their clusters using the instructions from the README.
The default values chosen for CPU/Memory are not based on any real data.
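For reference, a burstable resources block in values.yaml might look like the sketch below; the numbers are illustrative, as the text above notes.
```
resources:
  requests:
    cpu: 500m      # requests without equal limits yield the Burstable QoS class
    memory: 256Mi
```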
Prior to this change the output from the command "descheduler version"
when run using the official container images from k8s.gcr.io would
always output an empty string. See below for an example.
```
docker run k8s.gcr.io/descheduler/descheduler:v0.19.0 /bin/descheduler version
Descheduler version {Major: Minor: GitCommit: GitVersion: BuildDate:2020-09-01T16:43:23+0000 GoVersion:go1.15 Compiler:gc Platform:linux/amd64}
```
This change makes it possible to pass the descheduler version
information to the automated container image build process and
also makes it work for local builds too.
Prior to this commit the YAML manifests used to install the descheduler
Job and CronJob did not set container requests or limits. This is
considered an anti-pattern when deploying applications on k8s.
Set descheduler container resources to make it a burstable pod. This
will ensure a high quality experience for end users when deploying
descheduler into their clusters using the instructions from the README.
The values chosen for CPU/Memory are not based on any real data.
This adds a strategy to balance pod topology domains based on the scheduler's
PodTopologySpread constraints. It attempts to find the minimum number of pods
that should be sent for eviction by comparing the largest domains in a topology
with the smallest domains in that topology.
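A sketch of enabling the strategy in the v1alpha1 policy format of that era, assuming it is registered under the name `RemovePodsViolatingTopologySpreadConstraint`:
```
apiVersion: "descheduler/v1alpha1"
kind: "DeschedulerPolicy"
strategies:
  "RemovePodsViolatingTopologySpreadConstraint":
    enabled: true
```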
This commit adds restrictive PodSecurityPolicy, which can be
optionally created, so descheduler can be deployed on clusters with
PodSecurityPolicy admission controller, but which do not ship default
policies.
Signed-off-by: Mateusz Gozdek <mgozdekof@gmail.com>
The k8s.io/klog/v2 package does not currently support structured logging
for warning level log messages. Therefore update the one call in the
code base using klog.Warningf to instead use klog.InfoS.
While non-evictable pods should never be evicted, they should still be
considered when calculating PodAntiAffinity violations. For example, you
may have an evictable pod that should not be running next to a system-critical
static pod. We currently filter IsEvictable before checking for Affinity violations,
so this case would not be caught.
Updated the README with the link to the official descheduler helm chart
on https://hub.helm.sh. This makes it easier for end users to install the
descheduler using helm.
The k8s project recently cut over to the new official k8s.gcr.io
container registry. The descheduler image can now be pulled from
k8s.gcr.io. This is just a new DNS name for the container registry. The
previously documented DNS names for the registry still work, but
require more typing.
Basing this action on push to `chart-*` tags doesn't work: the action itself
creates the new release tag, so trying to push the changes to a tag ends up
with the releaser comparing the changes to themself (and failing).
This also renames the chart from "descheduler" to "descheduler-helm-chart", to
avoid confusion with automated releases.
This moves the kind setup (previously used by Travis) to the e2e runner script
to accommodate the switch to Prow. This provides a KIND_E2E env var to specify
whether to run the tests in kind, or (by default) to run locally.
The kubernetes project has been updated to use Go 1.14.4. See below pull
request.
https://github.com/kubernetes/kubernetes/pull/88638
After making the updates to Go 1.14 "make gen" no longer worked. The
file hack/tools.go had to be created to get "make gen" working with Go
1.14.
Initially users can open a bug report, feature request, or misc request.
Bug reports and feature requests will have the "kind" label
automatically added.
Prior to this change the event created for every pod eviction was
identical. Instead leverage the newly added eviction reason when
creating k8s events. This makes it easier for end users to understand
why the descheduler evicted a pod when inspecting k8s events.
This is a very minor refactor to use a var declaration for the reason
variable. A var declaration is being used because the zero value for
strings is an empty string.
https://golang.org/ref/spec#The_zero_value
This is a minor refactor of PodEvictor to include evictLocalStoragePods as a
property (so that it doesn't need to be passed to each strategy function.) It
also removes ListEvictablePodsOnNode and extends ListPodsOnANode with an optional
"filter" function parameter, so that for example IsEvictable can be passed to it
and achieve the same results as the function formerly known as ListEvictablePodsOnNode.
This approach also now allows strategies to more explicitly extend their criteria of what
IsEvictable considers an evictable pod (such as NodeAffinity, which checks that the pod
can fit on any other node).
1. Set the default CPU/Mem/Pods percentage thresholds to 100
2. Stop evicting pods if any resource runs out
3. Add a threshold verification method and limit resource percentages within [0, 100]
4. Change test cases and README
This newly documented URL can be used to view the descheduler staging
registry in a web browser. This is easier to browse if the gcloud
command is not available.
The matrix has been updated with the soon-to-be-released v0.18
details. Also, clarified the descheduler and k8s version compatibility
requirements and recommendations.
$(CONTAINER_ENGINE) run --entrypoint make -it -v $(CURRENT_DIR):/go/src/sigs.k8s.io/descheduler -w /go/src/sigs.k8s.io/descheduler golang:$(GO_VERSION) gen
description: Descheduler for Kubernetes is used to rebalance clusters by evicting pods that can potentially be scheduled on better nodes. In the current implementation, descheduler does not schedule replacement of evicted pods but relies on the default scheduler for that.
This chart bootstraps a [descheduler](https://github.com/kubernetes-sigs/descheduler/) cron job on a [Kubernetes](http://kubernetes.io) cluster using the [Helm](https://helm.sh) package manager.
## Prerequisites
To install the chart with the release name `my-release`:
The command deploys _descheduler_ on the Kubernetes cluster in the default configuration. The [configuration](#configuration) section lists the parameters that can be configured during installation.
The command removes all the Kubernetes components associated with the chart and deletes the release.
The following table lists the configurable parameters of the _descheduler_ chart and their default values.

| Parameter | Description | Default |
|-----------|-------------|---------|
| `namespaceOverride` | Override the deployment namespace; defaults to .Release.Namespace | `""` |
| `cronJobApiVersion` | CronJob API Group Version | `"batch/v1"` |
| `schedule` | The cron schedule to run the _descheduler_ job on | `"*/2 * * * *"` |
| `startingDeadlineSeconds` | If set, configure `startingDeadlineSeconds` for the _descheduler_ job | `nil` |
| `timeZone` | configure `timeZone` for CronJob | `nil` |
| `successfulJobsHistoryLimit` | If set, configure `successfulJobsHistoryLimit` for the _descheduler_ job | `3` |
| `failedJobsHistoryLimit` | If set, configure `failedJobsHistoryLimit` for the _descheduler_ job | `1` |
| `ttlSecondsAfterFinished` | If set, configure `ttlSecondsAfterFinished` for the _descheduler_ job | `nil` |
| `deschedulingInterval` | If using kind:Deployment, sets time between consecutive descheduler executions. | `5m` |
| `replicas` | The replica count for Deployment | `1` |
| `leaderElection` | The options for high availability when running replicated components | _see values.yaml_ |
| `cmdOptions` | The options to pass to the _descheduler_ command | _see values.yaml_ |
| `priorityClassName` | The name of the priority class to add to pods | `system-cluster-critical` |
| `rbac.create` | If `true`, create & use RBAC resources | `true` |
| `resources` | Descheduler container CPU and memory requests/limits | _see values.yaml_ |
| `serviceAccount.create` | If `true`, create a service account for the cron job | `true` |
| `serviceAccount.name` | The name of the service account to use, if not set and create is true a name is generated using the fullname template | `nil` |
| `serviceAccount.annotations` | Specifies custom annotations for the serviceAccount | `{}` |
| `podAnnotations` | Annotations to add to the descheduler Pods | `{}` |
| `podLabels` | Labels to add to the descheduler Pods | `{}` |
| `nodeSelector` | Node selectors to run the descheduler cronjob/deployment on specific nodes | `nil` |
| `service.enabled` | If `true`, create a service for deployment | `false` |
| `serviceMonitor.enabled` | If `true`, create a ServiceMonitor for deployment | `false` |
| `serviceMonitor.namespace` | The namespace where Prometheus expects to find service monitors | `nil` |
| `serviceMonitor.additionalLabels` | Add custom labels to the ServiceMonitor resource | `{}` |
| `serviceMonitor.interval` | The scrape interval. If not set, the Prometheus default scrape interval is used | `nil` |
| `serviceMonitor.honorLabels` | Keeps the scraped data's labels when they collide with target labels | `true` |
WARNING: You set replica count greater than 1 and workload kind as Deployment, however leaderElection is not enabled. Consider enabling Leader Election for HA mode.
{{- end}}
{{- if .Values.leaderElection }}
{{- if and (hasKey .Values.cmdOptions "dry-run") (eq (get .Values.cmdOptions "dry-run") true) }}
WARNING: You enabled DryRun mode, you can't use Leader Election.
fs.DurationVar(&rs.DeschedulingInterval, "descheduling-interval", rs.DeschedulingInterval, "Time interval between two consecutive descheduler executions. Setting this value instructs the descheduler to run in a continuous loop at the interval specified.")
fs.StringVar(&rs.KubeconfigFile, "kubeconfig", rs.KubeconfigFile, "File with kube configuration.")
fs.StringVar(&rs.ClientConnection.Kubeconfig, "kubeconfig", rs.ClientConnection.Kubeconfig, "File with kube configuration. Deprecated, use client-connection-kubeconfig instead.")
fs.StringVar(&rs.ClientConnection.Kubeconfig, "client-connection-kubeconfig", rs.ClientConnection.Kubeconfig, "File path to kube configuration for interacting with kubernetes apiserver.")
fs.Float32Var(&rs.ClientConnection.QPS, "client-connection-qps", rs.ClientConnection.QPS, "QPS to use for interacting with kubernetes apiserver.")
fs.Int32Var(&rs.ClientConnection.Burst, "client-connection-burst", rs.ClientConnection.Burst, "Burst to use for interacting with kubernetes apiserver.")
fs.StringVar(&rs.PolicyConfigFile, "policy-config-file", rs.PolicyConfigFile, "File with descheduler policy configuration.")
fs.BoolVar(&rs.DryRun, "dry-run", rs.DryRun, "execute descheduler in dry run mode.")
// node-selector query causes descheduler to run only on nodes that matches the node labels in the query
fs.StringVar(&rs.NodeSelector, "node-selector", rs.NodeSelector, "Selector (label query) to filter on, supports '=', '==', and '!='. (e.g. -l key1=value1,key2=value2)")
// max-no-pods-to-evict limits the maximum number of pods to be evicted per node by descheduler.
fs.IntVar(&rs.MaxNoOfPodsToEvictPerNode, "max-pods-to-evict-per-node", rs.MaxNoOfPodsToEvictPerNode, "Limits the maximum number of pods to be evicted per node by descheduler")
// evict-local-storage-pods allows eviction of pods that are using local storage. This is false by default.
fs.BoolVar(&rs.EvictLocalStoragePods, "evict-local-storage-pods", rs.EvictLocalStoragePods, "Enables evicting pods using local storage by descheduler")
fs.BoolVar(&rs.DryRun, "dry-run", rs.DryRun, "Execute descheduler in dry run mode.")
fs.BoolVar(&rs.DisableMetrics, "disable-metrics", rs.DisableMetrics, "Disables metrics. The metrics are by default served through https://localhost:10258/metrics. Secure address, resp. port can be changed through --bind-address, resp. --secure-port flags.")
fs.StringVar(&rs.Tracing.CollectorEndpoint, "otel-collector-endpoint", "", "Set this flag to the OpenTelemetry Collector Service Address")
fs.StringVar(&rs.Tracing.TransportCert, "otel-transport-ca-cert", "", "Path of the CA Cert that can be used to generate the client Certificate for establishing secure connection to the OTEL in gRPC mode")
fs.StringVar(&rs.Tracing.ServiceName, "otel-service-name", tracing.DefaultServiceName, "OTEL Trace name to be used with the resources")
fs.StringVar(&rs.Tracing.ServiceNamespace, "otel-trace-namespace", "", "OTEL Trace namespace to be used with the resources")
fs.Float64Var(&rs.Tracing.SampleRate, "otel-sample-rate", 1.0, "Sample rate to collect the Traces")
fs.BoolVar(&rs.Tracing.FallbackToNoOpProviderOnError, "otel-fallback-no-op-on-error", false, "Fallback to NoOp Tracer in case of error")
fs.BoolVar(&rs.EnableHTTP2, "enable-http2", false, "If http/2 should be enabled for the metrics and health check")
fs.Var(cliflag.NewMapStringBool(&rs.FeatureGates), "feature-gates", "A set of key=value pairs that describe feature gates for alpha/experimental features. "+
The descheduler evicts pods which may be bound to less desired nodes
```
descheduler [flags]
```
### Options
```
--bind-address ip The IP address on which to listen for the --secure-port port. The associated interface(s) must be reachable by the rest of the cluster, and by CLI/web clients. If blank or an unspecified address (0.0.0.0 or ::), all interfaces and IP address families will be used. (default 0.0.0.0)
--cert-dir string The directory where the TLS certs are located. If --tls-cert-file and --tls-private-key-file are provided, this flag will be ignored. (default "apiserver.local.config/certificates")
--client-connection-burst int32 Burst to use for interacting with kubernetes apiserver.
--client-connection-kubeconfig string File path to kube configuration for interacting with kubernetes apiserver.
--client-connection-qps float32 QPS to use for interacting with kubernetes apiserver.
--descheduling-interval duration Time interval between two consecutive descheduler executions. Setting this value instructs the descheduler to run in a continuous loop at the interval specified.
--disable-http2-serving If true, HTTP2 serving will be disabled [default=false]
--disable-metrics Disables metrics. The metrics are by default served through https://localhost:10258/metrics. Secure address, resp. port can be changed through --bind-address, resp. --secure-port flags.
--dry-run Execute descheduler in dry run mode.
--enable-http2 If http/2 should be enabled for the metrics and health check
--feature-gates mapStringBool A set of key=value pairs that describe feature gates for alpha/experimental features. Options are:
--http2-max-streams-per-connection int The limit that the server gives to clients for the maximum number of streams in an HTTP/2 connection. Zero means to use golang's default.
--kubeconfig string File with kube configuration. Deprecated, use client-connection-kubeconfig instead.
--leader-elect Start a leader election client and gain leadership before executing the main loop. Enable this when running replicated components for high availability.
--leader-elect-lease-duration duration The duration that non-leader candidates will wait after observing a leadership renewal until attempting to acquire leadership of a led but unrenewed leader slot. This is effectively the maximum duration that a leader can be stopped before it is replaced by another candidate. This is only applicable if leader election is enabled. (default 2m17s)
--leader-elect-renew-deadline duration The interval between attempts by the acting master to renew a leadership slot before it stops leading. This must be less than the lease duration. This is only applicable if leader election is enabled. (default 1m47s)
--leader-elect-resource-lock string The type of resource object that is used for locking during leader election. Supported options are 'leases'. (default "leases")
--leader-elect-resource-name string The name of resource object that is used for locking during leader election. (default "descheduler")
--leader-elect-resource-namespace string The namespace of resource object that is used for locking during leader election. (default "kube-system")
--leader-elect-retry-period duration The duration the clients should wait between attempting acquisition and renewal of a leadership. This is only applicable if leader election is enabled. (default 26s)
--log-flush-frequency duration Maximum number of seconds between log flushes (default 5s)
--log-json-info-buffer-size quantity [Alpha] In JSON format with split output streams, the info messages can be buffered for a while to increase performance. The default value of zero bytes disables buffering. The size can be specified as number of bytes (512), multiples of 1000 (1K), multiples of 1024 (2Ki), or powers of those (3M, 4G, 5Mi, 6Gi). Enable the LoggingAlphaOptions feature gate to use this.
--log-json-split-stream [Alpha] In JSON format, write error messages to stderr and info messages to stdout. The default is to write a single stream to stdout. Enable the LoggingAlphaOptions feature gate to use this.
--log-text-info-buffer-size quantity [Alpha] In text format with split output streams, the info messages can be buffered for a while to increase performance. The default value of zero bytes disables buffering. The size can be specified as number of bytes (512), multiples of 1000 (1K), multiples of 1024 (2Ki), or powers of those (3M, 4G, 5Mi, 6Gi). Enable the LoggingAlphaOptions feature gate to use this.
--log-text-split-stream [Alpha] In text format, write error messages to stderr and info messages to stdout. The default is to write a single stream to stdout. Enable the LoggingAlphaOptions feature gate to use this.
--logging-format string Sets the log format. Permitted formats: "json" (gated by LoggingBetaOptions), "text". (default "text")
--otel-collector-endpoint string Set this flag to the OpenTelemetry Collector Service Address
--otel-fallback-no-op-on-error Fallback to NoOp Tracer in case of error
--otel-sample-rate float Sample rate to collect the Traces (default 1)
--otel-service-name string OTEL Trace name to be used with the resources (default "descheduler")
--otel-trace-namespace string OTEL Trace namespace to be used with the resources
--otel-transport-ca-cert string Path of the CA Cert that can be used to generate the client Certificate for establishing secure connection to the OTEL in gRPC mode
--permit-address-sharing If true, SO_REUSEADDR will be used when binding the port. This allows binding to wildcard IPs like 0.0.0.0 and specific IPs in parallel, and it avoids waiting for the kernel to release sockets in TIME_WAIT state. [default=false]
--permit-port-sharing If true, SO_REUSEPORT will be used when binding the port, which allows more than one instance to bind on the same address and port. [default=false]
--policy-config-file string File with descheduler policy configuration.
--secure-port int The port on which to serve HTTPS with authentication and authorization. If 0, don't serve HTTPS at all. (default 10258)
--tls-cert-file string File containing the default x509 Certificate for HTTPS. (CA cert, if any, concatenated after server cert). If HTTPS serving is enabled, and --tls-cert-file and --tls-private-key-file are not provided, a self-signed certificate and key are generated for the public address and saved to the directory specified by --cert-dir.
--tls-cipher-suites strings Comma-separated list of cipher suites for the server. If omitted, the default Go cipher suites will be used.
--tls-sni-cert-key namedCertKey A pair of x509 certificate and private key file paths, optionally suffixed with a list of domain patterns which are fully qualified domain names, possibly with prefixed wildcard segments. The domain patterns also allow IP addresses, but IPs should only be used if the apiserver has visibility to the IP address requested by a client. If no domain patterns are provided, the names of the certificate are extracted. Non-wildcard matches trump over wildcard matches, explicit domain patterns trump over extracted names. For multiple key/certificate pairs, use the --tls-sni-cert-key multiple times. Examples: "example.crt,example.key" or "foo.crt,foo.key:*.foo.com,foo.com". (default [])
-v, --v Level number for the log level verbosity
--vmodule pattern=N,... comma-separated list of pattern=N settings for file-filtered logging (only works for text log format)
```
### SEE ALSO
* [descheduler version](descheduler_version.md) - Version of descheduler
After making changes in the code base, ensure that the code is formatted correctly:
```
make fmt
```
## Build Helm Package locally
If you made some changes in the chart, and just want to check if templating is ok, or if the chart is buildable, you can run this command to have a package built from the `./charts` directory.
```
make build-helm
```
## Lint Helm Chart locally
To check linting of your changes in the helm chart locally you can run:
```
make lint-chart
```
## Test helm changes locally with kind and ct
You will need kind and docker (or equivalent) installed. We can use the public ct image to avoid installing ct and all its dependencies.
```
make kind-multi-node
make ct-helm
```
### Miscellaneous
See the [hack directory](https://github.com/kubernetes-sigs/descheduler/tree/master/hack) for additional tools and scripts used for developing the descheduler.
The process for publishing each Descheduler release includes a mixture of manual and automatic steps. Over
time, it would be good to automate as much of this process as possible. However, due to current limitations,
care must be taken to perform each manual step precisely so that the automated steps execute properly.
## Pre-release Code Changes
Before publishing each release, the following code updates must be made:
- [ ] (Optional, but recommended) Bump `k8s.io` dependencies to the `-rc` tags. These tags are usually published around upstream code freeze. [Example](https://github.com/kubernetes-sigs/descheduler/pull/539)
- [ ] Bump `k8s.io` dependencies to GA tags once they are published (following the upstream release). [Example](https://github.com/kubernetes-sigs/descheduler/pull/615)
- [ ] Ensure that Go is updated to the same version as upstream. [Example](https://github.com/kubernetes-sigs/descheduler/pull/801)
- [ ] Make CI changes in [github.com/kubernetes/test-infra](https://github.com/kubernetes/test-infra) to add the new version's tests (note, this may also include a Go bump). [Example](https://github.com/kubernetes/test-infra/pull/25833)
- [ ] Update local CI versions for utils (such as golang-ci), kind, and go. [Example - e2e](https://github.com/kubernetes-sigs/descheduler/commit/ac4d576df8831c0c399ee8fff1e85469e90b8c44), [Example - helm](https://github.com/kubernetes-sigs/descheduler/pull/821)
- [ ] Update version references in docs and Readme. [Example](https://github.com/kubernetes-sigs/descheduler/pull/617)
## Release Process
When the above pre-release steps are complete and the release is ready to be cut, perform the following steps **in order**
(the flowchart below demonstrates these steps):
**Version release**
1. Create the `git tag` on `master` for the release, eg `v0.24.0`
2. Merge Helm chart version update to `master` (see [Helm chart](#helm-chart) below). [Example](https://github.com/kubernetes-sigs/descheduler/pull/709)
3. Perform the [image promotion process](https://github.com/kubernetes/k8s.io/tree/main/k8s.gcr.io#image-promoter). [Example](https://github.com/kubernetes/k8s.io/pull/3344)
4. Cut release branch from `master`, eg `release-1.24`
5. Publish release using Github's release process from the git tag you created
6. Email `kubernetes-sig-scheduling@googlegroups.com` to announce the release
**Patch release**
1. Pick relevant code change commits to the matching release branch, eg `release-1.24`
2. Create the patch tag on the release branch, eg `v0.24.1` on `release-1.24`
3. Merge Helm chart version update to release branch
4. Perform the image promotion process for the patch version
5. Publish release using Github's release process from the git tag you created
6. Email `kubernetes-sig-scheduling@googlegroups.com` to announce the release
### Flowchart

### Image promotion process
Every merge to any branch triggers an [image build and push](https://github.com/kubernetes/test-infra/blob/c36b8e5/config/jobs/image-pushing/k8s-staging-descheduler.yaml) to a `gcr.io` repository.
These automated image builds are snapshots of the code in place at the time of every PR merge and
tagged with the latest git SHA at the time of the build. To create a final release image, the desired
auto-built image SHA is added to a [file upstream](https://github.com/kubernetes/k8s.io/blob/e9e971c/k8s.gcr.io/images/k8s-staging-descheduler/images.yaml) which
copies that image to a public registry.
Automatic builds can be monitored and re-triggered with the [`post-descheduler-push-images` job](https://prow.k8s.io/?job=post-descheduler-push-images) on prow.k8s.io.
Note that images can also be manually built and pushed using `VERSION=$VERSION make push-all` by [users with access](https://github.com/kubernetes/k8s.io/blob/fbee8f67b70304241e613a672c625ad972998ad7/groups/sig-scheduling/groups.yaml#L33-L43).
## Helm Chart
We currently use the [chart-releaser-action GitHub Action](https://github.com/helm/chart-releaser-action) to automatically release the Helm chart.
This action is triggered when it detects any changes to [`Chart.yaml`](https://github.com/kubernetes-sigs/descheduler/blob/022e07c27853fade6d1304adc0a6ebe02642386c/charts/descheduler/Chart.yaml) on
a `release-*` branch.
Helm chart releases are managed by a separate set of git tags that are prefixed with `descheduler-helm-chart-*`. Example git tag name is `descheduler-helm-chart-0.18.0`.
Released versions of the helm charts are stored in the `gh-pages` branch of this repo.
The major and minor version of the chart matches the descheduler major and minor versions. For example, descheduler helm chart version descheduler-helm-chart-0.18.0 corresponds
to descheduler version v0.18.0. The patch version of the descheduler helm chart and the patch version of the descheduler will not necessarily match. The patch
version of the descheduler helm chart is used to version changes specific to the helm chart.
1. Merge all helm chart changes into the master branch before the release is tagged/cut
1. Ensure that `appVersion` in file `charts/descheduler/Chart.yaml` matches the descheduler version (no `v` prefix)
2. Ensure that `version` in file `charts/descheduler/Chart.yaml` has been incremented. This is the chart version.
2. Make sure your repo is clean by git's standards
3. Follow the release-branch or patch release tagging pattern from the above section.
4. Verify the new helm artifact has been successfully pushed to the `gh-pages` branch
## Notes
The Helm releaser-action compares the changes in the action-triggering branch to the latest tag on that branch, so if you tag before creating the new branch there
will be nothing to compare and it will fail. This is why it's necessary to tag, e.g., `v0.24.0` *before* making the changes to the
Helm chart version, so that there is a new diff for the action to find. (Tagging *after* making the Helm chart changes would
also work, but then the code that gets built into the promoted image will be tagged as `descheduler-helm-chart-xxx` rather than `v0.xx.0`).
### Notes
See [post-descheduler-push-images dashboard](https://testgrid.k8s.io/sig-scheduling#post-descheduler-push-images) for staging registry image build job status.
View the descheduler staging registry using [this URL](https://console.cloud.google.com/gcr/images/k8s-staging-descheduler/GLOBAL/descheduler) in a web browser
or use the below `gcloud` commands.
List images in staging registry.
```
gcloud container images list --repository gcr.io/k8s-staging-descheduler
```
Pull image from the staging registry.
The [examples](https://github.com/kubernetes-sigs/descheduler/tree/master/examples) directory has descheduler policy configuration examples.
## CLI Options
The descheduler has many CLI options that can be used to override its default behavior.
```
descheduler --help
The descheduler evicts pods which may be bound to less desired nodes
Usage:
descheduler [flags]
descheduler [command]
Available Commands:
help Help about any command
version Version of descheduler
Flags:
--add-dir-header If true, adds the file directory to the header
--alsologtostderr log to standard error as well as files
--descheduling-interval duration Time interval between two consecutive descheduler executions. Setting this value instructs the descheduler to run in a continuous loop at the interval specified.
--dry-run execute descheduler in dry run mode.
--evict-local-storage-pods Enables evicting pods using local storage by descheduler
-h, --help help for descheduler
--kubeconfig string File with kube configuration.
--log-backtrace-at traceLocation when logging hits line file:N, emit a stack trace (default :0)
--log-dir string If non-empty, write log files in this directory
--log-file string If non-empty, use this log file
--log-file-max-size uint Defines the maximum size a log file can grow to. Unit is megabytes. If the value is 0, the maximum file size is unlimited. (default 1800)
--log-flush-frequency duration Maximum number of seconds between log flushes (default 5s)
--logtostderr log to standard error instead of files (default true)
--max-pods-to-evict-per-node int Limits the maximum number of pods to be evicted per node by descheduler
--node-selector string Selector (label query) to filter on, supports '=', '==', and '!='.(e.g. -l key1=value1,key2=value2)
--policy-config-file string File with descheduler policy configuration.
--skip-headers If true, avoid header prefixes in the log messages
--skip-log-headers If true, avoid headers when opening log files
--stderrthreshold severity logs at or above this threshold go to stderr (default 2)
-v, --v Level number for the log level verbosity
--vmodule moduleSpec comma-separated list of pattern=N settings for file-filtered logging
Use "descheduler [command] --help" for more information about a command.
```
The descheduler has many CLI options that can be used to override its default behavior. Please check the [CLI Options](./cli/descheduler.md) documentation for details.
## Production Use Cases
This section contains descriptions of real world production use cases.
This policy configuration file ensures that pods created more than 7 days ago are evicted.
```
---
apiVersion: "descheduler/v1alpha1"
apiVersion: "descheduler/v1alpha2"
kind: "DeschedulerPolicy"
profiles:
  - name: ProfileName
    pluginConfig:
    - name: "PodLifeTime"
      args:
        maxPodLifeTimeSeconds: 604800 # pods run for a maximum of 7 days
    plugins:
      deschedule:
        enabled:
          - "PodLifeTime"
```
### Balance Cluster By Node Memory Utilization
If your cluster has been running for a long period of time, you may find that the resource utilization is not very
balanced. The following two strategies can be used to rebalance your cluster based on `cpu`, `memory`
or `number of pods`.
#### Balance high utilization nodes
Using `LowNodeUtilization`, descheduler will rebalance the cluster based on memory by evicting pods
from nodes with memory utilization over 70% to nodes with memory utilization below 20%.
```
apiVersion: "descheduler/v1alpha2"
kind: "DeschedulerPolicy"
profiles:
  - name: ProfileName
    pluginConfig:
    - name: "LowNodeUtilization"
      args:
        thresholds:
          "memory": 20
        targetThresholds:
          "memory": 70
    plugins:
      balance:
        enabled:
          - "LowNodeUtilization"
```
#### Balance low utilization nodes
Using `HighNodeUtilization`, descheduler will rebalance the cluster based on memory by evicting pods
from nodes with memory utilization lower than 20%. This should be used together with the `NodeResourcesFit` scheduler plugin and its `MostAllocated` scoring strategy, per these [docs](https://kubernetes.io/docs/reference/scheduling/config/#scheduling-plugins).
The evicted pods will be compacted into a minimal set of nodes.
```
apiVersion: "descheduler/v1alpha2"
kind: "DeschedulerPolicy"
profiles:
  - name: ProfileName
    pluginConfig:
    - name: "HighNodeUtilization"
      args:
        thresholds:
          "memory": 20
    plugins:
      balance:
        enabled:
          - "HighNodeUtilization"
```
### Autoheal Node Problems
Descheduler's `RemovePodsViolatingNodeTaints` strategy can be combined with
[Node Problem Detector](https://github.com/kubernetes/node-problem-detector/) and
[Cluster Autoscaler](https://github.com/kubernetes/autoscaler/tree/master/cluster-autoscaler) to automatically remove
Nodes which have problems. Node Problem Detector can detect specific Node problems and report them to the API server.
There is a node controller feature called TaintNodesByCondition that takes some conditions and turns them into taints. Currently, this only works for the default node conditions: PIDPressure, MemoryPressure, DiskPressure, Ready, and some cloud provider specific conditions.
The Descheduler will then deschedule workloads from those Nodes. Finally, if the descheduled Node's resource
allocation falls below the Cluster Autoscaler's scale down threshold, the Node will become a scale down candidate
and can be removed by Cluster Autoscaler. These three components form an autohealing cycle for Node problems.
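As a sketch, the descheduler side of this autohealing cycle only needs the strategy enabled; the layout mirrors the v1alpha2 examples above, and NPD together with TaintNodesByCondition supply the taints.
```
apiVersion: "descheduler/v1alpha2"
kind: "DeschedulerPolicy"
profiles:
  - name: ProfileName
    pluginConfig:
    - name: "RemovePodsViolatingNodeTaints"
    plugins:
      deschedule:
        enabled:
          - "RemovePodsViolatingNodeTaints"
```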
---
**NOTE**
Once [kubernetes/node-problem-detector#565](https://github.com/kubernetes/node-problem-detector/pull/565) is available in NPD, we need to update this section.
echo"+++ Creating a pull request on GitHub at ${GITHUB_USER}:${NEWBRANCH}"
local numandtitle
numandtitle=$(printf'%s\n'"${SUBJECTS[@]}")
prtext=$(cat <<EOF
Cherry pick of ${PULLSUBJ} on ${rel}.
${numandtitle}
For details on the cherry pick process, see the [cherry pick requests](https://git.k8s.io/community/contributors/devel/sig-release/cherry-picks.md) page.