Merge pull request #244 from KohlsTechnology/docs-revamp
Documentation Refresh
README.md (182 lines changed)
@@ -9,8 +9,8 @@ Scheduling in Kubernetes is the process of binding pending pods to nodes, and is
 a component of Kubernetes called kube-scheduler. The scheduler's decisions, whether or where a
 pod can or can not be scheduled, are guided by its configurable policy which comprises of set of
 rules, called predicates and priorities. The scheduler's decisions are influenced by its view of
-a Kubernetes cluster at that point of time when a new pod appears first time for scheduling.
-As Kubernetes clusters are very dynamic and their state change over time, there may be desired
+a Kubernetes cluster at that point of time when a new pod appears for scheduling.
+As Kubernetes clusters are very dynamic and their state changes over time, there may be desire
 to move already running pods to some other nodes for various reasons:

 * Some nodes are under or over utilized.
@@ -24,78 +24,48 @@ Descheduler, based on its policy, finds pods that can be moved and evicts them.
 note, in current implementation, descheduler does not schedule replacement of evicted pods
 but relies on the default scheduler for that.

-## Build and Run
+## Quick Start

-- Checkout the repo into your $GOPATH directory under src/sigs.k8s.io/descheduler
+The descheduler can be run as a Job or CronJob inside of a k8s cluster. It has the
+advantage of being able to be run multiple times without needing user intervention.
+The descheduler pod is run as a critical pod in the `kube-system` namespace to avoid
+being evicted by itself or by the kubelet.

-Build descheduler:
-
-```sh
-$ make
-```
-
-and run descheduler:
-
-```sh
-$ ./_output/bin/descheduler --kubeconfig <path to kubeconfig> --policy-config-file <path-to-policy-file>
-```
-
-If you want more information about what descheduler is doing add `-v 1` to the command line
-
-For more information about available options run:
-```
-$ ./_output/bin/descheduler --help
-```
-
-## Running Descheduler as a Job or CronJob
-
-The descheduler can be run as a job or cronjob inside of a pod. It has the advantage of
-being able to be run multiple times without needing user intervention.
-The descheduler pod is run as a critical pod to avoid being evicted by itself,
-or by the kubelet due to an eviction event. Since critical pods are created in the
-`kube-system` namespace, the descheduler job and its pod will also be created
-in `kube-system` namespace.
-
-### Setup RBAC
-
-To give necessary permissions for the descheduler to work in a pod.
+### Run As A Job

 ```
-$ kubectl create -f kubernetes/rbac.yaml
+kubectl create -f kubernetes/rbac.yaml
+kubectl create -f kubernetes/configmap.yaml
+kubectl create -f kubernetes/job.yaml
 ```

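The diff references `kubernetes/rbac.yaml` without showing it. As a rough sketch of the permissions the descheduler needs, and not a copy of the repository's manifest (the ClusterRole and ServiceAccount names here are assumptions), the RBAC setup would grant read access to nodes and pods plus the ability to create evictions:

```yaml
# Hypothetical sketch of kubernetes/rbac.yaml; see the repository for the real manifest.
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: descheduler-cluster-role
rules:
- apiGroups: [""]
  resources: ["nodes"]
  verbs: ["get", "watch", "list"]
- apiGroups: [""]
  resources: ["pods"]
  verbs: ["get", "watch", "list", "delete"]
- apiGroups: [""]
  resources: ["pods/eviction"]
  verbs: ["create"]
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: descheduler-sa
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: descheduler-cluster-role-binding
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: descheduler-cluster-role
subjects:
- kind: ServiceAccount
  name: descheduler-sa
  namespace: kube-system
```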
-### Create a configmap to store descheduler policy
+### Run As A CronJob

 ```
-$ kubectl create -f kubernetes/configmap.yaml
+kubectl create -f kubernetes/rbac.yaml
+kubectl create -f kubernetes/configmap.yaml
+kubectl create -f kubernetes/cronjob.yaml
 ```

-### Create a Job or CronJob
+## User Guide

-As a Job.
-```
-$ kubectl create -f kubernetes/job.yaml
-```
-
-Or as a CronJob.
-```
-$ kubectl create -f kubernetes/cronjob.yaml
-```
+See the [user guide](docs/user-guide.md) in the `/docs` directory.

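Likewise, `kubernetes/configmap.yaml` (used by both the Job and the CronJob steps) wraps a descheduler policy in a ConfigMap. A minimal sketch, with the ConfigMap name and the embedded policy chosen purely for illustration:

```yaml
# Hypothetical sketch; the repository's kubernetes/configmap.yaml is the authoritative version.
apiVersion: v1
kind: ConfigMap
metadata:
  name: descheduler-policy-configmap
  namespace: kube-system
data:
  policy.yaml: |
    apiVersion: "descheduler/v1alpha1"
    kind: "DeschedulerPolicy"
    strategies:
      "RemoveDuplicates":
        enabled: true
```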
 ## Policy and Strategies

-Descheduler's policy is configurable and includes strategies to be enabled or disabled.
-Five strategies, `RemoveDuplicates`, `LowNodeUtilization`, `RemovePodsViolatingInterPodAntiAffinity`, `RemovePodsViolatingNodeAffinity` , `RemovePodsViolatingNodeTaints` are currently implemented.
+Descheduler's policy is configurable and includes strategies that can be enabled or disabled.
+Five strategies `RemoveDuplicates`, `LowNodeUtilization`, `RemovePodsViolatingInterPodAntiAffinity`,
+`RemovePodsViolatingNodeAffinity`, and `RemovePodsViolatingNodeTaints` are currently implemented.
 As part of the policy, the parameters associated with the strategies can be configured too.
 By default, all strategies are enabled.

 ### RemoveDuplicates

 This strategy makes sure that there is only one pod associated with a Replica Set (RS),
-Replication Controller (RC), Deployment, or Job running on same node. If there are more,
+Replication Controller (RC), Deployment, or Job running on the same node. If there are more,
 those duplicate pods are evicted for better spreading of pods in a cluster. This issue could happen
 if some nodes went down due to whatever reasons, and pods on them were moved to other nodes leading to
-more than one pod associated with RS or RC, for example, running on same node. Once the failed nodes
+more than one pod associated with a RS or RC, for example, running on the same node. Once the failed nodes
 are ready again, this strategy could be enabled to evict those duplicate pods. Currently, there are no
 parameters associated with this strategy. To disable this strategy, the policy should look like:

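The unchanged example that follows this paragraph in the README is not reproduced in the diff. Based on the policy format shown in the surrounding hunks (the `kind` value is assumed), disabling `RemoveDuplicates` would look roughly like:

```yaml
apiVersion: "descheduler/v1alpha1"
kind: "DeschedulerPolicy"
strategies:
  "RemoveDuplicates":
    enabled: false
```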
@@ -113,10 +83,10 @@ This strategy finds nodes that are under utilized and evicts pods, if possible,
 in the hope that recreation of evicted pods will be scheduled on these underutilized nodes. The
 parameters of this strategy are configured under `nodeResourceUtilizationThresholds`.

-The under utilization of nodes is determined by a configurable threshold, `thresholds`. The threshold
+The under utilization of nodes is determined by a configurable threshold `thresholds`. The threshold
 `thresholds` can be configured for cpu, memory, and number of pods in terms of percentage. If a node's
 usage is below threshold for all (cpu, memory, and number of pods), the node is considered underutilized.
-Currently, pods' request resource requirements are considered for computing node resource utilization.
+Currently, pods request resource requirements are considered for computing node resource utilization.

 There is another configurable threshold, `targetThresholds`, that is used to compute those potential nodes
 from where pods could be evicted. Any node, between the thresholds, `thresholds` and `targetThresholds` is
@@ -124,7 +94,7 @@ considered appropriately utilized and is not considered for eviction. The thresh
 can be configured for cpu, memory, and number of pods too in terms of percentage.

 These thresholds, `thresholds` and `targetThresholds`, could be tuned as per your cluster requirements.
-An example of the policy for this strategy would look like:
+Here is an example of a policy for this strategy:

 ```
 apiVersion: "descheduler/v1alpha1"
@@ -144,14 +114,19 @@ strategies:
         "pods": 50
 ```

-There is another parameter associated with `LowNodeUtilization` strategy, called `numberOfNodes`.
-This parameter can be configured to activate the strategy only when number of under utilized nodes
+There is another parameter associated with the `LowNodeUtilization` strategy, called `numberOfNodes`.
+This parameter can be configured to activate the strategy only when the number of under utilized nodes
 are above the configured value. This could be helpful in large clusters where a few nodes could go
 under utilized frequently or for a short period of time. By default, `numberOfNodes` is set to zero.

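The hunks above show only the opening and closing lines of the README's `LowNodeUtilization` example. A reconstructed sketch consistent with the parameters described, percentage thresholds for cpu, memory, and pods under `nodeResourceUtilizationThresholds`, with the exact indentation and the `kind` value assumed:

```yaml
apiVersion: "descheduler/v1alpha1"
kind: "DeschedulerPolicy"
strategies:
  "LowNodeUtilization":
    enabled: true
    params:
      nodeResourceUtilizationThresholds:
        thresholds:          # a node below all of these is considered under utilized
          "cpu": 20
          "memory": 20
          "pods": 20
        targetThresholds:    # a node above any of these is a candidate for eviction
          "cpu": 50
          "memory": 50
          "pods": 50
```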
 ### RemovePodsViolatingInterPodAntiAffinity

-This strategy makes sure that pods violating interpod anti-affinity are removed from nodes. For example, if there is podA on node and podB and podC(running on same node) have antiaffinity rules which prohibit them to run on the same node, then podA will be evicted from the node so that podB and podC could run. This issue could happen, when the anti-affinity rules for pods B,C are created when they are already running on node. Currently, there are no parameters associated with this strategy. To disable this strategy, the policy should look like:
+This strategy makes sure that pods violating interpod anti-affinity are removed from nodes. For example,
+if there is podA on a node and podB and podC (running on the same node) have anti-affinity rules which prohibit
+them to run on the same node, then podA will be evicted from the node so that podB and podC could run. This
+issue could happen, when the anti-affinity rules for podB and podC are created when they are already running on
+node. Currently, there are no parameters associated with this strategy. To disable this strategy, the
+policy should look like:

 ```
 apiVersion: "descheduler/v1alpha1"
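Here too the example is cut off at the hunk boundary; the disabling policy presumably continues in the same pattern:

```yaml
apiVersion: "descheduler/v1alpha1"
kind: "DeschedulerPolicy"
strategies:
  "RemovePodsViolatingInterPodAntiAffinity":
    enabled: false
```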
@@ -163,7 +138,11 @@ strategies:

 ### RemovePodsViolatingNodeAffinity

-This strategy makes sure that pods violating node affinity are removed from nodes. For example, there is podA that was scheduled on nodeA which satisfied the node affinity rule `requiredDuringSchedulingIgnoredDuringExecution` at the time of scheduling, but over time nodeA no longer satisfies the rule, then if another node nodeB is available that satisfies the node affinity rule, then podA will be evicted from nodeA. The policy file should like this -
+This strategy makes sure that pods violating node affinity are removed from nodes. For example, there is
+podA that was scheduled on nodeA which satisfied the node affinity rule `requiredDuringSchedulingIgnoredDuringExecution`
+at the time of scheduling, but over time nodeA no longer satisfies the rule, then if another node nodeB
+is available that satisfies the node affinity rule, then podA will be evicted from nodeA. The policy file
+should look like:

 ```
 apiVersion: "descheduler/v1alpha1"
@@ -175,9 +154,13 @@ strategies:
 nodeAffinityType:
 - "requiredDuringSchedulingIgnoredDuringExecution"
 ```

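Only the tail of this example (the `nodeAffinityType` list) is visible above. A reconstruction of the whole block, with the intermediate `kind`, `strategies`, and `params` lines assumed from the policy format used elsewhere in the README:

```yaml
apiVersion: "descheduler/v1alpha1"
kind: "DeschedulerPolicy"
strategies:
  "RemovePodsViolatingNodeAffinity":
    enabled: true
    params:
      nodeAffinityType:
      - "requiredDuringSchedulingIgnoredDuringExecution"
```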
 ### RemovePodsViolatingNodeTaints

-This strategy makes sure that pods violating NoSchedule taints on nodes are removed. For example: there is a pod "podA" with toleration to tolerate a taint ``key=value:NoSchedule`` scheduled and running on the tainted node. If the node's taint is subsequently updated/removed, taint is no longer satisfied by its pods' tolerations and will be evicted. The policy file should look like:
+This strategy makes sure that pods violating NoSchedule taints on nodes are removed. For example there is a
+pod "podA" with a toleration to tolerate a taint ``key=value:NoSchedule`` scheduled and running on the tainted
+node. If the node's taint is subsequently updated/removed, taint is no longer satisfied by its pods' tolerations
+and will be evicted. The policy file should look like:

 ````
 apiVersion: "descheduler/v1alpha1"
@@ -186,23 +169,68 @@ strategies:
   "RemovePodsViolatingNodeTaints":
     enabled: true
 ````

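The first and last lines of this example appear in the hunks above; the complete block presumably reads:

```yaml
apiVersion: "descheduler/v1alpha1"
kind: "DeschedulerPolicy"
strategies:
  "RemovePodsViolatingNodeTaints":
    enabled: true
```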
 ## Pod Evictions

-When the descheduler decides to evict pods from a node, it employs following general mechanism:
+When the descheduler decides to evict pods from a node, it employs the following general mechanism:

 * [Critical pods](https://kubernetes.io/docs/tasks/administer-cluster/guaranteed-scheduling-critical-addon-pods/) (with priorityClassName set to system-cluster-critical or system-node-critical) are never evicted.
-* Pods (static or mirrored pods or stand alone pods) not part of an RC, RS, Deployment or Jobs are
+* Pods (static or mirrored pods or stand alone pods) not part of an RC, RS, Deployment or Job are
 never evicted because these pods won't be recreated.
 * Pods associated with DaemonSets are never evicted.
 * Pods with local storage are never evicted.
-* Best efforts pods are evicted before Burstable and Guaranteed pods.
-* All types of pods with annotation descheduler.alpha.kubernetes.io/evict are evicted. This
-annotation is used to override checks which prevent eviction and user can select which pod is evicted.
-User should know how and if the pod will be recreated.
+* Best efforts pods are evicted before burstable and guaranteed pods.
+* All types of pods with the annotation descheduler.alpha.kubernetes.io/evict are evicted. This
+annotation is used to override checks which prevent eviction and users can select which pod is evicted.
+Users should know how and if the pod will be recreated.

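To illustrate the opt-in annotation from the list above, here is a minimal, hypothetical pod that the descheduler is allowed to evict regardless of the other checks (the pod name, image, and annotation value are placeholders):

```yaml
# Hypothetical pod; the descheduler.alpha.kubernetes.io/evict annotation is the only significant part.
apiVersion: v1
kind: Pod
metadata:
  name: annotated-pod
  annotations:
    descheduler.alpha.kubernetes.io/evict: "true"
spec:
  containers:
  - name: app
    image: k8s.gcr.io/pause:3.1
```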
-### Pod disruption Budget (PDB)
-Pods subject to Pod Disruption Budget (PDB) are not evicted if descheduling violates its pod
-disruption budget (PDB). The pods are evicted by using eviction subresource to handle PDB.
+### Pod Disruption Budget (PDB)
+
+Pods subject to a Pod Disruption Budget(PDB) are not evicted if descheduling violates its PDB. The pods
+are evicted by using the eviction subresource to handle PDB.

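For reference, a PDB is an ordinary Kubernetes object. A minimal example, with the label selector and replica count chosen purely for illustration, that requires at least two replicas of an app to stay available during evictions:

```yaml
apiVersion: policy/v1beta1   # policy/v1 on newer clusters
kind: PodDisruptionBudget
metadata:
  name: myapp-pdb
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: myapp
```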
+## Compatibility Matrix
+
+Descheduler | supported Kubernetes version
+-------------|-----------------------------
+v0.10 | v1.17
+v0.4-v0.9 | v1.9+
+v0.1-v0.3 | v1.7-v1.8
+
+
+## Getting Involved and Contributing
+
+Are you interested in contributing to descheduler? We, the
+maintainers and community, would love your suggestions, contributions, and help!
+Also, the maintainers can be contacted at any time to learn more about how to get
+involved.
+
+To get started writing code see the [contributor guide](docs/contributor-guide.md) in the `/docs` directory.
+
+In the interest of getting more new people involved we tag issues with
+[`good first issue`][good_first_issue].
+These are typically issues that have smaller scope but are good ways to start
+to get acquainted with the codebase.
+
+We also encourage ALL active community participants to act as if they are
+maintainers, even if you don't have "official" write permissions. This is a
+community effort, we are here to serve the Kubernetes community. If you have an
+active interest and you want to get involved, you have real power! Don't assume
+that the only people who can get things done around here are the "maintainers".
+
+We also would love to add more "official" maintainers, so show us what you can
+do!
+
+This repository uses the Kubernetes bots. See a full list of the commands [here][prow].
+
+### Communicating With Contributors
+
+You can reach the contributors of this project at:
+
+- [Slack channel](https://kubernetes.slack.com/messages/sig-scheduling)
+- [Mailing list](https://groups.google.com/forum/#!forum/kubernetes-sig-scheduling)
+
+Learn how to engage with the Kubernetes community on the [community page](http://kubernetes.io/community/).

 ## Roadmap

@@ -216,22 +244,6 @@ This roadmap is not in any particular order.
 * Consideration of Kubernetes's scheduler's predicates


-## Compatibility matrix
-
-Descheduler | supported Kubernetes version
--------------|-----------------------------
-0.4+ | 1.9+
-0.1-0.3 | 1.7-1.8
-
-## Community, discussion, contribution, and support
-
-Learn how to engage with the Kubernetes community on the [community page](http://kubernetes.io/community/).
-
-You can reach the maintainers of this project at:
-
-- [Slack channel](https://kubernetes.slack.com/messages/sig-scheduling)
-- [Mailing list](https://groups.google.com/forum/#!forum/kubernetes-sig-scheduling)

 ### Code of conduct

 Participation in the Kubernetes community is governed by the [Kubernetes Code of Conduct](code-of-conduct.md).

docs/README.md (new file, 5 lines)
@@ -0,0 +1,5 @@
# Documentation Index

- [Contributor Guide](contributor-guide.md)
- [Release Guide](release-guide.md)
- [User Guide](user-guide.md)
docs/contributor-guide.md (new file, 42 lines)
@@ -0,0 +1,42 @@
# Contributor Guide

## Required Tools

- [Git](https://git-scm.com/downloads)
- [Go 1.13+](https://golang.org/dl/)
- [Docker](https://docs.docker.com/install/)
- [kubectl](https://kubernetes.io/docs/tasks/tools/install-kubectl)
- [kind](https://kind.sigs.k8s.io/)

## Build and Run

Build descheduler.
```sh
cd $GOPATH/src/sigs.k8s.io
git clone https://github.com/kubernetes-sigs/descheduler.git
cd descheduler
make
```

Run descheduler.
```sh
./_output/bin/descheduler --kubeconfig <path to kubeconfig> --policy-config-file <path-to-policy-file> --v 1
```

View all CLI options.
```
./_output/bin/descheduler --help
```

## Run Tests
```
GOOS=linux make dev-image
kind create cluster --config hack/kind_config.yaml
kind load docker-image <image name>
kind get kubeconfig > /tmp/admin.conf
make test-unit
make test-e2e
```

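The contents of `hack/kind_config.yaml` are not shown in this guide. As an assumption rather than the repository's actual file, a minimal multi-node kind configuration (recent kind releases use the `kind.x-k8s.io/v1alpha4` config API) looks like:

```yaml
# Hypothetical stand-in for hack/kind_config.yaml: one control plane plus two workers.
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
- role: control-plane
- role: worker
- role: worker
```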
### Miscellaneous
See the [hack directory](https://github.com/kubernetes-sigs/descheduler/tree/master/hack) for additional tools and scripts used for developing the descheduler.
@@ -1,4 +1,4 @@
-# Release process
+# Release Guide

 ## Semi-automatic

docs/user-guide.md (new file, 47 lines)
@@ -0,0 +1,47 @@
# User Guide

Starting with descheduler release v0.10.0 container images are available in these container registries.
* `asia.gcr.io/k8s-artifacts-prod/descheduler/descheduler`
* `eu.gcr.io/k8s-artifacts-prod/descheduler/descheduler`
* `us.gcr.io/k8s-artifacts-prod/descheduler/descheduler`

The [examples](https://github.com/kubernetes-sigs/descheduler/tree/master/examples) directory has descheduler policy configuration examples.

The descheduler has many CLI options that can be used to override its default behavior.
```
descheduler --help
The descheduler evicts pods which may be bound to less desired nodes

Usage:
  descheduler [flags]
  descheduler [command]

Available Commands:
  help        Help about any command
  version     Version of descheduler

Flags:
      --add-dir-header                   If true, adds the file directory to the header
      --alsologtostderr                  log to standard error as well as files
      --descheduling-interval duration   Time interval between two consecutive descheduler executions. Setting this value instructs the descheduler to run in a continuous loop at the interval specified.
      --dry-run                          execute descheduler in dry run mode.
      --evict-local-storage-pods         Enables evicting pods using local storage by descheduler
  -h, --help                             help for descheduler
      --kubeconfig string                File with kube configuration.
      --log-backtrace-at traceLocation   when logging hits line file:N, emit a stack trace (default :0)
      --log-dir string                   If non-empty, write log files in this directory
      --log-file string                  If non-empty, use this log file
      --log-file-max-size uint           Defines the maximum size a log file can grow to. Unit is megabytes. If the value is 0, the maximum file size is unlimited. (default 1800)
      --log-flush-frequency duration     Maximum number of seconds between log flushes (default 5s)
      --logtostderr                      log to standard error instead of files (default true)
      --max-pods-to-evict-per-node int   Limits the maximum number of pods to be evicted per node by descheduler
      --node-selector string             Selector (label query) to filter on, supports '=', '==', and '!='.(e.g. -l key1=value1,key2=value2)
      --policy-config-file string        File with descheduler policy configuration.
      --skip-headers                     If true, avoid header prefixes in the log messages
      --skip-log-headers                 If true, avoid headers when opening log files
      --stderrthreshold severity         logs at or above this threshold go to stderr (default 2)
  -v, --v Level                          number for the log level verbosity
      --vmodule moduleSpec               comma-separated list of pattern=N settings for file-filtered logging

Use "descheduler [command] --help" for more information about a command.
```
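Tying the registries and flags together, here is a rough, hypothetical CronJob manifest that runs the published image with a mounted policy ConfigMap. The schedule, image tag, binary path, ServiceAccount, and ConfigMap names are assumptions; `kubernetes/cronjob.yaml` in the repository is the authoritative manifest:

```yaml
# Hypothetical example only; see kubernetes/cronjob.yaml in the repository for the real manifest.
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: descheduler-cronjob
  namespace: kube-system
spec:
  schedule: "*/10 * * * *"              # run every 10 minutes (assumed)
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: descheduler-sa       # assumed name from rbac.yaml
          priorityClassName: system-cluster-critical
          restartPolicy: Never
          containers:
          - name: descheduler
            image: us.gcr.io/k8s-artifacts-prod/descheduler/descheduler:v0.10.0   # tag assumed
            command: ["/bin/descheduler"]          # binary path assumed
            args: ["--policy-config-file", "/policy-dir/policy.yaml", "--v", "3"]
            volumeMounts:
            - name: policy-volume
              mountPath: /policy-dir
          volumes:
          - name: policy-volume
            configMap:
              name: descheduler-policy-configmap   # assumed name from configmap.yaml
```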