This commit introduces a new customization of the existing PodsWithPVC
protection. It allows users to make pods that refer to a given storage
class unevictable.
For example, to protect pods referring to `storage-class-0` and
`storage-class-1`, the following configuration can be used:
```yaml
apiVersion: "descheduler/v1alpha2"
kind: "DeschedulerPolicy"
profiles:
- name: ProfileName
  pluginConfig:
  - name: "DefaultEvictor"
    args:
      podProtections:
        extraEnabled:
        - PodsWithPVC
        config:
          PodsWithPVC:
            protectedStorageClasses:
            - name: storage-class-0
            - name: storage-class-1
```
Changes introduced by this PR:
1. The descheduler now observes PersistentVolumeClaims.
1. A new API field was introduced to allow per-protection configuration.
1. RBAC had to be adjusted (+persistentvolumeclaims); a sketch of the addition follows below.
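As a rough sketch of the RBAC change (the ClusterRole name below is illustrative; only the PVC rule is the new part):
```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: descheduler-cluster-role   # illustrative name
rules:
# New rule: read access to PersistentVolumeClaims so the descheduler can observe them.
- apiGroups: [""]
  resources: ["persistentvolumeclaims"]
  verbs: ["get", "list", "watch"]
```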
NoEvictionPolicy dictates whether a no-eviction policy is preferred or mandatory.
It needs to be used with caution, as it gives users the ability to protect their
pods from eviction, which might work against enforced policies, e.g. plugins
evicting pods that violate security policies.
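A minimal sketch of how this could be configured, assuming the knob is exposed as a `noEvictionPolicy` argument of the DefaultEvictor with the values `Preferred` and `Mandatory`:
```yaml
apiVersion: "descheduler/v1alpha2"
kind: "DeschedulerPolicy"
profiles:
- name: ProfileName
  pluginConfig:
  - name: "DefaultEvictor"
    args:
      # "Mandatory" makes a pod's no-eviction marking binding; "Preferred"
      # treats it as a hint (field name and values assumed from the text above).
      noEvictionPolicy: "Mandatory"
```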
With a strict eviction policy the descheduler only evicts pods that request a
thresholded resource. For example, if a threshold is configured for an extended
resource called `example.com/gpu`, only pods that request that resource will be
evicted.
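As a rough sketch, assuming the strict policy is toggled through the LowNodeUtilization args (the `evictionModes` / `OnlyThresholdingResources` names below are assumptions inferred from the description), a policy thresholding an extended resource could look like:
```yaml
apiVersion: "descheduler/v1alpha2"
kind: "DeschedulerPolicy"
profiles:
- name: ProfileName
  pluginConfig:
  - name: "LowNodeUtilization"
    args:
      # With the strict mode enabled, only pods requesting a thresholded
      # resource (here example.com/gpu) become eviction candidates.
      evictionModes:
      - "OnlyThresholdingResources"
      thresholds:
        "cpu": 20
        "example.com/gpu": 20
      targetThresholds:
        "cpu": 50
        "example.com/gpu": 50
  plugins:
    balance:
      enabled:
      - "LowNodeUtilization"
```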
In some cases it might be useful to limit how many evictions can be performed
per domain, to avoid burning the whole per-descheduling-cycle budget. Limiting
the number of evictions per node is a prerequisite for evicting pods whose
usage cannot be easily subtracted from the overall node resource usage to
predict the final usage, e.g. when a pod is evicted due to high PSI pressure,
which takes into account many factors that cannot be fully captured by the
current predictive resource model.
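For illustration, a per-node cap can be expressed at the policy level; the fields below are the long-standing top-level limits, and the domain-scoped knob introduced by this change may be named differently:
```yaml
apiVersion: "descheduler/v1alpha2"
kind: "DeschedulerPolicy"
# Overall per-cycle eviction budget (illustrative value).
maxNoOfPodsToEvictTotal: 20
# Per-node cap so a single node cannot burn the whole budget.
maxNoOfPodsToEvictPerNode: 5
```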
* Add ignoreNonPDBPods option (config sketch below)
* take2
* Add test
* PodDisruptionBudgets are now used by the DefaultEvictor plugin
* Add poddisruptionbudgets to RBAC
* Review comments
* Don't use GetPodPodDisruptionBudgets
* Review comment, don't hide error
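A minimal sketch of enabling the option, using the name from the commit title (the field actually exposed on the DefaultEvictor args may differ):
```yaml
apiVersion: "descheduler/v1alpha2"
kind: "DeschedulerPolicy"
profiles:
- name: ProfileName
  pluginConfig:
  - name: "DefaultEvictor"
    args:
      # Only pods covered by a PodDisruptionBudget remain eviction candidates;
      # the option name is taken from the commit title above and is an assumption.
      ignoreNonPDBPods: true
```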
* Skip eviction when a pod's creation time is below the minPodAge threshold setting
In the default initialization phase of the descheduler, add a new
constraint so that pods whose age is below the minPodAge threshold are
not evicted.
Added value:
- Avoid excessive pod movement when the autoscaler scales up and down.
- Avoid evicting pods while they are warming up.
- Decrease the overall cost of eviction, as no pod will be evicted
  before doing a significant amount of work.
- Guard against scheduling/descheduling loops in situations where
  the descheduler has different node-fit logic from the scheduler,
  such as not considering topology spread constraints.
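A minimal configuration sketch, assuming the setting is exposed on the DefaultEvictor args as a `minPodAge` duration:
```yaml
apiVersion: "descheduler/v1alpha2"
kind: "DeschedulerPolicy"
profiles:
- name: ProfileName
  pluginConfig:
  - name: "DefaultEvictor"
    args:
      # Pods younger than this duration are never evicted
      # (field name assumed from the commits above).
      minPodAge: "5m"
```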
* Use *time.Duration instead of uint for MinPodAge type
* Remove '(in minutes)' from default evictor configuration table
* make fmt
* Add explicit name for Duration field
* Use Duration.String()
* Check if Pod matches inter-pod anti-affinity of other pod on node as part of NodeFit()
* Add unit tests for checking inter-pod anti-affinity match in NodeFit()
* Export setPodAntiAffinity() helper func to test utils
* Add docs for inter-pod anti-affinity in README
* Refactor logic for inter-pod anti-affinity to use in multiple pkgs
* Move logic for finding match between pods with antiaffinity out of framework to reuse in other pkgs
* Move interpod antiaffinity funcs to pkg/utils/predicates.go
* Add unit tests for inter-pod anti-affinity check
* Test logic in GroupByNodeName
* Test NodeFit() case where pod matches inter-pod anti-affinity
* Test for inter-pod anti-affinity pods match terms, have label selector
* NodeFit inter-pod anti-affinity check returns early if affinity spec not set
* feat: Implement preferredDuringSchedulingIgnoredDuringExecution for RemovePodsViolatingNodeAffinity
Now, the descheduler can detect and evict pods that are not optimally
allocated according to the "preferred..." node affinity. It only evicts
a pod if it can be scheduled on a node that scores higher in terms of
preferred node affinity than the current one.
This can be activated by enabling the RemovePodsViolatingNodeAffinity
plugin and passing "preferredDuringSchedulingIgnoredDuringExecution" in
the args.
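A configuration sketch (this reuses the plugin's existing `nodeAffinityType` argument):
```yaml
apiVersion: "descheduler/v1alpha2"
kind: "DeschedulerPolicy"
profiles:
- name: ProfileName
  pluginConfig:
  - name: "RemovePodsViolatingNodeAffinity"
    args:
      nodeAffinityType:
      - "preferredDuringSchedulingIgnoredDuringExecution"
  plugins:
    deschedule:
      enabled:
      - "RemovePodsViolatingNodeAffinity"
```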
For example, imagine we have a pod that prefers nodes with the label "key1:
value1" with a weight of 10. If this pod is scheduled on a node that
doesn't have the "key1: value1" label, but there is another node that has
this label and on which the pod can potentially run, then the descheduler
will evict the pod.
Another effect of this commit is that the
RemovePodsViolatingNodeAffinity plugin will no longer remove pods that
don't fit on their current node for reasons other than violating the node
affinity. Previously, enabling this plugin could cause evictions of
pods that were running on tainted nodes without the necessary
tolerations.
This commit also fixes the wording of some tests from
node_affinity_test.go and some parameters and expectations of these
tests, which were wrong.
* Optimization on RemovePodsViolatingNodeAffinity
Before checking whether a pod can be evicted or scheduled
somewhere else, we first check whether it has the corresponding nodeAffinity
field defined. If it does not, the pod is discarded as a candidate right
away.
Apart from that, the method that calculates the weight that a pod
gives to a node based on its preferred node affinity has been
renamed to better reflect what it does.