Adding highnodeutilization strategy

2026-01-26 05:14:13 +01:00 · 2021-05-07 11:11:33 +08:00
parent 2f18864fa5
commit 4cd1e66ef3
6 changed files with 959 additions and 6 deletions
--- a/README.md
+++ b/README.md
@@ -34,6 +34,7 @@ Table of Contents
 - [Policy and Strategies](#policy-and-strategies)
  - [RemoveDuplicates](#removeduplicates)
  - [LowNodeUtilization](#lownodeutilization)
+  - [HighNodeUtilization](#highnodeutilization)
  - [RemovePodsViolatingInterPodAntiAffinity](#removepodsviolatinginterpodantiaffinity)
  - [RemovePodsViolatingNodeAffinity](#removepodsviolatingnodeaffinity)
  - [RemovePodsViolatingNodeTaints](#removepodsviolatingnodetaints)
@@ -107,9 +108,17 @@ See the [user guide](docs/user-guide.md) in the `/docs` directory.
 ## Policy and Strategies

 Descheduler's policy is configurable and includes strategies that can be enabled or disabled.
-Eight strategies `RemoveDuplicates`, `LowNodeUtilization`, `RemovePodsViolatingInterPodAntiAffinity`,
-`RemovePodsViolatingNodeAffinity`, `RemovePodsViolatingNodeTaints`, `RemovePodsViolatingTopologySpreadConstraint`,
-`RemovePodsHavingTooManyRestarts`, and `PodLifeTime` are currently implemented. As part of the policy, the
+Nine strategies are currently implemented:
+1. `RemoveDuplicates`
+2. `LowNodeUtilization`
+3. `HighNodeUtilization`
+4. `RemovePodsViolatingInterPodAntiAffinity`
+5. `RemovePodsViolatingNodeAffinity`
+6. `RemovePodsViolatingNodeTaints`
+7. `RemovePodsViolatingTopologySpreadConstraint`
+8. `RemovePodsHavingTooManyRestarts`
+9. `PodLifeTime`
+
+As part of the policy, the
 parameters associated with the strategies can be configured too. By default, all strategies are enabled.

 The following diagram provides a visualization of most of the strategies to help
@@ -240,6 +249,58 @@ This parameter can be configured to activate the strategy only when the number o
 are above the configured value. This could be helpful in large clusters where a few nodes could go
 under utilized frequently or for a short period of time. By default, `numberOfNodes` is set to zero.

+### HighNodeUtilization
+
+This strategy finds nodes that are underutilized and evicts pods from them in the hope that these pods will be scheduled compactly onto fewer nodes. This strategy **must** be used with the
+scheduler strategy `MostRequestedPriority`. The parameters of this strategy are configured under `nodeResourceUtilizationThresholds`.
+
+The underutilization of nodes is determined by the configurable `thresholds` parameter. `thresholds` can be
+configured for cpu, memory, number of pods, and extended resources, each as a percentage. The percentage is
+calculated as the resources currently requested on the node divided by its [total allocatable](https://kubernetes.io/docs/concepts/architecture/nodes/#capacity) resources.
+For pods, this means the number of pods on the node as a fraction of the pod capacity set for that node.
+
+If a node's usage is below the threshold for all resources (cpu, memory, number of pods, and extended resources), the node is considered underutilized.
+Currently, pods' resource requests, not their actual usage, are used to compute node resource utilization. Any node that is not underutilized is considered appropriately utilized and is not considered for eviction.
+
+The `thresholds` parameter can be tuned to your cluster's requirements. Note that this
+strategy evicts pods from `underutilized nodes` (those with usage below `thresholds`) so that they can be recreated on appropriately utilized nodes. The strategy aborts if the number of either `underutilized nodes` or `appropriately utilized nodes` is zero.
+
+**Parameters:**
+
+|Name|Type|
+|---|---|
+|`thresholds`|map(string:int)|
+|`numberOfNodes`|int|
+|`thresholdPriority`|int (see [priority filtering](#priority-filtering))|
+|`thresholdPriorityClassName`|string (see [priority filtering](#priority-filtering))|
+
+**Example:**
+
+```yaml
+apiVersion: "descheduler/v1alpha1"
+kind: "DeschedulerPolicy"
+strategies:
+  "HighNodeUtilization":
+     enabled: true
+     params:
+       nodeResourceUtilizationThresholds:
+         thresholds:
+           "cpu" : 20
+           "memory": 20
+           "pods": 20
+```
+
+The policy must pass the following validation checks:
+* Three basic native resource types are supported: `cpu`, `memory` and `pods`. If any of these resource types is not specified, its threshold defaults to 100%.
+* Extended resources are supported. For example, the resource type `nvidia.com/gpu` can be specified for GPU node utilization. Extended resources are optional and are not used to compute a node's usage unless they are explicitly specified in `thresholds`.
+* `thresholds` cannot be nil or empty.
+* The valid range of a resource's percentage value is \[0, 100\].
+
+The `HighNodeUtilization` strategy has one more parameter, `numberOfNodes`.
+This parameter can be configured to activate the strategy only when the number of underutilized nodes
+is at least the configured value. This can be helpful in large clusters where a few nodes may become
+underutilized frequently or for a short period of time. By default, `numberOfNodes` is set to zero.
+
 ### RemovePodsViolatingInterPodAntiAffinity

 This strategy makes sure that pods violating interpod anti-affinity are removed from nodes. For example,
--- a/docs/user-guide.md
+++ b/docs/user-guide.md
@@ -92,12 +92,12 @@ strategies:
 ```

 ### Balance Cluster By Node Memory Utilization
-
 If your cluster has been running for a long period of time, you may find that the resource utilization is not very
-balanced. The `LowNodeUtilization` strategy can be used to rebalance your cluster based on `cpu`, `memory`
+balanced. The following two strategies can be used to rebalance your cluster based on `cpu`, `memory` 
 or `number of pods`.

-Using the following policy configuration file, descheduler will rebalance the cluster based on memory by evicting pods
+#### Balance high utilization nodes
+Using `LowNodeUtilization`, descheduler will rebalance the cluster based on memory by evicting pods
 from nodes with memory utilization over 70% to nodes with memory utilization below 20%.

 ```
@@ -114,6 +114,23 @@ strategies:
          "memory": 70
 ```

+#### Balance low utilization nodes
+Using `HighNodeUtilization`, descheduler will rebalance the cluster based on memory by evicting pods
+from nodes with memory utilization lower than 20%. This should be used along with the scheduler strategy `MostRequestedPriority`.
+The evicted pods will be compacted into a minimal set of nodes.
+
+```
+apiVersion: "descheduler/v1alpha1"
+kind: "DeschedulerPolicy"
+strategies:
+  "HighNodeUtilization":
+    enabled: true
+    params:
+      nodeResourceUtilizationThresholds:
+        thresholds:
+          "memory": 20
+```
+
 ### Autoheal Node Problems
 Descheduler's `RemovePodsViolatingNodeTaints` strategy can be combined with
 [Node Problem Detector](https://github.com/kubernetes/node-problem-detector/) and
--- a/examples/high-node-utilization.yml
+++ b/examples/high-node-utilization.yml
@@ -0,0 +1,10 @@
+---
+apiVersion: "descheduler/v1alpha1"
+kind: "DeschedulerPolicy"
+strategies:
+  "HighNodeUtilization":
+    enabled: true
+    params:
+      nodeResourceUtilizationThresholds:
+        thresholds:
+          "memory": 20
--- a/pkg/descheduler/descheduler.go
+++ b/pkg/descheduler/descheduler.go
@@ -76,6 +76,7 @@ func RunDeschedulerStrategies(ctx context.Context, rs *options.DeschedulerServer
 	strategyFuncs := map[api.StrategyName]strategyFunction{
 		"RemoveDuplicates":                            strategies.RemoveDuplicatePods,
 		"LowNodeUtilization":                          nodeutilization.LowNodeUtilization,
+		"HighNodeUtilization":                         nodeutilization.HighNodeUtilization,
 		"RemovePodsViolatingInterPodAntiAffinity":     strategies.RemovePodsViolatingInterPodAntiAffinity,
 		"RemovePodsViolatingNodeAffinity":             strategies.RemovePodsViolatingNodeAffinity,
 		"RemovePodsViolatingNodeTaints":               strategies.RemovePodsViolatingNodeTaints,
--- a/pkg/descheduler/strategies/nodeutilization/highnodeutilization.go
+++ b/pkg/descheduler/strategies/nodeutilization/highnodeutilization.go
@@ -0,0 +1,157 @@
+/*
+Copyright 2021 The Kubernetes Authors.
+
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+   http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+*/
+
+package nodeutilization
+
+import (
+	"context"
+	"fmt"
+	v1 "k8s.io/api/core/v1"
+	"k8s.io/apimachinery/pkg/api/resource"
+	clientset "k8s.io/client-go/kubernetes"
+	"k8s.io/klog/v2"
+	"sigs.k8s.io/descheduler/pkg/api"
+	"sigs.k8s.io/descheduler/pkg/descheduler/evictions"
+	nodeutil "sigs.k8s.io/descheduler/pkg/descheduler/node"
+	"sigs.k8s.io/descheduler/pkg/utils"
+)
+
+// HighNodeUtilization evicts pods from under utilized nodes so that scheduler can schedule according to its strategy.
+// Note that CPU/Memory requests are used to calculate nodes' utilization and not the actual resource usage.
+func HighNodeUtilization(ctx context.Context, client clientset.Interface, strategy api.DeschedulerStrategy, nodes []*v1.Node, podEvictor *evictions.PodEvictor) {
+	if err := validateNodeUtilizationParams(strategy.Params); err != nil {
+		klog.ErrorS(err, "Invalid HighNodeUtilization parameters")
+		return
+	}
+	thresholdPriority, err := utils.GetPriorityFromStrategyParams(ctx, client, strategy.Params)
+	if err != nil {
+		klog.ErrorS(err, "Failed to get threshold priority from strategy's params")
+		return
+	}
+
+	thresholds := strategy.Params.NodeResourceUtilizationThresholds.Thresholds
+	targetThresholds := strategy.Params.NodeResourceUtilizationThresholds.TargetThresholds
+	if err := validateHighUtilizationStrategyConfig(thresholds, targetThresholds); err != nil {
+		klog.ErrorS(err, "HighNodeUtilization config is not valid")
+		return
+	}
+	targetThresholds = make(api.ResourceThresholds)
+
+	setDefaultForThresholds(thresholds, targetThresholds)
+	resourceNames := getResourceNames(targetThresholds)
+
+	sourceNodes, highNodes := classifyNodes(
+		getNodeUsage(ctx, client, nodes, thresholds, targetThresholds, resourceNames),
+		func(node *v1.Node, usage NodeUsage) bool {
+			return isNodeWithLowUtilization(usage)
+		},
+		func(node *v1.Node, usage NodeUsage) bool {
+			if nodeutil.IsNodeUnschedulable(node) {
+				klog.V(2).InfoS("Node is unschedulable", "node", klog.KObj(node))
+				return false
+			}
+			return !isNodeWithLowUtilization(usage)
+		})
+
+	// Log the configured thresholds in one line (targetThresholds are all defaulted to 100 above).
+	keysAndValues := []interface{}{
+		"CPU", thresholds[v1.ResourceCPU],
+		"Mem", thresholds[v1.ResourceMemory],
+		"Pods", thresholds[v1.ResourcePods],
+	}
+	for name := range thresholds {
+		if !isBasicResource(name) {
+			keysAndValues = append(keysAndValues, string(name), int64(thresholds[name]))
+		}
+	}
+
+	klog.V(1).InfoS("Criteria for a node below target utilization", keysAndValues...)
+	klog.V(1).InfoS("Number of underutilized nodes", "totalNumber", len(sourceNodes))
+
+	if len(sourceNodes) == 0 {
+		klog.V(1).InfoS("No node is underutilized, nothing to do here, you might tune your thresholds further")
+		return
+	}
+	if len(sourceNodes) < strategy.Params.NodeResourceUtilizationThresholds.NumberOfNodes {
+		klog.V(1).InfoS("Number of nodes underutilized is less than NumberOfNodes, nothing to do here", "underutilizedNodes", len(sourceNodes), "numberOfNodes", strategy.Params.NodeResourceUtilizationThresholds.NumberOfNodes)
+		return
+	}
+	if len(sourceNodes) == len(nodes) {
+		klog.V(1).InfoS("All nodes are underutilized, nothing to do here")
+		return
+	}
+	if len(highNodes) == 0 {
+		klog.V(1).InfoS("No node is available to schedule the pods, nothing to do here")
+		return
+	}
+
+	evictable := podEvictor.Evictable(evictions.WithPriorityThreshold(thresholdPriority))
+
+	// stop if the total available usage has dropped to zero - no more pods can be scheduled
+	continueEvictionCond := func(nodeUsage NodeUsage, totalAvailableUsage map[v1.ResourceName]*resource.Quantity) bool {
+		for name := range totalAvailableUsage {
+			if totalAvailableUsage[name].CmpInt64(0) < 1 {
+				return false
+			}
+		}
+
+		return true
+	}
+	evictPodsFromSourceNodes(
+		ctx,
+		sourceNodes,
+		highNodes,
+		podEvictor,
+		evictable.IsEvictable,
+		resourceNames,
+		"HighNodeUtilization",
+		continueEvictionCond)
+
+}
+
+func validateHighUtilizationStrategyConfig(thresholds, targetThresholds api.ResourceThresholds) error {
+	if targetThresholds != nil {
+		return fmt.Errorf("targetThresholds is not applicable for HighNodeUtilization")
+	}
+	if err := validateThresholds(thresholds); err != nil {
+		return fmt.Errorf("thresholds config is not valid: %v", err)
+	}
+	return nil
+}
+
+func setDefaultForThresholds(thresholds, targetThresholds api.ResourceThresholds) {
+	// check if Pods/CPU/Mem are set, if not, set them to 100
+	if _, ok := thresholds[v1.ResourcePods]; !ok {
+		thresholds[v1.ResourcePods] = MaxResourcePercentage
+	}
+	if _, ok := thresholds[v1.ResourceCPU]; !ok {
+		thresholds[v1.ResourceCPU] = MaxResourcePercentage
+	}
+	if _, ok := thresholds[v1.ResourceMemory]; !ok {
+		thresholds[v1.ResourceMemory] = MaxResourcePercentage
+	}
+
+	// Default targetThreshold resource values to 100
+	targetThresholds[v1.ResourcePods] = MaxResourcePercentage
+	targetThresholds[v1.ResourceCPU] = MaxResourcePercentage
+	targetThresholds[v1.ResourceMemory] = MaxResourcePercentage
+
+	for name := range thresholds {
+		if !isBasicResource(name) {
+			targetThresholds[name] = MaxResourcePercentage
+		}
+	}
+}
--- a/pkg/descheduler/strategies/nodeutilization/highnodeutilization_test.go
+++ b/pkg/descheduler/strategies/nodeutilization/highnodeutilization_test.go
@@ -0,0 +1,707 @@
+/*
+Copyright 2021 The Kubernetes Authors.
+
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+   http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+*/
+
+package nodeutilization
+
+import (
+	"context"
+	"fmt"
+	v1 "k8s.io/api/core/v1"
+	"k8s.io/api/policy/v1beta1"
+	"k8s.io/apimachinery/pkg/api/resource"
+	"k8s.io/apimachinery/pkg/runtime"
+	"k8s.io/client-go/kubernetes/fake"
+	core "k8s.io/client-go/testing"
+	"sigs.k8s.io/descheduler/pkg/api"
+	"sigs.k8s.io/descheduler/pkg/descheduler/evictions"
+	"sigs.k8s.io/descheduler/pkg/utils"
+	"sigs.k8s.io/descheduler/test"
+	"strings"
+	"testing"
+)
+
+func TestHighNodeUtilization(t *testing.T) {
+	ctx := context.Background()
+	n1NodeName := "n1"
+	n2NodeName := "n2"
+	n3NodeName := "n3"
+
+	testCases := []struct {
+		name                  string
+		thresholds            api.ResourceThresholds
+		nodes                 map[string]*v1.Node
+		pods                  map[string]*v1.PodList
+		maxPodsToEvictPerNode int
+		expectedPodsEvicted   int
+		evictedPods           []string
+	}{
+		{
+			name: "no node below threshold usage",
+			thresholds: api.ResourceThresholds{
+				v1.ResourceCPU:  20,
+				v1.ResourcePods: 20,
+			},
+			nodes: map[string]*v1.Node{
+				n1NodeName: test.BuildTestNode(n1NodeName, 4000, 3000, 10, nil),
+				n2NodeName: test.BuildTestNode(n2NodeName, 4000, 3000, 10, nil),
+				n3NodeName: test.BuildTestNode(n3NodeName, 4000, 3000, 10, nil),
+			},
+			pods: map[string]*v1.PodList{
+				n1NodeName: {
+					Items: []v1.Pod{
+						// These won't be evicted.
+						*test.BuildTestPod("p1", 400, 0, n1NodeName, test.SetRSOwnerRef),
+						*test.BuildTestPod("p2", 400, 0, n1NodeName, test.SetRSOwnerRef),
+						*test.BuildTestPod("p3", 400, 0, n1NodeName, test.SetRSOwnerRef),
+					},
+				},
+				n2NodeName: {
+					Items: []v1.Pod{
+						// These won't be evicted.
+						*test.BuildTestPod("p4", 400, 0, n2NodeName, test.SetRSOwnerRef),
+						*test.BuildTestPod("p5", 400, 0, n2NodeName, test.SetRSOwnerRef),
+						*test.BuildTestPod("p6", 400, 0, n2NodeName, test.SetRSOwnerRef),
+					},
+				},
+				n3NodeName: {
+					Items: []v1.Pod{
+						// These won't be evicted.
+						*test.BuildTestPod("p7", 400, 0, n3NodeName, test.SetRSOwnerRef),
+						*test.BuildTestPod("p8", 400, 0, n3NodeName, test.SetRSOwnerRef),
+						*test.BuildTestPod("p9", 400, 0, n3NodeName, test.SetRSOwnerRef),
+					},
+				},
+			},
+			maxPodsToEvictPerNode: 0,
+			expectedPodsEvicted:   0,
+		},
+		{
+			name: "no evictable pods",
+			thresholds: api.ResourceThresholds{
+				v1.ResourceCPU:  40,
+				v1.ResourcePods: 40,
+			},
+			nodes: map[string]*v1.Node{
+				n1NodeName: test.BuildTestNode(n1NodeName, 4000, 3000, 9, nil),
+				n2NodeName: test.BuildTestNode(n2NodeName, 4000, 3000, 10, nil),
+				n3NodeName: test.BuildTestNode(n3NodeName, 4000, 3000, 10, nil),
+			},
+			pods: map[string]*v1.PodList{
+				n1NodeName: {
+					Items: []v1.Pod{
+						// These won't be evicted.
+						*test.BuildTestPod("p1", 400, 0, n1NodeName, func(pod *v1.Pod) {
+							// A pod with local storage.
+							test.SetNormalOwnerRef(pod)
+							pod.Spec.Volumes = []v1.Volume{
+								{
+									Name: "sample",
+									VolumeSource: v1.VolumeSource{
+										HostPath: &v1.HostPathVolumeSource{Path: "somePath"},
+										EmptyDir: &v1.EmptyDirVolumeSource{
+											SizeLimit: resource.NewQuantity(int64(10), resource.BinarySI)},
+									},
+								},
+							}
+							// A Mirror Pod.
+							pod.Annotations = test.GetMirrorPodAnnotation()
+						}),
+						*test.BuildTestPod("p2", 400, 0, n1NodeName, func(pod *v1.Pod) {
+							// A Critical Pod.
+							pod.Namespace = "kube-system"
+							priority := utils.SystemCriticalPriority
+							pod.Spec.Priority = &priority
+						}),
+					},
+				},
+				n2NodeName: {
+					Items: []v1.Pod{
+						// These won't be evicted.
+						*test.BuildTestPod("p3", 400, 0, n2NodeName, test.SetDSOwnerRef),
+						*test.BuildTestPod("p4", 400, 0, n2NodeName, test.SetDSOwnerRef),
+					},
+				},
+				n3NodeName: {
+					Items: []v1.Pod{
+						*test.BuildTestPod("p5", 400, 0, n3NodeName, test.SetRSOwnerRef),
+						*test.BuildTestPod("p6", 400, 0, n3NodeName, test.SetRSOwnerRef),
+						*test.BuildTestPod("p7", 400, 0, n3NodeName, test.SetRSOwnerRef),
+						*test.BuildTestPod("p8", 400, 0, n3NodeName, test.SetRSOwnerRef),
+					},
+				},
+			},
+			maxPodsToEvictPerNode: 0,
+			expectedPodsEvicted:   0,
+		},
+		{
+			name: "no node to schedule evicted pods",
+			thresholds: api.ResourceThresholds{
+				v1.ResourceCPU:  20,
+				v1.ResourcePods: 20,
+			},
+			nodes: map[string]*v1.Node{
+				n1NodeName: test.BuildTestNode(n1NodeName, 4000, 3000, 10, nil),
+				n2NodeName: test.BuildTestNode(n2NodeName, 4000, 3000, 10, nil),
+				n3NodeName: test.BuildTestNode(n3NodeName, 4000, 3000, 10, test.SetNodeUnschedulable),
+			},
+			pods: map[string]*v1.PodList{
+				n1NodeName: {
+					Items: []v1.Pod{
+						// These can't be evicted.
+						*test.BuildTestPod("p1", 400, 0, n1NodeName, test.SetRSOwnerRef),
+					},
+				},
+				n2NodeName: {
+					Items: []v1.Pod{
+						// These can't be evicted.
+						*test.BuildTestPod("p2", 400, 0, n2NodeName, test.SetRSOwnerRef),
+					},
+				},
+				n3NodeName: {
+					Items: []v1.Pod{
+						*test.BuildTestPod("p3", 400, 0, n3NodeName, test.SetRSOwnerRef),
+						*test.BuildTestPod("p4", 400, 0, n3NodeName, test.SetRSOwnerRef),
+						*test.BuildTestPod("p5", 400, 0, n3NodeName, test.SetRSOwnerRef),
+					},
+				},
+			},
+			maxPodsToEvictPerNode: 0,
+			expectedPodsEvicted:   0,
+		},
+		{
+			name: "without priorities",
+			thresholds: api.ResourceThresholds{
+				v1.ResourceCPU:  30,
+				v1.ResourcePods: 30,
+			},
+			nodes: map[string]*v1.Node{
+				n1NodeName: test.BuildTestNode(n1NodeName, 4000, 3000, 10, nil),
+				n2NodeName: test.BuildTestNode(n2NodeName, 4000, 3000, 10, nil),
+				n3NodeName: test.BuildTestNode(n3NodeName, 4000, 3000, 10, test.SetNodeUnschedulable),
+			},
+			pods: map[string]*v1.PodList{
+				n1NodeName: {
+					Items: []v1.Pod{
+						*test.BuildTestPod("p1", 400, 0, n1NodeName, test.SetRSOwnerRef),
+						// These won't be evicted.
+						*test.BuildTestPod("p2", 400, 0, n1NodeName, func(pod *v1.Pod) {
+							// A Critical Pod.
+							pod.Namespace = "kube-system"
+							priority := utils.SystemCriticalPriority
+							pod.Spec.Priority = &priority
+						}),
+					},
+				},
+				n2NodeName: {
+					Items: []v1.Pod{
+						// These won't be evicted.
+						*test.BuildTestPod("p3", 400, 0, n2NodeName, test.SetRSOwnerRef),
+						*test.BuildTestPod("p4", 400, 0, n2NodeName, test.SetRSOwnerRef),
+						*test.BuildTestPod("p5", 400, 0, n2NodeName, test.SetRSOwnerRef),
+						*test.BuildTestPod("p6", 400, 0, n2NodeName, test.SetRSOwnerRef),
+					},
+				},
+				n3NodeName: {
+					Items: []v1.Pod{
+						*test.BuildTestPod("p7", 400, 0, n3NodeName, test.SetRSOwnerRef),
+					},
+				},
+			},
+			maxPodsToEvictPerNode: 0,
+			expectedPodsEvicted:   2,
+			evictedPods:           []string{"p1", "p7"},
+		},
+		{
+			name: "without priorities stop when resource capacity is depleted",
+			thresholds: api.ResourceThresholds{
+				v1.ResourceCPU:  30,
+				v1.ResourcePods: 30,
+			},
+			nodes: map[string]*v1.Node{
+				n1NodeName: test.BuildTestNode(n1NodeName, 2000, 3000, 10, nil),
+				n2NodeName: test.BuildTestNode(n2NodeName, 2000, 3000, 10, nil),
+				n3NodeName: test.BuildTestNode(n3NodeName, 2000, 3000, 10, test.SetNodeUnschedulable),
+			},
+			pods: map[string]*v1.PodList{
+				n1NodeName: {
+					Items: []v1.Pod{
+						*test.BuildTestPod("p1", 400, 0, n1NodeName, test.SetRSOwnerRef),
+					},
+				},
+				n2NodeName: {
+					Items: []v1.Pod{
+						// These won't be evicted.
+						*test.BuildTestPod("p2", 400, 0, n2NodeName, test.SetRSOwnerRef),
+						*test.BuildTestPod("p3", 400, 0, n2NodeName, test.SetRSOwnerRef),
+						*test.BuildTestPod("p4", 400, 0, n2NodeName, test.SetRSOwnerRef),
+						*test.BuildTestPod("p5", 400, 0, n2NodeName, test.SetRSOwnerRef),
+					},
+				},
+				n3NodeName: {
+					Items: []v1.Pod{
+						*test.BuildTestPod("p6", 400, 0, n3NodeName, test.SetRSOwnerRef),
+					},
+				},
+			},
+			maxPodsToEvictPerNode: 0,
+			expectedPodsEvicted:   1,
+		},
+		{
+			name: "with priorities",
+			thresholds: api.ResourceThresholds{
+				v1.ResourceCPU: 30,
+			},
+			nodes: map[string]*v1.Node{
+				n1NodeName: test.BuildTestNode(n1NodeName, 4000, 3000, 10, nil),
+				n2NodeName: test.BuildTestNode(n2NodeName, 2000, 3000, 10, nil),
+				n3NodeName: test.BuildTestNode(n3NodeName, 2000, 3000, 10, test.SetNodeUnschedulable),
+			},
+			pods: map[string]*v1.PodList{
+				n1NodeName: {
+					Items: []v1.Pod{
+						*test.BuildTestPod("p1", 400, 0, n1NodeName, func(pod *v1.Pod) {
+							test.SetRSOwnerRef(pod)
+							test.SetPodPriority(pod, lowPriority)
+						}),
+						*test.BuildTestPod("p2", 400, 0, n1NodeName, func(pod *v1.Pod) {
+							test.SetRSOwnerRef(pod)
+							test.SetPodPriority(pod, highPriority)
+						}),
+					},
+				},
+				n2NodeName: {
+					Items: []v1.Pod{
+						// These won't be evicted.
+						*test.BuildTestPod("p5", 400, 0, n2NodeName, test.SetRSOwnerRef),
+						*test.BuildTestPod("p6", 400, 0, n2NodeName, test.SetRSOwnerRef),
+						*test.BuildTestPod("p7", 400, 0, n2NodeName, test.SetRSOwnerRef),
+						*test.BuildTestPod("p8", 400, 0, n2NodeName, test.SetRSOwnerRef),
+					},
+				},
+				n3NodeName: {
+					Items: []v1.Pod{
+						// These won't be evicted.
+						*test.BuildTestPod("p9", 400, 0, n3NodeName, test.SetDSOwnerRef),
+					},
+				},
+			},
+			maxPodsToEvictPerNode: 0,
+			expectedPodsEvicted:   1,
+			evictedPods:           []string{"p1"},
+		},
+		{
+			name: "without priorities evicting best-effort pods only",
+			thresholds: api.ResourceThresholds{
+				v1.ResourceCPU: 30,
+			},
+			nodes: map[string]*v1.Node{
+				n1NodeName: test.BuildTestNode(n1NodeName, 3000, 3000, 10, nil),
+				n2NodeName: test.BuildTestNode(n2NodeName, 3000, 3000, 5, nil),
+				n3NodeName: test.BuildTestNode(n3NodeName, 3000, 3000, 10, test.SetNodeUnschedulable),
+			},
+			// All pods are assumed to be burstable (test.BuildTestPod always sets both cpu/memory resource requests to some value)
+			pods: map[string]*v1.PodList{
+				n1NodeName: {
+					Items: []v1.Pod{
+						*test.BuildTestPod("p1", 400, 0, n1NodeName, func(pod *v1.Pod) {
+							test.SetRSOwnerRef(pod)
+							test.MakeBestEffortPod(pod)
+						}),
+						*test.BuildTestPod("p2", 400, 0, n1NodeName, func(pod *v1.Pod) {
+							test.SetRSOwnerRef(pod)
+						}),
+					},
+				},
+				n2NodeName: {
+					Items: []v1.Pod{
+						// These won't be evicted.
+						*test.BuildTestPod("p3", 400, 0, n2NodeName, test.SetRSOwnerRef),
+						*test.BuildTestPod("p4", 400, 0, n2NodeName, test.SetRSOwnerRef),
+						*test.BuildTestPod("p5", 400, 0, n2NodeName, test.SetRSOwnerRef),
+						*test.BuildTestPod("p6", 400, 0, n2NodeName, test.SetRSOwnerRef),
+					},
+				},
+				n3NodeName: {
+					Items: []v1.Pod{},
+				},
+			},
+			maxPodsToEvictPerNode: 0,
+			expectedPodsEvicted:   1,
+			evictedPods:           []string{"p1"},
+		},
+		{
+			name: "with extended resource",
+			thresholds: api.ResourceThresholds{
+				v1.ResourceCPU:   20,
+				extendedResource: 40,
+			},
+			nodes: map[string]*v1.Node{
+				n1NodeName: test.BuildTestNode(n1NodeName, 4000, 3000, 10, func(node *v1.Node) {
+					test.SetNodeExtendedResource(node, extendedResource, 8)
+				}),
+				n2NodeName: test.BuildTestNode(n2NodeName, 4000, 3000, 10, func(node *v1.Node) {
+					test.SetNodeExtendedResource(node, extendedResource, 8)
+				}),
+				n3NodeName: test.BuildTestNode(n3NodeName, 4000, 3000, 10, test.SetNodeUnschedulable),
+			},
+			pods: map[string]*v1.PodList{
+				n1NodeName: {
+					Items: []v1.Pod{
+						*test.BuildTestPod("p1", 100, 0, n1NodeName, func(pod *v1.Pod) {
+							test.SetRSOwnerRef(pod)
+							test.SetPodExtendedResourceRequest(pod, extendedResource, 1)
+						}),
+						*test.BuildTestPod("p2", 100, 0, n1NodeName, func(pod *v1.Pod) {
+							test.SetRSOwnerRef(pod)
+							test.SetPodExtendedResourceRequest(pod, extendedResource, 1)
+						}),
+						// These won't be evicted
+						*test.BuildTestPod("p7", 100, 0, n1NodeName, func(pod *v1.Pod) {
+							test.SetDSOwnerRef(pod)
+							test.SetPodExtendedResourceRequest(pod, extendedResource, 1)
+						}),
+					},
+				},
+				n2NodeName: {
+					Items: []v1.Pod{
+						*test.BuildTestPod("p3", 500, 0, n2NodeName, func(pod *v1.Pod) {
+							test.SetRSOwnerRef(pod)
+							test.SetPodExtendedResourceRequest(pod, extendedResource, 1)
+						}),
+						*test.BuildTestPod("p4", 500, 0, n2NodeName, func(pod *v1.Pod) {
+							test.SetRSOwnerRef(pod)
+							test.SetPodExtendedResourceRequest(pod, extendedResource, 1)
+						}),
+						*test.BuildTestPod("p5", 500, 0, n2NodeName, func(pod *v1.Pod) {
+							test.SetRSOwnerRef(pod)
+							test.SetPodExtendedResourceRequest(pod, extendedResource, 1)
+						}),
+						*test.BuildTestPod("p6", 500, 0, n2NodeName, func(pod *v1.Pod) {
+							test.SetRSOwnerRef(pod)
+							test.SetPodExtendedResourceRequest(pod, extendedResource, 1)
+						}),
+					},
+				},
+				n3NodeName: {
+					Items: []v1.Pod{},
+				},
+			},
+			maxPodsToEvictPerNode: 0,
+			expectedPodsEvicted:   2,
+			evictedPods:           []string{"p1", "p2"},
+		},
+		{
+			name: "with extended resource in some of nodes",
+			thresholds: api.ResourceThresholds{
+				v1.ResourceCPU:   40,
+				extendedResource: 40,
+			},
+			nodes: map[string]*v1.Node{
+				n1NodeName: test.BuildTestNode(n1NodeName, 4000, 3000, 10, func(node *v1.Node) {
+					test.SetNodeExtendedResource(node, extendedResource, 8)
+				}),
+				n2NodeName: test.BuildTestNode(n2NodeName, 4000, 3000, 10, nil),
+				n3NodeName: test.BuildTestNode(n3NodeName, 4000, 3000, 10, test.SetNodeUnschedulable),
+			},
+			pods: map[string]*v1.PodList{
+				n1NodeName: {
+					Items: []v1.Pod{
+						//These won't be evicted
+						*test.BuildTestPod("p1", 100, 0, n1NodeName, func(pod *v1.Pod) {
+							test.SetRSOwnerRef(pod)
+							test.SetPodExtendedResourceRequest(pod, extendedResource, 1)
+						}),
+						*test.BuildTestPod("p2", 100, 0, n1NodeName, func(pod *v1.Pod) {
+							test.SetRSOwnerRef(pod)
+							test.SetPodExtendedResourceRequest(pod, extendedResource, 1)
+						}),
+					},
+				},
+				n2NodeName: {
+					Items: []v1.Pod{
+						*test.BuildTestPod("p3", 500, 0, n2NodeName, test.SetRSOwnerRef),
+						*test.BuildTestPod("p4", 500, 0, n2NodeName, test.SetRSOwnerRef),
+						*test.BuildTestPod("p5", 500, 0, n2NodeName, test.SetRSOwnerRef),
+						*test.BuildTestPod("p6", 500, 0, n2NodeName, test.SetRSOwnerRef),
+					},
+				},
+				n3NodeName: {
+					Items: []v1.Pod{},
+				},
+			},
+			maxPodsToEvictPerNode: 0,
+			expectedPodsEvicted:   0,
+		},
+	}
+
+	for _, test := range testCases {
+		t.Run(test.name, func(t *testing.T) {
+			fakeClient := &fake.Clientset{}
+			fakeClient.Fake.AddReactor("list", "pods", func(action core.Action) (bool, runtime.Object, error) {
+				list := action.(core.ListAction)
+				fieldString := list.GetListRestrictions().Fields.String()
+				if strings.Contains(fieldString, n1NodeName) {
+					return true, test.pods[n1NodeName], nil
+				}
+				if strings.Contains(fieldString, n2NodeName) {
+					return true, test.pods[n2NodeName], nil
+				}
+				if strings.Contains(fieldString, n3NodeName) {
+					return true, test.pods[n3NodeName], nil
+				}
+				return true, nil, fmt.Errorf("Failed to list: %v", list)
+			})
+			fakeClient.Fake.AddReactor("get", "nodes", func(action core.Action) (bool, runtime.Object, error) {
+				getAction := action.(core.GetAction)
+				if node, exists := test.nodes[getAction.GetName()]; exists {
+					return true, node, nil
+				}
+				return true, nil, fmt.Errorf("Wrong node: %v", getAction.GetName())
+			})
+			podsForEviction := make(map[string]struct{})
+			for _, pod := range test.evictedPods {
+				podsForEviction[pod] = struct{}{}
+			}
+
+			evictionFailed := false
+			if len(test.evictedPods) > 0 {
+				fakeClient.Fake.AddReactor("create", "pods", func(action core.Action) (bool, runtime.Object, error) {
+					getAction := action.(core.CreateAction)
+					obj := getAction.GetObject()
+					if eviction, ok := obj.(*v1beta1.Eviction); ok {
+						if _, exists := podsForEviction[eviction.Name]; exists {
+							return true, obj, nil
+						}
+						evictionFailed = true
+						return true, nil, fmt.Errorf("pod %q was unexpectedly evicted", eviction.Name)
+					}
+					return true, obj, nil
+				})
+			}
+
+			var nodes []*v1.Node
+			for _, node := range test.nodes {
+				nodes = append(nodes, node)
+			}
+
+			podEvictor := evictions.NewPodEvictor(
+				fakeClient,
+				"v1",
+				false,
+				test.maxPodsToEvictPerNode,
+				nodes,
+				false,
+				false,
+				false,
+			)
+
+			strategy := api.DeschedulerStrategy{
+				Enabled: true,
+				Params: &api.StrategyParameters{
+					NodeResourceUtilizationThresholds: &api.NodeResourceUtilizationThresholds{
+						Thresholds: test.thresholds,
+					},
+				},
+			}
+			HighNodeUtilization(ctx, fakeClient, strategy, nodes, podEvictor)
+
+			podsEvicted := podEvictor.TotalEvicted()
+			if test.expectedPodsEvicted != podsEvicted {
+				t.Errorf("Expected %#v pods to be evicted but %#v got evicted", test.expectedPodsEvicted, podsEvicted)
+			}
+			if evictionFailed {
+				t.Errorf("Pod evictions failed unexpectedly")
+			}
+		})
+	}
+}
+
+func TestValidateHighNodeUtilizationStrategyConfig(t *testing.T) {
+	tests := []struct {
+		name             string
+		thresholds       api.ResourceThresholds
+		targetThresholds api.ResourceThresholds
+		errInfo          error
+	}{
+		{
+			name: "passing target thresholds",
+			thresholds: api.ResourceThresholds{
+				v1.ResourceCPU:    20,
+				v1.ResourceMemory: 20,
+			},
+			targetThresholds: api.ResourceThresholds{
+				v1.ResourceCPU:    80,
+				v1.ResourceMemory: 80,
+			},
+			errInfo: fmt.Errorf("targetThresholds is not applicable for HighNodeUtilization"),
+		},
+		{
+			name:       "passing empty thresholds",
+			thresholds: api.ResourceThresholds{},
+			errInfo:    fmt.Errorf("thresholds config is not valid: no resource threshold is configured"),
+		},
+		{
+			name: "passing invalid thresholds",
+			thresholds: api.ResourceThresholds{
+				v1.ResourceCPU:    80,
+				v1.ResourceMemory: 120,
+			},
+			errInfo: fmt.Errorf("thresholds config is not valid: %v", fmt.Errorf(
+				"%v threshold not in [%v, %v] range", v1.ResourceMemory, MinResourcePercentage, MaxResourcePercentage)),
+		},
+		{
+			name: "passing valid strategy config",
+			thresholds: api.ResourceThresholds{
+				v1.ResourceCPU:    80,
+				v1.ResourceMemory: 80,
+			},
+			errInfo: nil,
+		},
+		{
+			name: "passing valid strategy config with extended resource",
+			thresholds: api.ResourceThresholds{
+				v1.ResourceCPU:    80,
+				v1.ResourceMemory: 80,
+				extendedResource:  80,
+			},
+			errInfo: nil,
+		},
+	}
+
+	for _, testCase := range tests {
+		t.Run(testCase.name, func(t *testing.T) {
+			validateErr := validateHighUtilizationStrategyConfig(testCase.thresholds, testCase.targetThresholds)
+
+			// Compare directly when either side is nil; otherwise compare the error messages.
+			if validateErr == nil || testCase.errInfo == nil {
+				if validateErr != testCase.errInfo {
+					t.Errorf("expected validity of strategy config: thresholds %#v targetThresholds %#v to be %v but got %v instead",
+						testCase.thresholds, testCase.targetThresholds, testCase.errInfo, validateErr)
+				}
+			} else if validateErr.Error() != testCase.errInfo.Error() {
+				t.Errorf("expected validity of strategy config: thresholds %#v targetThresholds %#v to be %v but got %v instead",
+					testCase.thresholds, testCase.targetThresholds, testCase.errInfo, validateErr)
+			}
+		})
+	}
+}
+
+func TestHighNodeUtilizationWithTaints(t *testing.T) {
+	ctx := context.Background()
+	strategy := api.DeschedulerStrategy{
+		Enabled: true,
+		Params: &api.StrategyParameters{
+			NodeResourceUtilizationThresholds: &api.NodeResourceUtilizationThresholds{
+				Thresholds: api.ResourceThresholds{
+					v1.ResourceCPU: 40,
+				},
+			},
+		},
+	}
+
+	n1 := test.BuildTestNode("n1", 1000, 3000, 10, nil)
+	n2 := test.BuildTestNode("n2", 1000, 3000, 10, nil)
+	n3 := test.BuildTestNode("n3", 1000, 3000, 10, nil)
+	n3withTaints := n3.DeepCopy()
+	n3withTaints.Spec.Taints = []v1.Taint{
+		{
+			Key:    "key",
+			Value:  "value",
+			Effect: v1.TaintEffectNoSchedule,
+		},
+	}
+
+	podThatToleratesTaint := test.BuildTestPod("tolerate_pod", 200, 0, n1.Name, test.SetRSOwnerRef)
+	podThatToleratesTaint.Spec.Tolerations = []v1.Toleration{
+		{
+			Key:   "key",
+			Value: "value",
+		},
+	}
+
+	tests := []struct {
+		name              string
+		nodes             []*v1.Node
+		pods              []*v1.Pod
+		evictionsExpected int
+	}{
+		{
+			name:  "No taints",
+			nodes: []*v1.Node{n1, n2, n3},
+			pods: []*v1.Pod{
+				// Node 1 pods
+				test.BuildTestPod(fmt.Sprintf("pod_1_%s", n1.Name), 200, 0, n1.Name, test.SetRSOwnerRef),
+				test.BuildTestPod(fmt.Sprintf("pod_2_%s", n1.Name), 200, 0, n1.Name, test.SetRSOwnerRef),
+				test.BuildTestPod(fmt.Sprintf("pod_3_%s", n1.Name), 200, 0, n1.Name, test.SetRSOwnerRef),
+				// Node 2 pods
+				test.BuildTestPod(fmt.Sprintf("pod_4_%s", n2.Name), 200, 0, n2.Name, test.SetRSOwnerRef),
+			},
+			evictionsExpected: 1,
+		},
+		{
+			name:  "No pod tolerates node taint",
+			nodes: []*v1.Node{n1, n3withTaints},
+			pods: []*v1.Pod{
+				// Node 1 pods
+				test.BuildTestPod(fmt.Sprintf("pod_1_%s", n1.Name), 200, 0, n1.Name, test.SetRSOwnerRef),
+				// Node 3 pods
+				test.BuildTestPod(fmt.Sprintf("pod_2_%s", n3withTaints.Name), 200, 0, n3withTaints.Name, test.SetRSOwnerRef),
+			},
+			evictionsExpected: 0,
+		},
+		{
+			name:  "Pod which tolerates node taint",
+			nodes: []*v1.Node{n1, n3withTaints},
+			pods: []*v1.Pod{
+				// Node 1 pods
+				test.BuildTestPod(fmt.Sprintf("pod_1_%s", n1.Name), 100, 0, n1.Name, test.SetRSOwnerRef),
+				podThatToleratesTaint,
+				// Node 3 pods
+				test.BuildTestPod(fmt.Sprintf("pod_9_%s", n3withTaints.Name), 500, 0, n3withTaints.Name, test.SetRSOwnerRef),
+			},
+			evictionsExpected: 1,
+		},
+	}
+
+	for _, item := range tests {
+		t.Run(item.name, func(t *testing.T) {
+			var objs []runtime.Object
+			for _, node := range item.nodes {
+				objs = append(objs, node)
+			}
+
+			for _, pod := range item.pods {
+				objs = append(objs, pod)
+			}
+
+			fakeClient := fake.NewSimpleClientset(objs...)
+
+			podEvictor := evictions.NewPodEvictor(
+				fakeClient,
+				"policy/v1",
+				false,
+				item.evictionsExpected,
+				item.nodes,
+				false,
+				false,
+				false,
+			)
+
+			HighNodeUtilization(ctx, fakeClient, strategy, item.nodes, podEvictor)
+
+			if item.evictionsExpected != podEvictor.TotalEvicted() {
+				t.Errorf("Expected %v evictions, got %v", item.evictionsExpected, podEvictor.TotalEvicted())
+			}
+		})
+	}
+}