CPU isolation in Red Hat OpenShift Container Platform
Two complementary features allow administrators to partition a node's CPUs according to their needs. The first is the Kubernetes Static CPU Manager feature. The second is the OpenShift-only Management workload partitioning feature.
Static CPU manager
The static CPU manager partitions the node into system-reserved CPUs (for the kubelet and system services) and non-system-reserved CPUs.
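As an illustration, on a hypothetical node with 8 CPUs where CPUs 0-1 are set aside for the system, the split looks roughly like this (the CPU numbers are assumptions for the example):

```
system-reserved CPUs:     0-1   (kubelet and system services)
non-system-reserved CPUs: 2-7   (available to pods)
```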
To achieve this separation, administrators can configure a PerformanceProfile.
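A minimal sketch of such a profile, assuming CPUs 0-1 are reserved for the system and CPUs 2-7 are isolated (the profile name, CPU ranges, and node selector are illustrative):

```yaml
apiVersion: performance.openshift.io/v2
kind: PerformanceProfile
metadata:
  name: cpu-isolation            # illustrative name
spec:
  cpu:
    reserved: "0-1"              # system-reserved CPUs
    isolated: "2-7"              # non-system-reserved CPUs
  nodeSelector:
    node-role.kubernetes.io/worker: ""
```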
The Node Tuning Operator will then push a kubelet configuration along the lines of the fragment below, in addition to configuring operating-system-wide CPU isolation with systemd configuration and kernel command-line parameters.
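A sketch of the relevant kubelet configuration fields, again assuming CPUs 0-1 are reserved (the reconcile period value is an assumption):

```yaml
cpuManagerPolicy: static
cpuManagerReconcilePeriod: 5s    # illustrative value
reservedSystemCPUs: "0-1"
```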
With the static CPU manager in place, normal pods run in the shared CPU pool, which initially spans all of the node's CPUs.
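Continuing the illustrative 8-CPU example, the initial state looks roughly like this:

```
shared CPU pool: 0-7   (all pods, kubelet, and system services)
exclusive CPUs:  none
```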
As exclusive CPUs are requested (by pods in the Guaranteed QoS class with integer CPU requests), CPUs are taken in ascending order from the non-system-reserved part of the shared pool and moved to the list of exclusive CPUs.
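For example, after a Guaranteed pod requests 2 exclusive CPUs (illustrative numbers):

```
shared CPU pool: 0-1,4-7
exclusive CPUs:  2-3   (pinned to the requesting container)
```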
Eventually, if all available CPUs have been allocated exclusively, the shared pool consists of only the system-reserved CPUs, while the exclusive CPUs take up all of the non-system-reserved CPUs.
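In the illustrative example, that end state would be:

```
shared CPU pool: 0-1   (system-reserved CPUs only)
exclusive CPUs:  2-7
```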
Management workload partitioning
In small clusters, and especially in single-node deployments, administrators might want additional control over how their nodes are partitioned. Management workload partitioning allows management workloads in OpenShift clusters to run exclusively on the system-reserved CPUs. This way, these workloads are isolated from non-management workloads, which run exclusively on the non-system-reserved CPUs.
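Sticking with the illustrative 8-CPU node, the resulting layout would look roughly like this:

```
system-reserved CPUs (0-1):     kubelet, CRI-O, and OpenShift management pods
non-system-reserved CPUs (2-7): non-management (application) workloads
```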
According to the enhancement proposal for Management workload partitioning:
We want to define "management workloads" in a flexible way. (...) "management workloads" include all OpenShift core components necessary to run the cluster, any add-on operators necessary to make up the "platform" as defined by telco customers, and operators or other components from third-party vendors that the customer deems as management rather than operational. It is important to note that not all of those components will be delivered as part of the OpenShift payload and some may be written by the customer or by vendors who are not our partners. Therefore, while this feature will be released as Tech Preview initially with limited formal support, the APIs described are not internal or otherwise private to OpenShift.
At the time of this writing, management workload partitioning can only be enabled during cluster installation by setting cpuPartitioningMode: AllNodes in the install-config.yaml file. See the workload partitioning documentation for further details. For the reasoning behind this design, you can also have a look at the enhancement proposal for Wide Availability Workload Partitioning.
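A minimal sketch of the relevant install-config.yaml fragment (all other installation fields are omitted, and the cluster name is illustrative):

```yaml
apiVersion: v1
metadata:
  name: example-cluster          # illustrative name
cpuPartitioningMode: AllNodes
```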
After cluster installation, create a PerformanceProfile, the same as the one shown earlier.
The PerformanceProfile will create the exact same system-level CPU isolation and will configure the kubelet to use the static CPU manager, as in the earlier example. However, it will also configure the API server, the kubelet, and CRI-O for workload partitioning.
Once the cluster is installed with cpuPartitioningMode: AllNodes and configured for CPU isolation with a PerformanceProfile, two requirements must be fulfilled for a pod to be scheduled onto the system-reserved CPUs (both are illustrated in the sketch after this list):
- The namespace must be annotated by an admin or a privileged user with workload.openshift.io/allowed: management. This requirement makes sure that normal users cannot enable the feature without administrator consent.
- The pod must opt in to being a management workload via the annotation target.workload.openshift.io/management: {"effect": "PreferredDuringScheduling"}.
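A minimal sketch of the two annotations in place (the namespace, pod, and image names are illustrative):

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: example-management-ns          # illustrative name
  annotations:
    workload.openshift.io/allowed: management
---
apiVersion: v1
kind: Pod
metadata:
  name: example-management-pod         # illustrative name
  namespace: example-management-ns
  annotations:
    target.workload.openshift.io/management: '{"effect": "PreferredDuringScheduling"}'
spec:
  containers:
  - name: app
    image: registry.example.com/app:latest   # illustrative image reference
    resources:
      requests:
        cpu: 100m
        memory: 100Mi
      limits:
        cpu: 100m
        memory: 100Mi
```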