OCP 4.5 Infra nodes with MachineSets without worker label

How to create infra nodes with MachineSets without using the worker label

https://docs.openshift.com/container-platform/4.5/machine_management/creating-infrastructure-machinesets.html describes how to create infra nodes with MachineSets. The inconvenience, however, is that these infra nodes show up with the roles "infra,worker". How can one avoid the "worker" label?

The worker label is added by the kubelet itself, as part of --node-labels=node-role.kubernetes.io/worker,node.openshift.io/os_id=${ID} in its systemd unit. Removing it therefore requires a custom systemd drop-in for kubelet.service.
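
To see where this flag comes from on an existing worker node, the current unit (including its drop-ins) can be inspected. This is a sketch; substitute one of your own worker node names:

oc debug node/<worker-node-name> -- chroot /host systemctl cat kubelet.service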

Step 0: Switch to project openshift-machine-api

Switch to the openshift-machine-api project:

oc project openshift-machine-api

Step 1: Create a kubelet configuration drop-in for infra nodes

To change the node labels applied at boot, a MachineConfig needs to be created, because the labels are set when the kubelet is started with:

                --node-labels=node-role.kubernetes.io/worker,node.openshift.io/os_id=${ID} \

To override the kubelet unit, create 03-infra-kubelet.yaml:

cat <<'EOF' > 03-infra-kubelet.yaml
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: infra
  name: 03-infra-kubelet
spec:
  config:
    ignition:
      config: {}
      security:
        tls: {}
      timeouts: {}
      version: 2.2.0
    systemd:
      units:
        - name: kubelet.service
          dropins:
            - name: 20-change-label.conf
              contents: |
                [Service]
                ExecStart=
                ExecStart=/usr/bin/hyperkube \
                    kubelet \
                      --config=/etc/kubernetes/kubelet.conf \
                      --bootstrap-kubeconfig=/etc/kubernetes/kubeconfig \
                      --kubeconfig=/var/lib/kubelet/kubeconfig \
                      --container-runtime=remote \
                      --container-runtime-endpoint=/var/run/crio/crio.sock \
                      --runtime-cgroups=/system.slice/crio.service \
                      --node-labels=node-role.kubernetes.io/infra,node.openshift.io/os_id=${ID} \
                      --minimum-container-ttl-duration=6m0s \
                      --volume-plugin-dir=/etc/kubernetes/kubelet-plugins/volume/exec \
                      --cloud-provider=aws \
                       \
                      --pod-infra-container-image=quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:dc0fe885d41cc4029caa3feacf71343806c81c8123abc91db90dc0e555fa5636 \
                      --v=${KUBELET_LOG_LEVEL}
EOF

$ oc apply -f 03-infra-kubelet.yaml
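
Optionally, verify that the MachineConfig was created (nothing will be rendered from it until the infra MachineConfigPool from Step 3 exists):

oc get machineconfig 03-infra-kubelet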

Step 2: Create infra secret

Create new user data for the infra nodes. The following takes the user data from the worker-user-data secret and replaces every occurrence of 'worker' with 'infra':

oc get secret -n openshift-machine-api worker-user-data -o yaml | awk '/userData:/ {print $2}' | base64 -d | sed 's/worker/infra/g' | base64 -w0 > userdata.base64 
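
Before building the secret, it can be worth decoding the result to confirm that the Ignition config now points at the infra endpoint (a quick check, assuming jq is installed):

base64 -d userdata.base64 | jq '.ignition.config.append[0].source'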

Then, create a new secret:

cat << EOF > infra-user-data.yaml
apiVersion: v1
data:
  disableTemplating: dHJ1ZQo=
  userData: $(cat userdata.base64)  
kind: Secret
metadata:
  name: infra-user-data
  namespace: openshift-machine-api
type: Opaque
EOF
oc apply -f infra-user-data.yaml

Verify:

$ oc get secret -n openshift-machine-api infra-user-data -o yaml | awk '/userData:/ {print $2}' | base64 -d | jq '.ignition.config.append[0].source'
"https://<cluster URL>:22623/config/infra"

Step 3: Create a machine configuration pool for infra

Create a machine configuration pool. For simplicity, this pool selects all machine configurations for worker nodes as well as those for infra nodes. The selected machine configurations are merged in alphanumeric order by name, which is why it is important to name the kubelet machine configuration 03-infra-kubelet so that it sorts after 01-worker-kubelet (see above):

cat <<'EOF' > infra-mcp.yaml
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfigPool
metadata:
  name: infra
spec:
  machineConfigSelector:
    matchExpressions:
    - key: machineconfiguration.openshift.io/role
      operator: In
      values:
      - worker
      - infra
  maxUnavailable: 1
  nodeSelector:
    matchLabels:
      node-role.kubernetes.io/infra: ""
  paused: false
EOF

Apply:

oc apply -f infra-mcp.yaml
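
Optionally, confirm that the pool exists and that the custom kubelet configuration is part of its rendered configuration (the jsonpath below is a sketch; oc describe machineconfigpool infra shows the same information):

oc get machineconfigpool infra
oc get machineconfigpool infra -o jsonpath='{.spec.configuration.source[*].name}{"\n"}'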

Step 4: Create a MachineSet for the new infra nodes

First, create a new machineset: https://docs.openshift.com/container-platform/4.5/machine_management/creating-infrastructure-machinesets.html#machineset-yaml-aws_creating-infrastructure-machinesets

Look at an existing machineset for how to fill in some of the blank values:

[akaris@linux infra-nodes]$ oc get machinesets -A
NAMESPACE               NAME                              DESIRED   CURRENT   READY   AVAILABLE   AGE
openshift-machine-api   akaris2-4kpk4-worker-eu-west-3a   1         1         1       1           112m
openshift-machine-api   akaris2-4kpk4-worker-eu-west-3b   1         1         1       1           112m
openshift-machine-api   akaris2-4kpk4-worker-eu-west-3c   1         1         1       1           112m

$ oc get machineset -n openshift-machine-api akaris2-4kpk4-worker-eu-west-3a -o yaml > akaris2-4kpk4-infra-eu-west-3a.yaml

Clean up the file and modify it as instructed in https://docs.openshift.com/container-platform/4.5/machine_management/creating-infrastructure-machinesets.html#machineset-yaml-aws_creating-infrastructure-machinesets; a sketch of the cleanup follows below.
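
One way to start the cleanup of the exported file (a sketch; the manual edits listed in the comments still apply):

# drop the status block from the exported YAML
sed -i '/^status:/,$d' akaris2-4kpk4-infra-eu-west-3a.yaml
# then edit the file by hand: remove metadata.uid, metadata.resourceVersion,
# metadata.generation and metadata.creationTimestamp, rename 'worker' to 'infra'
# in the MachineSet name and in the machine.openshift.io/* labels, add the
# node-role.kubernetes.io/infra node label, and set userDataSecret to
# infra-user-data (but keep the -worker-profile IAM instance profile and the
# -worker-sg security group, as shown below).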

The example from the documentation will contain 'worker' in:

          iamInstanceProfile:
            id: <infrastructureID>-worker-profile 

          securityGroups:
            - filters:
                - name: tag:Name
                  values:
                    - <infrastructureID>-worker-sg 
          userDataSecret:
            name: worker-user-data

Make sure to change the userDataSecret to infra-user-data, contrary to what the documentation says:

          userDataSecret:
            name: infra-user-data

The MachineSet after modification (in this case, 3 replicas are requested right away):

$ cat akaris2-4kpk4-infra-eu-west-3a.yaml
apiVersion: machine.openshift.io/v1beta1
kind: MachineSet
metadata:
  labels:
    machine.openshift.io/cluster-api-cluster: akaris2-4kpk4
  name: akaris2-4kpk4-infra-eu-west-3a
  namespace: openshift-machine-api
spec:
  replicas: 3
  selector:
    matchLabels:
      machine.openshift.io/cluster-api-cluster: akaris2-4kpk4
      machine.openshift.io/cluster-api-machineset: akaris2-4kpk4-infra-eu-west-3a
  template:
    metadata:
      labels:
        machine.openshift.io/cluster-api-cluster: akaris2-4kpk4
        machine.openshift.io/cluster-api-machine-role: infra
        machine.openshift.io/cluster-api-machine-type: infra
        machine.openshift.io/cluster-api-machineset: akaris2-4kpk4-infra-eu-west-3a
    spec:
      metadata:
        labels:
          node-role.kubernetes.io/infra: ""
      providerSpec:
        value:
          ami:
            id: ami-(...)
          apiVersion: awsproviderconfig.openshift.io/v1beta1
          blockDevices:
          - ebs:
              iops: 0
              volumeSize: 120
              volumeType: gp2
          credentialsSecret:
            name: aws-cloud-credentials
          deviceIndex: 0
          iamInstanceProfile:
            id: akaris2-4kpk4-worker-profile
          instanceType: m5.large
          kind: AWSMachineProviderConfig
          metadata:
            creationTimestamp: null
          placement:
            availabilityZone: eu-west-3a
            region: eu-west-3
          publicIp: null
          securityGroups:
          - filters:
            - name: tag:Name
              values:
              - akaris2-4kpk4-worker-sg
          subnet:
            filters:
            - name: tag:Name
              values:
              - akaris2-4kpk4-private-eu-west-3a
          tags:
          - name: kubernetes.io/cluster/akaris2-4kpk4
            value: owned
          userDataSecret:
            name: infra-user-data

Apply the MachineSet:

$ oc apply -f akaris2-4kpk4-infra-eu-west-3a.yaml

Step 5: Verify Machine creation

Monitor the machine configuration pools, machines, and nodes right after creation of the MachineSet:

[akaris@linux infra-nodes]$ oc get machineconfigpool
NAME     CONFIG                                             UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
infra    rendered-infra-d63625ee2cdc4637cc156a1ebf25e926    True      False      False      0              0                   0                     0                      4m9s
master   rendered-master-8d007e81b9ebb00b4042722f2713f1ae   True      False      False      3              3                   3                     0                      116m
worker   rendered-worker-9d6647aa43752e0eb71a221dd8b7e32e   True      False      False      3              3                   3                     0                      116m
[akaris@linux infra-nodes]$ oc get machines
NAME                                    PHASE         TYPE        REGION      ZONE         AGE
akaris2-f74ht-infra-eu-west-3a-fctd5    Provisioned   m5.large    eu-west-3   eu-west-3a   49s
akaris2-f74ht-infra-eu-west-3a-xtpcv    Provisioned   m5.large    eu-west-3   eu-west-3a   49s
akaris2-f74ht-infra-eu-west-3a-zmvzh    Provisioned   m5.large    eu-west-3   eu-west-3a   49s
akaris2-f74ht-master-0                  Running       m5.xlarge   eu-west-3   eu-west-3a   119m
akaris2-f74ht-master-1                  Running       m5.xlarge   eu-west-3   eu-west-3b   119m
akaris2-f74ht-master-2                  Running       m5.xlarge   eu-west-3   eu-west-3c   119m
akaris2-f74ht-worker-eu-west-3a-4krrd   Running       m5.large    eu-west-3   eu-west-3a   111m
akaris2-f74ht-worker-eu-west-3b-59z6n   Running       m5.large    eu-west-3   eu-west-3b   111m
akaris2-f74ht-worker-eu-west-3c-chst9   Running       m5.large    eu-west-3   eu-west-3c   111m
[akaris@linux infra-nodes]$ oc get machineset
NAME                              DESIRED   CURRENT   READY   AVAILABLE   AGE
akaris2-f74ht-infra-eu-west-3a    3         3                             58s
akaris2-f74ht-worker-eu-west-3a   1         1         1       1           119m
akaris2-f74ht-worker-eu-west-3b   1         1         1       1           119m
akaris2-f74ht-worker-eu-west-3c   1         1         1       1           119m
[akaris@linux infra-nodes]$ oc get nodes
NAME                                         STATUS   ROLES    AGE    VERSION
ip-10-0-148-174.eu-west-3.compute.internal   Ready    worker   105m   v1.17.1+b83bc57
ip-10-0-159-35.eu-west-3.compute.internal    Ready    master   117m   v1.17.1+b83bc57
ip-10-0-168-64.eu-west-3.compute.internal    Ready    master   118m   v1.17.1+b83bc57
ip-10-0-191-238.eu-west-3.compute.internal   Ready    worker   105m   v1.17.1+b83bc57
ip-10-0-208-80.eu-west-3.compute.internal    Ready    master   117m   v1.17.1+b83bc57
ip-10-0-220-32.eu-west-3.compute.internal    Ready    worker   105m   v1.17.1+b83bc57

Wait until the machines are Running and the new nodes have joined, then check again:

[akaris@linux infra-nodes]$ oc get machineconfigpool
NAME     CONFIG                                             UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
infra    rendered-infra-d63625ee2cdc4637cc156a1ebf25e926    True      False      False      3              3                   3                     0                      10m
master   rendered-master-8d007e81b9ebb00b4042722f2713f1ae   True      False      False      3              3                   3                     0                      122m
worker   rendered-worker-9d6647aa43752e0eb71a221dd8b7e32e   True      False      False      3              3                   3                     0                      122m
[akaris@linux infra-nodes]$ oc get machineset
NAME                              DESIRED   CURRENT   READY   AVAILABLE   AGE
akaris2-f74ht-infra-eu-west-3a    3         3         3       3           7m15s
akaris2-f74ht-worker-eu-west-3a   1         1         1       1           125m
akaris2-f74ht-worker-eu-west-3b   1         1         1       1           125m
akaris2-f74ht-worker-eu-west-3c   1         1         1       1           125m
[akaris@linux infra-nodes]$ oc get machines
NAME                                    PHASE     TYPE        REGION      ZONE         AGE
akaris2-f74ht-infra-eu-west-3a-fctd5    Running   m5.large    eu-west-3   eu-west-3a   7m17s
akaris2-f74ht-infra-eu-west-3a-xtpcv    Running   m5.large    eu-west-3   eu-west-3a   7m17s
akaris2-f74ht-infra-eu-west-3a-zmvzh    Running   m5.large    eu-west-3   eu-west-3a   7m17s
akaris2-f74ht-master-0                  Running   m5.xlarge   eu-west-3   eu-west-3a   125m
akaris2-f74ht-master-1                  Running   m5.xlarge   eu-west-3   eu-west-3b   125m
akaris2-f74ht-master-2                  Running   m5.xlarge   eu-west-3   eu-west-3c   125m
akaris2-f74ht-worker-eu-west-3a-4krrd   Running   m5.large    eu-west-3   eu-west-3a   117m
akaris2-f74ht-worker-eu-west-3b-59z6n   Running   m5.large    eu-west-3   eu-west-3b   117m
akaris2-f74ht-worker-eu-west-3c-chst9   Running   m5.large    eu-west-3   eu-west-3c   117m
[akaris@linux infra-nodes]$ oc get nodes
NAME                                         STATUS   ROLES    AGE     VERSION
ip-10-0-128-80.eu-west-3.compute.internal    Ready    infra    2m39s   v1.17.1+b83bc57
ip-10-0-145-35.eu-west-3.compute.internal    Ready    infra    2m43s   v1.17.1+b83bc57
ip-10-0-148-174.eu-west-3.compute.internal   Ready    worker   111m    v1.17.1+b83bc57
ip-10-0-159-154.eu-west-3.compute.internal   Ready    infra    2m7s    v1.17.1+b83bc57
ip-10-0-159-35.eu-west-3.compute.internal    Ready    master   124m    v1.17.1+b83bc57
ip-10-0-168-64.eu-west-3.compute.internal    Ready    master   124m    v1.17.1+b83bc57
ip-10-0-191-238.eu-west-3.compute.internal   Ready    worker   111m    v1.17.1+b83bc57
ip-10-0-208-80.eu-west-3.compute.internal    Ready    master   124m    v1.17.1+b83bc57
ip-10-0-220-32.eu-west-3.compute.internal    Ready    worker   111m    v1.17.1+b83bc57

Step 6: Move resources to the infra nodes

The following steps are from: https://github.com/lbohnsac/OCP4/tree/master/infrastructure-node-setup

Also see steps from: https://docs.openshift.com/container-platform/4.5/machine_management/creating-infrastructure-machinesets.html

Move the router bits

Move the internal routers to the infra nodes

oc patch ingresscontrollers.operator.openshift.io default -n openshift-ingress-operator --type=merge \
  --patch '{"spec":{"nodePlacement":{"nodeSelector":{"matchLabels":{"node-role.kubernetes.io/infra":""}}}}}'

Define 3 router replicas (the default is 2):

oc patch ingresscontrollers.operator.openshift.io default -n openshift-ingress-operator --type=merge \
  --patch '{"spec":{"replicas": 3}}'

Move the registry bits

Move the internal image registry to the infra nodes

oc patch configs.imageregistry.operator.openshift.io cluster --type=merge \
  --patch '{"spec":{"nodeSelector":{"node-role.kubernetes.io/infra": ""}}}'

Move the registry image pruner (available since 4.4) to the infra nodes

oc patch imagepruners.imageregistry.operator.openshift.io cluster --type=merge \
  --patch '{"spec":{"nodeSelector": {"node-role.kubernetes.io/infra": ""}}}'

Move the logging bits

Move the Kibana pod to the infra nodes

oc patch clusterloggings.logging.openshift.io instance -n openshift-logging --type=merge \
  --patch '{"spec":{"visualization":{"kibana":{"nodeSelector":{"node-role.kubernetes.io/infra":""}}}}}'

Move the curator pod to the infra nodes

oc patch clusterloggings.logging.openshift.io instance -n openshift-logging --type=merge \
  --patch '{"spec":{"curation":{"curator":{"nodeSelector":{"node-role.kubernetes.io/infra":""}}}}}'

Move the Elasticsearch pods to the infra nodes

oc patch clusterloggings.logging.openshift.io instance -n openshift-logging --type=merge \
  --patch '{"spec":{"logStore":{"elasticsearch":{"nodeSelector":{"node-role.kubernetes.io/infra":""}}}}}'

Issues

The inconvenience with this approach is that the ExecStart in the drop-in is now hardcoded and will not be updated by the Machine Config Operator upon upgrade.
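
After each upgrade it is therefore worth comparing the generated worker kubelet unit against the custom drop-in and porting over any changed flags, for example (a sketch, relying on the ExecStart appearing verbatim in the MachineConfig YAML):

oc get machineconfig 01-worker-kubelet -o yaml | grep -A 30 'ExecStart=/usr/bin/hyperkube'
oc get machineconfig 03-infra-kubelet -o yaml | grep -A 30 'ExecStart=/usr/bin/hyperkube'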

Step 7: Test upgrades

[akaris@linux infra-nodes]$ oc get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.4.16    True        False         123m    Cluster version is 4.4.16
[akaris@linux infra-nodes]$ oc adm upgrade
Cluster version is 4.4.16

No updates available. You may force an upgrade to a specific release image, but doing so may not be supported and result in downtime or data loss.

#############################
# set to 4.5-stable channel
#############################
[akaris@linux infra-nodes]$ oc edit clusterversion
clusterversion.config.openshift.io/version edited


[akaris@linux infra-nodes]$ oc adm upgrade
Cluster version is 4.4.16

Updates:

VERSION IMAGE
4.5.5   quay.io/openshift-release-dev/ocp-release@sha256:a58573e1c92f5258219022ec104ec254ded0a70370ee8ed2aceea52525639bd4
[akaris@linux infra-nodes]$ oc adm upgrade --to-latest
Updating to latest version 4.5.5

The upgrade succeeds:

[akaris@linux ipi]$ export KUBECONFIG=/home/akaris/cases/02729138/ipi/install-config/auth/kubeconfig
[akaris@linux ipi]$ oc get nodes
NAME                                         STATUS   ROLES    AGE   VERSION
ip-10-0-128-80.eu-west-3.compute.internal    Ready    infra    17h   v1.18.3+08c38ef
ip-10-0-145-35.eu-west-3.compute.internal    Ready    infra    17h   v1.18.3+08c38ef
ip-10-0-148-174.eu-west-3.compute.internal   Ready    worker   19h   v1.18.3+08c38ef
ip-10-0-159-154.eu-west-3.compute.internal   Ready    infra    17h   v1.18.3+08c38ef
ip-10-0-159-35.eu-west-3.compute.internal    Ready    master   19h   v1.18.3+08c38ef
ip-10-0-168-64.eu-west-3.compute.internal    Ready    master   19h   v1.18.3+08c38ef
ip-10-0-191-238.eu-west-3.compute.internal   Ready    worker   19h   v1.18.3+08c38ef
ip-10-0-208-80.eu-west-3.compute.internal    Ready    master   19h   v1.18.3+08c38ef
ip-10-0-220-32.eu-west-3.compute.internal    Ready    worker   19h   v1.18.3+08c38ef
[akaris@linux ipi]$ oc get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.5.5     True        False         16h     Cluster version is 4.5.5
[akaris@linux ipi]$ oc get machinesets
NAME                              DESIRED   CURRENT   READY   AVAILABLE   AGE
akaris2-f74ht-infra-eu-west-3a    3         3         3       3           17h
akaris2-f74ht-worker-eu-west-3a   1         1         1       1           19h
akaris2-f74ht-worker-eu-west-3b   1         1         1       1           19h
akaris2-f74ht-worker-eu-west-3c   1         1         1       1           19h
[akaris@linux ipi]$ oc get machines
NAME                                    PHASE     TYPE        REGION      ZONE         AGE
akaris2-f74ht-infra-eu-west-3a-fctd5    Running   m5.large    eu-west-3   eu-west-3a   17h
akaris2-f74ht-infra-eu-west-3a-xtpcv    Running   m5.large    eu-west-3   eu-west-3a   17h
akaris2-f74ht-infra-eu-west-3a-zmvzh    Running   m5.large    eu-west-3   eu-west-3a   17h
akaris2-f74ht-master-0                  Running   m5.xlarge   eu-west-3   eu-west-3a   19h
akaris2-f74ht-master-1                  Running   m5.xlarge   eu-west-3   eu-west-3b   19h
akaris2-f74ht-master-2                  Running   m5.xlarge   eu-west-3   eu-west-3c   19h
akaris2-f74ht-worker-eu-west-3a-4krrd   Running   m5.large    eu-west-3   eu-west-3a   19h
akaris2-f74ht-worker-eu-west-3b-59z6n   Running   m5.large    eu-west-3   eu-west-3b   19h
akaris2-f74ht-worker-eu-west-3c-chst9   Running   m5.large    eu-west-3   eu-west-3c   19h
[akaris@linux ipi]$ oc get machineconfigpool
NAME     CONFIG                                             UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
infra    rendered-infra-aa17b9b35abf210691d268dd4c7449d3    True      False      False      3              3                   3                     0                      17h
master   rendered-master-2cc3abe4c49c6db04d9456744d0079f9   True      False      False      3              3                   3                     0                      19h
worker   rendered-worker-d65740ba72e305c96782a27b8255dd89   True      False      False      3              3                   3                     0                      19h
[akaris@linux ipi]$ oc get machineconfig
NAME                                                        GENERATEDBYCONTROLLER                      IGNITIONVERSION   AGE
00-master                                                   807abb900cf9976a1baad66eab17c6d76016e7b7   2.2.0             19h
00-worker                                                   807abb900cf9976a1baad66eab17c6d76016e7b7   2.2.0             19h
01-master-container-runtime                                 807abb900cf9976a1baad66eab17c6d76016e7b7   2.2.0             19h
01-master-kubelet                                           807abb900cf9976a1baad66eab17c6d76016e7b7   2.2.0             19h
01-worker-container-runtime                                 807abb900cf9976a1baad66eab17c6d76016e7b7   2.2.0             19h
01-worker-kubelet                                           807abb900cf9976a1baad66eab17c6d76016e7b7   2.2.0             19h
03-infra-kubelet                                                                                       2.2.0             17h
99-master-d32a1dcd-1b7b-4999-8919-3bd59c24fdae-registries   807abb900cf9976a1baad66eab17c6d76016e7b7   2.2.0             19h
99-master-ssh                                                                                          2.2.0             19h
99-worker-f569c8d2-84df-4a5b-846a-9e090b9fb68d-registries   807abb900cf9976a1baad66eab17c6d76016e7b7   2.2.0             19h
99-worker-ssh                                                                                          2.2.0             19h
rendered-infra-aa17b9b35abf210691d268dd4c7449d3             807abb900cf9976a1baad66eab17c6d76016e7b7   2.2.0             16h
rendered-infra-d63625ee2cdc4637cc156a1ebf25e926             601c2285f497bf7c73d84737b9977a0e697cb86a   2.2.0             17h
rendered-master-2cc3abe4c49c6db04d9456744d0079f9            807abb900cf9976a1baad66eab17c6d76016e7b7   2.2.0             16h
rendered-master-8d007e81b9ebb00b4042722f2713f1ae            601c2285f497bf7c73d84737b9977a0e697cb86a   2.2.0             19h
rendered-worker-9d6647aa43752e0eb71a221dd8b7e32e            601c2285f497bf7c73d84737b9977a0e697cb86a   2.2.0             19h
rendered-worker-d65740ba72e305c96782a27b8255dd89            807abb900cf9976a1baad66eab17c6d76016e7b7   2.2.0             16h
[akaris@linux ipi]$ 

The one problem is the pause container image (--pod-infra-container-image) on the infra nodes, which still points at the pre-upgrade digest:

[akaris@linux ipi]$ oc get nodes
NAME                                         STATUS   ROLES    AGE   VERSION
ip-10-0-128-80.eu-west-3.compute.internal    Ready    infra    17h   v1.18.3+08c38ef
ip-10-0-145-35.eu-west-3.compute.internal    Ready    infra    17h   v1.18.3+08c38ef
ip-10-0-148-174.eu-west-3.compute.internal   Ready    worker   19h   v1.18.3+08c38ef
ip-10-0-159-154.eu-west-3.compute.internal   Ready    infra    17h   v1.18.3+08c38ef
ip-10-0-159-35.eu-west-3.compute.internal    Ready    master   19h   v1.18.3+08c38ef
ip-10-0-168-64.eu-west-3.compute.internal    Ready    master   19h   v1.18.3+08c38ef
ip-10-0-191-238.eu-west-3.compute.internal   Ready    worker   19h   v1.18.3+08c38ef
ip-10-0-208-80.eu-west-3.compute.internal    Ready    master   19h   v1.18.3+08c38ef
ip-10-0-220-32.eu-west-3.compute.internal    Ready    worker   19h   v1.18.3+08c38ef
[akaris@linux ipi]$ oc debug node/ip-10-0-128-80.eu-west-3.compute.internal
Starting pod/ip-10-0-128-80eu-west-3computeinternal-debug ...
To use host binaries, run `chroot /host`
Pod IP: 10.0.128.80
If you don't see a command prompt, try pressing enter.
sh-4.2# chroot /host
sh-4.4# cat /etc/redhat-release 
Red Hat Enterprise Linux CoreOS release 4.5
sh-4.4# ps aux | grep kubelet
root        1422  3.3  2.0 1461696 161600 ?      Ssl  Aug19  33:46 kubelet --config=/etc/kubernetes/kubelet.conf --bootstrap-kubeconfig=/etc/kubernetes/kubeconfig --kubeconfig=/var/lib/kubelet/kubeconfig --container-runtime=remote --container-runtime-endpoint=/var/run/crio/crio.sock --runtime-cgroups=/system.slice/crio.service --node-labels=node-role.kubernetes.io/infra,node.openshift.io/os_id=rhcos --minimum-container-ttl-duration=6m0s --volume-plugin-dir=/etc/kubernetes/kubelet-plugins/volume/exec --cloud-provider=aws --pod-infra-container-image=quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:dc0fe885d41cc4029caa3feacf71343806c81c8123abc91db90dc0e555fa5636 --v=4
root     1232029  0.0  0.0   9180  1032 ?        S+   10:02   0:00 grep kubelet
sh-4.4# 
sh-4.4# grep pause /etc/crio/ -R
/etc/crio/seccomp.json:             "pause",
/etc/crio/crio.conf.d/00-default:pause_image = "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:6e8bcc9c91267283f8b0edf1af0cb0d310c54d4f73905d9bcfb2a103a664fcb0"
/etc/crio/crio.conf.d/00-default:pause_image_auth_file = "/var/lib/kubelet/config.json"
/etc/crio/crio.conf.d/00-default:pause_command = "/usr/bin/pod"
sh-4.4# 

Compare that to the version on the worker nodes:

[akaris@linux ipi]$ oc get nodes
NAME                                         STATUS   ROLES    AGE   VERSION
ip-10-0-128-80.eu-west-3.compute.internal    Ready    infra    17h   v1.18.3+08c38ef
ip-10-0-145-35.eu-west-3.compute.internal    Ready    infra    17h   v1.18.3+08c38ef
ip-10-0-148-174.eu-west-3.compute.internal   Ready    worker   19h   v1.18.3+08c38ef
ip-10-0-159-154.eu-west-3.compute.internal   Ready    infra    17h   v1.18.3+08c38ef
ip-10-0-159-35.eu-west-3.compute.internal    Ready    master   19h   v1.18.3+08c38ef
ip-10-0-168-64.eu-west-3.compute.internal    Ready    master   19h   v1.18.3+08c38ef
ip-10-0-191-238.eu-west-3.compute.internal   Ready    worker   19h   v1.18.3+08c38ef
ip-10-0-208-80.eu-west-3.compute.internal    Ready    master   19h   v1.18.3+08c38ef
ip-10-0-220-32.eu-west-3.compute.internal    Ready    worker   19h   v1.18.3+08c38ef
[akaris@linux ipi]$ oc debug node/ip-10-0-148-174.eu-west-3.compute.internal
Starting pod/ip-10-0-148-174eu-west-3computeinternal-debug ...
To use host binaries, run `chroot /host`
Pod IP: 10.0.148.174
If you don't see a command prompt, try pressing enter.
sh-4.2# chroot /host
sh-4.4# cat /etc/redhat-release
Red Hat Enterprise Linux CoreOS release 4.5
sh-4.4# ps aux | grep kubelet
root        1421  5.5  2.5 1454268 196984 ?      Ssl  Aug19  56:01 kubelet --config=/etc/kubernetes/kubelet.conf --bootstrap-kubeconfig=/etc/kubernetes/kubeconfig --kubeconfig=/var/lib/kubelet/kubeconfig --container-runtime=remote --container-runtime-endpoint=/var/run/crio/crio.sock --runtime-cgroups=/system.slice/crio.service --node-labels=node-role.kubernetes.io/worker,node.openshift.io/os_id=rhcos --minimum-container-ttl-duration=6m0s --volume-plugin-dir=/etc/kubernetes/kubelet-plugins/volume/exec --cloud-provider=aws --pod-infra-container-image=quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:6e8bcc9c91267283f8b0edf1af0cb0d310c54d4f73905d9bcfb2a103a664fcb0 --v=4
root     2138381  0.0  0.0   9180  1060 ?        S+   10:02   0:00 grep kubelet
sh-4.4# grep pause /etc/crio/ -R
/etc/crio/seccomp.json:             "pause",
/etc/crio/crio.conf.d/00-default:pause_image = "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:6e8bcc9c91267283f8b0edf1af0cb0d310c54d4f73905d9bcfb2a103a664fcb0"
/etc/crio/crio.conf.d/00-default:pause_image_auth_file = "/var/lib/kubelet/config.json"
/etc/crio/crio.conf.d/00-default:pause_command = "/usr/bin/pod"

The --pod-infra-container-image= flag selects the image for the pause container; although, as you can see above, the pause_image in crio.conf.d is always kept up to date: https://stackoverflow.com/questions/46630377/what-is-pod-infra-container-image-meant-for

The pause container, which image the --pod-infra-container flag selects, is used so that multiple containers can be launched in a pod, while sharing resources. It mostly does nothing, and unless you have a very good reason to replace it with something custom, you shouldn't. It mostly invokes the pause system call (hence its name) but it also performs the important function of having PID 1 and making sure no zombie processes are kept around.

According to https://github.com/kubernetes/kubernetes/pull/70603 (although that comment is from 2018):

The kubelet allows you to set --pod-infra-container-image
(also called PodSandboxImage in the kubelet config),
which can be a custom location to the "pause" image in the case
of Docker. Other CRIs are not supported.

And according to https://github.com/kubernetes/kubernetes/issues/86081 (from November 2019):

Kubelet flag "pod-infra-container-image" is invalid when using CRI-O

And according to the kubelet help, this flag only affects Docker and therefore does not affect us (we use CRI-O):

sh-4.4# kubelet --help | grep infra
      --pod-infra-container-image string                                                                          The image whose network/ipc namespaces containers in each pod will use. This docker-specific flag only works when container-runtime is set to docker. (default "k8s.gcr.io/pause:3.2")

As an additional step, just to bring the infra nodes in line with the workers and the rest of the cluster, we can inspect the current worker kubelet configuration post-upgrade and update the custom drop-in accordingly:

[akaris@linux ipi]$ oc get machineconfig 01-worker-kubelet -o yaml | grep pod-infra-container-image
                --pod-infra-container-image=quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:6e8bcc9c91267283f8b0edf1af0cb0d310c54d4f73905d9bcfb2a103a664fcb0 \
[akaris@linux ipi]$ oc get machineconfig 03-infra-kubelet -o yaml | grep pod-infra-container-image | grep -v apiVersion
                  --pod-infra-container-image=quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:dc0fe885d41cc4029caa3feacf71343806c81c8123abc91db90dc0e555fa5636 \
[akaris@linux ipi]$ oc edit machineconfig 03-infra-kubelet
machineconfig.machineconfiguration.openshift.io/03-infra-kubelet edited
[akaris@linux ipi]$ oc get machineconfig 03-infra-kubelet -o yaml | grep pod-infra-container-image | grep -v apiVersion
                  --pod-infra-container-image=quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:6e8bcc9c91267283f8b0edf1af0cb0d310c54d4f73905d9bcfb2a103a664fcb0 \
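
If you prefer to script this instead of using oc edit, something along the following lines could work (a sketch, not tested on every cluster layout):

# take the current pause image flag from the generated worker kubelet config ...
NEW_IMAGE=$(oc get machineconfig 01-worker-kubelet -o yaml \
  | grep -o 'pod-infra-container-image=[^ ]*' | head -n1)
# ... and substitute it into the custom infra drop-in
oc get machineconfig 03-infra-kubelet -o yaml \
  | sed "s|pod-infra-container-image=[^ ]*|${NEW_IMAGE}|" \
  | oc replace -f -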

This change will then trigger a reboot of the infra nodes:

[akaris@linux ipi]$ oc get machineconfigpool
NAME     CONFIG                                             UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
infra    rendered-infra-aa17b9b35abf210691d268dd4c7449d3    False     True       False      3              0                   0                     0                      17h
master   rendered-master-2cc3abe4c49c6db04d9456744d0079f9   True      False      False      3              3                   3                     0                      19h
worker   rendered-worker-d65740ba72e305c96782a27b8255dd89   True      False      False      3              3                   3                     0                      19h
[akaris@linux ipi]$ oc get machineconfig | grep infra
03-infra-kubelet                                                                                       2.2.0             17h
rendered-infra-aa17b9b35abf210691d268dd4c7449d3             807abb900cf9976a1baad66eab17c6d76016e7b7   2.2.0             17h
rendered-infra-d63625ee2cdc4637cc156a1ebf25e926             601c2285f497bf7c73d84737b9977a0e697cb86a   2.2.0             17h
rendered-infra-f3df01ebf7429a29582f3c46132fcf88             807abb900cf9976a1baad66eab17c6d76016e7b7   2.2.0             44s
[akaris@linux ipi]$ 

And then eventually:

[akaris@linux ipi]$ oc get nodes
NAME                                         STATUS   ROLES    AGE   VERSION
ip-10-0-128-80.eu-west-3.compute.internal    Ready    infra    17h   v1.18.3+08c38ef
ip-10-0-145-35.eu-west-3.compute.internal    Ready    infra    17h   v1.18.3+08c38ef
ip-10-0-148-174.eu-west-3.compute.internal   Ready    worker   19h   v1.18.3+08c38ef
ip-10-0-159-154.eu-west-3.compute.internal   Ready    infra    17h   v1.18.3+08c38ef
ip-10-0-159-35.eu-west-3.compute.internal    Ready    master   19h   v1.18.3+08c38ef
ip-10-0-168-64.eu-west-3.compute.internal    Ready    master   19h   v1.18.3+08c38ef
ip-10-0-191-238.eu-west-3.compute.internal   Ready    worker   19h   v1.18.3+08c38ef
ip-10-0-208-80.eu-west-3.compute.internal    Ready    master   19h   v1.18.3+08c38ef
ip-10-0-220-32.eu-west-3.compute.internal    Ready    worker   19h   v1.18.3+08c38ef
[akaris@linux ipi]$ oc get machineconfigpool
NAME     CONFIG                                             UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
infra    rendered-infra-f3df01ebf7429a29582f3c46132fcf88    True      False      False      3              3                   3                     0                      18h
master   rendered-master-2cc3abe4c49c6db04d9456744d0079f9   True      False      False      3              3                   3                     0                      19h
worker   rendered-worker-d65740ba72e305c96782a27b8255dd89   True      False      False      3              3                   3                     0                      19h
[akaris@linux ipi]$ oc get machines
NAME                                    PHASE     TYPE        REGION      ZONE         AGE
akaris2-f74ht-infra-eu-west-3a-fctd5    Running   m5.large    eu-west-3   eu-west-3a   18h
akaris2-f74ht-infra-eu-west-3a-xtpcv    Running   m5.large    eu-west-3   eu-west-3a   18h
akaris2-f74ht-infra-eu-west-3a-zmvzh    Running   m5.large    eu-west-3   eu-west-3a   18h
akaris2-f74ht-master-0                  Running   m5.xlarge   eu-west-3   eu-west-3a   19h
akaris2-f74ht-master-1                  Running   m5.xlarge   eu-west-3   eu-west-3b   19h
akaris2-f74ht-master-2                  Running   m5.xlarge   eu-west-3   eu-west-3c   19h
akaris2-f74ht-worker-eu-west-3a-4krrd   Running   m5.large    eu-west-3   eu-west-3a   19h
akaris2-f74ht-worker-eu-west-3b-59z6n   Running   m5.large    eu-west-3   eu-west-3b   19h
akaris2-f74ht-worker-eu-west-3c-chst9   Running   m5.large    eu-west-3   eu-west-3c   19h
[akaris@linux ipi]$ oc debug node/ip-10-0-128-80.eu-west-3.compute.internal
Starting pod/ip-10-0-128-80eu-west-3computeinternal-debug ...
To use host binaries, run `chroot /host`
Pod IP: 10.0.128.80
If you don't see a command prompt, try pressing enter.
sh-4.2# chroot /host
sh-4.4# ps aux | grep kubelet
root        1411  3.5  1.8 1484684 147368 ?      Ssl  10:16   0:23 kubelet --config=/etc/kubernetes/kubelet.conf --bootstrap-kubeconfig=/etc/kubernetes/kubeconfig --kubeconfig=/var/lib/kubelet/kubeconfig --container-runtime=remote --container-runtime-endpoint=/var/run/crio/crio.sock --runtime-cgroups=/system.slice/crio.service --node-labels=node-role.kubernetes.io/infra,node.openshift.io/os_id=rhcos --minimum-container-ttl-duration=6m0s --volume-plugin-dir=/etc/kubernetes/kubelet-plugins/volume/exec --cloud-provider=aws --pod-infra-container-image=quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:6e8bcc9c91267283f8b0edf1af0cb0d310c54d4f73905d9bcfb2a103a664fcb0 --v=4
root       17471  0.0  0.0   9180  1052 ?        S+   10:27   0:00 grep kubelet