OpenShift Cluster – How to Drain or Evacuate a Node for Maintenance

Gineesh Madapparambath
Automation, Cloud, Infrastructure & hardware
November 7, 2018


An OpenShift cluster typically has multiple compute, master, and infra nodes, so taking a single node out for maintenance such as OS patching is not a big deal. We do, however, need to make sure the remaining nodes have enough capacity to absorb its workload.

When maintenance work comes up (for example, kernel patching), we need to carry it out without impacting the pods and applications running on the cluster.

Step 1 : Disable Scheduling on the node

This ensures that no new pods are scheduled onto the node.

Check the node status (for example, compute-102):

[root@master-101 ~]# oc get nodes |grep compute-102
compute-102       Ready                      1y        v1.6.1+5115d708d7
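Before disabling scheduling, it may be worth confirming that other nodes are available to take the workload. The following is a minimal sketch, not part of the original procedure: other_schedulable_nodes is a hypothetical helper that only reads the `oc get nodes` output format shown above. It checks node status, not free CPU or memory, so treat it as a rough pre-check.

```shell
# Hypothetical helper (an assumption, not an oc feature): reads `oc get nodes`
# output on stdin and prints every node other than $1 whose STATUS column is
# exactly "Ready", which excludes NotReady and SchedulingDisabled nodes.
other_schedulable_nodes() {
  awk -v target="$1" '$1 != "NAME" && $1 != target && $2 == "Ready" { print $1 }'
}

# Usage on a live cluster (assumes `oc` is logged in with admin rights):
#   oc get nodes | other_schedulable_nodes compute-102
```

If the list comes back empty, the pods evacuated in the next step would have nowhere to go.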

Mark the node as SchedulingDisabled (oadm is the older alias for oc adm, so oc adm manage-node works the same way):

[root@master-101 ~]# oadm manage-node compute-102 --schedulable=false
NAME                      STATUS                     AGE       VERSION
compute-102               Ready,SchedulingDisabled   1y        v1.6.1+5115d708d7
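If this step is scripted, the status change can be asserted instead of read by eye. This is a sketch under the assumption that the STATUS column looks like the output above; is_scheduling_disabled is a hypothetical helper name, not an oc subcommand.

```shell
# Hypothetical helper: reads `oc get nodes` output on stdin and exits 0 only
# if the node named in $1 has SchedulingDisabled in its STATUS column.
is_scheduling_disabled() {
  awk -v node="$1" '
    $1 == node && $2 ~ /SchedulingDisabled/ { found = 1 }
    END { exit !found }
  '
}

# Usage on a live cluster (assumes `oc` is logged in):
#   oc get nodes | is_scheduling_disabled compute-102 && echo "safe to drain"
```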

Step 2 : Drain or Evacuate pods from the node

You can simply run the command below for this task.

# oc adm drain compute-102

But most of the time this will not work, because the node will have pods that use local data or pods managed by DaemonSets. So we need to add options such as --ignore-daemonsets and --delete-local-data.

[root@master-101 ~]# oc adm drain compute-102 --delete-local-data --ignore-daemonsets  --force
node "compute-102" already cordoned
WARNING: Ignoring DaemonSet-managed pods: logging-fluentd-1gttp; Deleting pods with local storage: myapp-1-1kr16, uysed-25-m7qk4, postgresql-1-xt7bm

You will see the warning messages as the pods are evacuated from the node compute-102.

--force : force deletion of bare pods (pods that are not managed by a controller)
--delete-local-data : continue even if there are pods using emptyDir (local data that will be deleted when the node is drained)
--ignore-daemonsets : ignore DaemonSet-managed pods
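Since --delete-local-data discards emptyDir contents, it can be useful to keep a record of which pods were affected. The sketch below assumes the warning line has exactly the format shown in the drain output above, which may differ between versions; local_storage_pods and drain.log are hypothetical names.

```shell
# Hypothetical helper: reads drain output on stdin and prints, one per line,
# the pods named after "Deleting pods with local storage:" in the warning.
local_storage_pods() {
  sed -n 's/.*Deleting pods with local storage: \([^;]*\).*/\1/p' |
    tr ',' '\n' |
    sed 's/^ *//; s/ *$//'
}

# Usage (assumes the warning may be written to stderr, hence 2>&1):
#   oc adm drain compute-102 --delete-local-data --ignore-daemonsets --force 2>&1 | tee drain.log
#   local_storage_pods < drain.log
```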

Wait for all the pods to be removed; the command then prints something like the following.

node "compute-102" drained

Step 3 : Do your patching or kernel update

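The node is now free for any kind of activity, since we have disabled scheduling and evacuated its pods.

Let’s verify that no application pods are running on the node. Only DaemonSet-managed pods, such as logging-fluentd here, should remain, because the drain ignored them.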

[root@master-101 ~]# oadm manage-node compute-102 --list-pods
Listing matched pods on node: compute-102
NAME                    READY     STATUS    RESTARTS   AGE
logging-fluentd-1gttp   1/1       Running   1          1d
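To make this check scriptable, the pod names can be pulled out of the listing and filtered. This is a minimal sketch that assumes the --list-pods output format shown above; pods_on_node is a hypothetical helper, and the logging-fluentd- prefix is simply the DaemonSet pod seen in this example.

```shell
# Hypothetical helper: reads `oadm manage-node <node> --list-pods` output on
# stdin and prints only the pod names, skipping the "Listing matched pods"
# banner and the column header.
pods_on_node() {
  awk '$1 != "NAME" && $1 != "Listing" && NF >= 4 { print $1 }'
}

# Usage: anything printed here is a pod that is not the fluentd DaemonSet pod.
#   oadm manage-node compute-102 --list-pods | pods_on_node | grep -v '^logging-fluentd-'
```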

Once you have finished your task (for example, patching and rebooting), wait for the node to come back online. You may not even need a reboot; the task might only be a configuration change.

Step 4 : Verify required services are running

On the node, make sure the openvswitch, docker, and atomic-openshift-node services are up and running.
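One way to check all three services in a single pass is a small loop around systemctl is-active. This is a sketch to run on the node itself, assuming the services are systemd units with the names given above; check_node_services and the SYSTEMCTL override variable are hypothetical and exist only to make the function easy to test.

```shell
# Hypothetical helper: reports each required service as OK or DOWN and
# returns non-zero if any of them is not active.
# SYSTEMCTL is an assumed override hook; it defaults to the real systemctl.
check_node_services() {
  rc=0
  for svc in openvswitch docker atomic-openshift-node; do
    if "${SYSTEMCTL:-systemctl}" is-active --quiet "$svc"; then
      echo "OK    $svc"
    else
      echo "DOWN  $svc"
      rc=1
    fi
  done
  return "$rc"
}

# Usage on the node:
#   check_node_services || echo "fix the services above before enabling scheduling"
```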

Step 5 : Enable Scheduling

[root@master-101 ~]# oadm manage-node compute-102 --schedulable=true
NAME                      STATUS    AGE       VERSION
compute-102               Ready     1y        v1.6.1+5115d708d7

Wait for the node to start receiving pods, then run some checks.
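One possible check is to look for pods on the node that are not fully up. The sketch below reuses the --list-pods output format shown in Step 3; unhealthy_pods is a hypothetical helper, and it will also flag Completed pods (for example finished build or deploy pods), so read its output with that in mind.

```shell
# Hypothetical helper: reads `oadm manage-node <node> --list-pods` output on
# stdin and prints pods whose STATUS is not Running or whose READY count
# (for example 0/1) shows that not all containers are ready.
unhealthy_pods() {
  awk '$1 != "NAME" && $1 != "Listing" && NF >= 4 {
         split($2, ready, "/")
         if ($3 != "Running" || ready[1] != ready[2]) print $1, $2, $3
       }'
}

# Usage: no output means every listed pod is Running and fully ready.
#   oadm manage-node compute-102 --list-pods | unhealthy_pods
```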

That’s it

