
Perform etcd Backup for Restricted Environment on OCP 4.3.x

Cover image: https://unsplash.com/@frank041985

etcd is the key-value store for OpenShift Container Platform (OCP); it persists the state of all resource objects.
Back up your cluster’s etcd data regularly and store it in a secure location, ideally outside the OpenShift Container Platform environment. Do not take an etcd backup before the first certificate rotation completes, which occurs 24 hours after installation; otherwise the backup will contain expired certificates. It is also recommended to take etcd backups during non-peak usage hours, because taking a backup is a blocking action.


I was working in an OCP 4.3.0 restricted environment, where the OCP nodes have no Internet connection (not even through a proxy), and noticed that the etcd-snapshot-backup.sh script failed because it tried to download etcdctl from the Internet.

[root@bastion ~]# ssh -i .ssh/id_rsa core@etcd-1.ocp4.ocp.abip
[core@etcd-1 ~]$ sudo /usr/local/bin/etcd-snapshot-backup.sh ./assets/backup
Creating asset directory ./assets
Downloading etcdctl binary..

At a high level, to make the etcd backup succeed, I had to find an existing etcdctl binary, copy it somewhere (/root/etcdctl), and modify the etcd-snapshot-backup.sh script to use that copy instead of downloading one:

[root@etcd-1 ~]# find / -iname 'etcdctl*'


[root@etcd-1 ~]# diff /usr/local/bin/etcd-snapshot-backup.sh /usr/local/bin/etcd-snapshot-original.sh
40c40
< ETCDCTL="/root/etcdctl"
---
> ETCDCTL="${ASSET_DIR}/bin/etcdctl"
49c49
<   # dl_etcdctl
---
>   dl_etcdctl
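
The two changes shown in the diff can also be applied non-interactively. The following is a minimal sketch, assuming GNU sed and a script whose contents match the diff (an `ETCDCTL=` assignment at the start of a line and a bare `dl_etcdctl` call on its own line). The function name `patch_backup_script` is hypothetical; the saved copy is named `etcd-snapshot-original.sh` to match the diff above.

```shell
#!/usr/bin/env bash
# Sketch only: patch_backup_script is a hypothetical helper, not part of OCP.
# It assumes the script layout shown in the diff above.
patch_backup_script() {
  local script="$1"         # e.g. /usr/local/bin/etcd-snapshot-backup.sh
  local etcdctl_path="$2"   # e.g. /root/etcdctl (must not contain '|' or '&')
  local original
  original="$(dirname "$script")/etcd-snapshot-original.sh"

  # Keep an untouched copy so the change can be reverted later.
  cp -p "$script" "$original" || return 1

  # 1) Point ETCDCTL at the local binary.
  # 2) Comment out the bare dl_etcdctl call, keeping its indentation.
  #    A line that defines the function does not match this pattern.
  sed -i \
    -e "s|^ETCDCTL=.*|ETCDCTL=\"${etcdctl_path}\"|" \
    -e 's|^\([[:space:]]*\)dl_etcdctl[[:space:]]*$|\1# dl_etcdctl|' \
    "$script"
}

# Usage on the node, as root:
#   patch_backup_script /usr/local/bin/etcd-snapshot-backup.sh /root/etcdctl
```

Running `diff` between the patched script and the saved copy afterwards is a quick way to confirm that only these two lines changed.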

Then I performed the backup:

[root@etcd-1 ~]# /usr/local/bin/etcd-snapshot-backup.sh assets/backup/
Trying to backup etcd client certs..
etcd client certs found in /etc/kubernetes/static-pod-resources/kube-apiserver-pod-14 backing up to ./assets/backup/
Backing up /etc/kubernetes/manifests/etcd-member.yaml to ./assets/backup/
Trying to backup latest static pod resources..
Snapshot saved at ./assets/tmp/snapshot.db
snapshot db and kube resources are successfully saved to assets/backup//snapshot_db_kuberesources_2020-02-25_030239.tar.gz!

PS:
We need to revert the changes made to the etcd-snapshot-backup.sh script; otherwise the machine-config operator goes into a DEGRADED state because of the file mismatch. To verify, run: oc describe pods -n machine-config-operator machine-config-daemon-XXX (for the nodes where the script was modified).
To fix the DEGRADED state, we need to delete the problematic pods.
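
The revert step can be sketched as follows, assuming an untouched copy of the script was saved next to it as `etcd-snapshot-original.sh` (the name used in the diff earlier). The helper name `restore_backup_script` is hypothetical, and `machine-config-daemon-XXX` is the same placeholder used in the text, not a real pod name.

```shell
#!/usr/bin/env bash
# Sketch only: restore_backup_script is a hypothetical helper.
restore_backup_script() {
  local script="$1"   # e.g. /usr/local/bin/etcd-snapshot-backup.sh
  local original
  original="$(dirname "$script")/etcd-snapshot-original.sh"

  # Refuse to continue if there is no saved copy to restore from.
  if [ ! -f "$original" ]; then
    echo "no saved copy found at $original" >&2
    return 1
  fi

  # Put the original contents back, preserving mode and timestamps.
  cp -p "$original" "$script"
}

# On each node where the script was modified, as root:
#   restore_backup_script /usr/local/bin/etcd-snapshot-backup.sh
#
# If the operator is already DEGRADED, delete the affected pod from a host
# with cluster access (replace XXX with the actual pod suffix):
#   oc delete pod -n machine-config-operator machine-config-daemon-XXX
```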

Note:
– Do not forget to store the snapshot backup file somewhere outside the OCP nodes.
– For OCP nodes connected through a proxy, we might need to add the HTTP(S)_PROXY environment variables to the script, or export them before running the script.
– For OCP 4.3.5 and later, you might not need to modify the backup script.
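
For the proxy case in the notes above, one way to pass the variables for a single run, without editing the script, is a small wrapper. This is a sketch under assumptions: the proxy URL `http://proxy.example.com:3128` is a made-up placeholder, `run_with_proxy` is a hypothetical helper, and whether the download step in your version of the script honors these variables has not been verified here.

```shell
#!/usr/bin/env bash
# Sketch only: run_with_proxy is a hypothetical helper.
# It runs one command with the proxy variables set, in both upper and
# lower case because tools differ in which spelling they read.
run_with_proxy() {
  local proxy="$1"   # e.g. http://proxy.example.com:3128 (placeholder)
  shift
  env HTTP_PROXY="$proxy" HTTPS_PROXY="$proxy" \
      http_proxy="$proxy" https_proxy="$proxy" \
      "$@"
}

# Usage from a root shell on the node:
#   run_with_proxy http://proxy.example.com:3128 \
#     /usr/local/bin/etcd-snapshot-backup.sh ./assets/backup
```

Note that sudo resets the environment by default, so exporting the variables as the core user and then calling the script through sudo may not pass them along; running the wrapper from a root shell avoids that.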

