Introduction
By default, egress traffic from every namespace is source-NATed (SNAT) to the IP of the host on which the pod runs, similar to how a home router handles traffic to the internet.
This makes access control and traceability difficult and raises security concerns, because every pod, regardless of its namespace, shares the IP of the host it is running on.
OpenShift addresses this with a concept known as EgressIP. With an EgressIP, we can assign an IP (or a set of IPs) to a particular namespace that needs finer-grained control over its egress route. Throughout the namespace lifecycle, any egress connection to an external network (leaving the SDN layer) uses the assigned EgressIP(s).
There are two types of EgressIP configuration:
- Automatically assigned EgressIP
- Manually assigned EgressIP
For configuration details, refer to the official documentation.
Example of Typical Use Case
Firewall rule to access certain external target
Allowing every host IP on the firewall in order to reach a secured resource is inefficient and does not scale. It also widens the attack surface: a cluster with hundreds of worker nodes means hundreds of IPs to allow on the firewall.
With EgressIP, we can assign one or more EgressIPs to the namespace and allow only those IPs on the firewall.
Automatically Assigned EgressIP
As the name suggests, this is the easiest configuration, since the EgressIP is placed on a node automatically. The cluster administrator just needs to assign an egressCIDR to each node that may host the EgressIP, then patch the netnamespace of the namespace to select which IP to use.
IMPORTANT: Because the CIDR is in the same network as the primary network interface (the interface of the host IP as seen by the control plane), IP address management is critical to avoid routing issues. For example, if the selected EgressIP is already used by another node in the subnet, ARP conflicts will occur because the same IP is claimed by two different MAC addresses. This can be controlled by using a proper range in CIDR format.
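The address check described above can be sketched as a small pre-flight test to run before patching anything. This is a minimal sketch, not part of OpenShift: the function names `ip_to_int`, `in_cidr` and `egress_ip_is_safe` are hypothetical, and the check only covers the addresses passed to it, so it cannot detect other machines in the subnet that are not listed.

```shell
# Convert a dotted-quad IPv4 address to an integer.
# The body runs in a subshell so the IFS change stays local.
ip_to_int() (
    IFS=.
    set -- $1
    echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
)

# Succeed when address $1 lies inside CIDR $2, e.g. 192.168.50.0/24.
in_cidr() {
    addr=$(ip_to_int "$1")
    net=$(ip_to_int "${2%/*}")
    prefix=${2#*/}
    mask=$(( (4294967295 << (32 - $prefix)) & 4294967295 ))
    [ $(( $addr & $mask )) -eq $(( $net & $mask )) ]
}

# Succeed when candidate $1 is inside CIDR $2 and is not equal to
# any of the already-used addresses given as the remaining arguments.
egress_ip_is_safe() {
    candidate=$1
    cidr=$2
    shift 2
    in_cidr "$candidate" "$cidr" || return 1
    for used in "$@"; do
        [ "$used" = "$candidate" ] && return 1
    done
    return 0
}

# Example with the lab addresses (host IPs of the five nodes):
# egress_ip_is_safe 192.168.50.120 192.168.50.0/24 \
#     192.168.50.141 192.168.50.142 192.168.50.143 192.168.50.144 192.168.50.145
```

The list of used addresses is passed in explicitly so the sketch stays independent of how the inventory is obtained.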
Configuring the netnamespace and hostsubnet:
#> oc patch netnamespace mywebserver --type=merge -p \
'{"egressIPs": [ "192.168.50.120"]}'
#> oc patch hostsubnet worker01 --type=merge -p \
'{"egressCIDRs": ["192.168.50.0/24"]}'
#> oc patch hostsubnet worker02 --type=merge -p \
'{"egressCIDRs": ["192.168.50.0/24"]}'
The SDN controller now assigns the EgressIP automatically:
#> oc get hostsubnets.network.openshift.io
NAME       HOST       HOST IP          SUBNET         EGRESS CIDRS        EGRESS IPS
master01   master01   192.168.50.141   10.10.0.0/23
master02   master02   192.168.50.142   10.9.0.0/23
master03   master03   192.168.50.143   10.8.0.0/23
worker01   worker01   192.168.50.144   10.11.0.0/23   [192.168.50.0/24]
worker02   worker02   192.168.50.145   10.8.2.0/23    [192.168.50.0/24]   [192.168.50.120]
Now let's take down worker02 and observe the IP being reassigned to the next available node:
#> oc get nodes
NAME       STATUS     ROLES    AGE   VERSION
master01   Ready      master   23h   v1.17.1
master02   Ready      master   23h   v1.17.1
master03   Ready      master   23h   v1.17.1
worker01   Ready      worker   23h   v1.17.1
worker02   NotReady   worker   23h   v1.17.1
The EgressIP should now move to worker01, the only other available node:
#> oc get hostsubnets.network.openshift.io
NAME       HOST       HOST IP          SUBNET         EGRESS CIDRS        EGRESS IPS
master01   master01   192.168.50.141   10.10.0.0/23
master02   master02   192.168.50.142   10.9.0.0/23
master03   master03   192.168.50.143   10.8.0.0/23
worker01   worker01   192.168.50.144   10.11.0.0/23   [192.168.50.0/24]   [192.168.50.120]
worker02   worker02   192.168.50.145   10.8.2.0/23    [192.168.50.0/24]
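To confirm a failover without reading the table by eye, the listing can be filtered for the node currently holding a given EgressIP. This is a minimal sketch with a hypothetical helper name, `egress_ip_host`. It assumes the plain-text column layout shown above (four leading columns, then bracketed CIDR and IP entries) and would need adjusting if that layout differs.

```shell
# Print the name of each node whose EGRESS IPS column contains address $1.
# Reads the table printed by `oc get hostsubnets` on stdin.
egress_ip_host() {
    awk -v ip="$1" '
        NR > 1 {                              # skip the header row
            for (i = 5; i <= NF; i++) {       # columns after SUBNET
                field = $i
                gsub(/\[|\]/, "", field)      # drop the surrounding brackets
                if (field == ip) print $1
            }
        }'
}

# Example:
# oc get hostsubnets.network.openshift.io | egress_ip_host 192.168.50.120
```

Run before and after the node goes down, the output should change from worker02 to worker01 in this lab.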
Statically Assigned EgressIP
The other option is a manually (statically) assigned EgressIP. The cluster administrator needs to assign to each node the IPs it can host, and use at least two of those IPs in the netnamespace to provide high availability.
Each node must have its own IPs; an IP cannot be assigned to more than one node. Doing so led to this error in our lab:
#> oc get hostsubnets.network.openshift.io
NAME       HOST       HOST IP          SUBNET         EGRESS CIDRS        EGRESS IPS
master01   master01   192.168.50.141   10.10.0.0/23
master02   master02   192.168.50.142   10.9.0.0/23
master03   master03   192.168.50.143   10.8.0.0/23
worker01   worker01   192.168.50.144   10.11.0.0/23                       [192.168.50.120 192.168.50.121]
worker02   worker02   192.168.50.145   10.8.2.0/23                        [192.168.50.120 192.168.50.121]
#> oc logs -f sdn-8zhv7
E0509 08:09:03.956672 2649 egressip.go:370] Multiple EgressIPs (192.168.50.120, 192.168.50.121) for VNID 5281170 on node 192.168.50.144
E0509 08:09:03.956695 2649 egressip.go:370] Multiple EgressIPs (192.168.50.121, 192.168.50.120) for VNID 5281170 on node 192.168.50.144
E0509 08:09:04.747512 2649 egressip.go:370] Multiple nodes (192.168.50.144, 192.168.50.145) claiming EgressIP 192.168.50.120
E0509 08:09:04.747538 2649 egressip.go:370] Multiple nodes (192.168.50.144, 192.168.50.145) claiming EgressIP 192.168.50.121
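The misconfiguration behind these log lines can also be spotted from the hostsubnet listing itself, by looking for an EgressIP that appears on more than one node. This is a minimal sketch with a hypothetical helper name, `duplicate_egress_ips`, and it assumes the plain-text table layout shown above.

```shell
# Print every egress IP that is listed on more than one node.
# Reads the table printed by `oc get hostsubnets` on stdin.
duplicate_egress_ips() {
    awk '
        NR > 1 {                                   # skip the header row
            for (i = 5; i <= NF; i++) {            # columns after SUBNET
                field = $i
                gsub(/\[|\]/, "", field)           # drop the surrounding brackets
                # entries containing "/" are egress CIDRs, not egress IPs
                if (field != "" && index(field, "/") == 0) claims[field]++
            }
        }
        END { for (ip in claims) if (claims[ip] > 1) print ip }
    ' | sort
}

# Example:
# oc get hostsubnets.network.openshift.io | duplicate_egress_ips
```

An empty result means no IP is claimed by two nodes. It does not rule out the other error shown above, where one node carries several EgressIPs of the same namespace.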
To statically assign an EgressIP, patch the netnamespace and hostsubnet:
#> oc patch netnamespace mywebserver --type=merge -p \
'{"egressIPs": [ "192.168.50.120", "192.168.50.121"]}'
#> oc patch hostsubnet worker01 --type=merge -p \
'{"egressIPs": [ "192.168.50.120"]}'
#> oc patch hostsubnet worker02 --type=merge -p \
'{"egressIPs":[ "192.168.50.121"]}'
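When several IPs or nodes are involved, hand-writing the JSON payload for each patch is error-prone. The helper below is a minimal sketch that builds the merge-patch body from a field name and a list of addresses. The name `egress_patch` is hypothetical, and the function does not validate or escape its arguments.

```shell
# Print a JSON merge patch of the form {"<field>": ["<ip>", ...]}.
# $1 is the field name (egressIPs or egressCIDRs); the rest are the values.
egress_patch() {
    key=$1
    shift
    list=
    for item in "$@"; do
        # append a comma separator only when the list is already non-empty
        list="${list:+$list, }\"$item\""
    done
    printf '{"%s": [%s]}\n' "$key" "$list"
}

# Example, equivalent to the netnamespace patch above:
# oc patch netnamespace mywebserver --type=merge \
#     -p "$(egress_patch egressIPs 192.168.50.120 192.168.50.121)"
```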
The pods in the namespace now have two IPs, hosted on two different nodes, available for outbound connections. Traffic switches to the next IP when the first one is unreachable:
#> oc get hostsubnets.network.openshift.io
NAME       HOST       HOST IP          SUBNET         EGRESS CIDRS        EGRESS IPS
master01   master01   192.168.50.141   10.10.0.0/23
master02   master02   192.168.50.142   10.9.0.0/23
master03   master03   192.168.50.143   10.8.0.0/23
worker01   worker01   192.168.50.144   10.11.0.0/23                       [192.168.50.120]
worker02   worker02   192.168.50.145   10.8.2.0/23                        [192.168.50.121]
#> oc get netnamespaces.network.openshift.io mywebserver
NAME          NETID     EGRESS IPS
mywebserver   5281170   [192.168.50.120 192.168.50.121]
Summary
With EgressIP, it is much easier to control traffic and allow specific traffic to specific target(s). However, an EgressIP can only be attached to the primary interface (the interface of the host IP as seen by the control plane), so on a multi-homed node it cannot be placed on any interface other than the one with the default gateway. Otherwise we will see this error:
E0509 07:13:30.090807 2838 egressip.go:121] Error assigning Egress IP "10.10.10.81": egress IP "10.10.10.81" is not in local network 192.168.50.0/24 of interface enp1s0