Infrastructure requirements for Kubernetes cluster networking

Network engineers might find the transition to Kubernetes from configuring network devices intimidating, but the networking concepts are similar.

More enterprises are embracing Kubernetes to manage complex use cases across diverse domains like hybrid and multi-cloud environments. A well-designed network is imperative for seamless communication within the cluster. In this article, we discuss the following aspects of Kubernetes cluster networking:

Compute requirements for a Kubernetes cluster.
Core networking components, such as the Container Networking Interface (CNI) specification.
Network policies.
Service networking models.
Roles and permissions for network engineers.
Observability and scalability.

The following warning underscores the significance of a well-architected Kubernetes cluster network. The kubelet reports this event when a container is scheduled to run in a cluster without properly configured networking.
Warning NetworkNotReady 2m17s (x18 over 2m51s) kubelet network is not ready: container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: cni plugin not initialized
Next, let’s dissect the components and principles that form the foundation of this infrastructure.

Compute requirements for a Kubernetes cluster
Compute comprises the nodes in a Kubernetes cluster. The first step in working with Kubernetes cluster networking is to gather infrastructure requirements, such as the following:

Determine the number of nodes — virtual or physical — that constitute the cluster. These nodes allocate resources to the pods.
Decide whether to deploy the nodes privately and if they’ll use network address translation (NAT).
Decide if the cluster will be a dual-stack IPv4-IPv6 address cluster.
Estimate the current and future requirements for IP addresses. If teams face restrictions on using existing IP addresses, they have the option to bring their own IP addresses.
Establish firewall rules for which type of traffic can access the nodes.
Confirm the network interface card for the nodes. Similarly, select the network devices based on the speed of the port, such as 10 Gbps and 100 Gbps. For example, in AWS, some instance types support higher network bandwidth than others.
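
The IP address estimate above can be sketched with Python's standard ipaddress module. This is a rough planning aid, not a definitive sizing method. It assumes each node receives one fixed-size block carved from the pod range, which is only one of several allocation schemes, and the 10.244.0.0/16 range and /24 per-node block are illustrative values.

```python
import ipaddress


def pod_capacity(pod_cidr: str, per_node_prefix: int) -> dict:
    """Estimate how many nodes and pod IPs a pod CIDR can serve.

    Assumes every node receives one fixed-size block carved out of pod_cidr.
    """
    network = ipaddress.ip_network(pod_cidr)
    if not network.prefixlen <= per_node_prefix <= network.max_prefixlen:
        raise ValueError("per-node prefix must fit inside the pod CIDR")
    return {
        # Number of per-node blocks that fit in the pod range.
        "max_nodes": 2 ** (per_node_prefix - network.prefixlen),
        # Raw addresses per block; the usable count depends on the CNI plugin.
        "ips_per_node": 2 ** (network.max_prefixlen - per_node_prefix),
        "total_ips": network.num_addresses,
    }


# Hypothetical plan: a 10.244.0.0/16 pod range split into one /24 per node.
plan = pod_capacity("10.244.0.0/16", 24)
print(plan)
```

Because the function reads the maximum prefix length from the parsed network, the same sketch works for IPv6 ranges in a dual-stack cluster.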

Network plugins and CNI
A network plugin configures networking on each container interface so that containers can communicate among themselves. The CNI specification defines how network plugins communicate with the container runtime. It works on Layers 3 and 4 of the OSI model. CNI plugins also handle the allocation of IP addresses to pods, can use routing protocols, such as Border Gateway Protocol, to communicate with external infrastructure, and enforce network policies. Commonly used CNI plugins are Calico and Cilium. Teams should choose a CNI plugin that suits their requirements and has an active community supporting it.

Network policies for Kubernetes clusters
Network policies are like access control list configurations on network devices. They define rules for communication between pods and Kubernetes namespaces. A namespace is like a virtual LAN (VLAN) on a switch. For example, a VLAN divides a network into multiple logical segments, while a namespace divides a cluster into multiple virtual clusters. Teams can establish various network policies to control traffic, such as restricting pings between pods within a Kubernetes namespace, as shown in Figure 1.

Figure 1. Two pods confined in a namespace and unable to communicate between themselves.

Here is the YAML manifest that describes the deny network policy.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: deny
  namespace: default
spec:
  podSelector: {}
  policyTypes:
  - Egress
  - Ingress
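
To see why an empty podSelector blocks everything, the following Python sketch models how ingress rules are evaluated. It is a deliberately simplified model, not how any CNI plugin implements enforcement. It assumes all pods share one namespace and it ignores ports, namespaceSelector, ipBlock and matchExpressions. The function names and the app labels are hypothetical.

```python
def selects(pod_selector: dict, pod_labels: dict) -> bool:
    """An empty selector ({}) matches every pod; all matchLabels must match."""
    wanted = pod_selector.get("matchLabels", {})
    return all(pod_labels.get(key) == value for key, value in wanted.items())


def ingress_allowed(policies: list, dst_labels: dict, src_labels: dict) -> bool:
    """Decide whether traffic from the source pod may reach the destination pod."""
    applicable = [
        p for p in policies
        # policyTypes defaults to Ingress when it is not specified.
        if "Ingress" in p["spec"].get("policyTypes", ["Ingress"])
        and selects(p["spec"].get("podSelector", {}), dst_labels)
    ]
    if not applicable:
        return True  # No policy selects the pod, so it is not isolated.
    for policy in applicable:
        for rule in policy["spec"].get("ingress", []):
            peers = rule.get("from")
            if not peers:
                return True  # A rule without 'from' allows every source.
            for peer in peers:
                if "podSelector" in peer and selects(peer["podSelector"], src_labels):
                    return True
    return False  # Isolated, and no rule allows this source.


# The deny policy from the article: selects all pods and lists no rules.
deny = {"spec": {"podSelector": {}, "policyTypes": ["Egress", "Ingress"]}}

# Hypothetical allow policy: frontend pods may reach backend pods.
allow = {"spec": {
    "podSelector": {"matchLabels": {"app": "backend"}},
    "policyTypes": ["Ingress"],
    "ingress": [{"from": [{"podSelector": {"matchLabels": {"app": "frontend"}}}]}],
}}

print(ingress_allowed([deny], {"app": "backend"}, {"app": "frontend"}))
print(ingress_allowed([deny, allow], {"app": "backend"}, {"app": "frontend"}))
```

Network policies are additive, so combining the deny policy with a narrower allow policy opens only the path that the allow policy names.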

Service networking models
Kubernetes supports the following service networking models:

ClusterIP.
LoadBalancer.
ExternalName, aka a DNS alias.
NodePort, aka port forwarding.

Each model determines how a service communicates with other services, pods and external resources.

Figure 2. A service communicating with several pods in a Kubernetes cluster.

ClusterIP
This is the default service model for Kubernetes. It provides the service with an internal IP address, enabling pods within the same cluster to communicate using the IP address.
LoadBalancer
This model is like network load balancer options provided by vendors such as AWS and F5. It distributes network traffic across the resources and is useful for external communication. When teams configure and create this resource, Kubernetes requests a load balancer from the preferred vendor.
ExternalName
This model creates a DNS alias name for cloud services deployed outside the cluster, such as databases, APIs and messaging services. Pods and other services within the cluster can then use these aliases.
NodePort
This model is like port forwarding on network devices. On network devices, port forwarding targets a port and IP address. Similarly, in Kubernetes, a node port service can expose an application and target it using the node’s public IP address and a custom port.
Note that Kubernetes also has a separate port forwarding feature, kubectl port-forward. It works similarly to a node port, but the effect isn’t persistent: The forwarding lasts only while the command runs.
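
As an illustration of the NodePort model, the Python sketch below builds a Service manifest as a dictionary and prints it as JSON, which kubectl accepts alongside YAML. It assumes the cluster keeps the default node port range of 30000 to 32767. The helper name, the service name web and the app label are hypothetical.

```python
import json

# Default NodePort range in Kubernetes; cluster operators can change it.
NODE_PORT_RANGE = range(30000, 32768)


def node_port_service(name: str, app_label: str, port: int, node_port: int) -> dict:
    """Build a NodePort Service manifest that targets pods labeled app=<app_label>."""
    if node_port not in NODE_PORT_RANGE:
        raise ValueError(f"node port {node_port} is outside 30000-32767")
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": name},
        "spec": {
            "type": "NodePort",
            "selector": {"app": app_label},
            # port: the service port; targetPort: the pod port;
            # nodePort: the port opened on every node.
            "ports": [{"port": port, "targetPort": port, "nodePort": node_port}],
        },
    }


service = node_port_service("web", "web", 80, 30080)
print(json.dumps(service, indent=2))
```

Switching the type field to ClusterIP or LoadBalancer, and dropping the nodePort entry, would describe the other models with the same structure.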

Implement a continuous integration and deployment pipeline
A continuous integration and deployment pipeline handles the following Kubernetes cluster networking tasks:

Build and test a custom image before pushing it into the registry.
Apply the modified manifest file that contains network resources to the cluster.
Manage the infrastructure-as-code configuration for the cluster for automatic deployment.

Various tools are available for this step, depending on a company’s infrastructure:

On-premises. Options include GitLab and Jenkins.
Public cloud. Use the vendor’s default option if it suits company requirements.
Multi-cloud. GitLab and Jenkins are also popular among developers for multi-cloud environments.

Roles and permissions
The next step is to set up roles for network engineers accessing the cluster. For example, an engineer who needs to access a cluster could have a role to access the whole cluster or a specific namespace within it.
Roles grant permissions through verbs, such as get, list and watch, that allow access to the following information:

Cluster. Information about a node or a namespace.
Resources. Information about pods or services, such as a load balancer or an external name for DNS.
Network policies. Information about policies attached to a pod and namespace.
Namespace. Whether the engineer’s permissions apply only within a specific namespace.

Kubernetes documentation has more information about API permission verbs.
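
A read-only role for a network engineer might look like the following Python sketch, which builds a namespaced Role manifest and prints it as JSON. The role name network-viewer, the helper name and the chosen resources are assumptions for illustration. A Role has no effect until a RoleBinding attaches it to a user or group.

```python
import json

READ_ONLY_VERBS = ["get", "list", "watch"]


def read_only_network_role(namespace: str, name: str = "network-viewer") -> dict:
    """Build a Role that grants read-only access to network-related resources."""
    return {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "Role",
        "metadata": {"name": name, "namespace": namespace},
        "rules": [
            # Pods and services live in the core API group, written as "".
            {"apiGroups": [""], "resources": ["pods", "services"],
             "verbs": READ_ONLY_VERBS},
            # Network policies live in the networking.k8s.io API group.
            {"apiGroups": ["networking.k8s.io"], "resources": ["networkpolicies"],
             "verbs": READ_ONLY_VERBS},
        ],
    }


role = read_only_network_role("default")
print(json.dumps(role, indent=2))
```

For cluster-wide information, such as nodes or namespaces, the equivalent object would be a ClusterRole, which carries no namespace in its metadata.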

Cost considerations
In a public cloud environment, it’s necessary to consider egress cost for network traffic leaving the network.
Estimate the following factors based on the pod interface egress costs:

Communication within the cluster. Keep communication within the same cloud availability zone or data center zone, unless otherwise required.
Communication to the internet. This involves three components. First, consider egress cost to the container registry. Instead of using a public registry, it’s possible to download the necessary dependencies and build an image once, then save it into the public cloud provider’s registry. When an engineer next needs to pull a new image to launch a pod, the egress traffic stays within the region. Second, consider egress cost to a second public cloud provider. Finally, evaluate any communication to the on-premises data center.
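
The estimate above reduces to multiplying traffic volume by a per-gigabyte rate for each path. The Python sketch below shows the arithmetic. Every rate in it is a hypothetical placeholder, since real prices vary by provider, region and volume tier, and the path names are illustrative.

```python
# Hypothetical per-GB rates in USD; substitute the provider's published prices.
RATES_PER_GB = {
    "cross_zone": 0.01,    # traffic between availability zones
    "internet": 0.09,      # traffic to the internet, including public registries
    "cross_cloud": 0.09,   # traffic to a second public cloud provider
    "on_premises": 0.02,   # traffic to the on-premises data center
}


def monthly_egress_cost(traffic_gb: dict) -> float:
    """Sum the egress cost for each traffic path, given monthly volume in GB."""
    unknown = set(traffic_gb) - set(RATES_PER_GB)
    if unknown:
        raise KeyError(f"no rate defined for: {sorted(unknown)}")
    return sum(gb * RATES_PER_GB[path] for path, gb in traffic_gb.items())


# Hypothetical month: 1,000 GB between zones and 100 GB of public image pulls.
estimate = monthly_egress_cost({"cross_zone": 1000, "internet": 100})
print(f"Estimated egress cost: ${estimate:.2f}")
```

Rerunning the estimate with the image pulls moved to an in-region registry shows how much of the bill a single design change can remove.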

Figure 3. A Kubernetes cluster deployed across various public cloud providers and on premises.

Scalability and well-architected network infrastructure
When designing a cluster network, it’s important to consider high availability, disaster recovery and scalability. Here are some important considerations.
Compute
Teams that run a cluster with a single worker node and manage it themselves should set up backup nodes for the control plane and worker node. An autoscaler can help with this. Also, patch nodes regularly for security purposes and use a managed cluster whenever possible.
Network security
Define least-privilege access for network engineers and network policies, and encrypt all network traffic in transit using Transport Layer Security (TLS). Terminate TLS at load balancers or the ingress instead of at pods. Deploy a centralized firewall in multi-cluster environments.
IP routing
IP routing can create some challenges too. A common one is overlapping Classless Inter-Domain Routing (CIDR) ranges. This is a big challenge when using multi-cloud tools or when a company acquires another. One way to address this issue is to use a tool that supports source and dynamic NAT gateways.
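
Overlapping ranges are easy to detect before connecting networks. The Python sketch below compares every pair of ranges with the standard ipaddress module. The network names and address ranges are hypothetical.

```python
import ipaddress
from itertools import combinations


def find_overlaps(cidrs: dict) -> list:
    """Return the name pairs whose address ranges overlap."""
    networks = {name: ipaddress.ip_network(cidr) for name, cidr in cidrs.items()}
    return [
        (name_a, name_b)
        for (name_a, net_a), (name_b, net_b) in combinations(networks.items(), 2)
        # Only compare ranges of the same IP version.
        if net_a.version == net_b.version and net_a.overlaps(net_b)
    ]


# Hypothetical ranges from two clusters and an acquired company's data center.
ranges = {
    "cluster-a": "10.0.0.0/16",
    "cluster-b": "10.0.128.0/17",
    "acquired-dc": "192.168.0.0/24",
}
conflicts = find_overlaps(ranges)
print(conflicts)
```

Any pair the check reports needs either renumbering or NAT between the two networks before they can route to each other.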
Another challenge is distributed load balancers. In a multi-cluster environment, different teams manage different pods, so they might have a specific load balancer in their virtual private cloud. This increases costs. Instead, teams can use a networking tool that supports a centralized load balancer for a multi-cluster environment.
Naming conventions
When architecting the cluster, it’s wise to use recognizable naming conventions for the network resource labels. A label is a key-value pair in which both the key and the value are strings. For example, instead of using az:us-west-2a, try availability-zone:us-west-2a.
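
Whatever convention a team picks, labels must also satisfy Kubernetes syntax rules. The Python sketch below checks a label against those rules as a pre-commit style guard. It is a partial check that assumes keys carry no optional DNS prefix, and the function name is hypothetical.

```python
import re

# Up to 63 characters, starting and ending with an alphanumeric character,
# with dashes, underscores and dots allowed in between.
_NAME = re.compile(r"[A-Za-z0-9]([A-Za-z0-9_.-]{0,61}[A-Za-z0-9])?")


def valid_label(key: str, value: str) -> bool:
    """Check an unprefixed label key and its value against the syntax rules."""
    key_ok = _NAME.fullmatch(key) is not None
    # Unlike keys, label values may be empty.
    value_ok = value == "" or _NAME.fullmatch(value) is not None
    return key_ok and value_ok


print(valid_label("availability-zone", "us-west-2a"))
print(valid_label("availability zone", "us-west-2a"))
```

The check covers syntax only. Whether a key is recognizable to the rest of the team remains a review question.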
Resource sharing
Most public cloud vendors have resource-sharing services. For example, teams can share a subnet between different accounts so they don’t exhaust the available IP addresses. They can also allow secure communication between accounts.
Multi-cluster single cloud
Instead of using traditional networking tools to manage communication between services, such as manual configuration of VPNs, firewalls and load balancers, use a service mesh for east-west traffic. A service mesh is an intelligent overlay network that provides secure connectivity between microservices. Examples include Cilium Mesh and HashiCorp Consul.
Multi-cluster hybrid and multi-cloud
If a company has clusters that span more than one cloud provider, opt to use a cloud provider’s dedicated network connection. The cloud provider’s connection offers higher bandwidth, up to 100 Gbps, in contrast to a site-to-site VPN, which typically offers less than 10 Gbps. Options include Aviatrix, Equinix and Google Cross-Cloud Interconnect. Select based on company requirements.
Observability and troubleshooting
Observability is also crucial within a Kubernetes cluster. It provides insights about the cluster and pods, such as latency, packet loss and protocol header lookup.
To use this feature, engineers need to install their preferred vendor tool. Commonly used tools are Prometheus with Grafana and Cilium Hubble.

Test in a home lab
The next step is to try these concepts in a home lab. For a quick hands-on project, spin up a cluster on your favorite public cloud vendor using kubeadm or a managed cluster service like Amazon Elastic Kubernetes Service or Azure Kubernetes Service. Then try some networking tasks from the Kubernetes documentation.
Charles Uneze is a technical writer who specializes in cloud-native networking, Kubernetes and open source.