When a Node in a Kubernetes cluster crashes or shuts down, it enters the ‘NotReady’ state, in which it can no longer be used to run Pods, and any stateful Pods running on it become unavailable.
Common reasons for the ‘NotReady’ state include a lack of resources on the Node, a connectivity issue between the Node and the Control Plane, or an error in kube-proxy or the kubelet.
This note shows how to troubleshoot the Kubernetes Node ‘NotReady‘ state.
Cool Tip: How to increase the verbosity of the kubectl command! Read more →
Kubernetes Node ‘NotReady’ Troubleshooting
Identify a ‘NotReady’ Node
To identify a Kubernetes Node in the ‘NotReady‘ state, execute:
$ kubectl get nodes
- sample output -
NAME           STATUS                     ROLES    AGE   VERSION
kuber-master   Ready                      master   12d   v1.18.6
kuber-node1    Ready                      <none>   12d   v1.18.6
kuber-node2    Ready,SchedulingDisabled   <none>   12d   v1.18.6
kuber-node3    NotReady                   <none>   12d   v1.18.6
A Kubernetes Node can be in one of the following states:
| Status | Description |
|---|---|
| Ready | Able to run Pods. |
| NotReady | Not operating due to a problem and can’t run Pods. |
| SchedulingDisabled | Healthy but marked by the cluster as not schedulable. |
| Unknown | The Node controller can’t communicate with the Node; after waiting a default of 40 seconds, it sets the Node status to Unknown. |
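In a large cluster, you can narrow the listing down to just the problematic Nodes, for example with a simple grep on the STATUS column:
$ kubectl get nodes | grep -w 'NotReady'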
Kube-Proxy Issue
One of the reasons for the ‘NotReady’ state of the Node is an issue with kube-proxy.
The kube-proxy Pod is a network proxy that must run on each Node.
To check the state of the kube-proxy Pod on the Node that is not ready, execute:
$ kubectl get pods -n kube-system -o wide | grep <nodeName> | grep kube-proxy
- sample output -
NAME               READY   STATUS    AGE   IP              NODE
kube-proxy-28rzx   1/1     Running   12d   10.243.96.151   kuber-node3
ℹ kube-system is the Namespace for objects created by the Kubernetes system.
If the kube-proxy Pod is in any state other than ‘Running’, use the following commands to get more information:
$ kubectl describe pod <podName> -n kube-system
$ kubectl logs <podName> -n kube-system
$ kubectl get events
If the Node doesn’t have a kube-proxy Pod at all, inspect the DaemonSet that is responsible for running kube-proxy on each Node:
$ kubectl describe daemonset kube-proxy -n kube-system
ℹ A DaemonSet ensures that all eligible Nodes run a copy of a Pod.
The output of the above command may reveal issues with the DaemonSet.
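If the kube-proxy Pod is missing or stuck in a failed state, a common next step (a sketch, not an exhaustive procedure) is to check the DaemonSet’s Pod counts and, if needed, delete the broken Pod so that the DaemonSet recreates it (the Pod name below is taken from the sample output above):
$ kubectl get daemonset kube-proxy -n kube-system
$ kubectl delete pod kube-proxy-28rzx -n kube-system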
The Node is Running Out of Resources
If the Node is running out of resources, this can be another possible reason for the ‘NotReady’ state.
Execute the following command to get the detailed information about the Node:
$ kubectl describe node <nodeName>
Look for the ‘Conditions’ section, which shows whether the Node is running out of resources.
The following conditions are available:
| Type | Status (healthy) | Description |
|---|---|---|
| MemoryPressure | False | True if the Node is running out of memory. |
| DiskPressure | False | True if the Node is running out of disk space. |
| PIDPressure | False | True if the Node is running too many processes. |
| Ready | True | True if the Node is healthy and ready to accept Pods. |
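As a quicker alternative to scanning the full describe output, the same conditions can be printed directly with kubectl’s JSONPath output (a minimal sketch):
$ kubectl get node <nodeName> -o jsonpath='{range .status.conditions[*]}{.type}={.status}{"\n"}{end}'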
Kubelet Issue
The kubelet is the primary “node agent” that must run on each Node.
If it crashes or stops, the Node can’t communicate with the API server and goes into the ‘NotReady’ state.
Run the following command and check the ‘Conditions’ section:
$ kubectl describe node <nodeName>
If all the conditions are ‘Unknown’ with the “Kubelet stopped posting node status” message, this indicates that the kubelet is down.
To debug this issue, SSH into the Node and check whether the kubelet is running:
$ systemctl status kubelet.service
$ journalctl -u kubelet.service
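To narrow the log output down to recent entries only, journalctl’s time filter can help (an optional refinement):
$ journalctl -u kubelet.service --since "30 minutes ago"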
Once the issue is fixed, restart the kubelet with:
$ systemctl restart kubelet.service
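After the restart, the Node should report ‘Ready’ again within a minute or so; from a machine with access to the cluster you can watch for the transition:
$ kubectl get node <nodeName> --watch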
Cool Tip: How to troubleshoot when a Deployment is not ready and is not creating Pods on a Kubernetes cluster! Read more →
Connectivity Issue
Another reason for the ‘NotReady’ state of the Node is a connectivity issue between the Node and the API server (the front end of the Kubernetes Control Plane).
Run the following command and check the ‘Conditions’ section:
$ kubectl describe node <nodeName>
If it shows ‘NetworkUnavailable’, there is an issue with network communication between the Node and the API server.
To get the address of the API server, execute:
$ kubectl cluster-info
- sample output -
Kubernetes master is running at https://192.168.1.60:433
KubeDNS is running at https://192.168.1.60:433/api/v1/namespaces/kube-system/services/kube-dns:dns/proxy
To further debug and diagnose cluster problems, use 'kubectl cluster-info dump'
To test the network connectivity between the Node in the ‘NotReady‘ state and the Control Plane, SSH into the Node and execute:
$ nc -vz <apiServerEndpoint> <apiServerPort>
For example:
$ nc -vz 192.168.1.60 433
- sample output -
Ncat: Version 7.50 ( https://nmap.org/ncat )
Ncat: Connected to 192.168.1.60:433.
Ncat: 0 bytes sent, 0 bytes received in 0.02 seconds.
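If the TCP connection succeeds but the Node still reports a network problem, you can also probe the API server’s health endpoint from the Node (assuming the endpoint shown by kubectl cluster-info; -k skips TLS certificate verification):
$ curl -k https://<apiServerEndpoint>:<apiServerPort>/healthz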