Getting started

Edit This Page

Troubleshooting kubeadm

As with any program, you might run into an error installing or running kubeadm. This page lists some common failure scenarios and have provided steps that can help you understand and fix the problem.

If your problem is not listed below, please follow the following steps:

ebtables or some similar executable not found during installation

If you see the following warnings while running kubeadm init

[preflight] WARNING: ebtables not found in system path
[preflight] WARNING: ethtool not found in system path

Then you may be missing ebtables, ethtool or a similar executable on your node. You can install them with the following commands:

kubeadm blocks waiting for control plane during installation

If you notice that kubeadm init hangs after printing out the following line:

[apiclient] Created API client, waiting for the control plane to become ready

This may be caused by a number of problems. The most common are:

  error: failed to run Kubelet: failed to create kubelet:
  misconfiguration: kubelet cgroup driver: "systemd" is different from docker cgroup driver: "cgroupfs"

There are two common ways to fix the cgroup driver problem:

  1. Install Docker again following instructions here.

  2. Change the kubelet config to match the Docker cgroup driver manually, you can refer to Configure cgroup driver used by kubelet on Master Node

kubeadm blocks when removing managed containers

The following could happen if Docker halts and does not remove any Kubernetes-managed containers:

sudo kubeadm reset
[preflight] Running pre-flight checks
[reset] Stopping the kubelet service
[reset] Unmounting mounted directories in "/var/lib/kubelet"
[reset] Removing kubernetes-managed containers
(block)

A possible solution is to restart the Docker service and then re-run kubeadm reset:

sudo systemctl restart docker.service
sudo kubeadm reset

Inspecting the logs for docker may also be useful:

journalctl -ul docker

Pods in RunContainerError, CrashLoopBackOff or Error state

Right after kubeadm init there should not be any pods in these states.

coredns (or kube-dns) is stuck in the Pending state

This is expected and part of the design. kubeadm is network provider-agnostic, so the admin should install the pod network solution of choice. You have to install a Pod Network before CoreDNS may be deployed fully. Hence the Pending state before the network is set up.

HostPort services do not work

The HostPort and HostIP functionality is available depending on your Pod Network provider. Please contact the author of the Pod Network solution to find out whether HostPort and HostIP functionality are available.

Calico, Canal, and Flannel CNI providers are verified to support HostPort.

For more information, see the CNI portmap documentation.

If your network provider does not support the portmap CNI plugin, you may need to use the NodePort feature of services or use HostNetwork=true.

Pods are not accessible via their Service IP

TLS certificate errors

The following error indicates a possible certificate mismatch.

# kubectl get pods
Unable to connect to the server: x509: certificate signed by unknown authority (possibly because of "crypto/rsa: verification error" while trying to verify candidate authority certificate "kubernetes")
  unset KUBECONFIG

Or set it to the default KUBECONFIG location:

  export KUBECONFIG=/etc/kubernetes/admin.conf
  mv  $HOME/.kube $HOME/.kube.bak
  sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
  sudo chown $(id -u):$(id -g) $HOME/.kube/config

Default NIC When using flannel as the pod network in Vagrant

The following error might indicate that something was wrong in the pod network:

Error from server (NotFound): the server could not find the requested resource

Vagrant typically assigns two interfaces to all VMs. The first, for which all hosts are assigned the IP address 10.0.2.15, is for external traffic that gets NATed.

This may lead to problems with flannel, which defaults to the first interface on a host. This leads to all hosts thinking they have the same public IP address. To prevent this, pass the --iface eth1 flag to flannel so that the second interface is chosen.

Non-public IP used for containers

In some situations kubectl logs and kubectl run commands may return with the following errors in an otherwise functional cluster:

Error from server: Get https://10.19.0.41:10250/containerLogs/default/mysql-ddc65b868-glc5m/mysql: dial tcp 10.19.0.41:10250: getsockopt: no route to host

Use ip addr show to check for this scenario instead of ifconfig because ifconfig will not display the offending alias IP address. Alternatively an API endpoint specific to Digital Ocean allows to query for the anchor IP from the droplet:

  curl http://169.254.169.254/metadata/v1/interfaces/public/0/anchor_ipv4/address

The workaround is to tell kubelet which IP to use using --node-ip. When using Digital Ocean, it can be the public one (assigned to eth0) or the private one (assigned to eth1) should you want to use the optional private network. The KubeletExtraArgs section of the kubeadm NodeRegistrationOptions structure can be used for this.

Then restart kubelet:

  systemctl daemon-reload
  systemctl restart kubelet

coredns pods have CrashLoopBackOff or Error state

If you have nodes that are running SELinux with an older version of Docker you might experience a scenario where the coredns pods are not starting. To solve that you can try one of the following options:

kubectl -n kube-system get deployment coredns -o yaml | \
  sed 's/allowPrivilegeEscalation: false/allowPrivilegeEscalation: true/g' | \
  kubectl apply -f -

Another cause for CoreDNS to have CrashLoopBackOff is when a CoreDNS Pod deployed in Kubernetes detects a loop. A number of workarounds are available to avoid Kubernetes trying to restart the CoreDNS Pod every time CoreDNS detects the loop and exits.

Warning: Disabling SELinux or setting allowPrivilegeEscalation to true can compromise the security of your cluster.

etcd pods restart continually

If you encounter the following error:

rpc error: code = 2 desc = oci runtime error: exec failed: container_linux.go:247: starting container process caused "process_linux.go:110: decoding init error from pipe caused \"read parent: connection reset by peer\""

this issue appears if you run CentOS 7 with Docker 1.13.1.84. This version of Docker can prevent the kubelet from executing into the etcd container.

To work around the issue, choose one of these options:

Not possible to pass a comma separated list of values to arguments inside a --component-extra-args flag

kubeadm init flags such as --component-extra-args allow you to pass custom arguments to a control-plane component like the kube-apiserver. However, this mechanism is limited due to the underlying type used for parsing the values (mapStringString).

If you decide to pass an argument that supports multiple, comma-separated values such as --apiserver-extra-args "enable-admission-plugins=LimitRanger,NamespaceExists" this flag will fail with flag: malformed pair, expect string=string. This happens because the list of arguments for --apiserver-extra-args expects key=value pairs and in this case NamespacesExists is considered as a key that is missing a value.

Alternatively, you can try separating the key=value pairs like so: --apiserver-extra-args "enable-admission-plugins=LimitRanger,enable-admission-plugins=NamespaceExists" but this will result in the key enable-admission-plugins only having the value of NamespaceExists.

A known workaround is to use the kubeadm configuration file.

kube-proxy scheduled before node is initialized by cloud-controller-manager

In cloud provider scenarios, kube-proxy can end up being scheduled on new worker nodes before the cloud-controller-manager has initialized the node addresses. This causes kube-proxy to fail to pick up the node’s IP address properly and has knock-on effects to the proxy function managing load balancers.

The following error can be seen in kube-proxy Pods:

server.go:610] Failed to retrieve node IP: host IP unknown; known addresses: []
proxier.go:340] invalid nodeIP, initializing kube-proxy with 127.0.0.1 as nodeIP

A known solution is to patch the kube-proxy DaemonSet to allow scheduling it on control-plane nodes regardless of their conditions, keeping it off of other nodes until their initial guarding conditions abate:

kubectl -n kube-system patch ds kube-proxy -p='{ "spec": { "template": { "spec": { "tolerations": [ { "key": "CriticalAddonsOnly", "operator": "Exists" }, { "effect": "NoSchedule", "key": "node-role.kubernetes.io/master" } ] } } } }'

The tracking issue for this problem is here.

The NodeRegistration.Taints field is omitted when marshalling kubeadm configuration

Note: This issue only applies to tools that marshal kubeadm types (e.g. to a YAML configuration file). It will be fixed in kubeadm API v1beta2.

By default, kubeadm applies the role.kubernetes.io/master:NoSchedule taint to control-plane nodes. If you prefer kubeadm to not taint the control-plane node, and set InitConfiguration.NodeRegistration.Taints to an empty slice, the field will be omitted when marshalling. When the field is omitted, kubeadm applies the default taint.

There are at least two workarounds:

  1. Use the role.kubernetes.io/master:PreferNoSchedule taint instead of an empty slice. Pods will get scheduled on masters, unless other nodes have capacity.

  2. Remove the taint after kubeadm init exits:

    kubectl taint nodes NODE_NAME role.kubernetes.io/master:NoSchedule-

Feedback