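I am using kubeadm to try to set up a dev master node. I have run into an issue where the kubelet's health check fails, and I am looking for direction on how to debug it. Running the command suggested for debugging (systemctl status kubelet) does not show the cause of the error: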
kubelet.service - kubelet: The Kubernetes Node Agent
   Loaded: loaded (/etc/systemd/system/kubelet.service; enabled; vendor preset: disabled)
  Drop-In: /etc/systemd/system/kubelet.service.d
           └─10-kubeadm.conf
   Active: activating (auto-restart) (Result: exit-code) since Thu 2017-10-05 15:04:23 CDT; 4s ago
     Docs: http://kubernetes.io/docs/
  Process: 4786 ExecStart=/usr/bin/kubelet $KUBELET_KUBECONFIG_ARGS $KUBELET_SYSTEM_PODS_ARGS $KUBELET_NETWORK_ARGS $KUBELET_DNS_ARGS $KUBELET_AUTHZ_ARGS $KUBELET_CADVISOR_ARGS $KUBELET_CGROUP_ARGS $KUBELET_CERTIFICATE_ARGS $KUBELET_EXTRA_ARGS (code=exited, status=1/FAILURE)
 Main PID: 4786 (code=exited, status=1/FAILURE)

Oct 05 15:04:23 master.domain.com systemd[1]: Unit kubelet.service entered failed state.
Oct 05 15:04:23 master.domain.com systemd[1]: kubelet.service failed.
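Where can I find a specific error message that says why the kubelet is not running?
After running swapoff -a to disable swap, I still cannot get Kubernetes configured.
Here is the full output of kubeadm init: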
$ kubeadm init
[kubeadm] WARNING: kubeadm is in beta, please do not use it for production clusters.
[init] Using Kubernetes version: v1.8.2
[init] Using Authorization modes: [Node RBAC]
[preflight] Running pre-flight checks
[preflight] WARNING: docker version is greater than the most recently validated version. Docker version: 17.09.0-ce. Max validated version: 17.03
[preflight] Starting the kubelet service
[kubeadm] WARNING: starting in 1.8, tokens expire after 24 hours by default (if you require a non-expiring token use --token-ttl 0)
[certificates] Generated ca certificate and key.
[certificates] Generated apiserver certificate and key.
[certificates] apiserver serving cert is signed for DNS names [master.my-domain.com kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local] and IPs [10.xx.xx.xx 10.xx.xx.xx]
[certificates] Generated apiserver-kubelet-client certificate and key.
[certificates] Generated sa key and public key.
[certificates] Generated front-proxy-ca certificate and key.
[certificates] Generated front-proxy-client certificate and key.
[certificates] Valid certificates and keys now exist in "/etc/kubernetes/pki"
[kubeconfig] Wrote KubeConfig file to disk: "admin.conf"
[kubeconfig] Wrote KubeConfig file to disk: "kubelet.conf"
[kubeconfig] Wrote KubeConfig file to disk: "controller-manager.conf"
[kubeconfig] Wrote KubeConfig file to disk: "scheduler.conf"
[controlplane] Wrote Static Pod manifest for component kube-apiserver to "/etc/kubernetes/manifests/kube-apiserver.yaml"
[controlplane] Wrote Static Pod manifest for component kube-controller-manager to "/etc/kubernetes/manifests/kube-controller-manager.yaml"
[controlplane] Wrote Static Pod manifest for component kube-scheduler to "/etc/kubernetes/manifests/kube-scheduler.yaml"
[etcd] Wrote Static Pod manifest for a local etcd instance to "/etc/kubernetes/manifests/etcd.yaml"
[init] Waiting for the kubelet to boot up the control plane as Static Pods from directory "/etc/kubernetes/manifests"
[init] This often takes around a minute; or longer if the control plane images have to be pulled.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10255/healthz' failed with error: Get http://localhost:10255/healthz: dial tcp 127.0.0.1:10255: getsockopt: connection refused.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10255/healthz' failed with error: Get http://localhost:10255/healthz: dial tcp 127.0.0.1:10255: getsockopt: connection refused.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10255/healthz' failed with error: Get http://localhost:10255/healthz: dial tcp 127.0.0.1:10255: getsockopt: connection refused.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10255/healthz/syncloop' failed with error: Get http://localhost:10255/healthz/syncloop: dial tcp 127.0.0.1:10255: getsockopt: connection refused.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10255/healthz/syncloop' failed with error: Get http://localhost:10255/healthz/syncloop: dial tcp 127.0.0.1:10255: getsockopt: connection refused.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10255/healthz/syncloop' failed with error: Get http://localhost:10255/healthz/syncloop: dial tcp 127.0.0.1:10255: getsockopt: connection refused.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10255/healthz' failed with error: Get http://localhost:10255/healthz: dial tcp 127.0.0.1:10255: getsockopt: connection refused.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10255/healthz/syncloop' failed with error: Get http://localhost:10255/healthz/syncloop: dial tcp 127.0.0.1:10255: getsockopt: connection refused.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10255/healthz' failed with error: Get http://localhost:10255/healthz: dial tcp 127.0.0.1:10255: getsockopt: connection refused.

Unfortunately, an error has occurred:
    timed out waiting for the condition

This error is likely caused by that:
    - The kubelet is not running
    - The kubelet is unhealthy due to a misconfiguration of the node in some way (required cgroups disabled)
    - There is no internet connection; so the kubelet can't pull the following control plane images:
        - gcr.io/google_containers/kube-apiserver-amd64:v1.8.2
        - gcr.io/google_containers/kube-controller-manager-amd64:v1.8.2
        - gcr.io/google_containers/kube-scheduler-amd64:v1.8.2

You can troubleshoot this for example with the following commands if you're on a systemd-powered system:
    - 'systemctl status kubelet'
    - 'journalctl -xeu kubelet'
couldn't initialize a Kubernetes cluster
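The hint at the end of that output points at journalctl -xeu kubelet. That is usually where the kubelet's actual exit reason appears, since systemctl status only reports the exit code. Below is a minimal sketch for narrowing the journal down to lines that look like failures. The function name kubelet_failures and the grep pattern are illustrative assumptions, not part of kubeadm or the kubelet.

```shell
#!/bin/sh
# Hypothetical helper: keep only log lines that look like failures.
# It reads log text on stdin, so it works with any log source.
kubelet_failures() {
    # "|| true" keeps the exit status at 0 when nothing matches.
    grep -iE 'error|fail|fatal' || true
}

# Usage on a systemd host (needs permission to read the journal):
#   journalctl -u kubelet --no-pager -n 200 | kubelet_failures
```

The pattern is deliberately broad; tighten it once you know what the kubelet is complaining about.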
I also tried removing the docker repo and installing Docker 1.12, which would not run: Error starting daemon: SELinux is not supported with the overlay graph driver on this kernel. Either boot into a newer kernel or disable selinux ...
I solved this by setting --fail-swap-on=false in the systemd unit configuration. You only need to modify the file /etc/systemd/system/kubelet.service.d/10-kubeadm.conf
Environment="KUBELET_SYSTEM_PODS_ARGS=--pod-manifest-path=/etc/kubernetes/manifests --allow-privileged=true --fail-swap-on=false"
Then run systemctl daemon-reload followed by systemctl restart kubelet
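As a rough sketch of those steps, the snippet below checks that the drop-in really carries the flag before reloading systemd and restarting the kubelet. The file path comes from the answer above; the function names has_fail_swap_off and apply_dropin and the DROPIN variable are hypothetical and only here for illustration.

```shell
#!/bin/sh
# Path taken from the answer above; override DROPIN to point elsewhere.
DROPIN="${DROPIN:-/etc/systemd/system/kubelet.service.d/10-kubeadm.conf}"

# Succeeds if the given file passes --fail-swap-on=false to the kubelet.
# "-e" is needed because the pattern itself starts with dashes.
has_fail_swap_off() {
    grep -q -e '--fail-swap-on=false' "$1"
}

# Reload unit files and restart the kubelet, but only if the flag is present.
apply_dropin() {
    if has_fail_swap_off "$DROPIN"; then
        systemctl daemon-reload && systemctl restart kubelet
    else
        echo "--fail-swap-on=false not found in $DROPIN" >&2
        return 1
    fi
}

# Usage (as root):
#   apply_dropin
```

Checking first avoids restarting the kubelet with an unchanged configuration, which would just reproduce the same failure.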
Found this issue: https://github.com/kubernetes/kubernetes/issues/53333
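The previous answer worked for me, but the resolution given in the linked issue did not.
So perhaps following their suggestion of editing 90-kubeadm.conf (instead of 10-kubeadm.conf) would work.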
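This has already been discussed in the issue that Atom posted, so I don't feel I am contributing much, but I can reproduce your problem when swap is turned on. For me, the fix was to disable swap and retry init: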
sudo -i
swapoff -a
kubeadm reset
kubeadm init
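Before retrying init, it may help to confirm that swap is really off. Note that swapoff -a only lasts until the next reboot if /etc/fstab still lists a swap entry. The sketch below assumes a Linux host with /proc/swaps; the function names swap_is_on and fstab_swap_entries are hypothetical.

```shell
#!/bin/sh
# Succeeds if the swaps table (default /proc/swaps) lists an active swap
# device. The first line of that file is a column header, so skip it.
swap_is_on() {
    awk 'NR > 1 { found = 1 } END { exit !found }' "${1:-/proc/swaps}"
}

# Print non-comment fstab entries whose filesystem type is "swap".
# Entries like these are re-enabled at boot.
fstab_swap_entries() {
    awk '$1 !~ /^#/ && $3 == "swap"' "${1:-/etc/fstab}"
}

# Usage:
#   swap_is_on && echo "swap is still active; run swapoff -a"
#   fstab_swap_entries
```

If fstab_swap_entries prints anything, commenting out those lines is one way to keep swap disabled across reboots.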
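The answer posted by dirtbag also worked for me, but to be safe, after systemctl daemon-reload I did a full kubeadm reset and kubeadm init rather than just systemctl restart kubelet.
If this does not work for you, could you paste the new output of kubeadm init after disabling swap?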