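I have a big problem. Unless I reboot the node, some of my Proxmox-based LXC containers become unresponsive within 2 days.
It always happens at the same time of night (I suspect something running in the containers causes heavy load).
The problem is that top / atop / htop show nothing. The Proxmox node itself handles SSH commands without trouble, but 2 of the 5 nodes do not really respond (I can log in via SSH, but I cannot enter any commands).
I also have to do a "hard" reset, because a normal reboot does not work (the LXC containers still have not stopped after 40 minutes).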
Here is my PVE version:
pveversion -v
proxmox-ve: 4.1-39 (running kernel: 4.2.8-1-pve)
pve-manager: 4.1-15 (running version: 4.1-15/8cd55b52)
pve-kernel-4.2.6-1-pve: 4.2.6-36
pve-kernel-2.6.32-43-pve: 2.6.32-166
pve-kernel-4.2.8-1-pve: 4.2.8-39
pve-kernel-4.2.2-1-pve: 4.2.2-16
pve-kernel-2.6.32-26-pve: 2.6.32-114
pve-kernel-4.2.3-2-pve: 4.2.3-22
lvm2: 2.02.116-pve2
corosync-pve: 2.3.5-2
libqb0: 1.0-1
pve-cluster: 4.0-33
qemu-server: 4.0-62
pve-firmware: 1.1-7
libpve-common-perl: 4.0-49
libpve-access-control: 4.0-11
libpve-storage-perl: 4.0-42
pve-libspice-server1: 0.12.5-2
vncterm: 1.2-1
pve-qemu-kvm: 2.5-9
pve-container: 1.0-46
pve-firewall: 2.0-18
pve-ha-manager: 1.0-24
ksm-control-daemon: 1.2-1
glusterfs-client: 3.5.2-2+deb8u1
lxc-pve: 1.1.5-7
lxcfs: 2.0.0-pve1
cgmanager: 0.39-pve1
criu: 1.6.0-1
Unfortunately, the logs do not show anything useful.
Syslog:
Mar 15 04:32:31 server pvedaemon[4061]: worker exit
Mar 15 04:32:31 server pvedaemon[1192]: worker 4061 finished
Mar 15 04:32:31 server pvedaemon[1192]: starting 1 worker(s)
Mar 15 04:32:31 server pvedaemon[1192]: worker 24675 started
Mar 15 04:33:05 server pvedaemon[6601]: worker exit
Mar 15 04:33:05 server pvedaemon[1192]: worker 6601 finished
Mar 15 04:33:05 server pvedaemon[1192]: starting 1 worker(s)
Mar 15 04:33:05 server pvedaemon[1192]: worker 25112 started
Mar 15 04:34:57 server systemd-timesyncd[559]: interval/delta/delay/jitter/drift 2048s/+0.000s/0.021s/0.001s/+1ppm
Mar 15 04:36:08 server pveproxy[17238]: worker exit
Mar 15 04:36:08 server pveproxy[1212]: worker 17238 finished
Mar 15 04:36:08 server pveproxy[1212]: starting 1 worker(s)
Mar 15 04:36:08 server pveproxy[1212]: worker 28231 started
Mar 15 04:39:48 server pvedaemon[572]: worker exit
Mar 15 04:39:48 server pvedaemon[1192]: worker 572 finished
Mar 15 04:39:48 server pvedaemon[1192]: starting 1 worker(s)
Mar 15 04:39:48 server pvedaemon[1192]: worker 31498 started
Mar 15 04:40:40 server pveproxy[31690]: worker exit
Mar 15 04:40:40 server pveproxy[1212]: worker 31690 finished
Mar 15 04:40:40 server pveproxy[1212]: starting 1 worker(s)
Mar 15 04:40:40 server pveproxy[1212]: worker 32442 started
Mar 15 04:45:02 server pvedaemon[25112]: <root@pam> successful auth for user 'root@pam'
Mar 15 04:46:27 server pveproxy[28231]: worker exit
Mar 15 04:46:27 server pveproxy[1212]: worker 28231 finished
Mar 15 04:46:27 server pveproxy[1212]: starting 1 worker(s)
Mar 15 04:46:27 server pveproxy[1212]: worker 5082 started
Mar 15 04:48:45 server pveproxy[17122]: worker exit
Mar 15 04:48:45 server pveproxy[1212]: worker 17122 finished
Mar 15 04:48:45 server pveproxy[1212]: starting 1 worker(s)
Mar 15 04:48:45 server pveproxy[1212]: worker 6924 started
Mar 15 04:51:28 server pvedaemon[25112]: worker exit
Mar 15 04:51:28 server pvedaemon[1192]: worker 25112 finished
Mar 15 04:51:28 server pvedaemon[1192]: starting 1 worker(s)
Mar 15 04:51:28 server pvedaemon[1192]: worker 9770 started
Mar 15 04:51:38 server pveproxy[32442]: worker exit
Mar 15 04:51:38 server pveproxy[1212]: worker 32442 finished
Mar 15 04:51:38 server pveproxy[1212]: starting 1 worker(s)
Mar 15 04:51:38 server pveproxy[1212]: worker 9911 started
Mar 15 04:52:45 server pvedaemon[31498]: worker exit
Mar 15 04:52:45 server pvedaemon[1192]: worker 31498 finished
Mar 15 04:52:45 server pvedaemon[1192]: starting 1 worker(s)
Mar 15 04:52:45 server pvedaemon[1192]: worker 10794 started
Mar 15 04:55:46 server pvedaemon[24675]: worker exit
Mar 15 04:55:46 server pvedaemon[1192]: worker 24675 finished
Mar 15 04:55:46 server pvedaemon[1192]: starting 1 worker(s)
Mar 15 04:55:46 server pvedaemon[1192]: worker 13187 started
Mar 15 04:57:32 server rrdcached[972]: flushing old values
Mar 15 04:57:32 server rrdcached[972]: rotating journals
Mar 15 04:57:32 server rrdcached[972]: started new journal /var/lib/rrdcached/journal/rrd.journal.1458014252.151024
Mar 15 04:57:32 server rrdcached[972]: removing old journal /var/lib/rrdcached/journal/rrd.journal.1458007052.150971
Mar 15 04:57:40 server puppet-agent[14639]: Finished catalog run in 0.53 seconds
lxcfs 2.0.0-pve1 has a bug that makes containers hang inside the kernel.
I solved the problem by updating to lxcfs 2.0.0-pve2. See here:
https://forum.proxmox.com/threads/proxmox-4-0-lxc-containers-network-unstable.26353/
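To check whether a host is still on the affected build, something like the following might help. It is only a sketch: `lxcfs_version` and `is_affected_lxcfs` are hypothetical helper names, and it assumes `pveversion -v` prints one `package: version` pair per line, as in the output posted in the question.

```shell
#!/bin/sh
# Print the lxcfs version found in `pveversion -v` style output read from stdin.
# Assumes one "package: version" pair per line.
lxcfs_version() {
    awk -F': ' '$1 == "lxcfs" { print $2; exit }'
}

# Succeed (exit status 0) only for the build reported as buggy in this thread.
is_affected_lxcfs() {
    [ "$1" = "2.0.0-pve1" ]
}

# Usage on a Proxmox host (assumes pveversion is on PATH):
#   v=$(pveversion -v | lxcfs_version)
#   if is_affected_lxcfs "$v"; then
#       echo "lxcfs $v is the affected build"
#   fi
```

If the check reports the affected build, upgrading the `lxcfs` package through apt (for example `apt-get update && apt-get install lxcfs`) should pull in the fixed version, assuming the configured repository already carries 2.0.0-pve2.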
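We run the same kernel and also have LXC containers that hang completely. KVM machines on the same host keep running fine. What could be the cause, and how can we get the LXC containers responsive again without rebooting the host?
Even running the following command on the host never returns: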
pct enter ID
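Since the containers seem to hang inside the kernel, it may be worth checking on the host which processes are stuck in uninterruptible sleep (state `D`) and what they are waiting on. A rough sketch, assuming the procps version of `ps`; `filter_dstate` is a hypothetical helper name:

```shell
#!/bin/sh
# Reduce `ps -eo pid,stat,wchan:32,args` style output (read from stdin) to the
# header line plus every process whose state starts with D (uninterruptible sleep).
filter_dstate() {
    awk 'NR == 1 || $2 ~ /^D/'
}

# Usage on the host:
#   ps -eo pid,stat,wchan:32,args | filter_dstate
```

The WCHAN column names the kernel function each process is sleeping in, which can hint at what it is blocked on. This only helps with diagnosis; it does not by itself make a hung container responsive again.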