KVM-based VPS crashes every 3-7 days. Is this a VPS-side or a node-side problem?

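I would like to know whether the root cause of the crash lies in the VPS itself (a kernel bug or something similar) or in the node that hosts the virtual server (a backend problem). The crash happens every 3-7 days, always at night between 03:00 and 04:00.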

Details: a KVM-based VPS (running CentOS 7) with xfs, hosted at a VPS provider that has backend and storage-backend infrastructure.

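It usually goes as follows: at some point the running kthreadd process switches to D state (uninterruptible sleep), and then we get messages such as "blocked for more than 120 seconds." together with a high load average (LA):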

May 21 03:08:01 vps root: root 2 0.0 0.0 0 0 ? S May18 0:00 [kthreadd]
May 21 03:10:01 vps root: root 2 0.0 0.0 0 0 ? S May18 0:00 [kthreadd]
May 21 03:12:01 vps root: root 2 0.0 0.0 0 0 ? S May18 0:00 [kthreadd]
May 21 03:14:01 vps root: root 2 0.0 0.0 0 0 ? D May18 0:00 [kthreadd]
May 21 03:15:16 vps kernel: INFO: task kthreadd:2 blocked for more than 120 seconds.
May 21 03:15:16 vps kernel: kthreadd D ffffffffffffffff 0 2 0 0x00000000
May 21 03:15:16 vps kernel: [<ffffffff810a65f2>] kthreadd+0x2b2/0x2f0
May 21 03:16:01 vps root: root 2 0.0 0.0 0 0 ? D May18 0:00 [kthreadd]
May 21 03:18:01 vps root: root 2 0.0 0.0 0 0 ? D May18 0:00 [kthreadd]
May 21 03:20:02 vps root: root 2 0.0 0.0 0 0 ? D May18 0:00 [kthreadd]
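To pin down when the state change happens, the periodic samples in this log can be scanned for the first one that shows kthreadd in D state. The following is a minimal sketch, assuming the samples are `ps aux`-style lines written to syslog under the `root:` tag as in the excerpt; the function name `first_d_state` and the `SAMPLE` list are illustrative only.

```python
# Lines copied from the excerpt above (ps samples mixed with kernel messages).
SAMPLE = [
    "May 21 03:12:01 vps root: root 2 0.0 0.0 0 0 ? S May18 0:00 [kthreadd]",
    "May 21 03:14:01 vps root: root 2 0.0 0.0 0 0 ? D May18 0:00 [kthreadd]",
    "May 21 03:15:16 vps kernel: INFO: task kthreadd:2 blocked for more than 120 seconds.",
    "May 21 03:16:01 vps root: root 2 0.0 0.0 0 0 ? D May18 0:00 [kthreadd]",
]


def first_d_state(lines, name="kthreadd"):
    """Return the timestamp of the first sample showing `name` in D state."""
    for line in lines:
        # Assumption: ps samples are logged with the " root: " syslog tag.
        head, sep, ps_part = line.partition(" root: ")
        if not sep:
            continue  # not a ps sample, e.g. a kernel message
        fields = ps_part.split()
        # ps aux columns: USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
        if len(fields) < 11:
            continue
        stat, command = fields[7], fields[10]
        if command == f"[{name}]" and stat.startswith("D"):
            # The first three tokens of the syslog prefix are month, day, time.
            return " ".join(head.split()[:3])
    return None


if __name__ == "__main__":
    print(first_d_state(SAMPLE))
```

In the excerpt, the first D-state sample comes a little over a minute before the kernel's hung-task message. That is roughly what one would expect with samples taken every two minutes and a 120-second hung-task timeout.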

Here is a call trace:

May 18 04:14:37 vps kernel: INFO: task kthreadd:2 blocked for more than 120 seconds.
May 18 04:14:37 vps kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
May 18 04:14:37 vps kernel: kthreadd D ffffffffffffffff 0 2 0 0x00000000
May 18 04:14:37 vps kernel: ffff88023413b4e0 0000000000000046 ffff880234120b80 ffff88023413bfd8
May 18 04:14:37 vps kernel: ffff88023413bfd8 ffff88023413bfd8 ffff880234120b80 ffff88023413b628
May 18 04:14:37 vps kernel: ffff88023413b630 7fffffffffffffff ffff880234120b80 ffffffffffffffff
May 18 04:14:37 vps kernel: Call Trace:
May 18 04:14:37 vps kernel: [<ffffffff8163ae49>] schedule+0x29/0x70
May 18 04:14:37 vps kernel: [<ffffffff81638b39>] schedule_timeout+0x209/0x2d0
May 18 04:14:37 vps kernel: [<ffffffff8104fac3>] ? x2apic_send_IPI_mask+0x13/0x20
May 18 04:14:37 vps kernel: [<ffffffff810b8a86>] ? try_to_wake_up+0x1b6/0x300
May 18 04:14:37 vps kernel: [<ffffffff8163b216>] wait_for_completion+0x116/0x170
May 18 04:14:37 vps kernel: [<ffffffff810b8c30>] ? wake_up_state+0x20/0x20
May 18 04:14:37 vps kernel: [<ffffffff8109e7ac>] flush_work+0xfc/0x1c0
May 18 04:14:37 vps kernel: [<ffffffff8109a7e0>] ? move_linked_works+0x90/0x90
May 18 04:14:37 vps kernel: [<ffffffffa021143a>] xlog_cil_force_lsn+0x8a/0x210 [xfs]
May 18 04:14:37 vps kernel: [<ffffffffa020fa7e>] _xfs_log_force_lsn+0x6e/0x2f0 [xfs]
May 18 04:14:37 vps kernel: [<ffffffff81632005>] ? __slab_free+0x10e/0x277
May 18 04:14:37 vps kernel: [<ffffffffa020fd2e>] xfs_log_force_lsn+0x2e/0x90 [xfs]
May 18 04:14:37 vps kernel: [<ffffffffa0201fc9>] ? xfs_iunpin_wait+0x19/0x20 [xfs]
May 18 04:14:37 vps kernel: [<ffffffffa01fe4b7>] __xfs_iunpin_wait+0xa7/0x150 [xfs]
May 18 04:14:37 vps kernel: [<ffffffff810a6b60>] ? wake_atomic_t_function+0x40/0x40
May 18 04:14:37 vps kernel: [<ffffffffa0201fc9>] xfs_iunpin_wait+0x19/0x20 [xfs]
May 18 04:14:37 vps kernel: [<ffffffffa01f684c>] xfs_reclaim_inode+0x8c/0x350 [xfs]
May 18 04:14:37 vps kernel: [<ffffffffa01f6d77>] xfs_reclaim_inodes_ag+0x267/0x390 [xfs]
May 18 04:14:37 vps kernel: [<ffffffffa01f7923>] xfs_reclaim_inodes_nr+0x33/0x40 [xfs]
May 18 04:14:37 vps kernel: [<ffffffffa0206895>] xfs_fs_free_cached_objects+0x15/0x20 [xfs]
May 18 04:14:37 vps kernel: [<ffffffff811e0cd8>] prune_super+0xe8/0x170
May 18 04:14:37 vps kernel: [<ffffffff8117c5c5>] shrink_slab+0x165/0x300
May 18 04:14:37 vps kernel: [<ffffffff811d5f01>] ? vmpressure+0x21/0x90
May 18 04:14:37 vps kernel: [<ffffffff8117f742>] do_try_to_free_pages+0x3c2/0x4e0
May 18 04:14:37 vps kernel: [<ffffffff8117f95c>] try_to_free_pages+0xfc/0x180
May 18 04:14:37 vps kernel: [<ffffffff8117365d>] __alloc_pages_nodemask+0x7fd/0xb90
May 18 04:14:37 vps kernel: [<ffffffff81078d73>] copy_process.part.25+0x163/0x1610
May 18 04:14:37 vps kernel: [<ffffffff810a5a20>] ? kthread_create_on_node+0x140/0x140
May 18 04:14:37 vps kernel: [<ffffffff8107a401>] do_fork+0xe1/0x320
May 18 04:14:37 vps kernel: [<ffffffff8107a666>] kernel_thread+0x26/0x30
May 18 04:14:37 vps kernel: [<ffffffff810a65f2>] kthreadd+0x2b2/0x2f0
May 18 04:14:37 vps kernel: [<ffffffff810a6340>] ? kthread_create_on_cpu+0x60/0x60
May 18 04:14:37 vps kernel: [<ffffffff81645e18>] ret_from_fork+0x58/0x90
May 18 04:14:37 vps kernel: [<ffffffff810a6340>] ? kthread_create_on_cpu+0x60/0x60
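Read from the bottom up, the trace shows kthreadd forking a thread, the page allocator falling into direct reclaim, the slab shrinker calling into XFS inode reclaim, and XFS then waiting for a log force to complete. The following is a rough sketch for extracting that path programmatically; the regular expression, the helper names, and the `"kernel"` placeholder for frames with no module are assumptions made for illustration. Frames marked `?` are entries the kernel could not confirm as real callers, so they are dropped.

```python
import re

# One stack frame as printed by the kernel, for example:
#   [<ffffffffa021143a>] xlog_cil_force_lsn+0x8a/0x210 [xfs]
#   [<ffffffff8109a7e0>] ? move_linked_works+0x90/0x90
FRAME = re.compile(
    r"\[<[0-9a-f]+>\] (\? )?([\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+(?: \[(\w+)\])?"
)

# A shortened excerpt of the trace above, innermost frame first.
SAMPLE_TRACE = """\
May 18 04:14:37 vps kernel: [<ffffffff8109e7ac>] flush_work+0xfc/0x1c0
May 18 04:14:37 vps kernel: [<ffffffff8109a7e0>] ? move_linked_works+0x90/0x90
May 18 04:14:37 vps kernel: [<ffffffffa021143a>] xlog_cil_force_lsn+0x8a/0x210 [xfs]
May 18 04:14:37 vps kernel: [<ffffffffa01f684c>] xfs_reclaim_inode+0x8c/0x350 [xfs]
May 18 04:14:37 vps kernel: [<ffffffff8117c5c5>] shrink_slab+0x165/0x300
May 18 04:14:37 vps kernel: [<ffffffff8117365d>] __alloc_pages_nodemask+0x7fd/0xb90
May 18 04:14:37 vps kernel: [<ffffffff81078d73>] copy_process.part.25+0x163/0x1610
May 18 04:14:37 vps kernel: [<ffffffff810a65f2>] kthreadd+0x2b2/0x2f0
"""


def parse_frames(text):
    """Return every stack frame found in `text`, in log order."""
    frames = []
    for match in FRAME.finditer(text):
        unreliable, func, module = match.groups()
        frames.append({
            "func": func,
            # "kernel" is only a placeholder for frames with no module tag.
            "module": module or "kernel",
            "reliable": unreliable is None,
        })
    return frames


def reliable_path(frames):
    """Return reliable function names, outermost caller first."""
    return [f["func"] for f in reversed(frames) if f["reliable"]]


if __name__ == "__main__":
    frames = parse_frames(SAMPLE_TRACE)
    print(" -> ".join(reliable_path(frames)))
    print(sorted({f["module"] for f in frames}))
```

A path that ends in an XFS log wait indicates that the guest was waiting on its filesystem. On its own, that does not show whether the underlying delay came from the guest kernel or from the host's storage.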

A dirty-pages trick did not help.

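Only a hard reset gets the server back into a working state.

Could you help me understand whether this is a VPS-side or a node-side problem?

Regards, Alex.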

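This is probably a backup process or something else happening at the host level that affects storage. It is outside your control, and you should push the VPS provider for a solution.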
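One way to back up that request with evidence is to tally the hour of day at which each hung-task message was logged. If they cluster in the same nightly window, that is consistent with a scheduled host-side job. The following is a small sketch, assuming syslog lines in the format shown in the question; `hung_task_hours` is a hypothetical helper name.

```python
import re
from collections import Counter

# Matches the kernel's hung-task report and captures the hour of the timestamp.
HUNG = re.compile(
    r"^\w{3}\s+\d+\s+(\d{2}):\d{2}:\d{2}\s+\S+\s+kernel: "
    r"INFO: task .+ blocked for more than \d+ seconds"
)

# The two hung-task reports quoted in the question, plus one unrelated line.
SAMPLE = [
    "May 18 04:14:37 vps kernel: INFO: task kthreadd:2 blocked for more than 120 seconds.",
    "May 21 03:15:16 vps kernel: INFO: task kthreadd:2 blocked for more than 120 seconds.",
    "May 21 03:15:16 vps kernel: kthreadd D ffffffffffffffff 0 2 0 0x00000000",
]


def hung_task_hours(lines):
    """Count hung-task reports per hour of day."""
    hours = Counter()
    for line in lines:
        match = HUNG.match(line)
        if match:
            hours[int(match.group(1))] += 1
    return hours


if __name__ == "__main__":
    for hour, count in sorted(hung_task_hours(SAMPLE).items()):
        print(f"{hour:02d}:00  {count}")
```

Run over the full system log rather than the short sample, the tally shows the provider which window to check against their own backup or maintenance schedule.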

If they cannot fix it, consider moving to another provider.

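This happens because you are using Red Hat / CentOS 7.2 with xfs. Its kernel is not as stable as the one in 7.1. If you want to stay on CentOS 7.2, the current workaround is to migrate to ext4.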