Some mornings, a Debian virtual machine pegs the CPU and stops responding

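On some mornings, usually between 6:30 and 8:30 am, my virtual machine locks up and takes the VMware Server host down with it as collateral damage. When this happens, I can't SSH into either the virtual machine or the host.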

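I believe I have narrowed this down to the mlocate job in cron.daily. That job shouldn't cause any trouble on its own, though, so I can't pin down the larger problem. The machine has very little memory, only 384MB. That may not be realistic, but it is above Debian's requirements, and I know the system isn't doing much work at the times the problem occurs.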

Here is some of what I get in the messages log:

    Jul 18 08:30:02 core kernel: [607607.955528] updatedb.mloc D ddadc12f 0 3274 3270
    Jul 18 08:30:02 core kernel: [607607.955615] d746ece0 00000082 0011caef ddadc12f 000221d2 d746ee6c c1309fc0 00000000
    Jul 18 08:30:02 core kernel: [607607.955692] d60c3b4c 01142a38 07a53f31 00000000 01142a38 d60c3b4c 01142a38 c6ae3d3c
    Jul 18 08:30:02 core kernel: [607607.955709] c1309fc0 00f4f000 c6ae3d3c c1300e28 c02b9048 c6ae3d34 00000000 c0190d2e
    Jul 18 08:30:02 core kernel: [607607.955723] Call Trace:
    Jul 18 08:30:02 core kernel: [607607.956038] [<c02b9048>] io_schedule+0x49/0x80
    Jul 18 08:30:02 core kernel: [607607.956472] [<c0190d2e>] sync_buffer+0x30/0x33
    Jul 18 08:30:02 core kernel: [607607.956511] [<c02b9236>] __wait_on_bit+0x33/0x58
    Jul 18 08:30:02 core kernel: [607607.956515] [<c0190cfe>] sync_buffer+0x0/0x33
    Jul 18 08:30:02 core kernel: [607607.956524] [<c0190cfe>] sync_buffer+0x0/0x33
    Jul 18 08:30:02 core kernel: [607607.956527] [<c02b92ba>] out_of_line_wait_on_bit+0x5f/0x67
    Jul 18 08:30:02 core kernel: [607607.956533] [<c0131a91>] wake_bit_function+0x0/0x3c
    Jul 18 08:30:02 core kernel: [607607.956583] [<c0190cca>] __wait_on_buffer+0x16/0x18
    Jul 18 08:30:02 core kernel: [607607.956593] [<d89b153d>] ext3_find_entry+0x37a/0x515 [ext3]
    Jul 18 08:30:02 core kernel: [607607.957163] [<c01bae24>] security_inode_alloc+0x16/0x17
    Jul 18 08:30:02 core kernel: [607607.957192] [<c0184900>] alloc_inode+0x12e/0x186
    Jul 18 08:30:02 core kernel: [607607.957210] [<c0184ce9>] iget_locked+0x5b/0x100
    Jul 18 08:30:02 core kernel: [607607.957217] [<d89b2bea>] ext3_lookup+0x21/0x9b [ext3]
    Jul 18 08:30:02 core kernel: [607607.957228] [<c017aac3>] do_lookup+0xb6/0x153
    Jul 18 08:30:13 core kernel: [607607.957233] [<c017c6c4>] __link_path_walk+0x726/0xb26
    Jul 18 08:30:13 core kernel: [607607.957239] [<c0186f4c>] mntput_no_expire+0x13/0xd9
    Jul 18 08:30:13 core kernel: [607607.957243] [<c017cafb>] path_walk+0x37/0x70
    Jul 18 08:30:13 core kernel: [607607.957247] [<c017cdaa>] do_path_lookup+0x122/0x184
    Jul 18 08:30:13 core kernel: [607607.957251] [<c017d607>] __user_walk_fd+0x29/0x3a
    Jul 18 08:30:13 core kernel: [607607.957255] [<c0177625>] vfs_lstat_fd+0x12/0x39
    Jul 18 08:30:13 core kernel: [607607.957276] [<c01776b9>] sys_lstat64+0xf/0x23
    Jul 18 08:30:13 core kernel: [607607.957283] [<c0103857>] sysenter_past_esp+0x78/0xb1
    Jul 18 08:30:13 core kernel: [607607.957344] =======================

And, slightly less recently:

    Jun 30 07:44:11 core kernel: [2065298.377450] ionice D 299741d5 0 32588 32441
    Jun 30 07:44:11 core kernel: [2065298.377515] ce11a5e0 00000086 02a1416f 299741d5 000755a5 ce11a76c c1209fc0 00000000
    Jun 30 07:44:11 core kernel: [2065298.377578] c38d5f6c 058eebe6 003d2086 00000000 058eebe6 c38d5f6c 058eebe6 c3b9fd08
    Jun 30 07:44:11 core kernel: [2065298.377598] c1209fc0 00e4f000 c3b9fd08 c12001cc c02b9048 c3b9fd00 00000000 c0190d2e
    Jun 30 07:44:11 core kernel: [2065298.377612] Call Trace:
    Jun 30 07:44:11 core kernel: [2065298.378275] [<c02b9048>] io_schedule+0x49/0x80
    Jun 30 07:44:11 core kernel: [2065298.379280] [<c0190d2e>] sync_buffer+0x30/0x33
    Jun 30 07:44:11 core kernel: [2065298.379325] [<c02b9236>] __wait_on_bit+0x33/0x58
    Jun 30 07:44:11 core kernel: [2065298.379331] [<c0190cfe>] sync_buffer+0x0/0x33
    Jun 30 07:44:11 core kernel: [2065298.379338] [<c0190cfe>] sync_buffer+0x0/0x33
    Jun 30 07:44:11 core kernel: [2065298.379342] [<c02b92ba>] out_of_line_wait_on_bit+0x5f/0x67
    Jun 30 07:44:11 core kernel: [2065298.379348] [<c0131a91>] wake_bit_function+0x0/0x3c
    Jun 30 07:44:11 core kernel: [2065298.379399] [<c0190cca>] __wait_on_buffer+0x16/0x18
    Jun 30 07:44:12 core kernel: [2065298.379415] [<d09af08d>] ext3_bread+0x44/0x5b [ext3]
    Jun 30 07:44:12 core kernel: [2065298.379680] [<d09b0f50>] dx_probe+0x3a/0x2ad [ext3]
    Jun 30 07:44:12 core kernel: [2065298.379692] [<c01e046c>] rb_insert_color+0x4c/0xad
    Jun 30 07:44:12 core kernel: [2065298.379741] [<d09b1280>] ext3_find_entry+0xbd/0x515 [ext3]
    Jun 30 07:44:12 core kernel: [2065298.379753] [<c01344ec>] hrtimer_start+0xf7/0x110
    Jun 30 07:44:12 core kernel: [2065298.379760] [<c01361e0>] getnstimeofday+0x37/0xbc
    Jun 30 07:44:12 core kernel: [2065298.379765] [<c0134658>] ktime_get_ts+0x22/0x49
    Jun 30 07:44:12 core kernel: [2065298.379769] [<c0155174>] delayacct_end+0x70/0x77
    Jun 30 07:44:12 core kernel: [2065298.379788] [<c0156aee>] sync_page+0x0/0x36
    Jun 30 07:44:12 core kernel: [2065298.379803] [<c0155249>] __delayacct_blkio_end+0x56/0x59
    Jun 30 07:44:12 core kernel: [2065298.379810] [<c02b9063>] io_schedule+0x64/0x80
    Jun 30 07:44:12 core kernel: [2065298.379816] [<d09b2bea>] ext3_lookup+0x21/0x9b [ext3]
    Jun 30 07:44:12 core kernel: [2065298.379827] [<c017aac3>] do_lookup+0xb6/0x153
    Jun 30 07:44:12 core kernel: [2065298.379847] [<c017c6c4>] __link_path_walk+0x726/0xb26
    Jun 30 07:44:12 core kernel: [2065298.379852] [<c0131a49>] __wake_up_bit+0x29/0x2e
    Jun 30 07:44:12 core kernel: [2065298.379857] [<c01621a6>] __do_fault+0x30e/0x34d
    Jun 30 07:44:12 core kernel: [2065298.379863] [<c017cafb>] path_walk+0x37/0x70
    Jun 30 07:44:12 core kernel: [2065298.379867] [<c017cdaa>] do_path_lookup+0x122/0x184
    Jun 30 07:44:12 core kernel: [2065298.379872] [<c017d78c>] __path_lookup_intent_open+0x42/0x72
    Jun 30 07:44:12 core kernel: [2065298.379878] [<c017d80b>] path_lookup_open+0xf/0x13
    Jun 30 07:44:12 core kernel: [2065298.379882] [<c0177c98>] open_exec+0x1d/0x94
    Jun 30 07:44:12 core kernel: [2065298.379900] [<c0164be3>] free_pgtables+0x86/0x93
    Jun 30 07:44:12 core kernel: [2065298.379906] [<c0182b46>] dput+0x25/0xbb
    Jun 30 07:44:12 core kernel: [2065298.379912] [<c0178d13>] do_execve+0x48/0x1c6
    Jun 30 07:44:12 core kernel: [2065298.379917] [<c010213b>] sys_execve+0x2a/0x4a
    Jun 30 07:44:12 core kernel: [2065298.379944] [<c0103857>] sysenter_past_esp+0x78/0xb1
    Jun 30 07:44:12 core kernel: [2065298.379984] =======================

I'll point out that ionice is in fact used by the mlocate cron job.
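To compare incidents like the two above without reading each trace by hand, a small parser can pull out which task was stuck in the D (uninterruptible sleep) state and which functions it was waiting in. This is only a sketch: the function name `blocked_tasks` is made up for illustration, and it assumes the header layout seen in these dumps, where I am reading the last two numbers as PID and parent PID and the bracketed kernel timestamp as seconds since boot.

```python
import re

# Header of a blocked-task dump, e.g.
#   "... kernel: [607607.955528] updatedb.mloc D ddadc12f 0 3274 3270"
# Assumption: the last two numbers are the PID and the parent PID.
HEADER = re.compile(
    r"kernel: \[\s*(?P<uptime>\d+\.\d+)\]\s+"
    r"(?P<task>\S+)\s+D\s+[0-9a-f]{8}\s+\d+\s+(?P<pid>\d+)\s+(?P<ppid>\d+)\s*$"
)
# One call-trace frame, e.g. "[<c02b9048>] io_schedule+0x49/0x80"
FRAME = re.compile(r"\[<[0-9a-f]{8}>\]\s+(?P<func>\w+)\+0x")


def blocked_tasks(lines):
    """Group call-trace frames under the D-state task header that precedes them."""
    tasks = []
    current = None
    for line in lines:
        header = HEADER.search(line)
        if header:
            current = {
                "task": header.group("task"),
                "pid": int(header.group("pid")),
                "ppid": int(header.group("ppid")),
                # Assumption: the bracketed timestamp is seconds since boot.
                "uptime_days": float(header.group("uptime")) / 86400,
                "frames": [],
            }
            tasks.append(current)
            continue
        frame = FRAME.search(line)
        if frame and current is not None:
            current["frames"].append(frame.group("func"))
    return tasks
```

Fed the two traces above, this reports `updatedb.mloc` at roughly 7.0 days of uptime and `ionice` at roughly 23.9 days. In both, the first frame is `io_schedule` and the path runs through ext3 directory lookups, so both tasks were waiting on disk reads rather than spinning on the CPU.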

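Edit: The problem seems sporadic. It kills the machine maybe once a week, but it also seems to get worse with uptime. I really don't want to blame the cron job, because I run Debian lenny on nearly every server I install and support, and there is nothing special about this one. Could it be a memory leak? I say it gets worse with uptime because I run Nagios against my VMware host, and typically after 4-6 days I start getting load warnings for one minute in the morning, then for two minutes the next day. I have been trying to get in remotely while it is happening, but when it happens I simply can't connect to the guest VM.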
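Since the guest can't be reached while the problem is happening, one way to test the memory-leak guess is to have the guest record its own memory and load figures, then read them after a reboot. The sketch below is one possible approach, not something from the original setup: the function names are invented, and it relies only on the standard `/proc/meminfo` and `/proc/loadavg` formats.

```python
import time


def parse_meminfo(text):
    """Map /proc/meminfo field names to integer values (kB for most fields)."""
    info = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        fields = rest.split()
        if sep and fields and fields[0].isdigit():
            info[name.strip()] = int(fields[0])
    return info


def parse_loadavg(text):
    """Return the 1, 5 and 15 minute load averages from /proc/loadavg."""
    one, five, fifteen = text.split()[:3]
    return float(one), float(five), float(fifteen)


def snapshot(meminfo_path="/proc/meminfo", loadavg_path="/proc/loadavg"):
    """Build one log line with the current load and key memory figures."""
    with open(meminfo_path) as f:
        mem = parse_meminfo(f.read())
    with open(loadavg_path) as f:
        load1, _, _ = parse_loadavg(f.read())
    stamp = time.strftime("%Y-%m-%d %H:%M:%S")
    return (
        f"{stamp} load1={load1:.2f} "
        f"MemFree={mem.get('MemFree')}kB Cached={mem.get('Cached')}kB "
        f"SwapFree={mem.get('SwapFree')}kB"
    )
```

Appending the output of `snapshot()` to a file once a minute (for example from cron) would show whether free memory and swap shrink steadily over the 4-6 days before the load warnings start, which is what a leak would look like. A flat line would point back toward I/O contention during the morning cron run.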

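Maybe mlocate is a symptom, not the cause. Do you have other cron jobs on the server? Try removing them (if they aren't really necessary), leaving only mlocate, and see whether it happens again. Do you have any filesystems mounted on the server?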
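Both questions can be answered with a quick inventory. The sketch below lists the scripts in `/etc/cron.daily` and flags mounts whose filesystem type is network-backed. It assumes that network mounts are what the mounted-filesystem question is getting at, and the set of types in `NETWORK_FS_TYPES` is my own illustrative choice, not an exhaustive list.

```python
import os

# Assumption: these are the filesystem types worth flagging; extend as needed.
NETWORK_FS_TYPES = {"nfs", "nfs4", "cifs", "smbfs"}


def parse_mounts(text):
    """Return (device, mount_point, fs_type) tuples from /proc/mounts content."""
    mounts = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 3:
            mounts.append((fields[0], fields[1], fields[2]))
    return mounts


def network_mounts(text, fs_types=NETWORK_FS_TYPES):
    """Keep only the mounts whose filesystem type is in fs_types."""
    return [m for m in parse_mounts(text) if m[2] in fs_types]


def daily_jobs(path="/etc/cron.daily"):
    """List the scripts that run alongside mlocate each morning."""
    return sorted(name for name in os.listdir(path) if not name.startswith("."))
```

On the guest, `network_mounts(open("/proc/mounts").read())` and `daily_jobs()` give the two lists. Any job in the second list other than mlocate is a candidate to disable for a few days while watching whether the hangs continue.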