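What happened to my server? It crashed

I installed CentOS 6.2 64-bit and it has been running for 44 days. It suddenly crashed, so I logged in to the KVM to check, and I managed to capture this screen:

http://picpaste.com/1-cgYdKDAy.png (I'm new here, so I can't upload images)

Any idea what could have caused this? I asked the data center to hard-reboot the server; it is fine again now and I can log in over SSH. Which logs should I check?

Update

Here is the requested log from /var/log/messages: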

Jun 28 12:24:27 la-noc lfd[13058]: cannot chdir to /etc/csf from /tmp/.wapi: Permission denied, aborting. at /usr/sbin/lfd line 4790.
Jun 28 12:24:27 la-noc lfd[13058]: daemon stopped
Jun 28 12:25:55 la-noc proftpd[12732]: 96.44.184.123 (115.133.56.39[115.133.56.39]) - Client session idle timeout, disconnected
Jun 28 12:25:55 la-noc proftpd[12732]: 96.44.184.123 (115.133.56.39[115.133.56.39]) - FTP session closed.
Jun 28 12:26:28 la-noc lfd[13114]: cannot chdir to /etc/csf from /tmp/.wapi: Permission denied, aborting. at /usr/sbin/lfd line 4790.
Jun 28 12:26:28 la-noc lfd[13114]: daemon stopped
Jun 28 12:26:42 la-noc lfd[13125]: DynDNS - update IP addresses
Jun 28 12:28:06 la-noc proftpd[13188]: 96.44.184.123 (127.0.0.1[127.0.0.1]) - FTP session opened.
Jun 28 12:28:06 la-noc proftpd[13188]: 96.44.184.123 (127.0.0.1[127.0.0.1]) - FTP session closed.
Jun 28 12:28:28 la-noc lfd[13204]: cannot chdir to /etc/csf from /tmp/.wapi: Permission denied, aborting. at /usr/sbin/lfd line 4790.
Jun 28 12:28:28 la-noc lfd[13204]: daemon stopped
Jun 28 12:28:55 la-noc kernel: Firewall: *TCP_IN Blocked* IN=eth0 OUT= MAC=00:e0:81:43:95:42:00:04:80:5c:17:25:08:00 SRC=79.169.210.214 DST=96.44.184.126 LEN=60 TOS=0x$
Jun 28 12:28:58 la-noc kernel: Firewall: *TCP_IN Blocked* IN=eth0 OUT= MAC=00:e0:81:43:95:42:00:04:80:5c:17:25:08:00 SRC=79.169.210.214 DST=96.44.184.126 LEN=60 TOS=0x$
Jun 28 12:30:29 la-noc lfd[13291]: cannot chdir to /etc/csf from /tmp/.wapi: Permission denied, aborting. at /usr/sbin/lfd line 4790.
Jun 28 12:30:29 la-noc lfd[13291]: daemon stopped
Jun 28 12:31:43 la-noc lfd[13332]: DynDNS - update IP addresses
Jun 28 12:32:29 la-noc lfd[13363]: cannot chdir to /etc/csf from /tmp/.wapi: Permission denied, aborting. at /usr/sbin/lfd line 4790.
Jun 28 12:32:29 la-noc lfd[13363]: daemon stopped
Jun 28 12:34:02 la-noc proftpd[13415]: 96.44.184.123 (127.0.0.1[127.0.0.1]) - FTP session opened.
Jun 28 12:34:02 la-noc proftpd[13415]: 96.44.184.123 (127.0.0.1[127.0.0.1]) - FTP session closed.
Jun 28 12:34:29 la-noc lfd[13434]: cannot chdir to /etc/csf from /tmp/.wapi: Permission denied, aborting. at /usr/sbin/lfd line 4790.
Jun 28 12:34:29 la-noc lfd[13434]: daemon stopped
Jun 28 12:36:29 la-noc lfd[13493]: cannot chdir to /etc/csf from /tmp/.wapi: Permission denied, aborting. at /usr/sbin/lfd line 4790.
Jun 28 12:36:29 la-noc lfd[13493]: daemon stopped
Jun 28 12:36:44 la-noc lfd[13506]: DynDNS - update IP addresses
Jun 28 12:38:29 la-noc lfd[13555]: cannot chdir to /etc/csf from /tmp/.wapi: Permission denied, aborting. at /usr/sbin/lfd line 4790.
Jun 28 12:38:29 la-noc lfd[13555]: daemon stopped
Jun 28 12:39:03 la-noc proftpd[13600]: 96.44.184.123 (127.0.0.1[127.0.0.1]) - FTP session opened.
Jun 28 12:39:03 la-noc proftpd[13600]: 96.44.184.123 (127.0.0.1[127.0.0.1]) - FTP session closed.
Jun 28 12:40:29 la-noc lfd[13648]: cannot chdir to /etc/csf from /tmp/.wapi: Permission denied, aborting. at /usr/sbin/lfd line 4790.
Jun 28 12:40:29 la-noc lfd[13648]: daemon stopped
Jun 28 12:41:44 la-noc lfd[13680]: DynDNS - update IP addresses
Jun 28 12:42:29 la-noc lfd[13712]: cannot chdir to /etc/csf from /tmp/.wapi: Permission denied, aborting. at /usr/sbin/lfd line 4790.
Jun 28 12:42:29 la-noc lfd[13712]: daemon stopped
Jun 28 12:44:29 la-noc lfd[13771]: cannot chdir to /etc/csf from /tmp/.wapi: Permission denied, aborting. at /usr/sbin/lfd line 4790.
Jun 28 12:44:29 la-noc lfd[13771]: daemon stopped
Jun 28 12:44:30 la-noc proftpd[13781]: 96.44.184.123 (127.0.0.1[127.0.0.1]) - FTP session opened.
Jun 28 12:44:30 la-noc proftpd[13781]: 96.44.184.123 (127.0.0.1[127.0.0.1]) - FTP session closed.
Jun 28 15:56:26 la-noc kernel: imklog 4.6.2, log source = /proc/kmsg started.
Jun 28 15:56:26 la-noc rsyslogd: [origin software="rsyslogd" swVersion="4.6.2" x-pid="1459" x-info="http://www.rsyslog.com"] (re)start
Jun 28 15:56:26 la-noc kernel: Initializing cgroup subsys cpuset
Jun 28 15:56:26 la-noc kernel: Initializing cgroup subsys cpu
Jun 28 15:56:26 la-noc kernel: Linux version 2.6.32-220.17.1.el6.x86_64 ([email protected].centos.org) (gcc version 4.4.6 20110731 (Red Hat 4.4.6-3) (GCC) ) #1 S$
Jun 28 15:56:26 la-noc kernel: Command line: ro root=/dev/mapper/VolGroup-lv_root rd_NO_LUKS LANG=en_US.UTF-8 rd_NO_MD rd_LVM_LV=VolGroup/lv_swap SYSFONT=latarcyrheb-s$
Jun 28 15:56:26 la-noc kernel: KERNEL supported cpus:
Jun 28 15:56:26 la-noc kernel: Intel GenuineIntel
Jun 28 15:56:26 la-noc kernel: AMD AuthenticAMD
Jun 28 15:56:26 la-noc kernel: Centaur CentaurHauls
Jun 28 15:56:26 la-noc kernel: BIOS-provided physical RAM map:
Jun 28 15:56:26 la-noc kernel: BIOS-e820: 0000000000000000 - 000000000009f400 (usable)
Jun 28 15:56:26 la-noc kernel: BIOS-e820: 000000000009f400 - 00000000000a0000 (reserved)
Jun 28 15:56:26 la-noc kernel: BIOS-e820: 00000000000e0000 - 0000000000100000 (reserved)
Jun 28 15:56:26 la-noc kernel: BIOS-e820: 0000000000100000 - 00000000fbff0000 (usable)
Jun 28 15:56:26 la-noc kernel: BIOS-e820: 00000000fbff0000 - 00000000fbfff000 (ACPI data)
Jun 28 15:56:26 la-noc kernel: BIOS-e820: 00000000fbfff000 - 00000000fc000000 (ACPI NVS)
Jun 28 15:56:26 la-noc kernel: BIOS-e820: 00000000ff780000 - 0000000100000000 (reserved)
Jun 28 15:56:26 la-noc kernel: BIOS-e820: 0000000100000000 - 0000000400000000 (usable)
Jun 28 15:56:26 la-noc kernel: DMI 2.3 present.
Jun 28 15:56:26 la-noc kernel: SMBIOS version 2.3 @ 0xF7570
Jun 28 15:56:26 la-noc kernel: AMI BIOS detected: BIOS may corrupt low RAM, working around it.
Jun 28 15:56:26 la-noc kernel: last_pfn = 0x400000 max_arch_pfn = 0x400000000
Jun 28 15:56:26 la-noc kernel: x86 PAT enabled: cpu 0, old 0x7040600070406, new 0x7010600070106
Jun 28 15:56:26 la-noc kernel: total RAM covered: 16320M
Jun 28 15:56:26 la-noc kernel: Found optimal setting for mtrr clean up
Jun 28 15:56:26 la-noc kernel: gran_size: 64K chunk_size: 128M num_reg: 4 lose cover RAM: 0G
Jun 28 15:56:26 la-noc kernel: last_pfn = 0xfbff0 max_arch_pfn = 0x400000000
Jun 28 15:56:26 la-noc kernel: init_memory_mapping: 0000000000000000-00000000fbff0000
Jun 28 15:56:26 la-noc kernel: init_memory_mapping: 0000000100000000-0000000400000000
Jun 28 15:56:26 la-noc kernel: RAMDISK: 37217000 - 37fefcd2
Jun 28 15:56:26 la-noc kernel: ACPI: RSDP 00000000000f6f20 00024 (v02 ACPIAM)
Jun 28 15:56:26 la-noc kernel: ACPI: XSDT 00000000fbff0100 00054 (v01 AMI OEMXSDT 07000626 MSFT 00000097)
Jun 28 15:56:26 la-noc kernel: ACPI: FACP 00000000fbff0281 000F4 (v01 AMI OEMFACP 07000626 MSFT 00000097)
Jun 28 15:56:26 la-noc kernel: ACPI: DSDT 00000000fbff0410 03751 (v01 0AAAA 0AAAA000 00000000 INTL 02002026)
Jun 28 15:56:26 la-noc kernel: ACPI: FACS 00000000fbfff000 00040
Jun 28 15:56:26 la-noc kernel: ACPI: APIC 00000000fbff0380 00084 (v01 AMI OEMAPIC 07000626 MSFT 00000097)
Jun 28 15:56:26 la-noc kernel: ACPI: OEMB 00000000fbfff040 00041 (v01 AMI OEMBIOS 07000626 MSFT 00000097)
Jun 28 15:56:26 la-noc kernel: ACPI: SRAT 00000000fbff3b70 00110 (v01 AMI OEMSRAT 07000626 MSFT 00000097)
Jun 28 15:56:26 la-noc kernel: ACPI: ASF! 00000000fbff3cc0 00086 (v01 AMIASF AMDSTRET 00000001 INTL 02002026)
Jun 28 15:56:26 la-noc kernel: SRAT: PXM 0 -> APIC 0 -> Node 0
Jun 28 15:56:26 la-noc kernel: SRAT: PXM 0 -> APIC 1 -> Node 0
Jun 28 15:56:26 la-noc kernel: SRAT: PXM 1 -> APIC 2 -> Node 1
Jun 28 15:56:26 la-noc kernel: SRAT: PXM 1 -> APIC 3 -> Node 1
Jun 28 15:56:26 la-noc kernel: SRAT: Node 0 PXM 0 100000-fc000000
Jun 28 15:56:26 la-noc kernel: SRAT: Node 1 PXM 1 200000000-400000000
Jun 28 15:56:26 la-noc kernel: SRAT: Node 0 PXM 0 100000000-200000000
Jun 28 15:56:26 la-noc kernel: SRAT: Node 0 PXM 0 0-9fc00
Jun 28 15:56:26 la-noc kernel: Bootmem setup node 0 0000000000000000-0000000200000000
Jun 28 15:56:26 la-noc kernel: NODE_DATA [0000000000028040 - 000000000005c03f]
Jun 28 15:56:26 la-noc kernel: bootmap [000000000005d000 - 000000000009cfff] pages 40
Jun 28 15:56:26 la-noc kernel: (9 early reservations) ==> bootmem [0000000000 - 0200000000]
Jun 28 15:56:26 la-noc kernel: #0 [0000000000 - 0000001000] BIOS data page ==> [0000000000 - 0000001000]
Jun 28 15:56:26 la-noc kernel: #1 [0000006000 - 0000008000] TRAMPOLINE ==> [0000006000 - 0000008000]
Jun 28 15:56:26 la-noc kernel: #2 [0001000000 - 000200c864] TEXT DATA BSS ==> [0001000000 - 000200c864]
Jun 28 15:56:26 la-noc kernel: #3 [0037217000 - 0037fefcd2] RAMDISK ==> [0037217000 - 0037fefcd2]
Jun 28 15:56:26 la-noc kernel: #4 [000009f400 - 0000100000] BIOS reserved ==> [000009f400 - 0000100000]

Update: here is the sar output:

root@la-noc [~]# sar
Linux 2.6.32-220.13.1.el6.x86_64 (server.abc.com) 06/28/2012 _x86_64_ (4 CPU)

12:00:01 AM     CPU     %user     %nice   %system   %iowait    %steal     %idle
12:10:01 AM     all      0.87      0.01      0.34      0.35      0.00     98.44
12:20:01 AM     all      0.51      0.01      0.25      0.18      0.00     99.04
12:30:01 AM     all      0.62      0.01      0.26      0.22      0.00     98.89
12:40:01 AM     all      0.78      0.01      0.31      0.27      0.00     98.63
12:50:01 AM     all      0.52      0.01      0.25      0.18      0.00     99.04
01:00:01 AM     all      0.71      0.01      0.25      0.22      0.00     98.81
01:10:01 AM     all      0.61      0.19      0.33      0.33      0.00     98.54
01:20:01 AM     all      0.51      0.01      0.24      0.19      0.00     99.05
01:30:01 AM     all      0.55      0.01      0.26      0.21      0.00     98.97
01:40:01 AM     all      0.56      0.01      0.31      0.21      0.00     98.92
01:50:01 AM     all      0.40      0.01      0.21      0.18      0.00     99.20
02:00:01 AM     all      0.55      0.01      0.25      0.23      0.00     98.96
02:10:01 AM     all      0.60      0.01      0.29      0.36      0.00     98.75
02:20:01 AM     all      0.66      0.01      0.24      0.19      0.00     98.91
02:30:01 AM     all      2.65      0.01      0.43      0.24      0.00     96.66
02:40:01 AM     all      1.90      0.01      0.54      0.26      0.00     97.29
02:50:01 AM     all      3.31      0.02      0.54      0.31      0.00     95.82
03:00:01 AM     all      1.48      0.01      0.33      0.27      0.00     97.91
03:10:01 AM     all      0.88      0.01      0.33      0.44      0.00     98.34
03:20:01 AM     all      0.62      0.19      0.40      0.24      0.00     98.54
03:30:01 AM     all      0.94      0.01      0.41      0.19      0.00     98.45
03:40:01 AM     all      1.17      0.01      0.35      0.21      0.00     98.26
03:50:01 AM     all      0.82      0.02      0.37      0.20      0.00     98.59
04:00:01 AM     all      0.61      0.01      0.30      0.18      0.00     98.91
04:10:01 AM     all      0.66      0.01      0.28      0.35      0.00     98.70
04:20:01 AM     all      0.37      0.01      0.23      0.17      0.00     99.22
04:30:01 AM     all      0.72      0.01      0.25      0.16      0.00     98.86
04:40:01 AM     all      0.83      0.02      0.29      0.18      0.00     98.69
04:50:01 AM     all      0.51      0.01      0.24      0.21      0.00     99.03
05:00:01 AM     all      0.63      0.01      0.25      0.22      0.00     98.89
05:10:01 AM     all      0.80      0.01      0.34      0.39      0.00     98.47
05:20:01 AM     all      0.56      0.19      0.26      0.22      0.00     98.77
05:30:01 AM     all      0.69      0.02      0.35      0.26      0.00     98.69
05:40:01 AM     all      0.79      0.01      0.51      0.24      0.00     98.45
05:50:01 AM     all      0.45      0.01      0.23      0.16      0.00     99.15
06:00:01 AM     all      0.52      0.01      0.26      0.21      0.00     98.99
06:10:01 AM     all      0.95      0.01      0.33      0.44      0.00     98.27
06:20:01 AM     all      0.79      0.02      0.30      0.24      0.00     98.65
06:30:01 AM     all      1.16      0.01      0.31      0.20      0.00     98.33
06:40:01 AM     all      0.70      0.01      0.29      0.23      0.00     98.77
06:50:01 AM     all      0.77      0.01      0.25      0.21      0.00     98.77
07:00:01 AM     all      0.76      0.01      0.27      0.26      0.00     98.70

07:00:01 AM     CPU     %user     %nice   %system   %iowait    %steal     %idle
07:10:01 AM     all      0.68      0.20      0.32      0.40      0.00     98.40
07:20:01 AM     all      1.03      0.01      0.37      0.21      0.00     98.38
07:30:01 AM     all      0.67      0.01      0.25      0.19      0.00     98.89
07:40:01 AM     all      0.77      0.01      0.31      0.25      0.00     98.66
07:50:01 AM     all      1.09      0.01      0.30      0.33      0.00     98.27
08:00:01 AM     all      1.27      0.02      0.36      0.23      0.00     98.13
08:10:01 AM     all      0.70      0.01      0.29      0.37      0.00     98.64
08:20:01 AM     all      0.54      0.01      0.24      0.19      0.00     99.03
08:30:01 AM     all      0.73      0.01      0.27      0.27      0.00     98.73
08:40:01 AM     all      0.67      0.01      0.28      0.27      0.00     98.77
08:50:01 AM     all      0.48      0.02      0.23      0.16      0.00     99.11
09:00:01 AM     all      0.52      0.01      0.24      0.21      0.00     99.02
09:10:01 AM     all      0.63      0.18      0.32      0.34      0.00     98.52
09:20:01 AM     all      0.86      0.01      0.31      0.23      0.00     98.60
09:30:01 AM     all      0.84      0.01      0.28      0.29      0.00     98.57
09:40:01 AM     all      1.36      0.02      0.34      0.27      0.00     98.01
09:50:01 AM     all      1.12      0.01      0.31      0.26      0.00     98.29
10:00:01 AM     all      0.49      0.01      0.25      0.20      0.00     99.05
10:10:01 AM     all      0.55      0.01      0.26      0.34      0.00     98.84
10:20:01 AM     all      0.61      0.01      0.27      0.23      0.00     98.89
10:30:01 AM     all      0.76      0.02      0.28      0.28      0.00     98.66
10:40:01 AM     all      0.60      0.01      0.30      0.25      0.00     98.84
10:50:01 AM     all      0.71      0.01      0.37      0.27      0.00     98.65
11:00:01 AM     all      0.58      0.01      0.35      0.25      0.00     98.81
11:10:01 AM     all      1.03      0.21      0.44      0.43      0.00     97.89
11:20:01 AM     all      0.74      0.02      0.27      0.26      0.00     98.72
11:30:01 AM     all      0.78      0.01      0.27      0.29      0.00     98.66
11:40:01 AM     all      0.79      0.01      0.29      0.20      0.00     98.70
11:50:01 AM     all      0.90      0.01      0.55      0.54      0.00     98.00
12:00:01 PM     all      0.84      0.01      0.53      0.73      0.00     97.89
12:10:01 PM     all      0.92      0.02      0.90      1.50      0.00     96.66
12:20:01 PM     all      0.87      0.01      0.87      1.44      0.00     96.81
12:30:01 PM     all      0.89      0.01      0.86      1.42      0.00     96.82
12:40:01 PM     all      0.88      0.01      0.86      1.31      0.00     96.93
Average:        all      0.82      0.02      0.34      0.32      0.00     98.49

03:56:19 PM       LINUX RESTART

04:00:01 PM     CPU     %user     %nice   %system   %iowait    %steal     %idle
04:10:01 PM     all      0.96      0.19      0.41      1.10      0.00     97.34
04:20:01 PM     all      0.47      0.01      0.22      0.30      0.00     99.00
04:30:01 PM     all      0.52      0.01      0.24      0.33      0.00     98.90
04:40:01 PM     all      0.88      0.02      0.33      0.65      0.00     98.12
04:50:01 PM     all      1.35      0.01      0.30      0.27      0.00     98.06
05:00:01 PM     all      0.66      0.01      0.26      0.26      0.00     98.82
05:10:01 PM     all      0.46      0.01      0.23      0.23      0.00     99.08
05:20:01 PM     all      0.51      0.01      0.22      0.23      0.00     99.03
05:30:01 PM     all      0.64      0.01      0.30      0.26      0.00     98.78
05:40:01 PM     all      0.73      0.01      0.29      0.41      0.00     98.56
05:50:01 PM     all      0.60      0.01      0.22      0.23      0.00     98.94
06:00:01 PM     all      0.61      0.01      0.35      0.26      0.00     98.78
06:10:01 PM     all      0.55      0.01      0.26      0.29      0.00     98.89
06:20:01 PM     all      0.67      0.21      0.27      0.31      0.00     98.55
06:30:01 PM     all      1.07      0.01      0.36      0.33      0.00     98.23
06:40:01 PM     all      0.95      0.01      0.51      0.39      0.00     98.14
06:50:01 PM     all      0.75      0.01      0.39      0.24      0.00     98.61
07:00:01 PM     all      0.84      0.01      0.50      0.23      0.00     98.43
Average:        all      0.73      0.03      0.31      0.35      0.00     98.57
root@la-noc [~]#

Update: I was uploading a large number of video files to the server over FTP at 1.1 Mbps. Could a failing hard disk have caused the server to die?

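That is (the bottom of) a kernel panic output; the interesting part is at the top. Since the server has already been rebooted, your best bet is to look for errors in /var/log/messages.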
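One way to find where to start reading in /var/log/messages is to look for the longest silence between two consecutive entries, since a hung or panicked machine stops logging until it is rebooted. The following is only a rough sketch of that idea: the helper names `parse_stamp` and `largest_gap` are made up for illustration, and it assumes the classic syslog timestamp format (`Jun 28 12:44:30`, with no year in the line) and an English month-name locale.

```python
import re
from datetime import datetime

# Classic syslog prefix, e.g. "Jun 28 12:44:30 host ..." (the year is not logged).
STAMP = re.compile(r"^([A-Z][a-z]{2}\s+\d{1,2} \d{2}:\d{2}:\d{2}) ")


def parse_stamp(line, year=2012):
    """Return the entry's timestamp, or None if the line has no syslog prefix.

    The year is an assumption supplied by the caller, because syslog omits it.
    """
    match = STAMP.match(line)
    if match is None:
        return None
    return datetime.strptime(f"{year} {match.group(1)}", "%Y %b %d %H:%M:%S")


def largest_gap(lines, year=2012):
    """Return (gap, last_line_before, first_line_after) for the longest silence.

    Returns None when fewer than two timestamped lines are present.
    """
    best = None
    prev_time, prev_line = None, None
    for line in lines:
        stamp = parse_stamp(line, year)
        if stamp is None:
            continue  # skip continuation lines or anything without a timestamp
        if prev_time is not None:
            gap = stamp - prev_time
            if best is None or gap > best[0]:
                best = (gap, prev_line.rstrip("\n"), line.rstrip("\n"))
        prev_time, prev_line = stamp, line
    return best
```

Called as `largest_gap(open("/var/log/messages"))`, it returns the gap together with the last line logged before the silence and the first line after it. A kernel panic often leaves nothing in the file itself, so the lines just before the gap are context rather than proof of the cause.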

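Do you have sysstat (the sar command) installed? If so, it can give you very useful historical information about server load, memory usage, disk IOPS, and so on. It won't give you a definitive answer, but knowing what the server was doing before the kernel panic always helps.

If you don't have it installed, I would install it for future use.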
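If the sar history is long, it can help to pull out just the samples recorded right before the restart marker. The sketch below is one possible way to do that with saved `sar` CPU output; the function name `samples_before_restart` is hypothetical, and the parser assumes the 12-hour clock and the six-column CPU layout (%user, %nice, %system, %iowait, %steal, %idle) seen in the question's output. Other locales or sysstat versions may print different columns.

```python
import re

# One CPU sample row, e.g. "12:40:01 PM all 0.88 0.01 0.86 1.31 0.00 96.93".
ROW = re.compile(
    r"^(\d{2}:\d{2}:\d{2} [AP]M)\s+all\s+"
    r"([\d.]+)\s+([\d.]+)\s+([\d.]+)\s+([\d.]+)\s+([\d.]+)\s+([\d.]+)\s*$"
)
FIELDS = ("user", "nice", "system", "iowait", "steal", "idle")


def samples_before_restart(lines, count=3):
    """Return the last `count` CPU samples that precede "LINUX RESTART".

    Each sample is (time_string, {field: float}). Header and "Average:" rows
    are skipped. If no restart marker is found, the last samples of the whole
    input are returned instead.
    """
    samples = []
    for raw in lines:
        line = raw.strip()
        if "LINUX RESTART" in line:
            break  # everything after this belongs to the next boot
        match = ROW.match(line)
        if match:
            values = [float(v) for v in match.groups()[1:]]
            samples.append((match.group(1), dict(zip(FIELDS, values))))
    return samples[-count:]
```

For example, `samples_before_restart(text.splitlines(), count=4)` on the saved output lists the final four ten-minute intervals before the reboot, which makes it easier to compare %iowait and %system there against the rest of the day.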

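There could be many causes: misbehaving hardware, misbehaving people, cooling, and so on. Without the full stack trace it is hard to diagnose.

I would say check some basic things:

* Keep your software (including the kernel) up to date.
* Make sure only Linux-compatible hardware is used.
* Check your system/CPU temperatures. Use your RAID controller utility to check that the RAID array and the hard disks are healthy.
* Enable kernel core dumps (Google for instructions).
* If the server can be taken offline, run some stress tests (bonnie++ / fio / iozone) and capture the data with sar.

Cheers, Chida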