Debian server reboots unexpectedly

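My lab's Debian-Wheezy-7.8-Stable server has rebooted several times within a few hours of uptime, without any notice. The server is set up for fairly high-load numerical and parallel computation. I have printed the output of last reboot and the contents of /var/log/messages below, but I find the log hard to understand. I tried to find entries from just before the reboot time in /var/log/messages, but the file only seems to contain messages written after the reboot.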

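I searched around and found that other people have had the same problem, but the cause seems to differ from case to case, and /var/log/messages seems to be the key to solving it. Does my /var/log/messages actually describe this unwanted reboot event? And how can a beginner start learning to read this log? I mean, are there any important keywords to look for?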

Thanks for any help.

last reboot

 reboot system boot 3.2.0-4-amd64 Wed May 20 03:29 - 12:43 (09:14)
 reboot system boot 3.2.0-4-amd64 Tue May 19 16:01 - 12:43 (20:42)

/var/log/messages

 May 18 07:35:01 labserver rsyslogd: [origin software="rsyslogd" swVersion="5.8.11" x-pid="2400" x-info="http://www.rsyslog.com"] rsyslogd was HUPed May 19 07:35:01 labserver rsyslogd: [origin software="rsyslogd" swVersion="5.8.11" x-pid="2400" x-info="http://www.rsyslog.com"] rsyslogd was HUPed May 19 16:01:19 labserver kernel: imklog 5.8.11, log source = /proc/kmsg started. May 19 16:01:19 labserver rsyslogd: [origin software="rsyslogd" swVersion="5.8.11" x-pid="2401" x-info="http://www.rsyslog.com"] start May 19 16:01:19 labserver kernel: [ 0.000000] Initializing cgroup subsys cpuset May 19 16:01:19 labserver kernel: [ 0.000000] Initializing cgroup subsys cpu May 19 16:01:19 labserver kernel: [ 0.000000] Linux version 3.2.0-4-amd64 ([email protected]) (gcc version 4.6.3 (Debian 4.6.3-14) ) #1 SMP Debian 3.2.65-1+deb7u2 May 19 16:01:19 labserver kernel: [ 0.000000] Command line: BOOT_IMAGE=/boot/vmlinuz-3.2.0-4-amd64 root=UUID=1fc245ac-9058-4208-862a-7f4e8e1b20b2 ro text May 19 16:01:19 labserver kernel: [ 0.000000] BIOS-provided physical RAM map: May 19 16:01:19 labserver kernel: [ 0.000000] BIOS-e820: 0000000000000000 - 000000000009ac00 (usable) May 19 16:01:19 labserver kernel: [ 0.000000] BIOS-e820: 000000000009ac00 - 00000000000a0000 (reserved) May 19 16:01:19 labserver kernel: [ 0.000000] BIOS-e820: 00000000000e0000 - 0000000000100000 (reserved) May 19 16:01:19 labserver kernel: [ 0.000000] BIOS-e820: 0000000000100000 - 000000007df71000 (usable) May 19 16:01:19 labserver kernel: [ 0.000000] BIOS-e820: 000000007df71000 - 000000007e0f1000 (reserved) May 19 16:01:19 labserver kernel: [ 0.000000] BIOS-e820: 000000007e0f1000 - 000000007e2ec000 (ACPI NVS) May 19 16:01:19 labserver kernel: [ 0.000000] BIOS-e820: 000000007e2ec000 - 000000007f367000 (reserved) May 19 16:01:19 labserver kernel: [ 0.000000] BIOS-e820: 000000007f367000 - 000000007f800000 (ACPI NVS) May 19 16:01:19 labserver kernel: [ 0.000000] BIOS-e820: 0000000080000000 - 0000000090000000 (reserved) 
May 19 16:01:19 labserver kernel: [ 0.000000] BIOS-e820: 00000000fed1c000 - 00000000fed40000 (reserved) May 19 16:01:19 labserver kernel: [ 0.000000] BIOS-e820: 00000000ff000000 - 0000000100000000 (reserved) May 19 16:01:19 labserver kernel: [ 0.000000] BIOS-e820: 0000000100000000 - 0000000880000000 (usable) May 19 16:01:19 labserver kernel: [ 0.000000] NX (Execute Disable) protection: active May 19 16:01:19 labserver kernel: [ 0.000000] SMBIOS 2.7 present. May 19 16:01:19 labserver kernel: [ 0.000000] No AGP bridge found May 19 16:01:19 labserver kernel: [ 0.000000] last_pfn = 0x880000 max_arch_pfn = 0x400000000 May 19 16:01:19 labserver kernel: [ 0.000000] x86 PAT enabled: cpu 0, old 0x7040600070406, new 0x7010600070106 May 19 16:01:19 labserver kernel: [ 0.000000] last_pfn = 0x7df71 max_arch_pfn = 0x400000000 May 19 16:01:19 labserver kernel: [ 0.000000] found SMP MP-table at [ffff8800000fd900] fd900 May 19 16:01:19 labserver kernel: [ 0.000000] Using GB pages for direct mapping May 19 16:01:19 labserver kernel: [ 0.000000] init_memory_mapping: 0000000000000000-000000007df71000 May 19 16:01:19 labserver kernel: [ 0.000000] init_memory_mapping: 0000000100000000-0000000880000000 May 19 16:01:19 labserver kernel: [ 0.000000] RAMDISK: 36bea000 - 375ed000 May 19 16:01:19 labserver kernel: [ 0.000000] ACPI: RSDP 00000000000f04a0 00024 (v02 ALASKA) May 19 16:01:19 labserver kernel: [ 0.000000] ACPI: XSDT 000000007e204088 0008C (v01 ALASKA AMI 01072009 AMI 00010013) May 19 16:01:19 labserver kernel: [ 0.000000] ACPI: FACP 000000007e211040 0010C (v05 ALASKA AMI 01072009 AMI 00010013) May 19 16:01:19 labserver kernel: [ 0.000000] ACPI Warning: FADT (revision 5) is longer than ACPI 2.0 version, truncating length 268 to 244 (20110623/tbfadt-288) May 19 16:01:19 labserver kernel: [ 0.000000] ACPI: DSDT 000000007e2041a8 0CE96 (v02 ALASKA AMI 00000015 INTL 20051117) May 19 16:01:19 labserver kernel: [ 0.000000] ACPI: FACS 000000007e2e3080 00040 May 19 16:01:19 labserver 
kernel: [ 0.000000] ACPI: APIC 000000007e211150 00100 (v03 ALASKA AMI 01072009 AMI 00010013) May 19 16:01:19 labserver kernel: [ 0.000000] ACPI: FPDT 000000007e211250 00044 (v01 ALASKA AMI 01072009 AMI 00010013) May 19 16:01:19 labserver kernel: [ 0.000000] ACPI: MCFG 000000007e211298 0003C (v01 ALASKA OEMMCFG. 01072009 MSFT 00000097) May 19 16:01:19 labserver kernel: [ 0.000000] ACPI: HPET 000000007e2112d8 00038 (v01 ALASKA AMI 01072009 AMI. 00000005) May 19 16:01:19 labserver kernel: [ 0.000000] ACPI: PRAD 000000007e211310 000BE (v02 PRADID PRADTID 00000001 MSFT 03000001) May 19 16:01:19 labserver kernel: [ 0.000000] ACPI: SPMI 000000007e2113d0 00040 (v05 AMI OEMSPMI 00000000 AMI. 00000000) May 19 16:01:19 labserver kernel: [ 0.000000] ACPI: SSDT 000000007e211410 D0CB0 (v02 INTEL CpuPm 00004000 INTL 20051117) May 19 16:01:19 labserver kernel: [ 0.000000] ACPI: EINJ 000000007e2e20c0 00130 (v01 AMI AMI EINJ 00000000 00000000) May 19 16:01:19 labserver kernel: [ 0.000000] ACPI: ERST 000000007e2e21f0 00230 (v01 AMIER AMI ERST 00000000 00000000) May 19 16:01:19 labserver kernel: [ 0.000000] ACPI: HEST 000000007e2e2420 000A8 (v01 AMI AMI HEST 00000000 00000000) May 19 16:01:19 labserver kernel: [ 0.000000] ACPI: BERT 000000007e2e24c8 00030 (v01 AMI AMI BERT 00000000 00000000) May 19 16:01:19 labserver kernel: [ 0.000000] ACPI: DMAR 000000007e2e24f8 000C4 (v01 AMI OEMDMAR 00000001 INTL 00000001) May 19 16:01:19 labserver kernel: [ 0.000000] No NUMA configuration found May 19 16:01:19 labserver kernel: [ 0.000000] Faking a node at 0000000000000000-0000000880000000 May 19 16:01:19 labserver kernel: [ 0.000000] Initmem setup node 0 0000000000000000-0000000880000000 May 19 16:01:19 labserver kernel: [ 0.000000] NODE_DATA [000000087fffb000 - 000000087fffffff] May 19 16:01:19 labserver kernel: [ 0.000000] Zone PFN ranges: May 19 16:01:19 labserver kernel: [ 0.000000] DMA 0x00000010 -> 0x00001000 May 19 16:01:19 labserver kernel: [ 0.000000] DMA32 0x00001000 -> 0x00100000 May 
19 16:01:19 labserver kernel: [ 0.000000] Normal 0x00100000 -> 0x00880000 May 19 16:01:19 labserver kernel: [ 0.000000] Movable zone start PFN for each node May 19 16:01:19 labserver kernel: [ 0.000000] early_node_map[3] active PFN ranges May 19 16:01:19 labserver kernel: [ 0.000000] 0: 0x00000010 -> 0x0000009a May 19 16:01:19 labserver kernel: [ 0.000000] 0: 0x00000100 -> 0x0007df71 May 19 16:01:19 labserver kernel: [ 0.000000] 0: 0x00100000 -> 0x00880000 May 19 16:01:19 labserver kernel: [ 0.000000] ACPI: PM-Timer IO Port: 0x408 May 19 16:01:19 labserver kernel: [ 0.000000] ACPI: LAPIC (acpi_id[0x00] lapic_id[0x00] enabled) May 19 16:01:19 labserver kernel: [ 0.000000] ACPI: LAPIC (acpi_id[0x02] lapic_id[0x02] enabled) May 19 16:01:19 labserver kernel: [ 0.000000] ACPI: LAPIC (acpi_id[0x04] lapic_id[0x04] enabled) May 19 16:01:19 labserver kernel: [ 0.000000] ACPI: LAPIC (acpi_id[0x06] lapic_id[0x06] enabled) May 19 16:01:19 labserver kernel: [ 0.000000] ACPI: LAPIC (acpi_id[0x08] lapic_id[0x08] enabled) May 19 16:01:19 labserver kernel: [ 0.000000] ACPI: LAPIC (acpi_id[0x0a] lapic_id[0x0a] enabled) May 19 16:01:19 labserver kernel: [ 0.000000] ACPI: LAPIC (acpi_id[0x01] lapic_id[0x01] enabled) May 19 16:01:19 labserver kernel: [ 0.000000] ACPI: LAPIC (acpi_id[0x03] lapic_id[0x03] enabled) May 19 16:01:19 labserver kernel: [ 0.000000] ACPI: LAPIC (acpi_id[0x05] lapic_id[0x05] enabled) May 19 16:01:19 labserver kernel: [ 0.000000] ACPI: LAPIC (acpi_id[0x07] lapic_id[0x07] enabled) May 19 16:01:19 labserver kernel: [ 0.000000] ACPI: LAPIC (acpi_id[0x09] lapic_id[0x09] enabled) May 19 16:01:19 labserver kernel: [ 0.000000] ACPI: LAPIC (acpi_id[0x0b] lapic_id[0x0b] enabled) May 19 16:01:19 labserver kernel: [ 0.000000] ACPI: LAPIC_NMI (acpi_id[0x00] high edge lint[0x1]) May 19 16:01:19 labserver kernel: [ 0.000000] ACPI: LAPIC_NMI (acpi_id[0x02] high edge lint[0x1]) May 19 16:01:19 labserver kernel: [ 0.000000] ACPI: LAPIC_NMI (acpi_id[0x04] high edge lint[0x1]) May 
19 16:01:19 labserver kernel: [ 0.000000] ACPI: LAPIC_NMI (acpi_id[0x06] high edge lint[0x1]) May 19 16:01:19 labserver kernel: [ 0.000000] ACPI: LAPIC_NMI (acpi_id[0x08] high edge lint[0x1]) May 19 16:01:19 labserver kernel: [ 0.000000] ACPI: LAPIC_NMI (acpi_id[0x0a] high edge lint[0x1]) May 19 16:01:19 labserver kernel: [ 0.000000] ACPI: LAPIC_NMI (acpi_id[0x01] high edge lint[0x1]) May 19 16:01:19 labserver kernel: [ 0.000000] ACPI: LAPIC_NMI (acpi_id[0x03] high edge lint[0x1]) May 19 16:01:19 labserver kernel: [ 0.000000] ACPI: LAPIC_NMI (acpi_id[0x05] high edge lint[0x1]) May 19 16:01:19 labserver kernel: [ 0.000000] ACPI: LAPIC_NMI (acpi_id[0x07] high edge lint[0x1]) May 19 16:01:19 labserver kernel: [ 0.000000] ACPI: LAPIC_NMI (acpi_id[0x09] high edge lint[0x1]) May 19 16:01:19 labserver kernel: [ 0.000000] ACPI: LAPIC_NMI (acpi_id[0x0b] high edge lint[0x1]) May 19 16:01:19 labserver kernel: [ 0.000000] ACPI: IOAPIC (id[0x00] address[0xfec00000] gsi_base[0]) May 19 16:01:19 labserver kernel: [ 0.000000] IOAPIC[0]: apic_id 0, version 32, address 0xfec00000, GSI 0-23 May 19 16:01:19 labserver kernel: [ 0.000000] ACPI: IOAPIC (id[0x02] address[0xfec01000] gsi_base[24]) May 19 16:01:19 labserver kernel: [ 0.000000] IOAPIC[1]: apic_id 2, version 32, address 0xfec01000, GSI 24-47 May 19 16:01:19 labserver kernel: [ 0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl) May 19 16:01:19 labserver kernel: [ 0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level) May 19 16:01:19 labserver kernel: [ 0.000000] Using ACPI (MADT) for SMP configuration information May 19 16:01:19 labserver kernel: [ 0.000000] ACPI: HPET id: 0x8086a701 base: 0xfed00000 May 19 16:01:19 labserver kernel: [ 0.000000] SMP: Allowing 12 CPUs, 0 hotplug CPUs May 19 16:01:19 labserver kernel: [ 0.000000] PM: Registered nosave memory: 000000000009a000 - 000000000009b000 May 19 16:01:19 labserver kernel: [ 0.000000] PM: Registered nosave memory: 000000000009b000 - 
00000000000a0000 May 19 16:01:19 labserver kernel: [ 0.000000] PM: Registered nosave memory: 00000000000a0000 - 00000000000e0000 May 19 16:01:19 labserver kernel: [ 0.000000] PM: Registered nosave memory: 00000000000e0000 - 0000000000100000 May 19 16:01:19 labserver kernel: [ 0.000000] PM: Registered nosave memory: 000000007df71000 - 000000007e0f1000 May 19 16:01:19 labserver kernel: [ 0.000000] PM: Registered nosave memory: 000000007e0f1000 - 000000007e2ec000 May 19 16:01:19 labserver kernel: [ 0.000000] PM: Registered nosave memory: 000000007e2ec000 - 000000007f367000 May 19 16:01:19 labserver kernel: [ 0.000000] PM: Registered nosave memory: 000000007f367000 - 000000007f800000 May 19 16:01:19 labserver kernel: [ 0.000000] PM: Registered nosave memory: 000000007f800000 - 0000000080000000 May 19 16:01:19 labserver kernel: [ 0.000000] PM: Registered nosave memory: 0000000080000000 - 0000000090000000 May 19 16:01:19 labserver kernel: [ 0.000000] PM: Registered nosave memory: 0000000090000000 - 00000000fed1c000 May 19 16:01:19 labserver kernel: [ 0.000000] PM: Registered nosave memory: 00000000fed1c000 - 00000000fed40000 May 19 16:01:19 labserver kernel: [ 0.000000] PM: Registered nosave memory: 00000000fed40000 - 00000000ff000000 May 19 16:01:19 labserver kernel: [ 0.000000] PM: Registered nosave memory: 00000000ff000000 - 0000000100000000 May 19 16:01:19 labserver kernel: [ 0.000000] Allocating PCI resources starting at 90000000 (gap: 90000000:6ed1c000) May 19 16:01:19 labserver kernel: [ 0.000000] Booting paravirtualized kernel on bare hardware May 19 16:01:19 labserver kernel: [ 0.000000] setup_percpu: NR_CPUS:512 nr_cpumask_bits:512 nr_cpu_ids:12 nr_node_ids:1 May 19 16:01:19 labserver kernel: [ 0.000000] PERCPU: Embedded 27 pages/cpu @ffff88087fc00000 s78848 r8192 d23552 u131072 May 19 16:01:19 labserver kernel: [ 0.000000] Built 1 zonelists in Zone order, mobility grouping on. 
Total pages: 8258294 May 19 16:01:19 labserver kernel: [ 0.000000] Policy zone: Normal May 19 16:01:19 labserver kernel: [ 0.000000] Kernel command line: BOOT_IMAGE=/boot/vmlinuz-3.2.0-4-amd64 root=UUID=1fc245ac-9058-4208-862a-7f4e8e1b20b2 ro text May 19 16:01:19 labserver kernel: [ 0.000000] PID hash table entries: 4096 (order: 3, 32768 bytes) May 19 16:01:19 labserver kernel: [ 0.000000] xsave/xrstor: enabled xstate_bv 0x7, cntxt size 0x340 May 19 16:01:19 labserver kernel: [ 0.000000] Checking aperture... May 19 16:01:19 labserver kernel: [ 0.000000] No AGP bridge found May 19 16:01:19 labserver kernel: [ 0.000000] Memory: 32975732k/35651584k available (3434k kernel code, 2130964k absent, 544888k reserved, 3305k data, 576k init) May 19 16:01:19 labserver kernel: [ 0.000000] Hierarchical RCU implementation. May 19 16:01:19 labserver kernel: [ 0.000000] RCU dyntick-idle grace-period acceleration is enabled. May 19 16:01:19 labserver kernel: [ 0.000000] NR_IRQS:33024 nr_irqs:1184 16 May 19 16:01:19 labserver kernel: [ 0.000000] Extended CMOS year: 2000 May 19 16:01:19 labserver kernel: [ 0.000000] Console: colour VGA+ 80x25 May 19 16:01:19 labserver kernel: [ 0.000000] console [tty0] enabled May 19 16:01:19 labserver kernel: [ 0.000000] Fast TSC calibration using PIT May 19 16:01:19 labserver kernel: [ 0.004000] Detected 2100.074 MHz processor. May 19 16:01:19 labserver kernel: [ 0.000003] Calibrating delay loop (skipped), value calculated using timer frequency.. 
4200.14 BogoMIPS (lpj=8400296) May 19 16:01:19 labserver kernel: [ 0.000144] pid_max: default: 32768 minimum: 301 May 19 16:01:19 labserver kernel: [ 0.000253] Security Framework initialized May 19 16:01:19 labserver kernel: [ 0.000324] AppArmor: AppArmor disabled by boot time parameter May 19 16:01:19 labserver kernel: [ 0.002355] Dentry cache hash table entries: 4194304 (order: 13, 33554432 bytes) May 19 16:01:19 labserver kernel: [ 0.011585] Inode-cache hash table entries: 2097152 (order: 12, 16777216 bytes) May 19 16:01:19 labserver kernel: [ 0.015724] Mount-cache hash table entries: 256 May 19 16:01:19 labserver kernel: [ 0.015915] Initializing cgroup subsys cpuacct May 19 16:01:19 labserver kernel: [ 0.015986] Initializing cgroup subsys memory May 19 16:01:19 labserver kernel: [ 0.016063] Initializing cgroup subsys devices May 19 16:01:19 labserver kernel: [ 0.016133] Initializing cgroup subsys freezer May 19 16:01:19 labserver kernel: [ 0.016201] Initializing cgroup subsys net_cls May 19 16:01:19 labserver kernel: [ 0.016270] Initializing cgroup subsys blkio May 19 16:01:19 labserver kernel: [ 0.016344] Initializing cgroup subsys perf_event May 19 16:01:19 labserver kernel: [ 0.016441] CPU: Physical Processor ID: 0 May 19 16:01:19 labserver kernel: [ 0.016509] CPU: Processor Core ID: 0 May 19 16:01:19 labserver kernel: [ 0.017564] mce: CPU supports 23 MCE banks May 19 16:01:19 labserver kernel: [ 0.017670] CPU0: Thermal monitoring enabled (TM1) May 19 16:01:19 labserver kernel: [ 0.017768] using mwait in idle threads. 
May 19 16:01:19 labserver kernel: [ 0.018315] ACPI: Core revision 20110623 May 19 16:01:19 labserver kernel: [ 0.049889] DMAR: Host address width 46 May 19 16:01:19 labserver kernel: [ 0.049958] DMAR: DRHD base: 0x000000fbffc000 flags: 0x1 May 19 16:01:19 labserver kernel: [ 0.050034] IOMMU 0: reg_base_addr fbffc000 ver 1:0 cap d2078c106f0466 ecap f020de May 19 16:01:19 labserver kernel: [ 0.050122] DMAR: RMRR base: 0x0000007f239000 end: 0x0000007f247fff May 19 16:01:19 labserver kernel: [ 0.050195] DMAR: ATSR flags: 0x0 May 19 16:01:19 labserver kernel: [ 0.050261] DMAR: RHSA base: 0x000000fbffc000 proximity domain: 0x0 May 19 16:01:19 labserver kernel: [ 0.050427] IOAPIC id 0 under DRHD base 0xfbffc000 IOMMU 0 May 19 16:01:19 labserver kernel: [ 0.050497] IOAPIC id 2 under DRHD base 0xfbffc000 IOMMU 0 May 19 16:01:19 labserver kernel: [ 0.050568] HPET id 0 under DRHD base 0xfbffc000 May 19 16:01:19 labserver kernel: [ 0.050741] Enabled IRQ remapping in x2apic mode May 19 16:01:19 labserver kernel: [ 0.050810] Enabling x2apic May 19 16:01:19 labserver kernel: [ 0.050875] Enabled x2apic May 19 16:01:19 labserver kernel: [ 0.050943] Switched APIC routing to cluster x2apic. May 19 16:01:19 labserver kernel: [ 0.051552] ..TIMER: vector=0x30 apic1=0 pin1=2 apic2=-1 pin2=-1 May 19 16:01:19 labserver kernel: [ 0.091256] CPU0: Intel(R) Xeon(R) CPU E5-2620 v2 @ 2.10GHz stepping 04 May 19 16:01:19 labserver kernel: [ 0.195570] Performance Events: PEBS fmt1+, generic architected perfmon, Intel PMU driver. May 19 16:01:19 labserver kernel: [ 0.195802] ... version: 3 May 19 16:01:19 labserver kernel: [ 0.195869] ... bit width: 48 May 19 16:01:19 labserver kernel: [ 0.195936] ... generic registers: 4 May 19 16:01:19 labserver kernel: [ 0.196003] ... value mask: 0000ffffffffffff May 19 16:01:19 labserver kernel: [ 0.196073] ... max period: 000000007fffffff May 19 16:01:19 labserver kernel: [ 0.196143] ... fixed-purpose events: 3 May 19 16:01:19 labserver kernel: [ 0.196210] ... 
event mask: 000000070000000f May 19 16:01:19 labserver kernel: [ 0.196468] NMI watchdog enabled, takes one hw-pmu counter. May 19 16:01:19 labserver kernel: [ 0.196637] Booting Node 0, Processors #1 May 19 16:01:19 labserver kernel: [ 0.312587] NMI watchdog enabled, takes one hw-pmu counter. May 19 16:01:19 labserver kernel: [ 0.312765] #2 May 19 16:01:19 labserver kernel: [ 0.424400] NMI watchdog enabled, takes one hw-pmu counter. May 19 16:01:19 labserver kernel: [ 0.424578] #3 May 19 16:01:19 labserver kernel: [ 0.536316] NMI watchdog enabled, takes one hw-pmu counter. May 19 16:01:19 labserver kernel: [ 0.536489] #4 May 19 16:01:19 labserver kernel: [ 0.648124] NMI watchdog enabled, takes one hw-pmu counter. May 19 16:01:19 labserver kernel: [ 0.648303] #5 May 19 16:01:19 labserver kernel: [ 0.759941] NMI watchdog enabled, takes one hw-pmu counter. May 19 16:01:19 labserver kernel: [ 0.760115] #6 May 19 16:01:19 labserver kernel: [ 0.871864] NMI watchdog enabled, takes one hw-pmu counter. May 19 16:01:19 labserver kernel: [ 0.872050] #7 May 19 16:01:19 labserver kernel: [ 0.983690] NMI watchdog enabled, takes one hw-pmu counter. May 19 16:01:19 labserver kernel: [ 0.983866] #8 May 19 16:01:19 labserver kernel: [ 1.095600] NMI watchdog enabled, takes one hw-pmu counter. May 19 16:01:19 labserver kernel: [ 1.095774] #9 May 19 16:01:19 labserver kernel: [ 1.207414] NMI watchdog enabled, takes one hw-pmu counter. May 19 16:01:19 labserver kernel: [ 1.207589] #10 May 19 16:01:19 labserver kernel: [ 1.319223] NMI watchdog enabled, takes one hw-pmu counter. May 19 16:01:19 labserver kernel: [ 1.319400] #11 Ok. May 19 16:01:19 labserver kernel: [ 1.431095] NMI watchdog enabled, takes one hw-pmu counter. May 19 16:01:19 labserver kernel: [ 1.431192] Brought up 12 CPUs May 19 16:01:19 labserver kernel: [ 1.431260] Total of 12 processors activated (50398.84 BogoMIPS). 
May 19 16:01:19 labserver kernel: [ 1.450786] devtmpfs: initialized May 19 16:01:19 labserver kernel: [ 1.455360] PM: Registering ACPI NVS region at 7e0f1000 (2076672 bytes) May 19 16:01:19 labserver kernel: [ 1.455494] PM: Registering ACPI NVS region at 7f367000 (4820992 bytes) May 19 16:01:19 labserver kernel: [ 1.455843] print_constraints: dummy: May 19 16:01:19 labserver kernel: [ 1.455977] NET: Registered protocol family 16 May 19 16:01:19 labserver kernel: [ 1.456140] ACPI: bus type pci registered May 19 16:01:19 labserver kernel: [ 1.456268] PCI: MMCONFIG for domain 0000 [bus 00-ff] at [mem 0x80000000-0x8fffffff] (base 0x80000000) May 19 16:01:19 labserver kernel: [ 1.456361] PCI: MMCONFIG at [mem 0x80000000-0x8fffffff] reserved in E820 May 19 16:01:19 labserver kernel: [ 1.466673] PCI: Using configuration type 1 for base access May 19 16:01:19 labserver kernel: [ 1.468173] bio: create slab <bio-0> at 0 May 19 16:01:19 labserver kernel: [ 1.468353] ACPI: Added _OSI(Module Device) May 19 16:01:19 labserver kernel: [ 1.468422] ACPI: Added _OSI(Processor Device) May 19 16:01:19 labserver kernel: [ 1.468491] ACPI: Added _OSI(3.0 _SCP Extensions) May 19 16:01:19 labserver kernel: [ 1.468560] ACPI: Added _OSI(Processor Aggregator Device) May 19 16:01:19 labserver kernel: [ 1.484562] ACPI: Executed 1 blocks of module-level executable AML code May 19 16:01:19 labserver kernel: [ 1.727818] ACPI: Interpreter enabled May 19 16:01:19 labserver kernel: [ 1.727891] ACPI: (supports S0 S1 S4 S5) May 19 16:01:19 labserver kernel: [ 1.728159] ACPI: Using IOAPIC for interrupt routing May 19 16:01:19 labserver kernel: [ 1.736531] ACPI: No dock devices found. May 19 16:01:19 labserver kernel: [ 1.736630] HEST: Table parsing has been initialized. 
May 19 16:01:19 labserver kernel: [ 1.736704] PCI: Using host bridge windows from ACPI; if necessary, use "pci=nocrs" and report a bug May 19 16:01:19 labserver kernel: [ 1.737041] ACPI: PCI Root Bridge [PCI0] (domain 0000 [bus 00-fe]) May 19 16:01:19 labserver kernel: [ 1.737361] pci_root PNP0A08:00: host bridge window [io 0x0000-0x03af] May 19 16:01:19 labserver kernel: [ 1.737435] pci_root PNP0A08:00: host bridge window [io 0x03e0-0x0cf7] May 19 16:01:19 labserver kernel: [ 1.737508] pci_root PNP0A08:00: host bridge window [io 0x03b0-0x03df] May 19 16:01:19 labserver kernel: [ 1.737586] pci_root PNP0A08:00: host bridge window [io 0x0d00-0xffff] May 19 16:01:19 labserver kernel: [ 1.737659] pci_root PNP0A08:00: host bridge window [mem 0x000a0000-0x000bffff] May 19 16:01:19 labserver kernel: [ 1.737747] pci_root PNP0A08:00: host bridge window [mem 0x000c0000-0x000dffff] May 19 16:01:19 labserver kernel: [ 1.737834] pci_root PNP0A08:00: host bridge window [mem 0xfed0e000-0xfed0ffff] May 19 16:01:19 labserver kernel: [ 1.737922] pci_root PNP0A08:00: host bridge window [mem 0x80000000-0xfbffffff] May 19 16:01:19 labserver kernel: [ 1.740791] pci 0000:00:01.0: PCI bridge to [bus 01-01] May 19 16:01:19 labserver kernel: [ 1.745575] pci 0000:00:01.1: PCI bridge to [bus 02-03] May 19 16:01:19 labserver kernel: [ 1.745700] pci 0000:00:02.0: PCI bridge to [bus 04-04] May 19 16:01:19 labserver kernel: [ 1.745816] pci 0000:00:03.0: PCI bridge to [bus 05-05] May 19 16:01:19 labserver kernel: [ 1.745933] pci 0000:00:03.2: PCI bridge to [bus 06-06] May 19 16:01:19 labserver kernel: [ 1.746285] pci 0000:00:11.0: PCI bridge to [bus 07-07] May 19 16:01:19 labserver kernel: [ 1.746541] pci 0000:00:1e.0: PCI bridge to [bus 08-08] (subtractive decode) May 19 16:01:19 labserver kernel: [ 1.747170] pci0000:00: Requesting ACPI _OSC control (0x1d) May 19 16:01:19 labserver kernel: [ 1.747465] pci0000:00: ACPI _OSC control (0x15) granted May 19 16:01:19 labserver kernel: [ 1.756901] ACPI: 
PCI Root Bridge [UNC0] (domain 0000 [bus ff]) May 19 16:01:19 labserver kernel: [ 1.758443] pci0000:ff: Requesting ACPI _OSC control (0x1d) May 19 16:01:19 labserver kernel: [ 1.758528] pci0000:ff: ACPI _OSC control (0x1d) granted May 19 16:01:19 labserver kernel: [ 1.759439] ACPI: PCI Interrupt Link [LNKA] (IRQs 3 4 5 6 7 10 *11 12 14 15) May 19 16:01:19 labserver kernel: [ 1.760105] ACPI: PCI Interrupt Link [LNKB] (IRQs 3 4 5 6 7 *10 11 12 14 15) May 19 16:01:19 labserver kernel: [ 1.760768] ACPI: PCI Interrupt Link [LNKC] (IRQs 3 4 *5 6 10 11 12 14 15) May 19 16:01:19 labserver kernel: [ 1.761383] ACPI: PCI Interrupt Link [LNKD] (IRQs 3 4 5 6 10 *11 12 14 15) May 19 16:01:19 labserver kernel: [ 1.762006] ACPI: PCI Interrupt Link [LNKE] (IRQs 3 4 5 6 7 10 11 12 14 15) *0 May 19 16:01:19 labserver kernel: [ 1.762729] ACPI: PCI Interrupt Link [LNKF] (IRQs 3 4 5 6 7 10 11 12 14 15) *0 May 19 16:01:19 labserver kernel: [ 1.763450] ACPI: PCI Interrupt Link [LNKG] (IRQs 3 4 5 6 7 10 11 12 14 15) *0 May 19 16:01:19 labserver kernel: [ 1.764170] ACPI: PCI Interrupt Link [LNKH] (IRQs 3 4 5 6 *7 10 11 12 14 15) 

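You need to provide more information, in particular the log entries from just before the system rebooted. As far as I can tell, though, there may be nothing more to provide. Check the other logs as well, such as syslog.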
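One way to pull out the entries written just before each boot is sketched below. It assumes the traditional syslog line format shown in the question and treats the first kernel line stamped `[ 0.000000]` as the start of a boot; the function name `entries_before_boot` and the `context` parameter are made up for this illustration.

```python
import re

# Kernel lines carry a seconds-since-boot stamp such as "[ 0.000000]".
KERNEL_STAMP = re.compile(r"kernel: \[\s*(\d+\.\d+)\]")


def entries_before_boot(lines, context=3):
    """Return [(boot_time, preceding_lines)] for every boot found in lines."""
    boots = []
    previous_was_zero = False
    for i, line in enumerate(lines):
        match = KERNEL_STAMP.search(line)
        if not match:
            continue
        is_zero = float(match.group(1)) == 0.0
        if is_zero and not previous_was_zero:
            boot_time = line[:15]  # syslog timestamp, e.g. "May 19 16:01:19"
            first = i
            # Skip back over other messages logged in the same second
            # (rsyslogd's own start-up lines belong to the new boot).
            while first > 0 and lines[first - 1].startswith(boot_time):
                first -= 1
            boots.append((boot_time, lines[max(0, first - context):first]))
        previous_was_zero = is_zero
    return boots


# Abbreviated lines taken from the excerpt in the question.
SAMPLE = [
    "May 18 07:35:01 labserver rsyslogd: rsyslogd was HUPed",
    "May 19 07:35:01 labserver rsyslogd: rsyslogd was HUPed",
    "May 19 16:01:19 labserver kernel: imklog 5.8.11, log source = /proc/kmsg started.",
    "May 19 16:01:19 labserver rsyslogd: start",
    "May 19 16:01:19 labserver kernel: [ 0.000000] Initializing cgroup subsys cpuset",
    "May 19 16:01:19 labserver kernel: [ 0.000000] Linux version 3.2.0-4-amd64",
    "May 19 16:01:19 labserver kernel: [ 0.004000] Detected 2100.074 MHz processor.",
]

for boot_time, before in entries_before_boot(SAMPLE):
    print("boot at %s, last entries before it:" % boot_time)
    for entry in before:
        print("  " + entry)
```

On the posted excerpt this shows that the last entry before the 16:01:19 boot is the routine rsyslogd HUP at 07:35:01, so the kernel logged nothing in the hours leading up to the reboot.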

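In my experience, sudden reboots are most often hardware-related and leave no indication of what actually went wrong. Otherwise the kernel usually gets a chance to write something to the logs that gives a clue.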
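As for which keywords to look for, here is a rough sketch of a scanner for messages the kernel typically writes when it does get the chance. The pattern list is an assumption and far from exhaustive, and `find_suspicious` is a name invented for this example. Note that ordinary boot lines such as "mce: CPU supports 23 MCE banks" or "Thermal monitoring enabled" are harmless and deliberately not matched.

```python
import re

# Assumed, non-exhaustive patterns; adjust them for your hardware and kernel.
PATTERNS = [
    ("overheating", re.compile(r"temperature above threshold", re.I)),
    ("hardware error", re.compile(r"hardware error|machine check", re.I)),
    ("kernel crash", re.compile(r"kernel panic|\boops\b|\bBUG:", re.I)),
    ("lockup", re.compile(r"soft lockup|hard lockup", re.I)),
    ("out of memory", re.compile(r"out of memory|oom-killer", re.I)),
    ("disk", re.compile(r"I/O error", re.I)),
]


def find_suspicious(lines):
    """Return [(label, line)] for each line matching one of PATTERNS."""
    hits = []
    for line in lines:
        for label, pattern in PATTERNS:
            if pattern.search(line):
                hits.append((label, line))
                break  # one label per line is enough
    return hits


def scan_file(path):
    """Scan a log file, e.g. scan_file("/var/log/messages")."""
    with open(path) as handle:
        return find_suspicious(line.rstrip("\n") for line in handle)
```

If this finds nothing in /var/log/messages, /var/log/syslog and /var/log/kern.log for the period before a reboot, that absence itself points towards hardware or power rather than software.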

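Some common causes of sudden reboots:

  • Overheating, probably the main cause. Find out what the temperatures are and try logging them. Does the server have a display that shows the temperature? Is the room cooled properly? Perhaps replace the thermal paste on the heatsink covering the CPU.

  • Bad hardware or drivers. Use, for example, "lspci" to get a list of devices. A bad DIMM can cause the system to hang suddenly and/or reboot (reseat the DIMMs, CPUs and cards). I remember a server that occasionally rebooted because of a problem with its Intel Ethernet card. A bad disk can sometimes cause such problems too, but usually it only makes the system hang rather than reboot.

  • A bad UPS. I remember a battery-backed UPS that slowly went bad; one of the indicators was that the connected servers rebooted regularly, on a weekly basis. You might also simply have a misconfigured power-cycle schedule.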
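For the temperature logging suggested in the overheating item, a minimal sketch could look like the following. It assumes the kernel exposes thermal zones under /sys/class/thermal with readings in millidegrees Celsius. On many servers the sensors are only reachable through lm-sensors or IPMI instead, in which case `read_temperatures` returns an empty dict. The function names, log file name and interval are illustrative choices.

```python
import glob
import os
import time


def read_temperatures(base="/sys/class/thermal"):
    """Return {zone_name: degrees_celsius} for every readable thermal zone."""
    temps = {}
    for zone in sorted(glob.glob(os.path.join(base, "thermal_zone*"))):
        try:
            with open(os.path.join(zone, "temp")) as handle:
                millidegrees = int(handle.read().strip())
        except (OSError, IOError, ValueError):
            continue  # zone without a usable reading
        temps[os.path.basename(zone)] = millidegrees / 1000.0
    return temps


def format_entry(temps, now=None):
    """Format one log line; now is seconds since the epoch (default: current time)."""
    stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(now))
    readings = " ".join("%s=%.1fC" % (name, temps[name]) for name in sorted(temps))
    return "%s %s" % (stamp, readings)


def log_forever(path="temperature.log", interval=60):
    """Append one reading per interval, forcing each line to disk."""
    while True:
        with open(path, "a") as handle:
            handle.write(format_entry(read_temperatures()) + "\n")
            handle.flush()
            os.fsync(handle.fileno())  # so the last line survives a sudden reboot
        time.sleep(interval)
```

Calling `log_forever()` from a background session and comparing the last line of temperature.log with the boot times from "last reboot" shows whether the machine was running hot just before it went down.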