How can I find which memory module has CE errors?

/var/log/kern.log

 kernel: [13291329.657499] EDAC MC0: 48 CE error on CPU#0Channel#2_DIMM#0 (channel:2 slot:0 page:0x0 offset:0x0 grain:8 syndrome:0x0) 
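The fields in this log line can be pulled apart mechanically. Below is a minimal Python sketch, assuming the label format `CPU#<n>Channel#<n>_DIMM#<n>` seen in the line above (other EDAC drivers label DIMMs differently) and assuming the number before `CE` is the error count. The name `parse_edac_ce` is hypothetical.

```python
import re

# Matches the line format quoted above. Other EDAC drivers may use a
# different label layout, so this pattern is an assumption.
EDAC_CE = re.compile(
    r"EDAC MC(?P<mc>\d+): (?P<count>\d+) CE .*? on "
    r"CPU#(?P<cpu>\d+)Channel#(?P<channel>\d+)_DIMM#(?P<dimm>\d+)"
)


def parse_edac_ce(line):
    """Return the numeric fields of a CE log line as a dict, or None."""
    match = EDAC_CE.search(line)
    if match is None:
        return None
    return {key: int(value) for key, value in match.groupdict().items()}


line = ("kernel: [13291329.657499] EDAC MC0: 48 CE error on "
        "CPU#0Channel#2_DIMM#0 (channel:2 slot:0 page:0x0 offset:0x0 "
        "grain:8 syndrome:0x0)")
print(parse_edac_ce(line))
```

The resulting `cpu`, `channel` and `dimm` values are the ones that have to be matched against the board's slot labels.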

This is the EDAC log; it shows that one of the memory modules has errors.

I have read the EDAC documentation:

    Dual channels allows for 128 bit data transfers to the CPU from memory.
    Some newer chipsets allow for more than 2 channels, like Fully Buffered
    DIMMs (FB-DIMMs). The following example will assume 2 channels:

                Channel 0       Channel 1
        ===================================
        csrow0  | DIMM_A0       | DIMM_B0 |
        csrow1  | DIMM_A0       | DIMM_B0 |
        ===================================

        ===================================
        csrow2  | DIMM_A1       | DIMM_B1 |
        csrow3  | DIMM_A1       | DIMM_B1 |
        ===================================

and found the channel with the errors:

    $ grep "[0-9]" /sys/devices/system/edac/mc/mc*/csrow*/ch*_ce_count
    /sys/devices/system/edac/mc/mc0/csrow0/ch0_ce_count:0
    /sys/devices/system/edac/mc/mc0/csrow0/ch1_ce_count:0
    /sys/devices/system/edac/mc/mc0/csrow0/ch2_ce_count:144648966
    /sys/devices/system/edac/mc/mc0/csrow1/ch0_ce_count:0
    /sys/devices/system/edac/mc/mc1/csrow0/ch0_ce_count:0
    /sys/devices/system/edac/mc/mc1/csrow0/ch1_ce_count:0
    /sys/devices/system/edac/mc/mc1/csrow0/ch2_ce_count:0
    /sys/devices/system/edac/mc/mc1/csrow1/ch0_ce_count:0
    /sys/devices/system/edac/mc/mc1/csrow1/ch1_ce_count:0
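The same scan can be done in a script that reports only the counters that are not zero. This is a rough sketch, assuming the `mc*/csrow*/ch*_ce_count` sysfs layout used by the grep above; the function name `nonzero_ce_counts` and its `base` parameter are made up for illustration.

```python
import glob
import os


def nonzero_ce_counts(base="/sys/devices/system/edac/mc"):
    """Map each ch*_ce_count file holding a nonzero value to its count."""
    hits = {}
    pattern = os.path.join(base, "mc*", "csrow*", "ch*_ce_count")
    for path in sorted(glob.glob(pattern)):
        with open(path) as handle:
            count = int(handle.read().strip())
        if count:
            # Key by the path relative to base, e.g. mc0/csrow0/ch2_ce_count
            hits[os.path.relpath(path, base)] = count
    return hits


for name, count in nonzero_ce_counts().items():
    print(f"{name}: {count}")
```

On a machine without EDAC counters the glob matches nothing and the loop prints nothing.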

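So it should be mc0/csrow0/ch2. Going by the documentation, the DIMM should be DIMM_C0, which I expected to find with dmidecode.

But I cannot find that DIMM, so I don't know which memory module is faulty: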

    $ dmidecode -t memory | grep 'Locator: PROC'
        Locator: PROC 1 DIMM 2A
        Locator: PROC 1 DIMM 1D
        Locator: PROC 1 DIMM 4B
        Locator: PROC 1 DIMM 3E
        Locator: PROC 1 DIMM 6C
        Locator: PROC 1 DIMM 5F
        Locator: PROC 2 DIMM 2A
        Locator: PROC 2 DIMM 1D
        Locator: PROC 2 DIMM 4B
        Locator: PROC 2 DIMM 3E
        Locator: PROC 2 DIMM 6C
        Locator: PROC 2 DIMM 5F

There are 12 slots, 9 of which are populated.
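To see which of those slots actually hold a module, the full `dmidecode -t memory` output can be parsed rather than grepped. The sketch below assumes the usual layout in which each Memory Device block is separated by a blank line and carries `Size:` and `Locator:` fields, with `No Module Installed` as the size of an empty slot; `populated_slots` is a hypothetical helper name.

```python
def populated_slots(dmidecode_text):
    """Return {locator: size} for Memory Device blocks that hold a module."""
    slots = {}
    for block in dmidecode_text.split("\n\n"):
        fields = {}
        for line in block.splitlines():
            key, sep, value = line.strip().partition(": ")
            if sep:
                fields[key] = value
        size = fields.get("Size")
        # Blocks without both fields (e.g. the array summary) are skipped.
        if "Locator" in fields and size and size != "No Module Installed":
            slots[fields["Locator"]] = size
    return slots
```

It could be fed with the text returned by `subprocess.run(["dmidecode", "-t", "memory"], capture_output=True, text=True).stdout`, run as root.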

So how can I tell which memory module is faulty?


Additional information:

    System Information
        Manufacturer: HP
        Product Name: ProLiant DL180 G6

The faulty DIMM is probably: Locator: PROC 1 DIMM 5F

CPU#0Channel#2_DIMM#0 means:

    PROC 1,
    1D,2A = Channel 0
    3E,4B = Channel 1
    5F,6C = Channel 2

    5F = DIMM 0
    6C = DIMM 1
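The mapping above can be written as a small lookup table. This is only a sketch: it assumes the channel and DIMM assignments listed here are right for the DL180 G6, that EDAC's `CPU#0` corresponds to HP's `PROC 1`, and that PROC 2 uses the same slot labels. The names `SLOTS_BY_CHANNEL`, `DIMM_ORDER` and `candidate_slots` are illustrative.

```python
# Channel -> slot labels, as listed above.
SLOTS_BY_CHANNEL = {0: ("1D", "2A"), 1: ("3E", "4B"), 2: ("5F", "6C")}

# DIMM index -> slot label within a channel. Only channel 2 is spelled
# out above; the order inside channels 0 and 1 is not stated, so it is
# deliberately left out rather than guessed.
DIMM_ORDER = {2: ("5F", "6C")}


def candidate_slots(cpu, channel, dimm):
    """Return the slot label(s) that an EDAC cpu/channel/dimm may refer to."""
    proc = cpu + 1  # assumption: EDAC CPU#0 is HP "PROC 1"
    if channel in DIMM_ORDER:
        names = [DIMM_ORDER[channel][dimm]]
    else:
        # DIMM order unknown for this channel: return both slots on it.
        names = list(SLOTS_BY_CHANNEL[channel])
    return [f"PROC {proc} DIMM {name}" for name in names]


print(candidate_slots(0, 2, 0))
```

For channels whose DIMM order is unknown, the function returns both slots on that channel instead of picking one.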

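Edit:

When asking a question, more information is always better... Knowing the server make and model simplifies this:

Here is the memory diagram from the HP ProLiant DL180 G6 Quickspecs: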

[Image: HP ProLiant DL180 G6 memory slot diagram]

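I'd suggest that the DIMM identified in CPU socket #1 is correct... but this is HP hardware. You shouldn't be guessing!

You should be using HP's management agents, since they can alert you and provide platform-specific detail about hardware health and status...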

    [root@veloce ~]# hpasmcli
    HP management CLI for Linux (v2.0)
    Copyright 2008 Hewlett-Packard Development Group, LP
    --------------------------------------------------------------------------
    This server ProLiant DL180 G6 , is a Proliant 100 Series Server.
    NOTE: Some hpasmcli commands may not be supported on 100 series servers.
    Type 'help' to get a list of all top level commands.
    --------------------------------------------------------------------------
    hpasmcli> show dimm

    Cartridge #: 0
    Processor #: 1
    Module #: 2
    Present: Yes
    Form Factor: fh
    Memory Type: 5h
    Size: 4096 MB
    Speed: 1333 MHz
    Status: N/A

    Cartridge #: 0
    Processor #: 1
    Module #: 1
    Present: Yes
    Form Factor: fh
    Memory Type: 5h
    Size: 4096 MB
    Speed: 1333 MHz
    Status: N/A

    Cartridge #: 0
    Processor #: 1
    Module #: 4
    Present: Yes
    Form Factor: fh
    Memory Type: 5h
    Size: 4096 MB
    Speed: 1333 MHz
    Status: N/A

    Cartridge #: 0
    Processor #: 1
    Module #: 6
    Present: Yes
    Form Factor: fh
    Memory Type: 5h
    Size: 4096 MB
    Speed: 1333 MHz
    Status: N/A
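If this output needs to be checked from a script, the records can be split into dictionaries. The following is a sketch only, assuming every record starts with a `Cartridge #` field as in the output above; `parse_show_dimm` is a hypothetical name, and the sample text is copied from the first two records shown.

```python
def parse_show_dimm(text):
    """Split `show dimm` output into one dict per module record."""
    modules = []
    current = None
    for line in text.splitlines():
        key, sep, value = line.strip().partition(":")
        if not sep:
            continue  # banner lines and separators carry no "key: value"
        key, value = key.strip(), value.strip()
        if key == "Cartridge #":  # assumed to open each record
            current = {}
            modules.append(current)
        if current is not None:  # ignore "NOTE:" lines before the first record
            current[key] = value
    return modules


sample = """\
NOTE: Some hpasmcli commands may not be supported on 100 series servers.
hpasmcli> show dimm

Cartridge #: 0
Processor #: 1
Module #: 2
Present: Yes
Size: 4096 MB
Status: N/A

Cartridge #: 0
Processor #: 1
Module #: 1
Present: Yes
Size: 4096 MB
Status: N/A
"""

for module in parse_show_dimm(sample):
    print(module["Processor #"], module["Module #"], module["Status"])
```

On this 100-series server every `Status` field reads `N/A`, so the parsed records confirm which modules are present but do not by themselves identify the failing one.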