Hardware RAID 1 configuration: possible failed sectors on one physical disk. Does the OS automatically read from the other disk?

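I am not a professional system administrator, but after researching this for a while without finding an answer, I am hoping to get some help here. Our server uses a P222, an HP Smart Array controller, in a RAID 1 configuration. I believe some sectors on one of the physical drives have failed. I ran the hpacucli tool, and the output is shown below: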

    $ hpacucli ctrl all show config

    Smart Array P222 in Slot 1 (sn: PDSXH0ARH5I0SW)
      array A (SATA, Unused Space: 0 MB)
        logicaldrive 1 (2.7 TB, RAID 1, Ready for Rebuild)
        physicaldrive 2I:1:1 (port 2I:box 1:bay 1, SATA, 3 TB, OK)
        physicaldrive 2I:1:2 (port 2I:box 1:bay 2, SATA, 3 TB, Predictive Failure)

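I ran the same tool several more times to check the status, and at one point I noticed that "Predictive Failure" had been replaced by "Rebuilding 1%", which later rose to 2%. I don't believe I did anything to start a rebuild. In any case, I let it run and checked the status again a while later, at which point it had gone back to "Predictive Failure".

After running the smartctl long and short tests, the self-test log reports: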

    === START OF READ SMART DATA SECTION ===
    SMART Self-test log structure revision number 1
    Num  Test_Description    Status                    Remaining  LifeTime(hours)  LBA_of_first_error
    # 1  Extended offline    Completed: read failure   90%        14368            334201968
    # 2  Short offline       Completed: read failure   90%        14367            625082211

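We are running a MySQL instance on this server, and it consistently fails to start, complaining about read errors. That suggested a failing disk or bad sectors, hence the tools used above. I have a few questions: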

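I am not sure, but it looks like one of the drives has partially failed. In that case, shouldn't the operating system (Ubuntu 12.04) read the data from the mirror drive instead? (That would mean MySQL should keep running.)
I followed the steps in http://sg.danny.cz/scsi/badblockhowto.html. LBA 334201968 (the LBA of the read failure in the long test) corresponds to a MySQL data file. However, I don't want to overwrite any part of that file, because I am not sure whether MySQL would then treat the file as permanently corrupted. What is my best option for "repairing" the damaged part of the disk?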
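
For reference, the central step of the Bad block HOWTO linked above is converting the failing LBA into a filesystem block number, b = (int)((L - S) * 512 / B). The following is only a rough sketch of that arithmetic: the partition start sector and filesystem block size are hypothetical placeholders (the real values come from the partition table and the filesystem superblock), and it assumes the LBA reported by smartctl maps directly onto the block device the OS sees, which may not hold behind a hardware RAID controller.

```python
def lba_to_fs_block(lba, partition_start, fs_block_size, sector_size=512):
    """Convert a disk LBA into a block number within the filesystem.

    Implements b = (int)((L - S) * sector_size / B) from the Bad block HOWTO.
    """
    if lba < partition_start:
        raise ValueError("LBA lies before the start of the partition")
    return (lba - partition_start) * sector_size // fs_block_size


FAILING_LBA = 334201968   # LBA_of_first_error from the extended self-test
PARTITION_START = 2048    # hypothetical: first sector of the partition
FS_BLOCK_SIZE = 4096      # hypothetical: filesystem block size in bytes

block = lba_to_fs_block(FAILING_LBA, PARTITION_START, FS_BLOCK_SIZE)
print(f"LBA {FAILING_LBA} -> filesystem block {block}")
```

With the real partition offset and block size substituted, the resulting block number is what the HOWTO then feeds to debugfs to find the owning file.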

Happy to report any further details that may be needed to diagnose or fix this.

Edit 1: As requested, the MySQL error log looks like this:

    150824 10:27:00 InnoDB: Completed initialization of buffer pool
    150824 10:27:00 InnoDB: highest supported file format is Barracuda.
    InnoDB: The log sequence number in ibdata files does not match
    InnoDB: the log sequence number in the ib_logfiles!
    150824 10:27:00 InnoDB: Database was not shut down normally!
    InnoDB: Starting crash recovery.
    InnoDB: Reading tablespace information from the .ibd files...
    InnoDB: Restoring possible half-written data pages from the doublewrite
    InnoDB: buffer...
    150824 10:27:00 InnoDB: Waiting for the background threads to start
    150824 10:27:01 InnoDB: 5.5.35 started; log sequence number 2723867081864
    150824 10:27:01 [Note] Server hostname (bind-address): <ip and port here>;
    150824 10:27:01 [Note] - <ip here> resolves to <ip here>;
    150824 10:27:01 [Note] Server socket created on IP: <ip here>.
    InnoDB: Error: tried to read 16384 bytes at offset 70 1898921984.
    InnoDB: Was only able to read -1.
    150824 10:27:01 InnoDB: Operating system error number 5 in a file operation.
    InnoDB: Error number 5 means 'Input/output error'.
    InnoDB: Some operating system error numbers are described at
    InnoDB: http://dev.mysql.com/doc/refman/5.5/en/operating-system-error-codes.html
    InnoDB: File operation call: 'read'.
    InnoDB: Cannot continue operation.
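
A note on reading the "offset 70 1898921984" in that log: as far as I can tell, MySQL 5.5 prints the file offset as two 32-bit halves, high word first. The sketch below combines them into a single byte offset and derives the 16 KiB page index from it. The function name is made up for illustration, and the high/low interpretation is an assumption based on the 5.5-era message format.

```python
PAGE_SIZE = 16384  # bytes InnoDB tried to read, per the log message


def innodb_offset(high, low):
    """Combine the two 32-bit halves printed by InnoDB into one byte offset."""
    if not (0 <= high < 2**32 and 0 <= low < 2**32):
        raise ValueError("each half must fit in 32 bits")
    return (high << 32) | low


offset = innodb_offset(70, 1898921984)
page_index = offset // PAGE_SIZE

print(f"byte offset within the file: {offset}")
print(f"16 KiB page index:           {page_index}")
print(f"offset in GiB:               {offset / 2**30:.2f}")
```

This only locates the failing read inside the data file; it says nothing about which physical disk returned the error.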

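Edit 2: Based on the comment at https://serverfault.com/a/716471/306555, I went ahead and replaced the disk, put the replacement into the RAID, and rebuilt the RAID. The hpacucli output now looks like this: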

    physicaldrive 2I:1:1 (port 2I:box 1:bay 1, SATA, 3 TB, OK)
    physicaldrive 2I:1:2 (port 2I:box 1:bay 2, SATA, 3 TB, OK)

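So the predictive failure is gone. However, MySQL kept giving me read errors, so I ran the smartctl long and short tests again. The short test passed, but the long test failed with a read error: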

    Num  Test_Description    Status                    Remaining  LifeTime(hours)  LBA_of_first_error
    # 1  Extended offline    Completed: read failure   90%        14393            625116232
    # 2  Short offline       Completed without error   00%        14392            -

I also checked the syslog and noticed that this error appears there every time MySQL tries to start:

    Aug 25 14:23:41 kernel: [ 1603.911185] sd 6:0:0:1: [sda] Unhandled sense code
    Aug 25 14:23:41 kernel: [ 1603.911186] sd 6:0:0:1: [sda] Result: hostbyte=invalid driverbyte=DRIVER_SENSE
    Aug 25 14:23:41 kernel: [ 1603.911188] sd 6:0:0:1: [sda] Sense Key : Medium Error [current]
    Aug 25 14:23:41 kernel: [ 1603.911190] sd 6:0:0:1: [sda] Add. Sense: Unrecovered read error
    Aug 25 14:23:41 kernel: [ 1603.911192] sd 6:0:0:1: [sda] CDB: Read(10): 28 00 46 a2 d5 a0 00 00 08 00
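
The CDB bytes on the last line can be decoded to see which block the kernel was asking for. A minimal sketch, assuming the standard 10-byte SCSI READ(10) layout (opcode 0x28, 4-byte big-endian LBA, 2-byte big-endian transfer length); the function name is made up for illustration. Note that the LBA is relative to the logical drive the controller presents as sda, not to either physical disk.

```python
import struct


def parse_read10(cdb_hex):
    """Decode a SCSI READ(10) CDB given as space-separated hex bytes.

    Returns (lba, transfer_length_in_blocks).
    """
    cdb = bytes.fromhex(cdb_hex.replace(" ", ""))
    if len(cdb) != 10 or cdb[0] != 0x28:
        raise ValueError("not a READ(10) command descriptor block")
    # opcode, flags, LBA (32-bit), group number, transfer length (16-bit), control
    _opcode, _flags, lba, _group, length, _control = struct.unpack(">BBIBHB", cdb)
    return lba, length


lba, blocks = parse_read10("28 00 46 a2 d5 a0 00 00 08 00")
print(f"logical-drive LBA: {lba}, blocks requested: {blocks}")
```

Assuming 512-byte logical blocks, an 8-block request is a 4 KiB read starting at that LBA on the logical drive.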

What does that indicate? (It looks like a bad sector on the disk?) If so, is there any way to fix it?

OK. This is a long question, but it gets a short answer:

If you see "Predictive Failure" or "Failed", replace the disk.

Both conditions are valid grounds for opening a support ticket with HP and/or a warranty part replacement.

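"Predictive Failure" combines SMART data with other heuristics to determine drive health. The specifics shouldn't really matter, though. Plan to replace the drive.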

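The impact you are seeing at the application level is another sign that the right course of action is to replace the disk. That is easy to do... even though it is a SATA drive, it is an HP part, so it carries a warranty (possibly 1 year, but it is tied to your server's serial number).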

Call HP...

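Yes. If one drive in the RAID fails, the RAID controller marks it as failed and reads are served from the remaining healthy drive.
Predictive failure means the disk is still working, but the controller is warning you that it will fail soon. If you get read errors in the tests, you should replace the drive with another one. Just buy a spare drive from a local store or through your vendor's support, install it, and the RAID controller will rebuild the array to a healthy state.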

Are you using HP drives, or normal consumer drives? Do the drives have time-limited error recovery?

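If not, a drive may lock up the controller while trying to read a bad sector. The drive takes a long time to give up, so the read fails. The RAID controller never gets a chance to try the other drive, because it is still waiting for the first drive to report the failure.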

This behavior can also cause the drive to drop out of the array temporarily, which would explain the rebuild.

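This should only apply when using non-HP drives; supported drives are programmed to give up quickly and let the RAID controller handle the error.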
