ZFS在清理时一直在修复相同的磁盘 – 这是什么意思？

我有一个由6个磁盘组成的ZFS池，排列成3个镜像。该服务器是一个带Xeon CPU的Supermicro X11SSM-F，32 GB运行Ubuntu 17.04的ECC内存。我使用2个Icy Dock MB154SP-B物理地托pipe磁盘，主板上有8个SATA3接口，所以磁盘直接提供给ZFS（中间没有RAID卡）。

这个设置运行良好，直到最近。然后我注意到在运行zpool status ，最后的擦洗已经修复了一些数据：

 $ sudo zpool status pool: cloudpool state: ONLINE scan: scrub repaired 2.98M in 4h56m with 0 errors on Sun Jul 9 05:20:16 2017 config: NAME STATE READ WRITE CKSUM cloudpool ONLINE 0 0 0 mirror-0 ONLINE 0 0 0 ata-ST8000VN0022-2EL112_ZA17FZXF ONLINE 0 0 0 ata-ST8000VN0022-2EL112_ZA17H5D3 ONLINE 0 0 0 mirror-1 ONLINE 0 0 0 ata-WDC_WD40EFRX-68WT0N0_WD-WCC4E5NFLRU3 ONLINE 0 0 0 ata-ST4000VN000-2AH166_WDH0KMHT ONLINE 0 0 0 mirror-2 ONLINE 0 0 0 ata-WDC_WD30EFRX-68EUZN0_WD-WCC4N3EHHA2E ONLINE 0 0 0 ata-ST3000DM001-1CH166_Z1F1HL4V ONLINE 0 0 0 errors: No known data errors

好奇，我决定运行一个新的磨合：

 $ sudo zpool scrub cloudpool ... giving it a few minutes to run ... $ sudo zpool status pool: cloudpool state: ONLINE scan: scrub in progress since Tue Jul 11 22:55:12 2017 124M scanned out of 4.52T at 4.59M/s, 286h55m to go 256K repaired, 0.00% done config: NAME STATE READ WRITE CKSUM cloudpool ONLINE 0 0 0 mirror-0 ONLINE 0 0 0 ata-ST8000VN0022-2EL112_ZA17FZXF ONLINE 0 0 0 ata-ST8000VN0022-2EL112_ZA17H5D3 ONLINE 0 0 0 mirror-1 ONLINE 0 0 0 ata-WDC_WD40EFRX-68WT0N0_WD-WCC4E5NFLRU3 ONLINE 0 0 0 (repairing) ata-ST4000VN000-2AH166_WDH0KMHT ONLINE 0 0 0 mirror-2 ONLINE 0 0 0 ata-WDC_WD30EFRX-68EUZN0_WD-WCC4N3EHHA2E ONLINE 0 0 0 ata-ST3000DM001-1CH166_Z1F1HL4V ONLINE 0 0 0 errors: No known data errors

完成后，我得到以下结果：

 $ sudo zpool status pool: cloudpool state: ONLINE scan: scrub repaired 624K in 4h35m with 0 errors on Wed Jul 12 03:31:00 2017 config: NAME STATE READ WRITE CKSUM cloudpool ONLINE 0 0 0 mirror-0 ONLINE 0 0 0 ata-ST8000VN0022-2EL112_ZA17FZXF ONLINE 0 0 0 ata-ST8000VN0022-2EL112_ZA17H5D3 ONLINE 0 0 0 mirror-1 ONLINE 0 0 0 ata-WDC_WD40EFRX-68WT0N0_WD-WCC4E5NFLRU3 ONLINE 0 0 0 ata-ST4000VN000-2AH166_WDH0KMHT ONLINE 0 0 0 mirror-2 ONLINE 0 0 0 ata-WDC_WD30EFRX-68EUZN0_WD-WCC4N3EHHA2E ONLINE 0 0 0 ata-ST3000DM001-1CH166_Z1F1HL4V ONLINE 0 0 0 errors: No known data errors

然后我决定再次开始擦洗池。让它运行一下后，我得到以下内容：

 $ sudo zpool status pool: cloudpool state: ONLINE scan: scrub in progress since Wed Jul 12 09:55:19 2017 941G scanned out of 4.52T at 282M/s, 3h42m to go 112K repaired, 20.34% done config: NAME STATE READ WRITE CKSUM cloudpool ONLINE 0 0 0 mirror-0 ONLINE 0 0 0 ata-ST8000VN0022-2EL112_ZA17FZXF ONLINE 0 0 0 ata-ST8000VN0022-2EL112_ZA17H5D3 ONLINE 0 0 0 mirror-1 ONLINE 0 0 0 ata-WDC_WD40EFRX-68WT0N0_WD-WCC4E5NFLRU3 ONLINE 0 0 0 (repairing) ata-ST4000VN000-2AH166_WDH0KMHT ONLINE 0 0 0 mirror-2 ONLINE 0 0 0 ata-WDC_WD30EFRX-68EUZN0_WD-WCC4N3EHHA2E ONLINE 0 0 0 ata-ST3000DM001-1CH166_Z1F1HL4V ONLINE 0 0 0 errors: No known data errors

在查看磁盘的SMART数据时，我没有看到任何可疑（除了可能的Raw_Read_Error_Rate ？）：

 smartctl 6.6 2016-05-31 r4324 [x86_64-linux-4.10.0-26-generic] (local build) Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org === START OF INFORMATION SECTION === Model Family: Western Digital Red Device Model: WDC WD40EFRX-68WT0N0 Serial Number: WD-WCC4E5NFLRU3 LU WWN Device Id: 5 0014ee 262ee543f Firmware Version: 82.00A82 User Capacity: 4,000,787,030,016 bytes [4.00 TB] Sector Sizes: 512 bytes logical, 4096 bytes physical Rotation Rate: 5400 rpm Device is: In smartctl database [for details use: -P show] ATA Version is: ACS-2 (minor revision not indicated) SATA Version is: SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s) Local Time is: Wed Jul 12 10:19:08 2017 CEST SMART support is: Available - device has SMART capability. SMART support is: Enabled === START OF READ SMART DATA SECTION === SMART overall-health self-assessment test result: PASSED General SMART Values: Offline data collection status: (0x00) Offline data collection activity was never started. Auto Offline Data Collection: Disabled. Self-test execution status: ( 0) The previous self-test routine completed without error or no self-test has ever been run. Total time to complete Offline data collection: (52020) seconds. Offline data collection capabilities: (0x7b) SMART execute Offline immediate. Auto Offline data collection on/off support. Suspend Offline collection upon new command. Offline surface scan supported. Self-test supported. Conveyance Self-test supported. Selective Self-test supported. SMART capabilities: (0x0003) Saves SMART data before entering power-saving mode. Supports SMART auto save timer. Error logging capability: (0x01) Error logging supported. General Purpose Logging supported. Short self-test routine recommended polling time: ( 2) minutes. Extended self-test routine recommended polling time: ( 520) minutes. Conveyance self-test routine recommended polling time: ( 5) minutes. SCT capabilities: (0x703d) SCT Status supported. SCT Error Recovery Control supported. SCT Feature Control supported. SCT Data Table supported. SMART Attributes Data Structure revision number: 16 Vendor Specific SMART Attributes with Thresholds: ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE 1 Raw_Read_Error_Rate 0x002f 200 200 051 Pre-fail Always - 99 3 Spin_Up_Time 0x0027 186 176 021 Pre-fail Always - 7683 4 Start_Stop_Count 0x0032 100 100 000 Old_age Always - 33 5 Reallocated_Sector_Ct 0x0033 200 200 140 Pre-fail Always - 0 7 Seek_Error_Rate 0x002e 100 253 000 Old_age Always - 0 9 Power_On_Hours 0x0032 091 091 000 Old_age Always - 6735 10 Spin_Retry_Count 0x0032 100 253 000 Old_age Always - 0 11 Calibration_Retry_Count 0x0032 100 253 000 Old_age Always - 0 12 Power_Cycle_Count 0x0032 100 100 000 Old_age Always - 33 192 Power-Off_Retract_Count 0x0032 200 200 000 Old_age Always - 5 193 Load_Cycle_Count 0x0032 198 198 000 Old_age Always - 7500 194 Temperature_Celsius 0x0022 110 108 000 Old_age Always - 42 196 Reallocated_Event_Count 0x0032 200 200 000 Old_age Always - 0 197 Current_Pending_Sector 0x0032 200 200 000 Old_age Always - 0 198 Offline_Uncorrectable 0x0030 100 253 000 Old_age Offline - 0 199 UDMA_CRC_Error_Count 0x0032 200 200 000 Old_age Always - 0 200 Multi_Zone_Error_Rate 0x0008 200 200 000 Old_age Offline - 0 SMART Error Log Version: 1 No Errors Logged SMART Self-test log structure revision number 1 Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error # 1 Extended offline Completed without error 00% 13 - # 2 Conveyance offline Completed without error 00% 1 - # 3 Short offline Completed without error 00% 1 - SMART Selective self-test log data structure revision number 1 SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS 1 0 0 Not_testing 2 0 0 Not_testing 3 0 0 Not_testing 4 0 0 Not_testing 5 0 0 Not_testing Selective self-test flags (0x0): After scanning selected spans, do NOT read-scan remainder of disk. If Selective self-test is pending on power-up, resume after 0 minute delay.

不过，我确实在dmesg的输出中看到了一些奇怪的消息：

 [100240.777601] ata2.00: exception Emask 0x0 SAct 0x3000000 SErr 0x0 action 0x0 [100240.777608] ata2.00: irq_stat 0x40000008 [100240.777614] ata2.00: failed command: READ FPDMA QUEUED [100240.777624] ata2.00: cmd 60/00:c0:c8:bc:01/01:00:00:00:00/40 tag 24 ncq dma 131072 in res 41/40:00:a8:bd:01/00:00:00:00:00/40 Emask 0x409 (media error) <F> [100240.777628] ata2.00: status: { DRDY ERR } [100240.777631] ata2.00: error: { UNC } [100240.779320] ata2.00: configured for UDMA/133 [100240.779342] sd 1:0:0:0: [sdb] tag#24 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE [100240.779346] sd 1:0:0:0: [sdb] tag#24 Sense Key : Medium Error [current] [100240.779350] sd 1:0:0:0: [sdb] tag#24 Add. Sense: Unrecovered read error - auto reallocate failed [100240.779354] sd 1:0:0:0: [sdb] tag#24 CDB: Read(16) 88 00 00 00 00 00 00 01 bc c8 00 00 01 00 00 00 [100240.779357] blk_update_request: I/O error, dev sdb, sector 114088 [100240.779384] ata2: EH complete [100244.165785] ata2.00: exception Emask 0x0 SAct 0x3d SErr 0x0 action 0x0 [100244.165793] ata2.00: irq_stat 0x40000008 [100244.165798] ata2.00: failed command: READ FPDMA QUEUED [100244.165807] ata2.00: cmd 60/00:00:c8:be:01/01:00:00:00:00/40 tag 0 ncq dma 131072 in res 41/40:00:70:bf:01/00:00:00:00:00/40 Emask 0x409 (media error) <F> [100244.165811] ata2.00: status: { DRDY ERR } [100244.165814] ata2.00: error: { UNC } [100244.167465] ata2.00: configured for UDMA/133 [100244.167488] sd 1:0:0:0: [sdb] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE [100244.167492] sd 1:0:0:0: [sdb] tag#0 Sense Key : Medium Error [current] [100244.167496] sd 1:0:0:0: [sdb] tag#0 Add. Sense: Unrecovered read error - auto reallocate failed [100244.167500] sd 1:0:0:0: [sdb] tag#0 CDB: Read(16) 88 00 00 00 00 00 00 01 be c8 00 00 01 00 00 00 [100244.167503] blk_update_request: I/O error, dev sdb, sector 114544 [100244.167531] ata2: EH complete [100248.177949] ata2.00: exception Emask 0x0 SAct 0x41c00002 SErr 0x0 action 0x0 [100248.177957] ata2.00: irq_stat 0x40000008 [100248.177963] ata2.00: failed command: READ FPDMA QUEUED [100248.177972] ata2.00: cmd 60/00:f0:c8:c0:01/01:00:00:00:00/40 tag 30 ncq dma 131072 in res 41/40:00:b8:c1:01/00:00:00:00:00/40 Emask 0x409 (media error) <F> [100248.177977] ata2.00: status: { DRDY ERR } [100248.177980] ata2.00: error: { UNC } [100248.179638] ata2.00: configured for UDMA/133 [100248.179667] sd 1:0:0:0: [sdb] tag#30 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE [100248.179671] sd 1:0:0:0: [sdb] tag#30 Sense Key : Medium Error [current] [100248.179675] sd 1:0:0:0: [sdb] tag#30 Add. Sense: Unrecovered read error - auto reallocate failed [100248.179679] sd 1:0:0:0: [sdb] tag#30 CDB: Read(16) 88 00 00 00 00 00 00 01 c0 c8 00 00 01 00 00 00 [100248.179682] blk_update_request: I/O error, dev sdb, sector 115128 [100248.179705] ata2: EH complete ...

通过dmesg清理，我在日志中看到了31个这样的实例：

 [100240.779357] blk_update_request: I/O error, dev sdb, sector 114088 [100244.167503] blk_update_request: I/O error, dev sdb, sector 114544 [100248.179682] blk_update_request: I/O error, dev sdb, sector 115128 [100251.599649] blk_update_request: I/O error, dev sdb, sector 115272 [100255.812020] blk_update_request: I/O error, dev sdb, sector 115576 [100259.636088] blk_update_request: I/O error, dev sdb, sector 115768 [100263.400169] blk_update_request: I/O error, dev sdb, sector 116000 [100267.912099] blk_update_request: I/O error, dev sdb, sector 116472 [100271.300223] blk_update_request: I/O error, dev sdb, sector 116680 [100274.732989] blk_update_request: I/O error, dev sdb, sector 117000 [100279.665331] blk_update_request: I/O error, dev sdb, sector 118624 [100283.043738] blk_update_request: I/O error, dev sdb, sector 118768 [100286.456260] blk_update_request: I/O error, dev sdb, sector 119072 [100293.472354] blk_update_request: I/O error, dev sdb, sector 7814018576 [100298.443416] blk_update_request: I/O error, dev sdb, sector 119496 [100302.236908] blk_update_request: I/O error, dev sdb, sector 119968 [100305.655675] blk_update_request: I/O error, dev sdb, sector 120032 [100309.450754] blk_update_request: I/O error, dev sdb, sector 120496 [100313.724792] blk_update_request: I/O error, dev sdb, sector 121512 [100324.782008] blk_update_request: I/O error, dev sdb, sector 186032 [100329.002031] blk_update_request: I/O error, dev sdb, sector 189536 [100333.057101] blk_update_request: I/O error, dev sdb, sector 189680 [100336.476953] blk_update_request: I/O error, dev sdb, sector 189888 [100341.133527] blk_update_request: I/O error, dev sdb, sector 190408 [100349.890540] blk_update_request: I/O error, dev sdb, sector 191824 [353944.190625] blk_update_request: I/O error, dev sdb, sector 115480 [353951.660635] blk_update_request: I/O error, dev sdb, sector 116536 [353959.391011] blk_update_request: I/O error, dev sdb, sector 118976 [353966.811863] blk_update_request: I/O error, dev sdb, sector 120176 [353978.447354] blk_update_request: I/O error, dev sdb, sector 189984 [393732.681767] blk_update_request: I/O error, dev sdb, sector 190000

我不太确定该怎么做：

为什么清理过程不断修复同一磁盘上的数据？修复的数据量正在下降，这似乎表明数据正在“持久”修复，但是为什么仍然有数据要在逐个擦洗几个小时后才能修复？
为什么在zpool status我看不到任何读/写/校验和错误的迹象，尽pipeZFS在每个scrub上find了一些数据要纠正的事实？
为什么我在磁盘上看到URE，但在SMART报告中没有任何可疑内容？
auto reallocate failed是什么意思？磁盘已经用完了更换块吗？这是一个在过去约6个月里一直运行顺畅的系统，所以我预料这个磁盘出现问题的时间会更早。

更实际的是，这个特殊的磁盘是什么意思？它需要更换吗？

编辑＃1最新的磨砂后，我现在得到以下内容：

 $ sudo zpool status pool: cloudpool state: ONLINE scan: scrub repaired 0 in 4h35m with 0 errors on Wed Jul 12 21:44:41 2017 config: NAME STATE READ WRITE CKSUM cloudpool ONLINE 0 0 0 mirror-0 ONLINE 0 0 0 ata-ST8000VN0022-2EL112_ZA17FZXF ONLINE 0 0 0 ata-ST8000VN0022-2EL112_ZA17H5D3 ONLINE 0 0 0 mirror-1 ONLINE 0 0 0 ata-WDC_WD40EFRX-68WT0N0_WD-WCC4E5NFLRU3 ONLINE 0 0 0 ata-ST4000VN000-2AH166_WDH0KMHT ONLINE 0 0 0 mirror-2 ONLINE 0 0 0 ata-WDC_WD30EFRX-68EUZN0_WD-WCC4N3EHHA2E ONLINE 0 0 0 ata-ST3000DM001-1CH166_Z1F1HL4V ONLINE 0 0 0 errors: No known data errors

Smartctl现在说：

 ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE 1 Raw_Read_Error_Rate 0x002f 200 200 051 Pre-fail Always - 122 3 Spin_Up_Time 0x0027 186 176 021 Pre-fail Always - 7683 4 Start_Stop_Count 0x0032 100 100 000 Old_age Always - 33 5 Reallocated_Sector_Ct 0x0033 200 200 140 Pre-fail Always - 0 7 Seek_Error_Rate 0x002e 100 253 000 Old_age Always - 0 9 Power_On_Hours 0x0032 091 091 000 Old_age Always - 6749 10 Spin_Retry_Count 0x0032 100 253 000 Old_age Always - 0 11 Calibration_Retry_Count 0x0032 100 253 000 Old_age Always - 0 12 Power_Cycle_Count 0x0032 100 100 000 Old_age Always - 33 192 Power-Off_Retract_Count 0x0032 200 200 000 Old_age Always - 5 193 Load_Cycle_Count 0x0032 198 198 000 Old_age Always - 7507 194 Temperature_Celsius 0x0022 114 108 000 Old_age Always - 38 196 Reallocated_Event_Count 0x0032 200 200 000 Old_age Always - 0 197 Current_Pending_Sector 0x0032 200 200 000 Old_age Always - 0 198 Offline_Uncorrectable 0x0030 100 253 000 Old_age Offline - 0 199 UDMA_CRC_Error_Count 0x0032 200 200 000 Old_age Always - 0 200 Multi_Zone_Error_Rate 0x0008 200 200 000 Old_age Offline - 0 SMART Error Log Version: 1 No Errors Logged

所以…我很好？刚才那是干什么啊？