I have a problem that I'm having a really hard time debugging. While running ZFS, the system "hiccups", dumps some information into dmesg, and then keeps working.
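My ZFS box hosts virtual machines for ESXi. When this problem occurs, many of the VMs hit block I/O errors, and some of them go read-only and need either a restore from backup or an fsck to repair the filesystem. The problem only occurs sporadically. I've hammered the system trying to stress it into failing, and it doesn't seem to be performance-related. Since it only happens every few months, ever tracking it down feels like a pipe dream.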
First, some information about my system (CentOS 7, kernel 4.5):
    [root@zfs-head ~]# uname -a
    Linux zfs-head 4.5.0-1.el7.elrepo.x86_64 #1 SMP Mon Mar 14 10:24:58 EDT 2016 x86_64 x86_64 x86_64 GNU/Linux
dmesg entries:
    [4331253.022999] sd 2:0:28:0: [sdaa] tag#2 CDB: Read(10) 28 00 10 a8 3d b5 00 00 20 00
    [4331253.023006] mpt3sas_cm0: sas_address(0x5000c500837f31f2), phy(8)
    [4331253.023008] mpt3sas_cm0: enclosure_logical_id(0x50010c60004d41ff),slot(0)
    [4331253.023010] mpt3sas_cm0: enclosure level(0x0003), connector name( )
    [4331253.023013] mpt3sas_cm0: handle(0x002d), ioc_status(scsi data underrun)(0x0045), smid(222)
    [4331253.023016] mpt3sas_cm0: request_len(131072), underflow(16384), resid(131072)
    [4331253.023018] mpt3sas_cm0: tag(0), transfer_count(0), sc->result(0x00000000)
    [4331253.023020] mpt3sas_cm0: scsi_status(check condition)(0x02), scsi_state(autosense valid )(0x01)
    [4331253.023023] mpt3sas_cm0: [sense_key,asc,ascq]: [0x06,0x2a,0x01], count(96)
    [4331253.023030] sd 2:0:28:0: Mode parameters changed
    [4331266.475222] sd 2:0:29:0: [sdab] tag#29 CDB: Write(10) 2a 00 09 97 6e c1 00 00 02 00
    [4331266.475229] mpt3sas_cm0: sas_address(0x5000c500837f25c6), phy(9)
    [4331266.475232] mpt3sas_cm0: enclosure_logical_id(0x50010c60004d41ff),slot(1)
    [4331266.475234] mpt3sas_cm0: enclosure level(0x0003), connector name( )
    [4331266.475237] mpt3sas_cm0: handle(0x002e), ioc_status(scsi data underrun)(0x0045), smid(139)
    [4331266.475239] mpt3sas_cm0: request_len(8192), underflow(1024), resid(8192)
    [4331266.475241] mpt3sas_cm0: tag(0), transfer_count(0), sc->result(0x00000000)
    [4331266.475244] mpt3sas_cm0: scsi_status(check condition)(0x02), scsi_state(autosense valid )(0x01)
    [4331266.475246] mpt3sas_cm0: [sense_key,asc,ascq]: [0x06,0x2a,0x01], count(96)
    [4331266.475252] sd 2:0:29:0: Mode parameters changed
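Both events follow the same pattern: the command ends in CHECK CONDITION, and the sense data [0x06,0x2a,0x01] decodes, per the T10 SCSI Primary Commands (SPC) tables, to sense key UNIT ATTENTION with ASC/ASCQ 2Ah/01h, MODE PARAMETERS CHANGED. That is the same condition the kernel reports in its "Mode parameters changed" line. Below is a minimal Python sketch for pulling these fields out of dmesg so that future hiccups can be compared side by side. It assumes the exact mpt3sas message format shown above, the sense tables are deliberately partial, and `parse_events`/`decode_cdb10` are hypothetical helper names, not part of any existing tool.

```python
import re

# Partial table of SCSI sense keys (T10 SPC); extend as needed.
SENSE_KEYS = {
    0x00: "NO SENSE",
    0x01: "RECOVERED ERROR",
    0x02: "NOT READY",
    0x03: "MEDIUM ERROR",
    0x04: "HARDWARE ERROR",
    0x05: "ILLEGAL REQUEST",
    0x06: "UNIT ATTENTION",
    0x0B: "ABORTED COMMAND",
}
# Only the ASC/ASCQ pair that appears in this log is listed.
ASC_ASCQ = {(0x2A, 0x01): "MODE PARAMETERS CHANGED"}

# These patterns assume the mpt3sas message format shown in the post.
CDB_RE = re.compile(r"\[(sd[a-z]+)\] tag#\d+ CDB: (\w+)\(10\) ([0-9a-f ]+)$")
SAS_RE = re.compile(r"sas_address\((0x[0-9a-f]+)\)")
LEN_RE = re.compile(r"request_len\((\d+)\)")
SENSE_RE = re.compile(r"\[sense_key,asc,ascq\]: \[0x([0-9a-f]+),0x([0-9a-f]+),0x([0-9a-f]+)\]")

# Abbreviated excerpt of the dmesg output above, one message per line.
SAMPLE_DMESG = """\
[4331253.022999] sd 2:0:28:0: [sdaa] tag#2 CDB: Read(10) 28 00 10 a8 3d b5 00 00 20 00
[4331253.023006] mpt3sas_cm0: sas_address(0x5000c500837f31f2), phy(8)
[4331253.023016] mpt3sas_cm0: request_len(131072), underflow(16384), resid(131072)
[4331253.023023] mpt3sas_cm0: [sense_key,asc,ascq]: [0x06,0x2a,0x01], count(96)
[4331266.475222] sd 2:0:29:0: [sdab] tag#29 CDB: Write(10) 2a 00 09 97 6e c1 00 00 02 00
[4331266.475229] mpt3sas_cm0: sas_address(0x5000c500837f25c6), phy(9)
[4331266.475239] mpt3sas_cm0: request_len(8192), underflow(1024), resid(8192)
[4331266.475246] mpt3sas_cm0: [sense_key,asc,ascq]: [0x06,0x2a,0x01], count(96)
"""


def decode_cdb10(hex_bytes):
    """Return (LBA, block count) from a READ(10)/WRITE(10) CDB given as hex pairs."""
    cdb = bytes.fromhex(hex_bytes)            # byte 0 is the opcode (0x28 read, 0x2a write)
    lba = int.from_bytes(cdb[2:6], "big")     # bytes 2-5: logical block address
    blocks = int.from_bytes(cdb[7:9], "big")  # bytes 7-8: transfer length in blocks
    return lba, blocks


def parse_events(dmesg_text):
    """Group each 'CDB:' line with the mpt3sas lines that follow it."""
    events = []
    for line in dmesg_text.splitlines():
        line = line.rstrip()
        m = CDB_RE.search(line)
        if m:
            lba, blocks = decode_cdb10(m.group(3))
            events.append({"disk": m.group(1), "op": m.group(2), "lba": lba, "blocks": blocks})
            continue
        if not events:
            continue  # ignore anything before the first CDB line
        ev = events[-1]
        m = SAS_RE.search(line)
        if m:
            ev["sas_address"] = m.group(1)
        m = LEN_RE.search(line)
        if m and ev["blocks"]:
            # request_len is in bytes; dividing by the CDB block count gives the block size
            ev["bytes_per_block"] = int(m.group(1)) // ev["blocks"]
        m = SENSE_RE.search(line)
        if m:
            key, asc, ascq = (int(g, 16) for g in m.groups())
            ev["sense"] = SENSE_KEYS.get(key, hex(key))
            ev["asc"] = ASC_ASCQ.get((asc, ascq), "ASC 0x%02x / ASCQ 0x%02x" % (asc, ascq))
    return events


if __name__ == "__main__":
    for ev in parse_events(SAMPLE_DMESG):
        print(ev)
```

On this excerpt, the sketch reports sdaa and sdab with transfers of 32 and 2 blocks, respectively. Dividing `request_len` by the CDB transfer length gives 4096 bytes per block for both, which suggests, but does not prove, that these drives use 4 KiB logical blocks.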
Pool status:
    [root@zfs-head ~]# zpool status
      pool: storage
     state: ONLINE
      scan: none requested
    config:

    NAME                                             STATE     READ WRITE CKSUM
    storage                                          ONLINE       0     0     0
      mirror-0                                       ONLINE       0     0     0
        s1d1                                         ONLINE       0     0     0
        s2d1                                         ONLINE       0     0     0
      mirror-1                                       ONLINE       0     0     0
        s3d1                                         ONLINE       0     0     0
        s4d1                                         ONLINE       0     0     0
      mirror-2                                       ONLINE       0     0     0
        s1d2                                         ONLINE       0     0     0
        s2d2                                         ONLINE       0     0     0
      mirror-3                                       ONLINE       0     0     0
        s3d2                                         ONLINE       0     0     0
        s4d2                                         ONLINE       0     0     0
      mirror-4                                       ONLINE       0     0     0
        s1d3                                         ONLINE       0     0     0
        s2d3                                         ONLINE       0     0     0
      mirror-5                                       ONLINE       0     0     0
        s3d3                                         ONLINE       0     0     0
        s4d3                                         ONLINE       0     0     0
    logs
      ata-Samsung_SSD_850_PRO_128GB_S24ZNXAGA10768M  ONLINE       0     0     0
    cache
      ata-Samsung_SSD_850_EVO_250GB_S21NNXAG918721R  ONLINE       0     0     0
      ata-Samsung_SSD_850_EVO_250GB_S21NNXAGA59337A  ONLINE       0     0     0
      ata-Samsung_SSD_850_EVO_250GB_S21NNXAGA69590F  ONLINE       0     0     0

    errors: No known data errors
    [root@zfs-head ~]#
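The pool itself never records an error. Because the events are months apart, one option is to capture pool health automatically as soon as dmesg shows one of these messages. `zpool status -x` already limits its output to unhealthy pools. The sketch below goes a step further and flags individual devices whose state is not ONLINE or whose READ/WRITE/CKSUM counters are nonzero. It assumes the default tabular layout of `zpool status`, and `problem_devices` is a hypothetical name.

```python
# Abbreviated copy of the `zpool status` output above.
SAMPLE_STATUS = """\
  pool: storage
 state: ONLINE
  scan: none requested
config:

\tNAME        STATE     READ WRITE CKSUM
\tstorage     ONLINE       0     0     0
\t  mirror-0  ONLINE       0     0     0
\t    s1d1    ONLINE       0     0     0
\t    s2d1    ONLINE       0     0     0
\tlogs
\t  ata-Samsung_SSD_850_PRO_128GB_S24ZNXAGA10768M  ONLINE       0     0     0

errors: No known data errors
"""


def problem_devices(status_text):
    """Return names in the config table that are not ONLINE or have nonzero error counters."""
    problems = []
    in_config = False
    for line in status_text.splitlines():
        parts = line.split()
        if not parts:
            continue
        if parts[0] == "NAME":
            in_config = True  # header row of the config table
            continue
        if parts[0] == "errors:":
            break  # end of the config table
        # Section labels such as "logs" and "cache" have no state columns and are skipped.
        if in_config and len(parts) >= 5:
            name, state, read, write, cksum = parts[:5]
            # Counters may be abbreviated (e.g. "1.2K"), so anything other than "0" counts.
            if state != "ONLINE" or any(c != "0" for c in (read, write, cksum)):
                problems.append(name)
    return problems
```

On the host, the input could come from `subprocess.run(["zpool", "status"], capture_output=True, text=True).stdout` (Python 3.7+). An empty list means ZFS saw nothing wrong, which matches the output above.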
My vdev map:
    [root@zfs-head ~]# cat /etc/zfs/vdev_id.conf
    # by-vdev
    # name     fully qualified or base name of device link
    alias s1d1 /dev/disk/by-id/scsi-35000c500837ff247
    alias s1d2 /dev/disk/by-id/scsi-35000c500837f15c3
    alias s1d3 /dev/disk/by-id/scsi-35000c500837f137f
    alias s2d1 /dev/disk/by-id/scsi-35000c500837f377b
    alias s2d2 /dev/disk/by-id/scsi-35000c500837f5bf7
    alias s2d3 /dev/disk/by-id/scsi-35000c500837f75bf
    alias s3d1 /dev/disk/by-id/scsi-35000c500837f14d3
    alias s3d2 /dev/disk/by-id/scsi-35000c500837f571b
    alias s3d3 /dev/disk/by-id/scsi-35000c500837f604f
    alias s4d1 /dev/disk/by-id/scsi-35000c500837f31f3
    alias s4d2 /dev/disk/by-id/scsi-35000c500837f25c7
    alias s4d3 /dev/disk/by-id/scsi-35000c500837f14cf
    [root@zfs-head ~]#
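When the dmesg excerpt is set beside this map, the logged sas_address values 0x5000c500837f31f2 and 0x5000c500837f25c6 differ from the by-id WWNs of s4d1 (5000c500837f31f3) and s4d2 (5000c500837f25c7) only in the last hex digit. Assuming that correspondence holds for these drives (an observation from the values in this post, not a documented rule), a small sketch can map each logged SAS address to its vdev alias. It returns every candidate so that an ambiguous match is visible. `load_aliases` and `alias_for_sas_address` are hypothetical names.

```python
# The vdev_id.conf from the post.
SAMPLE_CONF = """\
# by-vdev
# name     fully qualified or base name of device link
alias s1d1 /dev/disk/by-id/scsi-35000c500837ff247
alias s1d2 /dev/disk/by-id/scsi-35000c500837f15c3
alias s1d3 /dev/disk/by-id/scsi-35000c500837f137f
alias s2d1 /dev/disk/by-id/scsi-35000c500837f377b
alias s2d2 /dev/disk/by-id/scsi-35000c500837f5bf7
alias s2d3 /dev/disk/by-id/scsi-35000c500837f75bf
alias s3d1 /dev/disk/by-id/scsi-35000c500837f14d3
alias s3d2 /dev/disk/by-id/scsi-35000c500837f571b
alias s3d3 /dev/disk/by-id/scsi-35000c500837f604f
alias s4d1 /dev/disk/by-id/scsi-35000c500837f31f3
alias s4d2 /dev/disk/by-id/scsi-35000c500837f25c7
alias s4d3 /dev/disk/by-id/scsi-35000c500837f14cf
"""


def load_aliases(conf_text):
    """Map alias name -> WWN hex string taken from the /dev/disk/by-id link name."""
    aliases = {}
    for line in conf_text.splitlines():
        parts = line.split()
        if len(parts) == 3 and parts[0] == "alias":
            name, path = parts[1], parts[2]
            dev = path.rsplit("/", 1)[-1]  # e.g. scsi-35000c500837ff247
            if dev.startswith("scsi-3"):
                aliases[name] = dev[len("scsi-3"):].lower()  # leaves the 16-digit WWN
    return aliases


def alias_for_sas_address(sas_address, aliases):
    """HEURISTIC (assumption): match on all but the last hex digit of the address."""
    addr = sas_address.lower()
    if addr.startswith("0x"):
        addr = addr[2:]
    return [name for name, wwn in aliases.items() if wwn[:-1] == addr[:-1]]


if __name__ == "__main__":
    table = load_aliases(SAMPLE_CONF)
    for addr in ("0x5000c500837f31f2", "0x5000c500837f25c6"):
        print(addr, alias_for_sas_address(addr, table))
```

If the heuristic is right, both messages in the excerpt came from disks in the s4 group. That would be worth checking against the enclosure slots (slot 0 and slot 1) reported in the same log lines before drawing any conclusion.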
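The box doesn't reboot, or even really acknowledge that anything went wrong, apart from the dmesg entries. I've searched for these entries as best I can but haven't found anything relevant.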
Any help appreciated!