I'm testing a setup of a Xen DomU with DRBD storage, for easy failover. In most cases, I get I/O errors right after the DomU boots:
[ 3.153370] EXT3-fs (xvda2): using internal journal
[ 3.277115] ip_tables: (C) 2000-2006 Netfilter Core Team
[ 3.336014] nf_conntrack version 0.5.0 (3899 buckets, 15596 max)
[ 3.515604] init: failsafe main process (397) killed by TERM signal
[ 3.801589] blkfront: barrier: write xvda2 op failed
[ 3.801597] blkfront: xvda2: barrier or flush: disabled
[ 3.801611] end_request: I/O error, dev xvda2, sector 52171168
[ 3.801630] end_request: I/O error, dev xvda2, sector 52171168
[ 3.801642] Buffer I/O error on device xvda2, logical block 6521396
[ 3.801652] lost page write due to I/O error on xvda2
[ 3.801755] Aborting journal on device xvda2.
[ 3.804415] EXT3-fs (xvda2): error: ext3_journal_start_sb: Detected aborted journal
[ 3.804434] EXT3-fs (xvda2): error: remounting filesystem read-only
[ 3.814754] journal commit I/O error
[ 6.973831] init: udev-fallback-graphics main process (538) terminated with status 1
[ 6.992267] init: plymouth-splash main process (546) terminated with status 1
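The drbdsetup man page says that LVM (which I use) doesn't support barriers (better known as tagged command queuing or native command queuing), so I configured the DRBD device not to use barriers. This can be seen in /proc/drbd ("wo:f", meaning flush, which is the next method DRBD selects after barrier):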
3: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r----
    ns:2160152 nr:520204 dw:2680344 dr:2678107 al:3549 bm:9183 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0
And on the other host:
3: cs:Connected ro:Secondary/Primary ds:UpToDate/UpToDate C r----
    ns:0 nr:2160152 dw:2160152 dr:0 al:0 bm:8052 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0
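When testing several configurations it can help to read the write-ordering method for each DRBD device programmatically rather than by eye. The sketch below is a hypothetical helper (`write_ordering` is not part of any DRBD tool); it assumes the DRBD 8.3 meaning of the `wo:` letters (b = barrier, f = flush, d = drain, n = none) and accepts the field on either the device header line or the indented counters line.

```python
import re

# Letters used in the "wo:" (write ordering) field of DRBD 8.3's /proc/drbd.
# Assumed mapping: b = barrier, f = flush, d = drain, n = none.
WRITE_ORDERING = {"b": "barrier", "f": "flush", "d": "drain", "n": "none"}


def write_ordering(proc_drbd_text: str) -> dict:
    """Return {minor number: write-ordering method} for each device found."""
    result = {}
    minor = None
    for line in proc_drbd_text.splitlines():
        # A device entry starts with e.g. " 3: cs:Connected ..."
        header = re.match(r"\s*(\d+):\s+cs:", line)
        if header:
            minor = int(header.group(1))
        # "wo:" may sit on the header line or on the indented counters line.
        wo = re.search(r"\bwo:(\w)", line)
        if wo and minor is not None:
            result[minor] = WRITE_ORDERING.get(wo.group(1), "unknown")
    return result


if __name__ == "__main__":
    sample = (
        " 3: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r----\n"
        "    ns:2160152 nr:520204 dw:2680344 dr:2678107 al:3549 bm:9183"
        " lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0\n"
    )
    print(write_ordering(sample))  # {3: 'flush'}
```

On a live host the input would be the contents of /proc/drbd, e.g. `write_ordering(open("/proc/drbd").read())`.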
Following the DRBD documentation, I also enabled the disable_sendpage option:
cat /sys/module/drbd/parameters/disable_sendpage
Y
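The same check can be scripted, for instance as part of a pre-start sanity check on each host. This is a minimal sketch with hypothetical helper names; it only relies on the fact that boolean kernel module parameters read back from sysfs as `Y` or `N`.

```python
from pathlib import Path

# sysfs location of the drbd module's disable_sendpage parameter.
DISABLE_SENDPAGE = Path("/sys/module/drbd/parameters/disable_sendpage")


def parse_bool_param(raw: str) -> bool:
    """Boolean module parameters read back from sysfs as 'Y' or 'N'."""
    return raw.strip() == "Y"


def sendpage_disabled(path: Path = DISABLE_SENDPAGE) -> bool:
    """True if disable_sendpage is set (raises if the file does not exist)."""
    return parse_bool_param(path.read_text())


if __name__ == "__main__":
    try:
        print("disable_sendpage:", sendpage_disabled())
    except FileNotFoundError:
        print("drbd module not loaded (or parameter not exposed)")
```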
I also tried adding barrier=0 to the mount options in fstab. It still sometimes says:
[ 58.603896] blkfront: barrier: write xvda2 op failed
[ 58.603903] blkfront: xvda2: barrier or flush: disabled
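To compare boots across configuration changes, it may help to summarize these guest kernel messages automatically. The sketch below is a hypothetical helper; its regular expressions are copied from the two log excerpts above and may need adjusting for other kernel versions. Its input would be, for example, the output of `dmesg` inside the DomU.

```python
import re
from collections import Counter

# Message formats taken from the guest kernel logs quoted above.
BARRIER_OFF = re.compile(r"blkfront: (\w+): barrier or flush: disabled")
IO_ERROR = re.compile(r"end_request: I/O error, dev (\w+), sector \d+")


def summarize_blkfront_log(log: str) -> dict:
    """Report devices whose barriers were disabled and I/O errors per device."""
    return {
        # Devices for which blkfront turned barriers/flushes off.
        "barrier_disabled": sorted(set(BARRIER_OFF.findall(log))),
        # Number of end_request I/O errors seen for each device.
        "io_errors": Counter(IO_ERROR.findall(log)),
    }
```

A boot like the first one would report `xvda2` in both fields, while the later message only reports the barrier shutdown, with no I/O errors.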
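I don't even know whether ext3 is a very good choice here. And since only one of my storage systems is battery-backed, disabling barriers wouldn't be very smart anyway.
Why do I still get barriers after disabling this feature?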
Both hosts:
Debian: 6.0.4
uname -a: Linux 2.6.32-5-xen-amd64
drbd: 8.3.7
Xen: 4.0.1
Guest:
Ubuntu 12.04 LTS
uname -a: Linux 3.2.0-24-generic pvops
DRBD resource:
resource drbdvm {
  meta-disk internal;
  device /dev/drbd3;
  startup {
    # The timeout value when the last known state of the other side was available. 0 means infinite.
    wfc-timeout 0;
    # Timeout value when the last known state was disconnected. 0 means infinite.
    degr-wfc-timeout 180;
  }
  syncer {
    # This is recommended only for low-bandwidth lines, to only send those
    # blocks which really have changed.
    #csums-alg md5;
    # Set to about half your net speed
    rate 60M;
    # It seems that this option moved to the 'net' section in drbd 8.4. (later release than Debian has currently)
    verify-alg md5;
  }
  net {
    # The manpage says this is recommended only in pre-production (because of its performance), to determine
    # if your LAN card has a TCP checksum offloading bug.
    #data-integrity-alg md5;
  }
  disk {
    # Detach causes the device to work over-the-network-only after the
    # underlying disk fails. Detach is not default for historical reasons, but is
    # recommended by the docs.
    # However, the Debian defaults in drbd.conf suggest the machine will reboot in that event...
    on-io-error detach;
    # LVM doesn't support barriers, so disabling it. It will revert to flush. Check wo: in /proc/drbd. If you don't disable it, you get IO errors.
    no-disk-barrier;
  }
  on host1 {
    # universe is a VG
    disk /dev/universe/drbdvm-disk;
    address 10.0.0.1:7792;
  }
  on host2 {
    # universe is a VG
    disk /dev/universe/drbdvm-disk;
    address 10.0.0.2:7792;
  }
}
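The comment "It will revert to flush" describes a fallback chain. The sketch below models it under an assumption taken from the DRBD 8.3 drbd.conf man page as I understand it: DRBD uses barrier, then flush, then drain, then none, and each of the first three can be switched off with `no-disk-barrier`, `no-disk-flushes` and `no-disk-drain`. Both helper names are hypothetical, and the model ignores any fallback DRBD may perform at runtime when the backing device rejects a method.

```python
import re

# Assumed DRBD 8.3 fallback chain: each method paired with the disk-section
# option that switches it off.
FALLBACK = [
    ("barrier", "no-disk-barrier"),
    ("flush", "no-disk-flushes"),
    ("drain", "no-disk-drain"),
]


def disk_options(resource_conf: str) -> set:
    """Return the option keywords of the first 'disk { ... }' section."""
    without_comments = re.sub(r"#[^\n]*", "", resource_conf)
    # "disk /dev/..." inside "on host { }" is not followed by "{", so it is skipped.
    section = re.search(r"\bdisk\s*\{([^}]*)\}", without_comments)
    if section is None:
        return set()
    statements = (s.split() for s in section.group(1).split(";"))
    return {words[0] for words in statements if words}


def expected_write_ordering(options: set) -> str:
    """First method in the fallback chain that is not disabled."""
    for method, disabling_option in FALLBACK:
        if disabling_option not in options:
            return method
    return "none"
```

For the resource above, `disk_options` yields `{"on-io-error", "no-disk-barrier"}` and `expected_write_ordering` returns `"flush"`, which agrees with the `wo:f` shown in /proc/drbd on both hosts.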
DomU cfg:
bootloader = '/usr/lib/xen-default/bin/pygrub'

vcpus = '2'
memory = '512'

#
# Disk device(s).
#
root = '/dev/xvda2 ro'
disk = [
    'phy:/dev/drbd3,xvda2,w',
    'phy:/dev/universe/drbdvm-swap,xvda1,w',
]

#
# Hostname
#
name = 'drbdvm'

#
# Networking
#
# fake IP for posting
vif = [ 'ip=1.2.3.4,mac=00:16:3E:22:A8:A7' ]

#
# Behaviour
#
on_poweroff = 'destroy'
on_reboot = 'restart'
on_crash = 'restart'
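In my test setup, the primary host's storage is a battery-backed 9650SE SATA-II RAID PCIe controller. The secondary uses software RAID1.
Is DRBD + Xen widely used? With problems like these, it isn't usable.
Edit: the barrier problem is actually a known issue (here and here). I haven't really seen a solution yet.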
I don't know whether this changes anything, but you can also specify the DRBD volume in your DomU configuration like this:
disk = [ 'drbd:drbdvm,xvda2,w' ... ]
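This way, when the DomU is created, Xen automatically makes the current node Primary for the specified resource (unless the resource is already in use by the second machine). And when the DomU is destroyed, the resource is released.
I have many DRBD pairs running like this and have never seen the errors you posted.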
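Add extra = " barrier=off" to the DomU configuration. Note the space before barrier.
Also add the corresponding barrier-off mount option to /etc/fstab in the DomU (e.g. barrier=0 for ext3; the exact option depends on the filesystem).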
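Editing the guest's /etc/fstab can be scripted line by line. This is a minimal sketch with a hypothetical helper; the option-per-filesystem table is an assumption for kernels of this era (ext3/ext4 accept `barrier=0`, XFS accepts `nobarrier`), and any other filesystem type is left untouched.

```python
# Assumed barrier-off mount option per filesystem type.
BARRIER_OFF_OPTION = {"ext3": "barrier=0", "ext4": "barrier=0", "xfs": "nobarrier"}


def disable_barriers(fstab_line: str) -> str:
    """Return the fstab line with a barrier-off option added, if applicable."""
    stripped = fstab_line.strip()
    if not stripped or stripped.startswith("#"):
        return fstab_line  # blank lines and comments are kept as they are
    fields = stripped.split()
    if len(fields) < 4 or fields[2] not in BARRIER_OFF_OPTION:
        return fstab_line  # swap, tmpfs, unknown types: left untouched
    # Drop any existing barrier setting, then append the one for this type.
    options = [
        opt for opt in fields[3].split(",")
        if not opt.startswith("barrier") and opt != "nobarrier"
    ]
    options.append(BARRIER_OFF_OPTION[fields[2]])
    fields[3] = ",".join(options)
    return "\t".join(fields)  # note: field separators are normalised to tabs
```

For example, `/dev/xvda2 / ext3 errors=remount-ro 0 1` becomes the same entry with options `errors=remount-ro,barrier=0`, while the swap line on xvda1 is returned unchanged. The result should still be reviewed before it replaces the guest's fstab.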
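Update:
The barrier-off mount option is a second measure to make sure barriers are turned off.
As for the barrier operations: as you can see during boot, they fail anyway, so turning them off won't make things worse. Besides, barriers only matter on disks that have write caching enabled and that would lose the cached data on a power failure instead of writing it back to disk.
The server should have a battery-backed UPS as well as a battery-backed RAID controller. So enabling barriers would only cost performance (even if they worked).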