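Context: I have a Proxmox cluster whose KVM virtual machines live on a volume group that is shared via DRBD.
I have a problem with DRBD. When a node gets disconnected, the replication link between the nodes is broken. To recover, I have to:
stop the running virtual machines (because of the error "ERROR: Module drbd is in use"), deactivate the VG (with vgchange -an), restart the drbd service, and let the resync run.
The configuration is: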
/etc/drbd.conf:
# You can find an example in /usr/share/doc/drbd.../drbd.conf.example
include "drbd.d/global_common.conf";
include "drbd.d/*.res";
global_common.conf:
global {
    usage-count no;
}
common {
    protocol C;
    startup {
        degr-wfc-timeout 120;
        # become-primary-on proxmox001;
        become-primary-on both;
    }
    disk {
    }
    net {
        allow-two-primaries;
        after-sb-0pri discard-zero-changes;
        after-sb-1pri discard-secondary;
        after-sb-2pri disconnect;
    }
    syncer {
        verify-alg md5;
        rate 30M;
    }
}
r0.res:
resource r0 {
    protocol C;
    on proxmox001 {
        device /dev/drbd0;
        disk /dev/mapper/pve-lv_data;
        address 192.168.0.1:7788;
        meta-disk internal;
    }
    on proxmox002 {
        device /dev/drbd0;
        disk /dev/mapper/pve-lv_data;
        address 192.168.0.2:7788;
        meta-disk internal;
    }
}
Whenever a physical host becomes unstable, I lose the DRBD link.
In normal operation, /proc/drbd shows:
GIT-hash: 83ca112086600faacab2f157bc5a9324f7bd7f77 build by root@proxmox001, 2013-04-24 12:55:32
 0: cs:Connected ro:Primary/Primary ds:UpToDate/UpToDate C r-----
    ns:421142 nr:55715 dw:8498959 dr:10994034 al:1144 bm:420 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:0
When a host is unstable, /proc/drbd returns:
cs:Standalone or cs:WFConnection st:Secondary/Unknown
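To tell the healthy and the degraded state apart from a script, the cs, ro and ds fields can be pulled out of the status line. This is only a minimal sketch, assuming the 8.3-style /proc/drbd format shown above and a single resource; the helper names drbd_field and drbd_healthy are made up for illustration and are not part of DRBD.

```shell
#!/bin/sh
# Hypothetical helper: read a DRBD status line on stdin and print the value
# of one "name:value" field (cs, ro, ds, oos, ...).
drbd_field() {
    # $1 = field name, e.g. cs
    tr ' ' '\n' | sed -n "s/^$1://p" | head -n 1
}

# Hypothetical helper: succeed only if the given status line reports a
# connected resource with up-to-date data on both sides.
drbd_healthy() {
    line=$1
    [ "$(printf '%s\n' "$line" | drbd_field cs)" = "Connected" ] &&
    [ "$(printf '%s\n' "$line" | drbd_field ds)" = "UpToDate/UpToDate" ]
}
```

On a node it could be used as: drbd_healthy "$(grep ' cs:' /proc/drbd)" || echo "r0 needs attention". With more than one resource, only the first status line is looked at.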
And when I try to restart the DRBD synchronization, I get either an UpToDate error or:
ERROR: Module drbd is in use proxmox
I tried various things, following suggestions I found on Google:
root@proxmox001:~# vgscan
Reading all physical volumes. This may take a while...
Found volume group "pve" using metadata type lvm2
Found volume group "drbdvg" using metadata type lvm2
root@proxmox001:~# vgchange -an /dev/drbdvg
Can't deactivate volume group "drbdvg" with 1 open logical volume(s)
root@proxmox001:~# /sbin/vgchange -ay
4 logical volume(s) in volume group "pve" now active
2 logical volume(s) in volume group "drbdvg" now active
root@proxmox001:~# cat /proc/drbd
version: 8.3.13 (api:88/proto:86-96)
GIT-hash: 83ca112086600faacab2f157bc5a9324f7bd7f77 build by root@proxmox001, 2013-04-24 12:55:32
 0: cs:WFConnection ro:Primary/Unknown ds:UpToDate/DUnknown C r-----
    ns:0 nr:0 dw:8523047 dr:11025118 al:1147 bm:421 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:8080
root@proxmox001:~# drbdadm connect all
root@proxmox001:~# drbdadm verify r0
0: State change failed: (-15) Need a connection to start verify or resync
Command 'drbdsetup 0 verify' terminated with exit code 11
root@proxmox001:~# cat /proc/drbd
version: 8.3.13 (api:88/proto:86-96)
GIT-hash: 83ca112086600faacab2f157bc5a9324f7bd7f77 build by root@proxmox001, 2013-04-24 12:55:32
 0: cs:WFConnection ro:Primary/Unknown ds:UpToDate/DUnknown C r-----
    ns:0 nr:0 dw:8523371 dr:11025534 al:1147 bm:421 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:8288
root@proxmox001:~# drbdadm secondary all
0: State change failed: (-12) Device is held open by someone
Command 'drbdsetup 0 secondary' terminated with exit code 11
root@proxmox001:~# drbdadm up r0
0: Failure: (124) Device is attached to a disk (use detach first)
Command 'drbdsetup 0 disk /dev/mapper/pve-lv_data /dev/mapper/pve-lv_data internal --set-defaults --create-device' terminated with exit code 10
root@proxmox001:~# service drbd stop
Stopping all DRBD resources:/dev/drbd0: State change failed: (-12) Device is held open by someone
ERROR: Module drbd is in use
.
root@proxmox001:~# drbdadm detach r0
0: State change failed: (-2) Need access to UpToDate data
Command 'drbdsetup 0 detach' terminated with exit code 17
root@proxmox001:~# vgchange -an /dev/drbdvg
0 logical volume(s) in volume group "drbdvg" now active
root@proxmox001:~# service drbd stop
Stopping all DRBD resources:.
root@proxmox001:~# service drbd start
Starting DRBD resources:[ d(r0) s(r0) n(r0) ].
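The part of the transcript that finally went through (deactivate the VG, stop drbd, start drbd) can be condensed into a small script. This is a rough sketch under stated assumptions, not a recommended procedure: it assumes every VM using drbdvg has already been stopped on this node, the final vgchange -ay step is my assumption of what has to follow the restart, and DRY_RUN, run and resync_drbd are hypothetical names introduced here. It does not attempt any split-brain handling.

```shell
#!/bin/sh
# Sketch of the manual sequence from the transcript above.
# Assumption: all VMs on this node that use the VG are already stopped.
VG=${VG:-drbdvg}        # volume group on top of /dev/drbd0 (name taken from vgscan)
DRY_RUN=${DRY_RUN:-1}   # hypothetical switch: 1 = only print the commands

# Print the command in dry-run mode, execute it otherwise.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

resync_drbd() {
    run vgchange -an "$VG" || return 1   # release the LVs so /dev/drbd0 is no longer held open
    run service drbd stop || return 1    # fails with "Module drbd is in use" if something is still open
    run service drbd start || return 1
    run vgchange -ay "$VG"               # assumption: reactivate the LVs once DRBD is back
}

resync_drbd
```

By default the script only prints the four commands; running it with DRY_RUN=0 executes them. Stopping at the first failing step keeps it from restarting drbd while the VG is still open.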
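But every time, this means an outage in production…
Do you know of any way to work around this problem?
Thank you very much!
Edit:
If I replace the disk block in the global_common.conf file with: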
disk { fencing resource-only; }
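then, as long as virtual machines are running on only one host, I just need to restart drbd to re-establish the synchronization link.
However, if virtual machines are running on two or all of the hosts, I run into the same problem as described at the top of this thread.
Thanks.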