I have a mysterious problem with my failover cluster.
Cluster name: PrintCluster01.domain.com
Members: PrintServer01.domain.com and PrintServer02.domain.com
In Failover Cluster Management, under Cluster Events, I received critical error messages 1135 and 1177:
Log Name: System
Source: Microsoft-Windows-FailoverClustering
Date: 15/06/2011 9:07:49 PM
Event ID: 1177
Task Category: None
Level: Critical
Keywords:
User: SYSTEM
Computer: PrintServer01.domain.com
Description:
The Cluster service is shutting down because quorum was lost. This could be due to the loss of network connectivity between some or all nodes in the cluster, or a failover of the witness disk. Run the Validate a Configuration wizard to check your network configuration. If the condition persists, check for hardware or software errors related to the network adapter. Also check for failures in any other network components to which the node is connected such as hubs, switches, or bridges.

Log Name: System
Source: Microsoft-Windows-FailoverClustering
Date: 15/06/2011 9:07:28 PM
Event ID: 1135
Task Category: None
Level: Critical
Keywords:
User: SYSTEM
Computer: PrintServer01.domain.com
Description:
Cluster node 'PrintServer02' was removed from the active failover cluster membership. The Cluster service on this node may have stopped. This could also be due to the node having lost communication with other active nodes in the failover cluster. Run the Validate a Configuration wizard to check your network configuration. If the condition persists, check for hardware or software errors related to the network adapters on this node. Also check for failures in any other network components to which the node is connected such as hubs, switches, or bridges.
After further investigation, I found something interesting. This is the first critical error message logged in the Event Viewer on PrintServer02:
Log Name: System
Source: Tcpip
Date: 15/06/2011 9:07:29 PM
Event ID: 4199
Task Category: None
Level: Error
Keywords: Classic
User: N/A
Computer: PrintServer02-VM.domain.com
Description:
The system detected an address conflict for IP address 192.168.127.142 with the system having network hardware address 00-50-56-AE-29-23. Network operations on this system may be disrupted as a result.
192.168.127.142 is a secondary IP of PrintServer01. How can it conflict with an adapter that belongs to the PrintServer01 node itself? Details below:
**From PrintServer01**

Ethernet adapter Local Area Connection* 8:
   Connection-specific DNS Suffix . :
   Description . . . . . . . . . . . : Microsoft Failover Cluster Virtual Adapter
   Physical Address. . . . . . . . . : 02-50-56-AE-29-23
   DHCP Enabled. . . . . . . . . . . : No
   Autoconfiguration Enabled . . . . : Yes
   IPv4 Address. . . . . . . . . . . : 169.254.1.183(Preferred)
   Subnet Mask . . . . . . . . . . . : 255.255.0.0
   Default Gateway . . . . . . . . . :
   NetBIOS over Tcpip. . . . . . . . : Enabled
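I have double-checked that every IP address on every cluster member is now unique.
I am also sure that my IPs are static rather than assigned by DHCP, as the IPCONFIG output below shows: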
From **PrintServer01** (the Active Node)

Windows IP Configuration
   Host Name . . . . . . . . . . . . : PrintServer01
   Primary Dns Suffix . . . . . . . : domain.com
   Node Type . . . . . . . . . . . . : Hybrid
   IP Routing Enabled. . . . . . . . : No
   WINS Proxy Enabled. . . . . . . . : No
   DNS Suffix Search List. . . . . . : domain.com
                                       domain.com.au

Ethernet adapter Local Area Connection* 8:
   Connection-specific DNS Suffix . :
   Description . . . . . . . . . . . : Microsoft Failover Cluster Virtual Adapter
   Physical Address. . . . . . . . . : 02-50-56-AE-29-23
   DHCP Enabled. . . . . . . . . . . : No
   Autoconfiguration Enabled . . . . : Yes
   IPv4 Address. . . . . . . . . . . : 169.254.1.183(Preferred)
   Subnet Mask . . . . . . . . . . . : 255.255.0.0
   Default Gateway . . . . . . . . . :
   NetBIOS over Tcpip. . . . . . . . : Enabled

Ethernet adapter Cluster Public Network:
   Connection-specific DNS Suffix . :
   Description . . . . . . . . . . . : Intel® PRO/1000 MT Network Connection
   Physical Address. . . . . . . . . : 00-50-56-AE-29-23
   DHCP Enabled. . . . . . . . . . . : No
   Autoconfiguration Enabled . . . . : Yes
   IPv4 Address. . . . . . . . . . . : 192.168.127.155(Preferred)
   Subnet Mask . . . . . . . . . . . : 255.255.255.0
   IPv4 Address. . . . . . . . . . . : 192.168.127.88(Preferred)
   Subnet Mask . . . . . . . . . . . : 255.255.255.0
   IPv4 Address. . . . . . . . . . . : 192.168.127.142(Preferred)
   Subnet Mask . . . . . . . . . . . : 255.255.255.0
   IPv4 Address. . . . . . . . . . . : 192.168.127.143(Preferred)
   Subnet Mask . . . . . . . . . . . : 255.255.255.0
   IPv4 Address. . . . . . . . . . . : 192.168.127.144(Preferred)
   Subnet Mask . . . . . . . . . . . : 255.255.255.0
   Default Gateway . . . . . . . . . : 192.168.127.254
   DNS Servers . . . . . . . . . . . : 192.168.127.10
                                       192.168.127.11
   Primary WINS Server . . . . . . . : 192.168.127.10
   Secondary WINS Server . . . . . . : 192.168.127.11
   NetBIOS over Tcpip. . . . . . . . : Enabled

Ethernet adapter Cluster Private Network:
   Connection-specific DNS Suffix . :
   Description . . . . . . . . . . . : Intel® PRO/1000 MT Network Connection #2
   Physical Address. . . . . . . . . : 00-50-56-AE-43-EC
   DHCP Enabled. . . . . . . . . . . : No
   Autoconfiguration Enabled . . . . : Yes
   IPv4 Address. . . . . . . . . . . : 10.184.2.2(Preferred)
   Subnet Mask . . . . . . . . . . . : 255.255.255.0
   Default Gateway . . . . . . . . . :
   NetBIOS over Tcpip. . . . . . . . : Disabled

From **PrintServer02**

Windows IP Configuration
   Host Name . . . . . . . . . . . . : PrintServer02
   Primary Dns Suffix . . . . . . . : domain.com
   Node Type . . . . . . . . . . . . : Hybrid
   IP Routing Enabled. . . . . . . . : No
   WINS Proxy Enabled. . . . . . . . : No
   DNS Suffix Search List. . . . . . : domain.com
                                       domain.com.au

Ethernet adapter Local Area Connection* 8:
   Connection-specific DNS Suffix . :
   Description . . . . . . . . . . . : Microsoft Failover Cluster Virtual Adapter
   Physical Address. . . . . . . . . : 02-50-56-AE-5F-E5
   DHCP Enabled. . . . . . . . . . . : No
   Autoconfiguration Enabled . . . . : Yes
   IPv4 Address. . . . . . . . . . . : 169.254.2.86(Preferred)
   Subnet Mask . . . . . . . . . . . : 255.255.0.0
   Default Gateway . . . . . . . . . :
   NetBIOS over Tcpip. . . . . . . . : Enabled

Ethernet adapter Cluster Public Network:
   Connection-specific DNS Suffix . :
   Description . . . . . . . . . . . : Intel® PRO/1000 MT Network Connection
   Physical Address. . . . . . . . . : 00-50-56-AE-79-FA
   DHCP Enabled. . . . . . . . . . . : No
   Autoconfiguration Enabled . . . . : Yes
   IPv4 Address. . . . . . . . . . . : 192.168.127.172(Preferred)
   Subnet Mask . . . . . . . . . . . : 255.255.255.0
   IPv4 Address. . . . . . . . . . . : 192.168.127.119(Preferred)
   Subnet Mask . . . . . . . . . . . : 255.255.255.0
   Default Gateway . . . . . . . . . : 192.168.127.254
   DNS Servers . . . . . . . . . . . : 192.168.127.10
                                       192.168.127.11
   Primary WINS Server . . . . . . . : 192.168.127.11
   Secondary WINS Server . . . . . . : 192.168.127.10
   NetBIOS over Tcpip. . . . . . . . : Enabled

Ethernet adapter Cluster Private Network:
   Connection-specific DNS Suffix . :
   Description . . . . . . . . . . . : Intel® PRO/1000 MT Network Connection #2
   Physical Address. . . . . . . . . : 00-50-56-AE-77-8D
   DHCP Enabled. . . . . . . . . . . : No
   Autoconfiguration Enabled . . . . : Yes
   IPv4 Address. . . . . . . . . . . : 10.184.2.3(Preferred)
   Subnet Mask . . . . . . . . . . . : 255.255.255.0
   Default Gateway . . . . . . . . . :
   NetBIOS over Tcpip. . . . . . . . : Disabled
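To cross-check event 4199 against the output above, here is a minimal Python sketch. The `ADAPTERS` table is transcribed by hand from the ipconfig listings (the Failover Cluster Virtual Adapters are left out), and the helper names `owner_of_mac` and `owners_of_ip` are made up for illustration. It only confirms what the listings already suggest: both the conflicting MAC and the conflicting IP belong to the Cluster Public Network adapter on PrintServer01.

```python
# Adapter inventory transcribed by hand from the ipconfig output above.
ADAPTERS = [
    {"node": "PrintServer01", "adapter": "Cluster Public Network",
     "mac": "00-50-56-AE-29-23",
     "ips": ["192.168.127.155", "192.168.127.88", "192.168.127.142",
             "192.168.127.143", "192.168.127.144"]},
    {"node": "PrintServer01", "adapter": "Cluster Private Network",
     "mac": "00-50-56-AE-43-EC", "ips": ["10.184.2.2"]},
    {"node": "PrintServer02", "adapter": "Cluster Public Network",
     "mac": "00-50-56-AE-79-FA",
     "ips": ["192.168.127.172", "192.168.127.119"]},
    {"node": "PrintServer02", "adapter": "Cluster Private Network",
     "mac": "00-50-56-AE-77-8D", "ips": ["10.184.2.3"]},
]


def owner_of_mac(mac):
    """Return (node, adapter) for a MAC address, or None if it is unknown."""
    for entry in ADAPTERS:
        if entry["mac"].upper() == mac.upper():
            return entry["node"], entry["adapter"]
    return None


def owners_of_ip(ip):
    """Return the sorted list of nodes that list this IP in their ipconfig."""
    return sorted({e["node"] for e in ADAPTERS if ip in e["ips"]})


# MAC and IP reported by event 4199 on PrintServer02.
print(owner_of_mac("00-50-56-AE-29-23"))   # ('PrintServer01', 'Cluster Public Network')
print(owners_of_ip("192.168.127.142"))     # ['PrintServer01']
```

So at the time ipconfig was captured only PrintServer01 held 192.168.127.142, which means PrintServer02 must have tried to use that address at the moment the event was logged.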
Any help would be greatly appreciated.
Thanks, AWT
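IP address conflict errors occur when more than one node in the cluster tries to bring a resource group (and its associated IP) online at the same time.
This can happen if the cluster nodes momentarily lose contact with each other. Each node assumes the other has failed, so the "passive" node brings all of the resource groups online while the "active" node is in fact still online and still holding them.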
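A toy Python model of that scenario, purely to illustrate the reasoning. It is not how the Cluster service is implemented, the function names are hypothetical, and treating 192.168.127.142 as a clustered resource IP is an assumption based on event 4199 in the question.

```python
def nodes_hosting(ip_list, active, passive, heartbeat_ok):
    """Return {ip: [nodes that have it online]} for a two-node pair."""
    # The active node keeps its resource IPs online in every case.
    hosting = {ip: [active] for ip in ip_list}
    if not heartbeat_ok:
        # The passive node assumes the active node is dead and takes over,
        # even though the active node never released the addresses.
        for ip in ip_list:
            hosting[ip].append(passive)
    return hosting


def conflicts(hosting):
    """Return the IPs that are online on more than one node."""
    return sorted(ip for ip, nodes in hosting.items() if len(nodes) > 1)


ips = ["192.168.127.142"]  # assumed to be a clustered resource IP
print(conflicts(nodes_hosting(ips, "PrintServer01", "PrintServer02", True)))   # []
print(conflicts(nodes_hosting(ips, "PrintServer01", "PrintServer02", False)))  # ['192.168.127.142']
```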
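I have seen this problem in our VMware environment when an ESX(i) host was overloaded, and sometimes even during an HBA bus rescan: the MSCS nodes suddenly lose contact with each other and this kind of chaos follows.
Use the script on this page to query the MAC addresses of your VMs: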
http://www.virtuallyghetto.com/2011/05/how-to-query-for-macs-on-internal.html
Match the results against the misbehaving MAC address, then double-check that machine.
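When matching, note that Windows prints MAC addresses as dash-separated uppercase (00-50-56-AE-29-23), while other tools may use colons, dots, or lowercase. A small Python sketch for comparing them safely; the helper names are hypothetical and the colon-separated sample is an assumed output format, not something taken from the linked script.

```python
import re


def normalize_mac(mac):
    """Return a MAC address as 12 lowercase hex digits, or raise ValueError."""
    digits = re.sub(r"[-:.]", "", mac.strip())  # drop common separators
    if not re.fullmatch(r"[0-9A-Fa-f]{12}", digits):
        raise ValueError(f"not a MAC address: {mac!r}")
    return digits.lower()


def same_mac(a, b):
    """Compare two MAC addresses regardless of separator style or case."""
    return normalize_mac(a) == normalize_mac(b)


# MAC from event 4199 versus an assumed colon-separated listing.
print(same_mac("00-50-56-AE-29-23", "00:50:56:ae:29:23"))  # True
# The Failover Cluster Virtual Adapter differs only in the first octet.
print(same_mac("00-50-56-AE-29-23", "02-50-56-AE-29-23"))  # False
```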
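IMHO, any logical service IP should have a /32 subnet mask. Networking should be handled by the physical IP, which should carry the subnet mask that matches the subnet in use.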
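To show what that means in terms of addressing, here is a short sketch with Python's standard `ipaddress` module, using addresses from the question. It only illustrates the arithmetic of /32 versus /24; whether the cluster's IP resources can actually be configured this way is not something this thread confirms.

```python
import ipaddress

# Service (logical) IP with a host mask, physical IP with the real subnet mask.
service_ip = ipaddress.ip_interface("192.168.127.142/32")
physical_ip = ipaddress.ip_interface("192.168.127.155/24")

print(service_ip.netmask)                     # 255.255.255.255
print(service_ip.network.num_addresses)       # 1
print(physical_ip.netmask)                    # 255.255.255.0
print(physical_ip.network)                    # 192.168.127.0/24
# The service IP still falls inside the subnet served by the physical IP.
print(service_ip.ip in physical_ip.network)   # True
```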
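I fixed this by setting the IP to be assigned automatically and then assigning it manually again. Doing so prompted me to remove a non-existent device, and removing it fixed the issue.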