DNS round robin does not load-balance SSH

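I tested DNS round robin with SSH and noticed surprising results from the SSH client in my test environment. I am using 3 nodes running RHEL 6.2 (openssh-5.3p1, bind-9.7.3-8.P3). Things like host keys are already managed.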

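My "problem":

I want to use multiple DNS entries to get basic load balancing across several SSH servers. I was (almost) sure this was possible. However, what I get is a kind of basic HA instead. It seems the openssh client does not care about the round robin: it always connects to the same node unless that node is down, in which case the client uses the other entry in the list of DNS records and the connection succeeds. Is this normal/common behaviour? Or is something wrong with my tests?

Below I describe my setup, with tcpdumps of what happens in several situations. Thanks in advance if you have any idea or explanation that could help :)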

login => 10.255.254.1 (node0), 10.255.254.3 (node2)
ssh client => 10.255.254.2 (node1)

The DNS server runs on node0; RR has not been disabled.

    login IN A 10.255.254.1
    login IN A 10.255.254.3

I confirmed that:

  • lookups with host(1) confirm the round robin;
  • the ping(1) command looks fine:

    [root@node1 ~]# ping login
    PING login.node (10.255.254.3) 56(84) bytes of data.
    64 bytes from node2.node (10.255.254.3): icmp_seq=1 ttl=64 time=1.73 ms
    ^C
    [root@node1 ~]# ping login
    PING login.node (10.255.254.1) 56(84) bytes of data.
    64 bytes from node0.node (10.255.254.1): icmp_seq=1 ttl=64 time=0.467 ms
    ^C
    [root@node1 ~]# ping login
    PING login.node (10.255.254.3) 56(84) bytes of data.
    64 bytes from node2.node (10.255.254.3): icmp_seq=1 ttl=64 time=0.433 ms
    ^C
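The same check can be scripted rather than eyeballed from ping. The following is only a rough sketch: `first_address_counts` and its `lookup` parameter are names made up for illustration, and the default lookup goes through `socket.getaddrinfo`, so it shows what a getaddrinfo-based client would see rather than what ping sees.

```python
import socket
from collections import Counter


def first_address_counts(name, attempts=10, lookup=None):
    """Count how often each address comes first when resolving `name`.

    `lookup` is a hypothetical hook: any callable taking a host name and
    returning a list of address strings. It defaults to the system resolver.
    """
    if lookup is None:
        def lookup(host):
            # Ask for IPv4 TCP endpoints only; port 22 is just a placeholder.
            infos = socket.getaddrinfo(host, 22, socket.AF_INET, socket.SOCK_STREAM)
            return [info[4][0] for info in infos]

    counts = Counter()
    for _ in range(attempts):
        addresses = lookup(name)
        if addresses:
            counts[addresses[0]] += 1
    return counts


if __name__ == "__main__":
    # "login" is the round-robin name used in this question.
    print(first_address_counts("login"))
```

If one address dominates the counts while host(1) still shows the records rotating, the reordering is happening on the client side and not in BIND.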

Test 1 (both SSH servers reachable)

    [root@node1 ~]# strace -e connect ssh login
    connect(3, {sa_family=AF_FILE, path="/var/run/nscd/socket"}, 110) = -1 ENOENT (No such file or directory)
    (...)
    connect(3, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("10.255.254.1")}, 16) = 0
    connect(3, {sa_family=AF_INET, sin_port=htons(22), sin_addr=inet_addr("10.255.254.3")}, 16) = 0
    connect(3, {sa_family=AF_UNSPEC, sa_data="\0\0\0\0\0\0\0\0\0\0\0\0\0\0"}, 16) = 0
    connect(3, {sa_family=AF_INET, sin_port=htons(22), sin_addr=inet_addr("10.255.254.1")}, 16) = 0
    connect(3, {sa_family=AF_INET, sin_port=htons(22), sin_addr=inet_addr("10.255.254.3")}, 16) = 0
    (...)

    [root@node0 ~]# tcpdump -i eth0 src node1 or dst node1
    listening on eth0, link-type EN10MB (Ethernet), capture size 65535 bytes
    17:03:04.875099 IP node1.node.53511 > node0.node.domain: 55904+ A? login.node. (29)
    17:03:04.875417 IP node0.node.domain > node1.node.53511: 55904* 2/1/1 A 10.255.254.3, A 10.255.254.1 (102)
    17:03:04.875432 IP node1.node.53511 > node0.node.domain: 22271+ AAAA? login.node. (29)
    17:03:04.875523 IP node0.node.domain > node1.node.53511: 22271* 0/1/0 (79)

=> connection on node2 (10.255.254.3)

Test 2 (both SSH servers still available)

    [root@node1 ~]# strace -e connect ssh login
    connect(3, {sa_family=AF_FILE, path="/var/run/nscd/socket"}, 110) = -1 ENOENT (No such file or directory)
    (...)
    connect(3, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("10.255.254.1")}, 16) = 0
    connect(3, {sa_family=AF_INET, sin_port=htons(22), sin_addr=inet_addr("10.255.254.1")}, 16) = 0
    connect(3, {sa_family=AF_UNSPEC, sa_data="\0\0\0\0\0\0\0\0\0\0\0\0\0\0"}, 16) = 0
    connect(3, {sa_family=AF_INET, sin_port=htons(22), sin_addr=inet_addr("10.255.254.3")}, 16) = 0
    connect(3, {sa_family=AF_INET, sin_port=htons(22), sin_addr=inet_addr("10.255.254.3")}, 16) = 0
    (...)

    [root@node0 ~]# tcpdump -i eth0 src node1 or dst node1
    17:04:29.663664 IP node1.node.51950 > node0.node.domain: 4685+ A? login.node. (29)
    17:04:29.663685 IP node1.node.51950 > node0.node.domain: 36559+ AAAA? login.node. (29)
    17:04:29.664046 IP node0.node.domain > node1.node.51950: 4685* 2/1/1 A 10.255.254.1, A 10.255.254.3 (102)
    17:04:29.664110 IP node0.node.domain > node1.node.51950: 36559* 0/1/0 (79)

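=> connection on node2

(Another test again confirmed the connection to node2; it seems the round robin is only used by the ssh client for its preliminary checks.)

Test 3 (SSH server on node2 stopped)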

    [root@node2 ~]# /etc/init.d/sshd stop
    Stopping sshd: [ OK ]

    [root@node1 ~]# strace -e connect ssh login
    connect(3, {sa_family=AF_FILE, path="/var/run/nscd/socket"}, 110) = -1 ENOENT (No such file or directory)
    (...)
    connect(3, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("10.255.254.1")}, 16) = 0
    connect(3, {sa_family=AF_INET, sin_port=htons(22), sin_addr=inet_addr("10.255.254.1")}, 16) = 0
    connect(3, {sa_family=AF_UNSPEC, sa_data="\0\0\0\0\0\0\0\0\0\0\0\0\0\0"}, 16) = 0
    connect(3, {sa_family=AF_INET, sin_port=htons(22), sin_addr=inet_addr("10.255.254.3")}, 16) = 0
    connect(3, {sa_family=AF_INET, sin_port=htons(22), sin_addr=inet_addr("10.255.254.3")}, 16) = -1 ECONNREFUSED (Connection refused)
    connect(3, {sa_family=AF_INET, sin_port=htons(22), sin_addr=inet_addr("10.255.254.1")}, 16) = 0

    [root@node0 ~]# tcpdump -i eth0 src node1 or dst node1
    17:09:05.854022 IP node1.node.41233 > node0.node.domain: 63435+ A? login.node. (29)
    17:09:05.854055 IP node1.node.41233 > node0.node.domain: 3015+ AAAA? login.node. (29)
    17:09:05.854436 IP node0.node.domain > node1.node.41233: 63435* 2/1/1 A 10.255.254.1, A 10.255.254.3 (102)
    17:09:05.854531 IP node0.node.domain > node1.node.41233: 3015* 0/1/0 (79)
    17:09:05.856764 IP node1.node.59579 > node0.node.ssh: Flags [S], seq 3025023931, win 14600, options [mss 1460,sackOK,TS val 9854496 ecr 0,nop,wscale 7], length 0
    17:09:05.856806 IP node0.node.ssh > node1.node.59579: Flags [S.], seq 1105519762, ack 3025023932, win 14480, options [mss 1460,sackOK,TS val 350907197 ecr 9854496,nop,wscale 7], length 0
    17:09:05.857106 IP node1.node.59579 > node0.node.ssh: Flags [.], ack 1, win 115, options [nop,nop,TS val 9854496 ecr 350907197], length 0
    17:09:05.865291 IP node0.node.ssh > node1.node.59579: Flags [P.], seq 1:22, ack 1, win 114, options [nop,nop,TS val 350907205 ecr 9854496], length 21
    (...)

=> connection on node0 (failover, surprise!)

Test 4 (same conditions)

    [root@node1 ~]# strace -e connect ssh login
    connect(3, {sa_family=AF_FILE, path="/var/run/nscd/socket"}, 110) = -1 ENOENT (No such file or directory)
    (...)
    connect(3, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("10.255.254.1")}, 16) = 0
    connect(3, {sa_family=AF_INET, sin_port=htons(22), sin_addr=inet_addr("10.255.254.3")}, 16) = 0
    connect(3, {sa_family=AF_UNSPEC, sa_data="\0\0\0\0\0\0\0\0\0\0\0\0\0\0"}, 16) = 0
    connect(3, {sa_family=AF_INET, sin_port=htons(22), sin_addr=inet_addr("10.255.254.1")}, 16) = 0
    connect(3, {sa_family=AF_INET, sin_port=htons(22), sin_addr=inet_addr("10.255.254.3")}, 16) = -1 ECONNREFUSED (Connection refused)
    connect(3, {sa_family=AF_INET, sin_port=htons(22), sin_addr=inet_addr("10.255.254.1")}, 16) = 0

    [root@node0 ~]# tcpdump -i eth0 src node1 or dst node1
    (...)
    17:11:44.154595 IP node1.node.56947 > node0.node.domain: 4602+ A? login.node. (29)
    17:11:44.154862 IP node0.node.domain > node1.node.56947: 4602* 2/1/1 A 10.255.254.3, A 10.255.254.1 (102)
    (...)

=> same result (connection on node0)

Test 5 (SSH server on node2 restarted)

    [root@node2 ~]# /etc/init.d/sshd restart
    Stopping sshd: [FAILED]
    Starting sshd: [ OK ]

    [root@node1 ~]# strace -e connect ssh login
    connect(3, {sa_family=AF_FILE, path="/var/run/nscd/socket"}, 110) = -1 ENOENT (No such file or directory)
    (...)
    connect(3, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("10.255.254.1")}, 16) = 0
    connect(3, {sa_family=AF_INET, sin_port=htons(22), sin_addr=inet_addr("10.255.254.1")}, 16) = 0
    connect(3, {sa_family=AF_UNSPEC, sa_data="\0\0\0\0\0\0\0\0\0\0\0\0\0\0"}, 16) = 0
    connect(3, {sa_family=AF_INET, sin_port=htons(22), sin_addr=inet_addr("10.255.254.3")}, 16) = 0
    connect(3, {sa_family=AF_INET, sin_port=htons(22), sin_addr=inet_addr("10.255.254.3")}, 16) = 0

    [root@node0 ~]# tcpdump -i eth0 src node1 or dst node1
    (...)
    17:17:12.893633 IP node1.node.42432 > node0.node.domain: 7264+ A? login.node. (29)
    17:17:12.893988 IP node0.node.domain > node1.node.42432: 7264* 2/1/1 A 10.255.254.1, A 10.255.254.3 (102)
    (...)

=> connection on node2 again (failback)
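The failover in tests 3 to 5 is what you would expect from a client that simply walks the resolved address list in order and moves on when a connect fails, which is how the ECONNREFUSED followed by a successful connect reads in the strace output. A minimal sketch of that loop, not openssh's actual code: `connect_first_available` and its `connect` hook are hypothetical names used only for this illustration.

```python
import socket


def connect_first_available(addresses, port=22, timeout=5.0, connect=None):
    """Try each address in order and return (address, connection) on first success.

    `connect` is a hypothetical hook (callable taking address and port) so the
    loop can be exercised without a network; it defaults to a real TCP connect.
    """
    if connect is None:
        def connect(address, port):
            return socket.create_connection((address, port), timeout=timeout)

    last_error = None
    for address in addresses:
        try:
            return address, connect(address, port)
        except OSError as error:
            # Connection refused, timeout, unreachable: fall through to the next address.
            last_error = error
    if last_error is None:
        raise OSError("no addresses to try")
    raise last_error
```

With a loop like this, the order of the address list decides which server normally gets the connection, and the second record only matters when the first one refuses.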

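DNS does not provide load balancing, so yes: unless the host is down, the client will always use the same record from the returned list of DNS records. If you want to handle hosts going down dynamically, you will have to load-balance the incoming connections to your SSH boxes.

Round-robin DNS is very simplistic as far as load balancing goes. See the drawbacks section: http://en.wikipedia.org/wiki/Round_robin_DNS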

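Well, in the end, the behaviour described above only happens within the same subnet. When I use the openssh client from another LAN (through an intermediate gateway), it works as I expected. By that I mean: I get a basic load distribution, with a "failover" when a node is down.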
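A possible explanation for the same-subnet case, which nobody in this thread states and which I have not verified on this setup: glibc's getaddrinfo(3) sorts results according to RFC 3484, and rule 9 of that RFC prefers the destination that shares the longest prefix with the source address. As far as I know, glibc applies that rule to IPv4 only when source and destination are on the same subnet. The sketch below only models the prefix comparison (not the subnet check), and the function names are made up for illustration.

```python
import ipaddress


def common_prefix_len(a, b):
    """Number of leading bits two IPv4 addresses have in common."""
    differing = int(ipaddress.IPv4Address(a)) ^ int(ipaddress.IPv4Address(b))
    return 32 - differing.bit_length()


def sort_by_longest_prefix(source, destinations):
    """Order destinations by longest common prefix with `source` (ties keep DNS order)."""
    return sorted(destinations, key=lambda dest: -common_prefix_len(source, dest))


if __name__ == "__main__":
    client = "10.255.254.2"                       # node1, the ssh client
    records = ["10.255.254.1", "10.255.254.3"]    # node0 and node2
    print(common_prefix_len(client, "10.255.254.1"))  # 30
    print(common_prefix_len(client, "10.255.254.3"))  # 31
    print(sort_by_longest_prefix(client, records))    # ['10.255.254.3', '10.255.254.1']
```

Under that assumption node2 (10.255.254.3) always sorts first for a client at 10.255.254.2, whatever order BIND returns, which would match tests 1, 2 and 5.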

So my conclusion: RRDNS is enough to handle a basic load distribution of SSH users.