在长期生活的ruby进程中,DNS查询最终变得一直很慢

我有一个运行在Ubuntu服务器上的Ruby应用程序,这个应用程序经常需要向其他服务发出HTTP请求。 其中一些服务位于内部域,有些可以通过公共DNS访问。 我有一个Consul代理与我的Ruby应用程序共同工作,它的工作是解决内部组件的DNS查询。

最终,Ruby应用程序的速度变慢,并且需要5秒钟才能parsing到外部域的DNS查询。 我需要帮助理解为什么下面描述的行为可能会发生。

为了澄清,我的目标是了解以下行为,我已经有几个修复/我正在考虑的解决方法。

我已经将问题隔离到以下内容:

最小可重复的例子:

在/etc/resolv.conf

nameserver 127.0.0.1 nameserver 8.8.8.8 

我用下面的始终SERVFAIL DNSparsing器代替端口53上的Consul:

 #!/usr/bin/env ruby require "rubydns" RubyDNS.run_server(listen: [[:udp, "0.0.0.0", 53], [:tcp, "0.0.0.0", 53]]) do otherwise do |transaction| transaction.fail!(:ServFail) end end 

我把上面的文件放在一个文件( /var/vcap/data/tmp/dns.rb )然后运行它:

 ruby /var/vcap/data/tmp/dns.rb 

这不是非常重要的,但是上面的是用系统ruby(1.9)来运行的。 然后我运行下面的命令(再次,可能不重要,但是这使用了不同的ruby版本):

 var/vcap/packages/ruby-2.1.6/bin/ruby -r'net/protocol' -e \ '1.step do |i|; t=Time.now; TCPSocket.open("SOME.REAL.DOMAIN",80).close; p Time.now-t; end' 

上面是一个小小的谎言,打印出来的格式有点多了,但是为了使它适合这个post而不需要左右滚动,我简化了一下。

观察到的行为:

我看到的行为是一段时间,循环的每个迭代大约10ms的顺序,但最终它开始采取5000ms的顺序。

慢行为的持续时间

有一次,我修改了/etc/resolv.conf来让options timeout:4然后重新运行这个实验,每次迭代开始需要大约4000ms。 所以5000ms似乎来自5s的默认resolvconf超时。

减速前的迭代次数

它可能需要从几百到几万次的迭代来performance行为,但总是这样。 如果你停止循环,再次运行命令,同样的事情,几百或几千次~10ms的迭代,然后一直~5000ms的迭代。 通常情况下,需要一两次迭代,中间时间为2000-4000毫秒。

长寿命的ruby循环与重复的一次性ruby命令

我相信这与长寿命的ruby进程有关,因为如果我运行一个bash循环,并且在bash循环的每次迭代中它执行一些ruby,我不会看到这个最终的减速。 例如以下不会显示不良行为:

 while true; do time /var/vcap/packages/ruby-VERSION/bin/ruby -r'net/protocol' -e \ 'TCPSocket.open("SOME.REAL.DOMAIN, 80).close'; done 

只有本地主机名称服务器与两个名称服务器

另外,这与多个域名服务器有关。 如果我编辑/etc/resolv.conf只有127.0.0.1名称服务器,并且为ruby迭代添加一个rescue nil (为了防止脚本在第一次迭代时被炸毁),它似乎需要很长时间没有变慢。

debugging信息:

在一次7873个快速迭代(〜10ms)的情况下,我做了上述循环的一个步骤,然后一个迭代耗时3013ms,然后所有后续迭代耗时5000ms以上。 所有快速迭代的strace输出看起来基本相同,从〜5000ms迭代的strace输出看起来基本相同。 所以我在这里将包括上一次快速迭代的输出,3013ms迭代和第一次 ~5000ms迭代。 当然,在strace输出中它是全部连续的,但是我将它分解成单独的块以便于查看。

最后一次快速迭代:9ms

 stat("/etc/resolv.conf", {st_mode=S_IFREG|0644, st_size=85, ...}) = 0 open("/etc/hosts", O_RDONLY|O_CLOEXEC) = 7 fstat(7, {st_mode=S_IFREG|0644, st_size=305, ...}) = 0 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f4e70992000 read(7, "127.0.0.1 localhost af0a3ebe-91a"..., 4096) = 305 read(7, "", 4096) = 0 close(7) = 0 munmap(0x7f4e70992000, 4096) = 0 stat("/etc/resolv.conf", {st_mode=S_IFREG|0644, st_size=85, ...}) = 0 socket(PF_INET, SOCK_DGRAM|SOCK_NONBLOCK, IPPROTO_IP) = 7 connect(7, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("127.0.0.1")}, 16) = 0 poll([{fd=7, events=POLLOUT}], 1, 0) = 1 ([{fd=7, revents=POLLOUT}]) sendmmsg(7, {{{msg_name(0)=NULL, msg_iov(1)=[{"A\5\1\0\0\1\0\0\0\0\0\0\3uaa\6useast\6appfog\2q"..., 45}], msg_controllen=0, msg_flags=MSG_OOB|MSG_DONTROUTE}, 45}, {{msg_name(0)=NULL, msg_iov(1)=[{"YH\1\0\0\1\0\0\0\0\0\0\3uaa\6useast\6appfog\2q"..., 45}], msg_controllen=0, msg_flags=MSG_CONFIRM}, 45}}, 2, MSG_NOSIGNAL) = 2 poll([{fd=7, events=POLLIN}], 1, 5000) = 1 ([{fd=7, revents=POLLIN}]) ioctl(7, FIONREAD, [45]) = 0 recvfrom(7, "A\5\201\2\0\1\0\0\0\0\0\0\3uaa\6useast\6appfog\2q"..., 2048, 0, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("127.0.0.1")}, [16]) = 45 poll([{fd=7, events=POLLIN}], 1, 4999) = 1 ([{fd=7, revents=POLLIN}]) ioctl(7, FIONREAD, [45]) = 0 recvfrom(7, "YH\201\2\0\1\0\0\0\0\0\0\3uaa\6useast\6appfog\2q"..., 2048, 0, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("127.0.0.1")}, [16]) = 45 socket(PF_INET, SOCK_DGRAM|SOCK_NONBLOCK, IPPROTO_IP) = 8 connect(8, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("8.8.8.8")}, 16) = 0 poll([{fd=8, events=POLLOUT}], 1, 0) = 1 ([{fd=8, revents=POLLOUT}]) sendmmsg(8, {{{msg_name(0)=NULL, msg_iov(1)=[{"A\5\1\0\0\1\0\0\0\0\0\0\3uaa\6useast\6appfog\2q"..., 45}], msg_controllen=0, msg_flags=MSG_OOB|MSG_DONTROUTE}, 45}, {{msg_name(0)=NULL, msg_iov(1)=[{"YH\1\0\0\1\0\0\0\0\0\0\3uaa\6useast\6appfog\2q"..., 45}], msg_controllen=0, msg_flags=MSG_CONFIRM}, 45}}, 2, MSG_NOSIGNAL) = 2 poll([{fd=8, events=POLLIN}], 1, 3000) = 1 ([{fd=8, revents=POLLIN}]) ioctl(8, FIONREAD, [61]) = 0 recvfrom(8, "A\5\201\200\0\1\0\1\0\0\0\0\3uaa\6useast\6appfog\2q"..., 2048, 0, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("8.8.8.8")}, [16]) = 61 poll([{fd=8, events=POLLIN}], 1, 2996) = 1 ([{fd=8, revents=POLLIN}]) ioctl(8, FIONREAD, [109]) = 0 recvfrom(8, "YH\201\200\0\1\0\0\0\1\0\0\3uaa\6useast\6appfog\2q"..., 1987, 0, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("8.8.8.8")}, [16]) = 109 close(7) = 0 close(8) = 0 socket(PF_INET, SOCK_STREAM|SOCK_CLOEXEC, IPPROTO_TCP) = 7 fcntl(7, F_GETFD) = 0x1 (flags FD_CLOEXEC) fstat(7, {st_mode=S_IFSOCK|0777, st_size=0, ...}) = 0 fstat(7, {st_mode=S_IFSOCK|0777, st_size=0, ...}) = 0 connect(7, {sa_family=AF_INET, sin_port=htons(80), sin_addr=inet_addr("63.251.167.126")}, 16) = 0 fstat(7, {st_mode=S_IFSOCK|0777, st_size=0, ...}) = 0 fstat(7, {st_mode=S_IFSOCK|0777, st_size=0, ...}) = 0 close(7) = 0 write(1, "7873: 9ms", 9) = 9 write(1, "\n", 1) = 1 

中间迭代:3013ms

 stat("/etc/resolv.conf", {st_mode=S_IFREG|0644, st_size=85, ...}) = 0 open("/etc/hosts", O_RDONLY|O_CLOEXEC) = 7 fstat(7, {st_mode=S_IFREG|0644, st_size=305, ...}) = 0 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f4e70992000 read(7, "127.0.0.1 localhost af0a3ebe-91a"..., 4096) = 305 read(7, "", 4096) = 0 close(7) = 0 munmap(0x7f4e70992000, 4096) = 0 stat("/etc/resolv.conf", {st_mode=S_IFREG|0644, st_size=85, ...}) = 0 socket(PF_INET, SOCK_DGRAM|SOCK_NONBLOCK, IPPROTO_IP) = 7 connect(7, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("127.0.0.1")}, 16) = 0 poll([{fd=7, events=POLLOUT}], 1, 0) = 1 ([{fd=7, revents=POLLOUT}]) sendmmsg(7, {{{msg_name(0)=NULL, msg_iov(1)=[{"u>\1\0\0\1\0\0\0\0\0\0\3uaa\6useast\6appfog\2q"..., 45}], msg_controllen=0, msg_flags=MSG_OOB|MSG_DONTROUTE}, 45}, {{msg_name(0)=NULL, msg_iov(1)=[{"e?\1\0\0\1\0\0\0\0\0\0\3uaa\6useast\6appfog\2q"..., 45}], msg_controllen=0, msg_flags=MSG_CONFIRM}, 45}}, 2, MSG_NOSIGNAL) = 2 poll([{fd=7, events=POLLIN}], 1, 5000) = 1 ([{fd=7, revents=POLLIN}]) ioctl(7, FIONREAD, [45]) = 0 recvfrom(7, "u>\201\2\0\1\0\0\0\0\0\0\3uaa\6useast\6appfog\2q"..., 2048, 0, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("127.0.0.1")}, [16]) = 45 poll([{fd=7, events=POLLIN}], 1, 4999) = 1 ([{fd=7, revents=POLLIN}]) ioctl(7, FIONREAD, [45]) = 0 recvfrom(7, "e?\201\2\0\1\0\0\0\0\0\0\3uaa\6useast\6appfog\2q"..., 2048, 0, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("127.0.0.1")}, [16]) = 45 socket(PF_INET, SOCK_DGRAM|SOCK_NONBLOCK, IPPROTO_IP) = 8 connect(8, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("8.8.8.8")}, 16) = 0 poll([{fd=8, events=POLLOUT}], 1, 0) = 1 ([{fd=8, revents=POLLOUT}]) sendmmsg(8, {{{msg_name(0)=NULL, msg_iov(1)=[{"u>\1\0\0\1\0\0\0\0\0\0\3uaa\6useast\6appfog\2q"..., 45}], msg_controllen=0, msg_flags=MSG_OOB|MSG_DONTROUTE}, 45}, {{msg_name(0)=NULL, msg_iov(1)=[{"e?\1\0\0\1\0\0\0\0\0\0\3uaa\6useast\6appfog\2q"..., 45}], msg_controllen=0, msg_flags=MSG_CONFIRM}, 45}}, 2, MSG_NOSIGNAL) = 2 poll([{fd=8, events=POLLIN}], 1, 3000) = 1 ([{fd=8, revents=POLLIN}]) ioctl(8, FIONREAD, [109]) = 0 recvfrom(8, "e?\201\200\0\1\0\0\0\1\0\0\3uaa\6useast\6appfog\2q"..., 2048, 0, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("8.8.8.8")}, [16]) = 109 poll([{fd=8, events=POLLIN}], 1, 2997) = 0 (Timeout) poll([{fd=8, events=POLLOUT}], 1, 0) = 1 ([{fd=8, revents=POLLOUT}]) sendto(8, "u>\1\0\0\1\0\0\0\0\0\0\3uaa\6useast\6appfog\2q"..., 45, MSG_NOSIGNAL, NULL, 0) = 45 poll([{fd=8, events=POLLIN}], 1, 3000) = 1 ([{fd=8, revents=POLLIN}]) ioctl(8, FIONREAD, [61]) = 0 recvfrom(8, "u>\201\200\0\1\0\1\0\0\0\0\3uaa\6useast\6appfog\2q"..., 2048, 0, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("8.8.8.8")}, [16]) = 61 poll([{fd=8, events=POLLOUT}], 1, 2997) = 1 ([{fd=8, revents=POLLOUT}]) sendto(8, "e?\1\0\0\1\0\0\0\0\0\0\3uaa\6useast\6appfog\2q"..., 45, MSG_NOSIGNAL, NULL, 0) = 45 poll([{fd=8, events=POLLIN}], 1, 2997) = 1 ([{fd=8, revents=POLLIN}]) ioctl(8, FIONREAD, [109]) = 0 recvfrom(8, "e?\201\200\0\1\0\0\0\1\0\0\3uaa\6useast\6appfog\2q"..., 1987, 0, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("8.8.8.8")}, [16]) = 109 close(7) = 0 close(8) = 0 socket(PF_INET, SOCK_STREAM|SOCK_CLOEXEC, IPPROTO_TCP) = 7 fcntl(7, F_GETFD) = 0x1 (flags FD_CLOEXEC) fstat(7, {st_mode=S_IFSOCK|0777, st_size=0, ...}) = 0 fstat(7, {st_mode=S_IFSOCK|0777, st_size=0, ...}) = 0 connect(7, {sa_family=AF_INET, sin_port=htons(80), sin_addr=inet_addr("63.251.167.126")}, 16) = 0 fstat(7, {st_mode=S_IFSOCK|0777, st_size=0, ...}) = 0 fstat(7, {st_mode=S_IFSOCK|0777, st_size=0, ...}) = 0 close(7) = 0 write(1, "7874: 3013ms", 12) = 12 write(1, "\n", 1) = 1 

第一次慢迭代:5014ms

 stat("/etc/resolv.conf", {st_mode=S_IFREG|0644, st_size=85, ...}) = 0 open("/etc/hosts", O_RDONLY|O_CLOEXEC) = 7 fstat(7, {st_mode=S_IFREG|0644, st_size=305, ...}) = 0 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f4e70992000 read(7, "127.0.0.1 localhost af0a3ebe-91a"..., 4096) = 305 read(7, "", 4096) = 0 close(7) = 0 munmap(0x7f4e70992000, 4096) = 0 stat("/etc/resolv.conf", {st_mode=S_IFREG|0644, st_size=85, ...}) = 0 socket(PF_INET, SOCK_DGRAM|SOCK_NONBLOCK, IPPROTO_IP) = 7 connect(7, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("127.0.0.1")}, 16) = 0 poll([{fd=7, events=POLLOUT}], 1, 0) = 1 ([{fd=7, revents=POLLOUT}]) sendto(7, "\r\337\1\0\0\1\0\0\0\0\0\0\3uaa\6useast\6appfog\2q"..., 45, MSG_NOSIGNAL, NULL, 0) = 45 poll([{fd=7, events=POLLIN}], 1, 5000) = 1 ([{fd=7, revents=POLLIN}]) ioctl(7, FIONREAD, [45]) = 0 recvfrom(7, "\r\337\201\2\0\1\0\0\0\0\0\0\3uaa\6useast\6appfog\2q"..., 2048, 0, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("127.0.0.1")}, [16]) = 45 poll([{fd=7, events=POLLIN}], 1, 4995) = 0 (Timeout) socket(PF_INET, SOCK_DGRAM|SOCK_NONBLOCK, IPPROTO_IP) = 8 connect(8, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("8.8.8.8")}, 16) = 0 poll([{fd=8, events=POLLOUT}], 1, 0) = 1 ([{fd=8, revents=POLLOUT}]) sendto(8, "\r\337\1\0\0\1\0\0\0\0\0\0\3uaa\6useast\6appfog\2q"..., 45, MSG_NOSIGNAL, NULL, 0) = 45 poll([{fd=8, events=POLLIN}], 1, 3000) = 1 ([{fd=8, revents=POLLIN}]) ioctl(8, FIONREAD, [61]) = 0 recvfrom(8, "\r\337\201\200\0\1\0\1\0\0\0\0\3uaa\6useast\6appfog\2q"..., 2048, 0, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("8.8.8.8")}, [16]) = 61 poll([{fd=8, events=POLLOUT}], 1, 2996) = 1 ([{fd=8, revents=POLLOUT}]) sendto(8, "}\217\1\0\0\1\0\0\0\0\0\0\3uaa\6useast\6appfog\2q"..., 45, MSG_NOSIGNAL, NULL, 0) = 45 poll([{fd=8, events=POLLIN}], 1, 2996) = 1 ([{fd=8, revents=POLLIN}]) ioctl(8, FIONREAD, [109]) = 0 recvfrom(8, "}\217\201\200\0\1\0\0\0\1\0\0\3uaa\6useast\6appfog\2q"..., 1987, 0, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("8.8.8.8")}, [16]) = 109 close(7) = 0 close(8) = 0 socket(PF_INET, SOCK_STREAM|SOCK_CLOEXEC, IPPROTO_TCP) = 7 fcntl(7, F_GETFD) = 0x1 (flags FD_CLOEXEC) fstat(7, {st_mode=S_IFSOCK|0777, st_size=0, ...}) = 0 fstat(7, {st_mode=S_IFSOCK|0777, st_size=0, ...}) = 0 connect(7, {sa_family=AF_INET, sin_port=htons(80), sin_addr=inet_addr("63.251.167.126")}, 16) = 0 fstat(7, {st_mode=S_IFSOCK|0777, st_size=0, ...}) = 0 fstat(7, {st_mode=S_IFSOCK|0777, st_size=0, ...}) = 0 close(7) = 0 write(1, "7875: 5014ms", 12) = 12 write(1, "\n", 1) = 1 

补充意见:

上面的几个奇怪的东西。 请注意,〜3000ms的迭代超时轮询文件描述符8,它用于与8.8.8.8名称服务器交谈,而〜5000ms迭代(以及所有后续的迭代)超时轮询文件描述符7,该文件描述符用于127.0.0.1域名服务器。 还要注意,所有快速迭代和中间迭代都使用sendmmsg来通过套接字进行通信,但所有sendmmsg迭代都使用sendto

也许我理解这个问题(我希望)。 我认为你可以使用一个自定义的绑定DNSconfiguration来修复这个问题,使用一个私有的“假”区域和Internet区域,并修复一个合适的TTL,以便使用cachingDNS引擎获得更快的响应速度。 希望能帮助到你! 问候。

只需在本地主机上configuration绑定。 您必须为.consul区域(默认端口为8600)添加转发区域,请参阅: https ://www.consul.io/docs/agent/dns.html

确保您的服务器可以访问Internet(udp和tcp端口53和icmp)。

然后你可以安全地删除/etc/resolv.conf中的8.8.8.8部分。