找不到NGINX / Puma超时110:连接超时

每隔一段时间,我的Rails服务器停止响应。 我尾巴的日志,没有什么 – 只是一堆请求已经开始后,最后一个完成…

运用

  • Ubuntu的
  • 彪马2.15.3
  • Ruby 2.2.1p85
  • NGINX 1.4.6
  • 主pipe(运行/ respawning导轨服务器&sidekiq)

Rails日志…

Completed 200 OK in 179ms (Views: 6.2ms | ActiveRecord: 165.7ms) Started OPTIONS "/api/v2/users/me" for 209.217.218.34 at 2015-12-16 21:43:47 +0000 Started OPTIONS "/api/v2/users/me" for 209.217.218.34 at 2015-12-16 21:44:21 +0000 Started GET "/" for 209.217.218.34 at 2015-12-16 21:45:11 +0000 Started GET "/" for 209.217.218.34 at 2015-12-16 21:45:27 +0000 Started GET "/" for 209.217.218.34 at 2015-12-16 21:48:13 +0000 Started OPTIONS "/api/v2/users/me" for 209.217.218.34 at 2015-12-16 21:49:19 +0000 Started GET "/" for 209.217.218.34 at 2015-12-16 22:11:58 +0000 

NGINX日志显示从读取响应头开始发生的明显的超时

 2015/12/16 21:48:43 [error] 1129#0: *1777 upstream timed out (110: Connection timed out) while reading response header from upstream, client: 209.217.218.34, server: example.com, request: "GET / HTTP/1.1", upstream: "http://unix:/example_site/shared/tmp/sockets/puma.staging.sock/", host: "example.com" 2015/12/16 21:49:49 [error] 1129#0: *1783 upstream timed out (110: Connection timed out) while reading response header from upstream, client: 209.217.218.34, server: example.com, request: "OPTIONS /api/v2/users/me HTTP/1.1", upstream: "http://unix:/example_site/shared/tmp/sockets/puma.staging.sock/api/v2/users/me", host: "example.com", referrer: "http://example.com/" 2015/12/16 22:12:28 [error] 1129#0: *1785 upstream timed out (110: Connection timed out) while reading response header from upstream, client: 209.217.218.34, server: example.com, request: "GET / HTTP/1.1", upstream: "http://unix:/example_site/shared/tmp/sockets/puma.staging.sock/", host: "example.com" 

NGINX服务器代理传递给一个有两个worker和8个线程的puma服务器。

我在其中一个线程上运行了strace命令。 看到这个…

 [pid 32463] futex(0x7f0314c27a78, FUTEX_WAKE_PRIVATE, 1) = 0 <0.000042> [pid 32463] clock_gettime(CLOCK_MONOTONIC, {454145, 535389460}) = 0 <0.000034> [pid 32463] write(16, "p32428\n", 7) = 7 <0.000036> [pid 32463] clock_gettime(CLOCK_MONOTONIC, {454145, 535541983}) = 0 <0.000013> [pid 32463] clock_gettime(CLOCK_MONOTONIC, {454145, 535578664}) = 0 <0.000012> [pid 32463] futex(0x7f0314c27a04, FUTEX_WAIT_BITSET_PRIVATE, 5195, {454150, 535578664}, ffffffff <unfinished ...> [pid 32471] <... futex resumed> ) = -1 ETIMEDOUT (Connection timed out) <1.000163> [pid 32471] futex(0x7f03148dc858, FUTEX_WAKE_PRIVATE, 1) = 0 <0.000010> [pid 32471] clock_gettime(CLOCK_MONOTONIC, {454146, 119592840}) = 0 <0.000005> [pid 32471] clock_gettime(CLOCK_MONOTONIC, {454146, 119637474}) = 0 <0.000006> [pid 32471] clock_gettime(CLOCK_MONOTONIC, {454146, 119655894}) = 0 <0.000005> [pid 32471] futex(0x7f03148dc7e4, FUTEX_WAIT_BITSET_PRIVATE, 25969, {454147, 119655894}, ffffffff <unfinished ...> [pid 32470] <... futex resumed> ) = -1 ETIMEDOUT (Connection timed out) <1.000142> [pid 32470] futex(0x7f0314dc94d8, FUTEX_WAKE_PRIVATE, 1) = 0 <0.000013> [pid 32470] clock_gettime(CLOCK_MONOTONIC, {454146, 531287803}) = 0 <0.000006> [pid 32470] clock_gettime(CLOCK_MONOTONIC, {454146, 531415055}) = 0 <0.000049> [pid 32470] clock_gettime(CLOCK_MONOTONIC, {454146, 531502589}) = 0 <0.000037> [pid 32470] futex(0x7f0314dc9464, FUTEX_WAIT_BITSET_PRIVATE, 25969, {454147, 531502589}, ffffffff <unfinished ...> [pid 32471] <... futex resumed> ) = -1 ETIMEDOUT (Connection timed out) <1.000166> [pid 32471] futex(0x7f03148dc858, FUTEX_WAKE_PRIVATE, 1) = 0 <0.000009> [pid 32471] clock_gettime(CLOCK_MONOTONIC, {454147, 119925263}) = 0 <0.000005> [pid 32471] clock_gettime(CLOCK_MONOTONIC, {454147, 119965172}) = 0 <0.000014> [pid 32471] clock_gettime(CLOCK_MONOTONIC, {454147, 119996627}) = 0 <0.000005> [pid 32471] futex(0x7f03148dc7e4, FUTEX_WAIT_BITSET_PRIVATE, 25971, {454148, 119996627}, ffffffff <unfinished ...> [pid 32470] <... futex resumed> ) = -1 ETIMEDOUT (Connection timed out) <1.000142> [pid 32470] futex(0x7f0314dc94d8, FUTEX_WAKE_PRIVATE, 1) = 0 <0.000046> [pid 32470] clock_gettime(CLOCK_MONOTONIC, {454147, 531936925}) = 0 <0.000033> [pid 32470] clock_gettime(CLOCK_MONOTONIC, {454147, 532077784}) = 0 <0.000042> [pid 32470] clock_gettime(CLOCK_MONOTONIC, {454147, 532160473}) = 0 <0.000036> [pid 32470] futex(0x7f0314dc9464, FUTEX_WAIT_BITSET_PRIVATE, 25971, {454148, 532160473}, ffffffff <unfinished ...> [pid 32471] <... futex resumed> ) = -1 ETIMEDOUT (Connection timed out) <1.000143> [pid 32471] futex(0x7f03148dc858, FUTEX_WAKE_PRIVATE, 1) = 0 <0.000010> [pid 32471] clock_gettime(CLOCK_MONOTONIC, {454148, 120250641}) = 0 <0.000004> [pid 32471] clock_gettime(CLOCK_MONOTONIC, {454148, 120296848}) = 0 <0.000014> [pid 32471] clock_gettime(CLOCK_MONOTONIC, {454148, 120329244}) = 0 <0.000005> [pid 32471] futex(0x7f03148dc7e4, FUTEX_WAIT_BITSET_PRIVATE, 25973, {454149, 120329244}, ffffffff <unfinished ...> [pid 32470] <... futex resumed> ) = -1 ETIMEDOUT (Connection timed out) <1.000130> [pid 32470] futex(0x7f0314dc94d8, FUTEX_WAKE_PRIVATE, 1) = 0 <0.000010> [pid 32470] clock_gettime(CLOCK_MONOTONIC, {454148, 532433899}) = 0 <0.000005> [pid 32470] clock_gettime(CLOCK_MONOTONIC, {454148, 532527808}) = 0 <0.000015> [pid 32470] clock_gettime(CLOCK_MONOTONIC, {454148, 532562815}) = 0 <0.000006> [pid 32470] futex(0x7f0314dc9464, FUTEX_WAIT_BITSET_PRIVATE, 25973, {454149, 532562815}, ffffffff <unfinished ...> [pid 32471] <... futex resumed> ) = -1 ETIMEDOUT (Connection timed out) <1.000163> [pid 32471] futex(0x7f03148dc858, FUTEX_WAKE_PRIVATE, 1) = 0 <0.000013> [pid 32471] clock_gettime(CLOCK_MONOTONIC, {454149, 120641924}) = 0 <0.000006> [pid 32471] clock_gettime(CLOCK_MONOTONIC, {454149, 120692457}) = 0 <0.000010> [pid 32471] clock_gettime(CLOCK_MONOTONIC, {454149, 120715154}) = 0 <0.000005> [pid 32471] futex(0x7f03148dc7e4, FUTEX_WAIT_BITSET_PRIVATE, 25975, {454150, 120715154}, ffffffff <unfinished ...> [pid 32470] <... futex resumed> ) = -1 ETIMEDOUT (Connection timed out) <1.000141> [pid 32470] futex(0x7f0314dc94d8, FUTEX_WAKE_PRIVATE, 1) = 0 <0.000009> 

nginx.conf

 user ubuntu; worker_processes auto; pid /run/nginx.pid; env PATH; events { worker_connections 768; } http { sendfile on; tcp_nopush on; tcp_nodelay on; keepalive_timeout 70; types_hash_max_size 2048; ssl_session_cache shared:SSL:10m; ssl_session_timeout 10m; include /etc/nginx/mime.types; default_type application/octet-stream; access_log /var/log/nginx/access.log; error_log /var/log/nginx/error.log; gzip on; gzip_disable "msie6"; include /etc/nginx/conf.d/*.conf; include /etc/nginx/sites-enabled/*; } 

site.conf

 upstream app { server unix:/example_site/shared/tmp/sockets/puma.production.sock fail_timeout=0; } server { listen 80; listen 443 ssl; sendfile off; server_name example.com; ssl_certificate /etc/nginx/ssl/example.com.crt; ssl_certificate_key /etc/nginx/ssl/example.com.key; ssl_protocols TLSv1 TLSv1.1 TLSv1.2; # don't use SSLv3 ref: POODLE root /example_site/current/public; try_files $uri/index.html $uri @app; location @app { proxy_set_header X-Forwarded-Proto $scheme; proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for; proxy_set_header Host $http_host; proxy_read_timeout 30; proxy_connect_timeout 60; proxy_redirect off; proxy_pass http://app; } error_page 500 502 503 504 /500.html; client_max_body_size 4G; keepalive_timeout 10; } 

我是使用strace的令人难以置信的新手,我不确定我在这里看到什么,如果这对解决问题甚至是有价值的。

任何人有什么想法,还有什么我可以做什么来弄清楚什么是超时正在发生?

你看到的nginx错误是由于nginxclosures连接,因为它花了太长时间,或者上游服务器不接受连接,或者你的代理没有转发你的上游服务器连接所需的某些标题。

没有看到你的代理块和相关的nginx设置这里有一些快速的事情来检查。

快速的事情你可以做:

  • 在/etc/nginx/nginx.conf中仔细检查keepalive_timeout。
  • 为您的上游服务器添加一个主机文件入口,确保每个连接完成并且不超时。
  • 确保您将nginx代理上正确的标题转发回上游服务器。

这里是指向nginx的代理传递指令的链接: Nginx http_proxy_module

更新:

尝试运行nginx -t ,看看它是否告诉你configuration有什么问题,因为现在突出的一点是keepalive_timeout尝试改变你的keepalive_timeout 70; keepalive_timeout 70s;

如果你的nginx代理和你的puma服务器在不同的设备上,你真的想要使用tcpdump进行一些数据包捕获,然后在wireshark中检查它们以确定发生了什么。 如果nginx没有像One Connect那样做连接池,那么你可能会遇到只有2个并发用户的问题 – 但是你真的只能用这个configuration最多支持4个 。 一个tcpdump将通过允许你计算3路握手的次数并看到第9个复位来确认问题。

长话短说 – 尝试提高工人和线程的数量。