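Several times a day our nginx (1.1.19 on Ubuntu 12.04 LTS) stalls for a few seconds (the longest stall so far was 53 seconds) before it serves any data. Clients get no errors; the requests simply take longer. This affects every request made during that window (CGI or the status module), and all of them are served immediately once the stall ends.
I have a screen session on the server that curls the status page once per second: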
{"time":"2016-09-02T10:10:21+02:00","host":"app1","data":"nginx","reading":1,"writing":4,"waiting":0}
{"time":"2016-09-02T10:10:22+02:00","host":"app1","data":"nginx","reading":1,"writing":4,"waiting":0}
{"time":"2016-09-02T10:10:50+02:00","host":"app1","data":"nginx","reading":3,"writing":9,"waiting":0}
{"time":"2016-09-02T10:11:43+02:00","host":"app1","data":"nginx","reading":5,"writing":98,"waiting":0}
{"time":"2016-09-02T10:11:44+02:00","host":"app1","data":"nginx","reading":0,"writing":25,"waiting":0}
{"time":"2016-09-02T10:11:45+02:00","host":"app1","data":"nginx","reading":3,"writing":7,"waiting":0}
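A minimal sketch of how gaps like these can be picked out of the polling output, assuming one JSON object per line with an ISO 8601 `time` field as shown above. The function name `find_gaps` and the 2-second threshold are illustrative choices, not part of the actual setup.

```python
import json
from datetime import datetime


def find_gaps(lines, threshold_seconds=2.0):
    """Return (previous_time, current_time, gap_seconds) for every pair of
    consecutive samples that are further apart than threshold_seconds."""
    gaps = []
    previous = None
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        # fromisoformat accepts offsets such as +02:00 (Python 3.7+)
        current = datetime.fromisoformat(record["time"])
        if previous is not None:
            delta = (current - previous).total_seconds()
            if delta > threshold_seconds:
                gaps.append((previous.isoformat(), current.isoformat(), delta))
        previous = current
    return gaps


# Three samples taken from the status output above
sample = [
    '{"time":"2016-09-02T10:10:21+02:00","host":"app1","data":"nginx","reading":1,"writing":4,"waiting":0}',
    '{"time":"2016-09-02T10:10:22+02:00","host":"app1","data":"nginx","reading":1,"writing":4,"waiting":0}',
    '{"time":"2016-09-02T10:10:50+02:00","host":"app1","data":"nginx","reading":3,"writing":9,"waiting":0}',
]

for start, end, seconds in find_gaps(sample):
    print(f"{start} -> {end}: {seconds:.0f}s")
```

The threshold is set above the 1-second polling interval so that an occasional skipped second is not reported as a stall.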
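The gaps are not caused by errors; the requests just take longer than normal as seen from the outside. No failed requests appear in the access log, but the gap is visible there: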
127.0.0.1 - - [02/Sep/2016:10:10:17 +0200] "GET /basic_status HTTP/1.1" 200 121 "-" "curl/7.22.0 (x86_64-pc-linux-gnu) libcurl/7.22.0 OpenSSL/1.0.1 zlib/1.2.3.4 libidn/1.23 librtmp/2.3"
127.0.0.1 - - [02/Sep/2016:10:10:18 +0200] "GET /basic_status HTTP/1.1" 200 121 "-" "curl/7.22.0 (x86_64-pc-linux-gnu) libcurl/7.22.0 OpenSSL/1.0.1 zlib/1.2.3.4 libidn/1.23 librtmp/2.3"
127.0.0.1 - - [02/Sep/2016:10:10:20 +0200] "GET /basic_status HTTP/1.1" 200 121 "-" "curl/7.22.0 (x86_64-pc-linux-gnu) libcurl/7.22.0 OpenSSL/1.0.1 zlib/1.2.3.4 libidn/1.23 librtmp/2.3"
127.0.0.1 - - [02/Sep/2016:10:10:21 +0200] "GET /basic_status HTTP/1.1" 200 121 "-" "curl/7.22.0 (x86_64-pc-linux-gnu) libcurl/7.22.0 OpenSSL/1.0.1 zlib/1.2.3.4 libidn/1.23 librtmp/2.3"
127.0.0.1 - - [02/Sep/2016:10:10:22 +0200] "GET /basic_status HTTP/1.1" 200 121 "-" "curl/7.22.0 (x86_64-pc-linux-gnu) libcurl/7.22.0 OpenSSL/1.0.1 zlib/1.2.3.4 libidn/1.23 librtmp/2.3"
I checked the nginx error log, the FPM log, and so on, but none of them show errors at the time of the stall.
user www-data;
worker_processes 4;
pid /var/run/nginx.pid;

events {
    worker_connections 768;
}

http {
    sendfile on;
    tcp_nopush on;
    tcp_nodelay on;
    keepalive_timeout 65;
    types_hash_max_size 2048;
    include /etc/nginx/mime.types;
    default_type application/octet-stream;
    large_client_header_buffers 4 80k;
    access_log /var/log/nginx/access.log;
    error_log /var/log/nginx/error.log;
    gzip on;
    gzip_disable "msie6";
    include /etc/nginx/conf.d/*.conf;
    include /etc/nginx/sites-enabled/*;
}

server {
    listen 80;
    server_name app1;
    access_log /var/log/nginx/access.log;
    error_log /var/log/nginx/error.log;
    root /var/www;

    location /basic_status {
        stub_status on;
    }
}
I also logged the number of TIME_WAIT quadruples, but the numbers are not high:
{"time":"2016-09-02T10:10:54+02:00","host":"app1","data":"time_wait","httpFromLB":1153,"httpFromLocal":604,"mysqlToDb":1988,"memcacheToLocal":250,"cgiToLocal":1527}
{"time":"2016-09-02T10:10:55+02:00","host":"app1","data":"time_wait","httpFromLB":1153,"httpFromLocal":604,"mysqlToDb":1991,"memcacheToLocal":251,"cgiToLocal":1527}
{"time":"2016-09-02T10:10:56+02:00","host":"app1","data":"time_wait","httpFromLB":1153,"httpFromLocal":604,"mysqlToDb":1992,"memcacheToLocal":252,"cgiToLocal":1527}
{"time":"2016-09-02T10:10:57+02:00","host":"app1","data":"time_wait","httpFromLB":902,"httpFromLocal":496,"mysqlToDb":1628,"memcacheToLocal":213,"cgiToLocal":1236}
{"time":"2016-09-02T10:10:58+02:00","host":"app1","data":"time_wait","httpFromLB":902,"httpFromLocal":496,"mysqlToDb":1629,"memcacheToLocal":214,"cgiToLocal":1236}
{"time":"2016-09-02T10:10:59+02:00","host":"app1","data":"time_wait","httpFromLB":902,"httpFromLocal":496,"mysqlToDb":1631,"memcacheToLocal":215,"cgiToLocal":1236}
{"time":"2016-09-02T10:11:00+02:00","host":"app1","data":"time_wait","httpFromLB":902,"httpFromLocal":496,"mysqlToDb":1632,"memcacheToLocal":216,"cgiToLocal":1236}
{"time":"2016-09-02T10:11:01+02:00","host":"app1","data":"time_wait","httpFromLB":902,"httpFromLocal":496,"mysqlToDb":1633,"memcacheToLocal":217,"cgiToLocal":1236}
{"time":"2016-09-02T10:11:03+02:00","host":"app1","data":"time_wait","httpFromLB":902,"httpFromLocal":496,"mysqlToDb":1636,"memcacheToLocal":218,"cgiToLocal":1236}
{"time":"2016-09-02T10:11:04+02:00","host":"app1","data":"time_wait","httpFromLB":902,"httpFromLocal":496,"mysqlToDb":1637,"memcacheToLocal":219,"cgiToLocal":1236}
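I have ruled out the application itself: its first line of code logs a timestamp, and that timestamp is the moment the stall ends, so the delay happens before the application runs.
I don't know where to investigate further. Any ideas?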