nginx reverse proxy greatly increases worst-case latency

(Edit: partially understood and resolved, see comments)

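I have a setup with nginx as a reverse proxy in front of a CherryPy application server. I used ab to compare performance with and without nginx in front, and noticed that the worst-case performance through nginx is much worse: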

    $ ab -n 200 -c 10 'http://localhost/noop'
    This is ApacheBench, Version 2.3 <$Revision: 655654 $>
    Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
    Licensed to The Apache Software Foundation, http://www.apache.org/

    Benchmarking localhost (be patient)
    Completed 100 requests
    Completed 200 requests
    Finished 200 requests

    Server Software:        nginx
    Server Hostname:        localhost
    Server Port:            80

    Document Path:          /noop
    Document Length:        0 bytes

    Concurrency Level:      10
    Time taken for tests:   3.145 seconds
    Complete requests:      200
    Failed requests:        0
    Write errors:           0
    Total transferred:      29600 bytes
    HTML transferred:       0 bytes
    Requests per second:    63.60 [#/sec] (mean)
    Time per request:       157.243 [ms] (mean)
    Time per request:       15.724 [ms] (mean, across all concurrent requests)
    Transfer rate:          9.19 [Kbytes/sec] received

    Connection Times (ms)
                  min  mean[+/-sd] median   max
    Connect:        0    0    0.1      0      1
    Processing:     5   48  211.7     31   3007
    Waiting:        5   48  211.7     31   3007
    Total:          5   48  211.7     31   3007

    Percentage of the requests served within a certain time (ms)
      50%     31
      66%     36
      75%     39
      80%     41
      90%     46
      95%     51
      98%     77
      99%    252
     100%   3007 (longest request)

    $ ab -n 200 -c 10 'http://localhost:8080/noop'
    This is ApacheBench, Version 2.3 <$Revision: 655654 $>
    Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
    Licensed to The Apache Software Foundation, http://www.apache.org/

    Benchmarking localhost (be patient)
    Completed 100 requests
    Completed 200 requests
    Finished 200 requests

    Server Software:        CherryPy/3.2.0
    Server Hostname:        localhost
    Server Port:            8080

    Document Path:          /noop
    Document Length:        0 bytes

    Concurrency Level:      10
    Time taken for tests:   0.564 seconds
    Complete requests:      200
    Failed requests:        0
    Write errors:           0
    Total transferred:      27600 bytes
    HTML transferred:       0 bytes
    Requests per second:    354.58 [#/sec] (mean)
    Time per request:       28.202 [ms] (mean)
    Time per request:       2.820 [ms] (mean, across all concurrent requests)
    Transfer rate:          47.79 [Kbytes/sec] received

    Connection Times (ms)
                  min  mean[+/-sd] median   max
    Connect:        0    0    1.7      0     11
    Processing:     6   26   23.5     24    248
    Waiting:        3   25   23.6     23    248
    Total:          6   26   23.4     24    248

    Percentage of the requests served within a certain time (ms)
      50%     24
      66%     27
      75%     29
      80%     31
      90%     34
      95%     40
      98%     51
      99%    234
     100%    248 (longest request)

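What could be causing this? The only thing I can think of is that nginx sends requests to the backend in a different order than they arrived, but that seems implausible.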

The machine is an EC2 c1.medium instance with 2 cores, CherryPy uses a thread pool of 10 threads, and nginx has worker_connections = 1024.

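Update: two more confusing findings:

At a given concurrency, sending more requests improves performance. With a concurrency of 40 and 40 requests, I get a median time of 3 s and a max of 10.5 s; with a concurrency of 40 and 200 requests, I get a median of 38 ms (!) and a max of 7.5 s. In fact, the total time is less for 200 requests! (6.5 s vs. 7.5 s for 40.) This is all repeatable.
Monitoring both nginx worker processes with strace greatly improves their performance, taking e.g. the median time from 3 s to 77 ms, without noticeably changing their behavior. (I tested with a non-trivial API call and confirmed that strace does not change the response, and that all of these performance observations still hold.) This is also repeatable.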

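The 3-second worst case in your first run looks like packet loss. It is probably the result of some insufficiently configured buffers/resources. Some possible causes, in no particular order: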

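The listen queue on the backend is too small, resulting in occasional listen queue overflows (Linux is usually configured to just drop SYN packets in this case, which makes it indistinguishable from packet loss; see netstat -s | grep listen to find out whether this is the problem).
A stateful firewall on localhost is approaching its limit on the number of states and drops some random SYN packets because of this.
The system is out of sockets/local ports because of sockets in TIME_WAIT state; see this question if you are using Linux.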
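If you would rather watch the listen queue counters from a script than grep netstat output, here is a minimal sketch. It assumes a Linux host where /proc/net/netstat exists and exposes the TcpExt counters ListenOverflows and ListenDrops (as far as I know, these are the counters behind the listen-related lines of netstat -s). The function names and the SAMPLE text are made up for illustration; the sample is trimmed to a few columns and is not real data from the machine in the question.

```python
from pathlib import Path

# Made-up, heavily trimmed sample in the /proc/net/netstat layout:
# each group is a header line of counter names followed by a line of values.
SAMPLE = """\
TcpExt: SyncookiesSent SyncookiesRecv ListenOverflows ListenDrops
TcpExt: 0 0 17 17
IpExt: InNoRoutes InTruncatedPkts
IpExt: 0 0
"""


def parse_proc_netstat(text):
    """Parse /proc/net/netstat-style text into {"Prefix": {"Counter": int}}."""
    lines = [line.split() for line in text.splitlines() if line.strip()]
    counters = {}
    # Lines come in pairs: names first, then the matching values.
    for names, values in zip(lines[0::2], lines[1::2]):
        prefix = names[0].rstrip(":")
        counters[prefix] = dict(zip(names[1:], map(int, values[1:])))
    return counters


def listen_queue_counters(text):
    """Return only the two counters that hint at listen queue overflow."""
    tcp_ext = parse_proc_netstat(text).get("TcpExt", {})
    return {name: tcp_ext.get(name, 0)
            for name in ("ListenOverflows", "ListenDrops")}


if __name__ == "__main__":
    path = Path("/proc/net/netstat")
    # Fall back to the made-up sample on systems without this file.
    text = path.read_text() if path.exists() else SAMPLE
    print(listen_queue_counters(text))
```

Run it once before and once after an ab run; if the numbers grow during the benchmark, the backend's listen queue is a likely suspect.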

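You have to examine your OS carefully to find out the cause and configure it accordingly. You may also want to follow a network subsystem tuning guide for your OS. Note that EC2 might be a bit special here, as there have been reports of very limited network performance on EC2 instances.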

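From nginx's point of view, any solution would be more or less wrong (the problem is not in nginx, but in an OS that cannot cope with the load and drops packets). Nevertheless, you can try some tricks to reduce the load on the OS network subsystem: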

Configure keepalive connections to the backend.
Configure the backend to listen on a unix domain socket (if your backend supports it), and configure nginx to proxy requests to it.
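To illustrate what "listening on a unix domain socket" means in practice, here is a minimal sketch using only the Python standard library. It is not CherryPy or nginx configuration (both have their own settings for this, which I am not reproducing here); NoopHandler, UnixHTTPConnection, request_over_unix_socket and the backend.sock path are hypothetical names chosen for the example, and it assumes a platform with AF_UNIX support such as Linux.

```python
import http.client
import os
import socket
import socketserver
import tempfile
import threading
from http.server import BaseHTTPRequestHandler


class NoopHandler(BaseHTTPRequestHandler):
    """Stand-in backend: answers every GET with an empty 200 response."""

    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Length", "0")
        self.end_headers()

    def log_message(self, format, *args):
        pass  # unix socket peers have no host/port to log, so stay quiet


class UnixHTTPConnection(http.client.HTTPConnection):
    """HTTP client that connects to a socket file instead of host:port."""

    def __init__(self, socket_path, timeout=5):
        super().__init__("localhost", timeout=timeout)
        self.socket_path = socket_path

    def connect(self):
        sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        sock.settimeout(self.timeout)
        sock.connect(self.socket_path)
        self.sock = sock


def request_over_unix_socket():
    """Start the stand-in backend on a socket file and send one request."""
    socket_path = os.path.join(tempfile.mkdtemp(), "backend.sock")
    server = socketserver.UnixStreamServer(socket_path, NoopHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        conn = UnixHTTPConnection(socket_path)
        conn.request("GET", "/noop")
        response = conn.getresponse()
        response.read()
        status = response.status
        conn.close()
    finally:
        server.shutdown()
        server.server_close()
        os.unlink(socket_path)
    return status


if __name__ == "__main__":
    print(request_over_unix_socket())
```

The point of the design is that no TCP connection is involved between proxy and backend, so SYN drops, local port exhaustion and TIME_WAIT sockets cannot affect that hop.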

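NGINX uses HTTP/1.0 for backend connections and has no keepalive by default (see the link in Maxim's post for backend keepalive), so a new backend connection is made for each request, which adds some latency. You should probably also have more worker processes: 2 * the number of CPU cores, with a minimum of 5. If you have more than 10 concurrent requests, you might also need more threads in CherryPy.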
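To get a feel for what a new connection per request costs compared with a kept-alive one, here is a rough sketch using only the Python standard library. It does not involve nginx or CherryPy at all: NoopHandler is a stand-in backend, and all function names are made up for the example. The timings it prints will vary from machine to machine, so treat them as an illustration and not a benchmark.

```python
import http.client
import threading
import time
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class NoopHandler(BaseHTTPRequestHandler):
    """Stand-in backend: answers every GET with an empty 200 response."""

    protocol_version = "HTTP/1.1"  # needed for persistent connections

    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Length", "0")
        self.end_headers()

    def log_message(self, format, *args):
        pass  # keep the output quiet


def fetch_new_connection_each_time(host, port, n):
    """One TCP connection per request, like a proxy without keepalive."""
    statuses = []
    for _ in range(n):
        conn = http.client.HTTPConnection(host, port, timeout=5)
        conn.request("GET", "/noop", headers={"Connection": "close"})
        response = conn.getresponse()
        response.read()
        statuses.append(response.status)
        conn.close()
    return statuses


def fetch_keepalive(host, port, n):
    """All requests reuse a single TCP connection."""
    conn = http.client.HTTPConnection(host, port, timeout=5)
    statuses = []
    for _ in range(n):
        conn.request("GET", "/noop")
        response = conn.getresponse()
        response.read()  # drain the body so the connection can be reused
        statuses.append(response.status)
    conn.close()
    return statuses


def run_demo(n=50):
    """Return (statuses_new, statuses_keepalive, secs_new, secs_keepalive)."""
    server = ThreadingHTTPServer(("127.0.0.1", 0), NoopHandler)  # port 0: any free port
    host, port = server.server_address
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        t0 = time.perf_counter()
        new_statuses = fetch_new_connection_each_time(host, port, n)
        t1 = time.perf_counter()
        keepalive_statuses = fetch_keepalive(host, port, n)
        t2 = time.perf_counter()
    finally:
        server.shutdown()
        server.server_close()
    return new_statuses, keepalive_statuses, t1 - t0, t2 - t1


if __name__ == "__main__":
    _, _, secs_new, secs_keepalive = run_demo()
    print(f"new connection per request: {secs_new:.4f} s")
    print(f"single keepalive connection: {secs_keepalive:.4f} s")
```

On loopback the difference is usually small; the per-request variant matters more when a connection attempt can hit a dropped SYN, because every request then gets a chance to stall on a retransmit.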