
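NGINX times out after 200+ concurrent connections

This is my nginx.conf (I have updated the configuration to make sure that neither PHP nor any other bottleneck is involved):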

    user nginx;
    worker_processes 4;
    worker_rlimit_nofile 10240;
    pid /var/run/nginx.pid;
    events {
        worker_connections 1024;
    }
    http {
        include /etc/nginx/mime.types;
        error_log /var/www/log/nginx_errors.log warn;
        port_in_redirect off;
        server_tokens off;
        sendfile on;
        gzip on;
        client_max_body_size 200M;
        map $scheme $php_https {
            default off;
            https on;
        }
        index index.php;
        client_body_timeout 60;
        client_header_timeout 60;
        keepalive_timeout 60 60;
        send_timeout 60;
        server {
            server_name dev.anuary.com;
            root "/var/www/virtualhosts/dev.anuary.com";
        }
    }

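I am using http://blitz.io/play to test my server (I bought the 10,000 concurrent connections plan). In a 30-second run, I get 964 hits and 5,587 timeouts. The first timeout occurred 40.77 seconds into the test, when the number of concurrent users was 200.

During the test, the server load was (top output):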

      PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
    20225 nginx     20   0 48140 6248 1672 S 16.0  0.0   0:21.68 nginx
        1 root      20   0 19112 1444 1180 S  0.0  0.0   0:02.37 init
        2 root      20   0     0    0    0 S  0.0  0.0   0:00.00 kthreadd
        3 root      RT   0     0    0    0 S  0.0  0.0   0:00.03 migration/0

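So this is not a server resource problem. What is it, then?

UPDATE 2011 12 09 GMT 17:36.

So far I have made the following changes to make sure that the bottleneck is not TCP/IP. Added to /etc/sysctl.conf: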

    # These ensure that TIME_WAIT ports either get reused or closed fast.
    net.ipv4.tcp_fin_timeout = 1
    net.ipv4.tcp_tw_recycle = 1
    # TCP memory
    net.core.rmem_max = 16777216
    net.core.rmem_default = 16777216
    net.core.netdev_max_backlog = 262144
    net.core.somaxconn = 4096
    net.ipv4.tcp_syncookies = 1
    net.ipv4.tcp_max_orphans = 262144
    net.ipv4.tcp_max_syn_backlog = 262144
    net.ipv4.tcp_synack_retries = 2
    net.ipv4.tcp_syn_retries = 2

More debugging information:

    [root@server node]# ulimit -a
    core file size          (blocks, -c) 0
    data seg size           (kbytes, -d) unlimited
    scheduling priority             (-e) 0
    file size               (blocks, -f) unlimited
    pending signals                 (-i) 126767
    max locked memory       (kbytes, -l) 64
    max memory size         (kbytes, -m) unlimited
    open files                      (-n) 1024
    pipe size            (512 bytes, -p) 8
    POSIX message queues     (bytes, -q) 819200
    real-time priority              (-r) 0
    stack size              (kbytes, -s) 10240
    cpu time               (seconds, -t) unlimited
    max user processes              (-u) 1024
    virtual memory          (kbytes, -v) unlimited
    file locks                      (-x) unlimited

Note that worker_rlimit_nofile is set to 10240 in the nginx config.

UPDATE 2011 12 09 GMT 19:02.

It looks like the more changes I make, the worse it gets, but here is the new configuration file.

    user nginx;
    worker_processes 4;
    worker_rlimit_nofile 10240;
    pid /var/run/nginx.pid;
    events {
        worker_connections 2048;
        #1,353 hits, 2,751 timeouts, 72 errors - Bummer. Try again?
        #1,408 hits, 2,727 timeouts - Maybe you should increase the timeout?
    }
    http {
        include /etc/nginx/mime.types;
        error_log /var/www/log/nginx_errors.log warn;
        # http://blog.martinfjordvald.com/2011/04/optimizing-nginx-for-high-traffic-loads/
        access_log off;
        open_file_cache max=1000;
        open_file_cache_valid 30s;
        client_body_buffer_size 10M;
        client_max_body_size 200M;
        proxy_buffers 256 4k;
        fastcgi_buffers 256 4k;
        keepalive_timeout 15 15;
        client_body_timeout 60;
        client_header_timeout 60;
        send_timeout 60;
        port_in_redirect off;
        server_tokens off;
        sendfile on;
        gzip on;
        gzip_buffers 256 4k;
        gzip_comp_level 5;
        gzip_disable "msie6";
        map $scheme $php_https {
            default off;
            https on;
        }
        index index.php;
        server {
            server_name ~^www\.(?P<domain>.+);
            rewrite ^ $scheme://$domain$request_uri? permanent;
        }
        include /etc/nginx/conf.d/virtual.conf;
    }

UPDATE 2011 12 11 GMT 20:11.

This is the netstat -ntla output during the test.

https://gist.github.com/d74750cceba4d08668ea

UPDATE 2011 12 12 GMT 10:54。

Just to clarify, iptables (the firewall) was turned off during testing.

UPDATE 2011 12 12 GMT 22:47。

This is the sysctl -p | grep mem dump.

    net.ipv4.ip_forward = 0
    net.ipv4.conf.default.rp_filter = 1
    net.ipv4.conf.default.accept_source_route = 0
    kernel.sysrq = 0
    kernel.core_uses_pid = 1
    net.ipv4.tcp_syncookies = 1
    kernel.msgmnb = 65536
    kernel.msgmax = 65536
    kernel.shmmax = 68719476736
    kernel.shmall = 4294967296
    net.ipv4.tcp_fin_timeout = 30
    net.ipv4.tcp_keepalive_time = 30
    net.ipv4.tcp_tw_recycle = 1
    net.ipv4.tcp_tw_reuse = 1
    net.ipv4.tcp_mem = 8388608 8388608 8388608
    net.ipv4.tcp_rmem = 4096 87380 8388608
    net.ipv4.tcp_wmem = 4096 65536 8388608
    net.ipv4.route.flush = 1
    net.ipv4.ip_local_port_range = 1024 65000
    net.core.rmem_max = 16777216
    net.core.rmem_default = 16777216
    net.core.wmem_max = 8388608
    net.core.wmem_default = 65536
    net.core.netdev_max_backlog = 262144
    net.core.somaxconn = 4096
    net.ipv4.tcp_syncookies = 1
    net.ipv4.tcp_max_orphans = 262144
    net.ipv4.tcp_max_syn_backlog = 262144
    net.ipv4.tcp_synack_retries = 2
    net.ipv4.tcp_syn_retries = 2

UPDATE 2011 12 12 GMT 22:49

I am using blitz.io to run all the tests. The URL I am testing is http://dev.anuary.com/test.txt, using the following command:

    --region ireland --pattern 200-250:30 -T 1000 http://dev.anuary.com/test.txt

UPDATE 2011 12 13 GMT 13:33

nginx user limits (set in /etc/security/limits.conf).

    nginx hard nofile 40000
    nginx soft nofile 40000

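You will need to dump your network connections during the test. While the server may have near-zero load, your TCP/IP stack could be backing up. Look for TIME_WAIT connections in the netstat output.

If that is the case, then you will want to look into tuning the TCP/IP kernel parameters that relate to TCP wait states, TCP recycling, and similar metrics.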
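To make the TIME_WAIT check concrete, here is a minimal sketch that tallies connections per TCP state from `netstat -ntla` style text. It assumes the usual Linux column layout, with the state in the sixth column. The function name `count_tcp_states` and the sample rows are made up for illustration and are not taken from the gist in the question.

```python
from collections import Counter


def count_tcp_states(netstat_output: str) -> Counter:
    """Count connections per TCP state in `netstat -ntla` style output."""
    states = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        # Connection rows start with "tcp" or "tcp6"; the state is the
        # sixth column. Header lines do not match and are skipped.
        if len(fields) >= 6 and fields[0].startswith("tcp"):
            states[fields[5]] += 1
    return states


# Hypothetical sample rows, only to show the expected input shape.
SAMPLE = """\
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address           Foreign Address         State
tcp        0      0 0.0.0.0:80              0.0.0.0:*               LISTEN
tcp        0      0 10.0.0.5:80             203.0.113.7:51234       TIME_WAIT
tcp        0      0 10.0.0.5:80             203.0.113.7:51235       TIME_WAIT
tcp        0      0 10.0.0.5:80             203.0.113.8:40100       ESTABLISHED
"""

if __name__ == "__main__":
    for state, count in count_tcp_states(SAMPLE).most_common():
        print(f"{state:12} {count}")
```

Run against a dump captured while the load test is in progress, a count of TIME_WAIT entries that dwarfs ESTABLISHED would point toward the kernel parameters rather than nginx itself.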

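Also, you have not described what is being tested.

I always test:

static content (an image or text file)
a simple PHP page (e.g. phpinfo)
an application page

This may not apply in your case, but it is something I do when performance testing. Testing different types of files can help you pinpoint the bottleneck.

Even with static content, it is important to test files of different sizes, so that timeouts and other settings can be tuned properly.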

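We have some static-content Nginx boxes handling 3000+ active connections, so Nginx can certainly do it.

Update: your netstat shows a lot of open connections. You may want to try tuning your TCP/IP stack. Also, what file are you requesting? Nginx should close the port quickly.

Here is a suggestion for sysctl.conf: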

    net.ipv4.ip_local_port_range = 1024 65000
    net.ipv4.tcp_rmem = 4096 87380 8388608
    net.ipv4.tcp_fin_timeout = 30
    net.ipv4.tcp_keepalive_time = 30
    net.ipv4.tcp_tw_recycle = 1
    net.ipv4.tcp_tw_reuse = 1

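These values are very low, but I have had success with them on high-concurrency Nginx boxes.

Yet another hypothesis. You have increased worker_rlimit_nofile, but the documentation defines the maximum number of clients as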

max_clients = worker_processes * worker_connections

What if you try raising worker_connections to, say, 8192? Or, if there are enough CPU cores, increasing worker_processes?
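As a quick check of that formula against the numbers in the question, the sketch below multiplies the two directives. The helper `fd_limited_clients` is an added assumption, not part of the formula quoted above: it only encodes the fact that every connection needs at least one file descriptor, so a worker cannot hold more connections than its open-file limit allows.

```python
def max_clients(worker_processes: int, worker_connections: int) -> int:
    """max_clients = worker_processes * worker_connections."""
    return worker_processes * worker_connections


def fd_limited_clients(worker_processes: int, worker_connections: int,
                       nofile_per_worker: int) -> int:
    """Upper bound when each worker is also capped by its open-file limit.

    Assumption for illustration: one descriptor per connection, ignoring
    descriptors used for log files, listening sockets, and served files.
    """
    per_worker = min(worker_connections, nofile_per_worker)
    return worker_processes * per_worker


if __name__ == "__main__":
    print(max_clients(4, 1024))   # first config in the question
    print(max_clients(4, 2048))   # second config in the question
    print(max_clients(4, 8192))   # the value suggested in this answer
    # If the workers were actually running with the shell's 1024 limit:
    print(fd_limited_clients(4, 8192, 1024))
```

By this arithmetic the original configuration already allows 4096 connections, well above the 200 to 250 users in the test, which is why this is offered as a hypothesis rather than a diagnosis.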

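I had a very similar problem with an nginx box acting as a load balancer in front of upstream Apache servers.

In my case I was able to isolate the problem as network-related: it appeared when the upstream Apache servers became overloaded. I could recreate it with simple bash scripts while the overall system was under load. According to an strace of one of the hung processes, the connect call was getting ETIMEDOUT.

These settings (on the nginx and upstream servers) eliminated the problem for me. Before making these changes I was getting 1 or 2 timeouts per minute (boxes handling ~100 requests/s), and now I get 0.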

    net.ipv4.tcp_syncookies = 1
    net.ipv4.tcp_fin_timeout = 20
    net.ipv4.tcp_max_syn_backlog = 20480
    net.core.netdev_max_backlog = 4096
    net.ipv4.tcp_max_tw_buckets = 400000
    net.core.somaxconn = 4096

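I would not recommend using net.ipv4.tcp_tw_recycle or net.ipv4.tcp_tw_reuse, but if you want to use one, go for the latter. They can cause bizarre problems if there is any kind of latency at all, and the latter is at least the safer of the two.

I think having tcp_fin_timeout set to 1, as above, may be causing some trouble as well. Try putting it at 20 or 30, which is still far below the default.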
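The two warnings above can be turned into a small check over a sysctl.conf file. This is only a sketch: the function names `parse_sysctl` and `review` are made up, and the threshold of 20 seconds simply mirrors the "20 or 30" suggested in this answer rather than any official limit.

```python
def parse_sysctl(text: str) -> dict:
    """Parse `key = value` lines, skipping blanks and # or ; comments."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            settings[key.strip()] = value.strip()
    return settings


def review(settings: dict) -> list:
    """Return notes for the settings this answer advises against."""
    notes = []
    if settings.get("net.ipv4.tcp_tw_recycle") == "1":
        notes.append("tcp_tw_recycle is enabled; consider turning it off")
    fin = settings.get("net.ipv4.tcp_fin_timeout")
    # Threshold taken from the advice in this answer, not from the kernel docs.
    if fin is not None and int(fin) < 20:
        notes.append(f"tcp_fin_timeout is {fin}; try 20 or 30 instead")
    return notes


# The two settings the question added to /etc/sysctl.conf.
SAMPLE = """\
# These ensure that TIME_WAIT ports either get reused or closed fast.
net.ipv4.tcp_fin_timeout = 1
net.ipv4.tcp_tw_recycle = 1
"""

if __name__ == "__main__":
    for note in review(parse_sysctl(SAMPLE)):
        print(note)
```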

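Maybe it is not an Nginx problem. While the blitz.io test is running, do a:

    tail -f /var/log/php5-fpm.log

(that is what I use to handle PHP)

This shows an error at the moment the timeouts start to rise:

    WARNING: [pool www] server reached pm.max_children setting (5), consider raising it

So, raise max_children in the fpm conf and it's done! ;D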
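If watching the log by eye is impractical during a load test, a short script can pick out that warning afterwards. The sketch below matches only the message format quoted above; the function name `find_max_children_warnings` is made up, and other PHP-FPM versions may word the line differently.

```python
import re

# Pattern built from the warning line quoted in this answer.
MAX_CHILDREN_RE = re.compile(
    r"\[pool (?P<pool>[^\]]+)\] server reached pm\.max_children "
    r"setting \((?P<limit>\d+)\)"
)


def find_max_children_warnings(log_text: str) -> list:
    """Return (pool, limit) pairs for every max_children warning found."""
    hits = []
    for line in log_text.splitlines():
        match = MAX_CHILDREN_RE.search(line)
        if match:
            hits.append((match.group("pool"), int(match.group("limit"))))
    return hits


SAMPLE = (
    "WARNING: [pool www] server reached pm.max_children setting (5), "
    "consider raising it\n"
)

if __name__ == "__main__":
    for pool, limit in find_max_children_warnings(SAMPLE):
        print(f"pool {pool} hit pm.max_children = {limit}")
```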

Your max open files limit (1024) is too low. Try changing it and restarting nginx (cat /proc/<nginx>/limits to confirm).

    ulimit -n 10240

Then increase worker_connections to 10240 or higher.
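To confirm the limit a running process actually has, the "Max open files" row of its limits file can be read programmatically. This is a minimal sketch assuming the standard Linux /proc layout. The function name `max_open_files` and the sample text are made up for illustration, and the PID has to be supplied by whoever runs it.

```python
from pathlib import Path

ROW_NAME = "Max open files"


def max_open_files(limits_text: str):
    """Return (soft, hard) from /proc/<pid>/limits text, or None if absent.

    Values are kept as strings because a limit may read "unlimited".
    """
    for line in limits_text.splitlines():
        if line.startswith(ROW_NAME):
            soft, hard = line[len(ROW_NAME):].split()[:2]
            return soft, hard
    return None


def read_limits(pid: int) -> str:
    """Read the limits file for a PID (Linux only; may need root)."""
    return Path(f"/proc/{pid}/limits").read_text()


# Hypothetical excerpt showing the expected layout.
SAMPLE = """\
Limit                     Soft Limit           Hard Limit           Units
Max cpu time              unlimited            unlimited            seconds
Max open files            1024                 4096                 files
"""

if __name__ == "__main__":
    print(max_open_files(SAMPLE))
```

Checking the worker processes rather than the shell matters here, because the `ulimit -a` output in the question describes the root shell, not necessarily the nginx workers.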