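Some brief background: our users have been reporting very intermittent connection problems. Several times a day, users have to reload the page they are on because of connection failures or SSL handshake errors (which I assume are a symptom of the connection problems). It happens so quickly that I have not been able to gather any data during these events. It tends to clear up on its own, only to come back later, usually during peak hours.
A bit about our setup: we have three virtual IPs in round-robin DNS, managed by Keepalived, in front of our pool of application servers. nginx accepts the SSL connections and passes them upstream to haproxy, which distributes them to the other application servers. Since these problems began I have updated all the software on the servers (including moving from CentOS 5 to CentOS 6), which did not help. I have posted about our nginx configuration here, and it appears to be fine. It is mostly based on Mozilla's nginx configuration generator for SSL best practices.
I was advised to look at the TCP statistics, but I am not sure how to interpret them. Here is the output of netstat -s from an application server that I rebooted yesterday (so the counters were presumably at 0 yesterday):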
    Ip:
        1021579809 total packets received
        4875 forwarded
        0 incoming packets discarded
        1021562810 incoming packets delivered
        1033056732 requests sent out
        1 outgoing packets dropped
        76648 dropped because of missing route
        2 fragments dropped after timeout
        7072 reassemblies required
        2020 packets reassembled ok
        2 packet reassembles failed
        1514 fragments received ok
        6056 fragments created
    Icmp:
        20522423 ICMP messages received
        533 input ICMP message failed.
        ICMP input histogram:
            destination unreachable: 20503410
            timeout in transit: 2013
            wrong parameters: 2
            source quenches: 10
            redirects: 8264
            echo requests: 8256
            echo replies: 2
        20497056 ICMP messages sent
        0 ICMP messages failed
        ICMP output histogram:
            destination unreachable: 20488798
            time exceeded: 1
            echo request: 1
            echo replies: 8256
    IcmpMsg:
            InType0: 2
            InType3: 20503410
            InType4: 10
            InType5: 8264
            InType8: 8256
            InType11: 2013
            InType12: 2
            OutType0: 8256
            OutType3: 20488798
            OutType8: 1
            OutType11: 1
    Tcp:
        46263582 active connections openings
        30767670 passive connection openings
        104167 failed connection attempts
        2769710 connection resets received
        6428 connections established
        979651572 segments received
        989059642 segments send out
        2386512 segments retransmited
        1454 bad segments received.
        4277435 resets sent
    Udp:
        32926 packets received
        21204463 packets to unknown port received.
        0 packet receive errors
        21033739 packets sent
    UdpLite:
    TcpExt:
        624791 invalid SYN cookies received
        96083 resets received for embryonic SYN_RECV sockets
        367 packets pruned from receive queue because of socket buffer overrun
        54 ICMP packets dropped because they were out-of-window
        21204114 TCP sockets finished time wait in fast timer
        57674 packets rejects in established connections because of timestamp
        38714053 delayed acks sent
        12521 delayed acks further delayed because of locked socket
        Quick ack mode was activated 6563499 times
        62 times the listen queue of a socket overflowed
        62 SYNs to LISTEN sockets ignored
        74285057 packets directly queued to recvmsg prequeue.
        554544554 packets directly received from backlog
        34503032789 packets directly received from prequeue
        336811743 packets header predicted
        75957393 packets header predicted and directly queued to user
        210355614 acknowledgments not containing data received
        318977957 predicted acknowledgments
        1663 times recovered from packet loss due to fast retransmit
        181338 times recovered from packet loss due to SACK data
        898 bad SACKs received
        Detected reordering 1847 times using FACK
        Detected reordering 3512 times using SACK
        Detected reordering 40 times using reno fast retransmit
        Detected reordering 16201 times using time stamp
        46565 congestion windows fully recovered
        49940 congestion windows partially recovered using Hoe heuristic
        TCPDSACKUndo: 196240
        204108 congestion windows recovered after partial ack
        63640 TCP data loss events
        TCPLostRetransmit: 4150
        747 timeouts after reno fast retransmit
        40359 timeouts after SACK recovery
        24399 timeouts in loss state
        286482 fast retransmits
        71966 forward retransmits
        317608 retransmits in slow start
        802284 other TCP timeouts
        TCPRenoRecoveryFail: 324
        14820 sack retransmits failed
        22966 packets collapsed in receive queue due to low socket buffer
        6453991 DSACKs sent for old packets
        1781 DSACKs sent for out of order packets
        649408 DSACKs received
        3047 DSACKs for out of order packets received
        1733842 connections reset due to unexpected data
        100890 connections reset due to early user close
        98451 connections aborted due to timeout
        TCPSACKDiscard: 446
        TCPDSACKIgnoredOld: 4660
        TCPDSACKIgnoredNoUndo: 161136
        TCPSpuriousRTOs: 15474
        TCPSackShifted: 296768
        TCPSackMerged: 495277
        TCPSackShiftFallback: 944017
        TCPChallengeACK: 82577
        TCPSYNChallenge: 287
        TCPFromZeroWindowAdv: 5279
        TCPToZeroWindowAdv: 5279
        TCPWantZeroWindowAdv: 35932
    IpExt:
        InMcastPkts: 151429
        OutMcastPkts: 151501
        InOctets: 947588463980
        OutOctets: 692622505019
        InMcastOctets: 6360064
        OutMcastOctets: 6060040
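Because these totals cover a whole day, they say little about what happens during a brief spike. One approach I am considering is to sample the kernel counters at short intervals and look at the differences. The sketch below is only an illustration, not something I have running: it assumes Linux's `/proc/net/snmp` layout (pairs of header and value lines per protocol), and the function names `parse_snmp`, `counter_delta` and `retrans_ratio` are made up for this example.

```python
def parse_snmp(text):
    """Parse /proc/net/snmp-style text into {protocol: {counter: int}}."""
    lines = [ln.split() for ln in text.strip().splitlines() if ln.strip()]
    stats = {}
    # The file alternates header and value lines sharing a "Proto:" prefix.
    for header, values in zip(lines[0::2], lines[1::2]):
        if header[0] != values[0]:
            raise ValueError("header and value lines are out of step")
        proto = header[0].rstrip(":")
        stats[proto] = {
            name: int(val) for name, val in zip(header[1:], values[1:])
        }
    return stats


def counter_delta(before, after):
    """Return after - before for every counter present in both samples."""
    return {name: after[name] - before[name] for name in after if name in before}


def retrans_ratio(tcp):
    """Fraction of sent TCP segments that were retransmissions."""
    sent = tcp.get("OutSegs", 0)
    return tcp.get("RetransSegs", 0) / sent if sent else 0.0


def read_tcp(path="/proc/net/snmp"):
    """Read the current Tcp counters from the kernel (Linux only)."""
    with open(path) as handle:
        return parse_snmp(handle.read())["Tcp"]
```

Calling `read_tcp()` every few seconds and passing consecutive samples to `counter_delta` would give per-interval figures; `retrans_ratio` on such a delta shows the retransmission share for that interval only. Applied to the daily totals above (2386512 retransmitted out of 989059642 sent), the ratio is roughly 0.24%, which by itself may hide short bursts.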
Besides netstat -s, are there any other tools I could use to get a better understanding of what is happening here?
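I have recently changed a few settings (after these problems started). All other parameters are at their defaults, since I have not touched them.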
    /proc/sys/net/ipv4/tcp_max_syn_backlog: 4096
    /proc/sys/net/ipv4/tcp_fin_timeout: 30
    /proc/sys/net/ipv4/ip_local_port_range: 15000 65000
    /proc/sys/net/netfilter/nf_conntrack_max: 500000
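To keep a record of these settings alongside any counter samples, something like the following could be used. It is a minimal sketch under the assumption that the values live under `/proc/sys` as listed above; `read_settings` and `ephemeral_port_count` are hypothetical helper names, and the `root` parameter exists only so the function can be pointed at a different directory.

```python
from pathlib import Path

# The settings quoted above, as paths relative to /proc/sys.
WATCHED = [
    "net/ipv4/tcp_max_syn_backlog",
    "net/ipv4/tcp_fin_timeout",
    "net/ipv4/ip_local_port_range",
    "net/netfilter/nf_conntrack_max",
]


def read_settings(names=WATCHED, root="/proc/sys"):
    """Return {name: list of value strings}, or None where unreadable."""
    settings = {}
    for name in names:
        try:
            settings[name] = (Path(root) / name).read_text().split()
        except OSError:
            # For example, the conntrack entry is absent if the module
            # is not loaded.
            settings[name] = None
    return settings


def ephemeral_port_count(port_range):
    """Number of local ports in an inclusive [low, high] range."""
    low, high = (int(port) for port in port_range)
    return high - low + 1
```

With the range shown above, `ephemeral_port_count(["15000", "65000"])` gives 50001 local ports available for outgoing connections per destination address and port.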
Update: relevant graphs from Cacti:

