I have a game server that runs over TCP connections. The server randomly disconnects users, and I think it is related to the server's TCP settings.
In my local development environment, the code handles more than 8000 concurrent users without any disconnects or errors (on localhost).
But on the actual deployed CentOS 5 64-bit server, these disconnects happen regardless of the number of concurrent TCP connections.
The server does not seem to be able to handle the throughput.
netstat -s -t
IcmpMsg:
    InType0: 31
    InType3: 87717
    InType4: 699
    InType5: 2
    InType8: 1023781
    InType11: 7211
    OutType0: 1023781
    OutType3: 603
Tcp:
    8612766 active connections openings
    14255236 passive connection openings
    12174 failed connection attempts
    319225 connection resets received
    723 connections established
    6351090913 segments received
    6180297746 segments send out
    45791634 segments retransmited
    0 bad segments received.
    1664280 resets sent
TcpExt:
    46244 invalid SYN cookies received
    3745 resets received for embryonic SYN_RECV sockets
    327 ICMP packets dropped because they were out-of-window
    1 ICMP packets dropped because socket was locked
    11475281 TCP sockets finished time wait in fast timer
    140 time wait sockets recycled by time stamp
    1569 packets rejects in established connections because of timestamp
    103783714 delayed acks sent
    6929 delayed acks further delayed because of locked socket
    Quick ack mode was activated 6210096 times
    1806 times the listen queue of a socket overflowed
    1806 SYNs to LISTEN sockets ignored
    1080380601 packets directly queued to recvmsg prequeue.
    31441059 packets directly received from backlog
    5272599307 packets directly received from prequeue
    324498008 packets header predicted
    1143146 packets header predicted and directly queued to user
    3217838883 acknowledgments not containing data received
    1027969883 predicted acknowledgments
    395 times recovered from packet loss due to fast retransmit
    257420 times recovered from packet loss due to SACK data
    5843 bad SACKs received
    Detected reordering 29 times using FACK
    Detected reordering 12 times using SACK
    Detected reordering 1 times using reno fast retransmit
    Detected reordering 809 times using time stamp
    1602 congestion windows fully recovered
    1917 congestion windows partially recovered using Hoe heuristic
    TCPDSACKUndo: 8196226
    7850525 congestion windows recovered after partial ack
    139681 TCP data loss events
    TCPLostRetransmit: 26
    10139 timeouts after reno fast retransmit
    2802678 timeouts after SACK recovery
    86212 timeouts in loss state
    273698 fast retransmits
    19494 forward retransmits
    2637236 retransmits in slow start
    33381883 other TCP timeouts
    TCPRenoRecoveryFail: 92
    19488 sack retransmits failed
    7 times receiver scheduled too late for direct processing
    6354641 DSACKs sent for old packets
    333 DSACKs sent for out of order packets
    20615579 DSACKs received
    2724 DSACKs for out of order packets received
    123034 connections reset due to unexpected data
    91876 connections reset due to early user close
    169244 connections aborted due to timeout
    28736 times unabled to send RST due to no memory
IpExt:
    InMcastPkts: 2
What gets me thinking is that these lines look particularly problematic:
123034 connections reset due to unexpected data
91876 connections reset due to early user close
28736 times unabled to send RST due to no memory
How can I fix these errors? Do I need to do TCP tuning?
Edit: Some sysctl information:
sysctl -A | grep net | grep mem
net.ipv4.udp_wmem_min = 4096
net.ipv4.udp_rmem_min = 4096
net.ipv4.udp_mem = 772704 1030272 1545408
net.ipv4.tcp_rmem = 4096 87380 4194304
net.ipv4.tcp_wmem = 4096 16384 4194304
net.ipv4.tcp_mem = 196608 262144 393216
net.ipv4.igmp_max_memberships = 20
net.core.optmem_max = 20480
net.core.rmem_default = 129024
net.core.wmem_default = 129024
net.core.rmem_max = 131071
net.core.wmem_max = 131071
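For what it's worth, the `net.core.*_max` ceilings above (131071 bytes) and the `net.ipv4.tcp_mem` pressure thresholds look low next to the "no memory" RST failures in the `netstat` output. A hedged sketch of raising them follows; the numeric values are illustrative assumptions that would need to be sized to the machine's RAM and workload, not recommendations from this post:

```shell
# Sketch: raise the global socket buffer ceilings, which at 131071 bytes
# cap what applications can request via setsockopt(SO_RCVBUF/SO_SNDBUF).
# All numbers here are example values, not tuned recommendations.
sysctl -w net.core.rmem_max=4194304
sysctl -w net.core.wmem_max=4194304

# "times unable to send RST due to no memory" points at TCP memory
# pressure; tcp_mem is measured in pages, so tripling the current
# low/pressure/high thresholds is, again, only an illustration.
sysctl -w net.ipv4.tcp_mem="589824 786432 1179648"

# Persist by adding the same keys to /etc/sysctl.conf, then reload:
# sysctl -p
```

These only take effect for new allocations; existing sockets keep the buffers they already negotiated.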
Edit: ethtool information for the 2 detected Ethernet cards:
Settings for eth0:
    Supported ports: [ TP ]
    Supported link modes:   10baseT/Half 10baseT/Full
                            100baseT/Half 100baseT/Full
                            1000baseT/Full
    Supports auto-negotiation: Yes
    Advertised link modes:  10baseT/Half 10baseT/Full
                            100baseT/Half 100baseT/Full
                            1000baseT/Full
    Advertised auto-negotiation: Yes
    Speed: 1000Mb/s
    Duplex: Full
    Port: Twisted Pair
    PHYAD: 1
    Transceiver: internal
    Auto-negotiation: on
    Supports Wake-on: g
    Wake-on: d
    Link detected: yes

Settings for eth1:
    Supported ports: [ TP ]
    Supported link modes:   10baseT/Half 10baseT/Full
                            100baseT/Half 100baseT/Full
                            1000baseT/Full
    Supports auto-negotiation: Yes
    Advertised link modes:  10baseT/Half 10baseT/Full
                            100baseT/Half 100baseT/Full
                            1000baseT/Full
    Advertised auto-negotiation: Yes
    Speed: Unknown!
    Duplex: Half
    Port: Twisted Pair
    PHYAD: 1
    Transceiver: internal
    Auto-negotiation: on
    Supports Wake-on: g
    Wake-on: d
    Link detected: no
Did you increase the FD limit? You can find some information here: http://www.cyberciti.biz/faq/linux-increase-the-maximum-number-of-open-files/
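For completeness, a minimal sketch of how that is usually done on a CentOS 5 box; the limit values are illustrative, and the username is a placeholder:

```shell
# Check what you currently have.
ulimit -n                       # per-process soft limit for open files
cat /proc/sys/fs/file-max       # system-wide limit

# Raise the system-wide limit at runtime (persist it in /etc/sysctl.conf).
sysctl -w fs.file-max=100000

# Raise the per-user limits by adding lines like these to
# /etc/security/limits.conf, then logging in again:
#   gameserver  soft  nofile  65535
#   gameserver  hard  nofile  65535
```

With 8000+ concurrent TCP connections, the default soft limit of 1024 is exhausted quickly, and `accept()` starts failing once it is hit.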
If by "the server randomly disconnects users" you mean clients drop without the expected FIN/ACK or RST exchange, the first thing I would troubleshoot is that half-duplex interface, particularly since your development environment presumably has full-duplex NICs. An interface sitting at half-duplex with `Auto-negotiation: on`, like your eth1, is usually caused by one of a couple of conditions. The one I see more often is a negotiation mismatch, though that may just be because it has been over a decade since I last knowingly had a negotiation simply fail. The defined Ethernet auto-negotiation behavior when one side is set to auto and the other side is hard-coded (or unable to respond) is for the auto side to fall back to half-duplex.
Simply put, eth1 being in half-duplex mode means the server can only send or receive on that interface at any one moment, never both at once. The hard-coded side will still be in full-duplex mode and will happily send data to the server while receiving data from it. The server, however, sees that as a collision, because half-duplex assumes a shared collision domain, whereas full duplex eliminates the collision domain. The server then uses the backoff algorithm to schedule a retransmission, and as long as it keeps experiencing what it believes are collisions, it keeps increasing the time it waits before retransmitting.
So a full-duplex/half-duplex pairing can easily cause client disconnects, throughput and performance problems, latency spikes, and all sorts of other trouble.
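To check for this in practice, something along these lines (interface names taken from your ethtool output; on a real mismatch you would expect the collision and error counters to climb under load):

```shell
# Confirm the negotiated speed/duplex on each interface.
ethtool eth0 | grep -E 'Speed|Duplex|Auto-negotiation'
ethtool eth1 | grep -E 'Speed|Duplex|Auto-negotiation'

# Mismatch symptoms: late collisions and frame/CRC errors growing under load.
ifconfig eth1 | grep -E 'collisions|errors'

# If the switch port is hard-coded, either hard-code the server to match
# (note: 1000BASE-T requires auto-negotiation, so forced settings are only
# practical at 10/100)...
ethtool -s eth1 speed 100 duplex full autoneg off

# ...or, preferably, set BOTH sides back to auto-negotiation:
ethtool -s eth1 autoneg on
```

On CentOS 5 you can persist a forced setting with an `ETHTOOL_OPTS="speed 100 duplex full autoneg off"` line in `/etc/sysconfig/network-scripts/ifcfg-eth1`. Note also that your eth1 shows `Link detected: no`, so make sure your traffic is actually going over eth0 before blaming eth1.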