Analyzing TCP performance by interpreting "netstat -s"

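I ran netstat -s on a dedicated server running Debian. I would like to interpret the results because I am having TCP connection problems, but I do not know how to read them. Can anyone help?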

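Background: this is a public TCP server with clients from all over the world, most of them on 3G/UMTS networks. Sockets stay open for one hour on average. Some TCP connections drop out for 10-60 seconds, roughly every 10 minutes. The TCP server is a custom Java program that I run.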

Here is the output of netstat -s. Does it show any obvious connection problems?

    Ip:
        33780786 total packets received
        0 forwarded
        0 incoming packets discarded
        33780059 incoming packets delivered
        33577363 requests sent out
        1 outgoing packets dropped
        1442 reassemblies required
        715 packets reassembled ok
    Icmp:
        4675 ICMP messages received
        98 input ICMP message failed.
        ICMP input histogram:
            destination unreachable: 2901
            timeout in transit: 152
            echo requests: 1334
            echo replies: 226
        2109 ICMP messages sent
        0 ICMP messages failed
        ICMP output histogram:
            destination unreachable: 550
            echo request: 225
            echo replies: 1334
    IcmpMsg:
            InType0: 226
            InType3: 2901
            InType8: 1334
            InType11: 152
            OutType0: 1334
            OutType3: 550
            OutType8: 225
    Tcp:
        8752 active connections openings
        287296 passive connection openings
        58164 failed connection attempts
        74065 connection resets received
        30 connections established
        32997886 segments received
        32357425 segments send out
        438184 segments retransmited
        587 bad segments received.
        75868 resets sent
    Udp:
        777245 packets received
        550 packets to unknown port received.
        0 packet receive errors
        779944 packets sent
    TcpExt:
        28674 invalid SYN cookies received
        56570 resets received for embryonic SYN_RECV sockets
        998 packets pruned from receive queue because of socket buffer overrun
        9 ICMP packets dropped because they were out-of-window
        27402 packets rejects in established connections because of timestamp
        1266543 delayed acks sent
        1399 delayed acks further delayed because of locked socket
        Quick ack mode was activated 143367 times
        1556 times the listen queue of a socket overflowed
        1556 SYNs to LISTEN sockets dropped
        25884635 packets directly queued to recvmsg prequeue.
        785180902 bytes directly in process context from backlog
        1800599695 bytes directly received in process context from prequeue
        2879633 packet headers predicted
        7627605 packets header predicted and directly queued to user
        3218508 acknowledgments not containing data payload received
        14774120 predicted acknowledgments
        52 times recovered from packet loss due to fast retransmit
        24519 times recovered from packet loss by selective acknowledgements
        4 bad SACK blocks received
        Detected reordering 146 times using FACK
        Detected reordering 77 times using SACK
        Detected reordering 2239 times using time stamp
        3548 congestion windows fully recovered without slow start
        15840 congestion windows partially recovered using Hoe heuristic
        8832 congestion windows recovered without slow start by DSACK
        127403 congestion windows recovered without slow start after partial ack
        12080 TCP data loss events
        TCPLostRetransmit: 3
        179 timeouts after reno fast retransmit
        21328 timeouts after SACK recovery
        1481 timeouts in loss state
        32373 fast retransmits
        5349 forward retransmits
        26402 retransmits in slow start
        230593 other TCP timeouts
        4 classic Reno fast retransmits failed
        2367 SACK retransmits failed
        563 times receiver scheduled too late for direct processing
        243774 packets collapsed in receive queue due to low socket buffer
        151068 DSACKs sent for old packets
        45306 DSACKs sent for out of order packets
        238987 DSACKs received
        14 DSACKs for out of order packets received
        27627 connections reset due to unexpected data
        4045 connections reset due to early user close
        4992 connections aborted due to timeout
    IpExt:

 1 outgoing packets dropped 

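Almost no packets are being dropped, which is good, but we have no latency data. At a quick glance, I would say you are using the wrong tool for the job.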
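To put the pasted counters in proportion, here is a minimal sketch that computes a few ratios from the Tcp and TcpExt sections of the question. The dictionary keys are my own names, not netstat fields, and Python is used only for illustration. Keep in mind that these counters are cumulative since boot, so the ratios are averages over the whole uptime and say nothing about what happens during one particular 10-60 second stall.

```python
# Values copied from the netstat -s output in the question.
# The key names are illustrative, not official netstat field names.
tcp = {
    "active_opens": 8752,
    "passive_opens": 287296,
    "segments_out": 32357425,
    "segments_retransmitted": 438184,
    "listen_queue_overflows": 1556,
    "aborted_on_timeout": 4992,
}

total_opens = tcp["active_opens"] + tcp["passive_opens"]

# Share of outgoing segments that had to be sent again.
retransmit_ratio = tcp["segments_retransmitted"] / tcp["segments_out"]

# Listen queue overflows relative to accepted (passive) connections.
overflow_ratio = tcp["listen_queue_overflows"] / tcp["passive_opens"]

# Connections aborted after a timeout, relative to all connections opened.
timeout_abort_ratio = tcp["aborted_on_timeout"] / total_opens

print(f"retransmitted segments:  {retransmit_ratio:.2%}")
print(f"listen queue overflows:  {overflow_ratio:.2%}")
print(f"aborted on timeout:      {timeout_abort_ratio:.2%}")
```

On these numbers the retransmitted share comes out a little over 1% of segments sent. Whether that is acceptable for mobile clients is a judgment call, and it still tells you nothing about when the retransmissions happened.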

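Is a database involved? Is there some periodic job that slows the system down every 10 minutes or so? Does the machine run only this TCP server, or is it serving other resources as well?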
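If you want to check whether something periodic lines up with the stalls, one rough approach is to save the netstat -s output at regular intervals and compare snapshots. The sketch below assumes the usual net-tools layout (section names at column 0, counters indented with the number first). It only handles lines that start with a number, so entries such as "Quick ack mode was activated N times" are skipped. The BEFORE values come from the question; the AFTER values are made up for illustration.

```python
import re

SECTION = re.compile(r"^(\w+):\s*$")          # e.g. "Tcp:" at column 0
COUNTER = re.compile(r"^\s+(\d+)\s+(.+?)\s*$")  # e.g. "    438184 segments retransmited"


def parse_netstat_s(text):
    """Map 'Section: description' to its counter value."""
    counters = {}
    section = ""
    for line in text.splitlines():
        match = SECTION.match(line)
        if match:
            section = match.group(1)
            continue
        match = COUNTER.match(line)
        if match:
            counters[f"{section}: {match.group(2)}"] = int(match.group(1))
    return counters


def delta(before, after):
    """Return only the counters that changed between two snapshots."""
    return {
        key: after[key] - before[key]
        for key in after
        if key in before and after[key] != before[key]
    }


BEFORE = """\
Tcp:
    32357425 segments send out
    438184 segments retransmited
TcpExt:
    1556 times the listen queue of a socket overflowed
    230593 other TCP timeouts
"""

# Hypothetical later snapshot, invented for this example.
AFTER = """\
Tcp:
    32361425 segments send out
    438984 segments retransmited
TcpExt:
    1556 times the listen queue of a socket overflowed
    230893 other TCP timeouts
"""

changes = delta(parse_netstat_s(BEFORE), parse_netstat_s(AFTER))
for name, diff in changes.items():
    print(f"{diff:>8}  {name}")
```

A burst of retransmits or timeouts in one interval, compared with quiet intervals on either side, would at least tell you whether the stalls show up at the TCP level at all.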

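netstat is not a suitable instrument for what you are trying to do. To make sure your network application runs as expected, you need an infrastructure with the following pieces: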

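  • Hooks into your application that collect the right metrics. You are the developer, so you can do this, and it will make your job much easier. By hooks I mean facilities for gathering diagnostic and performance data, coded directly into your application.
  • A graphing/monitoring infrastructure. Cacti and Nagios are the examples I am familiar with, but there are many more.
  • A plan. What are you trying to achieve? What level of service do you want to provide to your users? Implement diagnostics and performance metrics while you develop the application; if it catches on, it may turn into something big, so make it scalable. *Really* scalable.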
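As a rough idea of what such a hook could look like, here is a minimal sketch of a thread-safe metrics object that a server calls on connect, on every received message, and on disconnect. Your server is written in Java; Python is used here only to keep the illustration short, and the class name, method names, and the 10-second stall threshold are all assumptions of mine rather than anything from your code.

```python
import threading
import time
from collections import Counter


class ConnectionMetrics:
    """Hypothetical in-process hook: counts events and tracks silent gaps."""

    def __init__(self, stall_threshold=10.0, clock=time.monotonic):
        self._lock = threading.Lock()
        self._clock = clock                      # injectable for testing
        self._stall_threshold = stall_threshold  # seconds of silence counted as a stall
        self._last_seen = {}                     # client id -> time of last activity
        self._counts = Counter()
        self._longest_gap = 0.0

    def connection_opened(self, client_id):
        with self._lock:
            self._last_seen[client_id] = self._clock()
            self._counts["opened"] += 1

    def data_received(self, client_id):
        now = self._clock()
        with self._lock:
            gap = now - self._last_seen.get(client_id, now)
            self._last_seen[client_id] = now
            self._counts["messages"] += 1
            if gap >= self._stall_threshold:
                self._counts["stalls"] += 1
            self._longest_gap = max(self._longest_gap, gap)

    def connection_closed(self, client_id):
        with self._lock:
            self._last_seen.pop(client_id, None)
            self._counts["closed"] += 1

    def snapshot(self):
        """Return a plain dict that a monitoring system could poll."""
        with self._lock:
            return {
                "opened": self._counts["opened"],
                "closed": self._counts["closed"],
                "messages": self._counts["messages"],
                "stalls": self._counts["stalls"],
                "open_now": len(self._last_seen),
                "longest_gap": self._longest_gap,
            }
```

The server would call data_received() from its read path and expose snapshot() through whatever your monitoring system can poll. With timestamps per client you would know exactly when the silent periods occur, which is the data netstat -s cannot give you.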

Some things to try that may help you understand the problem:

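  • How does your receiving program handle connections coming in from the network? Is it multithreaded? How does it handle clients? Are there timeouts?
  • How are you testing the server code? Have you run it on a local machine and thrown as many connections at it as you can? Have you tested the effect of sessions?
  • Try running "netstat -p" or "lsof -i TCP" and see what is going on. What do the send queues look like? Run "ps auxwww"; what state is the server program in?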
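For the send queue question, a small sketch like the following can pick out the sockets with data waiting. It assumes the net-tools column layout (Proto, Recv-Q, Send-Q, Local Address, Foreign Address, State) and reads only the first six columns, so the extra column added by -p is ignored. The sample text uses documentation addresses and invented values; it is not output from your server.

```python
# Hypothetical sample in the net-tools layout; addresses and numbers are made up.
SAMPLE = """\
Active Internet connections (w/o servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State
tcp        0      0 192.0.2.10:5000         198.51.100.7:41822      ESTABLISHED
tcp        0  17376 192.0.2.10:5000         198.51.100.9:50311      ESTABLISHED
tcp      512      0 192.0.2.10:5000         203.0.113.4:33990       ESTABLISHED
"""


def parse_queues(text):
    """Extract queue sizes for every TCP line in netstat-style output."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        # Skip the banner, the header row, and anything that is not TCP.
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        proto, recv_q, send_q, local, remote, state = fields[:6]
        rows.append({
            "recv_q": int(recv_q),
            "send_q": int(send_q),
            "local": local,
            "remote": remote,
            "state": state,
        })
    return rows


def backed_up(rows, min_bytes=1):
    """Return sockets with at least min_bytes waiting in either queue."""
    return [r for r in rows if r["send_q"] >= min_bytes or r["recv_q"] >= min_bytes]


rows = parse_queues(SAMPLE)
for row in backed_up(rows):
    print(f'{row["remote"]:<24} recv_q={row["recv_q"]:<6} send_q={row["send_q"]}')
```

On an established socket, a Send-Q that stays large means the peer has not acknowledged that data yet, which points at the network or the client. A Recv-Q that stays large means your application is not reading fast enough, which points at the server program.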