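(Note: I'm not a networking engineer.) We are sending files to an external vendor and getting random timeouts on different services. We seem to time out most often on large files. We took a packet capture that shows our window shrinking, and we suspect that small payloads get through before the window hits 0, whereas large payloads end in a RST.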
11369 > su-mit-tg [ACK] Seq=677231 Ack=253694 Win=32768 Len=0
11369 > su-mit-tg [ACK] Seq=677231 Ack=256614 Win=29848 Len=0
11369 > su-mit-tg [ACK] Seq=677231 Ack=259534 Win=26928 Len=0
11369 > su-mit-tg [ACK] Seq=677231 Ack=262454 Win=24008 Len=0
11369 > su-mit-tg [ACK] Seq=677231 Ack=265374 Win=21088 Len=0
11369 > su-mit-tg [ACK] Seq=677231 Ack=268294 Win=18168 Len=0
11369 > su-mit-tg [ACK] Seq=677231 Ack=271214 Win=15248 Len=0
11369 > su-mit-tg [ACK] Seq=677231 Ack=274134 Win=12328 Len=0
11369 > su-mit-tg [ACK] Seq=677231 Ack=277054 Win=9408 Len=0
11369 > su-mit-tg [ACK] Seq=677231 Ack=279974 Win=6488 Len=0
11369 > su-mit-tg [ACK] Seq=677231 Ack=282894 Win=3568 Len=0
11369 > su-mit-tg [ACK] Seq=677231 Ack=285814 Win=648 Len=0
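The arithmetic in this capture is worth spelling out. The sketch below is only an illustration: the helper names `parse` and `deltas` are made up for this post, and the input is the twelve ACK lines from the capture. It compares how many new bytes each ACK acknowledges with how much the advertised window shrinks.

```python
import re

# The twelve ACK lines from the capture, copied verbatim.
CAPTURE = """\
11369 > su-mit-tg [ACK] Seq=677231 Ack=253694 Win=32768 Len=0
11369 > su-mit-tg [ACK] Seq=677231 Ack=256614 Win=29848 Len=0
11369 > su-mit-tg [ACK] Seq=677231 Ack=259534 Win=26928 Len=0
11369 > su-mit-tg [ACK] Seq=677231 Ack=262454 Win=24008 Len=0
11369 > su-mit-tg [ACK] Seq=677231 Ack=265374 Win=21088 Len=0
11369 > su-mit-tg [ACK] Seq=677231 Ack=268294 Win=18168 Len=0
11369 > su-mit-tg [ACK] Seq=677231 Ack=271214 Win=15248 Len=0
11369 > su-mit-tg [ACK] Seq=677231 Ack=274134 Win=12328 Len=0
11369 > su-mit-tg [ACK] Seq=677231 Ack=277054 Win=9408 Len=0
11369 > su-mit-tg [ACK] Seq=677231 Ack=279974 Win=6488 Len=0
11369 > su-mit-tg [ACK] Seq=677231 Ack=282894 Win=3568 Len=0
11369 > su-mit-tg [ACK] Seq=677231 Ack=285814 Win=648 Len=0
"""

FIELDS = re.compile(r"Ack=(\d+) Win=(\d+)")


def parse(capture):
    """Return one (ack, win) integer pair per ACK line."""
    return [(int(ack), int(win)) for ack, win in FIELDS.findall(capture)]


def deltas(pairs):
    """For consecutive ACKs, return (bytes newly acknowledged, bytes the window shrank by)."""
    return [(a2 - a1, w1 - w2) for (a1, w1), (a2, w2) in zip(pairs, pairs[1:])]


pairs = parse(CAPTURE)
steps = deltas(pairs)
for acked, shrank in steps:
    print(f"acked {acked} bytes, window shrank by {shrank} bytes")

total_acked = pairs[-1][0] - pairs[0][0]
total_shrink = pairs[0][1] - pairs[-1][1]
print(f"total: acked {total_acked} bytes, window shrank by {total_shrink} bytes")
```

Every step acknowledges 2920 bytes and shrinks the window by the same 2920 bytes (two segments of 1460 bytes, if a typical Ethernet MSS is assumed). In other words, the side sending these ACKs is receiving data at the TCP level, but nothing is being drained from its socket buffer while this happens, so the window heads toward 0.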
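Edit: What I mean is that we call different web services from our application. The timeouts are not tied to one particular service; they hit all of the services at different times. I am not able to send from a different network.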
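I think this is related to an I/O problem or an application problem: for whatever reason, the receiving socket buffer has run out of space, so the receiver advertises a smaller and smaller window.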
Here is how I reproduced an I/O-related version of this problem on Linux:
/dev/vdb 2.0G 1.6G 470M 77% /brick1
[root@nod01 ~]# ls -l /dev/vdb
brw-rw---- 1 root disk 252, 16 Apr 19 22:46 /dev/vdb
echo "252:16 $((1024*250))" > /sys/fs/cgroup/blkio/blkio.throttle.write_bps_device ## limit writes to 250 KB per second
cd /brick1 ## change directory before downloading the CentOS ISO
wget ftp://mirror.fdcservers.net/centos/6.4/isos/x86_64/CentOS-6.4-x86_64-bin-DVD2.iso
00:19:58.992042 IP mirror.50966 > nod01.example.com.46637: Flags [.], ack 1, win 46, options [nop,nop,TS val 2662018758 ecr 5131800], length 0
00:19:58.992107 IP nod01.example.com.46637 > mirror.50966: Flags [.], ack 11256736, win 0, options [nop,nop,TS val 5144749 ecr 2661992655], length 0 ## I'm telling the sender: please don't send me more data, because my socket buffer is full
[root@nod01 ~]# netstat -tunap | grep wget
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name
tcp 5264896 0 192.168.122.244:46637 208.53.158.34:50966 ESTABLISHED 15574/wget #### note Recv-Q holds about 5 MB of unread data, because wget cannot write to /brick1 as fast as the data arrives
tcp 0 0 192.168.122.244:51331 208.53.158.34:21 ESTABLISHED 15574/wget
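The same effect can be shown without a throttled disk. The following is a minimal sketch, not what I ran above: it assumes a loopback connection where the receiving side accepts the connection but never calls `recv()`, standing in for an application that is too slow to drain its socket. The function name `fill_until_stalled` and its parameters are made up for this illustration.

```python
import select
import socket


def fill_until_stalled(chunk_size=65536, max_bytes=256 * 1024 * 1024, wait=0.5):
    """Send to a loopback peer that never reads.

    Returns (bytes accepted by the kernel, whether the sender stalled).
    """
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(1)

    sender = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sender.connect(listener.getsockname())
    receiver, _ = listener.accept()  # accepted, but recv() is never called

    sender.setblocking(False)
    chunk = b"x" * chunk_size
    sent = 0
    stalled = False
    try:
        while sent < max_bytes:
            try:
                sent += sender.send(chunk)
            except BlockingIOError:
                # The send buffer is full. Wait briefly; if the socket still is
                # not writable, the receiver is not opening its window again.
                _, writable, _ = select.select([], [sender], [], wait)
                if not writable:
                    stalled = True
                    break
    finally:
        sender.close()
        receiver.close()
        listener.close()
    return sent, stalled


sent, stalled = fill_until_stalled()
print(f"kernel accepted {sent} bytes before the sender stalled: {stalled}")
```

The byte count depends on the kernel's buffer sizes and autotuning, so it will differ between machines. The point is that the sender stops making progress once both socket buffers are full, which is the situation a `win 0` ACK and a large Recv-Q describe.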