Strange AWS EC2 accessibility issue: Ubuntu, nginx, iPad

I wouldn't call myself a newbie at server administration, but apparently I'm missing something crucial here…

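Problem: when the website is accessed from one specific device (an Apple iPad running Safari, version 8.4.1 (12H321), model MD515HC/A), connectivity to the server is lost, as if a firewall on the server had blocked it.

After some period of iPad inactivity, connectivity comes back.

If there was an active SSH connection to the server before the block, that connection keeps working, but no new connections to the server can be established (as if all ports were closed).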
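The symptom above (established sessions survive while new ones cannot be opened) can be checked from the client side with a small probe. This is a minimal Python sketch and not part of the original troubleshooting: `can_connect` is a hypothetical helper, and the host, port and timeout passed to it are placeholders.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a brand-new TCP connection to (host, port) succeeds.

    A False result means the handshake failed or timed out, which is what
    an unanswered SYN looks like from the client side.
    """
    try:
        # create_connection performs the full three-way handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts, refused connections and unreachable hosts.
        return False
```

Calling it in a loop, for example once per second against port 80 of the server, would show when new connections stop being accepted and when they start working again.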

The iptables INPUT and OUTPUT policies are set to ACCEPT. On the Amazon EC2 side, all traffic from my IP address is allowed.

    # iptables -L -n
    Chain INPUT (policy ACCEPT)
    target     prot opt source               destination
    Chain FORWARD (policy ACCEPT)
    target     prot opt source               destination
    Chain OUTPUT (policy ACCEPT)
    target     prot opt source               destination
    Chain f2b-sshd (0 references)
    target     prot opt source               destination

The regular log files show absolutely nothing relevant.

    # apparmor_status
    apparmor module is loaded.
    1 profiles are loaded.
    1 profiles are in enforce mode.
       docker-default
    0 profiles are in complain mode.
    0 processes have profiles defined.
    0 processes are in enforce mode.
    0 processes are in complain mode.
    0 processes are unconfined but have a profile defined.
    # cat /etc/selinux/config
    SELINUX=permissive
    SELINUXTYPE=targeted
    SETLOCALDEFS=0

The web server runs nginx 1.12.1 with PHP 5.6 and 7.0. I updated nginx from 1.10 to 1.12.1, and the problem persists.

I suspect the problem is tied directly to nginx rather than to how system resources are used.

The instance type is currently an Amazon EC2 t2.micro, but the same problem also occurred on a c4.8xlarge.

An strace of nginx shows nothing obvious while a page is being accessed from the iPad.

Right after that, connections hang. Wireshark output on the client (sending) side:

    13713 1413.319083 192.168.8.100 52.57.147.216 TCP 66 54046 → 80 [SYN] Seq=0 Win=64240 Len=0 MSS=1460 WS=256 SACK_PERM=1
    ...
    13750 1422.319314 192.168.8.100 52.57.147.216 TCP 66 [TCP Retransmission] 54046 → 80 [SYN] Seq=0 Win=64240 Len=0 MSS=1460 WS=256 SACK_PERM=1

tcpdump on the server while the connection is working:

    17:10:53.188562 IP (tos 0x0, ttl 111, id 11792, offset 0, flags [DF], proto TCP (6), length 40)
        XXX.XXX.XXX.XXX.55020 > 172.31.12.47.80: Flags [F.], cksum 0x3882 (correct), seq 2232, ack 1211, win 255, length 0
    17:10:53.188741 IP (tos 0x0, ttl 111, id 11793, offset 0, flags [DF], proto TCP (6), length 52)
        XXX.XXX.XXX.XXX.55031 > 172.31.12.47.80: Flags [S], cksum 0x111a (correct), seq 2503140615, win 64240, options [mss 1420,nop,wscale 8,nop,nop,sackOK], length 0
    17:10:53.249513 IP (tos 0x0, ttl 111, id 11794, offset 0, flags [DF], proto TCP (6), length 40)
        XXX.XXX.XXX.XXX.55031 > 172.31.12.47.80: Flags [.], cksum 0x984a (correct), seq 2503140616, ack 1871922116, win 260, length 0
    17:10:53.252631 IP (tos 0x0, ttl 111, id 11795, offset 0, flags [none], proto TCP (6), length 784)
        XXX.XXX.XXX.XXX.55031 > 172.31.12.47.80: Flags [P.], cksum 0x88f8 (correct), seq 0:744, ack 1, win 260, length 744: HTTP, length: 744
            GET /wtf/2.htm HTTP/1.1
            Host: www....lv
            Connection: keep-alive
            Pragma: no-cache
            Cache-Control: no-cache
            Upgrade-Insecure-Requests: 1
            User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36
            Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8
            Accept-Encoding: gzip, deflate
            Accept-Language: en-US,en;q=0.8
            Cookie: PHPSESSID=0hospgqmearo59saf20cfv7tt3; _hjIncludedInSample=1; _ga=GA1.2.2014778432.1499196989; _gid=GA1.2.833813339.1507378818
            x-tele2-subid: XXX.XXX.XXX.XXX
    17:10:53.359526 IP (tos 0x0, ttl 111, id 11796, offset 0, flags [DF], proto TCP (6), length 40)
        XXX.XXX.XXX.XXX.55031 > 172.31.12.47.80: Flags [.], cksum 0x93d0 (correct), seq 744, ack 404, win 259, length 0

tcpdump on the server after the connection has dropped:

    17:11:19.181562 IP (tos 0x0, ttl 47, id 38570, offset 0, flags [DF], proto TCP (6), length 52)
        XXX.XXX.XXX.XXX.51273 > 172.31.12.47.80: Flags [.], cksum 0xf058 (correct), seq 1157, ack 199, win 4129, options [nop,nop,TS val 323793256 ecr 4027542545], length 0
    17:11:19.251976 IP (tos 0x0, ttl 47, id 8939, offset 0, flags [DF], proto TCP (6), length 52)
        XXX.XXX.XXX.XXX.51274 > 172.31.12.47.80: Flags [.], cksum 0x711b (correct), seq 1158, ack 198, win 4129, options [nop,nop,TS val 323793326 ecr 4027542547], length 0
    17:11:20.212575 IP (tos 0x0, ttl 111, id 11804, offset 0, flags [DF], proto TCP (6), length 40)
        XXX.XXX.XXX.XXX.55058 > 172.31.12.47.80: Flags [F.], cksum 0x3b77 (correct), seq 744, ack 405, win 259, length 0
    17:11:20.212839 IP (tos 0x0, ttl 111, id 11805, offset 0, flags [DF], proto TCP (6), length 52)
        XXX.XXX.XXX.XXX.55069 > 172.31.12.47.80: Flags [S], cksum 0xc9cb (correct), seq 4012888626, win 64240, options [mss 1420,nop,wscale 8,nop,nop,sackOK], length 0
    17:11:20.459739 IP (tos 0x0, ttl 111, id 11806, offset 0, flags [DF], proto TCP (6), length 52)
        XXX.XXX.XXX.XXX.55070 > 172.31.12.47.80: Flags [S], cksum 0xd787 (correct), seq 1916158319, win 64240, options [mss 1420,nop,wscale 8,nop,nop,sackOK], length 0
    17:11:21.219597 IP (tos 0x0, ttl 47, id 25897, offset 0, flags [DF], proto TCP (6), length 52)
        XXX.XXX.XXX.XXX.51272 > 172.31.12.47.80: Flags [.], cksum 0x4702 (correct), seq 2220, ack 2185, win 4096, options [nop,nop,TS val 323795291 ecr 4027543025], length 0
    17:11:21.221524 IP (tos 0x0, ttl 47, id 12413, offset 0, flags [DF], proto TCP (6), length 52)
        XXX.XXX.XXX.XXX.51273 > 172.31.12.47.80: Flags [.], cksum 0xe66f (correct), seq 1157, ack 200, win 4129, options [nop,nop,TS val 323795291 ecr 4027543046], length 0
    17:11:21.221548 IP (tos 0x0, ttl 47, id 40941, offset 0, flags [DF], proto TCP (6), length 52)
        XXX.XXX.XXX.XXX.51274 > 172.31.12.47.80: Flags [.], cksum 0x6779 (correct), seq 1158, ack 199, win 4129, options [nop,nop,TS val 323795291 ecr 4027543047], length 0
    17:11:22.010619 IP (tos 0x0, ttl 47, id 20698, offset 0, flags [DF], proto TCP (6), length 52)
        XXX.XXX.XXX.XXX.51272 > 172.31.12.47.80: Flags [F.], cksum 0x43ff (correct), seq 2220, ack 2185, win 4096, options [nop,nop,TS val 323796061 ecr 4027543025], length 0
    17:11:22.010687 IP (tos 0x0, ttl 47, id 21278, offset 0, flags [DF], proto TCP (6), length 52)
        XXX.XXX.XXX.XXX.51273 > 172.31.12.47.80: Flags [F.], cksum 0xe36c (correct), seq 1157, ack 200, win 4129, options [nop,nop,TS val 323796061 ecr 4027543046], length 0
    17:11:22.010780 IP (tos 0x0, ttl 47, id 37726, offset 0, flags [DF], proto TCP (6), length 52)
        XXX.XXX.XXX.XXX.51274 > 172.31.12.47.80: Flags [F.], cksum 0x6477 (correct), seq 1158, ack 199, win 4129, options [nop,nop,TS val 323796060 ecr 4027543047], length 0
    17:11:22.391572 IP (tos 0x0, ttl 47, id 30595, offset 0, flags [DF], proto TCP (6), length 52)
        XXX.XXX.XXX.XXX.51273 > 172.31.12.47.80: Flags [F.], cksum 0xe208 (correct), seq 1125, ack 200, win 4129, options [nop,nop,TS val 323796449 ecr 4027543046], length 0
    17:11:22.462590 IP (tos 0x0, ttl 47, id 9929, offset 0, flags [DF], proto TCP (6), length 40)
        XXX.XXX.XXX.XXX.51273 > 172.31.12.47.80: Flags [R], cksum 0xf229 (correct), seq 3704030890, win 0, length 0
    17:11:23.201564 IP (tos 0x0, ttl 111, id 11807, offset 0, flags [DF], proto TCP (6), length 52)
        XXX.XXX.XXX.XXX.55069 > 172.31.12.47.80: Flags [S], cksum 0xc9cb (correct), seq 4012888626, win 64240, options [mss 1420,nop,wscale 8,nop,nop,sackOK], length 0
    17:11:23.459562 IP (tos 0x0, ttl 111, id 11808, offset 0, flags [DF], proto TCP (6), length 52)
        XXX.XXX.XXX.XXX.55070 > 172.31.12.47.80: Flags [S], cksum 0xd787 (correct), seq 1916158319, win 64240, options [mss 1420,nop,wscale 8,nop,nop,sackOK], length 0

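Sometimes refreshing just 2 or 3 pages on the iPad is enough to bring connectivity down. It stays down for about 2-5 minutes, then everything returns to normal… until the iPad is used again.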

Any hints on how to track this down are highly appreciated. To be honest, I'm out of ideas…

Update #1

    # sysctl -p
    net.ipv4.ip_forward = 1
    fs.file-max = 65536
    net.ipv4.conf.all.rp_filter = 1
    net.ipv4.tcp_synack_retries = 2
    net.ipv4.ip_local_port_range = 2000 65535
    net.ipv4.tcp_rfc1337 = 1
    net.ipv4.tcp_fin_timeout = 15
    net.ipv4.tcp_keepalive_time = 300
    net.ipv4.tcp_keepalive_probes = 5
    net.ipv4.tcp_keepalive_intvl = 15
    net.core.rmem_default = 31457280
    net.core.rmem_max = 12582912
    net.core.wmem_default = 31457280
    net.core.wmem_max = 12582912
    net.core.somaxconn = 4096
    net.core.netdev_max_backlog = 65536
    net.core.optmem_max = 25165824
    net.ipv4.tcp_mem = 65536 131072 262144
    net.ipv4.udp_mem = 65536 131072 262144
    net.ipv4.tcp_rmem = 8192 87380 16777216
    net.ipv4.udp_rmem_min = 16384
    net.ipv4.tcp_wmem = 8192 65536 16777216
    net.ipv4.udp_wmem_min = 16384
    net.ipv4.tcp_max_tw_buckets = 1440000
    net.ipv4.tcp_tw_recycle = 1
    net.ipv4.tcp_tw_reuse = 1

Update #2: nginx.conf, as requested

    user www-data;
    #worker_processes 8;
    worker_processes 1;
    error_log /var/log/nginx/error.log warn;
    pid /var/run/nginx.pid;
    events {
        worker_connections 1024;
        multi_accept on;
        use epoll;
    }
    worker_rlimit_nofile 65536;
    http {
        include /etc/nginx/mime.types;
        default_type application/octet-stream;
        log_format main '$remote_addr - $remote_user [$time_local] "$request" '
                        '$status $body_bytes_sent "$http_referer" '
                        '"$http_user_agent" "$http_x_forwarded_for"';
        log_format scripts '$document_root$fastcgi_script_name > $request';
        access_log /var/log/nginx/access.log main;
        server_tokens off;
        sendfile off;
        tcp_nopush on;
        tcp_nodelay on;
        client_max_body_size 400M;
        client_body_buffer_size 1m;
        client_header_timeout 15;
        keepalive_timeout 2 2;
        # open_file_cache max=10000 inactive=5m;
        # open_file_cache_valid 2m;
        # open_file_cache_min_uses 5;
        # open_file_cache_errors off;
        send_timeout 15;
        fastcgi_max_temp_file_size 0;
        gzip on;
        gzip_disable "msie6";
        gzip_vary on;
        gzip_proxied any;
        gzip_comp_level 6;
        gzip_buffers 16 8k;
        gzip_http_version 1.1;
        gzip_types text/plain text/css application/json application/x-javascript text/xml application/xml application/xml+rss text/javascripti application/javascript;
        server {
            listen 80 default_server;
            server_name _;
            return 444;
        }
        include /etc/nginx/sites-enabled/*;
    }

Virtual host configuration (not really important: all hosts are affected, and accessing any virtual host from the iPad freezes connections to the server)

    server {
        listen 80;
        listen 443 ssl http2;
        server_name ds.somehost.lv;
        root "/www/ds.somehost.lv/html/public";
        index index.html index.htm index.php;
        charset utf-8;
        location / {
            try_files $uri $uri/ /index.php?$query_string;
        }
        location = /favicon.ico { access_log off; log_not_found off; }
        location = /robots.txt { access_log off; log_not_found off; }
        error_log /var/log/nginx/ds.somehost.app-error.log error;
        sendfile off;
        client_max_body_size 1000m;
        location ~ \.php$ {
            fastcgi_split_path_info ^(.+\.php)(/.+)$;
            fastcgi_pass unix:/var/run/php/php7.0-fpm.sock;
            fastcgi_index index.php;
            include fastcgi_params;
            fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
            fastcgi_intercept_errors off;
            fastcgi_buffer_size 16k;
            fastcgi_buffers 4 16k;
            fastcgi_connect_timeout 300;
            fastcgi_send_timeout 300;
            fastcgi_read_timeout 300;
        }
        location ~ /\.ht {
            deny all;
        }
    }

access.log, requests from the Windows machine:

    XXX.XXX.XXX.XXX - - [08/Oct/2017:23:53:49 +0300] "GET /?asdfasd=asdfasd HTTP/1.1" 200 5430 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36" "-"
    XXX.XXX.XXX.XXX - - [08/Oct/2017:23:53:52 +0300] "GET /?asdfasd=asdfasd HTTP/1.1" 200 5430 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36" "-"
    XXX.XXX.XXX.XXX - - [08/Oct/2017:23:53:54 +0300] "GET /?asdfasd=asdfasd HTTP/1.1" 200 5430 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36" "-"

Requests from the iPad:

    XXX.XXX.XXX.XXX - - [08/Oct/2017:23:53:57 +0300] "GET /login HTTP/1.1" 200 967 "-" "Mozilla/5.0 (iPad; CPU OS 8_4_1 like Mac OS X) AppleWebKit/600.1.4 (KHTML, like Gecko) Version/8.0 Mobile/12H321 Safari/600.1.4" "-"
    XXX.XXX.XXX.XXX - - [08/Oct/2017:23:53:57 +0300] "GET /css/bootstrap.min.css HTTP/1.1" 304 0 "http://ds.somehost.lv/login" "Mozilla/5.0 (iPad; CPU OS 8_4_1 like Mac OS X) AppleWebKit/600.1.4 (KHTML, like Gecko) Version/8.0 Mobile/12H321 Safari/600.1.4" "-"
    XXX.XXX.XXX.XXX - - [08/Oct/2017:23:53:57 +0300] "GET /css/gentelella.min.css HTTP/1.1" 304 0 "http://ds.somehost.lv/login" "Mozilla/5.0 (iPad; CPU OS 8_4_1 like Mac OS X) AppleWebKit/600.1.4 (KHTML, like Gecko) Version/8.0 Mobile/12H321 Safari/600.1.4" "-"
    XXX.XXX.XXX.XXX - - [08/Oct/2017:23:53:57 +0300] "GET /css/font-awesome.min.css HTTP/1.1" 304 0 "http://ds.somehost.lv/login" "Mozilla/5.0 (iPad; CPU OS 8_4_1 like Mac OS X) AppleWebKit/600.1.4 (KHTML, like Gecko) Version/8.0 Mobile/12H321 Safari/600.1.4" "-"

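At this point I tried to load the same page from the Windows machine (the request timed out).

Then I refreshed the page on the iPad, and the request was served immediately: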

    XXX.XXX.XXX.XXX - - [08/Oct/2017:23:54:08 +0300] "GET /login HTTP/1.1" 200 967 "-" "Mozilla/5.0 (iPad; CPU OS 8_4_1 like Mac OS X) AppleWebKit/600.1.4 (KHTML, like Gecko) Version/8.0 Mobile/12H321 Safari/600.1.4" "-"
    XXX.XXX.XXX.XXX - - [08/Oct/2017:23:54:09 +0300] "GET /css/bootstrap.min.css HTTP/1.1" 304 0 "http://ds.somehost.lv/login" "Mozilla/5.0 (iPad; CPU OS 8_4_1 like Mac OS X) AppleWebKit/600.1.4 (KHTML, like Gecko) Version/8.0 Mobile/12H321 Safari/600.1.4" "-"
    XXX.XXX.XXX.XXX - - [08/Oct/2017:23:54:09 +0300] "GET /css/font-awesome.min.css HTTP/1.1" 304 0 "http://ds.somehost.lv/login" "Mozilla/5.0 (iPad; CPU OS 8_4_1 like Mac OS X) AppleWebKit/600.1.4 (KHTML, like Gecko) Version/8.0 Mobile/12H321 Safari/600.1.4" "-"
    XXX.XXX.XXX.XXX - - [08/Oct/2017:23:54:09 +0300] "GET /css/gentelella.min.css HTTP/1.1" 304 0 "http://ds.somehost.lv/login" "Mozilla/5.0 (iPad; CPU OS 8_4_1 like Mac OS X) AppleWebKit/600.1.4 (KHTML, like Gecko) Version/8.0 Mobile/12H321 Safari/600.1.4" "-"

No errors are logged in any of the error logs (syslog, kernel.log, nginx error log).

Update #3

It turned out that new connections were being blocked for exactly 60 seconds.

As it turned out, the problem had to be narrowed down to the low-level network layer, where a SYN packet is sent and no response comes back.

The question "Why would a server not send a SYN/ACK packet in response to a SYN packet" pointed me in the right direction.

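By disabling tcp_timestamps in sysctl I managed to work around the problem as originally described. But the real cause of this behavior was the tcp_tw_recycle setting, which had been enabled for some reason!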

    tcp_tw_recycle (Boolean; default: disabled; since Linux 2.4)
        Enable fast recycling of TIME_WAIT sockets. Enabling this option is not recommended for devices communicating with the general Internet or using NAT (Network Address Translation). Since some NAT gateways pass through IP timestamp values, one IP can appear to have non-increasing timestamps. See RFC 1323 (PAWS), RFC 6191.
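To make the mechanism concrete, here is a simplified Python model of the per-source-address timestamp check that, as far as I understand it, older kernels apply to incoming SYNs when tcp_tw_recycle is on. It is a sketch, not kernel code: the class and method names are hypothetical, the 60-second constant is chosen to match the lockout observed above, the 1-tick tolerance is an assumption, and 32-bit timestamp wraparound is ignored.

```python
PAWS_MSL_SECONDS = 60  # assumed lifetime of a remembered timestamp
PAWS_WINDOW_TICKS = 1  # assumed tolerance for a timestamp going backwards


class PerAddressTimestampCache:
    """Toy model: one remembered TCP timestamp per source IP address."""

    def __init__(self):
        self._last = {}  # source IP -> (timestamp value, time it was seen)

    def remember(self, src_ip, tsval, now):
        """Record the latest timestamp seen from src_ip at time `now`."""
        self._last[src_ip] = (tsval, now)

    def accepts_syn(self, src_ip, tsval, now):
        """Decide whether a SYN from src_ip would be answered.

        tsval is None when the SYN carries no timestamp option.
        """
        entry = self._last.get(src_ip)
        if entry is None:
            return True  # nothing remembered for this address
        last_tsval, seen_at = entry
        if now - seen_at >= PAWS_MSL_SECONDS:
            return True  # remembered timestamp has expired
        if tsval is None:
            return False  # no timestamp to compare: silently dropped
        # Accept only if the timestamp did not move backwards too far.
        return last_tsval - tsval <= PAWS_WINDOW_TICKS
```

In this model, once a timestamp from a device behind a NAT address is remembered, a SYN from another device behind the same address that carries no timestamp (or an older one) is dropped until 60 seconds have passed, while the first device, whose timestamps keep increasing, still gets through. That would be consistent with the iPad continuing to load pages while the Windows machine timed out.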

Here is a great write-up that makes it stick: https://vincent.bernat.im/en/blog/2014-tcp-time-wait-state-linux
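For checking other machines for the same setting, a small parser over sysctl-style text can flag it. This is a minimal Python sketch with hypothetical helper names (`parse_sysctl`, `tw_recycle_enabled`); it assumes input in the `key = value` form shown in the `sysctl -p` output above.

```python
def parse_sysctl(text):
    """Parse 'key = value' lines into a dict, skipping blanks and comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip()
    return settings


def tw_recycle_enabled(settings):
    """Return True if the parsed settings turn tcp_tw_recycle on."""
    return settings.get("net.ipv4.tcp_tw_recycle") == "1"
```

Feeding it the contents of /etc/sysctl.conf or the output of `sysctl -a` would show whether the option is set; in my case the fix was to set `net.ipv4.tcp_tw_recycle = 0`.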

Do I feel a bit silly now? Yes.

Will I stop tweaking kernel settings? Nope!