Diagnosing DNS resolution failures on AWS EC2

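I am running EC2 instances with Amazon Linux (using the Amazon DNS server settings handed out by DHCP) together with an RDS database. The EC2 instances sit behind an ELB and receive high traffic. The application I am running is written in PHP.

The problem is that when PHP tries to connect to the RDS database, it sometimes returns the following error: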

PHP Warning: mysqli_connect(): (HY000/2005): Unknown MySQL server host ... 

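It does not happen often, but sometimes it gets worse, and I end up with thousands of error events with this message.

Are there any suggestions for diagnosing the problem? I am considering dumping all DNS traffic to a file and inspecting it, but the servers get really high traffic, so it would be hard to trace anything from that file.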

    Ip:
        197171459 total packets received
        1 with invalid addresses
        0 forwarded
        0 incoming packets discarded
        197171458 incoming packets delivered
        175015443 requests sent out
    Icmp:
        12528 ICMP messages received
        0 input ICMP message failed.
        ICMP input histogram:
            destination unreachable: 188
            echo requests: 12340
        12559 ICMP messages sent
        0 ICMP messages failed
        ICMP output histogram:
            destination unreachable: 219
            echo replies: 12340
    IcmpMsg:
            InType3: 188
            InType8: 12340
            OutType0: 12340
            OutType3: 219
    Tcp:
        5231380 active connections openings
        3978862 passive connection openings
        881 failed connection attempts
        6420 connection resets received
        17 connections established
        191630575 segments received
        200105352 segments send out
        2797151 segments retransmited
        0 bad segments received.
        6910 resets sent
    Udp:
        5577451 packets received
        219 packets to unknown port received.
        0 packet receive errors
        5577700 packets sent
    UdpLite:
    TcpExt:
        172 invalid SYN cookies received
        808 resets received for embryonic SYN_RECV sockets
        7176788 TCP sockets finished time wait in fast timer
        507 packets rejects in established connections because of timestamp
        448055 delayed acks sent
        2927 delayed acks further delayed because of locked socket
        Quick ack mode was activated 2433 times
        94865861 packets directly queued to recvmsg prequeue.
        16611185 packets directly received from backlog
        54150864749 packets directly received from prequeue
        2158966 packets header predicted
        79141174 packets header predicted and directly queued to user
        40780030 acknowledgments not containing data received
        56946553 predicted acknowledgments
        84 times recovered from packet loss due to SACK data
        Detected reordering 4 times using FACK
        Detected reordering 11 times using SACK
        Detected reordering 69 times using time stamp
        70 congestion windows fully recovered
        1241 congestion windows partially recovered using Hoe heuristic
        TCPDSACKUndo: 13
        2491 congestion windows recovered after partial ack
        0 TCP data loss events
        220 timeouts after SACK recovery
        104 fast retransmits
        99 forward retransmits
        7 retransmits in slow start
        2792531 other TCP timeouts
        22 times receiver scheduled too late for direct processing
        2423 DSACKs sent for old packets
        2785871 DSACKs received
        5162 connections reset due to unexpected data
        921 connections reset due to early user close
        135 connections aborted due to timeout
        TCPDSACKIgnoredOld: 533
        TCPDSACKIgnoredNoUndo: 393
        TCPSackShifted: 477
        TCPSackMerged: 536
        TCPSackShiftFallback: 2709
        TCPBacklogDrop: 46
        TCPDeferAcceptDrop: 3906058
    IpExt:
        InOctets: 69400712361
        OutOctets: 94841399143

There is a known AWS bug that causes DNS resolution to fail occasionally:

https://forums.aws.amazon.com/thread.jspa?messageID=330465#330465
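To check whether you are hitting this without capturing all DNS traffic, one option is a small probe that resolves the RDS hostname at a fixed interval and logs only the failures with a timestamp. The following is a rough Python sketch, not a tested production tool: the function names, the port 3306, and the log format are my own assumptions, and it goes through the system resolver via `socket.getaddrinfo`, which may or may not behave exactly like the lookup PHP performs.

```python
import socket
import time
from datetime import datetime, timezone


def probe_once(hostname, resolver=socket.getaddrinfo):
    """Resolve hostname once; return (ok, detail, elapsed_seconds)."""
    start = time.monotonic()
    try:
        # Port 3306 is an assumption (MySQL default); it does not affect the DNS lookup.
        infos = resolver(hostname, 3306, proto=socket.IPPROTO_TCP)
        addresses = sorted({info[4][0] for info in infos})
        return True, ",".join(addresses), time.monotonic() - start
    except socket.gaierror as exc:
        # gaierror is what the resolver raises for "unknown host" style failures.
        return False, str(exc), time.monotonic() - start


def run_probe(hostname, attempts, interval=1.0, resolver=socket.getaddrinfo,
              sleep=time.sleep, log=print):
    """Probe repeatedly, log only the failures, and return the failure count."""
    failures = 0
    for _ in range(attempts):
        ok, detail, elapsed = probe_once(hostname, resolver)
        if not ok:
            failures += 1
            stamp = datetime.now(timezone.utc).isoformat()
            log(f"{stamp} FAIL {hostname} after {elapsed:.3f}s: {detail}")
        sleep(interval)
    return failures
```

You would call something like `run_probe("your-rds-endpoint", attempts=3600)` on one of the affected instances and then compare the failure timestamps with the PHP error log. If the probe never fails while PHP does, the problem is more likely load-related on the instance than a resolver outage.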

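You may want to test with persistent connections, since that reduces how often DNS resolution is performed.

A local DNS cache (e.g. pdns-recursor or dnscache) will reduce the frequency, but the TTL on the RDS hostname record is very short (60 seconds), so the problem will occur much less often but still a few times a day.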
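Because the 60-second TTL limits what a plain cache can do, another approach is to keep the last address that resolved successfully and fall back to it only when a lookup fails. Here is a minimal Python sketch of that idea; `FallbackResolver` and `max_stale` are hypothetical names of mine, and the one-hour staleness limit is an arbitrary assumption rather than a recommendation.

```python
import socket
import time


class FallbackResolver:
    """Resolve a hostname, falling back to the last good answer on failure."""

    def __init__(self, resolve=socket.gethostbyname, max_stale=3600.0,
                 clock=time.monotonic):
        self._resolve = resolve
        self._max_stale = max_stale  # assumed limit on how old a fallback may be
        self._clock = clock
        self._last_good = {}  # hostname -> (address, time it was resolved)

    def lookup(self, hostname):
        try:
            address = self._resolve(hostname)
        except socket.gaierror:
            cached = self._last_good.get(hostname)
            if cached is None or self._clock() - cached[1] > self._max_stale:
                raise  # nothing usable cached: surface the original error
            return cached[0]
        # Every successful lookup refreshes the fallback entry.
        self._last_good[hostname] = (address, self._clock())
        return address
```

Normal lookups still honour whatever the resolver returns, so the stale address is used only during a resolution failure. The short TTL is presumably there so that clients follow the endpoint when its address changes, so keep `max_stale` small enough that you would not keep talking to an old address for long.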

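You mention high traffic. I wonder whether you are running into a networking problem. Have you been monitoring the SNMP statistics on your servers? You should consider trending some of the values in IF-MIB: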

    IF-MIB::ifInOctets.1 = Counter32: 117194642
    IF-MIB::ifInOctets.2 = Counter32: 3406296104
    IF-MIB::ifInOctets.3 = Counter32: 754235769
    IF-MIB::ifInOctets.4 = Counter32: 0
    IF-MIB::ifInUcastPkts.1 = Counter32: 112415844
    IF-MIB::ifInUcastPkts.2 = Counter32: 352495427
    IF-MIB::ifInUcastPkts.3 = Counter32: 588414566
    IF-MIB::ifInUcastPkts.4 = Counter32: 0
    IF-MIB::ifInNUcastPkts.1 = Counter32: 0
    IF-MIB::ifInNUcastPkts.2 = Counter32: 5038722
    IF-MIB::ifInNUcastPkts.3 = Counter32: 4835908
    IF-MIB::ifInNUcastPkts.4 = Counter32: 0
    IF-MIB::ifInDiscards.1 = Counter32: 0
    IF-MIB::ifInDiscards.2 = Counter32: 0
    IF-MIB::ifInDiscards.3 = Counter32: 0
    IF-MIB::ifInDiscards.4 = Counter32: 0
    IF-MIB::ifInErrors.1 = Counter32: 0
    IF-MIB::ifInErrors.2 = Counter32: 0
    IF-MIB::ifInErrors.3 = Counter32: 0
    IF-MIB::ifInErrors.4 = Counter32: 0
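Trending these values means turning two successive samples into a rate, and with `Counter32` you have to allow for the counter wrapping at 2^32. A small Python sketch of that arithmetic follows (the function names are mine, not part of any SNMP library):

```python
COUNTER32_MODULUS = 2 ** 32


def counter32_delta(previous, current):
    """Difference between two Counter32 samples, allowing for one wrap."""
    return (current - previous) % COUNTER32_MODULUS


def rate_per_second(previous, current, interval_seconds):
    """Average rate between two samples taken interval_seconds apart."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    return counter32_delta(previous, current) / interval_seconds
```

This can only account for a single wrap between samples. A 32-bit octet counter wraps after 4 GiB, which at a sustained 1 Gbit/s is roughly every 34 seconds, so on busy interfaces you would either poll often or use the 64-bit `ifHCInOctets` counter where the agent provides it.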

For more information:

http://www.oidview.com/mibs/0/IF-MIB.html

You can also look at some network statistics with:

 # netstat -s 
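If you want to track that output over time rather than read it by eye, it can be parsed into counters. The sketch below is a rough Python parser based only on the layout of the output posted in the question (section headers at column 0, indented counter lines); the function names are mine, and the prefix matching is there because net-tools versions differ in how they spell some descriptions.

```python
import re


def parse_netstat_s(text):
    """Parse `netstat -s` text into {section: {description: value}}."""
    stats = {}
    section = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        # Section headers such as "Tcp:" start at column 0.
        if not raw[0].isspace() and line.endswith(":"):
            section = line[:-1]
            stats[section] = {}
            continue
        if section is None:
            continue
        leading = re.match(r"^(\d+)\s+(.+)$", line)    # "881 failed connection attempts"
        trailing = re.match(r"^(.+?):?\s+(\d+)$", line)  # "TCPBacklogDrop: 46"
        if leading:
            stats[section][leading.group(2)] = int(leading.group(1))
        elif trailing:
            stats[section][trailing.group(1)] = int(trailing.group(2))
        # Lines with the number in the middle are skipped in this sketch.
    return stats


def find_counter(section, prefix):
    """Return the first counter whose description starts with prefix."""
    for description, value in section.items():
        if description.startswith(prefix):
            return value
    raise KeyError(prefix)


def tcp_retransmit_ratio(stats):
    """Retransmitted TCP segments as a fraction of segments sent."""
    tcp = stats["Tcp"]
    sent = find_counter(tcp, "segments sen")  # "send out" in this output
    retransmitted = find_counter(tcp, "segments retrans")
    return retransmitted / sent
```

Applied to the numbers in the question (2797151 retransmitted out of 200105352 sent), this gives a retransmit ratio of about 1.4% since boot. Sampling it periodically and looking at the change between samples would tell you whether the ratio spikes at the times the DNS errors appear.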

In general, though, I think using IP addresses in configuration files is the better choice when referring to other servers in production.