Linux routing bug?

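For some time now I have been struggling with this problem, which is not easy to reproduce. I am running Linux kernel v3.1.0, and sometimes routing to a few IP addresses does not work. What seems to happen is that, instead of sending the packet to the gateway, the kernel treats the destination address as local and tries to resolve its MAC address via ARP.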

For example, right now my IP address is 172.16.1.104/24 and the gateway is 172.16.1.254:

    # ifconfig eth0
    eth0      Link encap:Ethernet  HWaddr 00:1B:63:97:FC:DC
              inet addr:172.16.1.104  Bcast:172.16.1.255  Mask:255.255.255.0
              UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
              RX packets:230772 errors:0 dropped:0 overruns:0 frame:0
              TX packets:171013 errors:0 dropped:0 overruns:0 carrier:0
              collisions:0 txqueuelen:1000
              RX bytes:191879370 (182.9 Mb)  TX bytes:47173253 (44.9 Mb)
              Interrupt:17

    # route -n
    Kernel IP routing table
    Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
    0.0.0.0         172.16.1.254    0.0.0.0         UG    0      0        0 eth0
    172.16.1.0      0.0.0.0         255.255.255.0   U     1      0        0 eth0
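To make the expected behaviour concrete, here is a minimal Python sketch of a longest-prefix lookup against the two routes shown by `route -n`. It only illustrates the decision the routing table should produce, not how the kernel implements it; `ROUTES` and `lookup` are hypothetical names introduced for this example.

```python
import ipaddress

# The two routes from `route -n`: (destination network, gateway or None if on-link).
ROUTES = [
    (ipaddress.ip_network("0.0.0.0/0"), ipaddress.ip_address("172.16.1.254")),
    (ipaddress.ip_network("172.16.1.0/24"), None),
]

def lookup(dest):
    """Return the address that should be resolved via ARP for dest:
    the gateway of the most specific matching route, or dest itself if on-link."""
    addr = ipaddress.ip_address(dest)
    matches = [(net, gw) for net, gw in ROUTES if addr in net]
    if not matches:
        raise LookupError(f"no route to {dest}")
    # Longest prefix wins.
    net, gw = max(matches, key=lambda m: m[0].prefixlen)
    return gw if gw is not None else addr

for dest in ("172.16.1.254", "172.16.0.1", "172.16.0.59"):
    print(dest, "-> ARP for", lookup(dest))
```

Under this table, 172.16.0.59 only matches the default route, so the ARP request should be for 172.16.1.254 and never for 172.16.0.59 itself.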

I can ping several addresses, but not 172.16.0.59:

    # ping -c1 172.16.1.254
    PING 172.16.1.254 (172.16.1.254) 56(84) bytes of data.
    64 bytes from 172.16.1.254: icmp_seq=1 ttl=64 time=0.383 ms

    --- 172.16.1.254 ping statistics ---
    1 packets transmitted, 1 received, 0% packet loss, time 0ms
    rtt min/avg/max/mdev = 0.383/0.383/0.383/0.000 ms
    root@pozsybook:~# ping -c1 172.16.0.1
    PING 172.16.0.1 (172.16.0.1) 56(84) bytes of data.
    64 bytes from 172.16.0.1: icmp_seq=1 ttl=63 time=5.54 ms

    --- 172.16.0.1 ping statistics ---
    1 packets transmitted, 1 received, 0% packet loss, time 0ms
    rtt min/avg/max/mdev = 5.545/5.545/5.545/0.000 ms
    root@pozsybook:~# ping -c1 172.16.0.2
    PING 172.16.0.2 (172.16.0.2) 56(84) bytes of data.
    64 bytes from 172.16.0.2: icmp_seq=1 ttl=62 time=7.92 ms

    --- 172.16.0.2 ping statistics ---
    1 packets transmitted, 1 received, 0% packet loss, time 0ms
    rtt min/avg/max/mdev = 7.925/7.925/7.925/0.000 ms
    root@pozsybook:~# ping -c1 172.16.0.59
    PING 172.16.0.59 (172.16.0.59) 56(84) bytes of data.
    From 172.16.1.104 icmp_seq=1 Destination Host Unreachable

    --- 172.16.0.59 ping statistics ---
    1 packets transmitted, 0 received, +1 errors, 100% packet loss, time 0ms

While trying to ping 172.16.0.59, I can see in tcpdump that an ARP request is sent:

    # tcpdump -n -i eth0|grep ARP
    tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
    listening on eth0, link-type EN10MB (Ethernet), capture size 96 bytes
    15:25:16.671217 ARP, Request who-has 172.16.0.59 tell 172.16.1.104, length 28

And /proc/net/arp has an incomplete entry for 172.16.0.59:

    # grep 172.16.0.59 /proc/net/arp
    172.16.0.59      0x1         0x0         00:00:00:00:00:00     *        eth0
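For anyone who wants to spot such entries automatically, the following is a rough Python sketch that scans text in the /proc/net/arp format and reports addresses whose Flags column lacks the "complete" bit (0x2). The function name `incomplete_arp_entries` is hypothetical, and the second sample row (including its MAC address) is made up for illustration; only the 172.16.0.59 row comes from the output above.

```python
ATF_COM = 0x02  # "completed entry" bit in the Flags column

def incomplete_arp_entries(text):
    """Return the IP addresses whose ARP entry is not marked complete."""
    incomplete = []
    for line in text.splitlines():
        fields = line.split()
        # Data rows have 6 columns: IP, HW type, Flags, HW address, Mask, Device.
        if len(fields) != 6:
            continue  # skips the header and blank lines
        try:
            flags = int(fields[2], 16)
        except ValueError:
            continue
        if not flags & ATF_COM:
            incomplete.append(fields[0])
    return incomplete

SAMPLE = """\
IP address       HW type     Flags       HW address            Mask     Device
172.16.0.59      0x1         0x0         00:00:00:00:00:00     *        eth0
172.16.1.254     0x1         0x2         00:11:22:33:44:55     *        eth0
"""

print(incomplete_arp_entries(SAMPLE))
```

On a live system the same function could be fed the contents of /proc/net/arp instead of `SAMPLE`.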

Note that 172.16.0.59 is reachable from other computers on this LAN.

Does anyone have any idea what is going on? Thanks.

Update: in reply to the comments below:

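there are no interfaces other than eth0 and lo
the ARP requests are not seen at the other end, but that is how it is supposed to work. The main problem is that the ARP requests should not be sent in the first place
the problem persists even if I add an explicit route with the command "route add -host 172.16.0.59 gw 172.16.1.254 dev eth0"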

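This was indeed a Linux kernel bug, probably present since version 2.6.39. I posted the problem to the lkml and netdev lists (see the thread at https://lkml.org/lkml/2011/11/18/191), and it was also discussed in another netdev thread at http://www.spinics.net/lists/netdev/msg179687.html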

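The current workaround is either to reboot, or to flush all routes and wait 10 minutes for the ICMP redirect to expire. To prevent it from happening again,

    echo 0 >/proc/sys/net/ipv4/conf/eth0/accept_redirects

helps.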
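To check which interfaces currently accept ICMP redirects, a small Python sketch along these lines could be used. It simply reads the `accept_redirects` file under each directory in /proc/sys/net/ipv4/conf; the function name and the `conf_root` parameter are hypothetical, introduced so the code can also be pointed at a test directory.

```python
from pathlib import Path

def accept_redirects(conf_root="/proc/sys/net/ipv4/conf"):
    """Map each entry under conf_root (all, default, eth0, ...) to its
    accept_redirects value as an integer."""
    settings = {}
    for entry in sorted(Path(conf_root).glob("*/accept_redirects")):
        settings[entry.parent.name] = int(entry.read_text().strip())
    return settings

if __name__ == "__main__":
    root = Path("/proc/sys/net/ipv4/conf")
    if root.is_dir():  # only present on Linux
        for iface, value in accept_redirects(root).items():
            print(f"{iface}: accept_redirects={value}")
```

This only reports the raw per-directory values; how the kernel combines the `all` and per-interface settings is described in the kernel's sysctl documentation.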

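The default subnet mask for 172.16.x.x is 255.255.0.0, and you have reconfigured it to 255.255.255.0. So hosts in 172.16.0.x and 172.16.1.x are on different subnets, and traffic between them will therefore be routed through the default gateway.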

Changing your subnet mask to 255.255.0.0 will resolve the problem.
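The arithmetic behind this suggestion can be checked with a short Python sketch using the standard `ipaddress` module. It only shows which network 172.16.1.104 belongs to under each mask; whether the LAN really is a single /16 segment is not established anywhere in this thread.

```python
import ipaddress

target = ipaddress.ip_address("172.16.0.59")

# Compare the question's /24 mask with the suggested /16 mask.
for prefix in (24, 16):
    iface = ipaddress.ip_interface(f"172.16.1.104/{prefix}")
    on_link = target in iface.network
    print(f"/{prefix}: network {iface.network}, target on-link: {on_link}")
```

With /24 the target falls outside the local network, so it is reached through the gateway; with /16 it would be treated as directly attached.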

Can you provide a diagram? If you can't draw a network, it can't be fixed (an old network engineer's proverb... or so I say!).

Cheers,