Strange "GET /api/levels/" and "GET /play/" requests in logs

I have set up a new Amazon EC2 instance. Within a day or two, I started getting strange "GET" requests, roughly 10 seconds apart, from "Googlebot-like" IPs (e.g. 66.249.76.84, 66.249.74.152):

66.249.74.152 - - [10/Apr/2013:06:05:02 +0000] "GET /play/gp4GbjXBD4B3?sh=04f2fd19ae2dd623e7135d29a1894f03&sh=f172a32c89190e28f9c27123d7c6cf43&sh=04f2fd19ae2dd623e7135d29a1894f03 HTTP/1.1" 404 295 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
66.249.76.84 - - [11/Apr/2013:03:51:44 +0000] "GET /api/levels/2ry7ZAh0Y91r HTTP/1.1" 404 295 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

They are probing for hashes under these folders:

 /play/'some_hash_here'
 /profile/'some_hash_here'
 /level/'some_hash_here'
 /api/'some_hash_here'
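To see how much of this traffic there is and which prefixes are being probed, the access log can be tallied. The following is a minimal sketch, assuming the log is in the Combined Log Format shown above; the function name `count_404_prefixes` and the regular expression are illustrative and not part of any Apache tooling.

```python
import re
from collections import Counter

# Combined Log Format:
# ip ident user [time] "METHOD path PROTO" status size "referer" "agent"
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+'
)


def count_404_prefixes(lines):
    """Count 404 responses per (client IP, first path segment)."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if not match or match.group("status") != "404":
            continue  # skip unparseable lines and non-404 responses
        # Drop the query string: "/play/abc?sh=1" -> "/play/abc"
        path = match.group("path").split("?", 1)[0]
        # Keep only the first segment: "/play/abc" -> "/play"
        prefix = "/" + path.lstrip("/").split("/", 1)[0]
        counts[(match.group("ip"), prefix)] += 1
    return counts
```

It accepts any iterable of lines, so an open file handle on the access log can be passed directly, and `count_404_prefixes(fh).most_common()` lists the noisiest IP and prefix pairs first.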

I have never had such folders on this site. Still, to deal with it, I tried to block them in robots.txt:

 User-agent: *
 Disallow:
 Crawl-delay: 120
 Disallow: /play
 Disallow: /profile
 Disallow: /level

But it did not help at all; they simply do not read robots.txt. To get rid of all the clutter they produce in my error_log file, I created the following rules in my .htaccess file:

 Redirect 301 /play 'some_other_site'
 Redirect 301 /level 'some_other_site'
 Redirect 301 /profile 'some_other_site'
 Redirect 301 /api 'some_other_site'

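I also found some traces of the real Googlebot crawling my site, and its behavior is quite normal: it only requests pages that are linked from pages on my site. How can I get rid of such fraudulent scans?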

These IPs are Google IPs, so these are very likely legitimate Googlebot hits.
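One way to check this for a given address is the reverse-then-forward DNS lookup that Google describes for verifying Googlebot. The sketch below is a hedged illustration, not an official tool: the function name `is_verified_googlebot` and the injectable `reverse_lookup` / `forward_lookup` parameters are assumptions made for this example, the accepted suffixes are `googlebot.com` and `google.com`, and the default forward lookup handles IPv4 only.

```python
import socket


def is_verified_googlebot(ip, reverse_lookup=None, forward_lookup=None):
    """Return True if `ip` reverse-resolves to a Google crawler host name
    and that host name resolves forward to the same IP."""
    # The lookups are injectable (an assumption of this sketch) so the
    # logic can be exercised without network access.
    if reverse_lookup is None:
        reverse_lookup = lambda addr: socket.gethostbyaddr(addr)[0]
    if forward_lookup is None:
        forward_lookup = lambda host: socket.gethostbyname_ex(host)[2]

    try:
        host = reverse_lookup(ip)
    except OSError:
        return False  # no PTR record or lookup failure

    # Step 1: the PTR name must sit under a Google crawler domain.
    name = host.rstrip(".").lower()
    if not name.endswith((".googlebot.com", ".google.com")):
        return False

    # Step 2: forward-confirm, since anyone can publish an arbitrary
    # PTR record for an IP range they control.
    try:
        return ip in forward_lookup(host)
    except OSError:
        return False
```

Calling `is_verified_googlebot("66.249.74.152")` performs live DNS queries. A request whose User-Agent claims to be Googlebot but fails this check is not coming from Google.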

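I would not worry about them. They are unlikely to be hacking attempts. The most likely explanation is that your server's IP previously belonged to another website that had these URLs. This is fairly common on Amazon EC2, because its IP addresses are reassigned between instances.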

OK. I don't know what it is or what it means, but I think I have found a solution based on the fail2ban package.