It looks like the referrer in the following log entries is a folder.
112.200.208.5 - - [29/Jul/2013:20:43:14 +0800] "GET /sites/default/files/download/argie/pos-code.zip HTTP/1.1" 206 294677 "http://www.mysite.com/sites/default/files/download/argie/" "Mozilla/5.0 (Windows NT 6.2; rv:22.0) Gecko/20100101 Firefox/22.0"
61.3.158.113 - - [29/Jul/2013:20:43:14 +0800] "GET /sites/default/files/download/lnosKHEN/payroll_system_-_lnoskhen_0.zip HTTP/1.1" 206 10806 "http://www.mysite.com/download-code" "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US) AppleWebKit/534.3 (KHTML, like Gecko) Chrome/6.0.472.25 Safari/534.3"
112.200.208.5 - - [29/Jul/2013:20:43:15 +0800] "GET /sites/default/files/download/argie/pos-code.zip HTTP/1.1" 206 21465 "http://www.mysite.com/sites/default/files/download/argie/" "Mozilla/5.0 (Windows NT 6.2; rv:22.0) Gecko/20100101 Firefox/22.0"
112.200.208.5 - - [29/Jul/2013:20:43:16 +0800] "GET /sites/default/files/download/argie/pos-code.zip HTTP/1.1" 206 469304 "http://www.mysite.com/sites/default/files/download/argie/" "Mozilla/5.0 (Windows NT 6.2; rv:22.0) Gecko/20100101 Firefox/22.0"
112.200.208.5 - - [29/Jul/2013:20:43:17 +0800] "GET /sites/default/files/download/argie/pos-code.zip HTTP/1.1" 206 238639 "http://www.mysite.com/sites/default/files/download/argie/" "Mozilla/5.0 (Windows NT 6.2; rv:22.0) Gecko/20100101 Firefox/22.0"
112.200.208.5 - - [29/Jul/2013:20:43:18 +0800] "GET /sites/default/files/download/argie/pos-code.zip HTTP/1.1" 206 267724 "http://www.mysite.com/sites/default/files/download/argie/" "Mozilla/5.0 (Windows NT 6.2; rv:22.0) Gecko/20100101 Firefox/22.0"
39.41.211.234 - - [29/Jul/2013:20:43:22 +0800] "GET /sites/default/files/download/john.lemar/zest-project.zip HTTP/1.1" 206 23361 "http://www.mysite.com/sites/default/files/download/john.lemar/" "Mozilla/5.0 (Windows NT 5.1; rv:15.0) Gecko/20100101 Firefox/15.0.1"
39.41.211.234 - - [29/Jul/2013:20:43:23 +0800] "GET /sites/default/files/download/john.lemar/zest-project.zip HTTP/1.1" 200 632601 "http://www.mysite.com/sites/default/files/download/john.lemar/" "Mozilla/5.0 (Windows NT 5.1; rv:15.0) Gecko/20100101 Firefox/15.0.1"
39.41.211.234 - - [29/Jul/2013:20:43:24 +0800] "GET /sites/default/files/download/john.lemar/zest-project.zip HTTP/1.1" 206 285171 "http://www.mysite.com/sites/default/files/download/john.lemar/" "Mozilla/5.0 (Windows NT 5.1; rv:15.0) Gecko/20100101 Firefox/15.0.1"
39.41.211.234 - - [29/Jul/2013:20:43:24 +0800] "GET /sites/default/files/download/john.lemar/zest-project.zip HTTP/1.1" 206 138366 "http://www.mysite.com/sites/default/files/download/john.lemar/" "Mozilla/5.0 (Windows NT 5.1; rv:15.0) Gecko/20100101 Firefox/15.0.1"
39.41.211.234 - - [29/Jul/2013:20:43:25 +0800] "GET /sites/default/files/download/john.lemar/zest-project.zip HTTP/1.1" 206 104108 "http://www.mysite.com/sites/default/files/download/john.lemar/" "Mozilla/5.0 (Windows NT 5.1; rv:15.0) Gecko/20100101 Firefox/15.0.1"
39.41.211.234 - - [29/Jul/2013:20:43:25 +0800] "GET /sites/default/files/download/john.lemar/zest-project.zip HTTP/1.1" 206 52055 "http://www.mysite.com/sites/default/files/download/john.lemar/" "Mozilla/5.0 (Windows NT 5.1; rv:15.0) Gecko/20100101 Firefox/15.0.1"
39.41.211.234 - - [29/Jul/2013:20:43:25 +0800] "GET /sites/default/files/download/john.lemar/zest-project.zip HTTP/1.1" 206 63038 "http://www.mysite.com/sites/default/files/download/john.lemar/" "Mozilla/5.0 (Windows NT 5.1; rv:15.0) Gecko/20100101 Firefox/15.0.1"
39.41.211.234 - - [29/Jul/2013:20:43:27 +0800] "GET /sites/default/files/download/john.lemar/zest-project.zip HTTP/1.1" 206 32452 "http://www.mysite.com/sites/default/files/download/john.lemar/" "Mozilla/5.0 (Windows NT 5.1; rv:15.0) Gecko/20100101 Firefox/15.0.1"
112.200.208.5 - - [29/Jul/2013:20:43:33 +0800] "GET /sites/default/files/download/argie/pos-code.zip HTTP/1.1" 206 215059 "http://www.mysite.com/sites/default/files/download/argie/" "Mozilla/5.0 (Windows NT 6.2; rv:22.0) Gecko/20100101 Firefox/22.0"
I believe the only valid download is this line:
61.3.158.113 - - [29/Jul/2013:20:43:14 +0800] "GET /sites/default/files/download/lnosKHEN/payroll_system_-_lnoskhen_0.zip HTTP/1.1" 206 10806 "http://www.mysite.com/download-code" "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US)
because I set things up so that all downloads come from this URL:
http://www.mysite.com/download-code
So how can the referrer appear to be a folder?
Take this line, for example:
112.200.208.5 - - [29/Jul/2013:20:43:33 +0800] "GET /sites/default/files/download/argie/pos-code.zip HTTP/1.1" 206 215059 "http://www.mysite.com/sites/default/files/download/argie/" "Mozilla/5.0 (Windows NT 6.2; rv:22.0) Gecko/20100101 Firefox/22.0"
The referrer is:
http://www.mysite.com/sites/default/files/download/argie/
This path:
/sites/default/files/download/argie/
is a folder.
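To compare referrers across many entries without reading them by eye, the fields can be pulled out with a regular expression. The following is only a sketch: it assumes the log uses the "combined" access-log layout, which the excerpt appears to match, and the names `LOG_PATTERN`, `DOWNLOAD_PAGE`, and `parse_line` are made up for illustration.

```python
import re

# Assumed layout: the "combined" access-log format
# (ip, identity, user, [time], "request", status, bytes, "referer", "agent").
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

# The page the downloads are expected to come from (taken from the question).
DOWNLOAD_PAGE = "http://www.mysite.com/download-code"


def parse_line(line):
    """Return the fields of one log line as a dict, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match else None


line = (
    '112.200.208.5 - - [29/Jul/2013:20:43:33 +0800] '
    '"GET /sites/default/files/download/argie/pos-code.zip HTTP/1.1" '
    '206 215059 "http://www.mysite.com/sites/default/files/download/argie/" '
    '"Mozilla/5.0 (Windows NT 6.2; rv:22.0) Gecko/20100101 Firefox/22.0"'
)

entry = parse_line(line)
# Show the requested path, the referrer, and whether it is the download page.
print(entry["path"])
print(entry["referer"])
print(entry["referer"] == DOWNLOAD_PAGE)
```

For this entry the referrer is the folder URL rather than the download page, which is exactly the mismatch described above.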
Even if this is a web crawler, is it possible for it to access a folder on my site?
When I manually type the following:
http://www.mysite.com/sites/default/files/download/argie/
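it just returns "Page not found". That is why I am wondering how it ended up as the referrer.
By the way, I am using nginx.
You shouldn't pay too much attention to the referrer. The client can set it to anything it likes. It is just a header in the request.
For example: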
GET /sites/default/files/download/argie/pos-code.zip HTTP/1.1
Host: www.mysite.com
Referer: http://example.org/JUST/SOME/REFERRER
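The same point can be shown from the client side. This is a minimal sketch using Python's standard `urllib.request`; it only builds the request object and inspects it, so nothing is sent over the network. The target URL is the one from the log and the referrer value is the arbitrary one from the request above.

```python
import urllib.request

# The client chooses the Referer value itself; it does not have to be a page
# that exists or one the user ever visited.
request = urllib.request.Request(
    "http://www.mysite.com/sites/default/files/download/argie/pos-code.zip",
    headers={"Referer": "http://example.org/JUST/SOME/REFERRER"},
)

# Inspect the header locally without sending the request.
print(request.get_header("Referer"))
```

Passing the object to `urllib.request.urlopen` would send it with that header, and the server would log whatever value the client put there.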
So my guess is that the crawler simply cuts off the end of the path and sets the result as the referrer. I wouldn't worry about it.
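That guess is easy to illustrate. The sketch below shows one way a client could derive such a referrer from the file URL; it is not taken from any particular crawler or download tool, and the function name `guessed_referer` is hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit


def guessed_referer(file_url):
    """Drop the last path segment of file_url and keep the trailing slash."""
    parts = urlsplit(file_url)
    # Everything before the final "/" is the containing folder.
    directory = parts.path.rsplit("/", 1)[0] + "/"
    # Rebuild the URL without any query string or fragment.
    return urlunsplit((parts.scheme, parts.netloc, directory, "", ""))


url = "http://www.mysite.com/sites/default/files/download/argie/pos-code.zip"
print(guessed_referer(url))
```

A client that does this never has to request the folder itself, which would be consistent with the folder URL returning "Page not found" when opened directly.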