nginx – how often is a specific file or path accessed?

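I'm looking for a command/one-liner that tells me how often a specific file/path was accessed on the web server. (Source: nginx's default access log)

It should check all logs (current and compressed) and return one or more entries for that specific file/path from the log files.

Reason: I want to clean up an old business web space by removing its dead files. Many files were used years ago for external purposes (e.g. newsletters, lists). Others seem to be duplicates that were probably only used by the old administrator for testing.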


Additional information:

OS: Debian Jessie (x64)

Server: nginx/1.6.2

Location: /var/logs/nginx/

Compressed log files: gzip

Files:

 2825674 | myDomainName_access.log
 3895051 | myDomainName_access.log.1
 106353 | myDomainName_access.log.2.gz
 244729 | myDomainName_access.log.3.gz
 143118 | myDomainName_access.log.4.gz
 55763 | myDomainName_access.log.5.gz

Example INPUT

Change into the root directory of your domain.tld and enter the following command:

(just a very simple example)

 user@host:/var/www/domain.tld# filesInLogCheck /var/logs/nginx/domain-access.* subfolder/index.php 

OUTPUT

 xxxx - - [07/Mar/2016:10:13:29 +0100] "/subfolder/handle.php HTTP/1.1" 200 22 "https://domain.tld/subfolder/index.php" "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:44.0) Gecko/20100101 Firefox/44.0"
 xxxx - - [07/Mar/2016:10:16:37 +0100] "/subfolder/handle.php HTTP/1.1" 200 104 "https://domain.tld/subfolder/index.php" "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:44.0) Gecko/20100101 Firefox/44.0"
 xxxx - - [07/Mar/2016:10:21:39 +0100] "GET /subfolder/ HTTP/1.1" 200 12589 "https://domain.tld/subfolder/index.php" "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:44.0) Gecko/20100101 Firefox/44.0"
 xxxx - - [11/Mar/2016:11:18:36 +0100] "/subfolder/handle.php HTTP/1.1" 200 1206 "https://domain.tld/subfolder/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.10; rv:45.0) Gecko/20100101 Firefox/45.0"
 xxxx - - [11/Mar/2016:11:19:05 +0100] "/subfolder/handle.php HTTP/1.1" 200 129 "https://domain.tld/subfolder/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.10; rv:45.0) Gecko/20100101 Firefox/45.0"
 xxxx - - [11/Mar/2016:11:19:49 +0100] "/subfolder/handle.php HTTP/1.1" 200 120 "https://domain.tld/subfolder/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.10; rv:45.0) Gecko/20100101 Firefox/45.0"
 xxxx - - [11/Mar/2016:11:22:09 +0100] "GET /subfolder/ HTTP/1.1" 200 16008 "https://domain.tld/subfolder/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.10; rv:45.0) Gecko/20100101 Firefox/45.0"
 xxxx - - [11/Mar/2016:11:27:49 +0100] "/subfolder/handle.php HTTP/1.1" 200 468 "https://domain.tld/subfolder/index.php" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.10; rv:45.0) Gecko/20100101 Firefox/45.0"
 xxxx - - [11/Mar/2016:11:28:03 +0100] "GET /subfolder/ HTTP/1.1" 200 16007 "https://domain.tld/subfolder/index.php" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.10; rv:45.0) Gecko/20100101 Firefox/45.0"
 xxxx - - [11/Mar/2016:11:28:24 +0100] "/subfolder/handle.php HTTP/1.1" 200 468 "https://domain.tld/subfolder/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.10; rv:45.0) Gecko/20100101 Firefox/45.0"

or a clean output:

 [07/Mar/2016:10:13:29 +0100] | "/subfolder/handle.php" | "https://domain.tld/subfolder/index.php"
 [07/Mar/2016:10:16:37 +0100] | "/subfolder/handle.php" | "https://domain.tld/subfolder/index.php"
 [07/Mar/2016:10:21:39 +0100] | "GET /subfolder/" | "https://domain.tld/subfolder/index.php"
 [11/Mar/2016:11:18:36 +0100] | "/subfolder/handle.php" | "https://domain.tld/subfolder/"
 [11/Mar/2016:11:19:05 +0100] | "/subfolder/handle.php" | "https://domain.tld/subfolder/"
 [11/Mar/2016:11:19:49 +0100] | "/subfolder/handle.php" | "https://domain.tld/subfolder/"
 [11/Mar/2016:11:22:09 +0100] | "GET /subfolder/" | "https://domain.tld/subfolder/"
 [11/Mar/2016:11:27:49 +0100] | "/subfolder/handle.php" | "https://domain.tld/subfolder/index.php"
 [11/Mar/2016:11:28:03 +0100] | "GET /subfolder/" | "https://domain.tld/subfolder/index.php"
 [11/Mar/2016:11:28:24 +0100] | "/subfolder/handle.php" | "https://domain.tld/subfolder/"
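The requested `filesInLogCheck` is not an existing command. Below is a minimal sketch of it as a pair of bash functions, assuming nginx's default `combined` log format and GNU `zcat`, whose `-f` flag passes uncompressed files through unchanged so the current log and the rotated `.gz` logs can be read in one go. The function names and the argument order (log files first, path last) follow the example input above; the `Clean` variant and everything else are illustrative assumptions:

```shell
#!/bin/bash
# Hypothetical helpers mirroring the requested usage:
#   filesInLogCheck      <logfile>... <path>   -> full matching log lines
#   filesInLogCheckClean <logfile>... <path>   -> "time | request | referrer"
# zcat -f decompresses .gz files and copies plain files unchanged,
# so current and rotated logs can be passed in a single glob.

filesInLogCheck() {
    local path="${!#}"                       # last argument is the path
    # -F: match the path as a fixed string anywhere in the line,
    # so lines where it appears only as the referrer match too
    zcat -f -- "${@:1:$#-1}" | grep -F -- "$path"
}

filesInLogCheckClean() {
    # Assumes the "combined" format: splitting on '"' gives
    # $1 = 'addr - user [time zone] ', $2 = request line, $4 = referrer
    filesInLogCheck "$@" | awk -F'"' '{
        split($1, t, " ")                    # t[4] = "[date:time", t[5] = "zone]"
        req = $2
        sub(/ HTTP\/[0-9.]+$/, "", req)      # drop the protocol version
        printf "%s %s | \"%s\" | \"%s\"\n", t[4], t[5], req, $4
    }'
}
```

With the functions loaded (e.g. via `source`), the example input would become `filesInLogCheckClean /var/logs/nginx/domain-access.* subfolder/index.php`.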

If I understand you correctly, the command could be:

 $ grep GET access.log | awk '{print $7}' | cut -d '?' -f 1 | sort | uniq -c | sort -r -n -k 1 | head -10
 114179 /bitrix/spread.php
 13208 /bitrix/tools/public_session.php
 11945 /
 4393 /accessories/cases/
 2268 /search/
 2079 /ajax/actions.php
 1951 /shop/
 1591 /search
 1388 /apple-watch/
 1267 /apple-iphone/iphone-6s/

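This command shows the 10 most frequently requested paths (lines containing GET, with the query string stripped). If you really need all of them, just remove "head -10".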
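Since the question asks for the number of hits on one particular path rather than a top-10 list, the same idea can be narrowed to an exact match. A sketch under the same assumptions as the pipeline above (`combined` format, URL in field 7); `countHits` is a hypothetical name, and unlike `grep GET` it counts requests of every method:

```shell
#!/bin/bash
# Hypothetical helper: countHits <path> <logfile>...
# Prints how many requests (any method) asked for exactly <path>,
# ignoring the query string. Reads plain and .gz logs via zcat -f.
# Assumes the "combined" format, where field 7 is the URL
# (the same field the pipeline above extracts with awk '{print $7}').
countHits() {
    local path="$1"
    shift
    zcat -f -- "$@" | awk -v p="$path" '{
        url = $7
        q = index(url, "?")
        if (q) url = substr(url, 1, q - 1)   # same job as cut -d "?" -f 1
        if (url == p) n++
    } END { print n + 0 }'
}
```

For example, `countHits /subfolder/index.php /var/logs/nginx/myDomainName_access.log*` prints a single number; `0` means the path was not requested within the period the logs cover.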

For the .gz files you can use the following:

 $ zcat access.log.gz | grep GET | awk '{print $7}' | cut -d '?' -f 1 | sort | uniq -c | sort -r -n -k 1 | head -10 
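Plain `zcat` only handles the `.gz` files; with `-f` (GNU gzip) it also passes the uncompressed `access.log` and `access.log.1` through, so one loop can cover every rotation. A sketch that reports the hits for one path per log file, which gives a rough idea of when a file was last requested. `hitsPerFile` is a hypothetical name and the `combined` format is assumed:

```shell
#!/bin/bash
# Hypothetical helper: hitsPerFile <path> <logfile>...
# Prints "<count> <file>" for each log, so you can see in which rotation
# period (access.log, access.log.1, access.log.2.gz, ...) <path> was requested.
# Assumes the "combined" format (field 7 = URL); query strings are ignored.
hitsPerFile() {
    local path="$1" f
    shift
    for f in "$@"; do
        # zcat -f decompresses .gz files and copies plain files unchanged
        zcat -f -- "$f" | awk -v p="$path" -v f="$f" '{
            url = $7
            q = index(url, "?")
            if (q) url = substr(url, 1, q - 1)
            if (url == p) n++
        } END { print n + 0, f }'
    done
}
```

Called as `hitsPerFile /subfolder/index.php /var/logs/nginx/myDomainName_access.log*`, it prints one line per log. The order follows the shell's glob sorting, which matches rotation age only while there are fewer than ten rotated files.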

There is no single command that can do this.

You're wrong: it's a one-line script. Pipes in bash are very powerful ;)

 # zcat -f -- /var/log/httpd/* | grep GET | awk '{print $7}' | cut -d '?' -f 1 | sort | uniq -c | sort -r -n -k 1 | head -10 | awk '{SUM+=$1;print $0} END{print "Total hits: "SUM}'
 15249 /sites/all/modules/lightbox2/js/lightbox.js
 173 /scripts/template/
 128 /libs/bundler.php
 125 /libs/jquery.min.js
 60 /vSample
 Total hits: 15735

A more generic script:

 #!/bin/bash
 readonly LOG_DIR='/var/log/nginx'
 readonly TOPS=5
 readonly METHOD='GET|POST'
 # zcat -f reads plain and gzip-compressed logs alike;
 # "Total hits" is the sum of the top ${TOPS} entries only
 /bin/zcat -f -- "${LOG_DIR}"/* | grep -E "${METHOD}" | awk '{print $7}' | cut -d '?' -f 1 \
     | sort | uniq -c | sort -r -n -k 1 | head -${TOPS} \
     | awk '{SUM+=$1;print $0} END{print "Total hits: "SUM}'

Test results:

 # ./tops.sh
 15249 /sites/all/modules/lightbox2/js/lightbox.js
 173 /scripts/template/
 128 /libs/bundler.php
 125 /libs/jquery.min.js
 60 /vSample
 Total hits: 15735
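One caveat with `grep GET` and `grep -E "GET|POST"` in the commands above is that they match those words anywhere in the line, including URLs, referrers and user agents. A sketch of a parameterized variant of `tops.sh` that tests only the request method, assuming the `combined` format, where field 6 is the opening quote plus the method (e.g. `"GET`) and field 7 the URL. The function name `tops` and its argument order are illustrative, not part of the original script:

```shell
#!/bin/bash
# Hypothetical parameterized variant of tops.sh:
#   tops <log_dir> [top_n] [method_regex]
# Only field 6 (the method) is tested, so "GET" inside a URL or
# user agent does not count. Assumes the "combined" log format.
# As in the original, "Total hits" sums only the listed top entries.
tops() {
    local log_dir="$1" top="${2:-5}" method="${3:-GET|POST}"
    zcat -f -- "$log_dir"/* |
        awk -v m="^\"(${method})\$" '$6 ~ m {
            url = $7
            q = index(url, "?")
            if (q) url = substr(url, 1, q - 1)   # strip the query string
            print url
        }' |
        sort | uniq -c | sort -rn | head -n "$top" |
        awk '{ sum += $1; print } END { print "Total hits: " sum + 0 }'
}
```

For instance, `tops /var/log/nginx 10 'GET|HEAD'` would list the ten most requested URLs for those two methods.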