Analyzing a squid3 access.log file with an all-in-one shell script

I just want to write a shell script that gives me the basic functions of a tool like SARG:

  • Sorting by the most-hit URLs (top 100 within 10 minutes at 10k)
  • Totals per status/error code
  • Sorting URLs by highest bandwidth consumption
  • And more sorting functions

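Unfortunately, my function for sorting URLs by highest bandwidth has a problem. I have made various attempts, but I always hit the same problem: either it does not work at all, or everything gets added together, leaving the total bytes in the second column… Does anyone have an idea how I could best implement this?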

Raw access.log (common style):

    > tail /var/log/squid3/access.log
    192.168.1.208 - - [10/Jan/2016:19:01:44 -0100] "CONNECT i.ytimg.com:443 HTTP/1.1" 200 143903 TCP_MISS:HIER_DIRECT
    192.168.1.208 - - [10/Jan/2016:19:02:02 -0100] "CONNECT www.youtube.com:443 HTTP/1.1" 200 87392 TCP_MISS:HIER_DIRECT
    192.168.1.208 - - [10/Jan/2016:19:02:12 -0100] "CONNECT s.ytimg.com:443 HTTP/1.1" 200 32718 TCP_MISS:HIER_DIRECT
    192.168.1.208 - - [10/Jan/2016:19:03:00 -0100] "CONNECT s.youtube.com:443 HTTP/1.1" 200 6376 TCP_MISS:HIER_DIRECT
    192.168.1.208 - - [10/Jan/2016:19:03:39 -0100] "CONNECT r2---sn-h0j7snel.googlevideo.com:443 HTTP/1.1" 200 13740382 TCP_MISS:HIER_DIRECT
    192.168.1.208 - - [10/Jan/2016:19:03:40 -0100] "CONNECT r2---sn-h0j7snel.googlevideo.com:443 HTTP/1.1" 200 18250979 TCP_MISS:HIER_DIRECT
    192.168.1.208 - - [10/Jan/2016:19:06:57 -0100] "CONNECT token.services.mozilla.com:443 HTTP/1.1" 200 4138 TCP_MISS:HIER_DIRECT
    192.168.1.208 - - [10/Jan/2016:19:07:53 -0100] "CONNECT sync-285-us-west-2.sync.services.mozilla.com:443 HTTP/1.1" 200 4749 TCP_MISS:HIER_DIRECT
    192.168.1.208 - - [10/Jan/2016:19:41:48 -0100] "CONNECT sync-285-us-west-2.sync.services.mozilla.com:443 HTTP/1.1" 200 4118 TCP_MISS:HIER_DIRECT
    192.168.1.208 - - [10/Jan/2016:19:51:49 -0100] "CONNECT sync-285-us-west-2.sync.services.mozilla.com:443 HTTP/1.1" 200 4118 TCP_MISS:HIER_DIRECT

My attempt so far, with the output saved in a temporary file:

cat /tmp/bandwith.tmp

    anonymousstats.keefox.org 5128
    anonymousstats.keefox.org 3438
    api.accounts.firefox.com:443 5509
    api.flattr.com:443 4418
    api.flattr.com:443 10397
    blocklist.addons.mozilla.org:443 24118
    button.flattr.com 4180
    clients1.google.com 861
    clients1.google.com 861
    clients1.google.com 861
    clients1.google.com 861
    clients1.google.com 861
    clients1.google.com 861
    clients1.google.com 861
    clients1.google.com 861
    clients1.google.com 861
    cm.g.doubleclick.net 4437
    content.googleapis.com:443 4317
    content.googleapis.com:443 4914

Desired form:

    anonymousstats.keefox.org 8566
    api.accounts.firefox.com:443 5509
    api.flattr.com:443 14815
    blocklist.addons.mozilla.org:443 24118
    button.flattr.com:443 4180
    clients1.google.com 7749
    cm.g.doubleclick.net:443 4437
    content.googleapis.com:443 8754

My function at this point:

    bandwith() {
        # First idea: awk '{print $10, $7}' "$LOGDATEI" | grep -vE "(^\"-\"$|/www.$HOST|/$HOST)" | sort | uniq -c | sort -rn | head -$HITS > /tmp/bandwith.tmp
        cat "$LOGDATEI" | awk '{print $10, $7}' | awk '{ sub(/http\:\/\//, ""); sub(/\//, " " ); print $2, $1 } ' | sort -d | head -$HITS > /tmp/bandwith.tmp

I have tried:

    while read LINE
    do
        cut -d' ' -f2 /tmp/bandwith.tmp
        {
        while read NR
        do
            x=$(($x+$NR))
            echo $x
        }

Or:

    awk '{sum+=$1}END{print sum}' foo.txt

    rule1=`head -1 /tmp/bandwith.tmp | awk '{print $1}'`
    rule2=`head -2 /tmp/bandwith.tmp | awk '{print $1}'`

    for word in `cat /tmp/bandwith.tmp`
    cat /tmp/bandwith.tmp | while read line
    do
        echo "Processing new line" >/dev/tty
        $sum = $zeile1 + $zeile2
    done
    }

    until [ "$rule1" != "$rule2" ]
    do
        echo "$1"
        echo "$2"
        break
        echo "Only to test"
    done
    done
    }

Does anyone have an idea for this problem?
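
One possible direction, as a minimal sketch rather than a finished solution: instead of looping in the shell, let awk keep a running byte total per host in an associative array and print the totals at the end. This assumes /tmp/bandwith.tmp holds one "host bytes" pair per line, as in the sample above. The name sum_bandwidth is a hypothetical helper, not part of the script shown.

```shell
#!/bin/sh
# Sketch only: sum_bandwidth is a hypothetical helper name.
# Assumes the input file holds one "host bytes" pair per line.
sum_bandwidth() {
    # $1: path to the file with "host bytes" lines
    awk 'NF >= 2 { total[$1] += $2 }   # add the bytes to the per-host total
         END {
             # %.0f avoids exponent notation for large totals in some awks
             for (host in total) printf "%s %.0f\n", host, total[host]
         }' "$1" |
        sort -k2,2nr                   # largest total first
}
```

Something like `sum_bandwidth /tmp/bandwith.tmp | head -n "$HITS"` would then give the top entries by bandwidth. Note that hosts written with and without a port (for example `button.flattr.com` and `button.flattr.com:443`) count as different keys in this sketch, so they would have to be normalized beforehand if they should be merged.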