Getting data from a log file

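I want to pull the memory usage out of the log entries below. It's the number after the URL and the 200. I'd like a list with the highest memory usage first, say the top ten. I'm guessing I'd use grep, right?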

 178.0.140.206 - - [05/Nov/2010:16:46:09 -0400] "GET /image/promo/terran-88x31.jpg HTTP/1.1" 200 15227 0 -
 79.66.101.95 - - [05/Nov/2010:16:46:09 -0400] "GET /strategy/article/view/?id=608 HTTP/1.1" 200 8456 0 4980736
 79.66.101.95 - - [05/Nov/2010:16:46:10 -0400] "GET /lib/php/min/?f=lib/css/yui/2.7.0.css,lib/css/base.css,lib/css/ux/rating.css,lib/css/page/strategy.css,lib/css/page/article.css,lib/css/page/strategy/article.css HTTP/1.1" 200 8118 0 1835008
 79.66.101.95 - - [05/Nov/2010:16:46:11 -0400] "GET /image/logo-text.png HTTP/1.1" 200 9444 0 -
 79.66.101.95 - - [05/Nov/2010:16:46:11 -0400] "GET /image/s.gif HTTP/1.1" 200 43 0 -
 79.66.101.95 - - [05/Nov/2010:16:46:11 -0400] "GET /image/logo.png HTTP/1.1" 200 17722 0 -
 79.66.101.95 - - [05/Nov/2010:16:46:13 -0400] "GET /lib/php/min/?f=lib/js/ext/3.0-core.js,lib/js/global.js,lib/js/ext/ux/rating.js,lib/js/page/article.js HTTP/1.1" 200 32919 0 1310720
 79.66.101.95 - - [05/Nov/2010:16:46:16 -0400] "GET /lib/css/resource/body-bg.png HTTP/1.1" 200 467 0 -
 79.66.101.95 - - [05/Nov/2010:16:46:16 -0400] "GET /lib/css/resource/foot-bg.png HTTP/1.1" 200 119 0 -
 79.66.101.95 - - [05/Nov/2010:16:46:16 -0400] "GET /lib/css/resource/search-bg-sprite.png HTTP/1.1" 200 280 0 -
 190.213.177.71 - - [05/Nov/2010:16:46:16 -0400] "GET /images/banner/dark-templar_firefox.gif HTTP/1.1" 404 2827 0 1572864

Assuming you want the URL that was accessed as well (you can adjust the awk print statement to get more fields as needed):

 awk '{ print $10,$7 }' PATH_TO_LOG_FILE | sort -k1 -rn | head -n10 
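As a quick check of the field numbers, the same pipeline can be run against a few lines copied from the sample in the question. This is only a sketch: the variable names `log` and `top` are made up for illustration, and it assumes awk's default whitespace splitting, under which `$7` is the URL, `$9` the status code and `$10` the number right after it.

```shell
# Three lines copied verbatim from the sample log in the question.
log='178.0.140.206 - - [05/Nov/2010:16:46:09 -0400] "GET /image/promo/terran-88x31.jpg HTTP/1.1" 200 15227 0 -
79.66.101.95 - - [05/Nov/2010:16:46:09 -0400] "GET /strategy/article/view/?id=608 HTTP/1.1" 200 8456 0 4980736
190.213.177.71 - - [05/Nov/2010:16:46:16 -0400] "GET /images/banner/dark-templar_firefox.gif HTTP/1.1" 404 2827 0 1572864'

# Print "<field 10> <URL>", sort numerically in descending order, keep the top ten.
top=$(printf '%s\n' "$log" | awk '{ print $10, $7 }' | sort -k1 -rn | head -n10)
printf '%s\n' "$top"
```

On these three lines the largest value, 15227, comes first, followed by 8456 and 2827.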

For a specific HTTP status code only (200 in this case):

 awk '{ if($9=="200") {print $10,$7} }' PATH_TO_LOG_FILE | sort -k1 -rn | head -n10 

Or use a regular expression to check for several status codes:

 awk '{ if($9 ~ /^(200|403|404)$/) {print $10,$7} }' PATH_TO_LOG_FILE | sort -k1 -rn | head -n10 

If this is something you plan to run repeatedly, consider looking into CustomLog.

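As I understand it, those numbers reflect the size of the content returned. In any case, you can use this command to get the desired column (the one after 200):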

 grep "1.1\" 200 " logfile | awk '{print $10}' | sort -nr | head -n 10
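
If the figure you are after turns out to be the last field on each line rather than the one right after the status code, a variant along these lines might work. This is a guess, not something confirmed in the thread: it assumes the trailing field (`-` on some lines, a large number such as 4980736 on others) is the memory value, and `top_mem` is a hypothetical helper name.

```shell
# Hypothetical helper: top ten values of the LAST field, with the URL beside each.
# Assumption: the last field holds the memory figure and is "-" when there is none.
top_mem() {
    # $NF is awk's last field; keep only lines where it is purely numeric.
    awk '$NF ~ /^[0-9]+$/ { print $NF, $7 }' "$1" | sort -k1,1 -rn | head -n10
}
```

Call it as `top_mem PATH_TO_LOG_FILE`. Lines whose last field is `-` are skipped, so they do not show up in the list.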