Say I have a file containing many URLs, and I want to download them in parallel using an arbitrary number of processes. How can I do that with bash?
Have a look at man xargs:

    -P max-procs, --max-procs=max-procs
           Run up to max-procs processes at a time; the default is 1.
           If max-procs is 0, xargs will run as many processes as
           possible at a time.
Solution:

    xargs -P 20 -n 1 wget -nv < urls.txt
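To see what -P and -n 1 actually do without touching the network, the same pattern can be exercised with echo standing in for wget:

```shell
# Six inputs, one argument per process (-n 1), at most four processes
# at a time (-P 4). Output order may vary because the processes run
# concurrently.
printf '%s\n' a b c d e f | xargs -P 4 -n 1 echo got
```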
If you just want to grab each URL (regardless of how many), the answer is easy:

    #!/bin/bash
    URL_LIST="http://url1/ http://url2/"
    for url in $URL_LIST ; do
        wget ${url} >/dev/null &    # redirect must come before the &
    done
If you only want to create a limited number of pulls, say 10, you would do something like this:
    #!/bin/bash
    URL_LIST="http://url1/ http://url2/"

    function download() {
        touch /tmp/dl-${1}.lck
        wget ${2} >/dev/null       # pass the url in explicitly as $2
        rm -f /tmp/dl-${1}.lck
    }

    for url in $URL_LIST ; do
        while [ 1 ] ; do
            iter=0
            while [ $iter -lt 10 ] ; do
                if [ ! -f /tmp/dl-${iter}.lck ] ; then
                    download $iter $url &
                    break 2
                fi
                let iter++
            done
            sleep 10s
        done
    done
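For comparison, here is a rough sketch of the same "at most 10 at once" idea using bash's own job table instead of lock files in /tmp (urls.txt, one URL per line, is an assumed input file):

```shell
#!/bin/bash
# Cap concurrent downloads at MAX_JOBS: jobs -pr lists the PIDs of
# running background jobs, so we simply wait until a slot frees up
# before launching the next wget.
MAX_JOBS=10
while read -r url ; do
    while [ "$(jobs -pr | wc -l)" -ge "$MAX_JOBS" ] ; do
        sleep 1
    done
    wget -nv "$url" >/dev/null 2>&1 &
done < urls.txt
wait    # let the final batch finish
```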
Note that I haven't actually tested it; I just knocked it together in about 15 minutes. But it should give you the general idea.
You could use something like puf, or you could use wget/curl/lynx together with GNU parallel.
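For example, with GNU parallel (assuming urls.txt holds one URL per line; wget could be swapped for curl -O or similar):

```shell
# -j 10 caps the number of simultaneous jobs; {} is replaced by each
# input line (here, one URL).
parallel -j 10 wget -nv {} < urls.txt
```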
puf ( http://puf.sourceforge.net/ ) does this "for a living" and gives a nice running status of the whole process.
I do stuff like this a lot. I suggest two scripts. The parent only determines the appropriate loading factors and launches a new child when there is 1. more work to do 2. not past some various limits of loadavg or bandwidth

    #!/bin/bash
    # my pref lang is tcsh so, this is just a rough approximation
    # I think with just a few debug runs, this could work fine.
    # presumes WORKFILE is set to a file with one url to download per line
    NUMPARALLEL=4   # controls how many at once
    # ^ tune above number to control CPU and bandwidth load; you
    #   will not finish fastest by doing 100 at once.
    # Wed Mar 16 08:35:30 PDT 2011 , dianevm at gmail

    while : ; do
        WORKLEFT=`wc -l < $WORKFILE`
        if [ $WORKLEFT -eq 0 ]; then
            echo finished | write sysadmin
            echo finished | Mail sysadmin
            exit 0
        fi
        NUMWORKERS=`ps auxwwf | grep WORKER | grep -v grep | wc -l`
        if [ $NUMWORKERS -lt $NUMPARALLEL ]; then
            # time to fire off another one
            WORKTODO=`head -1 $WORKFILE`
            WORKER $WORKTODO &   # worker could just be wget "$1", ncftp, curl
            tail -n +2 $WORKFILE > TMP
            SECSEPOCH=`date +%s`
            mv $WORKFILE $WORKFILE.$SECSEPOCH
            mv TMP $WORKFILE
        else
            # we have NUMWORKERS or more running
            sleep 5   # suggest this time be close to ~1/4 of script run time
        fi
    done