How do I refresh an online website mirror created with `wget --mirror`?

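A month ago, I used "wget --mirror" to create a mirror of our public website for temporary use during an upcoming scheduled maintenance window. Our primary website runs HTML, PHP, and MySQL, but the mirror only needs to be HTML, with no dynamic content, PHP, or database.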

The following command creates a simple online mirror of our website:

wget --mirror http://www.example.org/

Note that the Wget manual says --mirror "is currently equivalent to -r -N -l inf --no-remove-listing" (the human-readable equivalent is "--recursive --timestamping --level=inf --no-remove-listing").
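To make that equivalence easier to read, here is a minimal sketch that prints the long-option form of the command instead of running it. The helper name `mirror_long_form` is hypothetical and not part of wget; the flags are the ones the manual text above lists.

```shell
#!/bin/sh
# Print the long-option spelling of "wget --mirror <url>" without running it.
# mirror_long_form is a hypothetical helper name, not part of wget.
mirror_long_form() {
    # --recursive (-r), --timestamping (-N), --level=inf (-l inf),
    # and --no-remove-listing, as listed in the quoted manual text.
    printf 'wget --recursive --timestamping --level=inf --no-remove-listing %s\n' "$1"
}

mirror_long_form http://www.example.org/
```

Printing the command first makes it easy to confirm which options will be in effect before anything is downloaded.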

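It is now a month later, and much of the website content has changed. I want wget to check all pages and download any that have changed. However, this is not working.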

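My question:

What do I need to do to refresh the mirror of the website, short of deleting the directory and re-running the mirror?

The top-level file at http://www.example.org/index.html has not changed, but many other files have.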

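I thought all I needed to do was re-run wget --mirror, because --mirror implies the flags --recursive ("specify recursive download") and --timestamping ("don't re-retrieve files unless newer than local"). I thought this would check all of the pages and retrieve only the files that are newer than my local copies. Am I wrong?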

However, wget does not recurse through the site on the second attempt. 'wget --mirror' checks http://www.example.org/index.html, notices that this page has not changed, and then stops.

 --2010-06-29 10:14:07--  http://www.example.org/
 Resolving www.example.org (www.example.org)... 10.10.6.100
 Connecting to www.example.org (www.example.org)|10.10.6.100|:80... connected.
 HTTP request sent, awaiting response... 200 OK
 Length: unspecified [text/html]
 Server file no newer than local file "www.example.org/index.html" -- not retrieving.

 Loading robots.txt; please ignore errors.
 --2010-06-29 10:14:08--  http://www.example.org/robots.txt
 Connecting to www.example.org (www.example.org)|10.10.6.100|:80... connected.
 HTTP request sent, awaiting response... 200 OK
 Length: 136 [text/plain]
 Saving to: “www.example.org/robots.txt”

      0K                                                       100% 6.48M=0s

 2010-06-29 10:14:08 (6.48 MB/s) - "www.example.org/robots.txt" saved [136/136]

 --2010-06-29 10:14:08--  http://www.example.org/news/gallery/image-01.gif
 Reusing existing connection to www.example.org:80.
 HTTP request sent, awaiting response... 200 OK
 Length: 40741 (40K) [image/gif]
 Server file no newer than local file "www.example.org/news/gallery/image-01.gif" -- not retrieving.

 FINISHED --2010-06-29 10:14:08--
 Downloaded: 1 files, 136 in 0s (6.48 MB/s)
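One way to see at a glance how far a re-run got is to count the requests, skips, and saves in the log. The sketch below assumes the run was logged to a file with wget's `-o` option, one message per line as wget writes it; the helper name `count_log` and the file name `mirror.log` are hypothetical, and the patterns are based only on the message wording in the output quoted here.

```shell
#!/bin/sh
# Summarize a wget log file: URLs requested, files skipped as unchanged, files saved.
# count_log is a hypothetical helper name; the patterns match the log wording shown above.
count_log() {
    log=$1
    # Request lines look like: --2010-06-29 10:14:07--  http://www.example.org/
    requested=$(grep -c -E -e '^[[:space:]]*--[0-9]{4}-[0-9]{2}-[0-9]{2} [0-9:]{8}--' "$log" || true)
    # Files left alone because the server copy was no newer than the local copy.
    skipped=$(grep -c -e 'not retrieving' "$log" || true)
    # Completed downloads end with: saved [bytes/bytes]
    saved=$(grep -c -E -e 'saved \[[0-9]+' "$log" || true)
    echo "requested=$requested skipped=$skipped saved=$saved"
}
```

For example, after `wget --mirror -o mirror.log http://www.example.org/`, running `count_log mirror.log` on a log like the one above would report only three requests, which makes it obvious that wget did not walk the whole site.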

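The following workaround seems to work for now. It forcibly deletes /index.html, which forces wget to check all of the child links again. But shouldn't wget check all of the child links automatically?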

 rm www.example.org/index.html && wget --mirror http://www.example.org/
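The same workaround can be wrapped in a small script so the saved index path is derived from the URL. This is only a sketch: `mirror_dir`, `refresh_mirror`, and the `DRY_RUN` variable are hypothetical names, and it assumes wget's default directory layout (a directory named after the host, default port, no `-nH` or `-P`).

```shell
#!/bin/sh
# Sketch of the workaround: remove the saved top-level page, then re-run the mirror.
# mirror_dir, refresh_mirror and DRY_RUN are hypothetical names, not part of wget.

# Map a site URL to the host-named directory that wget --mirror saves into.
mirror_dir() {
    rest=${1#*://}                 # strip the scheme, e.g. "http://"
    printf '%s\n' "${rest%%/*}"    # keep everything before the first slash
}

refresh_mirror() {
    site=$1
    index="$(mirror_dir "$site")/index.html"
    if [ -n "${DRY_RUN:-}" ]; then
        # Only print what would be done.
        echo "rm -f $index"
        echo "wget --mirror $site"
        return 0
    fi
    # Deleting the saved index forces wget to fetch it again and follow its links.
    rm -f "$index" && wget --mirror "$site"
}
```

Running `DRY_RUN=1 refresh_mirror http://www.example.org/` prints the two commands without touching anything; dropping `DRY_RUN` performs the actual refresh from the directory that contains the mirror.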

wget --mirror -w 3 -p -P c:\wget_files\example2 ftp://username:[email protected]

This is how I do it on a Windows-based machine: http://www.devarticles.com/c/a/Web-Services/Website-Mirroring-With-wget/1/

You can change the path to match your directory structure. Try downloading all of the content over FTP and see whether that helps.

I also use another utility on Windows, "AllwaySync", which works superbly.

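I use the --mirror switch to do exactly what you are asking, and it does indeed cause wget to recursively download only the newer files. Specifically, my command line (sanitized) is: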

 /usr/bin/wget -v --mirror ftp://user:password@site/ -o /var/log/webmirror -P /var/WebSites

You could try using:

 wget -r -l inf -N http://www.example.org/