For some reason (something that happened before I started this project), my client's website has two duplicates of every single file, effectively tripling the size of the site.
The files look like this:
```
wp-comments-post.php                                           | 3,982 bytes
wp-comments-post (john smith's conflicted copy 2012-01-12).php | 3,982 bytes
wp-comments-post (JohnSmith's conflicted copy 2012-01-14).php  | 3,982 bytes
```
The host the site is on has no bash or SSH access.
In your opinion, what is the easiest way to delete these duplicate files?
Edit: Mount the remote FTP filesystem at a local mount point using ftpfs, then use any of the other methods detailed here.
If all the files match this naming pattern, you can, for example, do:
```
rbos@chili:~/tmp$ touch asdf.php
rbos@chili:~/tmp$ touch "asdf (blah blah blah).php"
rbos@chili:~/tmp$ touch "asdf (blah blah rawr).php"
rbos@chili:~/tmp$ find | grep "(.*)"
./asdf (blah blah rawr).php
./asdf (blah blah blah).php
```
to match the files, then pipe that into xargs or a loop to check the list:

```
find | grep "(.*)" | while read i; do echo "$i"; done | less
```

and then, once you are satisfied that the list is accurate, replace `echo` with `rm`.
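The dry-run-then-delete steps above can be put together as follows. This is a minimal sketch against a scratch directory with made-up file names; it uses `find -name "*(*)*"` instead of piping through `grep "(.*)"`, which avoids any surprises with names containing spaces:

```shell
# Build a scratch directory that mimics the conflicted-copy layout
mkdir -p /tmp/dupes-demo && cd /tmp/dupes-demo
touch "wp-comments-post.php"
touch "wp-comments-post (john smith's conflicted copy 2012-01-12).php"
touch "wp-comments-post (JohnSmith's conflicted copy 2012-01-14).php"

# Dry run: only echo the files whose names contain a parenthesised suffix
find . -name "*(*)*" -type f | while read -r i; do echo "would delete: $i"; done

# Once the list looks right, swap echo for rm
find . -name "*(*)*" -type f | while read -r i; do rm -- "$i"; done

# The original file is untouched
ls
```

The `rm --` guards against file names that begin with a dash; `read -r` keeps backslashes in names literal.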
I have written a duplicate-finder script in PowerShell that uses the WinSCP .NET assembly.
The script first iterates the remote directory tree, looking for files with the same size. When it finds any, by default it downloads the files and compares them locally.
If you know the server supports a protocol extension for calculating checksums, you can improve the script's efficiency by adding the -remoteChecksumAlg switch, making the script ask the server for the checksum and sparing the file downloads.
```
powershell.exe -File find_duplicates.ps1 -sessionUrl ftp://user:[email protected]/ -remotePath /path
```
The script is:
```powershell
param (
    # Use Generate URL function to obtain a value for -sessionUrl parameter.
    $sessionUrl = "sftp://user:mypassword;[email protected]/",
    [Parameter(Mandatory)]
    $remotePath,
    $remoteChecksumAlg = $Null
)

function FileChecksum ($remotePath)
{
    if (!($checksums.ContainsKey($remotePath)))
    {
        if ($remoteChecksumAlg -eq $Null)
        {
            Write-Host "Downloading file $remotePath..."
            # Download file
            $localPath = [System.IO.Path]::GetTempFileName()
            $transferResult = $session.GetFiles($remotePath, $localPath)

            if ($transferResult.IsSuccess)
            {
                $stream = [System.IO.File]::OpenRead($localPath)
                $checksum = [BitConverter]::ToString($sha1.ComputeHash($stream))
                $stream.Dispose()

                Write-Host "Downloaded file $remotePath checksum is $checksum"

                Remove-Item $localPath
            }
            else
            {
                Write-Host ("Error downloading file ${remotePath}: " + $transferResult.Failures[0])
                $checksum = $False
            }
        }
        else
        {
            Write-Host "Request checksum for file $remotePath..."
            $buf = $session.CalculateFileChecksum($remoteChecksumAlg, $remotePath)
            $checksum = [BitConverter]::ToString($buf)
            Write-Host "File $remotePath checksum is $checksum"
        }

        $checksums[$remotePath] = $checksum
    }

    return $checksums[$remotePath]
}

function FindDuplicatesInDirectory ($remotePath)
{
    Write-Host "Finding duplicates in directory $remotePath ..."

    try
    {
        $directoryInfo = $session.ListDirectory($remotePath)

        foreach ($fileInfo in $directoryInfo.Files)
        {
            $remoteFilePath = ($remotePath + "/" + $fileInfo.Name)

            if ($fileInfo.IsDirectory)
            {
                # Skip references to current and parent directories
                if (($fileInfo.Name -ne ".") -and ($fileInfo.Name -ne ".."))
                {
                    # Recurse into subdirectories
                    FindDuplicatesInDirectory $remoteFilePath
                }
            }
            else
            {
                Write-Host ("Found file $($fileInfo.FullName) " +
                    "with size $($fileInfo.Length)")

                if ($sizes.ContainsKey($fileInfo.Length))
                {
                    $checksum = FileChecksum($remoteFilePath)

                    foreach ($otherFilePath in $sizes[$fileInfo.Length])
                    {
                        $otherChecksum = FileChecksum($otherFilePath)

                        if ($checksum -eq $otherChecksum)
                        {
                            Write-Host ("Checksums of files $remoteFilePath and " +
                                "$otherFilePath are identical")
                            $duplicates[$remoteFilePath] = $otherFilePath
                        }
                    }
                }
                else
                {
                    $sizes[$fileInfo.Length] = @()
                }

                $sizes[$fileInfo.Length] += $remoteFilePath
            }
        }
    }
    catch [Exception]
    {
        Write-Host "Error processing directory ${remotePath}: $($_.Exception.Message)"
    }
}

try
{
    # Load WinSCP .NET assembly
    Add-Type -Path "WinSCPnet.dll"

    # Setup session options from URL
    $sessionOptions = New-Object WinSCP.SessionOptions
    $sessionOptions.ParseUrl($sessionUrl)

    $session = New-Object WinSCP.Session
    $session.SessionLogPath = "session.log"

    try
    {
        # Connect
        $session.Open($sessionOptions)

        $sizes = @{}
        $checksums = @{}
        $duplicates = @{}
        $sha1 = [System.Security.Cryptography.SHA1]::Create()

        # Start recursion
        FindDuplicatesInDirectory $remotePath
    }
    finally
    {
        # Disconnect, clean up
        $session.Dispose()
    }

    # Print results
    Write-Host

    if ($duplicates.Count -gt 0)
    {
        Write-Host "Duplicates found:"

        foreach ($path1 in $duplicates.Keys)
        {
            Write-Host "$path1 <=> $($duplicates[$path1])"
        }
    }
    else
    {
        Write-Host "No duplicates found."
    }

    exit 0
}
catch [Exception]
{
    Write-Host "Error: $($_.Exception.Message)"
    exit 1
}
```
The latest and enhanced version of the script is available as the WinSCP extension Find duplicate files in SFTP/FTP server.
(I'm the author of WinSCP)
You can use FSlint to find the duplicate files.
Run this:

```
find /yourdir -name "*conflicted copy*" -type f -ls
```

If the files listed are the ones you want to delete, change `-ls` to `-delete` and run it again.
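The `-ls` then `-delete` two-pass approach can be tried safely on a throwaway tree first; the paths below are examples, not the client's real layout:

```shell
# Build a throwaway tree containing a conflicted copy
mkdir -p /tmp/fslint-demo/sub
echo x > "/tmp/fslint-demo/a.php"
echo x > "/tmp/fslint-demo/a (john's conflicted copy 2012-01-12).php"
echo y > "/tmp/fslint-demo/sub/b.txt"

# First pass: -ls only prints what matched, deleting nothing
find /tmp/fslint-demo -name "*conflicted copy*" -type f -ls

# Second pass: same expression, with -delete removing the matches
find /tmp/fslint-demo -name "*conflicted copy*" -type f -delete
```

Keeping the rest of the expression identical between the two passes is what makes the preview trustworthy.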
I suggest making a tar backup of your base directory before doing this.
Edit: I just realized you don't have access to a shell session, so this won't work for you…
You may need something like this: http://www.go4expert.com/forums/showthread.php?t=2348 to recursively dump a list of the files, then create another script that deletes only the ones you want.
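One way to act on such a recursive dump without shell access on the server: filter the listing for conflicted copies and turn each match into a `delete` command for a command-line FTP client. A sketch, where the listing file, the site paths, and the output script name are all made up for illustration:

```shell
# Sample recursive listing, one path per line, as a dump script might produce it
cat > /tmp/listing.txt <<'EOF'
/site/wp-comments-post.php
/site/wp-comments-post (john smith's conflicted copy 2012-01-12).php
/site/wp-login.php
/site/wp-login (JohnSmith's conflicted copy 2012-01-14).php
EOF

# Turn every conflicted-copy path into a quoted "delete" line
grep "conflicted copy" /tmp/listing.txt \
  | sed 's/^/delete "/; s/$/"/' > /tmp/ftp-delete.txt

cat /tmp/ftp-delete.txt
# A batch-mode client could then run it, e.g.: ftp -n host < /tmp/ftp-delete.txt
```

Review the generated file before feeding it to the client; the originals without a parenthesised suffix are never matched.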
FTP in to the server and delete the files (most command-line FTP clients use `delete`/`mdelete` rather than `rm`).