PowerShell - Find duplicate files and ignore multiple files in the same compressed file
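I got this script and modified it a bit (to avoid extracting the same file into one temp folder).
I have two problems:
- When the script finds a duplicate, SourceArchive always shows a single archive (instead of the 2 archives that contain the same file).
- When a compressed file contains the same file more than once in different subfolders (inside the same zip), the script reports a duplicate, which is not what I want. If a compressed file contains 3 identical files, they should be treated as 1 file and only then compared with the other compressed files.
Update:
The main goal is to compare the compressed files with each other to find files that occur in more than one of them. The compressed files can be .cab or .zip (a .zip can contain .dll, .xml, .msi, etc. files; sometimes it also contains a .vip file, which is itself a compressed file that contains files such as DLLs).
After comparing each compressed file with the others, the output should be the compressed files that contain the same file.
It would be great to separate the results with ---------
This is meant to be part of a bigger script that should stop if a file is duplicated across more than 1 compressed file: the script should stop only when $MatchedSourceFiles has results, and otherwise continue. I hope it is clear now.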
Example:
test1.zip contains temp.xml
test2.zip contains temp.xml
The output should be:
SourceArchive DuplicateFile
test1.zip temp.xml
test2.zip temp.xml
------------------------------
The next duplication files
------------------------------
Example 2: (multiple identical files in the same compressed file)
test1.zip contains 2 subfolders
test1.zip contains temp.xml under subfolder1 and also temp.xml under subfolder2
The result should be none
SourceArchive DuplicateFile
Example 3:
test1.zip same as in example 2
test3.zip contains temp.xml
The result should be:
SourceArchive DuplicateFile
test1.zip temp.xml
test3.zip temp.xml
------------------------------
The next duplication files
------------------------------
The next duplication files
------------------------------
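The rule in these examples can be restated in two steps: first reduce each archive to its set of distinct file names, then report a name only if it survives in more than one archive. A minimal sketch of just that logic, using hypothetical in-memory records (no archives are read) shaped like Example 3:

```powershell
# Hypothetical (archive, file name) records, as they might look after listing every archive.
# test1.zip holds temp.xml twice (subfolder1 and subfolder2), test3.zip holds it once.
$records = @(
    [PsCustomObject]@{ SourceArchive = 'test1.zip'; FileName = 'temp.xml' }
    [PsCustomObject]@{ SourceArchive = 'test1.zip'; FileName = 'temp.xml' }
    [PsCustomObject]@{ SourceArchive = 'test1.zip'; FileName = 'other.dll' }
    [PsCustomObject]@{ SourceArchive = 'test3.zip'; FileName = 'temp.xml' }
)

# Step 1: collapse repeated names inside the same archive to one record.
# Grouping on both properties keeps exactly one record per (archive, name) pair.
$perArchive = $records |
    Group-Object -Property SourceArchive, FileName |
    ForEach-Object { $_.Group[0] }

# Step 2: a name is a duplicate only if it is left in more than one archive after step 1.
$duplicates = $perArchive |
    Group-Object -Property FileName |
    Where-Object { $_.Count -gt 1 }

$duplicates | ForEach-Object { $_.Group } | Format-Table SourceArchive, FileName
```

With only the two test1.zip records (Example 2), step 1 leaves a single temp.xml record, so step 2 reports nothing.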
Add-Type -AssemblyName System.IO.Compression.FileSystem
$tempFolder = Join-Path -Path ([IO.Path]::GetTempPath()) -ChildPath (New-GUID).Guid
$compressedfiles = Get-ChildItem -Path 'C:\Intel' -Include '*.zip', '*.CAB' -File -Recurse
$MatchedSourceFiles = foreach ($file in $compressedfiles) {
    switch ($file.Extension) {
        '.zip' {
            $t = $tempFolder + "\" + $file.Name
            # the destination folder should NOT already exist here
            $null = [System.IO.Compression.ZipFile]::ExtractToDirectory($file.FullName, $t )
            try {
                Get-ChildItem -Path $tempFolder -Filter '*.vip' -File -Recurse | ForEach-Object {
                    $null = [System.IO.Compression.ZipFile]::ExtractToDirectory($_.FullName, $t)
                }
            }
            catch {}
        }
        '.cab' {
            # the destination folder MUST exist for expanding .cab files
            $null = New-Item -Path $tempFolder -ItemType Directory -Force
            expand.exe $file.FullName -F:* $tempFolder > $null
        }
    }
    # now see if there are files with duplicate names
    Get-ChildItem -Path $tempFolder -File -Recurse -Exclude vip.manifest, filesSources.txt, *.vip | Group-Object Name |
        Where-Object { $_.Count -gt 1 } | ForEach-Object {
            foreach ($item in $_.Group) {
                # output objects to be collected in $MatchedSourceFiles
                [PsCustomObject]@{
                    SourceArchive = $file.FullName
                    DuplicateFile = '.{0}' -f $item.FullName.Substring($tempFolder.Length) # relative path
                }
            }
        }
}
# display on screen
$MatchedSourceFiles
$tempFolder | Remove-Item -Force -Recurse
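The script above extracts every archive to disk only to read file names. For .zip files (not .cab, which still needs expand.exe), the names can also be listed directly with `ZipFile.OpenRead` and the `Entries` collection. A sketch assuming Windows PowerShell 5.1 or PowerShell 7; the helper name Get-ZipFileName and the throwaway test archive are hypothetical:

```powershell
Add-Type -AssemblyName System.IO.Compression.FileSystem

# Hypothetical helper: returns the distinct file names stored in a .zip, ignoring folder entries.
function Get-ZipFileName {
    param([Parameter(Mandatory)][string]$Path)
    $zip = [System.IO.Compression.ZipFile]::OpenRead($Path)
    try {
        # Folder entries have an empty Name, so they are filtered out here.
        # Sort-Object -Unique collapses the same name found in several subfolders (case-insensitive).
        $zip.Entries |
            Where-Object { $_.Name } |
            ForEach-Object { $_.Name } |
            Sort-Object -Unique
    }
    finally {
        $zip.Dispose()   # release the handle on the archive
    }
}

# Build a small test archive containing temp.xml in two subfolders (like Example 2).
$work = Join-Path ([IO.Path]::GetTempPath()) (New-Guid).Guid
$src  = Join-Path $work 'src'
foreach ($sub in 'subfolder1', 'subfolder2') {
    $dir = Join-Path $src $sub
    $null = New-Item -Path $dir -ItemType Directory -Force
    Set-Content -Path (Join-Path $dir 'temp.xml') -Value '<a/>'
}
$zipPath = Join-Path $work 'test1.zip'
[System.IO.Compression.ZipFile]::CreateFromDirectory($src, $zipPath)

$names = @(Get-ZipFileName -Path $zipPath)
$names
Remove-Item -Path $work -Recurse -Force
```

Embedded .vip files would still have to be opened separately, since `Entries` only lists the top-level archive's contents.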
Thanks for the examples. Using those, I have changed my previous code to:
Add-Type -AssemblyName System.IO.Compression.FileSystem
$tempFolder = Join-Path -Path ([IO.Path]::GetTempPath()) -ChildPath (New-GUID).Guid
$compressedfiles = Get-ChildItem -Path 'C:\Intel' -Include '*.zip','*.CAB' -File -Recurse
$MatchedSourceFiles = foreach ($file in $compressedfiles) {
    switch ($file.Extension) {
        '.zip' {
            # the destination folder should NOT already exist here
            $null = [System.IO.Compression.ZipFile]::ExtractToDirectory($file.FullName, $tempFolder)
            # collect the .vip files first, so files extracted below are not picked up while enumerating
            $vipFiles = @(Get-ChildItem -Path $tempFolder -Filter '*.vip' -File -Recurse)
            foreach ($vip in $vipFiles) {
                # give every .vip its own subfolder: ExtractToDirectory fails if a file already exists
                $subTemp = Join-Path -Path $tempFolder -ChildPath (New-Guid).Guid
                $null = [System.IO.Compression.ZipFile]::ExtractToDirectory($vip.FullName, $subTemp)
            }
        }
        '.cab' {
            # the destination folder MUST exist for expanding .cab files
            $null = New-Item -Path $tempFolder -ItemType Directory -Force
            expand.exe $file.FullName -F:* $tempFolder > $null
        }
    }
    # output one object per unique file name in the extracted folder to collect in $MatchedSourceFiles
    # (.vip containers and their manifest files are skipped, as in the original script;
    #  Sort-Object -Unique is case-insensitive, matching how Group-Object compares names below)
    Get-ChildItem -Path $tempFolder -File -Recurse -Exclude 'vip.manifest', 'filesSources.txt', '*.vip' |
        Select-Object -ExpandProperty Name | Sort-Object -Unique |
        ForEach-Object { [PsCustomObject]@{ SourceArchive = $file.FullName; FileName = $_ } }
    # delete the temporary folder so the next archive starts with an empty one
    $tempFolder | Remove-Item -Force -Recurse
}
# at this point $MatchedSourceFiles contains all (unique) filenames from all .zip and/or .cab files
# now see if there are files with duplicate names between the archive files
$result = $MatchedSourceFiles | Group-Object FileName | Where-Object { $_.Count -gt 1 } | ForEach-Object {$_.Group}
# display on screen
$result
# save as CSV file
$result | Export-Csv -Path 'X:\DuplicateFiles.csv' -UseCulture -NoTypeInformation
The output will be:
Example 1:
SourceArchive FileName
------------- --------
C:\Intel\test1.zip temp.xml
C:\Intel\test2.zip temp.xml
Example 2:
No output
Example 3:
SourceArchive FileName
------------- --------
C:\Intel\test1.zip temp.xml
C:\Intel\test3.zip temp.xml
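The question also asks for a separator line between groups and for the surrounding script to stop only when duplicates exist. A minimal sketch, assuming `$result` has the SourceArchive/FileName shape produced above; the function name Format-DuplicateReport and the sample data are hypothetical:

```powershell
# Hypothetical helper: one block per duplicated file name, each followed by a separator line.
function Format-DuplicateReport {
    param([object[]]$Result)
    if (-not $Result) { return }            # nothing duplicated: no output at all
    $separator = '-' * 30
    foreach ($group in ($Result | Group-Object -Property FileName)) {
        '{0,-25} {1}' -f 'SourceArchive', 'DuplicateFile'
        foreach ($item in $group.Group) {
            '{0,-25} {1}' -f $item.SourceArchive, $item.FileName
        }
        $separator
    }
}

# Hypothetical sample shaped like $result above (two different duplicated names).
$result = @(
    [PsCustomObject]@{ SourceArchive = 'C:\Intel\test1.zip'; FileName = 'temp.xml' }
    [PsCustomObject]@{ SourceArchive = 'C:\Intel\test3.zip'; FileName = 'temp.xml' }
    [PsCustomObject]@{ SourceArchive = 'C:\Intel\test2.zip'; FileName = 'setup.dll' }
    [PsCustomObject]@{ SourceArchive = 'C:\Intel\test3.zip'; FileName = 'setup.dll' }
)
Format-DuplicateReport -Result $result

# In the larger script, stop only when duplicates were found, for example:
#   if ($result) { throw 'Duplicate files found in more than one archive.' }
# and otherwise let it continue.
```

When nothing is duplicated (Example 2), the pipeline that builds `$result` produces no objects, so the `if ($result)` check is false and the larger script carries on.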