PowerShell - Find duplicate files and ignore multiple files in the same compressed file

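I got this script and modified it a bit (to avoid extracting identically named files into the same temporary folder). I have two problems:

  1. When the script finds a duplicate, SourceArchive always shows a single archive (instead of the 2 archives that contain the same file).
  2. When one compressed file contains the same file more than once in different subfolders (inside the same zip), the script reports a duplicate, which is not what I want. If a compressed file holds 3 identical files, they should be treated as 1 file, and that file should then be compared against the other compressed files.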

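Update:

The main goal is to compare compressed files with each other in order to find duplicate files inside them. A compressed file can be a cab or a zip (a zip can contain dll, xml, msi, etc.; sometimes it also contains a vip file, which is itself a compressed file containing dll files and the like). After each compressed file has been compared with the others, the output should list the compressed files that contain the same file. It would be great if the results were separated with ---------

This is supposed to be part of a bigger script, which should stop if the same file exists in more than 1 compressed file. So the script should stop only when $MatchedSourceFiles has results; otherwise it should continue. I hope it is clear now.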

Example:
test1.zip contains temp.xml 
test2.zip contains temp.xml

The output should be:
SourceArchive       DuplicateFile
test1.zip           temp.xml
test2.zip           temp.xml
------------------------------
The next duplication files 
------------------------------

Example 2: (multiple identical files in the same compressed file)
test1.zip contains 2 subfolders
test1.zip contains temp.xml under subfolder1 and also temp.xml under subfolder2 

The result should be none
SourceArchive       DuplicateFile

Example 3:
test1.zip same as in example 2 
test3.zip contains temp.xml

The result should be:

SourceArchive       DuplicateFile
    test1.zip           temp.xml
    test3.zip           temp.xml
    ------------------------------
    The next duplication files
    ------------------------------
    The next duplication files
    ------------------------------

Add-Type -AssemblyName System.IO.Compression.FileSystem

$tempFolder = Join-Path -Path ([IO.Path]::GetTempPath()) -ChildPath (New-GUID).Guid
$compressedfiles = Get-ChildItem -Path 'C:\Intel' -Include '*.zip', '*.CAB' -File -Recurse

$MatchedSourceFiles = foreach ($file in $compressedfiles) {
    switch ($file.Extension) {
        '.zip' {
            $t = $tempFolder + "\" + $file.Name
            # the destination folder should NOT already exist here
            $null = [System.IO.Compression.ZipFile]::ExtractToDirectory($file.FullName, $t )
            try {
                Get-ChildItem -Path $tempFolder -Filter '*.vip' -File -Recurse | ForEach-Object {
                    $null = [System.IO.Compression.ZipFile]::ExtractToDirectory($_.FullName, $t)
                }
            }
            catch {}
        }
        '.cab' {
            # the destination folder MUST exist for expanding .cab files
            $null = New-Item -Path $tempFolder -ItemType Directory -Force
            expand.exe $file.FullName -F:* $tempFolder > $null
        }
    }
    # now see if there are files with duplicate names
    Get-ChildItem -Path $tempFolder -File -Recurse -Exclude vip.manifest, filesSources.txt, *.vip | Group-Object Name | 
    Where-Object { $_.Count -gt 1 } | ForEach-Object { 
        foreach ($item in $_.Group) {
            # output objects to be collected in $MatchedSourceFiles
            [PsCustomObject]@{
                SourceArchive = $file.FullName
                DuplicateFile = '.{0}' -f $item.FullName.Substring($tempFolder.Length)  # relative path
            }
        }
    }
}

# display on screen
$MatchedSourceFiles
$tempFolder | Remove-Item -Force -Recurse

Thanks for providing the examples. Using those, I changed the previous code to:

Add-Type -AssemblyName System.IO.Compression.FileSystem

$tempFolder      = Join-Path -Path ([IO.Path]::GetTempPath()) -ChildPath (New-GUID).Guid
$compressedfiles = Get-ChildItem -Path 'C:\Intel' -Include '*.zip','*.CAB' -File -Recurse

$MatchedSourceFiles = foreach ($file in $compressedfiles) {
    switch ($file.Extension) {
        '.zip' {
            # the destination folder should NOT already exist here
            $null = [System.IO.Compression.ZipFile]::ExtractToDirectory($file.FullName, $tempFolder)
            # extract each nested .vip archive into its own unique subfolder, so that
            # equally named files from different .vip archives cannot collide
            Get-ChildItem -Path $tempFolder -Filter '*.vip' -File -Recurse | ForEach-Object {
                $subTemp = Join-Path -Path $tempFolder -ChildPath (New-Guid).Guid
                $null = [System.IO.Compression.ZipFile]::ExtractToDirectory($_.FullName, $subTemp)
            }
        }
        '.cab' {
            # the destination folder MUST exist for expanding .cab files
            $null = New-Item -Path $tempFolder -ItemType Directory -Force
            expand.exe $file.FullName -F:* $tempFolder > $null
        }
    }
    # output objects for each unique file name in the extracted folder to collect in $MatchedSourceFiles
    Get-ChildItem -Path $tempFolder -File -Recurse | 
        Select-Object @{Name = 'SourceArchive'; Expression = {$file.FullName}},
                      @{Name = 'FileName'; Expression = {$_.Name}} -Unique

    # delete the temporary folder
    $tempFolder | Remove-Item -Force -Recurse
}

# at this point $MatchedSourceFiles contains all (unique) filenames from all .zip and/or .cab files

# now see if there are files with duplicate names between the archive files
$result = $MatchedSourceFiles | Group-Object FileName | Where-Object { $_.Count -gt 1 } | ForEach-Object {$_.Group}

# display on screen
$result

# save as CSV file
$result | Export-Csv -Path 'X:\DuplicateFiles.csv' -UseCulture -NoTypeInformation

The output will be:

Example 1:

SourceArchive      FileName
-------------      --------
C:\Intel\test1.zip temp.xml
C:\Intel\test2.zip temp.xml

Example 2:

No output

Example 3:

SourceArchive      FileName
-------------      --------
C:\Intel\test1.zip temp.xml
C:\Intel\test3.zip temp.xml
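
The question also asks for a separator line between groups of duplicates and for a way to stop the larger script when duplicates exist. Neither is part of the code above, so the following is only a minimal sketch under stated assumptions: the sample objects in $extracted are hypothetical stand-ins for the files extracted in Example 3, and $duplicateGroups, $separator and $report are names introduced here for illustration. It shows the same two steps as above (unique names per archive, then grouping across archives) and then prints each group followed by a dashed line.

```powershell
# Hypothetical sample data standing in for the extracted file lists of Example 3:
# test1.zip holds temp.xml in two subfolders, test3.zip holds it once.
$extracted = @(
    [PsCustomObject]@{ SourceArchive = 'C:\Intel\test1.zip'; FileName = 'temp.xml' }
    [PsCustomObject]@{ SourceArchive = 'C:\Intel\test1.zip'; FileName = 'temp.xml' }
    [PsCustomObject]@{ SourceArchive = 'C:\Intel\test1.zip'; FileName = 'only1.dll' }
    [PsCustomObject]@{ SourceArchive = 'C:\Intel\test3.zip'; FileName = 'temp.xml' }
)

# collapse repeats inside one archive, as Select-Object -Unique does in the loop above
$MatchedSourceFiles = $extracted | Select-Object SourceArchive, FileName -Unique

# keep only the file names that occur in more than one archive
$duplicateGroups = @($MatchedSourceFiles | Group-Object FileName | Where-Object { $_.Count -gt 1 })

# build the report: one block per duplicated file name, followed by a separator line
$separator = '-' * 30
$report = foreach ($group in $duplicateGroups) {
    foreach ($item in $group.Group) {
        # take the part after the last path separator to show only the archive name
        $archiveName = ($item.SourceArchive -split '[\\/]')[-1]
        '{0,-20}{1}' -f $archiveName, $item.FileName
    }
    $separator
}
$report

# in the larger script, react only when duplicates were found
if ($duplicateGroups.Count -gt 0) {
    Write-Warning ('{0} file name(s) found in more than one archive.' -f $duplicateGroups.Count)
    # throw 'Duplicate files found'   # hypothetical stop; enable this in the real script
}
```

With this sample data the report has three lines: one for test1.zip, one for test3.zip, and the dashed separator. only1.dll does not appear because it exists in a single archive. In the real script, $duplicateGroups would be built from the $MatchedSourceFiles collected in the loop, and whether to use throw, exit or return to stop depends on how the larger script is structured.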