Split an XML file by group and sort by item number with PowerShell

I am currently trying to split an XML file that contains many objects, each with a defined item number.

The XML file looks roughly like this:

<?xml version="1.0" encoding="UTF-8"?>
<Orders>
<ShopOrder>
  <OrderHead>
    <OrderNo>F10068</OrderNo>
    <OrderDate>20181003</OrderDate>
    <CustomerNo>200078</CustomerNo>
  </OrderHead>
  <Order>
    <ItemNo>F10029</ItemNo>
  </Order>
</ShopOrder>
<ShopOrder>
  <OrderHead>
    <OrderNo>F10069</OrderNo>
    <OrderDate>20181004</OrderDate>
    <CustomerNo>200078</CustomerNo>
  </OrderHead>
  <Order>
    <ItemNo>F10078</ItemNo>
  </Order>
</ShopOrder>
<ShopOrder>
  <OrderHead>
    <OrderNo>F10070</OrderNo>
    <OrderDate>20181004</OrderDate>
    <CustomerNo>200089</CustomerNo>
  </OrderHead>
  <Order>
    <ItemNo>F10029</ItemNo>
  </Order>
</ShopOrder>
</Orders>

...

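I now want to split the XML file into several XML files, grouped by ItemNo, for further analysis. For example, one new XML file should contain every ShopOrder object with the ItemNo F10029.

In addition, I want to name each new XML file after the ItemNo it contains.

So far I can split the XML file into several XML files, but each one contains only a single ShopOrder. I don't know how to combine the splitting with grouping the objects.

Can anyone recommend a way to combine splitting with grouping under a given condition? Thanks in advance for your help!

Update: here is my current code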


$dir = "C:\Users\User1\ShopOrder"
$in_file = "ShopOrder1234.xml"


$in_path = Join-Path -Path $dir -ChildPath $in_file

$out_file_base = "$($in_file.split(".")[0])_"

$xml_dec_regex = "<\?xml .*"
$blank_regex = "^\s*$"

$file_num = 1
$out_path = Join-Path -Path $dir -ChildPath ("{0}{1:d6}.xml" -f $out_file_base, $file_num)

$sr = New-Object -TypeName System.IO.StreamReader -ArgumentList $in_path

$length = $sr.BaseStream.Length

Write-Progress -Activity "Splitting File" `
               -Status "File: $file_num" `
               -PercentComplete ($sr.BaseStream.Position / $length * 100)

$sw = New-Object -TypeName System.IO.StreamWriter -ArgumentList $out_path

$line = $sr.ReadLine()

While ($line -match $blank_regex -and !$sr.EndOfStream) {

    $line = $sr.ReadLine()

}

$sw.WriteLine($line)

While (!$sr.EndOfStream) {

    $line = $sr.ReadLine()

    While ($line -match $blank_regex -and !$sr.EndOfStream) {

        $line = $sr.ReadLine()

    }

    If ($line -match $xml_dec_regex) {
        
        $sw.close()
        $file_num += 1
        $out_path = Join-Path -Path $dir -ChildPath ("{0}{1:d6}.xml" -f $out_file_base, $file_num)
        $sw = New-Object -TypeName System.IO.StreamWriter -ArgumentList $out_path
        Write-Progress -Activity "Splitting File" `
               -Status "File: $file_num" `
               -PercentComplete ($sr.BaseStream.Position / $length * 100)

    }

    $sw.WriteLine($line)

} 

$sr.close()
$sw.close()

You can do it as shown below, but first you need to make sure you have valid XML:

Here, I added a root node <Orders> and corrected the closing tag </ItemNo>

<?xml version="1.0" encoding="UTF-8"?>
<Orders>
    <ShopOrder>
      <OrderHead>
        <OrderNo>F10068</OrderNo>
        <OrderDate>20181003</OrderDate>
        <CustomerNo>200078</CustomerNo>
      </OrderHead>
      <Order>
        <ItemNo>F10029</ItemNo>
      </Order>
    </ShopOrder>
    <ShopOrder>
      <OrderHead>
        <OrderNo>F10069</OrderNo>
        <OrderDate>20181004</OrderDate>
        <CustomerNo>200078</CustomerNo>
      </OrderHead>
      <Order>
        <ItemNo>F10078</ItemNo>
      </Order>
    </ShopOrder>
    <ShopOrder>
      <OrderHead>
        <OrderNo>F10070</OrderNo>
        <OrderDate>20181004</OrderDate>
        <CustomerNo>200089</CustomerNo>
      </OrderHead>
      <Order>
        <ItemNo>F10029</ItemNo>
      </Order>
    </ShopOrder>
</Orders>
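
Before running the grouping code, it can help to confirm that the input really is a single well-formed XML document. The following is a minimal sketch, not part of the original answer; the function name Test-XmlWellFormed and both sample strings are made up for illustration.

```powershell
# Hypothetical helper: returns $true when the text parses as one well-formed
# XML document, $false when XmlDocument.LoadXml rejects it for any reason.
function Test-XmlWellFormed {
    param(
        [Parameter(Mandatory)]
        [AllowEmptyString()]
        [string]$Text
    )
    try {
        $doc = New-Object System.Xml.XmlDocument
        $doc.LoadXml($Text)
        return $true
    }
    catch {
        # Any parse failure (missing root, repeated declaration, bad tag) lands here
        return $false
    }
}

# Illustrative inputs: one document with a single root, one with a repeated declaration
$good = '<?xml version="1.0" encoding="UTF-8"?><Orders><ShopOrder><Order><ItemNo>F10029</ItemNo></Order></ShopOrder></Orders>'
$bad  = '<?xml version="1.0" encoding="UTF-8"?><ShopOrder/><?xml version="1.0" encoding="UTF-8"?><ShopOrder/>'

$goodResult = Test-XmlWellFormed -Text $good
$badResult  = Test-XmlWellFormed -Text $bad
```

If the check fails for your real file, the input needs repairing before the [xml] cast in the code below can succeed.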

Code

$pathIn = 'C:\Users\User1\ShopOrder\ShopOrder1234.xml'
[xml]$xml = Get-Content -Path $pathIn -Raw -Encoding UTF8

# get the root element's name
$rootElement = $xml.DocumentElement.LocalName

# create a template Here-String to be used for each new file
$xmlTemplate = @"
<?xml version="1.0" encoding="UTF-8"?>
<$rootElement>{0}</$rootElement>
"@

$xml.$rootElement.ShopOrder | Group-Object -Property @{Expression = {$_.Order.ItemNo}} | ForEach-Object {
    # string the OuterXml text from all grouped ShopOrder nodes together
    $subXmlText = $xmlTemplate -f (($_.Group | ForEach-Object { $_.OuterXml }) -join [environment]::NewLine)
    # create a new XML document and load the xml text
    $doc = New-Object System.Xml.XmlDocument
    $doc.LoadXml($subXmlText)
    # create a StringWriter and an XmlTextWriter to format the xml properly
    $stringWriter = New-Object System.IO.StringWriter
    $xmlWriter    = New-Object System.Xml.XmlTextWriter($stringWriter)
    $xmlWriter.Formatting = [System.Xml.Formatting]::Indented
    $doc.WriteContentTo($xmlWriter)
    # write the new XML (as text file) to disk with the filename taken from the group's Name (==> whatever was in <ItemNo>)
    $fileOut = Join-Path -Path ([System.IO.Path]::GetDirectoryName($pathIn)) -ChildPath ('{0}.xml' -f $_.Name)
    Set-Content -Path $fileOut -Value $stringWriter.ToString() -Encoding UTF8
    # clean up the used objects
    $xmlWriter.Dispose()
    $stringWriter.Dispose()
    $doc = $null
}
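
As a variation on the code above, the grouped nodes can also be copied into a new document with ImportNode instead of being pasted into a string template. This is only a sketch of that alternative, not the method from the answer; the inline sample XML, the output folder under the temp path, and the variable names are assumptions made so it runs on its own.

```powershell
# Inline sample with the same shape as the XML above, so the sketch is self-contained
[xml]$xml = @'
<?xml version="1.0" encoding="UTF-8"?>
<Orders>
  <ShopOrder><OrderHead><OrderNo>F10068</OrderNo></OrderHead><Order><ItemNo>F10029</ItemNo></Order></ShopOrder>
  <ShopOrder><OrderHead><OrderNo>F10069</OrderNo></OrderHead><Order><ItemNo>F10078</ItemNo></Order></ShopOrder>
  <ShopOrder><OrderHead><OrderNo>F10070</OrderNo></OrderHead><Order><ItemNo>F10029</ItemNo></Order></ShopOrder>
</Orders>
'@

# Hypothetical output folder; replace with the folder you actually want
$outDir = Join-Path -Path ([System.IO.Path]::GetTempPath()) -ChildPath 'ShopOrderSplit'
New-Item -ItemType Directory -Path $outDir -Force | Out-Null

# One group per distinct ItemNo value
$groups = $xml.Orders.ShopOrder | Group-Object -Property { $_.Order.ItemNo }

foreach ($group in $groups) {
    $doc = New-Object System.Xml.XmlDocument
    [void]$doc.AppendChild($doc.CreateXmlDeclaration('1.0', 'UTF-8', $null))
    $root = $doc.AppendChild($doc.CreateElement('Orders'))
    foreach ($node in $group.Group) {
        # ImportNode makes a deep copy that belongs to the new document
        [void]$root.AppendChild($doc.ImportNode($node, $true))
    }
    # File name is taken from the group's Name, i.e. the ItemNo
    $doc.Save((Join-Path -Path $outDir -ChildPath ('{0}.xml' -f $group.Name)))
}
```

With the sample input this writes one file per ItemNo into the output folder. Working on nodes rather than text avoids re-parsing the concatenated OuterXml, at the cost of a few more lines.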

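As I understand it, your API returns malformed XML.
It is always best to find out whether that can be changed so that it delivers well-formed XML, but if you have no choice other than to work with what you get, you can try this: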

$pathIn    = 'C:\Users\User1\ShopOrder\ShopOrder1234.xml'
$apiReturn = Get-Content -Path $pathIn -Raw -Encoding UTF8

# the above gives you crappy text like
# <?xml version="1.0" encoding="UTF-8"?> <ShopOrder><OrderHead><OrderNo>F10068</OrderNo><OrderDate>20181003</OrderDate><CustomerNo>200078</CustomerNo> </OrderHead><Order><ItemNo>F10029</ItemNo></Order></ShopOrder><ShopOrder> <?xml version="1.0" encoding="UTF-8"?><ShopOrder><OrderHead><OrderNo>F10069</OrderNo><OrderDate>20181004</OrderDate> <CustomerNo>200078</CustomerNo></OrderHead><Order><ItemNo>F10078</ItemNo></Order></ShopOrder>

# First create a template Here-String to be used for each new input AND output file
$xmlTemplate = @"
<?xml version="1.0" encoding="UTF-8"?>
<Orders>{0}</Orders>
"@

# split the text on the xml declaration line(s) and recombine the xmlnodes into one string
$subNodes = (($apiReturn -split '<\?xml[^>]+>').Trim() -ne '') -join ''
# use the template to form a single complete XML from this:
# remove doubled <ShopOrder> opening tags and fit it into the template
[xml]$xml = $xmlTemplate -f ($subNodes -replace '(<ShopOrder>){2,}', '<ShopOrder>')

# the next code is really the (almost) unchanged code from my first answer

$xml.Orders.ShopOrder | Group-Object -Property @{Expression = {$_.Order.ItemNo}} | ForEach-Object {
    # string the OuterXml text from all grouped ShopOrder nodes together
    $subXmlText = $xmlTemplate -f (($_.Group | ForEach-Object { $_.OuterXml }) -join [environment]::NewLine)
    # create a new XML document and load the xml text
    $doc = New-Object System.Xml.XmlDocument
    $doc.LoadXml($subXmlText)
    # create a StringWriter and an XmlTextWriter to format the xml properly
    $stringWriter = New-Object System.IO.StringWriter
    $xmlWriter    = New-Object System.Xml.XmlTextWriter($stringWriter)
    $xmlWriter.Formatting = [System.Xml.Formatting]::Indented
    $doc.WriteContentTo($xmlWriter)

    # write the new XML (as text file) to disk with the filename taken from the group's Name (==> whatever was in <ItemNo>)
    $fileOut = Join-Path -Path ([System.IO.Path]::GetDirectoryName($pathIn)) -ChildPath ('{0}.xml' -f $_.Name)
    Set-Content -Path $fileOut -Value $stringWriter.ToString() -Encoding UTF8
    # clean up the used objects
    $xmlWriter.Dispose()
    $stringWriter.Dispose()
    $doc = $null
}
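
To see what the cleanup step does in isolation, here is a small sketch that runs it on a shortened, made-up version of the API text. It assumes that doubled opening tags should collapse into a single <ShopOrder>; the sample string and the intermediate variable names are illustrative only.

```powershell
# Shortened, made-up sample of the malformed API text: repeated XML
# declarations and a stray extra <ShopOrder> opening tag
$apiReturn = '<?xml version="1.0" encoding="UTF-8"?> <ShopOrder><Order><ItemNo>F10029</ItemNo></Order></ShopOrder><ShopOrder> <?xml version="1.0" encoding="UTF-8"?><ShopOrder><Order><ItemNo>F10078</ItemNo></Order></ShopOrder>'

# Split on every XML declaration, trim each piece, and drop the empty pieces
$parts = ($apiReturn -split '<\?xml[^>]+>').Trim() -ne ''

# Glue the pieces together; this still contains '<ShopOrder><ShopOrder>'
$subNodes = $parts -join ''

# Collapse runs of two or more opening tags into one
$cleaned = $subNodes -replace '(<ShopOrder>){2,}', '<ShopOrder>'

# Wrap in a single root element so the text parses as one document
[xml]$xml = '<Orders>{0}</Orders>' -f $cleaned

$orderCount = @($xml.Orders.ShopOrder).Count
$itemNos    = @($xml.Orders.ShopOrder | ForEach-Object { $_.Order.ItemNo })
```

After this, $xml can be fed to the same Group-Object pipeline as before. Note that text-level repairs like this depend on the exact way the input is broken, so check the result against a few real responses.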