Split an XML file by group and sort by item number with PowerShell
I am currently trying to split an XML file that contains many objects, each with a defined item number.
The XML file looks roughly like this:
<?xml version="1.0" encoding="UTF-8"?>
<Orders>
<ShopOrder>
<OrderHead>
<OrderNo>F10068</OrderNo>
<OrderDate>20181003</OrderDate>
<CustomerNo>200078</CustomerNo>
</OrderHead>
<Order>
<ItemNo>F10029</ItemNo>
</Order>
</ShopOrder>
<ShopOrder>
<OrderHead>
<OrderNo>F10069</OrderNo>
<OrderDate>20181004</OrderDate>
<CustomerNo>200078</CustomerNo>
</OrderHead>
<Order>
<ItemNo>F10078</ItemNo>
</Order>
</ShopOrder>
<ShopOrder>
<OrderHead>
<OrderNo>F10070</OrderNo>
<OrderDate>20181004</OrderDate>
<CustomerNo>200089</CustomerNo>
</OrderHead>
<Order>
<ItemNo>F10029</ItemNo>
</Order>
</ShopOrder>
</Orders>
...
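I now want to split the XML file into several XML files that are grouped by ItemNo for further analysis. For example, one new XML file should contain every ShopOrder object with the ItemNo F10029.
In addition, I would like to name each new XML file after the ItemNo it contains.
Right now I can split the XML file into several XML files, but each one contains only 1 ShopOrder. I can't figure out how to combine the splitting with grouping the objects.
Can anyone recommend a way to combine splitting with grouping on a given condition?
Thanks in advance for your help!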
Update:
This is my current code
$dir = "C:\Users\User1\ShopOrder"
$in_file = "ShopOrder1234.xml"

$in_path = Join-Path -Path $dir -ChildPath $in_file
$out_file_base = "$($in_file.Split(".")[0])_"
$xml_dec_regex = "<\?xml .*"
$blank_regex = "^\s*$"
$file_num = 1
# Join-Path supplies the separator between the directory and the file name
$out_path = Join-Path -Path $dir -ChildPath ("{0}{1:d6}.xml" -f $out_file_base, $file_num)
$sr = New-Object -TypeName System.IO.StreamReader -ArgumentList $in_path
$length = $sr.BaseStream.Length
Write-Progress -Activity "Splitting File" `
    -Status "File: $file_num" `
    -PercentComplete ($sr.BaseStream.Position / $length * 100)
$sw = New-Object -TypeName System.IO.StreamWriter -ArgumentList $out_path

# skip leading blank lines, then copy the first non-blank line
$line = $sr.ReadLine()
While ($line -match $blank_regex -and !$sr.EndOfStream) {
    $line = $sr.ReadLine()
}
$sw.WriteLine($line)

While (!$sr.EndOfStream) {
    $line = $sr.ReadLine()
    # skip blank lines; the EndOfStream check avoids an endless loop on trailing blank lines
    While ($line -match $blank_regex -and !$sr.EndOfStream) {
        $line = $sr.ReadLine()
    }
    # every further XML declaration starts a new output file
    If ($line -match $xml_dec_regex) {
        $sw.Close()
        $file_num += 1
        $out_path = Join-Path -Path $dir -ChildPath ("{0}{1:d6}.xml" -f $out_file_base, $file_num)
        $sw = New-Object -TypeName System.IO.StreamWriter -ArgumentList $out_path
        Write-Progress -Activity "Splitting File" `
            -Status "File: $file_num" `
            -PercentComplete ($sr.BaseStream.Position / $length * 100)
    }
    $sw.WriteLine($line)
}
$sr.Close()
$sw.Close()
You can do it like below, but first you need to make sure you have valid XML:
Here, I have added a root node <Orders>
and corrected the closing tag </ItemNo>
<?xml version="1.0" encoding="UTF-8"?>
<Orders>
<ShopOrder>
<OrderHead>
<OrderNo>F10068</OrderNo>
<OrderDate>20181003</OrderDate>
<CustomerNo>200078</CustomerNo>
</OrderHead>
<Order>
<ItemNo>F10029</ItemNo>
</Order>
</ShopOrder>
<ShopOrder>
<OrderHead>
<OrderNo>F10069</OrderNo>
<OrderDate>20181004</OrderDate>
<CustomerNo>200078</CustomerNo>
</OrderHead>
<Order>
<ItemNo>F10078</ItemNo>
</Order>
</ShopOrder>
<ShopOrder>
<OrderHead>
<OrderNo>F10070</OrderNo>
<OrderDate>20181004</OrderDate>
<CustomerNo>200089</CustomerNo>
</OrderHead>
<Order>
<ItemNo>F10029</ItemNo>
</Order>
</ShopOrder>
</Orders>
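Before grouping, it can help to confirm that the input really is well-formed. The following is only a minimal sketch: `Test-XmlWellFormed` is a hypothetical helper name that does not appear in the original posts, and the two sample strings are shortened stand-ins for the real file content.

```powershell
# Hypothetical helper (not part of the original answer):
# returns $true when the text parses as a single XML document, otherwise $false.
function Test-XmlWellFormed {
    param([Parameter(Mandatory)][string]$Text)
    try {
        $doc = New-Object System.Xml.XmlDocument
        $doc.LoadXml($Text)   # throws when the text is not well-formed
        return $true
    }
    catch {
        return $false
    }
}

# Shortened samples: one with a single root node, one without a root and with a dangling tag
$good = '<Orders><ShopOrder><Order><ItemNo>F10029</ItemNo></Order></ShopOrder></Orders>'
$bad  = '<ShopOrder><Order><ItemNo>F10029</ItemNo></Order></ShopOrder><ShopOrder>'

Test-XmlWellFormed -Text $good
Test-XmlWellFormed -Text $bad
```

For a file on disk, the same check could be fed with `Get-Content -Path $pathIn -Raw`.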
Code
$pathIn = 'C:\Users\User1\ShopOrder\ShopOrder1234.xml'
[xml]$xml = Get-Content -Path $pathIn -Raw -Encoding UTF8
# get the root element's name
$rootElement = $xml.DocumentElement.LocalName
# create a template Here-String to be used for each new file
$xmlTemplate = @"
<?xml version="1.0" encoding="UTF-8"?>
<$rootElement>{0}</$rootElement>
"@
$xml.$rootElement.ShopOrder | Group-Object -Property @{Expression = {$_.Order.ItemNo}} | ForEach-Object {
    # string the OuterXml text from all grouped ShopOrder nodes together
    $subXmlText = $xmlTemplate -f (($_.Group | ForEach-Object { $_.OuterXml }) -join [environment]::NewLine)
    # create a new XML document and load the xml text
    $doc = New-Object System.Xml.XmlDocument
    $doc.LoadXml($subXmlText)
    # create a StringWriter and an XmlTextWriter to format the xml properly
    $stringWriter = New-Object System.IO.StringWriter
    $xmlWriter = New-Object System.Xml.XmlTextWriter($stringWriter)
    $xmlWriter.Formatting = [System.Xml.Formatting]::Indented
    $doc.WriteContentTo($xmlWriter)
    # write the new XML (as text file) to disk with the filename taken from the group's Name (==> whatever was in <ItemNo>)
    $fileOut = Join-Path -Path ([System.IO.Path]::GetDirectoryName($pathIn)) -ChildPath ('{0}.xml' -f $_.Name)
    Set-Content -Path $fileOut -Value $stringWriter.ToString() -Encoding UTF8
    # clean up the used objects
    $xmlWriter.Dispose()
    $stringWriter.Dispose()
    $doc = $null
}
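To see what the grouping step hands to `ForEach-Object` before any file is written, the pipeline can be tried on an inline sample. This is just a sketch: the inline XML is a shortened copy of the structure from the question, and `$sample`, `$groups` and `$summary` are illustrative names, not part of the code above.

```powershell
# Shortened inline sample with the same element structure as the question's file
[xml]$sample = @'
<Orders>
  <ShopOrder><OrderHead><OrderNo>F10068</OrderNo></OrderHead><Order><ItemNo>F10029</ItemNo></Order></ShopOrder>
  <ShopOrder><OrderHead><OrderNo>F10069</OrderNo></OrderHead><Order><ItemNo>F10078</ItemNo></Order></ShopOrder>
  <ShopOrder><OrderHead><OrderNo>F10070</OrderNo></OrderHead><Order><ItemNo>F10029</ItemNo></Order></ShopOrder>
</Orders>
'@

# Group the ShopOrder nodes by the text of their <ItemNo> element
$groups = $sample.Orders.ShopOrder | Group-Object -Property { $_.Order.ItemNo }

# One summary row per group: the name that would become the file name,
# the number of ShopOrder nodes, and the order numbers they carry
$summary = foreach ($g in $groups) {
    [pscustomobject]@{
        ItemNo   = $g.Name
        Count    = $g.Count
        OrderNos = ($g.Group | ForEach-Object { $_.OrderHead.OrderNo }) -join ', '
    }
}
$summary | Format-Table -AutoSize
```

Each row corresponds to one output file, so the F10029 row lists both orders that end up together in F10029.xml.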
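As I understand it, your API returns malformed XML.
It is always best to find out whether that can be changed so it delivers well-formed XML, but if you have no other choice than to work with what you get, you can try this: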
$pathIn = 'C:\Users\User1\ShopOrder\ShopOrder1234.xml'
$apiReturn = Get-Content -Path $pathIn -Raw -Encoding UTF8
# the above gives you crappy text like
# <?xml version="1.0" encoding="UTF-8"?> <ShopOrder><OrderHead><OrderNo>F10068</OrderNo><OrderDate>20181003</OrderDate><CustomerNo>200078</CustomerNo> </OrderHead><Order><ItemNo>F10029</ItemNo></Order></ShopOrder><ShopOrder> <?xml version="1.0" encoding="UTF-8"?><ShopOrder><OrderHead><OrderNo>F10069</OrderNo><OrderDate>20181004</OrderDate> <CustomerNo>200078</CustomerNo></OrderHead><Order><ItemNo>F10078</ItemNo></Order></ShopOrder>
# First create a template Here-String to be used for each new input AND output file
$xmlTemplate = @"
<?xml version="1.0" encoding="UTF-8"?>
<Orders>{0}</Orders>
"@
# split the text on the xml declaration line(s) and recombine the xml nodes into one string
$subNodes = (($apiReturn -split '<\?xml[^>]+>').Trim() -ne '') -join ''
# use the template to form a single complete XML from this:
# collapse doubled <ShopOrder> opening tags into a single tag and fit the result into the template
[xml]$xml = $xmlTemplate -f ($subNodes -replace '(<ShopOrder>){2,}', '<ShopOrder>')
# the next code is really the (almost) unchanged code from my first answer
$xml.Orders.ShopOrder | Group-Object -Property @{Expression = {$_.Order.ItemNo}} | ForEach-Object {
    # string the OuterXml text from all grouped ShopOrder nodes together
    $subXmlText = $xmlTemplate -f (($_.Group | ForEach-Object { $_.OuterXml }) -join [environment]::NewLine)
    # create a new XML document and load the xml text
    $doc = New-Object System.Xml.XmlDocument
    $doc.LoadXml($subXmlText)
    # create a StringWriter and an XmlTextWriter to format the xml properly
    $stringWriter = New-Object System.IO.StringWriter
    $xmlWriter = New-Object System.Xml.XmlTextWriter($stringWriter)
    $xmlWriter.Formatting = [System.Xml.Formatting]::Indented
    $doc.WriteContentTo($xmlWriter)
    # write the new XML (as text file) to disk with the filename taken from the group's Name (==> whatever was in <ItemNo>)
    $fileOut = Join-Path -Path ([System.IO.Path]::GetDirectoryName($pathIn)) -ChildPath ('{0}.xml' -f $_.Name)
    Set-Content -Path $fileOut -Value $stringWriter.ToString() -Encoding UTF8
    # clean up the used objects
    $xmlWriter.Dispose()
    $stringWriter.Dispose()
    $doc = $null
}
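The repair step can be checked in isolation on a short string before running it against a real file. The sketch below assumes input shaped like the comment in the code above (repeated XML declarations and a dangling extra `<ShopOrder>` opening tag); the sample text is shortened, and `$fragments`, `$repaired` and `$itemNos` are illustrative names only.

```powershell
# Shortened sample modeled on the malformed API text: two declarations, one doubled opening tag
$apiReturn = '<?xml version="1.0" encoding="UTF-8"?> <ShopOrder><Order><ItemNo>F10029</ItemNo></Order></ShopOrder><ShopOrder> ' +
             '<?xml version="1.0" encoding="UTF-8"?><ShopOrder><Order><ItemNo>F10078</ItemNo></Order></ShopOrder>'

# 1. Split on every XML declaration, trim the pieces, and drop the empty ones
$fragments = ($apiReturn -split '<\?xml[^>]+>').Trim() -ne ''
$subNodes  = $fragments -join ''

# 2. Collapse a run of opening tags into a single opening tag
$repaired = $subNodes -replace '(<ShopOrder>){2,}', '<ShopOrder>'

# 3. Wrap everything in one root node so the text parses as XML
[xml]$xml = "<Orders>$repaired</Orders>"

# List the item numbers that survived the repair
$itemNos = @($xml.Orders.ShopOrder | ForEach-Object { $_.Order.ItemNo })
$itemNos
```

If the real API output is damaged in other ways (for example missing closing tags), the cast to `[xml]` will throw and the replacement pattern would need to be adapted.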