Can someone optimize my .NET RegEx for PowerShell? Parsing a table with errors
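Currently I am trying to parse a table from a Microsoft site (the GitHub version of it) to get proper PowerShell objects. I'll share the relevant parts of the code so that you can test it. It does parse what I want, but I would like the results to come out already trimmed (no leading or trailing spaces or line breaks). I also have to get the results for "CNG Key Isolation", which is formatted differently. For that data block only, my RegEx includes the line breaks in the captures, and I could not get it to work. I know I could do some post-processing in PowerShell after the RegEx, but I would like to get better at RegEx.
My not yet optimized RegEx looks like this: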
(?:^##\s*(?<ServiceTitle>[^\r\n#]*)[\r\n\s]*\|\s+Name\s+\|\s+Description\s+\|(?:[\r\n\s\|\-\*]+Service name[\|\*\s]+(?<ServiceName>[^\|]*?)(?: ?\|)|[\r\n\s\|\-\*]+Description[\|\*\s]+(?<Description>[^\|]*?)(?: ?\|)|[\r\n\s\|\-\*]+Installation[\|\*\s]+(?<Installation>[^\|]*?)(?: ?\|)|[\r\n\s\|\-\*]+Startup type[\|\*\s]+(?<StartupType>[^\|]*?)(?: ?\|)|[\r\n\s\|\-\*]+Recommendation[\|\*\s]+(?<Recommendation>[^\|]*?)(?: ?\|)|[\r\n\s\|\-\*]+Comments[\|\*\s]+(?<Comments>[^\|]*?)(?: ?\|))*)
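You can test it here: https://regex101.com/r/xQDRCO/1
Basically it should take one data block per service and try to get
"ServiceTitle","ServiceName","Description","Installation","StartupType","Recommendation","Comments"
regardless of the order they appear in, or whether one of them is missing. "ServiceTitle" is special and must be present.
This is the PowerShell code I am currently testing with: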
$fields = "ServiceTitle","ServiceName","Description","Installation","StartupType","Recommendation","Comments"
$RequestData = Invoke-WebRequest -UseBasicParsing -Uri https://raw.githubusercontent.com/MicrosoftDocs/windowsserverdocs/main/WindowsServerDocs/security/windows-services/security-guidelines-for-disabling-system-services-in-windows-server.md
$RegExMatches = [Regex]::Matches($RequestData.content,'(?:^##\s*(?<ServiceTitle>[^\r\n#]*)[\r\n\s]*\|\s+Name\s+\|\s+Description\s+\|(?:[\r\n\s\|\-\*]+Service name[\|\*\s]+(?<ServiceName>[^\|]*?)(?: ?\|)|[\r\n\s\|\-\*]+Description[\|\*\s]+(?<Description>[^\|]*?)(?: ?\|)|[\r\n\s\|\-\*]+Installation[\|\*\s]+(?<Installation>[^\|]*?)(?: ?\|)|[\r\n\s\|\-\*]+Startup type[\|\*\s]+(?<StartupType>[^\|]*?)(?: ?\|)|[\r\n\s\|\-\*]+Recommendation[\|\*\s]+(?<Recommendation>[^\|]*?)(?: ?\|)|[\r\n\s\|\-\*]+Comments[\|\*\s]+(?<Comments>[^\|]*?)(?: ?\|))*)',[System.Text.RegularExpressions.RegexOptions]::Multiline)
$FullList = @()
foreach ($entry in $RegExMatches) {$ServiceAsObject = [pscustomobject]@{};foreach ($field in $fields) {$ServiceAsObject | Add-Member -MemberType NoteProperty -Name $field -Value $entry.Groups[$field].value};$FullList += $ServiceAsObject}
$FullList[15..17] # three items to see what problem i have with "CNG Key Isolation"
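I don't work with larger regular expressions like this very often, so please feel free to give me feedback so I can improve.
Thanks,
Andy
This may not be what you are looking for, but you could do something like the following to output an array of custom objects: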
$output = switch -regex ($requestdata.content -split '\r?\n') {
    '^##\s' {
        # tracking empty lines since there is one under the service title
        # start new hash table when a new service is found
        # remove ## from service title names
        $emptyLineCount = 0
        $hash = [ordered]@{}
        $hash.ServiceTitle = $_ -replace '^##\s'
    }
    '\| \*\*' {
        # split on | and surrounding spaces
        # replace ** so name is cleaner
        if ($hash.ServiceTitle) {
            $key,$value = ($_ -split '\s*\|\s*' -replace '\*\*')[1,2]
            $hash[$key] = $value
        }
    }
    '^$' {
        # when second empty line is reached in a service block, output object
        if ($hash.ServiceTitle -and ++$emptyLineCount -eq 2) {
            [pscustomobject]$hash
        }
    }
}
# Finding a service by title
$output | Where ServiceTitle -eq 'CNG Key Isolation'
Splitting the content into an array of lines makes it easier for me to work with the switch statement.
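To illustrate the row-handling branch of the switch above, the following minimal sketch applies the same -split and -replace expression to two rows. The rows are made-up samples (an assumption for demonstration, not lines copied from the Microsoft document); the second one deliberately lacks the trailing |.

```powershell
# Hypothetical sample rows; the second one has no trailing pipe
$rows = @(
    '| **Service name** | KeyIso |'
    '| **Startup type** | Manual'
)

$pairs = foreach ($row in $rows) {
    # Split on | plus surrounding whitespace, strip the ** markers,
    # then take elements 1 and 2 (element 0 is the empty text before the first |)
    $key, $value = ($row -split '\s*\|\s*' -replace '\*\*')[1, 2]
    [pscustomobject]@{ Key = $key; Value = $value }
}
$pairs
```

Both sample rows yield a clean key and value, which suggests the line-based approach is less sensitive to a missing trailing | than one large pattern. Trailing whitespace on a row without the closing | would remain in the value, so a .Trim() may still be needed on real data.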
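If the data is inconsistent, a purer regex solution makes things more fragile. The data block for CNG Key Isolation is missing the | at the end of each line, and it is the only block like that. So now you have to either match that special case or fix the data.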
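As a rough sketch of the "fix the data" option, the snippet below appends a closing | to any table row that does not already end with one, before any pattern is applied. The sample text and the variable names $sample and $fixed are assumptions for illustration; the real content would come from $RequestData.Content.

```powershell
# Hypothetical sample: the third row is missing its trailing pipe
$sample = @(
    '| Name | Description |'
    '|--|--|'
    '| **Service name** | KeyIso'
    '| **Startup type** | Manual |'
) -join "`n"

# For each line that starts with | and whose last non-blank character is not |,
# append " |". The optional \r is captured and re-emitted so CRLF input keeps
# its line endings (in .NET, $ in multiline mode matches before \n only).
$fixed = $sample -replace '(?m)^(\|.*?[^|\s])[ \t]*(\r?)$', '$1 |$2'
$fixed
```

Rows that already end with | are left untouched, so running the normalization over the whole document should be harmless for the well-formed blocks.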
$fields = "ServiceTitle","ServiceName","Description","Installation","StartupType","Recommendation","Comments"
$RequestData = Invoke-WebRequest -UseBasicParsing -Uri https://raw.githubusercontent.com/MicrosoftDocs/windowsserverdocs/main/WindowsServerDocs/security/windows-services/security-guidelines-for-disabling-system-services-in-windows-server.md
$regexString = '(?m)^##\s(?<ServiceTitle>.*)$(?s).*?\*\*Service name\*\* \| (?<ServiceName>.*?(?=\s+\|)).*?\*\*Description\*\* \| (?<Description>.*?(?=\s+\|)).*?\*\*Installation\*\* \| (?<Installation>.*?(?=\s+\|)).*?\*\*Startup type\*\* \| (?<StartupType>.*?(?=\s+\|)).*?\*\*Recommendation\*\* \| (?<Recommendation>.*?(?=\s+\|)).*?\*\*Comments\*\* \| (?<Comments>.*?(?=\s+\|))'
$out = $RequestData.Content |
    Select-String -Pattern $regexString -AllMatches |
    Foreach-Object { $_.Matches | Foreach-Object {
        $hash = [ordered]@{}
        foreach ($field in $fields) {
            $hash.$field = $_.Groups.where{$_.Name -eq $field}.Value
        }
        [pscustomobject]$hash
    }
}
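For the other option, matching the special case, one possibility is a per-row pattern in which the closing | is optional and the whitespace around the value sits outside the capture group, so the captured text is already trimmed. This is only a sketch under assumptions: the sample block, the $rowPattern name, and the Key/Value group names are illustrative, and cell values that themselves contain | are not handled.

```powershell
# Hypothetical sample block; the last row has no trailing pipe
$block = @(
    '| Name | Description |'
    '|--|--|'
    '| **Service name** | KeyIso  |'
    '| **Startup type** | Manual'
) -join "`n"

# Key: the text between the ** markers. Value: lazy capture that stops before
# optional blanks followed by either a closing | or the end of the line.
# [ \t]* is used instead of \s* so the match cannot run into the next row.
$rowPattern = '(?m)^\|[ \t]*\*\*(?<Key>[^*\r\n]+)\*\*[ \t]*\|[ \t]*(?<Value>[^|\r\n]*?)[ \t]*(?:\||\r?$)'

$hash = [ordered]@{}
foreach ($m in [regex]::Matches($block, $rowPattern)) {
    $hash[$m.Groups['Key'].Value] = $m.Groups['Value'].Value
}
[pscustomobject]$hash
```

Because each row is matched on its own, the order of the rows and any missing rows do not matter; absent fields simply never appear in the hash table.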
Assuming your $RequestData.content contains all of the text, I wouldn't try to build one large regex to parse it all into usable objects. Instead, I would do this:
# first split the tables from the rest of the text and work on the table lines only
$result = ($RequestData.content -split '(?m)^The following tables.*:')[-1].Trim() -split '(?m)^## ' |
    Where-Object { $_ -match '\S' } |
    ForEach-Object {
        # split each block to parse out the title and the table data
        $title, $table = ($_.Trim() -split '(\r?\n){2}', 2).Trim()
        # now remove the markdown stuff from the data and convert it using ConvertFrom-Csv
        $data = (($table -replace '(?m)^\|--\|--\||[*]{2}|^\||\|$' -replace '\s\|\s', '|') -split '\r?\n' -ne '').Trim() | ConvertFrom-Csv -Delimiter '|'
        # set up an ordered Hashtable to store the data
        $hash = [ordered]@{ServiceTitle = $title}
        foreach ($item in $data) {
            $hash[$item.Name] = $item.Description
        }
        # output real objects
        [PsCustomObject]$hash
    }
$result
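The last two steps of the code above (ConvertFrom-Csv, then folding the rows into an ordered hash table) can be tried in isolation. The sketch below starts from rows that have already been cleaned of markdown; the row contents and the $rows and $obj names are assumptions for illustration.

```powershell
# Hypothetical rows as they might look after the markdown has been stripped
$rows = @(
    'Name|Description'
    'Service name|KeyIso'
    'Startup type|Manual'
)

# The first row supplies the column headers Name and Description
$data = $rows | ConvertFrom-Csv -Delimiter '|'

# Turn the Name/Description rows into properties of a single object
$hash = [ordered]@{ ServiceTitle = 'CNG Key Isolation' }
foreach ($item in $data) {
    $hash[$item.Name] = $item.Description
}
$obj = [pscustomobject]$hash
$obj
```

Note that the property names come straight from the table (for example 'Service name' with a space), so they differ from the group names used in the question, such as ServiceName.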