How to extract multiple strings from html code using PowerShell script
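I recently started my PowerShell journey and have been stuck on this problem for a few days.
Basically, I have a script in which one variable contains a block of HTML with arbitrary text, and I need to extract certain strings from that text and put them into other variables.
Example below: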
$text = "<div>\n Field1: Field1Value1\n </div> \n <div>\n <div>\n <div>\n Field2: Field2Value1\n </div> \n Field1: Field1Value2\n </div> \n <div>\n Field1: Field1Value3\n</div>\n"
This is what I need to pass on:
$Field1 = "Field1Value1, Field1Value2, Field1Value3" # or as a list if possible
$Field2 = "Field2Value1"
There are more fields to extract, but the idea is the same. I was able to do this with the following function, but it only works for values that occur once.
function GetStringBetweenTwoStrings($firstString, $secondString, $Text){
    # Regex pattern: lazily capture everything between the two strings
    $pattern = "$firstString(.*?)$secondString"
    # Perform the operation
    $result = [regex]::Match($Text,$pattern).Groups[1].Value
    # Return result
    return $result
}
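Edit:
This is my latest attempt. For some reason \n does not work in some cases, so I now replace everything with @@@. With the code below, still only the first value of each field is printed.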
$originalString = "<div>Field1: Field1Value1\n</div><div>Field2: Field2Value1</div>\n<div>Field1: Field1Value2\n Field3: Field3Value1<br /></div>"
$formattedString = $originalString
$hash = @{}
$hash.'\n' = ' @@@'
$hash.'\t' = ' @@@'
$hash.'<br />' = ' @@@'
$hash.'</div>' = ' @@@'
foreach ($key in $hash.Keys) {
    $formattedString = $formattedString.Replace($key, $hash.$key)
}
function GetStringBetweenTwoStrings($firstString, $secondString, $string){
    # Regex pattern: lazily capture everything between the two strings
    $pattern = "$firstString(.*?)$secondString"
    # Perform the operation
    $result = [regex]::Match($string,$pattern).Groups[1].Value
    # Return result
    return $result
}
$field1 = GetStringBetweenTwoStrings -firstString "Field1: " -secondString " @@@" $formattedString
$field2 = GetStringBetweenTwoStrings -firstString "Field2: " -secondString " @@@" $formattedString
$field3 = GetStringBetweenTwoStrings -firstString "Field3: " -secondString " @@@" $formattedString
I agree with Paolo that you should normally use an HTML parser for this, but since it is unclear how the HTML ends up in the variable $text, I suggest you try
$text = "<div>\n Field1: Field1Value1\n </div> \n <div>\n <div>\n <div>\n Field2: Field2Value1\n </div> \n Field1: Field1Value2\n </div> \n <div>\n Field1: Field1Value3\n</div>\n"
[regex]::Matches($text,"Field\d+:\s[^\\<]+").Value | Group-Object {($_ -split ':')[0].Trim()} | ForEach-Object {
    # the value is everything after the first colon; a repeated field yields an array
    $value = foreach ($val in $_.Group) { ($val -split ':', 2)[-1].Trim() }
    Remove-Variable $_.Name -ErrorAction SilentlyContinue
    New-Variable -Name $_.Name -Value $value
}
$Field1 now holds an array containing the values
Field1Value1
Field1Value2
Field1Value3
$Field2 contains a string with the value
Field2Value1
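For what it's worth, the reason your own function only ever gave you one value per field is that [regex]::Match() stops at the first hit, whereas [regex]::Matches() returns all of them. Below is a minimal sketch of a variant of your function built on Matches(). The name Get-AllStringsBetween, its parameters and the $sample string are made up for this example, and it assumes the text contains literal \n sequences as in your sample:

```powershell
# Hypothetical helper (not from the question): returns every value found
# between two delimiter strings instead of only the first one.
function Get-AllStringsBetween {
    param(
        [string]$Text,
        [string]$FirstString,
        [string]$SecondString
    )
    # Escape both delimiters so characters such as '\' or '.' are taken literally
    $pattern = [regex]::Escape($FirstString) + '(.*?)' + [regex]::Escape($SecondString)
    # Matches() returns every hit; Match() would return only the first
    foreach ($m in [regex]::Matches($Text, $pattern)) {
        $m.Groups[1].Value.Trim()
    }
}

# Shortened sample; inside double quotes PowerShell keeps \n as backslash + n
$sample = "<div>\n Field1: Field1Value1\n </div> \n <div>\n Field1: Field1Value2\n </div>"
# @() makes sure the result is an array even when there is a single match
$all = @(Get-AllStringsBetween -Text $sample -FirstString 'Field1: ' -SecondString '\n')
```

Here $all would hold both Field1 values, which you could then pass on as a list.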
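Edit
Looking at your last comment, where you say that Field1 is actually something like First Name, the code needs to be quite different.
(Why didn't you show us that in the first place?)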
$text = "<div>\n First Name: Field1Value1\n </div> \n <div>\n <div>\n <div>\n Last Name: Field2Value1\n </div> \n First Name: Field1Value2\n </div> \n <div>\n First Name: Field1Value3\n</div>\n"
$hash = @{}
# replace all tags and the literal \n, \r, \t, \f, \v sequences in the string with two spaces, then split on runs of two or more
($text -replace '</?[a-z][a-z0-9]*[^<>]*>|\\[nrtfv]', '  ').Trim() -split '\s{2,}' | ForEach-Object {
    # split the name and the value on the first colon
    $name, $value = ($_ -split ':', 2).Trim()
    $name = $name -replace '\s' # take out spaces because they do not belong in a variable name
    # if the hash already has an element with this name, combine the values into an array
    if ($hash.ContainsKey($name)) { $hash[$name] = @($hash[$name]) + $value }
    else { $hash[$name] = $value }
}
If I were you, I would keep the values in the hash and use them as $hash.FirstName etc.
# show what is inside the hash:
$hash
Name                           Value
----                           -----
FirstName                      {Field1Value1, Field1Value2, Field1Value3}
LastName                       Field2Value1
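Since you asked for either a list or one comma-separated string, here is a small sketch of how values stored like this could be joined. The $demo hash is hard-coded purely for illustration and stands in for the hash built above; the variable names are assumptions:

```powershell
# Stand-in for the hash built above (hard-coded for this example only)
$demo = @{
    FirstName = @('Field1Value1', 'Field1Value2', 'Field1Value3')
    LastName  = 'Field2Value1'
}

# -join accepts an array as well as a single string,
# so the same expression works for both kinds of entry
$firstNames = $demo.FirstName -join ', '
$lastNames  = $demo.LastName -join ', '
```

With this, $firstNames is the single string "Field1Value1, Field1Value2, Field1Value3", while $demo.FirstName itself stays available as an array.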
But if you must create separate variables for them, you can do this
$hash.GetEnumerator() | ForEach-Object {
    Remove-Variable $_.Name -ErrorAction SilentlyContinue
    New-Variable -Name $_.Name -Value $_.Value
}
$FirstName now holds an array containing the values
Field1Value1
Field1Value2
Field1Value3
$LastName contains a string with the value
Field2Value1