How to extract multiple strings from HTML code using a PowerShell script

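I recently started my PowerShell journey and have been stuck on this problem for a few days. Basically, I have a script in which a variable holds a block of HTML with arbitrary text, and I need to extract certain strings from that text and put them into other variables.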

Here is an example:

$text = "<div>\n Field1: Field1Value1\n </div> \n <div>\n <div>\n <div>\n Field2: Field2Value1\n </div> \n Field1: Field1Value2\n </div> \n <div>\n  Field1: Field1Value3\n</div>\n"

This is what I need to pass on:

$Field1 = "Field1Value1, Field1Value2, Field1Value3" # or as a list, if possible
$Field2 = "Field2Value1"

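There are more fields to extract, but the idea is the same. I managed to do this with the following function, but it only works when each field occurs once.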

function GetStringBetweenTwoStrings($firstString, $secondString, $Text){

    #Regex pattern to compare two strings
    $pattern = "$firstString(.*?)$secondString"

    #Perform the operation
    $result = [regex]::Match($Text,$pattern).Groups[1].Value

    #Return result
    return $result

}

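Edit: here is my latest attempt. For some reason \n does not work in some cases, so I now replace everything with @@@. The following code still prints only the first value of each field.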

$originalString = "<div>Field1: Field1Value1\n</div><div>Field2: Field2Value1</div>\n<div>Field1: Field1Value2\n Field3: Field3Value1<br /></div>"
$formattedString = $originalString
$hash = @{}
$hash.'\n' = ' @@@'
$hash.'\t' = ' @@@'
$hash.'<br />' = ' @@@'
$hash.'</div>' = ' @@@'
foreach ($key in $hash.Keys) {
   $formattedString = $formattedString.Replace($key, $hash.$key)
   }

function GetStringBetweenTwoStrings($firstString, $secondString, $string){

    #Regex pattern to compare two strings
    $pattern = "$firstString(.*?)$secondString"

    #Perform the operation
    $result = [regex]::Match($string,$pattern).Groups[1].Value

    #Return result
    return $result

}
$field1 = GetStringBetweenTwoStrings -firstString "Field1: " -secondString " @@@" $formattedString
$field2 = GetStringBetweenTwoStrings -firstString "Field2: " -secondString " @@@" $formattedString
$field3 = GetStringBetweenTwoStrings -firstString "Field3: " -secondString " @@@" $formattedString

I agree with Paolo that you should normally use an HTML parser for this, but since it is unclear how you get the HTML into the variable $text, I suggest you try

$text = "<div>\n Field1: Field1Value1\n </div> \n <div>\n <div>\n <div>\n Field2: Field2Value1\n </div> \n Field1: Field1Value2\n </div> \n <div>\n  Field1: Field1Value3\n</div>\n"

[regex]::Matches($text,"Field\d+:\s[^\\<]+").Value | Group-Object {($_ -split ':')[0].Trim()} | ForEach-Object {
    $value = foreach ($val in $_.Group) { ($val -split ':', 2)[-1].Trim() }
    Remove-Variable $_.Name -ErrorAction SilentlyContinue
    New-Variable -Name $_.Name -Value $value
}

$Field1 now holds an array containing the values

Field1Value1
Field1Value2
Field1Value3

$Field2 contains a string with the value

Field2Value1
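
For comparison, the function from the question returns only the first value because [regex]::Match stops at the first hit. Below is a minimal sketch of a variant built on [regex]::Matches. The name Get-AllStringsBetween and its parameters are hypothetical, and it assumes both delimiters are literal text, so they are passed through [regex]::Escape.

```powershell
# Hypothetical helper: returns every value found between two literal delimiters.
function Get-AllStringsBetween {
    param(
        [string]$FirstString,
        [string]$SecondString,
        [string]$Text
    )
    # Escape both delimiters so characters such as '\' or '.' are matched literally
    $pattern = [regex]::Escape($FirstString) + '(.*?)' + [regex]::Escape($SecondString)
    # Matches() returns all hits; Match() would return only the first one
    foreach ($m in [regex]::Matches($Text, $pattern)) {
        $m.Groups[1].Value.Trim()
    }
}

# Single quotes keep the sample literal: '\n' here is a backslash followed by 'n', as in the question
$sample = '<div>Field1: Field1Value1\n</div><div>Field2: Field2Value1</div>\n<div>Field1: Field1Value2\n Field3: Field3Value1<br /></div>'
$field1 = @(Get-AllStringsBetween -FirstString 'Field1: ' -SecondString '\n' -Text $sample)
```

Wrapping the call in @( ) keeps $field1 an array even when only one value is found.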

Edit

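Judging from your last comment, where you say Field1 is actually something like First Name, the code needs to be quite different (why didn't you show us that in the first place?)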

$text = "<div>\n First Name: Field1Value1\n </div> \n <div>\n <div>\n <div>\n Last Name: Field2Value1\n </div> \n First Name: Field1Value2\n </div> \n <div>\n  First Name: Field1Value3\n</div>\n"

$hash = @{}
# replace all tags and the literal sequences \n, \r, \t, \f, \v in the string with two spaces, then split on runs of two or more whitespace characters
($text -replace '</?[a-z][a-z0-9]*[^<>]*>|\\[nrtfv]', '  ').Trim() -split '\s{2,}' | ForEach-Object {
    # split the name and the value 
    $name, $value = ($_ -split ':', 2).Trim()
    $name = $name -replace '\s'  # take out spaces because they do not belong in a variable name
    # if the hash already has an element with this name, combine the value as array
    if ($hash.ContainsKey($name)) { $hash[$name] = @($hash[$name]) + $value }
    else { $hash[$name] = $value }
}

If I were you, I would keep the values in the hash and use them as $hash.FirstName

# show what is inside the hash:
$hash
Name                           Value                                                                                                                  
----                           -----                                                                                                                  
FirstName                      {Field1Value1, Field1Value2, Field1Value3}
LastName                       Field2Value1
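
Because a name that occurs once is stored as a plain string while a repeated name is stored as an array, code that reads the hash may want to treat both cases the same way. Here is a small sketch, assuming a hash shaped like the one shown above (it is rebuilt by hand so the block runs on its own):

```powershell
# Sample data rebuilt by hand to mirror the output shown above
$hash = @{
    FirstName = @('Field1Value1', 'Field1Value2', 'Field1Value3')
    LastName  = 'Field2Value1'
}

# @( ) wraps a single string in an array and leaves an existing array as it is
foreach ($name in $hash.Keys) {
    $values = @($hash[$name])
    '{0} ({1}): {2}' -f $name, $values.Count, ($values -join ', ')
}

# Comma-separated string, the other form asked for in the question
$firstNames = $hash.FirstName -join ', '
```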

But if you must create separate variables for them, you can do it like this

$hash.GetEnumerator() | ForEach-Object {
    Remove-Variable $_.Name -ErrorAction SilentlyContinue
    New-Variable -Name $_.Name -Value $_.Value        
}

$FirstName now holds an array containing the values

Field1Value1
Field1Value2
Field1Value3

$LastName contains a string with the value

Field2Value1