PowerShell Invoke-WebRequest: extract specific content from the filtered text

Question

I have written a PowerShell script that extracts the text I need from a URL, as shown below:

$ExtractData = Invoke-WebRequest "https://www.somesite.com/downloads"
$ExtractData = $ExtractData.tostring() -split "[`r`n]" | select-string "http://somesite.com/download"

It gives the following result:

onclick="_gaq.push(['_trackEvent', 'Downloads', 'http://somesite.com/download/some.exe']);">

I was thinking of splitting it on commas, but is there a better way to get just this part?

http://somesite.com/download/some.exe

My attempt with a regular expression:

$regex = '(http|ftp|https)://([\w_-]+(?:(?:\.[\w_-]+)+))([\w.,@?^=%&:/~+#-]*[\w@?^=%&/~+#-])?'
$ExtractData = $ExtractData | Select-String -Pattern $regex -AllMatches | % { $_.Matches } | % { $_.Value }
$ExtractData

This returns the path, but without the exe:

http://somesite.com/download

Answer 1

Use Regex.Matches to extract all links as a collection of Match objects, then collect Groups[1].Value from each one:

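Add-Type -AssemblyName System.Web  # makes [Web.HttpUtility] available if it is not already loaded
$webpage = Invoke-WebRequest "https://www.somesite.com/downloads"
# Group 1 captures each URL up to (but not including) the closing quote character
$links = ([regex]'((?:ftp|https?)://\S+?)[''"]').Matches($webpage) |
         ForEach-Object { [Web.HttpUtility]::HtmlDecode($_.Groups[1].Value) }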

Note that because we are working with raw HTML, URLs may be HTML-encoded, with &amp; in place of &, hence the HtmlDecode call.
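To try the pattern without hitting a real site, here is a minimal sketch that runs it against a hard-coded sample line. The sample HTML (including the `?a=1&amp;b=2` query string) is made up for illustration, and the sketch swaps in `[System.Net.WebUtility]::HtmlDecode` so that no extra assembly has to be loaded; that substitution is an assumption of this example, not part of the answer above.

```powershell
# Hypothetical sample standing in for the downloaded page content.
$html = @'
<a href="#" onclick="_gaq.push(['_trackEvent', 'Downloads', 'http://somesite.com/download/some.exe?a=1&amp;b=2']);">
'@

# Group 1 is the URL; the lazy \S+? stops at the first single or double quote.
$pattern = [regex]'((?:ftp|https?)://\S+?)[''"]'

# Decode HTML entities such as &amp; in each captured URL.
$links = $pattern.Matches($html) |
         ForEach-Object { [System.Net.WebUtility]::HtmlDecode($_.Groups[1].Value) }

$links
```

With this sample input, `$links` should hold the single decoded URL `http://somesite.com/download/some.exe?a=1&b=2`. Anchoring on the closing quote rather than enumerating the allowed URL characters keeps the pattern short, at the cost of assuming every URL of interest sits inside a quoted string.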

powershell-4.0