Download all PDFs on a webpage with PowerShell
I found the following code online to download all the PDFs on a webpage:
$psPage = Invoke-WebRequest "https://www.pi.infn.it/~rizzo/ingegneria/appunti_fisII_ing_mecc/"
$urls = $psPage.ParsedHtml.getElementsByTagName("A") | ? {$_.href -like "*.pdf"} | Select-Object -ExpandProperty href
$urls | ForEach-Object {Invoke-WebRequest -Uri $_ -OutFile ($_ | Split-Path -Leaf)}
But PowerShell gave me these errors:
Invoke-WebRequest : The response content cannot be parsed because the Internet Explorer engine is not
available, or Internet Explorer's first-launch configuration is not complete. Specify the UseBasicParsing
parameter and try again.
At C:\Users\Raffaele\Desktop\Nuova cartella\a.ps1:1 char:11
+ $psPage = Invoke-WebRequest "https://www.pi.infn.it/~rizzo/ingegneria ...
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ CategoryInfo : NotImplemented: (:) [Invoke-WebRequest], NotSupportedException
+ FullyQualifiedErrorId : WebCmdletIEDomNotSupportedException,Microsoft.PowerShell.Commands.InvokeWebReq
uestCommand
You cannot call a method on a null-valued expression.
At C:\Users\Raffaele\Desktop\Nuova cartella\a.ps1:2 char:1
+ $urls = $psPage.ParsedHtml.getElementsByTagName("A") | ? {$_.href -li ...
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ CategoryInfo : InvalidOperation: (:) [], RuntimeException
+ FullyQualifiedErrorId : InvokeMethodOnNull
Split-Path : Cannot bind argument to parameter 'Path' because it is null.
At C:\Users\Raffaele\Desktop\Nuova cartella\a.ps1:4 char:66
+ ... h-Object {Invoke-WebRequest -Uri $_ -OutFile ($_ | Split-Path -Leaf)}
+ ~~~~~~~~~~~~~~~~
+ CategoryInfo : InvalidData: (:) [Split-Path], ParameterBindingValidationException
+ FullyQualifiedErrorId : ParameterArgumentValidationErrorNullNotAllowed,Microsoft.PowerShell.Commands.S
plitPathCommand
Invoke-WebRequest : Cannot validate argument on parameter 'Uri'. The argument is null or empty. Provide an
argument that is not null or empty, and then try the command again.
At C:\Users\Raffaele\Desktop\Nuova cartella\a.ps1:4 char:48
+ $urls | ForEach-Object {Invoke-WebRequest -Uri $_ -OutFile ($_ | Spli ...
+ ~~
+ CategoryInfo : InvalidData: (:) [Invoke-WebRequest], ParameterBindingValidationException
+ FullyQualifiedErrorId : ParameterArgumentValidationError,Microsoft.PowerShell.Commands.InvokeWebReques
tCommand
How can I get past this? And why is the Internet Explorer engine needed at all?
Edit: I tried modifying the code like this:
$site = "https://www.pi.infn.it/~rizzo/ingegneria/appunti_fisII_ing_mecc/"
$psPage = Invoke-WebRequest -Uri $site -UseBasicParsing
$urls = $psPage.ParsedHtml.getElementsByTagName("A")
$urls | where {$_.pathname -like "*pdf"} | % {Invoke-WebRequest -Uri "$site$($_.pathname)" -OutFile $_.pathname }
The error is:
You cannot call a method on a null-valued expression.
At C:\Users\Raffaele\Desktop\Nuova cartella\a.ps1:3 char:1
+ $urls = $psPage.ParsedHtml.getElementsByTagName("A")
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ CategoryInfo : InvalidOperation: (:) [], RuntimeException
+ FullyQualifiedErrorId : InvokeMethodOnNull
Edit 2: I tried modifying the code again. The new code:
$psPage = Invoke-WebRequest -Uri -UseBasicParsing "https://www.pi.infn.it/~rizzo/ingegneria/appunti_fisII_ing_mecc/"
$urls = $psPage.ParsedHtml.getElementsByTagName("A") | ? {$_.href -like "*.pdf"} | Select-Object -ExpandProperty href
$urls | ForEach-Object {Invoke-WebRequest -Uri $_ -OutFile ($_ | Split-Path -Leaf)}
Windows PowerShell gave me a new error:
Invoke-WebRequest : Missing an argument for parameter 'Uri'. Specify a parameter of type 'System.Uri' and
try again.
At C:\Users\Raffaele\Desktop\Nuova cartella\b.ps1:1 char:29
+ $psPage = Invoke-WebRequest -Uri -UseBasicParsing "https://www.pi.inf ...
+ ~~~~
+ CategoryInfo : InvalidArgument: (:) [Invoke-WebRequest], ParameterBindingException
+ FullyQualifiedErrorId : MissingArgument,Microsoft.PowerShell.Commands.InvokeWebRequestCommand
You cannot call a method on a null-valued expression.
At C:\Users\Raffaele\Desktop\Nuova cartella\b.ps1:2 char:1
+ $urls = $psPage.ParsedHtml.getElementsByTagName("A") | ? {$_.href -li ...
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ CategoryInfo : InvalidOperation: (:) [], RuntimeException
+ FullyQualifiedErrorId : InvokeMethodOnNull
Split-Path : Cannot bind argument to parameter 'Path' because it is null.
At C:\Users\Raffaele\Desktop\Nuova cartella\b.ps1:3 char:66
+ ... h-Object {Invoke-WebRequest -Uri $_ -OutFile ($_ | Split-Path -Leaf)}
+ ~~~~~~~~~~~~~~~~
+ CategoryInfo : InvalidData: (:) [Split-Path], ParameterBindingValidationException
+ FullyQualifiedErrorId : ParameterArgumentValidationErrorNullNotAllowed,Microsoft.PowerShell.Commands.S
plitPathCommand
Invoke-WebRequest : Cannot validate argument on parameter 'Uri'. The argument is null or empty. Provide an
argument that is not null or empty, and then try the command again.
At C:\Users\Raffaele\Desktop\Nuova cartella\b.ps1:3 char:48
+ $urls | ForEach-Object {Invoke-WebRequest -Uri $_ -OutFile ($_ | Spli ...
+ ~~
+ CategoryInfo : InvalidData: (:) [Invoke-WebRequest], ParameterBindingValidationException
+ FullyQualifiedErrorId : ParameterArgumentValidationError,Microsoft.PowerShell.Commands.InvokeWebReques
tCommand
Following your approach: first, you have to strip the "about:" prefix from each URL, i.e. replace it with an empty string:
$site = "https://www.pi.infn.it/~rizzo/ingegneria/appunti_fisII_ing_mecc/"
$psPage = Invoke-WebRequest $site
$urls = $psPage.ParsedHtml.getElementsByTagName("A") | ? {$_.href -like "*.pdf"} | Select-Object -ExpandProperty href | ForEach-Object {$_.replace("about:", "")}
Second, you have to rebuild the full URL:
$urls | ForEach-Object {Invoke-WebRequest -Uri "$site$_" -OutFile $_ }
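The two steps above can also be expressed with System.Uri, which resolves a link against the page address instead of concatenating strings. This is only a sketch: Resolve-Href is a hypothetical helper name, not a built-in cmdlet, and appunti.pdf in the usage line is a made-up file name.

```powershell
# Hypothetical helper: turn an href reported by ParsedHtml into an absolute URL.
function Resolve-Href {
    param(
        [string] $Site,   # address of the page that contained the link
        [string] $Href    # href value, possibly prefixed with "about:"
    )
    # Step 1: drop the "about:" prefix found on relative links.
    $relative = $Href -replace '^about:', ''
    # Step 2: resolve against the page address; an absolute href is kept as-is.
    $uri = New-Object System.Uri -ArgumentList ([Uri] $Site), $relative
    $uri.AbsoluteUri
}
```

For example, Resolve-Href -Site $site -Href 'about:appunti.pdf' would return the page address followed by appunti.pdf, which can be passed directly to Invoke-WebRequest -Uri.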
But you can simplify this by using "textContent" or "pathname":
$site = "https://www.pi.infn.it/~rizzo/ingegneria/appunti_fisII_ing_mecc/"
$psPage = Invoke-WebRequest $site
$urls = $psPage.ParsedHtml.getElementsByTagName("A")
$urls | where {$_.pathname -like "*pdf"} | % {Invoke-WebRequest -Uri "$site$($_.pathname)" -OutFile $_.pathname }
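Both versions above still read ParsedHtml, so they need the Internet Explorer engine named in the error message. A possible alternative, sketched here on the assumption that the page exposes its PDFs as ordinary anchor links, is to pass -UseBasicParsing and read the Links property of the response instead. Select-PdfUrl and Save-PdfFromPage are hypothetical helper names, not built-in cmdlets.

```powershell
# Hypothetical helper: keep only hrefs ending in .pdf and make them absolute.
function Select-PdfUrl {
    param(
        [string]   $Site,
        [string[]] $Href
    )
    $base = [Uri] $Site
    $Href |
        Where-Object { $_ -like '*.pdf' } |
        ForEach-Object { (New-Object System.Uri -ArgumentList $base, $_).AbsoluteUri }
}

# Hypothetical helper: download every PDF linked from $Site into $Destination.
function Save-PdfFromPage {
    param(
        [string] $Site,
        [string] $Destination = '.'
    )
    # -UseBasicParsing avoids the IE-based DOM parser; ParsedHtml stays empty,
    # but the Links collection is still filled in.
    $page = Invoke-WebRequest -Uri $Site -UseBasicParsing
    foreach ($url in Select-PdfUrl -Site $Site -Href $page.Links.href) {
        # Name the local file after the last segment of the URL path.
        $name = [System.IO.Path]::GetFileName(([Uri] $url).LocalPath)
        Invoke-WebRequest -Uri $url -UseBasicParsing -OutFile (Join-Path $Destination $name)
    }
}
```

It could then be called as Save-PdfFromPage -Site "https://www.pi.infn.it/~rizzo/ingegneria/appunti_fisII_ing_mecc/". Keeping the link filtering in its own function makes that part easy to check without touching the network.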