Downloading webpage contents in parallel using async
I am using an example from Microsoft that uses Async and Tasks to download data from multiple URLs.
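My requirement is to finish downloading 200 links within 1 minute, so that the same set of 200 URLs can start downloading again in the second minute. I know this depends largely on network speed and less on CPU power, since this is an I/O-bound rather than a CPU-bound process.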
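Assuming the network and CPU can support this and are not the bottleneck, I actually see timeout and cancellation exceptions after the tasks have been running for a while.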
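The question is therefore: in the same example, can I change these to long-running tasks so that the tasks don't time out? I am aware of the TaskCreationOptions enum and its LongRunning value. However, my questions are:
1) How do I supply this option, along with the link, when creating the tasks in the example below?
2) What is the definition of LongRunning? Does it mean that each task will no longer time out?
3) Can I explicitly set an infinite timeout in some other way?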
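Basically, my requirement is that once the download of a particular URL completes, the download of that same URL is triggered again. The same URLs will therefore be downloaded over and over, so the tasks should never complete. (The URLs in the MSDN example are not the ones I will actually request; the real URLs have content that changes every minute, so I need to download each URL continuously, at least once per minute.)
I am also pasting the code from the example linked above here: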
Dim cts As CancellationTokenSource
Dim countProcessed As Integer

Private Async Sub startButton_Click(sender As Object, e As RoutedEventArgs)
    ' Instantiate the CancellationTokenSource.
    cts = New CancellationTokenSource()
    resultsTextBox.Clear()
    Try
        Await AccessTheWebAsync(cts.Token)
        resultsTextBox.Text &= vbCrLf & "Downloads complete."
    Catch ex As OperationCanceledException
        resultsTextBox.Text &= vbCrLf & "Downloads canceled." & vbCrLf
    Catch ex As Exception
        resultsTextBox.Text &= vbCrLf & "Downloads failed." & vbCrLf
    End Try
    ' Set the CancellationTokenSource to Nothing when the download is complete.
    cts = Nothing
End Sub

Private Sub cancelButton_Click(sender As Object, e As RoutedEventArgs)
    If cts IsNot Nothing Then
        cts.Cancel()
    End If
End Sub

Async Function AccessTheWebAsync(ct As CancellationToken) As Task
    Dim client As HttpClient = New HttpClient()
    ' Call SetUpURLList to make a list of web addresses.
    Dim urlList As List(Of String) = SetUpURLList()
    ' ***Create a query that, when executed, returns a collection of tasks.
    Dim downloadTasksQuery As IEnumerable(Of Task(Of Integer)) =
        From url In urlList Select ProcessURLAsync(url, client, ct)
    ' ***Use ToList to execute the query and start the download tasks.
    Dim downloadTasks As List(Of Task(Of Integer)) = downloadTasksQuery.ToList()
    Await Task.WhenAll(downloadTasks)
    ' Ideally, this line should never be reached
    Console.WriteLine("Done")
End Function

Async Function ProcessURLAsync(url As String, client As HttpClient, ct As CancellationToken) As Task(Of Integer)
    Console.WriteLine("URL=" & url)
    ' GetAsync returns a Task(Of HttpResponseMessage).
    Dim response As HttpResponseMessage = Await client.GetAsync(url, ct)
    ' Retrieve the web site contents from the HttpResponseMessage.
    Dim urlContents As Byte() = Await response.Content.ReadAsByteArrayAsync()
    Interlocked.Increment(countProcessed)
    Console.WriteLine(countProcessed)
    Return urlContents.Length
End Function

Private Function SetUpURLList() As List(Of String)
    Dim urls = New List(Of String) From
    {
        "http://msdn.microsoft.com",
        "http://msdn.microsoft.com/en-us/library/hh290138.aspx",
        "http://msdn.microsoft.com/en-us/library/hh290140.aspx",
        "http://msdn.microsoft.com/en-us/library/dd470362.aspx",
        "http://msdn.microsoft.com/en-us/library/aa578028.aspx",
        "http://msdn.microsoft.com/en-us/library/ms404677.aspx",
        "http://msdn.microsoft.com/en-us/library/ff730837.aspx",
        "http://msdn.microsoft.com/en-us/library/hh290138.aspx",
        "http://msdn.microsoft.com/en-us/library/hh290140.aspx"
        ' For space constraint I am not including the 200 URLs, but pls assume the above list contains 200 URLs
    }
    Return urls
End Function
Question is, therefore, in the same example, can I change this to long running tasks so that the tasks don't timeout?
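Tasks themselves do not time out. What you are probably seeing is the HTTP requests timing out. Long-running tasks do not have any different timeout semantics.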
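One way to check this is to tell a user-requested cancellation apart from a request timeout. The following is a minimal sketch, not part of the original sample: TryDownloadAsync is a hypothetical helper, and the -1 return value is an arbitrary convention chosen for illustration. It relies on HttpClient reporting its own timeout as a TaskCanceledException (which derives from OperationCanceledException) even though the caller's token was never cancelled.

```vb
Imports System
Imports System.Net.Http
Imports System.Threading
Imports System.Threading.Tasks

Module TimeoutCheckSketch
    ' Hypothetical helper: returns the body length, or -1 if the request
    ' appears to have hit HttpClient's own timeout.
    Async Function TryDownloadAsync(url As String,
                                    client As HttpClient,
                                    ct As CancellationToken) As Task(Of Integer)
        Try
            Using response As HttpResponseMessage = Await client.GetAsync(url, ct)
                Dim body As Byte() = Await response.Content.ReadAsByteArrayAsync()
                Return body.Length
            End Using
        Catch ex As OperationCanceledException When Not ct.IsCancellationRequested
            ' The caller's token was not cancelled, so the cancellation most
            ' likely came from HttpClient.Timeout expiring.
            Console.WriteLine("Timed out: " & url)
            Return -1
        End Try
    End Function
End Module
```

A cancellation requested through cts.Cancel() is deliberately not caught here, so it still reaches the "Downloads canceled." handler in the click event.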
I am aware of usage of the TaskCreationOptions enum and using LongRunning.
You should also be aware that they should almost never be used.
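An async method does not hold a thread while it is awaiting, so a download that repeats forever can be written as an ordinary async loop with no LongRunning option at all. The following is a rough sketch under assumptions: PollUrlAsync and StartAllAsync are hypothetical names, and the one-minute interval is taken from the requirement stated in the question.

```vb
Imports System
Imports System.Collections.Generic
Imports System.Diagnostics
Imports System.Linq
Imports System.Net.Http
Imports System.Threading
Imports System.Threading.Tasks

Module PollingSketch
    ' Hypothetical helper: downloads one URL again and again until ct is cancelled.
    Async Function PollUrlAsync(url As String,
                                client As HttpClient,
                                interval As TimeSpan,
                                ct As CancellationToken) As Task
        While True
            Dim timer As Stopwatch = Stopwatch.StartNew()
            Try
                Using response As HttpResponseMessage = Await client.GetAsync(url, ct)
                    Dim body As Byte() = Await response.Content.ReadAsByteArrayAsync()
                    Console.WriteLine(url & ": " & body.Length & " bytes")
                End Using
            Catch ex As HttpRequestException
                ' A failed request should not end the loop.
                Console.WriteLine(url & " failed: " & ex.Message)
            Catch ex As OperationCanceledException When Not ct.IsCancellationRequested
                ' Request timeout rather than user cancellation; try again next round.
                Console.WriteLine(url & " timed out")
            End Try

            ' Wait out the rest of the interval. Task.Delay throws once ct is
            ' cancelled, which is what ends the loop.
            Dim remaining As TimeSpan = interval - timer.Elapsed
            If remaining < TimeSpan.Zero Then remaining = TimeSpan.Zero
            Await Task.Delay(remaining, ct)
        End While
    End Function

    ' Hypothetical helper: starts one polling loop per URL and waits for all of them.
    Function StartAllAsync(urls As IEnumerable(Of String),
                           client As HttpClient,
                           ct As CancellationToken) As Task
        Dim loops As List(Of Task) =
            urls.Select(Function(u) PollUrlAsync(u, client, TimeSpan.FromMinutes(1), ct)).ToList()
        Return Task.WhenAll(loops)
    End Function
End Module
```

With this shape, AccessTheWebAsync could await StartAllAsync(urlList, client, ct) in place of the Task.WhenAll call, and the Cancel button would stop every loop through the shared token.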
You are probably getting timeouts because all of your requests are hitting the same web site. Try setting ServicePointManager.DefaultConnectionLimit to int.MaxValue, and possibly also increase HttpClient.Timeout.
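As a minimal sketch of those two settings in Visual Basic (where int.MaxValue is written Integer.MaxValue): CreateClient is a hypothetical helper and the 5-minute value is an arbitrary illustration, not a recommendation. It assumes the .NET Framework HTTP stack, where the per-host connection limit defaults to 2 for non-ASP.NET applications; on newer runtimes the corresponding knob is HttpClientHandler.MaxConnectionsPerServer.

```vb
Imports System
Imports System.Net
Imports System.Net.Http

Module ClientSetupSketch
    ' Hypothetical helper: builds the single HttpClient shared by all downloads.
    Function CreateClient() As HttpClient
        ' Lift the per-host connection limit. Set this before the first request is sent.
        ServicePointManager.DefaultConnectionLimit = Integer.MaxValue

        Dim client As New HttpClient()
        ' The default timeout is 100 seconds. Five minutes is only an example value;
        ' System.Threading.Timeout.InfiniteTimeSpan disables the timeout entirely.
        client.Timeout = TimeSpan.FromMinutes(5)
        Return client
    End Function
End Module
```

In the sample above, this would replace the New HttpClient() call at the top of AccessTheWebAsync.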