Downloading webpage contents in parallel using async

I am using Microsoft's example, which uses async tasks to download data from multiple URLs.

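My requirement is to finish downloading 200 links within 1 minute, so that the same set of 200 URLs can start downloading again in the 2nd minute. I know this depends largely on network speed and, to a lesser extent, on CPU power, since this is an I/O-bound process.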

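Assuming the network and CPU can support this and are not bottlenecks, I am actually seeing timeouts and cancellation exceptions after the tasks have been running for a while.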

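Question is, therefore, in the same example, can I change this to long running tasks so that the tasks don't timeout? I am aware of usage of the TaskCreationOptions enum and using LongRunning. However, my questions are:
1) How do I provide this parameter when creating the tasks in the example below (also available at the link provided)?
2) What is the definition of LongRunning? Does it mean that each task will never time out?
3) Can I explicitly set an infinite timeout some other way?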

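Basically, my requirement is that once the download of a particular URL completes, it triggers the download of the same URL again. This means the same URL is downloaded over and over, so the tasks should never complete. (The URLs in the MSDN example are not the ones I will be requesting; mine will be other URLs whose contents change every minute, so I need to download each URL continuously, at least once a minute.)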

I am also pasting the code from the linked example here:

Dim cts As CancellationTokenSource
Dim countProcessed As Integer

Private Async Sub startButton_Click(sender As Object, e As RoutedEventArgs)

    ' Instantiate the CancellationTokenSource.
    cts = New CancellationTokenSource()

    resultsTextBox.Clear()

    Try
        Await AccessTheWebAsync(cts.Token)
        resultsTextBox.Text &= vbCrLf & "Downloads complete."

    Catch ex As OperationCanceledException
        resultsTextBox.Text &= vbCrLf & "Downloads canceled." & vbCrLf

    Catch ex As Exception
        resultsTextBox.Text &= vbCrLf & "Downloads failed." & vbCrLf
    End Try

    ' Set the CancellationTokenSource to Nothing when the download is complete.
    cts = Nothing
End Sub

Private Sub cancelButton_Click(sender As Object, e As RoutedEventArgs)
    If cts IsNot Nothing Then
        cts.Cancel()
    End If
End Sub

Async Function AccessTheWebAsync(ct As CancellationToken) As Task

    Dim client As HttpClient = New HttpClient()

    ' Call SetUpURLList to make a list of web addresses.
    Dim urlList As List(Of String) = SetUpURLList()

    ' ***Create a query that, when executed, returns a collection of tasks.
    Dim downloadTasksQuery As IEnumerable(Of Task(Of Integer)) =
        From url In urlList Select ProcessURLAsync(url, client, ct)

    ' ***Use ToList to execute the query and start the download tasks. 
    Dim downloadTasks As List(Of Task(Of Integer)) = downloadTasksQuery.ToList()

    Await Task.WhenAll(downloadTasks)
    'Ideally, this line should never be reached
    Console.WriteLine("Done")

End Function

Async Function ProcessURLAsync(url As String, client As HttpClient, ct As CancellationToken) As Task(Of Integer)
    Console.WriteLine("URL=" & url)
    ' GetAsync returns a Task(Of HttpResponseMessage). 
    Dim response As HttpResponseMessage = Await client.GetAsync(url, ct)

    ' Retrieve the web site contents from the HttpResponseMessage.
    Dim urlContents As Byte() = Await response.Content.ReadAsByteArrayAsync()
    Interlocked.Increment(countProcessed)
    Console.WriteLine(countProcessed)
    Return urlContents.Length
End Function

Private Function SetUpURLList() As List(Of String)

    Dim urls = New List(Of String) From
        {
            "http://msdn.microsoft.com",
            "http://msdn.microsoft.com/en-us/library/hh290138.aspx",
            "http://msdn.microsoft.com/en-us/library/hh290140.aspx",
            "http://msdn.microsoft.com/en-us/library/dd470362.aspx",
            "http://msdn.microsoft.com/en-us/library/aa578028.aspx",
            "http://msdn.microsoft.com/en-us/library/ms404677.aspx",
            "http://msdn.microsoft.com/en-us/library/ff730837.aspx",
            "http://msdn.microsoft.com/en-us/library/hh290138.aspx",
            "http://msdn.microsoft.com/en-us/library/hh290140.aspx"
            ' Only a few URLs are shown to save space; please assume the list above contains 200 URLs.
        }

    Return urls
End Function

Question is, therefore, in the same example, can I change this to long running tasks so that the tasks don't timeout?

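Tasks themselves do not time out. What you are probably seeing is the HTTP requests timing out. Long-running tasks do not have any different timeout semantics.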
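One way to check this is to look at the caller's token when the exception arrives. An HttpClient timeout surfaces as a TaskCanceledException, which derives from OperationCanceledException, so it looks like a cancellation even though nobody called Cancel. The following is only a sketch: TryDownloadAsync and its -1 return value are hypothetical and not part of the MSDN sample.

```vb
Imports System.Net.Http
Imports System.Threading
Imports System.Threading.Tasks

Module TimeoutVersusCancel

    ' Hypothetical helper (not in the MSDN sample): returns the body length,
    ' or -1 when the request was canceled without the caller asking for it.
    Async Function TryDownloadAsync(url As String, client As HttpClient,
                                    ct As CancellationToken) As Task(Of Integer)
        Try
            Using response As HttpResponseMessage = Await client.GetAsync(url, ct)
                Dim body As Byte() = Await response.Content.ReadAsByteArrayAsync()
                Return body.Length
            End Using
        Catch ex As OperationCanceledException When Not ct.IsCancellationRequested
            ' The caller's token was never canceled, so this is most likely
            ' HttpClient.Timeout expiring rather than a real cancellation.
            Console.WriteLine("Timed out: " & url)
            Return -1
        End Try
    End Function

End Module
```

If the caller's token was canceled, the filter does not match and the exception propagates as a normal cancellation, so the existing "Downloads canceled." handler still runs.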

I am aware of usage of the TaskCreationOptions enum and using LongRunning.

You should also be aware that you should almost never use them.
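For the "download the same URL again and again" requirement, a plain async loop per URL may be enough, with no LongRunning involved. This is a rough sketch under my own assumptions: PollUrlAsync, PollAllAsync, and the one-minute interval are hypothetical names and values chosen to match the question, not anything from the MSDN sample.

```vb
Imports System.Collections.Generic
Imports System.Linq
Imports System.Net.Http
Imports System.Threading
Imports System.Threading.Tasks

Module RepeatingDownloads

    ' Hypothetical helper: downloads one URL repeatedly until ct is canceled.
    Async Function PollUrlAsync(url As String, client As HttpClient,
                                interval As TimeSpan, ct As CancellationToken) As Task
        Do While Not ct.IsCancellationRequested
            Dim started As DateTime = DateTime.UtcNow
            Try
                Using response As HttpResponseMessage = Await client.GetAsync(url, ct)
                    Dim body As Byte() = Await response.Content.ReadAsByteArrayAsync()
                    Console.WriteLine(url & ": " & body.Length.ToString() & " bytes")
                End Using
            Catch ex As HttpRequestException
                ' A failed request should not end the loop for this URL.
                Console.WriteLine(url & " failed: " & ex.Message)
            Catch ex As OperationCanceledException When Not ct.IsCancellationRequested
                ' Request timeout, not a user cancellation: log it and try again.
                Console.WriteLine(url & " timed out")
            End Try

            ' Wait out whatever is left of the interval before the next round.
            Dim remaining As TimeSpan = interval - (DateTime.UtcNow - started)
            If remaining > TimeSpan.Zero Then
                Await Task.Delay(remaining, ct)
            End If
        Loop
    End Function

    ' Hypothetical helper: starts one loop per URL and completes only when
    ' every loop has ended, which in practice means after cancellation.
    Function PollAllAsync(urls As IEnumerable(Of String), client As HttpClient,
                          ct As CancellationToken) As Task
        Dim loops As List(Of Task) =
            urls.Select(Function(u) PollUrlAsync(u, client, TimeSpan.FromMinutes(1), ct)).ToList()
        Return Task.WhenAll(loops)
    End Function

End Module
```

Awaiting PollAllAsync(SetUpURLList(), client, cts.Token) in place of the single Task.WhenAll round would keep all 200 loops going until the cancel button is pressed. No thread is blocked while a request or delay is pending, so LongRunning would not buy anything here.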


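You are probably getting timeouts because all of your requests are hitting the same website. Try setting ServicePointManager.DefaultConnectionLimit to int.MaxValue (Integer.MaxValue in VB), and possibly also increase HttpClient.Timeout.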
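A minimal sketch of those two settings, assuming a .NET Framework application where HttpClient goes through ServicePointManager. CreateClient is a hypothetical helper, and the five-minute timeout is an arbitrary value picked for illustration.

```vb
Imports System.Net
Imports System.Net.Http

Module ClientSetup

    ' Hypothetical factory: call once at startup and reuse the returned client.
    Function CreateClient() As HttpClient
        ' Lift the per-host connection cap. On .NET Framework the default for
        ' client applications is 2 connections per host, so 200 requests to
        ' the same site queue up behind each other.
        ServicePointManager.DefaultConnectionLimit = Integer.MaxValue

        Dim client As New HttpClient()
        ' The default is 100 seconds. Timeout must be set before the first
        ' request is sent; the value here is arbitrary.
        client.Timeout = TimeSpan.FromMinutes(5)
        Return client
    End Function

End Module
```

For question 3, HttpClient.Timeout also accepts System.Threading.Timeout.InfiniteTimeSpan, although a large finite value is usually easier to live with because a hung request still ends eventually.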