WebClient does not support concurrent I/O operations - DownloadStringAsync - Scraping

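First of all, I have read the similar questions, but none of them gave me a coherent explanation. I use a BlockingCollection<WebClient> ClientQueue to hand out web clients. I attach a handler function to each client and start scraping asynchronously: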

// Create queue of WebClient instances
BlockingCollection<WebClient> ClientQueue = new BlockingCollection<WebClient>();
for (int i = 0; i < 10; i++)
{
   ClientQueue.Add(new WebClient());
}

//Triggering Async Calls
foreach (var item in source)
{
   var worker = ClientQueue.Take();
   worker.DownloadStringCompleted += (sender, e) => HandleJson(sender, e, ClientQueue, item);
   worker.DownloadStringAsync(uri);
}

public static void HandleJson(object sender, EventArgs e, BlockingCollection<WebClient> ClientQueue, string item)
{
   var res = (DownloadStringCompletedEventArgs) e;
   var jsonData = res.Result;
   var worker = (WebClient) sender;
   var root = JsonConvert.DeserializeObject<RootObject>(jsonData);
   // Record the data
   while (worker.IsBusy) Thread.Sleep(5); // wait for the webClient to be free
   ClientQueue.Add(worker);
 }

I get this error message:

WebClient does not support concurrent I/O operations.


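The problem is that every time a given WebClient is taken from the queue, you register a new handler for its worker.DownloadStringCompleted event without unregistering the previous one, so the handlers accumulate. As a result, HandleJson is called several times for each completed download, and ClientQueue.Add(worker) puts the same client back into the queue several times as well. Once one client sits in the queue more than once, two iterations of the loop can take it, and it is only a matter of time before two concurrent downloads are started on the same WebClient.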
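The accumulation can be reproduced without any network access. The following is only a minimal sketch: Worker, Completed, Finish and HandlerDemo are hypothetical names that stand in for WebClient and its DownloadStringCompleted event, and are not part of the original code.

```csharp
using System;

// Hypothetical stand-in for WebClient: one event, raised once per "download".
class Worker
{
    public event EventHandler Completed;
    public void Finish() => Completed?.Invoke(this, EventArgs.Empty);
}

static class HandlerDemo
{
    // Reuses one worker for `rounds` operations, subscribing a new handler
    // before each one, and returns the total number of handler invocations.
    public static int CountCalls(int rounds)
    {
        var worker = new Worker();
        int calls = 0;
        for (int i = 0; i < rounds; i++)
        {
            worker.Completed += (s, e) => calls++; // never unsubscribed
            worker.Finish();                       // invokes every handler added so far
        }
        return calls;
    }

    static void Main()
    {
        // Three operations trigger 1 + 2 + 3 handler calls.
        Console.WriteLine(CountCalls(3)); // prints 6
    }
}
```

In the question's code, each of those extra invocations also calls ClientQueue.Add(worker), which is how one client ends up in the queue more than once.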

This is easy to fix: register the event handler only once, when the WebClient is created, and remove the item parameter from the HandleJson method.

BlockingCollection<WebClient> ClientQueue = new BlockingCollection<WebClient>();
for (int i = 0; i < 2; i++)
{
    var worker = new WebClient();
    worker.DownloadStringCompleted += (sender, e) => HandleJson(sender, e, ClientQueue);
    ClientQueue.Add(worker);
}

If you need the item parameter, pass it as the second argument to DownloadStringAsync(uri, item) and read it back from the UserState property of the event arguments:

foreach (var item in source)
{
   var worker = ClientQueue.Take();
   worker.DownloadStringAsync(uri, item);
}

public static void HandleJson(object sender, DownloadStringCompletedEventArgs e, BlockingCollection<WebClient> ClientQueue)
{
    string item = (string)e.UserState; // the userToken passed to DownloadStringAsync
    ...
}