Do you need background workers or multiple threads to fire multiple Async HttpWebRequests?

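Overall goal

I'm trying to call the Google PageSpeed Insights API with multiple input URLs read from a .txt file and write the results to a .csv.

What I've tried

I wrote a console app to fire off these requests, add each response to a list as it comes back, and write the list to the .csv file once they have all completed (async got a little nutty when I tried to write responses to the .csv immediately).

My code is below and is far from optimized. I come from a JavaScript background, where I don't usually use web workers or any other managed new threads, so I was trying to do the same in C#.

  1. Can I run these multiple WebRequests and write them to a collection (or output file) without using multiple threads and have them all run in parallel, not having to wait for each request to come back before handling the next one?
  2. Is there a cleaner way to do this with callbacks?
  3. If threads or BackgroundWorkers are needed, what's a Clean Code way of doing this?

Initial example code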

static void Main(string[] args)
{
    Console.WriteLine("Begin Google PageSpeed Insights!");

    appMode = ConfigurationManager.AppSettings["ApplicationMode"];
    var inputFilePath = READ_WRITE_PATH + ConfigurationManager.AppSettings["InputFile"];
    var outputFilePath = READ_WRITE_PATH + ConfigurationManager.AppSettings["OutputFile"];

    var inputLines = File.ReadAllLines(inputFilePath).ToList();

    if (File.Exists(outputFilePath))
    {
        File.Delete(outputFilePath);
    }

    List<string> outputCache = new List<string>();

    foreach (var line in inputLines)
    {
        var requestDataFromPsi = CallPsiForPrimaryStats(line);
        Console.WriteLine($"Got response of {requestDataFromPsi.Result}");

        outputCache.Add(requestDataFromPsi.Result);
    }

    var writeTask = WriteCharacters(outputCache, outputFilePath);

    writeTask.Wait();

    Console.WriteLine("End Google PageSpeed Insights");
}

private static async Task<string> CallPsiForPrimaryStats(string url)
{
    HttpWebRequest myReq = (HttpWebRequest)WebRequest.Create($"https://www.googleapis.com/pagespeedonline/v2/runPagespeed?url={url}&strategy=mobile&key={API_KEY}");
    myReq.Method = WebRequestMethods.Http.Get;
    myReq.Timeout = 60000;
    myReq.Proxy = null;
    myReq.ContentType = "application/json";

    Task<WebResponse> task = Task.Factory.FromAsync(
            myReq.BeginGetResponse,
            asyncResult => myReq.EndGetResponse(asyncResult),
            (object)null);

    return await task.ContinueWith(t => ReadStreamFromResponse(t.Result));
}

private static string ReadStreamFromResponse(WebResponse response)
{
    using (Stream responseStream = response.GetResponseStream())
    using (StreamReader sr = new StreamReader(responseStream))
    {
        string jsonResponse = sr.ReadToEnd();
        dynamic jsonObj = Newtonsoft.Json.JsonConvert.DeserializeObject(jsonResponse);

        var psiResp = new PsiResponse()
        {
            Url = jsonObj.id,
            SpeedScore = jsonObj.ruleGroups.SPEED.score,
            UsabilityScore = jsonObj.ruleGroups.USABILITY.score,
            NumberResources = jsonObj.pageStats.numberResources,
            NumberHosts = jsonObj.pageStats.numberHosts,
            TotalRequestBytes = jsonObj.pageStats.totalRequestBytes,
            NumberStaticResources = jsonObj.pageStats.numberStaticResources,
            HtmlResponseBytes = jsonObj.pageStats.htmlResponseBytes,
            CssResponseBytes = jsonObj.pageStats.cssResponseBytes,
            ImageResponseBytes = jsonObj.pageStats.imageResponseBytes,
            JavascriptResponseBytes = jsonObj.pageStats.javascriptResponseBytes,
            OtherResponseBytes = jsonObj.pageStats.otherResponseBytes,
            NumberJsResources = jsonObj.pageStats.numberJsResources,
            NumberCssResources = jsonObj.pageStats.numberCssResources,
        };
        return CreateOutputString(psiResp);
    }
}

static async Task WriteCharacters(List<string> inputs, string outputFilePath)
{
    using (StreamWriter fileWriter = new StreamWriter(outputFilePath))
    {
        await fileWriter.WriteLineAsync(TABLE_HEADER);

        foreach (var input in inputs)
        {
            await fileWriter.WriteLineAsync(input);
        }
    }
}

private static string CreateOutputString(PsiResponse psiResponse)
{
    var stringToWrite = "";

    foreach (var prop in psiResponse.GetType().GetProperties())
    {
        stringToWrite += $"{prop.GetValue(psiResponse, null)},";
    }
    Console.WriteLine(stringToWrite);
    return stringToWrite;
}

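Update: after refactoring based on Stephen Cleary's tips

The problem is that this still runs slowly. It originally took 20 minutes, and after refactoring it still takes 20 minutes. It seems to be throttled somewhere, possibly by Google on the PageSpeed API. I tested it by calling https://www.google.com/, https://www.yahoo.com/, https://www.bing.com/ and 18 others, and it also runs slowly, with a bottleneck of only about 5 requests being processed at a time. I tried refactoring to run 5 test URLs, write to the file, and repeat, but that only marginally sped up the process.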

static void Main(string[] args) { MainAsync(args).Wait(); }
static async Task MainAsync(string[] args)
{
    Console.WriteLine("Begin Google PageSpeed Insights!");

    appMode = ConfigurationManager.AppSettings["ApplicationMode"];
    var inputFilePath = READ_WRITE_PATH + ConfigurationManager.AppSettings["InputFile"];
    var outputFilePath = READ_WRITE_PATH + ConfigurationManager.AppSettings["OutputFile"];

    var inputLines = File.ReadAllLines(inputFilePath).ToList();

    if (File.Exists(outputFilePath))
    {
        File.Delete(outputFilePath);
    }

    var tasks = inputLines.Select(line => CallPsiForPrimaryStats(line));
    var outputCache = await Task.WhenAll(tasks);

    await WriteCharacters(outputCache, outputFilePath);

    Console.WriteLine("End Google PageSpeed Insights");
}

private static async Task<string> CallPsiForPrimaryStats(string url)
{
    HttpWebRequest myReq = (HttpWebRequest)WebRequest.Create($"https://www.googleapis.com/pagespeedonline/v2/runPagespeed?url={url}&strategy=mobile&key={API_KEY}");
    myReq.Method = WebRequestMethods.Http.Get;
    myReq.Timeout = 60000;
    myReq.Proxy = null;
    myReq.ContentType = "application/json";
    Console.WriteLine($"Start call: {url}");

    // Try using `HttpClient()` later
    //var myReq2 = new HttpClient();
    //await myReq2.GetAsync($"https://www.googleapis.com/pagespeedonline/v2/runPagespeed?url={url}&strategy=mobile&key={API_KEY}");

    Task<WebResponse> task = Task.Factory.FromAsync(
        myReq.BeginGetResponse,
        myReq.EndGetResponse,
        (object)null);
    var result = await task;
    return ReadStreamFromResponse(result);
}

private static string ReadStreamFromResponse(WebResponse response)
{
    using (Stream responseStream = response.GetResponseStream())
    using (StreamReader sr = new StreamReader(responseStream))
    {
        string jsonResponse = sr.ReadToEnd();
        dynamic jsonObj = Newtonsoft.Json.JsonConvert.DeserializeObject(jsonResponse);

        var psiResp = new PsiResponse()
        {
            Url = jsonObj.id,
            SpeedScore = jsonObj.ruleGroups.SPEED.score,
            UsabilityScore = jsonObj.ruleGroups.USABILITY.score,
            NumberResources = jsonObj.pageStats.numberResources,
            NumberHosts = jsonObj.pageStats.numberHosts,
            TotalRequestBytes = jsonObj.pageStats.totalRequestBytes,
            NumberStaticResources = jsonObj.pageStats.numberStaticResources,
            HtmlResponseBytes = jsonObj.pageStats.htmlResponseBytes,
            CssResponseBytes = jsonObj.pageStats.cssResponseBytes,
            ImageResponseBytes = jsonObj.pageStats.imageResponseBytes,
            JavascriptResponseBytes = jsonObj.pageStats.javascriptResponseBytes,
            OtherResponseBytes = jsonObj.pageStats.otherResponseBytes,
            NumberJsResources = jsonObj.pageStats.numberJsResources,
            NumberCssResources = jsonObj.pageStats.numberCssResources,

        };
        return CreateOutputString(psiResp);
    }
}

static async Task WriteCharacters(IEnumerable<string> inputs, string outputFilePath)
{
    using (StreamWriter fileWriter = new StreamWriter(outputFilePath))
    {
        await fileWriter.WriteLineAsync(TABLE_HEADER);

        foreach (var input in inputs)
        {
            await fileWriter.WriteLineAsync(input);
        }
    }
}

private static string CreateOutputString(PsiResponse psiResponse)
{
    var stringToWrite = "";
    foreach (var prop in psiResponse.GetType().GetProperties())
    {
        stringToWrite += $"{prop.GetValue(psiResponse, null)},";
    }
    Console.WriteLine(stringToWrite);
    return stringToWrite;
}

Can I run do these multiple WebRequests and write them to a collection (or output file) without using multiple threads and have them all run in parallel, not having to wait for each request to come back before handling the next one?

Yes, you can.

Is there a cleaner way to do this with callbacks?

You can always iterate over the input lines and collect the resulting tasks, all of which are already running.

var resultTask = Task.WhenAll(
    inputLines.Select(line => CallPsiForPrimaryStats(line)).ToArray());

This is similar to using the Q library in JavaScript for promises. With .NET tasks, the host will start as many of the operations in parallel as it can.

resultTask is a Task<string[]>; once awaited, it gives you a collection of results you can work with, much like your outputCache.
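To make the shape of that result concrete, here is a minimal sketch. FakeCallAsync is a hypothetical stand-in for CallPsiForPrimaryStats (it only waits and returns a string), and the delays are invented so that earlier inputs finish last. Under those assumptions, the array returned by Task.WhenAll still follows the order of the input tasks, not the order of completion.

```csharp
using System;
using System.Linq;
using System.Threading.Tasks;

public static class WhenAllDemo
{
    // Hypothetical stand-in for CallPsiForPrimaryStats: waits, then returns a string.
    public static async Task<string> FakeCallAsync(string url, int delayMs)
    {
        await Task.Delay(delayMs);
        return $"result for {url}";
    }

    public static async Task<string[]> RunAllAsync(string[] urls)
    {
        // Earlier URLs get longer delays, so they complete last.
        // ToArray() forces every call to start before anything is awaited.
        var tasks = urls
            .Select((url, i) => FakeCallAsync(url, (urls.Length - i) * 50))
            .ToArray();

        // The result array is ordered like 'tasks', regardless of completion order.
        return await Task.WhenAll(tasks);
    }

    public static void Main()
    {
        var results = RunAllAsync(new[] { "a", "b", "c" }).GetAwaiter().GetResult();
        foreach (var r in results)
        {
            Console.WriteLine(r);
        }
    }
}
```

Because the order is preserved, each result can be matched back to its input line by index.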

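In the code you added above, calling .Result inside the loop is synchronous. Nothing happens in parallel. Be careful when waiting for all of them: you may run out of memory before everything has come back! It may be worth streaming the results to the file as they return, with a semaphore or lock to prevent them from writing to the stream at the same time.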
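One possible shape for that streaming approach is sketched below. It is an illustration under assumptions, not the original code: FakeCallAsync is a hypothetical replacement for the real PageSpeed call, and a SemaphoreSlim with a single permit plays the role of the lock, because the C# lock statement cannot contain an await.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class StreamingWriterDemo
{
    // Hypothetical stand-in for the real PageSpeed call.
    public static async Task<string> FakeCallAsync(string url)
    {
        await Task.Delay(50);
        return $"{url},ok";
    }

    public static async Task WriteAsTheyArriveAsync(string[] urls, TextWriter writer)
    {
        // One permit: only one task may write to the writer at a time.
        using (var gate = new SemaphoreSlim(1, 1))
        {
            var tasks = urls.Select(async url =>
            {
                var line = await FakeCallAsync(url);

                await gate.WaitAsync();
                try
                {
                    // Each line is written as soon as its request completes.
                    await writer.WriteLineAsync(line);
                }
                finally
                {
                    gate.Release();
                }
            }).ToArray();

            await Task.WhenAll(tasks);
        }
    }

    public static void Main()
    {
        using (var writer = new StringWriter())
        {
            WriteAsTheyArriveAsync(new[] { "a", "b", "c" }, writer).GetAwaiter().GetResult();
            Console.Write(writer.ToString());
        }
    }
}
```

Lines are written in completion order, so the output order may differ from the input order. In the real program the StringWriter would be a StreamWriter over the output file.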

Also, I think the WebClient class is now more idiomatic than hand-rolling an HttpWebRequest.

If threads or BackgroundWorkers are needed, what's a Clean Code way of doing this?

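This is the beauty of the Task library and the .NET async stack. You shouldn't need to do anything with threads.

It's important to understand the difference between async/await-style calls and synchronous calls. Anywhere you see async in the method declaration and await in the body, the code is releasing the current synchronous thread to do other work, such as firing off more tasks. When you see .Result or .Wait(), these are synchronous and therefore block the main synchronous thread. That means there is no easy way to run things in parallel.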
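The difference can be seen in a small timing sketch. FakeCallAsync is a hypothetical stand-in for a web request that takes about 200 ms, and the figures in the comments are rough expectations rather than measurements.

```csharp
using System;
using System.Diagnostics;
using System.Linq;
using System.Threading.Tasks;

public static class BlockingVsAwaitDemo
{
    // Hypothetical stand-in for a request that takes about 200 ms.
    public static async Task<int> FakeCallAsync(int id)
    {
        await Task.Delay(200);
        return id;
    }

    // Blocks on every call, so the calls run one after another
    // (roughly n * 200 ms in total).
    public static long RunBlocking(int n)
    {
        var sw = Stopwatch.StartNew();
        for (var i = 0; i < n; i++)
        {
            FakeCallAsync(i).Wait();
        }
        return sw.ElapsedMilliseconds;
    }

    // Starts every call first, then awaits them together
    // (roughly 200 ms in total, whatever n is).
    public static async Task<long> RunConcurrentAsync(int n)
    {
        var sw = Stopwatch.StartNew();
        var tasks = Enumerable.Range(0, n).Select(i => FakeCallAsync(i)).ToArray();
        await Task.WhenAll(tasks);
        return sw.ElapsedMilliseconds;
    }

    public static void Main()
    {
        Console.WriteLine($"Blocking:   {RunBlocking(4)} ms");
        Console.WriteLine($"Concurrent: {RunConcurrentAsync(4).GetAwaiter().GetResult()} ms");
    }
}
```

Neither version creates a thread explicitly; the only change is whether the caller blocks before starting the next call.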

Can I run do these multiple WebRequests and write them to a collection (or output file) without using multiple threads and have them all run in parallel, not having to wait for each request to come back before handling the next one?

Yes; what you're looking for is asynchronous concurrency, which uses Task.WhenAll.

Is there a cleaner way to do this with callbacks?

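async/await is cleaner than callbacks. JavaScript has moved from callbacks, to promises (similar to Task<T> in C#), to async/await (very similar to async/await in C#). The cleanest solution in both languages is now async/await.

There are a few gotchas in C#, though, mainly due to backwards compatibility.

1) In an async console application, you do need to block in the Main method. Generally speaking, this is the only time you should block on asynchronous code: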

static void Main(string[] args) { MainAsync(args).Wait(); }
static async Task MainAsync(string[] args)
{

Once you have an async MainAsync method, you can use Task.WhenAll for asynchronous concurrency:

  ...
  var tasks = inputLines.Select(line => CallPsiForPrimaryStats(line));
  var outputCache = await Task.WhenAll(tasks);
  await WriteCharacters(outputCache, outputFilePath);
  ...

2) You shouldn't use ContinueWith; it's a low-level, dangerous API. Use await instead:

private static async Task<string> CallPsiForPrimaryStats(string url)
{
  ...
  Task<WebResponse> task = Task.Factory.FromAsync(
      myReq.BeginGetResponse,
      myReq.EndGetResponse,
      (object)null);
  var result = await task;
  return ReadStreamFromResponse(result);
}
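One concrete difference between the two styles is how failures surface. The sketch below uses a hypothetical FailingCallAsync in place of a real request. Reading t.Result inside a ContinueWith callback on a faulted task throws an AggregateException, whereas await rethrows the original exception.

```csharp
using System;
using System.Threading.Tasks;

public static class ContinueWithVsAwaitDemo
{
    // Hypothetical stand-in for a request that fails.
    public static async Task<string> FailingCallAsync()
    {
        await Task.Delay(10);
        throw new InvalidOperationException("request failed");
    }

    // ContinueWith style: t.Result on the faulted task throws AggregateException,
    // so that is the exception type the caller ends up catching.
    public static async Task<string> ViaContinueWithAsync()
    {
        try
        {
            return await FailingCallAsync().ContinueWith(t => t.Result.ToUpperInvariant());
        }
        catch (Exception ex)
        {
            return ex.GetType().Name;
        }
    }

    // await style: the original exception type reaches the catch block.
    public static async Task<string> ViaAwaitAsync()
    {
        try
        {
            var result = await FailingCallAsync();
            return result.ToUpperInvariant();
        }
        catch (Exception ex)
        {
            return ex.GetType().Name;
        }
    }

    public static void Main()
    {
        Console.WriteLine(ViaContinueWithAsync().GetAwaiter().GetResult()); // AggregateException
        Console.WriteLine(ViaAwaitAsync().GetAwaiter().GetResult());        // InvalidOperationException
    }
}
```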

3) There are often more "async-friendly" types available. In this case, consider using HttpClient instead of HttpWebRequest; you'll find your code cleans up quite a bit.
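A rough sketch of what the HttpClient version might look like follows. The endpoint and query parameters are taken from the question; the class name, the BuildRequestUri helper, the apiKey parameter, and the choice to escape the URL with Uri.EscapeDataString are assumptions added for illustration. The 60-second timeout mirrors the original myReq.Timeout.

```csharp
using System;
using System.Net.Http;
using System.Threading.Tasks;

public static class PsiHttpClientSketch
{
    // A single HttpClient instance is shared by all requests.
    private static readonly HttpClient Client =
        new HttpClient { Timeout = TimeSpan.FromSeconds(60) };

    // Hypothetical helper: builds the request URI, escaping the page URL
    // so its own ':' and '/' characters do not break the query string.
    public static string BuildRequestUri(string url, string apiKey)
    {
        return "https://www.googleapis.com/pagespeedonline/v2/runPagespeed"
            + $"?url={Uri.EscapeDataString(url)}&strategy=mobile&key={Uri.EscapeDataString(apiKey)}";
    }

    // Returns the raw JSON body; parsing would stay as in ReadStreamFromResponse.
    public static Task<string> GetPsiJsonAsync(string url, string apiKey)
    {
        return Client.GetStringAsync(BuildRequestUri(url, apiKey));
    }

    public static void Main()
    {
        // Only prints the URI; no network call is made here.
        Console.WriteLine(BuildRequestUri("https://www.google.com/", "KEY"));
    }
}
```

With this in place, CallPsiForPrimaryStats would reduce to awaiting GetPsiJsonAsync and passing the JSON string to the existing parsing code, with no FromAsync wrapper needed.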