How should I use Task.Run in my code for proper scalability and performance?

I have started to have serious doubts about my code, and I need some advice from more experienced programmers.

When a button is clicked in my application, the application runs a command that calls the ScrapJockeys method:

if (UpdateJockeysPl) await ScrapJockeys(JPlFrom, JPlTo + 1, "jockeysPl"); //1 - 1049

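ScrapJockeys runs a for loop that repeats a block of code 20K to 150K times, depending on the case. Inside the loop I need to call a service method that takes a long time to execute. I also want to be able to cancel the loop and everything that is going on inside the loop/method.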

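Right now I am using an approach with a list of tasks, and Task.Run is fired inside the loop. Inside each task I call an awaited service method, which cuts the total execution time to about 1/4 of what the synchronous code takes. Each task is also given a cancellation token, as in this example (GitHub link):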

public async Task ScrapJockeys(int startIndex, int stopIndex, string dataType)
{
    //init values and controls in here
    List<Task> tasks = new List<Task>();
    for (int i = startIndex; i < stopIndex; i++)
    {
        int j = i;
        Task task = Task.Run(async () =>
        {
            LoadedJockey jockey = new LoadedJockey();

            CancellationToken.ThrowIfCancellationRequested();

            if (dataType == "jockeysPl") jockey = await _scrapServices.ScrapSingleJockeyPlAsync(j);
            if (dataType == "jockeysCz") jockey = await _scrapServices.ScrapSingleJockeyCzAsync(j);

            //doing some stuff with results in here

        }, TokenSource.Token);

        tasks.Add(task);
    }

    try
    {
        await Task.WhenAll(tasks);
    }
    catch (OperationCanceledException)
    {
        //
    }
    finally
    {
        await _dataServices.SaveAllJockeysAsync(Jockeys.ToList()); //saves everything to JSON file

        //doing some stuff with UI props in here
    }
}

As for my question: is everything OK with my code? According to this article:

Many async newbies start off by trying to treat asynchronous tasks the same as parallel (TPL) tasks and this is a major misstep.

So what should I use instead?

And according to this article:

On a busy server, this kind of implementation can kill scalability.

So what should I do then?

Note that the service interface method signature is Task<LoadedJockey> ScrapSingleJockeyPlAsync(int index);

I am also not 100% sure that I am using Task.Run correctly in my service class. The methods inside wrap their code in await Task.Run(() =>, as in this example (GitHub link):

public async Task<LoadedJockey> ScrapSingleJockeyPlAsync(int index)
{
    LoadedJockey jockey = new LoadedJockey();
    await Task.Run(() =>
    {
        //do some time consuming things

    });

    return jockey;
}

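As far as I understand from the articles, this is a kind of anti-pattern. But I am a bit confused. According to another source I found, it should be fine...? If not, how should I replace it?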

As far as I understand from the articles, this is a kind of anti-pattern.

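It is an anti-pattern. But if you cannot modify the service implementation, you should at least be able to execute the tasks in parallel. Something like this: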

public async Task ScrapJockeys(int startIndex, int stopIndex, string dataType)
{
    ConcurrentBag<Task> tasks = new ConcurrentBag<Task>();
    ParallelOptions parallelLoopOptions = new ParallelOptions() { CancellationToken = CancellationToken };
    Parallel.For(startIndex, stopIndex, parallelLoopOptions, i =>
    {
        int j = i;
        switch (dataType)
        {
            case "jockeysPl":
                tasks.Add(_scrapServices.ScrapSingleJockeyPlAsync(j));
                break;
            case "jockeysCz":
                tasks.Add(_scrapServices.ScrapSingleJockeyCzAsync(j));
                break;
        }
    });

    try
    {
        await Task.WhenAll(tasks);
    }
    catch (OperationCanceledException)
    {
        //
    }
    finally
    {
        await _dataServices.SaveAllJockeysAsync(Jockeys.ToList()); //saves everything to JSON file

        //doing some stuff with UI props in here
    }
}

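On the UI side, you should use Task.Run when you have CPU-bound code that runs long enough that you need to move it off the UI thread. This is completely different from the server side, where using Task.Run at all is an anti-pattern.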

In your case, all of your code seems to be I/O-based, so I don't see any need for Task.Run at all.
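To make that distinction concrete, here is a minimal sketch that is not taken from the question's code. The class `TaskRunGuidance` and its method names are hypothetical, and a `TextReader` stands in for whatever real I/O source you have. The CPU-bound method is pushed to the thread pool with Task.Run, which is what a UI event handler might await, while the I/O-bound method simply awaits the asynchronous API.

```csharp
using System;
using System.IO;
using System.Threading.Tasks;

// Hypothetical helper class, for illustration only.
public static class TaskRunGuidance
{
    // CPU-bound work: a plain synchronous computation.
    public static long SumOfSquares(int count)
    {
        long total = 0;
        for (int i = 1; i <= count; i++)
        {
            total += (long)i * i;
        }
        return total;
    }

    // What a UI event handler might await: Task.Run moves the computation
    // to a thread-pool thread so the UI thread stays free.
    public static Task<long> SumOfSquaresOffUiThreadAsync(int count)
    {
        return Task.Run(() => SumOfSquares(count));
    }

    // I/O-bound work: await the asynchronous API directly, no Task.Run.
    public static async Task<int> CountCharactersAsync(TextReader reader)
    {
        string text = await reader.ReadToEndAsync().ConfigureAwait(false);
        return text.Length;
    }
}
```

In a button handler you would write something like `long sum = await TaskRunGuidance.SumOfSquaresOffUiThreadAsync(n);` for the computation, and just `await` the reader-based method for the I/O.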

There is a statement in your question that conflicts with the code you provided:

I am calling an awaited service method

public async Task<LoadedJockey> ScrapSingleJockeyPlAsync(int index)
{
    await Task.Run(() =>
    {
        //do some time consuming things
    });
}

The lambda passed to Task.Run is not async, so the service method cannot possibly be awaited inside it. And indeed, it is not.
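As a rough illustration of the difference, the sketch below puts a synchronous body wrapped in Task.Run next to a method that really awaits. Everything here is an assumption made for the example: the class `AsyncWrappers`, both method names, and the use of Thread.Sleep and Task.Delay to simulate a blocking call and an asynchronous I/O call. Both return the same result, but the first keeps a thread-pool thread blocked for the whole wait, while the second holds no thread while it waits.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper class, for illustration only.
public static class AsyncWrappers
{
    // "Fake async": the lambda is synchronous, so a thread-pool thread
    // sits blocked in Thread.Sleep for the whole duration.
    public static Task<string> FetchBlockingAsync(int id, int delayMs)
    {
        return Task.Run(() =>
        {
            Thread.Sleep(delayMs); // simulates a blocking download
            return $"jockey-{id}";
        });
    }

    // Truly asynchronous: no thread is held while the delay is pending.
    public static async Task<string> FetchAsync(int id, int delayMs)
    {
        await Task.Delay(delayMs).ConfigureAwait(false); // simulates async I/O
        return $"jockey-{id}";
    }
}
```

With tens of thousands of calls in flight, the first version needs a blocked thread per pending call, which is where the scalability concern comes from.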

A better solution is to load the HTML asynchronously (for example, using HttpClient.GetStringAsync) and then call HtmlDocument.LoadHtml, like this:

public async Task<LoadedJockey> ScrapSingleJockeyPlAsync(int index)
{
  LoadedJockey jockey = new LoadedJockey();
  ...
  string link = sb.ToString();

  var html = await httpClient.GetStringAsync(link).ConfigureAwait(false);
  HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
  doc.LoadHtml(html);

  if (jockey.Name == null)
  ...

  return jockey;
}

And remove Task.Run from the for loop:

private async Task ScrapJockey(int index, string dataType)
{
  LoadedJockey jockey = new LoadedJockey();

  CancellationToken.ThrowIfCancellationRequested();

  if (dataType == "jockeysPl") jockey = await _scrapServices.ScrapSingleJockeyPlAsync(index).ConfigureAwait(false);
  if (dataType == "jockeysCz") jockey = await _scrapServices.ScrapSingleJockeyCzAsync(index).ConfigureAwait(false);

  //doing some stuff with results in here
}

public async Task ScrapJockeys(int startIndex, int stopIndex, string dataType)
{
  //init values and controls in here

  List<Task> tasks = new List<Task>();
  for (int i = startIndex; i < stopIndex; i++)
  {
    // Pass the loop index so that each call scrapes a different jockey
    tasks.Add(ScrapJockey(i, dataType));
  }

  try
  {
    await Task.WhenAll(tasks);
  }
  catch (OperationCanceledException)
  {
    //
  }
  finally
  {
    await _dataServices.SaveAllJockeysAsync(Jockeys.ToList()); //saves everything to JSON file

    //doing some stuff with UI props in here
  }
}
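
Since you also want to be able to cancel whatever is happening inside the loop, one option is to pass the token down into the asynchronous calls themselves rather than only checking it up front. The following is only a sketch under assumptions: `CancellableScrape`, `FetchOneAsync` and `FetchRangeAsync` are hypothetical names, and Task.Delay stands in for the real service call. It relies on the fact that Task.Delay accepts a CancellationToken and cancels the pending wait when the token fires.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper class, for illustration only.
public static class CancellableScrape
{
    // Stand-in for a service call such as ScrapSingleJockeyPlAsync.
    // Task.Delay simulates network latency and observes the token.
    public static async Task<string> FetchOneAsync(int index, CancellationToken token)
    {
        await Task.Delay(50, token).ConfigureAwait(false);
        return $"jockey-{index}";
    }

    // Starts one operation per index and returns whatever completed
    // successfully, even if the run was cancelled part-way through.
    public static async Task<List<string>> FetchRangeAsync(
        int startIndex, int stopIndex, CancellationToken token)
    {
        var tasks = new List<Task<string>>();
        for (int i = startIndex; i < stopIndex; i++)
        {
            tasks.Add(FetchOneAsync(i, token));
        }

        try
        {
            await Task.WhenAll(tasks).ConfigureAwait(false);
        }
        catch (OperationCanceledException)
        {
            // Cancellation requested: fall through and keep finished results.
        }

        var results = new List<string>();
        foreach (Task<string> task in tasks)
        {
            if (task.Status == TaskStatus.RanToCompletion)
            {
                results.Add(task.Result);
            }
        }
        return results;
    }
}
```

In your code the token would come from TokenSource.Token, and the real service methods would need an extra CancellationToken parameter that they forward to the underlying HTTP call.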