Using Parallel.ForEach vs. TPL Dataflow or another solution for applying OCR to a large number of images

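I am creating an application to batch-OCR images. Right now I am using an asynchronous Parallel.ForEach call to iterate over a list of objects (printouts), each of which holds the file name and a field for the OCR'ed text.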

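I was wondering whether this is the best way to go about it. I have read about TPL Dataflow, and while it looks like overkill, I am wondering whether a more sophisticated approach would be better, since I could be processing hundreds of files at a time and I'm not sure if having hundreds of tasks created is good practice. Also, I have read that using Interlocked.Increment inside a Parallel.ForEach is bad practice; should I convert it to a Parallel.For? Below is my current implementation: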

private async void BatchOCR_Click(object sender, EventArgs e) {
   //some UI stuff
   source = new CancellationTokenSource();
   progressBar.Value = 0;
   int counter = 0;
   IProgress<int> progress = new Progress<int>(i => { progressBar.Value = (int)Math.Round((float)(i)*100 / fileList.Items.Count, 0); });

   await Task.Run(() => RunBatchOCR(ListOfPrintouts,progress,counter), source.Token);
   //some UI stuff
}
private async Task RunBatchOCR(List<Printout> printouts,IProgress<int> progress, int counter) {
   progress.Report(0);
   Parallel.ForEach(printouts, (printout,state) =>
      {
         try
         {
            source.Token.ThrowIfCancellationRequested();
         }
         catch
         {
            Console.WriteLine("Task was cancelled");
            cancelButton.Enabled = false;
            state.Break();
          }
          finally
          {
             Interlocked.Increment(ref counter);
          }
          printout.OcrHelper.runOCR(); //loads bitmap and extracts text
          progress.Report(counter);
          Console.WriteLine(counter.ToString());
    });
}

I'm not sure if having hundreds of tasks created is good practice

That's fine. Parallel.ForEach uses intelligent partitioning, so it does not create one task per item.
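As a rough illustration of the point about partitioning, here is a minimal sketch that measures how many loop bodies actually run at the same time. It assumes a dummy workload (Thread.Sleep) in place of the real OCR call, and the names PartitioningDemo and MaxObservedConcurrency are made up for this example. It also caps the loop with ParallelOptions.MaxDegreeOfParallelism, which is optional; without it, the scheduler decides how many workers to use.

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class PartitioningDemo
{
    // Runs a dummy workload over itemCount items and returns the highest
    // number of loop bodies that were observed executing at the same time.
    public static int MaxObservedConcurrency(int itemCount, int maxDegreeOfParallelism)
    {
        int running = 0;      // loop bodies executing right now
        int maxRunning = 0;   // highest value of 'running' seen so far
        var options = new ParallelOptions { MaxDegreeOfParallelism = maxDegreeOfParallelism };

        Parallel.ForEach(Enumerable.Range(0, itemCount), options, _ =>
        {
            int now = Interlocked.Increment(ref running);

            // Lock-free "max" update: retry until our value is stored
            // or another iteration has already stored a larger one.
            int seen;
            while (now > (seen = Volatile.Read(ref maxRunning)) &&
                   Interlocked.CompareExchange(ref maxRunning, now, seen) != seen)
            {
            }

            Thread.Sleep(1); // stand-in for the real OCR work (assumption)
            Interlocked.Decrement(ref running);
        });

        return maxRunning;
    }

    public static void Main()
    {
        int max = MaxObservedConcurrency(itemCount: 500, maxDegreeOfParallelism: 4);
        Console.WriteLine($"500 items, at most {max} processed concurrently");
    }
}
```

Hundreds of items go in, but only a handful are in flight at any moment, so a list of a few hundred printouts should not translate into hundreds of simultaneous tasks.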

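Regarding the rest of the code: Interlocked is fine to use as a counter, but you don't want to access the same variable without an interlocked barrier. The CancellationToken code also needs to be simplified: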

private async Task RunBatchOCR(List<Printout> printouts, IProgress<int> progress)
{
  int counter = 0;
  progress?.Report(0);
  try
  {
    // Passing the token via ParallelOptions lets Parallel.ForEach stop starting new
    // iterations and throw OperationCanceledException once cancellation is requested.
    Parallel.ForEach(printouts, new ParallelOptions { CancellationToken = source.Token }, printout =>
    {
      printout.OcrHelper.runOCR(); //loads bitmap and extracts text
      // Use the value returned by Interlocked.Increment instead of reading counter again.
      var update = Interlocked.Increment(ref counter);
      progress?.Report(update);
      Console.WriteLine(update.ToString());
    });
  }
  catch (OperationCanceledException)
  {
    Console.WriteLine("Task was cancelled");
    cancelButton.Enabled = false;
  }
}
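To make the remark about the interlocked barrier concrete, here is a small sketch of why the value returned by Interlocked.Increment is the one to report. CounterDemo and CollectReportedValues are hypothetical names for this example, and the loop body does no real work.

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class CounterDemo
{
    // Increments a shared counter once per item and collects the values
    // returned by Interlocked.Increment.
    public static int[] CollectReportedValues(int itemCount)
    {
        int counter = 0;
        var reported = new int[itemCount];

        Parallel.For(0, itemCount, i =>
        {
            // The return value is the counter as it was right after this
            // particular increment, so no two iterations get the same number.
            int update = Interlocked.Increment(ref counter);
            reported[update - 1] = update; // each slot is written by exactly one iteration

            // By contrast, reading 'counter' directly at this point could also
            // observe increments made by other iterations in the meantime.
        });

        return reported;
    }

    public static void Main()
    {
        int[] values = CollectReportedValues(1000);
        bool eachOnce = values.SequenceEqual(Enumerable.Range(1, 1000));
        Console.WriteLine($"Every value from 1 to 1000 reported exactly once: {eachOnce}");
    }
}
```

In the original code, progress.Report(counter) reads the variable separately from the increment, so two iterations can report the same number or skip one; reporting the returned value avoids that.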