How to make the parallel version of this dependent nested for, and why is the collapse not working

Question

How can I parallelize this with OpenMP 3.1? I tried collapse, but the compiler says:

 error: initializer expression refers to iteration variable ‘k’
   for (j = k+1; j < N; ++j){

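When I try a plain parallel for, it looks as if the threads sometimes do the same work and skip iterations, so sometimes the result is larger and other times smaller.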

int N = 100;
int *x;
x = (int*) malloc ((N+1)*sizeof(int));
//... initialization of the array x ...
// ...
for (k = 1; k < N-1; ++k)
{
    for (j = k+1; j < N; ++j)
    {
        s = x[k] + x[j];
        if (fn(s) == 1) {
            count++;
        }
    }
}

count should be 62, but it comes out random.

Answer 1

Given the code snippet you provided, the relevant OpenMP 3.1 restriction on loops associated with a collapse clause is:

The iteration count for each associated loop is computed before entry to the outermost loop. If execution of any associated loop changes any of the values used to compute any of the iteration counts, then the behavior is unspecified.

Since the iteration count of the inner loop depends on the outer loop's iteration variable (i.e., j = k+1), you cannot do the following:

#pragma omp parallel for collapse(2) schedule(static, 1) private(j) reduction(+:count)
for (k = 1; k < N-1; ++k)
    for (j = k+1; j < N; ++j)
       ...
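To make the dependency concrete: in the question's loop nest the number of inner iterations is a function of k, so it cannot be computed once "before entry to the outermost loop" as the quoted rule requires. Summing over k gives the size of the triangular (k, j) iteration space:

```latex
\mathrm{trips}_j(k) = (N-1) - (k+1) + 1 = N-1-k,
\qquad
T = \sum_{k=1}^{N-2} (N-1-k) = \sum_{m=1}^{N-2} m = \frac{(N-2)(N-1)}{2}
```

For N = 100 this gives T = 98 · 99 / 2 = 4851 pairs.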

Moreover, the OpenMP 3.1 "Loop Construct" section (the part relevant to this question) reads:

for (init-expr; test-expr; incr-expr) structured-block

where init-expr is one of the following:

...
integer-type var = lb
...

and test-expr is:

...
var relational-op b

with the following restriction on lb and b:

Loop invariant expressions of a type compatible with the type of var.

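Nevertheless, as @Hristo Iliev pointed out, "this has changed in 5.0, with the addition of support for non-rectangular loops." In the OpenMP 5.0 "Loop Construct" section, the restriction on lb and b is now: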

Expressions of a type compatible with the type of var that are loop invariant with respect to the outermost associated loop or are one of the following (where var-outer, a1, and a2 have a type compatible with the type of var, var-outer is var from an outer associated loop, and a1 and a2 are loop invariant integer expressions with respect to the outermost loop):

...

var-outer + a2

...

As an alternative to the collapse clause, you can use a normal parallel for. Keep in mind that you have a race condition on the updates of the variable count.

// s is also written inside the loop, so (if declared outside) it must be private too
#pragma omp parallel for schedule(static, 1) private(j, s) reduction(+:count)
for (k = 1; k < N-1; ++k)
{
    for (j = k+1; j < N; ++j)
    {
        s = x[k] + x[j];
        if (fn(s) == 1) {
            count++;
        }
    }
}

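Important note: although k does not have to be declared private, because it is the variable of the loop being parallelized and OpenMP implicitly makes it private, the same does not apply to the variable j. Hence, one of the reasons for: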

Count must be 62 but is random

The other is the missing reduction(+:count).
