Parallelize Algorithm with OpenMP in C++

Question

My problem is:

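I want to solve the TSP with an ant colony optimization algorithm in C++. So far I have implemented an algorithm that solves the problem iteratively.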

For example, I generate 500 ants, and they find their routes one after another: each ant only starts once the previous ant has finished.

Now I want to parallelize the whole thing, and I have considered using OpenMP.

So my first question is: Can I generate a large number of threads that work simultaneously (for the number of ants > 500)?

I have already tried it. This is my code in main.cpp:

 #pragma omp parallel for       
    for (auto ant = antarmy.begin(); ant != antarmy.end(); ++ant) {
        #pragma omp ordered
        if (ant->getIterations() < ITERATIONSMAX) {
            ant->setNumber(currentAntNumber);
            currentAntNumber++;
            ant->antRoute();
        }

    }

This is the code in my Ant class that is "critical", because every ant reads and writes the same matrix (the pheromone matrix):

 void Ant::antRoute()
 {
     this->route.setCity(0, this->getStartIndex());
     int nextCity = this->getNextCity(this->getStartIndex());
     this->routedistance += this->data->distanceMatrix[this->getStartIndex()][nextCity];
     int tempCity;
     int i = 2;
     this->setProbability(nextCity);
     this->setVisited(nextCity);
     this->route.setCity(1, nextCity);
     updatePheromone(this->getStartIndex(), nextCity, routedistance, 0);

     while (this->getVisitedCount() < datacitycount) {
         tempCity = nextCity;
         nextCity = this->getNextCity(nextCity);
         this->setProbability(nextCity);
         this->setVisited(nextCity);
         this->route.setCity(i, nextCity);
         this->routedistance += this->data->distanceMatrix[tempCity][nextCity];
         updatePheromone(tempCity, nextCity, routedistance, 0);
         i++;
     }

     this->routedistance += this->data->distanceMatrix[nextCity][this->getStartIndex()];
     // updatePheromone(-1, -1, -1, 1);
     ShortestDistance(this->routedistance);
     this->iterationsshortestpath++;
}

void Ant::updatePheromone(int i, int j, double distance, bool reduce)
{

     #pragma omp critical(pheromone) 

     if (reduce == 1) {
        for (int x = 0; x < datacitycount; x++) {
             for (int y = 0; y < datacitycount; y++) {
                 if (REDUCE * this->data->pheromoneMatrix[x][y] < 0)
                     this->data->pheromoneMatrix[x][y] = 0.0;
                 else
                    this->data->pheromoneMatrix[x][y] -= REDUCE * this->data->pheromoneMatrix[x][y];
             }
         }
     }
     else {

         double currentpheromone = this->data->pheromoneMatrix[i][j];
         double updatedpheromone = (1 - PHEROMONEREDUCTION)*currentpheromone + (PHEROMONEDEPOSIT / distance);

         if (updatedpheromone < 0.0) {
            this->data->pheromoneMatrix[i][j] = 0;
            this->data->pheromoneMatrix[j][i] = 0;
         }
          else {
             this->data->pheromoneMatrix[i][j] = updatedpheromone;
             this->data->pheromoneMatrix[j][i] = updatedpheromone;
         }
     }

 }

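For some reason, omp parallel for does not work with loops over iterators like this one. So here is my second question: if you have any suggestions on how to write such a range-based loop, I would be happy to hear them.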

Thanks for your help.

Answer 1

So my first question is: Can I generate a large number of threads that work simultaneously (for the number of ants > 500)?

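In OpenMP you should generally not worry about how many threads are active. Instead, make sure you expose enough parallel work through worksharing constructs such as omp for or omp task. So although you may have a loop with 500 iterations, your program may run with anywhere between one and 500 threads (or more, but the extra threads would just sit idle). This is different from other parallelization approaches such as pthreads, where you have to manage all the threads and what they do yourself.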
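A minimal sketch of that idea, assuming a hypothetical `Ant` stand-in whose `antRoute()` only touches its own members (the real tour construction is replaced by a placeholder). The loop exposes one unit of work per ant and leaves the thread count to the runtime. It uses a plain signed index loop, the canonical form for `omp parallel for`; OpenMP 3.0 and later also accept random-access iterator loops like the one in the question. `schedule(dynamic)` is an assumed choice for tours of uneven length.

```cpp
#include <cassert>
#include <vector>
#ifdef _OPENMP
#include <omp.h>
#endif

// Hypothetical stand-in for the question's Ant class; the real antRoute()
// builds a tour, here it just records a value derived from the ant number.
struct Ant {
    int number = -1;
    double routeDistance = 0.0;
    void antRoute() {
        assert(number >= 0);             // number must be set beforehand
        routeDistance = 10.0 * number;   // placeholder for real work
    }
};

// Number of threads OpenMP would use for the next parallel region
// (1 when the code is compiled without OpenMP support).
int availableThreads() {
#ifdef _OPENMP
    return omp_get_max_threads();
#else
    return 1;
#endif
}

// One iteration per ant: the runtime distributes the iterations over its
// thread team, so 500 ants does not mean 500 threads.
void runAnts(std::vector<Ant>& army) {
    const long n = static_cast<long>(army.size());
    #pragma omp parallel for schedule(dynamic)
    for (long i = 0; i < n; ++i) {
        army[i].number = static_cast<int>(i);  // number derived from index
        army[i].antRoute();                    // touches only this ant
    }
}
```

Calling `runAnts` on a `std::vector<Ant>` of 500 elements gives the same result whether `availableThreads()` reports 1 or 64; only the run time changes.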

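Now, your example uses ordered incorrectly. Ordered is only useful when a small part of the loop body needs to be executed in sequential order, and even then it can cause serious performance problems. If you want to use ordered inside the loop, you also need to declare the loop itself as ordered. See also this excellent answer.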
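For reference, a minimal sketch of correct ordered usage: the loop carries the `ordered` clause, and only a small step inside the body sits in an `ordered` construct. The squaring is a hypothetical placeholder for expensive, independent work; `schedule(static, 1)` is an assumed choice that lets consecutive iterations run on different threads.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Computes i*i for i in [0, n) in parallel, but appends the results in loop
// order. Both the "ordered" clause on the loop and the "ordered" construct
// inside the body are required; only the append step is serialized.
std::vector<int> squaresInOrder(int n) {
    assert(n >= 0);
    std::vector<int> out;
    out.reserve(static_cast<std::size_t>(n));
    #pragma omp parallel for ordered schedule(static, 1)
    for (int i = 0; i < n; ++i) {
        const int value = i * i;  // placeholder for independent work
        #pragma omp ordered
        {
            out.push_back(value);  // runs for i = 0, 1, 2, ... in sequence
        }
    }
    return out;
}
```

Threads still wait for each other at the ordered step, which is why it only pays off when the ordered part is small compared to the rest of the iteration.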

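You should not use ordered. Instead, make sure the ants know their number beforehand, write the code so that they do not need a number at all, or at least make sure the order of the numbers does not matter to the ants. In the latter case you can use omp atomic capture.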
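A minimal sketch of the atomic capture option, assuming a shared counter that plays the role of `currentAntNumber` from the question; the function name `assignNumbers` is hypothetical. Each ant reads and increments the counter in one indivisible step, so every ant gets a distinct number without `ordered` or `critical`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hands out unique numbers from a shared counter (the role played by
// currentAntNumber in the question) using "omp atomic capture".
std::vector<int> assignNumbers(int antCount, int firstNumber) {
    assert(antCount >= 0);
    std::vector<int> numbers(static_cast<std::size_t>(antCount), -1);
    int currentAntNumber = firstNumber;  // shared by all threads
    #pragma omp parallel for
    for (int ant = 0; ant < antCount; ++ant) {
        int myNumber;
        // Read and increment in one indivisible step: every ant gets a
        // distinct number, but which ant gets which one can vary per run.
        #pragma omp atomic capture
        myNumber = currentAntNumber++;
        numbers[static_cast<std::size_t>(ant)] = myNumber;
    }
    return numbers;
}
```

If the ants need deterministic numbers, deriving the number from the loop index (the first option above) is simpler and avoids the shared counter entirely.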

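Regarding access to shared data: try to avoid it. Adding omp critical is a first step toward a correct parallel program, but it often leads to performance problems. Measure your parallel efficiency and use parallel performance analysis tools to find out whether this is the case for you. Then you can use atomic data access or a reduction (each thread has its own data, and the data from all threads is merged only after the main work is done).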
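A minimal sketch of the "each thread has its own data" approach for the pheromone matrix. Assumptions (all hypothetical, not from the question): routes are given as lists of city indices, the matrix is a flat `n*n` vector, each ant deposits `deposit / routeLength` on every edge of its closed tour, and evaporation is applied separately once per iteration rather than per edge as in `updatePheromone`. Each thread accumulates into a private buffer and enters the critical section only once, to merge.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Adds every ant's pheromone deposit to a flat n*n matrix. Each thread
// sums into a private buffer, then merges it once under a critical section.
void depositPheromone(std::vector<double>& pheromone, int n,
                      const std::vector<std::vector<int>>& routes,
                      const std::vector<double>& routeLengths,
                      double deposit) {
    assert(routes.size() == routeLengths.size());
    assert(pheromone.size() == static_cast<std::size_t>(n) * n);
    const long antCount = static_cast<long>(routes.size());
    #pragma omp parallel
    {
        // Private per-thread accumulator: no locking while ants are processed.
        std::vector<double> local(static_cast<std::size_t>(n) * n, 0.0);

        #pragma omp for
        for (long a = 0; a < antCount; ++a) {
            const std::vector<int>& r = routes[a];
            const double amount = deposit / routeLengths[a];
            for (std::size_t k = 0; k < r.size(); ++k) {
                const int i = r[k];
                const int j = r[(k + 1) % r.size()];  // last edge closes the tour
                local[i * n + j] += amount;
                local[j * n + i] += amount;           // symmetric TSP
            }
        }

        // Merge once per thread instead of once per edge.
        #pragma omp critical(pheromone_merge)
        for (std::size_t idx = 0; idx < local.size(); ++idx)
            pheromone[idx] += local[idx];
    }
}
```

The trade-off is memory: every thread holds its own `n*n` buffer, which is cheap for a few hundred cities but worth measuring for larger instances.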
