在 pop() 期间提升 Fibonacci 堆访问冲突

Question

上下文

我目前正在实施某种形式的 A* 算法。我决定使用 boost 的斐波那契堆作为底层优先级队列。

算法运行时正在构建我的图表。作为 Vertex 对象，我正在使用：

class Vertex {
public:
    Vertex(double, double);
    double distance = std::numeric_limits<double>::max();
    double heuristic = 0;
    HeapData* fib;
    Vertex* predecessor = nullptr;
    std::vector<Edge*> adj;

    double euclideanDistanceTo(Vertex* v);
}

我的 Edge 看起来像：

class Edge {
public:
    Edge(Vertex*, double);
    Vertex* vertex = nullptr;
    double weight = 1;
}

为了使用 boosts 斐波那契堆，我读到应该创建一个堆数据对象，我就是这样做的：

struct HeapData {
    Vertex* v;
    boost::heap::fibonacci_heap<HeapData>::handle_type handle;

    HeapData(Vertex* u) {
        v = u;
    }

    bool operator<(HeapData const& rhs) const {
        return rhs.v->distance + rhs.v->heuristic < v->distance + v->heuristic;
    }
};

请注意，我在比较器中包含了启发式和实际距离以获得我想要的 A* 行为。

我实际的 A* 实现如下所示：

    boost::heap::fibonacci_heap<HeapData> heap;

    HeapData fibs(startPoint);
    startPoint->distance = 0;
    startPoint->heuristic = getHeuristic(startPoint);
    auto handles = heap.push(fibs);
    (*handles).handle = handles;

    while (!heap.empty()) {
        HeapData u = heap.top();
        heap.pop();

        if (u.v->equals(endPoint)) {
            return;
        }

        doSomeGraphCreationStuff(u.v); // this only creates vertices and edges

        for (Edge* e : u.v->adj) {
            double newDistance = e->weight + u.v->distance;

            if (e->vertex->distance > newDistance) {

                e->vertex->distance = newDistance;
                e->vertex->predecessor = u.v;

                if (!e->vertex->fib) {
                    if (!u.v->equals(endPoint)) {
                        e->vertex->heuristic = getHeuristic(e->vertex);
                    }
                    e->vertex->fib = new HeapData(e->vertex);
                    e->vertex->fib->handle = heap.push(*(e->vertex->fib));
                }
                else {
                    heap.increase(e->vertex->fib->handle);
                }
            }
        }
    }

问题

如果我使用非常小的启发式算法（将 A* 退化为 Dijkstra），算法运行得很好。但是，如果我引入一些更强的启发式方法，程序会抛出一个异常说明： 0xC0000005: Access violation writing location 0x0000000000000000. 在 boosts circular_list_algorithm.hpp 的 unlink 方法中。由于某种原因，next 和 prev 为空。这是调用 heap.pop() 的直接结果。请注意，heap.pop() 多次工作正常，不会立即崩溃。

问题

是什么导致了这个问题，我该如何解决？

我试过的

我的第一个想法是我不小心调用了 increase() 即使距离 + 启发式变得更大而不是更小（根据文档，这可能会破坏东西）。然而，这在我的实现中是不可能的，因为我只能在距离变小时更改节点。启发式保持不变。无论如何，我尝试使用 update() 而不是 increase()，但没有成功
我试图设置几个断点以获得更详细的视图，但我的数据集很大，我无法用较小的数据集重现它。

附加信息

加速版本：1.76.0
C++14
增加函数确实是正确的（而不是减少函数），因为所有提升堆都是作为最大堆实现的。我们通过反转比较器并使用 increase/decrease reversed

Answer 1

好，准备上车

首先我发现了一个bug
接下来，我全面审查、重构和简化了代码
尘埃落定后，我注意到行为发生了变化，看起来像是代码中潜在的逻辑错误

1。错误

就像我在问题中评论的那样，由于过度依赖没有清晰语义的原始指针，代码复杂度很高。

当我审查和重构代码时，我发现这确实导致了一个错误：

e->vertex->fib = new HeapData(e->vertex);
e->vertex->fib->handle = heap.push(*(e->vertex->fib));

在第一行中，您创建了一个 HeapData 对象。您使 fib 成员指向 that 对象。
第二行插入该对象的副本（意思是，它是一个新对象，具有不同的对象标识，或者实际上说话：不同的地址）。

那么，现在

e->vertex->fib 指向队列中不存在的（泄漏的）HeapData 对象，并且
实际排队的 HeapData 副本有一个默认构造的 handle 成员，这意味着句柄包装了一个空指针。（检查 detail/stable_heap.hpp 中的 boost::heap::detail::node_handle<> 以验证这一点）。

这将很好地解释您所看到的症状。

2。重构、简化

所以，在理解代码后我得出的结论是

HeapData 和 Vertex 应该合并：HeapData 只服务于 link 一个顶点的句柄，但是你已经可以让顶点包含一个直接处理。

由于这次合并
- 您的顶点队列现在实际上包含顶点，表达了代码的意图
- 您将所有顶点访问减少一级间接访问（减少 Law Of Demeter 违规）
- 您可以将推送操作写在一行中，从而消除出现错误的空间。之前：
```
 target->fib = new HeapData(target);
 target->fib->handle = heap.push(*(target->fib));
```
  之后：
```
 target->fibhandle = heap.push(target);
```
您的 Edge class 实际上并没有对边缘建模，而是对“邻接”建模 - 目标边的一部分，带有weight属性。

为了清楚起见，我将其重命名为 OutEdge，并将向量更改为包含 values 而不是动态分配 OutEdge 个实例。

我无法从显示的代码中看出，但我几乎可以保证这些都是被泄露了。

此外，OutEdge 在大多数平台上只有 16 个字节，因此复制它们会很好，并且根据定义，邻接由它们的源顶点拥有（因为 including/moving 它到另一个源顶点会更改邻接的含义）。

In fact, if you're serious about performance, you may want to use a boost::container::small_vector with a suitably chosen capacity if you know that e.g. the median number of edges is "small"

您的比较可以“外包”给函数对象

 using Node = Vertex*;
 struct PrioCompare {
     bool operator()(Node a, Node b) const;
 };

之后堆可以输入为：

 namespace bh = boost::heap;
 using Heap   = bh::fibonacci_heap<Node, bh::compare<PrioCompare>>;
 using Handle = Heap::handle_type;

您的成本函数违反了更多的 Demeter 法则，通过添加 Literate-Code 访问器可以轻松修复：
```
 Cost cost() const { return distance + heuristic; }
```
通过快速检查，我认为使用 infinite() 而不是 max() 作为初始距离会更准确。另外，为了可读性使用常量：
```
 static constexpr auto INF = std::numeric_limits<Cost>::infinity();
 Cost distance = INF;
```
您重复检查了 xyz->equals(endPoint) 以避免更新顶点的启发式算法。我建议将更新移动到顶点出队之后，这样重复就可以消失（检查和 getHeuristic(...) 调用）。
就像你说的，我们需要仔细对待 increase/update 修复方法。当我阅读您的代码时，节点的优先级与“成本”（累积边权重和启发式值）成反比。

因为 Boost Heap 堆是 max heaps 这意味着 增加优先级 应该匹配 降低成本 。我们可以断言它来检测调试构建中的任何程序员错误：
```
 assert(target->cost() < previous_cost);
 heap.increase(target->fibhandle);
```

进行这些更改后，代码可以更安静地阅读：

Cost AStarSearch(Node start, Node destination) {
    Heap heap;

    start->distance  = 0;
    start->fibhandle = heap.push(start);

    while (!heap.empty()) {
        Node u = heap.top();
        heap.pop();

        if (u->equals(destination)) {
            return u->cost();
        }
        u->heuristic = getHeuristic(start);

        doSomeGraphCreationStuff(u);

        for (auto& [target, weight] : u->adj) {
            auto curDistance = weight + u->distance;

            // if cheaper route, queue or update queued
            if (curDistance < target->distance) {
                auto cost_prior     = target->cost();
                target->distance    = curDistance;
                target->predecessor = u;

                if (target->fibhandle == NOHANDLE) {
                    target->fibhandle = heap.push(target);
                } else {
                    assert(target->cost() < cost_prior);
                    heap.update(target->fibhandle);
                }
            }
        }
    }

    return INF;
}

2(b) 现场演示

添加一些测试数据：

Live On Coliru

#include <boost/heap/fibonacci_heap.hpp>
#include <iostream>

using Cost = double;
struct Vertex;

Cost getHeuristic(Vertex const*) { return 0; }
void doSomeGraphCreationStuff(Vertex const*) {
    // this only creates vertices and edges
}

struct OutEdge { // adjacency from implied source vertex
    Vertex* target = nullptr;
    Cost    weight = 1;
};

namespace bh = boost::heap;
using Node   = Vertex*;
struct PrioCompare {
    bool operator()(Node a, Node b) const;
};
using Heap   = bh::fibonacci_heap<Node, bh::compare<PrioCompare>>;
using Handle = Heap::handle_type;
static const Handle   NOHANDLE{}; // for expressive comparisons
static constexpr auto INF = std::numeric_limits<Cost>::infinity();

struct Vertex {
    Vertex(Cost d = INF, Cost h = 0) : distance(d), heuristic(h) {}

    Cost    distance  = INF;
    Cost    heuristic = 0;
    Handle  fibhandle{};
    Vertex* predecessor = nullptr;

    std::vector<OutEdge> adj;

    Cost cost() const { return distance + heuristic; }
    Cost euclideanDistanceTo(Vertex* v);
    bool equals(Vertex const* u) const { return this == u; }
};

// Now Vertex is a complete type, implement comparison
bool PrioCompare::operator()(Node a, Node b) const {
    return a->cost() > b->cost();
}

Cost AStarSearch(Node start, Node destination) {
    Heap heap;

    start->distance  = 0;
    start->fibhandle = heap.push(start);

    while (!heap.empty()) {
        Node u = heap.top();
        heap.pop();

        if (u->equals(destination)) {
            return u->cost();
        }
        u->heuristic = getHeuristic(start);

        doSomeGraphCreationStuff(u);

        for (auto& [target, weight] : u->adj) {
            auto curDistance = weight + u->distance;

            // if cheaper route, queue or update queued
            if (curDistance < target->distance) {
                auto cost_prior     = target->cost();
                target->distance    = curDistance;
                target->predecessor = u;

                if (target->fibhandle == NOHANDLE) {
                    target->fibhandle = heap.push(target);
                } else {
                    assert(target->cost() < cost_prior);
                    heap.update(target->fibhandle);
                }
            }
        }
    }

    return INF;
}

int main() {
    // a very very simple graph data structure with minimal helpers:
    std::vector<Vertex> graph(10);
    auto node = [&graph](int id)             { return &graph.at(id);       };
    auto id   = [&graph](Vertex const* node) { return node - graph.data(); };

    // defining 6 edges
    graph[0].adj = {{node(2), 1.5}, {node(3), 15}};
    graph[2].adj = {{node(4), 2.5}, {node(1), 5}};
    graph[1].adj = {{node(7), 0.5}};
    graph[7].adj = {{node(3), 0.5}};

    // do a search
    Node startPoint = node(0);
    Node endPoint   = node(7);

    Cost cost = AStarSearch(startPoint, endPoint);

    std::cout << "Overall cost: " << cost << ", reverse path: \n";
    for (Node node = endPoint; node != nullptr; node = node->predecessor) {
        std::cout << " - " << id(node) << " distance " << node->distance
                  << "\n";
    }
}

版画

Overall cost: 7, reverse path: 
 - 7 distance 7
 - 1 distance 6.5
 - 2 distance 1.5
 - 0 distance 0

3。情节转折：潜伏的逻辑错误？

我对移动 getHeuristic() 更新感到不安。我想知道我是否可能已经改变了代码的含义，即使控制流量好像结账了。

然后我意识到行为确实发生了变化。这是微妙的。起初我以为旧的行为只是有问题。那么，让我们来分析一下：

风险的来源是节点访问与队列优先级的不一致。

访问节点时，条件看目标是否变“less” distance" 仅以 distance 表示。
但是，队列优先级会根据成本，这是不同的来自 distance，因为它包含任何启发式方法。

潜伏的问题是可以编写代码事实上，距离减少，不需要保证成本减少。

回到代码，我们可以看到这勉强避免了，因为 getHeuristic更新只在代码的非更新路径执行。

总结

理解这一点让我意识到

Vertex::heuristic 字段仅用作 getHeuristic() 函数调用的“缓存”版本
暗示该函数被视为幂等
我的版本确实改变了现在 getHeuristic 的行为可能对同一个顶点执行多次（如果再次访问通过更便宜的路径）

我建议通过

解决这个问题

将 heuristic 字段重命名为 cachedHeuristic
制作一个enqueue函数来封装顶点入队的三个步骤：
简单地省略端点检查，因为它最多可以消除对该节点的单次调用 getHeuristic，可能不值得增加复杂性
添加注释指出该代码路径的微妙之处
UPDATE 如评论中所见，我们还需要逆 operatione (dequeue) 对称更新 handle 所以它反映了该节点不再在队列中...

它也让我们明白了在调用 Heap::increase.

之前添加先决条件 assert 的用处

最终上市

经过以上改动

封装成Graph对象，即
还从输入中读取图表，例如：
```
 0   2   1.5
 0   3    15
 2   4   2.5
 2   1     5
 1   7   0.5
 7   3   0.5
```
每行包含（来源、目标、权重）。
一个单独的文件可以包含顶点索引 [0, ...) 的启发值，可选的换行符分隔，例如“7 11 99 33 44 55”
现在返回到达的节点而不是仅返回其成本

Live On Coliru

#include <boost/heap/fibonacci_heap.hpp>
#include <iostream>
#include <deque>
#include <fstream>

using Cost = double;
struct Vertex;

struct OutEdge { // adjacency from implied source vertex
    Vertex* target = nullptr;
    Cost    weight = 1;
};

namespace bh = boost::heap;
using Node   = Vertex*;
struct PrioCompare {
    bool operator()(Node a, Node b) const;
};
using MutableQueue   = bh::fibonacci_heap<Node, bh::compare<PrioCompare>>;
using Handle = MutableQueue::handle_type;
static const Handle   NOHANDLE{}; // for expressive comparisons
static constexpr auto INF = std::numeric_limits<Cost>::infinity();

struct Vertex {
    Vertex(Cost d = INF, Cost h = 0) : distance(d), cachedHeuristic(h) {}

    Cost    distance        = INF;
    Cost    cachedHeuristic = 0;
    Handle  handle{};
    Vertex* predecessor = nullptr;

    std::vector<OutEdge> adj;

    Cost cost() const { return distance + cachedHeuristic; }
    Cost euclideanDistanceTo(Vertex* v);
};

// Now Vertex is a complete type, implement comparison
bool PrioCompare::operator()(Node a, Node b) const {
    return a->cost() > b->cost();
}

class Graph {
    std::vector<Cost> _heuristics;

    Cost getHeuristic(Vertex* v) {
        size_t n = id(v);
        return n < _heuristics.size() ? _heuristics[n] : 0;
    }
    void doSomeGraphCreationStuff(Vertex const*) {
        // this only creates vertices and edges
    }

  public:
    Graph(std::string edgeFile, std::string heurFile) {
        {
            std::ifstream stream(heurFile);
            _heuristics.assign(std::istream_iterator<Cost>(stream), {});
            if (!stream.eof())
                throw std::runtime_error("Unexpected heuristics");
        }
        std::ifstream stream(edgeFile);

        size_t src, tgt;
        double weight;
        while (stream >> src >> tgt >> weight) {
            _nodes.resize(std::max({_nodes.size(), src + 1, tgt + 1}));
            _nodes[src].adj.push_back({node(tgt), weight});
        }

        if (!stream.eof())
            throw std::runtime_error("Unexpected input");
    }

    Node search(size_t from, size_t to) {
        assert(from < _nodes.size());
        assert(to < _nodes.size());
        return AStar(node(from), node(to));
    }

    size_t id(Node node) const {
        // ugh, this is just for "pretty output"...
        for (size_t i = 0; i < _nodes.size(); ++i) {
            if (node == &_nodes[i])
                return i;
        }
        throw std::out_of_range("id");
    };
    Node node(int id) { return &_nodes.at(id); };

  private:
    // simple graph data structure with minimal helpers:
    std::deque<Vertex> _nodes; // reference stable when growing at the back

    // search state
    MutableQueue _queue;

    void enqueue(Node n) {
        assert(n && (n->handle == NOHANDLE));
        // get heuristic before insertion!
        n->cachedHeuristic = getHeuristic(n);
        n->handle = _queue.push(n);
    }

    Node dequeue() {
        Node node = _queue.top();
        node->handle = NOHANDLE;
        _queue.pop();
        return node;
    }

    Node AStar(Node start, Node destination) {
        _queue.clear();

        start->distance = 0;
        enqueue(start);

        while (!_queue.empty()) {
            Node u = dequeue();

            if (u == destination) {
                return u;
            }

            doSomeGraphCreationStuff(u);

            for (auto& [target, weight] : u->adj) {
                auto curDistance = u->distance + weight;

                // if cheaper route, queue or update queued
                if (curDistance < target->distance) {
                    auto cost_prior     = target->cost();
                    target->distance    = curDistance;
                    target->predecessor = u;

                    if (target->handle == NOHANDLE) {
                        // also caches heuristic
                        enqueue(target);
                    } else {
                        // NOTE: avoid updating heuristic here, because it
                        // breaks the queue invariant if heuristic increased
                        // more than decrease in distance
                        assert(target->cost() < cost_prior);
                        _queue.increase(target->handle);
                    }
                }
            }
        }

        return nullptr;
    }
};

int main() {
    Graph graph("input.txt", "heur.txt");

    Node arrival = graph.search(0, 7);

    std::cout << "reverse path: \n";
    for (Node n = arrival; n != nullptr; n = n->predecessor) {
        std::cout << " - " << graph.id(n) << " cost " << n->cost() << "\n";
    }
}

再次打印预期的

reverse path: 
 - 7 cost 7
 - 1 cost 17.5
 - 2 cost 100.5
 - 0 cost 7

注意启发式方法如何改变成本，但在这种情况下不是最佳路径。