How to prevent stack overflow when serializing a recursive graph structure with boost::serialization?

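I am trying to serialize a large (geometric) graph structure with the boost serialization library.

I store my graph as an adjacency list, i.e. my structure looks like this: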

class Node {
  double x,y;
  std::vector<Node*> adjacent_nodes;
  ...
};

class Graph {
  std::vector<Node*> nodes;
  ...
};

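With more than 10k nodes, my problem is that when I start serializing (saving) my graph, it recursively calls the serialization of all those nodes before returning, because the graph is connected.

More precisely, when serializing the graph, it first serializes the first node in the "nodes" vector. To do so, it has to serialize that node's "adjacent_nodes", which contains, for example, the second node.

So the second node has to be serialized before the serialization of the first node can return, and so on.

I found this thread from 2010 in which someone describes exactly the same problem. However, they did not arrive at a workable solution.

Any help would be greatly appreciated.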

My structure in more detail:

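#include <vector>
#include <boost/serialization/access.hpp>
#include <boost/serialization/vector.hpp>

class Node {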

    double x,y;
    std::vector<Node*> adjacent_nodes;

public:

    inline double get_x() const { return x; }
    inline double get_y() const { return y; }
    inline std::vector<Node*> const& get_adjacent_nodes() const { return adjacent_nodes; }

    Node (double x, double y):x(x),y(y) {}

    void add_adjacent(Node* other) {
        adjacent_nodes.push_back(other);
    }

private:

    Node() {}

    friend class boost::serialization::access;
    template <class Archive>
    void serialize(Archive &ar, const unsigned int) {
        ar & x;
        ar & y;
        ar & adjacent_nodes;
    }

};

class Simple_graph {

    std::vector<Node*> nodes;

    void add_edge(int firstIndex, int secondIndex) {
        nodes[firstIndex]->add_adjacent(nodes[secondIndex]);
        nodes[secondIndex]->add_adjacent(nodes[firstIndex]);
    }

public:

    /* methods to get the distance of points, to read in the nodes, and to generate edges */

    ~Simple_graph() {
        for (auto node: nodes) {
            delete node;
        }
    }

private:

    friend class boost::serialization::access;
    template <class Archive>
    void serialize(Archive &ar, const unsigned int) {
        ar & nodes;
    }

};

Edit: adding some of the suggestions made in the thread mentioned above, quoting Dominique Devienne:

1) save all the nodes without their topology info on a first pass of the vector, thus recording all the "tracked" pointers for them, then write the topology info for each, since then you don't "recurse", you only write a "ref" to an already serialized pointer.

2) have the possibility to write a "weak reference" to a pointer, which only adds the pointer to the "tracking" map with a special flag saying it wasn't "really" written yet, such that writing the topology of a node that wasn't yet written is akin to "forward references" to those neighboring nodes. Either the node will really be written later on, or it never will, and I suppose serialization should handle that gracefully.

#1 doesn't require changes in boost serialization, but puts the onus on the client code. Especially since you have to "externally" save the neighbors, so it's no longer well encapsulated, and writing a subset of the surface's nodes become more complex.

#2 would require seeking ahead to read the actual object when encountering a forward reference, and furthermore a separate map to know where to seek for it. That may be incompatible with boost serialization (I confess to be mostly ignorant about it).

Is it possible to implement these suggestions by now?

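Since you already have a vector containing pointers to all of your nodes, you can serialize the adjacent_nodes vector as indices into that vector instead of serializing the actual node data.

When serializing a node, you need to convert node pointers to indices. This is easiest if you can store each node's index in the node itself; otherwise you will have to search nodes for the matching pointer (this can be sped up with some kind of associative container that maps pointers to indices).

When you need to read the data back, you can create the initial nodes vector filled with pointers to empty/dummy nodes (which get populated during deserialization).

If that is not feasible, you can load the node indices into a temporary array, then go back and fill in the pointers once all nodes have been read. Either way, you do not have to seek in or re-read any part of the file.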
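The index approach above can be sketched roughly as follows. This is a minimal illustration, not the asker's code: the Node here is a reduced stand-in, and FlatGraph, flatten, and unflatten are hypothetical names introduced for the example. The idea is to convert the pointer graph into a structure that holds only values and indices, so that whatever serializes it has no pointers to follow.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <unordered_map>
#include <vector>

// Reduced stand-in for the question's Node (assumption: public members).
struct Node {
    double x = 0, y = 0;
    std::vector<Node*> adjacent_nodes;
};

// Hypothetical flat form: coordinates plus neighbour indices, no pointers.
struct FlatGraph {
    std::vector<double> xs, ys;
    std::vector<std::vector<std::size_t>> adjacency;
};

// Save side: map every node pointer to its position in `nodes`, then
// record each neighbour as an index instead of a pointer.
FlatGraph flatten(const std::vector<Node*>& nodes) {
    std::unordered_map<const Node*, std::size_t> index_of;
    for (std::size_t i = 0; i < nodes.size(); ++i) index_of[nodes[i]] = i;

    FlatGraph flat;
    for (const Node* n : nodes) {
        flat.xs.push_back(n->x);
        flat.ys.push_back(n->y);
        std::vector<std::size_t> row;
        // at() throws if a neighbour is not part of `nodes`.
        for (const Node* a : n->adjacent_nodes) row.push_back(index_of.at(a));
        flat.adjacency.push_back(row);
    }
    return flat;
}

// Load side: first create every node, then resolve indices to pointers.
std::vector<std::unique_ptr<Node>> unflatten(const FlatGraph& flat) {
    assert(flat.xs.size() == flat.ys.size());
    assert(flat.xs.size() == flat.adjacency.size());

    std::vector<std::unique_ptr<Node>> nodes;
    for (std::size_t i = 0; i < flat.xs.size(); ++i) {
        std::unique_ptr<Node> n(new Node());
        n->x = flat.xs[i];
        n->y = flat.ys[i];
        nodes.push_back(std::move(n));
    }
    for (std::size_t i = 0; i < flat.adjacency.size(); ++i) {
        for (std::size_t idx : flat.adjacency[i]) {
            nodes[i]->adjacent_nodes.push_back(nodes.at(idx).get());
        }
    }
    return nodes;
}
```

Because FlatGraph contains only doubles and indices, passing its three vectors to an archive (with boost/serialization/vector.hpp included) should need only shallow nesting regardless of graph size. That wiring is not shown here and would have to be adapted to the real classes; returning unique_ptr from unflatten is a choice made for this sketch, whereas the question owns its nodes through raw pointers.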

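If your graph does not contain any large cycles, you can order the vector of nodes so that the nodes at the "end" of the graph appear at the beginning of the vector.

Example: suppose we have: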

p1->p2->p3->....->p1000

If you try to serialize vector v = {p1, p2, p3, ... , p1000}, it will fail, but it works for vector v = {p1000, p999, p998, ... , p1}. But if you have something like

p1->p2->p3->....->p1000->p1

you have no chance.
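To make the effect of the ordering concrete, here is a small simulation of the nesting depth. It is a simplified model and not Boost code: it assumes that an object is marked as tracked before its neighbours are visited and that an already tracked neighbour causes no further recursion. The names max_nesting and make_chain are hypothetical, and nodes are numbered from 0, so index 0 plays the role of p1.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

using Adjacency = std::vector<std::vector<std::size_t>>;

// Visit one node the way a tracked pointer would be saved in this model:
// mark it first, then descend into every neighbour not yet tracked.
static void visit(std::size_t node, const Adjacency& adj,
                  std::vector<bool>& tracked, std::size_t depth,
                  std::size_t& deepest) {
    tracked[node] = true;
    deepest = std::max(deepest, depth);
    for (std::size_t next : adj[node]) {
        if (!tracked[next]) visit(next, adj, tracked, depth + 1, deepest);
    }
}

// Deepest nesting reached when the nodes are saved in the given order.
std::size_t max_nesting(const std::vector<std::size_t>& order,
                        const Adjacency& adj) {
    assert(order.size() == adj.size());
    std::vector<bool> tracked(adj.size(), false);
    std::size_t deepest = 0;
    for (std::size_t node : order) {
        if (!tracked[node]) visit(node, adj, tracked, 1, deepest);
    }
    return deepest;
}

// Directed chain 0 -> 1 -> ... -> n-1, optionally closed into a cycle.
Adjacency make_chain(std::size_t n, bool closed) {
    Adjacency adj(n);
    for (std::size_t i = 0; i + 1 < n; ++i) adj[i].push_back(i + 1);
    if (closed && n > 0) adj[n - 1].push_back(0);
    return adj;
}
```

In this model, a 1000-node open chain saved front to back nests 1000 levels deep, while the reversed order never nests more than one level. Closing the chain into a cycle brings the depth back to 1000 for either order, which matches the "no chance" case above.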