Increase recursion limit and stack size in python 2.7

Question

I'm working with large trees and need to increase the recursion limit of Python 2.7.

Using sys.setrecursionlimit(10000) crashes my kernel, so I figured I need to increase the stack size.

However, I don't know how large the stack should be. I tried 100 MiB like this: threading.stack_size(104857600), but the kernel still dies. Giving it 1 GiB raises an error.

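I haven't worked with the threading module before, so am I using it wrong when I just put the statement above at the beginning of my script? I'm not doing any kind of parallel processing; everything runs in the same thread.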

My computer has 128 GB of physical memory and runs Windows 10, with the IPython console in Spyder.

The error displayed is simply:

Kernel died, restarting

Nothing more.

EDIT:

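Full code to reproduce the problem. Building the tree works fine, though it takes quite a long time; the kernel only dies during the recursive execution of treeToDict(), when the whole tree is read into a dictionary. Maybe there is something wrong with the code of that function. The tree is a non-binary tree: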

import pandas as pd
import threading
import sys
import random as rd
import itertools as it
import string
threading.stack_size(104857600)
sys.setrecursionlimit(10000)

class treenode:
    # class to build the tree
    def __init__(self,children,name='',weight=0,parent=None,depth=0):
        self.name = name
        self.weight = weight
        self.children = children
        self.parent = parent
        self.depth = depth
        self.parentname = parent.name if parent is not None else ''

def add_child(node,name):
    # add element to the tree
    # if it already exists at the given node increase weight
    # else add a new child
    for i in range(len(node.children)):
        if node.children[i].name == name:
            node.children[i].weight += 1
            newTree = node.children[i]
            break
    else:
        newTree = treenode([],name=name,weight=1,parent=node,depth=node.depth+1)
        node.children.append(newTree)
    return newTree

def treeToDict(t,data):
    # read the tree into a dictionary
    if t.children != []:
        for i in range(len(t.children)):
            data[str(t.depth)+'_'+t.name] = [t.name, t.children[i].name, t.depth, t.weight, t.parentname]
    else:
        data[str(t.depth)+'_'+t.name] = [t.name, '', t.depth, t.weight, t.parentname]
    for i in range(len(t.children)):
        treeToDict(t.children[i],data)

# Create random dataset that leads to very long tree branches:
# A is an index for each set of data B which becomes one branch
rd.seed(23)
testSet = [''.join(l) for l in it.combinations(string.ascii_uppercase[:20],2)]
A = []
B = []
for i in range(10):
    for j in range(rd.randint(10,6000)):
        A.append(i)
        B.append(rd.choice(testSet))
dd = {"A":A,"B":B}
data = pd.DataFrame(dd)
# The maximum length should be above 5500, use another seed if it's not:
print data.groupby('A').count().max()

# Create the tree
root = treenode([],name='0')
for i in range(len(data.values)):
    if i == 0:
        newTree = add_child(root,data.values[i,1])
        oldses = data.values[i,0]
    else:
        if data.values[i,0] == oldses:
            newTree = add_child(newTree,data.values[i,1])
        else:
            newTree = add_child(root,data.values[i,1])
            oldses = data.values[i,0]

result={}
treeToDict(root,result)

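PS: I'm aware that the treeToDict() function is faulty in that it overwrites entries, because duplicate keys are possible. That bug is not relevant to this error.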

Answer 1

In my experience, the problem you have is not the stack size but the algorithm itself.

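A tree traversal can be implemented without any recursion at all. You should implement a depth-first or breadth-first search that keeps its own container of pending nodes: an explicit stack gives depth-first order, a queue gives breadth-first order. Python-like pseudocode for the stack-based variant might look like this: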

def traverse_tree(root):
  stack = [root]
  while stack:
    cur = stack.pop()
    cur.do_some_awesome_stuff()
    # extend, not append: push every child as its own stack entry
    stack.extend(cur.get_children())

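This approach scales extremely well: the pending nodes live in an ordinary list, so the depth you can handle is bounded by available memory rather than by the interpreter's call stack.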
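Applied to the question's code, the same idea could look roughly like the sketch below. It is not the asker's exact code: TreeNode is a simplified stand-in for the treenode class, and tree_to_dict_iterative is a hypothetical name. The dictionary format, including the key-overwriting behaviour mentioned in the PS, is kept as in treeToDict().

```python
class TreeNode(object):
    # Simplified stand-in for the question's treenode class (assumption).
    def __init__(self, name, weight=0, parent=None):
        self.name = name
        self.weight = weight
        self.children = []
        self.parent = parent
        self.depth = parent.depth + 1 if parent is not None else 0
        self.parentname = parent.name if parent is not None else ''
        if parent is not None:
            parent.children.append(self)


def tree_to_dict_iterative(root):
    # Same entries as the recursive treeToDict(), built with an explicit stack.
    data = {}
    stack = [root]
    while stack:
        node = stack.pop()
        key = str(node.depth) + '_' + node.name
        if node.children:
            for child in node.children:
                data[key] = [node.name, child.name, node.depth,
                             node.weight, node.parentname]
        else:
            data[key] = [node.name, '', node.depth,
                         node.weight, node.parentname]
        # reversed() keeps the left-to-right visiting order of the recursion
        stack.extend(reversed(node.children))
    return data


# One branch far deeper than the default recursion limit of 1000.
root = TreeNode('0')
node = root
for i in range(50000):
    node = TreeNode('n%d' % i, weight=1, parent=node)

result = tree_to_dict_iterative(root)
print(len(result))
```

No call to sys.setrecursionlimit() or threading.stack_size() is needed here, because the function never calls itself.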

For further reading, you can try this and that.
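On the side question about threading: threading.stack_size() only sets the stack size for threads created after the call, so placing it at the top of a script does not enlarge the stack of the thread that is already running. If the recursion has to stay, a rough sketch is to run the recursive function in a new thread. The helper name run_with_big_stack, the 64 MiB size and the limit of 20000 are assumptions for illustration, and whether this avoids the kernel crash inside Spyder is not something this sketch can guarantee.

```python
import sys
import threading


def depth_of(n):
    # Deliberately recursive toy function standing in for treeToDict().
    return 0 if n == 0 else 1 + depth_of(n - 1)


def run_with_big_stack(func, args=(), stack_bytes=64 * 1024 * 1024, limit=20000):
    # Hypothetical helper: run func(*args) in a thread with a larger stack.
    results = []
    old_limit = sys.getrecursionlimit()
    # Affects only threads created from now on; returns the previous setting.
    old_size = threading.stack_size(stack_bytes)
    sys.setrecursionlimit(limit)
    try:
        worker = threading.Thread(target=lambda: results.append(func(*args)))
        worker.start()
        worker.join()
    finally:
        # Restore the previous settings for the rest of the program.
        threading.stack_size(old_size)
        sys.setrecursionlimit(old_limit)
    # Raises IndexError if func failed inside the thread.
    return results[0]


print(run_with_big_stack(depth_of, (5000,)))
```

The recursion limit and the stack size have to be raised together: the limit alone lets Python recurse deeper than the thread's stack can hold, which is the kind of hard crash described in the question.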


recursion

stack

ipython

python-2.7

spyder