使用二叉树从事件列表中记录 success/failures

Recording success/failures from a list of event using a binary tree

我有一个事件列表 ['one', 'two', 'three']。这些事件可以通过或失败。我想建立一个决策树来记录可能的结果——即:

                    <root>
                      /\
      <one_pass>              <one_fail>
          /\                      /\
<two_pass>  <two_fail>  <two_pass>  <two_fail>
...

可以看出,递归是解决这个问题的方法,似乎与二叉树结合在一起。我正在努力解决的是在每个级别构建 pass/fails 的循环...它是 for 循环还是我使用递归?

我的代码起点来自 ,复制在下面以便于参考。 (稍后我想在每个节点中存储值,然后计算这些值的总和,因此我从这里开始):

class Node:
    def __init__(self, data, children=None):
        if children is None:
            children = []
        self.data = data
        self.children = children

    def __str__(self):
        return str(self.data)

    __repr__ = __str__

def get_all_paths(node, path=None):
    paths = []
    if path is None:
        path = []
    path.append(node)
    if node.children:
        for child in node.children:
            paths.extend(get_all_paths(child, path[:]))
    else:
            paths.append(path)
    return paths

我已经手动构建了以下内容以开始研究如何创建完整的结构,这就是我遇到的问题。

tree = Node("start", [
        Node("one_pass", [
            Node("two_pass"), 
            Node("two_fail")
        ]),
        Node("one_fail", [
            Node("two_pass"),
            Node("two_fail")
        ])
    ])

通过尝试,我被困在 for 循环级别.. 这让我觉得我不知道在此处使用的方法。如果循环在每次通过时创建两个节点——基本上是在一次迭代中同时创建左右子节点——我如何 link 它们到前一个节点?我是否需要一种方法来执行类似以下伪代码的操作?

for event in events:
    insert_node(my_tree, event, "pass")
    insert_node(my_tree, event, "fail")

注意如果有任何影响,我有 15 层树。

您可以使用以下代码(非递归)生成您的树:

def generate_nodes(parent, data):  # adds data_pass and data_fail child nodes to parent
    passed = Node(data + '_pass')
    failed = Node(data + '_fail')
    parent.children.append(passed)
    parent.children.append(failed)

    return [passed, failed]  # returns new nodes


events = ['one', 'two', 'three']
root = Node('root')  # root of our tree
current_level_events = [root]  # nodes of currently processed level

for data in events:  # go through all events
    tmp = []
    for event in current_level_events:
        tmp += generate_nodes(event, data)  # generate events 
    current_level_events = tmp

递归实现:

events = ['one', 'two', 'three']

def generate_nodes(parent, index):
    if index >= len(events):  # we finished tree generation, no more events found
        return
    data = events[index]
    passed = Node(data + '_pass')
    failed = Node(data + '_fail')
    parent.children.append(passed)
    parent.children.append(failed)
    # generate children for passed and failed nodes
    generate_nodes(passed, index + 1)
    generate_nodes(failed, index + 1)


events = ['one', 'two', 'three']
root = Node('root')  # root of our tree

generate_nodes(root, 0)  # starts tree generation

二进制结构如此有用的原因之一是它们很容易存储在内存中和索引中,只需一些数学技巧。

如果您的所有条件都提前知道,则很容易将您的 ['pass', 'fail', 'pass', 'fail' etc..] 列表视为二进制数 ([1, 0, 1, 0]: 10),用作列表的索引预先分配给总可能结果的长度(2N,其中 N 是条件数)。

如果您需要在决策制定的每个阶段存储值(即:为 ['pass','fail']['pass', 'fail', 'fail'] 存储一个值)这有点复杂,但仍然没有那么困难.我们知道任何给定数量的条件都会产生 2N 种可能的结果,因此我们也可以知道在 N-1 个条件下给出了多少个结果,以及 N-2 等。总共有2N-1个条件的所有较短条件列表在与N个条件相关的2N条件之前。通过将我们之前找到的二进制数添加到 2N-1,我们可以获得每个可能的条件列表的唯一索引。

如果条件列表很长,很容易发现可能的结果列表呈指数级增长。如果您从不打算访问所有可能的结果,那么使用带有数字键的字典而不是列表来存储所有可能的结果可能会有好处。

以你的例子为例:

                        <root> (0)
                    /               \
          <one_pass> (1)           <one_fail> (2)
             /     \                     /     \
    <two_pass>(3) <two_fail>(4) <two_pass>(5) <two_fail>(6)
      /  \          /  \           /   \           /    \
   (7)  (8)      (9)    (10)     (11)   (12)    (13)    (14)
 /  \   /   \   /   \   /   \   /   \   /   \   /   \   /   \
15, 16, 17, 18| 19, 20, 21, 22| 23, 24, 25, 26| 27, 28, 29, 30

从这里很容易想出一些函数将 ['pass', 'fail', 'pass', 'fail'] 的列表转换为索引:

def list_to_number(L):
    acc = 0
    for item in L:
        acc *= 2 #shift number left as we shift in values from the right
        if item == 'fail': #assumes pass == 0 fail == 1
            acc += 1
    return acc


def idx(L): 
    return 2**len(L) - 1 + list_to_number(L)

完整示例:

#  Assume we will never have more than 4 events:
#  With 4 events we have 2**4 outcomes.
#  In total there are 2**5-1 possible outcomes including intermediate steps.
#  We will preallocate a list filled with `None` to length 2**5-1 so we have 
#    enough space for all possible outcomes.
tree_data = [None]*(2**5-1)

#insert value 42 given [event_1_pass, event_2_pass, event_3_fail]
tree_data[idx(['pass', 'pass', 'fail'])] = 42

#insert value 56 given [event_1_pass, event_2_pass, event_3_pass, event_4_pass]
tree_data[idx(['pass', 'pass', 'pass', 'pass'])] = 56

#retrieve value given [event_1_pass, event_2_pass, event_3_fail]
print(tree_data[idx(['pass', 'pass', 'fail'])])
# prints: 42

#retrieve value given [event_1_fail, event_2_pass, event_3_fail]
print(tree_data[idx(['fail', 'pass', 'fail'])])
# prints: None because we never stored anything there.