计算图中节点的所有可能路径

Calculate all possible paths from nodes in chart

我需要一个建议如何使用程序计算从开始节点到结束节点的所有可能路径(以编程方式),我可以使用 C#、Python 或 Matlab,但我不知道哪个是编码简单快捷,工作量少,我也不知道从哪里开始,我还需要自动绘制节点之间的边,稍后计算更多公式。 我有以下数据

Node Date         Data
A    2020-01-01   2.09
B    2020-01-05   0.89
C    2020-01-08   3.17
D    2020-01-08   1.15
E    2020-01-15   3.65

我想执行以下操作:

  1. 创建从任何节点到另一个节点的路径(一种方式,仅从较低日期到较高日期)并且必须通过路径中日期之间的所有节点,例如:

    1.1 从 (A) 到 (E) 应提取以下路径:

    • A,B,C,E
    • A,B,D,E

    因为C和D在同一天,所以我们有2条不同的路径。

  2. 列表项

这将为每条路径生成一个节点链列表。它使用 python 列表和字典进行散列 - 因此如果您处理大型数据集,它会非常慢。如果是这种情况,请查看 pandas 库 (groupby)。同样的逻辑也可以,但你肯定会在 nodes_by_date 函数中节省时间。该库中可能还有其他工具可以生成您需要的路径。

nodes = [('E',15,3.65), ('A',1,2.09), ('B',5,.89), ('C',8,3.17), ('D',8,1.15)]
#nodes = [('E',15,3.65), ('A',1,2.09), ('B',5,.89), ('C',8,3.17), ('D',8,1.15), ('F', 16, 100), ('G', 16, 200), ('H', 17, 1000)]

def all_paths(nodes):
    nbd = nodes_by_date(nodes)
    return get_chains(nbd, 0)

def nodes_by_date(nodes):
    # sort by date (not sure what format your data is in, but might be easier to convert to a datenum/utc time)
    nodes = sorted(nodes, key=lambda x: x[1])
    # build a list of lists, each containing all of the nodes with a certain date
    # if your data is larger, look into the pandas library's groupby operation
    dates = {}
    for n in nodes:
        d = dates.get(n[1], [])
        d.append(n)
        dates[n[1]]=d
    return list(dates.values())

def get_chains(nbd, i):
    if i == len(nbd):
        return []
    # depth-first recursion so tails are only generated once
    tails = get_chains(nbd, i+1)
    chains = []
    for n in nbd[i]:
        if len(tails):
            for t in tails:
                newchain = [n]
                # only performant for smaller data
                newchain.extend(t)
                chains.append(newchain)
        # end of recursion
        else:
            chains.append([n])
    return chains

使用 Python groupby 和来自 Python itertools module

的产品

代码

from itertools import groupby, product

def all_paths(s):
    # Convert string to list of node triplets
    nodes = [n.strip().split() for n in s.split('\n')]
    
    # Skip header and sort by time 
    # (since time is padded, no need to convert to datetime object to sort)
    nodes = sorted(nodes[1:], key=lambda n: n[1])  # sorting by time which is second item in list (n[1])
    
    # Group nodes by time (i.e. list of list, with nodes with same time in same sublist)
    # Just keeping the node field of each item in sublist (i.e. node[0] is Node field)
    labels = [[node[0] for node in v] for k, v in groupby(nodes, lambda n:n[1])]
    
    # Path is the product of the labels (list of lists)
    return list(product(*labels))

用法

示例 1:字符串中的数据

data = '''Node Date Data
A 2020-01-01 2.09
B 2020-01-05 0.89
C 2020-01-08 3.17
D 2020-01-08 1.15
E 2020-01-15 3.65'''

print(all_paths(data))

示例 2:文件中的数据 data.txt

with open('data.txt', 'r') as fin:
    print(all_paths(fin.read()))

输出(对于两种情况):

[('A', 'B', 'C', 'E'), ('A', 'B', 'D', 'E')]