Iterating over a dataframe twice: which is the ideal way?

I am trying to build a dataframe for a Sankey chart in Power BI, which needs source and destination columns like this:

id  Source         Destination
1   Starting a     next point b
1   next point b   final point c
1   final point c  end
2   Starting a     next point b
2   next point b
3   Starting a     next point b
3   next point b   final point c
3   final point c  end

I have a dataframe like this:

ID  flow
1   Starting a
1   next point b
1   final point c
2   Starting a
2   next point b
3   Starting a
3   next point b
3   final point c

I tried iterating over the dataframe twice, like this:

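df['Destination'] = None  # assumed: the column must exist before rows are assigned
for index, row in df.iterrows():
    for j, r in df.iterrows():
        if row['ID'] == r['ID']:
            # the following row with the same ID supplies the destination
            if index + 1 == j and "final point c" not in row['flow']:
                df.loc[index, 'Destination'] = df.loc[j, 'flow']
            elif "final point c" in row['flow']:
                df.loc[index, 'Destination'] = 'End of flow'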

Because this iterates over the same dataframe twice in a nested loop, it takes a long time when there are many records.

Is there a better way? I looked through the similar questions but could not find anything that matches my problem.

You can use groupby + shift together with a mask:

end = df['flow'].str.startswith('final point')
df2 = (df.assign(destination=df.groupby('ID')['flow'].shift(-1)
                               .mask(end, end.map({True: 'end'}))
                 )
         .rename(columns={'flow': 'source'})
       )

Output:

   ID         source    destination
0   1     Starting a   next point b
1   1   next point b  final point c
2   1  final point c            end
3   2     Starting a   next point b
4   2   next point b            NaN
5   3     Starting a   next point b
6   3   next point b  final point c
7   3  final point c            end
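
For anyone who wants to try this directly, here is a minimal self-contained sketch of the same groupby + shift idea. It rebuilds the sample data from the question; the intermediate names `nxt` and `is_end` are just illustrative, and it passes the scalar 'end' straight to mask, which should behave the same as the mapped Series above.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "ID": [1, 1, 1, 2, 2, 3, 3, 3],
    "flow": ["Starting a", "next point b", "final point c",
             "Starting a", "next point b",
             "Starting a", "next point b", "final point c"],
})

# Next step within each ID; the last row of every group becomes NaN
nxt = df.groupby("ID")["flow"].shift(-1)

# Rows that are the final step of a completed flow
is_end = df["flow"].str.startswith("final point")

# Overwrite the destination with 'end' only on those final rows
df2 = (df.assign(destination=nxt.mask(is_end, "end"))
         .rename(columns={"flow": "source"}))

print(df2)
```

Flows that stop early (ID 2 here) keep NaN as their last destination, since their last row is not a "final point" row.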

Alternatively, use combine_first to fill the NaN values:

end = df['flow'].str.startswith('final point').map({True: 'end', False: ''})
df2 = (df.assign(destination=df.groupby('ID')['flow'].shift(-1).combine_first(end))
         .rename(columns={'flow': 'source'})
       )

Output:

   ID         source    destination
0   1     Starting a   next point b
1   1   next point b  final point c
2   1  final point c            end
3   2     Starting a   next point b
4   2   next point b               
5   3     Starting a   next point b
6   3   next point b  final point c
7   3  final point c            end
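
If you need a specific label for flows that never reach the final step, the combine_first variant can be wrapped in a small helper. This is only a sketch: the function name `add_destination` and its `fill` parameter are hypothetical, not part of pandas, and the column names are taken from the question.

```python
import pandas as pd

def add_destination(df, fill=""):
    """Hypothetical helper: add a destination column per ID.

    `fill` is the value used for the last row of a flow that
    does not end on a 'final point' step.
    """
    # Next step within each ID (NaN on the last row of each group)
    nxt = df.groupby("ID")["flow"].shift(-1)
    # Fallback values: 'end' for final steps, `fill` everywhere else
    fallback = (df["flow"].str.startswith("final point")
                  .map({True: "end", False: fill}))
    # Use the fallback only where the shifted value is missing
    return (df.assign(destination=nxt.combine_first(fallback))
              .rename(columns={"flow": "source"}))

# Sample data copied from the question
df = pd.DataFrame({
    "ID": [1, 1, 1, 2, 2, 3, 3, 3],
    "flow": ["Starting a", "next point b", "final point c",
             "Starting a", "next point b",
             "Starting a", "next point b", "final point c"],
})

out = add_destination(df, fill="incomplete")
print(out)
```

Here the unfinished flow for ID 2 gets 'incomplete' instead of an empty string; rows that already have a next step are left untouched because combine_first only fills missing values.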