How to insert rows in a dataframe based on a specific condition?

I have the following dataframe:
Index Time User Description
1 27.10.2021 15:58:00 UserA@gmail.com Tab Alpha of type PARTSTUDIO opened by User A
2 27.10.2021 15:59:00 UserA@gmail.com Start edit of part studio feature
3 27.10.2021 15:59:00 UserA@gmail.com Cancel Operation
4 27.10.2021 15:59:00 UserB@gmail.com Tab Alpha of type PARTSTUDIO opened by User B
5 27.10.2021 15:59:00 UserB@gmail.com Start edit of part studio feature
6 27.10.2021 16:03:00 UserB@gmail.com Cancel Operation
7 27.10.2021 16:03:00 UserA@gmail.com Add assembly feature
9 27.10.2021 16:03:00 UserA@gmail.com Tab Beta of type PARTSTUDIO opened by User A
10 27.10.2021 16:15:00 UserA@gmail.com Start edit of part studio feature
11 27.10.2021 16:15:00 UserB@gmail.com Start edit of part studio feature
12 27.10.2021 16:15:00 UserB@gmail.com Tab Alpha of type PARTSTUDIO closed by User B
14 27.10.2021 16:54:00 UserB@gmail.com Add assembly feature
15 27.10.2021 16:55:00 UserA@gmail.com Tab Beta of type PARTSTUDIO closed by User A
16 27.10.2021 16:55:00 UserB@gmail.com Start edit of part studio feature
17 27.10.2021 16:55:00 UserB@gmail.com Tab Delta of type PARTSTUDIO closed by User B

Expected output:

Index Time User Description
1 27.10.2021 15:58:00 UserA@gmail.com Tab Alpha of type PARTSTUDIO opened by User A
2 27.10.2021 15:59:00 UserA@gmail.com Start edit of part studio feature
3 27.10.2021 15:59:00 UserA@gmail.com Cancel Operation
4 27.10.2021 15:59:00 UserB@gmail.com Tab Alpha of type PARTSTUDIO opened by User B
5 27.10.2021 15:59:00 UserB@gmail.com Start edit of part studio feature
6 27.10.2021 16:03:00 UserB@gmail.com Cancel Operation
7 27.10.2021 16:03:00 UserA@gmail.com Add assembly feature
8 27.10.2021 16:03:00 UserA@gmail.com Tab Alpha of type PARTSTUDIO closed by User A
9 27.10.2021 16:03:00 UserA@gmail.com Tab Beta of type PARTSTUDIO opened by User A
10 27.10.2021 16:15:00 UserA@gmail.com Start edit of part studio feature
11 27.10.2021 16:15:00 UserB@gmail.com Start edit of part studio feature
12 27.10.2021 16:15:00 UserB@gmail.com Tab Alpha of type PARTSTUDIO closed by User B
13 27.10.2021 16:15:00 UserB@gmail.com Tab Delta of type PARTSTUDIO opened by User B
14 27.10.2021 16:54:00 UserB@gmail.com Add assembly feature
15 27.10.2021 16:55:00 UserA@gmail.com Tab Beta of type PARTSTUDIO closed by User A
16 27.10.2021 16:55:00 UserB@gmail.com Start edit of part studio feature
17 27.10.2021 16:55:00 UserB@gmail.com Tab Delta of type PARTSTUDIO closed by User B

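How can I go through the dataframe and check, for each "Tab x opened by User y" value in the Description column, whether a matching "Tab x closed by User y" appears further down the dataframe? If it does, fine. If it does not, and the user's next tab event is "Tab zz opened by User A", then "Tab x closed by User y" is missing and a row should be inserted before the "Tab zz opened by User A" row (index 8 in the example). The same applies in reverse for a missing "opened" row (index 13). Is there a way to do this without df.iterrows? Thanks in advance.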

Sorry, I forgot to answer this question.

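Here is one solution. It is not very concise or particularly elegant, but it should be faster than using iterrows to modify rows while looking ahead at later ones.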

Data:

                   Time             User                                    Description
0   27.10.2021 15:58:00  UserA@gmail.com  Tab Alpha of type PARTSTUDIO opened by User A
1   27.10.2021 15:59:00  UserA@gmail.com              Start edit of part studio feature
2   27.10.2021 15:59:00  UserA@gmail.com                               Cancel Operation
3   27.10.2021 15:59:00  UserB@gmail.com  Tab Alpha of type PARTSTUDIO opened by User B
4   27.10.2021 15:59:00  UserB@gmail.com              Start edit of part studio feature
5   27.10.2021 16:03:00  UserB@gmail.com                               Cancel Operation
6   27.10.2021 16:03:00  UserA@gmail.com                           Add assembly feature
7   27.10.2021 16:03:00  UserA@gmail.com   Tab Beta of type PARTSTUDIO opened by User A
8   27.10.2021 16:03:00  UserA@gmail.com  Tab Gamma of type PARTSTUDIO opened by User A
9   27.10.2021 16:14:00  UserA@gmail.com   Tab Beta of type PARTSTUDIO opened by User A
10  27.10.2021 16:15:00  UserA@gmail.com              Start edit of part studio feature
11  27.10.2021 16:15:00  UserB@gmail.com              Start edit of part studio feature
12  27.10.2021 16:15:00  UserB@gmail.com  Tab Alpha of type PARTSTUDIO closed by User B
13  27.10.2021 16:54:00  UserB@gmail.com                           Add assembly feature
14  27.10.2021 16:55:00  UserA@gmail.com   Tab Beta of type PARTSTUDIO closed by User A
15  27.10.2021 16:55:00  UserB@gmail.com              Start edit of part studio feature
16  27.10.2021 16:55:00  UserB@gmail.com  Tab Delta of type PARTSTUDIO closed by User B
17  27.10.2021 16:56:00  UserB@gmail.com  Tab Alpha of type PARTSTUDIO closed by User B
18  27.10.2021 16:57:00  UserB@gmail.com   Tab Beta of type PARTSTUDIO closed by User B

I added a few consecutive open/close rows to test more cases.
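To reproduce this, the test frame above can be rebuilt roughly as follows. This is only a convenience sketch: the helper `tab` and the names `EDIT`, `CANCEL`, `ADD` and `events` are made up here to keep the listing short, and `Time` is kept as a plain string, as in the printed data.

```python
import pandas as pd

# Hypothetical helper (not part of the original answer) to build tab event descriptions.
def tab(name, action, user):
    return f"Tab {name} of type PARTSTUDIO {action} by User {user}"

EDIT = "Start edit of part studio feature"
CANCEL = "Cancel Operation"
ADD = "Add assembly feature"

# (time, user letter, description), in the same order as the printed data.
events = [
    ("15:58:00", "A", tab("Alpha", "opened", "A")),
    ("15:59:00", "A", EDIT),
    ("15:59:00", "A", CANCEL),
    ("15:59:00", "B", tab("Alpha", "opened", "B")),
    ("15:59:00", "B", EDIT),
    ("16:03:00", "B", CANCEL),
    ("16:03:00", "A", ADD),
    ("16:03:00", "A", tab("Beta", "opened", "A")),
    ("16:03:00", "A", tab("Gamma", "opened", "A")),
    ("16:14:00", "A", tab("Beta", "opened", "A")),
    ("16:15:00", "A", EDIT),
    ("16:15:00", "B", EDIT),
    ("16:15:00", "B", tab("Alpha", "closed", "B")),
    ("16:54:00", "B", ADD),
    ("16:55:00", "A", tab("Beta", "closed", "A")),
    ("16:55:00", "B", EDIT),
    ("16:55:00", "B", tab("Delta", "closed", "B")),
    ("16:56:00", "B", tab("Alpha", "closed", "B")),
    ("16:57:00", "B", tab("Beta", "closed", "B")),
]

df = pd.DataFrame(
    [(f"27.10.2021 {t}", f"User{u}@gmail.com", d) for t, u, d in events],
    columns=["Time", "User", "Description"],
)
```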

Code:

import pandas as pd

# Pattern to extract the tab name, tab type, action and user from tab events.
pattern = r'^Tab (?P<tab_name>.+) of type (?P<tab_type>.+) (?P<tab_action>\bclosed\b|\bopened\b) by (?P<user_id>.+)$'

# Add utility columns (NaN for rows that are not tab open/close events).
df = pd.concat([df, df['Description'].str.extract(pattern)], axis=1)

# Build the missing rows for one user. The rows get fractional index values
# so that sort_index() later places them between the existing rows.
def get_new_rows(df):
    all_values = []
    for action in ['opened', 'closed']:
        action_mask = df['tab_action'].eq(action)
        # First and second row of every pair of consecutive events with the same action.
        # Copies avoid modifying slices of the group (SettingWithCopyWarning).
        first_tabs = df[df['tab_action'].eq(df['tab_action'].shift(-1)) & action_mask].copy()
        second_tabs = df[df['tab_action'].eq(df['tab_action'].shift(1)) & action_mask].copy()

        if len(first_tabs) == 0:
            continue

        if action == 'opened':
            # Two opens in a row: close the first tab just before the second one opens.
            values_tab, index_tab, offset, new_action = first_tabs, second_tabs, -0.5, 'closed'
        else:
            # Two closes in a row: open the second tab just after the first one closes.
            values_tab, index_tab, offset, new_action = second_tabs, first_tabs, 0.5, 'opened'

        values_tab.index = index_tab.index + offset
        values_tab['Time'] = index_tab['Time'].to_numpy()
        values_tab['tab_action'] = new_action
        all_values.append(values_tab)

    # A tab that is still open at the end of the log gets a closing row.
    last_action = df.tail(1).copy()
    if last_action['tab_action'].iat[0] == 'opened':
        last_action.index += 0.5
        last_action['tab_action'] = 'closed'
        all_values.append(last_action)

    # pd.concat raises on an empty list, so return an empty frame
    # when this user has no missing rows.
    return pd.concat(all_values) if all_values else df.iloc[:0]


# Compute the new rows per user and add them at the correct positions.
new_rows = (df.dropna(subset=['tab_action'])
              .groupby(['user_id'], as_index=False)
              .apply(get_new_rows)
              .droplevel(0))
complete_df = pd.concat([df, new_rows]).sort_index().reset_index(drop=True)

# Fix the description of the tab event rows.
fix_m = complete_df['tab_name'].notna()
complete_df.loc[fix_m, 'Description'] = ('Tab ' + complete_df.loc[fix_m, 'tab_name'] +
                                        ' of type ' + complete_df.loc[fix_m, 'tab_type'] +
                                        ' ' + complete_df.loc[fix_m, 'tab_action'] + ' by ' +
                                        complete_df.loc[fix_m, 'user_id'])

# Drop utility columns.
complete_df = complete_df.drop(columns=['tab_name', 'tab_type', 'tab_action', 'user_id'])
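
The half-step index offsets (-0.5 and 0.5) are what put the new rows in the right place: a row indexed 0.5 sorts between rows 0 and 1. Here is a minimal sketch of just that trick on a toy frame; the names `base`, `missing` and `merged` and the `event` column are made up for the illustration.

```python
import pandas as pd

# Existing rows with the default integer index 0, 1, 2.
base = pd.DataFrame({"event": ["open A", "open B", "close B"]})

# A row that belongs between positions 0 and 1 is given the index 0.5.
missing = pd.DataFrame({"event": ["close A"]}, index=[0.5])

# Sorting by index moves the new row into place; reset_index restores 0..n-1.
merged = pd.concat([base, missing]).sort_index().reset_index(drop=True)

print(merged["event"].tolist())
# ['open A', 'close A', 'open B', 'close B']
```

This works here because at most one row is inserted before or after any given original row, so the fractional values never collide.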

Result:

                   Time             User                                    Description
0   27.10.2021 15:58:00  UserA@gmail.com  Tab Alpha of type PARTSTUDIO opened by User A
1   27.10.2021 15:59:00  UserA@gmail.com              Start edit of part studio feature
2   27.10.2021 15:59:00  UserA@gmail.com                               Cancel Operation
3   27.10.2021 15:59:00  UserB@gmail.com  Tab Alpha of type PARTSTUDIO opened by User B
4   27.10.2021 15:59:00  UserB@gmail.com              Start edit of part studio feature
5   27.10.2021 16:03:00  UserB@gmail.com                               Cancel Operation
6   27.10.2021 16:03:00  UserA@gmail.com                           Add assembly feature
7   27.10.2021 16:03:00  UserA@gmail.com  Tab Alpha of type PARTSTUDIO closed by User A
8   27.10.2021 16:03:00  UserA@gmail.com   Tab Beta of type PARTSTUDIO opened by User A
9   27.10.2021 16:03:00  UserA@gmail.com   Tab Beta of type PARTSTUDIO closed by User A
10  27.10.2021 16:03:00  UserA@gmail.com  Tab Gamma of type PARTSTUDIO opened by User A
11  27.10.2021 16:14:00  UserA@gmail.com  Tab Gamma of type PARTSTUDIO closed by User A
12  27.10.2021 16:14:00  UserA@gmail.com   Tab Beta of type PARTSTUDIO opened by User A
13  27.10.2021 16:15:00  UserA@gmail.com              Start edit of part studio feature
14  27.10.2021 16:15:00  UserB@gmail.com              Start edit of part studio feature
15  27.10.2021 16:15:00  UserB@gmail.com  Tab Alpha of type PARTSTUDIO closed by User B
16  27.10.2021 16:15:00  UserB@gmail.com  Tab Delta of type PARTSTUDIO opened by User B
17  27.10.2021 16:54:00  UserB@gmail.com                           Add assembly feature
18  27.10.2021 16:55:00  UserA@gmail.com   Tab Beta of type PARTSTUDIO closed by User A
19  27.10.2021 16:55:00  UserB@gmail.com              Start edit of part studio feature
20  27.10.2021 16:55:00  UserB@gmail.com  Tab Delta of type PARTSTUDIO closed by User B
21  27.10.2021 16:55:00  UserB@gmail.com  Tab Alpha of type PARTSTUDIO opened by User B
22  27.10.2021 16:56:00  UserB@gmail.com  Tab Alpha of type PARTSTUDIO closed by User B
23  27.10.2021 16:56:00  UserB@gmail.com   Tab Beta of type PARTSTUDIO opened by User B
24  27.10.2021 16:57:00  UserB@gmail.com   Tab Beta of type PARTSTUDIO closed by User B
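
One way to sanity-check a result like this is to confirm that, for each user, "opened" and "closed" events now strictly alternate. The helper below is a rough sketch and not part of the solution itself: `actions_alternate`, `good` and `bad` are names invented here, and it only compares actions, so it does not verify that the tab names of each open/close pair match.

```python
import pandas as pd

PATTERN = (r"^Tab (?P<tab_name>.+) of type (?P<tab_type>.+) "
           r"(?P<tab_action>closed|opened) by (?P<user_id>.+)$")

def actions_alternate(frame):
    """Return True if no user has two consecutive tab events with the same action."""
    # Rows that are not tab events do not match the pattern and are dropped.
    parsed = frame["Description"].str.extract(PATTERN).dropna(subset=["tab_action"])
    # Previous tab action of the same user (NaN for each user's first event).
    previous = parsed.groupby("user_id")["tab_action"].shift()
    return not parsed["tab_action"].eq(previous).any()

good = pd.DataFrame({"Description": [
    "Tab Alpha of type PARTSTUDIO opened by User A",
    "Cancel Operation",
    "Tab Alpha of type PARTSTUDIO closed by User A",
    "Tab Beta of type PARTSTUDIO opened by User A",
]})
# Without the "closed" row, User A has two "opened" events in a row.
bad = good.drop(index=2)

print(actions_alternate(good), actions_alternate(bad))
```

Running it on complete_df after the steps above should return True if every gap was filled.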