Using python to separate columns

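I have a bus timetable containing the day of the week, each stop, and the time the bus arrives at/departs from that stop. I would like to split each complete journey into its individual legs, one leg per line. My data looks like this: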

Day       Route    Leave Garage    Stop 1     Stop 2     Stop 3     Stop 4    Stop 5  
 
Monday     01          07:15        07:25      07:31      07:35      07:50     08:15
Monday     01          08:00        08:10      08:16      08:25      08:45     09:12
Tuesday    01          07:15        07:25      07:31      07:35      07:50     08:15
Tuesday    01          08:00        08:10      08:16      08:25      08:45     09:12
Wednesday  01          07:15        07:25      07:31      07:35      07:50     08:15
Wednesday  01          08:00        08:10      08:16      08:25      08:45     09:12
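To make the answers below runnable, here is a minimal sketch that rebuilds the sample timetable above as `df`. It assumes the times are kept as `"HH:MM"` strings and that `Route` is stored as the integer `1`; the answers' printed output shows `1`, which suggests `01` was parsed as a number when the data was read.

```python
import pandas as pd

# Column names taken from the timetable above
stops = ["Leave Garage", "Stop 1", "Stop 2", "Stop 3", "Stop 4", "Stop 5"]
trip_a = ["07:15", "07:25", "07:31", "07:35", "07:50", "08:15"]
trip_b = ["08:00", "08:10", "08:16", "08:25", "08:45", "09:12"]

# Two trips per day, Monday to Wednesday; Route assumed to be the integer 1
rows = []
for day in ["Monday", "Tuesday", "Wednesday"]:
    for times in (trip_a, trip_b):
        rows.append([day, 1, *times])

df = pd.DataFrame(rows, columns=["Day", "Route", *stops])
print(df)
```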

My expected output is:

Day       Route    Origin        Time     Destination    Time     
 
Monday     01      Leave Garage  07:15    Stop 1         07:25      
Monday     01      Stop 1        07:25    Stop 2         07:31 
Monday     01      Stop 2        07:31    Stop 3         07:35 
Monday     01      Stop 3        07:35    Stop 4         07:50     
Monday     01      Stop 4        07:50    Stop 5         08:15
Monday     01      Leave Garage  08:00    Stop 1         08:10      
Monday     01      Stop 1        08:10    Stop 2         08:16 
Monday     01      Stop 2        08:16    Stop 3         08:25 
Monday     01      Stop 3        08:25    Stop 4         08:45     
Monday     01      Stop 4        08:45    Stop 5         09:12
Tuesday    01      Leave Garage  07:15    Stop 1         07:25      
Tuesday    01      Stop 1        07:25    Stop 2         07:31 
Tuesday    01      Stop 2        07:31    Stop 3         07:35 
Tuesday    01      Stop 3        07:35    Stop 4         07:50     
Tuesday    01      Stop 4        07:50    Stop 5         08:15
Tuesday    01      Leave Garage  08:00    Stop 1         08:10      
Tuesday    01      Stop 1        08:10    Stop 2         08:16 
Tuesday    01      Stop 2        08:16    Stop 3         08:25 
Tuesday    01      Stop 3        08:25    Stop 4         08:45     
Tuesday    01      Stop 4        08:45    Stop 5         09:12
...

Is there a loop in pandas that can do this?

Many thanks! - Ali

This was fun!

I think I managed to transform the data correctly, but there is surely a better way.

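# Wide -> long: one row per (Day, Route, stop), emitted row by row
tmp = (
    df
    .set_index(["Day", "Route"])
    .stack()          # stop names become an unnamed index level
    .reset_index()    # that level comes back as "level_2", the values as column 0
    .rename(columns={"level_2": "Origin", 0: "Time"})
)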

          Day  Route        Origin   Time
0      Monday      1  Leave Garage  07:15
1      Monday      1        Stop 1  07:25
2      Monday      1        Stop 2  07:31
3      Monday      1        Stop 3  07:35
4      Monday      1        Stop 4  07:50
5      Monday      1        Stop 5  08:15
6      Monday      1  Leave Garage  08:00
7      Monday      1        Stop 1  08:10
8      Monday      1        Stop 2  08:16
9      Monday      1        Stop 3  08:25
10     Monday      1        Stop 4  08:45
11     Monday      1        Stop 5  09:12
12    Tuesday      1  Leave Garage  07:15
13    Tuesday      1        Stop 1  07:25
14    Tuesday      1        Stop 2  07:31
15    Tuesday      1        Stop 3  07:35
16    Tuesday      1        Stop 4  07:50
17    Tuesday      1        Stop 5  08:15
18    Tuesday      1  Leave Garage  08:00
19    Tuesday      1        Stop 1  08:10
20    Tuesday      1        Stop 2  08:16
21    Tuesday      1        Stop 3  08:25
22    Tuesday      1        Stop 4  08:45
23    Tuesday      1        Stop 5  09:12
24  Wednesday      1  Leave Garage  07:15
25  Wednesday      1        Stop 1  07:25
26  Wednesday      1        Stop 2  07:31
27  Wednesday      1        Stop 3  07:35
28  Wednesday      1        Stop 4  07:50
29  Wednesday      1        Stop 5  08:15
30  Wednesday      1  Leave Garage  08:00
31  Wednesday      1        Stop 1  08:10
32  Wednesday      1        Stop 2  08:16
33  Wednesday      1        Stop 3  08:25
34  Wednesday      1        Stop 4  08:45
35  Wednesday      1        Stop 5  09:12
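The `shift(-1)` pairing in the next step only works because `stack` emits each original row's values contiguously and in column order, so consecutive rows of `tmp` are consecutive stops of the same trip. A quick check on a hypothetical toy frame (two "trips" as rows, two "stops" as columns; not data from the question):

```python
import pandas as pd

# Hypothetical toy frame: rows r1/r2 play the role of trips, A/B of stops
toy = pd.DataFrame({"A": ["1", "2"], "B": ["3", "4"]}, index=["r1", "r2"])
stacked = toy.stack()

# All of r1's stops come first (in column order), then all of r2's
print(stacked.index.tolist())  # [('r1', 'A'), ('r1', 'B'), ('r2', 'A'), ('r2', 'B')]
print(stacked.tolist())        # ['1', '3', '2', '4']
```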


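# Pair each row with the next one: the next row's stop/time is this leg's destination
result = (
    tmp
    .join(tmp.shift(-1)[["Origin", "Time"]], rsuffix="_")
    .rename(columns={"Origin_": "Destination", "Time_": "Destination_Time"})
)
# Drop pairs that cross into the next trip (whose first stop is "Leave Garage")
# and the very last row, which has no successor
result = result.loc[
    result["Destination"].ne("Leave Garage")
    & result["Destination"].notnull()
]
result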

          Day  Route        Origin   Time Destination Destination_Time
0      Monday      1  Leave Garage  07:15      Stop 1            07:25
1      Monday      1        Stop 1  07:25      Stop 2            07:31
2      Monday      1        Stop 2  07:31      Stop 3            07:35
3      Monday      1        Stop 3  07:35      Stop 4            07:50
4      Monday      1        Stop 4  07:50      Stop 5            08:15
6      Monday      1  Leave Garage  08:00      Stop 1            08:10
7      Monday      1        Stop 1  08:10      Stop 2            08:16
8      Monday      1        Stop 2  08:16      Stop 3            08:25
9      Monday      1        Stop 3  08:25      Stop 4            08:45
10     Monday      1        Stop 4  08:45      Stop 5            09:12
12    Tuesday      1  Leave Garage  07:15      Stop 1            07:25
13    Tuesday      1        Stop 1  07:25      Stop 2            07:31
14    Tuesday      1        Stop 2  07:31      Stop 3            07:35
15    Tuesday      1        Stop 3  07:35      Stop 4            07:50
16    Tuesday      1        Stop 4  07:50      Stop 5            08:15
18    Tuesday      1  Leave Garage  08:00      Stop 1            08:10
19    Tuesday      1        Stop 1  08:10      Stop 2            08:16
20    Tuesday      1        Stop 2  08:16      Stop 3            08:25
21    Tuesday      1        Stop 3  08:25      Stop 4            08:45
22    Tuesday      1        Stop 4  08:45      Stop 5            09:12
24  Wednesday      1  Leave Garage  07:15      Stop 1            07:25
25  Wednesday      1        Stop 1  07:25      Stop 2            07:31
26  Wednesday      1        Stop 2  07:31      Stop 3            07:35
27  Wednesday      1        Stop 3  07:35      Stop 4            07:50
28  Wednesday      1        Stop 4  07:50      Stop 5            08:15
30  Wednesday      1  Leave Garage  08:00      Stop 1            08:10
31  Wednesday      1        Stop 1  08:10      Stop 2            08:16
32  Wednesday      1        Stop 2  08:16      Stop 3            08:25
33  Wednesday      1        Stop 3  08:25      Stop 4            08:45
34  Wednesday      1        Stop 4  08:45      Stop 5            09:12
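The filter above relies on every trip starting at a stop literally named "Leave Garage". A minimal variant that drops that assumption, sketched under the assumption that each row of `df` is exactly one trip: keep the row position as a hypothetical `trip` id and shift within each trip using `groupby(...).shift(-1)`, so the last stop of one trip can never pair with the first stop of the next.

```python
import pandas as pd

# Rebuild the question's sample data (Route assumed to be the integer 1)
stops = ["Leave Garage", "Stop 1", "Stop 2", "Stop 3", "Stop 4", "Stop 5"]
trip_a = ["07:15", "07:25", "07:31", "07:35", "07:50", "08:15"]
trip_b = ["08:00", "08:10", "08:16", "08:25", "08:45", "09:12"]
df = pd.DataFrame(
    [[day, 1, *t] for day in ["Monday", "Tuesday", "Wednesday"] for t in (trip_a, trip_b)],
    columns=["Day", "Route", *stops],
)

# Wide -> long, keeping the original row position as a "trip" id
long = (
    df.rename_axis("trip")                      # name the RangeIndex "trip"
    .set_index(["Day", "Route"], append=True)   # index: (trip, Day, Route)
    .rename_axis(columns="Origin")              # stacked level will be "Origin"
    .stack()
    .reset_index(name="Time")
)

# Next stop/time within the same trip only
nxt = long.groupby("trip")[["Origin", "Time"]].shift(-1)

result = (
    long.assign(Destination=nxt["Origin"], Destination_Time=nxt["Time"])
    .dropna(subset=["Destination"])   # last stop of each trip has no successor
    .drop(columns="trip")
    .reset_index(drop=True)
)
print(result)
```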

Shorter, using pandas.concat and reset_index:

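import pandas as pd

df2 = df.set_index(["Day", "Route"])

# Origins: every stop but the last; destinations: every stop but the first
s1 = df2.iloc[:, :-1].stack().rename_axis(index={None: "Origin"})
s2 = df2.iloc[:, 1:].stack().rename_axis(index={None: "Destination"})

# Both halves have the same (Day, Route) index, so they line up side by side
new_df = pd.concat([s1.reset_index(2, name="Time"), s2.reset_index(2, name="Time")], axis=1)
print(new_df.reset_index())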

Output:

          Day  Route        Origin   Time Destination   Time
0      Monday      1  Leave Garage  07:15      Stop 1  07:25
1      Monday      1        Stop 1  07:25      Stop 2  07:31
2      Monday      1        Stop 2  07:31      Stop 3  07:35
3      Monday      1        Stop 3  07:35      Stop 4  07:50
4      Monday      1        Stop 4  07:50      Stop 5  08:15
5      Monday      1  Leave Garage  08:00      Stop 1  08:10
6      Monday      1        Stop 1  08:10      Stop 2  08:16
7      Monday      1        Stop 2  08:16      Stop 3  08:25
8      Monday      1        Stop 3  08:25      Stop 4  08:45
9      Monday      1        Stop 4  08:45      Stop 5  09:12
10    Tuesday      1  Leave Garage  07:15      Stop 1  07:25
11    Tuesday      1        Stop 1  07:25      Stop 2  07:31
12    Tuesday      1        Stop 2  07:31      Stop 3  07:35
13    Tuesday      1        Stop 3  07:35      Stop 4  07:50
14    Tuesday      1        Stop 4  07:50      Stop 5  08:15
15    Tuesday      1  Leave Garage  08:00      Stop 1  08:10
16    Tuesday      1        Stop 1  08:10      Stop 2  08:16
17    Tuesday      1        Stop 2  08:16      Stop 3  08:25
18    Tuesday      1        Stop 3  08:25      Stop 4  08:45
19    Tuesday      1        Stop 4  08:45      Stop 5  09:12
20  Wednesday      1  Leave Garage  07:15      Stop 1  07:25
21  Wednesday      1        Stop 1  07:25      Stop 2  07:31
22  Wednesday      1        Stop 2  07:31      Stop 3  07:35
23  Wednesday      1        Stop 3  07:35      Stop 4  07:50
24  Wednesday      1        Stop 4  07:50      Stop 5  08:15
25  Wednesday      1  Leave Garage  08:00      Stop 1  08:10
26  Wednesday      1        Stop 1  08:10      Stop 2  08:16
27  Wednesday      1        Stop 2  08:16      Stop 3  08:25
28  Wednesday      1        Stop 3  08:25      Stop 4  08:45
29  Wednesday      1        Stop 4  08:45      Stop 5  09:12

Let's try a numpy-based approach:

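import numpy as np
import pandas as pd

s = df.set_index(['Day', 'Route'])
# Origin columns (all stops but the last) and destination columns (all but the first)
s1, s2 = s.iloc[:, :-1], s.iloc[:, 1:]

df1 = pd.DataFrame({
    # stop names repeated once per trip; times flattened row by row
    'Origin': np.tile([*s1], len(s)), 'Time_Orig': np.hstack(s1.values),
    'Destination': np.tile([*s2], len(s)), 'Time_Dest': np.hstack(s2.values)},
    # each (Day, Route) row yields one leg per consecutive pair of stops
    index=s.index.repeat(s.shape[1] - 1)).reset_index()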

          Day  Route        Origin Time_Orig Destination Time_Dest
0      Monday      1  Leave Garage     07:15      Stop 1     07:25
1      Monday      1        Stop 1     07:25      Stop 2     07:31
2      Monday      1        Stop 2     07:31      Stop 3     07:35
3      Monday      1        Stop 3     07:35      Stop 4     07:50
4      Monday      1        Stop 4     07:50      Stop 5     08:15
5      Monday      1  Leave Garage     08:00      Stop 1     08:10
6      Monday      1        Stop 1     08:10      Stop 2     08:16
7      Monday      1        Stop 2     08:16      Stop 3     08:25
8      Monday      1        Stop 3     08:25      Stop 4     08:45
9      Monday      1        Stop 4     08:45      Stop 5     09:12
10    Tuesday      1  Leave Garage     07:15      Stop 1     07:25
11    Tuesday      1        Stop 1     07:25      Stop 2     07:31
12    Tuesday      1        Stop 2     07:31      Stop 3     07:35
13    Tuesday      1        Stop 3     07:35      Stop 4     07:50
14    Tuesday      1        Stop 4     07:50      Stop 5     08:15
15    Tuesday      1  Leave Garage     08:00      Stop 1     08:10
16    Tuesday      1        Stop 1     08:10      Stop 2     08:16
17    Tuesday      1        Stop 2     08:16      Stop 3     08:25
18    Tuesday      1        Stop 3     08:25      Stop 4     08:45
19    Tuesday      1        Stop 4     08:45      Stop 5     09:12
20  Wednesday      1  Leave Garage     07:15      Stop 1     07:25
21  Wednesday      1        Stop 1     07:25      Stop 2     07:31
22  Wednesday      1        Stop 2     07:31      Stop 3     07:35
23  Wednesday      1        Stop 3     07:35      Stop 4     07:50
24  Wednesday      1        Stop 4     07:50      Stop 5     08:15
25  Wednesday      1  Leave Garage     08:00      Stop 1     08:10
26  Wednesday      1        Stop 1     08:10      Stop 2     08:16
27  Wednesday      1        Stop 2     08:16      Stop 3     08:25
28  Wednesday      1        Stop 3     08:25      Stop 4     08:45
29  Wednesday      1        Stop 4     08:45      Stop 5     09:12
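Since the question asks about a loop, here is an explicit-loop sketch for comparison. It rebuilds the sample data (Route assumed to be the integer 1), walks each row with `iterrows` and pairs consecutive stop columns with `zip`. It is easy to follow, but it does per-row Python work, so the vectorized answers above should scale better on large timetables.

```python
import pandas as pd

# Rebuild the question's sample data
stops = ["Leave Garage", "Stop 1", "Stop 2", "Stop 3", "Stop 4", "Stop 5"]
trip_a = ["07:15", "07:25", "07:31", "07:35", "07:50", "08:15"]
trip_b = ["08:00", "08:10", "08:16", "08:25", "08:45", "09:12"]
df = pd.DataFrame(
    [[day, 1, *t] for day in ["Monday", "Tuesday", "Wednesday"] for t in (trip_a, trip_b)],
    columns=["Day", "Route", *stops],
)

# Every column other than Day/Route is a stop, in travel order
stop_cols = [c for c in df.columns if c not in ("Day", "Route")]

records = []
for _, row in df.iterrows():
    # zip(stop_cols, stop_cols[1:]) yields consecutive (origin, destination) pairs
    for origin, dest in zip(stop_cols, stop_cols[1:]):
        records.append({
            "Day": row["Day"],
            "Route": row["Route"],
            "Origin": origin,
            "Time": row[origin],
            "Destination": dest,
            "Destination_Time": row[dest],
        })

legs = pd.DataFrame(records)
print(legs)
```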