Using Python to separate columns
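I have a bus timetable that contains the weekday, each stop, and the time the bus arrives at or departs from that stop. I want to split each complete trip into its individual legs, with each leg on a new line. My data looks like this: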
Day Route Leave Garage Stop 1 Stop 2 Stop 3 Stop 4 Stop 5
Monday 01 07:15 07:25 07:31 07:35 07:50 08:15
Monday 01 08:00 08:10 08:16 08:25 08:45 09:12
Tuesday 01 07:15 07:25 07:31 07:35 07:50 08:15
Tuesday 01 08:00 08:10 08:16 08:25 08:45 09:12
Wednesday 01 07:15 07:25 07:31 07:35 07:50 08:15
Wednesday 01 08:00 08:10 08:16 08:25 08:45 09:12
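To make the answers below reproducible, here is a sketch that builds the sample timetable as a DataFrame. It assumes the times are kept as "HH:MM" strings and Route as the string "01"; the outputs in the answers show Route as 1, which suggests their frame parsed it as an integer.

```python
import pandas as pd

# Stop columns in travel order, as in the question's header.
stops = ["Leave Garage", "Stop 1", "Stop 2", "Stop 3", "Stop 4", "Stop 5"]
# The two departures shown for each day.
trips = [
    ["07:15", "07:25", "07:31", "07:35", "07:50", "08:15"],
    ["08:00", "08:10", "08:16", "08:25", "08:45", "09:12"],
]

# One timetable row per (day, departure); Route kept as the string "01".
rows = [[day, "01", *times]
        for day in ["Monday", "Tuesday", "Wednesday"]
        for times in trips]
df = pd.DataFrame(rows, columns=["Day", "Route", *stops])
print(df)
```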
My expected output is:
Day Route Origin Time Destination Time
Monday 01 Leave Garage 07:15 Stop 1 07:25
Monday 01 Stop 1 07:25 Stop 2 07:31
Monday 01 Stop 2 07:31 Stop 3 07:35
Monday 01 Stop 3 07:35 Stop 4 07:50
Monday 01 Stop 4 07:50 Stop 5 08:15
Monday 01 Leave Garage 08:00 Stop 1 08:10
Monday 01 Stop 1 08:10 Stop 2 08:16
Monday 01 Stop 2 08:16 Stop 3 08:25
Monday 01 Stop 3 08:25 Stop 4 08:45
Monday 01 Stop 4 08:45 Stop 5 09:12
Tuesday 01 Leave Garage 07:15 Stop 1 07:25
Tuesday 01 Stop 1 07:25 Stop 2 07:31
Tuesday 01 Stop 2 07:31 Stop 3 07:35
Tuesday 01 Stop 3 07:35 Stop 4 07:50
Tuesday 01 Stop 4 07:50 Stop 5 08:15
Tuesday 01 Leave Garage 08:00 Stop 1 08:10
Tuesday 01 Stop 1 08:10 Stop 2 08:16
Tuesday 01 Stop 2 08:16 Stop 3 08:25
Tuesday 01 Stop 3 08:25 Stop 4 08:45
Tuesday 01 Stop 4 08:45 Stop 5 09:12
...
Is there a loop in pandas that can do this?
Many thanks! - Ali
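Since the question asks about a loop, here is a plain row-by-row loop for comparison with the vectorized answers. It is a sketch assuming the stop columns are everything after Day and Route, in travel order; the destination time is stored as `Destination_Time` (a name chosen here) to avoid two columns called Time.

```python
import pandas as pd

# Sample timetable (assumed layout: Day, Route, then stop columns in travel order).
stops = ["Leave Garage", "Stop 1", "Stop 2", "Stop 3", "Stop 4", "Stop 5"]
trips = [
    ["07:15", "07:25", "07:31", "07:35", "07:50", "08:15"],
    ["08:00", "08:10", "08:16", "08:25", "08:45", "09:12"],
]
df = pd.DataFrame(
    [[day, "01", *times] for day in ["Monday", "Tuesday", "Wednesday"] for times in trips],
    columns=["Day", "Route", *stops],
)

stop_cols = list(df.columns[2:])
records = []
for _, row in df.iterrows():
    # Pair each stop with the one that follows it on the same trip.
    for origin, dest in zip(stop_cols[:-1], stop_cols[1:]):
        records.append({
            "Day": row["Day"],
            "Route": row["Route"],
            "Origin": origin,
            "Time": row[origin],
            "Destination": dest,
            "Destination_Time": row[dest],
        })

legs = pd.DataFrame(records)
print(legs.head(6))
```

A loop like this is easy to read, but `iterrows` is slow on large timetables, which is why the answers reach for `stack`, `concat` or numpy instead.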
This is an interesting one!
I think I managed to transform the data correctly, but I'm sure there must be a better way.
import pandas as pd

# Stack the stop columns into rows: one row per (trip, stop)
tmp = (
    df
    .set_index(["Day", "Route"])
    .stack()
    .reset_index()
    .rename(columns={"level_2": "Origin", 0: "Time"})
)
Day Route Origin Time
0 Monday 1 Leave Garage 07:15
1 Monday 1 Stop 1 07:25
2 Monday 1 Stop 2 07:31
3 Monday 1 Stop 3 07:35
4 Monday 1 Stop 4 07:50
5 Monday 1 Stop 5 08:15
6 Monday 1 Leave Garage 08:00
7 Monday 1 Stop 1 08:10
8 Monday 1 Stop 2 08:16
9 Monday 1 Stop 3 08:25
10 Monday 1 Stop 4 08:45
11 Monday 1 Stop 5 09:12
12 Tuesday 1 Leave Garage 07:15
13 Tuesday 1 Stop 1 07:25
14 Tuesday 1 Stop 2 07:31
15 Tuesday 1 Stop 3 07:35
16 Tuesday 1 Stop 4 07:50
17 Tuesday 1 Stop 5 08:15
18 Tuesday 1 Leave Garage 08:00
19 Tuesday 1 Stop 1 08:10
20 Tuesday 1 Stop 2 08:16
21 Tuesday 1 Stop 3 08:25
22 Tuesday 1 Stop 4 08:45
23 Tuesday 1 Stop 5 09:12
24 Wednesday 1 Leave Garage 07:15
25 Wednesday 1 Stop 1 07:25
26 Wednesday 1 Stop 2 07:31
27 Wednesday 1 Stop 3 07:35
28 Wednesday 1 Stop 4 07:50
29 Wednesday 1 Stop 5 08:15
30 Wednesday 1 Leave Garage 08:00
31 Wednesday 1 Stop 1 08:10
32 Wednesday 1 Stop 2 08:16
33 Wednesday 1 Stop 3 08:25
34 Wednesday 1 Stop 4 08:45
35 Wednesday 1 Stop 5 09:12
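# Pair each stop with the stop and time on the next row
result = (
    tmp
    .join(tmp.shift(-1)[["Origin", "Time"]], rsuffix="_")
    .rename(columns={"Origin_": "Destination", "Time_": "Destination_Time"})
)
# Drop pairs that run into the next trip, plus the very last row (no next stop)
result = result.loc[
    result["Destination"].ne("Leave Garage")
    & result["Destination"].notnull()
]
result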
Day Route Origin Time Destination Destination_Time
0 Monday 1 Leave Garage 07:15 Stop 1 07:25
1 Monday 1 Stop 1 07:25 Stop 2 07:31
2 Monday 1 Stop 2 07:31 Stop 3 07:35
3 Monday 1 Stop 3 07:35 Stop 4 07:50
4 Monday 1 Stop 4 07:50 Stop 5 08:15
6 Monday 1 Leave Garage 08:00 Stop 1 08:10
7 Monday 1 Stop 1 08:10 Stop 2 08:16
8 Monday 1 Stop 2 08:16 Stop 3 08:25
9 Monday 1 Stop 3 08:25 Stop 4 08:45
10 Monday 1 Stop 4 08:45 Stop 5 09:12
12 Tuesday 1 Leave Garage 07:15 Stop 1 07:25
13 Tuesday 1 Stop 1 07:25 Stop 2 07:31
14 Tuesday 1 Stop 2 07:31 Stop 3 07:35
15 Tuesday 1 Stop 3 07:35 Stop 4 07:50
16 Tuesday 1 Stop 4 07:50 Stop 5 08:15
18 Tuesday 1 Leave Garage 08:00 Stop 1 08:10
19 Tuesday 1 Stop 1 08:10 Stop 2 08:16
20 Tuesday 1 Stop 2 08:16 Stop 3 08:25
21 Tuesday 1 Stop 3 08:25 Stop 4 08:45
22 Tuesday 1 Stop 4 08:45 Stop 5 09:12
24 Wednesday 1 Leave Garage 07:15 Stop 1 07:25
25 Wednesday 1 Stop 1 07:25 Stop 2 07:31
26 Wednesday 1 Stop 2 07:31 Stop 3 07:35
27 Wednesday 1 Stop 3 07:35 Stop 4 07:50
28 Wednesday 1 Stop 4 07:50 Stop 5 08:15
30 Wednesday 1 Leave Garage 08:00 Stop 1 08:10
31 Wednesday 1 Stop 1 08:10 Stop 2 08:16
32 Wednesday 1 Stop 2 08:16 Stop 3 08:25
33 Wednesday 1 Stop 3 08:25 Stop 4 08:45
34 Wednesday 1 Stop 4 08:45 Stop 5 09:12
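The filter above relies on every trip starting at "Leave Garage". A variant sketch that avoids that assumption: keep the original row number as a `trip` id (a helper column introduced here) and shift within each trip using `groupby`, so a shift never crosses from one trip into the next.

```python
import pandas as pd

# Sample timetable (assumed layout: Day, Route, then stop columns in travel order).
stops = ["Leave Garage", "Stop 1", "Stop 2", "Stop 3", "Stop 4", "Stop 5"]
trips = [
    ["07:15", "07:25", "07:31", "07:35", "07:50", "08:15"],
    ["08:00", "08:10", "08:16", "08:25", "08:45", "09:12"],
]
df = pd.DataFrame(
    [[day, "01", *times] for day in ["Monday", "Tuesday", "Wednesday"] for times in trips],
    columns=["Day", "Route", *stops],
)

# Long format: one row per (trip, stop); the row number becomes the trip id.
tmp = (
    df.rename_axis(index="trip")
      .set_index(["Day", "Route"], append=True)
      .rename_axis(columns="Origin")   # the stacked level will be named Origin
      .stack()
      .rename("Time")
      .reset_index()
)

# Next stop and time within the same trip; the last stop of each trip gets NaN.
nxt = tmp.groupby("trip")[["Origin", "Time"]].shift(-1)
tmp["Destination"] = nxt["Origin"]
tmp["Destination_Time"] = nxt["Time"]

result = (
    tmp.dropna(subset=["Destination"])
       .drop(columns="trip")
       .reset_index(drop=True)
)
print(result.head(6))
```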
A shorter version using pandas.concat and reset_index:
import pandas as pd

df2 = df.set_index(["Day", "Route"])
# Origins: every stop except the last one
s1 = df2.iloc[:, :-1].stack().rename_axis(index={None: "Origin"})
# Destinations: every stop except the first one, so each origin lines up with the next stop
s2 = df2.iloc[:, 1:].stack().rename_axis(index={None: "Destination"})
# After reset_index(2) both pieces share the same (Day, Route) index, in the same order
new_df = pd.concat([s1.reset_index(2, name="Time"), s2.reset_index(2, name="Time")], axis=1)
print(new_df.reset_index())
Output:
Day Route Origin Time Destination Time
0 Monday 1 Leave Garage 07:15 Stop 1 07:25
1 Monday 1 Stop 1 07:25 Stop 2 07:31
2 Monday 1 Stop 2 07:31 Stop 3 07:35
3 Monday 1 Stop 3 07:35 Stop 4 07:50
4 Monday 1 Stop 4 07:50 Stop 5 08:15
5 Monday 1 Leave Garage 08:00 Stop 1 08:10
6 Monday 1 Stop 1 08:10 Stop 2 08:16
7 Monday 1 Stop 2 08:16 Stop 3 08:25
8 Monday 1 Stop 3 08:25 Stop 4 08:45
9 Monday 1 Stop 4 08:45 Stop 5 09:12
10 Tuesday 1 Leave Garage 07:15 Stop 1 07:25
11 Tuesday 1 Stop 1 07:25 Stop 2 07:31
12 Tuesday 1 Stop 2 07:31 Stop 3 07:35
13 Tuesday 1 Stop 3 07:35 Stop 4 07:50
14 Tuesday 1 Stop 4 07:50 Stop 5 08:15
15 Tuesday 1 Leave Garage 08:00 Stop 1 08:10
16 Tuesday 1 Stop 1 08:10 Stop 2 08:16
17 Tuesday 1 Stop 2 08:16 Stop 3 08:25
18 Tuesday 1 Stop 3 08:25 Stop 4 08:45
19 Tuesday 1 Stop 4 08:45 Stop 5 09:12
20 Wednesday 1 Leave Garage 07:15 Stop 1 07:25
21 Wednesday 1 Stop 1 07:25 Stop 2 07:31
22 Wednesday 1 Stop 2 07:31 Stop 3 07:35
23 Wednesday 1 Stop 3 07:35 Stop 4 07:50
24 Wednesday 1 Stop 4 07:50 Stop 5 08:15
25 Wednesday 1 Leave Garage 08:00 Stop 1 08:10
26 Wednesday 1 Stop 1 08:10 Stop 2 08:16
27 Wednesday 1 Stop 2 08:16 Stop 3 08:25
28 Wednesday 1 Stop 3 08:25 Stop 4 08:45
29 Wednesday 1 Stop 4 08:45 Stop 5 09:12
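Another way to get the same pairing without stacking is to build one small frame per pair of adjacent stop columns, concatenate them, and sort back into trip order. This is a sketch assuming the stop columns follow Day and Route in travel order; `leg` and `trip` are temporary helper columns introduced here.

```python
import pandas as pd

# Sample timetable (assumed layout: Day, Route, then stop columns in travel order).
stops = ["Leave Garage", "Stop 1", "Stop 2", "Stop 3", "Stop 4", "Stop 5"]
trips = [
    ["07:15", "07:25", "07:31", "07:35", "07:50", "08:15"],
    ["08:00", "08:10", "08:16", "08:25", "08:45", "09:12"],
]
df = pd.DataFrame(
    [[day, "01", *times] for day in ["Monday", "Tuesday", "Wednesday"] for times in trips],
    columns=["Day", "Route", *stops],
)

stop_cols = list(df.columns[2:])
pieces = []
for leg, (origin, dest) in enumerate(zip(stop_cols[:-1], stop_cols[1:])):
    # One frame per leg, covering every timetable row at once.
    pieces.append(pd.DataFrame({
        "Day": df["Day"],
        "Route": df["Route"],
        "Origin": origin,              # scalar, broadcast to every row
        "Time": df[origin],
        "Destination": dest,
        "Destination_Time": df[dest],
        "leg": leg,
    }))

legs = (
    pd.concat(pieces)                  # index still holds the original row number
      .rename_axis("trip")
      .reset_index()
      .sort_values(["trip", "leg"])    # restore trip-by-trip, leg-by-leg order
      .drop(columns=["trip", "leg"])
      .reset_index(drop=True)
)
print(legs.head(6))
```

The loop here runs over the handful of stop pairs rather than over timetable rows, so each iteration is still a vectorized column operation.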
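Let's try a numpy-based approach:
import numpy as np
import pandas as pd

s = df.set_index(['Day', 'Route'])
# Origin columns drop the last stop; destination columns drop the first
s1, s2 = s.iloc[:, :-1], s.iloc[:, 1:]
df1 = pd.DataFrame({
    # Stop labels repeat once per timetable row; times are flattened row by row
    'Origin': np.tile([*s1], len(s)), 'Time_Orig': np.hstack(s1.values),
    'Destination': np.tile([*s2], len(s)), 'Time_Dest': np.hstack(s2.values)},
    # Each (Day, Route) index entry is repeated once per leg
    index=s.index.repeat(s.shape[1] - 1)).reset_index()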
Day Route Origin Time_Orig Destination Time_Dest
0 Monday 1 Leave Garage 07:15 Stop 1 07:25
1 Monday 1 Stop 1 07:25 Stop 2 07:31
2 Monday 1 Stop 2 07:31 Stop 3 07:35
3 Monday 1 Stop 3 07:35 Stop 4 07:50
4 Monday 1 Stop 4 07:50 Stop 5 08:15
5 Monday 1 Leave Garage 08:00 Stop 1 08:10
6 Monday 1 Stop 1 08:10 Stop 2 08:16
7 Monday 1 Stop 2 08:16 Stop 3 08:25
8 Monday 1 Stop 3 08:25 Stop 4 08:45
9 Monday 1 Stop 4 08:45 Stop 5 09:12
10 Tuesday 1 Leave Garage 07:15 Stop 1 07:25
11 Tuesday 1 Stop 1 07:25 Stop 2 07:31
12 Tuesday 1 Stop 2 07:31 Stop 3 07:35
13 Tuesday 1 Stop 3 07:35 Stop 4 07:50
14 Tuesday 1 Stop 4 07:50 Stop 5 08:15
15 Tuesday 1 Leave Garage 08:00 Stop 1 08:10
16 Tuesday 1 Stop 1 08:10 Stop 2 08:16
17 Tuesday 1 Stop 2 08:16 Stop 3 08:25
18 Tuesday 1 Stop 3 08:25 Stop 4 08:45
19 Tuesday 1 Stop 4 08:45 Stop 5 09:12
20 Wednesday 1 Leave Garage 07:15 Stop 1 07:25
21 Wednesday 1 Stop 1 07:25 Stop 2 07:31
22 Wednesday 1 Stop 2 07:31 Stop 3 07:35
23 Wednesday 1 Stop 3 07:35 Stop 4 07:50
24 Wednesday 1 Stop 4 07:50 Stop 5 08:15
25 Wednesday 1 Leave Garage 08:00 Stop 1 08:10
26 Wednesday 1 Stop 1 08:10 Stop 2 08:16
27 Wednesday 1 Stop 2 08:16 Stop 3 08:25
28 Wednesday 1 Stop 3 08:25 Stop 4 08:45
29 Wednesday 1 Stop 4 08:45 Stop 5 09:12
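The numpy approach can be wrapped in a reusable function. The sketch below uses a hypothetical helper name, `split_legs`, and swaps `np.hstack(s1.values)` for the equivalent `to_numpy().ravel()`; it assumes every column other than the id columns is a stop, in travel order.

```python
import numpy as np
import pandas as pd


def split_legs(frame, id_cols=("Day", "Route")):
    """Hypothetical helper wrapping the numpy approach: one row per leg."""
    s = frame.set_index(list(id_cols))
    orig, dest = s.iloc[:, :-1], s.iloc[:, 1:]
    n_legs = orig.shape[1]
    return pd.DataFrame(
        {
            # Stop labels repeat once per timetable row...
            "Origin": np.tile(orig.columns.to_numpy(), len(s)),
            # ...while times are flattened row by row (C order), matching that repetition.
            "Time_Orig": orig.to_numpy().ravel(),
            "Destination": np.tile(dest.columns.to_numpy(), len(s)),
            "Time_Dest": dest.to_numpy().ravel(),
        },
        index=s.index.repeat(n_legs),
    ).reset_index()


# Usage with the question's sample data (Route kept as the string "01").
stops = ["Leave Garage", "Stop 1", "Stop 2", "Stop 3", "Stop 4", "Stop 5"]
trips = [
    ["07:15", "07:25", "07:31", "07:35", "07:50", "08:15"],
    ["08:00", "08:10", "08:16", "08:25", "08:45", "09:12"],
]
df = pd.DataFrame(
    [[day, "01", *times] for day in ["Monday", "Tuesday", "Wednesday"] for times in trips],
    columns=["Day", "Route", *stops],
)
legs = split_legs(df)
print(legs.head(6))
```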