在 Pandas 中以特定格式添加新的时间增量值列
Add a new column of timedelta values in a particular format in Pandas
我有一个数据框,我想添加一个包含两列之间时差的列:
df[Diff] = df['End Time'] - df['Open Time']
df[Diff]
0 0 days 01:25:40
1 0 days 00:41:57
2 0 days 00:21:47
3 0 days 16:41:57
4 0 days 04:32:00
5 0 days 03:01:57
6 0 days 01:37:56
7 0 days 01:13:57
8 0 days 01:07:56
9 0 days 02:33:59
10 29 days 18:33:53
11 0 days 03:50:56
12 0 days 01:57:56
我希望此列的格式为“1h 25m”,因此我尝试以小时为单位计算天数:
diff = df['End Time'] - df['Open Time']
hours = diff.dt.days * 24 + diff.dt.components.hours
minutes = diff.dt.components.minutes
并收到这些结果:
0 1
1 0
2 0
3 16
4 4
5 3
6 1
7 1
8 1
9 2
10 714
11 3
12 1
dtype: int64h 0 25
1 41
2 21
3 41
4 32
5 1
6 37
7 13
8 7
9 33
10 33
11 50
12 57
Name: minutes, dtype: int64m
但我无法在新专栏中以这种格式表达这些结果:
'{}h {}m'.format(hours,minutes))
您可以提取相关列,使用 astype
转换为 str
,然后根据需要连接列。
c = (df['End Time'] - df['Open Time'])\
.dt.components[['days', 'hours', 'minutes']]
df['diff'] = (c.days * 24 + c.hours).astype(str) + 'h ' + c.minutes.astype(str) + 'm'
df['diff']
0 1h 25m
1 0h 41m
2 0h 21m
3 16h 41m
4 4h 32m
5 3h 1m
6 1h 37m
7 1h 13m
8 1h 7m
9 2h 33m
10 714h 33m
11 3h 50m
12 1h 57m
Name: diff, dtype: object
你可以使用total_seconds
将timedelta
转换为秒,然后计算hours
、minutes
和秒,比使用[=19快10倍=]:
s = diff.dt.total_seconds().astype(int)
hours = s // 3600
# remaining seconds
s = s - (hours * 3600)
# minutes
minutes = s // 60
# remaining seconds
seconds = s - (minutes * 60)
a = hours.astype(str) + 'h ' + minutes.astype(str)
print (a)
0 1h 25
1 0h 41
2 0h 21
3 16h 41
4 4h 32
5 3h 1
6 1h 37
7 1h 13
8 1h 7
9 2h 33
10 714h 33
11 3h 50
12 1h 57
Name: Diff, dtype: object
解决方案:
hours = diff.dt.days * 24 + diff.dt.components.hours
minutes = diff.dt.components.minutes
a = hours.astype(str) + 'h ' + minutes.astype(str)
print (a)
0 1h 25m
1 0h 41m
2 0h 21m
3 16h 41m
4 4h 32m
5 3h 1m
6 1h 37m
7 1h 13m
8 1h 7m
9 2h 33m
10 18h 33m
11 3h 50m
12 1h 57m
dtype: object
另一个:
a = pd.Series(['{0[0]}h {0[1]}m'.format(x) for x in zip(hours, minutes)])
print (a)
0 1h 25m
1 0h 41m
2 0h 21m
3 16h 41m
4 4h 32m
5 3h 1m
6 1h 37m
7 1h 13m
8 1h 7m
9 2h 33m
10 714h 33m
11 3h 50m
12 1h 57m
dtype: object
时间:
#13000 rows
df = pd.concat([df]*1000).reset_index(drop=True)
In [191]: %%timeit
...: hours = diff.dt.days * 24 + diff.dt.components.hours
...: minutes = diff.dt.components.minutes
...:
...: a = hours.astype(str) + 'h ' + minutes.astype(str)
...:
1 loop, best of 3: 483 ms per loop
In [192]: %%timeit
...: s = diff.dt.total_seconds().astype(int)
...:
...: hours = s // 3600
...: # remaining seconds
...: s = s - (hours * 3600)
...: # minutes
...: minutes = s // 60
...: # remaining seconds
...: seconds = s - (minutes * 60)
...:
...: a = hours.astype(str) + 'h ' + minutes.astype(str)
...:
10 loops, best of 3: 43.9 ms per loop
In [193]: %%timeit
...: hours = diff.dt.days * 24 + diff.dt.components.hours
...: minutes = diff.dt.components.minutes
...: s = pd.Series(['{0[0]}h {0[1]}m'.format(x) for x in zip(hours, minutes)])
...:
1 loop, best of 3: 465 ms per loop
#cᴏʟᴅsᴘᴇᴇᴅ solution
In [194]: %%timeit
...: c = diff.dt.components[['days', 'hours', 'minutes']]
...: a = (c.days * 24 + c.hours).astype(str) + 'h ' + c.minutes.astype(str) + 'm'
...:
1 loop, best of 3: 208 ms per loop
我有一个数据框,我想添加一个包含两列之间时差的列:
df[Diff] = df['End Time'] - df['Open Time']
df[Diff]
0 0 days 01:25:40
1 0 days 00:41:57
2 0 days 00:21:47
3 0 days 16:41:57
4 0 days 04:32:00
5 0 days 03:01:57
6 0 days 01:37:56
7 0 days 01:13:57
8 0 days 01:07:56
9 0 days 02:33:59
10 29 days 18:33:53
11 0 days 03:50:56
12 0 days 01:57:56
我希望此列的格式为“1h 25m”,因此我尝试以小时为单位计算天数:
diff = df['End Time'] - df['Open Time']
hours = diff.dt.days * 24 + diff.dt.components.hours
minutes = diff.dt.components.minutes
并收到这些结果:
0 1
1 0
2 0
3 16
4 4
5 3
6 1
7 1
8 1
9 2
10 714
11 3
12 1
dtype: int64h 0 25
1 41
2 21
3 41
4 32
5 1
6 37
7 13
8 7
9 33
10 33
11 50
12 57
Name: minutes, dtype: int64m
但我无法在新专栏中以这种格式表达这些结果:
'{}h {}m'.format(hours,minutes))
您可以提取相关列,使用 astype
转换为 str
,然后根据需要连接列。
c = (df['End Time'] - df['Open Time'])\
.dt.components[['days', 'hours', 'minutes']]
df['diff'] = (c.days * 24 + c.hours).astype(str) + 'h ' + c.minutes.astype(str) + 'm'
df['diff']
0 1h 25m
1 0h 41m
2 0h 21m
3 16h 41m
4 4h 32m
5 3h 1m
6 1h 37m
7 1h 13m
8 1h 7m
9 2h 33m
10 714h 33m
11 3h 50m
12 1h 57m
Name: diff, dtype: object
你可以使用total_seconds
将timedelta
转换为秒,然后计算hours
、minutes
和秒,比使用[=19快10倍=]:
s = diff.dt.total_seconds().astype(int)
hours = s // 3600
# remaining seconds
s = s - (hours * 3600)
# minutes
minutes = s // 60
# remaining seconds
seconds = s - (minutes * 60)
a = hours.astype(str) + 'h ' + minutes.astype(str)
print (a)
0 1h 25
1 0h 41
2 0h 21
3 16h 41
4 4h 32
5 3h 1
6 1h 37
7 1h 13
8 1h 7
9 2h 33
10 714h 33
11 3h 50
12 1h 57
Name: Diff, dtype: object
hours = diff.dt.days * 24 + diff.dt.components.hours
minutes = diff.dt.components.minutes
a = hours.astype(str) + 'h ' + minutes.astype(str)
print (a)
0 1h 25m
1 0h 41m
2 0h 21m
3 16h 41m
4 4h 32m
5 3h 1m
6 1h 37m
7 1h 13m
8 1h 7m
9 2h 33m
10 18h 33m
11 3h 50m
12 1h 57m
dtype: object
另一个:
a = pd.Series(['{0[0]}h {0[1]}m'.format(x) for x in zip(hours, minutes)])
print (a)
0 1h 25m
1 0h 41m
2 0h 21m
3 16h 41m
4 4h 32m
5 3h 1m
6 1h 37m
7 1h 13m
8 1h 7m
9 2h 33m
10 714h 33m
11 3h 50m
12 1h 57m
dtype: object
时间:
#13000 rows
df = pd.concat([df]*1000).reset_index(drop=True)
In [191]: %%timeit
...: hours = diff.dt.days * 24 + diff.dt.components.hours
...: minutes = diff.dt.components.minutes
...:
...: a = hours.astype(str) + 'h ' + minutes.astype(str)
...:
1 loop, best of 3: 483 ms per loop
In [192]: %%timeit
...: s = diff.dt.total_seconds().astype(int)
...:
...: hours = s // 3600
...: # remaining seconds
...: s = s - (hours * 3600)
...: # minutes
...: minutes = s // 60
...: # remaining seconds
...: seconds = s - (minutes * 60)
...:
...: a = hours.astype(str) + 'h ' + minutes.astype(str)
...:
10 loops, best of 3: 43.9 ms per loop
In [193]: %%timeit
...: hours = diff.dt.days * 24 + diff.dt.components.hours
...: minutes = diff.dt.components.minutes
...: s = pd.Series(['{0[0]}h {0[1]}m'.format(x) for x in zip(hours, minutes)])
...:
1 loop, best of 3: 465 ms per loop
#cᴏʟᴅsᴘᴇᴇᴅ solution
In [194]: %%timeit
...: c = diff.dt.components[['days', 'hours', 'minutes']]
...: a = (c.days * 24 + c.hours).astype(str) + 'h ' + c.minutes.astype(str) + 'm'
...:
1 loop, best of 3: 208 ms per loop