Calculate the total of a string column

How can I calculate the total of a string column in pandas?

myl=[('2012-11-07 19:16:07', ' 2012-11-07 19:21:07', ' 0h 05m 00s'),
 ('2012-11-13 06:16:07', ' 2012-11-13 06:21:07', ' 0h 05m 00s'),
 ('2012-11-15 09:56:07', ' 2012-11-15 11:41:07', ' 1h 45m 00s'),
 ('2012-11-15 22:26:07', ' 2012-11-16 07:01:07', ' 8h 35m 00s')]

import pandas as pd
df = pd.DataFrame(myl, columns=['from', 'to', 'downtime'])

The code above returns the "downtime" values in a single column. How do I calculate the total of the integer values in that column?

In [5]: df
Out[5]:
                  from                    to     downtime
0  2012-11-07 19:16:07   2012-11-07 19:21:07   0h 05m 00s
1  2012-11-13 06:16:07   2012-11-13 06:21:07   0h 05m 00s
2  2012-11-15 09:56:07   2012-11-15 11:41:07   1h 45m 00s
3  2012-11-15 22:26:07   2012-11-16 07:01:07   8h 35m 00s

For example, for the output above the expected total of the downtime column is 9h 90m 00s (that is, 10h 30m 00s).


Update:

How can I calculate the downtime per day?

Expected result:

2012-11-07 0h 05m 00s
2012-11-13 0h 05m 00s
2012-11-15 10h 20m 00s

This works as expected:

df['downtime_t'] = pd.to_timedelta(df['downtime'])

df['year'] = pd.DatetimeIndex(pd.to_datetime(df['from'])).year
df['month'] = pd.DatetimeIndex(pd.to_datetime(df['from'])).month
df['day'] = pd.DatetimeIndex(pd.to_datetime(df['from'])).day

df.groupby(['year', 'month', 'day'])['downtime_t'].sum()

This also works for grouping by year:

df['from_d2'] = pd.to_datetime(df['from'])
df.groupby(df['from_d2'].map(lambda x:  x.year ))['downtime_t'].sum()

But this does not work:

df.groupby(df['from_d2'].map(lambda x:  x.year, x.month, x.day))['downtime_t'].sum()

Is there another way to get the total per group?

You can use pandas' to_timedelta function.

http://pandas.pydata.org/pandas-docs/stable/generated/pandas.to_timedelta.html

pd.to_timedelta(df['downtime']).sum()
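
For the per-day question in the update, one possible approach is to group the converted timedeltas by the calendar date of the "from" column. The sketch below rebuilds the DataFrame from the question so that it runs on its own. The helper format_hms and the variable names downtime, start, total and per_day are illustrative and do not appear in the original post. The .str.strip() call is only a precaution against the leading spaces in the sample data.

```python
import pandas as pd

myl = [('2012-11-07 19:16:07', ' 2012-11-07 19:21:07', ' 0h 05m 00s'),
       ('2012-11-13 06:16:07', ' 2012-11-13 06:21:07', ' 0h 05m 00s'),
       ('2012-11-15 09:56:07', ' 2012-11-15 11:41:07', ' 1h 45m 00s'),
       ('2012-11-15 22:26:07', ' 2012-11-16 07:01:07', ' 8h 35m 00s')]
df = pd.DataFrame(myl, columns=['from', 'to', 'downtime'])

# Convert the "0h 05m 00s" strings to timedeltas (leading spaces stripped first).
downtime = pd.to_timedelta(df['downtime'].str.strip())

# Overall total, as in the answer above.
total = downtime.sum()

# Per-day total: group by the calendar date of the "from" timestamp.
start = pd.to_datetime(df['from'])
per_day = downtime.groupby(start.dt.date).sum()


def format_hms(td):
    """Hypothetical helper: render a Timedelta in the question's 'Xh MMm SSs' style."""
    seconds = int(td.total_seconds())
    hours, rem = divmod(seconds, 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours}h {minutes:02d}m {secs:02d}s"


print(format_hms(total))
for day, td in per_day.items():
    print(day, format_hms(td))
```

Grouping by start.dt.date also sidesteps the failing call in the question. In map(lambda x:  x.year, x.month, x.day) the lambda body is only x.year, and x.month and x.day are parsed as extra arguments to map, where x is not defined.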