Counting running apps from start and finish datetime columns in a DataFrame

I have a DataFrame like this:

import pandas as pd

df = pd.DataFrame({
    'app': [1, 2, 3, 4, 5],
    'start_time': ['2022-03-11 22:26:00', '2022-03-11 22:26:30', '2022-03-11 22:27:00', '2022-03-11 22:27:30', '2022-03-11 22:28:00'],
    'finish_time': ['2022-03-11 22:26:40', '2022-03-11 22:27:00', '2022-03-11 22:28:00', '2022-03-11 22:27:40', '2022-03-11 22:29:00']
})

# parse the string columns into datetime64 values
df['start_time'] = pd.to_datetime(df['start_time'])
df['finish_time'] = pd.to_datetime(df['finish_time'])

My main goal is to create a plot where the x-axis is time and the y-axis is the count of running apps.

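To get there, my idea is to create a new column holding the number of apps running at the moment each app starts. For example, when app 2 starts, app 1 is still running (ideally app 2 itself is included in the count). This is where I am stuck. Here is the DataFrame I am trying to produce: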

   app          start_time         finish_time  running_apps (current app included)
0    1 2022-03-11 22:26:00 2022-03-11 22:26:40  1
1    2 2022-03-11 22:26:30 2022-03-11 22:27:00  2
2    3 2022-03-11 22:27:00 2022-03-11 22:28:00  2
3    4 2022-03-11 22:27:30 2022-03-11 22:27:40  2
4    5 2022-03-11 22:28:00 2022-03-11 22:29:00  2

If anyone has another idea, it would be much appreciated. Thanks.

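You can use numpy broadcasting with np.tril, which restricts the comparison to the lower triangle so that each app is tested only against itself and the rows above it. Chain both masks with & and count the True values with sum: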
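import numpy as np

df['start_time'] = pd.to_datetime(df['start_time'])
df['finish_time'] = pd.to_datetime(df['finish_time'])

# a[i, j]: app j finishes after app i starts
# b[i, j]: app j starts before app i finishes
# np.tril keeps only j <= i (the app itself and the rows above it)
a = np.tril(df['finish_time'].to_numpy() > df['start_time'].to_numpy()[:,None])
b = np.tril(df['start_time'].to_numpy() < df['finish_time'].to_numpy()[:,None])

# both conditions must hold; count the True values per row
df['count'] = (a & b).sum(axis=1)
print(df)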
   app          start_time         finish_time  count
0    1 2022-03-11 22:26:00 2022-03-11 22:26:40      1
1    2 2022-03-11 22:26:30 2022-03-11 22:27:00      2
2    3 2022-03-11 22:27:00 2022-03-11 22:28:00      1
3    4 2022-03-11 22:27:30 2022-03-11 22:27:40      2
4    5 2022-03-11 22:28:00 2022-03-11 22:29:00      1
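
The comparisons above are strict, so an app that finishes at exactly the moment another one starts is not counted as running. The table in the question seems to count such apps (it shows 2 for apps 3 and 5). Here is a minimal sketch of that variant, assuming both boundaries should be inclusive and that "earlier" is decided by start time rather than by row order. The names `start`, `finish`, `running` and the column `count_inclusive` are illustrative and not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({
    'app': [1, 2, 3, 4, 5],
    'start_time': ['2022-03-11 22:26:00', '2022-03-11 22:26:30', '2022-03-11 22:27:00',
                   '2022-03-11 22:27:30', '2022-03-11 22:28:00'],
    'finish_time': ['2022-03-11 22:26:40', '2022-03-11 22:27:00', '2022-03-11 22:28:00',
                    '2022-03-11 22:27:40', '2022-03-11 22:29:00'],
})
df['start_time'] = pd.to_datetime(df['start_time'])
df['finish_time'] = pd.to_datetime(df['finish_time'])

start = df['start_time'].to_numpy()
finish = df['finish_time'].to_numpy()

# running[i, j] is True when app j has started by the time app i starts
# and has not finished before that moment (assumption: both bounds inclusive)
running = (start <= start[:, None]) & (finish >= start[:, None])

# each row counts the apps running when that row's app starts, itself included
df['count_inclusive'] = running.sum(axis=1)
print(df)
```

Like the broadcasting approach above, this builds an n-by-n boolean matrix, so memory use grows quadratically with the number of rows.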

Or, if you need to compare against all rows, not only the current and previous ones:

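df['start_time'] = pd.to_datetime(df['start_time'])
df['finish_time'] = pd.to_datetime(df['finish_time'])

# same masks as before, but without np.tril, so every pair of rows is compared
a = (df['finish_time'].to_numpy() > df['start_time'].to_numpy()[:,None])
b = (df['start_time'].to_numpy() < df['finish_time'].to_numpy()[:,None])

# count, for each app, how many apps (itself included) overlap its running interval
df['count'] = (a & b).sum(axis=1)
print(df)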
   app          start_time         finish_time  count
0    1 2022-03-11 22:26:00 2022-03-11 22:26:40      2
1    2 2022-03-11 22:26:30 2022-03-11 22:27:00      2
2    3 2022-03-11 22:27:00 2022-03-11 22:28:00      2
3    4 2022-03-11 22:27:30 2022-03-11 22:27:40      2
4    5 2022-03-11 22:28:00 2022-03-11 22:29:00      1
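
For the plot described in the question (time on the x-axis, number of running apps on the y-axis), a per-row count may not be enough, because the count also changes whenever an app finishes. One possible sketch, which is a different technique from the broadcasting above, treats every start as a +1 event and every finish as a -1 event and takes a cumulative sum over time. It assumes an app is no longer running at its exact finish time, and the names `events` and `running` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'app': [1, 2, 3, 4, 5],
    'start_time': ['2022-03-11 22:26:00', '2022-03-11 22:26:30', '2022-03-11 22:27:00',
                   '2022-03-11 22:27:30', '2022-03-11 22:28:00'],
    'finish_time': ['2022-03-11 22:26:40', '2022-03-11 22:27:00', '2022-03-11 22:28:00',
                    '2022-03-11 22:27:40', '2022-03-11 22:29:00'],
})
df['start_time'] = pd.to_datetime(df['start_time'])
df['finish_time'] = pd.to_datetime(df['finish_time'])

# one event per start (+1) and one per finish (-1), indexed by timestamp
events = pd.concat([
    pd.Series(1, index=df['start_time']),
    pd.Series(-1, index=df['finish_time']),
])

# net change per distinct timestamp, in time order, then the running total
running = events.groupby(level=0).sum().sort_index().cumsum()
print(running)
```

The resulting Series is indexed by the timestamps at which the count can change, so it can be drawn as a step line, for example with `running.plot(drawstyle='steps-post')` if matplotlib is installed.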