Counting running apps from start and finish datetime columns in a DataFrame
I have a dataframe like this:
import pandas as pd

df = pd.DataFrame({
'app': [1,2,3,4,5],
'start_time': ['2022-03-11 22:26:00', '2022-03-11 22:26:30', '2022-03-11 22:27:00', '2022-03-11 22:27:30', '2022-03-11 22:28:00'],
'finish_time': ['2022-03-11 22:26:40', '2022-03-11 22:27:00', '2022-03-11 22:28:00', '2022-03-11 22:27:40', '2022-03-11 22:29:00']
})
df['start_time'] = pd.to_datetime(df['start_time'])
df['finish_time'] = pd.to_datetime(df['finish_time'])
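My main goal is to create a plot where the x-axis is time and the y-axis is the count of running apps.
To get there, my idea is to create a new column holding the number of apps running at the moment each app starts. For example, when app 2 starts, app 1 is actually still running (it would be good if app 2 itself were included in the count). But I am stuck here. This is an example of the dataframe I intend to produce: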
app start_time finish_time running_apps(if current app included)
0 1 2022-03-11 22:26:00 2022-03-11 22:26:40 1
1 2 2022-03-11 22:26:30 2022-03-11 22:27:00 2
2 3 2022-03-11 22:27:00 2022-03-11 22:28:00 2
3 4 2022-03-11 22:27:30 2022-03-11 22:27:40 2
4 5 2022-03-11 22:28:00 2022-03-11 22:29:00 2
If anyone has other ideas, they would be much appreciated. Thank you.
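You can use numpy broadcasting with np.tril, so that each row is tested only against the lower triangle (itself and the earlier rows), chain both masks with &, and count the True values with sum: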
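import numpy as np

df['start_time'] = pd.to_datetime(df['start_time'])
df['finish_time'] = pd.to_datetime(df['finish_time'])

# np.tril keeps the lower triangle: row i is compared only with rows 0..i
a = np.tril(df['finish_time'].to_numpy() > df['start_time'].to_numpy()[:,None])
b = np.tril(df['start_time'].to_numpy() < df['finish_time'].to_numpy()[:,None])
# Rows where both masks hold overlap with row i; count them per row
df['count'] = (a & b).sum(axis=1)
print(df)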
app start_time finish_time count
0 1 2022-03-11 22:26:00 2022-03-11 22:26:40 1
1 2 2022-03-11 22:26:30 2022-03-11 22:27:00 2
2 3 2022-03-11 22:27:00 2022-03-11 22:28:00 1
3 4 2022-03-11 22:27:30 2022-03-11 22:27:40 2
4 5 2022-03-11 22:28:00 2022-03-11 22:29:00 1
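Note that the count column above differs from the expected running_apps column for apps 3 and 5. The comparison is strict, so an app that finishes at exactly the moment another one starts is not counted. The following is a minimal sketch, not the code from the answer above, that assumes an app still counts as running at its own finish time (an assumption inferred from the expected output in the question). The column name running_apps is taken from the question. It compares every row with every other row, so it does not depend on the rows being sorted by start_time.

```python
import pandas as pd

df = pd.DataFrame({
    'app': [1, 2, 3, 4, 5],
    'start_time': ['2022-03-11 22:26:00', '2022-03-11 22:26:30', '2022-03-11 22:27:00',
                   '2022-03-11 22:27:30', '2022-03-11 22:28:00'],
    'finish_time': ['2022-03-11 22:26:40', '2022-03-11 22:27:00', '2022-03-11 22:28:00',
                    '2022-03-11 22:27:40', '2022-03-11 22:29:00'],
})
df['start_time'] = pd.to_datetime(df['start_time'])
df['finish_time'] = pd.to_datetime(df['finish_time'])

start = df['start_time'].to_numpy()
finish = df['finish_time'].to_numpy()

# running[i, j] is True when app j is running at the moment app i starts:
# app j started at or before that moment and finishes at or after it
# (assumption: the finish time is treated as inclusive).
running = (start <= start[:, None]) & (finish >= start[:, None])

# Each row counts the app itself, since start[i] <= start[i] <= finish[i].
df['running_apps'] = running.sum(axis=1)
print(df)
```

With the sample data this gives 1, 2, 2, 2, 2. The mask is n x n, so memory grows quadratically with the number of rows.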
Or, if you need to compare all values:
df['start_time'] = pd.to_datetime(df['start_time'])
df['finish_time'] = pd.to_datetime(df['finish_time'])

# Without np.tril, each row is compared with every row, including later ones
a = (df['finish_time'].to_numpy() > df['start_time'].to_numpy()[:,None])
b = (df['start_time'].to_numpy() < df['finish_time'].to_numpy()[:,None])
# Count every interval that overlaps with the row's own interval
df['count'] = (a & b).sum(axis=1)
print(df)
app start_time finish_time count
0 1 2022-03-11 22:26:00 2022-03-11 22:26:40 2
1 2 2022-03-11 22:26:30 2022-03-11 22:27:00 2
2 3 2022-03-11 22:27:00 2022-03-11 22:28:00 2
3 4 2022-03-11 22:27:30 2022-03-11 22:27:40 2
4 5 2022-03-11 22:28:00 2022-03-11 22:29:00 1
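For the plot mentioned in the question (time on the x-axis, count of running apps on the y-axis), a different technique may be simpler than a per-row column. The sketch below is not part of the answer above. It treats every start as a +1 event and every finish as a -1 event, then takes a cumulative sum over time. The names events, delta, and running are illustrative. It assumes an app stops counting at its finish time, so a finish and a start at the same instant cancel out.

```python
import pandas as pd

df = pd.DataFrame({
    'app': [1, 2, 3, 4, 5],
    'start_time': ['2022-03-11 22:26:00', '2022-03-11 22:26:30', '2022-03-11 22:27:00',
                   '2022-03-11 22:27:30', '2022-03-11 22:28:00'],
    'finish_time': ['2022-03-11 22:26:40', '2022-03-11 22:27:00', '2022-03-11 22:28:00',
                    '2022-03-11 22:27:40', '2022-03-11 22:29:00'],
})
df['start_time'] = pd.to_datetime(df['start_time'])
df['finish_time'] = pd.to_datetime(df['finish_time'])

# One row per event: +1 when an app starts, -1 when it finishes.
events = pd.concat([
    pd.DataFrame({'time': df['start_time'], 'delta': 1}),
    pd.DataFrame({'time': df['finish_time'], 'delta': -1}),
])

# Sum the events that share a timestamp (groupby sorts by time),
# then accumulate to get the number of running apps from each timestamp on.
running = events.groupby('time')['delta'].sum().cumsum()
print(running)
```

The result is a Series indexed by time that changes value only at event timestamps, so a step plot suits it. If matplotlib is installed, something like running.plot(drawstyle='steps-post') should draw it.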