How can I iterate over and compare two pandas columns based on datetime and add a value of True or False?
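I have a pandas dataframe with 2 columns: Predicted time and Actual times. I want a third column containing a True or False value. In other words, for each row, if the predicted time matches one of the actual times in the same row, or falls between one of the pairs of actual times, add 'True' to the third column. Otherwise, add 'False' for that row.
Any ideas on where to start? I assume this requires two things: the datetime module for the comparison, and iterating over each row to produce the new row value?
Current dataframe: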
;Predicted time;Actual times
0;[2017-09-09 06:53:37, 2017-09-09 06:53:46];[2017-09-09 06:54:11, 2017-09-09 06:54:21,] [2017-09-09 06:54:29, 2017-09-09 06:55:14], [2017-09-09 06:55:30, 2017-09-09 06:55:51]]
1;[2017-09-09 06:54:19, 2017-09-09 06:54:43];[2017-09-09 06:54:11, 2017-09-09 06:54:21, 2017-09-09 06:54:29, 2017-09-09 06:55:14, 2017-09-09 06:55:30, 2017-09-09 06:55:51]
2;[2017-09-09 06:54:44, 2017-09-09 06:54:48];[2017-09-09 06:54:11, 2017-09-09 06:54:21,] [2017-09-09 06:54:29, 2017-09-09 06:55:14], [2017-09-09 06:55:30, 2017-09-09 06:55:51]]
Desired output:
;Predicted time;Actual times;True or False
0;[2017-09-09 06:53:37, 2017-09-09 06:53:46];[2017-09-09 06:54:11, 2017-09-09 06:54:21,] [2017-09-09 06:54:29, 2017-09-09 06:55:14], [2017-09-09 06:55:30, 2017-09-09 06:55:51]];FALSE
1;[2017-09-09 06:54:19, 2017-09-09 06:54:43];[2017-09-09 06:54:11, 2017-09-09 06:54:21, 2017-09-09 06:54:29, 2017-09-09 06:55:14, 2017-09-09 06:55:30, 2017-09-09 06:55:51];TRUE
2;[2017-09-09 06:54:44, 2017-09-09 06:54:48];[2017-09-09 06:54:11, 2017-09-09 06:54:21,] [2017-09-09 06:54:29, 2017-09-09 06:55:14], [2017-09-09 06:55:30, 2017-09-09 06:55:51]];TRUE
I have also attached an image to show the desired output more clearly.
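Use a list comprehension with any, testing whether each actual time lies between the predicted start and end, to check for at least one True: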
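import pandas as pd

# Convert the strings to lists of datetimes; the nested lists in 'Actual times' were removed first
f = lambda x: pd.to_datetime(x.strip('[]').split(', ')).tolist()
# Note: applymap is deprecated since pandas 2.1; DataFrame.map is its replacement there
df[['Actual times', 'Predicted time']] = df[['Actual times', 'Predicted time']].applymap(f)
# True if at least one actual time y lies strictly between the predicted start s and end e
df['True or False'] = [any((s < y) & (e > y) for y in x)
                       for (s, e), x in zip(df['Predicted time'], df['Actual times'])]
print(df)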
                               Predicted time  \
0  [2017-09-09 06:53:37, 2017-09-09 06:53:46]
1  [2017-09-09 06:54:19, 2017-09-09 06:54:43]
2  [2017-09-09 06:54:44, 2017-09-09 06:54:48]

                                        Actual times  True or False
0  [2017-09-09 06:54:11, 2017-09-09 06:54:21, 201...          False
1  [2017-09-09 06:54:11, 2017-09-09 06:54:21, 201...           True
2  [2017-09-09 06:54:11, 2017-09-09 06:54:21, 201...          False
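
The check above marks a row True only when a single actual timestamp falls strictly inside the predicted window, which is why row 2 comes out False although the desired output lists it as TRUE. If the actual times are meant to be read as (start, end) pairs, one possible alternative is an interval-overlap test. The following is a minimal sketch under that assumption; the helper names overlaps_any and t are hypothetical and not part of the answer above, and the values are typed in by hand from the question's sample rows.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"

def overlaps_any(pred, actual):
    """Return True if the closed interval `pred` overlaps any (start, end) pair in `actual`."""
    start, end = pred
    # Two closed intervals overlap when each one starts before the other ends
    return any(a_start <= end and start <= a_end for a_start, a_end in actual)

def t(clock):
    """Hypothetical shorthand: build a datetime on 2017-09-09 from 'HH:MM:SS'."""
    return datetime.strptime("2017-09-09 " + clock, FMT)

# Actual times from the question, read as three (start, end) pairs (an assumption)
actual = [(t("06:54:11"), t("06:54:21")),
          (t("06:54:29"), t("06:55:14")),
          (t("06:55:30"), t("06:55:51"))]

# Predicted windows for rows 0, 1 and 2
predicted = [(t("06:53:37"), t("06:53:46")),
             (t("06:54:19"), t("06:54:43")),
             (t("06:54:44"), t("06:54:48"))]

flags = [overlaps_any(p, actual) for p in predicted]
print(flags)  # [False, True, True]
# With a DataFrame whose rows are in this order: df['True or False'] = flags
```

Here row 2 becomes True because its window 06:54:44 to 06:54:48 lies inside the actual pair 06:54:29 to 06:55:14.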
Most of the work in the function below is converting the strings in the dataframe into collections of datetime objects that can be compared -
import bisect
import re
from datetime import datetime as dt

# Assumes the columns are named 'predicted_time' and 'actual_times' and hold strings
def pred_intersects_act(row):
    # Convert predicted times to a list of datetime objects
    predicted_time = re.sub(r'\[|\]', '', row['predicted_time'])
    predicted_time = re.sub(r',\ *', ',', predicted_time)
    pt_list = predicted_time.split(',')
    pt_list = [dt.strptime(_, '%Y-%m-%d %H:%M:%S') for _ in pt_list]
    # Convert actual times to a list of datetime objects
    actual_times = re.sub(r'\[|\]', '', row['actual_times'])
    actual_times = re.sub(r',\ *', ',', actual_times)
    at_list = actual_times.split(',')
    at_list = [dt.strptime(_, '%Y-%m-%d %H:%M:%S') for _ in at_list]
    # Pair up consecutive actual times and check for intersection
    for pair in zip(at_list[:-1], at_list[1:]):
        # A predicted time equals one of the pair's endpoints
        exact_match = any(_ in pair for _ in pt_list)
        # bisect returns 1 when pair[0] <= predicted time < pair[1]
        approx_match = any(bisect.bisect(pair, _) == 1 for _ in pt_list)
        if exact_match or approx_match:
            return True
    return False

df.apply(pred_intersects_act, axis=1)
0    False
1     True
2     True
dtype: bool
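
The string-to-datetime step can also be sketched without the two re.sub passes, by pulling every timestamp out of the cell with a single regular expression. This is only an illustration, assuming every timestamp has the form YYYY-MM-DD HH:MM:SS; the names TIMESTAMP and parse_intervals are hypothetical. Unlike the function above, which pairs consecutive times, this sketch groups the timestamps into non-overlapping (start, end) pairs, so the gaps between pairs are not treated as intervals.

```python
import re
from datetime import datetime

# Matches timestamps such as 2017-09-09 06:54:11
TIMESTAMP = re.compile(r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}")

def parse_intervals(cell):
    """Extract all timestamps from a bracketed string and group them into (start, end) pairs."""
    stamps = [datetime.strptime(s, "%Y-%m-%d %H:%M:%S") for s in TIMESTAMP.findall(cell)]
    # Even positions are starts, odd positions are ends
    return list(zip(stamps[::2], stamps[1::2]))

# The 'Actual times' cell of row 0, copied as it appears in the question
cell = ("[2017-09-09 06:54:11, 2017-09-09 06:54:21,] "
        "[2017-09-09 06:54:29, 2017-09-09 06:55:14], "
        "[2017-09-09 06:55:30, 2017-09-09 06:55:51]]")

intervals = parse_intervals(cell)
print(len(intervals))  # 3
```

Because the pattern ignores brackets and commas entirely, the uneven bracket placement in the sample data does not affect the result.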