How can I iterate over and compare 2 pandas columns based on datetime and add a value of True or False?

Question

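I have a pandas DataFrame with 2 columns: predicted time and actual times. I want a third column holding a True or False value. In other words, for each row, if the predicted time matches one of the actual times in the same row, or falls between those actual times, then 'True' should be added to the third column. Otherwise, 'False' should be added to the row.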

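Any ideas on where to start? I assume this needs two things: the datetime module for the comparison, and iteration over each row to generate the newly written row value?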

Current DataFrame:

;Predicted time;Actual times
0;[2017-09-09 06:53:37, 2017-09-09 06:53:46];[[2017-09-09 06:54:11, 2017-09-09 06:54:21], [2017-09-09 06:54:29, 2017-09-09 06:55:14], [2017-09-09 06:55:30, 2017-09-09 06:55:51]]
1;[2017-09-09 06:54:19, 2017-09-09 06:54:43];[2017-09-09 06:54:11, 2017-09-09 06:54:21, 2017-09-09 06:54:29, 2017-09-09 06:55:14, 2017-09-09 06:55:30, 2017-09-09 06:55:51]
2;[2017-09-09 06:54:44, 2017-09-09 06:54:48];[[2017-09-09 06:54:11, 2017-09-09 06:54:21], [2017-09-09 06:54:29, 2017-09-09 06:55:14], [2017-09-09 06:55:30, 2017-09-09 06:55:51]]
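
To make the sample reproducible, the semicolon-separated text can be loaded with pandas. This is only a sketch: it assumes every cell is a plain string and that the actual times are written as one flat list (no nested brackets), which is the form the first answer's conversion expects.

```python
import io

import pandas as pd

# Assumption: the actual times are flattened into a single bracketed list
actual = ("[2017-09-09 06:54:11, 2017-09-09 06:54:21, 2017-09-09 06:54:29, "
          "2017-09-09 06:55:14, 2017-09-09 06:55:30, 2017-09-09 06:55:51]")

raw = (
    ";Predicted time;Actual times\n"
    f"0;[2017-09-09 06:53:37, 2017-09-09 06:53:46];{actual}\n"
    f"1;[2017-09-09 06:54:19, 2017-09-09 06:54:43];{actual}\n"
    f"2;[2017-09-09 06:54:44, 2017-09-09 06:54:48];{actual}\n"
)

# The unnamed first field becomes the index; both data columns stay as strings
df = pd.read_csv(io.StringIO(raw), sep=";", index_col=0)
print(df.dtypes)
```

Both columns come back with object dtype, so the timestamps still need to be parsed before any comparison.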

Expected output:

;Predicted time;Actual times;True or False
0;[2017-09-09 06:53:37, 2017-09-09 06:53:46];[[2017-09-09 06:54:11, 2017-09-09 06:54:21], [2017-09-09 06:54:29, 2017-09-09 06:55:14], [2017-09-09 06:55:30, 2017-09-09 06:55:51]];FALSE
1;[2017-09-09 06:54:19, 2017-09-09 06:54:43];[2017-09-09 06:54:11, 2017-09-09 06:54:21, 2017-09-09 06:54:29, 2017-09-09 06:55:14, 2017-09-09 06:55:30, 2017-09-09 06:55:51];TRUE
2;[2017-09-09 06:54:44, 2017-09-09 06:54:48];[[2017-09-09 06:54:11, 2017-09-09 06:54:21], [2017-09-09 06:54:29, 2017-09-09 06:55:14], [2017-09-09 06:55:30, 2017-09-09 06:55:51]];TRUE

I have also attached an image to show the desired output more clearly.

Answer 1

Use a list comprehension with any to test whether at least one actual time lies strictly between the predicted start and end:

import pandas as pd

# Convert each string cell to a list of Timestamps
# (the nested lists in Actual times were removed beforehand)
f = lambda x: pd.to_datetime(x.strip('[]').split(', ')).tolist()
for col in ['Actual times', 'Predicted time']:
    df[col] = df[col].apply(f)

# True if any actual time y satisfies start < y < end for the predicted pair
df['True or False'] = [any(s < y < e for y in x)
                        for (s, e), x in zip(df['Predicted time'], df['Actual times'])]
print(df)
                               Predicted time  \
0  [2017-09-09 06:53:37, 2017-09-09 06:53:46]   
1  [2017-09-09 06:54:19, 2017-09-09 06:54:43]   
2  [2017-09-09 06:54:44, 2017-09-09 06:54:48]   

                                        Actual times  True or False  
0  [2017-09-09 06:54:11, 2017-09-09 06:54:21, 201...          False  
1  [2017-09-09 06:54:11, 2017-09-09 06:54:21, 201...           True  
2  [2017-09-09 06:54:11, 2017-09-09 06:54:21, 201...          False
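
Note that the output above gives False for row 2, while the question expects TRUE: the comprehension only checks whether an actual timestamp falls strictly inside the predicted window, so a predicted window lying entirely inside an actual interval is missed. A minimal sketch of an interval-overlap check follows. It assumes that consecutive actual times form [start, end] pairs, which the question suggests but does not state outright; the helper names to_stamps and overlaps_any are made up for illustration.

```python
import pandas as pd

def to_stamps(cell):
    # Drop every bracket, then parse the comma-separated timestamps
    parts = cell.replace("[", "").replace("]", "").split(",")
    return [pd.Timestamp(p.strip()) for p in parts if p.strip()]

def overlaps_any(predicted, actual):
    start, end = to_stamps(predicted)
    stamps = to_stamps(actual)
    # Assumption: actual times come as consecutive (start, end) pairs
    intervals = zip(stamps[::2], stamps[1::2])
    # Two closed intervals overlap when each one starts before the other ends
    return any(a_start <= end and start <= a_end for a_start, a_end in intervals)

actual = ("[2017-09-09 06:54:11, 2017-09-09 06:54:21, 2017-09-09 06:54:29, "
          "2017-09-09 06:55:14, 2017-09-09 06:55:30, 2017-09-09 06:55:51]")
df = pd.DataFrame({
    "Predicted time": [
        "[2017-09-09 06:53:37, 2017-09-09 06:53:46]",
        "[2017-09-09 06:54:19, 2017-09-09 06:54:43]",
        "[2017-09-09 06:54:44, 2017-09-09 06:54:48]",
    ],
    "Actual times": [actual] * 3,
})

df["True or False"] = [overlaps_any(p, a)
                       for p, a in zip(df["Predicted time"], df["Actual times"])]
print(df["True or False"].tolist())  # [False, True, True]
```

Because the helper strips all brackets before parsing, it accepts both the flat and the nested form of the Actual times column.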

Answer 2

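Most of the work in the function below goes into converting the strings in the DataFrame into collections of datetime objects that can be compared: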

import bisect
import re
from datetime import datetime as dt

# Note: this function expects columns named 'predicted_time' and 'actual_times'
def pred_intersects_act(row):
    # Convert predicted times to a list of datetime objects
    predicted_time = re.sub(r'\[|\]', '', row['predicted_time'])
    predicted_time = re.sub(r',\ *', ',', predicted_time)
    pt_list = predicted_time.split(',')
    pt_list = [dt.strptime(t, '%Y-%m-%d %H:%M:%S') for t in pt_list]

    # Convert actual times to a list of datetime objects
    actual_times = re.sub(r'\[|\]', '', row['actual_times'])
    actual_times = re.sub(r',\ *', ',', actual_times)
    at_list = actual_times.split(',')
    at_list = [dt.strptime(t, '%Y-%m-%d %H:%M:%S') for t in at_list]

    # Pair up consecutive actual times and check for intersection
    for pair in zip(at_list[:-1], at_list[1:]):
        # A predicted time equals one of the pair's endpoints
        exact_match = any(t in pair for t in pt_list)
        # bisect returns 1 when pair[0] <= t < pair[1]
        approx_match = any(bisect.bisect(pair, t) == 1 for t in pt_list)
        if exact_match or approx_match:
            return True
    return False

df.apply(pred_intersects_act, axis=1)
0    False
1     True
2     True
dtype: bool
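
The bisect test in the function may not be obvious at first glance. The short sketch below, using two timestamps from the question as an example pair, shows what bisect.bisect returns for values before, inside, and on the boundaries of a sorted pair; the variable names are only illustrative.

```python
import bisect
from datetime import datetime

# One pair of actual times taken from the question's sample data
pair = (datetime(2017, 9, 9, 6, 54, 29), datetime(2017, 9, 9, 6, 55, 14))

before = datetime(2017, 9, 9, 6, 53, 37)
inside = datetime(2017, 9, 9, 6, 54, 44)

# bisect.bisect is bisect_right: the insertion point lands after equal items
positions = [bisect.bisect(pair, t) for t in (before, inside, pair[0], pair[1])]
print(positions)  # [0, 1, 1, 2]
```

A result of 1 therefore means pair[0] <= t < pair[1]. A value equal to the upper endpoint yields 2, which is why the function also needs the separate exact-match check.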

Tags: python, datetime, loops, boolean, pandas