Detect time string format in Python?

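I have a very large dataset in which the date/time column comes in a variety of formats. I have a validation function that detects the possible date/time string formats, and it handles both 24-hour and 12-hour times. The delimiter is always a colon (:). An example is shown below. However, after profiling my code, this function appears to be a bottleneck and is expensive in terms of execution time. My question is whether there is a better way to do this without hurting performance.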

import datetime
def validate_time(time_str: str):
    for time_format in ["%H:%M", "%H:%M:%S", "%H:%M:%S.%f", "%I:%M %p"]:
        try:
            return datetime.datetime.strptime(time_str, time_format)
        except ValueError:
            continue
    return None

print(validate_time(time_str="9:21 PM"))

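Instead of trying to parse with every format string, you can split on the colon to get the string segments representing the hours, the minutes, and everything that remains. You can then choose how to parse based on the number of values the split returns: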

def validate_time_new(time_str: str):
    time_vals = time_str.split(':')
    
    try:
        if len(time_vals) == 1: 
            # No split, so invalid time
            return None
        elif len(time_vals) == 2:
            if time_vals[-1][-2:].lower() in ["am", "pm"]:
                # if last element contains am or pm, try to parse as 12hr time
                return datetime.datetime.strptime(time_str, "%I:%M %p")
            else:
                # try to parse as 24h time
                return datetime.datetime.strptime(time_str, "%H:%M")
        elif len(time_vals) == 3:
            if "." in time_vals[-1]:
                # If the last element has a decimal point, try to parse microseconds
                return datetime.datetime.strptime(time_str, "%H:%M:%S.%f")
            else:
                # try to parse without microseconds
                return datetime.datetime.strptime(time_str, "%H:%M:%S")
        else:
            # Any other number of segments is not a supported time format
            return None
    except ValueError:
        # If any of the attempts to parse throws an error, return None
        return None
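
The branching above can also be read as two separate steps: first detect the single candidate format, then call strptime once. The following is only a sketch of that reading, not a replacement for the function above; guess_time_format and parse_time are hypothetical names introduced here for illustration.

```python
import datetime
from typing import Optional


def guess_time_format(time_str: str) -> Optional[str]:
    """Hypothetical helper: pick the one candidate format for a string, or None."""
    parts = time_str.split(":")
    if len(parts) == 2:
        # 12-hour times end in AM/PM; anything else is treated as 24-hour
        if parts[-1][-2:].lower() in ("am", "pm"):
            return "%I:%M %p"
        return "%H:%M"
    if len(parts) == 3:
        # A decimal point in the seconds field means fractional seconds
        if "." in parts[-1]:
            return "%H:%M:%S.%f"
        return "%H:%M:%S"
    # No colon, or too many segments: not a supported time format
    return None


def parse_time(time_str: str) -> Optional[datetime.datetime]:
    """Hypothetical helper: parse with the guessed format, or return None."""
    time_format = guess_time_format(time_str)
    if time_format is None:
        return None
    try:
        return datetime.datetime.strptime(time_str, time_format)
    except ValueError:
        # The guess is only a candidate; the string may still be invalid
        return None
```

Keeping the detection step separate makes it possible to inspect which format was chosen for a given string without parsing it, which may be useful when auditing a column with mixed formats.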

To test this, let's time both functions, validate_time and validate_time_new, on a handful of test strings:

import timeit
print("old\t\t\tnew\t\t\t\told/new\t\ttest_string")
for s in ["12:24", "12:23:42", "13:53", "1:53 PM", "12:24:43.220", "not a date", "54:23:21"]:
    t1 = timeit.timeit('validate_time(s)', 'from __main__ import datetime, validate_time, s', number=100)
    t2 = timeit.timeit('validate_time_new(s)', 'from __main__ import datetime, validate_time_new, s', number=100)
    print(f"{t1:.6f}\t{t2:.6f}\t\t{t1/t2:.6f}\t\t{s}")

Output:

old         new             old/new     test_string
0.001628    0.001143        1.424322        12:24
0.001567    0.001012        1.548661        12:23:42
0.000935    0.000979        0.955177        13:53
0.003004    0.000722        4.161657        1:53 PM
0.004523    0.001396        3.241204        12:24:43.220
0.002148    0.000025        84.897370       not a date
0.002262    0.000622        3.638629        54:23:21
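
One way to read the timings above is that the original loop pays for every failed strptime call before it reaches the matching format. A minimal sketch, assuming the same format list as validate_time (count_attempts is a hypothetical helper, not part of the code above), counts those calls for each test string:

```python
import datetime

# Same format list, in the same order, as the original validate_time loop
FORMATS = ["%H:%M", "%H:%M:%S", "%H:%M:%S.%f", "%I:%M %p"]


def count_attempts(time_str: str) -> int:
    """Hypothetical helper: number of strptime calls the try-each-format loop makes."""
    attempts = 0
    for time_format in FORMATS:
        attempts += 1
        try:
            datetime.datetime.strptime(time_str, time_format)
            return attempts  # stop at the first format that parses
        except ValueError:
            continue
    return attempts  # every format was tried and failed


if __name__ == "__main__":
    for s in ["12:24", "12:23:42", "13:53", "1:53 PM", "12:24:43.220", "not a date", "54:23:21"]:
        print(f"{count_attempts(s)}\t{s}")
```

Strings that match the first format ("12:24", "13:53") need a single attempt, which fits their old/new ratios being close to 1. "1:53 PM" and the invalid strings go through all four formats, which is where the split-based version is likely to save the most time.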