Detect time string format in Python?
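I have a very large dataset with a date/time column in various formats. I have a validation function that detects the possible date/time string formats and handles both 24-hour and 12-hour times. The separator is always a colon (:). A sample is shown below. However, after profiling my code, this function looks like a bottleneck and is expensive in terms of execution time. My question is whether there is a better way to do this without hurting performance.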
import datetime

def validate_time(time_str: str):
    for time_format in ["%H:%M", "%H:%M:%S", "%H:%M:%S.%f", "%I:%M %p"]:
        try:
            return datetime.datetime.strptime(time_str, time_format)
        except ValueError:
            continue
    return None

print(validate_time(time_str="9:21 PM"))
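Instead of trying to parse with every format string, you can split on the colon to get string segments for the hours, the minutes, and everything that remains. Then you can choose how to parse based on the number of values the split returns: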
def validate_time_new(time_str: str):
    time_vals = time_str.split(':')
    try:
        if len(time_vals) == 1:
            # No split, so invalid time
            return None
        elif len(time_vals) == 2:
            # [-2:] takes the last two characters of the final segment
            if time_vals[-1][-2:].lower() in ["am", "pm"]:
                # if the last element ends with am or pm, try to parse as 12hr time
                return datetime.datetime.strptime(time_str, "%I:%M %p")
            else:
                # try to parse as 24h time
                return datetime.datetime.strptime(time_str, "%H:%M")
        elif len(time_vals) == 3:
            if "." in time_vals[-1]:
                # If the last element has a decimal point, try to parse microseconds
                return datetime.datetime.strptime(time_str, "%H:%M:%S.%f")
            else:
                # try to parse without microseconds
                return datetime.datetime.strptime(time_str, "%H:%M:%S")
        else:
            # More than three segments is not one of the supported formats
            return None
    except ValueError:
        # If any of the attempts to parse throws an error, return None
        return None
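The same dispatch can also be read as two steps: pick the one plausible format from the shape of the string, then call strptime once. The following is only a minimal sketch of that reading; `guess_time_format` and `parse_time` are hypothetical names that do not appear in the answer above, and the am/pm check assumes an English locale for `%p`.

```python
import datetime
from typing import Optional


def guess_time_format(time_str: str) -> Optional[str]:
    """Pick the single candidate format from the shape of the string."""
    parts = time_str.split(":")
    if len(parts) == 2:
        # "1:53 PM" ends in am/pm; otherwise assume 24-hour "13:53"
        if parts[-1][-2:].lower() in ("am", "pm"):
            return "%I:%M %p"
        return "%H:%M"
    if len(parts) == 3:
        # A decimal point in the seconds segment means fractional seconds
        return "%H:%M:%S.%f" if "." in parts[-1] else "%H:%M:%S"
    # No colon, or too many segments: not one of the four formats
    return None


def parse_time(time_str: str) -> Optional[datetime.datetime]:
    """Parse with the guessed format only; return None if it does not fit."""
    fmt = guess_time_format(time_str)
    if fmt is None:
        return None
    try:
        return datetime.datetime.strptime(time_str, fmt)
    except ValueError:
        return None


for s in ["12:24", "1:53 PM", "12:24:43.220", "not a date", "54:23:21"]:
    print(f"{s!r:16} {guess_time_format(s)!r:16} {parse_time(s)}")
```

Keeping the guess separate from the parse means the detected format string itself is available, which may be useful if the goal is to report the format of each value rather than only to parse it.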
To test this, let's time the two approaches, validate_time and validate_time_new, on a handful of test strings:
import timeit

print("old\t\t\tnew\t\t\t\told/new\t\ttest_string")
for s in ["12:24", "12:23:42", "13:53", "1:53 PM", "12:24:43.220", "not a date", "54:23:21"]:
    t1 = timeit.timeit('validate_time(s)', 'from __main__ import datetime, validate_time, s', number=100)
    t2 = timeit.timeit('validate_time_new(s)', 'from __main__ import datetime, validate_time_new, s', number=100)
    print(f"{t1:.6f}\t{t2:.6f}\t\t{t1/t2:.6f}\t\t{s}")
old         new         old/new      test_string
0.001628    0.001143    1.424322     12:24
0.001567    0.001012    1.548661     12:23:42
0.000935    0.000979    0.955177     13:53
0.003004    0.000722    4.161657     1:53 PM
0.004523    0.001396    3.241204     12:24:43.220
0.002148    0.000025    84.897370    not a date
0.002262    0.000622    3.638629     54:23:21
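A speed comparison only means something if both approaches return the same results, so it may be worth checking that as well. The sketch below restates the two approaches in compact form so that it runs on its own; `try_all_formats` and `split_then_parse` are hypothetical names for the question's and the answer's approach respectively, and the sample list is the one used for the timings.

```python
import datetime

FORMATS = ["%H:%M", "%H:%M:%S", "%H:%M:%S.%f", "%I:%M %p"]


def try_all_formats(time_str):
    """Question's approach: try each format in turn."""
    for fmt in FORMATS:
        try:
            return datetime.datetime.strptime(time_str, fmt)
        except ValueError:
            continue
    return None


def split_then_parse(time_str):
    """Answer's approach: choose one format from the colon-split shape."""
    parts = time_str.split(":")
    if len(parts) == 2:
        fmt = "%I:%M %p" if parts[-1][-2:].lower() in ("am", "pm") else "%H:%M"
    elif len(parts) == 3:
        fmt = "%H:%M:%S.%f" if "." in parts[-1] else "%H:%M:%S"
    else:
        return None
    try:
        return datetime.datetime.strptime(time_str, fmt)
    except ValueError:
        return None


samples = ["12:24", "12:23:42", "13:53", "1:53 PM", "12:24:43.220",
           "not a date", "54:23:21"]
# Collect every sample on which the two approaches disagree
mismatches = [s for s in samples if try_all_formats(s) != split_then_parse(s)]
print("mismatches:", mismatches)
```

An empty list here only covers these seven strings; running the same comparison on a sample of the real column would give more confidence before switching.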