How to get date format of a list of unknown format strings?
Let's say I have 3 lists:
date_list = ['2020-02-28', '2020-03-11', '2020-03-12', '2020-04-01']
date_list2 = ['2020-02-02', '2020-12-11', '2020-13-11', '2020-29-12']
date_list3 = ['10-02-2002', '04-12-2011', '09-10-1911', '20-12-1912']
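The situation is that I don't know the formats in advance.
What I can check:
- 4 digits = year
- a 2-digit number greater than 12 = day
How can I work out the rest of the format?
I mean, how do I tell "03" as a day apart from "03" as a month?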
import re

def define_dformat(dl):
    yy, mm, dd = "", "", ""
    for i, val in enumerate(dl):
        # the first non-digit character is taken as the separator
        sep = re.search(r'\D', val).group(0)
        #print(sep) # -
        print(f"element #{i+1}: {val}")
        for idx, word in enumerate(val.split(sep)):
            # a 4-digit word is the year
            if len(word) == 4:
                #print(f"the year is index #{idx}: {word}")
                yy = f"the year is word index #{idx}"
            if len(word) == 2:
                # a 2-digit word greater than 12 can only be the day
                if int(word) > 12:
                    #print(f"the date is index #{idx}: {word}")
                    dd = f"the date is word index #{idx}"
                #else: # how to check <= 12, is it month or is it date?
    return (yy, mm, dd)
print(define_dformat(date_list))
print(define_dformat(date_list2))
print(define_dformat(date_list3))
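You can create a list of all the known formats and try to parse your dates with each of them. If a format parses every date in the list without raising an error, it is the format of your input list.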
from datetime import datetime
date_list = ['2020-02-28', '2020-03-11', '2020-03-12', '2020-04-01']
date_list2 = ['2020-02-02', '2020-12-11', '2020-13-11', '2020-29-12']
date_list3 = ['10-02-2002', '04-12-2011', '09-10-1911', '20-12-1912']
def define_dformat(dl):
    dformats = ['%Y-%m-%d', '%Y-%d-%m', '%d-%m-%Y']
    for dformat in dformats:
        for date in dl:
            try:
                datetime.strptime(date, dformat)
            except ValueError:
                # this format fails for at least one date, try the next one
                break
        else:
            # the inner loop finished without break: every date parsed
            return dformat
    raise Exception("date list didn't match with any known formats")
print(define_dformat(date_list))
print(define_dformat(date_list2))
print(define_dformat(date_list3))
%Y-%m-%d
%Y-%d-%m
%d-%m-%Y
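One caveat worth noting: a list in which every day value is 12 or less can parse under more than one format, and the function above simply returns the first match. As a rough sketch (not part of the original answer), the variant below collects every format that fits, so the caller can see when the list is ambiguous. The name `candidate_formats` is made up for illustration, and the code assumes that all strings share the same single-character separator and consist of exactly a 4-digit year, a month and a day.

```python
import re
from datetime import datetime
from itertools import permutations


def candidate_formats(dates):
    """Return every year/month/day ordering that parses all strings in `dates`."""
    # Assumption: the first non-digit character is the separator for the whole list.
    sep = re.search(r"\D", dates[0]).group(0)
    matches = []
    for order in permutations(["%Y", "%m", "%d"]):
        fmt = sep.join(order)
        try:
            for value in dates:
                datetime.strptime(value, fmt)
        except ValueError:
            # At least one string does not fit this ordering.
            continue
        matches.append(fmt)
    return matches


date_list = ['2020-02-28', '2020-03-11', '2020-03-12', '2020-04-01']
ambiguous = ['2020-02-02', '2020-03-04']

print(candidate_formats(date_list))
print(candidate_formats(ambiguous))
```

If the returned list has exactly one entry, the format is determined by the data. If it has several, as for `ambiguous`, the strings alone cannot distinguish day from month and some outside knowledge about the source is needed.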