Allowing month formatting variation when reading files
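Every month I run a script that does a lot of calculations based on input from an Excel sheet. However, the uploaders are inconsistent in how they spell the month: July/Jul, Sep/Sept/September, etc. So my current solution is error-prone: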
excel = pd.read_excel(f'{month} {year} Monthly Statement.xlsx')
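The input the user provides in the Python application is a number, which I thought I could translate through a dictionary holding the different variants.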
##user input
while True:
    month = int(8)  # user input
    if month not in (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12):
        print("That is not a valid month. Remember to type it as a number. January = 1 for example")
    else:
        print("Thank you for giving me the month")
        break
##month translation dict
month_dict = {1: ["Jan", "January"],
2: ["Feb", "February"],
8: ["Aug", "August"],
9: ["Sep", "Sept", "September"]}
What I have tried:
##it should loop over the various way to type out the month and go with the version that does not throw an error:
excel = pd.read_excel(f'{month_dict[month]} {year} Monthly Statement.xlsx')
##but does not work since I am not sure how to loop over the different items in the dict
How can I do this?
Edit:
The file names are:
January 2021 Monthly Statement.xlsx
Aug 2021 Monthly Statement.xlsx
September 2021 Monthly Statement.xlsx
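If the month name may simply be shortened, but we can safely assume it is spelled correctly and the file names look like your examples, then this should work: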
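import glob
import pandas as pd

month_dict = {1: "Jan",
              2: "Feb",
              8: "Aug",
              9: "Sep"}
# glob returns a list of matching paths; the "*" absorbs the rest of the month name
myfile = glob.glob(f'{month_dict[month]}* {year} Monthly Statement.xlsx')
excel = pd.read_excel(myfile[0])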
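In detail: when month is 3, month_dict[month] gives us "Mar", so we look for files named "Mar* 2021 Monthly Statement.xlsx". This matches "Mar", "Marc", "March", and unfortunately even "Marchxx", but not "Ma" (which is a good thing, since we could not tell whether it means March or May) or "Mrch" (which is not so good...).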
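To make the matching behaviour described above easier to check without touching the file system, here is a minimal sketch using `fnmatch.fnmatchcase` from the standard library, which applies the same wildcard rules as `glob` to plain strings. The candidate file names and the `pick_single` helper are invented for this illustration and are not part of the original answer.

```python
from fnmatch import fnmatchcase

# Same pattern shape as the glob call above, for month 3 and year 2021.
pattern = "Mar* 2021 Monthly Statement.xlsx"

# Hypothetical file names, made up for this illustration.
candidates = [
    "Mar 2021 Monthly Statement.xlsx",
    "March 2021 Monthly Statement.xlsx",
    "Marchxx 2021 Monthly Statement.xlsx",
    "Ma 2021 Monthly Statement.xlsx",
    "Mrch 2021 Monthly Statement.xlsx",
    "May 2021 Monthly Statement.xlsx",
]

# Keep only the names that the wildcard pattern accepts.
matches = [name for name in candidates if fnmatchcase(name, pattern)]


def pick_single(found):
    """Return the only match, or raise if there are zero or several (hypothetical helper)."""
    if len(found) != 1:
        raise ValueError(f"expected exactly one file, found {len(found)}: {found}")
    return found[0]
```

Here `matches` keeps the "Mar", "March" and "Marchxx" names and drops the rest. Since `myfile[0]` raises an `IndexError` when nothing matches and silently picks one file when several do, a check along the lines of `pick_single` may be worth adding before calling `pd.read_excel`.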
This is how I ended up solving it for now, but it is not as lean as @gimix's answer:
##user input
while True:
    month = int(8)  # user input
    if month not in (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12):
        print("That is not a valid month. Remember to type it as a number. January = 1 for example")
    else:
        print("Thank you for giving me the month")
        break
##dict
month_dict = {1: ["Jan", "January"],
2: ["Feb", "February"],
3: ["Mar", "March"],
4: ["Apr", "April"],
5: ["May"],
6: ["Jun", "June"],
7: ["Jul", "July"],
8: ["Aug", "August"],
9: ["Sep", "Sept", "September"],
10: ["Oct", "October"],
11: ["Nov", "November"],
12: ["Dec", "December"]}
##loop: try each spelling of the chosen month until one file opens
for name in month_dict[month]:
    try:
        ex = pd.read_excel(f'{name} 2021 Monthly Statement.xlsx')
        break
    except IOError:
        print("error")
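As a possible variation on the loop above, the lookup can be separated from the reading step, so the script can tell "no file found" apart from other errors. The sketch below is only an assumption about how that might look: `month_variants` and `find_statement` are hypothetical names, and the spellings come from the standard `calendar` module, which returns English names only under the default locale.

```python
import calendar
from pathlib import Path


def month_variants(month):
    """Return the spellings to try for a month number (1-12)."""
    # calendar names follow the current locale; English is assumed here.
    variants = [calendar.month_abbr[month], calendar.month_name[month]]
    if month == 9:
        variants.insert(1, "Sept")  # extra spelling mentioned in the question
    # dict.fromkeys removes the duplicate for May, whose short and long names are equal.
    return list(dict.fromkeys(variants))


def find_statement(month, year, folder="."):
    """Return the first existing statement file for the month, or None."""
    for name in month_variants(month):
        candidate = Path(folder) / f"{name} {year} Monthly Statement.xlsx"
        if candidate.is_file():
            return candidate
    return None
```

With this, `path = find_statement(month, 2021)` either gives a path that can be passed to `pd.read_excel(path)` or gives `None`, in which case the script can report which spellings it tried instead of printing a bare "error".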