How to return to top of for-loop when condition is False?
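I've been struggling with this for days without success. I'm trying to write a function that:
- loops through a directory
- opens an Excel file whose name matches a string pattern
- opens the file and searches for a specific worksheet ('importer')
- copies the data to a csv and keeps appending to that csv until all files have been processed.
- I'd like the function to ignore files that don't contain an 'importer' tab, or simply move on to the next file in the FOR loop without executing the rest ('CSV FILE CREATION').
- File creation should only happen when the file name matches the pattern and the 'importer' worksheet exists. I feel like I'm close and just need a little direction.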
def append_all(input_directory):
    for file in os.listdir(input_directory):
        # Qualify if file exist
        if bool(re.search(pattern, file)) == True:
            # Join directory path name to file name
            in_fpath = os.path.join(input_directory, file)
            out_fpath = os.path.join(input_directory, 'history.csv')
            wrkbk = xlrd.open_workbook(in_fpath)
            if wrkbk.sheet_names() == 'importer':
                wrksht = wrkbk.sheet_by_name('importer')
                # Handling excel refresh date value to be used to populate csv file
                refresh_date_float = wrksht.cell_value(1, 4)
                refresh_date_value = xlrd.xldate_as_datetime(refresh_date_float, wrkbk.datemode).strftime(
                    '%Y/%m/%d %H:%M')
            # else:
            #     continue
            # CSV FILE CREATION
            # Qualify if file exist. Default returns TRUE
            if os.path.isfile(out_fpath) == False:
                # os.mkdir(output_directory)
                # file will be created if it does not exist
                with open(out_fpath, 'w', newline='') as csvfile:
                    wr = csv.writer(csvfile)
                    # start row index 3 to skip unnecessary data
                    for rownum in range(3, wrksht.nrows):
                        # wr.writerow(wrksht.row_values(rownum) + list(refresh_date_value))
                        wr.writerow(list(wrksht.row_values(rownum)) + [refresh_date_value])
            # Start append data
            else:
                with open(out_fpath, 'a', newline='') as csvfile:
                    wr = csv.writer(csvfile)
                    # start row index 4 to skip header row
                    for rownum in range(4, wrksht.nrows):
                        # wr.writerow(wrksht.row_values(rownum) + list(refresh_date_value))
                        wr.writerow(list(wrksht.row_values(rownum)) + [refresh_date_value])
    csvfile.close()
    print('process complete')
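One thing worth checking in the loop above: in xlrd, `wrkbk.sheet_names()` returns a list of names, so comparing it to the string `'importer'` with `==` is always False and the `if` branch never runs. A membership test combined with `continue` is the usual way to jump back to the top of the `for` loop. The sketch below shows only that control flow. The function name `files_to_process`, the sample file names, and the dict of sheet names are hypothetical stand-ins for the real workbooks, so that it runs without xlrd or any Excel files.

```python
import re


def files_to_process(sheet_names_by_file, pattern, required_sheet="importer"):
    """Return the file names that match `pattern` and contain `required_sheet`.

    `sheet_names_by_file` is a hypothetical stand-in for the directory:
    it maps a file name to the list that wrkbk.sheet_names() would return.
    """
    selected = []
    for file_name, sheet_names in sheet_names_by_file.items():
        if not re.search(pattern, file_name):
            continue  # name does not match: go straight to the next file
        # sheet_names is a list, so test membership instead of == with a string
        if required_sheet not in sheet_names:
            continue  # no 'importer' sheet: skip the CSV step for this file
        selected.append(file_name)  # placeholder for the CSV-writing step
    return selected


workbooks = {
    "sales_2020.xlsx": ["summary", "importer"],
    "sales_2021.xlsx": ["summary"],
    "notes.txt": ["importer"],
}
print(files_to_process(workbooks, r"\.xlsx$"))  # ['sales_2020.xlsx']
```

Putting the two `continue` guards first keeps the CSV logic at a single indentation level and guarantees it only runs when both conditions hold.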
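- Use `.rglob` from the `pathlib` module to find files matching a specified pattern.
  - This is like calling `Path.glob()` with `'**/'` added in front of the given relative pattern.
  - The `pathlib` module offers classes representing filesystem paths with semantics appropriate for different operating systems.
- Using `pandas.read_excel` with the `sheet_name` parameter, inside a `try-except` block, will be much easier.
  - The `try-except` block attempts to load the file with the given worksheet name. If the worksheet doesn't exist, an exception is raised. In that case, the script moves on to the next file.
- Combine all the files into a single dataframe with `pandas.concat`, and then save it to a csv with `.to_csv`.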
from pathlib import Path
import pandas as pd
from xlrd import XLRDError  # raised by older pandas/xlrd setups when a sheet is missing

p = Path('c:/.../path_to_files')  # path to files
files = list(p.rglob('*.xlsx'))  # get all xlsx files that match the pattern

list_of_dataframes = list()  # list to add dataframe to
for file in files:
    try:
        list_of_dataframes.append(pd.read_excel(file, sheet_name='importer'))  # add dataframe from Excel file to list
    except (XLRDError, ValueError):  # no importer worksheet; recent pandas versions raise ValueError instead
        print(f'{file} did not have the "importer" worksheet')

df = pd.concat(list_of_dataframes)  # combine the dataframes from all the files
df.to_csv('my_combined_files.csv', index=False)  # save to a csv
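The claim that `.rglob(pattern)` behaves like `.glob('**/' + pattern)` can be checked with a small standard-library sketch. The helper name `find_xlsx` and the file names below are made up for illustration; the files are empty placeholders created in a temporary directory, not real workbooks.

```python
import tempfile
from pathlib import Path


def find_xlsx(root):
    """Return every .xlsx file under `root`, including subdirectories."""
    return sorted(Path(root).rglob("*.xlsx"))


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "sub").mkdir()
    # Empty placeholder files: two that match the pattern, one that does not
    for name in ("a.xlsx", "sub/b.xlsx", "sub/c.txt"):
        (root / name).touch()

    via_rglob = find_xlsx(root)
    via_glob = sorted(root.glob("**/*.xlsx"))
    found_names = [p.relative_to(root).as_posix() for p in via_rglob]
    same_result = via_rglob == via_glob

print(found_names)  # ['a.xlsx', 'sub/b.xlsx']
print(same_result)  # True
```

Because the search is recursive, files in subfolders are picked up as well; `Path.glob('*.xlsx')` would be the choice if only the top-level directory should be scanned.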
As a function
from pathlib import Path
import pandas as pd
from xlrd import XLRDError  # raised by older pandas/xlrd setups when a sheet is missing


def create_csv_from_multiple_xlsx_files(path_to_files: str, filename_pattern: str, save_name: str):
    p = Path(path_to_files)  # convert to pathlib object
    files = list(p.rglob(filename_pattern))  # get all xlsx files that match the pattern

    list_of_dataframes = list()  # list to add dataframe to
    for file in files:
        try:
            list_of_dataframes.append(pd.read_excel(file, sheet_name='importer'))  # add dataframe from Excel file to list
        except (XLRDError, ValueError):  # no importer worksheet; recent pandas versions raise ValueError instead
            print(f'{file} did not have the "importer" worksheet')

    df = pd.concat(list_of_dataframes)  # combine the dataframes from all the files
    df.to_csv(f'{save_name}.csv', index=False)  # save to a csv


top_level_file_dir = 'c:/.../path_to_files'  # path to files
pattern = '*.xlsx'  # filename pattern
csv_file_name = 'my_combined_files'

create_csv_from_multiple_xlsx_files(top_level_file_dir, pattern, csv_file_name)  # call function
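If the row-by-row `csv` output from the question is preferred over a dataframe, the separate "create" and "append" branches can probably be collapsed into one, since opening a file in `'a'` mode creates it when it does not exist. The following is a minimal sketch under that assumption; `append_rows`, the column names, and the sample rows are hypothetical and not taken from the original spreadsheets.

```python
import csv
import tempfile
from pathlib import Path


def append_rows(out_path, header, rows):
    """Append `rows` to a CSV file, writing `header` only when the file is new."""
    out_path = Path(out_path)
    is_new = not out_path.exists()
    # Mode 'a' creates the file if needed, so one code path covers both cases
    with open(out_path, "a", newline="") as csvfile:
        writer = csv.writer(csvfile)
        if is_new:
            writer.writerow(header)
        writer.writerows(rows)
    # The with-block closes the file, so no explicit csvfile.close() is needed


with tempfile.TemporaryDirectory() as tmp:
    history = Path(tmp) / "history.csv"
    append_rows(history, ["id", "refreshed"], [[1, "2020/01/01 00:00"]])
    append_rows(history, ["id", "refreshed"], [[2, "2020/01/02 00:00"]])
    with open(history, newline="") as csvfile:
        result = list(csv.reader(csvfile))

print(result)
# [['id', 'refreshed'], ['1', '2020/01/01 00:00'], ['2', '2020/01/02 00:00']]
```

Called once per workbook inside the loop, a helper like this keeps the header from being repeated while every later file is appended below it.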