How to return to top of for-loop when condition is False?

Question

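I have been struggling with this for days now, without success. I'm trying to write a function that:

- iterates through a directory
- opens an Excel file whose name matches a string pattern
- opens the file and searches for a specific worksheet ('importer')
- copies the data into a csv and keeps appending to that csv until all files are done.

I'd like the function to ignore files that do not include the 'importer' tab, that is, simply move on to the next file in the FOR loop without running the remainder ('CSV FILE CREATION').
File creation should only happen when the filename matches the pattern and the 'importer' worksheet exists. I feel like I'm close, but need just a little direction.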

def append_all(input_directory):
    for file in os.listdir(input_directory):
        # Qualify if file exist
        if bool(re.search(pattern, file)) == True:
            # Join directory path name to file name
            in_fpath = os.path.join(input_directory, file)
            out_fpath = os.path.join(input_directory, 'history.csv')
            wrkbk = xlrd.open_workbook(in_fpath)
            if wrkbk.sheet_names() == 'importer':
                wrksht = wrkbk.sheet_by_name('importer')
                # Handling excel refresh date value to be used to populate csv file
                refresh_date_float = wrksht.cell_value(1, 4)
                refresh_date_value = xlrd.xldate_as_datetime(refresh_date_float, wrkbk.datemode).strftime(
                    '%Y/%m/%d %H:%M')
                # else:
                # continue

                # CSV FILE CREATION
                # Qualify if file exist. Default returns TRUE
                if os.path.isfile(out_fpath) == False:
                    # os.mkdir(output_directory)
                    # file will be created if it does not exist
                    with open(out_fpath, 'w', newline='') as csvfile:
                        wr = csv.writer(csvfile)
                        # start row index 3 to skip unecessary data
                        for rownum in range(3, wrksht.nrows):
                            # wr.writerow(wrksht.row_values(rownum) + list(refresh_date_value))
                            wr.writerow(list(wrksht.row_values(rownum)) + [refresh_date_value])
                            # Start append data
                else:
                    with open(out_fpath, 'a', newline='') as csvfile:
                        wr = csv.writer(csvfile)
                        # start row index 4 to skip header row
                        for rownum in range(4, wrksht.nrows):
                            # wr.writerow(wrksht.row_values(rownum)  + list(refresh_date_value))
                            wr.writerow(list(wrksht.row_values(rownum)) + [refresh_date_value])


csvfile.close()
print('process complete')

Answer 1

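Use .rglob from the pathlib module to find the files with the specified pattern.
- This is like calling Path.glob() with '**/' added in front of the given relative pattern.
- The pathlib module offers classes representing filesystem paths with semantics appropriate for different operating systems.
It will be easier to use pandas.read_excel, using the sheet_name parameter, inside a try-except block.
- The try-except block will try to load the file with the worksheet name. If the worksheet is not present, an exception is raised, and the script moves on to the next file.
Combine the dataframes from all the files with pandas.concat, and then save the result to a csv with .to_csv.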

from pathlib import Path

import pandas as pd
from xlrd import XLRDError  # raised by older pandas/xlrd when a sheet is missing

p = Path('c:/.../path_to_files')  # path to files

files = list(p.rglob('*.xlsx'))  # get all xlsx files that match the pattern

list_of_dataframes = list()  # list to add dataframe to
for file in files:
    try:
        list_of_dataframes.append(pd.read_excel(file, sheet_name='importer'))  # add dataframe from Excel file to list
    except (XLRDError, ValueError):  # no 'importer' worksheet; newer pandas raises ValueError instead of XLRDError
        print(f'{file} did not have the "importer" worksheet')
        
df = pd.concat(list_of_dataframes)  # combine the dataframes from all the files

df.to_csv('my_combined_files.csv', index=False)  # save to a csv
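
The relationship between .rglob and .glob described above can be checked without any Excel files. The following is only a small sketch: it builds a throwaway directory with made-up file names (a.xlsx, sub/b.xlsx, notes.txt) and compares the three calls. The helper name find_xlsx is hypothetical and not part of the answer's code.

```python
import tempfile
from pathlib import Path


def find_xlsx(root):
    """Return the sorted names of all .xlsx files under root, searched recursively."""
    return sorted(p.name for p in Path(root).rglob('*.xlsx'))


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    # Made-up layout: one file at the top level, one in a subdirectory
    (root / 'sub').mkdir()
    (root / 'a.xlsx').touch()
    (root / 'sub' / 'b.xlsx').touch()
    (root / 'notes.txt').touch()

    recursive = find_xlsx(root)                                     # rglob: all levels
    top_only = sorted(p.name for p in root.glob('*.xlsx'))          # glob: top level only
    same_as_rglob = sorted(p.name for p in root.glob('**/*.xlsx'))  # glob with '**/' prefix

print(recursive, top_only, same_as_rglob)
```

If only the files directly inside the input directory should be processed, as os.listdir does in the question, .glob would be the closer match than .rglob.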

As a function

from pathlib import Path

import pandas as pd
from xlrd import XLRDError  # raised by older pandas/xlrd when a sheet is missing


def create_csv_from_multiple_xlsx_files(path_to_files: str, filename_pattern: str, save_name: str):

    p = Path(path_to_files)  # convert to pathlib object

    files = list(p.rglob(filename_pattern))  # get all xlsx files that match the pattern

    list_of_dataframes = list()  # list to add dataframe to
    for file in files:
        try:
            list_of_dataframes.append(pd.read_excel(file, sheet_name='importer'))  # add dataframe from Excel file to list
        except (XLRDError, ValueError):  # no 'importer' worksheet; newer pandas raises ValueError instead of XLRDError
            print(f'{file} did not have the "importer" worksheet')

    df = pd.concat(list_of_dataframes)  # combine the dataframes from all the files

    df.to_csv(f'{save_name}.csv', index=False)  # save to a csv
    
    
top_level_file_dir = 'c:/.../path_to_files'  # path to files
pattern = '*.xlsx'  # filename pattern
csv_file_name = 'my_combined_files'
create_csv_from_multiple_xlsx_files(top_level_file_dir, pattern, csv_file_name)  # call function
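
As for the literal question in the title, the continue statement skips the rest of the loop body and moves on to the next item. Note also that in the question's code the check wrkbk.sheet_names() == 'importer' compares a list with a string, so it is never true; a membership test with in is probably what was intended. The sketch below illustrates both points with a plain dict standing in for the workbooks, so it runs without xlrd. The names fake_workbooks and files_to_process and the regular expression are assumptions made for illustration, not taken from the question.

```python
import re

# Hypothetical stand-in for a directory of workbooks: file name -> sheet names
fake_workbooks = {
    'report_01.xlsx': ['summary', 'importer'],
    'report_02.xlsx': ['summary'],    # no 'importer' sheet
    'notes.txt': [],                  # does not match the pattern
    'report_03.xlsx': ['importer'],
}
pattern = r'^report_\d+\.xlsx$'  # assumed pattern, not from the question


def files_to_process(workbooks, pattern):
    """Return the file names that match the pattern and contain an 'importer' sheet."""
    selected = []
    for name, sheet_names in workbooks.items():
        if not re.search(pattern, name):
            continue  # file name does not match: go to the next file
        # sheet_names is a list, so test membership rather than equality
        if 'importer' not in sheet_names:
            continue  # no 'importer' sheet: skip the CSV step for this file
        selected.append(name)  # the CSV creation/append step would go here
    return selected


result = files_to_process(fake_workbooks, pattern)
print(result)
```

With xlrd, the same guard would read if 'importer' not in wrkbk.sheet_names(): continue, placed right after the workbook is opened.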
