How to return to top of for-loop when condition is False?

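I've been struggling with this for days without success. I'm trying to write a function that:
  1. Iterates through a directory
  2. Opens an Excel file whose name matches a string pattern
  3. Opens the file and searches for a specific worksheet ('importer')
  4. Copies the data to a csv and keeps appending to that csv until all files are done.
  5. I want the function to ignore files that do not contain the 'importer' tab, or simply move on to the next file in the FOR loop without executing the rest ('CSV FILE CREATION').
  6. File creation should only happen when the filename matches the pattern and the 'importer' worksheet exists. I feel like I'm close, but I just need a little guidance.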
def append_all(input_directory):
    for file in os.listdir(input_directory):
        # Qualify if file exist
        if bool(re.search(pattern, file)) == True:
            # Join directory path name to file name
            in_fpath = os.path.join(input_directory, file)
            out_fpath = os.path.join(input_directory, 'history.csv')
            wrkbk = xlrd.open_workbook(in_fpath)
            if wrkbk.sheet_names() == 'importer':
                wrksht = wrkbk.sheet_by_name('importer')
                # Handling excel refresh date value to be used to populate csv file
                refresh_date_float = wrksht.cell_value(1, 4)
                refresh_date_value = xlrd.xldate_as_datetime(refresh_date_float, wrkbk.datemode).strftime(
                    '%Y/%m/%d %H:%M')
                # else:
                # continue

                # CSV FILE CREATION
                # Qualify if file exist. Default returns TRUE
                if os.path.isfile(out_fpath) == False:
                    # os.mkdir(output_directory)
                    # file will be created if it does not exist
                    with open(out_fpath, 'w', newline='') as csvfile:
                        wr = csv.writer(csvfile)
                        # start row index 3 to skip unnecessary data
                        for rownum in range(3, wrksht.nrows):
                            # wr.writerow(wrksht.row_values(rownum) + list(refresh_date_value))
                            wr.writerow(list(wrksht.row_values(rownum)) + [refresh_date_value])
                            # Start append data
                else:
                    with open(out_fpath, 'a', newline='') as csvfile:
                        wr = csv.writer(csvfile)
                        # start row index 4 to skip header row
                        for rownum in range(4, wrksht.nrows):
                            # wr.writerow(wrksht.row_values(rownum)  + list(refresh_date_value))
                            wr.writerow(list(wrksht.row_values(rownum)) + [refresh_date_value])


csvfile.close()
print('process complete')
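  • Use .rglob from the pathlib module to find files matching a specified pattern.
    • This is like calling Path.glob() with '**/' added in front of the given relative pattern.
    • The pathlib module offers classes representing filesystem paths with semantics appropriate for different operating systems.
  • It will be much easier to use pandas.read_excel, with the sheet_name parameter, inside a try-except block.
    • The try-except block attempts to load the file with the given worksheet name. If the worksheet does not exist, an exception is raised, and the script moves on to the next file.
  • Combine all the files into a single dataframe using pandas.concat, and then save it to a csv with .to_csv.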
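To see the relationship between .rglob and .glob described in the first bullet, here is a small sketch that builds a throwaway directory tree. The file and folder names (a.xlsx, sub/b.xlsx, notes.txt) are made up for illustration only.

```python
import tempfile
from pathlib import Path

# Build a temporary tree: one workbook at the top level, one in a subfolder.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "sub").mkdir()
    (root / "a.xlsx").touch()
    (root / "sub" / "b.xlsx").touch()
    (root / "notes.txt").touch()

    def rel(paths):
        """Return sorted paths relative to root, with forward slashes."""
        return sorted(p.relative_to(root).as_posix() for p in paths)

    recursive = rel(root.rglob("*.xlsx"))    # searches subfolders too
    explicit = rel(root.glob("**/*.xlsx"))   # same search, spelled out
    top_only = rel(root.glob("*.xlsx"))      # top level only

print(recursive)  # ['a.xlsx', 'sub/b.xlsx']
print(top_only)   # ['a.xlsx']
```

So if the workbooks all live in a single folder and subfolders should not be searched, plain .glob is enough.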
from pathlib import Path
import pandas as pd
from xlrd import XLRDError  # raised by the xlrd engine when a sheet is missing

p = Path('c:/.../path_to_files')  # path to files

files = list(p.rglob('*.xlsx'))  # get all xlsx files that match the pattern

list_of_dataframes = []  # list to add dataframes to
for file in files:
    try:
        list_of_dataframes.append(pd.read_excel(file, sheet_name='importer'))  # add dataframe from Excel file to list
    except (XLRDError, ValueError):  # no "importer" worksheet; newer pandas versions raise ValueError instead
        print(f'{file} does not have the "importer" worksheet')
        
df = pd.concat(list_of_dataframes)  # combine the dataframes from all the files

df.to_csv('my_combined_files.csv', index=False)  # save to a csv

As a function

def create_csv_from_multiple_xlsx_files(path_to_files: str, filename_pattern: str, save_name: str):
    
    p = Path(path_to_files)  # convert to pathlib object

    files = list(p.rglob(filename_pattern))  # get all xlsx files that match the pattern

    list_of_dataframes = []  # list to add dataframes to
    for file in files:
        try:
            list_of_dataframes.append(pd.read_excel(file, sheet_name='importer'))  # add dataframe from Excel file to list
        except (XLRDError, ValueError):  # no "importer" worksheet; newer pandas versions raise ValueError instead
            print(f'{file} does not have the "importer" worksheet')

    df = pd.concat(list_of_dataframes)  # combine the dataframes from all the files

    df.to_csv(f'{save_name}.csv', index=False)  # save to a csv
    
    
top_level_file_dir = 'c:/.../path_to_files'  # path to files
pattern = '*.xlsx'  # filename pattern
csv_file_name = 'my_combined_files'
create_csv_from_multiple_xlsx_files(top_level_file_dir, pattern, csv_file_name)  # call function
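
If you would rather keep the xlrd loop from the question, the literal answer to the title is the continue statement. Note also that wrkbk.sheet_names() returns a list, so comparing it to the string 'importer' with == is always False; a membership test with in is what is needed. The sketch below shows only that control flow. The function name files_to_process and the sheet_names_by_file dictionary are hypothetical stand-ins for opening each workbook and calling wrkbk.sheet_names().

```python
import re

def files_to_process(filenames, sheet_names_by_file, pattern, wanted_sheet="importer"):
    """Return the file names that match `pattern` and contain `wanted_sheet`."""
    selected = []
    for name in filenames:
        if not re.search(pattern, name):
            continue  # name does not match: go straight to the next file
        # Stand-in for wrkbk.sheet_names(), which returns a list of strings.
        if wanted_sheet not in sheet_names_by_file[name]:
            continue  # no 'importer' sheet: skip the CSV step for this file
        selected.append(name)  # the CSV creation/append code would run here
    return selected

# Made-up example data: file name -> sheet names in that workbook.
sheets = {
    "sales_2020.xlsx": ["summary", "importer"],
    "sales_2021.xlsx": ["summary"],
    "readme.txt": [],
}
result = files_to_process(list(sheets), sheets, r"\.xlsx$")
print(result)  # ['sales_2020.xlsx']
```

In the original function, the same two continue checks would sit at the top of the for-loop body, which also removes two levels of nesting from the CSV code.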