Extracting CSV files from multiple .xlsx files with multiple worksheets

Question

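For my mechanical engineering thesis I received a lot of sensor data spread across many Excel files (100), each with many sheets (22). I now want to visualize it in Power BI, but loading .xlsx files is an effective way to slow things down, so I would like every sheet in its own CSV file. I have no real programming experience, but I can run scripts in Jupyter or Spyder.

I tried some VBA code that converts multiple Excel files to CSV, but it only handles the first sheet of each .xlsx file.

I also used the code below in a Jupyter notebook, but it only gives me all the sheets of a single Excel file.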

import pandas as pd

data = pd.read_excel('file_name.file_format', sheet_name=None)

for sheet_name, df in data.items():
    df.to_csv(f'{sheet_name}.csv')

Does anyone have code for this purpose, or know how to adapt the code above so that it runs on all the Excel files in a folder?

Answer 1

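You can loop over every file in the directory that holds the .xlsx files. Just replace YOUR_DIR with the path to your own folder containing those files.

I added filename, which is simply the file name without its extension, so you can include it in the .csv file name.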

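import os
import pandas as pd

directory = r"\YOUR_DIR\HERE"  # raw string so the backslashes are kept as-is
files = os.listdir(directory)

for xlsx_file in files:
    if xlsx_file.endswith(".xlsx"):

        # file name without the extension, reused in the CSV names
        filename = os.path.splitext(xlsx_file)[0]
        xlsx_path = os.path.join(directory, xlsx_file)

        # sheet_name=None returns a dict of {sheet name: DataFrame}
        data = pd.read_excel(xlsx_path, sheet_name=None)
        for sheet_name, df in data.items():
            df.to_csv(f'{filename}_{sheet_name}.csv')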
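A note on getting the file name without its extension: str.strip takes a set of characters, not a suffix, so it can also remove leading and trailing letters that belong to the real name. The small sketch below compares it with os.path.splitext on a made-up file name (sensors_xl.xlsx is only an illustration, not one of the asker's files).

```python
import os

# Hypothetical file name, chosen to show the problem.
name = "sensors_xl.xlsx"

# str.strip treats ".xlsx" as the character set {".", "x", "l", "s"} and
# removes those characters from both ends of the string.
stripped = name.strip(".xlsx")

# os.path.splitext only splits off the final extension.
base, ext = os.path.splitext(name)

print(stripped)   # ensors_
print(base, ext)  # sensors_xl .xlsx
```

With file names that neither start nor end in one of those characters the two approaches agree, which is why the problem can go unnoticed for a while.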

Answer 2

As long as the sheet names are the same in every file, this should work:

import os
import pandas as pd

# target directory where the workbooks lie
tgt_dir = r'paste\directory\here\make\sure\to\keep\letter\r\before\quote'

# list of any files within the dir that have .xlsx in them
list_xl_files = [f for f in os.listdir(tgt_dir) if '.xlsx' in f.lower()]

# type a list of the sheets you want to target and extract
list_target_sheets = ['Sheet1', 'Sheet2', 'etc']

# iterate through each file and for each sheet in target sheets
for xl_file in list_xl_files:
    for sheet in list_target_sheets:
        
        # read in the file and target sheet
        df = pd.read_excel(os.path.join(tgt_dir, xl_file), sheet_name=sheet)
        
        # export to csv: drop the extension from the workbook name,
        # then add _sheetname.csv so the filename shows the sheet too
        base = os.path.splitext(xl_file)[0]
        df.to_csv(os.path.join(tgt_dir, base + '_' + sheet + '.csv'))
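If some workbooks lack one of the target sheets, reading that sheet raises an error. One possible way around this, sketched here under the assumption that skipping missing sheets is acceptable, is to ask each workbook which sheets it contains first. The names pick_sheets and export_selected are made up for this example, and index=False is my own choice to leave the pandas row index out of the CSV.

```python
import os


def pick_sheets(available, wanted):
    """Return the wanted sheet names that exist in this workbook, in wanted order."""
    present = set(available)
    return [name for name in wanted if name in present]


def export_selected(tgt_dir, wanted):
    """Write one CSV per wanted sheet for every .xlsx file in tgt_dir."""
    import pandas as pd  # imported here so pick_sheets can be used on its own

    for fname in os.listdir(tgt_dir):
        if not fname.lower().endswith(".xlsx"):
            continue
        base = os.path.splitext(fname)[0]
        # ExcelFile opens the workbook once; sheet_names lists its sheets
        with pd.ExcelFile(os.path.join(tgt_dir, fname)) as book:
            for sheet in pick_sheets(book.sheet_names, wanted):
                df = book.parse(sheet)
                out_path = os.path.join(tgt_dir, f"{base}_{sheet}.csv")
                df.to_csv(out_path, index=False)
```

It would be called as export_selected(tgt_dir, list_target_sheets), using the same directory and sheet list as above.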

Answer 3

Unfortunately, some files contain extra sensors and data, which means extra sheets. But I got it working with this code:

import os
import pandas as pd

directory = "./"
files = os.listdir(directory)
for xlxs_file in files:
    if xlxs_file.endswith(".xlsx"):
        # splitext removes only the extension; str.strip would remove characters
        filename = os.path.splitext(xlxs_file)[0]
        xlxs_file = os.path.join(directory, xlxs_file)
        data = pd.read_excel(xlxs_file, sheet_name=None)
        for sheet_name, df in data.items():
            df.to_csv("{}-{}.csv".format(filename, sheet_name))
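For what it's worth, here is a minimal pathlib-based variant of the same loop, offered as a sketch rather than the definitive way to do it. It assumes you want the CSV files in a separate folder (csv_out is a made-up name) and without the pandas row index, which would otherwise show up as an extra unnamed column in Power BI. The function names csv_path and export_folder are also just illustrative.

```python
from pathlib import Path


def csv_path(workbook, sheet_name, out_dir):
    """Build the output path: <out_dir>/<workbook name without extension>-<sheet>.csv"""
    return Path(out_dir) / f"{Path(workbook).stem}-{sheet_name}.csv"


def export_folder(src_dir=".", out_dir="csv_out"):
    """Export every sheet of every .xlsx file in src_dir to its own CSV file."""
    import pandas as pd  # imported here so csv_path can be used on its own

    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)

    for workbook in sorted(Path(src_dir).glob("*.xlsx")):
        # Excel creates "~$name.xlsx" lock files while a workbook is open
        if workbook.name.startswith("~$"):
            continue
        # sheet_name=None returns a dict of {sheet name: DataFrame}
        sheets = pd.read_excel(workbook, sheet_name=None)
        for sheet_name, df in sheets.items():
            df.to_csv(csv_path(workbook, sheet_name, out), index=False)
```

Calling export_folder() from the folder that holds the workbooks writes everything to csv_out next to them.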
