Import multiple Excel files into pandas and export them to multiple Stata files

  1. My original Excel files are:

[excel_1.xlsx,excel_2.xlsx,...,excel_12.xlsx].

At first I wanted to import them into DataFrames, append them into one big DataFrame, and then call df.to_stata, but Python raised this error:

MemoryError

I suppose the problem is that the combined DataFrame is too large.
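The question does not confirm the cause, but one likely contributor is the `dtype` mapping used in the code: it reads price columns such as `Clsprc` as text, and pandas typically stores each text cell as a separate Python string object, which costs several times more memory than an 8-byte float. `DataFrame.memory_usage(deep=True)` is a quick way to check. The helper `frame_mb` and the toy data below are illustrative only, not taken from the original files.

```python
import pandas as pd


def frame_mb(df: pd.DataFrame) -> float:
    """In-memory size of df in megabytes; deep=True counts string contents."""
    return df.memory_usage(deep=True).sum() / 1024 ** 2


# Made-up toy data: the same 100,000 prices stored as text and as floats.
prices = [f"{i / 100:.2f}" for i in range(100_000)]
as_text = pd.DataFrame({"Clsprc": prices}, dtype="str")
as_float = as_text.astype({"Clsprc": "float64"})

print(f"as text:  {frame_mb(as_text):.2f} MB")
print(f"as float: {frame_mb(as_float):.2f} MB")
```

Calling `frame_mb` on one of the real DataFrames and multiplying by 12 gives a rough idea of whether the combined frame can fit in RAM.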

  1. So I thought I could convert each Excel file into its own Stata file, i.e.:

[excel_1.xlsx,excel_2.xlsx,...,excel_12.xlsx]

[excel_1.dta,excel_2.dta,...,excel_12.dta]

and then append them in Stata, but I don't know how to do that.
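On the Stata side, the `use` and `append using` commands can stack the converted files. Below is a minimal sketch, assuming the files are named excel_1.dta through excel_12.dta and sit in Stata's working directory, that writes such a do-file from Python. The names `build_append_do`, `append_all.do` and `combined.dta` are hypothetical.

```python
from pathlib import Path


def build_append_do(dta_files, combined="combined.dta"):
    """Return the text of a Stata do-file that stacks dta_files into one file."""
    first, *rest = dta_files
    lines = [f'use "{first}", clear']                     # load the first file
    lines += [f'append using "{name}"' for name in rest]  # add the others below it
    lines.append(f'save "{combined}", replace')           # write the combined data
    return "\n".join(lines) + "\n"


do_text = build_append_do([f"excel_{i}.dta" for i in range(1, 13)])
Path("append_all.do").write_text(do_text)
print(do_text)
```

Running `do append_all.do` in Stata would then build `combined.dta`. Stata also keeps the whole dataset in memory, so this only helps if Stata's more compact fixed-width storage fits where the pandas version did not.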

  1. My original code is:
import pandas as pd
IO = 'excel_1.xlsx'

df = pd.read_excel(io=IO, skiprows = [1,2] ,
                           dtype={"Opnprc": "str","Hiprc": "str","Loprc": "str","Clsprc": "str","Dnshrtrd": "str","Dnvaltrd": "str","Dsmvosd": "str",
                                  "Dsmvtll": "str","Dretwd": "str","Dretnd": "str","Adjprcwd": "str","Adjprcnd": "str","Markettype": "str",
                                  "Trdsta": "str"})

df.to_stata('excel1.dta')

I think a for loop should work, but I don't know how to write it.

Update: here is the additional code I tried:

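import os
import pandas as pd

# Raw string: in a normal string, the "\t" in "\test2" would become a tab character.
folder = os.path.abspath(r'D:\onedrive\test2')
files = os.listdir(folder)
print(files)

dtypes = {"Opnprc": "str","Hiprc": "str","Loprc": "str","Clsprc": "str","Dnshrtrd": "str","Dnvaltrd": "str","Dsmvosd": "str",
          "Dsmvtll": "str","Dretwd": "str","Dretnd": "str","Adjprcwd": "str","Adjprcnd": "str","Markettype": "str",
          "Trdsta": "str"}

frames = []
for file in files:
    if file.endswith('.xlsx'):
        # os.listdir returns bare file names, so join each one with the folder path.
        frames.append(pd.read_excel(os.path.join(folder, file), skiprows=[1, 2], dtype=dtypes))

# DataFrame.append was removed in pandas 2.0; collect the frames and concatenate once.
df = pd.concat(frames, ignore_index=True)
print(df.head())

df.to_stata('test.dta')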
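If a single combined file is still the goal, one option is to convert the price, volume and return columns to floats as each file is read, keeping only `Markettype` and `Trdsta` as text. Which columns are numeric is an assumption about the data, not something the question states. Below is a sketch; `NUMERIC_COLS`, `TEXT_DTYPES`, `shrink` and `load_folder` are hypothetical names, and the column split should be checked against the actual files.

```python
from pathlib import Path

import pandas as pd

# Assumption: these columns hold numbers (prices, volumes, values, returns);
# Markettype and Trdsta look like codes, so they stay as text.
NUMERIC_COLS = ["Opnprc", "Hiprc", "Loprc", "Clsprc", "Dnshrtrd", "Dnvaltrd",
                "Dsmvosd", "Dsmvtll", "Dretwd", "Dretnd", "Adjprcwd", "Adjprcnd"]
TEXT_DTYPES = {"Markettype": "str", "Trdsta": "str"}


def shrink(df: pd.DataFrame) -> pd.DataFrame:
    """Convert the NUMERIC_COLS present in df to float64 (bad cells -> NaN)."""
    for col in NUMERIC_COLS:
        if col in df.columns:
            df[col] = pd.to_numeric(df[col], errors="coerce").astype("float64")
    return df


def load_folder(folder) -> pd.DataFrame:
    """Read each .xlsx in folder, shrink it right away, then concatenate once."""
    frames = [
        shrink(pd.read_excel(path, skiprows=[1, 2], dtype=TEXT_DTYPES))
        for path in sorted(Path(folder).glob("*.xlsx"))
    ]
    return pd.concat(frames, ignore_index=True)


# Usage (folder from the question): load_folder(r"D:\onedrive\test2").to_stata("test.dta")
```

`errors="coerce"` turns cells that are not valid numbers into `NaN` instead of raising, so it is worth checking how many values were coerced before relying on the result.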

Here is how to convert each Excel file into its own Stata file with a for loop in Python 3:

import pandas as pd

IO = 'excel_{}.xlsx'  # file-name template; {} is replaced by 1..12
num_files = 12

for i in range(1, num_files + 1):
    # Read one workbook at a time, so only one DataFrame is processed per iteration.
    df = pd.read_excel(
        io=IO.format(i),
        skiprows=[1, 2],
        dtype={"Opnprc": "str","Hiprc": "str","Loprc": "str","Clsprc": "str","Dnshrtrd": "str","Dnvaltrd": "str","Dsmvosd": "str",
               "Dsmvtll": "str","Dretwd": "str","Dretnd": "str","Adjprcwd": "str","Adjprcnd": "str","Markettype": "str",
               "Trdsta": "str"})
    # Write it straight out as excel_<i>.dta.
    df.to_stata('excel_{}.dta'.format(i))
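A variation on the loop above that does not hard-code the count or the file names: it picks up every `.xlsx` file in a folder with `pathlib` and writes a `.dta` next to each one. This is a sketch; `DTYPES`, `dta_path_for` and `convert_folder` are hypothetical names, and `write_index=False` (which drops the pandas index instead of saving it as an extra variable) is an added choice the original code does not make.

```python
from pathlib import Path

import pandas as pd

# Same column -> dtype mapping as the question's code, built in one place.
DTYPES = {col: "str" for col in [
    "Opnprc", "Hiprc", "Loprc", "Clsprc", "Dnshrtrd", "Dnvaltrd", "Dsmvosd",
    "Dsmvtll", "Dretwd", "Dretnd", "Adjprcwd", "Adjprcnd", "Markettype", "Trdsta"]}


def dta_path_for(xlsx_path: Path) -> Path:
    """excel_3.xlsx -> excel_3.dta in the same folder."""
    return xlsx_path.with_suffix(".dta")


def convert_folder(folder) -> list:
    """Convert every .xlsx in folder to its own .dta; return the paths written."""
    written = []
    for xlsx in sorted(Path(folder).glob("*.xlsx")):
        df = pd.read_excel(xlsx, skiprows=[1, 2], dtype=DTYPES)
        out = dta_path_for(xlsx)
        df.to_stata(out, write_index=False)
        written.append(out)
        # Release this frame before the next read so only one is alive at a time.
        del df
    return written


# Usage (folder from the question): convert_folder(r"D:\onedrive\test2")
```

`sorted` orders the names lexicographically (excel_1, excel_10, excel_11, ...), which does not matter here because each file is converted independently.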