Import multiple Excel files into pandas and export them to multiple Stata files
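- My original Excel files are [excel_1.xlsx,excel_2.xlsx,...,excel_12.xlsx]. At first I wanted to import them into DataFrames, append them into one big DataFrame, and then export it with df.to_stata, but Python raised a MemoryError. I guess the problem is that the appended DataFrame is too large.
- So I thought I could convert each Excel file to its own Stata file, i.e. turn [excel_1.xlsx,excel_2.xlsx,...,excel_12.xlsx] into [excel_1.dta,excel_2.dta,...,excel_12.dta], and then append them in Stata, but I don't know how to do that.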
- My original code is:
import pandas as pd
IO = 'excel_1.xlsx'
df = pd.read_excel(io=IO, skiprows=[1, 2],
    dtype={"Opnprc": "str", "Hiprc": "str", "Loprc": "str", "Clsprc": "str", "Dnshrtrd": "str", "Dnvaltrd": "str", "Dsmvosd": "str",
           "Dsmvtll": "str", "Dretwd": "str", "Dretnd": "str", "Adjprcwd": "str", "Adjprcnd": "str", "Markettype": "str",
           "Trdsta": "str"})
df.to_stata('excel1.dta')
I think a for loop should work, but I don't know how to write it.
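The code I used for appending:
import os
import pandas as pd

# Raw string, so that "\t" in the path is not read as a tab character
cwd = os.path.abspath(r'D:\onedrive\test2')
files = os.listdir(cwd)
print(files)

df = pd.DataFrame()
for file in files:
    if file.endswith('.xlsx'):
        # os.listdir returns bare file names, so join each one with the folder
        new_rows = pd.read_excel(os.path.join(cwd, file), skiprows=[1, 2],
            dtype={"Opnprc": "str", "Hiprc": "str", "Loprc": "str", "Clsprc": "str", "Dnshrtrd": "str", "Dnvaltrd": "str", "Dsmvosd": "str",
                   "Dsmvtll": "str", "Dretwd": "str", "Dretnd": "str", "Adjprcwd": "str", "Adjprcnd": "str", "Markettype": "str",
                   "Trdsta": "str"})
        # pd.concat stands in for DataFrame.append, which was removed in pandas 2.0
        df = pd.concat([df, new_rows], ignore_index=True)
df.head()
df.to_stata('test.dta')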
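An aside on the folder path used above, offered as a small sketch rather than a diagnosis of the MemoryError: in an ordinary Python string literal, `\t` is read as a tab character, so a Windows path containing `\test2` is safer written as a raw string or with forward slashes. `PureWindowsPath` is used here only so the example behaves the same on any operating system; the file name `excel_1.xlsx` is taken from the question.

```python
from pathlib import PureWindowsPath

raw = r"D:\onedrive\test2"        # raw string: backslashes are kept literally
escaped = "D:\\onedrive\\test2"   # the same path with each backslash doubled
assert raw == escaped
assert "\t" not in raw            # no tab character has crept into the path

# Forward slashes are also accepted for Windows paths and need no escaping.
folder = PureWindowsPath("D:/onedrive/test2")
assert folder == PureWindowsPath(raw)

# os.listdir returns bare file names, so each one has to be joined with the
# folder before it can be opened from a different working directory.
workbook = folder / "excel_1.xlsx"
print(workbook)  # D:\onedrive\test2\excel_1.xlsx
```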
Here is how to convert each Excel file to a Stata file using a for loop in Python 3.
import pandas as pd
IO = 'excel_{}.xlsx'
num_files = 12
for i in range(1, num_files + 1):
    df = pd.read_excel(
        io=IO.format(i),
        skiprows=[1, 2],
        dtype={"Opnprc": "str", "Hiprc": "str", "Loprc": "str", "Clsprc": "str", "Dnshrtrd": "str", "Dnvaltrd": "str", "Dsmvosd": "str",
               "Dsmvtll": "str", "Dretwd": "str", "Dretnd": "str", "Adjprcwd": "str", "Adjprcnd": "str", "Markettype": "str",
               "Trdsta": "str"})
    df.to_stata('excel_{}.dta'.format(i))
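If the number of files is not known in advance, the same loop can be driven by whatever workbooks are found in a folder. The following is only a sketch under a few assumptions: the helper names `find_workbooks` and `convert_folder` are made up for illustration, the files are ordered by the last number in their name, Excel's temporary lock files (names starting with `~$`) are skipped, and `write_index=False` is passed so the pandas index is not written as an extra Stata variable.

```python
import re
from pathlib import Path

# Columns the question reads as text; all other columns keep the inferred dtype.
STR_COLUMNS = [
    "Opnprc", "Hiprc", "Loprc", "Clsprc", "Dnshrtrd", "Dnvaltrd", "Dsmvosd",
    "Dsmvtll", "Dretwd", "Dretnd", "Adjprcwd", "Adjprcnd", "Markettype", "Trdsta",
]
DTYPES = dict.fromkeys(STR_COLUMNS, "str")


def find_workbooks(folder):
    """Return the .xlsx files in `folder`, ordered by the number in their name."""
    def numeric_key(path):
        digits = re.findall(r"\d+", path.stem)
        # Files without a number sort first; the name breaks ties.
        return (int(digits[-1]) if digits else -1, path.name)

    candidates = (p for p in Path(folder).glob("*.xlsx")
                  if not p.name.startswith("~$"))  # skip Excel lock files
    return sorted(candidates, key=numeric_key)


def convert_folder(folder):
    """Write one .dta file next to each workbook (hypothetical helper)."""
    import pandas as pd  # imported here so the helpers above work without pandas

    written = []
    for xlsx in find_workbooks(folder):
        df = pd.read_excel(xlsx, skiprows=[1, 2], dtype=DTYPES)
        target = xlsx.with_suffix(".dta")  # excel_1.xlsx -> excel_1.dta
        df.to_stata(target, write_index=False)
        written.append(target)
        del df  # drop this file's data before reading the next one
    return written
```

Calling `convert_folder(r"D:\onedrive\test2")` would then process one workbook at a time, so only a single file's data is held in memory at once. The resulting .dta files can be combined on the Stata side with its `append using` command.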