Assign data to separate dataframes, then combine and sort by year
I have 40 years of data, so I am trying to read each year into its own dataframe, store them all in one new dataframe, and then sort them. Here is what I have so far:
import pandas as pd
from pandas import DataFrame
year = 1976
count = 1
for i in range(0,40):
    df[count] = pd.read_excel('42003h'+str(year)+'.xlsx', sheet_name = 'Sheet1')
    count = count + 1
    year = 1976 + 1
I am getting this error:
Wrong number of items passed 12, placement implies 1
Any help, please?
I think you need to initialize your dictionary:
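import pandas as pd
df = {}  # a plain dict, so df[count] holds a whole DataFrame
year = 1976
for count in range(1, 41):
    df[count] = pd.read_excel('42003h'+str(year)+'.xlsx', sheet_name = 'Sheet1')
    year += 1  # move on to the next year's file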
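To illustrate why the dictionary helps: the error presumably appears because `df` was already a DataFrame, so `df[count] = ...` tried to squeeze a 12-column frame into a single column. A dict simply maps each key to a whole frame. The sketch below uses small in-memory frames as stand-ins for the Excel sheets; the names `sheet`, `frames`, and `combined` are made up for the illustration.

```python
import pandas as pd

# Stand-in for one sheet; the real data would come from pd.read_excel(...)
sheet = pd.DataFrame({'a': [1, 2], 'b': [3, 4]})

frames = {}               # plain dict: each key maps to a whole DataFrame
frames[1] = sheet
frames[2] = sheet.copy()

# pd.concat accepts a dict; its keys become the outer level of the row index
combined = pd.concat(frames)
```

Each value in `frames` stays a complete DataFrame, and `pd.concat` can still stack them into one frame afterwards.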
I think you can first create a list of DataFrames, dfs, and then concat it into one df. count is not necessary. Last, IIUC, use sort_values to sort by the column year:
import pandas as pd
year = 1976
dfs = []
for i in range(0,40):
    dfs.append(pd.read_excel('42003h'+str(year)+'.xlsx', sheet_name = 'Sheet1'))
    year += 1
# if you need to concat by columns
#df = pd.concat(dfs, axis=1)
# if you need to concat by rows
df = pd.concat(dfs)
# if you need to sort by column `year`
df.sort_values(by='year', inplace=True)
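The sort above relies on every sheet already having a `year` column. If that is not the case (an assumption; the question does not show the columns), one option is to tag each frame with its year before concatenating. This is only a sketch with in-memory stand-in frames, and `combine_by_year` is a hypothetical helper name:

```python
import pandas as pd

def combine_by_year(frames_by_year):
    """Stack per-year frames row-wise and sort the result by year."""
    # assign() returns a copy with a `year` column set to the dict key
    tagged = [frame.assign(year=year) for year, frame in frames_by_year.items()]
    combined = pd.concat(tagged, ignore_index=True)
    # mergesort is stable, so rows keep their original order within a year
    return combined.sort_values('year', kind='mergesort').reset_index(drop=True)

# Stand-in data; in practice each frame would come from pd.read_excel(...)
frames_by_year = {
    1977: pd.DataFrame({'value': [3, 4]}),
    1976: pd.DataFrame({'value': [1, 2]}),
}
combined = combine_by_year(frames_by_year)
```

Note that `assign` overwrites an existing `year` column, so this only makes sense when the file name is the authoritative source for the year.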
I would do it this way:
import glob
import pandas as pd
files = glob.glob('42003h*.xlsx')
# if you want to merge your DFs horizontally then add: `axis=1` parameter
df = pd.concat([pd.read_excel(f) for f in files], ignore_index=True).sort_values('year')
count = len(files)
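One caveat with `glob.glob` is that it does not promise any particular order for the returned paths. If the read order matters, the files can be sorted by the year embedded in the name. A minimal sketch, assuming every file follows the `42003h<year>.xlsx` pattern from the question; `year_from_name` is a hypothetical helper:

```python
import re

def year_from_name(path):
    """Extract the four-digit year from a name like '42003h1976.xlsx'."""
    match = re.search(r'42003h(\d{4})\.xlsx$', path)
    if match is None:
        raise ValueError('unexpected file name: ' + path)
    return int(match.group(1))

# Example paths standing in for the result of glob.glob('42003h*.xlsx')
files = ['42003h1978.xlsx', '42003h1976.xlsx', '42003h1977.xlsx']
ordered = sorted(files, key=year_from_name)
```

The sorted list can then be fed to the same list comprehension that calls `pd.read_excel`.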