Creating additional levels of headers (pandas) in Python
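I'm new to programming, but I'm currently working with DataFrames. I'm trying to stack my current DataFrame into a specific layout. I'm working with larger files that contain a lot of data, but I can't get stack() to arrange the data the way I want, and the resulting shape is a mess. I need help with how to define a MultiIndex and create more header levels.
I hope you can help me; here is an example.
What I get from my code (before stack()):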
Exports NaN NaN NaN Net Exports NaN NaN
0 Total Sweden Norway Germany Total Sweden Norway
1 1032.8 358 239.7 435.1 636.8 274.1 9.7
2 1198.8 556.4 211.8 430.6 846.3 522.6 -1.1
With stack():
Exports Total
NaN Sweden
NaN Norway
NaN Germany
Net Exports Total
NaN Sweden
NaN Norway
NaN Germany
NaN GWh
1 Exports 1032.8
NaN 358
NaN 239.7
NaN 435.1
Net Exports 636.8
NaN 274.1
NaN 9.7
NaN 353
Thanks in advance for your help.
I think you need:
print (r.head())
Unnamed: 18 Unnamed: 19 Unnamed: 20 Unnamed: 21 Unnamed: 22 Unnamed: 23 \
0 Exports NaN NaN NaN Net Exports NaN
2 Total Sweden Norway Germany Total Sweden
189 1032.8 358 239.7 435.1 636.8 274.1
190 1198.8 556.4 211.8 430.6 846.3 522.6
191 982.7 159.3 166.2 657.2 276.3 -156.8
Unnamed: 24 Unnamed: 25 Unit:
0 NaN NaN NaN
2 Norway Germany GWh
189 9.7 353 January
190 -1.1 324.8 February
191 -105.9 539 March
# use the 'Unit:' column as the index
r = r.set_index('Unit:')
# build a MultiIndex from the first and second rows;
# NaNs in the first row are replaced with ffill() (forward fill)
r.columns = pd.MultiIndex.from_arrays([r.iloc[0].ffill(), r.iloc[1]], names=(None, None))
# drop the first and second rows, which are now the column headers
r = r.iloc[2:]
print (r.head())
Exports Net Exports
Total Sweden Norway Germany Total Sweden Norway Germany
Unit:
January 1032.8 358 239.7 435.1 636.8 274.1 9.7 353
February 1198.8 556.4 211.8 430.6 846.3 522.6 -1.1 324.8
March 982.7 159.3 166.2 657.2 276.3 -156.8 -105.9 539
April 962.3 22.1 62 878.2 -268.6 -741.3 -352.9 825.6
May 951.2 13.5 15.9 921.8 -511.5 -885.2 -496.4 870.1
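The steps above can be tried end to end on a small stand-in frame. This is only a sketch: `raw` is a hypothetical reconstruction of the first rows printed above (it is not the real file), and the final `astype(float)` is an added assumption, since columns that once held header text are stored as `object` dtype.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the raw sheet: the two header rows arrive as data rows.
raw = pd.DataFrame(
    [
        ["Exports", np.nan, np.nan, np.nan, "Net Exports", np.nan, np.nan, np.nan, np.nan],
        ["Total", "Sweden", "Norway", "Germany", "Total", "Sweden", "Norway", "Germany", "GWh"],
        [1032.8, 358, 239.7, 435.1, 636.8, 274.1, 9.7, 353, "January"],
        [1198.8, 556.4, 211.8, 430.6, 846.3, 522.6, -1.1, 324.8, "February"],
    ],
    columns=[f"Unnamed: {i}" for i in range(18, 26)] + ["Unit:"],
)

df = raw.set_index("Unit:")

# First row: group names, forward-filled so every column gets a group.
top = df.iloc[0].ffill()
# Second row: the per-country names.
sub = df.iloc[1]

df.columns = pd.MultiIndex.from_arrays([top, sub], names=(None, None))

# Drop the two header rows and convert the remaining values to numbers.
df = df.iloc[2:].astype(float)

print(df)
```

After this, a single value can be addressed with a tuple, for example `df.loc["January", ("Exports", "Sweden")]`.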
print (r.stack().head(10))
Exports Net Exports
Unit:
January Germany 435.1 353
Norway 239.7 9.7
Sweden 358 274.1
Total 1032.8 636.8
February Germany 430.6 324.8
Norway 211.8 -1.1
Sweden 556.4 522.6
Total 1198.8 846.3
March Germany 657.2 539
Norway 166.2 -105.9
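Which header level ends up in the rows depends on the `level` argument of `stack()`. The sketch below uses a small hypothetical frame (two months, two countries) built directly with a MultiIndex, just to compare the default with `level=0`. The order of the stacked rows may differ between pandas versions, so it is not shown here.

```python
import pandas as pd

# Small hypothetical frame with two header levels, values taken from the example above.
columns = pd.MultiIndex.from_product([["Exports", "Net Exports"], ["Total", "Sweden"]])
df = pd.DataFrame(
    [[1032.8, 358.0, 636.8, 274.1], [1198.8, 556.4, 846.3, 522.6]],
    index=pd.Index(["January", "February"], name="Unit:"),
    columns=columns,
)

# Default: the innermost column level (countries) moves into the row index.
by_country = df.stack()

# level=0: the outer level (Exports / Net Exports) moves into the row index instead.
by_flow = df.stack(level=0)

print(by_country)
print(by_flow)
```

`by_country` has the same layout as the output above: one row per month and country, with `Exports` and `Net Exports` as columns.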