Pandas swap columns with multilevel index
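This is what my data looks like when I read it from a csv file. I am reading it with a multilevel index (Body_Type and Spending).
What I want is a "Year" column, with every value in Spending shown as a separate column. Basically I want to swap/transpose "Spending" and "Years".
The final data should look like this
I found a way to do it, but it does not seem efficient. Is there a better, cleaner way to do this? I have seen a few examples using pd.swapaxes(), but could not get it to work.
Here is the code I used: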
import numpy as np
import pandas as pd

d = [
["Small Narrowbodies", "TotalExpenses", "2326550.00", "2566989.00", "2710156.00"],
["Small Narrowbodies", "Pilots (000)", "583404.00", "627762.00", "669258.00"],
[
"Small Narrowbodies",
"Salaries and Wages (000)",
"432613.00",
"469059.00",
"515538.00",
],
["Small Narrowbodies", "Pilot Training (000)", "28235.00", "22388.00", "23838.00"],
[
"Small Narrowbodies",
"Benefits and Payroll Taxes (000)",
"77752.00",
"87128.00",
"77679.00",
],
[
"Small Narrowbodies",
"Per Diem/ Personnel (000)",
"44804.00",
"49187.00",
"52203.00",
],
[
"Small Narrowbodies",
"Purchased Goods (000)",
"627471.00",
"792582.00",
"772448.00",
],
["Small Narrowbodies", "Fuel/Oil (000)", "559698.00", "684007.00", "670673.00"],
["Small Narrowbodies", "Insurance (000)", "7483.00", "5449.00", "4200.00"],
[
"Small Narrowbodies",
"Other (inc. Tax) (000)",
"60290.00",
"103126.00",
"97575.00",
],
]
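# Build the frame; the year columns hold the amounts as strings.
df = pd.DataFrame(d, columns=["Body_Type", "Spending", "1995", "1996", "1997"])
df2 = df.set_index(["Body_Type", "Spending"])
# Transpose and unstack into a long Series, then flatten it to four columns.
df3 = df2.transpose().unstack(level=-1).reset_index()
df3.columns = ["Body_Type", "Spending", "Year", "Amount"]
# Pivot so that each Spending value becomes its own column.
df4 = df3.pivot_table(
    index=["Body_Type", "Year"], columns="Spending", values="Amount", aggfunc=np.sum
)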
It would be more like this:
df=df.unstack(level=0).stack(level=0)
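Which level numbers to pass depends on how the index was built when the csv was read. As a minimal sketch, assuming the frame is indexed by ("Body_Type", "Spending") exactly as `df2` is in the question, referring to the levels by name avoids guessing positions. The trimmed three-row dataset and the `result` name are only for illustration, and the column axis is named "Year" here so that the stacked level gets that label.

```python
import pandas as pd

# Trimmed-down copy of the question's data (three spending rows only).
d = [
    ["Small Narrowbodies", "TotalExpenses", "2326550.00", "2566989.00", "2710156.00"],
    ["Small Narrowbodies", "Pilots (000)", "583404.00", "627762.00", "669258.00"],
    ["Small Narrowbodies", "Fuel/Oil (000)", "559698.00", "684007.00", "670673.00"],
]
df = pd.DataFrame(d, columns=["Body_Type", "Spending", "1995", "1996", "1997"])
df2 = df.set_index(["Body_Type", "Spending"])

# Name the column axis so the level created by stack() is called "Year".
df2.columns.name = "Year"

# Move "Spending" from the row index to the columns, then move the
# years from the columns into the row index.
result = df2.unstack("Spending").stack("Year")
print(result)
```

The result is indexed by (Body_Type, Year) with one column per Spending value. The amounts are still strings at this point, so a conversion such as `result.astype(float)` would be needed before doing arithmetic on them.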