Pandas Python - How to create new columns with MultiIndex from pivot table
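I created a pivot table with two kinds of values: i) the number of apples for 2017-2020 and ii) the number of people for 2017-2020. I want to add extra columns that compute iii) the number of apples per person for 2017-2020. How can I do this?
Current code for the pivot table: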
tdf = df.pivot_table(index="States",
columns="Year",
values=["Number of Apples","Number of People"],
aggfunc= lambda x: len(x.unique()),
margins=True)
tdf
This is my current pivot table:
               Number of Apples        Number of People
               2017  2018  2019  2020  2017  2018  2019  2020
California       10    18    20    25     2     3     4     5
West Virginia     8    35    25    12     2     5     5     4
...
I would like my pivot table to look like this, with extra columns that divide the number of apples by the number of people.
               Number of Apples        Number of People        Number of Apples per Person
               2017  2018  2019  2020  2017  2018  2019  2020  2017  2018  2019  2020
California       10    18    20    25     2     3     4     5     5     6     5     5
West Virginia     8    35    25    12     2     5     5     4     4     7     5     3
I have tried a few approaches, such as:
- Creating a new column by assigning to a new column name, but this does not work with MultiIndex columns
tdf["Number of Apples per Person"][2017] = tdf["Number of Apples"][2017] / tdf["Number of People"][2017]
- Trying another assignment method
tdf.assign(tdf["Number of Apples per Person"][2017] = tdf["Enrollment ID"][2017] / tdf["Student ID"][2017])
which gave this error: SyntaxError: expression cannot contain assignment, perhaps you meant "=="?
Any help is appreciated! Thanks
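A note on the two attempts above. The first one reads tdf["Number of Apples per Person"] before assigning into it, and that top-level label does not exist yet, so the lookup fails. The second one fails earlier, at parse time, because assign() takes keyword arguments and a keyword name must be a plain identifier, not a subscript expression. A minimal sketch of a per-column assignment that does work, using a tuple key for each MultiIndex column (the tdf built here is a hypothetical stand-in for the pivot table, not the asker's real data):

```python
import pandas as pd

# Hypothetical stand-in for the pivot table shown in the question.
columns = pd.MultiIndex.from_product(
    [["Number of Apples", "Number of People"], [2017, 2018, 2019, 2020]]
)
tdf = pd.DataFrame(
    [[10, 18, 20, 25, 2, 3, 4, 5],
     [8, 35, 25, 12, 2, 5, 5, 4]],
    index=["California", "West Virginia"],
    columns=columns,
)

# A MultiIndex column is addressed with a tuple: (top-level label, year).
for year in tdf["Number of Apples"].columns:
    tdf[("Number of Apples per Person", year)] = (
        tdf[("Number of Apples", year)] / tdf[("Number of People", year)]
    )

print(tdf["Number of Apples per Person"])
```

This loops over the years one at a time, so it is mainly useful for seeing how tuple keys work; dividing whole sub-frames at once avoids the loop.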
Given
df
               Number of Apples        Number of People
               2017  2018  2019  2020  2017  2018  2019  2020
California       10    18    20    25     2     3     4     5
West Virginia     8    35    25    12     2     5     5     4
You can index on the first column level to get sub-frames, and then divide them. The division is automatically aligned on the columns.
df['Number of Apples'] / df['Number of People']
               2017  2018  2019  2020
California      5.0   6.0   5.0   5.0
West Virginia   4.0   7.0   5.0   3.0
Append this back to your DataFrame:
pd.concat([df, pd.concat([df['Number of Apples'] / df['Number of People']], keys=['Result'], axis=1)], axis=1)
               Number of Apples        Number of People        Result
               2017  2018  2019  2020  2017  2018  2019  2020  2017  2018  2019  2020
California       10    18    20    25     2     3     4     5   5.0   6.0   5.0   5.0
West Virginia     8    35    25    12     2     5     5     4   4.0   7.0   5.0   3.0
This is fast because it is fully vectorized.
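For anyone who wants to reproduce this end to end, here is a self-contained sketch of the same approach. The df is rebuilt by hand from the sample numbers above (an assumption, since the original raw data is not shown), and the new top-level label is set to "Number of Apples per Person" instead of "Result" to match the layout asked for in the question.

```python
import pandas as pd

# Rebuild the sample frame by hand; the real one comes from pivot_table.
columns = pd.MultiIndex.from_product(
    [["Number of Apples", "Number of People"], [2017, 2018, 2019, 2020]]
)
df = pd.DataFrame(
    [[10, 18, 20, 25, 2, 3, 4, 5],
     [8, 35, 25, 12, 2, 5, 5, 4]],
    index=["California", "West Virginia"],
    columns=columns,
)

# Selecting a top-level label returns a sub-frame whose columns are the years,
# so the division lines up year by year.
ratio = df["Number of Apples"] / df["Number of People"]

# The inner concat with keys=[...] puts a new top level above the year columns;
# the outer concat glues the result onto the original frame.
labelled = pd.concat([ratio], keys=["Number of Apples per Person"], axis=1)
result = pd.concat([df, labelled], axis=1)

print(result)
```

The inner concat is only there to add the top-level label; without it the ratio would have plain year columns and would not line up with the two-level columns of df.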
What you can do here is stack(), do your thing, and then unstack():
s = df.stack()
s['Number of Apples per Person'] = s['Number of Apples'] / s['Number of People']
df = s.unstack()
Output:
>>> df
               Number of Apples        Number of People        Number of Apples per Person
               2017  2018  2019  2020  2017  2018  2019  2020  2017  2018  2019  2020
California       10    18    20    25     2     3     4     5   5.0   6.0   5.0   5.0
West Virginia     8    35    25    12     2     5     5     4   4.0   7.0   5.0   3.0
One-liner:
df = df.stack().pipe(lambda x: x.assign(**{'Number of Apples per Person': x['Number of Apples'] / x['Number of People']})).unstack()
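To make the intermediate step easier to see, here is a self-contained sketch of the stack/unstack route that also prints the stacked frame. The df is again rebuilt by hand from the sample numbers (an assumption, not the original data), and the variable names stacked and out are just illustrative. Depending on your pandas version, stack() on MultiIndex columns may emit a FutureWarning about its new implementation; the result for this data should be the same either way.

```python
import pandas as pd

# Rebuild the sample frame by hand; the real one comes from pivot_table.
columns = pd.MultiIndex.from_product(
    [["Number of Apples", "Number of People"], [2017, 2018, 2019, 2020]]
)
df = pd.DataFrame(
    [[10, 18, 20, 25, 2, 3, 4, 5],
     [8, 35, 25, 12, 2, 5, 5, 4]],
    index=["California", "West Virginia"],
    columns=columns,
)

# stack() moves the innermost column level (the year) into the row index,
# leaving ordinary single-level columns that are easy to assign to.
stacked = df.stack()
print(stacked)

stacked["Number of Apples per Person"] = (
    stacked["Number of Apples"] / stacked["Number of People"]
)

# unstack() moves the year back up into the columns.
out = stacked.unstack()
print(out)
```

The trade-off compared with dividing sub-frames directly is that the whole table gets reshaped twice, but once it is stacked you can add as many derived columns as you like with ordinary column assignment.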