Calculate percentages in a high-dimensional crosstab
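I built a crosstab from 3 variables (position, offer, group). How do I calculate the percentage of one variable, offer, within each position (i.e. normalize across columns), instead of getting the margins?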
df = pd.crosstab(df.group, [df.position, df.offer], margins = True)
df
pid offer position group
1 accept left group1
1 accept left group1
1 accept right group2
1 reject right group2
1 reject right group1
2 reject right group1
2 reject left group2
2 accept left group3
3 accept right group3
3 reject right group1
3 reject right group2
My current crosstab:
position  left            right           All
offer     accept  reject  accept  reject
group1    2       0       0       3       5
group2    0       1       1       2       4
group3    1       0       1       0       2
All       3       1       2       5       11
Expected result:
position  left            right
offer     accept  reject  accept  reject
group1    1       0       0       1
group2    0       1       0.33    0.66
group3    1       0       1       0
Thanks!
Add one more step: groupby along level 0 of the columns, then divide c by the sum.
c = pd.crosstab(df.group, [df.position, df.offer])
df = c / c.groupby(level=0, axis=1).sum()
print(df)
position left right
offer accept reject accept reject
group
group1 1.0 0.0 0.000000 1.000000
group2 0.0 1.0 0.333333 0.666667
group3 1.0 0.0 1.000000 0.000000
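As far as I know, recent pandas releases (2.1 and later) warn that groupby(..., axis=1) is deprecated. Here is a minimal sketch of the same idea without axis=1: transpose, take the per-position sum with transform, and transpose back. The names data, c and pct are mine, and the frame is rebuilt from the sample data in the question so the snippet runs on its own.

```python
import pandas as pd

# Sample data from the question, rebuilt so the snippet is self-contained.
data = pd.DataFrame({
    "pid": [1, 1, 1, 1, 1, 2, 2, 2, 3, 3, 3],
    "offer": ["accept", "accept", "accept", "reject", "reject", "reject",
              "reject", "accept", "accept", "reject", "reject"],
    "position": ["left", "left", "right", "right", "right", "right",
                 "left", "left", "right", "right", "right"],
    "group": ["group1", "group1", "group2", "group2", "group1", "group1",
              "group2", "group3", "group3", "group1", "group2"],
})

# Counts per group (rows) and per position/offer pair (columns).
c = pd.crosstab(data["group"], [data["position"], data["offer"]])

# Total per position, broadcast back to the full (position, offer) columns.
# Grouping the transposed frame avoids the deprecated axis=1 argument.
totals = c.T.groupby(level="position").transform("sum").T

# Share of each offer within every group/position cell.
pct = c / totals
print(pct)
```

If a group has no rows at all for some position, the total there is 0 and the division gives NaN, so you may want a fillna(0) at the end.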
If you are a perfectionist like me and want the whole numbers to show up as integers, you can do this:
df = c.div(c.groupby(level=0, axis=1).sum()).astype(object)
print(df)
position left right
offer accept reject accept reject
group
group1 1 0 0 1
group2 0 1 0.333333 0.666667
group3 1 0 1 0
You can use
In [4013]: dfa = df.groupby(['group', 'position', 'offer']).size().unstack(fill_value=0)
In [4014]: dfa.div(dfa.sum(axis=1), axis=0).unstack()
Out[4014]:
offer accept reject
position left right left right
group
group1 1.0 0.000000 0.0 1.000000
group2 0.0 0.333333 1.0 0.666667
group3 1.0 1.000000 0.0 0.000000
You can also get dfa from pivot_table.
df.pivot_table(index=['group', 'position'], columns='offer', aggfunc=len)['pid']
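Another route to the same table, as a sketch: pd.crosstab accepts normalize="index", which divides every row by its row total. With (group, position) as the row index and offer as the columns, that is the same row-wise division as dfa.div(dfa.sum(axis=1), axis=0) above. The names data and out are mine, and the frame is rebuilt from the sample data in the question.

```python
import pandas as pd

# Sample data from the question, rebuilt so the snippet is self-contained.
data = pd.DataFrame({
    "pid": [1, 1, 1, 1, 1, 2, 2, 2, 3, 3, 3],
    "offer": ["accept", "accept", "accept", "reject", "reject", "reject",
              "reject", "accept", "accept", "reject", "reject"],
    "position": ["left", "left", "right", "right", "right", "right",
                 "left", "left", "right", "right", "right"],
    "group": ["group1", "group1", "group2", "group2", "group1", "group1",
              "group2", "group3", "group3", "group1", "group2"],
})

# Rows are (group, position) pairs, columns are offer values;
# normalize="index" makes each row sum to 1.
shares = pd.crosstab([data["group"], data["position"]], data["offer"],
                     normalize="index")

# Move position into the columns to get one row per group.
out = shares.unstack("position")
print(out)
```

The columns come out as (offer, position), the same layout as the groupby/unstack result above. Swap the levels with out.swaplevel(axis=1).sort_index(axis=1) if you prefer position on top.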