How to get a cross tabulation with pandas crosstab that would display the frequency of multiple values of a column variable?

Question

Suppose I have a DataFrame:

df = pd.DataFrame(np.random.randint(0,5, size=(5,6)), columns=list('ABCDEF'))

Cross-tabulating two variables with pd.crosstab is simple:

table = pd.crosstab(index=df['A'], columns=df['B'])

This yields:

B  1  2  3  4
A            
0  1  0  0  0
1  0  0  0  1
2  0  1  1  0
3  0  1  0  0

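What I would like instead is, for example, a table like this, with an extra column that counts the rows where B is 1, 2 or 3: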

B  (1+2+3) 1  2  3  4
A            
0     1    1  0  0  0
1     0    0  0  0  1
2     2    0  1  1  0
3     1    0  1  0  0

Can anyone put me on the right track?

Answer 1

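Use sum on the subset of columns you want to combine. Be aware that a small random DataFrame gives different values on every run, so the column labels of your crosstab may differ from mine. Call np.random.seed(100) first to get the same test output as in my answer.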

table['(1+2+3)'] = table[[1,2,3]].sum(axis=1)

Sample:

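import numpy as np
import pandas as pd
np.random.seed(100)  # fix the seed so the random data is reproducible
df = pd.DataFrame(np.random.randint(0,5, size=(5,6)), columns=list('ABCDEF'))
table = pd.crosstab(index=df['A'], columns=df['B'])
# add a column holding the row-wise sum of the counts for B = 1, 2 and 3
table['(1+2+3)'] = table[[1,2,3]].sum(axis=1)
print(table)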
B  0  1  2  3  4  (1+2+3)
A                        
0  1  0  0  0  1        0
1  0  0  0  1  0        1
2  0  0  1  0  0        1
3  0  1  0  0  0        1
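
Two details of the approach above may be worth sketching. First, selecting table[[1,2,3]] depends on all three labels being present in the crosstab; in recent pandas versions a missing label raises a KeyError, which can easily happen with small random data. Second, the question shows the combined column in the first position. The following is only a minimal sketch under stated assumptions: the data values are hypothetical, chosen by hand to reproduce the table in the question, and the names wanted and combined are introduced here for illustration.

```python
import pandas as pd

# Small fixed data set (hypothetical values) so the result is reproducible
df = pd.DataFrame({
    "A": [0, 1, 2, 2, 3],
    "B": [1, 4, 2, 3, 2],
})

table = pd.crosstab(index=df["A"], columns=df["B"])

# Values of B to combine. reindex adds any value that is absent from the
# data as an all-zero column, so the selection cannot fail on a missing label.
wanted = [1, 2, 3]
combined = table.reindex(columns=wanted, fill_value=0).sum(axis=1)

# Insert the combined count as the first column, matching the layout
# requested in the question.
table.insert(0, "(1+2+3)", combined)
print(table)
```

With this data the combined column holds 1, 0, 2 and 1 for A = 0, 1, 2 and 3, which matches the table requested in the question. If the position of the column does not matter, the plain assignment from the answer is enough and DataFrame.insert can be dropped.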


python

crosstab

pandas