How to get normalized values of counts in Pandas against each individual entry in a different column (just like a categorical bar plot)

Question

I have a DataFrame that looks like this:

import pandas as pd

df = pd.DataFrame({'Correct Prediction (Insert None if none of the predictions are correct)':[1,0,1,4,'NONE',1,0,3,2,'NONE'],
                   'Subject':['Physics','Maths','Chemistry','Biology','Physics','Physics','Maths','Biology','Chemistry','Maths']})

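I want to see, for each subject, what percentage of its questions falls into each of 0, 1, 2, 3, 4 and NONE. For example, to find the share of Physics questions that are NONE, I would count the rows that are both NONE and Physics, then divide by the total number of Physics questions. I can get those two pieces with the code below: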

df['Subject'].value_counts()
df[(df['Subject'] == 'Physics') & (df['Correct Prediction (Insert None if none of the predictions are correct)'] == position)].shape[0]

But is there a simpler, better way?

I tried

pd.crosstab(df['Correct Prediction (Insert None if none of the predictions are correct)'],df['Subject'], normalize = True)

but it gives me unexpected values, for example 0.1 instead of 0.333.

I can do this with a loop:

counts = df['Subject'].value_counts()
for index in counts.index:
    print(f"Results for: {index}\n")
    total_count = counts[index]
    for position in [0,1,2,3,4,'NONE']:
        i = df[(df['Subject'] == index) & (df['Correct Prediction (Insert None if none of the predictions are correct)'] == position)].shape[0]
        print(f"Position {position} : {round((i / total_count)*100, 2)}%")
    print("-"*50,'\n')

Answer 1

Try the following:

correct_prediction = pd.Categorical(df['Correct Prediction (Insert None if none of the predictions are correct)'].tolist(), categories=[0, 1, 2, 3, 4, 'NONE'])
subject = pd.Categorical(df['Subject'].tolist(), categories=['Physics', 'Maths', 'Chemistry', 'Biology'])

pd.crosstab(correct_prediction, subject, normalize='columns')
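For context on the 0.1 in the question: `normalize=True` divides every cell by the grand total (10 rows here), whereas `normalize='columns'` divides each cell by its own column total, which is the per-subject share being asked for. The sketch below rebuilds the question's DataFrame and compares the two. The names `col`, `pred`, `overall`, `per_subject` and `by_group` are only illustrative, and casting the prediction column to `str` is an assumption made here so the mixed int/'NONE' labels sort predictably. A `groupby(...).value_counts(normalize=True)` variant is included as a cross-check; it is an alternative, not part of the answer above.

```python
import pandas as pd

df = pd.DataFrame({
    "Correct Prediction (Insert None if none of the predictions are correct)":
        [1, 0, 1, 4, "NONE", 1, 0, 3, 2, "NONE"],
    "Subject": ["Physics", "Maths", "Chemistry", "Biology", "Physics",
                "Physics", "Maths", "Biology", "Chemistry", "Maths"],
})

col = "Correct Prediction (Insert None if none of the predictions are correct)"

# Assumption: cast to str so 0..4 and 'NONE' share one type (labels become '0'..'4', 'NONE').
pred = df[col].astype(str).rename("Prediction")

# normalize=True divides each cell by the grand total (all 10 rows).
overall = pd.crosstab(pred, df["Subject"], normalize=True)

# normalize='columns' divides each cell by its column (subject) total.
per_subject = pd.crosstab(pred, df["Subject"], normalize="columns")

# Alternative: value_counts within each subject group, as fractions.
by_group = pred.groupby(df["Subject"]).value_counts(normalize=True)

print(overall.loc["NONE", "Physics"])      # 1 of 10 rows overall
print(per_subject.loc["NONE", "Physics"])  # 1 of 3 Physics rows
print((per_subject * 100).round(2))        # same table as percentages
```

Each column of `per_subject` sums to 1, so multiplying by 100 gives the same percentages the loop in the question prints, without iterating over subjects and positions.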
