How to get normalized counts in Pandas for each individual entry against a different column (just like a categorical bar plot)

I have a DataFrame like the following:

import pandas as pd

df = pd.DataFrame({'Correct Prediction (Insert None if none of the predictions are correct)':[1,0,1,4,'NONE',1,0,3,2,'NONE'],
                   'Subject':['Physics','Maths','Chemistry','Biology','Physics','Physics','Maths','Biology','Chemistry','Maths']})

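I want to see, for each subject, what percentage of its entries fall into 0, 1, 2, 3, 4 and NONE. For example, to find the share of Physics questions that are NONE, I count the questions that are both NONE and Physics, then divide by the total number of Physics questions. I can work this out with the code below: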

df['Subject'].value_counts()
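# Number of Physics questions whose prediction is 'NONE' (divide by the Physics total from value_counts())
df[(df['Subject'] == 'Physics') & (df['Correct Prediction (Insert None if none of the predictions are correct)'] == 'NONE')].shape[0]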

But is there a simpler, better way to do this?

I tried

pd.crosstab(df['Correct Prediction (Insert None if none of the predictions are correct)'],df['Subject'], normalize = True)

But it gives me strange values, for example 0.1 instead of 0.333.

I can do this with a loop:

counts = df['Subject'].value_counts()
for index in counts.index:
    print(f"Results for: {index}\n")
    total_count = counts[index]
    for position in [0,1,2,3,4,'NONE']:
        i = df[(df['Subject'] == index) & (df['Correct Prediction (Insert None if none of the predictions are correct)'] == position)].shape[0]
        print(f"Position {position} : {round((i / total_count)*100, 2)}%")
    print("-"*50,'\n')

Try the following:

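# List every label that occurs in the data (including 4); values missing from `categories` become NaN
correct_prediction = pd.Categorical(df['Correct Prediction (Insert None if none of the predictions are correct)'].tolist(), categories=[0, 1, 2, 3, 4, 'NONE'])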
subject = pd.Categorical(df['Subject'].tolist(), categories=['Physics', 'Maths', 'Chemistry', 'Biology'])

pd.crosstab(correct_prediction, subject, normalize='columns')
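
The 0.1 in the question comes from normalize=True, which divides every cell by the grand total (10 rows here), so the single Physics/NONE row becomes 1/10. Passing normalize='columns' divides by each subject's own total instead, giving 1/3. Below is a minimal sketch of that difference without the Categorical step. The names col, pred, overall, per_subject and shares are made up for this illustration, and casting the labels to str is an assumption made here so the mixed int/str values behave uniformly.

```python
import pandas as pd

col = 'Correct Prediction (Insert None if none of the predictions are correct)'
df = pd.DataFrame({
    col: [1, 0, 1, 4, 'NONE', 1, 0, 3, 2, 'NONE'],
    'Subject': ['Physics', 'Maths', 'Chemistry', 'Biology', 'Physics',
                'Physics', 'Maths', 'Biology', 'Chemistry', 'Maths'],
})

# Assumption for this sketch: treat every label as a string ('0', '1', ..., 'NONE')
pred = df[col].astype(str)

# normalize=True: each cell divided by the total number of rows
overall = pd.crosstab(pred, df['Subject'], normalize=True)

# normalize='columns': each cell divided by its subject's (column's) total
per_subject = pd.crosstab(pred, df['Subject'], normalize='columns')

# Same per-subject shares in long form, via groupby
shares = pred.groupby(df['Subject']).value_counts(normalize=True)

# Percentages rounded to two decimals, as in the loop from the question
print((per_subject * 100).round(2))
```

Each column of per_subject sums to 1, so multiplying by 100 gives the per-subject percentages that the loop prints.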