Plotting and labeling each bin in a histogram

Question

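I have a dataset containing different emotion categories (labels). You can see which categories from the "labels" variable defined in the code below. Each category has a different amount of data available in the dataset, and I am trying to show the distribution of the dataset with histogram bins.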

import matplotlib.pyplot as plt
import numpy as np
#labels inside emo variable, however they are labeled with numbers from 0 to 6 in sequence according to labels variable
labels = ['angry', 'disgust', 'fear', 'happy', 'sad', 'surprise','neutral']
labels_np = np.array(labels)
#df_training is holding the train_set.csv, where I am selecting a single column which is 'emotion' 
emo = df_training["emotion"].hist()
plt.plot(labels_np,emo)

df_training['emotion']:

This is the error I get:

**ValueError:** x and y must have same first dimension, but have shapes (7,) and (1,)

This is the expected output:

Answer 1

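It seems you just want to draw the histogram and set the correct labels. df_training.hist already draws the histogram, but uses 0, 1, 2, ... as x labels. In the question, df_training["emotion"].hist() returns a Matplotlib Axes object rather than the bin counts, which is why passing its result to plt.plot raises the shape error. You can change the x labels by calling plt.xticks. With unit-wide bins starting at 0, the bar centers are at positions 0.5, 1.5, 2.5, ..., so placing the ticks there aligns everything.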

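Since your data only contains the values 0 to 6, it is best to have exactly 7 bins, which means 8 bin edges; hist can be called with bins=range(8). The default, bins=10, is definitely not what you want here.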
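The effect of the bin choice can be checked numerically without plotting. The following is a minimal sketch using np.histogram on made-up data (the sample array and the seed are assumptions for illustration, not the asker's dataset). It passes np.arange(8), which holds the same edges as range(8), and compares the result with the default of 10 equal-width bins spread over the data range.

```python
import numpy as np

# Made-up sample standing in for df_training['emotion'] (integer codes 0..6)
rng = np.random.default_rng(0)
emotion = rng.integers(0, 7, size=100)

# Eight edges 0, 1, ..., 7 give seven unit-wide bins, one per label
counts, edges = np.histogram(emotion, bins=np.arange(8))

# Midpoint of each bin: 0.5, 1.5, ..., 6.5 (where the tick labels should go)
centers = (edges[:-1] + edges[1:]) / 2

# Default: 10 equal-width bins between the smallest and largest value,
# so the bin edges no longer coincide with the integer codes
default_counts, default_edges = np.histogram(emotion)

print("counts per label:", counts)
print("bin centers:     ", centers)
print("default edges:   ", default_edges)
```

With one bin per integer code, the counts are the same as a plain per-value tally, and the bin centers are exactly the tick positions passed to plt.xticks.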

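In the code below, I removed the x grid lines, because they are distracting and not really needed. The edge color is set with ec='white' to separate the bars more clearly. The 'emotion' column of df_training is filled with some random data.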

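import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

labels = ['angry', 'disgust', 'fear', 'happy', 'sad', 'surprise', 'neutral']
# random stand-in data: 100 integer codes between 0 and 6
df_training = pd.DataFrame({'emotion': np.random.randint(0, 7, 100)})
# 8 edges (0..7) give 7 unit-wide bins, one per emotion code
emo = df_training.hist(column='emotion', ec='white', bins=range(8))
# keep only the horizontal grid lines
plt.grid(False, axis='x')
# put the label names at the bar centers 0.5, 1.5, ..., 6.5
plt.xticks(ticks=np.arange(0.5, 6.6, 1), labels=labels)
plt.show()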
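Because the values are discrete category codes, the same picture can also be produced as a bar chart of per-class counts instead of a histogram. This is a different technique from the one in the answer, shown only as a hedged sketch: it reuses the random stand-in data from the example above, and the variable name counts is introduced here for illustration.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

labels = ['angry', 'disgust', 'fear', 'happy', 'sad', 'surprise', 'neutral']
# Random stand-in data, as in the example above
df_training = pd.DataFrame({'emotion': np.random.randint(0, 7, 100)})

# Count each code, then reindex to 0..6 so the bars are in label order
# and a class that never occurs still appears with a count of 0
counts = (df_training['emotion']
          .value_counts()
          .reindex(range(len(labels)), fill_value=0))

# Replace the numeric codes with the label names
counts.index = labels

# One bar per label; the tick labels come from the index
ax = counts.plot.bar(edgecolor='white', rot=0)
ax.set_ylabel('count')
plt.show()
```

With this approach there are no bin edges to choose and no tick positions to compute, since each bar is drawn at its own category.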

python

matplotlib

histogram

pandas