How to plot certain rows of a pandas dataframe

Question

I have this example dataframe:

      animal gender     name  first  second  third
0     dog      m      Ben      5       6      3
1     dog      f    Lilly      2       3      5
2     dog      m      Bob      3       2      1
3     cat      f     Puss      1       4      4
4     cat      m  Inboots      3       6      5
5    wolf      f     Lady    NaN       0      3
6    wolf      m   Summer      2       2      1
7    wolf      m     Grey      4       2      3
8    wolf      m     Wind      2       3      5
9    lion      f     Elsa      5       1      4
10   lion      m    Simba      3       3      3
11   lion      f     Nala      4       4      2

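Now, I suspect I may need some hierarchical indexing for this, but I haven't gotten that far in pandas yet. Still, I really need to do some (apparently too advanced) things with it and haven't figured out how. Basically, what I want in the end is a plot (probably a scatter plot, although a line plot would do just as well for now).

1) I want a figure with 4 subplots, one per animal. The title of each subplot should be the animal.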

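2) In each subplot, I want to plot the numbers (for example, the number of cubs born each year), i.e. the values of 'first', 'second' and 'third' for a given row, and give it a label that shows the 'name' in the legend. For each subplot (each animal), I want to plot males and females separately (e.g. males in blue and females in red) and, in addition, the mean for that animal (i.e. the average of each column for the given animal) in black.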

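3) Note: plot against 1, 2, 3, which refer to the column numbers. So, for example, for the first subplot titled 'dog', I want to plot something like plt.plot(np.array([1,2,3]), x, 'b', np.array([1,2,3]), y, 'r', np.array([1,2,3]), np.mean([x, y], axis=0), 'k'), where x would (in the first case) be 5,6,3 and the legend for this blue line would show 'Ben', y would be 2,3,5 and the legend for the red line would show 'Lilly', and the black line would be 3.5, 4.5, 4, which I would label "mean" in the legend (for each subplot).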

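I hope I have made myself clear enough. I understand that it may be hard to imagine without seeing the resulting figure, but... well, if I knew how to make it, I wouldn't be asking...

So, in summary, I want to iterate over the dataframe at different levels, putting the animals in separate subplots and, within each subplot, comparing males and females and the mean between them.

My actual dataframe is much larger, so ideally I would like a solution that is robust but easy to understand (for a programming beginner).

To give an idea of what a subplot should look like, here is one produced in Excel: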

Answer 1

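I'm not sure I understood exactly what you mean. But I think you need to convert your dataframe to long format, or tidy format, because many of the operations you will perform on it become easier in that format, starting with plotting based on categorical variables.

With df being your dataframe, to convert it to tidy format, simply use: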

df2 = pd.melt(df, id_vars=["animal","gender","name"])
df2
  animal gender     name variable  value
0    dog      m      Ben    first    5.0
1    dog      f    Lilly    first    2.0
2    dog      m      Bob    first    3.0
3    cat      f     Puss    first    1.0
4    cat      m  Inboots    first    3.0
...
31   wolf     m     Grey    third    3.0
32   wolf     m     Wind    third    5.0
33   lion     f     Elsa    third    4.0
34   lion     m    Simba    third    3.0
35   lion     f     Nala    third    2.0
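To try the reshaping step end to end, the example dataframe can be rebuilt by hand. This is only a sketch: the values are copied from the table in the question, and the variable names df and df2 follow the answer.

```python
import numpy as np
import pandas as pd

# Example data copied from the question; Lady's "first" value is missing (NaN).
df = pd.DataFrame({
    "animal": ["dog"] * 3 + ["cat"] * 2 + ["wolf"] * 4 + ["lion"] * 3,
    "gender": ["m", "f", "m", "f", "m", "f", "m", "m", "m", "f", "m", "f"],
    "name": ["Ben", "Lilly", "Bob", "Puss", "Inboots", "Lady",
             "Summer", "Grey", "Wind", "Elsa", "Simba", "Nala"],
    "first": [5, 2, 3, 1, 3, np.nan, 2, 4, 2, 5, 3, 4],
    "second": [6, 3, 2, 4, 6, 0, 2, 2, 3, 1, 3, 4],
    "third": [3, 5, 1, 4, 5, 3, 1, 3, 5, 4, 3, 2],
})

# Wide -> long: the three value columns are stacked into "variable"/"value",
# while the id columns are repeated for each stacked row.
df2 = pd.melt(df, id_vars=["animal", "gender", "name"])
print(df2.head())
```

Each of the 12 original rows contributes one row per value column, so df2 has 36 rows.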

Then (almost) everything becomes easy; just use seaborn as follows:

import seaborn as sns

# factorplot was renamed to catplot in seaborn 0.9 and has since been removed
g = sns.catplot(data=df2, # from your DataFrame
                col="animal", # make one subplot (column) per value of "animal"
                col_wrap=2, # maximum number of columns per row
                x="variable", # categories on the x-axis come from "variable" (created by the melt operation)
                y="value", # the corresponding y values
                hue="gender", # color according to the column gender
                kind="strip", # the kind of plot; the closest to what you want is a stripplot
                legend_out=False, # keep the legend inside the first subplot
                )

Then you can improve the overall aesthetics:

g.set_xlabels("year")
g.set_titles(template="{col_name}") # otherwise it's "animal = dog", now it's just "dog"
sns.despine(trim=True) # trim the axis.

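To add the mean values, I'm afraid you have to do it manually. However, if you have more data, you might also consider a box plot or a violin plot, which you can draw on top of the strip plot, by the way.

I invite you to check Seaborn's documentation to improve your plot further.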
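One possible way to do the manual step is sketched below with plain matplotlib instead of seaborn, following the layout described in the question: one subplot per animal, one line per row colored by gender, and a black line for the per-animal mean. The two-column grid, the color mapping, and the choice to average over every row of an animal (skipping NaN) are assumptions made for illustration.

```python
import math

import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove to display windows
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Example data copied from the question.
df = pd.DataFrame({
    "animal": ["dog"] * 3 + ["cat"] * 2 + ["wolf"] * 4 + ["lion"] * 3,
    "gender": ["m", "f", "m", "f", "m", "f", "m", "m", "m", "f", "m", "f"],
    "name": ["Ben", "Lilly", "Bob", "Puss", "Inboots", "Lady",
             "Summer", "Grey", "Wind", "Elsa", "Simba", "Nala"],
    "first": [5, 2, 3, 1, 3, np.nan, 2, 4, 2, 5, 3, 4],
    "second": [6, 3, 2, 4, 6, 0, 2, 2, 3, 1, 3, 4],
    "third": [3, 5, 1, 4, 5, 3, 1, 3, 5, 4, 3, 2],
})

years = ["first", "second", "third"]
x = np.arange(1, len(years) + 1)          # plot against 1, 2, 3
colors = {"m": "blue", "f": "red"}        # assumed color mapping

animals = df["animal"].unique()           # order of first appearance
ncols = 2
nrows = math.ceil(len(animals) / ncols)
fig, axes = plt.subplots(nrows, ncols, figsize=(10, 4 * nrows),
                         sharey=True, squeeze=False)

means = {}
for ax, animal in zip(axes.flat, animals):
    group = df[df["animal"] == animal]
    # One line per individual, labelled with its name in the legend.
    for _, row in group.iterrows():
        ax.plot(x, row[years].astype(float), color=colors[row["gender"]],
                marker="o", label=row["name"])
    # Column-wise mean over all rows of this animal; NaN values are skipped.
    means[animal] = group[years].mean()
    ax.plot(x, means[animal], color="black", marker="o", label="mean")
    ax.set_title(animal)
    ax.set_xticks(x)
    ax.set_xticklabels(years)
    ax.legend()

fig.tight_layout()
```

Call fig.savefig("animals.png") or, with an interactive backend, plt.show() to see the result. Because the loop is driven by df["animal"].unique(), the same code should adapt to a larger dataframe with more animals, although the legends may become crowded.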

HTH

