Getting KeyError: 0 when plotting a boxplot of a filtered dataset

I'm working on the Kaggle Abalone dataset and ran into a strange problem when plotting boxplots.

import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

df = pd.read_csv('https://archive.ics.uci.edu/ml/machine-learning-databases/abalone/abalone.data', header=None)
df.columns = ['sex', 'Length', 'Diameter', 'Height', 'Whole weight', 'Shucked weight', 'Viscera weight', 'Shell weight', 'rings']

If I run:

plt.figure(figsize=(16,6))
plt.subplot(121)
sns.boxplot(data=df['rings'])

It works perfectly!

If I filter the dataset by sex like this:

df_f = df[df['sex']=='F']
df_m = df[df['sex']=='M']
df_i = df[df['sex']=='I']

The resulting shapes are df_f: (1307, 9), df_m: (1528, 9) and df_i: (1342, 9).

and then run:

plt.figure(figsize=(16,6))
plt.subplot(121)
sns.boxplot(data=df_m['rings'])

It works perfectly!

But if I run the code above for the df_f or df_i datasets, I get an error:

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
~/anaconda3/lib/python3.9/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
   3360             try:
-> 3361                 return self._engine.get_loc(casted_key)
   3362             except KeyError as err:

~/anaconda3/lib/python3.9/site-packages/pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

~/anaconda3/lib/python3.9/site-packages/pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.Int64HashTable.get_item()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.Int64HashTable.get_item()

KeyError: 0

The above exception was the direct cause of the following exception:

KeyError                                  Traceback (most recent call last)
/tmp/ipykernel_434828/3363262611.py in <module>
----> 1 sns.boxplot(data=df_f['Rings'])

~/anaconda3/lib/python3.9/site-packages/seaborn/_decorators.py in inner_f(*args, **kwargs)
     44             )
     45         kwargs.update({k: arg for k, arg in zip(sig.parameters, args)})
---> 46         return f(**kwargs)
     47     return inner_f
     48 

~/anaconda3/lib/python3.9/site-packages/seaborn/categorical.py in boxplot(x, y, hue, data, order, hue_order, orient, color, palette, saturation, width, dodge, fliersize, linewidth, whis, ax, **kwargs)
   2241 ):
   2242 
-> 2243     plotter = _BoxPlotter(x, y, hue, data, order, hue_order,
   2244                           orient, color, palette, saturation,
   2245                           width, dodge, fliersize, linewidth)

~/anaconda3/lib/python3.9/site-packages/seaborn/categorical.py in __init__(self, x, y, hue, data, order, hue_order, orient, color, palette, saturation, width, dodge, fliersize, linewidth)
    404                  width, dodge, fliersize, linewidth):
    405 
--> 406         self.establish_variables(x, y, hue, data, orient, order, hue_order)
    407         self.establish_colors(color, palette, saturation)
    408 

~/anaconda3/lib/python3.9/site-packages/seaborn/categorical.py in establish_variables(self, x, y, hue, data, orient, order, hue_order, units)
     96                 if hasattr(data, "shape"):
     97                     if len(data.shape) == 1:
---> 98                         if np.isscalar(data[0]):
     99                             plot_data = [data]
    100                         else:

~/anaconda3/lib/python3.9/site-packages/pandas/core/series.py in __getitem__(self, key)
    940 
    941         elif key_is_scalar:
--> 942             return self._get_value(key)
    943 
    944         if is_hashable(key):

~/anaconda3/lib/python3.9/site-packages/pandas/core/series.py in _get_value(self, label, takeable)
   1049 
   1050         # Similar to Index.get_value, but we do not fall back to positional
-> 1051         loc = self.index.get_loc(label)
   1052         return self.index._get_values_for_loc(self, loc, label)
   1053 

~/anaconda3/lib/python3.9/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
   3361                 return self._engine.get_loc(casted_key)
   3362             except KeyError as err:
-> 3363                 raise KeyError(key) from err
   3364 
   3365         if is_scalar(key) and isna(key) and not self.hasnans:

KeyError: 0

There are no missing values, and all values are integers.

What am I missing here?

If you want one boxplot for each value of a categorical column, I would suggest:

sns.boxplot(data=df, x='rings', y='sex')

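You seem to have hit a bug. The last seaborn frame in the traceback is the one that matters here: in `establish_variables` in `categorical.py` (line 98 in the traceback above), seaborn tests `if np.isscalar(data[0])`, with `data = df_f['rings']`. Because `data` is a pandas Series with an integer index, `data[0]` looks up the index label 0, not the first position. Boolean filtering keeps the original row labels, so label 0 is not in the selection. `df_m` only works because it happens to contain the row labelled 0 (the first row of the file).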
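The label-versus-position behaviour described above can be checked with plain pandas, without seaborn. Below is a minimal sketch on a small made-up frame (the values are illustrative, not taken from the Abalone data); the `np.isscalar(...[0])` call only mimics the check seen in the traceback and is not seaborn's own code.

```python
import numpy as np
import pandas as pd

# Small made-up frame; the values are illustrative, not from the Abalone data.
df = pd.DataFrame({"sex": ["M", "M", "F", "F", "I"],
                   "rings": [15, 7, 9, 10, 7]})

# Boolean filtering keeps the original row labels instead of renumbering them.
rings_f = df[df["sex"] == "F"]["rings"]
print(list(rings_f.index))  # [2, 3]

# On a Series with an integer index, [0] is a label lookup, so this
# mimic of the seaborn check fails because label 0 is not present.
try:
    np.isscalar(rings_f[0])
except KeyError as err:
    print("KeyError:", err)  # KeyError: 0

# .iloc[0] is a positional lookup and works whatever the labels are.
print(np.isscalar(rings_f.iloc[0]))  # True
```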

To investigate further, it helps to try to reproduce the problem with a minimal example:

import seaborn as sns
import pandas as pd

df = pd.DataFrame({'Sex': ['M', 'M', 'F', 'F'],
                   'Rings': [1, 2, 3, 4]})
df_m = df[df['Sex'] == 'M']
df_f = df[df['Sex'] == 'F']
sns.boxplot(data=df_f['Rings'])

This does reproduce the error.

A workaround is to pass only the values to the seaborn function:

sns.boxplot(data=df_f['Rings'].values)

Or pass the DataFrame as `data` and the column name as `y`:

sns.boxplot(data=df_f, y='Rings')
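Besides the two options above, another workaround (not part of the original answer) is to renumber the index with `reset_index(drop=True)` so that label 0 exists again. The sketch below, using the minimal example's data, only checks the pandas side: what each option would hand to seaborn. It assumes seaborn then treats the renumbered Series like an unfiltered one, as the traceback suggests.

```python
import pandas as pd

# Same data as the minimal example above.
df = pd.DataFrame({"Sex": ["M", "M", "F", "F"],
                   "Rings": [1, 2, 3, 4]})
df_f = df[df["Sex"] == "F"]

# Option from the answer: a plain NumPy array has no index labels at all.
values = df_f["Rings"].values

# Alternative (not from the original answer): renumber the labels from 0.
renumbered = df_f["Rings"].reset_index(drop=True)

print(values)                  # [3 4]
print(list(renumbered.index))  # [0, 1]
print(renumbered[0])           # 3
```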

Since the bug is inside `seaborn/categorical.py`, other plotting functions that go through the same code will likely run into the same problem.

See also issue 2756 on seaborn's GitHub.