Getting Error 0 when plotting boxplot of a filtered dataset
I am working on the Kaggle Abalone dataset, and I ran into a strange problem while plotting a boxplot.
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
df = pd.read_csv('https://archive.ics.uci.edu/ml/machine-learning-databases/abalone/abalone.data', header=None)
df.columns = ['sex', 'Length', 'Diameter', 'Height', 'Whole weight', 'Shucked weight', 'Viscera weight', 'Shell weight', 'rings']
If I run:
plt.figure(figsize=(16,6))
plt.subplot(121)
sns.boxplot(data=df['rings'])
It works perfectly!
If I filter the dataset by sex like this:
df_f = df[df['sex']=='F']
df_m = df[df['sex']=='M']
df_i = df[df['sex']=='I']
The resulting shapes are df_f: (1307, 9), df_m: (1528, 9) and df_i: (1342, 9).
And if I run:
plt.figure(figsize=(16,6))
plt.subplot(121)
sns.boxplot(data=df_m['rings'])
It works perfectly!
But if I run the code above for the df_f and df_i datasets, I get an error:
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
~/anaconda3/lib/python3.9/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
3360 try:
-> 3361 return self._engine.get_loc(casted_key)
3362 except KeyError as err:
~/anaconda3/lib/python3.9/site-packages/pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()
~/anaconda3/lib/python3.9/site-packages/pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.Int64HashTable.get_item()
pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.Int64HashTable.get_item()
KeyError: 0
The above exception was the direct cause of the following exception:
KeyError Traceback (most recent call last)
/tmp/ipykernel_434828/3363262611.py in <module>
----> 1 sns.boxplot(data=df_f['Rings'])
~/anaconda3/lib/python3.9/site-packages/seaborn/_decorators.py in inner_f(*args, **kwargs)
44 )
45 kwargs.update({k: arg for k, arg in zip(sig.parameters, args)})
---> 46 return f(**kwargs)
47 return inner_f
48
~/anaconda3/lib/python3.9/site-packages/seaborn/categorical.py in boxplot(x, y, hue, data, order, hue_order, orient, color, palette, saturation, width, dodge, fliersize, linewidth, whis, ax, **kwargs)
2241 ):
2242
-> 2243 plotter = _BoxPlotter(x, y, hue, data, order, hue_order,
2244 orient, color, palette, saturation,
2245 width, dodge, fliersize, linewidth)
~/anaconda3/lib/python3.9/site-packages/seaborn/categorical.py in __init__(self, x, y, hue, data, order, hue_order, orient, color, palette, saturation, width, dodge, fliersize, linewidth)
404 width, dodge, fliersize, linewidth):
405
--> 406 self.establish_variables(x, y, hue, data, orient, order, hue_order)
407 self.establish_colors(color, palette, saturation)
408
~/anaconda3/lib/python3.9/site-packages/seaborn/categorical.py in establish_variables(self, x, y, hue, data, orient, order, hue_order, units)
96 if hasattr(data, "shape"):
97 if len(data.shape) == 1:
---> 98 if np.isscalar(data[0]):
99 plot_data = [data]
100 else:
~/anaconda3/lib/python3.9/site-packages/pandas/core/series.py in __getitem__(self, key)
940
941 elif key_is_scalar:
--> 942 return self._get_value(key)
943
944 if is_hashable(key):
~/anaconda3/lib/python3.9/site-packages/pandas/core/series.py in _get_value(self, label, takeable)
1049
1050 # Similar to Index.get_value, but we do not fall back to positional
-> 1051 loc = self.index.get_loc(label)
1052 return self.index._get_values_for_loc(self, loc, label)
1053
~/anaconda3/lib/python3.9/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
3361 return self._engine.get_loc(casted_key)
3362 except KeyError as err:
-> 3363 raise KeyError(key) from err
3364
3365 if is_scalar(key) and isna(key) and not self.hasnans:
KeyError: 0
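There are no missing values, and all values are integers.
What am I missing here?
If you want a boxplot for each value of a categorical column, I suggest: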
sns.boxplot(data=df, x='rings', y='sex')
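You seem to have run into a bug. In this case, the last piece of seaborn code in the traceback is the important one. At line 447 of categorical.py (line 98 in the traceback above; the number varies by seaborn version), there is a test if np.isscalar(data[0]) with data = df_f['rings']. Since data is now a pandas Series, data[0] looks up the index label 0 rather than the first position, and that label is not in the filtered selection. df_m works because its selection still contains label 0.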
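The label-versus-position behaviour can be checked without seaborn at all. The following is a minimal sketch using only pandas; the toy frame mirrors the minimal example in this answer, and the variable names (rings_m, rings_f, and so on) are made up for illustration.

```python
import pandas as pd

# Toy frame with made-up values: rows 0-1 are 'M', rows 2-3 are 'F'.
df = pd.DataFrame({'Sex': ['M', 'M', 'F', 'F'],
                   'Rings': [1, 2, 3, 4]})

rings_m = df[df['Sex'] == 'M']['Rings']
rings_f = df[df['Sex'] == 'F']['Rings']

# Boolean filtering keeps the original index labels.
print(list(rings_m.index))  # [0, 1]
print(list(rings_f.index))  # [2, 3]

# data[0] on an integer-indexed Series is a label lookup, so it only
# succeeds when label 0 survived the filter.
has_label_0_m = 0 in rings_m.index
has_label_0_f = 0 in rings_f.index

try:
    rings_f[0]
    lookup_failed = False
except KeyError:
    lookup_failed = True

# Positional access does not depend on the labels.
first_f = rings_f.iloc[0]
```

Here rings_m[0] succeeds while rings_f[0] raises KeyError, which matches the pattern in the question: only the subset that still contains label 0 can be plotted.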
To investigate the problem further, it helps to reproduce it with a minimal example:
import seaborn as sns
import pandas as pd
df = pd.DataFrame({'Sex': ['M', 'M', 'F', 'F'],
                   'Rings': [1, 2, 3, 4]})
df_m = df[df['Sex'] == 'M']
df_f = df[df['Sex'] == 'F']
sns.boxplot(data=df_f['Rings'])
This indeed reproduces the error.
A workaround is to pass only the values to the seaborn function:
sns.boxplot(data=df_f['Rings'].values)
Or use the dataframe as data and the column as y:
sns.boxplot(data=df_f, y='Rings')
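Another option, which is not part of the original answer and is offered only as a sketch, is to reset the index of the filtered Series so that label 0 exists again. This assumes the data[0] lookup is the only thing that fails; it has not been checked against every seaborn version.

```python
import pandas as pd

# Same toy data as the minimal example (values are made up).
df = pd.DataFrame({'Sex': ['M', 'M', 'F', 'F'],
                   'Rings': [1, 2, 3, 4]})

# drop=True discards the old labels instead of keeping them as a column.
rings_f = df[df['Sex'] == 'F']['Rings'].reset_index(drop=True)

print(list(rings_f.index))  # [0, 1]
print(rings_f[0])           # 3
```

The re-indexed Series can then be passed as sns.boxplot(data=rings_f). The trade-off is that the original row labels are lost, so the two workarounds above are preferable when those labels matter.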
Since the bug is inside seaborn/categorical.py, similar functions will run into the same problem.
See also issue 2756 on seaborn's GitHub.