Difficulties using seaborn barplot with categorical data
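I keep running into the same problem when I use seaborn's "categorical" plotting functions to plot rates of data that is actually categorical.
I've put together a simple example of something I could swear I've done with seaborn before. I found a workaround using dummy variables, but that isn't always convenient. Does anyone know why my "Version 2" barplot use case doesn't work?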
import pandas as pd
from pandas import DataFrame
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
# Generate some example data of labels and associated values
outcomes = ['A' for _ in range(50)] + \
['B' for _ in range(20)] + \
['C' for _ in range(5)]
trial = range(len(outcomes))
df = DataFrame({'Trial': trial, 'Outcome': outcomes})
plt.close('all')
# Version 1: This works but is a non-ideal workaround
# Generate separate boolean columns for each outcome
df2 = pd.get_dummies(df.Outcome).astype(bool)
plt.figure()
sns.barplot(data=df2, estimator=lambda x: 100 * np.mean(x))
plt.title('Outcomes V1')
plt.ylabel('Percent Trials')
plt.ylim([0,100])
plt.show()
# Version 2: This doesn't work and results in the following error
# unsupported operand type(s) for /: 'str' and 'int'
plt.figure()
sns.barplot(x='Outcome', data=df, estimator=lambda x: 100 * np.mean(x))
plt.title('Outcomes V2')
plt.ylabel('Percent Trials')
plt.ylim([0,100])
plt.show()
Adding a y argument makes the call work:
sns.barplot(x='Outcome', y='Trial', data=df, estimator=lambda x: 100 * np.mean(x))
However, in your case it makes more sense to plot with sns.countplot, because you want trial 10 to count as one occurrence rather than as the number 10:
sns.countplot(x='Outcome', data=df)
Of course, if you want percentages, you can do this:
sns.barplot(x='Outcome', y='Trial', data=df, estimator=lambda x: len(x) / len(df) * 100)
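As an alternative sketch of the same idea, the percentages can be computed with pandas first and then passed to barplot as ordinary numbers, which avoids a custom estimator. The names percent, Percent, and plot_percent are illustrative and do not come from the original post; the plotting imports sit inside the function only so that the table can be built without a plotting backend.

```python
import pandas as pd

# Same example data as in the question: 50 'A', 20 'B', 5 'C'.
outcomes = ['A'] * 50 + ['B'] * 20 + ['C'] * 5
df = pd.DataFrame({'Trial': range(len(outcomes)), 'Outcome': outcomes})

# Share of trials per outcome, expressed as a percentage of all rows.
percent = (
    df['Outcome']
    .value_counts(normalize=True)    # fraction of rows per label
    .mul(100)                        # fraction -> percent
    .rename_axis('Outcome')          # name the index so it becomes a column
    .reset_index(name='Percent')
)


def plot_percent(table):
    """Draw one bar per outcome (hypothetical helper, not from the post)."""
    import matplotlib.pyplot as plt
    import seaborn as sns

    ax = sns.barplot(x='Outcome', y='Percent', data=table)
    ax.set_ylabel('Percent Trials')
    ax.set_ylim(0, 100)
    plt.show()
```

Calling plot_percent(percent) should give the same bar heights as the estimator-based call above, since each outcome has exactly one precomputed value.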
Explanation
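For a wide-form DataFrame (such as df2), you can pass just the DataFrame to the data parameter, and seaborn will automatically plot each numeric column along the x axis.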
For a long-form DataFrame (such as df), you need to pass column names to both the x and y parameters.
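To make the wide-form versus long-form distinction concrete, here is a small sketch that builds both layouts from the question's data and converts the wide table back into a long one. The names long_df, wide_df, indicator_df, and the Hit column are assumptions made for illustration and are not part of the original post.

```python
import pandas as pd

# Same example data as in the question: 50 'A', 20 'B', 5 'C'.
outcomes = ['A'] * 50 + ['B'] * 20 + ['C'] * 5

# Long form: one row per trial, with the label stored in a single column.
long_df = pd.DataFrame({'Trial': range(len(outcomes)), 'Outcome': outcomes})

# Wide form: one boolean column per outcome, still one row per trial.
wide_df = pd.get_dummies(long_df['Outcome']).astype(bool)

# Wide back to long: one row per (trial, outcome) pair with a 0/1 indicator.
indicator_df = (
    wide_df.astype(int)
    .rename_axis('Trial')            # name the index so it becomes a column
    .reset_index()
    .melt(id_vars='Trial', var_name='Outcome', value_name='Hit')
)

# The mean of Hit within each outcome is the fraction of trials with that outcome.
percent_by_outcome = indicator_df.groupby('Outcome')['Hit'].mean() * 100
```

With this layout, x='Outcome' and y='Hit' give barplot a numeric column to aggregate, so passing data=indicator_df with the question's estimator should reproduce the "Version 1" bars from a long-form frame.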
From the sns.barplot docstring (emphasis added):
Input data can be passed in a variety of formats, including:
- Vectors of data represented as lists, numpy arrays, or pandas Series objects passed directly to the x, y, and/or hue parameters.
- A "long-form" DataFrame, in which case the x, y, and hue variables will determine how the data are plotted.
- A "wide-form" DataFrame, such that each numeric column will be plotted.
- Anything accepted by plt.boxplot (e.g. a 2d array or list of vectors)