OneHotEncoder : ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all()

from sklearn.preprocessing import OneHotEncoder

df.LotFrontage = df.LotFrontage.fillna(value = 0)
categorical_mask = (df.dtypes == "object")
categorical_columns = df.columns[categorical_mask].tolist()
ohe = OneHotEncoder(categories = categorical_mask, sparse = False)
df_encoded = ohe.fit_transform(df)
print(df_encoded[:5, :])

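Error: ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all()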

What is wrong with my code?

Here is a snippet of the data:

[image]

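The categories parameter of OneHotEncoder does not select which features to encode, so passing the boolean mask there is most likely what raises the ValueError. To pick the columns you need a ColumnTransformer. Try this: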

from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

df.LotFrontage = df.LotFrontage.fillna(value=0)
categorical_features = df.select_dtypes("object").columns

column_trans = ColumnTransformer(
    [
        ("onehot_categorical", OneHotEncoder(), categorical_features),
    ],
    remainder="passthrough",  # or "drop" if you don't want the non-categorical columns at all
)
df_encoded = column_trans.fit_transform(df)
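
To see what this produces, here is a minimal self-contained sketch. The toy DataFrame is made up for illustration (only the LotFrontage column name comes from the question; Street and LotShape and their values are assumptions), and sparse_threshold=0 is added only so the result comes back as a dense array that is easy to print.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

# Hypothetical toy data standing in for the DataFrame in the question.
df = pd.DataFrame(
    {
        "LotFrontage": [65.0, float("nan"), 80.0, 60.0],
        "Street": pd.Series(["Pave", "Grvl", "Pave", "Pave"], dtype=object),
        "LotShape": pd.Series(["Reg", "IR1", "Reg", "IR2"], dtype=object),
    }
)

# Fill missing numeric values, as in the question.
df["LotFrontage"] = df["LotFrontage"].fillna(0)

# Select the columns to encode by dtype instead of passing a mask to OneHotEncoder.
categorical_features = df.select_dtypes("object").columns.tolist()

column_trans = ColumnTransformer(
    [("onehot_categorical", OneHotEncoder(), categorical_features)],
    remainder="passthrough",  # keep LotFrontage as-is, appended after the one-hot columns
    sparse_threshold=0,       # always return a dense array
)
df_encoded = column_trans.fit_transform(df)

print(categorical_features)
print(df_encoded.shape)
print(df_encoded[:5, :])
```

The one-hot columns come first (one per level of each categorical column), followed by the passed-through numeric columns.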

Note that, according to the docs, the categories parameter is:

categories : 'auto' or a list of array-like, default='auto'

Categories (unique values) per feature:

    'auto' : Determine categories automatically from the training data.

    list : categories[i] holds the categories expected in the ith column. The passed categories should not mix strings and numeric values within a single feature, and should be sorted in case of numeric values.

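So it should contain every possible category, or level, of each categorical feature. You might use it if you know the full set of possible levels but suspect that your training data may be missing some of them. In your case I don't think you need it, so 'auto', the default, should be fine.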
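
For completeness, here is a small sketch of the case where passing categories explicitly does make sense. The Street column and its levels "Grvl" and "Pave" are assumptions made up for this example; the training data deliberately contains only one of the two levels.

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

# Hypothetical training data in which the level "Grvl" never appears.
train = pd.DataFrame({"Street": pd.Series(["Pave", "Pave", "Pave"], dtype=object)})

# One list of levels per encoded column, in column order.
known_levels = [["Grvl", "Pave"]]

ohe = OneHotEncoder(categories=known_levels)
encoded = ohe.fit_transform(train).toarray()

# A level that was absent during fitting still gets its own column.
new_rows = pd.DataFrame({"Street": pd.Series(["Grvl"], dtype=object)})
new_encoded = ohe.transform(new_rows).toarray()

print(encoded)
print(new_encoded)
```

With the default 'auto', the encoder fitted on this training data would only know about "Pave" and would raise an error on "Grvl" at transform time unless handle_unknown is changed.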