Trying to use LabelEncoder and OneHotEncoder on a Dataset with Multiple Columns
I am trying to transform several columns that contain a lot of categorical data, but I get an error when I use OneHotEncoder.
My Dataframe
1) Separate the columns into X_census and Y_census (X_census contains the categorical values):
X_census = df[['workclass',
'education',
'marital-status',
'occupation',
'relationship',
'race',
'sex',
'native-country']]
Y_census = df['income']
2) Use LabelEncoder to process the categorical values in X_census:
from sklearn.preprocessing import LabelEncoder
le = LabelEncoder()
X_1 = X_census.apply(le.fit_transform)
X_2 = X_1.to_numpy()
3) Now use OneHotEncoder on X_2 to convert the categorical values to numeric values:
from sklearn.preprocessing import OneHotEncoder
from sklearn.compose import ColumnTransformer
oh = OneHotEncoder()
onehotencoder_census = ColumnTransformer(transformers=[('OneHot', oh, X_2[:])],remainder='passthrough')
X_census = onehotencoder_census.fit_transform(X_census) # Error appears here!
The Error
You can use pandas.get_dummies:
import pandas as pd

df = pd.DataFrame({"marital_status":['S','M','D','S','M','D','S','M','D'],
"sex":["male","female","male","female","male","female","male","female","male"],
"education":['grad','post-grad','grad','post-grad','grad','post-grad','grad','post-grad','grad'],
"income":[125,135,120,110,90,150,180,130,110]})
pd.get_dummies(df)
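If you would rather stay within scikit-learn, here is a minimal sketch of the ColumnTransformer approach. The question's error message is not shown, so this is a guess at the cause. The third element of each transformer tuple is a column selector (names, integer positions, a slice, or a boolean mask), and the question passes the data array X_2[:] there instead. The toy DataFrame and the name categorical_cols are assumptions made for illustration. In scikit-learn 0.20 and later, OneHotEncoder accepts string columns directly, so the LabelEncoder step should not be needed.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

# Toy data standing in for the census DataFrame (values are made up).
df = pd.DataFrame({
    "marital_status": ["S", "M", "D", "S", "M", "D"],
    "sex": ["male", "female", "male", "female", "male", "female"],
    "income": [125, 135, 120, 110, 90, 150],
})

# Hypothetical list of the columns to one-hot encode.
categorical_cols = ["marital_status", "sex"]

# The third tuple element selects columns by name; it is not the data itself.
encoder = ColumnTransformer(
    transformers=[("OneHot", OneHotEncoder(), categorical_cols)],
    remainder="passthrough",  # keep the remaining columns (here: income) unchanged
)

# Fit on the DataFrame so the column names can be resolved.
X_encoded = encoder.fit_transform(df)
print(X_encoded.shape)
```

Applied to the question's data, categorical_cols would be the eight column names used to build X_census, and fit_transform would be called on the DataFrame that contains them.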