How can I use sklearn's LabelEncoder and apply it to my DataFrame directly?
I have a DataFrame and I want to apply LabelEncoder to it directly.
The DataFrame:
df.select_dtypes('object').iloc[:,1:]
  Gender Married x_y x_z
0   Male      No   0  No
1   Male     Yes   1  No
2   Male     Yes   2 Yes
3   Male     Yes  3+  No
4   Male      No   1  No
Here is what I tried:
le = LabelEncoder()
df.select_dtypes('object').iloc[:,1:].apply(le.fit_transform, axis=1)
TypeError: ("'<' not supported between instances of 'float' and 'str'", 'occurred at index 11')
df.select_dtypes('object').iloc[:,1:].apply(LabelEncoder.fit_transform)
TypeError: ("fit_transform() missing 1 required positional argument: 'y'", 'occurred at index Gender')
Any help on how to do this would be appreciated.
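The most basic approach I can think of is to select the object columns, then loop over them and call fit_transform() with a LabelEncoder:
from sklearn.preprocessing import LabelEncoder
# Fit a fresh encoder on each object column and overwrite it with integer codes.
for col in df.select_dtypes(object).columns:
    df[col] = LabelEncoder().fit_transform(df[col])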
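The TypeError in the question ('<' not supported between instances of 'float' and 'str') usually means that strings are being compared with float values such as NaN. That is a guess about the asker's data, not something the question confirms. Note also that axis=1 encodes each row instead of each column. Below is a minimal sketch of the same loop under that assumption, on made-up sample data; the Income column, the injected NaN, and the "missing" placeholder label are illustrative choices, not part of the original question.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical sample data modeled on the question, with one missing value
# and one numeric column added for illustration.
df = pd.DataFrame({
    "Gender": ["Male", "Male", np.nan, "Male", "Male"],
    "Married": ["No", "Yes", "Yes", "Yes", "No"],
    "x_y": ["0", "1", "2", "3+", "1"],
    "x_z": ["No", "No", "Yes", "No", "No"],
    "Income": [10, 20, 30, 40, 50],
})

encoded = df.copy()
for col in encoded.columns:
    # Leave numeric columns untouched.
    if pd.api.types.is_numeric_dtype(encoded[col]):
        continue
    # Replace NaN with a placeholder label and force every value to str,
    # so each column holds a single, sortable type.
    cleaned = encoded[col].fillna("missing").astype(str)
    encoded[col] = LabelEncoder().fit_transform(cleaned)

print(encoded)
```

Whether a missing value should become its own category or be imputed some other way depends on the data, so treat the placeholder as one option among several.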
Assuming df is the filtered DataFrame you want to transform (for example, the one shown in the question):
>>> df.apply(LabelEncoder().fit_transform)
   Gender  Married  x_y  x_z
0       0        0    0    0
1       0        1    1    0
2       0        1    2    1
3       0        1    3    0
4       0        0    1    0
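One thing worth knowing about the one-liner: the method is taken from a LabelEncoder instance, not from the class, and that single instance is re-fitted on each column in turn. A small sketch on sample data rebuilt from the question (the variable names le and last_classes are mine) shows that afterwards the encoder only remembers the last column, so it cannot be used to decode the others:

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Sample data rebuilt from the table shown in the question.
df = pd.DataFrame({
    "Gender": ["Male", "Male", "Male", "Male", "Male"],
    "Married": ["No", "Yes", "Yes", "Yes", "No"],
    "x_y": ["0", "1", "2", "3+", "1"],
    "x_z": ["No", "No", "Yes", "No", "No"],
})

le = LabelEncoder()
# apply() passes one column at a time, so le is fitted again for each column.
encoded = df.apply(le.fit_transform)

# Only the classes of the last column processed ("x_z") are still stored.
last_classes = list(le.classes_)
print(encoded)
print(last_classes)
```

That is fine when you only need the integer codes, but not when you want to map the codes back to the original labels.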
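To make decoding more general, you need to keep track of the label encoders (I used a dictionary keyed by the DataFrame's column names), and you need to fit each one: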
encoders = {col: LabelEncoder().fit(df[col]) for col in df}
encoded_df = pd.DataFrame(
    {col: encoders[col].transform(df[col]) for col in df},
    index=df.index)
>>> encoded_df
   Gender  Married  x_y  x_z
0       0        0    0    0
1       0        1    1    0
2       0        1    2    1
3       0        1    3    0
4       0        0    1    0
decoded_df = pd.DataFrame(
    {col: encoders[col].inverse_transform(encoded_df[col]) for col in encoded_df},
    index=encoded_df.index)
>>> decoded_df
  Gender Married x_y x_z
0   Male      No   0  No
1   Male     Yes   1  No
2   Male     Yes   2 Yes
3   Male     Yes  3+  No
4   Male      No   1  No
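For completeness, here is the dictionary-of-encoders idea as a self-contained round trip. It is only a sketch: the helper names fit_encoders, encode and decode are hypothetical, the sample data is rebuilt from the question, and new_rows is an invented example of applying the already fitted encoders to unseen rows.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder


def fit_encoders(frame):
    """Fit one LabelEncoder per column and return them keyed by column name."""
    return {col: LabelEncoder().fit(frame[col]) for col in frame.columns}


def encode(frame, encoders):
    """Map each column to integer codes using its own fitted encoder."""
    return pd.DataFrame(
        {col: encoders[col].transform(frame[col]) for col in frame.columns},
        index=frame.index,
    )


def decode(frame, encoders):
    """Map integer codes back to the original labels."""
    return pd.DataFrame(
        {col: encoders[col].inverse_transform(frame[col]) for col in frame.columns},
        index=frame.index,
    )


# Sample data rebuilt from the table shown in the question.
df = pd.DataFrame({
    "Gender": ["Male", "Male", "Male", "Male", "Male"],
    "Married": ["No", "Yes", "Yes", "Yes", "No"],
    "x_y": ["0", "1", "2", "3+", "1"],
    "x_z": ["No", "No", "Yes", "No", "No"],
})

encoders = fit_encoders(df)
encoded_df = encode(df, encoders)
decoded_df = decode(encoded_df, encoders)

# Invented new rows, encoded with the encoders fitted above.
new_rows = pd.DataFrame(
    {"Gender": ["Male"], "Married": ["Yes"], "x_y": ["3+"], "x_z": ["Yes"]}
)
new_encoded = encode(new_rows, encoders)
```

Keep in mind that transform() raises a ValueError for labels the encoder did not see during fit, so new data must only contain known categories.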