One-hot encoding with pandas and scikit-learn
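I am building a one-hot encoding function for a pandas DataFrame, but I can't figure out how to get the encoded data back into the DataFrame. I get:
"IndexError: only integers, slices (`:`), ellipsis (`...`), numpy.newaxis (`None`) and integer or boolean arrays are valid indices"
How can I merge the result back into the pandas DataFrame?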
def one_hot_encoder(features, df_to_encode):
    """encoder to encoder

    Parameters:
        features (list): features to normalise
        df_to_encode (pandas dataframe): dataframe to encode

    Returns:
        dataframe: dataframe to encode
    """
    from sklearn.preprocessing import OneHotEncoder

    for column in features:
        # one hot encoder
        enc = OneHotEncoder(sparse=False)
        column_norm = column + "_encoded"
        df = enc.fit_transform(df_to_encode[[column]])
    return df

columns_to_one_hot_encode = ["type"]
df = one_hot_encoder(columns_to_one_hot_encode, df)
You don't need sklearn; you can simply use pandas.get_dummies:
import pandas as pd

def one_hot_encoder(features, df_to_encode):
    """One-hot encode the given columns with pandas.get_dummies.

    Parameters:
        features (list): names of the columns to encode
        df_to_encode (pandas dataframe): dataframe to encode

    Returns:
        dataframe: copy of the dataframe with the listed columns encoded
    """
    return pd.get_dummies(df_to_encode, columns=features)

columns_to_one_hot_encode = ["type"]
df = one_hot_encoder(columns_to_one_hot_encode, df)
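To make the result of get_dummies concrete, here is a minimal sketch on a made-up frame. The sample values and the "value" column are assumptions for illustration; only the "type" column name comes from the question.

```python
import pandas as pd

# Hypothetical sample data; only the "type" column name is from the question.
df = pd.DataFrame({"type": ["a", "b", "a"], "value": [1, 2, 3]})

# Columns listed in `columns` are replaced by one indicator column per category,
# named "<column>_<category>"; all other columns are kept as they are.
encoded = pd.get_dummies(df, columns=["type"])

print(list(encoded.columns))
```

The untouched columns stay in front and the indicator columns are appended after them. Depending on the pandas version, the indicator columns may be boolean or small integers, so cast them with astype(int) if 0/1 values are needed.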
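You can use get_feature_names, which is built into scikit-learn's OneHotEncoder, to name the encoded columns, and then drop the old column. That way you can still use OneHotEncoder instead of pd.get_dummies: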
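import pandas as pd
from sklearn.preprocessing import OneHotEncoder

def one_hot_encoder(features, df_to_encode):
    """One-hot encode the given columns with scikit-learn's OneHotEncoder.

    Parameters:
        features (list): names of the columns to encode
        df_to_encode (pandas dataframe): dataframe to encode

    Returns:
        dataframe: copy of the dataframe with the listed columns encoded
    """
    df_fin = df_to_encode.copy()  # leave the caller's dataframe untouched
    for column in features:
        # In scikit-learn 1.2 and later, use sparse_output=False and
        # get_feature_names_out instead of sparse=False and get_feature_names.
        enc = OneHotEncoder(sparse=False)
        df_enc = pd.DataFrame(
            enc.fit_transform(df_fin[[column]]),
            columns=enc.get_feature_names([column]),
            index=df_fin.index,  # keep rows aligned for the concat below
        )
        # Swap the original column for its encoded columns, keeping earlier ones
        df_fin = pd.concat([df_fin.drop(column, axis=1), df_enc], axis=1)
    return df_fin

columns_to_one_hot_encode = ["type"]
df = one_hot_encoder(columns_to_one_hot_encode, df)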
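The error in the question comes from fit_transform returning an array rather than a DataFrame, so it has to be wrapped again before it can be indexed by column name. Below is a minimal sketch of the same idea that avoids the version-dependent parts of the API: as far as I know, newer scikit-learn releases replaced get_feature_names with get_feature_names_out and the sparse argument with sparse_output. The helper name one_hot_encode_columns and the sample data are assumptions for illustration, and the column names are built by hand from the fitted categories_ attribute.

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder


def one_hot_encode_columns(df, features):
    """Return a copy of df with each column in features one-hot encoded."""
    result = df.copy()
    for column in features:
        enc = OneHotEncoder()
        # fit_transform returns a SciPy sparse matrix by default; densify it.
        dense = enc.fit_transform(result[[column]]).toarray()
        # Build names such as "type_a" from the categories found during fit.
        names = [f"{column}_{category}" for category in enc.categories_[0]]
        # Reuse the original index so the concat below aligns row by row.
        encoded = pd.DataFrame(dense, columns=names, index=result.index)
        result = pd.concat([result.drop(columns=column), encoded], axis=1)
    return result


# Hypothetical sample data with a non-default index.
sample = pd.DataFrame(
    {"type": ["a", "b", "a"], "value": [1, 2, 3]}, index=[10, 20, 30]
)
encoded_sample = one_hot_encode_columns(sample, ["type"])
print(encoded_sample)
```

Passing index=result.index matters when the input frame does not have the default 0..n-1 index: without it, pd.concat would align on mismatched labels and fill the result with NaN.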