How to fit column-wise ordinal encoding
I have a dataframe like the one shown below:
import numpy as np
import pandas as pd

tdf = pd.DataFrame({'grade': np.random.choice(list('AAAD'), size=5),
                    'dash': np.random.choice(list('PPPS'), size=5),
                    'dumeel': np.random.choice(list('QWRR'), size=5),
                    'dumma': np.random.choice(1234, size=5),
                    'target': np.random.choice([0, 1], size=5)
                    })
I would like to get a mapping dictionary based on the ordinal encoding technique described here:
from feature_engine.encoding import OrdinalEncoder
from sklearn.model_selection import train_test_split

X = tdf.drop(['target'], axis=1)
y = tdf.target
train_t, test_t, y_train, y_test = train_test_split(X, y,
                                                    test_size=0.25,
                                                    random_state=0)
cat_list = tdf.select_dtypes(include=['object']).columns.tolist()
ordinal_encoders = {}
for col in cat_list:
    print(col)
    ordi = OrdinalEncoder(encoding_method='ordered')
    ordinal_encoders[col] = ordi
    ordi.fit(train_t[col], y_train)
    train_t[col] = ordi.transform(train_t[col])
However, I get the following error:
TypeError: X is not a pandas dataframe. The dataset should be a pandas dataframe.
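The error comes from `train_t[col]`: indexing a DataFrame with a single column label returns a pandas Series, and, as the message says, feature_engine's encoder only accepts a DataFrame. Indexing with a list of labels, `train_t[[col]]`, returns a one-column DataFrame instead. A pandas-only check of the difference, using a made-up toy frame in place of `train_t`:

```python
import pandas as pd

# Made-up toy frame standing in for train_t.
train_t = pd.DataFrame({"grade": ["A", "D", "A"], "dumma": [12, 7, 300]})

col = "grade"
as_series = train_t[col]    # a single label returns a Series
as_frame = train_t[[col]]   # a list of labels returns a one-column DataFrame

print(type(as_series).__name__)
print(type(as_frame).__name__)
print(as_frame.columns.tolist())
```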
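How can I fit and transform the ordinal encoder column by column? I am able to initialize the encoders as shown below, but I cannot fit and transform them: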
{'grade': OrdinalEncoder(),
'dash': OrdinalEncoder(),
'dumeel': OrdinalEncoder()}
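I want to do this because later on I would like to end up with a mapping dictionary (the ordinal value of each category, stored in a dictionary).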
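To make the goal concrete: with `encoding_method='ordered'`, feature_engine numbers the categories of each variable in ascending order of the mean target per category. The sketch below builds such a mapping dictionary with plain pandas to illustrate the idea; it is not feature_engine's own code, the helper `ordered_mapping` and the toy data are hypothetical, and categories with tied means may be ordered differently than in the library.

```python
import pandas as pd


def ordered_mapping(X: pd.DataFrame, y: pd.Series, columns) -> dict:
    """Hypothetical helper: map each category to the rank of its mean target
    (smallest mean -> 0), one inner dict per column."""
    mappings = {}
    for col in columns:
        # Mean target per category of this column, sorted ascending.
        means = y.groupby(X[col]).mean().sort_values(kind="stable")
        mappings[col] = {cat: rank for rank, cat in enumerate(means.index)}
    return mappings


# Made-up toy data.
X = pd.DataFrame({
    "grade": ["A", "A", "D", "D", "A"],
    "dash": ["P", "S", "P", "S", "P"],
})
y = pd.Series([1, 1, 0, 0, 1])

mapping = ordered_mapping(X, y, ["grade", "dash"])
print(mapping)

# Applying the mapping column by column gives the encoded frame.
encoded = X.apply(lambda s: s.map(mapping[s.name]))
print(encoded)
```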
scikit-learn has a so-called ColumnTransformer for exactly this case. With it you can specify various transformers and the columns each one should be applied to.
In code, it would read roughly like this:
from sklearn.compose import ColumnTransformer
from feature_engine.encoding import OrdinalEncoder
X, y = tdf.drop(columns=['target']), tdf['target']
transformer = ColumnTransformer(transformers=[('ord', OrdinalEncoder(encoding_method='ordered'), ['grade', 'dash', 'dumeel'])], remainder="passthrough")  # remainder="passthrough" leaves all columns not listed here untouched
transformed = transformer.fit_transform(X, y)  # encoding_method='ordered' needs the target, so y is passed in