Reverse Label Encoder Features in Python

Question

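Consider the sample table below, which I am trying to make predictions on.

As you can see, I have a mix of numerical features (Num1 & Num2) and categorical features (Cat1 & Cat2) for predicting a value, and I am using Random Forest Regression to do it.

After reading in the file, I convert the categorical features to numerical ones with LabelEncoder, like so: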

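from sklearn import preprocessing

category_col = ['Cat1', 'Cat2']
labelEncoder = preprocessing.LabelEncoder()

# Build a map from each categorical label to its numerical code, per column.
mapping_dict = {}
for col in category_col:
    # The same encoder is refit on every column, so after the loop it only
    # remembers the classes of the last column; the dict keeps the rest.
    df[col] = labelEncoder.fit_transform(df[col])
    le_name_mapping = dict(zip(labelEncoder.classes_, labelEncoder.transform(labelEncoder.classes_)))
    mapping_dict[col] = le_name_mapping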

Once converted, I split my dataframe into a training set and a test set and make predictions, like so:

from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

train_features, test_features, train_labels, test_labels = train_test_split(df, labels, test_size=0.30)

rf = RandomForestRegressor(n_estimators=1000)
rf.fit(train_features, train_labels)
predictions = rf.predict(test_features)

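My question is: how do I change the numbers in Cat1 and Cat2 back to the original categories, so that I can export the predictions together with the category text?

I understand that I need to use labelEncoder.inverse_transform, but I can't seem to get the syntax right to recover the category text alongside the results.

Any help is appreciated!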

Answer 1

A quick solution, based on the code you already have:

# Invert the mapping dictionary you created: {column: {code: label}}
inv_mapping_dict = {cat: {v: k for k, v in map_dict.items()} for cat, map_dict in mapping_dict.items()}

# Assuming `predictions` is your resulting dataframe (with Cat1 and Cat2 columns).
# Replace the encoded values using the inverted mapping dictionary.
# replace() returns a new dataframe, so assign the result back.
predictions = predictions.replace(inv_mapping_dict)
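
Note that `rf.predict` returns a NumPy array rather than a dataframe, so the predictions have to be attached to the test rows before the categories can be decoded. Here is a minimal end-to-end sketch of that, not a definitive implementation: the toy values, the `Target` and `Prediction` names, the `results` dataframe, and the small `n_estimators` are all made up for illustration.

```python
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder

# Toy data: column names follow the question, the values are invented.
df = pd.DataFrame({
    "Num1": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0],
    "Num2": [0.5, 0.1, 0.9, 0.3, 0.7, 0.2, 0.8, 0.4],
    "Cat1": ["red", "blue", "red", "green", "blue", "green", "red", "blue"],
    "Cat2": ["x", "y", "y", "x", "x", "y", "x", "y"],
})
labels = pd.Series([10.0, 12.0, 11.0, 15.0, 14.0, 16.0, 13.0, 12.5], name="Target")
original = df.copy()  # kept only to check the round trip

category_col = ["Cat1", "Cat2"]
mapping_dict = {}
for col in category_col:
    le = LabelEncoder()
    df[col] = le.fit_transform(df[col])
    # classes_ is sorted, and the position of a class is its integer code.
    mapping_dict[col] = {str(label): code for code, label in enumerate(le.classes_)}

# Invert to {column: {code: label}}.
inv_mapping_dict = {
    col: {code: label for label, code in m.items()}
    for col, m in mapping_dict.items()
}

train_features, test_features, train_labels, test_labels = train_test_split(
    df, labels, test_size=0.30, random_state=0
)
rf = RandomForestRegressor(n_estimators=10, random_state=0)
rf.fit(train_features, train_labels)

# Attach the predictions (a NumPy array) to a copy of the test rows.
results = test_features.copy()
results["Prediction"] = rf.predict(test_features)

# Decode only the categorical columns back to their original text.
results = results.replace(inv_mapping_dict)
```

After this, `results` holds the test rows with Cat1 and Cat2 as text plus a `Prediction` column, and can be exported with `results.to_csv(...)`.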

For a better approach, you could also consider the answers here when creating your initial mapping dictionary:

Label encoding across multiple columns in scikit-learn

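Instead of using a for loop over the category columns to build a mapping dictionary, you can create a dictionary of LabelEncoders keyed by column, then apply the fit and the inverse transform to all columns at once at the beginning and at the end.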
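
One way this could look is sketched below, assuming one `LabelEncoder` per categorical column. The `encoders`, `encoded`, and `decoded` names and the toy values are illustrative, not taken from the linked answers.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Toy data: column names follow the question, the values are invented.
df = pd.DataFrame({
    "Cat1": ["red", "blue", "red", "green"],
    "Cat2": ["x", "y", "y", "x"],
})
category_col = ["Cat1", "Cat2"]

# One fitted encoder per column, kept around so it can be inverted later.
encoders = {col: LabelEncoder().fit(df[col]) for col in category_col}

# Encode every categorical column with its own encoder.
encoded = df.copy()
for col, le in encoders.items():
    encoded[col] = le.transform(df[col])

# ... train and predict on `encoded` here ...

# Decode every categorical column back to its original labels.
decoded = encoded.copy()
for col, le in encoders.items():
    decoded[col] = le.inverse_transform(encoded[col])
```

Because each column keeps its own fitted encoder, `inverse_transform` works directly and no hand-built mapping dictionary is needed.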

