How to get predictions from a trained random forest model?

Question

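I have a dataset with two columns: user posts (posts) and personality type (type). I need to predict the personality type from the posts using this dataset, so I am using a random forest classifier for the prediction. Here is my code: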

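import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split

df = pd.read_csv('personality_types.csv')

# Turn each post into a bag-of-words count vector
count_vectorizer = CountVectorizer(decode_error='ignore')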
X = count_vectorizer.fit_transform(df['posts'])
y = df['type'].values

Xtrain, Xtest, Ytrain, Ytest = train_test_split(X, y, test_size=0.33)

random_forest = RandomForestClassifier(n_estimators=100)
random_forest.fit(Xtrain, Ytrain)
Y_prediction = random_forest.predict(Xtest)

Accuracy:

random_forest.score(Xtrain, Ytrain)
acc_random_forest = round(random_forest.score(Xtrain, Ytrain) * 100, 2)
print(acc_random_forest, "%")

100%

Now I want to get a prediction for a custom text. How can I do that? How do I use this model to get the personality type for an individual post?

Answer 1

If the custom text is stored in df in the same format as posts, you can do the following:

custom_text = count_vectorizer.transform(df['custom_text'])
value_predicted = random_forest.predict(custom_text)

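value_predicted contains the results. Of course, count_vectorizer and random_forest must be the already fitted objects from your example; the vectorizer is only transformed here, not fitted again.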
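If the custom text is a plain string rather than a DataFrame column, the same two calls still apply. The following is a minimal sketch under stated assumptions: the toy posts and type labels are made up for illustration and stand in for personality_types.csv, and random_state is set only to make the run repeatable.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical toy data standing in for the posts/type columns.
posts = [
    "i love quiet evenings reading alone",
    "reading alone at home is the best",
    "parties and meeting new people are great",
    "i love meeting new people at parties",
]
types = ["INTJ", "INTJ", "ENFP", "ENFP"]

# Fit the vectorizer and the forest once, on the training posts.
count_vectorizer = CountVectorizer(decode_error="ignore")
X = count_vectorizer.fit_transform(posts)
random_forest = RandomForestClassifier(n_estimators=100, random_state=0)
random_forest.fit(X, types)

# transform() expects an iterable of documents, so wrap the single string in a list.
custom_text = "i spent the evening reading alone"
custom_features = count_vectorizer.transform([custom_text])
value_predicted = random_forest.predict(custom_features)
print(value_predicted[0])
```

Passing the bare string instead of a one-element list would not work, because the vectorizer rejects a raw string where it expects a collection of documents.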

Also, there may be a typo in your example: you should check performance on the test set, not the training set:

random_forest.score(Xtest, Ytest)
acc_random_forest = round(random_forest.score(Xtest, Ytest) * 100, 2)
print(acc_random_forest, "%")
Out:
<Some score>

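A 100% accuracy score on the training data looks like overfitting; the score on the test set is the one that tells you how well the model generalizes.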
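To see the gap between the two scores, both can be printed side by side. This is only a sketch: the data comes from make_classification with some label noise (an assumption for illustration, not the personality dataset), so the exact numbers say nothing about the real model.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic, noisy data standing in for the vectorized posts (illustrative only).
X, y = make_classification(
    n_samples=300, n_features=20, n_informative=5, flip_y=0.2, random_state=0
)
Xtrain, Xtest, Ytrain, Ytest = train_test_split(
    X, y, test_size=0.33, random_state=0
)

random_forest = RandomForestClassifier(n_estimators=100, random_state=0)
random_forest.fit(Xtrain, Ytrain)

# Score on the data the forest was fitted on, and on the held-out data.
train_acc = round(random_forest.score(Xtrain, Ytrain) * 100, 2)
test_acc = round(random_forest.score(Xtest, Ytest) * 100, 2)
print("train:", train_acc, "%")
print("test:", test_acc, "%")
```

A forest with fully grown trees usually scores close to 100% on its own training rows, so only the held-out figure is a meaningful estimate.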

Answer 2

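Create a new column in the same DataFrame df. Name it custom_text, user_text, or anything else. Take the input and store it in that column, so that every row of the column contains the same value: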

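custom_text = input("Enter Text")
# Store the input in a new column; the scalar is broadcast to every row
df['custom_text'] = custom_text
custom_features = count_vectorizer.transform(df['custom_text'])
value_predicted = random_forest.predict(custom_features)
print(value_predicted[0])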

Printing value_predicted[0] is enough, because every entry of value_predicted holds the same value.
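Copying the input into every row means the model predicts the same text once per row. As one possible alternative, the vectorizer and the forest can be bundled with scikit-learn's make_pipeline so that raw strings go straight into predict. This is a sketch, not the approach used in the question: the toy posts and labels are hypothetical, and the name model is introduced here only for illustration.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import make_pipeline

# Hypothetical toy data standing in for df['posts'] and df['type'].
posts = [
    "i love quiet evenings reading alone",
    "reading alone at home is the best",
    "parties and meeting new people are great",
    "i love meeting new people at parties",
]
types = ["INTJ", "INTJ", "ENFP", "ENFP"]

# The pipeline vectorizes the text and then feeds the counts to the forest.
model = make_pipeline(
    CountVectorizer(decode_error="ignore"),
    RandomForestClassifier(n_estimators=100, random_state=0),
)
model.fit(posts, types)

# One prediction per custom text, in the same order as the input list.
custom_texts = ["quiet reading at home", "new people and parties"]
predictions = model.predict(custom_texts)
for text, label in zip(custom_texts, predictions):
    print(text, "->", label)
```

With this layout each post gets its own predicted type, and there is no need to add a column to df.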

python

machine-learning

random-forest

scikit-learn