如何训练 ML 模型以获得不止一种可能的输出？

Question

我一直在尝试搜索这个问题，但找不到答案。在python中使用分类算法是否可以有多个可能的输出？

我一直在研究农作物数据库，该模型将使用 pH 值、土壤类型、温度和湿度等输入来预测哪种作物适合土地，在这种情况下，不止一种作物具有相似的特征pH值和土壤类型。所以，我希望我的模型 return 所有可能的结果。

所以，如果可能的话，你能告诉我吗？如果是，我如何获得输出？

  Temp Humidity Moisture    Stype   suitable-crop      Ph
0   26    52        38     Sandy      Maize         Slightly acidic
1   32    62        34      Red     Ground Nuts     Neutral
2   29    52        45     Loamy     Sugarcane      Slightly alkaline
3   34    65        62     Black      Cotton        Moderately acidic
4   26    50        35     Sandy      Barley        slightly acidic

上面给出的是示例数据。这里的目标是'suitable-crop'。如您所见，玉米和大麦对温度、湿度、土壤类型 (Stype) 和 ph 的要求几乎相同。我使用随机森林算法来预测输出。

Enter the soil type:sandy
Enter the pH type:slightly acidic
Enter temperature:27
Eneter Humidity:50
Enter moisture:37

result = model.predict([[Stype1, pH1, Temperature, Humidity, Moisture]])
print(result)

['Sugarcane']

这就是结果。我希望我的模型 return 所有可能的合适输出，如大麦和玉米。

Answer 1

是的，这是可能的。 scikit-learn user guide 有很好的讨论（在决策树的上下文中，但以下引用更普遍）：

A multi-output problem is a supervised learning problem with several outputs to predict, that is when Y is a 2d array of size [n_samples, n_outputs].

When there is no correlation between the outputs, a very simple way to solve this kind of problem is to build n independent models, i.e. one for each output, and then to use those models to independently predict each one of the n outputs. However, because it is likely that the output values related to the same input are themselves correlated, an often better way is to build a single model capable of predicting simultaneously all n outputs. First, it requires lower training time since only a single estimator is built. Second, the generalization accuracy of the resulting estimator may often be increased.

第二种方法是否可行可能取决于您使用的分类算法。那些原生支持 multi-output 分类的算法通常会在 scikit-learn 中实现。同样，用户指南是一个很好的起点。

如何训练 ML 模型以获得不止一种可能的输出？

How to train ML model to get more than one possible output?

python

classification

machine-learning

jupyter-notebook