如何训练 ML 模型以获得不止一种可能的输出?

How to train ML model to get more than one possible output?

我一直在尝试搜索这个问题,但找不到答案。在python中使用分类算法是否可以有多个可能的输出?

我一直在研究农作物数据库,该模型将使用 pH 值、土壤类型、温度和湿度等输入来预测哪种作物适合土地,在这种情况下,不止一种作物具有相似的特征pH值和土壤类型。所以,我希望我的模型 return 所有可能的结果。

所以,如果可能的话,你能告诉我吗?如果是,我如何获得输出?

  Temp Humidity Moisture    Stype   suitable-crop      Ph
0   26    52        38     Sandy      Maize         Slightly acidic
1   32    62        34      Red     Ground Nuts     Neutral
2   29    52        45     Loamy     Sugarcane      Slightly alkaline
3   34    65        62     Black      Cotton        Moderately acidic
4   26    50        35     Sandy      Barley        slightly acidic

上面给出的是示例数据。这里的目标是'suitable-crop'。如您所见,玉米和大麦对温度、湿度、土壤类型 (Stype) 和 ph 的要求几乎相同。我使用随机森林算法来预测输出。

Enter the soil type:sandy
Enter the pH type:slightly acidic
Enter temperature:27
Eneter Humidity:50
Enter moisture:37
result = model.predict([[Stype1, pH1, Temperature, Humidity, Moisture]])
print(result)

['Sugarcane'] 

这就是结果。我希望我的模型 return 所有可能的合适输出,如大麦和玉米。

是的,这是可能的。 scikit-learn user guide 有很好的讨论(在决策树的上下文中,但以下引用更普遍):

A multi-output problem is a supervised learning problem with several outputs to predict, that is when Y is a 2d array of size [n_samples, n_outputs].

When there is no correlation between the outputs, a very simple way to solve this kind of problem is to build n independent models, i.e. one for each output, and then to use those models to independently predict each one of the n outputs. However, because it is likely that the output values related to the same input are themselves correlated, an often better way is to build a single model capable of predicting simultaneously all n outputs. First, it requires lower training time since only a single estimator is built. Second, the generalization accuracy of the resulting estimator may often be increased.

第二种方法是否可行可能取决于您使用的分类算法。那些原生支持 multi-output 分类的算法通常会在 scikit-learn 中实现。同样,用户指南是一个很好的起点。