Type mismatch error while running ml.PredictionModel in Spark

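After training all the models, I am trying to rename each model's prediction column so that each model's predictions can be uniquely identified in the dataset. I get the type mismatch error shown below: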

import org.apache.spark.ml.PredictionModel

import org.apache.spark.sql.DataFrame

val models = Seq(("NB", nbModel), ("DT", dtModel), ("RF", rfModel), ("GBM",gbmModel))

This outputs:

models: Seq[(String, Any)] = List((NB,NaiveBayesModel (uid=nb_699528805899) with 2 classes), (DT,()), (RF,RandomForestClassificationModel (uid=rfc_403e93000cb6) with 10 trees), (GBM,GBTClassificationModel (uid=gbtc_e778e2781d0b) with 20 trees))

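def mlData(inputData: DataFrame, responseColumn: String,
           baseModels: Seq[(String, PredictionModel[_, _])]): DataFrame = {
  baseModels.map { case (name, model) =>
    // Score the input with each model and keep only the key and the prediction
    model.transform(inputData)
      .select("row_id", model.getPredictionCol)
      // Rename the prediction column so each model's output is uniquely identifiable
      .withColumnRenamed(model.getPredictionCol, s"${name}_prediction")
  }.reduceLeft((a, b) => a.join(b, Seq("row_id"), "inner"))
    // Attach the response column back onto the joined predictions
    .join(inputData.select("row_id", responseColumn), Seq("row_id"), "inner")
}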

This outputs:

mlData: (inputData: org.apache.spark.sql.DataFrame, responseColumn: String, baseModels: Seq[(String, org.apache.spark.ml.PredictionModel[_, _])]) org.apache.spark.sql.DataFrame

val mlTrainData= mlData(transferData, "value", models).drop("row_id")

I get a type mismatch error, which should not actually happen:

<console>:102: error: type mismatch;
 found   : Seq[(String, Any)]
 required: Seq[(String, org.apache.spark.ml.PredictionModel[_, _])]
       val mlTrainData= mlData(transferData, "value", models).drop("row_id")

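From the output alone, it is clear that the second element of the DT tuple is Unit (printed as `()`) rather than a PredictionModel. The only common supertype of Unit and the model classes is Any, which is why the whole object is inferred as Seq[(_, Any)] and your code fails.

Since you did not provide any context, it is not clear how dtModel ended up that way.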
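As a rough illustration of the inference behaviour only (not your actual code), here is a minimal sketch that runs without Spark. The `Model`, `NbLike` and `RfLike` types are hypothetical stand-ins for the Spark model classes, and the block that produces `dtModel` is just one assumed way a value can end up as Unit: a block whose last expression is a statement such as `println`.

```scala
// Hypothetical stand-ins for Spark's model classes, so the sketch runs without Spark.
trait Model { def uid: String }
final case class NbLike(uid: String) extends Model
final case class RfLike(uid: String) extends Model

object InferenceDemo {
  val nbModel = NbLike("nb_1")
  val rfModel = RfLike("rfc_1")

  // Assumed cause: the block ends with a statement, so its value is Unit, not a model.
  val dtModel = {
    val fitted = NbLike("dtc_1")
    println(s"fitted ${fitted.uid}")
  }

  // Unit and the Model subclasses share no supertype other than Any,
  // so this is inferred as Seq[(String, Any)], mirroring the output in the question.
  val mixed = Seq(("NB", nbModel), ("DT", dtModel), ("RF", rfModel))

  // Names whose second element is not a model at all.
  val notModels: Seq[String] =
    mixed.collect { case (name, m) if !m.isInstanceOf[Model] => name }

  // With an explicit element type, adding ("DT", dtModel) here would be rejected
  // at this definition instead of at a later call site.
  val models: Seq[(String, Model)] = Seq(("NB", nbModel), ("RF", rfModel))

  def main(args: Array[String]): Unit = {
    println(notModels)
    println(models.map(_._1))
  }
}
```

In your session the equivalent annotation would be `val models: Seq[(String, PredictionModel[_, _])] = Seq(...)`, which should make the compiler point at the DT entry directly rather than at the `mlData` call.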