Type mismatch error while running ml.PredictionModel in Spark
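After training all the models, I am trying to rename each model's prediction column so that each model's predictions can be uniquely identified in the dataset. I am getting the type mismatch error shown below: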
import org.apache.spark.ml.PredictionModel
import org.apache.spark.sql.DataFrame
val models = Seq(("NB", nbModel), ("DT", dtModel), ("RF", rfModel), ("GBM",gbmModel))
The output of which is as follows:
models: Seq[(String, Any)] = List((NB,NaiveBayesModel (uid=nb_699528805899)
with 2 classes), (DT,()), (RF,RandomForestClassificationModel
(uid=rfc_403e93000cb6) with 10 trees), (GBM,GBTClassificationModel
(uid=gbtc_e778e2781d0b) with 20 trees))
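def mlData(inputData: DataFrame, responseColumn: String,
           baseModels: Seq[(String, PredictionModel[_, _])]): DataFrame = {
  baseModels.map { case (name, model) =>
    // Score the input and keep only the key and this model's prediction column.
    model.transform(inputData)
      .select("row_id", model.getPredictionCol)
      // Rename the column the model actually wrote, not a hard-coded "prediction".
      .withColumnRenamed(model.getPredictionCol, s"${name}_prediction")
  }.reduceLeft((a, b) => a.join(b, Seq("row_id"), "inner"))
    // Attach the response column back by row_id.
    .join(inputData.select("row_id", responseColumn), Seq("row_id"), "inner")
}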
The output of which is as follows:
mlData: (inputData: org.apache.spark.sql.DataFrame, responseColumn: String, baseModels: Seq[(String, org.apache.spark.ml.PredictionModel[_, _])])org.apache.spark.sql.DataFrame
val mlTrainData= mlData(transferData, "value", models).drop("row_id")
I get a type mismatch error, which should not actually happen:
<console>:102: error: type mismatch;
found : Seq[(String, Any)]
required: Seq[(String, org.apache.spark.ml.PredictionModel[_, _])]
val mlTrainData= mlData(transferData, "value", models).drop("row_id")
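From the output alone, it is clear that the second element of the DT tuple is Unit, not a PredictionModel. That is why the whole object is typed as Seq[(_, Any)] and your code fails to compile.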
Since you did not provide the context, it is not clear how you got there.
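One common way to end up with Unit is a block whose last line is a definition or assignment rather than the fitted model. The question does not show how dtModel was built, so the following is only a sketch of that assumed cause, not a diagnosis. Model and FittedModel are hypothetical stand-ins for Spark's classes so that it runs without Spark; the same inference rule applies to PredictionModel[_, _].

```scala
// Hypothetical stand-ins for Spark's model classes, so the sketch runs without Spark.
trait Model { def name: String }
case class FittedModel(name: String) extends Model

object LubDemo {
  val nbModel: Model = FittedModel("nb")

  // Assumed cause: the block ends in a definition, so the block's value is Unit.
  val dtModel = {
    val fitted = FittedModel("dt")
  }

  // Fix: end the block with the value to return and annotate the expected type.
  val dtModelFixed: Model = {
    val fitted = FittedModel("dt")
    fitted
  }

  // Model and Unit share only Any as a supertype, so this is Seq[(String, Any)].
  val broken = Seq(("NB", nbModel), ("DT", dtModel))

  // With the ascription, a stray Unit is rejected here, at the definition site.
  val models: Seq[(String, Model)] = Seq(("NB", nbModel), ("DT", dtModelFixed))

  def main(args: Array[String]): Unit = {
    println(broken)
    println(models.map(_._1))
  }
}
```

Annotating the sequence with its intended element type, as in models above, moves the error from the mlData call to the line where the wrong value is created, which makes it much easier to find.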