convert Seq[(String, Any)] to Seq[(String, org.apache.spark.ml.PredictionModel[_, _])] in spark
I have trained my dataset with several different models, e.g. nbModel, dtModel, rfModel, gbmModel. All of these are machine learning models.
Now when I save them into a variable:
val models = Seq(("NB", nbModel), ("DT", dtModel), ("RF", rfModel), ("GBM",gbmModel))
I am getting a Seq[(String, Any)]:
models: Seq[(String, Any)] = List((NB,NaiveBayesModel (uid=nb_c35f79982850) with 2 classes), (DT,()), (RF,RandomForestClassificationModel (uid=rfc_3f42daf4ea14) with 15 trees), (GBM,GBTClassificationModel (uid=gbtc_534a972357fa) with 20 trees))
For a single model, e.g. nbModel:
val models = ("NB", nbModel)
Output: models: (String, org.apache.spark.ml.classification.NaiveBayesModel) = (NB,NaiveBayesModel (uid=nb_c35f79982850) with 2 classes)
When I try to join a few columns from these models, I get a type mismatch error:
val mlTrainData= mlData(transferData, "value", models).drop("row_id")
<console>:75: error: type mismatch;
found : Seq[(String, Any)]
required: Seq[(String, org.apache.spark.ml.PredictionModel[_, _])]
val mlTrainData= mlData(transferData, "value", models).drop("row_id")
My mlData function is:
def mlData(inputData: DataFrame, responseColumn: String,
           baseModels: Seq[(String, PredictionModel[_, _])]): DataFrame = {
  baseModels.map { case (name, model) =>
    model.transform(inputData)
      .select("row_id", model.getPredictionCol)
      .withColumnRenamed("prediction", s"${name}_prediction")
  }.reduceLeft((a, b) => a.join(b, Seq("row_id"), "inner"))
    .join(inputData.select("row_id", responseColumn), Seq("row_id"), "inner")
}
Output: mlData: (inputData: org.apache.spark.sql.DataFrame, responseColumn: String, baseModels: Seq[(String, org.apache.spark.ml.PredictionModel[_, _])])org.apache.spark.sql.DataFrame
Could you please replace the code
val models = Seq(("NB", nbModel), ("DT", dtModel), ("RF", rfModel), ("GBM", gbmModel))
with
val models = Seq(("NB", nbModel), ("DT", null: org.apache.spark.ml.classification.DecisionTreeClassificationModel), ("RF", rfModel), ("GBM", gbmModel))
What I am saying is that your dtModel was assigned (), of type Unit (you can see (DT,()) in your REPL output). The element type of the whole Seq is therefore inferred as the least upper bound of the model types and Unit, which is Any. You need to make sure dtModel actually holds a model of a type such as DecisionTreeClassificationModel; ascribing the type as above works even when the value is null, as long as you handle the null case. An empty DecisionTreeClassificationModel would also do.
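The inference behaviour can be reproduced without Spark. A minimal sketch with illustrative (non-Spark) model types — `Model`, `BayesModel`, `TreeModel`, and the `uid` field are made up for the demonstration:

```scala
// Illustrative stand-ins for Spark's model classes (not Spark's API).
trait Model
case class BayesModel(uid: String) extends Model
case class TreeModel(uid: String) extends Model

object LubDemo {
  // dtModel accidentally holds (), i.e. Unit, so the least upper bound
  // of the tuple's second components is Any and the Seq is inferred
  // as Seq[(String, Any)].
  val dtModel: Unit = ()
  val broken = Seq(("NB", BayesModel("nb_1")), ("DT", dtModel))

  // Fix: make sure every second component really is a model
  // (or ascribe the intended type explicitly).
  val fixed: Seq[(String, Model)] =
    Seq(("NB", BayesModel("nb_1")), ("DT", TreeModel("dt_1")))
}
```

With `fixed`, the compiler infers `Seq[(String, Model)]`, which is what a signature like `mlData`'s `Seq[(String, PredictionModel[_, _])]` parameter requires in the real Spark code.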