Field "features" does not exist. SparkML
I am trying to build a model in Spark ML using Zeppelin.
I am new to this area and need some help. I think I need to set the correct data types for the columns and set the first column as the label. Any help would be greatly appreciated, thanks.
val training = sc.textFile("hdfs:///ford/fordTrain.csv")
val header = training.first
val inferSchema = true
val df = training.toDF
val lr = new LogisticRegression()
.setMaxIter(10)
.setRegParam(0.3)
.setElasticNetParam(0.8)
val lrModel = lr.fit(df)
// Print the coefficients and intercept for multinomial logistic regression
println(s"Coefficients: \n${lrModel.coefficientMatrix}")
println(s"Intercepts: ${lrModel.interceptVector}")
A snippet of the csv file I am using is:
IsAlert,P1,P2,P3,P4,P5,P6,P7,P8,E1,E2
0,34.7406,9.84593,1400,42.8571,0.290601,572,104.895,0,0,0,
As mentioned in the error, you are missing the features column. It is a vector containing all of the predictor variables, and you have to create it with VectorAssembler.
IsAlert is the label and all the other variables (P1, P2, ...) are predictors, so you can create the features column (you can actually name it whatever you like instead of features) like this:
import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.ml.feature.VectorAssembler

// Assemble all predictor columns into a single "features" vector column
val assembler = new VectorAssembler()
  .setInputCols(Array("P1", "P2", "P3", "P4", "P5", "P6", "P7", "P8", "E1", "E2"))
  .setOutputCol("features")

val lr = new LogisticRegression()
  .setMaxIter(10)
  .setRegParam(0.3)
  .setElasticNetParam(0.8)
  .setFeaturesCol("features") // setting features column
  .setLabelCol("IsAlert")     // setting label column

// Chain the assembler and the classifier into a pipeline
val pipeline = new Pipeline().setStages(Array(assembler, lr))

// Fit the model
val lrModel = pipeline.fit(df)
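Note that pipeline.fit returns a PipelineModel rather than a LogisticRegressionModel, so the println statements from your original code need the fitted classifier pulled out of the last pipeline stage. A minimal sketch, assuming the pipeline built above:

import org.apache.spark.ml.classification.LogisticRegressionModel

// The last stage of the fitted pipeline is the fitted logistic regression model
val fittedLr = lrModel.stages.last.asInstanceOf[LogisticRegressionModel]
println(s"Coefficients: \n${fittedLr.coefficientMatrix}")
println(s"Intercepts: ${fittedLr.interceptVector}")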
Reference: https://spark.apache.org/docs/latest/ml-features.html#vectorassembler
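One more thing to check: sc.textFile(...).toDF gives you a DataFrame with a single string column, so the VectorAssembler will not find the P1..E2 columns. A minimal sketch of loading the CSV as a typed DataFrame instead, assuming a SparkSession named spark (as in recent Zeppelin/Spark versions) and the path from the question:

// Read the CSV using the header row and let Spark infer numeric column types
val df = spark.read
  .option("header", "true")
  .option("inferSchema", "true")
  .csv("hdfs:///ford/fordTrain.csv")

// IsAlert (0/1) is inferred as an integer, which works as a numeric label column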