How to interpret probability column in spark logistic regression prediction?
I'm getting predictions from spark.ml.classification.LogisticRegressionModel.predict. A number of rows have the prediction column as 1.0 and the probability column as .04. model.getThreshold is 0.5, so I assume the model classifies everything over the 0.5 probability threshold as 1.0.

How am I supposed to interpret a result with a prediction of 1.0 and a probability of 0.04?
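A minimal sketch of the kind of inspection in question, assuming a fitted LogisticRegressionModel named model and a test DataFrame named testData (both names are placeholders, not the actual pipeline):

// Assumed setup: `model` is a fitted LogisticRegressionModel and `testData`
// has a "features" column; both names are placeholders for this sketch.
println(model.getThreshold)              // e.g. 0.5

model.transform(testData)
  .select("probability", "prediction")
  .show(false)                           // probability is a vector, one entry per class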
The probability column produced by LogisticRegression holds a vector whose length equals the number of classes, where the value at each index is the probability of the corresponding class. I made a small example with two classes to illustrate:
import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.ml.feature.VectorAssembler
import spark.implicits._   // assumes a SparkSession named `spark`, as in spark-shell

case class Person(label: Double, age: Double, height: Double, weight: Double)

val df = List(Person(0.0, 15, 175, 67),
              Person(0.0, 30, 190, 100),
              Person(1.0, 40, 155, 57),
              Person(1.0, 50, 160, 56),
              Person(0.0, 15, 170, 56),
              Person(1.0, 80, 180, 88)).toDF()

// Combine the feature columns into a single "features" vector column.
val assembler = new VectorAssembler()
  .setInputCols(Array("age", "height", "weight"))
  .setOutputCol("features")

val df2 = assembler.transform(df).select("label", "features")
df2.show
+-----+------------------+
|label| features|
+-----+------------------+
| 0.0| [15.0,175.0,67.0]|
| 0.0|[30.0,190.0,100.0]|
| 1.0| [40.0,155.0,57.0]|
| 1.0| [50.0,160.0,56.0]|
| 0.0| [15.0,170.0,56.0]|
| 1.0| [80.0,180.0,88.0]|
+-----+------------------+
val lr = new LogisticRegression().setMaxIter(10).setRegParam(0.3).setElasticNetParam(0.8)
val Array(testing, training) = df2.randomSplit(Array(0.7, 0.3))
val model = lr.fit(training)
val predictions = model.transform(testing)
predictions.select("probability", "prediction").show(false)
+----------------------------------------+----------+
|probability |prediction|
+----------------------------------------+----------+
|[0.7487950501224138,0.2512049498775863] |0.0 |
|[0.6458452667523259,0.35415473324767416]|0.0 |
|[0.3888393314864866,0.6111606685135134] |1.0 |
+----------------------------------------+----------+
These are the probabilities together with the final prediction the algorithm makes. In the end, the class with the highest probability is the one that gets predicted.
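If you want a single scalar to compare directly against model.getThreshold, you can extract the probability of the positive class (index 1) from that vector yourself. A minimal sketch, assuming the predictions DataFrame from above (the helper UDF and the p1 column name are made up for illustration):

import org.apache.spark.ml.linalg.Vector
import org.apache.spark.sql.functions.udf

// Hypothetical helper: pull P(label = 1.0) out of the probability vector
// so it can be compared directly against model.getThreshold.
val positiveProb = udf((v: Vector) => v(1))

predictions
  .withColumn("p1", positiveProb(predictions("probability")))
  .select("probability", "p1", "prediction")
  .show(false)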