org.apache.spark.ml.feature.IDF error

As described at http://spark.apache.org/docs/latest/ml-features.html, I tried this import:

import org.apache.spark.ml.feature.{HashingTF, IDF, Tokenizer}

Spark shows:

scala> import org.apache.spark.ml.feature.IDF
<console>:13: error: object IDF is not a member of package org.apache.spark.ml.feature
       import org.apache.spark.ml.feature.IDF

However, import org.apache.spark.mllib.feature.IDF works fine.

What could be causing this error? I am new to Spark and Scala.

This is not reproducible in spark-1.4.1. Which version are you using?

scala> import org.apache.spark.ml.feature.IDF
import org.apache.spark.ml.feature.IDF

scala> import org.apache.spark.ml.feature.{HashingTF, IDF, Tokenizer}
import org.apache.spark.ml.feature.{HashingTF, IDF, Tokenizer}
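
A quick way to check this yourself is to compare the running version against 1.4. The sketch below is only illustrative: hasMlFeatureIdf is a hypothetical helper name, and it assumes the version string starts with "major.minor" (as sc.version does in spark-shell).

```scala
// Hypothetical helper: returns true when the given Spark version string
// is 1.4 or later, i.e. a release expected to ship ml.feature.IDF.
def hasMlFeatureIdf(version: String): Boolean = {
  // Keep the leading digits of each dot-separated part,
  // e.g. "1.4.0-SNAPSHOT" -> Array(1, 4, 0).
  val nums = version.split("\\.")
    .map(_.takeWhile(_.isDigit))
    .filter(_.nonEmpty)
    .map(_.toInt)
  val major = if (nums.length > 0) nums(0) else 0
  val minor = if (nums.length > 1) nums(1) else 0
  major > 1 || (major == 1 && minor >= 4)
}

// In spark-shell, where sc is predefined:
// hasMlFeatureIdf(sc.version)
```

If this returns false for your shell, the import from org.apache.spark.ml.feature is expected to fail.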

EDIT1

Spark 1.2.x only contains org.apache.spark.mllib.feature.IDF.

Try searching for IDF here: https://spark.apache.org/docs/1.2.0/api/scala/index.html#org.apache.spark.mllib.feature.IDF

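The error occurs because the feature.IDF class was only added to spark-ml in Spark 1.4. On earlier versions the class does not exist in that package, hence the "object IDF is not a member of package org.apache.spark.ml.feature" error.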

You can try using the spark-mllib IDF class instead.
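
For reference, here is a minimal sketch of TF-IDF with the RDD-based mllib classes. It assumes you run it in spark-shell (so sc is already defined), and the three-sentence corpus is made up purely for illustration.

```scala
import org.apache.spark.mllib.feature.{HashingTF, IDF}
import org.apache.spark.mllib.linalg.Vector
import org.apache.spark.rdd.RDD

// Toy corpus (illustrative only): each document becomes a sequence of terms.
val documents: RDD[Seq[String]] = sc.parallelize(Seq(
  "spark is fast",
  "spark ml has idf",
  "mllib has idf too"
)).map(_.split(" ").toSeq)

// Hash each term into a fixed-size term-frequency vector.
val hashingTF = new HashingTF()
val tf: RDD[Vector] = hashingTF.transform(documents)
tf.cache() // tf is read twice: once to fit the IDF model, once to transform

// Fit the IDF model on the corpus, then rescale the term frequencies.
val idfModel = new IDF().fit(tf)
val tfidf: RDD[Vector] = idfModel.transform(tf)

// Print one sparse TF-IDF vector per document.
tfidf.collect().foreach(println)
```

Note that the mllib API works on RDD[Vector] rather than on DataFrame columns, so it is not a drop-in replacement for the pipeline shown in the ml-features guide.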