What is the reason for compilation errors when different versions of Spark-core and Spark-mllib are mixed?
I am copying and pasting the exact Spark MLlib LDA example from here: http://spark.apache.org/docs/latest/mllib-clustering.html#latent-dirichlet-allocation-lda
I am trying the Scala sample code, but I get the following errors when trying to save and load the LDA model:
- on the line before the last line:
value save is not a member of org.apache.spark.mllib.clustering.DistributedLDAModel
- on the last line:
not found: value DistributedLDAModel
Here is the code. Note that I use SBT to create the skeleton of my Scala project and to pull in the libraries, then import it into Eclipse (Mars) for editing. I am using spark-core 1.5.0, spark-mllib 1.3.1, and Scala version 2.11.7 (the corresponding build.sbt fragment is sketched below, after the code).
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.spark.SparkConf
import org.apache.spark.mllib.clustering.{LDA, DistributedLDAModel}
import org.apache.spark.mllib.linalg.Vectors

object sample {
  def main(args: Array[String]) {
    val conf = new SparkConf().setAppName("sample_SBT").setMaster("local[2]")
    val sc = new SparkContext(conf)

    // Load and parse the data
    val data = sc.textFile("data/mllib/sample_lda_data.txt")
    val parsedData = data.map(s => Vectors.dense(s.trim.split(' ').map(_.toDouble)))
    // Index documents with unique IDs
    val corpus = parsedData.zipWithIndex.map(_.swap).cache()

    // Cluster the documents into three topics using LDA
    val ldaModel = new LDA().setK(3).run(corpus)

    // Output topics. Each is a distribution over words (matching word count vectors)
    println("Learned topics (as distributions over vocab of " + ldaModel.vocabSize + " words):")
    val topics = ldaModel.topicsMatrix
    for (topic <- Range(0, 3)) {
      print("Topic " + topic + ":")
      for (word <- Range(0, ldaModel.vocabSize)) { print(" " + topics(word, topic)); }
      println()
    }

    // Save and load model.
    ldaModel.save(sc, "myLDAModel")
    val sameModel = DistributedLDAModel.load(sc, "myLDAModel")
  }
}
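For reference, a build.sbt fragment that reproduces the version mix described above would look roughly like the following. This is a hypothetical sketch of that setup, not the actual file:

// Hypothetical reconstruction of the mismatched dependency declarations.
// spark-mllib 1.3.1 ships an older DistributedLDAModel without the save/load
// API (it only appeared in later MLlib releases), which matches the two
// compiler errors quoted above.
scalaVersion := "2.11.7"

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core"  % "1.5.0",
  "org.apache.spark" %% "spark-mllib" % "1.3.1"
)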
First of all, the code compiles fine. Here is what I used for the setup:
./build.sbt
name := "SO_20150917"
version := "1.0"
scalaVersion := "2.11.7"
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % "1.5.0",
  "org.apache.spark" %% "spark-mllib" % "1.5.0"
)
./src/main/scala/somefun/
package somefun

import org.apache.spark.SparkContext
import org.apache.spark.SparkConf
import org.apache.spark.mllib.clustering.{LDA, DistributedLDAModel}
import org.apache.spark.mllib.linalg.Vectors

object Example {
  def main(args: Array[String]) {
    val conf = new SparkConf().setAppName("sample_SBT").setMaster("local[2]")
    val sc = new SparkContext(conf)

    // Load and parse the data
    val data = sc.textFile("data/mllib/sample_lda_data.txt")
    val parsedData = data.map(s => Vectors.dense(s.trim.split(' ').map(_.toDouble)))
    // Index documents with unique IDs
    val corpus = parsedData.zipWithIndex.map(_.swap).cache()

    // Cluster the documents into three topics using LDA
    val ldaModel = new LDA().setK(3).run(corpus)

    // Output topics. Each is a distribution over words (matching word count vectors)
    println("Learned topics (as distributions over vocab of " + ldaModel.vocabSize + " words):")
    val topics = ldaModel.topicsMatrix
    for (topic <- Range(0, 3)) {
      print("Topic " + topic + ":")
      for (word <- Range(0, ldaModel.vocabSize)) { print(" " + topics(word, topic)); }
      println()
    }

    // Save and load model.
    ldaModel.save(sc, "myLDAModel")
    val sameModel = DistributedLDAModel.load(sc, "myLDAModel")
  }
}
Execution via sbt run fails (of course), since "data/mllib/sample_lda_data.txt" is missing:
[error] (run-main-0) org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: file:/home/martin/IdeaProjects/SO_20150917/data/mllib/sample_lda_data.txt
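If you want the example to run end to end without a Spark source checkout, one option is to generate a small stand-in input file first. The sketch below (the object name MakeSampleLdaData and the values are made up, not the real sample_lda_data.txt shipped with Spark) writes an arbitrary word-count matrix in the expected format:

import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Paths}

object MakeSampleLdaData {
  def main(args: Array[String]): Unit = {
    // Arbitrary word-count matrix: one document per line, one vocabulary term per column.
    val docs = Seq(
      "1 2 6 0 2 3 1 1 0 0 3",
      "1 3 0 1 3 0 0 2 0 0 1",
      "1 4 1 0 0 4 9 0 1 2 0",
      "2 1 0 3 0 0 5 0 2 3 9"
    )
    // Create the directory Spark's example expects, then write the file.
    Files.createDirectories(Paths.get("data/mllib"))
    Files.write(
      Paths.get("data/mllib/sample_lda_data.txt"),
      docs.mkString("\n").getBytes(StandardCharsets.UTF_8)
    )
  }
}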
@Rami: So please check your setup, since from my point of view everything is fine here. The errors you quote are what I would expect from mixing spark-core 1.5.0 with spark-mllib 1.3.1: as far as I can tell, save/load for DistributedLDAModel only arrived in a later MLlib release, so with 1.3.1 on the classpath the compiler cannot find it.
Regarding @Rami's question:
Perhaps this helps:
val sparkVersion = "1.5.0"

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % sparkVersion,
  "org.apache.spark" %% "spark-mllib" % sparkVersion
)
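As an extra safeguard against ending up with two different Spark versions on the classpath (for example through a transitive dependency), sbt can also pin the versions explicitly. A minimal sketch, assuming sbt 0.13.x and the sparkVersion value from above:

// Optional: force both Spark modules to the same version even if something
// else in the dependency graph pulls in a different one (sbt 0.13.x syntax).
dependencyOverrides ++= Set(
  "org.apache.spark" %% "spark-core"  % sparkVersion,
  "org.apache.spark" %% "spark-mllib" % sparkVersion
)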