Create a SQLContext from a text file in Spark using Scala

Question

I am trying to create a SQLContext using Scala. Below is my code.

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

object SqltextContextSparkScala {
  def main(args: Array[String]): Unit = {
    // Backslashes in Windows paths must be escaped inside Scala string literals
    System.setProperty("hadoop.home.dir", "C:\\hadoop-2.6.0")
    val conf = new SparkConf().setAppName("SampleSparkScalaApp").setMaster("local[2]").set("spark.executor.memory", "1g")

    val sc = new SparkContext(conf)
    val sqlContext = new SQLContext(sc)

    val readfile = sc.textFile("C:\\tmp\\people.txt")

    import sqlContext.implicits._

    val person = readfile.map(_.split(",")).map(p => new Person(p(0), p(1), p(2)))
    // sqlContext.to   (incomplete: this is where I am stuck)

  }

}

I have defined the Person class as:

class Person(id:String,name:String,age:String){

}

How do I create a DataFrame from this:

val people = readfile.map(_.split(",")).map(p=> new Person(p(0), p(1), p(2)))

Answer 1

Before the line:

val people =

add the import:

import sqlContext.implicits._

Then, after val people is defined, simply call:

val peopleDF = people.toDF()

That's all there is to it.
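For reference, here is a minimal end-to-end sketch of the steps above. It assumes Spark 1.3 or later (where toDF() is available), assumes Person is declared as a case class at the top level rather than inside main, and uses a couple of made-up in-memory rows in place of people.txt so that it does not depend on a local file. The object name PeopleToDataFrame and the helper parsePerson are hypothetical names chosen for this example.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

// Declared as a case class, outside the method that uses it,
// so that Spark can infer the columns (id, name, age) by reflection.
case class Person(id: String, name: String, age: String)

object PeopleToDataFrame {
  // Hypothetical helper: turns one comma-separated line into a Person.
  def parsePerson(line: String): Person = {
    val p = line.split(",")
    Person(p(0), p(1), p(2))
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("SampleSparkScalaApp").setMaster("local[2]")
    val sc = new SparkContext(conf)
    val sqlContext = new SQLContext(sc)

    // Brings the implicit conversion that provides toDF() into scope.
    import sqlContext.implicits._

    // Assumed sample data standing in for sc.textFile(...) on people.txt.
    val lines = sc.parallelize(Seq("1,Alice,30", "2,Bob,25"))

    val people = lines.map(parsePerson)
    val peopleDF = people.toDF()

    peopleDF.printSchema()
    peopleDF.show()

    sc.stop()
  }
}
```

To read the real file instead, replace the sc.parallelize(...) line with the sc.textFile(...) call from the question.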

Answer 2

Found the solution: the problem was in how class Person was defined.

Previously it was declared like this:

class Person(id:String,name:String,age:String){

}

It turns out the declaration needs the case keyword in front of it, like this:

case class Person(id:String,name:String,age:String){

}

I am not sure what case actually does here, though.
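On what case does: as far as I understand it, toDF() relies on Spark inferring a schema by reflection, which it supports for case classes (they extend Product) but not for a plain class whose constructor parameters are not even visible as members. The sketch below is plain Scala with no Spark involved; PlainPerson, CasePerson and CaseClassDemo are made-up names used only to show the difference.

```scala
// Hypothetical names, for illustration only.
class PlainPerson(id: String, name: String, age: String)
case class CasePerson(id: String, name: String, age: String)

object CaseClassDemo {
  def main(args: Array[String]): Unit = {
    val plain = new PlainPerson("1", "Alice", "30")
    val cased = CasePerson("1", "Alice", "30") // no `new` needed

    // Constructor parameters of a case class become public fields.
    println(cased.name)
    // plain.name would not compile: the parameter is not a member.

    // A case class is a Product, so its fields can be enumerated.
    println(cased.productArity)
    println(cased.productIterator.toList)

    // Structural equality and a readable toString are generated for free.
    println(cased == CasePerson("1", "Alice", "30"))
    println(cased)

    // A plain class is not a Product.
    println(plain.isInstanceOf[Product])
  }
}
```

The enumerable, named fields are what give Spark something to build DataFrame columns from, which would explain why adding case made toDF() work.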

scala

apache-spark

apache-spark-sql