Manipulation on Spark Dataframe row

I am new to Spark, Scala, etc. Below is my code:

val eventdf = sqlContext.sql("SELECT sensor, data.actor FROM eventTable")

eventdf.map {
  case (r) => (r.getString(0) + count, r.getString(1), count)
}.saveToCassandra("caliper", "event", SomeColumns("sensor", "sendtime", "count"))

Here, I want to perform some operation on r.getString(1) and then pass the result to Cassandra to save.

If you cannot apply the transformation directly to the DataFrame column, I can suggest the following:

import org.apache.spark.sql.Row
import com.datastax.spark.connector._ // provides saveToCassandra and SomeColumns
import sqlContext.implicits._

val newRDD = eventdf.map {
  case Row(val1: String, val2: String) =>
    // process val2 here and store the result in val2Processed;
    // `count` is assumed to be defined elsewhere in your code
    val val2Processed = val2 // replace with your actual processing
    (val1 + count, val2Processed, count)
}

val newDF = newRDD.toDF("col1", "col2", "col3") // If you need to convert it back to DF

newDF.saveToCassandra(...)
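For completeness, a transformation can often be applied directly to the column with a UDF instead of mapping over rows. This is a minimal sketch under assumptions: `processActor` and the `toUpperCase` logic are hypothetical placeholders for whatever processing you need.

```scala
import org.apache.spark.sql.functions.udf

// Hypothetical processing function -- replace the body with your own logic
val processActor = udf { actor: String => actor.toUpperCase }

// Overwrite the "actor" column with the processed value
val processed = eventdf.withColumn("actor", processActor(eventdf("actor")))
```

This keeps the result a DataFrame, so no conversion back via `toDF` is needed before saving.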