Manipulation on Spark DataFrame rows
I am new to Spark, Scala, etc. Below is my code:
val eventdf = sqlContext.sql("SELECT sensor, data.actor FROM eventTable")
eventdf.map {
  case (r) => (r.getString(0) + count, r.getString(1), count)
}.saveToCassandra("caliper", "event", SomeColumns("sensor", "sendtime", "count"))
Here I want to perform some operations on r.getString(1) and then pass the result to Cassandra to be saved.
If you cannot apply the transformation directly to the DataFrame columns, I can suggest the following:
import org.apache.spark.sql.Row
import sqlContext.implicits._
val newRDD = eventdf.map {
  case Row(val1: String, val2: String) =>
    // process val2 here and save the result to val2_processed
    (val1 + count, val2_processed, count)
}
val newDF = newRDD.toDF("col1", "col2", "col3") // If you need to convert it back to DF
newDF.saveToCassandra(...)
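To make the round trip concrete, here is a minimal sketch of the same map-then-toDF pattern. The toUpperCase call is only a stand-in for whatever processing you need on the second column, and count is assumed to be an Int already in scope; the column and keyspace names simply mirror the code above:

```scala
import org.apache.spark.sql.Row
import sqlContext.implicits._ // needed for .toDF on the resulting RDD

val count = 1 // placeholder; substitute your own counter

val processed = eventdf.map {
  // The pattern match assumes both selected columns are non-null strings;
  // add a catch-all case if the data may contain nulls.
  case Row(sensor: String, actor: String) =>
    val actorProcessed = actor.toUpperCase // stand-in for your real processing
    (sensor + count, actorProcessed, count)
}

// Convert back to a DataFrame with column names matching the Cassandra table
val processedDF = processed.toDF("sensor", "sendtime", "count")
processedDF.saveToCassandra("caliper", "event",
  SomeColumns("sensor", "sendtime", "count"))
```

Doing the processing inside the map keeps everything in one pass over the data; converting back to a DataFrame is only needed if later steps require the DataFrame API, since saveToCassandra also works directly on the RDD of tuples.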