How to map a SchemaRDD to a PairRDD
I'm trying to figure out how to map a SchemaRDD object retrieved from a sql HiveContext to a PairRDDFunctions[String, Vector] object, where the String value is the name column in the schemaRDD and the remaining columns (BytesIn, BytesOut, etc.) form the vector.
Assuming you have the columns "name", "bytesIn", and "bytesOut":
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{Row, SchemaRDD}
// Brings PairRDDFunctions (groupByKey, reduceByKey, ...) into scope via implicits
import org.apache.spark.SparkContext._

val schemaRDD: SchemaRDD = ...

// SchemaRDD is itself an RDD[Row], so map can be called on it directly.
// The Symbol column references ('name, ...) assume the SQL context's DSL
// implicits are in scope (e.g. import hiveContext._).
val pairs: RDD[(String, (Long, Long))] =
  schemaRDD.select('name, 'bytesIn, 'bytesOut).map {
    case Row(name: String, bytesIn: Long, bytesOut: Long) =>
      name -> (bytesIn, bytesOut)
  }

pairs.groupByKey // ... etc
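Since the question asks for PairRDDFunctions[String, Vector] rather than a tuple of Longs, here is a minimal sketch of the same mapping that packs the remaining columns into an MLlib Vector. It assumes the same "name", "bytesIn", "bytesOut" columns and the imports from the snippet above; vectorPairs is just an illustrative name.

import org.apache.spark.mllib.linalg.{Vector, Vectors}

// Key by name, pack the remaining numeric columns into a dense MLlib Vector
val vectorPairs: RDD[(String, Vector)] =
  schemaRDD.select('name, 'bytesIn, 'bytesOut).map {
    case Row(name: String, bytesIn: Long, bytesOut: Long) =>
      name -> Vectors.dense(bytesIn.toDouble, bytesOut.toDouble)
  }

vectorPairs.groupByKey // PairRDDFunctions are available here as well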