How to map a SchemaRDD to a PairRDD

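I'm trying to figure out how to map a SchemaRDD object retrieved from a sql HiveContext to a PairRDDFunctions[String, Vector] object, where the string value is the name column in the schemaRDD and the remaining columns (BytesIn, BytesOut, etc.) make up the vector.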

Assuming you have the following columns: "name", "bytesIn", "bytesOut"

val schemaRDD: SchemaRDD = ...
val pairs: RDD[(String, (Long, Long))] =
  schemaRDD.select("name", "bytesIn", "bytesOut").rdd.map {
     // Typed patterns, so the tuple holds String/Long rather than Any
     case Row(name: String, bytesIn: Long, bytesOut: Long) =>
       name -> (bytesIn, bytesOut)
  }

// Brings the implicit conversion to PairRDDFunctions into scope
import org.apache.spark.SparkContext._

pairs.groupByKey ... etc
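
The question asks for a Vector rather than a tuple. One way that might look is sketched below, assuming the Vector meant is MLlib's org.apache.spark.mllib.linalg.Vector, that column 0 is the name, and that every remaining column is a Long. The object SchemaRddPairs and both of its methods are hypothetical names made up for this example, not part of Spark.

```scala
import org.apache.spark.mllib.linalg.{Vector, Vectors}
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.Row

// Hypothetical helper object, not part of Spark.
object SchemaRddPairs {

  // Converts one row into (name, Vector).
  // Assumption: column 0 is a String and all other columns are Longs.
  def rowToPair(row: Row): (String, Vector) = {
    val values = (1 until row.length).map(i => row.getLong(i).toDouble).toArray
    row.getString(0) -> Vectors.dense(values)
  }

  // Applies the conversion to every row of the RDD.
  def toNameVectorPairs(rows: RDD[Row]): RDD[(String, Vector)] =
    rows.map(rowToPair)
}
```

With the schemaRDD from above, this could be called as SchemaRddPairs.toNameVectorPairs(schemaRDD.select("name", "bytesIn", "bytesOut").rdd). If any of the value columns is not a Long, getLong will fail at runtime, so the accessor would need to be adjusted to the actual column types.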