Mapping RDD to case(Schema) in Spark with Scala
I am new to Scala and Spark, and I have a small problem. I have an RDD with the following schema:
RDD[((String, String), (Int, Timestamp, String, Int))]
I need to map this RDD into the following shape:
RDD[(Int, String, String, String, Timestamp, Int)]
I wrote the following code for this:
map { case ((pid, name), (id, date, code, level)) => (id, name, code, pid, date, level) }
This works fine. Now I have another RDD:
RDD[((String, String), List[(Int, Timestamp, String, Int)])]
I want to transform it into the same shape as above:
RDD[(Int, String, String, String, Timestamp, Int)]
How can I do this? I tried the following code, but it does not work:
map {
  case ((pid, name), List(id, date, code, level)) => (id, name, code, pid, date, level)
}
How can I achieve this?
Is this what you are looking for?
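import java.sql.Timestamp // assumption: the question does not say which Timestamp type is used
import org.apache.spark.rdd.RDD

val input: RDD[((String, String), List[(Int, Timestamp, String, Int)])] = ...
val output: RDD[(Int, String, String, String, Timestamp, Int)] = input.flatMap { case ((pid, name), list) =>
  // emit one flat tuple per element of the inner list
  list.map { case (id, date, code, level) =>
    (id, name, code, pid, date, level)
  }
}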
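To try the idea without a cluster, here is a minimal sketch in which a plain Scala `List` stands in for the RDD. The object name `FlattenSketch`, the sample rows, and the use of `java.sql.Timestamp` are assumptions for illustration and are not taken from the question.

```scala
import java.sql.Timestamp

object FlattenSketch {
  type Row = (Int, Timestamp, String, Int)

  // Hypothetical sample data; a plain List stands in for the RDD.
  val ts = new Timestamp(0L)
  val input: List[((String, String), List[Row])] = List(
    (("p1", "alice"), List((1, ts, "A", 10), (2, ts, "B", 20))),
    (("p2", "bob"), List((3, ts, "C", 30)))
  )

  // One output tuple per element of each inner list.
  val output: List[(Int, String, String, String, Timestamp, Int)] =
    input.flatMap { case ((pid, name), list) =>
      list.map { case (id, date, code, level) =>
        (id, name, code, pid, date, level)
      }
    }

  def main(args: Array[String]): Unit =
    output.foreach(println)
}
```

The two input records produce three output tuples, because `flatMap` concatenates the per-record lists. `RDD.flatMap` accepts a function returning a collection in the same way, so the body should carry over unchanged.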
Or with a for-comprehension:
val output: RDD[(Int, String, String, String, Timestamp, Int)] = for {
  ((pid, name), list) <- input
  (id, date, code, level) <- list
} yield (id, name, code, pid, date, level)
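As for why the attempted pattern did not work: `List(id, date, code, level)` matches a list of exactly four elements and binds each name to a whole element (here, a whole tuple). It does not destructure a single tuple. The sketch below shows that behaviour on plain collections; `ListPatternSketch`, `describe`, and the simplified two-field `Row` are hypothetical names used only for illustration.

```scala
object ListPatternSketch {
  type Row = (Int, String)

  // Describes what the pattern List(a, b) actually matches.
  def describe(rows: List[Row]): String = rows match {
    case List(a, b) => s"two elements: $a and $b" // a and b are whole tuples
    case other      => s"${other.size} element(s), no match for List(a, b)"
  }

  def main(args: Array[String]): Unit = {
    println(describe(List((1, "A"), (2, "B")))) // two elements: (1,A) and (2,B)
    println(describe(List((1, "A"))))           // 1 element(s), no match for List(a, b)
  }
}
```

In the question's `map`, there is no fallback case, so any inner list whose length is not four would raise a `MatchError` at runtime.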
Try:
map {
  case ((id, name), list) => (id, name, list.flatten)
}