Scala - merge a list into a map
I need to merge a list from an RDD
into a set, but I can't work out how to do it in Scala:
var accounts = set("name" -> "", "id" -> 0, ....)
//Split the RDD into lines and split each line by `|` to get the values
stream.foreachRDD {_.map(_._2).flatMap(_.split("|")).foreach(f => /*merge here ?*/)}
How can I associate these values with my accounts set?
For example, suppose an RDD is loaded from a CSV (I made this data up):
Donald|Trump|US|Election|March|Spring|Rubio|Ted Cruz|Ben Carson|Primary|Winner|...
Donald|Trump|US|Election|March|Spring|Rubio|Ted Cruz|Ben Carson|Primary|Winner|...
Donald|Trump|US|Election|March|Spring|Rubio|Ted Cruz|Ben Carson|Primary|Winner|...
Donald|Trump|US|Election|March|Spring|Rubio|Ted Cruz|Ben Carson|Primary|Winner|...
Donald|Trump|US|Election|March|Spring|Rubio|Ted Cruz|Ben Carson|Primary|Winner|...
Donald|Trump|US|Election|March|Spring|Rubio|Ted Cruz|Ben Carson|Primary|Winner|...
Donald|Trump|US|Election|March|Spring|Rubio|Ted Cruz|Ben Carson|Primary|Winner|...
...
The RDD has up to 300 columns/fields.
My main objective is to convert it to some JSON, but I need to associate each value with a key by loading each value into a map or a class:
var election = Map ("firstname" -> "Donald",
"lastname" -> "Trump",
"country" -> "US",
"event" -> "Election",
"period" -> "March",
"var1" -> "Spring",
....
"varN" -> "...")
I'm not sure I understood correctly, but does this help?
val data = List(
"Donald|Trump|US|Election|March",
"John|Smith|UK|Election|February"
)
val mapKeys = List("firstname", "lastname", "country", "event", "period")
val election = data.map { row =>
(mapKeys zip row.split("\\|").toList).map { // split takes a regex, so the pipe must be escaped
case (key, value) => key -> value
}.toMap
}
So you get a list of maps: for each row of your data, you get one map containing the key/value pairs, as you described.
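Two details of this approach are easy to trip over, so here is a minimal sketch of the per-row conversion on its own. `split` takes a regular expression, so the pipe has to be escaped, and by default it drops trailing empty fields; passing a limit of `-1` keeps them. `zip` also stops at the shorter of the two collections, so extra values or extra keys are silently dropped. The `RowToMap` object and `toRecord` name are made up for illustration; the key list is the one from the answer.

```scala
object RowToMap {
  // Key names from the answer above; the real data has up to 300 columns.
  val mapKeys: List[String] =
    List("firstname", "lastname", "country", "event", "period")

  // Split on a literal '|' (escaped, because split takes a regex).
  // The -1 limit keeps trailing empty fields instead of discarding them.
  // zip truncates to the shorter side, so surplus keys or values are dropped.
  def toRecord(row: String): Map[String, String] =
    mapKeys.zip(row.split("\\|", -1).toList).toMap
}
```

With this, `RowToMap.toRecord("Donald|Trump|US||")` still has a `period` key (with an empty value), whereas a row with only two fields yields a map with only two entries.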
A bit of cleanup on @slouc's answer:
stream.foreachRDD {_.map(_._2).map(l => (mapKeys zip l.split("\\|")).toMap).saveToEs(conf)}
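Since the question mentions up to 300 columns with keys like `var1` ... `varN` after the named ones, the key list does not have to be typed out by hand. A rough sketch, assuming the first five columns are the named ones and every later column simply gets a generated `varN` key (the `ElectionKeys` object and its method names are hypothetical):

```scala
object ElectionKeys {
  // Named columns taken from the question; everything after them is varN.
  val namedKeys: List[String] =
    List("firstname", "lastname", "country", "event", "period")

  // Build the key list for a row with `columnCount` fields:
  // the named keys first, then var1, var2, ... for the remaining columns.
  def keysFor(columnCount: Int): List[String] =
    namedKeys.take(columnCount) ++
      (1 to (columnCount - namedKeys.size)).map(i => s"var$i")

  // Split one '|'-delimited row (keeping empty fields) and pair it with its keys.
  def toRecord(row: String): Map[String, String] = {
    val values = row.split("\\|", -1).toList
    keysFor(values.size).zip(values).toMap
  }
}
```

Inside the stream this would replace the `mapKeys zip ...` expression, e.g. `.map(ElectionKeys.toRecord)`, assuming the object is serializable or otherwise available on the executors.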