Converting a dataframe into a hashmap where Key is int and Value is a list in Scala
I have a dataframe that looks like this:

key | words
---|---
1 | ['a','test']
2 | ['hi', 'there']
I want to create the following hashmap:

Map(1 -> ['a', 'test'], 2 -> ['hi', 'there'])

But I don't know how to do this. Can anyone help me?
Thanks!
There must be dozens of ways to do this. One is:
import scala.collection.mutable

df.collect().map { row => row.getAs[Int](0) -> row.getAs[mutable.WrappedArray[String]](1) }.toMap
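If the frame is small enough to collect, the same idea can also be expressed through the RDD API; a minimal sketch, assuming the columns are named key and words as in the question:

import scala.collection.Map

// Turn each Row into a (key, words) pair and build the driver-side Map directly.
val asMap: Map[Int, Seq[String]] =
  df.rdd
    .map(row => row.getAs[Int]("key") -> row.getAs[Seq[String]]("words"))
    .collectAsMap()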
This is very similar to the solution in another answer. The following should give you the output you want: it collects all of the maps into a single collection and then uses a UDF to merge them into one map. The usual warning about the potentially poor performance of UDFs applies.
import org.apache.spark.sql.functions.{col, map, collect_list, lit}
import org.apache.spark.sql.functions.udf
import spark.implicits._ // for .toDF; assumes a SparkSession named spark, as in spark-shell

// UDF that merges a collection of single-entry maps into one map.
val joinMap = udf { values: Seq[Map[Int, Seq[String]]] =>
  values.flatten.toMap
}

val df = Seq((1, Seq("a", "test")), (2, Seq("hi", "there"))).toDF("key", "words")

val rDf = df
  .select(lit(1) as "id", map(col("key"), col("words")) as "kwMap") // one single-entry map per row
  .groupBy("id")                                                    // a single group holding every row
  .agg(collect_list(col("kwMap")) as "kwMaps")                      // gather the per-row maps into an array
  .select(joinMap(col("kwMaps")) as "map")                          // merge them with the UDF

rDf.show
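If you need the result as a plain Scala Map on the driver rather than a one-row DataFrame, a minimal sketch, assuming rDf from the snippet above:

// The aggregated DataFrame has a single row; pull its map column back as a Scala Map.
val result: Map[Int, Seq[String]] =
  rDf.collect().head.getAs[Map[Int, Seq[String]]]("map")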