Spark - Transpose DataFrame Columns

Question

I have a table that looks like this:

ID     Type      5m      10m     15m
1      A         3       9       13
1      B         7       8       22
1      C         5       11      13
2      A         1       3       20
2      B         16      17      30
...

If possible, I would like to create new columns in the following format:

ID     A_5m     A_10m     A_15m    B_5m    B_10m     B_15m     C_5m     C_10m     C_15m

I am currently referring to the following SO question:

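It's good for creating the new columns A and B, but I am lost when it comes to combining the type with the distance (e.g. A_5m).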

Any ideas?

Answer 1

"It's good for creating the new columns: A B, but I am lost when it comes to creating the types plus the distance."

It is no different. You can apply multiple aggregations in a single pivot:

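import org.apache.spark.sql.functions.first
import spark.implicits._  // for toDF; `spark` is the active SparkSession (predefined in spark-shell)

val df = Seq(
   (1, "A", 3, 9, 13), (1, "B", 7, 8, 22), (1, "C", 5, 11, 13),
   (2, "A", 1, 3, 20), (2, "B", 16, 17, 30)
).toDF("id", "type", "5m", "10m", "15m")

// One aggregation per distance column; each alias becomes the suffix of the pivoted column name
df.groupBy("id").pivot("type").agg(
  first("5m") as "5m", first("10m") as "10m", first("15m") as "15m"
).show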
+---+----+-----+-----+----+-----+-----+----+-----+-----+ 
| id|A_5m|A_10m|A_15m|B_5m|B_10m|B_15m|C_5m|C_10m|C_15m|
+---+----+-----+-----+----+-----+-----+----+-----+-----+
|  1|   3|    9|   13|   7|    8|   22|   5|   11|   13|
|  2|   1|    3|   20|  16|   17|   30|null| null| null|
+---+----+-----+-----+----+-----+-----+----+-----+-----+

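Spark generates the names automatically by combining the pivot level (A, B, C) with the base name, i.e. the alias of each aggregation (5m, 10m, 15m), joined by an underscore.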
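If there are many distance columns, the aggregations can also be built programmatically instead of being typed out. The sketch below assumes a standalone Scala script (or a spark-shell paste) with Spark on the classpath; `valueCols`, `aggs`, `pivotValues` and `wide` are hypothetical names, and the list of pivot values is assumed to be known up front. Passing it to `pivot` lets Spark skip the extra job it otherwise runs to collect the distinct values of `type`. Note that with a single aggregation Spark names the pivoted columns by the value alone (A, B, C), which is why the approach from the referenced question only produced A and B; with two or more aggregations it appends `_<alias>`.

```scala
import org.apache.spark.sql.{Column, SparkSession}
import org.apache.spark.sql.functions.first

// Local session for the sketch; in spark-shell `spark` already exists
val spark = SparkSession.builder().master("local[*]").appName("pivot-sketch").getOrCreate()
import spark.implicits._

val df = Seq(
  (1, "A", 3, 9, 13), (1, "B", 7, 8, 22), (1, "C", 5, 11, 13),
  (2, "A", 1, 3, 20), (2, "B", 16, 17, 30)
).toDF("id", "type", "5m", "10m", "15m")

// Assumption: every column other than the key and the pivot column is a distance value
val valueCols = df.columns.filterNot(Set("id", "type"))

// One first(...) per distance column, aliased to its own name,
// so the pivoted outputs are named <type>_<alias>, e.g. A_5m
val aggs: Seq[Column] = valueCols.toSeq.map(c => first(c).as(c))

// Assumption: the pivot values are known in advance
val pivotValues = Seq("A", "B", "C")

val wide = df.groupBy("id")
  .pivot("type", pivotValues)
  .agg(aggs.head, aggs.tail: _*)
  .orderBy("id")

wide.show()
```

The result has the same layout as the table above, with `null` in the `C_*` columns for id 2 because that id has no type C row.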

transpose

scala

dataframe

apache-spark