Spark - Transpose DataFrame Columns
I have a table that looks like this:
ID Type 5m 10m 15m
1 A 3 9 13
1 B 7 8 22
1 C 5 11 13
2 A 1 3 20
2 B 16 17 30
...
If possible, I would like to create new columns in the following format:
ID A_5m A_10m A_15m B_5m B_10m B_15m C_5m C_10m C_15m
I am currently working from an existing SO answer. It's good for creating the new columns A and B, but I am lost when it comes to creating columns that combine the type with the distance.
Any ideas?
It is no different. You can apply multiple aggregations in a single pivot:
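import org.apache.spark.sql.functions.first
import spark.implicits._  // needed for toDF; already in scope in spark-shell

val df = Seq(
  (1, "A", 3, 9, 13), (1, "B", 7, 8, 22), (1, "C", 5, 11, 13),
  (2, "A", 1, 3, 20), (2, "B", 16, 17, 30)
).toDF("id", "type", "5m", "10m", "15m")

// One row per id, one group of columns per distinct type.
df.groupBy("id").pivot("type").agg(
  first("5m") as "5m", first("10m") as "10m", first("15m") as "15m"
).show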
+---+----+-----+-----+----+-----+-----+----+-----+-----+
| id|A_5m|A_10m|A_15m|B_5m|B_10m|B_15m|C_5m|C_10m|C_15m|
+---+----+-----+-----+----+-----+-----+----+-----+-----+
|  1|   3|    9|   13|   7|    8|   22|   5|   11|   13|
|  2|   1|    3|   20|  16|   17|   30|null| null| null|
+---+----+-----+-----+----+-----+-----+----+-----+-----+
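Spark generates the column names automatically from the pivot level and the aggregation alias, joined with an underscore (for example, A_5m). A combination with no matching rows, such as type C for id 2, comes out as null.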
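If the table has more distance columns than you want to type out, the aggregation list can be built from the column names instead. The following is only a sketch under a few assumptions: it treats every column other than id and type as a value column, it passes the pivot values A, B and C explicitly (which fixes the column order and spares Spark the extra pass it otherwise makes to find the distinct types), and the object name PivotSketch and the local SparkSession are there purely for illustration.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.first

object PivotSketch {
  def main(args: Array[String]): Unit = {
    // Local session for illustration only; in spark-shell `spark` already exists.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("pivot-sketch")
      .getOrCreate()
    import spark.implicits._

    val df = Seq(
      (1, "A", 3, 9, 13), (1, "B", 7, 8, 22), (1, "C", 5, 11, 13),
      (2, "A", 1, 3, 20), (2, "B", 16, 17, 30)
    ).toDF("id", "type", "5m", "10m", "15m")

    // Assumption: everything except the grouping key and the pivot column
    // is a value column that should be spread out per type.
    val valueCols = df.columns.filterNot(Set("id", "type"))

    // One first(...) aggregation per value column, aliased with the column's
    // own name so the pivoted columns are named <type>_<column>.
    val aggs = valueCols.map(c => first(c).as(c))

    val wide = df.groupBy("id")
      .pivot("type", Seq("A", "B", "C")) // explicit pivot values (assumed known up front)
      .agg(aggs.head, aggs.tail: _*)

    wide.show()
    spark.stop()
  }
}
```

first is a reasonable choice here only because each (id, type) pair appears once in the sample data; if a pair can repeat, an aggregation such as sum or max is probably what you want instead.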