Pivoting a DataFrame column on UserID in Spark

I have a DataFrame that looks like this:
+------+------------+------------------+
|UserID|Attribute   | Value            |
+------+------------+------------------+
|123   |  City      | San Francisco    |
|123   |  Lang      | English          |
|111   |  Lang      | French           |
|111   |  Age       | 23               |
|111   |  Gender    | Female           |
+------+------------+------------------+
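To reproduce the question locally, the input DataFrame above can be built from a Seq of tuples. This is a minimal sketch that assumes Spark 2.x or later is on the classpath and that a local SparkSession is acceptable. The object `PivotInput` and the method `buildInput` are hypothetical names used only for illustration.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object PivotInput {
  // Builds the long-format input: one row per (UserID, Attribute) pair.
  // All values are kept as strings, as in the question's table.
  def buildInput(spark: SparkSession): DataFrame = {
    import spark.implicits._
    Seq(
      (123, "City", "San Francisco"),
      (123, "Lang", "English"),
      (111, "Lang", "French"),
      (111, "Age", "23"),
      (111, "Gender", "Female")
    ).toDF("UserID", "Attribute", "Value")
  }

  def main(args: Array[String]): Unit = {
    // Local session for experimentation (assumption: not a cluster deployment)
    val spark = SparkSession.builder()
      .appName("pivot-input")
      .master("local[*]")
      .getOrCreate()
    buildInput(spark).show()
    spark.stop()
  }
}
```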

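So I have a number of different attributes, some of which can be null for certain users (the set of attributes is limited, say at most 20).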

I want to transform this DataFrame into:

+-----+--------------+---------+-----+--------+
|User |City          | Lang    | Age | Gender |
+-----+--------------+---------+-----+--------+
|123  |San Francisco | English | NULL| NULL   |
|111  |          NULL| French  | 23  | Female |
+-----+--------------+---------+-----+--------+

I'm quite new to Spark and Scala.

You can get the desired output using pivot:

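import org.apache.spark.sql.functions._
import sparkSession.sqlContext.implicits._

// One row per UserID; each distinct Attribute value becomes its own column,
// filled with the first Value found for that (UserID, Attribute) pair.
df.groupBy("UserID")
  .pivot("Attribute")
  .agg(first("Value"))
  .show()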

This gives you the desired output:

+------+----+-------------+------+-------+
|UserID| Age|         City|Gender|   Lang|
+------+----+-------------+------+-------+
|   111|  23|         null|Female| French|
|   123|null|San Francisco|  null|English|
+------+----+-------------+------+-------+
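Two details differ from the table asked for in the question. Without a value list, pivot orders the new columns by the sorted distinct attribute values (Age, City, Gender, Lang), and the key column keeps the name UserID. Because the set of attributes is small and known up front (at most 20, per the question), you can pass the values explicitly with `pivot(pivotColumn, values)`. This fixes the column order, and it also spares Spark the extra job it otherwise runs to collect the distinct values. Attributes missing from the list are silently dropped. The sketch below assumes Spark 2.x or later. The object `PivotUsers`, the method `pivotUsers` and the `attributes` list are hypothetical names, and the rename to `User` just mirrors the question's desired header.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.first

object PivotUsers {
  // Hypothetical list of known attributes, in the desired column order.
  val attributes: Seq[String] = Seq("City", "Lang", "Age", "Gender")

  def pivotUsers(df: DataFrame): DataFrame =
    df.groupBy("UserID")
      // Explicit values: fixed column order, no extra distinct-values job
      .pivot("Attribute", attributes)
      // Assumes at most one Value per (UserID, Attribute);
      // with duplicates, first() returns an arbitrary one
      .agg(first("Value"))
      .withColumnRenamed("UserID", "User")

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("pivot-users")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    val df = Seq(
      (123, "City", "San Francisco"),
      (123, "Lang", "English"),
      (111, "Lang", "French"),
      (111, "Age", "23"),
      (111, "Gender", "Female")
    ).toDF("UserID", "Attribute", "Value")

    pivotUsers(df).show()
    spark.stop()
  }
}
```

Missing (UserID, Attribute) combinations still come out as null, just as in the plain pivot above. Only the column order and the key name change.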