Create dataframe with new columns derived from unique values in a single column

Question

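I have a dataframe in the following format:

I want to convert it to:

In other words, create one column for each distinct value in the fieldname column and fill it with the corresponding value from fieldvalue.

How would I do this in pyspark?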

Answer 1

This is a rows-to-columns problem, so you should use pivot.

from pyspark.sql import functions as F
df = df.groupBy('id').pivot('fieldname').agg(F.first('fieldvalue'))
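
To make it clearer what that one-liner computes, here is a rough plain-Python sketch of the same reshaping, with no Spark involved. The sample rows and the helper name pivot_first are made up for illustration (the tables from the question are not shown here), and it assumes the three columns are id, fieldname and fieldvalue as in the answer.

```python
# Plain-Python sketch of groupBy('id').pivot('fieldname').agg(first('fieldvalue')).
# The rows below are hypothetical sample data, not taken from the question.
rows = [
    (1, "name", "Alice"),
    (1, "city", "Paris"),
    (2, "name", "Bob"),
    (2, "city", "Oslo"),
    (2, "age", "41"),
]


def pivot_first(rows):
    """Return (columns, table), where table maps id -> {fieldname: fieldvalue}."""
    # One output column per distinct fieldname, sorted here for a stable order.
    columns = sorted({fieldname for _, fieldname, _ in rows})

    firsts = {}
    for row_id, fieldname, fieldvalue in rows:
        # setdefault keeps the first value seen for each (id, fieldname) pair.
        firsts.setdefault(row_id, {}).setdefault(fieldname, fieldvalue)

    # An id with no row for some fieldname gets None in that column.
    table = {
        row_id: {col: cells.get(col) for col in columns}
        for row_id, cells in firsts.items()
    }
    return columns, table


columns, table = pivot_first(rows)
print(columns)
for row_id, cells in table.items():
    print(row_id, cells)
```

If an id has more than one row for the same fieldname, only one value is kept. In Spark, which one first returns depends on row order, so it is safest when each (id, fieldname) pair occurs at most once.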
