Combine columns into list of key, value pairs (no UDF)

I'd like to create a new column that is a JSON representation of some other columns: a list of key-value pairs.

Source:

origin destination count
toronto ottawa 5
montreal vancouver 10

What I want:

origin destination count json
toronto ottawa 5 [{"origin":"toronto"},{"destination":"ottawa"},{"count":"5"}]
montreal vancouver 10 [{"origin":"montreal"},{"destination":"vancouver"},{"count":"10"}]

(Everything can be a string; it doesn't matter.)

I tried something like:

from pyspark.sql.functions import col, struct, to_json
df.withColumn('json', to_json(struct(col('origin'), col('destination'), col('count'))))

But that creates a column with all the key:value pairs in a single object:

{"origin":"United States","destination":"Romania"}

Is this possible without a UDF?

One way to solve this:

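import pyspark.sql.functions as F

# Serialize each column as its own one-key JSON object, collect the
# resulting strings in an array, then cast the array to a single string.
df2 = df.withColumn(
    'json',
    F.array(
        F.to_json(F.struct('origin')),
        F.to_json(F.struct('destination')),
        F.to_json(F.struct('count'))
    ).cast('string')
)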

df2.show(truncate=False)
+--------+-----------+-----+--------------------------------------------------------------------+
|origin  |destination|count|json                                                                |
+--------+-----------+-----+--------------------------------------------------------------------+
|toronto |ottawa     |5    |[{"origin":"toronto"}, {"destination":"ottawa"}, {"count":"5"}]     |
|montreal|vancouver  |10   |[{"origin":"montreal"}, {"destination":"vancouver"}, {"count":"10"}]|
+--------+-----------+-----+--------------------------------------------------------------------+
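The json column above is a plain string assembled from three separate JSON strings, which is why its items are separated by a comma and a space. As a rough illustration only, the following plain-Python sketch (no Spark involved) rebuilds that string for one row and checks that it still parses as JSON. The `row` dictionary is a hypothetical stand-in for the first row of the sample data, and the ", " separator is taken from the output shown above rather than from any Spark documentation.

```python
import json

# Hypothetical stand-in for the first row of the sample data (all strings).
row = {"origin": "toronto", "destination": "ottawa", "count": "5"}

# One compact JSON object per column, mirroring to_json(struct(<column>)).
parts = [
    json.dumps({name: value}, separators=(",", ":"))
    for name, value in row.items()
]

# Join the pieces inside brackets with ", ", as in the output shown above.
json_text = "[" + ", ".join(parts) + "]"
print(json_text)

# The joined text still parses as a JSON list of single-key objects.
parsed = json.loads(json_text)
print(parsed)
```

Because the pieces are joined as text, the result is only valid JSON as long as every element is itself a valid JSON string, which holds here since each one comes from a JSON serializer.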

Another way is to create an array of map columns before calling to_json:

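from pyspark.sql import functions as F

# Build one single-entry map (column name -> value) per column,
# then serialize the whole array of maps with a single to_json call.
df1 = df.withColumn(
    'json',
    F.to_json(F.array(*[F.create_map(F.lit(c), F.col(c)) for c in df.columns]))
)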

df1.show(truncate=False)

#+--------+-----------+-----+------------------------------------------------------------------+
#|origin  |destination|count|json                                                              |
#+--------+-----------+-----+------------------------------------------------------------------+
#|toronto |ottawa     |5    |[{"origin":"toronto"},{"destination":"ottawa"},{"count":"5"}]     |
#|montreal|vancouver  |10   |[{"origin":"montreal"},{"destination":"vancouver"},{"count":"10"}]|
#+--------+-----------+-----+------------------------------------------------------------------+
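The difference between the struct attempt in the question and the per-column map approach comes down to the shape of the value being serialized: one object holding every key, versus a list of one-key objects. The plain-Python sketch below (not Spark code) shows both shapes for a single row. The `row` dictionary is a hypothetical stand-in for the second sample row, and values are converted with `str` only because the question says everything may be a string.

```python
import json

# Hypothetical stand-in for the second row of the sample data.
row = {"origin": "montreal", "destination": "vancouver", "count": 10}

# All columns in one container: a single JSON object with every key,
# which is the shape the struct-based attempt produced.
single_object = json.dumps(
    {name: str(value) for name, value in row.items()},
    separators=(",", ":"),
)

# One single-entry mapping per column: a JSON list of one-key objects,
# which is the shape asked for in the question.
pair_list = json.dumps(
    [{name: str(value)} for name, value in row.items()],
    separators=(",", ":"),
)

print(single_object)
print(pair_list)
```

Serializing the list in one call yields compact separators with no space after the commas, matching the second output table rather than the first.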