How to separate array items into separate columns in Spark?

+---------------------------+
|address                    |
+---------------------------+
|[San Jone, 19422, CA, 126] |
|[Queens, 11372, NY, 5543]  |
+---------------------------+

How can I split one column into four columns when its values are stored in an array?

Expected output:

+--------+-----+-----+------+
|city    |zip  |state|street|
+--------+-----+-----+------+
|San Jone|19422|CA   |126   |
|Queens  |11372|NY   |5543  |
+--------+-----+-----+------+

Edit:

[
    {
        "firstName": "Rack",
        "lastName": "Jackon",
        "gender": "man",
        "age": 24,
        "address": {
            "streetAddress": "126",
            "city": "San Jone",
            "state": "CA",
            "postalCode": "394221"
        }
    },
    {
        "firstName": "Apache",
        "lastName": "Spark",
        "gender": "Woman",
        "age": 24,
        "address": {
            "streetAddress": "5543",
            "city": "Queens",
            "state": "NY",
            "postalCode": "11372"
        }
    }
]

This is my .json file. After creating the DataFrame, I need to split the address into 4 columns.

Try the code below. It reads each array element by its position and gives it a column name.

scala> df.show(false)
+--------------------------+
|address                   |
+--------------------------+
|[San Jone, 19422, CA, 126]|
|[Queens, 11372, NY, 5543] |
+--------------------------+

scala> import org.apache.spark.sql.functions.col
scala> // Pair each target column name with its position in the array
scala> val columns = Seq("city", "zip", "state", "street").zipWithIndex
scala> // col("address")(idx) extracts the element at idx; as(name) renames it
scala> df.select(columns.map { case (name, idx) => col("address")(idx).as(name) }: _*).show(false)
+--------+-----+-----+------+
|city    |zip  |state|street|
+--------+-----+-----+------+
|San Jone|19422|CA   |126   |
|Queens  |11372|NY   |5543  |
+--------+-----+-----+------+
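
Note that the positional approach above assumes address is an array column. If the DataFrame is built from the JSON in the question, Spark will most likely infer address as a struct instead, and in that case the fields are selected by name rather than by index. The following is only a minimal sketch under that assumption: the object name FlattenAddress, the helper flatten, and the mapping of output names to struct fields are made up for illustration, and the sample records are inlined rather than read from the original file.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col

// Hypothetical helper object, not part of the original answer.
object FlattenAddress {
  // Output column name -> field name inside the address struct (assumed mapping).
  val fieldMapping: Seq[(String, String)] = Seq(
    "city"   -> "city",
    "zip"    -> "postalCode",
    "state"  -> "state",
    "street" -> "streetAddress"
  )

  // Selects each nested field by its dotted path and renames it.
  def flatten(df: DataFrame): DataFrame =
    df.select(fieldMapping.map { case (out, field) => col(s"address.$field").as(out) }: _*)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("flatten-address")
      .getOrCreate()
    import spark.implicits._

    // Records shaped like the JSON in the question, one object per line.
    // For the real file (a multi-line JSON array) something like
    // spark.read.option("multiLine", true).json("people.json") would be needed;
    // the path "people.json" is a placeholder.
    val records = Seq(
      """{"firstName":"Rack","address":{"streetAddress":"126","city":"San Jone","state":"CA","postalCode":"394221"}}""",
      """{"firstName":"Apache","address":{"streetAddress":"5543","city":"Queens","state":"NY","postalCode":"11372"}}"""
    ).toDS()

    val people = spark.read.json(records)
    flatten(people).show(false)

    spark.stop()
  }
}
```

If the original field names are acceptable as column names, df.select("address.*") is a shorter way to expand the struct, without any renaming.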