How can I parse a JSON file in Scala Spark 2.0 and insert the data into a Hive table?
I want to parse a JSON file in Spark 2.0 (Scala) and then save the data into a Hive table. How can I parse the JSON file using Scala?
Sample JSON file (metadata.json):
{
"syslog": {
"month": "Sep",
"day": "26",
"time": "23:03:44",
"host": "cdpcapital.onmicrosoft.com"
},
"prefix": {
"cef_version": "CEF:0",
"device_vendor": "Microsoft",
"device_product": "SharePoint Online"
},
"extensions": {
"eventId": "7808891",
"msg": "ManagedSyncClientAllowed",
"art": "1506467022378",
"cat": "SharePoint",
"act": "ManagedSyncClientAllowed",
"rt": "1506466717000",
"requestClientApplication": "Microsoft SkyDriveSync",
"cs1": "0bdbe027-8f50-4ec3-843f-e27c41a63957",
"cs1Label": "Organization ID",
"cs2Label": "Modified Properties",
"ahost": "cdpdiclog101.cgimss.com",
"agentZoneURI": "/All Zones",
"amac": "F0-1F-AF-DA-8F-1B",
"av": "7.6.0.8009.0"
}
}
Thanks.
You can use something like this:
val jsonDf = sparkSession
  .read
  // .option("multiLine", true) // if it's not single-line JSON (option available from Spark 2.2)
  .json("resources/json/metadata.json")
jsonDf.printSchema()
// registerTempTable is deprecated since Spark 2.0; use createOrReplaceTempView instead
jsonDf.createOrReplaceTempView("metadata")
More details on this are in the Spark SQL programming guide: https://spark.apache.org/docs/latest/sql-programming-guide.html#hive-tables
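To go one step further and actually persist the parsed DataFrame into a Hive table, here is a minimal sketch. It assumes the SparkSession is built with Hive support enabled (which requires a Hive-enabled Spark distribution and `hive-site.xml` on the classpath); the app name and the table name `default.metadata_events` are illustrative, not from the question:

```scala
import org.apache.spark.sql.{SaveMode, SparkSession}

// Build a SparkSession with Hive support so saveAsTable writes
// through the Hive metastore rather than the default catalog
val spark = SparkSession.builder()
  .appName("JsonToHive") // illustrative app name
  .enableHiveSupport()   // needs hive-site.xml / a Hive-enabled Spark build
  .getOrCreate()

// Parse the JSON file; Spark infers the nested schema automatically
val jsonDf = spark.read.json("resources/json/metadata.json")

// Persist as a managed Hive table; "metadata_events" is a hypothetical name
jsonDf.write
  .mode(SaveMode.Overwrite)
  .saveAsTable("default.metadata_events")
```

Nested struct columns (`syslog`, `prefix`, `extensions`) are kept as-is by `saveAsTable`; if you need flat columns you can select them first, e.g. `jsonDf.select("syslog.host", "extensions.eventId")`.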