How can I parse a JSON file in Scala Spark 2.0 and insert the data into a Hive table?
I want to parse a JSON file in Spark 2.0 (Scala) and then save the data into a Hive table. How can I parse the JSON file using Scala?
Sample JSON file (metadata.json):
{
"syslog": {
"month": "Sep",
"day": "26",
"time": "23:03:44",
"host": "cdpcapital.onmicrosoft.com"
},
"prefix": {
"cef_version": "CEF:0",
"device_vendor": "Microsoft",
"device_product": "SharePoint Online"
},
"extensions": {
"eventId": "7808891",
"msg": "ManagedSyncClientAllowed",
"art": "1506467022378",
"cat": "SharePoint",
"act": "ManagedSyncClientAllowed",
"rt": "1506466717000",
"requestClientApplication": "Microsoft SkyDriveSync",
"cs1": "0bdbe027-8f50-4ec3-843f-e27c41a63957",
"cs1Label": "Organization ID",
"cs2Label": "Modified Properties",
"ahost": "cdpdiclog101.cgimss.com",
"agentZoneURI": "/All Zones",
"amac": "F0-1F-AF-DA-8F-1B",
"av": "7.6.0.8009.0"
}
}
Thanks.
You can use something like this:
val jsonDf = sparkSession
  .read
  // .option("multiLine", true) // if it's not single-line JSON (option available from Spark 2.2)
  .json("resources/json/metadata.json")
jsonDf.printSchema()
// registerTempTable is deprecated since Spark 2.0; use createOrReplaceTempView instead
jsonDf.createOrReplaceTempView("metadata")
More details on this are in the Spark SQL programming guide: https://spark.apache.org/docs/latest/sql-programming-guide.html#hive-tables
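To go one step further and actually persist the parsed DataFrame into a Hive table, here is a minimal sketch. It assumes the SparkSession is built with Hive support enabled (which requires a Hive-enabled Spark distribution and `hive-site.xml` on the classpath); the app name and the table name `default.metadata_events` are illustrative, not from the question:

```scala
import org.apache.spark.sql.{SaveMode, SparkSession}

// Build a SparkSession with Hive support so saveAsTable writes
// through the Hive metastore rather than the default catalog
val spark = SparkSession.builder()
  .appName("JsonToHive") // illustrative app name
  .enableHiveSupport()   // needs hive-site.xml / a Hive-enabled Spark build
  .getOrCreate()

// Parse the JSON file; Spark infers the nested schema automatically
val jsonDf = spark.read.json("resources/json/metadata.json")

// Persist as a managed Hive table; "metadata_events" is a hypothetical name
jsonDf.write
  .mode(SaveMode.Overwrite)
  .saveAsTable("default.metadata_events")
```

Nested struct columns (`syslog`, `prefix`, `extensions`) are kept as-is by `saveAsTable`; if you need flat columns you can select them first, e.g. `jsonDf.select("syslog.host", "extensions.eventId")`.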