Create a VIEW for a Hive table by defining a schema for a column that contains JSON
- I am storing the raw JSON strings from a Kafka stream to HDFS as Parquet
- I have created an external table in Hive over the HDFS folder
- Now I want to create a VIEW over the RAW data stored in the Hive table
Kafka stream to HDFS
import org.apache.spark.SparkContext;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SQLContext;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.streaming.Trigger;
import org.apache.spark.sql.types.DataTypes;

public class EventKafkaToParquet {

    public static void main(String[] args) throws Exception {
        String brokers = "quickstart:9092";
        String topics = "simple_topic_6";
        String master = "local[*]";

        SparkSession sparkSession = SparkSession
                .builder().appName(EventKafkaToParquet.class.getName())
                .master(master).getOrCreate();
        SQLContext sqlContext = sparkSession.sqlContext();
        SparkContext context = sparkSession.sparkContext();
        context.setLogLevel("ERROR");

        // Read the Kafka topic as a streaming source; the payload arrives in the binary "value" column
        Dataset<Row> rawDataSet = sparkSession.readStream()
                .format("kafka")
                .option("kafka.bootstrap.servers", brokers)
                .option("subscribe", topics).load();
        rawDataSet.printSchema();

        // Cast the raw Kafka value (bytes) to a string column named "employee"
        rawDataSet = rawDataSet.withColumn("employee", rawDataSet.col("value").cast(DataTypes.StringType));
        rawDataSet.createOrReplaceTempView("basicView");
        Dataset<Row> writeDataset = sqlContext.sql("select employee from basicView");

        // Write the JSON strings to HDFS as Parquet, triggering every 5 seconds
        writeDataset
                .repartition(1)
                .writeStream()
                .option("path", "/user/cloudera/employee/")
                .option("checkpointLocation", "/user/cloudera/employee.checkpoint/")
                .format("parquet")
                .trigger(Trigger.ProcessingTime(5000))
                .start()
                .awaitTermination();
    }
}
External table on Hive
CREATE EXTERNAL TABLE employee_raw ( employee STRING )
STORED AS PARQUET
LOCATION '/user/cloudera/employee';
Now I want to create a Hive VIEW on top of the employee_raw table whose output columns are
firstName, lastName, street, city, state, zip
The output of the employee_raw table is:
hive> select * from employee_raw;
OK
{"employee":{"firstName":"Ganesh","lastName":"Kumar","address":{"street":"1400 Dakota Dr","city":"Princeton","state":"NJ","zip":"09800"}}}
{"employee":{"firstName":"Ganesh","lastName":"Kumar","address":{"street":"1400 Dakota Dr","city":"Princeton","state":"NJ","zip":"09800"}}}
{"employee":{"firstName":"Ganesh","lastName":"Kumar","address":{"street":"1400 Dakota Dr","city":"Princeton","state":"NJ","zip":"09800"}}}
{"employee":{"firstName":"Ganesh","lastName":"Kumar","address":{"street":"1400 Dakota Dr","city":"Princeton","state":"NJ","zip":"09800"}}}
{"employee":{"firstName":"Ganesh","lastName":"Kumar","address":{"street":"1400 Dakota Dr","city":"Princeton","state":"NJ","zip":"09800"}}}
Time taken: 0.123 seconds, Fetched: 5 row(s)
Your input is appreciated.
From your description, what you are essentially looking for is "Extract values from JSON string in Hive", so you may find the answer in the linked thread.
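In short: since the employee column holds the whole JSON document as a plain string, the view can pull out each field with Hive's built-in get_json_object and a JSON path. A minimal sketch of what such a view could look like (the view name employee_view is just an example):

CREATE VIEW employee_view AS
SELECT
  get_json_object(employee, '$.employee.firstName')      AS firstName,
  get_json_object(employee, '$.employee.lastName')       AS lastName,
  get_json_object(employee, '$.employee.address.street') AS street,
  get_json_object(employee, '$.employee.address.city')   AS city,
  get_json_object(employee, '$.employee.address.state')  AS state,
  get_json_object(employee, '$.employee.address.zip')    AS zip
FROM employee_raw;

If you want to avoid parsing the same string once per column, json_tuple with LATERAL VIEW is the usual alternative, but it only extracts top-level keys, so get_json_object is the simplest way to reach the nested address fields here.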