Create a VIEW for a Hive table by defining a schema for a column that contains JSON

  1. I am storing raw JSON strings from a Kafka stream to HDFS as Parquet.
  2. I have created an external table in Hive over the HDFS folder.
  3. Now I want to create a VIEW over the raw data stored in the Hive table.

Kafka stream to HDFS

import org.apache.spark.SparkContext;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SQLContext;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.streaming.Trigger;
import org.apache.spark.sql.types.DataTypes;

public class EventKafkaToParquet {

    public static void main(String[] args) throws Exception {

        String brokers = "quickstart:9092";
        String topics = "simple_topic_6";
        String master = "local[*]";

        SparkSession sparkSession = SparkSession
                .builder().appName(EventKafkaToParquet.class.getName())
                .master(master).getOrCreate();
        SQLContext sqlContext = sparkSession.sqlContext();
        SparkContext context = sparkSession.sparkContext();
        context.setLogLevel("ERROR");

        // Read the Kafka topic as a streaming Dataset
        Dataset<Row> rawDataSet = sparkSession.readStream()
                .format("kafka")
                .option("kafka.bootstrap.servers", brokers)
                .option("subscribe", topics).load();
        rawDataSet.printSchema();

        // Cast the binary Kafka value to a string column named "employee"
        rawDataSet = rawDataSet.withColumn("employee", rawDataSet.col("value").cast(DataTypes.StringType));
        rawDataSet.createOrReplaceTempView("basicView");
        Dataset<Row> writeDataset = sqlContext.sql("select employee from basicView");

        // Write the raw JSON strings to HDFS as Parquet every 5 seconds
        writeDataset
                .repartition(1)
                .writeStream()
                .option("path","/user/cloudera/employee/")
                .option("checkpointLocation", "/user/cloudera/employee.checkpoint/")
                .format("parquet")
                .trigger(Trigger.ProcessingTime(5000))
                .start()
                .awaitTermination();
    }
}

External table in Hive

CREATE EXTERNAL TABLE employee_raw ( employee STRING )  
STORED AS PARQUET
LOCATION '/user/cloudera/employee' ;

Now I want to create a Hive view on top of the employee_raw table that outputs the following columns:

firstName, lastName, street, city, state, zip

The current output of the employee_raw table is:

hive> select * from employee_raw;
OK
{"employee":{"firstName":"Ganesh","lastName":"Kumar","address":{"street":"1400 Dakota Dr","city":"Princeton","state":"NJ","zip":"09800"}}}
{"employee":{"firstName":"Ganesh","lastName":"Kumar","address":{"street":"1400 Dakota Dr","city":"Princeton","state":"NJ","zip":"09800"}}}
{"employee":{"firstName":"Ganesh","lastName":"Kumar","address":{"street":"1400 Dakota Dr","city":"Princeton","state":"NJ","zip":"09800"}}}
{"employee":{"firstName":"Ganesh","lastName":"Kumar","address":{"street":"1400 Dakota Dr","city":"Princeton","state":"NJ","zip":"09800"}}}
{"employee":{"firstName":"Ganesh","lastName":"Kumar","address":{"street":"1400 Dakota Dr","city":"Princeton","state":"NJ","zip":"09800"}}}
Time taken: 0.123 seconds, Fetched: 5 row(s)

Thanks for your input.

Based on your description, I think what you are mainly looking for is "Extract values from JSON string in Hive", so you may find the answer in the linked thread.
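
For what it's worth, here is a minimal sketch of how that approach might look for the table in the question, using Hive's built-in get_json_object function with JSON paths that follow the sample rows shown above. The view name employee_view is only a placeholder I picked for illustration, and I have not run this against the asker's cluster, so treat it as a starting point rather than a tested solution.

```sql
-- Sketch only: employee_view is a hypothetical name, not from the question.
-- Each column is extracted from the raw JSON string stored in employee_raw.employee.
CREATE VIEW employee_view AS
SELECT
  get_json_object(employee, '$.employee.firstName')      AS firstName,
  get_json_object(employee, '$.employee.lastName')       AS lastName,
  get_json_object(employee, '$.employee.address.street') AS street,
  get_json_object(employee, '$.employee.address.city')   AS city,
  get_json_object(employee, '$.employee.address.state')  AS state,
  get_json_object(employee, '$.employee.address.zip')    AS zip
FROM employee_raw;

-- Query the view like an ordinary table
SELECT * FROM employee_view;
```

Two things to keep in mind: get_json_object returns strings (so zip keeps its leading zero) and yields NULL when a path is missing or the JSON is malformed, and Hive stores column names in lower case, so firstName will show up as firstname. If parsing the same string six times per row becomes a concern, json_tuple with LATERAL VIEW is an alternative worth looking at.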