Create an external Hive table from an existing external table

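I have a set of CSV files in an HDFS path, and I created an external Hive table, say table_A, from these files. Since some of the entries are redundant, I tried creating another Hive table based on table_A, say table_B, which has distinct records. I was able to create table_B as a non-external table (Hive warehouse). I was wondering if I can create table_B as an external table? If this is possible, will it copy the records from table_A and create its own storage for table_B at a specified path (preferably also as CSV)?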

I am presuming you want to select the distinct rows from the "uncleaned" table and insert them into the "cleaned" table.

CREATE EXTERNAL TABLE `uncleaned`(
  `a` int, 
  `b` string,
  `c` string, 
  `d` string, 
  `e` bigint
  ) 
ROW FORMAT DELIMITED 
  FIELDS TERMINATED BY ',' 
STORED AS INPUTFORMAT 
  'org.apache.hadoop.mapred.TextInputFormat' 
OUTPUTFORMAT 
  'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
  '/external/uncleaned';

Create another table; it can be external or not (it doesn't matter).

CREATE EXTERNAL TABLE `cleaned`(
  `a` int, 
  `b` string,
  `c` string, 
  `d` string, 
  `e` bigint
  ) 
ROW FORMAT DELIMITED 
  FIELDS TERMINATED BY ',' 
STORED AS INPUTFORMAT 
  'org.apache.hadoop.mapred.TextInputFormat' 
OUTPUTFORMAT 
  'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
  '/external/cleaned';

Then read from the first table and insert the distinct rows into the second:

INSERT OVERWRITE TABLE cleaned
SELECT DISTINCT a, b, c, d, e FROM uncleaned;
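
To check that the insert wrote its own copy of the data under the external path, something like the following could be run afterwards. This is only a sketch that assumes the table names and locations used above; the exact layout of the DESCRIBE output varies between Hive versions.

```sql
-- Show table metadata; for an external table the output should include
-- a "Table Type" of EXTERNAL_TABLE and a "Location" ending in /external/cleaned
DESCRIBE FORMATTED cleaned;

-- Compare row counts: after deduplication, cleaned should have
-- no more rows than uncleaned
SELECT COUNT(*) FROM uncleaned;
SELECT COUNT(*) FROM cleaned;
```

Because `cleaned` is declared with comma-delimited text storage, the files written under its location should be plain comma-separated text, and dropping the external table would remove only the metadata, not those files.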