Create an external Hive table from an existing external table
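I have a set of CSV files in an HDFS path, and I created an external Hive table from them, say table_A. Since some of the entries are redundant, I tried creating another Hive table based on table_A, say table_B, which contains only the distinct records. I was able to create table_B as a non-external table (in the Hive warehouse). Can I create table_B as an external table instead? If so, will it copy the records from table_A and create its own storage for table_B at a specified path (preferably also as CSV)?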
I presume you want to select the distinct rows from an "uncleaned" table
and insert them into a "cleaned" table.
CREATE EXTERNAL TABLE `uncleaned`(
`a` int,
`b` string,
`c` string,
`d` string,
`e` bigint
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS INPUTFORMAT
'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
'/external/uncleaned';
Create another table; it can be external or not (it doesn't matter).
CREATE EXTERNAL TABLE `cleaned`(
`a` int,
`b` string,
`c` string,
`d` string,
`e` bigint
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS INPUTFORMAT
'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
'/external/cleaned';
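Since both tables share the same columns and storage format, the second definition does not have to be typed out again. A minimal sketch of an alternative to the statement above, assuming the `uncleaned` table already exists and that `/external/cleaned` is the directory you want the new table to use:

```sql
-- Alternative to spelling out the columns again:
-- copy the schema and storage format of `uncleaned`, but no data.
-- IF NOT EXISTS makes this a no-op if `cleaned` was already created.
CREATE EXTERNAL TABLE IF NOT EXISTS cleaned
LIKE uncleaned
LOCATION '/external/cleaned';
```

This only creates the table definition; the new table stays empty until rows are inserted into it.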
Read from the first table and insert the distinct rows into the second with:
insert overwrite table cleaned
select distinct a,b,c,d,e from uncleaned;
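To check that the rows were actually copied into the external location, something along these lines should work. It is only a sketch and assumes the table names used above:

```sql
-- Show table metadata; for an external table the "Table Type" field
-- should read EXTERNAL_TABLE and "Location" should point at /external/cleaned.
DESCRIBE FORMATTED cleaned;

-- Compare row counts before and after de-duplication.
SELECT COUNT(*) FROM uncleaned;
SELECT COUNT(*) FROM cleaned;
```

The insert writes its own comma-delimited text files under the table's location, so `cleaned` has storage separate from `uncleaned`. The files are typically given generated names without a `.csv` extension, but their contents are plain delimited text.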