Impala - Handle special characters on partition column
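I am currently working on a job that copies data from a staging table to a final table. The staging table column used to partition the final table contains several records with single quotes (for example supplies'A, demand'A). As a result, the Impala INSERT OVERWRITE statement fails with the following message: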
Query: insert OVERWRITE rec_details ( rec_id, rec_name, rec_value ) PARTITION (rec_part) SELECT rec_id, rec_name, rec_value, rec_name FROM staging_rec_details
Query submitted at: 2017-06-12 03:23:22 (Coordinator: http://hostname:port)
Query progress can be monitored at: http://hostname:port/query_plan?query_id=ea4e14229d1c0119:a839f51500000000
WARNINGS: TableLoadingException: Failed to load metadata for table: rec_details
CAUSED BY: IllegalStateException: Invalid partition name: rec_part=-supplies'A
The DDL statements are as follows:
--DDL 1 - Staging Table
CREATE EXTERNAL TABLE staging_rec_details(
rec_id STRING,
rec_name STRING,
rec_value STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\007'
LINES TERMINATED BY '\001'
--WITH SERDEPROPERTIES ('serialization.format'='\t', 'field.delim'='\t')
STORED AS TEXTFILE
LOCATION '/staging/staging_rec_details'
--DDL 2 - Final Table
CREATE EXTERNAL TABLE rec_details(
rec_id STRING,
rec_name STRING,
rec_value STRING
)
PARTITIONED BY (rec_part STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\007'
LINES TERMINATED BY '\001'
--WITH SERDEPROPERTIES ('serialization.format'='\t', 'field.delim'='\t')
STORED AS PARQUET
LOCATION '/data/rec_details'
The following Impala statement is used to insert the records:
--Impala SQL
INSERT OVERWRITE rec_details
(
rec_id, rec_name, rec_value
)
PARTITION (rec_part)
SELECT
rec_id, rec_name, rec_value, rec_name
FROM staging_rec_details
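How can I insert data into the final table when the partition column contains special characters such as single quotes?
The issue was resolved by replacing the special characters: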
-- Modified Impala SQL
INSERT OVERWRITE rec_details
(
rec_id, rec_name, rec_value
) PARTITION (rec_part)
SELECT
rec_id, rec_name, rec_value,
regexp_replace(rec_name,'\'','')
FROM staging_rec_details
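Removing only the single quote fixes the values shown in the question, but other characters in rec_name might cause the same error, and two different names (for example supplies'A and suppliesA) end up in the same partition after the replacement. The sketch below is one possible variation, not something tested against the tables above. It assumes that letters, digits, and underscores are safe in a partition value and replaces everything else with an underscore. The alias names rec_part_candidate and distinct_names are made up for illustration.

```sql
-- Step 1 (optional check): list sanitized partition keys that would
-- absorb more than one distinct rec_name value.
SELECT
  regexp_replace(rec_name, '[^A-Za-z0-9_]', '_') AS rec_part_candidate,  -- hypothetical alias
  count(DISTINCT rec_name) AS distinct_names                             -- hypothetical alias
FROM staging_rec_details
GROUP BY 1
HAVING count(DISTINCT rec_name) > 1;

-- Step 2: same INSERT as in the question, but the dynamic partition value
-- keeps only letters, digits, and underscores (assumed safe characters).
INSERT OVERWRITE rec_details
(
  rec_id, rec_name, rec_value
)
PARTITION (rec_part)
SELECT
  rec_id, rec_name, rec_value,
  regexp_replace(rec_name, '[^A-Za-z0-9_]', '_')
FROM staging_rec_details;
```

Because rec_name is still written unchanged as a regular column, the original value with its quote remains available in the final table even though the partition key is sanitized. If the first query returns rows, those names would share a partition, which may or may not be acceptable for the job.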