Thorn character delimiter is not recognized in Hive
As mentioned in the post "Using the Icelandic Thorn character as a delimiter in Hive", the thorn character (þ) is not recognized as a field delimiter in Hive.
Sample table:
CREATE EXTERNAL TABLE IF NOT EXISTS zzzzz_raw (
spot_id INT,
activity_type_id INT,
activity_type STRING,
activity_id INT,
activity_sub_type STRING,
report_name STRING,
tag_method_id INT
)
PARTITIONED BY ( dt DATE )
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\-2' LINES TERMINATED BY '\n'
STORED AS TEXTFILE
LOCATION '/raw/data/networkmatchtablesactivity/activity_cat';
Output:
select * from activity_cat_raw limit 1;
4552126þ805759þeaasv101þ2275868þbfeaac01þBF_EA Access_Info Pageþ2 NULL NULL NULL NULL NULL NULL 2015-03-24
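Am I missing something?
I found the answer.
I used '\-61' as the delimiter instead of '\-2' (the thorn delimiter), and then used substring to strip the extra symbol left at the start of each field, as shown here: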
CREATE EXTERNAL TABLE IF NOT EXISTS SSSSSS (
spot_id STRING,
activity_type_id STRING,
activity_type STRING,
activity_id STRING,
activity_sub_type STRING,
report_name STRING,
tag_method_id STRING
)
PARTITIONED BY ( dt STRING )
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\-61' LINES TERMINATED BY '\n'
STORED AS TEXTFILE
LOCATION 'SSSSSS';
Then use substring to remove the extra symbol:
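INSERT OVERWRITE TABLE vvvvvv PARTITION (dt)
-- spot_id is the first field, so it has no stray leading symbol
SELECT spot_id,
-- drop the leftover symbol at the start of the field
substr(activity_type_id, 2),
-- ... treat the remaining columns the same way with substr(col, 2) ...
dt
FROM SSSSSS;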
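A likely explanation for why this works, assuming the data file is UTF-8 encoded (the post does not say so): in UTF-8 the thorn character is stored as two bytes, 0xC3 0xBE, which are -61 and -66 as signed bytes, while '\-2' is the single byte 0xFE that only matches thorn in Latin-1. As far as I know, FIELDS TERMINATED BY takes a single-byte delimiter, so splitting on -61 leaves the second byte at the front of every field after the first. The small Python sketch below only mimics that byte-level behaviour on part of the sample row; the helper name signed is made up for illustration and nothing here is Hive code.

```python
# Illustration only: mimic a single-byte field split on part of the sample row.
THORN = "þ"  # U+00FE, the delimiter character in the data file


def signed(b: int) -> int:
    """Return a byte value (0..255) as a signed 8-bit integer."""
    return b - 256 if b > 127 else b


# How thorn is stored under the two encodings.
utf8_signed = [signed(b) for b in THORN.encode("utf-8")]      # two bytes
latin1_signed = [signed(b) for b in THORN.encode("latin-1")]  # one byte

# Assumption: the file is UTF-8, so each delimiter occupies two bytes.
row = "4552126þ805759þeaasv101".encode("utf-8")

# Split on the first byte of the delimiter only (0xC3, i.e. -61).
fields = row.split(bytes([0xC3]))

# Every field after the first still starts with the second byte (0xBE),
# which is what substr(col, 2) removes in the Hive query.
cleaned = [fields[0]] + [f[1:] for f in fields[1:]]

print(utf8_signed, latin1_signed)
print(fields)
print(cleaned)
```

If the file were Latin-1 encoded instead, the original '\-2' delimiter would be the matching byte, so it is worth checking the file's encoding before choosing between the two.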
Hope this helps.