How to remove double quotes when loading a CSV into an external table in Impala?

Question

Here is the data (it can also be downloaded from here):

"Creation Date","Status","First 3 Chars of Postal Code","Intersection Street 1","Intersection Street 2","Ward","Service Request Type","Division","Section"
"2010-01-01 00:38:26.0000000","Closed","Intersection","High Park Blvd","Parkside Dr","Parkdale-High Park (13)","Road - Sanding / Salting Required","Transportation Services","Road Operations"
"2010-01-01 01:19:18.0000000","Closed","M4T","","","Toronto Centre-Rosedale (27)","Water Service Line-Turn On","Toronto Water","District Ops"

Here is the query I used to create the table:

CREATE TABLE sr.sr2013 ( 
creation_date STRING,   
status STRING,   
first_3_chars_of_postal_code STRING,   
intersection_street_1 STRING,   
intersection_street_2 STRING,   
ward STRING,   
service_request_type STRING,   
division STRING,   
section STRING ) 
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' 
WITH SERDEPROPERTIES (
'colelction.delim'='\u0002', 
'mapkey.delim'='\u0003', 
'serialization.format'=',', 
'field.delim'=',', 
'skip.header.line.count'='1',
'quoteChar'= "\"") ;

Here is the query that loads the data:

load data inpath '/user/rxie/SR2013.csv' into table sr2013;

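After loading the data, I checked the table and found that all of the original quotes were kept.

So there are at least two problems here:

1. The option 'skip.header.line.count'='1' in the table definition does not exclude the header row.
2. The double quotes are not removed when the data is loaded into the table, despite the option 'quoteChar'= "\"".

Can anyone shed more light on this? It looks like a bug to me.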

Update 1:

In the Hue/Hive editor:

creation_date STRING,   
status STRING,   
first_3_chars_of_postal_code STRING,   
intersection_street_1 STRING,   
intersection_street_2 STRING,   
ward STRING,   
service_request_type STRING,   
division STRING,   
section STRING )                               
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde' 
WITH SERDEPROPERTIES (                             
   'colelction.delim'='\u0002',                     
   'field.delim'=',',                               
   'mapkey.delim'='\u0003',                         
   'serialization.format'=',',
   'skip.header.line.count'='1',   
   'quoteChar'= "\"") 


   LOAD DATA LOCAL INPATH '/home/rxie/data/csv/SR2015.csv' INTO TABLE sr2015;

Error:

Error while compiling statement: FAILED: SemanticException line 1:26 Invalid path ''/home/rxie/data/csv/SR2015.csv'': No files matching path file:/home/rxie/data/csv/SR2015.csv

Answer 1

Here is how I got the quotes removed when loading the CSV.

In the Hive editor (I think beeline would work as well, although I have not tested it):

Create the Hive table:

CREATE EXTERNAL TABLE sr2015(
creation_date STRING,
status STRING,
first_3_chars_of_postal_code STRING,
intersection_street_1 STRING,
intersection_street_2 STRING,
ward STRING,
service_request_type STRING,
division STRING,
section STRING)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde' WITH SERDEPROPERTIES (
'colelction.delim'='\u0002',
'field.delim'=',',
'mapkey.delim'='\u0003',
'serialization.format'=',', 'skip.header.line.count'='1',
'quoteChar'= "\"")

Load the data into the Hive table:

LOAD DATA INPATH "hdfs:///user/rxie/SR2015.csv" INTO TABLE sr2015;

Open issue (to be discussed here): the table cannot be accessed in Impala.
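A likely reason for the open issue is that Impala does not read tables that rely on a custom SerDe such as OpenCSVSerde. One possible workaround, sketched below and not tested against this data, is to let Hive parse the CSV and copy the already unquoted rows into a format Impala reads natively. The table name sr2015_parquet is hypothetical, and the sketch assumes the sr2015 table from the answer above already exists and is populated.

```sql
-- Run in Hive (not Impala): OpenCSVSerde strips the quotes while reading,
-- so the copied rows are stored without them.
-- sr2015_parquet is a hypothetical table name chosen for this sketch.
CREATE TABLE sr2015_parquet
STORED AS PARQUET
AS
SELECT *
FROM sr2015
-- Defensive filter (assumption): drops the header row in case
-- 'skip.header.line.count' was not applied to the source table.
WHERE creation_date <> 'Creation Date';

-- Run in impala-shell: make the table created in Hive visible to Impala.
INVALIDATE METADATA sr2015_parquet;

-- Quick check in Impala.
SELECT creation_date, status, ward
FROM sr2015_parquet
LIMIT 5;
```

Note that 'skip.header.line.count' is documented as a table property, so if the header row still shows up, setting it through TBLPROPERTIES rather than SERDEPROPERTIES may be worth trying.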


csv

impala