How to remove double quotes when loading CSV into an external table in Impala?

Here is the data (it can also be downloaded from here):

"Creation Date","Status","First 3 Chars of Postal Code","Intersection Street 1","Intersection Street 2","Ward","Service Request Type","Division","Section"
"2010-01-01 00:38:26.0000000","Closed","Intersection","High Park Blvd","Parkside Dr","Parkdale-High Park (13)","Road - Sanding / Salting Required","Transportation Services","Road Operations"
"2010-01-01 01:19:18.0000000","Closed","M4T","","","Toronto Centre-Rosedale (27)","Water Service Line-Turn On","Toronto Water","District Ops"

Here is the query I used to create the table:

CREATE TABLE sr.sr2013 ( 
creation_date STRING,   
status STRING,   
first_3_chars_of_postal_code STRING,   
intersection_street_1 STRING,   
intersection_street_2 STRING,   
ward STRING,   
service_request_type STRING,   
division STRING,   
section STRING ) 
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' 
WITH SERDEPROPERTIES (
'colelction.delim'='\u0002', 
'mapkey.delim'='\u0003', 
'serialization.format'=',', 
'field.delim'=',', 
'skip.header.line.count'='1',
'quoteChar'= "\"") ;

Here is the query I used to load the data:

load data inpath '/user/rxie/SR2013.csv' into table sr2013;

After loading the data, I checked the table and found that all of the original quotes were preserved.

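So there are at least two problems here:

1. The option 'skip.header.line.count'='1' in the table definition does not exclude the header line.
2. When the data is loaded into the table, the double quotes are not removed, even though the option 'quoteChar'= "\"" specifies them.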

Can anyone shed some light on this? It looks like a bug to me.
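The following is a likely explanation, offered as a hedged sketch rather than a confirmed diagnosis.

`ROW FORMAT DELIMITED` selects the plain delimited-text SerDe (LazySimpleSerDe). That SerDe splits each line on the delimiter and has no notion of quoting, so a `quoteChar` entry in SERDEPROPERTIES is simply ignored. `skip.header.line.count` is documented as a table property, so it belongs in TBLPROPERTIES rather than SERDEPROPERTIES. As far as I know, Impala honors it from version 2.6 onward.

The sketch below assumes all of that. A few further caveats:
- The table name `sr.sr2013_raw` is hypothetical.
- Stripping the quotes at query time with `regexp_replace` only works while no field contains a comma inside its quotes.

```sql
-- Sketch only: sr.sr2013_raw is a hypothetical table name.
-- Plain delimited text: the double quotes stay part of each value.
CREATE TABLE sr.sr2013_raw (
  creation_date STRING,
  status STRING,
  first_3_chars_of_postal_code STRING,
  intersection_street_1 STRING,
  intersection_street_2 STRING,
  ward STRING,
  service_request_type STRING,
  division STRING,
  section STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
-- Header skipping is a table property, not a SerDe property.
TBLPROPERTIES ('skip.header.line.count'='1');

-- Remove one leading and one trailing double quote from each value at query time.
-- Assumes no field has an embedded comma inside its quotes.
SELECT regexp_replace(creation_date, '^"|"$', '') AS creation_date,
       regexp_replace(status, '^"|"$', '') AS status,
       regexp_replace(ward, '^"|"$', '') AS ward
FROM sr.sr2013_raw
LIMIT 5;
```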

Update 1:

In the Hue/Hive editor:

CREATE TABLE sr2015 (
creation_date STRING,   
status STRING,   
first_3_chars_of_postal_code STRING,   
intersection_street_1 STRING,   
intersection_street_2 STRING,   
ward STRING,   
service_request_type STRING,   
division STRING,   
section STRING )                               
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde' 
WITH SERDEPROPERTIES (                             
   'colelction.delim'='\u0002',                     
   'field.delim'=',',                               
   'mapkey.delim'='\u0003',                         
   'serialization.format'=',',
   'skip.header.line.count'='1',   
   'quoteChar'= "\"") 


   LOAD DATA LOCAL INPATH '/home/rxie/data/csv/SR2015.csv' INTO TABLE sr2015;  

Error:

Error while compiling statement: FAILED: SemanticException line 1:26 Invalid path ''/home/rxie/data/csv/SR2015.csv'': No files matching path file:/home/rxie/data/csv/SR2015.csv
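The `No files matching path file:/...` error most likely means that `LOCAL` was resolved on the HiveServer2 host, which is where Hue submits the query, and not on the machine that holds the file.

Below is a hedged sketch of the same attempt. It assumes the file has already been copied to HDFS at `hdfs:///user/rxie/SR2015.csv`, the path used in the answer further down. It sets only the property names that OpenCSVSerde itself documents (`separatorChar`, `quoteChar`, `escapeChar`). The other keys in the DDL above (`field.delim`, `mapkey.delim` and so on) belong to the delimited-text SerDe and, as far as I know, are not read by OpenCSVSerde. The table name `sr2015_csv` is hypothetical.

```sql
-- Sketch only: sr2015_csv is a hypothetical table name.
CREATE TABLE sr2015_csv (
  creation_date STRING,
  status STRING,
  first_3_chars_of_postal_code STRING,
  intersection_street_1 STRING,
  intersection_street_2 STRING,
  ward STRING,
  service_request_type STRING,
  division STRING,
  section STRING)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
  'separatorChar' = ',',
  'quoteChar'     = '"',
  'escapeChar'    = '\\')
STORED AS TEXTFILE
-- Header skipping declared as a table property.
TBLPROPERTIES ('skip.header.line.count'='1');

-- Without LOCAL the path is resolved in HDFS.
-- LOAD DATA INPATH moves (not copies) the file into the table directory.
LOAD DATA INPATH 'hdfs:///user/rxie/SR2015.csv' INTO TABLE sr2015_csv;
```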

Below is how I managed to exclude the quotes when loading the CSV:

In the Hive Editor (I think beeline should work as well, although I haven't tested it):

  1. Create the Hive table:

    CREATE EXTERNAL TABLE sr2015(
    creation_date STRING,
    status STRING,
    first_3_chars_of_postal_code STRING,
    intersection_street_1 STRING,
    intersection_street_2 STRING,
    ward STRING,
    service_request_type STRING,
    division STRING,
    section STRING)
    ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde' WITH SERDEPROPERTIES(
    'colelction.delim'='\u0002',
    'field.delim'=',',
    'mapkey.delim'='\u0003',
    'serialization.format'=',', 'skip.header.line.count'='1',
    'quoteChar'= "\"")

  2. Load the data into the Hive table:

    LOAD DATA INPATH "hdfs:///user/rxie/SR2015.csv" INTO TABLE sr2015;
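In step 1, `skip.header.line.count` sits in SERDEPROPERTIES rather than TBLPROPERTIES. Whether the header line is really skipped in that case is worth checking rather than assuming.

A quick sanity check to run in Hive, using only the `sr2015` table from step 1 (column values taken from the sample header above):

```sql
-- Expect 0 if the header row was skipped.
SELECT COUNT(*) FROM sr2015 WHERE creation_date = 'Creation Date';

-- Spot-check that values no longer carry surrounding double quotes.
SELECT creation_date, status, ward FROM sr2015 LIMIT 3;
```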

Open question (to be discussed here): the table cannot be accessed in Impala.
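As far as I know, Impala cannot read tables defined with custom SerDes such as OpenCSVSerde, which would explain this.

One common workaround, sketched here under that assumption, has two steps:
1. Let Hive parse the CSV once and materialize the result in a format Impala reads natively.
2. Refresh Impala's view of the metastore.

The table name `sr2015_parquet` is hypothetical.

```sql
-- Step A (run in Hive): copy the parsed rows, quotes already removed, into Parquet.
CREATE TABLE sr2015_parquet STORED AS PARQUET AS
SELECT * FROM sr2015;

-- Step B (run in Impala): load the new table's metadata, then query it.
INVALIDATE METADATA sr2015_parquet;
SELECT COUNT(*) FROM sr2015_parquet;
```

OpenCSVSerde exposes every column as STRING, so the Parquet copy will also be all strings. Casting, for example `creation_date` to TIMESTAMP, could be done in the CTAS SELECT if needed.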