Snowflake CSV delimiter issue due to special character
I have a file that uses a special character (§) as the delimiter, utfCode -> 0xA7.
A snapshot of the file is shown below:
"Diablo"§"tRaider"§"2019-08-12"
"GOT"§"BeltMorham"§"2019-01-02"
"Tomb Raider"§"RealMason"§"2019-04-02"
The file format is currently defined as follows:
Create FILE FORMAT GamerFF
SET COMPRESSION = 'AUTO'
FIELD_DELIMITER = '§'
RECORD_DELIMITER = '\n'
SKIP_HEADER = 0
FIELD_OPTIONALLY_ENCLOSED_BY = '\042'
TRIM_SPACE = FALSE ERROR_ON_COLUMN_COUNT_MISMATCH = TRUE
ESCAPE = 'NONE'
ESCAPE_UNENCLOSED_FIELD = '\134'
DATE_FORMAT = 'AUTO' TIMESTAMP_FORMAT = 'AUTO'
NULL_IF = ('\N');
However, when I try to read from the file:
select $1 from @stage (file_format=>gamerFF);
Found character '\u00c2' instead of record delimiter '\n' File 'gngamer.txt', line 1, character 4 Row 1, column "TRANSIENT_STAGE_TABLE"["":1]
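It complains about a different character, Â (utfcode -> \u00c2).
When I update the format to use that character as the delimiter, it works for the first column.
But when I try to read the next column, it throws an error about the (§) character code: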
select $1, $2 from @stage (file_format=>gamerFF);
Invalid UTF8 detected in string '0xA7"1"' File 'gngamer.txt', line 1, character 5 Row 1, column "TRANSIENT_STAGE_TABLE"["":2]
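Using validate_utf=false does not help, because it introduces special characters into the field values.
It now looks as though I need two delimiters.
I cannot change the input file.
Can anyone suggest a solution?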
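You need to set FIELD_DELIMITER to '\xc2\xa7'. In UTF-8 the § character (U+00A7) is stored as the two bytes 0xC2 0xA7, so the delimiter has to cover both bytes: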
create or replace FILE FORMAT GamerFF COMPRESSION = 'AUTO'
FIELD_DELIMITER = '\xc2\xa7'
RECORD_DELIMITER = '\n'
SKIP_HEADER = 0
FIELD_OPTIONALLY_ENCLOSED_BY = '\042'
TRIM_SPACE = FALSE ERROR_ON_COLUMN_COUNT_MISMATCH = TRUE
ESCAPE = 'NONE'
ESCAPE_UNENCLOSED_FIELD = '\134'
DATE_FORMAT = 'AUTO' TIMESTAMP_FORMAT = 'AUTO'
NULL_IF = ('\N');
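To see why both bytes are needed, here is a minimal Python sketch run outside Snowflake. It assumes the input file is UTF-8 encoded, which is what the error messages suggest; the variable names are only illustrative and the sample line is copied from the question.

```python
# The section sign U+00A7 occupies two bytes when encoded as UTF-8.
delimiter = "§"
utf8_bytes = delimiter.encode("utf-8")
print(utf8_bytes)  # b'\xc2\xa7'

# Interpreted one byte at a time (Latin-1), the same bytes read as "Â" then "§",
# which matches the '\u00c2' character reported in the first error.
print(utf8_bytes.decode("latin-1"))  # Â§

# Splitting a raw line on the full two-byte sequence recovers the three fields.
raw_line = '"Diablo"§"tRaider"§"2019-08-12"'.encode("utf-8")
fields = [part.decode("utf-8").strip('"') for part in raw_line.split(b"\xc2\xa7")]
print(fields)  # ['Diablo', 'tRaider', '2019-08-12']
```

Treating only 0xA7 or only 0xC2 as the delimiter leaves the other byte stranded in the data, which would explain both the "Found character '\u00c2'" and the "Invalid UTF8" errors.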