Copying a Snowflake table with nulls and empty strings to a CSV that can be imported with the psql copy command

Question

So, if you have this table in Snowflake:

create table t (x string, y string) as select '', null;

and then copy it to an external stage using file_format csv, you get this error unless field_optionally_enclosed_by is set to something other than none:

Cannot unload empty string without file format option field_optionally_enclosed_by being specified.

So, let's say it is set to '"':

create stage some_stg
url='s3://<some-bucket>/<some-dir>'
file_format = (type = csv field_optionally_enclosed_by='"' compression = none)
credentials = (aws_role = '<your-arn-for-snowflake>')

If you don't want to mess with getting Snowflake to use your S3 bucket, I believe this issue also reproduces with an internal stage.

When you run a copy for the table above:

copy into @some_stg/t.csv from t overwrite = true;

you get a file that looks like this (t_0_0_0.csv):

"","\N"

And after creating the equivalent table in Postgres:

create table t (x varchar, y varchar);

when you load the file into Postgres using psql copy like this:

psql -h <host> -U <user> -c "copy t from stdin with csv null '\N'" < t_0_0_0.csv

the contents of t in Postgres are:

x, y
"","\N"

Now this makes sense: Snowflake put \N inside double quotes, so psql copy kept it as a literal string. If you edit t_0_0_0.csv and remove the double quotes around \N:

"",\N

and run the psql copy again, then \N is correctly converted to null.
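The manual edit described above could be automated with a small script. The following is only a minimal sketch, not a Snowflake or Postgres feature: the function name unquote_null_fields is hypothetical, and it assumes a comma delimiter, LF line endings, and embedded double quotes escaped by doubling. Note also that a genuine string value equal to \N would presumably be unloaded the same way and would therefore be turned into null as well.

```python
def unquote_null_fields(text, null_marker="\\N", delimiter=",", quote='"'):
    """Rewrite every field that is exactly "<null_marker>" (quoted) as a bare <null_marker>.

    Hypothetical helper. Assumes LF line endings and doubled-quote escaping.
    """
    quoted_null = quote + null_marker + quote
    out = []
    start = 0          # index where the current field begins
    in_quotes = False  # True while scanning inside a quoted field

    for i, ch in enumerate(text):
        if ch == quote:
            # A doubled quote toggles twice, so the state stays consistent.
            in_quotes = not in_quotes
        elif ch in (delimiter, "\n") and not in_quotes:
            # End of a field: replace it only if the whole field is the quoted marker.
            field = text[start:i]
            out.append(null_marker if field == quoted_null else field)
            out.append(ch)
            start = i + 1

    # Handle the last field when the text does not end with a newline.
    field = text[start:]
    out.append(null_marker if field == quoted_null else field)
    return "".join(out)


# The one-row file from the question: the empty string stays quoted, the null marker does not.
print(unquote_null_fields('"","\\N"\n'), end="")
```

The output of this function, applied to the contents of t_0_0_0.csv, could then be piped into the same psql copy command in place of the original file. Scanning field by field rather than using a plain text substitution keeps a \N that merely appears inside a longer quoted string untouched.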

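There doesn't seem to be a way to generate a CSV file from Snowflake that supports both empty strings and nulls and that keeps them distinct when loaded into Postgres. I have experimented with the Snowflake options EMPTY_FIELD_AS_NULL and NULL_IF, and Snowflake's documentation even talks about this issue: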

When unloading empty string data from tables, choose one of the following options:

Preferred: Enclose strings in quotes by setting the FIELD_OPTIONALLY_ENCLOSED_BY option, to distinguish empty strings from NULLs in output CSV files.

It does "distinguish" them, but not in a way that psql copy can use without first manipulating the file with sed.

Does anyone know how to generate a Snowflake CSV that preserves empty strings and nulls in a way that psql copy can reproduce?

Answer 1

Have you tried the NULL_IF option in your file format? The following file format will unload your empty and null Snowflake data:

CREATE OR REPLACE FILE FORMAT UPDATED_FORMAT_NAME
TYPE = 'CSV'
COMPRESSION = 'NONE'
FIELD_DELIMITER =','
NULL_IF=()

