使用雪花从数组中提取值

Question

我的其中一个专栏中以以下格式存储了数据；

[
{
  "arrival_date": "2022-02-15T08:00:00.000Z",
  "cargo_available_timestamp": "2022-02-16T13:00:00.000Z",
  "cargo_type": "unable_to_provide",
  "carton_count": null,
  "lfd": "2022-02-17T08:00:00.000Z"
},
{
  "arrival_date": "2022-02-16T08:00:00.000Z",
  "flight_status": "in_transit",
  "flight_status_other": null
}
 ]

我正在尝试提取 lfd 的值，如数据中所示，使用以下代码；

select col1,
       json_extract_path_text(get(col1,0),'lfd') as value
from table

但是 get() 命令似乎没有获取数组。我收到以下错误：

SQL compilation error: error line 4 at position 20 Invalid argument types for 
 function 'GET': (VARCHAR(16777216), NUMBER(1,0))

当我检查 col1 的数据类型时，它是 varchar。我可以知道如何解析此 varchar 以提取 lfd 的值。谢谢

Answer 1

所以用一个CTE来提供假数据，并为我们解析JSON：

WITH fake_data AS (
    SELECT parse_json(column1) as json
    FROM VALUES 
    ('[
    {
      "arrival_date": "2022-02-15T08:00:00.000Z",
      "cargo_available_timestamp": "2022-02-16T13:00:00.000Z",
      "cargo_type": "unable_to_provide",
      "carton_count": null,
      "lfd": "2022-02-17T08:00:00.000Z"
    },
    {
      "arrival_date": "2022-02-16T08:00:00.000Z",
      "flight_status": "in_transit",
      "flight_status_other": null
    }
     ]')
 )
 select 
    json[0] as array_0
    ,json[1] as array_1
    ,array_0:lfd as lfd_0
    ,array_1:lfd as lfd_1
 from fake_data;

我们得到：

ARRAY_0	ARRAY_1	LFD_0	LFD_1
{ "arrival_date": "2022-02-15T08:00:00.000Z", "cargo_available_timestamp": "2022-02-16T13:00:00.000Z", "cargo_type": "unable_to_provide", "carton_count": null, "lfd": "2022-02-17T08:00:00.000Z" }	{ "arrival_date": "2022-02-16T08:00:00.000Z", "flight_status": "in_transit", "flight_status_other": null }	"2022-02-17T08:00:00.000Z"

因此，如果您知道 JSON 数组将始终按顺序排列，您可以使用：

 select 
    json[0]:lfd as lfd
    ,to_timestamp_ntz(lfd) as lfd_as_timestamp
 from fake_data;

LFD	LFD_AS_TIMESTAMP
"2022-02-17T08:00:00.000Z"	2022-02-17 08:00:00.000

现在，如果您不确定数组的顺序，或者您需要选择一个数组元素，您将需要展平数组。

WITH fake_data AS (
    SELECT parse_json(column1) as json
    FROM VALUES 
    ('[
    {
      "arrival_date": "2022-02-15T08:00:00.000Z",
      "cargo_available_timestamp": "2022-02-16T13:00:00.000Z",
      "cargo_type": "unable_to_provide",
      "carton_count": null,
      "lfd": "2022-02-17T08:00:00.000Z"
    },
    {
      "arrival_date": "2022-02-16T08:00:00.000Z",
      "flight_status": "in_transit",
      "flight_status_other": null
    }
     ]')
     ,('[
       {
      "arrival_date": "2022-02-16T08:00:00.000Z",
      "flight_status": "in_transit",
      "flight_status_other": null
    },
    {
      "arrival_date": "2022-02-15T08:00:00.000Z",
      "cargo_available_timestamp": "2022-02-16T13:00:00.000Z",
      "cargo_type": "unable_to_provide",
      "carton_count": null,
      "lfd": "2022-02-18T08:00:00.000Z"
    }
     ]')
 )
 select 
    to_timestamp_ntz(f.value:lfd) as lfd_ntz
 from fake_data d, table(flatten(input=>d.json)) f
 where lfd_ntz is not null;

LFD_NTZ
2022-02-17 08:00:00.000
2022-02-18 08:00:00.000

使用雪花从数组中提取值

Extracting values from an array using snowflake

json

snowflake-cloud-data-platform