Fill null values by interpolating between the closest preceding and following non-null values
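Is there a way to achieve the following using SQL in Snowflake?
Original table:
order | value |
1 | 80 |
2 | null |
3 | null |
4 | 20 |
Output table:
order | value |
1 | 80 |
2 | 60 |
3 | 40 |
4 | 20 |
I know this requires a fairly complex window operation and may be beyond standard SQL.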
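This assumes that order only gives the direction of the steps, not a linear mapping of the step size. Otherwise rn can be replaced with ov to match the steps in your original order column.
It also has no partitioning, which matters if you are running multiple datasets at once.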
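SELECT *
    -- next and previous non-null values around each row
    ,lead(v) ignore nulls over (order by ov) as nv
    ,lag(v) ignore nulls over (order by ov) as pv
    -- row numbers of those next and previous non-null values
    ,lead(nrn) ignore nulls over (order by ov) as nnrn
    ,lag(nrn) ignore nulls over (order by ov) as pnrn
    -- keep v when present, otherwise interpolate linearly between pv and nv
    ,nvl(v, ((nv-pv)/(nnrn-pnrn)*(rn-pnrn))+pv) as value
FROM (
    SELECT *
        ,row_number() over (order by ov) as rn
        -- rn for rows that have a value, null otherwise
        ,nvl2(v,rn,null) as nrn
    FROM values
        (1,80),
        (2,null),
        (3,null),
        (4,20)
        t(ov, v)
)
order by ov;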
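This gives:
OV | V | RN | NRN | NV | PV | NNRN | PNRN | VALUE |
---|---|---|---|---|---|---|---|---|
1 | 80 | 1 | 1 | 20 | null | 4 | null | 80 |
2 | null | 2 | null | 20 | 80 | 4 | 1 | 60 |
3 | null | 3 | null | 20 | 80 | 4 | 1 | 40 |
4 | 20 | 4 | 4 | null | 80 | null | 1 | 20 |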
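To see how the value expression fills a gap, the interpolation for the two null rows can be worked through by hand with the column values from the result above (pv = 80, nv = 20, pnrn = 1, nnrn = 4). This is only the arithmetic of the nvl(...) expression written out, not something taken from the original answer:

```latex
\begin{align*}
\text{value} &= pv + \frac{nv - pv}{nnrn - pnrn}\,(rn - pnrn) \\
rn = 2:\quad \text{value} &= 80 + \frac{20 - 80}{4 - 1}\,(2 - 1) = 60 \\
rn = 3:\quad \text{value} &= 80 + \frac{20 - 80}{4 - 1}\,(3 - 1) = 40
\end{align*}
```

Rows that already have a value skip this calculation entirely, because nvl returns v unchanged when it is not null.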
Or, if you want it to look tidy:
WITH data as (
SELECT *
FROM values
(1,80),
(2,null),
(3,null),
(4,20),
(5,null),
(6,null),
(7,11)
t(ov, v)
)
SELECT ov, v, value
FROM (
SELECT *
,lead(v) ignore nulls over (order by ov) as nv
,lag(v) ignore nulls over (order by ov) as pv
,lead(nrn) ignore nulls over (order by ov) as nnrn
,lag(nrn) ignore nulls over (order by ov) as pnrn
,nvl(v, ((nv-pv)/(nnrn-pnrn)*(rn-pnrn))+pv) as value
FROM (
SELECT *
,row_number() over (order by ov) as rn
,nvl2(v,rn,null) as nrn
FROM data
)
)
order by ov;
OV | V | VALUE |
---|---|---|
1 | 80 | 80 |
2 | null | 60 |
3 | null | 40 |
4 | 20 | 20 |
5 | null | 17 |
6 | null | 14 |
7 | 11 | 11 |
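The answer notes that the query has no partitioning. A minimal sketch of how that could be added is shown here, assuming each dataset is identified by a column called grp. That column name and the sample rows are hypothetical and do not come from the original post. The only change is a partition by grp in every window, so gaps are filled from neighbours in the same dataset only:

```sql
-- Sketch only: "grp" is a hypothetical dataset identifier, not part of the original post.
WITH data as (
    SELECT *
    FROM values
        ('a',1,80),
        ('a',2,null),
        ('a',3,null),
        ('a',4,20),
        ('b',1,10),
        ('b',2,null),
        ('b',3,30)
        t(grp, ov, v)
)
SELECT grp, ov, v, value
FROM (
    SELECT *
        -- same look-ahead / look-behind as before, restricted to one dataset
        ,lead(v) ignore nulls over (partition by grp order by ov) as nv
        ,lag(v) ignore nulls over (partition by grp order by ov) as pv
        ,lead(nrn) ignore nulls over (partition by grp order by ov) as nnrn
        ,lag(nrn) ignore nulls over (partition by grp order by ov) as pnrn
        ,nvl(v, ((nv-pv)/(nnrn-pnrn)*(rn-pnrn))+pv) as value
    FROM (
        SELECT *
            -- row numbering restarts for each dataset
            ,row_number() over (partition by grp order by ov) as rn
            ,nvl2(v,rn,null) as nrn
        FROM data
    )
)
order by grp, ov;
```

With this sample data the formula works out to 60 and 40 for the nulls in dataset a and 20 for the null in dataset b. A null at the very start or end of a dataset has no neighbour on one side, so its value stays null.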