Linear Interpolation in Oracle with special cases
I need to fill in missing values by interpolating them.
I have a dataset like the following (sample):
Country Year Value
A 2000 1.5
A 2001 2.5
A 2002 null
A 2003 4.5
B 2000 null
B 2000 null
B 2002 5.3
B 2003 6.3
C 2000 1
C 2001 null
C 2002 null
C 2003 4
I expect the following result:
Country Year Value
A 2000 1.5
A 2001 2.5
A 2002 3.5
A 2003 4.5
B 2000 3.3
B 2000 4.3
B 2002 5.3
B 2003 6.3
C 2000 1
C 2001 2
C 2002 3
C 2003 4
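How can I fill in these values using linear interpolation? I really don't know how to do this efficiently in Oracle.
Oracle has several linear regression functions, and REGR_SLOPE and REGR_INTERCEPT are useful here. Both ignore pairs in which either argument is null, so the line is fitted to the known values only and can then be evaluated at the missing positions.
The catch in your case is that this is not a linear regression of value on year: in country B, for example, two rows share the year 2000 but are expected to get different values (3.3 and 4.3). It is a linear regression of value on the row number within each country. So we first need to compute that row number, and only then can we compute the interpolated values.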
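As a worked check of what the two functions compute: REGR_SLOPE and REGR_INTERCEPT return the ordinary least-squares slope b and intercept a over the non-null pairs, here with x = rn and y = value. For country A the known pairs are rows 1, 2 and 4, and evaluating the line at the missing row 3 gives the expected 3.5:

$$
\begin{aligned}
b &= \frac{\sum_i (x_i-\bar{x})(y_i-\bar{y})}{\sum_i (x_i-\bar{x})^2}, \qquad a = \bar{y} - b\,\bar{x} \\
\text{Country A: } (x_i, y_i) &\in \{(1, 1.5),\ (2, 2.5),\ (4, 4.5)\}, \qquad \bar{x} = \tfrac{7}{3},\quad \bar{y} = \tfrac{17}{6} \\
b &= \frac{(-\tfrac{4}{3})(-\tfrac{4}{3}) + (-\tfrac{1}{3})(-\tfrac{1}{3}) + (\tfrac{5}{3})(\tfrac{5}{3})}{(-\tfrac{4}{3})^2 + (-\tfrac{1}{3})^2 + (\tfrac{5}{3})^2} = \frac{42/9}{42/9} = 1, \qquad a = \tfrac{17}{6} - 1 \cdot \tfrac{7}{3} = 0.5 \\
\hat{y}(3) &= 1 \cdot 3 + 0.5 = 3.5
\end{aligned}
$$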
with input_data (country, year, value) AS (
  -- sample data from the question
  SELECT 'A', 2000, 1.5 FROM DUAL UNION ALL
  SELECT 'A', 2001, 2.5 FROM DUAL UNION ALL
  SELECT 'A', 2002, null FROM DUAL UNION ALL
  SELECT 'A', 2003, 4.5 FROM DUAL UNION ALL
  SELECT 'B', 2000, null FROM DUAL UNION ALL
  SELECT 'B', 2000, null FROM DUAL UNION ALL
  SELECT 'B', 2002, 5.3 FROM DUAL UNION ALL
  SELECT 'B', 2003, 6.3 FROM DUAL UNION ALL
  SELECT 'C', 2000, 1 FROM DUAL UNION ALL
  SELECT 'C', 2001, null FROM DUAL UNION ALL
  SELECT 'C', 2002, null FROM DUAL UNION ALL
  SELECT 'C', 2003, 4 FROM DUAL
), ordered_input as (
  -- number the rows within each country in year order
  SELECT
    i.*,
    row_number() over ( partition by country order by year) rn
  FROM input_data i
)
SELECT
  country,
  year,
  value,
  -- value on the line fitted per country: slope * rn + intercept
  rn * regr_slope(value, rn) over ( partition by country) +
  regr_intercept(value, rn) over ( partition by country)
    as interpolated_value
FROM ordered_input
ORDER BY country, year, rn;
+---------+------+-------+--------------------+
| COUNTRY | YEAR | VALUE | INTERPOLATED_VALUE |
+---------+------+-------+--------------------+
| A | 2000 | 1.5 | 1.5 |
| A | 2001 | 2.5 | 2.5 |
| A | 2002 | | 3.5 |
| A | 2003 | 4.5 | 4.5 |
| B | 2000 | | 3.3 |
| B | 2000 | | 4.3 |
| B | 2002 | 5.3 | 5.3 |
| B | 2003 | 6.3 | 6.3 |
| C | 2000 | 1 | 1 |
| C | 2001 | | 2 |
| C | 2002 | | 3 |
| C | 2003 | 4 | 4 |
+---------+------+-------+--------------------+
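The query above fits one straight line per country and returns the fitted value for every row, including rows that already have a value. That matches the expected output here because the known values in each country lie exactly on a line; for data that do not, the fitted values would replace observed ones, and a gap would not be joined to its neighbouring points. Below is a minimal sketch of one alternative, assuming Oracle's `FIRST_VALUE`/`LAST_VALUE ... IGNORE NULLS` analytic functions. It is a swapped-in technique, not the answer's method: it keeps observed values, uses piecewise linear interpolation between the nearest known points on either side of a gap, and falls back to the regression line only for leading or trailing gaps (such as the first two rows of country B). The CTE and column names `neighbours`, `prev_value`, `prev_rn`, `next_value`, `next_rn` and `fitted_value` are illustrative.

```sql
WITH input_data (country, year, value) AS (
  -- sample data from the question
  SELECT 'A', 2000, 1.5  FROM DUAL UNION ALL
  SELECT 'A', 2001, 2.5  FROM DUAL UNION ALL
  SELECT 'A', 2002, null FROM DUAL UNION ALL
  SELECT 'A', 2003, 4.5  FROM DUAL UNION ALL
  SELECT 'B', 2000, null FROM DUAL UNION ALL
  SELECT 'B', 2000, null FROM DUAL UNION ALL
  SELECT 'B', 2002, 5.3  FROM DUAL UNION ALL
  SELECT 'B', 2003, 6.3  FROM DUAL UNION ALL
  SELECT 'C', 2000, 1    FROM DUAL UNION ALL
  SELECT 'C', 2001, null FROM DUAL UNION ALL
  SELECT 'C', 2002, null FROM DUAL UNION ALL
  SELECT 'C', 2003, 4    FROM DUAL
), ordered_input AS (
  -- number the rows within each country in year order
  SELECT i.*,
         ROW_NUMBER() OVER (PARTITION BY country ORDER BY year) AS rn
  FROM input_data i
), neighbours AS (
  SELECT o.*,
         -- nearest known point at or before this row
         LAST_VALUE(value IGNORE NULLS) OVER (
           PARTITION BY country ORDER BY rn
           ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS prev_value,
         LAST_VALUE(CASE WHEN value IS NOT NULL THEN rn END IGNORE NULLS) OVER (
           PARTITION BY country ORDER BY rn
           ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS prev_rn,
         -- nearest known point at or after this row
         FIRST_VALUE(value IGNORE NULLS) OVER (
           PARTITION BY country ORDER BY rn
           ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) AS next_value,
         FIRST_VALUE(CASE WHEN value IS NOT NULL THEN rn END IGNORE NULLS) OVER (
           PARTITION BY country ORDER BY rn
           ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) AS next_rn,
         -- per-country least-squares line, as in the query above
         rn * REGR_SLOPE(value, rn) OVER (PARTITION BY country)
           + REGR_INTERCEPT(value, rn) OVER (PARTITION BY country) AS fitted_value
  FROM ordered_input o
)
SELECT country,
       year,
       value,
       CASE
         -- keep observed values as they are
         WHEN value IS NOT NULL THEN value
         -- gap with a known point on both sides: straight line between them
         WHEN prev_rn IS NOT NULL AND next_rn IS NOT NULL
           THEN prev_value + (next_value - prev_value) * (rn - prev_rn) / (next_rn - prev_rn)
         -- leading or trailing gap: extrapolate along the regression line
         ELSE fitted_value
       END AS interpolated_value
FROM neighbours
ORDER BY country, rn;
```

For the sample data this should return the same values as the expected output in the question. If a country has fewer than two known values, REGR_SLOPE and REGR_INTERCEPT return null, so leading and trailing gaps in that country stay null.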