Self joining next date in table as another date field
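I'm working with an exchange rate table and a transaction table. Each transaction should be joined to the exchange rate that was the most recent one for its currency as of the transaction date. The exchange rate table contains many duplicates, many currencies, and other fields that are irrelevant to this problem.
I planned to join the transactions by using BETWEEN on the transaction date against the start date and end date of each currency's rate, but I don't have an end date field. A rate for a currency ends when the next rate for that currency starts.
For example: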
Currency | Start Date | Rate |
---|---|---|
EUR | 01/12/2021 | 1.25 |
US | 01/12/2021 | 0.75 |
EUR | 25/12/2021 | 1.11 |
US | 10/12/2021 | 0.8 |
EUR | 01/12/2021 | 1.25 |
US | 25/12/2021 | 1.11 |
should become:
Currency | Start Date | Rate | End Date |
---|---|---|---|
EUR | 01/12/2021 | 1.25 | 24/12/2021 |
US | 01/12/2021 | 0.75 | 09/12/2021 |
EUR | 25/12/2021 | 1.11 | today |
US | 10/12/2021 | 0.8 | today |
I was thinking of:
with ordered_currency as (
select distinct Currency, Start_date
from exchange_rate_table
order by currency, start_date asc
)
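This would produce a table of all currencies in date order, with any duplicates removed.
The next step would be to check whether the next row is for the same currency and, if so, take its start date as the current row's end date. If it is not the same currency, just use current_date() as the end_date. This End_date field could then be joined back to the original table.
However, I'm not sure of the syntax for checking whether the next row has the same value in a field (currency, in this case).
Any help would be appreciated!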
"however I don't have an end date field."
It would be a lot easier if you did. You can use LEAD to "get one from the next row":
WITH exchg_from_to AS(
SELECT
Currency,
StartDate,
LEAD(StartDate, 1, TO_DATE('9999-12-31')) OVER(PARTITION BY Currency ORDER BY StartDate) AS EndDate,
Rate
FROM
exchange_rate_table
)
SELECT *
FROM
tran t
JOIN exchg_from_to e
ON
t.TranDate >= e.StartDate AND
t.TranDate < e.EndDate AND
t.Currency = e.Currency
You may need to tweak the TO_DATE a little (I've never used Snowflake, but the docs say LEAD is supported).
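As a rough, runnable sketch of the LEAD idea (not Snowflake: it uses Python's built-in sqlite3 module, which needs SQLite 3.25 or later for window functions), the sample rows from the question can be loaded into a throwaway in-memory table. The ISO date strings and the SELECT DISTINCT step that drops the duplicate row are assumptions added for the illustration.

```python
import sqlite3

# Hypothetical in-memory stand-in for exchange_rate_table. Dates are stored as
# ISO strings so that string order matches date order in SQLite.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE exchange_rate_table (Currency TEXT, StartDate TEXT, Rate REAL)"
)
conn.executemany(
    "INSERT INTO exchange_rate_table VALUES (?, ?, ?)",
    [
        ("EUR", "2021-12-01", 1.25),
        ("US", "2021-12-01", 0.75),
        ("EUR", "2021-12-25", 1.11),
        ("US", "2021-12-10", 0.80),
        ("EUR", "2021-12-01", 1.25),  # duplicate row, as in the question
        ("US", "2021-12-25", 1.11),
    ],
)

# De-duplicate first, then take the next StartDate within the same currency.
# The last rate of each currency gets the far-future placeholder as EndDate.
rate_ranges = conn.execute(
    """
    SELECT Currency,
           StartDate,
           Rate,
           LEAD(StartDate, 1, '9999-12-31')
               OVER (PARTITION BY Currency ORDER BY StartDate) AS EndDate
    FROM (SELECT DISTINCT Currency, StartDate, Rate FROM exchange_rate_table)
    ORDER BY Currency, StartDate
    """
).fetchall()

for row in rate_ranges:
    print(row)
```

Each EndDate here is the next StartDate for the same currency, so the ranges are meant to be used as half-open intervals (start included, end excluded).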
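I don't recommend BETWEEN because it is inclusive at both ends; if you have a tran whose date falls bang on a boundary, it will match twice: once against row N's start date and once against row N-1's end date. Using a >= and < pair ensures there is no overlap.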
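The double match is easy to reproduce. The following is a minimal sketch in Python's sqlite3 (not Snowflake); the rate_ranges table and the single transaction row are made up for the illustration, with the transaction dated exactly on the day one rate ends and the next begins.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Hypothetical pre-computed ranges: EndDate is the next row's StartDate.
    CREATE TABLE rate_ranges (Currency TEXT, StartDate TEXT, EndDate TEXT, Rate REAL);
    INSERT INTO rate_ranges VALUES
        ('EUR', '2021-12-01', '2021-12-25', 1.25),
        ('EUR', '2021-12-25', '9999-12-31', 1.11);
    CREATE TABLE tran (TranId INTEGER, Currency TEXT, TranDate TEXT);
    INSERT INTO tran VALUES (1, 'EUR', '2021-12-25');  -- exactly on the boundary
    """
)

# Inclusive at both ends: the boundary date falls inside both ranges.
between_matches = conn.execute(
    """
    SELECT e.Rate
    FROM tran t
    JOIN rate_ranges e
      ON t.Currency = e.Currency
     AND t.TranDate BETWEEN e.StartDate AND e.EndDate
    ORDER BY e.StartDate
    """
).fetchall()

# Half-open interval: start included, end excluded.
half_open_matches = conn.execute(
    """
    SELECT e.Rate
    FROM tran t
    JOIN rate_ranges e
      ON t.Currency = e.Currency
     AND t.TranDate >= e.StartDate
     AND t.TranDate <  e.EndDate
    ORDER BY e.StartDate
    """
).fetchall()

print(between_matches)    # two rows: the old rate and the new rate
print(half_open_matches)  # one row: only the rate that starts on that date
```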
So another approach, one that doesn't use JOINs, is as follows.
A CTE for the input data:
WITH fake_data(currency, start_date, rate) AS (
SELECT column1, to_date(column2, 'dd/mm/yyyy'), column3
FROM VALUES
('EUR','01/12/2021', 1.25),
('US','01/12/2021', 0.75),
('EUR','25/12/2021', 1.11),
('US','10/12/2021', 0.8),
('EUR','01/12/2021', 1.25),
('US','25/12/2021', 1.11)
)
Then get the end_time for each row (the same as in Caius's answer):
SELECT currency
,start_date
,rate
,lead(start_date,1,current_date) over (order by start_date) as end_date
FROM fake_data
ORDER BY 2;
which gives:
CURRENCY | START_DATE | RATE | END_DATE |
---|---|---|---|
EUR | 2021-12-01 | 1.25 | 2021-12-01 |
US | 2021-12-01 | 0.75 | 2021-12-01 |
EUR | 2021-12-01 | 1.25 | 2021-12-10 |
US | 2021-12-10 | 0.8 | 2021-12-25 |
EUR | 2021-12-25 | 1.11 | 2021-12-25 |
US | 2021-12-25 | 1.11 | 2022-02-02 |
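But now, chaining the results together, we want to filter out the rows that merely continue the same streak, which we can do with LAG and QUALIFY: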
SELECT currency
,start_date
,rate
,lag(currency) over (order by start_date) as lag_currency
FROM fake_data
QUALIFY lag_currency IS NULL OR lag_currency != currency
ORDER BY 2;
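And since the earlier per-row end date is not useful in that context, we can drop it; what we want instead is for the next streak's start to become the streak's end_time: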
SELECT currency
,start_date
,rate
,lead(start_date,1,current_date) over (order by start_date) as end_date
FROM (
SELECT currency
,start_date
,rate
,lag(currency) over (order by start_date) as lag_currency
FROM fake_data
)
WHERE lag_currency IS NULL OR lag_currency != currency
ORDER BY 2;
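I swapped from QUALIFY to an outer WHERE clause, as that seems to be the usual form when using a nested SELECT; either way, the filter has to run before the outer LEAD is evaluated.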
CURRENCY | START_DATE | RATE | END_DATE |
---|---|---|---|
EUR | 2021-12-01 | 1.25 | 2021-12-01 |
US | 2021-12-01 | 0.75 | 2021-12-01 |
EUR | 2021-12-01 | 1.25 | 2021-12-10 |
US | 2021-12-10 | 0.8 | 2021-12-25 |
EUR | 2021-12-25 | 1.11 | 2021-12-25 |
US | 2021-12-25 | 1.11 | 2022-02-02 |
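The filter only changes anything when the same currency appears on consecutive rows. Here is a small sketch of the same filter-then-LEAD shape in Python's sqlite3 (not Snowflake), using sample rows with such runs. As assumptions for the illustration, dates are ISO strings and a fixed '9999-12-31' placeholder replaces current_date so the result does not depend on the day it is run.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fake_data (currency TEXT, start_date TEXT, rate REAL)")
conn.executemany(
    "INSERT INTO fake_data VALUES (?, ?, ?)",
    [
        ("EUR", "2021-12-01", 1.25),
        ("EUR", "2021-12-02", 1.25),
        ("EUR", "2021-12-03", 1.25),
        ("US", "2021-12-04", 0.75),
        ("US", "2021-12-10", 0.80),
        ("EUR", "2021-12-23", 1.11),
        ("EUR", "2021-12-24", 1.11),
        ("US", "2021-12-25", 1.11),
    ],
)

# Inner query: tag each row with the previous row's currency.
# Outer query: keep only rows that start a streak, then LEAD over those rows,
# so end_date is the start of the next streak rather than of the next row.
streaks = conn.execute(
    """
    SELECT currency,
           start_date,
           rate,
           LEAD(start_date, 1, '9999-12-31') OVER (ORDER BY start_date) AS end_date
    FROM (
        SELECT currency,
               start_date,
               rate,
               LAG(currency) OVER (ORDER BY start_date) AS lag_currency
        FROM fake_data
    )
    WHERE lag_currency IS NULL OR lag_currency != currency
    ORDER BY start_date
    """
).fetchall()

for row in streaks:
    print(row)
```

Because WHERE is applied before window functions are computed, the outer LEAD only sees the first row of each streak.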
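Now, there are some things the JOIN approach allows, like aggregating the count of consecutive rows and the values of those rows, but we can do that here too; it just takes a few extra steps.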
WITH fake_data(currency, start_date, rate) AS (
SELECT column1, to_date(column2, 'dd/mm/yyyy'), column3
FROM VALUES
('EUR','01/12/2021', 1.25),
('EUR','02/12/2021', 1.25),
('EUR','03/12/2021', 1.25),
('US','04/12/2021', 0.75),
('US','10/12/2021', 0.8),
('EUR','23/12/2021', 1.11),
('EUR','24/12/2021', 1.11),
('US','25/12/2021', 1.11)
)
SELECT a.*
,nvl(streak_start, lag(streak_start) ignore nulls over (order by start_date)) as streak_date
FROM (
SELECT currency
,start_date
,rate
,lag(currency) over (order by start_date) as lag_currency
,lag_currency IS NULL OR lag_currency != currency as new_row
,iff(new_row, start_date, null) as streak_start
FROM fake_data
) AS a
ORDER BY 2;
This gives a grouping marker, streak_date:
CURRENCY | START_DATE | RATE | LAG_CURRENCY | NEW_ROW | STREAK_START | STREAK_DATE |
---|---|---|---|---|---|---|
EUR | 2021-12-01 | 1.25 | | TRUE | 2021-12-01 | 2021-12-01 |
EUR | 2021-12-02 | 1.25 | EUR | FALSE | | 2021-12-01 |
EUR | 2021-12-03 | 1.25 | EUR | FALSE | | 2021-12-01 |
US | 2021-12-04 | 0.75 | EUR | TRUE | 2021-12-04 | 2021-12-04 |
US | 2021-12-10 | 0.8 | US | FALSE | | 2021-12-04 |
EUR | 2021-12-23 | 1.11 | US | TRUE | 2021-12-23 | 2021-12-23 |
EUR | 2021-12-24 | 1.11 | EUR | FALSE | | 2021-12-23 |
US | 2021-12-25 | 1.11 | EUR | TRUE | 2021-12-25 | 2021-12-25 |
So you can wrap that up and do whatever aggregate math you like:
WITH fake_data(currency, start_date, rate) AS (
SELECT column1, to_date(column2, 'dd/mm/yyyy'), column3
FROM VALUES
('EUR','01/12/2021', 1.25),
('EUR','02/12/2021', 1.26),
('EUR','03/12/2021', 1.27),
('US','04/12/2021', 0.75),
('US','10/12/2021', 0.8),
('EUR','23/12/2021', 1.11),
('EUR','24/12/2021', 1.14),
('US','25/12/2021', 1.11)
)
SELECT b.currency
,min(b.start_date) as start_date
,any_value(streak_end) as end_date
,count(*) as streak_count
,avg(rate) as avg_rate
,min(rate) as min_rate
,max(rate) as max_rate
FROM (
SELECT a.*
,nvl(streak_start, lag(streak_start) ignore nulls over (order by start_date)) as streak_date
,lead(streak_start, 1, current_date) ignore nulls over (order by start_date) as streak_end
FROM (
SELECT currency
,start_date
,rate
,lag(currency) over (order by start_date) as lag_currency
,lag_currency IS NULL OR lag_currency != currency as new_row
,iff(new_row, start_date, null) as streak_start
FROM fake_data
) AS a
) as b
GROUP BY b.streak_date,1
ORDER BY 2;
which gives (the last END_DATE is current_date on the day the query was run):
CURRENCY | START_DATE | END_DATE | STREAK_COUNT | AVG_RATE | MIN_RATE | MAX_RATE |
---|---|---|---|---|---|---|
EUR | 2021-12-01 | 2021-12-04 | 3 | 1.26 | 1.25 | 1.27 |
US | 2021-12-04 | 2021-12-23 | 2 | 0.775 | 0.75 | 0.8 |
EUR | 2021-12-23 | 2021-12-25 | 2 | 1.125 | 1.11 | 1.14 |
US | 2021-12-25 | 2022-02-02 | 1 | 1.11 | 1.11 | 1.11 |
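For engines without LAG ... IGNORE NULLS, the same grouping can be sketched with a swapped-in technique: a running SUM over the "new streak" flags, which numbers the streaks 1, 2, 3 and so on. The version below runs in Python's sqlite3 rather than Snowflake; the streak_id name, the ISO date strings, and the ROUND on the average are assumptions for the illustration, and the end date column is left out to keep it short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fake_data (currency TEXT, start_date TEXT, rate REAL)")
conn.executemany(
    "INSERT INTO fake_data VALUES (?, ?, ?)",
    [
        ("EUR", "2021-12-01", 1.25),
        ("EUR", "2021-12-02", 1.26),
        ("EUR", "2021-12-03", 1.27),
        ("US", "2021-12-04", 0.75),
        ("US", "2021-12-10", 0.80),
        ("EUR", "2021-12-23", 1.11),
        ("EUR", "2021-12-24", 1.14),
        ("US", "2021-12-25", 1.11),
    ],
)

summary = conn.execute(
    """
    WITH flagged AS (
        -- 1 when the currency differs from the previous row (or there is none)
        SELECT currency, start_date, rate,
               CASE WHEN LAG(currency) OVER (ORDER BY start_date) = currency
                    THEN 0 ELSE 1 END AS new_row
        FROM fake_data
    ),
    numbered AS (
        -- running total of the flags: every row in a streak shares one id
        SELECT currency, start_date, rate,
               SUM(new_row) OVER (ORDER BY start_date) AS streak_id
        FROM flagged
    )
    SELECT currency,
           MIN(start_date)     AS start_date,
           COUNT(*)            AS streak_count,
           ROUND(AVG(rate), 3) AS avg_rate,
           MIN(rate)           AS min_rate,
           MAX(rate)           AS max_rate
    FROM numbered
    GROUP BY streak_id, currency
    ORDER BY 2
    """
).fetchall()

for row in summary:
    print(row)
```

The running sum relies on start_date being unique across rows; with ties, the ordering (and so the streak boundaries) would need a tie-breaker column.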