Self joining next date in table as another date field

Question

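I'm trying to use an exchange rate table and a transaction table. How I join a transaction to the exchange rate table depends on the most recent exchange rate for that currency relative to the transaction date. The table contains many duplicates, many types of currency, and other fields that are not relevant to this problem.

I planned to join the transactions using BETWEEN on the transaction date, with the start date and end date of each currency's exchange rate, but I don't have an end date field. The rate for a currency ends when the next rate for that currency starts.

For example: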

Currency	Start Date	Rate
EUR	01/12/2021	1.25
US	01/12/2021	0.75
EUR	25/12/2021	1.11
US	10/12/2021	0.8
EUR	01/12/2021	1.25
US	25/12/2021	1.11

should become:

Currency	Start Date	Rate	End Date
EUR	01/12/2021	1.25	24/12/2021
US	01/12/2021	0.75	09/12/2021
EUR	25/12/2021	1.11	today
US	10/12/2021	0.8	today

I was thinking of:

with ordered_currency as (
select distinct Currency, Start_date
from exchange_rate_table
order by currency, start_date asc
)

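This would produce a table of all currencies in date order, with any duplicates removed.

The next step would be to check whether the next row is for the same currency and, if so, take its start date as the current row's end date. If it is not the same currency, just use current_date() as the end_date. This End_date field could then be joined back to the original table.

However, I'm not sure of the syntax for evaluating whether the next row has the same value in a field (in this case, the currency).

Any help would be appreciated!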

Answer 1

however I don't have an end date field.

It would be a lot easier if you did. You can use LEAD to "get one from the next row":

WITH exchg_from_to AS(
    SELECT
      Currency,
      StartDate,
      -- EndDate is the next StartDate for the same currency;
      -- the latest rate for each currency gets a far-future date instead
      LEAD(StartDate, 1, TO_DATE('9999-12-31')) OVER(PARTITION BY Currency ORDER BY StartDate) AS EndDate,
      Rate
    FROM
      exchange_rate_table
)


SELECT * 
FROM 
  tran t 
  JOIN exchg_from_to e 
  ON 
    t.TranDate >= e.StartDate AND 
    t.TranDate < e.EndDate AND
    t.Currency = e.Currency

You might need to tweak the TO_DATE a little (I've never used Snowflake, but the docs say LEAD is supported).
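As a rough illustration of the LEAD step on its own, here is a sketch that runs against the sample rows from the question. It assumes Snowflake syntax (FROM VALUES and the ::date cast) and reuses the column names from the query above; the deduped CTE is a made-up name, added only because the sample data contains a duplicate row.

```sql
-- Sample rows from the question, inlined so the query is self-contained.
WITH exchange_rate_table(Currency, StartDate, Rate) AS (
    SELECT column1, TO_DATE(column2, 'dd/mm/yyyy'), column3
    FROM VALUES
        ('EUR', '01/12/2021', 1.25),
        ('US',  '01/12/2021', 0.75),
        ('EUR', '25/12/2021', 1.11),
        ('US',  '10/12/2021', 0.8),
        ('EUR', '01/12/2021', 1.25),
        ('US',  '25/12/2021', 1.11)
),
-- Hypothetical helper CTE: drop exact duplicate rows before windowing.
deduped AS (
    SELECT DISTINCT Currency, StartDate, Rate
    FROM exchange_rate_table
)
SELECT
    Currency,
    StartDate,
    Rate,
    -- Next StartDate within the same currency; far-future date for the latest rate.
    LEAD(StartDate, 1, '9999-12-31'::date)
        OVER (PARTITION BY Currency ORDER BY StartDate) AS EndDate
FROM deduped
ORDER BY Currency, StartDate;
-- Expected rows:
-- EUR  2021-12-01  1.25  2021-12-25
-- EUR  2021-12-25  1.11  9999-12-31
-- US   2021-12-01  0.75  2021-12-10
-- US   2021-12-10  0.8   2021-12-25
-- US   2021-12-25  1.11  9999-12-31
```

Note that EndDate here is exclusive (it equals the next StartDate), which is what the >= and < join conditions expect. The question's example shows an inclusive end date one day earlier.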

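I don't recommend BETWEEN because it is inclusive at both ends; if you have a tran whose date is bang on a boundary date, it will match twice: once against row N's start date and once against row N-1's end date. Using a >= and < pair ensures there is no overlap.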
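The double match is easy to reproduce. The following is a small sketch with made-up data (two consecutive EUR ranges and one transaction dated exactly on the boundary), again assuming Snowflake's FROM VALUES syntax; the ranges and tran CTEs and their columns are illustrative only.

```sql
-- Two consecutive rate ranges that share the boundary date 2021-12-25.
WITH ranges(Currency, StartDate, EndDate, Rate) AS (
    SELECT column1, column2::date, column3::date, column4
    FROM VALUES
        ('EUR', '2021-12-01', '2021-12-25', 1.25),
        ('EUR', '2021-12-25', '9999-12-31', 1.11)
),
-- One transaction dated exactly on the boundary.
tran(TranId, Currency, TranDate) AS (
    SELECT column1, column2, column3::date
    FROM VALUES (1, 'EUR', '2021-12-25')
)
SELECT
    -- BETWEEN is inclusive at both ends, so both ranges match.
    SUM(CASE WHEN t.TranDate BETWEEN e.StartDate AND e.EndDate
             THEN 1 ELSE 0 END) AS between_matches,
    -- Half-open comparison: only the range starting on 2021-12-25 matches.
    SUM(CASE WHEN t.TranDate >= e.StartDate AND t.TranDate < e.EndDate
             THEN 1 ELSE 0 END) AS half_open_matches
FROM tran t
JOIN ranges e
    ON t.Currency = e.Currency;
-- between_matches = 2, half_open_matches = 1
```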

Answer 2

So another method, one that doesn't use JOINs, is:

With a CTE for the input data:

WITH fake_data(currency, start_date, rate) AS (
    SELECT column1, to_date(column2, 'dd/mm/yyyy'), column3 
    FROM VALUES 
        ('EUR','01/12/2021', 1.25),
        ('US','01/12/2021', 0.75),
        ('EUR','25/12/2021', 1.11),
        ('US','10/12/2021', 0.8),
        ('EUR','01/12/2021', 1.25),
        ('US','25/12/2021', 1.11)
)

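Then get the end_date for each row (the same LEAD idea as Caius's answer, but ordered by start_date across all currencies rather than partitioned by currency):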

SELECT currency
    ,start_date
    ,rate
    ,lead(start_date,1,current_date) over (order by start_date) as end_date
FROM fake_data
ORDER BY 2;

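This gives the following (the final END_DATE, 2022-02-02, is current_date on the day the query was run):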

CURRENCY	START_DATE	RATE	END_DATE
EUR	2021-12-01	1.25	2021-12-01
US	2021-12-01	0.75	2021-12-01
EUR	2021-12-01	1.25	2021-12-10
US	2021-12-10	0.8	2021-12-25
EUR	2021-12-25	1.11	2021-12-25
US	2021-12-25	1.11	2022-02-02

But now, to chain the results together, we want to filter out the rows that continue the same streak, which we can do with LAG and QUALIFY:

SELECT currency
    ,start_date
    ,rate
    ,lag(currency) over (order by start_date) as lag_currency
FROM fake_data
QUALIFY lag_currency IS NULL OR lag_currency != currency
ORDER BY 2;

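And given that the prior per-row end date is not useful in that context, we can drop it; what we want now is for the next streak's start to be the current streak's end_date: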

SELECT currency
    ,start_date
    ,rate
    ,lead(start_date,1,current_date) over (order by start_date) as end_date
FROM (
    SELECT currency
        ,start_date
        ,rate
        ,lag(currency) over (order by start_date) as lag_currency
    FROM fake_data
)
WHERE lag_currency IS NULL OR lag_currency != currency
ORDER BY 2;

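I swapped from QUALIFY to an outer WHERE clause, as nested SELECTs seem to be the common form. The nesting also matters here: the outer LEAD has to run over the already-filtered rows, whereas a QUALIFY in the same SELECT would filter only after LEAD had been computed. This gives: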

CURRENCY	START_DATE	RATE	END_DATE
EUR	2021-12-01	1.25	2021-12-01
US	2021-12-01	0.75	2021-12-01
EUR	2021-12-01	1.25	2021-12-10
US	2021-12-10	0.8	2021-12-25
EUR	2021-12-25	1.11	2021-12-25
US	2021-12-25	1.11	2022-02-02

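Now, there are some things the JOIN method allows, like aggregating the count of consecutive rows and the values of those rows, but we can do that here too; it just takes a few extra steps.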

WITH fake_data(currency, start_date, rate) AS (
    SELECT column1, to_date(column2, 'dd/mm/yyyy'), column3 
    FROM VALUES 
        ('EUR','01/12/2021', 1.25),
        ('EUR','02/12/2021', 1.25),
        ('EUR','03/12/2021', 1.25),
        ('US','04/12/2021', 0.75),
        ('US','10/12/2021', 0.8),
        ('EUR','23/12/2021', 1.11),
        ('EUR','24/12/2021', 1.11),
        ('US','25/12/2021', 1.11)
)
SELECT a.*
    ,nvl(streak_start, lag(streak_start)ignore nulls over(order by start_date)) as streak_date
FROM (
    SELECT currency
        ,start_date
        ,rate
        ,lag(currency) over (order by start_date) as lag_currency
        ,lag_currency IS NULL OR lag_currency != currency as new_row
        ,iff(new_row, start_date, null) as streak_start
    FROM fake_data
) AS a
ORDER BY 2;

This gives a grouping marker, streak_date:

CURRENCY	START_DATE	RATE	LAG_CURRENCY	NEW_ROW	STREAK_START	STREAK_DATE
EUR	2021-12-01	1.25		TRUE	2021-12-01	2021-12-01
EUR	2021-12-02	1.25	EUR	FALSE		2021-12-01
EUR	2021-12-03	1.25	EUR	FALSE		2021-12-01
US	2021-12-04	0.75	EUR	TRUE	2021-12-04	2021-12-04
US	2021-12-10	0.8	US	FALSE		2021-12-04
EUR	2021-12-23	1.11	US	TRUE	2021-12-23	2021-12-23
EUR	2021-12-24	1.11	EUR	FALSE		2021-12-23
US	2021-12-25	1.11	EUR	TRUE	2021-12-25	2021-12-25
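An alternative way to build a grouping marker, shown here only as a sketch and not as part of the method above, is a running count of the rows where the currency changes. Each streak then gets an integer streak_id (a name made up for this example) instead of a date. It assumes the same fake_data rows and Snowflake's IFF function.

```sql
WITH fake_data(currency, start_date, rate) AS (
    SELECT column1, to_date(column2, 'dd/mm/yyyy'), column3
    FROM VALUES
        ('EUR','01/12/2021', 1.25),
        ('EUR','02/12/2021', 1.25),
        ('EUR','03/12/2021', 1.25),
        ('US','04/12/2021', 0.75),
        ('US','10/12/2021', 0.8),
        ('EUR','23/12/2021', 1.11),
        ('EUR','24/12/2021', 1.11),
        ('US','25/12/2021', 1.11)
)
SELECT a.currency
    ,a.start_date
    ,a.rate
    -- Running count of "currency changed" rows: increases by 1 at each new streak.
    ,sum(iff(a.lag_currency IS NULL OR a.lag_currency != a.currency, 1, 0))
        over (order by a.start_date) as streak_id
FROM (
    SELECT currency
        ,start_date
        ,rate
        ,lag(currency) over (order by start_date) as lag_currency
    FROM fake_data
) AS a
ORDER BY a.start_date;
-- streak_id for the eight rows, in start_date order: 1, 1, 1, 2, 2, 3, 3, 4
```

Grouping by streak_id would then play the same role as grouping by streak_date.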

So you can wrap that up and do whatever aggregate math you like:

WITH fake_data(currency, start_date, rate) AS (
    SELECT column1, to_date(column2, 'dd/mm/yyyy'), column3 
    FROM VALUES 
        ('EUR','01/12/2021', 1.25),
        ('EUR','02/12/2021', 1.26),
        ('EUR','03/12/2021', 1.27),
        ('US','04/12/2021', 0.75),
        ('US','10/12/2021', 0.8),
        ('EUR','23/12/2021', 1.11),
        ('EUR','24/12/2021', 1.14),
        ('US','25/12/2021', 1.11)
)
SELECT b.currency
    ,min(b.start_date) as start_date
    ,any_value(streak_end) as end_date
    ,count(*) as streak_count
    ,avg(rate) as avg_rate
    ,min(rate) as min_rate
    ,max(rate) as max_rate
FROM (
    SELECT a.*
        ,nvl(streak_start, lag(streak_start)ignore nulls over(order by start_date)) as streak_date
        ,lead(streak_start,1,current_date) ignore nulls over (order by start_date) as streak_end
    FROM (
        SELECT currency
            ,start_date
            ,rate
            ,lag(currency) over (order by start_date) as lag_currency
            ,lag_currency IS NULL OR lag_currency != currency as new_row
            ,iff(new_row, start_date, null) as streak_start
        FROM fake_data
    ) AS a
) as b
GROUP BY b.streak_date,1
ORDER BY 2;

CURRENCY	START_DATE	END_DATE	STREAK_COUNT	AVG_RATE	MIN_RATE	MAX_RATE
EUR	2021-12-01	2021-12-04	3	1.26	1.25	1.27
US	2021-12-04	2021-12-23	2	0.775	0.75	0.8
EUR	2021-12-23	2021-12-25	2	1.125	1.11	1.14
US	2021-12-25	2022-02-02	1	1.11	1.11	1.11

sql

snowflake-cloud-data-platform