Self joining next date in table as another date field

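I'm trying to work with an exchange rate table and a transaction table. How a transaction gets joined to the exchange rate table depends on when the most recent rate for that transaction's currency took effect, relative to the transaction date. The tables contain many duplicates, many currencies, and other fields that are irrelevant to this question.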

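I planned to join the transactions with BETWEEN, comparing the transaction date against each currency rate's start date and end date, but I don't have an end date field. A currency's rate ends when the next rate for that same currency starts.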

For example (dates are dd/mm/yyyy):

Currency  Start Date  Rate
EUR       01/12/2021  1.25
US        01/12/2021  0.75
EUR       25/12/2021  1.11
US        10/12/2021  0.8
EUR       01/12/2021  1.25
US        25/12/2021  1.11

should become:

Currency  Start Date  Rate  End Date
EUR       01/12/2021  1.25  24/12/2021
US        01/12/2021  0.75  09/12/2021
EUR       25/12/2021  1.11  today
US        10/12/2021  0.8   24/12/2021
US        25/12/2021  1.11  today

I was thinking:

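with ordered_currency as (
    select distinct Currency, Start_date
    from exchange_rate_table
)
-- an ORDER BY inside a CTE does not guarantee the order of the final result,
-- so sort in the outer query instead
select *
from ordered_currency
order by Currency, Start_date;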

This gives every currency/start-date pair in date order, with duplicates removed.

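The next step is to check whether the next row is the same currency, and if so, use its start date as the current row's end date. If it isn't the same currency, just use current_date() as the end_date. This End_date field can then be joined back to the original table.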
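
A minimal sketch of that plan in plain Python (not SQL, and not Snowflake-specific), assuming the rows fit in memory: deduplicate, sort by currency and start date, then look one row ahead. The function name `add_end_dates` and its `today` parameter are hypothetical; the end date is taken as the day before the next start date, to match the expected output above.

```python
from datetime import date, timedelta

# Sample rows (currency, start_date, rate) from the question's example.
rows = [
    ("EUR", date(2021, 12, 1), 1.25),
    ("US", date(2021, 12, 1), 0.75),
    ("EUR", date(2021, 12, 25), 1.11),
    ("US", date(2021, 12, 10), 0.8),
    ("EUR", date(2021, 12, 1), 1.25),  # exact duplicate
    ("US", date(2021, 12, 25), 1.11),
]


def add_end_dates(rows, today):
    """Attach an end date to each distinct (currency, start_date, rate) row.

    The end date is the day before the next start date for the same
    currency, or `today` when there is no later rate for that currency.
    """
    # Equivalent of SELECT DISTINCT ... ORDER BY currency, start_date.
    ordered = sorted(set(rows), key=lambda r: (r[0], r[1]))
    result = []
    for i, (currency, start, rate) in enumerate(ordered):
        nxt = ordered[i + 1] if i + 1 < len(ordered) else None
        if nxt is not None and nxt[0] == currency:
            # The next row is the same currency: this rate ends the day before.
            end = nxt[1] - timedelta(days=1)
        else:
            # Last rate for this currency: still in force.
            end = today
        result.append((currency, start, rate, end))
    return result


for row in add_end_dates(rows, date(2022, 2, 2)):
    print(row)
```

If you later switch to a half-open `>= start AND < end` join, keep the next start date itself as the end date instead of subtracting a day.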

However, I'm not sure of the syntax for checking whether the next row has the same value in a field (in this case, currency).

Any help would be greatly appreciated!

however I don't have an end date field.

It would be a lot easier if you did. You can use LEAD to "get one from the next row":
WITH exchg_from_to AS(
    SELECT
      Currency,
      StartDate,
      -- the next start date for the same currency becomes this row's (exclusive) end date
      LEAD(StartDate, 1, TO_DATE('9999-12-31')) OVER (PARTITION BY Currency ORDER BY StartDate) AS EndDate,
      Rate
    FROM
      exchange_rate_table
)


SELECT * 
FROM 
  tran t 
  JOIN exchg_from_to e 
  ON 
    t.TranDate >= e.StartDate AND 
    t.TranDate < e.EndDate AND
    t.Currency = e.Currency
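
To sanity-check the LEAD plus half-open join logic locally, here is a sketch that runs the same query shape against an in-memory SQLite database from Python's standard library. SQLite is only a stand-in for Snowflake here, under stated assumptions: it needs SQLite 3.25 or newer for window functions, dates are stored as ISO-8601 text (which sorts chronologically), and a plain '9999-12-31' string replaces TO_DATE. The transaction rows and the TranId column are made up for illustration.

```python
import sqlite3

# In-memory SQLite database standing in for Snowflake (assumes SQLite >= 3.25).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE exchange_rate_table (Currency TEXT, StartDate TEXT, Rate REAL);
    INSERT INTO exchange_rate_table VALUES
        ('EUR', '2021-12-01', 1.25),
        ('US',  '2021-12-01', 0.75),
        ('EUR', '2021-12-25', 1.11),
        ('US',  '2021-12-10', 0.8);
    -- Hypothetical transactions; TranId is made up for the example.
    CREATE TABLE tran (TranId INTEGER, Currency TEXT, TranDate TEXT);
    INSERT INTO tran VALUES
        (1, 'EUR', '2021-12-24'),
        (2, 'EUR', '2021-12-25'),  -- exactly on a rate boundary
        (3, 'US',  '2021-12-05');
""")

query = """
WITH exchg_from_to AS (
    SELECT Currency,
           StartDate,
           -- next StartDate for the same currency, or a far-future sentinel
           LEAD(StartDate, 1, '9999-12-31')
               OVER (PARTITION BY Currency ORDER BY StartDate) AS EndDate,
           Rate
    FROM exchange_rate_table
)
SELECT t.TranId, t.TranDate, e.Rate
FROM tran t
JOIN exchg_from_to e
  ON t.TranDate >= e.StartDate
 AND t.TranDate <  e.EndDate
 AND t.Currency =  e.Currency
ORDER BY t.TranId
"""
rows = conn.execute(query).fetchall()
print(rows)
```

Each transaction comes back exactly once, and the one dated on the boundary picks up the newer rate.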

You may need to tweak the TO_DATE call a little (I've never used Snowflake, but the docs say LEAD is supported).

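I don't recommend BETWEEN, because it is inclusive at both ends: if a transaction falls bang on a boundary date, it will match twice, once against row N's start date and once against row N-1's end date. Using a >= and < pair ensures there is no overlap.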
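
A tiny Python illustration of that boundary problem, using made-up periods where each end date equals the next start date (the way LEAD produces them):

```python
from datetime import date

# Two consecutive rate periods: (start, end, rate), end == next period's start.
periods = [
    (date(2021, 12, 1), date(2021, 12, 25), 1.25),
    (date(2021, 12, 25), date(9999, 12, 31), 1.11),
]
tran_date = date(2021, 12, 25)  # falls exactly on the boundary

# BETWEEN semantics: start <= d <= end (inclusive at both ends).
between_matches = [r for s, e, r in periods if s <= tran_date <= e]
# Half-open semantics: start <= d < end.
half_open_matches = [r for s, e, r in periods if s <= tran_date < e]

print(between_matches)    # [1.25, 1.11]: the transaction matches twice
print(half_open_matches)  # [1.11]: exactly one match
```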

So another way to do it, without JOINs, is:

A CTE for the input data:

WITH fake_data(currency, start_date, rate) AS (
    SELECT column1, to_date(column2, 'dd/mm/yyyy'), column3 
    FROM VALUES 
        ('EUR','01/12/2021', 1.25),
        ('US','01/12/2021', 0.75),
        ('EUR','25/12/2021', 1.11),
        ('US','10/12/2021', 0.8),
        ('EUR','01/12/2021', 1.25),
        ('US','25/12/2021', 1.11)
)

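Then get the end_date for each row (the same LEAD idea as Caius's answer, but without PARTITION BY, so all currencies form a single timeline):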

SELECT currency
    ,start_date
    ,rate
    ,lead(start_date,1,current_date) over (order by start_date) as end_date
FROM fake_data
ORDER BY 2;

This gives:

CURRENCY START_DATE RATE END_DATE
EUR 2021-12-01 1.25 2021-12-01
US 2021-12-01 0.75 2021-12-01
EUR 2021-12-01 1.25 2021-12-10
US 2021-12-10 0.8 2021-12-25
EUR 2021-12-25 1.11 2021-12-25
US 2021-12-25 1.11 2022-02-02

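But now, to chain the results together, we want to filter out consecutive rows of the same currency (streaks), which we can do with LAG and QUALIFY: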

SELECT currency
    ,start_date
    ,rate
    ,lag(currency) over (order by start_date) as lag_currency
FROM fake_data
QUALIFY lag_currency IS NULL OR lag_currency != currency
ORDER BY 2;
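
The LAG/QUALIFY step boils down to "keep a row only if its currency differs from the previous row's". A Python sketch of that rule, assuming the rows are already sorted by start_date with no ties; the sample rows are the ones from the streak example further down, and `streak_starts` is a hypothetical helper name:

```python
from datetime import date

# Rows already ordered by start_date (currency, start_date, rate).
rows = [
    ("EUR", date(2021, 12, 1), 1.25),
    ("EUR", date(2021, 12, 2), 1.25),
    ("EUR", date(2021, 12, 3), 1.25),
    ("US", date(2021, 12, 4), 0.75),
    ("US", date(2021, 12, 10), 0.8),
    ("EUR", date(2021, 12, 23), 1.11),
    ("EUR", date(2021, 12, 24), 1.11),
    ("US", date(2021, 12, 25), 1.11),
]


def streak_starts(rows):
    """Mimic lag(currency) over (order by start_date) plus the QUALIFY filter."""
    kept = []
    prev_currency = None  # LAG returns NULL for the first row
    for currency, start, rate in rows:
        if prev_currency is None or prev_currency != currency:
            kept.append((currency, start, rate))
        prev_currency = currency
    return kept


starts = streak_starts(rows)
for row in starts:
    print(row)
```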

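Since the per-row end date from the previous step isn't useful in this context, we can drop it; instead we want the start date of the next streak to become the current streak's end date: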

SELECT currency
    ,start_date
    ,rate
    ,lead(start_date,1,current_date) over (order by start_date) as end_date
FROM (
    SELECT currency
        ,start_date
        ,rate
        ,lag(currency) over (order by start_date) as lag_currency
    FROM fake_data
)
WHERE lag_currency IS NULL OR lag_currency != currency
ORDER BY 2;

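I switched from QUALIFY to a WHERE clause on the outer query, as that seems more usual with a nested SELECT. It also matters for the result: WHERE is applied before the window functions in the same SELECT are evaluated, so LEAD only sees the streak-start rows.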

CURRENCY START_DATE RATE END_DATE
EUR 2021-12-01 1.25 2021-12-01
US 2021-12-01 0.75 2021-12-01
EUR 2021-12-01 1.25 2021-12-10
US 2021-12-10 0.8 2021-12-25
EUR 2021-12-25 1.11 2021-12-25
US 2021-12-25 1.11 2022-02-02
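
Whether the filter runs before or after LEAD can be checked with a sketch against SQLite from Python. This rests on assumptions: SQLite 3.25+ is needed, SQLite has no QUALIFY (so the second query emulates it by filtering in an outer query after LEAD has already run), the fixed '2022-02-02' replaces current_date so the result is reproducible, and the sample rows are made up:

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # assumes SQLite >= 3.25 for window functions
conn.executescript("""
    CREATE TABLE fake_data (currency TEXT, start_date TEXT, rate REAL);
    INSERT INTO fake_data VALUES
        ('EUR', '2021-12-01', 1.25), ('EUR', '2021-12-02', 1.25),
        ('US',  '2021-12-04', 0.75), ('US',  '2021-12-10', 0.8),
        ('EUR', '2021-12-23', 1.11);
""")

# Filter first (outer WHERE), then LEAD over the surviving streak-start rows.
filter_then_lead = """
SELECT currency, start_date,
       LEAD(start_date, 1, '2022-02-02') OVER (ORDER BY start_date) AS end_date
FROM (
    SELECT currency, start_date,
           LAG(currency) OVER (ORDER BY start_date) AS lag_currency
    FROM fake_data
)
WHERE lag_currency IS NULL OR lag_currency != currency
ORDER BY start_date
"""

# LEAD computed before filtering (what QUALIFY in the same SELECT would do):
# each kept row still points at its immediate neighbour, not the next streak.
lead_then_filter = """
SELECT currency, start_date, end_date
FROM (
    SELECT currency, start_date,
           LAG(currency) OVER (ORDER BY start_date) AS lag_currency,
           LEAD(start_date, 1, '2022-02-02') OVER (ORDER BY start_date) AS end_date
    FROM fake_data
)
WHERE lag_currency IS NULL OR lag_currency != currency
ORDER BY start_date
"""

print(conn.execute(filter_then_lead).fetchall())
print(conn.execute(lead_then_filter).fetchall())
```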

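Now, there are things the JOIN approach makes possible, such as counting the consecutive rows and aggregating their values, but we can do that here too; it just takes a couple of extra steps.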

WITH fake_data(currency, start_date, rate) AS (
    SELECT column1, to_date(column2, 'dd/mm/yyyy'), column3 
    FROM VALUES 
        ('EUR','01/12/2021', 1.25),
        ('EUR','02/12/2021', 1.25),
        ('EUR','03/12/2021', 1.25),
        ('US','04/12/2021', 0.75),
        ('US','10/12/2021', 0.8),
        ('EUR','23/12/2021', 1.11),
        ('EUR','24/12/2021', 1.11),
        ('US','25/12/2021', 1.11)
)
SELECT a.*
    ,nvl(streak_start, lag(streak_start) ignore nulls over (order by start_date)) as streak_date
FROM (
    SELECT currency
        ,start_date
        ,rate
        ,lag(currency) over (order by start_date) as lag_currency
        ,lag_currency IS NULL OR lag_currency != currency as new_row
        ,iff(new_row, start_date, null) as streak_start
    FROM fake_data
) AS a
ORDER BY 2;

This gives a grouping marker, streak_date:

CURRENCY  START_DATE  RATE  LAG_CURRENCY  NEW_ROW  STREAK_START  STREAK_DATE
EUR       2021-12-01  1.25  NULL          TRUE     2021-12-01    2021-12-01
EUR       2021-12-02  1.25  EUR           FALSE    NULL          2021-12-01
EUR       2021-12-03  1.25  EUR           FALSE    NULL          2021-12-01
US        2021-12-04  0.75  EUR           TRUE     2021-12-04    2021-12-04
US        2021-12-10  0.8   US            FALSE    NULL          2021-12-04
EUR       2021-12-23  1.11  US            TRUE     2021-12-23    2021-12-23
EUR       2021-12-24  1.11  EUR           FALSE    NULL          2021-12-23
US        2021-12-25  1.11  EUR           TRUE     2021-12-25    2021-12-25
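
streak_date is a "gaps and islands" grouping key. A Python sketch of the same idea uses itertools.groupby, which only groups consecutive equal keys, so each group is exactly one streak (rows assumed sorted by start_date; `streak_dates` is a hypothetical name):

```python
from datetime import date
from itertools import groupby

# Same rows as the query above, sorted by start_date.
rows = [
    ("EUR", date(2021, 12, 1), 1.25),
    ("EUR", date(2021, 12, 2), 1.25),
    ("EUR", date(2021, 12, 3), 1.25),
    ("US", date(2021, 12, 4), 0.75),
    ("US", date(2021, 12, 10), 0.8),
    ("EUR", date(2021, 12, 23), 1.11),
    ("EUR", date(2021, 12, 24), 1.11),
    ("US", date(2021, 12, 25), 1.11),
]


def streak_dates(rows):
    """Tag each row with the start date of the same-currency run it belongs to.

    Plays the role of nvl(streak_start, lag(streak_start) ignore nulls ...).
    """
    tagged = []
    # groupby starts a new group whenever the currency changes.
    for _currency, group in groupby(rows, key=lambda r: r[0]):
        group = list(group)
        streak_date = group[0][1]  # the first row's start_date opens the streak
        for currency, start, rate in group:
            tagged.append((currency, start, rate, streak_date))
    return tagged


tagged = streak_dates(rows)
for row in tagged:
    print(row)
```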

So you can wrap that up and do whatever aggregate maths you like:

WITH fake_data(currency, start_date, rate) AS (
    SELECT column1, to_date(column2, 'dd/mm/yyyy'), column3 
    FROM VALUES 
        ('EUR','01/12/2021', 1.25),
        ('EUR','02/12/2021', 1.26),
        ('EUR','03/12/2021', 1.27),
        ('US','04/12/2021', 0.75),
        ('US','10/12/2021', 0.8),
        ('EUR','23/12/2021', 1.11),
        ('EUR','24/12/2021', 1.14),
        ('US','25/12/2021', 1.11)
)
SELECT b.currency
    ,min(b.start_date) as start_date
    ,any_value(streak_end) as end_date
    ,count(*) as streak_count
    ,avg(rate) as avg_rate
    ,min(rate) as min_rate
    ,max(rate) as max_rate
FROM (
    SELECT a.*
        ,nvl(streak_start, lag(streak_start) ignore nulls over (order by start_date)) as streak_date
        ,lead(streak_start,1,current_date) ignore nulls over (order by start_date) as streak_end
    FROM (
        SELECT currency
            ,start_date
            ,rate
            ,lag(currency) over (order by start_date) as lag_currency
            ,lag_currency IS NULL OR lag_currency != currency as new_row
            ,iff(new_row, start_date, null) as streak_start
        FROM fake_data
    ) AS a
) as b
GROUP BY b.streak_date,1
ORDER BY 2;

CURRENCY  START_DATE  END_DATE    STREAK_COUNT  AVG_RATE  MIN_RATE  MAX_RATE
EUR       2021-12-01  2021-12-04  3             1.26      1.25      1.27
US        2021-12-04  2021-12-23  2             0.775     0.75      0.8
EUR       2021-12-23  2021-12-25  2             1.125     1.11      1.14
US        2021-12-25  2022-02-02  1             1.11      1.11      1.11
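
To double-check those aggregates, here is a Python sketch over the same eight rows. It assumes they are sorted by start_date and uses Decimal so the averages are exact. `summarise_streaks` is a hypothetical helper, and the fixed `today` stands in for current_date:

```python
from datetime import date
from decimal import Decimal
from itertools import groupby

# Rows from the last query, sorted by start_date.
rows = [
    ("EUR", date(2021, 12, 1), Decimal("1.25")),
    ("EUR", date(2021, 12, 2), Decimal("1.26")),
    ("EUR", date(2021, 12, 3), Decimal("1.27")),
    ("US", date(2021, 12, 4), Decimal("0.75")),
    ("US", date(2021, 12, 10), Decimal("0.8")),
    ("EUR", date(2021, 12, 23), Decimal("1.11")),
    ("EUR", date(2021, 12, 24), Decimal("1.14")),
    ("US", date(2021, 12, 25), Decimal("1.11")),
]


def summarise_streaks(rows, today):
    """One summary row per run of consecutive same-currency rows."""
    streaks = [list(g) for _, g in groupby(rows, key=lambda r: r[0])]
    summary = []
    for i, streak in enumerate(streaks):
        rates = [rate for _, _, rate in streak]
        # A streak ends when the next streak starts, or today for the last one.
        end = streaks[i + 1][0][1] if i + 1 < len(streaks) else today
        summary.append({
            "currency": streak[0][0],
            "start_date": streak[0][1],
            "end_date": end,
            "streak_count": len(streak),
            "avg_rate": sum(rates) / len(rates),
            "min_rate": min(rates),
            "max_rate": max(rates),
        })
    return summary


for s in summarise_streaks(rows, date(2022, 2, 2)):
    print(s)
```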