SQL: How to "aggregate sequences"?
My problem is a bit hard to explain.
I have a table that looks like this:
column1 | column2 | date
------------------------------------------
u01       test      2001-01-01
u01       test      2001-02-01
u01       test2     2001-03-01
u01       test2     2001-04-01
u01       test3     2001-05-01
u01       test      2001-06-01
In my destination table I want to aggregate identical values, but only if they "follow" each other. That means my destination table should look like this:
column1 | column2 | validfrom  | validto
------------------------------------------
u01       test      2001-01-01   2001-03-01
u01       test2     2001-03-01   2001-05-01
u01       test3     2001-05-01   2001-06-01
u01       test      2001-06-01
I tried using rownumber, so at the moment I have some numbered rows, but the problem remains: I don't know how to "aggregate sequences".
Any ideas or approaches are welcome!
This is a gaps-and-islands problem. Here is a solution using row numbers:
select column1, column2, min(date), max(date)
from (select t.*,
             row_number() over (partition by column1 order by date) as seqnum_1,
             row_number() over (partition by column1, column2 order by date) as seqnum_2
      from t
     ) t
group by column1, column2, (seqnum_1 - seqnum_2);
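Why this works is a little tricky to explain, but it becomes fairly obvious once you look at the results of the subquery. Within a run of consecutive rows that share the same column2, both row numbers increase by one per row, so their difference stays constant; as soon as a different value interrupts the run, seqnum_1 keeps counting while seqnum_2 does not, and the difference changes. That difference, together with column2, identifies the groups you are looking for.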
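To make the subquery result concrete, here is a minimal sketch that runs the inner query on the sample data from the question. It assumes an in-memory SQLite database (3.25 or later, for window functions) accessed through Python's sqlite3 module; the helper name seqnum_rows is made up for illustration and the original answer does not target any particular database.

```python
import sqlite3

# Sample rows copied from the question.
ROWS = [
    ("u01", "test",  "2001-01-01"),
    ("u01", "test",  "2001-02-01"),
    ("u01", "test2", "2001-03-01"),
    ("u01", "test2", "2001-04-01"),
    ("u01", "test3", "2001-05-01"),
    ("u01", "test",  "2001-06-01"),
]


def seqnum_rows():
    """Return (column2, date, seqnum_1, seqnum_2, diff) for every sample row."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE t (column1 TEXT, column2 TEXT, date TEXT)")
    con.executemany("INSERT INTO t VALUES (?, ?, ?)", ROWS)
    rows = con.execute(
        """
        SELECT column2, date, seqnum_1, seqnum_2, seqnum_1 - seqnum_2 AS diff
        FROM (SELECT t.*,
                     ROW_NUMBER() OVER (PARTITION BY column1 ORDER BY date) AS seqnum_1,
                     ROW_NUMBER() OVER (PARTITION BY column1, column2 ORDER BY date) AS seqnum_2
              FROM t) AS s
        ORDER BY date
        """
    ).fetchall()
    con.close()
    return rows


if __name__ == "__main__":
    for row in seqnum_rows():
        print(row)
```

For the six sample rows the diff column comes out as 0, 0, 2, 2, 4, 3. The first and the last "test" rows therefore land in different groups (diff 0 versus diff 3), which is exactly the split the question asks for.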
This should work in your case:
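-- MySQL-style user variables. Each derived table gets its own set, because a
-- shared counter would keep counting from one numbering pass to the next.
-- Note: this relies on the select list being evaluated left to right in
-- ORDER BY order, which MySQL does not formally guarantee.
SET @rnum_a = 0, @col1_a = NULL, @col2_a = NULL;
SET @rnum_b = 0, @col1_b = NULL, @col2_b = NULL;

SELECT GG.col1 AS col1, GG.col2 AS col2, GG.aamin AS valid_from, BB.bbmin AS valid_to
FROM (
    -- first date of every sequence
    SELECT col1, col2, num AS aanum, MIN(DATE_1) AS aamin
    FROM (
        SELECT @rnum_a := CASE
                   -- same values as the previous row: stay in the current sequence
                   WHEN @col1_a = column1 AND @col2_a = column2
                   THEN @rnum_a
                   ELSE @rnum_a + 1
               END AS num,
               @col1_a := column1 AS col1,
               @col2_a := column2 AS col2,
               DATE_1
        FROM test t
        ORDER BY column1, DATE_1
    ) AA
    GROUP BY col1, col2, num
) GG
LEFT JOIN (
    -- same numbering again: the first date of sequence n + 1 ends sequence n
    SELECT col1, num AS bbnum, MIN(DATE_1) AS bbmin
    FROM (
        SELECT @rnum_b := CASE
                   WHEN @col1_b = column1 AND @col2_b = column2
                   THEN @rnum_b
                   ELSE @rnum_b + 1
               END AS num,
               @col1_b := column1 AS col1,
               @col2_b := column2 AS col2,
               DATE_1
        FROM test t
        ORDER BY column1, DATE_1
    ) AB
    GROUP BY col1, num
) BB
    ON GG.aanum + 1 = BB.bbnum
   AND GG.col1 = BB.col1
ORDER BY GG.aanum;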
Teradata has a nice extension for normalizing periods:
SELECT
   column1
   ,column2
   -- split the Period into separate columns again
   ,Begin(pd)
   ,NullIf(End(pd), DATE '9999-12-31')
FROM
 (
   SELECT NORMALIZE -- merge periods that overlap or meet
      column1
      ,column2
      -- NORMALIZE only works with periods, so create a Period based on current & next row
      ,PERIOD(date
             ,Coalesce(Lead(date)
                       Over (PARTITION BY column1
                             ORDER BY date)
                      ,DATE '9999-12-31')
             ) AS pd
   FROM tab
 ) AS dt
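NORMALIZE is specific to Teradata. As a rough sketch of what it does for this data, the following builds the same per-row periods (current date up to the next row's date, with '9999-12-31' as the open end) and then merges adjacent periods with a row-number difference instead of NORMALIZE. It is an emulation rather than the Teradata feature itself, it assumes SQLite 3.25 or later through Python's sqlite3 module, and the function name normalized_periods is made up for illustration.

```python
import sqlite3

# Sample rows copied from the question.
ROWS = [
    ("u01", "test",  "2001-01-01"),
    ("u01", "test",  "2001-02-01"),
    ("u01", "test2", "2001-03-01"),
    ("u01", "test2", "2001-04-01"),
    ("u01", "test3", "2001-05-01"),
    ("u01", "test",  "2001-06-01"),
]

QUERY = """
WITH periods AS (
    SELECT column1,
           column2,
           date AS pd_begin,
           -- end of the period = date of the next row, or a sentinel for the last row
           COALESCE(LEAD(date) OVER (PARTITION BY column1 ORDER BY date),
                    '9999-12-31') AS pd_end,
           -- constant within each run of equal column2 values
           ROW_NUMBER() OVER (PARTITION BY column1 ORDER BY date)
         - ROW_NUMBER() OVER (PARTITION BY column1, column2 ORDER BY date) AS grp
    FROM tab
)
SELECT column1,
       column2,
       MIN(pd_begin) AS validfrom,
       -- turn the sentinel back into NULL, as the Teradata query does
       NULLIF(MAX(pd_end), '9999-12-31') AS validto
FROM periods
GROUP BY column1, column2, grp
ORDER BY column1, validfrom
"""


def normalized_periods():
    """Return (column1, column2, validfrom, validto) rows for the sample data."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE tab (column1 TEXT, column2 TEXT, date TEXT)")
    con.executemany("INSERT INTO tab VALUES (?, ?, ?)", ROWS)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result


if __name__ == "__main__":
    for row in normalized_periods():
        print(row)
```

On the sample data this yields the four rows of the destination table from the question, with validto left as NULL (None in Python) for the last, still open sequence. The dates are stored as ISO-8601 text here, so MIN and MAX compare them correctly as strings.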
If your Teradata version doesn't support LEAD, you can use this instead:
Min(date)
Over (PARTITION BY column1
ORDER BY date
ROWS BETWEEN 1 Following and 1 Following)
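The replacement works because a window frame that contains only the next row has exactly one value to take the minimum of. The sketch below checks that both expressions agree on the sample data; it assumes SQLite 3.25 or later through Python's sqlite3 module rather than Teradata, and the function name next_dates is made up for illustration.

```python
import sqlite3

# Sample rows copied from the question.
ROWS = [
    ("u01", "test",  "2001-01-01"),
    ("u01", "test",  "2001-02-01"),
    ("u01", "test2", "2001-03-01"),
    ("u01", "test2", "2001-04-01"),
    ("u01", "test3", "2001-05-01"),
    ("u01", "test",  "2001-06-01"),
]


def next_dates():
    """Return (date, next date via LEAD, next date via one-row frame) per row."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE tab (column1 TEXT, column2 TEXT, date TEXT)")
    con.executemany("INSERT INTO tab VALUES (?, ?, ?)", ROWS)
    rows = con.execute(
        """
        SELECT date,
               LEAD(date) OVER (PARTITION BY column1 ORDER BY date) AS via_lead,
               MIN(date)  OVER (PARTITION BY column1 ORDER BY date
                                ROWS BETWEEN 1 FOLLOWING AND 1 FOLLOWING) AS via_min
        FROM tab
        ORDER BY date
        """
    ).fetchall()
    con.close()
    return rows


if __name__ == "__main__":
    for date, via_lead, via_min in next_dates():
        print(date, via_lead, via_min)
```

For the last row of each partition the frame is empty, so the aggregate returns NULL, just as LEAD does; the surrounding Coalesce in the query above then substitutes the '9999-12-31' sentinel in either variant.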