Min/Max Date Values over Large Date Range depending on Value
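I'm querying snapshots of customer data that contain the snapshot date, the customer ID and that customer's 'value' on that day. I use the LAG function to return the previous day's value so I can tell whether there has been a drop, a rise, a complete loss, or a completely new value (going from £0 to > £0).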
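A minimal sketch of the LAG step described above, using Python's built-in sqlite3 module so it runs without SQL Server (assumes SQLite 3.25 or later for window functions). The table name `snapshot_table`, the function `classify_changes`, and the change labels are hypothetical, and the CASE rules are only one possible reading of "drop/rise/complete loss/new value".

```python
import sqlite3

# Sample data from the question: (SnapshotDate, SnapshotDateKey, CustomerId, Value)
ROWS = [
    ("2019-01-01", 20190101, 1, 0.00),
    ("2019-01-02", 20190102, 1, 0.00),
    ("2019-01-03", 20190103, 1, 5.00),
    ("2019-01-04", 20190104, 1, 5.00),
    ("2019-01-05", 20190105, 1, 3.00),
    ("2019-01-06", 20190106, 1, 3.00),
    ("2019-01-07", 20190107, 1, 0.00),
    ("2019-01-08", 20190108, 1, 0.00),
    ("2019-01-09", 20190109, 1, 10.00),
    ("2019-01-10", 20190110, 1, 0.00),
]

CHANGE_SQL = """
SELECT SnapshotDateKey, Value, PrevValue,
       CASE
           WHEN PrevValue IS NULL THEN 'first snapshot'
           WHEN PrevValue > 0 AND Value = 0 THEN 'complete loss'
           WHEN PrevValue = 0 AND Value > 0 THEN 'new value'
           WHEN Value < PrevValue THEN 'drop'
           WHEN Value > PrevValue THEN 'rise'
           ELSE 'no change'
       END AS Change
FROM (
    SELECT SnapshotDateKey, CustomerId, Value,
           -- previous day's value for the same customer (NULL on the first row)
           LAG(Value) OVER (PARTITION BY CustomerId ORDER BY SnapshotDate) AS PrevValue
    FROM snapshot_table
) AS t
ORDER BY CustomerId, SnapshotDateKey
"""


def classify_changes(rows):
    """Load rows into an in-memory table (hypothetical name) and label each day's change."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute(
            "CREATE TABLE snapshot_table "
            "(SnapshotDate TEXT, SnapshotDateKey INTEGER, CustomerId INTEGER, Value REAL)"
        )
        con.executemany("INSERT INTO snapshot_table VALUES (?, ?, ?, ?)", rows)
        return con.execute(CHANGE_SQL).fetchall()
    finally:
        con.close()


if __name__ == "__main__":
    for row in classify_changes(ROWS):
        print(row)
```

The ordering of the CASE branches matters: the £0 transitions are tested before the generic drop/rise comparisons so that, for example, 3.00 → 0.00 is labelled a complete loss rather than a drop.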
The final task is to determine the minimum and maximum dates over which a customer's value was £0.
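Initially I tried grouping by customer and value and taking MIN(Date) and MAX(Date). However, if a customer drops to £0 in separate date ranges, this returns the minimum of the earliest range and the maximum of the latest range as a single row, instead of the ideal result of returning each £0 range separately.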
I tried using DENSE_RANK() to separate each of the customer's values, but that just puts all the £0 values in the same rank.
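To make the two failures concrete, here is a sketch that runs the question's GROUP BY and DENSE_RANK() queries against the same sample data in SQLite through Python's sqlite3 module (assuming SQLite 3.25+; the table name `snapshot_table` and the helper names are hypothetical). GROUP BY CustomerId, Value merges all three £0 ranges into one row spanning 20190101 to 20190110, and DENSE_RANK() ordered by Value gives every £0 day rank 1, so neither query separates the ranges.

```python
import sqlite3

# Sample data from the question: (SnapshotDate, SnapshotDateKey, CustomerId, Value)
ROWS = [
    ("2019-01-01", 20190101, 1, 0.00),
    ("2019-01-02", 20190102, 1, 0.00),
    ("2019-01-03", 20190103, 1, 5.00),
    ("2019-01-04", 20190104, 1, 5.00),
    ("2019-01-05", 20190105, 1, 3.00),
    ("2019-01-06", 20190106, 1, 3.00),
    ("2019-01-07", 20190107, 1, 0.00),
    ("2019-01-08", 20190108, 1, 0.00),
    ("2019-01-09", 20190109, 1, 10.00),
    ("2019-01-10", 20190110, 1, 0.00),
]


def load(rows):
    """Create an in-memory copy of the question's table (hypothetical name)."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE snapshot_table "
        "(SnapshotDate TEXT, SnapshotDateKey INTEGER, CustomerId INTEGER, Value REAL)"
    )
    con.executemany("INSERT INTO snapshot_table VALUES (?, ?, ?, ?)", rows)
    return con


def naive_min_max(con):
    # Same shape as the question's query: one row per (customer, value),
    # so separate £0 ranges collapse into a single min/max pair.
    return con.execute(
        """
        SELECT CustomerId, Value, MIN(SnapshotDateKey), MAX(SnapshotDateKey)
        FROM snapshot_table
        GROUP BY CustomerId, Value
        ORDER BY CustomerId, Value
        """
    ).fetchall()


def dense_ranks(con):
    # The rank depends only on Value, not on date order, so every £0 day gets rank 1.
    return con.execute(
        """
        SELECT SnapshotDateKey, Value,
               DENSE_RANK() OVER (PARTITION BY CustomerId ORDER BY Value) AS NewRank
        FROM snapshot_table
        ORDER BY SnapshotDateKey
        """
    ).fetchall()


if __name__ == "__main__":
    con = load(ROWS)
    print(naive_min_max(con))
    print(dense_ranks(con))
    con.close()
```

What is missing from both attempts is anything that changes when the date sequence for a given value is interrupted by a different value.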
Below is some sample code showing the data I'm working with and how I tried to split it:
DROP TABLE IF EXISTS #SnapshotTable
CREATE TABLE #SnapshotTable
(
Row_ID INT IDENTITY(1,1)
,SnapshotDate DATE
,SnapshotDateKey INT
,CustomerId INT
,Value DECIMAL(18,2)
)
INSERT INTO #SnapshotTable (SnapshotDate, SnapshotDateKey, CustomerId, Value)
SELECT '2019-01-01', 20190101, 1, 0.00
UNION SELECT '2019-01-02', 20190102, 1, 0.00
UNION SELECT '2019-01-03', 20190103, 1, 5.00
UNION SELECT '2019-01-04', 20190104, 1, 5.00
UNION SELECT '2019-01-05', 20190105, 1, 3.00
UNION SELECT '2019-01-06', 20190106, 1, 3.00
UNION SELECT '2019-01-07', 20190107, 1, 0.00
UNION SELECT '2019-01-08', 20190108, 1, 0.00
UNION SELECT '2019-01-09', 20190109, 1, 10.00
UNION SELECT '2019-01-10', 20190110, 1, 0.00
SELECT * FROM #SnapshotTable
-- Code that doesn't work correctly
SELECT
CustomerId
,Value
,MinDate = MIN(SnapshotDateKey)
,MaxDate = MAX(SnapshotDateKey)
FROM #SnapshotTable
GROUP BY
CustomerId
,Value
-- Attempted with dense rank
ALTER TABLE #SnapshotTable
ADD DenseRankTest INT NULL
GO
-- Update with Dense Rank
UPDATE TGT
SET
TGT.DenseRankTest = SRC.NewRank
FROM #SnapshotTable TGT
INNER JOIN (SELECT
Row_ID
,NewRank = DENSE_RANK() OVER (PARTITION BY CustomerId ORDER BY Value ASC)
FROM #SnapshotTable
) AS SRC
ON SRC.Row_ID = TGT.Row_ID
SELECT * FROM #SnapshotTable
I can see that DENSE_RANK() is behaving as it should, but honestly I've been working on this for a while now and I can't figure out how to do it properly.
Can someone tell me what I need to do?
I'm expecting to see:
SELECT [StartDateKey] = 20190101, [EndDateKey] = 20190102, [CustomerId] = 1, [Value] = 0
UNION SELECT [StartDateKey] = 20190103, [EndDateKey] = 20190104, [CustomerId] = 1, [Value] = 5
UNION SELECT [StartDateKey] = 20190105, [EndDateKey] = 20190106, [CustomerId] = 1, [Value] = 3
UNION SELECT [StartDateKey] = 20190107, [EndDateKey] = 20190108, [CustomerId] = 1, [Value] = 0
UNION SELECT [StartDateKey] = 20190109, [EndDateKey] = 20190109, [CustomerId] = 1, [Value] = 10
UNION SELECT [StartDateKey] = 20190110, [EndDateKey] = 20190110, [CustomerId] = 1, [Value] = 0
Edit: For anyone who stumbles across this question: with the help of the people here, I found this to be a good read for understanding and solving the issue.
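This is a gaps-and-islands problem. However, the accepted answer on the supposed duplicate is not the best way to solve this problem at all, and the more highly voted answer there is still overly complicated.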
A simpler method is:
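select customerid, value,
       min(SnapshotDateKey) as StartDateKey,
       max(SnapshotDateKey) as EndDateKey
from (select st.*,
             -- number each customer's rows per value in date order; within a run of
             -- consecutive days, snapshotdate minus seqnum days stays constant
             row_number() over (partition by customerid, value order by snapshotdate) as seqnum
      from #SnapshotTable st
     ) st
group by dateadd(day, -seqnum, snapshotdate), customerid, value
order by min(SnapshotDateKey);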
Here is a db<>fiddle.
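To see why grouping on dateadd(day, -seqnum, snapshotdate) works, here is a sketch that runs the same query in SQLite through Python's sqlite3 module (assuming SQLite 3.25+ for ROW_NUMBER()). SQLite has no DATEADD, so date(SnapshotDate, '-' || seqnum || ' days') stands in for it; the table name `snapshot_table` and the helper names are hypothetical. Within one unbroken run of days with the same value, the date and seqnum both advance by one per row, so their difference (the "anchor" date) is constant; when another value interrupts the run, the date advances without seqnum advancing, the anchor changes, and a new island starts. This relies on at most one snapshot per customer per day.

```python
import sqlite3

# Sample data from the question: (SnapshotDate, SnapshotDateKey, CustomerId, Value)
ROWS = [
    ("2019-01-01", 20190101, 1, 0.00),
    ("2019-01-02", 20190102, 1, 0.00),
    ("2019-01-03", 20190103, 1, 5.00),
    ("2019-01-04", 20190104, 1, 5.00),
    ("2019-01-05", 20190105, 1, 3.00),
    ("2019-01-06", 20190106, 1, 3.00),
    ("2019-01-07", 20190107, 1, 0.00),
    ("2019-01-08", 20190108, 1, 0.00),
    ("2019-01-09", 20190109, 1, 10.00),
    ("2019-01-10", 20190110, 1, 0.00),
]

# Per-row view: seqnum restarts for each (customer, value), and the anchor date
# (SnapshotDate minus seqnum days) identifies the island.
ANCHOR_SQL = """
SELECT SnapshotDateKey, Value, seqnum,
       date(SnapshotDate, '-' || seqnum || ' days') AS AnchorDate
FROM (
    SELECT st.*,
           ROW_NUMBER() OVER (PARTITION BY CustomerId, Value
                              ORDER BY SnapshotDate) AS seqnum
    FROM snapshot_table AS st
) AS st
ORDER BY SnapshotDateKey
"""

# The answer's query, with date(...) in place of T-SQL's DATEADD(day, -seqnum, ...).
ISLANDS_SQL = """
SELECT CustomerId, Value,
       MIN(SnapshotDateKey) AS StartDateKey,
       MAX(SnapshotDateKey) AS EndDateKey
FROM (
    SELECT st.*,
           ROW_NUMBER() OVER (PARTITION BY CustomerId, Value
                              ORDER BY SnapshotDate) AS seqnum
    FROM snapshot_table AS st
) AS st
GROUP BY date(SnapshotDate, '-' || seqnum || ' days'), CustomerId, Value
ORDER BY MIN(SnapshotDateKey)
"""


def run(sql, rows):
    """Load rows into an in-memory table (hypothetical name) and run one query."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute(
            "CREATE TABLE snapshot_table "
            "(SnapshotDate TEXT, SnapshotDateKey INTEGER, CustomerId INTEGER, Value REAL)"
        )
        con.executemany("INSERT INTO snapshot_table VALUES (?, ?, ?, ?)", rows)
        return con.execute(sql).fetchall()
    finally:
        con.close()


if __name__ == "__main__":
    for row in run(ANCHOR_SQL, ROWS):
        print(row)
    for row in run(ISLANDS_SQL, ROWS):
        print(row)
```

With the question's sample data this yields six islands, including three separate £0 ranges. Note that the £3 run and the second £0 run share the anchor date 2019-01-04; they stay apart only because Value is also part of the GROUP BY.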