Complex RANK in SQL
I have a complex query that is really over my head.
I think it needs a RANK of RANKs, but there must be a better, existing way to do it.
Here is a simple table:
Manufacturer DateOF Status Prefer
Dell 05-2014 ComputerInstalled 30
Dell 05-2014 ComputerUninstalled 70
Dell 05-2014 ComputerUninstalled 70
Dell 05-2014 ComputerUninstalled 70
Dell 05-2014 ComputerInstalled 30
Dell 05-2014 ComputerUninstalled 70
Dell 05-2014 ComputerNew 26
Dell 05-2014 ComputerNew 26
Dell 05-2014 ComputerInstalled 30
Dell 05-2014 ComputerInstalled 30
What I need to do is GROUP BY the table on the MANUFACTURER and DATEOF columns,
and then select the rows with the lowest PREFER number (26 in this example).
This is easy with the RANK function:
SELECT sq.*
FROM
(
    SELECT
        *,
        RANK() OVER (PARTITION BY Manufacturer, DateOF ORDER BY Prefer) AS RankPrefer
    FROM
        table1
) sq
-- RankPrefer is not visible in the derived table's own WHERE, so filter outside it
WHERE
    sq.RankPrefer = 1
So I get a result of the 2 rows with status ComputerNew:
Manufacturer DateOF Status Prefer
Dell 05-2014 ComputerNew 26
Dell 05-2014 ComputerNew 26
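To try the corrected query outside SQL Server, here is a minimal sketch that runs the same RANK filter against SQLite through Python's built-in sqlite3 module. It assumes SQLite 3.25 or later (needed for window functions) and keeps DateOF as the text '05-2014'; the helper name lowest_prefer_rows is purely illustrative.

```python
import sqlite3

# Same row counts as the Dell sample: 4 x Installed, 4 x Uninstalled, 2 x New.
DELL_ROWS = (
    [("Dell", "05-2014", "ComputerInstalled", 30)] * 4
    + [("Dell", "05-2014", "ComputerUninstalled", 70)] * 4
    + [("Dell", "05-2014", "ComputerNew", 26)] * 2
)


def lowest_prefer_rows(rows):
    """Return the rows that RANK() puts first within each (Manufacturer, DateOF) group."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE table1 (Manufacturer TEXT, DateOF TEXT, Status TEXT, Prefer INTEGER)"
    )
    con.executemany("INSERT INTO table1 VALUES (?, ?, ?, ?)", rows)
    # RankPrefer only exists outside the derived table, so the filter goes there.
    result = con.execute(
        """
        SELECT sq.Manufacturer, sq.DateOF, sq.Status, sq.Prefer
        FROM (
            SELECT *,
                   RANK() OVER (PARTITION BY Manufacturer, DateOF ORDER BY Prefer) AS RankPrefer
            FROM table1
        ) AS sq
        WHERE sq.RankPrefer = 1
        """
    ).fetchall()
    con.close()
    return result


print(lowest_prefer_rows(DELL_ROWS))
```

Both tied rows come back because RANK gives equal Prefer values the same rank; ROW_NUMBER would keep only one of them.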
That part is simple and isn't the problem.
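The problem is that I have to enforce the following rule:
If the rows with the lowest Prefer value (e.g. 26) have ComputerNew in their Status field,
then I must also include the rows with the ComputerInstalled value.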
The result should look like this:
Manufacturer DateOF Status Prefer
Dell 05-2014 ComputerInstalled 30
Dell 05-2014 ComputerInstalled 30
Dell 05-2014 ComputerNew 26
Dell 05-2014 ComputerNew 26
Dell 05-2014 ComputerInstalled 30
Dell 05-2014 ComputerInstalled 30
There is one more rule like this one:
If the rows with the lowest Prefer value (e.g. 26) have ComputerOld in their Status field,
then I must also include the rows with the ComputerUninstalled value.
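One way to express both rules without hard-coding each one is to keep the trigger/companion status pairs in a small lookup and join against it. The sketch below does that in SQLite (3.25+) through Python's sqlite3. The `rules` CTE, its column names and the helper `apply_rules` are hypothetical, and the HP3720 rows are the June 2014 rows from the Edit 1 test data, collapsed to a '06-2014' DateOF.

```python
import sqlite3

# Hypothetical lookup: if a group's lowest-Prefer rows have trigger_status,
# rows with companion_status in the same group are returned as well.
QUERY = """
WITH rules(trigger_status, companion_status) AS (
    VALUES ('ComputerNew', 'ComputerInstalled'),
           ('ComputerOld', 'ComputerUninstalled')
),
ranked AS (
    SELECT *,
           RANK() OVER (PARTITION BY Manufacturer, DateOF ORDER BY Prefer) AS rk
    FROM table1
)
SELECT r.Manufacturer, r.DateOF, r.Status, r.Prefer
FROM ranked AS r
WHERE r.rk = 1
   OR EXISTS (
        SELECT 1
        FROM ranked AS top1
        JOIN rules ON rules.trigger_status = top1.Status
        WHERE top1.rk = 1
          AND top1.Manufacturer = r.Manufacturer
          AND top1.DateOF = r.DateOF
          AND rules.companion_status = r.Status
   )
"""

SAMPLE = (
    [("Dell", "05-2014", "ComputerInstalled", 30)] * 4
    + [("Dell", "05-2014", "ComputerUninstalled", 70)] * 4
    + [("Dell", "05-2014", "ComputerNew", 26)] * 2
    + [("HP3720", "06-2014", "ComputerOld", 26)] * 2
    + [("HP3720", "06-2014", "ComputerUninstalled", 70)] * 6
)


def apply_rules(rows):
    """Run QUERY against an in-memory table loaded with rows."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE table1 (Manufacturer TEXT, DateOF TEXT, Status TEXT, Prefer INTEGER)"
    )
    con.executemany("INSERT INTO table1 VALUES (?, ?, ?, ?)", rows)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result


for row in sorted(apply_rules(SAMPLE)):
    print(row)
```

With this layout, adding a third pair to the `rules` CTE adds a rule without touching the rest of the query.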
I think a RANK of RANKING could solve this, but right now I'm really lost.
Any help would be greatly appreciated.
Thanks
Edit 1:
Gordon's solution is almost right, but not perfect.
Here is more test data so you can see where it fails.
The SQLFiddle to test with is here.
I've also put the test data here:
INSERT Table1 VALUES ('HP10011','04/01/2014','ComputerUninstalled',70)
INSERT Table1 VALUES ('HP10011','04/04/2014','ComputerOld',26)
INSERT Table1 VALUES ('HP10011','04/04/2014','ComputerOld',26)
INSERT Table1 VALUES ('HP10011','04/30/2014','ComputerUninstalled',70)
INSERT Table1 VALUES ('HP10011','05/23/2014','QuickDispose',10)
INSERT Table1 VALUES ('HP10011','06/03/2014','QuickDispose',10)
INSERT Table1 VALUES ('HP10077','04/01/2014','ComputerUninstalled',70)
INSERT Table1 VALUES ('HP1910','04/25/2014','QuickDispose',10)
INSERT Table1 VALUES ('HP1910','05/01/2014','ComputerInstalled',30)
INSERT Table1 VALUES ('HP1910','05/01/2014','ComputerInstalled',30)
INSERT Table1 VALUES ('HP1910','05/01/2014','ComputerInstalled',30)
INSERT Table1 VALUES ('HP1910','05/01/2014','ComputerInstalled',30)
INSERT Table1 VALUES ('HP1910','05/01/2014','ComputerUninstalled',70)
INSERT Table1 VALUES ('HP1910','05/01/2014','ComputerUninstalled',70)
INSERT Table1 VALUES ('HP1910','05/01/2014','ComputerUninstalled',70)
INSERT Table1 VALUES ('HP1910','05/01/2014','ComputerUninstalled',70)
INSERT Table1 VALUES ('HP1910','05/02/2014','ComputerInstalled',30)
INSERT Table1 VALUES ('HP1910','05/02/2014','ComputerInstalled',30)
INSERT Table1 VALUES ('HP3720','05/07/2014','ComputerInstalled',30)
INSERT Table1 VALUES ('HP3720','05/07/2014','ComputerInstalled',30)
INSERT Table1 VALUES ('HP3720','05/07/2014','ComputerUninstalled',70)
INSERT Table1 VALUES ('HP3720','05/07/2014','ComputerUninstalled',70)
INSERT Table1 VALUES ('HP3720','05/07/2014','ComputerUninstalled',70)
INSERT Table1 VALUES ('HP3720','05/07/2014','ComputerUninstalled',70)
INSERT Table1 VALUES ('HP3720','05/08/2014','ComputerInstalled',30)
INSERT Table1 VALUES ('HP3720','05/08/2014','ComputerInstalled',30)
INSERT Table1 VALUES ('HP3720','05/08/2014','ComputerInstalled',30)
INSERT Table1 VALUES ('HP3720','05/08/2014','ComputerUninstalled',70)
INSERT Table1 VALUES ('HP3720','06/06/2014','ComputerUninstalled',70)
INSERT Table1 VALUES ('HP3720','06/06/2014','ComputerUninstalled',70)
INSERT Table1 VALUES ('HP3720','06/10/2014','ComputerOld',26)
INSERT Table1 VALUES ('HP3720','06/10/2014','ComputerUninstalled',70)
INSERT Table1 VALUES ('HP3720','06/10/2014','ComputerUninstalled',70)
INSERT Table1 VALUES ('HP3720','06/11/2014','ComputerOld',26)
INSERT Table1 VALUES ('HP3720','06/11/2014','ComputerUninstalled',70)
INSERT Table1 VALUES ('HP3720','06/11/2014','ComputerUninstalled',70)
The query returns both ComputerInstalled and ComputerUninstalled rows
for the following data:
'HP1910','05/01/2014','ComputerInstalled',30
'HP1910','05/01/2014','ComputerUninstalled',70
It should select only ComputerInstalled, because for that Manufacturer, within the same month, it should pick the lowest Prefer (30).
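The "same month" requirement means the partition key has to be the month rather than the exact date. A minimal SQLite sketch (Python's sqlite3, SQLite 3.25+) compares the two keys on the HP10011 April rows from the test data. It assumes dates are stored as ISO 'YYYY-MM-DD' text so that substr(DateOF, 1, 7) yields 'YYYY-MM'; rank_one_rows is an illustrative helper name.

```python
import sqlite3

# HP10011 rows for April 2014 from the test data, with ISO-formatted dates.
APRIL_ROWS = [
    ("HP10011", "2014-04-01", "ComputerUninstalled", 70),
    ("HP10011", "2014-04-04", "ComputerOld", 26),
    ("HP10011", "2014-04-04", "ComputerOld", 26),
    ("HP10011", "2014-04-30", "ComputerUninstalled", 70),
]


def rank_one_rows(rows, period_expr):
    """Return rank-1 rows, partitioning by Manufacturer plus period_expr.

    period_expr is a fixed SQL fragment (never user input): "DateOF" for
    per-day groups or "substr(DateOF, 1, 7)" for per-month groups.
    """
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE table1 (Manufacturer TEXT, DateOF TEXT, Status TEXT, Prefer INTEGER)"
    )
    con.executemany("INSERT INTO table1 VALUES (?, ?, ?, ?)", rows)
    sql = f"""
        SELECT Manufacturer, DateOF, Status, Prefer
        FROM (
            SELECT *,
                   RANK() OVER (PARTITION BY Manufacturer, {period_expr}
                                ORDER BY Prefer) AS rk
            FROM table1
        )
        WHERE rk = 1
    """
    result = con.execute(sql).fetchall()
    con.close()
    return result


by_day = rank_one_rows(APRIL_ROWS, "DateOF")
by_month = rank_one_rows(APRIL_ROWS, "substr(DateOF, 1, 7)")
print(len(by_day), len(by_month))
```

Per day, every date is its own group, so the lone ComputerUninstalled rows on April 1 and April 30 rank first (4 rows in total). Per month, only the two ComputerOld rows (Prefer 26) rank first, and it is the ComputerOld rule that has to bring the ComputerUninstalled rows back.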
The result for this data set should look like this:
Manufacturer DateOF Status Prefer
HP10011 2014-04-01 ComputerUninstalled 70
HP10011 2014-04-04 ComputerOld 26
HP10011 2014-04-04 ComputerOld 26
HP10011 2014-04-30 ComputerUninstalled 70
HP10011 2014-05-23 QuickDispose 10
HP10011 2014-06-03 QuickDispose 10
HP10077 2014-04-01 ComputerUninstalled 70
HP1910 2014-04-25 QuickDispose 10
HP1910 2014-05-01 ComputerInstalled 30
HP1910 2014-05-01 ComputerInstalled 30
HP1910 2014-05-01 ComputerInstalled 30
HP1910 2014-05-01 ComputerInstalled 30
HP3720 2014-05-07 ComputerInstalled 30
HP3720 2014-05-07 ComputerInstalled 30
HP3720 2014-05-08 ComputerInstalled 30
HP3720 2014-05-08 ComputerInstalled 30
HP3720 2014-05-08 ComputerInstalled 30
HP3720 2014-06-06 ComputerUninstalled 70
HP3720 2014-06-06 ComputerUninstalled 70
HP3720 2014-06-10 ComputerOld 26
HP3720 2014-06-10 ComputerUninstalled 70
HP3720 2014-06-10 ComputerUninstalled 70
HP3720 2014-06-11 ComputerOld 26
HP3720 2014-06-11 ComputerUninstalled 70
HP3720 2014-06-11 ComputerUninstalled 70
I think this should do what you need:
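WITH sq AS
( SELECT *, RANK() OVER (PARTITION BY Manufacturer,DateOF ORDER BY Prefer) AS RankPrefer
FROM table1
)
SELECT *
FROM sq
-- the cutoff is computed within the same Manufacturer/DateOF group as the outer row
WHERE RankPrefer <= (SELECT TOP 1 s2.RankPrefer
                     FROM sq AS s2
                     WHERE s2.Manufacturer = sq.Manufacturer
                       AND s2.DateOF = sq.DateOF
                       AND s2.Status != 'ComputerNew'
                     ORDER BY s2.RankPrefer)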
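The rank-cutoff idea in this answer can be tried per group in SQLite through Python's sqlite3 (SQLite 3.25+). SQLite has no TOP, so this sketch swaps TOP 1 ... ORDER BY for MIN(), and it wraps the subquery in COALESCE(..., 1) so that a group containing only ComputerNew rows still returns its rank-1 rows instead of comparing against NULL. Both are adaptations, not part of the original answer. The HP1910 rows are the May 2014 test rows from Edit 1 with DateOF collapsed to '05-2014'; rank_cutoff is an illustrative name.

```python
import sqlite3

QUERY = """
WITH sq AS (
    SELECT *,
           RANK() OVER (PARTITION BY Manufacturer, DateOF ORDER BY Prefer) AS RankPrefer
    FROM table1
)
SELECT o.Manufacturer, o.DateOF, o.Status, o.Prefer
FROM sq AS o
-- keep every rank up to the first non-ComputerNew rank in the same group
WHERE o.RankPrefer <= COALESCE(
        (SELECT MIN(s2.RankPrefer)
         FROM sq AS s2
         WHERE s2.Manufacturer = o.Manufacturer
           AND s2.DateOF = o.DateOF
           AND s2.Status <> 'ComputerNew'),
        1)
"""

SAMPLE = (
    [("Dell", "05-2014", "ComputerInstalled", 30)] * 4
    + [("Dell", "05-2014", "ComputerUninstalled", 70)] * 4
    + [("Dell", "05-2014", "ComputerNew", 26)] * 2
    + [("HP1910", "05-2014", "ComputerInstalled", 30)] * 6
    + [("HP1910", "05-2014", "ComputerUninstalled", 70)] * 4
)


def rank_cutoff(rows):
    """Run QUERY against an in-memory table loaded with rows."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE table1 (Manufacturer TEXT, DateOF TEXT, Status TEXT, Prefer INTEGER)"
    )
    con.executemany("INSERT INTO table1 VALUES (?, ?, ?, ?)", rows)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result


for row in sorted(rank_cutoff(SAMPLE)):
    print(row)
```

For Dell the first non-ComputerNew rank is 3, so the ComputerNew and ComputerInstalled rows are kept. For HP1910 it is 1, so only the ComputerInstalled rows are kept. As in the answer, only the ComputerNew rule is covered: a group led by ComputerOld would return just its ComputerOld rows.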
Here is one idea: find each row's preference rank, then use EXISTS to determine whether the rows with rank = 1 meet your conditions.
The final query looks like this:
with r as (
select t.*,
rank() over (partition by manufacturer, dateof order by Prefer) as seqnum
from table1 t
),
r1 as (
select r.*
from r
where seqnum = 1
)
select r.*
from r
where r.seqnum = 1 or
      (exists (select 1 from r1
               where r1.status = 'ComputerNew' and
                     r1.manufacturer = r.manufacturer and r1.dateof = r.dateof) and
       r.status = 'ComputerInstalled'
      ) or
      (exists (select 1 from r1
               where r1.status = 'ComputerOld' and
                     r1.manufacturer = r.manufacturer and r1.dateof = r.dateof) and
       r.status = 'ComputerUninstalled'
      );
OK, now that you have made some edits to the question, I have a different answer that I believe solves the problem. Here is the query:
;with r as (
select t.*,
CAST(MONTH(dateof) AS VARCHAR(2)) + '-' + CAST(YEAR(dateof) AS VARCHAR(4)) AS EffDate,
rank() over (partition by manufacturer, CAST(MONTH(dateof) AS VARCHAR(2)) + '-' + CAST(YEAR(dateof) AS VARCHAR(4)) order by Prefer) as seqnum
from Table1 t
),
r1 as (
select r.*
from r
where seqnum = 1
)
select r.*
from r
where r.seqnum = 1
or
(
r.Status = 'ComputerUninstalled' and
exists ( Select 1
from r1
where r1.Manufacturer = r.Manufacturer
and r1.EffDate = r.EffDate
and r1.Status = 'ComputerOld' )
and r.seqNum = ( Select Min(SeqNum) From r as r2
Where r2.Manufacturer = r.Manufacturer
And r2.EffDate = r.EffDate
And r2.SeqNum > 1 )
)
or
(
r.Status = 'ComputerInstalled' and
exists ( Select 1
from r1
where r1.Manufacturer = r.Manufacturer
and r1.EffDate = r.EffDate
and r1.Status = 'ComputerNew' )
and r.seqNum = ( Select Min(SeqNum) From r as r2
Where r2.Manufacturer = r.Manufacturer
And r2.EffDate = r.EffDate
And r2.SeqNum > 1 )
);
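Note: I get 2 more records than your expected result set shows, but based on your description I think your expected result is wrong. In May 2014 there are 6 "ComputerInstalled" rows for HP1910 with a Prefer of 30: 4 dated May 1 and 2 dated May 2. You left out the May 2 records. Apart from that, this result set matches your expected result, and I believe it should work for larger data sets.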
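The row count in this note can be checked by porting the month-based query to SQLite through Python's sqlite3 (SQLite 3.25+). In this sketch the Edit 1 test data is stored with ISO 'YYYY-MM-DD' dates, and substr(DateOF, 1, 7) stands in for the CAST(MONTH(...)) + '-' + CAST(YEAR(...)) key. Those are adaptations for SQLite, and run_month_query is an illustrative name.

```python
import sqlite3

# Edit 1 test data as (Manufacturer, DateOF, Status, Prefer, copies), ISO dates.
COUNTS = [
    ("HP10011", "2014-04-01", "ComputerUninstalled", 70, 1),
    ("HP10011", "2014-04-04", "ComputerOld", 26, 2),
    ("HP10011", "2014-04-30", "ComputerUninstalled", 70, 1),
    ("HP10011", "2014-05-23", "QuickDispose", 10, 1),
    ("HP10011", "2014-06-03", "QuickDispose", 10, 1),
    ("HP10077", "2014-04-01", "ComputerUninstalled", 70, 1),
    ("HP1910", "2014-04-25", "QuickDispose", 10, 1),
    ("HP1910", "2014-05-01", "ComputerInstalled", 30, 4),
    ("HP1910", "2014-05-01", "ComputerUninstalled", 70, 4),
    ("HP1910", "2014-05-02", "ComputerInstalled", 30, 2),
    ("HP3720", "2014-05-07", "ComputerInstalled", 30, 2),
    ("HP3720", "2014-05-07", "ComputerUninstalled", 70, 4),
    ("HP3720", "2014-05-08", "ComputerInstalled", 30, 3),
    ("HP3720", "2014-05-08", "ComputerUninstalled", 70, 1),
    ("HP3720", "2014-06-06", "ComputerUninstalled", 70, 2),
    ("HP3720", "2014-06-10", "ComputerOld", 26, 1),
    ("HP3720", "2014-06-10", "ComputerUninstalled", 70, 2),
    ("HP3720", "2014-06-11", "ComputerOld", 26, 1),
    ("HP3720", "2014-06-11", "ComputerUninstalled", 70, 2),
]
ROWS = [(m, d, s, p) for m, d, s, p, n in COUNTS for _ in range(n)]

# SQLite port of the month-based query: same CTEs and conditions, month key via substr.
QUERY = """
WITH r AS (
    SELECT t.*,
           substr(t.DateOF, 1, 7) AS EffDate,
           RANK() OVER (PARTITION BY t.Manufacturer, substr(t.DateOF, 1, 7)
                        ORDER BY t.Prefer) AS seqnum
    FROM Table1 AS t
),
r1 AS (SELECT * FROM r WHERE seqnum = 1)
SELECT r.Manufacturer, r.DateOF, r.Status, r.Prefer
FROM r
WHERE r.seqnum = 1
   OR (r.Status = 'ComputerUninstalled'
       AND EXISTS (SELECT 1 FROM r1
                   WHERE r1.Manufacturer = r.Manufacturer
                     AND r1.EffDate = r.EffDate
                     AND r1.Status = 'ComputerOld')
       AND r.seqnum = (SELECT MIN(r2.seqnum) FROM r AS r2
                       WHERE r2.Manufacturer = r.Manufacturer
                         AND r2.EffDate = r.EffDate
                         AND r2.seqnum > 1))
   OR (r.Status = 'ComputerInstalled'
       AND EXISTS (SELECT 1 FROM r1
                   WHERE r1.Manufacturer = r.Manufacturer
                     AND r1.EffDate = r.EffDate
                     AND r1.Status = 'ComputerNew')
       AND r.seqnum = (SELECT MIN(r2.seqnum) FROM r AS r2
                       WHERE r2.Manufacturer = r.Manufacturer
                         AND r2.EffDate = r.EffDate
                         AND r2.seqnum > 1))
"""


def run_month_query(rows):
    """Load rows into an in-memory Table1 and run QUERY."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE Table1 (Manufacturer TEXT, DateOF TEXT, Status TEXT, Prefer INTEGER)"
    )
    con.executemany("INSERT INTO Table1 VALUES (?, ?, ?, ?)", rows)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result


result = run_month_query(ROWS)
print(len(result))
```

On these 36 rows the port returns 27 rows: the 25 rows of the expected output plus the two May 2 HP1910 ComputerInstalled rows discussed in the note.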