How to partition by range
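I'm trying to split this table into 3 partitions and add a column showing which partition each row falls in. The table keeps historical data about documents by adding new rows and setting IsDeleted = 1 on the old rows. As you can see, each revision of a document deletes all rows of the previous version and recreates them with new line numbers.
I'm not sure where to start, since I haven't used the PARTITION BY clause before. Any help would be appreciated.
Current table: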
+----+----------------+------------+-----------+-------------------------+
| ID | DocumentNumber | LineNumber | IsDeleted | CreatedDate |
+----+----------------+------------+-----------+-------------------------+
| 1 | D001 | 1 | 1 | 2017-01-20 14:10:13.533 |
| 2 | D001 | 2 | 1 | 2017-01-20 14:10:13.533 |
| 3 | D001 | 3 | 1 | 2017-01-20 14:10:13.533 |
| 4 | D001 | 4 | 1 | 2017-01-20 14:10:13.533 |
| 5 | D001 | 1 | 1 | 2017-01-21 12:11:14.500 |
| 6 | D001 | 2 | 1 | 2017-01-21 12:11:14.500 |
| 7 | D001 | 1 | 0 | 2017-01-21 15:20:20.222 |
| 8 | D001 | 2 | 0 | 2017-01-21 15:21:21.111 |
+----+----------------+------------+-----------+-------------------------+
Expected result:
+----+----------------+------------+-----------+-------------------------+-----------------+
| ID | DocumentNumber | LineNumber | IsDeleted | CreatedDate | PartitionNumber |
+----+----------------+------------+-----------+-------------------------+-----------------+
| 1 | D001 | 1 | 1 | 2017-01-20 14:10:13.533 | 1 |
| 2 | D001 | 2 | 1 | 2017-01-20 14:10:13.533 | 1 |
| 3 | D001 | 3 | 1 | 2017-01-20 14:10:13.533 | 1 |
| 4 | D001 | 4 | 1 | 2017-01-20 14:10:13.533 | 1 |
| 5 | D001 | 1 | 1 | 2017-01-21 12:11:14.500 | 2 |
| 6 | D001 | 2 | 1 | 2017-01-21 12:11:14.500 | 2 |
| 7 | D001 | 1 | 0 | 2017-01-21 15:20:20.222 | 3 |
| 8 | D001 | 2 | 0 | 2017-01-21 15:21:21.111 | 3 |
+----+----------------+------------+-----------+-------------------------+-----------------+
Update:
Building on Jason's answer, I added a PARTITION BY clause so that the numbering resets for each document in my table. I hope this helps someone in the future.
SELECT ID,
DocumentNumber,
LineNumber,
IsDeleted,
CreatedDate,
SUM(CASE WHEN LineNumber = 1 THEN 1 ELSE 0 END)
OVER (PARTITION BY DocumentNumber ORDER BY CreatedDate)
AS 'PartitionNumber'
FROM CurrentTable
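To check that the numbering really does restart per document, here is a minimal sketch that runs the same query through Python's built-in sqlite3 module. SQLite stands in for SQL Server here (an assumption; the window clause is standard SQL, and this needs SQLite 3.25 or later), and the D002 rows are hypothetical data added only to show the reset.

```python
import sqlite3

# Two documents; the D002 rows are made up for illustration only.
rows = [
    (1, "D001", 1, 1, "2017-01-20 14:10:13.533"),
    (2, "D001", 2, 1, "2017-01-20 14:10:13.533"),
    (3, "D001", 1, 0, "2017-01-21 15:20:20.222"),
    (4, "D002", 1, 1, "2017-01-22 09:00:00.000"),
    (5, "D002", 1, 0, "2017-01-23 09:00:00.000"),
    (6, "D002", 2, 0, "2017-01-23 09:00:01.000"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE CurrentTable (ID INTEGER, DocumentNumber TEXT, "
    "LineNumber INTEGER, IsDeleted INTEGER, CreatedDate TEXT)"
)
conn.executemany("INSERT INTO CurrentTable VALUES (?, ?, ?, ?, ?)", rows)

# Running count of "LineNumber = 1" rows, restarted for every document.
# Rows sharing a CreatedDate are peers, so they receive the same total.
query = """
SELECT ID, DocumentNumber,
       SUM(CASE WHEN LineNumber = 1 THEN 1 ELSE 0 END)
           OVER (PARTITION BY DocumentNumber ORDER BY CreatedDate)
           AS PartitionNumber
FROM CurrentTable
ORDER BY ID
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

D001 is numbered 1, 1, 2 and D002 starts again at 1. Note that the query relies on every revision containing a row with LineNumber = 1.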
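I think I follow what you're after. The query below gives you what you want, but with more data it will produce more than 3 partitions, which I assume is expected.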
if object_id('tempdb.dbo.#test') is not null drop table #test
create table #test
(
id int,
linenumber int,
isdeleted bit,
createddate datetime,
documentnumber varchar(50)
)
insert into #test
select 1 , 1 , 1 , '2017-01-20 14:10:13.533', 'D001'
union all select 2 , 2 , 1 , '2017-01-20 14:10:13.533', 'D001'
union all select 3 , 3 , 1 , '2017-01-20 14:10:13.533', 'D001'
union all select 4 , 4 , 1 , '2017-01-20 14:10:13.533', 'D001'
union all select 5 , 1 , 1 , '2017-01-21 12:11:14.500', 'D001'
union all select 6 , 2 , 1 , '2017-01-21 12:11:14.500', 'D001'
union all select 7 , 1 , 0 , '2017-01-21 15:20:20.222', 'D001'
union all select 8 , 2 , 0 , '2017-01-21 15:21:21.111', 'D001'
select
*,
DENSE_RANK() over (partition by documentNumber order by isdeleted desc, case when isdeleted=0 then getdate() else createddate end) as partitionValues
from #test
I got the result you want with this:
SELECT ID,DocumentNumber,LineNumber,IsDeleted,CreatedDate,
SUM(CASE WHEN LineNumber = 1 THEN 1 ELSE 0 END)
OVER (ORDER BY ID,DocumentNumber,LineNumber,IsDeleted,CreatedDate)
AS 'PartitionNumber'
FROM CurrentTable
GROUP BY ID,DocumentNumber,LineNumber,IsDeleted,CreatedDate
I used SUM and CASE to assign 1 to every row whose LineNumber is 1 and 0 to all other rows, then used a window function to compute the running total.
Result:
+----+----------------+------------+-----------+-------------------------+----------------+
| ID | DocumentNumber | LineNumber | IsDeleted | CreatedDate | PartitionNumber|
+----+----------------+------------+-----------+-------------------------+----------------+
| 1 | D001 | 1 | 1 | 2017-01-20 14:10:13.533 | 1 |
| 2 | D001 | 2 | 1 | 2017-01-20 14:10:13.533 | 1 |
| 3 | D001 | 3 | 1 | 2017-01-20 14:10:13.533 | 1 |
| 4 | D001 | 4 | 1 | 2017-01-20 14:10:13.533 | 1 |
| 5 | D001 | 1 | 1 | 2017-01-21 12:11:14.500 | 2 |
| 6 | D001 | 2 | 1 | 2017-01-21 12:11:14.500 | 2 |
| 7 | D001 | 1 | 0 | 2017-01-21 15:20:20.223 | 3 |
| 8 | D001 | 2 | 0 | 2017-01-21 15:21:21.110 | 3 |
+----+----------------+------------+-----------+-------------------------+----------------+
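To make the running-total idea concrete, the sketch below shows the 0/1 flag next to its cumulative sum for the eight sample rows. It uses Python's sqlite3 module as a stand-in for SQL Server (an assumption; requires SQLite 3.25 or later), the StartsRevision column name is made up for illustration, and the window is ordered by ID alone, which assumes ID is unique and increasing.

```python
import sqlite3

# (ID, LineNumber) pairs from the question's sample table.
sample = [(1, 1), (2, 2), (3, 3), (4, 4), (5, 1), (6, 2), (7, 1), (8, 2)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE CurrentTable (ID INTEGER, LineNumber INTEGER)")
conn.executemany("INSERT INTO CurrentTable VALUES (?, ?)", sample)

# StartsRevision (hypothetical name) is 1 on the first line of each revision;
# the windowed SUM accumulates those flags in ID order.
query = """
SELECT ID,
       LineNumber,
       CASE WHEN LineNumber = 1 THEN 1 ELSE 0 END AS StartsRevision,
       SUM(CASE WHEN LineNumber = 1 THEN 1 ELSE 0 END)
           OVER (ORDER BY ID) AS PartitionNumber
FROM CurrentTable
ORDER BY ID
"""
result = conn.execute(query).fetchall()
flags = [r[2] for r in result]
partitions = [r[3] for r in result]
print(flags)       # flag per row
print(partitions)  # running total of the flags
```

The flags come out as 1, 0, 0, 0, 1, 0, 1, 0 and the running total as 1, 1, 1, 1, 2, 2, 3, 3. No GROUP BY is used in this sketch because nothing is aggregated outside the window.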
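Is CreatedDate the same for every row in a partition? I ask because in partition 3 it differs. If it is the same, you can use DENSE_RANK(), partitioning by document only and ordering by CreatedDate (if CreatedDate is also in the PARTITION BY, every row ranks 1):
SELECT *,
DENSE_RANK() OVER(PARTITION BY documentNumber ORDER BY CreatedDate) as PartitionNumber
FROM CurrentTable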
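To see why the matching-CreatedDate condition matters, here is a small sketch that ranks the question's sample rows by CreatedDate within each document. It runs on Python's sqlite3 module rather than SQL Server (an assumption; DENSE_RANK is standard SQL and needs SQLite 3.25 or later).

```python
import sqlite3

# (ID, DocumentNumber, CreatedDate) from the question's sample table.
sample = [
    (1, "D001", "2017-01-20 14:10:13.533"),
    (2, "D001", "2017-01-20 14:10:13.533"),
    (3, "D001", "2017-01-20 14:10:13.533"),
    (4, "D001", "2017-01-20 14:10:13.533"),
    (5, "D001", "2017-01-21 12:11:14.500"),
    (6, "D001", "2017-01-21 12:11:14.500"),
    (7, "D001", "2017-01-21 15:20:20.222"),
    (8, "D001", "2017-01-21 15:21:21.111"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE CurrentTable (ID INTEGER, DocumentNumber TEXT, CreatedDate TEXT)"
)
conn.executemany("INSERT INTO CurrentTable VALUES (?, ?, ?)", sample)

# Each distinct CreatedDate within a document gets its own rank.
query = """
SELECT ID,
       DENSE_RANK() OVER (PARTITION BY DocumentNumber ORDER BY CreatedDate)
           AS PartitionNumber
FROM CurrentTable
ORDER BY ID
"""
ranks = [r[1] for r in conn.execute(query)]
print(ranks)
```

Rows 7 and 8 were created about a minute apart, so they land in ranks 3 and 4 instead of sharing partition 3. DENSE_RANK on CreatedDate alone therefore only matches the expected output when all rows of a revision carry an identical timestamp.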