Evenly distributing related records
I have a table with over 100k mailboxes and the users who have permissions on them.
+---------+---------+
| Mailbox | Trustee |
+---------+---------+
| smb1 | mbx1 |
| smb2 | mbx1 |
| smb2 | mbx2 |
| smb2 | mbx3 |
| smb3 | mbx4 |
| smb3 | mbx5 |
| mbx1 | mbx6 |
| mbx7 | mbx4 |
| smb4 | mbx8 |
| smb4 | mbx9 |
| mbx8 | mbx10 |
+---------+---------+
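I need to group the trustees together with the mailboxes (in the Mailbox column) that they have access to. For example, mbx1, mbx2 and mbx3 are related through their access to smb2, so they go into bucket 1. Because mbx1 is in bucket 1, smb1 also goes into bucket 1, since mbx1 is a trustee of smb1. Further down, mbx6 has a relationship with mbx1 (it is a trustee of mbx1), so it also goes into bucket 1. Hopefully the rest makes sense. Note that a trustee can have access to either an smb (shared mailbox) or an mbx (user mailbox).
The table I'm selecting from only has Mailbox and Trustee, and I want to write the result into the temp table below.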
+---------+---------+--------+
| Mailbox | Trustee | Bucket |
+---------+---------+--------+
| smb1 | mbx1 | 1 |
| smb2 | mbx1 | 1 |
| smb2 | mbx2 | 1 |
| smb2 | mbx3 | 1 |
| smb3 | mbx4 | 2 |
| smb3 | mbx5 | 2 |
| mbx1 | mbx6 | 1 |
| mbx7 | mbx4 | 2 |
| smb4 | mbx8 | 3 |
| smb4 | mbx9 | 3 |
| mbx8 | mbx10 | 3 |
+---------+---------+--------+
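One set-based way to produce this Bucket column (a swapped-in technique, not the approach used in the answer below) is to treat every mailbox and trustee name as a node, every permission row as an edge, and a bucket as a connected group of that graph. The sketch below finds the groups by iterative label propagation: each name starts labelled with itself and repeatedly takes the smallest label among its neighbours until nothing changes. It assumes SQL Server 2016+ (for DROP TABLE IF EXISTS); #Perms, #Label and #Bucketed are hypothetical temp tables, with #Perms standing in for EXOPerms plus an ID column.

```sql
-- Sketch (assumes SQL Server 2016+): bucket = connected group of the
-- Mailbox/Trustee graph, found by iterative label propagation.
-- #Perms is a hypothetical stand-in for EXOPerms with an added ID column.
drop table if exists #Perms, #Label, #Bucketed;

create table #Perms (
    ID      int primary key,
    Mailbox varchar(100) not null,
    Trustee varchar(100) not null
);

insert into #Perms (ID, Mailbox, Trustee) values
    (1, 'smb1', 'mbx1'), (2, 'smb2', 'mbx1'), (3, 'smb2', 'mbx2'),
    (4, 'smb2', 'mbx3'), (5, 'smb3', 'mbx4'), (6, 'smb3', 'mbx5'),
    (7, 'mbx1', 'mbx6'), (8, 'mbx7', 'mbx4'), (9, 'smb4', 'mbx8'),
    (10, 'smb4', 'mbx9'), (11, 'mbx8', 'mbx10');

-- Every distinct name is a node; each node starts labelled with itself.
create table #Label (
    Node  varchar(100) primary key,
    Label varchar(100) not null
);

insert into #Label (Node, Label)
select Mailbox, Mailbox from #Perms
union
select Trustee, Trustee from #Perms;

-- Replace each label with the smallest neighbouring label until no row
-- changes; every node in a connected group then carries the same label.
while 1 = 1
begin
    update l
    set Label = n.MinLabel
    from #Label l
    cross apply (
        select min(nb.Label) as MinLabel
        from #Perms p
        join #Label nb
          on nb.Node = case when p.Mailbox = l.Node then p.Trustee else p.Mailbox end
        where l.Node in (p.Mailbox, p.Trustee)
    ) n
    where n.MinLabel < l.Label;

    if @@rowcount = 0 break;
end;

-- Number the groups 1, 2, 3... in order of each group's first row.
with r as (
    select p.ID, p.Mailbox, p.Trustee,
           min(p.ID) over (partition by l.Label) as FirstID
    from #Perms p
    join #Label l on l.Node = p.Mailbox
)
select ID, Mailbox, Trustee,
       dense_rank() over (order by FirstID) as Bucket
into #Bucketed
from r;

select ID, Mailbox, Trustee, Bucket from #Bucketed order by ID;
```

Each pass is a single set-based UPDATE, so the number of passes grows with the longest chain of links rather than with the number of rows; on the real 100k-row table, indexes on Mailbox and Trustee would likely matter.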
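Then I want to combine the bucket counts into evenly sized groups. The idea is that I can specify a maximum count, for example 100, and create groups of buckets that each hold at most 100 users.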
+---------+---------+-------+
| Groups | Buckets | Count |
+---------+---------+-------+
| 1 | 1 | 5 |
| 2 | 2,3 | 6 |
+---------+---------+-------+
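A sketch of the grouping step, assuming SQL Server 2017+ (for STRING_AGG) and counting rows per bucket, which is what the Count column above appears to reflect (bucket 1 has five rows). It walks the buckets in order with a recursive CTE and starts a new group whenever the next bucket would push the running total past @MaxCount: a greedy "next fit" heuristic, not an optimal packing. #Bucketed, #BucketCounts and #Groups are hypothetical names, and @MaxCount = 6 is chosen only because it reproduces the table above.

```sql
-- Sketch (assumes SQL Server 2017+): pack buckets into groups of at most
-- @MaxCount rows, taking buckets in order (greedy "next fit").
-- #Bucketed is loaded with the bucketed sample from the question.
drop table if exists #Bucketed, #BucketCounts, #Groups;

create table #Bucketed (Mailbox varchar(100), Trustee varchar(100), Bucket int);
insert into #Bucketed (Mailbox, Trustee, Bucket) values
    ('smb1', 'mbx1', 1), ('smb2', 'mbx1', 1), ('smb2', 'mbx2', 1),
    ('smb2', 'mbx3', 1), ('smb3', 'mbx4', 2), ('smb3', 'mbx5', 2),
    ('mbx1', 'mbx6', 1), ('mbx7', 'mbx4', 2), ('smb4', 'mbx8', 3),
    ('smb4', 'mbx9', 3), ('mbx8', 'mbx10', 3);

declare @MaxCount int = 6;  -- e.g. 100 for the real data

-- One row per bucket: its row count and a gap-free sequence number.
select Bucket,
       count(*) as Cnt,
       row_number() over (order by Bucket) as rn
into #BucketCounts
from #Bucketed
group by Bucket;

-- Carry a running total; open a new group when the next bucket would
-- push the current group over @MaxCount.
with pack as (
    select rn, Bucket, Cnt, 1 as GroupNo, Cnt as GroupTotal
    from #BucketCounts
    where rn = 1
    union all
    select b.rn, b.Bucket, b.Cnt,
           case when p.GroupTotal + b.Cnt > @MaxCount then p.GroupNo + 1 else p.GroupNo end,
           case when p.GroupTotal + b.Cnt > @MaxCount then b.Cnt else p.GroupTotal + b.Cnt end
    from pack p
    join #BucketCounts b on b.rn = p.rn + 1
)
select GroupNo as Groups,
       string_agg(cast(Bucket as varchar(20)), ',') within group (order by Bucket) as Buckets,
       sum(Cnt) as [Count]
into #Groups
from pack
group by GroupNo
option (maxrecursion 0);

select Groups, Buckets, [Count] from #Groups order by Groups;
```

A bucket larger than @MaxCount still becomes a group of its own, so that group will exceed the limit. Numbering the buckets largest-first before packing can reduce wasted space, but the result remains a heuristic.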
Edit:
I've got this far: I can pass in a mailbox and get all of its trustees, and then the other mailboxes those trustees have access to.
DECLARE @int int = 1;
WITH Buckets_CTE (Trustee)
AS (
    SELECT DISTINCT Trustee
    FROM EXOPerms
    WHERE Mailbox = 'smb1'
)
SELECT DISTINCT Mailbox, Trustee
FROM EXOPerms
WHERE Trustee IN (
    SELECT DISTINCT Trustee
    FROM Buckets_CTE
)
ORDER BY Trustee
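The query in the edit follows only one hop (smb1 to its trustees, then to their mailboxes). A recursive CTE can keep following links until nothing new turns up. This sketch assumes SQL Server 2016+, uses hypothetical temp tables #Perms, #Edges and #Related in place of EXOPerms, and stores each permission in both directions so the walk can go from mailbox to trustee and back.

```sql
-- Sketch (assumes SQL Server 2016+): extend the one-hop lookup into a full
-- walk from one starting mailbox. Table names are hypothetical.
drop table if exists #Perms, #Edges, #Related;

create table #Perms (Mailbox varchar(100) not null, Trustee varchar(100) not null);
insert into #Perms (Mailbox, Trustee) values
    ('smb1', 'mbx1'), ('smb2', 'mbx1'), ('smb2', 'mbx2'), ('smb2', 'mbx3'),
    ('smb3', 'mbx4'), ('smb3', 'mbx5'), ('mbx1', 'mbx6'), ('mbx7', 'mbx4'),
    ('smb4', 'mbx8'), ('smb4', 'mbx9'), ('mbx8', 'mbx10');

-- Treat each permission as an undirected link: store both directions.
create table #Edges (A varchar(100) not null, B varchar(100) not null);
insert into #Edges (A, B)
select Mailbox, Trustee from #Perms
union
select Trustee, Mailbox from #Perms;

declare @Start varchar(100) = 'smb1';

-- Follow links from @Start; the Path string stops a walk from revisiting
-- a node it has already passed through.
with Walk as (
    select distinct A as Node,
           cast('|' + A + '|' as varchar(max)) as Path
    from #Edges
    where A = @Start
    union all
    select e.B,
           cast(w.Path + e.B + '|' as varchar(max))
    from Walk w
    join #Edges e on e.A = w.Node
    where charindex('|' + e.B + '|', w.Path) = 0
)
select distinct Node
into #Related
from Walk
option (maxrecursion 0);

select Node from #Related order by Node;
```

Because the Path check only blocks revisits along the same path, the CTE enumerates every simple path. That is fine for one small group, but it can grow very quickly when many mailboxes share trustees, so it suits inspecting a single mailbox's group rather than bucketing all 100k rows.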
The DECLARE @int at the top is currently unused; it's only there while I see whether I can implement the bucket functionality.
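Here is a WHILE loop solution. It walks through the rows one at a time and updates Bucket.
Add an ID column so the data can be looped through row by row.
To check whether a row's Mailbox or Trustee appears in another row (in either column), test both i.Mailbox in (m.Mailbox, m.Trustee) and i.Trustee in (m.Mailbox, m.Trustee):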
from @mailbox i
inner join @mailbox m
    on i.ID <> m.ID -- don't compare the same row
   and (
        i.Mailbox in (m.Mailbox, m.Trustee)
     or i.Trustee in (m.Mailbox, m.Trustee)
   )
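Note that when Bucket is updated, the new value is compared with the current Bucket and only the smaller one is kept. This handles cases like the one below, where the relationship between earlier rows is not known until a later row is processed.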
ID MailBox Trustee
1 a b
2 c d
3 e f
4 c f
When processed in order, IDs 1, 2 and 3 each get a separate Bucket. Only when ID 4 is processed are IDs 2 and 3 linked together.
Full query:
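Running just the self-join condition on the four-row example makes this concrete: only ID 4 shares a value with both ID 2 (c) and ID 3 (f), so nothing connects 2 and 3 until row 4 is reached. A small sketch, using hypothetical temp tables #Example and #Links:

```sql
-- Sketch: for each row of the four-row example, list the other rows that
-- share a Mailbox or Trustee value with it (same join as the answer).
drop table if exists #Example, #Links;

create table #Example (ID int primary key, Mailbox varchar(5), Trustee varchar(5));
insert into #Example (ID, Mailbox, Trustee) values
    (1, 'a', 'b'), (2, 'c', 'd'), (3, 'e', 'f'), (4, 'c', 'f');

select i.ID, m.ID as LinkedID
into #Links
from #Example i
inner join #Example m
    on i.ID <> m.ID -- don't compare the same row
   and (
        i.Mailbox in (m.Mailbox, m.Trustee)
     or i.Trustee in (m.Mailbox, m.Trustee)
   );

-- ID 1 has no links; 2 and 3 are each linked only to 4; 4 is linked to both.
select ID, LinkedID from #Links order by ID, LinkedID;
```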
declare @mailbox table
(
    ID      int identity,
    Mailbox varchar(5),
    Trustee varchar(5),
    Bucket  int
)
insert into @mailbox (Mailbox, Trustee) values
    ( 'smb1', 'mbx1' ),
    ( 'smb2', 'mbx1' ),
    ( 'smb2', 'mbx2' ),
    ( 'smb2', 'mbx3' ),
    ( 'smb3', 'mbx4' ),
    ( 'smb3', 'mbx5' ),
    ( 'mbx1', 'mbx6' ),
    ( 'mbx7', 'mbx4' ),
    ( 'smb4', 'mbx8' ),
    ( 'smb4', 'mbx9' ),
    ( 'mbx8', 'mbx10');
declare @ID int,
        @Bucket int = 1,   -- next new bucket number, start from 1
        @MinBucket int
-- distinct buckets of the rows linked to the current row
declare @Linked table (Bucket int primary key)
-- get the minimum ID for start
select @ID = min(ID) from @mailbox where Bucket is null
while exists
(
    select *
    from @mailbox
    where ID >= @ID
)
begin
    -- collect the buckets of other rows that share a mailbox/trustee with
    -- the current row (Bucket is not null = row already processed)
    delete from @Linked
    insert into @Linked (Bucket)
    select distinct m.Bucket
    from @mailbox i
    inner join @mailbox m
        on i.ID <> m.ID
       and (
            i.Mailbox in (m.Mailbox, m.Trustee)
         or i.Trustee in (m.Mailbox, m.Trustee)
       )
    where i.ID = @ID
      and m.Bucket is not null
    if exists (select * from @Linked)
    begin
        -- keep the smallest of the linked buckets
        select @MinBucket = min(Bucket) from @Linked
        -- give it to the current row and merge every linked bucket into it,
        -- so rows that were only connected indirectly end up together too
        update @mailbox
        set Bucket = @MinBucket
        where ID = @ID
           or Bucket in (select Bucket from @Linked)
    end
    else
    begin
        -- no other row found with same mailbox.
        -- Assign Bucket from @Bucket, increment @Bucket
        update m
        set Bucket = @Bucket
        from @mailbox m
        where m.ID = @ID;
        select @Bucket = @Bucket + 1;
    end
    -- Get next ID
    select @ID = min(ID) from @mailbox where ID > @ID;
end
select *
from @mailbox
order by ID
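Whichever approach fills Bucket, a quick self-join can confirm that no two linked rows ended up in different buckets. A sketch using a hypothetical #Result table loaded with the question's expected output; against the query above, the same join would be pointed at @mailbox in the same batch.

```sql
-- Sketch: sanity check for a bucketing result. Any two rows that share a
-- Mailbox or Trustee value must carry the same Bucket.
-- #Result is a hypothetical table holding the expected output from the question.
drop table if exists #Result, #Conflicts;

create table #Result (ID int primary key, Mailbox varchar(5), Trustee varchar(5), Bucket int);
insert into #Result (ID, Mailbox, Trustee, Bucket) values
    (1, 'smb1', 'mbx1', 1), (2, 'smb2', 'mbx1', 1), (3, 'smb2', 'mbx2', 1),
    (4, 'smb2', 'mbx3', 1), (5, 'smb3', 'mbx4', 2), (6, 'smb3', 'mbx5', 2),
    (7, 'mbx1', 'mbx6', 1), (8, 'mbx7', 'mbx4', 2), (9, 'smb4', 'mbx8', 3),
    (10, 'smb4', 'mbx9', 3), (11, 'mbx8', 'mbx10', 3);

-- Linked pairs whose buckets differ; an empty result means no group was split.
select i.ID, i.Bucket, m.ID as LinkedID, m.Bucket as LinkedBucket
into #Conflicts
from #Result i
inner join #Result m
    on i.ID <> m.ID
   and (
        i.Mailbox in (m.Mailbox, m.Trustee)
     or i.Trustee in (m.Mailbox, m.Trustee)
   )
where i.Bucket <> m.Bucket;

select * from #Conflicts;
```

This only catches related rows that were split across buckets; it does not detect unrelated rows that were put in the same bucket.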