Best way to find cancel and substitute matches from the table
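I have a table with all the order information.
Id is the unique key of the table, and rows are grouped by Order_Id (every order for the same product has the same Order_Id). The main point is that when an order is cancelled, it is recorded as a cancelled order (Cancelled? = True), and it must have substitute orders among the subsequent orders. It can be an exact one-to-one match, like Id 5 and 6, but it can also be one-to-many (Id 2, 3 and 4), many-to-one, or many-to-many. As you can see, Id 1 and 7 are not part of a cancelled/substitute match, so they should be excluded from the matches.
My goal is to find the cancelled/substitute matches from the table below. It can be a SQL query or a stored procedure. I am also considering adding another column, Parent_Id, to record the cancelled Id on the substitute Id, but it would also have to be updated from a stored procedure.
Any ideas? Thanks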
Validating (all) combinations is potentially very hard, as this is essentially a subset sum problem. If you can add some restrictions, it usually becomes easier.
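The solution below has the following restrictions:
- The sequence of cancel and substitute transactions within an order_id cannot be mixed. The solution sums the transactions that come directly after the cancelled transaction within an order_id as a running total. When the running total reaches the cancelled quantity, the transactions are considered a match. Mixed-order transactions would mess up the running total, and the matching quantity would not be found.
- There cannot be more than one cancelled transaction within an order_id. That would require resetting the running total. It can be done, but it adds more complexity to the solution.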
An example of the mixed order I am referring to in restriction 1:
order id cancelled quantity
----- -- --------- --------
1     1  yes       100      --> 1 is cancelled
1     2  no        50       --> 2 is unrelated
1     3  no        100      --> 3 is the substitute for 1, but it does not come directly after 1...
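Sample data
The imposed restrictions hold for the sample data: all substitute transactions come directly after the cancelled transaction, and there are no multiple cancellations within a single order_id!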
create table transactions
(
    order_id       int,
    id             int,
    quantity       int,
    cancelled_date date,
    created_date   date,
    cancelled      bit
);
insert into transactions (order_id, id, quantity, cancelled_date, created_date, cancelled) values
(100000, 1, 100, null        , '2020-10-10', 0),
(100000, 2, 200, '2020-10-11', '2020-10-10', 1),
(100000, 3,  50, null        , '2020-10-12', 0),
(100000, 4, 150, null        , '2020-10-12', 0),
(100001, 5, 300, '2020-10-12', '2020-10-11', 1),
(100001, 6, 300, null        , '2020-10-13', 0),
(100001, 7,  50, null        , '2020-10-14', 0);
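Before relying on the second restriction for your own data, it may be worth checking that it actually holds. A minimal sketch, assuming the transactions table defined above; any order_id it returns has more than one cancelled transaction and would not be handled correctly by the queries in the solution:

```sql
-- Lists every order_id that violates the "one cancellation per order_id" restriction.
select t.order_id,
       count(*) as cancelled_count
from transactions t
where t.cancelled = 1
group by t.order_id
having count(*) > 1;
```

For the sample data this returns no rows, since order 100000 and order 100001 each contain exactly one cancelled transaction.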
Solution
To see the running total in the results and better understand the final solution, you can run this query.
with cte_cancel as
(
    select t.order_id,
           t.id, -- using id to get transaction order (alternative would be created_date, but then what if an order is created and cancelled on the same day?)
           t.quantity
    from transactions t
    where t.cancelled = 1
)
select c.order_id,
       c.id as cancelled_id,
       c.quantity,
       t.id as substitute_id,
       t.quantity,
       sum(t.quantity) over(partition by t.order_id
                            order by t.id
                            rows between unbounded preceding and current row) as qty_sum,
       case when sum(t.quantity) over(partition by t.order_id
                                      order by t.id
                                      rows between unbounded preceding and current row) <= c.quantity
            then c.id end as parent_order_id
from cte_cancel c
join transactions t
  on t.order_id = c.order_id
where t.cancelled = 0
  and t.id > c.id
order by c.id, t.id;
This produces:
order_id cancelled_id quantity substitute_id quantity qty_sum parent_order_id
-------- ------------ -------- ------------- -------- ------- ---------------
100000   2            200      3             50       50      2
100000   2            200      4             150      200     2
100001   5            300      6             300      300     5
100001   5            300      7             50       350     null
A minimal version of the solution that gives you only the matches looks like this:
with cte_cancel as
(
    select t.order_id,
           t.id,
           t.quantity
    from transactions t
    where t.cancelled = 1
),
cte_match as
(
    select c.order_id,
           c.id as cancelled_id,
           t.id as substitute_id,
           case when sum(t.quantity) over(partition by t.order_id
                                          order by t.id
                                          rows between unbounded preceding and current row) <= c.quantity
                then c.id end as parent_order_id
    from cte_cancel c
    join transactions t
      on t.order_id = c.order_id
    where t.cancelled = 0
      and t.id > c.id
)
select m.order_id,
       m.cancelled_id,
       m.substitute_id
from cte_match m
where m.parent_order_id is not null
order by m.order_id,
         m.cancelled_id;
Resulting in:
order_id cancelled_id substitute_id
-------- ------------ -------------
100000   2            3
100000   2            4
100001   5            6
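One thing to be aware of: the `<=` comparison also flags a run whose total never lands exactly on the cancelled quantity. In the mixed-order example from restriction 1, id 2 (running total 50) would still be flagged as a substitute for id 1, although the total jumps from 50 to 150 and skips 100. If that matters for your data, a stricter variant could keep only runs that close exactly. This is a sketch, not part of the solution above: the names cte_running, cte_closed and closing_id are made up for illustration, and it assumes positive quantities and the same two restrictions.

```sql
with cte_cancel as
(
    select t.order_id,
           t.id,
           t.quantity
    from transactions t
    where t.cancelled = 1
),
cte_running as
(
    -- same running total as before, kept as a column instead of a case expression
    select c.order_id,
           c.id       as cancelled_id,
           c.quantity as cancelled_qty,
           t.id       as substitute_id,
           sum(t.quantity) over(partition by t.order_id
                                order by t.id
                                rows between unbounded preceding and current row) as qty_sum
    from cte_cancel c
    join transactions t
      on t.order_id = c.order_id
    where t.cancelled = 0
      and t.id > c.id
),
cte_closed as
(
    -- closing_id: the substitute on which the running total equals the cancelled
    -- quantity; it stays null when the total never hits that quantity exactly
    select r.order_id,
           r.cancelled_id,
           r.substitute_id,
           min(case when r.qty_sum = r.cancelled_qty then r.substitute_id end)
               over(partition by r.cancelled_id) as closing_id
    from cte_running r
)
select x.order_id,
       x.cancelled_id,
       x.substitute_id
from cte_closed x
where x.substitute_id <= x.closing_id -- a null closing_id filters out the whole run
order by x.order_id,
         x.cancelled_id,
         x.substitute_id;
```

On the sample data this returns the same three rows as the minimal version (2 with 3, 2 with 4, 5 with 6). On the mixed-order example it returns nothing instead of the partial match.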
A version with the related quantities and dates could look like this:
with cte_cancel as
(
    select t.order_id,
           t.id,
           t.cancelled_date,
           t.quantity
    from transactions t
    where t.cancelled = 1
),
cte_match as
(
    select c.order_id,
           c.id as cancelled_id,
           c.quantity as cancelled_qty,
           c.cancelled_date,
           t.id as substitute_id,
           t.quantity as substitute_qty,
           t.created_date as substitute_date,
           case when sum(t.quantity) over(partition by t.order_id
                                          order by t.id
                                          rows between unbounded preceding and current row) <= c.quantity
                then c.id end as parent_order_id
    from cte_cancel c
    join transactions t
      on t.order_id = c.order_id
    where t.cancelled = 0
      and t.id > c.id
)
select m.order_id,
       m.cancelled_id,
       m.cancelled_qty,
       m.cancelled_date,
       m.substitute_id,
       m.substitute_qty,
       m.substitute_date
from cte_match m
where m.parent_order_id is not null
order by m.order_id,
         m.cancelled_id;
Resulting in:
order_id cancelled_id cancelled_qty cancelled_date substitute_id substitute_qty substitute_date
-------- ------------ ------------- -------------- ------------- -------------- ---------------
100000   2            200           2020-10-11     3             50             2020-10-12
100000   2            200           2020-10-11     4             150            2020-10-12
100001   5            300           2020-10-12     6             300            2020-10-13
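The question also mentions a Parent_Id column. If you go that route, the same match logic could feed an update. The following is only a sketch: it assumes SQL Server syntax (the bit column suggests it), and it assumes a nullable parent_id column that is not part of the sample table and has to be added first, in a separate batch.

```sql
-- Assumption: the column was added beforehand in its own batch, for example:
--   alter table transactions add parent_id int null;
with cte_cancel as
(
    select t.order_id,
           t.id,
           t.quantity
    from transactions t
    where t.cancelled = 1
),
cte_match as
(
    select c.id as cancelled_id,
           t.id as substitute_id,
           case when sum(t.quantity) over(partition by t.order_id
                                          order by t.id
                                          rows between unbounded preceding and current row) <= c.quantity
                then c.id end as parent_order_id
    from cte_cancel c
    join transactions t
      on t.order_id = c.order_id
    where t.cancelled = 0
      and t.id > c.id
)
-- store the cancelled id on every substitute row that was matched
update t
set parent_id = m.parent_order_id
from transactions t
join cte_match m
  on m.substitute_id = t.id
where m.parent_order_id is not null;
```

With the sample data this would set parent_id to 2 on ids 3 and 4 and to 5 on id 6, leaving ids 1 and 7 untouched. The statement can be wrapped in a stored procedure if the update has to run from one.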
Fiddle to see everything in action.