Best way to find cancel and substitute matches in a table

I have a table that contains all the order information.

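Id is the unique key of the table, and rows are grouped by Order_Id (every order for the same product has the same Order_Id). The key point is that when an order is cancelled, it is recorded as a cancelled order (Cancelled? = True), and it must then have a substitute among the subsequent orders. The match can be an exact one-to-one, like Ids 5 and 6, but it can also be one-to-many (Ids 2, 3 and 4), many-to-one or many-to-many. As you can see, Ids 1 and 7 are not part of any cancelled/substitute match, so they should be excluded.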

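My goal is to find the cancelled/substitute matches in the table below. It can be an SQL query or a stored procedure. I am also considering adding another column, Parent_Id, that records the cancelled Id on each substitute Id, but it would also have to be updated from the stored procedure.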

Any ideas? Thanks

Validating (all) combinations is potentially very hard, as this is essentially a subset sum problem. If you can add some restrictions, it usually becomes much easier.
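To make the subset sum point concrete, here is a brute-force sketch in Python. The function `find_subset_matches` and the candidate data (including id 8) are hypothetical and not part of the original answer. The function tries every subset of candidate substitutes, so the work grows as 2^n with the number of candidates. Even an exhaustive search can also return more than one valid answer.

```python
from itertools import combinations


def find_subset_matches(cancelled_qty, candidates):
    """Return every subset of (id, quantity) candidates whose quantities
    add up exactly to cancelled_qty.

    Brute force: all 2**len(candidates) subsets are examined, which is why
    validating every combination does not scale.
    """
    matches = []
    for size in range(1, len(candidates) + 1):
        for combo in combinations(candidates, size):
            if sum(qty for _, qty in combo) == cancelled_qty:
                matches.append([row_id for row_id, _ in combo])
    return matches


# Hypothetical candidates for a cancelled row with quantity 200.
print(find_subset_matches(200, [(3, 50), (4, 150), (8, 50)]))
# [[3, 4], [4, 8]]  -> two different subsets fit, so the match is ambiguous
```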

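The solution below has the following restrictions:

  1. Cancelled and substitute transactions within an order_id cannot be interleaved. The solution adds up the transactions of an order_id that come directly after the cancelled transaction, keeping a running total. When the running total reaches the cancelled quantity, the transactions are considered a match. Interleaved transactions would mess up the running total, and the matching quantity could not be found.
  2. An order_id cannot contain more than one cancelled transaction. Supporting that would require resetting the running total. It can be done, but it adds more complexity to the solution.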
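The running-total idea behind both restrictions can be sketched in plain Python before looking at the SQL. This is an illustrative translation, not the answer's code. The function name `running_total_matches` and the tuple layout `(order_id, id, quantity, cancelled)` are assumptions. Like the SQL further down, the function keeps a substitute while the running total is `<=` the cancelled quantity, and it relies on both restrictions above.

```python
def running_total_matches(rows):
    """Match cancelled rows to substitutes using a running total.

    rows: list of (order_id, id, quantity, cancelled) tuples.
    Assumes restriction 1 (substitutes follow the cancellation directly)
    and restriction 2 (at most one cancelled row per order_id).
    Returns (order_id, cancelled_id, substitute_id) tuples.
    """
    matches = []
    for order_id in sorted({r[0] for r in rows}):
        order_rows = sorted((r for r in rows if r[0] == order_id), key=lambda r: r[1])
        cancelled = [r for r in order_rows if r[3]]
        if not cancelled:
            continue  # nothing was cancelled in this order
        _, cancel_id, cancel_qty, _ = cancelled[0]  # restriction 2: only one
        running = 0
        for _, row_id, qty, is_cancelled in order_rows:
            if is_cancelled or row_id <= cancel_id:
                continue  # only non-cancelled rows after the cancellation count
            running += qty
            if running <= cancel_qty:
                matches.append((order_id, cancel_id, row_id))
    return matches


sample = [
    (100000, 1, 100, False),
    (100000, 2, 200, True),
    (100000, 3, 50, False),
    (100000, 4, 150, False),
    (100001, 5, 300, True),
    (100001, 6, 300, False),
    (100001, 7, 50, False),
]
print(running_total_matches(sample))
# [(100000, 2, 3), (100000, 2, 4), (100001, 5, 6)]
```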

An example of the interleaved order I mean in restriction 1:

order_id id cancelled quantity
-------- -- --------- --------
1        1  yes       100     --> 1 is cancelled
1        2  no        50      --> 2 is unrelated
1        3  no        100     --> 3 is the substitute for 1, but it does not come directly after 1...
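The failure can be reproduced in a few lines of Python. The helper `running_total_match` is hypothetical. It applies the same `<=` running-total rule as the SQL below to the rows that follow the cancelled one, taken in id order. On the interleaved example it picks the unrelated id 2 and misses the real substitute, id 3.

```python
def running_total_match(cancel_id, cancel_qty, later_rows):
    """Return the ids of (id, quantity) rows, taken in the given (id) order,
    that are kept while the running total stays <= cancel_qty."""
    matched, running = [], 0
    for row_id, qty in later_rows:
        if row_id <= cancel_id:
            continue  # only rows after the cancelled one are considered
        running += qty
        if running <= cancel_qty:
            matched.append(row_id)
    return matched


# Interleaved example: id 1 (100) cancelled, id 2 (50) unrelated, id 3 (100) substitute.
print(running_total_match(1, 100, [(2, 50), (3, 100)]))  # [2], which is wrong
# Without interleaving, the substitute comes first and is found.
print(running_total_match(1, 100, [(2, 100), (3, 50)]))  # [2], which is correct here
```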

Sample data

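The imposed restrictions hold for the sample data: all substitute transactions come directly after the cancelled transaction, and no order_id contains more than one cancellation!
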
create table transactions
(
  order_id int,
  id int,
  quantity int,
  cancelled_date date,
  created_date date,
  cancelled bit
);

insert into transactions (order_id, id, quantity, cancelled_date, created_date, cancelled) values
(100000, 1, 100, null        , '2020-10-10', 0),
(100000, 2, 200, '2020-10-11', '2020-10-10', 1),
(100000, 3,  50, null        , '2020-10-12', 0),
(100000, 4, 150, null        , '2020-10-12', 0),
(100001, 5, 300, '2020-10-12', '2020-10-11', 1),
(100001, 6, 300, null        , '2020-10-13', 0),
(100001, 7,  50, null        , '2020-10-14', 0);
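Restriction 2 can be checked mechanically before you trust the results. In the sketch below, Python's built-in `sqlite3` module with an in-memory database stands in for SQL Server (an assumption, since the answer targets T-SQL). It loads the sample rows and lists every order_id with more than one cancelled transaction. Restriction 1 cannot be checked this way, because deciding which rows are substitutes is the very thing the query is trying to do.

```python
import sqlite3

# In-memory SQLite database standing in for SQL Server; column types as in the answer.
conn = sqlite3.connect(":memory:")
conn.execute("""
    create table transactions (
        order_id int, id int, quantity int,
        cancelled_date date, created_date date, cancelled bit)""")
conn.executemany(
    "insert into transactions values (?, ?, ?, ?, ?, ?)",
    [
        (100000, 1, 100, None, "2020-10-10", 0),
        (100000, 2, 200, "2020-10-11", "2020-10-10", 1),
        (100000, 3, 50, None, "2020-10-12", 0),
        (100000, 4, 150, None, "2020-10-12", 0),
        (100001, 5, 300, "2020-10-12", "2020-10-11", 1),
        (100001, 6, 300, None, "2020-10-13", 0),
        (100001, 7, 50, None, "2020-10-14", 0),
    ],
)

# Restriction 2: no order_id may contain more than one cancelled transaction.
violations = conn.execute("""
    select order_id, count(*)
    from transactions
    where cancelled = 1
    group by order_id
    having count(*) > 1""").fetchall()
print(violations)  # [] means restriction 2 holds for this data
```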

Solution

To see the running total in the results and better understand the final solution, you can run this query.

with cte_cancel as
(
  select t.order_id,
         t.id, -- using id to get transaction order (alternative would be created_date, but then what if an order is created and cancelled on the same day?)
         t.quantity
  from transactions t
  where t.cancelled = 1
)
select c.order_id,
       c.id as cancelled_id,
       c.quantity,
       t.id as substitute_id,
       t.quantity,
       sum(t.quantity) over(partition by t.order_id
                            order by t.id
                            rows between unbounded preceding and current row) as qty_sum,
       case when sum(t.quantity) over(partition by t.order_id
                                      order by t.id
                                      rows between unbounded preceding and current row) <= c.quantity
            then c.id end as parent_order_id
from cte_cancel c
join transactions t
  on t.order_id = c.order_id
where t.cancelled = 0
  and t.id > c.id
order by c.id, t.id;

This produces:

order_id cancelled_id quantity substitute_id quantity qty_sum parent_order_id
-------- ------------ -------- ------------- -------- ------- ---------------
100000   2            200      3             50       50      2
100000   2            200      4             150      200     2
100001   5            300      6             300      300     5
100001   5            300      7             50       350     null
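If you do not have a SQL Server instance available, the same running-total query can be tried through Python's `sqlite3`. This assumes the bundled SQLite is version 3.25 or newer, which is required for window functions. The date columns are left out for brevity. The rows returned should line up with the table above, with `None` in place of `null`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table transactions (order_id int, id int, quantity int, cancelled int)")
conn.executemany("insert into transactions values (?, ?, ?, ?)", [
    (100000, 1, 100, 0), (100000, 2, 200, 1), (100000, 3, 50, 0),
    (100000, 4, 150, 0), (100001, 5, 300, 1), (100001, 6, 300, 0),
    (100001, 7, 50, 0),
])

# The running-total query from above, minus the duplicate quantity columns.
query = """
with cte_cancel as (
  select order_id, id, quantity from transactions where cancelled = 1
)
select c.order_id, c.id as cancelled_id, t.id as substitute_id,
       sum(t.quantity) over (partition by t.order_id order by t.id
                             rows between unbounded preceding and current row) as qty_sum,
       case when sum(t.quantity) over (partition by t.order_id order by t.id
                                       rows between unbounded preceding and current row) <= c.quantity
            then c.id end as parent_order_id
from cte_cancel c
join transactions t on t.order_id = c.order_id
where t.cancelled = 0 and t.id > c.id
order by c.id, t.id
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```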

A minimal version of the solution that returns only the matches looks like this:

with cte_cancel as
(
  select t.order_id,
         t.id,
         t.quantity
  from transactions t
  where t.cancelled = 1
),
cte_match as
(
  select c.order_id,
         c.id as cancelled_id,
         t.id as substitute_id,
         case when sum(t.quantity) over(partition by t.order_id
                                        order by t.id
                                        rows between unbounded preceding and current row) <= c.quantity
              then c.id end as parent_order_id
  from cte_cancel c
  join transactions t
    on t.order_id = c.order_id
  where t.cancelled = 0
    and t.id > c.id
)
select m.order_id,
       m.cancelled_id,
       m.substitute_id
from cte_match m
where m.parent_order_id is not null
order by m.order_id,
         m.cancelled_id,
         m.substitute_id;

Results in:

order_id cancelled_id substitute_id
-------- ------------ -------------
100000   2            3
100000   2            4
100001   5            6
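Restriction 2 says that resetting the running total "can be done". One possible way is sketched below. It is a hypothetical extension, not part of the original answer, and restriction 1 still applies. `lead(id)` finds the next cancellation in the same order, which serves as an upper bound for the substitutes. Partitioning the window by the cancelled id restarts the sum for each cancellation. `LEAD` also exists in T-SQL from SQL Server 2012 onwards. The demo runs through Python's `sqlite3` (SQLite 3.25+ assumed) on made-up rows for an order 200000.

```python
import sqlite3

# Hypothetical order with two cancellations, which violates restriction 2.
conn = sqlite3.connect(":memory:")
conn.execute("create table transactions (order_id int, id int, quantity int, cancelled int)")
conn.executemany("insert into transactions values (?, ?, ?, ?)", [
    (200000, 10, 100, 1),  # cancelled
    (200000, 11, 100, 0),  # substitute for 10
    (200000, 12, 80, 1),   # second cancellation in the same order
    (200000, 13, 80, 0),   # substitute for 12
])

query = """
with cte_cancel as (
  -- next_cancel_id bounds the rows that can substitute this cancellation
  select order_id, id, quantity,
         lead(id) over (partition by order_id order by id) as next_cancel_id
  from transactions
  where cancelled = 1
),
cte_match as (
  select c.order_id, c.id as cancelled_id, t.id as substitute_id,
         -- partition by the cancelled id: the running total restarts per cancellation
         case when sum(t.quantity) over (partition by c.id order by t.id
                                         rows between unbounded preceding and current row) <= c.quantity
              then c.id end as parent_order_id
  from cte_cancel c
  join transactions t on t.order_id = c.order_id
  where t.cancelled = 0
    and t.id > c.id
    and (c.next_cancel_id is null or t.id < c.next_cancel_id)
)
select order_id, cancelled_id, substitute_id
from cte_match
where parent_order_id is not null
order by order_id, cancelled_id, substitute_id
"""
matches = conn.execute(query).fetchall()
print(matches)  # [(200000, 10, 11), (200000, 12, 13)]
```

The original query would not match id 12 here. Its window spans the whole order_id, so id 13's quantity would be added on top of id 11's.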

A version that includes the related quantities and dates could look like this:

with cte_cancel as
(
  select t.order_id,
         t.id,
         t.cancelled_date,
         t.quantity
  from transactions t
  where t.cancelled = 1
),
cte_match as
(
  select c.order_id,
         c.id as cancelled_id,
         c.quantity as cancelled_qty,
         c.cancelled_date,
         t.id as substitute_id,
         t.quantity as substitute_qty,
         t.created_date as substitute_date,
         case when sum(t.quantity) over(partition by t.order_id
                                        order by t.id
                                        rows between unbounded preceding and current row) <= c.quantity
              then c.id end as parent_order_id
  from cte_cancel c
  join transactions t
    on t.order_id = c.order_id
  where t.cancelled = 0
    and t.id > c.id
)
select m.order_id,
       m.cancelled_id,
       m.cancelled_qty,
       m.cancelled_date,
       m.substitute_id,
       m.substitute_qty,
       m.substitute_date
from cte_match m
where m.parent_order_id is not null
order by m.order_id,
         m.cancelled_id,
         m.substitute_id;

Results in:

order_id cancelled_id cancelled_qty cancelled_date substitute_id substitute_qty substitute_date
-------- ------------ ------------- -------------- ------------- -------------- ---------------
100000   2            200           2020-10-11     3             50             2020-10-12
100000   2            200           2020-10-11     4             150            2020-10-12
100001   5            300           2020-10-12     6             300            2020-10-13
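The question also mentions storing the result in a Parent_Id column. The sketch below assumes the column name `parent_id` from the question and uses Python's `sqlite3` as a stand-in for a SQL Server stored procedure. It runs the minimal matching query and writes each cancelled id onto its substitute rows. In T-SQL the same step could probably be a single `UPDATE` joined to the matching CTE, but the code below does not show that form.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table transactions (order_id int, id int, quantity int, cancelled int)")
conn.executemany("insert into transactions values (?, ?, ?, ?)", [
    (100000, 1, 100, 0), (100000, 2, 200, 1), (100000, 3, 50, 0),
    (100000, 4, 150, 0), (100001, 5, 300, 1), (100001, 6, 300, 0),
    (100001, 7, 50, 0),
])
# Hypothetical Parent_Id column: cancelled id on substitute rows, NULL elsewhere.
conn.execute("alter table transactions add column parent_id int")

match_query = """
with cte_cancel as (
  select order_id, id, quantity from transactions where cancelled = 1
),
cte_match as (
  select c.order_id, c.id as cancelled_id, t.id as substitute_id,
         case when sum(t.quantity) over (partition by t.order_id order by t.id
                                         rows between unbounded preceding and current row) <= c.quantity
              then c.id end as parent_order_id
  from cte_cancel c
  join transactions t on t.order_id = c.order_id
  where t.cancelled = 0 and t.id > c.id
)
select order_id, cancelled_id, substitute_id
from cte_match
where parent_order_id is not null
order by order_id, cancelled_id, substitute_id
"""
matches = conn.execute(match_query).fetchall()

# Write each cancelled id onto the substitute rows it was matched with.
conn.executemany(
    "update transactions set parent_id = ? where id = ?",
    [(cancelled_id, substitute_id) for _, cancelled_id, substitute_id in matches],
)
conn.commit()
result = conn.execute("select id, parent_id from transactions order by id").fetchall()
print(result)
```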

Fiddle to see it all in action.