Oracle - Procedure to merge with insert, update and delete

I need to create a procedure that handles the following scenario as efficiently as possible (the data volume is very large).

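I have a table named ORDER_A that receives a full load every day (it is truncated and all records are inserted again). I have a table named ORDER_B, which is a copy of ORDER_A containing the same data plus some additional control dates. I also have a table MANAGER that stores the start and finish timestamps and whether the procedure is currently running.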

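Once all inserts into ORDER_A have finished, I want to run a procedure that, for each record in ORDER_A, looks up the record with the same identifier (primary key: ORDER_ID) in ORDER_B.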

My tables look like this:

CREATE TABLE ORDER_A
(
    ORDER_ID NUMBER NOT NULL,
    ORDER_CODE VARCHAR2(50),
    ORDER_STATUS VARCHAR2(20),
    ORDER_USER_ID NUMBER,
    ORDER_DATE TIMESTAMP(6),
    CHECKSUM_CODE VARCHAR2(40),
    PRIMARY KEY (ORDER_ID)
);

CREATE TABLE ORDER_B
(
    ORDER_ID NUMBER NOT NULL,
    ORDER_CODE VARCHAR2(50),
    ORDER_STATUS VARCHAR2(20),
    ORDER_USER_ID NUMBER,
    ORDER_DATE TIMESTAMP(6),
    INSERT_AT TIMESTAMP(6) DEFAULT CURRENT_TIMESTAMP,
    UPDATED_AT TIMESTAMP(6),
    CHECKSUM_CODE VARCHAR2(40),
    FLAG_DELETED NUMBER(1),
    PRIMARY KEY (ORDER_ID)
);

-- index on checksum column for both tables
-- (index names must be unique within a schema, so each table gets its own name)
CREATE INDEX idx_cksum_a on ORDER_A (CHECKSUM_CODE ASC);
CREATE INDEX idx_cksum_b on ORDER_B (CHECKSUM_CODE ASC);


-- Manager table
CREATE TABLE MANAGER
(
    TABLE_NAME VARCHAR2(40),
    PROCEDURE_NAME VARCHAR2(50),
    START_TS TIMESTAMP(6),
    FINISH_TS TIMESTAMP(6),
    IS_RUNNING NUMBER(1)
);
    

I was thinking of something like the procedure below, but I am not sure whether it is the best approach, or how to handle the delete case.

create or replace procedure MERGE_DATA_ORDER
IS
    -- named differently from the MANAGER.IS_RUNNING column to avoid ambiguity
    v_is_running number;
    ex_running   EXCEPTION;
BEGIN

    SELECT IS_RUNNING
      INTO v_is_running
      FROM MANAGER
     WHERE PROCEDURE_NAME = 'MERGE_DATA_ORDER';

    -- stop here if another execution is still in progress
    IF v_is_running = 1 THEN
        RAISE ex_running;
    END IF;

    -- Update the flag on manager table
    UPDATE MANAGER
       SET IS_RUNNING = 1, START_TS = SYSTIMESTAMP
     WHERE PROCEDURE_NAME = 'MERGE_DATA_ORDER';
    COMMIT;

    -- update all records with a checksum using STANDARD_HASH with MD5
    UPDATE ORDER_A
       SET CHECKSUM_CODE =
            STANDARD_HASH
            (
                ORDER_ID ||
                ORDER_CODE ||
                ORDER_STATUS ||
                ORDER_USER_ID ||
                ORDER_DATE,
                'MD5'
            );
    COMMIT;

    -- then, I do a MERGE operation, using the checksum as a comparator
    merge into ORDER_B b
    using (select a.* from ORDER_A a) m
        on (m.ORDER_ID = b.ORDER_ID)
    when matched then
      update
        set
            -- ORDER_ID is the join key in the ON clause, so it cannot be updated here
            b.ORDER_CODE = m.ORDER_CODE,
            b.ORDER_STATUS = m.ORDER_STATUS,
            b.ORDER_USER_ID = m.ORDER_USER_ID,
            b.ORDER_DATE = m.ORDER_DATE,
            b.CHECKSUM_CODE = m.CHECKSUM_CODE,
            b.UPDATED_AT = SYSTIMESTAMP
      where b.CHECKSUM_CODE <> m.CHECKSUM_CODE
    when not matched then
      insert (
            ORDER_ID,
            ORDER_CODE,
            ORDER_STATUS,
            ORDER_USER_ID,
            ORDER_DATE,
            CHECKSUM_CODE
            )
        values (
            m.ORDER_ID,
            m.ORDER_CODE,
            m.ORDER_STATUS,
            m.ORDER_USER_ID,
            m.ORDER_DATE,
            m.CHECKSUM_CODE
            );

    -- set the flag to 0
    UPDATE MANAGER
       SET IS_RUNNING = 0, FINISH_TS = SYSTIMESTAMP
     WHERE PROCEDURE_NAME = 'MERGE_DATA_ORDER';
    COMMIT;
END;
/

I need some help completing this code, performance tips, and a way to handle the delete case.

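I think you can do this as a single statement, as part of the data load. Let's assume ORDER_A has already been loaded (I will comment on that later). You can then define the result of the insert/update with a full outer join between ORDER_A and ORDER_B, using CASE expressions to project the "correct" values from either ORDER_A or ORDER_B. You can project FLAG_DELETED in the same way. It would look something like this. In this example I have skipped the MD5, but it can be added if it is really needed; more on that later too.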

select
       -- ORDER_A is a full load, so its values win whenever the order is still present there;
       -- otherwise the order is gone from ORDER_A and the ORDER_B values are kept.
       case when ( a.order_id is not null ) then a.order_id     else b.order_id     end as newOrder_id
     , case when ( a.order_id is not null ) then a.order_code   else b.order_code   end as newOrder_code
     , case when ( a.order_id is not null ) then a.order_status else b.order_status end as newOrder_status
/* etc... ( Repeat for all projected columns )
   Then for the flag_deleted column */
     , case when ( a.order_id is null ) then 1
            when ( b.order_id is null ) then 0
            else b.flag_deleted
       end as newFlag_deleted
from Order_b b
full outer join Order_a a
on b.order_id = a.order_id

ORDER_A could even be an external table, so you would only need to prefix the query with

CREATE TABLE NEW_ORDER_A as select....

and then you have the result you need.
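The control columns of ORDER_B (INSERT_AT and UPDATED_AT) are among the columns covered by the "etc." in the query above. Here is a minimal sketch of how they might be projected, assuming a row counts as changed when any data column differs. DECODE is used for the comparison because it treats two NULLs as equal; the output aliases are illustrative only.

```sql
select
       coalesce(a.order_id, b.order_id) as newOrder_id
       -- INSERT_AT: stamp rows that are new in ORDER_A, keep the stored value otherwise
     , case when b.order_id is null then cast(systimestamp as timestamp(6))
            else b.insert_at
       end as newInsert_at
       -- UPDATED_AT: stamp rows present in both tables whose data columns differ
     , case when a.order_id is not null
             and b.order_id is not null
             and (   decode(a.order_code,    b.order_code,    0, 1) = 1
                  or decode(a.order_status,  b.order_status,  0, 1) = 1
                  or decode(a.order_user_id, b.order_user_id, 0, 1) = 1
                  or decode(a.order_date,    b.order_date,    0, 1) = 1 )
            then cast(systimestamp as timestamp(6))
            else b.updated_at
       end as newUpdated_at
from order_b b
full outer join order_a a
  on b.order_id = a.order_id;
```

Rows that exist only in ORDER_B keep their previous UPDATED_AT in this sketch; if a deletion should also count as an update, the first condition would need to be relaxed.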

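Where your example performs poorly is the update of ORDER_A. You are generating redo and undo, and losing any compression benefit. You are also maintaining an index that you do not need. Assuming you have the resources, you can now use direct path and parallelism, which will scale well.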
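The direct-path and parallel suggestion could be sketched roughly as follows. The table name ORDER_B_NEW, the parallel degree of 8, and the rename-based swap are assumptions made for illustration and are not part of the original answer; the control-date and checksum columns are left out for brevity.

```sql
-- CREATE TABLE ... AS SELECT loads via direct path; NOLOGGING reduces redo
-- (unless the database is in FORCE LOGGING mode).
create table order_b_new
  nologging
  parallel 8
as
select /*+ parallel(8) */
       coalesce(a.order_id, b.order_id)                                          as order_id
     , case when a.order_id is not null then a.order_code    else b.order_code    end as order_code
     , case when a.order_id is not null then a.order_status  else b.order_status  end as order_status
     , case when a.order_id is not null then a.order_user_id else b.order_user_id end as order_user_id
     , case when a.order_id is not null then a.order_date    else b.order_date    end as order_date
     , case when a.order_id is null then 1
            when b.order_id is null then 0
            else b.flag_deleted
       end                                                                       as flag_deleted
from order_b b
full outer join order_a a
  on b.order_id = a.order_id;

-- Once the new table has been checked, swap it in and recreate the primary key.
alter table order_b rename to order_b_old;
alter table order_b_new rename to order_b;
alter table order_b add primary key (order_id);
```

Because no existing rows are updated, there is no index to maintain during the load; the primary key is built once at the end.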

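Finally, if you really need the MD5, you need to add a special character between each column, otherwise the concatenation becomes ambiguous. For example, the following two rows would have the same MD5: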

COL1   COL2
AA     BBB
AAB    BB
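
The ambiguity can be checked directly with STANDARD_HASH. The sketch below assumes '|' as the separator, which only helps if that character never appears in the data. The explicit TO_CHAR format for ORDER_DATE is also an assumption, added so that the hash does not depend on session NLS settings.

```sql
-- Without a separator both rows concatenate to 'AABBB' and hash identically;
-- with a separator the inputs are 'AA|BBB' and 'AAB|BB', which are different strings.
select standard_hash('AA'  || 'BBB', 'MD5')          as row1_plain
     , standard_hash('AAB' || 'BB',  'MD5')          as row2_plain
     , standard_hash('AA'  || '|' || 'BBB', 'MD5')   as row1_delimited
     , standard_hash('AAB' || '|' || 'BB',  'MD5')   as row2_delimited
from dual;

-- The same idea applied to the ORDER_A columns.
select order_id
     , standard_hash(
           order_id      || '|' ||
           order_code    || '|' ||
           order_status  || '|' ||
           order_user_id || '|' ||
           to_char(order_date, 'YYYY-MM-DD HH24:MI:SS.FF6'),
           'MD5'
       ) as checksum_code
from order_a;
```

Oracle treats NULL as an empty string when concatenating, so the separator also keeps a NULL column from shifting values into a neighbouring position.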