Bulk update start date based on previous end date

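I am trying to run a bulk update on a table with about 700k records. I need to set each record's effective start date to the previous record's effective end date. I'm running into performance problems with the update statement when I use a subquery. Even with a date filter (7/1/2016 to 7/15/2016, about 2k records), it takes over an hour to run. I have tried it as a simple update statement, as an insert statement and as a loop. The explain plan that uses ROWID instead of account_dim_key (the PK on the table) is much better. However, that version fails with an error saying the subquery returns more than one row, and I'm not sure why this happens with ROWID.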

ID is the natural key on the table. account_dim_key is the PK and is unique. Both are indexed. The table is a Type 2 SCD (slowly changing dimension).
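The following self-contained sketch shows what LAG(effective_end_dt, 1, effective_start_dt) returns for one natural key. The sample rows are invented for illustration: id 'A001', keys 101 to 103, and the dates. The sketch also assumes the date columns are of type DATE.

```sql
-- Hypothetical sample data: three SCD2 versions of one natural key.
WITH sample AS (
   SELECT 101 AS account_dim_key, 'A001' AS id,
          DATE '2016-01-01' AS effective_start_dt,
          DATE '2016-03-31' AS effective_end_dt
     FROM dual
   UNION ALL
   SELECT 102, 'A001', DATE '2016-04-05', DATE '2016-06-30' FROM dual
   UNION ALL
   SELECT 103, 'A001', DATE '2016-07-02', DATE '9999-12-31' FROM dual
)
SELECT account_dim_key,
       effective_start_dt,
       effective_end_dt,
       -- previous version's end date; the first version falls back to its own start date
       LAG (effective_end_dt, 1, effective_start_dt)
          OVER (PARTITION BY id
                ORDER BY effective_start_dt, account_dim_key)
          AS prev_dt
  FROM sample
 ORDER BY id, effective_start_dt, account_dim_key;
```

With this data, prev_dt is:

- 2016-01-01 for key 101 (there is no predecessor, so the default is used).
- 2016-03-31 for key 102.
- 2016-06-30 for key 103.

If you partition by account_dim_key instead, every row gets its own start date back, because each partition then holds a single row.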

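  1. How do I modify the update statement so that it works with ROWID?
  2. Would updating with FORALL be better? If so, how would I write it? (I'm new to PL/SQL and not familiar with arrays.)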

Update statement using ROWID: fails with "ORA-01427: single-row subquery returns more than one row", but has the best explain plan

UPDATE DEXWHS.D_ACCOUNT_VEEVA
   SET effective_end_dt =
          (SELECT prev_dt
             FROM (SELECT LAG (
                             effective_end_dt,
                             1,
                             effective_start_dt)
                          OVER (PARTITION BY account_dim_key
                                ORDER BY effective_start_dt)
                             AS prev_dt,
                          ROWID AS rid
                     FROM dexwhs.d_account_veeva ac2) a
            WHERE a.rid = ROWID)
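For question 1, the most likely cause of ORA-01427 is that the unqualified ROWID in `WHERE a.rid = ROWID` is not resolved to the row being updated. The predicate then matches more than one row of the inline view.

The sketch below makes three changes:

- It aliases the target table and qualifies the pseudocolumn as t.ROWID.
- It partitions LAG by the natural key id rather than by the unique account_dim_key.
- Following the title, it assumes the column to fix is effective_start_dt.

```sql
-- Alias the target table (t) so the subquery compares against the ROWID
-- of the row being updated, not a ROWID resolved inside the subquery.
UPDATE dexwhs.d_account_veeva t
   SET t.effective_start_dt =
          (SELECT a.prev_dt
             FROM (SELECT LAG (effective_end_dt, 1, effective_start_dt)
                             OVER (PARTITION BY id
                                   ORDER BY effective_start_dt, account_dim_key)
                             AS prev_dt,
                          ac2.ROWID AS rid
                     FROM dexwhs.d_account_veeva ac2) a
            WHERE a.rid = t.ROWID);
```

Even when it runs, this correlated form may recompute the analytic view once per updated row, which would explain the long run times. A single MERGE computes the view only once.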

Update statement using account_dim_key instead (explain plan not as good)

UPDATE DEXWHS.D_ACCOUNT_VEEVA
   SET effective_end_dt =
          (SELECT prev_dt
             FROM (SELECT LAG (
                             effective_end_dt,
                             1,
                             effective_start_dt)
                          OVER (PARTITION BY id
                                ORDER BY effective_start_dt, account_dim_key)
                             AS prev_dt,
                          account_dim_key AS rid
                     FROM dexwhs.d_account_veeva ac2) a
            WHERE a.rid = account_dim_key)

Update with a loop

CREATE OR REPLACE PROCEDURE PREV_UPDT
IS
   CURSOR c1
   IS
        SELECT account_dim_key,
               id,
               active_flag,
               effective_end_dt,
               effective_start_dt,
               created_date,
               last_modified_date,
               (SELECT prev_dt
                  FROM (SELECT LAG (
                                  effective_end_dt,
                                  1,
                                  effective_start_dt)
                               OVER (
                                  PARTITION BY id
                                  ORDER BY effective_start_dt, account_dim_key)
                                  AS prev_dt,
                               account_dim_key AS rid
                          FROM dexwhs.d_account_veeva ac2) a
                 WHERE a.rid = src.account_dim_key)
                   AS prev_dt  -- alias needed so the loop can reference r1.prev_dt
          FROM dexwhs.d_account_veeva src
      ORDER BY id, effective_start_dt, account_dim_key;
   r1   c1%ROWTYPE;
BEGIN
   OPEN c1;

   LOOP
      FETCH c1 INTO r1;

      EXIT WHEN c1%NOTFOUND;
      DBMS_OUTPUT.PUT_LINE ('id=' || r1.id);

      UPDATE dexwhs.D_ACCOUNT_VEEVA trgt
         SET trgt.effective_start_dt = r1.prev_dt,
             trgt.audit_last_update_dt = SYSDATE
       WHERE trgt.account_dim_key = r1.account_dim_key;

      DBMS_OUTPUT.PUT_LINE ('r1.id_found');
   END LOOP;

   CLOSE c1;
END PREV_UPDT;
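For question 2, here is a minimal sketch of the loop rewritten with BULK COLLECT ... LIMIT and FORALL. It computes LAG directly in the cursor instead of in a per-row scalar subquery. It then fetches ROWIDs and new dates in batches and updates each batch with one FORALL statement.

A few points are assumptions rather than requirements:

- The procedure name prev_updt_bulk and the batch size of 10,000 are arbitrary.
- The date collection is anchored to the column with %TYPE, so no particular date type is assumed.

The two collections are PL/SQL nested tables, which is what "arrays" usually means here. BULK COLLECT initializes and extends them automatically.

```sql
-- Hypothetical procedure name; the batch size is an arbitrary assumption.
CREATE OR REPLACE PROCEDURE prev_updt_bulk
IS
   -- Compute the previous version's end date once, in the cursor itself.
   CURSOR c1
   IS
      SELECT ROWID AS rid,
             LAG (effective_end_dt, 1, effective_start_dt)
                OVER (PARTITION BY id
                      ORDER BY effective_start_dt, account_dim_key)
                AS prev_dt
        FROM dexwhs.d_account_veeva;

   -- PL/SQL collections ("arrays"); BULK COLLECT initializes and extends them.
   TYPE t_rid_tab IS TABLE OF ROWID;
   TYPE t_dt_tab IS TABLE OF dexwhs.d_account_veeva.effective_start_dt%TYPE;

   l_rids    t_rid_tab;
   l_dates   t_dt_tab;
   c_limit   CONSTANT PLS_INTEGER := 10000;
BEGIN
   OPEN c1;

   LOOP
      -- Fetch up to c_limit rows at a time into the two collections.
      FETCH c1 BULK COLLECT INTO l_rids, l_dates LIMIT c_limit;
      EXIT WHEN l_rids.COUNT = 0;

      -- One SQL call per batch instead of one UPDATE per row.
      FORALL i IN 1 .. l_rids.COUNT
         UPDATE dexwhs.d_account_veeva
            SET effective_start_dt = l_dates(i),
                audit_last_update_dt = SYSDATE
          WHERE ROWID = l_rids(i);
   END LOOP;

   CLOSE c1;
   -- Commit is left to the caller.
END prev_updt_bulk;
/
```

Call it from an anonymous block (BEGIN prev_updt_bulk; END;) and commit afterwards. FORALL mainly removes the per-row switching between PL/SQL and SQL. A single MERGE is often faster still, because the rows never have to pass through PL/SQL at all.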

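Since account_dim_key is the primary key, try a MERGE. Partition LAG by the natural key id, not by account_dim_key. Partitioning by the unique account_dim_key puts every row in its own partition, so LAG would always return its default (the row's own effective_start_dt).

MERGE INTO dexwhs.d_account_veeva a
USING (
   SELECT  account_dim_key,
           LAG ( effective_end_dt, 1, effective_start_dt)
           OVER (PARTITION BY id
                 ORDER BY effective_start_dt, account_dim_key)
           AS prev_dt
   FROM dexwhs.d_account_veeva
) b
ON (a.account_dim_key  = b.account_dim_key )
WHEN MATCHED THEN UPDATE SET a.effective_start_dt = b.prev_dt;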

The query will inevitably take some time, because it updates the entire table.
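If only part of the table needs fixing, a variant like the sketch below may help. It assumes, hypothetically, that the 7/1/2016 to 7/15/2016 filter applies to effective_start_dt.

- The filter sits outside the inline view, so it is applied after LAG has been computed over the full history. Filtering inside the view would drop the predecessor of the first row in the range.
- The DECODE condition skips rows whose value would not change. DECODE treats two NULLs as equal, unlike <>.

```sql
MERGE INTO dexwhs.d_account_veeva a
USING (
   SELECT account_dim_key, prev_dt
     FROM (SELECT account_dim_key,
                  effective_start_dt,
                  LAG (effective_end_dt, 1, effective_start_dt)
                     OVER (PARTITION BY id
                           ORDER BY effective_start_dt, account_dim_key)
                     AS prev_dt
             FROM dexwhs.d_account_veeva)
    -- Assumed filter column; applied after LAG so predecessors are still visible.
    WHERE effective_start_dt >= DATE '2016-07-01'
      AND effective_start_dt <  DATE '2016-07-16'
) b
ON (a.account_dim_key = b.account_dim_key)
WHEN MATCHED THEN
   UPDATE SET a.effective_start_dt = b.prev_dt
   -- Only touch rows whose value actually changes (NULL-safe comparison).
   WHERE DECODE (a.effective_start_dt, b.prev_dt, 0, 1) = 1;
```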

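You may be able to speed up the LAG ... OVER (PARTITION BY id ORDER BY effective_start_dt, account_dim_key) part with a composite index on the window's partitioning and ordering columns. The window is partitioned by the natural key id because account_dim_key is unique.

CREATE INDEX some_name
ON dexwhs.d_account_veeva(id, effective_start_dt, account_dim_key);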

However, Oracle may ignore this index and prefer a full table scan, because the subquery reads the entire table.
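One way to check whether Oracle actually uses the index is to explain the MERGE's source query with EXPLAIN PLAN and format the result with DBMS_XPLAN.DISPLAY. A plan that shows TABLE ACCESS FULL followed by WINDOW SORT means Oracle is scanning and sorting the table itself rather than reading rows in index order. That outcome is likely here, because effective_end_dt is not in the index.

```sql
-- Explain only the USING query of the MERGE; the plan is written to PLAN_TABLE.
EXPLAIN PLAN FOR
   SELECT account_dim_key,
          LAG (effective_end_dt, 1, effective_start_dt)
             OVER (PARTITION BY id
                   ORDER BY effective_start_dt, account_dim_key)
             AS prev_dt
     FROM dexwhs.d_account_veeva;

-- Format the most recently explained plan from PLAN_TABLE.
SELECT * FROM TABLE (DBMS_XPLAN.DISPLAY);
```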