Why does this SQL query get stuck in an endless loop?

Question

The following PostgreSQL query

UPDATE table_A A 
SET is_active = false 
FROM table_A 
WHERE A.parent_id IS NULL AND A.is_active = true AND A.id = ANY 
(SELECT (B.parent_id) 
FROM table_A B 
INNER JOIN table_B ON table_A.foreign_id = table_B.id 
WHERE table_B.deleted = true);

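gets stuck loading endlessly. I know that correlated sub-queries can take a long time, but a SELECT with the same conditions runs quickly and returns the desired result. My data set is small, and I let the query run for a whole day just to make sure it would not eventually finish given enough time.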

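Table_A uses a hierarchical data structure, and only one specific level of the hierarchy has the foreign key that I can use to join to and check the second table. The idea is:

Find all rows in Table_A whose associated Table_B row has its "deleted" value set to true.
Take the parent_id column from that result set.
For every row in table_A whose id appears in that parent_id column (that is, for all parents), check whether is_active is true and, if it is, set it to false.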

EXPLAIN:

Update on table_A A  (cost=0.00..3906658758867.89 rows=89947680 width=192)
  ->  Nested Loop  (cost=0.00..3906658758867.89 rows=89947680 width=192)
    Join Filter: (SubPlan 1)
    ->  Seq Scan on table_A  (cost=0.00..37899.20 rows=410720 width=14)
    ->  Materialize  (cost=0.00..37901.39 rows=438 width=185)
          ->  Seq Scan on table_A A  (cost=0.00..37899.20 rows=438 width=185)
                Filter: ((parent_id IS NULL) AND is_active)
    SubPlan 1
      ->  Nested Loop  (cost=0.00..42405.74 rows=410720 width=8)
            ->  Seq Scan on table_B  (cost=0.00..399.34 rows=1 width=0)
                  Filter: (deleted AND (table_A.foreign_id = id))
            ->  Seq Scan on table_A B  (cost=0.00..37899.20 rows=410720 width=8) 
JIT:
  Functions: 17
  Options: Inlining true, Optimization true, Expressions true, Deforming true

Answer 1

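I think this fiddle more or less matches your situation.

Two different things could cause this to run super-slow.

First, your update query appears to do a full table scan over roughly 90 million rows. That is a lot of data.

Creating an index on table_A may help speed up finding the qualifying rows in table_A.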

CREATE INDEX "active_parentnull_id" 
          ON table_A USING BTREE
             ("is_active", ("parent_id" IS NULL), "id");

Likewise, creating an index on table_B may help.

CREATE INDEX "deleted_id"
          ON table_B USING BTREE
          ("deleted", "id");

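Second, you may be updating a large number of rows. Because of transactional semantics, an update that touches many rows can take a very long time: the RDBMS does its best to make it look, to other users of your data, as if your update happened instantaneously. Achieving that for many rows takes a lot of I/O and CPU.

So you should try running the update in batches. Restructure your query like this and use a LIMIT clause.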

UPDATE table_A
   SET is_active = false
 WHERE id IN (
       -- pick at most 1000 active root rows per run
       SELECT DISTINCT id
         FROM table_A A
        WHERE A.parent_id IS NULL
          AND A.is_active = true
          AND A.id = ANY (
             SELECT (B.parent_id)
               FROM table_A B
               -- join on the alias B so the sub-query is not correlated with the outer table_A
               INNER JOIN table_B ON B.foreign_id = table_B.id
               WHERE table_B.deleted = true)
         LIMIT 1000);

Then run the query repeatedly until it updates no rows. That may take a while, but it should take less time than trying to do everything at once.
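The repeat-until-nothing-changes step could be scripted rather than run by hand. The following is only a sketch, assuming PostgreSQL 11 or later (where COMMIT is allowed inside a DO block) and assuming the block is not run inside an outer transaction. The variable name rows_updated and the aliases p, c and b are made up for illustration.

```sql
DO $$
DECLARE
    rows_updated integer;  -- hypothetical helper variable
BEGIN
    LOOP
        -- deactivate at most 1000 qualifying root rows per pass
        UPDATE table_A
           SET is_active = false
         WHERE id IN (
               SELECT p.id
                 FROM table_A AS p
                WHERE p.parent_id IS NULL
                  AND p.is_active = true
                  AND p.id IN (
                      SELECT c.parent_id
                        FROM table_A AS c
                        JOIN table_B AS b ON c.foreign_id = b.id
                       WHERE b.deleted = true)
                LIMIT 1000);

        -- number of rows changed by the UPDATE above
        GET DIAGNOSTICS rows_updated = ROW_COUNT;
        EXIT WHEN rows_updated = 0;

        -- end the current transaction so each batch is committed on its own
        COMMIT;
    END LOOP;
END
$$;
```

Rows that have been updated no longer satisfy is_active = true, so each pass picks up a fresh batch and the loop stops once nothing is left to update.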

Answer 2

Sometimes using aliases correctly makes all the difference.

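Compare the following two query plans.
The first is for the original query, run against sample data.
cost=0.00..314144.00 rows=4975: an estimate of 4975 rows to update in a table with fewer than 10 rows?

The second is for a slightly modified version of the first.
cost=92.26..122.20 rows=2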

EXPLAIN
UPDATE table_A A SET is_active = false 
FROM table_A 
WHERE A.parent_id IS NULL 
  AND A.is_active = true 
  AND A.id = ANY (
    SELECT (B.parent_id) 
    FROM table_A B 
    INNER JOIN table_B ON table_A.foreign_id = table_B.id 
    WHERE table_B.deleted = true
);

| QUERY PLAN                                                                                     |
| :--------------------------------------------------------------------------------------------- |
| Update on table_a a  (cost=0.00..314144.00 rows=4975 width=25)                                 |
|   ->  Nested Loop  (cost=0.00..314144.00 rows=4975 width=25)                                   |
|         Join Filter: (SubPlan 1)                                                               |
|         ->  Seq Scan on table_a  (cost=0.00..29.90 rows=1990 width=10)                         |
|         ->  Materialize  (cost=0.00..29.93 rows=5 width=18)                                    |
|               ->  Seq Scan on table_a a  (cost=0.00..29.90 rows=5 width=18)                    |
|                     Filter: ((parent_id IS NULL) AND is_active)                                |
|         SubPlan 1                                                                              |
|           ->  Nested Loop  (cost=0.15..57.97 rows=1990 width=4)                                |
|                 ->  Index Scan using table_b_pkey on table_b  (cost=0.15..8.17 rows=1 width=0) |
|                       Index Cond: (table_a.foreign_id = id)                                    |
|                       Filter: deleted                                                          |
|                 ->  Seq Scan on table_a b  (cost=0.00..29.90 rows=1990 width=4)                |

EXPLAIN 
UPDATE table_A SET is_active = false 
WHERE parent_id IS NULL 
  AND is_active = true 
  AND id = ANY (
    SELECT a2.parent_id
    FROM table_A a2 
    JOIN table_B b ON a2.foreign_id = b.id 
    WHERE b.deleted = true
  );

| QUERY PLAN                                                                                       |
| :----------------------------------------------------------------------------------------------- |
| Update on table_a  (cost=92.26..122.20 rows=2 width=31)                                          |
|   ->  Hash Join  (cost=92.26..122.20 rows=2 width=31)                                            |
|         Hash Cond: (table_a.id = a2.parent_id)                                                   |
|         ->  Seq Scan on table_a  (cost=0.00..29.90 rows=5 width=18)                              |
|               Filter: ((parent_id IS NULL) AND is_active)                                        |
|         ->  Hash  (cost=89.76..89.76 rows=200 width=16)                                          |
|               ->  HashAggregate  (cost=87.76..89.76 rows=200 width=16)                           |
|                     Group Key: a2.parent_id                                                      |
|                     ->  Hash Join  (cost=50.14..85.27 rows=995 width=16)                         |
|                           Hash Cond: (a2.foreign_id = b.id)                                      |
|                           ->  Seq Scan on table_a a2  (cost=0.00..29.90 rows=1990 width=14)      |
|                           ->  Hash  (cost=34.70..34.70 rows=1235 width=10)                       |
|                                 ->  Seq Scan on table_b b  (cost=0.00..34.70 rows=1235 width=10) |
|                                       Filter: deleted                                            |

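The second query differs only in how it uses aliases. It drops the extra FROM table_A, and its sub-query joins on a2.foreign_id instead of the outer table_A.foreign_id, so the sub-query is no longer correlated and is evaluated once rather than as a join filter for every pair of rows.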
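One way to see what the original form asks of the planner: in UPDATE table_A A ... FROM table_A, the unaliased table_A is a second, independent copy of the table, and nothing joins it to A. The count below is a sketch of how many row pairs that implied cross join produces before the sub-query filter is applied; the column alias candidate_pairs is made up for illustration.

```sql
-- Every active root row in A is paired with every row of the second table_A.
-- The correlated sub-query then has to be evaluated for each of these pairs.
SELECT count(*) AS candidate_pairs
  FROM table_A AS A, table_A
 WHERE A.parent_id IS NULL
   AND A.is_active = true;
```

The result should equal the number of active root rows multiplied by the total number of rows in table_A, which matches the shape of the Nested Loop with Join Filter: (SubPlan 1) in the first plan.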

The update statement can also be written as a join on a sub-query.

UPDATE table_A AS parent
   SET is_active = false 
FROM (
   SELECT child.parent_id
   FROM table_A AS child 
   JOIN table_B AS dream
     ON child.foreign_id = dream.id
   WHERE child.parent_id IS NOT NULL
     AND dream.deleted = true
   GROUP BY child.parent_id
) dreamless 
WHERE parent.id = dreamless.parent_id
  AND parent.parent_id IS NULL
  AND parent.is_active = true;

1 rows affected

SELECT * FROM table_A

id | parent_id | is_active | foreign_id
-: | --------: | :-------- | ---------:
 2 |         1 | t         |          2
 3 |         1 | t         |          5
 4 |      null | t         |          3
 5 |         3 | t         |          4
 1 |      null | f         |          1

Tested on db<>fiddle here
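The fiddle's setup script is not reproduced in this answer. For anyone who wants to try the queries locally, the following sketch builds a comparable data set. The table_A rows mirror the result shown above (with row 1 still active), while the column types, the constraints and all table_B contents are assumptions chosen so that exactly one parent qualifies.

```sql
-- Assumed schema: types and constraints are guesses, not taken from the question.
CREATE TABLE table_B (
    id      integer PRIMARY KEY,
    deleted boolean NOT NULL DEFAULT false
);

CREATE TABLE table_A (
    id         integer PRIMARY KEY,
    parent_id  integer REFERENCES table_A (id),
    is_active  boolean NOT NULL DEFAULT true,
    foreign_id integer REFERENCES table_B (id)
);

-- Hypothetical table_B rows: only id 2 is marked as deleted.
INSERT INTO table_B (id, deleted) VALUES
    (1, false), (2, true), (3, false), (4, false), (5, false);

-- table_A rows as in the result above, before the update.
INSERT INTO table_A (id, parent_id, is_active, foreign_id) VALUES
    (1, NULL, true, 1),
    (2, 1,    true, 2),
    (3, 1,    true, 5),
    (4, NULL, true, 3),
    (5, 3,    true, 4);

-- The alias-corrected update; RETURNING lists the rows that were changed.
UPDATE table_A
   SET is_active = false
 WHERE parent_id IS NULL
   AND is_active = true
   AND id = ANY (
       SELECT a2.parent_id
         FROM table_A a2
         JOIN table_B b ON a2.foreign_id = b.id
        WHERE b.deleted = true)
RETURNING id;
```

With this assumed data, row 2 points at the deleted table_B row, so its parent (id 1) is the only root that gets deactivated.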

Tags: sql, postgresql, query-optimization