PostgreSQL ALTER TABLE freezes the DB (CPU is heavily loaded by some SELECTs)

PostgreSQL 9.6

  1. I run:

    ALTER TABLE "Users"
       ADD COLUMN "someField" BOOLEAN NULL DEFAULT NULL;
    
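  2. The DB freezes for more than 10 minutes.

  3. The "Users" table has only about 5,000 rows, but the DB is large (150+ GB).

  4. Immediately, 20+ SELECT queries against the "Users" table show up (checked with):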

    SELECT query FROM pg_stat_activity
    WHERE state = 'active' and query LIKE 'SELECT%'
    

    (There were no such queries before.)

  5. These SELECT queries use up all the CPU.

  6. I tried restarting the DB and running VACUUM ANALYZE on the "Users" table:

INFO:  vacuuming "public.Users"
INFO:  index "Users_pkey" now contains 5556 row versions in 86 pages
DETAIL:  0 index row versions were removed.
1 index pages have been deleted, 1 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.12 sec.
INFO:  index "users_subscription_canceled" now contains 5028 row versions in 508 pages
DETAIL:  0 index row versions were removed.
8 index pages have been deleted, 8 are currently reusable.
CPU 0.00s/0.00u sec elapsed 1.54 sec.
INFO:  index "users_shopper_id" now contains 5556 row versions in 205 pages
DETAIL:  0 index row versions were removed.
1 index pages have been deleted, 1 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.64 sec.
INFO:  index "users_referrer" now contains 5556 row versions in 586 pages
DETAIL:  0 index row versions were removed.
4 index pages have been deleted, 4 are currently reusable.
CPU 0.00s/0.00u sec elapsed 1.80 sec.
INFO:  index "users_referral_code" now contains 5556 row versions in 84 pages
DETAIL:  0 index row versions were removed.
2 index pages have been deleted, 2 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.27 sec.
INFO:  "Users": found 0 removable, 2137 nonremovable row versions in 803 out of 2780 pages
DETAIL:  445 dead row versions cannot be removed yet.
There were 25647 unused item pointers.
Skipped 0 pages due to buffer pins.
0 pages are entirely empty.
CPU 0.00s/0.00u sec elapsed 6.91 sec.
INFO:  "Users": stopping truncate due to conflicting lock request
INFO:  vacuuming "pg_toast.pg_toast_26460"
INFO:  index "pg_toast_26460_index" now contains 0 row versions in 1 pages
DETAIL:  0 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  "pg_toast_26460": found 0 removable, 0 nonremovable row versions in 0 out of 0 pages
DETAIL:  0 dead row versions cannot be removed yet.
There were 0 unused item pointers.
Skipped 0 pages due to buffer pins.
0 pages are entirely empty.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  analyzing "public.Users"
INFO:  "Users": scanned 2780 of 2780 pages, containing 5539 live rows and 459 dead rows; 5539 rows in sample, 5539 estimated total rows
VACUUM
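  7. If I run DROP COLUMN or RENAME COLUMN instead, the same points 4 and 5 happen, and the DB freezes.

Questions:

  1. Is this some kind of DB bug? Adding a nullable column should be very fast.

  2. Any suggestions are very welcome; I'm tired of googling and debugging :)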

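This is normal behavior; the problem is your workload.

You are right that ALTER TABLE ... ADD COLUMN is a very fast operation. That is not the problem. The problem is that such an ALTER TABLE needs a short ACCESS EXCLUSIVE lock on the table, because it modifies the table structure.

Such an ACCESS EXCLUSIVE lock conflicts with the ACCESS SHARE lock that a SELECT statement takes on the table. That is on purpose: how should a SELECT statement behave if the table changes while it is running?

Now the problem is that either your queries take a long time, or somebody forgot to close a transaction that holds a lock on the table.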

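You can check this with

    -- the inner double quotes keep the mixed-case table name from being folded to lower case
    SELECT pid, a.state, a.xact_start
    FROM pg_locks AS l
       JOIN pg_stat_activity AS a USING (pid)
    WHERE l.relation = '"Users"'::regclass;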

This will show you all transactions that hold a lock on the table, and when they started.
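To see who is waiting on whom rather than just who holds a lock, something like the following may help. It is only a sketch: it assumes PostgreSQL 9.6 or later, where `pg_blocking_pids()` and the `wait_event` columns of `pg_stat_activity` are available, and the column aliases are made up for illustration.

```sql
-- List every session that is currently blocked, together with the
-- process IDs of the sessions that block it (PostgreSQL 9.6+).
SELECT a.pid,
       pg_blocking_pids(a.pid)  AS blocked_by,    -- illustrative alias
       a.wait_event_type,
       a.wait_event,
       now() - a.query_start    AS running_for,   -- illustrative alias
       a.query
FROM pg_stat_activity AS a
WHERE cardinality(pg_blocking_pids(a.pid)) > 0
ORDER BY a.query_start;
```

If the ALTER TABLE shows up here, the PIDs in `blocked_by` point at the transactions that have to finish first.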

Now your ALTER TABLE has to wait until all these transactions are finished, and all the short SELECT statements issued later have to queue behind the ALTER TABLE.
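The queueing can be reproduced by hand with three psql sessions. This is a minimal sketch of the scenario described above, reusing the table and column from the question; the `count(*)` queries merely stand in for whatever the application runs.

```sql
-- Session 1: a transaction that is left open keeps its ACCESS SHARE lock
BEGIN;
SELECT count(*) FROM "Users";
-- (no COMMIT yet)

-- Session 2: needs ACCESS EXCLUSIVE, so it waits for session 1
ALTER TABLE "Users" ADD COLUMN "someField" BOOLEAN NULL DEFAULT NULL;

-- Session 3: a short query that now queues behind the waiting ALTER TABLE
SELECT count(*) FROM "Users";

-- Session 1: ending the transaction lets session 2, then session 3, proceed
COMMIT;
```

Until session 1 commits, sessions 2 and 3 both hang, even though neither statement is slow by itself.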

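As soon as the ALTER TABLE gets the lock it needs, it finishes quickly and releases its lock on the table. Now all the other queued statements are let loose at the same time and cause high load on your machine.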
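One way to keep the pile-up short, not mentioned above and offered only as a sketch, is to cap how long the ALTER TABLE itself may wait for its lock. The `lock_timeout` setting is a standard PostgreSQL parameter; the two-second value is an arbitrary assumption.

```sql
BEGIN;
-- Give up if the lock cannot be obtained within 2 seconds (arbitrary value);
-- SET LOCAL limits the setting to this transaction.
SET LOCAL lock_timeout = '2s';
ALTER TABLE "Users" ADD COLUMN "someField" BOOLEAN NULL DEFAULT NULL;
COMMIT;
```

If the timeout fires, the ALTER TABLE fails with an error, the transaction has to be rolled back, and the statement can be retried later. The SELECT statements queued behind it are released after at most two seconds instead of ten minutes.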

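The solution has two parts:

  1. Fix the application so that it closes its transactions promptly.

  2. Use a connection pool to keep max_connections as low as possible. That limits the number of statements that can be blocked, so there is less danger of overloading the machine.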
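For the first part, it can help to find the sessions that sit in an open transaction without doing anything. The following is a sketch assuming PostgreSQL 9.6 or later; the five-minute limit is an arbitrary example value, and `ALTER SYSTEM` requires superuser rights.

```sql
-- Sessions that opened a transaction and then went idle, oldest first
SELECT pid, usename, application_name, xact_start, state_change, query
FROM pg_stat_activity
WHERE state = 'idle in transaction'
ORDER BY xact_start;

-- Optional safety net (9.6+): terminate sessions that stay idle in a
-- transaction for more than 5 minutes (arbitrary value), then reload
ALTER SYSTEM SET idle_in_transaction_session_timeout = '5min';
SELECT pg_reload_conf();
```

The `query` column shows the last statement each of these sessions ran, which usually points to the code path that forgets to commit or roll back.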