PostgreSQL ALTER TABLE freezes the database (CPU is saturated by some SELECT queries)
PostgreSQL 9.6
I run
ALTER TABLE "Users"
ADD COLUMN "someField" BOOLEAN NULL DEFAULT NULL;
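and the database freezes for more than 10 minutes.
The "Users" table has only about 5000 rows, but the database is large (150+ GB).
Immediately, more than 20 SELECT queries against the "Users" table appear, as checked with: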
SELECT query FROM pg_stat_activity
WHERE state = 'active' and query LIKE 'SELECT%'
(There were no such queries before.)
These SELECT queries take all the CPU.
I tried restarting the database and running VACUUM ANALYZE on the "Users" table:
INFO: vacuuming "public.Users"
INFO: index "Users_pkey" now contains 5556 row versions in 86 pages
DETAIL: 0 index row versions were removed.
1 index pages have been deleted, 1 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.12 sec.
INFO: index "users_subscription_canceled" now contains 5028 row versions in 508 pages
DETAIL: 0 index row versions were removed.
8 index pages have been deleted, 8 are currently reusable.
CPU 0.00s/0.00u sec elapsed 1.54 sec.
INFO: index "users_shopper_id" now contains 5556 row versions in 205 pages
DETAIL: 0 index row versions were removed.
1 index pages have been deleted, 1 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.64 sec.
INFO: index "users_referrer" now contains 5556 row versions in 586 pages
DETAIL: 0 index row versions were removed.
4 index pages have been deleted, 4 are currently reusable.
CPU 0.00s/0.00u sec elapsed 1.80 sec.
INFO: index "users_referral_code" now contains 5556 row versions in 84 pages
DETAIL: 0 index row versions were removed.
2 index pages have been deleted, 2 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.27 sec.
INFO: "Users": found 0 removable, 2137 nonremovable row versions in 803 out of 2780 pages
DETAIL: 445 dead row versions cannot be removed yet.
There were 25647 unused item pointers.
Skipped 0 pages due to buffer pins.
0 pages are entirely empty.
CPU 0.00s/0.00u sec elapsed 6.91 sec.
INFO: "Users": stopping truncate due to conflicting lock request
INFO: vacuuming "pg_toast.pg_toast_26460"
INFO: index "pg_toast_26460_index" now contains 0 row versions in 1 pages
DETAIL: 0 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO: "pg_toast_26460": found 0 removable, 0 nonremovable row versions in 0 out of 0 pages
DETAIL: 0 dead row versions cannot be removed yet.
There were 0 unused item pointers.
Skipped 0 pages due to buffer pins.
0 pages are entirely empty.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO: analyzing "public.Users"
INFO: "Users": scanned 2780 of 2780 pages, containing 5539 live rows and 459 dead rows; 5539 rows in sample, 5539 estimated total rows
VACUUM
- If I run DROP COLUMN or RENAME COLUMN instead, the same thing happens: the SELECT queries pile up, they take all the CPU, and the database freezes.
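Question:
Is this some kind of bug in the database? Adding a nullable column should be very fast.
Any suggestions are very welcome; I am tired of googling and debugging :)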
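This is normal; the problem is your workload.
You are right that ALTER TABLE ... ADD COLUMN is a very fast operation. That is not the problem. The problem is that such an ALTER TABLE requires a short ACCESS EXCLUSIVE lock on the table, because it modifies the table structure.
Such an ACCESS EXCLUSIVE lock is incompatible with the ACCESS SHARE lock that a SELECT statement takes on the table. That is on purpose: how should a SELECT statement behave if the table changes while it is running?
The problem is that you have queries that take a long time, or somebody forgot to close a transaction that locked the table.
You can check this with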
SELECT pid, a.state, a.xact_start
FROM pg_locks AS l
JOIN pg_stat_activity AS a USING (pid)
WHERE l.relation = '"Users"'::regclass;  -- inner double quotes keep the mixed-case table name
This shows all transactions that hold or are waiting for a lock on the table, and when they started.
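To see which sessions are waiting and which sessions they are waiting for, a query along the following lines may help. It is only a sketch: it relies on pg_blocking_pids(), which is available from PostgreSQL 9.6 on, and it reports every blocked session in the cluster rather than only those touching "Users".

```sql
-- List every session that is currently blocked, together with the
-- process IDs of the sessions that hold (or queue ahead for) the lock.
SELECT a.pid,
       a.state,
       a.xact_start,
       pg_blocking_pids(a.pid) AS blocked_by,
       a.query
FROM pg_stat_activity AS a
WHERE cardinality(pg_blocking_pids(a.pid)) > 0
ORDER BY a.xact_start;
```

The pids in blocked_by can then be looked up in pg_stat_activity to find the transaction at the head of the queue.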
Your ALTER TABLE has to wait until all of these transactions have finished, and all the short SELECT statements issued later have to queue behind the ALTER TABLE.
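The queueing can be reproduced by hand with three psql sessions. This is an illustrative sketch against the "Users" table from the question, not something to run on a production system.

```sql
-- Session 1: open a transaction and read the table.
-- The ACCESS SHARE lock is held until the transaction ends.
BEGIN;
SELECT count(*) FROM "Users";
-- (leave the transaction open)

-- Session 2: needs ACCESS EXCLUSIVE, so it waits for session 1.
ALTER TABLE "Users" ADD COLUMN "someField" BOOLEAN NULL DEFAULT NULL;

-- Session 3: a plain SELECT now queues behind the waiting ALTER TABLE.
SELECT count(*) FROM "Users";

-- Session 1: ending the transaction lets session 2 proceed,
-- after which session 3 is released as well.
COMMIT;
```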
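Once the ALTER TABLE gets the lock it needs, it finishes quickly and releases the lock on the table. Then all the queued statements are let loose at the same time and cause high load on your machine.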
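One additional mitigation, not part of the advice in this answer, is to keep the ALTER TABLE from waiting indefinitely by setting lock_timeout, so that it gives up instead of holding up the SELECT statements behind it. A minimal sketch, with an arbitrarily chosen 2 second limit:

```sql
BEGIN;
-- Give up if the ACCESS EXCLUSIVE lock cannot be obtained within 2 seconds.
-- SET LOCAL limits the setting to this transaction.
SET LOCAL lock_timeout = '2s';
ALTER TABLE "Users" ADD COLUMN "someField" BOOLEAN NULL DEFAULT NULL;
COMMIT;
```

If the timeout is hit, the ALTER TABLE fails with an error and the transaction has to be rolled back and retried later; the queued SELECT statements can proceed in the meantime.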
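The solution has two parts:
- Fix the application so that it closes transactions immediately.
- Use a connection pool so that you can reduce max_connections as much as possible. That limits the number of statements that can be blocked, and the danger of overloading the machine is smaller.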
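To find the transactions that the application leaves open, and as a safety net while it is being fixed, something like the following could be used. It is a sketch: the role name app_user and the 5 minute limit are assumptions for illustration, and idle_in_transaction_session_timeout requires PostgreSQL 9.6 or later.

```sql
-- Sessions that have a transaction open but are not running anything.
SELECT pid, usename, xact_start, state_change, query
FROM pg_stat_activity
WHERE state = 'idle in transaction'
ORDER BY xact_start;

-- Safety net: terminate sessions of the (hypothetical) role app_user
-- that stay idle inside a transaction for more than 5 minutes.
-- Takes effect for new sessions of that role.
ALTER ROLE app_user SET idle_in_transaction_session_timeout = '5min';
```

The timeout only limits the damage; the application still needs to commit or roll back promptly.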