PostgreSQL poorly planned query runs too long
I have a complex query, greatly simplified below, running on "PostgreSQL 11.9 on aarch64-unknown-linux-gnu, compiled by aarch64-unknown-linux-gnu-gcc (GCC) 7.4.0, 64-bit", on an AWS Aurora Serverless 2xlarge server (8 cores, 64GB RAM).
I have the following...
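- mv_journey, a materialized view with about 550M rows. It contains information about journeys that have an origin and a destination, plus some measures about them (how long the journey took, etc.). The columns from_id and from_region identify the origin, and to_id and to_region identify the destination.
- place_from and place_to, which are calculated by a function, fn_location_get, in an initial step of a CTE. They contain id and region (which map to from_id, from_region and to_id, to_region, respectively). They also contain rollup levels for the region, e.g. country, continent. Typically these return between ~100 and 20,000 rows.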
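Later in that CTE, I use place_from and place_to to filter the 550M mv_journey rows, and use group by to create a rollup report based on journeys, e.g. from country to country.
The simplified query is something like this.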
WITH place_from AS (
    select *
    from fn_location_get(...)
), place_to AS (
    select *
    from fn_location_get(...)
)
select [many dimension columns...]
     , [a few aggregated measure columns]
from mv_journey j
inner join place_from o on j.from_id = o.id
                       and j.from_region = o.region
-- the destination join uses the to_* columns, as the Merge Cond in the plans shows
inner join place_to d on j.to_id = d.id
                     and j.to_region = d.region
where service_type_id = ?
group by [many dimension columns...]
I have indexes on mv_journey:
CREATE INDEX idx_mv_journey_from ON mv_journey (from_id, from_region);
CREATE INDEX idx_mv_journey_to ON mv_journey (to_id, to_region);
When I run the query (using SET LOCAL work_mem = '2048MB' so that the sorts run as in-memory quicksorts) with a small number of rows in place_from (92) and a large number in place_to (~18,000), it runs in about 25 seconds with the following query plan (which includes the CTE steps that generate place_from and place_to).
"GroupAggregate (cost=530108.64..530129.64 rows=30 width=686) (actual time=13097.187..25408.707 rows=92 loops=1)"
"  Group Key: [many dimension columns...]"
"  CTE place_from"
"    -> Function Scan on fn_location_get (cost=0.25..10.25 rows=1000 width=396) (actual time=34.275..34.331 rows=92 loops=1)"
"  CTE place_to"
"    -> Function Scan on fn_location_get (cost=0.25..10.25 rows=1000 width=396) (actual time=96.287..97.428 rows=18085 loops=1)"
"  -> Sort (cost=530088.14..530088.22 rows=30 width=622) (actual time=12935.329..13295.468 rows=1871349 loops=1)"
"        Sort Key: [many dimension columns...]"
"        Sort Method: quicksort Memory: 826782kB"
"        -> Merge Join (cost=529643.68..530087.41 rows=30 width=622) (actual time=4708.780..6021.449 rows=1871349 loops=1)"
"              Merge Cond: ((j.to_id = d.id) AND (j.to_region = d.region))"
"              -> Sort (cost=529573.85..529719.16 rows=58124 width=340) (actual time=4583.265..4788.625 rows=1878801 loops=1)"
"                    Sort Key: j.to_id, j.to_region"
"                    Sort Method: quicksort Memory: 623260kB"
"                    -> Nested Loop (cost=0.57..524974.25 rows=58124 width=340) (actual time=34.324..3079.815 rows=1878801 loops=1)"
"                          -> CTE Scan on place_from o (cost=0.00..20.00 rows=1000 width=320) (actual time=34.277..34.432 rows=92 loops=1)"
"                          -> Index Scan using idx_mv_journey_from on mv_journey j (cost=0.57..524.37 rows=58 width=60) (actual time=0.018..30.022 rows=20422 loops=92)"
"                                Index Cond: ((from_id = o.id) AND (from_region = o.region))"
"                                Filter: (service_type_id = 'ALL'::text)"
"                                Rows Removed by Filter: 81687"
"              -> Sort (cost=69.83..72.33 rows=1000 width=320) (actual time=125.505..223.780 rows=1871350 loops=1)"
"                    Sort Key: d.id, d.region"
"                    Sort Method: quicksort Memory: 3329kB"
"                    -> CTE Scan on place_to d (cost=0.00..20.00 rows=1000 width=320) (actual time=96.292..103.677 rows=18085 loops=1)"
"Planning Time: 0.546 ms"
"Execution Time: 25501.827 ms"
The problem is when I swap the locations in from/to, i.e. a large number of rows in place_from (~18,000) and a small number in place_to (92): the query takes forever. By the way, mv_journey is expected to have the same number of matching rows in both cases; I don't expect more records in one direction than the other.
I have not once gotten this second query to complete. It runs for hours until PGAdmin 4 loses its connection to the server, so I cannot even run EXPLAIN ANALYZE on it. However, I do have the EXPLAIN:
"GroupAggregate (cost=474135.40..474152.90 rows=25 width=686)"
"  Group Key: [many dimension columns...]"
"  CTE place_from"
"    -> Function Scan on fn_location_get (cost=0.25..10.25 rows=1000 width=396)"
"  CTE place_to"
"    -> Function Scan on fn_location_get (cost=0.25..10.25 rows=1000 width=396)"
"  -> Sort (cost=474114.90..474114.96 rows=25 width=622)"
"        Sort Key: [many dimension columns...]"
"        -> Merge Join (cost=473720.23..474114.31 rows=25 width=622)"
"              Merge Cond: ((j.to_id = d.id) AND (j.to_region = d.region))"
"              -> Sort (cost=473650.40..473779.18 rows=51511 width=340)"
"                    Sort Key: j.to_id, j.to_region"
"                    -> Nested Loop (cost=0.57..469619.00 rows=51511 width=340)"
"                          -> CTE Scan on place_from o (cost=0.00..20.00 rows=1000 width=320)"
"                          -> Index Scan using idx_mv_journey_from on mv_journey j (cost=0.57..469.08 rows=52 width=60)"
"                                Index Cond: ((from_id = o.id) AND (from_region = o.region))"
"                                Filter: (service_type_id = 'ALL'::text)"
"              -> Sort (cost=69.83..72.33 rows=1000 width=320)"
"                    Sort Key: d.id, d.region"
"                    -> CTE Scan on place_to d (cost=0.00..20.00 rows=1000 width=320)"
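My assumption was that if I had an equivalent index on both sides of the from/to, then Postgres would use the mirror-opposite query plan, doing a merge join for the origin and a nested loop join using idx_mv_journey_to for the destination.
But it looks like the query planner's row count estimates are way off in both queries. It seems to be only luck that the first query performs so well despite that.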
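To put a rough number on how far off the estimates are, the figures from the first plan can be compared directly. This is only a back-of-the-envelope sketch: the numbers are copied from the EXPLAIN ANALYZE output above, and the column aliases are made up for illustration.

```sql
-- Figures copied from the Nested Loop, CTE Scan and Index Scan nodes of the first plan.
SELECT
    58                       AS est_rows_per_probe,      -- Index Scan estimate: rows=58
    20422                    AS actual_rows_per_probe,   -- Index Scan actual: rows=20422 (average over loops=92)
    round(20422 / 58.0)      AS per_probe_error_factor,  -- how badly each index probe is underestimated
    1000                     AS est_outer_rows,          -- CTE Scan estimate: rows=1000
    92                       AS actual_outer_rows,       -- CTE Scan actual: rows=92
    round(1878801 / 58124.0) AS nested_loop_error_factor -- actual vs. estimated Nested Loop output
;
```

The overestimate of the CTE row count (1000 instead of 92) partly cancels the much larger underestimate per index probe, which fits the impression that the first plan is fast by luck rather than by design.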
I've tried the following, none of which worked:
- swapping the inner join statements so the destination join comes first
- ALTER TABLE mv_journey ALTER COLUMN to_id SET STATISTICS 1000; ANALYZE mv_journey
- ALTER TABLE mv_journey ALTER COLUMN from_id SET STATISTICS 1000; ANALYZE mv_journey
I guess the plan is made before the CTE execution starts? And that's why it has no idea what will come out of the fn_location_get calls that create the place_from and place_to sets?
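The rows=1000 shown on both Function Scan nodes is PostgreSQL's default estimate for a set-returning function that has no ROWS clause; the planner does not execute the function while planning. The sketch below illustrates this with a hypothetical stand-in function, demo_location_get (not the real fn_location_get, whose signature is not shown here). It is written in PL/pgSQL so that the planner cannot inline it.

```sql
-- Hypothetical stand-in for fn_location_get; name, arguments and body are illustrative only.
CREATE OR REPLACE FUNCTION demo_location_get(n integer)
RETURNS TABLE (id integer, region text)
LANGUAGE plpgsql STABLE
AS $$
BEGIN
    RETURN QUERY
    SELECT g, 'region_' || (g % 10)::text
    FROM generate_series(1, n) AS g;
END;
$$;

-- Both plans show rows=1000 on the Function Scan, whatever the argument is,
-- because the function is treated as a black box at planning time.
EXPLAIN SELECT * FROM demo_location_get(92);
EXPLAIN SELECT * FROM demo_location_get(18000);

-- A static hint can replace the default, but it is one fixed number per function,
-- so it cannot tell a 92-row call apart from an 18,000-row call.
ALTER FUNCTION demo_location_get(integer) ROWS 20000;
EXPLAIN SELECT * FROM demo_location_get(92);
```

Because the ROWS hint is fixed per function, it would presumably not help a query that calls the same function twice with very different result sizes, which is the situation described here.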
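fn_location_get is a complicated function with its own recursive CTE, and I don't want to pull its logic out of the function and into this CTE.
What's the best way out of this mess?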
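The most straightforward approach would be to create two temp tables as the result of the function calls, manually ANALYZE them, then run the query against the temp tables rather than the function calls.
I sort of worked out an answer while writing the question... don't use a CTE, use temp tables instead.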
DROP TABLE IF EXISTS place_from;
CREATE TEMP TABLE place_from AS
select *
from fn_location_get(...);

DROP TABLE IF EXISTS place_to;
CREATE TEMP TABLE place_to AS
select *
from fn_location_get(...);

select [many dimension columns...]
     , [a few aggregated measure columns]
from mv_journey j
inner join place_from o on j.from_id = o.id
                       and j.from_region = o.region
-- the destination join uses the to_* columns, as the Index Cond in the plan shows
inner join place_to d on j.to_id = d.id
                     and j.to_region = d.region
where service_type_id = ?
group by [many dimension columns...]
I assume this works because by the time the query plan for the reporting select is made, the temp tables' row counts are known, so a better query plan can be chosen.
However, the row estimates are still inaccurate: good enough to choose the right plan, but inaccurate.
"GroupAggregate (cost=200682.98..200706.78 rows=34 width=686) (actual time=21233.486..33200.052 rows=92 loops=1)"
"  Group Key: [many dimension columns...]"
"  -> Sort (cost=200682.98..200683.07 rows=34 width=622) (actual time=21077.807..21443.739 rows=1802571 loops=1)"
"        Sort Key: [many dimension columns...]"
"        Sort Method: quicksort Memory: 800480kB"
"        -> Merge Join (cost=200555.00..200682.12 rows=34 width=622) (actual time=4820.798..6106.722 rows=1802571 loops=1)"
"              Merge Cond: ((from_id = o.id) AND (from_region = o.region))"
"              -> Sort (cost=199652.79..199677.24 rows=9779 width=340) (actual time=4794.354..5003.954 rows=1810023 loops=1)"
"                    Sort Key: j.from_id, j.from_region"
"                    Sort Method: quicksort Memory: 603741kB"
"                    -> Nested Loop (cost=0.57..199004.67 rows=9779 width=340) (actual time=0.044..3498.767 rows=1810023 loops=1)"
"                          -> Seq Scan on place_to d (cost=0.00..11.90 rows=190 width=320) (actual time=0.006..0.078 rows=92 loops=1)"
"                          -> Index Scan using idx_mv_journey_to on mv_journey j (cost=0.57..1046.82 rows=51 width=60) (actual time=0.020..35.055 rows=19674 loops=92)"
"                                Index Cond: ((j.to_id = d.id) AND (j.to_region = d.region))"
"                                Filter: (service_type_id = 'ALL'::text)"
"                                Rows Removed by Filter: 78697"
"              -> Sort (cost=902.20..920.02 rows=7125 width=320) (actual time=26.434..121.106 rows=1802572 loops=1)"
"                    Sort Key: o.id, o.region"
"                    Sort Method: quicksort Memory: 3329kB"
"                    -> Seq Scan on place_from o (cost=0.00..446.25 rows=7125 width=320) (actual time=0.016..4.205 rows=18085 loops=1)"
"Planning Time: 0.792 ms"
"Execution Time: 33286.461 ms"
UPDATE: After adding a manual ANALYZE after each CREATE, as suggested by jjanes, the estimates are now as expected.
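The effect of the manual ANALYZE can be reproduced on a toy table. The sketch below uses a hypothetical temp table, demo_place_from, filled from generate_series rather than from fn_location_get(...), purely so that it is self-contained. Autovacuum never processes temporary tables, so a freshly created temp table has no statistics until the session analyzes it; until then the planner guesses the row count from the table's physical size.

```sql
BEGIN;

-- demo_place_from is a hypothetical stand-in for the output of fn_location_get(...).
CREATE TEMP TABLE demo_place_from ON COMMIT DROP AS
SELECT g AS id,
       'region_' || (g % 10)::text AS region
FROM generate_series(1, 18085) AS g;

-- No statistics yet: the row estimate here is a guess based on the table's size on disk.
EXPLAIN SELECT * FROM demo_place_from;

-- Temporary tables are invisible to autovacuum, so collect statistics by hand.
ANALYZE demo_place_from;

-- With statistics in place, the estimate should track the real row count closely.
EXPLAIN SELECT * FROM demo_place_from;

COMMIT;
```

In the real workflow the equivalent step would be an ANALYZE place_from; and ANALYZE place_to; placed right after the two CREATE TEMP TABLE statements and before the reporting select.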