Bitmap Index Scan always Followed by Bitmap Heap Scan for JSON field query
I have the following index:
CREATE INDEX index_c_profiles_on_city_state_name_domain ON
c_profiles ((data->>'state'), (data->>'city'), name, domain);
I am using the following query:
SELECT mm.name, mm.domain, mm.data ->> 'city' as city, mm.data ->>
'state' as state
FROM c_profiles as mm
WHERE ((mm.data ->> 'state') = 'AZ')
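But when I test this with EXPLAIN ANALYZE, it always does a Bitmap Index Scan first (good and fast), followed by a very, very slow Bitmap Heap Scan (typically 100 times slower than the index scan alone).
I also tried indexing only the WHERE condition, and the result is the same: it still does a very slow Bitmap Heap Scan after using the index.
Why does Postgres do this? How can I get it to do only an index scan to speed up this query?
Here is a sample EXPLAIN ANALYZE result: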
[
{
"Execution Time": 53.655,
"Planning Time": 0.081,
"Plan": {
"Exact Heap Blocks": 1338,
"Node Type": "Bitmap Heap Scan",
"Actual Total Time": 53.031,
"Shared Hit Blocks": 727,
"Schema": "public",
"Plans": [
{
"Node Type": "Bitmap Index Scan",
"Actual Total Time": 0.455,
"Shared Hit Blocks": 2,
"Shared Read Blocks": 13,
"Temp Written Blocks": 0,
"Local Dirtied Blocks": 0,
"Local Hit Blocks": 0,
"Plan Width": 0,
"Actual Loops": 1,
"Actual Startup Time": 0.455,
"Temp Read Blocks": 0,
"Local Read Blocks": 0,
"Index Name": "index_mattermark_profiles_on_city_state_name_domain",
"Startup Cost": 0,
"Shared Dirtied Blocks": 0,
"Shared Written Blocks": 0,
"Local Written Blocks": 0,
"Plan Rows": 788,
"Index Cond": "((mm.data ->> 'state'::text) = 'AZ'::text)",
"Actual Rows": 1417,
"Parent Relationship": "Outer",
"Total Cost": 34.33
}
],
"Shared Read Blocks": 650,
"Relation Name": "mattermark_profiles",
"Local Hit Blocks": 0,
"Local Dirtied Blocks": 0,
"Temp Written Blocks": 0,
"Plan Width": 1010,
"Actual Loops": 1,
"Rows Removed by Index Recheck": 0,
"Lossy Heap Blocks": 0,
"Alias": "mm",
"Recheck Cond": "((mm.data ->> 'state'::text) = 'AZ'::text)",
"Temp Read Blocks": 0,
"Output": [
"name",
"domain",
"(data ->> 'city'::text)",
"(data ->> 'state'::text)"
],
"Actual Startup Time": 0.703,
"Local Read Blocks": 0,
"Startup Cost": 34.53,
"Shared Dirtied Blocks": 0,
"Shared Written Blocks": 0,
"Local Written Blocks": 0,
"Plan Rows": 788,
"Actual Rows": 1417,
"Total Cost": 2894.17
},
"Triggers": []
}
]
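PostgreSQL chooses a bitmap index scan over a regular index scan because it estimates that it will be faster.
That is usually the case when the estimated number of result rows is high.
A regular index scan has to visit the table for every index entry it finds, which causes a lot of random I/O on the table, and the same block may have to be fetched more than once.
A bitmap index scan first finds all matching index entries, orders them by their physical location in the table, and then reads the required table blocks. This is more efficient because the table blocks are visited in physical order and each block is read only once.
The second step, the bitmap heap scan, appears as its own node in the EXPLAIN output and is usually the more expensive step.
So everything looks in order.
You can try setting enable_bitmapscan to off and see whether PostgreSQL is right, that is, whether the resulting plan turns out to be more expensive.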
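The experiment suggested above could look roughly like the following. This is only a sketch: it reuses the table and query from the question, and wrapping it in a transaction with SET LOCAL is my own choice so that the setting does not leak into the rest of the session. The BUFFERS option is added so the block counts can be compared with the plan shown in the question.

```sql
-- Sketch: disable bitmap scans for this transaction only and re-run the query.
-- SET LOCAL is reverted automatically when the transaction ends.
BEGIN;
SET LOCAL enable_bitmapscan = off;

EXPLAIN (ANALYZE, BUFFERS)
SELECT mm.name, mm.domain,
       mm.data ->> 'city'  AS city,
       mm.data ->> 'state' AS state
FROM c_profiles AS mm
WHERE (mm.data ->> 'state') = 'AZ';

ROLLBACK;
```

Compare the execution time and the shared hit/read block counts with the bitmap plan. Run each variant a few times, since the first run may be dominated by blocks that are not yet cached.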