Bitmap Index Scan always Followed by Bitmap Heap Scan for JSON field query

I have the following index:

CREATE INDEX index_c_profiles_on_city_state_name_domain
ON c_profiles ((data->>'state'), (data->>'city'), name, domain);

I'm using the following query:

SELECT mm.name, mm.domain, mm.data ->> 'city' AS city, mm.data ->> 'state' AS state
FROM c_profiles AS mm
WHERE ((mm.data ->> 'state') = 'AZ')

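But when I test this with EXPLAIN ANALYZE, it always does a Bitmap Index Scan first (which is good and fast) and then follows up with a very, very slow Bitmap Heap Scan (usually 100x slower than the index scan alone).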

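I have also tried indexing just the WHERE condition, and the result is the same: it still does the very slow Bitmap Heap Scan after using the index.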

Why is Postgres doing this? And how can I get it to do only the index scan to speed up this query?

Here is an example EXPLAIN ANALYZE result:

[
  {
    "Execution Time": 53.655,
    "Planning Time": 0.081,
    "Plan": {
      "Exact Heap Blocks": 1338,
      "Node Type": "Bitmap Heap Scan",
      "Actual Total Time": 53.031,
      "Shared Hit Blocks": 727,
      "Schema": "public",
      "Plans": [
        {
          "Node Type": "Bitmap Index Scan",
          "Actual Total Time": 0.455,
          "Shared Hit Blocks": 2,
          "Shared Read Blocks": 13,
          "Temp Written Blocks": 0,
          "Local Dirtied Blocks": 0,
          "Local Hit Blocks": 0,
          "Plan Width": 0,
          "Actual Loops": 1,
          "Actual Startup Time": 0.455,
          "Temp Read Blocks": 0,
          "Local Read Blocks": 0,
          "Index Name": "index_mattermark_profiles_on_city_state_name_domain",
          "Startup Cost": 0,
          "Shared Dirtied Blocks": 0,
          "Shared Written Blocks": 0,
          "Local Written Blocks": 0,
          "Plan Rows": 788,
          "Index Cond": "((mm.data ->> 'state'::text) = 'AZ'::text)",
          "Actual Rows": 1417,
          "Parent Relationship": "Outer",
          "Total Cost": 34.33
        }
      ],
      "Shared Read Blocks": 650,
      "Relation Name": "mattermark_profiles",
      "Local Hit Blocks": 0,
      "Local Dirtied Blocks": 0,
      "Temp Written Blocks": 0,
      "Plan Width": 1010,
      "Actual Loops": 1,
      "Rows Removed by Index Recheck": 0,
      "Lossy Heap Blocks": 0,
      "Alias": "mm",
      "Recheck Cond": "((mm.data ->> 'state'::text) = 'AZ'::text)",
      "Temp Read Blocks": 0,
      "Output": [
        "name",
        "domain",
        "(data ->> 'city'::text)",
        "(data ->> 'state'::text)"
      ],
      "Actual Startup Time": 0.703,
      "Local Read Blocks": 0,
      "Startup Cost": 34.53,
      "Shared Dirtied Blocks": 0,
      "Shared Written Blocks": 0,
      "Local Written Blocks": 0,
      "Plan Rows": 788,
      "Actual Rows": 1417,
      "Total Cost": 2894.17
    },
    "Triggers": []
  }
]

PostgreSQL chooses a bitmap index scan over a regular index scan because it estimates that it will be faster.

This is usually the case when the estimated number of result rows is high.

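A regular index scan has to visit the table for each index entry it finds. That causes a lot of random I/O on the table, and the same block may have to be visited more than once.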

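A bitmap index scan first finds all matching index entries, sorts them in the order of their physical location in the table, and then scans the table blocks that are needed. This is more efficient because it visits the table blocks in physical order, and each block only once.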

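The second step, the bitmap heap scan, shows up as its own node in the EXPLAIN output and is usually the more expensive step. In the plan above it accounts for nearly all of the 53 ms, because it had to visit 1338 heap blocks to fetch 1417 rows.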

So everything looks like it is in order.

You can try setting enable_bitmapscan to off and see whether PostgreSQL is right, that is, whether the resulting plan really is more expensive.
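A minimal sketch of that experiment, reusing the table and query from the question. The transaction wrapper, SET LOCAL, and the BUFFERS option are my additions rather than anything the question used: SET LOCAL limits the change to this transaction, and BUFFERS shows how many blocks came from cache versus disk.

```sql
BEGIN;

-- Discourage bitmap scans for this transaction only.
-- The planner can still pick one if it finds no other usable plan.
SET LOCAL enable_bitmapscan = off;

EXPLAIN (ANALYZE, BUFFERS)
SELECT mm.name,
       mm.domain,
       mm.data ->> 'city'  AS city,
       mm.data ->> 'state' AS state
FROM c_profiles AS mm
WHERE (mm.data ->> 'state') = 'AZ';

-- Nothing was modified; ending the transaction restores the setting.
ROLLBACK;
```

Run it a couple of times and compare the timings with the original plan under the same cache conditions. The plan in the question read 650 blocks that were not in shared buffers, so a first run against a cold cache can look much slower than a repeat run regardless of which scan type is used.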