Postgres query causing high disk IO after upgrade to version 11

Question

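After upgrading from RDS version 9.6 to RDS version 11, a Postgres query started consuming high read IOPS and CPU. The data set is the same as before the upgrade. I am not sure what the problem is.

Below is the explain plan:

Could it be because the index is corrupted?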

Explain (analyze true, verbose true, costs true, buffers true, timing true )
select consumertr0_.ref_id as col_0_0_
from consumer_transactions consumertr0_
where (consumertr0_.remaining_amount is not null)
  and (consumertr0_.expiry_time is not null)
  and consumertr0_.expiry_time>'2020-12-15T00:00:00'
  and consumertr0_.expiry_time<now()
  and consumertr0_.remaining_amount>0
order by consumertr0_.expiry_time asc
limit 20000;

                  
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.57..80391.67 rows=20000 width=24) (actual time=191716.213..192077.229 rows=20000 loops=1)
   Output: ref_id, expiry_time
   Buffers: shared hit=9481343 read=1566736
   I/O Timings: read=97.486
   ->  Index Scan using consumer_transactions_expiry_time_remaining_amount on public.consumer_transactions consumertr0_  (cost=0.57..1109723.40 rows=276081 width=24) (actual time=191716.211..192075.241 rows=20000 loops=1)
         Output: ref_id, expiry_time
         Index Cond: ((consumertr0_.expiry_time > '2020-12-15 00:00:00'::timestamp without time zone) AND (consumertr0_.expiry_time < now()))
         Buffers: shared hit=9481343 read=1566736
         I/O Timings: read=97.486
 Planning Time: 1.525 ms
 Execution Time: 192078.720 ms
(11 rows)

Index definition:

"consumer_transactions_expiry_time_remaining_amount" btree
   (expiry_time, remaining_amount)
WHERE expiry_time IS NOT NULL
  AND remaining_amount IS NOT NULL
  AND remaining_amount > 0::numeric

Analyze details:

        relname        |         last_analyze          |       last_autoanalyze      
 consumer_transactions | 2021-01-24 22:00:03.144379+00 |
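
For reference, output like the above can be obtained from the standard statistics view pg_stat_user_tables. This is a sketch of one way to produce it, not necessarily the exact query that was run here; the vacuum and dead-tuple columns are additions that are not part of the original output.

```sql
-- Assumption: the output above came from pg_stat_user_tables (or a similar view).
SELECT relname,
       last_analyze,
       last_autoanalyze,
       last_vacuum,        -- extra columns, not shown in the original output
       last_autovacuum,
       n_dead_tup          -- estimated number of dead rows in the table
FROM pg_stat_user_tables
WHERE relname = 'consumer_transactions';
```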

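Before the upgrade, the same number of records was processed very quickly and with low IOPS. However, I do not have the explain plan from version 9.6.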

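Solution: I created the same index under a different name, and that resolved the issue. I will drop the old index once I have worked out why it suddenly became so slow after the upgrade.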
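
A minimal sketch of that workaround, based on the index definition shown earlier. The use of CONCURRENTLY is an assumption (the exact statements that were run are not given); it avoids blocking writes while the index is built, which is usually what you want on a production table.

```sql
-- Build the replacement index without blocking writes.
-- Note: CREATE INDEX CONCURRENTLY cannot run inside a transaction block.
CREATE INDEX CONCURRENTLY consumer_transactions_expiry_time_remaining_amount2
    ON public.consumer_transactions (expiry_time, remaining_amount)
    WHERE expiry_time IS NOT NULL
      AND remaining_amount IS NOT NULL
      AND remaining_amount > 0;

-- Later, once the old index is no longer needed for diagnosis:
DROP INDEX CONCURRENTLY public.consumer_transactions_expiry_time_remaining_amount;
```

Version 11 has no REINDEX CONCURRENTLY (it was added in version 12), so building a second index and dropping the first is the usual way to rebuild online.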

Explain plan with the new index:

explain (analyze true, verbose true, costs true, buffers true, timing true )
select consumertr0_.ref_id as col_0_0_
from consumer_transactions consumertr0_
where (consumertr0_.remaining_amount is not null)
  and (consumertr0_.expiry_time is not null)
  and consumertr0_.expiry_time>'2019-07-01T00:00:00'
  and consumertr0_.expiry_time<now()
  and consumertr0_.remaining_amount>0
order by consumertr0_.expiry_time asc
limit 20000;

            
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.57..73592.06 rows=20000 width=24) (actual time=0.048..18.307 rows=20000 loops=1)
   Output: ref_id, expiry_time
   Buffers: shared hit=11140
   ->  Index Scan using consumer_transactions_expiry_time_remaining_amount2 on public.consumer_transactions consumertr0_  (cost=0.57..22273478.26 rows=6053275 width=24) (actual time=0.047..16.119 rows=20000 loops=1)
         Output: ref_id, expiry_time
         Index Cond: ((consumertr0_.expiry_time > '2019-07-01 00:00:00'::timestamp without time zone) AND (consumertr0_.expiry_time < now()))
         Buffers: shared hit=11140
 Planning Time: 1.160 ms
 Execution Time: 19.600 ms
(9 rows)


Also, the old explain plan starts at an actual time of 191716.211, whereas the new one starts at 0.047. I do not understand where the time before 191716.211 was spent.

FYI, index bloat details:

 current_database | schemaname |        tblname        |                       idxname                       |  real_size  | extra_size  |    extra_ratio    | fillfactor | bloat_size  |    bloat_ratio    | is_na
------------------+------------+-----------------------+-----------------------------------------------------+-------------+-------------+-------------------+------------+-------------+-------------------+-------
 proddb           | public     | consumer_transactions | consumer_transactions_expiry_time_remaining_amount  |  4748820480 |  3698360320 |  77.8795563145819 |         90 |  3583516672 |  75.4611947765185 | f
 proddb           | public     | consumer_transactions | consumer_transactions_expiry_time_remaining_amount2 |  1755013120 |   704552960 |  40.1451676896866 |         90 |   589709312 |  33.6014190024973 | f
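
The figures above are typically estimates derived from planner statistics. For an exact view of the two indexes, something along these lines could be used. It is a sketch that assumes the pgstattuple extension can be installed on the instance; pgstatindex reads the whole index, so it generates IO of its own.

```sql
-- Assumption: the pgstattuple extension is available and may be created.
CREATE EXTENSION IF NOT EXISTS pgstattuple;

SELECT c.relname                                       AS index_name,
       pg_size_pretty(pg_relation_size(i.indexrelid))  AS on_disk_size,
       s.leaf_pages,
       s.deleted_pages,
       s.avg_leaf_density,     -- low density suggests a bloated B-tree
       s.leaf_fragmentation
FROM pg_index AS i
JOIN pg_class AS c ON c.oid = i.indexrelid
CROSS JOIN LATERAL pgstatindex(i.indexrelid::regclass) AS s
WHERE i.indrelid = 'public.consumer_transactions'::regclass
  AND c.relname LIKE 'consumer_transactions_expiry_time_remaining_amount%';
```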

Answer 1

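The old index is badly bloated: scanning it has to look at 11048079 8kB blocks (and read 1566736 of them from disk) to find the matching rows, whereas the new index only needs to look at 11140 blocks.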
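
The block counts come straight from the Buffers lines of the two plans (shared hit plus read). As a rough illustration of the scale, assuming the default 8kB block size, the arithmetic can be done in SQL. Bear in mind that these are buffer accesses, so the same block can be counted more than once.

```sql
SELECT 9481343 + 1566736                                   AS old_blocks_touched,  -- hit + read
       pg_size_pretty((9481343 + 1566736)::bigint * 8192)  AS old_volume_touched,
       pg_size_pretty(1566736::bigint * 8192)              AS old_volume_read_from_disk,
       pg_size_pretty(11140::bigint * 8192)                AS new_volume_touched,
       round((9481343 + 1566736)::numeric / 11140)         AS old_to_new_ratio;
```

That works out to roughly 84 GiB of buffer accesses with the old index against about 87 MiB with the new one, a difference of nearly three orders of magnitude for the same 20000 rows.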

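I am not sure how the index got into that state.

The second index column does not seem to be of any use.

The perfect index for this query would be: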

CREATE INDEX ON public.consumer_transactions (expiry_time) INCLUDE (ref_id)
WHERE remaining_amount > 0;

If you VACUUM the table, you will get a fast index-only scan.
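
One way to check this is sketched below, assuming the index above has already been created. VACUUM updates the visibility map, which is what lets an index-only scan skip the heap. In the resulting plan, look for an "Index Only Scan" node with a low "Heap Fetches" count.

```sql
-- Refresh the visibility map and the planner statistics.
VACUUM (ANALYZE) public.consumer_transactions;

-- Re-run the query from the question and inspect the plan.
EXPLAIN (ANALYZE, BUFFERS)
SELECT ref_id
FROM public.consumer_transactions
WHERE remaining_amount IS NOT NULL
  AND expiry_time IS NOT NULL
  AND expiry_time > '2020-12-15T00:00:00'
  AND expiry_time < now()
  AND remaining_amount > 0
ORDER BY expiry_time
LIMIT 20000;
```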

