EclipseLink JPA generates count queries using COUNT(id) instead of COUNT(*)

Question

I'm using EclipseLink, Spring Data and PostgreSQL. In my project I noticed that when using paged results provided by Spring Data repositories, queries like this appear:

SELECT COUNT(id) 
FROM table 
WHERE [part generated according to specification]

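where "id" is the primary key of "table". Digging in with EXPLAIN, I noticed that for a very large table COUNT(id) is about 10 times slower than COUNT(*): COUNT(id) looks for non-null values in the "id" column, while COUNT(*) just returns the number of rows matching the criteria. COUNT(*) can also use an index, while COUNT(id) cannot.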

I traced the Spring Data base repository classes, and it seems that only the JPA implementation is responsible for generating this query.

What is the reason for using COUNT(id) instead of the faster COUNT(*)?
Can I change this behavior (in any way, even by enhancing existing components)?

Thanks for any help.

-- [Edit] --

There is a table:

\d ord_order
                                       Table "public.ord_order"
         Column          |           Type            |                       Modifiers
-------------------------+--------------------------+----------------------------------------------------------
 id                      | integer                  | NOT NULL DEFAULT nextval('ord_order_id_seq'::regclass)
 test_order              | boolean                  | DEFAULT false
...
Indexes:
    "pk_order" PRIMARY KEY, btree (id)
    "idx_test_order" btree (test_order)



# explain SELECT COUNT(*) FROM ord_order WHERE (test_order = false);
                                QUERY PLAN
--------------------------------------------------------------------------
 Aggregate  (cost=89898.79..89898.80 rows=1 width=0)
   ->  Index Only Scan using idx_test_order on ord_order  (cost=0.43..85375.37 rows=1809366 width=0)
         Index Cond: (test_order = false)
         Filter: (NOT test_order)
(4 rows)



# explain SELECT COUNT(id) FROM ord_order WHERE (test_order = false);
                                QUERY PLAN
--------------------------------------------------------------------------
 Aggregate  (cost=712924.52..712924.53 rows=1 width=4)
   ->  Seq Scan on ord_order  (cost=0.00..708401.10 rows=1809366 width=4)
         Filter: (NOT test_order)
(3 rows)

Now the difference is an estimated cost of ~90k vs. ~713k, and an index scan vs. a full (sequential) scan.

Answer 1

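COUNT(*) can use the index because only one column (test_order) is referenced in the query. COUNT(id) references two columns, so Postgres has to read both the id column and the test_order column to build the result.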

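As I have already mentioned, some people believe that COUNT(id) is faster than COUNT(*) when the query has no restrictions. That myth has never been true for any DBMS with a decent optimizer. I guess that is why your obfuscation layer uses COUNT(id) instead of COUNT(*).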

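Assuming you don't want to get rid of the ORM (to regain control over the SQL your application is using), the only workaround I can see is to create a partial index that Postgres can use: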

create index on ord_order (id)
where test_order = false;

Answer 2

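I managed to provide a custom Spring Data repository base class implementation and a factory that uses it. As a result, the generated count query now has the following form: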

SELECT COUNT(1) FROM table

which has the same plan as COUNT(*). This seems to be a good solution, and it works for all repositories defined in the application.

I didn't know how to generate COUNT(*); COUNT(1) was easier, because the COUNT function needs some expression as its argument and I can supply a static value: 1.
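To illustrate why COUNT(1) returns the same number as COUNT(*), here is a minimal plain-Java model of the counting rule (COUNT(expr) counts rows where the expression is non-null, COUNT(*) counts rows). It is only a sketch of the semantics, not of how Postgres plans or executes the query; the names `Row`, `note`, `countStar` and `countExpr` are hypothetical and exist only for this example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

public class CountSemantics {

    /** Hypothetical row: a NOT NULL primary key plus one nullable column. */
    static final class Row {
        final Integer id;   // primary key, never null
        final String note;  // illustrative nullable column
        Row(Integer id, String note) { this.id = id; this.note = note; }
    }

    /** Models COUNT(*): counts rows without inspecting any column value. */
    static long countStar(List<Row> rows) {
        return rows.size();
    }

    /** Models COUNT(expr): counts rows for which expr evaluates to non-null. */
    static long countExpr(List<Row> rows, Function<Row, Object> expr) {
        long n = 0;
        for (Row row : rows) {
            if (expr.apply(row) != null) {
                n++;
            }
        }
        return n;
    }

    public static void main(String[] args) {
        List<Row> rows = Arrays.asList(
                new Row(1, "a"), new Row(2, null), new Row(3, "c"));

        System.out.println(countStar(rows));               // COUNT(*)    -> 3
        System.out.println(countExpr(rows, r -> r.id));    // COUNT(id)   -> 3
        System.out.println(countExpr(rows, r -> 1));       // COUNT(1)    -> 3
        System.out.println(countExpr(rows, r -> r.note));  // COUNT(note) -> 2
    }
}
```

A constant such as 1 is never null, so it needs no column value at all, which is consistent with COUNT(1) getting the same plan as COUNT(*) above. COUNT(id) gives the same result only because the primary key is NOT NULL, yet it still names a column that the query has to read.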

java

postgresql

hibernate

jpa

spring-data-jpa