PostgreSQL: optimize performance of a query that contains window functions with a CTE

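Here, the amenity_categories and parent_path columns are JSONB columns with values like ["Tv","Air Condition"] and ["20000","20100","203"], respectively. Apart from those, the other columns are ordinary varchar and numeric types. I have around 2.5M rows, with a primary key on id, which is indexed. Basically, the CTE part takes a long time when rp.parent_path matches many rows.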

Sample dataset:

Current query:

WITH CTE AS
(
  SELECT id,
  property_name,
  property_type_category,
  review_score, 
  amenity_category.name, 
  count(*) AS cnt FROM table_name rp, 
  jsonb_array_elements_text(rp.amenity_categories) amenity_category(name)
  WHERE rp.parent_path ? '203' AND number_of_review >= 1
  GROUP BY amenity_category.name,id 
),
CTE2 as
(
  SELECT id, property_name,property_type_category,name,
  ROW_NUMBER() OVER (PARTITION BY property_type_category,
  name ORDER BY review_score DESC),
  COUNT(id) OVER (PARTITION BY property_type_category,
  name ORDER BY name DESC) 
  FROM CTE
)

SELECT id, property_name, property_type_category, name, COUNT 
FROM CTE2
where row_number = 1

Current output:

So my basic question is: is there any other way to rewrite this query, or to optimize the current one?

If it is safe to assume that the array elements in amenity_categories are distinct (no duplicate array elements), we can radically simplify to:

SELECT DISTINCT ON (property_type_category, ac.name)
       id, property_name, property_type_category, ac.name
     , COUNT(*) OVER (PARTITION BY property_type_category, ac.name) AS count
FROM   table_name rp, jsonb_array_elements_text(rp.amenity_categories) ac(name)
WHERE  parent_path ? '203'
AND    number_of_review >= 1
ORDER  BY property_type_category, ac.name, review_score DESC;
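
The simplification above relies on array elements being distinct within each row. A minimal sketch of how that assumption could be checked, assuming every non-null amenity_categories value is a JSON array (jsonb_array_length raises an error otherwise) and contains no JSON null elements:

```sql
-- Lists up to 10 rows whose amenity_categories array holds a repeated element.
-- An empty result supports the "distinct elements" assumption.
SELECT rp.id
FROM   table_name rp
WHERE  jsonb_array_length(rp.amenity_categories)
       <> (SELECT count(DISTINCT ac.name)   -- count(DISTINCT ...) ignores NULLs
           FROM   jsonb_array_elements_text(rp.amenity_categories) ac(name))
LIMIT  10;
```

If this returns rows, the simplified query would count such a row more than once per category, unlike the original GROUP BY version.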

If review_score can be NULL, add NULLS LAST, since descending order otherwise sorts NULL values first:

...
ORDER  BY property_type_category, ac.name, review_score DESC NULLS LAST;

This works because DISTINCT ON is applied late in query processing, after window functions have been computed. See:

  • Best way to get result count before LIMIT was applied
  • PostgreSQL: running count of rows for a query 'by minute'
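
To see that order of evaluation in isolation, here is a small self-contained sketch on made-up data (the names grp, val and ct are illustrative only). The window count still sees all rows of each group, even though DISTINCT ON keeps just one row per group:

```sql
-- Window function runs first over all rows; DISTINCT ON then keeps
-- the first row per grp according to ORDER BY (the highest val).
SELECT DISTINCT ON (grp)
       grp, val
     , count(*) OVER (PARTITION BY grp) AS ct
FROM  (VALUES ('a', 1), ('a', 2), ('a', 3), ('b', 10)) t(grp, val)
ORDER  BY grp, val DESC;

--  grp | val | ct
-- -----+-----+----
--  a   |   3 |  3
--  b   |  10 |  1
```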

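parent_path and number_of_review should probably be indexed. Whether that helps depends on the data distribution and the selectivity of the WHERE conditions, which you did not disclose.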
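
As one possible shape for such an index, here is a sketch of a partial GIN index. The index name is made up, and whether the planner uses it depends on the selectivity mentioned above, so treat it as a starting point to test rather than a recommendation:

```sql
-- Hypothetical index name; table and column names come from the query.
-- The default jsonb GIN operator class (jsonb_ops) supports the ? operator;
-- jsonb_path_ops does not, so it is not used here.
CREATE INDEX table_name_parent_path_gin_idx
ON     table_name USING gin (parent_path)
WHERE  number_of_review >= 1;   -- matches the query's second WHERE condition
```

Running the query under EXPLAIN (ANALYZE, BUFFERS) before and after shows whether the index is actually chosen.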

About DISTINCT ON:

  • Select first row in each GROUP BY group?

Assuming id is NOT NULL, count(*) is faster than count(id) and equivalent to it.
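
The NOT NULL condition matters because count(id) skips NULL values while count(*) counts every row. A tiny illustration on made-up values:

```sql
-- count(*) counts rows; count(id) counts only rows where id is not NULL.
SELECT count(*)  AS count_star
     , count(id) AS count_id
FROM  (VALUES (1), (2), (NULL)) t(id);

--  count_star | count_id
-- ------------+----------
--           3 |        2
```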