PostgreSQL: optimize the performance of a query that combines CTEs and window functions
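Here the amenity_categories and parent_path columns are JSONB columns, with values like ["Tv","Air Condition"] and ["20000","20100"," 203"] respectively. All other columns are ordinary varchar and numeric types. I have about 2.5M rows, and the primary key is on id, which is indexed. Basically, the CTE part takes a long time when rp.parent_path matches many rows.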
Sample dataset:
Current query:
WITH CTE AS
(
    SELECT id,
           property_name,
           property_type_category,
           review_score,
           amenity_category.name,
           count(*) AS cnt
    FROM table_name rp,
         jsonb_array_elements_text(rp.amenity_categories) amenity_category(name)
    WHERE rp.parent_path ? '203' AND number_of_review >= 1
    GROUP BY amenity_category.name, id
),
CTE2 AS
(
    SELECT id, property_name, property_type_category, name,
           ROW_NUMBER() OVER (PARTITION BY property_type_category, name
                              ORDER BY review_score DESC),
           COUNT(id) OVER (PARTITION BY property_type_category, name
                           ORDER BY name DESC)
    FROM CTE
)
SELECT id, property_name, property_type_category, name, COUNT
FROM CTE2
WHERE row_number = 1
Current output:
So my basic question is: is there another way to rewrite this query, or to optimize the current one?
If it is safe to assume that the array elements in amenity_categories are distinct (no duplicate elements within one array), we can radically simplify to:
SELECT DISTINCT ON (property_type_category, ac.name)
       id, property_name, property_type_category, ac.name
     , COUNT(*) OVER (PARTITION BY property_type_category, ac.name) AS count
FROM   table_name rp, jsonb_array_elements_text(rp.amenity_categories) ac(name)
WHERE  parent_path ? '203'
AND    number_of_review >= 1
ORDER  BY property_type_category, ac.name, review_score DESC;
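The simplification above depends on each amenity_categories array holding no repeated elements. A minimal sketch of how that assumption might be checked first, using only the table and column names from the question (the LIMIT of 10 is arbitrary):

```sql
-- List rows whose amenity_categories array repeats an element.
-- An empty result suggests the "distinct elements" assumption holds.
SELECT rp.id, ac.name, count(*) AS occurrences
FROM   table_name rp, jsonb_array_elements_text(rp.amenity_categories) ac(name)
GROUP  BY rp.id, ac.name
HAVING count(*) > 1
LIMIT  10;
```

If this returns rows, the simplified query would count those properties more than once per amenity, and the original GROUP BY step (or some other deduplication) is still needed.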
If review_score can be NULL, then:
...
ORDER BY property_type_category, ac.name, review_score DESC NULLS LAST;
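The reason NULLS LAST matters: in PostgreSQL, a descending sort puts NULL values first by default, so DISTINCT ON would otherwise pick a row with a NULL score as the "best" one. A small self-contained illustration with made-up values (grp and score are hypothetical names, not columns from the question):

```sql
-- Without NULLS LAST, group 'a' would return the row with the NULL score.
SELECT DISTINCT ON (grp) grp, score
FROM  (VALUES ('a', NULL::numeric), ('a', 4.5), ('b', 3.0)) t(grp, score)
ORDER  BY grp, score DESC NULLS LAST;
-- grp | score
-- a   | 4.5
-- b   | 3.0
```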
This works because DISTINCT ON is applied as the last step (after window functions). See:
- Best way to get result count before LIMIT was applied
- PostgreSQL: running count of rows for a query 'by minute'
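To see that order of evaluation in isolation, here is a small sketch with made-up values (grp and val are hypothetical names). The window count is computed over every row of the partition first, and only then does DISTINCT ON reduce each group to its top row:

```sql
-- count(*) OVER (...) sees all three 'a' rows before DISTINCT ON discards two of them.
SELECT DISTINCT ON (grp)
       grp, val
     , count(*) OVER (PARTITION BY grp) AS cnt
FROM  (VALUES ('a', 1), ('a', 2), ('a', 3), ('b', 10)) t(grp, val)
ORDER  BY grp, val DESC;
-- grp | val | cnt
-- a   |   3 |   3
-- b   |  10 |   1
```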
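parent_path and number_of_review should probably be indexed. Whether that helps depends on the data distribution and on the selectivity of the WHERE conditions, which you did not disclose.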
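A sketch of what such indexes could look like. The index names are made up for illustration. The GIN index uses the default jsonb_ops operator class, which supports the ? operator (jsonb_path_ops does not):

```sql
-- GIN index so that "parent_path ? '203'" can use an index scan.
CREATE INDEX table_name_parent_path_gin_idx
    ON table_name USING gin (parent_path);

-- Plain B-tree index for the number_of_review filter.
CREATE INDEX table_name_number_of_review_idx
    ON table_name (number_of_review);
```

If most rows have number_of_review >= 1, the planner will likely ignore the second index, and if '203' matches a large share of the table, a sequential scan may still be chosen. Comparing EXPLAIN (ANALYZE, BUFFERS) output before and after is the way to find out.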
About DISTINCT ON:
- Select first row in each GROUP BY group?
Assuming id is NOT NULL, count(*) is faster than count(id) and returns the same result.
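The two forms differ only when the counted column can be NULL, which a primary key cannot be. A tiny illustration with made-up values:

```sql
-- count(*) counts rows; count(id) counts only rows where id IS NOT NULL.
SELECT count(*) AS count_star, count(id) AS count_id
FROM  (VALUES (1), (2), (NULL)) t(id);
-- count_star | count_id
--          3 |        2
```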