Should I put a row number filter in the join condition or in a prior CTE?
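I have a subscription table and a payments table that I need to join.
I'm trying to decide between two options, and performance is a key consideration.
Which of the two options below will perform better?
I'm using Impala, and these tables are large (millions of rows). I only need one row per id and date grouping (hence the row_number() analytic function).
I've shortened the queries to illustrate my question:
Option 1: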
WITH cte
AS (
    SELECT *
         , SUM(amount) OVER (PARTITION BY id, date)
           AS sameday_total
         , ROW_NUMBER() OVER (PARTITION BY id, date ORDER BY purchase_number DESC)
           AS sameday_rownum
    FROM payments
),
payment
AS (
    SELECT *
    FROM cte
    WHERE sameday_rownum = 1
)
SELECT s.*
     , p.sameday_total
FROM subscription s
INNER JOIN payment p ON s.id = p.id
Option 2:
WITH payment
AS (
    SELECT *
         , SUM(payment_amount) OVER (PARTITION BY id, date)
           AS sameday_total
         , ROW_NUMBER() OVER (PARTITION BY id, date ORDER BY purchase_number DESC)
           AS sameday_rownum
    FROM payments
)
SELECT s.*
     , p.sameday_total
FROM subscription s
INNER JOIN payment p ON s.id = p.id
                    AND p.sameday_rownum = 1
There is also an "option 0": the more traditional derived table, which needs no CTE at all.
SELECT s.*
     , p.sameday_total
FROM subscription s
INNER JOIN (
    SELECT *
         , SUM(payment_amount) OVER (PARTITION BY id, date)
           AS sameday_total
         , ROW_NUMBER() OVER (PARTITION BY id, date ORDER BY purchase_number DESC)
           AS sameday_rownum
    FROM payments
) p ON s.id = p.id
   AND p.sameday_rownum = 1
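Options 0, 1 and 2 are all likely to produce the same, or very similar, explain plans (although I am more confident of that claim for SQL Server than for Impala).
Using a CTE does not, by itself, make a query more efficient or faster, so the syntactic difference between option 1 and option 2 should not matter. I prefer option 0 myself, because I like to reserve CTEs for specific tasks (such as recursion).
What you should do is use explain plans to study what each option actually produces.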
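A minimal sketch of that comparison, assuming an impala-shell session against the subscription and payments tables from the question: prefix each candidate query with Impala's EXPLAIN statement and compare the resulting plans, paying particular attention to where the sameday_rownum = 1 predicate is evaluated relative to the analytic step and the join. EXPLAIN_LEVEL is an Impala query option, and level 2 (extended) prints more detail than the default. As an assumption, the sketch uses payment_amount in both options, because the question uses amount in option 1 and payment_amount in option 2. Adjust this to the real schema.

```sql
-- Sketch only: run in impala-shell against the question's tables.
-- Assumption: the amount column is payment_amount in both options
-- (the question's option 1 uses amount); adjust to the real schema.

-- Ask for a more detailed plan than the default STANDARD (1) level.
SET EXPLAIN_LEVEL=2;

-- Option 1: row-number filter applied in a second CTE.
EXPLAIN
WITH cte AS (
    SELECT *
         , SUM(payment_amount) OVER (PARTITION BY id, date) AS sameday_total
         , ROW_NUMBER() OVER (PARTITION BY id, date ORDER BY purchase_number DESC) AS sameday_rownum
    FROM payments
),
payment AS (
    SELECT * FROM cte WHERE sameday_rownum = 1
)
SELECT s.*, p.sameday_total
FROM subscription s
INNER JOIN payment p ON s.id = p.id;

-- Option 2: row-number filter applied in the join condition.
EXPLAIN
WITH payment AS (
    SELECT *
         , SUM(payment_amount) OVER (PARTITION BY id, date) AS sameday_total
         , ROW_NUMBER() OVER (PARTITION BY id, date ORDER BY purchase_number DESC) AS sameday_rownum
    FROM payments
)
SELECT s.*, p.sameday_total
FROM subscription s
INNER JOIN payment p ON s.id = p.id
                    AND p.sameday_rownum = 1;
```

If both plans show the same scan, analytic, filter and join steps, the choice between the two is purely stylistic. The same check works for option 0. To compare real runtime behaviour rather than the estimated plan, you can also run each query and inspect impala-shell's SUMMARY output afterwards.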