Redshift: is a join to a subquery/CTE consisting of SELECT * from a table equivalent to joining the table itself, or a performance hit?

Question

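On Redshift, does a CTE/subquery that is just a SELECT * from a source table, used in a join, incur a performance hit compared with code that references and joins the source table directly? That is, is there any performance difference between this code: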

WITH cte_source_2 AS (SELECT * FROM source_2)
SELECT
    s1.field_1, s2.field_2
FROM
    source_1 AS s1
LEFT JOIN
    cte_source_2 AS s2
    ON
        s1.key_field = s2.key_field

and this code:

SELECT
    s1.field_1, s2.field_2
FROM
    source_1 AS s1
LEFT JOIN
    source_2 AS s2
    ON
        s1.key_field = s2.key_field

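I would think not, since the query optimizer should reduce the first version to the second, but I am getting conflicting results (mostly due to caching, I think).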

Another way of phrasing the question, setting CTEs aside: on Redshift specifically, does this:

SELECT
    .....
FROM
    (SELECT * FROM source_1) AS s1
LEFT JOIN
    .......

perform the same as this:

SELECT
    .....
FROM
    source_1 AS s1
LEFT JOIN
    .......

Unfortunately, I don't have permission to access any profiling information. Thanks!

Answer 1

On Redshift, CTEs are a great convenience, but the query still resolves as a sub-select. See the second paragraph of https://docs.aws.amazon.com/redshift/latest/dg/r_WITH_clause.html

So you are right: performance is the same either way.

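That is not necessarily the case on PostgreSQL, where a CTE can be thought of as a temporary table that exists just for the one query (see the first paragraph of https://www.postgresql.org/docs/current/queries-with.html). Before version 12 a CTE was always materialized; from version 12 on, a non-recursive, side-effect-free CTE that is referenced only once, like the one in your example, is folded into the parent query unless it is marked MATERIALIZED.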
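
If you want to check this yourself without access to profiling data, one option is to compare the query plans. The following is only a sketch: it assumes your permissions allow EXPLAIN on these tables (EXPLAIN shows the plan without running the query), and it reuses the table and column names from the question. Turning off the session result cache is an extra step to keep repeated timing runs from being served from cache, which may be the source of the conflicting results you mention.

```sql
-- Disable the result cache for this session so that repeated runs
-- are actually executed rather than answered from cached results.
SET enable_result_cache_for_session TO off;

-- Plan for the CTE version.
EXPLAIN
WITH cte_source_2 AS (SELECT * FROM source_2)
SELECT
    s1.field_1, s2.field_2
FROM
    source_1 AS s1
LEFT JOIN
    cte_source_2 AS s2
    ON
        s1.key_field = s2.key_field;

-- Plan for the direct join.
EXPLAIN
SELECT
    s1.field_1, s2.field_2
FROM
    source_1 AS s1
LEFT JOIN
    source_2 AS s2
    ON
        s1.key_field = s2.key_field;
```

If the two plans show the same steps and cost estimates, the optimizer has reduced the CTE version to the plain join.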
