Redshift tuning join

Question

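I am trying to join two tables in Redshift, one big and one small. The join is on their id. I have distributed the big table across the cluster by the join column, and I also use that column as the sort key. The small table is distributed in full to all nodes (DISTSTYLE ALL), with the join column as its sort key.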

Example:

create table small_table diststyle all SORTKEY(id) as select * from another_small_table;

create table big_table distkey (id) diststyle key SORTKEY(id) as SELECT * from another_big_table;

explain SELECT * FROM big_table big JOIN small_table small ON big.id = small.id;

The query plan says Redshift is performing a hash join rather than a merge join. Is this the expected behavior? I was expecting a merge join.

Answer 1

According to the documentation:

https://docs.aws.amazon.com/redshift/latest/dg/c-the-query-plan.html

Merge Join

Typically the fastest join, a merge join is used for inner joins and outer joins. The merge join is not used for full joins. This operator is used when joining tables where the join columns are both distribution keys and sort keys, and when less than 20 percent of the joining tables are unsorted. It reads two sorted tables in order and finds the matching rows. To view the percent of unsorted rows, query the SVV_TABLE_INFO system table.

So, because the small table uses DISTSTYLE ALL and therefore has no distribution key on (id), this join type is not used.
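
The conditions quoted above can be checked with a short sketch like the one below. It first reads the distribution style, first sort key, and unsorted percentage from SVV_TABLE_INFO, then builds a variant of the small table keyed on id and re-runs EXPLAIN. The name small_table_by_key is hypothetical and only for illustration. Whether the planner actually picks a merge join after this change is an assumption based on the documentation text, not something confirmed in this thread.

```sql
-- Inspect distribution style, first sort key, and percent of unsorted rows.
-- Table names are the ones used in the question's EXPLAIN query.
SELECT "table", diststyle, sortkey1, unsorted
FROM svv_table_info
WHERE "table" IN ('big_table', 'small_table');

-- Hypothetical variant: rebuild the small table with the join column as
-- both distribution key and sort key (small_table_by_key is an
-- illustrative name, not from the question).
CREATE TABLE small_table_by_key
DISTSTYLE KEY
DISTKEY (id)
SORTKEY (id)
AS SELECT * FROM another_small_table;

-- Re-check the plan against the variant and look at the join operator
-- reported in the output.
EXPLAIN
SELECT *
FROM big_table big
JOIN small_table_by_key small ON big.id = small.id;
```

If the plan still shows a hash join, the unsorted column from the first query is the next thing to look at, since the quoted documentation also requires that less than 20 percent of the joining tables be unsorted.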

sql

join

query-performance

amazon-redshift