ClickHouse join with condition

I found something strange. The query:

SELECT *
FROM progress as pp
ALL LEFT JOIN links as ll USING (viewId)
WHERE viewId = 'a776a2f2-16ad-448a-858d-891e68bec9a8' 

Result: 0 rows in set. Elapsed: 5.267 sec. Processed 8.62 million rows, 484.94 MB (1.64 million rows/s., 92.08 MB/s.)

Here is the modified query:

SELECT *
FROM
  (SELECT *
   FROM progress
   WHERE viewId = 'a776a2f2-16ad-448a-858d-891e68bec9a8') AS p ALL
LEFT JOIN
  (SELECT *
   FROM links
   WHERE viewId = toUUID('a776a2f2-16ad-448a-858d-891e68bec9a8')) AS l ON p.viewId = l.viewId;

Result: 0 rows in set. Elapsed: 0.076 sec. Processed 4.48 million rows, 161.35 MB (58.69 million rows/s., 2.12 GB/s.)

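But it looks dirty.

Isn't it supposed to optimize the query considering the WHERE condition?

What is the right way to write the query here? And what if the condition is a WHERE ... IN?

Then I tried adding another join: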

SELECT *
FROM
  (SELECT videoUuid AS contentUuid,
          viewId
   FROM
     (SELECT *
      FROM progress
      WHERE viewId = 'a776a2f2-16ad-448a-858d-891e68bec9a8') p ALL
   LEFT JOIN
     (SELECT *
      FROM links
      WHERE viewId = toUUID('a776a2f2-16ad-448a-858d-891e68bec9a8')) USING `viewId`) ALL
LEFT JOIN `metaInfo` USING `viewId`,
                           `contentUuid`;

It is slow again, considering that I only want to join 3 tables and the condition selects a single row:

0 rows in set. Elapsed: 1.747 sec. Processed 9.13 million rows, 726.55 MB (5.22 million rows/s., 415.85 MB/s.)

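At the moment CH does not handle multi-join queries (DB star schema) very well, and the query optimizer is not good enough to rely on completely.

So you need to state explicitly how to 'execute' the query, by using subqueries instead of joins.

Consider this test query: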

SELECT table_01.number AS r
FROM numbers(87654321) AS table_01
  INNER JOIN numbers(7654321) AS table_02 ON (table_01.number = table_02.number)
  INNER JOIN numbers(654321) AS table_03 ON (table_02.number = table_03.number)
  INNER JOIN numbers(54321) AS table_04 ON (table_03.number = table_04.number)
WHERE r = 54320
/*
┌─────r─┐
│ 54320 │
└───────┘

1 rows in set. Elapsed: 6.261 sec. Processed 96.06 million rows, 768.52 MB (15.34 million rows/s., 122.74 MB/s.)
*/

Let's rewrite it using subqueries to speed it up significantly.

SELECT number AS r
FROM numbers(87654321)
WHERE r = 54320 AND number IN (
  SELECT number AS r
  FROM numbers(7654321)
  WHERE r = 54320 AND number IN (
    SELECT number AS r
    FROM numbers(654321)
    WHERE r = 54320 AND number IN (
      SELECT number AS r
      FROM numbers(54321)
      WHERE r = 54320
    )
  )
)
/*
┌─────r─┐
│ 54320 │
└───────┘

1 rows in set. Elapsed: 0.481 sec. Processed 96.06 million rows, 768.52 MB (199.69 million rows/s., 1.60 GB/s.)
*/
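For the `WHERE ... IN` case from the question, the same idea might look roughly like the sketch below: the filter is written by hand into each side of the join. This is only an illustration under assumptions. The table and column names come from the question, the mixed use of a plain string literal for `progress` and `toUUID` for `links` simply mirrors the asker's own second query (the column types are not shown), and the second UUID is a made-up placeholder.

```sql
-- Sketch only: filter both sides by hand before joining.
-- The second UUID is a hypothetical placeholder value.
SELECT *
FROM
(
    SELECT *
    FROM progress
    WHERE viewId IN ('a776a2f2-16ad-448a-858d-891e68bec9a8',
                     '00000000-0000-0000-0000-000000000000')
) AS p
ALL LEFT JOIN
(
    SELECT *
    FROM links
    WHERE viewId IN (toUUID('a776a2f2-16ad-448a-858d-891e68bec9a8'),
                     toUUID('00000000-0000-0000-0000-000000000000'))
) AS l ON p.viewId = l.viewId;
```

The list of IDs has to be repeated on both sides, which is not pretty, but it keeps the right-hand side of the join small before the join itself is executed.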

Some other ways to optimize JOIN:


Some useful references:

Altinity webinar: Tips and tricks every ClickHouse user should know

Altinity webinar: Secrets of ClickHouse Query Performance

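Isn't it supposed to optimize the query considering the WHERE condition?

Such an optimization is not implemented yet.

This is expected behavior. According to the CH doc https://clickhouse.tech/docs/en/sql-reference/statements/select/join/#performance : "When running a JOIN, there is no optimization of the order of execution in relation to other stages of the query. The join (a search in the right table) is run before filtering in WHERE and before aggregation."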
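A small way to see what that quote means in practice, using only the built-in `numbers()` table function. The sizes and the aliases `l` and `r` are arbitrary choices for illustration. Both queries return the same single row; the difference is only where the filter is applied relative to the join.

```sql
-- Filter written after the join: per the doc quote above, the right side
-- is processed by the join before the WHERE filter is applied.
SELECT l.number
FROM numbers(1000000) AS l
INNER JOIN numbers(1000000) AS r ON l.number = r.number
WHERE l.number = 54320;

-- Filter pushed into both sides by hand: each subquery yields one row,
-- so the join only has to match a single row against a single row.
SELECT l.number
FROM
(
    SELECT number FROM numbers(1000000) WHERE number = 54320
) AS l
INNER JOIN
(
    SELECT number FROM numbers(1000000) WHERE number = 54320
) AS r ON l.number = r.number;
```

Comparing the "Processed ... rows" and "Elapsed" lines that clickhouse-client prints for each variant should show whether the manual pushdown helps on a given server version.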