Multiple left joins with aggregation on the same table cause a huge performance hit in SAP HANA

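I am joining two tables on HANA and, to get some statistics, I am also left joining the items table 3 times to get the total count, the number of processed entries, and the number of errors, as shown below.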

This is a dev system and the items table has only 1500 items. Even so, the query below runs for 17 seconds.

When I remove any one of the three aggregation terms (but leave the corresponding JOIN in place), the query executes almost immediately.

I have also tried adding indexes on the fields used in the specific JOINs, but that makes no difference.

select rk.guid, rk.run_id, rk.status, rk.created_at, rk.created_by, 
count( distinct rp.guid ), 
count( distinct rp2.guid ), 
count( distinct rp3.guid )
    from zbsbpi_rk as rk
    left join zbsbpi_rp as rp
      on rp.header = rk.guid
    left join zbsbpi_rp as rp2
      on rp2.header = rk.guid
     and rp2.processed = 'X'
    left join zbsbpi_rp as rp3
      on rp3.header = rk.guid
     and rp3.result_status = 'E'
    where rk.run_id = '0000000010'
    group by rk.guid, run_id, status, created_at, created_by

I think you can rewrite the query to improve performance:

select rk.guid, rk.run_id, rk.status, rk.created_at, rk.created_by, 
count( distinct rp.guid ), 
count( distinct (CASE WHEN rp.processed = 'X' then rp.guid else null end) ), 
count( distinct (CASE WHEN rp.result_status = 'E' then rp.guid else null end))
    from zbsbpi_rk as rk
    left join zbsbpi_rp as rp
      on rp.header = rk.guid
    where rk.run_id = '0000000010'
    group by rk.guid, run_id, status, created_at, created_by

I am not completely sure whether the count distinct case construct works on HANA, but you could give it a try.
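One way to gain some confidence in the rewrite above before trying it on HANA is to compare both queries on a tiny data set. The following is only a sketch: it uses Python's built-in sqlite3 module as a stand-in for HANA (so it says nothing about HANA's syntax support or performance), the table layout is reduced to the columns the queries touch, and the sample rows (`H1`, `H2`, `I1` to `I4`) are made up for illustration.

```python
import sqlite3

# SQLite in-memory database used as a stand-in for HANA (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table zbsbpi_rk (guid text primary key, run_id text);
    create table zbsbpi_rp (guid text primary key, header text,
                            processed text, result_status text);
    -- Hypothetical sample data: H1 has four items, H2 has none.
    insert into zbsbpi_rk values ('H1', '0000000010'), ('H2', '0000000010');
    insert into zbsbpi_rp values
        ('I1', 'H1', 'X', 'E'),
        ('I2', 'H1', 'X', ''),
        ('I3', 'H1', '',  ''),
        ('I4', 'H1', 'X', 'E');
""")

# Shape of the query from the question: three left joins on the header.
original = conn.execute("""
    select rk.guid,
           count(distinct rp.guid),
           count(distinct rp2.guid),
           count(distinct rp3.guid)
      from zbsbpi_rk as rk
      left join zbsbpi_rp as rp
        on rp.header = rk.guid
      left join zbsbpi_rp as rp2
        on rp2.header = rk.guid and rp2.processed = 'X'
      left join zbsbpi_rp as rp3
        on rp3.header = rk.guid and rp3.result_status = 'E'
     where rk.run_id = '0000000010'
     group by rk.guid
     order by rk.guid
""").fetchall()

# Shape of the suggested rewrite: one left join plus conditional counts.
rewritten = conn.execute("""
    select rk.guid,
           count(distinct rp.guid),
           count(distinct (case when rp.processed = 'X' then rp.guid else null end)),
           count(distinct (case when rp.result_status = 'E' then rp.guid else null end))
      from zbsbpi_rk as rk
      left join zbsbpi_rp as rp
        on rp.header = rk.guid
     where rk.run_id = '0000000010'
     group by rk.guid
     order by rk.guid
""").fetchall()

print(original)   # [('H1', 4, 3, 2), ('H2', 0, 0, 0)]
print(rewritten)  # [('H1', 4, 3, 2), ('H2', 0, 0, 0)]
```

On this sample both forms return the same counts, including zeros for the header without items, because count ignores the NULL that the CASE expression yields for non-matching rows.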

Sorry, I forgot that I had already posted this question here. After not getting any joy here, I posted the same question on answers.sap.com: https://answers.sap.com/questions/172096/multiple-left-joins-with-aggregation-on-same-table.html

I eventually came up with the solution, which was a bit of a "doh!" moment:

  select rk.guid, rk.run_id, rk.status, rk.created_at, rk.created_by,
    count( distinct rp.guid ), 
    count( distinct rp2.guid ), 
    count( distinct rp3.guid )
    from zbsbpi_rk as rk
    join zbsbpi_rp as rp
      on rp.header = rk.guid
    left join zbsbpi_rp as rp2
      on rp2.guid = rp.guid
     and rp2.processed = 'X'
    left join zbsbpi_rp as rp3
      on rp3.guid = rp.guid
     and rp3.result_status = 'E'
    where rk.run_id = '0000000010'
    group by rk.guid, run_id, status, created_at, created_by

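The subsequent left joins only need to be joined to the first join on the same table, since the first join contains a superset of all the records anyway.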
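A likely reason for the difference is row fan-out: when all three joins go through the header, every item of a header is paired with every processed item and every error item of that header before the aggregation runs, whereas joining rp2 and rp3 on the item guid matches at most one row each. The sketch below counts the joined rows for both shapes. It is an illustration under assumptions, not a measurement of HANA: it uses Python's sqlite3 module, a single made-up header `H1`, and the worst case where every item is both processed and in error. The helper name `intermediate_rows` is hypothetical.

```python
import sqlite3

def intermediate_rows(n_items):
    """Return (rows when joining on header, rows when joining on item guid)."""
    conn = sqlite3.connect(":memory:")  # SQLite as a stand-in for HANA (assumption)
    conn.execute("create table zbsbpi_rk (guid text primary key, run_id text)")
    conn.execute("create table zbsbpi_rp (guid text primary key, header text, "
                 "processed text, result_status text)")
    conn.execute("insert into zbsbpi_rk values ('H1', '0000000010')")
    # Worst case for illustration: every item is processed and has an error.
    conn.executemany("insert into zbsbpi_rp values (?, 'H1', 'X', 'E')",
                     [(f"I{i}",) for i in range(n_items)])

    # Join shape from the question: rp2 and rp3 are joined on the header.
    by_header = conn.execute("""
        select count(*)
          from zbsbpi_rk as rk
          left join zbsbpi_rp as rp  on rp.header = rk.guid
          left join zbsbpi_rp as rp2 on rp2.header = rk.guid and rp2.processed = 'X'
          left join zbsbpi_rp as rp3 on rp3.header = rk.guid and rp3.result_status = 'E'
         where rk.run_id = '0000000010'
    """).fetchone()[0]

    # Join shape from the solution: rp2 and rp3 are joined on the item guid.
    by_item = conn.execute("""
        select count(*)
          from zbsbpi_rk as rk
          join zbsbpi_rp as rp       on rp.header = rk.guid
          left join zbsbpi_rp as rp2 on rp2.guid = rp.guid and rp2.processed = 'X'
          left join zbsbpi_rp as rp3 on rp3.guid = rp.guid and rp3.result_status = 'E'
         where rk.run_id = '0000000010'
    """).fetchone()[0]

    conn.close()
    return by_header, by_item

print(intermediate_rows(20))  # (8000, 20)
```

With 20 items the header-based joins already produce 20 * 20 * 20 = 8000 rows against 20 for the guid-based joins, so the gap grows cubically with the number of items per header. One thing to be aware of: the solution uses an inner join for rp, so a header without any items drops out of the result. If those rows are needed, the first join can presumably stay a left join, since rp2 and rp3 simply find no match when rp.guid is NULL.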