Multiple left joins with aggregation on the same table cause a huge performance hit in SAP HANA
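I am joining two tables on HANA and, to get some statistics, I left join the items table three times to get the total count, the number of processed entries, and the number of errors, as shown below.
This is a development system, and the items table has only 1,500 entries. Yet the query below takes 17 seconds to run.
When I remove any one of the three aggregation terms (but keep the corresponding JOIN), the query executes almost instantly.
I have also tried adding indexes on the fields used in the specific JOINs, but it makes no difference.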
select rk.guid, rk.run_id, rk.status, rk.created_at, rk.created_by,
count( distinct rp.guid ),
count( distinct rp2.guid ),
count( distinct rp3.guid )
from zbsbpi_rk as rk
left join zbsbpi_rp as rp
on rp.header = rk.guid
left join zbsbpi_rp as rp2
on rp2.header = rk.guid
and rp2.processed = 'X'
left join zbsbpi_rp as rp3
on rp3.header = rk.guid
and rp3.result_status = 'E'
where rk.run_id = '0000000010'
group by rk.guid, run_id, status, created_at, created_by
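A plausible explanation for the slowdown (an assumption, not something measured on HANA in this thread): all three joins match on header only, so for each header the engine may have to build every combination of item, processed item, and error item before count(distinct ...) collapses them again. If most of the 1,500 items sat under one run, that would be up to 1500 x 1500 x 1500, roughly 3.4 billion intermediate rows. The sketch below uses Python's sqlite3 module as a stand-in for HANA; the table and column names come from the question, while the sample data and the helper joined_row_count are hypothetical.

```python
import sqlite3

def joined_row_count(n_items, n_processed, n_errors):
    """Rows produced by the question's three joins for one header, before grouping."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        create table zbsbpi_rk (guid text primary key, run_id text);
        create table zbsbpi_rp (guid text primary key, header text,
                                processed text, result_status text);
        insert into zbsbpi_rk values ('H1', '0000000010');
    """)
    # Hypothetical sample data: the first n_processed items are processed,
    # the first n_errors items ended in error.
    con.executemany(
        "insert into zbsbpi_rp values (?, ?, ?, ?)",
        [(f"I{i}", "H1",
          "X" if i < n_processed else "",
          "E" if i < n_errors else "S") for i in range(n_items)],
    )
    # Same join conditions as the question, but counting raw joined rows.
    (count,) = con.execute("""
        select count(*)
        from zbsbpi_rk as rk
        left join zbsbpi_rp as rp
          on rp.header = rk.guid
        left join zbsbpi_rp as rp2
          on rp2.header = rk.guid and rp2.processed = 'X'
        left join zbsbpi_rp as rp3
          on rp3.header = rk.guid and rp3.result_status = 'E'
        where rk.run_id = '0000000010'
    """).fetchone()
    con.close()
    return count

print(joined_row_count(20, 10, 5))  # 1000, i.e. 20 * 10 * 5 combinations
```

This would also fit the observation that dropping one aggregate makes the query fast: an optimizer is free to skip a left join whose columns are never used, which removes one factor from the product. That part is a guess about HANA's behaviour, not a confirmed fact.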
I think you can rewrite the query to improve performance:
select rk.guid, rk.run_id, rk.status, rk.created_at, rk.created_by,
count( distinct rp.guid ),
count( distinct (CASE WHEN rp.processed = 'X' then rp.guid else null end) ),
count( distinct (CASE WHEN rp.result_status = 'E' then rp.guid else null end))
from zbsbpi_rk as rk
left join zbsbpi_rp as rp
on rp.header = rk.guid
where rk.run_id = '0000000010'
group by rk.guid, run_id, status, created_at, created_by
I'm not entirely sure whether the count distinct case construct works on HANA, but you can give it a try.
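As a quick sanity check of the idea, here is a minimal sketch that runs the original three-join query and the count(distinct case ...) rewrite side by side and compares the results. It uses Python's sqlite3 module rather than HANA, so it only shows that the two formulations are logically equivalent, not that HANA accepts the syntax or runs it faster. The sample rows and the helper name run are hypothetical, and the select list is trimmed to guid plus the three counts.

```python
import sqlite3

SETUP = """
create table zbsbpi_rk (guid text primary key, run_id text);
create table zbsbpi_rp (guid text primary key, header text,
                        processed text, result_status text);
-- Hypothetical data: H1 has four items, H2 has none, H3 is another run.
insert into zbsbpi_rk values ('H1', '0000000010'), ('H2', '0000000010'),
                             ('H3', '0000000011');
insert into zbsbpi_rp values
  ('I1', 'H1', 'X', 'E'), ('I2', 'H1', 'X', 'S'),
  ('I3', 'H1', '',  'S'), ('I4', 'H1', '',  'S'),
  ('I5', 'H3', 'X', 'E');
"""

THREE_JOINS = """
select rk.guid, count(distinct rp.guid),
       count(distinct rp2.guid), count(distinct rp3.guid)
from zbsbpi_rk as rk
left join zbsbpi_rp as rp  on rp.header  = rk.guid
left join zbsbpi_rp as rp2 on rp2.header = rk.guid and rp2.processed = 'X'
left join zbsbpi_rp as rp3 on rp3.header = rk.guid and rp3.result_status = 'E'
where rk.run_id = '0000000010'
group by rk.guid order by rk.guid
"""

CASE_COUNTS = """
select rk.guid, count(distinct rp.guid),
       count(distinct case when rp.processed = 'X' then rp.guid else null end),
       count(distinct case when rp.result_status = 'E' then rp.guid else null end)
from zbsbpi_rk as rk
left join zbsbpi_rp as rp on rp.header = rk.guid
where rk.run_id = '0000000010'
group by rk.guid order by rk.guid
"""

def run(query):
    """Run one query against a fresh in-memory copy of the sample data."""
    con = sqlite3.connect(":memory:")
    con.executescript(SETUP)
    rows = con.execute(query).fetchall()
    con.close()
    return rows

three_joins = run(THREE_JOINS)
case_counts = run(CASE_COUNTS)
print(three_joins == case_counts)  # True
print(case_counts)                 # [('H1', 4, 2, 1), ('H2', 0, 0, 0)]
```

The rewrite reads the items table once per header, and a header without items still shows up with zero counts because the single left join is kept.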
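Sorry, I forgot I had posted this question here. After getting no joy here, I posted the same question on answers.sap.com: https://answers.sap.com/questions/172096/multiple-left-joins-with-aggregation-on-same-table.html
I eventually figured out the solution, which was a bit of a "doh!" moment: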
select rk.guid, rk.run_id, rk.status, rk.created_at, rk.created_by,
count( distinct rp.guid ),
count( distinct rp2.guid ),
count( distinct rp3.guid )
from zbsbpi_rk as rk
join zbsbpi_rp as rp
on rp.header = rk.guid
left join zbsbpi_rp as rp2
on rp2.guid = rp.guid
and rp2.processed = 'X'
left join zbsbpi_rp as rp3
on rp3.guid = rp.guid
and rp3.result_status = 'E'
where rk.run_id = '0000000010'
group by rk.guid, run_id, status, created_at, created_by
The subsequent left joins only need to join to the first join on the same table, since the first join already contains a superset of all the records.
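One detail worth checking before adopting this version: the first join is now an inner join, so a header with no items drops out of the result instead of appearing with zero counts. The sketch below, again using Python's sqlite3 module as a stand-in for HANA, runs the query once with join and once with left join as the first join. It assumes that guid is unique in zbsbpi_rp, which the query itself seems to rely on; the sample rows and the helper counts are hypothetical, and the select list is trimmed to guid plus the three counts.

```python
import sqlite3

SETUP = """
create table zbsbpi_rk (guid text primary key, run_id text);
create table zbsbpi_rp (guid text primary key, header text,
                        processed text, result_status text);
-- Hypothetical data: H1 has four items, H2 has none.
insert into zbsbpi_rk values ('H1', '0000000010'), ('H2', '0000000010');
insert into zbsbpi_rp values
  ('I1', 'H1', 'X', 'E'), ('I2', 'H1', 'X', 'S'),
  ('I3', 'H1', '',  'S'), ('I4', 'H1', '',  'S');
"""

# {first_join} is filled with either "join" (as posted) or "left join".
QUERY = """
select rk.guid, count(distinct rp.guid),
       count(distinct rp2.guid), count(distinct rp3.guid)
from zbsbpi_rk as rk
{first_join} zbsbpi_rp as rp
  on rp.header = rk.guid
left join zbsbpi_rp as rp2
  on rp2.guid = rp.guid and rp2.processed = 'X'
left join zbsbpi_rp as rp3
  on rp3.guid = rp.guid and rp3.result_status = 'E'
where rk.run_id = '0000000010'
group by rk.guid order by rk.guid
"""

def counts(first_join):
    """Run the guid-based self-join with the given first join type."""
    con = sqlite3.connect(":memory:")
    con.executescript(SETUP)
    rows = con.execute(QUERY.format(first_join=first_join)).fetchall()
    con.close()
    return rows

inner = counts("join")
outer = counts("left join")
print(inner)  # [('H1', 4, 2, 1)]
print(outer)  # [('H1', 4, 2, 1), ('H2', 0, 0, 0)]
```

Because rp2 and rp3 match on the unique guid, each item row picks up at most one partner from each join, so the intermediate result stays at one row per item rather than one row per combination. Whether to use join or left join for the first join depends on whether headers without items should appear in the output.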