Getting rows with count > 1 without executing two different queries

I have a table that looks like this:

ITE_ITEM_ID  SIT_SITE_ID  INVENTORY_ID  ITE_VARIATION_ID  STOCK_EAN  STOCK_FULFILLMENT_ACTIVE
16302514     B            WAE6496       62101793519       79098642   1
6210113      M            GKVU072       [object Object]   NULL       1
16021657     M            YHQG635       60513515602       NULL       1
8449326      A            ZZRV751       52555136800       NULL       1
1154338160   B            VXWP565       NULL              NULL       0
559568       M            GYPZ201       32325593678       NULL       1
13255753     B            PH63355       70388916917       NULL       1
7614543      M            XOQO412       51698700618       NULL       1

I am trying to get the different ITE_VARIATION_ID values that share the same STOCK_EAN, for every STOCK_EAN that occurs more than once (count > 1).

I am currently doing this by splitting the work into two statements:

create multiset volatile table eans_plus as (
SELECT STOCK_EAN, count(*) as count_ean
FROM my_table
group by 1
having count_ean > 1
) with data primary index (stock_ean) on commit preserve rows;

SELECT a.STOCK_EAN, a.ITE_VARIATION_ID
from my_table a inner join eans_plus b on a.stock_ean = b.stock_ean
;

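But this takes a while to execute (>140 seconds) because the table is very large. I would like to know whether there is a more efficient way to do this, either by avoiding two queries or by adding an index. I am using Alation on Teradata.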

You can easily combine these into a single query, but the performance gain from doing so is likely to be very small:

WITH eans_plus AS (
SELECT STOCK_EAN, count(*) as count_ean
FROM my_table
WHERE STOCK_EAN IS NOT NULL
group by 1
having count_ean > 1
)
SELECT a.STOCK_EAN, a.ITE_VARIATION_ID
from my_table a inner join eans_plus b on a.stock_ean = b.stock_ean
;

You can also use a window function with QUALIFY; I am not sure whether this would be any faster:

SELECT STOCK_EAN, ITE_VARIATION_ID
FROM my_table a
WHERE STOCK_EAN IS NOT NULL
QUALIFY COUNT(*) OVER (PARTITION BY STOCK_EAN) > 1;
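
If you want to check that the join form and the window form return the same rows before timing them on the real table, one option is to run both on a toy table. The sketch below uses Python's built-in sqlite3 module purely as a stand-in for Teradata. SQLite has no QUALIFY, so the window count is computed in a derived table and filtered in the outer query. The sample rows and the helper names (make_db, run) are made up for illustration, and window functions need SQLite 3.25 or later.

```python
import sqlite3

# Hypothetical sample rows: (ITE_VARIATION_ID, STOCK_EAN)
ROWS = [
    (101, "111"),
    (102, "111"),
    (103, "222"),   # EAN that occurs only once
    (104, None),    # NULL EANs are filtered out
    (105, None),
    (106, "333"),
    (107, "333"),
    (108, "333"),
]

# Single-query version of the two-step approach (CTE + join).
JOIN_SQL = """
WITH eans_plus AS (
    SELECT STOCK_EAN, COUNT(*) AS count_ean
    FROM my_table
    WHERE STOCK_EAN IS NOT NULL
    GROUP BY STOCK_EAN
    HAVING COUNT(*) > 1
)
SELECT a.STOCK_EAN, a.ITE_VARIATION_ID
FROM my_table a
INNER JOIN eans_plus b ON a.STOCK_EAN = b.STOCK_EAN
"""

# Window-function version; the outer WHERE plays the role of QUALIFY.
WINDOW_SQL = """
SELECT STOCK_EAN, ITE_VARIATION_ID
FROM (
    SELECT STOCK_EAN, ITE_VARIATION_ID,
           COUNT(*) OVER (PARTITION BY STOCK_EAN) AS count_ean
    FROM my_table
    WHERE STOCK_EAN IS NOT NULL
) AS counted
WHERE count_ean > 1
"""

def make_db():
    """Create an in-memory database holding the sample rows."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE my_table (ITE_VARIATION_ID INTEGER, STOCK_EAN TEXT)")
    con.executemany("INSERT INTO my_table VALUES (?, ?)", ROWS)
    return con

def run(sql):
    """Run a query against a fresh copy of the sample data, sorted for comparison."""
    con = make_db()
    try:
        return sorted(con.execute(sql).fetchall())
    finally:
        con.close()

join_rows = run(JOIN_SQL)
window_rows = run(WINDOW_SQL)
print(join_rows == window_rows)
```

This only confirms that the two forms agree on the result set; it says nothing about which one Teradata will execute faster on a large table.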

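Making STOCK_EAN the primary index of my_table (if it is not already) could improve these queries, but you need to understand how that would affect the table itself, for example its row distribution and the other queries that run against it.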

A single-table join index may improve the performance of these queries, although maintaining it will slow down updates to the table.

CREATE JOIN INDEX my_table_aji AS
SELECT STOCK_EAN, COUNT(*) as theCount FROM my_table
GROUP BY 1
PRIMARY INDEX (STOCK_EAN);

Edit: added the WHERE STOCK_EAN IS NOT NULL filter.

I am trying to get the different ITE_VARIATION_ID that share the same STOCK_EAN and have a count > 1

That sounds like an aggregation query to me:

select STOCK_EAN, ITE_VARIATION_ID, COUNT(*)
from t
where stock_ean is not null
group by STOCK_EAN, ITE_VARIATION_ID
qualify sum(count(*)) over (partition by stock_ean) > 1;
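
To see what this returns, here is a small sketch on made-up data, again using Python's sqlite3 module as a stand-in for Teradata. The QUALIFY over sum(count(*)) is emulated with nested derived tables, since SQLite has no QUALIFY. The sample rows and the helper name aggregated_rows are assumptions for illustration only. Note that the condition tests the total number of rows per STOCK_EAN, so an EAN whose rows all carry the same ITE_VARIATION_ID still qualifies.

```python
import sqlite3

# Hypothetical sample rows: (ITE_VARIATION_ID, STOCK_EAN)
ROWS = [
    (101, "111"),
    (101, "111"),   # same variation repeated under the same EAN
    (102, "111"),
    (103, "222"),   # EAN with a single row: filtered out
    (104, "333"),
    (104, "333"),   # two rows, but only one distinct variation
    (105, None),    # NULL EAN: filtered out
]

AGG_SQL = """
SELECT STOCK_EAN, ITE_VARIATION_ID, cnt
FROM (
    SELECT STOCK_EAN, ITE_VARIATION_ID, cnt,
           SUM(cnt) OVER (PARTITION BY STOCK_EAN) AS ean_total
    FROM (
        SELECT STOCK_EAN, ITE_VARIATION_ID, COUNT(*) AS cnt
        FROM t
        WHERE STOCK_EAN IS NOT NULL
        GROUP BY STOCK_EAN, ITE_VARIATION_ID
    ) AS grouped
) AS totals
WHERE ean_total > 1
ORDER BY STOCK_EAN, ITE_VARIATION_ID
"""

def aggregated_rows():
    """Load the sample rows and return one row per (EAN, variation) pair."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute("CREATE TABLE t (ITE_VARIATION_ID INTEGER, STOCK_EAN TEXT)")
        con.executemany("INSERT INTO t VALUES (?, ?)", ROWS)
        return con.execute(AGG_SQL).fetchall()
    finally:
        con.close()

result = aggregated_rows()
print(result)
```

Compared with the join and window versions, each (STOCK_EAN, ITE_VARIATION_ID) pair appears once with its row count instead of once per underlying row.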

Filtering out the NULL values may help performance.