Getting rows with count > 1 without executing two different queries
I have a table that looks like this:
ITE_ITEM_ID | SIT_SITE_ID | INVENTORY_ID | ITE_VARIATION_ID | STOCK_EAN | STOCK_FULFILLMENT_ACTIVE |
---|---|---|---|---|---|
16302514 | B | WAE6496 | 62101793519 | 79098642 | 1 |
6210113 | M | GKVU072 | [object Object] | NULL | 1 |
16021657 | M | YHQG635 | 60513515602 | NULL | 1 |
8449326 | A | ZZRV751 | 52555136800 | NULL | 1 |
1154338160 | B | VXWP565 | NULL | NULL | 0 |
559568 | M | GYPZ201 | 32325593678 | NULL | 1 |
13255753 | B | PH63355 | 70388916917 | NULL | 1 |
7614543 | M | XOQO412 | 51698700618 | NULL | 1 |
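I am trying to get the different ITE_VARIATION_ID values that share the same STOCK_EAN and have a count > 1.
I currently do this by splitting the work into two statements: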
create multiset volatile table eans_plus as (
SELECT STOCK_EAN, count(*) as count_ean
FROM my_table
group by 1
having count_ean > 1
) with data primary index (stock_ean) on commit preserve rows;
SELECT a.STOCK_EAN, a.ITE_VARIATION_ID
from my_table a inner join eans_plus b on a.stock_ean = b.stock_ean
;
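But this takes a while to execute (>140 seconds) because the table is very large. I would like to know whether there is a more efficient way to do this, either by avoiding two queries or by adding an index. I am using Alation on Teradata.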
You can easily combine this into a single query, but the performance improvement from doing so is likely to be very small:
WITH eans_plus AS (
SELECT STOCK_EAN, count(*) as count_ean
FROM my_table
WHERE STOCK_EAN IS NOT NULL
group by 1
having count_ean > 1
)
SELECT a.STOCK_EAN, a.ITE_VARIATION_ID
from my_table a inner join eans_plus b on a.stock_ean = b.stock_ean
;
You could also use a window function with QUALIFY; I am not sure whether this would be any faster:
SELECT STOCK_EAN, ITE_VARIATION_ID
FROM my_table a
WHERE STOCK_EAN IS NOT NULL
QUALIFY COUNT(*) OVER (PARTITION BY STOCK_EAN) > 1;
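To make the behaviour of the QUALIFY version concrete, here is a small sketch on made-up data. The table name demo_items and all of its values are hypothetical and only serve the illustration; the column types are assumptions as well.

```sql
-- Hypothetical demo table; names, types and values are assumptions.
CREATE MULTISET VOLATILE TABLE demo_items (
    ITE_VARIATION_ID BIGINT,
    STOCK_EAN        BIGINT
) PRIMARY INDEX (STOCK_EAN) ON COMMIT PRESERVE ROWS;

INSERT INTO demo_items VALUES (101, 79098642);
INSERT INTO demo_items VALUES (102, 79098642);
INSERT INTO demo_items VALUES (103, 11111111);
INSERT INTO demo_items VALUES (104, NULL);
INSERT INTO demo_items VALUES (105, NULL);

-- The window count is computed per STOCK_EAN after the WHERE filter,
-- so NULL EANs never form a partition and single-row EANs are dropped.
SELECT STOCK_EAN, ITE_VARIATION_ID
FROM demo_items
WHERE STOCK_EAN IS NOT NULL
QUALIFY COUNT(*) OVER (PARTITION BY STOCK_EAN) > 1;
-- Returns the two rows for STOCK_EAN 79098642 (variation IDs 101 and 102).
```

The table is read only once here, which is the main difference from the join-based versions.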
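Making STOCK_EAN the primary index of my_table (if it isn't already) could improve these queries, but you need to understand how that would affect the table, since the primary index determines how rows are distributed across AMPs.
A single-table join index might also improve the performance of these queries, although it adds overhead to table maintenance.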
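Before changing the primary index, it may be worth checking how rows would be distributed if STOCK_EAN were the hashing column. The following is a minimal sketch using Teradata's hash functions; it assumes the table is named my_table as in the question, and the aliases amp_no and row_count are just illustrative.

```sql
-- Estimate rows per AMP if STOCK_EAN were the primary index.
-- All NULL STOCK_EAN values hash to the same AMP, so a column that is
-- mostly NULL (as in the sample data) would likely show heavy skew.
SELECT HASHAMP(HASHBUCKET(HASHROW(STOCK_EAN))) AS amp_no,
       COUNT(*) AS row_count
FROM my_table
GROUP BY 1
ORDER BY 2 DESC;
```

If one AMP holds far more rows than the others, making STOCK_EAN the primary index could hurt more than it helps.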
CREATE JOIN INDEX my_table_aji AS
SELECT STOCK_EAN, COUNT(*) as theCount FROM my_table
GROUP BY 1
PRIMARY INDEX (STOCK_EAN);
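Whether the optimizer actually uses the join index can be checked with EXPLAIN. This is only a sketch, assuming the index above was created as my_table_aji; the plan text differs between systems and releases.

```sql
-- If the aggregate join index covers the query, the plan text should
-- mention my_table_aji instead of a full scan of my_table.
EXPLAIN
SELECT STOCK_EAN, COUNT(*) AS count_ean
FROM my_table
GROUP BY 1
HAVING COUNT(*) > 1;
```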
Edit: added the WHERE STOCK_EAN IS NOT NULL filter.
I am trying to get the different ITE_VARIATION_ID that share the same STOCK_EAN and have a count > 1
This sounds like an aggregation query to me:
select STOCK_EAN, ITE_VARIATION_ID, COUNT(*)
from t
where stock_ean is not null
group by STOCK_EAN, ITE_VARIATION_ID
qualify sum(count(*)) over (partition by stock_ean) > 1;
Filtering out the NULL values may help performance.
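If "count > 1" is meant to count distinct ITE_VARIATION_ID values per STOCK_EAN rather than rows, one possible variation is to count the groups instead of summing the row counts. This is a sketch under that assumption, using the same placeholder table name t; the alias row_count is illustrative and the query has not been run against the asker's data.

```sql
-- Assumption: an EAN qualifies only when it has more than one distinct
-- ITE_VARIATION_ID. After GROUP BY, each (STOCK_EAN, ITE_VARIATION_ID)
-- pair is a single row, so the window COUNT(*) counts pairs per EAN.
SELECT STOCK_EAN, ITE_VARIATION_ID, COUNT(*) AS row_count
FROM t
WHERE STOCK_EAN IS NOT NULL
GROUP BY STOCK_EAN, ITE_VARIATION_ID
QUALIFY COUNT(*) OVER (PARTITION BY STOCK_EAN) > 1;
```

Note that a NULL ITE_VARIATION_ID forms its own group and therefore counts as one variation here.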