在 DB2 中执行 sql 查询时的复杂性和时间消耗
Complexity and Time consumption while executing the sql query in DB2
我有一个 table 有超过 2 亿条记录,并试图在单个查询中获得结果,执行以下查询大约需要 1.5 小时。
我的table数据:
+---------+------------+-----+-----+-----+
| cust_id | product_id | p_a | p_b | p_c |
+---------+------------+-----+-----+-----+
| 1 | a | 1 | 0 | 0 |
| 1 | b | 0 | 2 | 0 |
| 1 | a | 1 | 0 | 0 |
| 1 | c | 0 | 0 | 3 |
| 2 | a | 1 | 0 | 0 |
| 2 | c | 0 | 0 | 3 |
| 2 | c | 0 | 0 | 3 |
| 3 | b | 0 | 1 | 0 |
+---------+------------+-----+-----+-----+
我需要以下结果:
+----------------+-----+-----+-----+------+-------+
| No.ofcustomers | p_a | p_b | p_c | P_ab | P_abc |
+----------------+-----+-----+-----+------+-------+
| 3 | 0 | 1 | 0 | 0 | 1 |
+----------------+-----+-----+-----+------+-------+
p_areturns只购买产品a的客户数。同样,p_b、p_c、p_ab、p_abc.
到目前为止,我已经编写了以下查询并给出了输出。但是,有什么办法可以减少执行查询所花费的时间。我的意思是 1.5 小时对于任何查询来说都非常高。
WITH CTE AS
(
SELECT CUST_ID, SUM(P_A) AS PA, SUM(P_B) AS PB, SUM(P_C) AS PC
FROM TABLE
GROUP BY CUST_ID), CTE1 AS
(
SELECT COUNT(DISTINCT CUST_ID) AS NOOFCUSTOMERS,
COUNT(DISTINCT (CASE WHEN PA >= 1 AND PB = 0 AND PC = 0 THEN CUST_ID END)) AS P_A,
COUNT(DISTINCT (CASE WHEN PA = 0 AND PB >= 1 AND PC = 0 THEN CUST_ID END)) AS P_B,
COUNT(DISTINCT (CASE WHEN PA = 0 AND PB = 0 AND PC >= 1 THEN CUST_ID END)) AS P_C,
COUNT(DISTINCT (CASE WHEN PA >= 1 AND PB >= 1 AND PC = 0 THEN CUST_ID END)) AS P_AB,
COUNT(DISTINCT (CASE WHEN PA <> 0 AND PB <> 0 AND PC <> 0 THEN CUST_ID END)) AS P_ABC
FROM CTE)
SELECT *
FROM CTE1;
您可以将查询简化为:
WITH CTE AS (
SELECT CUST_ID, SUM(P_A) AS PA, SUM(P_B) AS PB, SUM(P_C) AS PC
FROM TABLE
GROUP BY CUST_ID
)
SELECT COUNT(*) AS NOOFCUSTOMERS,
SUM(CASE WHEN PA >= 1 AND PB = 0 AND PC = 0 THEN 1 ELSE 0 END) AS P_A,
SUM(CASE WHEN PA = 0 AND PB >= 1 AND PC = 0 THEN 1 ELSE 0 END) AS P_B,
SUM(CASE WHEN PA = 0 AND PB = 0 AND PC >= 1 THEN 1 ELSE 0 END) AS P_C,
SUM(CASE WHEN PA >= 1 AND PB >= 1 AND PC = 0 THEN 1 ELSE 0 END) AS P_AB,
SUM(CASE WHEN PA <> 0 AND PB <> 0 AND PC <> 0 THEN 1 ELSE 0 END) AS P_ABC
FROM CTE;
第一个 CTE 是按客户 ID 汇总的,因此这些已经是唯一的。 COUNT(DISTINCT)
比其他聚合函数更昂贵,因此这可能会影响性能。
我有一个 table 有超过 2 亿条记录,并试图在单个查询中获得结果,执行以下查询大约需要 1.5 小时。
我的table数据:
+---------+------------+-----+-----+-----+
| cust_id | product_id | p_a | p_b | p_c |
+---------+------------+-----+-----+-----+
| 1 | a | 1 | 0 | 0 |
| 1 | b | 0 | 2 | 0 |
| 1 | a | 1 | 0 | 0 |
| 1 | c | 0 | 0 | 3 |
| 2 | a | 1 | 0 | 0 |
| 2 | c | 0 | 0 | 3 |
| 2 | c | 0 | 0 | 3 |
| 3 | b | 0 | 1 | 0 |
+---------+------------+-----+-----+-----+
我需要以下结果:
+----------------+-----+-----+-----+------+-------+
| No.ofcustomers | p_a | p_b | p_c | P_ab | P_abc |
+----------------+-----+-----+-----+------+-------+
| 3 | 0 | 1 | 0 | 0 | 1 |
+----------------+-----+-----+-----+------+-------+
p_areturns只购买产品a的客户数。同样,p_b、p_c、p_ab、p_abc.
到目前为止,我已经编写了以下查询并给出了输出。但是,有什么办法可以减少执行查询所花费的时间。我的意思是 1.5 小时对于任何查询来说都非常高。
WITH CTE AS
(
SELECT CUST_ID, SUM(P_A) AS PA, SUM(P_B) AS PB, SUM(P_C) AS PC
FROM TABLE
GROUP BY CUST_ID), CTE1 AS
(
SELECT COUNT(DISTINCT CUST_ID) AS NOOFCUSTOMERS,
COUNT(DISTINCT (CASE WHEN PA >= 1 AND PB = 0 AND PC = 0 THEN CUST_ID END)) AS P_A,
COUNT(DISTINCT (CASE WHEN PA = 0 AND PB >= 1 AND PC = 0 THEN CUST_ID END)) AS P_B,
COUNT(DISTINCT (CASE WHEN PA = 0 AND PB = 0 AND PC >= 1 THEN CUST_ID END)) AS P_C,
COUNT(DISTINCT (CASE WHEN PA >= 1 AND PB >= 1 AND PC = 0 THEN CUST_ID END)) AS P_AB,
COUNT(DISTINCT (CASE WHEN PA <> 0 AND PB <> 0 AND PC <> 0 THEN CUST_ID END)) AS P_ABC
FROM CTE)
SELECT *
FROM CTE1;
您可以将查询简化为:
WITH CTE AS (
SELECT CUST_ID, SUM(P_A) AS PA, SUM(P_B) AS PB, SUM(P_C) AS PC
FROM TABLE
GROUP BY CUST_ID
)
SELECT COUNT(*) AS NOOFCUSTOMERS,
SUM(CASE WHEN PA >= 1 AND PB = 0 AND PC = 0 THEN 1 ELSE 0 END) AS P_A,
SUM(CASE WHEN PA = 0 AND PB >= 1 AND PC = 0 THEN 1 ELSE 0 END) AS P_B,
SUM(CASE WHEN PA = 0 AND PB = 0 AND PC >= 1 THEN 1 ELSE 0 END) AS P_C,
SUM(CASE WHEN PA >= 1 AND PB >= 1 AND PC = 0 THEN 1 ELSE 0 END) AS P_AB,
SUM(CASE WHEN PA <> 0 AND PB <> 0 AND PC <> 0 THEN 1 ELSE 0 END) AS P_ABC
FROM CTE;
第一个 CTE 是按客户 ID 汇总的,因此这些已经是唯一的。 COUNT(DISTINCT)
比其他聚合函数更昂贵,因此这可能会影响性能。