Count Distinct is less than Sum(Count Distinct)
I have two queries:
select COUNT(DISTINCT (CASE WHEN EVENT_NAME = 'event' THEN UPPER(user) END)) AS SIGNUP_COUNT
from table
WHERE date BETWEEN '2020-07-01' AND '2020-09-01'
and
with EVENTS_FILTERED_with_count as (
select *
, COUNT(DISTINCT (CASE WHEN EVENT_NAME = 'event' THEN UPPER(user) END)) AS SIGNUP_COUNT
from table
group by 1)
SELECT sum(SIGNUP_COUNT) FROM EVENTS_FILTERED_with_count
WHERE date BETWEEN '2020-07-01' AND '2020-09-01'
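The first query returns 2.5K rows as the result, and the second query returns 3K rows.
Why does adding the group by make the result larger? I wonder whether it has something to do with NULL values.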
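Because the same user has multiple events, that user can appear in more than one group. Each group's COUNT(DISTINCT ...) counts the user once, so summing the per-group counts counts the same user several times.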
It is hard to describe this more precisely without sample data.
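In the absence of real sample data, here is a minimal sketch of the effect using Python's built-in sqlite3 module. The table name `events`, its columns (`event_date`, `event_name`, `user_name`) and the four rows are all made up for illustration, and grouping by `event_date` is only an assumption about what `group by 1` resolves to in the original query. It also suggests that NULLs are probably not the cause: COUNT(DISTINCT ...) skips NULL in both queries.

```python
import sqlite3

# Hypothetical table and rows, invented purely for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE events (event_date TEXT, event_name TEXT, user_name TEXT)"
)
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?)",
    [
        ("2020-07-01", "event", "alice"),
        ("2020-07-02", "event", "Alice"),  # same user on a second day
        ("2020-07-02", "event", "bob"),
        ("2020-07-03", "other", "carol"),  # CASE yields NULL, which COUNT skips
    ],
)

# Same expression as in the question, with the hypothetical column names.
signup = "CASE WHEN event_name = 'event' THEN UPPER(user_name) END"

# One distinct count over all rows: every user is counted once.
overall = conn.execute(
    f"SELECT COUNT(DISTINCT {signup}) FROM events"
).fetchone()[0]

# Distinct count per group, then summed: a user who appears in two
# groups contributes 1 to each group, so 2 to the sum.
summed = conn.execute(
    f"""
    WITH per_day AS (
        SELECT event_date, COUNT(DISTINCT {signup}) AS signup_count
        FROM events
        GROUP BY event_date
    )
    SELECT SUM(signup_count) FROM per_day
    """
).fetchone()[0]

print(overall, summed)  # prints: 2 3
```

The per-day counts are 1, 2 and 0, so the sum is 3 even though only two distinct users signed up. If the goal is the number of distinct users in the date range, the distinct count has to be taken over the whole filtered set rather than summed across groups.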