Count distinct where attribute is either of keys

Question

Given this table:

create temp table stats (
  name text, country text, age integer
);

insert into stats values
('eric',  'se', 1),
('eric',  'dk', 4),
('johan', 'dk', 6),
('johan', 'uk', 7),
('johan', 'de', 3),
('dan',   'de', 3),
('dan',   'de', 3),
('dan',   'de', 4);

For each (country, age) key, I want the count of distinct names whose country or age matches that key.

country age count
se      1   1
de      3   2
de      4   3
dk      4   3
dk      6   2
uk      7   1

For example, for the key (dk, 4) there are 3 distinct names with country = dk (eric, johan) or age = 4 (eric, dan).
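To pin down the definition before comparing queries, here is a brute-force reference in plain Python. It is not one of the posted solutions, and the names ROWS and either_key_counts are hypothetical. For every distinct (country, age) pair it collects the set of names whose row matches either key and takes the size of that set.

```python
# Brute-force reference for the requested result (illustration only).
# ROWS mirrors the sample `stats` table from the question.
ROWS = [
    ("eric", "se", 1), ("eric", "dk", 4),
    ("johan", "dk", 6), ("johan", "uk", 7), ("johan", "de", 3),
    ("dan", "de", 3), ("dan", "de", 3), ("dan", "de", 4),
]


def either_key_counts(rows):
    """Map each distinct (country, age) pair to the number of distinct
    names that appear in a row matching that country OR that age."""
    result = {}
    for country, age in {(c, a) for _, c, a in rows}:
        names = {n for n, c, a in rows if c == country or a == age}
        result[(country, age)] = len(names)
    return result


for (country, age), cnt in sorted(either_key_counts(ROWS).items()):
    print(country, age, cnt)
```

This reproduces the expected table above; it is quadratic in the number of rows, so it is only useful as a check.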

So my question is: what is the best way to write this query?

I have this solution, but I find it ugly!

with country as (
 select count(distinct name), country
 from stats
 group by country
),
age as (
 select count(distinct name), age
 from stats
 group by age
),
country_and_age as(
 select count(distinct name), age, country
 from stats
 group by age, country
)
select country, age, c.count + a.count - ca.count as count
from country_and_age ca
join age a using (age)
join country c using (country);
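One thing worth checking about this inclusion-exclusion approach, |C ∪ A| = |C| + |A| − |C ∩ A|: the subtracted term should count names that occur with the country and with the age, possibly in different rows, whereas country_and_age only counts names that have a single row carrying both values. The two agree on the sample data but can diverge. The sketch below runs the query in SQLite as a stand-in for PostgreSQL (counts are aliased as cnt because SQLite does not name an unaliased count column "count", and an ORDER BY is added); SAMPLE, QUERY, run and the extra row are hypothetical, added only for illustration.

```python
import sqlite3

# The question's query, adapted for SQLite: count columns aliased as cnt.
QUERY = """
with country as (
  select count(distinct name) as cnt, country from stats group by country
),
age as (
  select count(distinct name) as cnt, age from stats group by age
),
country_and_age as (
  select count(distinct name) as cnt, age, country
  from stats group by age, country
)
select country, age, c.cnt + a.cnt - ca.cnt as cnt
from country_and_age ca
join age a using (age)
join country c using (country)
order by country, age
"""

SAMPLE = [
    ("eric", "se", 1), ("eric", "dk", 4),
    ("johan", "dk", 6), ("johan", "uk", 7), ("johan", "de", 3),
    ("dan", "de", 3), ("dan", "de", 3), ("dan", "de", 4),
]


def run(rows):
    """Load rows into an in-memory stats table and run QUERY on it."""
    con = sqlite3.connect(":memory:")
    con.execute("create temp table stats (name text, country text, age integer)")
    con.executemany("insert into stats values (?, ?, ?)", rows)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result


print(run(SAMPLE))
# Hypothetical extra row: johan now has age 4 too, but never in a (dk, 4) row.
print(run(SAMPLE + [("johan", "uk", 4)]))
```

On the sample the query returns the expected six rows. With the extra row, (dk, 4) comes out as 4 even though only eric, johan and dan qualify, because johan is in both C and A but not in any single (dk, 4) row.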

Is there a better way?

Answer 1

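Select the distinct country/age combinations from stats. For each one, count how many distinct names appear in the records that match either the country or the age.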

select
  country, 
  age,
  (
    select count(distinct name)
    from stats s 
    where s.country = t.country 
    or s.age = t.age
  ) as cnt
from (select distinct country, age from stats) t;
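A minimal sketch for checking this answer against the sample data, assuming an in-memory SQLite database is an acceptable stand-in for PostgreSQL here. The query is the one above with only an ORDER BY added; ROWS and con are hypothetical names.

```python
import sqlite3

# Sample data from the question.
ROWS = [
    ("eric", "se", 1), ("eric", "dk", 4),
    ("johan", "dk", 6), ("johan", "uk", 7), ("johan", "de", 3),
    ("dan", "de", 3), ("dan", "de", 3), ("dan", "de", 4),
]

con = sqlite3.connect(":memory:")
con.execute("create table stats (name text, country text, age integer)")
con.executemany("insert into stats values (?, ?, ?)", ROWS)

# Correlated subquery: for each distinct (country, age), count distinct
# names in rows matching either the country or the age.
result = con.execute("""
    select
      country,
      age,
      (
        select count(distinct name)
        from stats s
        where s.country = t.country
           or s.age = t.age
      ) as cnt
    from (select distinct country, age from stats) t
    order by country, age
""").fetchall()
con.close()

for row in result:
    print(*row)
```

The subquery runs once per distinct key, so its cost grows with (number of keys) × (rows scanned per key).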

Answer 2

Personally I don't like inline queries, so I would do it like this:

SELECT DISTINCT
        *
FROM    ( SELECT    country ,
                    age ,
                    COUNT(*) OVER ( PARTITION BY country ) AS c_cnt ,
                    COUNT(*) OVER ( PARTITION BY age ) AS a_cnt
          FROM      stats
        ) a
WHERE   c_cnt > 0
        OR a_cnt > 0

I'm not sure how it performs in Postgres, but in SQL Server the "in-line" version was about 3 times slower (73% vs. 27%).
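Before comparing performance it may be worth checking what this query returns. The sketch below runs it unchanged (plus an ORDER BY) on the sample data in SQLite, assuming SQLite 3.25 or later for window functions; ROWS and con are hypothetical names. COUNT(*) OVER (PARTITION BY ...) counts rows per country and per age, so the output has two row counts (c_cnt, a_cnt) instead of one distinct-name count, and since every row has counts of at least 1 the WHERE clause keeps everything. PostgreSQL also rejects DISTINCT inside a window aggregate, so replacing COUNT(*) with COUNT(DISTINCT name) is not an option there.

```python
import sqlite3

# Sample data from the question.
ROWS = [
    ("eric", "se", 1), ("eric", "dk", 4),
    ("johan", "dk", 6), ("johan", "uk", 7), ("johan", "de", 3),
    ("dan", "de", 3), ("dan", "de", 3), ("dan", "de", 4),
]

con = sqlite3.connect(":memory:")
con.execute("create table stats (name text, country text, age integer)")
con.executemany("insert into stats values (?, ?, ?)", ROWS)

# Answer 2's query: per-partition ROW counts, deduplicated with DISTINCT.
result = con.execute("""
    SELECT DISTINCT *
    FROM (SELECT country, age,
                 COUNT(*) OVER (PARTITION BY country) AS c_cnt,
                 COUNT(*) OVER (PARTITION BY age) AS a_cnt
          FROM stats) a
    WHERE c_cnt > 0 OR a_cnt > 0
    ORDER BY country, age
""").fetchall()
con.close()

for row in result:
    print(*row)
```

For example, (de, 4) comes back with c_cnt = 4 and a_cnt = 2, while the expected distinct-name count for that key is 3.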

Answer 3

You can also join the original table to itself:

SELECT
  s1.country,
  s1.age,
  COUNT(DISTINCT s2.name)
FROM stats s1
JOIN stats s2 ON s1.country = s2.country OR s1.age = s2.age
GROUP BY 1, 2;
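A minimal sketch of the self-join on the sample data, again using in-memory SQLite as a stand-in for PostgreSQL. GROUP BY 1, 2 is spelled out as the two grouping columns and an ORDER BY is added; ROWS and con are hypothetical names. Duplicate rows such as (dan, de, 3) multiply the joined rows, but COUNT(DISTINCT ...) keeps the count correct.

```python
import sqlite3

# Sample data from the question.
ROWS = [
    ("eric", "se", 1), ("eric", "dk", 4),
    ("johan", "dk", 6), ("johan", "uk", 7), ("johan", "de", 3),
    ("dan", "de", 3), ("dan", "de", 3), ("dan", "de", 4),
]

con = sqlite3.connect(":memory:")
con.execute("create table stats (name text, country text, age integer)")
con.executemany("insert into stats values (?, ?, ?)", ROWS)

# Self-join: pair every row s1 with every row s2 sharing its country or age,
# then count the distinct names among the matches per (country, age).
result = con.execute("""
    SELECT s1.country, s1.age, COUNT(DISTINCT s2.name) AS cnt
    FROM stats s1
    JOIN stats s2 ON s1.country = s2.country OR s1.age = s2.age
    GROUP BY s1.country, s1.age
    ORDER BY s1.country, s1.age
""").fetchall()
con.close()

for row in result:
    print(*row)
```

The OR in the join condition usually prevents a plain hash or merge join, so on large tables this tends to behave like a nested loop over all row pairs.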


sql

postgresql