Query to get total distinct values side by side with group by values

Question

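How can I get a column that shows the total count of distinct values alongside the "group by" counts of distinct values?

For example, in the query below I get the count of distinct patients in each value set. I also want the total number of distinct patients. How can I do that? The query below works, but it does not return the total patient count. This happens to be in Databricks SQL/Apache Spark, but I would like a general solution that works in most SQL database implementations.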

select
  value_set_name,
  oid value_set_oid,
  count(distinct rx.patient_id) as count
  -- ??? count of total distinct patient_id's ???
from
  rx
  join value_set vs on rx.code = vs.code
group by 1,2
order by 1

Answer 1

Do you want window functions?

select
    value_set_name,
    oid as value_set_oid,
    count(distinct rx.patient_id) as cnt_distinct_patient,
    sum(count(distinct rx.patient_id)) over() as total_cnt_distinct_patient
from rx
inner join value_set vs on rx.code = vs.code
group by 1, 2
order by 1

Answer 2

A subquery is probably the simplest approach:

select value_set_name,
       oid as value_set_oid,
       count(distinct rx.patient_id) as count,
       (select count(distinct rx2.patient_id) from rx rx2) as num_total_distinct
from rx join
     value_set vs
     on rx.code = vs.code
group by 1,2
order by 1;

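Because you are using count(distinct), you cannot simply add up the per-group count(distinct) values across all rows: patients who belong to more than one group would be counted multiple times.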
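
To make the over-counting concrete, here is a minimal sketch using Python's built-in sqlite3 module. The sample rows, value set names, and OIDs are hypothetical and only chosen so that patient 1 falls into both value sets; the table and column names follow the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table rx (patient_id integer, code text);
    create table value_set (value_set_name text, oid text, code text);
    -- hypothetical sample data: patient 1 appears in both value sets
    insert into rx values (1, 'A'), (1, 'B'), (2, 'A'), (3, 'B');
    insert into value_set values ('set_a', 'oid-a', 'A'), ('set_b', 'oid-b', 'B');
""")

# Distinct patients per value set, as in the question.
per_group = conn.execute("""
    select vs.value_set_name, count(distinct rx.patient_id) as cnt
    from rx join value_set vs on rx.code = vs.code
    group by vs.value_set_name
    order by vs.value_set_name
""").fetchall()

# Adding up the per-group counts counts patient 1 twice.
naive_total = sum(cnt for _, cnt in per_group)

# Counting distinct patients across all joined rows counts each patient once.
true_total = conn.execute("""
    select count(distinct rx.patient_id)
    from rx join value_set vs on rx.code = vs.code
""").fetchone()[0]

print(per_group, naive_total, true_total)
```

With this data each value set has 2 distinct patients, so the naive sum is 4, while only 3 distinct patients exist.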

Your query is actually rather simple. It might be faster to do two levels of aggregation:

select value_set_name, value_set_oid, count(*) as num_patients,
       sum(count(*)) over () as num_total_patients
from (select value_set_name, oid as value_set_oid, rx.patient_id, count(*) as cnt
      from rx join
           value_set vs
           on rx.code = vs.code
      group by 1, 2, 3
     ) rv
group by 1, 2
order by 1;
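
One caveat worth checking on your own data: the window sum in the query above adds up the per-group counts, so it appears to be subject to the same multi-group over-count. Also, the plain subquery over rx counts every patient in rx, including those whose code matches no value set. If you want the total restricted to patients that actually join to a value set, one portable option is to cross join the grouped result with a one-row derived table. This is a sketch, not a tuned query; the sample data (including patient 4 with the unmatched code 'Z') and the aliases g and t are assumptions for illustration, run here through Python's sqlite3 module.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table rx (patient_id integer, code text);
    create table value_set (value_set_name text, oid text, code text);
    -- hypothetical sample data: patient 1 is in both sets, patient 4 matches none
    insert into rx values (1, 'A'), (1, 'B'), (2, 'A'), (3, 'B'), (4, 'Z');
    insert into value_set values ('set_a', 'oid-a', 'A'), ('set_b', 'oid-b', 'B');
""")

rows = conn.execute("""
    select g.value_set_name, g.value_set_oid, g.num_patients, t.num_total_patients
    from (select vs.value_set_name, vs.oid as value_set_oid,
                 count(distinct rx.patient_id) as num_patients
          from rx join value_set vs on rx.code = vs.code
          group by vs.value_set_name, vs.oid) g
    cross join
         (select count(distinct rx.patient_id) as num_total_patients
          from rx join value_set vs on rx.code = vs.code) t
    order by g.value_set_name
""").fetchall()

# For comparison: every patient in rx, matched to a value set or not.
all_rx_patients = conn.execute(
    "select count(distinct patient_id) from rx"
).fetchone()[0]

print(rows, all_rx_patients)
```

Here every row carries the same total of 3 (patients 1, 2 and 3), whereas counting over rx alone gives 4 because patient 4 is included. Which of the two totals you want depends on the question you are asking.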


sql

count

distinct

apache-spark-sql