Query to get total distinct values side by side with group by values
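How can I get a column that shows the total count of distinct values alongside the per-group ("group by") count of distinct values?
For example, in the query below I get the count of distinct patients in each value set. I would also like to get the total count of distinct patients. How can I do that? The query below works, but it does not return the total patient count. This happens to be in Databricks SQL/Apache Spark, but I would like a general solution that works in most SQL database implementations.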
select
value_set_name,
oid value_set_oid,
count(distinct rx.patient_id) as count
-- ??? count of total distinct patient_id's ???
from
rx
join value_set vs on rx.code = vs.code
group by 1,2
order by 1
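Do you want window functions?
select
    value_set_name,
    oid as value_set_oid,
    count(distinct rx.patient_id) as cnt_distinct_patient,
    -- sums the per-group distinct counts, so a patient who appears
    -- in several value sets is counted once per value set
    sum(count(distinct rx.patient_id)) over() as total_cnt_distinct_patient
from rx
inner join value_set vs on rx.code = vs.code
group by 1, 2
order by 1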
A subquery is probably the simplest method:
select value_set_name,
oid as value_set_oid,
count(distinct rx.patient_id) as count,
(select count(distinct rx2.patient_id) from rx rx2) as num_total_distinct
from rx join
value_set vs
on rx.code = vs.code
group by 1,2
order by 1;
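Because you are using count(distinct), you cannot simply add up the count(distinct) values from all the rows: a patient who is in more than one group would be counted multiple times.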
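The overcounting can be seen on a tiny data set. The sketch below is only an illustration: it uses Python's built-in sqlite3 module instead of Databricks, and the rows, value set names, and oids are made up for the example. Patient 1 has prescriptions in both value sets, so adding the per-group counts gives 4 while there are only 3 distinct patients.

```python
import sqlite3

# Hypothetical sample data (not from the question):
# patient 1 has prescriptions in both value sets.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table rx (patient_id integer, code text);
    create table value_set (value_set_name text, oid text, code text);
    insert into rx values (1, 'A'), (1, 'B'), (2, 'A'), (3, 'B');
    insert into value_set values
        ('statins', 'oid-1', 'A'),
        ('opioids', 'oid-2', 'B');
""")

# Distinct patients per value set.
per_group = conn.execute("""
    select vs.value_set_name, count(distinct rx.patient_id)
    from rx join value_set vs on rx.code = vs.code
    group by vs.value_set_name
    order by vs.value_set_name
""").fetchall()

# Adding up the per-group counts counts patient 1 twice.
summed = sum(cnt for _, cnt in per_group)

# Counting distinct patients over the whole joined set counts each one once.
true_total = conn.execute("""
    select count(distinct rx.patient_id)
    from rx join value_set vs on rx.code = vs.code
""").fetchone()[0]

print(per_group)           # [('opioids', 2), ('statins', 2)]
print(summed, true_total)  # 4 3
```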
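Your query is actually pretty simple. Two levels of aggregation might be faster:
select value_set_name, value_set_oid, count(*) as num_patients,
       -- sums the per-value-set counts, so a patient who appears
       -- in several value sets is counted once per value set
       sum(count(*)) over () as num_total_patients
from (select value_set_name, oid as value_set_oid, rx.patient_id, count(*) as cnt
      from rx join
           value_set vs
           on rx.code = vs.code
      group by 1, 2, 3
     ) rv
group by 1, 2
order by 1;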
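If the total has to count each patient only once, one possible variation on the two-level idea is to flag a single row per patient and sum the flags. This is a sketch under assumptions rather than part of the answers above: the seqnum flag, the CTE names, and the sample rows are all made up for illustration, and it is run through Python's sqlite3 module (SQLite 3.25 or later is assumed for window function support).

```python
import sqlite3

# Hypothetical sample data: patient 1 appears in both value sets.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table rx (patient_id integer, code text);
    create table value_set (value_set_name text, oid text, code text);
    insert into rx values (1, 'A'), (1, 'B'), (2, 'A'), (3, 'B');
    insert into value_set values
        ('statins', 'oid-1', 'A'),
        ('opioids', 'oid-2', 'B');
""")

rows = conn.execute("""
    with pairs as (
        -- level 1: one row per (value set, patient)
        select distinct vs.value_set_name, vs.oid as value_set_oid, rx.patient_id
        from rx join value_set vs on rx.code = vs.code
    ),
    flagged as (
        -- seqnum = 1 marks exactly one row for each patient
        select pairs.*,
               row_number() over (partition by patient_id
                                  order by value_set_oid) as seqnum
        from pairs
    ),
    per_set as (
        -- level 2: patients per value set, plus how many were flagged
        select value_set_name, value_set_oid,
               count(*) as num_patients,
               sum(case when seqnum = 1 then 1 else 0 end) as num_first_seen
        from flagged
        group by value_set_name, value_set_oid
    )
    select value_set_name, value_set_oid, num_patients,
           sum(num_first_seen) over () as num_total_patients
    from per_set
    order by value_set_name
""").fetchall()

for row in rows:
    print(row)
# ('opioids', 'oid-2', 2, 3)
# ('statins', 'oid-1', 2, 3)
```

Each value set still reports 2 patients, while the total is 3 because patient 1 is flagged in only one of the two sets.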