Extract the most common (highest count) entry by group

I have the following table:

ID       height
personA  182
personA  182
personA  182
personA  192
personA  172
personB  175
personB  175

I want to extract the height that occurs most often for each person, because I suspect 192 is a typo. So far I have:

select ID, height, count(ID,height) as cnt
from tbl
group by ID, height
having max(cnt);

My desired output is:

ID       height
personA  182
personB  175

You can use a window function to rank each ID's heights by how often they occur, then keep the top-ranked height per ID.

WITH cte AS (
SELECT 
    ID
  , height
  , ROW_NUMBER() OVER (PARTITION BY ID ORDER BY COUNT(height) DESC) rn
FROM dbo.tbl
GROUP BY
  ID,
  height)

SELECT
    ID,
    height
FROM cte WHERE rn = 1 
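To try the ranking idea without a database server, here is a minimal sketch that runs an equivalent query against an in-memory SQLite database from Python. It assumes an SQLite build with window-function support (3.25 or later). The table name tbl comes from the question, while the counts and ranked CTE names and the most_common_heights helper are made up for illustration. The counts are computed in a separate CTE first, which is a slight restructuring of the query above rather than a copy of it.

```python
import sqlite3

# Sample rows copied from the question.
ROWS = [
    ("personA", 182), ("personA", 182), ("personA", 182),
    ("personA", 192), ("personA", 172),
    ("personB", 175), ("personB", 175),
]

# Count each (ID, height) pair, rank the pairs within each ID by count,
# and keep the top-ranked pair per ID.
QUERY = """
WITH counts AS (
    SELECT ID, height, COUNT(*) AS cnt
    FROM tbl
    GROUP BY ID, height
),
ranked AS (
    SELECT ID, height,
           ROW_NUMBER() OVER (PARTITION BY ID ORDER BY cnt DESC) AS rn
    FROM counts
)
SELECT ID, height FROM ranked WHERE rn = 1 ORDER BY ID
"""


def most_common_heights(rows):
    """Return one (ID, height) pair per ID: the height with the highest count."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE tbl (ID TEXT, height INTEGER)")
        conn.executemany("INSERT INTO tbl VALUES (?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


print(most_common_heights(ROWS))  # [('personA', 182), ('personB', 175)]
```

Because ROW_NUMBER assigns a distinct number to every row, a tie for the highest count would be broken arbitrarily here.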

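You can also use the max() function to get the largest entry per ID:

select ID, max(height)
from tbl
group by ID

Note that this returns the largest height per ID (192 for personA), not the most frequent one, so it does not produce the desired output for this data.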

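You need to use Google BigQuery analytic functions. An analytic function partitions your table by the columns you choose. I used the row_number() function; you could also use rank(). To learn more about analytic functions: https://hevodata.com/learn/bigquery-row-number-function/

Code: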

Select ID, height
From (-- count each (ID, height) pair, then rank the pairs within each ID by that count
      SELECT ID, height,
             row_number() over(partition by ID order by count(*) desc) as rn
      FROM students
      GROUP BY ID, height)
Where rn = 1

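You can simply use mode, which is designed for exactly this use case. Note that it does not handle ties: when several heights share the highest count, only one of them is returned.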

select id, mode(height) as height
from t
group by id;
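The tie caveat can be illustrated outside SQL. The sketch below is only a Python analogue, not the database's mode aggregate: it groups rows by ID and compares statistics.mode, which returns a single value, with statistics.multimode, which returns every tied value. The personC rows and the modes_by_group helper are hypothetical and exist only to create a tie.

```python
from collections import defaultdict
from statistics import mode, multimode

# personA comes from the question; personC is a hypothetical tied group.
SAMPLE = [
    ("personA", 182), ("personA", 182), ("personA", 182),
    ("personA", 192), ("personA", 172),
    ("personC", 170), ("personC", 170),
    ("personC", 180), ("personC", 180),
]


def modes_by_group(rows):
    """Map each ID to (single mode, list of all tied modes)."""
    grouped = defaultdict(list)
    for person_id, height in rows:
        grouped[person_id].append(height)
    return {pid: (mode(hs), multimode(hs)) for pid, hs in grouped.items()}


result = modes_by_group(SAMPLE)
print(result["personA"])  # (182, [182])
print(result["personC"])  # a single mode, plus the full tied list [170, 180]
```

For personC the single-value result hides the fact that 170 and 180 are equally common, which is the situation the tie-aware queries in the other answers are meant to expose.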

Here is an alternative that does not use analytic functions and also handles ties:

with cte as
(select id, height, count(*) as cnt 
from t
group by id, height)

select id, height
from cte
where (id, cnt) in (select id, max(cnt)
                    from cte
                    group by id)
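As a quick check of the tie behaviour, the following sketch runs the same query against an in-memory SQLite database from Python. It assumes an SQLite version that supports row values in IN (3.15 or later). The two personB rows with height 180 are hypothetical additions that force a tie, and most_common_with_ties is an illustrative helper name.

```python
import sqlite3

TIED_ROWS = [
    ("personA", 182), ("personA", 182), ("personA", 182),
    ("personA", 192), ("personA", 172),
    ("personB", 175), ("personB", 175),
    ("personB", 180), ("personB", 180),  # hypothetical rows added to create a tie
]

# Keep every (id, height) pair whose count equals the maximum count for that id.
TIE_QUERY = """
WITH cte AS (
    SELECT id, height, COUNT(*) AS cnt
    FROM t
    GROUP BY id, height
)
SELECT id, height
FROM cte
WHERE (id, cnt) IN (SELECT id, MAX(cnt) FROM cte GROUP BY id)
ORDER BY id, height
"""


def most_common_with_ties(rows):
    """Return all (id, height) pairs that share the highest count per id."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE t (id TEXT, height INTEGER)")
        conn.executemany("INSERT INTO t VALUES (?, ?)", rows)
        return conn.execute(TIE_QUERY).fetchall()
    finally:
        conn.close()


print(most_common_with_ties(TIED_ROWS))
# [('personA', 182), ('personB', 175), ('personB', 180)]
```

Both tied heights for personB come back, so the caller can decide how to resolve the tie.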

If you want to implement the above with the qualify clause, which is used cleverly in Lukasz's answer, you can write:

select id, height
from t
group by id, height
qualify max( count(*) ) over (partition by id) = count(*)

Using QUALIFY:

SELECT ID, height
FROM tab
GROUP BY ID, height
QUALIFY RANK() OVER(PARTITION BY ID ORDER BY COUNT(*) DESC) = 1;

RANK is used to handle ties: every height tied for the highest count gets rank 1, so all of them are returned.
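The difference between RANK and ROW_NUMBER on tied counts can be sketched with SQLite from Python. SQLite has no QUALIFY clause, so this sketch filters on the rank in an outer query instead; that rewrite, the top_heights helper, and the tied personB rows are assumptions made for illustration. It needs an SQLite build with window functions (3.25 or later).

```python
import sqlite3

SAMPLE_ROWS = [
    ("personA", 182), ("personA", 182), ("personA", 182),
    ("personA", 192), ("personA", 172),
    ("personB", 175), ("personB", 175),
    ("personB", 180), ("personB", 180),  # hypothetical rows added to create a tie
]


def top_heights(rows, ranking_function):
    """Return the rank-1 (ID, height) pairs using RANK or ROW_NUMBER."""
    if ranking_function not in ("RANK", "ROW_NUMBER"):
        raise ValueError("ranking_function must be 'RANK' or 'ROW_NUMBER'")
    # The function name is checked against a fixed whitelist above,
    # so interpolating it into the SQL text is safe here.
    query = f"""
        WITH counts AS (
            SELECT ID, height, COUNT(*) AS cnt
            FROM tab
            GROUP BY ID, height
        )
        SELECT ID, height
        FROM (
            SELECT ID, height,
                   {ranking_function}() OVER (PARTITION BY ID ORDER BY cnt DESC) AS rnk
            FROM counts
        ) AS ranked
        WHERE rnk = 1
        ORDER BY ID, height
    """
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE tab (ID TEXT, height INTEGER)")
        conn.executemany("INSERT INTO tab VALUES (?, ?)", rows)
        return conn.execute(query).fetchall()
    finally:
        conn.close()


print(top_heights(SAMPLE_ROWS, "RANK"))
# [('personA', 182), ('personB', 175), ('personB', 180)]
print(len(top_heights(SAMPLE_ROWS, "ROW_NUMBER")))  # 2
```

With ROW_NUMBER only one of the tied personB heights survives, and which one is not guaranteed.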