(Impala) Selecting most common value in field results in "Subqueries are not supported in select list"
I'm trying to run an aggregation that takes the most common value in each group, like this:
with t1 as (
select
id
, colA
, colB
from some_Table
)
select
id
, count(*) as total
, max(colA) as maxColA
, most_common(colB) -- this is what I'm trying to achieve
from t1
group by id
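To pin down what `most_common(colB)` above is meant to return, here is a plain-Python reference sketch. It is not Impala and not part of the original question. `most_common` in the SQL is the asker's hypothetical function, and the sample rows and the `summarize` helper below are made up for illustration. Per `id`, the sketch computes the row count, the maximum `colA`, and the most frequent `colB` (the statistical mode):

```python
from collections import Counter, defaultdict


def summarize(rows):
    """Per id: (row count, max colA, most frequent colB). Hypothetical helper."""
    groups = defaultdict(list)
    for id_, col_a, col_b in rows:
        groups[id_].append((col_a, col_b))

    out = {}
    for id_, vals in groups.items():
        counts = Counter(col_b for _, col_b in vals)
        # most_common(1) returns [(value, count)]; on a tie, the value
        # encountered first wins (Counter preserves insertion order).
        mode, _ = counts.most_common(1)[0]
        out[id_] = (len(vals), max(col_a for col_a, _ in vals), mode)
    return out


# Hypothetical sample rows standing in for some_Table: (id, colA, colB)
rows = [
    (1, 10, "x"), (1, 20, "y"), (1, 5, "y"),
    (2, 7, "z"), (2, 3, "z"), (2, 9, "w"), (2, 1, "z"),
]
result = summarize(rows)
print(result)
```

For these rows, `id` 1 has `y` twice and `x` once, and `id` 2 has `z` three times, so the modes are `y` and `z`.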
Here's what I tried:
with t1 as (
select
id
, colA
, colB
from some_Table
)
select
id
, count(*) as total
, max(colA) as maxColA
, (select colB, count(colB) as counts from t1 group by colB order by counts desc limit 1) as most_freq_colB_per_id
from t1
group by id
However, it fails with `AnalysisException: Subqueries are not supported in the select list`. How else can I do this?
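As far as I know, Impala has no built-in aggregate function that computes the mode (the statistical name for what you are trying to calculate).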
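You can use two levels of aggregation instead: the inner query counts rows per (id, colB) pair and ranks those pairs within each id by count, and the outer query sums the counts back up per id and keeps the colB ranked first. Your CTE doesn't do anything, so you can query the table directly: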
select id, sum(total) as total, max(maxColA) as maxColA,
       -- only the top-ranked (most frequent) colB per id is non-null here
       max(case when seqnum = 1 then colB end) as mode
from (select id, colB, count(*) as total, max(colA) as maxColA,
             -- rank each colB within its id by how often it occurs
             row_number() over (partition by id order by count(*) desc) as seqnum
      from some_Table
      group by id, colB
     ) t
group by id;
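The two-level query can be tried out locally. The sketch below uses Python's built-in `sqlite3` as a stand-in for Impala (an assumption: SQLite is not Impala, but since version 3.25 it supports `row_number() over (partition by ... order by ...)` in grouped queries, which is all this query needs). The table contents are hypothetical sample data:

```python
import sqlite3

# Hypothetical sample data standing in for some_Table: (id, colA, colB)
rows = [
    (1, 10, "x"), (1, 20, "y"), (1, 5, "y"),
    (2, 7, "z"), (2, 3, "z"), (2, 9, "w"), (2, 1, "z"),
]

conn = sqlite3.connect(":memory:")
conn.execute("create table some_Table (id integer, colA integer, colB text)")
conn.executemany("insert into some_Table values (?, ?, ?)", rows)

# Same shape as the answer's query: the inner level counts each (id, colB)
# pair and ranks the pairs within each id; the outer level re-aggregates
# per id and keeps the colB whose rank is 1.
query = """
select id, sum(total) as total, max(maxColA) as maxColA,
       max(case when seqnum = 1 then colB end) as mode
from (select id, colB, count(*) as total, max(colA) as maxColA,
             row_number() over (partition by id order by count(*) desc) as seqnum
      from some_Table
      group by id, colB
     ) t
group by id
order by id
"""
result = conn.execute(query).fetchall()
print(result)
conn.close()
```

Note that `row_number()` breaks ties arbitrarily: if two colB values share the highest count for an id, either may be returned. Adding a tiebreaker such as `order by count(*) desc, colB` makes the result deterministic.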