Query does not find column, yet suggests the same column, in Hive SQL

I have the following query in SQL:

select midquery.account, midquery.name, midquery.label,  midquery.labelfrequency
from(

    -- Count the appearance of each label.

    select count(*) as labelfrequency, account, name, label
    from(

        select account, name, label from myTable 

    ) innerquery

    group by account, name, label
) midquery

-- Select most frequent values only.
where rank() over 
    (partition by midquery.account, midquery.name 
     order by midquery.labelfrequency desc) = 1     

The idea is to find the most frequent label for each (account, name) pair. When I run this query, I get the following error:

Error while compiling statement: FAILED: SemanticException [Error 10002]: Line 12:74 Invalid column reference 'labelfrequency': (possible column names are: labelfrequency, account, name, label)

I don't understand why the interpreter cannot find the column labelfrequency yet is able to suggest it. Do you have any suggestions on how to solve this?

Edit: If I move rank() into the select list, I do get results.

select midquery.account, midquery.name, midquery.label,  midquery.labelfrequency, 
    rank() over (partition by midquery.account, midquery.name 
     order by midquery.labelfrequency desc)
from(

    -- Count the appearance of each label.

    select count(*) as labelfrequency, account, name, label
    from(

        select account, name, label from myTable 

    ) innerquery

    group by account, name, label
) midquery

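Window functions are simply not allowed in the WHERE clause. There are good reasons for this, but you can treat it as just another rule of SQL, similar to column aliases not being recognized there.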

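(The real reason is that the language would have to specify how window functions behave when there are multiple filter conditions. It is (almost?) impossible to come up with a coherent set of rules.)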

That said, you can simplify the query:

select t.account, t.name, t.label, t.labelfrequency
from (select count(*) as labelfrequency, account, name, label,
             rank() over (partition by account, name
                          order by count(*) desc
                         ) as seqnum
      from myTable t
      group by account, name, label
     ) t
where seqnum = 1;

That is, window functions and aggregate functions can be combined. And you don't need a subquery just to select a few columns.
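To try both points without a Hive cluster, here is a minimal sketch that runs the same shape of query through Python's built-in sqlite3 module. SQLite is only a stand-in for Hive (window functions need SQLite 3.25 or newer, and its error text differs from Hive's), and the rows inserted into myTable are made-up sample data, not anything from the question.

```python
import sqlite3

# SQLite stands in for Hive here (assumption); the rows are invented sample data.
conn = sqlite3.connect(":memory:")
conn.execute("create table myTable (account text, name text, label text)")
conn.executemany(
    "insert into myTable values (?, ?, ?)",
    [
        ("a1", "n1", "x"),
        ("a1", "n1", "x"),
        ("a1", "n1", "y"),
        ("a2", "n2", "z"),
    ],
)

# 1) A window function placed directly in WHERE is rejected when the
#    statement is compiled.
bad_sql = """
select account, name, label
from myTable
where rank() over (partition by account, name order by label) = 1
"""
try:
    conn.execute(bad_sql)
    where_rejected = False
except sqlite3.Error:
    where_rejected = True

# 2) Compute the rank next to the aggregate in a subquery, then filter
#    on its alias in the outer query.
good_sql = """
select t.account, t.name, t.label, t.labelfrequency
from (select count(*) as labelfrequency, account, name, label,
             rank() over (partition by account, name
                          order by count(*) desc) as seqnum
      from myTable
      group by account, name, label
     ) t
where t.seqnum = 1
order by t.account, t.name, t.label
"""
top_labels = conn.execute(good_sql).fetchall()

print(where_rejected)
print(top_labels)
```

Note that rank() keeps every label tied for first place within an (account, name) pair. If exactly one row per pair is wanted, row_number() can be used instead, at the cost of picking arbitrarily among ties.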