Why do you need to include a field in GROUP BY when using OVER (PARTITION BY x)?

I have a table, and I want to do a simple sum of one field, grouped by two columns. I then also want the total of all values for each year_num.

See example: http://rextester.com/QSLRS68794

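With the commented-out line enabled, the query below throws: "42803: column "foo.num_cust" must appear in the GROUP BY clause or be used in an aggregate function", and I don't understand why. Why does an aggregate function used with OVER (PARTITION BY x) require the summed field to be in the GROUP BY?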

select 
    year_num
    ,age_bucket
    ,sum(num_cust)
    --,sum(num_cust) over (partition by year_num)  --THROWS ERROR!!
from
    foo
group by
    year_num
    ,age_bucket
order by 1,2

TABLE:

| loc_id |  year_num |  gen |  cust_category |  cust_age |  num_cust |  age_bucket |
|--------|-----------|------|----------------|-----------|-----------|-------------|
| 1      | 2016      | M    | cash           | 41        | 2         | 04_<45      |
| 1      | 2016      | F    | Prepaid        | 41        | 1         | 03_<35      |
| 1      | 2016      | F    | cc             | 61        | 1         | 05_45+      |
| 1      | 2016      | F    | cc             | 19        | 2         | 02_<25      |
| 1      | 2016      | M    | cc             | 64        | 1         | 05_45+      |
| 1      | 2016      | F    | cash           | 46        | 1         | 05_45+      |
| 1      | 2016      | F    | cash           | 27        | 3         | 03_<35      |
| 1      | 2016      | M    | cash           | 42        | 1         | 04_<45      |
| 1      | 2017      | F    | cc             | 35        | 1         | 04_<45      |
| 1      | 2017      | F    | cc             | 37        | 1         | 04_<45      |
| 1      | 2017      | F    | cash           | 46        | 1         | 05_45+      |
| 1      | 2016      | F    | cash           | 19        | 4         | 02_<25      |
| 1      | 2017      | M    | cash           | 43        | 1         | 04_<45      |
| 1      | 2017      | M    | cash           | 29        | 1         | 03_<35      |
| 1      | 2016      | F    | cc             | 13        | 1         | 01_<18      |
| 1      | 2017      | F    | cash           | 16        | 2         | 01_<18      |
| 1      | 2016      | F    | cc             | 17        | 2         | 01_<18      |
| 1      | 2016      | M    | cc             | 17        | 2         | 01_<18      |
| 1      | 2017      | F    | cash           | 18        | 9         | 02_<25      |

Desired output:

| year_num | age_bucket | sum | sum over (year_num) |
|----------|------------|-----|---------------------|
| 2016     | 01_<18     | 5   | 21                  |
| 2016     | 02_<25     | 6   | 21                  |
| 2016     | 03_<35     | 4   | 21                  |
| 2016     | 04_<45     | 3   | 21                  |
| 2016     | 05_45+     | 3   | 21                  |
| 2017     | 01_<18     | 2   | 16                  |
| 2017     | 02_<25     | 9   | 16                  |
| 2017     | 03_<35     | 1   | 16                  |
| 2017     | 04_<45     | 3   | 16                  |
| 2017     | 05_45+     | 1   | 16                  |

You need to nest the sum()s:

select year_num, age_bucket, sum(num_cust),
       sum(sum(num_cust)) over (partition by year_num)  --WORKS!!
from foo
group by year_num, age_bucket
order by 1, 2;

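Why? Well, the window function is not doing the aggregation. Its argument needs to be an expression that can be evaluated after the group by (because this is an aggregation query). Because num_cust is not a group by key, it needs an aggregation function. In sum(sum(num_cust)) over (...), the inner sum() is the ordinary aggregate and the outer sum() is the window function applied to the grouped rows.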

Perhaps a subquery makes this clearer:

select year_num, age_bucket, sum_num_cust,
       sum(sum_num_cust) over (partition by year_num)
from (select year_num, age_bucket, sum(num_cust) as sum_num_cust
      from foo
      group by year_num, age_bucket
     ) ya
order by 1, 2;

These two queries do exactly the same thing. But with the subquery, it should be more obvious why you need the extra aggregation.
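If it helps to see the order of evaluation outside SQL, here is a rough sketch in plain Python of the two steps both queries go through: first the group by, then a window sum over the already grouped rows. It is only an illustration, not how any database actually implements it. The rows are the (year_num, age_bucket, num_cust) columns from the table in the question; the names rows, grouped, year_totals and result are made up for this sketch.

```python
from collections import defaultdict

# (year_num, age_bucket, num_cust) taken from the table in the question.
rows = [
    (2016, "04_<45", 2), (2016, "03_<35", 1), (2016, "05_45+", 1),
    (2016, "02_<25", 2), (2016, "05_45+", 1), (2016, "05_45+", 1),
    (2016, "03_<35", 3), (2016, "04_<45", 1), (2017, "04_<45", 1),
    (2017, "04_<45", 1), (2017, "05_45+", 1), (2016, "02_<25", 4),
    (2017, "04_<45", 1), (2017, "03_<35", 1), (2016, "01_<18", 1),
    (2017, "01_<18", 2), (2016, "01_<18", 2), (2016, "01_<18", 2),
    (2017, "02_<25", 9),
]

# Step 1: group by year_num, age_bucket and compute sum(num_cust).
# After this step the individual num_cust values are gone; only the
# group keys and the aggregate remain.
grouped = defaultdict(int)
for year_num, age_bucket, num_cust in rows:
    grouped[(year_num, age_bucket)] += num_cust

# Step 2: the window sum runs over the grouped rows, partitioned by
# year_num. Its input is the per-group sum, hence sum(sum(num_cust)).
year_totals = defaultdict(int)
for (year_num, _age_bucket), sum_num_cust in grouped.items():
    year_totals[year_num] += sum_num_cust

# order by 1, 2
result = [
    (year_num, age_bucket, sum_num_cust, year_totals[year_num])
    for (year_num, age_bucket), sum_num_cust in sorted(grouped.items())
]

for line in result:
    print(line)
```

Step 2 has no access to a bare num_cust any more, which is the same reason the database rejects sum(num_cust) over (partition by year_num) in the grouped query.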