Query with partition and count
Given the following table (it records users' item view history together with their sessions):
create table view_log (
server_time timestamp,
device char(2),
session_id char(10),
uid char(7),
item_id char(7)
);
I am trying to understand what the following code does:
create table coo_cs as
select
item_id,
session_id,
count(distinct session_id) / (sum(count(distinct session_id)) over (partition by item_id)) cs
from view_log
group by item_id, session_id;
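I tried to break the rows down by partition to see what it is doing, but then it fails with "DISTINCT is not implemented for window functions".
I understand basic partition and group by, but I cannot make sense of the SQL above.
- Edit
There is a fairly large set of test data...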
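Some databases do not (yet) support count(distinct) as a window function. For this query, count(distinct) is not needed, because you are aggregating by the same column that is used in the count(distinct). As a result, count(distinct session_id) is 1 on every row.
Your query is essentially: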
select item_id, session_id,
1.0 / count(session_id) over (partition by item_id) as cs
from view_log
group by item_id, session_id;
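To make the "1 on every row" claim concrete, here is a minimal sketch that runs the grouping against an in-memory SQLite database through Python's sqlite3 module. SQLite is only a stand-in for whatever database the question uses, the four sample rows are invented for illustration, and the window function part assumes SQLite 3.25 or later.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table view_log ("
    " server_time timestamp, device char(2), session_id char(10),"
    " uid char(7), item_id char(7))"
)
# Hypothetical sample rows: item i1 is viewed in session s1 (twice) and in
# session s2; item i2 is viewed only in session s1.
rows = [
    ("2020-01-01 10:00:00", "pc", "s1", "u1", "i1"),
    ("2020-01-01 10:01:00", "pc", "s1", "u1", "i1"),
    ("2020-01-01 10:02:00", "mb", "s2", "u2", "i1"),
    ("2020-01-01 10:03:00", "pc", "s1", "u1", "i2"),
]
conn.executemany("insert into view_log values (?, ?, ?, ?, ?)", rows)

# Grouping by (item_id, session_id) leaves exactly one session_id per group,
# so count(distinct session_id) is 1 no matter how many rows the group has.
per_group = conn.execute(
    "select item_id, session_id, count(distinct session_id), count(*)"
    " from view_log"
    " group by item_id, session_id"
    " order by item_id, session_id"
).fetchall()

# The ratio is therefore 1 / (number of distinct sessions for the item).
# The grouping is done in a subquery and the window counts the grouped rows.
cs_rows = conn.execute(
    "select item_id, session_id,"
    "       1.0 / count(*) over (partition by item_id) as cs"
    " from (select item_id, session_id from view_log"
    "       group by item_id, session_id) as g"
    " order by item_id, session_id"
).fetchall()

for row in per_group:
    print(row)
for row in cs_rows:
    print(row)
```

With this sample data, both sessions of i1 get cs = 0.5 and the single session of i2 gets cs = 1.0, so cs carries no information beyond the number of sessions per item.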
I would not be surprised if you actually wanted the ratio at the item_id level, in which case the intended query would be:
select item_id, count(distinct session_id),
count(distinct session_id) * 1.0 / sum(count(distinct session_id)) over () as cs
from view_log
group by item_id;
If so, the equivalent logic can be written with a subquery:
select vl.*, vl.numsessions * 1.0 / sum(vl.numsessions) over () as cs
from (select item_id, count(distinct session_id) as numsessions
from view_log vl
group by item_id
) vl;
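As a rough check of the item-level version, the sketch below runs the subquery form against an in-memory SQLite database via Python's sqlite3 module. SQLite stands in for the real database, the sample rows are invented for illustration, and `sum(...) over ()` assumes SQLite 3.25 or later.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table view_log ("
    " server_time timestamp, device char(2), session_id char(10),"
    " uid char(7), item_id char(7))"
)
# Hypothetical sample rows: i1 is seen in two sessions, i2 in one.
rows = [
    ("2020-01-01 10:00:00", "pc", "s1", "u1", "i1"),
    ("2020-01-01 10:01:00", "pc", "s1", "u1", "i1"),
    ("2020-01-01 10:02:00", "mb", "s2", "u2", "i1"),
    ("2020-01-01 10:03:00", "pc", "s1", "u1", "i2"),
]
conn.executemany("insert into view_log values (?, ?, ?, ?, ?)", rows)

# Inner query: distinct sessions per item (a plain aggregate, no window).
# Outer query: each item's share of the total of those per-item counts.
shares = conn.execute(
    "select vl.item_id, vl.numsessions,"
    "       vl.numsessions * 1.0 / sum(vl.numsessions) over () as cs"
    " from (select item_id, count(distinct session_id) as numsessions"
    "       from view_log"
    "       group by item_id) vl"
    " order by vl.item_id"
).fetchall()

for item_id, numsessions, cs in shares:
    print(item_id, numsessions, round(cs, 4))
```

Here i1 has 2 distinct sessions and i2 has 1, so the shares are 2/3 and 1/3. Note that the denominator is the sum of the per-item counts: a session that viewed several items is counted once for each of them.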