Query with partition and count

Given the following table (it records users' item-view history together with their sessions):

create table view_log (
  server_time timestamp,
  device char(2),
  session_id char(10),
  uid char(7),
  item_id char(7)
);

I am trying to understand what the following code does:

create table coo_cs as
select
  item_id,
  session_id,
  count(distinct session_id) / (sum(count(distinct session_id)) over (partition by item_id)) cs
from view_log
group by item_id, session_id;

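I tried to pull the partition expression apart to see what it does, but then I got the error: DISTINCT is not implemented for window functions

I understand basic partition by and group by, but I cannot make sense of the SQL above.

There is a fairly large test data set: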

http://pakdd2017.recobell.io/site_view_log_small.csv000.gz

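Some databases do not (yet) support count(distinct) as a window function. For this query, count(distinct) is not needed, because you are aggregating by the same column that is used inside count(distinct). As a result, count(distinct session_id) is 1 on every row (as long as session_id is not NULL).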
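To see why that count is always 1, here is a minimal sketch. It assumes SQLite driven through Python's sqlite3 module rather than your actual database, and the four rows are made up for illustration.

```python
import sqlite3

# Hypothetical sample rows: (server_time, device, session_id, uid, item_id).
rows = [
    ("2017-01-01 10:00:00", "PC", "s1", "u1", "itemA"),
    ("2017-01-01 10:01:00", "PC", "s1", "u1", "itemA"),  # same session views itemA again
    ("2017-01-01 10:02:00", "MO", "s2", "u2", "itemA"),
    ("2017-01-01 10:03:00", "MO", "s2", "u2", "itemB"),
]

conn = sqlite3.connect(":memory:")
conn.execute("""
    create table view_log (
      server_time timestamp,
      device char(2),
      session_id char(10),
      uid char(7),
      item_id char(7)
    )
""")
conn.executemany("insert into view_log values (?, ?, ?, ?, ?)", rows)

# Grouping by session_id means each group holds exactly one session_id value,
# so count(distinct session_id) cannot exceed 1, however many views there are.
per_group = conn.execute("""
    select item_id, session_id,
           count(*) as views,
           count(distinct session_id) as distinct_sessions
    from view_log
    group by item_id, session_id
    order by item_id, session_id
""").fetchall()

for row in per_group:
    print(row)
```

The views column varies with how often a session looked at an item, while distinct_sessions stays at 1 for every (item_id, session_id) group.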

So your query is essentially:

select item_id, session_id,
       1.0 / count(session_id) over (partition by item_id) as cs
from view_log
group by item_id, session_id;
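In other words, each (item_id, session_id) pair gets the weight 1 / (number of distinct sessions that viewed the item). A small sketch of that, assuming SQLite 3.25 or later (window-function support) via Python's sqlite3 module, a trimmed-down two-column version of the table, and invented rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Trimmed-down, hypothetical version of view_log: only the columns the query uses.
conn.execute("create table view_log (session_id char(10), item_id char(7))")
conn.executemany(
    "insert into view_log values (?, ?)",
    [
        ("s1", "itemA"),
        ("s1", "itemA"),  # repeat view in the same session, collapsed by group by
        ("s2", "itemA"),
        ("s2", "itemB"),
    ],
)

# The window function runs after the group by, so it counts one row
# per (item_id, session_id) pair within each item_id partition.
cs_rows = conn.execute("""
    select item_id, session_id,
           1.0 / count(session_id) over (partition by item_id) as cs
    from view_log
    group by item_id, session_id
    order by item_id, session_id
""").fetchall()

for row in cs_rows:
    print(row)
```

Here itemA was seen in two sessions, so each of its rows gets 0.5, and itemB was seen in one session, so its single row gets 1.0. The cs values for one item always add up to 1.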

I would not be surprised if what you actually want is the ratio at the item_id level, in which case the intended query would be:

select item_id, count(distinct session_id) as numsessions,
       count(distinct session_id) * 1.0 / sum(count(distinct session_id)) over () as cs
from view_log
group by item_id;

If so, the equivalent logic can be written with a subquery:

select vl.*, numsessions * 1.0 / sum(numsessions) over () as cs
from (select item_id, count(distinct session_id) as numsessions
      from view_log vl
      group by item_id
     ) vl;
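As a rough check that the nested-aggregate form and the subquery form agree, here is a sketch under the same assumptions as before: SQLite 3.25 or later through Python's sqlite3 module, a two-column stand-in for view_log, and made-up rows. The inner alias s is my own naming.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical two-column stand-in for view_log.
conn.execute("create table view_log (session_id char(10), item_id char(7))")
conn.executemany(
    "insert into view_log values (?, ?)",
    [
        ("s1", "itemA"),
        ("s1", "itemA"),  # repeat view, counted once by count(distinct ...)
        ("s2", "itemA"),
        ("s3", "itemA"),
        ("s2", "itemB"),
    ],
)

# Form 1: aggregate nested inside a window function, one row per item_id.
nested = conn.execute("""
    select item_id, count(distinct session_id) as numsessions,
           count(distinct session_id) * 1.0
             / sum(count(distinct session_id)) over () as cs
    from view_log
    group by item_id
    order by item_id
""").fetchall()

# Form 2: aggregate first in a subquery, then apply the window function.
subquery = conn.execute("""
    select s.item_id, s.numsessions,
           s.numsessions * 1.0 / sum(s.numsessions) over () as cs
    from (select item_id, count(distinct session_id) as numsessions
          from view_log
          group by item_id
         ) s
    order by s.item_id
""").fetchall()

print(nested)
print(subquery)
```

With these rows itemA has 3 distinct sessions and itemB has 1, so the shares are 0.75 and 0.25 in both forms. The subquery version is the safer choice on a database that rejects aggregates nested inside window functions.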