HIVE or SQL query to compare pre and post sales for the same sample size


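I have a table containing employee IDs (string), performance rating (int), date (string), and an account flag (string) that shows whether the employee has subscribed (account = 'yes' after subscribing, 'no' before subscribing).

Different employees subscribe on different dates.

pre = before subscribing, post = after subscribing

I need to calculate the sum of performance ratings for the 4 days before they subscribed, and the sum of performance ratings for the 4 days post subscription, counting from the day they subscribed.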

------------1-----2----3---4---|---4---3---2---- 1------------

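The table looks like this (the blank lines between days are only there for readability). The table contains one transaction row per employee per day.

(The code snippet formatting is only used so that the table structure displays clearly.)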

employeeId | performance rating | account | date
-----------+--------------------+---------+----------
sam        | 3.2                | no      | 2013-9-15
charlie    | 3.1                | no      | 2013-9-15
john       | 2.1                | no      | 2013-9-15

sam        | 4.1                | yes     | 2013-9-16
charlie    | 5.1                | no      | 2013-9-16
john       | 2.1                | no      | 2013-9-16

sam        | 5.3                | yes     | 2013-9-17
charlie    | 1.4                | no      | 2013-9-17
john       | 6.3                | yes     | 2013-9-17

sam        | 5.3                | yes     | 2013-9-18
charlie    | 1.4                | no      | 2013-9-18
john       | 6.3                | yes     | 2013-9-18

sam        | 5.3                | yes     | 2013-9-19
charlie    | 1.4                | yes     | 2013-9-19
john       | 8.3                | yes     | 2013-9-19

sam        | 6.3                | yes     | 2013-9-20
charlie    | 7.4                | yes     | 2013-9-20
john       | 9.3                | yes     | 2013-9-20


Expected output (the numbers are just sample values, not calculated):

DAY            sum performance rating
pre 1st day    10.0
pre 2nd day    13.9
pre 3rd day    24.9
pre 4th day    12.4       
post 1st day   16.8
post 2nd day   14.6
post 3rd day   17.2
post 4th day   12.8

Any help is appreciated. I have tried many approaches but still can't figure it out.

You can do this in almost standard SQL, using a subquery and a non-equi join:

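-- "table" and "rating" are placeholders for your table name and the performance rating column
select ta.employeeId,
       -- use sum() instead of avg() if you want totals rather than averages
       avg(case when t2.date < ta.acctdate then t2.rating end) as beforeRating,
       avg(case when t2.date > ta.acctdate then t2.rating end) as afterRating
from (select t.employeeId, min(t.date) as acctdate
      from table t
      where t.account = 'yes'  -- first 'yes' row = subscription date
      group by t.employeeId
     ) ta join
     table t2
     on ta.employeeId = t2.employeeId and
        t2.date between ta.acctdate - 4 and ta.acctdate + 4 -- Note:  date arithmetic depends on the database
group by ta.employeeId;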

The only non-standard part is the date arithmetic. You need to express it in whatever way your database supports.
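As a rough illustration of what "database-specific date arithmetic" can look like, here is a minimal sketch that uses Python's built-in sqlite3 module purely as a stand-in engine. That choice is an assumption made so the example runs anywhere; the question targets Hive, where the built-in `date_sub` and `date_add` functions fill the same role. The helper `to_iso` is hypothetical and only exists because the sample dates are not zero-padded.

```python
import sqlite3
from datetime import datetime


def to_iso(raw: str) -> str:
    """Normalize a date such as '2013-9-5' to zero-padded '2013-09-05'."""
    return datetime.strptime(raw, "%Y-%m-%d").date().isoformat()


conn = sqlite3.connect(":memory:")
acctdate = to_iso("2013-9-17")  # example subscription date from the sample data

# SQLite spells "acctdate - 4" and "acctdate + 4" with date() modifiers.
window_start, window_end = conn.execute(
    "select date(?, '-4 days'), date(?, '+4 days')", (acctdate, acctdate)
).fetchone()

print(window_start, window_end)
conn.close()
```

Zero-padding matters beyond the arithmetic itself: padded ISO strings also sort correctly as plain text, which `min(date)` and `between` rely on when the column is stored as a string.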

EDIT:

If you want the results by number of days before or after the subscription date rather than by employee:

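-- In Hive and MySQL, datediff(a, b) returns a minus b in days, so positive
-- values are days before the subscription date and negative values are days after.
select datediff(ta.acctdate, t2.date), avg(t2.rating) as avgrating
from (select t.employeeId, min(t.date) as acctdate
      from table t
      where t.account = 'yes'  -- first 'yes' row = subscription date
      group by t.employeeId
     ) ta join
     table t2
     on ta.employeeId = t2.employeeId and
        t2.date between ta.acctdate - 4 and ta.acctdate + 4 -- Note:  date arithmetic depends on the database
group by datediff(ta.acctdate, t2.date);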
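
To see the per-day grouping end to end, here is a small runnable sketch that loads the sample rows from the question into an in-memory SQLite database through Python's sqlite3 module. Several things here are assumptions rather than part of the answer above: the table name `ratings`, the column name `rating`, the helpers `to_iso` and `label`, the use of `sum` (as the question asks) instead of `avg`, and the choice to count the subscription day itself as post day 1.

```python
import sqlite3
from datetime import datetime

# Sample rows from the question: (employeeId, rating, account, date)
ROWS = [
    ("sam", 3.2, "no", "2013-9-15"), ("charlie", 3.1, "no", "2013-9-15"), ("john", 2.1, "no", "2013-9-15"),
    ("sam", 4.1, "yes", "2013-9-16"), ("charlie", 5.1, "no", "2013-9-16"), ("john", 2.1, "no", "2013-9-16"),
    ("sam", 5.3, "yes", "2013-9-17"), ("charlie", 1.4, "no", "2013-9-17"), ("john", 6.3, "yes", "2013-9-17"),
    ("sam", 5.3, "yes", "2013-9-18"), ("charlie", 1.4, "no", "2013-9-18"), ("john", 6.3, "yes", "2013-9-18"),
    ("sam", 5.3, "yes", "2013-9-19"), ("charlie", 1.4, "yes", "2013-9-19"), ("john", 8.3, "yes", "2013-9-19"),
    ("sam", 6.3, "yes", "2013-9-20"), ("charlie", 7.4, "yes", "2013-9-20"), ("john", 9.3, "yes", "2013-9-20"),
]


def to_iso(raw: str) -> str:
    """Zero-pad dates like '2013-9-15' so SQLite can parse and compare them."""
    return datetime.strptime(raw, "%Y-%m-%d").date().isoformat()


def label(day_offset: int) -> str:
    """Offset 0 is the subscription day, treated here as post day 1."""
    return f"pre day {-day_offset}" if day_offset < 0 else f"post day {day_offset + 1}"


QUERY = """
with acct as (              -- first 'yes' row per employee = subscription date
    select employeeId, min(date) as acctdate
    from ratings
    where account = 'yes'
    group by employeeId
),
offsets as (                -- whole days between each row and the subscription date
    select cast(julianday(r.date) - julianday(a.acctdate) as integer) as day_offset,
           r.rating
    from acct a
    join ratings r on r.employeeId = a.employeeId
)
select day_offset, sum(rating) as total
from offsets
where day_offset between -4 and 3   -- 4 days before, 4 days starting at subscription
group by day_offset
order by day_offset
"""

conn = sqlite3.connect(":memory:")
conn.execute("create table ratings (employeeId text, rating real, account text, date text)")
conn.executemany(
    "insert into ratings values (?, ?, ?, ?)",
    [(emp, rating, account, to_iso(day)) for emp, rating, account, day in ROWS],
)

totals = {label(day_offset): total for day_offset, total in conn.execute(QUERY)}
for name, total in totals.items():
    print(f"{name:<12}{total:.1f}")
conn.close()
```

With this sample, only charlie has four days of history before subscribing, so the totals at different offsets cover different numbers of employees. If the sample size has to match on both sides, adding `count(*)` per offset to the final select is one way to check that.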