HIVE or SQL query to compare pre and post sales for same sample size
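I have a table containing employeeIDs (string), performance rating (int), date (string), and a flag account (string) that shows whether the employee has subscribed (account = 'yes' after subscribing, 'no' before subscribing).
Different employees subscribe on different dates.
pre = before subscribing
post = after subscribing
I need to calculate the sum of performance ratings for the 4 days before they subscribed, and the sum of performance ratings for the 4 days post, counted from the date they subscribed.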
------------1-----2----3---4---|---4---3---2---- 1------------
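The table looks like the one below (the space left between days is only for readability).
The table contains one transaction row per employee per day.
(The code snippets are only there so that the table structure displays clearly.)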
    employeeId | performance rating | account | date
    --------------------------------------------------------
    sam          3.2                  no        2013-9-15
    charlie      3.1                  no        2013-9-15
    john         2.1                  no        2013-9-15

    sam          4.1                  yes       2013-9-16
    charlie      5.1                  no        2013-9-16
    john         2.1                  no        2013-9-16

    sam          5.3                  yes       2013-9-17
    charlie      1.4                  no        2013-9-17
    john         6.3                  yes       2013-9-17

    sam          5.3                  yes       2013-9-18
    charlie      1.4                  no        2013-9-18
    john         6.3                  yes       2013-9-18

    sam          5.3                  yes       2013-9-19
    charlie      1.4                  yes       2013-9-19
    john         8.3                  yes       2013-9-19

    sam          6.3                  yes       2013-9-20
    charlie      7.4                  yes       2013-9-20
    john         9.3                  yes       2013-9-20
Desired output (the numbers are only sample values, not calculated):
    DAY             sum performance rating
    pre 1st day     10.0
    pre 2nd day     13.9
    pre 3rd day     24.9
    pre 4th day     12.4
    post 1st day    16.8
    post 2nd day    14.6
    post 3rd day    17.2
    post 4th day    12.8
Any help is appreciated. I have tried many approaches but still cannot figure it out.
You can do this in almost-standard SQL using a subquery and a non-equijoin:
    -- `table` and `rating` are placeholders for your table and rating column
    select ta.employeeId,
           avg(case when t2.date < ta.acctdate then t2.rating end) as beforeRating,
           avg(case when t2.date > ta.acctdate then t2.rating end) as afterRating
    from (select t.employeeId, min(t.date) as acctdate  -- first date flagged as subscribed
          from table t
          where t.account = 'yes'
          group by t.employeeId
         ) ta join
         table t2
         on ta.employeeId = t2.employeeId and
            t2.date between ta.acctdate - 4 and ta.acctdate + 4 -- Note: date arithmetic depends on the database
    group by ta.employeeId;
The only nonstandard part is the date arithmetic. You need to express it in whatever way suits your database.
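To make the date arithmetic concrete, here is a minimal sketch that runs the per-employee query against the question's sample rows with Python's built-in sqlite3 module. SQLite is only a stand-in so that the example runs anywhere: its julianday() function plays the role of the database-specific date arithmetic, and in Hive the built-in date_sub, date_add and datediff functions would be the natural equivalents. The table name ratings, the column name rating and the iso() helper are assumptions made for illustration. The helper zero-pads dates such as 2013-9-15, because unpadded date strings do not sort correctly as text.

```python
import sqlite3
from datetime import datetime

# Sample rows from the question: (employeeId, rating, account, date).
ROWS = [
    ("sam", 3.2, "no", "2013-9-15"), ("charlie", 3.1, "no", "2013-9-15"), ("john", 2.1, "no", "2013-9-15"),
    ("sam", 4.1, "yes", "2013-9-16"), ("charlie", 5.1, "no", "2013-9-16"), ("john", 2.1, "no", "2013-9-16"),
    ("sam", 5.3, "yes", "2013-9-17"), ("charlie", 1.4, "no", "2013-9-17"), ("john", 6.3, "yes", "2013-9-17"),
    ("sam", 5.3, "yes", "2013-9-18"), ("charlie", 1.4, "no", "2013-9-18"), ("john", 6.3, "yes", "2013-9-18"),
    ("sam", 5.3, "yes", "2013-9-19"), ("charlie", 1.4, "yes", "2013-9-19"), ("john", 8.3, "yes", "2013-9-19"),
    ("sam", 6.3, "yes", "2013-9-20"), ("charlie", 7.4, "yes", "2013-9-20"), ("john", 9.3, "yes", "2013-9-20"),
]


def iso(d):
    """Hypothetical helper: turn '2013-9-15' into '2013-09-15'."""
    return datetime.strptime(d, "%Y-%m-%d").strftime("%Y-%m-%d")


conn = sqlite3.connect(":memory:")
conn.execute("create table ratings (employeeId text, rating real, account text, date text)")
conn.executemany(
    "insert into ratings values (?, ?, ?, ?)",
    [(e, r, a, iso(d)) for e, r, a, d in ROWS],
)

# acctdate = first date on which the employee is flagged as subscribed.
# julianday() differences give whole-day offsets; this is the SQLite-specific part.
QUERY = """
select ta.employeeId,
       avg(case when t2.date < ta.acctdate then t2.rating end) as beforeRating,
       avg(case when t2.date > ta.acctdate then t2.rating end) as afterRating
from (select employeeId, min(date) as acctdate
      from ratings
      where account = 'yes'
      group by employeeId) ta
join ratings t2
  on ta.employeeId = t2.employeeId
where julianday(t2.date) - julianday(ta.acctdate) between -4 and 4
group by ta.employeeId
order by ta.employeeId
"""

result = {emp: (before, after) for emp, before, after in conn.execute(QUERY)}
for emp, (before, after) in result.items():
    print(emp, before, after)
```

As in the query above, the subscription day itself falls into neither average, because the comparisons are strict. On the sample rows, charlie comes out at roughly 2.75 before and 7.4 after.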
EDIT:
If you want the results by day rather than by employee:
    -- datediff(ta.acctdate, t2.date) is positive for days before the subscription date
    select datediff(ta.acctdate, t2.date), avg(t2.rating) as avgrating
    from (select t.employeeId, min(t.date) as acctdate  -- first date flagged as subscribed
          from table t
          where t.account = 'yes'
          group by t.employeeId
         ) ta join
         table t2
         on ta.employeeId = t2.employeeId and
            t2.date between ta.acctdate - 4 and ta.acctdate + 4 -- Note: date arithmetic depends on the database
    group by datediff(ta.acctdate, t2.date);
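The by-day grouping can be sketched end to end in the same way. The example below is an illustration under stated assumptions, not a definitive implementation: it uses Python's sqlite3 as a stand-in database, sums the ratings (as the question asks) instead of averaging them, and treats the subscription day as offset 0, so the pre window is offsets -4 to -1 and the post window is offsets 0 to 3. The names ratings, rating, day_offset and iso() are hypothetical.

```python
import sqlite3
from datetime import datetime

# Sample rows from the question: (employeeId, rating, account, date).
ROWS = [
    ("sam", 3.2, "no", "2013-9-15"), ("charlie", 3.1, "no", "2013-9-15"), ("john", 2.1, "no", "2013-9-15"),
    ("sam", 4.1, "yes", "2013-9-16"), ("charlie", 5.1, "no", "2013-9-16"), ("john", 2.1, "no", "2013-9-16"),
    ("sam", 5.3, "yes", "2013-9-17"), ("charlie", 1.4, "no", "2013-9-17"), ("john", 6.3, "yes", "2013-9-17"),
    ("sam", 5.3, "yes", "2013-9-18"), ("charlie", 1.4, "no", "2013-9-18"), ("john", 6.3, "yes", "2013-9-18"),
    ("sam", 5.3, "yes", "2013-9-19"), ("charlie", 1.4, "yes", "2013-9-19"), ("john", 8.3, "yes", "2013-9-19"),
    ("sam", 6.3, "yes", "2013-9-20"), ("charlie", 7.4, "yes", "2013-9-20"), ("john", 9.3, "yes", "2013-9-20"),
]


def iso(d):
    """Hypothetical helper: zero-pad '2013-9-15' to '2013-09-15'."""
    return datetime.strptime(d, "%Y-%m-%d").strftime("%Y-%m-%d")


conn = sqlite3.connect(":memory:")
conn.execute("create table ratings (employeeId text, rating real, account text, date text)")
conn.executemany(
    "insert into ratings values (?, ?, ?, ?)",
    [(e, r, a, iso(d)) for e, r, a, d in ROWS],
)

# day_offset = row date minus subscription date, in whole days.
# Assumption: offsets -4..-1 are "pre", offsets 0..3 are "post".
QUERY = """
select cast(round(julianday(t2.date) - julianday(ta.acctdate)) as integer) as day_offset,
       sum(t2.rating) as sum_rating
from (select employeeId, min(date) as acctdate
      from ratings
      where account = 'yes'
      group by employeeId) ta
join ratings t2
  on ta.employeeId = t2.employeeId
where julianday(t2.date) - julianday(ta.acctdate) between -4 and 3
group by day_offset
order by day_offset
"""

sums = {offset: total for offset, total in conn.execute(QUERY)}
for offset, total in sums.items():
    phase = "pre" if offset < 0 else "post"
    print(f"{phase} offset {offset:+d}: {total:.1f}")
```

Employees who lack data for part of the window simply contribute nothing to those offsets, so each sum can cover a different number of employees. With the sample rows, offset -4 contains only charlie, while offsets -1, 0 and +1 contain all three employees.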