Calculating Rolling Weekly Spend in Hive using Window Functions

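I need to build a distribution of customers' week-long spend. Each time a customer makes a purchase, I want to know how much they have spent with us over the past week. I would like to do this in my Hive code.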

My dataset looks something like this:

Spend_Table

Cust_ID | Purch_Date | Purch_Amount  
1 | 1/1/19 |   
1 | 1/2/19 |   
1 | 1/3/19 |   
1 | 1/4/19 |   
1 | 1/5/19 |   
1 | 1/6/19 |   
1 | 1/7/19 |   
2 | 1/1/19 |   
2 | 1/2/19 |   
2 | 1/3/19 |   
2 | 1/5/19 |   
2 | 1/7/19 |   
2 | 1/9/19 |   
2 | 1/11/19 |   

So far, I have tried code like this:

Select Cust_ID, 
Purch_Date, 
Purch_Amount,
sum(Purch_Amount) over (partition by Cust_ID order by unix_timestamp(Purch_Date) range between 604800 preceding and current row) as Rolling_Spend
from Spend_Table



Cust_ID | Purch_Date | Purch_Amount | Rolling_Spend  
1 | 1/1/19 |  |   
1 | 1/2/19 |  |   
1 | 1/3/19 |  |   
1 | 1/4/19 |  |   
1 | 1/5/19 |  |   
1 | 1/6/19 |  | 4  
1 | 1/7/19 |  | 5  
2 | 1/1/19 |  |   
2 | 1/2/19 |  |   
2 | 1/3/19 |  |   
2 | 1/5/19 |  | 8  
2 | 1/7/19 |  | 0  
2 | 1/9/19 |  | 8  
2 | 1/11/19 |  | 8  

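I think the problem is in my range between clause, because it seems to be picking up a number of preceding rows. I expected it to pick up data within the preceding number of seconds (604800 is 7 days in seconds).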

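Is what I am trying to do feasible? I can't just use the 6 preceding rows, because not every customer makes a purchase every day, as with customer 2. Any help is much appreciated!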

-- Assumes Purch_Date can be cast to a timestamp (e.g. stored as yyyy-MM-dd)
SELECT *,
       sum(Purch_Amount) OVER (
           PARTITION BY Cust_ID
           ORDER BY CAST(Purch_Date AS timestamp)
           RANGE BETWEEN INTERVAL 7 DAYS PRECEDING AND CURRENT ROW
       ) AS cumulativeSum
FROM Spend_Table

https://cwiki.apache.org/confluence/display/Hive/LanguageManual+WindowingAndAnalytics
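
If the INTERVAL frame is not accepted in your environment, one possible alternative is to order by a whole-day number and use a plain numeric RANGE. This is only a sketch: it assumes Purch_Date is a string such as 1/11/19 (the 'M/d/yy' pattern is a guess based on the sample table), and it reuses the Rolling_Spend alias from the question. RANGE bounds are inclusive, so 6 PRECEDING covers the purchase day plus the six days before it, while 7 PRECEDING would also pick up a purchase made exactly one week earlier.

```sql
-- Sketch only: the 'M/d/yy' pattern is an assumption based on the sample rows.
SELECT Cust_ID,
       Purch_Date,
       Purch_Amount,
       SUM(Purch_Amount) OVER (
           PARTITION BY Cust_ID
           -- Days since 1970-01-01, so the frame offset is counted in days.
           ORDER BY datediff(
               to_date(from_unixtime(unix_timestamp(Purch_Date, 'M/d/yy'))),
               '1970-01-01'
           )
           -- Current day plus the 6 days before it (bounds are inclusive).
           RANGE BETWEEN 6 PRECEDING AND CURRENT ROW
       ) AS Rolling_Spend
FROM Spend_Table;
```

Because the frame is defined on the day number rather than on row positions, customers who skip days (like customer 2) still get a window of the same calendar width.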

Moving the answer from the question to here:

I was able to get the original code to work by changing it to:

Select Cust_ID, 
Purch_Date, 
Purch_Amount,
sum(Purch_Amount) over (partition by Cust_ID order by unix_timestamp(Purch_Date, 'MM-dd-yyyy') range between 604800 preceding and current row) as Rolling_Spend
from Spend_Table

The key was specifying the date format in the unix_timestamp function.
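
A quick way to check whether the date strings are actually being parsed is to compare the one-argument and two-argument forms of unix_timestamp side by side. This is a sketch rather than a definitive test: the 'MM-dd-yyyy' pattern is the one used above and should be swapped for whatever matches the stored strings, and the two column aliases are illustrative names. The one-argument form expects yyyy-MM-dd HH:mm:ss, so strings in another layout will not yield a usable timestamp, which leaves the window with nothing meaningful to order by.

```sql
-- Sketch: ts_default and ts_with_pattern are illustrative aliases.
SELECT Purch_Date,
       unix_timestamp(Purch_Date)               AS ts_default,      -- expects yyyy-MM-dd HH:mm:ss
       unix_timestamp(Purch_Date, 'MM-dd-yyyy') AS ts_with_pattern  -- explicit pattern
FROM Spend_Table
LIMIT 10;
```

If ts_with_pattern shows increasing second counts for later dates, the RANGE frame in seconds has a real ordering key to work with.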