Oracle: Calculate count() over the past 6-month interval for each row

I have the following data (it runs from 2017 to the present):

SELECT * FROM TABLE1 WHERE DATE > TO_DATE('01/01/2019','MM/DD/YYYY')

 Emp_ID         Date            Vehicle_ID        Working_Hours
 1005          01/01/2019         X500               7
 1005          01/02/2019         X500               6
 1005          01/03/2019         X700               7
 1005          01/04/2019         X500               5
 1005          01/05/2019         X700               7
 1005          01/06/2019         X500               7
 1006          01/01/2019         X500               7
 1006          01/02/2019         X500               6
 1006          01/03/2019         X700               7
 1006          01/04/2019         X500               5
 1006          01/05/2019         X700               7
 1006          01/06/2019         X500               7

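I need to calculate two columns:

LAST_6M_UNIQ_Vehicle_Count ==> the count of unique vehicle IDs for that employee over the past 6 months
LAST_6M_Vehicle_Count ==> the count of all vehicle IDs for that employee over the past 6 months

Note: the past 6 months are counted back from the value in the Date column of each row.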

Expected output:

 Emp_ID         Date            Vehicle_ID        Working_Hours     LAST_6M_UNIQ_Vehicle_Count     LAST_6M_Vehicle_Count
 1005          01/01/2019         X500               7                      6                       66
 1005          01/02/2019         X500               6                      7                       62
 1005          01/03/2019         X700               7                      6                       63
 1005          01/04/2019         X500               5                      7                       67
 1005          01/05/2019         X700               7                      7                       66
 1005          01/06/2019         X500               7                      7                       67
  .               .                .                 .
  .               .                .                 .
  .               .                .                 .
 1005          03/20/2019         X600               6                      12                      75
 1006          01/01/2019         X500               7                      11                      74
 1006          01/02/2019         X500               6                      10                      66
 1006          01/03/2019         X700               7                      11                      72
 1006          01/04/2019         X500               5                      13                      67
 1006          01/05/2019         X700               7                      12                      64
 1006          01/06/2019         X500               7                      12                      63

For example, in the first row LAST_6M_UNIQ_Vehicle_Count is 6 because employee 1005 has 6 distinct vehicle IDs between ((01/01/2019) - 6 months) and 01/01/2019.

I tried OVER with PARTITION BY, but it is missing the 6-month interval:

 SELECT t.*, COUNT(DISTINCT t.VEHICLE_ID) OVER (PARTITION BY t.EMP_ID ORDER BY t.DATE) 
        AS LAST_6M_UNIQ_Vehicle_Count
        FROM TABLE1 t

I am unable to calculate the value based on a 6-month interval for each row.

Any help is much appreciated.

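You can do this with window functions and a range frame specification.

Computing the distinct count is a little tricky: Oracle does not support it directly, but we can proceed in two steps. First perform a window count within employee/vehicle partitions, then consider only the first occurrence of each vehicle within the employee partition.

So: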

select vehicle_id, emp_id, "DATE",
    sum(case when flag = 1 then 1 else 0 end) over(
        partition by emp_id
            order by "DATE"
            range between interval '6' month preceding and current row
    ) as last_6m_uniq_vehicle_count,
    count(*) over (
        partition by emp_id 
        order by "DATE"
        range between interval '6' month preceding and current row
    ) as last_6m_vehicle_count
from (
    select t.*, 
        count(*) over (
            partition by emp_id , vehicle_id
            order by "DATE"
            range between interval '6' month preceding and current row
        ) as flag
    from table_name t
) t
order by "DATE", vehicle_id
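
The two-step count treats a vehicle as new only on rows where flag = 1, so it can undercount when a vehicle's first appearance has already dropped out of the current row's window while a later appearance is still inside it. Below is a minimal sketch for checking this on your own data, assuming the same table_name and columns as the query above. The aliases two_step_count and brute_force_count are made up for illustration, and ADD_MONTHS is used as an approximation of the window's lower bound:

```sql
-- Side-by-side comparison: two-step window count vs. a brute-force distinct count.
-- table_name, emp_id, vehicle_id and "DATE" come from the query above;
-- two_step_count and brute_force_count are illustrative aliases.
select t.emp_id, t."DATE", t.vehicle_id,
       sum(case when t.flag = 1 then 1 else 0 end) over (
           partition by t.emp_id
           order by t."DATE"
           range between interval '6' month preceding and current row
       ) as two_step_count,
       (select count(distinct c.vehicle_id)
        from table_name c
        where c.emp_id = t.emp_id
          and c."DATE" <= t."DATE"
          -- assumption: ADD_MONTHS stands in for the window's lower bound
          and c."DATE" >= add_months(t."DATE", -6)
       ) as brute_force_count
from (
    select x.*,
           count(*) over (
               partition by x.emp_id, x.vehicle_id
               order by x."DATE"
               range between interval '6' month preceding and current row
           ) as flag
    from table_name x
) t
order by t.emp_id, t."DATE", t.vehicle_id;
```

Rows where the two columns differ are the ones to inspect before relying on the window-only version.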

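Oracle does not accept COUNT( DISTINCT ... ) OVER ( ... ) in a windowed analytic function with a range; it raises an ORA-30487: ORDER BY not allowed here exception (otherwise, that would be the solution). The analytic function works without the DISTINCT keyword, but not with it.

Instead, you can use a correlated sub-query: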
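
To make the restriction concrete, here is a small sketch against the table_name used in this answer (the alias all_time_uniq_vehicles is only illustrative). DISTINCT is accepted when the analytic clause contains nothing but PARTITION BY, but that counts over the employee's whole history rather than a 6-month window:

```sql
-- Accepted: DISTINCT with only PARTITION BY (counts over the entire partition,
-- so it does not answer the 6-month question).
SELECT t.*,
       COUNT( DISTINCT t.vehicle_id ) OVER ( PARTITION BY t.emp_id )
         AS all_time_uniq_vehicles
FROM   table_name t;

-- Rejected with ORA-30487: adding ORDER BY, and therefore any RANGE or ROWS frame.
-- SELECT t.*,
--        COUNT( DISTINCT t.vehicle_id ) OVER (
--          PARTITION BY t.emp_id
--          ORDER     BY t."DATE"
--        )
-- FROM   table_name t;
```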

SELECT t.*,
       ( SELECT COUNT( DISTINCT vehicle_id )
         FROM   table_name c
         WHERE  c.emp_id = t.emp_id
         AND    c."DATE" <= t."DATE"
         AND    ADD_MONTHS( t."DATE", -6 ) <= c."DATE"
       ) AS last_6m_uniq_vehicle_count,
       COUNT(t.vehicle_id) OVER (
         PARTITION BY t.emp_id 
         ORDER     BY t."DATE"
         RANGE BETWEEN INTERVAL '6' MONTH PRECEDING
               AND     CURRENT ROW
      ) AS last_6m_vehicle_count
FROM  table_name t

Which, for the sample data:

CREATE TABLE table_name ( vehicle_id, emp_id, "DATE" ) AS
SELECT 1, 1, DATE '2020-08-31' FROM DUAL UNION ALL
SELECT 2, 1, DATE '2020-07-31' FROM DUAL UNION ALL
SELECT 1, 1, DATE '2020-06-30' FROM DUAL UNION ALL
SELECT 2, 1, DATE '2020-05-31' FROM DUAL UNION ALL
SELECT 2, 1, DATE '2020-04-30' FROM DUAL UNION ALL
SELECT 2, 1, DATE '2020-03-31' FROM DUAL UNION ALL
SELECT 2, 1, DATE '2020-02-29' FROM DUAL UNION ALL
SELECT 2, 1, DATE '2020-01-31' FROM DUAL UNION ALL
SELECT 3, 1, DATE '2020-01-31' FROM DUAL;

Outputs:

VEHICLE_ID | EMP_ID | DATE      | LAST_6M_UNIQ_VEHICLE_COUNT | LAST_6M_VEHICLE_COUNT
---------: | -----: | :-------- | -------------------------: | --------------------:
         2 |      1 | 31-JAN-20 |                          2 |                     2
         3 |      1 | 31-JAN-20 |                          2 |                     2
         2 |      1 | 29-FEB-20 |                          2 |                     3
         2 |      1 | 31-MAR-20 |                          2 |                     4
         2 |      1 | 30-APR-20 |                          2 |                     5
         2 |      1 | 31-MAY-20 |                          2 |                     6
         1 |      1 | 30-JUN-20 |                          3 |                     7
         2 |      1 | 31-JUL-20 |                          3 |                     8
         1 |      1 | 31-AUG-20 |                          2 |                     7

db<>fiddle here
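
The sample dates are all month ends, which exercises ADD_MONTHS's last-day-of-month rule: when the input is the last day of its month, the result is the last day of the target month. A short sketch of the lower bounds the sub-query computes (the column aliases are illustrative, and the mid-month date is an extra example that is not in the sample data):

```sql
SELECT ADD_MONTHS( DATE '2020-08-31', -6 ) AS from_aug_31,  -- 2020-02-29
       ADD_MONTHS( DATE '2020-06-30', -6 ) AS from_jun_30,  -- 2019-12-31
       ADD_MONTHS( DATE '2020-07-15', -6 ) AS from_jul_15   -- 2020-01-15
FROM   DUAL;
```

This is why the 31-AUG-20 row counts back only to 29-FEB-20, while the 30-JUN-20 row still includes both 31-JAN-20 rows.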

As MTO points out, count(distinct) cannot be used as a window function to solve this problem.

For that reason, I would go with a lateral join:

select t.*, l.*
from t cross join lateral
     (select count(*) as last_6m_vehicle_count,
             -- count vehicles from the inner rows (t2), not the outer row
             count(distinct t2.vehicle_id) as last_6m_uniq_vehicle_count
      from t t2
      where t2.emp_id = t.emp_id and
            t2.dte <= t.dte and
            t2.dte > add_months(t.dte, -6)
    ) l;

Here is a db<>fiddle.
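
LATERAL needs Oracle 12c or later. On older versions, one possible equivalent is a plain self-join with GROUP BY. This is a sketch under the same assumptions as the query above (a table t with emp_id, dte and vehicle_id); grouping by t.rowid is my own addition, so that duplicate rows stay separate:

```sql
select t.emp_id, t.dte, t.vehicle_id,
       count(*)                      as last_6m_vehicle_count,
       count(distinct t2.vehicle_id) as last_6m_uniq_vehicle_count
from t
join t t2
  on  t2.emp_id = t.emp_id
  and t2.dte   <= t.dte
  and t2.dte    > add_months(t.dte, -6)
-- every row matches at least itself, so the inner join drops nothing
group by t.rowid, t.emp_id, t.dte, t.vehicle_id;
```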