Summarizing temporal change in identifier occurrence

I have a table that holds job IDs (varchar) for each date and user ID.

User    Date        Job
mid1    2019-10-10  jid1
mid1    2019-10-10  jid2
mid1    2019-10-10  jid3
mid1    2019-10-10  jid4
mid1    2019-10-10  jid5
mid1    2019-10-11  jid3
mid1    2019-10-11  jid5
mid1    2019-10-11  jid6
mid1    2019-10-11  jid7
mid1    2019-10-11  jid8
mid1    2019-10-11  jid9
mid1    2019-10-12  jid3
mid1    2019-10-12  jid9
mid1    2019-10-12  jid10
mid2    2019-10-10  jid100
mid2    2019-10-10  jid101
mid2    2019-10-10  jid102
...

Now I need a table that contains, for each user and each date in the time series, the number of new ("Incoming") and finished ("Outgoing") jobs.

User    Date        Jobs  Incoming  Outgoing
mid1    2019-10-10  5     5         0
mid1    2019-10-11  6     4         3
mid1    2019-10-12  3     1         4
mid2    ...

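It would be nice if the query counted only unique job IDs (the data contains duplicates), but if not, I can remove the duplicates beforehand.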

Can this be done using Teradata SQL?

SELECT
   User
  ,Date
  ,Count(*) AS Jobs
   -- new jobs today
  ,Sum(firstdate) AS Incoming
   -- finished jobs today
  ,Sum(lastdate)
   -- finished jobs the day before
  ,Lag(Sum(lastdate),1,0) Over (PARTITION BY User ORDER BY Date) AS Outgoing
FROM
 (
   SELECT
      User
     ,Job
     ,Date 
      -- flag indicating job is present on the current day but absent the day before
     ,CASE WHEN Date =  Lag(Date) Over (PARTITION BY User, job ORDER BY Date) + 1 THEN 0 ELSE 1 END AS firstdate
      -- flag indicating job is present on the current day but absent the day after
     ,CASE WHEN Date = Lead(Date) Over (PARTITION BY User, job ORDER BY Date) - 1 THEN 0 ELSE 1 END AS lastdate
   FROM your_table
   -- to remove duplicate rows add
   -- GROUP BY 1,2,3
 ) AS dt
GROUP BY 1,2
ORDER BY 1,2
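
As a cross-check of the flag logic, here is a minimal Python sketch that mirrors the query on the mid1 sample rows from the question. The names `ROWS` and `summarize` are made up for this illustration, and the code is only a reference model of the logic, not something Teradata runs. It assumes that "previous row" means the previous date present for that user, which is how LAG behaves in the query above.

```python
from collections import defaultdict
from datetime import date, timedelta

# Sample rows for user mid1, copied from the question: (user, date, job).
ROWS = [
    ("mid1", date(2019, 10, 10), "jid1"),
    ("mid1", date(2019, 10, 10), "jid2"),
    ("mid1", date(2019, 10, 10), "jid3"),
    ("mid1", date(2019, 10, 10), "jid4"),
    ("mid1", date(2019, 10, 10), "jid5"),
    ("mid1", date(2019, 10, 11), "jid3"),
    ("mid1", date(2019, 10, 11), "jid5"),
    ("mid1", date(2019, 10, 11), "jid6"),
    ("mid1", date(2019, 10, 11), "jid7"),
    ("mid1", date(2019, 10, 11), "jid8"),
    ("mid1", date(2019, 10, 11), "jid9"),
    ("mid1", date(2019, 10, 12), "jid3"),
    ("mid1", date(2019, 10, 12), "jid9"),
    ("mid1", date(2019, 10, 12), "jid10"),
]


def summarize(rows):
    """Return {(user, day): (jobs, incoming, outgoing)}."""
    one_day = timedelta(days=1)
    present = set(rows)  # drops duplicate rows, like the optional GROUP BY 1,2,3
    jobs = defaultdict(int)
    first_seen = defaultdict(int)  # Sum(firstdate)
    last_seen = defaultdict(int)   # Sum(lastdate)
    for user, day, job in present:
        key = (user, day)
        jobs[key] += 1
        # firstdate flag: job is absent for this user on the day before
        if (user, day - one_day, job) not in present:
            first_seen[key] += 1
        # lastdate flag: job is absent for this user on the day after
        if (user, day + one_day, job) not in present:
            last_seen[key] += 1

    # Outgoing is Sum(lastdate) of the previous row of the same user,
    # with 0 for the first row (LAG with a default of 0).
    result = {}
    prev_user, prev_last = None, 0
    for key in sorted(jobs):
        user, _day = key
        outgoing = prev_last if user == prev_user else 0
        result[key] = (jobs[key], first_seen[key], outgoing)
        prev_user, prev_last = user, last_seen[key]
    return result


if __name__ == "__main__":
    for (user, day), counts in sorted(summarize(ROWS).items()):
        print(user, day, *counts)
```

For the sample rows this reproduces the Jobs, Incoming and Outgoing values of the requested result table for mid1.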

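If your Teradata version does not support LAG/LEAD (i.e. it is older than 16.10), you have to rewrite the query using a one-row window frame instead: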

SELECT
   User
  ,Date
  ,Count(*) AS Jobs
   -- new jobs today
  ,Sum(firstdate) AS Incoming
   -- finished jobs today
  ,Sum(lastdate)
   -- finished jobs the day before
  ,Coalesce(Min(Sum(lastdate)) Over (PARTITION BY User ORDER BY Date ROWS BETWEEN 1 Preceding AND 1 Preceding), 0) AS Outgoing
FROM
 (
   SELECT
      User
     ,Job
     ,Date 
      -- flag indicating job is present on the current day but absent the day before
     ,CASE WHEN Date = Min(Date) Over (PARTITION BY User, job ORDER BY Date ROWS BETWEEN 1 Preceding AND 1 Preceding) + 1 THEN 0 ELSE 1 END AS firstdate
      -- flag indicating job is present on the current day but absent the day after
     ,CASE WHEN Date = Min(Date) Over (PARTITION BY User, job ORDER BY Date ROWS BETWEEN 1 Following AND 1 Following ) - 1 THEN 0 ELSE 1 END AS lastdate
   FROM your_table
   -- to remove duplicate rows add
   -- GROUP BY 1,2,3
 ) AS dt
GROUP BY 1,2
ORDER BY 1,2
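
The rewrite relies on the fact that an aggregate over a frame of exactly one row returns that row's value, and NULL when the frame is empty. Below is a small Python sketch of that equivalence. The helpers `lag` and `min_over_frame` are hypothetical names for this illustration, `None` stands in for NULL, and the input list is assumed to be already ordered within one partition.

```python
def lag(values, default=None):
    """Model of LAG(x, 1, default): the value from the previous row."""
    return [default] + list(values[:-1]) if values else []


def min_over_frame(values, start, end):
    """Model of MIN(x) OVER (ROWS BETWEEN start AND end).

    start and end are row offsets relative to the current row
    (-1 = 1 PRECEDING, 1 = 1 FOLLOWING). An empty frame yields None (NULL).
    """
    out = []
    last = len(values) - 1
    for i in range(len(values)):
        lo = max(i + start, 0)
        hi = min(i + end, last)
        frame = values[lo:hi + 1] if lo <= hi else []
        out.append(min(frame) if frame else None)
    return out


def coalesce(values, default):
    """Model of COALESCE(x, default) applied to every row."""
    return [default if v is None else v for v in values]


if __name__ == "__main__":
    # Sum(lastdate) per day for mid1 in the sample data
    last_seen = [3, 4, 3]
    print(lag(last_seen, 0))
    print(coalesce(min_over_frame(last_seen, -1, -1), 0))
```

With offsets `1, 1` the same helper models LEAD, which is what the `1 Following AND 1 Following` frame does in the inner query.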