Summarizing temporal change in identifier occurrence
I have a table that holds a job ID (varchar) for each date and user ID.
User Date Job
mid1 2019-10-10 jid1
mid1 2019-10-10 jid2
mid1 2019-10-10 jid3
mid1 2019-10-10 jid4
mid1 2019-10-10 jid5
mid1 2019-10-11 jid3
mid1 2019-10-11 jid5
mid1 2019-10-11 jid6
mid1 2019-10-11 jid7
mid1 2019-10-11 jid8
mid1 2019-10-11 jid9
mid1 2019-10-12 jid3
mid1 2019-10-12 jid9
mid1 2019-10-12 jid10
mid2 2019-10-10 jid100
mid2 2019-10-10 jid101
mid2 2019-10-10 jid102
...
Now I need a table with the number of new ("Incoming") and finished ("Outgoing") jobs per user for each date in the time series.
User Date Jobs Incoming Outgoing
mid1 2019-10-10 5 5 0
mid1 2019-10-11 6 4 3
mid1 2019-10-12 3 1 4
mid2 ...
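It would be fine if the query counted only unique job IDs (the data contains duplicates); otherwise I can remove the duplicates beforehand.
Can this be done in Teradata SQL?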
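To make the requested numbers concrete, here is a minimal Python sketch (not Teradata code) that derives them with set differences between consecutive dates. The function name `summarize` is made up for illustration, and the sketch assumes that "previous day" means the previous date present in the data for that user, which coincides with the previous calendar day in the sample.

```python
from collections import defaultdict

# Sample rows for mid1 copied from the question: (user, date, job)
rows = (
    [("mid1", "2019-10-10", j) for j in ("jid1", "jid2", "jid3", "jid4", "jid5")]
    + [("mid1", "2019-10-11", j) for j in ("jid3", "jid5", "jid6", "jid7", "jid8", "jid9")]
    + [("mid1", "2019-10-12", j) for j in ("jid3", "jid9", "jid10")]
)

def summarize(rows):
    """Return (user, date, jobs, incoming, outgoing) per user and date."""
    jobs_by_day = defaultdict(set)  # sets drop duplicate job IDs
    for user, day, job in rows:
        jobs_by_day[(user, day)].add(job)

    result = []
    previous = {}  # user -> job set of that user's previous date
    for user, day in sorted(jobs_by_day):  # ISO date strings sort chronologically
        today = jobs_by_day[(user, day)]
        before = previous.get(user, set())
        incoming = len(today - before)  # present today, absent before
        outgoing = len(before - today)  # present before, absent today
        result.append((user, day, len(today), incoming, outgoing))
        previous[user] = today
    return result

summary = summarize(rows)
for line in summary:
    print(*line)
```

For the mid1 rows this reproduces the three result lines shown above.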
-- Note: USER and DATE are reserved words in Teradata; if these are your real
-- column names, write them double-quoted ("User", "Date").
SELECT
   User
  ,Date
  ,Count(*) AS Jobs
   -- new jobs today
  ,Sum(firstdate) AS Incoming
   -- jobs seen for the last time today
  ,Sum(lastdate)
   -- jobs that finished on the previous date, reported as Outgoing today
  ,Lag(Sum(lastdate),1,0) Over (PARTITION BY User ORDER BY Date) AS Outgoing
FROM
 (
   SELECT
      User
     ,Job
     ,Date
      -- flag: 1 if the job is present on the current day but absent the day before
     ,CASE WHEN Date = Lag(Date) Over (PARTITION BY User, Job ORDER BY Date) + 1 THEN 0 ELSE 1 END AS firstdate
      -- flag: 1 if the job is present on the current day but absent the day after
     ,CASE WHEN Date = Lead(Date) Over (PARTITION BY User, Job ORDER BY Date) - 1 THEN 0 ELSE 1 END AS lastdate
   FROM your_table
   -- to remove duplicate rows add
   -- GROUP BY 1,2,3
 ) AS dt
GROUP BY 1,2
ORDER BY 1,2
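The flag logic of this query can be checked outside the database. The following is a minimal Python sketch, not Teradata code: the helper names `flag_rows` and `aggregate` are made up for illustration, and `set(rows)` stands in for the optional `GROUP BY 1,2,3`. It mirrors the inner and outer query levels on the mid1 sample rows.

```python
from datetime import date, timedelta
from itertools import groupby

# Sample rows for mid1 from the question: (user, day, job)
rows = (
    [("mid1", date(2019, 10, 10), j) for j in ("jid1", "jid2", "jid3", "jid4", "jid5")]
    + [("mid1", date(2019, 10, 11), j) for j in ("jid3", "jid5", "jid6", "jid7", "jid8", "jid9")]
    + [("mid1", date(2019, 10, 12), j) for j in ("jid3", "jid9", "jid10")]
)

def flag_rows(rows):
    """Inner query: firstdate/lastdate flags per (user, job), ordered by day."""
    one_day = timedelta(days=1)
    ordered = sorted(set(rows), key=lambda r: (r[0], r[2], r[1]))
    flagged = []
    for _, group in groupby(ordered, key=lambda r: (r[0], r[2])):
        group = list(group)
        for i, (user, day, _job) in enumerate(group):
            prev_day = group[i - 1][1] if i > 0 else None               # LAG(Date)
            next_day = group[i + 1][1] if i + 1 < len(group) else None  # LEAD(Date)
            firstdate = 0 if prev_day is not None and day == prev_day + one_day else 1
            lastdate = 0 if next_day is not None and day == next_day - one_day else 1
            flagged.append((user, day, firstdate, lastdate))
    return flagged

def aggregate(flagged):
    """Outer query: counts per (user, day) plus the lagged lastdate sum."""
    flagged = sorted(flagged, key=lambda r: (r[0], r[1]))
    out = []
    prev_user, prev_last = None, 0
    for (user, day), group in groupby(flagged, key=lambda r: (r[0], r[1])):
        group = list(group)
        outgoing = prev_last if user == prev_user else 0  # LAG(..., 1, 0) per user
        incoming = sum(g[2] for g in group)
        out.append((user, day.isoformat(), len(group), incoming, outgoing))
        prev_user, prev_last = user, sum(g[3] for g in group)
    return out

result = aggregate(flag_rows(rows))
for line in result:
    print(*line)
```

Because the flags compare against the calendar day before and after, a job that skips a day is counted as outgoing after its last consecutive day and as incoming again when it reappears.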
If your Teradata version doesn't support LAG/LEAD (i.e. it is older than 16.10), you have to rewrite the query:
-- Note: USER and DATE are reserved words in Teradata; if these are your real
-- column names, write them double-quoted ("User", "Date").
SELECT
   User
  ,Date
  ,Count(*) AS Jobs
   -- new jobs today
  ,Sum(firstdate) AS Incoming
   -- jobs seen for the last time today
  ,Sum(lastdate)
   -- jobs that finished on the previous date, reported as Outgoing today
  ,Coalesce(Min(Sum(lastdate)) Over (PARTITION BY User ORDER BY Date ROWS BETWEEN 1 PRECEDING AND 1 PRECEDING), 0) AS Outgoing
FROM
 (
   SELECT
      User
     ,Job
     ,Date
      -- flag: 1 if the job is present on the current day but absent the day before
     ,CASE WHEN Date = Min(Date) Over (PARTITION BY User, Job ORDER BY Date ROWS BETWEEN 1 PRECEDING AND 1 PRECEDING) + 1 THEN 0 ELSE 1 END AS firstdate
      -- flag: 1 if the job is present on the current day but absent the day after
     ,CASE WHEN Date = Min(Date) Over (PARTITION BY User, Job ORDER BY Date ROWS BETWEEN 1 FOLLOWING AND 1 FOLLOWING) - 1 THEN 0 ELSE 1 END AS lastdate
   FROM your_table
   -- to remove duplicate rows add
   -- GROUP BY 1,2,3
 ) AS dt
GROUP BY 1,2
ORDER BY 1,2
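The rewrite relies on the fact that an aggregate over a frame of exactly one row returns that row's value, and that the frame is empty (NULL) at the partition boundary, which is why the outer level wraps it in Coalesce. A minimal Python sketch of that frame behaviour follows; `min_over_frame` is a made-up helper, `None` plays the role of NULL, and the input is the per-date Sum(lastdate) series for mid1.

```python
def min_over_frame(values, start_offset, end_offset):
    """MIN over rows i+start_offset .. i+end_offset for each position i.

    Returns None where the frame falls outside the partition, like NULL in SQL.
    """
    out = []
    last = len(values) - 1
    for i in range(len(values)):
        lo = max(i + start_offset, 0)
        hi = min(i + end_offset, last)
        frame = values[lo:hi + 1] if lo <= hi else []
        out.append(min(frame) if frame else None)
    return out

# Sum(lastdate) per date for mid1 in the sample data: 3, 4, 3
last_sums = [3, 4, 3]

# ROWS BETWEEN 1 PRECEDING AND 1 PRECEDING behaves like LAG(x)
lagged = min_over_frame(last_sums, -1, -1)
# ROWS BETWEEN 1 FOLLOWING AND 1 FOLLOWING behaves like LEAD(x)
led = min_over_frame(last_sums, 1, 1)
# Coalesce(..., 0) supplies the default that LAG(x, 1, 0) would give
outgoing = [0 if v is None else v for v in lagged]

print(lagged, led, outgoing)
```

The `outgoing` list matches the Outgoing column of the expected result for mid1.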