Merge employee history records if there is no change between the rows based on start date

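I am trying to merge employee history records, taking the minimum start date and the maximum end date whenever none of the other dimension columns (employee, department, job, position status) change.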

Input:

Output:

Script to create and populate the table:

create table EmployeeHistory (EmployeeHistoryID INT,
                              EmployeeID INT,
                              DepartmentID  INT,
                              JobID INT,
                              PositionStatusID  INT,
                              StartDate DATE,
                              EndDate DATE)

insert into EmployeeHistory values (123, 362880, 450, 243, 1, '2019-05-28', '2020-05-03')
insert into EmployeeHistory values (124, 362880, 450, 243, 2, '2020-05-04', '2020-08-20')
insert into EmployeeHistory values (125, 362880, 450, 243, 1, '2020-08-21', '2020-08-31')
insert into EmployeeHistory values (126, 362880, 450, 243, 1, '2020-09-01',  '2021-09-23')
insert into EmployeeHistory values (127, 362881, 450, 243, 1, '2019-07-01', '2019-07-31')
insert into EmployeeHistory values (128, 362881, 450, 243, 1, '2019-08-01',  '2021-09-23')

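When I use analytic functions or GROUP BY, rows 1, 3, and 4 get merged, but I only want to merge rows 3 and 4, because all of their other columns are the same. Although row 1 has the same values as rows 3 and 4, it should not be merged with them in this case, so that the history is preserved.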

Sample code I am using:

select distinct *
  from (select MAX(EmployeeHistoryID) OVER (PARTITION BY EmployeeID, DepartmentID, JobID, PositionStatusID)  AS EmployeeHistoryID,
               EmployeeID,
               DepartmentID,
               JobID,
               PositionStatusID,
               MIN(StartDate) OVER (PARTITION BY EmployeeID, DepartmentID, JobID, PositionStatusID)  AS StartDate,
               MAX(EndDate) OVER (PARTITION BY EmployeeID, DepartmentID, JobID, PositionStatusID)  AS EndDate
          from EmployeeHistory) m

If I understand correctly, this is easy to do with GROUP BY. See whether this gives the expected result:

SELECT Max(employeehistoryid) AS EmployeeHistoryID,
       employeeid,
       departmentid,
       jobid,
       positionstatusid,
       Min(startdate)         AS StartDate,
       Max(enddate)           AS EndDate
FROM   employeehistory
GROUP  BY employeeid,
          departmentid,
          jobid,
          positionstatusid 

This is a type of gaps-and-islands problem (a class of problems that involves combining adjacent rows with similar information).

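In your data, the records for each employee "tile" perfectly. There are no gaps: the start date of each row is one day after the end date of that employee's previous row.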
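If you want to confirm that the tiling assumption really holds before relying on it, a check along these lines might help. This is only a sketch against the `EmployeeHistory` table from the question; the `prev_EndDate` alias is a name introduced here for illustration.

```sql
-- Lists rows that break the tiling assumption, i.e. rows whose StartDate
-- is not exactly one day after the previous EndDate for the same employee.
-- The first row of each employee has no previous row and is skipped.
select eh.*
from (select eh.*,
             lag(EndDate) over (partition by EmployeeID order by StartDate) as prev_EndDate
      from EmployeeHistory eh
     ) eh
where prev_EndDate is not null
  and StartDate <> dateadd(day, 1, prev_EndDate);
```

For the sample data in the question this should return no rows, since every row starts the day after the previous one ends.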

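This allows you to solve the problem using only window functions. Avoiding aggregation is often a performance win. The idea is to find the first row where a change occurs, keep that row, and calculate the end date. The final end date is a bit tricky: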

select eh.EmployeeHistoryID, eh.EmployeeID, eh.DepartmentID, eh.JobID, eh.PositionStatusID, eh.StartDate,
       lead(dateadd(day, -1, StartDate), 1, max_EndDate) over (partition by EmployeeId order by StartDate) as EndDate
from (select eh.*,
             lag(StartDate) over (partition by EmployeeID order by StartDate) as prev_StartDate,
             lag(StartDate) over (partition by EmployeeID, DepartmentID, JobID, PositionStatusID order by StartDate) as prev_StartDate_same,
             max(EndDate) over (partition by EmployeeId) as max_EndDate
      from EmployeeHistory eh
     ) eh
where prev_StartDate_same is null or prev_StartDate_same <> prev_StartDate
order by EmployeeHistoryID;

Here is a db<>fiddle.
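If the records might not tile perfectly (for example, if there can be gaps between an end date and the next start date), a more general gaps-and-islands formulation could be used instead. The following is a sketch, not the query above: it flags each row that starts a new "island", numbers the islands with a running sum, and then aggregates per island. The names `flagged`, `grouped`, `is_new_island`, and `island_id` are introduced here for illustration, and the sketch assumes the dimension columns are never NULL (a NULL would be treated as a change).

```sql
with flagged as (
    -- 1 when the row differs from the employee's previous row, or does not
    -- start the day after it ended; 0 when it simply continues that row.
    select eh.*,
           case when lag(DepartmentID)     over (partition by EmployeeID order by StartDate) = DepartmentID
                 and lag(JobID)            over (partition by EmployeeID order by StartDate) = JobID
                 and lag(PositionStatusID) over (partition by EmployeeID order by StartDate) = PositionStatusID
                 and lag(EndDate)          over (partition by EmployeeID order by StartDate) = dateadd(day, -1, StartDate)
                then 0 else 1
           end as is_new_island
    from EmployeeHistory eh
),
grouped as (
    -- A running sum of the flags gives each island its own number per employee.
    select f.*,
           sum(is_new_island) over (partition by EmployeeID
                                    order by StartDate
                                    rows unbounded preceding) as island_id
    from flagged f
)
select max(EmployeeHistoryID) as EmployeeHistoryID,
       EmployeeID,
       DepartmentID,
       JobID,
       PositionStatusID,
       min(StartDate) as StartDate,
       max(EndDate)   as EndDate
from grouped
group by EmployeeID, DepartmentID, JobID, PositionStatusID, island_id
order by EmployeeID, min(StartDate);
```

With the sample data, rows 125 and 126 fall into the same island while row 123 stays separate, because row 124 (with a different `PositionStatusID`) sits between them. The trade-off is that this version aggregates, so it may be slower than the window-function-only query on large tables.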