How to insert new rows and mark existing rows as unchanged or updated in SQL Server using MERGE
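I need to create a stored procedure in SQL Server that implements an upsert: it moves data from a staging table (a.k.a. Source) to a final table (a.k.a. Target) and, each time a new batch of data arrives, flags rows as new, updated, unchanged, or deleted. I am using MERGE, as described here.
The problem is that it marks rows as updated even when nothing in them has changed. My workflow is as follows:
- Load data into the Source table
- Call the stored procedure, which moves data from Source to Target based on the merge condition
My stored procedure is as follows: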
CREATE PROCEDURE [dbo].[upsert_with_flag_2]
AS
DECLARE @current_time AS datetime
SET @current_time = GETDATE()
MERGE [dbo].[employee] AS Target
USING [dbo].[employee_staging] AS Source
ON Source.[first_name] = Target.[first_name]
AND Source.[last_name] = Target.[last_name]
AND Source.[dob] = Target.[dob]
WHEN MATCHED
THEN
UPDATE
SET Target.[salary] = Source.[salary],
Target.[current_address] = Source.[current_address],
Target.[is_deleted] = 'Updated',
Target.[processed_date] = @current_time
WHEN NOT MATCHED BY Target
THEN
INSERT ([first_name], [last_name],
[dob], [salary],
[current_address], [is_deleted],
[processed_date])
VALUES (Source.[first_name], Source.[last_name],
Source.[dob], Source.[salary],
Source.[current_address], 'New',
@current_time);
-- After the upsert, look for rows whose processed date is earlier than the current run but whose status is still new or updated.
-- These rows were not present in the input file, so update their status to deleted.
-- QUESTION: Should we change the processed date to the current date for rows whose status is deleted?
UPDATE [dbo].[employee]
SET [is_deleted] = 'deleted'
WHERE ([is_deleted] = 'New' OR [is_deleted] = 'Updated')
AND [processed_date] < @current_time
After this, I run the following steps to load data and view the output:
--Loading the initial data
TRUNCATE TABLE [dbo].[employee_staging]
GO
INSERT INTO [dbo].[employee_staging] ([first_name],
[last_name],
[dob],
[salary],
[current_address])
VALUES ('John', 'Doe', '1995-04-28', 3000, 'Andra Pradesh'),
('Robert', 'Spenser', '1994-03-28', 1800, 'Madhya Pradesh'),
('Vikash', 'Sharma', '1996-12-20', 1400, 'Uttar Pradesh'),
('Anup', 'Soni', '1994-03-28', 1800, 'Delhi'),
('Prijan', 'Sonar', '1989-01-28', 3000, 'Himachal Pradesh')
GO
EXEC [dbo].[upsert_with_flag_2]
SELECT * FROM [dbo].[employee]
--Loading the updated data
TRUNCATE TABLE [dbo].[employee_staging]
GO
INSERT INTO [dbo].[employee_staging] ([first_name], [last_name],
[dob], [salary],
[current_address])
VALUES ('Robert', 'Spenser', '1994-03-28', 2000, 'Madhya Pradesh'),
('Vikash', 'Sharma', '1996-12-20', 1400, 'Maharashtra'),
('Anup', 'Soni', '1994-03-28', 1800, 'Delhi'),
('Prijan', 'Sonar', '1989-01-28', 3000, 'Himachal Pradesh'),
('William', 'Beck', '1991-04-22', 3300, 'Karnataka'),
('Robert', 'Brownie', '1986-04-22', 5000, 'Assam')
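Notice rows 4 and 5 of the output. The input data for Anup's row did not change, yet the [is_deleted] column still shows 'Updated'. I would like it to read something like 'Existing' or 'No change'.
Please help me achieve this. This upsert logic is part of a larger pipeline in which we need to know which rows in a new file are updated, new, unchanged, or deleted. How can I do that?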
You can add a CASE expression to the is_deleted assignment in the UPDATE to check whether anything has changed, like this:
MERGE [dbo].[employee] AS Target
USING [dbo].[employee_staging] AS Source
    ON  Source.[first_name] = Target.[first_name]
    AND Source.[last_name]  = Target.[last_name]
    AND Source.[dob]        = Target.[dob]
WHEN MATCHED
    THEN
        UPDATE
        SET Target.[salary] = Source.[salary],
            Target.[current_address] = Source.[current_address],
            -- The right-hand sides see the pre-update Target values,
            -- so this compares the old row with the staging row.
            Target.[is_deleted] = CASE WHEN Source.[salary] = Target.[salary]
                                        AND Source.[current_address] = Target.[current_address] THEN 'No change'
                                       ELSE 'Updated'
                                  END,
            Target.[processed_date] = @current_time
WHEN NOT MATCHED BY Target
    THEN
        INSERT ([first_name],
                [last_name],
                [dob],
                [salary],
                [current_address],
                [is_deleted],
                [processed_date])
        VALUES (Source.[first_name],
                Source.[last_name],
                Source.[dob],
                Source.[salary],
                Source.[current_address],
                'New',
                @current_time
               );
So if the fields being updated match the staging data, is_deleted is set to 'No change'; if either of them has changed, it is set to 'Updated'.
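One consequence of adding the 'No change' status: the follow-up UPDATE in the question only looks at rows flagged 'New' or 'Updated', so a row flagged 'No change' that is missing from a later batch would not be marked as deleted. A minimal sketch of an adjusted statement, assuming the status literal is exactly 'No change' as in the MERGE above and that it runs inside the same procedure, right after the MERGE, so that @current_time is the variable the MERGE used:

```sql
-- Fragment for the procedure body, placed after the MERGE statement.
-- @current_time must be the same variable the MERGE wrote to processed_date;
-- declaring a fresh timestamp here would flag every row as deleted.
UPDATE [dbo].[employee]
SET    [is_deleted] = 'deleted'
WHERE  [is_deleted] IN ('New', 'Updated', 'No change')  -- 'No change' added
  AND  [processed_date] < @current_time;
```

Rows touched by the current run have processed_date equal to @current_time, so only rows absent from the current batch fall through to this UPDATE.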
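Note: the CASE expression in the MERGE assumes that salary and current_address are non-nullable (declared NOT NULL in the table definition). If they can be NULL, handle that by replacing the CASE expression with the following: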
CASE WHEN (Source.salary = Target.salary
           OR (Source.salary IS NULL
               AND Target.salary IS NULL))
      AND (Source.current_address = Target.current_address
           OR (Source.current_address IS NULL
               AND Target.current_address IS NULL)) THEN 'No change'
     ELSE 'Updated'
END
(Documentation here)
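As the number of compared columns grows, the paired IS NULL checks get long. A more compact, NULL-safe alternative is to compare the two column lists with EXCEPT, which treats two NULLs as equal. The sketch below is only an illustration: the @pairs table variable, its column names, and the sample values are hypothetical and not part of the original post. It shows the comparison in isolation so it can be run on its own:

```sql
-- Hypothetical sample pairs covering the NULL cases.
DECLARE @pairs TABLE (
    label       varchar(20),
    src_salary  int         NULL,
    tgt_salary  int         NULL,
    src_address varchar(50) NULL,
    tgt_address varchar(50) NULL
);

INSERT INTO @pairs (label, src_salary, tgt_salary, src_address, tgt_address)
VALUES ('same',      1800, 1800, 'Delhi', 'Delhi'),
       ('both NULL', NULL, NULL, NULL,    NULL),
       ('one NULL',  1800, NULL, 'Delhi', 'Delhi'),
       ('changed',   2000, 1800, 'Delhi', 'Delhi');

-- EXCEPT returns the source tuple only when it differs from the target tuple,
-- and it considers NULL equal to NULL, so no explicit IS NULL checks are needed.
SELECT label,
       CASE WHEN EXISTS (SELECT src_salary, src_address
                         EXCEPT
                         SELECT tgt_salary, tgt_address)
            THEN 'Updated'
            ELSE 'No change'
       END AS status
FROM @pairs;
-- 'same' and 'both NULL' yield 'No change'; 'one NULL' and 'changed' yield 'Updated'.
```

The same EXISTS (... EXCEPT ...) test is often used in a MERGE with Source and Target columns in place of the sample columns, for example in a WHEN MATCHED AND condition; whether it suits this procedure is worth verifying against your own tables.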