Set-based bulk import of denormalized data into normalized SQL Server 2014 database tables
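The following simplified model works well for a bulk/set-based insert of the denormalized data held in #BulkData (suggestions for improvement are welcome):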
IF OBJECT_ID('tempdb..#Things') IS NOT NULL
DROP TABLE #Things
IF OBJECT_ID('tempdb..#Categories') IS NOT NULL
DROP TABLE #Categories
IF OBJECT_ID('tempdb..#ThingsToCategories') IS NOT NULL
DROP TABLE #ThingsToCategories
IF OBJECT_ID('tempdb..#BulkData') IS NOT NULL
DROP TABLE #BulkData
CREATE TABLE #Things
(
ThingId INT IDENTITY(1,1) PRIMARY KEY,
ThingName NVARCHAR(255)
)
CREATE TABLE #Categories
(
CategoryId INT IDENTITY(1,1) PRIMARY KEY,
CategoryName NVARCHAR(255)
)
CREATE TABLE #ThingsToCategories
(
ThingId INT,
CategoryId INT
)
CREATE TABLE #BulkData
(
ThingName NVARCHAR(255),
CategoryName NVARCHAR(255)
)
-- the following would be done from a flat file via a bulk import
INSERT INTO #BulkData
SELECT N'Thing1', N'Category1'
UNION
SELECT N'Thing2', N'Category1'
UNION
SELECT N'Thing3', N'Category2'
INSERT INTO #Categories
SELECT DISTINCT CategoryName
FROM #BulkData
WHERE CategoryName NOT IN (SELECT DISTINCT CategoryName
FROM #Categories)
INSERT INTO #Things
SELECT DISTINCT ThingName
FROM #BulkData
WHERE ThingName NOT IN (SELECT DISTINCT ThingName FROM #Things)
INSERT INTO #ThingsToCategories
SELECT ThingId, CategoryId
FROM #BulkData
INNER JOIN #Things ON #BulkData.ThingName = #Things.ThingName
INNER JOIN #Categories ON #BulkData.CategoryName = #Categories.CategoryName
SELECT * FROM #Categories
SELECT * FROM #Things
SELECT * FROM #ThingsToCategories
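One problem I have is that the data in #Things is accessible before the data has been inserted into #ThingsToCategories.
Can I wrap the above in a transaction (?) so that #Things only becomes available once the whole bulk import has completed?
Like so: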
BEGIN TRANSACTION X
-- insert into all normalised tables
COMMIT TRANSACTION X
Would this work for a few million records?
I guess I could also reduce the logging level?
- Can I wrap the above in a transaction (?) so that #Things only becomes available once the whole bulk import has completed? Like so:
BEGIN TRANSACTION X
-- insert into all normalised tables
COMMIT TRANSACTION X
The answer is yes. From the Documentation on Transactions:
A transaction is a single unit of work. If a transaction is successful, all of the data modifications made during the transaction are committed and become a permanent part of the database. If a transaction encounters errors and must be canceled or rolled back, then all of the data modifications are erased.
Transactions have the following four standard properties, usually referred to by the acronym ACID. Quoting SQL Transactions on tutorialspoint.com:
Atomicity: ensures that all operations within the work unit are completed successfully; otherwise, the transaction is aborted at the point of failure, and previous operations are rolled back to their former state.
Consistency: ensures that the database properly changes states upon a successfully committed transaction.
Isolation: enables transactions to operate independently of and transparent to each other.
Durability: ensures that the result or effect of a committed transaction persists in case of a system failure.
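As a rough sketch of how the quoted atomicity guarantee could be applied to the import from the question, the three normalising inserts can sit in one transaction with error handling. This assumes the four temp tables from the question already exist. The SET XACT_ABORT ON setting, the TRY/CATCH wrapper, the table aliases and the NOT EXISTS filters are additions for illustration, not part of the original script. NOT EXISTS is swapped in for NOT IN and behaves differently when a name is NULL. Note also that local # temp tables are only visible to the session that created them, so the isolation concern really applies once permanent tables take their place.

```sql
-- Sketch only: assumes #BulkData, #Categories, #Things and
-- #ThingsToCategories from the question already exist in this session.
SET XACT_ABORT ON;  -- most run-time errors now doom the whole transaction

BEGIN TRY
    BEGIN TRANSACTION;

    -- Add categories that are not in the lookup table yet.
    INSERT INTO #Categories (CategoryName)
    SELECT DISTINCT b.CategoryName
    FROM #BulkData AS b
    WHERE NOT EXISTS (SELECT 1 FROM #Categories AS c
                      WHERE c.CategoryName = b.CategoryName);

    -- Add things that are not in the lookup table yet.
    INSERT INTO #Things (ThingName)
    SELECT DISTINCT b.ThingName
    FROM #BulkData AS b
    WHERE NOT EXISTS (SELECT 1 FROM #Things AS t
                      WHERE t.ThingName = b.ThingName);

    -- Resolve names to the generated identity values.
    INSERT INTO #ThingsToCategories (ThingId, CategoryId)
    SELECT t.ThingId, c.CategoryId
    FROM #BulkData AS b
    INNER JOIN #Things AS t ON b.ThingName = t.ThingName
    INNER JOIN #Categories AS c ON b.CategoryName = c.CategoryName;

    COMMIT TRANSACTION;  -- all three inserts become visible together
END TRY
BEGIN CATCH
    -- Undo everything if any statement failed, then re-raise the error.
    IF @@TRANCOUNT > 0
        ROLLBACK TRANSACTION;
    THROW;
END CATCH;
```

If any of the three inserts fails, the CATCH block rolls back the others, so none of the rows is kept.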
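- Would this work for a few million entries?
Again, yes. The number of entries is irrelevant. In my own words this time:
Atomicity: if the transaction succeeds, all of its operations take effect at once when the transaction completes, i.e. when it is committed. If at least one operation in the transaction fails, all of them are rolled back (in other words, none is kept). The number of operations in the transaction is irrelevant.
Isolation: other transactions do not see a transaction's operations until they are committed.
However, there are different Transaction Isolation Levels. The default for SQL Server is READ COMMITTED: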
Specifies that statements cannot read data that has been modified but not committed by other transactions. [...]
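This is a trade-off level that balances performance against consistency. Ideally you would want everything to be SERIALIZABLE (see the documentation, it is too long to copy/paste). That isolation level trades performance (-) for consistency (+). In many cases READ COMMITTED is good enough, but you should understand how it works and compare that with how your transaction needs to behave relative to other transactions completing.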
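For illustration, the isolation level is chosen per session with SET TRANSACTION ISOLATION LEVEL. The snippet below is only a sketch: it assumes the #Things table from the question exists, and the ThingCount alias is made up for the example.

```sql
-- Show the session's current settings, including its isolation level.
DBCC USEROPTIONS;

-- Stays in effect for this session until it is changed again.
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;

BEGIN TRANSACTION;
    -- Under SERIALIZABLE the locks taken by this read are held until
    -- the transaction ends, not just until the statement ends.
    SELECT COUNT(*) AS ThingCount FROM #Things;
COMMIT TRANSACTION;

-- Return to the SQL Server default.
SET TRANSACTION ISOLATION LEVEL READ COMMITTED;
```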
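Also note that a transaction puts locks on database objects (rows, tables, schema...), and other transactions will block if they want to read or modify those objects (depending on the type of lock). So it is best to keep the number of operations in a transaction low. But sometimes a transaction simply does a lot of things and cannot be broken up.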
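One way to see the locks an open transaction holds is the sys.dm_tran_locks view. The sketch below makes several assumptions: the #Things table from the question exists, the caller has VIEW SERVER STATE permission, and the N'LockDemo' row and the LockCount alias are made up for the example.

```sql
BEGIN TRANSACTION;

-- Any modification takes locks that are held until commit or rollback.
INSERT INTO #Things (ThingName) VALUES (N'LockDemo');

-- Summarise the locks currently requested or held by this session.
SELECT resource_type, request_mode, request_status, COUNT(*) AS LockCount
FROM sys.dm_tran_locks
WHERE request_session_id = @@SPID
GROUP BY resource_type, request_mode, request_status;

-- Discard the demo row and release the locks.
ROLLBACK TRANSACTION;
```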