Set-based bulk import of denormalized data into normalized SQL Server 2014 database tables

Question

The following simplified model works well for a bulk, set-based insert of the denormalized data in #BulkData (suggestions for improvement are welcome):

IF OBJECT_ID('tempdb..#Things') IS NOT NULL 
   DROP TABLE #Things

IF OBJECT_ID('tempdb..#Categories') IS NOT NULL 
   DROP TABLE #Categories

IF OBJECT_ID('tempdb..#ThingsToCategories') IS NOT NULL 
   DROP TABLE #ThingsToCategories

IF OBJECT_ID('tempdb..#BulkData') IS NOT NULL 
   DROP TABLE #BulkData

CREATE TABLE #Things
(
    ThingId INT IDENTITY(1,1) PRIMARY KEY,
    ThingName NVARCHAR(255)
)

CREATE TABLE #Categories
(
    CategoryId INT IDENTITY(1,1) PRIMARY KEY,
    CategoryName NVARCHAR(255)
)

CREATE TABLE #ThingsToCategories
(
    ThingId INT,
    CategoryId INT
)

CREATE TABLE #BulkData
(
    ThingName NVARCHAR(255),
    CategoryName NVARCHAR(255)
)

-- the following would be done from a flat file via a bulk import 
INSERT INTO #BulkData
    SELECT N'Thing1', N'Category1'
        UNION 
    SELECT N'Thing2', N'Category1'
        UNION 
    SELECT N'Thing3', N'Category2'

INSERT INTO #Categories
    SELECT DISTINCT CategoryName 
    FROM #BulkData 
    WHERE CategoryName NOT IN (SELECT DISTINCT CategoryName 
                               FROM #Categories)

INSERT INTO #Things
    SELECT DISTINCT ThingName 
    FROM #BulkData 
    WHERE ThingName NOT IN (SELECT DISTINCT ThingName FROM #Things)

INSERT INTO #ThingsToCategories
    SELECT ThingId, CategoryId
    FROM #BulkData 
    INNER JOIN #Things ON #BulkData.ThingName = #Things.ThingName
    INNER JOIN #Categories ON #BulkData.CategoryName = #Categories.CategoryName

SELECT * FROM #Categories
SELECT * FROM #Things
SELECT * FROM #ThingsToCategories

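One problem I have is that the data in #Things is accessible before the corresponding data has been inserted into #ThingsToCategories.

Can I wrap the above in a transaction (?) so that #Things only becomes available once the entire bulk import has completed?

Like so: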

BEGIN TRANSACTION X
 -- insert into all normalised tables
COMMIT TRANSACTION X

Would this work for a few million records?

I guess I could also reduce the logging level?

Answer 1

Can I wrap the above in a transaction (?) so that #Things only becomes available once the entire bulk import has completed? Like so:

BEGIN TRANSACTION X
 -- insert into all normalised tables
COMMIT TRANSACTION X

The answer is yes. From the Documentation on Transactions:

A transaction is a single unit of work. If a transaction is successful, all of the data modifications made during the transaction are committed and become a permanent part of the database. If a transaction encounters errors and must be canceled or rolled back, then all of the data modifications are erased.

Transactions have the following four standard properties, usually referred to by the acronym ACID. Quoting the following link on SQL Transactions at tutorialspoint.com:

Atomicity: ensures that all operations within the work unit are completed successfully; otherwise, the transaction is aborted at the point of failure, and previous operations are rolled back to their former state.

Consistency: ensures that the database properly changes states upon a successfully committed transaction.

Isolation: enables transactions to operate independently of and transparent to each other.

Durability: ensures that the result or effect of a committed transaction persists in case of a system failure.

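Would this work for a few million records?

Again, yes. The number of entries is irrelevant. This time in my own words:

Atomicity: If the transaction succeeds, all operations in it take effect at once when the transaction completes, i.e. when it is committed. If at least one operation in the transaction fails, all operations are rolled back (in other words, none of them persists). The number of operations in the transaction is irrelevant.
Isolation: Other transactions do not see a transaction's operations until they are committed.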
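A minimal sketch of how the inserts from the question could be wrapped so that they either all commit or all roll back. It assumes the #Things, #Categories, #ThingsToCategories and #BulkData tables from the question already exist in the session. SET XACT_ABORT ON and the TRY...CATCH handling are additions that are not in the original question, and NOT EXISTS is swapped in for the question's NOT IN, because NOT IN returns no rows once its subquery yields a NULL.

```sql
-- Assumption: roll back the whole transaction on any run-time error.
SET XACT_ABORT ON;

BEGIN TRY
    BEGIN TRANSACTION;

    -- Add categories that are not yet known.
    INSERT INTO #Categories (CategoryName)
    SELECT DISTINCT b.CategoryName
    FROM #BulkData AS b
    WHERE NOT EXISTS (SELECT 1 FROM #Categories AS c
                      WHERE c.CategoryName = b.CategoryName);

    -- Add things that are not yet known.
    INSERT INTO #Things (ThingName)
    SELECT DISTINCT b.ThingName
    FROM #BulkData AS b
    WHERE NOT EXISTS (SELECT 1 FROM #Things AS t
                      WHERE t.ThingName = b.ThingName);

    -- Link things to categories using the generated identity values.
    INSERT INTO #ThingsToCategories (ThingId, CategoryId)
    SELECT t.ThingId, c.CategoryId
    FROM #BulkData AS b
    INNER JOIN #Things AS t ON t.ThingName = b.ThingName
    INNER JOIN #Categories AS c ON c.CategoryName = b.CategoryName;

    COMMIT TRANSACTION;
END TRY
BEGIN CATCH
    -- Undo everything if any of the three inserts failed.
    IF @@TRANCOUNT > 0
        ROLLBACK TRANSACTION;
    THROW;  -- re-raise the original error to the caller
END CATCH;
```

Under the default READ COMMITTED level (without row versioning), a reader in another session would normally wait on the uncommitted rows rather than see a half-finished import.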

However, there are different Transaction Isolation Levels. The default for SQL Server is READ COMMITTED:

Specifies that statements cannot read data that has been modified but not committed by other transactions. [...]

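This is a trade-off level that balances performance and consistency. Ideally, you would want everything to be SERIALIZABLE (see the documentation, it is too long to copy/paste). That isolation level trades performance (-) for consistency (+). In many cases the READ COMMITTED isolation level is good enough, but you should understand how it works and compare that with how your transaction needs to behave relative to other transactions completing.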
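For illustration, the isolation level is set per session before the transaction starts. The sketch below only shows the mechanics; whether SERIALIZABLE is actually needed for this import is an assumption that has to be checked against the workload.

```sql
-- Stays in effect for the current session until it is changed again.
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;

-- Check the level currently in effect for this session
-- (1 = READ UNCOMMITTED, 2 = READ COMMITTED, 3 = REPEATABLE READ,
--  4 = SERIALIZABLE, 5 = SNAPSHOT).
SELECT transaction_isolation_level
FROM sys.dm_exec_sessions
WHERE session_id = @@SPID;

BEGIN TRANSACTION;
    -- insert into all normalised tables
COMMIT TRANSACTION;

-- Return to the SQL Server default.
SET TRANSACTION ISOLATION LEVEL READ COMMITTED;
```

The level applies to statements run in this session, so concurrent readers keep whatever level their own sessions use.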

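Also note that a transaction places locks on database objects (rows, tables, schema...), and that other transactions are blocked if they want to read or modify those objects (depending on the type of lock). For that reason it is preferable to keep the number of operations in a transaction low. Sometimes, however, a transaction simply does a lot of things and cannot be broken down.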
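One way to see which locks an open transaction holds is to query sys.dm_tran_locks from the same session before committing. This is a rough sketch: it assumes the #Things table from the question exists, the inserted row is a hypothetical sample, and the login needs VIEW SERVER STATE permission to read the view.

```sql
BEGIN TRANSACTION;

-- Hypothetical sample row, only there to make the transaction take some locks.
INSERT INTO #Things (ThingName) VALUES (N'LockDemo');

-- Summarise the locks requested by the current session.
SELECT resource_type, request_mode, request_status, COUNT(*) AS lock_count
FROM sys.dm_tran_locks
WHERE request_session_id = @@SPID
GROUP BY resource_type, request_mode, request_status;

-- Discard the sample row and release the locks.
ROLLBACK TRANSACTION;
```

Running the same query during a large import gives a feel for how many locks are held, and at which granularity, until the transaction ends.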
