Find rows from table A that have no matching record in joined table B
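I have two tables, Employee (columns: Id, Name) and DataSource (columns: Id, EmployeeId, DataSourceName).
Each employee can be exported to zero or more data sources. Consider the following situation:
Employee table: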
+----+------+
| Id | Name |
+----+------+
| 1  | Ivan |
| 2  | Adam |
+----+------+
DataSource table:
+----+------------+----------------+
| Id | EmployeeId | DataSourceName |
+----+------------+----------------+
| 1  | 1          | Source1        |
| 2  | 1          | Source2        |
| 3  | 2          | Source2        |
+----+------------+----------------+
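I need a query that finds which employees have not been exported to 'Source1' (here the result should be 'Adam', because he has only been exported to 'Source2').
Both Employee and DataSource can hold a large number of records (thousands).
There are several ways to do this, and we need the one that performs best. The few I have come up with:
LEFT JOIN: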
SELECT Employee.Id
FROM Employee
LEFT JOIN DataSource ON DataSource.EmployeeId = Employee.Id AND DataSource.DataSourceName = 'Source1'
WHERE DataSource.Id IS NULL
Inner SELECT:
SELECT Employee.Id
FROM Employee
WHERE NOT EXISTS (SELECT NULL FROM DataSource WHERE DataSource.EmployeeId = Employee.Id AND DataSource.DataSourceName = 'Source1')
EXCEPT:
SELECT Employee.Id
FROM Employee
EXCEPT
SELECT Employee.Id
FROM Employee
INNER JOIN DataSource ON DataSource.EmployeeId = Employee.Id AND DataSource.DataSourceName = 'Source1'
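Before timing anything, it may be worth confirming that the three candidates really return the same rows. The following is a minimal sketch, assuming an in-memory SQLite database through Python's standard sqlite3 module rather than whatever engine is actually in use; the `QUERIES` and `results` names are illustrative only. It loads the sample rows from the tables above and runs each query.

```python
import sqlite3

# In-memory SQLite database holding the sample rows from the question.
# SQLite is an assumption here; the question does not name the engine.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Employee (Id INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE DataSource (
    Id INTEGER PRIMARY KEY,
    EmployeeId INTEGER,
    DataSourceName TEXT
);
INSERT INTO Employee VALUES (1, 'Ivan'), (2, 'Adam');
INSERT INTO DataSource VALUES
    (1, 1, 'Source1'), (2, 1, 'Source2'), (3, 2, 'Source2');
""")

# The three candidate queries from the question.
QUERIES = {
    "left_join": """
        SELECT Employee.Id
        FROM Employee
        LEFT JOIN DataSource ON DataSource.EmployeeId = Employee.Id
            AND DataSource.DataSourceName = 'Source1'
        WHERE DataSource.Id IS NULL
    """,
    "not_exists": """
        SELECT Employee.Id
        FROM Employee
        WHERE NOT EXISTS (
            SELECT NULL FROM DataSource
            WHERE DataSource.EmployeeId = Employee.Id
              AND DataSource.DataSourceName = 'Source1')
    """,
    "except": """
        SELECT Employee.Id
        FROM Employee
        EXCEPT
        SELECT Employee.Id
        FROM Employee
        INNER JOIN DataSource ON DataSource.EmployeeId = Employee.Id
            AND DataSource.DataSourceName = 'Source1'
    """,
}

# Run each query and collect the returned employee Ids in sorted order.
results = {
    name: sorted(row[0] for row in conn.execute(sql))
    for name, sql in QUERIES.items()
}
print(results)  # each query returns [2], i.e. Adam
```

Agreement on two employees says nothing about speed; it only rules out a query that is fast because it is wrong.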
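Before I start benchmarking them, I would like to ask whether there are other approaches I should consider that might perform well. Could you share your thoughts on which query is likely to perform best?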
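For the benchmarking step, a rough harness could look like the sketch below. It is only a sketch under several assumptions: SQLite via Python's sqlite3 stands in for the real engine, the data is synthetic (every employee exported to 'Source2', odd Ids also to 'Source1'), `N_EMPLOYEES` is an arbitrary size, and the index `IX_DataSource_Employee_Name` is hypothetical since the question does not mention any indexes.

```python
import sqlite3
import timeit

N_EMPLOYEES = 2000  # arbitrary; the question only says "thousands"

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Employee (Id INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE DataSource (
    Id INTEGER PRIMARY KEY,
    EmployeeId INTEGER,
    DataSourceName TEXT
);
-- Assumed index; the question does not say which indexes exist.
CREATE INDEX IX_DataSource_Employee_Name
    ON DataSource (EmployeeId, DataSourceName);
""")

conn.executemany(
    "INSERT INTO Employee (Id, Name) VALUES (?, ?)",
    [(i, f"Employee{i}") for i in range(1, N_EMPLOYEES + 1)],
)

# Synthetic data: everyone is exported to Source2, odd Ids also to Source1.
exports = []
for i in range(1, N_EMPLOYEES + 1):
    exports.append((i, "Source2"))
    if i % 2 == 1:
        exports.append((i, "Source1"))
conn.executemany(
    "INSERT INTO DataSource (EmployeeId, DataSourceName) VALUES (?, ?)",
    exports,
)

CANDIDATES = {
    "left_join": """
        SELECT Employee.Id FROM Employee
        LEFT JOIN DataSource ON DataSource.EmployeeId = Employee.Id
            AND DataSource.DataSourceName = 'Source1'
        WHERE DataSource.Id IS NULL
    """,
    "not_exists": """
        SELECT Employee.Id FROM Employee
        WHERE NOT EXISTS (
            SELECT NULL FROM DataSource
            WHERE DataSource.EmployeeId = Employee.Id
              AND DataSource.DataSourceName = 'Source1')
    """,
    "except": """
        SELECT Employee.Id FROM Employee
        EXCEPT
        SELECT Employee.Id FROM Employee
        INNER JOIN DataSource ON DataSource.EmployeeId = Employee.Id
            AND DataSource.DataSourceName = 'Source1'
    """,
}


def run(sql):
    """Execute one candidate and return the employee Ids in sorted order."""
    return sorted(row[0] for row in conn.execute(sql))


# Time 20 executions of each candidate, fetching all rows each time.
timings = {
    name: timeit.timeit(lambda sql=sql: run(sql), number=20)
    for name, sql in CANDIDATES.items()
}
for name, seconds in timings.items():
    print(f"{name}: {seconds:.4f}s for 20 runs")
```

The numbers only describe SQLite's planner on this synthetic data. On another engine the ranking may differ, so the same comparison would need to be repeated there against realistic data and the actual indexes.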
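If you want further reading on the topic, this article is good:
http://www.sqlinthewild.co.za/index.php/2010/03/23/left-outer-join-vs-not-exists/
It suggests that NOT EXISTS will perform better because it does not need to complete the full join (it performs an anti semi join, whereas LEFT JOIN ... IS NULL performs the complete outer join and then filters):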
"That’s the major difference between these two. When using the LEFT OUTER JOIN … IS NULL technique, SQL can’t tell that you’re only doing a check for nonexistance. Optimiser’s not smart enough (yet). Hence it does the complete join and then filters. The NOT EXISTS filters as part of the join."