Select rows with missing value for each item
I'm trying to generate a report that finds rows in a table with an error: a missing item order. For example:
ID Item Order
----------------
1 A 1
2 A 2
3 A 3
4 B 1
5 B 2
6 C 2
7 C 3
8 D 1
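Note that item "C" is missing the row with order index "1". I need to find every item that is missing index "1" and starts at "2" or something else.
One approach I came up with is: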
SELECT DISTINCT Item FROM Items AS I
WHERE I.Item NOT IN (SELECT Item FROM Items WHERE [Order] = 1)
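But to my surprise it returns no results, even though I know such items exist. My guess is that it first selects the items that are not in the sub-select and then applies DISTINCT, whereas what I want is to select the distinct items and find which of them have no row with "Order = 1".
Also, this has to run over more than 70,000 rows, so it must be feasible (the other way I can think of is a CURSOR, but that would be very slow, and may not be feasible for the table?).
Regards,
Oak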
You can use NOT EXISTS:
SELECT DISTINCT(i1.Item) FROM ITEMS i1
WHERE NOT EXISTS
(
SELECT 1 FROM Items i2
WHERE i1.Item = i2.Item AND i2.[Order] = 1
)
NOT IN is problematic; this is worth reading:
http://sqlperformance.com/2012/12/t-sql-queries/left-anti-semi-join
The main problem is that the results can be surprising if the target
column is NULLable (SQL Server processes this as a left anti semi
join, but can't reliably tell you if a NULL on the right side is equal
to – or not equal to – the reference on the left side). Also,
optimization can behave differently if the column is NULLable, even if
it doesn't actually contain any NULL values
Because of this:
Instead of NOT IN, use a correlated NOT EXISTS for this query pattern.
Always. Other methods may rival it in terms of performance, when all
other variables are the same, but all of the other methods introduce
either performance problems or other challenges.
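To see the NULL pitfall in isolation, here is a minimal sketch. It assumes SQLite (through Python's standard sqlite3 module) as a stand-in for SQL Server, since both follow the same three-valued logic for NOT IN; the row with ID 9 and a NULL Item is hypothetical, added only to trigger the behavior. In SQLite the reserved word is quoted as "Order" rather than [Order].

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE Items (ID INTEGER, Item TEXT, "Order" INTEGER)')
conn.executemany(
    "INSERT INTO Items VALUES (?, ?, ?)",
    [
        (1, "A", 1), (2, "A", 2), (3, "A", 3),
        (4, "B", 1), (5, "B", 2),
        (6, "C", 2), (7, "C", 3),
        (8, "D", 1),
        (9, None, 1),  # hypothetical row: NULL Item with Order = 1
    ],
)

# The subquery now yields a NULL, so "Item NOT IN (...)" is never true.
not_in = conn.execute(
    """
    SELECT DISTINCT Item FROM Items AS I
    WHERE I.Item NOT IN (SELECT Item FROM Items WHERE "Order" = 1)
    """
).fetchall()

# The correlated NOT EXISTS only asks whether a matching row exists.
not_exists = conn.execute(
    """
    SELECT DISTINCT i1.Item FROM Items AS i1
    WHERE NOT EXISTS (
        SELECT 1 FROM Items AS i2
        WHERE i1.Item = i2.Item AND i2."Order" = 1
    )
    """
).fetchall()

print(not_in)      # no rows at all
print(not_exists)  # item C, plus the NULL item itself
```

Note that NOT EXISTS also reports the NULL item, because no row can ever compare equal to NULL; add `AND i1.Item IS NOT NULL` to the outer WHERE if that row is unwanted.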
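You can use a HAVING clause to find the missing orders. HAVING lets you filter on aggregated records. In this case, we keep only the items whose minimum order is greater than 1.
Compared with a subquery in the WHERE clause, the benefit of this approach is that SQL Server does not have to rerun the subquery multiple times. It should run faster on large data sets.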
/* HAVING allows us to filter on aggregated records.
*/
WITH SampleData AS
(
/* This CTE creates some sample records
* to experiment with.
*/
SELECT
r.*
FROM
(
VALUES
( 1, 'A', 1),
( 2, 'A', 2),
( 3, 'A', 3),
( 4, 'B', 1),
( 5, 'B', 2),
( 6, 'C', 2),
( 7, 'C', 3),
( 8, 'D', 1)
) AS r(ID, Item, [Order])
)
SELECT
Item,
COUNT([Order]) AS Count_Order,
MIN([Order]) AS Min_Order
FROM
SampleData
GROUP BY
Item
HAVING
MIN([Order]) > 1
;
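Good idea, but one small detail of NOT IN can cause trouble: if the subquery after NOT IN returns any NULL, NOT IN never evaluates to true, so no rows come back. That may be why you get no results. You can try NOT EXISTS, as in the other answer, or simply:
SELECT DISTINCT Item FROM Items AS I
WHERE I.Item NOT IN (SELECT Item FROM Items WHERE [Order] = 1 AND Item IS NOT NULL)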
Your query should work. The problem is probably that Item can be NULL. So try this:
SELECT DISTINCT Item
FROM Items AS I
WHERE I.Item NOT IN (SELECT Item FROM Items WHERE [Order] = 1 AND Item IS NOT NULL);
This is why NOT EXISTS is preferable to NOT IN.
However, I would use an aggregation query for this:
select item
from items
group by item
having sum(case when [order] = 1 then 1 else 0 end) = 0;
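The two aggregate queries in this thread are not quite interchangeable: MIN([Order]) > 1 assumes that orders never go below 1, while the SUM(CASE ...) form checks directly for a row with order 1. A minimal sketch of the difference, again assuming SQLite via Python's sqlite3 as a stand-in for SQL Server; item "E" with orders 0 and 2 is hypothetical and not part of the question's data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE Items (ID INTEGER, Item TEXT, "Order" INTEGER)')
conn.executemany(
    "INSERT INTO Items VALUES (?, ?, ?)",
    [
        (1, "A", 1), (2, "A", 2), (3, "A", 3),
        (4, "B", 1), (5, "B", 2),
        (6, "C", 2), (7, "C", 3),
        (8, "D", 1),
        (9, "E", 0), (10, "E", 2),  # hypothetical: no order 1, but MIN is 0
    ],
)

# Items whose smallest order is above 1.
by_min = [r[0] for r in conn.execute(
    'SELECT Item FROM Items GROUP BY Item '
    'HAVING MIN("Order") > 1 ORDER BY Item'
)]

# Items with zero rows where the order equals 1.
by_sum = [r[0] for r in conn.execute(
    'SELECT Item FROM Items GROUP BY Item '
    'HAVING SUM(CASE WHEN "Order" = 1 THEN 1 ELSE 0 END) = 0 ORDER BY Item'
)]

print(by_min)  # ['C']
print(by_sum)  # ['C', 'E']
```

If orders always start at 1 or higher, as in the question's sample, both queries return the same items.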