Select rows with missing value for each item

Question

I am trying to generate a report that finds rows in a table containing an error: a missing item order. For example:

ID   Item  Order
----------------
 1     A       1
 2     A       2
 3     A       3
 4     B       1
 5     B       2
 6     C       2
 7     C       3
 8     D       1

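Note that item "C" is missing the row with order index "1". I need to find all items that are missing index "1" and start at "2" or some other value. One approach I came up with is: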

SELECT DIstinct(Item) FROM ITEMS as I
WHERE I.Item NOT IN (SELECT Item FROM Items WHERE Order = 1)

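To my surprise, it returns no results, even though I know such items exist. My guess is that it first selects the items that are not in the sub-select and then makes them distinct, whereas what I want is to select the distinct items and then find which of them have no row with "Order = 1".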

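Also, this code has to run over more than 70,000 rows, so it has to be feasible (the other way I can think of is a CURSOR, but that would be very slow and possibly not feasible?).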

Regards,

Oak

Answer 1

You can use NOT EXISTS:

SELECT DISTINCT(i1.Item) FROM ITEMS i1
WHERE NOT EXISTS
(
    SELECT 1 FROM Items i2 
    WHERE i1.Item = i2.Item AND i2.[Order] = 1
)

NOT IN has problems; this is worth a read:

http://sqlperformance.com/2012/12/t-sql-queries/left-anti-semi-join

The main problem is that the results can be surprising if the target column is NULLable (SQL Server processes this as a left anti semi join, but can't reliably tell you if a NULL on the right side is equal to – or not equal to – the reference on the left side). Also, optimization can behave differently if the column is NULLable, even if it doesn't actually contain any NULL values

Because of this...

Instead of NOT IN, use a correlated NOT EXISTS for this query pattern. Always. Other methods may rival it in terms of performance, when all other variables are the same, but all of the other methods introduce either performance problems or other challenges.
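To make the difference concrete, here is a minimal sketch that runs both query shapes against the sample data from the question. It makes two assumptions that are not in the original thread: SQLite (through Python's built-in sqlite3 module) stands in for SQL Server, and one hypothetical row with a NULL Item is added, since the question's data as posted contains no NULLs.

```python
import sqlite3

# Assumption: SQLite stands in for SQL Server. It accepts the [Order]
# bracket quoting and applies the same NULL rules to NOT IN / NOT EXISTS.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Items (ID INTEGER, Item TEXT, [Order] INTEGER)")
conn.executemany(
    "INSERT INTO Items VALUES (?, ?, ?)",
    [
        (1, "A", 1), (2, "A", 2), (3, "A", 3),
        (4, "B", 1), (5, "B", 2),
        (6, "C", 2), (7, "C", 3),
        (8, "D", 1),
        (9, None, 1),  # hypothetical NULL Item, not part of the question's data
    ],
)

# The question's approach: the subquery now yields A, B, D and NULL.
not_in_rows = conn.execute(
    """
    SELECT DISTINCT i.Item FROM Items AS i
    WHERE i.Item NOT IN (SELECT Item FROM Items WHERE [Order] = 1)
    """
).fetchall()

# The correlated NOT EXISTS form from this answer.
not_exists_rows = conn.execute(
    """
    SELECT DISTINCT i1.Item FROM Items AS i1
    WHERE NOT EXISTS (
        SELECT 1 FROM Items AS i2
        WHERE i1.Item = i2.Item AND i2.[Order] = 1
    )
    """
).fetchall()

print(not_in_rows)      # empty list: the NULL in the subquery blocks every row
print(not_exists_rows)  # contains ('C',) and also the NULL item itself
```

Note that NOT EXISTS also reports the NULL item, because a NULL never compares equal to anything in the correlated subquery; add `AND i1.Item IS NOT NULL` to the outer query if that row is unwanted.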

Answer 2

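You can use a HAVING clause to find the missing orders. HAVING lets you filter on aggregated records. In this case, we filter for items whose minimum order is greater than 1.

Compared with a subquery in the WHERE clause, the benefit of this approach is that SQL Server does not have to rerun the subquery multiple times. It should run faster on large data sets.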

Example

/* HAVING allows us to filter on aggregated records. 
 */
WITH SampleData AS
    (
        /* This CTE creates some sample records 
         * to experiment with.
         */
        SELECT
            r.*
        FROM
            (
                VALUES
                    ( 1,     'A',       1),
                    ( 2,     'A',       2),
                    ( 3,     'A',       3),
                    ( 4,     'B',       1),
                    ( 5,     'B',       2),
                    ( 6,     'C',       2),
                    ( 7,     'C',       3),
                    ( 8,     'D',       1)
            ) AS r(ID, Item, [Order])
    )
SELECT
    Item,
    COUNT([Order])        AS Count_Order,
    MIN([Order])        AS Min_Order
FROM
    SampleData
GROUP BY
    Item
HAVING 
    MIN([Order]) > 1
;

Answer 3

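Good idea, but one small detail of NOT IN can cause trouble: if the subquery after NOT IN produces any NULL, the NOT IN predicate never evaluates to true (it comes out false or unknown), so no rows pass the filter. That may be why you get no results. You can try NOT EXISTS, as in the other answers, or simply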

SELECT DISTINCT Item FROM ITEMS as I WHERE I.Item NOT IN (SELECT Item FROM Items WHERE [Order] = 1 AND Item IS NOT NULL)
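
The NULL behaviour is easier to see on single values. The sketch below is an illustration only and uses SQLite through Python's sqlite3 module as a stand-in for SQL Server (an assumption); SQLite lets you select a predicate directly as a scalar, which T-SQL does not, and the `evaluate` helper is a hypothetical name introduced for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

def evaluate(expression):
    """Return the value of a scalar SQL expression; None means NULL (unknown)."""
    return conn.execute(f"SELECT {expression}").fetchone()[0]

# No NULL in the list: the predicate is simply true.
no_null = evaluate("'C' NOT IN ('A', 'B')")

# A NULL in the list: 'C' <> NULL is unknown, so the whole predicate is unknown.
with_null = evaluate("'C' NOT IN ('A', 'B', NULL)")

# A real match still gives a definite false, NULL or not.
matched = evaluate("'A' NOT IN ('A', 'B', NULL)")

print(no_null, with_null, matched)  # 1 None 0
```

A WHERE clause keeps a row only when its predicate is true, so both the unknown and the false case drop the row. That is why filtering NULLs out of the subquery restores the expected result.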

Answer 4

Your query should work. The problem may be that Item can be NULL. So try this:

SELECT DISTINCT Item
FROM ITEMS as I
WHERE I.Item NOT IN (SELECT Item FROM Items WHERE [Order] = 1 AND Item IS NOT NULL);

This is why NOT EXISTS is preferable to NOT IN.

However, I would use an aggregation query for this:

select item
from items
group by item
having sum(case when [order] = 1 then 1 else 0 end) = 0;
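
For comparison, here is a small sketch that runs this conditional-aggregation query next to the HAVING MIN([Order]) > 1 query from Answer 2. The assumptions are mine, not the thread's: SQLite via Python's sqlite3 module stands in for SQL Server, and a hypothetical item "E" with orders 0 and 2 is added to show where the two queries differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Items (ID INTEGER, Item TEXT, [Order] INTEGER)")
conn.executemany(
    "INSERT INTO Items VALUES (?, ?, ?)",
    [
        (1, "A", 1), (2, "A", 2), (3, "A", 3),
        (4, "B", 1), (5, "B", 2),
        (6, "C", 2), (7, "C", 3),
        (8, "D", 1),
        (9, "E", 0), (10, "E", 2),  # hypothetical item, not in the question's data
    ],
)

# Answer 2: items whose smallest order is greater than 1.
min_rows = [row[0] for row in conn.execute(
    "SELECT Item FROM Items GROUP BY Item "
    "HAVING MIN([Order]) > 1 ORDER BY Item"
)]

# This answer: items with no row where the order equals 1.
sum_rows = [row[0] for row in conn.execute(
    "SELECT Item FROM Items GROUP BY Item "
    "HAVING SUM(CASE WHEN [Order] = 1 THEN 1 ELSE 0 END) = 0 ORDER BY Item"
)]

print(min_rows)  # ['C']
print(sum_rows)  # ['C', 'E']
```

Both queries agree as long as every order value is 1 or greater. If values below 1 can occur, only the conditional-aggregation form still tests literally for "no row with Order = 1".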

sql

sql-server

distinct