IN statement inconsistency with PRIMARY KEY

Question

I have a simple table named temp, which can be created as follows:

CREATE TABLE temp (value int, id int not null primary key);
INSERT INTO temp
VALUES(0,1),
      (0,2),
      (0,3),
      (0,4),
      (1,5),
      (1,6),
      (1,7),
      (1,8);

I have a second table, temp2, which can be created as follows:

CREATE TABLE temp2 (value int, id int);
INSERT INTO temp2
VALUES(0,1),
      (0,2),
      (0,3),
      (0,4),
      (1,5),
      (1,6),
      (1,7),
      (1,8);

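The only difference between temp and temp2 is that the id column is the primary key in temp, while temp2 has no primary key. I'm not sure how, but the following query gives different results on the two tables: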

select * from temp
where id in (
    select id
    from (
        select id, ROW_NUMBER() over (partition by value order by value) rownum
        from temp
    ) s1
    where rownum = 1
)

Here are the results for temp:

value       id
----------- -----------
0           1
0           2
0           3
0           4
1           5
1           6
1           7
1           8

And here are the results when I replace temp with temp2 (the correct results):

value       id
----------- -----------
0           1
1           5

Running the innermost query (s1) on its own returns the expected results:

id          rownum
----------- --------------------
1           1
2           2
3           3
4           4
5           1
6           2
7           3
8           4

Running just the subquery inside the IN clause also returns the expected results:

id
-----------
1
5

I have no idea what could be causing this. Is it a bug?

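Note: temp2 was created with a simple SELECT * INTO temp2 FROM temp. I am running SQL Server 2008. Apologies if this is a known bug; it is hard to search for because it involves an IN statement. An "equivalent" query using a join does produce the correct results on both tables.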
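The question does not show the JOIN query it refers to, so the following is a hypothetical reconstruction. It runs against an in-memory SQLite database through Python's sqlite3 module purely to illustrate the expected semantics; using SQLite (3.25 or later, for window functions) is an assumption, and it does not reproduce the SQL Server 2008 behaviour. It keeps the question's ORDER BY value, so which id receives rownum 1 is arbitrary, but ROW_NUMBER still numbers exactly one row 1 per partition, so a correct result contains one row per value.

```python
import sqlite3

# Illustrative only: SQLite stands in for SQL Server here, so this shows the
# intended semantics of the query, not the SQL Server 2008 behaviour.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE temp2 (value INT, id INT)")
conn.executemany(
    "INSERT INTO temp2 (value, id) VALUES (?, ?)",
    [(0, 1), (0, 2), (0, 3), (0, 4), (1, 5), (1, 6), (1, 7), (1, 8)],
)

# Hypothetical JOIN form of the IN query. It keeps PARTITION BY value
# ORDER BY value, so the id that gets rownum 1 in each partition is arbitrary.
JOIN_QUERY = """
SELECT t.value, t.id
FROM temp2 AS t
JOIN (
    SELECT id,
           ROW_NUMBER() OVER (PARTITION BY value ORDER BY value) AS rownum
    FROM temp2
) AS s1
  ON s1.id = t.id
WHERE s1.rownum = 1
"""

rows = conn.execute(JOIN_QUERY).fetchall()

# Exactly one row per partition is numbered 1, so one row per value survives.
print(sorted(value for value, _ in rows))  # [0, 1]
```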

Edit: dbfiddle demos showing the difference: Unexpected Results, Expected Results

Answer 1

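I can't answer your question specifically, but changing the ORDER BY fixes the problem. PARTITION BY value ORDER BY value doesn't really make sense, and it appears to be "fooling" SQL Server: when you partition rows by the same value you order by, every row is "row number 1", because any of them could come first. Don't forget that a table is an unordered set of rows, even when it has a primary key (clustered or not).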

If you change the ORDER BY to id, the problem goes away:

SELECT *
FROM temp2 t2
WHERE t2.id IN (SELECT s1.id
                FROM (SELECT sq.id,
                             ROW_NUMBER() OVER (PARTITION BY sq.value ORDER BY sq.id) AS rownum
                      FROM temp2 sq) s1
                WHERE s1.rownum = 1);
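As a sanity check, here is a minimal sketch that runs the ORDER BY id version against both a keyed table and an unkeyed copy. It again uses SQLite through Python's sqlite3 as a stand-in for SQL Server (an assumption; it shows the expected output, not the 2008 bug). The table names are double-quoted because TEMP is also an SQLite keyword, and an outer ORDER BY is added only to make the printed order stable.

```python
import sqlite3

DATA = [(0, 1), (0, 2), (0, 3), (0, 4), (1, 5), (1, 6), (1, 7), (1, 8)]

conn = sqlite3.connect(":memory:")
# Mirror the question: temp has a primary key on id, temp2 has none.
conn.execute('CREATE TABLE "temp" (value INT, id INT NOT NULL PRIMARY KEY)')
conn.execute('CREATE TABLE "temp2" (value INT, id INT)')
for table in ("temp", "temp2"):
    conn.executemany(f'INSERT INTO "{table}" (value, id) VALUES (?, ?)', DATA)


def first_per_value(table):
    """Run the answer's IN query (ORDER BY id) against a fixed, trusted table name."""
    query = f"""
        SELECT *
        FROM "{table}" t2
        WHERE t2.id IN (SELECT s1.id
                        FROM (SELECT sq.id,
                                     ROW_NUMBER() OVER (PARTITION BY sq.value ORDER BY sq.id) AS rownum
                              FROM "{table}" sq) s1
                        WHERE s1.rownum = 1)
        ORDER BY t2.id
    """
    return conn.execute(query).fetchall()


print(first_per_value("temp"))   # [(0, 1), (1, 5)]
print(first_per_value("temp2"))  # [(0, 1), (1, 5)]
```

Because id is unique, ordering by it makes row 1 deterministic, so the primary key no longer has any bearing on the result.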

In fact, changing the ORDER BY clause to anything else also fixes the problem:

SELECT *
FROM temp2 t2
WHERE t2.id IN (SELECT s1.id
                FROM (SELECT sq.id,
                             ROW_NUMBER() OVER (PARTITION BY sq.value ORDER BY (SELECT NULL)) AS rownum
                      FROM temp2 sq) s1
                WHERE s1.rownum = 1);

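So the problem is that you are using the same expression (column) for both the PARTITION BY and ORDER BY clauses. That means any of the rows in a partition could be row number 1, and yet none of them definitively is, so all of them are returned. Using the same expression for both makes no sense, so they should be different.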
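To see why ordering by the partitioning column leaves row 1 undefined, here is a small pure-Python model of ROW_NUMBER() semantics. It is a simplification (a stable sort within each partition) and not how SQL Server actually executes the query; the helper names row_number and first_ids are hypothetical.

```python
from itertools import groupby
from operator import itemgetter


def row_number(rows, partition_key, order_key):
    """Simplified model of ROW_NUMBER() OVER (PARTITION BY ... ORDER BY ...).

    Python's sort is stable, so rows whose sort keys tie keep their input
    order; the input order therefore decides which tied row gets number 1.
    """
    ordered = sorted(rows, key=lambda r: (partition_key(r), order_key(r)))
    numbered = []
    for _, group in groupby(ordered, key=partition_key):
        # Numbering restarts at 1 for every partition.
        for n, row in enumerate(group, start=1):
            numbered.append((row, n))
    return numbered


def first_ids(rows, order_key):
    """Ids of the rows numbered 1 when partitioning by value (element 0)."""
    numbered = row_number(rows, partition_key=itemgetter(0), order_key=order_key)
    return sorted(row[1] for row, n in numbered if n == 1)


# (value, id) pairs, as in the question's tables.
data = [(0, 1), (0, 2), (0, 3), (0, 4), (1, 5), (1, 6), (1, 7), (1, 8)]
by_value = itemgetter(0)  # ORDER BY value (same as the partition key)
by_id = itemgetter(1)     # ORDER BY id

print(first_ids(data, by_value))        # [1, 5]
print(first_ids(data[::-1], by_value))  # [4, 8]: same rows, different read order
print(first_ids(data[::-1], by_id))     # [1, 5]: deterministic
```

With ORDER BY value every row in a partition ties, so which id becomes row 1 depends only on the order in which rows happen to be read; ordering by id removes that dependency. In every case exactly one row per partition is numbered 1, which is why getting all eight rows back looks like a bug rather than a legitimately nondeterministic answer.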

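That said, the problem does still exist in SQL Server 2017 (and, I suspect, 2019), so you may want to raise a support request with Microsoft anyway (though as you are using 2008, don't expect it to be fixed there, as its support is about to end).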

As comments can be deleted without notice, I want to add @scsimon's comment and my reply:

scsimon: Interesting. Changing rownum = 2 gives expected results without changing order by. I think it's a bug.

Larnu: I agree, @scsimon. I suspect that changing the WHERE to s1.rownum = 2 effectively forces the data engine to actually determine the values of rownum, rather than assume every row is "equal"; if that were the case, none would be returned.
Even so, changing the WHERE to s1.rownum = 2 is still resigning yourself to "return a random row" if the PARTITION BY and ORDER BY clauses are the same.

Tags: sql-server, primary-key, sql-server-2008