Cross joining a single row subquery with a column using `newid()` results in every row having a different GUID

Abstract

A query like

SELECT *
       FROM elbat t
            CROSS JOIN (SELECT newid() guid) x;

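in SQL Server produces a result in which every row has a different GUID, instead of one common GUID for all rows of the result. How can I get a single GUID for all rows of the result (without using variables or (temporary) tables)?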

Setup

Consider the following table in a SQL Server database.

CREATE TABLE elbat
             (id integer);

INSERT INTO elbat
            VALUES (1);
INSERT INTO elbat
            VALUES (2);
INSERT INTO elbat
            VALUES (3);
INSERT INTO elbat
            VALUES (4);
INSERT INTO elbat
            VALUES (5);
INSERT INTO elbat
            VALUES (6);

Let's run the following query.

SELECT *
       FROM elbat t
            CROSS JOIN (SELECT newid() guid) x;

Here are a db<>fiddle and an SQL Fiddle to see it in action.

The Issue

To my surprise, every row in the result has a different GUID. For example:

 id | guid                                
 -: | :-----------------------------------
  1 | ad146af7-9ebd-4521-a440-47c7dea6a1d4
  2 | ce24fbb8-af64-480c-8c46-1e03187642c5
  3 | 14509451-9b1d-49e9-8da2-c691947ae805
  4 | 37a86339-e352-486f-b541-92798540599f
  5 | cbee1a8e-02ce-4915-8d2c-ef5db299d8c8
  6 | d491275b-4ebb-461b-94e2-93b47e7d2348

This puzzles me. I expected every row across the whole result set to have the same GUID. For example:

 id | guid                                
 -: | :-----------------------------------
  1 | cbee1a8e-02ce-4915-8d2c-ef5db299d8c8
  2 | cbee1a8e-02ce-4915-8d2c-ef5db299d8c8
  3 | cbee1a8e-02ce-4915-8d2c-ef5db299d8c8
  4 | cbee1a8e-02ce-4915-8d2c-ef5db299d8c8
  5 | cbee1a8e-02ce-4915-8d2c-ef5db299d8c8
  6 | cbee1a8e-02ce-4915-8d2c-ef5db299d8c8

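Of course I understand that the GUID differs from call to call. But I don't understand why it changes from row to row when I cross join a single GUID and do not put the newid() call in the projected column list.

Additional Information

I tried this on all versions available on the fiddle platforms and on a local Microsoft SQL Server 2014 (12.0.2269.0 (X64), Express). The result was the same everywhere (apart from the GUID values themselves, of course).

Questioning my understanding of joins, I also ran some tests on other DBMSs using equivalent setups and queries.

All of these other DBMSs produced the result I actually expected: one common GUID in all rows of the result.

I also tried varying the query. That didn't help, though.

Looking through the documentation, I couldn't find anything that covers this behavior.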

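Question

Why does SQL Server behave differently from all the other (tested) DBMSs (in this respect), and is there a way to get the expected result (without using variables or (temporary) tables)?

(Note: I know I could use a variable initialized with newid() and put it in the projected columns. But the problem actually arose while I was trying to avoid such a variable. I would really like to see a variable-free, query-only solution for this.)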
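
For reference, the variable-based workaround mentioned in the note could look roughly like the following. This is a minimal sketch: the variable name @guid is an illustrative choice and does not come from the original queries.

```sql
-- Evaluate newid() exactly once, outside the query, and keep the result.
-- (@guid is a hypothetical variable name chosen for this sketch.)
DECLARE @guid uniqueidentifier = newid();

-- Every row shows the same value because the query only reads the variable.
SELECT t.*,
       @guid AS guid
       FROM elbat t;
```

Because the function call happens before the query runs, the optimizer has no opportunity to re-evaluate it per row. The price is the extra statement that the question is trying to avoid.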

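I'm quite surprised by SQL Server's behavior. I did not realize that it would re-evaluate such a subquery over and over. I suspect the reason is an optimization: the expression in the cross join is actually moved into the node that reads the data, so the function gets called repeatedly.

In any case, I think this is wrong. Such an optimization should recognize that newid() is a volatile function and adjust accordingly.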
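
One way to examine this suspicion is to look at the estimated execution plan for the query. The sketch below assumes a client that understands the GO batch separator (for example SSMS or sqlcmd). The plan text itself is not reproduced here, since it may vary by version and edition. If the suspicion holds, the newid() expression would be expected to appear in a Compute Scalar operator sitting on top of the scan of elbat, rather than as a separate one-row input to a join.

```sql
-- Return the estimated plan as text instead of executing the query.
-- SET SHOWPLAN_TEXT has to be the only statement in its batch,
-- hence the GO separators (GO is a client-side batch separator, not T-SQL).
SET SHOWPLAN_TEXT ON;
GO

SELECT *
       FROM elbat t
            CROSS JOIN (SELECT newid() guid) x;
GO

-- Switch back to normal execution.
SET SHOWPLAN_TEXT OFF;
GO
```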

After some experimentation, I found that an order by in the subquery does cause it to be evaluated only once. So, this does what you want:

select *
from elbat cross join
     (select top (1) newid() as guid
      order by guid
     ) x;

Another version that does what you expect:

select *
from elbat cross join
     (select max(newid()) as guid
     ) x;

By the way, the latter version also works in the select list:

select *, (select max(newid())) as guid
from elbat ;

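In this case, I would have expected the subquery to be evaluated once per row. Go figure.

Here is a link to an archived copy of the Connect issue (Connect, alas, no longer exists) that discusses whether to "fix" this behavior. It is reproduced here to preserve the material. This is the SQL Server development team's feedback on closing the reported issue as "Won't Fix":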

“Closing the loop . . . I've discussed this question with the Dev team. And eventually we have decided not to change current behavior, for the following reasons:

1) The optimizer does not guarantee timing or number of executions of scalar functions. This is a long-established tenet. It's the fundamental 'leeway' that allows the optimizer enough freedom to gain significant improvements in query-plan execution.

2) This "once-per-row behavior" is not a new issue, although it's not widely discussed. We started to tweak its behavior back in the Yukon release. But it's quite hard to pin down precisely, in all cases, exactly what it means! For example, does it apply to interim rows calculated 'on the way' to the final result? - in which case it clearly depends on the plan chosen. Or does it apply only to the rows that will eventually appear in the completed result? - there's a nasty recursion going on here, as I'm sure you'll agree!

3) As I mentioned earlier, we default to "optimize performance" - which is good for 99% of cases. The 1% of cases where it might change results are fairly easy to spot - side-effecting 'functions' such as NEWID - and easy to 'fix' (trading perf, as a consequence). This default to "optimize performance" again, is long-established, and accepted. (Yes, it's not the stance chosen by compilers for conventional programming languages, but so be it).

So, our recommendations are:

a) Avoid reliance on non-guaranteed timing and number-of-executions semantics.

b) Avoid using NEWID() deep in table expressions.

c) Use OPTION to force a particular behavior (trading perf)

Hope this explanation helps clarify our reasons for closing this bug as "won't fix".

Thanks,

Jim”

https://web.archive.org/web/20160626085155/https://connect.microsoft.com/SQLServer/feedbackdetail/view/350485/bug-with-newid-and-table-expressions

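A CTE (non-recursive) is just a way of making queries with subqueries more readable for us humans. SQL Server seems to be too smart: no matter how we write the query, it just adds a computed column. But this way, using an outer join, I tricked it and got it to join using nested loops: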

WITH x (guid) AS (
  SELECT newid()
)
SELECT *
FROM elbat t
  RIGHT JOIN x ON x.guid IS NOT NULL;
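
To check whether a given rewrite really yields a single GUID, the rows can be captured and the distinct values counted. The following is a minimal sketch under stated assumptions: the table variable @r and the column aliases are hypothetical names used only for this check, and the INSERT may be compiled to a different plan than the plain SELECT, so the result is an indication rather than a guarantee.

```sql
-- Capture the rows first so that the check runs against stored values.
-- (@r is a hypothetical table variable used only for verification.)
DECLARE @r TABLE (id integer, guid uniqueidentifier);

WITH x (guid) AS (
  SELECT newid()
)
INSERT INTO @r (id, guid)
SELECT t.id, x.guid
FROM elbat t
  RIGHT JOIN x ON x.guid IS NOT NULL;

-- distinct_guids = 1 would mean that all captured rows share one GUID.
SELECT count(*) AS row_count,
       count(DISTINCT guid) AS distinct_guids
FROM @r;
```

Since the optimizer does not guarantee how often newid() is executed, it seems prudent to repeat such a check after a version upgrade or a schema change rather than to rely on a single run.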