SQL Server: Comparing Concatenated Columns for Equality of Unordered Set

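I have two tables, each with an ID column (used to match the tables together) and a concatenated column whose values appear in arbitrary order. I want to compare them to see whether the two columns contain exactly the same items (in any order), and output the ID if they do not.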

Example:

Table 1

PersonID    Products
1           Apple|Pear|Orange
2           Flour|Apple|Butter
3           Apple
4           Banana|Cashews
5           Juice|Crackers|Banana|Cashews
6           Cashews

Table 2

PersonID    Products
1           Orange|Apple|Pear
2           Flour|Apple|Butter
3           Apple|Banana
4           Banana
5           Crackers|Juice|Banana|Cashews
6           Pear|Crackers

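I want to get all personids whose products are not the same set (in any order) in table 1 and table 2. In this case that would be: person 3 (extra product), person 4 (missing product), and person 6 (different products).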

My current query incorrectly selects person 1 and person 5, because their products are listed in a different order.

My current query looks like this:

select t1.personid, t1.products as t1products, t2.products as t2products
from table1 t1 (nolock)
inner join table2 t2 (nolock) on t1.personid = t2.personid
where t1.products != t2.products

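I also have the data in its pre-concatenated form, with multiple rows per personid (one row per product, separately in each of the two tables), if that is more helpful. I have not figured out how to concatenate them in alphabetical order, so solving that would solve this problem too.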

Edit (clarification): the unconcatenated data looks like this:

Table 1

PersonID    Product
1           Apple
1           Pear
1           Orange
2           Flour
2           Apple
2           Butter
3           Apple

Table 2

PersonID    Product
1           Orange
1           Apple
1           Pear
2           Flour
2           Apple
2           Butter
3           Apple
3           Banana

I use STUFF to concatenate them by PersonID.

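If you have the data in the one-product-per-row form, you can query for every row that has no matching product and personid in the opposite table. Then do the same for the other table and union the results: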

SELECT t1.personid, t1.product, '2' AS [Not Found In Table]
FROM table1 t1 
LEFT JOIN table2 t2 ON t1.personid = t2.personid AND t1.product = t2.product
WHERE t2.product IS NULL
UNION
SELECT t2.personid, t2.product,  '1' AS [Not Found In Table]
FROM table2 t2 
LEFT JOIN table1 t1 ON t2.personid = t1.personid AND t2.product = t1.product
WHERE t1.product IS NULL

You can wrap that in a select and CONCAT the results to get a nice list of what is missing from each table for every mismatched person.
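As a rough sketch of that wrapping step, assuming SQL Server 2017 or later (for STRING_AGG) and the one-product-per-row layout. The table variables, the sample rows, and the NotFoundInTable and MissingProducts aliases are illustrative only:

```sql
-- Illustrative data in the one-product-per-row layout (names are assumptions).
DECLARE @t1 TABLE (PersonID INT, Product VARCHAR(100));
DECLARE @t2 TABLE (PersonID INT, Product VARCHAR(100));
INSERT INTO @t1 VALUES (1,'Apple'),(1,'Pear'),(3,'Apple'),(4,'Banana'),(4,'Cashews');
INSERT INTO @t2 VALUES (1,'Pear'),(1,'Apple'),(3,'Apple'),(3,'Banana'),(4,'Banana');

-- One row per person and per table the products are missing from,
-- with the missing products joined into a single sorted string.
SELECT d.PersonID,
       d.NotFoundInTable,
       STRING_AGG(d.Product, '|') WITHIN GROUP (ORDER BY d.Product) AS MissingProducts
FROM (
    -- Products present in @t1 but absent from @t2.
    SELECT t1.PersonID, t1.Product, '2' AS NotFoundInTable
    FROM @t1 t1
    LEFT JOIN @t2 t2 ON t1.PersonID = t2.PersonID AND t1.Product = t2.Product
    WHERE t2.Product IS NULL
    UNION
    -- Products present in @t2 but absent from @t1.
    SELECT t2.PersonID, t2.Product, '1'
    FROM @t2 t2
    LEFT JOIN @t1 t1 ON t2.PersonID = t1.PersonID AND t2.Product = t1.Product
    WHERE t1.Product IS NULL
) d
GROUP BY d.PersonID, d.NotFoundInTable
ORDER BY d.PersonID;
```

On older versions without STRING_AGG, the same grouping could presumably be done with the STUFF and FOR XML PATH pattern instead.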

Test data

Declare @t1 TABLE (PersonID INT, Products Varchar(200))
INSERT INTO @t1 VALUES
(1   ,'Apple|Pear|Orange'),
(2   ,'Flour|Apple|Butter'),
(3   ,'Apple'),
(4   ,'Banana|Cashews'),
(5   ,'Juice|Crackers|Banana|Cashews'),
(6   ,'Cashews');

Declare @t2 TABLE (PersonID INT, Products Varchar(200))
INSERT INTO @t2 VALUES
(1   ,'Orange|Apple|Pear'),
(2   ,'Flour|Apple|Butter'),
(3   ,'Apple|Banana'),
(4   ,'Banana'),
(5   ,'Crackers|Juice|Banana|Cashews'),
(6   ,'Pear|Crackers');

Query

WITH Table1 AS (
SELECT  PersonID
        ,Split.a.value('.', 'VARCHAR(100)') Products
FROM   
    (SELECT PersonID
            ,Cast ('<X>' + Replace(Products, '|', '</X><X>') + '</X>' AS XML) AS Data
    FROM    @t1
    ) AS t CROSS APPLY Data.nodes ('/X') AS Split(a) 
), Table2 AS (
SELECT  PersonID
        ,Split.a.value('.', 'VARCHAR(100)') Products
FROM   
    (SELECT PersonID
            ,Cast ('<X>' + Replace(Products, '|', '</X><X>') + '</X>' AS XML) AS Data
    FROM    @t2
    ) AS t CROSS APPLY Data.nodes ('/X') AS Split(a) 
)
SELECT t1.PersonID 
FROM Table1 t1
WHERE NOT EXISTS (SELECT 1 
                  FROM Table2 t2
                  WHERE t1.PersonID = t2.PersonID 
                  AND t1.Products = t2.Products)
UNION  
SELECT t2.PersonID 
FROM Table2 t2
WHERE NOT EXISTS (SELECT 1 
                  FROM Table1 t1
                  WHERE t1.PersonID = t2.PersonID 
                  AND t1.Products = t2.Products)
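The query above splits the pipe-delimited strings by casting them to XML, which can fail if a product name contains characters such as & or <. A possible variation, assuming SQL Server 2016 or later with database compatibility level 130 or higher, is to split with STRING_SPLIT and compare the two row sets with EXCEPT. This is only a sketch on a subset of the test data, not a drop-in replacement:

```sql
-- Subset of the test data, in the concatenated layout.
DECLARE @t1 TABLE (PersonID INT, Products VARCHAR(200));
DECLARE @t2 TABLE (PersonID INT, Products VARCHAR(200));
INSERT INTO @t1 VALUES (1,'Apple|Pear|Orange'),(3,'Apple'),(4,'Banana|Cashews'),(6,'Cashews');
INSERT INTO @t2 VALUES (1,'Orange|Apple|Pear'),(3,'Apple|Banana'),(4,'Banana'),(6,'Pear|Crackers');

WITH Table1 AS (
    -- STRING_SPLIT returns one row per item in a column named value.
    SELECT t.PersonID, s.value AS Product
    FROM @t1 t
    CROSS APPLY STRING_SPLIT(t.Products, '|') s
), Table2 AS (
    SELECT t.PersonID, s.value AS Product
    FROM @t2 t
    CROSS APPLY STRING_SPLIT(t.Products, '|') s
)
-- A person is reported if either side has a product the other side lacks.
SELECT PersonID
FROM (SELECT PersonID, Product FROM Table1
      EXCEPT
      SELECT PersonID, Product FROM Table2) OnlyInTable1
UNION
SELECT PersonID
FROM (SELECT PersonID, Product FROM Table2
      EXCEPT
      SELECT PersonID, Product FROM Table1) OnlyInTable2;
```

Because EXCEPT works on distinct rows, a product repeated within one string does not change the result.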

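I just pair up the rows, the way a full join would. If a pair occurs, the product is matched; if not, something is off. Note that this needs the one-product-per-row data (grouping the concatenated strings would still treat 'Apple|Pear|Orange' and 'Orange|Apple|Pear' as different) and assumes each product appears at most once per person in each table. So I hope this simple query will solve your problem as well: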

SELECT DISTINCT PersonID FROM (
    SELECT * FROM table1 
    UNION ALL 
    SELECT * FROM table2
) d 
GROUP BY PersonID, Products 
HAVING COUNT(*) != 2
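The pairing idea can also be written literally as a FULL OUTER JOIN. The following is a minimal sketch on illustrative one-product-per-row data (the table variables and sample rows are assumptions, not the asker's real tables):

```sql
-- Illustrative data in the one-product-per-row layout.
DECLARE @t1 TABLE (PersonID INT, Product VARCHAR(100));
DECLARE @t2 TABLE (PersonID INT, Product VARCHAR(100));
INSERT INTO @t1 VALUES (1,'Apple'),(1,'Pear'),(3,'Apple'),(4,'Banana'),(4,'Cashews');
INSERT INTO @t2 VALUES (1,'Pear'),(1,'Apple'),(3,'Apple'),(3,'Banana'),(4,'Banana');

-- Rows that find no partner on the other side come back with NULLs;
-- any person owning such a row has mismatched product sets.
SELECT DISTINCT COALESCE(t1.PersonID, t2.PersonID) AS PersonID
FROM @t1 t1
FULL OUTER JOIN @t2 t2
    ON t1.PersonID = t2.PersonID
   AND t1.Product  = t2.Product
WHERE t1.PersonID IS NULL
   OR t2.PersonID IS NULL;
```

Unlike the COUNT(*) != 2 check, this form does not flag a person merely because a product is repeated within one table.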