Find rows with duplicate values in two columns where at least one value in one column is a specific value

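I'm not sure how this works, and I couldn't find a sufficient answer by googling (probably because I wasn't using the right buzzwords). So here goes: let's say I have a table like this, and let's call it persons.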

ID  Name      First_Name  Country
1   Doe       John        USA
2   Doe       John        UK
3   Doe       John        Brazil
4   Meyer     Julia       Germany
5   Meyer     Julia       Austria
6   Picard    Jean-Luc    France
7   Picard    Jean-Luc    UK
8   Nakamura  Hikaro      Japan

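OK, now I want to select all the rows that have the same name and first name and where at least one country is the UK. So my result set should look like this: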

ID  Name    First_Name  Country
1   Doe     John        USA
2   Doe     John        UK
3   Doe     John        Brazil
6   Picard  Jean-Luc    France
7   Picard  Jean-Luc    UK

I do know how to find duplicates in general, like this:

SELECT *
FROM persons  p1
JOIN (SELECT NAME, FIRST_NAME, count(*) FROM PERSONS  
GROUP BY FIRST_NAME, NAME having count(*) >1) p2
ON p1.NAME = p2.NAME 
AND p1.FIRST_NAME = p2.FIRST_NAME; 

But this also returns Julia Meyer, whom I don't want in the result.

Any suggestions?

Conditionally count the country of interest:

SELECT *
FROM persons  p1
JOIN (
   SELECT NAME, FIRST_NAME
   FROM PERSONS  
   GROUP BY FIRST_NAME, NAME 
   having count(*) > 1 and count(case when country = 'UK' then 1 end) >= 1
) p2 ON p1.NAME = p2.NAME 
   AND p1.FIRST_NAME = p2.FIRST_NAME; 
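
As a quick sanity check, the conditional-count query above can be run against the question's sample data. This is only a sketch: it assumes an in-memory SQLite database driven from Python instead of the asker's actual database, and the helper name matching_ids is made up for illustration.

```python
import sqlite3

# Sample rows from the question: (id, name, first_name, country).
ROWS = [
    (1, "Doe", "John", "USA"),
    (2, "Doe", "John", "UK"),
    (3, "Doe", "John", "Brazil"),
    (4, "Meyer", "Julia", "Germany"),
    (5, "Meyer", "Julia", "Austria"),
    (6, "Picard", "Jean-Luc", "France"),
    (7, "Picard", "Jean-Luc", "UK"),
    (8, "Nakamura", "Hikaro", "Japan"),
]

# Keep only (name, first_name) groups that have more than one row
# and at least one row whose country is 'UK', then join back to get all rows.
QUERY = """
SELECT p1.id
FROM persons p1
JOIN (
    SELECT name, first_name
    FROM persons
    GROUP BY name, first_name
    HAVING COUNT(*) > 1
       AND COUNT(CASE WHEN country = 'UK' THEN 1 END) >= 1
) p2 ON p1.name = p2.name
    AND p1.first_name = p2.first_name
ORDER BY p1.id
"""


def matching_ids(rows):
    """Load rows into a throwaway in-memory table and return the matching IDs."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE persons ("
        "id INTEGER PRIMARY KEY, name TEXT, first_name TEXT, country TEXT)"
    )
    conn.executemany("INSERT INTO persons VALUES (?, ?, ?, ?)", rows)
    ids = [row[0] for row in conn.execute(QUERY)]
    conn.close()
    return ids


print(matching_ids(ROWS))  # [1, 2, 3, 6, 7]
```

Julia Meyer (IDs 4 and 5) drops out because her group has no UK row, and Nakamura (ID 8) drops out because his group has only one row.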

Using EXISTS:

SELECT p1.*
FROM persons p1
WHERE EXISTS (
  SELECT *
  FROM persons p2
  WHERE p2.ID <> p1.ID
    AND p2.Name = p1.Name AND p2.First_Name = p1.First_Name
    AND 'UK' IN (p1.Country, p2.Country)
);

See the demo.

There are two conditions: 1) the group contains 'UK', and 2) COUNT(1) > 1. So the query below will work.

SELECT p1.*
FROM persons p1
WHERE (p1.NAME, p1.FIRST_NAME) IN (
    SELECT p2.NAME, p2.FIRST_NAME
    FROM persons p2
    WHERE p2.Country = 'UK')
  AND (p1.NAME, p1.FIRST_NAME) IN (
    SELECT p3.NAME, p3.FIRST_NAME
    FROM persons p3
    GROUP BY p3.NAME, p3.FIRST_NAME
    HAVING COUNT(1) > 1)

I want to select all the rows that have the same name and first name and where at least one country is the UK.

You can use the COUNT analytic function with conditional aggregation (and solve the problem in a single table scan, without any self-joins):

SELECT id, name, first_name, country
FROM   (
  SELECT t.*,
         COUNT(CASE country WHEN 'UK' THEN 1 END)
           OVER (PARTITION BY name, first_name) AS cnt
  FROM  table_name t
)
WHERE  cnt > 0;

Which, for the sample data:

CREATE TABLE table_name (ID, Name, First_Name, Country) AS
SELECT 1, 'Doe',    'John',     'USA' FROM DUAL UNION ALL
SELECT 2, 'Doe',    'John',     'UK' FROM DUAL UNION ALL
SELECT 3, 'Doe',    'John',     'Brazil' FROM DUAL UNION ALL
SELECT 4, 'Meyer',  'Julia',    'Germany' FROM DUAL UNION ALL
SELECT 5, 'Meyer',  'Julia',    'Austria' FROM DUAL UNION ALL
SELECT 6, 'Picard',     'Jean-Luc',     'France' FROM DUAL UNION ALL
SELECT 7, 'Picard',     'Jean-Luc',     'UK' FROM DUAL UNION ALL
SELECT 8, 'Nakamura',   'Hikaro',   'Japan' FROM DUAL;

Outputs:

ID  NAME    FIRST_NAME  COUNTRY
1   Doe     John        USA
2   Doe     John        UK
3   Doe     John        Brazil
6   Picard  Jean-Luc    France
7   Picard  Jean-Luc    UK

If you want to find only duplicate rows where at least one is the UK, then also count all the rows in the partition:

SELECT id, name, first_name, country
FROM   (
  SELECT t.*,
         COUNT(CASE country WHEN 'UK' THEN 1 END)
           OVER (PARTITION BY name, first_name) AS cnt_uk,
         COUNT(*)
           OVER (PARTITION BY name, first_name) AS cnt_all
  FROM  table_name t
)
WHERE  cnt_uk > 0
AND    cnt_all >= 2;

This gives the same output for the sample data.
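
The two window-function queries above only diverge when a person with a single row lives in the UK, which the sample data does not contain. The sketch below adds one such row to show the difference. Assumptions: it runs on SQLite (3.25 or newer, for window functions) through Python rather than on Oracle, row 9 (Smith, Anna, UK) is a hypothetical addition that is not in the original data, and the names ids_for, UK_ONLY and UK_AND_DUPLICATE are made up for illustration.

```python
import sqlite3

# Sample rows from the question, plus hypothetical row 9: a single-row UK person.
ROWS = [
    (1, "Doe", "John", "USA"),
    (2, "Doe", "John", "UK"),
    (3, "Doe", "John", "Brazil"),
    (4, "Meyer", "Julia", "Germany"),
    (5, "Meyer", "Julia", "Austria"),
    (6, "Picard", "Jean-Luc", "France"),
    (7, "Picard", "Jean-Luc", "UK"),
    (8, "Nakamura", "Hikaro", "Japan"),
    (9, "Smith", "Anna", "UK"),  # hypothetical, not in the original data
]

# First variant: only checks that the partition contains a UK row.
UK_ONLY = """
SELECT id FROM (
  SELECT t.*,
         COUNT(CASE country WHEN 'UK' THEN 1 END)
           OVER (PARTITION BY name, first_name) AS cnt_uk
  FROM persons t
)
WHERE cnt_uk > 0
ORDER BY id
"""

# Second variant: additionally requires at least two rows in the partition.
UK_AND_DUPLICATE = """
SELECT id FROM (
  SELECT t.*,
         COUNT(CASE country WHEN 'UK' THEN 1 END)
           OVER (PARTITION BY name, first_name) AS cnt_uk,
         COUNT(*)
           OVER (PARTITION BY name, first_name) AS cnt_all
  FROM persons t
)
WHERE cnt_uk > 0
  AND cnt_all >= 2
ORDER BY id
"""


def ids_for(query, rows=ROWS):
    """Run one of the queries against a fresh in-memory copy of the rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE persons ("
        "id INTEGER PRIMARY KEY, name TEXT, first_name TEXT, country TEXT)"
    )
    conn.executemany("INSERT INTO persons VALUES (?, ?, ?, ?)", rows)
    ids = [row[0] for row in conn.execute(query)]
    conn.close()
    return ids


print(ids_for(UK_ONLY))           # [1, 2, 3, 6, 7, 9]
print(ids_for(UK_AND_DUPLICATE))  # [1, 2, 3, 6, 7]
```

With the extra row, the first variant also returns ID 9, while the second keeps only people who appear more than once.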

db<>fiddle here