Find duplicate rows but only for a unique column

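Using Oracle 12c. I am trying to identify duplicate rows that have a unique ref1_descr field. The count should be grouped by the first three columns (emplid, item_type, acad_year), and it should count each ref1_descr only once.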

For example, this result should not be picked up, because both rows have the same ref1_descr.

+-------------+--------------+-----------+------------+
|   EMPLID    |  ITEM_TYPE   | ACAD_YEAR | REF1_DESCR |
+-------------+--------------+-----------+------------+
| 00000010315 | 103201000000 |      2020 |    1938427 |
| 00000010315 | 103201000000 |      2020 |    1938427 |
+-------------+--------------+-----------+------------+

This one should be picked up, because the ref1_descr values are different (a duplicate exists):

+-------------+--------------+-----------+------------+
|   EMPLID    |  ITEM_TYPE   | ACAD_YEAR | REF1_DESCR |
+-------------+--------------+-----------+------------+
| 00000592537 | 104110123000 |      2020 |    1941668 |
| 00000592537 | 104110123000 |      2020 |    1941164 |
+-------------+--------------+-----------+------------+

The following query picks up both examples, but I need it to ignore the first one, because those rows share the same ref1_descr.

SELECT emplid, item_type, acad_year, COUNT(*)
FROM ps_item_sf
GROUP BY emplid, item_type, acad_year
HAVING COUNT(*) > 1

Edit

Apologies - I should have included an expected output in my original question.

SELECT emplid, item_type, acad_year, COUNT(*)
FROM ps_item_sf
GROUP BY emplid, item_type, acad_year
HAVING COUNT(*) > 1 AND
       MIN(REF1_DESCR) <> MAX(REF1_DESCR);

+-------------+--------------+-----------+------------+
|   EMPLID    |  ITEM_TYPE   | ACAD_YEAR | REF1_DESCR |
+-------------+--------------+-----------+------------+
| 00000027710 | 104300113000 |      2020 |    1956315 |
| 00000027710 | 104300113000 |      2020 |    1946006 |
| 00000027710 | 104300113000 |      2020 |    1946006 |
| 00000027710 | 104300113000 |      2020 |    1946006 |
+-------------+--------------+-----------+------------+

Result:

+-------------+--------------+-----------+----------+
|   EMPLID    |  ITEM_TYPE   | ACAD_YEAR | COUNT(*) |
+-------------+--------------+-----------+----------+
| 00000027710 | 104300113000 |      2020 |        4 |
+-------------+--------------+-----------+----------+

I was expecting it to return a count of 2.

I think you want an extra condition in the HAVING clause:

SELECT emplid, item_type, acad_year, COUNT(*)
FROM ps_item_sf
GROUP BY emplid, item_type, acad_year
HAVING COUNT(*) > 1 AND
       MIN(REF1_DESCR) <> MAX(REF1_DESCR);

Actually, if the descriptions differ, then there are at least two rows, so you can drop the `COUNT(*)` condition:

HAVING MIN(REF1_DESCR) <> MAX(REF1_DESCR);
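
To see how this condition treats the two sample groups from the question, here is a minimal sketch that can be run without an Oracle instance. It uses Python's built-in sqlite3 module as a stand-in for Oracle 12c, which is an assumption of this sketch; the column types are also guessed, since the thread does not show the table definition. The HAVING condition itself is the one given above.

```python
import sqlite3

# In-memory SQLite database standing in for Oracle (assumption of this sketch).
conn = sqlite3.connect(":memory:")

# Column types are guesses; the thread does not show the real table definition.
conn.execute(
    "CREATE TABLE ps_item_sf ("
    "emplid TEXT, item_type TEXT, acad_year INTEGER, ref1_descr TEXT)"
)

# The two sample groups from the question.
rows = [
    ("00000010315", "103201000000", 2020, "1938427"),  # same ref1_descr twice
    ("00000010315", "103201000000", 2020, "1938427"),
    ("00000592537", "104110123000", 2020, "1941668"),  # two different ref1_descr
    ("00000592537", "104110123000", 2020, "1941164"),
]
conn.executemany("INSERT INTO ps_item_sf VALUES (?, ?, ?, ?)", rows)

# MIN <> MAX is true only when a group holds at least two different values,
# so a separate COUNT(*) > 1 check is not needed.
query = """
    SELECT emplid, item_type, acad_year, COUNT(*)
    FROM ps_item_sf
    GROUP BY emplid, item_type, acad_year
    HAVING MIN(ref1_descr) <> MAX(ref1_descr)
"""
result = conn.execute(query).fetchall()
print(result)  # [('00000592537', '104110123000', 2020, 2)]
```

Only the second group comes back. The first group has MIN(ref1_descr) equal to MAX(ref1_descr), so the HAVING condition filters it out.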

Edit:

SELECT emplid, item_type, acad_year, COUNT(DISTINCT REF1_DESCR)
FROM ps_item_sf
GROUP BY emplid, item_type, acad_year
HAVING MIN(REF1_DESCR) <> MAX(REF1_DESCR);

This seems to be the simplest solution.

Is it about DISTINCT? See line 10:

SQL> with test (emplid, item_type, acad_year, ref1_descr) as
  2    (select 27710, 104300113000 , 2020, 1956315 from dual union all
  3     select 27710, 104300113000 , 2020, 1946006 from dual union all
  4     select 27710, 104300113000 , 2020, 1946006 from dual union all
  5     select 27710, 104300113000 , 2020, 1946006 from dual
  6    )
  7  select emplid,
  8         item_Type,
  9         acad_year,
 10         count(distinct ref1_descr) cnt      --> DISTINCT here?
 11  from test
 12  group by emplid, item_type, acad_year
 13  having count(*) > 1
 14    and min(ref1_descr) <> max(ref1_descr);

    EMPLID      ITEM_TYPE  ACAD_YEAR        CNT
---------- -------------- ---------- ----------
     27710   104300113000       2020          2

SQL>

One option is to use the count() analytic function with distinct ref1_descr, partitioned by the other three columns:

with t as
(
select count(distinct ref1_descr) over (partition by emplid, item_type, acad_year) as cnt,
       t.*
  from tab t
)
select emplid, item_type, acad_year, ref1_descr
  from t
 where cnt > 1;

in order to return only those two rows.
