Find duplicate rows but only for a unique column

Question

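Using Oracle 12c. I'm trying to identify duplicate rows that have unique ref1_descr values. The count should be grouped by the first 3 columns (emplid, item_type and acad_year), and it should count each ref1_descr only once.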

For example, this result should not be picked, because both rows have the same ref1_descr.

+-------------+--------------+-----------+------------+
|   EMPLID    |  ITEM_TYPE   | ACAD_YEAR | REF1_DESCR |
+-------------+--------------+-----------+------------+
| 00000010315 | 103201000000 |      2020 |    1938427 |
| 00000010315 | 103201000000 |      2020 |    1938427 |
+-------------+--------------+-----------+------------+

This should be picked, because the ref1_descr values are unique.

A duplicate exists:

+-------------+--------------+-----------+------------+
|   EMPLID    |  ITEM_TYPE   | ACAD_YEAR | REF1_DESCR |
+-------------+--------------+-----------+------------+
| 00000592537 | 104110123000 |      2020 |    1941668 |
| 00000592537 | 104110123000 |      2020 |    1941164 |
+-------------+--------------+-----------+------------+

The following query picks up both examples, but I need it to ignore the first one because those rows share a ref1_descr.

SELECT emplid, item_type, acad_year, COUNT(*)
FROM ps_item_sf
GROUP BY emplid, item_type, acad_year
HAVING COUNT(*) > 1

Edit

Apologies, I should have included the expected output in my original question.

I ran the suggested query with the extra condition in the having clause:

SELECT emplid, item_type, acad_year, COUNT(*)
FROM ps_item_sf
GROUP BY emplid, item_type, acad_year
HAVING COUNT(*) > 1 AND
       MIN(REF1_DESCR) <> MAX(REF1_DESCR);

+-------------+--------------+-----------+------------+
|   EMPLID    |  ITEM_TYPE   | ACAD_YEAR | REF1_DESCR |
+-------------+--------------+-----------+------------+
| 00000027710 | 104300113000 |      2020 |    1956315 |
| 00000027710 | 104300113000 |      2020 |    1946006 |
| 00000027710 | 104300113000 |      2020 |    1946006 |
| 00000027710 | 104300113000 |      2020 |    1946006 |
+-------------+--------------+-----------+------------+

Result:

+-------------+--------------+-----------+----------+
|   EMPLID    |  ITEM_TYPE   | ACAD_YEAR | COUNT(*) |
+-------------+--------------+-----------+----------+
| 00000027710 | 104300113000 |      2020 |        4 |
+-------------+--------------+-----------+----------+

I was expecting it to return a count of 2.

Answer 1

I think you want an extra condition in the having clause:

SELECT emplid, item_type, acad_year, COUNT(*)
FROM ps_item_sf
GROUP BY emplid, item_type, acad_year
HAVING COUNT(*) > 1 AND
       MIN(REF1_DESCR) <> MAX(REF1_DESCR);

In fact, if the descriptions differ, there must be at least two rows, so you can drop the COUNT(*) condition:

HAVING MIN(REF1_DESCR) <> MAX(REF1_DESCR);
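The effect of the MIN/MAX condition can be checked on the two sample groups from the question. The following is only a rough sketch: it uses Python's built-in sqlite3 module with an in-memory database as a stand-in for Oracle, and the column types are assumptions, since the real ps_item_sf definition is not shown.

```python
import sqlite3

# In-memory SQLite stands in for Oracle; column types are assumed.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ps_item_sf "
    "(emplid TEXT, item_type TEXT, acad_year INTEGER, ref1_descr TEXT)"
)
conn.executemany(
    "INSERT INTO ps_item_sf VALUES (?, ?, ?, ?)",
    [
        # Same ref1_descr twice: should NOT be picked.
        ("00000010315", "103201000000", 2020, "1938427"),
        ("00000010315", "103201000000", 2020, "1938427"),
        # Two different ref1_descr values: should be picked.
        ("00000592537", "104110123000", 2020, "1941668"),
        ("00000592537", "104110123000", 2020, "1941164"),
    ],
)

GROUPED = """
SELECT emplid, item_type, acad_year, COUNT(*)
FROM ps_item_sf
GROUP BY emplid, item_type, acad_year
HAVING {condition}
ORDER BY emplid
"""

# Original condition from the question: every group with more than one row.
count_only = conn.execute(GROUPED.format(condition="COUNT(*) > 1")).fetchall()

# MIN/MAX condition: only groups whose ref1_descr values are not all equal.
min_max = conn.execute(
    GROUPED.format(condition="MIN(ref1_descr) <> MAX(ref1_descr)")
).fetchall()

print(count_only)
print(min_max)
```

With this data, the COUNT(*) > 1 condition keeps both groups, while the MIN/MAX condition keeps only emplid 00000592537.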

Edit:

SELECT emplid, item_type, acad_year, COUNT(DISTINCT REF1_DESCR)
FROM ps_item_sf
GROUP BY emplid, item_type, acad_year
HAVING MIN(REF1_DESCR) <> MAX(REF1_DESCR);

This seems to be the simplest solution.
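To see why COUNT(*) reported 4 where 2 was expected, both counts can be put side by side on the data from the question's edit. This is a sketch under the same assumptions as any quick local check: SQLite (via Python's sqlite3) replaces Oracle, the column types are guessed, and the aliases row_cnt and distinct_cnt are made up for illustration.

```python
import sqlite3

# In-memory SQLite stands in for Oracle; column types are assumed.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ps_item_sf "
    "(emplid TEXT, item_type TEXT, acad_year INTEGER, ref1_descr TEXT)"
)
# The four rows from the question's edit: one value once, another three times.
conn.executemany(
    "INSERT INTO ps_item_sf VALUES (?, ?, ?, ?)",
    [
        ("00000027710", "104300113000", 2020, "1956315"),
        ("00000027710", "104300113000", 2020, "1946006"),
        ("00000027710", "104300113000", 2020, "1946006"),
        ("00000027710", "104300113000", 2020, "1946006"),
    ],
)

result = conn.execute(
    """
    SELECT emplid, item_type, acad_year,
           COUNT(*)                   AS row_cnt,      -- counts every row
           COUNT(DISTINCT ref1_descr) AS distinct_cnt  -- counts each value once
    FROM ps_item_sf
    GROUP BY emplid, item_type, acad_year
    HAVING MIN(ref1_descr) <> MAX(ref1_descr)
    """
).fetchall()

print(result)
```

Here row_cnt comes out as 4 and distinct_cnt as 2, which matches the count the asker expected.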

Answer 2

Is it about DISTINCT? See line 10:

SQL> with test (emplid, item_type, acad_year, ref1_descr) as
  2    (select 27710, 104300113000 , 2020, 1956315 from dual union all
  3     select 27710, 104300113000 , 2020, 1946006 from dual union all
  4     select 27710, 104300113000 , 2020, 1946006 from dual union all
  5     select 27710, 104300113000 , 2020, 1946006 from dual
  6    )
  7  select emplid,
  8         item_Type,
  9         acad_year,
 10         count(distinct ref1_descr) cnt      --> DISTINCT here?
 11  from test
 12  group by emplid, item_type, acad_year
 13  having count(*) > 1
 14    and min(ref1_descr) <> max(ref1_descr);

    EMPLID      ITEM_TYPE  ACAD_YEAR        CNT
---------- -------------- ---------- ----------
     27710   104300113000       2020          2

SQL>

Answer 3

One option is to use the COUNT() analytic function with DISTINCT ref1_descr, partitioned by the other three columns:

with t as
(
select count(distinct ref1_descr) over (partition by emplid,  item_Type, acad_year) as cnt,
       t.*
  from tab t
)  
select emplid, item_type, acad_year, ref1_descr
  from t
 where cnt > 1

This returns only those two rows.
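For a quick local check of the row-level result, here is a rough sketch using Python's sqlite3 module. Note the swap: SQLite, used here as a stand-in for Oracle, does not accept DISTINCT inside a window aggregate, so the sketch replaces the analytic COUNT with a join against a grouped subquery that should select the same rows. The table name tab follows the query above; the column types and the alias g are assumptions.

```python
import sqlite3

# In-memory SQLite stands in for Oracle; column types are assumed.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tab "
    "(emplid TEXT, item_type TEXT, acad_year INTEGER, ref1_descr TEXT)"
)
conn.executemany(
    "INSERT INTO tab VALUES (?, ?, ?, ?)",
    [
        ("00000010315", "103201000000", 2020, "1938427"),
        ("00000010315", "103201000000", 2020, "1938427"),
        ("00000592537", "104110123000", 2020, "1941668"),
        ("00000592537", "104110123000", 2020, "1941164"),
    ],
)

# The subquery g keeps the key combinations with more than one distinct
# ref1_descr; joining back to tab returns the individual rows of those groups.
rows = conn.execute(
    """
    SELECT t.emplid, t.item_type, t.acad_year, t.ref1_descr
    FROM tab t
    JOIN (
        SELECT emplid, item_type, acad_year
        FROM tab
        GROUP BY emplid, item_type, acad_year
        HAVING COUNT(DISTINCT ref1_descr) > 1
    ) g
      ON g.emplid = t.emplid
     AND g.item_type = t.item_type
     AND g.acad_year = t.acad_year
    ORDER BY t.ref1_descr
    """
).fetchall()

print(rows)
```

On the two sample groups from the question, only the two rows for emplid 00000592537 come back.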

