Find duplicate rows but only for a unique column
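Using Oracle 12c. I am trying to identify duplicate rows that have a unique ref1_descr field. The count should be grouped by the first 3 columns (emplid, item_type, and acad_year), and it should count each ref1_descr only once.
For example, this result should not be picked up, since both rows share the same ref1_descr.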
+-------------+--------------+-----------+------------+
| EMPLID | ITEM_TYPE | ACAD_YEAR | REF1_DESCR |
+-------------+--------------+-----------+------------+
| 00000010315 | 103201000000 | 2020 | 1938427 |
| 00000010315 | 103201000000 | 2020 | 1938427 |
+-------------+--------------+-----------+------------+
This should be picked up, since there are duplicates with unique ref1_descr values.
+-------------+--------------+-----------+------------+
| EMPLID | ITEM_TYPE | ACAD_YEAR | REF1_DESCR |
+-------------+--------------+-----------+------------+
| 00000592537 | 104110123000 | 2020 | 1941668 |
| 00000592537 | 104110123000 | 2020 | 1941164 |
+-------------+--------------+-----------+------------+
This picks up both examples, but I need it to ignore the first one, since those rows share a ref1_descr:
SELECT emplid, item_type, acad_year, COUNT(*)
FROM ps_item_sf
GROUP BY emplid, item_type, acad_year
HAVING COUNT(*) > 1
EDIT
Apologies - I should have included an expected output in my original question.
I think you want an extra condition in the having clause:
SELECT emplid, item_type, acad_year, COUNT(*)
FROM ps_item_sf
GROUP BY emplid, item_type, acad_year
HAVING COUNT(*) > 1 AND
MIN(REF1_DESCR) <> MAX(REF1_DESCR);
+-------------+--------------+-----------+------------+
| EMPLID | ITEM_TYPE | ACAD_YEAR | REF1_DESCR |
+-------------+--------------+-----------+------------+
| 00000027710 | 104300113000 | 2020 | 1956315 |
| 00000027710 | 104300113000 | 2020 | 1946006 |
| 00000027710 | 104300113000 | 2020 | 1946006 |
| 00000027710 | 104300113000 | 2020 | 1946006 |
+-------------+--------------+-----------+------------+
Results:
+-------------+--------------+-----------+----------+
| EMPLID | ITEM_TYPE | ACAD_YEAR | COUNT(*) |
+-------------+--------------+-----------+----------+
| 00000027710 | 104300113000 | 2020 | 4 |
+-------------+--------------+-----------+----------+
I was expecting it to return a count of 2.
I think you want an extra condition in the HAVING clause:
SELECT emplid, item_type, acad_year, COUNT(*)
FROM ps_item_sf
GROUP BY emplid, item_type, acad_year
HAVING COUNT(*) > 1 AND
MIN(REF1_DESCR) <> MAX(REF1_DESCR);
Actually, if the descriptions differ, then there are at least two rows, so you can drop the `COUNT(*)` condition:
HAVING MIN(REF1_DESCR) <> MAX(REF1_DESCR);
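To see the shortened HAVING clause in action, here is a minimal sketch that runs it against the two sample groups from the question. The CTE name sample_rows and the string column types are assumptions made for illustration; the real ps_item_sf columns may be typed differently.

```sql
-- Sketch only: inline rows stand in for ps_item_sf.
-- "sample_rows" is a made-up name, and the string types are assumed.
WITH sample_rows (emplid, item_type, acad_year, ref1_descr) AS (
  SELECT '00000010315', '103201000000', 2020, '1938427' FROM dual UNION ALL
  SELECT '00000010315', '103201000000', 2020, '1938427' FROM dual UNION ALL
  SELECT '00000592537', '104110123000', 2020, '1941668' FROM dual UNION ALL
  SELECT '00000592537', '104110123000', 2020, '1941164' FROM dual
)
SELECT emplid, item_type, acad_year,
       COUNT(*)        AS row_cnt,    -- all rows in the group
       MIN(ref1_descr) AS min_descr,
       MAX(ref1_descr) AS max_descr
FROM sample_rows
GROUP BY emplid, item_type, acad_year
-- MIN <> MAX can only be true when the group holds at least two
-- different ref1_descr values, so COUNT(*) > 1 is implied.
HAVING MIN(ref1_descr) <> MAX(ref1_descr);
```

Only the 00000592537 group comes back, with row_cnt = 2. One caveat: MIN and MAX skip NULLs, so a group whose ref1_descr values are one NULL and one non-NULL value would not be flagged by this test.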
EDIT:
SELECT emplid, item_type, acad_year, COUNT(DISTINCT REF1_DESCR)
FROM ps_item_sf
GROUP BY emplid, item_type, acad_year
HAVING MIN(REF1_DESCR) <> MAX(REF1_DESCR);
This seems like the simplest solution.
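If it reads more naturally, the filter can probably also be written as a test on the distinct count itself, which is equivalent to MIN <> MAX for non-NULL values. The sketch below assumes a made-up CTE named sample_rows that holds the three sample groups from the question, with string types chosen only for illustration.

```sql
-- Sketch only: inline rows stand in for ps_item_sf.
-- "sample_rows" is a made-up name, and the string types are assumed.
WITH sample_rows (emplid, item_type, acad_year, ref1_descr) AS (
  SELECT '00000010315', '103201000000', 2020, '1938427' FROM dual UNION ALL
  SELECT '00000010315', '103201000000', 2020, '1938427' FROM dual UNION ALL
  SELECT '00000592537', '104110123000', 2020, '1941668' FROM dual UNION ALL
  SELECT '00000592537', '104110123000', 2020, '1941164' FROM dual UNION ALL
  SELECT '00000027710', '104300113000', 2020, '1956315' FROM dual UNION ALL
  SELECT '00000027710', '104300113000', 2020, '1946006' FROM dual UNION ALL
  SELECT '00000027710', '104300113000', 2020, '1946006' FROM dual UNION ALL
  SELECT '00000027710', '104300113000', 2020, '1946006' FROM dual
)
SELECT emplid, item_type, acad_year,
       COUNT(DISTINCT ref1_descr) AS distinct_descr_cnt
FROM sample_rows
GROUP BY emplid, item_type, acad_year
-- Keep only groups with more than one distinct ref1_descr.
HAVING COUNT(DISTINCT ref1_descr) > 1
ORDER BY emplid;
```

With this data the 00000010315 group is filtered out, and both 00000027710 and 00000592537 are returned with a distinct count of 2.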
Is it about DISTINCT? See line 10:
SQL> with test (emplid, item_type, acad_year, ref1_descr) as
2 (select 27710, 104300113000 , 2020, 1956315 from dual union all
3 select 27710, 104300113000 , 2020, 1946006 from dual union all
4 select 27710, 104300113000 , 2020, 1946006 from dual union all
5 select 27710, 104300113000 , 2020, 1946006 from dual
6 )
7 select emplid,
8 item_Type,
9 acad_year,
10 count(distinct ref1_descr) cnt --> DISTINCT here?
11 from test
12 group by emplid, item_type, acad_year
13 having count(*) > 1
14 and min(ref1_descr) <> max(ref1_descr);
EMPLID ITEM_TYPE ACAD_YEAR CNT
---------- -------------- ---------- ----------
27710 104300113000 2020 2
SQL>
One option would be to use the count() analytic function with distinct ref1_descr, partitioned by the remaining three columns:
with t as
(
select count(distinct ref1_descr) over (partition by emplid, item_Type, acad_year) as cnt,
t.*
from tab t
)
select emplid, item_type, acad_year, ref1_descr
from t
where cnt > 1
This returns only those two rows.
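Note that the analytic version returns every row of a qualifying group, so a group with repeated ref1_descr values comes back with its repeats. If only one row per distinct ref1_descr is wanted, adding DISTINCT to the outer query is one possible approach. This is a sketch under assumptions: sample_rows and flagged are made-up names, and the inline rows are the four-row group from the question's edit.

```sql
-- Sketch only: inline rows stand in for the real table.
-- "sample_rows" and "flagged" are made-up names; string types are assumed.
WITH sample_rows (emplid, item_type, acad_year, ref1_descr) AS (
  SELECT '00000027710', '104300113000', 2020, '1956315' FROM dual UNION ALL
  SELECT '00000027710', '104300113000', 2020, '1946006' FROM dual UNION ALL
  SELECT '00000027710', '104300113000', 2020, '1946006' FROM dual UNION ALL
  SELECT '00000027710', '104300113000', 2020, '1946006' FROM dual
),
flagged AS (
  SELECT s.*,
         -- number of distinct ref1_descr values within each group
         COUNT(DISTINCT ref1_descr)
           OVER (PARTITION BY emplid, item_type, acad_year) AS cnt
  FROM sample_rows s
)
SELECT DISTINCT emplid, item_type, acad_year, ref1_descr
FROM flagged
WHERE cnt > 1
ORDER BY ref1_descr;
```

Without DISTINCT this group would produce four rows; with it, the result is two rows, one for 1946006 and one for 1956315.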