Unable to delete duplicate rows with PostgreSQL

Question

My query deletes the whole table instead of only the duplicate rows. Video as proof: https://streamable.com/3s843

create table customer_info (
    id INT,
    first_name VARCHAR(50),
    last_name VARCHAR(50),
    phone_number VARCHAR(50)
);
insert into customer_info (id, first_name, last_name, phone_number) values
(1, 'Kevin', 'Binley', '600-449-1059'),
(1, 'Kevin', 'Binley', '600-449-1059'),
(2, 'Skippy', 'Lam', '779-278-0889');

My query:

with t1 as (
select *, row_number() over(partition by id order by id) as rn
from customer_info)

delete
from customer_info 
where id in (select id from t1 where rn > 1);

Answer 1

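Your query deletes all rows from each set of duplicates, because every duplicate shares the same id that you select on (that is what @wildplasser hinted at in his subtle comment), and only rows that were unique to begin with survive. So if it "deletes the whole table", the table contained no unique rows at all.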

In your query, duplicates are defined by (id) alone, not by the whole row as your title suggests.
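To see this with the sample data from the question, it can help to run the pieces of the query separately. A minimal sketch, assuming the table still holds the three rows inserted above:

```sql
-- Step 1: what the CTE produces. Both rows with id = 1 fall into the same
-- partition, so one of them gets rn = 1 and the other gets rn = 2.
SELECT *, row_number() OVER (PARTITION BY id ORDER BY id) AS rn
FROM   customer_info;

-- Step 2: the subquery of the DELETE returns only the id of rows with rn > 1.
-- For the sample data that is the single value 1.
WITH t1 AS (
   SELECT *, row_number() OVER (PARTITION BY id ORDER BY id) AS rn
   FROM   customer_info
   )
SELECT id FROM t1 WHERE rn > 1;

-- Step 3: "WHERE id IN (1)" matches every row with id = 1, including the
-- copy that had rn = 1, so both copies would be deleted.
SELECT * FROM customer_info WHERE id IN (1);
```

The row number is never used to pick a single row; only the shared id reaches the DELETE, which is why the intended survivor is removed as well.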

Either way, there is a very simple solution:

DELETE FROM customer_info c
WHERE  EXISTS (
   SELECT FROM customer_info c1
   WHERE  c1.ctid < c.ctid  -- an identical copy with a smaller ctid exists
   AND    c1 = c            -- comparing whole rows
   );

Since you are dealing with completely identical rows, the only remaining way to tell them apart is the internal tuple ID ctid.

My query deletes every row for which an identical row with a smaller ctid exists. Hence, only the "first" row of each set of duplicates survives.
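Before running the DELETE, the same condition can be used in a SELECT to preview which rows would go. This is only a sketch reusing the table and aliases from above; the actual ctid values depend on how the rows are physically stored.

```sql
-- Show every row together with its internal tuple ID.
SELECT ctid, *
FROM   customer_info
ORDER  BY ctid;

-- Preview: rows that have an identical copy with a smaller ctid,
-- i.e. the rows the DELETE above would remove.
SELECT c.ctid, c.*
FROM   customer_info c
WHERE  EXISTS (
   SELECT FROM customer_info c1
   WHERE  c1.ctid < c.ctid
   AND    c1 = c
   );
```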

It is worth noting that NULL values compare equal in this case, which is most likely what you want. The manual:

The SQL specification requires row-wise comparison to return NULL if the result depends on comparing two NULL values or a NULL and a non-NULL. PostgreSQL does this only when comparing the results of two row constructors (as in Section 9.23.5) or comparing a row constructor to the output of a subquery (as in Section 9.22). In other contexts where two composite-type values are compared, two NULL field values are considered equal, [...]
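The difference described in the quote can be tried out on a scratch table. A minimal sketch; the table null_demo and its columns are hypothetical and not part of the question:

```sql
-- Hypothetical scratch table holding two identical rows with a NULL field.
CREATE TEMP TABLE null_demo (a int, b text);
INSERT INTO null_demo VALUES (1, NULL), (1, NULL);

-- Two row constructors: the outcome depends on comparing two NULLs,
-- so per the quoted passage the comparison itself yields NULL.
SELECT ROW(1, NULL::text) = ROW(1, NULL::text) AS row_constructor_cmp;

-- Two whole-row (composite-type) values: NULL fields are considered equal,
-- so the two rows count as duplicates of each other.
SELECT x = y AS composite_cmp
FROM   null_demo x
JOIN   null_demo y ON x.ctid < y.ctid;
```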

If duplicates are defined by id alone (as your query suggests), then this would work:

DELETE FROM customer_info c
WHERE  EXISTS (
   SELECT FROM customer_info c1
   WHERE  c1.ctid < c.ctid  -- another row with a smaller ctid ...
   AND    c1.id = c.id      -- ... and the same id exists
   );
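For comparison, the row_number() approach from the question can probably be made to work as well, provided the CTE hands a per-row identifier to the DELETE instead of the shared id. This is a sketch of that alternative, not the method of the answer; the alias row_tid is a made-up name.

```sql
WITH t1 AS (
   SELECT ctid AS row_tid  -- identifies one physical row, unlike id
        , row_number() OVER (PARTITION BY id ORDER BY ctid) AS rn
   FROM   customer_info
   )
DELETE FROM customer_info c
USING  t1
WHERE  c.ctid = t1.row_tid  -- match the exact row, not every row with that id
AND    t1.rn > 1;           -- keep the row with the smallest ctid per id
```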

But then there may be a better way to decide which rows to keep than ctid, which is a measure of last resort!

Obviously, you would then add a PRIMARY KEY to keep the original dilemma from coming back. For the second interpretation, id is the candidate.
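For the second interpretation, that step might look like the following sketch. It assumes the duplicates have already been removed and that id contains no NULL values; otherwise adding the constraint fails.

```sql
-- Sanity check: should return no rows once the cleanup is done.
SELECT id, count(*) AS copies
FROM   customer_info
GROUP  BY id
HAVING count(*) > 1;

-- Enforce uniqueness of id from now on.
ALTER TABLE customer_info ADD PRIMARY KEY (id);
```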

Related:

How do I (or can I) SELECT DISTINCT on multiple columns?

About ctid:

How do I decompose ctid into page and row numbers?

Answer 2

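You can't if the table does not have a key.

Tables have "keys" to uniquely identify each row. If your table has no key at all, you have no way to tell one row from another.

The only workaround I can think of for deleting the duplicate rows is to:

Add a key to the table.
Use the key to delete the excess rows.

For example: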

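-- Add a surrogate key column and fill it with unique values.
create sequence seq1;
alter table customer_info add column k1 int;
update customer_info set k1 = nextval('seq1');

-- Delete every row except the one with the smallest k1 in each group of identical rows.
delete from customer_info where k1 in (
  select k1
  from (
    select
      k1,
      row_number() over(partition by id, first_name, last_name, phone_number order by k1) as rn
    from customer_info
  ) x
  where rn > 1
);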

Now you have only two rows.
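A quick way to confirm the result is to group by the original columns and look for groups with more than one row. The sketch below also drops the helper objects again, which is only appropriate under the assumption that k1 was meant as a temporary aid and not as the permanent key.

```sql
-- Should return no rows once the duplicates are gone.
SELECT id, first_name, last_name, phone_number, count(*) AS copies
FROM   customer_info
GROUP  BY id, first_name, last_name, phone_number
HAVING count(*) > 1;

-- Optional cleanup (assumption: k1 is not kept as the table's key).
ALTER TABLE customer_info DROP COLUMN k1;
DROP SEQUENCE seq1;
```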
