PostgreSQL: Flagging and Identifying Duplicates
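I'm trying to find a way to flag duplicate cases, similar to this question. However, instead of counting the occurrences of duplicate values, I want to flag each row with 0 or 1, for duplicate and unique cases respectively. This is very similar to SPSS's Identify Duplicate Cases feature. For example, if I have a dataset like this: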
Name     State  Gender
John     TX     M
Katniss  DC     F
Noah     CA     M
Katniss  CA     F
John     SD     M
Ariel    FL     F
If I want to flag the rows with duplicate names, the output would look like this:
Name     State  Gender  Dup
John     TX     M       1
Katniss  DC     F       1
Noah     CA     M       1
Katniss  CA     F       0
John     SD     M       0
Ariel    FL     F       1
A bonus would be a query that also lets me control which case is kept as the unique one.
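SELECT name, state, gender
     , NOT EXISTS (              -- true when no earlier row has the same name and gender
         SELECT 1 FROM names nx
         WHERE nx.name = na.name
           AND nx.gender = na.gender
           AND nx.ctid < na.ctid -- ctid (physical row location) acts as the tie-breaker
       ) AS is_not_a_dup
FROM names na
;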
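Explanation: [NOT] EXISTS(...) yields a boolean, which can be cast to an integer. The cast needs an extra pair of parentheses around the whole expression, though: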
SELECT name, state, gender
     , (NOT EXISTS (
         SELECT 1 FROM names nx
         WHERE nx.name = na.name
           AND nx.gender = na.gender
           AND nx.ctid < na.ctid
       ))::integer AS is_not_a_dup
FROM names na
;
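The answer does not show the script that created the test table. To try the queries, a setup along the following lines might be used. This is only a sketch: the schema name `tmp` and the `text` column types are assumptions, while the table name `names`, its columns, and the six rows come from the question.

```sql
-- Hypothetical setup; the schema name "tmp" and the column types are assumptions.
DROP SCHEMA IF EXISTS tmp CASCADE;
CREATE SCHEMA tmp;
SET search_path = tmp;

CREATE TABLE names
        ( name   text NOT NULL
        , state  text NOT NULL
        , gender text NOT NULL
        );

-- The sample rows from the question, inserted in the order shown there.
INSERT INTO names (name, state, gender) VALUES
  ('John',    'TX', 'M')
, ('Katniss', 'DC', 'F')
, ('Noah',    'CA', 'M')
, ('Katniss', 'CA', 'F')
, ('John',    'SD', 'M')
, ('Ariel',   'FL', 'F')
;
```

Because the queries compare `ctid`, which row counts as the "first" one depends on where each row is physically stored. In a freshly created table filled by a single INSERT this normally follows insertion order, but it can change after updates or a table rewrite.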
Result:
DROP SCHEMA
CREATE SCHEMA
SET
CREATE TABLE
INSERT 0 6
  name   | state | gender | nodup
---------+-------+--------+-------
 John    | TX    | M      | t
 Katniss | DC    | F      | t
 Noah    | CA    | M      | t
 Katniss | CA    | F      | f
 John    | SD    | M      | f
 Ariel   | FL    | F      | t
(6 rows)

  name   | state | gender | nodup
---------+-------+--------+-------
 John    | TX    | M      | 1
 Katniss | DC    | F      | 1
 Noah    | CA    | M      | 1
 Katniss | CA    | F      | 0
 John    | SD    | M      | 0
 Ariel   | FL    | F      | 1
(6 rows)
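For the bonus part of the question (controlling which row is kept as the unique one), one possible alternative is a window function with an explicit ordering instead of `ctid`. The sketch below is not from the original answer. It partitions by `name` only, as the question asks, and the rule "keep the row with the alphabetically last state" is purely an illustrative assumption; any other ordering column, such as a timestamp, could be substituted.

```sql
-- Sketch: flag the first row of each name group as unique (1), the rest as duplicates (0).
-- The ORDER BY inside the window decides which row is "first"; state DESC is an assumed rule.
SELECT name, state, gender
     , (row_number() OVER (PARTITION BY name ORDER BY state DESC) = 1)::integer AS is_not_a_dup
FROM names
;
```

With the sample data this rule happens to keep John/TX and Katniss/DC, the same rows the question marks with 1. If two rows in a group tie on the ordering column, the choice between them is arbitrary, so adding a second ordering column makes the result deterministic.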