Self Join to find duplicates but including all columns
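I want to match any entries in the logs table that share the same day and cause and appear more than once in the table. The query I wrote already finds the duplicates; my problem is that I need access to all columns of the table in the result so I can use them in a later JOIN. The table looks like this: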
| ID | DATE | CAUSE | USER | ... |
|--------------------------------------|
| x | 2017-01-01 | aaa | 100 | ... |
| x | 2017-01-02 | aaa | 101 | ... |
| x | 2017-01-03 | bbb | 101 | ... |
| x | 2017-01-03 | bbb | 101 | ... |
| x | 2017-01-04 | ccc | 101 | ... |
| x | 2017-01-04 | ccc | 101 | ... |
| x | 2017-01-04 | ccc | 101 | ... |
| x | 2017-01-05 | aaa | 101 | ... |
| .....................................|
| .....................................|
| .....................................|
Query:
SELECT logs.* FROM
(SELECT day, cause FROM logs
GROUP BY day, cause HAVING COUNT(*) > 1) AS logsTwice, logs
WHERE logsTwice.day = logs.day AND logsTwice.cause = logs.cause
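The subselect returns exactly the right data (day and cause), but when I try to fetch the other columns for those matches, I get completely wrong data. What am I doing wrong?
Try this: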
SELECT logs.* FROM logs
inner join
(SELECT day, cause FROM logs GROUP BY day, cause HAVING COUNT(*) > 1) logsTwice
on logsTwice.day = logs.day AND logsTwice.cause = logs.cause
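To make the join above easy to try out, here is a minimal sketch that runs it against an in-memory SQLite database from Python. The ids 1 to 8 are invented for illustration (the question only shows "x"), the table layout is assumed from the sample rows, and the ORDER BY is added only to make the output deterministic.

```python
import sqlite3

# Sample rows modelled on the table in the question.
# The ids are hypothetical; the question only shows "x".
ROWS = [
    (1, "2017-01-01", "aaa", 100),
    (2, "2017-01-02", "aaa", 101),
    (3, "2017-01-03", "bbb", 101),
    (4, "2017-01-03", "bbb", 101),
    (5, "2017-01-04", "ccc", 101),
    (6, "2017-01-04", "ccc", 101),
    (7, "2017-01-04", "ccc", 101),
    (8, "2017-01-05", "aaa", 101),
]

# The derived table finds the (day, cause) pairs that occur more than once;
# joining it back to logs returns every column of the matching rows.
JOIN_QUERY = """
SELECT logs.*
FROM logs
INNER JOIN (SELECT day, cause
            FROM logs
            GROUP BY day, cause
            HAVING COUNT(*) > 1) AS logsTwice
        ON logsTwice.day = logs.day AND logsTwice.cause = logs.cause
ORDER BY logs.id
"""


def duplicates_via_join():
    """Load the sample rows and return the full rows of all duplicates."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        'CREATE TABLE logs (id INTEGER PRIMARY KEY, day TEXT, cause TEXT, "user" INTEGER)'
    )
    conn.executemany("INSERT INTO logs VALUES (?, ?, ?, ?)", ROWS)
    rows = conn.execute(JOIN_QUERY).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in duplicates_via_join():
        print(row)
```

On the sample data this returns the two "bbb" rows and the three "ccc" rows, each with all four columns.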
You can simply use a window function:
SELECT l.*
FROM (SELECT l.*,
COUNT(*) OVER (PARTITION BY day, cause) as cnt
FROM logs l
) l
WHERE cnt > 1;
In general, window functions tend to perform better than equivalent queries that use JOIN and GROUP BY.
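As a rough illustration of the window-function approach, the sketch below runs it on the sample rows in an in-memory SQLite database. It assumes SQLite 3.25 or newer (the first version with window functions), and the ids are again invented for illustration. Note that the outer `l.*` refers to the subquery, so the result carries the extra `cnt` column along with the original columns.

```python
import sqlite3

# Sample rows modelled on the table in the question (ids are hypothetical).
ROWS = [
    (1, "2017-01-01", "aaa", 100),
    (2, "2017-01-02", "aaa", 101),
    (3, "2017-01-03", "bbb", 101),
    (4, "2017-01-03", "bbb", 101),
    (5, "2017-01-04", "ccc", 101),
    (6, "2017-01-04", "ccc", 101),
    (7, "2017-01-04", "ccc", 101),
    (8, "2017-01-05", "aaa", 101),
]

# COUNT(*) OVER (PARTITION BY ...) attaches the group size to every row,
# so no join back to the table is needed. ORDER BY is added for stable output.
WINDOW_QUERY = """
SELECT l.*
FROM (SELECT l.*,
             COUNT(*) OVER (PARTITION BY day, cause) AS cnt
      FROM logs l
     ) l
WHERE cnt > 1
ORDER BY id
"""


def duplicates_via_window():
    """Return duplicate rows plus a trailing cnt column (group size)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        'CREATE TABLE logs (id INTEGER PRIMARY KEY, day TEXT, cause TEXT, "user" INTEGER)'
    )
    conn.executemany("INSERT INTO logs VALUES (?, ?, ?, ?)", ROWS)
    rows = conn.execute(WINDOW_QUERY).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in duplicates_via_window():
        print(row)
```

If the `cnt` column is unwanted in a later JOIN, list the needed columns explicitly in the outer SELECT instead of using `l.*`.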
You could also try a self join:
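-- DISTINCT: a row with several duplicates would otherwise be returned once per partner
SELECT DISTINCT l1.*
FROM logs l1
INNER JOIN logs l2
ON (l1.id <> l2.id          -- a different row (assumes id is unique per row)
AND l1.day = l2.day
AND l1.cause = l2.cause);   -- no condition on user: duplicates may share the same user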
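Two details of the self-join approach are easy to miss, so here is a small sketch that compares three variants on the sample rows. It assumes `id` is unique per row (the ids below are invented, since the question only shows "x"). A plain self join returns a row once per matching partner, DISTINCT collapses those repeats, and an extra `user <> user` condition filters out duplicates logged by the same user.

```python
import sqlite3

# Sample rows modelled on the table in the question (ids are hypothetical).
ROWS = [
    (1, "2017-01-01", "aaa", 100),
    (2, "2017-01-02", "aaa", 101),
    (3, "2017-01-03", "bbb", 101),
    (4, "2017-01-03", "bbb", 101),
    (5, "2017-01-04", "ccc", 101),
    (6, "2017-01-04", "ccc", 101),
    (7, "2017-01-04", "ccc", 101),
    (8, "2017-01-05", "aaa", 101),
]

# Shared join: another row (different id) with the same day and cause.
SELF_JOIN = """
FROM logs l1
INNER JOIN logs l2
        ON l1.id <> l2.id
       AND l1.day = l2.day
       AND l1.cause = l2.cause
"""

PLAIN = "SELECT l1.* " + SELF_JOIN + " ORDER BY l1.id"
DISTINCT = "SELECT DISTINCT l1.* " + SELF_JOIN + " ORDER BY l1.id"
# Variant that additionally requires the two rows to come from different users.
DIFFERENT_USER = (
    "SELECT DISTINCT l1.* " + SELF_JOIN + ' AND l1."user" <> l2."user" ORDER BY l1.id'
)


def run(query):
    """Load the sample rows into a fresh database and return the matching ids."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        'CREATE TABLE logs (id INTEGER PRIMARY KEY, day TEXT, cause TEXT, "user" INTEGER)'
    )
    conn.executemany("INSERT INTO logs VALUES (?, ?, ?, ?)", ROWS)
    ids = [row[0] for row in conn.execute(query)]
    conn.close()
    return ids


if __name__ == "__main__":
    print("plain:", run(PLAIN))
    print("distinct:", run(DISTINCT))
    print("different user:", run(DIFFERENT_USER))
```

On these rows the plain join repeats each "ccc" row because it has two partners, and the different-user variant returns nothing because every duplicate pair in the sample shares user 101.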