How to group records by last duplicates in MySQL?
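I have a table that stores information about user logins. I want to group the last duplicate records, that is, consecutive records that are identical. For example: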
+----+------------+-------------+-------------+------------------+
| id | ip         | platform    | browser     | date             |
+----+------------+-------------+-------------+------------------+
| 1  | 127.0.0.1  | Windows     | Chrome      | 2018-01-01 00:00 |
| 2  | 127.0.0.1  | Windows     | Chrome      | 2018-01-02 00:00 |
| 3  | 10.0.0.1   | Linux       | Firefox     | 2018-01-03 00:00 |
| 4  | 127.0.0.1  | Windows     | Chrome      | 2018-01-04 00:00 |
+----+------------+-------------+-------------+------------------+
Would output:
+-----+------------+-------------+-------------+-------------+
| ids | ip         | platform    | browser     | num_records |
+-----+------------+-------------+-------------+-------------+
| 1-2 | 127.0.0.1  | Windows     | Chrome      | 2           |
| 3   | 10.0.0.1   | Linux       | Firefox     | 1           |
| 4   | 127.0.0.1  | Windows     | Chrome      | 1           |
+-----+------------+-------------+-------------+-------------+
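(I omitted the date from the output for simplicity; there should be a date range there, like the id range.)
Notice that ids 1, 2 and 4 have identical values, but 1, 2 and 4 are grouped separately because of the timeline: another record (id 3) sits between them.
To find duplicates, I should consider the following columns: ip, platform, browser. If a record differs in any of these columns, it is not a duplicate.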
I can do:
SELECT ip, platform, browser, COUNT(1) AS num_records
FROM users_logins
WHERE user_id = 1
GROUP BY ip, platform, browser
But this groups all matching records together without considering the timeline.
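This is a gaps-and-islands problem. In MySQL 8+, you can use the difference of two row numbers: within a run of consecutive identical records the difference stays constant, so it can serve as a grouping key.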
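-- t stands for the login table (users_logins in the question).
select ip, platform, browser,
       count(*) as num_records,
       min(id) as first_id, max(id) as last_id,
       min(date) as first_date, max(date) as last_date
from (select t.*,
             -- position of the row in the overall timeline
             row_number() over (order by date) as seqnum,
             -- position of the row among rows with the same ip, platform, browser
             row_number() over (partition by ip, platform, browser order by date) as seqnum_2
      from t
     ) t
-- the difference is constant within one uninterrupted run of identical rows
group by ip, platform, browser, (seqnum - seqnum_2)
order by min(date) desc;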
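To check the idea against the sample data from the question, here is a minimal sketch that runs the same row-number-difference query from Python. It rests on several assumptions that are not in the original answer: an in-memory SQLite database (3.25 or later, for window functions) stands in for MySQL 8, the table is named users_logins with a user_id column as in the question's own query, every sample row is given user_id = 1, id is added as a tie-breaker in the window ordering, and the result is sorted ascending to match the expected output. The names ROWS, ISLANDS_SQL and find_islands are made up for illustration.

```python
import sqlite3

# Sample rows from the question: (id, user_id, ip, platform, browser, date).
# user_id = 1 for every row is an assumption made for this sketch.
ROWS = [
    (1, 1, "127.0.0.1", "Windows", "Chrome", "2018-01-01 00:00"),
    (2, 1, "127.0.0.1", "Windows", "Chrome", "2018-01-02 00:00"),
    (3, 1, "10.0.0.1", "Linux", "Firefox", "2018-01-03 00:00"),
    (4, 1, "127.0.0.1", "Windows", "Chrome", "2018-01-04 00:00"),
]

# Same technique as the query above, restricted to one user.
# "?" is the SQLite placeholder; MySQL drivers typically use "%s".
ISLANDS_SQL = """
SELECT MIN(id) AS first_id, MAX(id) AS last_id,
       ip, platform, browser,
       COUNT(*) AS num_records
FROM (SELECT l.*,
             ROW_NUMBER() OVER (ORDER BY date, id) AS seqnum,
             ROW_NUMBER() OVER (PARTITION BY ip, platform, browser
                                ORDER BY date, id) AS seqnum_2
      FROM users_logins l
      WHERE user_id = ?
     ) s
GROUP BY ip, platform, browser, (seqnum - seqnum_2)
ORDER BY MIN(date)
"""


def find_islands(rows, user_id=1):
    """Load rows into a throwaway table and return one tuple per island."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE users_logins ("
        "id INTEGER PRIMARY KEY, user_id INTEGER, "
        "ip TEXT, platform TEXT, browser TEXT, date TEXT)"
    )
    conn.executemany("INSERT INTO users_logins VALUES (?, ?, ?, ?, ?, ?)", rows)
    result = conn.execute(ISLANDS_SQL, (user_id,)).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for island in find_islands(ROWS):
        print(island)
```

For the sample rows, seqnum - seqnum_2 is 0 for ids 1 and 2, 2 for id 3, and 1 for id 4, so the query yields three groups: ids 1-2, id 3, and id 4. Filtering by user_id inside the subquery matters, because the row numbers must be computed over that user's timeline only.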