MySQL: Exclude duplicate scan logs within the same day
I'm trying to select rows while excluding duplicates within the same day.
The criteria for a duplicate are: SAME USER
AND SAME PRODUCT_UPC
AND SAME DATE(SCANNED_ON)
So, from the table below, if SCAN_ID = 100 is selected, SCAN_ID = 101 should be excluded, because both rows have the same user_id AND the same product_upc AND the same DATE(scanned_on).
Here is the table structure:
SCAN_ID | USER_ID | PRODUCT_UPC | SCANNED_ON
------: | ------: | ----------: | :------------------
    100 |       1 |  0767914767 | 2020-08-01 03:49:11
    101 |       1 |  0767914767 | 2020-08-01 03:58:28
    102 |       2 |  0064432050 | 2020-08-02 04:01:31
    103 |       3 |  0804169977 | 2020-08-10 04:08:48
    104 |       4 |  0875523846 | 2020-08-10 05:21:32
    105 |       4 |  0007850492 | 2020-08-12 07:10:05
The query I have come up with so far is:
SET @last_user='', @last_upc='', @last_date='';
SELECT *,
@last_user as last_user , @last_user:=user_id as this_user,
@last_upc as last_upc , @last_upc:=product_upc as this_upc,
@last_date as last_date , @last_date:=DATE(scanned_on) as this_date
FROM scansv2
HAVING this_user != last_user OR this_upc != last_upc OR this_date != last_date
In MySQL 8 you can use ROW_NUMBER():
CREATE TABLE scansv2 (
`SCAN_ID` INTEGER,
`USER_ID` INTEGER,
`PRODUCT_UPC` INTEGER, -- note: INTEGER drops the UPCs' leading zeros; use VARCHAR to preserve them
`SCANNED_ON` DATETIME
);
INSERT INTO scansv2
(`SCAN_ID`, `USER_ID`, `PRODUCT_UPC`, `SCANNED_ON`)
VALUES
('100', '1', '0767914767', '2020-08-01 03:49:11'),
('101', '1', '0767914767', '2020-08-01 03:58:28'),
('102', '2', '0064432050', '2020-08-02 04:01:31'),
('103', '3', '0804169977', '2020-08-10 04:08:48'),
('104', '4', '0875523846', '2020-08-10 05:21:32'),
('105', '4', '0007850492', '2020-08-12 07:10:05');
WITH rownum AS (SELECT `SCAN_ID`, `USER_ID`, `PRODUCT_UPC`, `SCANNED_ON`, ROW_NUMBER() OVER (
  -- partition by the full duplicate key: user, product, and day
  PARTITION BY `USER_ID`, `PRODUCT_UPC`, DATE(`SCANNED_ON`)
  ORDER BY `SCANNED_ON` DESC) row_num FROM scansv2)
SELECT `SCAN_ID`, `USER_ID`, `PRODUCT_UPC`, `SCANNED_ON` FROM rownum WHERE row_num = 1 ORDER BY `SCAN_ID`
SCAN_ID | USER_ID | PRODUCT_UPC | SCANNED_ON
------: | ------: | ----------: | :------------------
101 | 1 | 767914767 | 2020-08-01 03:58:28
102 | 2 | 64432050 | 2020-08-02 04:01:31
103 | 3 | 804169977 | 2020-08-10 04:08:48
104 | 4 | 875523846 | 2020-08-10 05:21:32
105 | 4 | 7850492 | 2020-08-12 07:10:05
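The window-function approach can be sanity-checked without a MySQL server. A minimal sketch using Python's built-in `sqlite3` module (SQLite 3.25+ also supports `ROW_NUMBER()`); the table name and sample data mirror the answer's setup, with the UPC stored as TEXT to keep the leading zeros:

```python
import sqlite3

# Sample data from the answer; UPCs kept as TEXT to preserve leading zeros.
rows = [
    (100, 1, "0767914767", "2020-08-01 03:49:11"),
    (101, 1, "0767914767", "2020-08-01 03:58:28"),
    (102, 2, "0064432050", "2020-08-02 04:01:31"),
    (103, 3, "0804169977", "2020-08-10 04:08:48"),
    (104, 4, "0875523846", "2020-08-10 05:21:32"),
    (105, 4, "0007850492", "2020-08-12 07:10:05"),
]
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE scansv2 (scan_id INTEGER, user_id INTEGER,"
            " product_upc TEXT, scanned_on TEXT)")
con.executemany("INSERT INTO scansv2 VALUES (?, ?, ?, ?)", rows)

# Same idea as the MySQL 8 query: one row per (user, UPC, day),
# keeping the latest scan of the day (ORDER BY scanned_on DESC).
kept = [r[0] for r in con.execute("""
    WITH rownum AS (
        SELECT scan_id, ROW_NUMBER() OVER (
            PARTITION BY user_id, product_upc, date(scanned_on)
            ORDER BY scanned_on DESC) AS row_num
        FROM scansv2)
    SELECT scan_id FROM rownum WHERE row_num = 1 ORDER BY scan_id""")]
print(kept)  # [101, 102, 103, 104, 105] -> 100 deduplicated away
```

SQLite's `date()` plays the role of MySQL's `DATE()` here; everything else carries over unchanged.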
In MySQL 5.x you need user-defined variables for the same purpose:
SELECT `SCAN_ID`, `USER_ID`, `PRODUCT_UPC`, `SCANNED_ON`
FROM
(SELECT `SCAN_ID`, `USER_ID`, `SCANNED_ON`,
  -- compare against the full duplicate key, not just the UPC
  IF(@user = `USER_ID` AND @product = `PRODUCT_UPC` AND @day = DATE(`SCANNED_ON`),
     @row_num := @row_num + 1, @row_num := 1) row_num,
  @user := `USER_ID` dummy_user, @day := DATE(`SCANNED_ON`) dummy_day,
  @product := `PRODUCT_UPC` PRODUCT_UPC
 FROM (SELECT * FROM scansv2 ORDER BY `USER_ID`, `PRODUCT_UPC`, `SCANNED_ON`) c,
      (SELECT @row_num := 0, @product := 0, @user := 0, @day := '') a) b
WHERE row_num = 1 ORDER BY `SCAN_ID`
SCAN_ID | USER_ID | PRODUCT_UPC | SCANNED_ON
------: | ------: | ----------: | :------------------
100 | 1 | 767914767 | 2020-08-01 03:49:11
102 | 2 | 64432050 | 2020-08-02 04:01:31
103 | 3 | 804169977 | 2020-08-10 04:08:48
104 | 4 | 875523846 | 2020-08-10 05:21:32
105 | 4 | 7850492 | 2020-08-12 07:10:05
In most databases (including MySQL versions before 8.0), filtering with a correlated subquery is a well-supported and efficient option, especially with an index on (user_id, product_upc, scanned_on):
select s.*
from scansv2 s
where s.scanned_on = (
select min(s1.scanned_on)
from scansv2 s1
where
s1.user_id = s.user_id
and s1.product_upc = s.product_upc
and s1.scanned_on >= date(s.scanned_on)
and s1.scanned_on < date(s.scanned_on) + interval 1 day
)
This gives you the first row per user_id, product_upc, and day, and filters out the other rows (if any).
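One caveat worth noting: `min(scanned_on)` returns both rows if two scans of the same user and UPC share an identical timestamp on the same day. A hedged variant (not from the original answer) that breaks ties on `scan_id`, checked here with Python's `sqlite3` in place of MySQL, where `date()` equality stands in for the half-open interval range:

```python
import sqlite3

# Sample data from the question, UPCs as TEXT to preserve leading zeros.
rows = [
    (100, 1, "0767914767", "2020-08-01 03:49:11"),
    (101, 1, "0767914767", "2020-08-01 03:58:28"),
    (102, 2, "0064432050", "2020-08-02 04:01:31"),
    (103, 3, "0804169977", "2020-08-10 04:08:48"),
    (104, 4, "0875523846", "2020-08-10 05:21:32"),
    (105, 4, "0007850492", "2020-08-12 07:10:05"),
]
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE scansv2 (scan_id INTEGER, user_id INTEGER,"
            " product_upc TEXT, scanned_on TEXT)")
con.executemany("INSERT INTO scansv2 VALUES (?, ?, ?, ?)", rows)

# Keep exactly one row per (user, UPC, day): the earliest scan, with
# scan_id as the tie-breaker for identical timestamps.
kept = [r[0] for r in con.execute("""
    SELECT s.scan_id FROM scansv2 s
    WHERE s.scan_id = (
        SELECT s1.scan_id FROM scansv2 s1
        WHERE s1.user_id = s.user_id
          AND s1.product_upc = s.product_upc
          AND date(s1.scanned_on) = date(s.scanned_on)
        ORDER BY s1.scanned_on, s1.scan_id LIMIT 1)
    ORDER BY s.scan_id""")]
print(kept)  # [100, 102, 103, 104, 105] -> 101 deduplicated away
```

In MySQL the inner `ORDER BY ... LIMIT 1` works the same way; keeping the original half-open range (`>= date ... < date + interval 1 day`) instead of `date()` equality lets an index on scanned_on be used.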