Delete only consecutive duplicate rows

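I collected data from an API to build up a history. Initially I saved all values every five minutes. Later I changed my program to save only data that had changed.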

Now I want to clean up my old data and delete every row whose count has not changed compared with the previous record for the same account and id.

account id      count   time
42      12147   492     2015-09-20 11:31:14.0
42      12147   492     2015-09-20 11:36:19.0 // delete
13      12147   246     2015-09-20 11:31:14.0
2       12253   183     2015-09-20 11:36:19.0
2       19684   805     2015-09-20 12:00:41.0 // note in next comment
2       19684   810     2015-09-20 12:05:41.0
2       19684   805     2015-09-20 12:10:41.0 // we had this combination, but don't delete this record because the previous value was different
2       19684   805     2015-09-20 12:15:41.0 // delete
2       19684   805     2015-09-20 12:20:41.0 // delete
2       19684   806     2015-09-20 12:25:41.0

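I tried to solve this with a GROUP BY on account, id and count. However, that approach also deletes non-consecutive duplicates: if a record has the same value again after some time, it falls into the same group.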

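I also thought about writing a small script that iterates over all the data and deletes the current row if account, id and count are the same as in the previous record, but I'm curious whether this can be done with a single SQL statement.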
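The script described above could look roughly like the following Python sketch. It assumes the rows have already been fetched into memory and carry a hypothetical surrogate key `pk` (the table in the question has none), and it only works out which rows would be deleted; issuing the actual DELETE is left out.

```python
# Sample rows from the question as (pk, account, id, count, time).
# pk is a hypothetical surrogate key; the question's table has none.
ROWS = [
    (1, 42, 12147, 492, "2015-09-20 11:31:14"),
    (2, 42, 12147, 492, "2015-09-20 11:36:19"),
    (3, 13, 12147, 246, "2015-09-20 11:31:14"),
    (4, 2, 12253, 183, "2015-09-20 11:36:19"),
    (5, 2, 19684, 805, "2015-09-20 12:00:41"),
    (6, 2, 19684, 810, "2015-09-20 12:05:41"),
    (7, 2, 19684, 805, "2015-09-20 12:10:41"),
    (8, 2, 19684, 805, "2015-09-20 12:15:41"),
    (9, 2, 19684, 805, "2015-09-20 12:20:41"),
    (10, 2, 19684, 806, "2015-09-20 12:25:41"),
]


def consecutive_duplicates(rows):
    """Return the pks of rows whose count equals the count of the previous
    row (in time order) for the same (account, id)."""
    last_count = {}  # (account, id) -> count of the most recent row seen
    to_delete = []
    # Visit each (account, id) series in time order; the
    # "YYYY-MM-DD HH:MM:SS" strings sort chronologically as text.
    for pk, account, id_, count, _ in sorted(rows, key=lambda r: (r[1], r[2], r[4])):
        key = (account, id_)
        if key in last_count and last_count[key] == count:
            to_delete.append(pk)
        last_count[key] = count
    return sorted(to_delete)


print(consecutive_duplicates(ROWS))  # → [2, 8, 9]
```

Only the count of the last row per (account, id) needs to be remembered, which is why a dictionary is enough. The 12:10:41 row (pk 7) is kept because its predecessor had count 810.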

You can delete everything except the first occurrence with the following (untested) code:

-- delete a row if an earlier row with the same account, id and count exists
delete from history h1
where exists (select 1
              from history h2
              where
                h1.account = h2.account and
                h1.id = h2.id and
                h1.count = h2.count and
                h1.time > h2.time
             )
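A quick way to check this query is an in-memory SQLite table loaded with the question's sample rows plus a hypothetical `pk` column. The sketch refers to the outer table by its name `history` instead of the alias `h1`. Note that the query removes every later repeat of an (account, id, count) combination, not only consecutive ones, so the 12:10:41 row (pk 7 here), which the question wants to keep, is deleted too.

```python
import sqlite3

# Sample rows from the question as (pk, account, id, count, time);
# pk is a hypothetical primary key added for this demo.
ROWS = [
    (1, 42, 12147, 492, "2015-09-20 11:31:14"),
    (2, 42, 12147, 492, "2015-09-20 11:36:19"),
    (3, 13, 12147, 246, "2015-09-20 11:31:14"),
    (4, 2, 12253, 183, "2015-09-20 11:36:19"),
    (5, 2, 19684, 805, "2015-09-20 12:00:41"),
    (6, 2, 19684, 810, "2015-09-20 12:05:41"),
    (7, 2, 19684, 805, "2015-09-20 12:10:41"),
    (8, 2, 19684, 805, "2015-09-20 12:15:41"),
    (9, 2, 19684, 805, "2015-09-20 12:20:41"),
    (10, 2, 19684, 806, "2015-09-20 12:25:41"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE history (pk INTEGER PRIMARY KEY, account INTEGER, "
    "id INTEGER, count INTEGER, time TEXT)"
)
conn.executemany("INSERT INTO history VALUES (?, ?, ?, ?, ?)", ROWS)

# Keep-first EXISTS delete: drop a row if any earlier row has the same
# account, id and count, whether or not it is the immediately preceding one.
conn.execute("""
    DELETE FROM history
    WHERE EXISTS (SELECT 1
                  FROM history AS h2
                  WHERE h2.account = history.account
                    AND h2.id = history.id
                    AND h2.count = history.count
                    AND h2.time < history.time)
""")
remaining = [pk for (pk,) in conn.execute("SELECT pk FROM history ORDER BY pk")]
print(remaining)  # → [1, 3, 4, 5, 6, 10]
```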

You can use the following query:

-- delete every row that is later than the earliest row
-- with the same account, id and count
DELETE history
FROM history
INNER JOIN (SELECT MIN(time) AS minTime, account, id, count
            FROM history
            GROUP BY account, id, count) AS h
ON history.account = h.account AND history.id = h.id AND history.count = h.count
WHERE history.time > h.minTime


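Edit:

Following the OP's edit, I think the sample data still contains some errors (the time field should be in ascending order).

With the additional assumption that the table has a primary key (pk), you can use the following query: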

-- rows whose account, id and count equal those of the
-- immediately preceding row (by time) for the same account and id
SELECT pk
FROM history AS h1
WHERE account = (SELECT account
                 FROM history AS h2
                 WHERE h1.account = h2.account AND
                       h1.id = h2.id AND
                       h2.time < h1.time
                 ORDER BY time DESC
                 LIMIT 1)
      AND
      id = (SELECT id
            FROM history AS h2
            WHERE h1.account = h2.account AND
                  h1.id = h2.id AND
                  h2.time < h1.time
            ORDER BY time DESC
            LIMIT 1)
      AND
      count = (SELECT count
               FROM history AS h2
               WHERE h1.account = h2.account AND
                     h1.id = h2.id AND
                     h2.time < h1.time
               ORDER BY time DESC
               LIMIT 1)

This identifies the records to be deleted.
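The identification step can be tried against SQLite with the sample rows; the `pk` values 1 to 10 are assigned here for illustration. This sketch keeps only the `count` subquery. The `account` and `id` comparisons in the query above should always succeed whenever a previous row exists, because every subquery already filters on `h1.account = h2.account AND h1.id = h2.id`, so dropping them is not expected to change the result.

```python
import sqlite3

# Sample rows from the question as (pk, account, id, count, time);
# pk is a hypothetical primary key added for this demo.
ROWS = [
    (1, 42, 12147, 492, "2015-09-20 11:31:14"),
    (2, 42, 12147, 492, "2015-09-20 11:36:19"),
    (3, 13, 12147, 246, "2015-09-20 11:31:14"),
    (4, 2, 12253, 183, "2015-09-20 11:36:19"),
    (5, 2, 19684, 805, "2015-09-20 12:00:41"),
    (6, 2, 19684, 810, "2015-09-20 12:05:41"),
    (7, 2, 19684, 805, "2015-09-20 12:10:41"),
    (8, 2, 19684, 805, "2015-09-20 12:15:41"),
    (9, 2, 19684, 805, "2015-09-20 12:20:41"),
    (10, 2, 19684, 806, "2015-09-20 12:25:41"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE history (pk INTEGER PRIMARY KEY, account INTEGER, "
    "id INTEGER, count INTEGER, time TEXT)"
)
conn.executemany("INSERT INTO history VALUES (?, ?, ?, ?, ?)", ROWS)

# A row is a consecutive duplicate when its count equals the count of the
# immediately preceding row (by time) for the same account and id. With no
# preceding row the subquery yields NULL and the comparison is not true.
FIND_SQL = """
    SELECT h1.pk
    FROM history AS h1
    WHERE h1.count = (SELECT h2.count
                      FROM history AS h2
                      WHERE h2.account = h1.account
                        AND h2.id = h1.id
                        AND h2.time < h1.time
                      ORDER BY h2.time DESC
                      LIMIT 1)
    ORDER BY h1.pk
"""
to_delete = [pk for (pk,) in conn.execute(FIND_SQL)]
print(to_delete)  # → [2, 8, 9]
```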

Now you can easily delete the unwanted rows with the IN operator:

-- the derived table x is the usual MySQL workaround for error 1093
-- (a subquery in a DELETE cannot read the target table directly)
DELETE FROM history
WHERE pk IN (
  SELECT x.pk
  FROM (
    SELECT pk
    FROM history AS h1
    WHERE
      account = (SELECT account
                 FROM history AS h2
                 WHERE h1.account = h2.account AND
                       h1.id = h2.id AND
                       h2.time < h1.time
                 ORDER BY time DESC
                 LIMIT 1)
      AND
      id = (SELECT id
            FROM history AS h2
            WHERE h1.account = h2.account AND
                  h1.id = h2.id AND
                  h2.time < h1.time
            ORDER BY time DESC
            LIMIT 1)
      AND
      count = (SELECT count
               FROM history AS h2
               WHERE h1.account = h2.account AND
                     h1.id = h2.id AND
                     h2.time < h1.time
               ORDER BY time DESC
               LIMIT 1)
  ) AS x
)
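The delete can be checked the same way in SQLite, again with a hypothetical `pk` column and the simplified `count`-only condition. The extra derived table `x` above is there for MySQL, which refuses a subquery that selects directly from the table being deleted from; SQLite accepts the plain IN subquery, so the sketch leaves the wrapper out.

```python
import sqlite3

# Sample rows from the question as (pk, account, id, count, time);
# pk is a hypothetical primary key added for this demo.
ROWS = [
    (1, 42, 12147, 492, "2015-09-20 11:31:14"),
    (2, 42, 12147, 492, "2015-09-20 11:36:19"),
    (3, 13, 12147, 246, "2015-09-20 11:31:14"),
    (4, 2, 12253, 183, "2015-09-20 11:36:19"),
    (5, 2, 19684, 805, "2015-09-20 12:00:41"),
    (6, 2, 19684, 810, "2015-09-20 12:05:41"),
    (7, 2, 19684, 805, "2015-09-20 12:10:41"),
    (8, 2, 19684, 805, "2015-09-20 12:15:41"),
    (9, 2, 19684, 805, "2015-09-20 12:20:41"),
    (10, 2, 19684, 806, "2015-09-20 12:25:41"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE history (pk INTEGER PRIMARY KEY, account INTEGER, "
    "id INTEGER, count INTEGER, time TEXT)"
)
conn.executemany("INSERT INTO history VALUES (?, ?, ?, ?, ?)", ROWS)

# Delete every row whose count repeats the count of the immediately
# preceding row (by time) for the same account and id.
conn.execute("""
    DELETE FROM history
    WHERE pk IN (SELECT h1.pk
                 FROM history AS h1
                 WHERE h1.count = (SELECT h2.count
                                   FROM history AS h2
                                   WHERE h2.account = h1.account
                                     AND h2.id = h1.id
                                     AND h2.time < h1.time
                                   ORDER BY h2.time DESC
                                   LIMIT 1))
""")
remaining = [pk for (pk,) in conn.execute("SELECT pk FROM history ORDER BY pk")]
print(remaining)  # → [1, 3, 4, 5, 6, 7, 10]
```

Unlike the GROUP BY and EXISTS variants, the 12:10:41 row (pk 7) survives here.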


Edit 2:

Using variables to locate the pk values to delete may make the query considerably faster:

SELECT pk
FROM (
  SELECT pk, account, id, count, time,
         -- rn restarts at 1 whenever account, id or count differs
         -- from the previous row, so rn > 1 marks a consecutive repeat
         @rn := IF (account = @acc AND id = @id AND count = @count,
                    @rn + 1, 1) AS rn,
         @acc := account,
         @id := id,
         @count := count
  FROM history
  CROSS JOIN (SELECT @rn := 0, @acc := 0, @id := 0, @count := 0) AS vars
  ORDER BY account, id, time, count ) AS t
WHERE t.rn > 1
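On MySQL 8.0 or later, window functions offer an alternative to user variables. That matters because MySQL documents the evaluation order of expressions involving user variables as undefined and deprecates assigning them outside SET. The sketch below swaps in `LAG()` (a different technique from the answer's) and runs on SQLite 3.25 or newer, which added window functions, so it can be checked with the sample rows and the hypothetical `pk` column. The same SELECT should carry over to MySQL 8.0, although the DELETE there may still need a derived-table wrapper.

```python
import sqlite3

# Sample rows from the question as (pk, account, id, count, time);
# pk is a hypothetical primary key added for this demo.
ROWS = [
    (1, 42, 12147, 492, "2015-09-20 11:31:14"),
    (2, 42, 12147, 492, "2015-09-20 11:36:19"),
    (3, 13, 12147, 246, "2015-09-20 11:31:14"),
    (4, 2, 12253, 183, "2015-09-20 11:36:19"),
    (5, 2, 19684, 805, "2015-09-20 12:00:41"),
    (6, 2, 19684, 810, "2015-09-20 12:05:41"),
    (7, 2, 19684, 805, "2015-09-20 12:10:41"),
    (8, 2, 19684, 805, "2015-09-20 12:15:41"),
    (9, 2, 19684, 805, "2015-09-20 12:20:41"),
    (10, 2, 19684, 806, "2015-09-20 12:25:41"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE history (pk INTEGER PRIMARY KEY, account INTEGER, "
    "id INTEGER, count INTEGER, time TEXT)"
)
conn.executemany("INSERT INTO history VALUES (?, ?, ?, ?, ?)", ROWS)

# LAG(count) is the count of the previous row in the same (account, id)
# partition when ordered by time, or NULL for the first row, so the
# WHERE clause keeps exactly the consecutive repeats.
FIND_SQL = """
    SELECT pk
    FROM (SELECT pk, count,
                 LAG(count) OVER (PARTITION BY account, id
                                  ORDER BY time) AS prev_count
          FROM history) AS t
    WHERE count = prev_count
"""
to_delete = sorted(pk for (pk,) in conn.execute(FIND_SQL))
print(to_delete)  # → [2, 8, 9]

# The same SELECT can feed the DELETE directly.
conn.execute(f"DELETE FROM history WHERE pk IN ({FIND_SQL})")
remaining = [pk for (pk,) in conn.execute("SELECT pk FROM history ORDER BY pk")]
print(remaining)  # → [1, 3, 4, 5, 6, 7, 10]
```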
