MySQL: group a very large table to eliminate duplicates

Question

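I have a very large table with about 27 million rows and 80 columns (mostly floats, but also datetime, integer, and text values). Basically, there are always 4 rows that share the same values in every column except 2 columns (int, text), whose values differ. I want to group these 4 rows into one to reduce the table size. At this point it does not much matter which value is kept from the 2 differing columns. What is the best way to group the data?

Because the table is so large, I cannot get an overview of whether the data is always complete, i.e. whether there are always exactly 4 rows per set of identical values. Is there a good way to check this?

Unfortunately I am a beginner in SQL and would appreciate any hints on how to approach this.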

Answer 1

This query gives you the duplicate rows based on the unique columns. Replace c1, c2, c3, c4 with the columns that define uniqueness.

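-- Requires MySQL 8.0+ (window functions)
SELECT <unique_columns>
FROM (
    -- Number the rows inside each group of identical c1..c4 values
    SELECT c.*,
           ROW_NUMBER() OVER (PARTITION BY c1, c2, c3, c4 ORDER BY c5) AS rn
    FROM myTable c
) t
WHERE rn != 1;  -- every row after the first in its group is a duplicate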
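
To check whether every group really has exactly 4 rows, something along these lines might help. It is only a sketch: c1, c2, c3, c4 are the same placeholders as above and stand for all the columns that should be identical within a group, and rows_in_group is a name made up for the example.

```sql
-- List every group whose row count is not exactly 4.
-- An empty result means the data is complete in the sense of the question.
SELECT c1, c2, c3, c4, COUNT(*) AS rows_in_group
FROM myTable
GROUP BY c1, c2, c3, c4
HAVING COUNT(*) <> 4;
```

This uses plain GROUP BY, so it should not depend on window function support.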

Answer 2

You can:

-- Create new table with the same structure
CREATE TABLE new_table LIKE old_table;
-- Add unique index by all columns except last two
ALTER TABLE new_table ADD UNIQUE index_name (column1, column2, ... , columnN);
-- Copy data from old table ignoring duplicates 
-- (only one row in each group will be inserted)
INSERT IGNORE INTO new_table SELECT * FROM old_table;

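Afterwards you can drop the index that was used, drop the old table, and rename the new one. Alternatively, truncate the old table and copy the rows back from the new table (using it as a temporary table).

I hope there are no foreign keys and/or triggers on the source table...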
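
The cleanup step could look roughly like this. It is a sketch that reuses the names from the statements above (new_table, old_table, index_name); old_table_backup is a hypothetical name introduced here so that the old data is only dropped after the swap.

```sql
-- Remove the helper index that was only needed for deduplication
ALTER TABLE new_table DROP INDEX index_name;

-- Swap the tables in a single statement
RENAME TABLE old_table TO old_table_backup,
             new_table TO old_table;

-- Once the result has been verified, discard the original data
DROP TABLE old_table_backup;
```

Keeping the backup until the row counts have been compared is a cautious choice, not a requirement.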
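
One way to find out before swapping, assuming the source table is called old_table as above and lives in the currently selected database:

```sql
-- Triggers defined on the source table (LIKE matches table names here)
SHOW TRIGGERS LIKE 'old_table';

-- Foreign keys in other tables that reference the source table
SELECT TABLE_NAME, CONSTRAINT_NAME
FROM information_schema.KEY_COLUMN_USAGE
WHERE REFERENCED_TABLE_SCHEMA = DATABASE()
  AND REFERENCED_TABLE_NAME = 'old_table';
```

If both return nothing, dropping and renaming should be safe in that respect.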

Tags: mysql, group-by, duplicates, mysql-workbench