MySQL Partitioning (InnoDB) - Large table

Question

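I have a very large MySQL database (1 billion rows) that looks like this:

Database: products("name", "caracteristics")

Both columns are VARCHAR(50).

Right now it has no key set, but "name" will be unique, so I think I will alter the table to make "name" the PRIMARY KEY. (I should have done that earlier. I guess I now need to run a remove-duplicates query before adding the primary key.)

My problem is that even a simple query on the table takes a really long time.

SELECT caracteristics FROM products WHERE name = 'blabla' LIMIT 1; -- takes a very long time

I am thinking about partitioning the existing table.

So here are the questions:

Is it a good idea to solve my performance problem?
How can I do it?
Is my ALTER TABLE idea of making the 'name' column the PRIMARY KEY also a good idea?
Also, about the duplicates query, I found this one on here. Am I doing it right? (I don't want to mess up my table...)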

delete a
from products a
left join(
select max(name) maxname, caracteristics
from products
group by caracteristics) b
on a.name = maxname and
a.caracteristics= b.caracteristics
where b.maxname IS NULL;

Answer 1

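I don't think partitioning is the solution to this particular problem. How would you partition? By what criterion?

I think your main concern is an architectural one, and it should be addressed before anything else: records that are supposed to be unique are not unique.

Given the data volume, I expect any solution to take a while to run. But I bet this one is the fastest: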

CREATE TABLE products_unique (
 name VARCHAR(50) NOT NULL,
 characteristics VARCHAR(50),
 PRIMARY KEY (name)
);

INSERT IGNORE INTO products_unique SELECT * FROM products;

RENAME TABLE products TO products_backup;
RENAME TABLE products_unique TO products;

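Duplicates will be discarded arbitrarily, but I think that is what you are looking for. If it takes too long, try running it overnight... I just hope the transaction buffer doesn't blow up on you, in which case we would have to work out a stored procedure that splits the insert into batches.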
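Once the swap is done, a quick check can confirm that the lookup from the question now goes through the new primary key. This is a minimal sketch, assuming the table and column names used in the statements above and 'blabla' as a placeholder value:

```sql
-- Ask the optimizer how it would run the lookup from the question.
-- With name as the PRIMARY KEY, the plan should use the PRIMARY index
-- instead of scanning the whole table.
EXPLAIN SELECT characteristics
FROM products
WHERE name = 'blabla';

-- Optional sanity check on the result of the copy. These are full scans,
-- so they will be slow on a table of this size.
SELECT COUNT(*) FROM products;                    -- rows kept
SELECT COUNT(DISTINCT name) FROM products_backup; -- distinct names in the original
```

If both tables use the same collation for name, the two counts should match.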

Answer 2

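Yes, it is a good idea to solve your performance problem. That is always the right answer when a performance problem is serious enough to make you wonder about fixing it.

You can achieve this by altering the table and making name the primary key, as you have already realized.

Your query should not be necessary. You should create a temporary table and insert into it the values you consider necessary. Let's suppose the table is named mytemptable. Then: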

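-- Group by name so that each name is copied exactly once. A NOT EXISTS check
-- against mytemptable cannot be relied on to filter duplicates that arrive
-- within the same statement, since the table starts out empty.
insert into mytemptable(name, characteristics)
select name, min(characteristics)
from products
group by name;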

Then remove your records from products using:

delete from products;

Then alter products, making sure it has name as its primary key, and then:

insert into products(name, characteristics)
select name, characteristics
from mytemptable;

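Finally, drop your temporary table.

About your query:

Your query deletes records. If a given characteristics value is associated with only one possible name, which is a very safe assumption, then max(name) equals every other name value in that group. So whenever a characteristics value matches a single name, you will delete all instances of that name. So yes, your query will mess up your data.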
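Before running any destructive statement, it may help to look at the duplicates first. Here is a minimal read-only sketch, assuming the table name from the question; the alias copies and the LIMIT of 20 are arbitrary choices:

```sql
-- List a sample of names that occur more than once, with their counts.
-- Nothing is modified. Without an index on name this is a full scan,
-- so expect it to be slow on a very large table.
SELECT name, COUNT(*) AS copies
FROM products
GROUP BY name
HAVING COUNT(*) > 1
LIMIT 20;
```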

Answer 3

You can also add the PRIMARY KEY directly with the IGNORE option, like this:

ALTER IGNORE TABLE `products` ADD PRIMARY KEY(name);

This removes all rows with a duplicate name.

Sample

MariaDB [l]> CREATE TABLE `products` (
    ->   `name` varchar(50) NOT NULL DEFAULT '',
    ->   `caracteristics` varchar(50) DEFAULT NULL
    -> ) ENGINE=InnoDB DEFAULT CHARSET=utf8;
Query OK, 0 rows affected (0.02 sec)

MariaDB [l]> INSERT INTO `products` (`name`, `caracteristics`)
    -> VALUES
    ->     ('val1', 'asdfasdfasdf'),
    ->     ('val2', 'asdasDasd'),
    ->     ('val3', 'aesfawfa'),
    ->     ('val1', '99999999');
Query OK, 4 rows affected (0.01 sec)
Records: 4  Duplicates: 0  Warnings: 0

MariaDB [l]> select * from products;
+------+----------------+
| name | caracteristics |
+------+----------------+
| val1 | asdfasdfasdf   |
| val2 | asdasDasd      |
| val3 | aesfawfa       |
| val1 | 99999999       |
+------+----------------+
4 rows in set (0.00 sec)

MariaDB [l]> ALTER IGNORE TABLE `products` ADD PRIMARY KEY(name);
Query OK, 4 rows affected (0.03 sec)
Records: 4  Duplicates: 1  Warnings: 0

MariaDB [l]> select * from products;
+------+----------------+
| name | caracteristics |
+------+----------------+
| val1 | asdfasdfasdf   |
| val2 | asdasDasd      |
| val3 | aesfawfa       |
+------+----------------+
3 rows in set (0.00 sec)

MariaDB [l]>
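Note that the sample above runs on MariaDB. MySQL removed the IGNORE clause of ALTER TABLE in version 5.7.4, so the statement fails there. On such a server, a rough equivalent is to copy the rows through an empty table that already has the key. The sketch below assumes the table from the sample; products_dedup and products_old are hypothetical scratch names:

```sql
-- Create an empty copy of the table definition, then add the key while
-- the copy is still empty.
CREATE TABLE products_dedup LIKE products;
ALTER TABLE products_dedup ADD PRIMARY KEY (name);

-- Copy the rows. IGNORE silently skips every row whose name is already present.
INSERT IGNORE INTO products_dedup
SELECT name, caracteristics
FROM products;

-- Swap the tables in a single atomic statement and keep the original as a backup.
RENAME TABLE products TO products_old,
             products_dedup TO products;
```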

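Test: ADD PRIMARY KEY vs. INSERT IGNORE

Here is a test comparing ADD PRIMARY KEY with INSERT IGNORE. You can see that ADD PRIMARY KEY is somewhat faster in this example (about 92 sec versus about 118 sec).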

MariaDB [l]> CREATE TABLE `bigtable10m` (
    ->   `id` varchar(32) NOT NULL DEFAULT ''
    -> ) ENGINE=InnoDB DEFAULT CHARSET=utf8;
Query OK, 0 rows affected (0.02 sec)

MariaDB [l]>
MariaDB [l]> INSERT INTO `bigtable10m`
    -> select lpad(seq,8,'0') from seq_1_to_10000000;
Query OK, 10000000 rows affected (24.24 sec)
Records: 10000000  Duplicates: 0  Warnings: 0

MariaDB [l]>
MariaDB [l]> SELECT * FROM `bigtable10m` LIMIT 10;
+----------+
| id       |
+----------+
| 00000001 |
| 00000002 |
| 00000003 |
| 00000004 |
| 00000005 |
| 00000006 |
| 00000007 |
| 00000008 |
| 00000009 |
| 00000010 |
+----------+
10 rows in set (0.00 sec)

MariaDB [l]>
MariaDB [l]> CREATE TABLE `bigtable30m` (
    ->   `id` varchar(32) NOT NULL DEFAULT ''
    -> ) ENGINE=InnoDB DEFAULT CHARSET=utf8;
Query OK, 0 rows affected (0.02 sec)

MariaDB [l]>
MariaDB [l]> INSERT INTO `bigtable30m` SELECT * FROM `bigtable10m`;
Query OK, 10000000 rows affected (28.49 sec)
Records: 10000000  Duplicates: 0  Warnings: 0

MariaDB [l]> INSERT INTO `bigtable30m` SELECT * FROM `bigtable10m`;
Query OK, 10000000 rows affected (29.01 sec)
Records: 10000000  Duplicates: 0  Warnings: 0

MariaDB [l]> INSERT INTO `bigtable30m` SELECT * FROM `bigtable10m`;
Query OK, 10000000 rows affected (32.98 sec)
Records: 10000000  Duplicates: 0  Warnings: 0

MariaDB [l]>
MariaDB [l]> ALTER IGNORE TABLE `bigtable30m` ADD PRIMARY KEY(id);
Query OK, 30000000 rows affected (1 min 32.34 sec)
Records: 30000000  Duplicates: 20000000  Warnings: 0

MariaDB [l]>
MariaDB [l]> DROP TABLE `bigtable30m`;
Query OK, 0 rows affected (0.52 sec)

MariaDB [l]>
MariaDB [l]> CREATE TABLE `bigtable30m` (
    ->   `id` varchar(32) NOT NULL DEFAULT ''
    -> ) ENGINE=InnoDB DEFAULT CHARSET=utf8;
Query OK, 0 rows affected (0.03 sec)

MariaDB [l]>
MariaDB [l]> INSERT INTO `bigtable30m` SELECT * FROM `bigtable10m`;
Query OK, 10000000 rows affected (37.29 sec)
Records: 10000000  Duplicates: 0  Warnings: 0

MariaDB [l]> INSERT INTO `bigtable30m` SELECT * FROM `bigtable10m`;
Query OK, 10000000 rows affected (41.87 sec)
Records: 10000000  Duplicates: 0  Warnings: 0

MariaDB [l]> INSERT INTO `bigtable30m` SELECT * FROM `bigtable10m`;
Query OK, 10000000 rows affected (30.87 sec)
Records: 10000000  Duplicates: 0  Warnings: 0

MariaDB [l]>
MariaDB [l]> CREATE TABLE bigtable_unique (
    ->   `id` varchar(32) NOT NULL DEFAULT '',
    ->  PRIMARY KEY (id)
    -> );
Query OK, 0 rows affected (0.02 sec)

MariaDB [l]>
MariaDB [l]> INSERT IGNORE bigtable_unique SELECT * FROM `bigtable30m`;
Query OK, 10000000 rows affected, 65535 warnings (1 min 57.99 sec)
Records: 30000000  Duplicates: 20000000  Warnings: 20000000

MariaDB [l]>


Tags: mysql, performance, innodb, partitioning