Does MySQL create an extra index for the primary key, or use the data itself as an "index"?

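I couldn't find a clear answer. I know that when you create a primary key, MySQL orders the data by that primary key. The question is whether it actually creates another index, or uses the actual data as the index, since the data should already be ordered by the primary key.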

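Edit:

If I have a table that has index A and index B and no primary key, I have the data + index A + index B. If I change the table to have the columns of index A as the primary key, I will only have the data (which is also used as an index) + index B, right? The above is in terms of memory usage.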

Clustered and Secondary Indexes

Every InnoDB table has a special index called the clustered index where the data for the rows is stored. Typically, the clustered index is synonymous with the primary key. To get the best performance from queries, inserts, and other database operations, you must understand how InnoDB uses the clustered index to optimize the most common lookup and DML operations for each table.

  • When you define a PRIMARY KEY on your table, InnoDB uses it as the clustered index

  • If you do not define a PRIMARY KEY for your table, MySQL locates the first UNIQUE index where all the key columns are NOT NULL and InnoDB uses it as the clustered index.

  • If the table has no PRIMARY KEY or suitable UNIQUE index, InnoDB internally generates a hidden clustered index named GEN_CLUST_INDEX on a synthetic column containing row ID values. The rows are ordered by the ID that InnoDB assigns to the rows in such a table. The row ID is a 6-byte field that increases monotonically as new rows are inserted. Thus, the rows ordered by the row ID are physically in insertion order.
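The three cases above can be sketched with a few throwaway tables. This is only an illustration: the table names `t_pk`, `t_uniq` and `t_none` are made up, and the final query assumes persistent InnoDB statistics are enabled (the default on recent MySQL and MariaDB versions) so that `mysql.innodb_index_stats` is populated.

```sql
-- Hypothetical tables, named only for this illustration.

-- 1. Explicit PRIMARY KEY: InnoDB clusters the rows on id.
CREATE TABLE t_pk (
    id  INT NOT NULL,
    val INT,
    PRIMARY KEY (id)
) ENGINE=InnoDB;

-- 2. No PRIMARY KEY, but a UNIQUE index whose columns are all NOT NULL:
--    InnoDB uses that index as the clustered index.
CREATE TABLE t_uniq (
    code INT NOT NULL,
    val  INT,
    UNIQUE KEY (code)
) ENGINE=InnoDB;

-- 3. Neither: InnoDB falls back to the hidden GEN_CLUST_INDEX.
CREATE TABLE t_none (
    val INT,
    KEY (val)
) ENGINE=InnoDB;

-- Refresh the statistics, then list the indexes InnoDB maintains per table.
ANALYZE TABLE t_pk, t_uniq, t_none;

SELECT DISTINCT table_name, index_name
FROM mysql.innodb_index_stats
WHERE database_name = DATABASE()
  AND table_name IN ('t_pk', 't_uniq', 't_none');
```

For `t_none` the listing should include an entry named GEN_CLUST_INDEX next to the secondary index on `val`, while `t_pk` and `t_uniq` should show a single index each.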

How the Clustered Index Speeds Up Queries

Accessing a row through the clustered index is fast because the index search leads directly to the page with all the row data. If a table is large, the clustered index architecture often saves a disk I/O operation when compared to storage organizations that store row data using a different page from the index record.

If I have a table that has index A and index B and no primary key, I have the data + index A + index B. If I change the table to have the columns of index A as the primary key, I will only have the data (which is also used as an index) + index B, right? The above is in terms of memory usage.

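Yes, the index for the clustered index is the table itself. That is the only place where the other, non-indexed columns are stored. When you run SHOW TABLE STATUS, you see it reported as Data_length. Secondary indexes are reported as Index_length.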

mysql> show table status like 'redacted'\G
*************************** 1. row ***************************
           Name: redacted
         Engine: InnoDB
        Version: 10
     Row_format: Dynamic
           Rows: 100217
 Avg_row_length: 1168
    Data_length: 117063680    <-- clustered index
Max_data_length: 0
   Index_length: 3653632      <-- secondary index(es)
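
The same two numbers can also be read with a regular query, which is handy for comparing several tables at once. A minimal sketch, assuming the table lives in the currently selected schema (`'redacted'` is just the placeholder name used above):

```sql
SELECT table_name,
       data_length,    -- clustered index, i.e. the row data itself
       index_length    -- all secondary indexes combined
FROM information_schema.TABLES
WHERE table_schema = DATABASE()
  AND table_name = 'redacted';
```

Both values are estimates maintained by InnoDB, so they may lag slightly behind recent changes.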

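InnoDB always stores a clustered index. If you have no PRIMARY KEY defined on any columns of your table, InnoDB creates an artificial column as the key for the clustered index, and that column cannot be queried.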

If I have a table that has index A and index B and no primary key, I have the data + index A + index B. If I change the table to have the columns of index A as the primary key, I will only have the data (which is also used as an index) + index B, right? The above is in terms of memory usage.

That is true, though there is more to consider in terms of storage size.

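Assuming that what you are trying to do is logically fine, and the index you want to promote to a primary key is actually a candidate key, whether you save storage depends on the number of indexes and the size of the primary key columns. The reason is that InnoDB appends the primary key columns to every secondary index (if they are not already explicitly part of it). It can also affect other (bigger) tables if they need to reference it as a foreign key.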
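
One way to observe that secondary index entries carry the primary key columns is to check whether a query can be answered from the secondary index alone. The following is a sketch only; the table `pk_demo` and index name `idx_b` are hypothetical, and the exact plan depends on the server version and optimizer.

```sql
-- Hypothetical table for illustration only.
CREATE TABLE pk_demo (
    a INT NOT NULL,
    b INT,
    c INT,
    PRIMARY KEY (a),
    INDEX idx_b (b)
) ENGINE=InnoDB;

INSERT INTO pk_demo VALUES (1, 10, 100), (2, 20, 200), (3, 30, 300);

-- Only a and b are requested. Because idx_b entries also carry a,
-- the plan would typically report "Using index" in the Extra column.
EXPLAIN SELECT a, b FROM pk_demo WHERE b = 20;

-- c is stored only in the clustered index, so the row itself must be read
-- and "Using index" should not appear here.
EXPLAIN SELECT a, b, c FROM pk_demo WHERE b = 20;
```

The wider the primary key, the more bytes each of those secondary index entries has to carry.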

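Here are some simple tests that show the differences. I'm using MariaDB, since its sequence plugin makes it easy to create dummy data. But you should see the same effects on a MySQL server.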

So first I'll just create a simple table with two INT columns and an index on each, and fill it with 100K rows.

drop table if exists test;
create table test(
    a int,
    b int,
    index(a),
    index(b)
);

insert into test(a, b)
    select seq as a, seq as b
    from seq_1_to_100000
;

For simplicity I'll just look at the file size of the table (I'm using innodb_file_per_table=1).

16777216 test.ibd
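
The file size lumps everything together. If a per-index breakdown is wanted, the persistent statistics table can be queried instead. This is a sketch that assumes persistent statistics are enabled and that `test` is in the current schema; the figures are estimates and will not add up exactly to the file size.

```sql
-- Refresh the stored statistics first, otherwise the values may be stale.
ANALYZE TABLE test;

SELECT index_name,
       stat_value                      AS pages,
       stat_value * @@innodb_page_size AS approx_bytes
FROM mysql.innodb_index_stats
WHERE database_name = DATABASE()
  AND table_name = 'test'
  AND stat_name = 'size';
```

For the table without a primary key, the clustered index should appear here under the name GEN_CLUST_INDEX.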

Now let's do what you wanted and make column a the primary key by changing the CREATE statement:

create table test(
    a int,
    b int,
    primary key(a),
    index(b)
);

The file size is now:

13631488 test.ibd

So it is true: you can save storage by promoting an index to a primary key. In this case almost 20% (16777216 down to 13631488 bytes).
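
On a table that already holds data, the same promotion could be done in place rather than by recreating the table. A hedged sketch, assuming the unnamed index on a received the default name `a` and that column a contains no NULLs or duplicates:

```sql
-- Replace the secondary index on a with a primary key on the same column.
-- The column is made NOT NULL explicitly, since a primary key requires it.
ALTER TABLE test
    MODIFY a INT NOT NULL,
    DROP INDEX a,
    ADD PRIMARY KEY (a);
```

Changing the primary key rebuilds the whole table, so on a large table this can take a while.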

But what if I change the column types from INT (4 bytes) to BINARY(32) (32 bytes)?

create table test(
    a binary(32),
    b binary(32),
    index(a),
    index(b)
);

File size:

37748736 test.ibd

Now make column a the primary key:

create table test(
    a binary(32),
    b binary(32),
    primary key(a),
    index(b)
);

File size:

41943040 test.ibd

As you can see, you can also increase the size. In this case by about 11% (37748736 up to 41943040 bytes).

It is recommended to always define a primary key, though. If in doubt, just create an AUTO_INCREMENT PRIMARY KEY. In my example it could be:

create table test(
    id mediumint auto_increment primary key,
    a binary(32),
    b binary(32),
    index(a),
    index(b)
);

File size:

37748736 test.ibd

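The size is the same as when we had no explicit primary key. (Though I would have expected to save a bit of space, since I use a 3-byte PK instead of the hidden 6-byte one.) But now you can use it in your queries, for foreign keys and joins.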