VARCHAR vs TEXT performance when data fits on row

mysql> desc temp1;
+-------+--------------+------+-----+---------+-------+
| Field | Type         | Null | Key | Default | Extra |
+-------+--------------+------+-----+---------+-------+
| value | varchar(255) | YES  |     | NULL    |       |
+-------+--------------+------+-----+---------+-------+

mysql> desc temp2;
+-------+------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-------+------+------+-----+---------+-------+
| value | text | YES  |     | NULL    |       |
+-------+------+------+-----+---------+-------+

255 'a' characters per row (in both tables)

mysql> select * from temp1 limit 1;
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| value                                                                                                                                                                                                                                                           |
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa |
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

mysql> select * from temp2 limit 1;
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| value                                                                                                                                                                                                                                                           |
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa |
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

Query on table 1 (temp1):

select count(*) from temp1 where value like '%a';

Query on table 2 (temp2):

select count(*) from temp2 where value like '%a';

Results (query execution time):

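+----------------+-----------------+-----------------+
| No. of records | temp1 (VARCHAR) | temp2 (TEXT)    |
+----------------+-----------------+-----------------+
|        2097152 | 6.08 sec        | 6.91 sec        |
|        4194304 | 12.42 sec       | 13.66 sec       |
|        8388608 | 25.08 sec       | 28.03 sec       |
|       16777216 | 52.82 sec       | 56.88 sec       |
|       33554432 | 1 min 50.17 sec | 1 min 59.36 sec |
+----------------+-----------------+-----------------+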

My question: how can the difference in execution speed be explained?

The row contents are identical in both tables.

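As I understand it, VARCHAR/TEXT columns only keep their content off-page when it exceeds the row size. So the contents of both tables should be stored inline within my page size (16 KB). What, then, causes this difference in query execution time?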

Note: The column is not indexed in either table.

Row Format - DYNAMIC

Character set - utf8mb3

Collation - utf8_general_ci

Storage engine - InnoDB

MySQL - 5.7

Reference link:

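Update: Following the same procedure, I tried 5000 characters ('a') in both tables, and the difference is large:

+----------------+-----------------+-----------------+
| No. of records | temp1 (VARCHAR) | temp2 (TEXT)    |
+----------------+-----------------+-----------------+
|        2097152 | 1 min 53.63 sec | 2 min 4.66 sec  |
+----------------+-----------------+-----------------+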

Update 2: Following the same procedure, I tried 2 characters ('a') in both tables, and there is still a performance difference.

Adding the table status:

mysql> select * FROM information_schema.tables  WHERE table_schema = "db67006db" and table_name = 'temp1';
+---------------+--------------+------------+------------+--------+---------+------------+------------+----------------+-------------+-----------------+--------------+-----------+----------------+---------------------+-------------+------------+-----------------+----------+----------------+---------------+
| TABLE_CATALOG | TABLE_SCHEMA | TABLE_NAME | TABLE_TYPE | ENGINE | VERSION | ROW_FORMAT | TABLE_ROWS | AVG_ROW_LENGTH | DATA_LENGTH | MAX_DATA_LENGTH | INDEX_LENGTH | DATA_FREE | AUTO_INCREMENT | CREATE_TIME         | UPDATE_TIME | CHECK_TIME | TABLE_COLLATION | CHECKSUM | CREATE_OPTIONS | TABLE_COMMENT |
+---------------+--------------+------------+------------+--------+---------+------------+------------+----------------+-------------+-----------------+--------------+-----------+----------------+---------------------+-------------+------------+-----------------+----------+----------------+---------------+
| def           | db67006db    | temp1      | BASE TABLE | InnoDB |      10 | Dynamic    |   30625036 |            315 |  9659482112 |               0 |            0 | 425721856 |           NULL | 2019-09-23 20:20:17 | NULL        | NULL       | utf8_general_ci |     NULL |                |               |
+---------------+--------------+------------+------------+--------+---------+------------+------------+----------------+-------------+-----------------+--------------+-----------+----------------+---------------------+-------------+------------+-----------------+----------+----------------+---------------+
1 row in set (0.01 sec)

mysql> select * FROM information_schema.tables  WHERE table_schema = "db67006db" and table_name = 'temp2';
+---------------+--------------+------------+------------+--------+---------+------------+------------+----------------+-------------+-----------------+--------------+-----------+----------------+---------------------+-------------+------------+-----------------+----------+----------------+---------------+
| TABLE_CATALOG | TABLE_SCHEMA | TABLE_NAME | TABLE_TYPE | ENGINE | VERSION | ROW_FORMAT | TABLE_ROWS | AVG_ROW_LENGTH | DATA_LENGTH | MAX_DATA_LENGTH | INDEX_LENGTH | DATA_FREE | AUTO_INCREMENT | CREATE_TIME         | UPDATE_TIME | CHECK_TIME | TABLE_COLLATION | CHECKSUM | CREATE_OPTIONS | TABLE_COMMENT |
+---------------+--------------+------------+------------+--------+---------+------------+------------+----------------+-------------+-----------------+--------------+-----------+----------------+---------------------+-------------+------------+-----------------+----------+----------------+---------------+
| def           | db67006db    | temp2      | BASE TABLE | InnoDB |      10 | Dynamic    |   30922268 |            315 |  9753853952 |               0 |            0 | 425721856 |           NULL | 2019-09-23 20:20:12 | NULL        | NULL       | utf8_general_ci |     NULL |                |               |
+---------------+--------------+------------+------------+--------+---------+------------+------------+----------------+-------------+-----------------+--------------+-----------+----------------+---------------------+-------------+------------+-----------------+----------+----------------+---------------+

Let's use some tools

Since the initial hunch (see below) turned out to be wrong, try running the queries through MySQL Workbench to collect Query Performance Stats.


Initial hunch (did not pan out)

Some thoughts:

  • A TEXT column takes 2 + N bytes on disk, where N is the length of the string in bytes
  • A VARCHAR column takes 1 + N bytes (for N ≤ 255) or 2 + N bytes (for 256 ≤ N ≤ 65535)
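The two formulas above can be written as a small Python sketch. It encodes the rules exactly as stated in the bullets, measuring N in bytes (for a string of 'a' characters that equals the character count). It models only the length prefix plus the data, not InnoDB's record header or page layout. One hedged caveat: the MySQL manual ties the VARCHAR prefix size to how many bytes the column's values may require, so with a multi-byte character set such as utf8mb3 a VARCHAR(255) column may already use the 2-byte prefix. The sketch ignores that.

```python
def text_storage_bytes(n: int) -> int:
    """TEXT value of n bytes: 2-byte length prefix + the data itself."""
    if not 0 <= n <= 65535:
        raise ValueError("TEXT holds at most 65535 bytes")
    return 2 + n


def varchar_storage_bytes(n: int) -> int:
    """VARCHAR value of n bytes, following the rule in the bullets above:
    1-byte prefix for n <= 255, 2-byte prefix for 256 <= n <= 65535."""
    if not 0 <= n <= 65535:
        raise ValueError("VARCHAR holds at most 65535 bytes")
    return (1 if n <= 255 else 2) + n


# Value sizes from the question (2, 255, 5000), plus 256 where the prefixes converge.
for n in (2, 255, 256, 5000):
    v, t = varchar_storage_bytes(n), text_storage_bytes(n)
    print(f"N={n:>4}: VARCHAR {v} bytes, TEXT {t} bytes, difference {t - v}")
```

Under these formulas the one-byte gap exists only up to 255 bytes, which is why the suggestion is to retest with 256 or more characters.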

Try increasing the text in the column to 256 characters or more, then rerun the test. The two tables may then perform more similarly.

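Also note that the differences you posted amount to well under a microsecond per record, so they could stem from any number of OS events getting in the way, or from something as simple as an if (TEXT) {do some additional IO or housekeeping} code path.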
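To put the posted timings in perspective, this Python sketch divides the TEXT minus VARCHAR gap by the row count for each run in the question's results table, and also reports the relative slowdown. It uses only the numbers from the question; it says nothing about where the extra time is spent.

```python
# (rows, VARCHAR seconds, TEXT seconds), copied from the question's results table
runs = [
    (2097152, 6.08, 6.91),
    (4194304, 12.42, 13.66),
    (8388608, 25.08, 28.03),
    (16777216, 52.82, 56.88),
    (33554432, 60 + 50.17, 60 + 59.36),
]

for rows, varchar_s, text_s in runs:
    per_row_us = (text_s - varchar_s) / rows * 1e6  # extra microseconds per row
    slowdown_pct = (text_s / varchar_s - 1) * 100   # TEXT time relative to VARCHAR
    print(f"{rows:>9} rows: +{per_row_us:.3f} us/row, TEXT {slowdown_pct:.1f}% slower")
```

Every run works out to a few tenths of a microsecond per row, while the relative gap stays in the high single digits to low teens of percent.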

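The TEXT type is always slower than VARCHAR because the two types are stored differently. VARCHAR fields are stored in the table like every other column type except TEXT, which is stored differently: each TEXT value is a separate object. This means that whenever you want to do something with a TEXT value, MySQL has to perform extra operations to fetch that object.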

Quoting the official documentation:

Each BLOB or TEXT value is represented internally by a separately allocated object. This is in contrast to all other data types, for which storage is allocated once per column when the table is opened.

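Your assumption for the first case is incorrect. According to Storage Requirements, TEXT holding 255 'a' characters takes one byte more than VARCHAR, so for a table of 33554432 records you need 33554432 more bytes loaded into memory, which explains the time delay.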

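This of course does not apply to 5000 'a' characters: according to the same documentation, the size is the same for both, L + 2 bytes. But I think the reason for the delay is described in Row Size Limits: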

The internal representation of a MySQL table has a maximum row size limit of 65,535 bytes, even if the storage engine is capable of supporting larger rows. BLOB and TEXT columns only contribute 9 to 12 bytes toward the row size limit because their contents are stored separately from the rest of the row.

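I think being stored as part of the row data is quite different from being stored separately (retrieving the value from wherever it is stored takes some time), and that explains the time delay.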
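For scale, the one-byte-per-row figure from this answer's first case can be checked with a few lines of Python, comparing the extra bytes with the DATA_LENGTH reported for temp1 in the question's table status. This assumes the table status was captured at the largest (33554432-row) size; it is plain arithmetic and does not by itself show how much of the timing gap those bytes account for.

```python
rows = 33554432                 # largest run in the question
extra_bytes_per_row = 1         # the one-byte TEXT overhead claimed in this answer
data_length_temp1 = 9659482112  # DATA_LENGTH of temp1 from information_schema.tables

extra_bytes = rows * extra_bytes_per_row
print(f"extra data: {extra_bytes} bytes = {extra_bytes / 2**20:.0f} MiB")
print(f"relative to temp1 DATA_LENGTH: {extra_bytes / data_length_temp1:.2%}")
```

The extra data comes to 32 MiB, roughly a third of a percent of the table's data length.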

With respect to storage, InnoDB handles VARCHAR and TEXT much the same when both are stored inline. However, when fetching the data from InnoDB, the server allocates space for all VARCHAR columns before query execution, while space for TEXT columns is only allocated if they are actually read, and this dynamic memory allocation takes time.

https://forums.mysql.com/read.php?24,645115,645164#msg-645164
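The allocation difference described in the last two answers can be illustrated with a toy Python model. This is a hypothetical simplification, not MySQL's actual code: a VARCHAR-like column reuses one buffer allocated before the scan, while a TEXT-like column gets a freshly allocated object for every value that is read. The names AllocCounter, scan_varchar and scan_text are made up for the illustration.

```python
from dataclasses import dataclass


@dataclass
class AllocCounter:
    """Counts buffer allocations made by the toy scans below (hypothetical model)."""
    count: int = 0

    def alloc(self, size: int) -> bytearray:
        self.count += 1
        return bytearray(size)


def scan_varchar(rows: list[bytes], max_len: int, alloc: AllocCounter) -> int:
    """Eager model: one buffer of max_len bytes, allocated once before the scan."""
    buf = alloc.alloc(max_len)
    matches = 0
    for value in rows:
        n = len(value)
        if n > max_len:
            raise ValueError("value longer than the column's maximum length")
        buf[:n] = value  # copy into the preallocated buffer, no new allocation
        if n > 0 and buf[n - 1] == ord("a"):  # simplified LIKE '%a' (case-sensitive here)
            matches += 1
    return matches


def scan_text(rows: list[bytes], alloc: AllocCounter) -> int:
    """Lazy model: a separately allocated object for every value that is read."""
    matches = 0
    for value in rows:
        obj = alloc.alloc(len(value))  # one allocation per row
        obj[:] = value
        if obj and obj[-1] == ord("a"):  # simplified LIKE '%a' (case-sensitive here)
            matches += 1
    return matches


# 1000 rows of 255 'a' characters; 765 bytes is an assumed maximum
# byte length for VARCHAR(255) in utf8mb3 (3 bytes per character).
rows = [b"a" * 255] * 1000
eager, lazy = AllocCounter(), AllocCounter()
print(scan_varchar(rows, 765, eager), scan_text(rows, lazy))  # 1000 1000
print(eager.count, lazy.count)  # 1 1000
```

Both scans find the same matches, but the TEXT-like path performs one allocation per row instead of one per scan. Real server cost depends on much more than allocation counts, so this only shows the shape of the argument.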