MySQL convert to UTF8 without structure change

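I have a fairly large database that I am trying to convert from the character set and collation latin1/latin1_swedish_ci to utf8mb4/utf8mb4_unicode_ci. I want to set up replication to a slave, run the conversion there, and then promote the slave once it finishes, to avoid downtime.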

I noticed that when I run the query...

ALTER TABLE `sometable` CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;

...MySQL automatically converts text to mediumtext, mediumtext to longtext, and so on.

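Is there a way to turn this feature off? It is nice that MySQL does this, but it breaks replication, because the table structure on the slave then differs from the one on the master.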

As documented under ALTER TABLE Syntax:

For a column that has a data type of VARCHAR or one of the TEXT types, CONVERT TO CHARACTER SET will change the data type as necessary to ensure that the new column is long enough to store as many characters as the original column. For example, a TEXT column has two length bytes, which store the byte-length of values in the column, up to a maximum of 65,535. For a latin1 TEXT column, each character requires a single byte, so the column can store up to 65,535 characters. If the column is converted to utf8, each character might require up to three bytes, for a maximum possible length of 3 × 65,535 = 196,605 bytes. That length will not fit in a TEXT column's length bytes, so MySQL will convert the data type to MEDIUMTEXT, which is the smallest string type for which the length bytes can record a value of 196,605. Similarly, a VARCHAR column might be converted to MEDIUMTEXT.

To avoid data type changes of the type just described, do not use CONVERT TO CHARACTER SET. Instead, use MODIFY to change individual columns. For example:

ALTER TABLE t MODIFY latin1_text_col TEXT CHARACTER SET utf8;
ALTER TABLE t MODIFY latin1_varchar_col VARCHAR(M) CHARACTER SET utf8;
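
For a large database, the per-column MODIFY statements could be generated rather than written by hand. The following is only a sketch for the utf8mb4 target from the question: the schema name `mydb` is a placeholder, and only the column type and nullability are carried over, so each generated statement should be reviewed before it is run.

```sql
-- Sketch: emit one ALTER TABLE ... MODIFY per latin1 column of base tables.
-- 'mydb' is a hypothetical schema name, not taken from the question.
-- MODIFY replaces the whole column definition, and this query only
-- reproduces the type and NULL / NOT NULL; add DEFAULT and COMMENT
-- clauses by hand where the original column has them.
SELECT CONCAT(
         'ALTER TABLE `', c.TABLE_SCHEMA, '`.`', c.TABLE_NAME, '` ',
         'MODIFY `', c.COLUMN_NAME, '` ', c.COLUMN_TYPE,
         ' CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci',
         IF(c.IS_NULLABLE = 'NO', ' NOT NULL', ' NULL'),
         ';'
       ) AS modify_stmt
FROM information_schema.COLUMNS AS c
JOIN information_schema.TABLES AS t
  ON t.TABLE_SCHEMA = c.TABLE_SCHEMA
 AND t.TABLE_NAME   = c.TABLE_NAME
WHERE c.TABLE_SCHEMA = 'mydb'
  AND t.TABLE_TYPE = 'BASE TABLE'        -- skip views
  AND c.CHARACTER_SET_NAME = 'latin1'
ORDER BY c.TABLE_NAME, c.ORDINAL_POSITION;
```

Because COLUMN_TYPE is reused as-is, a TEXT column stays TEXT, which keeps the structure identical on both servers. The trade-off is the one the manual describes: the column no longer has room for every value the latin1 column could hold.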

(Not really an answer, but some illustrative examples.)

Case 1: Text is correctly stored as latin1 in a latin1 column; using CONVERT TO

mysql>     CREATE TABLE alters (
    ->         c VARCHAR(11)  CHARACTER SET latin1  NOT NULL
    ->     );

mysql>     INSERT INTO alters (c) VALUES ('aabc'), (UNHEX('61e06263')), (UNHEX('61e16263'));

mysql>     SELECT c, HEX(c) from alters;
+-------+----------+
| c     | HEX(c)   |
+-------+----------+
| aabc  | 61616263 |
| aàbc  | 61E06263 |
| aábc  | 61E16263 |
+-------+----------+

mysql>     ALTER TABLE alters CONVERT TO CHARACTER SET utf8;

mysql>     SELECT c, HEX(c) from alters;
+-------+------------+
| c     | HEX(c)     |
+-------+------------+
| aabc  | 61616263   |
| aàbc  | 61C3A06263 |
| aábc  | 61C3A16263 |
+-------+------------+

mysql>     -- Observation: text was correctly converted to utf8.

mysql>     SHOW CREATE TABLE alters\G
Create Table: CREATE TABLE `alters` (
  `c` varchar(11) NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8
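
To check whether a conversion changed any column type (the concern in the question), the column metadata could be compared before and after the ALTER, or between master and slave. A minimal sketch, assuming the `alters` table lives in the currently selected database:

```sql
-- List each column's type, character set and collation for one table.
-- Run it before and after the conversion (or on both servers) and
-- compare the COLUMN_TYPE values.
SELECT COLUMN_NAME, COLUMN_TYPE, CHARACTER_SET_NAME, COLLATION_NAME
FROM information_schema.COLUMNS
WHERE TABLE_SCHEMA = DATABASE()
  AND TABLE_NAME = 'alters'
ORDER BY ORDINAL_POSITION;
```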

Case 2: Text is correctly stored as latin1 in a latin1 column; using the "double ALTER"

mysql>     CREATE TABLE alters (
    ->         c VARCHAR(11)  CHARACTER SET latin1  NOT NULL
    ->     );

mysql>     INSERT INTO alters (c) VALUES ('aabc'), (UNHEX('61e06263')), (UNHEX('61e16263'));

mysql>     ALTER TABLE alters MODIFY  c VARBINARY(11) NOT NULL;

mysql>     ALTER TABLE alters MODIFY  c VARCHAR(11)  CHARACTER SET utf8  NOT NULL;
Query OK, 3 rows affected, 2 warnings (0.10 sec)
Records: 3  Duplicates: 0  Warnings: 2

mysql>     SHOW WARNINGS;
+---------+------+----------------------------------------------------------+
| Level   | Code | Message                                                  |
+---------+------+----------------------------------------------------------+
| Warning | 1366 | Incorrect string value: '\xE0bc' for column 'c' at row 2 |
| Warning | 1366 | Incorrect string value: '\xE1bc' for column 'c' at row 3 |
+---------+------+----------------------------------------------------------+

mysql>     SELECT c, HEX(c) from alters;
+------+----------+
| c    | HEX(c)   |
+------+----------+
| aabc | 61616263 |
| a    | 61       |
| a    | 61       |
+------+----------+

mysql>     -- Observation: text was truncated !  BAD

mysql>     SHOW CREATE TABLE alters\G
Create Table: CREATE TABLE `alters` (
  `c` varchar(11) CHARACTER SET utf8 NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=latin1

Case 3: Text is incorrectly stored as utf8 in a latin1 column; using the "double ALTER" to fix it

mysql>     CREATE TABLE alters (
    ->         c VARCHAR(11)  CHARACTER SET latin1  NOT NULL
    ->     );

mysql>     INSERT INTO alters (c) VALUES ('aabc'), (UNHEX('61c3a06263')), (UNHEX('61c3a16263'));

mysql>     ALTER TABLE alters MODIFY  c VARBINARY(11) NOT NULL;
mysql>     ALTER TABLE alters MODIFY  c VARCHAR(11)  CHARACTER SET utf8  NOT NULL;

mysql>     SELECT c, HEX(c) from alters;
+-------+------------+
| c     | HEX(c)     |
+-------+------------+
| aabc  | 61616263   |
| aàbc  | 61C3A06263 |
| aábc  | 61C3A16263 |
+-------+------------+

mysql>     SHOW CREATE TABLE alters\G
Create Table: CREATE TABLE `alters` (
  `c` varchar(11) CHARACTER SET utf8 NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=latin1
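
Cases 2 and 3 run the same statements on different stored bytes, so it helps to inspect the bytes before choosing a method. The sketch below, meant to be run before any ALTER, lists only the rows that contain a byte of 0x80 or above. A lone E0 or E1 suggests genuine latin1 (the double ALTER would truncate it, as in Case 2), while a pair such as C3A0 suggests utf8 bytes sitting in a latin1 column (Case 3).

```sql
-- Show rows holding at least one non-ASCII byte, with their hex dump.
-- '^(..)*' skips whole bytes (two hex digits each); '[89A-F]' then
-- matches a byte whose high nibble is 8 or more.
SELECT c, HEX(c)
FROM alters
WHERE HEX(c) REGEXP '^(..)*[89A-F]';
```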

Case 4: Using ALTER ... MODIFY; note LENGTH and CHAR_LENGTH

mysql>     CREATE TABLE alters (
    ->         c VARCHAR(9)  CHARACTER SET latin1  NOT NULL
    ->     );

mysql>     INSERT INTO alters (c) VALUES ('aabc'), (UNHEX('61e06263')),
    ->                    (UNHEX('61e16263')),
    ->                    (UNHEX('61e162633536373839'));

mysql>     SELECT c, HEX(c), LENGTH(c), CHAR_LENGTH(c) from alters;
+------------+--------------------+-----------+----------------+
| c          | HEX(c)             | LENGTH(c) | CHAR_LENGTH(c) |
+------------+--------------------+-----------+----------------+
| aabc       | 61616263           |         4 |              4 |
| aàbc       | 61E06263           |         4 |              4 |
| aábc       | 61E16263           |         4 |              4 |
| aábc56789  | 61E162633536373839 |         9 |              9 |
+------------+--------------------+-----------+----------------+

mysql>     ALTER TABLE alters MODIFY  c VARCHAR(9)  CHARACTER SET utf8  NOT NULL;

mysql>     SELECT c, HEX(c), LENGTH(c), CHAR_LENGTH(c) from alters;
+------------+----------------------+-----------+----------------+
| c          | HEX(c)               | LENGTH(c) | CHAR_LENGTH(c) |
+------------+----------------------+-----------+----------------+
| aabc       | 61616263             |         4 |              4 |
| aàbc       | 61C3A06263           |         5 |              4 |
| aábc       | 61C3A16263           |         5 |              4 |
| aábc56789  | 61C3A162633536373839 |        10 |              9 |
+------------+----------------------+-----------+----------------+

mysql>     SHOW CREATE TABLE alters\G
Create Table: CREATE TABLE `alters` (
  `c` varchar(9) CHARACTER SET utf8 NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=latin1
1 row in set (0.00 sec)
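
Case 4 shows LENGTH (bytes) growing while CHAR_LENGTH (characters) stays the same. A VARCHAR(9) limit counts characters, so nothing is lost there, but the TEXT types are limited in bytes (65,535 for TEXT, per the manual excerpt above). If a TEXT column is kept as TEXT via MODIFY, a check along these lines could show whether any existing row would outgrow that limit. It assumes the data is correctly stored as latin1, and `body` is a hypothetical column name on the question's `sometable`.

```sql
-- Count rows whose value would need more than 65,535 bytes once
-- converted to utf8mb4. `body` is a placeholder TEXT column name.
SELECT COUNT(*) AS rows_too_long
FROM sometable
WHERE LENGTH(CONVERT(body USING utf8mb4)) > 65535;
```

If the count is zero, the existing rows fit in a TEXT column after conversion; later inserts can of course still exceed the limit.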

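Notes:

  • There were no warnings, except in the one case where I ran SHOW WARNINGS.
  • The table's default CHARSET was not changed, but that is not the issue here.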