MySQL query optimization for a JOIN on large tables
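I have a problem with a MySQL query over a large data set. With the join query below, one week of data takes 122 seconds to return, and one month of data takes 526 seconds.
I want to optimize this query to reduce the processing time for a full year of data. Alternatively, is there any way to tune the general MySQL settings?
Table details.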
I'm using two tables, mdiaries and tv_diaries, and I have indexed the relevant columns in both. The mdiaries table has 2661331 rows and the tv_diaries table has 27074645 rows.
mdiaries table:
INDEX area (area),
INDEX date (date),
INDEX district (district),
INDEX gaDivision (gaDivision),
INDEX member_id (member_id),
INDEX tv_channel_id (tv_channel_id),
tv_diaries table:
INDEX area (area),
INDEX date (date),
INDEX district (district),
INDEX member_id (member_id),
INDEX timeslot_id (timeslot_id),
INDEX tv_channel_id (tv_channel_id),
This is my query, which takes 122 seconds to execute.
$sql = "SELECT COUNT(TvDiary.id) AS m_count,TvDiary.date,TvDiary.timeslot_id,TvDiary.tv_channel_id,TvDiary.district,TvDiary.area
FROM `mdiaries` AS Mdiary INNER JOIN `tv_diaries` AS TvDiary ON Mdiary.member_id = TvDiary.member_id
WHERE Mdiary.date >= '2014-01-01' AND Mdiary.date <= '2014-01-07'
AND TvDiary.date >= '2014-01-01' AND TvDiary.date <= '2014-01-07'
GROUP BY TvDiary.date,
TvDiary.timeslot_id,
TvDiary.tv_channel_id,
TvDiary.district,
TvDiary.area";
This is the my.cnf file.
[mysqld]
## General
datadir = /var/lib/mysql
tmpdir = /var/lib/mysqltmp
socket = /var/lib/mysql/mysql.sock
skip-name-resolve
sql-mode = NO_ENGINE_SUBSTITUTION
#event-scheduler = 1
## Networking
back-log = 100
#max-connections = 200
max-connect-errors = 10000
max-allowed-packet = 32M
interactive-timeout = 3600
wait-timeout = 600
### Storage Engines
#default-storage-engine = InnoDB
innodb = FORCE
## MyISAM
key-buffer-size = 64M
myisam-sort-buffer-size = 128M
## InnoDB
innodb-buffer-pool-size = 16G
innodb_buffer_pool_instances = 16
#innodb-log-file-size = 100M
#innodb-log-buffer-size = 8M
#innodb-file-per-table = 1
#innodb-open-files = 300
## Replication
server-id = 1
#log-bin = /var/log/mysql/bin-log
#relay-log = /var/log/mysql/relay-log
relay-log-space-limit = 16G
expire-logs-days = 7
#read-only = 1
#sync-binlog = 1
#log-slave-updates = 1
#binlog-format = STATEMENT
#auto-increment-offset = 1
#auto-increment-increment = 2
## Logging
log-output = FILE
slow-query-log = 1
slow-query-log-file = /var/log/mysql/slow-log
#log-slow-slave-statements
long-query-time = 2
##
query_cache_size = 512M
query_cache_type = 1
query_cache_limit = 2M
join_buffer_size = 512M
thread_cache_size = 128
[mysqld_safe]
log-error = /var/log/mysqld.log
open-files-limit = 65535
[mysql]
no-auto-rehash
This is your query:
SELECT COUNT(t.id) AS m_count, t.date, t.timeslot_id, t.tv_channel_id,
t.district, t.area
FROM `mdiaries` m INNER JOIN
`tv_diaries` t
ON m.member_id = t.member_id
WHERE m.date >= '2014-01-01' AND m.date <= '2014-01-07' AND
t.date >= '2014-01-01' AND t.date <= '2014-01-07'
GROUP BY t.date, t.timeslot_id, t.tv_channel_id, t.district, t.area;
I would start with composite indexes: tv_diaries(date, member_id) and mdiaries(member_id, date).
This query is problematic, but these indexes might help.
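As a rough sketch of the composite indexes suggested above, the statements below would create them. The index names idx_tv_date_member and idx_m_member_date are made up for illustration; only the tables and column orders come from the answer. The idea is that the date range on tv_diaries narrows the rows first, and each of those rows can then be matched in mdiaries by member_id and date from a single index.

```sql
-- Index names are hypothetical; use whatever naming convention you already have.

-- tv_diaries: range scan on date, with member_id available in the same index for the join.
CREATE INDEX idx_tv_date_member ON tv_diaries (date, member_id);

-- mdiaries: lookup by member_id coming from the join, then the date filter
-- is applied inside the same index.
CREATE INDEX idx_m_member_date ON mdiaries (member_id, date);
```

Building an index on a table with tens of millions of rows can take a while, so it is probably worth trying this on a copy of the data first.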
Try adding a multi-column index on all the columns referenced in the GROUP BY clause, as described in the documentation.
INDEX grp (date, timeslot_id, tv_channel_id, district, area)
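The INDEX grp (...) line above is written the way it would appear inside a CREATE TABLE definition. For a table that already exists, one way to add it might look like the following, assuming the index is meant for tv_diaries (all the GROUP BY columns come from that table).

```sql
-- Assumption: the index belongs on tv_diaries, since every GROUP BY column is a tv_diaries column.
ALTER TABLE tv_diaries
  ADD INDEX grp (date, timeslot_id, tv_channel_id, district, area);

-- Confirm that the index exists and check its column order.
SHOW INDEX FROM tv_diaries WHERE Key_name = 'grp';
```

Whether the optimizer actually uses this index for the grouping depends on the join order it chooses, so it is worth checking the EXPLAIN output for "Using temporary; Using filesort" before and after.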
I'm not sure, but this may give you better performance:
SELECT COUNT(t.id) AS m_count, t.date, t.timeslot_id, t.tv_channel_id, t.district, t.area
FROM `mdiaries` m
JOIN
(
SELECT t.id, t.date, t.timeslot_id, t.tv_channel_id, t.district, t.area, t.member_id
FROM `tv_diaries` AS t
WHERE t.date >= '2014-01-01' AND t.date <= '2014-01-07'
) t ON m.member_id = t.member_id
WHERE m.date >= '2014-01-01' AND m.date <= '2014-01-07'
GROUP BY t.date, t.timeslot_id, t.tv_channel_id, t.district, t.area;
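Since it is not certain that the derived-table form is faster, one way to check is to run EXPLAIN on it and compare the result with EXPLAIN on the original join. This is only a sketch of that check; the query itself is the one from the answer, with the inner alias dropped for readability.

```sql
-- Plan for the derived-table version. If a row with select_type = DERIVED appears,
-- the subquery is being materialized into a temporary table before the join.
EXPLAIN
SELECT COUNT(t.id) AS m_count, t.date, t.timeslot_id, t.tv_channel_id, t.district, t.area
FROM mdiaries m
JOIN (
  SELECT id, date, timeslot_id, tv_channel_id, district, area, member_id
  FROM tv_diaries
  WHERE date >= '2014-01-01' AND date <= '2014-01-07'
) t ON m.member_id = t.member_id
WHERE m.date >= '2014-01-01' AND m.date <= '2014-01-07'
GROUP BY t.date, t.timeslot_id, t.tv_channel_id, t.district, t.area;
```

If both plans show the same access paths and row estimates, the rewrite is unlikely to change the run time much.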
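You should also check your database configuration settings, as I see the following issues:
innodb_file_per_table=1 is commented out. If the setting really is disabled on your server, all InnoDB data is stored in the single shared ibdata file rather than in a separate .ibd file per table.
tmp_table_size and max_heap_table_size can improve performance when you fetch data from heavy tables. If your query is creating temporary tables on disk, try setting both to at least 100M to avoid on-disk temporary tables.
Since you are using GROUP BY, increasing sort_buffer_size should help. You can set it to 2M.
join_buffer_size is too high. It should be around 2M; you can set it to 8M at most, but not 512M, because it is allocated per session and can eat up all your memory.
You have also set query_cache_size too high at 512M, so free up that memory. You can also check with a mysqltuner report whether you actually benefit from cached queries; if not, you can disable the query cache.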
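The settings above can be inspected and tried out at runtime before editing my.cnf. The following is a sketch using the values suggested in this answer; it assumes a MySQL version that still has the query cache (it was removed in MySQL 8.0) and an account with the privilege to change global variables.

```sql
-- Compare temporary tables created on disk with those kept in memory.
SHOW GLOBAL STATUS LIKE 'Created_tmp%tables';

-- Query cache counters; few hits relative to inserts suggests the cache is not paying off.
SHOW GLOBAL STATUS LIKE 'Qcache%';

-- SET does not accept the M suffix used in my.cnf, so sizes are written as byte expressions.
SET GLOBAL tmp_table_size      = 100 * 1024 * 1024;
SET GLOBAL max_heap_table_size = 100 * 1024 * 1024;
SET GLOBAL sort_buffer_size    = 2 * 1024 * 1024;
SET GLOBAL join_buffer_size    = 2 * 1024 * 1024;

-- Only if the Qcache counters show little benefit:
-- SET GLOBAL query_cache_size = 0;
```

SET GLOBAL only affects connections opened afterwards and is lost on restart, so any value that turns out to help still has to be written into my.cnf.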
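Maybe you could use a materialized view to store the query results and refresh it periodically (monthly? every 15 days?).
This will not optimize your query, but your lookups will be faster (the counts will not be computed again).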
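MySQL has no built-in materialized views, so in practice this idea is usually emulated with an ordinary summary table that is refilled on a schedule. The sketch below shows one possible shape; the table name tv_diary_counts and all column types are assumptions, since the real schema is not shown in the question.

```sql
-- Hypothetical summary table; column types are guesses and should match the real tables.
CREATE TABLE IF NOT EXISTS tv_diary_counts (
  date          DATE,
  timeslot_id   INT,
  tv_channel_id INT,
  district      VARCHAR(64),
  area          VARCHAR(64),
  m_count       BIGINT NOT NULL,
  KEY idx_counts_date (date)
);

-- Refresh one date range: drop the old rows, then recompute them with the original query.
DELETE FROM tv_diary_counts
WHERE date >= '2014-01-01' AND date <= '2014-01-07';

INSERT INTO tv_diary_counts (date, timeslot_id, tv_channel_id, district, area, m_count)
SELECT t.date, t.timeslot_id, t.tv_channel_id, t.district, t.area, COUNT(t.id)
FROM mdiaries m
INNER JOIN tv_diaries t ON m.member_id = t.member_id
WHERE m.date >= '2014-01-01' AND m.date <= '2014-01-07'
  AND t.date >= '2014-01-01' AND t.date <= '2014-01-07'
GROUP BY t.date, t.timeslot_id, t.tv_channel_id, t.district, t.area;

-- Reports then read the precomputed counts instead of running the join again.
SELECT date, timeslot_id, tv_channel_id, district, area, m_count
FROM tv_diary_counts
WHERE date >= '2014-01-01' AND date <= '2014-01-07';
```

The refresh step could be run from cron or a MySQL scheduled event; the slow query still runs once per refresh, but readers no longer wait for it.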