How can I make the following query more efficient when looking up rows with many conditions?
First of all, I need to implement pagination by restricting the results of the following lookup query with several WHERE conditions.
SELECT SQL_CALC_FOUND_ROWS
a.uid, b.NAME
FROM
`profiles` AS a FORCE INDEX(profiles_country_city_gender_index)
JOIN `users` AS b
ON b.id = a.uid
AND a.country = 'INDONESIA'
AND a.gender = 0
JOIN (
SELECT
a.uid
FROM
profile_details AS a
JOIN profile_details AS kids ON kids.uid = a.uid
AND kids.kids_pref = 1
JOIN profile_details AS current ON current.uid = a.uid
AND current.current_relationship = 1
JOIN profile_details AS smoking ON smoking.uid = a.uid
AND smoking.smoking_pref = 1
) AS e ON e.uid = a.uid
AND ( TIMESTAMPDIFF( YEAR, a.birth_date, NOW()) BETWEEN 25 AND 35 )
LIMIT 33;
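All of the tables here have a one-to-one relationship with the Users table:
- Profiles
- Profile_details
The id column is the primary key in Users, and uid is the foreign key in the other tables.
At first I had no problem with the query/design above, until the tables grew to 300K rows. The query now reports OK, Time: 0.726000s to return results, which is too slow for me.
I tried using count(*) to count the rows matching the conditions above and got roughly the same timing.
I need a faster way to get the row count for the lookup conditions, so that the pagination system works as expected with less waiting time.
As you can see in the query, I am using: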
FORCE INDEX(profiles_country_city_gender_index)
I don't think it helps much, because the range it covers produces a large number of rows:
AND a.country = 'INDONESIA'
AND a.gender = 0
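This yields 148801 rows (the range restricted by country, with gender equal to 0). If I pair it with city, the query time is not a problem because far fewer rows result, but it will still become a problem someday when the row count grows.
For anyone who may ask for the query explain: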
Explain SELECT SQL_CALC_FOUND_ROWS
a.uid,
b.NAME ...
Results:
| select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+-------------+---------+--------+------------------------------------+------------------------------------+---------+------------------+--------+-----------+------------------------------------+
| SIMPLE | a | ref | profiles_country_city_gender_index | profiles_country_city_gender_index | 242 | const | 148801 | 10.00 | Using index condition; Using where |
| SIMPLE | a | ref | profile_details_uid_foreign | profile_details_uid_foreign | 3 | restfulapi.a.uid | 1 | 100.00 | Using index |
| SIMPLE | kids | ref | profile_details_uid_foreign | profile_details_uid_foreign | 3 | restfulapi.a.uid | 1 | 10.00 | Using where |
| SIMPLE | current | ref | profile_details_uid_foreign | profile_details_uid_foreign | 3 | restfulapi.a.uid | 1 | 10.00 | Using where |
| SIMPLE | smoking | ref | profile_details_uid_foreign | profile_details_uid_foreign | 3 | restfulapi.a.uid | 1 | 10.00 | Using where |
| SIMPLE | b | eq_ref | PRIMARY | PRIMARY | 3 | restfulapi.a.uid | 1 | 100.00 | |
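As you can see in the explain result, there is no table scan, no "Using temporary", and no range access, only an index condition.
I imagine that if the tables returned at least 1 million rows for the country range, simply scaling up the time measured at 300K rows would be terrible :(.
Below are the table definitions, in the hope that they help analyze the problem: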
CREATE TABLE `profile_details` (
`id` mediumint(8) unsigned NOT NULL AUTO_INCREMENT,
`uid` mediumint(8) unsigned NOT NULL,
`intents` tinyint(4) NOT NULL DEFAULT '3',
`height` smallint(6) DEFAULT NULL,
`body_type` tinyint(4) NOT NULL DEFAULT '5',
`kids_pref` tinyint(4) NOT NULL DEFAULT '1',
`drinking_pref` tinyint(4) NOT NULL DEFAULT '2',
`living_with` tinyint(4) NOT NULL DEFAULT '0',
`current_relationship` tinyint(4) NOT NULL DEFAULT '1',
`sexual_pref` tinyint(4) NOT NULL DEFAULT '1',
`smoking_pref` tinyint(4) NOT NULL DEFAULT '0',
`status_online` tinyint(4) NOT NULL DEFAULT '0',
`created_at` timestamp NULL DEFAULT NULL,
`updated_at` timestamp NULL DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `profile_details_uid_foreign` (`uid`),
KEY `idx_multipart` (`intents`,`body_type`,`kids_pref`,`drinking_pref`,`living_with`,`current_relationship`,`sexual_pref`,`smoking_pref`),
CONSTRAINT `profile_details_uid_foreign` FOREIGN KEY (`uid`) REFERENCES `users` (`id`)
)
CREATE TABLE `profiles` (
`id` mediumint(8) unsigned NOT NULL AUTO_INCREMENT,
`uid` mediumint(8) unsigned NOT NULL,
`birth_date` date NOT NULL,
`gender` tinyint(4) NOT NULL DEFAULT '0',
`country` varchar(60) COLLATE utf8mb4_unicode_ci NOT NULL DEFAULT 'ID',
`city` varchar(60) COLLATE utf8mb4_unicode_ci DEFAULT 'Makassar',
`created_at` timestamp NULL DEFAULT NULL,
`updated_at` timestamp NULL DEFAULT NULL,
`latitude` double NOT NULL DEFAULT '0',
`longitude` double NOT NULL DEFAULT '0',
PRIMARY KEY (`id`),
KEY `profiles_uid_foreign` (`uid`),
KEY `profiles_birth_date_index` (`birth_date`),
KEY `profiles_latitude_longitude_index` (`latitude`,`longitude`),
KEY `profiles_country_city_gender_index` (`country`,`city`,`gender`),
KEY `idx_country_gender_birthdate` (`country`,`gender`,`birth_date`),
KEY `idx_country_city_gender_birthdate` (`country`,`city`,`gender`,`birth_date`),
CONSTRAINT `profiles_uid_foreign` FOREIGN KEY (`uid`) REFERENCES `users` (`id`)
)
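How can I find a solution? Do I need to redesign the tables to get an ideal system? Maybe that is the last option.
EDIT
I am trying what you suggested earlier. First, I added an index on three columns: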
CREATE INDEX profiles_country_gender_birth_date_index on `profiles`(country,gender,birth_date);
Then I tried to select Count(*) without joining profile_details:
SELECT
count(*)
FROM
`profiles` AS a
FORCE INDEX ( profiles_country_gender_birth_date_index )
JOIN `users` AS b ON b.id = a.uid
and
a.country = 'INDONESIA'
AND a.gender =1
AND a.birth_date BETWEEN NOW()- INTERVAL 35 YEAR
AND NOW()- INTERVAL 25 YEAR
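The result time is not stable: it varies between 0.35 and 0.7 seconds, and I don't know why that is.
Below is the explain query plan in JSON format, just in case it helps find the culprit.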
{
"query_block": {
"select_id": 1,
"cost_info": {
"query_cost": "114747.38"
},
"nested_loop": [
{
"table": {
"table_name": "a",
"access_type": "range",
"possible_keys": [
"profiles_country_gender_birth_date_index"
],
"key": "profiles_country_gender_birth_date_index",
"used_key_parts": [
"country",
"gender",
"birth_date"
],
"key_length": "246",
"rows_examined_per_scan": 94066,
"rows_produced_per_join": 32961,
"filtered": "100.00",
"index_condition": "((`restfulapi`.`a`.`gender` = 1) and (`restfulapi`.`a`.`country` = 'INDONESIA') and (`restfulapi`.`a`.`birth_date` between <cache>((now() - interval 35 year)) and <cache>((now() - interval 25 year))))",
"cost_info": {
"read_cost": "15858.00",
"eval_cost": "6592.23",
"prefix_cost": "75194.00",
"data_read_per_join": "16M"
},
"used_columns": [
"uid",
"birth_date",
"gender",
"country"
]
}
},
{
"table": {
"table_name": "b",
"access_type": "eq_ref",
"possible_keys": [
"PRIMARY"
],
"key": "PRIMARY",
"used_key_parts": [
"id"
],
"key_length": "3",
"ref": [
"restfulapi.a.uid"
],
"rows_examined_per_scan": 1,
"rows_produced_per_join": 32961,
"filtered": "100.00",
"using_index": true,
"cost_info": {
"read_cost": "32961.15",
"eval_cost": "6592.23",
"prefix_cost": "114747.38",
"data_read_per_join": "89M"
},
"used_columns": [
"id"
]
}
}
]
}
}
Add
INDEX(country, gender, birth_date) -- in this order
and change the usage of birth_date to be "sargable", from
AND ( TIMESTAMPDIFF( YEAR, a.birth_date, NOW()) BETWEEN 25 AND 35 )
to
AND a.birth_date BETWEEN NOW() - INTERVAL 35 YEAR
AND NOW() - INTERVAL 25 YEAR
so that the optimizer can use birth_date.
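One detail worth checking when making this change: the two predicates are not exactly equivalent at the upper end. TIMESTAMPDIFF(YEAR, ...) BETWEEN 25 AND 35 keeps everyone who has not yet turned 36, while the BETWEEN on NOW() - INTERVAL 35 YEAR drops people who are already past their 35th birthday. If the original meaning matters, a sketch along these lines keeps the test on the bare column. It assumes the intended range is "ages 25 through 35 inclusive", which the question does not state outright, and it ignores February 29 edge cases.

```sql
-- Ages 25 through 35 inclusive, written against the bare column
-- so that an index ending in birth_date can still be used for the range.
SELECT a.uid
FROM profiles AS a
WHERE a.country = 'INDONESIA'
  AND a.gender = 0
  AND a.birth_date >  CURDATE() - INTERVAL 36 YEAR   -- has not turned 36 yet
  AND a.birth_date <= CURDATE() - INTERVAL 25 YEAR;  -- has already turned 25
```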
LIMIT 33 -- Which 33 rows do you care about? Perhaps you need an ORDER BY?
Don't do JOIN ( SELECT ... profile_details ... ) when simply doing JOIN profile_details ... would work.
SQL_CALC_FOUND_ROWS costs something. Remove it to see how fast the query runs, then decide whether it is worth keeping.
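One common way to act on this is to drop SQL_CALC_FOUND_ROWS and run a separate COUNT(*) only when the pager actually needs the total (the MySQL manual also lists SQL_CALC_FOUND_ROWS as deprecated from 8.0.17). The following is a sketch, not a measured result: the profile_details conditions are left out for brevity, and the ORDER BY column is an assumption, since the question does not say which ordering the pages should use.

```sql
-- Page of results, without SQL_CALC_FOUND_ROWS.
SELECT a.uid, b.NAME
FROM profiles AS a
JOIN users AS b ON b.id = a.uid
WHERE a.country = 'INDONESIA'
  AND a.gender = 0
ORDER BY a.uid          -- assumed ordering; pick whatever the pager really needs
LIMIT 33;

-- Total for the pager, run separately (for example only on the first page).
-- The join to users is omitted here: uid is NOT NULL with a foreign key to
-- users.id, so every profiles row has exactly one matching user.
SELECT COUNT(*)
FROM profiles AS a
WHERE a.country = 'INDONESIA'
  AND a.gender = 0;
```

Whether the two statements together beat the single SQL_CALC_FOUND_ROWS query has to be timed on the real data.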
I don't think you need to JOIN to profile_details multiple times, especially since it is 1:1 with profiles.
Here's what I mean:
Instead of JOIN ( SELECT ... ) have just
JOIN profile_details AS d USING(uid)
then add these to the WHERE clause:
AND d.kids_pref = 1
AND d.current_relationship = 1
AND d.smoking_pref = 1
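Putting the pieces so far together, the whole query might look like the sketch below. The ORDER BY column is an assumption (the question never says which 33 rows it wants), and the profile_details join is written directly after profiles so that USING(uid) is unambiguous whatever columns users has.

```sql
SELECT a.uid, b.NAME
FROM profiles AS a
JOIN profile_details AS d USING (uid)      -- joined once, not three times
JOIN users AS b ON b.id = a.uid
WHERE a.country = 'INDONESIA'
  AND a.gender = 0
  AND a.birth_date BETWEEN NOW() - INTERVAL 35 YEAR
                       AND NOW() - INTERVAL 25 YEAR
  AND d.kids_pref = 1
  AND d.current_relationship = 1
  AND d.smoking_pref = 1
ORDER BY a.birth_date                      -- assumed; matches the range column
LIMIT 33;
```

Running EXPLAIN on this form and comparing it with the plan in the question is the quickest way to confirm that the extra self-joins are gone.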
Avoiding filesort
INDEX(country, gender, -- Tested with '='
birth_date, -- Tested as a "range"
uid) -- For the ORDER BY -- Useless!
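When building an index, include columns in this order:
- All columns tested as "column = constant".
- One range (such as BETWEEN). If this is the same as the ORDER BY, then "filesort" will probably be avoided.
If there is no "range" in the WHERE, then
- All columns tested as "column = constant".
- The ORDER BY columns, assuming they are all DESC or all ASC (or, in MySQL 8.0, match the INDEX definition). This will probably avoid "filesort".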
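As an illustration of the second case (equality tests only, then the ORDER BY column), here is a sketch against the profiles table from the question. The index name is hypothetical, and ordering by uid is only an example of an ORDER BY with no range in the WHERE.

```sql
-- Hypothetical index: equality columns first, then the ORDER BY column.
ALTER TABLE profiles
  ADD INDEX idx_country_gender_uid (country, gender, uid);

-- No range in the WHERE, so the index can serve both the filter and the
-- ordering; MySQL should be able to stop after reading 33 index entries.
EXPLAIN
SELECT a.uid
FROM profiles AS a
WHERE a.country = 'INDONESIA'
  AND a.gender = 0
ORDER BY a.uid
LIMIT 33;
```

If the Extra column of the EXPLAIN output does not mention "Using filesort", the index is doing the sorting.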
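But an index won't handle both a "range" and a different "order by". Consider the following. You have a list of people with last name and first name. The query is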
SELECT ...
WHERE last_name LIKE 'Ja%' -- a "range"
ORDER BY first_name;
INDEX(last_name, first_name) would help with the WHERE, but would leave the first_names jumbled. And vice versa.
(This is a simplification; see http://mysql.rjweb.org/doc.php/index_cookbook_mysql for more details.)
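To see the trade-off concretely, the name example can be tried against a small table. Everything below (the table name, column sizes, and index names) is hypothetical and exists only to make the two plans comparable.

```sql
-- Hypothetical table for the last_name / first_name example.
CREATE TABLE people (
  id         INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  last_name  VARCHAR(60) NOT NULL,
  first_name VARCHAR(60) NOT NULL,
  INDEX idx_last_first (last_name, first_name),
  INDEX idx_first_last (first_name, last_name)
);

-- Index led by last_name: good for the LIKE 'Ja%' range, but the matching
-- rows are not in first_name order, so a sort is typically still needed.
EXPLAIN
SELECT id, last_name, first_name
FROM people FORCE INDEX (idx_last_first)
WHERE last_name LIKE 'Ja%'
ORDER BY first_name;

-- Index led by first_name: rows can be read in first_name order, but the
-- LIKE test can no longer narrow the scan; it only filters rows as they
-- are read.
EXPLAIN
SELECT id, last_name, first_name
FROM people FORCE INDEX (idx_first_last)
WHERE last_name LIKE 'Ja%'
ORDER BY first_name;
```

Comparing the type, rows and Extra columns of the two plans shows which half of the work each index can take on.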