How to improve ORDER BY performance with joins in MySQL
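I am working on a social network tracking application. The joins work fine with proper indexing, but when I add an ORDER BY clause the whole query takes more than 100 times longer to execute. Below is the query I used to get twitter_users without the ORDER BY clause.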
SELECT DISTINCT `tracked_twitter`.id
FROM tracked_twitter
INNER JOIN `twitter_content` ON `tracked_twitter`.`id` = `twitter_content`.`tracked_twitter_id`
INNER JOIN `tracker_twitter_content` ON `twitter_content`.`id` = `tracker_twitter_content`.`twitter_content_id`
AND `tracker_twitter_content`.`tracker_id` = '88'
LIMIT 20
Showing rows 0 - 19 (20 total, Query took 0.0714 sec)
But when I add an ORDER BY clause (on an indexed column):
SELECT DISTINCT `tracked_twitter`.id
FROM tracked_twitter
INNER JOIN `twitter_content` ON `tracked_twitter`.`id` = `twitter_content`.`tracked_twitter_id`
INNER JOIN `tracker_twitter_content` ON `twitter_content`.`id` = `tracker_twitter_content`.`twitter_content_id`
AND `tracker_twitter_content`.`tracker_id` = '88'
ORDER BY tracked_twitter.followers_count DESC
LIMIT 20
Showing rows 0 - 19 (20 total, Query took 13.4636 sec)
Explain: [output not preserved]
When I run the ORDER BY on the table alone, it does not take much time:
SELECT * FROM `tracked_twitter` WHERE 1 order by `followers_count` desc limit 20
Showing rows 0 - 19 (20 total, Query took 0.0711 sec) [followers_count: 68236387 - 10525612]
The table creation query is as follows:
CREATE TABLE IF NOT EXISTS `tracked_twitter` (
`id` varchar(255) COLLATE utf8_unicode_ci NOT NULL,
`handle` varchar(255) COLLATE utf8_unicode_ci NOT NULL,
`name` varchar(255) COLLATE utf8_unicode_ci NOT NULL,
`location` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
`description` text COLLATE utf8_unicode_ci,
`profile_image` varchar(255) COLLATE utf8_unicode_ci NOT NULL,
`followers_count` int(11) NOT NULL,
`is_influencer` tinyint(1) NOT NULL DEFAULT '0',
`created_at` timestamp NOT NULL DEFAULT '0000-00-00 00:00:00',
`updated_at` timestamp NOT NULL DEFAULT '0000-00-00 00:00:00',
`gender` enum('Male','Female','Other') COLLATE utf8_unicode_ci
DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `followers_count` (`followers_count`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
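So the join by itself is not slowing the query down, and the ORDER BY works well when I run it on its own table. How can I improve the performance of the combined query?
UPDATE 1
@GordonLinoff's method solves the problem if I only need the result set from the parent table. What if I want to know the number of tweets per person (the count of twitter_content rows that match the tracked_twitter table)? How can I modify it? And if I want to apply math functions to the tweet content, how should I do that?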
SELECT `tracked_twitter` . * , COUNT( * ) AS twitterContentCount, retweet_count + favourite_count + reply_count AS engagement
FROM `tracked_twitter`
INNER JOIN `twitter_content` ON `tracked_twitter`.`id` = `twitter_content`.`tracked_twitter_id`
INNER JOIN `tracker_twitter_content` ON `twitter_content`.`id` = `tracker_twitter_content`.`twitter_content_id`
WHERE `is_influencer` != '1'
AND `tracker_twitter_content`.`tracker_id` = '88'
AND `tracked_twitter_id` != '0'
GROUP BY `tracked_twitter`.`id`
ORDER BY twitterContentCount DESC
LIMIT 20
OFFSET 0
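Try getting rid of the DISTINCT. That is a performance killer. I'm not sure why your first query runs quickly; perhaps MySQL is smart enough to optimize it away.
I would try: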
SELECT tt.id
FROM tracked_twitter tt
WHERE EXISTS (SELECT 1
FROM twitter_content tc INNER JOIN
tracker_twitter_content ttc
ON tc.id = ttc.twitter_content_id
WHERE ttc.tracker_id = 88 AND
tt.id = tc.tracked_twitter_id
)
ORDER BY tt.followers_count DESC ;
For this version, you want indexes on tracked_twitter(followers_count, id), twitter_content(tracked_twitter_id, id), and tracker_twitter_content(twitter_content_id, tracker_id).
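The indexes named in this answer could be created roughly as follows. This is only a sketch: the `idx_*` names are made up for illustration, and the column types of twitter_content and tracker_twitter_content are not shown in the question, so check that the composite keys fit within your storage engine's key-length limits. Note also that an InnoDB secondary index already carries the primary key columns, so the existing `followers_count` key may behave much like the first index here.

```sql
-- Hypothetical index names; the column lists follow the answer above.
CREATE INDEX idx_tt_followers_id
    ON tracked_twitter (followers_count, id);

CREATE INDEX idx_tc_user_content
    ON twitter_content (tracked_twitter_id, id);

CREATE INDEX idx_ttc_content_tracker
    ON tracker_twitter_content (twitter_content_id, tracker_id);

-- Check which indexes the optimizer actually picks.
-- LIMIT 20 is added here to match the question; the answer's query omits it.
EXPLAIN
SELECT tt.id
FROM tracked_twitter AS tt
WHERE EXISTS (
    SELECT 1
    FROM twitter_content AS tc
    INNER JOIN tracker_twitter_content AS ttc
        ON tc.id = ttc.twitter_content_id
    WHERE ttc.tracker_id = 88
      AND tc.tracked_twitter_id = tt.id
)
ORDER BY tt.followers_count DESC
LIMIT 20;
```

If the plan still shows "Using temporary; Using filesort" for tracked_twitter, the ordering is not being read from the index and the extra indexes alone are probably not enough.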
Keep the parent table inside parentheses, as a derived table:
SELECT DISTINCT `tracked_twitter`.id FROM
(SELECT id,followers_count FROM tracked_twitter ORDER BY followers_count DESC
LIMIT 20) AS tracked_twitter
INNER JOIN `twitter_content` ON `tracked_twitter`.`id` = `twitter_content`.`tracked_twitter_id`
INNER JOIN `tracker_twitter_content` ON `twitter_content`.`id` = `tracker_twitter_content`.`twitter_content_id`
AND `tracker_twitter_content`.`tracker_id` = '88'
ORDER BY tracked_twitter.followers_count DESC
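The main problem is that, even though you have a relatively small number of rows, you are using varchar(255) COLLATE utf8_unicode_ci as the primary key (instead of an integer), and therefore as the foreign key in the other tables. I suspect twitter_content.id has the same problem. This results in a lot of long string comparisons and reserves a lot of extra memory for temporary tables.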
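To make the integer-key suggestion concrete, here is a rough sketch. It assumes every stored id is a numeric string that fits in 64 bits, which the question does not confirm; the `tracked_twitter_v2` name and the shortened column list are hypothetical.

```sql
-- Step 1: verify the assumption. Any non-zero count means some ids are
-- not purely numeric and cannot be converted as-is.
SELECT COUNT(*) AS non_numeric_ids
FROM tracked_twitter
WHERE id NOT REGEXP '^[0-9]+$';

-- Step 2: hypothetical variant of the table with an integer primary key.
CREATE TABLE IF NOT EXISTS `tracked_twitter_v2` (
  `id` BIGINT UNSIGNED NOT NULL,
  `handle` varchar(255) COLLATE utf8_unicode_ci NOT NULL,
  `followers_count` int(11) NOT NULL,
  -- remaining columns as in the original table definition
  PRIMARY KEY (`id`),
  KEY `followers_count` (`followers_count`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
```

The referencing columns (twitter_content.tracked_twitter_id and so on) would need the same type for the joins to compare integers rather than strings.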
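As for the query itself, yes, it should be a query that walks the followers_count index and checks the conditions on the related tables. This can be done either as Gordon Linoff suggested or by using index hints.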
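A sketch of the index-hint route, assuming the index is still named `followers_count` as in the table definition from the question. Whether the hint actually helps depends on the MySQL version and the data distribution, so compare the EXPLAIN output with and without it.

```sql
-- FORCE INDEX asks the optimizer to read tracked_twitter through the
-- followers_count index, so rows arrive already ordered and the scan can
-- stop once 20 matching users have been found.
SELECT tt.id
FROM tracked_twitter AS tt FORCE INDEX (followers_count)
WHERE EXISTS (
    SELECT 1
    FROM twitter_content AS tc
    INNER JOIN tracker_twitter_content AS ttc
        ON tc.id = ttc.twitter_content_id
    WHERE ttc.tracker_id = 88
      AND tc.tracked_twitter_id = tt.id
)
ORDER BY tt.followers_count DESC
LIMIT 20;
```

This approach pays off when tracker 88 matches many of the most-followed users; if matches are rare, the scan may have to walk a large part of the index before it collects 20 rows.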