Increase performance on large dataset with pivot table relationships (Using Laravel)
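I'm looking for advice on whether I can improve my database statements, or whether I should start caching query results to improve performance.
The schema is set up as a Many-To-Many Polymorphic relationship. I have a Videos table that holds the video information, a Category table that holds all categories, and a Categorizable table that holds the pivot information.
The ratio of Videos to Categorizable is about 1:4 (i.e., each video has 4 or more categories).
Querying through the pivot table with a limit of 40 rows and WITHOUT an offset takes ~1.2s+.
Adding an offset makes it slower still once the offset is > 50,000 rows.
While 1.2 seconds may seem small, this is only a small portion of the full dataset, which will eventually contain about 30 million video records (and thus roughly 12+ million categorizable records). I am worried that the 1.2s will multiply with every million records.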
Database schema
Videos table:
------------------------------------------------------------------------
id | title | author | views | duration | etc.
------------------------------------------------------------------------
1 | What's the biggest word? | Dictonary | 3432 | 600 | ...
2 | Yearly Videos Roundup 2020 | YouTube | 165 | 945 | ...
3 | Google SEO Help | Google | 1401 | 287 | ...
↓
101234 | How to cook pasta | YouTube | 9401 | 87 | ...
Indexes:
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment | Visible | Expression
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
videos | 0 | PRIMARY | 1 | id | A | 253057 | NULL | NULL | | BTREE | | | YES | NULL
videos | 1 | idx_videos_views | 1 | views | A | 102188 | NULL | NULL |YES | BTREE | | | YES | NULL
Categorizables table:
-------------------------------------------------------------
id | category_id | categorizable_id | categorizable_type
-------------------------------------------------------------
1 | 5 | 1 | 'Video'
2 | 100 | 2 | 'Video'
3 | 31 | 3 | 'Video'
↓
299052 | 65 | 101234 | 'Video'
Indexes:
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment | Visible | Expression
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
categorizables | 0 | PRIMARY | 1 | id | A | 296745 | NULL | NULL | | BTREE | | | YES | NULL
categorizables | 1 | idx_category_id | 1 | category_id | A | 82 | NULL | NULL | | BTREE | | | YES | NULL
categorizables | 1 | idx_categorizable_id | 1 | categorizable_id | A | 104705 | NULL | NULL | | BTREE | | | YES | NULL
Categories table:
--------------------
id | name
--------------------
1 | Education
2 | Health
3 | Entertainment
↓
100 | News
Indexes:
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment | Visible | Expression
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
categories | 0 | PRIMARY | 1 | id | A | 100 | NULL | NULL | | BTREE | | | YES | NULL
MySQL storage engine: InnoDB
Laravel query:
Category::where('id', $cat)
    ->with(['videos' => function ($query) {
        // Top 40 videos by views, matching the generated SQL (ORDER BY views DESC)
        return $query->orderBy('views', 'desc')->take(40);
    }])
    ->get();
This becomes the following MySQL query:
SELECT `videos`.`id`, `views`
FROM `videos` inner join `categorizables`
ON `videos`.`id` = `categorizables`.`categorizable_id`
WHERE `categorizables`.`category_id` = 1
ORDER BY `views` desc
LIMIT 40 offset 0
Performance results
Here is the profiling output from MySQL:
---------------------------------------------------------
Stage | Duration
---------------------------------------------------------
stage/sql/starting | 0.000068
stage/sql/Executing hook on transaction begin. | 0.000000
stage/sql/starting | 0.000003
stage/sql/checking permissions | 0.000001
stage/sql/checking permissions | 0.000001
stage/sql/Opening tables | 0.000038
stage/sql/init | 0.000003
stage/sql/System lock | 0.000005
stage/sql/optimizing | 0.000007
stage/sql/statistics | 0.005628
stage/sql/preparing | 0.000008
stage/sql/Creating tmp table | 0.000033
stage/sql/executing | 1.273442
stage/sql/end | 0.000001
stage/sql/query end | 0.000001
stage/sql/waiting for handler commit | 0.000008
stage/sql/removing tmp table | 0.000003
stage/sql/closing tables | 0.000006
stage/sql/freeing items | 0.000080
stage/sql/cleaning up | 0.000000
Specifically:
stage/sql/executing | 1.273442
Query cost:
----------------------------------
Variable_name | Value
----------------------------------
Last_query_cost | 107258.575124
Edit:
EXPLAIN output for the query
With sorting:
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
1 | SIMPLE | categorizables | NULL | ref | idx_category_id,idx_categorizable_id | idx_category_id | 4 | const | 51210 | 100.00 | Using temporary; Using filesort
1 | SIMPLE | videos | NULL | eq_ref | PRIMARY | PRIMARY | 4 | dev_db.categorizables.categorizable_id | 1 | 100.00 | Using index
Without sorting:
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
1 | SIMPLE | videos | NULL | index | NULL | PRIMARY | 4 | NULL | 40 | 100.00 | Backward index scan; Using index
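You can use Laravel Debugbar to see how many duplicate queries you are running.
Next, you can eager load the relationships when fetching the data, for example:
Index controller
$videos = Video::with('categories')->get();
return $videos;
You can also use Laravel's cache, for example: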
$videos = \Cache::rememberForever('key', function() {
return Video::with('categories')->get();
});
return $videos;
Let me walk you through the worst case:
SELECT v.`id`, v.`views`
FROM `videos` AS v
inner join `categorizables` AS c ON v.`id` = c.`categorizable_id`
WHERE c.`category_id` = 1
ORDER BY v.`views` desc
LIMIT 40 offset 50000
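The processing goes like this:
- Find all rows in categorizables with category_id = 1. This may or may not use an index; INDEX(category_id, categorizable_id) might help.
- For each of those rows, reach into videos to get views and id. Assuming id is the PRIMARY KEY, I have no suggestion to add here.
- Collect all of that into a temporary table. (Presumably more than 50K rows?)
- Sort the table.
- Read through the sorted table, skipping 50000 rows.
- Deliver 40 rows and quit.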
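I hope it is obvious that removing the sort, or removing the offset, or (etc.) leads to a simpler execution plan, and hence a faster query.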
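One possible way to drop the offset is to remember where the previous page ended and seek past that row instead of skipping 50000 rows. This technique is an assumption on my part and is not something stated in this thread. The sketch below assumes that `id` is acceptable as a tie-breaker for equal `views`; the literals 1401 and 3 are hypothetical stand-ins for the `views` and `id` of the last row on the previous page.

```sql
-- Keyset ("seek") pagination sketch: continue after the last row already
-- shown instead of using OFFSET.
-- Hypothetical values: 1401 = views of the previous page's last row,
--                      3    = id of that same row.
SELECT v.`id`, v.`views`
FROM `videos` AS v
INNER JOIN `categorizables` AS c ON v.`id` = c.`categorizable_id`
WHERE c.`category_id` = 1
  -- Strictly "after" the last seen row in (views DESC, id DESC) order.
  -- Note: rows whose views is NULL never satisfy this predicate.
  AND (v.`views` < 1401 OR (v.`views` = 1401 AND v.`id` < 3))
ORDER BY v.`views` DESC, v.`id` DESC
LIMIT 40;
```

This avoids reading and discarding the skipped rows, but the sort itself may well remain, so whether it actually helps here would need to be checked with EXPLAIN. It also only supports "next page" style navigation, not jumping to an arbitrary page number.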
You say it is a many-to-many relationship? Is that categorizables? Does it follow the performance tips here: http://mysql.rjweb.org/doc.php/index_cookbook_mysql#many_to_many_mapping_table ?
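As a rough sketch of the composite index mentioned in the walkthrough, INDEX(category_id, categorizable_id), the statements below add it and then re-check the plan. The index name `idx_category_categorizable` is hypothetical, and this has not been measured against the data in the question.

```sql
-- Composite index: the lookup by category_id can then read categorizable_id
-- straight from the index instead of visiting the table rows.
-- The index name is hypothetical.
ALTER TABLE `categorizables`
  ADD INDEX `idx_category_categorizable` (`category_id`, `categorizable_id`);

-- Re-run EXPLAIN on the original query to see whether the new index is
-- chosen and whether "Using temporary; Using filesort" is still reported.
EXPLAIN
SELECT `videos`.`id`, `views`
FROM `videos` INNER JOIN `categorizables`
  ON `videos`.`id` = `categorizables`.`categorizable_id`
WHERE `categorizables`.`category_id` = 1
ORDER BY `views` DESC
LIMIT 40 OFFSET 0;
```

If the composite index is picked up, the single-column `idx_category_id` becomes a redundant prefix of it and could be considered for removal. Because the ORDER BY column lives in `videos`, this index alone is unlikely to eliminate the sort.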