Increase performance on large datasets with pivot table relationships (using Laravel)

I'm looking for advice on whether I can improve my database statements, or whether I should start caching query results to improve performance.

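The schema is set up as a many-to-many polymorphic relationship. I have a videos table that holds the video information, a categories table that holds all categories, and a categorizables table that holds the pivot information.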

The ratio of videos to categorizables is about 1:4 (i.e. each video has 4 or more categories).

Querying through the pivot table with a limit of 40 rows and WITHOUT an offset takes ~1.2s+. Adding an offset makes it slower still once the offset exceeds 50,000 rows.

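While 1.2 seconds may look small, this is only a fraction of the full dataset, which will eventually contain about 30 million video records (and therefore ~12+ million categorizable records). I'm worried that the 1.2s will multiply with every additional million records.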

Database schema

videos table:

------------------------------------------------------------------------
id     | title                      | author    | views | duration | etc.
------------------------------------------------------------------------
1      | What's the biggest word?   | Dictonary | 3432  | 600      | ...
2      | Yearly Videos Roundup 2020 | YouTube   | 165   | 945      | ...
3      | Google SEO Help            | Google    | 1401  | 287      | ...
↓      
101234 | How to cook pasta          | YouTube   | 9401  | 87       | ...

Indexes:

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Table  | Non_unique | Key_name         | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment | Visible | Expression 
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
videos | 0          | PRIMARY          | 1            | id          | A         | 253057      | NULL     | NULL   |      | BTREE      |         |               | YES     | NULL
videos | 1          | idx_videos_views | 1            | views       | A         | 102188      | NULL     | NULL   |YES   | BTREE      |         |               | YES     | NULL

categorizables table:

-------------------------------------------------------------
id      | category_id | categorizable_id | categorizable_type
-------------------------------------------------------------
1       | 5           |  1               | 'Video'
2       | 100         |  2               | 'Video'
3       | 31          |  3               | 'Video'
↓
299052  | 65          |  101234          | 'Video'

Indexes:

-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Table          | Non_unique | Key_name             | Seq_in_index | Column_name      | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment | Visible | Expression
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
categorizables | 0          | PRIMARY              | 1            | id               | A         | 296745      | NULL     | NULL   |      | BTREE      |         |               | YES     | NULL
categorizables | 1          | idx_category_id      | 1            | category_id      | A         | 82          | NULL     | NULL   |      | BTREE      |         |               | YES     | NULL
categorizables | 1          | idx_categorizable_id | 1            | categorizable_id | A         | 104705      | NULL     | NULL   |      | BTREE      |         |               | YES     | NULL

categories table:

--------------------
id  | name 
--------------------
1   | Education
2   | Health
3   | Entertainment
↓
100 | News

Indexes:

-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Table       | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment | Visible | Expression
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
categories  |  0         | PRIMARY  |  1           |  id         |  A        |  100        | NULL     | NULL   |      |  BTREE     |         |               |  YES    | NULL

MySQL

Engine: InnoDB

Laravel query:

Category::where('id', $cat)
    ->with(['videos' => function ($query) {
        // Eager-load the 40 most-viewed videos of this category
        $query->orderBy('views', 'desc')->take(40);
    }])
    ->get();

This becomes the following MySQL query:

SELECT `videos`.`id`, `views` 
FROM `videos` inner join `categorizables` 
ON `videos`.`id` = `categorizables`.`categorizable_id`
WHERE `categorizables`.`category_id` = 1 
ORDER BY `views` desc 
LIMIT 40 offset 0

Performance results

Below is the performance output from MySQL:

---------------------------------------------------------
Stage                                          | Duration
---------------------------------------------------------
stage/sql/starting                             | 0.000068
stage/sql/Executing hook on transaction begin. | 0.000000
stage/sql/starting                             | 0.000003
stage/sql/checking permissions                 | 0.000001
stage/sql/checking permissions                 | 0.000001
stage/sql/Opening tables                       | 0.000038
stage/sql/init                                 | 0.000003
stage/sql/System lock                          | 0.000005
stage/sql/optimizing                           | 0.000007
stage/sql/statistics                           | 0.005628
stage/sql/preparing                            | 0.000008
stage/sql/Creating tmp table                   | 0.000033
stage/sql/executing                            | 1.273442
stage/sql/end                                  | 0.000001
stage/sql/query end                            | 0.000001
stage/sql/waiting for handler commit           | 0.000008
stage/sql/removing tmp table                   | 0.000003
stage/sql/closing tables                       | 0.000006
stage/sql/freeing items                        | 0.000080
stage/sql/cleaning up                          | 0.000000

Specifically:

stage/sql/executing | 1.273442

Query cost:

----------------------------------
Variable_name     | Value
----------------------------------
Last_query_cost   | 107258.575124

Edit:

EXPLAIN output

With ORDER BY:

-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
id | select_type | table          | partitions | type   | possible_keys                        | key             | key_len | ref                                     | rows  | filtered | Extra
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
1  | SIMPLE      | categorizables | NULL       | ref    | idx_category_id,idx_categorizable_id | idx_category_id | 4       | const                                   | 51210 | 100.00   | Using temporary; Using filesort
1  | SIMPLE      | videos         | NULL       | eq_ref | PRIMARY                              | PRIMARY         | 4       | dev_db.categorizables.categorizable_id  | 1     | 100.00   | Using index

Without ORDER BY:

-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
id | select_type | table          | partitions | type   | possible_keys                        | key             | key_len | ref   | rows  | filtered | Extra
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
1  | SIMPLE      | videos         | NULL       | index  | NULL                                 | PRIMARY         | 4       | NULL  | 40    | 100.00   | Backward index scan; Using index

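You can use Laravel Debugbar to see how many duplicate queries you are executing.

Next, you can eager-load the relationship when fetching the data, for example:

In the index controller: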

$videos = Video::with('categories')->get();
return $videos;

You can also use Laravel's cache, for example:

$videos = \Cache::rememberForever('key', function() {
  return Video::with('categories')->get();
});

return $videos;

Let me walk through the worst case:

SELECT  v.`id`, v.`views`
    FROM  `videos` AS v
    inner join  `categorizables` AS c  ON v.`id` = c.`categorizable_id`
    WHERE  c.`category_id` = 1
    ORDER BY  v.`views` desc
    LIMIT  40 offset 50000 

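The process goes like this:

  1. Find all rows in categorizables with category_id = 1. This may or may not use an index; INDEX(category_id, categorizable_id) might help.
  2. For each of those rows, reach into videos to get views and id. Assuming id is the PRIMARY KEY, I have no further recommendation here.
  3. Collect all of that into a temporary table. (Presumably more than 50K rows?)
  4. Sort that table.
  5. Read through the sorted table, skipping 50000 rows.
  6. Deliver 40 rows and quit.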
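
The composite index mentioned in the first step could be added roughly as follows. This is only a sketch: the index name idx_category_categorizable is made up for illustration, and whether the optimizer actually picks the index depends on your data, so re-run EXPLAIN afterwards to check.

```sql
-- Hypothetical index name; the columns come from the categorizables table in the question.
-- With both columns in one index, the lookup by category_id can also return
-- categorizable_id without touching the table rows.
ALTER TABLE categorizables
    ADD INDEX idx_category_categorizable (category_id, categorizable_id);

-- Re-check the execution plan once the index exists.
EXPLAIN
SELECT v.id, v.views
FROM videos AS v
INNER JOIN categorizables AS c ON v.id = c.categorizable_id
WHERE c.category_id = 1
ORDER BY v.views DESC
LIMIT 40;
```

Note that this only helps with the first step. The join into videos and the sort on views still have to happen, because views lives in a different table from the filter column.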

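I hope it is obvious that removing the sort, or removing the offset, or (etc.) will lead to a simpler execution plan and therefore a faster query.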
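
As one possible way to drop the offset, the page could be fetched by "remembering where you left off" instead of skipping rows. The sketch below assumes the application keeps the views and id values of the last row of the previous page; the variables @last_views and @last_id and their values are placeholders, not something from the question.

```sql
-- Placeholder values: views and id of the last row on the previous page.
SET @last_views = 1401;
SET @last_id = 3;

SELECT v.id, v.views
FROM videos AS v
INNER JOIN categorizables AS c ON v.id = c.categorizable_id
WHERE c.category_id = 1
  -- Only rows that sort strictly after the last row already shown.
  AND (v.views < @last_views
       OR (v.views = @last_views AND v.id < @last_id))
-- id is added as a tie-breaker so rows with equal views keep a stable order.
ORDER BY v.views DESC, v.id DESC
LIMIT 40;
```

This avoids reading and discarding 50000 sorted rows, but it does not remove the sort itself, and it only supports "next page" style navigation rather than jumping to an arbitrary page number.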

You say it is a many-to-many relationship? Is that categorizables? Does it follow the performance tips here: http://mysql.rjweb.org/doc.php/index_cookbook_mysql#many_to_many_mapping_table ?