Multiple left joins to the same table are very slow

I have two tables, say a User table and a Dates table. They look like this:

User

ID_User | Title | Firstname | Surname | JobNumber
1       | Mr    | Bob       | Smith   | JOB001
2       | Mrs   | Bobbi     | Smythe  | JOB001
...
13000

Dates

ID_Date | ID_User | DateType | DateAssigned | JobNumber
1       | 1       | Intent   | 21-Jun-2016  | JOB001
2       | 1       | Reg      | 21-Apr-2017  | JOB001
3       | 1       | Flight   | 21-May-2017  | JOB001
4       | 2       | Intent   | 09-Dec-2016  | JOB001
5       | 2       | Flight   | 01-Jan-2017  | JOB001
...
5000

The unique index is ID_User + DateType + JobNumber.

There can be any number of DateTypes.

When I run a query like the following, it takes a very long time.

select
  U.ID_User,
  U.Title,
  U.Firstname,
  U.Surname,
  U.JobNumber,
  DI.DateAssigned as Date_Intent,
  DR.DateAssigned as Date_Reg,
  DF.DateAssigned as Date_Flight
from
  User as U
  left join Dates as DI on U.ID_User = DI.ID_User
    and DI.JobNumber = "JOB001"
    and DI.DateType = "Intent"
  left join Dates as DR on U.ID_User = DR.ID_User
    and DR.JobNumber = "JOB001"
    and DR.DateType = "Reg"
  left join Dates as DF on U.ID_User = DF.ID_User
    and DF.JobNumber = "JOB001"
    and DF.DateType = "Flight"
where
  U.JobNumber = "JOB001"
order by
  U.Surname,
  U.Firstname;

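Each JobNumber holds at most 300 people, with a maximum of 5 different date types.

Why does it take so long? We are talking about 2 minutes.

Is there another way to write this?

Dates table: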

CREATE TABLE `ATL_V2_Assigned_Dates` (
  `ID_Date` bigint(7) unsigned NOT NULL AUTO_INCREMENT,
  `JobNumber` varchar(10) NOT NULL DEFAULT '',
  `ID_User` bigint(7) unsigned NOT NULL DEFAULT '0',
  `DateAssigned` datetime NOT NULL,
  `DateType` varchar(100) NOT NULL,
  `Comment` text NOT NULL,
  `Updated` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  `Inserted` datetime NOT NULL,
  PRIMARY KEY (`ID_Date`),
  UNIQUE KEY `ID_Date` (`ID_Date`) USING BTREE,
  UNIQUE KEY `unq_idx` (`JobNumber`,`ID_User`,`DateType`) USING BTREE,
  KEY `JobNumber` (`JobNumber`) USING BTREE,
  KEY `ID_User` (`ID_User`) USING BTREE,
  KEY `DateType` (`DateType`) USING BTREE
) ENGINE=MyISAM AUTO_INCREMENT=3975 DEFAULT CHARSET=utf8;

Update, 12 January 2017

Strangely, the query now runs in 0.06 seconds. This is the output from:

explain select
  U.ID_User,
  U.Title,
  U.Firstname,
  U.Surname,
  U.JobNumber,
  DI.DateAssigned as Date_Intent,
  DR.DateAssigned as Date_Reg,
  DF.DateAssigned as Date_Flight
from
  ATL_Users as U
  left join ATL_V2_Assigned_Dates as DI on U.ID_User = DI.ID_User
    and DI.JobNumber = "ACI001"
    and DI.DateType = "Deadline - Intention"
  left join ATL_V2_Assigned_Dates as DR on U.ID_User = DR.ID_User
    and DR.JobNumber = "ACI001"
    and DR.DateType = "Event - Registration"
  left join ATL_V2_Assigned_Dates as DF on U.ID_User = DF.ID_User
    and DF.JobNumber = "ACI001"
    and DF.DateType = "Deadline - Flight"
where
  U.JobNumber = "ACI001"
order by
  U.Surname,
  U.Firstname;

+----+-------------+-------+--------+------------------------------------+-----------+---------+------------------------------------+------+----------------------------------------------------+
| id | select_type | table | type   | possible_keys                      | key       | key_len | ref                                | rows | Extra                                              |
+----+-------------+-------+--------+------------------------------------+-----------+---------+------------------------------------+------+----------------------------------------------------+
|  1 | SIMPLE      | U     | ref    | JobNumber                          | JobNumber | 32      | const                              |  506 | Using index condition; Using where; Using filesort |
|  1 | SIMPLE      | DI    | eq_ref | unq_idx,JobNumber,ID_User,DateType | unq_idx   | 342     | const,cclliveo_atl.U.ID_User,const |    1 | Using where                                        |
|  1 | SIMPLE      | DR    | eq_ref | unq_idx,JobNumber,ID_User,DateType | unq_idx   | 342     | const,cclliveo_atl.U.ID_User,const |    1 | Using where                                        |
|  1 | SIMPLE      | DF    | eq_ref | unq_idx,JobNumber,ID_User,DateType | unq_idx   | 342     | const,cclliveo_atl.U.ID_User,const |    1 | Using where                                        |
+----+-------------+-------+--------+------------------------------------+-----------+---------+------------------------------------+------+----------------------------------------------------+

I don't know what I/we did. Can anyone tell me who you think provided the answer, and I will tick it? Thanks, everyone.

You are probably missing suitable indexes. Try:

create index idx_user on User (jobnumber, id_user);
create index idx_dates on Dates (jobnumber, datetype, id_user, dateassigned);

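This is the best way to join the same table repeatedly; I am not sure why it takes so long. Even with 30,000 records the query should not take 2 minutes. It must be due to some other problem, such as multiple connections to the database.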
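One way to check the "other connections" theory is to look at what the server is doing while the slow query runs. The Dates table in the question uses MyISAM, which takes table-level locks, so a concurrent write could make a SELECT wait. A minimal sketch using standard MySQL/MariaDB statements; how to read the output for this particular server is an assumption:

```sql
-- Run from a second session while the slow query is executing.
-- A State that mentions a table lock (for example
-- "Waiting for table level lock") suggests the SELECT is blocked by
-- another connection rather than being slow in itself.
SHOW FULL PROCESSLIST;

-- Cumulative counters since server start. A Table_locks_waited value
-- that keeps growing points to table-lock contention.
SHOW GLOBAL STATUS LIKE 'Table_locks%';
```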

You could try conditional aggregation to avoid all of those joins. Given:

drop table if exists Userjobs;
create table userjobs (ID_User int, Title varchar(10), Firstname varchar(10), Surname varchar(10), JobNumber varchar(10));
insert into userjobs values
(1       , 'Mr'   ,  'Bob'  ,      'Smith'  ,  'JOB001'),
(2       , 'Mrs'  ,  'Bobbi',      'Smythe' ,  'JOB001');


drop table if exists jobDates;
create table jobdates(ID_Date int, ID_User int, DateType varchar(10), DateAssigned date, JobNumber varchar(10));
insert into jobdates values
(1       , 1       , 'Intent'   , '2016-06-21'  , 'JOB001'),
(2       , 1       , 'Reg'      , '2017-04-21'  , 'JOB001'),
(3       , 1       , 'Flight'   , '2017-05-21'  , 'JOB001'),
(4       , 2       , 'Intent'   , '2016-12-09'  , 'JOB001'),
(5       , 2       , 'Flight'   , '2017-01-01'  , 'JOB001');

MariaDB [sandbox]> select
    ->   u.ID_User,
    ->   Title,
    ->   Firstname,
    ->   Surname,
    ->   u.JobNumber,
    ->   max(case when datetype = 'intent' then dateassigned else null end) as intent,
    ->   max(case when datetype = 'reg' then dateassigned else null end) reg,
    ->   max(case when datetype = 'flight' then dateassigned else null end) as flight
    -> from
    ->   Userjobs as U
    -> left join jobDates as jd on U.ID_User = jd.ID_User
    ->     and jd.JobNumber = u.jobnumber
    -> where u.jobnumber = 'JOB001'
    -> group by   u.ID_User,
    ->   Title,
    ->   Firstname,
    ->   Surname,
    ->   u.JobNumber;
+---------+-------+-----------+---------+-----------+------------+------------+------------+
| ID_User | Title | Firstname | Surname | JobNumber | intent     | reg        | flight     |
+---------+-------+-----------+---------+-----------+------------+------------+------------+
|       1 | Mr    | Bob       | Smith   | JOB001    | 2016-06-21 | 2017-04-21 | 2017-05-21 |
|       2 | Mrs   | Bobbi     | Smythe  | JOB001    | 2016-12-09 | NULL       | 2017-01-01 |
+---------+-------+-----------+---------+-----------+------------+------------+------------+
2 rows in set (0.00 sec)
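Applied to the tables from the question's update, the same conditional aggregation might look like the sketch below. It assumes ID_User is unique in ATL_Users and that each (JobNumber, ID_User, DateType) combination occurs at most once, as the unq_idx key implies, so MAX simply picks the single matching date.

```sql
-- One join to the dates table instead of one join per date type.
SELECT
  U.ID_User,
  U.Title,
  U.Firstname,
  U.Surname,
  U.JobNumber,
  -- CASE without ELSE yields NULL, which MAX ignores.
  MAX(CASE WHEN D.DateType = 'Deadline - Intention' THEN D.DateAssigned END) AS Date_Intent,
  MAX(CASE WHEN D.DateType = 'Event - Registration' THEN D.DateAssigned END) AS Date_Reg,
  MAX(CASE WHEN D.DateType = 'Deadline - Flight'    THEN D.DateAssigned END) AS Date_Flight
FROM ATL_Users AS U
LEFT JOIN ATL_V2_Assigned_Dates AS D
  ON  D.ID_User   = U.ID_User
  AND D.JobNumber = U.JobNumber
WHERE U.JobNumber = 'ACI001'
GROUP BY
  U.ID_User, U.Title, U.Firstname, U.Surname, U.JobNumber
ORDER BY
  U.Surname, U.Firstname;
```

Whether this beats the three eq_ref joins depends on the data; comparing the EXPLAIN output and timings of both forms is the safest way to decide.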

U needs INDEX(JobNumber, Surname, Firstname). That should handle both the WHERE and the ORDER BY, thereby avoiding the 'filesort'.
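As a sketch, that index could be added as follows. The name idx_job_name is hypothetical, and the definition of ATL_Users is not shown in the question, so the column types are assumed to be short enough to index in full.

```sql
-- idx_job_name is a hypothetical index name.
-- JobNumber first serves the WHERE; Surname, Firstname follow in the
-- same order as the ORDER BY so the rows can be read already sorted.
ALTER TABLE ATL_Users
  ADD INDEX idx_job_name (JobNumber, Surname, Firstname);
```

After adding it, re-running the EXPLAIN from the question should show whether "Using filesort" has disappeared from the row for U.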

For Dates, you have UNIQUE(ID_User, DateType, JobNumber), right? Let's get rid of the id column from the table and replace the UNIQUE with

PRIMARY KEY(JobNumber, ID_User, DateType)

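This will make the lookups more efficient, because the bottom of the BTree will contain DateAssigned, and the three rows needed will be adjacent thanks to the "clustering" of the PK.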

Unless you have other queries (reads or writes) that involve Dates, there should be no other indexes on the table.
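A rough sketch of that change against the posted table definition follows. Note that a clustered primary key is an InnoDB feature, while the posted table uses MyISAM, so the sketch also converts the engine. It assumes nothing else refers to ID_Date and that no other query relies on the single-column indexes; try it on a copy of the table first.

```sql
-- Sketch only: run against a copy of the table before touching live data.
-- Assumes no other table or application code refers to ID_Date.
ALTER TABLE ATL_V2_Assigned_Dates
  ENGINE = InnoDB,        -- rows are clustered on the primary key in InnoDB
  DROP PRIMARY KEY,
  DROP INDEX ID_Date,     -- duplicate of the old primary key
  DROP COLUMN ID_Date,
  DROP INDEX unq_idx,     -- replaced by the new primary key below
  DROP INDEX JobNumber,   -- redundant: leading column of the new primary key
  DROP INDEX ID_User,     -- keep if other queries search by user alone
  DROP INDEX DateType,    -- keep if other queries search by type alone
  ADD PRIMARY KEY (JobNumber, ID_User, DateType);
```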

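How big are these tables? You realize that you will be reading them in full. However, my suggestion will cause each row to be read only once, rather than multiple times.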