多个相同 table 左连接非常慢
Multiple same table Left Joins very slow
我有两个 table,假设一个用户 table 和一个日期 table。它们看起来像这样:
用户
ID_User | Title | Firstname | Surname | JobNumber
1 | Mr | Bob | Smith | JOB001
2 | Mrs | Bobbi | Smythe | JOB001
...
13000
日期
ID_Date | ID_User | DateType | DateAssigned | JobNumber
1 | 1 | Intent | 21-Jun-2016 | JOB001
2 | 1 | Reg | 21-Apr-2017 | JOB001
3 | 1 | Flight | 21-May-2017 | JOB001
4 | 2 | Intent | 09-Dec-2016 | JOB001
5 | 2 | Flight | 01-Jan-2017 | JOB001
...
5000
唯一索引是ID_User+DateType+JobNumber。
可以有任意数量的 DateType。
当我执行如下查询时,需要很长时间。
select
ID_User,
Title,
Firstname,
Surname,
JobNumber,
DI.DateAssigned as Date_Intent,
DR.DateAssigned as Date_Reg,
DF.DateAssigned as Date_Flight
from
User as U
left join Dates as DI on U.ID_User = DI.ID_User
and DI.JobNumber = "JOB001"
and DI.DateType = "Intent"
left join Dates as DR on U.ID_User = DR.ID_User
and DR.JobNumber = "JOB001"
and DR.DateType = "Reg"
left join Dates as DF on U.ID_User = DF.ID_User
and DF.JobNumber = "JOB001"
and DF.DateType = "Flight"
where
U.JobNumber = "JOB001"
order by
U.Surname,
U.Firstname;
每个 JobNumber 只能容纳 300 人,最多可以有 5 种不同的日期类型。
为什么要花这么长时间?我们说了 2 分钟。
还有别的写法吗?
日期 Table:
CREATE TABLE `ATL_V2_Assigned_Dates` (
`ID_Date` bigint(7) unsigned NOT NULL AUTO_INCREMENT,
`JobNumber` varchar(10) NOT NULL DEFAULT '',
`ID_User` bigint(7) unsigned NOT NULL DEFAULT '0',
`DateAssigned` datetime NOT NULL,
`DateType` varchar(100) NOT NULL,
`Comment` text NOT NULL,
`Updated` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
`Inserted` datetime NOT NULL,
PRIMARY KEY (`ID_Date`),
UNIQUE KEY `ID_Date` (`ID_Date`) USING BTREE,
UNIQUE KEY `unq_idx` (`JobNumber`,`ID_User`,`DateType`) USING BTREE,
KEY `JobNumber` (`JobNumber`) USING BTREE,
KEY `ID_User` (`ID_User`) USING BTREE,
KEY `DateType` (`DateType`) USING BTREE
) ENGINE=MyISAM AUTO_INCREMENT=3975 DEFAULT CHARSET=utf8;
更新 2017 年 1 月 12 日
很奇怪,查询现在在 0.06 秒后是 运行,这是来自的输出:
explain select
U.ID_User,
U.Title,
U.Firstname,
U.Surname,
U.JobNumber,
DI.DateAssigned as Date_Intent,
DR.DateAssigned as Date_Reg,
DF.DateAssigned as Date_Flight
from
ATL_Users as U
left join ATL_V2_Assigned_Dates as DI on U.ID_User = DI.ID_User
and DI.JobNumber = "ACI001"
and DI.DateType = "Deadline - Intention"
left join ATL_V2_Assigned_Dates as DR on U.ID_User = DR.ID_User
and DR.JobNumber = "ACI001"
and DR.DateType = "Event - Registration"
left join ATL_V2_Assigned_Dates as DF on U.ID_User = DF.ID_User
and DF.JobNumber = "ACI001"
and DF.DateType = "Deadline - Flight"
where
U.JobNumber = "ACI001"
order by
U.Surname,
U.Firstname;
+----+-------------+-------+--------+------------------------------------+-----------+---------+------------------------------------+------+----------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+--------+------------------------------------+-----------+---------+------------------------------------+------+----------------------------------------------------+
| 1 | SIMPLE | U | ref | JobNumber | JobNumber | 32 | const | 506 | Using index condition; Using where; Using filesort |
| 1 | SIMPLE | DI | eq_ref | unq_idx,JobNumber,ID_User,DateType | unq_idx | 342 | const,cclliveo_atl.U.ID_User,const | 1 | Using where |
| 1 | SIMPLE | DR | eq_ref | unq_idx,JobNumber,ID_User,DateType | unq_idx | 342 | const,cclliveo_atl.U.ID_User,const | 1 | Using where |
| 1 | SIMPLE | DF | eq_ref | unq_idx,JobNumber,ID_User,DateType | unq_idx | 342 | const,cclliveo_atl.U.ID_User,const | 1 | Using where |
+----+-------------+-------+--------+------------------------------------+-----------+---------+------------------------------------+------+----------------------------------------------------+
我不知道 I/we 做了什么,谁能告诉我您认为谁提供了答案,我会打勾。谢谢大家。
您可能缺少合适的索引。尝试:
create index idx_user (jobnumber, id_user);
create index idx_dates (jobnumber, datetype, id_user, dateassigned);
这是加入相同 table 的最佳方式,不确定花费的时间。即使查询 30,000 条记录,也不会花费 2 分钟。这一定是由于其他一些问题,例如与数据库的多个连接。
您可以尝试条件聚合来避免所有这些连接
鉴于
drop table if exists Userjobs;
create table userjobs (ID_User int, Title varchar(10), Firstname varchar(10), Surname varchar(10), JobNumber varchar(10));
insert into userjobs values
(1 , 'Mr' , 'Bob' , 'Smith' , 'JOB001'),
(2 , 'Mrs' , 'Bobbi', 'Smythe' , 'JOB001');
drop table if exists jobDates;
create table jobdates(ID_Date int, ID_User int, DateType varchar(10), DateAssigned date, JobNumber varchar(10));
insert into jobdates values
(1 , 1 , 'Intent' , '2016-06-21' , 'JOB001'),
(2 , 1 , 'Reg' , '2017-04-21' , 'JOB001'),
(3 , 1 , 'Flight' , '2017-05-21' , 'JOB001'),
(4 , 2 , 'Intent' , '2016-12-09' , 'JOB001'),
(5 , 2 , 'Flight' , '2017-01-01' , 'JOB001');
MariaDB [sandbox]> select
-> u.ID_User,
-> Title,
-> Firstname,
-> Surname,
-> u.JobNumber,
-> max(case when datetype = 'intent' then dateassigned else null end) as intent,
-> max(case when datetype = 'reg' then dateassigned else null end) reg,
-> max(case when datetype = 'flight' then dateassigned else null end) as flight
-> from
-> Userjobs as U
-> left join jobDates as jd on U.ID_User = jd.ID_User
-> and jd.JobNumber = u.jobnumber
-> where u.jobnumber = 'JOB001'
-> group by u.ID_User,
-> Title,
-> Firstname,
-> Surname,
-> u.JobNumber;
+---------+-------+-----------+---------+-----------+------------+------------+------------+
| ID_User | Title | Firstname | Surname | JobNumber | intent | reg | flight |
+---------+-------+-----------+---------+-----------+------------+------------+------------+
| 1 | Mr | Bob | Smith | JOB001 | 2016-06-21 | 2017-04-21 | 2017-05-21 |
| 2 | Mrs | Bobbi | Smythe | JOB001 | 2016-12-09 | NULL | 2017-01-01 |
+---------+-------+-----------+---------+-----------+------------+------------+------------+
2 rows in set (0.00 sec)
U
需要 INDEX(JobNumber, Surname, Firstname)
。那应该涵盖 WHERE
和 ORDER BY
,从而避免 'filesort'.
对于 Dates
,你有 UNIQUE(ID_User, DateType, JobNumber)
,对吗?让我们从 table 中去掉 id
,然后用
替换 UNIQUE
PRIMARY KEY(JobNumber, ID_User, DateType)
这将使查找更有效,因为 BTree 的底部将包含 DateAssigned
和 所需的三行将相邻,因为 "clustering"的PK.
除非您有一些其他查询(读取或修改)涉及 Dates
,否则 table 上应该没有其他索引。
这些 table 有多大?你意识到你将完全阅读它们。然而,我的建议将导致每一行只阅读一次,而不是多次。
我有两个 table,假设一个用户 table 和一个日期 table。它们看起来像这样:
用户
ID_User | Title | Firstname | Surname | JobNumber
1 | Mr | Bob | Smith | JOB001
2 | Mrs | Bobbi | Smythe | JOB001
...
13000
日期
ID_Date | ID_User | DateType | DateAssigned | JobNumber
1 | 1 | Intent | 21-Jun-2016 | JOB001
2 | 1 | Reg | 21-Apr-2017 | JOB001
3 | 1 | Flight | 21-May-2017 | JOB001
4 | 2 | Intent | 09-Dec-2016 | JOB001
5 | 2 | Flight | 01-Jan-2017 | JOB001
...
5000
唯一索引是ID_User+DateType+JobNumber。
可以有任意数量的 DateType。
当我执行如下查询时,需要很长时间。
select
ID_User,
Title,
Firstname,
Surname,
JobNumber,
DI.DateAssigned as Date_Intent,
DR.DateAssigned as Date_Reg,
DF.DateAssigned as Date_Flight
from
User as U
left join Dates as DI on U.ID_User = DI.ID_User
and DI.JobNumber = "JOB001"
and DI.DateType = "Intent"
left join Dates as DR on U.ID_User = DR.ID_User
and DR.JobNumber = "JOB001"
and DR.DateType = "Reg"
left join Dates as DF on U.ID_User = DF.ID_User
and DF.JobNumber = "JOB001"
and DF.DateType = "Flight"
where
U.JobNumber = "JOB001"
order by
U.Surname,
U.Firstname;
每个 JobNumber 只能容纳 300 人,最多可以有 5 种不同的日期类型。
为什么要花这么长时间?我们说了 2 分钟。
还有别的写法吗?
日期 Table:
CREATE TABLE `ATL_V2_Assigned_Dates` (
`ID_Date` bigint(7) unsigned NOT NULL AUTO_INCREMENT,
`JobNumber` varchar(10) NOT NULL DEFAULT '',
`ID_User` bigint(7) unsigned NOT NULL DEFAULT '0',
`DateAssigned` datetime NOT NULL,
`DateType` varchar(100) NOT NULL,
`Comment` text NOT NULL,
`Updated` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
`Inserted` datetime NOT NULL,
PRIMARY KEY (`ID_Date`),
UNIQUE KEY `ID_Date` (`ID_Date`) USING BTREE,
UNIQUE KEY `unq_idx` (`JobNumber`,`ID_User`,`DateType`) USING BTREE,
KEY `JobNumber` (`JobNumber`) USING BTREE,
KEY `ID_User` (`ID_User`) USING BTREE,
KEY `DateType` (`DateType`) USING BTREE
) ENGINE=MyISAM AUTO_INCREMENT=3975 DEFAULT CHARSET=utf8;
更新 2017 年 1 月 12 日
很奇怪,查询现在在 0.06 秒后是 运行,这是来自的输出:
explain select
U.ID_User,
U.Title,
U.Firstname,
U.Surname,
U.JobNumber,
DI.DateAssigned as Date_Intent,
DR.DateAssigned as Date_Reg,
DF.DateAssigned as Date_Flight
from
ATL_Users as U
left join ATL_V2_Assigned_Dates as DI on U.ID_User = DI.ID_User
and DI.JobNumber = "ACI001"
and DI.DateType = "Deadline - Intention"
left join ATL_V2_Assigned_Dates as DR on U.ID_User = DR.ID_User
and DR.JobNumber = "ACI001"
and DR.DateType = "Event - Registration"
left join ATL_V2_Assigned_Dates as DF on U.ID_User = DF.ID_User
and DF.JobNumber = "ACI001"
and DF.DateType = "Deadline - Flight"
where
U.JobNumber = "ACI001"
order by
U.Surname,
U.Firstname;
+----+-------------+-------+--------+------------------------------------+-----------+---------+------------------------------------+------+----------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+--------+------------------------------------+-----------+---------+------------------------------------+------+----------------------------------------------------+
| 1 | SIMPLE | U | ref | JobNumber | JobNumber | 32 | const | 506 | Using index condition; Using where; Using filesort |
| 1 | SIMPLE | DI | eq_ref | unq_idx,JobNumber,ID_User,DateType | unq_idx | 342 | const,cclliveo_atl.U.ID_User,const | 1 | Using where |
| 1 | SIMPLE | DR | eq_ref | unq_idx,JobNumber,ID_User,DateType | unq_idx | 342 | const,cclliveo_atl.U.ID_User,const | 1 | Using where |
| 1 | SIMPLE | DF | eq_ref | unq_idx,JobNumber,ID_User,DateType | unq_idx | 342 | const,cclliveo_atl.U.ID_User,const | 1 | Using where |
+----+-------------+-------+--------+------------------------------------+-----------+---------+------------------------------------+------+----------------------------------------------------+
我不知道 I/we 做了什么,谁能告诉我您认为谁提供了答案,我会打勾。谢谢大家。
您可能缺少合适的索引。尝试:
create index idx_user (jobnumber, id_user);
create index idx_dates (jobnumber, datetype, id_user, dateassigned);
这是加入相同 table 的最佳方式,不确定花费的时间。即使查询 30,000 条记录,也不会花费 2 分钟。这一定是由于其他一些问题,例如与数据库的多个连接。
您可以尝试条件聚合来避免所有这些连接 鉴于
drop table if exists Userjobs;
create table userjobs (ID_User int, Title varchar(10), Firstname varchar(10), Surname varchar(10), JobNumber varchar(10));
insert into userjobs values
(1 , 'Mr' , 'Bob' , 'Smith' , 'JOB001'),
(2 , 'Mrs' , 'Bobbi', 'Smythe' , 'JOB001');
drop table if exists jobDates;
create table jobdates(ID_Date int, ID_User int, DateType varchar(10), DateAssigned date, JobNumber varchar(10));
insert into jobdates values
(1 , 1 , 'Intent' , '2016-06-21' , 'JOB001'),
(2 , 1 , 'Reg' , '2017-04-21' , 'JOB001'),
(3 , 1 , 'Flight' , '2017-05-21' , 'JOB001'),
(4 , 2 , 'Intent' , '2016-12-09' , 'JOB001'),
(5 , 2 , 'Flight' , '2017-01-01' , 'JOB001');
MariaDB [sandbox]> select
-> u.ID_User,
-> Title,
-> Firstname,
-> Surname,
-> u.JobNumber,
-> max(case when datetype = 'intent' then dateassigned else null end) as intent,
-> max(case when datetype = 'reg' then dateassigned else null end) reg,
-> max(case when datetype = 'flight' then dateassigned else null end) as flight
-> from
-> Userjobs as U
-> left join jobDates as jd on U.ID_User = jd.ID_User
-> and jd.JobNumber = u.jobnumber
-> where u.jobnumber = 'JOB001'
-> group by u.ID_User,
-> Title,
-> Firstname,
-> Surname,
-> u.JobNumber;
+---------+-------+-----------+---------+-----------+------------+------------+------------+
| ID_User | Title | Firstname | Surname | JobNumber | intent | reg | flight |
+---------+-------+-----------+---------+-----------+------------+------------+------------+
| 1 | Mr | Bob | Smith | JOB001 | 2016-06-21 | 2017-04-21 | 2017-05-21 |
| 2 | Mrs | Bobbi | Smythe | JOB001 | 2016-12-09 | NULL | 2017-01-01 |
+---------+-------+-----------+---------+-----------+------------+------------+------------+
2 rows in set (0.00 sec)
U
需要 INDEX(JobNumber, Surname, Firstname)
。那应该涵盖 WHERE
和 ORDER BY
,从而避免 'filesort'.
对于 Dates
,你有 UNIQUE(ID_User, DateType, JobNumber)
,对吗?让我们从 table 中去掉 id
,然后用
UNIQUE
PRIMARY KEY(JobNumber, ID_User, DateType)
这将使查找更有效,因为 BTree 的底部将包含 DateAssigned
和 所需的三行将相邻,因为 "clustering"的PK.
除非您有一些其他查询(读取或修改)涉及 Dates
,否则 table 上应该没有其他索引。
这些 table 有多大?你意识到你将完全阅读它们。然而,我的建议将导致每一行只阅读一次,而不是多次。