Optimize SQL query with many inner joins on same table
I'm running into a performance problem:
A shop has an article filter with the categories "color", "size", "gender" and "feature". All of these details are stored in the table article_criterias, which has about 36,000 rows and the following layout:
article_id | group | option | option_val
100 | "size" | "35" | 35.00
100 | "size" | "36" | 36.00
100 | "size" | "36½" | 36.50
100 | "color" | "40" | 40.00
100 | "color" | "50" | 50.00
100 | "gender" | "1" | 1.00
101 | "size" | "40" | 40.00
...
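We have a dynamically built SQL query based on the currently selected criteria. The query works fine for 2-3 criteria but becomes very slow once more than 5 options are selected (each additional INNER JOIN roughly doubles the execution time).
How can we make this SQL faster, or even replace the inner joins with a better-performing concept?
This is the query (the logic is correct, only the performance is bad):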
-- This SQL is generated when the user selected the following criteria
-- gender: 1
-- color: 80 + 30
-- size 36 + 37 + 38 + 39 + 42 + 46
SELECT
criteria.group AS `key`,
criteria.option AS `value`
FROM articles
INNER JOIN article_criterias AS criteria ON articles.id = criteria.article_id
INNER JOIN article_criterias AS criteria_gender
ON criteria_gender.article_id = articles.id AND criteria_gender.group = "gender"
INNER JOIN article_criterias AS criteria_color1
ON criteria_color1.article_id = articles.id AND criteria_color1.group = "color"
INNER JOIN article_criterias AS criteria_size2
ON criteria_size2.article_id = articles.id AND criteria_size2.group = "size"
INNER JOIN article_criterias AS criteria_size3
ON criteria_size3.article_id = articles.id AND criteria_size3.group = "size"
INNER JOIN article_criterias AS criteria_size4
ON criteria_size4.article_id = articles.id AND criteria_size4.group = "size"
INNER JOIN article_criterias AS criteria_size5
ON criteria_size5.article_id = articles.id AND criteria_size5.group = "size"
INNER JOIN article_criterias AS criteria_size6
ON criteria_size6.article_id = articles.id AND criteria_size6.group = "size"
INNER JOIN article_criterias AS criteria_size7
ON criteria_size7.article_id = articles.id AND criteria_size7.group = "size"
WHERE
(criteria_gender.option IN ("1"))
AND (criteria_color1.option IN ("80", "30"))
AND (criteria_size2.option_val BETWEEN 35.500000 AND 36.500000)
AND (criteria_size3.option_val BETWEEN 36.500000 AND 37.500000)
AND (criteria_size4.option_val BETWEEN 37.500000 AND 38.500000)
AND (criteria_size5.option_val BETWEEN 38.500000 AND 39.500000)
AND (criteria_size6.option_val BETWEEN 41.500000 AND 42.500000)
AND (criteria_size7.option_val BETWEEN 45.500000 AND 46.500000)
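The shape of this query is fixed by the generator: one more self-join of article_criterias per selected option. A minimal sketch of such a generator makes that growth visible. It is written in Python against an in-memory SQLite database instead of MySQL (an assumption made so it runs anywhere); the names make_db and build_join_query and the (group, kind, value) criterion format are hypothetical. Each value condition is moved into its join's ON clause, which is equivalent for inner joins, and the values are bound as parameters.

```python
import sqlite3

# Sample rows from the article_criterias layout in the question.
ROWS = [
    (100, "size", "35", 35.00),
    (100, "size", "36", 36.00),
    (100, "size", "36½", 36.50),
    (100, "color", "40", 40.00),
    (100, "color", "50", 50.00),
    (100, "gender", "1", 1.00),
    (101, "size", "40", 40.00),
]


def make_db():
    """Build a tiny in-memory copy of the articles / article_criterias tables."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE articles (id INTEGER PRIMARY KEY)")
    conn.execute(
        'CREATE TABLE article_criterias '
        '(article_id INTEGER, "group" TEXT, "option" TEXT, option_val REAL)'
    )
    conn.executemany("INSERT INTO articles VALUES (?)", [(100,), (101,)])
    conn.executemany("INSERT INTO article_criterias VALUES (?, ?, ?, ?)", ROWS)
    return conn


def build_join_query(criteria):
    """Emit one self-join of article_criterias per criterion (hypothetical helper).

    criteria: list of (group, kind, value); kind "in" takes a list of option
    strings, kind "between" takes a (low, high) pair for option_val.
    """
    joins, params = [], []
    for i, (group, kind, value) in enumerate(criteria):
        alias = f"c{i}"
        if kind == "in":
            placeholders = ", ".join("?" for _ in value)
            cond = f'{alias}."option" IN ({placeholders})'
        else:  # "between"
            cond = f"{alias}.option_val BETWEEN ? AND ?"
        joins.append(
            f"JOIN article_criterias AS {alias} "
            f'ON {alias}.article_id = a.id AND {alias}."group" = ? AND {cond}'
        )
        params += [group, *value]
    sql = "SELECT DISTINCT a.id FROM articles AS a\n" + "\n".join(joins) + "\nORDER BY a.id"
    return sql, params


conn = make_db()
selected = [("gender", "in", ["1"]), ("color", "in", ["40"]), ("size", "between", (35.5, 36.5))]
sql, params = build_join_query(selected)
print(sql.count("JOIN"))  # 3: one join per criterion
print([row[0] for row in conn.execute(sql, params)])  # [100]
```

Without a suitable index, every extra JOIN is another lookup into article_criterias for each candidate row, which is consistent with the slowdown described above.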
As suggested by @affan-pathan, adding indexes did solve the problem:
CREATE INDEX text_option
ON `article_criterias` (`article_id`, `group`, `option`);
CREATE INDEX numeric_option
ON `article_criterias` (`article_id`, `group`, `option_val`);
These two indexes cut the execution time of the query above from almost 1 minute to under 50 ms!!
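Whether a probe actually uses these indexes can be checked in the query plan. As a hedged illustration, the sketch below uses SQLite's EXPLAIN QUERY PLAN from Python rather than MySQL's EXPLAIN (plans and output format differ between the engines). It plans the two kinds of lookup each join performs: an equality probe on (article_id, group, option), which should be served by text_option, and a range probe on option_val, which should be served by numeric_option. The helper name plan_details is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE article_criterias '
    '(article_id INTEGER, "group" TEXT, "option" TEXT, option_val REAL)'
)
# The two composite indexes from the question, with identifiers quoted for SQLite.
conn.execute('CREATE INDEX text_option ON article_criterias (article_id, "group", "option")')
conn.execute('CREATE INDEX numeric_option ON article_criterias (article_id, "group", option_val)')


def plan_details(sql, params):
    """Return the human-readable 'detail' column of EXPLAIN QUERY PLAN."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params)]


# The lookup one text join (e.g. criteria_gender) performs per article.
text_probe = plan_details(
    'SELECT 1 FROM article_criterias WHERE article_id = ? AND "group" = ? AND "option" = ?',
    (100, "gender", "1"),
)
# The lookup one size join (e.g. criteria_size2) performs per article.
numeric_probe = plan_details(
    'SELECT 1 FROM article_criterias '
    'WHERE article_id = ? AND "group" = ? AND option_val BETWEEN ? AND ?',
    (100, "size", 35.5, 36.5),
)
print(text_probe)     # expected to mention text_option
print(numeric_probe)  # expected to mention numeric_option
```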
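I know the indexes you created solved your problem, but just to play with a pseudo-alternative (avoiding the multiple INNER JOINs), could you try something like this? I only tested it with three criteria. Your criteria go into the inner query. To select only the records that satisfy all criteria, you have to change the last WHERE condition (WHERE max = 3) to the number of criteria you wrote above; so if you use 5 criteria, write WHERE max = 5. (For ease of use, I renamed the columns group and option to GROU and OPT.)
This is just an idea, so please do some testing, check the performance and let me know...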
CREATE TABLE CRITERIA (ARTICLE_ID INT, GROU VARCHAR(10), OPT VARCHAR(20), OPTION_VAL NUMERIC(12,2));
CREATE TABLE ARTICLES (ID INT);
INSERT INTO CRITERIA VALUES (100,'size','35',35);
INSERT INTO CRITERIA VALUES (100,'size','36',36);
INSERT INTO CRITERIA VALUES (100,'color','40',40);
INSERT INTO CRITERIA VALUES (100,'gender','1',1);
INSERT INTO CRITERIA VALUES (200,'size','36.2',36.2);
INSERT INTO CRITERIA VALUES (300,'size','36.2',36.2);
INSERT INTO ARTICLES VALUES (100);
INSERT INTO ARTICLES VALUES (200);
INSERT INTO ARTICLES VALUES (300);
-------------------------------------------------------
SELECT D.article_id, D.GROU, D.OPT
FROM (SELECT C.*
, @o:=CASE WHEN @h=ARTICLE_ID THEN @o ELSE cumul END max
, @h:=ARTICLE_ID AS a_id
FROM (SELECT article_id,
B.GROU, B.OPT,
@r:= CASE WHEN @g = B.ARTICLE_ID THEN @r+1 ELSE 1 END cumul,
@g:= B.ARTICLE_ID g
FROM CRITERIA B
CROSS JOIN (SELECT @g:=0, @r:=0) T1
WHERE (B.GROU='gender' AND B.OPT IN ('1'))
OR (B.GROU='color' AND B.OPT IN ('40', '30'))
OR (B.GROU='size' AND B.OPTION_VAL BETWEEN 35.500000 AND 36.500000)
ORDER BY article_id
) C
CROSS JOIN (SELECT @o:=0, @h:=0) T2
ORDER BY ARTICLE_ID, CUMUL DESC) D
WHERE max=3
;
Output:
article_id GROU OPT
100 gender 1
100 color 40
100 size 36
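The user-variable bookkeeping above (@r, @g, @o, @h) computes, for each article, how many of its rows matched one of the criteria, and then keeps only articles whose count equals the number of criteria. MySQL documents the evaluation order of expressions that assign user variables as undefined, and MySQL 8.0 deprecates such assignments, so a window function is a more robust way to express the same count. The sketch below swaps in COUNT(*) OVER (PARTITION BY ...) and runs it on SQLite (3.25 or newer, for window functions) from Python, using the sample data from this answer; MySQL 8.0 also supports this window syntax. N_CRITERIA is a hypothetical name for the number in WHERE max = 3.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE criteria (article_id INTEGER, grou TEXT, opt TEXT, option_val NUMERIC)")
conn.executemany(
    "INSERT INTO criteria VALUES (?, ?, ?, ?)",
    [
        (100, "size", "35", 35), (100, "size", "36", 36),
        (100, "color", "40", 40), (100, "gender", "1", 1),
        (200, "size", "36.2", 36.2), (300, "size", "36.2", 36.2),
    ],
)

N_CRITERIA = 3  # plays the role of "WHERE max = 3"

SQL = """
SELECT article_id, grou, opt
FROM (
    SELECT article_id, grou, opt,
           -- number of matching rows per article, replacing the @r/@o counters
           COUNT(*) OVER (PARTITION BY article_id) AS matched
    FROM criteria
    WHERE (grou = 'gender' AND opt IN ('1'))
       OR (grou = 'color' AND opt IN ('40', '30'))
       OR (grou = 'size' AND option_val BETWEEN 35.5 AND 36.5)
)
WHERE matched = ?
ORDER BY article_id, grou
"""

rows = conn.execute(SQL, (N_CRITERIA,)).fetchall()
print(rows)  # [(100, 'color', '40'), (100, 'gender', '1'), (100, 'size', '36')]
```

As in the original, this counts matching rows, not distinct criteria: an article with both color 40 and color 30 would be counted twice for the color criterion, so comparing against the number of criteria is only reliable when each criterion matches at most one row per article.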
Key/value tables are a pain. However, to find the aggregated data that matches certain criteria:
select
a.*,
ac.group AS "key",
ac.option AS "value"
from articles a
join article_criterias ac on ac.article_id = a.id
where a.id in
(
select article_id
from article_criterias
group by article_id
having sum(`group` = 'gender' and `option` = '1') > 0
and sum(`group` = 'color' and `option` in ('30','80')) > 0
and sum(`group` = 'size' and option_val between 35.5 and 36.5) > 0
and sum(`group` = 'size' and option_val between 36.5 and 37.5) > 0
and sum(`group` = 'size' and option_val between 37.5 and 38.5) > 0
and sum(`group` = 'size' and option_val between 38.5 and 39.5) > 0
and sum(`group` = 'size' and option_val between 41.5 and 42.5) > 0
and sum(`group` = 'size' and option_val between 45.5 and 46.5) > 0
)
order by a.id, ac.group, ac.option;
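This gives you all articles that apply to gender 1, color 30 and/or 80 and all of the listed size ranges, together with all of their options. (The size ranges are a bit odd, though; a size of 36.5, for example, satisfies two ranges.) You get the idea: group by article_id and use HAVING so that you only get the article_ids that match the criteria.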
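Because the question's query is generated dynamically, the HAVING form is just as easy to generate: one SUM(...) > 0 test per selected criterion, all evaluated in a single GROUP BY pass over article_criterias no matter how many criteria are selected. The sketch below is a Python/SQLite illustration rather than MySQL (in MySQL the identifiers would be quoted with backticks), using the sample rows from the question; build_having_query and the (group, kind, value) criterion format are hypothetical, and the values are bound as parameters instead of being pasted into the SQL.

```python
import sqlite3

# Sample rows from the article_criterias layout in the question.
ROWS = [
    (100, "size", "35", 35.00),
    (100, "size", "36", 36.00),
    (100, "size", "36½", 36.50),
    (100, "color", "40", 40.00),
    (100, "color", "50", 50.00),
    (100, "gender", "1", 1.00),
    (101, "size", "40", 40.00),
]


def build_having_query(criteria):
    """Return (sql, params) selecting the article_ids that satisfy every criterion.

    criteria: list of (group, kind, value); kind "in" takes a list of option
    strings, kind "between" takes a (low, high) pair for option_val.
    """
    tests, params = [], []
    for group, kind, value in criteria:
        if kind == "in":
            placeholders = ", ".join("?" for _ in value)
            tests.append(f'SUM("group" = ? AND "option" IN ({placeholders})) > 0')
        else:  # "between"
            tests.append('SUM("group" = ? AND option_val BETWEEN ? AND ?) > 0')
        params += [group, *value]
    having = " AND ".join(tests) or "1"  # no criteria selected: keep every article
    sql = (
        "SELECT article_id FROM article_criterias "
        f"GROUP BY article_id HAVING {having} ORDER BY article_id"
    )
    return sql, params


conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE article_criterias '
    '(article_id INTEGER, "group" TEXT, "option" TEXT, option_val REAL)'
)
conn.executemany("INSERT INTO article_criterias VALUES (?, ?, ?, ?)", ROWS)

selected = [("gender", "in", ["1"]), ("color", "in", ["40"]), ("size", "between", (35.5, 36.5))]
sql, params = build_having_query(selected)
print([row[0] for row in conn.execute(sql, params)])  # [100]
```

The outer join back to articles, as in the query above, then fetches the key/value rows for the returned ids.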
As for the index you need:
create index idx on article_criterias(article_id, `group`, `option`, option_val);