PSQL: Optimize a LEFT JOIN
I have two tables, import_data and api_request_record, defined as follows:
CREATE TABLE public.import_data (
id uuid NOT NULL,
company_id integer NOT NULL,
business_name character varying,
no_of_rating double precision,
no_of_reviews double precision,
phone character varying,
email text,
state character varying,
city character varying,
zip character varying,
address character varying,
category character varying,
upserted_by integer,
upsert_date timestamp without time zone,
vendor character varying,
purchased boolean
)
PARTITION BY LIST (company_id);
CREATE TABLE import_data_1 PARTITION OF import_data FOR VALUES IN ('1');
import_data holds data about small and medium-sized businesses. Almost all of its fields are indexed, and the table is partitioned.
CREATE TABLE public.api_request_record (
id integer NOT NULL,
company_id integer NOT NULL,
create_date timestamp without time zone,
data_record_id uuid NOT NULL,
phone character varying,
valid_number boolean,
validity_check_date timestamp without time zone,
phone_location character varying,
carrier character varying,
line_type character varying,
dnc_compliant boolean,
dnc_check_date timestamp without time zone,
lc_compliant boolean,
lc_check_date timestamp without time zone
)
PARTITION BY LIST (company_id);
ALTER TABLE public.api_request_record
ADD CONSTRAINT api_request_record_data_record_id_company_id_fkey FOREIGN KEY (data_record_id, company_id) REFERENCES public.import_data(id, company_id) ON DELETE CASCADE;
CREATE TABLE api_request_record_1 PARTITION OF api_request_record FOR VALUES IN ('1');
api_request_record holds data about the API requests made for each company's import_data records. Most of its fields are indexed, and it is partitioned as well.
Now I'm trying to select some information about the records. My SQL query is:
SELECT
import_data.id,
import_data.phone,
import_data.business_name,
import_data.address,
api_request_record.lc_compliant,
api_request_record.dnc_compliant
FROM
import_data
LEFT JOIN (
SELECT
*
FROM
api_request_record
WHERE
company_id = 1
AND (api_request_record.dnc_compliant = TRUE
OR api_request_record.lc_compliant = TRUE)
AND (api_request_record.dnc_compliant = TRUE)) AS api_request_record ON api_request_record.data_record_id = import_data.id
WHERE
import_data.company_id = 1
AND (api_request_record.dnc_compliant = TRUE
OR api_request_record.lc_compliant = TRUE)
AND (api_request_record.dnc_compliant = TRUE)
ORDER BY
import_data.phone DESC
LIMIT 100 OFFSET 600;
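The problem is that fetching 100 records with this query takes about 30 seconds when the two tables hold around 1.5 million records, and the time grows as the offset increases.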
I'm working on:
- PostgreSQL: v13.2
- OS: Ubuntu 18.04
- RAM: 8 GB
- CPU: 2.2 GHz quad-core Intel i3
Here is the EXPLAIN (analyze, buffers, format text) output:
https://explain.depesz.com/s/AQOZ
Is there a way to optimize this query so that it completes in less time? Thanks in advance.
Try something like this:
SELECT
import_data.id,
import_data.phone,
import_data.business_name,
import_data.address,
api_request_record.lc_compliant,
api_request_record.dnc_compliant
FROM
import_data
LEFT JOIN api_request_record on api_request_record.data_record_id = import_data.id AND (api_request_record.dnc_compliant = TRUE OR api_request_record.lc_compliant = TRUE)
WHERE
import_data.company_id = 1
ORDER BY
import_data.phone DESC
LIMIT 100 OFFSET 600;
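Moving the filter into the ON clause does more than tidy the query: it changes which rows come back. The minimal sketch below uses Python's built-in sqlite3 module as a stand-in for PostgreSQL, with hypothetical toy rows and trimmed-down tables (only the columns needed). With the condition in ON, every import_data row for the company is kept and unmatched rows get NULLs; with the same condition in WHERE, as in the original query, those NULL-extended rows are filtered out.

```python
import sqlite3

# Hypothetical toy data: three import_data rows; only 'a' has an
# api_request_record row that passes the compliance filter.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE import_data (
        id TEXT NOT NULL, company_id INTEGER NOT NULL, phone TEXT
    );
    CREATE TABLE api_request_record (
        company_id INTEGER NOT NULL, data_record_id TEXT NOT NULL,
        dnc_compliant BOOLEAN, lc_compliant BOOLEAN
    );
    INSERT INTO import_data VALUES
        ('a', 1, '555-0001'), ('b', 1, '555-0002'), ('c', 1, '555-0003');
    INSERT INTO api_request_record VALUES (1, 'a', 1, 0), (1, 'b', 0, 0);
""")

# Filter in ON: every import_data row is kept; rows without a matching,
# compliant api_request_record get NULLs on the right-hand side.
on_rows = conn.execute("""
    SELECT d.id, r.dnc_compliant
    FROM import_data d
    LEFT JOIN api_request_record r
      ON r.data_record_id = d.id
     AND (r.dnc_compliant = 1 OR r.lc_compliant = 1)
    WHERE d.company_id = 1
    ORDER BY d.id
""").fetchall()

# Filter in WHERE: NULL-extended rows fail the test and are dropped,
# so the LEFT JOIN returns only what an inner join would.
where_rows = conn.execute("""
    SELECT d.id, r.dnc_compliant
    FROM import_data d
    LEFT JOIN api_request_record r ON r.data_record_id = d.id
    WHERE d.company_id = 1
      AND (r.dnc_compliant = 1 OR r.lc_compliant = 1)
    ORDER BY d.id
""").fetchall()

print(on_rows)     # [('a', 1), ('b', None), ('c', None)]
print(where_rows)  # [('a', 1)]
```

So this rewrite can return more rows than the original query; whether that is desired depends on whether records without a compliant API request should appear in the result.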
It looks like you want an inner join, and your query can be radically simplified:
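- Because of the WHERE clause, the outer join is effectively an inner join: the conditions on api_request_record columns reject the NULL rows that the LEFT JOIN adds.
- The boolean conditions are much more complicated than necessary ("(A OR B) AND A" is really just "A").
- The subquery is not needed.
- The JOIN condition can be expanded to include company_id.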
So:
SELECT id.id, id.phone, id.business_name, id.address,
arr.lc_compliant, arr.dnc_compliant
FROM import_data id JOIN
api_request_record arr
ON arr.company_id = id.company_id AND
arr.data_record_id = id.id
WHERE id.company_id = 1 AND
arr.dnc_compliant
ORDER BY id.phone DESC
LIMIT 100 OFFSET 600;
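The simplification can be sanity-checked on toy data. The sketch below again uses SQLite through Python's sqlite3 module as a stand-in for PostgreSQL (hypothetical rows, trimmed columns, booleans stored as 1/0/NULL, LIMIT/OFFSET dropped so every row is compared, and the subquery aliased as r for brevity). It checks that the shape of the original query and the simplified inner join return the same rows, including rows where dnc_compliant or lc_compliant is NULL.

```python
import sqlite3

# Hypothetical toy data; booleans are stored as 1/0/NULL, and some
# dnc_compliant / lc_compliant values are NULL on purpose.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE import_data (
        id TEXT NOT NULL, company_id INTEGER NOT NULL, phone TEXT
    );
    CREATE TABLE api_request_record (
        company_id INTEGER NOT NULL, data_record_id TEXT NOT NULL,
        dnc_compliant BOOLEAN, lc_compliant BOOLEAN
    );
    INSERT INTO import_data VALUES
        ('a', 1, '555-0001'), ('b', 1, '555-0002'), ('c', 1, '555-0003'),
        ('d', 1, '555-0004'), ('e', 2, '555-0005');
    INSERT INTO api_request_record VALUES
        (1, 'a', 1, 0), (1, 'b', 0, 1), (1, 'c', NULL, 1),
        (1, 'd', 1, NULL), (2, 'e', 1, 1);
""")

# Shape of the original query: LEFT JOIN to a filtered subquery, with the
# same (redundant) filter repeated in the outer WHERE.
original = """
    SELECT import_data.id, import_data.phone, r.lc_compliant, r.dnc_compliant
    FROM import_data
    LEFT JOIN (
        SELECT * FROM api_request_record
        WHERE company_id = 1
          AND (dnc_compliant = 1 OR lc_compliant = 1)
          AND (dnc_compliant = 1)
    ) AS r ON r.data_record_id = import_data.id
    WHERE import_data.company_id = 1
      AND (r.dnc_compliant = 1 OR r.lc_compliant = 1)
      AND (r.dnc_compliant = 1)
    ORDER BY import_data.phone DESC
"""

# Simplified inner join, with company_id added to the join condition.
simplified = """
    SELECT id.id, id.phone, arr.lc_compliant, arr.dnc_compliant
    FROM import_data id
    JOIN api_request_record arr
      ON arr.company_id = id.company_id AND arr.data_record_id = id.id
    WHERE id.company_id = 1 AND arr.dnc_compliant
    ORDER BY id.phone DESC
"""

original_rows = conn.execute(original).fetchall()
simplified_rows = conn.execute(simplified).fetchall()
print(original_rows == simplified_rows)  # True
print(simplified_rows)  # [('d', '555-0004', None, 1), ('a', '555-0001', 0, 1)]
```

Agreement on a handful of rows is not a proof, but the data does exercise each point in the list: rows with no match, a redundant OR branch that is true while dnc_compliant is false, NULL booleans, and a row for another company.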
Then, for this query, I would recommend the following indexes:
CREATE INDEX ON import_data (company_id, id);
CREATE INDEX ON api_request_record (company_id, data_record_id, dnc_compliant);
Note that these are composite indexes with multiple key columns, not separate indexes on each column.
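To make the composite-versus-separate distinction concrete, here is a minimal sketch of the two recommended indexes, again in SQLite via Python's sqlite3 module as a stand-in (the index names are hypothetical). PRAGMA index_info lists the key columns of one index in order, so each index shows up as a single index over several columns rather than one index per column. In PostgreSQL 13, a CREATE INDEX on the partitioned parent table is also created on each partition.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE import_data (
        id TEXT NOT NULL, company_id INTEGER NOT NULL, phone TEXT
    );
    CREATE TABLE api_request_record (
        company_id INTEGER NOT NULL, data_record_id TEXT NOT NULL,
        dnc_compliant BOOLEAN
    );
    -- One composite (multi-column) index per table; names are hypothetical.
    CREATE INDEX import_data_company_id_id_idx
        ON import_data (company_id, id);
    CREATE INDEX api_request_record_company_record_dnc_idx
        ON api_request_record (company_id, data_record_id, dnc_compliant);
""")


def key_columns(index_name):
    """Return the key columns of an index, in index order."""
    # PRAGMA index_info yields (seqno, cid, name); sorting orders by seqno.
    rows = sorted(conn.execute(f"PRAGMA index_info({index_name})"))
    return [name for _seqno, _cid, name in rows]


id_cols = key_columns("import_data_company_id_id_idx")
arr_cols = key_columns("api_request_record_company_record_dnc_idx")
print(id_cols)   # ['company_id', 'id']
print(arr_cols)  # ['company_id', 'data_record_id', 'dnc_compliant']

# Show which indexes SQLite's planner picks for the simplified join
# (the plan text format differs between SQLite versions).
plan = conn.execute("""
    EXPLAIN QUERY PLAN
    SELECT id.id, arr.dnc_compliant
    FROM import_data id
    JOIN api_request_record arr
      ON arr.company_id = id.company_id AND arr.data_record_id = id.id
    WHERE id.company_id = 1 AND arr.dnc_compliant
    ORDER BY id.phone DESC
""").fetchall()
for row in plan:
    print(row[-1])
```

In PostgreSQL itself, running EXPLAIN (analyze, buffers) on the simplified query after creating the indexes is the way to confirm whether the planner actually uses them.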