PSQL Optimize left join

I have two tables, import_data and api_request_record, defined as follows:

CREATE TABLE public.import_data (
    id uuid NOT NULL,
    company_id integer NOT NULL,
    business_name character varying,
    no_of_rating double precision,
    no_of_reviews double precision,
    phone character varying,
    email text,
    state character varying,
    city character varying,
    zip character varying,
    address character varying,
    category character varying,
    upserted_by integer,
    upsert_date timestamp without time zone,
    vendor character varying,
    purchased boolean
)
PARTITION BY LIST (company_id);

CREATE TABLE import_data_1 PARTITION OF import_data FOR VALUES IN ('1');

import_data holds data about SMB businesses. Almost all of the fields are indexed. The table is partitioned.

CREATE TABLE public.api_request_record (
    id integer NOT NULL,
    company_id integer NOT NULL,
    create_date timestamp without time zone,
    data_record_id uuid NOT NULL,
    phone character varying,
    valid_number boolean,
    validity_check_date timestamp without time zone,
    phone_location character varying,
    carrier character varying,
    line_type character varying,
    dnc_compliant boolean,
    dnc_check_date timestamp without time zone,
    lc_compliant boolean,
    lc_check_date timestamp without time zone
)
PARTITION BY LIST (company_id);

ALTER TABLE public.api_request_record
    ADD CONSTRAINT api_request_record_data_record_id_company_id_fkey FOREIGN KEY (data_record_id, company_id) REFERENCES public.import_data(id, company_id) ON DELETE CASCADE;

CREATE TABLE api_request_record_1 PARTITION OF api_request_record FOR VALUES IN ('1');

api_request_record holds API request information for each company's import_data records. Most of the fields are indexed. It is also partitioned.

Now I am trying to select some information about the records. My SQL query is:

SELECT
    import_data.id,
    import_data.phone,
    import_data.business_name,
    import_data.address,
    api_request_record.lc_compliant,
    api_request_record.dnc_compliant
FROM
    import_data
    LEFT JOIN (
        SELECT
            *
        FROM
            api_request_record
        WHERE
            company_id = 1
            AND (api_request_record.dnc_compliant = TRUE
                OR api_request_record.lc_compliant = TRUE)
            AND (api_request_record.dnc_compliant = TRUE)) AS api_request_record ON api_request_record.data_record_id = import_data.id
WHERE
    import_data.company_id = 1
    AND (api_request_record.dnc_compliant = TRUE
        OR api_request_record.lc_compliant = TRUE)
    AND (api_request_record.dnc_compliant = TRUE)
ORDER BY
    import_data.phone DESC
LIMIT 100 OFFSET 600;

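The problem is that fetching 100 records with this query takes about 30 seconds when each of the two tables holds roughly 1.5 million records. The time grows as the offset increases.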

I am working on

Here is the EXPLAIN (analyze, buffers, format text) output: https://explain.depesz.com/s/AQOZ

Is there a way to optimize this query so that it completes in less time? Thanks in advance.

Try something like this:

SELECT
    import_data.id,
    import_data.phone,
    import_data.business_name,
    import_data.address,
    api_request_record.lc_compliant,
    api_request_record.dnc_compliant
FROM
    import_data
    LEFT JOIN api_request_record on api_request_record.data_record_id = import_data.id AND (api_request_record.dnc_compliant = TRUE OR api_request_record.lc_compliant = TRUE)
WHERE
    import_data.company_id = 1
ORDER BY
    import_data.phone DESC
LIMIT 100 OFFSET 600;

You seem to want:

Your query can be radically simplified:

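  • Because of the WHERE clause, the outer join is effectively an inner join.
  • The boolean conditions are far more complex than necessary ("(A OR B) AND A" is really just "A").
  • The subquery is not needed.
  • The JOIN condition can be extended to include company_id.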
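
The first two points can be checked with a small, self-contained sketch. The inline VALUES lists and the names parent, child, and flag are hypothetical stand-ins rather than the real tables; the statements only illustrate the logic and should run as written on PostgreSQL.

```sql
-- Truth table: (a OR b) AND a gives the same result as a alone,
-- including when either input is NULL.
SELECT a, b,
       ((a OR b) AND a) AS original_condition,
       a                AS simplified_condition
FROM (VALUES (TRUE), (FALSE), (NULL)) AS x(a)
CROSS JOIN (VALUES (TRUE), (FALSE), (NULL)) AS y(b);

-- A WHERE filter on a column of the outer-joined table discards the
-- NULL-extended rows, so the LEFT JOIN behaves like an inner join.
WITH parent(id) AS (VALUES (1), (2)),
     child(parent_id, flag) AS (VALUES (1, TRUE))
SELECT p.id, c.flag
FROM parent p
LEFT JOIN child c ON c.parent_id = p.id
WHERE c.flag;  -- the row for id = 2 has flag = NULL and is filtered out
```

In the second statement only the row for id = 1 survives, which is what a plain JOIN would return.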

So:

SELECT id.id, id.phone, id.business_name, id.address,
       arr.lc_compliant, arr.dnc_compliant
FROM import_data id JOIN
     api_request_record arr
     ON arr.company_id = id.company_id AND
        arr.data_record_id = id.id
WHERE id.company_id = 1 AND
      arr.dnc_compliant
ORDER BY id.phone DESC
LIMIT 100 OFFSET 600;

Then, for this query, I would recommend the following indexes:

  • import_data(company_id, id)
  • api_request_record(company_id, data_record_id, dnc_compliant)

Note that these are composite indexes with multiple keys, not separate indexes on each key.
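
As a rough sketch, the two indexes could be created as shown here. The index names are made up for illustration, and the statements assume PostgreSQL 11 or later, where CREATE INDEX on a partitioned parent table also creates a matching index on each partition. Because the foreign key references import_data(id, company_id), a unique index on those two columns must already exist, so the first index may turn out to add little.

```sql
-- Hypothetical index names; the column order follows the recommendation.
CREATE INDEX import_data_company_id_id_idx
    ON import_data (company_id, id);

CREATE INDEX api_request_record_company_record_dnc_idx
    ON api_request_record (company_id, data_record_id, dnc_compliant);

-- Re-check the plan of the simplified query once the indexes exist.
EXPLAIN (ANALYZE, BUFFERS)
SELECT id.id, id.phone, id.business_name, id.address,
       arr.lc_compliant, arr.dnc_compliant
FROM import_data id
JOIN api_request_record arr
  ON arr.company_id = id.company_id
 AND arr.data_record_id = id.id
WHERE id.company_id = 1
  AND arr.dnc_compliant
ORDER BY id.phone DESC
LIMIT 100 OFFSET 600;
```

Comparing this plan with the one linked in the question should show whether the planner actually picks up the new indexes.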