Postgres - Performance of select for large jsonb column

Question

We use the Postgres jsonb type in one of our database tables. The table structure is as follows:

CREATE TABLE IF NOT EXISTS public.draft_document (
    id bigserial NOT NULL PRIMARY KEY,
    ...
    document jsonb NOT NULL,
    ein_search character varying(11) NOT NULL
);

CREATE INDEX IF NOT EXISTS count_draft_document_idx ON public.draft_document USING btree (ein_search);
CREATE INDEX IF NOT EXISTS read_draft_document_idx ON public.draft_document USING btree (id, ein_search);

The JSON structure of the document column can vary. Here is an example of a possible schema for document:

"withholdingCredit": {  
    "type": "array",
    "items": {
        "$ref": "#/definitions/withholding"
    }
}

where the withholding structure (the array element) includes:

"withholding": {
    "properties": {
        ...
        "proportionalityIndicator": {
            "type": "boolean"
        },
        "tribute": {
            "$ref": "#/definitions/tribute"
        },
        "payingSourceEin": {
            "type": "string"
        },
        "value": {
            "type": "number"
        }
        ...
    }
    ...
},      
"tribute": {
    "type": "object",
    "properties": {
        "code": {
            "type": "number"
        },
        "additionalCode": {
            "type": "number"
        }
        ...
    }
}

Here is an example of the JSON stored in the document jsonb column:

{
   "withholdingCredit":[
      {
         "value": 15000,
         "tribute":{
            "code": 1216,
            "additionalCode": 2
         },
         "payingSourceEin": "03985506123132",
         "proportionalityIndicator": false
      },
      ...
      {
         "value": 98150,
         "tribute":{
            "code": 3155,
            "additionalCode": 1
         },
         "payingSourceEin": "04185506123163",
         "proportionalityIndicator": false
      }
   ]
}
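To make the JSON paths used below concrete, here is a minimal sketch that unnests the two elements shown in the sample above (the elided "..." elements are simply left out) and extracts each field. -> returns jsonb and ->> returns text, so numeric fields need an explicit cast before they can be summed or sorted as numbers. The column aliases are illustrative only.

```sql
-- Unnest a two-element excerpt of the sample document (elided elements omitted).
SELECT (elem->>'value')::numeric                     AS value,            -- ->> yields text, so cast to numeric
       (elem->'tribute'->>'code')::numeric           AS tribute_code,     -- -> keeps jsonb for the nested hop
       (elem->'tribute'->>'additionalCode')::numeric AS additional_code,
       elem->>'payingSourceEin'                      AS paying_source_ein,
       elem->>'proportionalityIndicator'             AS proportionality_indicator  -- text 'false' / 'true'
FROM jsonb_array_elements(
       '{"withholdingCredit": [
          {"value": 15000, "tribute": {"code": 1216, "additionalCode": 2},
           "payingSourceEin": "03985506123132", "proportionalityIndicator": false},
          {"value": 98150, "tribute": {"code": 3155, "additionalCode": 1},
           "payingSourceEin": "04185506123163", "proportionalityIndicator": false}
        ]}'::jsonb -> 'withholdingCredit'
     ) AS arr(elem);
```

Against the table itself, the literal is replaced by dft.document->'withholdingCredit' inside a LATERAL join, as both queries in this thread do.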

The number of elements in the array can be as high as 100,000 (one hundred thousand). This is a business constraint.

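We need a paginated select query that returns the withholding array unnested (one element per row), where each row also contains the sum of the withholding elements' value and the array length. The query also needs to return the withholdings ordered by proportionalityIndicator, tribute->code, tribute->additionalCode and payingSourceEin. Something like: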

id	sum	jsonb_array_length	jsonb_array_elements
30900	1.800.027	2300	{"value":15000,"tribute":{"code":1216,...}, ...}
...	...	...	{ ... }
30900	1.800.027	2300	{"value":98150,"tribute":{"code":3155,...}, ...}
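The casts in the required ordering matter because ->> returns text, and text comparison puts '1216' before '900'. The sketch below uses two made-up elements (the code 900 is hypothetical, not taken from the question's data) to show the difference. proportionalityIndicator can be compared as text, since 'false' sorts before 'true' just as false sorts before true.

```sql
-- Two hypothetical elements; only the tribute code differs.
WITH arr(elem) AS (
  SELECT jsonb_array_elements('[{"tribute": {"code": 1216}}, {"tribute": {"code": 900}}]'::jsonb)
)
SELECT elem->'tribute'->>'code' AS code_text,
       row_number() OVER (ORDER BY elem->'tribute'->>'code')            AS text_rank,    -- '1216' < '900' as strings
       row_number() OVER (ORDER BY (elem->'tribute'->>'code')::numeric) AS numeric_rank  -- 900 < 1216 as numbers
FROM arr
ORDER BY numeric_rank;
```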

We defined the following query:

SELECT dft.id, 
    SUM((elem->>'value')::NUMERIC),
    jsonb_array_length(dft.document->'withholdingCredit'),
    jsonb_array_elements(jsonb_agg(elem 
    ORDER BY 
        elem->>'proportionalityIndicator',
        (elem->'tribute'->>'code')::NUMERIC,
        (elem->'tribute'->>'additionalCode')::NUMERIC,
        elem->>'payingSourceEin'))
FROM 
    draft_document dft
    CROSS JOIN LATERAL jsonb_array_elements(dft.document->'withholdingCredit') arr(elem)
WHERE (dft.document->'withholdingCredit') IS NOT NULL
    AND dft.id = :id
    AND dft.ein_search = :ein_search
GROUP BY dft.id
LIMIT :limit OFFSET :offset;

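This query works, but performance degrades badly when the jsonb array holds a large number of elements. Any suggestions for improvement are welcome.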

By the way, we are using Postgres 9.6.

Answer 1

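Your odd query, which takes the array apart, aggregates it, and then takes it apart again, does seem to trigger some pathological memory-management problem in PostgreSQL (tested on 15dev). Perhaps you should file a bug report about it.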

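But you can avoid the problem by taking the array apart only once. You then need window functions to compute the totals you want over all rows, including the rows that OFFSET and LIMIT remove from the output.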
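This works with pagination because of evaluation order: a window aggregate with an empty OVER () sees every row that survives WHERE, and ORDER BY, LIMIT and OFFSET are applied only afterwards. A minimal self-contained illustration, with generate_series standing in for the unnested array (the numbers are arbitrary):

```sql
-- Ten rows stand in for the unnested withholding elements.
SELECT x,
       sum(x)   OVER () AS total,  -- computed over all 10 rows
       count(*) OVER () AS n       -- likewise, not just the 3 rows returned
FROM generate_series(1, 10) AS g(x)
ORDER BY x
LIMIT 3 OFFSET 5;                  -- returns x = 6, 7, 8, each with total = 55 and n = 10
```

In the query below, the same mechanism lets SUM(...) OVER () and count(*) OVER () report whole-array totals on every page while only the requested slice is returned.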

SELECT dft.id, 
    SUM((elem->>'value')::NUMERIC) over (),
    count(*) over (),                                     
    elem                                
FROM 
    draft_document dft
    CROSS JOIN LATERAL jsonb_array_elements(dft.document->'withholdingCredit') arr(elem)
WHERE (dft.document->'withholdingCredit') IS NOT NULL
    AND dft.id = 4
    AND dft.ein_search = '4' 
ORDER BY 
        elem->>'proportionalityIndicator',
        (elem->'tribute'->>'code')::NUMERIC,
        (elem->'tribute'->>'additionalCode')::NUMERIC,
        elem->>'payingSourceEin' 
limit 4 offset 500;

On my machine, this gives the same answer as your query, but takes 370 ms instead of 13,789 ms.

At higher offsets my query still works, whereas yours caused a complete lockup that required a hard reset.

If anyone wants to reproduce the bad behavior, I generated the data as follows:

insert into draft_document
select 4,
       jsonb_build_object('withholdingCredit',
           jsonb_agg(jsonb_build_object(
               'value', floor(random()*99999)::int,
               'tribute', '{"code": 1216, "additionalCode": 2}'::jsonb,
               'payingSourceEin', floor(random()*99999999)::int,
               'proportionalityIndicator', false))),
       '4'
from generate_series(1,100000)
group by 1,3;
