如何有效地计算嵌套在 Postgres 中的 JSONB 数组的摘要统计信息?

How do I efficiently calculate summary stats on JSONB arrays nested in Postgres?

使用 Postgres 9.6。

我有这个工作,但怀疑有更有效的方法。在 MyEventLength 数组上计算 AVG、SUM 等的最佳方法是什么?

DROP TABLE IF EXISTS activity;
DROP SEQUENCE IF EXISTS activity_id_seq;
CREATE SEQUENCE activity_id_seq;

CREATE TABLE activity (
    id INT CHECK (id > 0) NOT NULL DEFAULT NEXTVAL ('activity_id_seq'),
    user_id INT,
    events JSONB
);

INSERT INTO activity (user_id,events) VALUES
(1, '{"MyEvent":{"MyEventLength":[450,790,1300,5400],"MyEventValue":[334,120,120,940]}}'),
(1, '{"MyEvent":{"MyEventLength":[12],"MyEventValue":[4]}}'),
(2, '{"MyEvent":{"MyEventLength":[450,790,1300,5400],"MyEventValue":[334,120,120,940]}}'),
(1, '{"MyEvent":{"MyEventLength":[1000,2000],"MyEventValue":[450,550]}}');

迄今为止,这是我能想到的计算 MyEventLength 数组的平均值的最佳方法 user_id 1:

SELECT avg(recs::text::numeric) FROM (
    SELECT jsonb_array_elements(a.event_length) as recs FROM (
        SELECT events->'MyEvent'->'MyEventLength' as event_length from activity
        WHERE user_id = 1
    )a
) b;

或者这个变体:

SELECT avg(recs) FROM (
    SELECT jsonb_array_elements_text(a.event_length)::numeric as recs FROM (
        SELECT events->'MyEvent'->'MyEventLength' as event_length from activity
        WHERE user_id = 1
    )a
) b;

有没有不需要那么多子选择的更好方法?

您需要将带有标量值的行传递给 avg(),否则(如果您尝试传递某些设置返回函数的输出,例如 jsonb_array_elements_text(..)),您将得到这样的错误像这样:

ERROR:  set-valued function called in context that cannot accept a set

所以你肯定至少需要 1 个子查询或 CTE。

选项 1,w/o CTE:

select avg(v::numeric)
from (
  select
    jsonb_array_elements_text(events->'MyEvent'->'MyEventLength')
  from activity
  where user_id = 1
) as a(v);

方案二,CTE(可读性更好):

with vals as (
  select
    jsonb_array_elements_text(events->'MyEvent'->'MyEventLength')::numeric as val
  from activity
  where user_id = 1
)
select avg(val)
from vals
;

更新,选项 3:事实证明,您可以 w/o 任何嵌套查询,使用隐式 JOIN LATERAL:

select avg(val::text::numeric)
from activity a, jsonb_array_elements(a.events->'MyEvent'->'MyEventLength') vals(val)
where user_id = 1;