How do I efficiently calculate summary stats on JSONB arrays nested in Postgres?
Using Postgres 9.6.
I have this working, but suspect there is a more efficient way. What is the best way to calculate AVG, SUM, etc. on the MyEventLength array?
DROP TABLE IF EXISTS activity;
DROP SEQUENCE IF EXISTS activity_id_seq;
CREATE SEQUENCE activity_id_seq;
CREATE TABLE activity (
    id INT CHECK (id > 0) NOT NULL DEFAULT NEXTVAL ('activity_id_seq'),
    user_id INT,
    events JSONB
);
INSERT INTO activity (user_id,events) VALUES
(1, '{"MyEvent":{"MyEventLength":[450,790,1300,5400],"MyEventValue":[334,120,120,940]}}'),
(1, '{"MyEvent":{"MyEventLength":[12],"MyEventValue":[4]}}'),
(2, '{"MyEvent":{"MyEventLength":[450,790,1300,5400],"MyEventValue":[334,120,120,940]}}'),
(1, '{"MyEvent":{"MyEventLength":[1000,2000],"MyEventValue":[450,550]}}');
So far, this is the best approach I have come up with for calculating the average of the MyEventLength array for user_id 1:
SELECT avg(recs::text::numeric) FROM (
    SELECT jsonb_array_elements(a.event_length) AS recs FROM (
        SELECT events->'MyEvent'->'MyEventLength' AS event_length FROM activity
        WHERE user_id = 1
    ) a
) b;
Or this variant:
SELECT avg(recs) FROM (
    SELECT jsonb_array_elements_text(a.event_length)::numeric AS recs FROM (
        SELECT events->'MyEvent'->'MyEventLength' AS event_length FROM activity
        WHERE user_id = 1
    ) a
) b;
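Is there a better way that doesn't need so many subselects?
You need to pass rows with scalar values to avg(). Otherwise, if you try to pass it the output of a set-returning function such as jsonb_array_elements_text(..), you will get an error like this: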
ERROR: set-valued function called in context that cannot accept a set
So you definitely need at least one subquery or CTE.
Option 1, without a CTE:
select avg(v::numeric)
from (
    select
        jsonb_array_elements_text(events->'MyEvent'->'MyEventLength')
    from activity
    where user_id = 1
) as a(v);
Option 2, with a CTE (more readable):
with vals as (
    select
        jsonb_array_elements_text(events->'MyEvent'->'MyEventLength')::numeric as val
    from activity
    where user_id = 1
)
select avg(val)
from vals;
Update, option 3: it turns out you can do this without any nested queries, using an implicit JOIN LATERAL:
select avg(val::text::numeric)
from activity a, jsonb_array_elements(a.events->'MyEvent'->'MyEventLength') vals(val)
where user_id = 1;
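Since the question asks about AVG, SUM, and similar aggregates, the lateral form can probably be extended to return several statistics in one pass. The following is a minimal sketch rather than a tuned solution: it assumes the activity table and sample rows from the question, spells the lateral join out explicitly, and uses column aliases (n, total, mean, shortest, longest) that are made up for illustration.

```sql
-- Sketch: several summary statistics over MyEventLength, one row per user.
-- Assumes every row stores a JSON array of numbers at events->'MyEvent'->'MyEventLength'.
-- The output column aliases are illustrative, not from the original post.
SELECT a.user_id,
       count(*)            AS n,         -- number of array elements
       sum(v.val::numeric) AS total,
       avg(v.val::numeric) AS mean,
       min(v.val::numeric) AS shortest,
       max(v.val::numeric) AS longest
FROM activity AS a
CROSS JOIN LATERAL
     jsonb_array_elements_text(a.events->'MyEvent'->'MyEventLength') AS v(val)
GROUP BY a.user_id
ORDER BY a.user_id;
```

With the sample data, user_id 1 should come back with n = 7 and total = 10952, and user_id 2 with n = 4 and total = 7940. Rows where the path is missing contribute no elements, because the function returns an empty set for a NULL argument. To restrict the result to a single user, add where a.user_id = 1 before the group by.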