UPDATE table column with latest related id from the same table
I have this table in PostgreSQL 13:
CREATE TABLE candles (
  id      serial PRIMARY KEY,
  day     integer,
  minute  integer,
  "open"  integer,
  high    integer,
  low     integer,
  "close" integer,
  volume  integer,
  id_d1   integer
);
CREATE INDEX candles_idx1 ON public.candles (day,minute);
I am trying to update the field id_d1, which should hold the id of the row from the previous day at the same minute:
UPDATE candles s2
SET id_d1 = (SELECT id FROM candles s
WHERE s.id<s2.id
AND s.day<s2.day
AND s.minute=s2.minute
ORDER BY s.id DESC
LIMIT 1);
For a small amount of data it works fine. For 80k rows it runs endlessly.
EXPLAIN output for the query:
Update on candles s2 (cost=0.00..744027.57 rows=80240 width=68)
-> Seq Scan on candles s2 (cost=0.00..744027.57 rows=80240 width=68)
SubPlan 1
-> Limit (cost=0.29..9.25 rows=1 width=4)
-> Index Scan Backward using candles_pkey on candles s (cost=0.29..2347.34 rows=262 width=4)
Index Cond: (id < s2.id)
Filter: ((day < s2.day) AND (minute = s2.minute))
I also tried this (without id in the WHERE clause):
EXPLAIN
UPDATE candles s2
SET id_d1 = (SELECT id FROM candles s
WHERE s.day<s2.day
AND s.minute=s2.minute
ORDER BY s.id DESC
LIMIT 1);
Result:
Update on candles s2 (cost=0.00..513040.75 rows=80240 width=68)
-> Seq Scan on candles s2 (cost=0.00..513040.75 rows=80240 width=68)
SubPlan 1
-> Limit (cost=0.29..6.37 rows=1 width=4)
-> Index Scan Backward using candles_pkey on candles s (cost=0.29..4784.85 rows=787 width=4)
Filter: ((day < s2.day) AND (minute = s2.minute))
How should I modify the query or the schema so that it runs in reasonable time?
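The key to better performance (especially for your original query) is an index with the order of the index columns reversed. Make it UNIQUE while you are at it: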
CREATE UNIQUE INDEX candles_idx1 ON public.candles (minute, day);
The equality column goes first: minute is compared with = and day with <. See:
- Multicolumn index and performance
- Is a composite index also good for queries on the first field?
- Working of indexes in PostgreSQL
If the index cannot be UNIQUE, you will have to tell us more about possible duplicates and how you intend to break ties.
If it can be, consider using it as the PK, replacing the id column entirely. You may need an additional index on (day, minute) ...
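To find out whether a UNIQUE index on (minute, day) is possible at all, you could look for duplicates first. A minimal sketch, assuming the table definition from the question:

```sql
-- List every (minute, day) pair that occurs more than once.
-- An empty result means the UNIQUE index can be created as is.
SELECT minute, day, count(*) AS n
FROM   candles
GROUP  BY minute, day
HAVING count(*) > 1
ORDER  BY n DESC;
```

Rows with NULL in day or minute do not violate a UNIQUE index by default, so they may need separate attention.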
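While updating all rows, joining to a single subquery in the FROM clause that uses the window function lag() should compute all target values much faster than running a correlated subquery for every row: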
UPDATE candles c
SET id_d1 = c2.prev_id
FROM (
SELECT id, lag(id) OVER (PARTITION BY minute ORDER BY day) AS prev_id
FROM candles
) c2
WHERE c.id = c2.id
If some rows may already have the correct id_d1, add this line to avoid costly empty updates:
AND id_d1 IS DISTINCT FROM c2.prev_id
See:
- How do I (or can I) SELECT DISTINCT on multiple columns?
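Put together, the complete statement might look like the sketch below. It relies on (minute, day) being unique, so that lag() has a deterministic previous row within each minute:

```sql
UPDATE candles c
SET    id_d1 = c2.prev_id
FROM  (
   -- previous day's id for the same minute, NULL for the first day
   SELECT id, lag(id) OVER (PARTITION BY minute ORDER BY day) AS prev_id
   FROM   candles
   ) c2
WHERE  c.id = c2.id
AND    c.id_d1 IS DISTINCT FROM c2.prev_id;  -- skip rows that are already correct
```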
While updating all rows, the index may not even be used for the new query.
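With the index in place, consider dropping id_d1 from the table completely. Storing functionally dependent values is rarely a good idea. Computing the value on the fly with lag() should be cheap, and it is then always up to date automatically. Otherwise you have to think about how to keep the column current, which can be tricky.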
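One way to compute the value on the fly is a view. A rough sketch, where the view name candles_with_prev is made up for illustration and (minute, day) is again assumed to be unique:

```sql
-- Hypothetical view: derives id_d1 at query time instead of storing it.
CREATE VIEW candles_with_prev AS
SELECT id, day, minute, "open", high, low, "close", volume
     , lag(id) OVER (PARTITION BY minute ORDER BY day) AS id_d1
FROM   candles;

-- Example usage (570 is an arbitrary minute value):
SELECT * FROM candles_with_prev WHERE minute = 570 ORDER BY day;
```

A filter on day in the outer query is applied after lag() has been computed, so it does not change which row counts as the previous one.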