Matching on postal code and city name - very slow in PostgreSQL
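I am trying to update address fields in mytable with data from othertable.
If I match on postal code and search for the city name from othertable in the mytable address, it runs reasonably fast. But since I don't have a postal code in every case, I also want to match on the name alone in a second query. That one takes hours (>12h). Any ideas on how to speed it up? Note that indexes did not help: the index scan in (2) was no faster.
Code for matching on postal code + name (1)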
update mytable t1 set
admin1 = t.admin1,
admin2 = t.admin2,
admin3 = t.admin3,
postal_code = t.postal_code,
lat = t.lat,
lng = t.lng from (
select * from othertable) t
where t.postal_code = t1.postal_code and t1.country = t.country
and upper(t1.address) like '%' || t.admin1 || '%' --looks whether city name from othertable shows up in address in t1
and admin1 is null;
Code for matching on name only (2)
update mytable t1 set
admin1 = t.admin1,
admin2 = t.admin2,
admin3 = t.admin3,
postal_code = t.postal_code,
lat = t.lat,
lng = t.lng from (
select * from othertable) t
where t1.country = t.country
and upper(t1.address) like '%' || t.admin1 || '%' --looks whether city name from othertable shows up in address in t1
and admin1 is null;
Query plan 1:
Update on mytable t1  (cost=19084169.53..19205622.16 rows=13781 width=1918)
  ->  Merge Join  (cost=19084169.53..19205622.16 rows=13781 width=1918)
        Merge Cond: (((t1.postal_code)::text = (othertable.postal_code)::text) AND (t1.country = othertable.country))
        Join Filter: (upper((t1.address)::text) ~~ (('%'::text || othertable.admin1) || '%'::text))
        ->  Sort  (cost=18332017.34..18347693.77 rows=6270570 width=1661)
              Sort Key: t1.postal_code, t1.country
              ->  Seq Scan on mytable t1  (cost=0.00..4057214.31 rows=6270570 width=1661)
                    Filter: (admin1 IS NULL)
        ->  Materialize  (cost=752152.19..766803.71 rows=2930305 width=92)
              ->  Sort  (cost=752152.19..759477.95 rows=2930305 width=92)
                    Sort Key: othertable.postal_code, othertable.country
                    ->  Seq Scan on othertable  (cost=0.00..136924.05 rows=2930305 width=92)
Query plan 2:
Update on mytable t1  (cost=19084169.53..27246633167.33 rows=5464884210 width=1918)
  ->  Merge Join  (cost=19084169.53..27246633167.33 rows=5464884210 width=1918)
        Merge Cond: (t1.country = othertable.country)
        Join Filter: (upper((t1.address)::text) ~~ (('%'::text || othertable.admin1) || '%'::text))
        ->  Sort  (cost=18332017.34..18347693.77 rows=6270570 width=1661)
              Sort Key: t1.country
              ->  Seq Scan on mytable t1  (cost=0.00..4057214.31 rows=6270570 width=1661)
                    Filter: (admin1 IS NULL)
        ->  Materialize  (cost=752152.19..766803.71 rows=2930305 width=92)
              ->  Sort  (cost=752152.19..759477.95 rows=2930305 width=92)
                    Sort Key: othertable.country
                    ->  Seq Scan on othertable  (cost=0.00..136924.05 rows=2930305 width=92)
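In the second query you are joining (more or less) on the city name, but othertable has several entries per city name, so every mytable row matches several candidate rows (plan 2 estimates about 5.5 billion joined rows). PostgreSQL applies only one of those candidates to each row, and which one is not predictable: which lat/lng or admin2/admin3 will you end up with?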
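To see how many rows compete for each city name, a quick count can help. This is only a diagnostic sketch using the column names from the question; the LIMIT of 20 is an arbitrary choice.

```sql
-- List the (admin1, country) pairs that occur more than once in othertable,
-- most duplicated first. Each of these makes the name-only match ambiguous.
SELECT admin1, country, count(*) AS candidates
FROM othertable
GROUP BY admin1, country
HAVING count(*) > 1
ORDER BY candidates DESC
LIMIT 20;
```

If this returns many rows with large counts, the name-only join is doing far more work than the number of rows in mytable suggests.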
If othertable has entries without a postal code, use only those by adding the extra condition AND othertable.postal_code IS NULL.
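Otherwise, you will want a subset of othertable that returns one row per admin1 + country value. Replace select * from othertable with the following query. Of course, you may want to adjust it to pick a lat/lng/admin2/admin3 other than the first one.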
SELECT admin1, country, first(postal_code) postal_code, first(lat) lat, first(lng) lng, first(admin2) admin2, first(admin3) admin3
FROM othertable
GROUP BY admin1,country
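Note that first() is not a built-in PostgreSQL aggregate, so the query above only runs if such an aggregate has been defined. As an alternative sketch that needs no extra function, DISTINCT ON keeps one whole row per admin1 + country. The tie-breaker in ORDER BY (lowest postal_code here) is an assumption, not something the question specifies.

```sql
-- One row per (admin1, country). The ORDER BY decides which row survives;
-- sorting by postal_code is an arbitrary, assumed tie-breaker.
SELECT DISTINCT ON (admin1, country)
       admin1, country, postal_code, lat, lng, admin2, admin3
FROM othertable
ORDER BY admin1, country, postal_code;
```

Unlike a per-column aggregate, this takes every value from the same source row, so lat, lng and postal_code always belong together.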
Worst of all, the second query would overwrite what the first query updated, so you must skip those records by adding AND t1.postal_code IS NULL.
The whole query could be:
UPDATE mytable t1
SET
admin1 = t.admin1,
admin2 = t.admin2,
admin3 = t.admin3,
postal_code = t.postal_code,
lat = t.lat,
lng = t.lng
FROM (
-- first() is a user-defined aggregate, not built into PostgreSQL
SELECT admin1, country, first(postal_code) postal_code, first(lat) lat, first(lng) lng, first(admin2) admin2, first(admin3) admin3
FROM othertable
GROUP BY admin1, country) t
WHERE t1.country = t.country
AND upper(t1.address) LIKE '%' || t.admin1 || '%' -- checks whether the city name from othertable shows up in the t1 address
AND t1.admin1 IS NULL -- qualified: both t1 and t have an admin1 column
AND t1.postal_code IS NULL; -- use the alias t1; the name mytable is hidden once it is aliased