Is there a way to search partial match in multivalue field in PostgreSQL?
I have a table like this:
CREATE TABLE myTable (
family text,
names text[]
);
I can search like this:
SELECT family
FROM myTable where names @> array['B0WP04'];
But I would like to do this:
SELECT family
FROM myTable where names @> array['%P0%'];
Is this possible?
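For context: `@>` is array containment. It compares whole elements for equality, so the `%` in `array['%P0%']` is an ordinary character, not a wildcard. Below is a minimal plain-Python model of that behavior (it ignores NULL elements and is not PostgreSQL itself); `contains` is a hypothetical helper name.

```python
def contains(left, right):
    """Model of PostgreSQL `left @> right` for arrays without NULLs:
    every element of `right` must equal some element of `left`.
    Duplicates are ignored, so set inclusion is enough."""
    return set(right) <= set(left)


names = ["A1", "B0WP04", "C9"]
print(contains(names, ["B0WP04"]))  # exact element match: True
print(contains(names, ["%P0%"]))    # '%' is literal here, not a wildcard: False
```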
In PostgreSQL 9.3 you can do:
select family
from myTable
join lateral unnest(mytable.names) as un(name) on true
where un.name like '%P0%';
But keep in mind that this can produce duplicate rows, so you may want to add DISTINCT.
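Why the `LATERAL` join can return duplicates: it produces one output row per matching array element, so a family whose array has two matching names appears twice. The sketch below models that query and the effect of `DISTINCT` in plain Python. The `like` helper is a hypothetical, simplified stand-in for `LIKE` (only `%` and `_`; it ignores the backslash escape character), and the sample rows are made up.

```python
import re


def like(value, pattern):
    """Simplified LIKE: '%' matches any run of characters, '_' exactly one.
    Everything else is literal and, as in PostgreSQL, case-sensitive."""
    regex = "".join(
        ".*" if ch == "%" else "." if ch == "_" else re.escape(ch)
        for ch in pattern
    )
    return re.fullmatch(regex, value, flags=re.DOTALL) is not None


# Hypothetical rows of myTable: (family, names)
my_table = [
    ("f1", ["B0WP04", "XP0Y"]),  # two elements match '%P0%'
    ("f2", ["ABC"]),             # no match
    ("f3", ["P0"]),              # one match
]

# JOIN LATERAL unnest(names) AS un(name) ON true WHERE un.name LIKE '%P0%'
joined = [family for family, names in my_table
          for name in names if like(name, "%P0%")]

# SELECT DISTINCT family: each family once (SQL does not guarantee order;
# dict.fromkeys keeps first-seen order here only for readability)
distinct = list(dict.fromkeys(joined))

print(joined)    # ['f1', 'f1', 'f3']
print(distinct)  # ['f1', 'f3']
```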
For earlier versions:
select family
from myTable where
exists (select 1 from unnest(names) as un(name) where un.name like '%P0%');
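The `EXISTS` form avoids duplicates because the subquery is only a yes/no test per row of `myTable`: each family is returned at most once, however many of its elements match. In a plain-Python model (same simplified, hypothetical `like` helper, made-up rows), `EXISTS` corresponds to `any()`:

```python
import re


def like(value, pattern):
    """Simplified LIKE: '%' = any run of characters, '_' = one character."""
    regex = "".join(
        ".*" if ch == "%" else "." if ch == "_" else re.escape(ch)
        for ch in pattern
    )
    return re.fullmatch(regex, value, flags=re.DOTALL) is not None


my_table = [
    ("f1", ["B0WP04", "XP0Y"]),
    ("f2", ["ABC"]),
    ("f3", ["P0"]),
    ("f4", None),  # NULL array: unnest(NULL) yields no rows, so EXISTS is false
]

# WHERE EXISTS (SELECT 1 FROM unnest(names) AS un(name) WHERE un.name LIKE '%P0%')
result = [family for family, names in my_table
          if any(like(name, "%P0%") for name in (names or []))]

print(result)  # ['f1', 'f3']
```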
You can use the parray_gin extension: https://github.com/theirix/parray_gin
This extension is supposed to work only up to 9.2, but I just installed and tested it on 9.3 and it works fine.
Here is how to install it on an Ubuntu-like system :)
# install postgresql extension network client and postgresql extension build tools
sudo apt-get install python-setuptools
easy_install pgxnclient
sudo apt-get install postgresql-server-dev-9.3
# get the extension
pgxn install parray_gin
Here is my test:
-- as a superuser: add the extension to the current database
CREATE EXTENSION parray_gin;
-- as a normal user
CREATE TABLE test (
id SERIAL PRIMARY KEY,
names TEXT []
);
INSERT INTO test (names) VALUES
(ARRAY ['nam1', 'nam2']),
(ARRAY ['2nam1', '2nam2']),
(ARRAY ['Hello', 'Woooorld']),
(ARRAY ['Woooorld', 'Hello']),
(ARRAY [] :: TEXT []),
(NULL),
(ARRAY ['Hello', 'is', 'it', 'me', 'you''re', 'looking', 'for', '?']);
-- double the rows in the test table (repeat to grow it); with many rows, the planner uses the index
INSERT INTO test (names) (SELECT names FROM test);
SELECT count(*) from test; /*
count
--------
997376
(1 row)
*/
Now that we have some test data, it's time for the magic:
-- http://pgxn.org/dist/parray_gin/doc/parray_gin.html
CREATE INDEX names_idx ON test USING GIN (names parray_gin_ops);
--- now it's time for some tests
EXPLAIN ANALYZE SELECT * FROM test WHERE names @> ARRAY ['is']; /*
-- WITHOUT INDEX ON NAMES
QUERY PLAN
------------------------------------------------------------------------------------------------------------
Seq Scan on test (cost=0.00..25667.00 rows=1138 width=49) (actual time=0.021..508.599 rows=51200 loops=1)
Filter: (names @> '{is}'::text[])
Rows Removed by Filter: 946176
Total runtime: 653.879 ms
(4 rows)
-- WITH INDEX ON NAMES
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on test (cost=455.73..3463.37 rows=997 width=49) (actual time=14.327..240.365 rows=51200 loops=1)
Recheck Cond: (names @> '{is}'::text[])
-> Bitmap Index Scan on names_idx (cost=0.00..455.48 rows=997 width=0) (actual time=12.241..12.241 rows=51200 loops=1)
Index Cond: (names @> '{is}'::text[])
Total runtime: 341.750 ms
(5 rows)
*/
EXPLAIN ANALYZE SELECT * FROM test WHERE names @@> ARRAY ['%nam%']; /*
-- WITHOUT INDEX ON NAMES
QUERY PLAN
------------------------------------------------------------------------------------------------------------
Seq Scan on test (cost=0.00..23914.20 rows=997 width=49) (actual time=0.023..590.093 rows=102400 loops=1)
Filter: (names @@> '{%nam%}'::text[])
Rows Removed by Filter: 894976
Total runtime: 796.636 ms
(4 rows)
-- WITH INDEX ON NAMES
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on test (cost=159.73..3167.37 rows=997 width=49) (actual time=20.164..293.942 rows=102400 loops=1)
Recheck Cond: (names @@> '{%nam%}'::text[])
-> Bitmap Index Scan on names_idx (cost=0.00..159.48 rows=997 width=0) (actual time=18.539..18.539 rows=102400 loops=1)
Index Cond: (names @@> '{%nam%}'::text[])
Total runtime: 490.060 ms
(5 rows)
*/
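The 2:1 ratio of matched rows in these plans (102400 for `'%nam%'` versus 51200 for `'is'`) matches the seed data: two of the seven seed arrays contain an element matching `'%nam%'`, while one contains `'is'`. The sketch below checks that in plain Python. It assumes, as the `'%nam%'` query suggests, that `@@>` treats each right-hand element as a `LIKE`-style pattern that must match some element of the left array; the parray_gin documentation is the authority on the exact semantics. `like`, `contains` and `partial_contains` are hypothetical helper names.

```python
import re


def like(value, pattern):
    """Simplified LIKE: '%' = any run of characters, '_' = one character."""
    regex = "".join(
        ".*" if ch == "%" else "." if ch == "_" else re.escape(ch)
        for ch in pattern
    )
    return re.fullmatch(regex, value, flags=re.DOTALL) is not None


def contains(left, right):
    """Model of strict `left @> right` (no NULL elements)."""
    return set(right) <= set(left)


def partial_contains(left, right):
    """ASSUMED model of parray_gin's `left @@> right`: every pattern in
    `right` must LIKE-match at least one element of `left`."""
    return all(any(like(elem, pat) for elem in left) for pat in right)


# The seven seed rows inserted above; None stands for the NULL row.
seed = [
    ["nam1", "nam2"],
    ["2nam1", "2nam2"],
    ["Hello", "Woooorld"],
    ["Woooorld", "Hello"],
    [],
    None,
    ["Hello", "is", "it", "me", "you're", "looking", "for", "?"],
]

# NULL @> ... is NULL, so WHERE drops that row: skip None.
strict_hits = sum(1 for n in seed if n is not None and contains(n, ["is"]))
partial_hits = sum(1 for n in seed
                   if n is not None and partial_contains(n, ["%nam%"]))

print(strict_hits, partial_hits)  # 1 2
```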
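In the end, performance depends entirely on your data and queries, but in my dummy example the extension worked well: with the index, total runtime dropped by roughly 40 to 50 percent (653.879 ms to 341.750 ms for @>, 796.636 ms to 490.060 ms for @@>).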
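For reference, the runtime ratios computed from the `Total runtime` lines above (the numbers are copied from the plans; nothing is re-measured):

```python
# (without index, with index) Total runtime in ms, from the EXPLAIN ANALYZE output above
timings = {
    "names @> ARRAY['is']": (653.879, 341.750),
    "names @@> ARRAY['%nam%']": (796.636, 490.060),
}

# Fraction of the original runtime that remains once the index exists
ratios = {q: with_idx / without for q, (without, with_idx) in timings.items()}

for query, ratio in ratios.items():
    print(f"{query}: {ratio:.0%} of the original runtime")
```

The first query keeps about 52% of its original runtime and the second about 62%.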
Adding a little to Radek's answer, I tried
select family
from myTable where
exists (select 1 from unnest(names) as name where name like '%P0%');
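And it works too. I searched the PostgreSQL documentation for the un() function but couldn't find anything.
I'm not saying it does nothing, I'm just wondering what the un() function is supposed to do (glad my problem is solved, though).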
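`un` is not a function: in `unnest(names) AS un(name)`, `un` is a table alias for the rows that `unnest` returns and `name` is a column alias for its single column. When only a table alias is given, as in `unnest(names) AS name`, PostgreSQL names the single column of a scalar-returning function after that alias, which is why the shorter query works as well. The plain-Python model below mimics that naming rule; `unnest_as` is a hypothetical helper, not a PostgreSQL API.

```python
def unnest_as(array, table_alias, column_alias=None):
    """Model of `unnest(array) AS table_alias(column_alias)`.
    Returns rows as dicts keyed by column name. With no column alias,
    the single column takes the table alias as its name."""
    column = column_alias or table_alias
    return [{column: elem} for elem in (array or [])]


names = ["B0WP04", "ABC"]

rows_a = unnest_as(names, "un", "name")  # unnest(names) AS un(name)
rows_b = unnest_as(names, "name")        # unnest(names) AS name

# Both forms expose a column called "name", so `name LIKE ...` resolves the same way.
print(rows_a == rows_b)  # True
```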