mysql 全文搜索失败
mysql fulltext search failure
这是我测试的。
mysql> select * from product;
+------------+---------+---------------+
| Id | Product | ProductIdType |
+------------+---------+---------------+
| B00005N5PF | one pen | ASIN |
| B000J5XS3C | | ASIN |
+------------+---------+---------------+
2 rows in set (0.00 sec)
mysql> select * from product p where match (p.Product) against ('pen' in boolean mode);
Empty set (0.00 sec)
mysql> select * from product p where match (p.Product) against ('one pen' in boolean mode);
Empty set (0.00 sec)
这是创建语句。
CREATE TABLE product
(
Id VARCHAR(16),
Product VARCHAR(128),
ProductIdType VARCHAR(8),
PRIMARY KEY (Id),
FULLTEXT (Product)
) ENGINE=MyISAM;
等号和 'LIKE' 工作正常。那为什么?
-- drop table testproduct;
CREATE TABLE testproduct
(
Id VARCHAR(16),
prod_name TEXT,
ProductIdType VARCHAR(8),
PRIMARY KEY (Id),
FULLTEXT (prod_name)
) ENGINE=MyISAM;
insert into testproduct (id,prod_name,productidtype) values ('B00005N5PF','one pen and a good price for a pen','ASIN');
insert into testproduct (id,prod_name,productidtype) values ('B570J5XS3C',null,'ASIN');
insert into testproduct (id,prod_name,productidtype) values ('C00ZZ5N5PF','let us get rid of some noise','ASIN');
insert into testproduct (id,prod_name,productidtype) values ('D00LL5N5PA','four score and seven years ago our fore...','ASIN');
insert into testproduct (id,prod_name,productidtype) values ('EEEZZ5N5PF','he has a harpoon','ASIN');
insert into testproduct (id,prod_name,productidtype) values ('C01ZZ5N5PF','and then we','ASIN');
insert into testproduct (id,prod_name,productidtype) values ('B00ZZ5N5PF','he has a pen in his pocket not a banana','ASIN');
insert into testproduct (id,prod_name,productidtype) values ('C02ZZ5N5PF','went to the store','ASIN');
insert into testproduct (id,prod_name,productidtype) values ('C03ZZ5N5PF','and decided that we should buy some','ASIN');
insert into testproduct (id,prod_name,productidtype) values ('C04ZZ5N5PF','fruit cups or fruit or berries or pebbles','ASIN');
insert into testproduct (id,prod_name,productidtype) values ('C037Z5N5PF','then he and her she and it','ASIN');
insert into testproduct (id,prod_name,productidtype) values ('C04K95N5PF','threw some daggers and a harpoon','ASIN');
insert into testproduct (id,prod_name,productidtype) values ('D88895N5PF','more noise and some of this','ASIN');
insert into testproduct (id,prod_name,productidtype) values ('D34595N5PF','this article about harpoons really drills into the throwing of harpoon or harpoons to those that deserve a harpoon','ASIN');
insert into testproduct (id,prod_name,productidtype) values ('D12395N5PF','and there we go','ASIN');
全文搜索需要一些变化来摆脱重复 'noise'。使用最少的数据进行测试会产生糟糕的结果。将你的整个 collection 投入其中,以获得任何有意义的结果。甚至尝试搜索的字词的最小大小也有设置,如下面的一些链接所示。
停用词
有MySqlLists of Stop Words in various languages representing insignificant words skipped during the search process. That list is compiled into the server, but can be overriden as seen in this Manual Page和文字:
To override the default stopword list, set the ft_stopword_file system
variable. (See Section 5.1.4, “Server System Variables”.) The variable
value should be the path name of the file containing the stopword
list, or the empty string to disable stopword filtering. The server
looks for the file in the data directory unless an absolute path name
is given to specify a different directory. After changing the value of
this variable or the contents of the stopword file, restart the server
and rebuild your FULLTEXT indexes.
一些示例查询
-- select * from testproduct
SELECT * FROM testproduct WHERE MATCH(prod_name) AGAINST('score' IN BOOLEAN MODE);
SELECT * FROM testproduct WHERE MATCH(prod_name) AGAINST('harpoon' IN BOOLEAN MODE);
SELECT * FROM testproduct WHERE MATCH(prod_name) AGAINST('banana' IN BOOLEAN MODE);
SELECT * FROM testproduct WHERE MATCH(prod_name) AGAINST('years' IN BOOLEAN MODE);
获取多个单词匹配:
SELECT id,prod_name, match( prod_name )
AGAINST ( '+harpoon +article' IN BOOLEAN MODE ) AS relevance
FROM testproduct
ORDER BY relevance DESC
在 relevance
列中给出实际权重:
SELECT id,prod_name, match( prod_name )
AGAINST ( '+harpoon +article' IN NATURAL LANGUAGE MODE) AS relevance
FROM testproduct
ORDER BY relevance DESC
+------------+--------------------------------------------------------------------------------------------------------------------+--------------------+
| id | prod_name | relevance |
+------------+--------------------------------------------------------------------------------------------------------------------+--------------------+
| D34595N5PF | this article about harpoons really drills into the throwing of harpoon or harpoons to those that deserve a harpoon | 3.6207125186920166 |
| EEEZZ5N5PF | he has a harpoon | 1.2845110893249512 |
| C04K95N5PF | threw some daggers and a harpoon | 1.2559525966644287 |
|------------+--------------------------------------------------------------------------------------------------------------------+--------------------+
取消了 here 中的多词部分。谢谢斯宾塞
这是我测试的。
mysql> select * from product;
+------------+---------+---------------+
| Id | Product | ProductIdType |
+------------+---------+---------------+
| B00005N5PF | one pen | ASIN |
| B000J5XS3C | | ASIN |
+------------+---------+---------------+
2 rows in set (0.00 sec)
mysql> select * from product p where match (p.Product) against ('pen' in boolean mode);
Empty set (0.00 sec)
mysql> select * from product p where match (p.Product) against ('one pen' in boolean mode);
Empty set (0.00 sec)
这是创建语句。
CREATE TABLE product
(
Id VARCHAR(16),
Product VARCHAR(128),
ProductIdType VARCHAR(8),
PRIMARY KEY (Id),
FULLTEXT (Product)
) ENGINE=MyISAM;
等号和 'LIKE' 工作正常。那为什么?
-- drop table testproduct;
CREATE TABLE testproduct
(
Id VARCHAR(16),
prod_name TEXT,
ProductIdType VARCHAR(8),
PRIMARY KEY (Id),
FULLTEXT (prod_name)
) ENGINE=MyISAM;
insert into testproduct (id,prod_name,productidtype) values ('B00005N5PF','one pen and a good price for a pen','ASIN');
insert into testproduct (id,prod_name,productidtype) values ('B570J5XS3C',null,'ASIN');
insert into testproduct (id,prod_name,productidtype) values ('C00ZZ5N5PF','let us get rid of some noise','ASIN');
insert into testproduct (id,prod_name,productidtype) values ('D00LL5N5PA','four score and seven years ago our fore...','ASIN');
insert into testproduct (id,prod_name,productidtype) values ('EEEZZ5N5PF','he has a harpoon','ASIN');
insert into testproduct (id,prod_name,productidtype) values ('C01ZZ5N5PF','and then we','ASIN');
insert into testproduct (id,prod_name,productidtype) values ('B00ZZ5N5PF','he has a pen in his pocket not a banana','ASIN');
insert into testproduct (id,prod_name,productidtype) values ('C02ZZ5N5PF','went to the store','ASIN');
insert into testproduct (id,prod_name,productidtype) values ('C03ZZ5N5PF','and decided that we should buy some','ASIN');
insert into testproduct (id,prod_name,productidtype) values ('C04ZZ5N5PF','fruit cups or fruit or berries or pebbles','ASIN');
insert into testproduct (id,prod_name,productidtype) values ('C037Z5N5PF','then he and her she and it','ASIN');
insert into testproduct (id,prod_name,productidtype) values ('C04K95N5PF','threw some daggers and a harpoon','ASIN');
insert into testproduct (id,prod_name,productidtype) values ('D88895N5PF','more noise and some of this','ASIN');
insert into testproduct (id,prod_name,productidtype) values ('D34595N5PF','this article about harpoons really drills into the throwing of harpoon or harpoons to those that deserve a harpoon','ASIN');
insert into testproduct (id,prod_name,productidtype) values ('D12395N5PF','and there we go','ASIN');
全文搜索需要一些变化来摆脱重复 'noise'。使用最少的数据进行测试会产生糟糕的结果。将你的整个 collection 投入其中,以获得任何有意义的结果。甚至尝试搜索的字词的最小大小也有设置,如下面的一些链接所示。
停用词
有MySqlLists of Stop Words in various languages representing insignificant words skipped during the search process. That list is compiled into the server, but can be overriden as seen in this Manual Page和文字:
To override the default stopword list, set the ft_stopword_file system variable. (See Section 5.1.4, “Server System Variables”.) The variable value should be the path name of the file containing the stopword list, or the empty string to disable stopword filtering. The server looks for the file in the data directory unless an absolute path name is given to specify a different directory. After changing the value of this variable or the contents of the stopword file, restart the server and rebuild your FULLTEXT indexes.
一些示例查询
-- select * from testproduct
SELECT * FROM testproduct WHERE MATCH(prod_name) AGAINST('score' IN BOOLEAN MODE);
SELECT * FROM testproduct WHERE MATCH(prod_name) AGAINST('harpoon' IN BOOLEAN MODE);
SELECT * FROM testproduct WHERE MATCH(prod_name) AGAINST('banana' IN BOOLEAN MODE);
SELECT * FROM testproduct WHERE MATCH(prod_name) AGAINST('years' IN BOOLEAN MODE);
获取多个单词匹配:
SELECT id,prod_name, match( prod_name )
AGAINST ( '+harpoon +article' IN BOOLEAN MODE ) AS relevance
FROM testproduct
ORDER BY relevance DESC
在 relevance
列中给出实际权重:
SELECT id,prod_name, match( prod_name )
AGAINST ( '+harpoon +article' IN NATURAL LANGUAGE MODE) AS relevance
FROM testproduct
ORDER BY relevance DESC
+------------+--------------------------------------------------------------------------------------------------------------------+--------------------+
| id | prod_name | relevance |
+------------+--------------------------------------------------------------------------------------------------------------------+--------------------+
| D34595N5PF | this article about harpoons really drills into the throwing of harpoon or harpoons to those that deserve a harpoon | 3.6207125186920166 |
| EEEZZ5N5PF | he has a harpoon | 1.2845110893249512 |
| C04K95N5PF | threw some daggers and a harpoon | 1.2559525966644287 |
|------------+--------------------------------------------------------------------------------------------------------------------+--------------------+
取消了 here 中的多词部分。谢谢斯宾塞