Good algorithm for searching DB for a given string
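I'm developing a web application (PHP + MySQL) in which users can search for other users by entering a search string.
I need to match the user's input string against 2 columns (username and full name) of the 'User' table in the database and return the most relevant (20 or 50) matches. Ideally, I also need to account for spelling mistakes.
How should I approach this? I don't want to reinvent the wheel here.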
I would do a LIKE search on the two columns.
My query would look like this:
SELECT * FROM user WHERE username LIKE '%{input}%' OR lastName LIKE '%{input}%' LIMIT 20;
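The {input} placeholder in the query above should not be pasted into the SQL string as-is: that is open to SQL injection, and any % or _ typed by the user would act as a wildcard. Below is a minimal Python sketch of one way to build the pattern and a parameterized statement. The helper names escape_like and like_query are hypothetical, the %s placeholder style is an assumption that matches common Python MySQL drivers, and the same idea carries over to prepared statements in PHP.

```python
def escape_like(term: str) -> str:
    """Escape the characters LIKE treats specially (backslash is MySQL's default escape)."""
    return (term.replace("\\", "\\\\")
                .replace("%", "\\%")
                .replace("_", "\\_"))


def like_query(term: str, limit: int = 20):
    """Return (sql, params) for a substring search on username and lastName."""
    pattern = "%" + escape_like(term) + "%"
    sql = (
        "SELECT * FROM user "
        "WHERE username LIKE %s OR lastName LIKE %s "
        # The limit is forced to an integer, so it is safe to inline.
        f"LIMIT {int(limit)}"
    )
    # The pattern is passed separately and bound by the driver, never concatenated.
    return sql, (pattern, pattern)
```

The returned pair would typically be handed to a driver call such as cursor.execute(sql, params).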
You would need to write a custom cleanup algorithm to handle spelling mistakes in the input.
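One way to sketch such a cleanup step is to compare the input against known names and keep the closest ones. The example below uses Python's standard difflib module. The function name suggest_names, the 0.6 cutoff, and the in-memory candidate list are assumptions for illustration; on a real user table the candidates would first have to be narrowed down by a database query.

```python
import difflib


def suggest_names(term, candidates, limit=20, cutoff=0.6):
    """Return up to `limit` candidates similar to `term`, best match first."""
    # Compare case-insensitively, but return names in their original spelling.
    by_lower = {}
    for name in candidates:
        by_lower.setdefault(name.lower(), name)
    close = difflib.get_close_matches(
        term.lower(), list(by_lower), n=limit, cutoff=cutoff
    )
    return [by_lower[key] for key in close]
```

Raising the cutoff makes the matching stricter; lowering it tolerates more typos at the cost of more unrelated suggestions.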
You can use MySQL Full-Text Search.
I'll explain Boolean Full-Text Search below, but I'd also advise you to read up on Full-Text Search using Query Expansion.
Let's look at the example table given on dev.mysql.com:
mysql> select * from articles;
+----+-----------------------+------------------------------------------+
| id | title                 | body                                     |
+----+-----------------------+------------------------------------------+
|  1 | PostgreSQL Tutorial   | DBMS stands for DataBase ...             |
|  2 | How To Use MySQL Well | After you went through a ...             |
|  3 | Optimizing MySQL      | In this tutorial we will show ...        |
|  4 | 1001 MySQL Tricks     | 1. Never run mysqld as root. 2. ...      |
|  5 | MySQL vs. YourSQL     | In the following database comparison ... |
|  6 | MySQL Security        | When configured properly, MySQL ...      |
+----+-----------------------+------------------------------------------+
mysql> SELECT * FROM articles WHERE MATCH (title,body)
AGAINST ('"database comparison"' IN BOOLEAN MODE);
+----+-------------------+------------------------------------------+
| id | title             | body                                     |
+----+-------------------+------------------------------------------+
|  5 | MySQL vs. YourSQL | In the following database comparison ... |
+----+-------------------+------------------------------------------+
When the words are quoted, their order matters:
mysql> SELECT * FROM articles WHERE MATCH (title,body)
AGAINST ('"comparison database"' IN BOOLEAN MODE);
Empty set (0.01 sec)
When we remove the quotes, it searches for rows containing the word "database" or the word "comparison":
mysql> SELECT * FROM articles WHERE MATCH (title,body)
AGAINST ('database comparison' IN BOOLEAN MODE);
+----+---------------------+------------------------------------------+
| id | title               | body                                     |
+----+---------------------+------------------------------------------+
|  1 | PostgreSQL Tutorial | DBMS stands for DataBase ...             |
|  5 | MySQL vs. YourSQL   | In the following database comparison ... |
+----+---------------------+------------------------------------------+
Now the order doesn't matter:
mysql> SELECT * FROM articles WHERE MATCH (title,body)
AGAINST ('comparison database' IN BOOLEAN MODE);
+----+---------------------+------------------------------------------+
| id | title               | body                                     |
+----+---------------------+------------------------------------------+
|  1 | PostgreSQL Tutorial | DBMS stands for DataBase ...             |
|  5 | MySQL vs. YourSQL   | In the following database comparison ... |
+----+---------------------+------------------------------------------+
If we want to get rows that contain either the word "PostgreSQL" or the phrase "database comparison", we should use this query:
mysql> SELECT * FROM articles WHERE MATCH (title,body)
AGAINST ('PostgreSQL "database comparison"' IN BOOLEAN MODE);
+----+---------------------+------------------------------------------+
| id | title               | body                                     |
+----+---------------------+------------------------------------------+
|  1 | PostgreSQL Tutorial | DBMS stands for DataBase ...             |
|  5 | MySQL vs. YourSQL   | In the following database comparison ... |
+----+---------------------+------------------------------------------+
Make sure the words you are searching for are not in the list of stopwords, because stopwords are ignored.
(Words like 'is' and 'the' are stopwords, so they are ignored.)
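Because stopwords are silently ignored, it can help to detect them in the application before running the search, for example to warn the user when nothing searchable is left. A minimal Python sketch follows; the set below is a small illustrative subset chosen for this example, not MySQL's actual list (which depends on the storage engine and server configuration), and the function name split_stopwords is hypothetical.

```python
# Illustrative subset only; the server's real stopword list is larger.
ILLUSTRATIVE_STOPWORDS = frozenset({"a", "an", "is", "of", "the", "to"})


def split_stopwords(text, stopwords=ILLUSTRATIVE_STOPWORDS):
    """Split the input into (searchable words, ignored stopwords)."""
    kept, dropped = [], []
    for word in text.split():
        if word.lower() in stopwords:
            dropped.append(word)
        else:
            kept.append(word)
    return kept, dropped
```

If the first list comes back empty, the full-text query would match nothing, so the application could fall back to another search method.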
To improve the ordering of results in boolean mode, you can use the following query.
Assuming the user's input string contains 2 words:
SELECT column_names, MATCH (text) AGAINST ('word1 word2')
AS col1 FROM table1
WHERE MATCH (text) AGAINST ('+word1 +word2' IN BOOLEAN MODE)
ORDER BY col1 DESC;
If the user's input string contains 3 words:
SELECT column_names, MATCH (text) AGAINST ('word1 word2 word3')
AS col1 FROM table1
WHERE MATCH (text) AGAINST ('+word1 +word2 +word3' IN BOOLEAN MODE)
ORDER BY col1 DESC;
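The first MATCH() gives us the score from the non-boolean search mode, which is more distinctive. The second MATCH() ensures that we really only get the results we want (rows containing all 3 words).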
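The two queries above differ only in the number of words, so the search strings can be generated from the user's input. Here is a minimal Python sketch under some assumptions: the function name build_ranked_search is hypothetical, the table and column defaults mirror the placeholders used above, SELECT * stands in for column_names, and the %s placeholders assume a driver that binds parameters that way. Boolean-mode operator characters are stripped from the input so the user cannot change the meaning of the expression.

```python
import re

# Characters with special meaning in MySQL boolean-mode search expressions.
_BOOLEAN_OPERATORS = re.compile(r'[+\-<>()~*"@]')


def build_ranked_search(user_input, table="table1", column="text"):
    """Return (sql, params) requiring every word, ordered by relevance.

    `table` and `column` must come from trusted code, never from the user.
    Returns None when the input contains no searchable word.
    """
    words = _BOOLEAN_OPERATORS.sub(" ", user_input).split()
    if not words:
        return None
    natural = " ".join(words)                     # e.g. 'word1 word2'
    required = " ".join("+" + w for w in words)   # e.g. '+word1 +word2'
    sql = (
        f"SELECT *, MATCH ({column}) AGAINST (%s) AS col1 "
        f"FROM {table} "
        f"WHERE MATCH ({column}) AGAINST (%s IN BOOLEAN MODE) "
        "ORDER BY col1 DESC"
    )
    return sql, (natural, required)
```

For the question's use case, the call might pass table="user" and column="username, lastName", provided a FULLTEXT index covers exactly those columns.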