Lucene problems searching hyphenated field
I'm having some problems with Lucene that are driving me crazy. I have the following field:
doc.Add(new Field("cataloguenumber", i.CatalogueNumber.ToLower(), Field.Store.YES, Field.Index.ANALYZED));
It will contain catalogue numbers such as:
- DF-GH5
- DF-FJ4
- DF-DOG
- AC-DP
- AC-123
- AC-DOCO
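That is, two characters followed by a hyphen followed by 2-5 alphanumeric characters.
I'm trying to run a boolean query to let users search the data: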
// Fields to search across.
string[] searchfields = new string[] { "cataloguenumber", "title", "author", "categories", "year", "length", "keyword", "description" };
// Build a boolean query that collects the search hits.
BooleanQuery mainQuery = new BooleanQuery();
QueryParser parser;
// Add a required clause for the main keyword.
parser = new MultiFieldQueryParser(Lucene.Net.Util.Version.LUCENE_30, searchfields, new StandardAnalyzer(Lucene.Net.Util.Version.LUCENE_30));
parser.AllowLeadingWildcard = true;
mainQuery.Add(parser.Parse(GetMainSearchQueryString(SearchPhrase)), Occur.MUST);
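The system works fine for every field except cataloguenumber, which for whatever reason doesn't work at all.
Ideally we'd like to be able to search by full or partial catalogue number, e.g. "DF-" should return all items prefixed with DF.
Does anyone know how I can do this?
Many thanks
Oli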
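A common source of problems is using different analyzers at index time and at query time. You should be able to get good results with StandardAnalyzer - it treats the text DF-GH5 as a single token, so you will be able to search with e.g. df-gh5 or df-*. Just make sure you use it for both the IndexWriter and the QueryParser.
Here is a simple example that builds an in-memory index with a single document and tries to query the index by cataloguenumber.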
using System;
using System.Linq;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.QueryParsers;
using Lucene.Net.Search;
using Lucene.Net.Store;

// Wrapper class added so the snippet compiles; the name is arbitrary.
public static class HyphenatedFieldExample
{
    public static void Test()
    {
        // Use an in-memory index.
        RAMDirectory indexDirectory = new RAMDirectory();
        // Make sure to use the same analyzer for indexing and for searching.
        Analyzer analyzer = new StandardAnalyzer(Lucene.Net.Util.Version.LUCENE_30);
        // Add a single document to the index.
        using (IndexWriter writer = new IndexWriter(indexDirectory, analyzer, IndexWriter.MaxFieldLength.UNLIMITED))
        {
            Document document = new Document();
            document.Add(new Field("content", "This is just some text", Field.Store.YES, Field.Index.ANALYZED));
            document.Add(new Field("cataloguenumber", "DF-GH5", Field.Store.YES, Field.Index.ANALYZED));
            writer.AddDocument(document);
        }
        // Parse queries against both fields with the analyzer used for indexing.
        var parser = new MultiFieldQueryParser(
            Lucene.Net.Util.Version.LUCENE_30,
            new[] { "cataloguenumber", "content" },
            analyzer);
        var searcher = new IndexSearcher(indexDirectory);
        DoSearch("df-gh5", parser, searcher);
        DoSearch("df-*", parser, searcher);
    }

    private static void DoSearch(string queryString, MultiFieldQueryParser parser, IndexSearcher searcher)
    {
        var query = parser.Parse(queryString);
        // Fetch at most the 10 best hits and print their stored fields.
        TopDocs docs = searcher.Search(query, 10);
        foreach (ScoreDoc scoreDoc in docs.ScoreDocs)
        {
            Document searchHit = searcher.Doc(scoreDoc.Doc);
            string cataloguenumber = searchHit.GetValues("cataloguenumber").FirstOrDefault();
            string content = searchHit.GetValues("content").FirstOrDefault();
            Console.WriteLine($"Found object: {cataloguenumber} {content}");
        }
    }
}
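One caveat, offered as an assumption rather than something the example above demonstrates: the classic tokenizer behind StandardAnalyzer is documented as splitting words at hyphens unless the token contains a digit. If that holds for your version, DF-GH5 stays whole, but a value such as DF-DOG or AC-DP may be indexed as two separate tokens, and df-* would then miss it. A possible alternative is to index cataloguenumber without analysis and match it with a PrefixQuery. The sketch below assumes Lucene.Net 3.0.3; the class and method names (CatalogueNumberSketch, FindByPrefix) and the sample values are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using Lucene.Net.Analysis;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.Search;
using Lucene.Net.Store;

// Hypothetical sketch: exact-value indexing plus prefix matching.
public static class CatalogueNumberSketch
{
    // Indexes a few sample catalogue numbers and returns those that
    // start with the given prefix (case-insensitive).
    public static List<string> FindByPrefix(string prefix)
    {
        var directory = new RAMDirectory();

        // NOT_ANALYZED fields bypass the analyzer; KeywordAnalyzer is passed
        // only because the IndexWriter constructor requires an analyzer.
        using (var writer = new IndexWriter(directory, new KeywordAnalyzer(), IndexWriter.MaxFieldLength.UNLIMITED))
        {
            foreach (string number in new[] { "DF-GH5", "DF-DOG", "AC-DP" })
            {
                var document = new Document();
                // Lowercase at index time, as the question already does with ToLower().
                document.Add(new Field("cataloguenumber", number.ToLowerInvariant(), Field.Store.YES, Field.Index.NOT_ANALYZED));
                writer.AddDocument(document);
            }
        }

        var results = new List<string>();
        var searcher = new IndexSearcher(directory);

        // PrefixQuery compares raw terms, so lowercase the input to match the index.
        var query = new PrefixQuery(new Term("cataloguenumber", prefix.ToLowerInvariant()));
        TopDocs docs = searcher.Search(query, 10);
        foreach (ScoreDoc scoreDoc in docs.ScoreDocs)
        {
            results.Add(searcher.Doc(scoreDoc.Doc).Get("cataloguenumber"));
        }
        return results;
    }
}
```

Calling CatalogueNumberSketch.FindByPrefix("DF-") should return the two lowercased DF entries. In the original code, a PrefixQuery built this way could be added to the existing BooleanQuery next to the parsed query for the other fields, with Occur.SHOULD or Occur.MUST depending on how the clauses should combine.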