Lucene BlockJoin query to match when all terms appear in either parent or child docs
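I have populated an index with parent and child documents using "Blocks", i.e. each block is added with the IndexWriter.addDocuments() method, with the parent document as the last document in the block.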
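For readers unfamiliar with the block layout, here is a plain-Java model (not Lucene code) of how a child is mapped to its parent when the parent is the last document of each block. `java.util.BitSet` stands in for the bitset that the `parentsFilter` produces, the `child` field values `Y`/`N` follow the question's code, and the class and method names are hypothetical.

```java
import java.util.BitSet;

/**
 * Plain-Java model (not Lucene code) of a block-join layout: each block is
 * written as its child docs followed by the parent, so a child's parent is
 * the first parent doc id at or after the child's own doc id.
 */
public class BlockLayoutDemo {

    /** Returns the doc id of the parent owning childDoc, or -1 if no parent follows it. */
    static int parentOf(BitSet parentBits, int childDoc) {
        return parentBits.nextSetBit(childDoc);
    }

    public static void main(String[] args) {
        // Hypothetical segment with two blocks:
        // docs 0,1 = children, doc 2 = parent; doc 3 = child, doc 4 = parent
        String[] childField = {"Y", "Y", "N", "Y", "N"};

        // Equivalent of the parents filter: mark every doc with child:N
        BitSet parentBits = new BitSet(childField.length);
        for (int doc = 0; doc < childField.length; doc++) {
            if (childField[doc].equals("N")) {
                parentBits.set(doc);
            }
        }

        // Prints: child 0 -> parent 2, child 1 -> parent 2, child 3 -> parent 4
        for (int doc = 0; doc < childField.length; doc++) {
            if (!parentBits.get(doc)) {
                System.out.println("child " + doc + " -> parent " + parentOf(parentBits, doc));
            }
        }
    }
}
```

The forward scan to the next parent bit is why the parent has to be the last document in each block.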
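Currently, I have only managed to search for 'Blocks' where any of the terms in the query appear in the parent or the children. This gives me skewed results. For example, my top results are blocks where one of the terms appears many times, but the other terms do not appear at all.
I want to search for 'Blocks' where all of the terms in the query must appear in the parent or the children.
But I'm not sure how to construct the query.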
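To make the two rules concrete, here is a plain-Java model (not Lucene code) of a single block, i.e. a parent plus its children. It assumes naive lower-case whitespace tokenization in place of EnglishAnalyzer, and all names and sample texts are hypothetical.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Plain-Java model (not Lucene code) of two matching rules for one block. */
public class BlockMatchDemo {

    /** Collects the lower-cased, whitespace-separated tokens of every doc in the block. */
    static Set<String> blockTerms(String parentText, List<String> childTexts) {
        Set<String> terms = new HashSet<>(Arrays.asList(parentText.toLowerCase().split("\\s+")));
        for (String child : childTexts) {
            terms.addAll(Arrays.asList(child.toLowerCase().split("\\s+")));
        }
        return terms;
    }

    /** Current behaviour: the block matches if ANY query term occurs anywhere in it. */
    static boolean matchesAny(Set<String> terms, List<String> queryTerms) {
        return queryTerms.stream().anyMatch(terms::contains);
    }

    /** Desired behaviour: EVERY query term occurs in the parent or in at least one child. */
    static boolean matchesAll(Set<String> terms, List<String> queryTerms) {
        return terms.containsAll(queryTerms);
    }

    public static void main(String[] args) {
        List<String> query = List.of("foo", "bar");

        // Hypothetical block: 'foo' repeated, 'bar' absent
        Set<String> skewed = blockTerms("foo foo foo", List.of("foo baz"));
        // Hypothetical block: 'foo' only in the parent, 'bar' only in a child
        Set<String> split = blockTerms("foo qux", List.of("baz", "bar"));

        System.out.println(matchesAny(skewed, query) + " " + matchesAll(skewed, query)); // true false
        System.out.println(matchesAny(split, query) + " " + matchesAll(split, query));   // true true
    }
}
```

The first block is the skewed case: it passes the any-term rule but fails the all-terms rule, while the second block passes both even though no single document contains both terms.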
My current query code is as follows:
Analyzer analyzer = new EnglishAnalyzer();
//Note, both parent and child docs have a 'textContent' field
QueryParser queryParser = new QueryParser("textContent", analyzer);
Directory index = FSDirectory.open(Paths.get("${indexParentDir}/${name}.lucene"));
BitSetProducer parentsFilter = new QueryBitSetProducer(new TermQuery(new Term("child", "N")));
Query textQuery = queryParser.parse("foo bar");
//Construct child query
BooleanQuery.Builder childQueryBuilder = new BooleanQuery.Builder();
childQueryBuilder.add(new BooleanClause(textQuery, BooleanClause.Occur.MUST));
childQueryBuilder.add(new BooleanClause(new TermQuery(new Term("child", "Y")), BooleanClause.Occur.MUST));
Query childQuery = new ToParentBlockJoinQuery(childQueryBuilder.build(), parentsFilter, ScoreMode.Avg);
//Construct parent query
BooleanQuery.Builder parentQueryBuilder = new BooleanQuery.Builder();
parentQueryBuilder.add(new BooleanClause(textQuery, BooleanClause.Occur.MUST));
parentQueryBuilder.add(new BooleanClause(new TermQuery(new Term("child", "N")), BooleanClause.Occur.MUST));
//Construct join of child and parent query
BooleanQuery.Builder childAndParentQueryBuilder = new BooleanQuery.Builder();
childAndParentQueryBuilder.add(new BooleanClause(childQuery, BooleanClause.Occur.SHOULD));
childAndParentQueryBuilder.add(new BooleanClause(parentQueryBuilder.build(), BooleanClause.Occur.SHOULD));
Query childAndParentQuery = childAndParentQueryBuilder.build();
//Run the query
DirectoryReader reader = DirectoryReader.open(index);
CheckJoinIndex.check(reader, parentsFilter);
IndexSearcher searcher = new IndexSearcher(reader);
searcher.search(childAndParentQuery, 10);
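The code above returns top-ranked results in which only one of the terms appears, many times. For example, 'foo' appears 100 times in the parent or child documents, but 'bar' does not appear at all.
I want to return only results where all of the terms (e.g. 'foo' and 'bar') appear in either the parent or one of its children.
One option is to add a field to the parent document that aggregates all the textContent fields of the parent and child documents, and to search only that aggregated field. However, these indexes are already quite large (e.g. 50 GB), and I still need textContent kept separately in the parent and the children for display, so adding an aggregated field would nearly double the size of the index.
Any help would be much appreciated.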
I solved this by joining the parent and child queries with a DisjunctionMaxQuery instead of a BooleanQuery.
From the documentation:
...We want the primary score to be the one associated with the highest
boost, not the sum of the field scores (as BooleanQuery would give).
If the query is "albino elephant" this ensures that "albino" matching
one field and "elephant" matching another gets a higher score than
"albino" matching both fields...
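The arithmetic behind the quoted example can be sketched in plain Java (not Lucene code). DisjunctionMaxQuery scores a document as the best disjunct score plus the tie-breaker multiplier times the scores of the other matching disjuncts. The per-field scores of 1.0 are hypothetical, and a tie-breaker of 0.0 is assumed so that only the best field counts.

```java
/**
 * Plain-Java arithmetic (not Lucene code) for the "albino elephant" example.
 * Every matching (field, term) pair gets a hypothetical score of 1.0.
 */
public class DisMaxArithmetic {

    /** DisMax-style combination: best score plus tieBreaker times the rest (scores assumed >= 0). */
    static double disMax(double tieBreaker, double... scores) {
        double max = 0, sum = 0;
        for (double s : scores) {
            max = Math.max(max, s);
            sum += s;
        }
        return max + tieBreaker * (sum - max);
    }

    /** BooleanQuery-style combination of SHOULD clauses: a plain sum. */
    static double sum(double... scores) {
        double total = 0;
        for (double s : scores) {
            total += s;
        }
        return total;
    }

    public static void main(String[] args) {
        // Doc A: "albino" in both title and body, no "elephant"
        // Doc B: "albino" in title, "elephant" in body
        // Arguments are {title, body} scores per term
        double boolA = sum(1.0, 1.0, 0.0, 0.0);
        double boolB = sum(1.0, 0.0, 0.0, 1.0);

        // One DisjunctionMaxQuery per term across the fields, summed by a BooleanQuery
        double dmA = disMax(0.0, 1.0, 1.0) + disMax(0.0, 0.0, 0.0);
        double dmB = disMax(0.0, 1.0, 0.0) + disMax(0.0, 0.0, 1.0);

        System.out.println("BooleanQuery sum: A=" + boolA + " B=" + boolB); // A=2.0 B=2.0
        System.out.println("per-term DisMax:  A=" + dmA + " B=" + dmB);     // A=1.0 B=2.0
    }
}
```

With a plain sum the two documents tie; with one DisjunctionMaxQuery per term, the document matching both terms ranks higher.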
Updated code:
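import java.nio.file.Paths;
import java.util.Arrays;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.en.EnglishAnalyzer;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.DisjunctionMaxQuery;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.join.BitSetProducer;
import org.apache.lucene.search.join.CheckJoinIndex;
import org.apache.lucene.search.join.QueryBitSetProducer;
import org.apache.lucene.search.join.ScoreMode; // join ScoreMode (Avg, Max, ...), not org.apache.lucene.search.ScoreMode
import org.apache.lucene.search.join.ToParentBlockJoinQuery;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

Analyzer analyzer = new EnglishAnalyzer();
//Note, both parent and child docs have a 'textContent' field
QueryParser queryParser = new QueryParser("textContent", analyzer);
Directory index = FSDirectory.open(Paths.get("${indexParentDir}/${name}.lucene"));
//Parent docs are marked child:N, child docs child:Y
BitSetProducer parentsFilter = new QueryBitSetProducer(new TermQuery(new Term("child", "N")));
Query textQuery = queryParser.parse("foo bar");
//Construct child query, joined up to the parent and averaged over matching children
BooleanQuery.Builder childQueryBuilder = new BooleanQuery.Builder();
childQueryBuilder.add(new BooleanClause(textQuery, BooleanClause.Occur.MUST));
childQueryBuilder.add(new BooleanClause(new TermQuery(new Term("child", "Y")), BooleanClause.Occur.MUST));
Query childQuery = new ToParentBlockJoinQuery(childQueryBuilder.build(), parentsFilter, ScoreMode.Avg);
//Construct parent query
BooleanQuery.Builder parentQueryBuilder = new BooleanQuery.Builder();
parentQueryBuilder.add(new BooleanClause(textQuery, BooleanClause.Occur.MUST));
parentQueryBuilder.add(new BooleanClause(new TermQuery(new Term("child", "N")), BooleanClause.Occur.MUST));
Query parentQuery = parentQueryBuilder.build();
//Construct join of child and parent query; 0.5f is the tie-breaker multiplier for the lower-scoring branch
Query childAndParentQuery = new DisjunctionMaxQuery(Arrays.asList(childQuery, parentQuery), 0.5f);
//Run the query
DirectoryReader reader = DirectoryReader.open(index);
CheckJoinIndex.check(reader, parentsFilter);
IndexSearcher searcher = new IndexSearcher(reader);
searcher.search(childAndParentQuery, 10);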
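A plain-Java model (not Lucene code) of what the DisjunctionMaxQuery above computes per parent document may help. DisjunctionMaxQuery returns the union of the documents matched by its disjuncts, so a parent matched by either the child branch or the parent branch is returned, and its score becomes the higher branch score plus 0.5 times the other. The doc ids and branch scores are hypothetical, and the class name is illustrative only.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

/**
 * Plain-Java model (not Lucene code) of
 * DisjunctionMaxQuery(Arrays.asList(childQuery, parentQuery), 0.5f).
 */
public class ParentChildDisMaxModel {

    static final double TIE_BREAKER = 0.5;

    /** Combines branch scores per parent id; a missing entry means that branch did not match. */
    static Map<Integer, Double> combine(Map<Integer, Double> childBranch, Map<Integer, Double> parentBranch) {
        TreeSet<Integer> parents = new TreeSet<>(childBranch.keySet());
        parents.addAll(parentBranch.keySet()); // union: matching either branch is enough
        Map<Integer, Double> result = new TreeMap<>();
        for (int parent : parents) {
            double c = childBranch.getOrDefault(parent, 0.0);
            double p = parentBranch.getOrDefault(parent, 0.0);
            double max = Math.max(c, p);
            result.put(parent, max + TIE_BREAKER * (c + p - max));
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical branch scores keyed by parent doc id
        Map<Integer, Double> childBranch = Map.of(2, 3.0, 7, 1.0);
        Map<Integer, Double> parentBranch = Map.of(2, 1.0, 9, 2.0);

        System.out.println(combine(childBranch, parentBranch)); // {2=3.5, 7=1.0, 9=2.0}
    }
}
```

The earlier SHOULD/SHOULD BooleanQuery matched the same union of parents, so the difference between the two versions lies in how the branch scores are combined.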