SOLR 8.1.1 EdgeNGramFilterFactory parsing query
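I have a SOLR 4.10.2 core that I am upgrading to 8.1.1.
I manually created an 8.1.1 core using the default_config settings, then carried my settings over into the 8.1.1 schema.
I have adjusted schema.xml and solrconfig.xml, and the core is queryable in 8.1.1.
I have a field named Company: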
<field name="Company" type="string" indexed="true" stored="true"/>
<field name="IDX_Company" type="text_general" indexed="true" stored="false" multiValued="true" />
<copyField source="Company" dest="IDX_Company"/>
In 4.10.2, I run the query:
IDX_Company:blue
With debugQuery turned on, I see the query parsed into multiple parts (correctly):
"debug": {
"rawquerystring": "IDX_Company:blue",
"querystring": "IDX_Company:blue",
"parsedquery": "(IDX_Company:b IDX_Company:bl IDX_Company:blu IDX_Company:blue)/no_coord",
...
When I run the same query against 8.1.1 with debugQuery enabled, I get the following:
"debug":{
"rawquerystring":"IDX_Company:blue",
"querystring":"IDX_Company:blue",
"parsedquery":"IDX_Company:blue",
...
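It looks like EdgeNGramFilterFactory is not being applied. Following the documentation, the only change I made to the EdgeNGramFilterFactory configuration was removing the "side" attribute.
Also following the documentation, I replaced SynonymFilterFactory with SynonymGraphFilterFactory and added FlattenGraphFilterFactory.
I tried removing FlattenGraphFilterFactory, I cleared and repopulated the core (reindexed), and I stopped and restarted SOLR 8.1.1. None of this made a difference.
Here is the definition of text_general that I use in schema.xml: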
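For reference, the 4.10.2 parse is exactly what front edge n-grams with minGramSize="1" and maxGramSize="15" should yield for the term "blue". The minimal Python sketch below emulates that gram generation for a single token. It is an illustration written for this post, not Lucene's EdgeNGramTokenFilter code, and it assumes the default preserveOriginal="false" (tokens shorter than minGramSize produce nothing).

```python
def edge_ngrams(token: str, min_gram: int = 1, max_gram: int = 15) -> list[str]:
    """Emulate the front edge n-grams emitted for one token.

    Assumes preserveOriginal=false: a token shorter than min_gram yields
    no grams, and grams longer than max_gram are never produced.
    """
    if min_gram < 1 or max_gram < min_gram:
        raise ValueError("require 1 <= min_gram <= max_gram")
    # Python strings index by code point, so supplementary characters
    # are never split in half.
    chars = list(token)
    upper = min(max_gram, len(chars))
    # One gram per prefix length from min_gram up to upper (inclusive).
    return ["".join(chars[:n]) for n in range(min_gram, upper + 1)]


if __name__ == "__main__":
    # Terms the query analyzer should produce for IDX_Company:blue
    for gram in edge_ngrams("blue"):
        print(f"IDX_Company:{gram}")
```

Running edge_ngrams("blue") gives ["b", "bl", "blu", "blue"], the same four terms as the 4.10.2 parsedquery, so a single IDX_Company:blue term in 8.1.1 suggests the query analyzer actually in use is not the one defined in schema.xml.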
<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100" multiValued="true">
<analyzer type="index">
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="15"/> <!-- RDH - removed side="front"-->
<filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
<!-- RDH SynonymFilterFactory has been deprecated, replace with SynonymGraphFilterFactory -->
<filter class="solr.SynonymGraphFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
<!-- RDH https://lucene.apache.org/solr/guide/8_1/filter-descriptions.html
Flatten Graph Filter
This filter must be included on INDEX-time analyzer specifications that include at least one graph-aware filter, including Synonym Graph Filter and Word Delimiter Graph Filter.
-->
<filter class="solr.FlattenGraphFilterFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
<!-- strip all punctuation -->
<filter class="solr.PatternReplaceFilterFactory" pattern="[^\p{L}\p{N} ]" replacement=" " replace="all" /> <!-- RDH -->
</analyzer>
<analyzer type="query">
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="15"/> <!-- RDH - removed side="front"-->
<filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
<!-- RDH SynonymFilterFactory is deprecated, replace with SynonymGraphFilterFactory -->
<filter class="solr.SynonymGraphFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
<!-- RDH https://lucene.apache.org/solr/guide/8_1/filter-descriptions.html
Flatten Graph Filter
This filter must be included on INDEX-time analyzer specifications that include at least one graph-aware filter, including Synonym Graph Filter and Word Delimiter Graph Filter.
-->
<filter class="solr.FlattenGraphFilterFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
<!-- strip all punctuation -->
<filter class="solr.PatternReplaceFilterFactory" pattern="[^\p{L}\p{N} ]" replacement=" " replace="all" /> <!-- RDH -->
</analyzer>
</fieldType>
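One way to check which analysis chain the running core is actually using is Solr's field analysis handler (/analysis/field, an implicit request handler in Solr 8), which shows the output of every tokenizer and filter stage. The Python sketch below builds and fetches that request; the base URL http://localhost:8983/solr and the core name mycore are assumptions for illustration, so substitute your own.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import urlopen


def field_analysis_url(base_url: str, core: str, field: str, query: str) -> str:
    """Build a /analysis/field request that runs `query` through the
    query-time analyzer of `field` in the given core."""
    params = urlencode({
        "analysis.fieldname": field,
        "analysis.query": query,
        "wt": "json",
    })
    return f"{base_url.rstrip('/')}/{quote(core)}/analysis/field?{params}"


def fetch_field_analysis(url: str, timeout: float = 10.0) -> dict:
    """Fetch the analysis response (requires a running Solr instance)."""
    with urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # Hypothetical local core name; adjust to your setup.
    url = field_analysis_url("http://localhost:8983/solr", "mycore",
                             "IDX_Company", "blue")
    print(url)
    print(json.dumps(fetch_field_analysis(url), indent=2))
```

If the EdgeNGramFilter stage is missing from the query-side output, or it emits only "blue", the loaded schema most likely differs from the edited schema.xml on disk.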
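Although I had reloaded the data by clearing it and posting it to the core again, I had neglected to go to the Core Admin page, select the core, and click the Reload button.
After reloading the core, the query is now parsed as expected.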
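The same reload can be triggered without the admin UI through the CoreAdmin API (action=RELOAD). The Python sketch below assumes a standalone, non-SolrCloud core, a base URL of http://localhost:8983/solr and a hypothetical core name mycore; in SolrCloud the Collections API RELOAD action would be used instead.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def core_reload_url(base_url: str, core: str) -> str:
    """Build a CoreAdmin RELOAD request for a standalone core."""
    params = urlencode({"action": "RELOAD", "core": core, "wt": "json"})
    return f"{base_url.rstrip('/')}/admin/cores?{params}"


def reload_core(base_url: str, core: str, timeout: float = 30.0) -> dict:
    """Ask Solr to reload `core` so edited schema.xml/solrconfig.xml
    take effect. Requires a running Solr instance."""
    with urlopen(core_reload_url(base_url, core), timeout=timeout) as resp:
        return json.load(resp)


if __name__ == "__main__":
    result = reload_core("http://localhost:8983/solr", "mycore")
    # A responseHeader status of 0 indicates the reload succeeded.
    print(result.get("responseHeader", {}).get("status"))
```

Because the index-time analyzer also changed (EdgeNGram grams are written into the index), reindexing after the reload is still needed so stored terms match the new chain.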