How does Solr rank documents?

I indexed my document text in Solr with the following configuration:

<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
        <analyzer type="index">
            <tokenizer class="solr.StandardTokenizerFactory" />
            <filter class="solr.ASCIIFoldingFilterFactory"/>
            <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
            <!-- in this example, we will only use synonyms at query time <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/> -->
            <filter class="solr.LowerCaseFilterFactory" />
        </analyzer>
        <analyzer type="query">
            <tokenizer class="solr.StandardTokenizerFactory" />
            <filter class="solr.ASCIIFoldingFilterFactory"/>
            <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
            <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true" />
            <filter class="solr.LowerCaseFilterFactory" />
        </analyzer>
</fieldType>

<field name="desc" type="text_general" indexed="true" stored="true" multiValued="false"/>

and a test query

desc:Alabama Crimson Tide Toddler Crimson Team Logo Flannel Pajama Pants

returns the following top 2 results:

[
  {
    "id":"_:node1b897e5ffccc354e5da5128066e2e9e4|https://www.crookscountry.com/product/alabama-greatest-hits",
    "name":"Alabama - Greatest Hits",
    "source_entity_index":"prod03",
    "category":"",
    "category_str":"",
    "desc":"Alabama ~ Alabama - Greatest Hits",
    "host":"www.crookscountry.com",
    "url":"https://www.crookscountry.com/product/alabama-greatest-hits",
    "_version_":1652845859059007489
  },
  {
    "id":"_:noded8c4ca8e98bb12e1132af18c76f277b|https://shop.spreadshirt.com/thatshirtcray/amateur+sketch+shirt-A12174934",
    "name":"Amateur Sketch Shirt | Men's T-Shirt",
    "source_entity_index":"prod03",
    "category":"",
    "category_str":"",
    "desc":"Leprechaun in Alabama amateur sketch.",
    "host":"shop.spreadshirt.com",
    "url":"https://shop.spreadshirt.com/thatshirtcray/amateur+sketch+shirt-A12174934",
    "_version_":1652846254331265025
  }
]

However, the documents I actually want ranked highly only appear after the top 100, for example:

[
  {
    "id":"_:nodec65a89504cb5f3af808caf654ac7cb72|http://shop.rolltide.com/Alabama_Crimson_Tide_Sweatshirts_And_Fleece_Sweaters",
    "host":"shop.rolltide.com",
    "name":"Men's Crimson Alabama Crimson Tide Big Logo Sweater",
    "text":"Show off your team spirit with this Alabama Crimson Tide Big Logo sweater.",
    "_version_":1646377538225700866
  },
  {
    "id":"_:nodeebc0adb5a11937556ebdf77132fab580|http://shop.foxsports.com/FOX_Alabama_Crimson_Tide_Sweaters_And_Dress_Shirts",
    "host":"shop.foxsports.com",
    "name":"Men's Crimson Alabama Crimson Tide Big Logo Sweater",
    "text":"Show off your team spirit with this Alabama Crimson Tide Big Logo sweater.",
    "_version_":1646383652576165892
  }
]

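I don't really understand how Solr's default ranking works... it seems to favour short text, even if there is only one overlapping word with the query. Is there any way I can change this to suit my needs?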

Many thanks!

Solr's document ranking relies on Lucene's Similarity.
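As a minimal sketch, assuming Solr 6.x or later (where the default similarity is BM25) and a schema that declares no global `<similarity>` (so the implicit SchemaSimilarityFactory is in use), the similarity can be configured per field type. The type name `text_general_bm25` and the parameter values are hypothetical; in BM25, `b` controls document-length normalization, so `b=0.0` switches it off while `k1` keeps its usual value of 1.2.

```xml
<!-- Hypothetical field type: same analyzers as text_general, plus an explicit BM25 similarity -->
<fieldType name="text_general_bm25" class="solr.TextField" positionIncrementGap="100">
    <analyzer type="index">
        <tokenizer class="solr.StandardTokenizerFactory" />
        <filter class="solr.ASCIIFoldingFilterFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
        <filter class="solr.LowerCaseFilterFactory" />
    </analyzer>
    <analyzer type="query">
        <tokenizer class="solr.StandardTokenizerFactory" />
        <filter class="solr.ASCIIFoldingFilterFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true" />
        <filter class="solr.LowerCaseFilterFactory" />
    </analyzer>
    <!-- BM25 with length normalization disabled (b=0.0); k1 left at 1.2 -->
    <similarity class="solr.BM25SimilarityFactory">
        <float name="k1">1.2</float>
        <float name="b">0.0</float>
    </similarity>
</fieldType>

<!-- Point the desc field at the hypothetical type -->
<field name="desc" type="text_general_bm25" indexed="true" stored="true" multiValued="false"/>
```

Compared with dropping norms entirely, this keeps the norms in the index and only changes how the score uses them, so `b` can be tuned between 0.0 and 1.0 rather than being all-or-nothing.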

it seems that it favours short text, even if there is only one overlapping word with the query

This behaviour is caused by field-length normalization. You can set omitNorms=true to disable field-length normalization (see https://lucene.apache.org/solr/guide/6_6/field-type-definitions-and-properties.html#field-default-properties).
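For the `desc` field from the question, this could look like the sketch below, with `omitNorms` set on the field itself (it can also be set on the field type). Norms are computed at index time, so the existing documents would need to be reindexed for the change to take effect.

```xml
<!-- desc field with norms disabled: field length no longer affects the score -->
<field name="desc" type="text_general" indexed="true" stored="true" multiValued="false" omitNorms="true"/>
```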

For a more in-depth explanation, see this post.

Alternatively (or additionally), with the (e)dismax parser you can use the mm (a.k.a. MinimumShouldMatch) parameter to tune not the ranking, but how Solr matches documents.
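A minimal solrconfig.xml sketch, assuming the standard `/select` handler is used; the `mm` value of `75%` is an arbitrary illustration, not a recommendation. With `qf` set to `desc`, the query would be sent as plain terms (`q=Alabama Crimson Tide Toddler Crimson Team Logo Flannel Pajama Pants`) without the `desc:` prefix.

```xml
<!-- solrconfig.xml: hypothetical defaults for the /select handler -->
<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <!-- use the Extended DisMax query parser -->
    <str name="defType">edismax</str>
    <!-- search the desc field from the question's schema -->
    <str name="qf">desc</str>
    <!-- require at least 75% of the optional query clauses to match (illustrative value) -->
    <str name="mm">75%</str>
  </lst>
</requestHandler>
```

The same `defType`, `qf` and `mm` values can also be passed as request parameters instead of handler defaults, which makes it easier to experiment before changing solrconfig.xml.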