Solr is giving wrong field length

My feature list is as follows:

[
   {
    "store": "myfeature_store",
    "name" : "titleLength",
    "class" : "org.apache.solr.ltr.feature.FieldLengthFeature",
    "params" : {
    "field":"title" 
     }
   }
]

When I run the following query:

curl -g 'http://localhost:8983/solr/nutch/select?indent=on&q=python&wt=json&fl=title,score,[features%20efi.query=python%20store=myfeature_store]'

I get the following result:

{
  "responseHeader":{
    "status":0,
    "QTime":8,
    "params":{
      "q":"python",
      "indent":"on",
      "fl":"title,score,[features efi.query=python store=myfeature_store]",
      "wt":"json"}},
  "response":{"numFound":793,"start":0,"maxScore":0.33828905,"docs":[
      {
        "title":"Newest 'python' Questions - Stack Overflow",
        "score":0.33828905,
        "[features]":"titleLength=1820.4445"},
      {
        "title":"Newest 'python-3.x' Questions - Stack Overflow",
        "score":0.14434122,
        "[features]":"titleLength=5349.8774"},
      {
        "title":"Geographic Information Systems Stack Exchange",
        "score":0.08331977,
        "[features]":"titleLength=1820.4445"},
      {
        "title":"Stack Overflow em Português",
        "score":0.08331977,
        "[features]":"titleLength=1820.4445"},
      {
        "title":"Stack Overflow en español",
        "score":0.07460209,
        "[features]":"titleLength=2621.44"},
      {
        "title":"Hot Questions - Stack Exchange",
        "score":0.06534503,
        "[features]":"titleLength=655.36"},
      {
        "title":"Code Review Stack Exchange",
        "score":0.05356382,
        "[features]":"titleLength=1820.4445"},
      {
        "title":"Software Recommendations Stack Exchange",
        "score":0.05356382,
        "[features]":"titleLength=1820.4445"},
      {
        "title":"Raspberry Pi Stack Exchange",
        "score":0.042962566,
        "[features]":"titleLength=1820.4445"},
      {
        "title":"Welcome to The Apache Software Foundation!",
        "score":0.042862184,
        "[features]":"titleLength=455.1111"}]
  }}

As you can see, titleLength is completely wrong. For example, for the last result the title is "Welcome to The Apache Software Foundation!", so titleLength should be 5, but it is 455.1111. What could the problem be?

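The titleLength feature is computed from the norm stored for the field, not from a literal term count. Norms are stored as a single byte and decoded through a lookup table of floats with 256 possible values. These values are not expected to be exact (since the length of a field can be larger than 256); the point is to map the whole space of 2^31 integer values into a single byte.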
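To make the lookup table concrete, here is a rough Python sketch of how such a 256-entry table could be built. It assumes the one-byte float encoding of older Lucene versions (the scheme behind SmallFloat.byte315ToFloat) and assumes the feature reports 1 / norm^2. Both are assumptions on my part, although they do reproduce the values in the response above. The names byte315_to_float, NORM_TABLE and closest_byte are made up for this illustration and are not Solr API.

```python
import struct

def byte315_to_float(b: int) -> float:
    """Decode one norm byte into a float (assumed legacy Lucene layout)."""
    if b == 0:
        return 0.0
    # Place the byte inside a 32-bit float bit pattern and re-bias the exponent.
    bits = ((b & 0xFF) << 21) + (48 << 24)
    return struct.unpack(">f", struct.pack(">I", bits))[0]

# Assumed feature value per norm byte: 1 / norm^2, with byte 0 mapped to 0.
NORM_TABLE = [0.0] + [1.0 / byte315_to_float(i) ** 2 for i in range(1, 256)]

def closest_byte(feature_value: float) -> int:
    """Return the norm byte whose table entry is nearest to feature_value."""
    return min(range(256), key=lambda i: abs(NORM_TABLE[i] - feature_value))

# Feature values copied from the response in the question.
for value in (1820.4445, 5349.8774, 2621.44, 655.36, 455.1111):
    b = closest_byte(value)
    print(value, "-> byte", b, "-> table entry", NORM_TABLE[b])
```

Every titleLength in the response lands on one of these table entries, which is why the same few values keep repeating across documents whose titles have different lengths.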

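The norm also includes any index-time boost, so if a field is boosted at index time (for example by a Nutch plugin), that is reflected in the norm stored for the field. You can't rely on titleLength being the exact number of terms stored in that field for the document; what it really represents is the field's "boost".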
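As a back-of-the-envelope check of the boost effect, the sketch below assumes the norm is roughly boost / sqrt(numTerms) (the classic Lucene length norm) and that the feature reports 1 / norm^2, so the value is approximately numTerms / boost^2. It ignores the one-byte rounding, and implied_boost is a hypothetical helper, not something Solr provides.

```python
import math

def implied_boost(num_terms: int, feature_value: float) -> float:
    """Estimate the index-time boost from a term count and a feature value.

    Assumes feature_value ~= num_terms / boost**2 and ignores the
    precision lost when the norm is squeezed into one byte.
    """
    return math.sqrt(num_terms / feature_value)

# Last document in the question: 5 terms expected, titleLength=455.1111.
print(implied_boost(5, 455.1111))
```

Under these assumptions the last document would have been indexed with a boost of roughly 0.1 rather than 1.0, which would be consistent with something like a Nutch scoring plugin setting boosts at index time.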