Wrong Java to Solr type mapping with Schemaless Collection

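I'm using SolrJ to index POJOs into Solr, and String properties that hold numeric values are mapped to the org.apache.solr.schema.TrieLongField type, which in turn causes a BindingException when I try to retrieve the documents from Solr.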

My class is annotated with @Field on the setters, and I add the documents with client.addBean(object).

The following code reproduces the issue:

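import org.apache.solr.client.solrj.beans.Field;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.impl.XMLResponseParser;

public class SolrIndexTest {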
    @Field
    public Long longField;
    @Field
    public String stringField;

    public static void main(String[] args) {
        //test core created with the following command
        //sudo su - solr -c  "/opt/solr/bin/solr create -c test -n data_driven_schema_configs"

        HttpSolrClient client = new HttpSolrClient.Builder("http://localhost:8983/solr/test").build();
        client.setParser(new XMLResponseParser());

        SolrIndexTest obj1 = new SolrIndexTest();
        obj1.longField = 1L;
        obj1.stringField = "1"; // 1st doc: numeric value
        SolrIndexTest obj2 = new SolrIndexTest();
        obj2.longField = 2L;
        obj2.stringField = "Text string"; // 2nd doc: text value

        try {
            client.addBean(obj1);
            client.commit();
        } catch (Exception e) {
            e.printStackTrace();
        }
        try {
            client.addBean(obj2); // This line will throw a BindingException
            client.commit();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}

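When you run a Solr collection in Schemaless mode, the field type (double, integer, string, etc.) is determined either by a suffix added to the field name or by guessing from the field value. Parsers for Boolean, Integer, Long, Float, Double, and Date are currently available, but there is none for String. That is why the value "1" in your first document is guessed to be a number, and the field is created with a numeric type.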

Schemaless Mode is a set of Solr features that, when used together, allow users to rapidly construct an effective schema by simply indexing sample data, without having to manually edit the schema. These Solr features, all controlled via solrconfig.xml, are:

  1. Managed schema: Schema modifications are made at runtime through Solr APIs, which requires the use of schemaFactory that supports these changes - see Schema Factory Definition in SolrConfig for more details.
  2. Field value class guessing: Previously unseen fields are run through a cascading set of value-based parsers, which guess the Java class of field values - parsers for Boolean, Integer, Long, Float, Double, and Date are currently available.
  3. Automatic schema field addition, based on field value class(es): Previously unseen fields are added to the schema, based on field value Java classes, which are mapped to schema field types - see Solr Field Types.
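
To make the "cascading set of value-based parsers" from item 2 more concrete, here is a minimal sketch of the idea in plain Java. It is not Solr's actual implementation: the class name FieldTypeGuessSketch, the method guess, and the parser order (Boolean, Long, Double, Date) are assumptions chosen to match the behaviour described in the question, where "1" ends up as a long.

```java
import java.time.Instant;
import java.time.format.DateTimeParseException;
import java.util.Date;

// Hypothetical illustration of value-based type guessing; NOT Solr's code.
class FieldTypeGuessSketch {

    /** Tries each parser in turn and returns the first Java class that accepts the value. */
    static Class<?> guess(String raw) {
        String v = raw.trim();
        if (v.equalsIgnoreCase("true") || v.equalsIgnoreCase("false")) {
            return Boolean.class;
        }
        try {
            Long.parseLong(v);
            return Long.class;
        } catch (NumberFormatException ignored) {
            // not an integral number, fall through to the next parser
        }
        try {
            Double.parseDouble(v);
            return Double.class;
        } catch (NumberFormatException ignored) {
            // not a floating-point number, fall through
        }
        try {
            Instant.parse(v); // ISO-8601 instant, e.g. 2016-01-01T00:00:00Z
            return Date.class;
        } catch (DateTimeParseException ignored) {
            // not a date, fall through
        }
        // No parser matched, so the value stays a plain string.
        return String.class;
    }

    public static void main(String[] args) {
        // The first document decides the field type, so "1" wins over "Text string".
        System.out.println(guess("1").getSimpleName());
        System.out.println(guess("Text string").getSimpleName());
    }
}
```

With a cascade like this, the first document's stringField value "1" is accepted by the Long parser before anything treats it as text, so the field is created as a long and the second document's "Text string" no longer fits.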

In short, if you want your field types to be mapped correctly, just add the right suffix:

@Field
public Long longField_l; // _l stands for long
@Field
public String stringField_s; // _s stands for string

You will then see the expected result:

<doc>
    <long name="longField_l">1</long>
    <str name="stringField_s">1</str>
</doc>
<doc>
    <long name="longField_l">2</long>
    <str name="stringField_s">Text string</str>
</doc>

Finally, if you open the managed-schema file, you will see the list of dynamic fields used to map the types. I've copied some of them here:

<dynamicField name="*_i" type="int" indexed="true" stored="true"/>
<dynamicField name="*_s" type="string" indexed="true" stored="true"/>
<dynamicField name="*_l" type="long" indexed="true" stored="true"/>
<dynamicField name="*_t" type="text_general" indexed="true" stored="true"/>
<dynamicField name="*_b" type="boolean" indexed="true" stored="true"/>
<dynamicField name="*_f" type="float" indexed="true" stored="true"/>
<dynamicField name="*_d" type="double" indexed="true" stored="true"/>
<dynamicField name="*_p" type="location" indexed="true" stored="true"/>
<dynamicField name="*_c" type="currency" indexed="true" stored="true"/>