Why does Jackson wrap a string containing a comma in quotes when rebuilding the schema used to build a CSV record line?

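I am fetching data from a repository and writing it to a CSV file. To build each record line, I use Jackson. My goal is to wrap a field (of String type) in double quotes if its value contains a comma. So the output should look like this: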

some-uuid-value,some string without comma,SOMETHING,123456,www.some.url,etc
some-uuid-value,"some string, but with comma",SOMETHING,123456,www.some.url,etc
some-uuid-value,some string without comma,SOMETHING,123456,www.some.url,etc

I came up with this code:

private String toCsvString(EntityCsvRecord entity) {

        CsvMapper mapper = new CsvMapper();
        CsvSchema schema = mapper.schemaFor(EntityCsvRecord.class).withoutQuoteChar();

        if (entity.getName() == null) {
            entity.setName("");
        }

        if (entity.getName().contains(",")) {
            String columnName = "name";
            int nameColumnIndex = schema.column(columnName).getIndex();
            schema = mapper
                .configure(CsvGenerator.Feature.STRICT_CHECK_FOR_QUOTING, true)
                .schemaFor(EntityCsvRecord.class)
                .rebuild()
                .replaceColumn(nameColumnIndex, new CsvSchema.Column(nameColumnIndex, columnName))
                .build();
        }

        try {
            return mapper.writer(schema).writeValueAsString(entity);
        } catch (Exception e) {
            ...
        }
    }

However, I don't understand why this works, and I could not find any decent clue in the documentation.

Can anyone unravel this mystery?

The whole trick is enabling the CsvGenerator.Feature.STRICT_CHECK_FOR_QUOTING feature. From the documentation:

Feature that determines how much work is done before determining that a column value requires quoting: when set as true, full check is made to only use quoting when it is strictly necessary; but when false, a faster but more conservative check is made, and possibly quoting is used for values that might not need it. Trade-offs is basically between optimal/minimal quoting (true), and faster handling (false). Faster check involves only checking first N characters of value, as well as possible looser checks.

Note, however, that regardless setting, all values that need to be quoted will be: it is just that when set to false, other values may also be quoted (to avoid having to do more expensive checks).

Default value is false for "loose" (approximate, conservative) checking.
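To make the "strict" idea concrete, here is a minimal plain-Java sketch of a full-scan quoting check. It is only an illustration of the concept described in the quote above, not Jackson's actual implementation; the class QuotingSketch and its methods are hypothetical names, and the assumed rule (quote only when the value contains the separator, the quote character, or a line break) is a simplification.

```java
public class QuotingSketch {

    // Hypothetical helper, not part of Jackson: scans the whole value and
    // reports whether it contains a character that forces quoting.
    public static boolean needsQuoting(String value, char separator, char quoteChar) {
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            if (c == separator || c == quoteChar || c == '\n' || c == '\r') {
                return true;
            }
        }
        return false;
    }

    // Wraps the value in double quotes only when the full scan says it is
    // required; embedded double quotes are escaped by doubling them.
    public static String quoteIfNeeded(String value) {
        if (!needsQuoting(value, ',', '"')) {
            return value;
        }
        return "\"" + value.replace("\"", "\"\"") + "\"";
    }

    public static void main(String[] args) {
        System.out.println(quoteIfNeeded("Na,me")); // prints "Na,me" (with the quotes)
        System.out.println(quoteIfNeeded("Name"));  // prints Name
    }
}
```

A "loose" check, by contrast, would be allowed to give up early and quote a value it has not fully inspected, which is why extra quotes can appear when the feature is disabled.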

You can remove all the other schema and mapper configuration and it will work the same way. You can simplify it to the following code:

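import java.io.IOException;

import com.fasterxml.jackson.dataformat.csv.CsvGenerator;
import com.fasterxml.jackson.dataformat.csv.CsvMapper;
import com.fasterxml.jackson.dataformat.csv.CsvSchema;

class CsvEntityGenerator {

    private final CsvMapper mapper;
    private final CsvSchema schema;

    public CsvEntityGenerator(Class<?> clazz) {
        mapper = new CsvMapper();
        // Quote a value only when it is strictly necessary (e.g. it contains a comma).
        mapper.enable(CsvGenerator.Feature.STRICT_CHECK_FOR_QUOTING);

        // Write null fields as empty strings instead of mutating the entity.
        schema = mapper.schemaFor(clazz).withNullValue("");
    }

    public String toCsvString(Object entity) throws IOException {
        // Mapper and schema are built once in the constructor and reused here.
        return mapper.writer(schema).writeValueAsString(entity);
    }
}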

Simple usage:

CsvEntityGenerator gen = new CsvEntityGenerator(EntityCsvRecord.class);
System.out.print(gen.toCsvString(new EntityCsvRecord("Na,me")));
System.out.print(gen.toCsvString(new EntityCsvRecord(null)));
System.out.print(gen.toCsvString(new EntityCsvRecord("Name")));

Prints:

8b572b1b-17c1-429d-887b-ec9af1c30d05,"Na,me",SOMETHING,123456,www.some.url
e86eacb1-d45e-4614-91bb-45f0d8840ea9,,SOMETHING,123456,www.some.url
e9627c32-6736-44a5-8eb2-7d153f86af20,Name,SOMETHING,123456,www.some.url

As you can see, we create the CsvMapper and CsvSchema only once and reuse them whenever we want to serialize an entity. This is the faster approach.