Why does Jackson wrap a string containing a comma in quotes when rebuilding the schema used to build a CSV record line?
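I'm fetching data from a repository and putting it into a CSV file. To build a record line, I use Jackson. My goal is to wrap a field (of type String) in double quotes only if its value contains a comma. So the output should look like this:
some-uuid-value,some string without comma,SOMETHING,123456,www.some.url,etc
some-uuid-value,"some string, but with comma",SOMETHING,123456,www.some.url,etc
some-uuid-value,some string without comma,SOMETHING,123456,www.some.url,etc
I came up with this code: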
private String toCsvString(EntityCsvRecord entity) {
    CsvMapper mapper = new CsvMapper();
    CsvSchema schema = mapper.schemaFor(EntityCsvRecord.class).withoutQuoteChar();
    if (entity.getName() == null) {
        entity.setName("");
    }
    if (entity.getName().contains(",")) {
        String columnName = "name";
        int nameColumnIndex = schema.column(columnName).getIndex();
        schema = mapper
                .configure(CsvGenerator.Feature.STRICT_CHECK_FOR_QUOTING, true)
                .schemaFor(EntityCsvRecord.class)
                .rebuild()
                .replaceColumn(nameColumnIndex, new CsvSchema.Column(nameColumnIndex, columnName))
                .build();
    }
    try {
        return mapper.writer(schema).writeValueAsString(entity);
    } catch (Exception e) {
        ...
    }
}
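However, I don't understand why this works, and I couldn't find any decent clue in the documentation.
Can anyone unravel this mystery?
The whole trick is enabling the CsvGenerator.Feature.STRICT_CHECK_FOR_QUOTING feature. From the documentation: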
Feature that determines how much work is done before determining that a column value requires quoting: when set as true, full check is made to only use quoting when it is strictly necessary; but when false, a faster but more conservative check is made, and possibly quoting is used for values that might not need it. Trade-offs is basically between optimal/minimal quoting (true), and faster handling (false). Faster check involves only checking first N characters of value, as well as possible looser checks.
Note, however, that regardless setting, all values that need to be quoted will be: it is just that when set to false, other values may also be quoted (to avoid having to do more expensive checks).
Default value is false for "loose" (approximate, conservative) checking.
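To make "only use quoting when it is strictly necessary" concrete, here is a minimal plain-Java sketch of a full per-character check. It is not Jackson's implementation: the class and method names (MinimalQuoting, needsQuoting, quoteIfNeeded, toCsvLine) are hypothetical, and the set of characters treated as requiring quotes (comma, double quote, line breaks) is an assumption for illustration; Jackson's own check may consider more.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Illustration only: a hand-rolled "strict" quoting check, not Jackson's code.
class MinimalQuoting {

    // Full scan of the value: quoting is needed only if a special character occurs.
    // Assumption: comma, double quote, CR and LF are the characters that matter.
    static boolean needsQuoting(String value) {
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            if (c == ',' || c == '"' || c == '\n' || c == '\r') {
                return true;
            }
        }
        return false;
    }

    // Quotes the value only when required; embedded quotes are doubled.
    // A null value is written as an empty field.
    static String quoteIfNeeded(String value) {
        if (value == null) {
            return "";
        }
        if (!needsQuoting(value)) {
            return value;
        }
        return "\"" + value.replace("\"", "\"\"") + "\"";
    }

    // Joins the fields into one comma-separated record line.
    static String toCsvLine(List<String> fields) {
        return fields.stream()
                .map(MinimalQuoting::quoteIfNeeded)
                .collect(Collectors.joining(",")) + "\n";
    }

    public static void main(String[] args) {
        System.out.print(toCsvLine(Arrays.asList("some-uuid-value", "Na,me", "SOMETHING")));
        System.out.print(toCsvLine(Arrays.asList("some-uuid-value", null, "SOMETHING")));
        System.out.print(toCsvLine(Arrays.asList("some-uuid-value", "Name", "SOMETHING")));
    }
}
```

The loose check described in the documentation trades this full scan for a cheaper inspection, which is why it can end up quoting values that do not strictly need it.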
You can remove all the other schema and mapper configuration and it will work the same way. The code can be simplified to:
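import java.io.IOException;
import com.fasterxml.jackson.dataformat.csv.CsvGenerator;
import com.fasterxml.jackson.dataformat.csv.CsvMapper;
import com.fasterxml.jackson.dataformat.csv.CsvSchema;

class CsvEntityGenerator {
    private final CsvMapper mapper;
    private final CsvSchema schema;

    public CsvEntityGenerator(Class<?> clazz) {
        mapper = new CsvMapper();
        // Quote a value only when it strictly needs quoting
        mapper.enable(CsvGenerator.Feature.STRICT_CHECK_FOR_QUOTING);
        // Write null fields as empty strings
        schema = mapper.schemaFor(clazz).withNullValue("");
    }

    public String toCsvString(Object entity) throws IOException {
        return mapper.writer(schema).writeValueAsString(entity);
    }
}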
Simple usage:
CsvEntityGenerator gen = new CsvEntityGenerator(EntityCsvRecord.class);
System.out.print(gen.toCsvString(new EntityCsvRecord("Na,me")));
System.out.print(gen.toCsvString(new EntityCsvRecord(null)));
System.out.print(gen.toCsvString(new EntityCsvRecord("Name")));
Prints:
8b572b1b-17c1-429d-887b-ec9af1c30d05,"Na,me",SOMETHING,123456,www.some.url
e86eacb1-d45e-4614-91bb-45f0d8840ea9,,SOMETHING,123456,www.some.url
e9627c32-6736-44a5-8eb2-7d153f86af20,Name,SOMETHING,123456,www.some.url
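As you can see, CsvMapper and CsvSchema are created only once and reused whenever we want to serialize an entity, which is faster than rebuilding them on every call.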