Postgres taking a long time to save the batch updates with the null values

Question

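This is a follow-up to my earlier question here, Spring Batch Performance issue when run two independent jobs sequentially?:

We are seeing a behavior in Postgres v9.6 where saving data that contains null values through a batch update takes a very long time.

Is there any way to fix this from the database side or from the Spring Boot side?

Sample query -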

"INSERT INTO test.ACCT "
    + "(rec_type, acct_type_cd, src_acct_id, stat_cd,stat_dttm, ........, "
    + "..................) "
    + "VALUES(?, ? , ?, )";

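Note - for security reasons, the column names cannot be posted as-is.

Our table details are as follows:

Has a large proportion of NULLs in several columns
Receives a large number of UPDATEs or DELETEs regularly
Is not growing rapidly
Has no indexes on it.
Does not use any triggers that might execute database functions or call functions directly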

The total number of rows we batch insert is 40K, in chunks of 2,500 records. Code:

List<Map<String, Object>> batchValues = new ArrayList<>(items.size());
for(Employee emp: items) {
    batchValues.add(new MapSqlParameterSource()
            .addValue("", emp.getXXXXX() == null ? "": emp.getXXXXX(), JDBCType.VARCHAR.getVendorTypeNumber())
            .addValue("", emp.getXXXXX() == null ? "": emp.getXXXXX(), JDBCType.VARCHAR.getVendorTypeNumber())
            .addValue("", emp.getXXXXX() == null ? "": emp.getXXXXX() , JDBCType.VARCHAR.getVendorTypeNumber())
            .addValue("", emp.getXXXXX() == null ? "": emp.getXXXXX(), JDBCType.VARCHAR.getVendorTypeNumber())
            .addValue("", emp.getXXXXX(), JDBCType.DATE.getVendorTypeNumber())
            .addValue("", emp.getXXXXX()== null ? "": emp.getXXXXX(), JDBCType.VARCHAR.getVendorTypeNumber())
            .addValue("", emp.getXXXXX() == null ? "": emp.getXXXXX(), JDBCType.VARCHAR.getVendorTypeNumber())
            .addValue("", emp.getXXXXX() == null ? "": emp.getXXXXX(), JDBCType.VARCHAR.getVendorTypeNumber())
            ........
            ........
            ........
            ........
            .getValues());
    
}

try {

    int[] updateCounts = namedJdbcTemplate.batchUpdate(SQL, batchValues.toArray(new Map[items.size()]));
} catch (Exception e) {
    // Pass the exception so the stack trace is logged along with the message
    log.error("Error occurred in BatchUpdate ##", e);
    throw new GenericException(e.getMessage(),this.getClass().getSimpleName()); 
}

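The batch job runs its steps sequentially: first a truncate (which is fast), then the batch inserts, and the inserts with more null values are the ones that hurt performance.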

Answer 1

Thanks to M. Deinum and a_horse_with_no_name, who are always a great help.

As suggested here and in Section 5.3.2 - https://docs.spring.io/spring/docs/current/spring-framework-reference/data-access.html#jdbc-batch-list:

In such a scenario, with automatic setting of values on an underlying PreparedStatement, the corresponding JDBC type for each value needs to be derived from the given Java type. While this usually works well, there is a potential for issues (for example, with Map-contained null values). Spring, by default, calls ParameterMetaData.getParameterType in such a case, which can be expensive with your JDBC driver. You should use a recent driver version and consider setting the spring.jdbc.getParameterType.ignore property to true (as a JVM system property or in a spring.properties file in the root of your classpath) if you encounter a performance issue — for example, as reported on Oracle 12c (SPR-16139).

Alternatively, you might consider specifying the corresponding JDBC types explicitly, either through a 'BatchPreparedStatementSetter' (as shown earlier), through an explicit type array given to a 'List<Object[]>' based call, through 'registerSqlType' calls on a custom 'MapSqlParameterSource' instance, or through a 'BeanPropertySqlParameterSource' that derives the SQL type from the Java-declared property type even for a null value.
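
The 'BatchPreparedStatementSetter' route mentioned above comes down to telling the driver the SQL type of every parameter yourself, so nothing has to be looked up for null values. Below is a minimal sketch with plain JDBC. TypedNullBinder is a hypothetical helper name, and the idea is that you would call it from your own setValues(PreparedStatement ps, int i) implementation. It only illustrates the alternative and is not the fix that was applied here.

```java
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.sql.Types;

// Hypothetical helper (not part of Spring or the original code): binds one
// parameter with an explicit SQL type, so a null never needs a type lookup.
public final class TypedNullBinder {

    private TypedNullBinder() {
    }

    public static void bindNullable(PreparedStatement ps, int index, Object value, int sqlType)
            throws SQLException {
        if (value == null) {
            // The caller supplies the type, e.g. Types.DATE or Types.VARCHAR
            ps.setNull(index, sqlType);
        } else {
            ps.setObject(index, value, sqlType);
        }
    }

    // Example of binding two columns of one row; the column order is an assumption.
    public static void bindRow(PreparedStatement ps, String recType, java.sql.Date statDttm)
            throws SQLException {
        bindNullable(ps, 1, recType, Types.VARCHAR);
        bindNullable(ps, 2, statDttm, Types.DATE);
    }
}
```

It is also worth checking the code in the question: each row ends with .getValues(), which, as far as I can tell, returns only the plain value map, so the SQL types passed to addValue may never reach Spring. Passing the MapSqlParameterSource objects themselves to batchUpdate(String, SqlParameterSource[]) should keep them.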

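I created a spring.properties file and added spring.jdbc.getParameterType.ignore=true, which solved my problem. It now takes only 7-10 seconds to load 1,800 records into 3 different tables, with 5 of the 10 columns holding NULL values.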
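
If you would rather not add a file, the quoted docs say the same flag can be given as a JVM system property, for example -Dspring.jdbc.getParameterType.ignore=true on the command line. A minimal sketch of setting it in code follows. GetParameterTypeFlag is a hypothetical class name, and I am assuming the property has to be set before Spring's JDBC classes are first used, so it should happen at the very start of main.

```java
// Hypothetical bootstrap helper; only the property name comes from the Spring docs.
public final class GetParameterTypeFlag {

    public static final String PROPERTY = "spring.jdbc.getParameterType.ignore";

    private GetParameterTypeFlag() {
    }

    // Sets the flag as a JVM system property.
    public static void enable() {
        System.setProperty(PROPERTY, "true");
    }

    // Reports whether the flag is currently set to "true".
    public static boolean isEnabled() {
        return Boolean.parseBoolean(System.getProperty(PROPERTY));
    }

    public static void main(String[] args) {
        enable();
        // The application would be started here, e.g. SpringApplication.run(...),
        // after the property is already in place.
        System.out.println(PROPERTY + "=" + System.getProperty(PROPERTY));
    }
}
```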

Tags: postgresql, spring, postgresql-performance, spring-boot, postgresql-9.6