kafka-connect Error: Unable to obtain valid replication slot

The Kafka debezium-postgres connector in my application is throwing this error:

org.apache.kafka.connect.errors.ConnectException: Unable to obtain valid replication slot. Make sure there are no long-running transactions running in parallel as they may hinder the allocation of the replication slot when starting this connector
    at io.debezium.connector.postgresql.connection.PostgresConnection.readReplicationSlotInfo(PostgresConnection.java:226)
    at io.debezium.connector.postgresql.connection.PostgresConnection.getReplicationSlotState(PostgresConnection.java:150)
    at io.debezium.connector.postgresql.PostgresConnectorTask.start(PostgresConnectorTask.java:98)
    at io.debezium.connector.common.BaseSourceTask.start(BaseSourceTask.java:49)
    at org.apache.kafka.connect.runtime.WorkerSourceTask.execute(WorkerSourceTask.java:198)
    at org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:175)
    at org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:219)
    at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
    at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
    at java.base/java.lang.Thread.run(Thread.java:834)

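The application uses PostgreSQL 9.6.11, and max_replication_slots is set to 10. I can see an active logical replication slot in the database with confirmed_flush_lsn = null, restart_lsn = 3/93043310, catalog_xmin = 202656, active = t, datoid = 16407, slot_type = logical, active_pid = 32183, plugin = wal2json, slot_name = slot1, database = db1 (I have replaced the slot name and database name with dummy values).
As I understand it, the error occurs because confirmed_flush_lsn is null on this logical replication slot, which prevents the connector from finding the slot.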
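
For reference, here is a minimal sketch of the check I am describing, written against the values listed above. The query string uses the real pg_replication_slots view, but the helper name slot_problem and the idea of passing the row in as a dict are only illustrative assumptions; this is not how Debezium itself performs the check.

```python
from typing import Optional

# Query that could be run (for example with psql) to fetch the slot row.
# The column names are those of the pg_replication_slots view in 9.6.
SLOT_QUERY = """
SELECT slot_name, plugin, slot_type, database, active, active_pid,
       catalog_xmin, restart_lsn, confirmed_flush_lsn
FROM pg_replication_slots
WHERE slot_name = 'slot1';
"""


def slot_problem(row: Optional[dict]) -> Optional[str]:
    """Hypothetical helper: return a description of what looks wrong
    with a slot row, or None if nothing stands out."""
    if row is None:
        return "slot not found"
    if row["slot_type"] != "logical":
        return "slot is not a logical slot"
    if row["confirmed_flush_lsn"] is None:
        if row["active"]:
            # The slot is held by a backend, yet no flush position is recorded.
            return ("confirmed_flush_lsn is NULL while slot is active "
                    "(pid %s)" % row["active_pid"])
        return "confirmed_flush_lsn is NULL"
    return None


# The slot row as observed in my database (names are dummy values).
observed = {
    "slot_name": "slot1",
    "plugin": "wal2json",
    "slot_type": "logical",
    "database": "db1",
    "active": True,
    "active_pid": 32183,
    "catalog_xmin": 202656,
    "restart_lsn": "3/93043310",
    "confirmed_flush_lsn": None,
}

print(slot_problem(observed))
```

With the row I observed, the helper reports the null confirmed_flush_lsn on an active slot, which is the state I am asking about.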

How can I fix this, and why would confirmed_flush_lsn be null?

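I fixed it by restarting the RDS database instance in AWS that the connector points to. After the restart, confirmed_flush_lsn was reset to a non-null value, similar to restart_lsn (3/93043310). kafka-connect was then able to find the replication slot "slot1" as expected, and the connector started. This solved my problem for now, but I would still like to understand what sets confirmed_flush_lsn to null for a logical replication slot in the first place.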