Streaming replication is failing with "WAL segment has already been removed"
I am trying to set up master/slave streaming replication on Postgres 11.5. I ran the following steps:
On the master:
select pg_start_backup('replication-setup',true);
On the slave, I stopped the Postgres 11 database and ran:
rsync -aHAXxv --numeric-ids --progress -e "ssh -T -o Compression=no -x" --exclude pg_wal --exclude postgresql.pid --exclude pg_log MASTER:/var/lib/postgresql/11/main/* /var/lib/postgresql/11/main
On the master:
select pg_stop_backup();
On the slave:
rsync -aHAXxv --numeric-ids --progress -e "ssh -T -o Compression=no -x" MASTER:/var/lib/postgresql/11/main/pg_wal/* /var/lib/postgresql/11/main/pg_wal
I created a recovery.conf file in the slave's ~/11/main folder:
standby_mode = 'on'
primary_conninfo = 'user=postgres host=MASTER port=5432 sslmode=prefer sslcompression=1 krbsrvname=postgres'
primary_slot_name='my_repl_slot'
When I start Postgres on the slave, I get the following error in both the MASTER and SLAVE logs:
2019-11-08 09:03:51.205 CST [27633] LOG: 00000: database system was interrupted; last known up at 2019-11-08 02:53:04 CST
2019-11-08 09:03:51.205 CST [27633] LOCATION: StartupXLOG, xlog.c:6388
2019-11-08 09:03:51.252 CST [27633] LOG: 00000: entering standby mode
2019-11-08 09:03:51.252 CST [27633] LOCATION: StartupXLOG, xlog.c:6443
2019-11-08 09:03:51.384 CST [27634] LOG: 00000: started streaming WAL from primary at 12DB/C000000 on timeline 1
2019-11-08 09:03:51.384 CST [27634] LOCATION: WalReceiverMain, walreceiver.c:383
2019-11-08 09:03:51.384 CST [27634] FATAL: XX000: could not receive data from WAL stream: ERROR: requested WAL segment 00000001000012DB0000000C has already been removed
2019-11-08 09:03:51.384 CST [27634] LOCATION: libpqrcv_receive, libpqwalreceiver.c:772
2019-11-08 09:03:51.408 CST [27635] LOG: 00000: started streaming WAL from primary at 12DB/C000000 on timeline 1
2019-11-08 09:03:51.408 CST [27635] LOCATION: WalReceiverMain, walreceiver.c:383
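The problem is that the START WAL segment, 00000001000012DB0000000C, is available right up until I run pg_stop_backup(). As soon as pg_stop_backup() is executed, the segment is archived and is no longer available. So this is not a case of WAL being recycled because WAL_KEEP_SEGMENTS is too low.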
postgres@SLAVE:~/11/main/pg_wal$ cat 00000001000012DB0000000C.00000718.backup
START WAL LOCATION: 12DB/C000718 (file 00000001000012DB0000000C)
STOP WAL LOCATION: 12DB/F4C30720 (file 00000001000012DB000000F4)
CHECKPOINT LOCATION: 12DB/C000750
BACKUP METHOD: pg_start_backup
BACKUP FROM: master
START TIME: 2019-11-07 15:47:26 CST
LABEL: replication-setup-mdurbha
START TIMELINE: 1
STOP TIME: 2019-11-08 08:48:35 CST
STOP TIMELINE: 1
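The backup label above pairs each LSN with a WAL segment file name. As a rough sketch of that mapping, assuming the default 16 MB segment size (Postgres 11 allows a different size to be chosen at initdb time), the file name is the timeline, the high word of the LSN, and the low word divided by the segment size, each printed as 8 hex digits. The function name wal_segment_name is made up for this illustration and is not part of Postgres.

```shell
# Hypothetical helper, not a Postgres tool: map a timeline and an LSN
# (written as HI/LO in hex) to the WAL segment file that contains it.
# Assumption: default 16 MB WAL segment size.
wal_segment_name() {
  tli=$1
  lsn=$2
  hi=${lsn%%/*}                    # part before the slash, e.g. 12DB
  lo=${lsn##*/}                    # part after the slash, e.g. C000718
  seg_size=$((16 * 1024 * 1024))   # bytes per segment (assumed default)
  seg=$(( 0x$lo / seg_size ))      # segment number within the high word
  printf '%08X%08X%08X\n' "$tli" "$((0x$hi))" "$seg"
}

wal_segment_name 1 12DB/C000718    # prints 00000001000012DB0000000C
wal_segment_name 1 12DB/F4C30720   # prints 00000001000012DB000000F4
```

Both results match the file names recorded in the backup label, so the standby needs every segment from ...0000000C through ...000000F4 before it can reach a consistent state.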
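My MASTER has archive_command set, and I have the missing WAL files available. I copied them into a restore directory on the SLAVE and tried the recovery.conf below, but it still fails, with the MASTER reporting the same "WAL segment has already been removed" error.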
Any idea how to solve this? I have set up replication with rsync on Postgres 9.6 in the past without any problems, but I keep running into this on Postgres 11.
standby_mode = 'on'
primary_conninfo = 'user=postgres host=MASTER port=5432 sslmode=prefer sslcompression=1 krbsrvname=postgres'
restore_command='cp /var/lib/postgresql/restore/%f %p'
Put a restore_command in recovery.conf that can restore the archived WAL files, and you will be fine.
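Since restore_command only helps if it can actually find the files, it may be worth checking the restore directory before starting the standby. The following is a minimal sketch, not a Postgres utility: missing_wal_segments is a made-up helper, and it assumes all needed segments share one timeline and one high LSN word, which holds for the backup label in the question.

```shell
# Hypothetical helper: print every segment name in [FIRST, LAST] that is
# absent from DIR. PREFIX is the timeline plus high LSN word (16 hex digits);
# FIRST and LAST are the low segment numbers in hex.
# The function body runs in a subshell so its variables do not leak.
missing_wal_segments() (
  dir=$1
  prefix=$2
  seg=$((0x$3))
  last=$((0x$4))
  while [ "$seg" -le "$last" ]; do
    name=$(printf '%s%08X' "$prefix" "$seg")
    [ -f "$dir/$name" ] || echo "$name"
    seg=$((seg + 1))
  done
)

# Example call for the backup in the question (path taken from its restore_command):
# missing_wal_segments /var/lib/postgresql/restore 00000001000012DB 0C F4
```

If the call prints nothing, every segment between the backup's START and STOP locations is present where the cp in restore_command will look for it.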