From acf1dd42342d6d84ca8e7a1998335e2e8809759e Mon Sep 17 00:00:00 2001 From: Thomas Munro Date: Sun, 17 Apr 2022 10:22:03 +1200 Subject: [PATCH] Don't retry restore_command while reading ahead. Suppress further attempts to read ahead in the WAL if we run out of data, until the records already decoded have been replayed. This restores the traditional behavior for continuous archive recovery, which is to retry the failing restore_command only every 5 seconds. With the coding in 5dc0418f, we would start retrying every time through the recovery loop when our WAL decoding window hit the end of the current segment and we tried to look ahead into a not-yet-available next file. That was very slow. Also change the no_readahead_until mechanism to use <= rather than <, which seems more useful. Otherwise we'd either get one extra unwanted retry of restore_command, or we'd need to add 1 to an LSN. No change in behavior for regular streaming. That was already limited by the flushedUpto variable, which won't be updated until we replay what we have already. Reported by Andres Freund while analyzing the failure of a TAP test on build farm animal skink (investigation ongoing but probably due to otherwise unrelated timing bugs triggered by this slowness magnified by valgrind). Discussion: https://postgr.es/m/20220409005910.alw46xqmmgny2sgr%40alap3.anarazel.de --- src/backend/access/transam/xlogprefetcher.c | 11 ++++++++--- 1 file changed, 8 insertions(+), 3 deletions(-) diff --git a/src/backend/access/transam/xlogprefetcher.c b/src/backend/access/transam/xlogprefetcher.c index f3428888d2..959e409466 100644 --- a/src/backend/access/transam/xlogprefetcher.c +++ b/src/backend/access/transam/xlogprefetcher.c @@ -487,8 +487,8 @@ XLogPrefetcherNextBlock(uintptr_t pgsr_private, XLogRecPtr *lsn) */ nonblocking = XLogReaderHasQueuedRecordOrError(reader); - /* Certain records act as barriers for all readahead. */ - if (nonblocking && replaying_lsn < prefetcher->no_readahead_until) + /* Readahead is disabled until we replay past a certain point. */ + if (nonblocking && replaying_lsn <= prefetcher->no_readahead_until) return LRQ_NEXT_AGAIN; record = XLogReadAhead(prefetcher->reader, nonblocking); @@ -496,8 +496,13 @@ XLogPrefetcherNextBlock(uintptr_t pgsr_private, XLogRecPtr *lsn) { /* * We can't read any more, due to an error or lack of data in - * nonblocking mode. + * nonblocking mode. Don't try to read ahead again until + * we've replayed everything already decoded. */ + if (nonblocking && prefetcher->reader->decode_queue_tail) + prefetcher->no_readahead_until = + prefetcher->reader->decode_queue_tail->lsn; + return LRQ_NEXT_AGAIN; }