postgres

Author	SHA1	Message	Date
Heikki Linnakangas	add6c3179a	Make the streaming replication protocol messages architecture-independent. We used to send structs wrapped in CopyData messages, which works as long as the client and server agree on things like endianess, timestamp format and alignment. That's good enough for running a standby server, which has to run on the same platform anyway, but it's useful for tools like pg_receivexlog to work across platforms. This breaks protocol compatibility of streaming replication, but we never promised that to be compatible across versions, anyway.	2012-11-07 19:09:13 +02:00
Heikki Linnakangas	7d3ed5ae78	Fix typo in comment. Fujii Masao	2012-10-15 13:01:31 +03:00
Heikki Linnakangas	6f60fdd701	Improve replication connection timeouts. Rename replication_timeout to wal_sender_timeout, and add a new setting called wal_receiver_timeout that does the same at the walreceiver side. There was previously no timeout in walreceiver, so if the network went down, for example, the walreceiver could take a long time to notice that the connection was lost. Now with the two settings, both sides of a replication connection will detect a broken connection similarly. It is no longer necessary to manually set wal_receiver_status_interval to a value smaller than the timeout. Both wal sender and receiver now automatically send a "ping" message if more than 1/2 of the configured timeout has elapsed, and it hasn't received any messages from the other end. Amit Kapila, heavily edited by me.	2012-10-11 17:48:08 +03:00
Peter Eisentraut	8521d13194	Refactor flex and bison make rules Numerous flex and bison make rules have appeared in the source tree over time, and they are all virtually identical, so we can replace them by pattern rules with some variables for customization. Users of pgxs will also be able to benefit from this.	2012-10-11 06:57:04 -04:00
Heikki Linnakangas	0b77aebabf	Remove stray newline in comment.	2012-10-09 13:06:48 +03:00
Peter Eisentraut	b6d4522296	Remove generation of repl_gram.h It was apparently never necessary.	2012-10-08 20:36:46 -04:00
Heikki Linnakangas	9c0e2b9182	Fix walsender handling of postmaster shutdown, to not go into endless loop. This bug was introduced by my patch to use the regular die/quickdie signal handlers in walsender processes. I tried to make walsender exit at next CHECK_FOR_INTERRUPTS() by setting ProcDiePending, but that's not enough, you need to set InterruptPending too. On second thoght, it was not a very good way to make walsender exit anyway, so use proc_exit(0) instead. Also, send a CommandComplete message before exiting; that's what we did before, and you get a nicer error message in the standby that way. Reported by Thom Brown.	2012-10-08 13:32:14 +03:00
Heikki Linnakangas	fd5942c18f	Use the regular main processing loop also in walsenders. The regular backend's main loop handles signal handling and error recovery better than the current WAL sender command loop does. For example, if the client hangs and a SIGTERM is received before starting streaming, the walsender will now terminate immediately, rather than hang until the connection times out.	2012-10-05 17:21:12 +03:00
Tom Lane	05b555d12b	Fix tar files emitted by pg_dump and pg_basebackup to be POSIX conformant. Both programs got the "magic" string wrong, causing standard-conforming tar implementations to believe the output was just legacy tar format without any POSIX extensions. This doesn't actually matter that much, especially since pg_dump failed to fill the POSIX fields anyway, but still there is little point in emitting tar format if we can't be compliant with the standard. In addition, pg_dump failed to write the EOF marker correctly (there should be 2 blocks of zeroes not just one), pg_basebackup put the numeric group ID in the wrong place, and both programs had a pretty brain-dead idea of how to compute the checksum. Fix all that and improve the comments a bit. pg_restore is modified to accept either the correct POSIX-compliant "magic" string or the previous value. This part of the change will need to be back-patched to avoid an unnecessary compatibility break when a previous version tries to read tar-format output from 9.3 pg_dump. Brian Weaver and Tom Lane	2012-09-28 15:19:15 -04:00
Heikki Linnakangas	c4c227477b	Fix bugs in cascading replication with recovery_target_timeline='latest' The cascading replication code assumed that the current RecoveryTargetTLI never changes, but that's not true with recovery_target_timeline='latest'. The obvious upshot of that is that RecoveryTargetTLI in shared memory needs to be protected by a lock. A less obvious consequence is that when a cascading standby is connected, and the standby switches to a new target timeline after scanning the archive, it will continue to stream WAL to the cascading standby, but from a wrong file, ie. the file of the previous timeline. For example, if the standby is currently streaming from the middle of file 000000010000000000000005, and the timeline changes, the standby will continue to stream from that file. However, the WAL on the new timeline is in file 000000020000000000000005, so the standby sends garbage from 000000010000000000000005 to the cascading standby, instead of the correct WAL from file 000000020000000000000005. This also fixes a related bug where a partial WAL segment is restored from the archive and streamed to a cascading standby. The code assumed that when a WAL segment is copied from the archive, it can immediately be fully streamed to a cascading standby. However, if the segment is only partially filled, ie. has the right size, but only N first bytes contain valid WAL, that's not safe. That can happen if a partial WAL segment is manually copied to the archive, or if a partial WAL segment is archived because a server is started up on a new timeline within that segment. The cascading standby will get confused if the WAL it received is not valid, and will get stuck until it's restarted. This patch fixes that problem by not allowing WAL restored from the archive to be streamed to a cascading standby until it's been replayed, and thus validated.	2012-09-04 19:33:21 -07:00
Heikki Linnakangas	fe811ae810	Fix typos in README.	2012-08-31 11:30:11 +03:00
Simon Riggs	da4efa13d8	Turn off WalSender keepalives by default, users can enable if desired	2012-08-09 17:07:03 +01:00
Simon Riggs	87d8bd7c9f	Ensure all replication message info is available and correct via WalRcv	2012-08-09 17:03:59 +01:00
Tom Lane	4a9c30a8a1	Fix management of pendingOpsTable in auxiliary processes. mdinit() was misusing IsBootstrapProcessingMode() to decide whether to create an fsync pending-operations table in the current process. This led to creating a table not only in the startup and checkpointer processes as intended, but also in the bgwriter process, not to mention other auxiliary processes such as walwriter and walreceiver. Creation of the table in the bgwriter is fatal, because it absorbs fsync requests that should have gone to the checkpointer; instead they just sit in bgwriter local memory and are never acted on. So writes performed by the bgwriter were not being fsync'd which could result in data loss after an OS crash. I think there is no live bug with respect to walwriter and walreceiver because those never perform any writes of shared buffers; but the potential is there for future breakage in those processes too. To fix, make AuxiliaryProcessMain() export the current process's AuxProcType as a global variable, and then make mdinit() test directly for the types of aux process that should have a pendingOpsTable. Having done that, we might as well also get rid of the random bool flags such as am_walreceiver that some of the aux processes had grown. (Note that we could not have fixed the bug by examining those variables in mdinit(), because it's called from BaseInit() which is run by AuxiliaryProcessMain() before entering any of the process-type-specific code.) Back-patch to 9.2, where the problem was introduced by the split-up of bgwriter and checkpointer processes. The bogus pendingOpsTable exists in walwriter and walreceiver processes in earlier branches, but absent any evidence that it causes actual problems there, I'll leave the older branches alone.	2012-07-18 15:28:10 -04:00
Alvaro Herrera	f34c68f096	Introduce timeout handling framework Management of timeouts was getting a little cumbersome; what we originally had was more than enough back when we were only concerned about deadlocks and query cancel; however, when we added timeouts for standby processes, the code got considerably messier. Since there are plans to add more complex timeouts, this seems a good time to introduce a central timeout handling module. External modules register their timeout handlers during process initialization, and later enable and disable them as they see fit using a simple API; timeout.c is in charge of keeping track of which timeouts are in effect at any time, installing a common SIGALRM signal handler, and calling setitimer() as appropriate to ensure timely firing of external handlers. timeout.c additionally supports pluggable modules to add their own timeouts, though this capability isn't exercised anywhere yet. Additionally, as of this commit, walsender processes are aware of timeouts; we had a preexisting bug there that made those ignore SIGALRM, thus being subject to unhandled deadlocks, particularly during the authentication phase. This has already been fixed in back branches in commit 0bf8eb2a, which see for more details. Main author: Zoltán Böszörményi Some review and cleanup by Álvaro Herrera Extensive reworking by Tom Lane	2012-07-16 22:55:33 -04:00
Heikki Linnakangas	2686da9db2	Don't initialize TLI variable to -1, as TimeLineID is unsigned. This was causing a compiler warning with Solaris compiler. Use 0 instead. The variable is initialized just for the sake of tidyness and/or debugging, it's not used for anything before setting it to a real value. Per report and suggestion from Peter Eisentraut.	2012-07-14 21:04:53 +03:00
Magnus Hagander	0c4b468692	Always treat a standby returning an an invalid flush location as async This ensures that a standby such as pg_receivexlog will not be selected as sync standby - which would cause the master to block waiting for a location that could never happen. Fujii Masao	2012-07-04 15:14:42 +02:00
Robert Haas	f83b59997d	Make walsender more responsive. Per testing by Andres Freund, this improves replication performance and reduces replication latency and latency jitter. I was a bit concerned about moving more work into XLogInsert, but testing seems to show that it's not a problem in practice. Along the way, improve comments for WaitLatchOrSocket. Andres Freund. Review and stylistic cleanup by me.	2012-07-02 09:41:01 -04:00
Heikki Linnakangas	ec786c6c81	I neglected many comments in the log+seg -> 64-bit segno patch. Fix. Reported by Amit Kapila.	2012-06-27 17:53:53 +03:00
Peter Eisentraut	eeece9e609	Unify calling conventions for postgres/postmaster sub-main functions There was a wild mix of calling conventions: Some were declared to return void and didn't return, some returned an int exit code, some claimed to return an exit code, which the callers checked, but actually never returned, and so on. Now all of these functions are declared to return void and decorated with attribute noreturn and don't return. That's easiest, and most code already worked that way.	2012-06-25 21:30:12 +03:00
Robert Haas	c7d47abd04	Fix typo in DEBUG message, introduced by recent WAL refactoring. Fujii Masao	2012-06-25 14:00:35 -04:00
Heikki Linnakangas	0ab9d1c4b3	Replace XLogRecPtr struct with a 64-bit integer. This simplifies code that needs to do arithmetic on XLogRecPtrs. To avoid changing on-disk format of data pages, the LSN on data pages is still stored in the old format. That should keep pg_upgrade happy. However, we have XLogRecPtrs embedded in the control file, and in the structs that are sent over the replication protocol, so this changes breaks compatibility of pg_basebackup and server. I didn't do anything about this in this patch, per discussion on -hackers, the right thing to do would to be to change the replication protocol to be architecture-independent, so that you could use a newer version of pg_receivexlog, for example, against an older server version.	2012-06-24 19:19:45 +03:00
Heikki Linnakangas	dfda6ebaec	Don't waste the last segment of each 4GB logical log file. The comments claimed that wasting the last segment made it easier to do calculations with XLogRecPtrs, because you don't have problems representing last-byte-position-plus-1 that way. In my experience, however, it only made things more complicated, because the there was two ways to represent the boundary at the beginning of a logical log file: logid = n+1 and xrecoff = 0, or as xlogid = n and xrecoff = 4GB - XLOG_SEG_SIZE. Some functions were picky about which representation was used. Also, use a 64-bit segment number instead of the log/seg combination, to point to a certain WAL segment. We assume that all platforms have a working 64-bit integer type nowadays. This is an incompatible change in WAL format, so bumping WAL version number.	2012-06-24 18:35:29 +03:00
Magnus Hagander	3595a71e9c	Prevent non-streaming replication connections from being selected sync slave This prevents a pg_basebackup backup session that just does a base backup (no xlog involved at all) from becoming the synchronous slave and thus blocking all access while it runs. Also fixes the problem when a higher priority slave shows up it would become the sync standby before it has reached the STREAMING state, by making sure we can only switch to a walsender that's actually STREAMING. Fujii Masao	2012-06-11 15:17:38 +02:00
Bruce Momjian	927d61eeff	Run pgindent on 9.2 source tree in preparation for first 9.3 commit-fest.	2012-06-10 15:20:04 -04:00
Tom Lane	ed61127be4	Cast some printf arguments to avoid possibly-nonportable behavior. Per compiler warnings on buildfarm member black_firefly.	2012-03-23 20:18:04 -04:00
Simon Riggs	ba1868ba31	Minor bug fix and cleanup from self-review of sync rep queues patch.	2012-01-30 14:36:17 +00:00
Simon Riggs	73f617f13f	Various minor comments changes from bgwriter to checkpointer.	2012-01-30 14:34:25 +00:00
Simon Riggs	8366c7803e	Allow pg_basebackup from standby node with safety checking. Base backup follows recommended procedure, plus goes to great lengths to ensure that partial page writes are avoided. Jun Ishizuka and Fujii Masao, with minor modifications	2012-01-25 18:02:04 +00:00
Simon Riggs	443b4821f1	Add new replication mode synchronous_commit = 'write'. Replication occurs only to memory on standby, not to disk, so provides additional performance if user wishes to reduce durability level slightly. Adds concept of multiple independent sync rep queues. Fujii Masao and Simon Riggs	2012-01-24 20:22:37 +00:00
Robert Haas	4d0b11a0ca	Typo fix.	2012-01-13 08:21:45 -05:00
Simon Riggs	3f1787c253	Minor but necessary improvements to WAL keepalives Fujii Masao	2012-01-13 12:59:08 +00:00
Bruce Momjian	e126958c2e	Update copyright notices for year 2012.	2012-01-01 18:01:58 -05:00
Simon Riggs	64233902d2	Send new protocol keepalive messages to standby servers. Allows streaming replication users to calculate transfer latency and apply delay via internal functions. No external functions yet.	2011-12-31 13:30:26 +00:00
Tom Lane	0d0ec527af	Fix corner cases in readlink() usage. Make sure all calls are protected by HAVE_READLINK, and get the buffer overflow tests right. Be a bit more paranoid about string length in _tarWriteHeader(), too.	2011-12-07 13:34:13 -05:00
Magnus Hagander	1f422db663	Avoid using readlink() on platforms that don't support it We don't have any such platforms now, but might in the future. Also, detect cases when a tablespace symlink points to a path that is longer than we can handle, and give a warning.	2011-12-07 12:09:05 +01:00
Robert Haas	ed0b409d22	Move "hot" members of PGPROC into a separate PGXACT array. This speeds up snapshot-taking and reduces ProcArrayLock contention. Also, the PGPROC (and PGXACT) structures used by two-phase commit are now allocated as part of the main array, rather than in a separate array, and we keep ProcArray sorted in pointer order. These changes are intended to minimize the number of cache lines that must be pulled in to take a snapshot, and testing shows a substantial increase in performance on both read and write workloads at high concurrencies. Pavan Deolasee, Heikki Linnakangas, Robert Haas	2011-11-25 08:02:10 -05:00
Simon Riggs	9aceb6ab3c	Refactor xlog.c to create src/backend/postmaster/startup.c Startup process now has its own dedicated file, just like all other special/background processes. Reduces role and size of xlog.c	2011-11-02 14:25:01 +00:00
Peter Eisentraut	654e1f96b0	Clean up whitespace and indentation in parser and scanner files These are not touched by pgindent, so clean them up a bit manually.	2011-11-01 21:51:30 +02:00
Simon Riggs	f3ebaad45b	Comment changes to show bgwriter no longer performs checkpoints.	2011-11-01 18:48:47 +00:00
Heikki Linnakangas	b436c72f61	Fix overly-complicated usage of errcode_for_file_access(). No need to do "errcode(errcode_for_file_access())", just "errcode_for_file_access()" is enough. The extra errcode() call is useless but harmless, so there's no user-visible bug here. Nevertheless, backpatch to 9.1 where this code were added.	2011-10-22 20:19:50 +03:00
Tom Lane	b4a0223d00	Simplify and improve ProcessStandbyHSFeedbackMessage logic. There's no need to clamp the standby's xmin to be greater than GetOldestXmin's result; if there were any such need this logic would be hopelessly inadequate anyway, because it fails to account for within-database versus cluster-wide values of GetOldestXmin. So get rid of that, and just rely on sanity-checking that the xmin is not wrapped around relative to the nextXid counter. Also, don't reset the walsender's xmin if the current feedback xmin is indeed out of range; that just creates more problems than we already had. Lastly, don't bother to take the ProcArrayLock; there's no need to do that to set xmin. Also improve the comments about this in GetOldestXmin itself.	2011-10-20 19:43:31 -04:00
Magnus Hagander	d1e25b78f9	Exclude postmaster.opts from base backups Noted by Fujii Masao	2011-10-18 15:58:37 +02:00
Magnus Hagander	7aeff9f4a4	Ensure walsenders can be SIGTERMed while in non-walsender code In oder to exit on SIGTERM when in non-walsender code, such as do_pg_stop_backup(), we need to set the interrupt variables that are used there, and not just the walsender local ones.	2011-10-06 21:43:14 +02:00
Alvaro Herrera	86822df9b5	Split walsender.h in public/private headers This dramatically cuts short the number of headers the public one brings into whatever includes it.	2011-09-13 21:42:49 -03:00
Tom Lane	a7801b62f2	Move Timestamp/Interval typedefs and basic macros into datatype/timestamp.h. As per my recent proposal, this refactors things so that these typedefs and macros are available in a header that can be included in frontend-ish code. I also changed various headers that were undesirably including utils/timestamp.h to include datatype/timestamp.h instead. Unsurprisingly, this showed that half the system was getting utils/timestamp.h by way of xlog.h. No actual code changes here, just header refactoring.	2011-09-09 13:23:41 -04:00
Alvaro Herrera	295e7dc929	Tweak string for uniformity	2011-09-08 16:39:58 -03:00
Simon Riggs	dde70cc313	Emit cascaded standby message on shutdown only when appropriate. Adds additional test for active walsenders and closes a race condition for when we failover when a new walsender was connecting. Reported and fixed bu Fujii Masao. Review by Heikki Linnakangas	2011-09-07 09:09:47 +01:00
Tom Lane	1609797c25	Clean up the #include mess a little. walsender.h should depend on xlog.h, not vice versa. (Actually, the inclusion was circular until a couple hours ago, which was even sillier; but Bruce broke it in the expedient rather than logically correct direction.) Because of that poor decision, plus blind application of pgrminclude, we had a situation where half the system was depending on xlog.h to include such unrelated stuff as array.h and guc.h. Clean up the header inclusion, and manually revert a lot of what pgrminclude had done so things build again. This episode reinforces my feeling that pgrminclude should not be run without adult supervision. Inclusion changes in header files in particular need to be reviewed with great care. More generally, it'd be good if we had a clearer notion of module layering to dictate which headers can sanely include which others ... but that's a big task for another day.	2011-09-04 01:13:16 -04:00
Bruce Momjian	6416a82a62	Remove unnecessary #include references, per pgrminclude script.	2011-09-01 10:04:27 -04:00

1 2 3 4

186 Commits