When using poll(), EOF on a socket is reported with the POLLHUP not
POLLIN flag (at least on Linux). WaitLatchOrSocket failed to check
this bit, causing it to go into a busy-wait loop if EOF occurs.
We earlier fixed the same mistake in the test for the state of the
postmaster_alive socket, but missed it for the caller-supplied socket.
Fortunately, this error is new in 9.2, since 9.1 only had a select()
based code path not a poll() based one.
Since we've got an "open items" list item about this, apparently some
people are pretty worried about it.
In passing remove a lot of trailing whitespace.
This example was quite old: it lacked the WAL writer and autovac launcher
as well as the more recently added checkpointer. Linux "ps" seems to show
slightly different stuff now too.
The string representation of ImportError changed. Remove printing
that; it's not necessary for the test.
The order in which members of a dict are printed changed. But this
was always implementation-dependent, so we have just been lucky for a
long time. Do the printing the hard way to ensure sorted order.
We previously recognized that citext wouldn't get marked as collatable
during pg_upgrade from a pre-9.1 installation, and hacked its
create-from-unpackaged script to manually perform the necessary catalog
adjustments. However, we overlooked the fact that domains over citext,
as well as the citext[] array type, need the same adjustments. Extend
the script to handle those cases.
Also, the documentation suggested that this was only an issue in pg_upgrade
scenarios, which is quite wrong; loading any dump containing citext from a
pre-9.1 server will also result in the type being wrongly marked.
I approached the documentation problem by changing the 9.1.2 release note
paragraphs about this issue, which is historically inaccurate. But it
seems better than having the information scattered in multiple places, and
leaving incorrect info in the 9.1.2 notes would be bad anyway. We'll still
need to mention the issue again in the 9.1.4 notes, but perhaps they can
just reference 9.1.2 for fix instructions.
Per report from Evan Carroll. Back-patch into 9.1.
When inserting the downlinks for a split gist page, we used hold the locks
on the child pages until the insertion into the parent - and recursively its
parent if it had to be split too - were all completed. Change that so that
the locks on child pages are released after the insertion in the immediate
parent is done, before recursing further up the tree.
This reduces the number of lwlocks that are held simultaneously. Holding
many locks is bad for concurrency, and in extreme cases you can even hit
the limit of 100 simultaneously held lwlocks in a backend. If you're really
unlucky, you can hit the limit while in a critical section, which brings
down the whole system.
This fixes bug #6629 reported by Tom Forbes. Backpatch to 9.1. The page
splitting code was rewritten in 9.1, and the old code did not have this
problem.
Rewrite description of "include_if_exists" for clarity. Add subsection
headings to make the structure of the page a little clearer. A couple
other minor improvements too.
Josh Kupershmidt and Tom Lane
HEAD documentation was failing to build as US PDF for me, because a link
to "CREATE CAST" was getting split across pages. Adjust wording to
remove this rather gratuitous cross-reference.
This patch reverts commit 49340037ee and some
follow-on tweaking in pgstat.c. While the basic scheme of latch-ifying the
stats collector seems sound enough, it's failing on most Windows buildfarm
members for unknown reasons, and there's no time left to debug that before
9.2beta1. Better to ship a beta version without this improvement. I hope
to re-revert this once beta1 is out, though.
Per a suggestion from Peter Geoghegan, make WaitLatch responsible for
verifying that the WL_POSTMASTER_DEATH bit it returns is truthful (by
testing PostmasterIsAlive). Then simplify its callers, who no longer
need to do that for themselves. Remove weasel wording about falsely-set
result bits from WaitLatch's API contract.
The old way of implementing slicing support by implementing
PySequenceMethods.sq_slice no longer works in Python 3. You now have
to implement PyMappingMethods.mp_subscript. Do this by simply
proxying the call to the wrapped list of result dictionaries.
Consolidate some of the subscripting regression tests.
Jan Urbański
The original coding failed to reset ImmediateInterruptOK before returning,
which would potentially allow a subsequent query-cancel interrupt to be
accepted at an unsafe point. This is a really nasty bug since it's so hard
to predict the consequences, but they could be unpleasant.
Also, ensure that signal handlers are serviced before this function
returns, even if the semaphore is already set. This should make the
behavior more like Unix.
Back-patch to all supported versions.
Ensure that signal handlers are serviced before this function returns.
This should make the behavior more like Unix. Also, add some more
error checking, and make some other cosmetic improvements.
No back-patch since it's not clear whether this is fixing any live bug
that would affect 9.1. I'm more concerned about 9.2 anyway given our
considerable recent expansions in the usage of WaitLatch.
It was already on its last legs, and it turns out that it was
accidentally broken in commit 89e850e6fd
and no one cared. So remove the rest the support for it and update
the documentation to indicate that Python 2.3 is now required.
It'd be nice to be able to spell Jan Urbanski's name with the correct
accent marks, but we haven't yet found a way that works in everybody's
docs toolchain. This way definitely doesn't.
In checkpointer and walwriter, avoid calling PostmasterIsAlive unless
WaitLatch has reported WL_POSTMASTER_DEATH. This saves a kernel call per
iteration of the process's outer loop, which is not all that much, but a
cycle shaved is a cycle earned. I had already removed the unconditional
PostmasterIsAlive calls in bgwriter and pgstat in previous patches, but
forgot that WL_POSTMASTER_DEATH is supposed to be treated as untrustworthy
(per comment in unix_latch.c); so adjust those two cases to match.
There are a few other places where the same idea might be applied, but only
after substantial code rearrangement, so I didn't bother.
Get rid of some more naming choices that only make sense if you know that
this code used to be in the bgwriter, as well as some stray comments
referencing the bgwriter.
Commit 6d90eaaa89 added a hibernation mode
to the bgwriter to reduce the server's idle-power consumption. However,
its interaction with the detailed behavior of BgBufferSync's feedback
control loop wasn't very well thought out. That control loop depends
primarily on the rate of buffer allocation, not the rate of buffer
dirtying, so the hibernation mode has to be designed to operate only when
no new buffer allocations are happening. Also, the check for whether the
system is effectively idle was not quite right and would fail to detect
a constant low level of activity, thus allowing the bgwriter to go into
hibernation mode in a way that would let the cycle time vary quite a bit,
possibly further confusing the feedback loop. To fix, move the wakeup
support from MarkBufferDirty and SetBufferCommitInfoNeedsSave into
StrategyGetBuffer, and prevent the bgwriter from entering hibernation mode
unless no buffer allocations have happened recently.
In addition, fix the delaying logic to remove the problem of possibly not
responding to signals promptly, which was basically caused by trying to use
the process latch's is_set flag for multiple purposes. I can't prove it
but I'm suspicious that that hack was responsible for the intermittent
"postmaster does not shut down" failures we've been seeing in the buildfarm
lately. In any case it did nothing to improve the readability or
robustness of the code.
In passing, express the hibernation sleep time as a multiplier on
BgWriterDelay, not a constant. I'm not sure whether there's any value in
exposing the longer sleep time as an independently configurable setting,
but we can at least make it act like this for little extra code.
Every time since the current rule for postgres.bki was put in place
when we change the major version, people complain that their tests
fail in strange ways. This is because the version number in
postgres.bki is not updated, because it has no dependency for that.
And you can't even force the rebuild manually if you don't happen to
know which file has the problem. Fix that now before it will happen
again.
The only remaining problem with switching major versions, as far as
the regression tests are concerned, is that contrib needs to be
rebuilt. But that's easily invoked, and in any case the failure modes
are more friendly if you forget that.