and CLUSTER) execute as the table owner rather than the calling user, using
the same privilege-switching mechanism already used for SECURITY DEFINER
functions. The purpose of this change is to ensure that user-defined
functions used in index definitions cannot acquire the privileges of a
superuser account that is performing routine maintenance. While a function
used in an index is supposed to be IMMUTABLE and thus not able to do anything
very interesting, there are several easy ways around that restriction; and
even if we could plug them all, there would remain a risk of reading sensitive
information and broadcasting it through a covert channel such as CPU usage.
To prevent bypassing this security measure, execution of SET SESSION
AUTHORIZATION and SET ROLE is now forbidden within a SECURITY DEFINER context.
Thanks to Itagaki Takahiro for reporting this vulnerability.
Security: CVE-2007-6600
having several of them. Add two more flags: whether the process is
executing an ANALYZE, and whether a vacuum is for Xid wraparound (which
is obviously only set by autovacuum).
Sneakily move the worker's recently-acquired PostAuthDelay to a more useful
place.
* stats_start_collector goes away; we always start the collector process,
unless prevented by a problem with setting up the stats UDP socket.
* stats_reset_on_server_start goes away; it seems useless in view of the
availability of pg_stat_reset().
* stats_block_level and stats_row_level are merged into a single variable
"track_counts", which controls all reports sent to the collector process.
* stats_command_string is renamed to track_activities.
* log_autovacuum is renamed to log_autovacuum_min_duration to better reflect
its meaning.
The log_autovacuum change is not a compatibility issue since it didn't exist
before 8.3 anyway. The other changes need to be release-noted.
buffers, rather than blowing out the whole shared-buffer arena. Aside from
avoiding cache spoliation, this fixes the problem that VACUUM formerly tended
to cause a WAL flush for every page it modified, because we had it hacked to
use only a single buffer. Those flushes will now occur only once per
ring-ful. The exact ring size, and the threshold for seqscans to switch into
the ring usage pattern, remain under debate; but the infrastructure seems
done. The key bit of infrastructure is a new optional BufferAccessStrategy
object that can be passed to ReadBuffer operations; this replaces the former
StrategyHintVacuum API.
This patch also changes the buffer usage-count methodology a bit: we now
advance usage_count when first pinning a buffer, rather than when last
unpinning it. To preserve the behavior that a buffer's lifetime starts to
decrease when it's released, the clock sweep code is modified to not decrement
usage_count of pinned buffers.
Work not done in this commit: teach GiST and GIN indexes to use the vacuum
BufferAccessStrategy for vacuum-driven fetches.
Original patch by Simon, reworked by Heikki and again by Tom.
messages to the stats collector. This avoids the problem that enabling
stats_row_level for autovacuum has a significant overhead for short
read-only transactions, as noted by Arjen van der Meijden. We can avoid
an extra gettimeofday call by piggybacking on the one done for WAL-logging
xact commit or abort (although that doesn't help read-only transactions,
since they don't WAL-log anything).
In my proposal for this, I noted that we could change the WAL log entries
for commit/abort to record full TimestampTz precision, instead of only
time_t as at present. That's not done in this patch, but will be committed
separately.
This commit breaks any code that assumes that the mere act of forming a tuple
(without writing it to disk) does not "toast" any fields. While all available
regression tests pass, I'm not totally sure that we've fixed every nook and
cranny, especially in contrib.
Greg Stark with some help from Tom Lane
per-column options for btree indexes. The planner's support for this is still
pretty rudimentary; it does not yet know how to plan mergejoins with
nondefault ordering options. The documentation is pretty rudimentary, too.
I'll work on improving that stuff later.
Note incompatible change from prior behavior: ORDER BY ... USING will now be
rejected if the operator is not a less-than or greater-than member of some
btree opclass. This prevents less-than-sane behavior if an operator that
doesn't actually define a proper sort ordering is selected.
in PITR scenarios. We now WAL-log the replacement of old XIDs with
FrozenTransactionId, so that such replacement is guaranteed to propagate to
PITR slave databases. Also, rather than relying on hint-bit updates to be
preserved, pg_clog is not truncated until all instances of an XID are known to
have been replaced by FrozenTransactionId. Add new GUC variables and
pg_autovacuum columns to allow management of the freezing policy, so that
users can trade off the size of pg_clog against the amount of freezing work
done. Revise the already-existing code that forces autovacuum of tables
approaching the wraparound point to make it more bulletproof; also, revise the
autovacuum logic so that anti-wraparound vacuuming is done per-table rather
than per-database. initdb forced because of changes in pg_class, pg_database,
and pg_autovacuum catalogs. Heikki Linnakangas, Simon Riggs, and Tom Lane.
static variables. This avoids any risk of potential non-reentrancy,
and in particular offers a much cleaner workaround for the Intel compiler
bug that was affecting ginutil.c.
the table being analyzed. This prevents two ANALYZEs from running
concurrently on the same table and possibly suffering concurrent-update
failures while trying to store their results into pg_statistic. The
downside is that a database-wide ANALYZE executed within a transaction
block will hold ShareUpdateExclusiveLock on many tables simultaneously,
which could lead to concurrency issues or even deadlock against another
such ANALYZE. However, this seems a corner case of less importance
than getting unexpected errors from a foreground ANALYZE when autovacuum
elects to analyze the same table concurrently. Per discussion.
the rel, it's easy to get rid of the narrow race-condition window that
used to exist in VACUUM and CLUSTER. Did some minor code-beautification
work in the same area, too.
To this end, add a couple of columns to pg_class, relminxid and relvacuumxid,
based on which we calculate the pg_database columns after each vacuum.
We now force all databases to be vacuumed, even template ones. A backend
noticing too old a database (meaning pg_database.datminxid is in danger of
falling behind Xid wraparound) will signal the postmaster, which in turn will
start an autovacuum iteration to process the offending database. In principle
this is only there to cope with frozen (non-connectable) databases without
forcing users to set them to connectable, but it could force regular user
database to go through a database-wide vacuum at any time. Maybe we should
warn users about this somehow. Of course the real solution will be to use
autovacuum all the time ;-)
There are some additional improvements we could have in this area: for example
the vacuum code could be smarter about not updating pg_database for each table
when called by autovacuum, and do it only once the whole autovacuum iteration
is done.
I updated the system catalogs documentation, but I didn't modify the
maintenance section. Also having some regression tests for this would be nice
but it's not really a very straightforward thing to do.
Catalog version bumped due to system catalog changes.
comment line where output as too long, and update typedefs for /lib
directory. Also fix case where identifiers were used as variable names
in the backend, but as typedefs in ecpg (favor the backend for
indenting).
Backpatch to 8.1.X.
track shared relations in a separate hashtable, so that operations done
from different databases are counted correctly. Add proper support for
anti-XID-wraparound vacuuming, even in databases that are never connected
to and so have no stats entries. Miscellaneous other bug fixes.
Alvaro Herrera, some additional fixes by Tom Lane.
which is neither needed by nor related to that header. Remove the bogus
inclusion and instead include the header in those C files that actually
need it. Also fix unnecessary inclusions and bad inclusion order in
tsearch2 files.
indexes. Replace all heap_openr and index_openr calls by heap_open
and index_open. Remove runtime lookups of catalog OID numbers in
various places. Remove relcache's support for looking up system
catalogs by name. Bulky but mostly very boring patch ...
convention for isnull flags. Also, remove the useless InsertIndexResult
return struct from index AM aminsert calls --- there is no reason for
the caller to know where in the index the tuple was inserted, and we
were wasting a palloc cycle per insert to deliver this uninteresting
value (plus nontrivial complexity in some AMs).
I forced initdb because of the change in the signature of the aminsert
routines, even though nothing really looks at those pg_proc entries...
of tuples when passing data up through multiple plan nodes. A slot can now
hold either a normal "physical" HeapTuple, or a "virtual" tuple consisting
of Datum/isnull arrays. Upper plan levels can usually just copy the Datum
arrays, avoiding heap_formtuple() and possible subsequent nocachegetattr()
calls to extract the data again. This work extends Atsushi Ogawa's earlier
patch, which provided the key idea of adding Datum arrays to TupleTableSlots.
(I believe however that something like this was foreseen way back in Berkeley
days --- see the old comment on ExecProject.) A test case involving many
levels of join of fairly wide tables (about 80 columns altogether) showed
about 3x overall speedup, though simple queries will probably not be
helped very much.
I have also duplicated some code in heaptuple.c in order to provide versions
of heap_formtuple and friends that use "bool" arrays to indicate null
attributes, instead of the old convention of "char" arrays containing either
'n' or ' '. This provides a better match to the convention used by
ExecEvalExpr. While I have not made a concerted effort to get rid of uses
of the old routines, I think they should be deprecated and eventually removed.
Also performed an initial run through of upgrading our Copyright date to
extend to 2005 ... first run here was very simple ... change everything
where: grep 1996-2004 && the word 'Copyright' ... scanned through the
generated list with 'less' first, and after, to make sure that I only
picked up the right entries ...
buffer is valid, as ReadBuffer() will elog on error. Most of the call
sites of ReadBuffer() got this right, but this patch fixes those call
sites that did not.
in all cases when keep_buf = true. This allows ANALYZE's inner loop to
use heap_release_fetch, which saves multiple buffer lookups for the same
page and avoids overestimation of cost by the vacuum cost mechanism.
now are supposed to take some kind of lock on an index whenever you
are going to access the index contents, rather than relying only on a
lock on the parent table.
In the past, we used a 'Lispy' linked list implementation: a "list" was
merely a pointer to the head node of the list. The problem with that
design is that it makes lappend() and length() linear time. This patch
fixes that problem (and others) by maintaining a count of the list
length and a pointer to the tail node along with each head node pointer.
A "list" is now a pointer to a structure containing some meta-data
about the list; the head and tail pointers in that structure refer
to ListCell structures that maintain the actual linked list of nodes.
The function names of the list API have also been changed to, I hope,
be more logically consistent. By default, the old function names are
still available; they will be disabled-by-default once the rest of
the tree has been updated to use the new API names.
ago. This should give significantly better results when the density of
live tuples is not uniform throughout a table. Manfred Koizar, with
minor kibitzing from Tom Lane.
costing us lots more to maintain than it was worth. On shared tables
it was of exactly zero benefit because we couldn't trust it to be
up to date. On temp tables it sometimes saved an lseek, but not often
enough to be worth getting excited about. And the real problem was that
we forced an lseek on every relcache flush in order to update the field.
So all in all it seems best to lose the complexity.
This commit teaches ANALYZE to store such stats in pg_statistic, but
nothing is done yet about teaching the planner to use 'em.
Also, repair longstanding oversight in separate ANALYZE command: it
updated the pg_class.relpages and reltuples counts for the table proper,
but not for indexes.
indexes, it seems like we ought to put another layer of indirection
between the compute_stats functions and the actual data storage. This
would allow us to compute the values on-the-fly, for example.
subroutine in src/port/pgsleep.c. Remove platform dependencies from
miscadmin.h and put them in port.h where they belong. Extend recent
vacuum cost-based-delay patch to apply to VACUUM FULL, ANALYZE, and
non-btree index vacuuming.
By the way, where is the documentation for the cost-based-delay patch?
- Update comment in IsReservedName() to the present day
- Improve some variable & function names in commands/vacuum.c. I
was planning to rewrite this to avoid lappend(), but since I
still intend to do the list rewrite, there's no need for that.
- Update some smgr comments which seemed to imply that we still
forced all dirty pages to disk at commit-time.
- Replace some #ifdef DIAGNOSTIC code with assertions.
- Make the distinction between OS-level file descriptors and
virtual file descriptors a little clearer in a few comments
- Other minor comment improvements in the smgr code