In planner, don't assume that empty parent tables aren't really empty.

There's a heuristic in estimate_rel_size() to clamp the minimum size
estimate for a table to 10 pages, unless we can see that vacuum or analyze
has been run (and set relpages to something nonzero, so this will always
happen for a table that's actually empty).  However, it would be better
not to do this for inheritance parent tables, which very commonly are
really empty and can be expected to stay that way.  Per discussion of a
recent pgsql-performance report from Anish Kejariwal.  Also prevent it
from happening for indexes (although this is more in the nature of
documentation, since CREATE INDEX normally initializes relpages to
something nonzero anyway).

Back-patch to 9.0, because the ability to collect statistics across a
whole inheritance tree has improved the planner's estimates to the point
where this relatively small error makes a significant difference.  In the
referenced report, merge or hash joins were incorrectly estimated as
cheaper than a nestloop with inner indexscan on the inherited table.
That was less likely before 9.0 because the lack of inherited stats would
have resulted in a default (and rather pessimistic) estimate of the cost
of a merge or hash join.
This commit is contained in:
Tom Lane 2011-07-14 17:30:57 -04:00
parent c529f8800e
commit f3ff0433ab

View File

@ -385,24 +385,38 @@ estimate_rel_size(Relation rel, int32 *attr_widths,
/*
* HACK: if the relation has never yet been vacuumed, use a
* minimum estimate of 10 pages. This emulates a desirable aspect
* of pre-8.0 behavior, which is that we wouldn't assume a newly
* created relation is really small, which saves us from making
* really bad plans during initial data loading. (The plans are
* not wrong when they are made, but if they are cached and used
* again after the table has grown a lot, they are bad.) It would
* be better to force replanning if the table size has changed a
* lot since the plan was made ... but we don't currently have any
* infrastructure for redoing cached plans at all, so we have to
* kluge things here instead.
* minimum size estimate of 10 pages. The idea here is to avoid
* assuming a newly-created table is really small, even if it
* currently is, because that may not be true once some data gets
* loaded into it. Once a vacuum or analyze cycle has been done
* on it, it's more reasonable to believe the size is somewhat
* stable.
*
* (Note that this is only an issue if the plan gets cached and
* used again after the table has been filled. What we're trying
* to avoid is using a nestloop-type plan on a table that has
* grown substantially since the plan was made. Normally,
* autovacuum/autoanalyze will occur once enough inserts have
* happened and cause cached-plan invalidation; but that doesn't
* happen instantaneously, and it won't happen at all for cases
* such as temporary tables.)
*
* We approximate "never vacuumed" by "has relpages = 0", which
* means this will also fire on genuinely empty relations. Not
* great, but fortunately that's a seldom-seen case in the real
* world, and it shouldn't degrade the quality of the plan too
* much anyway to err in this direction.
*
* There are two exceptions wherein we don't apply this heuristic.
* One is if the table has inheritance children. Totally empty
* parent tables are quite common, so we should be willing to
* believe that they are empty. Also, we don't apply the 10-page
* minimum to indexes.
*/
if (curpages < 10 && rel->rd_rel->relpages == 0)
if (curpages < 10 &&
rel->rd_rel->relpages == 0 &&
!rel->rd_rel->relhassubclass &&
rel->rd_rel->relkind != RELKIND_INDEX)
curpages = 10;
/* report estimated # pages */
@ -418,9 +432,10 @@ estimate_rel_size(Relation rel, int32 *attr_widths,
reltuples = (double) rel->rd_rel->reltuples;
/*
* If it's an index, discount the metapage. This is a kluge
* because it assumes more than it ought to about index contents;
* it's reasonably OK for btrees but a bit suspect otherwise.
* If it's an index, discount the metapage while estimating the
* number of tuples. This is a kluge because it assumes more than
* it ought to about index structure. Currently it's OK for
* btree, hash, and GIN indexes but suspect for GiST indexes.
*/
if (rel->rd_rel->relkind == RELKIND_INDEX &&
relpages > 0)
@ -428,6 +443,7 @@ estimate_rel_size(Relation rel, int32 *attr_widths,
curpages--;
relpages--;
}
/* estimate number of tuples from previous tuple density */
if (relpages > 0)
density = reltuples / (double) relpages;