hashjoinable clause, not one path for a randomly-chosen element of each
set of clauses with the same join operator. That is, if you wrote
SELECT ... WHERE t1.f1 = t2.f2 and t1.f3 = t2.f4,
and both '=' ops were the same opcode (say, all four fields are int4),
then the system would either consider hashing on f1=f2 or on f3=f4,
but it would *not* consider both possibilities. Boo hiss.
Also, revise estimation of hashjoin costs to include a penalty when the
inner join var has a high disbursion --- ie, the most common value is
pretty common. This tends to lead to badly skewed hash bucket occupancy
and way more comparisons than you'd expect on average.
I imagine that the cost calculation still needs tweaking, but at least
it generates a more reasonable plan than before on George Young's example.
(it should just call the given operator, not look up an = operator).
Fix intltsel() so that all numeric data types are converted to double
before trying to estimate where the given comparison value is in the
known range of column values. intltsel() still needs work, or replacement,
for non-numeric data types ... but for nonintegral numeric types it
should now be delivering reasonable estimates.
configure.in to determine if a system is ELF or not. Note that some
of the tests earlier may be redundant but I took the safest route.
D'Arcy J.M. Cain
neqsel now behave as per my suggestions in pghackers a few days ago.
selectivity for < > <= >= should work OK for integral types as well, but
still need work for nonintegral types. Since these routines have never
actually executed before :-(, this may result in some significant changes
in the optimizer's choices of execution plans. Let me know if you see
any serious misbehavior.
CAUTION: THESE CHANGES REQUIRE INITDB. pg_statistic table has changed.
so that Case works in WHERE join clauses. Temporary patch --- this routine
is one of many that ought to be changed to use centralized expression-tree-
walking logic.
rels that the inner path needs to join to, but it was only checking for
the first one. Failure could only have been observed with an OR-clause
that mentions 3 or more tables, and then only if the bogus path was
actually selected as cheapest ...
optimizer rather than parser. This has many advantages, such as not
getting fooled by chance uses of operator names ~ and ~~ (the operators
are identified by OID now), and not creating useless comparison operations
in contexts where the comparisons will not actually be used as indexquals.
The new code also recognizes exact-match LIKE and regex patterns, and
produces an = indexqual instead of >= and <=.
This change does NOT fix the problem with non-ASCII locales: the code
still doesn't know how to generate an upper bound indexqual for non-ASCII
collation order. But it's no worse than before, just the same deficiency
in a different place...
Also, dike out loc_restrictinfo fields in Plan nodes. These were doing
nothing useful in the absence of 'expensive functions' optimization,
and they took a considerable amount of processing to fill in.
The only place it was being used was as temporary storage in indxpath.c,
and the logic was wrong: the same restrictinfo node could get chosen to
carry the info for two different joins. Right fix is to return a second
list of unjoined-relids parallel to the list of clause groups.