Fix tsmatchsel() to account properly for null rows.

ts_typanalyze.c computes MCE statistics as fractions of the non-null rows, which seems fairly reasonable, and anyway changing it in released versions wouldn't be a good idea. But then ts_selfuncs.c has to account for that. Failure to do so results in overestimates in columns with a significant fraction of null documents.

Back-patch to 8.4 where this stuff was introduced.

Jesper Krogh
parent de623f3335
commit 52b60530f2
@@ -189,11 +189,17 @@ tsquerysel(VariableStatData *vardata, Datum constval)
			/* No most-common-elements info, so do without */
			selec = tsquery_opr_selec_no_stats(query);
		}

		/*
		 * MCE stats count only non-null rows, so adjust for null rows.
		 */
		selec *= (1.0 - stats->stanullfrac);
	}
	else
	{
		/* No stats at all, so do without */
		selec = tsquery_opr_selec_no_stats(query);
		/* we assume no nulls here, so no stanullfrac correction */
	}

	return selec;
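To illustrate the adjustment in the hunk above, here is a minimal standalone sketch, not code from ts_selfuncs.c. The helper name `scale_by_nonnull_fraction` and its parameters are hypothetical; it only mirrors the arithmetic of `selec *= (1.0 - stats->stanullfrac)`.

```c
#include <assert.h>

/*
 * Hypothetical helper (not part of PostgreSQL): convert a selectivity
 * measured as a fraction of non-null rows into a fraction of all rows.
 *
 * selec_nonnull: estimated fraction of non-null rows that match
 * nullfrac:      fraction of rows whose document is NULL
 */
static double
scale_by_nonnull_fraction(double selec_nonnull, double nullfrac)
{
	/* Both inputs are fractions, so they must lie in [0, 1] */
	assert(selec_nonnull >= 0.0 && selec_nonnull <= 1.0);
	assert(nullfrac >= 0.0 && nullfrac <= 1.0);

	/* Null rows can never match, so only the non-null share counts */
	return selec_nonnull * (1.0 - nullfrac);
}
```

With assumed figures, a lexeme present in 20% of the non-null documents of a column that is 75% null gives `scale_by_nonnull_fraction(0.20, 0.75)`, about 0.05. Without the correction the estimate would stay at 0.20, four times too high.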
@@ -246,6 +246,8 @@ typedef FormData_pg_statistic *Form_pg_statistic;
 * type with identifiable elements (for instance, tsvector). staop contains
 * the equality operator appropriate to the element type. stavalues contains
 * the most common element values, and stanumbers their frequencies. Unlike
 * MCV slots, frequencies are measured as the fraction of non-null rows the
 * element value appears in, not the frequency of all rows. Also unlike
 * MCV slots, the values are sorted into order (to support binary search
 * for a particular value). Since this puts the minimum and maximum
 * frequencies at unpredictable spots in stanumbers, there are two extra
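The comment above notes that MCE values are kept sorted to support binary search. The following is a simplified sketch of such a lookup, under loud assumptions: `MceEntry` and `mce_lookup` are hypothetical names, the values and frequencies are paired in one struct for brevity (the comment describes them as separate stavalues and stanumbers arrays), and plain `strcmp` order stands in for whatever ordering the real code uses.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-in for one most-common-element entry */
typedef struct
{
	const char *lexeme;		/* element value */
	double		frequency;	/* fraction of non-null rows containing it */
} MceEntry;

/*
 * Binary search over entries sorted by strcmp() order (an assumption of
 * this sketch).  Returns the stored frequency, or -1.0 if the lexeme is
 * not among the most common elements.
 */
static double
mce_lookup(const MceEntry *entries, size_t n, const char *lexeme)
{
	size_t		lo = 0;
	size_t		hi = n;		/* search the half-open range [lo, hi) */

	assert(entries != NULL || n == 0);

	while (lo < hi)
	{
		size_t		mid = lo + (hi - lo) / 2;
		int			cmp = strcmp(lexeme, entries[mid].lexeme);

		if (cmp == 0)
			return entries[mid].frequency;
		if (cmp < 0)
			hi = mid;		/* target sorts before entries[mid] */
		else
			lo = mid + 1;	/* target sorts after entries[mid] */
	}
	return -1.0;
}
```

A frequency found this way is relative to non-null rows only, so a caller estimating selectivity over the whole table would still need to multiply by the non-null fraction, as the ts_selfuncs.c change does.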