Fix tsmatchsel() to account properly for null rows.

ts_typanalyze.c computes MCE statistics as fractions of the non-null rows, which seems fairly reasonable, and anyway changing it in released versions wouldn't be a good idea. But then ts_selfuncs.c has to account for that. Failure to do so results in overestimates in columns with a significant fraction of null documents. Back-patch to 8.4 where this stuff was introduced.

Jesper Krogh
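
The correction described above amounts to rescaling a selectivity measured over non-null rows into one measured over all rows. A minimal sketch of that arithmetic follows; `scale_for_nulls` is a hypothetical helper written for illustration only, not a function in the PostgreSQL tree, and `null_frac` stands in for `stats->stanullfrac`.

```c
#include <assert.h>

/*
 * Hypothetical helper, not PostgreSQL code: convert a selectivity expressed
 * as a fraction of non-null rows into a fraction of all rows.
 */
static double
scale_for_nulls(double nonnull_selec, double null_frac)
{
	assert(nonnull_selec >= 0.0 && nonnull_selec <= 1.0);
	assert(null_frac >= 0.0 && null_frac <= 1.0);

	/* A null document can never match, so only the non-null share counts. */
	return nonnull_selec * (1.0 - null_frac);
}
```

For instance, a lexeme found in 50% of the non-null documents of a column that is 40% null would be estimated at 0.5 * 0.6 = 0.3 of all rows; without the correction the estimate stays at 0.5, which is the overestimate the commit message refers to.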
parent 42e663cc41
commit 2b3a0630b5
@@ -188,11 +188,17 @@ tsquerysel(VariableStatData *vardata, Datum constval)
 			/* No most-common-elements info, so do without */
 			selec = tsquery_opr_selec_no_stats(query);
 		}
+
+		/*
+		 * MCE stats count only non-null rows, so adjust for null rows.
+		 */
+		selec *= (1.0 - stats->stanullfrac);
 	}
 	else
 	{
 		/* No stats at all, so do without */
 		selec = tsquery_opr_selec_no_stats(query);
+		/* we assume no nulls here, so no stanullfrac correction */
 	}
 
 	return selec;
@@ -244,6 +244,8 @@ typedef FormData_pg_statistic *Form_pg_statistic;
  * type with identifiable elements (for instance, tsvector). staop contains
  * the equality operator appropriate to the element type. stavalues contains
  * the most common element values, and stanumbers their frequencies. Unlike
+ * MCV slots, frequencies are measured as the fraction of non-null rows the
+ * element value appears in, not the frequency of all rows. Also unlike
  * MCV slots, the values are sorted into order (to support binary search
  * for a particular value). Since this puts the minimum and maximum
  * frequencies at unpredictable spots in stanumbers, there are two extra
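
The comment above says two things about an MCE slot: its values are sorted so that one element can be found by binary search, and its frequencies are relative to non-null rows. A rough sketch of a lookup that respects both points is given here. The `MceEntry` struct, the function names, and the plain `strcmp` ordering are all assumptions made for illustration; they are not PostgreSQL's actual data layout or lexeme comparison.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for one MCE entry (a stavalues/stanumbers pair). */
typedef struct
{
	const char *element;	/* element value, e.g. a lexeme */
	double		frequency;	/* fraction of non-null rows containing it */
} MceEntry;

/* bsearch comparator: the key is a C string, the member is an MceEntry. */
static int
mce_compare(const void *key, const void *member)
{
	return strcmp((const char *) key, ((const MceEntry *) member)->element);
}

/*
 * Return the fraction of ALL rows containing 'element', or -1.0 if it is
 * not among the most common elements. 'entries' must be sorted by element
 * in strcmp order (an assumption of this sketch).
 */
static double
mce_all_rows_frequency(const MceEntry *entries, size_t n,
					   const char *element, double null_frac)
{
	const MceEntry *hit;

	assert(null_frac >= 0.0 && null_frac <= 1.0);

	hit = bsearch(element, entries, n, sizeof(MceEntry), mce_compare);
	if (hit == NULL)
		return -1.0;

	/* Stored frequency covers non-null rows only, so rescale it. */
	return hit->frequency * (1.0 - null_frac);
}
```

Returning a sentinel for a missing element keeps the sketch short; a real estimator would fall back to some default frequency for elements that are not in the list.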