Correct Memoize's estimated cache hit ratio calculation

As demonstrated by David Johnston, the Memoize cache hit ratio calculation
wasn't quite correct.

This change only affects the estimated hit ratio when the estimated number
of entries to cache is estimated not to fit inside the cache.  For
example, if we expect 2000 distinct cache key values and only expect to be
able to cache 1000 of those at once due to memory constraints, with an
estimate of 10000 calls, if we could store all entries then the hit ratio
should be 80% to account for the first 2000 of the 10000 calls to be a
cache miss due to the value not being cached yet.  If we can only store
1000 entries for each of the 2000 distinct possible values at once then
the 80% should be reduced by half to make the final estimate of 40%.
Previously, the calculation would have produced an estimated hit ratio of
30%, which wasn't correct.

Apply to master only so as not to destabilize plans in the back branches.

Reported-by: David G. Johnston
Discussion: https://postgr.es/m/CAKFQuwZEmcNk3YQo2Xj4EDUOdY6qakad31rOD1Vc4q1_s68-Ew@mail.gmail.com
Discussion: https://postgr.es/m/CAApHDvrV44LwiF4W_qf_RpbGYWSgp1kF=cZr+kTRRaALUfmXqw@mail.gmail.com
This commit is contained in:
David Rowley 2023-03-22 08:44:54 +13:00
parent b0d8f2d983
commit f48b4f892f

View File

@ -2558,11 +2558,10 @@ cost_memoize_rescan(PlannerInfo *root, MemoizePath *mpath,
* must look at how many scans are estimated in total for this node and
* how many of those scans we expect to get a cache hit.
*/
hit_ratio = 1.0 / ndistinct * Min(est_cache_entries, ndistinct) -
(ndistinct / calls);
hit_ratio = ((calls - ndistinct) / calls) *
(est_cache_entries / Max(ndistinct, est_cache_entries));
/* Ensure we don't go negative */
hit_ratio = Max(hit_ratio, 0.0);
Assert(hit_ratio >= 0 && hit_ratio <= 1.0);
/*
* Set the total_cost accounting for the expected cache hit ratio. We