Correct Memoize's estimated cache hit ratio calculation
As demonstrated by David Johnston, the Memoize cache hit ratio calculation wasn't quite correct. This change only affects the estimated hit ratio when the number of entries to cache is estimated not to fit inside the cache.

For example, suppose we expect 10000 calls and 2000 distinct cache key values, but memory constraints mean we can only cache 1000 of those values at once. If we could store all entries, the hit ratio should be 80%, because the first call for each of the 2000 distinct values is a cache miss: that value has not been cached yet. Since we can only store 1000 of the 2000 distinct values at once, the 80% should be halved, giving a final estimate of 40%. Previously, the calculation would have produced an estimated hit ratio of 30%, which wasn't correct.

Apply to master only so as not to destabilize plans in the back branches.

Reported-by: David G. Johnston
Discussion: https://postgr.es/m/CAKFQuwZEmcNk3YQo2Xj4EDUOdY6qakad31rOD1Vc4q1_s68-Ew@mail.gmail.com
Discussion: https://postgr.es/m/CAApHDvrV44LwiF4W_qf_RpbGYWSgp1kF=cZr+kTRRaALUfmXqw@mail.gmail.com
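A worked check of the figures in the message above, writing c for calls, d for ndistinct and e for est_cache_entries (the variable names used in cost_memoize_rescan). The first line is the expression this commit removes, the second is the one it adds:

$$
\begin{aligned}
h_{\text{old}} &= \frac{\min(e, d)}{d} - \frac{d}{c} = \frac{1000}{2000} - \frac{2000}{10000} = 0.5 - 0.2 = 0.3 \\
h_{\text{new}} &= \frac{c - d}{c} \cdot \frac{e}{\max(d, e)} = \frac{10000 - 2000}{10000} \cdot \frac{1000}{2000} = 0.8 \times 0.5 = 0.4
\end{aligned}
$$

When every distinct value fits (e >= d), both expressions reduce to (c - d)/c = 0.8, which is consistent with the change only mattering when the cache is too small.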
parent b0d8f2d983
commit f48b4f892f
@@ -2558,11 +2558,10 @@ cost_memoize_rescan(PlannerInfo *root, MemoizePath *mpath,
 	 * must look at how many scans are estimated in total for this node and
 	 * how many of those scans we expect to get a cache hit.
 	 */
-	hit_ratio = 1.0 / ndistinct * Min(est_cache_entries, ndistinct) -
-		(ndistinct / calls);
+	hit_ratio = ((calls - ndistinct) / calls) *
+		(est_cache_entries / Max(ndistinct, est_cache_entries));
 
-	/* Ensure we don't go negative */
-	hit_ratio = Max(hit_ratio, 0.0);
+	Assert(hit_ratio >= 0 && hit_ratio <= 1.0);
 
 	/*
 	 * Set the total_cost accounting for the expected cache hit ratio. We