mirror of
https://github.com/netsurf-browser/netsurf
synced 2025-02-24 18:34:27 +03:00
Musings on caching
svn path=/trunk/netsurf/; revision=9561
This commit is contained in:
parent
56e42730de
commit
665b6ceb02
168
Docs/ideas/cache.txt
Normal file
168
Docs/ideas/cache.txt
Normal file
@ -0,0 +1,168 @@
|
||||
Content caching
|
||||
===============
|
||||
|
||||
NetSurf's existing fetch/cache architecture has a number of problems:
|
||||
|
||||
1) Content dependencies are not modelled.
|
||||
2) Content source data for non-shareable contents is duplicated.
|
||||
3) Detection of content sharability is dependent on Content-Type, which
|
||||
requires content cloning (which will fail for dependent contents).
|
||||
4) Detection of cycles in content dependency graphs is not performed
|
||||
(e.g. content1 includes content2, which includes content1).
|
||||
5) All content caching is in-memory, there's no offline storage.
|
||||
|
||||
Proposal
|
||||
--------
|
||||
|
||||
A split-level cache.
|
||||
|
||||
Low-level cache:
|
||||
|
||||
+ Responsible for source data (+header) management.
|
||||
+ Interfaces with low-level fetch system to retrieve data from network.
|
||||
+ Is responsible for offline storage (if any) of cache objects.
|
||||
+ Returns opaque handles to low-level cache objects.
|
||||
+ Handles HTTP redirects, recording URLs encountered when retrieving resource.
|
||||
+ May perform content-type sniffing (requires usage context)
|
||||
|
||||
High-level cache:
|
||||
|
||||
+ Responsible for content objects.
|
||||
+ Tracks content dependencies (and potential cycles).
|
||||
+ Returns opaque handles to content objects.
|
||||
+ Manages content sharability.
|
||||
+ Contents with unknown types are never shared and thus get unique handles.
|
||||
+ Content handles <> content objects: they're an indirection mechanism.
|
||||
|
||||
Example: retrieving a top-level resource
|
||||
----------------------------------------
|
||||
|
||||
1) Client requests an URL, specifying no parent handle.
|
||||
2) High-level cache asks low-level cache for low-level handle for URL.
|
||||
3) Low-level cache looks for appropriate object in its index.
|
||||
a) it finds one that's not stale and returns its handle
|
||||
b) it finds only stale entries, or no appropiate entry,
|
||||
so allocates a new entry, requests a fetch for it,
|
||||
and returns the handle.
|
||||
4) High-level cache looks for content objects that are using the low-level
|
||||
handle.
|
||||
a) it finds one that's shareable and selects its handle for use.
|
||||
b) it finds only non-shareable entries, or no appropriate entry,
|
||||
so allocates a new entry and selects its handle for use.
|
||||
5) High-level cache registers the parent and client with the selected handle,
|
||||
then returns the selected handle.
|
||||
6) Client carries on, happy in the knowledge that a content is available.
|
||||
|
||||
Example: retrieving a child resource
|
||||
------------------------------------
|
||||
|
||||
1) Client requests an URL, specifying parent handle.
|
||||
2) High-level cache searches parent+ancestors for requested URL.
|
||||
a) it finds the URL, so returns a non-fatal error.
|
||||
b) it does not find the URL, so proceeds from step 2 of the
|
||||
top-level resource algorithm.
|
||||
|
||||
NOTE: this approach means that shareable contents may have multiple parents.
|
||||
|
||||
Handling of contents of unknown type
|
||||
------------------------------------
|
||||
|
||||
Contents of unknown type are, by definition, not shareable. Therefore, each
|
||||
client will be issued with a different content handle.
|
||||
|
||||
Content types are only known once a resource's headers are fetched (or once
|
||||
the type has been sniffed from the resource's data when the headers are
|
||||
inconclusive).
|
||||
|
||||
As a resource is fetched, users of the resource are informed of the fetch
|
||||
status. Therefore, the high-level cache is always informed of fetch progress.
|
||||
Cache clients need not care about this: they are simply interested in
|
||||
a content's readiness for use.
|
||||
|
||||
When the high-level cache is informed of a low-level cache object's type,
|
||||
it is in a position to determine whether the corresponding content handles
|
||||
can share a single content object or not.
|
||||
|
||||
If it detects that a single content object may be shared by multiple handles,
|
||||
it simply creates the content object and registers each of the handles as
|
||||
a user of the content.
|
||||
|
||||
If it detects that each handle requires a separate content object, then it
|
||||
will create a content object for each handle and register the handle as a
|
||||
user.
|
||||
|
||||
This approach requires that clients of the high-level cache get issued with
|
||||
handles to content objects, rather than content objects (so that the decision
|
||||
whether to create multiple content objects can be deferred until suitable
|
||||
information is available).
|
||||
|
||||
Handles with no associated content object will act as if they had a content
|
||||
object that was not ready for use.
|
||||
|
||||
A more concrete example
|
||||
-----------------------
|
||||
|
||||
+ bw1 contains html1 which includes css1, css2, img1, img2
|
||||
+ bw2 contains html2 which includes css1, img1, img2
|
||||
+ bw3 contains img1
|
||||
|
||||
Neither HTML nor CSS contents are shareable.
|
||||
All shareable contents are requested from the high-level cache
|
||||
once their type is known.
|
||||
|
||||
Low-level cache contains source data for:
|
||||
|
||||
1 - html1
|
||||
2 - html2
|
||||
3 - css1
|
||||
4 - css2
|
||||
5 - img1
|
||||
6 - img2
|
||||
|
||||
High-level cache contains:
|
||||
|
||||
Content objects (ll-handle in parentheses):
|
||||
|
||||
+ c1 (1 - html1)
|
||||
+ c2 (2 - html2)
|
||||
+ c3 (3 - css1)
|
||||
+ c4 (4 - css2)
|
||||
+ c5 (5 - img1)
|
||||
+ c6 (6 - img2)
|
||||
+ c7 (3 - css1)
|
||||
|
||||
Content handles (objects in parentheses):
|
||||
|
||||
+ h1 (c1, used by bw1)
|
||||
+ h2 (c3, used by h1)
|
||||
+ h3 (c4, used by h1)
|
||||
+ h4 (c2, used by bw2)
|
||||
+ h5 (c7, used by h4)
|
||||
+ h6 (c5, used by h1,h4,bw3)
|
||||
+ h7 (c6, used by h1,h4)
|
||||
|
||||
If img1 was not of known type when requested:
|
||||
|
||||
Content handles (objects in parentheses):
|
||||
|
||||
+ h1 (c1, used by bw1)
|
||||
+ h2 (c3, used by h1)
|
||||
+ h3 (c4, used by h1)
|
||||
+ h4 (c2, used by bw2)
|
||||
+ h5 (c7, used by h4)
|
||||
+ h6 (c5, used by h1)
|
||||
+ h7 (c6, used by h1,h4)
|
||||
+ h8 (c5, used by h4)
|
||||
+ h9 (c5, used by bw3)
|
||||
|
||||
This achieves the desired effect that:
|
||||
|
||||
+ source data is shared between contents
|
||||
+ content objects are only created when absolutely necessary
|
||||
+ content usage/dependency is tracked and cycles avoided
|
||||
+ offline storage is possible
|
||||
|
||||
Achieving this requires the use of indirection objects, but these are expected
|
||||
to be small in comparison to the content objects / ll-cache objects that they
|
||||
are indirecting.
|
||||
|
Loading…
x
Reference in New Issue
Block a user