mirror of https://github.com/netsurf-browser/netsurf
commit 05932293f6 (parent 4a49ff5266)
Add filesystem based backing store

Docs/source-object-backing-store (new file, 194 lines)
@@ -0,0 +1,194 @@
Source Object (low level) cache backing store
=============================================

Introduction
------------

The source object cache provides a system to extend the life of source
objects (HTML files, images, etc.) after they are no longer immediately
being used.

Only fetch types where we have well defined rules on caching are
considered; in practice this limits us to HTTP(S). The section on
caching in RFC 2616 [1] specifies these rules.

To further extend an object's lifetime it can be pushed into a backing
store, where objects are available for reuse less quickly than from
RAM but faster than retrieving them from the network again.

The backing store implementation provides a key:value infrastructure
with a simple store, retrieve and invalidate interface.

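Purely as an illustration of the shape of that interface (the names and
signatures below are hypothetical, not the actual NetSurf backing store
API), a pluggable implementation might be described by an operation
table such as:

  #include <stddef.h>
  #include <stdint.h>

  /* Hypothetical operation table for a pluggable backing store.
   * Keys are source URLs; flags select object data or metadata.
   */
  enum hyp_store_flags {
          HYP_STORE_NONE = 0,
          HYP_STORE_META = 1  /* operate on the metadata for the URL */
  };

  struct hyp_backing_store {
          /* place an object (or its metadata) into the store */
          int (*store)(const char *url, enum hyp_store_flags flags,
                       const uint8_t *data, size_t datalen);
          /* retrieve a previously stored object */
          int (*fetch)(const char *url, enum hyp_store_flags flags,
                       uint8_t **data_out, size_t *datalen_out);
          /* drop all objects stored against a key */
          int (*invalidate)(const char *url);
  };
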
Generic filesystem backing store
--------------------------------

Although the backing store interface is fully pluggable, a generic
implementation is provided which stores objects on the filesystem in a
hierarchy of directories.

The option to alter the backing store format exists and is controlled
by a version field. It is implementation defined what happens if a
version mismatch occurs.

As the backing store only holds cache data, one should not expect a
great deal of effort to be expended converting formats (i.e. the cache
may simply be discarded).


Layout version 1
----------------

An object has an identifier value generated from the URL (the NetSurf
backing store uses the URL as the unique key). The value used is
obtained using nsurl_hash(), which is currently a 32 bit FNV, so is
directly usable.

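For reference, a 32 bit FNV-1 hash of a URL string can be computed as
in the sketch below; the exact FNV variant and the input bytes used by
nsurl_hash() may differ, so treat this as illustrative only:

  #include <stdint.h>

  /* 32 bit FNV-1 hash of a NUL terminated string */
  static uint32_t fnv_32(const char *s)
  {
          const unsigned char *p = (const unsigned char *)s;
          uint32_t h = 0x811c9dc5u;          /* FNV offset basis */

          while (*p != '\0') {
                  h *= 0x01000193u;          /* FNV prime */
                  h ^= (uint32_t)*p++;
          }
          return h;
  }
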
This identifier is adequate to ensure that the collision rate for the
hashed URL values (a collision for every 2^16 URLs added) is
sufficiently low that the overhead of returning the wrong object (which
backing stores are permitted to do) is not significant.

An entry list is maintained which contains all the metadata about a
given identifier. This list is limited in length to constrain the
resources necessary to maintain it. It is made persistent to avoid the
overhead of reconstructing it at initialisation and to keep the data
used to improve the eviction decisions.

Each object is stored and retrieved directly from the filesystem using
a filename generated from a base64url encoding of an address
value. The object's address is derived from the identifier by cropping
it to a shorter length.

A mapping between the object address and its entry is maintained, which
uses storage directly proportional to the size of the address length.

The cropping length is stored in the control file, with the default
values set at compile time. This allows existing backing stores to
continue operating with existing data independently of new default
settings. This setting gives some ability to tune the default cache
index size to values suitable for a specific host operating system.

E.g. Linux based systems can easily cope with several megabytes of
mmaped index but RISC OS might want to limit this to a few megabytes
of heap at most.

The files are stored on disc using their base64url address value.
By creating a directory for each character of the encoded filename
(except the last, which is of course the leafname) we create a
directory structure where no directory has more than 64 entries.

E.g. A 19 bit address of 0x1 would be base64url encoded into AAAB,
resulting in the data being stored in a file path of
"/store/prefix/data/B/A/A/BAAAAA".

An address of 0x00040001 encodes to BAAB and a file path of
"/store/prefix/meta/B/A/A/BAABAA".

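A sketch of that naming scheme, inferred from the two examples above
(it reproduces both paths, but the real fs_backing_store.c may differ
in detail, for instance in the leafname length or character ordering):

  #include <stdint.h>
  #include <stdio.h>

  /* base64url alphabet */
  static const char b64u[] =
          "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_";

  /* Build a storage path for an object address.  The leafname encodes
   * the full 32 bit address six bits at a time from the least
   * significant end; each character of the encoded cropped address,
   * except the last, becomes a directory level.
   *
   * e.g. address_to_path(0x1, 19, "/store/prefix", "data", buf, len)
   * yields "/store/prefix/data/B/A/A/BAAAAA".
   */
  static void address_to_path(uint32_t addr, unsigned address_bits,
                              const char *prefix, const char *elem,
                              char *path, size_t pathlen)
  {
          unsigned crop_chars = (address_bits + 5) / 6;
          char leaf[7];
          char dirs[16];
          unsigned i;
          size_t d = 0;

          /* leafname: all 32 bits, least significant six first */
          for (i = 0; i < 6; i++) {
                  leaf[i] = b64u[(addr >> (6 * i)) & 0x3f];
          }
          leaf[6] = '\0';

          /* one directory per cropped-address character bar the last */
          for (i = 0; i + 1 < crop_chars; i++) {
                  dirs[d++] = b64u[(addr >> (6 * i)) & 0x3f];
                  dirs[d++] = '/';
          }
          dirs[d] = '\0';

          snprintf(path, pathlen, "%s/%s/%s%s", prefix, elem, dirs, leaf);
  }
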
Control files
~~~~~~~~~~~~~

control
+++++++

A control file is used to hold a list of values describing how the
other files in the backing store should be used.

entries
+++++++

This file contains a table of entries describing the files held on the
filesystem.

Each control file table entry is 28 bytes and consists of the fields
below (see the struct sketch after the list):

- signed 64 bit value for last use time

- 32 bit full URL hash allowing for index reconstruction and
  additional collision detection. Also the possibility of increasing
  the ADDRESS_LENGTH although this would require renaming all the
  existing files in the cache and is not currently implemented.

- unsigned 32 bit length for data

- unsigned 32 bit length for metadata

- unsigned 16 bit value for number of times used

- unsigned 16 bit value for flags

- unsigned 16 bit value for data block index (unused)

- unsigned 16 bit value for metadata block index (unused)

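Expressed as a C structure this layout might look like the sketch
below (field names are illustrative; note that without explicit
packing a compiler will typically pad this structure to 32 bytes in
memory even though the on-disc entry is 28 bytes):

  #include <stdint.h>

  /* One entry in the entries table: 8+4+4+4+2+2+2+2 = 28 bytes on disc */
  struct store_entry {
          int64_t  last_used;   /* signed 64 bit last use time */
          uint32_t full_hash;   /* full 32 bit url hash */
          uint32_t data_len;    /* length of the stored data */
          uint32_t meta_len;    /* length of the stored metadata */
          uint16_t use_count;   /* number of times used */
          uint16_t flags;       /* entry flags */
          uint16_t data_block;  /* data block index (unused) */
          uint16_t meta_block;  /* metadata block index (unused) */
  };
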
Address to entry index
~~~~~~~~~~~~~~~~~~~~~~

An entry index is held in RAM which allows an address to be looked up
and mapped to an entry in the control file.

The index is the only data structure whose size is directly dependent
on the length of the hash, specifically:

    (2 ^ (ADDRESS_BITS - 3)) * ENTRY_BITS bytes

where ADDRESS_BITS is how long the address is in bits and ENTRY_BITS
determines how many entries (2 ^ ENTRY_BITS) the control file (and
hence the whole cache) may hold.

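As a worked example of that formula (the macro name is illustrative):

  /* bytes used by the index: an ENTRY_BITS wide slot for each of the
   * 2^ADDRESS_BITS possible addresses */
  #define INDEX_SIZE_BYTES(address_bits, entry_bits) \
          ((1u << ((address_bits) - 3)) * (entry_bits))

  /* ADDRESS_BITS = 22, ENTRY_BITS = 16:
   *   (1 << 19) * 16 = 8,388,608 bytes, the roughly 8 megabyte index
   *   mentioned for larger systems below. */
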
RISC OS values
++++++++++++++

By limiting the ENTRY_BITS size to 14 (16,384 entries) the entries
list is limited to 448 kilobytes.

The typical values for RISC OS would set ADDRESS_BITS to 18. This
spreads the entries over 262,144 hash values, which uses 512 kilobytes
for the index. Limiting the hash space like this reduces the
effectiveness of the cache.

A small ADDRESS_LENGTH causes a collision (two URLs with the same
address) to happen, by the usual birthday estimate, roughly once for
every 2 ^ (ADDRESS_BITS / 2) = 2 ^ 9 = 512 objects stored. This roughly
translates to a cache miss due to collision every ten pages navigated
to.

Larger systems
++++++++++++++

In general ENTRY_BITS is set to 16, as this limits the store to 65,536
objects, which given an average object size of 8 kilobytes yields half
a gigabyte of disc used, which is judged to be sufficient.

For larger systems, e.g. those using the GTK frontend, we would most
likely select ADDRESS_BITS as 22, resulting in a collision every 2,048
objects but the index using some 8 megabytes.

Typical values
--------------

Example 1
~~~~~~~~~

For a store with 1,034 objects generated from a random navigation of
pages linked from the about:welcome page:

Metadata total size is 593,608 bytes, an average of 574 bytes per
object. The majority of the storage is used to hold the URLs and
headers.

Data total size is 9,180,475 bytes, a mean of 8,879 bytes, with
1,648,726 bytes in the largest 10 entries; excluding those gives an
average size of 7,355 bytes.

Example 2
~~~~~~~~~

355 pages navigated in 80 minutes from the about:welcome page and a
handful of additional sites (google image search and reddit):

2,018 objects in cache at quit; 400 objects from news.bbc.co.uk alone.

Metadata total 987,439 bytes, a mean of 489 bytes.

Data total 33,127,831 bytes, a mean of 16,416 bytes,
with one single 5,000,811 byte gif.

Data totals without the gif are 28,127,020 bytes, mean 13,945 bytes.

[1] http://tools.ietf.org/html/rfc2616#section-13
@@ -86,6 +86,11 @@ NETSURF_HOMEPAGE := "about:welcome"
 # Valid options: YES, NO
 NETSURF_USE_LIBICONV_PLUG := YES
 
+# Enable building the source object cache filesystem based backing store.
+# implementation.
+# Valid options: YES, NO
+NETSURF_FS_BACKING_STORE := NO
+
 # Initial CFLAGS. Optimisation level etc. tend to be target specific.
 CFLAGS :=
 
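Assuming the usual NetSurf convention of overriding Makefile.defaults
from a local Makefile.config, the store could then be enabled with, for
example:

  # in Makefile.config (read after Makefile.defaults)
  NETSURF_FS_BACKING_STORE := YES

The content Makefile change below then compiles fs_backing_store.c
only when this option is set to YES.
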
@@ -3,4 +3,9 @@
 S_CONTENT := content.c content_factory.c dirlist.c fetch.c hlcache.c \
 	llcache.c mimesniff.c urldb.c no_backing_store.c
 
+# Make filesystem backing store available
+ifeq ($(NETSURF_FS_BACKING_STORE),YES)
+S_CONTENT += fs_backing_store.c
+endif
+
 S_CONTENT := $(addprefix content/,$(S_CONTENT))
content/fs_backing_store.c (new file, 1197 lines)
File diff suppressed because it is too large.