Add two text files containing pager design notes to the doc/ subfolder.
FossilOrigin-Name: ed817fc893e7162ae0ff4022591f7e9e3b81d622
This commit is contained in:
parent
9a6b4e9adb
commit
24e3971197
76
doc/pager-invariants.txt
Normal file
76
doc/pager-invariants.txt
Normal file
@ -0,0 +1,76 @@
|
||||
*** Throughout this document, a page is deemed to have been synced
|
||||
automatically as soon as it is written when PRAGMA synchronous=OFF.
|
||||
Otherwise, the page is not synced until the xSync method of the VFS
|
||||
is called successfully on the file containing the page.
|
||||
|
||||
*** Definition: A page of the database file is said to be "overwriteable" if
|
||||
one or more of the following are true about the page:
|
||||
|
||||
(a) The original content of the page as it was at the beginning of
|
||||
the transaction has been written into the rollback journal and
|
||||
synced.
|
||||
|
||||
(b) The page was a freelist leaf page at the start of the transaction.
|
||||
|
||||
(c) The page number is greater than the largest page that existed in
|
||||
the database file at the start of the transaction.
|
||||
|
||||
(1) A page of the database file is never overwritten unless one of the
|
||||
following are true:
|
||||
|
||||
(a) The page and all other pages on the same sector are overwriteable.
|
||||
|
||||
(b) The atomic page write optimization is enabled, and the entire
|
||||
transaction other than the update of the transaction sequence
|
||||
number consists of a single page change.
|
||||
|
||||
(2) The content of a page written into the rollback journal exactly matches
|
||||
both the content in the database when the rollback journal was written
|
||||
and the content in the database at the beginning of the current
|
||||
transaction.
|
||||
|
||||
(3) Writes to the database file are an integer multiple of the page size
|
||||
in length and are aligned to a page boundary.
|
||||
|
||||
(4) Reads from the database file are either aligned on a page boundary and
|
||||
an integer multiple of the page size in length or are taken from the
|
||||
first 100 bytes of the database file.
|
||||
|
||||
(5) All writes to the database file are synced prior to the rollback journal
|
||||
being deleted, truncated, or zeroed.
|
||||
|
||||
(6) If a master journal file is used, then all writes to the database file
|
||||
are synced prior to the master journal being deleted.
|
||||
|
||||
*** Definition: Two databases (or the same database at two points it time)
|
||||
are said to be "logically equivalent" if they give the same answer to
|
||||
all queries. Note in particular the the content of freelist leaf
|
||||
pages can be changed arbitarily without effecting the logical equivalence
|
||||
of the database.
|
||||
|
||||
(7) At any time, if any subset, including the empty set and the total set,
|
||||
of the unsynced changes to a rollback journal are removed and the
|
||||
journal is rolled back, the resulting database file will be logical
|
||||
equivalent to the database file at the beginning of the transaction.
|
||||
|
||||
(8) When a transaction is rolled back, the xTruncate method of the VFS
|
||||
is called to restore the database file to the same size it was at
|
||||
the beginning of the transaction. (In some VFSes, the xTruncate
|
||||
method is a no-op, but that does not change the fact the SQLite will
|
||||
invoke it.)
|
||||
|
||||
(9) Whenever the database file is modified, at least one bit in the range
|
||||
of bytes from 24 through 39 inclusive will be changed prior to releasing
|
||||
the EXCLUSIVE lock.
|
||||
|
||||
(10) The pattern of bits in bytes 24 through 39 shall not repeat in less
|
||||
than one billion transactions.
|
||||
|
||||
(11) A database file is well-formed at the beginning and at the conclusion
|
||||
of every transaction.
|
||||
|
||||
(12) An EXCLUSIVE lock must be held on the database file before making
|
||||
any changes to the database file.
|
||||
|
||||
(13) A SHARED lock must be held on the database file before reading any
|
||||
content out of the database file.
|
125
doc/vfs-shm.txt
Normal file
125
doc/vfs-shm.txt
Normal file
@ -0,0 +1,125 @@
|
||||
The 5 states of an historical rollback lock as implemented by the
|
||||
xLock, xUnlock, and xCheckReservedLock methods of the sqlite3_io_methods
|
||||
objec are:
|
||||
|
||||
UNLOCKED
|
||||
SHARED
|
||||
RESERVED
|
||||
PENDING
|
||||
EXCLUSIVE
|
||||
|
||||
The wal-index file has a similar locking hierarchy implemented using
|
||||
the xShmLock method of the sqlite3_vfs object, but with 7
|
||||
states. Each connection to a wal-index file must be in one of
|
||||
the following 7 states:
|
||||
|
||||
UNLOCKED
|
||||
READ
|
||||
READ_FULL
|
||||
WRITE
|
||||
PENDING
|
||||
CHECKPOINT
|
||||
RECOVER
|
||||
|
||||
These roughly correspond to the 5 states of a rollback lock except
|
||||
that SHARED is split out into 2 states: READ and READ_FULL and
|
||||
there is an extra RECOVER state used for wal-index reconstruction.
|
||||
|
||||
The meanings of the various wal-index locking states is as follows:
|
||||
|
||||
UNLOCKED - The wal-index is not in use.
|
||||
|
||||
READ - Some prefix of the wal-index is being read. Additional
|
||||
wal-index information can be appended at any time. The
|
||||
newly appended content will be ignored by the holder of
|
||||
the READ lock.
|
||||
|
||||
READ_FULL - The entire wal-index is being read. No new information
|
||||
can be added to the wal-index. The holder of a READ_FULL
|
||||
lock promises never to read pages from the database file
|
||||
that are available anywhere in the wal-index.
|
||||
|
||||
WRITE - It is OK to append to the wal-index file and to adjust
|
||||
the header to indicate the new "last valid frame".
|
||||
|
||||
PENDING - Waiting on all READ locks to clear so that a
|
||||
CHECKPOINT lock can be acquired.
|
||||
|
||||
CHECKPOINT - It is OK to write any WAL data into the database file
|
||||
and zero the last valid frame field of the wal-index
|
||||
header. The wal-index file itself may not be changed
|
||||
other than to zero the last valid frame field in the
|
||||
header.
|
||||
|
||||
RECOVER - Held during wal-index recovery. Used to prevent a
|
||||
race if multiple clients try to recover a wal-index at
|
||||
the same time.
|
||||
|
||||
|
||||
A particular lock manager implementation may coalesce one or more of
|
||||
the wal-index locking states, though with a reduction in concurrency.
|
||||
For example, an implemention might implement only exclusive locking,
|
||||
in which case all states would be equivalent to CHECKPOINT, meaning that
|
||||
only one reader or one writer or one checkpointer could be active at a
|
||||
time. Or, an implementation might combine READ and READ_FULL into
|
||||
a single state equivalent to READ, meaning that a writer could
|
||||
coexist with a reader, but no reader or writers could coexist with a
|
||||
checkpointer.
|
||||
|
||||
The lock manager must obey the following rules:
|
||||
|
||||
(1) A READ cannot coexist with CHECKPOINT.
|
||||
(2) A READ_FULL cannot coexist with WRITE.
|
||||
(3) None of WRITE, PENDING, CHECKPOINT, or RECOVER can coexist.
|
||||
|
||||
The SQLite core will obey the next set of rules. These rules are
|
||||
assertions on the behavior of the SQLite core which might be verified
|
||||
during testing using an instrumented lock manager.
|
||||
|
||||
(5) No part of the wal-index will be read without holding either some
|
||||
kind of SHM lock or an EXCLUSIVE lock on the original database.
|
||||
(6) A holder of a READ_FULL will never read any page of the database
|
||||
file that is contained anywhere in the wal-index.
|
||||
(7) No part of the wal-index other than the header will be written nor
|
||||
will the size of the wal-index grow without holding a WRITE.
|
||||
(8) The wal-index header will not be written without holding one of
|
||||
WRITE, CHECKPOINT, or RECOVER.
|
||||
(9) A CHECKPOINT or RECOVER must be held in order to reset the last valid
|
||||
frame counter in the header of the wal-index back to zero.
|
||||
(10) A WRITE can only increase the last valid frame pointer in the header.
|
||||
|
||||
The SQLite core will only ever send requests for UNLOCK, READ, WRITE,
|
||||
CHECKPOINT, or RECOVER to the lock manager. The SQLite core will never
|
||||
request a READ_FULL or PENDING lock though the lock manager may deliver
|
||||
those locking states in response to READ and CHECKPOINT requests,
|
||||
respectively, if and only if the requested READ or CHECKPOINT cannot
|
||||
be delivered.
|
||||
|
||||
The following are the allowed lock transitions:
|
||||
|
||||
Original-State Request New-State
|
||||
-------------- ---------- ----------
|
||||
(11a) UNLOCK READ READ
|
||||
(11b) UNLOCK READ READ_FULL
|
||||
(11c) UNLOCK CHECKPOINT PENDING
|
||||
(11d) UNLOCK CHECKPOINT CHECKPOINT
|
||||
(11e) READ UNLOCK UNLOCK
|
||||
(11f) READ WRITE WRITE
|
||||
(11g) READ RECOVER RECOVER
|
||||
(11h) READ_FULL UNLOCK UNLOCK
|
||||
(11i) READ_FULL WRITE WRITE
|
||||
(11j) READ_FULL RECOVER RECOVER
|
||||
(11k) WRITE READ READ
|
||||
(11l) PENDING UNLOCK UNLOCK
|
||||
(11m) PENDING CHECKPOINT CHECKPOINT
|
||||
(11n) CHECKPOINT UNLOCK UNLOCK
|
||||
(11o) CHECKPOINT RECOVER RECOVER
|
||||
(11p) RECOVER READ READ
|
||||
(11q) RECOVER CHECKPOINT CHECKPOINT
|
||||
|
||||
These 17 transitions are all that needs to be supported. The lock
|
||||
manager implementation can assert that fact. The other 25 possible
|
||||
transitions among the 7 locking states will never occur.
|
||||
|
||||
The rules above are sufficient for correctness. For maximum concurrency,
|
||||
the following additional considerations apply:
|
24
manifest
24
manifest
@ -1,5 +1,8 @@
|
||||
C Add\stest\scases\sto\stest\sthe\slibraries\shandling\sof\scorrupt\swal-index\sheaders.
|
||||
D 2010-05-06T11:32:09
|
||||
-----BEGIN PGP SIGNED MESSAGE-----
|
||||
Hash: SHA1
|
||||
|
||||
C Add\stwo\stext\sfiles\scontaining\spager\sdesign\snotes\sto\sthe\sdoc/\ssubfolder.
|
||||
D 2010-05-06T11:55:57
|
||||
F Makefile.arm-wince-mingw32ce-gcc fcd5e9cd67fe88836360bb4f9ef4cb7f8e2fb5a0
|
||||
F Makefile.in d83a0ffef3dcbfb08b410a6c6dd6c009ec9167fb
|
||||
F Makefile.linux-gcc d53183f4aa6a9192d249731c90dbdffbd2c68654
|
||||
@ -23,6 +26,8 @@ F configure 72c0ad7c8cfabbffeaf8ca61e1d24143cf857eb2 x
|
||||
F configure.ac 14740970ddb674d92a9f5da89083dff1179014ff
|
||||
F contrib/sqlitecon.tcl 210a913ad63f9f991070821e599d600bd913e0ad
|
||||
F doc/lemon.html f0f682f50210928c07e562621c3b7e8ab912a538
|
||||
F doc/pager-invariants.txt 870107036470d7c419e93768676fae2f8749cf9e
|
||||
F doc/vfs-shm.txt db230538d9d2170d838b7a79493bbc24a5c39788
|
||||
F ext/README.txt 913a7bd3f4837ab14d7e063304181787658b14e1
|
||||
F ext/async/README.txt 0c541f418b14b415212264cbaaf51c924ec62e5b
|
||||
F ext/async/sqlite3async.c 676066c2a111a8b3107aeb59bdbbbf335c348f4a
|
||||
@ -811,7 +816,14 @@ F tool/speedtest2.tcl ee2149167303ba8e95af97873c575c3e0fab58ff
|
||||
F tool/speedtest8.c 2902c46588c40b55661e471d7a86e4dd71a18224
|
||||
F tool/speedtest8inst1.c 293327bc76823f473684d589a8160bde1f52c14e
|
||||
F tool/vdbe-compress.tcl d70ea6d8a19e3571d7ab8c9b75cba86d1173ff0f
|
||||
P fbbcacb137e8f5246b88ad09331236aaa1900f60
|
||||
R c23cc19ecc9b2f6c130b1979d4249711
|
||||
U dan
|
||||
Z 7712493005da602cd936a05bd6144f71
|
||||
P 9465b267d420120c050bbe4f143ac824146a9e4a
|
||||
R 089c4fb1e4c5e2af451ed447b05621f8
|
||||
U drh
|
||||
Z 1e94a2049aa9a831f97a2f9882e08469
|
||||
-----BEGIN PGP SIGNATURE-----
|
||||
Version: GnuPG v1.4.6 (GNU/Linux)
|
||||
|
||||
iD8DBQFL4q5RoxKgR168RlERAgnEAJ9AxQEr7Uk8mFQoqD+OX/obdL89jACfTG+r
|
||||
XdjbjjPNKUjZXT97fSSLXx4=
|
||||
=ENzL
|
||||
-----END PGP SIGNATURE-----
|
||||
|
@ -1 +1 @@
|
||||
9465b267d420120c050bbe4f143ac824146a9e4a
|
||||
ed817fc893e7162ae0ff4022591f7e9e3b81d622
|
Loading…
x
Reference in New Issue
Block a user