Add two text files containing pager design notes to the doc/ subfolder.

FossilOrigin-Name: ed817fc893e7162ae0ff4022591f7e9e3b81d622
This commit is contained in:
drh 2010-05-06 11:55:56 +00:00
parent 9a6b4e9adb
commit 24e3971197
4 changed files with 220 additions and 7 deletions

76
doc/pager-invariants.txt Normal file
View File

@ -0,0 +1,76 @@
*** Throughout this document, a page is deemed to have been synced
automatically as soon as it is written when PRAGMA synchronous=OFF.
Otherwise, the page is not synced until the xSync method of the VFS
is called successfully on the file containing the page.
*** Definition: A page of the database file is said to be "overwriteable" if
one or more of the following are true about the page:
(a) The original content of the page as it was at the beginning of
the transaction has been written into the rollback journal and
synced.
(b) The page was a freelist leaf page at the start of the transaction.
(c) The page number is greater than the largest page that existed in
the database file at the start of the transaction.
(1) A page of the database file is never overwritten unless one of the
following are true:
(a) The page and all other pages on the same sector are overwriteable.
(b) The atomic page write optimization is enabled, and the entire
transaction other than the update of the transaction sequence
number consists of a single page change.
(2) The content of a page written into the rollback journal exactly matches
both the content in the database when the rollback journal was written
and the content in the database at the beginning of the current
transaction.
(3) Writes to the database file are an integer multiple of the page size
in length and are aligned to a page boundary.
(4) Reads from the database file are either aligned on a page boundary and
an integer multiple of the page size in length or are taken from the
first 100 bytes of the database file.
(5) All writes to the database file are synced prior to the rollback journal
being deleted, truncated, or zeroed.
(6) If a master journal file is used, then all writes to the database file
are synced prior to the master journal being deleted.
*** Definition: Two databases (or the same database at two points it time)
are said to be "logically equivalent" if they give the same answer to
all queries. Note in particular the the content of freelist leaf
pages can be changed arbitarily without effecting the logical equivalence
of the database.
(7) At any time, if any subset, including the empty set and the total set,
of the unsynced changes to a rollback journal are removed and the
journal is rolled back, the resulting database file will be logical
equivalent to the database file at the beginning of the transaction.
(8) When a transaction is rolled back, the xTruncate method of the VFS
is called to restore the database file to the same size it was at
the beginning of the transaction. (In some VFSes, the xTruncate
method is a no-op, but that does not change the fact the SQLite will
invoke it.)
(9) Whenever the database file is modified, at least one bit in the range
of bytes from 24 through 39 inclusive will be changed prior to releasing
the EXCLUSIVE lock.
(10) The pattern of bits in bytes 24 through 39 shall not repeat in less
than one billion transactions.
(11) A database file is well-formed at the beginning and at the conclusion
of every transaction.
(12) An EXCLUSIVE lock must be held on the database file before making
any changes to the database file.
(13) A SHARED lock must be held on the database file before reading any
content out of the database file.

125
doc/vfs-shm.txt Normal file
View File

@ -0,0 +1,125 @@
The 5 states of an historical rollback lock as implemented by the
xLock, xUnlock, and xCheckReservedLock methods of the sqlite3_io_methods
objec are:
UNLOCKED
SHARED
RESERVED
PENDING
EXCLUSIVE
The wal-index file has a similar locking hierarchy implemented using
the xShmLock method of the sqlite3_vfs object, but with 7
states. Each connection to a wal-index file must be in one of
the following 7 states:
UNLOCKED
READ
READ_FULL
WRITE
PENDING
CHECKPOINT
RECOVER
These roughly correspond to the 5 states of a rollback lock except
that SHARED is split out into 2 states: READ and READ_FULL and
there is an extra RECOVER state used for wal-index reconstruction.
The meanings of the various wal-index locking states is as follows:
UNLOCKED - The wal-index is not in use.
READ - Some prefix of the wal-index is being read. Additional
wal-index information can be appended at any time. The
newly appended content will be ignored by the holder of
the READ lock.
READ_FULL - The entire wal-index is being read. No new information
can be added to the wal-index. The holder of a READ_FULL
lock promises never to read pages from the database file
that are available anywhere in the wal-index.
WRITE - It is OK to append to the wal-index file and to adjust
the header to indicate the new "last valid frame".
PENDING - Waiting on all READ locks to clear so that a
CHECKPOINT lock can be acquired.
CHECKPOINT - It is OK to write any WAL data into the database file
and zero the last valid frame field of the wal-index
header. The wal-index file itself may not be changed
other than to zero the last valid frame field in the
header.
RECOVER - Held during wal-index recovery. Used to prevent a
race if multiple clients try to recover a wal-index at
the same time.
A particular lock manager implementation may coalesce one or more of
the wal-index locking states, though with a reduction in concurrency.
For example, an implemention might implement only exclusive locking,
in which case all states would be equivalent to CHECKPOINT, meaning that
only one reader or one writer or one checkpointer could be active at a
time. Or, an implementation might combine READ and READ_FULL into
a single state equivalent to READ, meaning that a writer could
coexist with a reader, but no reader or writers could coexist with a
checkpointer.
The lock manager must obey the following rules:
(1) A READ cannot coexist with CHECKPOINT.
(2) A READ_FULL cannot coexist with WRITE.
(3) None of WRITE, PENDING, CHECKPOINT, or RECOVER can coexist.
The SQLite core will obey the next set of rules. These rules are
assertions on the behavior of the SQLite core which might be verified
during testing using an instrumented lock manager.
(5) No part of the wal-index will be read without holding either some
kind of SHM lock or an EXCLUSIVE lock on the original database.
(6) A holder of a READ_FULL will never read any page of the database
file that is contained anywhere in the wal-index.
(7) No part of the wal-index other than the header will be written nor
will the size of the wal-index grow without holding a WRITE.
(8) The wal-index header will not be written without holding one of
WRITE, CHECKPOINT, or RECOVER.
(9) A CHECKPOINT or RECOVER must be held in order to reset the last valid
frame counter in the header of the wal-index back to zero.
(10) A WRITE can only increase the last valid frame pointer in the header.
The SQLite core will only ever send requests for UNLOCK, READ, WRITE,
CHECKPOINT, or RECOVER to the lock manager. The SQLite core will never
request a READ_FULL or PENDING lock though the lock manager may deliver
those locking states in response to READ and CHECKPOINT requests,
respectively, if and only if the requested READ or CHECKPOINT cannot
be delivered.
The following are the allowed lock transitions:
Original-State Request New-State
-------------- ---------- ----------
(11a) UNLOCK READ READ
(11b) UNLOCK READ READ_FULL
(11c) UNLOCK CHECKPOINT PENDING
(11d) UNLOCK CHECKPOINT CHECKPOINT
(11e) READ UNLOCK UNLOCK
(11f) READ WRITE WRITE
(11g) READ RECOVER RECOVER
(11h) READ_FULL UNLOCK UNLOCK
(11i) READ_FULL WRITE WRITE
(11j) READ_FULL RECOVER RECOVER
(11k) WRITE READ READ
(11l) PENDING UNLOCK UNLOCK
(11m) PENDING CHECKPOINT CHECKPOINT
(11n) CHECKPOINT UNLOCK UNLOCK
(11o) CHECKPOINT RECOVER RECOVER
(11p) RECOVER READ READ
(11q) RECOVER CHECKPOINT CHECKPOINT
These 17 transitions are all that needs to be supported. The lock
manager implementation can assert that fact. The other 25 possible
transitions among the 7 locking states will never occur.
The rules above are sufficient for correctness. For maximum concurrency,
the following additional considerations apply:

View File

@ -1,5 +1,8 @@
C Add\stest\scases\sto\stest\sthe\slibraries\shandling\sof\scorrupt\swal-index\sheaders.
D 2010-05-06T11:32:09
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
C Add\stwo\stext\sfiles\scontaining\spager\sdesign\snotes\sto\sthe\sdoc/\ssubfolder.
D 2010-05-06T11:55:57
F Makefile.arm-wince-mingw32ce-gcc fcd5e9cd67fe88836360bb4f9ef4cb7f8e2fb5a0
F Makefile.in d83a0ffef3dcbfb08b410a6c6dd6c009ec9167fb
F Makefile.linux-gcc d53183f4aa6a9192d249731c90dbdffbd2c68654
@ -23,6 +26,8 @@ F configure 72c0ad7c8cfabbffeaf8ca61e1d24143cf857eb2 x
F configure.ac 14740970ddb674d92a9f5da89083dff1179014ff
F contrib/sqlitecon.tcl 210a913ad63f9f991070821e599d600bd913e0ad
F doc/lemon.html f0f682f50210928c07e562621c3b7e8ab912a538
F doc/pager-invariants.txt 870107036470d7c419e93768676fae2f8749cf9e
F doc/vfs-shm.txt db230538d9d2170d838b7a79493bbc24a5c39788
F ext/README.txt 913a7bd3f4837ab14d7e063304181787658b14e1
F ext/async/README.txt 0c541f418b14b415212264cbaaf51c924ec62e5b
F ext/async/sqlite3async.c 676066c2a111a8b3107aeb59bdbbbf335c348f4a
@ -811,7 +816,14 @@ F tool/speedtest2.tcl ee2149167303ba8e95af97873c575c3e0fab58ff
F tool/speedtest8.c 2902c46588c40b55661e471d7a86e4dd71a18224
F tool/speedtest8inst1.c 293327bc76823f473684d589a8160bde1f52c14e
F tool/vdbe-compress.tcl d70ea6d8a19e3571d7ab8c9b75cba86d1173ff0f
P fbbcacb137e8f5246b88ad09331236aaa1900f60
R c23cc19ecc9b2f6c130b1979d4249711
U dan
Z 7712493005da602cd936a05bd6144f71
P 9465b267d420120c050bbe4f143ac824146a9e4a
R 089c4fb1e4c5e2af451ed447b05621f8
U drh
Z 1e94a2049aa9a831f97a2f9882e08469
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)
iD8DBQFL4q5RoxKgR168RlERAgnEAJ9AxQEr7Uk8mFQoqD+OX/obdL89jACfTG+r
XdjbjjPNKUjZXT97fSSLXx4=
=ENzL
-----END PGP SIGNATURE-----

View File

@ -1 +1 @@
9465b267d420120c050bbe4f143ac824146a9e4a
ed817fc893e7162ae0ff4022591f7e9e3b81d622