
Until now, our Serializable mode has in fact been what's called Snapshot Isolation, which allows some anomalies that could not occur in any serialized ordering of the transactions. This patch fixes that using a method called Serializable Snapshot Isolation, based on research papers by Michael J. Cahill (see README-SSI for full references). In Serializable Snapshot Isolation, transactions run as they do in Snapshot Isolation, but a predicate lock manager observes the reads and writes performed and aborts transactions if it detects that an anomaly might occur. This method produces some false positives, i.e. it sometimes aborts transactions even though there is no anomaly.

To track reads, we implement predicate locking; see storage/lmgr/predicate.c. Whenever a tuple is read, a predicate lock is acquired on the tuple. Shared memory is finite, so when a transaction takes many tuple-level locks on a page, the locks are promoted to a single page-level lock, and further to a single relation-level lock if necessary. To lock key values with no matching tuple, a sequential scan always takes a relation-level lock, and an index scan acquires a page-level lock that covers the search key, whether or not there are any matching keys at the moment.

A predicate lock doesn't conflict with any regular lock, or with other predicate locks, in the normal sense. Predicate locks are used only by the predicate lock manager to detect the danger of anomalies. Only serializable transactions participate in predicate locking, so there should be no extra overhead for other transactions.

Predicate locks can't be released at commit; they must be remembered until all the transactions that overlapped with the lock-holding transaction have completed. That means we need to remember an unbounded number of predicate locks, so we apply a lossy but conservative method of tracking locks for committed transactions. If we run short of shared memory, we overflow to a new "pg_serial" SLRU pool.
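The tuple-to-page-to-relation promotion described above can be pictured with a small C sketch. It is not taken from storage/lmgr/predicate.c: the PredLocks struct, lock_tuple() and covers() are hypothetical names, the promotion thresholds are made-up constants, and the sketch tracks only one transaction's locks on one relation. It does show why promotion is conservative: once tuple locks are folded into a page or relation lock, writes to tuples that were never read are treated as conflicts too, which is one source of the false positives mentioned above.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical thresholds for illustration; not PostgreSQL's real limits. */
#define TUPLE_LOCKS_BEFORE_PAGE 3  /* more than this many on one page -> page lock */
#define PAGE_LOCKS_BEFORE_REL   2  /* more than this many pages -> relation lock */
#define NPAGES          8
#define TUPLES_PER_PAGE 64

/* Predicate locks held by one transaction on one relation (hypothetical). */
typedef struct {
    int      relation_lock;        /* 1 if the whole relation is locked */
    int      page_lock[NPAGES];    /* 1 if a page-level lock is held */
    uint64_t tuple_lock[NPAGES];   /* bitmap of tuple-level locks per page */
    int      npage_locks;          /* number of page-level locks held */
} PredLocks;

static int count_bits(uint64_t x)
{
    int n = 0;
    while (x) {
        x &= x - 1;                /* clear the lowest set bit */
        n++;
    }
    return n;
}

/* Record that the tuple at (page, offset) was read, promoting if needed. */
static void lock_tuple(PredLocks *pl, int page, int offset)
{
    assert(page >= 0 && page < NPAGES);
    assert(offset >= 0 && offset < TUPLES_PER_PAGE);

    if (pl->relation_lock || pl->page_lock[page])
        return;                    /* already covered by a coarser lock */

    pl->tuple_lock[page] |= (uint64_t)1 << offset;

    if (count_bits(pl->tuple_lock[page]) > TUPLE_LOCKS_BEFORE_PAGE) {
        pl->tuple_lock[page] = 0;  /* fold the tuple locks ... */
        pl->page_lock[page] = 1;   /* ... into one page-level lock */
        pl->npage_locks++;
        if (pl->npage_locks > PAGE_LOCKS_BEFORE_REL) {
            memset(pl, 0, sizeof *pl);  /* drop all finer-grained locks ... */
            pl->relation_lock = 1;      /* ... in favor of one relation lock */
        }
    }
}

/* Would a write to (page, offset) conflict with a lock held here? */
static int covers(const PredLocks *pl, int page, int offset)
{
    assert(page >= 0 && page < NPAGES);
    assert(offset >= 0 && offset < TUPLES_PER_PAGE);
    return pl->relation_lock || pl->page_lock[page] ||
           (int)((pl->tuple_lock[page] >> offset) & 1);
}
```

For example, calling lock_tuple() for tuples 0 through 3 of page 0 replaces the four tuple locks with one page lock on the fourth call, after which covers() reports a conflict for tuple 40 on that page even though it was never read.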

We don't currently allow Serializable transactions in Hot Standby mode. That would be hard, because even read-only transactions can cause anomalies that wouldn't otherwise occur.

Serializable isolation mode now means the new fully serializable level. Repeatable Read gives you the old Snapshot Isolation level that we have always had.

Kevin Grittner and Dan Ports, reviewed by Jeff Davis, Heikki Linnakangas and Anssi Kääriäinen

src/test/isolation/README

Isolation tests
===============

This directory contains a set of tests for the serializable isolation level.
Testing isolation requires running multiple overlapping transactions, which
requires multiple concurrent connections, and therefore can't be tested
using the normal pg_regress program.

To represent a test with overlapping transactions, we use a test
specification file with a custom syntax, described in the next section.

isolationtester is a program that uses libpq to open multiple connections
and execute a test specified by a spec file. It takes a libpq connection
string that specifies the server and database to connect to; if none is
given, defaults derived from environment variables are used.

pg_isolation_regress is a tool identical to pg_regress, but instead of using
psql to execute a test, it uses isolationtester.

To run the tests, you need to have a server up and running. Then run

    gmake installcheck

Test specification
==================

Each isolation test is defined by a specification file, stored in the specs
subdirectory. A test specification consists of four parts, in this order:

setup { <SQL> }

  The given SQL block is executed once, in one session only, before running
  the test. Create any test tables or such objects here. This part is
  optional.

teardown { <SQL> }

  The teardown SQL block is executed once after the test is finished. Use
  this to clean up, e.g. dropping any test tables. This part is optional.

session "<name>"

  Each session is executed in a separate connection. A session consists
  of three parts: setup, teardown and one or more steps. The per-session
  setup and teardown parts have the same syntax as the per-test setup and
  teardown described above, but they are executed in every session, before
  and after each permutation. The setup part typically contains a "BEGIN"
  command to begin a transaction.

  Each step has the syntax

  step "<name>" { <SQL> }

  where <name> is a unique name identifying this step, and <SQL> is an SQL
  statement (or several statements, separated by semicolons) that is
  executed in the step.

permutation "<step name>" ...

  A permutation line specifies a list of steps that are run in that order.
  If no permutation lines are given, the test program automatically
  generates all possible interleavings of the sessions' steps, keeping the
  steps of each session in their original order.

Lines beginning with a # are considered comments.
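To make concrete the order in which the setup, step and teardown blocks of a spec run, here is a C sketch that only records the sequence implied by the description above; it does not connect to a server or execute any SQL. The Session, Permutation, Spec and Plan types and plan_test() are hypothetical and are not isolationtester's actual data structures. Following the README text, the per-test setup and teardown run once, while every session's setup and teardown run around each permutation.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_STEPS 16   /* hypothetical limits for the sketch */
#define MAX_LINES 64
#define LINE_LEN  64

typedef struct {
    const char *name;
    const char *setup;      /* per-session setup SQL, or NULL */
    const char *teardown;   /* per-session teardown SQL, or NULL */
} Session;

typedef struct {
    int         nsteps;
    const char *steps[MAX_STEPS];   /* step names, run in this order */
} Permutation;

typedef struct {
    const char        *setup;       /* per-test setup SQL, or NULL */
    const char        *teardown;    /* per-test teardown SQL, or NULL */
    const Session     *sessions;
    int                nsessions;
    const Permutation *permutations;
    int                npermutations;
} Spec;

/* The recorded execution order, one "<action> <who>" entry per line. */
typedef struct {
    int  n;
    char line[MAX_LINES][LINE_LEN];
} Plan;

static void emit(Plan *p, const char *action, const char *who)
{
    assert(p->n < MAX_LINES);
    snprintf(p->line[p->n++], LINE_LEN, "%s %s", action, who);
}

static void plan_test(const Spec *s, Plan *p)
{
    memset(p, 0, sizeof *p);
    if (s->setup)
        emit(p, "setup", "(test)");             /* once, before everything */
    for (int i = 0; i < s->npermutations; i++) {
        const Permutation *perm = &s->permutations[i];
        for (int j = 0; j < s->nsessions; j++)  /* every session's setup */
            if (s->sessions[j].setup)
                emit(p, "setup", s->sessions[j].name);
        for (int k = 0; k < perm->nsteps; k++)  /* steps in permutation order */
            emit(p, "step", perm->steps[k]);
        for (int j = 0; j < s->nsessions; j++)  /* every session's teardown */
            if (s->sessions[j].teardown)
                emit(p, "teardown", s->sessions[j].name);
    }
    if (s->teardown)
        emit(p, "teardown", "(test)");          /* once, after everything */
}
```

Each Session value stands for one session "<name>" block and each Permutation for one permutation line; with two sessions and two permutations, the session setups and teardowns appear twice in the plan but the per-test ones only once.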
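When no permutation lines are given, the orderings to test are the interleavings of all sessions' steps in which each session's own steps stay in order. The following C sketch enumerates them by backtracking. all_permutations(), the Interleaver struct and the visit callback are hypothetical names, not isolationtester's code, and sessions and steps are identified by index only.

```c
#include <assert.h>
#include <stdio.h>

#define MAX_SESSIONS    4   /* hypothetical limits for the sketch */
#define MAX_TOTAL_STEPS 16

/* Called once per complete ordering; entry i runs step step[i] of session sess[i]. */
typedef void (*visit_fn)(const int *sess, const int *step, int n, void *arg);

typedef struct {
    int        nsessions;
    const int *nsteps;                 /* number of steps in each session */
    int        next[MAX_SESSIONS];     /* next step each session would run */
    int        sess[MAX_TOTAL_STEPS];  /* ordering built so far */
    int        step[MAX_TOTAL_STEPS];
    int        len, total;
    long       count;                  /* complete orderings found */
    visit_fn   visit;                  /* may be NULL to just count */
    void      *arg;
} Interleaver;

static void interleave(Interleaver *it)
{
    if (it->len == it->total) {
        it->count++;
        if (it->visit)
            it->visit(it->sess, it->step, it->len, it->arg);
        return;
    }
    for (int s = 0; s < it->nsessions; s++) {
        if (it->next[s] == it->nsteps[s])
            continue;                  /* session s has no steps left */
        it->sess[it->len] = s;
        it->step[it->len] = it->next[s]++;
        it->len++;
        interleave(it);
        it->len--;                     /* backtrack */
        it->next[s]--;
    }
}

/* Enumerate every interleaving; returns how many there are. */
static long all_permutations(int nsessions, const int *nsteps,
                             visit_fn visit, void *arg)
{
    Interleaver it = {0};
    assert(nsessions > 0 && nsessions <= MAX_SESSIONS);
    it.nsessions = nsessions;
    it.nsteps = nsteps;
    for (int s = 0; s < nsessions; s++)
        it.total += nsteps[s];
    assert(it.total <= MAX_TOTAL_STEPS);
    it.visit = visit;
    it.arg = arg;
    interleave(&it);
    return it.count;
}

/* Example visitor: prints one ordering as "s<session>.<step>" tokens. */
static void print_ordering(const int *sess, const int *step, int n, void *arg)
{
    (void)arg;
    for (int i = 0; i < n; i++)
        printf("%ss%d.%d", i ? " " : "", sess[i], step[i]);
    putchar('\n');
}
```

For two sessions with two steps each, all_permutations() finds 6 orderings. In general the count is the multinomial coefficient (n1 + ... + nk)! / (n1! ... nk!), which grows quickly as steps are added.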