Fix "failed to re-find parent key" btree VACUUM failure by revising page
deletion code to avoid the case where an upper-level btree page remains
"half dead" for a significant period of time, and to block insertions into
a key range that is in the process of being re-assigned to the right sibling
of the deleted page's parent. This prevents the scenario reported by Ed L.
wherein index keys could become out-of-order in the grandparent index level.

Since this is a moderately invasive fix, I'm applying it only to HEAD.
The bug exists back to 7.4, but the back branches will get a different patch.
parent 19d0c46def
commit 70ce5c9082
@ -1,4 +1,4 @@
|
|||||||
$PostgreSQL: pgsql/src/backend/access/nbtree/README,v 1.13 2006/07/25 19:13:00 tgl Exp $
|
$PostgreSQL: pgsql/src/backend/access/nbtree/README,v 1.14 2006/11/01 19:43:17 tgl Exp $
|
||||||
|
|
||||||
This directory contains a correct implementation of Lehman and Yao's
|
This directory contains a correct implementation of Lehman and Yao's
|
||||||
high-concurrency B-tree management algorithm (P. Lehman and S. Yao,
|
high-concurrency B-tree management algorithm (P. Lehman and S. Yao,
|
||||||
@ -201,26 +201,25 @@ When we delete the last remaining child of a parent page, we mark the
|
|||||||
parent page "half-dead" as part of the atomic update that deletes the
|
parent page "half-dead" as part of the atomic update that deletes the
|
||||||
child page. This implicitly transfers the parent's key space to its right
|
child page. This implicitly transfers the parent's key space to its right
|
||||||
sibling (which it must have, since we never delete the overall-rightmost
|
sibling (which it must have, since we never delete the overall-rightmost
|
||||||
page of a level). No future insertions into the parent level are allowed
|
page of a level). Searches ignore the half-dead page and immediately move
|
||||||
to insert keys into the half-dead page --- they must move right to its
|
right. We need not worry about insertions into a half-dead page --- insertions
|
||||||
sibling, instead. The parent remains empty and can be deleted in a
|
into upper tree levels happen only as a result of splits of child pages, and
|
||||||
separate atomic action. (However, if it's the rightmost child of its own
|
the half-dead page no longer has any children that could split. Therefore
|
||||||
parent, it might have to stay half-dead for awhile, until it's also the
|
the page stays empty even when we don't have lock on it, and we can complete
|
||||||
only child.)
|
its deletion in a second atomic action.
|
||||||
|
|
||||||
Note that an empty leaf page is a valid tree state, but an empty interior
|
|
||||||
page is not legal (an interior page must have children to delegate its
|
|
||||||
key space to). So an interior page *must* be marked half-dead as soon
|
|
||||||
as its last child is deleted.
|
|
||||||
|
|
||||||
The notion of a half-dead page means that the key space relationship between
|
The notion of a half-dead page means that the key space relationship between
|
||||||
the half-dead page's level and its parent's level may be a little out of
|
the half-dead page's level and its parent's level may be a little out of
|
||||||
whack: key space that appears to belong to the half-dead page's parent on the
|
whack: key space that appears to belong to the half-dead page's parent on the
|
||||||
parent level may really belong to its right sibling. We can tolerate this,
|
parent level may really belong to its right sibling. To prevent any possible
|
||||||
however, because insertions and deletions on upper tree levels are always
|
problems, we hold lock on the deleted child page until we have finished
|
||||||
done by reference to child page numbers, not keys. The only cost is that
|
deleting any now-half-dead parent page(s). This prevents any insertions into
|
||||||
searches may sometimes descend to the half-dead page and then have to move
|
the transferred keyspace until the operation is complete. The reason for
|
||||||
right, rather than going directly to the sibling page.
|
doing this is that a sufficiently large number of insertions into the
|
||||||
|
transferred keyspace, resulting in multiple page splits, could propagate keys
|
||||||
|
from that keyspace into the parent level, resulting in transiently
|
||||||
|
out-of-order keys in that level. It is thought that that wouldn't cause any
|
||||||
|
serious problem, but it seems too risky to allow.
|
||||||
|
|
||||||
A deleted page cannot be reclaimed immediately, since there may be other
|
A deleted page cannot be reclaimed immediately, since there may be other
|
||||||
processes waiting to reference it (ie, search processes that just left the
|
processes waiting to reference it (ie, search processes that just left the
|
||||||
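Illustrative note (not part of the patch): the README text in this hunk says that searches ignore a half-dead page and immediately move right. The toy model below is only a sketch of that idea. `ToyPage`, `toy_page_ignore` and `toy_search_level` are invented names, and the single integer high key is an assumption made to keep the example small; none of this is PostgreSQL code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for one btree page on a single tree level. */
typedef struct ToyPage
{
    int             high_key;   /* upper bound of this page's key space */
    bool            half_dead;  /* plays the role of the half-dead flag */
    bool            deleted;    /* plays the role of the deleted flag */
    struct ToyPage *right;      /* right-link; NULL on the rightmost page */
} ToyPage;

/* Searches skip pages that are half-dead or deleted. */
static bool
toy_page_ignore(const ToyPage *page)
{
    return page->half_dead || page->deleted;
}

/*
 * Starting from "start", follow right-links until reaching a live page
 * whose key space covers "key".  The rightmost page of a level is never
 * deleted, so the walk always stops there at the latest.
 */
static const ToyPage *
toy_search_level(const ToyPage *start, int key)
{
    const ToyPage *page = start;

    assert(page != NULL);
    while (page->right != NULL &&
           (toy_page_ignore(page) || key > page->high_key))
        page = page->right;
    return page;
}
```

In this model, marking a page half-dead is enough to hand its key space to the right sibling: a search for a key that used to belong to the half-dead page ends up on the sibling without any key being rewritten.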
|
@ -8,7 +8,7 @@
|
|||||||
*
|
*
|
||||||
*
|
*
|
||||||
* IDENTIFICATION
|
* IDENTIFICATION
|
||||||
* $PostgreSQL: pgsql/src/backend/access/nbtree/nbtinsert.c,v 1.144 2006/10/04 00:29:48 momjian Exp $
|
* $PostgreSQL: pgsql/src/backend/access/nbtree/nbtinsert.c,v 1.145 2006/11/01 19:43:17 tgl Exp $
|
||||||
*
|
*
|
||||||
*-------------------------------------------------------------------------
|
*-------------------------------------------------------------------------
|
||||||
*/
|
*/
|
||||||
@ -1337,8 +1337,8 @@ _bt_insert_parent(Relation rel,
|
|||||||
|
|
||||||
/* Check for error only after writing children */
|
/* Check for error only after writing children */
|
||||||
if (pbuf == InvalidBuffer)
|
if (pbuf == InvalidBuffer)
|
||||||
elog(ERROR, "failed to re-find parent key in \"%s\"",
|
elog(ERROR, "failed to re-find parent key in \"%s\" for split pages %u/%u",
|
||||||
RelationGetRelationName(rel));
|
RelationGetRelationName(rel), bknum, rbknum);
|
||||||
|
|
||||||
/* Recursively update the parent */
|
/* Recursively update the parent */
|
||||||
_bt_insertonpg(rel, pbuf, stack->bts_parent,
|
_bt_insertonpg(rel, pbuf, stack->bts_parent,
|
||||||
|
@ -9,7 +9,7 @@
|
|||||||
*
|
*
|
||||||
*
|
*
|
||||||
* IDENTIFICATION
|
* IDENTIFICATION
|
||||||
* $PostgreSQL: pgsql/src/backend/access/nbtree/nbtpage.c,v 1.100 2006/10/04 00:29:49 momjian Exp $
|
* $PostgreSQL: pgsql/src/backend/access/nbtree/nbtpage.c,v 1.101 2006/11/01 19:43:17 tgl Exp $
|
||||||
*
|
*
|
||||||
* NOTES
|
* NOTES
|
||||||
* Postgres btree pages look like ordinary relation pages. The opaque
|
* Postgres btree pages look like ordinary relation pages. The opaque
|
||||||
@ -723,7 +723,93 @@ _bt_delitems(Relation rel, Buffer buf,
|
|||||||
}
|
}
|
||||||
|
|
||||||
/*
|
/*
|
||||||
* _bt_pagedel() -- Delete a page from the b-tree.
|
* Subroutine to pre-check whether a page deletion is safe, that is, its
|
||||||
|
* parent page would be left in a valid or deletable state.
|
||||||
|
*
|
||||||
|
* "target" is the page we wish to delete, and "stack" is a search stack
|
||||||
|
* leading to it (approximately). Note that we will update the stack
|
||||||
|
* entry(s) to reflect current downlink positions --- this is harmless and
|
||||||
|
* indeed saves later search effort in _bt_pagedel.
|
||||||
|
*
|
||||||
|
* Note: it's OK to release page locks after checking, because a safe
|
||||||
|
* deletion can't become unsafe due to concurrent activity. A non-rightmost
|
||||||
|
* page cannot become rightmost unless there's a concurrent page deletion,
|
||||||
|
* but only VACUUM does page deletion and we only allow one VACUUM on an index
|
||||||
|
* at a time. An only child could acquire a sibling (of the same parent) only
|
||||||
|
* by being split ... but that would make it a non-rightmost child so the
|
||||||
|
* deletion is still safe.
|
||||||
|
*/
|
||||||
|
static bool
|
||||||
|
_bt_parent_deletion_safe(Relation rel, BlockNumber target, BTStack stack)
|
||||||
|
{
|
||||||
|
BlockNumber parent;
|
||||||
|
OffsetNumber poffset,
|
||||||
|
maxoff;
|
||||||
|
Buffer pbuf;
|
||||||
|
Page page;
|
||||||
|
BTPageOpaque opaque;
|
||||||
|
|
||||||
|
/*
|
||||||
|
* In recovery mode, assume the deletion being replayed is valid. We
|
||||||
|
* can't always check it because we won't have a full search stack,
|
||||||
|
* and we should complain if there's a problem, anyway.
|
||||||
|
*/
|
||||||
|
if (InRecovery)
|
||||||
|
return true;
|
||||||
|
|
||||||
|
/* Locate the parent's downlink (updating the stack entry if needed) */
|
||||||
|
ItemPointerSet(&(stack->bts_btentry.t_tid), target, P_HIKEY);
|
||||||
|
pbuf = _bt_getstackbuf(rel, stack, BT_READ);
|
||||||
|
if (pbuf == InvalidBuffer)
|
||||||
|
elog(ERROR, "failed to re-find parent key in \"%s\" for deletion target page %u",
|
||||||
|
RelationGetRelationName(rel), target);
|
||||||
|
parent = stack->bts_blkno;
|
||||||
|
poffset = stack->bts_offset;
|
||||||
|
|
||||||
|
page = BufferGetPage(pbuf);
|
||||||
|
opaque = (BTPageOpaque) PageGetSpecialPointer(page);
|
||||||
|
maxoff = PageGetMaxOffsetNumber(page);
|
||||||
|
|
||||||
|
/*
|
||||||
|
* If the target is the rightmost child of its parent, then we can't
|
||||||
|
* delete, unless it's also the only child.
|
||||||
|
*/
|
||||||
|
if (poffset >= maxoff)
|
||||||
|
{
|
||||||
|
/* It's rightmost child... */
|
||||||
|
if (poffset == P_FIRSTDATAKEY(opaque))
|
||||||
|
{
|
||||||
|
/*
|
||||||
|
* It's only child, so safe if parent would itself be removable.
|
||||||
|
* We have to check the parent itself, and then recurse to
|
||||||
|
* test the conditions at the parent's parent.
|
||||||
|
*/
|
||||||
|
if (P_RIGHTMOST(opaque) || P_ISROOT(opaque))
|
||||||
|
{
|
||||||
|
_bt_relbuf(rel, pbuf);
|
||||||
|
return false;
|
||||||
|
}
|
||||||
|
|
||||||
|
_bt_relbuf(rel, pbuf);
|
||||||
|
return _bt_parent_deletion_safe(rel, parent, stack->bts_parent);
|
||||||
|
}
|
||||||
|
else
|
||||||
|
{
|
||||||
|
/* Unsafe to delete */
|
||||||
|
_bt_relbuf(rel, pbuf);
|
||||||
|
return false;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
else
|
||||||
|
{
|
||||||
|
/* Not rightmost child, so safe to delete */
|
||||||
|
_bt_relbuf(rel, pbuf);
|
||||||
|
return true;
|
||||||
|
}
|
||||||
|
}
|
||||||
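Illustrative note (not part of the patch): the pre-check added above can be restated on a toy in-memory tree. This is a sketch under simplifying assumptions: `ToyNode` and `toy_parent_deletion_safe` are invented names, parent pointers replace the search stack, and no buffers or locks are involved.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical in-memory node; real btree pages carry no parent pointer. */
typedef struct ToyNode
{
    struct ToyNode *parent;             /* NULL for the root */
    int             child_index;        /* position among parent's children */
    int             nchildren;          /* number of children (0 for a leaf) */
    bool            rightmost_on_level; /* no right sibling on its level */
} ToyNode;

/*
 * Mirrors the decision structure of the pre-check:
 *   - a non-rightmost child is always safe to delete;
 *   - a rightmost child that has siblings is never safe;
 *   - an only child is safe only if the parent could itself be removed,
 *     which is tested recursively one level up.
 */
static bool
toy_parent_deletion_safe(const ToyNode *target)
{
    const ToyNode *parent = target->parent;

    assert(parent != NULL);     /* the root is never a deletion target */
    assert(target->child_index >= 0 &&
           target->child_index < parent->nchildren);

    if (target->child_index < parent->nchildren - 1)
        return true;            /* not the rightmost child */
    if (parent->nchildren > 1)
        return false;           /* rightmost child, but not the only one */
    /* only child: the parent must not be rightmost on its level or root */
    if (parent->rightmost_on_level || parent->parent == NULL)
        return false;
    return toy_parent_deletion_safe(parent);
}
```

Checking the whole chain before modifying anything is what lets the later deletion step treat a "rightmost child with siblings" as an error instead of a normal give-up case.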
|
|
||||||
|
/*
|
||||||
|
* _bt_pagedel() -- Delete a page from the b-tree, if legal to do so.
|
||||||
*
|
*
|
||||||
* This action unlinks the page from the b-tree structure, removing all
|
* This action unlinks the page from the b-tree structure, removing all
|
||||||
* pointers leading to it --- but not touching its own left and right links.
|
* pointers leading to it --- but not touching its own left and right links.
|
||||||
@ -731,19 +817,25 @@ _bt_delitems(Relation rel, Buffer buf,
|
|||||||
* may currently be trying to follow links leading to the page; they have to
|
* may currently be trying to follow links leading to the page; they have to
|
||||||
* be allowed to use its right-link to recover. See nbtree/README.
|
* be allowed to use its right-link to recover. See nbtree/README.
|
||||||
*
|
*
|
||||||
* On entry, the target buffer must be pinned and read-locked. This lock and
|
* On entry, the target buffer must be pinned and locked (either read or write
|
||||||
* pin will be dropped before exiting.
|
* lock is OK). This lock and pin will be dropped before exiting.
|
||||||
*
|
*
|
||||||
* Returns the number of pages successfully deleted (zero on failure; could
|
* The "stack" argument can be a search stack leading (approximately) to the
|
||||||
* be more than one if parent blocks were deleted).
|
* target page, or NULL --- outside callers typically pass NULL since they
|
||||||
|
* have not done such a search, but internal recursion cases pass the stack
|
||||||
|
* to avoid duplicated search effort.
|
||||||
|
*
|
||||||
|
* Returns the number of pages successfully deleted (zero if page cannot
|
||||||
|
* be deleted now; could be more than one if parent pages were deleted too).
|
||||||
*
|
*
|
||||||
* NOTE: this leaks memory. Rather than trying to clean up everything
|
* NOTE: this leaks memory. Rather than trying to clean up everything
|
||||||
* carefully, it's better to run it in a temp context that can be reset
|
* carefully, it's better to run it in a temp context that can be reset
|
||||||
* frequently.
|
* frequently.
|
||||||
*/
|
*/
|
||||||
int
|
int
|
||||||
_bt_pagedel(Relation rel, Buffer buf, bool vacuum_full)
|
_bt_pagedel(Relation rel, Buffer buf, BTStack stack, bool vacuum_full)
|
||||||
{
|
{
|
||||||
|
int result;
|
||||||
BlockNumber target,
|
BlockNumber target,
|
||||||
leftsib,
|
leftsib,
|
||||||
rightsib,
|
rightsib,
|
||||||
@ -756,7 +848,6 @@ _bt_pagedel(Relation rel, Buffer buf, bool vacuum_full)
|
|||||||
IndexTuple targetkey,
|
IndexTuple targetkey,
|
||||||
itup;
|
itup;
|
||||||
ScanKey itup_scankey;
|
ScanKey itup_scankey;
|
||||||
BTStack stack;
|
|
||||||
Buffer lbuf,
|
Buffer lbuf,
|
||||||
rbuf,
|
rbuf,
|
||||||
pbuf;
|
pbuf;
|
||||||
@ -778,6 +869,9 @@ _bt_pagedel(Relation rel, Buffer buf, bool vacuum_full)
|
|||||||
if (P_RIGHTMOST(opaque) || P_ISROOT(opaque) || P_ISDELETED(opaque) ||
|
if (P_RIGHTMOST(opaque) || P_ISROOT(opaque) || P_ISDELETED(opaque) ||
|
||||||
P_FIRSTDATAKEY(opaque) <= PageGetMaxOffsetNumber(page))
|
P_FIRSTDATAKEY(opaque) <= PageGetMaxOffsetNumber(page))
|
||||||
{
|
{
|
||||||
|
/* Should never fail to delete a half-dead page */
|
||||||
|
Assert(!P_ISHALFDEAD(opaque));
|
||||||
|
|
||||||
_bt_relbuf(rel, buf);
|
_bt_relbuf(rel, buf);
|
||||||
return 0;
|
return 0;
|
||||||
}
|
}
|
||||||
@ -793,36 +887,79 @@ _bt_pagedel(Relation rel, Buffer buf, bool vacuum_full)
|
|||||||
targetkey = CopyIndexTuple((IndexTuple) PageGetItem(page, itemid));
|
targetkey = CopyIndexTuple((IndexTuple) PageGetItem(page, itemid));
|
||||||
|
|
||||||
/*
|
/*
|
||||||
* We need to get an approximate pointer to the page's parent page. Use
|
* To avoid deadlocks, we'd better drop the target page lock before
|
||||||
* the standard search mechanism to search for the page's high key; this
|
* going further.
|
||||||
* will give us a link to either the current parent or someplace to its
|
|
||||||
* left (if there are multiple equal high keys). To avoid deadlocks, we'd
|
|
||||||
* better drop the target page lock first.
|
|
||||||
*/
|
*/
|
||||||
_bt_relbuf(rel, buf);
|
_bt_relbuf(rel, buf);
|
||||||
/* we need an insertion scan key to do our search, so build one */
|
|
||||||
itup_scankey = _bt_mkscankey(rel, targetkey);
|
|
||||||
/* find the leftmost leaf page containing this key */
|
|
||||||
stack = _bt_search(rel, rel->rd_rel->relnatts, itup_scankey, false,
|
|
||||||
&lbuf, BT_READ);
|
|
||||||
/* don't need a pin on that either */
|
|
||||||
_bt_relbuf(rel, lbuf);
|
|
||||||
|
|
||||||
/*
|
/*
|
||||||
* If we are trying to delete an interior page, _bt_search did more than
|
* We need an approximate pointer to the page's parent page. We use
|
||||||
* we needed. Locate the stack item pointing to our parent level.
|
* the standard search mechanism to search for the page's high key; this
|
||||||
|
* will give us a link to either the current parent or someplace to its
|
||||||
|
* left (if there are multiple equal high keys). In recursion cases,
|
||||||
|
* the caller already generated a search stack and we can just re-use
|
||||||
|
* that work.
|
||||||
*/
|
*/
|
||||||
ilevel = 0;
|
if (stack == NULL)
|
||||||
for (;;)
|
|
||||||
{
|
{
|
||||||
if (stack == NULL)
|
if (!InRecovery)
|
||||||
elog(ERROR, "not enough stack items");
|
{
|
||||||
if (ilevel == targetlevel)
|
/* we need an insertion scan key to do our search, so build one */
|
||||||
break;
|
itup_scankey = _bt_mkscankey(rel, targetkey);
|
||||||
stack = stack->bts_parent;
|
/* find the leftmost leaf page containing this key */
|
||||||
ilevel++;
|
stack = _bt_search(rel, rel->rd_rel->relnatts, itup_scankey, false,
|
||||||
|
&lbuf, BT_READ);
|
||||||
|
/* don't need a pin on that either */
|
||||||
|
_bt_relbuf(rel, lbuf);
|
||||||
|
|
||||||
|
/*
|
||||||
|
* If we are trying to delete an interior page, _bt_search did
|
||||||
|
* more than we needed. Locate the stack item pointing to our
|
||||||
|
* parent level.
|
||||||
|
*/
|
||||||
|
ilevel = 0;
|
||||||
|
for (;;)
|
||||||
|
{
|
||||||
|
if (stack == NULL)
|
||||||
|
elog(ERROR, "not enough stack items");
|
||||||
|
if (ilevel == targetlevel)
|
||||||
|
break;
|
||||||
|
stack = stack->bts_parent;
|
||||||
|
ilevel++;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
else
|
||||||
|
{
|
||||||
|
/*
|
||||||
|
* During WAL recovery, we can't use _bt_search (for one reason,
|
||||||
|
* it might invoke user-defined comparison functions that expect
|
||||||
|
* facilities not available in recovery mode). Instead, just
|
||||||
|
* set up a dummy stack pointing to the left end of the parent
|
||||||
|
* tree level, from which _bt_getstackbuf will walk right to the
|
||||||
|
* parent page. Painful, but we don't care too much about
|
||||||
|
* performance in this scenario.
|
||||||
|
*/
|
||||||
|
pbuf = _bt_get_endpoint(rel, targetlevel + 1, false);
|
||||||
|
stack = (BTStack) palloc(sizeof(BTStackData));
|
||||||
|
stack->bts_blkno = BufferGetBlockNumber(pbuf);
|
||||||
|
stack->bts_offset = InvalidOffsetNumber;
|
||||||
|
/* bts_btentry will be initialized below */
|
||||||
|
stack->bts_parent = NULL;
|
||||||
|
_bt_relbuf(rel, pbuf);
|
||||||
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
|
/*
|
||||||
|
* We cannot delete a page that is the rightmost child of its immediate
|
||||||
|
* parent, unless it is the only child --- in which case the parent has
|
||||||
|
* to be deleted too, and the same condition applies recursively to it.
|
||||||
|
* We have to check this condition all the way up before trying to delete.
|
||||||
|
* We don't need to re-test when deleting a non-leaf page, though.
|
||||||
|
*/
|
||||||
|
if (targetlevel == 0 &&
|
||||||
|
!_bt_parent_deletion_safe(rel, target, stack))
|
||||||
|
return 0;
|
||||||
|
|
||||||
/*
|
/*
|
||||||
* We have to lock the pages we need to modify in the standard order:
|
* We have to lock the pages we need to modify in the standard order:
|
||||||
* moving right, then up. Else we will deadlock against other writers.
|
* moving right, then up. Else we will deadlock against other writers.
|
||||||
@ -898,15 +1035,16 @@ _bt_pagedel(Relation rel, Buffer buf, bool vacuum_full)
|
|||||||
ItemPointerSet(&(stack->bts_btentry.t_tid), target, P_HIKEY);
|
ItemPointerSet(&(stack->bts_btentry.t_tid), target, P_HIKEY);
|
||||||
pbuf = _bt_getstackbuf(rel, stack, BT_WRITE);
|
pbuf = _bt_getstackbuf(rel, stack, BT_WRITE);
|
||||||
if (pbuf == InvalidBuffer)
|
if (pbuf == InvalidBuffer)
|
||||||
elog(ERROR, "failed to re-find parent key in \"%s\"",
|
elog(ERROR, "failed to re-find parent key in \"%s\" for deletion target page %u",
|
||||||
RelationGetRelationName(rel));
|
RelationGetRelationName(rel), target);
|
||||||
parent = stack->bts_blkno;
|
parent = stack->bts_blkno;
|
||||||
poffset = stack->bts_offset;
|
poffset = stack->bts_offset;
|
||||||
|
|
||||||
/*
|
/*
|
||||||
* If the target is the rightmost child of its parent, then we can't
|
* If the target is the rightmost child of its parent, then we can't
|
||||||
* delete, unless it's also the only child --- in which case the parent
|
* delete, unless it's also the only child --- in which case the parent
|
||||||
* changes to half-dead status.
|
* changes to half-dead status. The "can't delete" case should have been
|
||||||
|
* detected by _bt_parent_deletion_safe, so complain if we see it now.
|
||||||
*/
|
*/
|
||||||
page = BufferGetPage(pbuf);
|
page = BufferGetPage(pbuf);
|
||||||
opaque = (BTPageOpaque) PageGetSpecialPointer(page);
|
opaque = (BTPageOpaque) PageGetSpecialPointer(page);
|
||||||
@ -918,14 +1056,8 @@ _bt_pagedel(Relation rel, Buffer buf, bool vacuum_full)
|
|||||||
if (poffset == P_FIRSTDATAKEY(opaque))
|
if (poffset == P_FIRSTDATAKEY(opaque))
|
||||||
parent_half_dead = true;
|
parent_half_dead = true;
|
||||||
else
|
else
|
||||||
{
|
elog(ERROR, "failed to delete rightmost child %u of %u in \"%s\"",
|
||||||
_bt_relbuf(rel, pbuf);
|
target, parent, RelationGetRelationName(rel));
|
||||||
_bt_relbuf(rel, rbuf);
|
|
||||||
_bt_relbuf(rel, buf);
|
|
||||||
if (BufferIsValid(lbuf))
|
|
||||||
_bt_relbuf(rel, lbuf);
|
|
||||||
return 0;
|
|
||||||
}
|
|
||||||
}
|
}
|
||||||
else
|
else
|
||||||
{
|
{
|
||||||
@ -940,10 +1072,13 @@ _bt_pagedel(Relation rel, Buffer buf, bool vacuum_full)
|
|||||||
* might be possible to push the fast root even further down, but the odds
|
* might be possible to push the fast root even further down, but the odds
|
||||||
* of doing so are slim, and the locking considerations daunting.)
|
* of doing so are slim, and the locking considerations daunting.)
|
||||||
*
|
*
|
||||||
|
* We don't support handling this in the case where the parent is
|
||||||
|
* becoming half-dead, even though it theoretically could occur.
|
||||||
|
*
|
||||||
* We can safely acquire a lock on the metapage here --- see comments for
|
* We can safely acquire a lock on the metapage here --- see comments for
|
||||||
* _bt_newroot().
|
* _bt_newroot().
|
||||||
*/
|
*/
|
||||||
if (leftsib == P_NONE)
|
if (leftsib == P_NONE && !parent_half_dead)
|
||||||
{
|
{
|
||||||
page = BufferGetPage(rbuf);
|
page = BufferGetPage(rbuf);
|
||||||
opaque = (BTPageOpaque) PageGetSpecialPointer(page);
|
opaque = (BTPageOpaque) PageGetSpecialPointer(page);
|
||||||
@ -1031,6 +1166,7 @@ _bt_pagedel(Relation rel, Buffer buf, bool vacuum_full)
|
|||||||
*/
|
*/
|
||||||
page = BufferGetPage(buf);
|
page = BufferGetPage(buf);
|
||||||
opaque = (BTPageOpaque) PageGetSpecialPointer(page);
|
opaque = (BTPageOpaque) PageGetSpecialPointer(page);
|
||||||
|
opaque->btpo_flags &= ~BTP_HALF_DEAD;
|
||||||
opaque->btpo_flags |= BTP_DELETED;
|
opaque->btpo_flags |= BTP_DELETED;
|
||||||
opaque->btpo.xact =
|
opaque->btpo.xact =
|
||||||
vacuum_full ? FrozenTransactionId : ReadNewTransactionId();
|
vacuum_full ? FrozenTransactionId : ReadNewTransactionId();
|
||||||
@ -1085,6 +1221,8 @@ _bt_pagedel(Relation rel, Buffer buf, bool vacuum_full)
|
|||||||
nextrdata++;
|
nextrdata++;
|
||||||
xlinfo = XLOG_BTREE_DELETE_PAGE_META;
|
xlinfo = XLOG_BTREE_DELETE_PAGE_META;
|
||||||
}
|
}
|
||||||
|
else if (parent_half_dead)
|
||||||
|
xlinfo = XLOG_BTREE_DELETE_PAGE_HALF;
|
||||||
else
|
else
|
||||||
xlinfo = XLOG_BTREE_DELETE_PAGE;
|
xlinfo = XLOG_BTREE_DELETE_PAGE;
|
||||||
|
|
||||||
@ -1138,34 +1276,52 @@ _bt_pagedel(Relation rel, Buffer buf, bool vacuum_full)
|
|||||||
|
|
||||||
END_CRIT_SECTION();
|
END_CRIT_SECTION();
|
||||||
|
|
||||||
/* release buffers; send out relcache inval if metapage changed */
|
/* release metapage; send out relcache inval if metapage changed */
|
||||||
if (BufferIsValid(metabuf))
|
if (BufferIsValid(metabuf))
|
||||||
{
|
{
|
||||||
CacheInvalidateRelcache(rel);
|
CacheInvalidateRelcache(rel);
|
||||||
_bt_relbuf(rel, metabuf);
|
_bt_relbuf(rel, metabuf);
|
||||||
}
|
}
|
||||||
_bt_relbuf(rel, pbuf);
|
/* can always release leftsib immediately */
|
||||||
_bt_relbuf(rel, rbuf);
|
|
||||||
_bt_relbuf(rel, buf);
|
|
||||||
if (BufferIsValid(lbuf))
|
if (BufferIsValid(lbuf))
|
||||||
_bt_relbuf(rel, lbuf);
|
_bt_relbuf(rel, lbuf);
|
||||||
|
|
||||||
/*
|
/*
|
||||||
* If parent became half dead, recurse to try to delete it. Otherwise, if
|
* If parent became half dead, recurse to delete it. Otherwise, if
|
||||||
* right sibling is empty and is now the last child of the parent, recurse
|
* right sibling is empty and is now the last child of the parent, recurse
|
||||||
* to try to delete it. (These cases cannot apply at the same time,
|
* to try to delete it. (These cases cannot apply at the same time,
|
||||||
* though the second case might itself recurse to the first.)
|
* though the second case might itself recurse to the first.)
|
||||||
|
*
|
||||||
|
* When recursing to parent, we hold the lock on the target page until
|
||||||
|
* done. This delays any insertions into the keyspace that was just
|
||||||
|
* effectively reassigned to the parent's right sibling. If we allowed
|
||||||
|
* that, and there were enough such insertions before we finish deleting
|
||||||
|
* the parent, page splits within that keyspace could lead to inserting
|
||||||
|
* out-of-order keys into the grandparent level. It is thought that that
|
||||||
|
* wouldn't have any serious consequences, but it still seems like a
|
||||||
|
* pretty bad idea.
|
||||||
*/
|
*/
|
||||||
if (parent_half_dead)
|
if (parent_half_dead)
|
||||||
{
|
{
|
||||||
buf = _bt_getbuf(rel, parent, BT_READ);
|
/* recursive call will release pbuf */
|
||||||
return _bt_pagedel(rel, buf, vacuum_full) + 1;
|
_bt_relbuf(rel, rbuf);
|
||||||
|
result = _bt_pagedel(rel, pbuf, stack->bts_parent, vacuum_full) + 1;
|
||||||
|
_bt_relbuf(rel, buf);
|
||||||
}
|
}
|
||||||
if (parent_one_child && rightsib_empty)
|
else if (parent_one_child && rightsib_empty)
|
||||||
{
|
{
|
||||||
buf = _bt_getbuf(rel, rightsib, BT_READ);
|
_bt_relbuf(rel, pbuf);
|
||||||
return _bt_pagedel(rel, buf, vacuum_full) + 1;
|
_bt_relbuf(rel, buf);
|
||||||
|
/* recursive call will release rbuf */
|
||||||
|
result = _bt_pagedel(rel, rbuf, stack, vacuum_full) + 1;
|
||||||
|
}
|
||||||
|
else
|
||||||
|
{
|
||||||
|
_bt_relbuf(rel, pbuf);
|
||||||
|
_bt_relbuf(rel, buf);
|
||||||
|
_bt_relbuf(rel, rbuf);
|
||||||
|
result = 1;
|
||||||
}
|
}
|
||||||
|
|
||||||
return 1;
|
return result;
|
||||||
}
|
}
|
||||||
|
@ -12,7 +12,7 @@
|
|||||||
* Portions Copyright (c) 1994, Regents of the University of California
|
* Portions Copyright (c) 1994, Regents of the University of California
|
||||||
*
|
*
|
||||||
* IDENTIFICATION
|
* IDENTIFICATION
|
||||||
* $PostgreSQL: pgsql/src/backend/access/nbtree/nbtree.c,v 1.152 2006/10/04 00:29:49 momjian Exp $
|
* $PostgreSQL: pgsql/src/backend/access/nbtree/nbtree.c,v 1.153 2006/11/01 19:43:17 tgl Exp $
|
||||||
*
|
*
|
||||||
*-------------------------------------------------------------------------
|
*-------------------------------------------------------------------------
|
||||||
*/
|
*/
|
||||||
@ -804,8 +804,7 @@ restart:
|
|||||||
if (blkno != orig_blkno)
|
if (blkno != orig_blkno)
|
||||||
{
|
{
|
||||||
if (_bt_page_recyclable(page) ||
|
if (_bt_page_recyclable(page) ||
|
||||||
P_ISDELETED(opaque) ||
|
P_IGNORE(opaque) ||
|
||||||
(opaque->btpo_flags & BTP_HALF_DEAD) ||
|
|
||||||
!P_ISLEAF(opaque) ||
|
!P_ISLEAF(opaque) ||
|
||||||
opaque->btpo_cycleid != vstate->cycleid)
|
opaque->btpo_cycleid != vstate->cycleid)
|
||||||
{
|
{
|
||||||
@ -828,7 +827,7 @@ restart:
|
|||||||
/* Already deleted, but can't recycle yet */
|
/* Already deleted, but can't recycle yet */
|
||||||
stats->pages_deleted++;
|
stats->pages_deleted++;
|
||||||
}
|
}
|
||||||
else if (opaque->btpo_flags & BTP_HALF_DEAD)
|
else if (P_ISHALFDEAD(opaque))
|
||||||
{
|
{
|
||||||
/* Half-dead, try to delete */
|
/* Half-dead, try to delete */
|
||||||
delete_now = true;
|
delete_now = true;
|
||||||
@ -939,7 +938,7 @@ restart:
|
|||||||
MemoryContextReset(vstate->pagedelcontext);
|
MemoryContextReset(vstate->pagedelcontext);
|
||||||
oldcontext = MemoryContextSwitchTo(vstate->pagedelcontext);
|
oldcontext = MemoryContextSwitchTo(vstate->pagedelcontext);
|
||||||
|
|
||||||
ndel = _bt_pagedel(rel, buf, info->vacuum_full);
|
ndel = _bt_pagedel(rel, buf, NULL, info->vacuum_full);
|
||||||
|
|
||||||
/* count only this page, else may double-count parent */
|
/* count only this page, else may double-count parent */
|
||||||
if (ndel)
|
if (ndel)
|
||||||
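Illustrative note (not part of the patch): the nbtpage.c and nbtree.c hunks above test and update page status flags. VACUUM now treats deleted and half-dead pages alike through a single "ignore" test, and the half-dead bit is cleared at the moment the page is marked deleted. The sketch below shows that bit handling with made-up flag values. The `TOY_*` names and bit positions are assumptions for illustration, not the definitions in nbtree.h.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag bits; the real values are defined in nbtree.h. */
#define TOY_DELETED   (1 << 0)
#define TOY_HALF_DEAD (1 << 1)

static bool
toy_is_deleted(uint16_t flags)
{
    return (flags & TOY_DELETED) != 0;
}

static bool
toy_is_half_dead(uint16_t flags)
{
    return (flags & TOY_HALF_DEAD) != 0;
}

/* One combined test covering both states. */
static bool
toy_ignore(uint16_t flags)
{
    return (flags & (TOY_DELETED | TOY_HALF_DEAD)) != 0;
}

/*
 * Finish a deletion: clear the half-dead bit and set the deleted bit,
 * so a page is never flagged as both at once.
 */
static uint16_t
toy_mark_deleted(uint16_t flags)
{
    flags &= (uint16_t) ~TOY_HALF_DEAD;
    flags |= TOY_DELETED;
    assert(!toy_is_half_dead(flags));
    return flags;
}
```

Keeping the two states mutually exclusive means a later check for "half-dead" cannot fire on a page that has already been fully deleted.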
|
@ -8,7 +8,7 @@
|
|||||||
* Portions Copyright (c) 1994, Regents of the University of California
|
* Portions Copyright (c) 1994, Regents of the University of California
|
||||||
*
|
*
|
||||||
* IDENTIFICATION
|
* IDENTIFICATION
|
||||||
* $PostgreSQL: pgsql/src/backend/access/nbtree/nbtxlog.c,v 1.38 2006/10/04 00:29:49 momjian Exp $
|
* $PostgreSQL: pgsql/src/backend/access/nbtree/nbtxlog.c,v 1.39 2006/11/01 19:43:17 tgl Exp $
|
||||||
*
|
*
|
||||||
*-------------------------------------------------------------------------
|
*-------------------------------------------------------------------------
|
||||||
*/
|
*/
|
||||||
@ -22,31 +22,41 @@
|
|||||||
* them manually if they are not seen in the WAL log during replay. This
|
* them manually if they are not seen in the WAL log during replay. This
|
||||||
* makes it safe for page insertion to be a multiple-WAL-action process.
|
* makes it safe for page insertion to be a multiple-WAL-action process.
|
||||||
*
|
*
|
||||||
|
* Similarly, deletion of an only child page and deletion of its parent page
|
||||||
|
* form multiple WAL log entries, and we have to be prepared to follow through
|
||||||
|
* with the deletion if the log ends between.
|
||||||
|
*
|
||||||
* The data structure is a simple linked list --- this should be good enough,
|
* The data structure is a simple linked list --- this should be good enough,
|
||||||
* since we don't expect a page split to remain incomplete for long.
|
* since we don't expect a page split or multi deletion to remain incomplete
|
||||||
|
* for long. In any case we need to respect the order of operations.
|
||||||
*/
|
*/
|
||||||
typedef struct bt_incomplete_split
|
typedef struct bt_incomplete_action
|
||||||
{
|
{
|
||||||
RelFileNode node; /* the index */
|
RelFileNode node; /* the index */
|
||||||
|
bool is_split; /* T = pending split, F = pending delete */
|
||||||
|
/* these fields are for a split: */
|
||||||
|
bool is_root; /* we split the root */
|
||||||
BlockNumber leftblk; /* left half of split */
|
BlockNumber leftblk; /* left half of split */
|
||||||
BlockNumber rightblk; /* right half of split */
|
BlockNumber rightblk; /* right half of split */
|
||||||
bool is_root; /* we split the root */
|
/* these fields are for a delete: */
|
||||||
} bt_incomplete_split;
|
BlockNumber delblk; /* parent block to be deleted */
|
||||||
|
} bt_incomplete_action;
|
||||||
|
|
||||||
static List *incomplete_splits;
|
static List *incomplete_actions;
|
||||||
|
|
||||||
|
|
||||||
static void
|
static void
|
||||||
log_incomplete_split(RelFileNode node, BlockNumber leftblk,
|
log_incomplete_split(RelFileNode node, BlockNumber leftblk,
|
||||||
BlockNumber rightblk, bool is_root)
|
BlockNumber rightblk, bool is_root)
|
||||||
{
|
{
|
||||||
bt_incomplete_split *split = palloc(sizeof(bt_incomplete_split));
|
bt_incomplete_action *action = palloc(sizeof(bt_incomplete_action));
|
||||||
|
|
||||||
split->node = node;
|
action->node = node;
|
||||||
split->leftblk = leftblk;
|
action->is_split = true;
|
||||||
split->rightblk = rightblk;
|
action->is_root = is_root;
|
||||||
split->is_root = is_root;
|
action->leftblk = leftblk;
|
||||||
incomplete_splits = lappend(incomplete_splits, split);
|
action->rightblk = rightblk;
|
||||||
|
incomplete_actions = lappend(incomplete_actions, action);
|
||||||
}
|
}
|
||||||
|
|
||||||
static void
|
static void
|
||||||
@ -54,17 +64,50 @@ forget_matching_split(RelFileNode node, BlockNumber downlink, bool is_root)
|
|||||||
{
|
{
|
||||||
ListCell *l;
|
ListCell *l;
|
||||||
|
|
||||||
foreach(l, incomplete_splits)
|
foreach(l, incomplete_actions)
|
||||||
{
|
{
|
||||||
bt_incomplete_split *split = (bt_incomplete_split *) lfirst(l);
|
bt_incomplete_action *action = (bt_incomplete_action *) lfirst(l);
|
||||||
|
|
||||||
if (RelFileNodeEquals(node, split->node) &&
|
if (RelFileNodeEquals(node, action->node) &&
|
||||||
downlink == split->rightblk)
|
action->is_split &&
|
||||||
|
downlink == action->rightblk)
|
||||||
{
|
{
|
||||||
if (is_root != split->is_root)
|
if (is_root != action->is_root)
|
||||||
elog(LOG, "forget_matching_split: fishy is_root data (expected %d, got %d)",
|
elog(LOG, "forget_matching_split: fishy is_root data (expected %d, got %d)",
|
||||||
split->is_root, is_root);
|
action->is_root, is_root);
|
||||||
incomplete_splits = list_delete_ptr(incomplete_splits, split);
|
incomplete_actions = list_delete_ptr(incomplete_actions, action);
|
||||||
|
pfree(action);
|
||||||
|
break; /* need not look further */
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
static void
|
||||||
|
log_incomplete_deletion(RelFileNode node, BlockNumber delblk)
|
||||||
|
{
|
||||||
|
bt_incomplete_action *action = palloc(sizeof(bt_incomplete_action));
|
||||||
|
|
||||||
|
action->node = node;
|
||||||
|
action->is_split = false;
|
||||||
|
action->delblk = delblk;
|
||||||
|
incomplete_actions = lappend(incomplete_actions, action);
|
||||||
|
}
|
||||||
|
|
||||||
|
static void
|
||||||
|
forget_matching_deletion(RelFileNode node, BlockNumber delblk)
|
||||||
|
{
|
||||||
|
ListCell *l;
|
||||||
|
|
||||||
|
foreach(l, incomplete_actions)
|
||||||
|
{
|
||||||
|
bt_incomplete_action *action = (bt_incomplete_action *) lfirst(l);
|
||||||
|
|
||||||
|
if (RelFileNodeEquals(node, action->node) &&
|
||||||
|
!action->is_split &&
|
||||||
|
delblk == action->delblk)
|
||||||
|
{
|
||||||
|
incomplete_actions = list_delete_ptr(incomplete_actions, action);
|
||||||
|
pfree(action);
|
||||||
break; /* need not look further */
|
break; /* need not look further */
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
@ -389,8 +432,7 @@ btree_xlog_delete(XLogRecPtr lsn, XLogRecord *record)
|
|||||||
}
|
}
|
||||||
|
|
||||||
static void
|
static void
|
||||||
btree_xlog_delete_page(bool ismeta,
|
btree_xlog_delete_page(uint8 info, XLogRecPtr lsn, XLogRecord *record)
|
||||||
XLogRecPtr lsn, XLogRecord *record)
|
|
||||||
{
|
{
|
||||||
xl_btree_delete_page *xlrec = (xl_btree_delete_page *) XLogRecGetData(record);
|
xl_btree_delete_page *xlrec = (xl_btree_delete_page *) XLogRecGetData(record);
|
||||||
Relation reln;
|
Relation reln;
|
||||||
@ -427,6 +469,7 @@ btree_xlog_delete_page(bool ismeta,
|
|||||||
poffset = ItemPointerGetOffsetNumber(&(xlrec->target.tid));
|
poffset = ItemPointerGetOffsetNumber(&(xlrec->target.tid));
|
||||||
if (poffset >= PageGetMaxOffsetNumber(page))
|
if (poffset >= PageGetMaxOffsetNumber(page))
|
||||||
{
|
{
|
||||||
|
Assert(info == XLOG_BTREE_DELETE_PAGE_HALF);
|
||||||
Assert(poffset == P_FIRSTDATAKEY(pageop));
|
Assert(poffset == P_FIRSTDATAKEY(pageop));
|
||||||
PageIndexTupleDelete(page, poffset);
|
PageIndexTupleDelete(page, poffset);
|
||||||
pageop->btpo_flags |= BTP_HALF_DEAD;
|
pageop->btpo_flags |= BTP_HALF_DEAD;
|
||||||
@ -437,6 +480,7 @@ btree_xlog_delete_page(bool ismeta,
|
|||||||
IndexTuple itup;
|
IndexTuple itup;
|
||||||
OffsetNumber nextoffset;
|
OffsetNumber nextoffset;
|
||||||
|
|
||||||
|
Assert(info != XLOG_BTREE_DELETE_PAGE_HALF);
|
||||||
itemid = PageGetItemId(page, poffset);
|
itemid = PageGetItemId(page, poffset);
|
||||||
itup = (IndexTuple) PageGetItem(page, itemid);
|
itup = (IndexTuple) PageGetItem(page, itemid);
|
||||||
ItemPointerSet(&(itup->t_tid), rightsib, P_HIKEY);
|
ItemPointerSet(&(itup->t_tid), rightsib, P_HIKEY);
|
||||||
@ -523,7 +567,7 @@ btree_xlog_delete_page(bool ismeta,
|
|||||||
UnlockReleaseBuffer(buffer);
|
UnlockReleaseBuffer(buffer);
|
||||||
|
|
||||||
/* Update metapage if needed */
|
/* Update metapage if needed */
|
||||||
if (ismeta)
|
if (info == XLOG_BTREE_DELETE_PAGE_META)
|
||||||
{
|
{
|
||||||
xl_btree_metadata md;
|
xl_btree_metadata md;
|
||||||
|
|
||||||
@ -533,6 +577,13 @@ btree_xlog_delete_page(bool ismeta,
|
|||||||
md.root, md.level,
|
md.root, md.level,
|
||||||
md.fastroot, md.fastlevel);
|
md.fastroot, md.fastlevel);
|
||||||
}
|
}
|
||||||
|
|
||||||
|
/* Forget any completed deletion */
|
||||||
|
forget_matching_deletion(xlrec->target.node, target);
|
||||||
|
|
||||||
|
/* If parent became half-dead, remember it for deletion */
|
||||||
|
if (info == XLOG_BTREE_DELETE_PAGE_HALF)
|
||||||
|
log_incomplete_deletion(xlrec->target.node, parent);
|
||||||
}
|
}
|
||||||
|
|
||||||
static void
|
static void
|
||||||
@ -620,10 +671,9 @@ btree_redo(XLogRecPtr lsn, XLogRecord *record)
|
|||||||
btree_xlog_delete(lsn, record);
|
btree_xlog_delete(lsn, record);
|
||||||
break;
|
break;
|
||||||
case XLOG_BTREE_DELETE_PAGE:
|
case XLOG_BTREE_DELETE_PAGE:
|
||||||
btree_xlog_delete_page(false, lsn, record);
|
|
||||||
break;
|
|
||||||
case XLOG_BTREE_DELETE_PAGE_META:
|
case XLOG_BTREE_DELETE_PAGE_META:
|
||||||
btree_xlog_delete_page(true, lsn, record);
|
case XLOG_BTREE_DELETE_PAGE_HALF:
|
||||||
|
btree_xlog_delete_page(info, lsn, record);
|
||||||
break;
|
break;
|
||||||
case XLOG_BTREE_NEWROOT:
|
case XLOG_BTREE_NEWROOT:
|
||||||
btree_xlog_newroot(lsn, record);
|
btree_xlog_newroot(lsn, record);
|
||||||
@ -724,6 +774,7 @@ btree_desc(StringInfo buf, uint8 xl_info, char *rec)
|
|||||||
}
|
}
|
||||||
case XLOG_BTREE_DELETE_PAGE:
|
case XLOG_BTREE_DELETE_PAGE:
|
||||||
case XLOG_BTREE_DELETE_PAGE_META:
|
case XLOG_BTREE_DELETE_PAGE_META:
|
||||||
|
case XLOG_BTREE_DELETE_PAGE_HALF:
|
||||||
{
|
{
|
||||||
xl_btree_delete_page *xlrec = (xl_btree_delete_page *) rec;
|
xl_btree_delete_page *xlrec = (xl_btree_delete_page *) rec;
|
||||||
|
|
||||||
@ -752,7 +803,7 @@ btree_desc(StringInfo buf, uint8 xl_info, char *rec)
|
|||||||
void
|
void
|
||||||
btree_xlog_startup(void)
|
btree_xlog_startup(void)
|
||||||
{
|
{
|
||||||
incomplete_splits = NIL;
|
incomplete_actions = NIL;
|
||||||
}
|
}
|
||||||
|
|
||||||
void
|
void
|
||||||
@ -760,45 +811,60 @@ btree_xlog_cleanup(void)
|
|||||||
{
|
{
|
||||||
ListCell *l;
|
ListCell *l;
|
||||||
|
|
||||||
foreach(l, incomplete_splits)
|
foreach(l, incomplete_actions)
|
||||||
{
|
{
|
||||||
bt_incomplete_split *split = (bt_incomplete_split *) lfirst(l);
|
bt_incomplete_action *action = (bt_incomplete_action *) lfirst(l);
|
||||||
Relation reln;
|
Relation reln;
|
||||||
Buffer lbuf,
|
|
||||||
rbuf;
|
|
||||||
Page lpage,
|
|
||||||
rpage;
|
|
||||||
BTPageOpaque lpageop,
|
|
||||||
rpageop;
|
|
||||||
bool is_only;
|
|
||||||
|
|
||||||
reln = XLogOpenRelation(split->node);
|
reln = XLogOpenRelation(action->node);
|
||||||
lbuf = XLogReadBuffer(reln, split->leftblk, false);
|
if (action->is_split)
|
||||||
/* failure should be impossible because we wrote this page earlier */
|
{
|
||||||
if (!BufferIsValid(lbuf))
|
/* finish an incomplete split */
|
||||||
elog(PANIC, "btree_xlog_cleanup: left block unfound");
|
Buffer lbuf,
|
||||||
lpage = (Page) BufferGetPage(lbuf);
|
rbuf;
|
||||||
lpageop = (BTPageOpaque) PageGetSpecialPointer(lpage);
|
Page lpage,
|
||||||
rbuf = XLogReadBuffer(reln, split->rightblk, false);
|
rpage;
|
||||||
/* failure should be impossible because we wrote this page earlier */
|
BTPageOpaque lpageop,
|
||||||
if (!BufferIsValid(rbuf))
|
rpageop;
|
||||||
elog(PANIC, "btree_xlog_cleanup: right block unfound");
|
bool is_only;
|
||||||
rpage = (Page) BufferGetPage(rbuf);
|
|
||||||
rpageop = (BTPageOpaque) PageGetSpecialPointer(rpage);
|
|
||||||
|
|
||||||
/* if the two pages are all of their level, it's a only-page split */
|
lbuf = XLogReadBuffer(reln, action->leftblk, false);
|
||||||
is_only = P_LEFTMOST(lpageop) && P_RIGHTMOST(rpageop);
|
/* failure is impossible because we wrote this page earlier */
|
||||||
|
if (!BufferIsValid(lbuf))
|
||||||
|
elog(PANIC, "btree_xlog_cleanup: left block unfound");
|
||||||
|
lpage = (Page) BufferGetPage(lbuf);
|
||||||
|
lpageop = (BTPageOpaque) PageGetSpecialPointer(lpage);
|
||||||
|
rbuf = XLogReadBuffer(reln, action->rightblk, false);
|
||||||
|
/* failure is impossible because we wrote this page earlier */
|
||||||
|
if (!BufferIsValid(rbuf))
|
||||||
|
elog(PANIC, "btree_xlog_cleanup: right block unfound");
|
||||||
|
rpage = (Page) BufferGetPage(rbuf);
|
||||||
|
rpageop = (BTPageOpaque) PageGetSpecialPointer(rpage);
|
||||||
|
|
||||||
_bt_insert_parent(reln, lbuf, rbuf, NULL,
|
/* if the pages are all of their level, it's an only-page split */
|
||||||
split->is_root, is_only);
|
is_only = P_LEFTMOST(lpageop) && P_RIGHTMOST(rpageop);
|
||||||
|
|
||||||
|
_bt_insert_parent(reln, lbuf, rbuf, NULL,
|
||||||
|
action->is_root, is_only);
|
||||||
|
}
|
||||||
|
else
|
||||||
|
{
|
||||||
|
/* finish an incomplete deletion (of a half-dead page) */
|
||||||
|
Buffer buf;
|
||||||
|
|
||||||
|
buf = XLogReadBuffer(reln, action->delblk, false);
|
||||||
|
if (BufferIsValid(buf))
|
||||||
|
if (_bt_pagedel(reln, buf, NULL, true) == 0)
|
||||||
|
elog(PANIC, "btree_xlog_cleanup: _bt_pagedel failed");
|
||||||
|
}
|
||||||
}
|
}
|
||||||
incomplete_splits = NIL;
|
incomplete_actions = NIL;
|
||||||
}
|
}
|
||||||
|
|
||||||
bool
|
bool
|
||||||
btree_safe_restartpoint(void)
|
btree_safe_restartpoint(void)
|
||||||
{
|
{
|
||||||
if (incomplete_splits)
|
if (incomplete_actions)
|
||||||
return false;
|
return false;
|
||||||
return true;
|
return true;
|
||||||
}
|
}
|
||||||
|
@ -7,7 +7,7 @@
|
|||||||
* Portions Copyright (c) 1996-2006, PostgreSQL Global Development Group
|
* Portions Copyright (c) 1996-2006, PostgreSQL Global Development Group
|
||||||
* Portions Copyright (c) 1994, Regents of the University of California
|
* Portions Copyright (c) 1994, Regents of the University of California
|
||||||
*
|
*
|
||||||
* $PostgreSQL: pgsql/src/include/access/nbtree.h,v 1.105 2006/10/04 00:30:07 momjian Exp $
|
* $PostgreSQL: pgsql/src/include/access/nbtree.h,v 1.106 2006/11/01 19:43:17 tgl Exp $
|
||||||
*
|
*
|
||||||
*-------------------------------------------------------------------------
|
*-------------------------------------------------------------------------
|
||||||
*/
|
*/
|
||||||
@ -163,6 +163,7 @@ typedef struct BTMetaPageData
|
|||||||
#define P_ISLEAF(opaque) ((opaque)->btpo_flags & BTP_LEAF)
|
#define P_ISLEAF(opaque) ((opaque)->btpo_flags & BTP_LEAF)
|
||||||
#define P_ISROOT(opaque) ((opaque)->btpo_flags & BTP_ROOT)
|
#define P_ISROOT(opaque) ((opaque)->btpo_flags & BTP_ROOT)
|
||||||
#define P_ISDELETED(opaque) ((opaque)->btpo_flags & BTP_DELETED)
|
#define P_ISDELETED(opaque) ((opaque)->btpo_flags & BTP_DELETED)
|
||||||
|
#define P_ISHALFDEAD(opaque) ((opaque)->btpo_flags & BTP_HALF_DEAD)
|
||||||
#define P_IGNORE(opaque) ((opaque)->btpo_flags & (BTP_DELETED|BTP_HALF_DEAD))
|
#define P_IGNORE(opaque) ((opaque)->btpo_flags & (BTP_DELETED|BTP_HALF_DEAD))
|
||||||
#define P_HAS_GARBAGE(opaque) ((opaque)->btpo_flags & BTP_HAS_GARBAGE)
|
#define P_HAS_GARBAGE(opaque) ((opaque)->btpo_flags & BTP_HAS_GARBAGE)
|
||||||
|
|
||||||
@ -203,8 +204,10 @@ typedef struct BTMetaPageData
|
|||||||
#define XLOG_BTREE_SPLIT_R_ROOT 0x60 /* as above, new item on right */
|
#define XLOG_BTREE_SPLIT_R_ROOT 0x60 /* as above, new item on right */
|
||||||
#define XLOG_BTREE_DELETE 0x70 /* delete leaf index tuple */
|
#define XLOG_BTREE_DELETE 0x70 /* delete leaf index tuple */
|
||||||
#define XLOG_BTREE_DELETE_PAGE 0x80 /* delete an entire page */
|
#define XLOG_BTREE_DELETE_PAGE 0x80 /* delete an entire page */
|
||||||
#define XLOG_BTREE_DELETE_PAGE_META 0x90 /* same, plus update metapage */
|
#define XLOG_BTREE_DELETE_PAGE_META 0x90 /* same, and update metapage */
|
||||||
#define XLOG_BTREE_NEWROOT 0xA0 /* new root page */
|
#define XLOG_BTREE_NEWROOT 0xA0 /* new root page */
|
||||||
|
#define XLOG_BTREE_DELETE_PAGE_HALF 0xB0 /* page deletion that makes
|
||||||
|
* parent half-dead */
|
||||||
|
|
||||||
/*
|
/*
|
||||||
* All that we need to find changed index tuple
|
* All that we need to find changed index tuple
|
||||||
@ -501,7 +504,8 @@ extern void _bt_pageinit(Page page, Size size);
|
|||||||
extern bool _bt_page_recyclable(Page page);
|
extern bool _bt_page_recyclable(Page page);
|
||||||
extern void _bt_delitems(Relation rel, Buffer buf,
|
extern void _bt_delitems(Relation rel, Buffer buf,
|
||||||
OffsetNumber *itemnos, int nitems);
|
OffsetNumber *itemnos, int nitems);
|
||||||
extern int _bt_pagedel(Relation rel, Buffer buf, bool vacuum_full);
|
extern int _bt_pagedel(Relation rel, Buffer buf,
|
||||||
|
BTStack stack, bool vacuum_full);
|
||||||
|
|
||||||
/*
|
/*
|
||||||
* prototypes for functions in nbtsearch.c
|
* prototypes for functions in nbtsearch.c
|
||||||
|
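The WAL replay changes above fold pending splits and pending half-dead-page deletions into one incomplete_actions list: an entry is remembered when replay sees the first half of a multi-step operation, forgotten when the record that completes it arrives, and a restartpoint is considered safe only while the list is empty. The following is a minimal standalone sketch of that bookkeeping, not the committed code. The names Action, remember, forget, and pending are hypothetical; plain unsigned integers stand in for RelFileNode and BlockNumber, and a hand-rolled linked list replaces the PostgreSQL List API so that it compiles on its own.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for bt_incomplete_action. */
typedef struct Action
{
	unsigned	node;		/* which index (RelFileNode in the real code) */
	bool		is_split;	/* true: pending split; false: pending deletion */
	unsigned	blk;		/* rightblk for a split, delblk for a deletion */
	struct Action *next;
} Action;

static Action *pending = NULL;	/* plays the role of incomplete_actions */

/* Remember the first half of a multi-step operation. */
static void
remember(unsigned node, bool is_split, unsigned blk)
{
	Action	   *a = malloc(sizeof(Action));

	assert(a != NULL);
	a->node = node;
	a->is_split = is_split;
	a->blk = blk;
	a->next = pending;
	pending = a;
}

/*
 * Forget an entry once the completing record is replayed.  The is_split
 * test keeps a deletion record from cancelling a split on the same block
 * number, and vice versa.
 */
static void
forget(unsigned node, bool is_split, unsigned blk)
{
	Action	  **p;

	for (p = &pending; *p != NULL; p = &(*p)->next)
	{
		Action	   *a = *p;

		if (a->node == node && a->is_split == is_split && a->blk == blk)
		{
			*p = a->next;
			free(a);
			break;				/* need not look further */
		}
	}
}

/* A restartpoint is safe only when nothing is left half-finished. */
static bool
safe_restartpoint(void)
{
	return pending == NULL;
}
```

Keeping both kinds of entry in one list means the cleanup pass and the restartpoint check each need only a single loop or test, at the cost of an extra is_split comparison in every match.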