__cpu_simple_unlock(): add a note about memory ordering and why this is correct, contrary to Intel's documentation.
ad 2006-12-18 07:34:42 +00:00
parent 711e88f6f4
commit f48eb2511b
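The note in the diff below states the requirement for releasing a lock: no load or store from the critical section may become visible after other CPUs can see the lock as free. In portable C11 that requirement is written as a release store paired with an acquire exchange. The following is a minimal user-space sketch under that assumption. `demo_lock_t`, `demo_lock()`, `demo_unlock()` and the counter harness are hypothetical names for illustration, not NetBSD's `__cpu_simple_lock` API.

```c
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>

/* Hypothetical lock type for this sketch; not NetBSD's __cpu_simple_lock_t. */
typedef atomic_int demo_lock_t;

enum { DEMO_UNLOCKED = 0, DEMO_LOCKED = 1 };

static void
demo_lock(demo_lock_t *lockp)
{
	/* Acquire: critical-section accesses cannot be hoisted above this. */
	while (atomic_exchange_explicit(lockp, DEMO_LOCKED,
	    memory_order_acquire) != DEMO_UNLOCKED)
		sched_yield();	/* user-space demo only: let the holder run */
}

static void
demo_unlock(demo_lock_t *lockp)
{
	/*
	 * Release: all earlier loads and stores must be complete and
	 * visible before another CPU can observe the lock as free.
	 */
	atomic_store_explicit(lockp, DEMO_UNLOCKED, memory_order_release);
}

/* Hypothetical harness: each thread bumps a plain counter under the lock. */
#define DEMO_NTHREADS	4
#define DEMO_NITERS	20000

static demo_lock_t demo_lk = DEMO_UNLOCKED;
static long demo_counter;	/* deliberately non-atomic; protected by demo_lk */

static void *
demo_worker(void *arg)
{
	(void)arg;
	for (int i = 0; i < DEMO_NITERS; i++) {
		demo_lock(&demo_lk);
		demo_counter++;
		demo_unlock(&demo_lk);
	}
	return NULL;
}

static long
demo_run(void)
{
	pthread_t tids[DEMO_NTHREADS];

	demo_counter = 0;
	for (int i = 0; i < DEMO_NTHREADS; i++)
		pthread_create(&tids[i], NULL, demo_worker, NULL);
	for (int i = 0; i < DEMO_NTHREADS; i++)
		pthread_join(tids[i], NULL);
	return demo_counter;
}
```

On x86, GCC and Clang typically emit the release store as a plain `MOV` with no fence, which is the cheap unlock the note argues is safe; on weaker architectures the same source yields whatever barrier the target needs.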


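The note's conclusion is that, on x86 with write-back memory, releasing a simple lock needs only a compiler barrier followed by an ordinary store, because the CPU neither lets that store pass earlier loads nor reorders it with earlier stores. A minimal sketch of that idea, assuming GCC or Clang (for the `asm` memory clobber and the `__atomic` builtins). `xs_lock()` and `xs_unlock()` are hypothetical names; this is not the body of NetBSD's `__cpu_simple_unlock()`, and it is kernel-style code that relies on compiler and hardware behaviour rather than on the C11 memory model. On other architectures it falls back to a release store.

```c
#include <pthread.h>
#include <sched.h>

/* Hypothetical lock word: 0 = unlocked, 1 = locked. */
typedef volatile int xs_lock_t;

static void
xs_lock(xs_lock_t *lockp)
{
	/* On x86 this becomes XCHG, which is implicitly LOCKed: a full barrier. */
	while (__atomic_exchange_n(lockp, 1, __ATOMIC_ACQUIRE) != 0)
		sched_yield();	/* user-space demo only */
}

static void
xs_unlock(xs_lock_t *lockp)
{
#if defined(__i386__) || defined(__x86_64__)
	/*
	 * Compiler barrier only: stops the compiler from moving
	 * critical-section accesses below the store.  The CPU itself
	 * keeps this store behind all earlier loads and stores.
	 */
	__asm__ __volatile__("" ::: "memory");
	*lockp = 0;
#else
	/* Weaker memory models need a real release barrier. */
	__atomic_store_n(lockp, 0, __ATOMIC_RELEASE);
#endif
}

/* Hypothetical harness: each thread bumps a plain counter under the lock. */
#define XS_NTHREADS	4
#define XS_NITERS	20000

static xs_lock_t xs_lk;
static long xs_counter;	/* protected by xs_lk */

static void *
xs_worker(void *arg)
{
	(void)arg;
	for (int i = 0; i < XS_NITERS; i++) {
		xs_lock(&xs_lk);
		xs_counter++;
		xs_unlock(&xs_lk);
	}
	return NULL;
}

static long
xs_run(void)
{
	pthread_t tids[XS_NTHREADS];

	xs_counter = 0;
	for (int i = 0; i < XS_NTHREADS; i++)
		pthread_create(&tids[i], NULL, xs_worker, NULL);
	for (int i = 0; i < XS_NTHREADS; i++)
		pthread_join(tids[i], NULL);
	return xs_counter;
}
```

If the Software Developer's Manual wording quoted in the note were taken literally, the x86 branch would instead need a locked instruction or a load fence before the store.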
@@ -1,11 +1,11 @@
/* $NetBSD: lock.h,v 1.11 2005/12/28 19:09:30 perry Exp $ */
/* $NetBSD: lock.h,v 1.12 2006/12/18 07:34:42 ad Exp $ */
/*-
* Copyright (c) 2000 The NetBSD Foundation, Inc.
* Copyright (c) 2000, 2006 The NetBSD Foundation, Inc.
* All rights reserved.
*
* This code is derived from software contributed to The NetBSD Foundation
* by Jason R. Thorpe.
* by Jason R. Thorpe and Andrew Doran.
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions
@@ -101,6 +101,58 @@ __cpu_simple_lock_try(__cpu_simple_lock_t *lockp)
return (r);
}
/*
* Note on x86 memory ordering
*
* When releasing a lock we must ensure that no stores or loads from within
* the critical section are re-ordered by the CPU to occur outside of it:
* they must have completed and be visible to other processors once the lock
* has been released.
*
* NetBSD usually runs with the kernel mapped (via MTRR) in a WB (write
* back) memory region. In that case, memory ordering on x86 platforms
* looks like this:
*
* i386          All loads/stores occur in instruction sequence.
*
* i486          All loads/stores occur in instruction sequence. In
* Pentium       exceptional circumstances, loads can be re-ordered around
*               stores, but for the purposes of releasing a lock it does
*               not matter. Stores may not be immediately visible to other
*               processors as they can be buffered. However, since the
*               stores are buffered in order, the lock release will always be
*               the last operation in the critical section that becomes
*               visible to other CPUs.
*
* Pentium Pro   The "Intel 64 and IA-32 Architectures Software Developer's
* onwards       Manual" volume 3A (order number 253668-022US) says that (1)
*               "Reads can be carried out speculatively and in any order"
*               and (2) "Reads can pass buffered stores, but the processor
*               is self-consistent.". This would be a problem for the code
*               below, and would mandate a locked instruction cycle or load
*               fence before releasing the simple lock.
*
*               The "Intel Pentium 4 Processor Optimization" guide (order
*               number 248966) says: "Loads can be moved before stores
*               that occurred earlier in the program if they are not
*               predicted to load from the same linear address.". This is
*               not a problem since the only loads that can be re-ordered
*               take place once the lock has been released via a store.
*
*               The above two documents seem to contradict each other;
*               however, with the exception of early steppings of the
*               Pentium Pro, the second document is closer to the truth: a
*               store will always act as a load fence for all loads that
*               precede the store in instruction order.
*
*               Again, note that stores can be buffered and will not always
*               become immediately visible to other CPUs; however, they are
*               buffered in order.
*
* AMD64         Stores occur in order and are buffered. Loads can be
*               reordered; however, stores act as load fences: a store
*               cannot pass loads that precede it in instruction order.
*/
static __inline void
__cpu_simple_unlock(__cpu_simple_lock_t *lockp)
{