__cpu_simple_unlock(): add a note about memory ordering and why this is
correct, contrary to Intel's documentation.
2006-12-18 07:34:42 +00:00
commit f48eb2511b
parent 711e88f6f4
1 changed file with 55 additions and 3 deletions
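The ordering argument in the comment added by this commit can be illustrated outside the kernel. What follows is a minimal user-space sketch in C11, not the NetBSD implementation: demo_lock_t, demo_lock_acquire(), demo_lock_release() and demo_run() are hypothetical names, and POSIX threads are assumed to be available. The release is a single store with release ordering, which on x86 compilers typically emit as a plain store with no locked cycle or fence, matching the reasoning in the note.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

#define DEMO_ITERATIONS	50000

/* Hypothetical type for illustration: 0 = unlocked, 1 = locked. */
typedef atomic_int demo_lock_t;

static demo_lock_t demo_lock;	/* static storage: starts unlocked (0) */
static long demo_counter;	/* protected by demo_lock */

static void
demo_lock_acquire(demo_lock_t *lockp)
{
	/* Atomic exchange with acquire ordering; spin until we saw 0. */
	while (atomic_exchange_explicit(lockp, 1, memory_order_acquire) != 0)
		continue;
}

static void
demo_lock_release(demo_lock_t *lockp)
{
	/*
	 * Release store: loads and stores from the critical section may
	 * not be reordered after it, so they are visible to the next
	 * holder of the lock before the lock appears free.
	 */
	atomic_store_explicit(lockp, 0, memory_order_release);
}

static void *
demo_worker(void *arg)
{
	(void)arg;
	for (int i = 0; i < DEMO_ITERATIONS; i++) {
		demo_lock_acquire(&demo_lock);
		demo_counter++;		/* the critical section */
		demo_lock_release(&demo_lock);
	}
	return NULL;
}

/* Run two contending workers and return the final counter value. */
long
demo_run(void)
{
	pthread_t threads[2];
	int error;

	demo_counter = 0;
	for (int i = 0; i < 2; i++) {
		error = pthread_create(&threads[i], NULL, demo_worker, NULL);
		assert(error == 0);
	}
	for (int i = 0; i < 2; i++) {
		error = pthread_join(threads[i], NULL);
		assert(error == 0);
	}
	return demo_counter;
}
```

If the release ordering holds, demo_run() returns 2 * DEMO_ITERATIONS, because no increment can leak outside the critical section. A passing run does not prove the ordering rules, it only fails to contradict them.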
--- a/sys/arch/x86/include/lock.h
+++ b/sys/arch/x86/include/lock.h
@@ -1,11 +1,11 @@
-/*	$NetBSD: lock.h,v 1.11 2005/12/28 19:09:30 perry Exp $	*/
+/*	$NetBSD: lock.h,v 1.12 2006/12/18 07:34:42 ad Exp $	*/

 /*-
- * Copyright (c) 2000 The NetBSD Foundation, Inc.
+ * Copyright (c) 2000, 2006 The NetBSD Foundation, Inc.
  * All rights reserved.
  *
  * This code is derived from software contributed to The NetBSD Foundation
- * by Jason R. Thorpe.
+ * by Jason R. Thorpe and Andrew Doran.
  *
  * Redistribution and use in source and binary forms, with or without
  * modification, are permitted provided that the following conditions
@@ -101,6 +101,58 @@ __cpu_simple_lock_try(__cpu_simple_lock_t *lockp)
 	return (r);
 }

+/*
+ * Note on x86 memory ordering
+ *
+ * When releasing a lock we must ensure that no stores or loads from within
+ * the critical section are re-ordered by the CPU to occur outside of it:
+ * they must have completed and be visible to other processors once the lock
+ * has been released.
+ *
+ * NetBSD usually runs with the kernel mapped (via MTRR) in a WB (write
+ * back) memory region.  In that case, memory ordering on x86 platforms
+ * looks like this:
+ *
+ * i386		All loads/stores occur in instruction sequence.
+ *
+ * i486		All loads/stores occur in instruction sequence.  In
+ * Pentium	exceptional circumstances, loads can be re-ordered around
+ *		stores, but for the purposes of releasing a lock it does
+ *		not matter.  Stores may not be immediately visible to other
+ *		processors as they can be buffered.  However, since the
+ *		stores are buffered in order the lock release will always be
+ *		the last operation in the critical section that becomes
+ *		visible to other CPUs.
+ *
+ * Pentium Pro	The "Intel 64 and IA-32 Architectures Software Developer's
+ * onwards	Manual" volume 3A (order number 248966) says that (1) "Reads
+ *		can be carried out speculatively and in any order" and (2)
+ *		"Reads can pass buffered stores, but the processor is
+ *		self-consistent.".  This would be a problem for the below,
+ *		and would mandate a locked instruction cycle or load fence
+ *		before releasing the simple lock.
+ *
+ *		The "Intel Pentium 4 Processor Optimization" guide (order
+ *		number 253668-022US) says: "Loads can be moved before stores
+ *		that occurred earlier in the program if they are not
+ *		predicted to load from the same linear address.".  This is
+ *		not a problem since the only loads that can be re-ordered
+ *		take place once the lock has been released via a store.
+ *
+ *		The above two documents seem to contradict each other,
+ *		however with the exception of early steppings of the Pentium
+ *		Pro, the second document is closer to the truth: a store
+ *		will always act as a load fence for all loads that precede
+ *		the store in instruction order.
+ *
+ *		Again, note that stores can be buffered and will not always
+ *		become immediately visible to other CPUs: they are however
+ *		buffered in order.
+ *
+ * AMD64	Stores occur in order and are buffered.  Loads can be
+ *		reordered, however stores act as load fences, meaning that
+ *		loads can not be reordered around stores.
+ */
 static __inline void
 __cpu_simple_unlock(__cpu_simple_lock_t *lockp)
 {