diff mbox

[v2] Documentation: rw_lock lessons learned

Message ID 4AFBEC44.9030409@gmail.com
State Not Applicable, archived
Delegated to: David Miller
Headers show

Commit Message

William Allen Simpson Nov. 12, 2009, 11:06 a.m. UTC
In recent weeks, two different network projects erroneously
strayed down the rw_lock path.  Update the Documentation
based upon comments by Eric Dumazet and Paul E. McKenney in
those threads.

Merged with editorial changes by Stephen Hemminger.

Signed-off-by: William.Allen.Simpson@gmail.com
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---
  Documentation/spinlocks.txt |  186 ++++++++++++++++++++-----------------------
  1 files changed, 86 insertions(+), 100 deletions(-)

Comments

Linus Torvalds Nov. 12, 2009, 3:40 p.m. UTC | #1
On Thu, 12 Nov 2009, William Allen Simpson wrote:
>
> In recent weeks, two different network projects erroneously
> strayed down the rw_lock path.  Update the Documentation
> based upon comments by Eric Dumazet and Paul E. McKenney in
> those threads.

This still retains some pretty old and bogus language. For example, that 
file still talks about spinlocks being "faster than a global interrupt 
lock", which is kind of amusing - because we've not done that global 
interrupt lock thing for the last ten years or so. 

So I certainly agree with discouraging rwlocks - I don't think we've ever 
really found a situation where they are useful except for some really 
special cases - but I suspect the thing needs a bigger overhaul.

I also suspect somebody should actually take a look at our _users_ of 
rwlocks. We have a few fairly central ones like 'tasklist_lock', and it 
may be an example of a _good_ case of rwlocks, but for a very non-obvious 
reason.

In the case of 'tasklist_lock', the magic subtle reason that makes it a 
good idea is that that lock is commonly taken for reading in _interrupts_. 
And the way rwlocks are defined, that means that you can take it for 
reading without doing the *_irq() or *_irqsave() versions, because rwlocks 
are not fair, so an active reader will never block a new reader even if 
there is a blocked writer pending.

So for tasklist_lock, raw rwlocks are still slower than raw spinlocks, but 
because of the rwlock semantics the common case doesn't need to disable 
and enable interrupts, so for the common case the comparison is not "raw 
rwlock vs raw spinlock", but "raw rwlock vs interrupt-disabling spinlock", 
and then it turns out rwlocks tend to win again.

(Of course, lock_write() needs to disable interrupts for tasklist_lock, 
but that tends to be the uncommon case).

Ho humm..

		Linus
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
stephen hemminger Nov. 12, 2009, 5:04 p.m. UTC | #2
On Thu, 12 Nov 2009 06:06:44 -0500
William Allen Simpson <william.allen.simpson@gmail.com> wrote:

> +   NOTE! We are working hard to remove reader-writer spinlocks from the
> +   network stack, so please don't add a new one.  (Instead, see
> +   Documentation/RCU/rcu.txt for complete information.)

It is not just networking, so don't single that out.

Also, should mention -RT kernel and locking here (Appendix?)
Stephen Clark Nov. 12, 2009, 7:13 p.m. UTC | #3
William Allen Simpson wrote:
> In recent weeks, two different network projects erroneously
> strayed down the rw_lock path.  Update the Documentation
> based upon comments by Eric Dumazet and Paul E. McKenney in
> those threads.
> 
> Merged with editorial changes by Stephen Hemminger.
> 
> Signed-off-by: William.Allen.Simpson@gmail.com
> Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> ---
>  Documentation/spinlocks.txt |  186 
> ++++++++++++++++++++-----------------------
>  1 files changed, 86 insertions(+), 100 deletions(-)
> 

How up to date is this doc?

http://www.kernel.org/pub/linux/kernel/people/rusty/kernel-locking/index.html

Should it be in the Documentation directory?
stephen hemminger Nov. 12, 2009, 11 p.m. UTC | #4
On Thu, 12 Nov 2009 14:13:03 -0500
Stephen Clark <sclark46@earthlink.net> wrote:

> William Allen Simpson wrote:
> > In recent weeks, two different network projects erroneously
> > strayed down the rw_lock path.  Update the Documentation
> > based upon comments by Eric Dumazet and Paul E. McKenney in
> > those threads.
> > 
> > Merged with editorial changes by Stephen Hemminger.
> > 
> > Signed-off-by: William.Allen.Simpson@gmail.com
> > Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> > ---
> >  Documentation/spinlocks.txt |  186 
> > ++++++++++++++++++++-----------------------
> >  1 files changed, 86 insertions(+), 100 deletions(-)
> > 
> 
> How up to date is this doc?
> 
> http://www.kernel.org/pub/linux/kernel/people/rusty/kernel-locking/index.html
> 

Out of date.
1. Missing mutex's which have largely replaced semaphores.

2. Missing change to lock initialization in later kernels.

3. Missing description of lock dependency checker which should be in same guide.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Stefan Richter Nov. 13, 2009, 8:59 a.m. UTC | #5
Stephen Hemminger wrote:
> On Thu, 12 Nov 2009 14:13:03 -0500
> Stephen Clark <sclark46@earthlink.net> wrote:
>> How up to date is this doc?
>> 
>> http://www.kernel.org/pub/linux/kernel/people/rusty/kernel-locking/index.html
>> 
> 
> Out of date.
> 1. Missing mutex's which have largely replaced semaphores.
> 
> 2. Missing change to lock initialization in later kernels.
> 
> 3. Missing description of lock dependency checker which should be in same guide.

4. The section on atomic reference counting should refer to <linux/kref.h>.
William Allen Simpson Nov. 13, 2009, 4:15 p.m. UTC | #6
Stefan Richter wrote:
> Stephen Hemminger wrote:
>> On Thu, 12 Nov 2009 14:13:03 -0500
>> Stephen Clark <sclark46@earthlink.net> wrote:
>>> How up to date is this doc?
>>>
>>> http://www.kernel.org/pub/linux/kernel/people/rusty/kernel-locking/index.html
>>>
>> Out of date.
>> 1. Missing mutex's which have largely replaced semaphores.
>>
>> 2. Missing change to lock initialization in later kernels.
>>
>> 3. Missing description of lock dependency checker which should be in same guide.
> 
> 4. The section on atomic reference counting should refer to <linux/kref.h>.

I'd also read that, and that's where I got some of my wrong thinking.  But
that does point to Documentation/spin_locks.txt, which I took to be
authoritative (and followed).  That's the reason spin_locks.txt should be
updated, as others are having the same problems....

Anybody have answers/updates to Linus's concerns about "pretty old and
bogus language"?

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
William Allen Simpson Dec. 11, 2009, 5:01 p.m. UTC | #7
William Allen Simpson wrote:
> In recent weeks, two different network projects erroneously
> strayed down the rw_lock path.  Update the Documentation
> based upon comments by Eric Dumazet and Paul E. McKenney in
> those threads.
> 
> Merged with editorial changes by Stephen Hemminger.
> 
> Signed-off-by: William.Allen.Simpson@gmail.com
> Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> 
A month ago, I'd taken the final line "Ho humm.." of Linus'
response to mean he wasn't interested.  But at the local
discussion yesterday, I'm told that's just a typical Linusism.

The thread diverged into discussion of another document entirely.

I'm not the person to update this document with any of the other
information about global locks and tasklists and such.  But surely
somebody else could handle that in another patch.

Anybody have answers/updates to Linus's concerns about "pretty old
and bogus language"?  Would folks be interested in the update?
Does anybody know which list(s) would be better for discussion?
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jarek Poplawski Dec. 11, 2009, 9:07 p.m. UTC | #8
William Allen Simpson wrote, On 12/11/2009 06:01 PM:

> William Allen Simpson wrote:
>> In recent weeks, two different network projects erroneously
>> strayed down the rw_lock path.  Update the Documentation
>> based upon comments by Eric Dumazet and Paul E. McKenney in
>> those threads.
>>
>> Merged with editorial changes by Stephen Hemminger.
>>
>> Signed-off-by: William.Allen.Simpson@gmail.com
>> Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
>>
> A month ago, I'd taken the final line "Ho humm.." of Linus'
> response to mean he wasn't interested.  But at the local
> discussion yesterday, I'm told that's just a typical Linusism.

Why would he write 6 paragraphs if he wasn't interested?

> 
> The thread diverged into discussion of another document entirely.
> 
> I'm not the person to update this document with any of the other
> information about global locks and tasklists and such.  But surely
> somebody else could handle that in another patch.
> 
> Anybody have answers/updates to Linus's concerns about "pretty old
> and bogus language"?  Would folks be interested in the update?
> Does anybody know which list(s) would be better for discussion?

I guess, you could literally start with removing this "global
interrupt lock", adding "the example of a _good_ case of rwlocks",
plus Stephen's "it is not just networking" fix in v3.

Jarek P.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
William Allen Simpson Dec. 12, 2009, 10:36 a.m. UTC | #9
Jarek Poplawski wrote:
> William Allen Simpson wrote, On 12/11/2009 06:01 PM:
>> A month ago, I'd taken the final line "Ho humm.." of Linus'
>> response to mean he wasn't interested.  But at the local
>> discussion yesterday, I'm told that's just a typical Linusism.
> 
> Why would he write 6 paragraphs if he wasn't interested?
> 
Good point.  Since I've only met him a couple of times, roughly a
decade or so ago, it wasn't obvious to me that it wasn't just a rant.


>> The thread diverged into discussion of another document entirely.
>>
>> I'm not the person to update this document with any of the other
>> information about global locks and tasklists and such.  But surely
>> somebody else could handle that in another patch.
>>
>> Anybody have answers/updates to Linus's concerns about "pretty old
>> and bogus language"?  Would folks be interested in the update?
>> Does anybody know which list(s) would be better for discussion?
> 
> I guess, you could literally start with removing this "global
> interrupt lock", adding "the example of a _good_ case of rwlocks",
> plus Stephen's "it is not just networking" fix in v3.
> 
As I mentioned, I'm not the person to do either of the former -- I'm
simply not conversant with the details.  If anybody has more specific
information, I'd be happy to edit it together with mine.  Or it could
be another patch entirely.

I'll do the latter later today.  Thanks for your interest.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/Documentation/spinlocks.txt b/Documentation/spinlocks.txt
index 619699d..54762c2 100644
--- a/Documentation/spinlocks.txt
+++ b/Documentation/spinlocks.txt
@@ -1,73 +1,8 @@ 
-SPIN_LOCK_UNLOCKED and RW_LOCK_UNLOCKED defeat lockdep state tracking and
-are hence deprecated.
+Lesson 1: Spin locks
 
-Please use DEFINE_SPINLOCK()/DEFINE_RWLOCK() or
-__SPIN_LOCK_UNLOCKED()/__RW_LOCK_UNLOCKED() as appropriate for static
-initialization.
-
-Most of the time, you can simply turn:
-
-	static spinlock_t xxx_lock = SPIN_LOCK_UNLOCKED;
-
-into:
-
-	static DEFINE_SPINLOCK(xxx_lock);
-
-Static structure member variables go from:
-
-	struct foo bar {
-		.lock	=	SPIN_LOCK_UNLOCKED;
-	};
-
-to:
-
-	struct foo bar {
-		.lock	=	__SPIN_LOCK_UNLOCKED(bar.lock);
-	};
-
-Declaration of static rw_locks undergo a similar transformation.
-
-Dynamic initialization, when necessary, may be performed as
-demonstrated below.
-
-   spinlock_t xxx_lock;
-   rwlock_t xxx_rw_lock;
-
-   static int __init xxx_init(void)
-   {
-   	spin_lock_init(&xxx_lock);
-	rwlock_init(&xxx_rw_lock);
-	...
-   }
-
-   module_init(xxx_init);
-
-The following discussion is still valid, however, with the dynamic
-initialization of spinlocks or with DEFINE_SPINLOCK, etc., used
-instead of SPIN_LOCK_UNLOCKED.
-
------------------------
-
-On Fri, 2 Jan 1998, Doug Ledford wrote:
-> 
-> I'm working on making the aic7xxx driver more SMP friendly (as well as
-> importing the latest FreeBSD sequencer code to have 7895 support) and wanted
-> to get some info from you.  The goal here is to make the various routines
-> SMP safe as well as UP safe during interrupts and other manipulating
-> routines.  So far, I've added a spin_lock variable to things like my queue
-> structs.  Now, from what I recall, there are some spin lock functions I can
-> use to lock these spin locks from other use as opposed to a (nasty)
-> save_flags(); cli(); stuff; restore_flags(); construct.  Where do I find
-> these routines and go about making use of them?  Do they only lock on a
-> per-processor basis or can they also lock say an interrupt routine from
-> mucking with a queue if the queue routine was manipulating it when the
-> interrupt occurred, or should I still use a cli(); based construct on that
-> one?
-
-See <asm/spinlock.h>. The basic version is:
-
-   spinlock_t xxx_lock = SPIN_LOCK_UNLOCKED;
+The most basic primitive for locking is spinlock.
 
+static DEFINE_SPINLOCK(xxx_lock);
 
 	unsigned long flags;
 
@@ -75,13 +10,11 @@  See <asm/spinlock.h>. The basic version is:
 	... critical section here ..
 	spin_unlock_irqrestore(&xxx_lock, flags);
 
-and the above is always safe. It will disable interrupts _locally_, but the
+The above is always safe. It will disable interrupts _locally_, but the
 spinlock itself will guarantee the global lock, so it will guarantee that
 there is only one thread-of-control within the region(s) protected by that
-lock. 
-
-Note that it works well even under UP - the above sequence under UP
-essentially is just the same as doing a
+lock. This works well even under UP. The above sequence under UP
+essentially is just the same as doing
 
 	unsigned long flags;
 
@@ -91,15 +24,13 @@  essentially is just the same as doing a
 
 so the code does _not_ need to worry about UP vs SMP issues: the spinlocks
 work correctly under both (and spinlocks are actually more efficient on
-architectures that allow doing the "save_flags + cli" in one go because I
-don't export that interface normally).
+architectures that allow doing the "save_flags + cli" in one operation).
+
+   NOTE! Implications of spin_locks for memory are further described in:
 
-NOTE NOTE NOTE! The reason the spinlock is so much faster than a global
-interrupt lock under SMP is exactly because it disables interrupts only on
-the local CPU. The spin-lock is safe only when you _also_ use the lock
-itself to do locking across CPU's, which implies that EVERYTHING that
-touches a shared variable has to agree about the spinlock they want to
-use.
+     Documentation/memory-barriers.txt
+       (5) LOCK operations.
+       (6) UNLOCK operations.
 
 The above is usually pretty simple (you usually need and want only one
 spinlock for most things - using more than one spinlock can make things a
@@ -120,20 +51,26 @@  and another sequence that does
 then they are NOT mutually exclusive, and the critical regions can happen
 at the same time on two different CPU's. That's fine per se, but the
 critical regions had better be critical for different things (ie they
-can't stomp on each other). 
+can't stomp on each other).
 
 The above is a problem mainly if you end up mixing code - for example the
 routines in ll_rw_block() tend to use cli/sti to protect the atomicity of
 their actions, and if a driver uses spinlocks instead then you should
-think about issues like the above..
+think about issues like the above.
 
 This is really the only really hard part about spinlocks: once you start
 using spinlocks they tend to expand to areas you might not have noticed
 before, because you have to make sure the spinlocks correctly protect the
 shared data structures _everywhere_ they are used. The spinlocks are most
-easily added to places that are completely independent of other code (ie
-internal driver data structures that nobody else ever touches, for
-example). 
+easily added to places that are completely independent of other code (for
+example, internal driver data structures that nobody else ever touches).
+
+   NOTE! The reason the spinlock is so much faster than a global
+   interrupt lock under SMP is exactly because it disables interrupts
+   only on the local CPU.  The spin-lock is safe only when you _also_
+   use the lock itself to do locking across CPU's, which implies that
+   EVERYTHING that touches a shared variable has to agree about the
+   spinlock they want to use.
 
 ----
 
@@ -141,13 +78,17 @@  Lesson 2: reader-writer spinlocks.
 
 If your data accesses have a very natural pattern where you usually tend
 to mostly read from the shared variables, the reader-writer locks
-(rw_lock) versions of the spinlocks are often nicer. They allow multiple
+(rw_lock) versions of the spinlocks are sometimes useful. They allow multiple
 readers to be in the same critical region at once, but if somebody wants
-to change the variables it has to get an exclusive write lock. The
-routines look the same as above:
+to change the variables it has to get an exclusive write lock.
 
-   rwlock_t xxx_lock = RW_LOCK_UNLOCKED;
+   NOTE! reader-writer locks require more atomic memory operations than
+   simple spinlocks.  Unless the reader critical section is long, you
+   are better off just using spinlocks.
+
+The routines look the same as above:
 
+   rwlock_t xxx_lock = RW_LOCK_UNLOCKED;
 
 	unsigned long flags;
 
@@ -159,18 +100,21 @@  routines look the same as above:
 	.. read and write exclusive access to the info ...
 	write_unlock_irqrestore(&xxx_lock, flags);
 
-The above kind of lock is useful for complex data structures like linked
-lists etc, especially when you know that most of the work is to just
-traverse the list searching for entries without changing the list itself,
-for example. Then you can use the read lock for that kind of list
-traversal, which allows many concurrent readers. Anything that _changes_
-the list will have to get the write lock. 
+The above kind of lock may be useful for complex data structures like
+linked lists, especially searching for entries without changing the list
+itself.  The read lock allows many concurrent readers.  Anything that
+_changes_ the list will have to get the write lock.
+
+   NOTE! RCU is better for list traversal, but requires careful
+   attention to design detail (see Documentation/RCU/listRCU.txt).
 
-Note: you cannot "upgrade" a read-lock to a write-lock, so if you at _any_
+Also, you cannot "upgrade" a read-lock to a write-lock, so if you at _any_
 time need to do any changes (even if you don't do it every time), you have
-to get the write-lock at the very beginning. I could fairly easily add a
-primitive to create a "upgradeable" read-lock, but it hasn't been an issue
-yet. Tell me if you'd want one. 
+to get the write-lock at the very beginning.
+
+   NOTE! We are working hard to remove reader-writer spinlocks from the
+   network stack, so please don't add a new one.  (Instead, see
+   Documentation/RCU/rcu.txt for complete information.)
 
 ----
 
@@ -233,4 +177,46 @@  indeed), while write-locks need to protect themselves against interrupts.
 
 		Linus
 
+----
+
+Reference information:
+
+For dynamic initialization, use spin_lock_init() or rwlock_init() as
+appropriate:
+
+   spinlock_t xxx_lock;
+   rwlock_t xxx_rw_lock;
+
+   static int __init xxx_init(void)
+   {
+   	spin_lock_init(&xxx_lock);
+	rwlock_init(&xxx_rw_lock);
+	...
+   }
+
+   module_init(xxx_init);
 
+For static initialization, use DEFINE_SPINLOCK() / DEFINE_RWLOCK() or
+__SPIN_LOCK_UNLOCKED() / __RW_LOCK_UNLOCKED() as appropriate.
+
+SPIN_LOCK_UNLOCKED and RW_LOCK_UNLOCKED are deprecated.  These interfere
+with lockdep state tracking.
+
+Most of the time, you can simply turn:
+	static spinlock_t xxx_lock = SPIN_LOCK_UNLOCKED;
+into:
+	static DEFINE_SPINLOCK(xxx_lock);
+
+Static structure member variables go from:
+
+	struct foo bar {
+		.lock	=	SPIN_LOCK_UNLOCKED;
+	};
+
+to:
+
+	struct foo bar {
+		.lock	=	__SPIN_LOCK_UNLOCKED(bar.lock);
+	};
+
+Declaration of static rw_locks undergo a similar transformation.