Message ID | 1277453336.22715.2154.camel@twins |
---|---|
State | RFC, archived |
Delegated to: | David Miller |
Headers | show |
On Fri, 25 Jun 2010 10:08:56 +0200 Peter Zijlstra <peterz@infradead.org> wrote: > On Thu, 2010-06-24 at 21:42 -0700, Andrew Morton wrote: > > That being said, I wonder why Herbert didn't hit this in his testing. > > I suspect that he'd enabled lockdep, which hid the bug. I haven't > > worked out _why_ lockdep hides the double-mutex_unlock bug, but it's a > > pretty bad thing to do. > > Most weird indeed, lockdep is supposed so shout its lungs out when > someone wants to unlock a lock that isn't actually owned by him (and it > not being locked at all certainly implies you're not the owner). > > In fact, the below patch results in the below splat -- its also > something that's tested by the locking self-test: When I enabled lockdep, the bug actually went away. Is it possible that when lockdep detects this bug, it prevents mutex.count from going from 1 to 2? It could be that lockdep _did_ detect (and correct!) the bug. But because I had no usable console output at the time, I didn't see it. I did notice that the taint output was "G W". So something warned about something, but I don't know what. But that was happening with lockdep disabled. > @@ -1344,6 +1346,10 @@ SYSCALL_DEFINE0(getppid) > { > int pid; > > + mutex_lock(&foo); > + mutex_unlock(&foo); > + mutex_unlock(&foo); > + > rcu_read_lock(); > pid = task_tgid_vnr(current->real_parent); > rcu_read_unlock(); It'd be interesting to add printk("%d:%d\n", __LINE__, atomic_read(&foo.count)); after the mutex_unlock()s. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Fri, 2010-06-25 at 01:42 -0700, Andrew Morton wrote: > On Fri, 25 Jun 2010 10:08:56 +0200 Peter Zijlstra <peterz@infradead.org> wrote: > > > On Thu, 2010-06-24 at 21:42 -0700, Andrew Morton wrote: > > > That being said, I wonder why Herbert didn't hit this in his testing. > > > I suspect that he'd enabled lockdep, which hid the bug. I haven't > > > worked out _why_ lockdep hides the double-mutex_unlock bug, but it's a > > > pretty bad thing to do. > > > > Most weird indeed, lockdep is supposed so shout its lungs out when > > someone wants to unlock a lock that isn't actually owned by him (and it > > not being locked at all certainly implies you're not the owner). > > > > In fact, the below patch results in the below splat -- its also > > something that's tested by the locking self-test: > > When I enabled lockdep, the bug actually went away. Is it possible > that when lockdep detects this bug, it prevents mutex.count from going > from 1 to 2? Not lockdep itself but the DEBUG_MUTEXES code (forced by lockdep). The difference between the normal and the debug code is that the debug code disables all fast-path code. The x86 fast-path code does: LOCK incl &lock->count jg done: call slowpath done: Since 1++ is >0 it will complete without calling the slow-path, would do: if (__mutex_slowpath_needs_to_unlock()) /* 1 regardless of DEBUG_MUTEX */ atomic_set(&lock->count, 1); The question I guess is, do we want double unlocks to go silently unnoticed? In that case we need to touch the fastpath asm. > It could be that lockdep _did_ detect (and correct!) the bug. But > because I had no usable console output at the time, I didn't see it. > > I did notice that the taint output was "G W". So something warned > about something, but I don't know what. But that was happening with > lockdep disabled. Hrmm,. yeah without console output lockdep isn't going to help much, should we maybe use the speaker to read out the dmesg :-) > It'd be interesting to add > > printk("%d:%d\n", __LINE__, atomic_read(&foo.count)); > > after the mutex_unlock()s. 1352:1 -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/kernel/timer.c b/kernel/timer.c index ee3f116..0496f71 100644 --- a/kernel/timer.c +++ b/kernel/timer.c @@ -1334,6 +1334,8 @@ SYSCALL_DEFINE0(getpid) return task_tgid_vnr(current); } +static DEFINE_MUTEX(foo); + /* * Accessing ->real_parent is not SMP-safe, it could * change from under us. However, we can use a stale @@ -1344,6 +1346,10 @@ SYSCALL_DEFINE0(getppid) { int pid; + mutex_lock(&foo); + mutex_unlock(&foo); + mutex_unlock(&foo); + rcu_read_lock(); pid = task_tgid_vnr(current->real_parent); rcu_read_unlock();