Message ID | 20140319170829.GD8557@laptop.programming.kicks-ass.net (mailing list archive) |
---|---|
State | Not Applicable |
On Wed, 2014-03-19 at 18:08 +0100, Peter Zijlstra wrote:
> On Wed, Mar 19, 2014 at 04:47:05PM +0100, Peter Zijlstra wrote:
> > > I reverted b0c29f79ecea0b6fbcefc999e70f2843ae8306db on top of v3.14-rc6 and confirmed that
> > > reverting the commit solved the problem.
> >
> > Joy,.. let me look at that with ppc in mind.

errr... just sat down to check email this morning. CC'ing Paul for any
subtle barrier issues.
> > Joy,.. let me look at that with ppc in mind.
>
> OK; so while pretty much all the comments from that patch are utter
> nonsense (what was I thinking), I cannot actually find a real bug.
>
> But could you try the below which replaces a control dependency with a
> full barrier. The control flow is plenty convoluted that I think the
> control barrier isn't actually valid anymore and that might indeed
> explain the fail.

Unfortunately the patch didn't help. Still seeing tasks stuck

# ps -Ao pid,tt,user,fname,tmout,f,wchan | grep futex
14680 pts/0    root     java         - 0 futex_wait_queue_me
14797 pts/0    root     java         - 0 futex_wait_queue_me
# :> /var/log/messages
# echo t > /proc/sysrq-trigger
# grep futex_wait_queue_me /var/log/messages | wc -l
334
#

[ 6904.211478] Call Trace:
[ 6904.211481] [c000000fa1f1b4d0] [0000000000000020] 0x20 (unreliable)
[ 6904.211486] [c000000fa1f1b6a0] [c000000000015208] .__switch_to+0x1e8/0x330
[ 6904.211491] [c000000fa1f1b750] [c000000000702f00] .__schedule+0x360/0x8b0
[ 6904.211495] [c000000fa1f1b9d0] [c000000000147348] .futex_wait_queue_me+0xf8/0x1a0
[ 6904.211500] [c000000fa1f1ba60] [c0000000001486dc] .futex_wait+0x17c/0x2a0
[ 6904.211505] [c000000fa1f1bc10] [c00000000014a614] .do_futex+0x254/0xd80
[ 6904.211510] [c000000fa1f1bd60] [c00000000014b25c] .SyS_futex+0x11c/0x1d0
[ 6904.238874] [c000000fa1f1be30] [c00000000000a0fc] syscall_exit+0x0/0x7c
[ 6904.238879] java  S 00003fff825f6044     0 14682  14076 0x00000080

Is there any other information that I provide that can help?
On Thu, 2014-03-20 at 11:03 +0530, Srikar Dronamraju wrote:
> > > Joy,.. let me look at that with ppc in mind.
> >
> > OK; so while pretty much all the comments from that patch are utter
> > nonsense (what was I thinking), I cannot actually find a real bug.
> >
> > But could you try the below which replaces a control dependency with a
> > full barrier. The control flow is plenty convoluted that I think the
> > control barrier isn't actually valid anymore and that might indeed
> > explain the fail.
>
> Unfortunately the patch didn't help. Still seeing tasks stuck
>
> # ps -Ao pid,tt,user,fname,tmout,f,wchan | grep futex
> 14680 pts/0    root     java         - 0 futex_wait_queue_me
> 14797 pts/0    root     java         - 0 futex_wait_queue_me
> # :> /var/log/messages
> # echo t > /proc/sysrq-trigger
> # grep futex_wait_queue_me /var/log/messages | wc -l
> 334
> #
>
> [ 6904.211478] Call Trace:
> [ 6904.211481] [c000000fa1f1b4d0] [0000000000000020] 0x20 (unreliable)
> [ 6904.211486] [c000000fa1f1b6a0] [c000000000015208] .__switch_to+0x1e8/0x330
> [ 6904.211491] [c000000fa1f1b750] [c000000000702f00] .__schedule+0x360/0x8b0
> [ 6904.211495] [c000000fa1f1b9d0] [c000000000147348] .futex_wait_queue_me+0xf8/0x1a0
> [ 6904.211500] [c000000fa1f1ba60] [c0000000001486dc] .futex_wait+0x17c/0x2a0
> [ 6904.211505] [c000000fa1f1bc10] [c00000000014a614] .do_futex+0x254/0xd80
> [ 6904.211510] [c000000fa1f1bd60] [c00000000014b25c] .SyS_futex+0x11c/0x1d0
> [ 6904.238874] [c000000fa1f1be30] [c00000000000a0fc] syscall_exit+0x0/0x7c
> [ 6904.238879] java  S 00003fff825f6044     0 14682  14076 0x00000080
>
> Is there any other information that I provide that can help?

This problem suggests that we missed a wakeup for a task that was adding
itself to the queue in a wait path. And the only place that can happen
is with the hb spinlock check for any pending waiters. Just in case we
missed some assumption about checking the hash bucket spinlock as a way
of detecting any waiters (powerpc?), could you revert this commit and
try the original atomic operations variant:

https://lkml.org/lkml/2013/12/19/630
On Thu, Mar 20, 2014 at 11:03:50AM +0530, Srikar Dronamraju wrote:
> > > Joy,.. let me look at that with ppc in mind.
> >
> > OK; so while pretty much all the comments from that patch are utter
> > nonsense (what was I thinking), I cannot actually find a real bug.
> >
> > But could you try the below which replaces a control dependency with a
> > full barrier. The control flow is plenty convoluted that I think the
> > control barrier isn't actually valid anymore and that might indeed
> > explain the fail.
>
> Unfortunately the patch didn't help. Still seeing tasks stuck

Aww bugger. I'll be traveling tomorrow and today is wasted getting
ready. So unless Davidlohr has anything we'll need to scrap this change.
> This problem suggests that we missed a wakeup for a task that was adding
> itself to the queue in a wait path. And the only place that can happen
> is with the hb spinlock check for any pending waiters. Just in case we
> missed some assumption about checking the hash bucket spinlock as a way
> of detecting any waiters (powerpc?), could you revert this commit and
> try the original atomic operations variant:
>
> https://lkml.org/lkml/2013/12/19/630

I think the above url and the commit id that I reverted, i.e.
git://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=b0c29f79ecea0b6fbcefc999
are the same. Or am I missing something?
On Wed, 2014-03-19 at 22:56 -0700, Davidlohr Bueso wrote:
> On Thu, 2014-03-20 at 11:03 +0530, Srikar Dronamraju wrote:
> > > > Joy,.. let me look at that with ppc in mind.
> > >
> > > OK; so while pretty much all the comments from that patch are utter
> > > nonsense (what was I thinking), I cannot actually find a real bug.
> > >
> > > But could you try the below which replaces a control dependency with a
> > > full barrier. The control flow is plenty convoluted that I think the
> > > control barrier isn't actually valid anymore and that might indeed
> > > explain the fail.
> >
> > Unfortunately the patch didn't help. Still seeing tasks stuck
> >
> > # ps -Ao pid,tt,user,fname,tmout,f,wchan | grep futex
> > 14680 pts/0    root     java         - 0 futex_wait_queue_me
> > 14797 pts/0    root     java         - 0 futex_wait_queue_me
> > # :> /var/log/messages
> > # echo t > /proc/sysrq-trigger
> > # grep futex_wait_queue_me /var/log/messages | wc -l
> > 334
> > #
> >
> > [ 6904.211478] Call Trace:
> > [ 6904.211481] [c000000fa1f1b4d0] [0000000000000020] 0x20 (unreliable)
> > [ 6904.211486] [c000000fa1f1b6a0] [c000000000015208] .__switch_to+0x1e8/0x330
> > [ 6904.211491] [c000000fa1f1b750] [c000000000702f00] .__schedule+0x360/0x8b0
> > [ 6904.211495] [c000000fa1f1b9d0] [c000000000147348] .futex_wait_queue_me+0xf8/0x1a0
> > [ 6904.211500] [c000000fa1f1ba60] [c0000000001486dc] .futex_wait+0x17c/0x2a0
> > [ 6904.211505] [c000000fa1f1bc10] [c00000000014a614] .do_futex+0x254/0xd80
> > [ 6904.211510] [c000000fa1f1bd60] [c00000000014b25c] .SyS_futex+0x11c/0x1d0
> > [ 6904.238874] [c000000fa1f1be30] [c00000000000a0fc] syscall_exit+0x0/0x7c
> > [ 6904.238879] java  S 00003fff825f6044     0 14682  14076 0x00000080
> >
> > Is there any other information that I provide that can help?
>
> This problem suggests that we missed a wakeup for a task that was adding
> itself to the queue in a wait path. And the only place that can happen
> is with the hb spinlock check for any pending waiters. Just in case we
> missed some assumption about checking the hash bucket spinlock as a way
> of detecting any waiters (powerpc?), could you revert this commit and
> try the original atomic operations variant:
>
> https://lkml.org/lkml/2013/12/19/630

hmmm looking at ppc spinlock code, it seems that it doesn't have ticket
spinlocks -- in fact Torsten Duwe has been trying to get them upstream
very recently. Since we rely on the counter for detecting waiters, this
might explain the issue. Could someone confirm this spinlock
implementation difference?
On Thu, 2014-03-20 at 09:31 -0700, Davidlohr Bueso wrote:
> hmmm looking at ppc spinlock code, it seems that it doesn't have ticket
> spinlocks -- in fact Torsten Duwe has been trying to get them upstream
> very recently. Since we rely on the counter for detecting waiters, this
> might explain the issue. Could someone confirm this spinlock
> implementation difference?

Indeed. I haven't merged ticket locks because they break lockref :-(

We have a problem here because we need to store the lock holder so we
can yield to the lock holder's partition on contention, and we are
running out of space in the spinlock. The lock holder doesn't have to be
atomic, so in theory we could have the tickets and the lockref in the
same 64-bit word and the holder separately, but the way the layers are
stacked at the moment that's not workable, at least not without
duplicating the whole lockref implementation and breaking the spinlock
in two: a "base" lock without a holder and a separate variant with the
holder field. A mess...

I want to try sorting that out at some stage but haven't had a chance
yet.

Cheers,
Ben.
--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -119,42 +119,32 @@
  *   sys_futex(WAIT, futex, val);
  *     futex_wait(futex, val);
  *
- *   waiters++;
- *   mb(); (A) <-- paired with -.
- *                              |
- *   lock(hash_bucket(futex));  |
- *                              |
- *   uval = *futex;             |
- *                              |        *futex = newval;
- *                              |        sys_futex(WAKE, futex);
- *                              |          futex_wake(futex);
- *                              |
- *                              `-------> mb(); (B)
- *   if (uval == val)
+ *
+ *   lock(hash_bucket(futex)); (A)
+ *
+ *   uval = *futex;
+ *                                       *futex = newval;
+ *                                       sys_futex(WAKE, futex);
+ *                                         futex_wake(futex);
+ *
+ *   if (uval == val) (B)                smp_mb(); (D)
  *     queue();
- *     unlock(hash_bucket(futex));
- *     schedule();                       if (waiters)
+ *     unlock(hash_bucket(futex)); (C)
+ *     schedule();                       if (spin_is_locked(&hb_lock) ||
+ *                                           (smp_rmb(), !plist_empty))) (E)
  *                                         lock(hash_bucket(futex));
  *                                         wake_waiters(futex);
  *                                         unlock(hash_bucket(futex));
  *
- * Where (A) orders the waiters increment and the futex value read -- this
- * is guaranteed by the head counter in the hb spinlock; and where (B)
- * orders the write to futex and the waiters read -- this is done by the
- * barriers in get_futex_key_refs(), through either ihold or atomic_inc,
- * depending on the futex type.
- *
- * This yields the following case (where X:=waiters, Y:=futex):
- *
- *   X = Y = 0
- *
- *   w[X]=1        w[Y]=1
- *   MB            MB
- *   r[Y]=y        r[X]=x
- *
- * Which guarantees that x==0 && y==0 is impossible; which translates back into
- * the guarantee that we cannot both miss the futex variable change and the
- * enqueue.
+ *
+ * Because of the acquire (A) and release (C) the futex value load and the
+ * plist_add are guaranteed to be inside the locked region. Furthermore, the
+ * control dependency (B) ensures the futex load happens before the plist_add().
+ *
+ * On the wakeup side, the full barrier (D) separates the futex value write
+ * from the hb_lock load, and matches with the control dependency. The rmb (E)
+ * separates the spin_is_locked() read and the plist_head_empty() read, such
+ * that ..., matches with the release barrier (C).
  */

 #ifndef CONFIG_HAVE_FUTEX_CMPXCHG
@@ -250,7 +240,7 @@ static inline void futex_get_mm(union fu
 	/*
 	 * Ensure futex_get_mm() implies a full barrier such that
 	 * get_futex_key() implies a full barrier. This is relied upon
-	 * as full barrier (B), see the ordering comment above.
+	 * as full barrier (D), see the ordering comment above.
 	 */
 	smp_mb__after_atomic();
 }
@@ -308,10 +298,10 @@ static void get_futex_key_refs(union fut

 	switch (key->both.offset & (FUT_OFF_INODE|FUT_OFF_MMSHARED)) {
 	case FUT_OFF_INODE:
-		ihold(key->shared.inode); /* implies MB (B) */
+		ihold(key->shared.inode); /* implies MB (D) */
 		break;
 	case FUT_OFF_MMSHARED:
-		futex_get_mm(key); /* implies MB (B) */
+		futex_get_mm(key); /* implies MB (D) */
 		break;
 	}
 }
@@ -385,7 +375,7 @@ get_futex_key(u32 __user *uaddr, int fsh
 	if (!fshared) {
 		key->private.mm = mm;
 		key->private.address = address;
-		get_futex_key_refs(key);  /* implies MB (B) */
+		get_futex_key_refs(key);  /* implies MB (D) */
 		return 0;
 	}

@@ -492,7 +482,7 @@ get_futex_key(u32 __user *uaddr, int fsh
 		key->shared.pgoff = basepage_index(page);
 	}

-	get_futex_key_refs(key); /* implies MB (B) */
+	get_futex_key_refs(key); /* implies MB (D) */

 out:
 	unlock_page(page_head);
@@ -1604,7 +1594,7 @@ static inline struct futex_hash_bucket *
 	hb = hash_futex(&q->key);
 	q->lock_ptr = &hb->lock;

-	spin_lock(&hb->lock); /* implies MB (A) */
+	spin_lock(&hb->lock);
 	return hb;
 }

@@ -1995,6 +1985,8 @@ static int futex_wait_setup(u32 __user *
 		goto retry;
 	}

+	smp_mb();
+
 	if (uval != val) {
 		queue_unlock(*hb);
 		ret = -EWOULDBLOCK;