Message ID | 1367370761.11020.22.camel@edumazet-glaptop (mailing list archive) |
---|---|
State | Not Applicable |
Headers | show |
On Tue, 2013-04-30 at 18:12 -0700, Eric Dumazet wrote: > From: Eric Dumazet <edumazet@google.com> > > Using bit fields is dangerous on ppc64, as the compiler uses 64bit > instructions to manipulate them. If the 64bit word includes any > atomic_t or spinlock_t, we can lose critical concurrent changes. > > This is happening in af_unix, where unix_sk(sk)->gc_candidate/ > gc_maybe_cycle/lock share the same 64bit word. > > This leads to fatal deadlock, as one/several cpus spin forever > on a spinlock that will never be available again. > > Reported-by: Ambrose Feinstein <ambrose@google.com> > Signed-off-by: Eric Dumazet <edumazet@google.com> > Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> > Cc: Paul Mackerras <paulus@samba.org> > --- > > Could ppc64 experts confirm using byte is safe, or should we really add > a 32bit hole after the spinlock ? If so, I wonder how many other places > need a change... Wow, nice one ! I'm not even completely certain bytes are safe to be honest, though probably more than bitfields. I'll poke our compiler people. The worry is of course how many more of these do we potentially have ? We might be able to automate finding these issues with sparse, I suppose. Also I'd be surprised if ppc64 is the only one with that problem... what about sparc64 and arm64 ? Cheers, Ben. > include/net/af_unix.h | 4 ++-- > 1 file changed, 2 insertions(+), 2 deletions(-) > > diff --git a/include/net/af_unix.h b/include/net/af_unix.h > index a8836e8..4520a23f 100644 > --- a/include/net/af_unix.h > +++ b/include/net/af_unix.h > @@ -57,8 +57,8 @@ struct unix_sock { > struct list_head link; > atomic_long_t inflight; > spinlock_t lock; > - unsigned int gc_candidate : 1; > - unsigned int gc_maybe_cycle : 1; > + unsigned char gc_candidate; > + unsigned char gc_maybe_cycle; > unsigned char recursion_level; > struct socket_wq peer_wq; > }; >
Hi Eric, > From: Eric Dumazet <edumazet@google.com> > > Using bit fields is dangerous on ppc64, as the compiler uses 64bit > instructions to manipulate them. If the 64bit word includes any > atomic_t or spinlock_t, we can lose critical concurrent changes. > > This is happening in af_unix, where unix_sk(sk)->gc_candidate/ > gc_maybe_cycle/lock share the same 64bit word. > > This leads to fatal deadlock, as one/several cpus spin forever > on a spinlock that will never be available again. I just spoke to Alan Modra and he suspects this is a compiler bug. Can you give us your compiler version info? Anton > Reported-by: Ambrose Feinstein <ambrose@google.com> > Signed-off-by: Eric Dumazet <edumazet@google.com> > Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> > Cc: Paul Mackerras <paulus@samba.org> > --- > > Could ppc64 experts confirm using byte is safe, or should we really > add a 32bit hole after the spinlock ? If so, I wonder how many other > places need a change... > > include/net/af_unix.h | 4 ++-- > 1 file changed, 2 insertions(+), 2 deletions(-) > > diff --git a/include/net/af_unix.h b/include/net/af_unix.h > index a8836e8..4520a23f 100644 > --- a/include/net/af_unix.h > +++ b/include/net/af_unix.h > @@ -57,8 +57,8 @@ struct unix_sock { > struct list_head link; > atomic_long_t inflight; > spinlock_t lock; > - unsigned int gc_candidate : 1; > - unsigned int gc_maybe_cycle : 1; > + unsigned char gc_candidate; > + unsigned char gc_maybe_cycle; > unsigned char recursion_level; > struct socket_wq peer_wq; > }; > > > _______________________________________________ > Linuxppc-dev mailing list > Linuxppc-dev@lists.ozlabs.org > https://lists.ozlabs.org/listinfo/linuxppc-dev >
On Wed, 2013-05-01 at 11:51 +1000, Anton Blanchard wrote: > Hi Eric, > > > From: Eric Dumazet <edumazet@google.com> > > > > Using bit fields is dangerous on ppc64, as the compiler uses 64bit > > instructions to manipulate them. If the 64bit word includes any > > atomic_t or spinlock_t, we can lose critical concurrent changes. > > > > This is happening in af_unix, where unix_sk(sk)->gc_candidate/ > > gc_maybe_cycle/lock share the same 64bit word. > > > > This leads to fatal deadlock, as one/several cpus spin forever > > on a spinlock that will never be available again. > > I just spoke to Alan Modra and he suspects this is a compiler > bug. Can you give us your compiler version info? $ gcc-4.6.3-nolibc/powerpc64-linux/bin/powerpc64-linux-gcc -v Using built-in specs. COLLECT_GCC=gcc-4.6.3-nolibc/powerpc64-linux/bin/powerpc64-linux-gcc COLLECT_LTO_WRAPPER=/usr/local/google/home/edumazet/cross/gcc-4.6.3-nolibc/powerpc64-linux/bin/../libexec/gcc/powerpc64-linux/4.6.3/lto-wrapper Target: powerpc64-linux Configured with: /home/tony/buildall/src/gcc/configure --target=powerpc64-linux --host=x86_64-linux-gnu --build=x86_64-linux-gnu --enable-targets=all --prefix=/opt/cross/gcc-4.6.3-nolibc/powerpc64-linux/ --enable-languages=c --with-newlib --without-headers --enable-sjlj-exceptions --with-system-libunwind --disable-nls --disable-threads --disable-shared --disable-libmudflap --disable-libssp --disable-libgomp --disable-decimal-float --enable-checking=release --with-mpfr=/home/tony/buildall/src/sys-x86_64 --with-gmp=/home/tony/buildall/src/sys-x86_64 --disable-bootstrap --disable-libquadmath Thread model: single gcc version 4.6.3 (GCC) $ cat try.c ; gcc-4.6.3-nolibc/powerpc64-linux/bin/powerpc64-linux-gcc -O2 -S try.c ; cat try.s struct s { unsigned int lock; unsigned int f1 : 1; unsigned int f2 : 1; void *ptr; } *p ; showbug() { p->lock++; p->f1 = 1; } .file "try.c" .section ".toc","aw" .section ".text" .section ".toc","aw" .LC0: .tc p[TC],p .section ".text" .align 2 .globl showbug .section ".opd","aw" .align 3 showbug: .quad .L.showbug,.TOC.@tocbase,0 .previous .type showbug, @function .L.showbug: addis 9,2,.LC0@toc@ha ld 9,.LC0@toc@l(9) ld 9,0(9) lwz 11,0(9) addi 0,11,1 stw 0,0(9) li 11,1 ld 0,0(9) rldimi 0,11,31,32 std 0,0(9) blr .long 0 .byte 0,0,0,0,0,0,0,0 .size showbug,.-.L.showbug .comm p,8,8 .ident "GCC: (GNU) 4.6.3" You can see "ld 0,0(9)" is used : its a 64 bit load.
On Tue, Apr 30, 2013 at 07:24:20PM -0700, Eric Dumazet wrote: > li 11,1 > ld 0,0(9) > rldimi 0,11,31,32 > std 0,0(9) > blr > .ident "GCC: (GNU) 4.6.3" > > You can see "ld 0,0(9)" is used : its a 64 bit load. Yup. This is not a powerpc64 specific problem. See http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52080 Fixed in 4.8.0 and 4.7.3.
On Wed, 2013-05-01 at 13:24 +0930, Alan Modra wrote: > On Tue, Apr 30, 2013 at 07:24:20PM -0700, Eric Dumazet wrote: > > li 11,1 > > ld 0,0(9) > > rldimi 0,11,31,32 > > std 0,0(9) > > blr > > .ident "GCC: (GNU) 4.6.3" > > > > You can see "ld 0,0(9)" is used : its a 64 bit load. > > Yup. This is not a powerpc64 specific problem. See > http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52080 > Fixed in 4.8.0 and 4.7.3. Ah thanks. This seems a pretty serious issue, is it documented somewhere that ppc64, sparc64 and others need such compiler version ? These kind of errors are pretty hard to find, its a pity to spend time on them.
From: Benjamin Herrenschmidt <benh@kernel.crashing.org> Date: Wed, 01 May 2013 11:39:53 +1000 > I'm not even completely certain bytes are safe to be honest, though > probably more than bitfields. I'll poke our compiler people. Older Alpha only has 32-bit and 64-bit loads and stores, so byte sized accesses are not atomic, and therefore use racey read-modify-write sequences.
On Wed, 2013-05-01 at 03:36 -0400, David Miller wrote: > From: Benjamin Herrenschmidt <benh@kernel.crashing.org> > Date: Wed, 01 May 2013 11:39:53 +1000 > > > I'm not even completely certain bytes are safe to be honest, though > > probably more than bitfields. I'll poke our compiler people. > > Older Alpha only has 32-bit and 64-bit loads and stores, so byte sized > accesses are not atomic, and therefore use racey read-modify-write > sequences. In this case it depends whether the compiler will "chose" the smaller (32-bit) size which hopefully won't overlap with the atomic/lock provided the latter is aligned... lots of if's here, makes me nervous... At least the bytes seem to fix it for ppc64 so far... It would make feel generally better if we could get gcc to guarantee us to always use the smallest access size that encompass the whole bitfield (or at least not go larger than int when the bitfield is defined as unsigned int). This would take care of all the cases we haven't spotted yet (hopefully). For all intend and purposes those two fields are bits of an unsigned int, why the heck would the compiler use a larger access size anyway ? I seem to recall that we have other places where such an assumption is made that ints are accessed atomically, and Linus stating in the past that a compiler doing anything else was not worth bothering with. I don't see why bitfields of such int would be an exception to that rule (though again, this is probably not a rule stated in the standard ... oh well). /me goes have a glass of wine and not think about this until tomorrow. Cheers, Ben.
On Wed, 2013-05-01 at 11:39 +1000, Benjamin Herrenschmidt wrote: > On Tue, 2013-04-30 at 18:12 -0700, Eric Dumazet wrote: > > From: Eric Dumazet <edumazet@google.com> > > > > Using bit fields is dangerous on ppc64, as the compiler uses 64bit > > instructions to manipulate them. If the 64bit word includes any > > atomic_t or spinlock_t, we can lose critical concurrent changes. > > > > This is happening in af_unix, where unix_sk(sk)->gc_candidate/ > > gc_maybe_cycle/lock share the same 64bit word. > > > > This leads to fatal deadlock, as one/several cpus spin forever > > on a spinlock that will never be available again. > > > > Reported-by: Ambrose Feinstein <ambrose@google.com> > > Signed-off-by: Eric Dumazet <edumazet@google.com> > > Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> > > Cc: Paul Mackerras <paulus@samba.org> > > --- > > > > Could ppc64 experts confirm using byte is safe, or should we really add > > a 32bit hole after the spinlock ? If so, I wonder how many other places > > need a change... > > Wow, nice one ! > > I'm not even completely certain bytes are safe to be honest, though > probably more than bitfields. I'll poke our compiler people. There is a longstanding and hard-to-fix bug in gcc that is specific to bitfields. I think that the underlying type isn't propagated, so when it comes to code generation the compiler doesn't know the natural width for the memory access. As for bytes - early Alphas couldn't load/store less than 32 bits, but I doubt anyone cares any more. > The worry is of course how many more of these do we potentially have ? > We might be able to automate finding these issues with sparse, I > suppose. > > Also I'd be surprised if ppc64 is the only one with that problem... what > about sparc64 and arm64 ? I expect they can have the same general problem, but gcc may be more or less keen to generate 64-bit load/store instructions for bitfields on different architectures. Ben.
On Tue, 30 Apr 2013 22:04:32 -0700 Eric Dumazet <eric.dumazet@gmail.com> wrote: > On Wed, 2013-05-01 at 13:24 +0930, Alan Modra wrote: > > On Tue, Apr 30, 2013 at 07:24:20PM -0700, Eric Dumazet wrote: > > > li 11,1 > > > ld 0,0(9) > > > rldimi 0,11,31,32 > > > std 0,0(9) > > > blr > > > .ident "GCC: (GNU) 4.6.3" > > > > > > You can see "ld 0,0(9)" is used : its a 64 bit load. > > > > Yup. This is not a powerpc64 specific problem. See > > http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52080 > > Fixed in 4.8.0 and 4.7.3. > > Ah thanks. > > This seems a pretty serious issue, is it documented somewhere that > ppc64, sparc64 and others need such compiler version ? > > These kind of errors are pretty hard to find, its a pity to spend time > on them. There is a checkbin target inside arch/powerpc/Makefile Shouldn't a check be added there to block building kernel with known bad GCC versions?
On 04/30/2013 10:54:25 PM, Alan Modra wrote: > On Tue, Apr 30, 2013 at 07:24:20PM -0700, Eric Dumazet wrote: > > li 11,1 > > ld 0,0(9) > > rldimi 0,11,31,32 > > std 0,0(9) > > blr > > .ident "GCC: (GNU) 4.6.3" > > > > You can see "ld 0,0(9)" is used : its a 64 bit load. > > Yup. This is not a powerpc64 specific problem. See > http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52080 > Fixed in 4.8.0 and 4.7.3. FWIW (especially if a GCC version check is added), it seems to have been fixed as far back as 4.7.1, not just 4.7.3. -Scott
On Wed, 2013-05-01 at 08:10 -0700, Stephen Hemminger wrote: > > These kind of errors are pretty hard to find, its a pity to spend > time > > on them. > > There is a checkbin target inside arch/powerpc/Makefile > Shouldn't a check be added there to block building kernel with known > bad GCC versions? In this case that makes it all GCC versions except the *very latest* .... not practical. I suppose we should try to make sure that at least the next batch of enterprise distro get that fix on gcc side. Ben.
On Tue, Apr 30, 2013 at 10:04:32PM -0700, Eric Dumazet wrote: > These kind of errors are pretty hard to find, its a pity to spend time > on them. Well, yes. From the first comment in gcc PR52080. "For the following testcase we generate a 8 byte RMW cycle on IA64 which causes locking problems in the linux kernel btrfs filesystem." Did someone fix btrfs, but not check other kernel locks? Having now hit the same problem again, have you checked that other kernel locks don't have adjacent bit fields in the same 64-bit word? And comment the struct to ensure someone doesn't optimize those unsigned chars back to bit fields.
> Did someone fix btrfs, but not check other kernel locks? Having now > hit the same problem again, have you checked that other kernel locks > don't have adjacent bit fields in the same 64-bit word? And comment > the struct to ensure someone doesn't optimize those unsigned chars > back to bit fields. Seems a good reason to have a general policy of not using bit fields! Separate char fields normally generate faster code - possibly at the expense of an increase in the allocated size of a structure. David
On Fri, 2013-05-03 at 11:01 +0930, Alan Modra wrote: > On Tue, Apr 30, 2013 at 10:04:32PM -0700, Eric Dumazet wrote: > > These kind of errors are pretty hard to find, its a pity to spend time > > on them. > > Well, yes. From the first comment in gcc PR52080. "For the following > testcase we generate a 8 byte RMW cycle on IA64 which causes locking > problems in the linux kernel btrfs filesystem." > > Did someone fix btrfs, but not check other kernel locks? Having now > hit the same problem again, have you checked that other kernel locks > don't have adjacent bit fields in the same 64-bit word? And comment > the struct to ensure someone doesn't optimize those unsigned chars > back to bit fields. Unfortunately, fixing "other" kernel locks is near impossible. One could try to grep for all spinlock_t and maybe even all atomic_t, may even write a script to spot automatically if a bitfield appears to be around (though it could be hidden behind a structure etc...) but what about an int accessed with cmxchg (a kernel macro doing a lwarx/stwcx. loop on a value) for example ? There's plenty of these... I don't think we can realistically "fix" all potential occurrences of that bug in the kernel short of geting rid of all bitfields, which isn't going to happen any time soon. I'm afraid this *must* be fixed at the compiler level, with as backports much as can realistically be done back to distros. Ben.
On Fri, 2013-05-03 at 11:01 +0930, Alan Modra wrote: > On Tue, Apr 30, 2013 at 10:04:32PM -0700, Eric Dumazet wrote: > > These kind of errors are pretty hard to find, its a pity to spend time > > on them. > > Well, yes. From the first comment in gcc PR52080. "For the following > testcase we generate a 8 byte RMW cycle on IA64 which causes locking > problems in the linux kernel btrfs filesystem." > > Did someone fix btrfs, but not check other kernel locks? Having now > hit the same problem again, have you checked that other kernel locks > don't have adjacent bit fields in the same 64-bit word? And comment > the struct to ensure someone doesn't optimize those unsigned chars > back to bit fields. Not only spinlock, but atomic_t followed by bit fields. BTW, if a spinlock is followed by bit fields, but bit fields only changed when this spinlock is held, there is no problem, unless spinlock is a ticket spinlock. In af_unix, bug happens because the bit fields were changed without spinlock being held (another global spinlock is used instead) (ppc64 doesnt use ticket spinlocks yet)
> > Could ppc64 experts confirm using byte is safe, or should we really add > > a 32bit hole after the spinlock ? If so, I wonder how many other places > > need a change... ... > Also I'd be surprised if ppc64 is the only one with that problem... what > about sparc64 and arm64 ? Even x86 could be affected. The width of the memory cycles used by the 'bit set and bit clear' instructions isn't documented. They are certainly allowed to do RMW on adjacent bytes. I don't remember whether they are constrained to only do 32bit accesses, but nothing used to say that they wouldn't do 32bit misaligned ones! (although I suspect they never have). David
On Fri, 2013-05-03 at 15:29 +0100, David Laight wrote: > > > Could ppc64 experts confirm using byte is safe, or should we really add > > > a 32bit hole after the spinlock ? If so, I wonder how many other places > > > need a change... > ... > > Also I'd be surprised if ppc64 is the only one with that problem... what > > about sparc64 and arm64 ? > > Even x86 could be affected. > The width of the memory cycles used by the 'bit set and bit clear' > instructions isn't documented. They are certainly allowed to do > RMW on adjacent bytes. > I don't remember whether they are constrained to only do > 32bit accesses, but nothing used to say that they wouldn't > do 32bit misaligned ones! (although I suspect they never have). x86 is not affected (or else we would have found the bug much earlier) Setting 1-bit field to one/zero uses OR/AND instructions. orb $4,724(%reg) doesn't load/store 64bits but 8bits.
> > > Also I'd be surprised if ppc64 is the only one with that problem... what > > > about sparc64 and arm64 ? > > > > Even x86 could be affected. > > The width of the memory cycles used by the 'bit set and bit clear' > > instructions isn't documented. They are certainly allowed to do > > RMW on adjacent bytes. > > I don't remember whether they are constrained to only do > > 32bit accesses, but nothing used to say that they wouldn't > > do 32bit misaligned ones! (although I suspect they never have). > > x86 is not affected (or else we would have found the bug much earlier) > > Setting 1-bit field to one/zero uses OR/AND instructions. > > orb $4,724(%reg) > > doesn't load/store 64bits but 8bits. I was thinking of code that might be using BT, BTC, BTR or BTS. These are probably used with the 'lock' prefix - which would (I think) make them safe. The documented constraint is more specific than it used to be the Intel version reads: When accessing a bit in memory, the processor may access 4 bytes starting from the memory address for a 32-bit operand size, using by the following relationship: Effective Address + (4 ∗ (BitOffset DIV 32)) Or, it may access 2 bytes starting from the memory address for a 16-bit operand, using this relationship: Effective Address + (2 ∗ (BitOffset DIV 16)) It may do so even when only a single byte needs to be accessed to reach the given bit. When using this bit addressing mechanism, software should avoid referencing areas of memory close to address space holes. In particular, it should avoid references to memory-mapped I/O registers. Instead, software should use the MOV instructions to load from or store to these addresses, and use the register form of these instructions to manipulate the data. In 64-bit mode, the instruction’s default operation size is 32 bits. Using a REX prefix in the form of REX.R permits access to additional registers (R8-R15). Using a REX prefix in the form of REX.W promotes operation to 64 bit operands. David
diff --git a/include/net/af_unix.h b/include/net/af_unix.h index a8836e8..4520a23f 100644 --- a/include/net/af_unix.h +++ b/include/net/af_unix.h @@ -57,8 +57,8 @@ struct unix_sock { struct list_head link; atomic_long_t inflight; spinlock_t lock; - unsigned int gc_candidate : 1; - unsigned int gc_maybe_cycle : 1; + unsigned char gc_candidate; + unsigned char gc_maybe_cycle; unsigned char recursion_level; struct socket_wq peer_wq; };