Message ID | 20171009110726.28551-1-sandipan@linux.vnet.ibm.com (mailing list archive) |
---|---|
State | Changes Requested |
Series | powerpc/lib/sstep: Fix count leading zeros instructions |
From: Sandipan Das
> Sent: 09 October 2017 12:07
> According to the GCC documentation, the behaviour of __builtin_clz()
> and __builtin_clzl() is undefined if the value of the input argument
> is zero. Without handling this special case, these builtins have been
> used for emulating the following instructions:
>   * Count Leading Zeros Word (cntlzw[.])
>   * Count Leading Zeros Doubleword (cntlzd[.])
>
> This fixes the emulated behaviour of these instructions by adding an
> additional check for this special case.

Presumably the result is undefined because the underlying cpu
instruction is used - and its return value is implementation defined.
Here you are emulating the cpu instruction - so executing one will
give the same answer as if the 'real' one were executed.

Indeed it might be worth an asm statement that definitely executes
the cpu instruction?

	David

>
> Signed-off-by: Sandipan Das <sandipan@linux.vnet.ibm.com>
> ---
>  arch/powerpc/lib/sstep.c | 12 ++++++++++--
>  1 file changed, 10 insertions(+), 2 deletions(-)
>
> diff --git a/arch/powerpc/lib/sstep.c b/arch/powerpc/lib/sstep.c
> index 0f7e41bd7e88..ebbc0b92650c 100644
> --- a/arch/powerpc/lib/sstep.c
> +++ b/arch/powerpc/lib/sstep.c
> @@ -1717,11 +1717,19 @@ int analyse_instr(struct instruction_op *op, const struct pt_regs *regs,
>   * Logical instructions
>   */
>  		case 26:	/* cntlzw */
> -			op->val = __builtin_clz((unsigned int) regs->gpr[rd]);
> +			val = (unsigned int) regs->gpr[rd];
> +			if (val == 0)
> +				op->val = 32;
> +			else
> +				op->val = __builtin_clz(val);
>  			goto logical_done;
>  #ifdef __powerpc64__
>  		case 58:	/* cntlzd */
> -			op->val = __builtin_clzl(regs->gpr[rd]);
> +			val = regs->gpr[rd];
> +			if (val == 0)
> +				op->val = 64;
> +			else
> +				op->val = __builtin_clzl(val);
>  			goto logical_done;
>  #endif
>  		case 28:	/* and */
> --
> 2.13.6
On Mon, Oct 09, 2017 at 01:49:26PM +0000, David Laight wrote:
> From: Sandipan Das
> > Sent: 09 October 2017 12:07
> > According to the GCC documentation, the behaviour of __builtin_clz()
> > and __builtin_clzl() is undefined if the value of the input argument
> > is zero. Without handling this special case, these builtins have been
> > used for emulating the following instructions:
> >   * Count Leading Zeros Word (cntlzw[.])
> >   * Count Leading Zeros Doubleword (cntlzd[.])
> >
> > This fixes the emulated behaviour of these instructions by adding an
> > additional check for this special case.
>
> Presumably the result is undefined because the underlying cpu
> instruction is used - and its return value is implementation defined.

It is undefined because the result is undefined, and the compiler
optimises based on that. The return value of the builtin is undefined,
not implementation defined.

The patch is correct.

Segher
From: Segher Boessenkool
> Sent: 09 October 2017 15:21
> On Mon, Oct 09, 2017 at 01:49:26PM +0000, David Laight wrote:
> > From: Sandipan Das
> > > Sent: 09 October 2017 12:07
> > > According to the GCC documentation, the behaviour of __builtin_clz()
> > > and __builtin_clzl() is undefined if the value of the input argument
> > > is zero. Without handling this special case, these builtins have been
> > > used for emulating the following instructions:
> > >   * Count Leading Zeros Word (cntlzw[.])
> > >   * Count Leading Zeros Doubleword (cntlzd[.])
> > >
> > > This fixes the emulated behaviour of these instructions by adding an
> > > additional check for this special case.
> >
> > Presumably the result is undefined because the underlying cpu
> > instruction is used - and its return value is implementation defined.
>
> It is undefined because the result is undefined, and the compiler
> optimises based on that. The return value of the builtin is undefined,
> not implementation defined.
>
> The patch is correct.

But the code you are emulating might be relying on the (un)defined value
the cpu instruction gives for zero input rather than the input width.

Or, put another way, if the return value for a clz instruction with zero
argument is undefined (as it is on x86 - intel and amd may differ) then the
emulation can return any value since the code can't care.
So the conditional is not needed.

	David
On 2017/10/09 11:07AM, Sandipan Das wrote:
> According to the GCC documentation, the behaviour of __builtin_clz()
> and __builtin_clzl() is undefined if the value of the input argument
> is zero. Without handling this special case, these builtins have been
> used for emulating the following instructions:
>   * Count Leading Zeros Word (cntlzw[.])
>   * Count Leading Zeros Doubleword (cntlzd[.])
>
> This fixes the emulated behaviour of these instructions by adding an
> additional check for this special case.

So:
Fixes: 3cdfcbfd32b9d ("powerpc: Change analyse_instr so it doesn't modify *regs")

>
> Signed-off-by: Sandipan Das <sandipan@linux.vnet.ibm.com>
> ---
>  arch/powerpc/lib/sstep.c | 12 ++++++++++--
>  1 file changed, 10 insertions(+), 2 deletions(-)
>
> diff --git a/arch/powerpc/lib/sstep.c b/arch/powerpc/lib/sstep.c
> index 0f7e41bd7e88..ebbc0b92650c 100644
> --- a/arch/powerpc/lib/sstep.c
> +++ b/arch/powerpc/lib/sstep.c
> @@ -1717,11 +1717,19 @@ int analyse_instr(struct instruction_op *op, const struct pt_regs *regs,
>   * Logical instructions
>   */
>  		case 26:	/* cntlzw */
> -			op->val = __builtin_clz((unsigned int) regs->gpr[rd]);
> +			val = (unsigned int) regs->gpr[rd];
> +			if (val == 0)
> +				op->val = 32;
> +			else
> +				op->val = __builtin_clz(val);

Can be made more compact:
	op->val = ( val ? __builtin_clz(val) : 32 );

- Naveen

>  			goto logical_done;
>  #ifdef __powerpc64__
>  		case 58:	/* cntlzd */
> -			op->val = __builtin_clzl(regs->gpr[rd]);
> +			val = regs->gpr[rd];
> +			if (val == 0)
> +				op->val = 64;
> +			else
> +				op->val = __builtin_clzl(val);
>  			goto logical_done;
>  #endif
>  		case 28:	/* and */
> --
> 2.13.6
>
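For illustration, the compact form suggested above can be written as two standalone helpers. This is only a sketch outside the kernel tree: the names `emul_cntlzw` and `emul_cntlzd` are hypothetical, fixed-width types stand in for `regs->gpr[rd]`, and `__builtin_clzll` is assumed for the 64-bit case so that the sketch does not depend on `unsigned long` being 64 bits wide (the patch itself uses `__builtin_clzl` under `__powerpc64__`).

```c
#include <stdint.h>

/*
 * Hypothetical helper: emulate cntlzw on the low 32 bits of a register
 * value. The builtin is never called with 0, because its result is
 * undefined in that case.
 */
static inline unsigned int emul_cntlzw(uint64_t gpr)
{
	uint32_t val = (uint32_t)gpr;

	return val ? (unsigned int)__builtin_clz(val) : 32;
}

/*
 * Hypothetical helper: emulate cntlzd on a full 64-bit register value,
 * with the same guard for a zero input.
 */
static inline unsigned int emul_cntlzd(uint64_t gpr)
{
	return gpr ? (unsigned int)__builtin_clzll(gpr) : 64;
}
```

With these helpers, a zero input yields the operand width (32 or 64), matching the values the patch assigns to `op->val`.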
On 2017/10/09 02:43PM, David Laight wrote:
> From: Segher Boessenkool
> > Sent: 09 October 2017 15:21
> > On Mon, Oct 09, 2017 at 01:49:26PM +0000, David Laight wrote:
> > > From: Sandipan Das
> > > > Sent: 09 October 2017 12:07
> > > > According to the GCC documentation, the behaviour of __builtin_clz()
> > > > and __builtin_clzl() is undefined if the value of the input argument
> > > > is zero. Without handling this special case, these builtins have been
> > > > used for emulating the following instructions:
> > > >   * Count Leading Zeros Word (cntlzw[.])
> > > >   * Count Leading Zeros Doubleword (cntlzd[.])
> > > >
> > > > This fixes the emulated behaviour of these instructions by adding an
> > > > additional check for this special case.
> > >
> > > Presumably the result is undefined because the underlying cpu
> > > instruction is used - and its return value is implementation defined.
> >
> > It is undefined because the result is undefined, and the compiler
> > optimises based on that. The return value of the builtin is undefined,
> > not implementation defined.
> >
> > The patch is correct.
>
> But the code you are emulating might be relying on the (un)defined value
> the cpu instruction gives for zero input rather than the input width.
>
> Or, put another way, if the return value for a clz instruction with zero
> argument is undefined (as it is on x86 - intel and amd may differ) then the
> emulation can return any value since the code can't care.
> So the conditional is not needed.

This is about the behavior of the gcc builtin being undefined, rather
than the actual cpu instruction itself.

- Naveen
On Mon, Oct 09, 2017 at 02:43:45PM +0000, David Laight wrote:
> From: Segher Boessenkool
> > Sent: 09 October 2017 15:21
> > On Mon, Oct 09, 2017 at 01:49:26PM +0000, David Laight wrote:
> > > From: Sandipan Das
> > > > Sent: 09 October 2017 12:07
> > > > According to the GCC documentation, the behaviour of __builtin_clz()
> > > > and __builtin_clzl() is undefined if the value of the input argument
> > > > is zero. Without handling this special case, these builtins have been
> > > > used for emulating the following instructions:
> > > >   * Count Leading Zeros Word (cntlzw[.])
> > > >   * Count Leading Zeros Doubleword (cntlzd[.])
> > > >
> > > > This fixes the emulated behaviour of these instructions by adding an
> > > > additional check for this special case.
> > >
> > > Presumably the result is undefined because the underlying cpu
> > > instruction is used - and its return value is implementation defined.
> >
> > It is undefined because the result is undefined, and the compiler
> > optimises based on that. The return value of the builtin is undefined,
> > not implementation defined.
> >
> > The patch is correct.
>
> But the code you are emulating might be relying on the (un)defined value
> the cpu instruction gives for zero input rather than the input width.
>
> Or, put another way, if the return value for a clz instruction with zero
> argument is undefined (as it is on x86 - intel and amd may differ) then the
> emulation can return any value since the code can't care.
> So the conditional is not needed.

The cntlz[wd][.] insn has defined behaviour for 0 input. It's just
the builtin that does not. So we shouldn't call the builtin with an
input of 0 -- exactly what this patch does -- and that is all that was
wrong.

Segher
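One way to see the distinction drawn above is to compare a guarded builtin against a reference that uses no builtin at all. The following is a minimal sketch, not kernel code: `ref_cntlz`, `guarded_clz32` and `guarded_clz64` are hypothetical names, and the reference simply scans the field from its most significant bit, so a zero input naturally yields the field width.

```c
#include <stdint.h>

/*
 * Hypothetical reference: count leading zero bits in the low `width`
 * bits of `val` by scanning from the most significant bit of that
 * field. Assumes 1 <= width <= 64. No builtin is involved, so the
 * result for val == 0 is simply `width`.
 */
static unsigned int ref_cntlz(uint64_t val, unsigned int width)
{
	unsigned int n = 0;

	while (n < width && !((val >> (width - 1 - n)) & 1))
		n++;
	return n;
}

/* Guarded 32-bit builtin: the zero case is handled before the call. */
static unsigned int guarded_clz32(uint32_t val)
{
	return val ? (unsigned int)__builtin_clz(val) : 32;
}

/* Guarded 64-bit builtin, using the long long variant for portability. */
static unsigned int guarded_clz64(uint64_t val)
{
	return val ? (unsigned int)__builtin_clzll(val) : 64;
}
```

The guarded versions agree with the reference for every input, including zero; calling the builtins directly with zero would give no such guarantee.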
From: naveen.n.rao@linux.vnet.ibm.com
> Sent: 09 October 2017 15:48
...
> This is about the behavior of the gcc builtin being undefined, rather
> than the actual cpu instruction itself.

I'd have hoped that the gcc builtin just generated the expected cpu
instruction. So it is only undefined because it is very cpu dependent.

More problematic here would be any cpu flag register settings.
eg: x86 would set the 'Z' flag for zero input.

	David
diff --git a/arch/powerpc/lib/sstep.c b/arch/powerpc/lib/sstep.c
index 0f7e41bd7e88..ebbc0b92650c 100644
--- a/arch/powerpc/lib/sstep.c
+++ b/arch/powerpc/lib/sstep.c
@@ -1717,11 +1717,19 @@ int analyse_instr(struct instruction_op *op, const struct pt_regs *regs,
  * Logical instructions
  */
 		case 26:	/* cntlzw */
-			op->val = __builtin_clz((unsigned int) regs->gpr[rd]);
+			val = (unsigned int) regs->gpr[rd];
+			if (val == 0)
+				op->val = 32;
+			else
+				op->val = __builtin_clz(val);
 			goto logical_done;
 #ifdef __powerpc64__
 		case 58:	/* cntlzd */
-			op->val = __builtin_clzl(regs->gpr[rd]);
+			val = regs->gpr[rd];
+			if (val == 0)
+				op->val = 64;
+			else
+				op->val = __builtin_clzl(val);
 			goto logical_done;
 #endif
 		case 28:	/* and */
According to the GCC documentation, the behaviour of __builtin_clz()
and __builtin_clzl() is undefined if the value of the input argument
is zero. Without handling this special case, these builtins have been
used for emulating the following instructions:
  * Count Leading Zeros Word (cntlzw[.])
  * Count Leading Zeros Doubleword (cntlzd[.])

This fixes the emulated behaviour of these instructions by adding an
additional check for this special case.

Signed-off-by: Sandipan Das <sandipan@linux.vnet.ibm.com>
---
 arch/powerpc/lib/sstep.c | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)