powerpc/lib/sstep: Fix count leading zeros instructions

Message ID 20171009110726.28551-1-sandipan@linux.vnet.ibm.com (mailing list archive)
State Changes Requested
Series powerpc/lib/sstep: Fix count leading zeros instructions

Commit Message

Sandipan Das Oct. 9, 2017, 11:07 a.m. UTC
According to the GCC documentation, the behaviour of __builtin_clz()
and __builtin_clzl() is undefined if the value of the input argument
is zero. Without handling this special case, these builtins have been
used for emulating the following instructions:
  * Count Leading Zeros Word (cntlzw[.])
  * Count Leading Zeros Doubleword (cntlzd[.])

This fixes the emulated behaviour of these instructions by adding an
additional check for this special case.

Signed-off-by: Sandipan Das <sandipan@linux.vnet.ibm.com>
---
 arch/powerpc/lib/sstep.c | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

Comments

David Laight Oct. 9, 2017, 1:49 p.m. UTC | #1
From: Sandipan Das
> Sent: 09 October 2017 12:07
> According to the GCC documentation, the behaviour of __builtin_clz()
> and __builtin_clzl() is undefined if the value of the input argument
> is zero. Without handling this special case, these builtins have been
> used for emulating the following instructions:
>   * Count Leading Zeros Word (cntlzw[.])
>   * Count Leading Zeros Doubleword (cntlzd[.])
> 
> This fixes the emulated behaviour of these instructions by adding an
> additional check for this special case.

Presumably the result is undefined because the underlying cpu
instruction is used - and its return value is implementation defined.

Here you are emulating the cpu instruction - so executing one will
give the same answer as if the 'real' one were executed.

Indeed it might be worth an asm statement that definitely executes
the cpu instruction?

	David

> 
> Signed-off-by: Sandipan Das <sandipan@linux.vnet.ibm.com>
> ---
>  arch/powerpc/lib/sstep.c | 12 ++++++++++--
>  1 file changed, 10 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/powerpc/lib/sstep.c b/arch/powerpc/lib/sstep.c
> index 0f7e41bd7e88..ebbc0b92650c 100644
> --- a/arch/powerpc/lib/sstep.c
> +++ b/arch/powerpc/lib/sstep.c
> @@ -1717,11 +1717,19 @@ int analyse_instr(struct instruction_op *op, const struct pt_regs *regs,
>   * Logical instructions
>   */
>  		case 26:	/* cntlzw */
> -			op->val = __builtin_clz((unsigned int) regs->gpr[rd]);
> +			val = (unsigned int) regs->gpr[rd];
> +			if (val == 0)
> +				op->val = 32;
> +			else
> +				op->val = __builtin_clz(val);
>  			goto logical_done;
>  #ifdef __powerpc64__
>  		case 58:	/* cntlzd */
> -			op->val = __builtin_clzl(regs->gpr[rd]);
> +			val = regs->gpr[rd];
> +			if (val == 0)
> +				op->val = 64;
> +			else
> +				op->val = __builtin_clzl(val);
>  			goto logical_done;
>  #endif
>  		case 28:	/* and */
> --
> 2.13.6
Segher Boessenkool Oct. 9, 2017, 2:20 p.m. UTC | #2
On Mon, Oct 09, 2017 at 01:49:26PM +0000, David Laight wrote:
> From: Sandipan Das
> > Sent: 09 October 2017 12:07
> > According to the GCC documentation, the behaviour of __builtin_clz()
> > and __builtin_clzl() is undefined if the value of the input argument
> > is zero. Without handling this special case, these builtins have been
> > used for emulating the following instructions:
> >   * Count Leading Zeros Word (cntlzw[.])
> >   * Count Leading Zeros Doubleword (cntlzd[.])
> > 
> > This fixes the emulated behaviour of these instructions by adding an
> > additional check for this special case.
> 
> Presumably the result is undefined because the underlying cpu
> instruction is used - and its return value is implementation defined.

It is undefined because the result is undefined, and the compiler
optimises based on that.  The return value of the builtin is undefined,
not implementation defined.

The patch is correct.


Segher
David Laight Oct. 9, 2017, 2:43 p.m. UTC | #3
From: Segher Boessenkool
> Sent: 09 October 2017 15:21
> On Mon, Oct 09, 2017 at 01:49:26PM +0000, David Laight wrote:
> > From: Sandipan Das
> > > Sent: 09 October 2017 12:07
> > > According to the GCC documentation, the behaviour of __builtin_clz()
> > > and __builtin_clzl() is undefined if the value of the input argument
> > > is zero. Without handling this special case, these builtins have been
> > > used for emulating the following instructions:
> > >   * Count Leading Zeros Word (cntlzw[.])
> > >   * Count Leading Zeros Doubleword (cntlzd[.])
> > >
> > > This fixes the emulated behaviour of these instructions by adding an
> > > additional check for this special case.
> >
> > Presumably the result is undefined because the underlying cpu
> > instruction is used - and its return value is implementation defined.
> 
> It is undefined because the result is undefined, and the compiler
> optimises based on that.  The return value of the builtin is undefined,
> not implementation defined.
> 
> The patch is correct.

But the code you are emulating might be relying on the (un)defined value
the cpu instruction gives for zero input rather than the input width.

Or, put another way, if the return value for a clz instruction with zero
argument is undefined (as it is on x86 - intel and amd may differ) then the
emulation can return any value since the code can't care.
So the conditional is not needed.

	David
Naveen N. Rao Oct. 9, 2017, 2:45 p.m. UTC | #4
On 2017/10/09 11:07AM, Sandipan Das wrote:
> According to the GCC documentation, the behaviour of __builtin_clz()
> and __builtin_clzl() is undefined if the value of the input argument
> is zero. Without handling this special case, these builtins have been
> used for emulating the following instructions:
>   * Count Leading Zeros Word (cntlzw[.])
>   * Count Leading Zeros Doubleword (cntlzd[.])
> 
> This fixes the emulated behaviour of these instructions by adding an
> additional check for this special case.

So:
Fixes: 3cdfcbfd32b9d ("powerpc: Change analyse_instr so it doesn't modify *regs")

> 
> Signed-off-by: Sandipan Das <sandipan@linux.vnet.ibm.com>
> ---
>  arch/powerpc/lib/sstep.c | 12 ++++++++++--
>  1 file changed, 10 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/powerpc/lib/sstep.c b/arch/powerpc/lib/sstep.c
> index 0f7e41bd7e88..ebbc0b92650c 100644
> --- a/arch/powerpc/lib/sstep.c
> +++ b/arch/powerpc/lib/sstep.c
> @@ -1717,11 +1717,19 @@ int analyse_instr(struct instruction_op *op, const struct pt_regs *regs,
>   * Logical instructions
>   */
>  		case 26:	/* cntlzw */
> -			op->val = __builtin_clz((unsigned int) regs->gpr[rd]);
> +			val = (unsigned int) regs->gpr[rd];
> +			if (val == 0)
> +				op->val = 32;
> +			else
> +				op->val = __builtin_clz(val);

Can be made more compact:
	op->val = ( val ? __builtin_clz(val) : 32 );

- Naveen

>  			goto logical_done;
>  #ifdef __powerpc64__
>  		case 58:	/* cntlzd */
> -			op->val = __builtin_clzl(regs->gpr[rd]);
> +			val = regs->gpr[rd];
> +			if (val == 0)
> +				op->val = 64;
> +			else
> +				op->val = __builtin_clzl(val);
>  			goto logical_done;
>  #endif
>  		case 28:	/* and */
> -- 
> 2.13.6
>
Naveen N. Rao Oct. 9, 2017, 2:47 p.m. UTC | #5
On 2017/10/09 02:43PM, David Laight wrote:
> From: Segher Boessenkool
> > Sent: 09 October 2017 15:21
> > On Mon, Oct 09, 2017 at 01:49:26PM +0000, David Laight wrote:
> > > From: Sandipan Das
> > > > Sent: 09 October 2017 12:07
> > > > According to the GCC documentation, the behaviour of __builtin_clz()
> > > > and __builtin_clzl() is undefined if the value of the input argument
> > > > is zero. Without handling this special case, these builtins have been
> > > > used for emulating the following instructions:
> > > >   * Count Leading Zeros Word (cntlzw[.])
> > > >   * Count Leading Zeros Doubleword (cntlzd[.])
> > > >
> > > > This fixes the emulated behaviour of these instructions by adding an
> > > > additional check for this special case.
> > >
> > > Presumably the result is undefined because the underlying cpu
> > > instruction is used - and its return value is implementation defined.
> > 
> > It is undefined because the result is undefined, and the compiler
> > optimises based on that.  The return value of the builtin is undefined,
> > not implementation defined.
> > 
> > The patch is correct.
> 
> But the code you are emulating might be relying on the (un)defined value
> the cpu instruction gives for zero input rather than the input width.
> 
> Or, put another way, if the return value for a clz instruction with zero
> argument is undefined (as it is on x86 - intel and amd may differ) then the
> emulation can return any value since the code can't care.
> So the conditional is not needed.

This is about the behavior of the gcc builtin being undefined, rather 
than the actual cpu instruction itself.


- Naveen
Segher Boessenkool Oct. 9, 2017, 2:47 p.m. UTC | #6
On Mon, Oct 09, 2017 at 02:43:45PM +0000, David Laight wrote:
> From: Segher Boessenkool
> > Sent: 09 October 2017 15:21
> > On Mon, Oct 09, 2017 at 01:49:26PM +0000, David Laight wrote:
> > > From: Sandipan Das
> > > > Sent: 09 October 2017 12:07
> > > > According to the GCC documentation, the behaviour of __builtin_clz()
> > > > and __builtin_clzl() is undefined if the value of the input argument
> > > > is zero. Without handling this special case, these builtins have been
> > > > used for emulating the following instructions:
> > > >   * Count Leading Zeros Word (cntlzw[.])
> > > >   * Count Leading Zeros Doubleword (cntlzd[.])
> > > >
> > > > This fixes the emulated behaviour of these instructions by adding an
> > > > additional check for this special case.
> > >
> > > Presumably the result is undefined because the underlying cpu
> > > instruction is used - and its return value is implementation defined.
> > 
> > It is undefined because the result is undefined, and the compiler
> > optimises based on that.  The return value of the builtin is undefined,
> > not implementation defined.
> > 
> > The patch is correct.
> 
> But the code you are emulating might be relying on the (un)defined value
> the cpu instruction gives for zero input rather than the input width.
> 
> Or, put another way, if the return value for a clz instruction with zero
> argument is undefined (as it is on x86 - intel and amd may differ) then the
> emulation can return any value since the code can't care.
> So the conditional is not needed.

The cntlz[wd][.] insn has defined behaviour for 0 input.  It's just the
builtin that does not.  So we shouldn't call the builtin with an input
of 0 -- exactly what this patch does -- and that is all that was wrong.


Segher
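
The point made above (the instruction is defined for a zero input, the builtin is not) can be sketched outside the kernel as a pair of small helpers. This is only an illustration: the names emulate_cntlzw() and emulate_cntlzd() are hypothetical and do not appear in sstep.c, and __builtin_clzll() is used here instead of __builtin_clzl() on the assumption that the sketch may be built on a host where long is 32 bits wide.

```c
#include <stdint.h>

/* Hypothetical helper names; they are not taken from sstep.c. */

/* Emulate cntlzw: count leading zeros in a 32-bit word. */
static inline unsigned int emulate_cntlzw(uint32_t val)
{
	/* __builtin_clz(0) is undefined, so zero is handled explicitly. */
	return val ? (unsigned int)__builtin_clz(val) : 32;
}

/* Emulate cntlzd: count leading zeros in a 64-bit doubleword. */
static inline unsigned int emulate_cntlzd(uint64_t val)
{
	/* Assumes unsigned long long is 64 bits, as on GCC targets. */
	return val ? (unsigned int)__builtin_clzll(val) : 64;
}
```

With these helpers a zero input yields the operand width (32 or 64), and the builtin is only ever reached with a non-zero argument.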
David Laight Oct. 9, 2017, 3:24 p.m. UTC | #7
From: naveen.n.rao@linux.vnet.ibm.com
> Sent: 09 October 2017 15:48
...
> This is about the behavior of the gcc builtin being undefined, rather
> than the actual cpu instruction itself.

I'd have hoped that the gcc builtin just generated the expected cpu instruction.
So it is only undefined because it is very cpu dependent.

More problematic here would be any cpu flag register settings.
eg: x86 would set the 'Z' flag for zero input.

	David
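
On the flag question: for the record forms (cntlzw., cntlzd.) the Power ISA sets CR0 by comparing the result with zero as a signed value and copying the summary overflow bit from XER, so the flags depend on the count, not on the input being zero. A minimal sketch of that rule follows. The enum names and cr0_for_result() are hypothetical, and the sketch assumes the full 64-bit result is compared, as in 64-bit mode.

```c
#include <stdint.h>

/* Hypothetical names; bit values follow the CR0 field order LT, GT, EQ, SO. */
enum { CR0_LT = 8, CR0_GT = 4, CR0_EQ = 2, CR0_SO = 1 };

/* Compute the 4-bit CR0 field for a record-form result. */
static unsigned int cr0_for_result(int64_t result, int xer_so)
{
	/* SO is copied from XER; it does not depend on the result. */
	unsigned int cr0 = xer_so ? CR0_SO : 0;

	if (result < 0)
		cr0 |= CR0_LT;
	else if (result > 0)
		cr0 |= CR0_GT;
	else
		cr0 |= CR0_EQ;
	return cr0;
}
```

Under this rule a zero input to cntlzw. gives a result of 32 and therefore sets GT, while EQ is set only when the count itself is 0, that is, when the top bit of the input is set.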
Patch

diff --git a/arch/powerpc/lib/sstep.c b/arch/powerpc/lib/sstep.c
index 0f7e41bd7e88..ebbc0b92650c 100644
--- a/arch/powerpc/lib/sstep.c
+++ b/arch/powerpc/lib/sstep.c
@@ -1717,11 +1717,19 @@  int analyse_instr(struct instruction_op *op, const struct pt_regs *regs,
  * Logical instructions
  */
 		case 26:	/* cntlzw */
-			op->val = __builtin_clz((unsigned int) regs->gpr[rd]);
+			val = (unsigned int) regs->gpr[rd];
+			if (val == 0)
+				op->val = 32;
+			else
+				op->val = __builtin_clz(val);
 			goto logical_done;
 #ifdef __powerpc64__
 		case 58:	/* cntlzd */
-			op->val = __builtin_clzl(regs->gpr[rd]);
+			val = regs->gpr[rd];
+			if (val == 0)
+				op->val = 64;
+			else
+				op->val = __builtin_clzl(val);
 			goto logical_done;
 #endif
 		case 28:	/* and */