diff mbox series

ppc: Three floating point fixes

Message ID 1565983669-6886-1-git-send-email-pc@us.ibm.com
State New
Headers show
Series ppc: Three floating point fixes | expand

Commit Message

Paul A. Clarke Aug. 16, 2019, 7:27 p.m. UTC
From: "Paul A. Clarke" <pc@us.ibm.com>

- target/ppc/fpu_helper.c:
  - helper_todouble() was not properly converting INFINITY from 32 bit
  float to 64 bit double.
  - helper_todouble() was not properly converting any denormalized
  32 bit float to 64 bit double.

- GCC, as of version 8 or so, takes advantage of the hardware's
  implementation of the xscvdpspn instruction to optimize the following
  sequence:
    xscvdpspn vs0,vs1
    mffprwz   r8,f0
  ISA 3.0B has xscvdpspn leaving its result in word 1 of the target register,
  and mffprwz expecting its input to come from word 0 of the source register.
  This sequence fails with QEMU, as a shift is required between those two
  instructions.  However, the hardware splats the result to both word 0 and
  word 1 of its output register, so the shift is not necessary.
  Expect a future revision of the ISA to specify this behavior.

Signed-off-by: Paul A. Clarke <pc@us.ibm.com>
---
 target/ppc/fpu_helper.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

Comments

Aleksandar Markovic Aug. 16, 2019, 10:59 p.m. UTC | #1
16.08.2019. 21.28, "Paul A. Clarke" <pc@us.ibm.com> је написао/ла:
>
> From: "Paul A. Clarke" <pc@us.ibm.com>
>
> - target/ppc/fpu_helper.c:
>   - helper_todouble() was not properly converting INFINITY from 32 bit
>   float to 64 bit double.
>   - helper_todouble() was not properly converting any denormalized
>   32 bit float to 64 bit double.
>
> - GCC, as of version 8 or so, takes advantage of the hardware's
>   implementation of the xscvdpspn instruction to optimize the following
>   sequence:
>     xscvdpspn vs0,vs1
>     mffprwz   r8,f0
>   ISA 3.0B has xscvdpspn leaving its result in word 1 of the target
register,
>   and mffprwz expecting its input to come from word 0 of the source
register.
>   This sequence fails with QEMU, as a shift is required between those two
>   instructions.  However, the hardware splats the result to both word 0
and
>   word 1 of its output register, so the shift is not necessary.
>   Expect a future revision of the ISA to specify this behavior.
>

Hmmm... Isn't this a gcc bug (using undocumented hardware feature), given
everything you said here?

Sincerely,
Aleksandar

> Signed-off-by: Paul A. Clarke <pc@us.ibm.com>
> ---
>  target/ppc/fpu_helper.c | 9 +++++++--
>  1 file changed, 7 insertions(+), 2 deletions(-)
>
> diff --git a/target/ppc/fpu_helper.c b/target/ppc/fpu_helper.c
> index 5611cf0..82b5425 100644
> --- a/target/ppc/fpu_helper.c
> +++ b/target/ppc/fpu_helper.c
> @@ -62,13 +62,14 @@ uint64_t helper_todouble(uint32_t arg)
>          ret  = (uint64_t)extract32(arg, 30, 2) << 62;
>          ret |= ((extract32(arg, 30, 1) ^ 1) * (uint64_t)7) << 59;
>          ret |= (uint64_t)extract32(arg, 0, 30) << 29;
> +        ret |= (0x7ffULL * (extract32(arg, 23, 8) == 0xff)) << 52;
>      } else {
>          /* Zero or Denormalized operand.  */
>          ret = (uint64_t)extract32(arg, 31, 1) << 63;
>          if (unlikely(abs_arg != 0)) {
>              /* Denormalized operand.  */
>              int shift = clz32(abs_arg) - 9;
> -            int exp = -126 - shift + 1023;
> +            int exp = -127 - shift + 1023;
>              ret |= (uint64_t)exp << 52;
>              ret |= abs_arg << (shift + 29);
>          }
> @@ -2871,10 +2872,14 @@ void helper_xscvqpdp(CPUPPCState *env, uint32_t
opcode,
>
>  uint64_t helper_xscvdpspn(CPUPPCState *env, uint64_t xb)
>  {
> +    uint64_t result;
> +
>      float_status tstat = env->fp_status;
>      set_float_exception_flags(0, &tstat);
>
> -    return (uint64_t)float64_to_float32(xb, &tstat) << 32;
> +    result = (uint64_t)float64_to_float32(xb, &tstat);
> +    /* hardware replicates result to both words of the doubleword
result.  */
> +    return (result << 32) | result;
>  }
>
>  uint64_t helper_xscvspdpn(CPUPPCState *env, uint64_t xb)
> --
> 1.8.3.1
>
>
Aleksandar Markovic Aug. 17, 2019, 7:53 a.m. UTC | #2
17.08.2019. 00.59, "Aleksandar Markovic" <aleksandar.m.mail@gmail.com> је
написао/ла:
>
>
> 16.08.2019. 21.28, "Paul A. Clarke" <pc@us.ibm.com> је написао/ла:
> >
> > From: "Paul A. Clarke" <pc@us.ibm.com>
> >
> > - target/ppc/fpu_helper.c:
> >   - helper_todouble() was not properly converting INFINITY from 32 bit
> >   float to 64 bit double.
> >   - helper_todouble() was not properly converting any denormalized
> >   32 bit float to 64 bit double.
> >
> > - GCC, as of version 8 or so, takes advantage of the hardware's
> >   implementation of the xscvdpspn instruction to optimize the following
> >   sequence:
> >     xscvdpspn vs0,vs1
> >     mffprwz   r8,f0
> >   ISA 3.0B has xscvdpspn leaving its result in word 1 of the target
register,
> >   and mffprwz expecting its input to come from word 0 of the source
register.
> >   This sequence fails with QEMU, as a shift is required between those
two
> >   instructions.  However, the hardware splats the result to both word 0
and
> >   word 1 of its output register, so the shift is not necessary.
> >   Expect a future revision of the ISA to specify this behavior.
> >
>
> Hmmm... Isn't this a gcc bug (using undocumented hardware feature), given
everything you said here?
>

Paul, you are touching here a very sensitive area. If I were you, I would
most likely propose a similar patch, given thr info you presented. However,
in general, QEMU is a compiler-agnostic tool and should not depend on
compiler features, let alone fix its bugs, and therefore I think you should
more clearly spell out in your commit message that the code segment in
question is a workaround for an outright GCC bug, and just a kind of
"necessary evil" in given situation.

Let us also wait for David possibly coming up with the final verdict wrt
this issue.

Also, this patch should be split into two or three (infinity conversions
have nothing to do with xscvdpspn) - a patch deals with only one logical
unit.

Yours,
Aleksandar

> Sincerely,
> Aleksandar
>
> > Signed-off-by: Paul A. Clarke <pc@us.ibm.com>
> > ---
> >  target/ppc/fpu_helper.c | 9 +++++++--
> >  1 file changed, 7 insertions(+), 2 deletions(-)
> >
> > diff --git a/target/ppc/fpu_helper.c b/target/ppc/fpu_helper.c
> > index 5611cf0..82b5425 100644
> > --- a/target/ppc/fpu_helper.c
> > +++ b/target/ppc/fpu_helper.c
> > @@ -62,13 +62,14 @@ uint64_t helper_todouble(uint32_t arg)
> >          ret  = (uint64_t)extract32(arg, 30, 2) << 62;
> >          ret |= ((extract32(arg, 30, 1) ^ 1) * (uint64_t)7) << 59;
> >          ret |= (uint64_t)extract32(arg, 0, 30) << 29;
> > +        ret |= (0x7ffULL * (extract32(arg, 23, 8) == 0xff)) << 52;
> >      } else {
> >          /* Zero or Denormalized operand.  */
> >          ret = (uint64_t)extract32(arg, 31, 1) << 63;
> >          if (unlikely(abs_arg != 0)) {
> >              /* Denormalized operand.  */
> >              int shift = clz32(abs_arg) - 9;
> > -            int exp = -126 - shift + 1023;
> > +            int exp = -127 - shift + 1023;
> >              ret |= (uint64_t)exp << 52;
> >              ret |= abs_arg << (shift + 29);
> >          }
> > @@ -2871,10 +2872,14 @@ void helper_xscvqpdp(CPUPPCState *env, uint32_t
opcode,
> >
> >  uint64_t helper_xscvdpspn(CPUPPCState *env, uint64_t xb)
> >  {
> > +    uint64_t result;
> > +
> >      float_status tstat = env->fp_status;
> >      set_float_exception_flags(0, &tstat);
> >
> > -    return (uint64_t)float64_to_float32(xb, &tstat) << 32;
> > +    result = (uint64_t)float64_to_float32(xb, &tstat);
> > +    /* hardware replicates result to both words of the doubleword
result.  */
> > +    return (result << 32) | result;
> >  }
> >
> >  uint64_t helper_xscvspdpn(CPUPPCState *env, uint64_t xb)
> > --
> > 1.8.3.1
> >
> >
Richard Henderson Aug. 18, 2019, 8 a.m. UTC | #3
On 8/16/19 8:27 PM, Paul A. Clarke wrote:
> From: "Paul A. Clarke" <pc@us.ibm.com>
> 
> - target/ppc/fpu_helper.c:
>   - helper_todouble() was not properly converting INFINITY from 32 bit
>   float to 64 bit double.
>   - helper_todouble() was not properly converting any denormalized
>   32 bit float to 64 bit double.

These two would seem to be my fault:
Fixes: 86c0cab11aa (target/ppc: Use non-arithmetic conversions for fp load/store)

> diff --git a/target/ppc/fpu_helper.c b/target/ppc/fpu_helper.c
> index 5611cf0..82b5425 100644
> --- a/target/ppc/fpu_helper.c
> +++ b/target/ppc/fpu_helper.c
> @@ -62,13 +62,14 @@ uint64_t helper_todouble(uint32_t arg)
>          ret  = (uint64_t)extract32(arg, 30, 2) << 62;
>          ret |= ((extract32(arg, 30, 1) ^ 1) * (uint64_t)7) << 59;
>          ret |= (uint64_t)extract32(arg, 0, 30) << 29;
> +        ret |= (0x7ffULL * (extract32(arg, 23, 8) == 0xff)) << 52;

Since the overwrites bits set two lines above,
I think this would be better as

    if (likely(abs_arg >= 0x00800000)) {
        /* Normalized operand, or Inf or NaN.  */
        if (unlikely(extract32(arg, 23, 8) == 0xff)) {
            /* Inf or NaN.  */
            ...
        } else {
            /* Normalized operand.  */


>      } else {
>          /* Zero or Denormalized operand.  */
>          ret = (uint64_t)extract32(arg, 31, 1) << 63;
>          if (unlikely(abs_arg != 0)) {
>              /* Denormalized operand.  */
>              int shift = clz32(abs_arg) - 9;
> -            int exp = -126 - shift + 1023;
> +            int exp = -127 - shift + 1023;
>              ret |= (uint64_t)exp << 52;
>              ret |= abs_arg << (shift + 29);

Hmm.  The manual says -126, but it also shifts the fraction by a different
amount, such that the implicit bit is 1.

What I also don't see here is where the most significant bit is removed from
the fraction, a-la "FRT[5:63] = frac[1:52]" in the manual.

The soft-float code from which this was probably copied did this by shifting
the fraction such that the msb overlaps the exponent, biasing the exponent by
-1, and then using an add instead of an or to simultaneously remove the bias,
swallow the msb, and include the lower bits unchanged.

So it looks like this should be

    /*
     * Shift fraction so that the msb is in the implicit bit position.
     * Thus shift is in the range [1:23].
     */
    int shift = clz32(abs_arg) - 8;
    /*
     * The first 3 terms compute the float64 exponent.  We then bias
     * this result by -1 so that we can swallow the implicit bit below.
     */
    int exp = -126 - shift + 1023 - 1;
    ret |= (uint64_t)exp << 52;
    ret += (uint64_t)abs_arg << (52 - 23 + shift);

Hmm.  I see another bug in the existing code whereby abs_arg is not cast before
shifting.  This truncation is probably how you're seeing correct results with
your patch, for denormals containing only one bit set, e.g. FLT_TRUE_MIN.

It would be good to test other denormals, like 0x00055555.


> @@ -2871,10 +2872,14 @@ void helper_xscvqpdp(CPUPPCState *env, uint32_t opcode,
>  
>  uint64_t helper_xscvdpspn(CPUPPCState *env, uint64_t xb)
>  {
> +    uint64_t result;
> +
>      float_status tstat = env->fp_status;
>      set_float_exception_flags(0, &tstat);
>  
> -    return (uint64_t)float64_to_float32(xb, &tstat) << 32;
> +    result = (uint64_t)float64_to_float32(xb, &tstat);
> +    /* hardware replicates result to both words of the doubleword result.  */
> +    return (result << 32) | result;
>  }

This definitely should be a separate patch.  The comment should include some
language about this behaviour being required by a future ISA revision.


r~
Richard Henderson Aug. 18, 2019, 8:10 a.m. UTC | #4
On 8/16/19 11:59 PM, Aleksandar Markovic wrote:
>> From: "Paul A. Clarke" <pc@us.ibm.com>
...
>>   ISA 3.0B has xscvdpspn leaving its result in word 1 of the target
> register,
>>   and mffprwz expecting its input to come from word 0 of the source
> register.
>>   This sequence fails with QEMU, as a shift is required between those two
>>   instructions.  However, the hardware splats the result to both word 0
> and
>>   word 1 of its output register, so the shift is not necessary.
>>   Expect a future revision of the ISA to specify this behavior.
>>
> 
> Hmmm... Isn't this a gcc bug (using undocumented hardware feature), given
> everything you said here?

The key here is "expect a future revision of the ISA to specify this behavior".

It's clearly within IBM's purview to adjust the specification to document a
previously undocumented hardware feature.


r~
Aleksandar Markovic Aug. 18, 2019, 8:59 p.m. UTC | #5
18.08.2019. 10.10, "Richard Henderson" <richard.henderson@linaro.org> је
написао/ла:
>
> On 8/16/19 11:59 PM, Aleksandar Markovic wrote:
> >> From: "Paul A. Clarke" <pc@us.ibm.com>
> ...
> >>   ISA 3.0B has xscvdpspn leaving its result in word 1 of the target
> > register,
> >>   and mffprwz expecting its input to come from word 0 of the source
> > register.
> >>   This sequence fails with QEMU, as a shift is required between those
two
> >>   instructions.  However, the hardware splats the result to both word 0
> > and
> >>   word 1 of its output register, so the shift is not necessary.
> >>   Expect a future revision of the ISA to specify this behavior.
> >>
> >
> > Hmmm... Isn't this a gcc bug (using undocumented hardware feature),
given
> > everything you said here?
>
> The key here is "expect a future revision of the ISA to specify this
behavior".
>
> It's clearly within IBM's purview to adjust the specification to document
a
> previously undocumented hardware feature.
>

By no means, yes, the key is in ISA documentation. But, the impression that
full original commit message conveys is that the main reason for change is
gcc behavior. If we accepted in general that gcc behavior determines QEMU
behavior, I am afraid we would be on a very slippery slope - therefore I
think the commit message (and possible code comment) should, in my opinion,
mention ISA docs as the central reason for change. Paul, is there any
tentative release date of the new ISA specification?

Aleksandar

>
> r~
David Gibson Aug. 19, 2019, 6:28 a.m. UTC | #6
On Sun, Aug 18, 2019 at 10:59:01PM +0200, Aleksandar Markovic wrote:
> 18.08.2019. 10.10, "Richard Henderson" <richard.henderson@linaro.org> је
> написао/ла:
> >
> > On 8/16/19 11:59 PM, Aleksandar Markovic wrote:
> > >> From: "Paul A. Clarke" <pc@us.ibm.com>
> > ...
> > >>   ISA 3.0B has xscvdpspn leaving its result in word 1 of the target
> > > register,
> > >>   and mffprwz expecting its input to come from word 0 of the source
> > > register.
> > >>   This sequence fails with QEMU, as a shift is required between those
> two
> > >>   instructions.  However, the hardware splats the result to both word 0
> > > and
> > >>   word 1 of its output register, so the shift is not necessary.
> > >>   Expect a future revision of the ISA to specify this behavior.
> > >>
> > >
> > > Hmmm... Isn't this a gcc bug (using undocumented hardware feature),
> given
> > > everything you said here?
> >
> > The key here is "expect a future revision of the ISA to specify this
> behavior".
> >
> > It's clearly within IBM's purview to adjust the specification to document
> a
> > previously undocumented hardware feature.
> >
> 
> By no means, yes, the key is in ISA documentation. But, the impression that
> full original commit message conveys is that the main reason for change is
> gcc behavior. If we accepted in general that gcc behavior determines QEMU
> behavior, I am afraid we would be on a very slippery slope - therefore I
> think the commit message (and possible code comment) should, in my opinion,
> mention ISA docs as the central reason for change. Paul, is there any
> tentative release date of the new ISA specification?

It's not really a question of gcc behaviour, it's a question of actual
cpu behaviour versus ISA document.  Which one qemu should follow is
somewhat debatable, but it sounds here like the ISA will be updated to
match the cpu, which weights it heavily in favour of mimicing the
actual cpu.
David Gibson Aug. 19, 2019, 6:30 a.m. UTC | #7
On Fri, Aug 16, 2019 at 02:27:49PM -0500, Paul A. Clarke wrote:
> From: "Paul A. Clarke" <pc@us.ibm.com>
> 
> - target/ppc/fpu_helper.c:
>   - helper_todouble() was not properly converting INFINITY from 32 bit
>   float to 64 bit double.
>   - helper_todouble() was not properly converting any denormalized
>   32 bit float to 64 bit double.
> 
> - GCC, as of version 8 or so, takes advantage of the hardware's
>   implementation of the xscvdpspn instruction to optimize the following
>   sequence:
>     xscvdpspn vs0,vs1
>     mffprwz   r8,f0
>   ISA 3.0B has xscvdpspn leaving its result in word 1 of the target register,
>   and mffprwz expecting its input to come from word 0 of the source register.
>   This sequence fails with QEMU, as a shift is required between those two
>   instructions.  However, the hardware splats the result to both word 0 and
>   word 1 of its output register, so the shift is not necessary.
>   Expect a future revision of the ISA to specify this behavior.

As well as addressing Richard's comments, I'd prefer to see each of
the 3 fixes here in separate patches.  The code changes may be tiny,
but I find the fpu emulation code cryptic at the best of times, and
making it clear which change is addressing which bug would help.

> Signed-off-by: Paul A. Clarke <pc@us.ibm.com>
> ---
>  target/ppc/fpu_helper.c | 9 +++++++--
>  1 file changed, 7 insertions(+), 2 deletions(-)
> 
> diff --git a/target/ppc/fpu_helper.c b/target/ppc/fpu_helper.c
> index 5611cf0..82b5425 100644
> --- a/target/ppc/fpu_helper.c
> +++ b/target/ppc/fpu_helper.c
> @@ -62,13 +62,14 @@ uint64_t helper_todouble(uint32_t arg)
>          ret  = (uint64_t)extract32(arg, 30, 2) << 62;
>          ret |= ((extract32(arg, 30, 1) ^ 1) * (uint64_t)7) << 59;
>          ret |= (uint64_t)extract32(arg, 0, 30) << 29;
> +        ret |= (0x7ffULL * (extract32(arg, 23, 8) == 0xff)) << 52;
>      } else {
>          /* Zero or Denormalized operand.  */
>          ret = (uint64_t)extract32(arg, 31, 1) << 63;
>          if (unlikely(abs_arg != 0)) {
>              /* Denormalized operand.  */
>              int shift = clz32(abs_arg) - 9;
> -            int exp = -126 - shift + 1023;
> +            int exp = -127 - shift + 1023;
>              ret |= (uint64_t)exp << 52;
>              ret |= abs_arg << (shift + 29);
>          }
> @@ -2871,10 +2872,14 @@ void helper_xscvqpdp(CPUPPCState *env, uint32_t opcode,
>  
>  uint64_t helper_xscvdpspn(CPUPPCState *env, uint64_t xb)
>  {
> +    uint64_t result;
> +
>      float_status tstat = env->fp_status;
>      set_float_exception_flags(0, &tstat);
>  
> -    return (uint64_t)float64_to_float32(xb, &tstat) << 32;
> +    result = (uint64_t)float64_to_float32(xb, &tstat);
> +    /* hardware replicates result to both words of the doubleword result.  */
> +    return (result << 32) | result;
>  }
>  
>  uint64_t helper_xscvspdpn(CPUPPCState *env, uint64_t xb)
Aleksandar Markovic Aug. 19, 2019, 6:44 a.m. UTC | #8
19.08.2019. 08.30, "David Gibson" <david@gibson.dropbear.id.au> је
написао/ла:
>
> On Sun, Aug 18, 2019 at 10:59:01PM +0200, Aleksandar Markovic wrote:
> > 18.08.2019. 10.10, "Richard Henderson" <richard.henderson@linaro.org> је
> > написао/ла:
> > >
> > > On 8/16/19 11:59 PM, Aleksandar Markovic wrote:
> > > >> From: "Paul A. Clarke" <pc@us.ibm.com>
> > > ...
> > > >>   ISA 3.0B has xscvdpspn leaving its result in word 1 of the target
> > > > register,
> > > >>   and mffprwz expecting its input to come from word 0 of the source
> > > > register.
> > > >>   This sequence fails with QEMU, as a shift is required between
those
> > two
> > > >>   instructions.  However, the hardware splats the result to both
word 0
> > > > and
> > > >>   word 1 of its output register, so the shift is not necessary.
> > > >>   Expect a future revision of the ISA to specify this behavior.
> > > >>
> > > >
> > > > Hmmm... Isn't this a gcc bug (using undocumented hardware feature),
> > given
> > > > everything you said here?
> > >
> > > The key here is "expect a future revision of the ISA to specify this
> > behavior".
> > >
> > > It's clearly within IBM's purview to adjust the specification to
document
> > a
> > > previously undocumented hardware feature.
> > >
> >
> > By no means, yes, the key is in ISA documentation. But, the impression
that
> > full original commit message conveys is that the main reason for change
is
> > gcc behavior. If we accepted in general that gcc behavior determines
QEMU
> > behavior, I am afraid we would be on a very slippery slope - therefore I
> > think the commit message (and possible code comment) should, in my
opinion,
> > mention ISA docs as the central reason for change. Paul, is there any
> > tentative release date of the new ISA specification?
>
> It's not really a question of gcc behaviour, it's a question of actual
> cpu behaviour versus ISA document.  Which one qemu should follow is
> somewhat debatable, but it sounds here like the ISA will be updated to
> match the cpu, which weights it heavily in favour of mimicing the
> actual cpu.
>

This sounds right to me.

Aleksandar

> --
> David Gibson                    | I'll have my music baroque, and my code
> david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_
_other_
>                                 | _way_ _around_!
> http://www.ozlabs.org/~dgibson
Paul A. Clarke Aug. 19, 2019, 5:13 p.m. UTC | #9
On 8/19/19 1:44 AM, Aleksandar Markovic wrote:
> 19.08.2019. 08.30, "David Gibson" <david@gibson.dropbear.id.au> је
> написао/ла:
>>
>> On Sun, Aug 18, 2019 at 10:59:01PM +0200, Aleksandar Markovic wrote:
>>> 18.08.2019. 10.10, "Richard Henderson" <richard.henderson@linaro.org> је
>>> написао/ла:
>>>>
>>>> On 8/16/19 11:59 PM, Aleksandar Markovic wrote:
>>>>>> From: "Paul A. Clarke" <pc@us.ibm.com>
>>>> ...
>>>>>>   ISA 3.0B has xscvdpspn leaving its result in word 1 of the target
>>>>> register,
>>>>>>   and mffprwz expecting its input to come from word 0 of the source
>>>>> register.
>>>>>>   This sequence fails with QEMU, as a shift is required between
> those
>>> two
>>>>>>   instructions.  However, the hardware splats the result to both
> word 0
>>>>> and
>>>>>>   word 1 of its output register, so the shift is not necessary.
>>>>>>   Expect a future revision of the ISA to specify this behavior.
>>>>>>
>>>>>
>>>>> Hmmm... Isn't this a gcc bug (using undocumented hardware feature),
>>> given
>>>>> everything you said here?
>>>>
>>>> The key here is "expect a future revision of the ISA to specify this
>>> behavior".
>>>>
>>>> It's clearly within IBM's purview to adjust the specification to
> document
>>> a
>>>> previously undocumented hardware feature.
>>>>
>>>
>>> By no means, yes, the key is in ISA documentation. But, the impression
> that
>>> full original commit message conveys is that the main reason for change
> is
>>> gcc behavior. If we accepted in general that gcc behavior determines
> QEMU
>>> behavior, I am afraid we would be on a very slippery slope - therefore I
>>> think the commit message (and possible code comment) should, in my
> opinion,
>>> mention ISA docs as the central reason for change. Paul, is there any
>>> tentative release date of the new ISA specification?
>>
>> It's not really a question of gcc behaviour, it's a question of actual
>> cpu behaviour versus ISA document.  Which one qemu should follow is
>> somewhat debatable, but it sounds here like the ISA will be updated to
>> match the cpu, which weights it heavily in favour of mimicing the
>> actual cpu.
>>
> 
> This sounds right to me.

Thanks for the reviews and great discussion.

While not yet part of a published version of the ISA, I did find the behavior documented in the User's Manuals for the POWER8 and POWER9 processors:

https://www-355.ibm.com/systems/power/openpower/
"Public Documents"
- "POWER9 Processor User's Manual"
- "POWER8 Processor User's Manual for the SCM"

POWER9 Processor User's Manual 
4. Power Architecture Compliance
4.3 Floating-Point Processor (FP, VMX, and VSX)
4.3.7 Floating-Point Invalid Forms and Undefined Conditions

POWER8 Processor User's Manual for the Single-Chip Module
3. Power Architecture Compliance
3.2 Floating-Point Processor (FP, VMX, and VSX)
3.2.9 Floating-Point Invalid Forms and Undefined Conditions

In a bullet:
- VSX scalar convert from double-precision to single-precision (xscvdpsp, xscvdpspn).
VSR[32:63] is set to VSR[0:31].

I have not confirmed when the new revision of the ISA will be published, but it's on somebody's "to do" list.

I will respin the patch as 3 independent patches, and include a reference to the above documents for the change under discussion here.  (The other two changes may take a bit more time, because like David, I find the FPU emulation code cryptic.  :-/  )

These issues were found while running Glibc's test suite for "math", and there are still a *LOT* of QEMU-only FAILs, so I may be back again with suggested fixes or questions.  :-)

PC
David Gibson Aug. 20, 2019, 7:31 a.m. UTC | #10
On Mon, Aug 19, 2019 at 12:13:34PM -0500, Paul Clarke wrote:
> On 8/19/19 1:44 AM, Aleksandar Markovic wrote:
> > 19.08.2019. 08.30, "David Gibson" <david@gibson.dropbear.id.au> је
> > написао/ла:
> >>
> >> On Sun, Aug 18, 2019 at 10:59:01PM +0200, Aleksandar Markovic wrote:
> >>> 18.08.2019. 10.10, "Richard Henderson" <richard.henderson@linaro.org> је
> >>> написао/ла:
> >>>>
> >>>> On 8/16/19 11:59 PM, Aleksandar Markovic wrote:
> >>>>>> From: "Paul A. Clarke" <pc@us.ibm.com>
> >>>> ...
> >>>>>>   ISA 3.0B has xscvdpspn leaving its result in word 1 of the target
> >>>>> register,
> >>>>>>   and mffprwz expecting its input to come from word 0 of the source
> >>>>> register.
> >>>>>>   This sequence fails with QEMU, as a shift is required between
> > those
> >>> two
> >>>>>>   instructions.  However, the hardware splats the result to both
> > word 0
> >>>>> and
> >>>>>>   word 1 of its output register, so the shift is not necessary.
> >>>>>>   Expect a future revision of the ISA to specify this behavior.
> >>>>>>
> >>>>>
> >>>>> Hmmm... Isn't this a gcc bug (using undocumented hardware feature),
> >>> given
> >>>>> everything you said here?
> >>>>
> >>>> The key here is "expect a future revision of the ISA to specify this
> >>> behavior".
> >>>>
> >>>> It's clearly within IBM's purview to adjust the specification to
> > document
> >>> a
> >>>> previously undocumented hardware feature.
> >>>>
> >>>
> >>> By no means, yes, the key is in ISA documentation. But, the impression
> > that
> >>> full original commit message conveys is that the main reason for change
> > is
> >>> gcc behavior. If we accepted in general that gcc behavior determines
> > QEMU
> >>> behavior, I am afraid we would be on a very slippery slope - therefore I
> >>> think the commit message (and possible code comment) should, in my
> > opinion,
> >>> mention ISA docs as the central reason for change. Paul, is there any
> >>> tentative release date of the new ISA specification?
> >>
> >> It's not really a question of gcc behaviour, it's a question of actual
> >> cpu behaviour versus ISA document.  Which one qemu should follow is
> >> somewhat debatable, but it sounds here like the ISA will be updated to
> >> match the cpu, which weights it heavily in favour of mimicing the
> >> actual cpu.
> >>
> > 
> > This sounds right to me.
> 
> Thanks for the reviews and great discussion.
> 
> While not yet part of a published version of the ISA, I did find the behavior documented in the User's Manuals for the POWER8 and POWER9 processors:
> 
> https://www-355.ibm.com/systems/power/openpower/
> "Public Documents"
> - "POWER9 Processor User's Manual"
> - "POWER8 Processor User's Manual for the SCM"
> 
> POWER9 Processor User's Manual 
> 4. Power Architecture Compliance
> 4.3 Floating-Point Processor (FP, VMX, and VSX)
> 4.3.7 Floating-Point Invalid Forms and Undefined Conditions
> 
> POWER8 Processor User's Manual for the Single-Chip Module
> 3. Power Architecture Compliance
> 3.2 Floating-Point Processor (FP, VMX, and VSX)
> 3.2.9 Floating-Point Invalid Forms and Undefined Conditions
> 
> In a bullet:
> - VSX scalar convert from double-precision to single-precision (xscvdpsp, xscvdpspn).
> VSR[32:63] is set to VSR[0:31].
> 
> I have not confirmed when the new revision of the ISA will be published, but it's on somebody's "to do" list.
> 
> I will respin the patch as 3 independent patches, and include a reference to the above documents for the change under discussion here.  (The other two changes may take a bit more time, because like David, I find the FPU emulation code cryptic.  :-/  )
> 
> These issues were found while running Glibc's test suite for "math",
> and there are still a *LOT* of QEMU-only FAILs, so I may be back
> again with suggested fixes or questions.  :-)

That doesn't greatly surprise me, TCG's ppc target stuff is only so-so
tested, TBH.
Peter Maydell Aug. 20, 2019, 10:35 a.m. UTC | #11
On Tue, 20 Aug 2019 at 08:36, David Gibson <david@gibson.dropbear.id.au> wrote:
> On Mon, Aug 19, 2019 at 12:13:34PM -0500, Paul Clarke wrote:
> > These issues were found while running Glibc's test suite for "math",
> > and there are still a *LOT* of QEMU-only FAILs, so I may be back
> > again with suggested fixes or questions.  :-)
>
> That doesn't greatly surprise me, TCG's ppc target stuff is only so-so
> tested, TBH.

You might also consider using/extending the risu test cases for
ppc64 -- individual checks of each insn against a known-good
implementation can be easier to track down bugs than trying
to figure out why a higher-level test suite like the glibc one
has reported a failure, IME. (There are already risu patterns
for XSCVDPSP and XSCVDPSPN, so I think that bug at least ought
to be found by risu if you run it against the right h/w as
known-good reference...)

thanks
-- PMM
David Gibson Aug. 22, 2019, 3:02 a.m. UTC | #12
On Tue, Aug 20, 2019 at 11:35:29AM +0100, Peter Maydell wrote:
> On Tue, 20 Aug 2019 at 08:36, David Gibson <david@gibson.dropbear.id.au> wrote:
> > On Mon, Aug 19, 2019 at 12:13:34PM -0500, Paul Clarke wrote:
> > > These issues were found while running Glibc's test suite for "math",
> > > and there are still a *LOT* of QEMU-only FAILs, so I may be back
> > > again with suggested fixes or questions.  :-)
> >
> > That doesn't greatly surprise me, TCG's ppc target stuff is only so-so
> > tested, TBH.
> 
> You might also consider using/extending the risu test cases for
> ppc64 -- individual checks of each insn against a known-good
> implementation can be easier to track down bugs than trying
> to figure out why a higher-level test suite like the glibc one
> has reported a failure, IME. (There are already risu patterns
> for XSCVDPSP and XSCVDPSPN, so I think that bug at least ought
> to be found by risu if you run it against the right h/w as
> known-good reference...)

Oh, I'd love to.  I tried running risu once, it failed cryptically and
I haven't had time to investigate deeper.
diff mbox series

Patch

diff --git a/target/ppc/fpu_helper.c b/target/ppc/fpu_helper.c
index 5611cf0..82b5425 100644
--- a/target/ppc/fpu_helper.c
+++ b/target/ppc/fpu_helper.c
@@ -62,13 +62,14 @@  uint64_t helper_todouble(uint32_t arg)
         ret  = (uint64_t)extract32(arg, 30, 2) << 62;
         ret |= ((extract32(arg, 30, 1) ^ 1) * (uint64_t)7) << 59;
         ret |= (uint64_t)extract32(arg, 0, 30) << 29;
+        ret |= (0x7ffULL * (extract32(arg, 23, 8) == 0xff)) << 52;
     } else {
         /* Zero or Denormalized operand.  */
         ret = (uint64_t)extract32(arg, 31, 1) << 63;
         if (unlikely(abs_arg != 0)) {
             /* Denormalized operand.  */
             int shift = clz32(abs_arg) - 9;
-            int exp = -126 - shift + 1023;
+            int exp = -127 - shift + 1023;
             ret |= (uint64_t)exp << 52;
             ret |= abs_arg << (shift + 29);
         }
@@ -2871,10 +2872,14 @@  void helper_xscvqpdp(CPUPPCState *env, uint32_t opcode,
 
 uint64_t helper_xscvdpspn(CPUPPCState *env, uint64_t xb)
 {
+    uint64_t result;
+
     float_status tstat = env->fp_status;
     set_float_exception_flags(0, &tstat);
 
-    return (uint64_t)float64_to_float32(xb, &tstat) << 32;
+    result = (uint64_t)float64_to_float32(xb, &tstat);
+    /* hardware replicates result to both words of the doubleword result.  */
+    return (result << 32) | result;
 }
 
 uint64_t helper_xscvspdpn(CPUPPCState *env, uint64_t xb)