
[PR81228] Fixes ICE by adding LTGT in vec_cmp<mode><v_cmp_result>.

Message ID DB5PR0801MB2742FFC7E11DD25875D0C86AE7BF0@DB5PR0801MB2742.eurprd08.prod.outlook.com
State New

Commit Message

Bin Cheng July 28, 2017, 11:37 a.m. UTC
Hi,
This simple patch fixes the ICE by adding LTGT to the vec_cmp<mode><v_cmp_result> pattern.
I also changed the original test case into a compile-only one, since -fno-trapping-math
should not be used in general.
Bootstrapped and tested on AArch64; test results checked for x86_64.  Is it OK?  I would
also need to backport it to gcc-7-branch.

Thanks,
bin
2017-07-27  Bin Cheng  <bin.cheng@arm.com>

	PR target/81228
	* config/aarch64/aarch64-simd.md (vec_cmp<mode><v_cmp_result>): Add
	LTGT.

gcc/testsuite/ChangeLog
2017-07-27  Bin Cheng  <bin.cheng@arm.com>

	PR target/81228
	* gcc.dg/pr81228.c: New.

Comments

Richard Sandiford July 28, 2017, 11:55 a.m. UTC | #1
Bin Cheng <Bin.Cheng@arm.com> writes:
> Hi,
> This simple patch fixes the ICE by adding LTGT in
> vec_cmp<mode><v_cmp_result> pattern.
> I also modified the original test case into a compilation one since
> -fno-wrapping-math
> should not be used in general.
> Bootstrap and test on AArch64, test result check for x86_64.  Is it OK?
> I would also need to
> backport it to gcc-7-branch.
>
> Thanks,
> bin
> 2017-07-27  Bin Cheng  <bin.cheng@arm.com>
>
> 	PR target/81228
> 	* config/aarch64/aarch64-simd.md (vec_cmp<mode><v_cmp_result>): Add
> 	LTGT.
>
> gcc/testsuite/ChangeLog
> 2017-07-27  Bin Cheng  <bin.cheng@arm.com>
>
> 	PR target/81228
> 	* gcc.dg/pr81228.c: New.
>
> diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
> index 011fcec0..9cd67a2 100644
> --- a/gcc/config/aarch64/aarch64-simd.md
> +++ b/gcc/config/aarch64/aarch64-simd.md
> @@ -2524,6 +2524,7 @@
>      case EQ:
>        comparison = gen_aarch64_cmeq<mode>;
>        break;
> +    case LTGT:
>      case UNEQ:
>      case ORDERED:
>      case UNORDERED:
> @@ -2571,6 +2572,7 @@
>        emit_insn (comparison (operands[0], operands[2], operands[3]));
>        break;
>  
> +    case LTGT:
>      case UNEQ:
>        /* We first check (a > b ||  b > a) which is !UNEQ, inverting
>  	 this result will then give us (a == b || a UNORDERED b).  */
> @@ -2578,7 +2580,8 @@
>  					 operands[2], operands[3]));
>        emit_insn (gen_aarch64_cmgt<mode> (tmp, operands[3], operands[2]));
>        emit_insn (gen_ior<v_cmp_result>3 (operands[0], operands[0], tmp));
> -      emit_insn (gen_one_cmpl<v_cmp_result>2 (operands[0], operands[0]));
> +      if (code == UNEQ)
> +	emit_insn (gen_one_cmpl<v_cmp_result>2 (operands[0], operands[0]));
>        break;

AFAIK this is still a grey area, but I think (ltgt x y) is supposed to
be a trapping operation, i.e. it's closer to (ior (lt x y) (gt x y))
than (not (uneq x y)).  See e.g. the handling in may_trap_p_1, where
LTGT is handled like LT and GT rather than like UNEQ.

See also: https://gcc.gnu.org/ml/gcc-patches/2015-02/msg00583.html
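A minimal scalar sketch of that distinction (not part of the patch; it assumes FP
exceptions are observable via <fenv.h> and that the comparisons are not folded away):

#define _GNU_SOURCE
#include <fenv.h>
#include <stdio.h>

int
main (void)
{
  volatile double a = __builtin_nan (""), b = 1.0;
  int quiet_raised, raw_raised;

  /* Quiet form: __builtin_islessgreater must not raise FE_INVALID
     for quiet NaN operands.  */
  feclearexcept (FE_ALL_EXCEPT);
  volatile int quiet = __builtin_islessgreater (a, b);
  quiet_raised = fetestexcept (FE_INVALID) != 0;

  /* Raw < and > raise FE_INVALID for NaN operands, which matches the
     (ior (lt x y) (gt x y)) reading of LTGT.  */
  feclearexcept (FE_ALL_EXCEPT);
  volatile int raw = (a < b) || (a > b);
  raw_raised = fetestexcept (FE_INVALID) != 0;

  printf ("islessgreater=%d (FE_INVALID=%d), <||>=%d (FE_INVALID=%d)\n",
          quiet, quiet_raised, raw, raw_raised);
  return 0;
}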

Thanks,
Richard
Bin.Cheng July 28, 2017, 12:07 p.m. UTC | #2
On Fri, Jul 28, 2017 at 12:55 PM, Richard Sandiford
<richard.sandiford@linaro.org> wrote:
> Bin Cheng <Bin.Cheng@arm.com> writes:
>> Hi,
>> This simple patch fixes the ICE by adding LTGT in
>> vec_cmp<mode><v_cmp_result> pattern.
>> I also modified the original test case into a compilation one since
>> -fno-wrapping-math
>> should not be used in general.
>> Bootstrap and test on AArch64, test result check for x86_64.  Is it OK?
>> I would also need to
>> backport it to gcc-7-branch.
>>
>> Thanks,
>> bin
>> 2017-07-27  Bin Cheng  <bin.cheng@arm.com>
>>
>>       PR target/81228
>>       * config/aarch64/aarch64-simd.md (vec_cmp<mode><v_cmp_result>): Add
>>       LTGT.
>>
>> gcc/testsuite/ChangeLog
>> 2017-07-27  Bin Cheng  <bin.cheng@arm.com>
>>
>>       PR target/81228
>>       * gcc.dg/pr81228.c: New.
>>
>> diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
>> index 011fcec0..9cd67a2 100644
>> --- a/gcc/config/aarch64/aarch64-simd.md
>> +++ b/gcc/config/aarch64/aarch64-simd.md
>> @@ -2524,6 +2524,7 @@
>>      case EQ:
>>        comparison = gen_aarch64_cmeq<mode>;
>>        break;
>> +    case LTGT:
>>      case UNEQ:
>>      case ORDERED:
>>      case UNORDERED:
>> @@ -2571,6 +2572,7 @@
>>        emit_insn (comparison (operands[0], operands[2], operands[3]));
>>        break;
>>
>> +    case LTGT:
>>      case UNEQ:
>>        /* We first check (a > b ||  b > a) which is !UNEQ, inverting
>>        this result will then give us (a == b || a UNORDERED b).  */
>> @@ -2578,7 +2580,8 @@
>>                                        operands[2], operands[3]));
>>        emit_insn (gen_aarch64_cmgt<mode> (tmp, operands[3], operands[2]));
>>        emit_insn (gen_ior<v_cmp_result>3 (operands[0], operands[0], tmp));
>> -      emit_insn (gen_one_cmpl<v_cmp_result>2 (operands[0], operands[0]));
>> +      if (code == UNEQ)
>> +     emit_insn (gen_one_cmpl<v_cmp_result>2 (operands[0], operands[0]));
>>        break;
>
> AFAIK this is still a grey area, but I think (ltgt x y) is supposed to
> be a trapping operation, i.e. it's closer to (ior (lt x y) (gt x y))
> than (not (uneq x y)).  See e.g. the handling in may_trap_p_1, where
> LTGT is handled like LT and GT rather than like UNEQ.
>
> See also: https://gcc.gnu.org/ml/gcc-patches/2015-02/msg00583.html
Thanks for pointing me to this; I don't know much about the floating-point side here.
As for the change, the code now looks like:

    case LTGT:
    case UNEQ:
      /* We first check (a > b ||  b > a) which is !UNEQ, inverting
     this result will then give us (a == b || a UNORDERED b).  */
      emit_insn (gen_aarch64_cmgt<mode> (operands[0],
                     operands[2], operands[3]));
      emit_insn (gen_aarch64_cmgt<mode> (tmp, operands[3], operands[2]));
      emit_insn (gen_ior<v_cmp_result>3 (operands[0], operands[0], tmp));
      if (code == UNEQ)
    emit_insn (gen_one_cmpl<v_cmp_result>2 (operands[0], operands[0]));
      break;

So (a > b || b > a) is generated for LTGT, as you suggested?  We only invert the
result for UNEQ here, though.
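For reference, the identity the existing comment relies on -- !(a > b || b > a) has
the same truth value as (a == b || a unordered b) -- can be checked with a small
scalar sketch (ignoring FP exception behaviour):

#include <assert.h>
#include <math.h>

/* !(x > y || y > x) should match (x == y || x unordered y)
   for every combination, including NaNs.  */
static void
check (double x, double y)
{
  int not_ltgt = !(x > y || y > x);
  int uneq = x == y || isunordered (x, y);
  assert (not_ltgt == uneq);
}

int
main (void)
{
  double n = __builtin_nan ("");
  check (1.0, 2.0);
  check (2.0, 1.0);
  check (1.0, 1.0);
  check (n, 1.0);
  check (1.0, n);
  check (n, n);
  return 0;
}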

Thanks,
bin
>
> Thanks,
> Richard
Richard Sandiford July 28, 2017, 2:15 p.m. UTC | #3
"Bin.Cheng" <amker.cheng@gmail.com> writes:
> On Fri, Jul 28, 2017 at 12:55 PM, Richard Sandiford
> <richard.sandiford@linaro.org> wrote:
>> Bin Cheng <Bin.Cheng@arm.com> writes:
>>> Hi,
>>> This simple patch fixes the ICE by adding LTGT in
>>> vec_cmp<mode><v_cmp_result> pattern.
>>> I also modified the original test case into a compilation one since
>>> -fno-wrapping-math
>>> should not be used in general.
>>> Bootstrap and test on AArch64, test result check for x86_64.  Is it OK?
>>> I would also need to
>>> backport it to gcc-7-branch.
>>>
>>> Thanks,
>>> bin
>>> 2017-07-27  Bin Cheng  <bin.cheng@arm.com>
>>>
>>>       PR target/81228
>>>       * config/aarch64/aarch64-simd.md (vec_cmp<mode><v_cmp_result>): Add
>>>       LTGT.
>>>
>>> gcc/testsuite/ChangeLog
>>> 2017-07-27  Bin Cheng  <bin.cheng@arm.com>
>>>
>>>       PR target/81228
>>>       * gcc.dg/pr81228.c: New.
>>>
>>> diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
>>> index 011fcec0..9cd67a2 100644
>>> --- a/gcc/config/aarch64/aarch64-simd.md
>>> +++ b/gcc/config/aarch64/aarch64-simd.md
>>> @@ -2524,6 +2524,7 @@
>>>      case EQ:
>>>        comparison = gen_aarch64_cmeq<mode>;
>>>        break;
>>> +    case LTGT:
>>>      case UNEQ:
>>>      case ORDERED:
>>>      case UNORDERED:
>>> @@ -2571,6 +2572,7 @@
>>>        emit_insn (comparison (operands[0], operands[2], operands[3]));
>>>        break;
>>>
>>> +    case LTGT:
>>>      case UNEQ:
>>>        /* We first check (a > b ||  b > a) which is !UNEQ, inverting
>>>        this result will then give us (a == b || a UNORDERED b).  */
>>> @@ -2578,7 +2580,8 @@
>>>                                        operands[2], operands[3]));
>>>        emit_insn (gen_aarch64_cmgt<mode> (tmp, operands[3], operands[2]));
>>>        emit_insn (gen_ior<v_cmp_result>3 (operands[0], operands[0], tmp));
>>> -      emit_insn (gen_one_cmpl<v_cmp_result>2 (operands[0], operands[0]));
>>> +      if (code == UNEQ)
>>> +     emit_insn (gen_one_cmpl<v_cmp_result>2 (operands[0], operands[0]));
>>>        break;
>>
>> AFAIK this is still a grey area, but I think (ltgt x y) is supposed to
>> be a trapping operation, i.e. it's closer to (ior (lt x y) (gt x y))
>> than (not (uneq x y)).  See e.g. the handling in may_trap_p_1, where
>> LTGT is handled like LT and GT rather than like UNEQ.
>>
>> See also: https://gcc.gnu.org/ml/gcc-patches/2015-02/msg00583.html
> Thanks for pointing me to this, I don't know anything about floating point here.
> As for the change, the code now looks like:
>
>     case LTGT:
>     case UNEQ:
>       /* We first check (a > b ||  b > a) which is !UNEQ, inverting
>      this result will then give us (a == b || a UNORDERED b).  */
>       emit_insn (gen_aarch64_cmgt<mode> (operands[0],
>                      operands[2], operands[3]));
>       emit_insn (gen_aarch64_cmgt<mode> (tmp, operands[3], operands[2]));
>       emit_insn (gen_ior<v_cmp_result>3 (operands[0], operands[0], tmp));
>       if (code == UNEQ)
>     emit_insn (gen_one_cmpl<v_cmp_result>2 (operands[0], operands[0]));
>       break;
>
> So (a > b || b > a) is generated for LTGT which you suggested?

Ah, yeah, I was just going off LTGT being treated as !UNEQ, but...

> Here we invert the result for UNEQ though.

...it looks like it might be the UNEQ code that's wrong.  E.g. this
test fails at -O3 and passes at -O for me:

#define _GNU_SOURCE
#include <fenv.h>

double x[16], y[16];
int res[16];

int
main (void)
{
  for (int i = 0; i < 16; ++i)
    {
      x[i] = __builtin_nan ("");
      y[i] = i;
    }
  asm volatile ("" ::: "memory");
  feclearexcept (FE_ALL_EXCEPT);
  for (int i = 0; i < 16; ++i)
    res[i] = __builtin_islessgreater (x[i], y[i]);
  asm volatile ("" ::: "memory");
  return fetestexcept (FE_ALL_EXCEPT) != 0;
}

(asm volatiles just added for paranoia, in case stuff gets optimised
away otherwise.)

But I suppose that's no reason to hold up your patch. :-)  Maybe it'd
be worth having a comment though?

Thanks,
Richard
Bin.Cheng July 28, 2017, 2:22 p.m. UTC | #4
On Fri, Jul 28, 2017 at 3:15 PM, Richard Sandiford
<richard.sandiford@linaro.org> wrote:
> "Bin.Cheng" <amker.cheng@gmail.com> writes:
>> On Fri, Jul 28, 2017 at 12:55 PM, Richard Sandiford
>> <richard.sandiford@linaro.org> wrote:
>>> Bin Cheng <Bin.Cheng@arm.com> writes:
>>>> Hi,
>>>> This simple patch fixes the ICE by adding LTGT in
>>>> vec_cmp<mode><v_cmp_result> pattern.
>>>> I also modified the original test case into a compilation one since
>>>> -fno-wrapping-math
>>>> should not be used in general.
>>>> Bootstrap and test on AArch64, test result check for x86_64.  Is it OK?
>>>> I would also need to
>>>> backport it to gcc-7-branch.
>>>>
>>>> Thanks,
>>>> bin
>>>> 2017-07-27  Bin Cheng  <bin.cheng@arm.com>
>>>>
>>>>       PR target/81228
>>>>       * config/aarch64/aarch64-simd.md (vec_cmp<mode><v_cmp_result>): Add
>>>>       LTGT.
>>>>
>>>> gcc/testsuite/ChangeLog
>>>> 2017-07-27  Bin Cheng  <bin.cheng@arm.com>
>>>>
>>>>       PR target/81228
>>>>       * gcc.dg/pr81228.c: New.
>>>>
>>>> diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
>>>> index 011fcec0..9cd67a2 100644
>>>> --- a/gcc/config/aarch64/aarch64-simd.md
>>>> +++ b/gcc/config/aarch64/aarch64-simd.md
>>>> @@ -2524,6 +2524,7 @@
>>>>      case EQ:
>>>>        comparison = gen_aarch64_cmeq<mode>;
>>>>        break;
>>>> +    case LTGT:
>>>>      case UNEQ:
>>>>      case ORDERED:
>>>>      case UNORDERED:
>>>> @@ -2571,6 +2572,7 @@
>>>>        emit_insn (comparison (operands[0], operands[2], operands[3]));
>>>>        break;
>>>>
>>>> +    case LTGT:
>>>>      case UNEQ:
>>>>        /* We first check (a > b ||  b > a) which is !UNEQ, inverting
>>>>        this result will then give us (a == b || a UNORDERED b).  */
>>>> @@ -2578,7 +2580,8 @@
>>>>                                        operands[2], operands[3]));
>>>>        emit_insn (gen_aarch64_cmgt<mode> (tmp, operands[3], operands[2]));
>>>>        emit_insn (gen_ior<v_cmp_result>3 (operands[0], operands[0], tmp));
>>>> -      emit_insn (gen_one_cmpl<v_cmp_result>2 (operands[0], operands[0]));
>>>> +      if (code == UNEQ)
>>>> +     emit_insn (gen_one_cmpl<v_cmp_result>2 (operands[0], operands[0]));
>>>>        break;
>>>
>>> AFAIK this is still a grey area, but I think (ltgt x y) is supposed to
>>> be a trapping operation, i.e. it's closer to (ior (lt x y) (gt x y))
>>> than (not (uneq x y)).  See e.g. the handling in may_trap_p_1, where
>>> LTGT is handled like LT and GT rather than like UNEQ.
>>>
>>> See also: https://gcc.gnu.org/ml/gcc-patches/2015-02/msg00583.html
>> Thanks for pointing me to this, I don't know anything about floating point here.
>> As for the change, the code now looks like:
>>
>>     case LTGT:
>>     case UNEQ:
>>       /* We first check (a > b ||  b > a) which is !UNEQ, inverting
>>      this result will then give us (a == b || a UNORDERED b).  */
>>       emit_insn (gen_aarch64_cmgt<mode> (operands[0],
>>                      operands[2], operands[3]));
>>       emit_insn (gen_aarch64_cmgt<mode> (tmp, operands[3], operands[2]));
>>       emit_insn (gen_ior<v_cmp_result>3 (operands[0], operands[0], tmp));
>>       if (code == UNEQ)
>>     emit_insn (gen_one_cmpl<v_cmp_result>2 (operands[0], operands[0]));
>>       break;
>>
>> So (a > b || b > a) is generated for LTGT which you suggested?
>
> Ah, yeah, I was just going off LTGT being treated as !UNEQ, but...
>
>> Here we invert the result for UNEQ though.
>
> ...it looks like it might be the UNEQ code that's wrong.  E.g. this
> test fails at -O3 and passes at -O for me:
Thanks very much for showing the issue with the example below.  That part of the
code was refactored from the old implementation when the pattern was added.
>
> #define _GNU_SOURCE
> #include <fenv.h>
>
> double x[16], y[16];
> int res[16];
>
> int
> main (void)
> {
>   for (int i = 0; i < 16; ++i)
>     {
>       x[i] = __builtin_nan ("");
>       y[i] = i;
>     }
>   asm volatile ("" ::: "memory");
>   feclearexcept (FE_ALL_EXCEPT);
>   for (int i = 0; i < 16; ++i)
>     res[i] = __builtin_islessgreater (x[i], y[i]);
>   asm volatile ("" ::: "memory");
>   return fetestexcept (FE_ALL_EXCEPT) != 0;
> }
>
> (asm volatiles just added for paranoia, in case stuff gets optimised
> away otherwise.)
>
> But I suppose that's no reason to hold up your patch. :-)  Maybe it'd
> be worth having a comment though?
Given the code is wrong, I will file a bug for tracking and see if I can fix it
while I am at it.  Hopefully it won't take long, so a comment can be skipped for
now.

Thanks,
bin
>
> Thanks,
> Richard
Bin.Cheng Aug. 1, 2017, 1:44 p.m. UTC | #5
On Fri, Jul 28, 2017 at 3:15 PM, Richard Sandiford
<richard.sandiford@linaro.org> wrote:
> "Bin.Cheng" <amker.cheng@gmail.com> writes:
>> On Fri, Jul 28, 2017 at 12:55 PM, Richard Sandiford
>> <richard.sandiford@linaro.org> wrote:
>>> Bin Cheng <Bin.Cheng@arm.com> writes:
>>>> Hi,
>>>> This simple patch fixes the ICE by adding LTGT in
>>>> vec_cmp<mode><v_cmp_result> pattern.
>>>> I also modified the original test case into a compilation one since
>>>> -fno-wrapping-math
>>>> should not be used in general.
>>>> Bootstrap and test on AArch64, test result check for x86_64.  Is it OK?
>>>> I would also need to
>>>> backport it to gcc-7-branch.
>>>>
>>>> Thanks,
>>>> bin
>>>> 2017-07-27  Bin Cheng  <bin.cheng@arm.com>
>>>>
>>>>       PR target/81228
>>>>       * config/aarch64/aarch64-simd.md (vec_cmp<mode><v_cmp_result>): Add
>>>>       LTGT.
>>>>
>>>> gcc/testsuite/ChangeLog
>>>> 2017-07-27  Bin Cheng  <bin.cheng@arm.com>
>>>>
>>>>       PR target/81228
>>>>       * gcc.dg/pr81228.c: New.
>>>>
>>>> diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
>>>> index 011fcec0..9cd67a2 100644
>>>> --- a/gcc/config/aarch64/aarch64-simd.md
>>>> +++ b/gcc/config/aarch64/aarch64-simd.md
>>>> @@ -2524,6 +2524,7 @@
>>>>      case EQ:
>>>>        comparison = gen_aarch64_cmeq<mode>;
>>>>        break;
>>>> +    case LTGT:
>>>>      case UNEQ:
>>>>      case ORDERED:
>>>>      case UNORDERED:
>>>> @@ -2571,6 +2572,7 @@
>>>>        emit_insn (comparison (operands[0], operands[2], operands[3]));
>>>>        break;
>>>>
>>>> +    case LTGT:
>>>>      case UNEQ:
>>>>        /* We first check (a > b ||  b > a) which is !UNEQ, inverting
>>>>        this result will then give us (a == b || a UNORDERED b).  */
>>>> @@ -2578,7 +2580,8 @@
>>>>                                        operands[2], operands[3]));
>>>>        emit_insn (gen_aarch64_cmgt<mode> (tmp, operands[3], operands[2]));
>>>>        emit_insn (gen_ior<v_cmp_result>3 (operands[0], operands[0], tmp));
>>>> -      emit_insn (gen_one_cmpl<v_cmp_result>2 (operands[0], operands[0]));
>>>> +      if (code == UNEQ)
>>>> +     emit_insn (gen_one_cmpl<v_cmp_result>2 (operands[0], operands[0]));
>>>>        break;
>>>
>>> AFAIK this is still a grey area, but I think (ltgt x y) is supposed to
>>> be a trapping operation, i.e. it's closer to (ior (lt x y) (gt x y))
>>> than (not (uneq x y)).  See e.g. the handling in may_trap_p_1, where
>>> LTGT is handled like LT and GT rather than like UNEQ.
>>>
>>> See also: https://gcc.gnu.org/ml/gcc-patches/2015-02/msg00583.html
>> Thanks for pointing me to this, I don't know anything about floating point here.
>> As for the change, the code now looks like:
>>
>>     case LTGT:
>>     case UNEQ:
>>       /* We first check (a > b ||  b > a) which is !UNEQ, inverting
>>      this result will then give us (a == b || a UNORDERED b).  */
>>       emit_insn (gen_aarch64_cmgt<mode> (operands[0],
>>                      operands[2], operands[3]));
>>       emit_insn (gen_aarch64_cmgt<mode> (tmp, operands[3], operands[2]));
>>       emit_insn (gen_ior<v_cmp_result>3 (operands[0], operands[0], tmp));
>>       if (code == UNEQ)
>>     emit_insn (gen_one_cmpl<v_cmp_result>2 (operands[0], operands[0]));
>>       break;
>>
>> So (a > b || b > a) is generated for LTGT which you suggested?
>
> Ah, yeah, I was just going off LTGT being treated as !UNEQ, but...
>
>> Here we invert the result for UNEQ though.
>
> ...it looks like it might be the UNEQ code that's wrong.  E.g. this
> test fails at -O3 and passes at -O for me:
>
> #define _GNU_SOURCE
> #include <fenv.h>
>
> double x[16], y[16];
> int res[16];
>
> int
> main (void)
> {
>   for (int i = 0; i < 16; ++i)
>     {
>       x[i] = __builtin_nan ("");
>       y[i] = i;
>     }
>   asm volatile ("" ::: "memory");
>   feclearexcept (FE_ALL_EXCEPT);
>   for (int i = 0; i < 16; ++i)
>     res[i] = __builtin_islessgreater (x[i], y[i]);
>   asm volatile ("" ::: "memory");
>   return fetestexcept (FE_ALL_EXCEPT) != 0;
> }
>
> (asm volatiles just added for paranoia, in case stuff gets optimised
> away otherwise.)
Thanks for the test; I have filed PR81647 for tracking.  This is actually an
inconsistent LTGT behaviour issue: it's translated differently with and without
vectorization, I think.

Thanks,
bin
>
> But I suppose that's no reason to hold up your patch. :-)  Maybe it'd
> be worth having a comment though?
>
> Thanks,
> Richard
Bin.Cheng Aug. 14, 2017, 8:55 a.m. UTC | #6
Ping.

On Fri, Jul 28, 2017 at 12:37 PM, Bin Cheng <Bin.Cheng@arm.com> wrote:
> Hi,
> This simple patch fixes the ICE by adding LTGT in vec_cmp<mode><v_cmp_result> pattern.
> I also modified the original test case into a compilation one since -fno-wrapping-math
> should not be used in general.
> Bootstrap and test on AArch64, test result check for x86_64.  Is it OK?  I would also need to
> backport it to gcc-7-branch.
>
> Thanks,
> bin
> 2017-07-27  Bin Cheng  <bin.cheng@arm.com>
>
>         PR target/81228
>         * config/aarch64/aarch64-simd.md (vec_cmp<mode><v_cmp_result>): Add
>         LTGT.
>
> gcc/testsuite/ChangeLog
> 2017-07-27  Bin Cheng  <bin.cheng@arm.com>
>
>         PR target/81228
>         * gcc.dg/pr81228.c: New.

Patch

diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index 011fcec0..9cd67a2 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -2524,6 +2524,7 @@ 
     case EQ:
       comparison = gen_aarch64_cmeq<mode>;
       break;
+    case LTGT:
     case UNEQ:
     case ORDERED:
     case UNORDERED:
@@ -2571,6 +2572,7 @@ 
       emit_insn (comparison (operands[0], operands[2], operands[3]));
       break;
 
+    case LTGT:
     case UNEQ:
       /* We first check (a > b ||  b > a) which is !UNEQ, inverting
 	 this result will then give us (a == b || a UNORDERED b).  */
@@ -2578,7 +2580,8 @@ 
 					 operands[2], operands[3]));
       emit_insn (gen_aarch64_cmgt<mode> (tmp, operands[3], operands[2]));
       emit_insn (gen_ior<v_cmp_result>3 (operands[0], operands[0], tmp));
-      emit_insn (gen_one_cmpl<v_cmp_result>2 (operands[0], operands[0]));
+      if (code == UNEQ)
+	emit_insn (gen_one_cmpl<v_cmp_result>2 (operands[0], operands[0]));
       break;
 
     case UNORDERED:
diff --git a/gcc/testsuite/gcc.dg/pr81228.c b/gcc/testsuite/gcc.dg/pr81228.c
new file mode 100644
index 0000000..3334299
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr81228.c
@@ -0,0 +1,47 @@ 
+/* PR target/81228 */
+/* { dg-do compile } */
+/* { dg-options "-O3 -fno-trapping-math" } */
+/* { dg-options "-O3 -fno-trapping-math -mavx" { target avx_runtime } } */
+
+double s1[4], s2[4], s3[64];
+
+int
+main (void)
+{
+  int i;
+  asm volatile ("" : : : "memory");
+  for (i = 0; i < 4; i++)
+    s3[0 * 4 + i] = __builtin_isgreater (s1[i], s2[i]) ? -1.0 : 0.0;
+  for (i = 0; i < 4; i++)
+    s3[1 * 4 + i] = (!__builtin_isgreater (s1[i], s2[i])) ? -1.0 : 0.0;
+  for (i = 0; i < 4; i++)
+    s3[2 * 4 + i] = __builtin_isgreaterequal (s1[i], s2[i]) ? -1.0 : 0.0;
+  for (i = 0; i < 4; i++)
+    s3[3 * 4 + i] = (!__builtin_isgreaterequal (s1[i], s2[i])) ? -1.0 : 0.0;
+  for (i = 0; i < 4; i++)
+    s3[4 * 4 + i] = __builtin_isless (s1[i], s2[i]) ? -1.0 : 0.0;
+  for (i = 0; i < 4; i++)
+    s3[5 * 4 + i] = (!__builtin_isless (s1[i], s2[i])) ? -1.0 : 0.0;
+  for (i = 0; i < 4; i++)
+    s3[6 * 4 + i] = __builtin_islessequal (s1[i], s2[i]) ? -1.0 : 0.0;
+  for (i = 0; i < 4; i++)
+    s3[7 * 4 + i] = (!__builtin_islessequal (s1[i], s2[i])) ? -1.0 : 0.0;
+  for (i = 0; i < 4; i++)
+    s3[8 * 4 + i] = __builtin_islessgreater (s1[i], s2[i]) ? -1.0 : 0.0;
+  for (i = 0; i < 4; i++)
+    s3[9 * 4 + i] = (!__builtin_islessgreater (s1[i], s2[i])) ? -1.0 : 0.0;
+  for (i = 0; i < 4; i++)
+    s3[10 * 4 + i] = __builtin_isunordered (s1[i], s2[i]) ? -1.0 : 0.0;
+  for (i = 0; i < 4; i++)
+    s3[11 * 4 + i] = (!__builtin_isunordered (s1[i], s2[i])) ? -1.0 : 0.0;
+  for (i = 0; i < 4; i++)
+    s3[12 * 4 + i] = s1[i] > s2[i] ? -1.0 : 0.0;
+  for (i = 0; i < 4; i++)
+    s3[13 * 4 + i] = s1[i] >= s2[i] ? -1.0 : 0.0;
+  for (i = 0; i < 4; i++)
+    s3[14 * 4 + i] = s1[i] < s2[i] ? -1.0 : 0.0;
+  for (i = 0; i < 4; i++)
+    s3[15 * 4 + i] = s1[i] <= s2[i] ? -1.0 : 0.0;
+  asm volatile ("" : : : "memory");
+  return 0;
+}