diff mbox series

Fix ix86_expand_sse_comi_round (PR Target/89750, PR Target/86444)

Message ID CAMZc-bz9WQpu55p_CYutAWAyxc6tE4gmZVoCzFiiuiLhsSy9Jw@mail.gmail.com
State New
Headers show
Series Fix ix86_expand_sse_comi_round (PR Target/89750, PR Target/86444) | expand

Commit Message

Hongtao Liu May 7, 2019, 5:38 a.m. UTC
Hi Uros and GCC:
  This patch is to fix ix86_expand_sse_comi_round whose implementation
was not correct.
  New implentation aligns with _mm_cmp_round_s[sd]_mask.

Bootstrap and regression tests for x86 is fine.
Ok for trunk?


ChangeLog:
gcc/
   * config/i386/i386-expand.c (ix86_expand_sse_comi_round):
   Modified, original implementation isn't correct.

gcc/testsuite
   * gcc.target/i386/avx512f-vcomisd-2.c: New runtime tests.
   * gcc.target/i386/avx512f-vcomisd-2.c: Likewise.

Comments

Jakub Jelinek May 7, 2019, 7:03 a.m. UTC | #1
On Tue, May 07, 2019 at 01:38:49PM +0800, Hongtao Liu wrote:
> +2019-05-06  H.J. Lu  <hongjiu.lu@intel.com>
> +	    Hongtao Liu  <hongtao.liu@intel.com>
> +
> +	PR Target/89750
> +	PR Target/86444

target, not Target.  Various people handle these in various scripts,
so it is better to use consistency and exact spelling of the categories.

	Jakub
Hongtao Liu May 7, 2019, 7:31 a.m. UTC | #2
On Tue, May 7, 2019 at 3:03 PM Jakub Jelinek <jakub@redhat.com> wrote:
>
> On Tue, May 07, 2019 at 01:38:49PM +0800, Hongtao Liu wrote:
> > +2019-05-06  H.J. Lu  <hongjiu.lu@intel.com>
> > +         Hongtao Liu  <hongtao.liu@intel.com>
> > +
> > +     PR Target/89750
> > +     PR Target/86444
>
> target, not Target.  Various people handle these in various scripts,
> so it is better to use consistency and exact spelling of the categories.
>
>         Jakub

Ok, Thank you for your reminding.
Hongtao Liu May 8, 2019, 4:29 a.m. UTC | #3
Any other comments, i'll merge to trunk?

On Tue, May 7, 2019 at 3:31 PM Hongtao Liu <crazylht@gmail.com> wrote:
>
> On Tue, May 7, 2019 at 3:03 PM Jakub Jelinek <jakub@redhat.com> wrote:
> >
> > On Tue, May 07, 2019 at 01:38:49PM +0800, Hongtao Liu wrote:
> > > +2019-05-06  H.J. Lu  <hongjiu.lu@intel.com>
> > > +         Hongtao Liu  <hongtao.liu@intel.com>
> > > +
> > > +     PR Target/89750
> > > +     PR Target/86444
> >
> > target, not Target.  Various people handle these in various scripts,
> > so it is better to use consistency and exact spelling of the categories.
> >
> >         Jakub
>
> Ok, Thank you for your reminding.
>
> --
> BR,
> Hongtao
Uros Bizjak May 8, 2019, 6:47 a.m. UTC | #4
On Wed, May 8, 2019 at 6:28 AM Hongtao Liu <crazylht@gmail.com> wrote:
>
> Any other comments, i'll merge to trunk?

You are not allowed to merge patches without explicit approval from
the relevant maintainer. Please see [1].

However, there will be a problem with the approval of your patch.
AVX512 stuff has its own maintainer, which unfortunatelly went
missing. I don't plan to be a backup maintainer of AVX512 stuff, so
the patch may be left unreviewed for a long time. AVX512 stuff is very
low (or better, nowhere) on my priority list, so there is little point
on pinging me for a review of patches involving AVX512 functionality.

[1] https://gcc.gnu.org/svnwrite.html#policies

Uros.

>
> On Tue, May 7, 2019 at 3:31 PM Hongtao Liu <crazylht@gmail.com> wrote:
> >
> > On Tue, May 7, 2019 at 3:03 PM Jakub Jelinek <jakub@redhat.com> wrote:
> > >
> > > On Tue, May 07, 2019 at 01:38:49PM +0800, Hongtao Liu wrote:
> > > > +2019-05-06  H.J. Lu  <hongjiu.lu@intel.com>
> > > > +         Hongtao Liu  <hongtao.liu@intel.com>
> > > > +
> > > > +     PR Target/89750
> > > > +     PR Target/86444
> > >
> > > target, not Target.  Various people handle these in various scripts,
> > > so it is better to use consistency and exact spelling of the categories.
> > >
> > >         Jakub
> >
> > Ok, Thank you for your reminding.
> >
> > --
> > BR,
> > Hongtao
>
>
>
> --
> BR,
> Hongtao
Jeff Law May 9, 2019, 7:55 p.m. UTC | #5
On 5/6/19 11:38 PM, Hongtao Liu wrote:
> Hi Uros and GCC:
>   This patch is to fix ix86_expand_sse_comi_round whose implementation
> was not correct.
>   New implentation aligns with _mm_cmp_round_s[sd]_mask.
> 
> Bootstrap and regression tests for x86 is fine.
> Ok for trunk?
> 
> 
> ChangeLog:
> gcc/
>    * config/i386/i386-expand.c (ix86_expand_sse_comi_round):
>    Modified, original implementation isn't correct.
> 
> gcc/testsuite
>    * gcc.target/i386/avx512f-vcomisd-2.c: New runtime tests.
>    * gcc.target/i386/avx512f-vcomisd-2.c: Likewise.
So you'll have to bear with me, I'm not really familiar with this code,
but in the absence of a maintainer I'll try to work through it.


> 
> -- BR, Hongtao
> 
> 
> 0001-Fix-ix86_expand_sse_comi_round.patch
> 
> Index: gcc/ChangeLog
> ===================================================================
> --- gcc/ChangeLog	(revision 270933)
> +++ gcc/ChangeLog	(working copy)
> @@ -1,3 +1,11 @@
> +2019-05-06  H.J. Lu  <hongjiu.lu@intel.com>
> +	    Hongtao Liu  <hongtao.liu@intel.com>
> +
> +	PR Target/89750
> +	PR Target/86444
> +	* config/i386/i386-expand.c (ix86_expand_sse_comi_round):
> +	Modified, original implementation isn't correct.
> +
>  2019-05-06  Segher Boessenkool  <segher@kernel.crashing.org>
>  
>  	* config/rs6000/rs6000.md (FIRST_ALTIVEC_REGNO, LAST_ALTIVEC_REGNO)
> Index: gcc/config/i386/i386-expand.c
> ===================================================================
> --- gcc/config/i386/i386-expand.c	(revision 270933)
> +++ gcc/config/i386/i386-expand.c	(working copy)
> @@ -9853,18 +9853,24 @@
>    const struct insn_data_d *insn_p = &insn_data[icode];
>    machine_mode mode0 = insn_p->operand[0].mode;
>    machine_mode mode1 = insn_p->operand[1].mode;
> -  enum rtx_code comparison = UNEQ;
> -  bool need_ucomi = false;
>  
>    /* See avxintrin.h for values.  */
> -  enum rtx_code comi_comparisons[32] =
> +  static const enum rtx_code comparisons[32] =
So I assume the comment refers to the _CMP_* #defines in avxintrin.h?


>      {
> -      UNEQ, GT, GE, UNORDERED, LTGT, UNLE, UNLT, ORDERED, UNEQ, UNLT,
> -      UNLE, LT, LTGT, GE, GT, LT, UNEQ, GT, GE, UNORDERED, LTGT, UNLE,
> -      UNLT, ORDERED, UNEQ, UNLT, UNLE, LT, LTGT, GE, GT, LT
> +      EQ, LT, LE, UNORDERED, NE, UNGE, UNGT, ORDERED,
> +      EQ, UNLT, UNLE, UNORDERED, LTGT, GE, GT, ORDERED,
> +      EQ, LT, LE, UNORDERED, NE, UNGE, UNGT, ORDERED,
> +      EQ, UNLT, UNLE, UNORDERED, LTGT, GE, GT, ORDERED
>      };

For CMP_EQ_UQ aren't we looking for an unordered comparison, so UNEQ
seems right, but you're using EQ.  Can you double-check this?  If it's
wrong, then please make sure we cover this case with a test.




> @@ -9932,11 +10021,37 @@
>      }
>  
>    emit_insn (pat);
> +
> +  /* XXX: Set CCFPmode and check a different CCmode.  Does it work
> +     correctly?  */
> +  if (GET_MODE (set_dst) != mode)
> +    set_dst = gen_rtx_REG (mode, REGNO (set_dst));
This looks worrisome, even without the cryptic comment.  I don't think
you can just blindly change the mode like that.  Unless you happen to
know that the only things you test in the new mode were set in precisely
the same way as the old mode.

Jeff
Hongtao Liu May 10, 2019, 4:54 a.m. UTC | #6
On Fri, May 10, 2019 at 3:55 AM Jeff Law <law@redhat.com> wrote:
>
> On 5/6/19 11:38 PM, Hongtao Liu wrote:
> > Hi Uros and GCC:
> >   This patch is to fix ix86_expand_sse_comi_round whose implementation
> > was not correct.
> >   New implentation aligns with _mm_cmp_round_s[sd]_mask.
> >
> > Bootstrap and regression tests for x86 is fine.
> > Ok for trunk?
> >
> >
> > ChangeLog:
> > gcc/
> >    * config/i386/i386-expand.c (ix86_expand_sse_comi_round):
> >    Modified, original implementation isn't correct.
> >
> > gcc/testsuite
> >    * gcc.target/i386/avx512f-vcomisd-2.c: New runtime tests.
> >    * gcc.target/i386/avx512f-vcomisd-2.c: Likewise.
> So you'll have to bear with me, I'm not really familiar with this code,
> but in the absence of a maintainer I'll try to work through it.
>
>
> >
> > -- BR, Hongtao
> >
> >
> > 0001-Fix-ix86_expand_sse_comi_round.patch
> >
> > Index: gcc/ChangeLog
> > ===================================================================
> > --- gcc/ChangeLog     (revision 270933)
> > +++ gcc/ChangeLog     (working copy)
> > @@ -1,3 +1,11 @@
> > +2019-05-06  H.J. Lu  <hongjiu.lu@intel.com>
> > +         Hongtao Liu  <hongtao.liu@intel.com>
> > +
> > +     PR Target/89750
> > +     PR Target/86444
> > +     * config/i386/i386-expand.c (ix86_expand_sse_comi_round):
> > +     Modified, original implementation isn't correct.
> > +
> >  2019-05-06  Segher Boessenkool  <segher@kernel.crashing.org>
> >
> >       * config/rs6000/rs6000.md (FIRST_ALTIVEC_REGNO, LAST_ALTIVEC_REGNO)
> > Index: gcc/config/i386/i386-expand.c
> > ===================================================================
> > --- gcc/config/i386/i386-expand.c     (revision 270933)
> > +++ gcc/config/i386/i386-expand.c     (working copy)
> > @@ -9853,18 +9853,24 @@
> >    const struct insn_data_d *insn_p = &insn_data[icode];
> >    machine_mode mode0 = insn_p->operand[0].mode;
> >    machine_mode mode1 = insn_p->operand[1].mode;
> > -  enum rtx_code comparison = UNEQ;
> > -  bool need_ucomi = false;
> >
> >    /* See avxintrin.h for values.  */
> > -  enum rtx_code comi_comparisons[32] =
> > +  static const enum rtx_code comparisons[32] =
> So I assume the comment refers to the _CMP_* #defines in avxintrin.h?
>
  Yes.
>
> >      {
> > -      UNEQ, GT, GE, UNORDERED, LTGT, UNLE, UNLT, ORDERED, UNEQ, UNLT,
> > -      UNLE, LT, LTGT, GE, GT, LT, UNEQ, GT, GE, UNORDERED, LTGT, UNLE,
> > -      UNLT, ORDERED, UNEQ, UNLT, UNLE, LT, LTGT, GE, GT, LT
> > +      EQ, LT, LE, UNORDERED, NE, UNGE, UNGT, ORDERED,
> > +      EQ, UNLT, UNLE, UNORDERED, LTGT, GE, GT, ORDERED,
> > +      EQ, LT, LE, UNORDERED, NE, UNGE, UNGT, ORDERED,
> > +      EQ, UNLT, UNLE, UNORDERED, LTGT, GE, GT, ORDERED
> >      };
>
> For CMP_EQ_UQ aren't we looking for an unordered comparison, so UNEQ
> seems right, but you're using EQ.  Can you double-check this?  If it's
> wrong, then please make sure we cover this case with a test.
>
Avx512f-vcomis[sd]-2.c covers all 32 compare predicates.
UNEQ and EQ behave differently when either operand is NAN, besides
they're the same.
Since NAN operands are handled separtely, so EQ/UNEQ makes no
difference, That why this passes cover tests.
I'll correct it.
>
>
>
> > @@ -9932,11 +10021,37 @@
> >      }
> >
> >    emit_insn (pat);
> > +
> > +  /* XXX: Set CCFPmode and check a different CCmode.  Does it work
> > +     correctly?  */
> > +  if (GET_MODE (set_dst) != mode)
> > +    set_dst = gen_rtx_REG (mode, REGNO (set_dst));
> This looks worrisome, even without the cryptic comment.  I don't think
> you can just blindly change the mode like that.  Unless you happen to
> know that the only things you test in the new mode were set in precisely
> the same way as the old mode.
>
Modified as:
+  /* NB: Set CCFPmode and check a different CCmode.  */
+  if (GET_MODE (set_dst) != mode)
+    set_dst = gen_rtx_REG (mode, FLAGS_REG);

> Jeff
Hongtao Liu May 10, 2019, 5 a.m. UTC | #7
On Fri, May 10, 2019 at 12:54 PM Hongtao Liu <crazylht@gmail.com> wrote:
>
> On Fri, May 10, 2019 at 3:55 AM Jeff Law <law@redhat.com> wrote:
> >
> > On 5/6/19 11:38 PM, Hongtao Liu wrote:
> > > Hi Uros and GCC:
> > >   This patch is to fix ix86_expand_sse_comi_round whose implementation
> > > was not correct.
> > >   New implentation aligns with _mm_cmp_round_s[sd]_mask.
> > >
> > > Bootstrap and regression tests for x86 is fine.
> > > Ok for trunk?
> > >
> > >
> > > ChangeLog:
> > > gcc/
> > >    * config/i386/i386-expand.c (ix86_expand_sse_comi_round):
> > >    Modified, original implementation isn't correct.
> > >
> > > gcc/testsuite
> > >    * gcc.target/i386/avx512f-vcomisd-2.c: New runtime tests.
> > >    * gcc.target/i386/avx512f-vcomisd-2.c: Likewise.
> > So you'll have to bear with me, I'm not really familiar with this code,
> > but in the absence of a maintainer I'll try to work through it.
> >
> >
> > >
> > > -- BR, Hongtao
> > >
> > >
> > > 0001-Fix-ix86_expand_sse_comi_round.patch
> > >
> > > Index: gcc/ChangeLog
> > > ===================================================================
> > > --- gcc/ChangeLog     (revision 270933)
> > > +++ gcc/ChangeLog     (working copy)
> > > @@ -1,3 +1,11 @@
> > > +2019-05-06  H.J. Lu  <hongjiu.lu@intel.com>
> > > +         Hongtao Liu  <hongtao.liu@intel.com>
> > > +
> > > +     PR Target/89750
> > > +     PR Target/86444
> > > +     * config/i386/i386-expand.c (ix86_expand_sse_comi_round):
> > > +     Modified, original implementation isn't correct.
> > > +
> > >  2019-05-06  Segher Boessenkool  <segher@kernel.crashing.org>
> > >
> > >       * config/rs6000/rs6000.md (FIRST_ALTIVEC_REGNO, LAST_ALTIVEC_REGNO)
> > > Index: gcc/config/i386/i386-expand.c
> > > ===================================================================
> > > --- gcc/config/i386/i386-expand.c     (revision 270933)
> > > +++ gcc/config/i386/i386-expand.c     (working copy)
> > > @@ -9853,18 +9853,24 @@
> > >    const struct insn_data_d *insn_p = &insn_data[icode];
> > >    machine_mode mode0 = insn_p->operand[0].mode;
> > >    machine_mode mode1 = insn_p->operand[1].mode;
> > > -  enum rtx_code comparison = UNEQ;
> > > -  bool need_ucomi = false;
> > >
> > >    /* See avxintrin.h for values.  */
> > > -  enum rtx_code comi_comparisons[32] =
> > > +  static const enum rtx_code comparisons[32] =
> > So I assume the comment refers to the _CMP_* #defines in avxintrin.h?
> >
>   Yes.
> >
> > >      {
> > > -      UNEQ, GT, GE, UNORDERED, LTGT, UNLE, UNLT, ORDERED, UNEQ, UNLT,
> > > -      UNLE, LT, LTGT, GE, GT, LT, UNEQ, GT, GE, UNORDERED, LTGT, UNLE,
> > > -      UNLT, ORDERED, UNEQ, UNLT, UNLE, LT, LTGT, GE, GT, LT
> > > +      EQ, LT, LE, UNORDERED, NE, UNGE, UNGT, ORDERED,
> > > +      EQ, UNLT, UNLE, UNORDERED, LTGT, GE, GT, ORDERED,
> > > +      EQ, LT, LE, UNORDERED, NE, UNGE, UNGT, ORDERED,
> > > +      EQ, UNLT, UNLE, UNORDERED, LTGT, GE, GT, ORDERED
> > >      };
> >
> > For CMP_EQ_UQ aren't we looking for an unordered comparison, so UNEQ
> > seems right, but you're using EQ.  Can you double-check this?  If it's
> > wrong, then please make sure we cover this case with a test.
> >
> Avx512f-vcomis[sd]-2.c covers all 32 compare predicates.
> UNEQ and EQ behave differently when either operand is NAN, besides
> they're the same.
> Since NAN operands are handled separtely, so EQ/UNEQ makes no
> difference, That why this passes cover tests.
> I'll correct it.
> >
> >
> >
> > > @@ -9932,11 +10021,37 @@
> > >      }
> > >
> > >    emit_insn (pat);
> > > +
> > > +  /* XXX: Set CCFPmode and check a different CCmode.  Does it work
> > > +     correctly?  */
> > > +  if (GET_MODE (set_dst) != mode)
> > > +    set_dst = gen_rtx_REG (mode, REGNO (set_dst));
> > This looks worrisome, even without the cryptic comment.  I don't think
> > you can just blindly change the mode like that.  Unless you happen to
> > know that the only things you test in the new mode were set in precisely
> > the same way as the old mode.
> >
> Modified as:
> +  /* NB: Set CCFPmode and check a different CCmode.  */
> +  if (GET_MODE (set_dst) != mode)
> +    set_dst = gen_rtx_REG (mode, FLAGS_REG);
>
> > Jeff
>
>
>
> --
> BR,
> Hongtao

Update patch.
Jeff Law May 29, 2019, 7:23 p.m. UTC | #8
On 5/9/19 10:54 PM, Hongtao Liu wrote:
> On Fri, May 10, 2019 at 3:55 AM Jeff Law <law@redhat.com> wrote:
>>
>> On 5/6/19 11:38 PM, Hongtao Liu wrote:
>>> Hi Uros and GCC:
>>>   This patch is to fix ix86_expand_sse_comi_round whose implementation
>>> was not correct.
>>>   New implentation aligns with _mm_cmp_round_s[sd]_mask.
>>>
>>> Bootstrap and regression tests for x86 is fine.
>>> Ok for trunk?
>>>
>>>
>>> ChangeLog:
>>> gcc/
>>>    * config/i386/i386-expand.c (ix86_expand_sse_comi_round):
>>>    Modified, original implementation isn't correct.
>>>
>>> gcc/testsuite
>>>    * gcc.target/i386/avx512f-vcomisd-2.c: New runtime tests.
>>>    * gcc.target/i386/avx512f-vcomisd-2.c: Likewise.
>> So you'll have to bear with me, I'm not really familiar with this code,
>> but in the absence of a maintainer I'll try to work through it.
>>
>>
>>>
>>> -- BR, Hongtao
>>>
>>>
>>> 0001-Fix-ix86_expand_sse_comi_round.patch
>>>
>>> Index: gcc/ChangeLog
>>> ===================================================================
>>> --- gcc/ChangeLog     (revision 270933)
>>> +++ gcc/ChangeLog     (working copy)
>>> @@ -1,3 +1,11 @@
>>> +2019-05-06  H.J. Lu  <hongjiu.lu@intel.com>
>>> +         Hongtao Liu  <hongtao.liu@intel.com>
>>> +
>>> +     PR Target/89750
>>> +     PR Target/86444
>>> +     * config/i386/i386-expand.c (ix86_expand_sse_comi_round):
>>> +     Modified, original implementation isn't correct.
>>> +
>>>  2019-05-06  Segher Boessenkool  <segher@kernel.crashing.org>
>>>
>>>       * config/rs6000/rs6000.md (FIRST_ALTIVEC_REGNO, LAST_ALTIVEC_REGNO)
>>> Index: gcc/config/i386/i386-expand.c
>>> ===================================================================
>>> --- gcc/config/i386/i386-expand.c     (revision 270933)
>>> +++ gcc/config/i386/i386-expand.c     (working copy)
>>> @@ -9853,18 +9853,24 @@
>>>    const struct insn_data_d *insn_p = &insn_data[icode];
>>>    machine_mode mode0 = insn_p->operand[0].mode;
>>>    machine_mode mode1 = insn_p->operand[1].mode;
>>> -  enum rtx_code comparison = UNEQ;
>>> -  bool need_ucomi = false;
>>>
>>>    /* See avxintrin.h for values.  */
>>> -  enum rtx_code comi_comparisons[32] =
>>> +  static const enum rtx_code comparisons[32] =
>> So I assume the comment refers to the _CMP_* #defines in avxintrin.h?
>>
>   Yes.
>>
>>>      {
>>> -      UNEQ, GT, GE, UNORDERED, LTGT, UNLE, UNLT, ORDERED, UNEQ, UNLT,
>>> -      UNLE, LT, LTGT, GE, GT, LT, UNEQ, GT, GE, UNORDERED, LTGT, UNLE,
>>> -      UNLT, ORDERED, UNEQ, UNLT, UNLE, LT, LTGT, GE, GT, LT
>>> +      EQ, LT, LE, UNORDERED, NE, UNGE, UNGT, ORDERED,
>>> +      EQ, UNLT, UNLE, UNORDERED, LTGT, GE, GT, ORDERED,
>>> +      EQ, LT, LE, UNORDERED, NE, UNGE, UNGT, ORDERED,
>>> +      EQ, UNLT, UNLE, UNORDERED, LTGT, GE, GT, ORDERED
>>>      };
>>
>> For CMP_EQ_UQ aren't we looking for an unordered comparison, so UNEQ
>> seems right, but you're using EQ.  Can you double-check this?  If it's
>> wrong, then please make sure we cover this case with a test.
>>
> Avx512f-vcomis[sd]-2.c covers all 32 compare predicates.
> UNEQ and EQ behave differently when either operand is NAN, besides
> they're the same.
> Since NAN operands are handled separtely, so EQ/UNEQ makes no
> difference, That why this passes cover tests.
> I'll correct it.
Ah.  Thanks.  FWIW my approach was to walk through the _CMP_* defines in
avxintrin.h and map that to what I thought the comparison ought to be.
Then I reviewed my result against your patch.  I got a couple wrong, but
could easily see my mistake.  The only one I couldn't reconcile was the
CMP_EQ_UQ.  Knowing the NaNs are handled separately makes it clear.
Thanks gain.



>>
>>
>>
>>> @@ -9932,11 +10021,37 @@
>>>      }
>>>
>>>    emit_insn (pat);
>>> +
>>> +  /* XXX: Set CCFPmode and check a different CCmode.  Does it work
>>> +     correctly?  */
>>> +  if (GET_MODE (set_dst) != mode)
>>> +    set_dst = gen_rtx_REG (mode, REGNO (set_dst));
>> This looks worrisome, even without the cryptic comment.  I don't think
>> you can just blindly change the mode like that.  Unless you happen to
>> know that the only things you test in the new mode were set in precisely
>> the same way as the old mode.
>>
> Modified as:
> +  /* NB: Set CCFPmode and check a different CCmode.  */
> +  if (GET_MODE (set_dst) != mode)
> +    set_dst = gen_rtx_REG (mode, FLAGS_REG);
That might actually be worse.  The mode carries semantic information
about where to find the various condition codes within the flags
register and which condition codes are valid.  The register number
determines which (of possibly many) flags registers we are querying.

Thus if the mode of SET_DEST is not the same as MODE, then there is a
mismatch between the point where the condition codes were set and where
we want to use them.

That can only be safe is the condition codes set in the new mode are a
strict subset of the condition codes set in the old mode and they're
found in the same place within the same flags register.

Maybe a simple example would help.  Consider a fairly standard target
with arithmetic insns that set ZNV and logicals that set ZN and clobber V.

Arithmetic insns will have a set of the flags register with one mode
(CC_ZNV for example) and logicals would use a different mode for the
flags register (CC_ZN for example).

Now consider if the architecture has memory bit operations.  Ie, you can
query the state of a bit in memory.  Unfortunately the result is stored
in the C bit of the condition code register (rather than in the Z bit
like you might hope).  Yes, this comes from a real world example.

During compare elimination, the compiler will try to eliminate a compare
insn and instead use the condition codes set by a prior insn such as an
arithmetic, logical, bit test, etc.  The mode of the flags register will
indicate which bits are valid and which bits can be tested.  So for our
hypothetical architecture we might have modes CC_ZNV, CC_ZN, CC_Z_IN_C
for arithmetic, logical and bit testing.

If we are using the result of a prior bit test to eliminate a compare,
the mode of the flags register on the bit test would be CC_Z_N_C as
would the mode of the conditional branch.

We can not blindly change the mode to CC_ZNV because the bit we want is
not in the ZNV  bits of the flag register, it's actually in the C bit.

Things might be even more complicated if there's a separate register for
floating point condition codes or multiple integer condition code registers.

Hopefully that makes things clear in terms of how the mode of the flags
register is used as well as the register number.  You'd have to dig into
the x86 port's use of the flags register to know what changes are safe
and which aren't -- and that's the analysis I'd need to help move this
along.

Jeff
Hongtao Liu May 30, 2019, 8:53 a.m. UTC | #9
On Thu, May 30, 2019 at 3:23 AM Jeff Law <law@redhat.com> wrote:
>
> On 5/9/19 10:54 PM, Hongtao Liu wrote:
> > On Fri, May 10, 2019 at 3:55 AM Jeff Law <law@redhat.com> wrote:
> >>
> >> On 5/6/19 11:38 PM, Hongtao Liu wrote:
> >>> Hi Uros and GCC:
> >>>   This patch is to fix ix86_expand_sse_comi_round whose implementation
> >>> was not correct.
> >>>   New implentation aligns with _mm_cmp_round_s[sd]_mask.
> >>>
> >>> Bootstrap and regression tests for x86 is fine.
> >>> Ok for trunk?
> >>>
> >>>
> >>> ChangeLog:
> >>> gcc/
> >>>    * config/i386/i386-expand.c (ix86_expand_sse_comi_round):
> >>>    Modified, original implementation isn't correct.
> >>>
> >>> gcc/testsuite
> >>>    * gcc.target/i386/avx512f-vcomisd-2.c: New runtime tests.
> >>>    * gcc.target/i386/avx512f-vcomisd-2.c: Likewise.
> >> So you'll have to bear with me, I'm not really familiar with this code,
> >> but in the absence of a maintainer I'll try to work through it.
> >>
> >>
> >>>
> >>> -- BR, Hongtao
> >>>
> >>>
> >>> 0001-Fix-ix86_expand_sse_comi_round.patch
> >>>
> >>> Index: gcc/ChangeLog
> >>> ===================================================================
> >>> --- gcc/ChangeLog     (revision 270933)
> >>> +++ gcc/ChangeLog     (working copy)
> >>> @@ -1,3 +1,11 @@
> >>> +2019-05-06  H.J. Lu  <hongjiu.lu@intel.com>
> >>> +         Hongtao Liu  <hongtao.liu@intel.com>
> >>> +
> >>> +     PR Target/89750
> >>> +     PR Target/86444
> >>> +     * config/i386/i386-expand.c (ix86_expand_sse_comi_round):
> >>> +     Modified, original implementation isn't correct.
> >>> +
> >>>  2019-05-06  Segher Boessenkool  <segher@kernel.crashing.org>
> >>>
> >>>       * config/rs6000/rs6000.md (FIRST_ALTIVEC_REGNO, LAST_ALTIVEC_REGNO)
> >>> Index: gcc/config/i386/i386-expand.c
> >>> ===================================================================
> >>> --- gcc/config/i386/i386-expand.c     (revision 270933)
> >>> +++ gcc/config/i386/i386-expand.c     (working copy)
> >>> @@ -9853,18 +9853,24 @@
> >>>    const struct insn_data_d *insn_p = &insn_data[icode];
> >>>    machine_mode mode0 = insn_p->operand[0].mode;
> >>>    machine_mode mode1 = insn_p->operand[1].mode;
> >>> -  enum rtx_code comparison = UNEQ;
> >>> -  bool need_ucomi = false;
> >>>
> >>>    /* See avxintrin.h for values.  */
> >>> -  enum rtx_code comi_comparisons[32] =
> >>> +  static const enum rtx_code comparisons[32] =
> >> So I assume the comment refers to the _CMP_* #defines in avxintrin.h?
> >>
> >   Yes.
> >>
> >>>      {
> >>> -      UNEQ, GT, GE, UNORDERED, LTGT, UNLE, UNLT, ORDERED, UNEQ, UNLT,
> >>> -      UNLE, LT, LTGT, GE, GT, LT, UNEQ, GT, GE, UNORDERED, LTGT, UNLE,
> >>> -      UNLT, ORDERED, UNEQ, UNLT, UNLE, LT, LTGT, GE, GT, LT
> >>> +      EQ, LT, LE, UNORDERED, NE, UNGE, UNGT, ORDERED,
> >>> +      EQ, UNLT, UNLE, UNORDERED, LTGT, GE, GT, ORDERED,
> >>> +      EQ, LT, LE, UNORDERED, NE, UNGE, UNGT, ORDERED,
> >>> +      EQ, UNLT, UNLE, UNORDERED, LTGT, GE, GT, ORDERED
> >>>      };
> >>
> >> For CMP_EQ_UQ aren't we looking for an unordered comparison, so UNEQ
> >> seems right, but you're using EQ.  Can you double-check this?  If it's
> >> wrong, then please make sure we cover this case with a test.
> >>
> > Avx512f-vcomis[sd]-2.c covers all 32 compare predicates.
> > UNEQ and EQ behave differently when either operand is NAN, besides
> > they're the same.
> > Since NAN operands are handled separtely, so EQ/UNEQ makes no
> > difference, That why this passes cover tests.
> > I'll correct it.
> Ah.  Thanks.  FWIW my approach was to walk through the _CMP_* defines in
> avxintrin.h and map that to what I thought the comparison ought to be.
> Then I reviewed my result against your patch.  I got a couple wrong, but
> could easily see my mistake.  The only one I couldn't reconcile was the
> CMP_EQ_UQ.  Knowing the NaNs are handled separately makes it clear.
> Thanks gain.
>
>
>
> >>
> >>
> >>
> >>> @@ -9932,11 +10021,37 @@
> >>>      }
> >>>
> >>>    emit_insn (pat);
> >>> +
> >>> +  /* XXX: Set CCFPmode and check a different CCmode.  Does it work
> >>> +     correctly?  */
> >>> +  if (GET_MODE (set_dst) != mode)
> >>> +    set_dst = gen_rtx_REG (mode, REGNO (set_dst));
> >> This looks worrisome, even without the cryptic comment.  I don't think
> >> you can just blindly change the mode like that.  Unless you happen to
> >> know that the only things you test in the new mode were set in precisely
> >> the same way as the old mode.
> >>
> > Modified as:
> > +  /* NB: Set CCFPmode and check a different CCmode.  */
> > +  if (GET_MODE (set_dst) != mode)
> > +    set_dst = gen_rtx_REG (mode, FLAGS_REG);
> That might actually be worse.  The mode carries semantic information
> about where to find the various condition codes within the flags
> register and which condition codes are valid.  The register number
> determines which (of possibly many) flags registers we are querying.
>
> Thus if the mode of SET_DEST is not the same as MODE, then there is a
> mismatch between the point where the condition codes were set and where
> we want to use them.
>
> That can only be safe is the condition codes set in the new mode are a
> strict subset of the condition codes set in the old mode and they're
> found in the same place within the same flags register.
>
vcomiss/vcomisd will set ZF,PF,CF,OF,SF,AF flags in  FLAGS_REG which is treated
as CCFPmode. As you mentioned, It's safe to use subset of the
condition codes in a
new mode,  in the same FLAGS_REG,  so I've modified as

1. Use CCZmode instead of CCmode.

+    case EQ:
+      /* NB: COMI/UCOMI will set ZF with NAN operands.  Use CCZmode for
+ _CMP_EQ_OQ/_CMP_EQ_OS.  */
+      check_unordered = true;
+      mode = CCZmode;
+      break;
+    case NE:
+      /* NB: COMI/UCOMI will set ZF with NAN operands.  Use CCZmode for
+ _CMP_NEQ_UQ/_CMP_NEQ_US.  */
+      gcc_assert (!ordered);
+      check_unordered = true;
+      mode = CCZmode;
+      const_val = const1_rtx;

2. I've double checked that condition codes in new mode must be subset
of that in old mode.
Also gcc_assert is added to guard.

+  /* NB: Set CCFPmode and check a different CCmode which is in subset
+     of CCFPmode.  */
+  if (GET_MODE (set_dst) != mode)
+    {
+      gcc_assert (mode == CCAmode || mode == CCCmode
+   || mode == CCOmode || mode == CCPmode
+   || mode == CCSmode || mode == CCZmode);
+      set_dst = gen_rtx_REG (mode, FLAGS_REG);
+    }
+


> Maybe a simple example would help.  Consider a fairly standard target
> with arithmetic insns that set ZNV and logicals that set ZN and clobber V.
>
> Arithmetic insns will have a set of the flags register with one mode
> (CC_ZNV for example) and logicals would use a different mode for the
> flags register (CC_ZN for example).
>
> Now consider if the architecture has memory bit operations.  Ie, you can
> query the state of a bit in memory.  Unfortunately the result is stored
> in the C bit of the condition code register (rather than in the Z bit
> like you might hope).  Yes, this comes from a real world example.
>
> During compare elimination, the compiler will try to eliminate a compare
> insn and instead use the condition codes set by a prior insn such as an
> arithmetic, logical, bit test, etc.  The mode of the flags register will
> indicate which bits are valid and which bits can be tested.  So for our
> hypothetical architecture we might have modes CC_ZNV, CC_ZN, CC_Z_IN_C
> for arithmetic, logical and bit testing.
>
> If we are using the result of a prior bit test to eliminate a compare,
> the mode of the flags register on the bit test would be CC_Z_N_C as
> would the mode of the conditional branch.
>
> We can not blindly change the mode to CC_ZNV because the bit we want is
> not in the ZNV  bits of the flag register, it's actually in the C bit.
>
> Things might be even more complicated if there's a separate register for
> floating point condition codes or multiple integer condition code registers.
>
> Hopefully that makes things clear in terms of how the mode of the flags
> register is used as well as the register number.  You'd have to dig into
> the x86 port's use of the flags register to know what changes are safe
> and which aren't -- and that's the analysis I'd need to help move this
> along.
>
> Jeff
>
Thanks a lot for your detailed explanation, It helps me a lot.

Here is updated patch.
Jeff Law May 31, 2019, 10:08 p.m. UTC | #10
On 5/30/19 2:53 AM, Hongtao Liu wrote:
> On Thu, May 30, 2019 at 3:23 AM Jeff Law <law@redhat.com> wrote:
>> On 5/9/19 10:54 PM, Hongtao Liu wrote:
>>> On Fri, May 10, 2019 at 3:55 AM Jeff Law <law@redhat.com> wrote:
>>>> On 5/6/19 11:38 PM, Hongtao Liu wrote:
>>>>> Hi Uros and GCC:
>>>>>   This patch is to fix ix86_expand_sse_comi_round whose implementation
>>>>> was not correct.
>>>>>   New implentation aligns with _mm_cmp_round_s[sd]_mask.
>>>>>
>>>>> Bootstrap and regression tests for x86 is fine.
>>>>> Ok for trunk?
>>>>>
>>>>>
>>>>> ChangeLog:
>>>>> gcc/
>>>>>    * config/i386/i386-expand.c (ix86_expand_sse_comi_round):
>>>>>    Modified, original implementation isn't correct.
>>>>>
>>>>> gcc/testsuite
>>>>>    * gcc.target/i386/avx512f-vcomisd-2.c: New runtime tests.
>>>>>    * gcc.target/i386/avx512f-vcomisd-2.c: Likewise.
>>>> So you'll have to bear with me, I'm not really familiar with this code,
>>>> but in the absence of a maintainer I'll try to work through it.
>>>>
>>>>
>>>>> -- BR, Hongtao
>>>>>
>>>>>
>>>>> 0001-Fix-ix86_expand_sse_comi_round.patch
>>>>>
>>>>> Index: gcc/ChangeLog
>>>>> ===================================================================
>>>>> --- gcc/ChangeLog     (revision 270933)
>>>>> +++ gcc/ChangeLog     (working copy)
>>>>> @@ -1,3 +1,11 @@
>>>>> +2019-05-06  H.J. Lu  <hongjiu.lu@intel.com>
>>>>> +         Hongtao Liu  <hongtao.liu@intel.com>
>>>>> +
>>>>> +     PR Target/89750
>>>>> +     PR Target/86444
>>>>> +     * config/i386/i386-expand.c (ix86_expand_sse_comi_round):
>>>>> +     Modified, original implementation isn't correct.
>>>>> +
>>>>>  2019-05-06  Segher Boessenkool  <segher@kernel.crashing.org>
>>>>>
>>>>>       * config/rs6000/rs6000.md (FIRST_ALTIVEC_REGNO, LAST_ALTIVEC_REGNO)
>>>>> Index: gcc/config/i386/i386-expand.c
>>>>> ===================================================================
>>>>> --- gcc/config/i386/i386-expand.c     (revision 270933)
>>>>> +++ gcc/config/i386/i386-expand.c     (working copy)
>>>>> @@ -9853,18 +9853,24 @@
>>>>>    const struct insn_data_d *insn_p = &insn_data[icode];
>>>>>    machine_mode mode0 = insn_p->operand[0].mode;
>>>>>    machine_mode mode1 = insn_p->operand[1].mode;
>>>>> -  enum rtx_code comparison = UNEQ;
>>>>> -  bool need_ucomi = false;
>>>>>
>>>>>    /* See avxintrin.h for values.  */
>>>>> -  enum rtx_code comi_comparisons[32] =
>>>>> +  static const enum rtx_code comparisons[32] =
>>>> So I assume the comment refers to the _CMP_* #defines in avxintrin.h?
>>>>
>>>   Yes.
>>>>>      {
>>>>> -      UNEQ, GT, GE, UNORDERED, LTGT, UNLE, UNLT, ORDERED, UNEQ, UNLT,
>>>>> -      UNLE, LT, LTGT, GE, GT, LT, UNEQ, GT, GE, UNORDERED, LTGT, UNLE,
>>>>> -      UNLT, ORDERED, UNEQ, UNLT, UNLE, LT, LTGT, GE, GT, LT
>>>>> +      EQ, LT, LE, UNORDERED, NE, UNGE, UNGT, ORDERED,
>>>>> +      EQ, UNLT, UNLE, UNORDERED, LTGT, GE, GT, ORDERED,
>>>>> +      EQ, LT, LE, UNORDERED, NE, UNGE, UNGT, ORDERED,
>>>>> +      EQ, UNLT, UNLE, UNORDERED, LTGT, GE, GT, ORDERED
>>>>>      };
>>>> For CMP_EQ_UQ aren't we looking for an unordered comparison, so UNEQ
>>>> seems right, but you're using EQ.  Can you double-check this?  If it's
>>>> wrong, then please make sure we cover this case with a test.
>>>>
>>> Avx512f-vcomis[sd]-2.c covers all 32 compare predicates.
>>> UNEQ and EQ behave differently when either operand is NAN, besides
>>> they're the same.
>>> Since NAN operands are handled separtely, so EQ/UNEQ makes no
>>> difference, That why this passes cover tests.
>>> I'll correct it.
>> Ah.  Thanks.  FWIW my approach was to walk through the _CMP_* defines in
>> avxintrin.h and map that to what I thought the comparison ought to be.
>> Then I reviewed my result against your patch.  I got a couple wrong, but
>> could easily see my mistake.  The only one I couldn't reconcile was the
>> CMP_EQ_UQ.  Knowing the NaNs are handled separately makes it clear.
>> Thanks gain.
>>
>>
>>
>>>>
>>>>
>>>>> @@ -9932,11 +10021,37 @@
>>>>>      }
>>>>>
>>>>>    emit_insn (pat);
>>>>> +
>>>>> +  /* XXX: Set CCFPmode and check a different CCmode.  Does it work
>>>>> +     correctly?  */
>>>>> +  if (GET_MODE (set_dst) != mode)
>>>>> +    set_dst = gen_rtx_REG (mode, REGNO (set_dst));
>>>> This looks worrisome, even without the cryptic comment.  I don't think
>>>> you can just blindly change the mode like that.  Unless you happen to
>>>> know that the only things you test in the new mode were set in precisely
>>>> the same way as the old mode.
>>>>
>>> Modified as:
>>> +  /* NB: Set CCFPmode and check a different CCmode.  */
>>> +  if (GET_MODE (set_dst) != mode)
>>> +    set_dst = gen_rtx_REG (mode, FLAGS_REG);
>> That might actually be worse.  The mode carries semantic information
>> about where to find the various condition codes within the flags
>> register and which condition codes are valid.  The register number
>> determines which (of possibly many) flags registers we are querying.
>>
>> Thus if the mode of SET_DEST is not the same as MODE, then there is a
>> mismatch between the point where the condition codes were set and where
>> we want to use them.
>>
>> That can only be safe is the condition codes set in the new mode are a
>> strict subset of the condition codes set in the old mode and they're
>> found in the same place within the same flags register.
>>
> vcomiss/vcomisd will set ZF,PF,CF,OF,SF,AF flags in  FLAGS_REG which is treated
> as CCFPmode. As you mentioned, It's safe to use subset of the
> condition codes in a
> new mode,  in the same FLAGS_REG,  so I've modified as
> 
> 1. Use CCZmode instead of CCmode.
> 
> +    case EQ:
> +      /* NB: COMI/UCOMI will set ZF with NAN operands.  Use CCZmode for
> + _CMP_EQ_OQ/_CMP_EQ_OS.  */
> +      check_unordered = true;
> +      mode = CCZmode;
> +      break;
> +    case NE:
> +      /* NB: COMI/UCOMI will set ZF with NAN operands.  Use CCZmode for
> + _CMP_NEQ_UQ/_CMP_NEQ_US.  */
> +      gcc_assert (!ordered);
> +      check_unordered = true;
> +      mode = CCZmode;
> +      const_val = const1_rtx;
> 
> 2. I've double checked that condition codes in new mode must be subset
> of that in old mode.
> Also gcc_assert is added to guard.
> 
> +  /* NB: Set CCFPmode and check a different CCmode which is in subset
> +     of CCFPmode.  */
> +  if (GET_MODE (set_dst) != mode)
> +    {
> +      gcc_assert (mode == CCAmode || mode == CCCmode
> +   || mode == CCOmode || mode == CCPmode
> +   || mode == CCSmode || mode == CCZmode);
> +      set_dst = gen_rtx_REG (mode, FLAGS_REG);
> +    }
> +
> 
> 
>> Maybe a simple example would help.  Consider a fairly standard target
>> with arithmetic insns that set ZNV and logicals that set ZN and clobber V.
>>
>> Arithmetic insns will have a set of the flags register with one mode
>> (CC_ZNV for example) and logicals would use a different mode for the
>> flags register (CC_ZN for example).
>>
>> Now consider if the architecture has memory bit operations.  Ie, you can
>> query the state of a bit in memory.  Unfortunately the result is stored
>> in the C bit of the condition code register (rather than in the Z bit
>> like you might hope).  Yes, this comes from a real world example.
>>
>> During compare elimination, the compiler will try to eliminate a compare
>> insn and instead use the condition codes set by a prior insn such as an
>> arithmetic, logical, bit test, etc.  The mode of the flags register will
>> indicate which bits are valid and which bits can be tested.  So for our
>> hypothetical architecture we might have modes CC_ZNV, CC_ZN, CC_Z_IN_C
>> for arithmetic, logical and bit testing.
>>
>> If we are using the result of a prior bit test to eliminate a compare,
>> the mode of the flags register on the bit test would be CC_Z_N_C as
>> would the mode of the conditional branch.
>>
>> We can not blindly change the mode to CC_ZNV because the bit we want is
>> not in the ZNV  bits of the flag register, it's actually in the C bit.
>>
>> Things might be even more complicated if there's a separate register for
>> floating point condition codes or multiple integer condition code registers.
>>
>> Hopefully that makes things clear in terms of how the mode of the flags
>> register is used as well as the register number.  You'd have to dig into
>> the x86 port's use of the flags register to know what changes are safe
>> and which aren't -- and that's the analysis I'd need to help move this
>> along.
>>
>> Jeff
>>
> Thanks a lot for your detailed explanation, It helps me a lot.
> 
> Here is updated patch.
> -- BR, Hongtao
> 
> 
> 0001-Fix-ix86_expand_sse_comi_round_v3.patch
> 
> Index: gcc/ChangeLog
> ===================================================================
> --- gcc/ChangeLog	(revision 271761)
> +++ gcc/ChangeLog	(working copy)
> @@ -2372,6 +2372,14 @@
>  	* tree-ssa-phiopt.c (two_value_replacement): Fix a typo in parameter
>  	detection.
>  
> +2019-05-06  H.J. Lu  <hongjiu.lu@intel.com>
> +	    Hongtao Liu  <hongtao.liu@intel.com>
> +
> +	PR target/89750
> +	PR target/86444
> +	* config/i386/i386-expand.c (ix86_expand_sse_comi_round):
> +	Modified, original implementation isn't correct.
OK for the trunk.

Thanks for your patience,
jeff
Hongtao Liu June 3, 2019, 2:24 a.m. UTC | #11
On Sat, Jun 1, 2019 at 6:08 AM Jeff Law <law@redhat.com> wrote:
>
> On 5/30/19 2:53 AM, Hongtao Liu wrote:
> > On Thu, May 30, 2019 at 3:23 AM Jeff Law <law@redhat.com> wrote:
> >> On 5/9/19 10:54 PM, Hongtao Liu wrote:
> >>> On Fri, May 10, 2019 at 3:55 AM Jeff Law <law@redhat.com> wrote:
> >>>> On 5/6/19 11:38 PM, Hongtao Liu wrote:
> >>>>> Hi Uros and GCC:
> >>>>>   This patch is to fix ix86_expand_sse_comi_round whose implementation
> >>>>> was not correct.
> >>>>>   New implentation aligns with _mm_cmp_round_s[sd]_mask.
> >>>>>
> >>>>> Bootstrap and regression tests for x86 is fine.
> >>>>> Ok for trunk?
> >>>>>
> >>>>>
> >>>>> ChangeLog:
> >>>>> gcc/
> >>>>>    * config/i386/i386-expand.c (ix86_expand_sse_comi_round):
> >>>>>    Modified, original implementation isn't correct.
> >>>>>
> >>>>> gcc/testsuite
> >>>>>    * gcc.target/i386/avx512f-vcomisd-2.c: New runtime tests.
> >>>>>    * gcc.target/i386/avx512f-vcomisd-2.c: Likewise.
> >>>> So you'll have to bear with me, I'm not really familiar with this code,
> >>>> but in the absence of a maintainer I'll try to work through it.
> >>>>
> >>>>
> >>>>> -- BR, Hongtao
> >>>>>
> >>>>>
> >>>>> 0001-Fix-ix86_expand_sse_comi_round.patch
> >>>>>
> >>>>> Index: gcc/ChangeLog
> >>>>> ===================================================================
> >>>>> --- gcc/ChangeLog     (revision 270933)
> >>>>> +++ gcc/ChangeLog     (working copy)
> >>>>> @@ -1,3 +1,11 @@
> >>>>> +2019-05-06  H.J. Lu  <hongjiu.lu@intel.com>
> >>>>> +         Hongtao Liu  <hongtao.liu@intel.com>
> >>>>> +
> >>>>> +     PR Target/89750
> >>>>> +     PR Target/86444
> >>>>> +     * config/i386/i386-expand.c (ix86_expand_sse_comi_round):
> >>>>> +     Modified, original implementation isn't correct.
> >>>>> +
> >>>>>  2019-05-06  Segher Boessenkool  <segher@kernel.crashing.org>
> >>>>>
> >>>>>       * config/rs6000/rs6000.md (FIRST_ALTIVEC_REGNO, LAST_ALTIVEC_REGNO)
> >>>>> Index: gcc/config/i386/i386-expand.c
> >>>>> ===================================================================
> >>>>> --- gcc/config/i386/i386-expand.c     (revision 270933)
> >>>>> +++ gcc/config/i386/i386-expand.c     (working copy)
> >>>>> @@ -9853,18 +9853,24 @@
> >>>>>    const struct insn_data_d *insn_p = &insn_data[icode];
> >>>>>    machine_mode mode0 = insn_p->operand[0].mode;
> >>>>>    machine_mode mode1 = insn_p->operand[1].mode;
> >>>>> -  enum rtx_code comparison = UNEQ;
> >>>>> -  bool need_ucomi = false;
> >>>>>
> >>>>>    /* See avxintrin.h for values.  */
> >>>>> -  enum rtx_code comi_comparisons[32] =
> >>>>> +  static const enum rtx_code comparisons[32] =
> >>>> So I assume the comment refers to the _CMP_* #defines in avxintrin.h?
> >>>>
> >>>   Yes.
> >>>>>      {
> >>>>> -      UNEQ, GT, GE, UNORDERED, LTGT, UNLE, UNLT, ORDERED, UNEQ, UNLT,
> >>>>> -      UNLE, LT, LTGT, GE, GT, LT, UNEQ, GT, GE, UNORDERED, LTGT, UNLE,
> >>>>> -      UNLT, ORDERED, UNEQ, UNLT, UNLE, LT, LTGT, GE, GT, LT
> >>>>> +      EQ, LT, LE, UNORDERED, NE, UNGE, UNGT, ORDERED,
> >>>>> +      EQ, UNLT, UNLE, UNORDERED, LTGT, GE, GT, ORDERED,
> >>>>> +      EQ, LT, LE, UNORDERED, NE, UNGE, UNGT, ORDERED,
> >>>>> +      EQ, UNLT, UNLE, UNORDERED, LTGT, GE, GT, ORDERED
> >>>>>      };
> >>>> For CMP_EQ_UQ aren't we looking for an unordered comparison, so UNEQ
> >>>> seems right, but you're using EQ.  Can you double-check this?  If it's
> >>>> wrong, then please make sure we cover this case with a test.
> >>>>
> >>> Avx512f-vcomis[sd]-2.c covers all 32 compare predicates.
> >>> UNEQ and EQ behave differently when either operand is NAN, besides
> >>> they're the same.
> >>> Since NAN operands are handled separtely, so EQ/UNEQ makes no
> >>> difference, That why this passes cover tests.
> >>> I'll correct it.
> >> Ah.  Thanks.  FWIW my approach was to walk through the _CMP_* defines in
> >> avxintrin.h and map that to what I thought the comparison ought to be.
> >> Then I reviewed my result against your patch.  I got a couple wrong, but
> >> could easily see my mistake.  The only one I couldn't reconcile was the
> >> CMP_EQ_UQ.  Knowing the NaNs are handled separately makes it clear.
> >> Thanks gain.
> >>
> >>
> >>
> >>>>
> >>>>
> >>>>> @@ -9932,11 +10021,37 @@
> >>>>>      }
> >>>>>
> >>>>>    emit_insn (pat);
> >>>>> +
> >>>>> +  /* XXX: Set CCFPmode and check a different CCmode.  Does it work
> >>>>> +     correctly?  */
> >>>>> +  if (GET_MODE (set_dst) != mode)
> >>>>> +    set_dst = gen_rtx_REG (mode, REGNO (set_dst));
> >>>> This looks worrisome, even without the cryptic comment.  I don't think
> >>>> you can just blindly change the mode like that.  Unless you happen to
> >>>> know that the only things you test in the new mode were set in precisely
> >>>> the same way as the old mode.
> >>>>
> >>> Modified as:
> >>> +  /* NB: Set CCFPmode and check a different CCmode.  */
> >>> +  if (GET_MODE (set_dst) != mode)
> >>> +    set_dst = gen_rtx_REG (mode, FLAGS_REG);
> >> That might actually be worse.  The mode carries semantic information
> >> about where to find the various condition codes within the flags
> >> register and which condition codes are valid.  The register number
> >> determines which (of possibly many) flags registers we are querying.
> >>
> >> Thus if the mode of SET_DEST is not the same as MODE, then there is a
> >> mismatch between the point where the condition codes were set and where
> >> we want to use them.
> >>
> >> That can only be safe is the condition codes set in the new mode are a
> >> strict subset of the condition codes set in the old mode and they're
> >> found in the same place within the same flags register.
> >>
> > vcomiss/vcomisd will set ZF,PF,CF,OF,SF,AF flags in  FLAGS_REG which is treated
> > as CCFPmode. As you mentioned, It's safe to use subset of the
> > condition codes in a
> > new mode,  in the same FLAGS_REG,  so I've modified as
> >
> > 1. Use CCZmode instead of CCmode.
> >
> > +    case EQ:
> > +      /* NB: COMI/UCOMI will set ZF with NAN operands.  Use CCZmode for
> > + _CMP_EQ_OQ/_CMP_EQ_OS.  */
> > +      check_unordered = true;
> > +      mode = CCZmode;
> > +      break;
> > +    case NE:
> > +      /* NB: COMI/UCOMI will set ZF with NAN operands.  Use CCZmode for
> > + _CMP_NEQ_UQ/_CMP_NEQ_US.  */
> > +      gcc_assert (!ordered);
> > +      check_unordered = true;
> > +      mode = CCZmode;
> > +      const_val = const1_rtx;
> >
> > 2. I've double checked that condition codes in new mode must be subset
> > of that in old mode.
> > Also gcc_assert is added to guard.
> >
> > +  /* NB: Set CCFPmode and check a different CCmode which is in subset
> > +     of CCFPmode.  */
> > +  if (GET_MODE (set_dst) != mode)
> > +    {
> > +      gcc_assert (mode == CCAmode || mode == CCCmode
> > +   || mode == CCOmode || mode == CCPmode
> > +   || mode == CCSmode || mode == CCZmode);
> > +      set_dst = gen_rtx_REG (mode, FLAGS_REG);
> > +    }
> > +
> >
> >
> >> Maybe a simple example would help.  Consider a fairly standard target
> >> with arithmetic insns that set ZNV and logicals that set ZN and clobber V.
> >>
> >> Arithmetic insns will have a set of the flags register with one mode
> >> (CC_ZNV for example) and logicals would use a different mode for the
> >> flags register (CC_ZN for example).
> >>
> >> Now consider if the architecture has memory bit operations.  Ie, you can
> >> query the state of a bit in memory.  Unfortunately the result is stored
> >> in the C bit of the condition code register (rather than in the Z bit
> >> like you might hope).  Yes, this comes from a real world example.
> >>
> >> During compare elimination, the compiler will try to eliminate a compare
> >> insn and instead use the condition codes set by a prior insn such as an
> >> arithmetic, logical, bit test, etc.  The mode of the flags register will
> >> indicate which bits are valid and which bits can be tested.  So for our
> >> hypothetical architecture we might have modes CC_ZNV, CC_ZN, CC_Z_IN_C
> >> for arithmetic, logical and bit testing.
> >>
> >> If we are using the result of a prior bit test to eliminate a compare,
> >> the mode of the flags register on the bit test would be CC_Z_N_C as
> >> would the mode of the conditional branch.
> >>
> >> We can not blindly change the mode to CC_ZNV because the bit we want is
> >> not in the ZNV  bits of the flag register, it's actually in the C bit.
> >>
> >> Things might be even more complicated if there's a separate register for
> >> floating point condition codes or multiple integer condition code registers.
> >>
> >> Hopefully that makes things clear in terms of how the mode of the flags
> >> register is used as well as the register number.  You'd have to dig into
> >> the x86 port's use of the flags register to know what changes are safe
> >> and which aren't -- and that's the analysis I'd need to help move this
> >> along.
> >>
> >> Jeff
> >>
> > Thanks a lot for your detailed explanation, It helps me a lot.
> >
> > Here is updated patch.
> > -- BR, Hongtao
> >
> >
> > 0001-Fix-ix86_expand_sse_comi_round_v3.patch
> >
> > Index: gcc/ChangeLog
> > ===================================================================
> > --- gcc/ChangeLog     (revision 271761)
> > +++ gcc/ChangeLog     (working copy)
> > @@ -2372,6 +2372,14 @@
> >       * tree-ssa-phiopt.c (two_value_replacement): Fix a typo in parameter
> >       detection.
> >
> > +2019-05-06  H.J. Lu  <hongjiu.lu@intel.com>
> > +         Hongtao Liu  <hongtao.liu@intel.com>
> > +
> > +     PR target/89750
> > +     PR target/86444
> > +     * config/i386/i386-expand.c (ix86_expand_sse_comi_round):
> > +     Modified, original implementation isn't correct.
> OK for the trunk.
>
> Thanks for your patience,
> jeff

Author: liuhongt
Date: Mon Jun  3 02:20:33 2019
New Revision: 271853

URL: https://gcc.gnu.org/viewcvs?rev=271853&root=gcc&view=rev
Log:
2019-05-06  H.J. Lu  <hongjiu.lu@intel.com>
   Hongtao Liu  <hongtao.liu@intel.com>

PR target/89750
PR target/86444
* config/i386/i386-expand.c (ix86_expand_sse_comi_round):
Modified, original implementation isn't correct.

2019-05-06  H.J. Lu  <hongjiu.lu@intel.com>
   Hongtao Liu  <hongtao.liu@intel.com>

PR target/89750
PR target/86444
* gcc.target/i386/avx512f-vcomisd-2.c: New.
* gcc.target/i386/avx512f-vcomisd-2.c: Likewise.

Added:
    trunk/gcc/testsuite/gcc.target/i386/avx512f-vcomisd-2.c
    trunk/gcc/testsuite/gcc.target/i386/avx512f-vcomiss-2.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/config/i386/i386-expand.c
    trunk/gcc/testsuite/ChangeLog
diff mbox series

Patch

Index: gcc/ChangeLog
===================================================================
--- gcc/ChangeLog	(revision 270933)
+++ gcc/ChangeLog	(working copy)
@@ -1,3 +1,11 @@ 
+2019-05-06  H.J. Lu  <hongjiu.lu@intel.com>
+	    Hongtao Liu  <hongtao.liu@intel.com>
+
+	PR Target/89750
+	PR Target/86444
+	* config/i386/i386-expand.c (ix86_expand_sse_comi_round):
+	Modified, original implementation isn't correct.
+
 2019-05-06  Segher Boessenkool  <segher@kernel.crashing.org>
 
 	* config/rs6000/rs6000.md (FIRST_ALTIVEC_REGNO, LAST_ALTIVEC_REGNO)
Index: gcc/config/i386/i386-expand.c
===================================================================
--- gcc/config/i386/i386-expand.c	(revision 270933)
+++ gcc/config/i386/i386-expand.c	(working copy)
@@ -9853,18 +9853,24 @@ 
   const struct insn_data_d *insn_p = &insn_data[icode];
   machine_mode mode0 = insn_p->operand[0].mode;
   machine_mode mode1 = insn_p->operand[1].mode;
-  enum rtx_code comparison = UNEQ;
-  bool need_ucomi = false;
 
   /* See avxintrin.h for values.  */
-  enum rtx_code comi_comparisons[32] =
+  static const enum rtx_code comparisons[32] =
     {
-      UNEQ, GT, GE, UNORDERED, LTGT, UNLE, UNLT, ORDERED, UNEQ, UNLT,
-      UNLE, LT, LTGT, GE, GT, LT, UNEQ, GT, GE, UNORDERED, LTGT, UNLE,
-      UNLT, ORDERED, UNEQ, UNLT, UNLE, LT, LTGT, GE, GT, LT
+      EQ, LT, LE, UNORDERED, NE, UNGE, UNGT, ORDERED,
+      EQ, UNLT, UNLE, UNORDERED, LTGT, GE, GT, ORDERED,
+      EQ, LT, LE, UNORDERED, NE, UNGE, UNGT, ORDERED,
+      EQ, UNLT, UNLE, UNORDERED, LTGT, GE, GT, ORDERED
     };
-  bool need_ucomi_values[32] =
+  static const bool ordereds[32] =
     {
+      true,  true,  true,  false, false, false, false, true,
+      false, false, false, true,  true,  true,  true,  false,
+      true,  true,  true,  false, false, false, false, true,
+      false, false, false, true,  true,  true,  true,  false
+    };
+  static const bool non_signalings[32] =
+    {
       true,  false, false, true,  true,  false, false, true,
       true,  false, false, true,  true,  false, false, true,
       false, true,  true,  false, false, true,  true,  false,
@@ -9888,16 +9894,95 @@ 
       return const0_rtx;
     }
 
-  comparison = comi_comparisons[INTVAL (op2)];
-  need_ucomi = need_ucomi_values[INTVAL (op2)];
-
   if (VECTOR_MODE_P (mode0))
     op0 = safe_vector_operand (op0, mode0);
   if (VECTOR_MODE_P (mode1))
     op1 = safe_vector_operand (op1, mode1);
 
+  enum rtx_code comparison = comparisons[INTVAL (op2)];
+  bool ordered = ordereds[INTVAL (op2)];
+  bool non_signaling = non_signalings[INTVAL (op2)];
+  rtx const_val = const0_rtx;
+  
+  bool check_unordered = false;
+  machine_mode mode = CCFPmode;
+  switch (comparison)
+    {
+    case ORDERED:
+      if (!ordered)
+	{
+	  /* NB: Use CCSmode/NE for _CMP_TRUE_UQ/_CMP_TRUE_US.  */
+	  if (!non_signaling)
+	    ordered = true;
+	  mode = CCSmode;
+	}
+      else
+	{
+	  /* NB: Use CCPmode/NE for _CMP_ORD_Q/_CMP_ORD_S.  */
+	  if (non_signaling)
+	    ordered = false;
+	  mode = CCPmode;
+	}
+      comparison = NE;
+      break;
+    case UNORDERED:
+      if (ordered)
+	{
+	  /* NB: Use CCSmode/EQ for _CMP_FALSE_OQ/_CMP_FALSE_OS.  */
+	  if (non_signaling)
+	    ordered = false;
+	  mode = CCSmode;
+	}
+      else
+	{
+	  /* NB: Use CCPmode/NE for _CMP_UNORD_Q/_CMP_UNORD_S.  */
+	  if (!non_signaling)
+	    ordered = true;
+	  mode = CCPmode;
+	}
+      comparison = EQ;
+      break;
+
+    case LE:	/* -> GE  */
+    case LT:	/* -> GT  */
+    case UNGE:	/* -> UNLE  */
+    case UNGT:	/* -> UNLT  */
+      std::swap (op0, op1);
+      comparison = swap_condition (comparison);
+      /* FALLTHRU */
+    case GT:
+    case GE:
+    case UNEQ:
+    case UNLT:
+    case UNLE:
+    case LTGT:
+      /* These are supported by CCFPmode.  NB: Use ordered/signaling
+	 COMI or unordered/non-signaling UCOMI.  Both set ZF, PF, CF
+	 with NAN operands.  */
+      if (ordered == non_signaling)
+	ordered = !ordered;
+      break;
+    case EQ:
+      if (ordered)
+	check_unordered = true;
+      /* NB: COMI/UCOMI will set ZF with NAN operands.  Use CCmode for
+	 _CMP_EQ_OQ/_CMP_EQ_OS/_CMP_EQ_UQ/_CMP_EQ_US.  */
+      mode = CCmode;
+      break;
+    case NE:
+      gcc_assert (!ordered);
+      check_unordered = true;
+      /* NB: COMI/UCOMI will set ZF with NAN operands.  Use CCmode for
+	 _CMP_NEQ_UQ/_CMP_NEQ_US.  */
+      mode = CCmode;
+      const_val = const1_rtx;
+      break;
+    default:
+      gcc_unreachable ();
+    }
+
   target = gen_reg_rtx (SImode);
-  emit_move_insn (target, const0_rtx);
+  emit_move_insn (target, const_val);
   target = gen_rtx_SUBREG (QImode, target, 0);
 
   if ((optimize && !register_operand (op0, mode0))
@@ -9907,10 +9992,14 @@ 
       || !insn_p->operand[1].predicate (op1, mode1))
     op1 = copy_to_mode_reg (mode1, op1);
 
-  if (need_ucomi)
-    icode = icode == CODE_FOR_sse_comi_round
-		     ? CODE_FOR_sse_ucomi_round
-		     : CODE_FOR_sse2_ucomi_round;
+  /*
+     1. COMI: ordered and signaling.
+     2. UCOMI: unordered and non-signaling.
+   */
+  if (non_signaling)
+    icode = (icode == CODE_FOR_sse_comi_round
+	     ? CODE_FOR_sse_ucomi_round
+	     : CODE_FOR_sse2_ucomi_round);
 
   pat = GEN_FCN (icode) (op0, op1, op3);
   if (! pat)
@@ -9932,11 +10021,37 @@ 
     }
 
   emit_insn (pat);
+
+  rtx_code_label *label = NULL;
+
+  /* NB: For ordered EQ or unordered NE, check ZF alone isn't sufficient
+     with NAN operands.  */
+  if (check_unordered)
+    {
+      gcc_assert (comparison == EQ || comparison == NE);
+
+      rtx flag = gen_rtx_REG (CCFPmode, FLAGS_REG);
+      label = gen_label_rtx ();
+      rtx tmp = gen_rtx_fmt_ee (UNORDERED, VOIDmode, flag, const0_rtx);
+      tmp = gen_rtx_IF_THEN_ELSE (VOIDmode, tmp,
+				  gen_rtx_LABEL_REF (VOIDmode, label),
+				  pc_rtx);
+      emit_jump_insn (gen_rtx_SET (pc_rtx, tmp));
+    }
+
+  /* XXX: Set CCFPmode and check a different CCmode.  Does it work
+     correctly?  */
+  if (GET_MODE (set_dst) != mode)
+    set_dst = gen_rtx_REG (mode, REGNO (set_dst));
+
   emit_insn (gen_rtx_SET (gen_rtx_STRICT_LOW_PART (VOIDmode, target),
 			  gen_rtx_fmt_ee (comparison, QImode,
 					  set_dst,
 					  const0_rtx)));
 
+  if (label)
+    emit_label (label);
+
   return SUBREG_REG (target);
 }
 
Index: gcc/testsuite/ChangeLog
===================================================================
--- gcc/testsuite/ChangeLog	(revision 270933)
+++ gcc/testsuite/ChangeLog	(working copy)
@@ -1,3 +1,11 @@ 
+2019-05-06  H.J. Lu  <hongjiu.lu@intel.com>
+	    Hongtao Liu  <hongtao.liu@intel.com>
+
+	PR Target/89750
+	PR Target/86444
+	* gcc.target/i386/avx512f-vcomisd-2.c: New.
+	* gcc.target/i386/avx512f-vcomisd-2.c: Likewise.
+
 2019-05-06  Steven G. Kargl  <kargl@gcc.gnu.org>
 
 	PR fortran/90290
Index: gcc/testsuite/gcc.target/i386/avx512f-vcomisd-2.c
===================================================================
--- gcc/testsuite/gcc.target/i386/avx512f-vcomisd-2.c	(nonexistent)
+++ gcc/testsuite/gcc.target/i386/avx512f-vcomisd-2.c	(working copy)
@@ -0,0 +1,104 @@ 
+/* { dg-do run } */
+/* { dg-require-effective-target avx512f } */
+/* { dg-options "-O2 -mavx512f" } */
+
+#include "avx512f-check.h"
+
+static inline void __attribute__ ((__always_inline__))
+check_cmp (double s1, double s2, const int imm, int expected)
+{
+  __m128d source1 = _mm_load_sd (&s1);
+  __m128d source2 = _mm_load_sd (&s2);
+  int res = _mm_comi_round_sd (source1, source2, imm,
+			       _MM_FROUND_NO_EXC);
+  if (expected != res)
+    abort();
+}
+
+static void
+do_check (double s1, double s2)
+{
+  check_cmp (s1, s2, _CMP_EQ_OQ,
+	     !__builtin_isunordered (s1, s2) && s1 == s2);
+  check_cmp (s1, s2, _CMP_LT_OS,
+	     !__builtin_isunordered (s1, s2) && s1 < s2);
+  check_cmp (s1, s2, _CMP_LE_OS,
+	     !__builtin_isunordered (s1, s2) && s1 <= s2);
+  check_cmp (s1, s2, _CMP_UNORD_Q,
+	     __builtin_isunordered (s1, s2));
+  check_cmp (s1, s2, _CMP_NEQ_UQ,
+	     __builtin_isunordered (s1, s2) || s1 != s2);
+  check_cmp (s1, s2, _CMP_NLT_US,
+	     __builtin_isunordered (s1, s2) || s1 >= s2);
+  check_cmp (s1, s2, _CMP_NLE_US,
+	     __builtin_isunordered (s1, s2) || s1 > s2);
+  check_cmp (s1, s2, _CMP_ORD_Q,
+	     !__builtin_isunordered (s1, s2));
+  check_cmp (s1, s2, _CMP_EQ_UQ,
+	     __builtin_isunordered (s1, s2) || s1 == s2);
+  check_cmp (s1, s2, _CMP_NGE_US,
+	     __builtin_isunordered (s1, s2) || s1 < s2);
+  check_cmp (s1, s2, _CMP_NGT_US,
+	     __builtin_isunordered (s1, s2) || s1 <= s2);
+  check_cmp (s1, s2, _CMP_FALSE_OQ, 0);
+  check_cmp (s1, s2, _CMP_NEQ_OQ,
+	     !__builtin_isunordered (s1, s2) && s1 != s2);
+  check_cmp (s1, s2, _CMP_GE_OS,
+	     !__builtin_isunordered (s1, s2) && s1 >= s2);
+  check_cmp (s1, s2, _CMP_GT_OS,
+	     !__builtin_isunordered (s1, s2) && s1 > s2);
+  check_cmp (s1, s2, _CMP_TRUE_UQ, 1);
+  check_cmp (s1, s2, _CMP_EQ_OS,
+	     !__builtin_isunordered (s1, s2) && s1 == s2);
+  check_cmp (s1, s2, _CMP_LT_OQ,
+	     !__builtin_isunordered (s1, s2) && s1 < s2);
+  check_cmp (s1, s2, _CMP_LE_OQ,
+	     !__builtin_isunordered (s1, s2) && s1 <= s2);
+  check_cmp (s1, s2, _CMP_UNORD_S,
+	     __builtin_isunordered (s1, s2));
+  check_cmp (s1, s2, _CMP_NEQ_US,
+	     __builtin_isunordered (s1, s2) || s1 != s2);
+  check_cmp (s1, s2, _CMP_NLT_UQ,
+	     __builtin_isunordered (s1, s2) || s1 >= s2);
+  check_cmp (s1, s2, _CMP_NLE_UQ,
+	     __builtin_isunordered (s1, s2) || s1 > s2);
+  check_cmp (s1, s2, _CMP_ORD_S, !__builtin_isunordered (s1, s2));
+  check_cmp (s1, s2, _CMP_EQ_US,
+	     __builtin_isunordered (s1, s2) || s1 == s2);
+  check_cmp (s1, s2, _CMP_NGE_UQ,
+	     __builtin_isunordered (s1, s2) || s1 < s2);
+  check_cmp (s1, s2, _CMP_NGT_UQ,
+	     __builtin_isunordered (s1, s2) || s1 <= s2);
+  check_cmp (s1, s2, _CMP_FALSE_OS, 0);
+  check_cmp (s1, s2, _CMP_NEQ_OS,
+	     !__builtin_isunordered (s1, s2) && s1 != s2);
+  check_cmp (s1, s2, _CMP_GE_OQ,
+	     !__builtin_isunordered (s1, s2) && s1 >= s2);
+  check_cmp (s1, s2, _CMP_GT_OQ,
+	     !__builtin_isunordered (s1, s2) && s1 > s2);
+  check_cmp (s1, s2, _CMP_TRUE_US, 1);
+}
+
+static void
+avx512f_test (void)
+{
+  struct
+    {
+      double x1;
+      double x2;
+    }
+  inputs[] =
+    { 
+      { 4.3, 2.18 },
+      { -4.3, 3.18 },
+      { __builtin_nan (""), -5.8 },
+      { -4.8, __builtin_nans ("") },
+      { 3.8, __builtin_nans ("") },
+      { 4.2, 4.2 },
+      { __builtin_nan (""), __builtin_nans ("") },
+    };
+  int i;
+
+  for (i = 0; i < sizeof (inputs) / sizeof (inputs[0]); i++)
+    do_check (inputs[i].x1, inputs[i].x2);
+}
Index: gcc/testsuite/gcc.target/i386/avx512f-vcomiss-2.c
===================================================================
--- gcc/testsuite/gcc.target/i386/avx512f-vcomiss-2.c	(nonexistent)
+++ gcc/testsuite/gcc.target/i386/avx512f-vcomiss-2.c	(working copy)
@@ -0,0 +1,104 @@ 
+/* { dg-do run } */
+/* { dg-require-effective-target avx512f } */
+/* { dg-options "-O2 -mavx512f" } */
+
+#include "avx512f-check.h"
+
+static inline void __attribute__ ((__always_inline__))
+check_cmp (float s1, float s2, const int imm, int expected)
+{
+  __m128 source1 = _mm_load_ss (&s1);
+  __m128 source2 = _mm_load_ss (&s2);
+  int res = _mm_comi_round_ss (source1, source2, imm,
+			       _MM_FROUND_NO_EXC);
+  if (expected != res)
+    abort();
+}
+
+static void
+do_check (float s1, float s2)
+{
+  check_cmp (s1, s2, _CMP_EQ_OQ,
+	     !__builtin_isunordered (s1, s2) && s1 == s2);
+  check_cmp (s1, s2, _CMP_LT_OS,
+	     !__builtin_isunordered (s1, s2) && s1 < s2);
+  check_cmp (s1, s2, _CMP_LE_OS,
+	     !__builtin_isunordered (s1, s2) && s1 <= s2);
+  check_cmp (s1, s2, _CMP_UNORD_Q,
+	     __builtin_isunordered (s1, s2));
+  check_cmp (s1, s2, _CMP_NEQ_UQ,
+	     __builtin_isunordered (s1, s2) || s1 != s2);
+  check_cmp (s1, s2, _CMP_NLT_US,
+	     __builtin_isunordered (s1, s2) || s1 >= s2);
+  check_cmp (s1, s2, _CMP_NLE_US,
+	     __builtin_isunordered (s1, s2) || s1 > s2);
+  check_cmp (s1, s2, _CMP_ORD_Q,
+	     !__builtin_isunordered (s1, s2));
+  check_cmp (s1, s2, _CMP_EQ_UQ,
+	     __builtin_isunordered (s1, s2) || s1 == s2);
+  check_cmp (s1, s2, _CMP_NGE_US,
+	     __builtin_isunordered (s1, s2) || s1 < s2);
+  check_cmp (s1, s2, _CMP_NGT_US,
+	     __builtin_isunordered (s1, s2) || s1 <= s2);
+  check_cmp (s1, s2, _CMP_FALSE_OQ, 0);
+  check_cmp (s1, s2, _CMP_NEQ_OQ,
+	     !__builtin_isunordered (s1, s2) && s1 != s2);
+  check_cmp (s1, s2, _CMP_GE_OS,
+	     !__builtin_isunordered (s1, s2) && s1 >= s2);
+  check_cmp (s1, s2, _CMP_GT_OS,
+	     !__builtin_isunordered (s1, s2) && s1 > s2);
+  check_cmp (s1, s2, _CMP_TRUE_UQ, 1);
+  check_cmp (s1, s2, _CMP_EQ_OS,
+	     !__builtin_isunordered (s1, s2) && s1 == s2);
+  check_cmp (s1, s2, _CMP_LT_OQ,
+	     !__builtin_isunordered (s1, s2) && s1 < s2);
+  check_cmp (s1, s2, _CMP_LE_OQ,
+	     !__builtin_isunordered (s1, s2) && s1 <= s2);
+  check_cmp (s1, s2, _CMP_UNORD_S,
+	     __builtin_isunordered (s1, s2));
+  check_cmp (s1, s2, _CMP_NEQ_US,
+	     __builtin_isunordered (s1, s2) || s1 != s2);
+  check_cmp (s1, s2, _CMP_NLT_UQ,
+	     __builtin_isunordered (s1, s2) || s1 >= s2);
+  check_cmp (s1, s2, _CMP_NLE_UQ,
+	     __builtin_isunordered (s1, s2) || s1 > s2);
+  check_cmp (s1, s2, _CMP_ORD_S, !__builtin_isunordered (s1, s2));
+  check_cmp (s1, s2, _CMP_EQ_US,
+	     __builtin_isunordered (s1, s2) || s1 == s2);
+  check_cmp (s1, s2, _CMP_NGE_UQ,
+	     __builtin_isunordered (s1, s2) || s1 < s2);
+  check_cmp (s1, s2, _CMP_NGT_UQ,
+	     __builtin_isunordered (s1, s2) || s1 <= s2);
+  check_cmp (s1, s2, _CMP_FALSE_OS, 0);
+  check_cmp (s1, s2, _CMP_NEQ_OS,
+	     !__builtin_isunordered (s1, s2) && s1 != s2);
+  check_cmp (s1, s2, _CMP_GE_OQ,
+	     !__builtin_isunordered (s1, s2) && s1 >= s2);
+  check_cmp (s1, s2, _CMP_GT_OQ,
+	     !__builtin_isunordered (s1, s2) && s1 > s2);
+  check_cmp (s1, s2, _CMP_TRUE_US, 1);
+}
+
+static void
+avx512f_test (void)
+{
+  struct
+    {
+      float x1;
+      float x2;
+    }
+  inputs[] =
+    { 
+      { 4.3, 2.18 },
+      { -4.3, 3.18 },
+      { __builtin_nanf (""), -5.8 },
+      { -4.8, __builtin_nansf ("") },
+      { 3.8, __builtin_nansf ("") },
+      { 4.2, 4.2 },
+      { __builtin_nanf (""), __builtin_nansf ("") },
+    };
+  int i;
+
+  for (i = 0; i < sizeof (inputs) / sizeof (inputs[0]); i++)
+    do_check (inputs[i].x1, inputs[i].x2);
+}