
[AArch64,NEON] Improve vpmaxX & vpminX intrinsics

Message ID DA41BE1DDCA941489001C7FBD7A8820E837ABB5A@szxema507-mbx.china.huawei.com
State New

Commit Message

Yangfei (Felix) Dec. 9, 2014, 8:17 a.m. UTC
> On 28 November 2014 at 09:23, Yangfei (Felix) <felix.yang@huawei.com> wrote:
> > Hi,
> >   This patch converts vpmaxX & vpminX intrinsics to use builtin functions
> instead of the previous inline assembly syntax.
> >   Regtested with aarch64-linux-gnu on QEMU.  Also passed the glorious
> testsuite of Christophe Lyon.
> >   OK for the trunk?
> 
> Hi Felix,   We know from experience that the advsimd intrinsics tend
> to be fragile for big endian and in general it is fairly easy to break the big endian
> case.  For these advsimd improvements that you are working on (that we very
> much appreciate) it is important to run both little endian and big endian
> regressions.
> 
> Thanks
> /Marcus



Okay.  Any plan for the advsimd big-endian improvement?
I rebased this patch over Alan Lawrence's patch: https://gcc.gnu.org/ml/gcc-patches/2014-12/msg00279.html
No regressions for the aarch64_be-linux-gnu target either.  OK for the trunk?
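As a concrete illustration of the change (a minimal sketch, not part of the
submitted patch; the function name and flags below are only assumptions for
the example), a user-level call such as the following should now expand
through the new builtin instead of an opaque inline-asm block, while still
producing a single pairwise-max instruction:

/* Illustrative only -- compile with an aarch64 GCC at -O2 and inspect the
   generated assembly.  */
#include <arm_neon.h>

int32x2_t
pairwise_max (int32x2_t a, int32x2_t b)
{
  /* With this patch, vpmax_s32 expands via __builtin_aarch64_smaxpv2si
     rather than a hand-written "smaxp" asm template, so the compiler can
     schedule and optimise around it; the expected instruction is still
     smaxp v0.2s, v0.2s, v1.2s.  */
  return vpmax_s32 (a, b);
}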

Comments

Tejas Belagod Jan. 13, 2015, 5:16 p.m. UTC | #1
On 09/12/14 08:17, Yangfei (Felix) wrote:
>> On 28 November 2014 at 09:23, Yangfei (Felix) <felix.yang@huawei.com> wrote:
>>> Hi,
>>>    This patch converts vpmaxX & vpminX intrinsics to use builtin functions
>> instead of the previous inline assembly syntax.
>>>    Regtested with aarch64-linux-gnu on QEMU.  Also passed the glorious
>> testsuite of Christophe Lyon.
>>>    OK for the trunk?
>>
>> Hi Felix,   We know from experience that the advsimd intrinsics tend
>> to be fragile for big endian and in general it is fairly easy to break the big endian
>> case.  For these advsimd improvements that you are working on (that we very
>> much appreciate) it is important to run both little endian and big endian
>> regressions.
>>
>> Thanks
>> /Marcus
>
>
> Okay.  Any plan for the advsimd big-endian improvement?
> I rebased this patch over Alan Lawrence's patch: https://gcc.gnu.org/ml/gcc-patches/2014-12/msg00279.html
> No regressions for the aarch64_be-linux-gnu target either.  OK for the trunk?
>
>
> Index: gcc/ChangeLog
> ===================================================================
> --- gcc/ChangeLog       (revision 218464)
> +++ gcc/ChangeLog       (working copy)
> @@ -1,3 +1,18 @@
> +2014-12-09  Felix Yang  <felix.yang@huawei.com>
> +
> +       * config/aarch64/aarch64-simd.md (aarch64_<maxmin_uns>p<mode>): New
> +       pattern.
> +       * config/aarch64/aarch64-simd-builtins.def (smaxp, sminp, umaxp,
> +       uminp, smax_nanp, smin_nanp): New builtins.
> +       * config/aarch64/arm_neon.h (vpmax_s8, vpmax_s16, vpmax_s32,
> +       vpmax_u8, vpmax_u16, vpmax_u32, vpmaxq_s8, vpmaxq_s16, vpmaxq_s32,
> +       vpmaxq_u8, vpmaxq_u16, vpmaxq_u32, vpmax_f32, vpmaxq_f32, vpmaxq_f64,
> +       vpmaxqd_f64, vpmaxs_f32, vpmaxnm_f32, vpmaxnmq_f32, vpmaxnmq_f64,
> +       vpmaxnmqd_f64, vpmaxnms_f32, vpmin_s8, vpmin_s16, vpmin_s32, vpmin_u8,
> +       vpmin_u16, vpmin_u32, vpminq_s8, vpminq_s16, vpminq_s32, vpminq_u8,
> +       vpminq_u16, vpminq_u32, vpmin_f32, vpminq_f32, vpminq_f64, vpminqd_f64,
> +       vpmins_f32, vpminnm_f32, vpminnmq_f32, vpminnmq_f64, vpminnmqd_f64,
> +


>   __extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
> Index: gcc/config/aarch64/aarch64-simd.md
> ===================================================================
> --- gcc/config/aarch64/aarch64-simd.md  (revision 218464)
> +++ gcc/config/aarch64/aarch64-simd.md  (working copy)
> @@ -1017,6 +1017,28 @@
>     DONE;
>   })
>
> +;; Pairwise Integer Max/Min operations.
> +(define_insn "aarch64_<maxmin_uns>p<mode>"
> + [(set (match_operand:VDQ_BHSI 0 "register_operand" "=w")
> +       (unspec:VDQ_BHSI [(match_operand:VDQ_BHSI 1 "register_operand" "w")
> +                        (match_operand:VDQ_BHSI 2 "register_operand" "w")]
> +                       MAXMINV))]
> + "TARGET_SIMD"
> + "<maxmin_uns_op>p\t%0.<Vtype>, %1.<Vtype>, %2.<Vtype>"
> +  [(set_attr "type" "neon_minmax<q>")]
> +)
> +

Hi Felix,

Sorry for the delay in getting back to you on this.

If you've rolled aarch64_reduc_<maxmin_uns>_internalv2si into the above 
pattern, do you still need it? For all its call points, just point them 
to aarch64_<maxmin_uns>p<mode>?

Thanks,
Tejas.
Yangfei (Felix) Jan. 14, 2015, 7:09 a.m. UTC | #2
> On 09/12/14 08:17, Yangfei (Felix) wrote:
> >> On 28 November 2014 at 09:23, Yangfei (Felix) <felix.yang@huawei.com> wrote:
> >>> Hi,
> >>>    This patch converts vpmaxX & vpminX intrinsics to use builtin functions
> >> instead of the previous inline assembly syntax.
> >>>    Regtested with aarch64-linux-gnu on QEMU.  Also passed the glorious
> >> testsuite of Christophe Lyon.
> >>>    OK for the trunk?
> >>
> >> Hi Felix,   We know from experience that the advsimd intrinsics tend
> >> to be fragile for big endian and in general it is fairly easy to
> >> break the big endian case.  For these advsimd improvements that you
> >> are working on (that we very much appreciate) it is important to run
> >> both little endian and big endian regressions.
> >>
> >> Thanks
> >> /Marcus
> >
> >
> > Okay.  Any plan for the advsimd big-endian improvement?
> > I rebased this patch over Alan Lawrence's patch:
> > https://gcc.gnu.org/ml/gcc-patches/2014-12/msg00279.html
> > No regressions for the aarch64_be-linux-gnu target either.  OK for the trunk?
> >
> >
> > Index: gcc/ChangeLog
> > ===================================================================
> > --- gcc/ChangeLog       (revision 218464)
> > +++ gcc/ChangeLog       (working copy)
> > @@ -1,3 +1,18 @@
> > +2014-12-09  Felix Yang  <felix.yang@huawei.com>
> > +
> > +       * config/aarch64/aarch64-simd.md (aarch64_<maxmin_uns>p<mode>): New
> > +       pattern.
> > +       * config/aarch64/aarch64-simd-builtins.def (smaxp, sminp, umaxp,
> > +       uminp, smax_nanp, smin_nanp): New builtins.
> > +       * config/aarch64/arm_neon.h (vpmax_s8, vpmax_s16, vpmax_s32,
> > +       vpmax_u8, vpmax_u16, vpmax_u32, vpmaxq_s8, vpmaxq_s16, vpmaxq_s32,
> > +       vpmaxq_u8, vpmaxq_u16, vpmaxq_u32, vpmax_f32, vpmaxq_f32, vpmaxq_f64,
> > +       vpmaxqd_f64, vpmaxs_f32, vpmaxnm_f32, vpmaxnmq_f32, vpmaxnmq_f64,
> > +       vpmaxnmqd_f64, vpmaxnms_f32, vpmin_s8, vpmin_s16, vpmin_s32, vpmin_u8,
> > +       vpmin_u16, vpmin_u32, vpminq_s8, vpminq_s16, vpminq_s32, vpminq_u8,
> > +       vpminq_u16, vpminq_u32, vpmin_f32, vpminq_f32, vpminq_f64, vpminqd_f64,
> > +       vpmins_f32, vpminnm_f32, vpminnmq_f32, vpminnmq_f64, vpminnmqd_f64,
> > +
> 
> 
> >   __extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
> > Index: gcc/config/aarch64/aarch64-simd.md
> > ===================================================================
> > --- gcc/config/aarch64/aarch64-simd.md  (revision 218464)
> > +++ gcc/config/aarch64/aarch64-simd.md  (working copy)
> > @@ -1017,6 +1017,28 @@
> >     DONE;
> >   })
> >
> > +;; Pairwise Integer Max/Min operations.
> > +(define_insn "aarch64_<maxmin_uns>p<mode>"
> > + [(set (match_operand:VDQ_BHSI 0 "register_operand" "=w")
> > +       (unspec:VDQ_BHSI [(match_operand:VDQ_BHSI 1 "register_operand" "w")
> > +                        (match_operand:VDQ_BHSI 2 "register_operand" "w")]
> > +                       MAXMINV))]
> > + "TARGET_SIMD"
> > + "<maxmin_uns_op>p\t%0.<Vtype>, %1.<Vtype>, %2.<Vtype>"
> > +  [(set_attr "type" "neon_minmax<q>")]
> > +)
> > +
> 
> Hi Felix,
> 
> Sorry for the delay in getting back to you on this.
> 
> If you've rolled aarch64_reduc_<maxmin_uns>_internalv2si into the above
> pattern, do you still need it? For all its call points, just point them to
> aarch64_<maxmin_uns>p<mode>?
> 
> Thanks,
> Tejas.

Hello Tejas,

  I didn't do this yet. 
  Currently the aarch64_reduc_<maxmin_uns>_internalv2si is only called by reduc_<maxmin_uns>_scal_<mode>. 
  I find it kind of troublesome to handle this due to the use of iterators in the caller pattern.
  Are you going to rework this part?
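
(A side note for readers, not from the thread: for two-element vectors the
pairwise and reduction forms coincide, which is why the v2si reduction
pattern overlaps with the new pairwise insn.  The sketch below, with an
illustrative function name, shows the equivalence at the intrinsics level.)

#include <arm_neon.h>

/* For a 2-element vector, the pairwise max of the vector with itself puts
   max(a[0], a[1]) in both lanes, so the scalar max-reduction is just lane 0
   of the pairwise result.  */
int32_t
max_reduce_v2si (int32x2_t a)
{
  int32x2_t p = vpmax_s32 (a, a);     /* smaxp v.2s, v.2s, v.2s  */
  return vget_lane_s32 (p, 0);
}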
Tejas Belagod Jan. 16, 2015, 11:49 a.m. UTC | #3
On 14/01/15 07:09, Yangfei (Felix) wrote:
>> On 09/12/14 08:17, Yangfei (Felix) wrote:
>>>> On 28 November 2014 at 09:23, Yangfei (Felix) <felix.yang@huawei.com>
>> wrote:
>>>>> Hi,
>>>>>     This patch converts vpmaxX & vpminX intrinsics to use builtin
>>>>> functions
>>>> instead of the previous inline assembly syntax.
>>>>>     Regtested with aarch64-linux-gnu on QEMU.  Also passed the
>>>>> glorious
>>>> testsuite of Christophe Lyon.
>>>>>     OK for the trunk?
>>>>
>>>> Hi Felix,   We know from experience that the advsimd intrinsics tend
>>>> to be fragile for big endian and in general it is fairly easy to
>>>> break the big endian case.  For these advsimd improvements that you
>>>> are working on (that we very much appreciate) it is important to run
>>>> both little endian and big endian regressions.
>>>>
>>>> Thanks
>>>> /Marcus
>>>
>>>
>>> Okay.  Any plan for the advsimd big-endian improvement?
>>> I rebased this patch over Alan Lawrence's patch:
>>> https://gcc.gnu.org/ml/gcc-patches/2014-12/msg00279.html
>>> No regressions for the aarch64_be-linux-gnu target either.  OK for the trunk?
>>>
>>>
>>> Index: gcc/ChangeLog
>>>
>> =============================================================
>> ======
>>> --- gcc/ChangeLog       (revision 218464)
>>> +++ gcc/ChangeLog       (working copy)
>>> @@ -1,3 +1,18 @@
>>> +2014-12-09  Felix Yang  <felix.yang@huawei.com>
>>> +
>>> +       * config/aarch64/aarch64-simd.md
>> (aarch64_<maxmin_uns>p<mode>): New
>>> +       pattern.
>>> +       * config/aarch64/aarch64-simd-builtins.def (smaxp, sminp, umaxp,
>>> +       uminp, smax_nanp, smin_nanp): New builtins.
>>> +       * config/aarch64/arm_neon.h (vpmax_s8, vpmax_s16, vpmax_s32,
>>> +       vpmax_u8, vpmax_u16, vpmax_u32, vpmaxq_s8, vpmaxq_s16,
>> vpmaxq_s32,
>>> +       vpmaxq_u8, vpmaxq_u16, vpmaxq_u32, vpmax_f32, vpmaxq_f32,
>> vpmaxq_f64,
>>> +       vpmaxqd_f64, vpmaxs_f32, vpmaxnm_f32, vpmaxnmq_f32,
>> vpmaxnmq_f64,
>>> +       vpmaxnmqd_f64, vpmaxnms_f32, vpmin_s8, vpmin_s16, vpmin_s32,
>> vpmin_u8,
>>> +       vpmin_u16, vpmin_u32, vpminq_s8, vpminq_s16, vpminq_s32,
>> vpminq_u8,
>>> +       vpminq_u16, vpminq_u32, vpmin_f32, vpminq_f32, vpminq_f64,
>> vpminqd_f64,
>>> +       vpmins_f32, vpminnm_f32, vpminnmq_f32, vpminnmq_f64,
>>> + vpminnmqd_f64,
>>> +
>>
>>
>>>    __extension__ static __inline float32x2_t __attribute__
>>> ((__always_inline__))
>>> Index: gcc/config/aarch64/aarch64-simd.md
>>>
>> =============================================================
>> ======
>>> --- gcc/config/aarch64/aarch64-simd.md  (revision 218464)
>>> +++ gcc/config/aarch64/aarch64-simd.md  (working copy)
>>> @@ -1017,6 +1017,28 @@
>>>      DONE;
>>>    })
>>>
>>> +;; Pairwise Integer Max/Min operations.
>>> +(define_insn "aarch64_<maxmin_uns>p<mode>"
>>> + [(set (match_operand:VDQ_BHSI 0 "register_operand" "=w")
>>> +       (unspec:VDQ_BHSI [(match_operand:VDQ_BHSI 1
>> "register_operand" "w")
>>> +                        (match_operand:VDQ_BHSI 2 "register_operand"
>> "w")]
>>> +                       MAXMINV))]
>>> + "TARGET_SIMD"
>>> + "<maxmin_uns_op>p\t%0.<Vtype>, %1.<Vtype>, %2.<Vtype>"
>>> +  [(set_attr "type" "neon_minmax<q>")]
>>> +)
>>> +
>>
>> Hi Felix,
>>
>> Sorry for the delay in getting back to you on this.
>>
>> If you've rolled aarch64_reduc_<maxmin_uns>_internalv2si into the above
>> pattern, do you still need it? For all its call points, just point them to
>> aarch64_<maxmin_uns>p<mode>?
>>
>> Thanks,
>> Tejas.
>>
>
>
> Hello Tejas,
>
>    I didn't do this yet.
>    Currently the aarch64_reduc_<maxmin_uns>_internalv2si is only called by reduc_<maxmin_uns>_scal_<mode>.
>    I find it kind of troublesome to handle this due to the use of iterators in the caller pattern.
>    Are you going to rework this part?
>

You're right. Never mind. That restructuring, if we choose to do it, is 
another patch. This patch looks good (but I can't approve it).

Thanks,
Tejas.
Marcus Shawcroft Jan. 16, 2015, 12:35 p.m. UTC | #4
> +2014-12-09  Felix Yang  <felix.yang@huawei.com>
> +
> +       * config/aarch64/aarch64-simd.md (aarch64_<maxmin_uns>p<mode>): New
> +       pattern.
> +       * config/aarch64/aarch64-simd-builtins.def (smaxp, sminp, umaxp,
> +       uminp, smax_nanp, smin_nanp): New builtins.
> +       * config/aarch64/arm_neon.h (vpmax_s8, vpmax_s16, vpmax_s32,
> +       vpmax_u8, vpmax_u16, vpmax_u32, vpmaxq_s8, vpmaxq_s16, vpmaxq_s32,
> +       vpmaxq_u8, vpmaxq_u16, vpmaxq_u32, vpmax_f32, vpmaxq_f32, vpmaxq_f64,
> +       vpmaxqd_f64, vpmaxs_f32, vpmaxnm_f32, vpmaxnmq_f32, vpmaxnmq_f64,
> +       vpmaxnmqd_f64, vpmaxnms_f32, vpmin_s8, vpmin_s16, vpmin_s32, vpmin_u8,
> +       vpmin_u16, vpmin_u32, vpminq_s8, vpminq_s16, vpminq_s32, vpminq_u8,
> +       vpminq_u16, vpminq_u32, vpmin_f32, vpminq_f32, vpminq_f64, vpminqd_f64,
> +       vpmins_f32, vpminnm_f32, vpminnmq_f32, vpminnmq_f64, vpminnmqd_f64,
> +

OK, Thanks /Marcus

Patch

Index: gcc/ChangeLog
===================================================================
--- gcc/ChangeLog	(revision 218464)
+++ gcc/ChangeLog	(working copy)
@@ -1,3 +1,18 @@ 
+2014-12-09  Felix Yang  <felix.yang@huawei.com>
+
+	* config/aarch64/aarch64-simd.md (aarch64_<maxmin_uns>p<mode>): New
+	pattern.
+	* config/aarch64/aarch64-simd-builtins.def (smaxp, sminp, umaxp,
+	uminp, smax_nanp, smin_nanp): New builtins.
+	* config/aarch64/arm_neon.h (vpmax_s8, vpmax_s16, vpmax_s32,
+	vpmax_u8, vpmax_u16, vpmax_u32, vpmaxq_s8, vpmaxq_s16, vpmaxq_s32,
+	vpmaxq_u8, vpmaxq_u16, vpmaxq_u32, vpmax_f32, vpmaxq_f32, vpmaxq_f64,
+	vpmaxqd_f64, vpmaxs_f32, vpmaxnm_f32, vpmaxnmq_f32, vpmaxnmq_f64,
+	vpmaxnmqd_f64, vpmaxnms_f32, vpmin_s8, vpmin_s16, vpmin_s32, vpmin_u8,
+	vpmin_u16, vpmin_u32, vpminq_s8, vpminq_s16, vpminq_s32, vpminq_u8,
+	vpminq_u16, vpminq_u32, vpmin_f32, vpminq_f32, vpminq_f64, vpminqd_f64,
+	vpmins_f32, vpminnm_f32, vpminnmq_f32, vpminnmq_f64, vpminnmqd_f64,
+
 2014-12-07  Felix Yang  <felix.yang@huawei.com>
 	    Shanyao Chen  <chenshanyao@huawei.com>
 
Index: gcc/config/aarch64/arm_neon.h
===================================================================
--- gcc/config/aarch64/arm_neon.h	(revision 218464)
+++ gcc/config/aarch64/arm_neon.h	(working copy)
@@ -8843,491 +8843,7 @@  vpadds_f32 (float32x2_t a)
   return result;
 }
 
-__extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
-vpmax_f32 (float32x2_t a, float32x2_t b)
-{
-  float32x2_t result;
-  __asm__ ("fmaxp %0.2s, %1.2s, %2.2s"
-           : "=w"(result)
-           : "w"(a), "w"(b)
-           : /* No clobbers */);
-  return result;
-}
-
-__extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
-vpmax_s8 (int8x8_t a, int8x8_t b)
-{
-  int8x8_t result;
-  __asm__ ("smaxp %0.8b, %1.8b, %2.8b"
-           : "=w"(result)
-           : "w"(a), "w"(b)
-           : /* No clobbers */);
-  return result;
-}
-
 __extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
-vpmax_s16 (int16x4_t a, int16x4_t b)
-{
-  int16x4_t result;
-  __asm__ ("smaxp %0.4h, %1.4h, %2.4h"
-           : "=w"(result)
-           : "w"(a), "w"(b)
-           : /* No clobbers */);
-  return result;
-}
-
-__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
-vpmax_s32 (int32x2_t a, int32x2_t b)
-{
-  int32x2_t result;
-  __asm__ ("smaxp %0.2s, %1.2s, %2.2s"
-           : "=w"(result)
-           : "w"(a), "w"(b)
-           : /* No clobbers */);
-  return result;
-}
-
-__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
-vpmax_u8 (uint8x8_t a, uint8x8_t b)
-{
-  uint8x8_t result;
-  __asm__ ("umaxp %0.8b, %1.8b, %2.8b"
-           : "=w"(result)
-           : "w"(a), "w"(b)
-           : /* No clobbers */);
-  return result;
-}
-
-__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
-vpmax_u16 (uint16x4_t a, uint16x4_t b)
-{
-  uint16x4_t result;
-  __asm__ ("umaxp %0.4h, %1.4h, %2.4h"
-           : "=w"(result)
-           : "w"(a), "w"(b)
-           : /* No clobbers */);
-  return result;
-}
-
-__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
-vpmax_u32 (uint32x2_t a, uint32x2_t b)
-{
-  uint32x2_t result;
-  __asm__ ("umaxp %0.2s, %1.2s, %2.2s"
-           : "=w"(result)
-           : "w"(a), "w"(b)
-           : /* No clobbers */);
-  return result;
-}
-
-__extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
-vpmaxnm_f32 (float32x2_t a, float32x2_t b)
-{
-  float32x2_t result;
-  __asm__ ("fmaxnmp %0.2s,%1.2s,%2.2s"
-           : "=w"(result)
-           : "w"(a), "w"(b)
-           : /* No clobbers */);
-  return result;
-}
-
-__extension__ static __inline float32x4_t __attribute__ ((__always_inline__))
-vpmaxnmq_f32 (float32x4_t a, float32x4_t b)
-{
-  float32x4_t result;
-  __asm__ ("fmaxnmp %0.4s,%1.4s,%2.4s"
-           : "=w"(result)
-           : "w"(a), "w"(b)
-           : /* No clobbers */);
-  return result;
-}
-
-__extension__ static __inline float64x2_t __attribute__ ((__always_inline__))
-vpmaxnmq_f64 (float64x2_t a, float64x2_t b)
-{
-  float64x2_t result;
-  __asm__ ("fmaxnmp %0.2d,%1.2d,%2.2d"
-           : "=w"(result)
-           : "w"(a), "w"(b)
-           : /* No clobbers */);
-  return result;
-}
-
-__extension__ static __inline float64_t __attribute__ ((__always_inline__))
-vpmaxnmqd_f64 (float64x2_t a)
-{
-  float64_t result;
-  __asm__ ("fmaxnmp %d0,%1.2d"
-           : "=w"(result)
-           : "w"(a)
-           : /* No clobbers */);
-  return result;
-}
-
-__extension__ static __inline float32_t __attribute__ ((__always_inline__))
-vpmaxnms_f32 (float32x2_t a)
-{
-  float32_t result;
-  __asm__ ("fmaxnmp %s0,%1.2s"
-           : "=w"(result)
-           : "w"(a)
-           : /* No clobbers */);
-  return result;
-}
-
-__extension__ static __inline float32x4_t __attribute__ ((__always_inline__))
-vpmaxq_f32 (float32x4_t a, float32x4_t b)
-{
-  float32x4_t result;
-  __asm__ ("fmaxp %0.4s, %1.4s, %2.4s"
-           : "=w"(result)
-           : "w"(a), "w"(b)
-           : /* No clobbers */);
-  return result;
-}
-
-__extension__ static __inline float64x2_t __attribute__ ((__always_inline__))
-vpmaxq_f64 (float64x2_t a, float64x2_t b)
-{
-  float64x2_t result;
-  __asm__ ("fmaxp %0.2d, %1.2d, %2.2d"
-           : "=w"(result)
-           : "w"(a), "w"(b)
-           : /* No clobbers */);
-  return result;
-}
-
-__extension__ static __inline int8x16_t __attribute__ ((__always_inline__))
-vpmaxq_s8 (int8x16_t a, int8x16_t b)
-{
-  int8x16_t result;
-  __asm__ ("smaxp %0.16b, %1.16b, %2.16b"
-           : "=w"(result)
-           : "w"(a), "w"(b)
-           : /* No clobbers */);
-  return result;
-}
-
-__extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
-vpmaxq_s16 (int16x8_t a, int16x8_t b)
-{
-  int16x8_t result;
-  __asm__ ("smaxp %0.8h, %1.8h, %2.8h"
-           : "=w"(result)
-           : "w"(a), "w"(b)
-           : /* No clobbers */);
-  return result;
-}
-
-__extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
-vpmaxq_s32 (int32x4_t a, int32x4_t b)
-{
-  int32x4_t result;
-  __asm__ ("smaxp %0.4s, %1.4s, %2.4s"
-           : "=w"(result)
-           : "w"(a), "w"(b)
-           : /* No clobbers */);
-  return result;
-}
-
-__extension__ static __inline uint8x16_t __attribute__ ((__always_inline__))
-vpmaxq_u8 (uint8x16_t a, uint8x16_t b)
-{
-  uint8x16_t result;
-  __asm__ ("umaxp %0.16b, %1.16b, %2.16b"
-           : "=w"(result)
-           : "w"(a), "w"(b)
-           : /* No clobbers */);
-  return result;
-}
-
-__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
-vpmaxq_u16 (uint16x8_t a, uint16x8_t b)
-{
-  uint16x8_t result;
-  __asm__ ("umaxp %0.8h, %1.8h, %2.8h"
-           : "=w"(result)
-           : "w"(a), "w"(b)
-           : /* No clobbers */);
-  return result;
-}
-
-__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
-vpmaxq_u32 (uint32x4_t a, uint32x4_t b)
-{
-  uint32x4_t result;
-  __asm__ ("umaxp %0.4s, %1.4s, %2.4s"
-           : "=w"(result)
-           : "w"(a), "w"(b)
-           : /* No clobbers */);
-  return result;
-}
-
-__extension__ static __inline float64_t __attribute__ ((__always_inline__))
-vpmaxqd_f64 (float64x2_t a)
-{
-  float64_t result;
-  __asm__ ("fmaxp %d0,%1.2d"
-           : "=w"(result)
-           : "w"(a)
-           : /* No clobbers */);
-  return result;
-}
-
-__extension__ static __inline float32_t __attribute__ ((__always_inline__))
-vpmaxs_f32 (float32x2_t a)
-{
-  float32_t result;
-  __asm__ ("fmaxp %s0,%1.2s"
-           : "=w"(result)
-           : "w"(a)
-           : /* No clobbers */);
-  return result;
-}
-
-__extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
-vpmin_f32 (float32x2_t a, float32x2_t b)
-{
-  float32x2_t result;
-  __asm__ ("fminp %0.2s, %1.2s, %2.2s"
-           : "=w"(result)
-           : "w"(a), "w"(b)
-           : /* No clobbers */);
-  return result;
-}
-
-__extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
-vpmin_s8 (int8x8_t a, int8x8_t b)
-{
-  int8x8_t result;
-  __asm__ ("sminp %0.8b, %1.8b, %2.8b"
-           : "=w"(result)
-           : "w"(a), "w"(b)
-           : /* No clobbers */);
-  return result;
-}
-
-__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
-vpmin_s16 (int16x4_t a, int16x4_t b)
-{
-  int16x4_t result;
-  __asm__ ("sminp %0.4h, %1.4h, %2.4h"
-           : "=w"(result)
-           : "w"(a), "w"(b)
-           : /* No clobbers */);
-  return result;
-}
-
-__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
-vpmin_s32 (int32x2_t a, int32x2_t b)
-{
-  int32x2_t result;
-  __asm__ ("sminp %0.2s, %1.2s, %2.2s"
-           : "=w"(result)
-           : "w"(a), "w"(b)
-           : /* No clobbers */);
-  return result;
-}
-
-__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
-vpmin_u8 (uint8x8_t a, uint8x8_t b)
-{
-  uint8x8_t result;
-  __asm__ ("uminp %0.8b, %1.8b, %2.8b"
-           : "=w"(result)
-           : "w"(a), "w"(b)
-           : /* No clobbers */);
-  return result;
-}
-
-__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
-vpmin_u16 (uint16x4_t a, uint16x4_t b)
-{
-  uint16x4_t result;
-  __asm__ ("uminp %0.4h, %1.4h, %2.4h"
-           : "=w"(result)
-           : "w"(a), "w"(b)
-           : /* No clobbers */);
-  return result;
-}
-
-__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
-vpmin_u32 (uint32x2_t a, uint32x2_t b)
-{
-  uint32x2_t result;
-  __asm__ ("uminp %0.2s, %1.2s, %2.2s"
-           : "=w"(result)
-           : "w"(a), "w"(b)
-           : /* No clobbers */);
-  return result;
-}
-
-__extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
-vpminnm_f32 (float32x2_t a, float32x2_t b)
-{
-  float32x2_t result;
-  __asm__ ("fminnmp %0.2s,%1.2s,%2.2s"
-           : "=w"(result)
-           : "w"(a), "w"(b)
-           : /* No clobbers */);
-  return result;
-}
-
-__extension__ static __inline float32x4_t __attribute__ ((__always_inline__))
-vpminnmq_f32 (float32x4_t a, float32x4_t b)
-{
-  float32x4_t result;
-  __asm__ ("fminnmp %0.4s,%1.4s,%2.4s"
-           : "=w"(result)
-           : "w"(a), "w"(b)
-           : /* No clobbers */);
-  return result;
-}
-
-__extension__ static __inline float64x2_t __attribute__ ((__always_inline__))
-vpminnmq_f64 (float64x2_t a, float64x2_t b)
-{
-  float64x2_t result;
-  __asm__ ("fminnmp %0.2d,%1.2d,%2.2d"
-           : "=w"(result)
-           : "w"(a), "w"(b)
-           : /* No clobbers */);
-  return result;
-}
-
-__extension__ static __inline float64_t __attribute__ ((__always_inline__))
-vpminnmqd_f64 (float64x2_t a)
-{
-  float64_t result;
-  __asm__ ("fminnmp %d0,%1.2d"
-           : "=w"(result)
-           : "w"(a)
-           : /* No clobbers */);
-  return result;
-}
-
-__extension__ static __inline float32_t __attribute__ ((__always_inline__))
-vpminnms_f32 (float32x2_t a)
-{
-  float32_t result;
-  __asm__ ("fminnmp %s0,%1.2s"
-           : "=w"(result)
-           : "w"(a)
-           : /* No clobbers */);
-  return result;
-}
-
-__extension__ static __inline float32x4_t __attribute__ ((__always_inline__))
-vpminq_f32 (float32x4_t a, float32x4_t b)
-{
-  float32x4_t result;
-  __asm__ ("fminp %0.4s, %1.4s, %2.4s"
-           : "=w"(result)
-           : "w"(a), "w"(b)
-           : /* No clobbers */);
-  return result;
-}
-
-__extension__ static __inline float64x2_t __attribute__ ((__always_inline__))
-vpminq_f64 (float64x2_t a, float64x2_t b)
-{
-  float64x2_t result;
-  __asm__ ("fminp %0.2d, %1.2d, %2.2d"
-           : "=w"(result)
-           : "w"(a), "w"(b)
-           : /* No clobbers */);
-  return result;
-}
-
-__extension__ static __inline int8x16_t __attribute__ ((__always_inline__))
-vpminq_s8 (int8x16_t a, int8x16_t b)
-{
-  int8x16_t result;
-  __asm__ ("sminp %0.16b, %1.16b, %2.16b"
-           : "=w"(result)
-           : "w"(a), "w"(b)
-           : /* No clobbers */);
-  return result;
-}
-
-__extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
-vpminq_s16 (int16x8_t a, int16x8_t b)
-{
-  int16x8_t result;
-  __asm__ ("sminp %0.8h, %1.8h, %2.8h"
-           : "=w"(result)
-           : "w"(a), "w"(b)
-           : /* No clobbers */);
-  return result;
-}
-
-__extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
-vpminq_s32 (int32x4_t a, int32x4_t b)
-{
-  int32x4_t result;
-  __asm__ ("sminp %0.4s, %1.4s, %2.4s"
-           : "=w"(result)
-           : "w"(a), "w"(b)
-           : /* No clobbers */);
-  return result;
-}
-
-__extension__ static __inline uint8x16_t __attribute__ ((__always_inline__))
-vpminq_u8 (uint8x16_t a, uint8x16_t b)
-{
-  uint8x16_t result;
-  __asm__ ("uminp %0.16b, %1.16b, %2.16b"
-           : "=w"(result)
-           : "w"(a), "w"(b)
-           : /* No clobbers */);
-  return result;
-}
-
-__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
-vpminq_u16 (uint16x8_t a, uint16x8_t b)
-{
-  uint16x8_t result;
-  __asm__ ("uminp %0.8h, %1.8h, %2.8h"
-           : "=w"(result)
-           : "w"(a), "w"(b)
-           : /* No clobbers */);
-  return result;
-}
-
-__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
-vpminq_u32 (uint32x4_t a, uint32x4_t b)
-{
-  uint32x4_t result;
-  __asm__ ("uminp %0.4s, %1.4s, %2.4s"
-           : "=w"(result)
-           : "w"(a), "w"(b)
-           : /* No clobbers */);
-  return result;
-}
-
-__extension__ static __inline float64_t __attribute__ ((__always_inline__))
-vpminqd_f64 (float64x2_t a)
-{
-  float64_t result;
-  __asm__ ("fminp %d0,%1.2d"
-           : "=w"(result)
-           : "w"(a)
-           : /* No clobbers */);
-  return result;
-}
-
-__extension__ static __inline float32_t __attribute__ ((__always_inline__))
-vpmins_f32 (float32x2_t a)
-{
-  float32_t result;
-  __asm__ ("fminp %s0,%1.2s"
-           : "=w"(result)
-           : "w"(a)
-           : /* No clobbers */);
-  return result;
-}
-
-__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
 vqdmulh_n_s16 (int16x4_t a, int16_t b)
 {
   int16x4_t result;
@@ -18205,6 +17721,290 @@  vmaxq_u32 (uint32x4_t __a, uint32x4_t __b)
 						  (int32x4_t) __b);
 }
 
+/* vpmax  */
+
+__extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
+vpmax_s8 (int8x8_t a, int8x8_t b)
+{
+  return __builtin_aarch64_smaxpv8qi (a, b);
+}
+
+__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+vpmax_s16 (int16x4_t a, int16x4_t b)
+{
+  return __builtin_aarch64_smaxpv4hi (a, b);
+}
+
+__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
+vpmax_s32 (int32x2_t a, int32x2_t b)
+{
+  return __builtin_aarch64_smaxpv2si (a, b);
+}
+
+__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
+vpmax_u8 (uint8x8_t a, uint8x8_t b)
+{
+  return (uint8x8_t) __builtin_aarch64_umaxpv8qi ((int8x8_t) a,
+						  (int8x8_t) b);
+}
+
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+vpmax_u16 (uint16x4_t a, uint16x4_t b)
+{
+  return (uint16x4_t) __builtin_aarch64_umaxpv4hi ((int16x4_t) a,
+						   (int16x4_t) b);
+}
+
+__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
+vpmax_u32 (uint32x2_t a, uint32x2_t b)
+{
+  return (uint32x2_t) __builtin_aarch64_umaxpv2si ((int32x2_t) a,
+						   (int32x2_t) b);
+}
+
+__extension__ static __inline int8x16_t __attribute__ ((__always_inline__))
+vpmaxq_s8 (int8x16_t a, int8x16_t b)
+{
+  return __builtin_aarch64_smaxpv16qi (a, b);
+}
+
+__extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
+vpmaxq_s16 (int16x8_t a, int16x8_t b)
+{
+  return __builtin_aarch64_smaxpv8hi (a, b);
+}
+
+__extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
+vpmaxq_s32 (int32x4_t a, int32x4_t b)
+{
+  return __builtin_aarch64_smaxpv4si (a, b);
+}
+
+__extension__ static __inline uint8x16_t __attribute__ ((__always_inline__))
+vpmaxq_u8 (uint8x16_t a, uint8x16_t b)
+{
+  return (uint8x16_t) __builtin_aarch64_umaxpv16qi ((int8x16_t) a,
+						    (int8x16_t) b);
+}
+
+__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
+vpmaxq_u16 (uint16x8_t a, uint16x8_t b)
+{
+  return (uint16x8_t) __builtin_aarch64_umaxpv8hi ((int16x8_t) a,
+						   (int16x8_t) b);
+}
+
+__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
+vpmaxq_u32 (uint32x4_t a, uint32x4_t b)
+{
+  return (uint32x4_t) __builtin_aarch64_umaxpv4si ((int32x4_t) a,
+						   (int32x4_t) b);
+}
+
+__extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
+vpmax_f32 (float32x2_t a, float32x2_t b)
+{
+  return __builtin_aarch64_smax_nanpv2sf (a, b);
+}
+
+__extension__ static __inline float32x4_t __attribute__ ((__always_inline__))
+vpmaxq_f32 (float32x4_t a, float32x4_t b)
+{
+  return __builtin_aarch64_smax_nanpv4sf (a, b);
+}
+
+__extension__ static __inline float64x2_t __attribute__ ((__always_inline__))
+vpmaxq_f64 (float64x2_t a, float64x2_t b)
+{
+  return __builtin_aarch64_smax_nanpv2df (a, b);
+}
+
+__extension__ static __inline float64_t __attribute__ ((__always_inline__))
+vpmaxqd_f64 (float64x2_t a)
+{
+  return __builtin_aarch64_reduc_smax_nan_scal_v2df (a);
+}
+
+__extension__ static __inline float32_t __attribute__ ((__always_inline__))
+vpmaxs_f32 (float32x2_t a)
+{
+  return __builtin_aarch64_reduc_smax_nan_scal_v2sf (a);
+}
+
+/* vpmaxnm  */
+
+__extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
+vpmaxnm_f32 (float32x2_t a, float32x2_t b)
+{
+  return __builtin_aarch64_smaxpv2sf (a, b);
+}
+
+__extension__ static __inline float32x4_t __attribute__ ((__always_inline__))
+vpmaxnmq_f32 (float32x4_t a, float32x4_t b)
+{
+  return __builtin_aarch64_smaxpv4sf (a, b);
+}
+
+__extension__ static __inline float64x2_t __attribute__ ((__always_inline__))
+vpmaxnmq_f64 (float64x2_t a, float64x2_t b)
+{
+  return __builtin_aarch64_smaxpv2df (a, b);
+}
+
+__extension__ static __inline float64_t __attribute__ ((__always_inline__))
+vpmaxnmqd_f64 (float64x2_t a)
+{
+  return __builtin_aarch64_reduc_smax_scal_v2df (a);
+}
+
+__extension__ static __inline float32_t __attribute__ ((__always_inline__))
+vpmaxnms_f32 (float32x2_t a)
+{
+  return __builtin_aarch64_reduc_smax_scal_v2sf (a);
+}
+
+/* vpmin  */
+
+__extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
+vpmin_s8 (int8x8_t a, int8x8_t b)
+{
+  return __builtin_aarch64_sminpv8qi (a, b);
+}
+
+__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+vpmin_s16 (int16x4_t a, int16x4_t b)
+{
+  return __builtin_aarch64_sminpv4hi (a, b);
+}
+
+__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
+vpmin_s32 (int32x2_t a, int32x2_t b)
+{
+  return __builtin_aarch64_sminpv2si (a, b);
+}
+
+__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
+vpmin_u8 (uint8x8_t a, uint8x8_t b)
+{
+  return (uint8x8_t) __builtin_aarch64_uminpv8qi ((int8x8_t) a,
+						  (int8x8_t) b);
+}
+
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+vpmin_u16 (uint16x4_t a, uint16x4_t b)
+{
+  return (uint16x4_t) __builtin_aarch64_uminpv4hi ((int16x4_t) a,
+						   (int16x4_t) b);
+}
+
+__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
+vpmin_u32 (uint32x2_t a, uint32x2_t b)
+{
+  return (uint32x2_t) __builtin_aarch64_uminpv2si ((int32x2_t) a,
+						   (int32x2_t) b);
+}
+
+__extension__ static __inline int8x16_t __attribute__ ((__always_inline__))
+vpminq_s8 (int8x16_t a, int8x16_t b)
+{
+  return __builtin_aarch64_sminpv16qi (a, b);
+}
+
+__extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
+vpminq_s16 (int16x8_t a, int16x8_t b)
+{
+  return __builtin_aarch64_sminpv8hi (a, b);
+}
+
+__extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
+vpminq_s32 (int32x4_t a, int32x4_t b)
+{
+  return __builtin_aarch64_sminpv4si (a, b);
+}
+
+__extension__ static __inline uint8x16_t __attribute__ ((__always_inline__))
+vpminq_u8 (uint8x16_t a, uint8x16_t b)
+{
+  return (uint8x16_t) __builtin_aarch64_uminpv16qi ((int8x16_t) a,
+						    (int8x16_t) b);
+}
+
+__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
+vpminq_u16 (uint16x8_t a, uint16x8_t b)
+{
+  return (uint16x8_t) __builtin_aarch64_uminpv8hi ((int16x8_t) a,
+						   (int16x8_t) b);
+}
+
+__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
+vpminq_u32 (uint32x4_t a, uint32x4_t b)
+{
+  return (uint32x4_t) __builtin_aarch64_uminpv4si ((int32x4_t) a,
+						   (int32x4_t) b);
+}
+
+__extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
+vpmin_f32 (float32x2_t a, float32x2_t b)
+{
+  return __builtin_aarch64_smin_nanpv2sf (a, b);
+}
+
+__extension__ static __inline float32x4_t __attribute__ ((__always_inline__))
+vpminq_f32 (float32x4_t a, float32x4_t b)
+{
+  return __builtin_aarch64_smin_nanpv4sf (a, b);
+}
+
+__extension__ static __inline float64x2_t __attribute__ ((__always_inline__))
+vpminq_f64 (float64x2_t a, float64x2_t b)
+{
+  return __builtin_aarch64_smin_nanpv2df (a, b);
+}
+
+__extension__ static __inline float64_t __attribute__ ((__always_inline__))
+vpminqd_f64 (float64x2_t a)
+{
+  return __builtin_aarch64_reduc_smin_nan_scal_v2df (a);
+}
+
+__extension__ static __inline float32_t __attribute__ ((__always_inline__))
+vpmins_f32 (float32x2_t a)
+{
+  return __builtin_aarch64_reduc_smin_nan_scal_v2sf (a);
+}
+
+/* vpminnm  */
+
+__extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
+vpminnm_f32 (float32x2_t a, float32x2_t b)
+{
+  return __builtin_aarch64_sminpv2sf (a, b);
+}
+
+__extension__ static __inline float32x4_t __attribute__ ((__always_inline__))
+vpminnmq_f32 (float32x4_t a, float32x4_t b)
+{
+  return __builtin_aarch64_sminpv4sf (a, b);
+}
+
+__extension__ static __inline float64x2_t __attribute__ ((__always_inline__))
+vpminnmq_f64 (float64x2_t a, float64x2_t b)
+{
+  return __builtin_aarch64_sminpv2df (a, b);
+}
+
+__extension__ static __inline float64_t __attribute__ ((__always_inline__))
+vpminnmqd_f64 (float64x2_t a)
+{
+  return __builtin_aarch64_reduc_smin_scal_v2df (a);
+}
+
+__extension__ static __inline float32_t __attribute__ ((__always_inline__))
+vpminnms_f32 (float32x2_t a)
+{
+  return __builtin_aarch64_reduc_smin_scal_v2sf (a);
+}
+
 /* vmaxnm  */
 
 __extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
Index: gcc/config/aarch64/aarch64-simd.md
===================================================================
--- gcc/config/aarch64/aarch64-simd.md	(revision 218464)
+++ gcc/config/aarch64/aarch64-simd.md	(working copy)
@@ -1017,6 +1017,28 @@ 
   DONE;
 })
 
+;; Pairwise Integer Max/Min operations.
+(define_insn "aarch64_<maxmin_uns>p<mode>"
+ [(set (match_operand:VDQ_BHSI 0 "register_operand" "=w")
+       (unspec:VDQ_BHSI [(match_operand:VDQ_BHSI 1 "register_operand" "w")
+			 (match_operand:VDQ_BHSI 2 "register_operand" "w")]
+			MAXMINV))]
+ "TARGET_SIMD"
+ "<maxmin_uns_op>p\t%0.<Vtype>, %1.<Vtype>, %2.<Vtype>"
+  [(set_attr "type" "neon_minmax<q>")]
+)
+
+;; Pairwise FP Max/Min operations.
+(define_insn "aarch64_<maxmin_uns>p<mode>"
+ [(set (match_operand:VDQF 0 "register_operand" "=w")
+       (unspec:VDQF [(match_operand:VDQF 1 "register_operand" "w")
+		     (match_operand:VDQF 2 "register_operand" "w")]
+		    FMAXMINV))]
+ "TARGET_SIMD"
+ "<maxmin_uns_op>p\t%0.<Vtype>, %1.<Vtype>, %2.<Vtype>"
+  [(set_attr "type" "neon_minmax<q>")]
+)
+
 ;; vec_concat gives a new vector with the low elements from operand 1, and
 ;; the high elements from operand 2.  That is to say, given op1 = { a, b }
 ;; op2 = { c, d }, vec_concat (op1, op2) = { a, b, c, d }.
Index: gcc/config/aarch64/aarch64-simd-builtins.def
===================================================================
--- gcc/config/aarch64/aarch64-simd-builtins.def	(revision 218464)
+++ gcc/config/aarch64/aarch64-simd-builtins.def	(working copy)
@@ -247,6 +247,16 @@ 
   BUILTIN_VDQF (BINOP, smax_nan, 3)
   BUILTIN_VDQF (BINOP, smin_nan, 3)
 
+  /* Implemented by aarch64_<maxmin_uns>p<mode>.  */
+  BUILTIN_VDQ_BHSI (BINOP, smaxp, 0)
+  BUILTIN_VDQ_BHSI (BINOP, sminp, 0)
+  BUILTIN_VDQ_BHSI (BINOP, umaxp, 0)
+  BUILTIN_VDQ_BHSI (BINOP, uminp, 0)
+  BUILTIN_VDQF (BINOP, smaxp, 0)
+  BUILTIN_VDQF (BINOP, sminp, 0)
+  BUILTIN_VDQF (BINOP, smax_nanp, 0)
+  BUILTIN_VDQF (BINOP, smin_nanp, 0)
+
   /* Implemented by <frint_pattern><mode>2.  */
   BUILTIN_VDQF (UNOP, btrunc, 2)
   BUILTIN_VDQF (UNOP, ceil, 2)
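
To see how the new lowering could be exercised in the testsuite on both
endiannesses (an illustrative sketch under assumed dejagnu conventions, not
a test included in this patch), something along these lines would check that
the builtin-based intrinsics still emit the pairwise instructions:

/* { dg-do compile } */
/* { dg-options "-O2" } */

#include <arm_neon.h>

float32x2_t
test_fmaxp (float32x2_t a, float32x2_t b)
{
  return vpmax_f32 (a, b);      /* expect: fmaxp v0.2s, v0.2s, v1.2s  */
}

uint16x8_t
test_uminp (uint16x8_t a, uint16x8_t b)
{
  return vpminq_u16 (a, b);     /* expect: uminp v0.8h, v0.8h, v1.8h  */
}

/* { dg-final { scan-assembler "fmaxp\tv\[0-9\]+\.2s" } } */
/* { dg-final { scan-assembler "uminp\tv\[0-9\]+\.8h" } } */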