diff mbox series

[target/92035] Add missing avx512f intrinsics

Message ID CAMZc-bwZimda+7gaKer4q0Aj+UMKbfqs7a_6jtxd6Nc1aNvzoA@mail.gmail.com
State New
Headers show
Series [target/92035] Add missing avx512f intrinsics | expand

Commit Message

Hongtao Liu Oct. 12, 2019, 7:30 a.m. UTC
Hi:
  This patch is enabling missing avx512f intrinsics listed as

_mm_mask_roundscale_sd
_mm_mask_roundscale_round_sd
_mm_maskz_roundscale_sd
_mm_maskz_roundscale_round_sd
_mm_mask_roundscale_ss
_mm_mask_roundscale_round_ss
_mm_maskz_roundscale_ss
_mm_maskz_roundscale_round_ss

  Bootstrap ok, regression tests for i386/x86 ok.

ChangeLog

gcc/
        * config/i386/avx512fintrin.h (_mm_mask_roundscale_ss,
        _mm_maskz_roundscale_ss, _mm_maskz_roundscale_round_ss,
        _mm_maskz_roundscale_round_ss, _mm_mask_roundscale_sd,
        _mm_maskz_roundscale_sd, _mm_mask_roundscale_round_sd,
        _mm_maskz_roundscale_round_sd): New intrinsics.
        (_mm_roundscale_ss, _mm_roundscale_round_ss): Fix.
        * config/i386/i386-builtin.def (__builtin_ia32_rndscaless_round,
        __builtin_ia32_rndscalesd_round): Remove.
        (__builtin_ia32_rndscalesd_mask_round,
        __builtin_ia32_rndscalesd_mask_round): New intrinsics.
        * config/i386/sse.md
(avx512f_rndscale<mode><round_saeonly_name>): Renamed to ...
        (avx512f_rndscale<mode><mask_scalar_name><round_saeonly_scalar_name>):
... this.
        ((match_operand:VF_128 2 "<round_saeonly_nimm_predicate>"
        "<round_saeonly_constraint>")): Changed to ...
        ((match_operand:VF_128 2 "<round_saeonly_scalar_nimm_predicate>"
        "<round_saeonly_scalar_constraint>")): ... this.
        ("vrndscale<ssescalarmodesuffix>\t{%3, <round_saeonly_op4>%2, %1,
        %0|%0, %1, %<iptr>2<round_saeonly_op4>, %3}"): Changed to ...
        ("vrndscale<ssescalarmodesuffix>\t{%3,<round_saeonly_scalar_mask_op4>%2,
%1,
        %0<mask_scalar_operand4>|%0<mask_scalar_operand4>, %1,
        %<iptr>2<round_saeonly_scalar_mask_op4>, %3}"): ... this.

gcc/testsuite/
        * gcc.target/i386/avx512f-vrndscaless-1.c (_mm_mask_roundscale_ss,
        _mm_maskz_roundscale_ss, _mm_maskz_roundscale_round_ss,
        _mm_maskz_roundscale_round_ss): Test new intrinsics.
        * gcc.target/i386/avx512f-vrndscaless-2.c (_mm_mask_roundscale_ss,
        _mm_maskz_roundscale_ss, _mm_maskz_roundscale_round_ss,
        _mm_maskz_roundscale_round_ss): Test new intrinsics.
        * gcc.target/i386/avx512f-vrndscalesd-1.c (_mm_mask_roundscale_sd,
        _mm_maskz_roundscale_sd, _mm_maskz_roundscale_round_sd,
        _mm_maskz_roundscale_round_sd): Test new intrinsics.
        * gcc.target/i386/avx512f-vrndscalesd-2.c (_mm_mask_roundscale_sd,
        _mm_maskz_roundscale_sd, _mm_maskz_roundscale_round_sd,
        _mm_maskz_roundscale_round_sd): Test new intrinsics.
        * gcc.target/i386/avx-1.c (__builtin_ia32_rndscalefss_round,
        __builtin_ia32_rndscalefsd_round): Remove builtin.
        (__builtin_ia32_rndscalefss_mask_round,
        __builtin_ia32_rndscalefsd_mask_round): Test new builtin.
        * gcc.target/i386/sse-13.c: Ditto.
        * gcc.target/i386/sse-23.c: Ditto.

Comments

Jakub Jelinek Oct. 12, 2019, 8:15 a.m. UTC | #1
Hi!

> gcc/
> 	* config/i386/avx512fintrin.h (_mm_mask_roundscale_ss,
> 	_mm_maskz_roundscale_ss, _mm_maskz_roundscale_round_ss,
> 	_mm_maskz_roundscale_round_ss, _mm_mask_roundscale_sd,
> 	_mm_maskz_roundscale_sd, _mm_mask_roundscale_round_sd,
> 	_mm_maskz_roundscale_round_sd): New intrinsics.
> 	(_mm_roundscale_ss, _mm_roundscale_round_ss): Fix.

"Fix." doesn't describe the change you've done.  So I think it should be
instead:
"Use __builtin_ia32_rndscales?_mask_round builtins instead of
__builtin_ia32_rndscales?_round."

> 	* config/i386/i386-builtin.def (__builtin_ia32_rndscaless_round,
> 	__builtin_ia32_rndscalesd_round): Remove.
> 	(__builtin_ia32_rndscalesd_mask_round,

Pasto, sd listed twice, ss not listed, change the first one to ss.

> 	__builtin_ia32_rndscalesd_mask_round): New intrinsics.
> 	* config/i386/sse.md (avx512f_rndscale<mode><round_saeonly_name>): Renamed to ...
> 	(avx512f_rndscale<mode><mask_scalar_name><round_saeonly_scalar_name>): ... this.

These two lines are too long.  Perhaps:
	* config/i386/sse.md
	(avx512f_rndscale<mode><round_saeonly_name>): Renamed to ...
	(avx512f_rndscale<mode><mask_scalar_name><round_saeonly_scalar_name>):
	... this.

> 	((match_operand:VF_128 2 "<round_saeonly_nimm_predicate>"
> 	"<round_saeonly_constraint>")): Changed to ...
> 	((match_operand:VF_128 2 "<round_saeonly_scalar_nimm_predicate>"
> 	"<round_saeonly_scalar_constraint>")): ... this.
> 	("vrndscale<ssescalarmodesuffix>\t{%3, <round_saeonly_op4>%2, %1,
> 	%0|%0, %1, %<iptr>2<round_saeonly_op4>, %3}"): Changed to ...
> 	("vrndscale<ssescalarmodesuffix>\t{%3,<round_saeonly_scalar_mask_op4>%2, %1,
> 	%0<mask_scalar_operand4>|%0<mask_scalar_operand4>, %1,
> 	%<iptr>2<round_saeonly_scalar_mask_op4>, %3}"): ... this.

But the above is not appropriate, the ChangeLog in *.md is at the level of
define_{insn,expand,split,peephole2,insn_and_split} etc., not at the level
of individual subrtls or patterns.
So, the right thing is just to ammend the "... this.", follow it up by a
sentence what also changed.  Like " Adjust and add subst attributes to make
it maskable."

> 
> gcc/testsuite/
> 	* gcc.target/i386/avx512f-vrndscaless-1.c (_mm_mask_roundscale_ss,
> 	_mm_maskz_roundscale_ss, _mm_maskz_roundscale_round_ss,
> 	_mm_maskz_roundscale_round_ss): Test new intrinsics.

That is again not what you've changed.  For tests, often the exact
spots aren't listed and one just uses *.c: Description. but if you want to
use details, you can e.g.
	* gcc.target/i386/avx512f-vrndscaless-1.c: Add scan-assembler-times
	directives for newly expected instructions.
	(m): New variable.
	(avx512f_test): Add tests for new intrinsics.

> 	* gcc.target/i386/avx512f-vrndscaless-2.c (_mm_mask_roundscale_ss,
> 	_mm_maskz_roundscale_ss, _mm_maskz_roundscale_round_ss,
> 	_mm_maskz_roundscale_round_ss): Test new intrinsics.
> 	* gcc.target/i386/avx512f-vrndscalesd-1.c (_mm_mask_roundscale_sd,
> 	_mm_maskz_roundscale_sd, _mm_maskz_roundscale_round_sd,
> 	_mm_maskz_roundscale_round_sd): Test new intrinsics.
> 	* gcc.target/i386/avx512f-vrndscalesd-2.c (_mm_mask_roundscale_sd,
> 	_mm_maskz_roundscale_sd, _mm_maskz_roundscale_round_sd,
> 	_mm_maskz_roundscale_round_sd): Test new intrinsics.

Likewise.

> 	* gcc.target/i386/avx-1.c (__builtin_ia32_rndscalefss_round,
> 	__builtin_ia32_rndscalefsd_round): Remove builtin.
> 	(__builtin_ia32_rndscalefss_mask_round,
> 	__builtin_ia32_rndscalefsd_mask_round): Test new builtin.

That is not what you've changed.  You are there not Removing a builtin
and testing a new builtin, but removing a macro and defining a new macro.
So I think
: Remove.
: Define.
is more appropriate.

> +#define _mm_roundscale_round_ss(A, B, I, R)					\
> +  ((__m128) __builtin_ia32_rndscaless_mask_round ((__v4sf)(__m128)(A),	\
> +						  (__v4sf)(__m128)(B),	\
> +						  (int)(I),		\
> +						  (__v4sf)_mm_setzero_ps(),\

There should be a space after _mm_setzero_p[sd] .
I know the formatting of many macros is just wrong, but no need to add more
of that.

Ok for trunk with those nits fixed.

In fact, it would be probably cleaner to:
  ((__m128)								\
   __builtin_ia32_rndscaless_mask_round ((__v4sf) (__m128) (A),		\
					 (__v4sf) (__m128) (B),		\
					 (int) (I),			\
					 (__v4sf) _mm_setzero_ps (),	\
					 (__mmask8) (-1),		\
					 (int) (R))
or so, because then one has for long builtin names more space.  But
I'm not asking for that to be changed.

	Jakub
Hongtao Liu Oct. 12, 2019, 9:36 a.m. UTC | #2
On Sat, Oct 12, 2019 at 4:15 PM Jakub Jelinek <jakub@redhat.com> wrote:
>
> Hi!
>
> > gcc/
> >       * config/i386/avx512fintrin.h (_mm_mask_roundscale_ss,
> >       _mm_maskz_roundscale_ss, _mm_maskz_roundscale_round_ss,
> >       _mm_maskz_roundscale_round_ss, _mm_mask_roundscale_sd,
> >       _mm_maskz_roundscale_sd, _mm_mask_roundscale_round_sd,
> >       _mm_maskz_roundscale_round_sd): New intrinsics.
> >       (_mm_roundscale_ss, _mm_roundscale_round_ss): Fix.
>
> "Fix." doesn't describe the change you've done.  So I think it should be
> instead:
> "Use __builtin_ia32_rndscales?_mask_round builtins instead of
> __builtin_ia32_rndscales?_round."
>
> >       * config/i386/i386-builtin.def (__builtin_ia32_rndscaless_round,
> >       __builtin_ia32_rndscalesd_round): Remove.
> >       (__builtin_ia32_rndscalesd_mask_round,
>
> Pasto, sd listed twice, ss not listed, change the first one to ss.
>
> >       __builtin_ia32_rndscalesd_mask_round): New intrinsics.
> >       * config/i386/sse.md (avx512f_rndscale<mode><round_saeonly_name>): Renamed to ...
> >       (avx512f_rndscale<mode><mask_scalar_name><round_saeonly_scalar_name>): ... this.
>
> These two lines are too long.  Perhaps:
>         * config/i386/sse.md
>         (avx512f_rndscale<mode><round_saeonly_name>): Renamed to ...
>         (avx512f_rndscale<mode><mask_scalar_name><round_saeonly_scalar_name>):
>         ... this.
>
> >       ((match_operand:VF_128 2 "<round_saeonly_nimm_predicate>"
> >       "<round_saeonly_constraint>")): Changed to ...
> >       ((match_operand:VF_128 2 "<round_saeonly_scalar_nimm_predicate>"
> >       "<round_saeonly_scalar_constraint>")): ... this.
> >       ("vrndscale<ssescalarmodesuffix>\t{%3, <round_saeonly_op4>%2, %1,
> >       %0|%0, %1, %<iptr>2<round_saeonly_op4>, %3}"): Changed to ...
> >       ("vrndscale<ssescalarmodesuffix>\t{%3,<round_saeonly_scalar_mask_op4>%2, %1,
> >       %0<mask_scalar_operand4>|%0<mask_scalar_operand4>, %1,
> >       %<iptr>2<round_saeonly_scalar_mask_op4>, %3}"): ... this.
>
> But the above is not appropriate, the ChangeLog in *.md is at the level of
> define_{insn,expand,split,peephole2,insn_and_split} etc., not at the level
> of individual subrtls or patterns.
> So, the right thing is just to ammend the "... this.", follow it up by a
> sentence what also changed.  Like " Adjust and add subst attributes to make
> it maskable."
>
> >
> > gcc/testsuite/
> >       * gcc.target/i386/avx512f-vrndscaless-1.c (_mm_mask_roundscale_ss,
> >       _mm_maskz_roundscale_ss, _mm_maskz_roundscale_round_ss,
> >       _mm_maskz_roundscale_round_ss): Test new intrinsics.
>
> That is again not what you've changed.  For tests, often the exact
> spots aren't listed and one just uses *.c: Description. but if you want to
> use details, you can e.g.
>         * gcc.target/i386/avx512f-vrndscaless-1.c: Add scan-assembler-times
>         directives for newly expected instructions.
>         (m): New variable.
>         (avx512f_test): Add tests for new intrinsics.
>
> >       * gcc.target/i386/avx512f-vrndscaless-2.c (_mm_mask_roundscale_ss,
> >       _mm_maskz_roundscale_ss, _mm_maskz_roundscale_round_ss,
> >       _mm_maskz_roundscale_round_ss): Test new intrinsics.
> >       * gcc.target/i386/avx512f-vrndscalesd-1.c (_mm_mask_roundscale_sd,
> >       _mm_maskz_roundscale_sd, _mm_maskz_roundscale_round_sd,
> >       _mm_maskz_roundscale_round_sd): Test new intrinsics.
> >       * gcc.target/i386/avx512f-vrndscalesd-2.c (_mm_mask_roundscale_sd,
> >       _mm_maskz_roundscale_sd, _mm_maskz_roundscale_round_sd,
> >       _mm_maskz_roundscale_round_sd): Test new intrinsics.
>
> Likewise.
>
> >       * gcc.target/i386/avx-1.c (__builtin_ia32_rndscalefss_round,
> >       __builtin_ia32_rndscalefsd_round): Remove builtin.
> >       (__builtin_ia32_rndscalefss_mask_round,
> >       __builtin_ia32_rndscalefsd_mask_round): Test new builtin.
>
> That is not what you've changed.  You are there not Removing a builtin
> and testing a new builtin, but removing a macro and defining a new macro.
> So I think
> : Remove.
> : Define.
> is more appropriate.
>
> > +#define _mm_roundscale_round_ss(A, B, I, R)                                  \
> > +  ((__m128) __builtin_ia32_rndscaless_mask_round ((__v4sf)(__m128)(A),       \
> > +                                               (__v4sf)(__m128)(B),  \
> > +                                               (int)(I),             \
> > +                                               (__v4sf)_mm_setzero_ps(),\
>
> There should be a space after _mm_setzero_p[sd] .
> I know the formatting of many macros is just wrong, but no need to add more
> of that.
>
> Ok for trunk with those nits fixed.
>
> In fact, it would be probably cleaner to:
>   ((__m128)                                                             \
>    __builtin_ia32_rndscaless_mask_round ((__v4sf) (__m128) (A),         \
>                                          (__v4sf) (__m128) (B),         \
>                                          (int) (I),                     \
>                                          (__v4sf) _mm_setzero_ps (),    \
>                                          (__mmask8) (-1),               \
>                                          (int) (R))
> or so, because then one has for long builtin names more space.  But
> I'm not asking for that to be changed.
>
>         Jakub

Thanks for your review, that helps a lot.
diff mbox series

Patch

From 39a2547f73c63493d502384c45b38b3dc54005c8 Mon Sep 17 00:00:00 2001
From: "Wang, Hongyu" <hongyu.wang@intel.com>
Date: Sat, 12 Oct 2019 00:07:01 -0700
Subject: [PATCH] PR target/92035

Add missing mask[z]_roundscale_[round]_s[d,s] intrinsics

gcc/
	* config/i386/avx512fintrin.h (_mm_mask_roundscale_ss,
	_mm_maskz_roundscale_ss, _mm_maskz_roundscale_round_ss,
	_mm_maskz_roundscale_round_ss, _mm_mask_roundscale_sd,
	_mm_maskz_roundscale_sd, _mm_mask_roundscale_round_sd,
	_mm_maskz_roundscale_round_sd): New intrinsics.
	(_mm_roundscale_ss, _mm_roundscale_round_ss): Fix.
	* config/i386/i386-builtin.def (__builtin_ia32_rndscaless_round,
	__builtin_ia32_rndscalesd_round): Remove.
	(__builtin_ia32_rndscalesd_mask_round,
	__builtin_ia32_rndscalesd_mask_round): New intrinsics.
	* config/i386/sse.md (avx512f_rndscale<mode><round_saeonly_name>): Renamed to ...
	(avx512f_rndscale<mode><mask_scalar_name><round_saeonly_scalar_name>): ... this.
	((match_operand:VF_128 2 "<round_saeonly_nimm_predicate>"
	"<round_saeonly_constraint>")): Changed to ...
	((match_operand:VF_128 2 "<round_saeonly_scalar_nimm_predicate>"
	"<round_saeonly_scalar_constraint>")): ... this.
	("vrndscale<ssescalarmodesuffix>\t{%3, <round_saeonly_op4>%2, %1,
	%0|%0, %1, %<iptr>2<round_saeonly_op4>, %3}"): Changed to ...
	("vrndscale<ssescalarmodesuffix>\t{%3,<round_saeonly_scalar_mask_op4>%2, %1,
	%0<mask_scalar_operand4>|%0<mask_scalar_operand4>, %1,
	%<iptr>2<round_saeonly_scalar_mask_op4>, %3}"): ... this.

gcc/testsuite/
	* gcc.target/i386/avx512f-vrndscaless-1.c (_mm_mask_roundscale_ss,
	_mm_maskz_roundscale_ss, _mm_maskz_roundscale_round_ss,
	_mm_maskz_roundscale_round_ss): Test new intrinsics.
	* gcc.target/i386/avx512f-vrndscaless-2.c (_mm_mask_roundscale_ss,
	_mm_maskz_roundscale_ss, _mm_maskz_roundscale_round_ss,
	_mm_maskz_roundscale_round_ss): Test new intrinsics.
	* gcc.target/i386/avx512f-vrndscalesd-1.c (_mm_mask_roundscale_sd,
	_mm_maskz_roundscale_sd, _mm_maskz_roundscale_round_sd,
	_mm_maskz_roundscale_round_sd): Test new intrinsics.
	* gcc.target/i386/avx512f-vrndscalesd-2.c (_mm_mask_roundscale_sd,
	_mm_maskz_roundscale_sd, _mm_maskz_roundscale_round_sd,
	_mm_maskz_roundscale_round_sd): Test new intrinsics.
	* gcc.target/i386/avx-1.c (__builtin_ia32_rndscalefss_round,
	__builtin_ia32_rndscalefsd_round): Remove builtin.
	(__builtin_ia32_rndscalefss_mask_round,
	__builtin_ia32_rndscalefsd_mask_round): Test new builtin.
	* gcc.target/i386/sse-13.c: Ditto.
	* gcc.target/i386/sse-23.c: Ditto.
---
 gcc/config/i386/avx512fintrin.h               | 234 ++++++++++++++++--
 gcc/config/i386/i386-builtin.def              |   4 +-
 gcc/config/i386/sse.md                        |   9 +-
 gcc/testsuite/gcc.target/i386/avx-1.c         |   4 +-
 .../gcc.target/i386/avx512f-vrndscalesd-1.c   |  12 +-
 .../gcc.target/i386/avx512f-vrndscalesd-2.c   |  42 +++-
 .../gcc.target/i386/avx512f-vrndscaless-1.c   |  12 +-
 .../gcc.target/i386/avx512f-vrndscaless-2.c   |  41 ++-
 gcc/testsuite/gcc.target/i386/sse-13.c        |   4 +-
 gcc/testsuite/gcc.target/i386/sse-23.c        |   4 +-
 10 files changed, 324 insertions(+), 42 deletions(-)

diff --git a/gcc/config/i386/avx512fintrin.h b/gcc/config/i386/avx512fintrin.h
index c2ca4e15acd..5773ac74360 100644
--- a/gcc/config/i386/avx512fintrin.h
+++ b/gcc/config/i386/avx512fintrin.h
@@ -9169,10 +9169,40 @@  _mm512_maskz_roundscale_round_pd (__mmask8 __A, __m512d __B,
 
 extern __inline __m128
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_roundscale_round_ss (__m128 __A, __m128 __B, const int __imm, const int __R)
+_mm_roundscale_round_ss (__m128 __A, __m128 __B, const int __imm,
+			 const int __R)
 {
-  return (__m128) __builtin_ia32_rndscaless_round ((__v4sf) __A,
-						   (__v4sf) __B, __imm, __R);
+  return (__m128) __builtin_ia32_rndscaless_mask_round ((__v4sf) __A,
+							(__v4sf) __B, __imm,
+							(__v4sf)
+							_mm_setzero_ps (),
+							(__mmask8) -1,
+							__R);
+}
+
+extern __inline __m128
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_mask_roundscale_round_ss (__m128 __A, __mmask8 __B, __m128 __C,
+			      __m128 __D, const int __imm, const int __R)
+{
+  return (__m128) __builtin_ia32_rndscaless_mask_round ((__v4sf) __C,
+							(__v4sf) __D, __imm,
+							(__v4sf) __A,
+							(__mmask8) __B,
+							__R);
+}
+
+extern __inline __m128
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_maskz_roundscale_round_ss (__mmask8 __A, __m128 __B, __m128 __C,
+			       const int __imm, const int __R)
+{
+  return (__m128) __builtin_ia32_rndscaless_mask_round ((__v4sf) __B,
+							(__v4sf) __C, __imm,
+							(__v4sf)
+							_mm_setzero_ps (),
+							(__mmask8) __A,
+							__R);
 }
 
 extern __inline __m128d
@@ -9180,8 +9210,37 @@  __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _mm_roundscale_round_sd (__m128d __A, __m128d __B, const int __imm,
 			 const int __R)
 {
-  return (__m128d) __builtin_ia32_rndscalesd_round ((__v2df) __A,
-						    (__v2df) __B, __imm, __R);
+  return (__m128d) __builtin_ia32_rndscalesd_mask_round ((__v2df) __A,
+							 (__v2df) __B, __imm,
+							 (__v2df)
+							 _mm_setzero_pd (),
+							 (__mmask8) -1,
+							 __R);
+}
+
+extern __inline __m128d
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_mask_roundscale_round_sd (__m128d __A, __mmask8 __B, __m128d __C, __m128d __D,
+			      const int __imm, const int __R)
+{
+  return (__m128d) __builtin_ia32_rndscalesd_mask_round ((__v2df) __C,
+							 (__v2df) __D, __imm,
+							 (__v2df) __A,
+							 (__mmask8) __B,
+							 __R);
+}
+
+extern __inline __m128d
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_maskz_roundscale_round_sd (__mmask8 __A, __m128d __B, __m128d __C,
+			 const int __imm, const int __R)
+{
+  return (__m128d) __builtin_ia32_rndscalesd_mask_round ((__v2df) __B,
+							 (__v2df) __C, __imm,
+							 (__v2df)
+							 _mm_setzero_pd (),
+							 (__mmask8) __A,
+							 __R);
 }
 
 #else
@@ -9211,12 +9270,48 @@  _mm_roundscale_round_sd (__m128d __A, __m128d __B, const int __imm,
 					     (int)(C),			\
 					     (__v8df)_mm512_setzero_pd(),\
 					     (__mmask8)(A), R))
-#define _mm_roundscale_round_ss(A, B, C, R)					\
-  ((__m128) __builtin_ia32_rndscaless_round ((__v4sf)(__m128)(A),	\
-    (__v4sf)(__m128)(B), (int)(C), R))
-#define _mm_roundscale_round_sd(A, B, C, R)					\
-  ((__m128d) __builtin_ia32_rndscalesd_round ((__v2df)(__m128d)(A),	\
-    (__v2df)(__m128d)(B), (int)(C), R))
+#define _mm_roundscale_round_ss(A, B, I, R)					\
+  ((__m128) __builtin_ia32_rndscaless_mask_round ((__v4sf)(__m128)(A),	\
+						  (__v4sf)(__m128)(B),	\
+						  (int)(I),		\
+						  (__v4sf)_mm_setzero_ps(),\
+						  (__mmask8)(-1),	\
+						  (int)(R)))
+#define _mm_mask_roundscale_round_ss(A, U, B, C, I, R)				\
+  ((__m128) __builtin_ia32_rndscaless_mask_round ((__v4sf)(__m128)(B),	\
+						  (__v4sf)(__m128)(C),	\
+						  (int)(I),		\
+						  (__v4sf)(__m128)(A),	\
+						  (__mmask8)(U),	\
+						  (int)(R)))
+#define _mm_maskz_roundscale_round_ss(U, A, B, I, R)				\
+  ((__m128) __builtin_ia32_rndscaless_mask_round ((__v4sf)(__m128)(A),	\
+						  (__v4sf)(__m128)(B),	\
+						  (int)(I),		\
+						  (__v4sf)_mm_setzero_ps(),\
+						  (__mmask8)(U),	\
+						  (int)(R)))
+#define _mm_roundscale_round_sd(A, B, I, R)					\
+  ((__m128d) __builtin_ia32_rndscalesd_mask_round ((__v2df)(__m128d)(A),\
+						   (__v2df)(__m128d)(B),\
+						   (int)(I),		\
+						   (__v2df)_mm_setzero_pd(),\
+						   (__mmask8)(-1),	\
+						   (int)(R)))
+#define _mm_mask_roundscale_round_sd(A, U, B, C, I, R)				\
+  ((__m128d) __builtin_ia32_rndscalesd_mask_round ((__v2df)(__m128d)(B),\
+						   (__v2df)(__m128d)(C),\
+						   (int)(I),		\
+						   (__v2df)(__m128d)(A),\
+						   (__mmask8)(U),	\
+						   (int)(R)))
+#define _mm_maskz_roundscale_round_sd(U, A, B, I, R)				\
+  ((__m128d) __builtin_ia32_rndscalesd_mask_round ((__v2df)(__m128d)(A),\
+						   (__v2df)(__m128d)(B),\
+						   (int)(I),		\
+						   (__v2df)_mm_setzero_pd(),\
+						   (__mmask8)(U),	\
+						   (int)(R)))
 #endif
 
 extern __inline __m512
@@ -14812,18 +14907,75 @@  extern __inline __m128
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _mm_roundscale_ss (__m128 __A, __m128 __B, const int __imm)
 {
-  return (__m128) __builtin_ia32_rndscaless_round ((__v4sf) __A,
-						   (__v4sf) __B, __imm,
-						   _MM_FROUND_CUR_DIRECTION);
+  return (__m128) __builtin_ia32_rndscaless_mask_round ((__v4sf) __A,
+						        (__v4sf) __B, __imm,
+							(__v4sf)
+							_mm_setzero_ps (),
+							(__mmask8) -1,
+							_MM_FROUND_CUR_DIRECTION);
+}
+
+
+extern __inline __m128
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_mask_roundscale_ss (__m128 __A, __mmask8 __B, __m128 __C, __m128 __D,
+			const int __imm)
+{
+  return (__m128) __builtin_ia32_rndscaless_mask_round ((__v4sf) __C,
+							(__v4sf) __D, __imm,
+							(__v4sf) __A,
+							(__mmask8) __B,
+							_MM_FROUND_CUR_DIRECTION);
+}
+
+extern __inline __m128
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_maskz_roundscale_ss (__mmask8 __A, __m128 __B, __m128 __C,
+			 const int __imm)
+{
+  return (__m128) __builtin_ia32_rndscaless_mask_round ((__v4sf) __B,
+							(__v4sf) __C, __imm,
+							(__v4sf)
+							_mm_setzero_ps (),
+							(__mmask8) __A,
+							_MM_FROUND_CUR_DIRECTION);
 }
 
 extern __inline __m128d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _mm_roundscale_sd (__m128d __A, __m128d __B, const int __imm)
 {
-  return (__m128d) __builtin_ia32_rndscalesd_round ((__v2df) __A,
-						    (__v2df) __B, __imm,
-						   _MM_FROUND_CUR_DIRECTION);
+  return (__m128d) __builtin_ia32_rndscalesd_mask_round ((__v2df) __A,
+							 (__v2df) __B, __imm,
+							 (__v2df)
+							 _mm_setzero_pd (),
+							 (__mmask8) -1,
+							 _MM_FROUND_CUR_DIRECTION);
+}
+
+extern __inline __m128d
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_mask_roundscale_sd (__m128d __A, __mmask8 __B, __m128d __C, __m128d __D,
+			const int __imm)
+{
+  return (__m128d) __builtin_ia32_rndscalesd_mask_round ((__v2df) __C,
+							 (__v2df) __D, __imm,
+							 (__v2df) __A,
+							 (__mmask8) __B,
+							 _MM_FROUND_CUR_DIRECTION);
+}
+
+extern __inline __m128d
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_maskz_roundscale_sd (__mmask8 __A, __m128d __B, __m128d __C,
+			 const int __imm)
+{
+  return (__m128d) __builtin_ia32_rndscalesd_mask_round ((__v2df) __B,
+							 (__v2df) __C, __imm,
+							 (__v2df)
+							 _mm_setzero_pd (),
+							 (__mmask8) __A,
+							 _MM_FROUND_CUR_DIRECTION);
 }
 
 #else
@@ -14853,12 +15005,48 @@  _mm_roundscale_sd (__m128d __A, __m128d __B, const int __imm)
 					     (int)(C),			\
 					     (__v8df)_mm512_setzero_pd(),\
 					     (__mmask8)(A), _MM_FROUND_CUR_DIRECTION))
-#define _mm_roundscale_ss(A, B, C)					\
-  ((__m128) __builtin_ia32_rndscaless_round ((__v4sf)(__m128)(A),	\
-  (__v4sf)(__m128)(B), (int)(C), _MM_FROUND_CUR_DIRECTION))
-#define _mm_roundscale_sd(A, B, C)					\
-  ((__m128d) __builtin_ia32_rndscalesd_round ((__v2df)(__m128d)(A),	\
-    (__v2df)(__m128d)(B), (int)(C), _MM_FROUND_CUR_DIRECTION))
+#define _mm_roundscale_ss(A, B, I)					\
+  ((__m128) __builtin_ia32_rndscaless_mask_round ((__v4sf)(__m128)(A),	\
+						  (__v4sf)(__m128)(B),	\
+						  (int)(I),		\
+						  (__v4sf)_mm_setzero_ps(),\
+						  (__mmask8)(-1),	\
+						  _MM_FROUND_CUR_DIRECTION))
+#define _mm_mask_roundscale_ss(A, U, B, C, I)				\
+  ((__m128) __builtin_ia32_rndscaless_mask_round ((__v4sf)(__m128)(B),	\
+						  (__v4sf)(__m128)(C),	\
+						  (int)(I),		\
+						  (__v4sf)(__m128)(A),	\
+						  (__mmask8)(U),	\
+						  _MM_FROUND_CUR_DIRECTION))
+#define _mm_maskz_roundscale_ss(U, A, B, I)				\
+  ((__m128) __builtin_ia32_rndscaless_mask_round ((__v4sf)(__m128)(A),	\
+						  (__v4sf)(__m128)(B),	\
+						  (int)(I),		\
+						  (__v4sf)_mm_setzero_ps(),\
+						  (__mmask8)(U),	\
+						  _MM_FROUND_CUR_DIRECTION))
+#define _mm_roundscale_sd(A, B, I)					\
+  ((__m128d) __builtin_ia32_rndscalesd_mask_round ((__v2df)(__m128d)(A),	\
+						   (__v2df)(__m128d)(B),\
+						   (int)(I),		\
+						   (__v2df)_mm_setzero_pd(),\
+						   (__mmask8)(-1),	\
+						   _MM_FROUND_CUR_DIRECTION))
+#define _mm_mask_roundscale_sd(A, U, B, C, I)				\
+  ((__m128d) __builtin_ia32_rndscalesd_mask_round ((__v2df)(__m128d)(B),	\
+						   (__v2df)(__m128d)(C),\
+						   (int)(I),		\
+						   (__v2df)(__m128d)(A),\
+						   (__mmask8)(U),	\
+						   _MM_FROUND_CUR_DIRECTION))
+#define _mm_maskz_roundscale_sd(U, A, B, I)				\
+  ((__m128d) __builtin_ia32_rndscalesd_mask_round ((__v2df)(__m128d)(A),	\
+						   (__v2df)(__m128d)(B),\
+						   (int)(I),		\
+						   (__v2df)_mm_setzero_pd(),\
+						   (__mmask8)(U),	\
+						   _MM_FROUND_CUR_DIRECTION))
 #endif
 
 #ifdef __OPTIMIZE__
diff --git a/gcc/config/i386/i386-builtin.def b/gcc/config/i386/i386-builtin.def
index 6ac820eb897..11028331cda 100644
--- a/gcc/config/i386/i386-builtin.def
+++ b/gcc/config/i386/i386-builtin.def
@@ -2828,8 +2828,8 @@  BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_sse_vmmulv4sf3_round, "__builtin_ia3
 BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_sse_vmmulv4sf3_mask_round, "__builtin_ia32_mulss_mask_round", IX86_BUILTIN_MULSS_MASK_ROUND, UNKNOWN, (int) V4SF_FTYPE_V4SF_V4SF_V4SF_UQI_INT)
 BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_rndscalev8df_mask_round, "__builtin_ia32_rndscalepd_mask", IX86_BUILTIN_RNDSCALEPD, UNKNOWN, (int) V8DF_FTYPE_V8DF_INT_V8DF_QI_INT)
 BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_rndscalev16sf_mask_round, "__builtin_ia32_rndscaleps_mask", IX86_BUILTIN_RNDSCALEPS, UNKNOWN, (int) V16SF_FTYPE_V16SF_INT_V16SF_HI_INT)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_rndscalev2df_round, "__builtin_ia32_rndscalesd_round", IX86_BUILTIN_RNDSCALESD, UNKNOWN, (int) V2DF_FTYPE_V2DF_V2DF_INT_INT)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_rndscalev4sf_round, "__builtin_ia32_rndscaless_round", IX86_BUILTIN_RNDSCALESS, UNKNOWN, (int) V4SF_FTYPE_V4SF_V4SF_INT_INT)
+BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_rndscalev2df_mask_round, "__builtin_ia32_rndscalesd_mask_round", IX86_BUILTIN_RNDSCALESD, UNKNOWN, (int) V2DF_FTYPE_V2DF_V2DF_INT_V2DF_UQI_INT)
+BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_rndscalev4sf_mask_round, "__builtin_ia32_rndscaless_mask_round", IX86_BUILTIN_RNDSCALESS, UNKNOWN, (int) V4SF_FTYPE_V4SF_V4SF_INT_V4SF_UQI_INT)
 BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_scalefv8df_mask_round, "__builtin_ia32_scalefpd512_mask", IX86_BUILTIN_SCALEFPD512, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_V8DF_UQI_INT)
 BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_scalefv16sf_mask_round, "__builtin_ia32_scalefps512_mask", IX86_BUILTIN_SCALEFPS512, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_V16SF_HI_INT)
 BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_vmscalefv2df_mask_round, "__builtin_ia32_scalefsd_mask_round", IX86_BUILTIN_SCALEFSD, UNKNOWN, (int) V2DF_FTYPE_V2DF_V2DF_V2DF_UQI_INT)
diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 07922a1bf97..f474eed1c4e 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -9694,18 +9694,17 @@ 
    (set_attr "prefix" "evex")
    (set_attr "mode" "<MODE>")])
 
-(define_insn "avx512f_rndscale<mode><round_saeonly_name>"
+(define_insn "avx512f_rndscale<mode><mask_scalar_name><round_saeonly_scalar_name>"
   [(set (match_operand:VF_128 0 "register_operand" "=v")
 	(vec_merge:VF_128
 	  (unspec:VF_128
-	    [(match_operand:VF_128 1 "register_operand" "v")
-	     (match_operand:VF_128 2 "<round_saeonly_nimm_predicate>" "<round_saeonly_constraint>")
+	    [(match_operand:VF_128 2 "<round_saeonly_scalar_nimm_predicate>" "<round_saeonly_scalar_constraint>")
 	     (match_operand:SI 3 "const_0_to_255_operand")]
 	    UNSPEC_ROUND)
-	  (match_dup 1)
+	  (match_operand:VF_128 1 "register_operand" "v")
 	  (const_int 1)))]
   "TARGET_AVX512F"
-  "vrndscale<ssescalarmodesuffix>\t{%3, <round_saeonly_op4>%2, %1, %0|%0, %1, %<iptr>2<round_saeonly_op4>, %3}"
+  "vrndscale<ssescalarmodesuffix>\t{%3, <round_saeonly_scalar_mask_op4>%2, %1, %0<mask_scalar_operand4>|%0<mask_scalar_operand4>, %1, %<iptr>2<round_saeonly_scalar_mask_op4>, %3}"
   [(set_attr "length_immediate" "1")
    (set_attr "prefix" "evex")
    (set_attr "mode" "<MODE>")])
diff --git a/gcc/testsuite/gcc.target/i386/avx-1.c b/gcc/testsuite/gcc.target/i386/avx-1.c
index 741b3c4f8e3..3600a7abe91 100644
--- a/gcc/testsuite/gcc.target/i386/avx-1.c
+++ b/gcc/testsuite/gcc.target/i386/avx-1.c
@@ -283,8 +283,8 @@ 
 #define __builtin_ia32_pternlogq512_maskz(A, B, C, F, E) __builtin_ia32_pternlogq512_maskz(A, B, C, 1, E)
 #define __builtin_ia32_rndscalepd_mask(A, F, C, D, E) __builtin_ia32_rndscalepd_mask(A, 1, C, D, 8)
 #define __builtin_ia32_rndscaleps_mask(A, F, C, D, E) __builtin_ia32_rndscaleps_mask(A, 1, C, D, 8)
-#define __builtin_ia32_rndscalesd_round(A, B, C, D) __builtin_ia32_rndscalesd_round(A, B, 1, 4)
-#define __builtin_ia32_rndscaless_round(A, B, C, D) __builtin_ia32_rndscaless_round(A, B, 1, 4)
+#define __builtin_ia32_rndscalesd_mask_round(A, B, C, D, E, F) __builtin_ia32_rndscalesd_mask_round(A, B, 1, D, E, 4)
+#define __builtin_ia32_rndscaless_mask_round(A, B, C, D, E, F) __builtin_ia32_rndscaless_mask_round(A, B, 1, D, E, 4)
 #define __builtin_ia32_scalefpd512_mask(A, B, C, D, E) __builtin_ia32_scalefpd512_mask(A, B, C, D, 8)
 #define __builtin_ia32_scalefps512_mask(A, B, C, D, E) __builtin_ia32_scalefps512_mask(A, B, C, D, 8)
 #define __builtin_ia32_scalefsd_mask_round(A, B, C, D, E) __builtin_ia32_scalefsd_mask_round(A, B, C, D, 8)
diff --git a/gcc/testsuite/gcc.target/i386/avx512f-vrndscalesd-1.c b/gcc/testsuite/gcc.target/i386/avx512f-vrndscalesd-1.c
index 255b384d565..f95d4709607 100644
--- a/gcc/testsuite/gcc.target/i386/avx512f-vrndscalesd-1.c
+++ b/gcc/testsuite/gcc.target/i386/avx512f-vrndscalesd-1.c
@@ -1,14 +1,24 @@ 
 /* { dg-do compile } */
 /* { dg-options "-mavx512f -O2" } */
-/* { dg-final { scan-assembler-times "vrndscalesd\[ \\t\]+\\S*,\[ \\t\]+\{sae\}\[^\n\]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vrndscalesd\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\]*%xmm\[0-9\]+\[^\n\]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vrndscalesd\[ \\t\]+\[^\n\]*\{sae\}\[^\{\n\]*%xmm\[0-9\]+\[^\n\]*%xmm\[0-9\]+\[^\n\]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vrndscalesd\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\]*%xmm\[0-9\]+\[^\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vrndscalesd\[ \\t\]+\[^\n\]*\{sae\}\[^\{\n\]*%xmm\[0-9\]+\[^\n\]*%xmm\[0-9\]+\[^\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vrndscalesd\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\]*%xmm\[0-9\]+\[^\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vrndscalesd\[ \\t\]+\[^\n\]*\{sae\}\[^\{\n\]*%xmm\[0-9\]+\[^\n\]*%xmm\[0-9\]+\[^\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */
 
 #include <immintrin.h>
 
 volatile __m128d x1, x2;
+volatile __mmask8 m;
 
 void extern
 avx512f_test (void)
 {
   x1 = _mm_roundscale_sd (x1, x2, 0x42);
   x1 = _mm_roundscale_round_sd (x1, x2, 0x42, _MM_FROUND_NO_EXC);
+  x1 = _mm_mask_roundscale_sd (x1, m, x1, x2, 0x42);
+  x1 = _mm_mask_roundscale_round_sd (x1, m, x1, x2, 0x42, _MM_FROUND_NO_EXC);
+  x1 = _mm_maskz_roundscale_sd (m, x1, x2, 0x42);
+  x1 = _mm_maskz_roundscale_round_sd (m, x1, x2, 0x42, _MM_FROUND_NO_EXC);
 }
diff --git a/gcc/testsuite/gcc.target/i386/avx512f-vrndscalesd-2.c b/gcc/testsuite/gcc.target/i386/avx512f-vrndscalesd-2.c
index b96aa462790..83b940d9636 100644
--- a/gcc/testsuite/gcc.target/i386/avx512f-vrndscalesd-2.c
+++ b/gcc/testsuite/gcc.target/i386/avx512f-vrndscalesd-2.c
@@ -6,6 +6,7 @@ 
 
 #include <math.h>
 #include "avx512f-check.h"
+#include "avx512f-mask-type.h"
 
 static void
 compute_rndscalesd (double *s1, double *s2, double *r, int imm)
@@ -33,17 +34,54 @@  compute_rndscalesd (double *s1, double *s2, double *r, int imm)
 static void
 avx512f_test (void)
 {
-  int imm = _MM_FROUND_FLOOR | (7 << 4);
-  union128d s1, s2, res1;
+  int i, imm;
+  union128d s1, s2, res1, res2, res3, res4, res5, res6;
   double res_ref[SIZE];
+  
+  MASK_TYPE mask = MASK_VALUE;
+
+  imm = _MM_FROUND_FLOOR | (7 << 4);
 
   s1.x = _mm_set_pd (4.05084, -1.23162);
   s2.x = _mm_set_pd (-3.53222, 7.33527);
 
+  for(i = 0; i < SIZE; i++)
+    {
+      res2.a[i] = DEFAULT_VALUE;
+      res5.a[i] = DEFAULT_VALUE;
+    }
+
   res1.x = _mm_roundscale_sd (s1.x, s2.x, imm);
+  res2.x = _mm_mask_roundscale_sd (res2.x, mask, s1.x, s2.x, imm);
+  res3.x = _mm_maskz_roundscale_sd (mask, s1.x, s2.x, imm);
+  res4.x = _mm_roundscale_round_sd (s1.x, s2.x, imm, _MM_FROUND_NO_EXC);
+  res5.x = _mm_mask_roundscale_round_sd (res5.x, mask, s1.x, s2.x, imm, _MM_FROUND_NO_EXC);
+  res6.x = _mm_maskz_roundscale_round_sd (mask, s1.x, s2.x, imm, _MM_FROUND_NO_EXC);
 
   compute_rndscalesd (s1.a, s2.a, res_ref, imm);
 
   if (check_union128d (res1, res_ref))
     abort ();
+
+  MASK_MERGE (d) (res_ref, mask, 1);
+  if (check_union128d (res2, res_ref))
+    abort ();
+  
+  MASK_ZERO (d) (res_ref, mask, 1);
+  if (check_union128d (res3, res_ref))
+    abort ();
+
+  compute_rndscalesd (s1.a, s2.a, res_ref, imm);
+
+  if (check_union128d (res4, res_ref))
+    abort ();
+
+  MASK_MERGE (d) (res_ref, mask, 1);
+  if (check_union128d (res5, res_ref))
+    abort ();
+  
+  MASK_ZERO (d) (res_ref, mask, 1);
+  if (check_union128d (res6, res_ref))
+    abort ();
+
 }
diff --git a/gcc/testsuite/gcc.target/i386/avx512f-vrndscaless-1.c b/gcc/testsuite/gcc.target/i386/avx512f-vrndscaless-1.c
index dbd6e21b762..19e3a973fa4 100644
--- a/gcc/testsuite/gcc.target/i386/avx512f-vrndscaless-1.c
+++ b/gcc/testsuite/gcc.target/i386/avx512f-vrndscaless-1.c
@@ -1,14 +1,24 @@ 
 /* { dg-do compile } */
 /* { dg-options "-mavx512f -O2" } */
-/* { dg-final { scan-assembler-times "vrndscaless\[ \\t\]+\\S*,\[ \\t\]+\{sae\}\[^\n\]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vrndscaless\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\]*%xmm\[0-9\]+\[^\n\]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vrndscaless\[ \\t\]+\[^\n\]*\{sae\}\[^\{\n\]*%xmm\[0-9\]+\[^\n\]*%xmm\[0-9\]+\[^\n\]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vrndscaless\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\]*%xmm\[0-9\]+\[^\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vrndscaless\[ \\t\]+\[^\n\]*\{sae\}\[^\{\n\]*%xmm\[0-9\]+\[^\n\]*%xmm\[0-9\]+\[^\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vrndscaless\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\]*%xmm\[0-9\]+\[^\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vrndscaless\[ \\t\]+\[^\n\]*\{sae\}\[^\{\n\]*%xmm\[0-9\]+\[^\n\]*%xmm\[0-9\]+\[^\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */
 
 #include <immintrin.h>
 
 volatile __m128 x1, x2;
+volatile __mmask8 m;
 
 void extern
 avx512f_test (void)
 {
   x1 = _mm_roundscale_ss (x1, x2, 0x42);
   x1 = _mm_roundscale_round_ss (x1, x2, 0x42, _MM_FROUND_NO_EXC);
+  x1 = _mm_mask_roundscale_ss (x1, m, x1, x2, 0x42);
+  x1 = _mm_mask_roundscale_round_ss (x1, m, x1, x2, 0x42, _MM_FROUND_NO_EXC);
+  x1 = _mm_maskz_roundscale_ss (m, x1, x2, 0x42);
+  x1 = _mm_maskz_roundscale_round_ss (m, x1, x2, 0x42, _MM_FROUND_NO_EXC);
 }
diff --git a/gcc/testsuite/gcc.target/i386/avx512f-vrndscaless-2.c b/gcc/testsuite/gcc.target/i386/avx512f-vrndscaless-2.c
index 42dd645ab87..6906880d362 100644
--- a/gcc/testsuite/gcc.target/i386/avx512f-vrndscaless-2.c
+++ b/gcc/testsuite/gcc.target/i386/avx512f-vrndscaless-2.c
@@ -6,6 +6,7 @@ 
 
 #include <math.h>
 #include "avx512f-check.h"
+#include "avx512f-mask-type.h"
 
 static void
 compute_rndscaless (float *s1, float *s2, float *r, int imm)
@@ -35,17 +36,53 @@  compute_rndscaless (float *s1, float *s2, float *r, int imm)
 static void
 avx512f_test (void)
 {
-  int imm = _MM_FROUND_FLOOR | (7 << 4);
-  union128 s1, s2, res1;
+  int i, imm;
+  union128 s1, s2, res1, res2, res3, res4, res5, res6;
   float res_ref[SIZE];
+  
+  MASK_TYPE mask = MASK_VALUE;
 
+  imm = _MM_FROUND_FLOOR | (7 << 4);
+  
   s1.x = _mm_set_ps (4.05084, -1.23162, 2.00231, -6.22103);
   s2.x = _mm_set_ps (-4.19319, -3.53222, 7.33527, 5.57655);
+ 
+  for(i = 0; i < SIZE; i++)
+    {
+      res2.a[i] = DEFAULT_VALUE;
+      res5.a[i] = DEFAULT_VALUE;
+    }
 
   res1.x = _mm_roundscale_ss (s1.x, s2.x, imm);
+  res2.x = _mm_mask_roundscale_ss (res2.x, mask, s1.x, s2.x, imm);
+  res3.x = _mm_maskz_roundscale_ss (mask, s1.x, s2.x, imm);
+  res4.x = _mm_roundscale_round_ss (s1.x, s2.x, imm, _MM_FROUND_NO_EXC);
+  res5.x = _mm_mask_roundscale_round_ss (res5.x, mask, s1.x, s2.x, imm, _MM_FROUND_NO_EXC);
+  res6.x = _mm_maskz_roundscale_round_ss (mask, s1.x, s2.x, imm, _MM_FROUND_NO_EXC);
 
   compute_rndscaless (s1.a, s2.a, res_ref, imm);
 
   if (check_union128 (res1, res_ref))
     abort ();
+
+  MASK_MERGE () (res_ref, mask, 1);
+  if (check_union128 (res2, res_ref))
+    abort ();
+  
+  MASK_ZERO () (res_ref, mask, 1);
+  if (check_union128 (res3, res_ref))
+    abort ();
+
+  compute_rndscaless (s1.a, s2.a, res_ref, imm);
+
+  if (check_union128 (res4, res_ref))
+    abort ();
+
+  MASK_MERGE () (res_ref, mask, 1);
+  if (check_union128 (res5, res_ref))
+    abort ();
+  
+  MASK_ZERO () (res_ref, mask, 1);
+  if (check_union128 (res6, res_ref))
+    abort ();
 }
diff --git a/gcc/testsuite/gcc.target/i386/sse-13.c b/gcc/testsuite/gcc.target/i386/sse-13.c
index 39b2d31578c..45c1c285c57 100644
--- a/gcc/testsuite/gcc.target/i386/sse-13.c
+++ b/gcc/testsuite/gcc.target/i386/sse-13.c
@@ -300,8 +300,8 @@ 
 #define __builtin_ia32_pternlogq512_maskz(A, B, C, F, E) __builtin_ia32_pternlogq512_maskz(A, B, C, 1, E)
 #define __builtin_ia32_rndscalepd_mask(A, F, C, D, E) __builtin_ia32_rndscalepd_mask(A, 1, C, D, 8)
 #define __builtin_ia32_rndscaleps_mask(A, F, C, D, E) __builtin_ia32_rndscaleps_mask(A, 1, C, D, 8)
-#define __builtin_ia32_rndscalesd_round(A, B, C, D) __builtin_ia32_rndscalesd_round(A, B, 1, 4)
-#define __builtin_ia32_rndscaless_round(A, B, C, D) __builtin_ia32_rndscaless_round(A, B, 1, 4)
+#define __builtin_ia32_rndscalesd_mask_round(A, B, C, D, E, F) __builtin_ia32_rndscalesd_mask_round(A, B, 1, D, E, 4)
+#define __builtin_ia32_rndscaless_mask_round(A, B, C, D, E, F) __builtin_ia32_rndscaless_mask_round(A, B, 1, D, E, 4)
 #define __builtin_ia32_scalefpd512_mask(A, B, C, D, E) __builtin_ia32_scalefpd512_mask(A, B, C, D, 8)
 #define __builtin_ia32_scalefps512_mask(A, B, C, D, E) __builtin_ia32_scalefps512_mask(A, B, C, D, 8)
 #define __builtin_ia32_scalefsd_mask_round(A, B, C, D, E) __builtin_ia32_scalefsd_mask_round(A, B, C, D, 8)
diff --git a/gcc/testsuite/gcc.target/i386/sse-23.c b/gcc/testsuite/gcc.target/i386/sse-23.c
index 7ea665de747..e98c7693ef7 100644
--- a/gcc/testsuite/gcc.target/i386/sse-23.c
+++ b/gcc/testsuite/gcc.target/i386/sse-23.c
@@ -302,8 +302,8 @@ 
 #define __builtin_ia32_pternlogq512_maskz(A, B, C, F, E) __builtin_ia32_pternlogq512_maskz(A, B, C, 1, E)
 #define __builtin_ia32_rndscalepd_mask(A, F, C, D, E) __builtin_ia32_rndscalepd_mask(A, 1, C, D, 8)
 #define __builtin_ia32_rndscaleps_mask(A, F, C, D, E) __builtin_ia32_rndscaleps_mask(A, 1, C, D, 8)
-#define __builtin_ia32_rndscalesd_round(A, B, C, D) __builtin_ia32_rndscalesd_round(A, B, 1, 4)
-#define __builtin_ia32_rndscaless_round(A, B, C, D) __builtin_ia32_rndscaless_round(A, B, 1, 4)
+#define __builtin_ia32_rndscalesd_mask_round(A, B, C, D, E, F) __builtin_ia32_rndscalesd_mask_round(A, B, 1, D, E, 4)
+#define __builtin_ia32_rndscaless_mask_round(A, B, C, D, E, F) __builtin_ia32_rndscaless_mask_round(A, B, 1, D, E, 4)
 #define __builtin_ia32_scalefpd512_mask(A, B, C, D, E) __builtin_ia32_scalefpd512_mask(A, B, C, D, 8)
 #define __builtin_ia32_scalefps512_mask(A, B, C, D, E) __builtin_ia32_scalefps512_mask(A, B, C, D, 8)
 #define __builtin_ia32_scalefsd_mask_round(A, B, C, D, E) __builtin_ia32_scalefsd_mask_round(A, B, C, D, 8)
-- 
2.17.1