
[v3,1/6] rs6000: Support SSE4.1 "round" intrinsics

Message ID 20210823190310.1679905-2-pc@us.ibm.com
State New
Series rs6000: Support more SSE4 intrinsics

Commit Message

Paul A. Clarke Aug. 23, 2021, 7:03 p.m. UTC
Suppress exceptions (when specified) by saving, manipulating, and
restoring the FPSCR.  Similarly, save, set, and restore the floating-point
rounding mode when required.

No attempt is made to optimize writing the FPSCR (by checking whether the
new value would be the same), other than using lighter-weight instructions
when possible.

The scalar versions naively use the parallel versions to compute the
single scalar result and then construct the remainder of the result.

Of minor note, the values of _MM_FROUND_TO_NEG_INF and _MM_FROUND_TO_ZERO
are swapped from the corresponding values on x86 so as to match the
corresponding rounding mode values in the Power ISA.
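That is, the low two bits of these values follow the FPSCR.RN encoding
(00 = round to nearest, 01 = toward zero, 10 = toward +inf, 11 = toward
-inf), whereas x86 uses 01 for toward -inf and 03 for toward zero.  A
small helper (purely illustrative, not part of the patch; the function
name is made up) mapping the Power-ordered values to C99 fenv constants:

```c
#include <fenv.h>

/* Map the low two bits of a _MM_FROUND_* value, as defined in this
   patch to match the Power ISA RN encoding, to a C99 rounding mode.  */
int frnd_to_fenv (int rounding)
{
  switch (rounding & 0x3)
    {
    case 0x0: return FE_TONEAREST;   /* RN = 00 */
    case 0x1: return FE_TOWARDZERO;  /* RN = 01 */
    case 0x2: return FE_UPWARD;      /* RN = 10 */
    default:  return FE_DOWNWARD;    /* RN = 11 */
    }
}
```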

Move implementations of _mm_ceil* and _mm_floor* into _mm_round*, and
convert _mm_ceil* and _mm_floor* into macros. This matches the current
analogous implementations in config/i386/smmintrin.h.

Function signatures match the analogous functions in config/i386/smmintrin.h.

Add tests for _mm_round_pd, _mm_round_ps, _mm_round_sd, _mm_round_ss,
modeled after the very similar "floor" and "ceil" tests.

Include basic tests, plus tests at the boundaries of floating-point
representation, both positive and negative.  Test all of the parameterized
rounding modes as well as the C99 rounding modes, and the interactions
between the two.

Exceptions are not explicitly tested.

2021-08-20  Paul A. Clarke  <pc@us.ibm.com>

gcc
	* config/rs6000/smmintrin.h (_mm_round_pd, _mm_round_ps,
	_mm_round_sd, _mm_round_ss, _MM_FROUND_TO_NEAREST_INT,
	_MM_FROUND_TO_ZERO, _MM_FROUND_TO_POS_INF, _MM_FROUND_TO_NEG_INF,
	_MM_FROUND_CUR_DIRECTION, _MM_FROUND_RAISE_EXC, _MM_FROUND_NO_EXC,
	_MM_FROUND_NINT, _MM_FROUND_FLOOR, _MM_FROUND_CEIL, _MM_FROUND_TRUNC,
	_MM_FROUND_RINT, _MM_FROUND_NEARBYINT): New.
	* config/rs6000/smmintrin.h (_mm_ceil_pd, _mm_ceil_ps, _mm_ceil_sd,
	_mm_ceil_ss, _mm_floor_pd, _mm_floor_ps, _mm_floor_sd, _mm_floor_ss):
	Convert from function to macro.

gcc/testsuite
	* gcc.target/powerpc/sse4_1-round3.h: New.
	* gcc.target/powerpc/sse4_1-roundpd.c: New.
	* gcc.target/powerpc/sse4_1-roundps.c: New.
	* gcc.target/powerpc/sse4_1-roundsd.c: New.
	* gcc.target/powerpc/sse4_1-roundss.c: New.
---
v3: No change.
v2:
- Replaced clever (and broken) exception masking with a more straightforward
  implementation, per v1 review and closer inspection: mtfsf was only
  writing the final nybble (1) instead of the final two nybbles (2), so
  not all of the exception-enable bits were cleared.
- Renamed some variables from cryptic "tmp" and "save" to
  "fpscr_save" and "enables_save".
- Retained use of __builtin_mffsl, since that is supported pre-POWER8
  (with an alternate instruction sequence).
- Added "extern" to functions to maintain compatible decorations with
  like implementations in gcc/config/i386.
- Added some additional text to the commit message about some of the
  (unpleasant?) implementations and decorations coming from
  like implementations in gcc/config/i386, per v1 review.
- Removed "-Wno-psabi" from tests as unnecessary, per v1 review.
- Fixed indentation and other minor formatting changes, per v1 review.
- Noted testing in patch series cover letter.

 gcc/config/rs6000/smmintrin.h                 | 240 +++++++++++-----
 .../gcc.target/powerpc/sse4_1-round3.h        |  81 ++++++
 .../gcc.target/powerpc/sse4_1-roundpd.c       | 143 ++++++++++
 .../gcc.target/powerpc/sse4_1-roundps.c       |  98 +++++++
 .../gcc.target/powerpc/sse4_1-roundsd.c       | 256 ++++++++++++++++++
 .../gcc.target/powerpc/sse4_1-roundss.c       | 208 ++++++++++++++
 6 files changed, 962 insertions(+), 64 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-round3.h
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-roundpd.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-roundps.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-roundsd.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-roundss.c

Comments

Li, Pan2 via Gcc-patches Aug. 27, 2021, 1:44 p.m. UTC | #1
Hi Paul,

On 8/23/21 2:03 PM, Paul A. Clarke wrote:
> Suppress exceptions (when specified), by saving, manipulating, and
> restoring the FPSCR.  Similarly, save, set, and restore the floating-point
> rounding mode when required.
>
> No attempt is made to optimize writing the FPSCR (by checking if the new
> value would be the same), other than using lighter weight instructions
> when possible.
>
> The scalar versions naively use the parallel versions to compute the
> single scalar result and then construct the remainder of the result.
>
> Of minor note, the values of _MM_FROUND_TO_NEG_INF and _MM_FROUND_TO_ZERO
> are swapped from the corresponding values on x86 so as to match the
> corresponding rounding mode values in the Power ISA.
>
> Move implementations of _mm_ceil* and _mm_floor* into _mm_round*, and
> convert _mm_ceil* and _mm_floor* into macros. This matches the current
> analogous implementations in config/i386/smmintrin.h.
>
> Function signatures match the analogous functions in config/i386/smmintrin.h.
>
> Add tests for _mm_round_pd, _mm_round_ps, _mm_round_sd, _mm_round_ss,
> modeled after the very similar "floor" and "ceil" tests.
>
> Include basic tests, plus tests at the boundaries for floating-point
> representation, positive and negative, test all of the parameterized
> rounding modes as well as the C99 rounding modes and interactions
> between the two.
>
> Exceptions are not explicitly tested.

Again, please specify where the patch was tested and whether this is for 
trunk, backports, etc.  Thanks!  (I know you aren't asking for 
backports, but in general please get in the habit of this.)
>
> 2021-08-20  Paul A. Clarke  <pc@us.ibm.com>
>
> gcc
> 	* config/rs6000/smmintrin.h (_mm_round_pd, _mm_round_ps,
> 	_mm_round_sd, _mm_round_ss, _MM_FROUND_TO_NEAREST_INT,
> 	_MM_FROUND_TO_ZERO, _MM_FROUND_TO_POS_INF, _MM_FROUND_TO_NEG_INF,
> 	_MM_FROUND_CUR_DIRECTION, _MM_FROUND_RAISE_EXC, _MM_FROUND_NO_EXC,
> 	_MM_FROUND_NINT, _MM_FROUND_FLOOR, _MM_FROUND_CEIL, _MM_FROUND_TRUNC,
> 	_MM_FROUND_RINT, _MM_FROUND_NEARBYINT): New.
> 	* config/rs6000/smmintrin.h (_mm_ceil_pd, _mm_ceil_ps, _mm_ceil_sd,
> 	_mm_ceil_ss, _mm_floor_pd, _mm_floor_ps, _mm_floor_sd, _mm_floor_ss):
> 	Convert from function to macro.
>
> gcc/testsuite
> 	* gcc.target/powerpc/sse4_1-round3.h: New.
> 	* gcc.target/powerpc/sse4_1-roundpd.c: New.
> 	* gcc.target/powerpc/sse4_1-roundps.c: New.
> 	* gcc.target/powerpc/sse4_1-roundsd.c: New.
> 	* gcc.target/powerpc/sse4_1-roundss.c: New.
> ---
> v3: No change.
> v2:
> - Replaced clever (and broken) exception masking with more straightforward
>    implementation, per v1 review and closer inspection. mtfsf was only
>    writing the final nybble (1) instead of the final two nybbles (2), so
>    not all of the exception-enable bits were cleared.
> - Renamed some variables from cryptic "tmp" and "save" to
>    "fpscr_save" and "enables_save".
> - Retained use of __builtin_mffsl, since that is supported pre-POWER8
>    (with an alternate instruction sequence).
> - Added "extern" to functions to maintain compatible decorations with
>    like implementations in gcc/config/i386.
> - Added some additional text to the commit message about some of the
>    (unpleasant?) implementations and decorations coming from
>    like implementations in gcc/config/i386, per v1 review.
> - Removed "-Wno-psabi" from tests as unnecessary, per v1 review.
> - Fixed indentation and other minor formatting changes, per v1 review.
> - Noted testing in patch series cover letter.
>
>   gcc/config/rs6000/smmintrin.h                 | 240 +++++++++++-----
>   .../gcc.target/powerpc/sse4_1-round3.h        |  81 ++++++
>   .../gcc.target/powerpc/sse4_1-roundpd.c       | 143 ++++++++++
>   .../gcc.target/powerpc/sse4_1-roundps.c       |  98 +++++++
>   .../gcc.target/powerpc/sse4_1-roundsd.c       | 256 ++++++++++++++++++
>   .../gcc.target/powerpc/sse4_1-roundss.c       | 208 ++++++++++++++
>   6 files changed, 962 insertions(+), 64 deletions(-)
>   create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-round3.h
>   create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-roundpd.c
>   create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-roundps.c
>   create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-roundsd.c
>   create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-roundss.c
>
> diff --git a/gcc/config/rs6000/smmintrin.h b/gcc/config/rs6000/smmintrin.h
> index 3767a67eada7..a6b88d313ad0 100644
> --- a/gcc/config/rs6000/smmintrin.h
> +++ b/gcc/config/rs6000/smmintrin.h
> @@ -42,6 +42,182 @@
>   #include <altivec.h>
>   #include <tmmintrin.h>
>   
> +/* Rounding mode macros. */
> +#define _MM_FROUND_TO_NEAREST_INT       0x00
> +#define _MM_FROUND_TO_ZERO              0x01
> +#define _MM_FROUND_TO_POS_INF           0x02
> +#define _MM_FROUND_TO_NEG_INF           0x03
> +#define _MM_FROUND_CUR_DIRECTION        0x04
> +
> +#define _MM_FROUND_NINT		\
> +  (_MM_FROUND_TO_NEAREST_INT | _MM_FROUND_RAISE_EXC)
> +#define _MM_FROUND_FLOOR	\
> +  (_MM_FROUND_TO_NEG_INF | _MM_FROUND_RAISE_EXC)
> +#define _MM_FROUND_CEIL		\
> +  (_MM_FROUND_TO_POS_INF | _MM_FROUND_RAISE_EXC)
> +#define _MM_FROUND_TRUNC	\
> +  (_MM_FROUND_TO_ZERO | _MM_FROUND_RAISE_EXC)
> +#define _MM_FROUND_RINT		\
> +  (_MM_FROUND_CUR_DIRECTION | _MM_FROUND_RAISE_EXC)
> +#define _MM_FROUND_NEARBYINT	\
> +  (_MM_FROUND_CUR_DIRECTION | _MM_FROUND_NO_EXC)
> +
> +#define _MM_FROUND_RAISE_EXC            0x00
> +#define _MM_FROUND_NO_EXC               0x08
> +
> +extern __inline __m128d
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_mm_round_pd (__m128d __A, int __rounding)
> +{
> +  __v2df __r;
> +  union {
> +    double __fr;
> +    long long __fpscr;
> +  } __enables_save, __fpscr_save;
> +
> +  if (__rounding & _MM_FROUND_NO_EXC)
> +    {
> +      /* Save enabled exceptions, disable all exceptions,
> +	 and preserve the rounding mode.  */
> +#ifdef _ARCH_PWR9
> +      __asm__ __volatile__ ("mffsce %0" : "=f" (__fpscr_save.__fr));
> +      __enables_save.__fpscr = __fpscr_save.__fpscr & 0xf8;
> +#else
> +      __fpscr_save.__fr = __builtin_mffs ();
> +      __enables_save.__fpscr = __fpscr_save.__fpscr & 0xf8;
> +      __fpscr_save.__fpscr &= ~0xf8;
> +      __builtin_mtfsf (0b00000011, __fpscr_save.__fr);
> +#endif
> +    }
> +
> +  switch (__rounding)
> +    {
> +      case _MM_FROUND_TO_NEAREST_INT:
> +	__fpscr_save.__fr = __builtin_mffsl ();

As pointed out in the v1 review, __builtin_mffsl is enabled (or supposed 
to be) only for POWER9 and later.  This will fail to work on POWER8 and 
earlier when the new builtins support is complete and this is enforced 
more carefully.  Please #ifdef and use __builtin_mffs on earlier 
processors.  Please do this everywhere this occurs.

I think you got some contradictory guidance on this, but trust me, this 
will break.
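Concretely, the guard being requested would presumably look something like
this at each occurrence (hypothetical sketch; the exact shape is up to the
author):

```
       case _MM_FROUND_TO_NEAREST_INT:
-	__fpscr_save.__fr = __builtin_mffsl ();
+#ifdef _ARCH_PWR9
+	__fpscr_save.__fr = __builtin_mffsl ();
+#else
+	__fpscr_save.__fr = __builtin_mffs ();
+#endif
 	__attribute__ ((fallthrough));
```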

Otherwise it looks to me that comments were all addressed 
appropriately.  Recommend approval with that fixed.

Thanks!
Bill

> +	__attribute__ ((fallthrough));
> +      case _MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC:
> +	__builtin_set_fpscr_rn (0b00);
> +	__r = vec_rint ((__v2df) __A);
> +	__builtin_set_fpscr_rn (__fpscr_save.__fpscr);
> +	break;
> +      case _MM_FROUND_TO_NEG_INF:
> +      case _MM_FROUND_TO_NEG_INF | _MM_FROUND_NO_EXC:
> +	__r = vec_floor ((__v2df) __A);
> +	break;
> +      case _MM_FROUND_TO_POS_INF:
> +      case _MM_FROUND_TO_POS_INF | _MM_FROUND_NO_EXC:
> +	__r = vec_ceil ((__v2df) __A);
> +	break;
> +      case _MM_FROUND_TO_ZERO:
> +      case _MM_FROUND_TO_ZERO | _MM_FROUND_NO_EXC:
> +	__r = vec_trunc ((__v2df) __A);
> +	break;
> +      case _MM_FROUND_CUR_DIRECTION:
> +	__r = vec_rint ((__v2df) __A);
> +	break;
> +    }
> +  if (__rounding & _MM_FROUND_NO_EXC)
> +    {
> +      /* Restore enabled exceptions.  */
> +      __fpscr_save.__fr = __builtin_mffsl ();
> +      __fpscr_save.__fpscr |= __enables_save.__fpscr;
> +      __builtin_mtfsf (0b00000011, __fpscr_save.__fr);
> +    }
> +  return (__m128d) __r;
> +}
> +
> +extern __inline __m128d
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_mm_round_sd (__m128d __A, __m128d __B, int __rounding)
> +{
> +  __B = _mm_round_pd (__B, __rounding);
> +  __v2df __r = { ((__v2df)__B)[0], ((__v2df) __A)[1] };
> +  return (__m128d) __r;
> +}
> +
> +extern __inline __m128
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_mm_round_ps (__m128 __A, int __rounding)
> +{
> +  __v4sf __r;
> +  union {
> +    double __fr;
> +    long long __fpscr;
> +  } __enables_save, __fpscr_save;
> +
> +  if (__rounding & _MM_FROUND_NO_EXC)
> +    {
> +      /* Save enabled exceptions, disable all exceptions,
> +	 and preserve the rounding mode.  */
> +#ifdef _ARCH_PWR9
> +      __asm__ __volatile__ ("mffsce %0" : "=f" (__fpscr_save.__fr));
> +      __enables_save.__fpscr = __fpscr_save.__fpscr & 0xf8;
> +#else
> +      __fpscr_save.__fr = __builtin_mffs ();
> +      __enables_save.__fpscr = __fpscr_save.__fpscr & 0xf8;
> +      __fpscr_save.__fpscr &= ~0xf8;
> +      __builtin_mtfsf (0b00000011, __fpscr_save.__fr);
> +#endif
> +    }
> +
> +  switch (__rounding)
> +    {
> +      case _MM_FROUND_TO_NEAREST_INT:
> +	__fpscr_save.__fr = __builtin_mffsl ();
> +	__attribute__ ((fallthrough));
> +      case _MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC:
> +	__builtin_set_fpscr_rn (0b00);
> +	__r = vec_rint ((__v4sf) __A);
> +	__builtin_set_fpscr_rn (__fpscr_save.__fpscr);
> +	break;
> +      case _MM_FROUND_TO_NEG_INF:
> +      case _MM_FROUND_TO_NEG_INF | _MM_FROUND_NO_EXC:
> +	__r = vec_floor ((__v4sf) __A);
> +	break;
> +      case _MM_FROUND_TO_POS_INF:
> +      case _MM_FROUND_TO_POS_INF | _MM_FROUND_NO_EXC:
> +	__r = vec_ceil ((__v4sf) __A);
> +	break;
> +      case _MM_FROUND_TO_ZERO:
> +      case _MM_FROUND_TO_ZERO | _MM_FROUND_NO_EXC:
> +	__r = vec_trunc ((__v4sf) __A);
> +	break;
> +      case _MM_FROUND_CUR_DIRECTION:
> +	__r = vec_rint ((__v4sf) __A);
> +	break;
> +    }
> +  if (__rounding & _MM_FROUND_NO_EXC)
> +    {
> +      /* Restore enabled exceptions.  */
> +      __fpscr_save.__fr = __builtin_mffsl ();
> +      __fpscr_save.__fpscr |= __enables_save.__fpscr;
> +      __builtin_mtfsf (0b00000011, __fpscr_save.__fr);
> +    }
> +  return (__m128) __r;
> +}
> +
> +extern __inline __m128
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_mm_round_ss (__m128 __A, __m128 __B, int __rounding)
> +{
> +  __B = _mm_round_ps (__B, __rounding);
> +  __v4sf __r = (__v4sf) __A;
> +  __r[0] = ((__v4sf)__B)[0];
> +  return (__m128) __r;
> +}
> +
> +#define _mm_ceil_pd(V)	   _mm_round_pd ((V), _MM_FROUND_CEIL)
> +#define _mm_ceil_sd(D, V)  _mm_round_sd ((D), (V), _MM_FROUND_CEIL)
> +
> +#define _mm_floor_pd(V)	   _mm_round_pd((V), _MM_FROUND_FLOOR)
> +#define _mm_floor_sd(D, V) _mm_round_sd ((D), (V), _MM_FROUND_FLOOR)
> +
> +#define _mm_ceil_ps(V)	   _mm_round_ps ((V), _MM_FROUND_CEIL)
> +#define _mm_ceil_ss(D, V)  _mm_round_ss ((D), (V), _MM_FROUND_CEIL)
> +
> +#define _mm_floor_ps(V)	   _mm_round_ps ((V), _MM_FROUND_FLOOR)
> +#define _mm_floor_ss(D, V) _mm_round_ss ((D), (V), _MM_FROUND_FLOOR)
> +
>   extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__))
>   _mm_insert_epi8 (__m128i const __A, int const __D, int const __N)
>   {
> @@ -232,70 +408,6 @@ _mm_test_mix_ones_zeros (__m128i __A, __m128i __mask)
>     return any_ones * any_zeros;
>   }
>   
> -__inline __m128d
> -__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> -_mm_ceil_pd (__m128d __A)
> -{
> -  return (__m128d) vec_ceil ((__v2df) __A);
> -}
> -
> -__inline __m128d
> -__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> -_mm_ceil_sd (__m128d __A, __m128d __B)
> -{
> -  __v2df __r = vec_ceil ((__v2df) __B);
> -  __r[1] = ((__v2df) __A)[1];
> -  return (__m128d) __r;
> -}
> -
> -__inline __m128d
> -__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> -_mm_floor_pd (__m128d __A)
> -{
> -  return (__m128d) vec_floor ((__v2df) __A);
> -}
> -
> -__inline __m128d
> -__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> -_mm_floor_sd (__m128d __A, __m128d __B)
> -{
> -  __v2df __r = vec_floor ((__v2df) __B);
> -  __r[1] = ((__v2df) __A)[1];
> -  return (__m128d) __r;
> -}
> -
> -__inline __m128
> -__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> -_mm_ceil_ps (__m128 __A)
> -{
> -  return (__m128) vec_ceil ((__v4sf) __A);
> -}
> -
> -__inline __m128
> -__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> -_mm_ceil_ss (__m128 __A, __m128 __B)
> -{
> -  __v4sf __r = (__v4sf) __A;
> -  __r[0] = __builtin_ceil (((__v4sf) __B)[0]);
> -  return __r;
> -}
> -
> -__inline __m128
> -__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> -_mm_floor_ps (__m128 __A)
> -{
> -  return (__m128) vec_floor ((__v4sf) __A);
> -}
> -
> -__inline __m128
> -__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> -_mm_floor_ss (__m128 __A, __m128 __B)
> -{
> -  __v4sf __r = (__v4sf) __A;
> -  __r[0] = __builtin_floor (((__v4sf) __B)[0]);
> -  return __r;
> -}
> -
>   /* Return horizontal packed word minimum and its index in bits [15:0]
>      and bits [18:16] respectively.  */
>   __inline __m128i
> diff --git a/gcc/testsuite/gcc.target/powerpc/sse4_1-round3.h b/gcc/testsuite/gcc.target/powerpc/sse4_1-round3.h
> new file mode 100644
> index 000000000000..de6cbf7be438
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/sse4_1-round3.h
> @@ -0,0 +1,81 @@
> +#include <smmintrin.h>
> +#include <fenv.h>
> +#include "sse4_1-check.h"
> +
> +#define DIM(a) (sizeof (a) / sizeof (a)[0])
> +
> +static int roundings[] =
> +  {
> +    _MM_FROUND_TO_NEAREST_INT,
> +    _MM_FROUND_TO_NEG_INF,
> +    _MM_FROUND_TO_POS_INF,
> +    _MM_FROUND_TO_ZERO,
> +    _MM_FROUND_CUR_DIRECTION
> +  };
> +
> +static int modes[] =
> +  {
> +    FE_TONEAREST,
> +    FE_UPWARD,
> +    FE_DOWNWARD,
> +    FE_TOWARDZERO
> +  };
> +
> +static void
> +TEST (void)
> +{
> +  int i, j, ri, mi, round_save;
> +
> +  round_save = fegetround ();
> +  for (mi = 0; mi < DIM (modes); mi++) {
> +    fesetround (modes[mi]);
> +    for (i = 0; i < DIM (data); i++) {
> +      for (ri = 0; ri < DIM (roundings); ri++) {
> +	union value guess;
> +	union value *current_answers = answers[ri];
> +	switch ( roundings[ri] ) {
> +	  case _MM_FROUND_TO_NEAREST_INT:
> +	    guess.x = ROUND_INTRIN (data[i].value1.x, data[i].value2.x,
> +				    _MM_FROUND_TO_NEAREST_INT);
> +	    break;
> +	  case _MM_FROUND_TO_NEG_INF:
> +	    guess.x = ROUND_INTRIN (data[i].value1.x, data[i].value2.x,
> +				    _MM_FROUND_TO_NEG_INF);
> +	    break;
> +	  case _MM_FROUND_TO_POS_INF:
> +	    guess.x = ROUND_INTRIN (data[i].value1.x, data[i].value2.x,
> +				    _MM_FROUND_TO_POS_INF);
> +	    break;
> +	  case _MM_FROUND_TO_ZERO:
> +	    guess.x = ROUND_INTRIN (data[i].value1.x, data[i].value2.x,
> +				    _MM_FROUND_TO_ZERO);
> +	    break;
> +	  case _MM_FROUND_CUR_DIRECTION:
> +	    guess.x = ROUND_INTRIN (data[i].value1.x, data[i].value2.x,
> +				    _MM_FROUND_CUR_DIRECTION);
> +	    switch ( modes[mi] ) {
> +	      case FE_TONEAREST:
> +		current_answers = answers_NEAREST_INT;
> +		break;
> +	      case FE_UPWARD:
> +		current_answers = answers_POS_INF;
> +		break;
> +	      case FE_DOWNWARD:
> +		current_answers = answers_NEG_INF;
> +		break;
> +	      case FE_TOWARDZERO:
> +		current_answers = answers_ZERO;
> +		break;
> +	    }
> +	    break;
> +	  default:
> +	    abort ();
> +	}
> +	for (j = 0; j < DIM (guess.f); j++)
> +	  if (guess.f[j] != current_answers[i].f[j])
> +	    abort ();
> +      }
> +    }
> +  }
> +  fesetround (round_save);
> +}
> diff --git a/gcc/testsuite/gcc.target/powerpc/sse4_1-roundpd.c b/gcc/testsuite/gcc.target/powerpc/sse4_1-roundpd.c
> new file mode 100644
> index 000000000000..0528c395f233
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/sse4_1-roundpd.c
> @@ -0,0 +1,143 @@
> +/* { dg-do run } */
> +/* { dg-require-effective-target powerpc_vsx_ok } */
> +/* { dg-options "-O2 -mvsx" } */
> +
> +#define NO_WARN_X86_INTRINSICS 1
> +#include <smmintrin.h>
> +
> +#define VEC_T __m128d
> +#define FP_T double
> +
> +#define ROUND_INTRIN(x, ignored, mode) _mm_round_pd (x, mode)
> +
> +#include "sse4_1-round-data.h"
> +
> +struct data2 data[] = {
> +  { .value1 = { .f = {  0.00,  0.25 } } },
> +  { .value1 = { .f = {  0.50,  0.75 } } },
> +
> +  { .value1 = { .f = {  0x1.ffffffffffffcp+50,  0x1.ffffffffffffdp+50 } } },
> +  { .value1 = { .f = {  0x1.ffffffffffffep+50,  0x1.fffffffffffffp+50 } } },
> +  { .value1 = { .f = {  0x1.0000000000000p+51,  0x1.0000000000001p+51 } } },
> +  { .value1 = { .f = {  0x1.0000000000002p+51,  0x1.0000000000003p+51 } } },
> +
> +  { .value1 = { .f = {  0x1.ffffffffffffep+51,  0x1.fffffffffffffp+51 } } },
> +  { .value1 = { .f = {  0x1.0000000000000p+52,  0x1.0000000000001p+52 } } },
> +
> +  { .value1 = { .f = { -0x1.0000000000001p+52, -0x1.0000000000000p+52 } } },
> +  { .value1 = { .f = { -0x1.fffffffffffffp+51, -0x1.ffffffffffffep+51 } } },
> +
> +  { .value1 = { .f = { -0x1.0000000000004p+51, -0x1.0000000000002p+51 } } },
> +  { .value1 = { .f = { -0x1.0000000000001p+51, -0x1.0000000000000p+51 } } },
> +  { .value1 = { .f = { -0x1.ffffffffffffcp+50, -0x1.ffffffffffffep+50 } } },
> +  { .value1 = { .f = { -0x1.ffffffffffffdp+50, -0x1.ffffffffffffcp+50 } } },
> +
> +  { .value1 = { .f = { -1.00, -0.75 } } },
> +  { .value1 = { .f = { -0.50, -0.25 } } }
> +};
> +
> +union value answers_NEAREST_INT[] = {
> +  { .f = {  0.00,  0.00 } },
> +  { .f = {  0.00,  1.00 } },
> +
> +  { .f = {  0x1.ffffffffffffcp+50,  0x1.ffffffffffffcp+50 } },
> +  { .f = {  0x1.0000000000000p+51,  0x1.0000000000000p+51 } },
> +  { .f = {  0x1.0000000000000p+51,  0x1.0000000000000p+51 } },
> +  { .f = {  0x1.0000000000002p+51,  0x1.0000000000004p+51 } },
> +
> +  { .f = {  0x1.ffffffffffffep+51,  0x1.0000000000000p+52 } },
> +  { .f = {  0x1.0000000000000p+52,  0x1.0000000000001p+52 } },
> +
> +  { .f = { -0x1.0000000000001p+52, -0x1.0000000000000p+52 } },
> +  { .f = { -0x1.0000000000000p+52, -0x1.ffffffffffffep+51 } },
> +
> +  { .f = { -0x1.0000000000004p+51, -0x1.0000000000002p+51 } },
> +  { .f = { -0x1.0000000000000p+51, -0x1.0000000000000p+51 } },
> +  { .f = { -0x1.ffffffffffffcp+50, -0x1.0000000000000p+51 } },
> +  { .f = { -0x1.ffffffffffffcp+50, -0x1.ffffffffffffcp+50 } },
> +
> +  { .f = { -1.00, -1.00 } },
> +  { .f = {  0.00,  0.00 } }
> +};
> +
> +union value answers_NEG_INF[] = {
> +  { .f = {  0.00,  0.00 } },
> +  { .f = {  0.00,  0.00 } },
> +
> +  { .f = {  0x1.ffffffffffffcp+50,  0x1.ffffffffffffcp+50 } },
> +  { .f = {  0x1.ffffffffffffcp+50,  0x1.ffffffffffffcp+50 } },
> +  { .f = {  0x1.0000000000000p+51,  0x1.0000000000000p+51 } },
> +  { .f = {  0x1.0000000000002p+51,  0x1.0000000000002p+51 } },
> +
> +  { .f = {  0x1.ffffffffffffep+51,  0x1.ffffffffffffep+51 } },
> +  { .f = {  0x1.0000000000000p+52,  0x1.0000000000001p+52 } },
> +
> +  { .f = { -0x1.0000000000001p+52, -0x1.0000000000000p+52 } },
> +  { .f = { -0x1.0000000000000p+52, -0x1.ffffffffffffep+51 } },
> +
> +  { .f = { -0x1.0000000000004p+51, -0x1.0000000000002p+51 } },
> +  { .f = { -0x1.0000000000002p+51, -0x1.0000000000000p+51 } },
> +  { .f = { -0x1.ffffffffffffcp+50, -0x1.0000000000000p+51 } },
> +  { .f = { -0x1.0000000000000p+51, -0x1.ffffffffffffcp+50 } },
> +
> +  { .f = { -1.00, -1.00 } },
> +  { .f = { -1.00, -1.00 } }
> +};
> +
> +union value answers_POS_INF[] = {
> +  { .f = {  0.00,  1.00 } },
> +  { .f = {  1.00,  1.00 } },
> +
> +  { .f = {  0x1.ffffffffffffcp+50,  0x1.0000000000000p+51 } },
> +  { .f = {  0x1.0000000000000p+51,  0x1.0000000000000p+51 } },
> +  { .f = {  0x1.0000000000000p+51,  0x1.0000000000002p+51 } },
> +  { .f = {  0x1.0000000000002p+51,  0x1.0000000000004p+51 } },
> +
> +  { .f = {  0x1.ffffffffffffep+51,  0x1.0000000000000p+52 } },
> +  { .f = {  0x1.0000000000000p+52,  0x1.0000000000001p+52 } },
> +
> +  { .f = { -0x1.0000000000001p+52, -0x1.0000000000000p+52 } },
> +  { .f = { -0x1.ffffffffffffep+51, -0x1.ffffffffffffep+51 } },
> +
> +  { .f = { -0x1.0000000000004p+51, -0x1.0000000000002p+51 } },
> +  { .f = { -0x1.0000000000000p+51, -0x1.0000000000000p+51 } },
> +  { .f = { -0x1.ffffffffffffcp+50, -0x1.ffffffffffffcp+50 } },
> +  { .f = { -0x1.ffffffffffffcp+50, -0x1.ffffffffffffcp+50 } },
> +
> +  { .f = { -1.00,  0.00 } },
> +  { .f = {  0.00,  0.00 } }
> +};
> +
> +union value answers_ZERO[] = {
> +  { .f = {  0.00,  0.00 } },
> +  { .f = {  0.00,  0.00 } },
> +
> +  { .f = {  0x1.ffffffffffffcp+50,  0x1.ffffffffffffcp+50 } },
> +  { .f = {  0x1.ffffffffffffcp+50,  0x1.ffffffffffffcp+50 } },
> +  { .f = {  0x1.0000000000000p+51,  0x1.0000000000000p+51 } },
> +  { .f = {  0x1.0000000000002p+51,  0x1.0000000000002p+51 } },
> +
> +  { .f = {  0x1.ffffffffffffep+51,  0x1.ffffffffffffep+51 } },
> +  { .f = {  0x1.0000000000000p+52,  0x1.0000000000001p+52 } },
> +
> +  { .f = { -0x1.0000000000001p+52, -0x1.0000000000000p+52 } },
> +  { .f = { -0x1.ffffffffffffep+51, -0x1.ffffffffffffep+51 } },
> +
> +  { .f = { -0x1.0000000000004p+51, -0x1.0000000000002p+51 } },
> +  { .f = { -0x1.0000000000000p+51, -0x1.0000000000000p+51 } },
> +  { .f = { -0x1.ffffffffffffcp+50, -0x1.ffffffffffffcp+50 } },
> +  { .f = { -0x1.ffffffffffffcp+50, -0x1.ffffffffffffcp+50 } },
> +
> +  { .f = { -1.00,  0.00 } },
> +  { .f = {  0.00,  0.00 } }
> +};
> +
> +union value *answers[] = {
> +  answers_NEAREST_INT,
> +  answers_NEG_INF,
> +  answers_POS_INF,
> +  answers_ZERO,
> +  0 /* CUR_DIRECTION answers depend on current rounding mode.  */
> +};
> +
> +#include "sse4_1-round3.h"
> diff --git a/gcc/testsuite/gcc.target/powerpc/sse4_1-roundps.c b/gcc/testsuite/gcc.target/powerpc/sse4_1-roundps.c
> new file mode 100644
> index 000000000000..6b5362e07590
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/sse4_1-roundps.c
> @@ -0,0 +1,98 @@
> +/* { dg-do run } */
> +/* { dg-require-effective-target powerpc_vsx_ok } */
> +/* { dg-options "-O2 -mvsx" } */
> +
> +#define NO_WARN_X86_INTRINSICS 1
> +#include <smmintrin.h>
> +
> +#define VEC_T __m128
> +#define FP_T float
> +
> +#define ROUND_INTRIN(x, ignored, mode) _mm_round_ps (x, mode)
> +
> +#include "sse4_1-round-data.h"
> +
> +struct data2 data[] = {
> +  { .value1 = { .f = {  0.00,  0.25,  0.50,  0.75 } } },
> +
> +  { .value1 = { .f = {  0x1.fffff8p+21,  0x1.fffffap+21,
> +			0x1.fffffcp+21,  0x1.fffffep+21 } } },
> +  { .value1 = { .f = {  0x1.fffffap+22,  0x1.fffffcp+22,
> +			0x1.fffffep+22,  0x1.fffffep+23 } } },
> +  { .value1 = { .f = { -0x1.fffffep+23, -0x1.fffffep+22,
> +		       -0x1.fffffcp+22, -0x1.fffffap+22 } } },
> +  { .value1 = { .f = { -0x1.fffffep+21, -0x1.fffffcp+21,
> +		       -0x1.fffffap+21, -0x1.fffff8p+21 } } },
> +
> +  { .value1 = { .f = { -1.00, -0.75, -0.50, -0.25 } } }
> +};
> +
> +union value answers_NEAREST_INT[] = {
> +  { .f = {  0.00,  0.00,  0.00,  1.00 } },
> +
> +  { .f = {  0x1.fffff8p+21,  0x1.fffff8p+21,
> +            0x1.000000p+22,  0x1.000000p+22 } },
> +  { .f = {  0x1.fffff8p+22,  0x1.fffffcp+22,
> +            0x1.000000p+23,  0x1.fffffep+23 } },
> +  { .f = { -0x1.fffffep+23, -0x1.000000p+23,
> +           -0x1.fffffcp+22, -0x1.fffff8p+22 } },
> +  { .f = { -0x1.000000p+22, -0x1.000000p+22,
> +           -0x1.fffff8p+21, -0x1.fffff8p+21 } },
> +
> +  { .f = { -1.00, -1.00,  0.00,  0.00 } }
> +};
> +
> +union value answers_NEG_INF[] = {
> +  { .f = {  0.00,  0.00,  0.00,  0.00 } },
> +
> +  { .f = {  0x1.fffff8p+21,  0x1.fffff8p+21,
> +            0x1.fffff8p+21,  0x1.fffff8p+21 } },
> +  { .f = {  0x1.fffff8p+22,  0x1.fffffcp+22,
> +            0x1.fffffcp+22,  0x1.fffffep+23 } },
> +  { .f = { -0x1.fffffep+23, -0x1.000000p+23,
> +           -0x1.fffffcp+22, -0x1.fffffcp+22 } },
> +  { .f = { -0x1.000000p+22, -0x1.000000p+22,
> +           -0x1.000000p+22, -0x1.fffff8p+21 } },
> +
> +  { .f = { -1.00, -1.00, -1.00, -1.00 } }
> +};
> +
> +union value answers_POS_INF[] = {
> +  { .f = {  0.00,  1.00,  1.00,  1.00 } },
> +
> +  { .f = {  0x1.fffff8p+21,  0x1.000000p+22,
> +            0x1.000000p+22,  0x1.000000p+22 } },
> +  { .f = {  0x1.fffffcp+22,  0x1.fffffcp+22,
> +            0x1.000000p+23,  0x1.fffffep+23 } },
> +  { .f = { -0x1.fffffep+23, -0x1.fffffcp+22,
> +           -0x1.fffffcp+22, -0x1.fffff8p+22 } },
> +  { .f = { -0x1.fffff8p+21, -0x1.fffff8p+21,
> +           -0x1.fffff8p+21, -0x1.fffff8p+21 } },
> +
> +  { .f = { -1.00,  0.00,  0.00,  0.00 } }
> +};
> +
> +union value answers_ZERO[] = {
> +  { .f = {  0.00,  0.00,  0.00,  0.00 } },
> +
> +  { .f = {  0x1.fffff8p+21,  0x1.fffff8p+21,
> +            0x1.fffff8p+21,  0x1.fffff8p+21 } },
> +  { .f = {  0x1.fffff8p+22,  0x1.fffffcp+22,
> +            0x1.fffffcp+22,  0x1.fffffep+23 } },
> +  { .f = { -0x1.fffffep+23, -0x1.fffffcp+22,
> +           -0x1.fffffcp+22, -0x1.fffff8p+22 } },
> +  { .f = { -0x1.fffff8p+21, -0x1.fffff8p+21,
> +           -0x1.fffff8p+21, -0x1.fffff8p+21 } },
> +
> +  { .f = { -1.00,  0.00,  0.00,  0.00 } }
> +};
> +
> +union value *answers[] = {
> +  answers_NEAREST_INT,
> +  answers_NEG_INF,
> +  answers_POS_INF,
> +  answers_ZERO,
> +  0 /* CUR_DIRECTION answers depend on current rounding mode.  */
> +};
> +
> +#include "sse4_1-round3.h"
> diff --git a/gcc/testsuite/gcc.target/powerpc/sse4_1-roundsd.c b/gcc/testsuite/gcc.target/powerpc/sse4_1-roundsd.c
> new file mode 100644
> index 000000000000..2b0bad6469df
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/sse4_1-roundsd.c
> @@ -0,0 +1,256 @@
> +/* { dg-do run } */
> +/* { dg-require-effective-target powerpc_vsx_ok } */
> +/* { dg-options "-O2 -mvsx" } */
> +
> +#include <stdio.h>
> +#define NO_WARN_X86_INTRINSICS 1
> +#include <smmintrin.h>
> +
> +#define VEC_T __m128d
> +#define FP_T double
> +
> +#define ROUND_INTRIN(x, y, mode) _mm_round_sd (x, y, mode)
> +
> +#include "sse4_1-round-data.h"
> +
> +static struct data2 data[] = {
> +  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
> +    .value2 = { .f = {  0.00, IGNORED } } },
> +  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
> +    .value2 = { .f = {  0.25, IGNORED } } },
> +  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
> +    .value2 = { .f = {  0.50, IGNORED } } },
> +  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
> +    .value2 = { .f = {  0.75, IGNORED } } },
> +
> +  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
> +    .value2 = { .f = {  0x1.ffffffffffffcp+50, IGNORED } } },
> +  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
> +    .value2 = { .f = {  0x1.ffffffffffffdp+50, IGNORED } } },
> +  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
> +    .value2 = { .f = {  0x1.ffffffffffffep+50, IGNORED } } },
> +  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
> +    .value2 = { .f = {  0x1.fffffffffffffp+50, IGNORED } } },
> +  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
> +    .value2 = { .f = {  0x1.0000000000000p+51, IGNORED } } },
> +  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
> +    .value2 = { .f = {  0x1.0000000000001p+51, IGNORED } } },
> +  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
> +    .value2 = { .f = {  0x1.0000000000002p+51, IGNORED } } },
> +  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
> +    .value2 = { .f = {  0x1.0000000000003p+51, IGNORED } } },
> +
> +  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
> +    .value2 = { .f = {  0x1.ffffffffffffep+51, IGNORED } } },
> +  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
> +    .value2 = { .f = {  0x1.fffffffffffffp+51, IGNORED } } },
> +  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
> +    .value2 = { .f = {  0x1.0000000000000p+52, IGNORED } } },
> +  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
> +    .value2 = { .f = {  0x1.0000000000001p+52, IGNORED } } },
> +
> +  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
> +    .value2 = { .f = { -0x1.0000000000001p+52, IGNORED } } },
> +  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
> +    .value2 = { .f = { -0x1.0000000000000p+52, IGNORED } } },
> +  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
> +    .value2 = { .f = { -0x1.fffffffffffffp+51, IGNORED } } },
> +  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
> +    .value2 = { .f = { -0x1.ffffffffffffep+51, IGNORED } } },
> +
> +  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
> +    .value2 = { .f = { -0x1.0000000000004p+51, IGNORED } } },
> +  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
> +    .value2 = { .f = { -0x1.0000000000002p+51, IGNORED } } },
> +  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
> +    .value2 = { .f = { -0x1.0000000000001p+51, IGNORED } } },
> +  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
> +    .value2 = { .f = { -0x1.0000000000000p+51, IGNORED } } },
> +  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
> +    .value2 = { .f = { -0x1.ffffffffffffcp+50, IGNORED } } },
> +  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
> +    .value2 = { .f = { -0x1.ffffffffffffep+50, IGNORED } } },
> +  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
> +    .value2 = { .f = { -0x1.ffffffffffffdp+50, IGNORED } } },
> +  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
> +    .value2 = { .f = { -0x1.ffffffffffffcp+50, IGNORED } } },
> +
> +  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
> +    .value2 = { .f = { -1.00, IGNORED } } },
> +  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
> +    .value2 = { .f = { -0.75, IGNORED } } },
> +  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
> +    .value2 = { .f = { -0.50, IGNORED } } },
> +  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
> +    .value2 = { .f = { -0.25, IGNORED } } }
> +};
> +
> +static union value answers_NEAREST_INT[] = {
> +  { .f = {  0.00, PASSTHROUGH } },
> +  { .f = {  0.00, PASSTHROUGH } },
> +  { .f = {  0.00, PASSTHROUGH } },
> +  { .f = {  1.00, PASSTHROUGH } },
> +
> +  { .f = {  0x1.ffffffffffffcp+50, PASSTHROUGH } },
> +  { .f = {  0x1.ffffffffffffcp+50, PASSTHROUGH } },
> +  { .f = {  0x1.0000000000000p+51, PASSTHROUGH } },
> +  { .f = {  0x1.0000000000000p+51, PASSTHROUGH } },
> +  { .f = {  0x1.0000000000000p+51, PASSTHROUGH } },
> +  { .f = {  0x1.0000000000000p+51, PASSTHROUGH } },
> +  { .f = {  0x1.0000000000002p+51, PASSTHROUGH } },
> +  { .f = {  0x1.0000000000004p+51, PASSTHROUGH } },
> +
> +  { .f = {  0x1.ffffffffffffep+51, PASSTHROUGH } },
> +  { .f = {  0x1.0000000000000p+52, PASSTHROUGH } },
> +  { .f = {  0x1.0000000000000p+52, PASSTHROUGH } },
> +  { .f = {  0x1.0000000000001p+52, PASSTHROUGH } },
> +
> +  { .f = { -0x1.0000000000001p+52, PASSTHROUGH } },
> +  { .f = { -0x1.0000000000000p+52, PASSTHROUGH } },
> +  { .f = { -0x1.0000000000000p+52, PASSTHROUGH } },
> +  { .f = { -0x1.ffffffffffffep+51, PASSTHROUGH } },
> +
> +  { .f = { -0x1.0000000000004p+51, PASSTHROUGH } },
> +  { .f = { -0x1.0000000000002p+51, PASSTHROUGH } },
> +  { .f = { -0x1.0000000000000p+51, PASSTHROUGH } },
> +  { .f = { -0x1.0000000000000p+51, PASSTHROUGH } },
> +  { .f = { -0x1.ffffffffffffcp+50, PASSTHROUGH } },
> +  { .f = { -0x1.0000000000000p+51, PASSTHROUGH } },
> +  { .f = { -0x1.ffffffffffffcp+50, PASSTHROUGH } },
> +  { .f = { -0x1.ffffffffffffcp+50, PASSTHROUGH } },
> +
> +  { .f = { -1.00, PASSTHROUGH } },
> +  { .f = { -1.00, PASSTHROUGH } },
> +  { .f = { -0.00, PASSTHROUGH } },
> +  { .f = { -0.00, PASSTHROUGH } }
> +};
> +
> +static union value answers_NEG_INF[] = {
> +  { .f = {  0.00, PASSTHROUGH } },
> +  { .f = {  0.00, PASSTHROUGH } },
> +  { .f = {  0.00, PASSTHROUGH } },
> +  { .f = {  0.00, PASSTHROUGH } },
> +
> +  { .f = {  0x1.ffffffffffffcp+50, PASSTHROUGH } },
> +  { .f = {  0x1.ffffffffffffcp+50, PASSTHROUGH } },
> +  { .f = {  0x1.ffffffffffffcp+50, PASSTHROUGH } },
> +  { .f = {  0x1.ffffffffffffcp+50, PASSTHROUGH } },
> +  { .f = {  0x1.0000000000000p+51, PASSTHROUGH } },
> +  { .f = {  0x1.0000000000000p+51, PASSTHROUGH } },
> +  { .f = {  0x1.0000000000002p+51, PASSTHROUGH } },
> +  { .f = {  0x1.0000000000002p+51, PASSTHROUGH } },
> +
> +  { .f = {  0x1.ffffffffffffep+51, PASSTHROUGH } },
> +  { .f = {  0x1.ffffffffffffep+51, PASSTHROUGH } },
> +  { .f = {  0x1.0000000000000p+52, PASSTHROUGH } },
> +  { .f = {  0x1.0000000000001p+52, PASSTHROUGH } },
> +
> +  { .f = { -0x1.0000000000001p+52, PASSTHROUGH } },
> +  { .f = { -0x1.0000000000000p+52, PASSTHROUGH } },
> +  { .f = { -0x1.0000000000000p+52, PASSTHROUGH } },
> +  { .f = { -0x1.ffffffffffffep+51, PASSTHROUGH } },
> +
> +  { .f = { -0x1.0000000000004p+51, PASSTHROUGH } },
> +  { .f = { -0x1.0000000000002p+51, PASSTHROUGH } },
> +  { .f = { -0x1.0000000000002p+51, PASSTHROUGH } },
> +  { .f = { -0x1.0000000000000p+51, PASSTHROUGH } },
> +  { .f = { -0x1.ffffffffffffcp+50, PASSTHROUGH } },
> +  { .f = { -0x1.0000000000000p+51, PASSTHROUGH } },
> +  { .f = { -0x1.0000000000000p+51, PASSTHROUGH } },
> +  { .f = { -0x1.ffffffffffffcp+50, PASSTHROUGH } },
> +
> +  { .f = { -1.00, PASSTHROUGH } },
> +  { .f = { -1.00, PASSTHROUGH } },
> +  { .f = { -1.00, PASSTHROUGH } },
> +  { .f = { -1.00, PASSTHROUGH } }
> +};
> +
> +static union value answers_POS_INF[] = {
> +  { .f = {  0.00, PASSTHROUGH } },
> +  { .f = {  1.00, PASSTHROUGH } },
> +  { .f = {  1.00, PASSTHROUGH } },
> +  { .f = {  1.00, PASSTHROUGH } },
> +
> +  { .f = {  0x1.ffffffffffffcp+50, PASSTHROUGH } },
> +  { .f = {  0x1.0000000000000p+51, PASSTHROUGH } },
> +  { .f = {  0x1.0000000000000p+51, PASSTHROUGH } },
> +  { .f = {  0x1.0000000000000p+51, PASSTHROUGH } },
> +  { .f = {  0x1.0000000000000p+51, PASSTHROUGH } },
> +  { .f = {  0x1.0000000000002p+51, PASSTHROUGH } },
> +  { .f = {  0x1.0000000000002p+51, PASSTHROUGH } },
> +  { .f = {  0x1.0000000000004p+51, PASSTHROUGH } },
> +
> +  { .f = {  0x1.ffffffffffffep+51, PASSTHROUGH } },
> +  { .f = {  0x1.0000000000000p+52, PASSTHROUGH } },
> +  { .f = {  0x1.0000000000000p+52, PASSTHROUGH } },
> +  { .f = {  0x1.0000000000001p+52, PASSTHROUGH } },
> +
> +  { .f = { -0x1.0000000000001p+52, PASSTHROUGH } },
> +  { .f = { -0x1.0000000000000p+52, PASSTHROUGH } },
> +  { .f = { -0x1.ffffffffffffep+51, PASSTHROUGH } },
> +  { .f = { -0x1.ffffffffffffep+51, PASSTHROUGH } },
> +
> +  { .f = { -0x1.0000000000004p+51, PASSTHROUGH } },
> +  { .f = { -0x1.0000000000002p+51, PASSTHROUGH } },
> +  { .f = { -0x1.0000000000000p+51, PASSTHROUGH } },
> +  { .f = { -0x1.0000000000000p+51, PASSTHROUGH } },
> +  { .f = { -0x1.ffffffffffffcp+50, PASSTHROUGH } },
> +  { .f = { -0x1.ffffffffffffcp+50, PASSTHROUGH } },
> +  { .f = { -0x1.ffffffffffffcp+50, PASSTHROUGH } },
> +  { .f = { -0x1.ffffffffffffcp+50, PASSTHROUGH } },
> +
> +  { .f = { -1.00, PASSTHROUGH } },
> +  { .f = {  0.00, PASSTHROUGH } },
> +  { .f = {  0.00, PASSTHROUGH } },
> +  { .f = {  0.00, PASSTHROUGH } }
> +};
> +
> +static union value answers_ZERO[] = {
> +  { .f = {  0.00, PASSTHROUGH } },
> +  { .f = {  0.00, PASSTHROUGH } },
> +  { .f = {  0.00, PASSTHROUGH } },
> +  { .f = {  0.00, PASSTHROUGH } },
> +
> +  { .f = {  0x1.ffffffffffffcp+50, PASSTHROUGH } },
> +  { .f = {  0x1.ffffffffffffcp+50, PASSTHROUGH } },
> +  { .f = {  0x1.ffffffffffffcp+50, PASSTHROUGH } },
> +  { .f = {  0x1.ffffffffffffcp+50, PASSTHROUGH } },
> +  { .f = {  0x1.0000000000000p+51, PASSTHROUGH } },
> +  { .f = {  0x1.0000000000000p+51, PASSTHROUGH } },
> +  { .f = {  0x1.0000000000002p+51, PASSTHROUGH } },
> +  { .f = {  0x1.0000000000002p+51, PASSTHROUGH } },
> +
> +  { .f = {  0x1.ffffffffffffep+51, PASSTHROUGH } },
> +  { .f = {  0x1.ffffffffffffep+51, PASSTHROUGH } },
> +  { .f = {  0x1.0000000000000p+52, PASSTHROUGH } },
> +  { .f = {  0x1.0000000000001p+52, PASSTHROUGH } },
> +
> +  { .f = { -0x1.0000000000001p+52, PASSTHROUGH } },
> +  { .f = { -0x1.0000000000000p+52, PASSTHROUGH } },
> +  { .f = { -0x1.ffffffffffffep+51, PASSTHROUGH } },
> +  { .f = { -0x1.ffffffffffffep+51, PASSTHROUGH } },
> +
> +  { .f = { -0x1.0000000000004p+51, PASSTHROUGH } },
> +  { .f = { -0x1.0000000000002p+51, PASSTHROUGH } },
> +  { .f = { -0x1.0000000000000p+51, PASSTHROUGH } },
> +  { .f = { -0x1.0000000000000p+51, PASSTHROUGH } },
> +  { .f = { -0x1.ffffffffffffcp+50, PASSTHROUGH } },
> +  { .f = { -0x1.ffffffffffffcp+50, PASSTHROUGH } },
> +  { .f = { -0x1.ffffffffffffcp+50, PASSTHROUGH } },
> +  { .f = { -0x1.ffffffffffffcp+50, PASSTHROUGH } },
> +
> +  { .f = { -1.00, PASSTHROUGH } },
> +  { .f = {  0.00, PASSTHROUGH } },
> +  { .f = {  0.00, PASSTHROUGH } },
> +  { .f = {  0.00, PASSTHROUGH } }
> +};
> +
> +union value *answers[] = {
> +  answers_NEAREST_INT,
> +  answers_NEG_INF,
> +  answers_POS_INF,
> +  answers_ZERO,
> +  0 /* CUR_DIRECTION answers depend on current rounding mode.  */
> +};
> +
> +#include "sse4_1-round3.h"
> diff --git a/gcc/testsuite/gcc.target/powerpc/sse4_1-roundss.c b/gcc/testsuite/gcc.target/powerpc/sse4_1-roundss.c
> new file mode 100644
> index 000000000000..3154310314a1
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/sse4_1-roundss.c
> @@ -0,0 +1,208 @@
> +/* { dg-do run } */
> +/* { dg-require-effective-target powerpc_vsx_ok } */
> +/* { dg-options "-O2 -mvsx" } */
> +
> +#include <stdio.h>
> +#define NO_WARN_X86_INTRINSICS 1
> +#include <smmintrin.h>
> +
> +#define VEC_T __m128
> +#define FP_T float
> +
> +#define ROUND_INTRIN(x, y, mode) _mm_round_ss (x, y, mode)
> +
> +#include "sse4_1-round-data.h"
> +
> +static struct data2 data[] = {
> +  { .value1 = { .f = { IGNORED, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +    .value2 = { .f = {  0.00, IGNORED, IGNORED, IGNORED } } },
> +  { .value1 = { .f = { IGNORED, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +    .value2 = { .f = {  0.25, IGNORED, IGNORED, IGNORED } } },
> +  { .value1 = { .f = { IGNORED, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +    .value2 = { .f = {  0.50, IGNORED, IGNORED, IGNORED } } },
> +  { .value1 = { .f = { IGNORED, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +    .value2 = { .f = {  0.75, IGNORED, IGNORED, IGNORED } } },
> +
> +  { .value1 = { .f = { IGNORED, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +    .value2 = { .f = {  0x1.fffff8p+21, IGNORED, IGNORED, IGNORED } } },
> +  { .value1 = { .f = { IGNORED, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +    .value2 = { .f = {  0x1.fffffap+21, IGNORED, IGNORED, IGNORED } } },
> +  { .value1 = { .f = { IGNORED, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +    .value2 = { .f = {  0x1.fffffcp+21, IGNORED, IGNORED, IGNORED } } },
> +  { .value1 = { .f = { IGNORED, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +    .value2 = { .f = {  0x1.fffffep+21, IGNORED, IGNORED, IGNORED } } },
> +
> +  { .value1 = { .f = { IGNORED, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +    .value2 = { .f = {  0x1.fffffap+22, IGNORED, IGNORED, IGNORED } } },
> +  { .value1 = { .f = { IGNORED, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +    .value2 = { .f = {  0x1.fffffcp+22, IGNORED, IGNORED, IGNORED } } },
> +  { .value1 = { .f = { IGNORED, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +    .value2 = { .f = {  0x1.fffffep+22, IGNORED, IGNORED, IGNORED } } },
> +  { .value1 = { .f = { IGNORED, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +    .value2 = { .f = {  0x1.fffffep+23, IGNORED, IGNORED, IGNORED } } },
> +
> +  { .value1 = { .f = { IGNORED, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +    .value2 = { .f = { -0x1.fffffep+23, IGNORED, IGNORED, IGNORED } } },
> +  { .value1 = { .f = { IGNORED, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +    .value2 = { .f = { -0x1.fffffep+22, IGNORED, IGNORED, IGNORED } } },
> +  { .value1 = { .f = { IGNORED, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +    .value2 = { .f = { -0x1.fffffcp+22, IGNORED, IGNORED, IGNORED } } },
> +  { .value1 = { .f = { IGNORED, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +    .value2 = { .f = { -0x1.fffffap+22, IGNORED, IGNORED, IGNORED } } },
> +
> +  { .value1 = { .f = { IGNORED, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +    .value2 = { .f = { -0x1.fffffep+21, IGNORED, IGNORED, IGNORED } } },
> +  { .value1 = { .f = { IGNORED, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +    .value2 = { .f = { -0x1.fffffcp+21, IGNORED, IGNORED, IGNORED } } },
> +  { .value1 = { .f = { IGNORED, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +    .value2 = { .f = { -0x1.fffffap+21, IGNORED, IGNORED, IGNORED } } },
> +  { .value1 = { .f = { IGNORED, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +    .value2 = { .f = { -0x1.fffff8p+21, IGNORED, IGNORED, IGNORED } } },
> +
> +  { .value1 = { .f = { IGNORED, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +    .value2 = { .f = { -1.00, IGNORED, IGNORED, IGNORED } } },
> +  { .value1 = { .f = { IGNORED, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +    .value2 = { .f = { -0.75, IGNORED, IGNORED, IGNORED } } },
> +  { .value1 = { .f = { IGNORED, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +    .value2 = { .f = { -0.50, IGNORED, IGNORED, IGNORED } } },
> +  { .value1 = { .f = { IGNORED, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +    .value2 = { .f = { -0.25, IGNORED, IGNORED, IGNORED } } }
> +};
> +
> +static union value answers_NEAREST_INT[] = {
> +  { .f = {  0.00, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +  { .f = {  0.00, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +  { .f = {  0.00, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +  { .f = {  1.00, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +
> +  { .f = {  0x1.fffff8p+21, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +  { .f = {  0x1.fffff8p+21, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +  { .f = {  0x1.000000p+22, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +  { .f = {  0x1.000000p+22, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +
> +  { .f = {  0x1.fffff8p+22, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +  { .f = {  0x1.fffffcp+22, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +  { .f = {  0x1.000000p+23, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +  { .f = {  0x1.fffffep+23, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +
> +  { .f = { -0x1.fffffep+23, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +  { .f = { -0x1.000000p+23, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +  { .f = { -0x1.fffffcp+22, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +  { .f = { -0x1.fffff8p+22, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +
> +  { .f = { -0x1.000000p+22, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +  { .f = { -0x1.000000p+22, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +  { .f = { -0x1.fffff8p+21, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +  { .f = { -0x1.fffff8p+21, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +
> +  { .f = { -1.00, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +  { .f = { -1.00, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +  { .f = { -0.00, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +  { .f = { -0.00, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } }
> +};
> +
> +static union value answers_NEG_INF[] = {
> +  { .f = {  0.00, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +  { .f = {  0.00, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +  { .f = {  0.00, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +  { .f = {  0.00, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +
> +  { .f = {  0x1.fffff8p+21, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +  { .f = {  0x1.fffff8p+21, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +  { .f = {  0x1.fffff8p+21, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +  { .f = {  0x1.fffff8p+21, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +
> +  { .f = {  0x1.fffff8p+22, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +  { .f = {  0x1.fffffcp+22, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +  { .f = {  0x1.fffffcp+22, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +  { .f = {  0x1.fffffep+23, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +
> +  { .f = { -0x1.fffffep+23, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +  { .f = { -0x1.000000p+23, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +  { .f = { -0x1.fffffcp+22, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +  { .f = { -0x1.fffffcp+22, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +
> +  { .f = { -0x1.000000p+22, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +  { .f = { -0x1.000000p+22, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +  { .f = { -0x1.000000p+22, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +  { .f = { -0x1.fffff8p+21, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +
> +  { .f = { -1.00, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +  { .f = { -1.00, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +  { .f = { -1.00, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +  { .f = { -1.00, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } }
> +};
> +
> +static union value answers_POS_INF[] = {
> +  { .f = {  0.00, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +  { .f = {  1.00, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +  { .f = {  1.00, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +  { .f = {  1.00, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +
> +  { .f = {  0x1.fffff8p+21, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +  { .f = {  0x1.000000p+22, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +  { .f = {  0x1.000000p+22, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +  { .f = {  0x1.000000p+22, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +
> +  { .f = {  0x1.fffffcp+22, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +  { .f = {  0x1.fffffcp+22, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +  { .f = {  0x1.000000p+23, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +  { .f = {  0x1.fffffep+23, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +
> +  { .f = { -0x1.fffffep+23, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +  { .f = { -0x1.fffffcp+22, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +  { .f = { -0x1.fffffcp+22, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +  { .f = { -0x1.fffff8p+22, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +
> +  { .f = { -0x1.fffff8p+21, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +  { .f = { -0x1.fffff8p+21, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +  { .f = { -0x1.fffff8p+21, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +  { .f = { -0x1.fffff8p+21, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +
> +  { .f = { -1.00, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +  { .f = {  0.00, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +  { .f = {  0.00, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +  { .f = {  0.00, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } }
> +};
> +
> +static union value answers_ZERO[] = {
> +  { .f = {  0.00, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +  { .f = {  0.00, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +  { .f = {  0.00, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +  { .f = {  0.00, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +
> +  { .f = {  0x1.fffff8p+21, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +  { .f = {  0x1.fffff8p+21, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +  { .f = {  0x1.fffff8p+21, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +  { .f = {  0x1.fffff8p+21, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +
> +  { .f = {  0x1.fffff8p+22, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +  { .f = {  0x1.fffffcp+22, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +  { .f = {  0x1.fffffcp+22, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +  { .f = {  0x1.fffffep+23, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +
> +  { .f = { -0x1.fffffep+23, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +  { .f = { -0x1.fffffcp+22, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +  { .f = { -0x1.fffffcp+22, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +  { .f = { -0x1.fffff8p+22, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +
> +  { .f = { -0x1.fffff8p+21, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +  { .f = { -0x1.fffff8p+21, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +  { .f = { -0x1.fffff8p+21, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +  { .f = { -0x1.fffff8p+21, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +
> +  { .f = { -1.00, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +  { .f = {  0.00, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +  { .f = {  0.00, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +  { .f = {  0.00, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } }
> +};
> +
> +union value *answers[] = {
> +  answers_NEAREST_INT,
> +  answers_NEG_INF,
> +  answers_POS_INF,
> +  answers_ZERO,
> +  0 /* CUR_DIRECTION answers depend on current rounding mode.  */
> +};
> +
> +#include "sse4_1-round3.h"
Li, Pan2 via Gcc-patches Aug. 27, 2021, 1:47 p.m. UTC | #2
On 8/27/21 8:44 AM, Bill Schmidt wrote:
>
> Again, please specify where the patch was tested and whether this is for
> trunk, backports, etc.  Thanks!  (I know you aren't asking for
> backports, but in general please get in the habit of this.)
>

Sorry, I see that you did this in the cover letter.  Never mind, sorry 
for the noise.

Bill
Paul A. Clarke Aug. 30, 2021, 9:16 p.m. UTC | #3
On Fri, Aug 27, 2021 at 08:44:43AM -0500, Bill Schmidt via Gcc-patches wrote:
> On 8/23/21 2:03 PM, Paul A. Clarke wrote:
> > +	__fpscr_save.__fr = __builtin_mffsl ();
> 
> As pointed out in the v1 review, __builtin_mffsl is enabled (or supposed to
> be) only for POWER9 and later.  This will fail to work on POWER8 and earlier
> when the new builtins support is complete and this is enforced more
> carefully.  Please #ifdef and use __builtin_mffs on earlier processors. 
> Please do this everywhere this occurs.
> 
> I think you got some contradictory guidance on this, but trust me, this will
> break.

The confusing thing is that __builtin_mffsl is explicitly supported on earlier
processors, if I read the code right (from gcc/config/rs6000/rs6000.md):
--
(define_expand "rs6000_mffsl"
  [(set (match_operand:DF 0 "gpc_reg_operand")
        (unspec_volatile:DF [(const_int 0)] UNSPECV_MFFSL))]
  "TARGET_HARD_FLOAT"
{
  /* If the low latency mffsl instruction (ISA 3.0) is available use it,
     otherwise fall back to the older mffs instruction to emulate the mffsl
     instruction.  */
  
  if (!TARGET_P9_MISC)
    {
      rtx tmp1 = gen_reg_rtx (DFmode);

      /* The mffs instruction reads the entire FPSCR.  Emulate the mffsl 
         instruction using the mffs instruction and masking the result.  */
      emit_insn (gen_rs6000_mffs (tmp1));
...
--

Is that going away?  If so, that would be a possible (undesirable?)
API change, no?

PC
Li, Pan2 via Gcc-patches Aug. 30, 2021, 9:24 p.m. UTC | #4
Hi Paul,

On 8/30/21 4:16 PM, Paul A. Clarke wrote:
> On Fri, Aug 27, 2021 at 08:44:43AM -0500, Bill Schmidt via Gcc-patches wrote:
>> On 8/23/21 2:03 PM, Paul A. Clarke wrote:
>>> +	__fpscr_save.__fr = __builtin_mffsl ();
>> As pointed out in the v1 review, __builtin_mffsl is enabled (or supposed to
>> be) only for POWER9 and later.  This will fail to work on POWER8 and earlier
>> when the new builtins support is complete and this is enforced more
>> carefully.  Please #ifdef and use __builtin_mffs on earlier processors.
>> Please do this everywhere this occurs.
>>
>> I think you got some contradictory guidance on this, but trust me, this will
>> break.
> The confusing thing is that __builtin_mffsl is explicitly supported on earlier
> processors, if I read the code right (from gcc/config/rs6000/rs6000.md):
> --
> (define_expand "rs6000_mffsl"
>    [(set (match_operand:DF 0 "gpc_reg_operand")
>          (unspec_volatile:DF [(const_int 0)] UNSPECV_MFFSL))]
>    "TARGET_HARD_FLOAT"
> {
>    /* If the low latency mffsl instruction (ISA 3.0) is available use it,
>       otherwise fall back to the older mffs instruction to emulate the mffsl
>       instruction.  */
>    
>    if (!TARGET_P9_MISC)
>      {
>        rtx tmp1 = gen_reg_rtx (DFmode);
>
>        /* The mffs instruction reads the entire FPSCR.  Emulate the mffsl
>           instruction using the mffs instruction and masking the result.  */
>        emit_insn (gen_rs6000_mffs (tmp1));
> ...
> --
>
> Is that going away?  If so, that would be a possible (undesirable?)
> API change, no?

Hm, I see.  I missed that in the builtins conversion.  Apparently 
there's nothing in the test suite that verifies this works on P9, which 
is a hole that could use fixing.  This usage isn't documented anywhere 
near the builtin machinery, either.

I'll patch the new builtins code to move this to a more permissive 
stanza and document why.  You can leave your code as is.

Thanks!
Bill

>
> PC
Segher Boessenkool Oct. 7, 2021, 11:08 p.m. UTC | #5
On Mon, Aug 30, 2021 at 04:16:43PM -0500, Paul A. Clarke wrote:
> The confusing thing is that __builtin_mffsl is explicitly supported on earlier
> processors, if I read the code right (from gcc/config/rs6000/rs6000.md):

Yes.  It is very simple to implement everywhere, not significantly
slower than mffs.  So allowing this builtin to be used everywhere makes
it easier to use, with no real downsides.


Segher
Segher Boessenkool Oct. 7, 2021, 11:39 p.m. UTC | #6
On Mon, Aug 23, 2021 at 02:03:05PM -0500, Paul A. Clarke wrote:
> No attempt is made to optimize writing the FPSCR (by checking if the new
> value would be the same), other than using lighter weight instructions
> when possible.

__builtin_set_fpscr_rn makes optimised code (using mtfsb[01])
automatically, fwiw.

> Move implementations of _mm_ceil* and _mm_floor* into _mm_round*, and
> convert _mm_ceil* and _mm_floor* into macros. This matches the current
> analogous implementations in config/i386/smmintrin.h.

Hrm.  Using function-like macros is begging for trouble, as usual.  But
the x86 version does this, so meh.

> +extern __inline __m128d
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_mm_round_pd (__m128d __A, int __rounding)
> +{
> +  __v2df __r;
> +  union {
> +    double __fr;
> +    long long __fpscr;
> +  } __enables_save, __fpscr_save;
> +
> +  if (__rounding & _MM_FROUND_NO_EXC)
> +    {
> +      /* Save enabled exceptions, disable all exceptions,
> +	 and preserve the rounding mode.  */
> +#ifdef _ARCH_PWR9
> +      __asm__ __volatile__ ("mffsce %0" : "=f" (__fpscr_save.__fr));

The __volatile__ does likely not do what you want.  As far as I can see
you do not want one here anyway?

"volatile" does not order asm wrt fp insns, which you likely *do* want.

> +  __v2df __r = { ((__v2df)__B)[0], ((__v2df) __A)[1] };

You put spaces after only some casts, btw?  Well maybe I found the one
place you did it wrong, heh :-)  And you can avoid having so many parens
by making extra variables -- much more readable.

> +  switch (__rounding)

You do not need any of that __ either.

> +/* { dg-do run } */
> +/* { dg-require-effective-target powerpc_vsx_ok } */
> +/* { dg-options "-O2 -mvsx" } */

"dg-do run" requires vsx_hw, not just vsx_ok.  Testing on a machine
without VSX (so before p7) would have shown that, but do you have access
to any?  This is one of those things we are only told about a year after
it was added, because no one who tests often does that on so old
hardware :-)

So, okay for trunk (and backports after some burn-in) with that vsx_ok
fixed.  That asm needs fixing, but you can do that later.

Thanks!


Segher
Paul A. Clarke Oct. 8, 2021, 1:04 a.m. UTC | #7
On Thu, Oct 07, 2021 at 06:39:06PM -0500, Segher Boessenkool wrote:
> On Mon, Aug 23, 2021 at 02:03:05PM -0500, Paul A. Clarke wrote:
> > No attempt is made to optimize writing the FPSCR (by checking if the new
> > value would be the same), other than using lighter weight instructions
> > when possible.
> 
> __builtin_set_fpscr_rn makes optimised code (using mtfsb[01])
> automatically, fwiw.
> 
> > Move implementations of _mm_ceil* and _mm_floor* into _mm_round*, and
> > convert _mm_ceil* and _mm_floor* into macros. This matches the current
> > analogous implementations in config/i386/smmintrin.h.
> 
> Hrm.  Using function-like macros is begging for trouble, as usual.  But
> the x86 version does this, so meh.
> 
> > +extern __inline __m128d
> > +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> > +_mm_round_pd (__m128d __A, int __rounding)
> > +{
> > +  __v2df __r;
> > +  union {
> > +    double __fr;
> > +    long long __fpscr;
> > +  } __enables_save, __fpscr_save;
> > +
> > +  if (__rounding & _MM_FROUND_NO_EXC)
> > +    {
> > +      /* Save enabled exceptions, disable all exceptions,
> > +	 and preserve the rounding mode.  */
> > +#ifdef _ARCH_PWR9
> > +      __asm__ __volatile__ ("mffsce %0" : "=f" (__fpscr_save.__fr));
> 
> The __volatile__ does likely not do what you want.  As far as I can see
> you do not want one here anyway?
> 
> "volatile" does not order asm wrt fp insns, which you likely *do* want.

Reading the GCC docs, it looks like the "volatile" qualifier for "asm"
has no effect at all (6.47.1):

| The optional volatile qualifier has no effect. All basic asm blocks are
| implicitly volatile.

So, it could be removed without concern.

> > +  __v2df __r = { ((__v2df)__B)[0], ((__v2df) __A)[1] };
> 
> You put spaces after only some casts, btw?  Well maybe I found the one
> place you did it wrong, heh :-)  And you can avoid having so many parens
> by making extra variables -- much more readable.

I'll fix this.

> > +  switch (__rounding)
> 
> You do not need any of that __ either.

I'm surprised that I don't. A .h file needs to be concerned about the
namespace it inherits, no?

> > +/* { dg-do run } */
> > +/* { dg-require-effective-target powerpc_vsx_ok } */
> > +/* { dg-options "-O2 -mvsx" } */
> 
> "dg-do run" requires vsx_hw, not just vsx_ok.  Testing on a machine
> without VSX (so before p7) would have shown that, but do you have access
> to any?  This is one of those things we are only told about a year after
> it was added, because no one who tests often does that on so old
> hardware :-)
> 
> So, okay for trunk (and backports after some burn-in) with that vsx_ok
> fixed.  That asm needs fixing, but you can do that later.

OK.

Thanks!

PC
Segher Boessenkool Oct. 8, 2021, 5:39 p.m. UTC | #8
On Thu, Oct 07, 2021 at 08:04:23PM -0500, Paul A. Clarke wrote:
> On Thu, Oct 07, 2021 at 06:39:06PM -0500, Segher Boessenkool wrote:
> > > +      __asm__ __volatile__ ("mffsce %0" : "=f" (__fpscr_save.__fr));
> > 
> > The __volatile__ does likely not do what you want.  As far as I can see
> > you do not want one here anyway?
> > 
> > "volatile" does not order asm wrt fp insns, which you likely *do* want.
> 
> Reading the GCC docs, it looks like the "volatile" qualifier for "asm"
> has no effect at all (6.47.1):
> 
> | The optional volatile qualifier has no effect. All basic asm blocks are
> | implicitly volatile.
> 
> So, it could be removed without concern.

This is not a basic asm (it contains a ":"; that is not just an easy way
to see it, it is the *definition* of basic vs. extended asm).

The manual explains:

"""
Note that the compiler can move even 'volatile asm' instructions
relative to other code, including across jump instructions.  For
example, on many targets there is a system register that controls the
rounding mode of floating-point operations.  Setting it with a 'volatile
asm' statement, as in the following PowerPC example, does not work
reliably.

     asm volatile("mtfsf 255, %0" : : "f" (fpenv));
     sum = x + y;

The compiler may move the addition back before the 'volatile asm'
statement.  To make it work as expected, add an artificial dependency to
the 'asm' by referencing a variable in the subsequent code, for example:

     asm volatile ("mtfsf 255,%1" : "=X" (sum) : "f" (fpenv));
     sum = x + y;
"""

> > You do not need any of that __ either.
> 
> I'm surprised that I don't. A .h file needs to be concerned about the
> namespace it inherits, no?

These are local variables in a function though.  You get such
complexities in macros, but never in functions, where everything is
scoped.  Local variables are a great thing.  And macros are a bad thing!


Segher
Paul A. Clarke Oct. 8, 2021, 7:27 p.m. UTC | #9
On Fri, Oct 08, 2021 at 12:39:15PM -0500, Segher Boessenkool wrote:
> On Thu, Oct 07, 2021 at 08:04:23PM -0500, Paul A. Clarke wrote:
> > On Thu, Oct 07, 2021 at 06:39:06PM -0500, Segher Boessenkool wrote:
> > > > +      __asm__ __volatile__ ("mffsce %0" : "=f" (__fpscr_save.__fr));
> > > 
> > > The __volatile__ does likely not do what you want.  As far as I can see
> > > you do not want one here anyway?
> > > 
> > > "volatile" does not order asm wrt fp insns, which you likely *do* want.
> > 
> > Reading the GCC docs, it looks like the "volatile" qualifier for "asm"
> > has no effect at all (6.47.1):
> > 
> > | The optional volatile qualifier has no effect. All basic asm blocks are
> > | implicitly volatile.
> > 
> > So, it could be removed without concern.
> 
> This is not a basic asm (it contains a ":"; that is not just an easy way
> to see it, it is the *definition* of basic vs. extended asm).

Ah, basic vs extended. I learned something today... thanks for your
patience!

> The manual explains:
> 
> """
> Note that the compiler can move even 'volatile asm' instructions
> relative to other code, including across jump instructions.  For
> example, on many targets there is a system register that controls the
> rounding mode of floating-point operations.  Setting it with a 'volatile
> asm' statement, as in the following PowerPC example, does not work
> reliably.
> 
>      asm volatile("mtfsf 255, %0" : : "f" (fpenv));
>      sum = x + y;
> 
> The compiler may move the addition back before the 'volatile asm'
> statement.  To make it work as expected, add an artificial dependency to
> the 'asm' by referencing a variable in the subsequent code, for example:
> 
>      asm volatile ("mtfsf 255,%1" : "=X" (sum) : "f" (fpenv));
>      sum = x + y;
> """

I see. Thanks for the reference. If I understand correctly, volatile
prevents some optimizations based on the defined inputs/outputs, but
the asm could still be subject to reordering.

In this particular case, I don't think it's an issue with respect to
reordering.  The code in question is:
+      __asm__ __volatile__ ("mffsce %0" : "=f" (__fpscr_save.__fr));
+      __enables_save.__fpscr = __fpscr_save.__fpscr & 0xf8;

The output (__fpscr_save) is a source for the following assignment,
so the order should be respected, no?

With respect to volatile, I worry about removing it, because I do
indeed need that instruction to execute in order to clear the FPSCR
exception enable bits. That side-effect is not otherwise known to the
compiler.

> > > You do not need any of that __ either.
> > 
> > I'm surprised that I don't. A .h file needs to be concerned about the
> > namespace it inherits, no?
> 
> These are local variables in a function though.  You get such
> complexities in macros, but never in functions, where everything is
> scoped.  Local variables are a great thing.  And macros are a bad thing!

They are local variables in a function *in an include file*, though.
If a user's preprocessor macro just happens to match a local variable name
there could be problems, right?

a.h:
inline void foo () {
  int A = 0;
}

a.c:
#define A a+b
#include <a.h>

$ gcc -c -I. a.c
In file included from a.c:1:
a.c: In function ‘foo’:
a.h:1:12: error: expected ‘=’, ‘,’, ‘;’, ‘asm’ or ‘__attribute__’ before ‘+’ token
 #define A a+b
            ^
a.c:2:17: note: in expansion of macro ‘A’
 int foo() { int A = 0; }
                 ^
a.h:1:13: error: ‘b’ undeclared (first use in this function)
 #define A a+b
             ^
a.c:2:17: note: in expansion of macro ‘A’
 int foo() { int A = 0; }
                 ^
a.h:1:13: note: each undeclared identifier is reported only once for each function it appears in
 #define A a+b
             ^
a.c:2:17: note: in expansion of macro ‘A’
 int foo() { int A = 0; }
                 ^
PC
Segher Boessenkool Oct. 8, 2021, 10:31 p.m. UTC | #10
On Fri, Oct 08, 2021 at 02:27:28PM -0500, Paul A. Clarke wrote:
> On Fri, Oct 08, 2021 at 12:39:15PM -0500, Segher Boessenkool wrote:
> > This is not a basic asm (it contains a ":"; that is not just an easy way
> > to see it, it is the *definition* of basic vs. extended asm).
> 
> Ah, basic vs extended. I learned something today... thanks for your
> patience!

To expand a little: any asm with operands is extended asm.  One without
operands can be either:  asm("eieio");  is basic, while  asm("eieio" : );
is extended.  This matters because semantics are a bit different.

> I see. Thanks for the reference. If I understand correctly, volatile
> prevents some optimizations based on the defined inputs/outputs, but
> the asm could still be subject to reordering.

"asm volatile" means there is a side effect in the asm.  This means that
it has to be executed on the real machine the same as on the abstract
machine, with the side effects in the same order.

It can still be reordered, modulo those restrictions.  It can be merged
with an identical asm as well.  And the compiler can split this into two
identical asms on two paths.

In this case you might want a side effect (the instructions writes to
the FPSCR after all).  But you need this to be tied to the FP code that
you want the flags to be changed for, and to the restore of the flags,
and finally you need to prevent other FP code from being scheduled in
between.

You need more for that than just volatile, and the solution may well
make volatile not wanted: tying the insns together somehow will
naturally make the flags restored to a sane situation again, so the
whole group can be removed if you want, etc.

> In this particular case, I don't think it's an issue with respect to
> reordering.  The code in question is:
> +      __asm__ __volatile__ ("mffsce %0" : "=f" (__fpscr_save.__fr));
> +      __enables_save.__fpscr = __fpscr_save.__fpscr & 0xf8;
> 
> The output (__fpscr_save) is a source for the following assignment,
> so the order should be respected, no?

Other FP code can be interleaved, and then do the wrong thing.

> With respect to volatile, I worry about removing it, because I do
> indeed need that instruction to execute in order to clear the FPSCR
> exception enable bits. That side-effect is not otherwise known to the
> compiler.

Yes.  But as said above, volatile isn't enough to get this to behave
correctly.

The easiest way out is to write this all in one piece of (inline) asm.

> > > > You do not need any of that __ either.
> > > 
> > > I'm surprised that I don't. A .h file needs to be concerned about the
> > > namespace it inherits, no?
> > 
> > These are local variables in a function though.  You get such
> > complexities in macros, but never in functions, where everything is
> > scoped.  Local variables are a great thing.  And macros are a bad thing!
> 
> They are local variables in a function *in an include file*, though.
> If a user's preprocessor macro just happens to match a local variable name
> there could be problems, right?

Of course.  This is why traditionally macro names are ALL_CAPS :-)  So
in practice it doesn't matter, and in practice many users use __ names
themselves as well.

But you are right.  I just don't see it will help practically :-(


Segher
Paul A. Clarke Oct. 11, 2021, 1:46 p.m. UTC | #11
On Fri, Oct 08, 2021 at 05:31:11PM -0500, Segher Boessenkool wrote:
> On Fri, Oct 08, 2021 at 02:27:28PM -0500, Paul A. Clarke wrote:
> > On Fri, Oct 08, 2021 at 12:39:15PM -0500, Segher Boessenkool wrote:
> > I see. Thanks for the reference. If I understand correctly, volatile
> > prevents some optimizations based on the defined inputs/outputs, but
> > the asm could still be subject to reordering.
> 
> "asm volatile" means there is a side effect in the asm.  This means that
> it has to be executed on the real machine the same as on the abstract
> machine, with the side effects in the same order.
> 
> It can still be reordered, modulo those restrictions.  It can be merged
> with an identical asm as well.  And the compiler can split this into two
> identical asms on two paths.

It seems odd to me that the compiler can make any assumptions about
the side-effect(s). How does it know that a side-effect does not alter
computation (as it indeed does in this case), such that reordering is
still correct (which it wouldn't be in this case)?

> In this case you might want a side effect (the instructions writes to
> the FPSCR after all).  But you need this to be tied to the FP code that
> you want the flags to be changed for, and to the restore of the flags,
> and finally you need to prevent other FP code from being scheduled in
> between.
> 
> You need more for that than just volatile, and the solution may well
> make volatile not wanted: tying the insns together somehow will
> naturally make the flags restored to a sane situation again, so the
> whole group can be removed if you want, etc.
> 
> > In this particular case, I don't think it's an issue with respect to
> > reordering.  The code in question is:
> > +      __asm__ __volatile__ ("mffsce %0" : "=f" (__fpscr_save.__fr));
> > +      __enables_save.__fpscr = __fpscr_save.__fpscr & 0xf8;
> > 
> > The output (__fpscr_save) is a source for the following assignment,
> > so the order should be respected, no?
> 
> Other FP code can be interleaved, and then do the wrong thing.
> 
> > With respect to volatile, I worry about removing it, because I do
> > indeed need that instruction to execute in order to clear the FPSCR
> > exception enable bits. That side-effect is not otherwise known to the
> > compiler.
> 
> Yes.  But as said above, volatile isn't enough to get this to behave
> correctly.
> 
> The easiest way out is to write this all in one piece of (inline) asm.

Ugh. I really don't want to go there, not just because it's work, but
I think this is a paradigm that should work without needing to drop
fully into asm.

Is there something unique about using an "asm" statement versus using,
say, a builtin like __builtin_mtfsf or a hypothetical __builtin_mffsce?
Very similar methods are used in glibc today. Are those broken?

Would creating a __builtin_mffsce be another solution?

Would adding memory barriers between the FPSCR manipulations and the
code which is bracketed by them be sufficient?

PC
Segher Boessenkool Oct. 11, 2021, 4:28 p.m. UTC | #12
On Mon, Oct 11, 2021 at 08:46:17AM -0500, Paul A. Clarke wrote:
> On Fri, Oct 08, 2021 at 05:31:11PM -0500, Segher Boessenkool wrote:
> > "asm volatile" means there is a side effect in the asm.  This means that
> > it has to be executed on the real machine the same as on the abstract
> > machine, with the side effects in the same order.
> > 
> > It can still be reordered, modulo those restrictions.  It can be merged
> > with an identical asm as well.  And the compiler can split this into two
> > identical asms on two paths.
> 
> It seems odd to me that the compiler can make any assumptions about
> the side-effect(s). How does it know that a side-effect does not alter
> computation (as it indeed does in this case), such that reordering is
> a still correct (which it wouldn't be in this case)?

Because by definition side effects do not change the computation (where
"computation" means "the outputs of the asm")!

And if you are talking about changing future computations, as floating
point control flags can be used for: this falls outside of the C abstract
machine, other than fe[gs]etround etc.

> > > With respect to volatile, I worry about removing it, because I do
> > > indeed need that instruction to execute in order to clear the FPSCR
> > > exception enable bits. That side-effect is not otherwise known to the
> > > compiler.
> > 
> > Yes.  But as said above, volatile isn't enough to get this to behave
> > correctly.
> > 
> > The easiest way out is to write this all in one piece of (inline) asm.
> 
> Ugh. I really don't want to go there, not just because it's work, but
> I think this is a paradigm that should work without needing to drop
> fully into asm.

Yes.  Let's say GCC still has some challenges here :-(

> Is there something unique about using an "asm" statement versus using,
> say, a builtin like __builtin_mtfsf or a hypothetical __builtin_mffsce?

Nope.

> Very similar methods are used in glibc today. Are those broken?

Maybe.  If you get a real (i.e. not inline) function call there, that
can save you often.

> Would creating a __builtin_mffsce be another solution?

Yes.  And not a bad idea in the first place.

> Would adding memory barriers between the FPSCR manipulations and the
> code which is bracketed by them be sufficient?

No, what you want to order is not memory accesses, but FP computations
relative to the insns that change the FP control bits.  If *both* of
those change memory you can artificially order them with that.  But most
FP computations do not access memory.


Segher
Paul A. Clarke Oct. 11, 2021, 5:31 p.m. UTC | #13
On Mon, Oct 11, 2021 at 11:28:39AM -0500, Segher Boessenkool wrote:
> On Mon, Oct 11, 2021 at 08:46:17AM -0500, Paul A. Clarke wrote:
> > On Fri, Oct 08, 2021 at 05:31:11PM -0500, Segher Boessenkool wrote:
[...]
> > > > With respect to volatile, I worry about removing it, because I do
> > > > indeed need that instruction to execute in order to clear the FPSCR
> > > > exception enable bits. That side-effect is not otherwise known to the
> > > > compiler.
> > > 
> > > Yes.  But as said above, volatile isn't enough to get this to behave
> > > correctly.
> > > 
> > > The easiest way out is to write this all in one piece of (inline) asm.
> > 
> > Ugh. I really don't want to go there, not just because it's work, but
> > I think this is a paradigm that should work without needing to drop
> > fully into asm.
> 
> Yes.  Let's say GCC still has some challenges here :-(
> 
> > Is there something unique about using an "asm" statement versus using,
> > say, a builtin like __builtin_mtfsf or a hypothetical __builtin_mffsce?
> 
> Nope.
> 
> > Very similar methods are used in glibc today. Are those broken?
> 
> Maybe.

Ouch.

> If you get a real (i.e. not inline) function call there, that
> can save you often.

Calling a real function in order to execute a single instruction is
sub-optimal. ;-)

> > Would creating a __builtin_mffsce be another solution?
> 
> Yes.  And not a bad idea in the first place.

The previous "Nope" and this "Yes" seem in contradiction. If there is no
difference between "asm" and builtin, how does using a builtin solve the
problem?

PC
Segher Boessenkool Oct. 11, 2021, 10:04 p.m. UTC | #14
Hi!

On Mon, Oct 11, 2021 at 12:31:07PM -0500, Paul A. Clarke wrote:
> On Mon, Oct 11, 2021 at 11:28:39AM -0500, Segher Boessenkool wrote:
> > > Very similar methods are used in glibc today. Are those broken?
> > 
> > Maybe.
> 
> Ouch.

So show the code?

> > If you get a real (i.e. not inline) function call there, that
> > can save you often.
> 
> Calling a real function in order to execute a single instruction is
> sub-optimal. ;-)

Calling a real function (that does not even need a stack frame, just a
blr) is not terribly expensive, either.

> > > Would creating a __builtin_mffsce be another solution?
> > 
> > Yes.  And not a bad idea in the first place.
> 
> The previous "Nope" and this "Yes" seem in contradiction. If there is no
> difference between "asm" and builtin, how does using a builtin solve the
> problem?

You will have to make the builtin solve it.  What a builtin can do is
virtually unlimited.  What an asm can do is not: it just outputs some
assembler language, and does in/out/clobber constraints.  You can do a
*lot* with that, but it is much more limited than everything you can do
in the compiler!  :-)

The fact remains that there is no way in RTL (or Gimple for that matter)
to express things like rounding mode changes.  You will need to
artificially make some barriers.


Segher
Paul A. Clarke Oct. 12, 2021, 7:35 p.m. UTC | #15
On Mon, Oct 11, 2021 at 05:04:12PM -0500, Segher Boessenkool wrote:
> On Mon, Oct 11, 2021 at 12:31:07PM -0500, Paul A. Clarke wrote:
> > On Mon, Oct 11, 2021 at 11:28:39AM -0500, Segher Boessenkool wrote:
> > > > Very similar methods are used in glibc today. Are those broken?
> > > 
> > > Maybe.
> > 
> > Ouch.
> 
> So show the code?

You asked for it. ;-)  Boiled down to remove macroisms and code that
should be removed by optimization:
--
static __inline __attribute__ ((__always_inline__)) void
libc_feholdsetround_ppc_ctx (struct rm_ctx *ctx, int r)
{
  fenv_union_t old;
  register fenv_union_t __fr;
  __asm__ __volatile__ ("mffscrni %0,%1" : "=f" (__fr.fenv) : "i" (r));
  ctx->env = old.fenv = __fr.fenv; 
  ctx->updated_status = (r != (old.l & 3));
}
static __inline __attribute__ ((__always_inline__)) void
libc_feresetround_ppc (fenv_t *envp)
{ 
  fenv_union_t new = { .fenv = *envp };
  register fenv_union_t __fr;
  __fr.l = new.l & 3;
  __asm__ __volatile__ ("mffscrn %0,%1" : "=f" (__fr.fenv) : "f" (__fr.fenv));
}
double
__sin (double x)
{
  struct rm_ctx ctx __attribute__ ((cleanup (libc_feresetround_ppc_ctx)));
  libc_feholdsetround_ppc_ctx (&ctx, (0));
  /* floating point intensive code.  */
  return retval;
}
--

There's not much to it, really.  "mffscrni" on the way in to save and set
a required rounding mode, and "mffscrn" on the way out to restore it.

> > > If you get a real (i.e. not inline) function call there, that
> > > can save you often.
> > 
> > Calling a real function in order to execute a single instruction is
> > sub-optimal. ;-)
> 
> Calling a real function (that does not even need a stack frame, just a
> blr) is not terribly expensive, either.

Not ideal, better would be better.

> > > > Would creating a __builtin_mffsce be another solution?
> > > 
> > > Yes.  And not a bad idea in the first place.
> > 
> > The previous "Nope" and this "Yes" seem in contradiction. If there is no
> > difference between "asm" and builtin, how does using a builtin solve the
> > problem?
> 
> You will have to make the builtin solve it.  What a builtin can do is
> virtually unlimited.  What an asm can do is not: it just outputs some
> assembler language, and does in/out/clobber constraints.  You can do a
> *lot* with that, but it is much more limited than everything you can do
> in the compiler!  :-)
> 
> The fact remains that there is no way in RTL (or Gimple for that matter)
> to express things like rounding mode changes.  You will need to
> artificially make some barriers.

I know there is __builtin_set_fpscr_rn that generates mffscrn. This
is not used in the code above because I believe it first appears in
GCC 9.1 or so, and glibc still supports GCC 6.2 (and it doesn't define
a return value, which would be handy in this case).  Does the
implementation of that builtin meet the requirements needed here,
to prevent reordering of FP computation across instantiations of the
builtin?  If not, is there a model on which to base an implementation
of __builtin_mffsce (or some preferred name)?

PC
Segher Boessenkool Oct. 12, 2021, 10:25 p.m. UTC | #16
On Tue, Oct 12, 2021 at 02:35:57PM -0500, Paul A. Clarke wrote:
> You asked for it. ;-)  Boiled down to remove macroisms and code that
> should be removed by optimization:

Thanks :-)

> static __inline __attribute__ ((__always_inline__)) void
> libc_feholdsetround_ppc_ctx (struct rm_ctx *ctx, int r)
> {
>   fenv_union_t old;
>   register fenv_union_t __fr;
>   __asm__ __volatile__ ("mffscrni %0,%1" : "=f" (__fr.fenv) : "i" (r));
>   ctx->env = old.fenv = __fr.fenv; 
>   ctx->updated_status = (r != (old.l & 3));
> }

(Should use "n", not "i", only numbers are allowed, not e.g. the address
of something.  This actually can matter, in unusual cases.)

This orders the updating of RN before the store to __fr.fenv .  There is
no other ordering ensured here.

The store to __fr.env obviously has to stay in order with anything that
can alias it, if that store isn't optimised away completely later.

> static __inline __attribute__ ((__always_inline__)) void
> libc_feresetround_ppc (fenv_t *envp)
> { 
>   fenv_union_t new = { .fenv = *envp };
>   register fenv_union_t __fr;
>   __fr.l = new.l & 3;
>   __asm__ __volatile__ ("mffscrn %0,%1" : "=f" (__fr.fenv) : "f" (__fr.fenv));
> }

This both reads from and stores to __fr.fenv, the asm has to stay
between those two accesses (in the machine code).  If the code that
actually depends on the modified RN depends on that __fr.fenv in some way,
all will be fine.

> double
> __sin (double x)
> {
>   struct rm_ctx ctx __attribute__ ((cleanup (libc_feresetround_ppc_ctx)));
>   libc_feholdsetround_ppc_ctx (&ctx, (0));
>   /* floating point intensive code.  */
>   return retval;
> }

... but there is no such dependency.  The cleanup attribute does not
give any such ordering either afaik.

> There's not much to it, really.  "mffscrni" on the way in to save and set
> a required rounding mode, and "mffscrn" on the way out to restore it.

Yes.  But the code making use of the modified RN needs to have some
artificial dependencies with the RN setters, perhaps via __fr.fenv .

> > Calling a real function (that does not even need a stack frame, just a
> > blr) is not terribly expensive, either.
> 
> Not ideal, better would be better.

Yes.  But at least it *works* :-)  I'll take a stupidly simple,
*robust* solution over some nicely faster way of doing the wrong thing.

> > > > > Would creating a __builtin_mffsce be another solution?
> > > > 
> > > > Yes.  And not a bad idea in the first place.
> > > 
> > > The previous "Nope" and this "Yes" seem in contradiction. If there is no
> > > difference between "asm" and builtin, how does using a builtin solve the
> > > problem?
> > 
> > You will have to make the builtin solve it.  What a builtin can do is
> > virtually unlimited.  What an asm can do is not: it just outputs some
> > assembler language, and does in/out/clobber constraints.  You can do a
> > *lot* with that, but it is much more limited than everything you can do
> > in the compiler!  :-)
> > 
> > The fact remains that there is no way in RTL (or Gimple for that matter)
> > to express things like rounding mode changes.  You will need to
> > artificially make some barriers.
> 
> I know there is __builtin_set_fpscr_rn that generates mffscrn.

Or some mtfsb[01]'s, or nasty mffs/mtfsf code, yeah.  And it does not
provide the ordering either.  It *cannot*: you need to cooperate with
whatever you are ordering against.  There is no way in GCC to say "this
is an FP insn and has to stay in order with all FP control writes and FP
status reads".

Maybe now you see why I like external functions for this :-)

> This
> is not used in the code above because I believe it first appears in
> GCC 9.1 or so, and glibc still supports GCC 6.2 (and it doesn't define
> a return value, which would be handy in this case).  Does the
> implementation of that builtin meet the requirements needed here,
> to prevent reordering of FP computation across instantiations of the
> builtin?  If not, is there a model on which to base an implementation
> of __builtin_mffsce (or some preferred name)?

It depends on what you are actually ordering, unfortunately.


Segher
Paul A. Clarke Oct. 19, 2021, 12:36 a.m. UTC | #17
On Tue, Oct 12, 2021 at 05:25:32PM -0500, Segher Boessenkool wrote:
> On Tue, Oct 12, 2021 at 02:35:57PM -0500, Paul A. Clarke wrote:
> > static __inline __attribute__ ((__always_inline__)) void
> > libc_feholdsetround_ppc_ctx (struct rm_ctx *ctx, int r)
> > {
> >   fenv_union_t old;
> >   register fenv_union_t __fr;
> >   __asm__ __volatile__ ("mffscrni %0,%1" : "=f" (__fr.fenv) : "i" (r));
> >   ctx->env = old.fenv = __fr.fenv; 
> >   ctx->updated_status = (r != (old.l & 3));
> > }
> 
> (Should use "n", not "i", only numbers are allowed, not e.g. the address
> of something.  This actually can matter, in unusual cases.)

Noted, will submit a change to glibc when I get a chance. Thanks!

> This orders the updating of RN before the store to __fr.fenv .  There is
> no other ordering ensured here.
> 
> The store to __fr.env obviously has to stay in order with anything that
> can alias it, if that store isn't optimised away completely later.
> 
> > static __inline __attribute__ ((__always_inline__)) void
> > libc_feresetround_ppc (fenv_t *envp)
> > { 
> >   fenv_union_t new = { .fenv = *envp };
> >   register fenv_union_t __fr;
> >   __fr.l = new.l & 3;
> >   __asm__ __volatile__ ("mffscrn %0,%1" : "=f" (__fr.fenv) : "f" (__fr.fenv));
> > }
> 
> This both reads from and stores to __fr.fenv, the asm has to stay
> between those two accesses (in the machine code).  If the code that
> actually depends on the modified RN depends on that __fr.fenv in some way,
> all will be fine.
> 
> > double
> > __sin (double x)
> > {
> >   struct rm_ctx ctx __attribute__ ((cleanup (libc_feresetround_ppc_ctx)));
> >   libc_feholdsetround_ppc_ctx (&ctx, (0));
> >   /* floating point intensive code.  */
> >   return retval;
> > }
> 
> ... but there is no such dependency.  The cleanup attribute does not
> give any such ordering either afaik.
> 
> > There's not much to it, really.  "mffscrni" on the way in to save and set
> > a required rounding mode, and "mffscrn" on the way out to restore it.
> 
> Yes.  But the code making use of the modified RN needs to have some
> artificial dependencies with the RN setters, perhaps via __fr.fenv .
> 
> > > Calling a real function (that does not even need a stack frame, just a
> > > blr) is not terribly expensive, either.
> > 
> > Not ideal, better would be better.
> 
> Yes.  But at least it *works* :-)  I'll take a stupidly simple,
> *robust* solution over some nicely faster way of doing the wrong thing.

Understand, and agree. 

> > > > > > Would creating a __builtin_mffsce be another solution?
> > > > > 
> > > > > Yes.  And not a bad idea in the first place.
> > > > 
> > > > The previous "Nope" and this "Yes" seem in contradiction. If there is no
> > > > difference between "asm" and builtin, how does using a builtin solve the
> > > > problem?
> > > 
> > > You will have to make the builtin solve it.  What a builtin can do is
> > > virtually unlimited.  What an asm can do is not: it just outputs some
> > > assembler language, and does in/out/clobber constraints.  You can do a
> > > *lot* with that, but it is much more limited than everything you can do
> > > in the compiler!  :-)
> > > 
> > > The fact remains that there is no way in RTL (or Gimple for that matter)
> > > to express things like rounding mode changes.  You will need to
> > > artificially make some barriers.
> > 
> > I know there is __builtin_set_fpscr_rn that generates mffscrn.
> 
> Or some mtfsb[01]'s, or nasty mffs/mtfsf code, yeah.  And it does not
> provide the ordering either.  It *cannot*: you need to cooperate with
> whatever you are ordering against.  There is no way in GCC to say "this
> is an FP insn and has to stay in order with all FP control writes and FP
> status reads".
> 
> Maybe now you see why I like external functions for this :-)
> 
> > This
> > is not used in the code above because I believe it first appears in
> > GCC 9.1 or so, and glibc still supports GCC 6.2 (and it doesn't define
> > a return value, which would be handy in this case).  Does the
> > implementation of that builtin meet the requirements needed here,
> > to prevent reordering of FP computation across instantiations of the
> > builtin?  If not, is there a model on which to base an implementation
> > of __builtin_mffsce (or some preferred name)?
> 
> It depends on what you are actually ordering, unfortunately.

What I hear is that for the specific requirements and restrictions here,
there is nothing special that another builtin, like a theoretical
__builtin_mffsce implemented like __builtin_set_fpscr_rn, can provide
to solve the issue under discussion.  The dependencies need to be expressed
such that the compiler understands them, and there is no way to do so
with the current implementation of __builtin_set_fpscr_rn.

With some effort, and proper visibility, the dependencies can be expressed
using "asm". I believe that's the case here, and will submit a v2 for
review shortly.

For the general case of inlines, builtins, or asm without visibility,
I've opened an issue for GCC to consider accommodation
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102783.

Thanks so much for your help!

PC
Patch

diff --git a/gcc/config/rs6000/smmintrin.h b/gcc/config/rs6000/smmintrin.h
index 3767a67eada7..a6b88d313ad0 100644
--- a/gcc/config/rs6000/smmintrin.h
+++ b/gcc/config/rs6000/smmintrin.h
@@ -42,6 +42,182 @@ 
 #include <altivec.h>
 #include <tmmintrin.h>
 
+/* Rounding mode macros. */
+#define _MM_FROUND_TO_NEAREST_INT       0x00
+#define _MM_FROUND_TO_ZERO              0x01
+#define _MM_FROUND_TO_POS_INF           0x02
+#define _MM_FROUND_TO_NEG_INF           0x03
+#define _MM_FROUND_CUR_DIRECTION        0x04
+
+#define _MM_FROUND_NINT		\
+  (_MM_FROUND_TO_NEAREST_INT | _MM_FROUND_RAISE_EXC)
+#define _MM_FROUND_FLOOR	\
+  (_MM_FROUND_TO_NEG_INF | _MM_FROUND_RAISE_EXC)
+#define _MM_FROUND_CEIL		\
+  (_MM_FROUND_TO_POS_INF | _MM_FROUND_RAISE_EXC)
+#define _MM_FROUND_TRUNC	\
+  (_MM_FROUND_TO_ZERO | _MM_FROUND_RAISE_EXC)
+#define _MM_FROUND_RINT		\
+  (_MM_FROUND_CUR_DIRECTION | _MM_FROUND_RAISE_EXC)
+#define _MM_FROUND_NEARBYINT	\
+  (_MM_FROUND_CUR_DIRECTION | _MM_FROUND_NO_EXC)
+
+#define _MM_FROUND_RAISE_EXC            0x00
+#define _MM_FROUND_NO_EXC               0x08
+
+extern __inline __m128d
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_round_pd (__m128d __A, int __rounding)
+{
+  __v2df __r;
+  union {
+    double __fr;
+    long long __fpscr;
+  } __enables_save, __fpscr_save;
+
+  if (__rounding & _MM_FROUND_NO_EXC)
+    {
+      /* Save enabled exceptions, disable all exceptions,
+	 and preserve the rounding mode.  */
+#ifdef _ARCH_PWR9
+      __asm__ __volatile__ ("mffsce %0" : "=f" (__fpscr_save.__fr));
+      __enables_save.__fpscr = __fpscr_save.__fpscr & 0xf8;
+#else
+      __fpscr_save.__fr = __builtin_mffs ();
+      __enables_save.__fpscr = __fpscr_save.__fpscr & 0xf8;
+      __fpscr_save.__fpscr &= ~0xf8;
+      __builtin_mtfsf (0b00000011, __fpscr_save.__fr);
+#endif
+    }
+
+  switch (__rounding)
+    {
+      case _MM_FROUND_TO_NEAREST_INT:
+	__fpscr_save.__fr = __builtin_mffsl ();
+	__attribute__ ((fallthrough));
+      case _MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC:
+	__builtin_set_fpscr_rn (0b00);
+	__r = vec_rint ((__v2df) __A);
+	__builtin_set_fpscr_rn (__fpscr_save.__fpscr);
+	break;
+      case _MM_FROUND_TO_NEG_INF:
+      case _MM_FROUND_TO_NEG_INF | _MM_FROUND_NO_EXC:
+	__r = vec_floor ((__v2df) __A);
+	break;
+      case _MM_FROUND_TO_POS_INF:
+      case _MM_FROUND_TO_POS_INF | _MM_FROUND_NO_EXC:
+	__r = vec_ceil ((__v2df) __A);
+	break;
+      case _MM_FROUND_TO_ZERO:
+      case _MM_FROUND_TO_ZERO | _MM_FROUND_NO_EXC:
+	__r = vec_trunc ((__v2df) __A);
+	break;
+      case _MM_FROUND_CUR_DIRECTION:
+	__r = vec_rint ((__v2df) __A);
+	break;
+    }
+  if (__rounding & _MM_FROUND_NO_EXC)
+    {
+      /* Restore enabled exceptions.  */
+      __fpscr_save.__fr = __builtin_mffsl ();
+      __fpscr_save.__fpscr |= __enables_save.__fpscr;
+      __builtin_mtfsf (0b00000011, __fpscr_save.__fr);
+    }
+  return (__m128d) __r;
+}
+
+extern __inline __m128d
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_round_sd (__m128d __A, __m128d __B, int __rounding)
+{
+  __B = _mm_round_pd (__B, __rounding);
+  __v2df __r = { ((__v2df)__B)[0], ((__v2df) __A)[1] };
+  return (__m128d) __r;
+}
+
+extern __inline __m128
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_round_ps (__m128 __A, int __rounding)
+{
+  __v4sf __r;
+  union {
+    double __fr;
+    long long __fpscr;
+  } __enables_save, __fpscr_save;
+
+  if (__rounding & _MM_FROUND_NO_EXC)
+    {
+      /* Save enabled exceptions, disable all exceptions,
+	 and preserve the rounding mode.  */
+#ifdef _ARCH_PWR9
+      __asm__ __volatile__ ("mffsce %0" : "=f" (__fpscr_save.__fr));
+      __enables_save.__fpscr = __fpscr_save.__fpscr & 0xf8;
+#else
+      __fpscr_save.__fr = __builtin_mffs ();
+      __enables_save.__fpscr = __fpscr_save.__fpscr & 0xf8;
+      __fpscr_save.__fpscr &= ~0xf8;
+      __builtin_mtfsf (0b00000011, __fpscr_save.__fr);
+#endif
+    }
+
+  switch (__rounding)
+    {
+      case _MM_FROUND_TO_NEAREST_INT:
+	__fpscr_save.__fr = __builtin_mffsl ();
+	__attribute__ ((__fallthrough__));
+      case _MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC:
+	__builtin_set_fpscr_rn (0b00);
+	__r = vec_rint ((__v4sf) __A);
+	__builtin_set_fpscr_rn (__fpscr_save.__fpscr);
+	break;
+      case _MM_FROUND_TO_NEG_INF:
+      case _MM_FROUND_TO_NEG_INF | _MM_FROUND_NO_EXC:
+	__r = vec_floor ((__v4sf) __A);
+	break;
+      case _MM_FROUND_TO_POS_INF:
+      case _MM_FROUND_TO_POS_INF | _MM_FROUND_NO_EXC:
+	__r = vec_ceil ((__v4sf) __A);
+	break;
+      case _MM_FROUND_TO_ZERO:
+      case _MM_FROUND_TO_ZERO | _MM_FROUND_NO_EXC:
+	__r = vec_trunc ((__v4sf) __A);
+	break;
+      case _MM_FROUND_CUR_DIRECTION:
+	__r = vec_rint ((__v4sf) __A);
+	break;
+    }
+  if (__rounding & _MM_FROUND_NO_EXC)
+    {
+      /* Restore enabled exceptions.  */
+      __fpscr_save.__fr = __builtin_mffsl ();
+      __fpscr_save.__fpscr |= __enables_save.__fpscr;
+      __builtin_mtfsf (0b00000011, __fpscr_save.__fr);
+    }
+  return (__m128) __r;
+}
+
+extern __inline __m128
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_round_ss (__m128 __A, __m128 __B, int __rounding)
+{
+  __B = _mm_round_ps (__B, __rounding);
+  __v4sf __r = (__v4sf) __A;
+  __r[0] = ((__v4sf) __B)[0];
+  return (__m128) __r;
+}
+
+#define _mm_ceil_pd(V)	   _mm_round_pd ((V), _MM_FROUND_CEIL)
+#define _mm_ceil_sd(D, V)  _mm_round_sd ((D), (V), _MM_FROUND_CEIL)
+
+#define _mm_floor_pd(V)	   _mm_round_pd ((V), _MM_FROUND_FLOOR)
+#define _mm_floor_sd(D, V) _mm_round_sd ((D), (V), _MM_FROUND_FLOOR)
+
+#define _mm_ceil_ps(V)	   _mm_round_ps ((V), _MM_FROUND_CEIL)
+#define _mm_ceil_ss(D, V)  _mm_round_ss ((D), (V), _MM_FROUND_CEIL)
+
+#define _mm_floor_ps(V)	   _mm_round_ps ((V), _MM_FROUND_FLOOR)
+#define _mm_floor_ss(D, V) _mm_round_ss ((D), (V), _MM_FROUND_FLOOR)
+
 extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__))
 _mm_insert_epi8 (__m128i const __A, int const __D, int const __N)
 {
@@ -232,70 +408,6 @@  _mm_test_mix_ones_zeros (__m128i __A, __m128i __mask)
   return any_ones * any_zeros;
 }
 
-__inline __m128d
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_ceil_pd (__m128d __A)
-{
-  return (__m128d) vec_ceil ((__v2df) __A);
-}
-
-__inline __m128d
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_ceil_sd (__m128d __A, __m128d __B)
-{
-  __v2df __r = vec_ceil ((__v2df) __B);
-  __r[1] = ((__v2df) __A)[1];
-  return (__m128d) __r;
-}
-
-__inline __m128d
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_floor_pd (__m128d __A)
-{
-  return (__m128d) vec_floor ((__v2df) __A);
-}
-
-__inline __m128d
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_floor_sd (__m128d __A, __m128d __B)
-{
-  __v2df __r = vec_floor ((__v2df) __B);
-  __r[1] = ((__v2df) __A)[1];
-  return (__m128d) __r;
-}
-
-__inline __m128
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_ceil_ps (__m128 __A)
-{
-  return (__m128) vec_ceil ((__v4sf) __A);
-}
-
-__inline __m128
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_ceil_ss (__m128 __A, __m128 __B)
-{
-  __v4sf __r = (__v4sf) __A;
-  __r[0] = __builtin_ceil (((__v4sf) __B)[0]);
-  return __r;
-}
-
-__inline __m128
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_floor_ps (__m128 __A)
-{
-  return (__m128) vec_floor ((__v4sf) __A);
-}
-
-__inline __m128
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_floor_ss (__m128 __A, __m128 __B)
-{
-  __v4sf __r = (__v4sf) __A;
-  __r[0] = __builtin_floor (((__v4sf) __B)[0]);
-  return __r;
-}
-
 /* Return horizontal packed word minimum and its index in bits [15:0]
    and bits [18:16] respectively.  */
 __inline __m128i
diff --git a/gcc/testsuite/gcc.target/powerpc/sse4_1-round3.h b/gcc/testsuite/gcc.target/powerpc/sse4_1-round3.h
new file mode 100644
index 000000000000..de6cbf7be438
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/sse4_1-round3.h
@@ -0,0 +1,81 @@ 
+#include <smmintrin.h>
+#include <fenv.h>
+#include "sse4_1-check.h"
+
+#define DIM(a) (sizeof (a) / sizeof (a)[0])
+
+static int roundings[] =
+  {
+    _MM_FROUND_TO_NEAREST_INT,
+    _MM_FROUND_TO_NEG_INF,
+    _MM_FROUND_TO_POS_INF,
+    _MM_FROUND_TO_ZERO,
+    _MM_FROUND_CUR_DIRECTION
+  };
+
+static int modes[] =
+  {
+    FE_TONEAREST,
+    FE_UPWARD,
+    FE_DOWNWARD,
+    FE_TOWARDZERO
+  };
+
+static void
+TEST (void)
+{
+  int i, j, ri, mi, round_save;
+
+  round_save = fegetround ();
+  for (mi = 0; mi < DIM (modes); mi++) {
+    fesetround (modes[mi]);
+    for (i = 0; i < DIM (data); i++) {
+      for (ri = 0; ri < DIM (roundings); ri++) {
+	union value guess;
+	union value *current_answers = answers[ri];
+	switch (roundings[ri]) {
+	  case _MM_FROUND_TO_NEAREST_INT:
+	    guess.x = ROUND_INTRIN (data[i].value1.x, data[i].value2.x,
+				    _MM_FROUND_TO_NEAREST_INT);
+	    break;
+	  case _MM_FROUND_TO_NEG_INF:
+	    guess.x = ROUND_INTRIN (data[i].value1.x, data[i].value2.x,
+				    _MM_FROUND_TO_NEG_INF);
+	    break;
+	  case _MM_FROUND_TO_POS_INF:
+	    guess.x = ROUND_INTRIN (data[i].value1.x, data[i].value2.x,
+				    _MM_FROUND_TO_POS_INF);
+	    break;
+	  case _MM_FROUND_TO_ZERO:
+	    guess.x = ROUND_INTRIN (data[i].value1.x, data[i].value2.x,
+				    _MM_FROUND_TO_ZERO);
+	    break;
+	  case _MM_FROUND_CUR_DIRECTION:
+	    guess.x = ROUND_INTRIN (data[i].value1.x, data[i].value2.x,
+				    _MM_FROUND_CUR_DIRECTION);
+	    switch (modes[mi]) {
+	      case FE_TONEAREST:
+		current_answers = answers_NEAREST_INT;
+		break;
+	      case FE_UPWARD:
+		current_answers = answers_POS_INF;
+		break;
+	      case FE_DOWNWARD:
+		current_answers = answers_NEG_INF;
+		break;
+	      case FE_TOWARDZERO:
+		current_answers = answers_ZERO;
+		break;
+	    }
+	    break;
+	  default:
+	    abort ();
+	}
+	for (j = 0; j < DIM (guess.f); j++)
+	  if (guess.f[j] != current_answers[i].f[j])
+	    abort ();
+      }
+    }
+  }
+  fesetround (round_save);
+}
diff --git a/gcc/testsuite/gcc.target/powerpc/sse4_1-roundpd.c b/gcc/testsuite/gcc.target/powerpc/sse4_1-roundpd.c
new file mode 100644
index 000000000000..0528c395f233
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/sse4_1-roundpd.c
@@ -0,0 +1,143 @@ 
+/* { dg-do run } */
+/* { dg-require-effective-target powerpc_vsx_ok } */
+/* { dg-options "-O2 -mvsx" } */
+
+#define NO_WARN_X86_INTRINSICS 1
+#include <smmintrin.h>
+
+#define VEC_T __m128d
+#define FP_T double
+
+#define ROUND_INTRIN(x, ignored, mode) _mm_round_pd (x, mode)
+
+#include "sse4_1-round-data.h"
+
+struct data2 data[] = {
+  { .value1 = { .f = {  0.00,  0.25 } } },
+  { .value1 = { .f = {  0.50,  0.75 } } },
+
+  { .value1 = { .f = {  0x1.ffffffffffffcp+50,  0x1.ffffffffffffdp+50 } } },
+  { .value1 = { .f = {  0x1.ffffffffffffep+50,  0x1.fffffffffffffp+50 } } },
+  { .value1 = { .f = {  0x1.0000000000000p+51,  0x1.0000000000001p+51 } } },
+  { .value1 = { .f = {  0x1.0000000000002p+51,  0x1.0000000000003p+51 } } },
+
+  { .value1 = { .f = {  0x1.ffffffffffffep+51,  0x1.fffffffffffffp+51 } } },
+  { .value1 = { .f = {  0x1.0000000000000p+52,  0x1.0000000000001p+52 } } },
+
+  { .value1 = { .f = { -0x1.0000000000001p+52, -0x1.0000000000000p+52 } } },
+  { .value1 = { .f = { -0x1.fffffffffffffp+51, -0x1.ffffffffffffep+51 } } },
+
+  { .value1 = { .f = { -0x1.0000000000004p+51, -0x1.0000000000002p+51 } } },
+  { .value1 = { .f = { -0x1.0000000000001p+51, -0x1.0000000000000p+51 } } },
+  { .value1 = { .f = { -0x1.ffffffffffffcp+50, -0x1.ffffffffffffep+50 } } },
+  { .value1 = { .f = { -0x1.ffffffffffffdp+50, -0x1.ffffffffffffcp+50 } } },
+
+  { .value1 = { .f = { -1.00, -0.75 } } },
+  { .value1 = { .f = { -0.50, -0.25 } } }
+};
+
+union value answers_NEAREST_INT[] = {
+  { .f = {  0.00,  0.00 } },
+  { .f = {  0.00,  1.00 } },
+
+  { .f = {  0x1.ffffffffffffcp+50,  0x1.ffffffffffffcp+50 } },
+  { .f = {  0x1.0000000000000p+51,  0x1.0000000000000p+51 } },
+  { .f = {  0x1.0000000000000p+51,  0x1.0000000000000p+51 } },
+  { .f = {  0x1.0000000000002p+51,  0x1.0000000000004p+51 } },
+
+  { .f = {  0x1.ffffffffffffep+51,  0x1.0000000000000p+52 } },
+  { .f = {  0x1.0000000000000p+52,  0x1.0000000000001p+52 } },
+
+  { .f = { -0x1.0000000000001p+52, -0x1.0000000000000p+52 } },
+  { .f = { -0x1.0000000000000p+52, -0x1.ffffffffffffep+51 } },
+
+  { .f = { -0x1.0000000000004p+51, -0x1.0000000000002p+51 } },
+  { .f = { -0x1.0000000000000p+51, -0x1.0000000000000p+51 } },
+  { .f = { -0x1.ffffffffffffcp+50, -0x1.0000000000000p+51 } },
+  { .f = { -0x1.ffffffffffffcp+50, -0x1.ffffffffffffcp+50 } },
+
+  { .f = { -1.00, -1.00 } },
+  { .f = {  0.00,  0.00 } }
+};
+
+union value answers_NEG_INF[] = {
+  { .f = {  0.00,  0.00 } },
+  { .f = {  0.00,  0.00 } },
+
+  { .f = {  0x1.ffffffffffffcp+50,  0x1.ffffffffffffcp+50 } },
+  { .f = {  0x1.ffffffffffffcp+50,  0x1.ffffffffffffcp+50 } },
+  { .f = {  0x1.0000000000000p+51,  0x1.0000000000000p+51 } },
+  { .f = {  0x1.0000000000002p+51,  0x1.0000000000002p+51 } },
+
+  { .f = {  0x1.ffffffffffffep+51,  0x1.ffffffffffffep+51 } },
+  { .f = {  0x1.0000000000000p+52,  0x1.0000000000001p+52 } },
+
+  { .f = { -0x1.0000000000001p+52, -0x1.0000000000000p+52 } },
+  { .f = { -0x1.0000000000000p+52, -0x1.ffffffffffffep+51 } },
+
+  { .f = { -0x1.0000000000004p+51, -0x1.0000000000002p+51 } },
+  { .f = { -0x1.0000000000002p+51, -0x1.0000000000000p+51 } },
+  { .f = { -0x1.ffffffffffffcp+50, -0x1.0000000000000p+51 } },
+  { .f = { -0x1.0000000000000p+51, -0x1.ffffffffffffcp+50 } },
+
+  { .f = { -1.00, -1.00 } },
+  { .f = { -1.00, -1.00 } }
+};
+
+union value answers_POS_INF[] = {
+  { .f = {  0.00,  1.00 } },
+  { .f = {  1.00,  1.00 } },
+
+  { .f = {  0x1.ffffffffffffcp+50,  0x1.0000000000000p+51 } },
+  { .f = {  0x1.0000000000000p+51,  0x1.0000000000000p+51 } },
+  { .f = {  0x1.0000000000000p+51,  0x1.0000000000002p+51 } },
+  { .f = {  0x1.0000000000002p+51,  0x1.0000000000004p+51 } },
+
+  { .f = {  0x1.ffffffffffffep+51,  0x1.0000000000000p+52 } },
+  { .f = {  0x1.0000000000000p+52,  0x1.0000000000001p+52 } },
+
+  { .f = { -0x1.0000000000001p+52, -0x1.0000000000000p+52 } },
+  { .f = { -0x1.ffffffffffffep+51, -0x1.ffffffffffffep+51 } },
+
+  { .f = { -0x1.0000000000004p+51, -0x1.0000000000002p+51 } },
+  { .f = { -0x1.0000000000000p+51, -0x1.0000000000000p+51 } },
+  { .f = { -0x1.ffffffffffffcp+50, -0x1.ffffffffffffcp+50 } },
+  { .f = { -0x1.ffffffffffffcp+50, -0x1.ffffffffffffcp+50 } },
+
+  { .f = { -1.00,  0.00 } },
+  { .f = {  0.00,  0.00 } }
+};
+
+union value answers_ZERO[] = {
+  { .f = {  0.00,  0.00 } },
+  { .f = {  0.00,  0.00 } },
+
+  { .f = {  0x1.ffffffffffffcp+50,  0x1.ffffffffffffcp+50 } },
+  { .f = {  0x1.ffffffffffffcp+50,  0x1.ffffffffffffcp+50 } },
+  { .f = {  0x1.0000000000000p+51,  0x1.0000000000000p+51 } },
+  { .f = {  0x1.0000000000002p+51,  0x1.0000000000002p+51 } },
+
+  { .f = {  0x1.ffffffffffffep+51,  0x1.ffffffffffffep+51 } },
+  { .f = {  0x1.0000000000000p+52,  0x1.0000000000001p+52 } },
+
+  { .f = { -0x1.0000000000001p+52, -0x1.0000000000000p+52 } },
+  { .f = { -0x1.ffffffffffffep+51, -0x1.ffffffffffffep+51 } },
+
+  { .f = { -0x1.0000000000004p+51, -0x1.0000000000002p+51 } },
+  { .f = { -0x1.0000000000000p+51, -0x1.0000000000000p+51 } },
+  { .f = { -0x1.ffffffffffffcp+50, -0x1.ffffffffffffcp+50 } },
+  { .f = { -0x1.ffffffffffffcp+50, -0x1.ffffffffffffcp+50 } },
+
+  { .f = { -1.00,  0.00 } },
+  { .f = {  0.00,  0.00 } }
+};
+
+union value *answers[] = {
+  answers_NEAREST_INT,
+  answers_NEG_INF,
+  answers_POS_INF,
+  answers_ZERO,
+  0 /* CUR_DIRECTION answers depend on current rounding mode.  */
+};
+
+#include "sse4_1-round3.h"
diff --git a/gcc/testsuite/gcc.target/powerpc/sse4_1-roundps.c b/gcc/testsuite/gcc.target/powerpc/sse4_1-roundps.c
new file mode 100644
index 000000000000..6b5362e07590
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/sse4_1-roundps.c
@@ -0,0 +1,98 @@ 
+/* { dg-do run } */
+/* { dg-require-effective-target powerpc_vsx_ok } */
+/* { dg-options "-O2 -mvsx" } */
+
+#define NO_WARN_X86_INTRINSICS 1
+#include <smmintrin.h>
+
+#define VEC_T __m128
+#define FP_T float
+
+#define ROUND_INTRIN(x, ignored, mode) _mm_round_ps (x, mode)
+
+#include "sse4_1-round-data.h"
+
+struct data2 data[] = {
+  { .value1 = { .f = {  0.00,  0.25,  0.50,  0.75 } } },
+
+  { .value1 = { .f = {  0x1.fffff8p+21,  0x1.fffffap+21,
+			0x1.fffffcp+21,  0x1.fffffep+21 } } },
+  { .value1 = { .f = {  0x1.fffffap+22,  0x1.fffffcp+22,
+			0x1.fffffep+22,  0x1.fffffep+23 } } },
+  { .value1 = { .f = { -0x1.fffffep+23, -0x1.fffffep+22,
+		       -0x1.fffffcp+22, -0x1.fffffap+22 } } },
+  { .value1 = { .f = { -0x1.fffffep+21, -0x1.fffffcp+21,
+		       -0x1.fffffap+21, -0x1.fffff8p+21 } } },
+
+  { .value1 = { .f = { -1.00, -0.75, -0.50, -0.25 } } }
+};
+
+union value answers_NEAREST_INT[] = {
+  { .f = {  0.00,  0.00,  0.00,  1.00 } },
+
+  { .f = {  0x1.fffff8p+21,  0x1.fffff8p+21,
+            0x1.000000p+22,  0x1.000000p+22 } },
+  { .f = {  0x1.fffff8p+22,  0x1.fffffcp+22,
+            0x1.000000p+23,  0x1.fffffep+23 } },
+  { .f = { -0x1.fffffep+23, -0x1.000000p+23,
+           -0x1.fffffcp+22, -0x1.fffff8p+22 } },
+  { .f = { -0x1.000000p+22, -0x1.000000p+22,
+           -0x1.fffff8p+21, -0x1.fffff8p+21 } },
+
+  { .f = { -1.00, -1.00,  0.00,  0.00 } }
+};
+
+union value answers_NEG_INF[] = {
+  { .f = {  0.00,  0.00,  0.00,  0.00 } },
+
+  { .f = {  0x1.fffff8p+21,  0x1.fffff8p+21,
+            0x1.fffff8p+21,  0x1.fffff8p+21 } },
+  { .f = {  0x1.fffff8p+22,  0x1.fffffcp+22,
+            0x1.fffffcp+22,  0x1.fffffep+23 } },
+  { .f = { -0x1.fffffep+23, -0x1.000000p+23,
+           -0x1.fffffcp+22, -0x1.fffffcp+22 } },
+  { .f = { -0x1.000000p+22, -0x1.000000p+22,
+           -0x1.000000p+22, -0x1.fffff8p+21 } },
+
+  { .f = { -1.00, -1.00, -1.00, -1.00 } }
+};
+
+union value answers_POS_INF[] = {
+  { .f = {  0.00,  1.00,  1.00,  1.00 } },
+
+  { .f = {  0x1.fffff8p+21,  0x1.000000p+22,
+            0x1.000000p+22,  0x1.000000p+22 } },
+  { .f = {  0x1.fffffcp+22,  0x1.fffffcp+22,
+            0x1.000000p+23,  0x1.fffffep+23 } },
+  { .f = { -0x1.fffffep+23, -0x1.fffffcp+22,
+           -0x1.fffffcp+22, -0x1.fffff8p+22 } },
+  { .f = { -0x1.fffff8p+21, -0x1.fffff8p+21,
+           -0x1.fffff8p+21, -0x1.fffff8p+21 } },
+
+  { .f = { -1.00,  0.00,  0.00,  0.00 } }
+};
+
+union value answers_ZERO[] = {
+  { .f = {  0.00,  0.00,  0.00,  0.00 } },
+
+  { .f = {  0x1.fffff8p+21,  0x1.fffff8p+21,
+            0x1.fffff8p+21,  0x1.fffff8p+21 } },
+  { .f = {  0x1.fffff8p+22,  0x1.fffffcp+22,
+            0x1.fffffcp+22,  0x1.fffffep+23 } },
+  { .f = { -0x1.fffffep+23, -0x1.fffffcp+22,
+           -0x1.fffffcp+22, -0x1.fffff8p+22 } },
+  { .f = { -0x1.fffff8p+21, -0x1.fffff8p+21,
+           -0x1.fffff8p+21, -0x1.fffff8p+21 } },
+
+  { .f = { -1.00,  0.00,  0.00,  0.00 } }
+};
+
+union value *answers[] = {
+  answers_NEAREST_INT,
+  answers_NEG_INF,
+  answers_POS_INF,
+  answers_ZERO,
+  0 /* CUR_DIRECTION answers depend on current rounding mode.  */
+};
+
+#include "sse4_1-round3.h"
diff --git a/gcc/testsuite/gcc.target/powerpc/sse4_1-roundsd.c b/gcc/testsuite/gcc.target/powerpc/sse4_1-roundsd.c
new file mode 100644
index 000000000000..2b0bad6469df
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/sse4_1-roundsd.c
@@ -0,0 +1,256 @@ 
+/* { dg-do run } */
+/* { dg-require-effective-target powerpc_vsx_ok } */
+/* { dg-options "-O2 -mvsx" } */
+
+#include <stdio.h>
+#define NO_WARN_X86_INTRINSICS 1
+#include <smmintrin.h>
+
+#define VEC_T __m128d
+#define FP_T double
+
+#define ROUND_INTRIN(x, y, mode) _mm_round_sd (x, y, mode)
+
+#include "sse4_1-round-data.h"
+
+static struct data2 data[] = {
+  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
+    .value2 = { .f = {  0.00, IGNORED } } },
+  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
+    .value2 = { .f = {  0.25, IGNORED } } },
+  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
+    .value2 = { .f = {  0.50, IGNORED } } },
+  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
+    .value2 = { .f = {  0.75, IGNORED } } },
+
+  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
+    .value2 = { .f = {  0x1.ffffffffffffcp+50, IGNORED } } },
+  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
+    .value2 = { .f = {  0x1.ffffffffffffdp+50, IGNORED } } },
+  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
+    .value2 = { .f = {  0x1.ffffffffffffep+50, IGNORED } } },
+  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
+    .value2 = { .f = {  0x1.fffffffffffffp+50, IGNORED } } },
+  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
+    .value2 = { .f = {  0x1.0000000000000p+51, IGNORED } } },
+  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
+    .value2 = { .f = {  0x1.0000000000001p+51, IGNORED } } },
+  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
+    .value2 = { .f = {  0x1.0000000000002p+51, IGNORED } } },
+  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
+    .value2 = { .f = {  0x1.0000000000003p+51, IGNORED } } },
+
+  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
+    .value2 = { .f = {  0x1.ffffffffffffep+51, IGNORED } } },
+  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
+    .value2 = { .f = {  0x1.fffffffffffffp+51, IGNORED } } },
+  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
+    .value2 = { .f = {  0x1.0000000000000p+52, IGNORED } } },
+  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
+    .value2 = { .f = {  0x1.0000000000001p+52, IGNORED } } },
+
+  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
+    .value2 = { .f = { -0x1.0000000000001p+52, IGNORED } } },
+  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
+    .value2 = { .f = { -0x1.0000000000000p+52, IGNORED } } },
+  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
+    .value2 = { .f = { -0x1.fffffffffffffp+51, IGNORED } } },
+  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
+    .value2 = { .f = { -0x1.ffffffffffffep+51, IGNORED } } },
+
+  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
+    .value2 = { .f = { -0x1.0000000000004p+51, IGNORED } } },
+  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
+    .value2 = { .f = { -0x1.0000000000002p+51, IGNORED } } },
+  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
+    .value2 = { .f = { -0x1.0000000000001p+51, IGNORED } } },
+  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
+    .value2 = { .f = { -0x1.0000000000000p+51, IGNORED } } },
+  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
+    .value2 = { .f = { -0x1.ffffffffffffcp+50, IGNORED } } },
+  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
+    .value2 = { .f = { -0x1.ffffffffffffep+50, IGNORED } } },
+  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
+    .value2 = { .f = { -0x1.ffffffffffffdp+50, IGNORED } } },
+  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
+    .value2 = { .f = { -0x1.ffffffffffffcp+50, IGNORED } } },
+
+  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
+    .value2 = { .f = { -1.00, IGNORED } } },
+  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
+    .value2 = { .f = { -0.75, IGNORED } } },
+  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
+    .value2 = { .f = { -0.50, IGNORED } } },
+  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
+    .value2 = { .f = { -0.25, IGNORED } } }
+};
+
+static union value answers_NEAREST_INT[] = {
+  { .f = {  0.00, PASSTHROUGH } },
+  { .f = {  0.00, PASSTHROUGH } },
+  { .f = {  0.00, PASSTHROUGH } },
+  { .f = {  1.00, PASSTHROUGH } },
+
+  { .f = {  0x1.ffffffffffffcp+50, PASSTHROUGH } },
+  { .f = {  0x1.ffffffffffffcp+50, PASSTHROUGH } },
+  { .f = {  0x1.0000000000000p+51, PASSTHROUGH } },
+  { .f = {  0x1.0000000000000p+51, PASSTHROUGH } },
+  { .f = {  0x1.0000000000000p+51, PASSTHROUGH } },
+  { .f = {  0x1.0000000000000p+51, PASSTHROUGH } },
+  { .f = {  0x1.0000000000002p+51, PASSTHROUGH } },
+  { .f = {  0x1.0000000000004p+51, PASSTHROUGH } },
+
+  { .f = {  0x1.ffffffffffffep+51, PASSTHROUGH } },
+  { .f = {  0x1.0000000000000p+52, PASSTHROUGH } },
+  { .f = {  0x1.0000000000000p+52, PASSTHROUGH } },
+  { .f = {  0x1.0000000000001p+52, PASSTHROUGH } },
+
+  { .f = { -0x1.0000000000001p+52, PASSTHROUGH } },
+  { .f = { -0x1.0000000000000p+52, PASSTHROUGH } },
+  { .f = { -0x1.0000000000000p+52, PASSTHROUGH } },
+  { .f = { -0x1.ffffffffffffep+51, PASSTHROUGH } },
+
+  { .f = { -0x1.0000000000004p+51, PASSTHROUGH } },
+  { .f = { -0x1.0000000000002p+51, PASSTHROUGH } },
+  { .f = { -0x1.0000000000000p+51, PASSTHROUGH } },
+  { .f = { -0x1.0000000000000p+51, PASSTHROUGH } },
+  { .f = { -0x1.ffffffffffffcp+50, PASSTHROUGH } },
+  { .f = { -0x1.0000000000000p+51, PASSTHROUGH } },
+  { .f = { -0x1.ffffffffffffcp+50, PASSTHROUGH } },
+  { .f = { -0x1.ffffffffffffcp+50, PASSTHROUGH } },
+
+  { .f = { -1.00, PASSTHROUGH } },
+  { .f = { -1.00, PASSTHROUGH } },
+  { .f = { -0.00, PASSTHROUGH } },
+  { .f = { -0.00, PASSTHROUGH } }
+};
+
+static union value answers_NEG_INF[] = {
+  { .f = {  0.00, PASSTHROUGH } },
+  { .f = {  0.00, PASSTHROUGH } },
+  { .f = {  0.00, PASSTHROUGH } },
+  { .f = {  0.00, PASSTHROUGH } },
+
+  { .f = {  0x1.ffffffffffffcp+50, PASSTHROUGH } },
+  { .f = {  0x1.ffffffffffffcp+50, PASSTHROUGH } },
+  { .f = {  0x1.ffffffffffffcp+50, PASSTHROUGH } },
+  { .f = {  0x1.ffffffffffffcp+50, PASSTHROUGH } },
+  { .f = {  0x1.0000000000000p+51, PASSTHROUGH } },
+  { .f = {  0x1.0000000000000p+51, PASSTHROUGH } },
+  { .f = {  0x1.0000000000002p+51, PASSTHROUGH } },
+  { .f = {  0x1.0000000000002p+51, PASSTHROUGH } },
+
+  { .f = {  0x1.ffffffffffffep+51, PASSTHROUGH } },
+  { .f = {  0x1.ffffffffffffep+51, PASSTHROUGH } },
+  { .f = {  0x1.0000000000000p+52, PASSTHROUGH } },
+  { .f = {  0x1.0000000000001p+52, PASSTHROUGH } },
+
+  { .f = { -0x1.0000000000001p+52, PASSTHROUGH } },
+  { .f = { -0x1.0000000000000p+52, PASSTHROUGH } },
+  { .f = { -0x1.0000000000000p+52, PASSTHROUGH } },
+  { .f = { -0x1.ffffffffffffep+51, PASSTHROUGH } },
+
+  { .f = { -0x1.0000000000004p+51, PASSTHROUGH } },
+  { .f = { -0x1.0000000000002p+51, PASSTHROUGH } },
+  { .f = { -0x1.0000000000002p+51, PASSTHROUGH } },
+  { .f = { -0x1.0000000000000p+51, PASSTHROUGH } },
+  { .f = { -0x1.ffffffffffffcp+50, PASSTHROUGH } },
+  { .f = { -0x1.0000000000000p+51, PASSTHROUGH } },
+  { .f = { -0x1.0000000000000p+51, PASSTHROUGH } },
+  { .f = { -0x1.ffffffffffffcp+50, PASSTHROUGH } },
+
+  { .f = { -1.00, PASSTHROUGH } },
+  { .f = { -1.00, PASSTHROUGH } },
+  { .f = { -1.00, PASSTHROUGH } },
+  { .f = { -1.00, PASSTHROUGH } }
+};
+
+static union value answers_POS_INF[] = {
+  { .f = {  0.00, PASSTHROUGH } },
+  { .f = {  1.00, PASSTHROUGH } },
+  { .f = {  1.00, PASSTHROUGH } },
+  { .f = {  1.00, PASSTHROUGH } },
+
+  { .f = {  0x1.ffffffffffffcp+50, PASSTHROUGH } },
+  { .f = {  0x1.0000000000000p+51, PASSTHROUGH } },
+  { .f = {  0x1.0000000000000p+51, PASSTHROUGH } },
+  { .f = {  0x1.0000000000000p+51, PASSTHROUGH } },
+  { .f = {  0x1.0000000000000p+51, PASSTHROUGH } },
+  { .f = {  0x1.0000000000002p+51, PASSTHROUGH } },
+  { .f = {  0x1.0000000000002p+51, PASSTHROUGH } },
+  { .f = {  0x1.0000000000004p+51, PASSTHROUGH } },
+
+  { .f = {  0x1.ffffffffffffep+51, PASSTHROUGH } },
+  { .f = {  0x1.0000000000000p+52, PASSTHROUGH } },
+  { .f = {  0x1.0000000000000p+52, PASSTHROUGH } },
+  { .f = {  0x1.0000000000001p+52, PASSTHROUGH } },
+
+  { .f = { -0x1.0000000000001p+52, PASSTHROUGH } },
+  { .f = { -0x1.0000000000000p+52, PASSTHROUGH } },
+  { .f = { -0x1.ffffffffffffep+51, PASSTHROUGH } },
+  { .f = { -0x1.ffffffffffffep+51, PASSTHROUGH } },
+
+  { .f = { -0x1.0000000000004p+51, PASSTHROUGH } },
+  { .f = { -0x1.0000000000002p+51, PASSTHROUGH } },
+  { .f = { -0x1.0000000000000p+51, PASSTHROUGH } },
+  { .f = { -0x1.0000000000000p+51, PASSTHROUGH } },
+  { .f = { -0x1.ffffffffffffcp+50, PASSTHROUGH } },
+  { .f = { -0x1.ffffffffffffcp+50, PASSTHROUGH } },
+  { .f = { -0x1.ffffffffffffcp+50, PASSTHROUGH } },
+  { .f = { -0x1.ffffffffffffcp+50, PASSTHROUGH } },
+
+  { .f = { -1.00, PASSTHROUGH } },
+  { .f = {  0.00, PASSTHROUGH } },
+  { .f = {  0.00, PASSTHROUGH } },
+  { .f = {  0.00, PASSTHROUGH } }
+};
+
+static union value answers_ZERO[] = {
+  { .f = {  0.00, PASSTHROUGH } },
+  { .f = {  0.00, PASSTHROUGH } },
+  { .f = {  0.00, PASSTHROUGH } },
+  { .f = {  0.00, PASSTHROUGH } },
+
+  { .f = {  0x1.ffffffffffffcp+50, PASSTHROUGH } },
+  { .f = {  0x1.ffffffffffffcp+50, PASSTHROUGH } },
+  { .f = {  0x1.ffffffffffffcp+50, PASSTHROUGH } },
+  { .f = {  0x1.ffffffffffffcp+50, PASSTHROUGH } },
+  { .f = {  0x1.0000000000000p+51, PASSTHROUGH } },
+  { .f = {  0x1.0000000000000p+51, PASSTHROUGH } },
+  { .f = {  0x1.0000000000002p+51, PASSTHROUGH } },
+  { .f = {  0x1.0000000000002p+51, PASSTHROUGH } },
+
+  { .f = {  0x1.ffffffffffffep+51, PASSTHROUGH } },
+  { .f = {  0x1.ffffffffffffep+51, PASSTHROUGH } },
+  { .f = {  0x1.0000000000000p+52, PASSTHROUGH } },
+  { .f = {  0x1.0000000000001p+52, PASSTHROUGH } },
+
+  { .f = { -0x1.0000000000001p+52, PASSTHROUGH } },
+  { .f = { -0x1.0000000000000p+52, PASSTHROUGH } },
+  { .f = { -0x1.ffffffffffffep+51, PASSTHROUGH } },
+  { .f = { -0x1.ffffffffffffep+51, PASSTHROUGH } },
+
+  { .f = { -0x1.0000000000004p+51, PASSTHROUGH } },
+  { .f = { -0x1.0000000000002p+51, PASSTHROUGH } },
+  { .f = { -0x1.0000000000000p+51, PASSTHROUGH } },
+  { .f = { -0x1.0000000000000p+51, PASSTHROUGH } },
+  { .f = { -0x1.ffffffffffffcp+50, PASSTHROUGH } },
+  { .f = { -0x1.ffffffffffffcp+50, PASSTHROUGH } },
+  { .f = { -0x1.ffffffffffffcp+50, PASSTHROUGH } },
+  { .f = { -0x1.ffffffffffffcp+50, PASSTHROUGH } },
+
+  { .f = { -1.00, PASSTHROUGH } },
+  { .f = {  0.00, PASSTHROUGH } },
+  { .f = {  0.00, PASSTHROUGH } },
+  { .f = {  0.00, PASSTHROUGH } }
+};
+
+union value *answers[] = {
+  answers_NEAREST_INT,
+  answers_NEG_INF,
+  answers_POS_INF,
+  answers_ZERO,
+  0 /* CUR_DIRECTION answers depend on current rounding mode.  */
+};
+
+#include "sse4_1-round3.h"
diff --git a/gcc/testsuite/gcc.target/powerpc/sse4_1-roundss.c b/gcc/testsuite/gcc.target/powerpc/sse4_1-roundss.c
new file mode 100644
index 000000000000..3154310314a1
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/sse4_1-roundss.c
@@ -0,0 +1,208 @@ 
+/* { dg-do run } */
+/* { dg-require-effective-target powerpc_vsx_ok } */
+/* { dg-options "-O2 -mvsx" } */
+
+#include <stdio.h>
+#define NO_WARN_X86_INTRINSICS 1
+#include <smmintrin.h>
+
+#define VEC_T __m128
+#define FP_T float
+
+#define ROUND_INTRIN(x, y, mode) _mm_round_ss (x, y, mode)
+
+#include "sse4_1-round-data.h"
+
+static struct data2 data[] = {
+  { .value1 = { .f = { IGNORED, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+    .value2 = { .f = {  0.00, IGNORED, IGNORED, IGNORED } } },
+  { .value1 = { .f = { IGNORED, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+    .value2 = { .f = {  0.25, IGNORED, IGNORED, IGNORED } } },
+  { .value1 = { .f = { IGNORED, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+    .value2 = { .f = {  0.50, IGNORED, IGNORED, IGNORED } } },
+  { .value1 = { .f = { IGNORED, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+    .value2 = { .f = {  0.75, IGNORED, IGNORED, IGNORED } } },
+
+  { .value1 = { .f = { IGNORED, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+    .value2 = { .f = {  0x1.fffff8p+21, IGNORED, IGNORED, IGNORED } } },
+  { .value1 = { .f = { IGNORED, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+    .value2 = { .f = {  0x1.fffffap+21, IGNORED, IGNORED, IGNORED } } },
+  { .value1 = { .f = { IGNORED, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+    .value2 = { .f = {  0x1.fffffcp+21, IGNORED, IGNORED, IGNORED } } },
+  { .value1 = { .f = { IGNORED, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+    .value2 = { .f = {  0x1.fffffep+21, IGNORED, IGNORED, IGNORED } } },
+
+  { .value1 = { .f = { IGNORED, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+    .value2 = { .f = {  0x1.fffffap+22, IGNORED, IGNORED, IGNORED } } },
+  { .value1 = { .f = { IGNORED, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+    .value2 = { .f = {  0x1.fffffcp+22, IGNORED, IGNORED, IGNORED } } },
+  { .value1 = { .f = { IGNORED, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+    .value2 = { .f = {  0x1.fffffep+22, IGNORED, IGNORED, IGNORED } } },
+  { .value1 = { .f = { IGNORED, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+    .value2 = { .f = {  0x1.fffffep+23, IGNORED, IGNORED, IGNORED } } },
+
+  { .value1 = { .f = { IGNORED, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+    .value2 = { .f = { -0x1.fffffep+23, IGNORED, IGNORED, IGNORED } } },
+  { .value1 = { .f = { IGNORED, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+    .value2 = { .f = { -0x1.fffffep+22, IGNORED, IGNORED, IGNORED } } },
+  { .value1 = { .f = { IGNORED, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+    .value2 = { .f = { -0x1.fffffcp+22, IGNORED, IGNORED, IGNORED } } },
+  { .value1 = { .f = { IGNORED, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+    .value2 = { .f = { -0x1.fffffap+22, IGNORED, IGNORED, IGNORED } } },
+
+  { .value1 = { .f = { IGNORED, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+    .value2 = { .f = { -0x1.fffffep+21, IGNORED, IGNORED, IGNORED } } },
+  { .value1 = { .f = { IGNORED, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+    .value2 = { .f = { -0x1.fffffcp+21, IGNORED, IGNORED, IGNORED } } },
+  { .value1 = { .f = { IGNORED, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+    .value2 = { .f = { -0x1.fffffap+21, IGNORED, IGNORED, IGNORED } } },
+  { .value1 = { .f = { IGNORED, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+    .value2 = { .f = { -0x1.fffff8p+21, IGNORED, IGNORED, IGNORED } } },
+
+  { .value1 = { .f = { IGNORED, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+    .value2 = { .f = { -1.00, IGNORED, IGNORED, IGNORED } } },
+  { .value1 = { .f = { IGNORED, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+    .value2 = { .f = { -0.75, IGNORED, IGNORED, IGNORED } } },
+  { .value1 = { .f = { IGNORED, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+    .value2 = { .f = { -0.50, IGNORED, IGNORED, IGNORED } } },
+  { .value1 = { .f = { IGNORED, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+    .value2 = { .f = { -0.25, IGNORED, IGNORED, IGNORED } } }
+};
+
+static union value answers_NEAREST_INT[] = {
+  { .f = {  0.00, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+  { .f = {  0.00, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+  { .f = {  0.00, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+  { .f = {  1.00, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+
+  { .f = {  0x1.fffff8p+21, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+  { .f = {  0x1.fffff8p+21, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+  { .f = {  0x1.000000p+22, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+  { .f = {  0x1.000000p+22, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+
+  { .f = {  0x1.fffff8p+22, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+  { .f = {  0x1.fffffcp+22, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+  { .f = {  0x1.000000p+23, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+  { .f = {  0x1.fffffep+23, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+
+  { .f = { -0x1.fffffep+23, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+  { .f = { -0x1.000000p+23, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+  { .f = { -0x1.fffffcp+22, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+  { .f = { -0x1.fffff8p+22, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+
+  { .f = { -0x1.000000p+22, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+  { .f = { -0x1.000000p+22, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+  { .f = { -0x1.fffff8p+21, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+  { .f = { -0x1.fffff8p+21, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+
+  { .f = { -1.00, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+  { .f = { -1.00, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+  { .f = { -0.00, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+  { .f = { -0.00, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } }
+};
+
+static union value answers_NEG_INF[] = {
+  { .f = {  0.00, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+  { .f = {  0.00, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+  { .f = {  0.00, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+  { .f = {  0.00, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+
+  { .f = {  0x1.fffff8p+21, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+  { .f = {  0x1.fffff8p+21, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+  { .f = {  0x1.fffff8p+21, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+  { .f = {  0x1.fffff8p+21, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+
+  { .f = {  0x1.fffff8p+22, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+  { .f = {  0x1.fffffcp+22, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+  { .f = {  0x1.fffffcp+22, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+  { .f = {  0x1.fffffep+23, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+
+  { .f = { -0x1.fffffep+23, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+  { .f = { -0x1.000000p+23, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+  { .f = { -0x1.fffffcp+22, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+  { .f = { -0x1.fffffcp+22, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+
+  { .f = { -0x1.000000p+22, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+  { .f = { -0x1.000000p+22, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+  { .f = { -0x1.000000p+22, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+  { .f = { -0x1.fffff8p+21, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+
+  { .f = { -1.00, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+  { .f = { -1.00, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+  { .f = { -1.00, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+  { .f = { -1.00, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } }
+};
+
+static union value answers_POS_INF[] = {
+  { .f = {  0.00, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+  { .f = {  1.00, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+  { .f = {  1.00, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+  { .f = {  1.00, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+
+  { .f = {  0x1.fffff8p+21, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+  { .f = {  0x1.000000p+22, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+  { .f = {  0x1.000000p+22, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+  { .f = {  0x1.000000p+22, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+
+  { .f = {  0x1.fffffcp+22, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+  { .f = {  0x1.fffffcp+22, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+  { .f = {  0x1.000000p+23, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+  { .f = {  0x1.fffffep+23, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+
+  { .f = { -0x1.fffffep+23, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+  { .f = { -0x1.fffffcp+22, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+  { .f = { -0x1.fffffcp+22, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+  { .f = { -0x1.fffff8p+22, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+
+  { .f = { -0x1.fffff8p+21, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+  { .f = { -0x1.fffff8p+21, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+  { .f = { -0x1.fffff8p+21, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+  { .f = { -0x1.fffff8p+21, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+
+  { .f = { -1.00, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+  { .f = {  0.00, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+  { .f = {  0.00, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+  { .f = {  0.00, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } }
+};
+
+static union value answers_ZERO[] = {
+  { .f = {  0.00, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+  { .f = {  0.00, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+  { .f = {  0.00, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+  { .f = {  0.00, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+
+  { .f = {  0x1.fffff8p+21, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+  { .f = {  0x1.fffff8p+21, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+  { .f = {  0x1.fffff8p+21, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+  { .f = {  0x1.fffff8p+21, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+
+  { .f = {  0x1.fffff8p+22, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+  { .f = {  0x1.fffffcp+22, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+  { .f = {  0x1.fffffcp+22, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+  { .f = {  0x1.fffffep+23, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+
+  { .f = { -0x1.fffffep+23, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+  { .f = { -0x1.fffffcp+22, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+  { .f = { -0x1.fffffcp+22, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+  { .f = { -0x1.fffff8p+22, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+
+  { .f = { -0x1.fffff8p+21, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+  { .f = { -0x1.fffff8p+21, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+  { .f = { -0x1.fffff8p+21, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+  { .f = { -0x1.fffff8p+21, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+
+  { .f = { -1.00, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+  { .f = {  0.00, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+  { .f = {  0.00, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+  { .f = {  0.00, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } }
+};
+
+union value *answers[] = {
+  answers_NEAREST_INT,
+  answers_NEG_INF,
+  answers_POS_INF,
+  answers_ZERO,
+  0 /* CUR_DIRECTION answers depend on current rounding mode.  */
+};
+
+#include "sse4_1-round3.h"