
Add __builtin_convertvector support (PR c++/85052)

Message ID 20190103100640.GM30353@tucnak

Commit Message

Jakub Jelinek Jan. 3, 2019, 10:06 a.m. UTC
Hi!

The following patch adds support for the __builtin_convertvector builtin.
C casts on generic vectors are just a reinterpretation of the bits (i.e. a
VCE); this builtin allows casting int/unsigned elements to float or vice
versa, or promoting/demoting them.  The doc/ change is missing; I will
write it soon.

The builtin appeared in clang 3.4, I believe, and is apparently in
real-world use, as e.g. Honza reported.  The first argument is an
expression with vector type, the second argument is a vector type
(similarly to e.g. va_arg) to which the first argument should be
converted.  Both vector types need to have the same number of elements.
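
For illustration, a minimal usage sketch (my own example, not from the
patch; the typedefs follow the style of the testcases below):

typedef int v4si __attribute__((vector_size (4 * sizeof (int))));
typedef float v4sf __attribute__((vector_size (4 * sizeof (float))));

void
example (v4si i, v4sf *conv, v4sf *bits)
{
  /* Element-wise value conversion; for i = { 1, 2, -3, -4 } this
     stores { 1.0f, 2.0f, -3.0f, -4.0f }.  */
  *conv = __builtin_convertvector (i, v4sf);
  /* By contrast, a C cast between same-size generic vectors just
     reinterprets the bits (a VIEW_CONVERT_EXPR); the values are not
     converted.  */
  *bits = (v4sf) i;
}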

I've implemented same-element-size (and thus also same-whole-vector-size)
conversions efficiently: signed to unsigned and vice versa, or identical
vector types, use just a VCE; e.g. int <-> float or long long <-> double
use the appropriate optab, possibly repeated multiple times for very large
vectors.  For everything there is a fallback that lowers
__builtin_convertvector (x, t)
as { (__typeof (t[0])) x[0], (__typeof (t[1])) x[1], ... }.
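
Spelled out by hand, that fallback is equivalent to something like the
following element-wise form (my sketch of the semantics, not the patch's
internal representation):

typedef double v4df __attribute__((vector_size (4 * sizeof (double))));
typedef int v4si __attribute__((vector_size (4 * sizeof (int))));

/* What the piecewise fallback for __builtin_convertvector (x, v4si)
   amounts to for a v4df argument.  */
v4si
fallback (v4df x)
{
  return (v4si) { (int) x[0], (int) x[1], (int) x[2], (int) x[3] };
}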

What isn't implemented efficiently (yet) are the narrowing and widening
conversions; the optabs we have are meant for same-size vectors, so for
packing we have 2 arguments that we pack into 1, and for unpacking we have
the lo/hi variants, but in this case, at least for the most common vectors,
we have just one argument and want a result with the same number of
elements.  The AVX* instructions operating on different vector sizes are
what handles this most efficiently; of course, for large generic vectors we
can easily use these optabs.  Shall we go for e.g. trying to pack the
argument and an all-zeros dummy operand and pick the low half of the result
vector, or pick the low and high halves of the argument and use half-sized
vector operations, or both?
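
To make the two options concrete, here is a hand-written C illustration for
a hypothetical v4di -> v4si narrowing (my own sketch; at the GIMPLE level
the pack step would be a VEC_PACK_TRUNC_EXPR rather than the element-wise
code below):

typedef long long v4di __attribute__((vector_size (4 * sizeof (long long))));
typedef long long v2di __attribute__((vector_size (2 * sizeof (long long))));
typedef int v4si __attribute__((vector_size (4 * sizeof (int))));
typedef int v8si __attribute__((vector_size (8 * sizeof (int))));

/* Option 1: pack the argument together with an all-zeros dummy operand
   into a full-width v8si, then keep only the low half.  */
v4si
narrow_via_dummy (v4di x)
{
  v4di zero = { 0, 0, 0, 0 };
  v8si full = { (int) x[0], (int) x[1], (int) x[2], (int) x[3],
		(int) zero[0], (int) zero[1], (int) zero[2], (int) zero[3] };
  v4si lo;
  __builtin_memcpy (&lo, &full, sizeof (lo));
  return lo;
}

/* Option 2: split the argument into its low and high v2di halves and
   combine them with one half-sized pack operation.  */
v4si
narrow_via_halves (v4di x)
{
  v2di lo, hi;
  __builtin_memcpy (&lo, &x, sizeof (lo));
  __builtin_memcpy (&hi, (char *) &x + sizeof (lo), sizeof (hi));
  return (v4si) { (int) lo[0], (int) lo[1], (int) hi[0], (int) hi[1] };
}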

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2019-01-03  Jakub Jelinek  <jakub@redhat.com>

	PR c++/85052
	* tree-vect-generic.c (expand_vector_piecewise): Add defaulted
	ret_type argument, if non-NULL, use that in preference to type
	for the result type.
	(expand_vector_parallel): Formatting fix.
	(do_vec_conversion, expand_vector_conversion): New functions.
	(expand_vector_operations_1): Call expand_vector_conversion
	for VEC_CONVERT ifn calls.
	* internal-fn.def (VEC_CONVERT): New internal function.
	* internal-fn.c (expand_VEC_CONVERT): New function.
	* fold-const-call.c (fold_const_vec_convert): New function.
	(fold_const_call): Use it for CFN_VEC_CONVERT.
c-family/
	* c-common.h (enum rid): Add RID_BUILTIN_CONVERTVECTOR.
	(c_build_vec_convert): Declare.
	* c-common.c (c_build_vec_convert): New function.
c/
	* c-parser.c (c_parser_postfix_expression): Parse
	__builtin_convertvector.
cp/
	* cp-tree.h (cp_build_vec_convert): Declare.
	* parser.c (cp_parser_postfix_expression): Parse
	__builtin_convertvector.
	* constexpr.c: Include fold-const-call.h.
	(cxx_eval_internal_function): Handle IFN_VEC_CONVERT.
	(potential_constant_expression_1): Likewise.
	* semantics.c (cp_build_vec_convert): New function.
	* pt.c (tsubst_copy_and_build): Handle CALL_EXPR to
	IFN_VEC_CONVERT.
testsuite/
	* c-c++-common/builtin-convertvector-1.c: New test.
	* c-c++-common/torture/builtin-convertvector-1.c: New test.
	* g++.dg/ext/builtin-convertvector-1.C: New test.
	* g++.dg/cpp0x/constexpr-builtin4.C: New test.


	Jakub

Comments

Marc Glisse Jan. 3, 2019, 10:48 a.m. UTC | #1
On Thu, 3 Jan 2019, Jakub Jelinek wrote:

> The following patch adds support for the __builtin_convertvector builtin.
> C casts on generic vectors are just reinterpretation of the bits (i.e. a
> VCE), this builtin allows to cast int/unsigned elements to float or vice
> versa or promote/demote them.  doc/ change is missing, will write it soon.
>
> The builtin appeared in I think clang 3.4 and is apparently in real-world
> use as e.g. Honza reported.  The first argument is an expression with vector
> type, the second argument is a vector type (similarly e.g. to va_arg), to
> which the first argument should be converted.  Both vector types need to
> have the same number of elements.
>
> I've implemented same element size (thus also whole vector size) conversions
> efficiently - signed to unsigned and vice versa or same vector type just
> using a VCE, for e.g. int <-> float or long long <-> double using
> appropriate optab, possibly repeated multiple times for very large vectors.

Hello,

IIUC, you only lower __builtin_convertvector to VCE or FLOAT_EXPR or 
whatever in tree-vect-generic.  That seems quite late.  At least for the 
"easy" same-size case, I think we should do it early (gimplification?), 
before we start optimizing, without checking whether it is supported by the 
target (generic lowering can fix that up later).  Of course that can be 
changed later; getting the basic functionality in comes first.

(while you are writing the doc patch: tree.def and generic.texi do not say 
anything about using FLOAT_EXPR on a vector)
Jakub Jelinek Jan. 3, 2019, 11:04 a.m. UTC | #2
On Thu, Jan 03, 2019 at 11:48:12AM +0100, Marc Glisse wrote:
> > The following patch adds support for the __builtin_convertvector builtin.
> > C casts on generic vectors are just reinterpretation of the bits (i.e. a
> > VCE), this builtin allows to cast int/unsigned elements to float or vice
> > versa or promote/demote them.  doc/ change is missing, will write it soon.
> > 
> > The builtin appeared in I think clang 3.4 and is apparently in real-world
> > use as e.g. Honza reported.  The first argument is an expression with vector
> > type, the second argument is a vector type (similarly e.g. to va_arg), to
> > which the first argument should be converted.  Both vector types need to
> > have the same number of elements.
> > 
> > I've implemented same element size (thus also whole vector size) conversions
> > efficiently - signed to unsigned and vice versa or same vector type just
> > using a VCE, for e.g. int <-> float or long long <-> double using
> > appropriate optab, possibly repeated multiple times for very large vectors.
> 
> IIUC, you only lower __builtin_convertvector to VCE or FLOAT_EXPR or
> whatever in tree-vect-generic. That seems quite late. At least for the
> "easy" same-size case, I think we should do it early (gimplification?),

No, it must not be done at gimplification time: think about OpenMP/OpenACC
offloading, where the target before IPA optimizations might not be the
target after them.  While the targets have to agree on ABI issues, the
optabs definitely can be and are different, and these optabs, originally
added for the vectorizer, are something that doesn't have a fallback;
whatever introduces them into the IL is responsible for verifying that they
are supported.

It could be done in some post-IPA pass, perhaps by just calling the
tree-vect-generic.c function added in the patch from somewhere else, maybe
with a special argument that would handle only the single-op cases and not
the others.

That said, I'm not sure if e.g. using an opaque builtin for the conversion,
as supportable_convert_operation sometimes does, is better than this ifn.
What exact optimization opportunities are you looking for if it is lowered
earlier?  I have the VECTOR_CST folding in place...

> before we start optimizing, without checking if it is supported by the
> target (generic lowering can fix that up later). Of course that can be
> changed later, getting the basic functionality comes first.

	Jakub
Richard Biener Jan. 3, 2019, 11:16 a.m. UTC | #3
On Thu, 3 Jan 2019, Jakub Jelinek wrote:

> Hi!
> 
> The following patch adds support for the __builtin_convertvector builtin.
> C casts on generic vectors are just reinterpretation of the bits (i.e. a
> VCE), this builtin allows to cast int/unsigned elements to float or vice
> versa or promote/demote them.  doc/ change is missing, will write it soon.
> 
> The builtin appeared in I think clang 3.4 and is apparently in real-world
> use as e.g. Honza reported.  The first argument is an expression with vector
> type, the second argument is a vector type (similarly e.g. to va_arg), to
> which the first argument should be converted.  Both vector types need to
> have the same number of elements.
> 
> I've implemented same element size (thus also whole vector size) conversions
> efficiently - signed to unsigned and vice versa or same vector type just
> using a VCE, for e.g. int <-> float or long long <-> double using
> appropriate optab, possibly repeated multiple times for very large vectors.
> For everything there is a fallback to lower __builtin_convertvector (x, t)
> as { (__typeof (t[0])) x[0], (__typeof (t[1])) x[1], ... }.
> 
> What isn't implemented efficiently (yet) are the narrowing or widening
> conversions; the optabs we have are meant for same size vectors, so
> for the packing we have 2 arguments that we pack into 1, for unpacking we
> have those lo/hi variants, but in this case at least for the most common
> vectors we have just one argument and want result with the same number of
> elements.  The AVX* different vector size instructions is the thing that
> does this most efficiently, of course for large generic vectors we can
> easily use these optabs.  Shall we go for e.g. trying to pack the argument
> and all zeros dummy operand and pick the low half of the result vector,
> or pick the low and high halves of the argument and use a half sized vector
> operations, or both?

I guess it depends on target capabilities - I think
__builtin_convertvector is a bit "misdesigned" for pack/unpack.  You
also have to consider a v2di to v2qi conversion, which requires
several narrowing steps.  Does the clang documentation give any
hints on how to "efficiently" use __builtin_convertvector for
packing/unpacking without exposing too much of the target architecture?

But yes, for unpacking you'd use a series of vec_unpack_*_lo_expr
with padded input (padded with "don't care" if we had that; on
RTL we'd use a paradoxical subreg, and on GIMPLE we _might_ consider
allowing a VCE of different size).  Or we could simply allow half-size
input operands to vec_unpack_*_lo where that expands to paradoxical
subregs (a bit difficult for the optab query, I guess).

For packing you'd use a series of vec_pack_* on the argument split
into two halves via BIT_FIELD_REF.

What does clang do for testcases that request promotion/demotion?

> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

do_vec_conversion needs a comment.  Overall the patch (with its
existing features) looks OK to me.

As for Marc's comments, I agree that vector lowering happens quite late.
It might for example be useful to lower before vectorization (or
any loop optimization) so that unhandled generic vector code can
eventually be vectorized differently.  But that's something to investigate
for GCC 10.

Giving FE maintainers a chance to comment, so no overall ACK yet.

Thanks,
Richard.

> 2019-01-03  Jakub Jelinek  <jakub@redhat.com>
> 
> 	PR c++/85052
> 	* tree-vect-generic.c (expand_vector_piecewise): Add defaulted
> 	ret_type argument, if non-NULL, use that in preference to type
> 	for the result type.
> 	(expand_vector_parallel): Formatting fix.
> 	(do_vec_conversion, expand_vector_conversion): New functions.
> 	(expand_vector_operations_1): Call expand_vector_conversion
> 	for VEC_CONVERT ifn calls.
> 	* internal-fn.def (VEC_CONVERT): New internal function.
> 	* internal-fn.c (expand_VEC_CONVERT): New function.
> 	* fold-const-call.c (fold_const_vec_convert): New function.
> 	(fold_const_call): Use it for CFN_VEC_CONVERT.
> c-family/
> 	* c-common.h (enum rid): Add RID_BUILTIN_CONVERTVECTOR.
> 	(c_build_vec_convert): Declare.
> 	* c-common.c (c_build_vec_convert): New function.
> c/
> 	* c-parser.c (c_parser_postfix_expression): Parse
> 	__builtin_convertvector.
> cp/
> 	* cp-tree.h (cp_build_vec_convert): Declare.
> 	* parser.c (cp_parser_postfix_expression): Parse
> 	__builtin_convertvector.
> 	* constexpr.c: Include fold-const-call.h.
> 	(cxx_eval_internal_function): Handle IFN_VEC_CONVERT.
> 	(potential_constant_expression_1): Likewise.
> 	* semantics.c (cp_build_vec_convert): New function.
> 	* pt.c (tsubst_copy_and_build): Handle CALL_EXPR to
> 	IFN_VEC_CONVERT.
> testsuite/
> 	* c-c++-common/builtin-convertvector-1.c: New test.
> 	* c-c++-common/torture/builtin-convertvector-1.c: New test.
> 	* g++.dg/ext/builtin-convertvector-1.C: New test.
> 	* g++.dg/cpp0x/constexpr-builtin4.C: New test.
> 
> --- gcc/tree-vect-generic.c.jj	2019-01-01 12:37:17.084976148 +0100
> +++ gcc/tree-vect-generic.c	2019-01-02 17:51:28.012876543 +0100
> @@ -267,7 +267,8 @@ do_negate (gimple_stmt_iterator *gsi, tr
>  static tree
>  expand_vector_piecewise (gimple_stmt_iterator *gsi, elem_op_func f,
>  			 tree type, tree inner_type,
> -			 tree a, tree b, enum tree_code code)
> +			 tree a, tree b, enum tree_code code,
> +			 tree ret_type = NULL_TREE)
>  {
>    vec<constructor_elt, va_gc> *v;
>    tree part_width = TYPE_SIZE (inner_type);
> @@ -278,23 +279,27 @@ expand_vector_piecewise (gimple_stmt_ite
>    int i;
>    location_t loc = gimple_location (gsi_stmt (*gsi));
>  
> -  if (types_compatible_p (gimple_expr_type (gsi_stmt (*gsi)), type))
> +  if (ret_type
> +      || types_compatible_p (gimple_expr_type (gsi_stmt (*gsi)), type))
>      warning_at (loc, OPT_Wvector_operation_performance,
>  		"vector operation will be expanded piecewise");
>    else
>      warning_at (loc, OPT_Wvector_operation_performance,
>  		"vector operation will be expanded in parallel");
>  
> +  if (!ret_type)
> +    ret_type = type;
>    vec_alloc (v, (nunits + delta - 1) / delta);
>    for (i = 0; i < nunits;
>         i += delta, index = int_const_binop (PLUS_EXPR, index, part_width))
>      {
> -      tree result = f (gsi, inner_type, a, b, index, part_width, code, type);
> +      tree result = f (gsi, inner_type, a, b, index, part_width, code,
> +		       ret_type);
>        constructor_elt ce = {NULL_TREE, result};
>        v->quick_push (ce);
>      }
>  
> -  return build_constructor (type, v);
> +  return build_constructor (ret_type, v);
>  }
>  
>  /* Expand a vector operation to scalars with the freedom to use
> @@ -302,8 +307,7 @@ expand_vector_piecewise (gimple_stmt_ite
>     in the vector type.  */
>  static tree
>  expand_vector_parallel (gimple_stmt_iterator *gsi, elem_op_func f, tree type,
> -			tree a, tree b,
> -			enum tree_code code)
> +			tree a, tree b, enum tree_code code)
>  {
>    tree result, compute_type;
>    int n_words = tree_to_uhwi (TYPE_SIZE_UNIT (type)) / UNITS_PER_WORD;
> @@ -1547,6 +1551,147 @@ expand_vector_scalar_condition (gimple_s
>    update_stmt (gsi_stmt (*gsi));
>  }
>  
> +static tree
> +do_vec_conversion (gimple_stmt_iterator *gsi, tree inner_type, tree a,
> +		   tree decl, tree bitpos, tree bitsize,
> +		   enum tree_code code, tree type)
> +{
> +  a = tree_vec_extract (gsi, inner_type, a, bitsize, bitpos);
> +  if (!VECTOR_TYPE_P (inner_type))
> +    return gimplify_build1 (gsi, code, TREE_TYPE (type), a);
> +  if (code == CALL_EXPR)
> +    {
> +      gimple *g = gimple_build_call (decl, 1, a);
> +      tree lhs = make_ssa_name (TREE_TYPE (TREE_TYPE (decl)));
> +      gimple_call_set_lhs (g, lhs);
> +      gsi_insert_before (gsi, g, GSI_SAME_STMT);
> +      return lhs;
> +    }
> +  else
> +    {
> +      tree outer_type = build_vector_type (TREE_TYPE (type),
> +					   TYPE_VECTOR_SUBPARTS (inner_type));
> +      return gimplify_build1 (gsi, code, outer_type, a);
> +    }
> +}
> +
> +/* Expand VEC_CONVERT ifn call.  */
> +
> +static void
> +expand_vector_conversion (gimple_stmt_iterator *gsi)
> +{
> +  gimple *stmt = gsi_stmt (*gsi);
> +  gimple *g;
> +  tree lhs = gimple_call_lhs (stmt);
> +  tree arg = gimple_call_arg (stmt, 0);
> +  tree decl = NULL_TREE;
> +  tree ret_type = TREE_TYPE (lhs);
> +  tree arg_type = TREE_TYPE (arg);
> +  tree new_rhs, compute_type = TREE_TYPE (arg_type);
> +  enum tree_code code = NOP_EXPR;
> +  enum tree_code code1 = ERROR_MARK;
> +  enum { NARROW, NONE, WIDEN } modifier = NONE;
> +  optab optab1 = unknown_optab;
> +
> +  gcc_checking_assert (VECTOR_TYPE_P (ret_type) && VECTOR_TYPE_P (arg_type));
> +  gcc_checking_assert (tree_fits_uhwi_p (TYPE_SIZE (TREE_TYPE (ret_type))));
> +  gcc_checking_assert (tree_fits_uhwi_p (TYPE_SIZE (TREE_TYPE (arg_type))));
> +  if (INTEGRAL_TYPE_P (TREE_TYPE (ret_type))
> +      && SCALAR_FLOAT_TYPE_P (TREE_TYPE (arg_type)))
> +    code = FIX_TRUNC_EXPR;
> +  else if (INTEGRAL_TYPE_P (TREE_TYPE (arg_type))
> +	   && SCALAR_FLOAT_TYPE_P (TREE_TYPE (ret_type)))
> +    code = FLOAT_EXPR;
> +  if (tree_to_uhwi (TYPE_SIZE (TREE_TYPE (ret_type)))
> +      < tree_to_uhwi (TYPE_SIZE (TREE_TYPE (arg_type))))
> +    modifier = NARROW;
> +  else if (tree_to_uhwi (TYPE_SIZE (TREE_TYPE (ret_type)))
> +	   > tree_to_uhwi (TYPE_SIZE (TREE_TYPE (arg_type))))
> +    modifier = WIDEN;
> +
> +  if (modifier == NONE && (code == FIX_TRUNC_EXPR || code == FLOAT_EXPR))
> +    {
> +      if (supportable_convert_operation (code, ret_type, arg_type, &decl,
> +					 &code1))
> +	{
> +	  if (code1 == CALL_EXPR)
> +	    {
> +	      g = gimple_build_call (decl, 1, arg);
> +	      gimple_call_set_lhs (g, lhs);
> +	    }
> +	  else
> +	    g = gimple_build_assign (lhs, code1, arg);
> +	  gsi_replace (gsi, g, false);
> +	  return;
> +	}
> +      /* Can't use get_compute_type here, as supportable_convert_operation
> +	 doesn't necessarily use an optab and needs two arguments.  */
> +      tree vector_compute_type
> +	= type_for_widest_vector_mode (TREE_TYPE (arg_type), mov_optab);
> +      unsigned HOST_WIDE_INT nelts;
> +      if (vector_compute_type
> +	  && VECTOR_MODE_P (TYPE_MODE (vector_compute_type))
> +	  && subparts_gt (arg_type, vector_compute_type)
> +	  && TYPE_VECTOR_SUBPARTS (vector_compute_type).is_constant (&nelts))
> +	{
> +	  while (nelts > 1)
> +	    {
> +	      tree ret1_type = build_vector_type (TREE_TYPE (ret_type), nelts);
> +	      tree arg1_type = build_vector_type (TREE_TYPE (arg_type), nelts);
> +	      if (supportable_convert_operation (code, ret1_type, arg1_type,
> +						 &decl, &code1))
> +		{
> +		  new_rhs = expand_vector_piecewise (gsi, do_vec_conversion,
> +						     ret_type, arg1_type, arg,
> +						     decl, code1);
> +		  g = gimple_build_assign (lhs, new_rhs);
> +		  gsi_replace (gsi, g, false);
> +		  return;
> +		}
> +	      nelts = nelts / 2;
> +	    }
> +	}
> +    }
> +  /* FIXME: __builtin_convertvector argument and return vectors have the same
> +     number of elements, so for both narrowing and widening we need to figure
> +     out what is the best set of optabs to use.  E.g. for NARROW
> +     VEC_PACK_TRUNC_EXPR has 2 arguments, shall we prefer emitting that with
> +     one argument of arg and another argument all zeros and extract first
> +     half of the resulting vector, or extract lo and hi halves of the arg
> +     vector and use VEC_PACK_TRUNC_EXPR on those?  */
> +  else if (0 && modifier == NARROW)
> +    {
> +      switch (code)
> +	{
> +	case NOP_EXPR:
> +	  code1 = VEC_PACK_TRUNC_EXPR;
> +	  optab1 = optab_for_tree_code (code1, arg_type, optab_default);
> +	  break;
> +	case FIX_TRUNC_EXPR:
> +	  code1 = VEC_PACK_FIX_TRUNC_EXPR;
> +	  /* The signedness is determined from output operand.  */
> +	  optab1 = optab_for_tree_code (code1, ret_type, optab_default);
> +	  break;
> +	case FLOAT_EXPR:
> +	  code1 = VEC_PACK_FLOAT_EXPR;
> +	  optab1 = optab_for_tree_code (code1, arg_type, optab_default);
> +	  break;
> +	default:
> +	  gcc_unreachable ();
> +	}
> +
> +      if (optab1)
> +	compute_type = get_compute_type (code1, optab1, arg_type);
> +      (void) compute_type;
> +    }
> +
> +  new_rhs = expand_vector_piecewise (gsi, do_vec_conversion, arg_type,
> +				     TREE_TYPE (arg_type), arg,
> +				     NULL_TREE, code, ret_type);
> +  g = gimple_build_assign (lhs, new_rhs);
> +  gsi_replace (gsi, g, false);
> +}
> +
>  /* Process one statement.  If we identify a vector operation, expand it.  */
>  
>  static void
> @@ -1561,7 +1706,11 @@ expand_vector_operations_1 (gimple_stmt_
>    /* Only consider code == GIMPLE_ASSIGN. */
>    gassign *stmt = dyn_cast <gassign *> (gsi_stmt (*gsi));
>    if (!stmt)
> -    return;
> +    {
> +      if (gimple_call_internal_p (gsi_stmt (*gsi), IFN_VEC_CONVERT))
> +	expand_vector_conversion (gsi);
> +      return;
> +    }
>  
>    code = gimple_assign_rhs_code (stmt);
>    rhs_class = get_gimple_rhs_class (code);
> --- gcc/internal-fn.def.jj	2019-01-01 12:37:17.893962875 +0100
> +++ gcc/internal-fn.def	2019-01-02 11:24:24.307681792 +0100
> @@ -296,6 +296,7 @@ DEF_INTERNAL_FN (SUB_OVERFLOW, ECF_CONST
>  DEF_INTERNAL_FN (MUL_OVERFLOW, ECF_CONST | ECF_LEAF | ECF_NOTHROW, NULL)
>  DEF_INTERNAL_FN (TSAN_FUNC_EXIT, ECF_NOVOPS | ECF_LEAF | ECF_NOTHROW, NULL)
>  DEF_INTERNAL_FN (VA_ARG, ECF_NOTHROW | ECF_LEAF, NULL)
> +DEF_INTERNAL_FN (VEC_CONVERT, ECF_CONST | ECF_LEAF | ECF_NOTHROW, NULL)
>  
>  /* An unduplicable, uncombinable function.  Generally used to preserve
>     a CFG property in the face of jump threading, tail merging or
> --- gcc/internal-fn.c.jj	2019-01-01 12:37:19.567935410 +0100
> +++ gcc/internal-fn.c	2019-01-02 11:24:24.315681661 +0100
> @@ -2581,6 +2581,15 @@ expand_VA_ARG (internal_fn, gcall *)
>    gcc_unreachable ();
>  }
>  
> +/* IFN_VEC_CONVERT is supposed to be expanded at pass_lower_vector.  So this
> +   dummy function should never be called.  */
> +
> +static void
> +expand_VEC_CONVERT (internal_fn, gcall *)
> +{
> +  gcc_unreachable ();
> +}
> +
>  /* Expand the IFN_UNIQUE function according to its first argument.  */
>  
>  static void
> --- gcc/fold-const-call.c.jj	2019-01-01 12:37:16.528985271 +0100
> +++ gcc/fold-const-call.c	2019-01-02 15:57:36.656449175 +0100
> @@ -30,6 +30,7 @@ along with GCC; see the file COPYING3.
>  #include "tm.h" /* For C[LT]Z_DEFINED_AT_ZERO.  */
>  #include "builtins.h"
>  #include "gimple-expr.h"
> +#include "tree-vector-builder.h"
>  
>  /* Functions that test for certain constant types, abstracting away the
>     decision about whether to check for overflow.  */
> @@ -645,6 +646,40 @@ fold_const_reduction (tree type, tree ar
>    return res;
>  }
>  
> +/* Fold a call to IFN_VEC_CONVERT (ARG) returning TYPE.  */
> +
> +static tree
> +fold_const_vec_convert (tree ret_type, tree arg)
> +{
> +  enum tree_code code = NOP_EXPR;
> +  tree arg_type = TREE_TYPE (arg);
> +  if (TREE_CODE (arg) != VECTOR_CST)
> +    return NULL_TREE;
> +
> +  gcc_checking_assert (VECTOR_TYPE_P (ret_type) && VECTOR_TYPE_P (arg_type));
> +
> +  if (INTEGRAL_TYPE_P (TREE_TYPE (ret_type))
> +      && SCALAR_FLOAT_TYPE_P (TREE_TYPE (arg_type)))
> +    code = FIX_TRUNC_EXPR;
> +  else if (INTEGRAL_TYPE_P (TREE_TYPE (arg_type))
> +	   && SCALAR_FLOAT_TYPE_P (TREE_TYPE (ret_type)))
> +    code = FLOAT_EXPR;
> +
> +  tree_vector_builder elts;
> +  elts.new_unary_operation (ret_type, arg, true);
> +  unsigned int count = elts.encoded_nelts ();
> +  for (unsigned int i = 0; i < count; ++i)
> +    {
> +      tree elt = fold_unary (code, TREE_TYPE (ret_type),
> +			     VECTOR_CST_ELT (arg, i));
> +      if (elt == NULL_TREE || !CONSTANT_CLASS_P (elt))
> +	return NULL_TREE;
> +      elts.quick_push (elt);
> +    }
> +
> +  return elts.build ();
> +}
> +
>  /* Try to evaluate:
>  
>        *RESULT = FN (*ARG)
> @@ -1232,6 +1267,9 @@ fold_const_call (combined_fn fn, tree ty
>      case CFN_REDUC_XOR:
>        return fold_const_reduction (type, arg, BIT_XOR_EXPR);
>  
> +    case CFN_VEC_CONVERT:
> +      return fold_const_vec_convert (type, arg);
> +
>      default:
>        return fold_const_call_1 (fn, type, arg);
>      }
> --- gcc/c-family/c-common.h.jj	2019-01-01 12:37:51.309414610 +0100
> +++ gcc/c-family/c-common.h	2019-01-02 11:24:24.314681677 +0100
> @@ -102,7 +102,7 @@ enum rid
>    RID_ASM,       RID_TYPEOF,   RID_ALIGNOF,  RID_ATTRIBUTE,  RID_VA_ARG,
>    RID_EXTENSION, RID_IMAGPART, RID_REALPART, RID_LABEL,      RID_CHOOSE_EXPR,
>    RID_TYPES_COMPATIBLE_P,      RID_BUILTIN_COMPLEX,	     RID_BUILTIN_SHUFFLE,
> -  RID_BUILTIN_TGMATH,
> +  RID_BUILTIN_CONVERTVECTOR,   RID_BUILTIN_TGMATH,
>    RID_BUILTIN_HAS_ATTRIBUTE,
>    RID_DFLOAT32, RID_DFLOAT64, RID_DFLOAT128,
>  
> @@ -1001,6 +1001,7 @@ extern bool lvalue_p (const_tree);
>  extern bool vector_targets_convertible_p (const_tree t1, const_tree t2);
>  extern bool vector_types_convertible_p (const_tree t1, const_tree t2, bool emit_lax_note);
>  extern tree c_build_vec_perm_expr (location_t, tree, tree, tree, bool = true);
> +extern tree c_build_vec_convert (location_t, tree, location_t, tree, bool = true);
>  
>  extern void init_c_lex (void);
>  
> --- gcc/c-family/c-common.c.jj	2019-01-01 12:37:51.366413675 +0100
> +++ gcc/c-family/c-common.c	2019-01-02 11:24:24.314681677 +0100
> @@ -376,6 +376,7 @@ const struct c_common_resword c_common_r
>      RID_BUILTIN_CALL_WITH_STATIC_CHAIN, D_CONLY },
>    { "__builtin_choose_expr", RID_CHOOSE_EXPR, D_CONLY },
>    { "__builtin_complex", RID_BUILTIN_COMPLEX, D_CONLY },
> +  { "__builtin_convertvector", RID_BUILTIN_CONVERTVECTOR, 0 },
>    { "__builtin_has_attribute", RID_BUILTIN_HAS_ATTRIBUTE, 0 },
>    { "__builtin_launder", RID_BUILTIN_LAUNDER, D_CXXONLY },
>    { "__builtin_shuffle", RID_BUILTIN_SHUFFLE, 0 },
> @@ -1070,6 +1071,70 @@ c_build_vec_perm_expr (location_t loc, t
>      ret = c_wrap_maybe_const (ret, true);
>  
>    return ret;
> +}
> +
> +/* Build a VEC_CONVERT ifn for __builtin_convertvector builtin.  */
> +
> +tree
> +c_build_vec_convert (location_t loc1, tree expr, location_t loc2, tree type,
> +		     bool complain)
> +{
> +  if (error_operand_p (type))
> +    return error_mark_node;
> +  if (error_operand_p (expr))
> +    return error_mark_node;
> +
> +  if (!VECTOR_INTEGER_TYPE_P (TREE_TYPE (expr))
> +      && !VECTOR_FLOAT_TYPE_P (TREE_TYPE (expr)))
> +    {
> +      if (complain)
> +	error_at (loc1, "%<__builtin_convertvector%> first argument must "
> +			"be an integer or floating vector");
> +      return error_mark_node;
> +    }
> +
> +  if (!VECTOR_INTEGER_TYPE_P (type) && !VECTOR_FLOAT_TYPE_P (type))
> +    {
> +      if (complain)
> +	error_at (loc2, "%<__builtin_convertvector%> second argument must "
> +			"be an integer or floating vector type");
> +      return error_mark_node;
> +    }
> +
> +  if (maybe_ne (TYPE_VECTOR_SUBPARTS (TREE_TYPE (expr)),
> +		TYPE_VECTOR_SUBPARTS (type)))
> +    {
> +      if (complain)
> +	error_at (loc1, "%<__builtin_convertvector%> number of elements "
> +			"of the first argument vector and the second argument "
> +			"vector type should be the same");
> +      return error_mark_node;
> +    }
> +
> +  if ((TYPE_MAIN_VARIANT (TREE_TYPE (TREE_TYPE (expr)))
> +       == TYPE_MAIN_VARIANT (TREE_TYPE (type)))
> +      || (VECTOR_INTEGER_TYPE_P (TREE_TYPE (expr))
> +	  && VECTOR_INTEGER_TYPE_P (type)
> +	  && (TYPE_PRECISION (TREE_TYPE (TREE_TYPE (expr)))
> +	      == TYPE_PRECISION (TREE_TYPE (type)))))
> +    return build1_loc (loc1, VIEW_CONVERT_EXPR, type, expr);
> +
> +  bool wrap = true;
> +  bool maybe_const = false;
> +  tree ret;
> +  if (!c_dialect_cxx ())
> +    {
> +      /* Avoid C_MAYBE_CONST_EXPRs inside of VEC_CONVERT argument.  */
> +      expr = c_fully_fold (expr, false, &maybe_const);
> +      wrap &= maybe_const;
> +    }
> +
> +  ret = build_call_expr_internal_loc (loc1, IFN_VEC_CONVERT, type, 1, expr);
> +
> +  if (!wrap)
> +    ret = c_wrap_maybe_const (ret, true);
> +
> +  return ret;
>  }
>  
>  /* Like tree.c:get_narrower, but retain conversion from C++0x scoped enum
> --- gcc/c/c-parser.c.jj	2019-01-01 12:37:48.677457794 +0100
> +++ gcc/c/c-parser.c	2019-01-02 11:24:24.312681710 +0100
> @@ -8038,6 +8038,7 @@ enum tgmath_parm_kind
>       __builtin_shuffle ( assignment-expression ,
>  			 assignment-expression ,
>  			 assignment-expression, )
> +     __builtin_convertvector ( assignment-expression , type-name )
>  
>     offsetof-member-designator:
>       identifier
> @@ -9113,17 +9114,14 @@ c_parser_postfix_expression (c_parser *p
>  	      *p = convert_lvalue_to_rvalue (loc, *p, true, true);
>  
>  	    if (vec_safe_length (cexpr_list) == 2)
> -	      expr.value =
> -		c_build_vec_perm_expr
> -		  (loc, (*cexpr_list)[0].value,
> -		   NULL_TREE, (*cexpr_list)[1].value);
> +	      expr.value = c_build_vec_perm_expr (loc, (*cexpr_list)[0].value,
> +						  NULL_TREE,
> +						  (*cexpr_list)[1].value);
>  
>  	    else if (vec_safe_length (cexpr_list) == 3)
> -	      expr.value =
> -		c_build_vec_perm_expr
> -		  (loc, (*cexpr_list)[0].value,
> -		   (*cexpr_list)[1].value,
> -		   (*cexpr_list)[2].value);
> +	      expr.value = c_build_vec_perm_expr (loc, (*cexpr_list)[0].value,
> +						  (*cexpr_list)[1].value,
> +						  (*cexpr_list)[2].value);
>  	    else
>  	      {
>  		error_at (loc, "wrong number of arguments to "
> @@ -9133,6 +9131,41 @@ c_parser_postfix_expression (c_parser *p
>  	    set_c_expr_source_range (&expr, loc, close_paren_loc);
>  	    break;
>  	  }
> +	case RID_BUILTIN_CONVERTVECTOR:
> +	  {
> +	    location_t start_loc = loc;
> +	    c_parser_consume_token (parser);
> +	    matching_parens parens;
> +	    if (!parens.require_open (parser))
> +	      {
> +		expr.set_error ();
> +		break;
> +	      }
> +	    e1 = c_parser_expr_no_commas (parser, NULL);
> +	    mark_exp_read (e1.value);
> +	    if (!c_parser_require (parser, CPP_COMMA, "expected %<,%>"))
> +	      {
> +		c_parser_skip_until_found (parser, CPP_CLOSE_PAREN, NULL);
> +		expr.set_error ();
> +		break;
> +	      }
> +	    loc = c_parser_peek_token (parser)->location;
> +	    t1 = c_parser_type_name (parser);
> +	    location_t end_loc = c_parser_peek_token (parser)->get_finish ();
> +	    c_parser_skip_until_found (parser, CPP_CLOSE_PAREN,
> +				       "expected %<)%>");
> +	    if (t1 == NULL)
> +	      expr.set_error ();
> +	    else
> +	      {
> +		tree type_expr = NULL_TREE;
> +		expr.value = c_build_vec_convert (start_loc, e1.value, loc,
> +						  groktypename (t1, &type_expr,
> +								NULL));
> +		set_c_expr_source_range (&expr, start_loc, end_loc);
> +	      }
> +	  }
> +	  break;
>  	case RID_AT_SELECTOR:
>  	  {
>  	    gcc_assert (c_dialect_objc ());
> --- gcc/cp/cp-tree.h.jj	2019-01-01 12:37:46.884487212 +0100
> +++ gcc/cp/cp-tree.h	2019-01-02 16:43:35.480393140 +0100
> @@ -7142,6 +7142,8 @@ extern bool is_lambda_ignored_entity
>  extern bool lambda_static_thunk_p		(tree);
>  extern tree finish_builtin_launder		(location_t, tree,
>  						 tsubst_flags_t);
> +extern tree cp_build_vec_convert		(tree, location_t, tree,
> +						 tsubst_flags_t);
>  extern void start_lambda_scope			(tree);
>  extern void record_lambda_scope			(tree);
>  extern void record_null_lambda_scope		(tree);
> --- gcc/cp/parser.c.jj	2019-01-01 12:37:47.352479534 +0100
> +++ gcc/cp/parser.c	2019-01-02 16:19:44.765760167 +0100
> @@ -7031,6 +7031,32 @@ cp_parser_postfix_expression (cp_parser
>  	break;
>        }
>  
> +    case RID_BUILTIN_CONVERTVECTOR:
> +      {
> +	tree expression;
> +	tree type;
> +	/* Consume the `__builtin_convertvector' token.  */
> +	cp_lexer_consume_token (parser->lexer);
> +	/* Look for the opening `('.  */
> +	matching_parens parens;
> +	parens.require_open (parser);
> +	/* Now, parse the assignment-expression.  */
> +	expression = cp_parser_assignment_expression (parser);
> +	/* Look for the `,'.  */
> +	cp_parser_require (parser, CPP_COMMA, RT_COMMA);
> +	location_t type_location
> +	  = cp_lexer_peek_token (parser->lexer)->location;
> +	/* Parse the type-id.  */
> +	{
> +	  type_id_in_expr_sentinel s (parser);
> +	  type = cp_parser_type_id (parser);
> +	}
> +	/* Look for the closing `)'.  */
> +	parens.require_close (parser);
> +	return cp_build_vec_convert (expression, type_location, type,
> +				     tf_warning_or_error);
> +      }
> +
>      default:
>        {
>  	tree type;
> --- gcc/cp/constexpr.c.jj	2019-01-01 12:37:47.282480682 +0100
> +++ gcc/cp/constexpr.c	2019-01-02 16:56:54.126359632 +0100
> @@ -33,6 +33,7 @@ along with GCC; see the file COPYING3.
>  #include "ubsan.h"
>  #include "gimple-fold.h"
>  #include "timevar.h"
> +#include "fold-const-call.h"
>  
>  static bool verify_constant (tree, bool, bool *, bool *);
>  #define VERIFY_CONSTANT(X)						\
> @@ -1449,6 +1450,20 @@ cxx_eval_internal_function (const conste
>        return cxx_eval_constant_expression (ctx, CALL_EXPR_ARG (t, 0),
>  					   false, non_constant_p, overflow_p);
>  
> +    case IFN_VEC_CONVERT:
> +      {
> +	tree arg = cxx_eval_constant_expression (ctx, CALL_EXPR_ARG (t, 0),
> +						 false, non_constant_p,
> +						 overflow_p);
> +	if (TREE_CODE (arg) == VECTOR_CST)
> +	  return fold_const_call (CFN_VEC_CONVERT, TREE_TYPE (t), arg);
> +	else
> +	  {
> +	    *non_constant_p = true;
> +	    return t;
> +	  }
> +      }
> +
>      default:
>        if (!ctx->quiet)
>  	error_at (cp_expr_loc_or_loc (t, input_location),
> @@ -5623,7 +5638,9 @@ potential_constant_expression_1 (tree t,
>  		case IFN_SUB_OVERFLOW:
>  		case IFN_MUL_OVERFLOW:
>  		case IFN_LAUNDER:
> +		case IFN_VEC_CONVERT:
>  		  bail = false;
> +		  break;
>  
>  		default:
>  		  break;
> --- gcc/cp/semantics.c.jj	2019-01-01 12:37:46.976485703 +0100
> +++ gcc/cp/semantics.c	2019-01-02 18:15:42.844133048 +0100
> @@ -9933,4 +9933,26 @@ finish_builtin_launder (location_t loc,
>  				       TREE_TYPE (arg), 1, arg);
>  }
>  
> +/* Finish __builtin_convertvector (arg, type).  */
> +
> +tree
> +cp_build_vec_convert (tree arg, location_t loc, tree type,
> +		      tsubst_flags_t complain)
> +{
> +  if (error_operand_p (type))
> +    return error_mark_node;
> +  if (error_operand_p (arg))
> +    return error_mark_node;
> +
> +  tree ret = NULL_TREE;
> +  if (!type_dependent_expression_p (arg) && !dependent_type_p (type))
> +    ret = c_build_vec_convert (cp_expr_loc_or_loc (arg, input_location), arg,
> +			       loc, type, (complain & tf_error) != 0);
> +
> +  if (!processing_template_decl)
> +    return ret;
> +
> +  return build_call_expr_internal_loc (loc, IFN_VEC_CONVERT, type, 1, arg);
> +}
> +
>  #include "gt-cp-semantics.h"
> --- gcc/cp/pt.c.jj	2019-01-01 12:37:47.081483980 +0100
> +++ gcc/cp/pt.c	2019-01-02 18:25:17.997778249 +0100
> @@ -18813,6 +18813,27 @@ tsubst_copy_and_build (tree t,
>  					      (*call_args)[0], complain);
>  	      break;
>  
> +	    case IFN_VEC_CONVERT:
> +	      gcc_assert (nargs == 1);
> +	      if (vec_safe_length (call_args) != 1)
> +		{
> +		  error_at (cp_expr_loc_or_loc (t, input_location),
> +			    "wrong number of arguments to "
> +			    "%<__builtin_convertvector%>");
> +		  ret = error_mark_node;
> +		  break;
> +		}
> +	      ret = cp_build_vec_convert ((*call_args)[0], input_location,
> +					  tsubst (TREE_TYPE (t), args,
> +						  complain, in_decl),
> +					  complain);
> +	      if (TREE_CODE (ret) == VIEW_CONVERT_EXPR)
> +		{
> +		  release_tree_vector (call_args);
> +		  RETURN (ret);
> +		}
> +	      break;
> +
>  	    default:
>  	      /* Unsupported internal function with arguments.  */
>  	      gcc_unreachable ();
> --- gcc/testsuite/c-c++-common/builtin-convertvector-1.c.jj	2019-01-02 18:38:18.265090910 +0100
> +++ gcc/testsuite/c-c++-common/builtin-convertvector-1.c	2019-01-02 18:37:50.337544972 +0100
> @@ -0,0 +1,15 @@
> +typedef int v8si __attribute__((vector_size (8 * sizeof (int))));
> +typedef long long v4di __attribute__((vector_size (4 * sizeof (long long))));
> +
> +void
> +foo (v8si *x, v4di *y, int z)
> +{
> +  __builtin_convertvector (*y, v8si);	/* { dg-error "number of elements of the first argument vector and the second argument vector type should be the same" } */
> +  __builtin_convertvector (*x, v4di);	/* { dg-error "number of elements of the first argument vector and the second argument vector type should be the same" } */
> +  __builtin_convertvector (*x, int);	/* { dg-error "second argument must be an integer or floating vector type" } */
> +  __builtin_convertvector (z, v4di);	/* { dg-error "first argument must be an integer or floating vector" } */
> +  __builtin_convertvector ();		/* { dg-error "expected" } */
> +  __builtin_convertvector (*x);		/* { dg-error "expected" } */
> +  __builtin_convertvector (*x, *y);	/* { dg-error "expected" } */
> +  __builtin_convertvector (*x, v8si, 1);/* { dg-error "expected" } */
> +}
> --- gcc/testsuite/c-c++-common/torture/builtin-convertvector-1.c.jj	2019-01-02 18:00:59.982534637 +0100
> +++ gcc/testsuite/c-c++-common/torture/builtin-convertvector-1.c	2019-01-02 18:00:32.871977360 +0100
> @@ -0,0 +1,131 @@
> +extern
> +#ifdef __cplusplus
> +"C"
> +#endif
> +void abort (void);
> +typedef int v4si __attribute__((vector_size (4 * sizeof (int))));
> +typedef unsigned int v4usi __attribute__((vector_size (4 * sizeof (unsigned int))));
> +typedef float v4sf __attribute__((vector_size (4 * sizeof (float))));
> +typedef double v4df __attribute__((vector_size (4 * sizeof (double))));
> +typedef long long v256di __attribute__((vector_size (256 * sizeof (long long))));
> +typedef double v256df __attribute__((vector_size (256 * sizeof (double))));
> +
> +void
> +f1 (v4usi *x, v4si *y)
> +{
> +  *y = __builtin_convertvector (*x, v4si);
> +}
> +
> +void
> +f2 (v4sf *x, v4si *y)
> +{
> +  *y = __builtin_convertvector (*x, v4si);
> +}
> +
> +void
> +f3 (v4si *x, v4sf *y)
> +{
> +  *y = __builtin_convertvector (*x, v4sf);
> +}
> +
> +void
> +f4 (v4df *x, v4si *y)
> +{
> +  *y = __builtin_convertvector (*x, v4si);
> +}
> +
> +void
> +f5 (v4si *x, v4df *y)
> +{
> +  *y = __builtin_convertvector (*x, v4df);
> +}
> +
> +void
> +f6 (v256df *x, v256di *y)
> +{
> +  *y = __builtin_convertvector (*x, v256di);
> +}
> +
> +void
> +f7 (v256di *x, v256df *y)
> +{
> +  *y = __builtin_convertvector (*x, v256df);
> +}
> +
> +void
> +f8 (v4df *x)
> +{
> +  v4si a = { 1, 2, -3, -4 };
> +  *x = __builtin_convertvector (a, v4df);
> +}
> +
> +int
> +main ()
> +{
> +  union U1 { v4si v; int a[4]; } u1;
> +  union U2 { v4usi v; unsigned int a[4]; } u2;
> +  union U3 { v4sf v; float a[4]; } u3;
> +  union U4 { v4df v; double a[4]; } u4;
> +  union U5 { v256di v; long long a[256]; } u5;
> +  union U6 { v256df v; double a[256]; } u6;
> +  int i;
> +  for (i = 0; i < 4; i++)
> +    u2.a[i] = i * 2;
> +  f1 (&u2.v, &u1.v);
> +  for (i = 0; i < 4; i++)
> +    if (u1.a[i] != i * 2)
> +      abort ();
> +    else
> +      u3.a[i] = i - 2.25f;
> +  f2 (&u3.v, &u1.v);
> +  for (i = 0; i < 4; i++)
> +    if (u1.a[i] != (i == 3 ? 0 : i - 2))
> +      abort ();
> +    else
> +      u3.a[i] = i + 0.75f;
> +  f2 (&u3.v, &u1.v);
> +  for (i = 0; i < 4; i++)
> +    if (u1.a[i] != i)
> +      abort ();
> +    else
> +      u1.a[i] = 7 * i - 5;
> +  f3 (&u1.v, &u3.v);
> +  for (i = 0; i < 4; i++)
> +    if (u3.a[i] != 7 * i - 5)
> +      abort ();
> +    else
> +      u4.a[i] = i - 2.25;
> +  f4 (&u4.v, &u1.v);
> +  for (i = 0; i < 4; i++)
> +    if (u1.a[i] != (i == 3 ? 0 : i - 2))
> +      abort ();
> +    else
> +      u4.a[i] = i + 0.75;
> +  f4 (&u4.v, &u1.v);
> +  for (i = 0; i < 4; i++)
> +    if (u1.a[i] != i)
> +      abort ();
> +    else
> +      u1.a[i] = 7 * i - 5;
> +  f5 (&u1.v, &u4.v);
> +  for (i = 0; i < 4; i++)
> +    if (u4.a[i] != 7 * i - 5)
> +      abort ();
> +  for (i = 0; i < 256; i++)
> +    u6.a[i] = i - 128.25;
> +  f6 (&u6.v, &u5.v);
> +  for (i = 0; i < 256; i++)
> +    if (u5.a[i] != i - 128 - (i > 128))
> +      abort ();
> +    else
> +      u5.a[i] = i - 128;
> +  f7 (&u5.v, &u6.v);
> +  for (i = 0; i < 256; i++)
> +    if (u6.a[i] != i - 128)
> +      abort ();
> +  f8 (&u4.v);
> +  for (i = 0; i < 4; i++)
> +    if (u4.a[i] != (i >= 2 ? -1 - i : i + 1))
> +      abort ();
> +  return 0;
> +}
> --- gcc/testsuite/g++.dg/ext/builtin-convertvector-1.C.jj	2019-01-02 18:04:14.984350274 +0100
> +++ gcc/testsuite/g++.dg/ext/builtin-convertvector-1.C	2019-01-02 18:07:17.122375950 +0100
> @@ -0,0 +1,137 @@
> +// { dg-do run }
> +
> +extern "C" void abort ();
> +typedef int v4si __attribute__((vector_size (4 * sizeof (int))));
> +typedef unsigned int v4usi __attribute__((vector_size (4 * sizeof (unsigned int))));
> +typedef float v4sf __attribute__((vector_size (4 * sizeof (float))));
> +typedef double v4df __attribute__((vector_size (4 * sizeof (double))));
> +typedef long long v256di __attribute__((vector_size (256 * sizeof (long long))));
> +typedef double v256df __attribute__((vector_size (256 * sizeof (double))));
> +
> +template <int N>
> +void
> +f1 (v4usi *x, v4si *y)
> +{
> +  *y = __builtin_convertvector (*x, v4si);
> +}
> +
> +template <typename T>
> +void
> +f2 (T *x, v4si *y)
> +{
> +  *y = __builtin_convertvector (*x, v4si);
> +}
> +
> +template <typename T>
> +void
> +f3 (v4si *x, T *y)
> +{
> +  *y = __builtin_convertvector (*x, T);
> +}
> +
> +template <int N>
> +void
> +f4 (v4df *x, v4si *y)
> +{
> +  *y = __builtin_convertvector (*x, v4si);
> +}
> +
> +template <typename T, typename U>
> +void
> +f5 (T *x, U *y)
> +{
> +  *y = __builtin_convertvector (*x, U);
> +}
> +
> +template <typename T>
> +void
> +f6 (v256df *x, T *y)
> +{
> +  *y = __builtin_convertvector (*x, T);
> +}
> +
> +template <int N>
> +void
> +f7 (v256di *x, v256df *y)
> +{
> +  *y = __builtin_convertvector (*x, v256df);
> +}
> +
> +template <int N>
> +void
> +f8 (v4df *x)
> +{
> +  v4si a = { 1, 2, -3, -4 };
> +  *x = __builtin_convertvector (a, v4df);
> +}
> +
> +int
> +main ()
> +{
> +  union U1 { v4si v; int a[4]; } u1;
> +  union U2 { v4usi v; unsigned int a[4]; } u2;
> +  union U3 { v4sf v; float a[4]; } u3;
> +  union U4 { v4df v; double a[4]; } u4;
> +  union U5 { v256di v; long long a[256]; } u5;
> +  union U6 { v256df v; double a[256]; } u6;
> +  int i;
> +  for (i = 0; i < 4; i++)
> +    u2.a[i] = i * 2;
> +  f1<0> (&u2.v, &u1.v);
> +  for (i = 0; i < 4; i++)
> +    if (u1.a[i] != i * 2)
> +      abort ();
> +    else
> +      u3.a[i] = i - 2.25f;
> +  f2 (&u3.v, &u1.v);
> +  for (i = 0; i < 4; i++)
> +    if (u1.a[i] != (i == 3 ? 0 : i - 2))
> +      abort ();
> +    else
> +      u3.a[i] = i + 0.75f;
> +  f2 (&u3.v, &u1.v);
> +  for (i = 0; i < 4; i++)
> +    if (u1.a[i] != i)
> +      abort ();
> +    else
> +      u1.a[i] = 7 * i - 5;
> +  f3 (&u1.v, &u3.v);
> +  for (i = 0; i < 4; i++)
> +    if (u3.a[i] != 7 * i - 5)
> +      abort ();
> +    else
> +      u4.a[i] = i - 2.25;
> +  f4<12> (&u4.v, &u1.v);
> +  for (i = 0; i < 4; i++)
> +    if (u1.a[i] != (i == 3 ? 0 : i - 2))
> +      abort ();
> +    else
> +      u4.a[i] = i + 0.75;
> +  f4<13> (&u4.v, &u1.v);
> +  for (i = 0; i < 4; i++)
> +    if (u1.a[i] != i)
> +      abort ();
> +    else
> +      u1.a[i] = 7 * i - 5;
> +  f5 (&u1.v, &u4.v);
> +  for (i = 0; i < 4; i++)
> +    if (u4.a[i] != 7 * i - 5)
> +      abort ();
> +  for (i = 0; i < 256; i++)
> +    u6.a[i] = i - 128.25;
> +  f6 (&u6.v, &u5.v);
> +  for (i = 0; i < 256; i++)
> +    if (u5.a[i] != i - 128 - (i > 128))
> +      abort ();
> +    else
> +      u5.a[i] = i - 128;
> +  f7<-1> (&u5.v, &u6.v);
> +  for (i = 0; i < 256; i++)
> +    if (u6.a[i] != i - 128)
> +      abort ();
> +  f8<5> (&u4.v);
> +  for (i = 0; i < 4; i++)
> +    if (u4.a[i] != (i >= 2 ? -1 - i : i + 1))
> +      abort ();
> +  return 0;
> +}
> --- gcc/testsuite/g++.dg/cpp0x/constexpr-builtin4.C.jj	2019-01-02 18:39:12.767204801 +0100
> +++ gcc/testsuite/g++.dg/cpp0x/constexpr-builtin4.C	2019-01-02 18:42:30.749985890 +0100
> @@ -0,0 +1,17 @@
> +// { dg-do compile { target c++11 } }
> +// { dg-additional-options "-Wno-psabi" }
> +
> +typedef int v4si __attribute__((vector_size (4 * sizeof (int))));
> +typedef float v4sf __attribute__((vector_size (4 * sizeof (float))));
> +constexpr v4sf a = __builtin_convertvector (v4si { 1, 2, -3, -4 }, v4sf);
> +
> +constexpr v4sf
> +foo (v4si x)
> +{
> +  return __builtin_convertvector (x, v4sf);
> +}
> +
> +constexpr v4sf b = foo (v4si { 3, 4, -1, -2 });
> +
> +static_assert (a[0] == 1.0f && a[1] == 2.0f && a[2] == -3.0f && a[3] == -4.0f, "");
> +static_assert (b[0] == 3.0f && b[1] == 4.0f && b[2] == -1.0f && b[3] == -2.0f, "");
> 
> 	Jakub
> 
>
Jakub Jelinek Jan. 3, 2019, 12:11 p.m. UTC | #4
On Thu, Jan 03, 2019 at 12:16:31PM +0100, Richard Biener wrote:
> I guess it depends on target capabilities - I think
> __builtin_convertvector is a bit "misdesigned" for pack/unpack.  You
> also have to consider a v2di to v2qi conversion which requires

I'm aware of that; I know supportable_{widening,narrowing}_conversion in the
vectorizer handles those, but they are vectorizer-specific and written for
the model the vectorizer uses.  In any case, I wanted to have something
correct first (i.e. the scalar-ops fallback in there) and then improve what
I can, starting with the 2x narrowing and widening and going further only
when that works.

> several unpack steps.  Does the clang documentation given any
> hints how to "efficiently" use __builtin_convertvector for
> packing/unpacking without exposing too much of the target architecture?

The clang documentation is completely useless here.  Trying e.g.
typedef signed char v16qi __attribute__((vector_size (16 * sizeof (signed char))));
typedef int v16si __attribute__((vector_size (16 * sizeof (int))));

void
foo (v16si *x, v16qi *y)
{
  *y = __builtin_convertvector (*x, v16qi);
}

void
bar (v16qi *x, v16si *y)
{
  *y = __builtin_convertvector (*x, v16si);
}

with clang -O2 -mavx512{bw,vl,dq} shows efficient code:
        vmovdqa64       (%rdi), %zmm0
        vpmovdb %zmm0, (%rsi)
and
        vpmovsxbd       (%rdi), %zmm0
        vmovdqa64       %zmm0, (%rsi)
With -O2 -mavx2 bar is:
        vpmovsxbd       (%rdi), %ymm0
        vpmovsxbd       8(%rdi), %ymm1
        vmovdqa %ymm1, 32(%rsi)
        vmovdqa %ymm0, (%rsi)
which is what would be emitted for v8[qs]i twice, and foo is:
        vmovdqa (%rdi), %ymm0
        vmovdqa 32(%rdi), %ymm1
        vmovdqa .LCPI0_0(%rip), %ymm2   # ymm2 = [0,1,4,5,8,9,12,13,8,9,12,13,12,13,14,15,16,17,20,21,24,25,28,29,24,25,28,29,28,29,30,31]
        vpshufb %ymm2, %ymm1, %ymm1
        vpermq  $232, %ymm1, %ymm1      # ymm1 = ymm1[0,2,2,3]
        vmovdqa .LCPI0_1(%rip), %xmm3   # xmm3 = <0,2,4,6,8,10,12,14,u,u,u,u,u,u,u,u>
        vpshufb %xmm3, %xmm1, %xmm1
        vpshufb %ymm2, %ymm0, %ymm0
        vpermq  $232, %ymm0, %ymm0      # ymm0 = ymm0[0,2,2,3]
        vpshufb %xmm3, %xmm0, %xmm0
        vpunpcklqdq     %xmm1, %xmm0, %xmm0 # xmm0 = xmm0[0],xmm1[0]
        vmovdqa %xmm0, (%rsi)
which looks quite complicated to me.  I would think we could emit e.g. what
we emit for:
typedef signed char v16qi __attribute__((vector_size (16 * sizeof (signed char))));
typedef signed char v32qi __attribute__((vector_size (32 * sizeof (signed char))));

void
baz (v32qi *x, v16qi *y)
{
  v32qi z = __builtin_shuffle (x[0], x[1], (v32qi) { 0, 4, 8, 12, 16, 20, 24, 28, 32, 36, 40, 44, 48, 52, 56, 60,
						     0, 4, 8, 12, 16, 20, 24, 28, 32, 36, 40, 44, 48, 52, 56, 60 });
  v16qi u;
  __builtin_memcpy (&u, &z, sizeof (u));
  *y = u;
}
which is with gcc trunk:
        vmovdqa (%rdi), %ymm0
        vmovdqa 32(%rdi), %ymm1
        vpshufb .LC0(%rip), %ymm0, %ymm0
        vpshufb .LC1(%rip), %ymm1, %ymm1
        vpermq  $78, %ymm0, %ymm3
        vpermq  $78, %ymm1, %ymm2
        vpor    %ymm3, %ymm0, %ymm0
        vpor    %ymm2, %ymm1, %ymm1
        vpor    %ymm1, %ymm0, %ymm0
        vmovaps %xmm0, (%rsi)
although really the upper half is a don't care (so a properly implemented
__builtin_shufflevector might be handy too, with
__builtin_shufflevector (x[0], x[1], 0, 4, 8, 12, 16, 20, 24, 28, 32, 36, 40, 44, 48, 52, 56, 60,
				     -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1);
).

> > Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
> 
> do_vec_conversion needs a comment.  Overall the patch (with its

Will add one (though do_unop/do_binop don't have one either) and add
documentation.
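
Perhaps something along these lines (my sketch based on what the function
does; the committed wording may differ):

/* Helper function of expand_vector_conversion.  Extract the piece of A of
   type INNER_TYPE at offset BITPOS / width BITSIZE and convert it: for
   scalar pieces via a CODE conversion to TYPE's element type, for vector
   pieces either via a CODE conversion to a vector of TYPE's element type
   with as many subparts as INNER_TYPE, or, when CODE is CALL_EXPR, via a
   call to the target builtin DECL.  */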

> existing features) looks OK to me.
> 
> As of Marcs comments I agree that vector lowering happens quite late.
> It might be for example useful to lower before vectorization (or
> any loop optimization) so that un-handled generic vector code can be
> eventually vectorized differently.  But that's sth to investigate for
> GCC 10.
> 
> Giving FE maintainers a chance to comment, so no overall ACK yet.

Ok.

	Jakub
Richard Sandiford Jan. 3, 2019, 1:06 p.m. UTC | #5
Jakub Jelinek <jakub@redhat.com> writes:
> +      /* Can't use get_compute_type here, as supportable_convert_operation
> +	 doesn't necessarily use an optab and needs two arguments.  */
> +      tree vector_compute_type
> +	= type_for_widest_vector_mode (TREE_TYPE (arg_type), mov_optab);
> +      unsigned HOST_WIDE_INT nelts;
> +      if (vector_compute_type
> +	  && VECTOR_MODE_P (TYPE_MODE (vector_compute_type))
> +	  && subparts_gt (arg_type, vector_compute_type)
> +	  && TYPE_VECTOR_SUBPARTS (vector_compute_type).is_constant (&nelts))
> +	{
> +	  while (nelts > 1)
> +	    {
> +	      tree ret1_type = build_vector_type (TREE_TYPE (ret_type), nelts);
> +	      tree arg1_type = build_vector_type (TREE_TYPE (arg_type), nelts);
> +	      if (supportable_convert_operation (code, ret1_type, arg1_type,
> +						 &decl, &code1))
> +		{
> +		  new_rhs = expand_vector_piecewise (gsi, do_vec_conversion,
> +						     ret_type, arg1_type, arg,
> +						     decl, code1);
> +		  g = gimple_build_assign (lhs, new_rhs);
> +		  gsi_replace (gsi, g, false);
> +		  return;
> +		}
> +	      nelts = nelts / 2;
> +	    }
> +	}

I think for this it would be better to use:

      if (vector_compute_type
	  && VECTOR_MODE_P (TYPE_MODE (vector_compute_type))
	  && subparts_gt (arg_type, vector_compute_type))
	{
	  unsigned HOST_WIDE_INT nelts = constant_lower_bound
	    (TYPE_VECTOR_SUBPARTS (vector_compute_type));

since the loop is self-checking.

E.g. this will make the Advanced SIMD handling on AArch64 the same
regardless of whether SVE is also enabled.

Thanks,
Richard
Martin Sebor Jan. 3, 2019, 5:04 p.m. UTC | #6
> +/* Build a VEC_CONVERT ifn for __builtin_convertvector builtin.  */

Can you please document the function arguments and explain how they
are used?

> +
> +tree
> +c_build_vec_convert (location_t loc1, tree expr, location_t loc2, tree type,
> +		     bool complain)
> +{
> +  if (error_operand_p (type))
> +    return error_mark_node;
> +  if (error_operand_p (expr))
> +    return error_mark_node;
> +
> +  if (!VECTOR_INTEGER_TYPE_P (TREE_TYPE (expr))
> +      && !VECTOR_FLOAT_TYPE_P (TREE_TYPE (expr)))
> +    {
> +      if (complain)
> +	error_at (loc1, "%<__builtin_convertvector%> first argument must "
> +			"be an integer or floating vector");
> +      return error_mark_node;
> +    }
> +
> +  if (!VECTOR_INTEGER_TYPE_P (type) && !VECTOR_FLOAT_TYPE_P (type))
> +    {
> +      if (complain)
> +	error_at (loc2, "%<__builtin_convertvector%> second argument must "
> +			"be an integer or floating vector type");
> +      return error_mark_node;
> +    }
> +
> +  if (maybe_ne (TYPE_VECTOR_SUBPARTS (TREE_TYPE (expr)),
> +		TYPE_VECTOR_SUBPARTS (type)))
> +    {
> +      if (complain)
> +	error_at (loc1, "%<__builtin_convertvector%> number of elements "
> +			"of the first argument vector and the second argument "
> +			"vector type should be the same");
> +      return error_mark_node;
> +    }

Just a few wording suggestions for the errors:

1) for the first two errors consider using a single message
    parameterized on the argument number, as in the sketch below, to
    reduce translation effort (both styles are in use but the more
    concise form seems preferable to me)
2) in the last error use "must" instead of "should" as in the first
    two ("must" is imperative rather than just suggestive)
3) consider simplifying the third message to "%<...%> argument
    vectors must have the same size" (or "the same number of
    elements") along the same lines as in c_build_vec_perm_expr.

> +
> +  if ((TYPE_MAIN_VARIANT (TREE_TYPE (TREE_TYPE (expr)))
> +       == TYPE_MAIN_VARIANT (TREE_TYPE (type)))
> +      || (VECTOR_INTEGER_TYPE_P (TREE_TYPE (expr))
> +	  && VECTOR_INTEGER_TYPE_P (type)
> +	  && (TYPE_PRECISION (TREE_TYPE (TREE_TYPE (expr)))
> +	      == TYPE_PRECISION (TREE_TYPE (type)))))
> +    return build1_loc (loc1, VIEW_CONVERT_EXPR, type, expr);

The conditional above is very difficult to read and, without
a comment explaining its purpose, difficult to understand.
Introducing named temporaries for the repetitive subexpressions
would help with the readability (not just here but in the rest of
the function as well).  A comment explaining what the conditional
handles would help with the latter; for example, see the sketch below.
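
Something like this, say (a sketch under my own naming, not the committed
code):

  tree expr_elt = TREE_TYPE (TREE_TYPE (expr));  /* Element type of EXPR.  */
  tree type_elt = TREE_TYPE (type);              /* Element type of TYPE.  */

  /* If the element types are the same, or both are integer types of the
     same precision (e.g. only the signedness differs), the conversion is
     just a reinterpretation of the bits.  */
  if (TYPE_MAIN_VARIANT (expr_elt) == TYPE_MAIN_VARIANT (type_elt)
      || (VECTOR_INTEGER_TYPE_P (TREE_TYPE (expr))
	  && VECTOR_INTEGER_TYPE_P (type)
	  && TYPE_PRECISION (expr_elt) == TYPE_PRECISION (type_elt)))
    return build1_loc (loc1, VIEW_CONVERT_EXPR, type, expr);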

> +
> +  bool wrap = true;
> +  bool maybe_const = false;
> +  tree ret;

Moving maybe_const to the conditional block where it's used and ret
to the point of its initialization just after that block would improve
readability.

> +  if (!c_dialect_cxx ())
> +    {
> +      /* Avoid C_MAYBE_CONST_EXPRs inside of VEC_CONVERT argument.  */
> +      expr = c_fully_fold (expr, false, &maybe_const);
> +      wrap &= maybe_const;
> +    }
> +
> +  ret = build_call_expr_internal_loc (loc1, IFN_VEC_CONVERT, type, 1, expr);
> +
> +  if (!wrap)
> +    ret = c_wrap_maybe_const (ret, true);
> +
> +  return ret;
>   }

Martin
Marc Glisse Jan. 3, 2019, 5:32 p.m. UTC | #7
On Thu, 3 Jan 2019, Jakub Jelinek wrote:

> On Thu, Jan 03, 2019 at 11:48:12AM +0100, Marc Glisse wrote:
>>> The following patch adds support for the __builtin_convertvector builtin.
>>> C casts on generic vectors are just reinterpretation of the bits (i.e. a
>>> VCE), this builtin allows to cast int/unsigned elements to float or vice
>>> versa or promote/demote them.  doc/ change is missing, will write it soon.
>>>
>>> The builtin appeared in I think clang 3.4 and is apparently in real-world
>>> use as e.g. Honza reported.  The first argument is an expression with vector
>>> type, the second argument is a vector type (similarly e.g. to va_arg), to
>>> which the first argument should be converted.  Both vector types need to
>>> have the same number of elements.
>>>
>>> I've implemented same element size (thus also whole vector size) conversions
>>> efficiently - signed to unsigned and vice versa or same vector type just
>>> using a VCE, for e.g. int <-> float or long long <-> double using
>>> appropriate optab, possibly repeated multiple times for very large vectors.
>>
>> IIUC, you only lower __builtin_convertvector to VCE or FLOAT_EXPR or
>> whatever in tree-vect-generic. That seems quite late. At least for the
>> "easy" same-size case, I think we should do it early (gimplification?),
>
> No, it must not be done at gimplification time, think about OpenMP/OpenACC
> offloading, the target before IPA optimizations might not be the target
> after them, while they have to agree on ABI issues, the optabs definitely
> can be and are different and these optabs originally added for the
> vectorizer are something that doesn't have a fallback, whatever introduces
> it into the IL is responsible for verification it is supported.

Ah, I was missing this.  And I don't see why we should keep it that way.  As 
long as the vectorizer was the only producer, it made sense not to have a 
fallback; it was not needed.  But now that we are talking about having the 
user produce it almost directly, it would make sense for it to behave like 
other vector operations (say PLUS_EXPR).

> That said, not sure if e.g. using an opaque builtin for the conversion that
> supportable_convert_operation sometimes uses is better over this ifn.
> What exact optimization opportunities you are looking for if it is lowered
> earlier?  I have the VECTOR_CST folding in place...

I don't know, any kind of optimization we currently do on scalars...  For 
conversions between integers and floats, that seems to be very limited, 
maybe combining consecutive casts in rare cases.  For sign changes, we have a 
number of transformations in match.pd that are fine with an intermediate 
cast that only changes the sign (I even introduced nop_convert to handle 
vectors at the same time).  I guess we could handle this IFN as well.  It is 
just that having 2 ways to express the same thing tends to cause code 
duplication.

On the other hand, for narrowing/widening conversions, keeping it as one 
stmt with your ifn may be more convenient to optimize than a large mess of 
VEC_UNPACK_FLOAT_HI_EXPR and friends.  Again, I am thinking more of 
match.pd-style transformations, nothing that looks at the target.

Patch

--- gcc/tree-vect-generic.c.jj	2019-01-01 12:37:17.084976148 +0100
+++ gcc/tree-vect-generic.c	2019-01-02 17:51:28.012876543 +0100
@@ -267,7 +267,8 @@  do_negate (gimple_stmt_iterator *gsi, tr
 static tree
 expand_vector_piecewise (gimple_stmt_iterator *gsi, elem_op_func f,
 			 tree type, tree inner_type,
-			 tree a, tree b, enum tree_code code)
+			 tree a, tree b, enum tree_code code,
+			 tree ret_type = NULL_TREE)
 {
   vec<constructor_elt, va_gc> *v;
   tree part_width = TYPE_SIZE (inner_type);
@@ -278,23 +279,27 @@  expand_vector_piecewise (gimple_stmt_ite
   int i;
   location_t loc = gimple_location (gsi_stmt (*gsi));
 
-  if (types_compatible_p (gimple_expr_type (gsi_stmt (*gsi)), type))
+  if (ret_type
+      || types_compatible_p (gimple_expr_type (gsi_stmt (*gsi)), type))
     warning_at (loc, OPT_Wvector_operation_performance,
 		"vector operation will be expanded piecewise");
   else
     warning_at (loc, OPT_Wvector_operation_performance,
 		"vector operation will be expanded in parallel");
 
+  if (!ret_type)
+    ret_type = type;
   vec_alloc (v, (nunits + delta - 1) / delta);
   for (i = 0; i < nunits;
        i += delta, index = int_const_binop (PLUS_EXPR, index, part_width))
     {
-      tree result = f (gsi, inner_type, a, b, index, part_width, code, type);
+      tree result = f (gsi, inner_type, a, b, index, part_width, code,
+		       ret_type);
       constructor_elt ce = {NULL_TREE, result};
       v->quick_push (ce);
     }
 
-  return build_constructor (type, v);
+  return build_constructor (ret_type, v);
 }
 
 /* Expand a vector operation to scalars with the freedom to use
@@ -302,8 +307,7 @@  expand_vector_piecewise (gimple_stmt_ite
    in the vector type.  */
 static tree
 expand_vector_parallel (gimple_stmt_iterator *gsi, elem_op_func f, tree type,
-			tree a, tree b,
-			enum tree_code code)
+			tree a, tree b, enum tree_code code)
 {
   tree result, compute_type;
   int n_words = tree_to_uhwi (TYPE_SIZE_UNIT (type)) / UNITS_PER_WORD;
@@ -1547,6 +1551,147 @@  expand_vector_scalar_condition (gimple_s
   update_stmt (gsi_stmt (*gsi));
 }
 
+static tree
+do_vec_conversion (gimple_stmt_iterator *gsi, tree inner_type, tree a,
+		   tree decl, tree bitpos, tree bitsize,
+		   enum tree_code code, tree type)
+{
+  a = tree_vec_extract (gsi, inner_type, a, bitsize, bitpos);
+  if (!VECTOR_TYPE_P (inner_type))
+    return gimplify_build1 (gsi, code, TREE_TYPE (type), a);
+  if (code == CALL_EXPR)
+    {
+      gimple *g = gimple_build_call (decl, 1, a);
+      tree lhs = make_ssa_name (TREE_TYPE (TREE_TYPE (decl)));
+      gimple_call_set_lhs (g, lhs);
+      gsi_insert_before (gsi, g, GSI_SAME_STMT);
+      return lhs;
+    }
+  else
+    {
+      tree outer_type = build_vector_type (TREE_TYPE (type),
+					   TYPE_VECTOR_SUBPARTS (inner_type));
+      return gimplify_build1 (gsi, code, outer_type, a);
+    }
+}
+
+/* Expand VEC_CONVERT ifn call.  */
+
+static void
+expand_vector_conversion (gimple_stmt_iterator *gsi)
+{
+  gimple *stmt = gsi_stmt (*gsi);
+  gimple *g;
+  tree lhs = gimple_call_lhs (stmt);
+  tree arg = gimple_call_arg (stmt, 0);
+  tree decl = NULL_TREE;
+  tree ret_type = TREE_TYPE (lhs);
+  tree arg_type = TREE_TYPE (arg);
+  tree new_rhs, compute_type = TREE_TYPE (arg_type);
+  enum tree_code code = NOP_EXPR;
+  enum tree_code code1 = ERROR_MARK;
+  enum { NARROW, NONE, WIDEN } modifier = NONE;
+  optab optab1 = unknown_optab;
+
+  gcc_checking_assert (VECTOR_TYPE_P (ret_type) && VECTOR_TYPE_P (arg_type));
+  gcc_checking_assert (tree_fits_uhwi_p (TYPE_SIZE (TREE_TYPE (ret_type))));
+  gcc_checking_assert (tree_fits_uhwi_p (TYPE_SIZE (TREE_TYPE (arg_type))));
+  if (INTEGRAL_TYPE_P (TREE_TYPE (ret_type))
+      && SCALAR_FLOAT_TYPE_P (TREE_TYPE (arg_type)))
+    code = FIX_TRUNC_EXPR;
+  else if (INTEGRAL_TYPE_P (TREE_TYPE (arg_type))
+	   && SCALAR_FLOAT_TYPE_P (TREE_TYPE (ret_type)))
+    code = FLOAT_EXPR;
+  if (tree_to_uhwi (TYPE_SIZE (TREE_TYPE (ret_type)))
+      < tree_to_uhwi (TYPE_SIZE (TREE_TYPE (arg_type))))
+    modifier = NARROW;
+  else if (tree_to_uhwi (TYPE_SIZE (TREE_TYPE (ret_type)))
+	   > tree_to_uhwi (TYPE_SIZE (TREE_TYPE (arg_type))))
+    modifier = WIDEN;
+
+  if (modifier == NONE && (code == FIX_TRUNC_EXPR || code == FLOAT_EXPR))
+    {
+      if (supportable_convert_operation (code, ret_type, arg_type, &decl,
+					 &code1))
+	{
+	  if (code1 == CALL_EXPR)
+	    {
+	      g = gimple_build_call (decl, 1, arg);
+	      gimple_call_set_lhs (g, lhs);
+	    }
+	  else
+	    g = gimple_build_assign (lhs, code1, arg);
+	  gsi_replace (gsi, g, false);
+	  return;
+	}
+      /* Can't use get_compute_type here, as supportable_convert_operation
+	 doesn't necessarily use an optab and needs two arguments.  */
+      tree vector_compute_type
+	= type_for_widest_vector_mode (TREE_TYPE (arg_type), mov_optab);
+      unsigned HOST_WIDE_INT nelts;
+      if (vector_compute_type
+	  && VECTOR_MODE_P (TYPE_MODE (vector_compute_type))
+	  && subparts_gt (arg_type, vector_compute_type)
+	  && TYPE_VECTOR_SUBPARTS (vector_compute_type).is_constant (&nelts))
+	{
+	  while (nelts > 1)
+	    {
+	      tree ret1_type = build_vector_type (TREE_TYPE (ret_type), nelts);
+	      tree arg1_type = build_vector_type (TREE_TYPE (arg_type), nelts);
+	      if (supportable_convert_operation (code, ret1_type, arg1_type,
+						 &decl, &code1))
+		{
+		  new_rhs = expand_vector_piecewise (gsi, do_vec_conversion,
+						     ret_type, arg1_type, arg,
+						     decl, code1);
+		  g = gimple_build_assign (lhs, new_rhs);
+		  gsi_replace (gsi, g, false);
+		  return;
+		}
+	      nelts = nelts / 2;
+	    }
+	}
+    }
+  /* FIXME: __builtin_convertvector argument and return vectors have the same
+     number of elements, so for both narrowing and widening we need to figure
+     out what is the best set of optabs to use.  E.g. for NARROW
+     VEC_PACK_TRUNC_EXPR has 2 arguments, shall we prefer emitting that with
+     one argument of arg and another argument all zeros and extract first
+     half of the resulting vector, or extract lo and hi halves of the arg
+     vector and use VEC_PACK_TRUNC_EXPR on those?  */
+  else if (0 && modifier == NARROW)
+    {
+      switch (code)
+	{
+	case NOP_EXPR:
+	  code1 = VEC_PACK_TRUNC_EXPR;
+	  optab1 = optab_for_tree_code (code1, arg_type, optab_default);
+	  break;
+	case FIX_TRUNC_EXPR:
+	  code1 = VEC_PACK_FIX_TRUNC_EXPR;
+	  /* The signedness is determined from output operand.  */
+	  optab1 = optab_for_tree_code (code1, ret_type, optab_default);
+	  break;
+	case FLOAT_EXPR:
+	  code1 = VEC_PACK_FLOAT_EXPR;
+	  optab1 = optab_for_tree_code (code1, arg_type, optab_default);
+	  break;
+	default:
+	  gcc_unreachable ();
+	}
+
+      if (optab1)
+	compute_type = get_compute_type (code1, optab1, arg_type);
+      (void) compute_type;
+    }
+
+  new_rhs = expand_vector_piecewise (gsi, do_vec_conversion, arg_type,
+				     TREE_TYPE (arg_type), arg,
+				     NULL_TREE, code, ret_type);
+  g = gimple_build_assign (lhs, new_rhs);
+  gsi_replace (gsi, g, false);
+}
+
 /* Process one statement.  If we identify a vector operation, expand it.  */
 
 static void
@@ -1561,7 +1706,11 @@  expand_vector_operations_1 (gimple_stmt_
   /* Only consider code == GIMPLE_ASSIGN. */
   gassign *stmt = dyn_cast <gassign *> (gsi_stmt (*gsi));
   if (!stmt)
-    return;
+    {
+      if (gimple_call_internal_p (gsi_stmt (*gsi), IFN_VEC_CONVERT))
+	expand_vector_conversion (gsi);
+      return;
+    }
 
   code = gimple_assign_rhs_code (stmt);
   rhs_class = get_gimple_rhs_class (code);
--- gcc/internal-fn.def.jj	2019-01-01 12:37:17.893962875 +0100
+++ gcc/internal-fn.def	2019-01-02 11:24:24.307681792 +0100
@@ -296,6 +296,7 @@  DEF_INTERNAL_FN (SUB_OVERFLOW, ECF_CONST
 DEF_INTERNAL_FN (MUL_OVERFLOW, ECF_CONST | ECF_LEAF | ECF_NOTHROW, NULL)
 DEF_INTERNAL_FN (TSAN_FUNC_EXIT, ECF_NOVOPS | ECF_LEAF | ECF_NOTHROW, NULL)
 DEF_INTERNAL_FN (VA_ARG, ECF_NOTHROW | ECF_LEAF, NULL)
+DEF_INTERNAL_FN (VEC_CONVERT, ECF_CONST | ECF_LEAF | ECF_NOTHROW, NULL)
 
 /* An unduplicable, uncombinable function.  Generally used to preserve
    a CFG property in the face of jump threading, tail merging or
--- gcc/internal-fn.c.jj	2019-01-01 12:37:19.567935410 +0100
+++ gcc/internal-fn.c	2019-01-02 11:24:24.315681661 +0100
@@ -2581,6 +2581,15 @@  expand_VA_ARG (internal_fn, gcall *)
   gcc_unreachable ();
 }
 
+/* IFN_VEC_CONVERT is supposed to be expanded at pass_lower_vector.  So this
+   dummy function should never be called.  */
+
+static void
+expand_VEC_CONVERT (internal_fn, gcall *)
+{
+  gcc_unreachable ();
+}
+
 /* Expand the IFN_UNIQUE function according to its first argument.  */
 
 static void
--- gcc/fold-const-call.c.jj	2019-01-01 12:37:16.528985271 +0100
+++ gcc/fold-const-call.c	2019-01-02 15:57:36.656449175 +0100
@@ -30,6 +30,7 @@  along with GCC; see the file COPYING3.
 #include "tm.h" /* For C[LT]Z_DEFINED_AT_ZERO.  */
 #include "builtins.h"
 #include "gimple-expr.h"
+#include "tree-vector-builder.h"
 
 /* Functions that test for certain constant types, abstracting away the
    decision about whether to check for overflow.  */
@@ -645,6 +646,40 @@  fold_const_reduction (tree type, tree ar
   return res;
 }
 
+/* Fold a call to IFN_VEC_CONVERT (ARG) returning TYPE.  */
+
+static tree
+fold_const_vec_convert (tree ret_type, tree arg)
+{
+  enum tree_code code = NOP_EXPR;
+  tree arg_type = TREE_TYPE (arg);
+  if (TREE_CODE (arg) != VECTOR_CST)
+    return NULL_TREE;
+
+  gcc_checking_assert (VECTOR_TYPE_P (ret_type) && VECTOR_TYPE_P (arg_type));
+
+  if (INTEGRAL_TYPE_P (TREE_TYPE (ret_type))
+      && SCALAR_FLOAT_TYPE_P (TREE_TYPE (arg_type)))
+    code = FIX_TRUNC_EXPR;
+  else if (INTEGRAL_TYPE_P (TREE_TYPE (arg_type))
+	   && SCALAR_FLOAT_TYPE_P (TREE_TYPE (ret_type)))
+    code = FLOAT_EXPR;
+
+  tree_vector_builder elts;
+  elts.new_unary_operation (ret_type, arg, true);
+  unsigned int count = elts.encoded_nelts ();
+  for (unsigned int i = 0; i < count; ++i)
+    {
+      tree elt = fold_unary (code, TREE_TYPE (ret_type),
+			     VECTOR_CST_ELT (arg, i));
+      if (elt == NULL_TREE || !CONSTANT_CLASS_P (elt))
+	return NULL_TREE;
+      elts.quick_push (elt);
+    }
+
+  return elts.build ();
+}
+
 /* Try to evaluate:
 
       *RESULT = FN (*ARG)
@@ -1232,6 +1267,9 @@  fold_const_call (combined_fn fn, tree ty
     case CFN_REDUC_XOR:
       return fold_const_reduction (type, arg, BIT_XOR_EXPR);
 
+    case CFN_VEC_CONVERT:
+      return fold_const_vec_convert (type, arg);
+
     default:
       return fold_const_call_1 (fn, type, arg);
     }
--- gcc/c-family/c-common.h.jj	2019-01-01 12:37:51.309414610 +0100
+++ gcc/c-family/c-common.h	2019-01-02 11:24:24.314681677 +0100
@@ -102,7 +102,7 @@  enum rid
   RID_ASM,       RID_TYPEOF,   RID_ALIGNOF,  RID_ATTRIBUTE,  RID_VA_ARG,
   RID_EXTENSION, RID_IMAGPART, RID_REALPART, RID_LABEL,      RID_CHOOSE_EXPR,
   RID_TYPES_COMPATIBLE_P,      RID_BUILTIN_COMPLEX,	     RID_BUILTIN_SHUFFLE,
-  RID_BUILTIN_TGMATH,
+  RID_BUILTIN_CONVERTVECTOR,   RID_BUILTIN_TGMATH,
   RID_BUILTIN_HAS_ATTRIBUTE,
   RID_DFLOAT32, RID_DFLOAT64, RID_DFLOAT128,
 
@@ -1001,6 +1001,7 @@  extern bool lvalue_p (const_tree);
 extern bool vector_targets_convertible_p (const_tree t1, const_tree t2);
 extern bool vector_types_convertible_p (const_tree t1, const_tree t2, bool emit_lax_note);
 extern tree c_build_vec_perm_expr (location_t, tree, tree, tree, bool = true);
+extern tree c_build_vec_convert (location_t, tree, location_t, tree, bool = true);
 
 extern void init_c_lex (void);
 
--- gcc/c-family/c-common.c.jj	2019-01-01 12:37:51.366413675 +0100
+++ gcc/c-family/c-common.c	2019-01-02 11:24:24.314681677 +0100
@@ -376,6 +376,7 @@  const struct c_common_resword c_common_r
     RID_BUILTIN_CALL_WITH_STATIC_CHAIN, D_CONLY },
   { "__builtin_choose_expr", RID_CHOOSE_EXPR, D_CONLY },
   { "__builtin_complex", RID_BUILTIN_COMPLEX, D_CONLY },
+  { "__builtin_convertvector", RID_BUILTIN_CONVERTVECTOR, 0 },
   { "__builtin_has_attribute", RID_BUILTIN_HAS_ATTRIBUTE, 0 },
   { "__builtin_launder", RID_BUILTIN_LAUNDER, D_CXXONLY },
   { "__builtin_shuffle", RID_BUILTIN_SHUFFLE, 0 },
@@ -1070,6 +1071,70 @@  c_build_vec_perm_expr (location_t loc, t
     ret = c_wrap_maybe_const (ret, true);
 
   return ret;
+}
+
+/* Build a VEC_CONVERT ifn for __builtin_convertvector builtin.  */
+
+tree
+c_build_vec_convert (location_t loc1, tree expr, location_t loc2, tree type,
+		     bool complain)
+{
+  if (error_operand_p (type))
+    return error_mark_node;
+  if (error_operand_p (expr))
+    return error_mark_node;
+
+  if (!VECTOR_INTEGER_TYPE_P (TREE_TYPE (expr))
+      && !VECTOR_FLOAT_TYPE_P (TREE_TYPE (expr)))
+    {
+      if (complain)
+	error_at (loc1, "%<__builtin_convertvector%> first argument must "
+			"be an integer or floating vector");
+      return error_mark_node;
+    }
+
+  if (!VECTOR_INTEGER_TYPE_P (type) && !VECTOR_FLOAT_TYPE_P (type))
+    {
+      if (complain)
+	error_at (loc2, "%<__builtin_convertvector%> second argument must "
+			"be an integer or floating vector type");
+      return error_mark_node;
+    }
+
+  if (maybe_ne (TYPE_VECTOR_SUBPARTS (TREE_TYPE (expr)),
+		TYPE_VECTOR_SUBPARTS (type)))
+    {
+      if (complain)
+	error_at (loc1, "%<__builtin_convertvector%> number of elements "
+			"of the first argument vector and the second argument "
+			"vector type should be the same");
+      return error_mark_node;
+    }
+
+  if ((TYPE_MAIN_VARIANT (TREE_TYPE (TREE_TYPE (expr)))
+       == TYPE_MAIN_VARIANT (TREE_TYPE (type)))
+      || (VECTOR_INTEGER_TYPE_P (TREE_TYPE (expr))
+	  && VECTOR_INTEGER_TYPE_P (type)
+	  && (TYPE_PRECISION (TREE_TYPE (TREE_TYPE (expr)))
+	      == TYPE_PRECISION (TREE_TYPE (type)))))
+    return build1_loc (loc1, VIEW_CONVERT_EXPR, type, expr);
+
+  bool wrap = true;
+  bool maybe_const = false;
+  tree ret;
+  if (!c_dialect_cxx ())
+    {
+      /* Avoid C_MAYBE_CONST_EXPRs inside of VEC_CONVERT argument.  */
+      expr = c_fully_fold (expr, false, &maybe_const);
+      wrap &= maybe_const;
+    }
+
+  ret = build_call_expr_internal_loc (loc1, IFN_VEC_CONVERT, type, 1, expr);
+
+  if (!wrap)
+    ret = c_wrap_maybe_const (ret, true);
+
+  return ret;
 }
 
 /* Like tree.c:get_narrower, but retain conversion from C++0x scoped enum
--- gcc/c/c-parser.c.jj	2019-01-01 12:37:48.677457794 +0100
+++ gcc/c/c-parser.c	2019-01-02 11:24:24.312681710 +0100
@@ -8038,6 +8038,7 @@  enum tgmath_parm_kind
      __builtin_shuffle ( assignment-expression ,
 			 assignment-expression ,
 			 assignment-expression, )
+     __builtin_convertvector ( assignment-expression , type-name )
 
    offsetof-member-designator:
      identifier
@@ -9113,17 +9114,14 @@  c_parser_postfix_expression (c_parser *p
 	      *p = convert_lvalue_to_rvalue (loc, *p, true, true);
 
 	    if (vec_safe_length (cexpr_list) == 2)
-	      expr.value =
-		c_build_vec_perm_expr
-		  (loc, (*cexpr_list)[0].value,
-		   NULL_TREE, (*cexpr_list)[1].value);
+	      expr.value = c_build_vec_perm_expr (loc, (*cexpr_list)[0].value,
+						  NULL_TREE,
+						  (*cexpr_list)[1].value);
 
 	    else if (vec_safe_length (cexpr_list) == 3)
-	      expr.value =
-		c_build_vec_perm_expr
-		  (loc, (*cexpr_list)[0].value,
-		   (*cexpr_list)[1].value,
-		   (*cexpr_list)[2].value);
+	      expr.value = c_build_vec_perm_expr (loc, (*cexpr_list)[0].value,
+						  (*cexpr_list)[1].value,
+						  (*cexpr_list)[2].value);
 	    else
 	      {
 		error_at (loc, "wrong number of arguments to "
@@ -9133,6 +9131,41 @@  c_parser_postfix_expression (c_parser *p
 	    set_c_expr_source_range (&expr, loc, close_paren_loc);
 	    break;
 	  }
+	case RID_BUILTIN_CONVERTVECTOR:
+	  {
+	    location_t start_loc = loc;
+	    c_parser_consume_token (parser);
+	    matching_parens parens;
+	    if (!parens.require_open (parser))
+	      {
+		expr.set_error ();
+		break;
+	      }
+	    e1 = c_parser_expr_no_commas (parser, NULL);
+	    mark_exp_read (e1.value);
+	    if (!c_parser_require (parser, CPP_COMMA, "expected %<,%>"))
+	      {
+		c_parser_skip_until_found (parser, CPP_CLOSE_PAREN, NULL);
+		expr.set_error ();
+		break;
+	      }
+	    loc = c_parser_peek_token (parser)->location;
+	    t1 = c_parser_type_name (parser);
+	    location_t end_loc = c_parser_peek_token (parser)->get_finish ();
+	    c_parser_skip_until_found (parser, CPP_CLOSE_PAREN,
+				       "expected %<)%>");
+	    if (t1 == NULL)
+	      expr.set_error ();
+	    else
+	      {
+		tree type_expr = NULL_TREE;
+		expr.value = c_build_vec_convert (start_loc, e1.value, loc,
+						  groktypename (t1, &type_expr,
+								NULL));
+		set_c_expr_source_range (&expr, start_loc, end_loc);
+	      }
+	  }
+	  break;
 	case RID_AT_SELECTOR:
 	  {
 	    gcc_assert (c_dialect_objc ());
--- gcc/cp/cp-tree.h.jj	2019-01-01 12:37:46.884487212 +0100
+++ gcc/cp/cp-tree.h	2019-01-02 16:43:35.480393140 +0100
@@ -7142,6 +7142,8 @@  extern bool is_lambda_ignored_entity
 extern bool lambda_static_thunk_p		(tree);
 extern tree finish_builtin_launder		(location_t, tree,
 						 tsubst_flags_t);
+extern tree cp_build_vec_convert		(tree, location_t, tree,
+						 tsubst_flags_t);
 extern void start_lambda_scope			(tree);
 extern void record_lambda_scope			(tree);
 extern void record_null_lambda_scope		(tree);
--- gcc/cp/parser.c.jj	2019-01-01 12:37:47.352479534 +0100
+++ gcc/cp/parser.c	2019-01-02 16:19:44.765760167 +0100
@@ -7031,6 +7031,32 @@  cp_parser_postfix_expression (cp_parser
 	break;
       }
 
+    case RID_BUILTIN_CONVERTVECTOR:
+      {
+	tree expression;
+	tree type;
+	/* Consume the `__builtin_convertvector' token.  */
+	cp_lexer_consume_token (parser->lexer);
+	/* Look for the opening `('.  */
+	matching_parens parens;
+	parens.require_open (parser);
+	/* Now, parse the assignment-expression.  */
+	expression = cp_parser_assignment_expression (parser);
+	/* Look for the `,'.  */
+	cp_parser_require (parser, CPP_COMMA, RT_COMMA);
+	location_t type_location
+	  = cp_lexer_peek_token (parser->lexer)->location;
+	/* Parse the type-id.  */
+	{
+	  type_id_in_expr_sentinel s (parser);
+	  type = cp_parser_type_id (parser);
+	}
+	/* Look for the closing `)'.  */
+	parens.require_close (parser);
+	return cp_build_vec_convert (expression, type_location, type,
+				     tf_warning_or_error);
+      }
+
     default:
       {
 	tree type;
--- gcc/cp/constexpr.c.jj	2019-01-01 12:37:47.282480682 +0100
+++ gcc/cp/constexpr.c	2019-01-02 16:56:54.126359632 +0100
@@ -33,6 +33,7 @@  along with GCC; see the file COPYING3.
 #include "ubsan.h"
 #include "gimple-fold.h"
 #include "timevar.h"
+#include "fold-const-call.h"
 
 static bool verify_constant (tree, bool, bool *, bool *);
 #define VERIFY_CONSTANT(X)						\
@@ -1449,6 +1450,20 @@  cxx_eval_internal_function (const conste
       return cxx_eval_constant_expression (ctx, CALL_EXPR_ARG (t, 0),
 					   false, non_constant_p, overflow_p);
 
+    case IFN_VEC_CONVERT:
+      {
+	tree arg = cxx_eval_constant_expression (ctx, CALL_EXPR_ARG (t, 0),
+						 false, non_constant_p,
+						 overflow_p);
+	if (TREE_CODE (arg) == VECTOR_CST)
+	  return fold_const_call (CFN_VEC_CONVERT, TREE_TYPE (t), arg);
+	else
+	  {
+	    *non_constant_p = true;
+	    return t;
+	  }
+      }
+
     default:
       if (!ctx->quiet)
 	error_at (cp_expr_loc_or_loc (t, input_location),
@@ -5623,7 +5638,9 @@  potential_constant_expression_1 (tree t,
 		case IFN_SUB_OVERFLOW:
 		case IFN_MUL_OVERFLOW:
 		case IFN_LAUNDER:
+		case IFN_VEC_CONVERT:
 		  bail = false;
+		  break;
 
 		default:
 		  break;
--- gcc/cp/semantics.c.jj	2019-01-01 12:37:46.976485703 +0100
+++ gcc/cp/semantics.c	2019-01-02 18:15:42.844133048 +0100
@@ -9933,4 +9933,26 @@  finish_builtin_launder (location_t loc,
 				       TREE_TYPE (arg), 1, arg);
 }
 
+/* Finish __builtin_convertvector (arg, type).  */
+
+tree
+cp_build_vec_convert (tree arg, location_t loc, tree type,
+		      tsubst_flags_t complain)
+{
+  if (error_operand_p (type))
+    return error_mark_node;
+  if (error_operand_p (arg))
+    return error_mark_node;
+
+  tree ret = NULL_TREE;
+  if (!type_dependent_expression_p (arg) && !dependent_type_p (type))
+    ret = c_build_vec_convert (cp_expr_loc_or_loc (arg, input_location), arg,
+			       loc, type, (complain & tf_error) != 0);
+
+  if (!processing_template_decl)
+    return ret;
+
+  return build_call_expr_internal_loc (loc, IFN_VEC_CONVERT, type, 1, arg);
+}
+
 #include "gt-cp-semantics.h"
--- gcc/cp/pt.c.jj	2019-01-01 12:37:47.081483980 +0100
+++ gcc/cp/pt.c	2019-01-02 18:25:17.997778249 +0100
@@ -18813,6 +18813,27 @@  tsubst_copy_and_build (tree t,
 					      (*call_args)[0], complain);
 	      break;
 
+	    case IFN_VEC_CONVERT:
+	      gcc_assert (nargs == 1);
+	      if (vec_safe_length (call_args) != 1)
+		{
+		  error_at (cp_expr_loc_or_loc (t, input_location),
+			    "wrong number of arguments to "
+			    "%<__builtin_convertvector%>");
+		  ret = error_mark_node;
+		  break;
+		}
+	      ret = cp_build_vec_convert ((*call_args)[0], input_location,
+					  tsubst (TREE_TYPE (t), args,
+						  complain, in_decl),
+					  complain);
+	      if (TREE_CODE (ret) == VIEW_CONVERT_EXPR)
+		{
+		  release_tree_vector (call_args);
+		  RETURN (ret);
+		}
+	      break;
+
 	    default:
 	      /* Unsupported internal function with arguments.  */
 	      gcc_unreachable ();
--- gcc/testsuite/c-c++-common/builtin-convertvector-1.c.jj	2019-01-02 18:38:18.265090910 +0100
+++ gcc/testsuite/c-c++-common/builtin-convertvector-1.c	2019-01-02 18:37:50.337544972 +0100
@@ -0,0 +1,15 @@ 
+typedef int v8si __attribute__((vector_size (8 * sizeof (int))));
+typedef long long v4di __attribute__((vector_size (4 * sizeof (long long))));
+
+void
+foo (v8si *x, v4di *y, int z)
+{
+  __builtin_convertvector (*y, v8si);	/* { dg-error "number of elements of the first argument vector and the second argument vector type should be the same" } */
+  __builtin_convertvector (*x, v4di);	/* { dg-error "number of elements of the first argument vector and the second argument vector type should be the same" } */
+  __builtin_convertvector (*x, int);	/* { dg-error "second argument must be an integer or floating vector type" } */
+  __builtin_convertvector (z, v4di);	/* { dg-error "first argument must be an integer or floating vector" } */
+  __builtin_convertvector ();		/* { dg-error "expected" } */
+  __builtin_convertvector (*x);		/* { dg-error "expected" } */
+  __builtin_convertvector (*x, *y);	/* { dg-error "expected" } */
+  __builtin_convertvector (*x, v8si, 1);/* { dg-error "expected" } */
+}
--- gcc/testsuite/c-c++-common/torture/builtin-convertvector-1.c.jj	2019-01-02 18:00:59.982534637 +0100
+++ gcc/testsuite/c-c++-common/torture/builtin-convertvector-1.c	2019-01-02 18:00:32.871977360 +0100
@@ -0,0 +1,131 @@ 
+extern
+#ifdef __cplusplus
+"C"
+#endif
+void abort (void);
+typedef int v4si __attribute__((vector_size (4 * sizeof (int))));
+typedef unsigned int v4usi __attribute__((vector_size (4 * sizeof (unsigned int))));
+typedef float v4sf __attribute__((vector_size (4 * sizeof (float))));
+typedef double v4df __attribute__((vector_size (4 * sizeof (double))));
+typedef long long v256di __attribute__((vector_size (256 * sizeof (long long))));
+typedef double v256df __attribute__((vector_size (256 * sizeof (double))));
+
+void
+f1 (v4usi *x, v4si *y)
+{
+  *y = __builtin_convertvector (*x, v4si);
+}
+
+void
+f2 (v4sf *x, v4si *y)
+{
+  *y = __builtin_convertvector (*x, v4si);
+}
+
+void
+f3 (v4si *x, v4sf *y)
+{
+  *y = __builtin_convertvector (*x, v4sf);
+}
+
+void
+f4 (v4df *x, v4si *y)
+{
+  *y = __builtin_convertvector (*x, v4si);
+}
+
+void
+f5 (v4si *x, v4df *y)
+{
+  *y = __builtin_convertvector (*x, v4df);
+}
+
+void
+f6 (v256df *x, v256di *y)
+{
+  *y = __builtin_convertvector (*x, v256di);
+}
+
+void
+f7 (v256di *x, v256df *y)
+{
+  *y = __builtin_convertvector (*x, v256df);
+}
+
+void
+f8 (v4df *x)
+{
+  v4si a = { 1, 2, -3, -4 };
+  *x = __builtin_convertvector (a, v4df);
+}
+
+int
+main ()
+{
+  union U1 { v4si v; int a[4]; } u1;
+  union U2 { v4usi v; unsigned int a[4]; } u2;
+  union U3 { v4sf v; float a[4]; } u3;
+  union U4 { v4df v; double a[4]; } u4;
+  union U5 { v256di v; long long a[256]; } u5;
+  union U6 { v256df v; double a[256]; } u6;
+  int i;
+  for (i = 0; i < 4; i++)
+    u2.a[i] = i * 2;
+  f1 (&u2.v, &u1.v);
+  for (i = 0; i < 4; i++)
+    if (u1.a[i] != i * 2)
+      abort ();
+    else
+      u3.a[i] = i - 2.25f;
+  f2 (&u3.v, &u1.v);
+  for (i = 0; i < 4; i++)
+    if (u1.a[i] != (i == 3 ? 0 : i - 2))
+      abort ();
+    else
+      u3.a[i] = i + 0.75f;
+  f2 (&u3.v, &u1.v);
+  for (i = 0; i < 4; i++)
+    if (u1.a[i] != i)
+      abort ();
+    else
+      u1.a[i] = 7 * i - 5;
+  f3 (&u1.v, &u3.v);
+  for (i = 0; i < 4; i++)
+    if (u3.a[i] != 7 * i - 5)
+      abort ();
+    else
+      u4.a[i] = i - 2.25;
+  f4 (&u4.v, &u1.v);
+  for (i = 0; i < 4; i++)
+    if (u1.a[i] != (i == 3 ? 0 : i - 2))
+      abort ();
+    else
+      u4.a[i] = i + 0.75;
+  f4 (&u4.v, &u1.v);
+  for (i = 0; i < 4; i++)
+    if (u1.a[i] != i)
+      abort ();
+    else
+      u1.a[i] = 7 * i - 5;
+  f5 (&u1.v, &u4.v);
+  for (i = 0; i < 4; i++)
+    if (u4.a[i] != 7 * i - 5)
+      abort ();
+  for (i = 0; i < 256; i++)
+    u6.a[i] = i - 128.25;
+  f6 (&u6.v, &u5.v);
+  for (i = 0; i < 256; i++)
+    if (u5.a[i] != i - 128 - (i > 128))
+      abort ();
+    else
+      u5.a[i] = i - 128;
+  f7 (&u5.v, &u6.v);
+  for (i = 0; i < 256; i++)
+    if (u6.a[i] != i - 128)
+      abort ();
+  f8 (&u4.v);
+  for (i = 0; i < 4; i++)
+    if (u4.a[i] != (i >= 2 ? -1 - i : i + 1))
+      abort ();
+  return 0;
+}
--- gcc/testsuite/g++.dg/ext/builtin-convertvector-1.C.jj	2019-01-02 18:04:14.984350274 +0100
+++ gcc/testsuite/g++.dg/ext/builtin-convertvector-1.C	2019-01-02 18:07:17.122375950 +0100
@@ -0,0 +1,137 @@ 
+// { dg-do run }
+
+extern "C" void abort ();
+typedef int v4si __attribute__((vector_size (4 * sizeof (int))));
+typedef unsigned int v4usi __attribute__((vector_size (4 * sizeof (unsigned int))));
+typedef float v4sf __attribute__((vector_size (4 * sizeof (float))));
+typedef double v4df __attribute__((vector_size (4 * sizeof (double))));
+typedef long long v256di __attribute__((vector_size (256 * sizeof (long long))));
+typedef double v256df __attribute__((vector_size (256 * sizeof (double))));
+
+template <int N>
+void
+f1 (v4usi *x, v4si *y)
+{
+  *y = __builtin_convertvector (*x, v4si);
+}
+
+template <typename T>
+void
+f2 (T *x, v4si *y)
+{
+  *y = __builtin_convertvector (*x, v4si);
+}
+
+template <typename T>
+void
+f3 (v4si *x, T *y)
+{
+  *y = __builtin_convertvector (*x, T);
+}
+
+template <int N>
+void
+f4 (v4df *x, v4si *y)
+{
+  *y = __builtin_convertvector (*x, v4si);
+}
+
+template <typename T, typename U>
+void
+f5 (T *x, U *y)
+{
+  *y = __builtin_convertvector (*x, U);
+}
+
+template <typename T>
+void
+f6 (v256df *x, T *y)
+{
+  *y = __builtin_convertvector (*x, T);
+}
+
+template <int N>
+void
+f7 (v256di *x, v256df *y)
+{
+  *y = __builtin_convertvector (*x, v256df);
+}
+
+template <int N>
+void
+f8 (v4df *x)
+{
+  v4si a = { 1, 2, -3, -4 };
+  *x = __builtin_convertvector (a, v4df);
+}
+
+int
+main ()
+{
+  union U1 { v4si v; int a[4]; } u1;
+  union U2 { v4usi v; unsigned int a[4]; } u2;
+  union U3 { v4sf v; float a[4]; } u3;
+  union U4 { v4df v; double a[4]; } u4;
+  union U5 { v256di v; long long a[256]; } u5;
+  union U6 { v256df v; double a[256]; } u6;
+  int i;
+  for (i = 0; i < 4; i++)
+    u2.a[i] = i * 2;
+  f1<0> (&u2.v, &u1.v);
+  for (i = 0; i < 4; i++)
+    if (u1.a[i] != i * 2)
+      abort ();
+    else
+      u3.a[i] = i - 2.25f;
+  f2 (&u3.v, &u1.v);
+  for (i = 0; i < 4; i++)
+    if (u1.a[i] != (i == 3 ? 0 : i - 2))
+      abort ();
+    else
+      u3.a[i] = i + 0.75f;
+  f2 (&u3.v, &u1.v);
+  for (i = 0; i < 4; i++)
+    if (u1.a[i] != i)
+      abort ();
+    else
+      u1.a[i] = 7 * i - 5;
+  f3 (&u1.v, &u3.v);
+  for (i = 0; i < 4; i++)
+    if (u3.a[i] != 7 * i - 5)
+      abort ();
+    else
+      u4.a[i] = i - 2.25;
+  f4<12> (&u4.v, &u1.v);
+  for (i = 0; i < 4; i++)
+    if (u1.a[i] != (i == 3 ? 0 : i - 2))
+      abort ();
+    else
+      u4.a[i] = i + 0.75;
+  f4<13> (&u4.v, &u1.v);
+  for (i = 0; i < 4; i++)
+    if (u1.a[i] != i)
+      abort ();
+    else
+      u1.a[i] = 7 * i - 5;
+  f5 (&u1.v, &u4.v);
+  for (i = 0; i < 4; i++)
+    if (u4.a[i] != 7 * i - 5)
+      abort ();
+  for (i = 0; i < 256; i++)
+    u6.a[i] = i - 128.25;
+  f6 (&u6.v, &u5.v);
+  for (i = 0; i < 256; i++)
+    if (u5.a[i] != i - 128 - (i > 128))
+      abort ();
+    else
+      u5.a[i] = i - 128;
+  f7<-1> (&u5.v, &u6.v);
+  for (i = 0; i < 256; i++)
+    if (u6.a[i] != i - 128)
+      abort ();
+  f8<5> (&u4.v);
+  for (i = 0; i < 4; i++)
+    if (u4.a[i] != (i >= 2 ? -1 - i : i + 1))
+      abort ();
+  return 0;
+}
--- gcc/testsuite/g++.dg/cpp0x/constexpr-builtin4.C.jj	2019-01-02 18:39:12.767204801 +0100
+++ gcc/testsuite/g++.dg/cpp0x/constexpr-builtin4.C	2019-01-02 18:42:30.749985890 +0100
@@ -0,0 +1,17 @@ 
+// { dg-do compile { target c++11 } }
+// { dg-additional-options "-Wno-psabi" }
+
+typedef int v4si __attribute__((vector_size (4 * sizeof (int))));
+typedef float v4sf __attribute__((vector_size (4 * sizeof (float))));
+constexpr v4sf a = __builtin_convertvector (v4si { 1, 2, -3, -4 }, v4sf);
+
+constexpr v4sf
+foo (v4si x)
+{
+  return __builtin_convertvector (x, v4sf);
+}
+
+constexpr v4sf b = foo (v4si { 3, 4, -1, -2 });
+
+static_assert (a[0] == 1.0f && a[1] == 2.0f && a[2] == -3.0f && a[3] == -4.0f, "");
+static_assert (b[0] == 3.0f && b[1] == 4.0f && b[2] == -1.0f && b[3] == -2.0f, "");