diff mbox

[4/4,ARM] Add attribute/pragma target fpu=

Message ID 5600096E.4030403@st.com
State New
Headers show

Commit Message

Christian Bruel Sept. 21, 2015, 1:43 p.m. UTC
Hi Kyrill,

Thanks for your comments. Answers interleaved and the new patch attached.

On 09/18/2015 11:04 AM, Kyrill Tkachov wrote:
>
> On 15/09/15 11:47, Christian Bruel wrote:
>>
>> On 09/14/2015 04:30 PM, Christian Bruel wrote:
>>> Finally, the final part of the patch set does the attribute target
>>> parsing and checking, redefines the preprocessor macros and implements
>>> the inlining rules.
>>>
>>> testcases and documentation included.
>>>
>> new version to remove a shadowed remnant piece of code.
>>
>>
>>    > thanks
>>    >
>>    > Christian
>>    >
>
> +  /* OK to inline between different modes.
> +     Function with mode specific instructions, e.g using asm,
> +     must be explicitely protected with noinline.  */
>
> s/explicitely/explicitly/
>

thanks

>
> +  const struct arm_fpu_desc *fpu_desc1
> +    = &all_fpus[caller_opts->x_arm_fpu_index];
> +  const struct arm_fpu_desc *fpu_desc2
> +    = &all_fpus[callee_opts->x_arm_fpu_index];
>
> Please call these caller_fpu and callee_fpu, it's much easier to reason about the inlining rules that way

ok

>
> +
> +  /* Can't inline NEON extension if the caller doesn't support it.  */
> +  if (ARM_FPU_FSET_HAS (fpu_desc2->features, FPU_FL_NEON)
> +      && ! ARM_FPU_FSET_HAS (fpu_desc1->features, FPU_FL_NEON))
> +    return false;
> +
> +  /* Can't inline CRYPTO extension if the caller doesn't support it.  */
> +  if (ARM_FPU_FSET_HAS (fpu_desc2->features, FPU_FL_CRYPTO)
> +      && ! ARM_FPU_FSET_HAS (fpu_desc1->features, FPU_FL_CRYPTO))
> +    return false;
> +
>
> We also need to take into account FPU_FL_FP16...
> In general what we want is for the callee FPU features to be
> a subset of the callers features, similar to the way we handle
> the x_aarch64_isa_flags handling in aarch64_can_inline_p from the
> aarch64 port. I think that's the way to go here rather than explicitly
> writing down a check for each feature.

ok, with FL_FP16 now,

>
> @@ -242,6 +239,8 @@
>
>          /* Update macros.  */
>          gcc_assert (cur_opt->x_target_flags == target_flags);
> +      /* This one can be redefined by the pragma without warning.  */
> +      cpp_undef (parse_in, "__ARM_FP");
>          arm_cpu_builtins (parse_in);
>
> Could you elaborate why the cpp_undef here?
> If you want to undefine __ARM_FP so you can redefine it to a new value
> in arm_cpu_builtins then I think you should just undefine it in that function.

This is to avoid a warning: "__ARM_FP" redefined when creating a new 
pragma scope. (See the test attr-crypto.c).

We cannot call the cpp_undef inside arm_cpu_builtins, because it is also 
used for the TARGET_CPU_CPP_BUILTINS hook and then would prevent real 
illegitimate redefinitions.

Alternatively, I thought to reset the warn_builtin_macro_redefined flag, 
but that doesn't work as the macro is not NODE_BUILTIN (see the 
definition of warn_of_redefinition in libcpp).
We might need to change this later : should target macros be marked as 
NOTE_BUILTIN ? We can discuss this separately (I can open a defect) as 
we have the cpp_undep solution for now, if you agree.

>
>
> diff -ruN gnu_trunk.p3/gcc/gcc/doc/invoke.texi gnu_trunk.p4/gcc/gcc/doc/invoke.texi
> --- gnu_trunk.p3/gcc/gcc/doc/invoke.texi	2015-09-10 12:21:00.698911244 +0200
> +++ gnu_trunk.p4/gcc/gcc/doc/invoke.texi	2015-09-14 10:27:20.281932581 +0200
> @@ -13360,6 +13363,8 @@
>    floating-point arithmetic (in particular denormal values are treated as
>    zero), so the use of NEON instructions may lead to a loss of precision.
>
> +You can also set the fpu name at function level by using the @code{target("mfpu=")} function attributes (@pxref{ARM Function Attributes}) or pragmas (@pxref{Function Specific Option Pragmas}).
> +
>
> s/"mfpu="/"fpu="
>

thanks

>
> --- gnu_trunk.p3/gcc/gcc/testsuite/gcc.target/arm/attr-neon.c	1970-01-01 01:00:00.000000000 +0100
> +++ gnu_trunk.p4/gcc/gcc/testsuite/gcc.target/arm/attr-neon.c	2015-09-14 16:12:08.449698268 +0200
> @@ -0,0 +1,26 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target arm_neon_ok } */
> +/* { dg-options "-O3 -mfloat-abi=softfp -ftree-vectorize" } */
> +
> +void
> +f3(int n, int x[], int y[]) {
> +  int i;
> +  for (i = 0; i < n; ++i)
> +    y[i] = x[i] << 3;
> +}
> +
>
> What if GCC has been configured with --with-fpu=neon?
> Then f3 will be compiled assuming NEON. You should add a -mfpu=vfp to the dg-options.

Ah yes. I've added ((target("fpu=vfp")) instead, since we are testing 
the attribute.

Comments

Kyrylo Tkachov Oct. 8, 2015, 8:52 a.m. UTC | #1
Hi Christian,

On 21/09/15 14:43, Christian Bruel wrote:
> Hi Kyrill,
>
> Thanks for your comments. Answers interleaved and the new patch attached.
>
> On 09/18/2015 11:04 AM, Kyrill Tkachov wrote:
>> On 15/09/15 11:47, Christian Bruel wrote:
>>> On 09/14/2015 04:30 PM, Christian Bruel wrote:
>>>> Finally, the final part of the patch set does the attribute target
>>>> parsing and checking, redefines the preprocessor macros and implements
>>>> the inlining rules.
>>>>
>>>> testcases and documentation included.
>>>>
>>> new version to remove a shadowed remnant piece of code.
>>>
>>>
>>>     > thanks
>>>     >
>>>     > Christian
>>>     >
>> +  /* OK to inline between different modes.
>> +     Function with mode specific instructions, e.g using asm,
>> +     must be explicitely protected with noinline.  */
>>
>> s/explicitely/explicitly/
>>
> thanks
>
>> +  const struct arm_fpu_desc *fpu_desc1
>> +    = &all_fpus[caller_opts->x_arm_fpu_index];
>> +  const struct arm_fpu_desc *fpu_desc2
>> +    = &all_fpus[callee_opts->x_arm_fpu_index];
>>
>> Please call these caller_fpu and callee_fpu, it's much easier to reason about the inlining rules that way
> ok
>
>> +
>> +  /* Can't inline NEON extension if the caller doesn't support it.  */
>> +  if (ARM_FPU_FSET_HAS (fpu_desc2->features, FPU_FL_NEON)
>> +      && ! ARM_FPU_FSET_HAS (fpu_desc1->features, FPU_FL_NEON))
>> +    return false;
>> +
>> +  /* Can't inline CRYPTO extension if the caller doesn't support it.  */
>> +  if (ARM_FPU_FSET_HAS (fpu_desc2->features, FPU_FL_CRYPTO)
>> +      && ! ARM_FPU_FSET_HAS (fpu_desc1->features, FPU_FL_CRYPTO))
>> +    return false;
>> +
>>
>> We also need to take into account FPU_FL_FP16...
>> In general what we want is for the callee FPU features to be
>> a subset of the callers features, similar to the way we handle
>> the x_aarch64_isa_flags handling in aarch64_can_inline_p from the
>> aarch64 port. I think that's the way to go here rather than explicitly
>> writing down a check for each feature.
> ok, with FL_FP16 now,
>
>> @@ -242,6 +239,8 @@
>>
>>           /* Update macros.  */
>>           gcc_assert (cur_opt->x_target_flags == target_flags);
>> +      /* This one can be redefined by the pragma without warning.  */
>> +      cpp_undef (parse_in, "__ARM_FP");
>>           arm_cpu_builtins (parse_in);
>>
>> Could you elaborate why the cpp_undef here?
>> If you want to undefine __ARM_FP so you can redefine it to a new value
>> in arm_cpu_builtins then I think you should just undefine it in that function.
> This is to avoid a warning: "__ARM_FP" redefined when creating a new
> pragma scope. (See the test attr-crypto.c).
>
> We cannot call the cpp_undef inside arm_cpu_builtins, because it is also
> used for the TARGET_CPU_CPP_BUILTINS hook and then would prevent real
> illegitimate redefinitions.
>
> Alternatively, I thought to reset the warn_builtin_macro_redefined flag,
> but that doesn't work as the macro is not NODE_BUILTIN (see the
> definition of warn_of_redefinition in libcpp).
> We might need to change this later : should target macros be marked as
> NOTE_BUILTIN ? We can discuss this separately (I can open a defect) as
> we have the cpp_undep solution for now, if you agree.
>
>>
>> diff -ruN gnu_trunk.p3/gcc/gcc/doc/invoke.texi gnu_trunk.p4/gcc/gcc/doc/invoke.texi
>> --- gnu_trunk.p3/gcc/gcc/doc/invoke.texi	2015-09-10 12:21:00.698911244 +0200
>> +++ gnu_trunk.p4/gcc/gcc/doc/invoke.texi	2015-09-14 10:27:20.281932581 +0200
>> @@ -13360,6 +13363,8 @@
>>     floating-point arithmetic (in particular denormal values are treated as
>>     zero), so the use of NEON instructions may lead to a loss of precision.
>>
>> +You can also set the fpu name at function level by using the @code{target("mfpu=")} function attributes (@pxref{ARM Function Attributes}) or pragmas (@pxref{Function Specific Option Pragmas}).
>> +
>>
>> s/"mfpu="/"fpu="
>>
> thanks
>
>> --- gnu_trunk.p3/gcc/gcc/testsuite/gcc.target/arm/attr-neon.c	1970-01-01 01:00:00.000000000 +0100
>> +++ gnu_trunk.p4/gcc/gcc/testsuite/gcc.target/arm/attr-neon.c	2015-09-14 16:12:08.449698268 +0200
>> @@ -0,0 +1,26 @@
>> +/* { dg-do compile } */
>> +/* { dg-require-effective-target arm_neon_ok } */
>> +/* { dg-options "-O3 -mfloat-abi=softfp -ftree-vectorize" } */
>> +
>> +void
>> +f3(int n, int x[], int y[]) {
>> +  int i;
>> +  for (i = 0; i < n; ++i)
>> +    y[i] = x[i] << 3;
>> +}
>> +
>>
>> What if GCC has been configured with --with-fpu=neon?
>> Then f3 will be compiled assuming NEON. You should add a -mfpu=vfp to the dg-options.
> Ah yes. I've added ((target("fpu=vfp")) instead, since we are testing
> the attribute.
>

2015-05-26  Christian Bruel<christian.bruel@st.com>

	PR target/65837
	* config/arm/arm-c.c (arm_cpu_builtins): Set or reset
	__ARM_FEATURE_CRYPTO, __VFP_FP__, __ARM_NEON__
	(arm_pragma_target_parse): Change check for arm_cpu_builtins.
	undefine __ARM_FP.
	* config/arm/arm.c (arm_can_inline_p): Check FPUs.
	(arm_valid_target_attribute_rec): Handle -mfpu attribute target.
	* doc/invoke.texi (-mfpu=): Mention attribute and pragma.
	* doc/extend.texi (-mfpu=): Describe attribute.

2015-09-14  Christian Bruel<christian.bruel@st.com>

	PR target/65837
	gcc.target/arm/lto/pr65837_0.c
	gcc.target/arm/attr-neon2.c
	gcc.target/arm/attr-neon.c
	gcc.target/arm/attr-neon-builtin-fail.c
	gcc.target/arm/attr-crypto.c

The parts in this patch look ok to me.
However, I think we need some more functionality
In aarch64 we support compiling a file with no simd, including arm_neon.h and using arm_neon.h intrinsics
within functions tagged with simd support.
We want to support such functionality on arm i.e. compile a file with -mfpu=vfp and use arm_neon.h intrinsics
in a function tagged with an fpu=neon attribute.
For that we'd need to wrap the intrinsics in arm_neon.h in appropriate pragmas, like in the aarch64 version of arm_neon.h

Thanks,
Kyrill
diff mbox

Patch

2015-05-26  Christian Bruel  <christian.bruel@st.com>

	PR target/65837
	* config/arm/arm-c.c (arm_cpu_builtins): Set or reset
	__ARM_FEATURE_CRYPTO, __VFP_FP__, __ARM_NEON__
	(arm_pragma_target_parse): Change check for arm_cpu_builtins.
	undefine __ARM_FP.
	* config/arm/arm.c (arm_can_inline_p): Check FPUs.
	(arm_valid_target_attribute_rec): Handle -mfpu attribute target.
	* doc/invoke.texi (-mfpu=): Mention attribute and pragma.
	* doc/extend.texi (-mfpu=): Describe attribute.

2015-09-14  Christian Bruel  <christian.bruel@st.com>

	PR target/65837
	gcc.target/arm/lto/pr65837_0.c
	gcc.target/arm/attr-neon2.c
	gcc.target/arm/attr-neon.c
	gcc.target/arm/attr-neon-builtin-fail.c
	gcc.target/arm/attr-crypto.c

diff -ruN gnu_trunk.p3/gcc/gcc/config/arm/arm.c gnu_trunk.p4/gcc/gcc/config/arm/arm.c
--- gnu_trunk.p3/gcc/gcc/config/arm/arm.c	2015-09-21 14:07:39.218566954 +0200
+++ gnu_trunk.p4/gcc/gcc/config/arm/arm.c	2015-09-21 13:36:36.242397513 +0200
@@ -29789,11 +29788,36 @@ 
 /* Hook to determine if one function can safely inline another.  */
 
 static bool
-arm_can_inline_p (tree caller ATTRIBUTE_UNUSED, tree callee ATTRIBUTE_UNUSED)
+arm_can_inline_p (tree caller, tree callee)
 {
-  /* Overidde default hook: Always OK to inline between different modes. 
-     Function with mode specific instructions, e.g using asm, must be explicitely 
-     protected with noinline.  */
+  tree caller_tree = DECL_FUNCTION_SPECIFIC_TARGET (caller);
+  tree callee_tree = DECL_FUNCTION_SPECIFIC_TARGET (callee);
+
+  struct cl_target_option *caller_opts
+	= TREE_TARGET_OPTION (caller_tree ? caller_tree
+					   : target_option_default_node);
+
+  struct cl_target_option *callee_opts
+	= TREE_TARGET_OPTION (callee_tree ? callee_tree
+					   : target_option_default_node);
+
+  const struct arm_fpu_desc *caller_fpu
+    = &all_fpus[caller_opts->x_arm_fpu_index];
+  const struct arm_fpu_desc *callee_fpu
+    = &all_fpus[callee_opts->x_arm_fpu_index];
+
+  /* Callee's fpu features should be a subset of the caller's.  */
+  if ((caller_fpu->features & callee_fpu->features) != callee_fpu->features)
+    return false;
+
+  /* Need same model and regs.  */
+  if (callee_fpu->model != caller_fpu->model
+      || callee_fpu->regs != callee_fpu->regs)
+    return false;
+
+  /* OK to inline between different modes.
+     Function with mode specific instructions, e.g using asm,
+     must be explicitly protected with noinline.  */
   return true;
 }
 
@@ -29821,30 +29846,38 @@ 
     }
 
   char *argstr = ASTRDUP (TREE_STRING_POINTER (args));
-  while (argstr && *argstr != '\0')
+  char *q;
+
+  while ((q = strtok (argstr, ",")) != NULL)
     {
-      while (ISSPACE (*argstr))
-	argstr++;
+      while (ISSPACE (*q)) ++q;
 
-      if (!strcmp (argstr, "thumb"))
-	{
+      argstr = NULL;
+      if (!strncmp (q, "thumb", 5))
 	  opts->x_target_flags |= MASK_THUMB;
-	  arm_option_check_internal (opts);
-	  return true;
-	}
 
-      if (!strcmp (argstr, "arm"))
-	{
+      else if (!strncmp (q, "arm", 3))
 	  opts->x_target_flags &= ~MASK_THUMB;
-	  arm_option_check_internal (opts);
-	  return true;
+
+      else if (!strncmp (q, "fpu=", 4))
+	{
+	  if (! opt_enum_arg_to_value (OPT_mfpu_, q+4,
+				       &opts->x_arm_fpu_index, CL_TARGET))
+	    {
+	      error ("invalid fpu for attribute(target(\"%s\"))", q);
+	      return false;
+	    }
+	}
+      else
+	{
+	  error ("attribute(target(\"%s\")) is unknown", q);
+	  return false;
 	}
 
-      warning (0, "attribute(target(\"%s\")) is unknown", argstr);
-      return false;
+      arm_option_check_internal (opts);
     }
 
-  return false;
+  return true;
 }
 
 /* Return a TARGET_OPTION_NODE tree of the target options listed or NULL.  */
diff -ruN gnu_trunk.p3/gcc/gcc/config/arm/arm-c.c gnu_trunk.p4/gcc/gcc/config/arm/arm-c.c
--- gnu_trunk.p3/gcc/gcc/config/arm/arm-c.c	2015-09-21 14:07:12.186506227 +0200
+++ gnu_trunk.p4/gcc/gcc/config/arm/arm-c.c	2015-09-21 13:46:02.655664904 +0200
@@ -68,8 +68,8 @@ 
   def_or_undef_macro (pfile, "__ARM_FEATURE_DSP", TARGET_DSP_MULTIPLY);
   def_or_undef_macro (pfile, "__ARM_FEATURE_QBIT", TARGET_ARM_QBIT); 
   def_or_undef_macro (pfile, "__ARM_FEATURE_SAT", TARGET_ARM_SAT);
-  if (TARGET_CRYPTO)
-    builtin_define ("__ARM_FEATURE_CRYPTO");
+  def_or_undef_macro (pfile, "__ARM_FEATURE_CRYPTO", TARGET_CRYPTO);
+
   if (unaligned_access)
     builtin_define ("__ARM_FEATURE_UNALIGNED");
   if (TARGET_CRC32)
@@ -129,8 +129,7 @@ 
   if (TARGET_SOFT_FLOAT)
     builtin_define ("__SOFTFP__");
 
-  if (TARGET_VFP)
-    builtin_define ("__VFP_FP__");
+  def_or_undef_macro (pfile, "__VFP_FP__", TARGET_VFP);
 	
   if (TARGET_ARM_FP)
     builtin_define_with_int_value ("__ARM_FP", TARGET_ARM_FP);
@@ -141,11 +140,9 @@ 
   if (TARGET_FMA)
     builtin_define ("__ARM_FEATURE_FMA");
 
-  if (TARGET_NEON)
-    {
-      builtin_define ("__ARM_NEON__");
-      builtin_define ("__ARM_NEON");
-    }
+  def_or_undef_macro (pfile, "__ARM_NEON__", TARGET_NEON);
+  def_or_undef_macro (pfile, "__ARM_NEON", TARGET_NEON);
+
   if (TARGET_NEON_FP)
     builtin_define_with_int_value ("__ARM_NEON_FP", TARGET_NEON_FP);
   
@@ -232,7 +228,7 @@ 
   gcc_assert (prev_opt);
   gcc_assert (cur_opt);
 
-  if (cur_opt->x_target_flags != prev_opt->x_target_flags)
+  if (cur_opt != prev_opt)
     {
       /* For the definitions, ensure all newly defined macros are considered
 	 as used for -Wunused-macros.  There is no point warning about the
@@ -243,6 +239,8 @@ 
 
       /* Update macros.  */
       gcc_assert (cur_opt->x_target_flags == target_flags);
+      /* This one can be redefined by the pragma without warning.  */
+      cpp_undef (parse_in, "__ARM_FP");
       arm_cpu_builtins (parse_in);
 
       cpp_opts->warn_unused_macros = saved_warn_unused_macros;
diff -ruN gnu_trunk.p3/gcc/gcc/doc/extend.texi gnu_trunk.p4/gcc/gcc/doc/extend.texi
--- gnu_trunk.p3/gcc/gcc/doc/extend.texi	2015-09-07 13:35:20.777683005 +0200
+++ gnu_trunk.p4/gcc/gcc/doc/extend.texi	2015-09-14 13:58:49.271385001 +0200
@@ -3606,10 +3606,17 @@ 
 @item arm
 @cindex @code{target("arm")} function attribute, ARM
 Force code generation in the ARM (A32) ISA.
-@end table
 
 Functions from different modes can be inlined in the caller's mode.
 
+@item fpu=
+@cindex @code{target("fpu=")} function attribute, ARM
+Specifies the fpu for which to tune the performance of this function.
+The behavior and permissible arguments are the same as for the @option{-mfpu=}
+command-line option.
+
+@end table
+
 @end table
 
 @node AVR Function Attributes
diff -ruN gnu_trunk.p3/gcc/gcc/doc/invoke.texi gnu_trunk.p4/gcc/gcc/doc/invoke.texi
--- gnu_trunk.p3/gcc/gcc/doc/invoke.texi	2015-09-21 13:35:49.274292268 +0200
+++ gnu_trunk.p4/gcc/gcc/doc/invoke.texi	2015-09-21 13:36:18.798358427 +0200
@@ -13386,6 +13386,8 @@ 
 floating-point arithmetic (in particular denormal values are treated as
 zero), so the use of NEON instructions may lead to a loss of precision.
 
+You can also set the fpu name at function level by using the @code{target("fpu=")} function attributes (@pxref{ARM Function Attributes}) or pragmas (@pxref{Function Specific Option Pragmas}).
+
 @item -mfp16-format=@var{name}
 @opindex mfp16-format
 Specify the format of the @code{__fp16} half-precision floating-point type.
diff -ruN gnu_trunk.p3/gcc/gcc/testsuite/gcc.target/arm/attr-crypto.c gnu_trunk.p4/gcc/gcc/testsuite/gcc.target/arm/attr-crypto.c
--- gnu_trunk.p3/gcc/gcc/testsuite/gcc.target/arm/attr-crypto.c	1970-01-01 01:00:00.000000000 +0100
+++ gnu_trunk.p4/gcc/gcc/testsuite/gcc.target/arm/attr-crypto.c	2015-09-14 15:58:24.967898634 +0200
@@ -0,0 +1,38 @@ 
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_crypto_ok } */
+
+#pragma GCC target ("fpu=crypto-neon-fp-armv8")
+
+#ifndef __ARM_FEATURE_CRYPTO
+#error __ARM_FEATURE_CRYPTO not defined.
+#endif
+
+#ifndef __ARM_NEON
+#error __ARM_NEON not defined.
+#endif
+
+#if !defined(__ARM_FP) || (__ARM_FP != 14)
+#error __ARM_FP
+#endif
+
+#include "arm_neon.h"
+
+int
+foo (void)
+{
+  uint32x4_t a = {0xd, 0xe, 0xa, 0xd};
+  uint32x4_t b = {0, 1, 2, 3};
+
+  uint32x4_t res = vsha256su0q_u32 (a, b);
+  return res[0];
+}
+
+#pragma GCC reset_options
+
+/* Check that the FP version is correctly reset.  */
+
+#if !defined(__ARM_FP) || (__ARM_FP != 12)
+#error __ARM_FP
+#endif
+
+/* { dg-final { scan-assembler "sha256su0.32\tq\[0-9\]+, q\[0-9\]+" } } */
diff -ruN gnu_trunk.p3/gcc/gcc/testsuite/gcc.target/arm/attr-neon2.c gnu_trunk.p4/gcc/gcc/testsuite/gcc.target/arm/attr-neon2.c
--- gnu_trunk.p3/gcc/gcc/testsuite/gcc.target/arm/attr-neon2.c	1970-01-01 01:00:00.000000000 +0100
+++ gnu_trunk.p4/gcc/gcc/testsuite/gcc.target/arm/attr-neon2.c	2015-09-14 15:58:24.967898634 +0200
@@ -0,0 +1,28 @@ 
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_neon_ok } */
+/* { dg-options "-O2 -mfloat-abi=softfp -mfpu=vfp" } */
+
+#pragma GCC target ("fpu=neon")
+#include <arm_neon.h>
+
+/* Check that pragma target is used.  */
+int8x8_t 
+my (int8x8_t __a, int8x8_t __b)
+{
+  return __a + __b;
+}
+
+#pragma GCC reset_options
+
+/* Check that command line option is restored.  */
+int8x8_t 
+my1 (int8x8_t __a, int8x8_t __b)
+{
+  return __a + __b;
+}
+
+/* { dg-final { scan-assembler-times "\.fpu vfp" 1 } } */
+/* { dg-final { scan-assembler-times "\.fpu neon" 1 } } */
+/* { dg-final { scan-assembler "vadd" } } */
+
+
diff -ruN gnu_trunk.p3/gcc/gcc/testsuite/gcc.target/arm/attr-neon-builtin-fail.c gnu_trunk.p4/gcc/gcc/testsuite/gcc.target/arm/attr-neon-builtin-fail.c
--- gnu_trunk.p3/gcc/gcc/testsuite/gcc.target/arm/attr-neon-builtin-fail.c	1970-01-01 01:00:00.000000000 +0100
+++ gnu_trunk.p4/gcc/gcc/testsuite/gcc.target/arm/attr-neon-builtin-fail.c	2015-09-14 15:58:24.967898634 +0200
@@ -0,0 +1,18 @@ 
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_neon_ok } */
+/* { dg-options "-O2 -mfloat-abi=softfp -mfpu=neon" } */
+
+#include <arm_neon.h>
+
+void __attribute__ ((target ("fpu=vfp")))
+foo (uint8x16_t *p)  
+{
+  *p = vmovq_n_u8 (3); /* { dg-error "called from here" } */
+
+}
+
+
+/* { dg-error "inlining failed in call to always_inline" "" { target *-*-* } 0 } */
+
+
+
diff -ruN gnu_trunk.p3/gcc/gcc/testsuite/gcc.target/arm/attr-neon.c gnu_trunk.p4/gcc/gcc/testsuite/gcc.target/arm/attr-neon.c
--- gnu_trunk.p3/gcc/gcc/testsuite/gcc.target/arm/attr-neon.c	1970-01-01 01:00:00.000000000 +0100
+++ gnu_trunk.p4/gcc/gcc/testsuite/gcc.target/arm/attr-neon.c	2015-09-21 13:43:45.983359388 +0200
@@ -0,0 +1,26 @@ 
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_neon_ok } */
+/* { dg-options "-O2 -mfloat-abi=softfp -ftree-vectorize" } */
+
+/* Verify that neon instructions are emitted once.  */
+void __attribute__ ((target("fpu=neon")))
+ f1(int n, int x[], int y[]) {
+  int i;
+  for (i = 0; i < n; ++i)
+    y[i] = x[i] << 3;
+}
+
+void __attribute__ ((target("fpu=vfp")))
+f3(int n, int x[], int y[]) {
+  int i;
+  for (i = 0; i < n; ++i)
+    y[i] = x[i] << 3;
+}
+
+/* { dg-final { scan-assembler-times "\.fpu vfp" 1 } } */
+/* { dg-final { scan-assembler-times "\.fpu neon" 1 } } */
+/* { dg-final { scan-assembler-times "vshl" 1 } } */
+
+
+
+
diff -ruN gnu_trunk.p3/gcc/gcc/testsuite/gcc.target/arm/lto/pr65837_0.c gnu_trunk.p4/gcc/gcc/testsuite/gcc.target/arm/lto/pr65837_0.c
--- gnu_trunk.p3/gcc/gcc/testsuite/gcc.target/arm/lto/pr65837_0.c	1970-01-01 01:00:00.000000000 +0100
+++ gnu_trunk.p4/gcc/gcc/testsuite/gcc.target/arm/lto/pr65837_0.c	2015-09-14 15:58:13.899874587 +0200
@@ -0,0 +1,14 @@ 
+/* { dg-lto-do run } */
+/* { dg-lto-options {{-flto -mfpu=neon}} } */
+/* { dg-suppress-ld-options {-mfpu=neon} } */
+
+#include "arm_neon.h"
+
+float32x2_t a, b, c, e;
+
+int main()
+{
+  e = __builtin_neon_vmls_lanev2sf (a, b, c, 0);
+  return 0;
+}
+