diff mbox

[ARM] Turning off 64bits ops in Neon and gfortran/modulo-scheduling problem

Message ID CAKdteOZBDfaSt341ssmPniT5_6-3mHcKJRiRC_BM0+FLCi_4aQ@mail.gmail.com
State New
Headers show

Commit Message

Christophe Lyon Nov. 30, 2012, 4:34 p.m. UTC
On 29 November 2012 21:59, Joseph S. Myers <joseph@codesourcery.com> wrote:
> On Thu, 29 Nov 2012, Christophe Lyon wrote:
>
>> 2012-11-28  Christophe Lyon  <christophe.lyon@linaro.org>
>>
>>     gcc/
>>     * config/arm/arm-protos.h (tune_params): Add
>>     prefer_neon_for_64bits field.
>>     * config/arm/arm.c (prefer_neon_for_64bits): New variable.
>>     (arm_slowmul_tune): Default prefer_neon_for_64bits to false.
>>     (arm_fastmul_tune, arm_strongarm_tune, arm_xscale_tune): Ditto.
>>     (arm_9e_tune, arm_v6t2_tune, arm_cortex_tune): Ditto.
>>     (arm_cortex_a5_tune, arm_cortex_a15_tune): Ditto.
>>     (arm_cortex_a9_tune, arm_fa726te_tune): Ditto.
>>     (arm_option_override): Handle -mneon-for-64bits new option.
>>     * config/arm/arm.h (TARGET_PREFER_NEON_64BITS): New macro.
>>     (prefer_neon_for_64bits): Declare new variable.
>>     * config/arm/arm.md (arch): Rename neon_onlya8 and neon_nota8 to
>>     avoid_neon_for_64bits and neon_for_64bits.
>>     (arch_enabled): Handle new arch types.
>>     (one_cmpldi2): Use new arch names.
>>     * config/arm/neon.md (adddi3_neon, subdi3_neon, iordi3_neon)
>>     (anddi3_neon, xordi3_neon, ashldi3_neon, <shift>di3_neon): Use
>>     neon_for_64bits instead of nota8 and avoid_neon_for_64bits instead
>>     of onlya8.
>
> This ChangeLog entry doesn't appear to mention the arm.opt change.
> Furthermore, the patch seems to be missing any .texi change to document
> the option; any new option needs documentation.  You are also missing
> testcases for the testsuite to verify that both enabled and disabled
> states of the option work properly.
>
Indeed, I forgot about the documentation; here is an updated patch.

Regarding the testcases, as this patch disables transformations
recently introduced, I would have appreciated if testcases had been
associated with them in the 1st place.... This requirement should be
enforced :-)

Tested with qemu on target arm-none-linux-gnueabi.


2012-11-30  Christophe Lyon  <christophe.lyon@linaro.org>

    gcc/
    * config/arm/arm-protos.h (tune_params): Add
    prefer_neon_for_64bits field.
    * config/arm/arm.c (prefer_neon_for_64bits): New variable.
    (arm_slowmul_tune): Default prefer_neon_for_64bits to false.
    (arm_fastmul_tune, arm_strongarm_tune, arm_xscale_tune): Ditto.
    (arm_9e_tune, arm_v6t2_tune, arm_cortex_tune): Ditto.
    (arm_cortex_a5_tune, arm_cortex_a15_tune): Ditto.
    (arm_cortex_a9_tune, arm_fa726te_tune): Ditto.
    (arm_option_override): Handle -mneon-for-64bits new option.
    * config/arm/arm.h (TARGET_PREFER_NEON_64BITS): New macro.
    (prefer_neon_for_64bits): Declare new variable.
    * config/arm/arm.md (arch): Rename neon_onlya8 and neon_nota8 to
    avoid_neon_for_64bits and neon_for_64bits.
    (arch_enabled): Handle new arch types.
    (one_cmpldi2): Use new arch names.
    * config/arm/arm.opt (mneon-for-64bits): Add option.
    * config/arm/neon.md (adddi3_neon, subdi3_neon, iordi3_neon)
    (anddi3_neon, xordi3_neon, ashldi3_neon, <shift>di3_neon): Use
    neon_for_64bits instead of nota8 and avoid_neon_for_64bits instead
    of onlya8.
    * doc/invoke.texi (-mneon-for-64bits): Document.

    gcc/testsuite/
    * gcc.target/arm/neon-for-64bits-1.c: New tests.
    * gcc.target/arm/neon-for-64bits-2.c: Likewise.

Comments

Christophe Lyon Dec. 7, 2012, 8:34 a.m. UTC | #1
Ping?
http://gcc.gnu.org/ml/gcc-patches/2012-11/msg02558.html

Thanks,

Christophe.

On 30 November 2012 17:34, Christophe Lyon <christophe.lyon@linaro.org> wrote:
> On 29 November 2012 21:59, Joseph S. Myers <joseph@codesourcery.com> wrote:
>> On Thu, 29 Nov 2012, Christophe Lyon wrote:
>>
>>> 2012-11-28  Christophe Lyon  <christophe.lyon@linaro.org>
>>>
>>>     gcc/
>>>     * config/arm/arm-protos.h (tune_params): Add
>>>     prefer_neon_for_64bits field.
>>>     * config/arm/arm.c (prefer_neon_for_64bits): New variable.
>>>     (arm_slowmul_tune): Default prefer_neon_for_64bits to false.
>>>     (arm_fastmul_tune, arm_strongarm_tune, arm_xscale_tune): Ditto.
>>>     (arm_9e_tune, arm_v6t2_tune, arm_cortex_tune): Ditto.
>>>     (arm_cortex_a5_tune, arm_cortex_a15_tune): Ditto.
>>>     (arm_cortex_a9_tune, arm_fa726te_tune): Ditto.
>>>     (arm_option_override): Handle -mneon-for-64bits new option.
>>>     * config/arm/arm.h (TARGET_PREFER_NEON_64BITS): New macro.
>>>     (prefer_neon_for_64bits): Declare new variable.
>>>     * config/arm/arm.md (arch): Rename neon_onlya8 and neon_nota8 to
>>>     avoid_neon_for_64bits and neon_for_64bits.
>>>     (arch_enabled): Handle new arch types.
>>>     (one_cmpldi2): Use new arch names.
>>>     * config/arm/neon.md (adddi3_neon, subdi3_neon, iordi3_neon)
>>>     (anddi3_neon, xordi3_neon, ashldi3_neon, <shift>di3_neon): Use
>>>     neon_for_64bits instead of nota8 and avoid_neon_for_64bits instead
>>>     of onlya8.
>>
>> This ChangeLog entry doesn't appear to mention the arm.opt change.
>> Furthermore, the patch seems to be missing any .texi change to document
>> the option; any new option needs documentation.  You are also missing
>> testcases for the testsuite to verify that both enabled and disabled
>> states of the option work properly.
>>
> Indeed, I forgot about the documentation; here is an updated patch.
>
> Regarding the testcases, as this patch disables transformations
> recently introduced, I would have appreciated if testcases had been
> associated with them in the 1st place.... This requirement should be
> enforced :-)
>
> Tested with qemu on target arm-none-linux-gnueabi.
>
>
> 2012-11-30  Christophe Lyon  <christophe.lyon@linaro.org>
>
>     gcc/
>     * config/arm/arm-protos.h (tune_params): Add
>     prefer_neon_for_64bits field.
>     * config/arm/arm.c (prefer_neon_for_64bits): New variable.
>     (arm_slowmul_tune): Default prefer_neon_for_64bits to false.
>     (arm_fastmul_tune, arm_strongarm_tune, arm_xscale_tune): Ditto.
>     (arm_9e_tune, arm_v6t2_tune, arm_cortex_tune): Ditto.
>     (arm_cortex_a5_tune, arm_cortex_a15_tune): Ditto.
>     (arm_cortex_a9_tune, arm_fa726te_tune): Ditto.
>     (arm_option_override): Handle -mneon-for-64bits new option.
>     * config/arm/arm.h (TARGET_PREFER_NEON_64BITS): New macro.
>     (prefer_neon_for_64bits): Declare new variable.
>     * config/arm/arm.md (arch): Rename neon_onlya8 and neon_nota8 to
>     avoid_neon_for_64bits and neon_for_64bits.
>     (arch_enabled): Handle new arch types.
>     (one_cmpldi2): Use new arch names.
>     * config/arm/arm.opt (mneon-for-64bits): Add option.
>     * config/arm/neon.md (adddi3_neon, subdi3_neon, iordi3_neon)
>     (anddi3_neon, xordi3_neon, ashldi3_neon, <shift>di3_neon): Use
>     neon_for_64bits instead of nota8 and avoid_neon_for_64bits instead
>     of onlya8.
>     * doc/invoke.texi (-mneon-for-64bits): Document.
>
>     gcc/testsuite/
>     * gcc.target/arm/neon-for-64bits-1.c: New tests.
>     * gcc.target/arm/neon-for-64bits-2.c: Likewise.
Christophe Lyon Dec. 14, 2012, 4:58 p.m. UTC | #2
Ping^2?

On 7 December 2012 09:34, Christophe Lyon <christophe.lyon@linaro.org> wrote:
> Ping?
> http://gcc.gnu.org/ml/gcc-patches/2012-11/msg02558.html
>
> Thanks,
>
> Christophe.
diff mbox

Patch

diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h
index d942c5b..c92f055 100644
--- a/gcc/config/arm/arm-protos.h
+++ b/gcc/config/arm/arm-protos.h
@@ -247,6 +247,8 @@  struct tune_params
      performance. The first element covers Thumb state and the second one
      is for ARM state.  */
   bool logical_op_non_short_circuit[2];
+  /* Prefer Neon for 64-bit bitops.  */
+  bool prefer_neon_for_64bits;
 };
 
 extern const struct tune_params *current_tune;
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 286a6c5..9efd215 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -816,6 +816,10 @@  int arm_arch_thumb2;
 int arm_arch_arm_hwdiv;
 int arm_arch_thumb_hwdiv;
 
+/* Nonzero if we should use Neon to handle 64-bits operations rather
+   than core registers.  */
+int prefer_neon_for_64bits = 0;
+
 /* In case of a PRE_INC, POST_INC, PRE_DEC, POST_DEC memory reference,
    we must report the mode of the memory reference from
    TARGET_PRINT_OPERAND to TARGET_PRINT_OPERAND_ADDRESS.  */
@@ -895,6 +899,7 @@  const struct tune_params arm_slowmul_tune =
   arm_default_branch_cost,
   false,					/* Prefer LDRD/STRD.  */
   {true, true},					/* Prefer non short circuit.  */
+  false                                         /* Prefer Neon for 64-bits bitops.  */
 };
 
 const struct tune_params arm_fastmul_tune =
@@ -908,6 +913,7 @@  const struct tune_params arm_fastmul_tune =
   arm_default_branch_cost,
   false,					/* Prefer LDRD/STRD.  */
   {true, true},					/* Prefer non short circuit.  */
+  false                                         /* Prefer Neon for 64-bits bitops.  */
 };
 
 /* StrongARM has early execution of branches, so a sequence that is worth
@@ -924,6 +930,7 @@  const struct tune_params arm_strongarm_tune =
   arm_default_branch_cost,
   false,					/* Prefer LDRD/STRD.  */
   {true, true},					/* Prefer non short circuit.  */
+  false                                         /* Prefer Neon for 64-bits bitops.  */
 };
 
 const struct tune_params arm_xscale_tune =
@@ -937,6 +944,7 @@  const struct tune_params arm_xscale_tune =
   arm_default_branch_cost,
   false,					/* Prefer LDRD/STRD.  */
   {true, true},					/* Prefer non short circuit.  */
+  false                                         /* Prefer Neon for 64-bits bitops.  */
 };
 
 const struct tune_params arm_9e_tune =
@@ -950,6 +958,7 @@  const struct tune_params arm_9e_tune =
   arm_default_branch_cost,
   false,					/* Prefer LDRD/STRD.  */
   {true, true},					/* Prefer non short circuit.  */
+  false                                         /* Prefer Neon for 64-bits bitops.  */
 };
 
 const struct tune_params arm_v6t2_tune =
@@ -963,6 +972,7 @@  const struct tune_params arm_v6t2_tune =
   arm_default_branch_cost,
   false,					/* Prefer LDRD/STRD.  */
   {true, true},					/* Prefer non short circuit.  */
+  false                                         /* Prefer Neon for 64-bits bitops.  */
 };
 
 /* Generic Cortex tuning.  Use more specific tunings if appropriate.  */
@@ -977,6 +987,7 @@  const struct tune_params arm_cortex_tune =
   arm_default_branch_cost,
   false,					/* Prefer LDRD/STRD.  */
   {true, true},					/* Prefer non short circuit.  */
+  false                                         /* Prefer Neon for 64-bits bitops.  */
 };
 
 const struct tune_params arm_cortex_a15_tune =
@@ -990,6 +1001,7 @@  const struct tune_params arm_cortex_a15_tune =
   arm_default_branch_cost,
   true,						/* Prefer LDRD/STRD.  */
   {true, true},					/* Prefer non short circuit.  */
+  false                                         /* Prefer Neon for 64-bits bitops.  */
 };
 
 /* Branches can be dual-issued on Cortex-A5, so conditional execution is
@@ -1006,6 +1018,7 @@  const struct tune_params arm_cortex_a5_tune =
   arm_cortex_a5_branch_cost,
   false,					/* Prefer LDRD/STRD.  */
   {false, false},				/* Prefer non short circuit.  */
+  false                                         /* Prefer Neon for 64-bits bitops.  */
 };
 
 const struct tune_params arm_cortex_a9_tune =
@@ -1019,6 +1032,7 @@  const struct tune_params arm_cortex_a9_tune =
   arm_default_branch_cost,
   false,					/* Prefer LDRD/STRD.  */
   {true, true},					/* Prefer non short circuit.  */
+  false                                         /* Prefer Neon for 64-bits bitops.  */
 };
 
 /* The arm_v6m_tune is duplicated from arm_cortex_tune, rather than
@@ -1034,6 +1048,7 @@  const struct tune_params arm_v6m_tune =
   arm_default_branch_cost,
   false,					/* Prefer LDRD/STRD.  */
   {false, false},				/* Prefer non short circuit.  */
+  false                                         /* Prefer Neon for 64-bits bitops.  */
 };
 
 const struct tune_params arm_fa726te_tune =
@@ -1047,6 +1062,7 @@  const struct tune_params arm_fa726te_tune =
   arm_default_branch_cost,
   false,					/* Prefer LDRD/STRD.  */
   {true, true},					/* Prefer non short circuit.  */
+  false                                         /* Prefer Neon for 64-bits bitops.  */
 };
 
 
@@ -2077,6 +2093,12 @@  arm_option_override (void)
                            global_options.x_param_values,
                            global_options_set.x_param_values);
 
+  /* Use Neon to perform 64-bits operations rather than core
+     registers.  */
+  prefer_neon_for_64bits = current_tune->prefer_neon_for_64bits;
+  if (use_neon_for_64bits == 1)
+     prefer_neon_for_64bits = true;
+
   /* Use the alternative scheduling-pressure algorithm by default.  */
   maybe_set_param_value (PARAM_SCHED_PRESSURE_ALGORITHM, 2,
                          global_options.x_param_values,
diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h
index f520cc7..c71d85f 100644
--- a/gcc/config/arm/arm.h
+++ b/gcc/config/arm/arm.h
@@ -356,6 +356,9 @@  extern void (*arm_lang_output_object_attributes_hook)(void);
 #define TARGET_IDIV		((TARGET_ARM && arm_arch_arm_hwdiv) \
 				 || (TARGET_THUMB2 && arm_arch_thumb_hwdiv))
 
+/* Should NEON be used for 64-bits bitops.  */
+#define TARGET_PREFER_NEON_64BITS (prefer_neon_for_64bits)
+
 /* True iff the full BPABI is being used.  If TARGET_BPABI is true,
    then TARGET_AAPCS_BASED must be true -- but the converse does not
    hold.  TARGET_BPABI implies the use of the BPABI runtime library,
@@ -541,6 +544,10 @@  extern int arm_arch_arm_hwdiv;
 /* Nonzero if chip supports integer division instruction in Thumb mode.  */
 extern int arm_arch_thumb_hwdiv;
 
+/* Nonzero if we should use Neon to handle 64-bits operations rather
+   than core registers.  */
+extern int prefer_neon_for_64bits;
+
 #ifndef TARGET_DEFAULT
 #define TARGET_DEFAULT  (MASK_APCS_FRAME)
 #endif
diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index ac507ef..afde613 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -202,7 +202,7 @@ 
 ; for ARM or Thumb-2 with arm_arch6, and nov6 for ARM without
 ; arm_arch6.  This attribute is used to compute attribute "enabled",
 ; use type "any" to enable an alternative in all cases.
-(define_attr "arch" "any,a,t,32,t1,t2,v6,nov6,onlya8,neon_onlya8,nota8,neon_nota8,iwmmxt,iwmmxt2"
+(define_attr "arch" "any,a,t,32,t1,t2,v6,nov6,onlya8,nota8,neon_for_64bits,avoid_neon_for_64bits,iwmmxt,iwmmxt2"
   (const_string "any"))
 
 (define_attr "arch_enabled" "no,yes"
@@ -241,18 +241,18 @@ 
 	      (eq_attr "tune" "cortexa8"))
 	 (const_string "yes")
 
-	 (and (eq_attr "arch" "neon_onlya8")
-	      (eq_attr "tune" "cortexa8")
-	      (match_test "TARGET_NEON"))
+	 (and (eq_attr "arch" "avoid_neon_for_64bits")
+	      (match_test "TARGET_NEON")
+	      (not (match_test "TARGET_PREFER_NEON_64BITS")))
 	 (const_string "yes")
 
 	 (and (eq_attr "arch" "nota8")
 	      (not (eq_attr "tune" "cortexa8")))
 	 (const_string "yes")
 
-	 (and (eq_attr "arch" "neon_nota8")
-	      (not (eq_attr "tune" "cortexa8"))
-	      (match_test "TARGET_NEON"))
+	 (and (eq_attr "arch" "neon_for_64bits")
+	      (match_test "TARGET_NEON")
+	      (match_test "TARGET_PREFER_NEON_64BITS"))
 	 (const_string "yes")
 
 	 (and (eq_attr "arch" "iwmmxt2")
@@ -4370,7 +4370,7 @@ 
   [(set_attr "length" "*,8,8,*")
    (set_attr "predicable" "no,yes,yes,no")
    (set_attr "neon_type" "neon_int_1,*,*,neon_int_1")
-   (set_attr "arch" "neon_nota8,*,*,neon_onlya8")]
+   (set_attr "arch" "neon_for_64bits,*,*,avoid_neon_for_64bits")]
 )
 
 (define_expand "one_cmplsi2"
diff --git a/gcc/config/arm/arm.opt b/gcc/config/arm/arm.opt
index fb12c55..83b6002 100644
--- a/gcc/config/arm/arm.opt
+++ b/gcc/config/arm/arm.opt
@@ -251,3 +251,7 @@  that may trigger Cortex-M3 errata.
 munaligned-access
 Target Report Var(unaligned_access) Init(2)
 Enable unaligned word and halfword accesses to packed data.
+
+mneon-for-64bits
+Target Report RejectNegative Var(use_neon_for_64bits) Init(0)
+Use Neon to perform 64-bits operations rather than core registers.
diff --git a/gcc/config/arm/neon.md b/gcc/config/arm/neon.md
index 2103580..8b0e877 100644
--- a/gcc/config/arm/neon.md
+++ b/gcc/config/arm/neon.md
@@ -617,7 +617,7 @@ 
   [(set_attr "neon_type" "neon_int_1,*,*,neon_int_1,*,*,*")
    (set_attr "conds" "*,clob,clob,*,clob,clob,clob")
    (set_attr "length" "*,8,8,*,8,8,8")
-   (set_attr "arch" "nota8,*,*,onlya8,*,*,*")]
+   (set_attr "arch" "neon_for_64bits,*,*,avoid_neon_for_64bits,*,*,*")]
 )
 
 (define_insn "*sub<mode>3_neon"
@@ -654,7 +654,7 @@ 
   [(set_attr "neon_type" "neon_int_2,*,*,*,neon_int_2")
    (set_attr "conds" "*,clob,clob,clob,*")
    (set_attr "length" "*,8,8,8,*")
-   (set_attr "arch" "nota8,*,*,*,onlya8")]
+   (set_attr "arch" "neon_for_64bits,*,*,*,avoid_neon_for_64bits")]
 )
 
 (define_insn "*mul<mode>3_neon"
@@ -816,7 +816,7 @@ 
 }
   [(set_attr "neon_type" "neon_int_1,neon_int_1,*,*,neon_int_1,neon_int_1")
    (set_attr "length" "*,*,8,8,*,*")
-   (set_attr "arch" "nota8,nota8,*,*,onlya8,onlya8")]
+   (set_attr "arch" "neon_for_64bits,neon_for_64bits,*,*,avoid_neon_for_64bits,avoid_neon_for_64bits")]
 )
 
 ;; The concrete forms of the Neon immediate-logic instructions are vbic and
@@ -861,7 +861,7 @@ 
 }
   [(set_attr "neon_type" "neon_int_1,neon_int_1,*,*,neon_int_1,neon_int_1")
    (set_attr "length" "*,*,8,8,*,*")
-   (set_attr "arch" "nota8,nota8,*,*,onlya8,onlya8")]
+   (set_attr "arch" "neon_for_64bits,neon_for_64bits,*,*,avoid_neon_for_64bits,avoid_neon_for_64bits")]
 )
 
 (define_insn "orn<mode>3_neon"
@@ -957,7 +957,7 @@ 
    veor\t%P0, %P1, %P2"
   [(set_attr "neon_type" "neon_int_1,*,*,neon_int_1")
    (set_attr "length" "*,8,8,*")
-   (set_attr "arch" "nota8,*,*,onlya8")]
+   (set_attr "arch" "neon_for_64bits,*,*,avoid_neon_for_64bits")]
 )
 
 (define_insn "one_cmpl<mode>2"
@@ -1279,7 +1279,7 @@ 
       }
     DONE;
   }"
-  [(set_attr "arch" "nota8,nota8,*,*,onlya8,onlya8")
+  [(set_attr "arch" "neon_for_64bits,neon_for_64bits,*,*,avoid_neon_for_64bits,avoid_neon_for_64bits")
    (set_attr "opt" "*,*,speed,speed,*,*")]
 )
 
@@ -1380,7 +1380,7 @@ 
 
     DONE;
   }"
-  [(set_attr "arch" "nota8,nota8,*,*,onlya8,onlya8")
+  [(set_attr "arch" "neon_for_64bits,neon_for_64bits,*,*,avoid_neon_for_64bits,avoid_neon_for_64bits")
    (set_attr "opt" "*,*,speed,speed,*,*")]
 )
 
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 51b6e85..3918b1d 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -514,7 +514,8 @@  Objective-C and Objective-C++ Dialects}.
 -mtp=@var{name} -mtls-dialect=@var{dialect} @gol
 -mword-relocations @gol
 -mfix-cortex-m3-ldrd @gol
--munaligned-access}
+-munaligned-access @gol
+-mneon-for-64bits}
 
 @emph{AVR Options}
 @gccoptlist{-mmcu=@var{mcu} -maccumulate-args -mbranch-cost=@var{cost} @gol
@@ -11521,6 +11522,11 @@  setting of this option.  If unaligned access is enabled then the
 preprocessor symbol @code{__ARM_FEATURE_UNALIGNED} will also be
 defined.
 
+@item -mneon-for-64bits
+@opindex mneon-for-64bits
+Enables using Neon to handle scalar 64-bits operations. This is
+disabled by default since the cost of moving data from core registers
+to Neon is high.
 @end table
 
 @node AVR Options
diff --git a/gcc/testsuite/gcc.target/arm/neon-for-64bits-1.c b/gcc/testsuite/gcc.target/arm/neon-for-64bits-1.c
new file mode 100644
index 0000000..a2a4103
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/neon-for-64bits-1.c
@@ -0,0 +1,54 @@ 
+/* Check that Neon is *not* used by default to handle 64-bits scalar
+   operations.  */
+
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_neon_ok } */
+/* { dg-options "-O2" } */
+/* { dg-add-options arm_neon } */
+
+typedef long long i64;
+typedef unsigned long long u64;
+typedef unsigned int u32;
+typedef int i32;
+
+/* Unary operators */
+#define UNARY_OP(name, op) \
+  void unary_##name(u64 *a, u64 *b) { *a = op (*b + 0x1234567812345678ULL) ; }
+
+/* Binary operators */
+#define BINARY_OP(name, op) \
+  void binary_##name(u64 *a, u64 *b, u64 *c) { *a = *b op *c ; }
+
+/* Unsigned shift */
+#define SHIFT_U(name, op, amount) \
+  void ushift_##name(u64 *a, u64 *b, int c) { *a = *b op amount; }
+
+/* Signed shift */
+#define SHIFT_S(name, op, amount) \
+  void sshift_##name(i64 *a, i64 *b, int c) { *a = *b op amount; }
+
+UNARY_OP(not, ~)
+
+BINARY_OP(add, +)
+BINARY_OP(sub, -)
+BINARY_OP(and, &)
+BINARY_OP(or, |)
+BINARY_OP(xor, ^)
+
+SHIFT_U(right1, >>, 1)
+SHIFT_U(right2, >>, 2)
+SHIFT_U(right5, >>, 5)
+SHIFT_U(rightn, >>, c)
+
+SHIFT_S(right1, >>, 1)
+SHIFT_S(right2, >>, 2)
+SHIFT_S(right5, >>, 5)
+SHIFT_S(rightn, >>, c)
+
+/* { dg-final {scan-assembler-times "vmvn" 0} }  */
+/* { dg-final {scan-assembler-times "vadd" 0} }  */
+/* { dg-final {scan-assembler-times "vsub" 0} }  */
+/* { dg-final {scan-assembler-times "vand" 0} }  */
+/* { dg-final {scan-assembler-times "vorr" 0} }  */
+/* { dg-final {scan-assembler-times "veor" 0} }  */
+/* { dg-final {scan-assembler-times "vshr" 0} }  */
diff --git a/gcc/testsuite/gcc.target/arm/neon-for-64bits-2.c b/gcc/testsuite/gcc.target/arm/neon-for-64bits-2.c
new file mode 100644
index 0000000..035bfb7
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/neon-for-64bits-2.c
@@ -0,0 +1,57 @@ 
+/* Check that Neon is used to handle 64-bits scalar operations.  */
+
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_neon_ok } */
+/* { dg-options "-O2 -mneon-for-64bits" } */
+/* { dg-add-options arm_neon } */
+
+typedef long long i64;
+typedef unsigned long long u64;
+typedef unsigned int u32;
+typedef int i32;
+
+/* Unary operators */
+#define UNARY_OP(name, op) \
+  void unary_##name(u64 *a, u64 *b) { *a = op (*b + 0x1234567812345678ULL) ; }
+
+/* Binary operators */
+#define BINARY_OP(name, op) \
+  void binary_##name(u64 *a, u64 *b, u64 *c) { *a = *b op *c ; }
+
+/* Unsigned shift */
+#define SHIFT_U(name, op, amount) \
+  void ushift_##name(u64 *a, u64 *b, int c) { *a = *b op amount; }
+
+/* Signed shift */
+#define SHIFT_S(name, op, amount) \
+  void sshift_##name(i64 *a, i64 *b, int c) { *a = *b op amount; }
+
+UNARY_OP(not, ~)
+
+BINARY_OP(add, +)
+BINARY_OP(sub, -)
+BINARY_OP(and, &)
+BINARY_OP(or, |)
+BINARY_OP(xor, ^)
+
+SHIFT_U(right1, >>, 1)
+SHIFT_U(right2, >>, 2)
+SHIFT_U(right5, >>, 5)
+SHIFT_U(rightn, >>, c)
+
+SHIFT_S(right1, >>, 1)
+SHIFT_S(right2, >>, 2)
+SHIFT_S(right5, >>, 5)
+SHIFT_S(rightn, >>, c)
+
+/* { dg-final {scan-assembler-times "vmvn" 1} }  */
+/* Two vadd: 1 in unary_not, 1 in binary_add */
+/* { dg-final {scan-assembler-times "vadd" 2} }  */
+/* { dg-final {scan-assembler-times "vsub" 1} }  */
+/* { dg-final {scan-assembler-times "vand" 1} }  */
+/* { dg-final {scan-assembler-times "vorr" 1} }  */
+/* { dg-final {scan-assembler-times "veor" 1} }  */
+/* 6 vshr for right shifts by constant, and variable right shift uses
+   vshl with a negative amount in register.  */
+/* { dg-final {scan-assembler-times "vshr" 6} }  */
+/* { dg-final {scan-assembler-times "vshl" 2} }  */