Patchwork [ARM] 64-bit shifts in NEON.

Submitter Andrew Stubbs
Date May 24, 2012, 4:51 p.m.
Message ID <4FBE6701.7010506@codesourcery.com>
Permalink /patch/161164/
State New

Comments

Andrew Stubbs - May 24, 2012, 4:51 p.m.
On 23/02/12 20:36, Andrew Stubbs wrote:
> On 21/02/12 15:23, Andrew Stubbs wrote:
>> On 06/02/12 13:13, Andrew Stubbs wrote:
>>> This patch adds DImode shift support in NEON registers/instructions.
>>>
>>> The patch delays any lowering until the split2 pass, after the
>>> register allocator has chosen whether to do the shift in NEON (VFP)
>>> registers, or in core-registers.
>>>
>>> The core-registers case depends on the patch I previously posted here:
>>> http://gcc.gnu.org/ml/gcc-patches/2012-01/msg01472.html
>>>
>>> The NEON right-shifts make life more interesting by using a left-shift
>>> instruction with a negative offset. This means that the amount has to be
>>> negated. Ideally you'd want to do this at expand time, but the delayed
>>> NEON/core decision makes this impossible, so I've chosen to expand this
>>> in the post-reload split pass. Unfortunately, NEON does not provide a
>>> suitable instruction for negating the shift amount, so that ends up
>>> happening in core-registers.
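
A minimal C model of the behaviour described above, not taken from the patch. It assumes the register form of VSHL reads the bottom byte of the shift register as a signed count, with a negative count shifting right. The names `vshl_u64_model` and `lshr_via_vshl` are hypothetical and exist only for this sketch; the signed variant (`vshl.s64`) would fill with the sign bit instead of zeros.

```c
#include <stdint.h>

/* Read the bottom byte of the shift register as a signed count
   (-128..127).  Everything above bit 7 is ignored.  */
static int low_byte_signed(uint32_t shift_reg)
{
    int shift = (int)(shift_reg & 0xffu);
    return shift >= 128 ? shift - 256 : shift;
}

/* Hypothetical model of "vshl.u64 Dd, Dn, Dm": a positive count shifts
   left, a negative count shifts right, and a magnitude of 64 or more
   yields zero.  */
uint64_t vshl_u64_model(uint64_t value, uint32_t shift_reg)
{
    int shift = low_byte_signed(shift_reg);

    if (shift >= 64 || shift <= -64)
        return 0;
    return shift >= 0 ? value << shift : value >> -shift;
}

/* A logical right shift done the way the post-reload split does it:
   negate the amount in a core register first, then issue the NEON
   left shift with the negated amount.  */
uint64_t lshr_via_vshl(uint64_t value, uint32_t amount)
{
    uint32_t negated = 0u - amount;   /* stands in for the negsi2 step */
    return vshl_u64_model(value, negated);
}
```

With this model, `lshr_via_vshl(0x100, 4)` gives `0x10`, and `vshl_u64_model(1, 0x103)` gives `8` because only the low byte (`0x03`) of the shift register is used.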
>>>
>>> Another complication is that the NEON shift instructions use a 64-bit
>>> register for the shift amount, but they only pay attention to the bottom
>>> 8 bits. I did experiment with using a DImode shift amount, but that
>>> didn't work out well; there were unnecessary extends and the
>>> core-registers fall back was less efficient.
>>>
>>> Therefore, I've chosen to create a new register class, VFP_LO_REGS_EVEN,
>>> which includes only the 32-bit low-part of the DImode NEON registers so
>>> the shift amount can be loaded into VFP regs without extending them.
>>> This required a new print format 'E' that converts the low-part name to
>>> the full register name the instructions need. Unfortunately, this does
>>> artificially limit the shift amount to the bottom half of the register
>>> set, but hopefully that's not going to be a big problem.
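
A small C sketch of the two pieces described above, written for illustration rather than copied from the patch. It assumes `FIRST_VFP_REGNUM` is 63, which is what the `VFP_LO_REGS` mask in arm.h implies; the constant and function names here are hypothetical.

```c
#include <stdint.h>

/* Assumption: s0 is hard register 63, as implied by the VFP_LO_REGS
   mask { 0x00000000, 0x80000000, 0x7FFFFFFF, 0x00000000 }.  */
enum { FIRST_VFP_REGNUM_ASSUMED = 63 };

/* Same arithmetic as the 'E' print code: the index of the
   double-precision register that overlaps single-precision register
   REGNO.  s0 and s1 map to d0, s2 and s3 to d1, and so on.  */
int d_reg_for_s_reg(int regno)
{
    return (regno - FIRST_VFP_REGNUM_ASSUMED) >> 1;
}

/* Build the four-word register set containing only the even-numbered
   low VFP registers s0, s2, ..., s30.  */
void vfp_lo_even_mask(uint32_t words[4])
{
    for (int i = 0; i < 4; i++)
        words[i] = 0;

    for (int s = 0; s < 32; s += 2) {
        int regno = FIRST_VFP_REGNUM_ASSUMED + s;
        words[regno / 32] |= UINT32_C(1) << (regno % 32);
    }
}
```

Under that assumption, `vfp_lo_even_mask` produces `{ 0x00000000, 0x80000000, 0x2AAAAAAA, 0x00000000 }`, which matches the `VFP_LO_REGS_EVEN` entry the patch adds to `REG_CLASS_CONTENTS`.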
>>>
>>> The register allocator is causing me trouble though. The problem is that
>>> the compiler just refuses to use the NEON variant in any of my toy
>>> examples. It turns out to be simply that the IRA & reload passes do not
>>> change hard-registers already present in the RTL (function parameters,
>>> return values, etc.) unless there is absolutely no alternative that
>>> works with that register. I'm not sure if there's anything that can be
>>> done about this, or not. I'm not even sure if it isn't the right choice
>>> much of the time, cost wise.
>>
>> I've now updated the patch to take into account size optimization.
>>
>> Currently, if optimizing for size the compiler prefers to call the
>> libgcc function, rather than do the shift inline.
>>
>> With my old patch, when NEON is enabled it always used the inline code
>> (either in NEON or core-registers) no matter which optimization flags
>> were set. This is more-or-less correct if the register allocator chooses
>> to do the operation in NEON, but much less space efficient otherwise.
>>
>> The update simply disables the core-registers fall-back option when
>> optimizing for size. Transferring the values to NEON registers and back
>> should be roughly the same size as calling a function, so there
>> shouldn't be a big loss.
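
The selection logic can be sketched in C as follows. This is a model of the `opt` and `opt_enabled` attributes the patch adds to arm.md, not GCC code. It assumes that "optimizing for speed" is simply the negation of "optimizing for size", and the enum and function names are hypothetical.

```c
#include <stdbool.h>

/* Values of the per-alternative "opt" attribute.  */
enum opt_attr { OPT_ANY, OPT_SPEED, OPT_SIZE };

/* Model of "opt_enabled": "any" is always on, "speed" only when not
   optimizing for size, "size" only when optimizing for size.  */
bool opt_enabled(enum opt_attr opt, bool optimizing_for_size)
{
    switch (opt) {
    case OPT_ANY:
        return true;
    case OPT_SPEED:
        return !optimizing_for_size;
    case OPT_SIZE:
        return optimizing_for_size;
    }
    return false;
}

/* Model of the reworked "enabled" attribute: an alternative is usable
   only if none of the three component attributes says "no".  */
bool alternative_enabled(bool insn_enabled, bool arch_enabled,
                         enum opt_attr opt, bool optimizing_for_size)
{
    return insn_enabled && arch_enabled
           && opt_enabled(opt, optimizing_for_size);
}
```

In the patch, the core-register alternatives are tagged `speed`, so under this model they drop out when optimizing for size, leaving only the NEON alternatives.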
>>
>> I'm in two minds about the shift-by-constant cases though, since they
>> expand to fewer instructions. Any thoughts?
>
> And yet another update.
>
> This time I noticed that I didn't discard the "clobber"s after the split
> has determined they're not necessary any more. Presumably the
> unallocated "match_scratch"es were harmless, but the unnecessary CC
> clobbers could affect if-conversion and scheduling.
>
> This patch is the same as the previous, except that I've broken out the
> alternatives that don't need any clobbers.
>
> Ok for 4.8?

Ping!

The pre-requisite patch is now committed so this one is ready for review.

Here's a rebased version of the same patch. The only real difference is 
that one of my constraint names is no longer available, so now they're all
renamed.

OK?

Andrew

Patch

2012-05-18  Andrew Stubbs  <ams@codesourcery.com>

	gcc/
	* config/arm/arm.c (arm_print_operand): Add new 'E' format code.
	* config/arm/arm.h (enum reg_class): Add VFP_LO_REGS_EVEN.
	(REG_CLASS_NAMES, REG_CLASS_CONTENTS, IS_VFP_CLASS): Likewise.
	* config/arm/arm.md (opt, opt_enabled): New attributes.
	(enabled): Use opt_enabled.
	(ashldi3, ashrdi3, lshrdi3): Add TARGET_NEON case.
	* config/arm/constraints.md (T): New register constraint.
	(Pf, PF, P1, Pg, Ph): New constraints.
	* config/arm/neon.md (signed_shift_di3_neon, unsigned_shift_di3_neon,
	ashldi3_neon_noclobber, ashldi3_neon, ashrdi3_neon_imm_noclobber,
	ashrdi3_neon_imm, ashrdi3_neon_reg, ashrdi3_neon,
	lshrdi3_neon_imm_noclobber, lshrdi3_neon_imm, lshrdi3_neon_reg,
	lshrdi3_neon): New patterns.
	* config/arm/predicates.md (int_1_to_64): New predicate.

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 3ad4c75..f581d15 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -17973,6 +17973,24 @@  arm_print_operand (FILE *stream, rtx x, int code)
       }
       return;
 
+    /* Print the VFP/Neon double precision register name that overlaps the
+       given single-precision register.  */
+    case 'E':
+      {
+	int mode = GET_MODE (x);
+
+	if (GET_MODE_SIZE (mode) != 4
+	    || GET_CODE (x) != REG
+	    || !IS_VFP_REGNUM (REGNO (x)))
+	  {
+	    output_operand_lossage ("invalid operand for code '%c'", code);
+	    return;
+	  }
+
+	fprintf (stream, "d%d", (REGNO (x) - FIRST_VFP_REGNUM) >> 1);
+      }
+      return;
+
     /* These two codes print the low/high doubleword register of a Neon quad
        register, respectively.  For pair-structure types, can also print
        low/high quadword registers.  */
diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h
index f4204e4..ebd77c6 100644
--- a/gcc/config/arm/arm.h
+++ b/gcc/config/arm/arm.h
@@ -1043,6 +1043,7 @@  enum reg_class
   CIRRUS_REGS,
   VFP_D0_D7_REGS,
   VFP_LO_REGS,
+  VFP_LO_REGS_EVEN,
   VFP_HI_REGS,
   VFP_REGS,
   IWMMXT_GR_REGS,
@@ -1069,6 +1070,7 @@  enum reg_class
   "CIRRUS_REGS",	\
   "VFP_D0_D7_REGS",	\
   "VFP_LO_REGS",	\
+  "VFP_LO_REGS_EVEN",	\
   "VFP_HI_REGS",	\
   "VFP_REGS",		\
   "IWMMXT_GR_REGS",	\
@@ -1094,6 +1096,7 @@  enum reg_class
   { 0xF8000000, 0x000007FF, 0x00000000, 0x00000000 }, /* CIRRUS_REGS */	\
   { 0x00000000, 0x80000000, 0x00007FFF, 0x00000000 }, /* VFP_D0_D7_REGS  */ \
   { 0x00000000, 0x80000000, 0x7FFFFFFF, 0x00000000 }, /* VFP_LO_REGS  */ \
+  { 0x00000000, 0x80000000, 0x2AAAAAAA, 0x00000000 }, /* VFP_LO_REGS_EVEN  */ \
   { 0x00000000, 0x00000000, 0x80000000, 0x7FFFFFFF }, /* VFP_HI_REGS  */ \
   { 0x00000000, 0x80000000, 0xFFFFFFFF, 0x7FFFFFFF }, /* VFP_REGS  */	\
   { 0x00000000, 0x00007800, 0x00000000, 0x00000000 }, /* IWMMXT_GR_REGS */ \
@@ -1111,7 +1114,7 @@  enum reg_class
 
 /* Any of the VFP register classes.  */
 #define IS_VFP_CLASS(X) \
-  ((X) == VFP_D0_D7_REGS || (X) == VFP_LO_REGS \
+  ((X) == VFP_D0_D7_REGS || (X) == VFP_LO_REGS || (X) == VFP_LO_REGS_EVEN \
    || (X) == VFP_HI_REGS || (X) == VFP_REGS)
 
 /* The same information, inverted:
diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index bc97a4a..d68dc0c 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -251,6 +251,22 @@ 
 	 (const_string "yes")]
 	(const_string "no")))
 
+(define_attr "opt" "any,speed,size"
+  (const_string "any"))
+
+(define_attr "opt_enabled" "no,yes"
+  (cond [(eq_attr "opt" "any")
+         (const_string "yes")
+
+	 (and (eq_attr "opt" "speed")
+	      (match_test "optimize_function_for_speed_p (cfun)"))
+	 (const_string "yes")
+
+	 (and (eq_attr "opt" "size")
+	      (match_test "optimize_function_for_size_p (cfun)"))
+	 (const_string "yes")]
+	(const_string "no")))
+
 ; Allows an insn to disable certain alternatives for reasons other than
 ; arch support.
 (define_attr "insn_enabled" "no,yes"
@@ -258,11 +274,15 @@ 
 
 ; Enable all alternatives that are both arch_enabled and insn_enabled.
  (define_attr "enabled" "no,yes"
-   (if_then_else (eq_attr "insn_enabled" "yes")
-               (if_then_else (eq_attr "arch_enabled" "yes")
-                             (const_string "yes")
-                             (const_string "no"))
-                (const_string "no")))
+   (cond [(eq_attr "insn_enabled" "no")
+	  (const_string "no")
+
+	  (eq_attr "arch_enabled" "no")
+	  (const_string "no")
+
+	  (eq_attr "opt_enabled" "no")
+	  (const_string "no")]
+	 (const_string "yes")))
 
 ; POOL_RANGE is how far away from a constant pool entry that this insn
 ; can be placed.  If the distance is zero, then this insn will never
@@ -3520,8 +3540,15 @@ 
                    (match_operand:SI 2 "reg_or_int_operand" "")))]
   "TARGET_32BIT"
   "
-  if (!CONST_INT_P (operands[2])
-      && (TARGET_REALLY_IWMMXT || (TARGET_HARD_FLOAT && TARGET_MAVERICK)))
+  if (TARGET_NEON)
+    {
+      /* Delay the decision whether to use NEON or core-regs until
+	 register allocation.  */
+      emit_insn (gen_ashldi3_neon (operands[0], operands[1], operands[2]));
+      DONE;
+    }
+  else if (!CONST_INT_P (operands[2])
+	   && (TARGET_REALLY_IWMMXT || (TARGET_HARD_FLOAT && TARGET_MAVERICK)))
     ; /* No special preparation statements; expand pattern as above.  */
   else
     {
@@ -3595,8 +3622,15 @@ 
                      (match_operand:SI 2 "reg_or_int_operand" "")))]
   "TARGET_32BIT"
   "
-  if (!CONST_INT_P (operands[2])
-      && (TARGET_REALLY_IWMMXT || (TARGET_HARD_FLOAT && TARGET_MAVERICK)))
+  if (TARGET_NEON)
+    {
+      /* Delay the decision whether to use NEON or core-regs until
+	 register allocation.  */
+      emit_insn (gen_ashrdi3_neon (operands[0], operands[1], operands[2]));
+      DONE;
+    }
+  else if (!CONST_INT_P (operands[2])
+	   && (TARGET_REALLY_IWMMXT || (TARGET_HARD_FLOAT && TARGET_MAVERICK)))
     ; /* No special preparation statements; expand pattern as above.  */
   else
     {
@@ -3668,8 +3702,15 @@ 
                      (match_operand:SI 2 "reg_or_int_operand" "")))]
   "TARGET_32BIT"
   "
-  if (!CONST_INT_P (operands[2])
-      && (TARGET_REALLY_IWMMXT || (TARGET_HARD_FLOAT && TARGET_MAVERICK)))
+  if (TARGET_NEON)
+    {
+      /* Delay the decision whether to use NEON or core-regs until
+	 register allocation.  */
+      emit_insn (gen_lshrdi3_neon (operands[0], operands[1], operands[2]));
+      DONE;
+    }
+  else if (!CONST_INT_P (operands[2])
+	   && (TARGET_REALLY_IWMMXT || (TARGET_HARD_FLOAT && TARGET_MAVERICK)))
     ; /* No special preparation statements; expand pattern as above.  */
   else
     {
diff --git a/gcc/config/arm/constraints.md b/gcc/config/arm/constraints.md
index 0b80e1f..95507fc 100644
--- a/gcc/config/arm/constraints.md
+++ b/gcc/config/arm/constraints.md
@@ -19,7 +19,7 @@ 
 ;; <http://www.gnu.org/licenses/>.
 
 ;; The following register constraints have been used:
-;; - in ARM/Thumb-2 state: f, t, v, w, x, y, z
+;; - in ARM/Thumb-2 state: f, t, T, v, w, x, y, z
 ;; - in Thumb state: h, b
 ;; - in both states: l, c, k
 ;; In ARM state, 'l' is an alias for 'r'
@@ -29,7 +29,8 @@ 
 ;; in Thumb-1 state: I, J, K, L, M, N, O
 
 ;; The following multi-letter normal constraints have been used:
-;; in ARM/Thumb-2 state: Da, Db, Dc, Dn, Dl, DL, Dv, Dy, Di, Dt, Dz
+;; in ARM/Thumb-2 state: Da, Db, Dc, Dn, Dl, DL, Dv, Dy, Di, Dt, Dz, Pf, PF,
+;;			 Pg, Ph, P1
 ;; in Thumb-1 state: Pa, Pb, Pc, Pd, Pe
 ;; in Thumb-2 state: Pj, PJ, Ps, Pt, Pu, Pv, Pw, Px, Py
 
@@ -45,6 +46,9 @@ 
 (define_register_constraint "t" "TARGET_32BIT ? VFP_LO_REGS : NO_REGS"
  "The VFP registers @code{s0}-@code{s31}.")
 
+(define_register_constraint "T" "TARGET_32BIT ? VFP_LO_REGS_EVEN : NO_REGS"
+ "The even numbered VFP registers @code{s0}-@code{s31}.")
+
 (define_register_constraint "v" "TARGET_ARM ? CIRRUS_REGS : NO_REGS"
  "The Cirrus Maverick co-processor registers.")
 
@@ -177,6 +181,32 @@ 
   (and (match_code "const_int")
        (match_test "TARGET_THUMB1 && ival >= 256 && ival <= 510")))
 
+(define_constraint "Pf"
+  "@internal In ARM/Thumb-2 state, a constant in the range 0 to 63"
+  (and (match_code "const_int")
+       (match_test "TARGET_32BIT && ival >= 0 && ival < 64")))
+
+(define_constraint "PF"
+  "@internal In ARM/Thumb-2 state, a constant in the range 1 to 64"
+  (and (match_code "const_int")
+       (match_test "TARGET_32BIT && ival > 0 && ival <= 64")))
+
+(define_constraint "P1"
+  "@internal In ARM/Thumb-2 state, a constant of 1"
+  (and (match_code "const_int")
+       (match_test "TARGET_32BIT && ival == 1")))
+
+(define_constraint "Pg"
+  "@internal In ARM state, a constant in the range 0 to 63, and in thumb-2 state, 32 to 63"
+  (and (match_code "const_int")
+       (match_test "(TARGET_ARM && ival >= 0 && ival < 64)
+		    || (TARGET_THUMB2 && ival >= 32 && ival < 64)")))
+
+(define_constraint "Ph"
+  "@internal In Thumb-2 state, a constant in the range 0 to 31"
+  (and (match_code "const_int")
+       (match_test "TARGET_THUMB2 && ival >= 0 && ival <= 31")))
+
 (define_constraint "Ps"
   "@internal In Thumb-2 state a constant in the range -255 to +255"
   (and (match_code "const_int")
diff --git a/gcc/config/arm/neon.md b/gcc/config/arm/neon.md
index 4568dea..a671c1b 100644
--- a/gcc/config/arm/neon.md
+++ b/gcc/config/arm/neon.md
@@ -1133,6 +1133,266 @@ 
   DONE;
 })
 
+;; 64-bit shifts
+
+; The shift amount needs to be negated for right-shifts
+(define_insn "signed_shift_di3_neon"
+  [(set (match_operand:DI 0 "s_register_operand"	     "=w")
+	(unspec:DI [(match_operand:DI 1 "s_register_operand" " w")
+		    (match_operand:SI 2 "s_register_operand" " T")]
+		   UNSPEC_ASHIFT_SIGNED))]
+  "TARGET_NEON && reload_completed"
+  "vshl.s64\t%P0, %P1, %E2    @ ashr %P0, %P1, %2"
+  [(set_attr "neon_type" "neon_vshl_ddd")]
+)
+
+; The shift amount needs to be negated for right-shifts
+(define_insn "unsigned_shift_di3_neon"
+  [(set (match_operand:DI 0 "s_register_operand"	     "=w")
+	(unspec:DI [(match_operand:DI 1 "s_register_operand" " w")
+		    (match_operand:SI 2 "s_register_operand" " T")]
+		   UNSPEC_ASHIFT_UNSIGNED))]
+  "TARGET_NEON && reload_completed"
+  "vshl.u64\t%P0, %P1, %E2    @ lshr %P0, %P1, %2"
+  [(set_attr "neon_type" "neon_vshl_ddd")]
+)
+
+(define_insn "ashldi3_neon_noclobber"
+  [(set (match_operand:DI 0 "s_register_operand"	    "=w,w")
+	(ashift:DI (match_operand:DI 1 "s_register_operand" " w,w")
+		   (match_operand:SI 2 "reg_or_int_operand" "Pf,T")))]
+  "TARGET_NEON && reload_completed"
+  "@
+   vshl.u64\t%P0, %P1, %2
+   vshl.u64\t%P0, %P1, %E2    @ ashl %P0, %P1, %2"
+  [(set_attr "neon_type" "neon_vshl_ddd,neon_vshl_ddd")]
+)
+
+(define_insn_and_split "ashldi3_neon"
+  [(set (match_operand:DI 0 "s_register_operand"	    "=w,w,?&r,?&r,?r,?r,?r,?w,?w")
+	(ashift:DI (match_operand:DI 1 "s_register_operand" " w,w,  0,  r, r, r, r, w, w")
+		   (match_operand:SI 2 "reg_or_int_operand" " T,i,  r,  r,P1,Pf,Pg, T, i")))
+   (clobber (match_scratch:SI 3				    "=X,X, &r, &r, X, X,&r, X, X"))
+   (clobber (match_scratch:SI 4				    "=X,X, &r, &r, X, X, X, X, X"))
+   (clobber (reg:CC_C CC_REGNUM))]
+  "TARGET_NEON"
+  "#"
+  "TARGET_NEON && reload_completed"
+  [(const_int 0)]
+  "
+  {
+    if (IS_VFP_REGNUM (REGNO (operands[0])))
+      {
+        if (CONST_INT_P (operands[2]))
+	  {
+	    if (INTVAL (operands[2]) < 1)
+	      {
+	        emit_insn (gen_movdi (operands[0], operands[1]));
+		DONE;
+	      }
+	    else if (INTVAL (operands[2]) > 63)
+	      operands[2] = gen_rtx_CONST_INT (VOIDmode, 63);
+	  }
+
+	/* Ditch the unnecessary clobbers.  */
+	emit_insn (gen_ashldi3_neon_noclobber (operands[0], operands[1],
+					       operands[2]));
+      }
+    else if (CONST_INT_P (operands[2]) && INTVAL (operands[2]) == 1)
+      /* This clobbers CC.  */
+      emit_insn (gen_arm_ashldi3_1bit (operands[0], operands[1]));
+    else
+      arm_emit_coreregs_64bit_shift (ASHIFT, operands[0], operands[1],
+				     operands[2], operands[3], operands[4]);
+    DONE;
+  }"
+  [(set_attr "length" "*,*,24,24,8,12,12,*,*")
+   (set_attr "arch" "nota8,nota8,*,*,*,*,*,onlya8,onlya8")
+   (set_attr "opt" "*,*,speed,speed,speed,speed,speed,*,*")]
+)
+
+(define_insn "ashrdi3_neon_imm_noclobber"
+  [(set (match_operand:DI 0 "s_register_operand"	      "=w")
+	(ashiftrt:DI (match_operand:DI 1 "s_register_operand" " w")
+		     (match_operand:SI 2 "int_1_to_64"	      "PF")))]
+  "TARGET_NEON && reload_completed"
+  "vshr.s64\t%P0, %P1, %2"
+  [(set_attr "neon_type" "neon_vshl_ddd")]
+)
+
+(define_insn_and_split "ashrdi3_neon_imm"
+  [(set (match_operand:DI 0 "s_register_operand"	      "=w,?r,?r,?r,?w")
+	(ashiftrt:DI (match_operand:DI 1 "s_register_operand" " w, r, r, r, w")
+		     (match_operand:SI 2 "int_1_to_64"	      "PF,P1,Pg,Ph,PF")))
+   (clobber (match_scratch:SI 3				      "=X, X, X,&r, X"))
+   (clobber (reg:CC_C CC_REGNUM))]
+  "TARGET_NEON"
+  "#"
+  "TARGET_NEON && reload_completed"
+  [(const_int 0)]
+  "
+  {
+    if (IS_VFP_REGNUM (REGNO (operands[0])))
+      /* Ditch the unnecessary clobbers.  */
+      emit_insn (gen_ashrdi3_neon_imm_noclobber (operands[0], operands[1],
+						 operands[2]));
+    else if (INTVAL (operands[2]) == 1)
+      /* This clobbers CC.  */
+      emit_insn (gen_arm_ashrdi3_1bit (operands[0], operands[1]));
+    else
+      arm_emit_coreregs_64bit_shift (ASHIFTRT, operands[0], operands[1],
+				     operands[2], operands[3], NULL);
+    DONE;
+  }"
+  [(set_attr "length" "*,8,12,12,*")
+   (set_attr "arch" "nota8,*,*,*,onlya8")
+   (set_attr "opt" "*,speed,speed,speed,*")]
+)
+
+(define_insn_and_split "ashrdi3_neon_reg"
+  [(set (match_operand:DI 0 "s_register_operand"	      "= w, w,?&r,?&r,?w,?w")
+	(ashiftrt:DI (match_operand:DI 1 "s_register_operand" "  w, w,  0,  r, w, w")
+		     (match_operand:SI 2 "s_register_operand" "  r, r,  r,  r, r, r")))
+   (clobber (match_scratch:SI 3				      "= 2,&r, &r, &r, 2,&r"))
+   (clobber (match_scratch:SI 4				      "=&T,&T, &r, &r,&T,&T"))
+   (clobber (reg:CC CC_REGNUM))]
+  "TARGET_NEON"
+  "#"
+  "TARGET_NEON && reload_completed"
+  [(const_int 0)]
+  "
+  {
+    if (IS_VFP_REGNUM (REGNO (operands[0])))
+      {
+	emit_insn (gen_negsi2 (operands[3], operands[2]));
+	emit_insn (gen_rtx_SET (SImode, operands[4], operands[3]));
+        emit_insn (gen_signed_shift_di3_neon (operands[0], operands[1],
+					      operands[4]));
+      }
+    else
+      /* This clobbers CC (ASHIFTRT only).  */
+      arm_emit_coreregs_64bit_shift (ASHIFTRT, operands[0], operands[1],
+				     operands[2], operands[3], operands[4]);
+    DONE;
+  }"
+  [(set_attr "length" "12,12,24,24,12,12")
+   (set_attr "arch" "nota8,nota8,*,*,onlya8,onlya8")
+   (set_attr "opt" "*,*,speed,speed,*,*")]
+)
+
+(define_expand "ashrdi3_neon"
+  [(match_operand:DI 0 "s_register_operand" "")
+   (match_operand:DI 1 "s_register_operand" "")
+   (match_operand:SI 2 "reg_or_int_operand" "")]
+  "TARGET_NEON"
+{
+  if (CONST_INT_P (operands[2]))
+    {
+      if (INTVAL (operands[2]) < 1)
+        {
+	  emit_insn (gen_movdi (operands[0], operands[1]));
+	  DONE;
+	}
+      else if (INTVAL (operands[2]) > 64)
+	operands[2] = gen_rtx_CONST_INT (VOIDmode, 64);
+
+      emit_insn (gen_ashrdi3_neon_imm (operands[0], operands[1], operands[2]));
+    }
+  else
+    emit_insn (gen_ashrdi3_neon_reg (operands[0], operands[1], operands[2]));
+  DONE;
+})
+
+(define_insn "lshrdi3_neon_imm_noclobber"
+  [(set (match_operand:DI 0 "s_register_operand"	      "=w")
+	(lshiftrt:DI (match_operand:DI 1 "s_register_operand" " w")
+		     (match_operand:SI 2 "int_1_to_64"	      "PF")))]
+  "TARGET_NEON && reload_completed"
+  "vshr.u64\t%P0, %P1, %2"
+  [(set_attr "neon_type" "neon_vshl_ddd")]
+)
+
+(define_insn_and_split "lshrdi3_neon_imm"
+  [(set (match_operand:DI 0 "s_register_operand"	      "=w,?r,?r,?r,?w")
+	(lshiftrt:DI (match_operand:DI 1 "s_register_operand" " w, r, r, r, w")
+		     (match_operand:SI 2 "int_1_to_64"	      "PF,P1,Pg,Ph,PF")))
+   (clobber (match_scratch:SI 3				      "=X, X, X,&r, X"))
+   (clobber (reg:CC_C CC_REGNUM))]
+  "TARGET_NEON"
+  "#"
+  "TARGET_NEON && reload_completed"
+  [(const_int 0)]
+  "
+  {
+    if (IS_VFP_REGNUM (REGNO (operands[0])))
+      /* Ditch the unnecessary clobbers.  */
+      emit_insn (gen_lshrdi3_neon_imm_noclobber (operands[0], operands[1],
+						 operands[2]));
+    else if (INTVAL (operands[2]) == 1)
+      /* This clobbers CC.  */
+      emit_insn (gen_arm_lshrdi3_1bit (operands[0], operands[1]));
+    else
+      arm_emit_coreregs_64bit_shift (LSHIFTRT, operands[0], operands[1],
+				     operands[2], operands[3], NULL);
+    DONE;
+  }"
+  [(set_attr "length" "4,8,12,12,4")
+   (set_attr "arch" "nota8,*,*,*,onlya8")
+   (set_attr "opt" "*,speed,speed,speed,*")]
+)
+
+(define_insn_and_split "lshrdi3_neon_reg"
+  [(set (match_operand:DI 0 "s_register_operand"	      "= w, w,?&r,?&r,?w,?w")
+	(lshiftrt:DI (match_operand:DI 1 "s_register_operand" "  w, w,  0,  r, w, w")
+		     (match_operand:SI 2 "s_register_operand" "  r, r,  r,  r, r, r")))
+   (clobber (match_scratch:SI 3				      "= 2,&r, &r, &r, 2,&r"))
+   (clobber (match_scratch:SI 4				      "=&T,&T, &r, &r,&T,&T"))]
+  "TARGET_NEON"
+  "#"
+  "TARGET_NEON && reload_completed"
+  [(const_int 0)]
+  "
+  {
+    if (IS_VFP_REGNUM (REGNO (operands[0])))
+      {
+        emit_insn (gen_negsi2 (operands[3], operands[2]));
+	emit_insn (gen_rtx_SET (SImode, operands[4], operands[3]));
+	emit_insn (gen_unsigned_shift_di3_neon (operands[0], operands[1],
+						operands[4]));
+      }
+    else
+      arm_emit_coreregs_64bit_shift (LSHIFTRT, operands[0], operands[1],
+				     operands[2], operands[3], operands[4]);
+    DONE;
+  }"
+  [(set_attr "length" "12,12,24,24,12,12")
+   (set_attr "arch" "nota8,nota8,*,*,onlya8,onlya8")
+   (set_attr "opt" "*,*,speed,speed,*,*")]
+)
+
+(define_expand "lshrdi3_neon"
+  [(match_operand:DI 0 "s_register_operand" "")
+   (match_operand:DI 1 "s_register_operand" "")
+   (match_operand:SI 2 "reg_or_int_operand" "")]
+  "TARGET_NEON"
+{
+  if (CONST_INT_P (operands[2]))
+    {
+      if (INTVAL (operands[2]) < 1)
+        {
+	  emit_insn (gen_movdi (operands[0], operands[1]));
+	  DONE;
+	}
+      else if (INTVAL (operands[2]) > 64)
+	operands[2] = gen_rtx_CONST_INT (VOIDmode, 64);
+
+      emit_insn (gen_lshrdi3_neon_imm (operands[0], operands[1], operands[2]));
+    }
+  else
+    emit_insn (gen_lshrdi3_neon_reg (operands[0], operands[1], operands[2]));
+  DONE;
+})
+
 ;; Widening operations
 
 (define_insn "widen_ssum<mode>3"
diff --git a/gcc/config/arm/predicates.md b/gcc/config/arm/predicates.md
index fa2027c..62af222 100644
--- a/gcc/config/arm/predicates.md
+++ b/gcc/config/arm/predicates.md
@@ -644,6 +644,11 @@ 
 (define_special_predicate "add_operator"
   (match_code "plus"))
 
+
 (define_predicate "mem_noofs_operand"
   (and (match_code "mem")
        (match_code "reg" "0")))
+
+(define_predicate "int_1_to_64"
+  (and (match_code "const_int")
+       (match_test "IN_RANGE (INTVAL (op), 1, 64)")))