Patchwork [v3,i386] BMI2 support for GCC, mulx, rorx, <shift>x part

Submitter Uros Bizjak
Date Aug. 22, 2011, 8:45 a.m.
Message ID <CAFULd4bh_Yto+d52grDR38kEQmjeXpG2+OWTsQr=-4QSn7jzNA@mail.gmail.com>
Permalink /patch/110872/
State New

Comments

Uros Bizjak - Aug. 22, 2011, 8:45 a.m.
On Sun, Aug 21, 2011 at 1:39 PM, Uros Bizjak <ubizjak@gmail.com> wrote:

> This is the third version of BMI2 support that includes generation of
> mulx, rorx, <shift>x part. This patch includes all comments on
> previous version, splits all insn post-reload, uses "enable" attribute
> and avoids new register modifiers. As a compromise (see previous
> posts), the mulx insn is now split post-reload into pattern that
> separates outputs (so, post-reload passes can do their job more
> effectively), with the hope that someday other DWI patterns will be
> rewritten in the same way.

A small update that removes the need for "w" mode attribute. We can
convert count register to the correct mode in a splitter.
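
As a rough illustration of why re-moding the count register in the splitter is harmless: the BMI2 <shift>x instructions take the count from a full-width register and use only its low 5 bits (32-bit operands) or low 6 bits (64-bit operands). The C sketch below models that behaviour for a left shift. The function names are hypothetical and this is not code from the patch.

```c
#include <stdint.h>

/* Hypothetical model of a 32-bit <shift>x left shift: the count comes
   from a full register, but only its low 5 bits are used.  */
static uint32_t model_shiftx_left32(uint32_t src, uint32_t count_reg)
{
    return src << (count_reg & 31);
}

/* Hypothetical model of the 64-bit form: only the low 6 bits of the
   count register are used.  */
static uint64_t model_shiftx_left64(uint64_t src, uint64_t count_reg)
{
    return src << (count_reg & 63);
}
```

Because the upper bits of the count register are ignored, viewing a QImode count as an SImode or DImode register does not change the result.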

Re-tested on x86_64-pc-linux-gnu {,-m32}.

Uros.
Kirill Yukhin - Aug. 23, 2011, 11:07 a.m.
Hi,
I've slightly updated the mulx split to avoid an ICE.
Updated patch, ChangeLog entry (with Uros's contribution) and
ChangeLog.testsuite entry are attached.

Bootstrapped and make-checked.

All tests pass under the simulator (except one, but that is a simulator issue).

Uros, you asked if BMI2 is inherited from BMI. The answer is no, these
2 extensions are not connected.

Is it OK?

--
Thanks, K

Uros Bizjak - Aug. 23, 2011, 11:25 a.m.
On Tue, Aug 23, 2011 at 1:07 PM, Kirill Yukhin <kirill.yukhin@gmail.com> wrote:
> Hi,
> I've slightly updated the mulx split to avoid an ICE.
> Updated patch, ChangeLog entry (with Uros's contribution) and
> ChangeLog.testsuite entry are attached.
>
> Bootstrapped and make-checked.
>
> All tests pass under the simulator (except one, but that is a simulator issue).
>
> Uros, you asked if BMI2 is inherited from BMI. The answer is no, these
> 2 extensions are not connected.
>
> Is it OK?

+{
+  operands[3] = gen_lowpart (<MODE>mode, operands[0]);
+  operands[4] = gen_highpart (<MODE>mode, operands[0]);
+  operands[5] = GEN_INT (GET_MODE_BITSIZE (<MODE>mode));
+})

Please change this part to:

{
  split_double_mode (<DWI>mode, &operands[0], 1, &operands[3], &operands[4]);

  operands[5] = GEN_INT (GET_MODE_BITSIZE (<MODE>mode));
})
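
To make the roles of operands[3], operands[4] and operands[5] in this splitter easier to follow, here is a small C model of the 32-bit case. It is only a sketch: the struct and function names are hypothetical, and it mirrors the RTL of the split pattern (a narrow-mode multiply for the low part, and a truncated right shift of the widened product for the high part), not GCC's internal API.

```c
#include <stdint.h>

/* Hypothetical container for the two outputs of the split pattern.  */
struct mul_halves {
    uint32_t lo;   /* models operands[3], the low part of operands[0]  */
    uint32_t hi;   /* models operands[4], the high part of operands[0] */
};

/* Model of the split umulsidi pattern: both halves of a 32x32->64
   unsigned multiply, produced as two separate outputs.  */
static struct mul_halves model_umul_split(uint32_t a, uint32_t b)
{
    const unsigned mode_bits = 32;   /* models operands[5] for SImode */
    uint64_t wide = (uint64_t) a * (uint64_t) b;
    struct mul_halves r;

    r.lo = a * b;                            /* (mult:SI a b)             */
    r.hi = (uint32_t) (wide >> mode_bits);   /* truncate of lshiftrt:DI   */
    return r;
}
```

Keeping the two halves as separate outputs is what lets later passes treat the low and high results independently, as described earlier in the thread.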

Please also add -mbmi2 to gcc.target/i386/sse-{12,13,14,22,23}.c files.

Please also change some entries in the ChangeLog to:

	* config/i386/i386-c.c (ix86_target_macros_internal):
	Conditionally define __BMI2__.
	* config/i386/i386.c (ix86_option_override_internal): Define PTA_BMI2.
	Handle BMI2 option.
	(ix86_valid_target_attribute_inner_p): Handle BMI2 option.
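
Since BMI2 is not implied by BMI (see above), user code that wants to detect either extension at compile time would test the two predefined macros independently. A minimal sketch, assuming the compiler defines __BMI__ for -mbmi and __BMI2__ for -mbmi2; the helper names are hypothetical:

```c
#include <stdbool.h>

/* True when this translation unit was compiled with BMI enabled.  */
static bool built_with_bmi(void)
{
#ifdef __BMI__
    return true;
#else
    return false;
#endif
}

/* True when this translation unit was compiled with BMI2 enabled.
   Checked separately: neither macro implies the other.  */
static bool built_with_bmi2(void)
{
#ifdef __BMI2__
    return true;
#else
    return false;
#endif
}
```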

OK with these changes.

Thanks,
Uros.
Kirill Yukhin - Aug. 23, 2011, 4:22 p.m.
Hi,
Thanks. I've applied your suggestions.

Updated patch, ChangeLog, testsuite/ChangeLog are attached.

Are they ok now?

--
Thanks, K

Uros Bizjak - Aug. 23, 2011, 4:53 p.m.
On Tue, Aug 23, 2011 at 6:22 PM, Kirill Yukhin <kirill.yukhin@gmail.com> wrote:

> Thanks. I've applied your suggestions.
>
> Updated patch, ChangeLog, testsuite/ChangeLog are attached.
>
> Are they ok now?

OK for mainline.

Thanks,
Uros.
Kirill Yukhin - Aug. 23, 2011, 4:55 p.m.
Great! Thanks.

Could anybody please commit that?

K

H.J. Lu - Aug. 23, 2011, 5:02 p.m.
On Tue, Aug 23, 2011 at 9:55 AM, Kirill Yukhin <kirill.yukhin@gmail.com> wrote:
> Great! Thanks.
>
> Could anybody please commit that?

Done.

Thanks.


Patch

Index: i386.md
===================================================================
--- i386.md	(revision 177949)
+++ i386.md	(working copy)
@@ -377,7 +377,7 @@ 
 (define_attr "type"
   "other,multi,
    alu,alu1,negnot,imov,imovx,lea,
-   incdec,ishift,ishift1,rotate,rotate1,imul,idiv,
+   incdec,ishift,ishiftx,ishift1,rotate,rotatex,rotate1,imul,imulx,idiv,
    icmp,test,ibr,setcc,icmov,
    push,pop,call,callv,leave,
    str,bitmanip,
@@ -410,12 +410,12 @@ 
 ;; The (bounding maximum) length of an instruction immediate.
 (define_attr "length_immediate" ""
   (cond [(eq_attr "type" "incdec,setcc,icmov,str,lea,other,multi,idiv,leave,
-                          bitmanip")
+                          bitmanip,imulx")
 	   (const_int 0)
 	 (eq_attr "unit" "i387,sse,mmx")
 	   (const_int 0)
-	 (eq_attr "type" "alu,alu1,negnot,imovx,ishift,rotate,ishift1,rotate1,
-			  imul,icmp,push,pop")
+	 (eq_attr "type" "alu,alu1,negnot,imovx,ishift,ishiftx,ishift1,
+			  rotate,rotatex,rotate1,imul,icmp,push,pop")
 	   (symbol_ref "ix86_attr_length_immediate_default (insn, true)")
 	 (eq_attr "type" "imov,test")
 	   (symbol_ref "ix86_attr_length_immediate_default (insn, false)")
@@ -675,7 +675,7 @@ 
 	      (and (match_operand 0 "memory_displacement_operand" "")
 		   (match_operand 1 "immediate_operand" "")))
 	   (const_string "true")
-	 (and (eq_attr "type" "alu,ishift,rotate,imul,idiv")
+	 (and (eq_attr "type" "alu,ishift,ishiftx,rotate,rotatex,imul,idiv")
 	      (and (match_operand 0 "memory_displacement_operand" "")
 		   (match_operand 2 "immediate_operand" "")))
 	   (const_string "true")
@@ -699,12 +699,13 @@ 
 (define_attr "movu" "0,1" (const_string "0"))
 
 ;; Used to control the "enabled" attribute on a per-instruction basis.
-(define_attr "isa" "base,noavx,avx"
+(define_attr "isa" "base,noavx,avx,bmi2"
   (const_string "base"))
 
 (define_attr "enabled" ""
   (cond [(eq_attr "isa" "noavx") (symbol_ref "!TARGET_AVX")
 	 (eq_attr "isa" "avx") (symbol_ref "TARGET_AVX")
+	 (eq_attr "isa" "bmi2") (symbol_ref "TARGET_BMI2")
 	]
 	(const_int 1)))
 
@@ -6844,16 +6845,102 @@ 
 	      (clobber (reg:CC FLAGS_REG))])]
   "TARGET_QIMODE_MATH")
 
-(define_insn "*<u>mul<mode><dwi>3_1"
+(define_insn "*bmi2_umulditi3_1"
+  [(set (match_operand:DI 0 "register_operand" "=r")
+	(mult:DI
+	  (match_operand:DI 2 "nonimmediate_operand" "%d")
+	  (match_operand:DI 3 "nonimmediate_operand" "rm")))
+   (set (match_operand:DI 1 "register_operand" "=r")
+	(truncate:DI
+	  (lshiftrt:TI
+	    (mult:TI (zero_extend:TI (match_dup 2))
+		     (zero_extend:TI (match_dup 3)))
+	    (const_int 64))))]
+  "TARGET_64BIT && TARGET_BMI2
+   && !(MEM_P (operands[1]) && MEM_P (operands[2]))"
+  "mulx\t{%3, %0, %1|%1, %0, %3}"
+  [(set_attr "type" "imulx")
+   (set_attr "prefix" "vex")
+   (set_attr "mode" "DI")])
+
+(define_insn "*bmi2_umulsidi3_1"
+  [(set (match_operand:SI 0 "register_operand" "=r")
+	(mult:SI
+	  (match_operand:SI 2 "nonimmediate_operand" "%d")
+	  (match_operand:SI 3 "nonimmediate_operand" "rm")))
+   (set (match_operand:SI 1 "register_operand" "=r")
+	(truncate:SI
+	  (lshiftrt:DI
+	    (mult:DI (zero_extend:DI (match_dup 2))
+		     (zero_extend:DI (match_dup 3)))
+	    (const_int 32))))]
+  "!TARGET_64BIT && TARGET_BMI2
+   && !(MEM_P (operands[1]) && MEM_P (operands[2]))"
+  "mulx\t{%3, %0, %1|%1, %0, %3}"
+  [(set_attr "type" "imulx")
+   (set_attr "prefix" "vex")
+   (set_attr "mode" "SI")])
+
+(define_insn "*umul<mode><dwi>3_1"
+  [(set (match_operand:<DWI> 0 "register_operand" "=A,r")
+	(mult:<DWI>
+	  (zero_extend:<DWI>
+	    (match_operand:DWIH 1 "nonimmediate_operand" "%0,d"))
+	  (zero_extend:<DWI>
+	    (match_operand:DWIH 2 "nonimmediate_operand" "rm,rm"))))
+   (clobber (reg:CC FLAGS_REG))]
+  "!(MEM_P (operands[1]) && MEM_P (operands[2]))"
+  "@
+   mul{<imodesuffix>}\t%2
+   #"
+  [(set_attr "isa" "base,bmi2")
+   (set_attr "type" "imul,imulx")
+   (set_attr "length_immediate" "0,*")
+   (set (attr "athlon_decode")
+	(cond [(eq_attr "alternative" "0")
+		 (if_then_else (eq_attr "cpu" "athlon")
+		   (const_string "vector")
+		   (const_string "double"))]
+	      (const_string "*")))
+   (set_attr "amdfam10_decode" "double,*")
+   (set_attr "bdver1_decode" "direct,*")
+   (set_attr "prefix" "orig,vex")
+   (set_attr "mode" "<MODE>")])
+
+;; Convert mul to the mulx pattern to avoid flags dependency.
+(define_split
+ [(set (match_operand:<DWI> 0 "register_operand" "")
+       (mult:<DWI>
+	 (zero_extend:<DWI>
+	   (match_operand:DWIH 1 "nonimmediate_operand" ""))
+	 (zero_extend:<DWI>
+	   (match_operand:DWIH 2 "nonimmediate_operand" ""))))
+  (clobber (reg:CC FLAGS_REG))]
+ "TARGET_BMI2 && reload_completed"
+  [(parallel [(set (match_dup 3)
+		   (mult:DWIH (match_dup 1) (match_dup 2)))
+	      (set (match_dup 4)
+		   (truncate:DWIH
+		     (lshiftrt:<DWI> 
+		       (mult:<DWI> (zero_extend:<DWI> (match_dup 1))
+				   (zero_extend:<DWI> (match_dup 2)))
+	    	       (match_dup 5))))])]
+{
+  operands[3] = gen_lowpart (<MODE>mode, operands[0]);
+  operands[4] = gen_highpart (<MODE>mode, operands[0]);
+  operands[5] = GEN_INT (GET_MODE_BITSIZE (<MODE>mode));
+})
+
+(define_insn "*mul<mode><dwi>3_1"
   [(set (match_operand:<DWI> 0 "register_operand" "=A")
 	(mult:<DWI>
-	  (any_extend:<DWI>
+	  (sign_extend:<DWI>
 	    (match_operand:DWIH 1 "nonimmediate_operand" "%0"))
-	  (any_extend:<DWI>
+	  (sign_extend:<DWI>
 	    (match_operand:DWIH 2 "nonimmediate_operand" "rm"))))
    (clobber (reg:CC FLAGS_REG))]
   "!(MEM_P (operands[1]) && MEM_P (operands[2]))"
-  "<sgnprefix>mul{<imodesuffix>}\t%2"
+  "imul{<imodesuffix>}\t%2"
   [(set_attr "type" "imul")
    (set_attr "length_immediate" "0")
    (set (attr "athlon_decode")
@@ -9051,16 +9138,26 @@ 
   [(set_attr "type" "ishift")
    (set_attr "mode" "<MODE>")])
 
+(define_insn "*bmi2_ashl<mode>3_1"
+  [(set (match_operand:SWI48 0 "register_operand" "=r")
+	(ashift:SWI48 (match_operand:SWI48 1 "nonimmediate_operand" "rm")
+		      (match_operand:SWI48 2 "register_operand" "r")))]
+  "TARGET_BMI2"
+  "salx\t{%2, %1, %0|%0, %1, %2}"
+  [(set_attr "type" "ishiftx")
+   (set_attr "mode" "<MODE>")])
+
 (define_insn "*ashl<mode>3_1"
-  [(set (match_operand:SWI48 0 "nonimmediate_operand" "=rm,r")
-	(ashift:SWI48 (match_operand:SWI48 1 "nonimmediate_operand" "0,l")
-		      (match_operand:QI 2 "nonmemory_operand" "c<S>,M")))
+  [(set (match_operand:SWI48 0 "nonimmediate_operand" "=rm,r,r")
+	(ashift:SWI48 (match_operand:SWI48 1 "nonimmediate_operand" "0,l,rm")
+		      (match_operand:QI 2 "nonmemory_operand" "c<S>,M,r")))
    (clobber (reg:CC FLAGS_REG))]
   "ix86_binary_operator_ok (ASHIFT, <MODE>mode, operands)"
 {
   switch (get_attr_type (insn))
     {
     case TYPE_LEA:
+    case TYPE_ISHIFTX:
       return "#";
 
     case TYPE_ALU:
@@ -9076,9 +9173,12 @@ 
 	return "sal{<imodesuffix>}\t{%2, %0|%0, %2}";
     }
 }
-  [(set (attr "type")
+  [(set_attr "isa" "base,base,bmi2")
+   (set (attr "type")
      (cond [(eq_attr "alternative" "1")
 	      (const_string "lea")
+	    (eq_attr "alternative" "2")
+	      (const_string "ishiftx")
             (and (and (ne (symbol_ref "TARGET_DOUBLE_WITH_ADD")
 		          (const_int 0))
 		      (match_operand 0 "register_operand" ""))
@@ -9097,17 +9197,39 @@ 
        (const_string "*")))
    (set_attr "mode" "<MODE>")])
 
+;; Convert shift to the shiftx pattern to avoid flags dependency.
+(define_split
+  [(set (match_operand:SWI48 0 "register_operand" "")
+	(ashift:SWI48 (match_operand:SWI48 1 "nonimmediate_operand" "")
+		      (match_operand:QI 2 "register_operand" "")))
+   (clobber (reg:CC FLAGS_REG))]
+  "TARGET_BMI2 && reload_completed"
+  [(set (match_dup 0)
+	(ashift:SWI48 (match_dup 1) (match_dup 2)))]
+  "operands[2] = gen_lowpart (<MODE>mode, operands[2]);")
+
+(define_insn "*bmi2_ashlsi3_1_zext"
+  [(set (match_operand:DI 0 "register_operand" "=r")
+	(zero_extend:DI
+	  (ashift:SI (match_operand:SI 1 "nonimmediate_operand" "rm")
+		     (match_operand:SI 2 "register_operand" "r"))))]
+  "TARGET_64BIT && TARGET_BMI2"
+  "salx\t{%2, %1, %k0|%k0, %1, %2}"
+  [(set_attr "type" "ishiftx")
+   (set_attr "mode" "SI")])
+
 (define_insn "*ashlsi3_1_zext"
-  [(set (match_operand:DI 0 "register_operand" "=r,r")
+  [(set (match_operand:DI 0 "register_operand" "=r,r,r")
 	(zero_extend:DI
-	  (ashift:SI (match_operand:SI 1 "register_operand" "0,l")
-		     (match_operand:QI 2 "nonmemory_operand" "cI,M"))))
+	  (ashift:SI (match_operand:SI 1 "nonimmediate_operand" "0,l,rm")
+		     (match_operand:QI 2 "nonmemory_operand" "cI,M,r"))))
    (clobber (reg:CC FLAGS_REG))]
   "TARGET_64BIT && ix86_binary_operator_ok (ASHIFT, SImode, operands)"
 {
   switch (get_attr_type (insn))
     {
     case TYPE_LEA:
+    case TYPE_ISHIFTX:
       return "#";
 
     case TYPE_ALU:
@@ -9122,9 +9244,12 @@ 
 	return "sal{l}\t{%2, %k0|%k0, %2}";
     }
 }
-  [(set (attr "type")
+  [(set_attr "isa" "base,base,bmi2")
+   (set (attr "type")
      (cond [(eq_attr "alternative" "1")
 	      (const_string "lea")
+	    (eq_attr "alternative" "2")
+	      (const_string "ishiftx")
             (and (ne (symbol_ref "TARGET_DOUBLE_WITH_ADD")
 		     (const_int 0))
 		 (match_operand 2 "const1_operand" ""))
@@ -9142,6 +9267,18 @@ 
        (const_string "*")))
    (set_attr "mode" "SI")])
 
+;; Convert shift to the shiftx pattern to avoid flags dependency.
+(define_split
+  [(set (match_operand:DI 0 "register_operand" "")
+	(zero_extend:DI
+	  (ashift:SI (match_operand:SI 1 "nonimmediate_operand" "")
+		     (match_operand:QI 2 "register_operand" ""))))
+   (clobber (reg:CC FLAGS_REG))]
+  "TARGET_64BIT && TARGET_BMI2 && reload_completed"
+  [(set (match_dup 0)
+  	(zero_extend:DI (ashift:SI (match_dup 1) (match_dup 2))))]
+  "operands[2] = gen_lowpart (SImode, operands[2]);")
+
 (define_insn "*ashlhi3_1"
   [(set (match_operand:HI 0 "nonimmediate_operand" "=rm")
 	(ashift:HI (match_operand:HI 1 "nonimmediate_operand" "0")
@@ -9758,20 +9895,38 @@ 
   DONE;
 })
 
+(define_insn "*bmi2_<shiftrt_insn><mode>3_1"
+  [(set (match_operand:SWI48 0 "register_operand" "=r")
+	(any_shiftrt:SWI48 (match_operand:SWI48 1 "nonimmediate_operand" "rm")
+			   (match_operand:SWI48 2 "register_operand" "r")))]
+  "TARGET_BMI2"
+  "<shiftrt>x\t{%2, %1, %0|%0, %1, %2}"
+  [(set_attr "type" "ishiftx")
+   (set_attr "mode" "<MODE>")])
+
 (define_insn "*<shiftrt_insn><mode>3_1"
-  [(set (match_operand:SWI 0 "nonimmediate_operand" "=<r>m")
-	(any_shiftrt:SWI (match_operand:SWI 1 "nonimmediate_operand" "0")
-			 (match_operand:QI 2 "nonmemory_operand" "c<S>")))
+  [(set (match_operand:SWI48 0 "nonimmediate_operand" "=rm,r")
+	(any_shiftrt:SWI48
+	  (match_operand:SWI48 1 "nonimmediate_operand" "0,rm")
+	  (match_operand:QI 2 "nonmemory_operand" "c<S>,r")))
    (clobber (reg:CC FLAGS_REG))]
   "ix86_binary_operator_ok (<CODE>, <MODE>mode, operands)"
 {
-  if (operands[2] == const1_rtx
-      && (TARGET_SHIFT1 || optimize_function_for_size_p (cfun)))
-    return "<shiftrt>{<imodesuffix>}\t%0";
-  else
-    return "<shiftrt>{<imodesuffix>}\t{%2, %0|%0, %2}";
+  switch (get_attr_type (insn))
+    {
+    case TYPE_ISHIFTX:
+      return "#";
+
+    default:
+      if (operands[2] == const1_rtx
+	  && (TARGET_SHIFT1 || optimize_function_for_size_p (cfun)))
+	return "<shiftrt>{<imodesuffix>}\t%0";
+      else
+	return "<shiftrt>{<imodesuffix>}\t{%2, %0|%0, %2}";
+    }
 }
-  [(set_attr "type" "ishift")
+  [(set_attr "isa" "base,bmi2")
+   (set_attr "type" "ishift,ishiftx")
    (set (attr "length_immediate")
      (if_then_else
        (and (match_operand 2 "const1_operand" "")
@@ -9781,19 +9936,84 @@ 
        (const_string "*")))
    (set_attr "mode" "<MODE>")])
 
-(define_insn "*<shiftrt_insn>si3_1_zext"
+;; Convert shift to the shiftx pattern to avoid flags dependency.
+(define_split
+  [(set (match_operand:SWI48 0 "register_operand" "")
+	(any_shiftrt:SWI48 (match_operand:SWI48 1 "nonimmediate_operand" "")
+			   (match_operand:QI 2 "register_operand" "")))
+   (clobber (reg:CC FLAGS_REG))]
+  "TARGET_BMI2 && reload_completed"
+  [(set (match_dup 0)
+	(any_shiftrt:SWI48 (match_dup 1) (match_dup 2)))]
+  "operands[2] = gen_lowpart (<MODE>mode, operands[2]);")
+
+(define_insn "*bmi2_<shiftrt_insn>si3_1_zext"
   [(set (match_operand:DI 0 "register_operand" "=r")
 	(zero_extend:DI
-	  (any_shiftrt:SI (match_operand:SI 1 "register_operand" "0")
-			  (match_operand:QI 2 "nonmemory_operand" "cI"))))
+	  (any_shiftrt:SI (match_operand:SI 1 "nonimmediate_operand" "rm")
+			  (match_operand:SI 2 "register_operand" "r"))))]
+  "TARGET_64BIT && TARGET_BMI2"
+  "<shiftrt>x\t{%2, %1, %k0|%k0, %1, %2}"
+  [(set_attr "type" "ishiftx")
+   (set_attr "mode" "SI")])
+
+(define_insn "*<shiftrt_insn>si3_1_zext"
+  [(set (match_operand:DI 0 "register_operand" "=r,r")
+	(zero_extend:DI
+	  (any_shiftrt:SI (match_operand:SI 1 "nonimmediate_operand" "0,rm")
+			  (match_operand:QI 2 "nonmemory_operand" "cI,r"))))
    (clobber (reg:CC FLAGS_REG))]
   "TARGET_64BIT && ix86_binary_operator_ok (<CODE>, SImode, operands)"
 {
+  switch (get_attr_type (insn))
+    {
+    case TYPE_ISHIFTX:
+      return "#";
+
+    default:
+      if (operands[2] == const1_rtx
+	  && (TARGET_SHIFT1 || optimize_function_for_size_p (cfun)))
+	return "<shiftrt>{l}\t%k0";
+      else
+	return "<shiftrt>{l}\t{%2, %k0|%k0, %2}";
+    }
+}
+  [(set_attr "isa" "base,bmi2")
+   (set_attr "type" "ishift,ishiftx")
+   (set (attr "length_immediate")
+     (if_then_else
+       (and (match_operand 2 "const1_operand" "")
+	    (ne (symbol_ref "TARGET_SHIFT1 || optimize_function_for_size_p (cfun)")
+		(const_int 0)))
+       (const_string "0")
+       (const_string "*")))
+   (set_attr "mode" "SI")])
+
+;; Convert shift to the shiftx pattern to avoid flags dependency.
+(define_split
+  [(set (match_operand:DI 0 "register_operand" "")
+	(zero_extend:DI
+	  (any_shiftrt:SI (match_operand:SI 1 "nonimmediate_operand" "")
+			  (match_operand:QI 2 "register_operand" ""))))
+   (clobber (reg:CC FLAGS_REG))]
+  "TARGET_64BIT && TARGET_BMI2 && reload_completed"
+  [(set (match_dup 0)
+  	(zero_extend:DI (any_shiftrt:SI (match_dup 1) (match_dup 2))))]
+  "operands[2] = gen_lowpart (SImode, operands[2]);")
+
+(define_insn "*<shiftrt_insn><mode>3_1"
+  [(set (match_operand:SWI12 0 "nonimmediate_operand" "=<r>m")
+	(any_shiftrt:SWI12
+	  (match_operand:SWI12 1 "nonimmediate_operand" "0")
+	  (match_operand:QI 2 "nonmemory_operand" "c<S>")))
+   (clobber (reg:CC FLAGS_REG))]
+  "ix86_binary_operator_ok (<CODE>, <MODE>mode, operands)"
+{
   if (operands[2] == const1_rtx
       && (TARGET_SHIFT1 || optimize_function_for_size_p (cfun)))
-    return "<shiftrt>{l}\t%k0";
+    return "<shiftrt>{<imodesuffix>}\t%0";
   else
-    return "<shiftrt>{l}\t{%2, %k0|%k0, %2}";
+    return "<shiftrt>{<imodesuffix>}\t{%2, %0|%0, %2}";
 }
   [(set_attr "type" "ishift")
    (set (attr "length_immediate")
@@ -9803,7 +10023,7 @@ 
 		(const_int 0)))
        (const_string "0")
        (const_string "*")))
-   (set_attr "mode" "SI")])
+   (set_attr "mode" "<MODE>")])
 
 (define_insn "*<shiftrt_insn>qi3_1_slp"
   [(set (strict_low_part (match_operand:QI 0 "nonimmediate_operand" "+qm"))
@@ -10055,42 +10275,151 @@ 
   split_double_mode (<DWI>mode, &operands[0], 1, &operands[4], &operands[5]);
 })
 
+(define_insn "*bmi2_rorx<mode>3_1"
+  [(set (match_operand:SWI48 0 "register_operand" "=r")
+	(rotatert:SWI48 (match_operand:SWI48 1 "nonimmediate_operand" "rm")
+			(match_operand:QI 2 "immediate_operand" "<S>")))]
+  "TARGET_BMI2"
+  "rorx\t{%2, %1, %0|%0, %1, %2}"
+  [(set_attr "type" "rotatex")
+   (set_attr "mode" "<MODE>")])
+
 (define_insn "*<rotate_insn><mode>3_1"
-  [(set (match_operand:SWI 0 "nonimmediate_operand" "=<r>m")
-	(any_rotate:SWI (match_operand:SWI 1 "nonimmediate_operand" "0")
-			(match_operand:QI 2 "nonmemory_operand" "c<S>")))
+  [(set (match_operand:SWI48 0 "nonimmediate_operand" "=rm,r")
+	(any_rotate:SWI48
+	  (match_operand:SWI48 1 "nonimmediate_operand" "0,rm")
+	  (match_operand:QI 2 "nonmemory_operand" "c<S>,<S>")))
    (clobber (reg:CC FLAGS_REG))]
   "ix86_binary_operator_ok (<CODE>, <MODE>mode, operands)"
 {
-  if (operands[2] == const1_rtx
-      && (TARGET_SHIFT1 || optimize_function_for_size_p (cfun)))
-    return "<rotate>{<imodesuffix>}\t%0";
-  else
-    return "<rotate>{<imodesuffix>}\t{%2, %0|%0, %2}";
+  switch (get_attr_type (insn))
+    {
+    case TYPE_ROTATEX:
+      return "#";
+
+    default:
+      if (operands[2] == const1_rtx
+	  && (TARGET_SHIFT1 || optimize_function_for_size_p (cfun)))
+	return "<rotate>{<imodesuffix>}\t%0";
+      else
+	return "<rotate>{<imodesuffix>}\t{%2, %0|%0, %2}";
+    }
 }
-  [(set_attr "type" "rotate")
+  [(set_attr "isa" "base,bmi2")
+   (set_attr "type" "rotate,rotatex")
    (set (attr "length_immediate")
      (if_then_else
-       (and (match_operand 2 "const1_operand" "")
-	    (ne (symbol_ref "TARGET_SHIFT1 || optimize_function_for_size_p (cfun)")
-		(const_int 0)))
+       (and (eq_attr "type" "rotate")
+	    (and (match_operand 2 "const1_operand" "")
+		 (ne (symbol_ref "TARGET_SHIFT1 || optimize_function_for_size_p (cfun)")
+		     (const_int 0))))
        (const_string "0")
        (const_string "*")))
    (set_attr "mode" "<MODE>")])
 
-(define_insn "*<rotate_insn>si3_1_zext"
+;; Convert rotate to the rotatex pattern to avoid flags dependency.
+(define_split
+  [(set (match_operand:SWI48 0 "register_operand" "")
+	(rotate:SWI48 (match_operand:SWI48 1 "nonimmediate_operand" "")
+		      (match_operand:QI 2 "immediate_operand" "")))
+   (clobber (reg:CC FLAGS_REG))]
+  "TARGET_BMI2 && reload_completed"
+  [(set (match_dup 0)
+	(rotatert:SWI48 (match_dup 1) (match_dup 2)))]
+{
+  operands[2]
+    = GEN_INT (GET_MODE_BITSIZE (<MODE>mode) - INTVAL (operands[2]));
+})
+
+(define_split
+  [(set (match_operand:SWI48 0 "register_operand" "")
+	(rotatert:SWI48 (match_operand:SWI48 1 "nonimmediate_operand" "")
+			(match_operand:QI 2 "immediate_operand" "")))
+   (clobber (reg:CC FLAGS_REG))]
+  "TARGET_BMI2 && reload_completed"
+  [(set (match_dup 0)
+	(rotatert:SWI48 (match_dup 1) (match_dup 2)))])
+
+(define_insn "*bmi2_rorxsi3_1_zext"
   [(set (match_operand:DI 0 "register_operand" "=r")
 	(zero_extend:DI
-	  (any_rotate:SI (match_operand:SI 1 "register_operand" "0")
-			 (match_operand:QI 2 "nonmemory_operand" "cI"))))
+	  (rotatert:SI (match_operand:SI 1 "nonimmediate_operand" "rm")
+		       (match_operand:QI 2 "immediate_operand" "I"))))]
+  "TARGET_64BIT && TARGET_BMI2"
+  "rorx\t{%2, %1, %k0|%k0, %1, %2}"
+  [(set_attr "type" "rotatex")
+   (set_attr "mode" "SI")])
+
+(define_insn "*<rotate_insn>si3_1_zext"
+  [(set (match_operand:DI 0 "register_operand" "=r,r")
+	(zero_extend:DI
+	  (any_rotate:SI (match_operand:SI 1 "nonimmediate_operand" "0,rm")
+			 (match_operand:QI 2 "nonmemory_operand" "cI,I"))))
    (clobber (reg:CC FLAGS_REG))]
   "TARGET_64BIT && ix86_binary_operator_ok (<CODE>, SImode, operands)"
 {
-    if (operands[2] == const1_rtx
-	&& (TARGET_SHIFT1 || optimize_function_for_size_p (cfun)))
-    return "<rotate>{l}\t%k0";
+  switch (get_attr_type (insn))
+    {
+    case TYPE_ROTATEX:
+      return "#";
+
+    default:
+      if (operands[2] == const1_rtx
+	  && (TARGET_SHIFT1 || optimize_function_for_size_p (cfun)))
+	return "<rotate>{l}\t%k0";
+      else
+	return "<rotate>{l}\t{%2, %k0|%k0, %2}";
+    }
+}
+  [(set_attr "isa" "base,bmi2")
+   (set_attr "type" "rotate,rotatex")
+   (set (attr "length_immediate")
+     (if_then_else
+       (and (eq_attr "type" "rotate")
+	    (and (match_operand 2 "const1_operand" "")
+		 (ne (symbol_ref "TARGET_SHIFT1 || optimize_function_for_size_p (cfun)")
+		     (const_int 0))))
+       (const_string "0")
+       (const_string "*")))
+   (set_attr "mode" "SI")])
+
+;; Convert rotate to the rotatex pattern to avoid flags dependency.
+(define_split
+  [(set (match_operand:DI 0 "register_operand" "")
+	(zero_extend:DI
+	  (rotate:SI (match_operand:SI 1 "nonimmediate_operand" "")
+		     (match_operand:QI 2 "immediate_operand" ""))))
+   (clobber (reg:CC FLAGS_REG))]
+  "TARGET_64BIT && TARGET_BMI2 && reload_completed"
+  [(set (match_dup 0)
+  	(zero_extend:DI (rotatert:SI (match_dup 1) (match_dup 2))))]
+{
+  operands[2]
+    = GEN_INT (GET_MODE_BITSIZE (SImode) - INTVAL (operands[2]));
+})
+
+(define_split
+  [(set (match_operand:DI 0 "register_operand" "")
+	(zero_extend:DI
+	  (rotatert:SI (match_operand:SI 1 "nonimmediate_operand" "")
+		       (match_operand:QI 2 "immediate_operand" ""))))
+   (clobber (reg:CC FLAGS_REG))]
+  "TARGET_64BIT && TARGET_BMI2 && reload_completed"
+  [(set (match_dup 0)
+  	(zero_extend:DI (rotatert:SI (match_dup 1) (match_dup 2))))])
+
+(define_insn "*<rotate_insn><mode>3_1"
+  [(set (match_operand:SWI12 0 "nonimmediate_operand" "=<r>m")
+	(any_rotate:SWI12 (match_operand:SWI12 1 "nonimmediate_operand" "0")
+			  (match_operand:QI 2 "nonmemory_operand" "c<S>")))
+   (clobber (reg:CC FLAGS_REG))]
+  "ix86_binary_operator_ok (<CODE>, <MODE>mode, operands)"
+{
+  if (operands[2] == const1_rtx
+      && (TARGET_SHIFT1 || optimize_function_for_size_p (cfun)))
+    return "<rotate>{<imodesuffix>}\t%0";
   else
-    return "<rotate>{l}\t{%2, %k0|%k0, %2}";
+    return "<rotate>{<imodesuffix>}\t{%2, %0|%0, %2}";
 }
   [(set_attr "type" "rotate")
    (set (attr "length_immediate")
@@ -10100,7 +10429,7 @@ 
 		(const_int 0)))
        (const_string "0")
        (const_string "*")))
-   (set_attr "mode" "SI")])
+   (set_attr "mode" "<MODE>")])
 
 (define_insn "*<rotate_insn>qi3_1_slp"
   [(set (strict_low_part (match_operand:QI 0 "nonimmediate_operand" "+qm"))
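
A note on the rotate splitters in the i386.md changes above: rorx only rotates right, so a left rotate by an immediate n is rewritten as a right rotate by (bitsize - n). The C sketch below checks that identity for 32-bit values. The helper names are hypothetical and the code only models the arithmetic, not the RTL machinery.

```c
#include <stdint.h>

/* Rotate right by n bits (n is reduced modulo 32).  */
static uint32_t rotr32(uint32_t x, unsigned n)
{
    n &= 31;
    return (x >> n) | (x << ((32 - n) & 31));
}

/* Rotate left by n bits (n is reduced modulo 32).  */
static uint32_t rotl32(uint32_t x, unsigned n)
{
    n &= 31;
    return (x << n) | (x >> ((32 - n) & 31));
}

/* Models the splitter's operand rewrite: a left rotate by n becomes a
   right rotate by (bitsize - n).  */
static uint32_t rotl32_via_rotr(uint32_t x, unsigned n)
{
    return rotr32(x, 32 - n);
}
```

For any count from 1 to 31, rotl32 and rotl32_via_rotr produce the same value, which is why a single right-rotate instruction can serve both directions when the count is a constant.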