[i386] : Rewrite LEA handling (was:Re: PATCH [10/n] X32: Support x32 LEA insns)

Message ID CAFULd4ZTfkm8DB1=-TdXiNJfPOGa=0cvy6RDmZvdQAOUHR2nkg@mail.gmail.com
State New
Commit Message

Uros Bizjak July 24, 2011, 9:34 p.m. UTC
On Sat, Jul 23, 2011 at 3:57 PM, H.J. Lu <hjl.tools@gmail.com> wrote:

>>> This patch adds x32 LEA insn support.  The main issue is
>>>
>>> gen_lowpart (Pmode, operands[1]);
>>>
>>> doesn't work on symbol.  This patch avoids it.
>>>
>>> Also we shouldn't generate 32bit store with x32 PIC source.
>>>
>>> Any comments?

You are not fixing the core of the problem... this is why you need so
many hacks and kludges in various places (some w.r.t. -fPIC already
existed, see the patch). Above, you correctly identified the problem,
so let's avoid gen_lowpart on SImode operands by not calling it
anymore.

The attached patch effectively rewrites LEA handling. The trick is that
instead of using Pmode operations in addresses, we use either SImode
or DImode operations to calculate the address on 64bit targets. Up to
now, address calculations strictly used Pmode, i.e. SImode on 32bit
targets and DImode on 64bit targets. Recent patches to
ix86_decompose_address and ix86_legitimate_address_p relaxed this
requirement.

The attached patch changes LEA patterns and LEA splitters to accept
addresses calculated with either SImode or DImode operations. This
means that on x64 targets, we don't use gen_lowpart on SImode
operands anymore. Since symbol references on x32 are in SImode, this
solves the problem. The patch also avoids generating SImode subregs of
DImode addresses and DImode zero_extends of SImode addresses, since
the LEA insn does this for us automatically.
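
For illustration only, here is a rough C model of why the subreg and
zero_extend wrappers are redundant. It is not GCC code, and the function
names are made up for this sketch. The only assumption is the usual
x86-64 rule that a write to a 32-bit register clears the upper 32 bits:
computing the address in 32 bits and zero-extending gives the same value
as computing it in 64 bits and keeping only the low 32 bits.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of "leal disp(base,index,scale), %r32" on x86-64:
   the address is computed modulo 2^32 and the 32-bit register write
   zero-extends into the full 64-bit register.  */
uint64_t
lea_si_zext (uint32_t base, uint32_t index, uint32_t scale, int32_t disp)
{
  uint32_t addr = base + index * scale + (uint32_t) disp;
  return (uint64_t) addr;
}

/* Hypothetical model of the old RTL shape: a DImode address computation
   wrapped in an SImode subreg and then zero-extended back to DImode.  */
uint64_t
lea_di_subreg_zext (uint64_t base, uint64_t index, uint64_t scale,
                    int64_t disp)
{
  uint64_t addr = base + index * scale + (uint64_t) disp;
  return (uint64_t) (uint32_t) addr;
}

/* Both shapes agree, including on wraparound and negative displacements.  */
void
lea_model_selfcheck (void)
{
  assert (lea_si_zext (0xfffffff0u, 4, 8, 0)
          == lea_di_subreg_zext (0xfffffff0u, 4, 8, 0));
  assert (lea_si_zext (0x10, 0, 1, -0x20)
          == lea_di_subreg_zext (0x10, 0, 1, -0x20));
}
```

With base 0xfffffff0, index 4 and scale 8, both functions return 0x10,
so nothing is lost by expressing the address directly in SImode.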

Please also note the change to ix86_print_operand_address. To avoid
addr32 prefixes, we can force registers in DImode on 64bit targets
without any problems. On x32, we can investigate whether this change
avoids unnecessary LEAs (for PR 49781, patched gcc generates 6 vs. 8).
Also, we can investigate the effect of addr32 on benchmarks.
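
To make the effect of the 'q' modifier concrete, here is a simplified
sketch of the register-name selection. It is not the real print_reg:
sketch_reg_name, sketch_print_address and the four-entry tables are
hypothetical. It only models the assumption that code 'q' forces the
64-bit name, 'k' forces the 32-bit name, and 0 follows the mode of the
register.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, truncated register tables for the sketch.  */
static const char *const names32[] = { "eax", "ecx", "edx", "ebx" };
static const char *const names64[] = { "rax", "rcx", "rdx", "rbx" };

/* Pick a register name: 'q' forces 64-bit, 'k' forces 32-bit,
   0 follows the register's own mode.  */
const char *
sketch_reg_name (int regno, int is_dimode, int code)
{
  if (code == 'q')
    return names64[regno];
  if (code == 'k')
    return names32[regno];
  return is_dimode ? names64[regno] : names32[regno];
}

/* Print an AT&T-style (base,index,scale) address into BUF, choosing the
   modifier the same way the patched code does.  */
void
sketch_print_address (char *buf, size_t len, int base, int index,
                      int scale, int is_dimode, int target_64bit)
{
  int code = target_64bit ? 'q' : 0;
  const char *b = sketch_reg_name (base, is_dimode, code);
  const char *i = sketch_reg_name (index, is_dimode, code);

  if (scale != 1)
    snprintf (buf, len, "(%%%s,%%%s,%d)", b, i, scale);
  else
    snprintf (buf, len, "(%%%s,%%%s)", b, i);
}

/* An SImode address prints with 64-bit names on a 64bit target.  */
void
sketch_selfcheck (void)
{
  char buf[32];

  sketch_print_address (buf, sizeof buf, 0, 1, 4, 0, 1);
  assert (strcmp (buf, "(%rax,%rcx,4)") == 0);
}
```

On a 32bit target the same SImode address would print as
(%eax,%ecx,4); on a 64bit target that form would need the addr32
prefix, which is what printing with 'q' avoids.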

Patched gcc also fixes all testcases from PR 47381.

2011-07-24  Uros Bizjak  <ubizjak@gmail.com>

	PR target/47381
	* config/i386/i386.md (*lea_1): Use SWI48 mode iterator.
	(*lea_1_zext): New insn pattern.
	(add->lea splitter): Check operand modes in insn constraint.  Extend
	operands less than SImode wide to SImode.
	(add->lea zext splitter): Do not extend operands to DImode.
	(*lea_general_1): Handle only QImode and HImode operands.
	(*lea_general_2): Ditto.
	(*lea_general_3): Ditto.
	(*lea_general_1_zext): Remove.
	(*lea_general_2_zext): Ditto.
	(*lea_general_3_zext): Ditto.
	(*lea_general_4): Check operand modes in insn constraint.  Extend
	operands less than SImode wide to SImode.
	(ashift->lea splitter): Ditto.
	* config/i386/i386.c (ix86_print_operand_address): Print address
	registers with 'q' modifier on 64bit targets.

Patch was bootstrapped and regression tested on x86_64-pc-linux-gnu
{,-m32} with no regressions. H.J., can you please test it on x32?

BTW: -fPIC is not yet implemented on trunk and still fails there with
an (unrelated) error; I didn't check the x32 branch.

Uros.

Comments

H.J. Lu July 25, 2011, 1:58 a.m. UTC | #1
On Sun, Jul 24, 2011 at 2:34 PM, Uros Bizjak <ubizjak@gmail.com> wrote:
> On Sat, Jul 23, 2011 at 3:57 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
>
>>>> This patch adds x32 LEA insn support.  The main issue is
>>>>
>>>> gen_lowpart (Pmode, operands[1]);
>>>>
>>>> doesn't work on symbol.  This patch avoids it.
>>>>
>>>> Also we shouldn't generate 32bit store with x32 PIC source.
>>>>
>>>> Any comments?
>
> You are not fixing the core of the problem... this is why you need so
> many hacks and kludges in various places (some w.r.t. -fPIC already
> existed, see the patch). Above, you correctly identified the problem,
> so let's avoid gen_lowpart on SImode operands by not calling it
> anymore.
>
> The attached patch effectively rewrites LEA handling. The trick is that
> instead of using Pmode operations in addresses, we use either SImode
> or DImode operations to calculate the address on 64bit targets. Up to
> now, address calculations strictly used Pmode, i.e. SImode on 32bit
> targets and DImode on 64bit targets. Recent patches to
> ix86_decompose_address and ix86_legitimate_address_p relaxed this
> requirement.
>
> The attached patch changes LEA patterns and LEA splitters to accept
> addresses calculated with either SImode or DImode operations. This
> means that on x64 targets, we don't use gen_lowpart on SImode
> operands anymore. Since symbol references on x32 are in SImode, this
> solves the problem. The patch also avoids generating SImode subregs of
> DImode addresses and DImode zero_extends of SImode addresses, since
> the LEA insn does this for us automatically.
>
> Please also note the change to ix86_print_operand_address. To avoid
> addr32 prefixes, we can force registers in DImode on 64bit targets
> without any problems. On x32, we can investigate whether this change
> avoids unnecessary LEAs (for PR 49781, patched gcc generates 6 vs. 8).

The testcase won't compile since PIC doesn't work:

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49833

> Also, we can investigate the effect of addr32 on benchmarks.

We should get x32 working on trunk first.  We will improve its
performance later.

> Patched gcc also fixes all testcases from PR 47381.
>
> 2011-07-24  Uros Bizjak  <ubizjak@gmail.com>
>
>        PR target/47381
>        * config/i386/i386.md (*lea_1): Use SWI48 mode iterator.
>        (*lea_1_zext): New insn pattern.
>        (add->lea splitter): Check operand modes in insn constraint.  Extend
>        operands less than SImode wide to SImode.
>        (add->lea zext splitter): Do not extend operands to DImode.
>        (*lea_general_1): Handle only QImode and HImode operands.
>        (*lea_general_2): Ditto.
>        (*lea_general_3): Ditto.
>        (*lea_general_1_zext): Remove.
>        (*lea_general_2_zext): Ditto.
>        (*lea_general_3_zext): Ditto.
>        (*lea_general_4): Check operand modes in insn constraint.  Extend
>        operands less than SImode wide to SImode.
>        (ashift->lea splitter): Ditto.
>        * config/i386/i386.c (ix86_print_operand_address): Print address
>        registers with 'q' modifier on 64bit targets.
>
> Patch was bootstrapped and regression tested on x86_64-pc-linux-gnu
> {,-m32} with no regressions. H.J., can you please test it on x32?

On x32, it failed:

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49832

> BTW: -fPIC is not yet implemented on trunk and still fails there with
> an (unrelated) error; I didn't check the x32 branch.
>

This could be:

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49833

Thanks.
Patch

Index: i386.md
===================================================================
--- i386.md	(revision 176713)
+++ i386.md	(working copy)
@@ -5425,13 +5425,22 @@ 
    (set_attr "mode" "QI")])
 
 (define_insn "*lea_1"
-  [(set (match_operand:P 0 "register_operand" "=r")
-	(match_operand:P 1 "no_seg_address_operand" "p"))]
+  [(set (match_operand:SWI48 0 "register_operand" "=r")
+	(match_operand:SWI48 1 "no_seg_address_operand" "p"))]
   ""
   "lea{<imodesuffix>}\t{%a1, %0|%0, %a1}"
   [(set_attr "type" "lea")
    (set_attr "mode" "<MODE>")])
 
+(define_insn "*lea_1_zext"
+  [(set (match_operand:DI 0 "register_operand" "=r")
+	(zero_extend:DI
+	  (match_operand:SI 1 "no_seg_address_operand" "p")))]
+  "TARGET_64BIT"
+  "lea{l}\t{%a1, %k0|%k0, %a1}"
+  [(set_attr "type" "lea")
+   (set_attr "mode" "SI")])
+
 (define_insn "*lea_2"
   [(set (match_operand:SI 0 "register_operand" "=r")
 	(subreg:SI (match_operand:DI 1 "no_seg_address_operand" "p") 0))]
@@ -5794,39 +5803,36 @@ 
         (const_string "none")))
    (set_attr "mode" "QI")])
 
-;; Convert lea to the lea pattern to avoid flags dependency.
+;; Convert add to the lea pattern to avoid flags dependency.
 (define_split
   [(set (match_operand 0 "register_operand" "")
 	(plus (match_operand 1 "register_operand" "")
               (match_operand 2 "nonmemory_operand" "")))
    (clobber (reg:CC FLAGS_REG))]
-  "reload_completed && ix86_lea_for_add_ok (insn, operands)" 
+  "GET_MODE (operands[0]) == GET_MODE (operands[1])
+   && (GET_MODE (operands[0]) == GET_MODE (operands[2])
+       || GET_MODE (operands[2]) == VOIDmode)
+   && reload_completed && ix86_lea_for_add_ok (insn, operands)" 
   [(const_int 0)]
 {
-  rtx pat;
   enum machine_mode mode = GET_MODE (operands[0]);
-
-  /* In -fPIC mode the constructs like (const (unspec [symbol_ref]))
-     may confuse gen_lowpart.  */
-  if (mode != Pmode)
-    {
-      operands[1] = gen_lowpart (Pmode, operands[1]);
-      operands[2] = gen_lowpart (Pmode, operands[2]);
-    }
-
-  pat = gen_rtx_PLUS (Pmode, operands[1], operands[2]);
+  rtx pat;
 
   if (GET_MODE_SIZE (mode) < GET_MODE_SIZE (SImode))
-    operands[0] = gen_lowpart (SImode, operands[0]);
+    { 
+      mode = SImode; 
+      operands[0] = gen_lowpart (mode, operands[0]);
+      operands[1] = gen_lowpart (mode, operands[1]);
+      operands[2] = gen_lowpart (mode, operands[2]);
+    }
 
-  if (TARGET_64BIT && mode != Pmode)
-    pat = gen_rtx_SUBREG (SImode, pat, 0);
+  pat = gen_rtx_PLUS (mode, operands[1], operands[2]);
 
   emit_insn (gen_rtx_SET (VOIDmode, operands[0], pat));
   DONE;
 })
 
-;; Convert lea to the lea pattern to avoid flags dependency.
+;; Convert add to the lea pattern to avoid flags dependency.
 ;; ??? This pattern handles immediate operands that do not satisfy immediate
 ;; operand predicate (TARGET_LEGITIMATE_CONSTANT_P) in the previous pattern.
 (define_split
@@ -5839,7 +5845,7 @@ 
   [(set (match_dup 0)
 	(plus:DI (match_dup 1) (match_dup 2)))])
 
-;; Convert lea to the lea pattern to avoid flags dependency.
+;; Convert add to the lea pattern to avoid flags dependency.
 (define_split
   [(set (match_operand:DI 0 "register_operand" "")
 	(zero_extend:DI
@@ -5849,11 +5855,7 @@ 
   "TARGET_64BIT && reload_completed
    && ix86_lea_for_add_ok (insn, operands)"
   [(set (match_dup 0)
-	(zero_extend:DI (subreg:SI (plus:DI (match_dup 1) (match_dup 2)) 0)))]
-{
-  operands[1] = gen_lowpart (DImode, operands[1]);
-  operands[2] = gen_lowpart (DImode, operands[2]);
-})
+	(zero_extend:DI (plus:SI (match_dup 1) (match_dup 2))))])
 
 (define_insn "*add<mode>_2"
   [(set (reg FLAGS_REG)
@@ -6233,7 +6235,7 @@ 
   [(set_attr "type" "alu")
    (set_attr "mode" "QI")])
 
-;; The lea patterns for non-Pmodes needs to be matched by
+;; The lea patterns for modes less than 32 bits need to be matched by
 ;; several insns converted to real lea by splitters.
 
 (define_insn_and_split "*lea_general_1"
@@ -6241,8 +6243,7 @@ 
 	(plus (plus (match_operand 1 "index_register_operand" "l")
 		    (match_operand 2 "register_operand" "r"))
 	      (match_operand 3 "immediate_operand" "i")))]
-  "(GET_MODE (operands[0]) == QImode || GET_MODE (operands[0]) == HImode
-    || (TARGET_64BIT && GET_MODE (operands[0]) == SImode))
+  "(GET_MODE (operands[0]) == QImode || GET_MODE (operands[0]) == HImode)
    && (!TARGET_PARTIAL_REG_STALL || optimize_function_for_size_p (cfun))
    && GET_MODE (operands[0]) == GET_MODE (operands[1])
    && GET_MODE (operands[0]) == GET_MODE (operands[2])
@@ -6252,53 +6253,30 @@ 
   "&& reload_completed"
   [(const_int 0)]
 {
+  enum machine_mode mode = SImode;
   rtx pat;
-  operands[0] = gen_lowpart (SImode, operands[0]);
-  operands[1] = gen_lowpart (Pmode, operands[1]);
-  operands[2] = gen_lowpart (Pmode, operands[2]);
-  operands[3] = gen_lowpart (Pmode, operands[3]);
-  pat = gen_rtx_PLUS (Pmode, gen_rtx_PLUS (Pmode, operands[1], operands[2]),
+
+  operands[0] = gen_lowpart (mode, operands[0]);
+  operands[1] = gen_lowpart (mode, operands[1]);
+  operands[2] = gen_lowpart (mode, operands[2]);
+  operands[3] = gen_lowpart (mode, operands[3]);
+
+  pat = gen_rtx_PLUS (mode, gen_rtx_PLUS (mode, operands[1], operands[2]),
   		      operands[3]);
-  if (Pmode != SImode)
-    pat = gen_rtx_SUBREG (SImode, pat, 0);
+
   emit_insn (gen_rtx_SET (VOIDmode, operands[0], pat));
   DONE;
 }
   [(set_attr "type" "lea")
    (set_attr "mode" "SI")])
 
-(define_insn_and_split "*lea_general_1_zext"
-  [(set (match_operand:DI 0 "register_operand" "=r")
-	(zero_extend:DI
-	  (plus:SI (plus:SI
-		     (match_operand:SI 1 "index_register_operand" "l")
-		     (match_operand:SI 2 "register_operand" "r"))
-		   (match_operand:SI 3 "immediate_operand" "i"))))]
-  "TARGET_64BIT"
-  "#"
-  "&& reload_completed"
-  [(set (match_dup 0)
-	(zero_extend:DI (subreg:SI (plus:DI (plus:DI (match_dup 1)
-						     (match_dup 2))
-					    (match_dup 3)) 0)))]
-{
-  operands[1] = gen_lowpart (Pmode, operands[1]);
-  operands[2] = gen_lowpart (Pmode, operands[2]);
-  operands[3] = gen_lowpart (Pmode, operands[3]);
-}
-  [(set_attr "type" "lea")
-   (set_attr "mode" "SI")])
-
 (define_insn_and_split "*lea_general_2"
   [(set (match_operand 0 "register_operand" "=r")
 	(plus (mult (match_operand 1 "index_register_operand" "l")
 		    (match_operand 2 "const248_operand" "i"))
 	      (match_operand 3 "nonmemory_operand" "ri")))]
-  "(GET_MODE (operands[0]) == QImode || GET_MODE (operands[0]) == HImode
-    || (TARGET_64BIT && GET_MODE (operands[0]) == SImode))
-   && (!TARGET_PARTIAL_REG_STALL
-       || GET_MODE (operands[0]) == SImode
-       || optimize_function_for_size_p (cfun))
+  "(GET_MODE (operands[0]) == QImode || GET_MODE (operands[0]) == HImode)
+   && (!TARGET_PARTIAL_REG_STALL || optimize_function_for_size_p (cfun))
    && GET_MODE (operands[0]) == GET_MODE (operands[1])
    && (GET_MODE (operands[0]) == GET_MODE (operands[3])
        || GET_MODE (operands[3]) == VOIDmode)"
@@ -6306,110 +6284,68 @@ 
   "&& reload_completed"
   [(const_int 0)]
 {
+  enum machine_mode mode = SImode;
   rtx pat;
-  operands[0] = gen_lowpart (SImode, operands[0]);
-  operands[1] = gen_lowpart (Pmode, operands[1]);
-  operands[3] = gen_lowpart (Pmode, operands[3]);
-  pat = gen_rtx_PLUS (Pmode, gen_rtx_MULT (Pmode, operands[1], operands[2]),
-  		      operands[3]);
-  if (Pmode != SImode)
-    pat = gen_rtx_SUBREG (SImode, pat, 0);
+
+  operands[0] = gen_lowpart (mode, operands[0]);
+  operands[1] = gen_lowpart (mode, operands[1]);
+  operands[3] = gen_lowpart (mode, operands[3]);
+
+  pat = gen_rtx_PLUS (mode, gen_rtx_MULT (mode, operands[1], operands[2]),
+		      operands[3]);
+
   emit_insn (gen_rtx_SET (VOIDmode, operands[0], pat));
   DONE;
 }
   [(set_attr "type" "lea")
    (set_attr "mode" "SI")])
 
-(define_insn_and_split "*lea_general_2_zext"
-  [(set (match_operand:DI 0 "register_operand" "=r")
-	(zero_extend:DI
-	  (plus:SI (mult:SI
-		     (match_operand:SI 1 "index_register_operand" "l")
-		     (match_operand:SI 2 "const248_operand" "n"))
-		   (match_operand:SI 3 "nonmemory_operand" "ri"))))]
-  "TARGET_64BIT"
-  "#"
-  "&& reload_completed"
-  [(set (match_dup 0)
-	(zero_extend:DI (subreg:SI (plus:DI (mult:DI (match_dup 1)
-						     (match_dup 2))
-					    (match_dup 3)) 0)))]
-{
-  operands[1] = gen_lowpart (Pmode, operands[1]);
-  operands[3] = gen_lowpart (Pmode, operands[3]);
-}
-  [(set_attr "type" "lea")
-   (set_attr "mode" "SI")])
-
 (define_insn_and_split "*lea_general_3"
   [(set (match_operand 0 "register_operand" "=r")
 	(plus (plus (mult (match_operand 1 "index_register_operand" "l")
 			  (match_operand 2 "const248_operand" "i"))
 		    (match_operand 3 "register_operand" "r"))
 	      (match_operand 4 "immediate_operand" "i")))]
-  "(GET_MODE (operands[0]) == QImode || GET_MODE (operands[0]) == HImode
-    || (TARGET_64BIT && GET_MODE (operands[0]) == SImode))
-   && (!TARGET_PARTIAL_REG_STALL
-       || GET_MODE (operands[0]) == SImode
-       || optimize_function_for_size_p (cfun))
+  "(GET_MODE (operands[0]) == QImode || GET_MODE (operands[0]) == HImode)
+   && (!TARGET_PARTIAL_REG_STALL || optimize_function_for_size_p (cfun))
    && GET_MODE (operands[0]) == GET_MODE (operands[1])
    && GET_MODE (operands[0]) == GET_MODE (operands[3])"
   "#"
   "&& reload_completed"
   [(const_int 0)]
 {
+  enum machine_mode mode = SImode;
   rtx pat;
-  operands[0] = gen_lowpart (SImode, operands[0]);
-  operands[1] = gen_lowpart (Pmode, operands[1]);
-  operands[3] = gen_lowpart (Pmode, operands[3]);
-  operands[4] = gen_lowpart (Pmode, operands[4]);
-  pat = gen_rtx_PLUS (Pmode,
-  		      gen_rtx_PLUS (Pmode, gen_rtx_MULT (Pmode, operands[1],
-		      					 operands[2]),
+
+  operands[0] = gen_lowpart (mode, operands[0]);
+  operands[1] = gen_lowpart (mode, operands[1]);
+  operands[3] = gen_lowpart (mode, operands[3]);
+  operands[4] = gen_lowpart (mode, operands[4]);
+
+  pat = gen_rtx_PLUS (mode,
+  		      gen_rtx_PLUS (mode,
+				    gen_rtx_MULT (mode, operands[1],
+		      					operands[2]),
 				    operands[3]),
   		      operands[4]);
-  if (Pmode != SImode)
-    pat = gen_rtx_SUBREG (SImode, pat, 0);
+
   emit_insn (gen_rtx_SET (VOIDmode, operands[0], pat));
   DONE;
 }
   [(set_attr "type" "lea")
    (set_attr "mode" "SI")])
 
-(define_insn_and_split "*lea_general_3_zext"
-  [(set (match_operand:DI 0 "register_operand" "=r")
-	(zero_extend:DI
-	  (plus:SI (plus:SI
-		     (mult:SI
-		       (match_operand:SI 1 "index_register_operand" "l")
-		       (match_operand:SI 2 "const248_operand" "n"))
-		     (match_operand:SI 3 "register_operand" "r"))
-		   (match_operand:SI 4 "immediate_operand" "i"))))]
-  "TARGET_64BIT"
-  "#"
-  "&& reload_completed"
-  [(set (match_dup 0)
-	(zero_extend:DI (subreg:SI (plus:DI (plus:DI (mult:DI (match_dup 1)
-							      (match_dup 2))
-						     (match_dup 3))
-					    (match_dup 4)) 0)))]
-{
-  operands[1] = gen_lowpart (Pmode, operands[1]);
-  operands[3] = gen_lowpart (Pmode, operands[3]);
-  operands[4] = gen_lowpart (Pmode, operands[4]);
-}
-  [(set_attr "type" "lea")
-   (set_attr "mode" "SI")])
-
 (define_insn_and_split "*lea_general_4"
-  [(set (match_operand:SWI 0 "register_operand" "=r")
-	(any_or:SWI (ashift:SWI (match_operand:SWI 1 "index_register_operand" "l")
-				(match_operand:SWI 2 "const_int_operand" "n"))
-		    (match_operand 3 "const_int_operand" "n")))]
-  "(<MODE>mode == DImode
-    || <MODE>mode == SImode
-    || !TARGET_PARTIAL_REG_STALL
-    || optimize_function_for_size_p (cfun))
+  [(set (match_operand 0 "register_operand" "=r")
+	(any_or (ashift
+		  (match_operand 1 "index_register_operand" "l")
+		  (match_operand 2 "const_int_operand" "n"))
+		(match_operand 3 "const_int_operand" "n")))]
+  "(((GET_MODE (operands[0]) == QImode || GET_MODE (operands[0]) == HImode)
+      && (!TARGET_PARTIAL_REG_STALL || optimize_function_for_size_p (cfun)))
+    || GET_MODE (operands[0]) == SImode
+    || (TARGET_64BIT && GET_MODE (operands[0]) == DImode))
+   && GET_MODE (operands[0]) == GET_MODE (operands[1])
    && ((unsigned HOST_WIDE_INT) INTVAL (operands[2])) - 1 < 3
    && ((unsigned HOST_WIDE_INT) INTVAL (operands[3])
        < ((unsigned HOST_WIDE_INT) 1 << INTVAL (operands[2])))"
@@ -6417,23 +6353,29 @@ 
   "&& reload_completed"
   [(const_int 0)]
 {
+  enum machine_mode mode = GET_MODE (operands[0]);
   rtx pat;
-  if (<MODE>mode != DImode)
-    operands[0] = gen_lowpart (SImode, operands[0]);
-  operands[1] = gen_lowpart (Pmode, operands[1]);
+
+  if (GET_MODE_SIZE (mode) < GET_MODE_SIZE (SImode))
+    { 
+      mode = SImode; 
+      operands[0] = gen_lowpart (mode, operands[0]);
+      operands[1] = gen_lowpart (mode, operands[1]);
+    }
+
   operands[2] = GEN_INT (1 << INTVAL (operands[2]));
-  pat = plus_constant (gen_rtx_MULT (Pmode, operands[1], operands[2]),
+
+  pat = plus_constant (gen_rtx_MULT (mode, operands[1], operands[2]),
 		       INTVAL (operands[3]));
-  if (Pmode != SImode && <MODE>mode != DImode)
-    pat = gen_rtx_SUBREG (SImode, pat, 0);
+
   emit_insn (gen_rtx_SET (VOIDmode, operands[0], pat));
   DONE;
 }
   [(set_attr "type" "lea")
    (set (attr "mode")
-      (if_then_else (eq (symbol_ref "<MODE>mode == DImode") (const_int 0))
-	(const_string "SI")
-	(const_string "DI")))])
+      (if_then_else (match_operand:DI 0 "" "")
+	(const_string "DI")
+	(const_string "SI")))])
 
 ;; Subtract instructions
 
@@ -9395,36 +9337,36 @@ 
        (const_string "*")))
    (set_attr "mode" "QI")])
 
-;; Convert lea to the lea pattern to avoid flags dependency.
+;; Convert ashift to the lea pattern to avoid flags dependency.
 (define_split
   [(set (match_operand 0 "register_operand" "")
 	(ashift (match_operand 1 "index_register_operand" "")
                 (match_operand:QI 2 "const_int_operand" "")))
    (clobber (reg:CC FLAGS_REG))]
-  "reload_completed
+  "GET_MODE (operands[0]) == GET_MODE (operands[1])
+   && reload_completed
    && true_regnum (operands[0]) != true_regnum (operands[1])"
   [(const_int 0)]
 {
-  rtx pat;
   enum machine_mode mode = GET_MODE (operands[0]);
-
-  if (mode != Pmode)
-    operands[1] = gen_lowpart (Pmode, operands[1]);
-  operands[2] = gen_int_mode (1 << INTVAL (operands[2]), Pmode);
-
-  pat = gen_rtx_MULT (Pmode, operands[1], operands[2]);
+  rtx pat;
 
   if (GET_MODE_SIZE (mode) < GET_MODE_SIZE (SImode))
-    operands[0] = gen_lowpart (SImode, operands[0]);
+    { 
+      mode = SImode; 
+      operands[0] = gen_lowpart (mode, operands[0]);
+      operands[1] = gen_lowpart (mode, operands[1]);
+    }
+
+  operands[2] = gen_int_mode (1 << INTVAL (operands[2]), mode);
 
-  if (TARGET_64BIT && mode != Pmode)
-    pat = gen_rtx_SUBREG (SImode, pat, 0);
+  pat = gen_rtx_MULT (mode, operands[1], operands[2]);
 
   emit_insn (gen_rtx_SET (VOIDmode, operands[0], pat));
   DONE;
 })
 
-;; Convert lea to the lea pattern to avoid flags dependency.
+;; Convert ashift to the lea pattern to avoid flags dependency.
 (define_split
   [(set (match_operand:DI 0 "register_operand" "")
 	(zero_extend:DI
Index: i386.c
===================================================================
--- i386.c	(revision 176713)
+++ i386.c	(working copy)
@@ -14122,6 +14122,9 @@  ix86_print_operand_address (FILE *file, 
     }
   else
     {
+      /* Use DImode registers in address on 64bit target.  */
+      int code = TARGET_64BIT ? 'q' : 0;
+
       if (ASSEMBLER_DIALECT == ASM_ATT)
 	{
 	  if (disp)
@@ -14136,11 +14139,11 @@  ix86_print_operand_address (FILE *file, 
 
 	  putc ('(', file);
 	  if (base)
-	    print_reg (base, 0, file);
+	    print_reg (base, code, file);
 	  if (index)
 	    {
 	      putc (',', file);
-	      print_reg (index, 0, file);
+	      print_reg (index, code, file);
 	      if (scale != 1)
 		fprintf (file, ",%d", scale);
 	    }
@@ -14175,7 +14178,7 @@  ix86_print_operand_address (FILE *file, 
 	  putc ('[', file);
 	  if (base)
 	    {
-	      print_reg (base, 0, file);
+	      print_reg (base, code, file);
 	      if (offset)
 		{
 		  if (INTVAL (offset) >= 0)
@@ -14191,7 +14194,7 @@  ix86_print_operand_address (FILE *file, 
 	  if (index)
 	    {
 	      putc ('+', file);
-	      print_reg (index, 0, file);
+	      print_reg (index, code, file);
 	      if (scale != 1)
 		fprintf (file, "*%d", scale);
 	    }