
[i386] : Generate addr32 prefixed addresses

Message ID CAFULd4aBjjz03Vp1DHcXdGe6bX68Jm+s0OiJ+z-_CF9tOcyNOg@mail.gmail.com
State New

Commit Message

Uros Bizjak Aug. 8, 2011, 3:16 p.m. UTC
Hello!

The attached patch implements addr32-prefixed addresses for x86_64
targets, where memory locations are accessed with 32-bit base and index
registers in the form (zero_extend:DI (... SImode registers ...)).
The optimization rarely (if at all) triggers on x86_64, but it is very
important on x32 (see [1]), where many LEAs get folded into the memory
addresses of the instructions that use them.
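
To make the addressing form concrete, here is a minimal C sketch (not part of the patch) of what an address of the form (zero_extend:DI (plus:SI ...)) evaluates to: the base, scaled index and displacement are summed in 32 bits, wrapping modulo 2^32, and the result is zero-extended to 64 bits. The function names addr32_effective and addr64_effective are hypothetical, introduced only for illustration; the second one shows the ordinary 64-bit computation for contrast.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of an addr32-prefixed effective address:
   the sum is formed in 32 bits and then zero-extended to 64 bits.  */
uint64_t
addr32_effective (uint32_t base, uint32_t index, unsigned scale, int32_t disp)
{
  /* x86 only encodes scale factors of 1, 2, 4 and 8.  */
  assert (scale == 1 || scale == 2 || scale == 4 || scale == 8);

  uint32_t sum = base + index * scale + (uint32_t) disp; /* wraps mod 2^32 */
  return (uint64_t) sum;                                  /* zero-extend */
}

/* For contrast: the same parts summed with 64-bit address size.  */
uint64_t
addr64_effective (uint64_t base, uint64_t index, unsigned scale, int32_t disp)
{
  assert (scale == 1 || scale == 2 || scale == 4 || scale == 8);

  /* The displacement is sign-extended before being added.  */
  return base + index * scale + (uint64_t) (int64_t) disp;
}
```

The two computations differ only when the 32-bit sum wraps or when the upper halves of the registers are nonzero, which is why a pointer computation on x32 can be expressed as a single memory operand instead of a separate LEA followed by a zero-extension.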

Of some interest is the inability of reload to fix up its own generated
moves for the offsettable memory operand constraint "o", as happens with
TImode moves. See [2] for further analysis and [3] for the workaround.

2011-08-08  Uros Bizjak  <ubizjak@gmail.com>

	PR target/49781
	* config/i386/i386.c (ix86_decompose_address): Allow zero-extended
	SImode addresses.
	(ix86_print_operand_address): Handle zero-extended addresses.
	(memory_address_length): Add length of addr32 prefix for
	zero-extended addresses.
	(ix86_secondary_reload): Handle moves to/from double-word general
	registers from/to zero-extended addresses.
	* config/i386/predicates.md (lea_address_operand): Reject
	zero-extended operands.

The patch was bootstrapped and regression tested on x86_64-pc-linux-gnu
{,-m32}. Additionally, H.J. tested the patch on the x32 target with GCC
bootstrap/regression tests, a build of glibc (+regression tests) and
SPEC2000/2006.

The patch was committed to mainline SVN.

BTW: There is a strange optimization in the combine pass, where a
zero-extended address is converted on the fly to:

Trying 9 -> 10:
Failed to match this instruction:
(... (and:DI (subreg:DI (plus:SI (ashift:SI (reg/v:SI 63 [ i ])
                    (const_int 2 [0x2]))
                (subreg:SI (reg/v/f:DI 62 [ a ]) 0)) 0)
        (const_int 4294967295 [0xffffffff]))
...)

While it is easy to add a pattern recognizer for this RTX to
ix86_decompose_address/ix86_legitimate_address_p, I would like to
understand the purpose of the conversion better and eventually fix it
in combine pass.
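
For readers unfamiliar with the RTX above, the following C sketch (an illustration, not GCC code) shows why the (and:DI (subreg:DI ...) 0xffffffff) form describes the same value as a zero_extend of the SImode sum. It assumes that the upper 32 bits of the DImode subreg of an SImode value are unspecified, and models them with an arbitrary junk parameter. The function names as_zero_extend and as_and_mask are hypothetical.

```c
#include <stdint.h>

/* Models (zero_extend:DI (plus:SI (ashift:SI i 2) (subreg:SI a 0))).  */
uint64_t
as_zero_extend (uint32_t i, uint64_t a)
{
  uint32_t sum = (i << 2) + (uint32_t) a; /* SImode arithmetic */
  return (uint64_t) sum;                  /* zero-extend to DImode */
}

/* Models (and:DI (subreg:DI (plus:SI ...) 0) (const_int 0xffffffff)).
   ASSUMPTION: the upper half of the widened value is unspecified,
   so it is filled with caller-supplied junk before masking.  */
uint64_t
as_and_mask (uint32_t i, uint64_t a, uint32_t junk)
{
  uint32_t sum = (i << 2) + (uint32_t) a;        /* same SImode sum */
  uint64_t wide = ((uint64_t) junk << 32) | sum; /* widened, dirty top */
  return wide & 0xffffffffu;                     /* mask clears the top */
}
```

Whatever the upper bits hold, the mask discards them, so both functions return the same value for the same i and a. Recognizing the masked form in ix86_decompose_address would therefore be a matter of matching a second spelling of the same address.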

[1] http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49781
[2] http://gcc.gnu.org/ml/gcc/2011-08/msg00129.html
[3] http://gcc.gnu.org/ml/gcc/2011-08/msg00157.html

Uros.

Comments

Paolo Bonzini Aug. 8, 2011, 3:41 p.m. UTC | #1
On 08/08/2011 05:16 PM, Uros Bizjak wrote:
> BTW: There is a strange optimization in combine pass, where
> zero-extended address is converted on-the-fly to:
>
> Trying 9 ->  10:
> Failed to match this instruction:
> (... (and:DI (subreg:DI (plus:SI (ashift:SI (reg/v:SI 63 [ i ])
>                      (const_int 2 [0x2]))
>                  (subreg:SI (reg/v/f:DI 62 [ a ]) 0)) 0)
>          (const_int 4294967295 [0xffffffff]))
> ...)
>
> While it is easy to add a pattern recognizer for this RTX to
> ix86_decompose_address/ix86_legitimate_address_p, I would like to
> understand the purpose of the conversion better and eventually fix it
> in combine pass.

Combine tries to expand and recreate extend and extract operations,
hoping to merge them with other bitwise operations.

Paolo

Patch

Index: predicates.md
===================================================================
--- predicates.md	(revision 177547)
+++ predicates.md	(working copy)
@@ -801,6 +801,10 @@ 
   struct ix86_address parts;
   int ok;
 
+  /*  LEA handles zero-extend by itself.  */
+  if (GET_CODE (op) == ZERO_EXTEND)
+    return false;
+
   ok = ix86_decompose_address (op, &parts);
   gcc_assert (ok);
   return parts.seg == SEG_DEFAULT;
Index: i386.c
===================================================================
--- i386.c	(revision 177547)
+++ i386.c	(working copy)
@@ -11142,6 +11142,14 @@  ix86_decompose_address (rtx addr, struct ix86_addr
   int retval = 1;
   enum ix86_address_seg seg = SEG_DEFAULT;
 
+  /* Allow zero-extended SImode addresses,
+     they will be emitted with addr32 prefix.  */
+  if (TARGET_64BIT
+      && GET_CODE (addr) == ZERO_EXTEND
+      && GET_MODE (addr) == DImode
+      && GET_MODE (XEXP (addr, 0)) == SImode)
+    addr = XEXP (addr, 0);
+ 
   if (REG_P (addr))
     base = addr;
   else if (GET_CODE (addr) == SUBREG)
@@ -14159,9 +14167,13 @@  ix86_print_operand_address (FILE *file, rtx addr)
     }
   else
     {
-      /* Print DImode registers on 64bit targets to avoid addr32 prefixes.  */
-      int code = TARGET_64BIT ? 'q' : 0;
+      int code = 0;
 
+      /* Print SImode registers for zero-extended addresses to force
+	 addr32 prefix.  Otherwise print DImode registers to avoid it.  */
+      if (TARGET_64BIT)
+	code = (GET_CODE (addr) == ZERO_EXTEND) ? 'l' : 'q';
+
       if (ASSEMBLER_DIALECT == ASM_ATT)
 	{
 	  if (disp)
@@ -21772,7 +21784,8 @@  assign_386_stack_local (enum machine_mode mode, en
 }
 
 /* Calculate the length of the memory address in the instruction
-   encoding.  Does not include the one-byte modrm, opcode, or prefix.  */
+   encoding.  Includes addr32 prefix, does not include the one-byte modrm,
+   opcode, or other prefixes.  */
 
 int
 memory_address_length (rtx addr)
@@ -21799,8 +21812,10 @@  memory_address_length (rtx addr)
   base = parts.base;
   index = parts.index;
   disp = parts.disp;
-  len = 0;
 
+  /* Add length of addr32 prefix.  */
+  len = (GET_CODE (addr) == ZERO_EXTEND);
+
   /* Rule of thumb:
        - esp as the base always wants an index,
        - ebp as the base always wants a displacement,
@@ -28233,6 +28248,15 @@  ix86_secondary_reload (bool in_p, rtx x, reg_class
 		       enum machine_mode mode,
 		       secondary_reload_info *sri ATTRIBUTE_UNUSED)
 {
+  /* Double-word spills from general registers to non-offsettable memory
+     references (zero-extended addresses) go through XMM register.  */
+  if (TARGET_64BIT
+      && MEM_P (x)
+      && GET_MODE_SIZE (mode) > UNITS_PER_WORD
+      && rclass == GENERAL_REGS
+      && !offsettable_memref_p (x))
+    return SSE_REGS;
+
   /* QImode spills from non-QI registers require
      intermediate register on 32bit targets.  */
   if (!TARGET_64BIT