Patchwork RFC: LRA for x86/x86-64 [4/9]

login
register
mail settings
Submitter Vladimir Makarov
Date Sept. 28, 2012, 1:44 a.m.
Message ID <506500F4.6020707@redhat.com>
Download mbox | patch
Permalink /patch/187695/
State New
Headers show

Comments

Vladimir Makarov - Sept. 28, 2012, 1:44 a.m.
On 09/27/2012 08:07 PM, Joseph S. Myers wrote:
> On Thu, 27 Sep 2012, Vladimir Makarov wrote:
>
>> Hook spill_class returns a value of enum reg_class which is defined in
>> target-depend include file.
> That's what reg_class_t is for: avoiding enum reg_class in hook
> interfaces.
>
Ok.  Thanks for pointing this out.

Here is the modified patch.

2012-09-27  Vladimir Makarov  <vmakarov@redhat.com>

     * targhooks.h (default_lra_p): Declare.
     (default_register_bank): Ditto.
     (default_different_addr_displacement_p): Ditto.
     * targhooks.c (default_lra_p): New function.
     (default_register_bank): Ditto.
     (default_different_addr_displacement_p): Ditto.
     * target.def (lra_p): New hook.
     (register_bank): Ditto.
     (different_addr_displacement_p): Ditto.
     (spill_class, spill_class_mode): New hooks.
     * doc/tm.texi.in: Add TARGET_LRA_P, TARGET_REGISTER_BANK,
     TARGET_DIFFERENT_ADDR_DISPLACEMENT_P, TARGET_SPILL_CLASS, and
     TARGET_SPILL_CLASS_MODE.
     * doc/tm.texi: Update.

The change also requires some modification in the 9th patch. The 
ChangeLog for the patch should be the same as before.
Richard Sandiford - Oct. 1, 2012, 6:51 p.m.
Thanks a lot for doing this.  When you finally get to the stage of
"rm reload.c reload1.c", please do it in a screen session and save
the log for posterity.

Vladimir Makarov <vmakarov@redhat.com> writes:
> +/* Return register bank of given hard regno for the current target.  */
> +DEFHOOK
> +(register_bank,
> + "A target hook which returns the register bank number to which the\
> +  register @var{hard_regno} belongs to.  The smaller the number, the\
> +  more preferable the hard register usage (when all other conditions are\
> +  the same).  This hook can be used to prefer some hard register over\
> +  others in LRA.  For example, some x86-64 register usage needs\
> +  additional prefix which makes instructions longer.  The hook can\
> +  return bigger bank number for such registers make them less favorable\
> +  and as result making the generated code smaller.\
> +  \
> +  The default version of this target hook returns always zero.",
> + int, (int),
> + default_register_bank)

This is a horribly bikeshed-level comment, sorry, but I wonder if
something like "register_priority" would be better.  Register classes
are in some ways an extension of register banks, so it wasn't obvious
from the name why we needed both.

> +/* Return true if maximal address displacement can be different.  */
> +DEFHOOK
> +(different_addr_displacement_p,
> + "A target hook which returns true if an address with the same structure\
> +  can have different maximal legitimate displacement.  For example, the\
> +  displacement can depend on memory mode or on operand combinations in\
> +  the insn.\
> +  \
> +  The default version of this target hook returns always false.",
> + bool, (void),
> + default_different_addr_displacement_p)

If I read the patch correctly, this is only used in:

+	    if (lra_reg_spill_p || targetm.different_addr_displacement_p ())
+	      lra_set_used_insn_alternative (insn, -1);

and so we keep the current alternative when neither spill_class_mode
nor different_addr_displacement_p is defined.  How many targets on the
LRA branch are like that?  I would have expected most targets with limited
address displacements would have to return true for the above hook,
because multiword loads and stores typically have to be split into word
loads and stores.  Same goes for strict-alignment targets, where wider
modes often have slightly lower maximal displacements.

E.g. for MIPS, SImode loads and stores have a displacement range of
[-32768, 32764], but DImode loads and stores only accept [-32768, 32760].
So the maximal displacement depends on mode, even though the instruction set
is pretty regular.

Targets with full address-size displacements can use the default false return,
but it looks like the x86 port defines spill_class_mode instead, so AIUI
the value isn't really tested on Core i7.  What's the impact of that compared
to the other x86 targets that don't set X86_TUNE_GENERAL_REGS_SSE_SPILL?
Is LRA just quicker for them, or will it make different decisions
(compared to Core i7) even for non-SSE insns?

> +/* Determine class of registers which could be used for spilled
> +   pseudos instead of memory.  */
> +DEFHOOK
> +(spill_class,
> + "This hook defines a class of registers which could be used for spilled pseudos\
> +  of given class instead of memory",
> + reg_class_t, (reg_class_t),
> + NULL)

Should probably say that NO_REGS means "none".

> +/* Determine mode for spilling pseudos into registers instead of memory.  */
> +DEFHOOK
> +(spill_class_mode,
> + "This hook defines mode in which a pseudo of given mode and of the first\
> +  register class can be spilled into the second register class",
> + enum machine_mode, (reg_class_t, reg_class_t, enum machine_mode),
> + NULL)

It looks like the only use is in:

+	  || (targetm.spill_class_mode (rclass, spill_class,
+					PSEUDO_REGNO_MODE (regno))
+	      != PSEUDO_REGNO_MODE (regno))

So would it make sense to have a single hook like:

/* Determine mode for spilling pseudos into registers instead of memory.  */
DEFHOOK
(spill_class,
 "This hook defines a class of registers which could be used for spilling\
  pseudos of the given mode and class, or @code{NO_REGS} if only memory\
  should be used.  Not defining this hook is equivalent to returning\
  @code{NO_REGS} for all inputs."
 reg_class_t, (reg_class_t, enum machine_mode),
 NULL)

?  It means that setup_reg_spill_flag needs a class-x-mode walk, but
(bad excuse ahoy) we have plenty of those already.  If we really wanted
to avoid the extra loop, we could make VOIDmode mean "any mode",
although that does make the interface a bit more clunky.

Richard
Paul Koning - Oct. 1, 2012, 7 p.m.
On Oct 1, 2012, at 2:51 PM, Richard Sandiford wrote:

> ...
> E.g. for MIPS, SImode loads and stores have a displacement range of
> [-32768, 32764], but DImode loads and stores only accept [-32768, 32760].
> So the maximal displacement depends on mode, even though the instruction set
> is pretty regular.

It may be that the case doesn't arise in code GCC generates, but I don't think that's true.  The offset field is always a 2's complement 16 bit integer, hence in the range -32768..32767.  The alignment required in loading multibyte data with aligned load/store instructions applies to the final address, not the offset.  For example, if R1 contains 1, then LD r2,32767(r1) will work.

	paul
Jeff Law - Oct. 1, 2012, 8:39 p.m.
On 09/27/2012 07:44 PM, Vladimir Makarov wrote:
> On 09/27/2012 08:07 PM, Joseph S. Myers wrote:
>> On Thu, 27 Sep 2012, Vladimir Makarov wrote:
>>
>>> Hook spill_class returns a value of enum reg_class which is defined in
>>> target-depend include file.
>> That's what reg_class_t is for: avoiding enum reg_class in hook
>> interfaces.
>>
> Ok.  Thanks for pointing this out.
>
> Here is the modified patch.
>
> 2012-09-27  Vladimir Makarov  <vmakarov@redhat.com>
>
>      * targhooks.h (default_lra_p): Declare.
>      (default_register_bank): Ditto.
>      (default_different_addr_displacement_p): Ditto.
>      * targhooks.c (default_lra_p): New function.
>      (default_register_bank): Ditto.
>      (default_different_addr_displacement_p): Ditto.
>      * target.def (lra_p): New hook.
>      (register_bank): Ditto.
>      (different_addr_displacement_p): Ditto.
>      (spill_class, spill_class_mode): New hooks.
>      * doc/tm.texi.in: Add TARGET_LRA_P, TARGET_REGISTER_BANK,
>      TARGET_DIFFERENT_ADDR_DISPLACEMENT_P, TARGET_SPILL_CLASS, and
>      TARGET_SPILL_CLASS_MODE.
>      * doc/tm.texi: Update.
>
> The change also requires some modification in the 9th patch. The
> ChangeLog for the patch should be the same as before.
This looks fine to me.

jeff
Vladimir Makarov - Oct. 2, 2012, 3:52 p.m.
On 10/01/2012 02:51 PM, Richard Sandiford wrote:
> Vladimir Makarov <vmakarov@redhat.com> writes:
>> +/* Return register bank of given hard regno for the current target.  */
>> +DEFHOOK
>> +(register_bank,
>> + "A target hook which returns the register bank number to which the\
>> +  register @var{hard_regno} belongs to.  The smaller the number, the\
>> +  more preferable the hard register usage (when all other conditions are\
>> +  the same).  This hook can be used to prefer some hard register over\
>> +  others in LRA.  For example, some x86-64 register usage needs\
>> +  additional prefix which makes instructions longer.  The hook can\
>> +  return bigger bank number for such registers make them less favorable\
>> +  and as result making the generated code smaller.\
>> +  \
>> +  The default version of this target hook returns always zero.",
>> + int, (int),
>> + default_register_bank)
> This is a horribly bikeshed-level comment, sorry, but I wonder if
> something like "register_priority" would be better.  Register classes
> are in some ways an extension of register banks, so it wasn't obvious
> from the name why we needed both.
Ok.  I agree that is not a good term.  Register bank in hardware 
(especially in DSP) means a bit different thing.

Actually, on the Cauldron Ian asked me why it is different from register 
allocation order.  I should say that the order usually takes 
caller-saves info into account.  In x86-64, reg with REX flags can be 
caller-saved or not.
>> +/* Return true if maximal address displacement can be different.  */
>> +DEFHOOK
>> +(different_addr_displacement_p,
>> + "A target hook which returns true if an address with the same structure\
>> +  can have different maximal legitimate displacement.  For example, the\
>> +  displacement can depend on memory mode or on operand combinations in\
>> +  the insn.\
>> +  \
>> +  The default version of this target hook returns always false.",
>> + bool, (void),
>> + default_different_addr_displacement_p)
> If I read the patch correctly, this is only used in:
>
> +	    if (lra_reg_spill_p || targetm.different_addr_displacement_p ())
> +	      lra_set_used_insn_alternative (insn, -1);
>
> and so we keep the current alternative when neither spill_class_mode
> nor different_addr_displacement_p is defined.  How many targets on the
> LRA branch are like that?  I would have expected most targets with limited
> address displacements would have to return true for the above hook,
> because multiword loads and stores typically have to be split into word
> loads and stores.  Same goes for strict-alignment targets, where wider
> modes often have slightly lower maximal displacements.
>
> E.g. for MIPS, SImode loads and stores have a displacement range of
> [-32768, 32764], but DImode loads and stores only accept [-32768, 32760].
> So the maximal displacement depends on mode, even though the instruction set
> is pretty regular.
>
> Targets with full address-size displacements can use the default false return,
> but it looks like the x86 port defines spill_class_mode instead, so AIUI
> the value isn't really tested on Core i7.  What's the impact of that compared
> to the other x86 targets that don't set X86_TUNE_GENERAL_REGS_SSE_SPILL?
> Is LRA just quicker for them, or will it make different decisions
> (compared to Core i7) even for non-SSE insns?
It is mostly done for the LRA speed.  We could remove if-stmt and 
everything will be all right, only lra-constraints pass will go over all 
alternatives again.

Currently, there are two targets for which if-cond is true.  One is 
x86-64 (more exactly when corei7 tune is used) and another target PARISC 
for which different_add_displacement_p is true.

It might be more in the future.  Future ARMs and Powers might be 
profitable for spilling general registers into vector/floating point 
registers.  Might be new other targets will need 
different_addr_displacement_p.

We could rid off the hook but it is important for speeding LRA up. I 
felt that I should make LRA speed competitive with reload even it is 
hard because LRA works on RTL and not on internal structures as reload.  
I see now I was right worrying about the speed.  I guess if we could 
sacrifice 2% of compilation time, LRA code would be smaller and more 
clear.  If we could sacrifice 10% percent compilation time, the code 
would be even smaller and clear because we could use DF-infrastructure.
>> +/* Determine class of registers which could be used for spilled
>> +   pseudos instead of memory.  */
>> +DEFHOOK
>> +(spill_class,
>> + "This hook defines a class of registers which could be used for spilled pseudos\
>> +  of given class instead of memory",
>> + reg_class_t, (reg_class_t),
>> + NULL)
> Should probably say that NO_REGS means "none".
>> +/* Determine mode for spilling pseudos into registers instead of memory.  */
>> +DEFHOOK
>> +(spill_class_mode,
>> + "This hook defines mode in which a pseudo of given mode and of the first\
>> +  register class can be spilled into the second register class",
>> + enum machine_mode, (reg_class_t, reg_class_t, enum machine_mode),
>> + NULL)
> It looks like the only use is in:
>
> +	  || (targetm.spill_class_mode (rclass, spill_class,
> +					PSEUDO_REGNO_MODE (regno))
> +	      != PSEUDO_REGNO_MODE (regno))
>
> So would it make sense to have a single hook like:
>
> /* Determine mode for spilling pseudos into registers instead of memory.  */
> DEFHOOK
> (spill_class,
>   "This hook defines a class of registers which could be used for spilling\
>    pseudos of the given mode and class, or @code{NO_REGS} if only memory\
>    should be used.  Not defining this hook is equivalent to returning\
>    @code{NO_REGS} for all inputs."
>   reg_class_t, (reg_class_t, enum machine_mode),
>   NULL)
>
> ?  It means that setup_reg_spill_flag needs a class-x-mode walk, but
> (bad excuse ahoy) we have plenty of those already.  If we really wanted
> to avoid the extra loop, we could make VOIDmode mean "any mode",
> although that does make the interface a bit more clunky.
>
No, I think it has sense.  Too many hooks already.  Although after 
removing reloads, some hooks will be gone.

Thanks for the proposals.

Patch

Index: config/i386/i386.c
===================================================================
--- config/i386/i386.c	(revision 191771)
+++ config/i386/i386.c	(working copy)
@@ -2267,7 +2267,11 @@  static unsigned int initial_ix86_tune_fe
 
   /* X86_TUNE_REASSOC_FP_TO_PARALLEL: Try to produce parallel computations
      during reassociation of fp computation.  */
-  m_ATOM
+  m_ATOM,
+
+  /* X86_TUNE_GENERAL_REGS_SSE_SPILL: Try to spill general regs to SSE
+     regs instead of memory.  */
+  m_COREI7 | m_CORE2I7
 };
 
 /* Feature tests against the various architecture variations.  */
@@ -31694,6 +31698,38 @@  ix86_free_from_memory (enum machine_mode
     }
 }
 
+/* Return true if we use LRA instead of reload pass.  */
+static bool
+ix86_lra_p (void)
+{
+  return true;
+}
+
+/* Return a register bank number for hard reg REGNO.  */
+static int
+ix86_register_bank (int hard_regno)
+{
+  /* ebp and r13 as the base always wants a displacement, r12 as the
+     base always wants an index.  So discourage their usage in an
+     address.  */
+  if (hard_regno == R12_REG || hard_regno == R13_REG)
+    return 4;
+  if (hard_regno == BP_REG)
+    return 2;
+  /* New x86-64 int registers result in bigger code size.  Discourage
+     them.  */
+  if (FIRST_REX_INT_REG <= hard_regno && hard_regno <= LAST_REX_INT_REG)
+    return 3;
+  /* New x86-64 SSE registers result in bigger code size.  Discourage
+     them.  */
+  if (FIRST_REX_SSE_REG <= hard_regno && hard_regno <= LAST_REX_SSE_REG)
+    return 3;
+  /* Usage of AX register results in smaller code.  Prefer it.  */
+  if (hard_regno == 0)
+    return 0;
+  return 1;
+}
+
 /* Implement TARGET_PREFERRED_RELOAD_CLASS.
 
    Put float CONST_DOUBLE in the constant pool instead of fp regs.
@@ -31827,6 +31863,9 @@  ix86_secondary_reload (bool in_p, rtx x,
       && !in_p && mode == QImode
       && (rclass == GENERAL_REGS
 	  || rclass == LEGACY_REGS
+	  || rclass == NON_Q_REGS
+	  || rclass == SIREG
+	  || rclass == DIREG
 	  || rclass == INDEX_REGS))
     {
       int regno;
@@ -31936,7 +31975,7 @@  inline_secondary_memory_needed (enum reg
       || MAYBE_MMX_CLASS_P (class1) != MMX_CLASS_P (class1)
       || MAYBE_MMX_CLASS_P (class2) != MMX_CLASS_P (class2))
     {
-      gcc_assert (!strict);
+      gcc_assert (!strict || lra_in_progress);
       return true;
     }
 
@@ -40483,6 +40522,39 @@  ix86_autovectorize_vector_sizes (void)
   return (TARGET_AVX && !TARGET_PREFER_AVX128) ? 32 | 16 : 0;
 }
 
+
+
+/* Return class of registers which could be used for pseudo of class
+   RCLASS for spilling instead of memory.  Return NO_REGS if it is not
+   possible or non-profitable.  */
+static reg_class_t
+ix86_spill_class (reg_class_t rclass)
+{
+  if (TARGET_SSE && TARGET_GENERAL_REGS_SSE_SPILL
+      && hard_reg_set_subset_p (reg_class_contents[rclass],
+				reg_class_contents[GENERAL_REGS]))
+    return SSE_REGS;
+  return NO_REGS;
+}
+
+/* Return mode in which pseudo of MODE and RCLASS can be spilled into
+   a register of class SPILL_CLASS.  Return VOIDmode if it is not
+   possible.  */
+static enum machine_mode
+ix86_spill_class_mode (reg_class_t rclass, reg_class_t spill_class,
+		       enum machine_mode mode)
+{
+  if (! TARGET_SSE || ! TARGET_GENERAL_REGS_SSE_SPILL
+      || ! hard_reg_set_subset_p (reg_class_contents[rclass],
+				  reg_class_contents[GENERAL_REGS])
+      || spill_class != SSE_REGS)
+    return VOIDmode;
+  if (mode == SImode || (TARGET_64BIT && mode == DImode))
+    return mode;
+  return VOIDmode;
+}
+
+
 /* Implement targetm.vectorize.init_cost.  */
 
 static void *
@@ -40885,6 +40957,12 @@  ix86_memmodel_check (unsigned HOST_WIDE_
 #undef TARGET_LEGITIMATE_ADDRESS_P
 #define TARGET_LEGITIMATE_ADDRESS_P ix86_legitimate_address_p
 
+#undef TARGET_LRA_P
+#define TARGET_LRA_P ix86_lra_p
+
+#undef TARGET_REGISTER_BANK
+#define TARGET_REGISTER_BANK ix86_register_bank
+
 #undef TARGET_LEGITIMATE_CONSTANT_P
 #define TARGET_LEGITIMATE_CONSTANT_P ix86_legitimate_constant_p
 
@@ -40908,6 +40986,12 @@  ix86_memmodel_check (unsigned HOST_WIDE_
 #define TARGET_INIT_LIBFUNCS darwin_rename_builtins
 #endif
 
+#undef TARGET_SPILL_CLASS
+#define TARGET_SPILL_CLASS ix86_spill_class
+
+#undef TARGET_SPILL_CLASS_MODE
+#define TARGET_SPILL_CLASS_MODE ix86_spill_class_mode
+
 struct gcc_target targetm = TARGET_INITIALIZER;
 
 #include "gt-i386.h"
Index: config/i386/i386.h
===================================================================
--- config/i386/i386.h	(revision 191771)
+++ config/i386/i386.h	(working copy)
@@ -327,6 +327,7 @@  enum ix86_tune_indices {
   X86_TUNE_AVX128_OPTIMAL,
   X86_TUNE_REASSOC_INT_TO_PARALLEL,
   X86_TUNE_REASSOC_FP_TO_PARALLEL,
+  X86_TUNE_GENERAL_REGS_SSE_SPILL,
 
   X86_TUNE_LAST
 };
@@ -431,6 +432,8 @@  extern unsigned char ix86_tune_features[
 	ix86_tune_features[X86_TUNE_REASSOC_INT_TO_PARALLEL]
 #define TARGET_REASSOC_FP_TO_PARALLEL \
 	ix86_tune_features[X86_TUNE_REASSOC_FP_TO_PARALLEL]
+#define TARGET_GENERAL_REGS_SSE_SPILL \
+	ix86_tune_features[X86_TUNE_GENERAL_REGS_SSE_SPILL]
 
 /* Feature tests against the various architecture variations.  */
 enum ix86_arch_indices {