diff mbox

[RFC,MIPS] Patch to enable LRA for MIPS backend

Message ID B5E67142681B53468FAF6B7C313565623D3DD70C@KLMAIL01.kl.imgtec.org
State New
Headers show

Commit Message

Robert Suchanek March 29, 2014, 1:07 a.m. UTC
Hi All,

This patch enables LRA by default for MIPS. The classic reload is still
available and can be enabled via -mreload switch. 

All regression are fixed, with one exception described below.

There was a necessary change in the LRA core as I believe there was a genuine 
unhandled case in LRA when processing addresses. It is specific to MIPS16
as store/load[unsigned] halfword/byte instructions cannot access the stack pointer
directly. Potentially, it can affect other architectures if they have similar
limitation. One of the problems showed an RTL that contained $frame as the base register
(without any offset, simple move) but LRA temporarily eliminated it to $sp before
calling the target hook to validate the address.
The backend rejected it because of the mode and $sp. Then, LRA tried to emit base+disp 
but ICEd because there never was any displacement. Another testcase, revealed offset 
not being used and unnecessary 'add' instructions were inserted preventing the use of offsets. 
Marking an insn with STACK_POINTER_REGNUM as valid was not an option as LRA would 
generate an insn with $sp and fail during coherency check. The patch attempts to 
reload $sp into a register and re-validate the address with offset (if there is one). 
If this fails it sticks to the original plan inserting base+disp.

The generated code optimized for size is fairly acceptable. CSiBE shows a slight 
advantage of LRA over the reload for MIPS16 with some minor regression for mips32*, 
mips64*, on average less than 0.5%. The code size improvements are being investigated.

The patch has been tested on the following variations:
- cross-tested mips-mti-elf, mips-mti-linux-gnu (languages=c,c++):
  {-mips32,-mips32r2}{-EL,-EB}{-mhard-float,-msoft-float}{-mno-mips16,-mips16}
  -mips64r2 -mabi=n32 {-mhard-float,-msoft-float}
  -mips64r2 -mabi=64 {-mhard-float,-msoft-float}
- bootstrapped and regtested x86_84-unknown-linux-gnu (all languages)

There are two known DejaGNU failures on mips64 with mabi=64, namely, m{add,sub}-8 tests
because of the subtleties in LRA costing model but it's not a correctness issue.
The *mul_{add,sub}_si patterns are tuned explicitly for LRA and all failures 
have been resolved with m{add,sub}-* except the above. By saying failures I mean 
the differences between tests ran with/without -mreload switch. A number of failures 
already existed on ToT at the time of testing.

The patch is intended for Stage 1. As for the legal part, the company-wide 
copyright assignment is in process.

Regards,
Robert

testsuite/ChangeLog:

2014-03-26  Robert Suchanek  <Robert.Suchanek@imgtec.com>

	* lra-constraints.c (base_to_reg): New function.
	(process_address): Use new function.
	* rtlanal.c (get_base_term): Add CONSTANT_P (*inner).

	* config/mips/constraints.md ("d"): BASE_REG_CLASS
	replaced by ADDR_REG_CLASS.
	* config/mips/mips.c (mips_regno_mode_ok_for_base_p):
	Remove use !strict_p for MIPS16.
	(mips_register_priority): New function that implements
	the target hook TARGET_REGISTER_PRIORITY.
	(mips_spill_class): Likewise for TARGET_SPILL_CLASS
	(mips_lra_p): Likewise for TARGET_LRA_P.
	* config/mips/mips.h (reg_class): Add M16F_REGS and SPILL_REGS
	classes.
	(REG_CLASS_NAMES): Likewise.
	(REG_CLASS_CONTENTS): Likewise.
	(BASE_REG_CLASS): Use M16F_REGS.
	(ADDR_REG_CLASS): Define.
	(IRA_HARD_REGNO_ADD_COST_MULTIPLIER): Define.
	* config/mips/mips.md (*mul_acc_si, *mul_sub_si): Add alternative
	tuned for LRA. New set attribute to enable alternatives
	depending on the register allocator used.
	(*and<mode>3_mips16): Remove the load alternatives.
	(*lea64): Disable pattern for MIPS16.
	* config/mips/mips.opt
	(mreload): New option.
---
 gcc/config/mips/constraints.md |    2 +-
 gcc/config/mips/mips.c         |   51 +++++++++++++++++-
 gcc/config/mips/mips.h         |   17 +++++-
 gcc/config/mips/mips.md        |  112 +++++++++++++++++++++++-----------------
 gcc/config/mips/mips.opt       |    4 ++
 gcc/lra-constraints.c          |   44 +++++++++++++++-
 gcc/rtlanal.c                  |    3 +-
 7 files changed, 181 insertions(+), 52 deletions(-)

Comments

Richard Sandiford March 29, 2014, 10:52 a.m. UTC | #1
First of all, thanks a lot for doing this.

Robert Suchanek <Robert.Suchanek@imgtec.com> writes:
> diff --git gcc/config/mips/mips.c gcc/config/mips/mips.c
> index 143169b..f27a801 100644
> --- gcc/config/mips/mips.c
> +++ gcc/config/mips/mips.c
> @@ -2255,7 +2255,7 @@ mips_regno_mode_ok_for_base_p (int regno, enum machine_mode mode,
>       All in all, it seems more consistent to only enforce this restriction
>       during and after reload.  */
>    if (TARGET_MIPS16 && regno == STACK_POINTER_REGNUM)
> -    return !strict_p || GET_MODE_SIZE (mode) == 4 || GET_MODE_SIZE (mode) == 8;
> +    return GET_MODE_SIZE (mode) == 4 || GET_MODE_SIZE (mode) == 8;
>  
>    return TARGET_MIPS16 ? M16_REG_P (regno) : GP_REG_P (regno);
>  }

Not sure about this one.  We would need to update the comment that
explains why "!strict_p" is there, but AFAIK reason (1) would still apply.

Was this needed for correctness or because it gave better code?

> +/* Return a register priority for hard reg REGNO.  */
> +
> +static int
> +mips_register_priority (int hard_regno)
> +{
> +  if (TARGET_MIPS16)
> +   {
> +     /* Treat MIPS16 registers with higher priority than other regs.  */
> +     switch (hard_regno)
> +       {
> +       case 2:
> +       case 3:
> +       case 4:
> +       case 5:
> +       case 6:
> +       case 7:
> +       case 16:
> +       case 17:
> +         return 1;
> +       default:
> +         return 0;
> +       }
> +   }
> +  return 0;
> +}
> +

Easier as:

  if (TARGET_MIPS16
      && TEST_HARD_REG_BIT (reg_class_contents[M16_REGS], hard_regno))
    return 1;
  return 0;

> diff --git gcc/config/mips/mips.h gcc/config/mips/mips.h
> index a786d4c..712008f 100644
> --- gcc/config/mips/mips.h
> +++ gcc/config/mips/mips.h
> @@ -1871,10 +1871,12 @@ enum reg_class
>  {
>    NO_REGS,			/* no registers in set */
>    M16_REGS,			/* mips16 directly accessible registers */
> +  M16F_REGS,			/* mips16 + frame */

Constraints are supposed to be operating on real registers, after
elimination, so it seems odd to include a fake register.  What went
wrong with just M16_REGS?

> +  SPILL_REGS,			/* All but $sp and call preserved regs are in here */
...
> +  { 0x0003fffc, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000 },	/* SPILL_REGS */      	\

These two don't seem to match.  I think literally it would be 0x0300fffc,
but maybe you had to make SPILL_REGS a superset of M16_REGs?

> +/* Add costs to hard registers based on frequency. This helps to negate
> +   some of the reduced cost associated with argument registers which 
> +   unfairly promotes their use and increases register pressure */
> +#define IRA_HARD_REGNO_ADD_COST_MULTIPLIER(REGNO)                       \
> +  (TARGET_MIPS16 && optimize_size                                       \
> +   ? ((REGNO) >= 4 && (REGNO) <= 7 ? 2 : 0) : 0)

So we would be trying to use, say, $4 for the first incoming argument
even after it had been spilled?  Hmm.

Since this change is aimed specifically at one heuristic, I wonder
whether we should parameterise that heuristic somehow rather than
try to use a general hook to undo it.  But I don't think there's
anything particularly special about MIPS16 here, so maybe it's too
eager for all targets.

> diff --git gcc/config/mips/mips.md gcc/config/mips/mips.md
> index 1e3e9e6..ababd5e 100644
> --- gcc/config/mips/mips.md
> +++ gcc/config/mips/mips.md
> @@ -1622,40 +1622,66 @@
>  ;; copy instructions.  Reload therefore thinks that the second alternative
>  ;; is two reloads more costly than the first.  We add "*?*?" to the first
>  ;; alternative as a counterweight.
> +;;
> +;; LRA simulates reload but the cost of reloading scratches is lower
> +;; than of the classic reload. For the time being, removing the counterweight
> +;; for LRA is more profitable.
>  (define_insn "*mul_acc_si"
> -  [(set (match_operand:SI 0 "register_operand" "=l*?*?,d?")
> -	(plus:SI (mult:SI (match_operand:SI 1 "register_operand" "d,d")
> -			  (match_operand:SI 2 "register_operand" "d,d"))
> -		 (match_operand:SI 3 "register_operand" "0,d")))
> -   (clobber (match_scratch:SI 4 "=X,l"))
> -   (clobber (match_scratch:SI 5 "=X,&d"))]
> +  [(set (match_operand:SI 0 "register_operand" "=l*?*?,l,d?")
> +	(plus:SI (mult:SI (match_operand:SI 1 "register_operand" "d,d,d")
> +			  (match_operand:SI 2 "register_operand" "d,d,d"))
> +		 (match_operand:SI 3 "register_operand" "0,0,d")))
> +   (clobber (match_scratch:SI 4 "=X,X,l"))
> +   (clobber (match_scratch:SI 5 "=X,X,&d"))]
>    "GENERATE_MADD_MSUB && !TARGET_MIPS16"
>    "@
>      madd\t%1,%2
> +    madd\t%1,%2
>      #"
>    [(set_attr "type"	"imadd")
>     (set_attr "accum_in"	"3")
>     (set_attr "mode"	"SI")
> -   (set_attr "insn_count" "1,2")])
> +   (set_attr "insn_count" "1,1,2")
> +   (set (attr "enabled")
> +        (cond [(and (eq_attr "alternative" "0")
> +                    (match_test "TARGET_RELOAD"))
> +                  (const_string "yes")
> +               (and (eq_attr "alternative" "1")
> +                    (match_test "!TARGET_RELOAD"))
> +                  (const_string "yes")
> +               (eq_attr "alternative" "2")
> +                  (const_string "yes")]
> +              (const_string "no")))])

Looks good.  Glad to see the operand 0 hack going away. :-)

> @@ -2986,31 +3023,14 @@
>     (set_attr "mode" "<MODE>")])
>  
>  (define_insn "*and<mode>3_mips16"
> -  [(set (match_operand:GPR 0 "register_operand" "=d,d,d,d,d")
> -	(and:GPR (match_operand:GPR 1 "nonimmediate_operand" "%W,W,W,d,0")
> -		 (match_operand:GPR 2 "and_operand" "Yb,Yh,Yw,Yw,d")))]
> +  [(set (match_operand:GPR 0 "register_operand" "=d,d")
> +	(and:GPR (match_operand:GPR 1 "register_operand" "%d,0")
> +		 (match_operand:GPR 2 "and_operand" "Yw,d")))]
>    "TARGET_MIPS16 && and_operands_ok (<MODE>mode, operands[1], operands[2])"
> -{
> -  switch (which_alternative)
> -    {
> -    case 0:
> -      operands[1] = gen_lowpart (QImode, operands[1]);
> -      return "lbu\t%0,%1";
> -    case 1:
> -      operands[1] = gen_lowpart (HImode, operands[1]);
> -      return "lhu\t%0,%1";
> -    case 2:
> -      operands[1] = gen_lowpart (SImode, operands[1]);
> -      return "lwu\t%0,%1";
> -    case 3:
> -      return "#";
> -    case 4:
> -      return "and\t%0,%2";
> -    default:
> -      gcc_unreachable ();
> -    }
> -}
> -  [(set_attr "move_type" "load,load,load,shift_shift,logical")
> +  "@
> +   #
> +   and\t%0,%2"
> +  [(set_attr "move_type" "shift_shift,logical")
>     (set_attr "mode" "<MODE>")])

I think we want to keep the LWU case at the very least, since I assume
otherwise we'll load the full 64-bit spill slot and use a pair of shifts
to zero-extend it.  Reloading the stack address into a base register
should always be better than that.

I agree it's less clear for the byte and halfword cases.  All other
things -- including size -- being equal, reloading a stack address into
a base register and using an extending load should be better than
reloading the full spill slot and extending it, since the reloaded stack
address is more likely to be reused in other instructions.

The original MIPS16 didn't have ZEB and ZEH, so on the basis above,
I think using LBU and LHU there was probably a win.  I can imagine
it making sense to disable LBU and LHU for MIPS16e though.

It might be better to do any changes to this pattern as a follow-on
patch, since I think LRA should cope with it as-is.

> @@ -4139,7 +4159,7 @@
>    [(set (match_operand:DI 0 "register_operand" "=d")
>  	(match_operand:DI 1 "absolute_symbolic_operand" ""))
>     (clobber (match_scratch:DI 2 "=&d"))]
> -  "TARGET_EXPLICIT_RELOCS && ABI_HAS_64BIT_SYMBOLS && cse_not_expected"
> +  "!TARGET_MIPS16 && TARGET_EXPLICIT_RELOCS && ABI_HAS_64BIT_SYMBOLS && cse_not_expected"
>    "#"
>    "&& reload_completed"
>    [(set (match_dup 0) (high:DI (match_dup 3)))

Minor nit, but please keep within 80 chars:

  "!TARGET_MIPS16
   && TARGET_EXPLICIT_RELOCS
   && ABI_HAS_64BIT_SYMBOLS
   && cse_not_expected"

> diff --git gcc/lra-constraints.c gcc/lra-constraints.c
> index ba4d489..ab85495 100644
> --- gcc/lra-constraints.c
> +++ gcc/lra-constraints.c
> @@ -2615,6 +2615,39 @@ valid_address_p (struct address_info *ad)
>    return ok_p;
>  }
>  
> +/* Make reload base reg from address AD.  */
> +static rtx
> +base_to_reg (struct address_info *ad)
> +{
> +  enum reg_class cl;
> +  int code = -1;
> +  rtx new_inner = NULL_RTX;
> +  rtx new_reg = NULL_RTX;
> +  rtx insn;
> +  rtx last_insn = get_last_insn();
> +
> +  lra_assert (ad->base == ad->base_term && ad->disp == ad->disp_term);
> +  cl = base_reg_class (ad->mode, ad->as, ad->base_outer_code,
> +                       get_index_code (ad));
> +  new_reg = lra_create_new_reg (GET_MODE (*ad->base_term), NULL_RTX,
> +                                cl, "base");
> +  new_inner = simplify_gen_binary (PLUS, GET_MODE (new_reg), new_reg,
> +                                   ad->disp_term == NULL
> +                                   ? gen_int_mode (0, ad->mode) 
> +                                   : *ad->disp_term);
> +  if (!valid_address_p (ad->mode, new_inner, ad->as))
> +    return NULL_RTX;
> +  insn = emit_insn (gen_rtx_SET (ad->mode, new_reg, *ad->base_term));
> +  code = recog_memoized (insn);
> +  if (code < 0)
> +    {
> +      delete_insns_since (last_insn);
> +      return NULL_RTX;
> +    }
> +  
> +  return new_inner;
> +}
> +
>  /* Make reload base reg + disp from address AD.  Return the new pseudo.  */
>  static rtx
>  base_plus_disp_to_reg (struct address_info *ad)
> @@ -2818,6 +2851,8 @@ process_address (int nop, rtx *before, rtx *after)
>  
>       3) the address is a frame address with an invalid offset.
>  
> +     4) the address is a frame address with an invalid base.
> +
>       All these cases involve a non-autoinc address, so there is no
>       point revalidating other types.  */
>    if (ad.autoinc_p || valid_address_p (&ad))
> @@ -2899,14 +2934,19 @@ process_address (int nop, rtx *before, rtx *after)
>        int regno;
>        enum reg_class cl;
>        rtx set, insns, last_insn;
> +      /* Try to reload base into register only if the base is invalid 
> +         for the address but with valid offset, case (4) above.  */
> +      start_sequence ();
> +      new_reg = base_to_reg (&ad);
> +
>        /* base + disp => new base, cases (1) and (3) above.  */
>        /* Another option would be to reload the displacement into an
>  	 index register.  However, postreload has code to optimize
>  	 address reloads that have the same base and different
>  	 displacements, so reloading into an index register would
>  	 not necessarily be a win.  */
> -      start_sequence ();
> -      new_reg = base_plus_disp_to_reg (&ad);
> +      if (new_reg == NULL_RTX)
> +        new_reg = base_plus_disp_to_reg (&ad);
>        insns = get_insns ();
>        last_insn = get_last_insn ();
>        /* If we generated at least two insns, try last insn source as

Obviously this is Vlad's call, but the idea looks good to me.
Speculatively calling lra_create_new_reg for cases where (4) doesn't
end up applying is probably too wasteful though.

> diff --git gcc/rtlanal.c gcc/rtlanal.c
> index 7a9efec..f699d17 100644
> --- gcc/rtlanal.c
> +++ gcc/rtlanal.c
> @@ -5577,7 +5577,8 @@ get_base_term (rtx *inner)
>      inner = strip_address_mutations (&XEXP (*inner, 0));
>    if (REG_P (*inner)
>        || MEM_P (*inner)
> -      || GET_CODE (*inner) == SUBREG)
> +      || GET_CODE (*inner) == SUBREG
> +      || CONSTANT_P (*inner))
>      return inner;
>    return 0;
>  }

This doesn't look right; the general form is base+index+displacement+segment,
with the constant term being treated as the displacement.

Thanks,
Richard
Jakub Jelinek March 29, 2014, 12:21 p.m. UTC | #2
On Sat, Mar 29, 2014 at 01:07:40AM +0000, Robert Suchanek wrote:
> This patch enables LRA by default for MIPS. The classic reload is still
> available and can be enabled via -mreload switch. 

FYI, all other targets that have LRA optionally selectable or deselectable
use -mno-lra for this (even when -mlra is the default), it would be better
for consistency not to invent new switch names for that.

	Jakub
Robert Suchanek April 9, 2014, 9:59 a.m. UTC | #3
> FYI, all other targets that have LRA optionally selectable or deselectable
> use -mno-lra for this (even when -mlra is the default), it would be better
> for consistency not to invent new switch names for that.

Agreed.

>> -    return !strict_p || GET_MODE_SIZE (mode) == 4 || GET_MODE_SIZE (mode) == 8;
>> +    return GET_MODE_SIZE (mode) == 4 || GET_MODE_SIZE (mode) == 8;
>>  
>>    return TARGET_MIPS16 ? M16_REG_P (regno) : GP_REG_P (regno);
>>  }
> Not sure about this one.  We would need to update the comment that
> explains why "!strict_p" is there, but AFAIK reason (1) would still apply.
>
> Was this needed for correctness or because it gave better code?

"!strict_p" has been removed because of correctness issue. When LRA validates 
memory addresses pseudos are temporarily eliminated to hard registers (if possible)
but the target hook is always called as non-strict. This only affects MIPS16 instructions with
not directly accessible $sp. The strict variant, as I understand, was used in the reload
pass to indicate if a pseudo-register has been allocated a hard register. Unless LRA
should be setting the strict/non-strict depending on whether a temporal elimination
to hard reg was successful or there is something else that I missed? 

> Easier as:
>
>  if (TARGET_MIPS16
>      && TEST_HARD_REG_BIT (reg_class_contents[M16_REGS], hard_regno))
>    return 1;
>  return 0;

Indeed. 

>> +  M16F_REGS,			/* mips16 + frame */
>
> Constraints are supposed to be operating on real registers, after
> elimination, so it seems odd to include a fake register.  What went
> wrong with just M16_REGS?

Only the stack pointer has been added to M16_REGS. A number of patterns need to accept
it otherwise LRA inserts a lot of reloads and the code size goes up by about 10%. 
The change does have also a positive effect on reload but marginally.
"frame" meant to indicate inclusion of both the stack and hard frame pointers in the class
but perhaps I should name it differently to avoid confusion.

>> +  SPILL_REGS,			/* All but $sp and call preserved regs are in here */
>...
>> +  { 0x0003fffc, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000 },	/* SPILL_REGS */      	\
>
> These two don't seem to match.  I think literally it would be 0x0300fffc,
> but maybe you had to make SPILL_REGS a superset of M16_REGs?

I initially used 0x0300fffc but did some experiments and it turned out that 0x0003fffc (with $16, $17 regs)
gives slightly better code. I haven't updated the comment though. There is yet more to do 
and need to return to another thread with MIPS16 at some point as I found some limitations
of IRA/LRA to generate better code. $8-$15 are currently inaccessible as temporary storage
because these registers are marked as fixed (when optimizing for size) but leaving them
as fixed are better for the code size. I don't expect a big gain by using hard registers
for spilling but it more likely to improve the performance.

>> +/* Add costs to hard registers based on frequency. This helps to negate
>> +   some of the reduced cost associated with argument registers which 
>> +   unfairly promotes their use and increases register pressure */
>> +#define IRA_HARD_REGNO_ADD_COST_MULTIPLIER(REGNO)                       \
>> +  (TARGET_MIPS16 && optimize_size                                       \
>> +   ? ((REGNO) >= 4 && (REGNO) <= 7 ? 2 : 0) : 0)
>
> So we would be trying to use, say, $4 for the first incoming argument
> even after it had been spilled?  Hmm.
>
> Since this change is aimed specifically at one heuristic, I wonder
> whether we should parameterise that heuristic somehow rather than
> try to use a general hook to undo it.  But I don't think there's
> anything particularly special about MIPS16 here, so maybe it's too
> eager for all targets.

In a number of cases argument registers appeared to be unfairly promoted
increasing the register pressure and increasing the number of reloads.
Bumping up the cost of using those registers encourages IRA to spill
into memory but this appears to help LRA to do a better allocation. Of course,
not always it is a win but generally the gain outweighs the loss. 

I've seen an codesize improvements for other optimization levels
but I'm unsure whether we should make this change generic.
This part of the patch is not crucial though and can be send separately. 

>>  (define_insn "*and<mode>3_mips16"
>> ...
>
> I think we want to keep the LWU case at the very least, since I assume
> otherwise we'll load the full 64-bit spill slot and use a pair of shifts
> to zero-extend it.  Reloading the stack address into a base register
> should always be better than that.
>
> I agree it's less clear for the byte and halfword cases.  All other
> things -- including size -- being equal, reloading a stack address into
> a base register and using an extending load should be better than
> reloading the full spill slot and extending it, since the reloaded stack
> address is more likely to be reused in other instructions.
>
> The original MIPS16 didn't have ZEB and ZEH, so on the basis above,
> I think using LBU and LHU there was probably a win.  I can imagine
> it making sense to disable LBU and LHU for MIPS16e though.
>
> It might be better to do any changes to this pattern as a follow-on
> patch, since I think LRA should cope with it as-is.

Keeping LWU case should be fine. Alternatives were removed because
LRA was ICEing for the same reason of $sp not accessible for the byte 
and halfword cases. I tried to find a testcase to trigger LWU case 
but couldn't. I presume it must be a rare case? The problem with other
alternatives was that spilled pseudos were changed to memory without
checking if the use of $sp is valid. On the other hand, keeping LWU 
may only delay triggering the problem because the stack pointer 
should not be used. It could be a missing case in LRA too.

> Minor nit, but please keep within 80 chars:
>
>  "!TARGET_MIPS16
>   && TARGET_EXPLICIT_RELOCS
>   && ABI_HAS_64BIT_SYMBOLS
>   && cse_not_expected"

Overlooked it.

>> diff --git gcc/rtlanal.c gcc/rtlanal.c
>> index 7a9efec..f699d17 100644
>> --- gcc/rtlanal.c
>> +++ gcc/rtlanal.c
>> @@ -5577,7 +5577,8 @@ get_base_term (rtx *inner)
>>      inner = strip_address_mutations (&XEXP (*inner, 0));
>>    if (REG_P (*inner)
>>        || MEM_P (*inner)
>> -      || GET_CODE (*inner) == SUBREG)
>> +      || GET_CODE (*inner) == SUBREG
>> +      || CONSTANT_P (*inner))
>>      return inner;
>>    return 0;
>>  }
> 
> This doesn't look right; the general form is base+index+displacement+segment,
> with the constant term being treated as the displacement.

I'm not particularly happy with this either. This was an attempt to fix an ICE for 
the following RTL (gcc.dg/torture/asm-subreg-1.c compiled with -mips32r2 -mips16 -O1):

(insn 9 8 0 2 (asm_operands/v ("") ("") 0 [
            (mem/v/j/c:HI (lo_sum:SI (const:SI (unspec:SI [
                                (const_int 0 [0])
                            ] UNSPEC_GP))
                    (symbol_ref:SI ("_const_32") [flags 0x6]  <var_decl 0x7f50acd17558 _const_32>)) [0 _const_32+0 S2 A32])]
 [(asm_input:HI ("X") (null):0)]
 [] asm-subreg-1.c:13) asm-subreg-1.c:13 -1 (nil))

Any suggestions to handle this case?

Regards,
Robert
Richard Sandiford April 9, 2014, 9:24 p.m. UTC | #4
Robert Suchanek <Robert.Suchanek@imgtec.com> writes:
>> FYI, all other targets that have LRA optionally selectable or deselectable
>> use -mno-lra for this (even when -mlra is the default), it would be better
>> for consistency not to invent new switch names for that.
>
> Agreed.
>
>>> -    return !strict_p || GET_MODE_SIZE (mode) == 4 || GET_MODE_SIZE (mode) == 8;
>>> +    return GET_MODE_SIZE (mode) == 4 || GET_MODE_SIZE (mode) == 8;
>>>  
>>>    return TARGET_MIPS16 ? M16_REG_P (regno) : GP_REG_P (regno);
>>>  }
>> Not sure about this one.  We would need to update the comment that
>> explains why "!strict_p" is there, but AFAIK reason (1) would still apply.
>>
>> Was this needed for correctness or because it gave better code?
>
> "!strict_p" has been removed because of correctness issue. When LRA
> validates memory addresses pseudos are temporarily eliminated to hard
> registers (if possible) but the target hook is always called as
> non-strict. This only affects MIPS16 instructions with not directly
> accessible $sp. The strict variant, as I understand, was used in the
> reload pass to indicate if a pseudo-register has been allocated a hard
> register. Unless LRA should be setting the strict/non-strict depending
> on whether a temporal elimination to hard reg was successful or there
> is something else that I missed?

Hmm, OK, in that case I agree reason (2) doesn't apply.  That part was
always more of a consistency thing anyway, so I agree it's not worth
keeping around for reload.  I also had a look to see why
instantiate_virtual_regs_in_insn didn't complain about cases like:

  struct s { unsigned char c; };
  void foo (int, int, int, int, struct s);
  void bar (struct s *ptr) { foo (1, 2, 3, 4, *ptr); }

and I think it's because of the later:

2008-02-14  Michael Matz  <matz@suse.de>

	PR target/34930
	* function.c (instantiate_virtual_regs_in_insn): Reload address
	before falling back to reloading the whole operand.

which correctly reloads the address if necessary.

So yeah, I agree this is right after all, sorry.  Let's delete the
comment starting at "There are two problems here:" at the same time.

>>> +  M16F_REGS,			/* mips16 + frame */
>>
>> Constraints are supposed to be operating on real registers, after
>> elimination, so it seems odd to include a fake register.  What went
>> wrong with just M16_REGS?
>
> Only the stack pointer has been added to M16_REGS.

Sorry, I'd read "frame" as meaning "$frame", the soft frame pointer.
I agree M16_REGS + $sp is OK.

mips_regno_to_class should then map $sp to the new class, since it's now
the smallest containing class.  (We really should set that up automatically
one day...)

> A number of patterns need to accept it otherwise LRA inserts a lot of
> reloads and the code size goes up by about 10%.  The change does have
> also a positive effect on reload but marginally.  "frame" meant to
> indicate inclusion of both the stack and hard frame pointers in the
> class but perhaps I should name it differently to avoid confusion.

How about M16_SP_REGS, to match M16_T_REGS?

Also, the BASE_REG_CLASS/ADDR_REG_CLASS distinction isn't all that
obvious from the names.  ADDR_REG_CLASS is only needed for the "d"
constraint so maybe we could just use TARGET_MIPS16 ? M16_REGS : GR_REGS
directly for now.

>>> + SPILL_REGS, /* All but $sp and call preserved regs are in here */
>>...
>>> + { 0x0003fffc, 0x00000000, 0x00000000, 0x00000000, 0x00000000,
>>> 0x00000000 }, /* SPILL_REGS */ \
>>
>> These two don't seem to match.  I think literally it would be 0x0300fffc,
>> but maybe you had to make SPILL_REGS a superset of M16_REGs?
>
> I initially used 0x0300fffc but did some experiments and it turned out
> that 0x0003fffc (with $16, $17 regs) gives slightly better code. I
> haven't updated the comment though.

I can imagine including all M16_REGS makes sense, but it seems odd to
drop the 2 temporaries.  Does 0x0303fffc have the same problem?

> There is yet more to do and need to return to another thread with
> MIPS16 at some point as I found some limitations of IRA/LRA to
> generate better code. $8-$15 are currently inaccessible as temporary
> storage because these registers are marked as fixed (when optimizing
> for size) but leaving them as fixed are better for the code size. I
> don't expect a big gain by using hard registers for spilling but it
> more likely to improve the performance.

Hmm, marking them fixed was supposed to be a temporary reload-only thing,
until the move to LRA.  It should never be worse to spill to these GPRs
over spilling to the stack, if the value isn't live across a call.

But that certainly doesn't need to be part of the initial patch.

>>> +/* Add costs to hard registers based on frequency. This helps to negate
>>> +   some of the reduced cost associated with argument registers which 
>>> +   unfairly promotes their use and increases register pressure */
>>> +#define IRA_HARD_REGNO_ADD_COST_MULTIPLIER(REGNO)                       \
>>> +  (TARGET_MIPS16 && optimize_size                                       \
>>> +   ? ((REGNO) >= 4 && (REGNO) <= 7 ? 2 : 0) : 0)
>>
>> So we would be trying to use, say, $4 for the first incoming argument
>> even after it had been spilled?  Hmm.
>>
>> Since this change is aimed specifically at one heuristic, I wonder
>> whether we should parameterise that heuristic somehow rather than
>> try to use a general hook to undo it.  But I don't think there's
>> anything particularly special about MIPS16 here, so maybe it's too
>> eager for all targets.
>
> In a number of cases argument registers appeared to be unfairly promoted
> increasing the register pressure and increasing the number of reloads.
> Bumping up the cost of using those registers encourages IRA to spill
> into memory but this appears to help LRA to do a better allocation. Of course,
> not always it is a win but generally the gain outweighs the loss. 
>
> I've seen an codesize improvements for other optimization levels
> but I'm unsure whether we should make this change generic.
> This part of the patch is not crucial though and can be send separately. 

OK, thanks, doing it separately sounds better.

>>>  (define_insn "*and<mode>3_mips16"
>>> ...
>>
>> I think we want to keep the LWU case at the very least, since I assume
>> otherwise we'll load the full 64-bit spill slot and use a pair of shifts
>> to zero-extend it.  Reloading the stack address into a base register
>> should always be better than that.
>>
>> I agree it's less clear for the byte and halfword cases.  All other
>> things -- including size -- being equal, reloading a stack address into
>> a base register and using an extending load should be better than
>> reloading the full spill slot and extending it, since the reloaded stack
>> address is more likely to be reused in other instructions.
>>
>> The original MIPS16 didn't have ZEB and ZEH, so on the basis above,
>> I think using LBU and LHU there was probably a win.  I can imagine
>> it making sense to disable LBU and LHU for MIPS16e though.
>>
>> It might be better to do any changes to this pattern as a follow-on
>> patch, since I think LRA should cope with it as-is.
>
> Keeping LWU case should be fine. Alternatives were removed because
> LRA was ICEing for the same reason of $sp not accessible for the byte 
> and halfword cases. I tried to find a testcase to trigger LWU case 
> but couldn't. I presume it must be a rare case? The problem with other
> alternatives was that spilled pseudos were changed to memory without
> checking if the use of $sp is valid. On the other hand, keeping LWU 
> may only delay triggering the problem because the stack pointer 
> should not be used. It could be a missing case in LRA too.

Did you see the failures even after your mips_regno_mode_ok_for_base_p
change?  LRA should know how to reload a "W" address.

>>> diff --git gcc/rtlanal.c gcc/rtlanal.c
>>> index 7a9efec..f699d17 100644
>>> --- gcc/rtlanal.c
>>> +++ gcc/rtlanal.c
>>> @@ -5577,7 +5577,8 @@ get_base_term (rtx *inner)
>>>      inner = strip_address_mutations (&XEXP (*inner, 0));
>>>    if (REG_P (*inner)
>>>        || MEM_P (*inner)
>>> -      || GET_CODE (*inner) == SUBREG)
>>> +      || GET_CODE (*inner) == SUBREG
>>> +      || CONSTANT_P (*inner))
>>>      return inner;
>>>    return 0;
>>>  }
>> 
>> This doesn't look right; the general form is base+index+displacement+segment,
>> with the constant term being treated as the displacement.
>
> I'm not particularly happy with this either. This was an attempt to fix an ICE for 
> the following RTL (gcc.dg/torture/asm-subreg-1.c compiled with -mips32r2 -mips16 -O1):
>
> (insn 9 8 0 2 (asm_operands/v ("") ("") 0 [
>             (mem/v/j/c:HI (lo_sum:SI (const:SI (unspec:SI [
>                                 (const_int 0 [0])
>                             ] UNSPEC_GP))
>                     (symbol_ref:SI ("_const_32") [flags 0x6]  <var_decl 0x7f50acd17558 _const_32>)) [0 _const_32+0 S2 A32])]
>  [(asm_input:HI ("X") (null):0)]
>  [] asm-subreg-1.c:13) asm-subreg-1.c:13 -1 (nil))
>
> Any suggestions to handle this case?

Thanks for the pointer.  I think this shows a more fundamental problem
with the handling of "X" constraints.  With something like:

void
foo (int **x, int y, int z)
{
  int *ptr = *x + y * z / 11;
  __asm__ __volatile__ ("" : : "X" (*ptr));
}

the entire expression gets treated as a MEM address, which neither
reload nor LRA can handle.  And with something like that, it isn't
obvious what class all the registers in the address should have.
With a sufficiently-complicated expression you could run out of registers.

So perhaps we should limit the propagation to things that
decompose_mem_address can handle.

Thanks,
Richard
Robert Suchanek April 14, 2014, 11:12 a.m. UTC | #5
> So yeah, I agree this is right after all, sorry.  Let's delete the
> comment starting at "There are two problems here:" at the same time.

Ok.

> mips_regno_to_class should then map $sp to the new class, since it's now
> the smallest containing class.  (We really should set that up automatically
> one day...)

Indeed, it is easy to overlook it.

> How about M16_SP_REGS, to match M16_T_REGS?
> 
> Also, the BASE_REG_CLASS/ADDR_REG_CLASS distinction isn't all that
> obvious from the names.  ADDR_REG_CLASS is only needed for the "d"
> constraint so maybe we could just use TARGET_MIPS16 ? M16_REGS : GR_REGS
> directly for now.

Fair enough. I'll remove ADDR_REG_CLASS.

> I can imagine including all M16_REGS makes sense, but it seems odd to
> drop the 2 temporaries.  Does 0x0303fffc have the same problem?

It's a midway between 0x0003fffc and 0x0300fffc. I think 0x0303fffc will be
a good trade-off. The difference between 0x0303fffc and others is about 
~600 bytes in CSiBE which is less than 0.01% of code size change.

> Hmm, marking them fixed was supposed to be a temporary reload-only thing,
> until the move to LRA.  It should never be worse to spill to these GPRs
> over spilling to the stack, if the value isn't live across a call.

I would say this also affects IRA/LRA integration. I found that it is more
profitable to hide registers (MIPS16 only) in IRA to encourage spilling to
memory. Otherwise $8-$15 would be treated like any other registers and LRA
would inserts reloads to move in/out values of these registers. My assumption is
that if we could hide some of the registers in IRA but enable them in LRA
then all registers in SPILL_REGS would be available keeping reasonable code
size. Another way would be to increase the cost of moving values between 
M16_REGS and GR_REGS but it was already mentioned, and is true that there 
should be no difference of costs and it feels like a hack to make things work.

> Did you see the failures even after your mips_regno_mode_ok_for_base_p
> change?  LRA should know how to reload a "W" address.

Yes but I realize there is more. It fails because $sp is now included
in BASE_REG_CLASS and "W" is based on it. However, I suppose that 
it would be too eager to say it is wrong and likely there is something missing 
in LRA if we want to keep all alternatives. Currently there is no check
if a reloaded operand has a valid address, use of $sp in lbu/lhu cases.
Even if we added extra checks we are less likely to benefit as we need
to reload the base into register.

> Even that might be too loose, since invalid scales will need to be reloaded
> as a multiplication or shift, and there's no guarantee that the target
> can do that without clobbering the flags.  So maybe we should do something
> like the patch below.
>
> Alternatively we could stick to the decompose_mem_address-based check
> above and teach LRA to keep invalid addresses for 'X'.  The problem then
> is that we might ICE while printing the operand.

Tightening validity for 'X' appears to be reasonable. Will you commit this patch?

Regards,
Robert
Richard Sandiford April 15, 2014, 9:04 p.m. UTC | #6
Robert Suchanek <Robert.Suchanek@imgtec.com> writes:
>> Hmm, marking them fixed was supposed to be a temporary reload-only thing,
>> until the move to LRA.  It should never be worse to spill to these GPRs
>> over spilling to the stack, if the value isn't live across a call.
>
> I would say this also affects IRA/LRA integration. I found that it is more
> profitable to hide registers (MIPS16 only) in IRA to encourage spilling to
> memory. Otherwise $8-$15 would be treated like any other registers and LRA
> would inserts reloads to move in/out values of these registers. My assumption is
> that if we could hide some of the registers in IRA but enable them in LRA
> then all registers in SPILL_REGS would be available keeping reasonable code
> size. Another way would be to increase the cost of moving values between 
> M16_REGS and GR_REGS but it was already mentioned, and is true that there 
> should be no difference of costs and it feels like a hack to make things work.

OK.

This definitely sounds like it ought to be made to work, with some mixture
of target and generic changes.  But if it doesn't work out of the box
then let's leave that for future work.

>> Did you see the failures even after your mips_regno_mode_ok_for_base_p
>> change?  LRA should know how to reload a "W" address.
>
> Yes but I realize there is more. It fails because $sp is now included
> in BASE_REG_CLASS and "W" is based on it. However, I suppose that 
> it would be too eager to say it is wrong and likely there is something missing 
> in LRA if we want to keep all alternatives. Currently there is no check
> if a reloaded operand has a valid address, use of $sp in lbu/lhu cases.
> Even if we added extra checks we are less likely to benefit as we need
> to reload the base into register.

Not sure what you mean, sorry.  "W" exists specifically to exclude
$sp-based and $pc-based addresses.  LRA AFAIK should already be able
to reload addresses that are valid in the TARGET_LEGITIMATE_ADDRESS_P
sense but which do not match the constraints for a particular insn.

Can you remember one of the tests that fails?

>> Even that might be too loose, since invalid scales will need to be reloaded
>> as a multiplication or shift, and there's no guarantee that the target
>> can do that without clobbering the flags.  So maybe we should do something
>> like the patch below.
>>
>> Alternatively we could stick to the decompose_mem_address-based check
>> above and teach LRA to keep invalid addresses for 'X'.  The problem then
>> is that we might ICE while printing the operand.
>
> Tightening validity for 'X' appears to be reasonable. Will you commit
> this patch?

OK, just submitted separately.

Thanks,
Richard
Robert Suchanek April 16, 2014, 9:06 p.m. UTC | #7
> >> Did you see the failures even after your mips_regno_mode_ok_for_base_p
> >> change?  LRA should know how to reload a "W" address.
> >
> > Yes but I realize there is more. It fails because $sp is now included
> > in BASE_REG_CLASS and "W" is based on it. However, I suppose that 
> > it would be too eager to say it is wrong and likely there is something missing 
> > in LRA if we want to keep all alternatives. Currently there is no check
> > if a reloaded operand has a valid address, use of $sp in lbu/lhu cases.
> > Even if we added extra checks we are less likely to benefit as we need
> > to reload the base into register.
>
> Not sure what you mean, sorry.  "W" exists specifically to exclude
> $sp-based and $pc-based addresses.  LRA AFAIK should already be able
> to reload addresses that are valid in the TARGET_LEGITIMATE_ADDRESS_P
> sense but which do not match the constraints for a particular insn.
>
> Can you remember one of the tests that fails?

I couldn't trigger the problem with the original testcase but found another
one that reveals it. The following needs to compiled with -mips32r2 -mips16 -Os:

struct { int addr; } c;
struct command { int args[1]; };
unsigned short a;

fn1 (struct command *p1)
{
    unsigned short d;
    d = fn2 ();
    a = p1->args[0];
    fn3 (a);
    if (c.addr)
    {
        fn4 (p1->args[0]);
        return;
    }
    (&c)->addr = fn5 ();
    fn6 (d);
}

Not sure how the constraint would/should exclude $sp-based address in LRA.
In this particular case, a spilled pseudo is changed to memory giving the following RTL form:

(insn 30 29 31 4 (set (reg:SI 4 $4)
        (and:SI (mem/c:SI (plus:SI (reg/f:SI 78 $frame)
                    (const_int 16 [0x10])) [7 %sfp+16 S4 A32])
            (const_int 65535 [0xffff]))) shell.i:17 161 {*andsi3_mips16}
     (expr_list:REG_DEAD (reg:SI 194 [ D.1469 ])
        (nil)))

The operand 1 during alternative selection is not marked as a bad operand as it is a memory
operand. $frame appears to be fine as it could be eliminated later to hard register. No reloads
are inserted for the instructions concerned. Unless, $frame should be temporarily eliminated
and then a reload would be inserted?

Regards,
Robert
diff mbox

Patch

diff --git gcc/config/mips/constraints.md gcc/config/mips/constraints.md
index 49e4895..3810ac3 100644
--- gcc/config/mips/constraints.md
+++ gcc/config/mips/constraints.md
@@ -19,7 +19,7 @@ 
 
 ;; Register constraints
 
-(define_register_constraint "d" "BASE_REG_CLASS"
+(define_register_constraint "d" "ADDR_REG_CLASS"
   "An address register.  This is equivalent to @code{r} unless
    generating MIPS16 code.")
 
diff --git gcc/config/mips/mips.c gcc/config/mips/mips.c
index 143169b..f27a801 100644
--- gcc/config/mips/mips.c
+++ gcc/config/mips/mips.c
@@ -2255,7 +2255,7 @@  mips_regno_mode_ok_for_base_p (int regno, enum machine_mode mode,
      All in all, it seems more consistent to only enforce this restriction
      during and after reload.  */
   if (TARGET_MIPS16 && regno == STACK_POINTER_REGNUM)
-    return !strict_p || GET_MODE_SIZE (mode) == 4 || GET_MODE_SIZE (mode) == 8;
+    return GET_MODE_SIZE (mode) == 4 || GET_MODE_SIZE (mode) == 8;
 
   return TARGET_MIPS16 ? M16_REG_P (regno) : GP_REG_P (regno);
 }
@@ -12114,6 +12114,32 @@  mips_register_move_cost (enum machine_mode mode,
   return 0;
 }
 
+/* Return a register priority for hard reg REGNO.  */
+
+static int
+mips_register_priority (int hard_regno)
+{
+  if (TARGET_MIPS16)
+   {
+     /* Treat MIPS16 registers with higher priority than other regs.  */
+     switch (hard_regno)
+       {
+       case 2:
+       case 3:
+       case 4:
+       case 5:
+       case 6:
+       case 7:
+       case 16:
+       case 17:
+         return 1;
+       default:
+         return 0;
+       }
+   }
+  return 0;
+}
+
 /* Implement TARGET_MEMORY_MOVE_COST.  */
 
 static int
@@ -18897,6 +18923,22 @@  mips_atomic_assign_expand_fenv (tree *hold, tree *clear, tree *update)
   *update = build2 (COMPOUND_EXPR, void_type_node, *update,
 		    atomic_feraiseexcept_call);
 }
+
+static reg_class_t
+mips_spill_class (reg_class_t rclass ATTRIBUTE_UNUSED, 
+		  enum machine_mode mode ATTRIBUTE_UNUSED)
+{
+  if (TARGET_MIPS16)
+    return SPILL_REGS;
+  return NO_REGS;
+}
+
+static bool
+mips_lra_p (void)
+{
+  return !TARGET_RELOAD;
+}
+
 

 /* Initialize the GCC target structure.  */
 #undef TARGET_ASM_ALIGNED_HI_OP
@@ -18960,6 +19002,8 @@  mips_atomic_assign_expand_fenv (tree *hold, tree *clear, tree *update)
 #define TARGET_VALID_POINTER_MODE mips_valid_pointer_mode
 #undef TARGET_REGISTER_MOVE_COST
 #define TARGET_REGISTER_MOVE_COST mips_register_move_cost
+#undef TARGET_REGISTER_PRIORITY
+#define TARGET_REGISTER_PRIORITY mips_register_priority
 #undef TARGET_MEMORY_MOVE_COST
 #define TARGET_MEMORY_MOVE_COST mips_memory_move_cost
 #undef TARGET_RTX_COSTS
@@ -19134,6 +19178,11 @@  mips_atomic_assign_expand_fenv (tree *hold, tree *clear, tree *update)
 #undef TARGET_ATOMIC_ASSIGN_EXPAND_FENV
 #define TARGET_ATOMIC_ASSIGN_EXPAND_FENV mips_atomic_assign_expand_fenv
 
+#undef TARGET_SPILL_CLASS
+#define TARGET_SPILL_CLASS mips_spill_class
+#undef TARGET_LRA_P
+#define TARGET_LRA_P mips_lra_p
+
 struct gcc_target targetm = TARGET_INITIALIZER;
 

 #include "gt-mips.h"
diff --git gcc/config/mips/mips.h gcc/config/mips/mips.h
index a786d4c..712008f 100644
--- gcc/config/mips/mips.h
+++ gcc/config/mips/mips.h
@@ -1871,10 +1871,12 @@  enum reg_class
 {
   NO_REGS,			/* no registers in set */
   M16_REGS,			/* mips16 directly accessible registers */
+  M16F_REGS,			/* mips16 + frame */
   T_REG,			/* mips16 T register ($24) */
   M16_T_REGS,			/* mips16 registers plus T register */
   PIC_FN_ADDR_REG,		/* SVR4 PIC function address register */
   V1_REG,			/* Register $v1 ($3) used for TLS access.  */
+  SPILL_REGS,			/* All but $sp and call preserved regs are in here */
   LEA_REGS,			/* Every GPR except $25 */
   GR_REGS,			/* integer registers */
   FP_REGS,			/* floating point registers */
@@ -1908,10 +1910,12 @@  enum reg_class
 {									\
   "NO_REGS",								\
   "M16_REGS",								\
+  "M16F_REGS",								\
   "T_REG",								\
   "M16_T_REGS",								\
   "PIC_FN_ADDR_REG",							\
   "V1_REG",								\
+  "SPILL_REGS",								\
   "LEA_REGS",								\
   "GR_REGS",								\
   "FP_REGS",								\
@@ -1948,10 +1952,12 @@  enum reg_class
 {									                                \
   { 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000 },	/* NO_REGS */		\
   { 0x000300fc, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000 },	/* M16_REGS */		\
+  { 0x200300fc, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000 },	/* M16F_REGS */		\
   { 0x01000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000 },	/* T_REG */		\
   { 0x010300fc, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000 },	/* M16_T_REGS */	\
   { 0x02000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000 },	/* PIC_FN_ADDR_REG */	\
   { 0x00000008, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000 },	/* V1_REG */		\
+  { 0x0003fffc, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000 },	/* SPILL_REGS */      	\
   { 0xfdffffff, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000 },	/* LEA_REGS */		\
   { 0xffffffff, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000 },	/* GR_REGS */		\
   { 0x00000000, 0xffffffff, 0x00000000, 0x00000000, 0x00000000, 0x00000000 },	/* FP_REGS */		\
@@ -1984,7 +1990,9 @@  enum reg_class
    valid base register must belong.  A base register is one used in
    an address which is the register value plus a displacement.  */
 
-#define BASE_REG_CLASS  (TARGET_MIPS16 ? M16_REGS : GR_REGS)
+#define BASE_REG_CLASS  (TARGET_MIPS16 ? M16F_REGS : GR_REGS)
+
+#define ADDR_REG_CLASS  (TARGET_MIPS16 ? M16_REGS : GR_REGS)
 
 /* A macro whose definition is the name of the class to which a
    valid index register must belong.  An index register is one used
@@ -1994,6 +2002,13 @@  enum reg_class
 
 #define INDEX_REG_CLASS NO_REGS
 
+/* Add costs to hard registers based on frequency. This helps to negate
+   some of the reduced cost associated with argument registers which 
+   unfairly promotes their use and increases register pressure */
+#define IRA_HARD_REGNO_ADD_COST_MULTIPLIER(REGNO)                       \
+  (TARGET_MIPS16 && optimize_size                                       \
+   ? ((REGNO) >= 4 && (REGNO) <= 7 ? 2 : 0) : 0)
+
 /* We generally want to put call-clobbered registers ahead of
    call-saved ones.  (IRA expects this.)  */
 
diff --git gcc/config/mips/mips.md gcc/config/mips/mips.md
index 1e3e9e6..ababd5e 100644
--- gcc/config/mips/mips.md
+++ gcc/config/mips/mips.md
@@ -1622,40 +1622,66 @@ 
 ;; copy instructions.  Reload therefore thinks that the second alternative
 ;; is two reloads more costly than the first.  We add "*?*?" to the first
 ;; alternative as a counterweight.
+;;
+;; LRA simulates reload but the cost of reloading scratches is lower
+;; than of the classic reload. For the time being, removing the counterweight
+;; for LRA is more profitable.
 (define_insn "*mul_acc_si"
-  [(set (match_operand:SI 0 "register_operand" "=l*?*?,d?")
-	(plus:SI (mult:SI (match_operand:SI 1 "register_operand" "d,d")
-			  (match_operand:SI 2 "register_operand" "d,d"))
-		 (match_operand:SI 3 "register_operand" "0,d")))
-   (clobber (match_scratch:SI 4 "=X,l"))
-   (clobber (match_scratch:SI 5 "=X,&d"))]
+  [(set (match_operand:SI 0 "register_operand" "=l*?*?,l,d?")
+	(plus:SI (mult:SI (match_operand:SI 1 "register_operand" "d,d,d")
+			  (match_operand:SI 2 "register_operand" "d,d,d"))
+		 (match_operand:SI 3 "register_operand" "0,0,d")))
+   (clobber (match_scratch:SI 4 "=X,X,l"))
+   (clobber (match_scratch:SI 5 "=X,X,&d"))]
   "GENERATE_MADD_MSUB && !TARGET_MIPS16"
   "@
     madd\t%1,%2
+    madd\t%1,%2
     #"
   [(set_attr "type"	"imadd")
    (set_attr "accum_in"	"3")
    (set_attr "mode"	"SI")
-   (set_attr "insn_count" "1,2")])
+   (set_attr "insn_count" "1,1,2")
+   (set (attr "enabled")
+        (cond [(and (eq_attr "alternative" "0")
+                    (match_test "TARGET_RELOAD"))
+                  (const_string "yes")
+               (and (eq_attr "alternative" "1")
+                    (match_test "!TARGET_RELOAD"))
+                  (const_string "yes")
+               (eq_attr "alternative" "2")
+                  (const_string "yes")]
+              (const_string "no")))])
 
 ;; The same idea applies here.  The middle alternative needs one less
 ;; clobber than the final alternative, so we add "*?" as a counterweight.
 (define_insn "*mul_acc_si_r3900"
-  [(set (match_operand:SI 0 "register_operand" "=l*?*?,d*?,d?")
-	(plus:SI (mult:SI (match_operand:SI 1 "register_operand" "d,d,d")
-			  (match_operand:SI 2 "register_operand" "d,d,d"))
-		 (match_operand:SI 3 "register_operand" "0,l,d")))
-   (clobber (match_scratch:SI 4 "=X,3,l"))
-   (clobber (match_scratch:SI 5 "=X,X,&d"))]
+  [(set (match_operand:SI 0 "register_operand" "=l*?*?,l,d*?,d?")
+	(plus:SI (mult:SI (match_operand:SI 1 "register_operand" "d,d,d,d")
+			  (match_operand:SI 2 "register_operand" "d,d,d,d"))
+		 (match_operand:SI 3 "register_operand" "0,0,l,d")))
+   (clobber (match_scratch:SI 4 "=X,X,3,l"))
+   (clobber (match_scratch:SI 5 "=X,X,X,&d"))]
   "TARGET_MIPS3900 && !TARGET_MIPS16"
   "@
     madd\t%1,%2
+    madd\t%1,%2
     madd\t%0,%1,%2
     #"
   [(set_attr "type"	"imadd")
    (set_attr "accum_in"	"3")
    (set_attr "mode"	"SI")
-   (set_attr "insn_count" "1,1,2")])
+   (set_attr "insn_count" "1,1,1,2")
+   (set (attr "enabled")
+        (cond [(and (eq_attr "alternative" "0")
+                    (match_test "TARGET_RELOAD"))
+                  (const_string "yes")
+               (and (eq_attr "alternative" "1")
+                    (match_test "!TARGET_RELOAD"))
+                  (const_string "yes")
+               (eq_attr "alternative" "2,3")
+                  (const_string "yes")]
+              (const_string "no")))])
 
 ;; Split *mul_acc_si if both the source and destination accumulator
 ;; values are GPRs.
@@ -1859,20 +1885,31 @@ 
 
 ;; See the comment above *mul_add_si for details.
 (define_insn "*mul_sub_si"
-  [(set (match_operand:SI 0 "register_operand" "=l*?*?,d?")
-        (minus:SI (match_operand:SI 1 "register_operand" "0,d")
-                  (mult:SI (match_operand:SI 2 "register_operand" "d,d")
-                           (match_operand:SI 3 "register_operand" "d,d"))))
-   (clobber (match_scratch:SI 4 "=X,l"))
-   (clobber (match_scratch:SI 5 "=X,&d"))]
+  [(set (match_operand:SI 0 "register_operand" "=l*?*?,l,d?")
+        (minus:SI (match_operand:SI 1 "register_operand" "0,0,d")
+                  (mult:SI (match_operand:SI 2 "register_operand" "d,d,d")
+                           (match_operand:SI 3 "register_operand" "d,d,d"))))
+   (clobber (match_scratch:SI 4 "=X,X,l"))
+   (clobber (match_scratch:SI 5 "=X,X,&d"))]
   "GENERATE_MADD_MSUB"
   "@
    msub\t%2,%3
+   msub\t%2,%3
    #"
   [(set_attr "type"     "imadd")
    (set_attr "accum_in"	"1")
    (set_attr "mode"     "SI")
-   (set_attr "insn_count" "1,2")])
+   (set_attr "insn_count" "1,1,2")
+   (set (attr "enabled")
+        (cond [(and (eq_attr "alternative" "0")
+                    (match_test "TARGET_RELOAD"))
+                  (const_string "yes")
+               (and (eq_attr "alternative" "1")
+                    (match_test "!TARGET_RELOAD"))
+                  (const_string "yes")
+               (eq_attr "alternative" "2")
+                  (const_string "yes")]
+              (const_string "no")))])
 
 ;; Split *mul_sub_si if both the source and destination accumulator
 ;; values are GPRs.
@@ -2986,31 +3023,14 @@ 
    (set_attr "mode" "<MODE>")])
 
 (define_insn "*and<mode>3_mips16"
-  [(set (match_operand:GPR 0 "register_operand" "=d,d,d,d,d")
-	(and:GPR (match_operand:GPR 1 "nonimmediate_operand" "%W,W,W,d,0")
-		 (match_operand:GPR 2 "and_operand" "Yb,Yh,Yw,Yw,d")))]
+  [(set (match_operand:GPR 0 "register_operand" "=d,d")
+	(and:GPR (match_operand:GPR 1 "register_operand" "%d,0")
+		 (match_operand:GPR 2 "and_operand" "Yw,d")))]
   "TARGET_MIPS16 && and_operands_ok (<MODE>mode, operands[1], operands[2])"
-{
-  switch (which_alternative)
-    {
-    case 0:
-      operands[1] = gen_lowpart (QImode, operands[1]);
-      return "lbu\t%0,%1";
-    case 1:
-      operands[1] = gen_lowpart (HImode, operands[1]);
-      return "lhu\t%0,%1";
-    case 2:
-      operands[1] = gen_lowpart (SImode, operands[1]);
-      return "lwu\t%0,%1";
-    case 3:
-      return "#";
-    case 4:
-      return "and\t%0,%2";
-    default:
-      gcc_unreachable ();
-    }
-}
-  [(set_attr "move_type" "load,load,load,shift_shift,logical")
+  "@
+   #
+   and\t%0,%2"
+  [(set_attr "move_type" "shift_shift,logical")
    (set_attr "mode" "<MODE>")])
 
 (define_expand "ior<mode>3"
@@ -4139,7 +4159,7 @@ 
   [(set (match_operand:DI 0 "register_operand" "=d")
 	(match_operand:DI 1 "absolute_symbolic_operand" ""))
    (clobber (match_scratch:DI 2 "=&d"))]
-  "TARGET_EXPLICIT_RELOCS && ABI_HAS_64BIT_SYMBOLS && cse_not_expected"
+  "!TARGET_MIPS16 && TARGET_EXPLICIT_RELOCS && ABI_HAS_64BIT_SYMBOLS && cse_not_expected"
   "#"
   "&& reload_completed"
   [(set (match_dup 0) (high:DI (match_dup 3)))
diff --git gcc/config/mips/mips.opt gcc/config/mips/mips.opt
index 6ee5398..a044031 100644
--- gcc/config/mips/mips.opt
+++ gcc/config/mips/mips.opt
@@ -380,6 +380,10 @@  msynci
 Target Report Mask(SYNCI)
 Use synci instruction to invalidate i-cache
 
+mreload
+Target Report Var(TARGET_RELOAD)
+Use reload instead of lra
+
 mtune=
 Target RejectNegative Joined Var(mips_tune_option) ToLower Enum(mips_arch_opt_value)
 -mtune=PROCESSOR	Optimize the output for PROCESSOR
diff --git gcc/lra-constraints.c gcc/lra-constraints.c
index ba4d489..ab85495 100644
--- gcc/lra-constraints.c
+++ gcc/lra-constraints.c
@@ -2615,6 +2615,39 @@  valid_address_p (struct address_info *ad)
   return ok_p;
 }
 
+/* Make reload base reg from address AD.  */
+static rtx
+base_to_reg (struct address_info *ad)
+{
+  enum reg_class cl;
+  int code = -1;
+  rtx new_inner = NULL_RTX;
+  rtx new_reg = NULL_RTX;
+  rtx insn;
+  rtx last_insn = get_last_insn();
+
+  lra_assert (ad->base == ad->base_term && ad->disp == ad->disp_term);
+  cl = base_reg_class (ad->mode, ad->as, ad->base_outer_code,
+                       get_index_code (ad));
+  new_reg = lra_create_new_reg (GET_MODE (*ad->base_term), NULL_RTX,
+                                cl, "base");
+  new_inner = simplify_gen_binary (PLUS, GET_MODE (new_reg), new_reg,
+                                   ad->disp_term == NULL
+                                   ? gen_int_mode (0, ad->mode) 
+                                   : *ad->disp_term);
+  if (!valid_address_p (ad->mode, new_inner, ad->as))
+    return NULL_RTX;
+  insn = emit_insn (gen_rtx_SET (ad->mode, new_reg, *ad->base_term));
+  code = recog_memoized (insn);
+  if (code < 0)
+    {
+      delete_insns_since (last_insn);
+      return NULL_RTX;
+    }
+  
+  return new_inner;
+}
+
 /* Make reload base reg + disp from address AD.  Return the new pseudo.  */
 static rtx
 base_plus_disp_to_reg (struct address_info *ad)
@@ -2818,6 +2851,8 @@  process_address (int nop, rtx *before, rtx *after)
 
      3) the address is a frame address with an invalid offset.
 
+     4) the address is a frame address with an invalid base.
+
      All these cases involve a non-autoinc address, so there is no
      point revalidating other types.  */
   if (ad.autoinc_p || valid_address_p (&ad))
@@ -2899,14 +2934,19 @@  process_address (int nop, rtx *before, rtx *after)
       int regno;
       enum reg_class cl;
       rtx set, insns, last_insn;
+      /* Try to reload base into register only if the base is invalid 
+         for the address but with valid offset, case (4) above.  */
+      start_sequence ();
+      new_reg = base_to_reg (&ad);
+
       /* base + disp => new base, cases (1) and (3) above.  */
       /* Another option would be to reload the displacement into an
 	 index register.  However, postreload has code to optimize
 	 address reloads that have the same base and different
 	 displacements, so reloading into an index register would
 	 not necessarily be a win.  */
-      start_sequence ();
-      new_reg = base_plus_disp_to_reg (&ad);
+      if (new_reg == NULL_RTX)
+        new_reg = base_plus_disp_to_reg (&ad);
       insns = get_insns ();
       last_insn = get_last_insn ();
       /* If we generated at least two insns, try last insn source as
diff --git gcc/rtlanal.c gcc/rtlanal.c
index 7a9efec..f699d17 100644
--- gcc/rtlanal.c
+++ gcc/rtlanal.c
@@ -5577,7 +5577,8 @@  get_base_term (rtx *inner)
     inner = strip_address_mutations (&XEXP (*inner, 0));
   if (REG_P (*inner)
       || MEM_P (*inner)
-      || GET_CODE (*inner) == SUBREG)
+      || GET_CODE (*inner) == SUBREG
+      || CONSTANT_P (*inner))
     return inner;
   return 0;
 }