Patchwork [RFC,4.9] Reload alternative weighting vs. wrong class regs

login
register
mail settings
Submitter Alan Modra
Date Feb. 7, 2013, 3:09 a.m.
Message ID <20130207030946.GV5023@bubble.grove.modra.org>
Download mbox | patch
Permalink /patch/218813/
State New
Headers show

Comments

Alan Modra - Feb. 7, 2013, 3:09 a.m.
After fixing PR54009 (again), I thought I'd take a look at why reload
is generating the following correct but poor code

 stw 10,8(1)
 stw 11 12(1)
 ...
 lfd 0,8(1)
 stfd 0,x+32764@l(9)

rather than

 addi 9,x+32764@l(9)
 ...
 stw 10,0(9)
 stw 11 4(9)

This code sequence is from

(set (mem/c:DF (lo_sum:SI (reg/f:SI 9)
                          (const:SI (plus:SI (symbol_ref:SI ("x"))
                                             (const_int 32764)))))
     (reg:DF 10) ...gcc.target/powerpc/pr54009.c:42 363 {*movdf_hardfloat32})

In tracing through reload, I see a score of 8 for the m<-d
alternative, and 9 for Y<-r.  In both cases we have one "loser"
operand for a score of 6 ("d" in the first case, "Y" in the second),
plus a score of 2 from

		  /* We prefer to reload pseudos over reloading other things,
		     since such reloads may be able to be eliminated later.
		     If we are reloading a SCRATCH, we won't be generating any
		     insns, just using a register, so it is also preferred.
		     So bump REJECT in other cases.  Don't do this in the
		     case where we are forcing a constant into memory and
		     it will then win since we don't want to have a different
		     alternative match then.  */
		  if (! (REG_P (operand)
			 && REGNO (operand) >= FIRST_PSEUDO_REGISTER)
		      && GET_CODE (operand) != SCRATCH
		      && ! (const_to_mem && constmemok))
		    reject += 2;

The Y<-r alternative gets one extra from

		  /* Input reloads can be inherited more often than output
		     reloads can be removed, so penalize output reloads.  */
		  if (operand_type[i] != RELOAD_FOR_INPUT
		      && GET_CODE (operand) != SCRATCH)
		    reject++;

The problem of course is that the input reload is quite expensive,
involving a copy to memory.  So, how about teaching reload about this
as follows?  I picked 6 for the reject value to make it equivalent to
a '?' in the constraint, but that may be too large.  Any value of 2 or
greater works for the testcase.

Bootstrapped and regression tested powerpc64-linux, but not yet spec
tested.

2013-02-07  Alan Modra  <amodra@gmail.com>

	* reload.c (find_reloads): Disparage reg alternatives needing
	secondary memory to reload.
Steven Bosscher - Feb. 7, 2013, 10:01 p.m.
On Thu, Feb 7, 2013 at 4:09 AM, Alan Modra wrote:
> After fixing PR54009 (again), I thought I'd take a look at why reload
> is generating the following correct but poor code
>
>  stw 10,8(1)
>  stw 11 12(1)
>  ...
>  lfd 0,8(1)
>  stfd 0,x+32764@l(9)
>
> rather than
>
>  addi 9,x+32764@l(9)
>  ...
>  stw 10,0(9)
>  stw 11 4(9)

FWIW, left trunk vs. LRA right (with PR54009 patch on rs6000.c):

r:                              r:
        stwu 1,-160(1)                  stwu 1,-160(1)
        lis 9,x+32764@ha                lis 9,x+32764@ha
        la 9,x+32764@l(9)     |         la 8,x+32764@l(9)
        lwz 10,0(9)           <
        lwz 11,4(9)           <
        lis 9,y@ha                      lis 9,y@ha
                              >         lwz 10,0(8)
                              >         lwz 11,4(8)
        stfd ...
        ...
        lfd ...
        ...
        stw 10,y@l(9)                   stw 10,y@l(9)
        stw 11,y+4@l(9)                 stw 11,y+4@l(9)
        addi 1,1,160                    addi 1,1,160
        blr                             blr

w:                              w:
        stwu 1,-160(1)                  stwu 1,-160(1)
        lis 9,y@ha                      lis 9,y@ha
        la 10,y@l(9)                    la 10,y@l(9)
                              >         lis 9,x+32764@ha
        lwz 11,4(10)                    lwz 11,4(10)
        lwz 10,0(10)                    lwz 10,0(10)
        lis 9,x+32764@ha      |         la 8,x+32764@l(9)
        stfd ...
        ...
        lfd ...
        ...
        lfd 31,152(1)                   lfd 31,152(1)
        lfd 0,8(1)            |         stw 10,0(8)
        stfd 0,x+32764@l(9)   |         stw 11,4(8)
        addi 1,1,160                    addi 1,1,160
        blr                             blr

I don't speak POWER but perhaps you can make sense of it :-)

Ciao!
Steven
Alan Modra - Feb. 7, 2013, 10:47 p.m.
On Thu, Feb 07, 2013 at 11:01:13PM +0100, Steven Bosscher wrote:
> FWIW, left trunk vs. LRA right (with PR54009 patch on rs6000.c):
[snip]
> 
> I don't speak POWER but perhaps you can make sense of it :-)

No real difference in "r" function, and interesting that LRA does a
better job of "w", about the same as trunk with my reload patch.

Patch

Index: gcc/reload.c
===================================================================
--- gcc/reload.c	(revision 195707)
+++ gcc/reload.c	(working copy)
@@ -3633,11 +3633,21 @@ 
 			  == NO_REGS)
 			reject = 600;
 
-		      if (operand_type[i] == RELOAD_FOR_OUTPUT
-			  && (targetm.preferred_output_reload_class (operand,
-							    this_alternative[i])
-			      == NO_REGS))
+		      else if (operand_type[i] == RELOAD_FOR_OUTPUT
+			       && (targetm.preferred_output_reload_class
+				   (operand, this_alternative[i])
+				   == NO_REGS))
 			reject = 600;
+
+#ifdef SECONDARY_MEMORY_NEEDED
+		      else if (REG_P (operand)
+			       && REGNO (operand) < FIRST_PSEUDO_REGISTER
+			       && (SECONDARY_MEMORY_NEEDED
+				   ((enum reg_class) this_alternative[i],
+				    REGNO_REG_CLASS (REGNO (operand)),
+				    operand_mode[i])))
+			reject += 6;
+#endif
 		    }
 
 		  /* We prefer to reload pseudos over reloading other things,