| Submitter | Alan Modra |
|---|---|
| Date | Feb. 7, 2013, 3:09 a.m. |
| Message ID | <20130207030946.GV5023@bubble.grove.modra.org> |
| Download | mbox | patch |
| Permalink | /patch/218813/ |
| State | New |
| Headers | show |
Comments
On Thu, Feb 7, 2013 at 4:09 AM, Alan Modra wrote: > After fixing PR54009 (again), I thought I'd take a look at why reload > is generating the following correct but poor code > > stw 10,8(1) > stw 11 12(1) > ... > lfd 0,8(1) > stfd 0,x+32764@l(9) > > rather than > > addi 9,x+32764@l(9) > ... > stw 10,0(9) > stw 11 4(9) FWIW, left trunk vs. LRA right (with PR54009 patch on rs6000.c): r: r: stwu 1,-160(1) stwu 1,-160(1) lis 9,x+32764@ha lis 9,x+32764@ha la 9,x+32764@l(9) | la 8,x+32764@l(9) lwz 10,0(9) < lwz 11,4(9) < lis 9,y@ha lis 9,y@ha > lwz 10,0(8) > lwz 11,4(8) stfd ... ... lfd ... ... stw 10,y@l(9) stw 10,y@l(9) stw 11,y+4@l(9) stw 11,y+4@l(9) addi 1,1,160 addi 1,1,160 blr blr w: w: stwu 1,-160(1) stwu 1,-160(1) lis 9,y@ha lis 9,y@ha la 10,y@l(9) la 10,y@l(9) > lis 9,x+32764@ha lwz 11,4(10) lwz 11,4(10) lwz 10,0(10) lwz 10,0(10) lis 9,x+32764@ha | la 8,x+32764@l(9) stfd ... ... lfd ... ... lfd 31,152(1) lfd 31,152(1) lfd 0,8(1) | stw 10,0(8) stfd 0,x+32764@l(9) | stw 11,4(8) addi 1,1,160 addi 1,1,160 blr blr I don't speak POWER but perhaps you can make sense of it :-) Ciao! Steven
On Thu, Feb 07, 2013 at 11:01:13PM +0100, Steven Bosscher wrote: > FWIW, left trunk vs. LRA right (with PR54009 patch on rs6000.c): [snip] > > I don't speak POWER but perhaps you can make sense of it :-) No real difference in "r" function, and interesting that LRA does a better job of "w", about the same as trunk with my reload patch.
Patch
Index: gcc/reload.c =================================================================== --- gcc/reload.c (revision 195707) +++ gcc/reload.c (working copy) @@ -3633,11 +3633,21 @@ == NO_REGS) reject = 600; - if (operand_type[i] == RELOAD_FOR_OUTPUT - && (targetm.preferred_output_reload_class (operand, - this_alternative[i]) - == NO_REGS)) + else if (operand_type[i] == RELOAD_FOR_OUTPUT + && (targetm.preferred_output_reload_class + (operand, this_alternative[i]) + == NO_REGS)) reject = 600; + +#ifdef SECONDARY_MEMORY_NEEDED + else if (REG_P (operand) + && REGNO (operand) < FIRST_PSEUDO_REGISTER + && (SECONDARY_MEMORY_NEEDED + ((enum reg_class) this_alternative[i], + REGNO_REG_CLASS (REGNO (operand)), + operand_mode[i]))) + reject += 6; +#endif } /* We prefer to reload pseudos over reloading other things,
After fixing PR54009 (again), I thought I'd take a look at why reload is generating the following correct but poor code stw 10,8(1) stw 11 12(1) ... lfd 0,8(1) stfd 0,x+32764@l(9) rather than addi 9,x+32764@l(9) ... stw 10,0(9) stw 11 4(9) This code sequence is from (set (mem/c:DF (lo_sum:SI (reg/f:SI 9) (const:SI (plus:SI (symbol_ref:SI ("x")) (const_int 32764))))) (reg:DF 10) ...gcc.target/powerpc/pr54009.c:42 363 {*movdf_hardfloat32}) In tracing through reload, I see a score of 8 for the m<-d alternative, and 9 for Y<-r. In both cases we have one "loser" operand for a score of 6 ("d" in the first case, "Y" in the second), plus a score of 2 from /* We prefer to reload pseudos over reloading other things, since such reloads may be able to be eliminated later. If we are reloading a SCRATCH, we won't be generating any insns, just using a register, so it is also preferred. So bump REJECT in other cases. Don't do this in the case where we are forcing a constant into memory and it will then win since we don't want to have a different alternative match then. */ if (! (REG_P (operand) && REGNO (operand) >= FIRST_PSEUDO_REGISTER) && GET_CODE (operand) != SCRATCH && ! (const_to_mem && constmemok)) reject += 2; The Y<-r alternative gets one extra from /* Input reloads can be inherited more often than output reloads can be removed, so penalize output reloads. */ if (operand_type[i] != RELOAD_FOR_INPUT && GET_CODE (operand) != SCRATCH) reject++; The problem of course is that the input reload is quite expensive, involving a copy to memory. So, how about teaching reload about this as follows? I picked 6 for the reject value to make it equivalent to a '?' in the constraint, but that may be too large. Any value of 2 or greater works for the testcase. Bootstrapped and regression tested powerpc64-linux, but not yet spec tested. 2013-02-07 Alan Modra <amodra@gmail.com> * reload.c (find_reloads): Disparage reg alternatives needing secondary memory to reload.