Patchwork A few simple DImode improvements

login
register
mail settings
Submitter Bernd Schmidt
Date June 26, 2010, 12:29 a.m.
Message ID <4C2549D4.10608@codesourcery.com>
Download mbox | patch
Permalink /patch/57045/
State New
Headers show

Comments

Bernd Schmidt - June 26, 2010, 12:29 a.m.
There are ways to improve register allocation for DImode without going
to extremes like the patch I posted on Monday.  The attached patch is
quite simple and fixes two issues I noticed while working on the DCE
patch I posted earlier.

First, when we set DImode registers in a sequence of multiple SImode
accesses, we emit a clobber at the start of the sequence to tell life
analysis that the register is dead above this point.  Scheduling can
move these clobbers, extending the lifetime as far as IRA is able to
tell.  The changes in ira-lives.c make us move these clobbers downwards
as far as possible.  This may sometimes help even for single-word regs
that are initialized with a sequence of partial sets involving zero_extract.

The changes in the lower-subreg pass are motivated by nonsensical
transformations like the following:

-(insn 205 204 26 4  (set (reg:DI 68)
-        (reg:DI 178)) 49 (REG_DEAD (reg:DI 178)
+(insn 261 204 262 4  (clobber (reg:DI 193))
+
+(insn 262 261 263 4  (set (subreg:SI (reg:DI 193) 0) (reg:SI 191))
+
+(insn 263 262 264 4  (set (subreg:SI (reg:DI 193) 4) (reg:SI 192))
+
+(insn 264 263 26 4  (set (reg:DI 68) (reg:DI 193))

In resolve_simple_move, we generate a new DImode temporary pseudo rather
than storing into the DImode pseudo which is the actual destination.
This, I think, is due to confusion about the use of can_decompose_p; it
makes sense to use it in this context only for hard regs.  For pseudos,
it tests the non_decomposable_context bitmap, which is irrelevant here.

Bootstrapped and regression tested on i686-linux.  Ok?


Bernd
* ira-lives.c (process_bb_node_lives): Move clobber insns downwards
	as much as possible before analyzing a basic block.
	* lower-subreg.c (resolve_simple_move): Don't call can_decompose_p on
	pseudos.
Eric Botcazou - June 27, 2010, 9:36 p.m.
> The changes in the lower-subreg pass are motivated by nonsensical
> transformations like the following:
>
> -(insn 205 204 26 4  (set (reg:DI 68)
> -        (reg:DI 178)) 49 (REG_DEAD (reg:DI 178)
> +(insn 261 204 262 4  (clobber (reg:DI 193))
> +
> +(insn 262 261 263 4  (set (subreg:SI (reg:DI 193) 0) (reg:SI 191))
> +
> +(insn 263 262 264 4  (set (subreg:SI (reg:DI 193) 4) (reg:SI 192))
> +
> +(insn 264 263 26 4  (set (reg:DI 68) (reg:DI 193))
>
> In resolve_simple_move, we generate a new DImode temporary pseudo rather
> than storing into the DImode pseudo which is the actual destination.
> This, I think, is due to confusion about the use of can_decompose_p; it
> makes sense to use it in this context only for hard regs.  For pseudos,
> it tests the non_decomposable_context bitmap, which is irrelevant here.

can_decompose_p is only used in this context though.  But I agree that this 
looks superfluous.

Ian, do you have any insight here?
Ian Taylor - June 28, 2010, 6:05 p.m.
Eric Botcazou <ebotcazou@adacore.com> writes:

>> The changes in the lower-subreg pass are motivated by nonsensical
>> transformations like the following:
>>
>> -(insn 205 204 26 4  (set (reg:DI 68)
>> -        (reg:DI 178)) 49 (REG_DEAD (reg:DI 178)
>> +(insn 261 204 262 4  (clobber (reg:DI 193))
>> +
>> +(insn 262 261 263 4  (set (subreg:SI (reg:DI 193) 0) (reg:SI 191))
>> +
>> +(insn 263 262 264 4  (set (subreg:SI (reg:DI 193) 4) (reg:SI 192))
>> +
>> +(insn 264 263 26 4  (set (reg:DI 68) (reg:DI 193))
>>
>> In resolve_simple_move, we generate a new DImode temporary pseudo rather
>> than storing into the DImode pseudo which is the actual destination.
>> This, I think, is due to confusion about the use of can_decompose_p; it
>> makes sense to use it in this context only for hard regs.  For pseudos,
>> it tests the non_decomposable_context bitmap, which is irrelevant here.
>
> can_decompose_p is only used in this context though.  But I agree that this 
> looks superfluous.
>
> Ian, do you have any insight here?

It's superfluous for a simple example like this one, but it's not
clearly superfluous in all cases.  If register 68 here should wind up
being allocated to a floating point register which can't be SUBREG'ed,
then my concern is that taking a SUBREG is going to make it harder to
allocate the register correctly.  In other words, the bitmap is not
irrelevant; it's a proxy for non-SImode register classes.  I don't know
whether this a concern with IRA; can IRA allocate a DImode pseudo to a
floating point register if there are SImode SUBREGs of it?  The old
register allocator wouldn't, and you could wind up with a bunch of
reloads.

(I actually had patches for the old register allocator which tracked
subregs independently to improve allocation for cases like this.  I
delayed them to wait for the dataflow work, and then they became
irrelevant with IRA.)

Ian
Bernd Schmidt - June 28, 2010, 6:20 p.m.
On 06/28/2010 08:05 PM, Ian Lance Taylor wrote:

> It's superfluous for a simple example like this one, but it's not
> clearly superfluous in all cases.  If register 68 here should wind up
> being allocated to a floating point register which can't be SUBREG'ed,
> then my concern is that taking a SUBREG is going to make it harder to
> allocate the register correctly.  In other words, the bitmap is not
> irrelevant; it's a proxy for non-SImode register classes.

That however applies to only one case in which bits can be set in that
bitmap, the SUBREG case in find_decomposable_subregs.  The normal case,
I think, is that these bits come from occurrences of the reg that are in
NOT_SIMPLE_MOVE, and I think this is irrelevant here.  Do we need to
keep track of two different bitmaps?

>  I don't know
> whether this a concern with IRA; can IRA allocate a DImode pseudo to a
> floating point register if there are SImode SUBREGs of it?  The old
> register allocator wouldn't, and you could wind up with a bunch of
> reloads.

I don't know whether IRA would do that or whether it can determine the
costs correctly; I can't find anything right now which would deal with
this case.


Bernd
Jeff Law - June 28, 2010, 6:33 p.m.
On 06/28/10 12:05, Ian Lance Taylor wrote:
> I don't know
> whether this a concern with IRA; can IRA allocate a DImode pseudo to a
> floating point register if there are SImode SUBREGs of it?  The old
> register allocator wouldn't, and you could wind up with a bunch of
> reloads.
>    
I would expect that unless the target explicitly forbids such an 
allocation via CANNOT_CHANGE_MODE_CLASS, then it ought to work.  In 
fact, this is a normal code generation situation for the PA family to 
support xmpyu -- if it had been broken I suspect we would have heard 
about it by now as anything doing an integer multiply would suddenly be 
using libcalls instead of xmpyu.


Jeff
Jeff Law - June 28, 2010, 6:34 p.m.
On 06/28/10 12:20, Bernd Schmidt wrote:
> I don't know
>> whether this a concern with IRA; can IRA allocate a DImode pseudo to a
>> floating point register if there are SImode SUBREGs of it?  The old
>> register allocator wouldn't, and you could wind up with a bunch of
>> reloads.
>>      
> I don't know whether IRA would do that or whether it can determine the
> costs correctly; I can't find anything right now which would deal with
> this case.
>    
It should be merely a question of consulting CANNOT_CHANGE_MODE_CLASS 
(for correctness) and costing models.  It shouldn't really need special 
handling.

jeff
Ian Taylor - June 28, 2010, 6:57 p.m.
Bernd Schmidt <bernds@codesourcery.com> writes:

> On 06/28/2010 08:05 PM, Ian Lance Taylor wrote:
>
>> It's superfluous for a simple example like this one, but it's not
>> clearly superfluous in all cases.  If register 68 here should wind up
>> being allocated to a floating point register which can't be SUBREG'ed,
>> then my concern is that taking a SUBREG is going to make it harder to
>> allocate the register correctly.  In other words, the bitmap is not
>> irrelevant; it's a proxy for non-SImode register classes.
>
> That however applies to only one case in which bits can be set in that
> bitmap, the SUBREG case in find_decomposable_subregs.  The normal case,
> I think, is that these bits come from occurrences of the reg that are in
> NOT_SIMPLE_MOVE, and I think this is irrelevant here.  Do we need to
> keep track of two different bitmaps?

As far as I can see, it's not irrelevant there, because the pseudo might
be used somewhere else in a case which is NOT_SIMPLE_MOVE.  E.g., a
floating point addition.  What am I missing?

Ian
Bernd Schmidt - June 28, 2010, 9:14 p.m.
On 06/28/2010 08:57 PM, Ian Lance Taylor wrote:
> As far as I can see, it's not irrelevant there, because the pseudo might
> be used somewhere else in a case which is NOT_SIMPLE_MOVE.  E.g., a
> floating point addition.  What am I missing?

Do you want to also verify that the mode of the final destination pseudo
is DImode (i.e. orig_mode == dest_mode) before omitting the temporary?
In that case it would be quite unlikely for it to be used in a floating
point addition, and I think it would still avoid the extra temporary in
the vast majority of the cases?


Bernd
Richard Henderson - June 28, 2010, 11 p.m.
On 06/28/2010 11:33 AM, Jeff Law wrote:
> On 06/28/10 12:05, Ian Lance Taylor wrote:
>> I don't know
>> whether this a concern with IRA; can IRA allocate a DImode pseudo to a
>> floating point register if there are SImode SUBREGs of it?  The old
>> register allocator wouldn't, and you could wind up with a bunch of
>> reloads.
>>    
> I would expect that unless the target explicitly forbids such an
> allocation via CANNOT_CHANGE_MODE_CLASS, then it ought to work.  In
> fact, this is a normal code generation situation for the PA family to
> support xmpyu -- if it had been broken I suspect we would have heard
> about it by now as anything doing an integer multiply would suddenly be
> using libcalls instead of xmpyu.

CANNOT_CHANGE_MODE_CLASS is the normal state of affairs for several
ports doing int<->fp conversion.  Where DImode is the only valid
integer mode, and the single-precision load instruction does some
bit swizzling (to place the SFmode value into register format) so
that 32-bit integer values cannot be loaded or manipulated.

This is true of some ppc variants and alpha, at least.  As such,
this is sort-of a non-answer, Jeff.  


r~
Ian Taylor - June 28, 2010, 11:03 p.m.
Bernd Schmidt <bernds@codesourcery.com> writes:

> On 06/28/2010 08:57 PM, Ian Lance Taylor wrote:
>> As far as I can see, it's not irrelevant there, because the pseudo might
>> be used somewhere else in a case which is NOT_SIMPLE_MOVE.  E.g., a
>> floating point addition.  What am I missing?
>
> Do you want to also verify that the mode of the final destination pseudo
> is DImode (i.e. orig_mode == dest_mode) before omitting the temporary?
> In that case it would be quite unlikely for it to be used in a floating
> point addition, and I think it would still avoid the extra temporary in
> the vast majority of the cases?

I see what you're getting at.  It's unlikely, but it does happen in code
which examines the bits of a floating point number via a union or using
-fno-strict-aliasing and a type cast.  So if that code works correctly
if perhaps less efficiently, then I agree that checking that the mode
class is MODE_INT should work OK here.

Ian
Bernd Schmidt - June 28, 2010, 11:34 p.m.
On 06/29/2010 01:03 AM, Ian Lance Taylor wrote:
> I see what you're getting at.  It's unlikely, but it does happen in code
> which examines the bits of a floating point number via a union or using
> -fno-strict-aliasing and a type cast.  So if that code works correctly
> if perhaps less efficiently, then I agree that checking that the mode
> class is MODE_INT should work OK here.

Hmm.  Looking at this again now, aren't we already doing that?
(resolve_reg_p checks whether DEST is a CONCAT)

  if (!can_decompose_p (dest)
      || (side_effects_p (dest) && !pushing)
      || (!SCALAR_INT_MODE_P (dest_mode)
	  && !resolve_reg_p (dest)
	  && !resolve_subreg_p (dest)))


Bernd

Patch

Index: ira-lives.c
===================================================================
--- ira-lives.c	(revision 161371)
+++ ira-lives.c	(working copy)
@@ -877,7 +877,7 @@  process_bb_node_lives (ira_loop_tree_nod
   int i, freq;
   unsigned int j;
   basic_block bb;
-  rtx insn;
+  rtx insn, prev;
   bitmap_iterator bi;
   bitmap reg_live_out;
   unsigned int px;
@@ -925,6 +925,42 @@  process_bb_node_lives (ira_loop_tree_nod
       /* Invalidate all allocno_saved_at_call entries.  */
       last_call_num++;
 
+      /* Previous passes such as the first scheduling pass may have lengthened
+	 the lifetime of pseudos by moving CLOBBER insns upwards.  Undo this
+	 here.  */
+      FOR_BB_INSNS_REVERSE_SAFE (bb, insn, prev)
+	{
+	  rtx pat, next, reg;
+	  if (!NONJUMP_INSN_P (insn) || insn == BB_END (bb))
+	    continue;
+	  pat = PATTERN (insn);
+	  if (GET_CODE (pat) != CLOBBER)
+	    continue;
+	  reg = XEXP (pat, 0);
+	  if (!REG_P (reg) || REGNO (reg) < FIRST_PSEUDO_REGISTER)
+	    continue;
+	  next = insn;
+	  while (next != BB_END (bb))
+	    {
+	      next = NEXT_INSN (next);
+	      if (!NONDEBUG_INSN_P (next))
+		continue;
+	      if (reg_mentioned_p (XEXP (pat, 0), PATTERN (next)))
+		break;
+	    }
+	  if (next == NEXT_INSN (insn))
+	    continue;
+
+	  NEXT_INSN (PREV_INSN (insn)) = NEXT_INSN (insn);
+	  PREV_INSN (NEXT_INSN (insn)) = PREV_INSN (insn);
+
+	  PREV_INSN (insn) = PREV_INSN (next);
+	  NEXT_INSN (insn) = next;
+
+	  NEXT_INSN (PREV_INSN (next)) = insn;
+	  PREV_INSN (next) = insn;
+	}
+
       /* Scan the code of this basic block, noting which allocnos and
 	 hard regs are born or die.
 
Index: lower-subreg.c
===================================================================
--- lower-subreg.c	(revision 161371)
+++ lower-subreg.c	(working copy)
@@ -717,7 +717,7 @@  resolve_simple_move (rtx set, rtx insn)
   /* If SRC is a register which we can't decompose, or has side
      effects, we need to move via a temporary register.  */
 
-  if (!can_decompose_p (src)
+  if ((REG_P (src) && HARD_REGISTER_P (src) && !can_decompose_p (src))
       || side_effects_p (src)
       || GET_CODE (src) == ASM_OPERANDS)
     {
@@ -737,7 +737,7 @@  resolve_simple_move (rtx set, rtx insn)
 
   dest_mode = orig_mode;
   pushing = push_operand (dest, dest_mode);
-  if (!can_decompose_p (dest)
+  if ((REG_P (dest) && HARD_REGISTER_P (dest) && !can_decompose_p (dest))
       || (side_effects_p (dest) && !pushing)
       || (!SCALAR_INT_MODE_P (dest_mode)
 	  && !resolve_reg_p (dest)