diff mbox

RFC/RFA: Fix bug with REE optimization corrupting extended registers

Message ID 87d1v7v1g9.fsf@redhat.com
State New
Headers show

Commit Message

Nick Clifton Nov. 18, 2015, 2:15 p.m. UTC
Hi Guys,

  I recently discovered a bug in the current Redundant Extension
  Elimination optimization.  If the candidate extension instruction
  increases the number of hard registers used, the pass does not check
  to see if these extra registers are live between the definition and
  the extension.

  For example:

    (insn  44 (set (reg:QI r11) (mem:QI (reg:HI r20))) 
    (insn  45 (set (reg:QI r10) (mem:QI (reg:HI r18))) 
    [...]
    (insn  71 (set (reg:HI r14) (zero_extend:HI (reg:QI r11)))
    [...]                                                  
    (insn  88 (set (reg:HI r10) (zero_extend:HI (reg:QI r10)))

  (This is on the RL78 target where HImode values occupy two hard
  registers and QImode values only one.  The bug however is generic, not
  RL78 specific).
  
  The REE pass transforms this into:

    (insn  44 (set (reg:QI r11) (mem:QI (reg:HI r20))) 
    (insn  45 (set (reg:HI r10) (zero_extend:HI (mem:QI (reg:HI r18)))) 
    [...]
    (insn  71 (set (reg:HI r14) (zero_extend:HI (reg:QI r11)))
    [...]
    (insn  88 deleted)

  Note how the new set at insn 45 clobbers the value loaded by insn 44
  into r11.

  The patch below is my attempt to fix this.  It is not correct
  however.  I could not work out how to determine if a given hard
  register is live at any point between two insns.  So instead I used
  the liveness information associated with the basic block containing
  the definition.  If one of the extended registers is live or becomes
  live in this block, and this block is not the same block as the one
  containing the extension insn, then I stop the optimization.  This
  works for the RL78 for all of the cases that I could find.

  So ... comments please ?

  Will gcc ever generate a single basic block that contains both the
  definition and the extension instructions, but also where the extra
  hard registers are used for another purpose as well ?

  Tested with no regressions (or fixes) on an x86-pc-linux-gnu target.
  Also tested with no regression and 7 fixes on an rl78-elf target.

Cheers
  Nick

gcc/ChangeLog
2015-11-18  Nick Clifton  <nickc@redhat.com>

	* ree.c (regs_live_between): New function.
        (add_removable_extension): If the extension uses more hard
        registers than the definition then check that the extra hard
        registers are not live between the definition and the
        extension.

Comments

Bernd Schmidt Nov. 19, 2015, 12:52 a.m. UTC | #1
>    (This is on the RL78 target where HImode values occupy two hard
>    registers and QImode values only one.  The bug however is generic, not
>    RL78 specific).
>
>    The REE pass transforms this into:
>
>      (insn  44 (set (reg:QI r11) (mem:QI (reg:HI r20)))
>      (insn  45 (set (reg:HI r10) (zero_extend:HI (mem:QI (reg:HI r18))))
>      [...]
>      (insn  71 (set (reg:HI r14) (zero_extend:HI (reg:QI r11)))
>      [...]
>      (insn  88 deleted)
>
>    Note how the new set at insn 45 clobbers the value loaded by insn 44
>    into r11.

I had a look around. There's code testing HARD_REGNO_NREGS in 
ree.c:combine_set_extension. It's inside #if 0, and labelled 
"temporarily disabled". See if enabling that helps you? (Jeff, that #if 
0 was added by you).


Bernd
Jeff Law Nov. 19, 2015, 1:56 a.m. UTC | #2
On 11/18/2015 05:52 PM, Bernd Schmidt wrote:
>>    (This is on the RL78 target where HImode values occupy two hard
>>    registers and QImode values only one.  The bug however is generic, not
>>    RL78 specific).
>>
>>    The REE pass transforms this into:
>>
>>      (insn  44 (set (reg:QI r11) (mem:QI (reg:HI r20)))
>>      (insn  45 (set (reg:HI r10) (zero_extend:HI (mem:QI (reg:HI r18))))
>>      [...]
>>      (insn  71 (set (reg:HI r14) (zero_extend:HI (reg:QI r11)))
>>      [...]
>>      (insn  88 deleted)
>>
>>    Note how the new set at insn 45 clobbers the value loaded by insn 44
>>    into r11.
>
> I had a look around. There's code testing HARD_REGNO_NREGS in
> ree.c:combine_set_extension. It's inside #if 0, and labelled
> "temporarily disabled". See if enabling that helps you? (Jeff, that #if
> 0 was added by you).
I was going to dig into this discussion tomorrow since I was the last 
one in that code.  Just had to get that threading regression resolved 
first :-)

jeff
Nick Clifton Nov. 19, 2015, 3:34 p.m. UTC | #3
Hi Bernd,

> I had a look around. There's code testing HARD_REGNO_NREGS in
> ree.c:combine_set_extension. It's inside #if 0, and labelled
> "temporarily disabled". See if enabling that helps you? (Jeff, that #if
> 0 was added by you).

I suspect that the code was disabled because it prevented too many 
useful optimization opportunities.

The code there would solve this problem, but the approach is is overly 
cautious, since it disables the optimization for all extensions that 
increase the number of hard registers used.  Some of these will be 
viable candidates, provided that the extra hard registers are no used. 
(This is certainly true for the RL78, where the (patched) optimization 
does improve code, even though the widening does use extra registers).

Cheers
   Nick
Bernd Schmidt Nov. 19, 2015, 4:53 p.m. UTC | #4
On 11/19/2015 04:34 PM, Nick Clifton wrote:
> Hi Bernd,
>
>> I had a look around. There's code testing HARD_REGNO_NREGS in
>> ree.c:combine_set_extension. It's inside #if 0, and labelled
>> "temporarily disabled". See if enabling that helps you? (Jeff, that #if
>> 0 was added by you).
>
> I suspect that the code was disabled because it prevented too many
> useful optimization opportunities.

Well, correctness goes first. Since reenabling it fixes your problem, 
that change is approved (conditional on testing as usual obviously).


Bernd
Jeff Law Nov. 19, 2015, 5:06 p.m. UTC | #5
On 11/19/2015 09:53 AM, Bernd Schmidt wrote:
> On 11/19/2015 04:34 PM, Nick Clifton wrote:
>> Hi Bernd,
>>
>>> I had a look around. There's code testing HARD_REGNO_NREGS in
>>> ree.c:combine_set_extension. It's inside #if 0, and labelled
>>> "temporarily disabled". See if enabling that helps you? (Jeff, that #if
>>> 0 was added by you).
>>
>> I suspect that the code was disabled because it prevented too many
>> useful optimization opportunities.
>
> Well, correctness goes first. Since reenabling it fixes your problem,
> that change is approved (conditional on testing as usual obviously).
I couldn't convince myself it was a correct check.  Having Nick's 
testcase should help with the analysis.

Frankly, the overall structure of ree is a mess -- I've found it 
incredibly difficult to follow every time I've had to look at it.

Let me take a peek today and see if I can get familiar enough with this 
code again to help.


Jeff
Jeff Law Nov. 19, 2015, 6:31 p.m. UTC | #6
On 11/19/2015 08:34 AM, Nick Clifton wrote:
> Hi Bernd,
>
>> I had a look around. There's code testing HARD_REGNO_NREGS in
>> ree.c:combine_set_extension. It's inside #if 0, and labelled
>> "temporarily disabled". See if enabling that helps you? (Jeff, that #if
>> 0 was added by you).
>
> I suspect that the code was disabled because it prevented too many
> useful optimization opportunities.
>
> The code there would solve this problem, but the approach is is overly
> cautious, since it disables the optimization for all extensions that
> increase the number of hard registers used.  Some of these will be
> viable candidates, provided that the extra hard registers are no used.
> (This is certainly true for the RL78, where the (patched) optimization
> does improve code, even though the widening does use extra registers).
Nick -- can you pass along your testcode?

Jeff
Jeff Law Nov. 19, 2015, 7:25 p.m. UTC | #7
On 11/18/2015 07:15 AM, Nick Clifton wrote:
> Hi Guys,
>
>    I recently discovered a bug in the current Redundant Extension
>    Elimination optimization.  If the candidate extension instruction
>    increases the number of hard registers used, the pass does not check
>    to see if these extra registers are live between the definition and
>    the extension.
>
>    For example:
>
>      (insn  44 (set (reg:QI r11) (mem:QI (reg:HI r20)))
>      (insn  45 (set (reg:QI r10) (mem:QI (reg:HI r18)))
>      [...]
>      (insn  71 (set (reg:HI r14) (zero_extend:HI (reg:QI r11)))
>      [...]
>      (insn  88 (set (reg:HI r10) (zero_extend:HI (reg:QI r10)))
>
>    (This is on the RL78 target where HImode values occupy two hard
>    registers and QImode values only one.  The bug however is generic, not
>    RL78 specific).
Right.  I can recall walking though multi-reg cases on my whiteboard. 
There was one particular case Jakub was concerned about in this space, 
but I couldn't contrive a testcase to trigger and I was getting to the 
point where I was struggling to be able to reason about the behaviour of 
the code.




>
>    The REE pass transforms this into:
>
>      (insn  44 (set (reg:QI r11) (mem:QI (reg:HI r20)))
>      (insn  45 (set (reg:HI r10) (zero_extend:HI (mem:QI (reg:HI r18))))
>      [...]
>      (insn  71 (set (reg:HI r14) (zero_extend:HI (reg:QI r11)))
>      [...]
>      (insn  88 deleted)
>
>    Note how the new set at insn 45 clobbers the value loaded by insn 44
>    into r11.
Right.

>
>    The patch below is my attempt to fix this.  It is not correct
>    however.  I could not work out how to determine if a given hard
>    register is live at any point between two insns.  So instead I used
>    the liveness information associated with the basic block containing
>    the definition.  If one of the extended registers is live or becomes
>    live in this block, and this block is not the same block as the one
>    containing the extension insn, then I stop the optimization.  This
>    works for the RL78 for all of the cases that I could find.
The way this pass has dealt with this class of problems is the various 
routines in rtlanal.c  reg_used_between_p, reg_set_between_p and the like.


>
>    So ... comments please ?
>
>    Will gcc ever generate a single basic block that contains both the
>    definition and the extension instructions, but also where the extra
>    hard registers are used for another purpose as well ?
I can't see any reason why it would not.


jeff
Nick Clifton Nov. 20, 2015, 9:19 a.m. UTC | #8
Hi Jeff,

>> The code there would solve this problem, but the approach is is overly
>> cautious, since it disables the optimization for all extensions that
>> increase the number of hard registers used.  Some of these will be
>> viable candidates, provided that the extra hard registers are no used.
>> (This is certainly true for the RL78, where the (patched) optimization
>> does improve code, even though the widening does use extra registers).
> Nick -- can you pass along your testcode?

Sure - this is for the RL78 toolchain.  In theory the problem is 
generic, but I have not tested other toolchains.

Compile the gcc.c-torture/execute/pr42833.c or 
gcc.c-torture/execute/strct-stdarg-1.c tests at -O1 or higher with -free 
also specified on the command line.  Without -free these tests pass. 
With -free they fail.

Cheers
   Nick
Jeff Law Nov. 20, 2015, 5:32 p.m. UTC | #9
On 11/20/2015 02:19 AM, Nick Clifton wrote:
> Hi Jeff,
>
>>> The code there would solve this problem, but the approach is is overly
>>> cautious, since it disables the optimization for all extensions that
>>> increase the number of hard registers used.  Some of these will be
>>> viable candidates, provided that the extra hard registers are no used.
>>> (This is certainly true for the RL78, where the (patched) optimization
>>> does improve code, even though the widening does use extra registers).
>> Nick -- can you pass along your testcode?
>
> Sure - this is for the RL78 toolchain.  In theory the problem is
> generic, but I have not tested other toolchains.
Right.  It just really helps me to have something I can poke at -- it's 
just the type of learner I am.  I tried to trip the test in that #if 
several times last year and never came up with anything.  So naturally 
if we can trip it with the rl78 I want to dig into it.


>
> Compile the gcc.c-torture/execute/pr42833.c or
> gcc.c-torture/execute/strct-stdarg-1.c tests at -O1 or higher with -free
> also specified on the command line.  Without -free these tests pass.
> With -free they fail.
Perfect.  Building rl78-elf cross bits as I type...

jeff
Jeff Law Nov. 20, 2015, 10:01 p.m. UTC | #10
On 11/20/2015 02:19 AM, Nick Clifton wrote:
> Hi Jeff,
>
>>> The code there would solve this problem, but the approach is is overly
>>> cautious, since it disables the optimization for all extensions that
>>> increase the number of hard registers used.  Some of these will be
>>> viable candidates, provided that the extra hard registers are no used.
>>> (This is certainly true for the RL78, where the (patched) optimization
>>> does improve code, even though the widening does use extra registers).
>> Nick -- can you pass along your testcode?
>
> Sure - this is for the RL78 toolchain.  In theory the problem is
> generic, but I have not tested other toolchains.
>
> Compile the gcc.c-torture/execute/pr42833.c or
> gcc.c-torture/execute/strct-stdarg-1.c tests at -O1 or higher with -free
> also specified on the command line.  Without -free these tests pass.
> With -free they fail.
So this looks like a pretty fundamental failing in ree.c and AFAICT has 
been there since day 1.  Thankfully, I don't think it's likely to 
trigger all that often.

It's useful to ponder two cases.  One where we're going to add a copy 
and one where we don't.  I added the copy case during the 4.9 cycle as 
an extension of existing code.  It only fires in easy to analyze 
circumstances -- when the original set and the extension are in the same 
basic block, but have different destination registers.

In the copy case.  The code checks that the destination of the 
*extension* is neither used nor set between the original set and the 
extension.  So in the case of multi-word hard reg, we'll verify that the 
entire multi-word hard reg survives from the original insn to the 
extension.  That avoids this problem in cases where a copy is needed.

However, in the case where no copy is needed, no such checks are made. 
In fact the checks could be fairly painful as the original set and the 
extension can be in different blocks with arbitrary flow between them.

I would hazard a guess that the authors simply didn't consider the 
multi-hard reg case.  Essentially if the original set reached an 
extension, then obviously the original set got there unharmed and the 
extended destination should reach as well -- except that doesn't apply 
to multi-word hard regs.

Given the expense in checking all the potential paths between the 
original set and the extension, I think just disallowing this case ought 
to be is the best way to go.  Ideally we'll filter out that case before 
we get to combine_set_extension.  But if we can't, it's easy enough to 
test there.

Jeff
Eric Botcazou Nov. 20, 2015, 10:10 p.m. UTC | #11
> I would hazard a guess that the authors simply didn't consider the
> multi-hard reg case.  Essentially if the original set reached an
> extension, then obviously the original set got there unharmed and the
> extended destination should reach as well -- except that doesn't apply
> to multi-word hard regs.

This pass started as a Zero-Extension Elimination pass for x86-64 and was only 
considering implicit SI->DI extensions initially.  The algorithm was tailored 
to this specific pattern and I agree that the pass should be reimplemented.
Jeff Law Nov. 20, 2015, 10:24 p.m. UTC | #12
On 11/18/2015 07:15 AM, Nick Clifton wrote:
> Hi Guys,
>
>    I recently discovered a bug in the current Redundant Extension
>    Elimination optimization.  If the candidate extension instruction
>    increases the number of hard registers used, the pass does not check
>    to see if these extra registers are live between the definition and
>    the extension.
>
>    For example:
>
>      (insn  44 (set (reg:QI r11) (mem:QI (reg:HI r20)))
>      (insn  45 (set (reg:QI r10) (mem:QI (reg:HI r18)))
>      [...]
>      (insn  71 (set (reg:HI r14) (zero_extend:HI (reg:QI r11)))
>      [...]
>      (insn  88 (set (reg:HI r10) (zero_extend:HI (reg:QI r10)))
>
>    (This is on the RL78 target where HImode values occupy two hard
>    registers and QImode values only one.  The bug however is generic, not
>    RL78 specific).
>
>    The REE pass transforms this into:
>
>      (insn  44 (set (reg:QI r11) (mem:QI (reg:HI r20)))
>      (insn  45 (set (reg:HI r10) (zero_extend:HI (mem:QI (reg:HI r18))))
>      [...]
>      (insn  71 (set (reg:HI r14) (zero_extend:HI (reg:QI r11)))
>      [...]
>      (insn  88 deleted)
>
>    Note how the new set at insn 45 clobbers the value loaded by insn 44
>    into r11.
>
>    The patch below is my attempt to fix this.  It is not correct
>    however.  I could not work out how to determine if a given hard
>    register is live at any point between two insns.  So instead I used
>    the liveness information associated with the basic block containing
>    the definition.  If one of the extended registers is live or becomes
>    live in this block, and this block is not the same block as the one
>    containing the extension insn, then I stop the optimization.  This
>    works for the RL78 for all of the cases that I could find.
>
>    So ... comments please ?
>
>    Will gcc ever generate a single basic block that contains both the
>    definition and the extension instructions, but also where the extra
>    hard registers are used for another purpose as well ?
>
>    Tested with no regressions (or fixes) on an x86-pc-linux-gnu target.
>    Also tested with no regression and 7 fixes on an rl78-elf target.
>
> Cheers
>    Nick
>
> gcc/ChangeLog
> 2015-11-18  Nick Clifton  <nickc@redhat.com>
>
> 	* ree.c (regs_live_between): New function.
>          (add_removable_extension): If the extension uses more hard
>          registers than the definition then check that the extra hard
>          registers are not live between the definition and the
>          extension.
>
> @@ -1080,6 +1123,30 @@
>   	      }
>   	  }
>
> +      /* Fourth, if the extended version occupies more registers than
> +	 the original version then make sure that these extra registers
> +	 are not live between the definition and the extension.  */
> +      if (HARD_REGNO_NREGS (REGNO (dest), mode)
> +	  > HARD_REGNO_NREGS (REGNO (reg), GET_MODE (reg)))
So this seems like a reasonable place to catch this.  I certainly prefer 
to avoid work later by filtering earlier like you've done.

The problem is with your implementation of regs_live_between.  You're 
assuming we've got straight-line code between the original set and the 
extension.  That's not guaranteed.  Essentially we've got use-def chains 
via the DF framework.  So, in theory, they can be in different blocks 
and arbitrary flow control between them.  In fact, the extension could 
be before the original set in the insn chain (though obviously not for 
execution order).

I think it's reasonable to just punt when we see that the source/dest 
register of the extension are the same and the number of hard regs is 
different.  That's what I'm testing now.

jeff
diff mbox

Patch

Index: gcc/ree.c
===================================================================
--- gcc/ree.c	(revision 230517)
+++ gcc/ree.c	(working copy)
@@ -952,6 +952,49 @@ 
   return false;
 }
 
+/* Returns TRUE if any of the registers [start, stop) are live between DEF and up
+   to, but not including, INSN.  */
+
+static bool
+regs_live_between (rtx_insn * insn,
+		   struct df_link * def,
+		   unsigned int start,
+		   unsigned int stop)
+{
+  basic_block bb;
+
+  /* FIXME: This is an incomplete test.  It only checks the DEF and USE states of the
+     registers in basic block of DEF.  If any of them match the start-stop range and
+     the INSN is not in the same block, then it is assumed that the registers must be
+     live.  */
+  /* These first few checks are just paranoia... */
+  if (def != NULL
+      && def->ref != NULL
+      && def->ref->base.insn_info != NULL
+      && def->ref->base.insn_info->insn != NULL
+      && (bb = BLOCK_FOR_INSN (def->ref->base.insn_info->insn)) != NULL
+      && bb != BLOCK_FOR_INSN (insn))
+    {
+      struct df_lr_bb_info * bb_info = df_lr_get_bb_info (bb->index);
+      bitmap_iterator bi;
+      unsigned int reg;
+
+      EXECUTE_IF_SET_IN_BITMAP (& bb_info->def, 0, reg, bi)
+	{
+	  if (reg >= start && reg < stop)
+	    return true;
+	}
+
+      EXECUTE_IF_SET_IN_BITMAP (& bb_info->out, 0, reg, bi)
+	{
+	  if (reg >= start && reg < stop)
+	    return true;
+	}
+    }
+
+  return false;
+}
+
 /* Add an extension pattern that could be eliminated.  */
 
 static void
@@ -1080,6 +1123,30 @@ 
 	      }
 	  }
 
+      /* Fourth, if the extended version occupies more registers than
+	 the original version then make sure that these extra registers
+	 are not live between the definition and the extension.  */
+      if (HARD_REGNO_NREGS (REGNO (dest), mode)
+	  > HARD_REGNO_NREGS (REGNO (reg), GET_MODE (reg)))
+	{
+	  unsigned int start_reg = REGNO (reg) + HARD_REGNO_NREGS (REGNO (reg), GET_MODE (reg));
+	  unsigned int stop_reg  = REGNO (reg) + HARD_REGNO_NREGS (REGNO (dest), mode);
+
+	  for (def = defs; def; def = def->next)
+	    {
+	      if (regs_live_between (insn, def, start_reg, stop_reg))
+		{
+		  if (dump_file)
+		    {
+		      fprintf (dump_file, "Cannot eliminate extension:\n");
+		      print_rtl_single (dump_file, insn);
+		      fprintf (dump_file, " because extended register(s) are already used\n");
+		    }
+		  return;
+		}
+	    }
+	}
+
       /* Then add the candidate to the list and insert the reaching definitions
          into the definition map.  */
       ext_cand e = {expr, code, mode, insn};