Patchwork Fix twolf -funroll-loops -O3 miscompilation (a semi-latent web.c bug)

login
register
mail settings
Submitter Steven Bosscher
Date Nov. 19, 2012, 9:19 p.m.
Message ID <CABu31nO2O_zH=wyk_MYqqn81AT4HComhkxXWomXpbsU-Qsgr1w@mail.gmail.com>
Download mbox | patch
Permalink /patch/200156/
State New
Headers show

Comments

Steven Bosscher - Nov. 19, 2012, 9:19 p.m.
On Mon, Nov 19, 2012 at 9:34 PM, Steven Bosscher wrote:
> On Mon, Nov 19, 2012 at 12:24 PM, Eric Botcazou wrote:
>>> Yes, I'll be looking into this soon.
>>
>> We have a related regression on SPARC:
>>
>> FAIL: gfortran.dg/minmaxloc_5.f90  -O3 -fomit-frame-pointer -funroll-loops
>> execution test
>> FAIL: gfortran.dg/minmaxloc_5.f90  -O3 -fomit-frame-pointer -funroll-all-loops
>> -finline-functions  execution test
>> FAIL: gfortran.dg/minmaxloc_6.f90  -O3 -fomit-frame-pointer -funroll-loops
>> execution test
>> FAIL: gfortran.dg/minmaxloc_6.f90  -O3 -fomit-frame-pointer -funroll-all-loops
>> -finline-functions  execution test
>>
>> whose failure mode is exactly the same as for Honza's testcase.  I even have a
>> more reduced testcase (6 lines, attached) in case you'd prefer working on it.
>>
>> Reproducible with a cross to sparc-linux with -mcpu=v8 -O3 -funroll-loops and
>> grepping for mem:SF (const_int 0 [0]) in the RTL dumps.
>
> Thanks for this reduced test case, that's saving me a lot of work!
>
> Can you please try and see if the following C test case also fails?
>
> It looks like the memcpy is expanded to a loop that is unrolled, and
> then mis-optimized. I haven't found out yet why...

It's another case of an invalid REG_EQUAL note. Shown below is an
excerpt from the C test case .loop2_done dump (after calling
df_analyze to update liveness). The note on insn 172 uses (reg:SI 132)
which is not live on entry to basic block 25.

;; basic block 25, loop depth 0, count 0, freq 2485, maybe hot
;;  prev block 20, next block 26, flags: (REACHABLE, RTL, MODIFIED)
;;  pred:       11 [50.5%]
;; bb 25 artificial_defs: { }
;; bb 25 artificial_uses: { u-1(14){ }u-1(30){ }u-1(101){ }}
;; lr  in        14 [%sp] 30 [%fp] 31 [%i7] 101 [%sfp] 138 158 165
;; lr  use       14 [%sp] 30 [%fp] 101 [%sfp] 138 158
;; lr  def       131 133
;; live  in      14 [%sp] 101 [%sfp] 138 158 165
;; live  gen     131 133
;; live  kill

(code_label 179 149 173 25 22 "" [1 uses])
(note 173 179 171 25 [bb 25] NOTE_INSN_BASIC_BLOCK)
(insn 171 173 172 25 (set (reg/v:SI 133 [ idsD.1386 ])
        (reg/v:SI 138 [ idsD.1386 ])) 40 {*movsi_insn}
     (expr_list:REG_UNUSED (reg/v:SI 133 [ idsD.1386 ])
        (nil)))
(insn 172 171 176 25 (set (reg/v:SF 131 [ limit3D.1388 ])
        (reg:SF 158 [ limit3D.1388 ])) t.c:62 -1
     (expr_list:REG_EQUAL (mem:SF (reg:SI 132 [ ivtmp.7D.1419 ]) [2
MEM[base: _23, offset: 0B]+0 S4 A32])
        (expr_list:REG_DEAD (reg:SF 158 [ limit3D.1388 ])
            (nil))))
;; lr  out       14 [%sp] 30 [%fp] 31 [%i7] 101 [%sfp] 131 133 138 165
;; live  out     14 [%sp] 101 [%sfp] 131 133 138 165


The use of the dead reg is renamed in the webizer:

(insn 172 171 176 25 (set (reg/v:SF 131 [ limit3D.1388 ])
        (reg:SF 172 [ limit3D.1388 ])) t.c:62 66 {*movsf_insn}
     (expr_list:REG_EQUAL (mem:SF (reg:SI 174 [ ivtmp.7D.1419 ]) [2
MEM[base: _23, offset: 0B]+0 S4 A32])
        (expr_list:REG_DEAD (reg:SF 158 [ limit3D.1388 ])
            (nil))))


Next, cse2 thinks it's a good idea to actually use the REG_EQUAL note,
turning the EQ-use of the uninitialized reg 174 into a real use:

(insn 172 171 176 17 (set (reg/v:SF 131 [ limit3D.1388 ])
        (mem:SF (reg:SI 174 [ ivtmp.7D.1419 ]) [2 MEM[base: _23,
offset: 0B]+0 S4 A32])) t.c:62 66 {*movsf_insn}
     (expr_list:REG_EQUAL (mem:SF (reg:SI 174 [ ivtmp.7D.1419 ]) [2
MEM[base: _23, offset: 0B]+0 S4 A32])
        (expr_list:REG_DEAD (reg:SF 158 [ limit3D.1388 ])
            (nil))))


Then init-regs concludes that reg 174 is used ininitialized:

adding initialization in ga4076 of reg 174 at in block 16 for insn 172.
(insn 206 171 172 16 (set (reg:SI 174 [ ivtmp.7D.1419 ])
        (const_int 0 [0])) t.c:62 -1
     (nil))
(insn 172 206 176 16 (set (reg/v:SF 131 [ limit3D.1388 ])
        (mem:SF (reg:SI 174 [ ivtmp.7D.1419 ]) [2 MEM[base: _23,
offset: 0B]+0 S4 A32])) t.c:62 66 {*movsf_insn}
     (expr_list:REG_DEAD (reg:SI 174 [ ivtmp.7D.1419 ])
        (expr_list:REG_EQUAL (mem:SF (reg:SI 174 [ ivtmp.7D.1419 ]) [2
MEM[base: _23, offset: 0B]+0 S4 A32])
            (nil))))


And finally, combine merges the above to yield the final result:
Trying 206 -> 172:
Successfully matched this instruction:
(set (reg/v:SF 131 [ limit3D.1388 ])
    (mem:SF (const_int 0 [0]) [2 MEM[base: _23, offset: 0B]+0 S4 A32]))


The root cause is the bad REG_EQUAL note. I think the most robust
solution is to make the webizer re-compute notes before renaming.
Patch for that is attached.

Ciao!
Steven


        PR rtl-optimization/55006
        * web.c (web_main): Add the DF_NOTE problem, and explain whatfor.
Eric Botcazou - Nov. 19, 2012, 9:50 p.m.
> The root cause is the bad REG_EQUAL note. I think the most robust
> solution is to make the webizer re-compute notes before renaming.
> Patch for that is attached.

Thanks for the analysis.  However...

> Ciao!
> Steven
> 
> 
>         PR rtl-optimization/55006
>         * web.c (web_main): Add the DF_NOTE problem, and explain whatfor.
> 
> Index: web.c
> ===================================================================
> --- web.c       (revision 193454)
> +++ web.c       (working copy)
> @@ -335,9 +335,16 @@ web_main (void)
>    unsigned int uses_num = 0;
>    rtx insn;
> 
> +  /* Add the flags and problems we need.  The DF_EQ_NOTES flag is set so
> +     that uses of registers in REG_EQUAL and REG_EQUIV notes are included
> +     in the web that their DEF belongs to, so that these uses are also
> +     properly renamed.  The DF_NOTE problem is added to make sure that
> +     all notes are up-to-date and valid: Re-computing the notes problem
> +     also cleans up all dead REG_EQUAL notes.  */
>    df_set_flags (DF_NO_HARD_REGS + DF_EQ_NOTES);
>    df_set_flags (DF_RD_PRUNE_DEAD_DEFS);
>    df_chain_add_problem (DF_UD_CHAIN);
> +  df_note_add_problem ();
>    df_analyze ();
>    df_set_flags (DF_DEFER_INSN_RESCAN);

... that's not very satisfactory, as web doesn't use the DF_NOTE problem, so 
adding it just to clean things up in the other kind of notes is weird.

Can't we arrange to clean up the REG_EQUAL/REG_EQUIV notes when we use them, 
i.e. when DF_EQ_NOTES is set?
Steven Bosscher - Nov. 19, 2012, 10:03 p.m.
On Mon, Nov 19, 2012 at 10:50 PM, Eric Botcazou wrote:
>> The root cause is the bad REG_EQUAL note. I think the most robust
>> solution is to make the webizer re-compute notes before renaming.
>> Patch for that is attached.
>
> Thanks for the analysis.  However...
>
>> Ciao!
>> Steven
>>
>>
>>         PR rtl-optimization/55006
>>         * web.c (web_main): Add the DF_NOTE problem, and explain whatfor.
>>
>> Index: web.c
>> ===================================================================
>> --- web.c       (revision 193454)
>> +++ web.c       (working copy)
>> @@ -335,9 +335,16 @@ web_main (void)
>>    unsigned int uses_num = 0;
>>    rtx insn;
>>
>> +  /* Add the flags and problems we need.  The DF_EQ_NOTES flag is set so
>> +     that uses of registers in REG_EQUAL and REG_EQUIV notes are included
>> +     in the web that their DEF belongs to, so that these uses are also
>> +     properly renamed.  The DF_NOTE problem is added to make sure that
>> +     all notes are up-to-date and valid: Re-computing the notes problem
>> +     also cleans up all dead REG_EQUAL notes.  */
>>    df_set_flags (DF_NO_HARD_REGS + DF_EQ_NOTES);
>>    df_set_flags (DF_RD_PRUNE_DEAD_DEFS);
>>    df_chain_add_problem (DF_UD_CHAIN);
>> +  df_note_add_problem ();
>>    df_analyze ();
>>    df_set_flags (DF_DEFER_INSN_RESCAN);
>
> ... that's not very satisfactory, as web doesn't use the DF_NOTE problem, so
> adding it just to clean things up in the other kind of notes is weird.

True.

> Can't we arrange to clean up the REG_EQUAL/REG_EQUIV notes when we use them,
> i.e. when DF_EQ_NOTES is set?

That could be done, yes. Cleaning up the REG_EQ* notes requires
liveness at the insn level, so it'd require a bigger re-organization
of the code. Perhaps adding a new pass (conditional on DF_EQ_NOTES)
over all insn in df_lr_finalize, tracking liveness and calling
df_remove_dead_eq_notes on each insn.

But most passes that use the REG_EQ* notes don't set the DF_EQ_NOTES flag.

Perhaps df_note_add_problem should imply setting DF_EQ_NOTES?

Ciao!
Steven
Eric Botcazou - Nov. 19, 2012, 10:42 p.m.
> That could be done, yes. Cleaning up the REG_EQ* notes requires
> liveness at the insn level, so it'd require a bigger re-organization
> of the code. Perhaps adding a new pass (conditional on DF_EQ_NOTES)
> over all insn in df_lr_finalize, tracking liveness and calling
> df_remove_dead_eq_notes on each insn.
> 
> But most passes that use the REG_EQ* notes don't set the DF_EQ_NOTES flag.

Right.  After the DF merge, the de-facto consensus was that individual passes 
were still responsible for cleaning up the REG_EQ* notes when they change the 
code, whereas the REG_DEAD/REG_UNUSED notes are entirely handled by DF.  As 
such, the bug here is in the unroller, not in the webizer.

So I think that we should try to fix the unroller first.  If that's not really 
doable, then a fallback could be to shield the passes using DF_EQ_NOTES from 
dangling REG_EQ* notes by means of DF.  But using df_note_add_problem for this 
task seems to be a little far-fetched.

> Perhaps df_note_add_problem should imply setting DF_EQ_NOTES?

Do you mean the opposite, i.e setting DF_EQ_NOTES should imply adding DF_NOTE?
Steven Bosscher - Nov. 19, 2012, 10:47 p.m.
On Mon, Nov 19, 2012 at 11:42 PM, Eric Botcazou wrote:
>> That could be done, yes. Cleaning up the REG_EQ* notes requires
>> liveness at the insn level, so it'd require a bigger re-organization
>> of the code. Perhaps adding a new pass (conditional on DF_EQ_NOTES)
>> over all insn in df_lr_finalize, tracking liveness and calling
>> df_remove_dead_eq_notes on each insn.
>>
>> But most passes that use the REG_EQ* notes don't set the DF_EQ_NOTES flag.
>
> Right.  After the DF merge, the de-facto consensus was that individual passes
> were still responsible for cleaning up the REG_EQ* notes when they change the
> code, whereas the REG_DEAD/REG_UNUSED notes are entirely handled by DF.  As
> such, the bug here is in the unroller, not in the webizer.

Then it's in -fsplit-ivs-in-unroller. Looking into it.

Ciao!
Steven
Steven Bosscher - Nov. 23, 2012, 10:45 p.m.
On Mon, Nov 19, 2012 at 11:47 PM, Steven Bosscher wrote:
> On Mon, Nov 19, 2012 at 11:42 PM, Eric Botcazou wrote:
>>> That could be done, yes. Cleaning up the REG_EQ* notes requires
>>> liveness at the insn level, so it'd require a bigger re-organization
>>> of the code. Perhaps adding a new pass (conditional on DF_EQ_NOTES)
>>> over all insn in df_lr_finalize, tracking liveness and calling
>>> df_remove_dead_eq_notes on each insn.
>>>
>>> But most passes that use the REG_EQ* notes don't set the DF_EQ_NOTES flag.
>>
>> Right.  After the DF merge, the de-facto consensus was that individual passes
>> were still responsible for cleaning up the REG_EQ* notes when they change the
>> code, whereas the REG_DEAD/REG_UNUSED notes are entirely handled by DF.  As
>> such, the bug here is in the unroller, not in the webizer.
>
> Then it's in -fsplit-ivs-in-unroller. Looking into it.

This bug is giving me head-aches, but I think I finally understand
what is going wrong here.

Compiling the C test case with --param max-unroll-times=2, you get the
attached dumps. The loop that is unrolled to wrong-code is
bb10->bb7->bb8->bb9->bb10 in the loop2_unswitch dump (the last dump
before unrolling). The IV that is being split is in r132. In basic
block 9 of the loop2_unswitch dump there is the IV increment in insn
78:

   78 r132:SI=r132:SI+0x4

In the loop2_unroll dump the loop is
bb10->bb7->bb8->bb9->bb23->bb24->bb25->bb26->bb10. The loop is
unrolled by copying the body once, and the IV increment is expanded
into r165 at the end of basic block 9 in the loop2_unroll dump. This
makes r132 die after insn 181. The incremented IV is copied back into
r132 in insn 78, but there are no no-EQ_USES of this DEF so r132 is
still dead after insn 78. The IV increment of the unrolled loop
happens in insn 175 in basic block 25, which makes r132 live again:

  181 r165:SI=r132:SI+0x4
   78 r132:SI=r165:SI
...
  175 r132:SI=r165:SI+0x4

The insn with the REG_EQUAL note using r132 is insn 172 in the copied body:

  172 r131:SF=r158:SF
      REG_EQUAL: [r132:SI]
      REG_DEAD: r158:SF

With the (assumed agreed) semantics of REG_EQ* notes being invalid if
they use a register that isn't live in the insn with the note, this
REG_EQUAL note is invalid because r132 is dead after insn 78.

The note would be valid if loop2_unroll would copy-prop out r132 for
r165, or if it would remove the note in the copied body. I'd like to
do the former but DEF-USE chains aren't available at this point. We
could run CPROP before web (instead of web before CPROP like we do
now) which would make the REG_EQUAL note valid again, but that'd be
papering over the problem that loop2_unroll leaves this bad note.
Removing the note is easier but it may hurt optimization later on:
CSE2 puts the note back in but this introduces a pass ordering
dependency. Perhaps that's not a big problem, but I'm not sure...

Anyway, just so you know I haven't forgotten about this one yet :-)

Ciao!
Steven
Steven Bosscher - Nov. 24, 2012, 1:09 a.m.
On Fri, Nov 23, 2012 at 11:45 PM, Steven Bosscher wrote:
> Removing the note is easier but it may hurt optimization later on:
> CSE2 puts the note back in but this introduces a pass ordering
> dependency. Perhaps that's not a big problem, but I'm not sure...

Hmm, actually CSE2 fails to put the note back, I guess because it's a
local CSE pass only...

Ciao!
Steven

Patch

Index: web.c
===================================================================
--- web.c       (revision 193454)
+++ web.c       (working copy)
@@ -335,9 +335,16 @@  web_main (void)
   unsigned int uses_num = 0;
   rtx insn;

+  /* Add the flags and problems we need.  The DF_EQ_NOTES flag is set so
+     that uses of registers in REG_EQUAL and REG_EQUIV notes are included
+     in the web that their DEF belongs to, so that these uses are also
+     properly renamed.  The DF_NOTE problem is added to make sure that
+     all notes are up-to-date and valid: Re-computing the notes problem
+     also cleans up all dead REG_EQUAL notes.  */
   df_set_flags (DF_NO_HARD_REGS + DF_EQ_NOTES);
   df_set_flags (DF_RD_PRUNE_DEAD_DEFS);
   df_chain_add_problem (DF_UD_CHAIN);
+  df_note_add_problem ();
   df_analyze ();
   df_set_flags (DF_DEFER_INSN_RESCAN);