diff mbox

Fix bootstrap with -mno-accumulate-outgoing-args

Message ID 20140101142311.GC26209@kam.mff.cuni.cz
State New
Headers show

Commit Message

Jan Hubicka Jan. 1, 2014, 2:23 p.m. UTC
Hi,
this patch fixes ICE seen with -mno-accumulate-outgoing-args bootstrap building go runtime.
The ICE is in dwarf2cfi.c while checking that on all paths into given basic block stack
frames are same.  It goes away either with disabling crossjumping or sched2 but the problem
IMO really originates in combine-stack-adj.
Crossjumping combines two call seqences and breaks basic blocks in half of argument setups.

OK, sched2 reorders:

(insn 762 761 765 50 (set (mem/c:SI (plus:SI (reg/f:SI 7 sp)
                (const_int 112 [0x70])) [17 MEM[(struct  *)&bs + 8B]+0 S4 A64])
        (const_int 8 [0x8])) ../../../../libgo/go/encoding/binary/binary.go:273 90 {*movsi_internal}
     (nil))

(insn 765 762 766 50 (set (mem:HI (reg/f:SI 7 sp) [49  S2 A16])
        (reg/v:HI 1 dx [orig:232 v ] [232])) ../../../../libgo/go/encoding/binary/binary.go:273 91 {*movhi_internal}
     (expr_list:REG_DEAD (reg/v:HI 1 dx [orig:232 v ] [232])
        (expr_list:REG_ARGS_SIZE (const_int 16 [0x10])
            (nil))))
(insn 766 765 2326 50 (parallel [
            (set (reg/f:SI 7 sp)
                (plus:SI (reg/f:SI 7 sp)
                    (const_int -12 [0xfffffffffffffff4])))
            (clobber (reg:CC 17 flags))
        ]) ../../../../libgo/go/encoding/binary/binary.go:273 265 {*addsi_1}
     (expr_list:REG_UNUSED (reg:CC 17 flags)
        (expr_list:REG_ARGS_SIZE (const_int 28 [0x1c])
            (nil))))

to
(insn 766 761 762 54 (parallel [
            (set (reg/f:SI 7 sp)
                (plus:SI (reg/f:SI 7 sp)
                    (const_int -12 [0xfffffffffffffff4])))
            (clobber (reg:CC 17 flags))
        ]) ../../../../libgo/go/encoding/binary/binary.go:273 265 {*addsi_1}
     (expr_list:REG_UNUSED (reg:CC 17 flags) 
        (expr_list:REG_ARGS_SIZE (const_int 28 [0x1c])
            (nil))))
(insn:TI 762 766 765 54 (set (mem/c:SI (plus:SI (reg/f:SI 7 sp)
                (const_int 124 [0x7c])) [17 MEM[(struct  *)&bs + 8B]+0 S4 A64])
        (const_int 8 [0x8])) ../../../../libgo/go/encoding/binary/binary.go:273 90 {*movsi_internal}
     (nil))     
(insn 765 762 770 54 (set (mem:HI (plus:SI (reg/f:SI 7 sp)
                (const_int 12 [0xc])) [49  S2 A16])
        (reg/v:HI 1 dx [orig:232 v ] [232])) ../../../../libgo/go/encoding/binary/binary.go:273 91 {*movhi_internal}
     (expr_list:REG_DEAD (reg/v:HI 1 dx [orig:232 v ] [232])
        (expr_list:REG_ARGS_SIZE (const_int 16 [0x10])
            (nil))))

Here insn 765 gets adjusted for insn766 wihtout updating REG_ARGS_SIZE.  
This is IMO not bug in scheduler but bug in fact that insn 775 should not have the note on it
when it is not adjusting stack pointer. This is because originally it was push instruction,
but combine-stack-adjustments turned it into move.
I think it should remove the ARG_SIZE note when doing so, since argument block size should
be corrently annotated by preceeding stack adjustment insn.

Bootstrapped/regtested x86_64-linux. Did not double check it is solves the ARM issue
reported in PR, too.
OK?

	* combine-stack-adj.c (combine_stack_adjustments_for_block): Remove
	ARG_SIZE note when adjustment was eliminated.

Comments

Ryan Mansfield Jan. 2, 2014, 2:46 p.m. UTC | #1
On 14-01-01 09:23 AM, Jan Hubicka wrote:
> Here insn 765 gets adjusted for insn766 wihtout updating REG_ARGS_SIZE.
> This is IMO not bug in scheduler but bug in fact that insn 775 should not have the note on it
> when it is not adjusting stack pointer. This is because originally it was push instruction,
> but combine-stack-adjustments turned it into move.
> I think it should remove the ARG_SIZE note when doing so, since argument block size should
> be corrently annotated by preceeding stack adjustment insn.
>
> Bootstrapped/regtested x86_64-linux. Did not double check it is solves the ARM issue
> reported in PR, too.

FWIW, I tested your patch and it doesn't address the ICE on ARM.

Regards,

Ryan Mansfield
Jan Hubicka Jan. 17, 2014, 9:32 p.m. UTC | #2
> 	* combine-stack-adj.c (combine_stack_adjustments_for_block): Remove
> 	ARG_SIZE note when adjustment was eliminated.

Ping...  This patch prevents me from switching the accumulate-args default
for generic and I am waiting for that witht he inliner tunning, so there is
quite a dependency chain.

Thanks,
HOnza
Jeff Law Jan. 21, 2014, 10:01 p.m. UTC | #3
On 01/17/14 14:32, Jan Hubicka wrote:
>> 	* combine-stack-adj.c (combine_stack_adjustments_for_block): Remove
>> 	ARG_SIZE note when adjustment was eliminated.
>
> Ping...  This patch prevents me from switching the accumulate-args default
> for generic and I am waiting for that witht he inliner tunning, so there is
> quite a dependency chain.
This is fine.

Can you add a testcase for this?

jeff
Jan Hubicka Jan. 21, 2014, 11:55 p.m. UTC | #4
> On 01/17/14 14:32, Jan Hubicka wrote:
> >>	* combine-stack-adj.c (combine_stack_adjustments_for_block): Remove
> >>	ARG_SIZE note when adjustment was eliminated.
> >
> >Ping...  This patch prevents me from switching the accumulate-args default
> >for generic and I am waiting for that witht he inliner tunning, so there is
> >quite a dependency chain.
> This is fine.
> 
> Can you add a testcase for this?

It occurs during bootstrap compiling go, so I think we should have this covered.
I will try to construct some artificial testcase in C, but it seems a bit
tricky.

Thank you!
Jan
> 
> jeff
H.J. Lu Jan. 23, 2014, 4:23 a.m. UTC | #5
On Tue, Jan 21, 2014 at 3:55 PM, Jan Hubicka <hubicka@ucw.cz> wrote:
>> On 01/17/14 14:32, Jan Hubicka wrote:
>> >>    * combine-stack-adj.c (combine_stack_adjustments_for_block): Remove
>> >>    ARG_SIZE note when adjustment was eliminated.
>> >
>> >Ping...  This patch prevents me from switching the accumulate-args default
>> >for generic and I am waiting for that witht he inliner tunning, so there is
>> >quite a dependency chain.
>> This is fine.
>>
>> Can you add a testcase for this?
>
> It occurs during bootstrap compiling go, so I think we should have this covered.
> I will try to construct some artificial testcase in C, but it seems a bit
> tricky.
>

Enable X86_TUNE_ACCUMULATE_OUTGOING_ARGS for generic
caused:

FAIL: gcc.dg/guality/pr54519-1.c  -O2  line 20 y == 25
FAIL: gcc.dg/guality/pr54519-1.c  -O2  line 20 z == 6
FAIL: gcc.dg/guality/pr54519-1.c  -O2  line 23 y == 117
FAIL: gcc.dg/guality/pr54519-1.c  -O2  line 23 z == 8
FAIL: gcc.dg/guality/pr54519-1.c  -O3 -fomit-frame-pointer  line 20 x == 36
FAIL: gcc.dg/guality/pr54519-1.c  -O3 -fomit-frame-pointer  line 20 y == 25
FAIL: gcc.dg/guality/pr54519-1.c  -O3 -fomit-frame-pointer  line 20 z == 6
FAIL: gcc.dg/guality/pr54519-1.c  -O3 -g  line 20 x == 36
FAIL: gcc.dg/guality/pr54519-1.c  -O3 -g  line 20 y == 25
FAIL: gcc.dg/guality/pr54519-1.c  -O3 -g  line 20 z == 6
FAIL: gcc.dg/guality/pr54519-3.c  -O2 -flto -fno-use-linker-plugin
-flto-partition=none  line 20 x == 36
FAIL: gcc.dg/guality/pr54519-3.c  -O2 -flto -fno-use-linker-plugin
-flto-partition=none  line 23 x == 98
FAIL: gcc.dg/guality/pr54519-3.c  -O2  line 20 x == 36
FAIL: gcc.dg/guality/pr54519-3.c  -O2  line 20 y == 25
FAIL: gcc.dg/guality/pr54519-3.c  -O2  line 20 z == 6
FAIL: gcc.dg/guality/pr54519-3.c  -O2  line 23 x == 98
FAIL: gcc.dg/guality/pr54519-3.c  -O2  line 23 y == 117
FAIL: gcc.dg/guality/pr54519-3.c  -O2  line 23 z == 8
FAIL: gcc.dg/guality/pr54519-3.c  -O3 -fomit-frame-pointer  line 20 x == 36
FAIL: gcc.dg/guality/pr54519-3.c  -O3 -fomit-frame-pointer  line 20 y == 25
FAIL: gcc.dg/guality/pr54519-3.c  -O3 -fomit-frame-pointer  line 20 z == 6
FAIL: gcc.dg/guality/pr54519-3.c  -O3 -fomit-frame-pointer  line 23 x == 98
FAIL: gcc.dg/guality/pr54519-3.c  -O3 -fomit-frame-pointer  line 23 y == 117
FAIL: gcc.dg/guality/pr54519-3.c  -O3 -fomit-frame-pointer  line 23 z == 8
FAIL: gcc.dg/guality/pr54519-3.c  -O3 -g  line 20 x == 36
FAIL: gcc.dg/guality/pr54519-3.c  -O3 -g  line 20 y == 25
FAIL: gcc.dg/guality/pr54519-3.c  -O3 -g  line 20 z == 6
FAIL: gcc.dg/guality/pr54519-3.c  -O3 -g  line 23 x == 98
FAIL: gcc.dg/guality/pr54519-3.c  -O3 -g  line 23 y == 117
FAIL: gcc.dg/guality/pr54519-3.c  -O3 -g  line 23 z == 8
FAIL: gcc.dg/guality/pr54693-2.c  -O3 -fomit-frame-pointer
-funroll-all-loops -finline-functions  line 21 i == v + 1
FAIL: gcc.dg/guality/pr54693-2.c  -O3 -fomit-frame-pointer
-funroll-loops  line 21 i == v + 1
FAIL: gcc.target/i386/pr35767-5.c scan-assembler-not movups

Should we fix

FAIL: gcc.target/i386/pr35767-5.c scan-assembler-not movups
Jan Hubicka Jan. 23, 2014, 3:40 p.m. UTC | #6
> FAIL: gcc.dg/guality/pr54693-2.c  -O3 -fomit-frame-pointer
> -funroll-loops  line 21 i == v + 1
> FAIL: gcc.target/i386/pr35767-5.c scan-assembler-not movups
> 
> Should we fix
> 
> FAIL: gcc.target/i386/pr35767-5.c scan-assembler-not movups

I will look into this one.  It did not failed for me previously.
I wonder if the guality case is broken unwind info - will try to check.

Honza
> 
> 
> -- 
> H.J.
Richard Henderson Feb. 4, 2014, 4:23 p.m. UTC | #7
On 01/01/2014 06:23 AM, Jan Hubicka wrote:
>  	      last_sp_adjust = 0;
> +	      /* We no longer adjust stack size.  Whoever adjusted it earlier
> +		 hopefully got the note right.  */
> +	      note = find_reg_note (insn, REG_ARGS_SIZE, NULL_RTX);
> +	      if (note)
> +		remove_note (insn, note);
>  	      continue;

This is a totally unfounded hope, by the way.  I'm certain that I tried this
the first time around, with similarly poor results.

See the g++.dg/opt/pr52727.C test case that fails with this patch.  I can only
assume you didn't examine i686 test results when you committed this?

The problem is that the last adjustment _did_ apply the correct note
adjustment, for the stack at that time.  But neither moving the note nor
dropping the note is correct.

Consider

					// args_size 4
	add	$4, %esp		// args_size 0
	push	%eax			// no note; fpu, not call related

transformed by csa to

	mov	%eax, (%esp)

If we move the note from the add to the mov, as we did prior to your patch, we
retain the essence of the pop, that we're no longer queueing arguments.  But if
we drop the note, as we do with your patch, then we still believe that we have
4 bytes of arguments on the stack.

You say a bootstrap with Go was the testcase that sent you in this direction?
I'll have a look...


r~
Jan Hubicka Feb. 4, 2014, 4:48 p.m. UTC | #8
> On 01/01/2014 06:23 AM, Jan Hubicka wrote:
> >  	      last_sp_adjust = 0;
> > +	      /* We no longer adjust stack size.  Whoever adjusted it earlier
> > +		 hopefully got the note right.  */
> > +	      note = find_reg_note (insn, REG_ARGS_SIZE, NULL_RTX);
> > +	      if (note)
> > +		remove_note (insn, note);
> >  	      continue;
> 
> This is a totally unfounded hope, by the way.  I'm certain that I tried this

Sorry to hear that - that assumption seemed right in the testcase I loked into.

> the first time around, with similarly poor results.
> 
> See the g++.dg/opt/pr52727.C test case that fails with this patch.  I can only
> assume you didn't examine i686 test results when you committed this?
> 
> The problem is that the last adjustment _did_ apply the correct note
> adjustment, for the stack at that time.  But neither moving the note nor
> dropping the note is correct.
> 
> Consider
> 
> 					// args_size 4
> 	add	$4, %esp		// args_size 0
> 	push	%eax			// no note; fpu, not call related
> 
> transformed by csa to
> 
> 	mov	%eax, (%esp)
> 
> If we move the note from the add to the mov, as we did prior to your patch, we
> retain the essence of the pop, that we're no longer queueing arguments.  But if
> we drop the note, as we do with your patch, then we still believe that we have
> 4 bytes of arguments on the stack.
> 
> You say a bootstrap with Go was the testcase that sent you in this direction?
> I'll have a look...

Yes, if you remove the patch you should get ICE during go x86_64 bootstrap with
generic tuning.

I tried to describe the situation in original mail and the PR - the sequence was
same as you describe here, but comming from two consecutive calls, so args_size
was 8, then reduced to 0 by add and then increased to 8 by push.

I see in the testcase the psh comes from splitter - that is something I did
not consider indded. I am sorry for that.
How things are supposed to work in this case? So perhaps we need scheduler to
understand and move around the ARGS_SIZE note?

Honza
Richard Henderson Feb. 4, 2014, 5:06 p.m. UTC | #9
On 02/04/2014 08:48 AM, Jan Hubicka wrote:
> How things are supposed to work in this case? So perhaps we need scheduler to
> understand and move around the ARGS_SIZE note?

I believe we do need to have the scheduler move the notes around.


r~
Jan Hubicka Feb. 4, 2014, 5:35 p.m. UTC | #10
> On 02/04/2014 08:48 AM, Jan Hubicka wrote:
> > How things are supposed to work in this case? So perhaps we need scheduler to
> > understand and move around the ARGS_SIZE note?
> 
> I believe we do need to have the scheduler move the notes around.

If we need notes on non-stack adjusting insns as you seem to show in your testcase,
I guess it is the only way around.  Still combine stack adjust may be smart enough
to not produce redundant note in the case of go's ICE.

I am not terribly familiar with the code, will you look into it or shall I give it a try?

Thanks!
Honza
> 
> 
> r~
Richard Henderson Feb. 4, 2014, 5:40 p.m. UTC | #11
On 02/04/2014 09:35 AM, Jan Hubicka wrote:
> I am not terribly familiar with the code, will you look into it or shall I give it a try?

I'm looking at it.  Was there a PR for the Go bootstrap failure?


r~
Richard Henderson Feb. 5, 2014, 3:40 p.m. UTC | #12
On 02/04/2014 09:35 AM, Jan Hubicka wrote:
>> On 02/04/2014 08:48 AM, Jan Hubicka wrote:
>>> How things are supposed to work in this case? So perhaps we need scheduler to
>>> understand and move around the ARGS_SIZE note?
>>
>> I believe we do need to have the scheduler move the notes around.
> 
> If we need notes on non-stack adjusting insns as you seem to show in your testcase,
> I guess it is the only way around.  Still combine stack adjust may be smart enough
> to not produce redundant note in the case of go's ICE.
> 
> I am not terribly familiar with the code, will you look into it or shall I give it a try?

I had a brief look at find_modifiable_mems, which seems to be the only kind of
schedule-time dependency breaking that we do.  I started writing some new code
that would handle REG_ARGS_SIZE, but then considered that wasn't terribly
appropriate for stage4.

So I've left off with just the dependency links between the insns.  We'd need
these for the schedule-time adjustments anyway, and with them in place the
scheduler does not re-order the insns in a way that causes problems.

Testing for x86_64 has finished; i686 is nearly done.  I suppose the only
question I have is if anyone disagrees that OUTPUT is incorrect as the
dependency type.


r~
diff mbox

Patch

Index: combine-stack-adj.c
===================================================================
--- combine-stack-adj.c	(revision 206233)
+++ combine-stack-adj.c	(working copy)
@@ -567,6 +567,7 @@  combine_stack_adjustments_for_block (bas
 	      && try_apply_stack_adjustment (insn, reflist, 0,
 					     -last_sp_adjust))
 	    {
+	      rtx note;
 	      if (last2_sp_set)
 		maybe_move_args_size_note (last2_sp_set, last_sp_set, false);
 	      else
@@ -576,6 +577,11 @@  combine_stack_adjustments_for_block (bas
 	      reflist = NULL;
 	      last_sp_set = NULL_RTX;
 	      last_sp_adjust = 0;
+	      /* We no longer adjust stack size.  Whoever adjusted it earlier
+		 hopefully got the note right.  */
+	      note = find_reg_note (insn, REG_ARGS_SIZE, NULL_RTX);
+	      if (note)
+		remove_note (insn, note);
 	      continue;
 	    }
 	}