[RFC] Cleanup DW_CFA_GNU_args_size handling

Message ID 4E36D44A.6020607@redhat.com
State New

Commit Message

Richard Henderson Aug. 1, 2011, 4:28 p.m. UTC
This is related primarily to PR49864 but also to PR49879.

The fundamental problem in the first test case is that the
AVR target cannot perform

   (set (stack-pointer-rtx)
	(plus (stack-pointer-rtx) (const_int large)))

where "large" is in fact really quite small.  This gets
dutifully reloaded to

   (set (reg temp) (stack-pointer-rtx))

   (set (reg temp) (plus (reg temp) (const_int large)))

   (set (stack-pointer-rtx) (reg temp))

which, we have to admit, is perfectly reasonable.

The code that we had in stack_adjust_offset would not
properly handle anything except immediate addends to
the stack pointer.  Indeed, if it saw any other sort
of stack pointer adjustment, it gave up and assumed 
that we'd reset the args_size level to 0.
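
To make the failure mode concrete, here is a toy model in C of the
old bookkeeping.  It is not GCC code: toy_kind, toy_insn and
toy_old_args_size are names invented for this illustration.  The only
behaviour taken from the description above is that a constant addend
to the stack pointer is tracked, and that a register copy into the
stack pointer is assumed to reset args_size to 0.

```c
#include <stddef.h>

/* Hypothetical, heavily simplified insn kinds; this is not GCC's RTL.  */
enum toy_kind {
    TOY_SP_ADD_CONST,   /* (set sp (plus sp (const_int value))) */
    TOY_SP_FROM_REG,    /* (set sp (reg temp)) */
    TOY_OTHER           /* anything that does not set sp */
};

struct toy_insn {
    enum toy_kind kind;
    long value;         /* addend, only meaningful for TOY_SP_ADD_CONST */
};

/* Toy model of the old heuristic, assuming a downward-growing stack:
   adding a constant to sp pops that many bytes of arguments, and a
   register copy into sp is assumed to bring args_size back to 0.  */
long toy_old_args_size(const struct toy_insn *insns, size_t n, long args_size)
{
    for (size_t i = 0; i < n; i++) {
        switch (insns[i].kind) {
        case TOY_SP_ADD_CONST:
            args_size -= insns[i].value;
            break;
        case TOY_SP_FROM_REG:
            args_size = 0;      /* the "give up" case */
            break;
        case TOY_OTHER:
            break;
        }
    }
    return args_size;
}
```

With 12 bytes of arguments pushed, a direct "sp += 4" leaves the model
at 8.  The reloaded mov/add/mov form (two insns the model ignores,
then a copy into sp) leaves it at 0, even though 8 bytes are still
outstanding.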

This exact problem hasn't shown up on other targets
either (1) because they use ACCUMULATE_OUTGOING_ARGS, or
(2) because, while they may not be able to handle very
large addends to the stack pointer, we usually don't let
the pushed arguments grow to very large numbers; we
regularly reduce stack_pointer_delta.

AVR is special in that it can't *really* perform any
direct addition to the stack pointer at all; "large"
is non-zero only because of a peephole pattern to use
pop insns into a scratch register for small adjustments.

I briefly considered adding an insn pattern that would
handle the entire mov/add/mov thing with a scratch, so
that we wouldn't have to do anything special.  It seemed
relatively unlikely that any other port would be this
severely limited.

However, then came a look at the second PR, where we
performed cross-jumping between two blocks with different
args_size values.

For the specific case of the PR, this "merely" resulted
in wrong-debug, because we would in fact generate invalid
unwind info for one of the two paths.

That said, I see no reason why this same cross-jump could
not occur with a real throw as opposed to an abort.  And
in that case we would have wrong-code.
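
The cfgcleanup.c hunk in this patch refuses to match two insns whose
REG_ARGS_SIZE notes differ.  A minimal sketch of that comparison in
plain C follows; toy_note and toy_args_size_compatible are invented
names, and the struct merely stands in for "note present or absent,
plus its value".

```c
#include <stdbool.h>

/* Hypothetical stand-in for an optional REG_ARGS_SIZE note on an insn.  */
struct toy_note {
    bool present;       /* does the insn carry a note at all? */
    long args_size;     /* value of the note when present */
};

/* Two insns may be cross-jumped only if both lack the note, or both
   carry it with the same value.  A missing note and a note of 0 are
   deliberately treated as different in this sketch.  */
bool toy_args_size_compatible(struct toy_note a, struct toy_note b)
{
    if (!a.present && !b.present)
        return true;
    if (a.present != b.present)
        return false;
    return a.args_size == b.args_size;
}
```

Under this rule, two calls to abort reached with 4 and 8 bytes of
pushed arguments respectively are not merged, so each path keeps
unwind info that matches its own stack level.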

I consulted the IRC oracle, and Ian didn't have any
objections in principle to a new reg-note.

Comments?


r~

	* reg-notes.def (REG_ARGS_SIZE): New.
	* calls.c (emit_call_1): Emit REG_ARGS_SIZE for call_pop.
	(expand_call): Add REG_ARGS_SIZE to emit_stack_restore.
	* cfgcleanup.c (old_insns_match_p): Don't allow cross-jumping to
	different stack levels.
	* combine-stack-adj.c (adjust_frame_related_expr): Remove.
	(maybe_move_args_size_note): New.
	(combine_stack_adjustments_for_block): Use it.
	* combine.c (distribute_notes): Place REG_ARGS_SIZE.
	* dwarf2cfi.c (dw_cfi_row_struct): Remove args_size member.
	(dw_trace_info): Add beg_true_args_size, end_true_args_size,
	beg_delay_args_size, end_delay_args_size, eh_head, args_size_undefined.
	(cur_cfa): New.
	(queued_args_size): Remove.
	(add_cfi_args_size): Assert size is non-negative.
	(stack_adjust_offset, dwarf2out_args_size): Remove.
	(dwarf2out_stack_adjust, dwarf2out_notice_stack_adjust): Remove.
	(notice_args_size, notice_eh_throw): New.
	(dwarf2out_frame_debug_def_cfa): Use cur_cfa.
	(dwarf2out_frame_debug_adjust_cfa): Likewise.
	(dwarf2out_frame_debug_cfa_offset): Likewise.
	(dwarf2out_frame_debug_expr): Likewise.  Don't stack_adjust_offset.
	(dwarf2out_frame_debug): Don't handle non-frame-related-p insns.
	(change_cfi_row): Don't emit args_size.
	(maybe_record_trace_start_abnormal): Split out from ...
	(maybe_record_trace_start): Here.  Set args_size_undefined.
	(create_trace_edges): Update to match.
	(scan_trace): Handle REG_ARGS_SIZE.
	(connect_traces): Connect args_size between EH insns.
	* emit-rtl.c (try_split): Handle REG_ARGS_SIZE.
	* explow.c (suppress_reg_args_size): New.
	(adjust_stack_1): Split out from ...
	(adjust_stack): ... here.
	(anti_adjust_stack): Use it.
	(allocate_dynamic_stack_space): Suppress REG_ARGS_SIZE.
	* expr.c (mem_autoinc_base): New.
	(fixup_args_size_notes): New.
	(emit_single_push_insn_1): Rename from emit_single_push_insn.
	(emit_single_push_insn): New.  Generate REG_ARGS_SIZE.
	* recog.c (peep2_attempt): Handle REG_ARGS_SIZE.
	* reload1.c (reload_as_needed): Likewise.
	* rtl.h (fixup_args_size_notes): Declare.

Comments

Georg-Johann Lay Aug. 1, 2011, 6:42 p.m. UTC | #1
Richard Henderson wrote:
> This is related primarily to PR49864 but also to PR49879.
> 
> The fundamental problem in the first test case is that the
> AVR target cannot perform
> 
>    (set (stack-pointer-rtx)
> 	(plus (stack-pointer-rtx) (const_int large)))
> 
> where "large" is in fact really quite small.  This gets
> dutifully reloaded to
> 
>    (set (reg temp) (stack-pointer-rtx))
> 
>    (set (reg temp) (plus (reg temp) (const_int large)))
> 
>    (set (stack-pointer-rtx) (reg temp))
> 
> which, we have to admit, is perfectly reasonable.
> 
> The code that we had in stack_adjust_offset would not
> properly handle anything except immediate addends to
> the stack pointer.  Indeed, if it saw any other sort
> of stack pointer adjustment, it gave up and assumed 
> that we'd reset the args_size level to 0.
> 
> This exact problem hasn't shown up on other targets
> either (1) because they ACCUMULATE_OUTGOING_ARGS, or
> (2) while they may not be able to handle very large
> addends to the stack pointer, we usually don't let
> the pushed arguments grow to very large numbers; we
> regularly reduce stack_pointer_delta.
> 
> AVR is special in that it can't *really* perform any
> direct addition to the stack pointer at all; "large"
> is non-zero only because of a peephole pattern to use
> pop insns into a scratch register for small adjustments.

I'm CCing Denis.

http://gcc.gnu.org/ml/gcc-patches/2011-08/msg00075.html

Is there a specific reason not to define
ACCUMULATE_OUTGOING_ARGS on AVR?

Adding to or subtracting from SP is very expensive on AVR;
even IRQs have to be disabled.

Johann

> I briefly considered adding an insn pattern that would
> handle the entire mov/add/mov thing with a scratch, so
> that we wouldn't have to do anything special.  It seemed
> relatively unlikely that any other port would be this
> severely limited.
> 
> However, then came a look at the second PR, where we
> performed cross-jumping between two blocks with different
> args_size values.
> 
> For the specific case of the PR, this "merely" resulted
> in wrong-debug, because we would in fact generate invalid
> unwind info for one of the two paths.
> 
> That said, I see no reason why this same cross-jump could
> not occur with a real throw as opposed to an abort.  And
> in that case we would have wrong-code.
> 
> I consulted the IRC oracle, and Ian didn't have any
> objections in principle to a new reg-note.
> 
> Comments?
> 
> r~
Richard Henderson Aug. 1, 2011, 6:46 p.m. UTC | #2
On 08/01/2011 11:42 AM, Georg-Johann Lay wrote:
> Is there a specific reason not to define
> ACCUMULATE_OUTGOING_ARGS on AVR?

Yes.  So that you can use PUSH.  But as I said in PR49881,
you probably want to provide -maccumulate-outgoing-args.

I have a follow-up patch to the last one in that PR...


r~
Denis Chertykov Aug. 1, 2011, 7:52 p.m. UTC | #3
2011/8/1 Richard Henderson <rth@redhat.com>:
> On 08/01/2011 11:42 AM, Georg-Johann Lay wrote:
>> Is there a specific reason not to define
>> ACCUMULATE_OUTGOING_ARGS on AVR?

I haven't defined ACCUMULATE_OUTGOING_ARGS because AVR has a very small
displacement for memory addressing (63 bytes), and I think it is better
to have the smallest possible frame size (because local variables beyond
the fp+63 boundary are difficult to address).

Generally, ACCUMULATE_OUTGOING_ARGS enlarges the frame size.

Denis.

PS: I didn't try ACCUMULATE_OUTGOING_ARGS for AVR. Maybe the results will
be better than I think.
Richard Henderson Aug. 2, 2011, 10:32 p.m. UTC | #4
I got Jeff Law to review the reload change on IRC
and committed the composite patch.

Tested on x86_64, i586, avr, and h8300.  Most other
tier1 targets ought not be affected, as this patch
only applies to ACCUMULATE_OUTGOING_ARGS == 0 targets.


r~
Georg-Johann Lay Aug. 3, 2011, 12:14 p.m. UTC | #5
Richard Henderson wrote:
> On 08/01/2011 11:42 AM, Georg-Johann Lay wrote:
>> Is there a specific reason not to define
>> ACCUMULATE_OUTGOING_ARGS on AVR?
> 
> Yes.  So that you can use PUSH.  But as I said in PR49881,
> you probably want to provide -maccumulate-outgoing-args.
> 
> I have a follow-up patch to the last one in that PR...
> 
> 
> r~

PUSH is fine but what about POP?

It's very expensive to pop several bytes, i.e. disabling IRQs, loading and storing SP and the like.
Using store+displacement does not have this drawback and, as I wrote, some code degradations you explained
in PR49881 are artifacts of PR46278, i.e. fake X addressing.

Johann
Georg-Johann Lay Aug. 3, 2011, 2:07 p.m. UTC | #6
Georg-Johann Lay wrote:
> Richard Henderson wrote:
>> On 08/01/2011 11:42 AM, Georg-Johann Lay wrote:
>>> Is there a specific reason not to define
>>> ACCUMULATE_OUTGOING_ARGS on AVR?
>> Yes.  So that you can use PUSH.  But as I said in PR49881,
>> you probably want to provide -maccumulate-outgoing-args.
>>
>> I have a follow-up patch to the last one in that PR...
>>
>>
>> r~
> 
> PUSH is fine but what about POP?
> 
> It's very expensive to pop several bytes, i.e. disabling IRQs, loading and storing SP and the like.
> Using store+displacement does not have this drawback and, as I wrote, some code degradations you explained
> in PR49881 are artifacts of PR46278, i.e. fake X addressing.
> 
> Johann
> 

Tried this test case:

#include <stdio.h>

void foo ()
{
    printf ("%d %d %d", 1, 2, 3);
    printf ("%d %d %d", 3, 4, 5);
    printf ("%d %d %d", 1, 4, 5);
}

Attached the output: The compiler happily pushes onto the stack
but pops only at the end of the function. So in a function with
many such calls that would eat up a great deal of RAM. Is that
what we want?

RETURN_POPS_ARGS cannot help here.

Johann

	.file	"printf.c"
__SREG__ = 0x3f
__SP_H__ = 0x3e
__SP_L__ = 0x3d
__tmp_reg__ = 0
__zero_reg__ = 1
 ;  GNU C (GCC) version 4.7.0 20110803 (experimental) (avr)
 ; 	compiled by GNU C version 4.3.2 [gcc-4_3-branch revision 141291], GMP version 5.0.1, MPFR version 3.0.0-p8, MPC version 0.8.2
 ;  GGC heuristics: --param ggc-min-expand=30 --param ggc-min-heapsize=4096
 ;  options passed:  printf.c -Os -fverbose-asm
 ;  options enabled:  -fauto-inc-dec -fbranch-count-reg -fcaller-saves
 ;  -fcombine-stack-adjustments -fcommon -fcompare-elim -fcprop-registers
 ;  -fcrossjumping -fcse-follow-jumps -fdebug-types-section -fdefer-pop
 ;  -fdevirtualize -fearly-inlining -feliminate-unused-debug-types
 ;  -fexpensive-optimizations -fforward-propagate -ffunction-cse -fgcse
 ;  -fgcse-lm -fguess-branch-probability -fident -fif-conversion
 ;  -fif-conversion2 -findirect-inlining -finline -finline-functions
 ;  -finline-functions-called-once -finline-small-functions -fipa-cp
 ;  -fipa-profile -fipa-pure-const -fipa-reference -fipa-sra
 ;  -fira-share-save-slots -fira-share-spill-slots -fivopts
 ;  -fkeep-static-consts -fleading-underscore -fmath-errno
 ;  -fmerge-constants -fmerge-debug-strings -fmove-loop-invariants
 ;  -fomit-frame-pointer -foptimize-register-move -foptimize-sibling-calls
 ;  -fpartial-inlining -fpeephole -fpeephole2 -fprefetch-loop-arrays
 ;  -freg-struct-return -fregmove -freorder-blocks -freorder-functions
 ;  -frerun-cse-after-loop -fsched-critical-path-heuristic
 ;  -fsched-dep-count-heuristic -fsched-group-heuristic -fsched-interblock
 ;  -fsched-last-insn-heuristic -fsched-rank-heuristic -fsched-spec
 ;  -fsched-spec-insn-heuristic -fsched-stalled-insns-dep -fshow-column
 ;  -fsigned-zeros -fsplit-ivs-in-unroller -fsplit-wide-types
 ;  -fstrict-aliasing -fstrict-overflow -fstrict-volatile-bitfields
 ;  -fthread-jumps -ftoplevel-reorder -ftrapping-math -ftree-bit-ccp
 ;  -ftree-builtin-call-dce -ftree-ccp -ftree-ch -ftree-copy-prop
 ;  -ftree-copyrename -ftree-dce -ftree-dominator-opts -ftree-dse
 ;  -ftree-forwprop -ftree-fre -ftree-loop-if-convert -ftree-loop-im
 ;  -ftree-loop-ivcanon -ftree-loop-optimize -ftree-parallelize-loops=
 ;  -ftree-phiprop -ftree-pre -ftree-pta -ftree-reassoc -ftree-scev-cprop
 ;  -ftree-sink -ftree-slp-vectorize -ftree-sra -ftree-switch-conversion
 ;  -ftree-ter -ftree-vect-loop-version -ftree-vrp -funit-at-a-time
 ;  -fverbose-asm -fzero-initialized-in-bss

	.section	.rodata.str1.1,"aMS",@progbits,1
.LC0:
	.string	"%d %d %d"
	.text
.global	foo
	.type	foo, @function
foo:
	push r14	 ; 	 ;  66	pushqi1/1	[length = 1]
	push r15	 ; 	 ;  67	pushqi1/1	[length = 1]
	push r16	 ; 	 ;  68	pushqi1/1	[length = 1]
	push r17	 ; 	 ;  69	pushqi1/1	[length = 1]
	push r28	 ; 	 ;  70	pushqi1/1	[length = 1]
	push r29	 ; 	 ;  71	pushqi1/1	[length = 1]
/* prologue: function */
/* frame size = 0 */
/* stack size = 6 */
.L__stack_usage = 6
	push __zero_reg__	 ;  5	pushqi1/2	[length = 1]
	ldi r24,lo8(3)	 ; ,	 ;  82	*reload_inqi	[length = 2]
	mov r14,r24	 ;  tmp42,
	push r14	 ;  tmp42	 ;  7	pushqi1/1	[length = 1]
	push __zero_reg__	 ;  8	pushqi1/2	[length = 1]
	ldi r24,lo8(2)	 ;  tmp43,	 ;  9	*movqi/2	[length = 1]
	push r24	 ;  tmp43	 ;  10	pushqi1/1	[length = 1]
	push __zero_reg__	 ;  11	pushqi1/2	[length = 1]
	ldi r17,lo8(1)	 ;  tmp44,	 ;  12	*movqi/2	[length = 1]
	push r17	 ;  tmp44	 ;  13	pushqi1/1	[length = 1]
	ldi r28,lo8(.LC0)	 ;  tmp45,	 ;  14	*movhi/4	[length = 2]
	ldi r29,hi8(.LC0)	 ;  tmp45,
	push r29	 ;  tmp24	 ;  16	pushqi1/1	[length = 1]
	push r28	 ;  tmp25	 ;  19	pushqi1/1	[length = 1]
	rcall printf	 ; 	 ;  20	*call_value_insn/2	[length = 1]
	push __zero_reg__	 ;  21	pushqi1/2	[length = 1]
	ldi r25,lo8(5)	 ; ,	 ;  83	*reload_inqi	[length = 2]
	mov r15,r25	 ;  tmp49,
	push r15	 ;  tmp49	 ;  23	pushqi1/1	[length = 1]
	push __zero_reg__	 ;  24	pushqi1/2	[length = 1]
	ldi r16,lo8(4)	 ;  tmp50,	 ;  25	*movqi/2	[length = 1]
	push r16	 ;  tmp50	 ;  26	pushqi1/1	[length = 1]
	push __zero_reg__	 ;  27	pushqi1/2	[length = 1]
	push r14	 ;  tmp42	 ;  29	pushqi1/1	[length = 1]
	push r29	 ;  tmp24	 ;  32	pushqi1/1	[length = 1]
	push r28	 ;  tmp25	 ;  35	pushqi1/1	[length = 1]
	rcall printf	 ; 	 ;  36	*call_value_insn/2	[length = 1]
	push __zero_reg__	 ;  37	pushqi1/2	[length = 1]
	push r15	 ;  tmp49	 ;  39	pushqi1/1	[length = 1]
	push __zero_reg__	 ;  40	pushqi1/2	[length = 1]
	push r16	 ;  tmp50	 ;  42	pushqi1/1	[length = 1]
	push __zero_reg__	 ;  43	pushqi1/2	[length = 1]
	push r17	 ;  tmp44	 ;  45	pushqi1/1	[length = 1]
	push r29	 ;  tmp24	 ;  48	pushqi1/1	[length = 1]
	push r28	 ;  tmp25	 ;  51	pushqi1/1	[length = 1]
	rcall printf	 ; 	 ;  52	*call_value_insn/2	[length = 1]
	in r24,__SP_L__	 ; 	 ;  64	*movhi_sp/2	[length = 2]
	in r25,__SP_H__	 ; 
	adiw r24,24	 ; ,	 ;  53	*addhi3/2	[length = 1]
	in __tmp_reg__,__SREG__	 ;  65	*movhi_sp/1	[length = 5]
	cli
	out __SP_H__,r25	 ; 
	out __SREG__,__tmp_reg__
	out __SP_L__,r24	 ; 
/* epilogue start */
	pop r29	 ; 	 ;  74	popqi	[length = 1]
	pop r28	 ; 	 ;  75	popqi	[length = 1]
	pop r17	 ; 	 ;  76	popqi	[length = 1]
	pop r16	 ; 	 ;  77	popqi	[length = 1]
	pop r15	 ; 	 ;  78	popqi	[length = 1]
	pop r14	 ; 	 ;  79	popqi	[length = 1]
	ret	 ;  80	return_from_epilogue	[length = 1]
	.size	foo, .-foo
	.ident	"GCC: (GNU) 4.7.0 20110803 (experimental)"
.global __do_copy_data
H.J. Lu Aug. 3, 2011, 2:31 p.m. UTC | #7
On Tue, Aug 2, 2011 at 3:32 PM, Richard Henderson <rth@redhat.com> wrote:
> I got Jeff Law to review the reload change on IRC
> and committed the composite patch.
>
> Tested on x86_64, i586, avr, and h8300.  Most other
> tier1 targets ought not be affected, as this patch
> only applies to ACCUMULATE_OUTGOING_ARGS == 0 targets.
>

It may have caused:

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49964
Richard Henderson Aug. 3, 2011, 3:13 p.m. UTC | #8
On 08/03/2011 07:07 AM, Georg-Johann Lay wrote:
> #include <stdio.h>
> 
> void foo ()
> {
>     printf ("%d %d %d", 1, 2, 3);
>     printf ("%d %d %d", 3, 4, 5);
>     printf ("%d %d %d", 1, 4, 5);
> }
> 
> Attached the output: The compiler happily pushes onto the stack
> but pops only at the end of the function. So in a function with
> many such calls that would eat up a great deal of RAM. Is that
> what we want?

Add more printfs and find out.  We'll consume 32 bytes and then
pop it all off in the middle of the function, then start again.

See the use of pending_stack_adjust in expand_call.
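
As a rough illustration of that deferral, here is a small simulation
in C.  It is a sketch, not the logic of expand_call: toy_pop_stats and
toy_defer_pops are invented names, the 32-byte threshold is the figure
quoted above, and checking the threshold just before each call is an
assumption made for the model.  The 8 bytes per call come from the
listing earlier in the thread, which shows eight one-byte pushes per
printf.

```c
/* Hypothetical summary of what the model observed.  */
struct toy_pop_stats {
    int mid_function_pops;  /* stack adjustments emitted between calls */
    long peak_pending;      /* largest amount of un-popped argument bytes */
    long final_pop;         /* bytes still pending at the end of the function */
};

/* Simulate n_calls calls that each push bytes_per_call bytes of
   arguments.  Pops are deferred until the pending amount reaches
   threshold, which is checked before each call (an assumption).  */
struct toy_pop_stats toy_defer_pops(int n_calls, long bytes_per_call,
                                    long threshold)
{
    struct toy_pop_stats s = { 0, 0, 0 };
    long pending = 0;

    for (int i = 0; i < n_calls; i++) {
        if (pending >= threshold) {
            pending = 0;                /* pop everything accumulated so far */
            s.mid_function_pops++;
        }
        pending += bytes_per_call;      /* this call's arguments stay pushed */
        if (pending > s.peak_pending)
            s.peak_pending = pending;
    }
    s.final_pop = pending;
    return s;
}
```

For three calls of 8 bytes the model reports no mid-function pop and a
single 24-byte adjustment at the end, which agrees with the "adiw
r24,24" in the posted listing.  For ten such calls it reports two
mid-function pops, a peak of 32 pending bytes, and 16 bytes left for
the end of the function.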


r~
Georg-Johann Lay Aug. 3, 2011, 3:47 p.m. UTC | #9
Richard Henderson wrote:
> On 08/03/2011 07:07 AM, Georg-Johann Lay wrote:
>> #include <stdio.h>
>>
>> void foo ()
>> {
>>     printf ("%d %d %d", 1, 2, 3);
>>     printf ("%d %d %d", 3, 4, 5);
>>     printf ("%d %d %d", 1, 4, 5);
>> }
>>
>> Attached the output: The compiler happily pushes onto the stack
>> but pops only at the end of the function. So in a function with
>> many such calls that would eat up a great deal of RAM. Is that
>> what we want?
> 
> Add more printfs and find out.  We'll consume 32 bytes and then
> pop it all off in the middle of the function, then start again.
> 
> See the use of pending_stack_adjust in expand_call.
> 
> 
> r~
> 

Yes, the following blocks are sprinkled all over the function:

	in r24,__SP_L__	 ;  222	*movhi_sp/2	[length = 2]
	in r25,__SP_H__
	adiw r24,32	 ;  134	*addhi3/2	[length = 1]
	in __tmp_reg__,__SREG__	 ;  223	*movhi_sp/1	[length = 5]
	cli
	out __SP_H__,r25
	out __SREG__,__tmp_reg__
	out __SP_L__,r24

With ACCUMULATE_OUTGOING_ARGS it looks much better: there is just
one such block in the prologue/epilogue.

I think ACCUMULATE_OUTGOING_ARGS would definitely be a win.

Pushing more than 63 bytes for one function is not real-world
code for AVR, and I think we can focus on functions receiving
<= 63 bytes on the stack.

Johann
Richard Henderson Aug. 3, 2011, 3:53 p.m. UTC | #10
On 08/03/2011 08:47 AM, Georg-Johann Lay wrote:
> With ACCUMULATE_OUTGOING_ARGS it looks much better: there is just
> one such block in the prologue/epilogue.
> 
> I think ACCUMULATE_OUTGOING_ARGS would be a win definitely.

That's what I thought too, but with the test case in PR49881
I couldn't make A_O_A come out smaller than PUSHes.  The
reason being the function didn't otherwise need a frame
pointer and we got to use REG_Y for something more useful.

Perhaps that test case isn't the rule for real-world code.
That's why I suggested implementing the command-line switch,
to give the user an option of trying both and selecting the
smaller option for their particular code.


r~
Jeff Law Aug. 3, 2011, 4:24 p.m. UTC | #11
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 08/03/11 08:07, Georg-Johann Lay wrote:
> Georg-Johann Lay wrote:
>> Richard Henderson wrote:
>>> On 08/01/2011 11:42 AM, Georg-Johann Lay wrote:
>>>> Is there a specific reason not to define 
>>>> ACCUMULATE_OUTGOING_ARGS on AVR?
>>> Yes.  So that you can use PUSH.  But as I said in PR49881, you
>>> probably want to provide -maccumulate-outgoing-args.
>>> 
>>> I have a follow-up patch to the last one in that PR...
>>> 
>>> 
>>> r~
>> 
>> PUSH is fine but what about POP?
>> 
>> It's very expensive to pop several bytes, i.e. disabling IRQs,
>> loading and storing SP and the like. Usung store+displacement has
>> not this drawback and as I wrote, come code degradations you
>> explained in PR49881 are artifacts of PR46278, i.e. fake X
>> addressing.
>> 
>> Johann
>> 
> 
> Tried this test case:
> 
> #include <stdio.h>
> 
> void foo () { printf ("%d %d %d", 1, 2, 3); printf ("%d %d %d", 3, 4,
> 5); printf ("%d %d %d", 1, 4, 5); }
> 
> Attached the output: The compiler happily pushes onto the stack but
> pops only at the end of the function. So in a function with many such
> calls that would eat up great deal of RAM. It that what we want?
> 
> RETURN_POPS_ARGS cannot help here.
Popping arguments is deferred until the accumulated size of the deferred
argument pops reaches a particular threshold or certain other conditions
are met (see NO_DEFER_POP).

I don't recall the parameters used to determine when the accumulated
size is large enough to force pops, but I'm sure you can find it with a
little searching in the GCC sources.

Ahhh, memories of the m68k...

jeff
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iQEcBAEBAgAGBQJOOXY4AAoJEBRtltQi2kC79VkH/Ar4NFKpU9mMXS+Pswz3vuOT
6Mv0pWGmuACBz7r69NZ9WNb2PQBXTJUp/vQJUbvKxIBFWmtEc6wRO1kYz3NuZ9FX
MRTXO+Mo3TWkFtzr5krq6d3r0CYVwa3ta/U/XNwvoo+XAUVkuYQgJi0C85yBQBwM
M9Q8nzWaJxAohSh+r6tOIBBdFG66YnfmZAYxLnsOaS5akm4tMuS6D5KFnJVJZbc7
btjNYzLvjCBXuekuHCyaq3HxpTTmhKWCFXuy55fUiOZGJJabPfTfbXlRaM5pGkPk
eo6gR9ydvMIt0RbYCSpqNy1RE+bOfeawqnoE/Mx86ZIzJZuLWOz+3K8UI5cINJ4=
=2Kgl
-----END PGP SIGNATURE-----
H.J. Lu Aug. 3, 2011, 5:39 p.m. UTC | #12
On Wed, Aug 3, 2011 at 7:31 AM, H.J. Lu <hjl.tools@gmail.com> wrote:
> On Tue, Aug 2, 2011 at 3:32 PM, Richard Henderson <rth@redhat.com> wrote:
>> I got Jeff Law to review the reload change on IRC
>> and committed the composite patch.
>>
>> Tested on x86_64, i586, avr, and h8300.  Most other
>> tier1 targets ought not be affected, as this patch
>> only applies to ACCUMULATE_OUTGOING_ARGS == 0 targets.
>>
>
> It may have caused:
>
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49964
>
>

It breaks -march=corei7-avx.
Jie Zhang Dec. 21, 2011, 4:31 a.m. UTC | #13
Hi,

On Tue, Aug 2, 2011 at 6:32 PM, Richard Henderson <rth@redhat.com> wrote:
> I got Jeff Law to review the reload change on IRC
> and committed the composite patch.
>
> Tested on x86_64, i586, avr, and h8300.  Most other
> tier1 targets ought not be affected, as this patch
> only applies to ACCUMULATE_OUTGOING_ARGS == 0 targets.
>
This commit may have caused

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51552


Regards,
Jie

Patch

diff --git a/gcc/calls.c b/gcc/calls.c
index dfa9ceb..ab42fd6 100644
--- a/gcc/calls.c
+++ b/gcc/calls.c
@@ -434,6 +434,8 @@  emit_call_1 (rtx funexp, tree fntree ATTRIBUTE_UNUSED, tree fndecl ATTRIBUTE_UNU
       rounded_stack_size_rtx = GEN_INT (rounded_stack_size);
       stack_pointer_delta -= n_popped;
 
+      add_reg_note (call_insn, REG_ARGS_SIZE, GEN_INT (stack_pointer_delta));
+
       /* If popup is needed, stack realign must use DRAP  */
       if (SUPPORTS_STACK_ALIGNMENT)
         crtl->need_drap = true;
@@ -3126,8 +3128,19 @@  expand_call (tree exp, rtx target, int ignore)
 
       if (old_stack_level)
 	{
+	  rtx last, set;
+
 	  emit_stack_restore (SAVE_BLOCK, old_stack_level);
 	  stack_pointer_delta = old_stack_pointer_delta;
+
+	  /* ??? Is this assert warrented, given emit_stack_restore?
+	     or should we just mark the last insn no matter what?  */
+	  last = get_last_insn ();
+	  set = single_set (last);
+	  gcc_assert (set != NULL);
+	  gcc_assert (SET_DEST (set) == stack_pointer_rtx);
+	  add_reg_note (last, REG_ARGS_SIZE, GEN_INT (stack_pointer_delta));
+
 	  pending_stack_adjust = old_pending_adj;
 	  old_stack_allocated = stack_pointer_delta - pending_stack_adjust;
 	  stack_arg_under_construction = old_stack_arg_under_construction;
diff --git a/gcc/cfgcleanup.c b/gcc/cfgcleanup.c
index 7f72e68..7173013 100644
--- a/gcc/cfgcleanup.c
+++ b/gcc/cfgcleanup.c
@@ -1078,6 +1078,16 @@  old_insns_match_p (int mode ATTRIBUTE_UNUSED, rtx i1, rtx i2)
   if (NOTE_INSN_BASIC_BLOCK_P (i1) && NOTE_INSN_BASIC_BLOCK_P (i2))
     return dir_both;
 
+  /* ??? Do not allow cross-jumping between different stack levels.  */
+  p1 = find_reg_note (i1, REG_ARGS_SIZE, NULL);
+  p2 = find_reg_note (i2, REG_ARGS_SIZE, NULL);
+  if (p1)
+    p1 = XEXP (p1, 0);
+  if (p2)
+    p2 = XEXP (p2, 0);
+  if (!rtx_equal_p (p1, p2))
+    return dir_none;
+
   p1 = PATTERN (i1);
   p2 = PATTERN (i2);
 
diff --git a/gcc/combine-stack-adj.c b/gcc/combine-stack-adj.c
index d267b70..bca0784 100644
--- a/gcc/combine-stack-adj.c
+++ b/gcc/combine-stack-adj.c
@@ -296,68 +296,22 @@  record_stack_refs (rtx *xp, void *data)
   return 0;
 }
 
-/* Adjust or create REG_FRAME_RELATED_EXPR note when merging a stack
-   adjustment into a frame related insn.  */
+/* If INSN has a REG_ARGS_SIZE note, move it to LAST.  */
 
 static void
-adjust_frame_related_expr (rtx last_sp_set, rtx insn,
-			   HOST_WIDE_INT this_adjust)
+maybe_move_args_size_note (rtx last, rtx insn)
 {
-  rtx note = find_reg_note (last_sp_set, REG_FRAME_RELATED_EXPR, NULL_RTX);
-  rtx new_expr = NULL_RTX;
+  rtx note, last_note;
 
-  if (note == NULL_RTX && RTX_FRAME_RELATED_P (insn))
+  note = find_reg_note (insn, REG_ARGS_SIZE, NULL_RTX);
+  if (note == NULL)
     return;
 
-  if (note
-      && GET_CODE (XEXP (note, 0)) == SEQUENCE
-      && XVECLEN (XEXP (note, 0), 0) >= 2)
-    {
-      rtx expr = XEXP (note, 0);
-      rtx last = XVECEXP (expr, 0, XVECLEN (expr, 0) - 1);
-      int i;
-
-      if (GET_CODE (last) == SET
-	  && RTX_FRAME_RELATED_P (last) == RTX_FRAME_RELATED_P (insn)
-	  && SET_DEST (last) == stack_pointer_rtx
-	  && GET_CODE (SET_SRC (last)) == PLUS
-	  && XEXP (SET_SRC (last), 0) == stack_pointer_rtx
-	  && CONST_INT_P (XEXP (SET_SRC (last), 1)))
-	{
-	  XEXP (SET_SRC (last), 1)
-	    = GEN_INT (INTVAL (XEXP (SET_SRC (last), 1)) + this_adjust);
-	  return;
-	}
-
-      new_expr = gen_rtx_SEQUENCE (VOIDmode,
-				   rtvec_alloc (XVECLEN (expr, 0) + 1));
-      for (i = 0; i < XVECLEN (expr, 0); i++)
-	XVECEXP (new_expr, 0, i) = XVECEXP (expr, 0, i);
-    }
-  else
-    {
-      new_expr = gen_rtx_SEQUENCE (VOIDmode, rtvec_alloc (2));
-      if (note)
-	XVECEXP (new_expr, 0, 0) = XEXP (note, 0);
-      else
-	{
-	  rtx expr = copy_rtx (single_set_for_csa (last_sp_set));
-
-	  XEXP (SET_SRC (expr), 1)
-	    = GEN_INT (INTVAL (XEXP (SET_SRC (expr), 1)) - this_adjust);
-	  RTX_FRAME_RELATED_P (expr) = 1;
-	  XVECEXP (new_expr, 0, 0) = expr;
-	}
-    }
-
-  XVECEXP (new_expr, 0, XVECLEN (new_expr, 0) - 1)
-    = copy_rtx (single_set_for_csa (insn));
-  RTX_FRAME_RELATED_P (XVECEXP (new_expr, 0, XVECLEN (new_expr, 0) - 1))
-    = RTX_FRAME_RELATED_P (insn);
-  if (note)
-    XEXP (note, 0) = new_expr;
+  last_note = find_reg_note (last, REG_ARGS_SIZE, NULL_RTX);
+  if (last_note)
+    XEXP (last_note, 0) = XEXP (note, 0);
   else
-    add_reg_note (last_sp_set, REG_FRAME_RELATED_EXPR, new_expr);
+    add_reg_note (last, REG_ARGS_SIZE, XEXP (note, 0));
 }
 
 /* Subroutine of combine_stack_adjustments, called for each basic block.  */
@@ -431,9 +385,8 @@  combine_stack_adjustments_for_block (basic_block bb)
 						  last_sp_adjust + this_adjust,
 						  this_adjust))
 		    {
-		      if (RTX_FRAME_RELATED_P (last_sp_set))
-			adjust_frame_related_expr (last_sp_set, insn,
-						   this_adjust);
+		      maybe_move_args_size_note (last_sp_set, insn);
+
 		      /* It worked!  */
 		      delete_insn (insn);
 		      last_sp_adjust += this_adjust;
diff --git a/gcc/combine.c b/gcc/combine.c
index b5cf245..c24f081 100644
--- a/gcc/combine.c
+++ b/gcc/combine.c
@@ -13273,6 +13273,16 @@  distribute_notes (rtx notes, rtx from_insn, rtx i3, rtx i2, rtx elim_i2,
 	    }
 	  break;
 
+	case REG_ARGS_SIZE:
+	  {
+	    /* ??? How to distribute between i3-i1.  Assume i3 contains the
+	       entire adjustment.  Assert i3 contains at least some adjust.  */
+	    int old_size, args_size = INTVAL (XEXP (note, 0));
+	    old_size = fixup_args_size_notes (PREV_INSN (i3), i3, args_size);
+	    gcc_assert (old_size != args_size);
+	  }
+	  break;
+
 	case REG_NORETURN:
 	case REG_SETJMP:
 	  /* These notes must remain with the call.  It should not be
diff --git a/gcc/dwarf2cfi.c b/gcc/dwarf2cfi.c
index 99b37ab..5757d1d 100644
--- a/gcc/dwarf2cfi.c
+++ b/gcc/dwarf2cfi.c
@@ -70,9 +70,6 @@  typedef struct GTY(()) dw_cfi_row_struct
 
   /* The expressions for any register column that is saved.  */
   cfi_vec reg_save;
-
-  /* The value of any DW_CFA_GNU_args_size.  */
-  HOST_WIDE_INT args_size;
 } dw_cfi_row;
 
 /* The caller's ORIG_REG is saved in SAVED_IN_REG.  */
@@ -109,6 +106,16 @@  typedef struct
   /* The row state at the beginning and end of the trace.  */
   dw_cfi_row *beg_row, *end_row;
 
+  /* Tracking for DW_CFA_GNU_args_size.  The "true" sizes are those we find
+     while scanning insns.  However, the args_size value is irrelevant at
+     any point except can_throw_internal_p insns.  Therefore the "delay"
+     sizes the values that must actually be emitted for this trace.  */
+  HOST_WIDE_INT beg_true_args_size, end_true_args_size;
+  HOST_WIDE_INT beg_delay_args_size, end_delay_args_size;
+
+  /* The first EH insn in the trace, where beg_delay_args_size must be set.  */
+  rtx eh_head;
+
   /* The following variables contain data used in interpreting frame related
      expressions.  These are not part of the "real" row state as defined by
      Dwarf, but it seems like they need to be propagated into a trace in case
@@ -141,6 +148,9 @@  typedef struct
 
   /* True if this trace immediately follows NOTE_INSN_SWITCH_TEXT_SECTIONS.  */
   bool switch_sections;
+
+  /* True if we've seen different values incoming to beg_true_args_size.  */
+  bool args_size_undefined;
 } dw_trace_info;
 
 DEF_VEC_O (dw_trace_info);
@@ -179,6 +189,10 @@  static dw_trace_info *cur_trace;
 /* The current, i.e. most recently generated, row of the CFI table.  */
 static dw_cfi_row *cur_row;
 
+/* A copy of the current CFA, for use during the processing of a
+   single insn.  */
+static dw_cfa_location *cur_cfa;
+
 /* We delay emitting a register save until either (a) we reach the end
    of the prologue or (b) the register is clobbered.  This clusters
    register saves so that there are fewer pc advances.  */
@@ -194,10 +208,6 @@  DEF_VEC_ALLOC_O (queued_reg_save, heap);
 
 static VEC(queued_reg_save, heap) *queued_reg_saves;
 
-/* The (really) current value for DW_CFA_GNU_args_size.  We delay actually
-   emitting this data, i.e. updating CUR_ROW, without async unwind.  */
-static HOST_WIDE_INT queued_args_size;
-
 /* True if any CFI directives were emitted at the current insn.  */
 static bool any_cfis_emitted;
 
@@ -413,6 +423,10 @@  add_cfi_args_size (HOST_WIDE_INT size)
 {
   dw_cfi_ref cfi = new_cfi ();
 
+  /* While we can occasionally have args_size < 0 internally, this state
+     should not persist at a point we actually need an opcode.  */
+  gcc_assert (size >= 0);
+
   cfi->dw_cfi_opc = DW_CFA_GNU_args_size;
   cfi->dw_cfi_oprnd1.dw_cfi_offset = size;
 
@@ -663,16 +677,6 @@  cfi_row_equal_p (dw_cfi_row *a, dw_cfi_row *b)
   else if (!cfa_equal_p (&a->cfa, &b->cfa))
     return false;
 
-  /* Logic suggests that we compare args_size here.  However, if
-     EXIT_IGNORE_STACK we don't bother tracking the args_size after
-     the last time it really matters within the function.  This does
-     in fact lead to paths with differing arg_size, but in cases for
-     which it doesn't matter.  */
-  /* ??? If we really want to sanity check the output of the optimizers,
-     find a way to backtrack from epilogues to the last EH site.  This
-     would allow us to distinguish regions with garbage args_size and
-     regions where paths ought to agree.  */
-
   n_a = VEC_length (dw_cfi_ref, a->reg_save);
   n_b = VEC_length (dw_cfi_ref, b->reg_save);
   n_max = MAX (n_a, n_b);
@@ -836,214 +840,66 @@  reg_save (unsigned int reg, unsigned int sreg, HOST_WIDE_INT offset)
   update_row_reg_save (cur_row, reg, cfi);
 }
 
-/* Given a SET, calculate the amount of stack adjustment it
-   contains.  */
-
-static HOST_WIDE_INT
-stack_adjust_offset (const_rtx pattern, HOST_WIDE_INT cur_args_size,
-		     HOST_WIDE_INT cur_offset)
-{
-  const_rtx src = SET_SRC (pattern);
-  const_rtx dest = SET_DEST (pattern);
-  HOST_WIDE_INT offset = 0;
-  enum rtx_code code;
-
-  if (dest == stack_pointer_rtx)
-    {
-      code = GET_CODE (src);
-
-      /* Assume (set (reg sp) (reg whatever)) sets args_size
-	 level to 0.  */
-      if (code == REG && src != stack_pointer_rtx)
-	{
-	  offset = -cur_args_size;
-#ifndef STACK_GROWS_DOWNWARD
-	  offset = -offset;
-#endif
-	  return offset - cur_offset;
-	}
-
-      if (! (code == PLUS || code == MINUS)
-	  || XEXP (src, 0) != stack_pointer_rtx
-	  || !CONST_INT_P (XEXP (src, 1)))
-	return 0;
-
-      /* (set (reg sp) (plus (reg sp) (const_int))) */
-      offset = INTVAL (XEXP (src, 1));
-      if (code == PLUS)
-	offset = -offset;
-      return offset;
-    }
-
-  if (MEM_P (src) && !MEM_P (dest))
-    dest = src;
-  if (MEM_P (dest))
-    {
-      /* (set (mem (pre_dec (reg sp))) (foo)) */
-      src = XEXP (dest, 0);
-      code = GET_CODE (src);
-
-      switch (code)
-	{
-	case PRE_MODIFY:
-	case POST_MODIFY:
-	  if (XEXP (src, 0) == stack_pointer_rtx)
-	    {
-	      rtx val = XEXP (XEXP (src, 1), 1);
-	      /* We handle only adjustments by constant amount.  */
-	      gcc_assert (GET_CODE (XEXP (src, 1)) == PLUS
-			  && CONST_INT_P (val));
-	      offset = -INTVAL (val);
-	      break;
-	    }
-	  return 0;
-
-	case PRE_DEC:
-	case POST_DEC:
-	  if (XEXP (src, 0) == stack_pointer_rtx)
-	    {
-	      offset = GET_MODE_SIZE (GET_MODE (dest));
-	      break;
-	    }
-	  return 0;
-
-	case PRE_INC:
-	case POST_INC:
-	  if (XEXP (src, 0) == stack_pointer_rtx)
-	    {
-	      offset = -GET_MODE_SIZE (GET_MODE (dest));
-	      break;
-	    }
-	  return 0;
-
-	default:
-	  return 0;
-	}
-    }
-  else
-    return 0;
-
-  return offset;
-}
-
-/* Add a CFI to update the running total of the size of arguments
-   pushed onto the stack.  */
+/* A subroutine of scan_trace.  Check INSN for a REG_ARGS_SIZE note
+   and adjust data structures to match.  */
 
 static void
-dwarf2out_args_size (HOST_WIDE_INT size)
+notice_args_size (rtx insn)
 {
-  if (size == cur_row->args_size)
-    return;
-
-  cur_row->args_size = size;
-  add_cfi_args_size (size);
-}
+  HOST_WIDE_INT args_size, delta;
+  rtx note;
 
-/* Record a stack adjustment of OFFSET bytes.  */
-
-static void
-dwarf2out_stack_adjust (HOST_WIDE_INT offset)
-{
-  dw_cfa_location loc = cur_row->cfa;
+  note = find_reg_note (insn, REG_ARGS_SIZE, NULL);
+  if (note == NULL)
+    return;
 
-  if (loc.reg == dw_stack_pointer_regnum)
-    loc.offset += offset;
+  args_size = INTVAL (XEXP (note, 0));
+  delta = args_size - cur_trace->end_true_args_size;
+  if (delta == 0)
+    return;
 
-  if (cur_trace->cfa_store.reg == dw_stack_pointer_regnum)
-    cur_trace->cfa_store.offset += offset;
+  cur_trace->end_true_args_size = args_size;
 
-  /* ??? The assumption seems to be that if A_O_A, the only CFA adjustments
-     involving the stack pointer are inside the prologue and marked as
-     RTX_FRAME_RELATED_P.  That said, should we not verify this assumption
-     by *asserting* A_O_A at this point?  Why else would we have a change
-     to the stack pointer?  */
-  if (ACCUMULATE_OUTGOING_ARGS)
-    return;
+  /* If the CFA is computed off the stack pointer, then we must adjust
+     the computation of the CFA as well.  */
+  if (cur_cfa->reg == dw_stack_pointer_regnum)
+    {
+      gcc_assert (!cur_cfa->indirect);
 
+      /* Convert a change in args_size (always positive in the
+	 direction of stack growth) to a change in stack pointer.  */
 #ifndef STACK_GROWS_DOWNWARD
-  offset = -offset;
+      delta = -delta;
 #endif
-
-  queued_args_size += offset;
-  if (queued_args_size < 0)
-    queued_args_size = 0;
-
-  def_cfa_1 (&loc);
-  if (flag_asynchronous_unwind_tables)
-    dwarf2out_args_size (queued_args_size);
+      cur_cfa->offset += delta;
+    }
 }
 
-/* Check INSN to see if it looks like a push or a stack adjustment, and
-   make a note of it if it does.  EH uses this information to find out
-   how much extra space it needs to pop off the stack.  */
+/* A subroutine of scan_trace.  INSN satisfies can_throw_internal.
+   Update the data within the trace related to EH insns and args_size.  */
 
 static void
-dwarf2out_notice_stack_adjust (rtx insn, bool after_p)
+notice_eh_throw (rtx insn)
 {
-  HOST_WIDE_INT offset;
-  int i;
-
-  /* Don't handle epilogues at all.  Certainly it would be wrong to do so
-     with this function.  Proper support would require all frame-related
-     insns to be marked, and to be able to handle saving state around
-     epilogues textually in the middle of the function.  */
-  if (prologue_epilogue_contains (insn))
-    return;
-
-  /* If INSN is an instruction from target of an annulled branch, the
-     effects are for the target only and so current argument size
-     shouldn't change at all.  */
-  if (final_sequence
-      && INSN_ANNULLED_BRANCH_P (XVECEXP (final_sequence, 0, 0))
-      && INSN_FROM_TARGET_P (insn))
-    return;
+  HOST_WIDE_INT args_size;
 
-  /* If only calls can throw, and we have a frame pointer,
-     save up adjustments until we see the CALL_INSN.  */
-  if (!flag_asynchronous_unwind_tables
-      && cur_row->cfa.reg != dw_stack_pointer_regnum)
+  args_size = cur_trace->end_true_args_size;
+  if (cur_trace->eh_head == NULL)
     {
-      if (CALL_P (insn) && !after_p)
-	{
-	  /* Extract the size of the args from the CALL rtx itself.  */
-	  insn = PATTERN (insn);
-	  if (GET_CODE (insn) == PARALLEL)
-	    insn = XVECEXP (insn, 0, 0);
-	  if (GET_CODE (insn) == SET)
-	    insn = SET_SRC (insn);
-	  gcc_assert (GET_CODE (insn) == CALL);
-	  dwarf2out_args_size (INTVAL (XEXP (insn, 1)));
-	}
-      return;
+      cur_trace->eh_head = insn;
+      cur_trace->beg_delay_args_size = args_size;
+      cur_trace->end_delay_args_size = args_size;
     }
-
-  if (CALL_P (insn) && !after_p)
+  else if (cur_trace->end_delay_args_size != args_size)
     {
-      if (!flag_asynchronous_unwind_tables)
-	dwarf2out_args_size (queued_args_size);
-      return;
-    }
-  else if (BARRIER_P (insn))
-    return;
-  else if (GET_CODE (PATTERN (insn)) == SET)
-    offset = stack_adjust_offset (PATTERN (insn), queued_args_size, 0);
-  else if (GET_CODE (PATTERN (insn)) == PARALLEL
-	   || GET_CODE (PATTERN (insn)) == SEQUENCE)
-    {
-      /* There may be stack adjustments inside compound insns.  Search
-	 for them.  */
-      for (offset = 0, i = XVECLEN (PATTERN (insn), 0) - 1; i >= 0; i--)
-	if (GET_CODE (XVECEXP (PATTERN (insn), 0, i)) == SET)
-	  offset += stack_adjust_offset (XVECEXP (PATTERN (insn), 0, i),
-					 queued_args_size, offset);
-    }
-  else
-    return;
-
-  if (offset == 0)
-    return;
+      cur_trace->end_delay_args_size = args_size;
 
-  dwarf2out_stack_adjust (offset);
+      /* ??? If the CFA is the stack pointer, search backward for the last
+	 CFI note and insert there.  Given that the stack changed for the
+	 args_size change, there *must* be such a note in between here and
+	 the last eh insn.  */
+      add_cfi_args_size (args_size);
+    }
 }
 
 /* Short-hand inline for the very common D_F_R (REGNO (x)) operation.  */
@@ -1201,38 +1057,34 @@  reg_saved_in (rtx reg)
 static void
 dwarf2out_frame_debug_def_cfa (rtx pat)
 {
-  dw_cfa_location loc;
-
-  memset (&loc, 0, sizeof (loc));
+  memset (cur_cfa, 0, sizeof (*cur_cfa));
 
   switch (GET_CODE (pat))
     {
     case PLUS:
-      loc.reg = dwf_regno (XEXP (pat, 0));
-      loc.offset = INTVAL (XEXP (pat, 1));
+      cur_cfa->reg = dwf_regno (XEXP (pat, 0));
+      cur_cfa->offset = INTVAL (XEXP (pat, 1));
       break;
 
     case REG:
-      loc.reg = dwf_regno (pat);
+      cur_cfa->reg = dwf_regno (pat);
       break;
 
     case MEM:
-      loc.indirect = 1;
+      cur_cfa->indirect = 1;
       pat = XEXP (pat, 0);
       if (GET_CODE (pat) == PLUS)
 	{
-	  loc.base_offset = INTVAL (XEXP (pat, 1));
+	  cur_cfa->base_offset = INTVAL (XEXP (pat, 1));
 	  pat = XEXP (pat, 0);
 	}
-      loc.reg = dwf_regno (pat);
+      cur_cfa->reg = dwf_regno (pat);
       break;
 
     default:
       /* Recurse and define an expression.  */
       gcc_unreachable ();
     }
-
-  def_cfa_1 (&loc);
 }
 
 /* A subroutine of dwarf2out_frame_debug, process a REG_ADJUST_CFA note.  */
@@ -1240,7 +1092,6 @@  dwarf2out_frame_debug_def_cfa (rtx pat)
 static void
 dwarf2out_frame_debug_adjust_cfa (rtx pat)
 {
-  dw_cfa_location loc = cur_row->cfa;
   rtx src, dest;
 
   gcc_assert (GET_CODE (pat) == SET);
@@ -1250,21 +1101,19 @@  dwarf2out_frame_debug_adjust_cfa (rtx pat)
   switch (GET_CODE (src))
     {
     case PLUS:
-      gcc_assert (dwf_regno (XEXP (src, 0)) == loc.reg);
-      loc.offset -= INTVAL (XEXP (src, 1));
+      gcc_assert (dwf_regno (XEXP (src, 0)) == cur_cfa->reg);
+      cur_cfa->offset -= INTVAL (XEXP (src, 1));
       break;
 
     case REG:
-	break;
+      break;
 
     default:
-	gcc_unreachable ();
+      gcc_unreachable ();
     }
 
-  loc.reg = dwf_regno (dest);
-  gcc_assert (loc.indirect == 0);
-
-  def_cfa_1 (&loc);
+  cur_cfa->reg = dwf_regno (dest);
+  gcc_assert (cur_cfa->indirect == 0);
 }
 
 /* A subroutine of dwarf2out_frame_debug, process a REG_CFA_OFFSET note.  */
@@ -1285,12 +1134,12 @@  dwarf2out_frame_debug_cfa_offset (rtx set)
   switch (GET_CODE (addr))
     {
     case REG:
-      gcc_assert (dwf_regno (addr) == cur_row->cfa.reg);
-      offset = -cur_row->cfa.offset;
+      gcc_assert (dwf_regno (addr) == cur_cfa->reg);
+      offset = -cur_cfa->offset;
       break;
     case PLUS:
-      gcc_assert (dwf_regno (XEXP (addr, 0)) == cur_row->cfa.reg);
-      offset = INTVAL (XEXP (addr, 1)) - cur_row->cfa.offset;
+      gcc_assert (dwf_regno (XEXP (addr, 0)) == cur_cfa->reg);
+      offset = INTVAL (XEXP (addr, 1)) - cur_cfa->offset;
       break;
     default:
       gcc_unreachable ();
@@ -1458,7 +1307,7 @@  dwarf2out_frame_debug_cfa_window_save (void)
 
   cfa	       current rule for calculating the CFA.  It usually
 	       consists of a register and an offset.  This is
-	       actually stored in cur_row->cfa, but abbreviated
+	       actually stored in *cur_cfa, but abbreviated
 	       for the purposes of this documentation.
   cfa_store    register used by prologue code to save things to the stack
 	       cfa_store.offset is the offset from the value of
@@ -1613,7 +1462,6 @@  dwarf2out_frame_debug_cfa_window_save (void)
 static void
 dwarf2out_frame_debug_expr (rtx expr)
 {
-  dw_cfa_location cfa = cur_row->cfa;
   rtx src, dest, span;
   HOST_WIDE_INT offset;
   dw_fde_ref fde;
@@ -1651,18 +1499,6 @@  dwarf2out_frame_debug_expr (rtx expr)
 	      && (!MEM_P (SET_DEST (elem)) || GET_CODE (expr) == SEQUENCE)
 	      && (RTX_FRAME_RELATED_P (elem) || par_index == 0))
 	    dwarf2out_frame_debug_expr (elem);
-	  else if (GET_CODE (elem) == SET
-		   && par_index != 0
-		   && !RTX_FRAME_RELATED_P (elem))
-	    {
-	      /* Stack adjustment combining might combine some post-prologue
-		 stack adjustment into a prologue stack adjustment.  */
-	      HOST_WIDE_INT offset
-		= stack_adjust_offset (elem, queued_args_size, 0);
-
-	      if (offset != 0)
-		dwarf2out_stack_adjust (offset);
-	    }
 	}
       return;
     }
@@ -1688,7 +1524,7 @@  dwarf2out_frame_debug_expr (rtx expr)
 	{
 	  /* Setting FP from SP.  */
 	case REG:
-	  if (cfa.reg == dwf_regno (src))
+	  if (cur_cfa->reg == dwf_regno (src))
 	    {
 	      /* Rule 1 */
 	      /* Update the CFA rule wrt SP or FP.  Make sure src is
@@ -1698,9 +1534,9 @@  dwarf2out_frame_debug_expr (rtx expr)
 		 ARM copies SP to a temporary register, and from there to
 		 FP.  So we just rely on the backends to only set
 		 RTX_FRAME_RELATED_P on appropriate insns.  */
-	      cfa.reg = dwf_regno (dest);
-	      cur_trace->cfa_temp.reg = cfa.reg;
-	      cur_trace->cfa_temp.offset = cfa.offset;
+	      cur_cfa->reg = dwf_regno (dest);
+	      cur_trace->cfa_temp.reg = cur_cfa->reg;
+	      cur_trace->cfa_temp.offset = cur_cfa->offset;
 	    }
 	  else
 	    {
@@ -1718,7 +1554,7 @@  dwarf2out_frame_debug_expr (rtx expr)
 		  && REGNO (src) == STACK_POINTER_REGNUM)
 		gcc_assert (REGNO (dest) == HARD_FRAME_POINTER_REGNUM
 			    && fde->drap_reg != INVALID_REGNUM
-			    && cfa.reg != dwf_regno (src));
+			    && cur_cfa->reg != dwf_regno (src));
 	      else
 		queue_reg_save (src, dest, 0);
 	    }
@@ -1748,8 +1584,8 @@  dwarf2out_frame_debug_expr (rtx expr)
 	      if (XEXP (src, 0) == hard_frame_pointer_rtx)
 		{
 		  /* Restoring SP from FP in the epilogue.  */
-		  gcc_assert (cfa.reg == dw_frame_pointer_regnum);
-		  cfa.reg = dw_stack_pointer_regnum;
+		  gcc_assert (cur_cfa->reg == dw_frame_pointer_regnum);
+		  cur_cfa->reg = dw_stack_pointer_regnum;
 		}
 	      else if (GET_CODE (src) == LO_SUM)
 		/* Assume we've set the source reg of the LO_SUM from sp.  */
@@ -1759,8 +1595,8 @@  dwarf2out_frame_debug_expr (rtx expr)
 
 	      if (GET_CODE (src) != MINUS)
 		offset = -offset;
-	      if (cfa.reg == dw_stack_pointer_regnum)
-		cfa.offset += offset;
+	      if (cur_cfa->reg == dw_stack_pointer_regnum)
+		cur_cfa->offset += offset;
 	      if (cur_trace->cfa_store.reg == dw_stack_pointer_regnum)
 		cur_trace->cfa_store.offset += offset;
 	    }
@@ -1772,13 +1608,13 @@  dwarf2out_frame_debug_expr (rtx expr)
 	      gcc_assert (frame_pointer_needed);
 
 	      gcc_assert (REG_P (XEXP (src, 0))
-			  && dwf_regno (XEXP (src, 0)) == cfa.reg
+			  && dwf_regno (XEXP (src, 0)) == cur_cfa->reg
 			  && CONST_INT_P (XEXP (src, 1)));
 	      offset = INTVAL (XEXP (src, 1));
 	      if (GET_CODE (src) != MINUS)
 		offset = -offset;
-	      cfa.offset += offset;
-	      cfa.reg = dw_frame_pointer_regnum;
+	      cur_cfa->offset += offset;
+	      cur_cfa->reg = dw_frame_pointer_regnum;
 	    }
 	  else
 	    {
@@ -1786,17 +1622,17 @@  dwarf2out_frame_debug_expr (rtx expr)
 
 	      /* Rule 4 */
 	      if (REG_P (XEXP (src, 0))
-		  && dwf_regno (XEXP (src, 0)) == cfa.reg
+		  && dwf_regno (XEXP (src, 0)) == cur_cfa->reg
 		  && CONST_INT_P (XEXP (src, 1)))
 		{
 		  /* Setting a temporary CFA register that will be copied
 		     into the FP later on.  */
 		  offset = - INTVAL (XEXP (src, 1));
-		  cfa.offset += offset;
-		  cfa.reg = dwf_regno (dest);
+		  cur_cfa->offset += offset;
+		  cur_cfa->reg = dwf_regno (dest);
 		  /* Or used to save regs to the stack.  */
-		  cur_trace->cfa_temp.reg = cfa.reg;
-		  cur_trace->cfa_temp.offset = cfa.offset;
+		  cur_trace->cfa_temp.reg = cur_cfa->reg;
+		  cur_trace->cfa_temp.offset = cur_cfa->offset;
 		}
 
 	      /* Rule 5 */
@@ -1806,10 +1642,10 @@  dwarf2out_frame_debug_expr (rtx expr)
 		{
 		  /* Setting a scratch register that we will use instead
 		     of SP for saving registers to the stack.  */
-		  gcc_assert (cfa.reg == dw_stack_pointer_regnum);
+		  gcc_assert (cur_cfa->reg == dw_stack_pointer_regnum);
 		  cur_trace->cfa_store.reg = dwf_regno (dest);
 		  cur_trace->cfa_store.offset
-		    = cfa.offset - cur_trace->cfa_temp.offset;
+		    = cur_cfa->offset - cur_trace->cfa_temp.offset;
 		}
 
 	      /* Rule 9 */
@@ -1870,17 +1706,15 @@  dwarf2out_frame_debug_expr (rtx expr)
               fde->stack_realignment = INTVAL (XEXP (src, 1));
               cur_trace->cfa_store.offset = 0;
 
-	      if (cfa.reg != dw_stack_pointer_regnum
-		  && cfa.reg != dw_frame_pointer_regnum)
-		fde->drap_reg = cfa.reg;
+	      if (cur_cfa->reg != dw_stack_pointer_regnum
+		  && cur_cfa->reg != dw_frame_pointer_regnum)
+		fde->drap_reg = cur_cfa->reg;
             }
           return;
 
 	default:
 	  gcc_unreachable ();
 	}
-
-      def_cfa_1 (&cfa);
       break;
 
     case MEM:
@@ -1902,8 +1736,8 @@  dwarf2out_frame_debug_expr (rtx expr)
 		      && cur_trace->cfa_store.reg == dw_stack_pointer_regnum);
 
 	  cur_trace->cfa_store.offset += offset;
-	  if (cfa.reg == dw_stack_pointer_regnum)
-	    cfa.offset = cur_trace->cfa_store.offset;
+	  if (cur_cfa->reg == dw_stack_pointer_regnum)
+	    cur_cfa->offset = cur_trace->cfa_store.offset;
 
 	  if (GET_CODE (XEXP (dest, 0)) == POST_MODIFY)
 	    offset -= cur_trace->cfa_store.offset;
@@ -1932,12 +1766,12 @@  dwarf2out_frame_debug_expr (rtx expr)
               && fde->stack_realign
               && src == hard_frame_pointer_rtx)
 	    {
-	      gcc_assert (cfa.reg != dw_frame_pointer_regnum);
+	      gcc_assert (cur_cfa->reg != dw_frame_pointer_regnum);
 	      cur_trace->cfa_store.offset = 0;
 	    }
 
-	  if (cfa.reg == dw_stack_pointer_regnum)
-	    cfa.offset = cur_trace->cfa_store.offset;
+	  if (cur_cfa->reg == dw_stack_pointer_regnum)
+	    cur_cfa->offset = cur_trace->cfa_store.offset;
 
 	  if (GET_CODE (XEXP (dest, 0)) == POST_DEC)
 	    offset += -cur_trace->cfa_store.offset;
@@ -1961,8 +1795,8 @@  dwarf2out_frame_debug_expr (rtx expr)
 
 	    regno = dwf_regno (XEXP (XEXP (dest, 0), 0));
 
-	    if (cfa.reg == regno)
-	      offset -= cfa.offset;
+	    if (cur_cfa->reg == regno)
+	      offset -= cur_cfa->offset;
 	    else if (cur_trace->cfa_store.reg == regno)
 	      offset -= cur_trace->cfa_store.offset;
 	    else
@@ -1979,8 +1813,8 @@  dwarf2out_frame_debug_expr (rtx expr)
 	  {
 	    unsigned int regno = dwf_regno (XEXP (dest, 0));
 
-	    if (cfa.reg == regno)
-	      offset = -cfa.offset;
+	    if (cur_cfa->reg == regno)
+	      offset = -cur_cfa->offset;
 	    else if (cur_trace->cfa_store.reg == regno)
 	      offset = -cur_trace->cfa_store.offset;
 	    else
@@ -2012,11 +1846,11 @@  dwarf2out_frame_debug_expr (rtx expr)
       if (REG_P (src)
 	  && REGNO (src) != STACK_POINTER_REGNUM
 	  && REGNO (src) != HARD_FRAME_POINTER_REGNUM
-	  && dwf_regno (src) == cfa.reg)
+	  && dwf_regno (src) == cur_cfa->reg)
 	{
 	  /* We're storing the current CFA reg into the stack.  */
 
-	  if (cfa.offset == 0)
+	  if (cur_cfa->offset == 0)
 	    {
               /* Rule 19 */
               /* If stack is aligned, putting CFA reg into stack means
@@ -2026,28 +1860,23 @@  dwarf2out_frame_debug_expr (rtx expr)
 		 value.  */
               if (fde
                   && fde->stack_realign
-                  && cfa.indirect == 0
-                  && cfa.reg != dw_frame_pointer_regnum)
+                  && cur_cfa->indirect == 0
+                  && cur_cfa->reg != dw_frame_pointer_regnum)
                 {
-		  dw_cfa_location cfa_exp;
-
-		  gcc_assert (fde->drap_reg == cfa.reg);
+		  gcc_assert (fde->drap_reg == cur_cfa->reg);
 
-		  cfa_exp.indirect = 1;
-		  cfa_exp.reg = dw_frame_pointer_regnum;
-		  cfa_exp.base_offset = offset;
-		  cfa_exp.offset = 0;
+		  cur_cfa->indirect = 1;
+		  cur_cfa->reg = dw_frame_pointer_regnum;
+		  cur_cfa->base_offset = offset;
+		  cur_cfa->offset = 0;
 
 		  fde->drap_reg_saved = 1;
-
-		  def_cfa_1 (&cfa_exp);
 		  break;
                 }
 
 	      /* If the source register is exactly the CFA, assume
 		 we're saving SP like any other register; this happens
 		 on the ARM.  */
-	      def_cfa_1 (&cfa);
 	      queue_reg_save (stack_pointer_rtx, NULL_RTX, offset);
 	      break;
 	    }
@@ -2061,16 +1890,13 @@  dwarf2out_frame_debug_expr (rtx expr)
 		x = XEXP (x, 0);
 	      gcc_assert (REG_P (x));
 
-	      cfa.reg = dwf_regno (x);
-	      cfa.base_offset = offset;
-	      cfa.indirect = 1;
-	      def_cfa_1 (&cfa);
+	      cur_cfa->reg = dwf_regno (x);
+	      cur_cfa->base_offset = offset;
+	      cur_cfa->indirect = 1;
 	      break;
 	    }
 	}
 
-      def_cfa_1 (&cfa);
-
       span = NULL;
       if (REG_P (src))
 	span = targetm.dwarf_register_span (src);
@@ -2101,33 +1927,17 @@  dwarf2out_frame_debug_expr (rtx expr)
     }
 }
 
-/* Record call frame debugging information for INSN, which either
-   sets SP or FP (adjusting how we calculate the frame address) or saves a
-   register to the stack.  If INSN is NULL_RTX, initialize our state.
-
-   If AFTER_P is false, we're being called before the insn is emitted,
-   otherwise after.  Call instructions get invoked twice.  */
+/* Record call frame debugging information for INSN, which either sets
+   SP or FP (adjusting how we calculate the frame address) or saves a
+   register to the stack.  */
 
 static void
-dwarf2out_frame_debug (rtx insn, bool after_p)
+dwarf2out_frame_debug (rtx insn)
 {
   rtx note, n;
   bool handled_one = false;
   bool need_flush = false;
 
-  if (!NONJUMP_INSN_P (insn) || clobbers_queued_reg_save (insn))
-    dwarf2out_flush_queued_reg_saves ();
-
-  if (!RTX_FRAME_RELATED_P (insn))
-    {
-      /* ??? This should be done unconditionally since stack adjustments
-	 matter if the stack pointer is not the CFA register anymore but
-	 is still used to save registers.  */
-      if (!ACCUMULATE_OUTGOING_ARGS)
-	dwarf2out_notice_stack_adjust (insn, after_p);
-      return;
-    }
-
   any_cfis_emitted = false;
 
   for (note = REG_NOTES (insn); note; note = XEXP (note, 1))
@@ -2265,9 +2075,6 @@  change_cfi_row (dw_cfi_row *old_row, dw_cfi_row *new_row)
 	add_cfi (cfi);
     }
 
-  if (old_row->args_size != new_row->args_size)
-    add_cfi_args_size (new_row->args_size);
-
   n_old = VEC_length (dw_cfi_ref, old_row->reg_save);
   n_new = VEC_length (dw_cfi_ref, new_row->reg_save);
   n_max = MAX (n_old, n_new);
@@ -2391,14 +2198,10 @@  add_cfis_to_fde (void)
    trace from CUR_TRACE and CUR_ROW.  */
 
 static void
-maybe_record_trace_start (rtx start, rtx origin, bool abnormal)
+maybe_record_trace_start (rtx start, rtx origin)
 {
   dw_trace_info *ti;
-
-  /* Sync queued data before propagating to a destination,
-     lest we propagate out-of-date data.  */
-  dwarf2out_flush_queued_reg_saves ();
-  dwarf2out_args_size (queued_args_size);
+  HOST_WIDE_INT args_size;
 
   ti = get_trace_info (start);
   gcc_assert (ti != NULL);
@@ -2411,15 +2214,13 @@  maybe_record_trace_start (rtx start, rtx origin, bool abnormal)
 	       (origin ? INSN_UID (origin) : 0));
     }
 
+  args_size = cur_trace->end_true_args_size;
   if (ti->beg_row == NULL)
     {
       /* This is the first time we've encountered this trace.  Propagate
 	 state across the edge and push the trace onto the work list.  */
       ti->beg_row = copy_cfi_row (cur_row);
-      /* On all abnormal edges, especially EH and non-local-goto, we take
-	 care to free the pushed arguments.  */
-      if (abnormal)
-	ti->beg_row->args_size = 0;
+      ti->beg_true_args_size = args_size;
 
       ti->cfa_store = cur_trace->cfa_store;
       ti->cfa_temp = cur_trace->cfa_temp;
@@ -2433,11 +2234,52 @@  maybe_record_trace_start (rtx start, rtx origin, bool abnormal)
     }
   else
     {
+
       /* We ought to have the same state incoming to a given trace no
 	 matter how we arrive at the trace.  Anything else means we've
 	 got some kind of optimization error.  */
       gcc_checking_assert (cfi_row_equal_p (cur_row, ti->beg_row));
+
+      /* The args_size is allowed to conflict if it isn't actually used.  */
+      if (ti->beg_true_args_size != args_size)
+	ti->args_size_undefined = true;
+    }
+}
+
+/* Similarly, but handle the args_size and CFA reset across EH
+   and non-local goto edges.  */
+
+static void
+maybe_record_trace_start_abnormal (rtx start, rtx origin)
+{
+  HOST_WIDE_INT save_args_size, delta;
+  dw_cfa_location save_cfa;
+
+  save_args_size = cur_trace->end_true_args_size;
+  if (save_args_size == 0)
+    {
+      maybe_record_trace_start (start, origin);
+      return;
+    }
+
+  delta = -save_args_size;
+  cur_trace->end_true_args_size = 0;
+
+  save_cfa = cur_row->cfa;
+  if (cur_row->cfa.reg == dw_stack_pointer_regnum)
+    {
+      /* Convert a change in args_size (always positive in the
+	 direction of stack growth) to a change in stack pointer.  */
+#ifndef STACK_GROWS_DOWNWARD
+      delta = -delta;
+#endif
+      cur_row->cfa.offset += delta;
     }
+
+  maybe_record_trace_start (start, origin);
+
+  cur_trace->end_true_args_size = save_args_size;
+  cur_row->cfa = save_cfa;
 }
 
 /* Propagate CUR_TRACE state to the destinations implied by INSN.  */
@@ -2452,8 +2294,9 @@  create_trace_edges (rtx insn)
   if (JUMP_P (insn))
     {
       if (find_reg_note (insn, REG_NON_LOCAL_GOTO, NULL_RTX))
-	;
-      else if (tablejump_p (insn, NULL, &tmp))
+	return;
+
+      if (tablejump_p (insn, NULL, &tmp))
 	{
 	  rtvec vec;
 
@@ -2464,13 +2307,13 @@  create_trace_edges (rtx insn)
 	  for (i = 0; i < n; ++i)
 	    {
 	      lab = XEXP (RTVEC_ELT (vec, i), 0);
-	      maybe_record_trace_start (lab, insn, false);
+	      maybe_record_trace_start (lab, insn);
 	    }
 	}
       else if (computed_jump_p (insn))
 	{
 	  for (lab = forced_labels; lab; lab = XEXP (lab, 1))
-	    maybe_record_trace_start (XEXP (lab, 0), insn, true);
+	    maybe_record_trace_start (XEXP (lab, 0), insn);
 	}
       else if (returnjump_p (insn))
 	;
@@ -2480,14 +2323,14 @@  create_trace_edges (rtx insn)
 	  for (i = 0; i < n; ++i)
 	    {
 	      lab = XEXP (ASM_OPERANDS_LABEL (tmp, i), 0);
-	      maybe_record_trace_start (lab, insn, true);
+	      maybe_record_trace_start (lab, insn);
 	    }
 	}
       else
 	{
 	  lab = JUMP_LABEL (insn);
 	  gcc_assert (lab != NULL);
-	  maybe_record_trace_start (lab, insn, false);
+	  maybe_record_trace_start (lab, insn);
 	}
     }
   else if (CALL_P (insn))
@@ -2499,7 +2342,7 @@  create_trace_edges (rtx insn)
       /* Process non-local goto edges.  */
       if (can_nonlocal_goto (insn))
 	for (lab = nonlocal_goto_handler_labels; lab; lab = XEXP (lab, 1))
-	  maybe_record_trace_start (XEXP (lab, 0), insn, true);
+	  maybe_record_trace_start_abnormal (XEXP (lab, 0), insn);
     }
   else if (GET_CODE (PATTERN (insn)) == SEQUENCE)
     {
@@ -2515,7 +2358,7 @@  create_trace_edges (rtx insn)
     {
       eh_landing_pad lp = get_eh_landing_pad_from_rtx (insn);
       if (lp)
-	maybe_record_trace_start (lp->landing_pad, insn, true);
+	maybe_record_trace_start_abnormal (lp->landing_pad, insn);
     }
 }
 
@@ -2526,6 +2369,7 @@  static void
 scan_trace (dw_trace_info *trace)
 {
   rtx insn = trace->head;
+  dw_cfa_location this_cfa;
 
   if (dump_file)
     fprintf (dump_file, "Processing trace %u : start at %s %d\n",
@@ -2533,61 +2377,99 @@  scan_trace (dw_trace_info *trace)
 	     INSN_UID (insn));
 
   trace->end_row = copy_cfi_row (trace->beg_row);
+  trace->end_true_args_size = trace->beg_true_args_size;
 
   cur_trace = trace;
   cur_row = trace->end_row;
-  queued_args_size = cur_row->args_size;
+
+  this_cfa = cur_row->cfa;
+  cur_cfa = &this_cfa;
 
   for (insn = NEXT_INSN (insn); insn ; insn = NEXT_INSN (insn))
     {
-      rtx pat;
-
+      /* Do everything that happens "before" the insn.  */
       add_cfi_insn = PREV_INSN (insn);
 
       /* Notice the end of a trace.  */
-      if (BARRIER_P (insn) || save_point_p (insn))
+      if (BARRIER_P (insn))
+	{
+	  /* Don't bother saving the unneeded queued registers at all.  */
+	  VEC_truncate (queued_reg_save, queued_reg_saves, 0);
+	  break;
+	}
+      if (save_point_p (insn))
 	{
-	  dwarf2out_flush_queued_reg_saves ();
-	  dwarf2out_args_size (queued_args_size);
-
 	  /* Propagate across fallthru edges.  */
-	  if (!BARRIER_P (insn))
-	    maybe_record_trace_start (insn, NULL, false);
+	  dwarf2out_flush_queued_reg_saves ();
+	  maybe_record_trace_start (insn, NULL);
 	  break;
 	}
 
       if (DEBUG_INSN_P (insn) || !inside_basic_block_p (insn))
 	continue;
 
-      pat = PATTERN (insn);
-      if (asm_noperands (pat) >= 0)
+      /* Flush data before calls and jumps, and of course if necessary.  */
+      if (can_throw_internal (insn))
 	{
-	  dwarf2out_frame_debug (insn, false);
-	  add_cfi_insn = insn;
+	  dwarf2out_flush_queued_reg_saves ();
+	  notice_eh_throw (insn);
 	}
-      else
+      else if (!NONJUMP_INSN_P (insn)
+	       || clobbers_queued_reg_save (insn)
+	       || find_reg_note (insn, REG_CFA_FLUSH_QUEUE, NULL))
+	dwarf2out_flush_queued_reg_saves ();
+
+      /* Do everything that happens "after" the insn.  */
+      add_cfi_insn = insn;
+
+      /* Handle changes to the row state.  */
+      if (RTX_FRAME_RELATED_P (insn))
+	dwarf2out_frame_debug (insn);
+
+      /* Look for REG_ARGS_SIZE, and handle it.  */
+      if (GET_CODE (PATTERN (insn)) == SEQUENCE)
 	{
-	  if (GET_CODE (pat) == SEQUENCE)
+	  rtx elt, pat = PATTERN (insn);
+	  int i, n = XVECLEN (pat, 0);
+
+	  if (INSN_ANNULLED_BRANCH_P (XVECEXP (pat, 0, 0)))
 	    {
-	      int i, n = XVECLEN (pat, 0);
-	      for (i = 1; i < n; ++i)
-		dwarf2out_frame_debug (XVECEXP (pat, 0, i), false);
-	    }
+	      /* ??? Hopefully multiple delay slots are not annulled.  */
+	      gcc_assert (n == 2);
+	      elt = XVECEXP (pat, 0, 1);
+
+	      /* If ELT is an instruction from the target of an annulled
+		 branch, the effects are for the target only and so the
+		 args_size and CFA along the current path shouldn't change.  */
+	      if (INSN_FROM_TARGET_P (elt))
+		{
+		  HOST_WIDE_INT restore_args_size;
 
-          if (CALL_P (insn))
-	    dwarf2out_frame_debug (insn, false);
-          else if (find_reg_note (insn, REG_CFA_FLUSH_QUEUE, NULL)
-		   || (cfun->can_throw_non_call_exceptions
-		       && can_throw_internal (insn)))
-	    dwarf2out_flush_queued_reg_saves ();
+		  restore_args_size = cur_trace->end_true_args_size;
+		  cur_cfa = &cur_row->cfa;
 
-	  /* Do not separate tablejump insns from their ADDR_DIFF_VEC.
-	     Putting the note after the VEC should be ok.  */
-	  if (!tablejump_p (insn, NULL, &add_cfi_insn))
-	    add_cfi_insn = insn;
+		  notice_args_size (elt);
+		  create_trace_edges (insn);
 
-	  dwarf2out_frame_debug (insn, true);
+		  cur_trace->end_true_args_size = restore_args_size;
+		  cur_row->cfa = this_cfa;
+		  cur_cfa = &this_cfa;
+		  continue;
+		}
+	    }
+
+	  for (i = 1; i < n; ++i)
+	    {
+	      elt = XVECEXP (pat, 0, i);
+	      notice_args_size (elt);
+	    }
 	}
+      else
+	notice_args_size (insn);
+
+      /* Between frame-related-p and args_size we might otherwise have
+	 emitted two cfa adjustments.  Emit the single combined one now.  */
+      def_cfa_1 (&this_cfa);
 
       /* Note that a test for control_flow_insn_p does exactly the
 	 same tests as are done to actually create the edges.  So
@@ -2599,6 +2481,7 @@  scan_trace (dw_trace_info *trace)
   add_cfi_insn = NULL;
   cur_row = NULL;
   cur_trace = NULL;
+  cur_cfa = NULL;
 }
 
 /* Scan the function and create the initial set of CFI notes.  */
@@ -2740,6 +2623,32 @@  connect_traces (void)
 	  while (note != add_cfi_insn);
 	}
     }
+
+  /* Connect args_size between traces that have can_throw_internal insns.  */
+  if (cfun->eh->lp_array != NULL)
+    {
+      HOST_WIDE_INT prev_args_size = 0;
+
+      for (i = 0; i < n; ++i)
+	{
+	  ti = VEC_index (dw_trace_info, trace_info, i);
+
+	  if (ti->switch_sections)
+	    prev_args_size = 0;
+	  if (ti->eh_head == NULL)
+	    continue;
+	  gcc_assert (!ti->args_size_undefined);
+
+	  if (ti->beg_delay_args_size != prev_args_size)
+	    {
+	      /* ??? Search back to previous CFI note.  */
+	      add_cfi_insn = PREV_INSN (ti->eh_head);
+	      add_cfi_args_size (ti->beg_delay_args_size);
+	    }
+
+	  prev_args_size = ti->end_delay_args_size;
+	}
+    }
 }
 
 /* Set up the pseudo-cfg of instruction traces, as described at the
@@ -3387,9 +3296,6 @@  dump_cfi_row (FILE *f, dw_cfi_row *row)
   FOR_EACH_VEC_ELT (dw_cfi_ref, row->reg_save, i, cfi)
     if (cfi)
       output_cfi_directive (f, cfi);
-
-  fprintf (f, "\t.cfi_GNU_args_size "HOST_WIDE_INT_PRINT_DEC "\n",
-	   row->args_size);
 }
 
 void debug_cfi_row (dw_cfi_row *row);
diff --git a/gcc/emit-rtl.c b/gcc/emit-rtl.c
index 99b02ba..aa743d7 100644
--- a/gcc/emit-rtl.c
+++ b/gcc/emit-rtl.c
@@ -3614,6 +3614,10 @@  try_split (rtx pat, rtx trial, int last)
 	  break;
 #endif
 
+	case REG_ARGS_SIZE:
+	  fixup_args_size_notes (NULL_RTX, insn_last, INTVAL (XEXP (note, 0)));
+	  break;
+
 	default:
 	  break;
 	}
diff --git a/gcc/explow.c b/gcc/explow.c
index 3c692f4..f8262db 100644
--- a/gcc/explow.c
+++ b/gcc/explow.c
@@ -873,14 +873,45 @@  promote_decl_mode (const_tree decl, int *punsignedp)
 }
 
 
+/* Controls the behaviour of {anti_,}adjust_stack.  */
+static bool suppress_reg_args_size;
+
+/* A helper for adjust_stack and anti_adjust_stack.  */
+
+static void
+adjust_stack_1 (rtx adjust, bool anti_p)
+{
+  rtx temp, insn;
+
+#ifndef STACK_GROWS_DOWNWARD
+  /* Hereafter anti_p means subtract_p.  */
+  anti_p = !anti_p;
+#endif
+
+  temp = expand_binop (Pmode,
+		       anti_p ? sub_optab : add_optab,
+		       stack_pointer_rtx, adjust, stack_pointer_rtx, 0,
+		       OPTAB_LIB_WIDEN);
+
+  if (temp != stack_pointer_rtx)
+    insn = emit_move_insn (stack_pointer_rtx, temp);
+  else
+    {
+      insn = get_last_insn ();
+      temp = single_set (insn);
+      gcc_assert (temp != NULL && SET_DEST (temp) == stack_pointer_rtx);
+    }
+
+  if (!suppress_reg_args_size)
+    add_reg_note (insn, REG_ARGS_SIZE, GEN_INT (stack_pointer_delta));
+}
+
 /* Adjust the stack pointer by ADJUST (an rtx for a number of bytes).
    This pops when ADJUST is positive.  ADJUST need not be constant.  */
 
 void
 adjust_stack (rtx adjust)
 {
-  rtx temp;
-
   if (adjust == const0_rtx)
     return;
 
@@ -889,17 +920,7 @@  adjust_stack (rtx adjust)
   if (CONST_INT_P (adjust))
     stack_pointer_delta -= INTVAL (adjust);
 
-  temp = expand_binop (Pmode,
-#ifdef STACK_GROWS_DOWNWARD
-		       add_optab,
-#else
-		       sub_optab,
-#endif
-		       stack_pointer_rtx, adjust, stack_pointer_rtx, 0,
-		       OPTAB_LIB_WIDEN);
-
-  if (temp != stack_pointer_rtx)
-    emit_move_insn (stack_pointer_rtx, temp);
+  adjust_stack_1 (adjust, false);
 }
 
 /* Adjust the stack pointer by minus ADJUST (an rtx for a number of bytes).
@@ -908,8 +929,6 @@  adjust_stack (rtx adjust)
 void
 anti_adjust_stack (rtx adjust)
 {
-  rtx temp;
-
   if (adjust == const0_rtx)
     return;
 
@@ -918,17 +937,7 @@  anti_adjust_stack (rtx adjust)
   if (CONST_INT_P (adjust))
     stack_pointer_delta += INTVAL (adjust);
 
-  temp = expand_binop (Pmode,
-#ifdef STACK_GROWS_DOWNWARD
-		       sub_optab,
-#else
-		       add_optab,
-#endif
-		       stack_pointer_rtx, adjust, stack_pointer_rtx, 0,
-		       OPTAB_LIB_WIDEN);
-
-  if (temp != stack_pointer_rtx)
-    emit_move_insn (stack_pointer_rtx, temp);
+  adjust_stack_1 (adjust, true);
 }
 
 /* Round the size of a block to be pushed up to the boundary required
@@ -1416,14 +1425,18 @@  allocate_dynamic_stack_space (rtx size, unsigned size_align,
 	}
 
       saved_stack_pointer_delta = stack_pointer_delta;
+      suppress_reg_args_size = true;
+
       if (flag_stack_check && STACK_CHECK_MOVING_SP)
 	anti_adjust_stack_and_probe (size, false);
       else
 	anti_adjust_stack (size);
+
       /* Even if size is constant, don't modify stack_pointer_delta.
 	 The constant size alloca should preserve
 	 crtl->preferred_stack_boundary alignment.  */
       stack_pointer_delta = saved_stack_pointer_delta;
+      suppress_reg_args_size = false;
 
 #ifdef STACK_GROWS_DOWNWARD
       emit_move_insn (target, virtual_stack_dynamic_rtx);
diff --git a/gcc/expr.c b/gcc/expr.c
index 0d88a21..81b1ad8 100644
--- a/gcc/expr.c
+++ b/gcc/expr.c
@@ -3514,12 +3514,166 @@  push_block (rtx size, int extra, int below)
   return memory_address (GET_CLASS_NARROWEST_MODE (MODE_INT), temp);
 }
 
-#ifdef PUSH_ROUNDING
+/* A utility routine that returns the base of an auto-inc memory, or NULL.  */
+
+static rtx
+mem_autoinc_base (rtx mem)
+{
+  if (MEM_P (mem))
+    {
+      rtx addr = XEXP (mem, 0);
+      if (GET_RTX_CLASS (GET_CODE (addr)) == RTX_AUTOINC)
+	return XEXP (addr, 0);
+    }
+  return NULL;
+}
+
+/* A utility routine used here, in reload, try_split and peep2_attempt.
+   The insns after PREV up to and including LAST are known to adjust
+   the stack, with a final value of END_ARGS_SIZE.  Iterate backward
+   from LAST placing notes as appropriate.  PREV may be NULL, indicating
+   the entire insn sequence prior to LAST should be scanned.
+
+   The set of allowed stack pointer modifications is small:
+     (1) One or more auto-inc style memory references (aka pushes),
+     (2) One or more addition/subtraction with the SP as destination,
+     (3) A single move insn with the SP as destination,
+     (4) A call_pop insn.
+
+   Insns in the sequence that do not modify the SP are ignored.
+
+   The return value is the args_size prior to the sequence, derived from
+   adjustments trivially verified via immediate operand or auto-inc.  If
+   an adjustment cannot be trivially extracted, it is INT_MIN.  */
+
+int
+fixup_args_size_notes (rtx prev, rtx last, int end_args_size)
+{
+  int args_size = end_args_size;
+  bool saw_unknown = false;
+  rtx insn;
+
+  for (insn = last; insn != prev; insn = PREV_INSN (insn))
+    {
+      rtx dest, set, pat;
+      HOST_WIDE_INT this_delta = 0;
+      int i;
+
+      if (!NONDEBUG_INSN_P (insn))
+	continue;
+      pat = PATTERN (insn);
+      set = NULL;
+
+      /* Look for a call_pop pattern.  */
+      if (CALL_P (insn))
+	{
+	  /* We're not supposed to see non-pop call patterns here.  */
+	  gcc_assert (GET_CODE (pat) == PARALLEL);
 
+	  /* All call_pop have a stack pointer adjust in the parallel.
+	     The call itself is always first, and the stack adjust is
+	     usually last, so search from the end.  */
+	  for (i = XVECLEN (pat, 0) - 1; i > 0; --i)
+	    {
+	      set = XVECEXP (pat, 0, i);
+	      if (GET_CODE (set) != SET)
+		continue;
+	      dest = SET_DEST (set);
+	      if (dest == stack_pointer_rtx)
+		break;
+	    }
+	  /* We'd better have found the stack pointer adjust.  */
+	  gcc_assert (i > 0);
+	  /* Fall through to process the extracted SET and DEST
+	     as if it were a standalone insn.  */
+	}
+      else if (GET_CODE (pat) == SET)
+	set = pat;
+      else if ((set = single_set (insn)) != NULL)
+	;
+      else
+	{
+	  /* ??? Some older ports use a parallel with a stack adjust
+	     and a store for a PUSH_ROUNDING pattern, rather than a
+	     PRE/POST_MODIFY rtx.  Don't force them to update yet...  */
+	  /* ??? See h8300 and m68k, pushqi1.  */
+	  for (i = XVECLEN (pat, 0) - 1; i >= 0; --i)
+	    {
+	      set = XVECEXP (pat, 0, i);
+	      if (GET_CODE (set) != SET)
+		continue;
+	      dest = SET_DEST (set);
+	      if (dest == stack_pointer_rtx)
+		break;
+
+	      /* We do not expect an auto-inc of the sp in the parallel.  */
+	      gcc_checking_assert (mem_autoinc_base (dest)
+				   != stack_pointer_rtx);
+	      gcc_checking_assert (mem_autoinc_base (SET_SRC (set))
+				   != stack_pointer_rtx);
+	    }
+	  if (i < 0)
+	    continue;
+	}
+      dest = SET_DEST (set);
+
+      /* Look for direct modifications of the stack pointer.  */
+      if (dest == stack_pointer_rtx)
+	{
+	  gcc_assert (!saw_unknown);
+	  /* Look for a trivial adjustment, otherwise assume nothing.  */
+	  if (GET_CODE (SET_SRC (set)) == PLUS
+	      && XEXP (SET_SRC (set), 0) == stack_pointer_rtx
+	      && CONST_INT_P (XEXP (SET_SRC (set), 1)))
+	    this_delta = INTVAL (XEXP (SET_SRC (set), 1));
+	  else
+	    saw_unknown = true;
+	}
+      /* Otherwise only think about autoinc patterns.  */
+      else if (mem_autoinc_base (dest) == stack_pointer_rtx)
+	{
+	  rtx addr = XEXP (dest, 0);
+	  gcc_assert (!saw_unknown);
+	  switch (GET_CODE (addr))
+	    {
+	    case PRE_INC:
+	    case POST_INC:
+	      this_delta = GET_MODE_SIZE (GET_MODE (dest));
+	      break;
+	    case PRE_DEC:
+	    case POST_DEC:
+	      this_delta = -GET_MODE_SIZE (GET_MODE (dest));
+	      break;
+	    case PRE_MODIFY:
+	    case POST_MODIFY:
+	      addr = XEXP (addr, 1);
+	      gcc_assert (GET_CODE (addr) == PLUS);
+	      gcc_assert (XEXP (addr, 0) == stack_pointer_rtx);
+	      gcc_assert (CONST_INT_P (XEXP (addr, 1)));
+	      this_delta = INTVAL (XEXP (addr, 1));
+	      break;
+	    default:
+	      gcc_unreachable ();
+	    }
+	}
+      else
+	continue;
+
+      add_reg_note (insn, REG_ARGS_SIZE, GEN_INT (args_size));
+#ifdef STACK_GROWS_DOWNWARD
+      this_delta = -this_delta;
+#endif
+      args_size -= this_delta;
+    }
+
+  return saw_unknown ? INT_MIN : args_size;
+}
+
+#ifdef PUSH_ROUNDING
 /* Emit single push insn.  */
 
 static void
-emit_single_push_insn (enum machine_mode mode, rtx x, tree type)
+emit_single_push_insn_1 (enum machine_mode mode, rtx x, tree type)
 {
   rtx dest_addr;
   unsigned rounded_size = PUSH_ROUNDING (GET_MODE_SIZE (mode));
@@ -3603,6 +3757,30 @@  emit_single_push_insn (enum machine_mode mode, rtx x, tree type)
     }
   emit_move_insn (dest, x);
 }
+
+/* Emit and annotate a single push insn.  */
+
+static void
+emit_single_push_insn (enum machine_mode mode, rtx x, tree type)
+{
+  int delta, old_delta = stack_pointer_delta;
+  rtx prev = get_last_insn ();
+  rtx last;
+
+  emit_single_push_insn_1 (mode, x, type);
+
+  last = get_last_insn ();
+
+  /* Notice the common case where we emitted exactly one insn.  */
+  if (PREV_INSN (last) == prev)
+    {
+      add_reg_note (last, REG_ARGS_SIZE, GEN_INT (stack_pointer_delta));
+      return;
+    }
+
+  delta = fixup_args_size_notes (prev, last, stack_pointer_delta);
+  gcc_assert (delta == INT_MIN || delta == old_delta);
+}
 #endif
 
 /* Generate code to push X onto the stack, assuming it has mode MODE and
diff --git a/gcc/recog.c b/gcc/recog.c
index 9331681..22a5402 100644
--- a/gcc/recog.c
+++ b/gcc/recog.c
@@ -3146,16 +3146,17 @@  static rtx
 peep2_attempt (basic_block bb, rtx insn, int match_len, rtx attempt)
 {
   int i;
-  rtx last, note, before_try, x;
+  rtx last, eh_note, as_note, before_try, x;
   rtx old_insn, new_insn;
   bool was_call = false;
 
-  /* If we are splittind an RTX_FRAME_RELATED_P insn, do not allow it to
+  /* If we are splitting an RTX_FRAME_RELATED_P insn, do not allow it to
      match more than one insn, or to be split into more than one insn.  */
   old_insn = peep2_insn_data[peep2_current].insn;
   if (RTX_FRAME_RELATED_P (old_insn))
     {
       bool any_note = false;
+      rtx note;
 
       if (match_len != 0)
 	return NULL;
@@ -3236,6 +3237,7 @@  peep2_attempt (basic_block bb, rtx insn, int match_len, rtx attempt)
   for (i = 0; i <= match_len; ++i)
     {
       int j;
+      rtx note;
 
       j = peep2_buf_position (peep2_current + i);
       old_insn = peep2_insn_data[j].insn;
@@ -3281,9 +3283,21 @@  peep2_attempt (basic_block bb, rtx insn, int match_len, rtx attempt)
       break;
     }
 
-  i = peep2_buf_position (peep2_current + match_len);
+  /* If we matched any instruction that had a REG_ARGS_SIZE, then
+     move those notes over to the new sequence.  */
+  as_note = NULL;
+  for (i = match_len; i >= 0; --i)
+    {
+      int j = peep2_buf_position (peep2_current + i);
+      old_insn = peep2_insn_data[j].insn;
+
+      as_note = find_reg_note (old_insn, REG_ARGS_SIZE, NULL);
+      if (as_note)
+	break;
+    }
 
-  note = find_reg_note (peep2_insn_data[i].insn, REG_EH_REGION, NULL_RTX);
+  i = peep2_buf_position (peep2_current + match_len);
+  eh_note = find_reg_note (peep2_insn_data[i].insn, REG_EH_REGION, NULL_RTX);
 
   /* Replace the old sequence with the new.  */
   last = emit_insn_after_setloc (attempt,
@@ -3293,7 +3307,7 @@  peep2_attempt (basic_block bb, rtx insn, int match_len, rtx attempt)
   delete_insn_chain (insn, peep2_insn_data[i].insn, false);
 
   /* Re-insert the EH_REGION notes.  */
-  if (note || (was_call && nonlocal_goto_handler_labels))
+  if (eh_note || (was_call && nonlocal_goto_handler_labels))
     {
       edge eh_edge;
       edge_iterator ei;
@@ -3302,8 +3316,8 @@  peep2_attempt (basic_block bb, rtx insn, int match_len, rtx attempt)
 	if (eh_edge->flags & (EDGE_EH | EDGE_ABNORMAL_CALL))
 	  break;
 
-      if (note)
-	copy_reg_eh_region_note_backward (note, last, before_try);
+      if (eh_note)
+	copy_reg_eh_region_note_backward (eh_note, last, before_try);
 
       if (eh_edge)
 	for (x = last; x != before_try; x = PREV_INSN (x))
@@ -3336,6 +3350,10 @@  peep2_attempt (basic_block bb, rtx insn, int match_len, rtx attempt)
       peep2_do_cleanup_cfg |= purge_dead_edges (bb);
     }
 
+  /* Re-insert the ARGS_SIZE notes.  */
+  if (as_note)
+    fixup_args_size_notes (before_try, last, INTVAL (XEXP (as_note, 0)));
+
   /* If we generated a jump instruction, it won't have
      JUMP_LABEL set.  Recompute after we're done.  */
   for (x = last; x != before_try; x = PREV_INSN (x))
diff --git a/gcc/reg-notes.def b/gcc/reg-notes.def
index eccac9e..8c8a99a 100644
--- a/gcc/reg-notes.def
+++ b/gcc/reg-notes.def
@@ -201,3 +201,8 @@  REG_NOTE (CROSSING_JUMP)
 /* This kind of note is generated at each to `setjmp', and similar
    functions that can return twice.  */
 REG_NOTE (SETJMP)
+
+/* Indicates the cumulative offset of the stack pointer accounting
+   for pushed arguments.  This will only be generated when
+   ACCUMULATE_OUTGOING_ARGS is false.  */
+REG_NOTE (ARGS_SIZE)
diff --git a/gcc/reload1.c b/gcc/reload1.c
index 499412c..3233580 100644
--- a/gcc/reload1.c
+++ b/gcc/reload1.c
@@ -4653,6 +4653,15 @@  reload_as_needed (int live_known)
 	      if (cfun->can_throw_non_call_exceptions && !CALL_P (insn))
 		fixup_eh_region_note (insn, prev, next);
 
+	      /* Adjust the location of REG_ARGS_SIZE.  */
+	      p = find_reg_note (insn, REG_ARGS_SIZE, NULL_RTX);
+	      if (p)
+		{
+		  remove_note (insn, p);
+		  fixup_args_size_notes (prev, PREV_INSN (next),
+					 INTVAL (XEXP (p, 0)));
+		}
+
 	      /* If this was an ASM, make sure that all the reload insns
 		 we have generated are valid.  If not, give an error
 		 and delete them.  */
diff --git a/gcc/rtl.h b/gcc/rtl.h
index 3156006..d5d7b02 100644
--- a/gcc/rtl.h
+++ b/gcc/rtl.h
@@ -2471,6 +2471,7 @@  extern void emit_jump (rtx);
 /* In expr.c */
 extern rtx move_by_pieces (rtx, rtx, unsigned HOST_WIDE_INT,
 			   unsigned int, int);
+extern int fixup_args_size_notes (rtx, rtx, int);
 
 /* In cfgrtl.c */
 extern void print_rtl_with_bb (FILE *, const_rtx);