diff mbox

[AVR] : Support tail calls

Message ID 4D8091B3.6070509@gjlay.de
State New
Headers show

Commit Message

Georg-Johann Lay March 16, 2011, 10:32 a.m. UTC
Richard Henderson schrieb:
> On 03/11/2011 05:43 AM, Georg-Johann Lay wrote:
>> I did not find a way to make this work together with -mcall-prologues.
>> Please let me know if you have suggestion on how call prologues can be
>> combine with tail calls.
> 
> You need a new symbol in libgcc for this.  It should be easy enough to have
> the sibcall epilogue load up Z+EIND before jumping to the new symbol
> (perhaps called __sibcall_restores__).  This new symbol would be just like
> the existing __epilogue_restores__ except that it would finish with an
> eijmp/ijmp instruction (depending on multilib) instead of a ret instruction.

The point is that the intent of -mprologue-saves is to save program
memory space (flash). If we load the callee's address in the epilogue,
the epilogue's size will increase. Moreover, adding an other lengthy
function besides __epilogue_restores__ to libgcc, that will increase
code size because in many cases we will see both epilogue flavors
being dragged into application.

Note that, in comparison with non-tailcall, a tailcall without
prologue-saves saves one instruction, i.e. the ret. This is no more
true in presence of prologue-saves, i.e. even with tail-calling the
code would be neither faster nor smaller. (It might still save some
stack space, though)

>> The implementation uses struct machine_function to pass information
>> around, i.e. from avr_function_arg_advance to avr_function_ok_for_sibcall.
> 
> Look at how the s390 port handles this exact problem.
> 
>   /* Register 6 on s390 is available as an argument register but unfortunately
>      "caller saved". This makes functions needing this register for arguments
>      not suitable for sibcalls.  */
>   return !s390_call_saved_register_used (exp);
> 
> I'll admit that it would be helpful if the cumulative_args pointer was passed
> into the ok_for_sibcall hook, but it's not *that* hard to recreate that value
> by hand.  This is what the s390_call_saved_register_used function does.

See also discussion in

http://gcc.gnu.org/ml/gcc/2011-03/msg00048.html

I must admit that I did not try to copy-paste all code in calls.c that
deals with preparation of arguments and mapping trees to rtx and see
how it deflates when applying all the #if and #ifdef and avr backend
implications...

Maybe we see Joern's work being committed soon so that on top of this
a patch adding cumulative args to function_of_for_sibcall will be
approved.

Presumably, even s390 backend guys would prefer reusing information
instead of recomputing them.

> 
>> +      || (avr_OS_task_function_p (decl_callee) ^ avr_OS_task_function_p (current_function_decl))
> 
> Please just use != instead of ^ here.  Also, needs line wrapping.

Done in the patch attached.

In the cases of OS_task resp.OS_main, I think I am a bit
over-conservative when rejecting tail-calls. This is basically because
there is *no* documentation of these attributes. Presumably they
implement "naked+ret" so that tail-calling from OS_task resp. OS_main
should be ok?

> I do like very much how you've cleaned up the call patterns.  IMO this should
> be committed as a separate patch; I'll let the AVR maintainers approve it though.

If anyone desires an explicit, intermediate step I will provide
according patch, of course.

New patch attached. Changes with respect to original patch:
- use != instead of ^ to ensure caller/callee epilogue compatibility
  for functions attributed OS_task resp OS_main
- add some more comment to md

Johann

2011-03-16  Georg-Johann Lay  <avr@gjlay.de>

	* config/avr/avr-protos.h (expand_epilogue): Change prototype
	* config/avr/avr.h (struct machine_function): Add field
	sibcall_fails.
	* config/avr/avr.c (init_cumulative_args,
	avr_function_arg_advance): Use it.
	* config/avr/avr.c (expand_epilogue): Add bool parameter. Handle
	sibcall	epilogues.
	(TARGET_FUNCTION_OK_FOR_SIBCALL): Define to...
	(avr_function_ok_for_sibcall): ...this new function.
	(avr_lookup_function_attribute1): New static Function.
	(avr_naked_function_p, interrupt_function_p,
	signal_function_p, avr_OS_task_function_p,
	avr_OS_main_function_p): Use it.
	* config/avr/avr.md ("sibcall", "sibcall_value",
	"sibcall_epilogue"): New expander.
	("*call_insn", "*call_value_insn"): New insn.
	("call_insn", "call_value_insn"): Remove
	("call", "call_value", "epilogue"): Change expander to handle
	sibling calls.

Comments

Richard Henderson March 16, 2011, 3:51 p.m. UTC | #1
On 03/16/2011 03:32 AM, Georg-Johann Lay wrote:
> Richard Henderson schrieb:
>> On 03/11/2011 05:43 AM, Georg-Johann Lay wrote:
>>> I did not find a way to make this work together with -mcall-prologues.
>>> Please let me know if you have suggestion on how call prologues can be
>>> combine with tail calls.
>>
>> You need a new symbol in libgcc for this.  It should be easy enough to have
>> the sibcall epilogue load up Z+EIND before jumping to the new symbol
>> (perhaps called __sibcall_restores__).  This new symbol would be just like
>> the existing __epilogue_restores__ except that it would finish with an
>> eijmp/ijmp instruction (depending on multilib) instead of a ret instruction.
> 
> The point is that the intent of -mprologue-saves is to save program
> memory space (flash). If we load the callee's address in the epilogue,
> the epilogue's size will increase. Moreover, adding an other lengthy
> function besides __epilogue_restores__ to libgcc, that will increase
> code size because in many cases we will see both epilogue flavors
> being dragged into application.

All true, but supposing there are enough tail calls then on balance
you'll still save flash space.

An intermediate alternative is to simply override TARGET_CALL_PROLOGUES
in sibcall epilogues.  I.e. let __prologue_saves__ be generated as usual,
and __epilogue_restores__ be generated as usual for normal epilogues, 
but test for sibcall_p in computing "minimize".


r~
Georg-Johann Lay March 16, 2011, 6:59 p.m. UTC | #2
Richard Henderson schrieb:
> On 03/16/2011 03:32 AM, Georg-Johann Lay wrote:
> 
>>Richard Henderson schrieb:
>>
>>>On 03/11/2011 05:43 AM, Georg-Johann Lay wrote:
>>>
>>>>I did not find a way to make this work together with -mcall-prologues.
>>>>Please let me know if you have suggestion on how call prologues can be
>>>>combine with tail calls.
>>>
>>>You need a new symbol in libgcc for this.  It should be easy enough to have
>>>the sibcall epilogue load up Z+EIND before jumping to the new symbol
>>>(perhaps called __sibcall_restores__).  This new symbol would be just like
>>>the existing __epilogue_restores__ except that it would finish with an
>>>eijmp/ijmp instruction (depending on multilib) instead of a ret instruction.
>>
>>The point is that the intent of -mprologue-saves is to save program
>>memory space (flash). If we load the callee's address in the epilogue,
>>the epilogue's size will increase. Moreover, adding an other lengthy
>>function besides __epilogue_restores__ to libgcc, that will increase
>>code size because in many cases we will see both epilogue flavors
>>being dragged into application.
> 
> All true, but supposing there are enough tail calls then on balance
> you'll still save flash space.

Only if epilogue size decreses. As tail-call saves at most 2 bytes and 
storing the address (either return-to in epilogue or jump-to caller) 
will take more than 2 bytes. So nothing is won.

> An intermediate alternative is to simply override TARGET_CALL_PROLOGUES
> in sibcall epilogues.  I.e. let __prologue_saves__ be generated as usual,
> and __epilogue_restores__ be generated as usual for normal epilogues, 
> but test for sibcall_p in computing "minimize".

This means that long epilogues that with call-prologues would deflate 
then would no more deflate and had to be coded inline in epilogue.

Maybe we can take the patch as-is, let approve it and fiddle out more 
optimization opportunities later?

Johann
Denis Chertykov March 17, 2011, 11 a.m. UTC | #3
2011/3/16 Georg-Johann Lay <avr@gjlay.de>
>
> Richard Henderson schrieb:
>>
>> On 03/16/2011 03:32 AM, Georg-Johann Lay wrote:
>>
>>> Richard Henderson schrieb:
>>>
>>>> On 03/11/2011 05:43 AM, Georg-Johann Lay wrote:
>>>>
>>>>> I did not find a way to make this work together with -mcall-prologues.
>>>>> Please let me know if you have suggestion on how call prologues can be
>>>>> combine with tail calls.
>>>>
>>>> You need a new symbol in libgcc for this.  It should be easy enough to have
>>>> the sibcall epilogue load up Z+EIND before jumping to the new symbol
>>>> (perhaps called __sibcall_restores__).  This new symbol would be just like
>>>> the existing __epilogue_restores__ except that it would finish with an
>>>> eijmp/ijmp instruction (depending on multilib) instead of a ret instruction.
>>>
>>> The point is that the intent of -mprologue-saves is to save program
>>> memory space (flash). If we load the callee's address in the epilogue,
>>> the epilogue's size will increase. Moreover, adding an other lengthy
>>> function besides __epilogue_restores__ to libgcc, that will increase
>>> code size because in many cases we will see both epilogue flavors
>>> being dragged into application.
>>
>> All true, but supposing there are enough tail calls then on balance
>> you'll still save flash space.
>
> Only if epilogue size decreses. As tail-call saves at most 2 bytes and storing the address (either return-to in epilogue or jump-to caller) will take more than 2 bytes. So nothing is won.
>
>> An intermediate alternative is to simply override TARGET_CALL_PROLOGUES
>> in sibcall epilogues.  I.e. let __prologue_saves__ be generated as usual,
>> and __epilogue_restores__ be generated as usual for normal epilogues, but test for sibcall_p in computing "minimize".
>
> This means that long epilogues that with call-prologues would deflate then would no more deflate and had to be coded inline in epilogue.
>
> Maybe we can take the patch as-is, let approve it and fiddle out more optimization opportunities later?

Is it tested for regressions ?

Denis.
Georg-Johann Lay March 18, 2011, 10:10 a.m. UTC | #4
Denis Chertykov schrieb:
> 2011/3/16 Georg-Johann Lay <avr@gjlay.de>
>> Richard Henderson schrieb:
>>> On 03/16/2011 03:32 AM, Georg-Johann Lay wrote:
>>> 
>>>> Richard Henderson schrieb:
>>>> 
>>>>> On 03/11/2011 05:43 AM, Georg-Johann Lay wrote:
>>>>> 
>>>>>> I did not find a way to make this work together with
>>>>>> -mcall-prologues. Please let me know if you have
>>>>>> suggestion on how call prologues can be combine with tail
>>>>>> calls.
>>>>> You need a new symbol in libgcc for this.  It should be
>>>>> easy enough to have the sibcall epilogue load up Z+EIND
>>>>> before jumping to the new symbol (perhaps called
>>>>> __sibcall_restores__).  This new symbol would be just like 
>>>>> the existing __epilogue_restores__ except that it would
>>>>> finish with an eijmp/ijmp instruction (depending on
>>>>> multilib) instead of a ret instruction.
>>>> The point is that the intent of -mprologue-saves is to save
>>>> program memory space (flash). If we load the callee's address
>>>> in the epilogue, the epilogue's size will increase. Moreover,
>>>> adding an other lengthy function besides
>>>> __epilogue_restores__ to libgcc, that will increase code size
>>>> because in many cases we will see both epilogue flavors being
>>>> dragged into application.
>>> All true, but supposing there are enough tail calls then on
>>> balance you'll still save flash space.
>> Only if epilogue size decreses. As tail-call saves at most 2
>> bytes and storing the address (either return-to in epilogue or
>> jump-to caller) will take more than 2 bytes. So nothing is won.
>> 
>>> An intermediate alternative is to simply override
>>> TARGET_CALL_PROLOGUES in sibcall epilogues.  I.e. let
>>> __prologue_saves__ be generated as usual, and
>>> __epilogue_restores__ be generated as usual for normal
>>> epilogues, but test for sibcall_p in computing "minimize".
>> This means that long epilogues that with call-prologues would
>> deflate then would no more deflate and had to be coded inline in
>> epilogue.
>> 
>> Maybe we can take the patch as-is, let approve it and fiddle out
>> more optimization opportunities later?
> 
> Is it tested for regressions ?
> 
> Denis.

I ran tests against svn 170942 (latest 4.7.0 snapshot). Besides
timestamps, the diff looks like this:

1435a1436,1437
> XPASS: gcc.dg/sibcall-3.c execution test
> XPASS: gcc.dg/sibcall-4.c execution test
1630,1631c1632,1633
< # of unexpected successes     4
< # of expected failures                132
---
> # of unexpected successes     6
> # of expected failures                130
1670c1672

This is due to xfail for runtime-test of sibcall abilities for avr-*-*.

Johann
Denis Chertykov March 18, 2011, 6:16 p.m. UTC | #5
2011/3/18 Georg-Johann Lay <avr@gjlay.de>:

>> Is it tested for regressions ?
>>
>> Denis.
>
> I ran tests against svn 170942 (latest 4.7.0 snapshot). Besides
> timestamps, the diff looks like this:
>
> 1435a1436,1437
>> XPASS: gcc.dg/sibcall-3.c execution test
>> XPASS: gcc.dg/sibcall-4.c execution test
> 1630,1631c1632,1633
> < # of unexpected successes     4
> < # of expected failures                132
> ---
>> # of unexpected successes     6
>> # of expected failures                130
> 1670c1672
>
> This is due to xfail for runtime-test of sibcall abilities for avr-*-*.
>


So, the patch can be committed.
Please, prepare the final version of the patch.

Denis.
diff mbox

Patch

Index: config/avr/avr-protos.h
===================================================================
--- config/avr/avr-protos.h	(Revision 171039)
+++ config/avr/avr-protos.h	(Arbeitskopie)
@@ -76,7 +76,7 @@  extern const char *lshrsi3_out (rtx insn
 extern bool avr_rotate_bytes (rtx operands[]);
 
 extern void expand_prologue (void);
-extern void expand_epilogue (void);
+extern void expand_epilogue (bool);
 extern int avr_epilogue_uses (int regno);
 
 extern void avr_output_bld (rtx operands[], int bit_nr);
Index: config/avr/avr.md
===================================================================
--- config/avr/avr.md	(Revision 171039)
+++ config/avr/avr.md	(Arbeitskopie)
@@ -2647,94 +2647,91 @@ 
 ;; call
 
 (define_expand "call"
-  [(call (match_operand:HI 0 "call_insn_operand" "")
-         (match_operand:HI 1 "general_operand" ""))]
+  [(parallel[(call (match_operand:HI 0 "call_insn_operand" "")
+                   (match_operand:HI 1 "general_operand" ""))
+             (use (const_int 0))])]
   ;; Operand 1 not used on the AVR.
+  ;; Operand 2 is 1 for tail-call, 0 otherwise.
+  ""
+  "")
+
+(define_expand "sibcall"
+  [(parallel[(call (match_operand:HI 0 "call_insn_operand" "")
+                   (match_operand:HI 1 "general_operand" ""))
+             (use (const_int 1))])]
+  ;; Operand 1 not used on the AVR.
+  ;; Operand 2 is 1 for tail-call, 0 otherwise.
   ""
   "")
 
 ;; call value
 
 (define_expand "call_value"
-  [(set (match_operand 0 "register_operand" "")
-        (call (match_operand:HI 1 "call_insn_operand" "")
-              (match_operand:HI 2 "general_operand" "")))]
+  [(parallel[(set (match_operand 0 "register_operand" "")
+                  (call (match_operand:HI 1 "call_insn_operand" "")
+                        (match_operand:HI 2 "general_operand" "")))
+             (use (const_int 0))])]
   ;; Operand 2 not used on the AVR.
+  ;; Operand 3 is 1 for tail-call, 0 otherwise.
   ""
   "")
 
-(define_insn "call_insn"
-  [(call (mem:HI (match_operand:HI 0 "nonmemory_operand" "!z,*r,s,n"))
-         (match_operand:HI 1 "general_operand" "X,X,X,X"))]
-;; We don't need in saving Z register because r30,r31 is a call used registers
+(define_expand "sibcall_value"
+  [(parallel[(set (match_operand 0 "register_operand" "")
+                  (call (match_operand:HI 1 "call_insn_operand" "")
+                        (match_operand:HI 2 "general_operand" "")))
+             (use (const_int 1))])]
+  ;; Operand 2 not used on the AVR.
+  ;; Operand 3 is 1 for tail-call, 0 otherwise.
+  ""
+  "")
+
+(define_insn "*call_insn"
+  [(parallel[(call (mem:HI (match_operand:HI 0 "nonmemory_operand" "z,s,z,s"))
+                   (match_operand:HI 1 "general_operand"           "X,X,X,X"))
+             (use (match_operand:HI 2 "const_int_operand"          "L,L,P,P"))])]
   ;; Operand 1 not used on the AVR.
-  "(register_operand (operands[0], HImode) || CONSTANT_P (operands[0]))"
-  "*{
-  if (which_alternative==0)
-     return \"%!icall\";
-  else if (which_alternative==1)
-    {
-      if (AVR_HAVE_MOVW)
-	return (AS2 (movw, r30, %0) CR_TAB
-               \"%!icall\");
-      else
-	return (AS2 (mov, r30, %A0) CR_TAB
-		AS2 (mov, r31, %B0) CR_TAB
-		\"%!icall\");
-    }
-  else if (which_alternative==2)
-    return AS1(%~call,%x0);
-  return (AS2 (ldi,r30,lo8(%0)) CR_TAB
-          AS2 (ldi,r31,hi8(%0)) CR_TAB
-          \"%!icall\");
-}"
-  [(set_attr "cc" "clobber,clobber,clobber,clobber")
+  ;; Operand 2 is 1 for tail-call, 0 otherwise.
+  ""
+  "@
+    %!icall
+    %~call %x0
+    %!ijmp
+    %~jmp %x0"
+  [(set_attr "cc" "clobber")
    (set_attr_alternative "length"
-			 [(const_int 1)
-			  (if_then_else (eq_attr "mcu_have_movw" "yes")
-					(const_int 2)
-					(const_int 3))
-			  (if_then_else (eq_attr "mcu_mega" "yes")
-					(const_int 2)
-					(const_int 1))
-			  (const_int 3)])])
-
-(define_insn "call_value_insn"
-  [(set (match_operand 0 "register_operand" "=r,r,r,r")
-        (call (mem:HI (match_operand:HI 1 "nonmemory_operand" "!z,*r,s,n"))
-;; We don't need in saving Z register because r30,r31 is a call used registers
-              (match_operand:HI 2 "general_operand" "X,X,X,X")))]
+                         [(const_int 1)
+                          (if_then_else (eq_attr "mcu_mega" "yes")
+                                        (const_int 2)
+                                        (const_int 1))
+                          (const_int 1)
+                          (if_then_else (eq_attr "mcu_mega" "yes")
+                                        (const_int 2)
+                                        (const_int 1))])])
+
+(define_insn "*call_value_insn"
+  [(parallel[(set (match_operand 0 "register_operand"                   "=r,r,r,r")
+                  (call (mem:HI (match_operand:HI 1 "nonmemory_operand"  "z,s,z,s"))
+                        (match_operand:HI 2 "general_operand"            "X,X,X,X")))
+             (use (match_operand:HI 3 "const_int_operand"                "L,L,P,P"))])]
   ;; Operand 2 not used on the AVR.
-  "(register_operand (operands[0], VOIDmode) || CONSTANT_P (operands[0]))"
-  "*{
-  if (which_alternative==0)
-     return \"%!icall\";
-  else if (which_alternative==1)
-    {
-      if (AVR_HAVE_MOVW)
-	return (AS2 (movw, r30, %1) CR_TAB
-		\"%!icall\");
-      else
-	return (AS2 (mov, r30, %A1) CR_TAB
-		AS2 (mov, r31, %B1) CR_TAB
-		\"%!icall\");
-    }
-  else if (which_alternative==2)
-    return AS1(%~call,%x1);
-  return (AS2 (ldi, r30, lo8(%1)) CR_TAB
-          AS2 (ldi, r31, hi8(%1)) CR_TAB
-          \"%!icall\");
-}"
-  [(set_attr "cc" "clobber,clobber,clobber,clobber")
+  ;; Operand 3 is 1 for tail-call, 0 otherwise.
+  ""
+  "@
+    %!icall
+    %~call %x1
+    %!ijmp
+    %~jmp %x1"
+  [(set_attr "cc" "clobber")
    (set_attr_alternative "length"
-			 [(const_int 1)
-			  (if_then_else (eq_attr "mcu_have_movw" "yes")
-					(const_int 2)
-					(const_int 3))
-			  (if_then_else (eq_attr "mcu_mega" "yes")
-					(const_int 2)
-					(const_int 1))
-			  (const_int 3)])])
+                         [(const_int 1)
+                          (if_then_else (eq_attr "mcu_mega" "yes")
+                                        (const_int 2)
+                                        (const_int 1))
+                          (const_int 1)
+                          (if_then_else (eq_attr "mcu_mega" "yes")
+                                        (const_int 2)
+                                        (const_int 1))])])
 
 (define_insn "nop"
   [(const_int 0)]
@@ -3246,8 +3243,15 @@ 
 (define_expand "epilogue"
   [(const_int 0)]
   ""
-  "
   {
-    expand_epilogue (); 
+    expand_epilogue (false /* sibcall_p */);
     DONE;
-  }")
+  })
+
+(define_expand "sibcall_epilogue"
+  [(const_int 0)]
+  ""
+  {
+    expand_epilogue (true /* sibcall_p */);
+    DONE;
+  })
Index: config/avr/avr.c
===================================================================
--- config/avr/avr.c	(Revision 171039)
+++ config/avr/avr.c	(Arbeitskopie)
@@ -100,6 +100,7 @@  static rtx avr_function_arg (CUMULATIVE_
 static void avr_function_arg_advance (CUMULATIVE_ARGS *, enum machine_mode,
 				      const_tree, bool);
 static void avr_help (void);
+static bool avr_function_ok_for_sibcall (tree, tree);
 
 /* Allocate registers from r25 to r8 for parameters for function calls.  */
 #define FIRST_CUM_REG 26
@@ -231,6 +232,9 @@  static const struct default_options avr_
 #undef TARGET_HELP
 #define TARGET_HELP avr_help
 
+#undef TARGET_FUNCTION_OK_FOR_SIBCALL
+#define TARGET_FUNCTION_OK_FOR_SIBCALL avr_function_ok_for_sibcall
+
 struct gcc_target targetm = TARGET_INITIALIZER;
 
 static void
@@ -330,17 +334,34 @@  avr_regno_reg_class (int r)
   return ALL_REGS;
 }
 
+/* A helper for the subsequent function attribute used to dig for
+   attribute 'name' in a FUNCTION_DECL or FUNCTION_TYPE */
+
+static inline int
+avr_lookup_function_attribute1 (const_tree func, const char *name)
+{
+  if (FUNCTION_DECL == TREE_CODE (func))
+    {
+      if (NULL_TREE != lookup_attribute (name, DECL_ATTRIBUTES (func)))
+        {
+          return true;
+        }
+      
+      func = TREE_TYPE (func);
+    }
+
+  gcc_assert (TREE_CODE (func) == FUNCTION_TYPE
+              || TREE_CODE (func) == METHOD_TYPE);
+  
+  return NULL_TREE != lookup_attribute (name, TYPE_ATTRIBUTES (func));
+}
+
 /* Return nonzero if FUNC is a naked function.  */
 
 static int
 avr_naked_function_p (tree func)
 {
-  tree a;
-
-  gcc_assert (TREE_CODE (func) == FUNCTION_DECL);
-  
-  a = lookup_attribute ("naked", TYPE_ATTRIBUTES (TREE_TYPE (func)));
-  return a != NULL_TREE;
+  return avr_lookup_function_attribute1 (func, "naked");
 }
 
 /* Return nonzero if FUNC is an interrupt function as specified
@@ -349,13 +370,7 @@  avr_naked_function_p (tree func)
 static int
 interrupt_function_p (tree func)
 {
-  tree a;
-
-  if (TREE_CODE (func) != FUNCTION_DECL)
-    return 0;
-
-  a = lookup_attribute ("interrupt", DECL_ATTRIBUTES (func));
-  return a != NULL_TREE;
+  return avr_lookup_function_attribute1 (func, "interrupt");
 }
 
 /* Return nonzero if FUNC is a signal function as specified
@@ -364,13 +379,7 @@  interrupt_function_p (tree func)
 static int
 signal_function_p (tree func)
 {
-  tree a;
-
-  if (TREE_CODE (func) != FUNCTION_DECL)
-    return 0;
-
-  a = lookup_attribute ("signal", DECL_ATTRIBUTES (func));
-  return a != NULL_TREE;
+  return avr_lookup_function_attribute1 (func, "signal");
 }
 
 /* Return nonzero if FUNC is a OS_task function.  */
@@ -378,12 +387,7 @@  signal_function_p (tree func)
 static int
 avr_OS_task_function_p (tree func)
 {
-  tree a;
-
-  gcc_assert (TREE_CODE (func) == FUNCTION_DECL);
-  
-  a = lookup_attribute ("OS_task", TYPE_ATTRIBUTES (TREE_TYPE (func)));
-  return a != NULL_TREE;
+  return avr_lookup_function_attribute1 (func, "OS_task");
 }
 
 /* Return nonzero if FUNC is a OS_main function.  */
@@ -391,12 +395,7 @@  avr_OS_task_function_p (tree func)
 static int
 avr_OS_main_function_p (tree func)
 {
-  tree a;
-
-  gcc_assert (TREE_CODE (func) == FUNCTION_DECL);
-  
-  a = lookup_attribute ("OS_main", TYPE_ATTRIBUTES (TREE_TYPE (func)));
-  return a != NULL_TREE;
+  return avr_lookup_function_attribute1 (func, "OS_main");
 }
 
 /* Return the number of hard registers to push/pop in the prologue/epilogue
@@ -864,7 +863,7 @@  avr_epilogue_uses (int regno ATTRIBUTE_U
 /*  Output RTL epilogue.  */
 
 void
-expand_epilogue (void)
+expand_epilogue (bool sibcall_p)
 {
   int reg;
   int live_seq;
@@ -875,6 +874,8 @@  expand_epilogue (void)
   /* epilogue: naked  */
   if (cfun->machine->is_naked)
     {
+      gcc_assert (!sibcall_p);
+      
       emit_jump_insn (gen_return ());
       return;
     }
@@ -1016,7 +1017,8 @@  expand_epilogue (void)
           emit_insn (gen_popqi (zero_reg_rtx));
         }
 
-      emit_jump_insn (gen_return ());
+      if (!sibcall_p)
+        emit_jump_insn (gen_return ());
     }
 }
 
@@ -1629,6 +1631,10 @@  init_cumulative_args (CUMULATIVE_ARGS *c
   cum->regno = FIRST_CUM_REG;
   if (!libname && stdarg_p (fntype))
     cum->nregs = 0;
+
+  /* Assume the calle may be tail called */
+  
+  cfun->machine->sibcall_fails = 0;
 }
 
 /* Returns the number of registers to allocate for a function argument.  */
@@ -1676,6 +1682,23 @@  avr_function_arg_advance (CUMULATIVE_ARG
   cum->nregs -= bytes;
   cum->regno -= bytes;
 
+  /* A parameter is being passed in a call-saved register. As the original
+     contents of these regs has to be restored before leaving the function,
+     a function must not pass arguments in call-saved regs in order to get
+     tail-called. */
+  
+  if (cum->regno >= 0
+      && !call_used_regs[cum->regno])
+    {
+      /* FIXME: We ship info on failing tail-call in struct machine_function.
+         This uses internals of calls.c:expand_call() and the way args_so_far
+         is used. targetm.function_ok_for_sibcall() needs to be extended to
+         pass &args_so_far, too. At present, CUMULATIVE_ARGS is target
+         dependent so that such an extension is not wanted. */
+      
+      cfun->machine->sibcall_fails = 1;
+    }
+
   if (cum->nregs <= 0)
     {
       cum->nregs = 0;
@@ -1683,6 +1706,62 @@  avr_function_arg_advance (CUMULATIVE_ARG
     }
 }
 
+/* Implement `TARGET_FUNCTION_OK_FOR_SIBCALL' */
+/* Decide whether we can make a sibling call to a function.  DECL is the
+   declaration of the function being targeted by the call and EXP is the
+   CALL_EXPR representing the call. */
+
+static bool
+avr_function_ok_for_sibcall (tree decl_callee, tree exp_callee)
+{
+  tree fntype_callee;
+
+  /* Tail-calling must fail if callee-saved regs are used to pass
+     function args.  We must not tail-call when `epilogue_restores'
+     is used.  Unfortunately, we cannot tell at this point if that
+     actually will happen or not, and we cannot step back from
+     tail-calling. Thus, we inhibit tail-calling with -mcall-prologues. */
+  
+  if (cfun->machine->sibcall_fails
+      || TARGET_CALL_PROLOGUES)
+    {
+      return false;
+    }
+  
+  fntype_callee = TREE_TYPE (CALL_EXPR_FN (exp_callee));
+
+  if (decl_callee)
+    {
+      decl_callee = TREE_TYPE (decl_callee);
+    }
+  else
+    {
+      decl_callee = fntype_callee;
+      
+      while (FUNCTION_TYPE != TREE_CODE (decl_callee)
+             && METHOD_TYPE != TREE_CODE (decl_callee))
+        {
+          decl_callee = TREE_TYPE (decl_callee);
+        }
+    }
+
+  /* Ensure that caller and callee have compatible epilogues */
+  
+  if (interrupt_function_p (current_function_decl)
+      || signal_function_p (current_function_decl)
+      || avr_naked_function_p (decl_callee)
+      || avr_naked_function_p (current_function_decl)
+      || (avr_OS_task_function_p (decl_callee)
+          != avr_OS_task_function_p (current_function_decl))
+      || (avr_OS_main_function_p (decl_callee)
+          != avr_OS_main_function_p (current_function_decl)))
+    {
+      return false;
+    }
+ 
+  return true;
+}
+
 /***********************************************************************
   Functions for outputting various mov's for a various modes
 ************************************************************************/
Index: config/avr/avr.opt
===================================================================
--- config/avr/avr.opt	(Revision 171039)
+++ config/avr/avr.opt	(Arbeitskopie)
@@ -29,6 +29,9 @@  Target RejectNegative Joined Var(avr_mcu
 mdeb
 Target Report Undocumented Mask(ALL_DEBUG)
 
+mtest=
+Target RejectNegative Report Joined Undocumented UInteger Var(avr_test) Init(0)
+
 mint8
 Target Report Mask(INT8)
 Use an 8-bit 'int' type
Index: config/avr/avr.h
===================================================================
--- config/avr/avr.h	(Revision 171039)
+++ config/avr/avr.h	(Arbeitskopie)
@@ -828,4 +828,7 @@  struct GTY(()) machine_function
   
   /* Current function stack size.  */
   int stack_usage;
+
+  /* 'true' if a callee might be tail called */
+  int sibcall_fails;
 };