From: Tom de Vries
Date: Fri, 25 Jan 2013 14:05:02 +0100
To: Vladimir Makarov, Steven Bosscher
CC: gcc-patches@gcc.gnu.org, Radovan Obradovic
Subject: [PATCH][IRA] Analysis of register usage of functions for usage by IRA.
Message-ID: <510282FE.1060809@mentor.com>
X-Patchwork-Id: 215660

Vladimir,

this patch adds analysis of register usage of functions for usage by IRA.

The patch:
- adds analysis in pass_final to track which hard registers are set or clobbered
  by the function body, and stores that information in a struct cgraph_node.
- adds a target hook fn_other_hard_reg_usage to list hard registers that are set
  or clobbered by a call to a function, but are not listed as such in the
  function body, for instance registers clobbered by veneers inserted by the
  linker.
- adds a reg-note REG_CALL_DECL, to be able to easily link call_insns to their
  corresponding declaration, even after the calls may have been split into an
  insn (set register to function address) and a call_insn (call register), which
  can happen for instance on sh, and on mips with -mabicalls.
- uses the register analysis in IRA.
- adds an option -fuse-caller-save to control the optimization, on by default
  at -Os and -O2 and higher.

The patch (original version by Radovan Obradovic) is similar to your patch
( http://gcc.gnu.org/ml/gcc-patches/2007-01/msg01625.html ) from 2007.
But this patch doesn't implement save area stack slot sharing.
( Btw, I've borrowed the struct cgraph_node field name and comment from the 2007
patch ).

[ Steven, you mentioned in this discussion
  ( http://gcc.gnu.org/ml/gcc/2012-10/msg00213.html ) that you are working on
  porting the 2007 patch to trunk. What is the status of that effort? ]

As an example of the functionality, consider foo and bar from test-case aru-1.c:
...
static int __attribute__((noinline))
bar (int x)
{
  return x + 3;
}

int __attribute__((noinline))
foo (int y)
{
  return y + bar (y);
}
...

Compiled at -O2, bar only sets register $2 (the first return register):
...
bar:
    .frame  $sp,0,$31       # vars= 0, regs= 0/0, args= 0, gp= 0
    .mask   0x00000000,0
    .fmask  0x00000000,0
    .set    noreorder
    .set    nomacro
    j       $31
    addiu   $2,$4,3
...

foo can then use register $3 (the second return register) instead of register
$16 to save the value in register $4 (the first argument register) over the
call, as demonstrated here in a -fno-use-caller-save vs. -fuse-caller-save diff:
...
foo:                                      foo:
# vars= 0, regs= 2/0, args= 16, gp= 8   | # vars= 0, regs= 1/0, args= 16, gp= 8
    .frame  $sp,32,$31                        .frame  $sp,32,$31
    .mask   0x80010000,-4               |     .mask   0x80000000,-4
    .fmask  0x00000000,0                      .fmask  0x00000000,0
    .set    noreorder                         .set    noreorder
    .set    nomacro                           .set    nomacro
    addiu   $sp,$sp,-32                       addiu   $sp,$sp,-32
    sw      $31,28($sp)                       sw      $31,28($sp)
    sw      $16,24($sp)                 <
    .option pic0                              .option pic0
    jal     bar                               jal     bar
    .option pic2                              .option pic2
    move    $16,$4                      |     move    $3,$4
    lw      $31,28($sp)                       lw      $31,28($sp)
    addu    $2,$2,$16                   |     addu    $2,$2,$3
    lw      $16,24($sp)                 <
    j       $31                               j       $31
    addiu   $sp,$sp,32                        addiu   $sp,$sp,32
...
That way we skip the save and restore of register $16, which is not necessary
for $3. Btw, a further improvement could be to reuse $4 after the call, and
eliminate the move.

A version of this patch on top of 4.6 ran into trouble with the epilogue on arm,
where a register was clobbered by a stack pop instruction, while that was not
visible in the rtl representation. This instruction was introduced in
arm_output_epilogue by code marked with the comment 'pop call clobbered
registers if it avoids a separate stack adjustment'.
I cannot reproduce that issue on trunk. Looking at the generated rtl, it seems
that the epilogue instructions now list all registers they set, so
collect_fn_hard_reg_usage is able to analyze all clobbered registers.

Bootstrapped and reg-tested on x86_64, including Ada. Built and reg-tested on
mips, arm, ppc and sh. No issues found.

OK for stage1 trunk?

Thanks,
- Tom

2013-01-24  Radovan Obradovic
	    Tom de Vries

	* hooks.c (hook_void_hard_reg_set_containerp): New function.
	* hooks.h (hook_void_hard_reg_set_containerp): Declare.
	* target.def (fn_other_hard_reg_usage): New DEFHOOK.
	* config/arm/arm.c (TARGET_FN_OTHER_HARD_REG_USAGE): Redefine as
	arm_fn_other_hard_reg_usage.
	(arm_fn_other_hard_reg_usage): New function.
	* doc/tm.texi.in (@node Stack and Calling): Add Miscellaneous Register
	Hooks to @menu.
	(@node Miscellaneous Register Hooks): New node.
	(@hook TARGET_FN_OTHER_HARD_REG_USAGE): New hook.
	* doc/tm.texi: Regenerate.
	* reg-notes.def (REG_NOTE (CALL_DECL)): New reg-note REG_CALL_DECL.
	* calls.c (expand_call, emit_library_call_value_1): Add REG_CALL_DECL
	reg-note.
	* combine.c (distribute_notes): Handle REG_CALL_DECL reg-note.
	* emit-rtl.c (try_split): Same.
	* rtlanal.c (find_all_hard_reg_sets): Add bool implicit parameter and
	handle.
	* rtl.h (find_all_hard_reg_sets): Add bool parameter.
	* haifa-sched.c (recompute_todo_spec, check_clobbered_conditions): Add
	new argument to find_all_hard_reg_sets call.
	* cgraph.h (struct cgraph_node): Add function_used_regs,
	function_used_regs_initialized and function_used_regs_valid fields.
	* common.opt (fuse-caller-save): New option.
	* opts.c (default_options_table): Add OPT_LEVELS_2_PLUS entry with
	OPT_fuse_caller_save.
	* final.c: Move include of hard-reg-set.h to before rtl.h to declare
	find_all_hard_reg_sets.
	(collect_fn_hard_reg_usage, get_call_fndecl, get_call_cgraph_node)
	(get_call_reg_set_usage): New functions.
	(rest_of_handle_final): Use collect_fn_hard_reg_usage.
	* regs.h (get_call_reg_set_usage): Declare.
	* df-scan.c (df_get_call_refs): Use get_call_reg_set_usage.
	* caller-save.c (setup_save_areas, save_call_clobbered_regs): Use
	get_call_reg_set_usage.
	* resource.c (mark_set_resources, mark_target_live_regs): Use
	get_call_reg_set_usage.
	* ira-int.h (struct ira_allocno): Add crossed_calls_clobbered_regs field.
	(ALLOCNO_CROSSED_CALLS_CLOBBERED_REGS): Define.
	* ira-lives.c (process_bb_node_lives): Use get_call_reg_set_usage.
	Calculate ALLOCNO_CROSSED_CALLS_CLOBBERED_REGS.
	* ira-build.c (ira_create_allocno): Init
	ALLOCNO_CROSSED_CALLS_CLOBBERED_REGS.
	(create_cap_allocno, propagate_allocno_info)
	(propagate_some_info_from_allocno)
	(copy_info_to_removed_store_destinations): Handle
	ALLOCNO_CROSSED_CALLS_CLOBBERED_REGS.
	* ira-costs.c (ira_tune_allocno_costs): Use
	ALLOCNO_CROSSED_CALLS_CLOBBERED_REGS to adjust costs.
	* doc/invoke.texi (@item Optimization Options): Add -fuse-caller-save to
	gccoptlist.
	(@item -fuse-caller-save): New item.
	* lib/target-supports.exp (check_effective_target_mips16)
	(check_effective_target_micromips): New proc.
	* gcc.target/mips/mips.exp: Add use-caller-save to -ffoo/-fno-foo options.
	Add -save-temps to mips_option_groups.
	* gcc.target/mips/aru-1.c: New test.

Index: gcc/hooks.c
===================================================================
--- gcc/hooks.c	(revision 195240)
+++ gcc/hooks.c	(working copy)
@@ -446,3 +446,11 @@ void
 hook_void_gcc_optionsp (struct gcc_options *opts ATTRIBUTE_UNUSED)
 {
 }
+
+/* Generic hook that takes a struct hard_reg_set_container * and returns
+   void.  */
+
+void
+hook_void_hard_reg_set_containerp (struct hard_reg_set_container *regs ATTRIBUTE_UNUSED)
+{
+}
Index: gcc/hooks.h
===================================================================
--- gcc/hooks.h	(revision 195240)
+++ gcc/hooks.h	(working copy)
@@ -69,6 +69,7 @@ extern void hook_void_tree (tree);
 extern void hook_void_tree_treeptr (tree, tree *);
 extern void hook_void_int_int (int, int);
 extern void hook_void_gcc_optionsp (struct gcc_options *);
+extern void hook_void_hard_reg_set_containerp (struct hard_reg_set_container *);
 
 extern int hook_int_uint_mode_1 (unsigned int, enum machine_mode);
 extern int hook_int_const_tree_0 (const_tree);
Index: gcc/target.def
===================================================================
--- gcc/target.def	(revision 195240)
+++ gcc/target.def	(working copy)
@@ -2859,6 +2859,17 @@ DEFHOOK
  void, (bitmap regs),
  hook_void_bitmap)
 
+/* For targets that need to mark extra registers as clobbered on entry to
+   the function, they should define this target hook and set their
+   bits in the struct hard_reg_set_container passed in.  */
+DEFHOOK
+(fn_other_hard_reg_usage,
+ "Add any hard registers to @var{regs} that are set or clobbered by a call to\
+ the function. This hook only needs to be defined to provide registers that\
+ cannot be found by examination of the final RTL representation of a function.",
+ void, (struct hard_reg_set_container *regs),
+ hook_void_hard_reg_set_containerp)
+
 /* Fill in additional registers set up by prologue into a regset.  */
 DEFHOOK
 (set_up_by_prologue,
Index: gcc/cgraph.h
===================================================================
--- gcc/cgraph.h	(revision 195240)
+++ gcc/cgraph.h	(working copy)
@@ -251,6 +251,15 @@ struct GTY(()) cgraph_node {
   /* Unique id of the node.  */
   int uid;
 
+  /* Call unsaved hard registers really used by the corresponding
+     function (including ones used by functions called by the
+     function).  */
+  HARD_REG_SET function_used_regs;
+  /* Set if function_used_regs is initialized.  */
+  unsigned function_used_regs_initialized: 1;
+  /* Set if function_used_regs is valid.  */
+  unsigned function_used_regs_valid: 1;
+
   /* Set when decl is an abstract function pointed to by the
      ABSTRACT_DECL_ORIGIN of a reachable function.  */
   unsigned abstract_and_needed : 1;
Index: gcc/rtlanal.c
===================================================================
--- gcc/rtlanal.c	(revision 195240)
+++ gcc/rtlanal.c	(working copy)
@@ -1028,13 +1028,13 @@ record_hard_reg_sets (rtx x, const_rtx p
 /* Examine INSN, and compute the set of hard registers written by it.
    Store it in *PSET.  Should only be called after reload.  */
 void
-find_all_hard_reg_sets (const_rtx insn, HARD_REG_SET *pset)
+find_all_hard_reg_sets (const_rtx insn, HARD_REG_SET *pset, bool implicit)
 {
   rtx link;
 
   CLEAR_HARD_REG_SET (*pset);
   note_stores (PATTERN (insn), record_hard_reg_sets, pset);
-  if (CALL_P (insn))
+  if (implicit && CALL_P (insn))
     IOR_HARD_REG_SET (*pset, call_used_reg_set);
   for (link = REG_NOTES (insn); link; link = XEXP (link, 1))
     if (REG_NOTE_KIND (link) == REG_INC)
Index: gcc/final.c
===================================================================
--- gcc/final.c	(revision 195240)
+++ gcc/final.c	(working copy)
@@ -48,6 +48,7 @@ along with GCC; see the file COPYING3.
 #include "tm.h"
 
 #include "tree.h"
+#include "hard-reg-set.h"
 #include "rtl.h"
 #include "tm_p.h"
 #include "regs.h"
@@ -56,7 +57,6 @@ along with GCC; see the file COPYING3.
 #include "recog.h"
 #include "conditions.h"
 #include "flags.h"
-#include "hard-reg-set.h"
 #include "output.h"
 #include "except.h"
 #include "function.h"
@@ -219,6 +219,7 @@ static int alter_cond (rtx);
 static int final_addr_vec_align (rtx);
 #endif
 static int align_fuzz (rtx, rtx, int, unsigned);
+static void collect_fn_hard_reg_usage (void);
 
 /* Initialize data in final at the beginning of a compilation.  */
 
@@ -4277,6 +4278,8 @@ rest_of_handle_final (void)
   rtx x;
   const char *fnname;
 
+  collect_fn_hard_reg_usage ();
+
   /* Get the function's name, as described by its RTL.  This may be
      different from the DECL_NAME name used in the source file.  */
 
@@ -4533,3 +4536,121 @@ struct rtl_opt_pass pass_clean_state =
   0                                     /* todo_flags_finish */
  }
 };
+
+/* Collect hard register usage for the current function.  */
+
+static void
+collect_fn_hard_reg_usage (void)
+{
+  rtx insn;
+  int i;
+  struct cgraph_node *node;
+  struct hard_reg_set_container other_usage;
+
+  if (!flag_use_caller_save)
+    return;
+
+  node = cgraph_get_node (current_function_decl);
+  gcc_assert (node != NULL);
+
+  gcc_assert (!node->function_used_regs_initialized);
+  node->function_used_regs_initialized = 1;
+
+  for (insn = get_insns (); insn != NULL_RTX; insn = next_insn (insn))
+    {
+      HARD_REG_SET insn_used_regs;
+
+      if (!NONDEBUG_INSN_P (insn))
+        continue;
+
+      find_all_hard_reg_sets (insn, &insn_used_regs, false);
+
+      if (CALL_P (insn)
+          && !get_call_reg_set_usage (insn, &insn_used_regs, call_used_reg_set))
+        {
+          CLEAR_HARD_REG_SET (node->function_used_regs);
+          return;
+        }
+
+      IOR_HARD_REG_SET (node->function_used_regs, insn_used_regs);
+    }
+
+  /* Be conservative - mark fixed and global registers as used.  */
+  IOR_HARD_REG_SET (node->function_used_regs, fixed_reg_set);
+  for (i = 0; i < FIRST_PSEUDO_REGISTER; i++)
+    if (global_regs[i])
+      SET_HARD_REG_BIT (node->function_used_regs, i);
+
+#ifdef STACK_REGS
+  /* Handle STACK_REGS conservatively, since the df-framework does not
+     provide accurate information for them.  */
+
+  for (i = FIRST_STACK_REG; i <= LAST_STACK_REG; i++)
+    SET_HARD_REG_BIT (node->function_used_regs, i);
+#endif
+
+  CLEAR_HARD_REG_SET (other_usage.set);
+  targetm.fn_other_hard_reg_usage (&other_usage);
+  IOR_HARD_REG_SET (node->function_used_regs, other_usage.set);
+
+  node->function_used_regs_valid = 1;
+}
+
+/* Get the declaration of the function called by INSN.  */
+
+static tree
+get_call_fndecl (rtx insn)
+{
+  rtx note, datum;
+
+  if (!flag_use_caller_save)
+    return NULL_TREE;
+
+  note = find_reg_note (insn, REG_CALL_DECL, NULL_RTX);
+  if (note == NULL_RTX)
+    return NULL_TREE;
+
+  datum = XEXP (note, 0);
+  if (datum != NULL_RTX)
+    return SYMBOL_REF_DECL (datum);
+
+  return NULL_TREE;
+}
+
+static struct cgraph_node *
+get_call_cgraph_node (rtx insn)
+{
+  tree fndecl;
+
+  if (insn == NULL_RTX)
+    return NULL;
+
+  fndecl = get_call_fndecl (insn);
+  if (fndecl == NULL_TREE
+      || !targetm.binds_local_p (fndecl))
+    return NULL;
+
+  return cgraph_get_node (fndecl);
+}
+
+/* Find hard registers used by function call instruction INSN, and return them
+   in REG_SET.  Return DEFAULT_SET in REG_SET if not found.  */
+
+bool
+get_call_reg_set_usage (rtx insn, HARD_REG_SET *reg_set,
+                        HARD_REG_SET default_set)
+{
+  struct cgraph_node *node = get_call_cgraph_node (insn);
+  if (node != NULL
+      && node->function_used_regs_valid)
+    {
+      COPY_HARD_REG_SET (*reg_set, node->function_used_regs);
+      AND_HARD_REG_SET (*reg_set, default_set);
+      return true;
+    }
+  else
+    {
+      COPY_HARD_REG_SET (*reg_set, default_set);
+      return false;
+    }
+}
Index: gcc/regs.h
===================================================================
--- gcc/regs.h	(revision 195240)
+++ gcc/regs.h	(working copy)
@@ -419,4 +419,8 @@ range_in_hard_reg_set_p (const HARD_REG_
   return true;
 }
 
+/* Get registers used by given function call instruction.  */
+extern bool get_call_reg_set_usage (rtx insn, HARD_REG_SET *reg_set,
+                                    HARD_REG_SET default_set);
+
 #endif /* GCC_REGS_H */
Index: gcc/df-scan.c
===================================================================
--- gcc/df-scan.c	(revision 195240)
+++ gcc/df-scan.c	(working copy)
@@ -3363,10 +3363,13 @@ df_get_call_refs (struct df_collection_r
   bool is_sibling_call;
   unsigned int i;
   HARD_REG_SET defs_generated;
+  HARD_REG_SET fn_reg_set_usage;
 
   CLEAR_HARD_REG_SET (defs_generated);
   df_find_hard_reg_defs (PATTERN (insn_info->insn), &defs_generated);
   is_sibling_call = SIBLING_CALL_P (insn_info->insn);
+  get_call_reg_set_usage (insn_info->insn, &fn_reg_set_usage,
+                          regs_invalidated_by_call);
 
   for (i = 0; i < FIRST_PSEUDO_REGISTER; i++)
     {
@@ -3391,6 +3394,7 @@ df_get_call_refs (struct df_collection_r
             }
         }
       else if (TEST_HARD_REG_BIT (regs_invalidated_by_call, i)
+               && TEST_HARD_REG_BIT (fn_reg_set_usage, i)
                /* no clobbers for regs that are the result of the call */
                && !TEST_HARD_REG_BIT (defs_generated, i)
                && (!is_sibling_call
Index: gcc/haifa-sched.c
===================================================================
--- gcc/haifa-sched.c	(revision 195240)
+++ gcc/haifa-sched.c	(working copy)
@@ -1271,7 +1271,7 @@ recompute_todo_spec (rtx next, bool for_
         {
           HARD_REG_SET t;
 
-          find_all_hard_reg_sets (prev, &t);
+          find_all_hard_reg_sets (prev, &t, true);
           if (TEST_HARD_REG_BIT (t, regno))
             return HARD_DEP;
           if (prev == pro)
@@ -3041,7 +3041,7 @@ check_clobbered_conditions (rtx insn)
   if ((current_sched_info->flags & DO_PREDICATION) == 0)
     return;
 
-  find_all_hard_reg_sets (insn, &t);
+  find_all_hard_reg_sets (insn, &t, true);
 
  restart:
   for (i = 0; i < ready.n_ready; i++)
Index: gcc/caller-save.c
===================================================================
--- gcc/caller-save.c	(revision 195240)
+++ gcc/caller-save.c	(working copy)
@@ -441,7 +441,7 @@ setup_save_areas (void)
           freq = REG_FREQ_FROM_BB (BLOCK_FOR_INSN (insn));
           REG_SET_TO_HARD_REG_SET (hard_regs_to_save,
                                    &chain->live_throughout);
-          COPY_HARD_REG_SET (used_regs, call_used_reg_set);
+          get_call_reg_set_usage (insn, &used_regs, call_used_reg_set);
 
           /* Record all registers set in this call insn.  These don't
              need to be saved.  N.B. the call insn might set a subreg
@@ -525,7 +525,7 @@ setup_save_areas (void)
 
           REG_SET_TO_HARD_REG_SET (hard_regs_to_save,
                                    &chain->live_throughout);
-          COPY_HARD_REG_SET (used_regs, call_used_reg_set);
+          get_call_reg_set_usage (insn, &used_regs, call_used_reg_set);
 
           /* Record all registers set in this call insn.  These don't
              need to be saved.  N.B. the call insn might set a subreg
@@ -804,6 +804,7 @@ save_call_clobbered_regs (void)
             {
               unsigned regno;
               HARD_REG_SET hard_regs_to_save;
+              HARD_REG_SET call_def_reg_set;
               reg_set_iterator rsi;
               rtx cheap;
 
@@ -854,7 +855,9 @@ save_call_clobbered_regs (void)
               AND_COMPL_HARD_REG_SET (hard_regs_to_save, call_fixed_reg_set);
               AND_COMPL_HARD_REG_SET (hard_regs_to_save, this_insn_sets);
               AND_COMPL_HARD_REG_SET (hard_regs_to_save, hard_regs_saved);
-              AND_HARD_REG_SET (hard_regs_to_save, call_used_reg_set);
+              get_call_reg_set_usage (insn, &call_def_reg_set,
+                                      call_used_reg_set);
+              AND_HARD_REG_SET (hard_regs_to_save, call_def_reg_set);
 
               for (regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++)
                 if (TEST_HARD_REG_BIT (hard_regs_to_save, regno))
Index: gcc/ira-int.h
===================================================================
--- gcc/ira-int.h	(revision 195240)
+++ gcc/ira-int.h	(working copy)
@@ -374,6 +374,8 @@ struct ira_allocno
   /* The number of calls across which it is live, but which should not
      affect register preferences.  */
   int cheap_calls_crossed_num;
+  /* Registers clobbered by intersected calls.  */
+  HARD_REG_SET crossed_calls_clobbered_regs;
   /* Array of usage costs (accumulated and the one updated during
      coloring) for each hard register of the allocno class.  The
      member value can be NULL if all costs are the same and equal to
@@ -417,6 +419,8 @@ struct ira_allocno
 #define ALLOCNO_CALL_FREQ(A) ((A)->call_freq)
 #define ALLOCNO_CALLS_CROSSED_NUM(A) ((A)->calls_crossed_num)
 #define ALLOCNO_CHEAP_CALLS_CROSSED_NUM(A) ((A)->cheap_calls_crossed_num)
+#define ALLOCNO_CROSSED_CALLS_CLOBBERED_REGS(A) \
+  ((A)->crossed_calls_clobbered_regs)
 #define ALLOCNO_MEM_OPTIMIZED_DEST(A) ((A)->mem_optimized_dest)
 #define ALLOCNO_MEM_OPTIMIZED_DEST_P(A) ((A)->mem_optimized_dest_p)
 #define ALLOCNO_SOMEWHERE_RENAMED_P(A) ((A)->somewhere_renamed_p)
Index: gcc/opts.c
===================================================================
--- gcc/opts.c	(revision 195240)
+++ gcc/opts.c	(working copy)
@@ -484,6 +484,7 @@ static const struct default_options defa
     { OPT_LEVELS_2_PLUS, OPT_ftree_tail_merge, NULL, 1 },
     { OPT_LEVELS_2_PLUS_SPEED_ONLY, OPT_foptimize_strlen, NULL, 1 },
     { OPT_LEVELS_2_PLUS, OPT_fhoist_adjacent_loads, NULL, 1 },
+    { OPT_LEVELS_2_PLUS, OPT_fuse_caller_save, NULL, 1 },
 
     /* -O3 optimizations.  */
     { OPT_LEVELS_3_PLUS, OPT_ftree_loop_distribute_patterns, NULL, 1 },
Index: gcc/ira-lives.c
===================================================================
--- gcc/ira-lives.c	(revision 195240)
+++ gcc/ira-lives.c	(working copy)
@@ -1273,6 +1273,10 @@ process_bb_node_lives (ira_loop_tree_nod
                   ira_object_t obj = ira_object_id_map[i];
                   ira_allocno_t a = OBJECT_ALLOCNO (obj);
                   int num = ALLOCNO_NUM (a);
+                  HARD_REG_SET this_call_used_reg_set;
+
+                  get_call_reg_set_usage (insn, &this_call_used_reg_set,
+                                          call_used_reg_set);
 
                   /* Don't allocate allocnos that cross setjmps or any
                      call, if this function receives a nonlocal
@@ -1287,9 +1291,9 @@ process_bb_node_lives (ira_loop_tree_nod
                   if (can_throw_internal (insn))
                     {
                       IOR_HARD_REG_SET (OBJECT_CONFLICT_HARD_REGS (obj),
-                                        call_used_reg_set);
+                                        this_call_used_reg_set);
                       IOR_HARD_REG_SET (OBJECT_TOTAL_CONFLICT_HARD_REGS (obj),
-                                        call_used_reg_set);
+                                        this_call_used_reg_set);
                     }
 
                   if (sparseset_bit_p (allocnos_processed, num))
@@ -1306,6 +1310,8 @@ process_bb_node_lives (ira_loop_tree_nod
                   /* Mark it as saved at the next call.  */
                   allocno_saved_at_call[num] = last_call_num + 1;
                   ALLOCNO_CALLS_CROSSED_NUM (a)++;
+                  IOR_HARD_REG_SET (ALLOCNO_CROSSED_CALLS_CLOBBERED_REGS (a),
+                                    this_call_used_reg_set);
                   if (cheap_reg != NULL_RTX
                       && ALLOCNO_REGNO (a) == (int) REGNO (cheap_reg))
                     ALLOCNO_CHEAP_CALLS_CROSSED_NUM (a)++;
Index: gcc/ira-build.c
===================================================================
--- gcc/ira-build.c	(revision 195240)
+++ gcc/ira-build.c	(working copy)
@@ -506,6 +506,7 @@ ira_create_allocno (int regno, bool cap_
   ALLOCNO_CALL_FREQ (a) = 0;
   ALLOCNO_CALLS_CROSSED_NUM (a) = 0;
   ALLOCNO_CHEAP_CALLS_CROSSED_NUM (a) = 0;
+  CLEAR_HARD_REG_SET (ALLOCNO_CROSSED_CALLS_CLOBBERED_REGS (a));
 #ifdef STACK_REGS
   ALLOCNO_NO_STACK_REG_P (a) = false;
   ALLOCNO_TOTAL_NO_STACK_REG_P (a) = false;
@@ -903,6 +904,8 @@ create_cap_allocno (ira_allocno_t a)
   ALLOCNO_CALLS_CROSSED_NUM (cap) = ALLOCNO_CALLS_CROSSED_NUM (a);
   ALLOCNO_CHEAP_CALLS_CROSSED_NUM (cap)
     = ALLOCNO_CHEAP_CALLS_CROSSED_NUM (a);
+  IOR_HARD_REG_SET (ALLOCNO_CROSSED_CALLS_CLOBBERED_REGS (cap),
+                    ALLOCNO_CROSSED_CALLS_CLOBBERED_REGS (a));
   if (internal_flag_ira_verbose > 2 && ira_dump_file != NULL)
     {
       fprintf (ira_dump_file, " Creating cap ");
@@ -1822,6 +1825,8 @@ propagate_allocno_info (void)
             += ALLOCNO_CALLS_CROSSED_NUM (a);
           ALLOCNO_CHEAP_CALLS_CROSSED_NUM (parent_a)
             += ALLOCNO_CHEAP_CALLS_CROSSED_NUM (a);
+          IOR_HARD_REG_SET (ALLOCNO_CROSSED_CALLS_CLOBBERED_REGS (parent_a),
+                            ALLOCNO_CROSSED_CALLS_CLOBBERED_REGS (a));
           ALLOCNO_EXCESS_PRESSURE_POINTS_NUM (parent_a)
             += ALLOCNO_EXCESS_PRESSURE_POINTS_NUM (a);
           aclass = ALLOCNO_CLASS (a);
@@ -2202,6 +2207,9 @@ propagate_some_info_from_allocno (ira_al
   ALLOCNO_CALLS_CROSSED_NUM (a) += ALLOCNO_CALLS_CROSSED_NUM (from_a);
   ALLOCNO_CHEAP_CALLS_CROSSED_NUM (a)
     += ALLOCNO_CHEAP_CALLS_CROSSED_NUM (from_a);
+  IOR_HARD_REG_SET (ALLOCNO_CROSSED_CALLS_CLOBBERED_REGS (a),
+                    ALLOCNO_CROSSED_CALLS_CLOBBERED_REGS (from_a));
+
   ALLOCNO_EXCESS_PRESSURE_POINTS_NUM (a)
     += ALLOCNO_EXCESS_PRESSURE_POINTS_NUM (from_a);
   if (! ALLOCNO_BAD_SPILL_P (from_a))
@@ -2827,6 +2835,8 @@ copy_info_to_removed_store_destinations
         += ALLOCNO_CALLS_CROSSED_NUM (a);
       ALLOCNO_CHEAP_CALLS_CROSSED_NUM (parent_a)
         += ALLOCNO_CHEAP_CALLS_CROSSED_NUM (a);
+      IOR_HARD_REG_SET (ALLOCNO_CROSSED_CALLS_CLOBBERED_REGS (parent_a),
+                        ALLOCNO_CROSSED_CALLS_CLOBBERED_REGS (a));
       ALLOCNO_EXCESS_PRESSURE_POINTS_NUM (parent_a)
         += ALLOCNO_EXCESS_PRESSURE_POINTS_NUM (a);
       merged_p = true;
Index: gcc/calls.c
===================================================================
--- gcc/calls.c	(revision 195240)
+++ gcc/calls.c	(working copy)
@@ -3158,6 +3158,19 @@ expand_call (tree exp, rtx target, int i
                    next_arg_reg, valreg, old_inhibit_defer_pop, call_fusage,
                    flags, args_so_far);
 
+      if (flag_use_caller_save)
+        {
+          rtx last, datum = NULL_RTX;
+          if (fndecl != NULL_TREE)
+            {
+              datum = XEXP (DECL_RTL (fndecl), 0);
+              gcc_assert (datum != NULL_RTX
+                          && GET_CODE (datum) == SYMBOL_REF);
+            }
+          last = last_call_insn ();
+          add_reg_note (last, REG_CALL_DECL, datum);
+        }
+
       /* If the call setup or the call itself overlaps with anything
          of the argument setup we probably clobbered our call address.
          In that case we can't do sibcalls.  */
@@ -4183,6 +4196,14 @@ emit_library_call_value_1 (int retval, r
                valreg,
                old_inhibit_defer_pop + 1, call_fusage, flags, args_so_far);
 
+  if (flag_use_caller_save)
+    {
+      rtx last, datum = orgfun;
+      gcc_assert (GET_CODE (datum) == SYMBOL_REF);
+      last = last_call_insn ();
+      add_reg_note (last, REG_CALL_DECL, datum);
+    }
+
   /* Right-shift returned value if necessary.  */
   if (!pcc_struct_value
       && TYPE_MODE (tfom) != BLKmode
Index: gcc/emit-rtl.c
===================================================================
--- gcc/emit-rtl.c	(revision 195240)
+++ gcc/emit-rtl.c	(working copy)
@@ -3517,6 +3517,7 @@ try_split (rtx pat, rtx trial, int last)
   int probability;
   rtx insn_last, insn;
   int njumps = 0;
+  rtx call_insn = NULL_RTX;
 
   /* We're not good at redistributing frame information.  */
   if (RTX_FRAME_RELATED_P (trial))
@@ -3589,6 +3590,9 @@ try_split (rtx pat, rtx trial, int last)
         {
           rtx next, *p;
 
+          gcc_assert (call_insn == NULL_RTX);
+          call_insn = insn;
+
           /* Add the old CALL_INSN_FUNCTION_USAGE to whatever the
              target may have explicitly specified.  */
           p = &CALL_INSN_FUNCTION_USAGE (insn);
@@ -3660,6 +3664,11 @@ try_split (rtx pat, rtx trial, int last)
           fixup_args_size_notes (NULL_RTX, insn_last, INTVAL (XEXP (note, 0)));
           break;
 
+        case REG_CALL_DECL:
+          gcc_assert (call_insn != NULL_RTX);
+          add_reg_note (call_insn, REG_NOTE_KIND (note), XEXP (note, 0));
+          break;
+
         default:
           break;
         }
Index: gcc/common.opt
===================================================================
--- gcc/common.opt	(revision 195240)
+++ gcc/common.opt	(working copy)
@@ -2540,4 +2540,8 @@ Create a position independent executable
 z
 Driver Joined Separate
 
+fuse-caller-save
+Common Report Var(flag_use_caller_save) Optimization
+Use caller save register across calls if possible
+
 ; This comment is to ensure we retain the blank line above.
Index: gcc/ira-costs.c
===================================================================
--- gcc/ira-costs.c	(revision 195240)
+++ gcc/ira-costs.c	(working copy)
@@ -2082,6 +2082,7 @@ ira_tune_allocno_costs (void)
   ira_allocno_object_iterator oi;
   ira_object_t obj;
   bool skip_p;
+  HARD_REG_SET *crossed_calls_clobber_regs;
 
   FOR_EACH_ALLOCNO (a, ai)
     {
@@ -2116,17 +2117,24 @@ ira_tune_allocno_costs (void)
                 continue;
               rclass = REGNO_REG_CLASS (regno);
               cost = 0;
-              if (ira_hard_reg_set_intersection_p (regno, mode, call_used_reg_set)
-                  || HARD_REGNO_CALL_PART_CLOBBERED (regno, mode))
-                cost += (ALLOCNO_CALL_FREQ (a)
-                         * (ira_memory_move_cost[mode][rclass][0]
-                            + ira_memory_move_cost[mode][rclass][1]));
+              crossed_calls_clobber_regs
+                = &(ALLOCNO_CROSSED_CALLS_CLOBBERED_REGS (a));
+              if (ira_hard_reg_set_intersection_p (regno, mode,
+                                                   *crossed_calls_clobber_regs))
+                {
+                  if (ira_hard_reg_set_intersection_p (regno, mode,
+                                                       call_used_reg_set)
+                      || HARD_REGNO_CALL_PART_CLOBBERED (regno, mode))
+                    cost += (ALLOCNO_CALL_FREQ (a)
+                             * (ira_memory_move_cost[mode][rclass][0]
+                                + ira_memory_move_cost[mode][rclass][1]));
 #ifdef IRA_HARD_REGNO_ADD_COST_MULTIPLIER
-              cost += ((ira_memory_move_cost[mode][rclass][0]
-                        + ira_memory_move_cost[mode][rclass][1])
-                       * ALLOCNO_FREQ (a)
-                       * IRA_HARD_REGNO_ADD_COST_MULTIPLIER (regno) / 2);
+                  cost += ((ira_memory_move_cost[mode][rclass][0]
+                            + ira_memory_move_cost[mode][rclass][1])
+                           * ALLOCNO_FREQ (a)
+                           * IRA_HARD_REGNO_ADD_COST_MULTIPLIER (regno) / 2);
 #endif
+                }
               if (INT_MAX - cost < reg_costs[j])
                 reg_costs[j] = INT_MAX;
               else
Index: gcc/rtl.h
===================================================================
--- gcc/rtl.h	(revision 195240)
+++ gcc/rtl.h	(working copy)
@@ -2039,7 +2039,7 @@ extern const_rtx set_of (const_rtx, cons
 extern void record_hard_reg_sets (rtx, const_rtx, void *);
 extern void record_hard_reg_uses (rtx *, void *);
 #ifdef HARD_CONST
-extern void find_all_hard_reg_sets (const_rtx, HARD_REG_SET *);
+extern void find_all_hard_reg_sets (const_rtx, HARD_REG_SET *, bool);
 #endif
 extern void note_stores (const_rtx, void (*) (rtx, const_rtx, void *), void *);
 extern void note_uses (rtx *, void (*) (rtx *, void *), void *);
Index: gcc/combine.c
===================================================================
--- gcc/combine.c	(revision 195240)
+++ gcc/combine.c	(working copy)
@@ -13188,6 +13188,7 @@ distribute_notes (rtx notes, rtx from_in
         case REG_NORETURN:
         case REG_SETJMP:
         case REG_TM:
+        case REG_CALL_DECL:
           /* These notes must remain with the call.  It should not be
              possible for both I2 and I3 to be a call.  */
           if (CALL_P (i3))
Index: gcc/resource.c
===================================================================
--- gcc/resource.c	(revision 195240)
+++ gcc/resource.c	(working copy)
@@ -649,10 +649,12 @@ mark_set_resources (rtx x, struct resour
       if (mark_type == MARK_SRC_DEST_CALL)
         {
           rtx link;
+          HARD_REG_SET regs;
 
           res->cc = res->memory = 1;
 
-          IOR_HARD_REG_SET (res->regs, regs_invalidated_by_call);
+          get_call_reg_set_usage (x, &regs, regs_invalidated_by_call);
+          IOR_HARD_REG_SET (res->regs, regs);
 
           for (link = CALL_INSN_FUNCTION_USAGE (x);
                link; link = XEXP (link, 1))
@@ -998,11 +1000,15 @@ mark_target_live_regs (rtx insns, rtx ta
 
           if (CALL_P (real_insn))
             {
+              HARD_REG_SET regs_invalidated_by_this_call;
               /* CALL clobbers all call-used regs that aren't fixed except
                  sp, ap, and fp.  Do this before setting the result of the
                  call live.  */
-              AND_COMPL_HARD_REG_SET (current_live_regs,
+              get_call_reg_set_usage (real_insn,
+                                      &regs_invalidated_by_this_call,
                                       regs_invalidated_by_call);
+              AND_COMPL_HARD_REG_SET (current_live_regs,
+                                      regs_invalidated_by_this_call);
 
               /* A CALL_INSN sets any global register live, since it may
                  have been modified by the call.  */
Index: gcc/reg-notes.def
===================================================================
--- gcc/reg-notes.def	(revision 195240)
+++ gcc/reg-notes.def	(working copy)
@@ -216,3 +216,8 @@ REG_NOTE (ARGS_SIZE)
    that the return value of a call can be used to reinitialize a
    pseudo reg.  */
 REG_NOTE (RETURNED)
+
+/* Used to mark a call with the function decl called by the call.
+   The decl might not be available in the call due to splitting of the call
+   insn.  This note is a SYMBOL_REF.  */
+REG_NOTE (CALL_DECL)
Index: gcc/doc/tm.texi
===================================================================
--- gcc/doc/tm.texi	(revision 195418)
+++ gcc/doc/tm.texi	(working copy)
@@ -3074,6 +3074,7 @@ This describes the stack layout and call
 * Profiling::
 * Tail Calls::
 * Stack Smashing Protection::
+* Miscellaneous Register Hooks::
 @end menu
 
 @node Frame Layout
@@ -4999,6 +5000,14 @@ normally defined in @file{libgcc2.c}.
 Whether this target supports splitting the stack when the options described in @var{opts} have been passed. This is called after options have been parsed, so the target may reject splitting the stack in some configurations. The default version of this hook returns false. If @var{report} is true, this function may issue a warning or error; if @var{report} is false, it must simply return a value
 @end deftypefn
 
+@node Miscellaneous Register Hooks
+@subsection Miscellaneous register hooks
+@cindex miscellaneous register hooks
+
+@deftypefn {Target Hook} void TARGET_FN_OTHER_HARD_REG_USAGE (struct hard_reg_set_container *@var{regs})
+Add any hard registers to @var{regs} that are set or clobbered by a call to the function. This hook only needs to be defined to provide registers that cannot be found by examination of the final RTL representation of a function.
+@end deftypefn
+
 @node Varargs
 @section Implementing the Varargs Macros
 @cindex varargs implementation
Index: gcc/doc/tm.texi.in
===================================================================
--- gcc/doc/tm.texi.in	(revision 195418)
+++ gcc/doc/tm.texi.in	(working copy)
@@ -3042,6 +3042,7 @@ This describes the stack layout and call
 * Profiling::
 * Tail Calls::
 * Stack Smashing Protection::
+* Miscellaneous Register Hooks::
 @end menu
 
 @node Frame Layout
@@ -4922,6 +4923,12 @@ normally defined in @file{libgcc2.c}.
 
 @hook TARGET_SUPPORTS_SPLIT_STACK
 
+@node Miscellaneous Register Hooks
+@subsection Miscellaneous register hooks
+@cindex miscellaneous register hooks
+
+@hook TARGET_FN_OTHER_HARD_REG_USAGE
+
 @node Varargs
 @section Implementing the Varargs Macros
 @cindex varargs implementation
Index: gcc/doc/invoke.texi
===================================================================
--- gcc/doc/invoke.texi	(revision 195418)
+++ gcc/doc/invoke.texi	(working copy)
@@ -419,8 +419,8 @@ Objective-C and Objective-C++ Dialects}.
 -ftree-ter -ftree-vect-loop-version -ftree-vectorize -ftree-vrp @gol
 -funit-at-a-time -funroll-all-loops -funroll-loops @gol
 -funsafe-loop-optimizations -funsafe-math-optimizations -funswitch-loops @gol
--fvariable-expansion-in-unroller -fvect-cost-model -fvpt -fweb @gol
--fwhole-program -fwpa -fuse-ld=@var{linker} -fuse-linker-plugin @gol
+-fuse-caller-save -fvariable-expansion-in-unroller -fvect-cost-model -fvpt @gol
+-fweb -fwhole-program -fwpa -fuse-ld=@var{linker} -fuse-linker-plugin @gol
 --param @var{name}=@var{value}
 -O -O0 -O1 -O2 -O3 -Os -Ofast -Og}
 
@@ -7355,6 +7355,14 @@ and then tries to find ways to combine t
 
 Enabled by default at @option{-O1} and higher.
 
+@item -fuse-caller-save
+Use caller save registers for allocation if those registers are not used by
+any called function. In that case it is not necessary to save and restore
+them around calls. This is only possible if called functions are part of
+same compilation unit as current function and they are compiled before it.
+
+Enabled at levels @option{-O2}, @option{-O3}, @option{-Os}.
+
 @item -fconserve-stack
 @opindex fconserve-stack
 Attempt to minimize stack usage. The compiler attempts to use less
Index: gcc/config/arm/arm.c
===================================================================
--- gcc/config/arm/arm.c	(revision 195240)
+++ gcc/config/arm/arm.c	(working copy)
@@ -270,6 +270,7 @@ static bool arm_vectorize_vec_perm_const
                                              const unsigned char *sel);
 static void arm_canonicalize_comparison (int *code, rtx *op0, rtx *op1,
                                          bool op0_preserve_value);
+static void arm_fn_other_hard_reg_usage (struct hard_reg_set_container *);
 
 /* Table of machine attributes.  */
 static const struct attribute_spec arm_attribute_table[] =
@@ -633,6 +634,10 @@ static const struct attribute_spec arm_a
 #define TARGET_CANONICALIZE_COMPARISON \
   arm_canonicalize_comparison
 
+#undef TARGET_FN_OTHER_HARD_REG_USAGE
+#define TARGET_FN_OTHER_HARD_REG_USAGE \
+  arm_fn_other_hard_reg_usage
+
 struct gcc_target targetm = TARGET_INITIALIZER;
 
 /* Obstack for minipool constant handling.  */
@@ -3695,6 +3700,19 @@ arm_canonicalize_comparison (int *code,
     }
 }
 
+/* Implement TARGET_FN_OTHER_HARD_REG_USAGE.  */
+
+static void
+arm_fn_other_hard_reg_usage (struct hard_reg_set_container *regs)
+{
+  if (TARGET_AAPCS_BASED)
+    {
+      /* For AAPCS, IP and CC can be clobbered by veneers inserted by the
+         linker.  */
+      SET_HARD_REG_BIT (regs->set, IP_REGNUM);
+      SET_HARD_REG_BIT (regs->set, CC_REGNUM);
+    }
+}
 
 /* Define how to find the value returned by a function.
*/ Index: gcc/testsuite/lib/target-supports.exp =================================================================== --- gcc/testsuite/lib/target-supports.exp (revision 195240) +++ gcc/testsuite/lib/target-supports.exp (working copy) @@ -897,6 +897,26 @@ proc check_effective_target_mips16_attri } [add_options_for_mips16_attribute ""]] } +# Return 1 if the target generates mips16 code by default. + +proc check_effective_target_mips16 { } { + return [check_no_compiler_messages mips16 assembly { + #if !(defined __mips16) + #error FOO + #endif + } ""] +} + +# Return 1 if the target generates micromips code by default. + +proc check_effective_target_micromips { } { + return [check_no_compiler_messages micromips assembly { + #if !(defined __mips_micromips) + #error FOO + #endif + } ""] +} + # Return 1 if the target supports long double larger than double when # using the new ABI, 0 otherwise. Index: gcc/testsuite/gcc.target/mips/mips.exp =================================================================== --- gcc/testsuite/gcc.target/mips/mips.exp (revision 195240) +++ gcc/testsuite/gcc.target/mips/mips.exp (working copy) @@ -245,6 +245,7 @@ set mips_option_groups { small-data "-G[0-9]+" warnings "-w" dump "-fdump-.*" + save_temps "-save-temps" } # Add -mfoo/-mno-foo options to mips_option_groups. @@ -301,6 +302,7 @@ foreach option { tree-vectorize unroll-all-loops unroll-loops + use-caller-save } { lappend mips_option_groups $option "-f(no-|)$option" } Index: gcc/testsuite/gcc.target/mips/aru-1.c =================================================================== --- /dev/null (new file) +++ gcc/testsuite/gcc.target/mips/aru-1.c (revision 0) @@ -0,0 +1,38 @@ +/* { dg-do run } */ +/* { dg-options "-fuse-caller-save -save-temps" } */ +/* { dg-skip-if "" { *-*-* } { "*" } { "-Os" } } */ +/* Testing -fuse-caller-save optimization option. 
*/ + +static int __attribute__((noinline)) +bar (int x) +{ + return x + 3; +} + +int __attribute__((noinline)) +foo (int y) +{ + return y + bar (y); +} + +int +main (void) +{ + return !(foo (5) == 13); +} + +/* Check that there are only 2 stack-saves: r31 in main and foo. */ + +/* Variant not mips16. Check that there only 2 sw/sd. */ +/* { dg-final { scan-assembler-times "(?n)s\[wd\]\t\\\$.*,.*\\(\\\$sp\\)" 2 { target { ! mips16 } } } } */ + +/* Variant not mips16, Subvariant micromips. Additionally check there's no + swm. */ +/* { dg-final { scan-assembler-times "(?n)swm\t\\\$.*,.*\\(\\\$sp\\)" 0 {target micromips } } } */ + +/* Variant mips16. The save can save 1 or more registers, check that only 1 is + saved, twice in total. */ +/* { dg-final { scan-assembler-times "(?n)save\t\[0-9\]*,\\\$\[^,\]*\$" 2 { target mips16 } } } */ + +/* Check that the first caller-save register is unused. */ +/* { dg-final { scan-assembler-not "(\\\$16)" } } */