From patchwork Wed Dec 1 16:47:52 2010 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yao Qi X-Patchwork-Id: 73846 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Received: from sourceware.org (server1.sourceware.org [209.132.180.131]) by ozlabs.org (Postfix) with SMTP id 1DD6BB70AF for ; Thu, 2 Dec 2010 03:48:20 +1100 (EST) Received: (qmail 21193 invoked by alias); 1 Dec 2010 16:48:15 -0000 Received: (qmail 21164 invoked by uid 22791); 1 Dec 2010 16:48:11 -0000 X-SWARE-Spam-Status: No, hits=-1.2 required=5.0 tests=AWL, BAYES_05, T_RP_MATCHES_RCVD X-Spam-Check-By: sourceware.org Received: from mail.codesourcery.com (HELO mail.codesourcery.com) (38.113.113.100) by sourceware.org (qpsmtpd/0.43rc1) with ESMTP; Wed, 01 Dec 2010 16:48:03 +0000 Received: (qmail 19394 invoked from network); 1 Dec 2010 16:47:59 -0000 Received: from unknown (HELO ?192.168.0.102?) (yao@127.0.0.2) by mail.codesourcery.com with ESMTPA; 1 Dec 2010 16:47:59 -0000 Message-ID: <4CF67C38.4020304@codesourcery.com> Date: Thu, 02 Dec 2010 00:47:52 +0800 From: Yao Qi User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.1.15) Gecko/20101027 Thunderbird/3.0.10 MIME-Version: 1.0 To: Eric Botcazou CC: gcc-patches@gcc.gnu.org Subject: Re: [Ping] [patch 0/3] New macro PREFERRED_RENAME_CLASS References: <4CD24091.1020904@codesourcery.com> <201011091016.07504.ebotcazou@adacore.com> <4CD9688C.8080101@codesourcery.com> <201011111834.02514.ebotcazou@adacore.com> In-Reply-To: <201011111834.02514.ebotcazou@adacore.com> X-IsSubscribed: yes Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Unsubscribe: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Delivered-To: mailing list gcc-patches@gcc.gnu.org On 11/12/2010 01:34 AM, Eric Botcazou wrote: > > It's a progress, but the first approach of using register classes instead of > individual registers seemed to be the right granularity here. > > For each element in the chain, we have the class. Can we compute a superset > of all the classes in the chain? If we cannot or if this is ALL_REGS, then > we iterate as before. If we can, we call the hook on the result and we first > iterate on the contents of this result (a class); if we find a new reg, we > stop, otherwise we iterate on the remaining registers. > In this patch, most of part is same as the previous one, except three changes, 1. compute superunion set of all the classes in the chain, reg_class_superunion_in_chain = reg_class_superunion[(int)reg_class_superunion_in_chain][(int)tmp->cl]; 2. compute the preferred class of reg_class_superunion_in_chain by hook, preferred_class = targetm.preferred_rename_class(reg_class_superunion_in_chain); 3. Iterate registers first in preferred_class, and stop once better register is found. Then, iterate the rest of registers as usual. Tested on mainline r167293 x86_64-linux with option {, -O2 -frename-registers} respectively. No regression. Macro PREFERRED_RENAME_CLASS is defined in backend. For example, on ARM, it can be, #define PREFERRED_RENAME_CLASS(CLASS) \ (TARGET_THUMB2 ? ((CLASS) == GENERAL_REGS \ ? LO_REGS : (NO_REGS)) \ : (NO_REGS)) Given this definition of macro, code size of bash-3.2 is reduced by 0.2% roughly, with option "-mthumb -march=armv7 -O2 -frename-registers". The result and code is posted here to prove this middle-end patch is useful. I'll submit another patch for ARM part changes, once this middle-end part is approved. OK for mainline? gcc/ * regrename.c (struct du_head): Add new element length. (sort_du_head, get_element, merge, merge_sort_comparison): New functions of merge sort implementation to du_head list. (regrename_optimize): Sort du_head linked list by length. Move some code to ... (check_new_reg_p): here. New function. (create_new_chain): Initialize length. (scan_rtx_reg): Increase length for non-debug insns. * target.def: New hook preferred_rename_class. * targhook.c (default_preferred_rename_class): New. * targhook.h: Declare it. * doc/tm.texi.in: New hook TARGET_PREFERRED_RENAME_CLASS. * doc/tm.texi: Regenerate. diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi index 5446501..7105c72 100644 --- a/gcc/doc/tm.texi +++ b/gcc/doc/tm.texi @@ -2506,6 +2506,10 @@ looking for one that is valid, and will reload one or both registers only if neither labeling works. @end defmac +@deftypefn {Target Hook} reg_class_t TARGET_PREFERRED_RENAME_CLASS (reg_class_t @var{rclass}) +A target hook that places additional preference on the register class to use when it is necessary to rename a register in class @var{class} to another class, or perhaps @var{NO_REGS}, if no prefered register class is found or macro @code{PREFERRED_RENAME_CLASS} is not define. Sometimes returning a more restrictive class makes better code. For example, on ARM, thumb-2 instructions using @code{LO_REGS} may be smaller than instructions using @code{GENERIC_REGS}. By returning @code{LO_REGS} from @code{PREFERRED_RENAME_CLASS}, code size can be reduced. +@end deftypefn + @deftypefn {Target Hook} reg_class_t TARGET_PREFERRED_RELOAD_CLASS (rtx @var{x}, reg_class_t @var{rclass}) A target hook that places additional restrictions on the register class to use when it is necessary to copy value @var{x} into a register in class diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in index 4b21c92..e9bf4a8 100644 --- a/gcc/doc/tm.texi.in +++ b/gcc/doc/tm.texi.in @@ -2496,6 +2496,8 @@ looking for one that is valid, and will reload one or both registers only if neither labeling works. @end defmac +@hook TARGET_PREFERRED_RENAME_CLASS + @hook TARGET_PREFERRED_RELOAD_CLASS A target hook that places additional restrictions on the register class to use when it is necessary to copy value @var{x} into a register in class diff --git a/gcc/regrename.c b/gcc/regrename.c index 2535ab7..f085e78 100644 --- a/gcc/regrename.c +++ b/gcc/regrename.c @@ -38,6 +38,7 @@ #include "timevar.h" #include "tree-pass.h" #include "df.h" +#include "target.h" #if HOST_BITS_PER_WIDE_INT <= MAX_RECOG_OPERANDS #error "Use a different bitmap implementation for untracked_operands." @@ -51,6 +52,11 @@ struct du_head struct du_head *next_chain; /* The first and last elements of this chain. */ struct du_chain *first, *last; + /* The number of elements of this chain, excluding those corresponding + to references of the register in debug insns. The du_head linked + list can be sorted by this, and register-rename can prefer + register classes according to this order. */ + int length; /* Describes the register being tracked. */ unsigned regno, nregs; @@ -154,6 +160,199 @@ merge_overlapping_regs (HARD_REG_SET *pset, struct du_head *head) } } +/* Return the Nth element in LIST. If LIST contains less than N + elements, return the last one. */ +static struct du_head * +get_element (struct du_head *list, int n) +{ + while (n-- && list->next_chain != NULL) + list = list->next_chain; + + return list; +} + +/* Comparison function of merge sort. Return true if A is less than + B, otherwise return false. */ +static inline int +merge_sort_comparison(const struct du_head *a, + const struct du_head *b) +{ + return a->length < b->length; +} + +/* Merge the first 2 sub-lists of LENGTH nodes contained in the + linked list pointed to by START_NODE. Update START_NODE to point + to the merged nodes, and return a pointer to the last merged + node. Return NULL if START_NODE doesn't contain enough + elements, or this pass of merge is done. */ + +static struct du_head * +merge(struct du_head **start_node, int length) +{ + int i, left_count, right_count; + struct du_head *left, *right; + /* Current node of sort result. */ + struct du_head *current_sorted_node; + /* Tail node of sort, used to connect with next piece of list. */ + struct du_head *current_tail_node; + + if (*start_node == NULL) + return NULL; + + left = right = *start_node; + right_count = left_count = 0; + + /* Step RIGHT along the list by LENGTH places. */ + for (i = 0; i < length; i++) + { + right = right->next_chain; + if (right == NULL) + { + return NULL; + } + } + + /* Initialize current_sorted_node. */ + if (merge_sort_comparison (left, right)) + { + ++right_count; + current_sorted_node = right; + *start_node = right; + right = right->next_chain; + } + else + { + ++left_count; + current_sorted_node = left; + left = left->next_chain; + } + + while (1) + { + /* Choose LEFT or RIGHT to take the next element from. If + either is empty, choose from the other one. */ + if (left_count == length || left == NULL) + { + current_sorted_node->next_chain = right; + current_tail_node = get_element (current_sorted_node, + length - right_count); + + break; + } + else if (right_count == length || right == NULL) + { + /* Save the head node of next piece of linked list. */ + struct du_head *tmp = current_sorted_node->next_chain; + + current_sorted_node->next_chain = left; + current_tail_node + = get_element (current_sorted_node, + length - left_count); + /* Connect sorted list to next piece of list. */ + current_tail_node->next_chain = tmp; + break; + } + else + { + /* Normal merge operations. If both LEFT and RIGHT are + non-empty, compare the first element of each and choose + the lower one. */ + if (merge_sort_comparison (left, right)) + { + right_count++; + current_sorted_node->next_chain = right; + right = right->next_chain; + } + else + { + left_count++; + current_sorted_node->next_chain = left; + left = left->next_chain; + } + current_sorted_node = current_sorted_node->next_chain; + } + } + /* Return NULL if this pass of merge is done. */ + return (current_tail_node->next_chain ? current_tail_node : NULL); +} + +/* Sort the linked list pointed to by HEAD. The algorithm is a + non-recursive merge sort to linked list. */ + +static void +sort_du_head (struct du_head **head) +{ + int current_length = 1; + struct du_head *last_tail; + + /* In each pass, lists of size current_length is merged to + lists of size 2xcurrent_length (Initially current_length + is 1). */ + while (1) + { + last_tail = merge(head, current_length); + if (last_tail != NULL) + { + do + last_tail = merge (&last_tail->next_chain, current_length); + while (last_tail != NULL); + + current_length *= 2; + } + else + break; + } +} + +/* Check if NEW_REG can be the candidate register to rename for + REG in THIS_HEAD chain. THIS_UNAVAILABLE is a set of unavailable hard + registers. */ + +static bool +check_new_reg_p (int reg, int new_reg, struct du_head *this_head, + HARD_REG_SET this_unavailable) +{ + enum machine_mode mode = GET_MODE (*this_head->first->loc); + int nregs = hard_regno_nregs[new_reg][mode]; + int i; + struct du_chain *tmp; + + for (i = nregs - 1; i >= 0; --i) + if (TEST_HARD_REG_BIT (this_unavailable, new_reg + i) + || fixed_regs[new_reg + i] + || global_regs[new_reg + i] + /* Can't use regs which aren't saved by the prologue. */ + || (! df_regs_ever_live_p (new_reg + i) + && ! call_used_regs[new_reg + i]) +#ifdef LEAF_REGISTERS + /* We can't use a non-leaf register if we're in a + leaf function. */ + || (current_function_is_leaf + && !LEAF_REGISTERS[new_reg + i]) +#endif +#ifdef HARD_REGNO_RENAME_OK + || ! HARD_REGNO_RENAME_OK (reg + i, new_reg + i) +#endif + ) + break; + if (i >= 0) + return false; + + /* See whether it accepts all modes that occur in + definition and uses. */ + for (tmp = this_head->first; tmp; tmp = tmp->next_use) + if ((! HARD_REGNO_MODE_OK (new_reg, GET_MODE (*tmp->loc)) + && ! DEBUG_INSN_P (tmp->insn)) + || (this_head->need_caller_save_reg + && ! (HARD_REGNO_CALL_PART_CLOBBERED + (reg, GET_MODE (*tmp->loc))) + && (HARD_REGNO_CALL_PART_CLOBBERED + (new_reg, GET_MODE (*tmp->loc))))) + break; + + return true; +} + /* Perform register renaming on the current function. */ static unsigned int @@ -195,6 +394,8 @@ regrename_optimize (void) if (dump_file) dump_def_use_chain (all_chains); + sort_du_head (&all_chains); + CLEAR_HARD_REG_SET (unavailable); /* Don't clobber traceback for noreturn functions. */ if (frame_pointer_needed) @@ -214,6 +415,8 @@ regrename_optimize (void) HARD_REG_SET this_unavailable; int reg = this_head->regno; int i; + enum reg_class reg_class_superunion_in_chain = NO_REGS; + enum reg_class preferred_class; all_chains = this_head->next_chain; @@ -243,16 +446,23 @@ regrename_optimize (void) COPY_HARD_REG_SET (this_unavailable, unavailable); - /* Count number of uses, and narrow the set of registers we can - use for renaming. */ + /* Iterate elements in chain in order to: + 1. Count number of uses, and narrow the set of registers we can + use for renaming. + 2. Compute the superunion of register classes in this chain. */ n_uses = 0; + reg_class_superunion_in_chain = NO_REGS; for (tmp = this_head->first; tmp; tmp = tmp->next_use) { if (DEBUG_INSN_P (tmp->insn)) continue; n_uses++; + IOR_COMPL_HARD_REG_SET (this_unavailable, reg_class_contents[tmp->cl]); + + reg_class_superunion_in_chain + = reg_class_superunion[(int)reg_class_superunion_in_chain][(int)tmp->cl]; } if (n_uses < 2) @@ -262,56 +472,55 @@ regrename_optimize (void) IOR_HARD_REG_SET (this_unavailable, call_used_reg_set); merge_overlapping_regs (&this_unavailable, this_head); - + /* Compute preferred rename class of super union of all the classes + on the chain. */ + preferred_class + = targetm.preferred_rename_class(reg_class_superunion_in_chain); + + /* The register iteration order here is "preferred-register-first". + Firstly(i == 0), we iterate registers belong to class + PREFERRED_CLASS, if we find a new register, we stop immeidately. + Otherwise, we iterate once more (i == 1) registers, which + doesn't belong class PREFERRED_CLASS. + If preferred class is not found by target hook, PREFERRED_CLASS + is NO_REGS, and registers are iterated in an ascending order + without any preference. */ + for (i = 0; i <2; i++) + { + bool prefer[] = {true, false}; + bool found = false; /* Now potential_regs is a reasonable approximation, let's have a closer look at each register still in there. */ for (new_reg = 0; new_reg < FIRST_PSEUDO_REGISTER; new_reg++) { - enum machine_mode mode = GET_MODE (*this_head->first->loc); - int nregs = hard_regno_nregs[new_reg][mode]; - - for (i = nregs - 1; i >= 0; --i) - if (TEST_HARD_REG_BIT (this_unavailable, new_reg + i) - || fixed_regs[new_reg + i] - || global_regs[new_reg + i] - /* Can't use regs which aren't saved by the prologue. */ - || (! df_regs_ever_live_p (new_reg + i) - && ! call_used_regs[new_reg + i]) -#ifdef LEAF_REGISTERS - /* We can't use a non-leaf register if we're in a - leaf function. */ - || (current_function_is_leaf - && !LEAF_REGISTERS[new_reg + i]) -#endif -#ifdef HARD_REGNO_RENAME_OK - || ! HARD_REGNO_RENAME_OK (reg + i, new_reg + i) -#endif - ) - break; - if (i >= 0) + /* Iterate registers first in prefered class. */ + if (prefer[i] + != TEST_HARD_REG_BIT (reg_class_contents[preferred_class], + new_reg)) continue; - /* See whether it accepts all modes that occur in - definition and uses. */ - for (tmp = this_head->first; tmp; tmp = tmp->next_use) - if ((! HARD_REGNO_MODE_OK (new_reg, GET_MODE (*tmp->loc)) - && ! DEBUG_INSN_P (tmp->insn)) - || (this_head->need_caller_save_reg - && ! (HARD_REGNO_CALL_PART_CLOBBERED - (reg, GET_MODE (*tmp->loc))) - && (HARD_REGNO_CALL_PART_CLOBBERED - (new_reg, GET_MODE (*tmp->loc))))) - break; - if (! tmp) + if (check_new_reg_p (reg, new_reg, this_head, + this_unavailable)) { if (tick[best_new_reg] > tick[new_reg]) { + enum machine_mode mode + = GET_MODE (*this_head->first->loc); best_new_reg = new_reg; - best_nregs = nregs; + best_nregs = hard_regno_nregs[new_reg][mode]; + /* If we find a new reg in our preferred class, + stop immediately. */ + if (best_new_reg != reg && prefer[i]) + { + found = true; + break; } } } - + } + if (found) + break; + } if (dump_file) { fprintf (dump_file, "Register %s in insn %d", @@ -527,6 +736,7 @@ create_new_chain (unsigned this_regno, unsigned this_nregs, rtx *loc, head->need_caller_save_reg = 0; head->cannot_rename = 0; head->terminated = 0; + head->length = 0; VEC_safe_push (du_head_p, heap, id_to_chain, head); head->id = current_id++; @@ -572,6 +782,8 @@ create_new_chain (unsigned this_regno, unsigned this_nregs, rtx *loc, this_du->loc = loc; this_du->insn = insn; this_du->cl = cl; + + head->length = 1; } static void @@ -661,6 +873,8 @@ scan_rtx_reg (rtx insn, rtx *loc, enum reg_class cl, enum scan_actions action, head->last->next_use = this_du; head->last = this_du; + if (!DEBUG_INSN_P (insn)) + head->length++; } /* Avoid adding the same location in a DEBUG_INSN multiple times, which could happen with non-exact overlap. */ diff --git a/gcc/target.def b/gcc/target.def index 66006ae..0f9a9f9 100644 --- a/gcc/target.def +++ b/gcc/target.def @@ -2204,6 +2204,21 @@ DEFHOOK bool, (reg_class_t rclass), default_class_likely_spilled_p) +DEFHOOK +(preferred_rename_class, + "A target hook that places additional preference on the register\ + class to use when it is necessary to rename a register in class\ + @var{class} to another class, or perhaps @var{NO_REGS}, if no\ + prefered register class is found or macro @code{PREFERRED_RENAME_CLASS}\ + is not define.\ + Sometimes returning a more restrictive class makes better code. For\ + example, on ARM, thumb-2 instructions using @code{LO_REGS} may be\ + smaller than instructions using @code{GENERIC_REGS}. By returning\ + @code{LO_REGS} from @code{PREFERRED_RENAME_CLASS}, code size can\ + be reduced.", + reg_class_t, (reg_class_t rclass), + default_preferred_rename_class) + /* This target hook allows the backend to perform additional processing while initializing for variable expansion. */ DEFHOOK diff --git a/gcc/targhooks.c b/gcc/targhooks.c index 3647436..19fc29a 100644 --- a/gcc/targhooks.c +++ b/gcc/targhooks.c @@ -1270,6 +1270,16 @@ default_preferred_output_reload_class (rtx x ATTRIBUTE_UNUSED, #endif } +reg_class_t +default_preferred_rename_class (reg_class_t rclass) +{ +#ifdef PREFERRED_RENAME_CLASS + return PREFERRED_RENAME_CLASS ((enum reg_class)rclass); +#else + return NO_REGS; +#endif +} + /* The default implementation of TARGET_CLASS_LIKELY_SPILLED_P. */ bool diff --git a/gcc/targhooks.h b/gcc/targhooks.h index eeefe05..6fe8350 100644 --- a/gcc/targhooks.h +++ b/gcc/targhooks.h @@ -158,6 +158,7 @@ extern int default_register_move_cost (enum machine_mode, reg_class_t, extern bool default_profile_before_prologue (void); extern reg_class_t default_preferred_reload_class (rtx, reg_class_t); extern reg_class_t default_preferred_output_reload_class (rtx, reg_class_t); +extern reg_class_t default_preferred_rename_class (reg_class_t rclass); extern bool default_class_likely_spilled_p (reg_class_t); extern enum unwind_info_type default_debug_unwind_info (void);