From patchwork Tue Jun 21 13:37:15 2011 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Bernd Schmidt X-Patchwork-Id: 101289 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Received: from sourceware.org (server1.sourceware.org [209.132.180.131]) by ozlabs.org (Postfix) with SMTP id 6863CB6F87 for ; Tue, 21 Jun 2011 23:38:08 +1000 (EST) Received: (qmail 26315 invoked by alias); 21 Jun 2011 13:38:04 -0000 Received: (qmail 25402 invoked by uid 22791); 21 Jun 2011 13:37:54 -0000 X-SWARE-Spam-Status: No, hits=-0.4 required=5.0 tests=AWL, BAYES_50, TW_HW, T_RP_MATCHES_RCVD X-Spam-Check-By: sourceware.org Received: from mail.codesourcery.com (HELO mail.codesourcery.com) (38.113.113.100) by sourceware.org (qpsmtpd/0.43rc1) with ESMTP; Tue, 21 Jun 2011 13:37:28 +0000 Received: (qmail 3229 invoked from network); 21 Jun 2011 13:37:26 -0000 Received: from unknown (HELO ?84.152.222.116?) (bernds@127.0.0.2) by mail.codesourcery.com with ESMTPA; 21 Jun 2011 13:37:26 -0000 Message-ID: <4E009E8B.6010104@codesourcery.com> Date: Tue, 21 Jun 2011 15:37:15 +0200 From: Bernd Schmidt User-Agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.2.17) Gecko/20110516 Lightning/1.0b3pre Thunderbird/3.1.10 MIME-Version: 1.0 To: GCC Patches Subject: Generic hwloop support library Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Unsubscribe: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Delivered-To: mailing list gcc-patches@gcc.gnu.org We currently support counted loops on several targets by having the loop-doloop pass search for suitable loops and convert them to using the doloop_end pattern. Unfortunately, there are various machines which have hardware loop support, but need additional tests and transformations, during the final stage of compilation, to ensure we can make use of their hardware. For example, Blackfin has an LSETUP instruction which sets up a loop start and loop end address; the hardware compares the program counter to the loop end address register and automatically resets it to the loop start address if the end is reached. The encoding of LSETUP has a limited amount of bits, which forces to compiler to ensure that an upper bound for the loop's length is not exceeded. The loop end must be after the loop start, which sometimes requires us to reorder the CFG. On C6X, we'd like to make use of the SPLOOP/SPKERNEL instructions, which define a hardware software pipelined loop. Instructions after SPLOOP are copied into a limited-size loop buffer from which they are reexecuted; SPKERNEL indicates the end of the loop. We have some target code that deals with this. Nathan Sidwell originally wrote this for the mt port (then called ms1, now removed), it was then reused for the Blackfin by Jie Zhang and myself. For C6X, I've decided that there is sufficient commonality that we can move parts of the machinery into a target-independent file, callable from the backends with a set of hooks that do the actual transformations. This is what the patch below does. There are some odd problems when regression testing on Blackfin right now, but I've got two runs with identical results. Doing before/after comparisons on a set of .i files, there is one case in my collection where this patch produces different code; this seems merely a difference between a long branch and a short branch. Not sure why it happens, but it seems innocuous. Ok? Bernd * function.c (record_hard_reg_sets): Moved to... * rtlanal.c (record_hard_reg_sets): ... here. No longer static. * rtl.h (record_hard_reg_sets): Declare. * hw-doloop.c: New file. * hw-doloop.h: New file. * Makefile.in (OBJS): Add hw-doloop.o. (hw-doloop.o): New rule. * config/bfin/bfin.c: Include "hw-doloop.h". (loop_info, DEF_VEC_P for loop_info, loop_info_d): Remove. (bfin_dump_loops, bfin_bb_in_loop, bfin_scan_loop): Remove. (hwloop_optimize): Renamed from bfin_optimize_loop. Return bool, true if the loop was successfully optimized. Remove code that was moved to hw-doloop.c, and adjust other parts. (hwloop_fail): New static function, containing parts that used to be in bfin_optimize_loop. (bfin_discover_loop, bfin_discover_loops, free_loops, bfin_reorder_loops): Remove. (hwloop_after_reorder, hwloop_pattern_reg): New static functions. (bfin_doloop_hooks): New variable. (bfin_reorg_loops): Remove most code, call reorg_loops. * config/bfin/bfin.md (doloop_end splitter): Also enable if loop counter is a memory_operand. Index: gcc/rtlanal.c =================================================================== --- gcc/rtlanal.c (revision 175202) +++ gcc/rtlanal.c (working copy) @@ -1000,6 +1000,17 @@ set_of (const_rtx pat, const_rtx insn) return data.found; } +/* This function, which is typically called through note_stores, + collects sets and clobbers of hard registers in a HARD_REG_SET, + which is pointed to by DATA. */ +void +record_hard_reg_sets (rtx x, const_rtx pat ATTRIBUTE_UNUSED, void *data) +{ + HARD_REG_SET *pset = (HARD_REG_SET *)data; + if (REG_P (x) && HARD_REGISTER_P (x)) + add_to_hard_reg_set (pset, GET_MODE (x), REGNO (x)); +} + /* Given an INSN, return a SET expression if this insn has only a single SET. It may also have CLOBBERs, USEs, or SET whose output will not be used, which we ignore. */ Index: gcc/function.c =================================================================== --- gcc/function.c (revision 175202) +++ gcc/function.c (working copy) @@ -2898,17 +2898,6 @@ assign_parm_setup_block (struct assign_p SET_DECL_RTL (parm, stack_parm); } -/* A subroutine of assign_parm_setup_reg, called through note_stores. - This collects sets and clobbers of hard registers in a HARD_REG_SET, - which is pointed to by DATA. */ -static void -record_hard_reg_sets (rtx x, const_rtx pat ATTRIBUTE_UNUSED, void *data) -{ - HARD_REG_SET *pset = (HARD_REG_SET *)data; - if (REG_P (x) && HARD_REGISTER_P (x)) - add_to_hard_reg_set (pset, GET_MODE (x), REGNO (x)); -} - /* A subroutine of assign_parms. Allocate a pseudo to hold the current parameter. Get it there. Perform all ABI specified conversions. */ Index: gcc/hw-doloop.c =================================================================== --- gcc/hw-doloop.c (revision 0) +++ gcc/hw-doloop.c (revision 0) @@ -0,0 +1,683 @@ +/* Code to analyze doloop loops in order for targets to perform late + optimizations converting doloops to other forms of hardware loops. + Copyright (C) 2011 Free Software Foundation, Inc. + +This file is part of GCC. + +GCC is free software; you can redistribute it and/or modify it under +the terms of the GNU General Public License as published by the Free +Software Foundation; either version 3, or (at your option) any later +version. + +GCC is distributed in the hope that it will be useful, but WITHOUT ANY +WARRANTY; without even the implied warranty of MERCHANTABILITY or +FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License +for more details. + +You should have received a copy of the GNU General Public License +along with GCC; see the file COPYING3. If not see +. */ + +#include "config.h" +#include "system.h" +#include "coretypes.h" +#include "tm.h" +#include "rtl.h" +#include "flags.h" +#include "expr.h" +#include "hard-reg-set.h" +#include "basic-block.h" +#include "tm_p.h" +#include "df.h" +#include "cfglayout.h" +#include "cfgloop.h" +#include "output.h" +#include "recog.h" +#include "target.h" +#include "hw-doloop.h" + +#ifdef HAVE_doloop_end + +/* Dump information collected in LOOPS. */ + +static void +dump_hwloops (loop_info loops) +{ + loop_info loop; + + for (loop = loops; loop; loop = loop->next) + { + loop_info i; + basic_block b; + unsigned ix; + + fprintf (dump_file, ";; loop %d: ", loop->loop_no); + if (loop->bad) + fprintf (dump_file, "(bad) "); + fprintf (dump_file, "{head:%d, depth:%d, reg:%u}", + loop->head == NULL ? -1 : loop->head->index, + loop->depth, REGNO (loop->iter_reg)); + + fprintf (dump_file, " blocks: [ "); + for (ix = 0; VEC_iterate (basic_block, loop->blocks, ix, b); ix++) + fprintf (dump_file, "%d ", b->index); + fprintf (dump_file, "] "); + + fprintf (dump_file, " inner loops: [ "); + for (ix = 0; VEC_iterate (loop_info, loop->loops, ix, i); ix++) + fprintf (dump_file, "%d ", i->loop_no); + fprintf (dump_file, "]\n"); + } + fprintf (dump_file, "\n"); +} + +/* Return true if BB is part of LOOP. */ +bool +bb_in_loop_p (loop_info loop, basic_block bb) +{ + return bitmap_bit_p (loop->block_bitmap, bb->index); +} + +/* Scan the blocks of LOOP (and its inferiors), and record things such as + hard registers set, jumps out of and within the loop. */ + +static void +scan_loop (loop_info loop) +{ + unsigned ix; + basic_block bb; + + if (loop->head == NULL) + { + gcc_assert (loop->bad); + return; + } + CLEAR_HARD_REG_SET (loop->regs_set_in_loop); + loop->iter_reg_used = false; + loop->jumps_within = false; + loop->jumps_outof = false; + loop->has_call = false; + loop->has_asm = false; + loop->iter_reg_used_outside = false; + loop->timode_insns = 0; + + if (REGNO_REG_SET_P (df_get_live_in (loop->successor), + REGNO (loop->iter_reg))) + loop->iter_reg_used_outside = true; + + for (ix = 0; VEC_iterate (basic_block, loop->blocks, ix, bb); ix++) + { + rtx insn; + edge e; + edge_iterator ei; + + if (bb != loop->tail) + FOR_EACH_EDGE (e, ei, bb->succs) + { + if (bb_in_loop_p (loop, e->dest)) + { + if (!(e->flags & EDGE_FALLTHRU)) + loop->jumps_within = true; + } + else + { + loop->jumps_outof = true; + if (!loop->bad) + gcc_assert (!REGNO_REG_SET_P (df_get_live_in (e->dest), + REGNO (loop->iter_reg))); + } + } + + for (insn = BB_HEAD (bb); + insn != NEXT_INSN (BB_END (bb)); + insn = NEXT_INSN (insn)) + { + HARD_REG_SET set_this_insn; + rtx x; + + if (!INSN_P (insn)) + continue; + + if (recog_memoized (insn) < 0 + && (GET_CODE (PATTERN (insn)) == ASM_INPUT + || asm_noperands (PATTERN (insn)) >= 0)) + loop->has_asm = true; + + if (GET_MODE (insn) == TImode) + loop->timode_insns++; + + CLEAR_HARD_REG_SET (set_this_insn); + note_stores (PATTERN (insn), record_hard_reg_sets, + &set_this_insn); + for (x = REG_NOTES (insn); x; x = XEXP (x, 1)) + if (REG_NOTE_KIND (x) == REG_INC) + record_hard_reg_sets (XEXP (x, 0), NULL, &set_this_insn); + + if (CALL_P (insn)) + { + loop->has_call = true; + for (x = CALL_INSN_FUNCTION_USAGE (insn); x; x = XEXP (x, 1)) + if (GET_CODE (XEXP (x, 0)) == CLOBBER) + record_hard_reg_sets (XEXP (XEXP (x, 0), 0), NULL, + &set_this_insn); + + } + + if (insn == loop->loop_end) + CLEAR_HARD_REG_BIT (set_this_insn, REGNO (loop->iter_reg)); + else if (reg_mentioned_p (loop->iter_reg, PATTERN (insn))) + loop->iter_reg_used = true; + IOR_HARD_REG_SET (loop->regs_set_in_loop, set_this_insn); + } + } +} + +/* Called from reorg_loops when a potential loop end is found. LOOP is + a newly set up structure describing the loop, it is this function's + responsibility to fill most of it. TAIL_BB and TAIL_INSN point to the + loop_end insn and its enclosing basic block. REG is the loop counter + register. */ + +static void +discover_loop (loop_info loop, basic_block tail_bb, rtx tail_insn, rtx reg) +{ + unsigned dwork = 0; + basic_block bb; + VEC (basic_block,heap) *works; + + loop->tail = tail_bb; + loop->loop_end = tail_insn; + loop->last_insn = NULL_RTX; + loop->iter_reg = reg; + loop->depth = loop->length = 0; + loop->visited = false; + loop->bad = false; + loop->outer = NULL; + loop->loops = NULL; + loop->incoming = VEC_alloc (edge, gc, 2); + loop->end_label = NULL_RTX; + loop->start_label = JUMP_LABEL (tail_insn); + + if (EDGE_COUNT (tail_bb->succs) != 2) + { + loop->bad = true; + loop->head = loop->successor = NULL; + return; + } + loop->head = BRANCH_EDGE (tail_bb)->dest; + loop->successor = FALLTHRU_EDGE (tail_bb)->dest; + + works = VEC_alloc (basic_block,heap,20); + VEC_safe_push (basic_block, heap, works, loop->head); + + while (VEC_iterate (basic_block, works, dwork++, bb)) + { + edge e; + edge_iterator ei; + if (bb == EXIT_BLOCK_PTR) + { + /* We've reached the exit block. The loop must be bad. */ + if (dump_file) + fprintf (dump_file, + ";; Loop is bad - reached exit block while scanning\n"); + loop->bad = 1; + break; + } + + if (bitmap_bit_p (loop->block_bitmap, bb->index)) + continue; + + /* We've not seen this block before. Add it to the loop's + list and then add each successor to the work list. */ + + VEC_safe_push (basic_block, heap, loop->blocks, bb); + bitmap_set_bit (loop->block_bitmap, bb->index); + + if (bb != tail_bb) + { + FOR_EACH_EDGE (e, ei, bb->succs) + { + basic_block succ = EDGE_SUCC (bb, ei.index)->dest; + if (!REGNO_REG_SET_P (df_get_live_in (succ), + REGNO (loop->iter_reg))) + continue; + if (!VEC_space (basic_block, works, 1)) + { + if (dwork) + { + VEC_block_remove (basic_block, works, 0, dwork); + dwork = 0; + } + else + VEC_reserve (basic_block, heap, works, 1); + } + VEC_quick_push (basic_block, works, succ); + } + } + } + + /* Find the predecessor, and make sure nothing else jumps into this loop. */ + if (!loop->bad) + { + int pass, retry; + for (dwork = 0; VEC_iterate (basic_block, loop->blocks, dwork, bb); dwork++) + { + edge e; + edge_iterator ei; + FOR_EACH_EDGE (e, ei, bb->preds) + { + basic_block pred = e->src; + + if (!bb_in_loop_p (loop, pred)) + { + if (dump_file) + fprintf (dump_file, ";; Loop %d: incoming edge %d -> %d\n", + loop->loop_no, pred->index, + e->dest->index); + VEC_safe_push (edge, gc, loop->incoming, e); + } + } + } + + for (pass = 0, retry = 1; retry && pass < 2; pass++) + { + edge e; + edge_iterator ei; + bool first = true; + retry = 0; + + FOR_EACH_EDGE (e, ei, loop->incoming) + { + if (first) + { + loop->incoming_src = e->src; + loop->incoming_dest = e->dest; + first = false; + } + else + { + if (e->dest != loop->incoming_dest) + loop->incoming_dest = NULL; + if (e->src != loop->incoming_src) + loop->incoming_src = NULL; + } + if (loop->incoming_src == NULL && loop->incoming_dest == NULL) + { + if (pass == 0) + { + if (dump_file) + fprintf (dump_file, + ";; retrying loop %d with forwarder blocks\n", + loop->loop_no); + retry = 1; + break; + } + loop->bad = 1; + if (dump_file) + fprintf (dump_file, + ";; can't find suitable entry for loop %d\n", + loop->loop_no); + goto out; + } + } + if (retry) + { + retry = 0; + FOR_EACH_EDGE (e, ei, loop->incoming) + { + if (forwarder_block_p (e->src)) + { + edge e2; + edge_iterator ei2; + + if (dump_file) + fprintf (dump_file, + ";; Adding forwarder block %d to loop %d and retrying\n", + e->src->index, loop->loop_no); + VEC_safe_push (basic_block, heap, loop->blocks, e->src); + bitmap_set_bit (loop->block_bitmap, e->src->index); + FOR_EACH_EDGE (e2, ei2, e->src->preds) + VEC_safe_push (edge, gc, loop->incoming, e2); + VEC_unordered_remove (edge, loop->incoming, ei.index); + retry = 1; + break; + } + } + if (!retry) + { + if (dump_file) + fprintf (dump_file, ";; No forwarder blocks found\n"); + loop->bad = 1; + } + } + } + } + + out: + VEC_free (basic_block, heap, works); +} + +/* Analyze the structure of the loops in the current function. Use STACK + for bitmap allocations. Returns all the valid candidates for hardware + loops found in this function. */ +static loop_info +discover_loops (bitmap_obstack *stack, struct hw_doloop_hooks *hooks) +{ + loop_info loops = NULL; + loop_info loop; + basic_block bb; + bitmap tmp_bitmap; + int nloops = 0; + + /* Find all the possible loop tails. This means searching for every + loop_end instruction. For each one found, create a loop_info + structure and add the head block to the work list. */ + FOR_EACH_BB (bb) + { + rtx tail = BB_END (bb); + rtx insn, reg; + + bb->aux = NULL; + + while (tail && GET_CODE (tail) == NOTE && tail != BB_HEAD (bb)) + tail = PREV_INSN (tail); + + if (tail == NULL_RTX) + continue; + + if (!JUMP_P (tail)) + continue; + reg = hooks->end_pattern_reg (tail); + if (reg == NULL_RTX) + continue; + + /* A possible loop end */ + + /* There's a degenerate case we can handle - an empty loop consisting + of only a back branch. Handle that by deleting the branch. */ + insn = JUMP_LABEL (tail); + while (insn && !NONDEBUG_INSN_P (insn)) + insn = NEXT_INSN (insn); + if (insn == tail) + { + if (dump_file) + { + fprintf (dump_file, ";; degenerate loop ending at\n"); + print_rtl_single (dump_file, tail); + } + delete_insn_and_edges (tail); + continue; + } + + loop = XNEW (struct loop_info); + loop->next = loops; + loops = loop; + loop->loop_no = nloops++; + loop->blocks = VEC_alloc (basic_block, heap, 20); + loop->block_bitmap = BITMAP_ALLOC (stack); + bb->aux = loop; + + if (dump_file) + { + fprintf (dump_file, ";; potential loop %d ending at\n", + loop->loop_no); + print_rtl_single (dump_file, tail); + } + + discover_loop (loop, bb, tail, reg); + } + + tmp_bitmap = BITMAP_ALLOC (stack); + /* Compute loop nestings. */ + for (loop = loops; loop; loop = loop->next) + { + loop_info other; + if (loop->bad) + continue; + + for (other = loop->next; other; other = other->next) + { + if (other->bad) + continue; + + bitmap_and (tmp_bitmap, other->block_bitmap, loop->block_bitmap); + if (bitmap_empty_p (tmp_bitmap)) + continue; + if (bitmap_equal_p (tmp_bitmap, other->block_bitmap)) + { + other->outer = loop; + VEC_safe_push (loop_info, heap, loop->loops, other); + } + else if (bitmap_equal_p (tmp_bitmap, loop->block_bitmap)) + { + loop->outer = other; + VEC_safe_push (loop_info, heap, other->loops, loop); + } + else + { + if (dump_file) + fprintf (dump_file, + ";; can't find suitable nesting for loops %d and %d\n", + loop->loop_no, other->loop_no); + loop->bad = other->bad = 1; + } + } + } + BITMAP_FREE (tmp_bitmap); + + if (dump_file) + dump_hwloops (loops); + + return loops; +} + +/* Free up the loop structures in LOOPS. */ +static void +free_loops (loop_info loops) +{ + while (loops) + { + loop_info loop = loops; + loops = loop->next; + VEC_free (loop_info, heap, loop->loops); + VEC_free (basic_block, heap, loop->blocks); + BITMAP_FREE (loop->block_bitmap); + XDELETE (loop); + } +} + +#define BB_AUX_INDEX(BB) ((unsigned)(BB)->aux) + +/* The taken-branch edge from the loop end can actually go forward. + If the target's hardware loop support requires that the loop end be + after the loop start, try to reorder a loop's basic blocks when we + find such a case. */ + +static void +reorder_loops (loop_info loops) +{ + basic_block bb; + loop_info loop; + + FOR_EACH_BB (bb) + bb->aux = NULL; + cfg_layout_initialize (0); + + for (loop = loops; loop; loop = loop->next) + { + unsigned index; + basic_block bb; + edge e; + edge_iterator ei; + + if (loop->bad) + continue; + + /* Recreate an index for basic blocks that represents their order. */ + for (bb = ENTRY_BLOCK_PTR->next_bb, index = 0; + bb != EXIT_BLOCK_PTR; + bb = bb->next_bb, index++) + bb->aux = (PTR) index; + + if (BB_AUX_INDEX (loop->head) < BB_AUX_INDEX (loop->tail)) + continue; + + FOR_EACH_EDGE (e, ei, loop->head->succs) + { + if (bitmap_bit_p (loop->block_bitmap, e->dest->index) + && BB_AUX_INDEX (e->dest) < BB_AUX_INDEX (loop->tail)) + { + basic_block start_bb = e->dest; + basic_block start_prev_bb = start_bb->prev_bb; + + if (dump_file) + fprintf (dump_file, ";; Moving block %d before block %d\n", + loop->head->index, start_bb->index); + loop->head->prev_bb->next_bb = loop->head->next_bb; + loop->head->next_bb->prev_bb = loop->head->prev_bb; + + loop->head->prev_bb = start_prev_bb; + loop->head->next_bb = start_bb; + start_prev_bb->next_bb = start_bb->prev_bb = loop->head; + break; + } + } + loops = loops->next; + } + + FOR_EACH_BB (bb) + { + if (bb->next_bb != EXIT_BLOCK_PTR) + bb->aux = bb->next_bb; + else + bb->aux = NULL; + } + cfg_layout_finalize (); + df_analyze (); +} + +/* Call the OPT function for LOOP and all of its sub-loops. This is + done in a depth-first search; innermost loops are visited first. + OPTIMIZE and FAIL are the functions passed to reorg_loops by the + target's reorg pass. */ + +static void +optimize_loop (loop_info loop, struct hw_doloop_hooks *hooks) +{ + int ix; + loop_info inner; + int inner_depth = 0; + + if (loop->visited) + return; + + loop->visited = 1; + + if (loop->bad) + { + if (dump_file) + fprintf (dump_file, ";; loop %d bad when found\n", loop->loop_no); + goto bad_loop; + } + + /* Every loop contains in its list of inner loops every loop nested inside + it, even if there are intermediate loops. This works because we're doing + a depth-first search here and never visit a loop more than once. */ + for (ix = 0; VEC_iterate (loop_info, loop->loops, ix, inner); ix++) + { + optimize_loop (inner, hooks); + + if (!inner->bad && inner_depth < inner->depth) + inner_depth = inner->depth; + /* The set of registers may be changed while optimizing the inner + loop. */ + IOR_HARD_REG_SET (loop->regs_set_in_loop, inner->regs_set_in_loop); + } + + loop->depth = inner_depth + 1; + + if (hooks->opt (loop)) + return; + + bad_loop: + if (dump_file) + fprintf (dump_file, ";; loop %d is bad\n", loop->loop_no); + + loop->bad = true; + hooks->fail (loop); +} + +/* Run from machine_dependent_reorg, this pass looks for doloop_end + insns, analyzes the loop structure, and tries to rewrite the RTL of + these loops so that the target can create whatever hardware looping + instructions are available. + + DO_REORDER should be set to true if we should try to use the + reorder_loops function to ensure the loop end occurs after the loop + start. HOOKS is used to pass in target specific hooks; see + hw-doloop.h. */ + +void +reorg_loops (bool do_reorder, struct hw_doloop_hooks *hooks) +{ + loop_info loops = NULL; + loop_info loop; + basic_block bb; + bitmap_obstack stack; + + df_live_add_problem (); + df_live_set_all_dirty (); + df_analyze (); + + bitmap_obstack_initialize (&stack); + + if (dump_file) + fprintf (dump_file, ";; Find loops, first pass\n\n"); + + loops = discover_loops (&stack, hooks); + + if (do_reorder) + { + reorder_loops (loops); + free_loops (loops); + + if (dump_file) + fprintf (dump_file, ";; Find loops, second pass\n\n"); + + loops = discover_loops (&stack, hooks); + } + + if (hooks->after_reorder) + { + if (hooks->after_reorder ()) + { + free_loops (loops); + if (dump_file) + fprintf (dump_file, ";; Find loops after target after_reorder\n\n"); + + loops = discover_loops (&stack, hooks); + } + } + + for (loop = loops; loop; loop = loop->next) + scan_loop (loop); + + /* Now apply the optimizations. */ + for (loop = loops; loop; loop = loop->next) + optimize_loop (loop, hooks); + + if (dump_file) + { + fprintf (dump_file, ";; After hardware loops optimization:\n\n"); + dump_hwloops (loops); + } + + free_loops (loops); + + if (dump_file) + print_rtl (dump_file, get_insns ()); + + FOR_EACH_BB (bb) + bb->aux = NULL; +} +#endif Index: gcc/hw-doloop.h =================================================================== --- gcc/hw-doloop.h (revision 0) +++ gcc/hw-doloop.h (revision 0) @@ -0,0 +1,149 @@ +/* Code to analyze doloop loops in order for targets to perform late + optimizations converting doloops to other forms of hardware loops. + Copyright (C) 2011 Free Software Foundation, Inc. + +This file is part of GCC. + +GCC is free software; you can redistribute it and/or modify it under +the terms of the GNU General Public License as published by the Free +Software Foundation; either version 3, or (at your option) any later +version. + +GCC is distributed in the hope that it will be useful, but WITHOUT ANY +WARRANTY; without even the implied warranty of MERCHANTABILITY or +FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License +for more details. + +You should have received a copy of the GNU General Public License +along with GCC; see the file COPYING3. If not see +. */ + +/* We need to keep a vector of loops */ +typedef struct loop_info *loop_info; +DEF_VEC_P (loop_info); +DEF_VEC_ALLOC_P (loop_info,heap); + +/* Information about a loop we have found (or are in the process of + finding). */ +struct GTY (()) loop_info +{ + /* loop number, for dumps */ + int loop_no; + + /* Next loop in the graph. */ + struct loop_info *next; + + /* Immediate outer loop of this loop. */ + struct loop_info *outer; + + /* Vector of blocks only within the loop, including those within + inner loops. */ + VEC (basic_block,heap) *blocks; + + /* Same information in a bitmap. */ + bitmap block_bitmap; + + /* Vector of inner loops within this loop */ + VEC (loop_info,heap) *loops; + + /* All edges that jump into the loop. */ + VEC(edge,gc) *incoming; + + /* The ports currently using this infrastructure can typically + handle two cases: all incoming edges have the same destination + block, or all incoming edges have the same source block. These + two members are set to the common source or destination we found, + or NULL if different blocks were found. If both are NULL the + loop can't be optimized. */ + basic_block incoming_src; + basic_block incoming_dest; + + /* First block in the loop. This is the one branched to by the loop_end + insn. */ + basic_block head; + + /* Last block in the loop (the one with the loop_end insn). */ + basic_block tail; + + /* The successor block of the loop. This is the one the loop_end insn + falls into. */ + basic_block successor; + + /* The last instruction in the tail. */ + rtx last_insn; + + /* The loop_end insn. */ + rtx loop_end; + + /* The iteration register. */ + rtx iter_reg; + + /* The new label placed at the beginning of the loop. */ + rtx start_label; + + /* The new label placed at the end of the loop. */ + rtx end_label; + + /* The length of the loop. */ + int length; + + /* The nesting depth of the loop. Innermost loops are given a depth + of 1. Only successfully optimized doloops are counted; if an inner + loop was marked as bad, it does not increase the depth of its parent + loop. + This value is valid when the target's optimize function is called. */ + int depth; + + /* True if we can't optimize this loop. */ + bool bad; + + /* True if we have visited this loop during the optimization phase. */ + bool visited; + + /* The following values are collected before calling the target's optimize + function and are not valid earlier. */ + + /* Record information about control flow: whether the loop has calls + or asm statements, whether it has edges that jump out of the loop, + or edges that jump within the loop. */ + bool has_call; + bool has_asm; + bool jumps_within; + bool jumps_outof; + + /* True if there is an instruction other than the doloop_end which uses the + iteration register. */ + bool iter_reg_used; + /* True if the iteration register lives past the doloop instruction. */ + bool iter_reg_used_outside; + + /* Hard registers set at any point in the loop, except for the loop counter + register's set in the doloop_end instruction. */ + HARD_REG_SET regs_set_in_loop; + + /* The number of instructions that are marked with TImode. Useful if the + pass is run after the second scheduling pass. */ + int timode_insns; +}; + +struct hw_doloop_hooks +{ + /* Examine INSN. If it is a suitable doloop_end pattern, return the + iteration register. Otherwise, return NULL_RTX. */ + rtx (*end_pattern_reg) (rtx insn); + /* Called after the hw-doloop pass has finished reordering the basic block + structure. The target can defer some of its initializations to this + hook if necessary. This function should return true if the CFG has been + changed and loops must be rediscovered. */ + bool (*after_reorder) (void); + /* Optimize LOOP. Return true if successful, false if the loop should be + marked bad. */ + bool (*opt) (loop_info loop); + /* Handle a loop that was marked bad for any reason. This could be used to + split the doloop_end pattern. */ + void (*fail) (loop_info loop); +}; + +extern void reorg_loops (bool, struct hw_doloop_hooks *); + +extern bool bb_in_loop_p (loop_info, basic_block); Index: gcc/rtl.h =================================================================== --- gcc/rtl.h (revision 175202) +++ gcc/rtl.h (working copy) @@ -1867,6 +1867,7 @@ extern rtx find_last_value (rtx, rtx *, extern int refers_to_regno_p (unsigned int, unsigned int, const_rtx, rtx *); extern int reg_overlap_mentioned_p (const_rtx, const_rtx); extern const_rtx set_of (const_rtx, const_rtx); +extern void record_hard_reg_sets (rtx, const_rtx, void *); extern void note_stores (const_rtx, void (*) (rtx, const_rtx, void *), void *); extern void note_uses (rtx *, void (*) (rtx *, void *), void *); extern int dead_or_set_p (const_rtx, const_rtx); Index: gcc/Makefile.in =================================================================== --- gcc/Makefile.in (revision 175202) +++ gcc/Makefile.in (working copy) @@ -1292,6 +1292,7 @@ OBJS = \ graphite-sese-to-poly.o \ gtype-desc.o \ haifa-sched.o \ + hw-doloop.o \ hwint.o \ ifcvt.o \ implicit-zee.o \ @@ -3545,7 +3546,10 @@ target-globals.o : target-globals.c $(CO $(TM_H) insn-config.h $(MACHMODE_H) $(GGC_H) toplev.h target-globals.h \ $(FLAGS_H) $(REGS_H) $(RTL_H) reload.h expmed.h $(EXPR_H) $(OPTABS_H) \ $(LIBFUNCS_H) $(CFGLOOP_H) $(IRA_INT_H) builtins.h gcse.h bb-reorder.h - +hw-doloop.o : hw-doloop.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) \ + $(RTL_H) $(FLAGS_H) $(EXPR_H) hard-reg-set.h $(BASIC_BLOCK_H) $(TM_P_H) \ + $(DF_H) $(CFGLAYOUT_H) $(CFGLOOP_H) output.h $(RECOG_H) $(TARGET_H) \ + hw-doloop.h $(out_object_file): $(out_file) $(CONFIG_H) coretypes.h $(TM_H) $(TREE_H) \ $(RTL_H) $(REGS_H) hard-reg-set.h insn-config.h conditions.h \ output.h $(INSN_ATTR_H) $(SYSTEM_H) toplev.h $(DIAGNOSTIC_CORE_H) \ Index: gcc/config/bfin/bfin.c =================================================================== --- gcc/config/bfin/bfin.c (revision 175202) +++ gcc/config/bfin/bfin.c (working copy) @@ -56,6 +56,7 @@ #include "timevar.h" #include "df.h" #include "sel-sched.h" +#include "hw-doloop.h" #include "opts.h" /* A C structure for machine-specific, per-function data. @@ -3381,157 +3382,6 @@ bfin_hardware_loop (void) /* Maximum distance of the LSETUP instruction from the loop start. */ #define MAX_LSETUP_DISTANCE 30 -/* We need to keep a vector of loops */ -typedef struct loop_info_d *loop_info; -DEF_VEC_P (loop_info); -DEF_VEC_ALLOC_P (loop_info,heap); - -/* Information about a loop we have found (or are in the process of - finding). */ -struct GTY (()) loop_info_d -{ - /* loop number, for dumps */ - int loop_no; - - /* All edges that jump into and out of the loop. */ - VEC(edge,gc) *incoming; - - /* We can handle two cases: all incoming edges have the same destination - block, or all incoming edges have the same source block. These two - members are set to the common source or destination we found, or NULL - if different blocks were found. If both are NULL the loop can't be - optimized. */ - basic_block incoming_src; - basic_block incoming_dest; - - /* First block in the loop. This is the one branched to by the loop_end - insn. */ - basic_block head; - - /* Last block in the loop (the one with the loop_end insn). */ - basic_block tail; - - /* The successor block of the loop. This is the one the loop_end insn - falls into. */ - basic_block successor; - - /* The last instruction in the tail. */ - rtx last_insn; - - /* The loop_end insn. */ - rtx loop_end; - - /* The iteration register. */ - rtx iter_reg; - - /* The new label placed at the beginning of the loop. */ - rtx start_label; - - /* The new label placed at the end of the loop. */ - rtx end_label; - - /* The length of the loop. */ - int length; - - /* The nesting depth of the loop. */ - int depth; - - /* Nonzero if we can't optimize this loop. */ - int bad; - - /* True if we have visited this loop. */ - int visited; - - /* True if this loop body clobbers any of LC0, LT0, or LB0. */ - int clobber_loop0; - - /* True if this loop body clobbers any of LC1, LT1, or LB1. */ - int clobber_loop1; - - /* Next loop in the graph. */ - struct loop_info_d *next; - - /* Immediate outer loop of this loop. */ - struct loop_info_d *outer; - - /* Vector of blocks only within the loop, including those within - inner loops. */ - VEC (basic_block,heap) *blocks; - - /* Same information in a bitmap. */ - bitmap block_bitmap; - - /* Vector of inner loops within this loop */ - VEC (loop_info,heap) *loops; -}; - -static void -bfin_dump_loops (loop_info loops) -{ - loop_info loop; - - for (loop = loops; loop; loop = loop->next) - { - loop_info i; - basic_block b; - unsigned ix; - - fprintf (dump_file, ";; loop %d: ", loop->loop_no); - if (loop->bad) - fprintf (dump_file, "(bad) "); - fprintf (dump_file, "{head:%d, depth:%d}", loop->head->index, loop->depth); - - fprintf (dump_file, " blocks: [ "); - FOR_EACH_VEC_ELT (basic_block, loop->blocks, ix, b) - fprintf (dump_file, "%d ", b->index); - fprintf (dump_file, "] "); - - fprintf (dump_file, " inner loops: [ "); - FOR_EACH_VEC_ELT (loop_info, loop->loops, ix, i) - fprintf (dump_file, "%d ", i->loop_no); - fprintf (dump_file, "]\n"); - } - fprintf (dump_file, "\n"); -} - -/* Scan the blocks of LOOP (and its inferiors) looking for basic block - BB. Return true, if we find it. */ - -static bool -bfin_bb_in_loop (loop_info loop, basic_block bb) -{ - return bitmap_bit_p (loop->block_bitmap, bb->index); -} - -/* Scan the blocks of LOOP (and its inferiors) looking for uses of - REG. Return true, if we find any. Don't count the loop's loop_end - insn if it matches LOOP_END. */ - -static bool -bfin_scan_loop (loop_info loop, rtx reg, rtx loop_end) -{ - unsigned ix; - basic_block bb; - - FOR_EACH_VEC_ELT (basic_block, loop->blocks, ix, bb) - { - rtx insn; - - for (insn = BB_HEAD (bb); - insn != NEXT_INSN (BB_END (bb)); - insn = NEXT_INSN (insn)) - { - if (!INSN_P (insn)) - continue; - if (insn == loop_end) - continue; - if (reg_mentioned_p (reg, PATTERN (insn))) - return true; - } - } - return false; -} - /* Estimate the length of INSN conservatively. */ static int @@ -3559,67 +3409,32 @@ length_for_loop (rtx insn) /* Optimize LOOP. */ -static void -bfin_optimize_loop (loop_info loop) +static bool +hwloop_optimize (loop_info loop) { basic_block bb; loop_info inner; rtx insn, last_insn; rtx loop_init, start_label, end_label; - rtx reg_lc0, reg_lc1, reg_lt0, reg_lt1, reg_lb0, reg_lb1; rtx iter_reg, scratchreg, scratch_init, scratch_init_insn; rtx lc_reg, lt_reg, lb_reg; rtx seq, seq_end; int length; unsigned ix; - int inner_depth = 0; - - if (loop->visited) - return; - - loop->visited = 1; - - if (loop->bad) - { - if (dump_file) - fprintf (dump_file, ";; loop %d bad when found\n", loop->loop_no); - goto bad_loop; - } - - /* Every loop contains in its list of inner loops every loop nested inside - it, even if there are intermediate loops. This works because we're doing - a depth-first search here and never visit a loop more than once. */ - FOR_EACH_VEC_ELT (loop_info, loop->loops, ix, inner) - { - bfin_optimize_loop (inner); - - if (!inner->bad && inner_depth < inner->depth) - { - inner_depth = inner->depth; - - loop->clobber_loop0 |= inner->clobber_loop0; - loop->clobber_loop1 |= inner->clobber_loop1; - } - } + bool clobber0, clobber1; - loop->depth = inner_depth + 1; if (loop->depth > MAX_LOOP_DEPTH) { if (dump_file) fprintf (dump_file, ";; loop %d too deep\n", loop->loop_no); - goto bad_loop; + return false; } /* Get the loop iteration register. */ iter_reg = loop->iter_reg; - if (!REG_P (iter_reg)) - { - if (dump_file) - fprintf (dump_file, ";; loop %d iteration count not in a register\n", - loop->loop_no); - goto bad_loop; - } + gcc_assert (REG_P (iter_reg)); + scratchreg = NULL_RTX; scratch_init = iter_reg; scratch_init_insn = NULL_RTX; @@ -3680,7 +3495,7 @@ bfin_optimize_loop (loop_info loop) if (dump_file) fprintf (dump_file, ";; loop %d lsetup not before loop_start\n", loop->loop_no); - goto bad_loop; + return false; } /* Account for the pop of a scratch register where necessary. */ @@ -3692,7 +3507,7 @@ bfin_optimize_loop (loop_info loop) { if (dump_file) fprintf (dump_file, ";; loop %d lsetup too far away\n", loop->loop_no); - goto bad_loop; + return false; } } @@ -3710,7 +3525,7 @@ bfin_optimize_loop (loop_info loop) if (dump_file) fprintf (dump_file, ";; loop %d start_label not before loop_end\n", loop->loop_no); - goto bad_loop; + return false; } loop->length = length; @@ -3718,58 +3533,29 @@ bfin_optimize_loop (loop_info loop) { if (dump_file) fprintf (dump_file, ";; loop %d too long\n", loop->loop_no); - goto bad_loop; + return false; } /* Scan all the blocks to make sure they don't use iter_reg. */ - if (bfin_scan_loop (loop, iter_reg, loop->loop_end)) + if (loop->iter_reg_used) { if (dump_file) fprintf (dump_file, ";; loop %d uses iterator\n", loop->loop_no); - goto bad_loop; - } - - /* Scan all the insns to see if the loop body clobber - any hardware loop registers. */ - - reg_lc0 = gen_rtx_REG (SImode, REG_LC0); - reg_lc1 = gen_rtx_REG (SImode, REG_LC1); - reg_lt0 = gen_rtx_REG (SImode, REG_LT0); - reg_lt1 = gen_rtx_REG (SImode, REG_LT1); - reg_lb0 = gen_rtx_REG (SImode, REG_LB0); - reg_lb1 = gen_rtx_REG (SImode, REG_LB1); - - FOR_EACH_VEC_ELT (basic_block, loop->blocks, ix, bb) - { - rtx insn; - - for (insn = BB_HEAD (bb); - insn != NEXT_INSN (BB_END (bb)); - insn = NEXT_INSN (insn)) - { - if (!INSN_P (insn)) - continue; - - if (reg_set_p (reg_lc0, insn) - || reg_set_p (reg_lt0, insn) - || reg_set_p (reg_lb0, insn)) - loop->clobber_loop0 = 1; - - if (reg_set_p (reg_lc1, insn) - || reg_set_p (reg_lt1, insn) - || reg_set_p (reg_lb1, insn)) - loop->clobber_loop1 |= 1; - } + return false; } - if ((loop->clobber_loop0 && loop->clobber_loop1) - || (loop->depth == MAX_LOOP_DEPTH && loop->clobber_loop0)) + clobber0 = (TEST_HARD_REG_BIT (loop->regs_set_in_loop, REG_LC0) + || TEST_HARD_REG_BIT (loop->regs_set_in_loop, REG_LB0) + || TEST_HARD_REG_BIT (loop->regs_set_in_loop, REG_LT0)); + clobber1 = (TEST_HARD_REG_BIT (loop->regs_set_in_loop, REG_LC1) + || TEST_HARD_REG_BIT (loop->regs_set_in_loop, REG_LB1) + || TEST_HARD_REG_BIT (loop->regs_set_in_loop, REG_LT1)); + if (clobber0 && clobber1) { - loop->depth = MAX_LOOP_DEPTH + 1; if (dump_file) fprintf (dump_file, ";; loop %d no loop reg available\n", loop->loop_no); - goto bad_loop; + return false; } /* There should be an instruction before the loop_end instruction @@ -3814,7 +3600,7 @@ bfin_optimize_loop (loop_info loop) if (dump_file) fprintf (dump_file, ";; loop %d has no last instruction\n", loop->loop_no); - goto bad_loop; + return false; } if (JUMP_P (last_insn) && !any_condjump_p (last_insn)) @@ -3822,7 +3608,7 @@ bfin_optimize_loop (loop_info loop) if (dump_file) fprintf (dump_file, ";; loop %d has bad last instruction\n", loop->loop_no); - goto bad_loop; + return false; } /* In all other cases, try to replace a bad last insn with a nop. */ else if (JUMP_P (last_insn) @@ -3838,7 +3624,7 @@ bfin_optimize_loop (loop_info loop) { if (dump_file) fprintf (dump_file, ";; loop %d too long\n", loop->loop_no); - goto bad_loop; + return false; } if (dump_file) fprintf (dump_file, ";; loop %d has bad last insn; replace with nop\n", @@ -3854,19 +3640,19 @@ bfin_optimize_loop (loop_info loop) end_label = gen_label_rtx (); iter_reg = loop->iter_reg; - if (loop->depth == 1 && !loop->clobber_loop1) + if (loop->depth == 1 && !clobber1) { - lc_reg = reg_lc1; - lt_reg = reg_lt1; - lb_reg = reg_lb1; - loop->clobber_loop1 = 1; + lc_reg = gen_rtx_REG (SImode, REG_LC1); + lb_reg = gen_rtx_REG (SImode, REG_LB1); + lt_reg = gen_rtx_REG (SImode, REG_LT1); + SET_HARD_REG_BIT (loop->regs_set_in_loop, REG_LC1); } else { - lc_reg = reg_lc0; - lt_reg = reg_lt0; - lb_reg = reg_lb0; - loop->clobber_loop0 = 1; + lc_reg = gen_rtx_REG (SImode, REG_LC0); + lb_reg = gen_rtx_REG (SImode, REG_LB0); + lt_reg = gen_rtx_REG (SImode, REG_LT0); + SET_HARD_REG_BIT (loop->regs_set_in_loop, REG_LC0); } loop->end_label = end_label; @@ -4009,15 +3795,17 @@ bfin_optimize_loop (loop_info loop) /* Insert the loop end label before the last instruction of the loop. */ emit_label_before (loop->end_label, loop->last_insn); - return; - - bad_loop: - - if (dump_file) - fprintf (dump_file, ";; loop %d is bad\n", loop->loop_no); - - loop->bad = 1; + return true; +} +/* A callback for the hw-doloop pass. Called when a loop we have discovered + turns out not to be optimizable; we have to split the doloop_end pattern + into a subtract and a test. */ +static void +hwloop_fail (loop_info loop) +{ + rtx insn = loop->loop_end; + if (DPREG_P (loop->iter_reg)) { /* If loop->iter_reg is a DREG or PREG, we can split it here @@ -4039,370 +3827,51 @@ bfin_optimize_loop (loop_info loop) LABEL_NUSES (loop->start_label)++; delete_insn (loop->loop_end); } -} - -/* Called from bfin_reorg_loops when a potential loop end is found. LOOP is - a newly set up structure describing the loop, it is this function's - responsibility to fill most of it. TAIL_BB and TAIL_INSN point to the - loop_end insn and its enclosing basic block. */ - -static void -bfin_discover_loop (loop_info loop, basic_block tail_bb, rtx tail_insn) -{ - unsigned dwork = 0; - basic_block bb; - VEC (basic_block,heap) *works = VEC_alloc (basic_block,heap,20); - - loop->tail = tail_bb; - loop->head = BRANCH_EDGE (tail_bb)->dest; - loop->successor = FALLTHRU_EDGE (tail_bb)->dest; - loop->loop_end = tail_insn; - loop->last_insn = NULL_RTX; - loop->iter_reg = SET_DEST (XVECEXP (PATTERN (tail_insn), 0, 1)); - loop->depth = loop->length = 0; - loop->visited = 0; - loop->clobber_loop0 = loop->clobber_loop1 = 0; - loop->outer = NULL; - loop->loops = NULL; - loop->incoming = VEC_alloc (edge, gc, 2); - loop->start_label = XEXP (XEXP (SET_SRC (XVECEXP (PATTERN (tail_insn), 0, 0)), 1), 0); - loop->end_label = NULL_RTX; - loop->bad = 0; - - VEC_safe_push (basic_block, heap, works, loop->head); - - while (VEC_iterate (basic_block, works, dwork++, bb)) - { - edge e; - edge_iterator ei; - if (bb == EXIT_BLOCK_PTR) - { - /* We've reached the exit block. The loop must be bad. */ - if (dump_file) - fprintf (dump_file, - ";; Loop is bad - reached exit block while scanning\n"); - loop->bad = 1; - break; - } - - if (!bitmap_set_bit (loop->block_bitmap, bb->index)) - continue; - - /* We've not seen this block before. Add it to the loop's - list and then add each successor to the work list. */ - - VEC_safe_push (basic_block, heap, loop->blocks, bb); - - if (bb != tail_bb) - { - FOR_EACH_EDGE (e, ei, bb->succs) - { - basic_block succ = EDGE_SUCC (bb, ei.index)->dest; - if (!REGNO_REG_SET_P (df_get_live_in (succ), - REGNO (loop->iter_reg))) - continue; - if (!VEC_space (basic_block, works, 1)) - { - if (dwork) - { - VEC_block_remove (basic_block, works, 0, dwork); - dwork = 0; - } - else - VEC_reserve (basic_block, heap, works, 1); - } - VEC_quick_push (basic_block, works, succ); - } - } - } - - /* Find the predecessor, and make sure nothing else jumps into this loop. */ - if (!loop->bad) - { - int pass, retry; - FOR_EACH_VEC_ELT (basic_block, loop->blocks, dwork, bb) - { - edge e; - edge_iterator ei; - FOR_EACH_EDGE (e, ei, bb->preds) - { - basic_block pred = e->src; - - if (!bfin_bb_in_loop (loop, pred)) - { - if (dump_file) - fprintf (dump_file, ";; Loop %d: incoming edge %d -> %d\n", - loop->loop_no, pred->index, - e->dest->index); - VEC_safe_push (edge, gc, loop->incoming, e); - } - } - } - - for (pass = 0, retry = 1; retry && pass < 2; pass++) - { - edge e; - edge_iterator ei; - bool first = true; - retry = 0; - - FOR_EACH_EDGE (e, ei, loop->incoming) - { - if (first) - { - loop->incoming_src = e->src; - loop->incoming_dest = e->dest; - first = false; - } - else - { - if (e->dest != loop->incoming_dest) - loop->incoming_dest = NULL; - if (e->src != loop->incoming_src) - loop->incoming_src = NULL; - } - if (loop->incoming_src == NULL && loop->incoming_dest == NULL) - { - if (pass == 0) - { - if (dump_file) - fprintf (dump_file, - ";; retrying loop %d with forwarder blocks\n", - loop->loop_no); - retry = 1; - break; - } - loop->bad = 1; - if (dump_file) - fprintf (dump_file, - ";; can't find suitable entry for loop %d\n", - loop->loop_no); - goto out; - } - } - if (retry) - { - retry = 0; - FOR_EACH_EDGE (e, ei, loop->incoming) - { - if (forwarder_block_p (e->src)) - { - edge e2; - edge_iterator ei2; - - if (dump_file) - fprintf (dump_file, - ";; Adding forwarder block %d to loop %d and retrying\n", - e->src->index, loop->loop_no); - VEC_safe_push (basic_block, heap, loop->blocks, e->src); - bitmap_set_bit (loop->block_bitmap, e->src->index); - FOR_EACH_EDGE (e2, ei2, e->src->preds) - VEC_safe_push (edge, gc, loop->incoming, e2); - VEC_unordered_remove (edge, loop->incoming, ei.index); - retry = 1; - break; - } - } - if (!retry) - { - if (dump_file) - fprintf (dump_file, ";; No forwarder blocks found\n"); - loop->bad = 1; - } - } - } - } - - out: - VEC_free (basic_block, heap, works); -} - -/* Analyze the structure of the loops in the current function. Use STACK - for bitmap allocations. Returns all the valid candidates for hardware - loops found in this function. */ -static loop_info -bfin_discover_loops (bitmap_obstack *stack, FILE *dump_file) -{ - loop_info loops = NULL; - loop_info loop; - basic_block bb; - bitmap tmp_bitmap; - int nloops = 0; - - /* Find all the possible loop tails. This means searching for every - loop_end instruction. For each one found, create a loop_info - structure and add the head block to the work list. */ - FOR_EACH_BB (bb) - { - rtx tail = BB_END (bb); - - while (GET_CODE (tail) == NOTE) - tail = PREV_INSN (tail); - - bb->aux = NULL; - - if (INSN_P (tail) && recog_memoized (tail) == CODE_FOR_loop_end) - { - rtx insn; - /* A possible loop end */ - - /* There's a degenerate case we can handle - an empty loop consisting - of only a back branch. Handle that by deleting the branch. */ - insn = BB_HEAD (BRANCH_EDGE (bb)->dest); - if (next_real_insn (insn) == tail) - { - if (dump_file) - { - fprintf (dump_file, ";; degenerate loop ending at\n"); - print_rtl_single (dump_file, tail); - } - delete_insn_and_edges (tail); - continue; - } - - loop = XNEW (struct loop_info_d); - loop->next = loops; - loops = loop; - loop->loop_no = nloops++; - loop->blocks = VEC_alloc (basic_block, heap, 20); - loop->block_bitmap = BITMAP_ALLOC (stack); - bb->aux = loop; - - if (dump_file) - { - fprintf (dump_file, ";; potential loop %d ending at\n", - loop->loop_no); - print_rtl_single (dump_file, tail); - } - - bfin_discover_loop (loop, bb, tail); - } - } - - tmp_bitmap = BITMAP_ALLOC (stack); - /* Compute loop nestings. */ - for (loop = loops; loop; loop = loop->next) + else { - loop_info other; - if (loop->bad) - continue; - - for (other = loop->next; other; other = other->next) - { - if (other->bad) - continue; - - bitmap_and (tmp_bitmap, other->block_bitmap, loop->block_bitmap); - if (bitmap_empty_p (tmp_bitmap)) - continue; - if (bitmap_equal_p (tmp_bitmap, other->block_bitmap)) - { - other->outer = loop; - VEC_safe_push (loop_info, heap, loop->loops, other); - } - else if (bitmap_equal_p (tmp_bitmap, loop->block_bitmap)) - { - loop->outer = other; - VEC_safe_push (loop_info, heap, other->loops, loop); - } - else - { - if (dump_file) - fprintf (dump_file, - ";; can't find suitable nesting for loops %d and %d\n", - loop->loop_no, other->loop_no); - loop->bad = other->bad = 1; - } - } + splitting_loops = 1; + try_split (PATTERN (insn), insn, 1); + splitting_loops = 0; } - BITMAP_FREE (tmp_bitmap); - - return loops; } -/* Free up the loop structures in LOOPS. */ -static void -free_loops (loop_info loops) +/* A callback for the hw-doloop pass. This function is called after the loop + structure has been reordered. It should return true if it makes any + further changes to the CFG that require the caller to rediscover the + loops. */ +static bool +hwloop_after_reorder (void) { - while (loops) - { - loop_info loop = loops; - loops = loop->next; - VEC_free (loop_info, heap, loop->loops); - VEC_free (basic_block, heap, loop->blocks); - BITMAP_FREE (loop->block_bitmap); - XDELETE (loop); - } + return false; } -#define BB_AUX_INDEX(BB) ((intptr_t)(BB)->aux) +/* A callback for the hw-doloop pass. This function examines INSN; if + it is a loop_end pattern we recognize, return the reg rtx for the + loop counter. Otherwise, return NULL_RTX. */ -/* The taken-branch edge from the loop end can actually go forward. Since the - Blackfin's LSETUP instruction requires that the loop end be after the loop - start, try to reorder a loop's basic blocks when we find such a case. */ -static void -bfin_reorder_loops (loop_info loops, FILE *dump_file) +static rtx +hwloop_pattern_reg (rtx insn) { - basic_block bb; - loop_info loop; - - FOR_EACH_BB (bb) - bb->aux = NULL; - cfg_layout_initialize (0); - - for (loop = loops; loop; loop = loop->next) - { - intptr_t index; - basic_block bb; - edge e; - edge_iterator ei; - - if (loop->bad) - continue; - - /* Recreate an index for basic blocks that represents their order. */ - for (bb = ENTRY_BLOCK_PTR->next_bb, index = 0; - bb != EXIT_BLOCK_PTR; - bb = bb->next_bb, index++) - bb->aux = (PTR) index; - - if (BB_AUX_INDEX (loop->head) < BB_AUX_INDEX (loop->tail)) - continue; - - FOR_EACH_EDGE (e, ei, loop->head->succs) - { - if (bitmap_bit_p (loop->block_bitmap, e->dest->index) - && BB_AUX_INDEX (e->dest) < BB_AUX_INDEX (loop->tail)) - { - basic_block start_bb = e->dest; - basic_block start_prev_bb = start_bb->prev_bb; + rtx pat, reg; - if (dump_file) - fprintf (dump_file, ";; Moving block %d before block %d\n", - loop->head->index, start_bb->index); - loop->head->prev_bb->next_bb = loop->head->next_bb; - loop->head->next_bb->prev_bb = loop->head->prev_bb; + if (!JUMP_P (insn) || recog_memoized (insn) != CODE_FOR_loop_end) + return NULL_RTX; - loop->head->prev_bb = start_prev_bb; - loop->head->next_bb = start_bb; - start_prev_bb->next_bb = start_bb->prev_bb = loop->head; - break; - } - } - loops = loops->next; - } - - FOR_EACH_BB (bb) - { - if (bb->next_bb != EXIT_BLOCK_PTR) - bb->aux = bb->next_bb; - else - bb->aux = NULL; - } - cfg_layout_finalize (); - df_analyze (); + pat = PATTERN (insn); + reg = SET_DEST (XVECEXP (PATTERN (insn), 0, 1)); + if (!REG_P (reg)) + return NULL_RTX; + return reg; } +static struct hw_doloop_hooks bfin_doloop_hooks = +{ + hwloop_pattern_reg, + hwloop_after_reorder, + hwloop_optimize, + hwloop_fail +}; + /* Run from machine_dependent_reorg, this pass looks for doloop_end insns and tries to rewrite the RTL of these loops so that proper Blackfin hardware loops are generated. */ @@ -4410,62 +3879,7 @@ bfin_reorder_loops (loop_info loops, FIL static void bfin_reorg_loops (FILE *dump_file) { - loop_info loops = NULL; - loop_info loop; - basic_block bb; - bitmap_obstack stack; - - bitmap_obstack_initialize (&stack); - - if (dump_file) - fprintf (dump_file, ";; Find loops, first pass\n\n"); - - loops = bfin_discover_loops (&stack, dump_file); - - if (dump_file) - bfin_dump_loops (loops); - - bfin_reorder_loops (loops, dump_file); - free_loops (loops); - - if (dump_file) - fprintf (dump_file, ";; Find loops, second pass\n\n"); - - loops = bfin_discover_loops (&stack, dump_file); - if (dump_file) - { - fprintf (dump_file, ";; All loops found:\n\n"); - bfin_dump_loops (loops); - } - - /* Now apply the optimizations. */ - for (loop = loops; loop; loop = loop->next) - bfin_optimize_loop (loop); - - if (dump_file) - { - fprintf (dump_file, ";; After hardware loops optimization:\n\n"); - bfin_dump_loops (loops); - } - - free_loops (loops); - - if (dump_file) - print_rtl (dump_file, get_insns ()); - - FOR_EACH_BB (bb) - bb->aux = NULL; - - splitting_loops = 1; - FOR_EACH_BB (bb) - { - rtx insn = BB_END (bb); - if (!JUMP_P (insn)) - continue; - - try_split (PATTERN (insn), insn, 1); - } - splitting_loops = 0; + reorg_loops (true, &bfin_doloop_hooks); } /* Possibly generate a SEQUENCE out of three insns found in SLOT. Index: gcc/config/bfin/bfin.md =================================================================== --- gcc/config/bfin/bfin.md (revision 175202) +++ gcc/config/bfin/bfin.md (working copy) @@ -1989,7 +1989,7 @@ (define_split (const_int -1))) (unspec [(const_int 0)] UNSPEC_LSETUP_END) (clobber (match_scratch:SI 2 "=&r"))] - "splitting_loops" + "memory_operand (operands[0], SImode) || splitting_loops" [(set (match_dup 2) (match_dup 0)) (set (match_dup 2) (plus:SI (match_dup 2) (const_int -1))) (set (match_dup 0) (match_dup 2))