Message ID | g439f1hxek.fsf@richards-thinkpad.stglab.manchester.uk.ibm.com |
---|---|
State | New |
Headers | show |
On Mon, Oct 10, 2011 at 1:57 PM, Richard Sandiford <richard.sandiford@linaro.org> wrote: > Ayal Zaks <ayal.zaks@gmail.com> writes: >>> I agree it's natural to schedule moves for intra-iteration dependencies >>> in the normal get_sched_window way. But suppose we have a dependency: >>> >>> A --(T,N,1)--> B >>> >>> that requires two moves M1 and M2. If we think in terms of cycles >>> (in the SCHED_TIME sense), then this effectively becomes: >>> >>> A --(T,N1,1)--> M1 -->(T,N2,0)--> M2 -->(T,N3,0)--> B >>> >>> because it is now M1 that is fed by both the loop and the incoming edge. >>> But if there is a second dependency: >>> >>> A --(T,M,0)--> C >>> >>> that also requires two moves, we end up with: >>> >>> A --(T,N1,1)--> M1 -->(T,N2,0)--> M2 -->(T,N3,0)--> B >>> -->(T,M3,-1)--> B >>> >>> and dependence distances of -1 feel a bit weird. :-) Of course, >>> what we really have are two parallel dependencies: >>> >>> A --(T,N1,1)--> M1 -->(T,N2,0)--> M2 -->(T,N3,0)--> B >>> >>> A --(T,M1,0)--> M1' -->(T,M2,0)--> M2' -->(T,N3,0)--> B >>> >>> where M1' and M2' occupy the same position as M1 and M2 in the schedule, >>> but are one stage further along. But we only schedule them once, >>> so if we take the cycle/SCHED_TIME route, we have to introduce >>> dependencies of distance -1. >>> >> >> Interesting; had to digest this distance 1 business, a result of >> thinking in cycles instead of rows (or conversely), and mixing >> dependences with scheduling; here's my understanding, based on your >> explanations: >> >> Suppose a Use is truely dependent on a Def, where both have been >> scheduled at some absolute cycles; think of them as timing the first >> iteration of the loop. >> Assume first that Use appears originally after Def in the original >> instruction sequence of the loop (dependence distance 0). In this >> case, Use requires register moves if its distance D from Def according >> to the schedule is more than ii cycles long -- by the time Use is >> executed, the value it needs is no longer available in the def'd >> register due to intervening occurrences of Def. So in this case, the >> first reg-move (among D/ii) should appear after Def, recording its >> value before the next occurrence of Def overwrites it, and feeding >> subsequent moves as needed before each is overwritten. Thus the >> scheduling window of this first reg-move is within (Def, Def+ii). >> >> Now, suppose Use appears before Def, i.e., Use is upwards-exposed; if >> it remains scheduled before the Def, no reg-move is needed. If, otoh, >> Def is scheduled (D>0 cycles) before Use, breaking the anti-dependence >> between them, (D/ii + 1) reg-moves are needed in order to feed Use >> with the live-in value before Def. So in this case, the first reg-move >> should appear before Def (again feeding subsequent moves as needed >> before each is overwritten). Thus the scheduling window of this first >> reg-move is within (Def-ii, Def). >> >> In any case, each use requires a specific number of reg-moves, which >> begin either before or after the def; it is always safe (though >> potentially redundant) to start reg-moving before the def -- uses that >> do not need the live-in value will ignore it by feeding from a later >> reg-move. > > Right. The distance on the Def->Use dependency is effectively transferred > to the dependency between the Def and first move. > > And we can potentially have both kinds of Use at the same time. > We then end up scheduling two moves, one in each window, but require > them to occupy the same row and column. It feels more convenient to > schedule the earlier of the two moves. > >> Q: if we generate reg-moves starting from before the def, would >> redundant reg-moves be eliminated if no use needs access to live-in >> value? If so, would that simplify their generation? (If not, it may be >> interesting to understand how to improve DCE to catch it.) > > To be clear, the new version of the patch (unlike the old one) > doesn't emit reg-moves before the def if all uses are distance 0. > We only do it where some or all uses are distance 1. The first move > before the def is always needed in that case. > Understood. The question was whether it would simplify things if we always emit reg-moves before the def. And rely on DCE to hopefully eliminate redundancies. > So redudant moves are confined to the case where (a) we have more > than one move, (b) we have both distance 0 and distance 1 uses, and > (c) one of the two distance sets requires more moves than the other. > If the distance 0 dependencies require more moves than the distance > 1 dependencies, we will have a redudant move in the prologue. > If the distance 1 dependencies require more moves than the distance > 0 dependencies, we will have a redudant move in the epilogue. > But I believe that this is already handled by dce. > > (For avoidance of doubt, the new code is more accurate than > current trunk in this respect.) > >> The issue of assigning stages to reg-moves is mostly relevant for >> prolog and epilog generation, which requires and receives special >> attention -- handled very nicely by ps_num_consecutive_stages! Note >> that currently a simple boolean indicator for (the exceptional case >> of) double stages would suffice, instead of generalizing to arbitrary >> nums of consecutive stages (see any potential use for them?). > > Not in the immediate term. But I think having a boolean indicator > would be inconsistent. If the distance field is an int (even though > we only expect distance-0 and distance-1 register dependencies) > then I think the number of stages should be too. > > I did wonder originally about using a boolean, but IMO, it makes > the code less readable rather than more. Instead of a simple > range check like: > > if (first_stage_for_insn <= last_stage_in_range > && last_stage_for_insn >= first_stage_in_range) > > we end up with the equivalent of: > > if (first_stage_for_insn <= last_stage_in_range > && (double_stage_move_p (...) > ? first_stage_for_insn + 1 >= first_stage_in_range > : first_stage_for_insn >= first_stage_in_range)) > > with no corresponding simplification elsewhere. > Sure. But setting the range can be done by consulting an simple indicator, rather than generalizing to arbitrary stage numbers; e.g.: +ps_num_consecutive_stages (partial_schedule_ptr ps, int id) +{ + if (id >= ps->g->num_nodes && ps_reg_move (ps, id)->double_stages) + return 2; + else + return 1; +} or - last_u = first_u + ps_num_consecutive_stages (ps, u) - 1; + if (...double_stages) last_u = first_u + 1; + else last_u = first_u; >>> However... >>> >>>> Also note that if moves are assigned absolute cycles, it becomes clear >>>> to which stage each belongs (just like any other instruction), >>>> regulating their generation in prolog and epilog. >>> >>> ...I have to concede that it makes the stage calculation easier, >>> and that that tips the balance. (Although again, a move can belong >>> to two stages simultanouesly.) >>> >>> Anyway, here's an updated patch that uses cycle times. I've also >>> dropped the code that tried to allow windows to start and end on >>> the same row (and therefore schedule "either side" of that row). >>> I thought it might help with some VLIWy DFAs, but it was always >>> going to be of minor benefit, and we don't try that elsewhere, >>> so let's avoid the complication. >>> >> >> ok. Such windows seem rare, as they imply zero move (or def->move) >> latencies, iinm. Leaving a note behind would be good. > > OK. Since this same restriction applies to the amin scheduling > window code, I suppose the natural place would be in the algorithm > description itself: > > /* The SMS scheduling algorithm itself > ----------------------------------- > Input: 'O' an ordered list of insns of a loop. > Output: A scheduling of the loop - kernel, prolog, and epilogue. > ... > 42. compute epilogue & prologue > 43. finish - succeeded to schedule > */ > > E.g. adding something like this at the end: > > ??? The algorithm restricts the scheduling window to II cycles. > In rare cases, it may be better to allow windows of II+1 cycles. > The window would then start and end on the same row, but with > different "must precede" and "must follow" requirements. > > Let me know what you think and I'll add it as a follow-on patch. > great, thanks. >>> Bootstrapped & regression-tested on powerpc64-linux-gnu with >>> -fmodulo-sched and -fmodulo-sched-allow-regmoves enabled by default. >>> Also retested on the ARM benchmarks. OK to install? >>> >> >> Yes, OK. >> Some points to consider raised above, and some comments added below. > > Thanks, applied with the changes below. > >>> @@ -466,21 +506,167 @@ print_node_sched_params (FILE *file, int >>> for (i = 0; i < num_nodes; i++) >>> { >>> node_sched_params_ptr nsp = SCHED_PARAMS (i); >>> - int j; >>> >>> fprintf (file, "Node = %d; INSN = %d\n", i, >>> INSN_UID (ps_rtl_insn (ps, i))); >>> fprintf (file, " asap = %d:\n", NODE_ASAP (&ps->g->nodes[i])); >>> fprintf (file, " time = %d:\n", nsp->time); >>> - fprintf (file, " nreg_moves = %d:\n", nsp->nreg_moves); >>> - for (j = 0; j < nsp->nreg_moves; j++) >>> - { >>> - ps_reg_move_info *move = ps_reg_move (ps, nsp->first_reg_move + j); >> >> Is another way provided to print a summary of the regmoves? Or does >> the info dumped while scheduling each one suffice for practical >> tracing and debugging. > > Yeah, the new info should be more complete. It gives the pattern > (which is what we dump here), but also lists the producers and > consumers of each move (which we don't print here). > >>> + fprintf (file, " stage = %d:\n", nsp->stage); >>> + } >>> +} >>> + >>> +/* Set SCHED_COLUMN for each instruction in row ROW of PS. */ >>> +static void >>> +set_columns_for_row (partial_schedule_ptr ps, int row) >>> +{ >>> + ps_insn_ptr cur_insn; >>> + int column; >>> + >>> + column = 0; >>> + for (cur_insn = ps->rows[row]; cur_insn; cur_insn = cur_insn->next_in_row) >>> + SCHED_COLUMN (cur_insn->id) = column++; >>> +} >>> + >>> +/* Set SCHED_COLUMN for each instruction in PS. */ >>> +static void >>> +set_columns_for_ps (partial_schedule_ptr ps) >>> +{ >>> + int row; >>> + >>> + for (row = 0; row < ps->ii; row++) >>> + set_columns_for_row (ps, row); >>> +} >>> + >>> +/* Try to schedule the move with ps_insn identifier I_REG_MOVE in PS. >>> + Its predecessors and successors have already been scheduled. >> >> well, except for some (preceeding or succeeding?) moves that have not >> yet been scheduled, right? > > Ah, yeah, fixed to: > > /* Try to schedule the move with ps_insn identifier I_REG_MOVE in PS. > Its single predecessor has already been scheduled, as has its > ddg node successors. (The move may have also another move as its > successor, in which case that successor will be scheduled later.) > perfect >>> + >>> + The move is part of a chain that satisfies register dependencies >>> + between a producing ddg node and various consuming ddg nodes. >>> + If some of these dependencies cross a loop iteration (that is, >>> + have a distance of 1) then DISTANCE1_USES is nonnull and contains >>> + the set of uses with distance-1 dependencies. DISTANCE1_USES >>> + is null otherwise. >>> + >> >> Maybe clarify that they are upwards-exposed or live-in uses. > > OK, changed to: > > The move is part of a chain that satisfies register dependencies > between a producing ddg node and various consuming ddg nodes. > If some of these dependencies have a distance of 1 (meaning that > the use is upward-exposoed) then DISTANCE1_USES is nonnull and exposed (typo) > contains the set of uses with distance-1 dependencies. > DISTANCE1_USES is null otherwise. > >>> + MUST_FOLLOW is a scratch bitmap that is big enough to hold >>> + all current ps_insn ids. >>> + >>> + Return true on success. */ >>> +static bool >>> +schedule_reg_move (partial_schedule_ptr ps, int i_reg_move, >>> + sbitmap distance1_uses, sbitmap must_follow) >>> +{ >>> + unsigned int u; >>> + int this_time, this_distance, this_start, this_end, this_latency; >>> + int start, end, c, ii; >>> + sbitmap_iterator sbi; >>> + ps_reg_move_info *move; >>> + rtx this_insn; >>> + ps_insn_ptr psi; >>> + >>> + move = ps_reg_move (ps, i_reg_move); >>> + ii = ps->ii; >>> + if (dump_file) >>> + { >>> + fprintf (dump_file, "Scheduling register move INSN %d; ii = %d" >>> + ", min cycle = %d\n\n", INSN_UID (move->insn), ii, >>> + PS_MIN_CYCLE (ps)); >>> + print_rtl_single (dump_file, move->insn); >>> + fprintf (dump_file, "\n%11s %11s %5s\n", "start", "end", "time"); >>> + fprintf (dump_file, "=========== =========== =====\n"); >>> + } >>> + >>> + start = INT_MIN; >>> + end = INT_MAX; >>> + >>> + /* For dependencies of distance 1 between a producer ddg node A >>> + and consumer ddg node B, we have a chain of dependencies: >>> + >>> + A --(T,L1,1)--> M1 --(T,L2,0)--> M2 ... --(T,Ln,0)--> B >>> + >>> + where Mi is the ith move. For dependencies of distance 0 between >>> + a producer ddg node A and consumer ddg node C, we have a chain of >>> + dependencies: >>> + >>> + A --(T,L1',0)--> M1' --(T,L2',0)--> M2' ... --(T,Ln',0)--> C >>> + >>> + where Mi' occupies the same position as Mi but occurs a stage later. >>> + We can only schedule each move once, so if we have both types of >>> + chain, we model the second as: >>> + >>> + A --(T,L1',1)--> M1 --(T,L2',0)--> M2 ... --(T,Ln',-1)--> C >>> + >>> + First handle the dependencies between the previously-scheduled >>> + predecessor and the move. */ >>> + this_insn = ps_rtl_insn (ps, move->def); >>> + this_latency = insn_latency (this_insn, move->insn); >>> + this_distance = distance1_uses && move->def < ps->g->num_nodes ? 1 : 0; >>> + this_time = SCHED_TIME (move->def) - this_distance * ii; >>> + this_start = this_time + this_latency; >>> + this_end = this_time + ii; >>> + if (dump_file) >>> + fprintf (dump_file, "%11d %11d %5d %d --(T,%d,%d)--> %d\n", >>> + this_start, this_end, SCHED_TIME (move->def), >>> + INSN_UID (this_insn), this_latency, this_distance, >>> + INSN_UID (move->insn)); >>> + >>> + if (start < this_start) >> >> redundant; start=INT_MIN is surely < this_start. >> >>> + start = this_start; >>> + if (end > this_end) >> >> redundant; end=INT_MAX is surely > this_end. > > I did this deliberately because the order in which we apply the limits > isn't important. Making them assymetrical by applying this kind of > micro-optimisation is IMO a bad idea. The compiler will do it for us. > > E.g. I originally had an extra limit to make sure that we didn't add > new stages. If we applied the optimisation by hand, and then someone > added a new limit like that later, they'd have to be careful about > where they put it. > ok, np. >>> + end = this_end; >>> + >>> + /* Handle the dependencies between the move and previously-scheduled >>> + successors. */ >> >> (maybe assert they have indeed all been scheduled) > > Hmm, I don't think that'd be useful. We've already read the schedule > time (and row and column) by this point, and we don't assert the same > thing elsewhere (in the code that assigns moves to uses, or in > get_sched_window itself). > (ok; only a thought, hence the parentheses) >>> + EXECUTE_IF_SET_IN_SBITMAP (move->uses, 0, u, sbi) >>> + { >>> + this_insn = ps_rtl_insn (ps, u); >>> + this_latency = insn_latency (move->insn, this_insn); >>> + if (distance1_uses && !TEST_BIT (distance1_uses, u)) >> >> shouldn't this be if (distance1_uses && TEST_BIT (distance1_uses, u))? >> >>> + this_distance = -1; >>> + else >>> + this_distance = 0; > > No, this condition corresponds to: > > /* For dependencies of distance 1 between a producer ddg node A > and consumer ddg node B, we have a chain of dependencies: > > A --(T,L1,1)--> M1 --(T,L2,0)--> M2 ... --(T,Ln,0)--> B > > where Mi is the ith move. For dependencies of distance 0 between > a producer ddg node A and consumer ddg node C, we have a chain of > dependencies: > > A --(T,L1',0)--> M1' --(T,L2',0)--> M2' ... --(T,Ln',0)--> C > > where Mi' occupies the same position as Mi but occurs a stage later. > We can only schedule each move once, so if we have both types of > chain, we model the second as: > > A --(T,L1',1)--> M1 --(T,L2',0)--> M2 ... --(T,Ln',-1)--> C > > First handle the dependencies between the previously-scheduled > predecessor and the move. */ > > i.e. we want a distance of -1 when (a) the original definition has uses > with both distance 0 and distance 1 and (b) the particular use we're > looking at has distance 0. > ah, right >>> + this_time = SCHED_TIME (u) + this_distance * ii; >>> + this_start = this_time - ps->ii; >> >> use ii instead of ps->ii > > Oops, fixed. > >>> + this_end = this_time - this_latency; >>> + if (dump_file) >>> + fprintf (dump_file, "%11d %11d %5d %d --(T,%d,%d)--> %d\n", >>> + this_start, this_end, SCHED_TIME (u), INSN_UID (move->insn), >>> + this_latency, this_distance, INSN_UID (this_insn)); >>> + >>> + if (start < this_start) >>> + start = this_start; >>> + if (end > this_end) >>> + end = this_end; >>> + } >>> + >>> + if (dump_file) >>> + { >>> + fprintf (dump_file, "----------- ----------- -----\n"); >>> + fprintf (dump_file, "%11d %11d %5s %s\n", start, end, "", "(max, min)"); >>> + } >>> >>> - fprintf (file, " reg_move = "); >>> - print_rtl_single (file, move->insn); >>> + sbitmap_zero (must_follow); >>> + SET_BIT (must_follow, move->def); >>> + >>> + start = MAX (start, end - (ii - 1)); >> >> ok, so this excludes considering both ends of a possible ii-cycled >> window. Such a window would imply zero move (or def->move) latencies >> iinm, and more care in setting must_follow/must_precede. > > Right. > >>> + for (c = end; c >= start; c--) >>> + { >>> + psi = ps_add_node_check_conflicts (ps, i_reg_move, c, >>> + move->uses, must_follow); >> >> ok - def must_follow the move, if both are scheduled on same row (as >> we potentially start from this (end) row moving backwards), but uses >> should precede the move, assuming that if move and use are on same row >> the sched distance between them is ii (i.e., non-zero move->use >> latency) > > Right. > >>> @@ -1513,6 +1651,19 @@ sms_schedule (void) >>> continue; >>> } >>> >>> + /* Moves that handle incoming values might have been added >>> + to a new first stage. It's too late to try any kind of >>> + rotation, so just bump the stage count. */ >> >> hmm, one could still apply the rotation optimization at this time if >> desired, no? > > Hmm, maybe :-) I changed the comment to: > > /* Moves that handle incoming values might have been added > to a new first stage. Bump the stage count if so. > > ??? Perhaps we could consider rotating the schedule here > instead? */ > Great. Thanks, Ayal. > Richard > > > gcc/ > * modulo-sched.c (ps_reg_move_info): Add num_consecutive_stages. > (SCHED_FIRST_REG_MOVE, SCHED_NREG_MOVES): Delete. > (node_sched_params): Remove first_reg_move and nreg_moves. > (ps_num_consecutive_stages, extend_node_sched_params): New functions. > (update_node_sched_params): Move up file. > (print_node_sched_params): Print the stage. Don't dump info related > to first_reg_move and nreg_moves. > (set_columns_for_row): New function. > (set_columns_for_ps): Move up file and use set_columns_for_row. > (schedule_reg_move): New function. > (schedule_reg_moves): Call extend_node_sched_params and > schedule_reg_move. Extend size of uses bitmap. Initialize > num_consecutive_stages. Return false if a move could not be > scheduled. > (apply_reg_moves): Don't emit moves here. > (permute_partial_schedule): Handle register moves. > (duplicate_insns_of_cycles): Remove for_prolog. Emit moves according > to the same stage-count test as ddg nodes. > (generate_prolog_epilog): Update calls accordingly. > (sms_schedule): Allow move-scheduling to add a new first stage. > > Index: gcc/modulo-sched.c > =================================================================== > --- gcc/modulo-sched.c 2011-10-10 11:03:23.278273801 +0100 > +++ gcc/modulo-sched.c 2011-10-10 12:24:44.539932692 +0100 > @@ -153,6 +153,9 @@ struct ps_reg_move_info > rtx old_reg; > rtx new_reg; > > + /* The number of consecutive stages that the move occupies. */ > + int num_consecutive_stages; > + > /* An instruction that sets NEW_REG to the correct value. The first > move associated with DEF will have an rhs of OLD_REG; later moves > use the result of the previous move. */ > @@ -218,8 +221,6 @@ static partial_schedule_ptr sms_schedule > static void permute_partial_schedule (partial_schedule_ptr, rtx); > static void generate_prolog_epilog (partial_schedule_ptr, struct loop *, > rtx, rtx); > -static void duplicate_insns_of_cycles (partial_schedule_ptr, > - int, int, int, rtx); > static int calculate_stage_count (partial_schedule_ptr, int); > static void calculate_must_precede_follow (ddg_node_ptr, int, int, > int, int, sbitmap, sbitmap, sbitmap); > @@ -233,8 +234,6 @@ #define NODE_ASAP(node) ((node)->aux.cou > > #define SCHED_PARAMS(x) VEC_index (node_sched_params, node_sched_param_vec, x) > #define SCHED_TIME(x) (SCHED_PARAMS (x)->time) > -#define SCHED_FIRST_REG_MOVE(x) (SCHED_PARAMS (x)->first_reg_move) > -#define SCHED_NREG_MOVES(x) (SCHED_PARAMS (x)->nreg_moves) > #define SCHED_ROW(x) (SCHED_PARAMS (x)->row) > #define SCHED_STAGE(x) (SCHED_PARAMS (x)->stage) > #define SCHED_COLUMN(x) (SCHED_PARAMS (x)->column) > @@ -244,15 +243,6 @@ typedef struct node_sched_params > { > int time; /* The absolute scheduling cycle. */ > > - /* The following field (first_reg_move) is the ps_insn id of the first > - register-move instruction added to handle the modulo-variable-expansion > - of the register defined by this node. This register-move copies the > - original register defined by the node. */ > - int first_reg_move; > - > - /* The number of register-move instructions added. */ > - int nreg_moves; > - > int row; /* Holds time % ii. */ > int stage; /* Holds time / ii. */ > > @@ -344,6 +334,17 @@ ps_first_note (partial_schedule_ptr ps, > return ps->g->nodes[id].first_note; > } > > +/* Return the number of consecutive stages that are occupied by > + partial schedule instruction ID in PS. */ > +static int > +ps_num_consecutive_stages (partial_schedule_ptr ps, int id) > +{ > + if (id < ps->g->num_nodes) > + return 1; > + else > + return ps_reg_move (ps, id)->num_consecutive_stages; > +} > + > /* Given HEAD and TAIL which are the first and last insns in a loop; > return the register which controls the loop. Return zero if it has > more than one occurrence in the loop besides the control part or the > @@ -456,6 +457,45 @@ set_node_sched_params (ddg_ptr g) > node_sched_param_vec, g->num_nodes); > } > > +/* Make sure that node_sched_param_vec has an entry for every move in PS. */ > +static void > +extend_node_sched_params (partial_schedule_ptr ps) > +{ > + VEC_safe_grow_cleared (node_sched_params, heap, node_sched_param_vec, > + ps->g->num_nodes + VEC_length (ps_reg_move_info, > + ps->reg_moves)); > +} > + > +/* Update the sched_params (time, row and stage) for node U using the II, > + the CYCLE of U and MIN_CYCLE. > + We're not simply taking the following > + SCHED_STAGE (u) = CALC_STAGE_COUNT (SCHED_TIME (u), min_cycle, ii); > + because the stages may not be aligned on cycle 0. */ > +static void > +update_node_sched_params (int u, int ii, int cycle, int min_cycle) > +{ > + int sc_until_cycle_zero; > + int stage; > + > + SCHED_TIME (u) = cycle; > + SCHED_ROW (u) = SMODULO (cycle, ii); > + > + /* The calculation of stage count is done adding the number > + of stages before cycle zero and after cycle zero. */ > + sc_until_cycle_zero = CALC_STAGE_COUNT (-1, min_cycle, ii); > + > + if (SCHED_TIME (u) < 0) > + { > + stage = CALC_STAGE_COUNT (-1, SCHED_TIME (u), ii); > + SCHED_STAGE (u) = sc_until_cycle_zero - stage; > + } > + else > + { > + stage = CALC_STAGE_COUNT (SCHED_TIME (u), 0, ii); > + SCHED_STAGE (u) = sc_until_cycle_zero + stage - 1; > + } > +} > + > static void > print_node_sched_params (FILE *file, int num_nodes, partial_schedule_ptr ps) > { > @@ -466,21 +506,169 @@ print_node_sched_params (FILE *file, int > for (i = 0; i < num_nodes; i++) > { > node_sched_params_ptr nsp = SCHED_PARAMS (i); > - int j; > > fprintf (file, "Node = %d; INSN = %d\n", i, > INSN_UID (ps_rtl_insn (ps, i))); > fprintf (file, " asap = %d:\n", NODE_ASAP (&ps->g->nodes[i])); > fprintf (file, " time = %d:\n", nsp->time); > - fprintf (file, " nreg_moves = %d:\n", nsp->nreg_moves); > - for (j = 0; j < nsp->nreg_moves; j++) > - { > - ps_reg_move_info *move = ps_reg_move (ps, nsp->first_reg_move + j); > + fprintf (file, " stage = %d:\n", nsp->stage); > + } > +} > + > +/* Set SCHED_COLUMN for each instruction in row ROW of PS. */ > +static void > +set_columns_for_row (partial_schedule_ptr ps, int row) > +{ > + ps_insn_ptr cur_insn; > + int column; > + > + column = 0; > + for (cur_insn = ps->rows[row]; cur_insn; cur_insn = cur_insn->next_in_row) > + SCHED_COLUMN (cur_insn->id) = column++; > +} > + > +/* Set SCHED_COLUMN for each instruction in PS. */ > +static void > +set_columns_for_ps (partial_schedule_ptr ps) > +{ > + int row; > + > + for (row = 0; row < ps->ii; row++) > + set_columns_for_row (ps, row); > +} > + > +/* Try to schedule the move with ps_insn identifier I_REG_MOVE in PS. > + Its single predecessor has already been scheduled, as has its > + ddg node successors. (The move may have also another move as its > + successor, in which case that successor will be scheduled later.) > > - fprintf (file, " reg_move = "); > - print_rtl_single (file, move->insn); > + The move is part of a chain that satisfies register dependencies > + between a producing ddg node and various consuming ddg nodes. > + If some of these dependencies have a distance of 1 (meaning that > + the use is upward-exposoed) then DISTANCE1_USES is nonnull and > + contains the set of uses with distance-1 dependencies. > + DISTANCE1_USES is null otherwise. > + > + MUST_FOLLOW is a scratch bitmap that is big enough to hold > + all current ps_insn ids. > + > + Return true on success. */ > +static bool > +schedule_reg_move (partial_schedule_ptr ps, int i_reg_move, > + sbitmap distance1_uses, sbitmap must_follow) > +{ > + unsigned int u; > + int this_time, this_distance, this_start, this_end, this_latency; > + int start, end, c, ii; > + sbitmap_iterator sbi; > + ps_reg_move_info *move; > + rtx this_insn; > + ps_insn_ptr psi; > + > + move = ps_reg_move (ps, i_reg_move); > + ii = ps->ii; > + if (dump_file) > + { > + fprintf (dump_file, "Scheduling register move INSN %d; ii = %d" > + ", min cycle = %d\n\n", INSN_UID (move->insn), ii, > + PS_MIN_CYCLE (ps)); > + print_rtl_single (dump_file, move->insn); > + fprintf (dump_file, "\n%11s %11s %5s\n", "start", "end", "time"); > + fprintf (dump_file, "=========== =========== =====\n"); > + } > + > + start = INT_MIN; > + end = INT_MAX; > + > + /* For dependencies of distance 1 between a producer ddg node A > + and consumer ddg node B, we have a chain of dependencies: > + > + A --(T,L1,1)--> M1 --(T,L2,0)--> M2 ... --(T,Ln,0)--> B > + > + where Mi is the ith move. For dependencies of distance 0 between > + a producer ddg node A and consumer ddg node C, we have a chain of > + dependencies: > + > + A --(T,L1',0)--> M1' --(T,L2',0)--> M2' ... --(T,Ln',0)--> C > + > + where Mi' occupies the same position as Mi but occurs a stage later. > + We can only schedule each move once, so if we have both types of > + chain, we model the second as: > + > + A --(T,L1',1)--> M1 --(T,L2',0)--> M2 ... --(T,Ln',-1)--> C > + > + First handle the dependencies between the previously-scheduled > + predecessor and the move. */ > + this_insn = ps_rtl_insn (ps, move->def); > + this_latency = insn_latency (this_insn, move->insn); > + this_distance = distance1_uses && move->def < ps->g->num_nodes ? 1 : 0; > + this_time = SCHED_TIME (move->def) - this_distance * ii; > + this_start = this_time + this_latency; > + this_end = this_time + ii; > + if (dump_file) > + fprintf (dump_file, "%11d %11d %5d %d --(T,%d,%d)--> %d\n", > + this_start, this_end, SCHED_TIME (move->def), > + INSN_UID (this_insn), this_latency, this_distance, > + INSN_UID (move->insn)); > + > + if (start < this_start) > + start = this_start; > + if (end > this_end) > + end = this_end; > + > + /* Handle the dependencies between the move and previously-scheduled > + successors. */ > + EXECUTE_IF_SET_IN_SBITMAP (move->uses, 0, u, sbi) > + { > + this_insn = ps_rtl_insn (ps, u); > + this_latency = insn_latency (move->insn, this_insn); > + if (distance1_uses && !TEST_BIT (distance1_uses, u)) > + this_distance = -1; > + else > + this_distance = 0; > + this_time = SCHED_TIME (u) + this_distance * ii; > + this_start = this_time - ii; > + this_end = this_time - this_latency; > + if (dump_file) > + fprintf (dump_file, "%11d %11d %5d %d --(T,%d,%d)--> %d\n", > + this_start, this_end, SCHED_TIME (u), INSN_UID (move->insn), > + this_latency, this_distance, INSN_UID (this_insn)); > + > + if (start < this_start) > + start = this_start; > + if (end > this_end) > + end = this_end; > + } > + > + if (dump_file) > + { > + fprintf (dump_file, "----------- ----------- -----\n"); > + fprintf (dump_file, "%11d %11d %5s %s\n", start, end, "", "(max, min)"); > + } > + > + sbitmap_zero (must_follow); > + SET_BIT (must_follow, move->def); > + > + start = MAX (start, end - (ii - 1)); > + for (c = end; c >= start; c--) > + { > + psi = ps_add_node_check_conflicts (ps, i_reg_move, c, > + move->uses, must_follow); > + if (psi) > + { > + update_node_sched_params (i_reg_move, ii, c, PS_MIN_CYCLE (ps)); > + if (dump_file) > + fprintf (dump_file, "\nScheduled register move INSN %d at" > + " time %d, row %d\n\n", INSN_UID (move->insn), c, > + SCHED_ROW (i_reg_move)); > + return true; > } > } > + > + if (dump_file) > + fprintf (dump_file, "\nNo available slot\n\n"); > + > + return false; > } > > /* > @@ -508,6 +696,9 @@ schedule_reg_moves (partial_schedule_ptr > int nreg_moves = 0, i_reg_move; > rtx prev_reg, old_reg; > int first_move; > + int distances[2]; > + sbitmap must_follow; > + sbitmap distance1_uses; > rtx set = single_set (u->insn); > > /* Skip instructions that do not set a register. */ > @@ -516,6 +707,7 @@ schedule_reg_moves (partial_schedule_ptr > > /* Compute the number of reg_moves needed for u, by looking at life > ranges started at u (excluding self-loops). */ > + distances[0] = distances[1] = false; > for (e = u->out; e; e = e->next_out) > if (e->type == TRUE_DEP && e->dest != e->src) > { > @@ -546,6 +738,11 @@ schedule_reg_moves (partial_schedule_ptr > gcc_assert (!autoinc_var_is_used_p (u->insn, e->dest->insn)); > } > > + if (nreg_moves4e) > + { > + gcc_assert (e->distance < 2); > + distances[e->distance] = true; > + } > nreg_moves = MAX (nreg_moves, nreg_moves4e); > } > > @@ -556,11 +753,10 @@ schedule_reg_moves (partial_schedule_ptr > first_move = VEC_length (ps_reg_move_info, ps->reg_moves); > VEC_safe_grow_cleared (ps_reg_move_info, heap, ps->reg_moves, > first_move + nreg_moves); > + extend_node_sched_params (ps); > > /* Record the moves associated with this node. */ > first_move += ps->g->num_nodes; > - SCHED_FIRST_REG_MOVE (i) = first_move; > - SCHED_NREG_MOVES (i) = nreg_moves; > > /* Generate each move. */ > old_reg = prev_reg = SET_DEST (single_set (u->insn)); > @@ -569,15 +765,18 @@ schedule_reg_moves (partial_schedule_ptr > ps_reg_move_info *move = ps_reg_move (ps, first_move + i_reg_move); > > move->def = i_reg_move > 0 ? first_move + i_reg_move - 1 : i; > - move->uses = sbitmap_alloc (g->num_nodes); > + move->uses = sbitmap_alloc (first_move + nreg_moves); > move->old_reg = old_reg; > move->new_reg = gen_reg_rtx (GET_MODE (prev_reg)); > + move->num_consecutive_stages = distances[0] && distances[1] ? 2 : 1; > move->insn = gen_move_insn (move->new_reg, copy_rtx (prev_reg)); > sbitmap_zero (move->uses); > > prev_reg = move->new_reg; > } > > + distance1_uses = distances[1] ? sbitmap_alloc (g->num_nodes) : NULL; > + > /* Every use of the register defined by node may require a different > copy of this register, depending on the time the use is scheduled. > Record which uses require which move results. */ > @@ -601,8 +800,21 @@ schedule_reg_moves (partial_schedule_ptr > > move = ps_reg_move (ps, first_move + dest_copy - 1); > SET_BIT (move->uses, e->dest->cuid); > + if (e->distance == 1) > + SET_BIT (distance1_uses, e->dest->cuid); > } > } > + > + must_follow = sbitmap_alloc (first_move + nreg_moves); > + for (i_reg_move = 0; i_reg_move < nreg_moves; i_reg_move++) > + if (!schedule_reg_move (ps, first_move + i_reg_move, > + distance1_uses, must_follow)) > + break; > + sbitmap_free (must_follow); > + if (distance1_uses) > + sbitmap_free (distance1_uses); > + if (i_reg_move < nreg_moves) > + return false; > } > return true; > } > @@ -626,39 +838,6 @@ apply_reg_moves (partial_schedule_ptr ps > df_insn_rescan (ps->g->nodes[i_use].insn); > } > } > - > - FOR_EACH_VEC_ELT (ps_reg_move_info, ps->reg_moves, i, move) > - add_insn_before (move->insn, ps_first_note (ps, move->def), NULL); > -} > - > -/* Update the sched_params (time, row and stage) for node U using the II, > - the CYCLE of U and MIN_CYCLE. > - We're not simply taking the following > - SCHED_STAGE (u) = CALC_STAGE_COUNT (SCHED_TIME (u), min_cycle, ii); > - because the stages may not be aligned on cycle 0. */ > -static void > -update_node_sched_params (int u, int ii, int cycle, int min_cycle) > -{ > - int sc_until_cycle_zero; > - int stage; > - > - SCHED_TIME (u) = cycle; > - SCHED_ROW (u) = SMODULO (cycle, ii); > - > - /* The calculation of stage count is done adding the number > - of stages before cycle zero and after cycle zero. */ > - sc_until_cycle_zero = CALC_STAGE_COUNT (-1, min_cycle, ii); > - > - if (SCHED_TIME (u) < 0) > - { > - stage = CALC_STAGE_COUNT (-1, SCHED_TIME (u), ii); > - SCHED_STAGE (u) = sc_until_cycle_zero - stage; > - } > - else > - { > - stage = CALC_STAGE_COUNT (SCHED_TIME (u), 0, ii); > - SCHED_STAGE (u) = sc_until_cycle_zero + stage - 1; > - } > } > > /* Bump the SCHED_TIMEs of all nodes by AMOUNT. Set the values of > @@ -699,22 +878,6 @@ reset_sched_times (partial_schedule_ptr > } > } > > -/* Set SCHED_COLUMN of each node according to its position in PS. */ > -static void > -set_columns_for_ps (partial_schedule_ptr ps) > -{ > - int row; > - > - for (row = 0; row < ps->ii; row++) > - { > - ps_insn_ptr cur_insn = ps->rows[row]; > - int column = 0; > - > - for (; cur_insn; cur_insn = cur_insn->next_in_row) > - SCHED_COLUMN (cur_insn->id) = column++; > - } > -} > - > /* Permute the insns according to their order in PS, from row 0 to > row ii-1, and position them right before LAST. This schedules > the insns of the loop kernel. */ > @@ -731,8 +894,13 @@ permute_partial_schedule (partial_schedu > rtx insn = ps_rtl_insn (ps, ps_ij->id); > > if (PREV_INSN (last) != insn) > - reorder_insns_nobb (ps_first_note (ps, ps_ij->id), insn, > - PREV_INSN (last)); > + { > + if (ps_ij->id < ps->g->num_nodes) > + reorder_insns_nobb (ps_first_note (ps, ps_ij->id), insn, > + PREV_INSN (last)); > + else > + add_insn_before (insn, last, NULL); > + } > } > } > > @@ -935,7 +1103,7 @@ optimize_sc (partial_schedule_ptr ps, dd > > static void > duplicate_insns_of_cycles (partial_schedule_ptr ps, int from_stage, > - int to_stage, int for_prolog, rtx count_reg) > + int to_stage, rtx count_reg) > { > int row; > ps_insn_ptr ps_ij; > @@ -944,7 +1112,7 @@ duplicate_insns_of_cycles (partial_sched > for (ps_ij = ps->rows[row]; ps_ij; ps_ij = ps_ij->next_in_row) > { > int u = ps_ij->id; > - int j, i_reg_moves, i_reg_move; > + int first_u, last_u; > rtx u_insn; > > /* Do not duplicate any insn which refers to count_reg as it > @@ -958,42 +1126,15 @@ duplicate_insns_of_cycles (partial_sched > || JUMP_P (u_insn)) > continue; > > - if (for_prolog) > + first_u = SCHED_STAGE (u); > + last_u = first_u + ps_num_consecutive_stages (ps, u) - 1; > + if (from_stage <= last_u && to_stage >= first_u) > { > - /* SCHED_STAGE (u) >= from_stage == 0. Generate increasing > - number of reg_moves starting with the second occurrence of > - u, which is generated if its SCHED_STAGE <= to_stage. */ > - i_reg_moves = to_stage - SCHED_STAGE (u) + 1; > - i_reg_moves = MAX (i_reg_moves, 0); > - i_reg_moves = MIN (i_reg_moves, SCHED_NREG_MOVES (u)); > - > - /* The reg_moves start from the *first* reg_move backwards. */ > - i_reg_move = SCHED_FIRST_REG_MOVE (u) + (i_reg_moves - 1); > - } > - else /* It's for the epilog. */ > - { > - /* SCHED_STAGE (u) <= to_stage. Generate all reg_moves, > - starting to decrease one stage after u no longer occurs; > - that is, generate all reg_moves until > - SCHED_STAGE (u) == from_stage - 1. */ > - i_reg_moves = (SCHED_NREG_MOVES (u) > - - (from_stage - SCHED_STAGE (u) - 1)); > - i_reg_moves = MAX (i_reg_moves, 0); > - i_reg_moves = MIN (i_reg_moves, SCHED_NREG_MOVES (u)); > - > - /* The reg_moves start from the *last* reg_move forwards. */ > - i_reg_move = SCHED_FIRST_REG_MOVE (u) + (SCHED_NREG_MOVES (u) - 1); > + if (u < ps->g->num_nodes) > + duplicate_insn_chain (ps_first_note (ps, u), u_insn); > + else > + emit_insn (copy_rtx (PATTERN (u_insn))); > } > - > - for (j = 0; j < i_reg_moves; j++) > - { > - ps_reg_move_info *move = ps_reg_move (ps, i_reg_move - j); > - > - emit_insn (copy_rtx (PATTERN (move->insn))); > - } > - if (SCHED_STAGE (u) >= from_stage > - && SCHED_STAGE (u) <= to_stage) > - duplicate_insn_chain (ps_first_note (ps, u), u_insn); > } > } > > @@ -1027,7 +1168,7 @@ generate_prolog_epilog (partial_schedule > } > > for (i = 0; i < last_stage; i++) > - duplicate_insns_of_cycles (ps, 0, i, 1, count_reg); > + duplicate_insns_of_cycles (ps, 0, i, count_reg); > > /* Put the prolog on the entry edge. */ > e = loop_preheader_edge (loop); > @@ -1039,7 +1180,7 @@ generate_prolog_epilog (partial_schedule > start_sequence (); > > for (i = 0; i < last_stage; i++) > - duplicate_insns_of_cycles (ps, i + 1, last_stage, 0, count_reg); > + duplicate_insns_of_cycles (ps, i + 1, last_stage, count_reg); > > /* Put the epilogue on the exit edge. */ > gcc_assert (single_exit (loop)); > @@ -1375,8 +1516,7 @@ sms_schedule (void) > { > rtx head, tail; > rtx count_reg, count_init; > - int mii, rec_mii; > - unsigned stage_count; > + int mii, rec_mii, stage_count, min_cycle; > HOST_WIDEST_INT loop_count = 0; > bool opt_sc_p; > > @@ -1486,13 +1626,12 @@ sms_schedule (void) > } > > gcc_assert (stage_count >= 1); > - PS_STAGE_COUNT (ps) = stage_count; > } > > /* The default value of PARAM_SMS_MIN_SC is 2 as stage count of > 1 means that there is no interleaving between iterations thus > we let the scheduling passes do the job in this case. */ > - if (stage_count < (unsigned) PARAM_VALUE (PARAM_SMS_MIN_SC) > + if (stage_count < PARAM_VALUE (PARAM_SMS_MIN_SC) > || (count_init && (loop_count <= stage_count)) > || (flag_branch_probabilities && (trip_count <= stage_count))) > { > @@ -1520,6 +1659,7 @@ sms_schedule (void) > > set_columns_for_ps (ps); > > + min_cycle = PS_MIN_CYCLE (ps) - SMODULO (PS_MIN_CYCLE (ps), ps->ii); > if (!schedule_reg_moves (ps)) > { > mii = ps->ii + 1; > @@ -1527,6 +1667,21 @@ sms_schedule (void) > continue; > } > > + /* Moves that handle incoming values might have been added > + to a new first stage. Bump the stage count if so. > + > + ??? Perhaps we could consider rotating the schedule here > + instead? */ > + if (PS_MIN_CYCLE (ps) < min_cycle) > + { > + reset_sched_times (ps, 0); > + stage_count++; > + } > + > + /* The stage count should now be correct without rotation. */ > + gcc_checking_assert (stage_count == calculate_stage_count (ps, 0)); > + PS_STAGE_COUNT (ps) = stage_count; > + > canon_loop (loop); > > if (dump_file) >
Index: gcc/modulo-sched.c =================================================================== --- gcc/modulo-sched.c 2011-10-10 11:03:23.278273801 +0100 +++ gcc/modulo-sched.c 2011-10-10 12:24:44.539932692 +0100 @@ -153,6 +153,9 @@ struct ps_reg_move_info rtx old_reg; rtx new_reg; + /* The number of consecutive stages that the move occupies. */ + int num_consecutive_stages; + /* An instruction that sets NEW_REG to the correct value. The first move associated with DEF will have an rhs of OLD_REG; later moves use the result of the previous move. */ @@ -218,8 +221,6 @@ static partial_schedule_ptr sms_schedule static void permute_partial_schedule (partial_schedule_ptr, rtx); static void generate_prolog_epilog (partial_schedule_ptr, struct loop *, rtx, rtx); -static void duplicate_insns_of_cycles (partial_schedule_ptr, - int, int, int, rtx); static int calculate_stage_count (partial_schedule_ptr, int); static void calculate_must_precede_follow (ddg_node_ptr, int, int, int, int, sbitmap, sbitmap, sbitmap); @@ -233,8 +234,6 @@ #define NODE_ASAP(node) ((node)->aux.cou #define SCHED_PARAMS(x) VEC_index (node_sched_params, node_sched_param_vec, x) #define SCHED_TIME(x) (SCHED_PARAMS (x)->time) -#define SCHED_FIRST_REG_MOVE(x) (SCHED_PARAMS (x)->first_reg_move) -#define SCHED_NREG_MOVES(x) (SCHED_PARAMS (x)->nreg_moves) #define SCHED_ROW(x) (SCHED_PARAMS (x)->row) #define SCHED_STAGE(x) (SCHED_PARAMS (x)->stage) #define SCHED_COLUMN(x) (SCHED_PARAMS (x)->column) @@ -244,15 +243,6 @@ typedef struct node_sched_params { int time; /* The absolute scheduling cycle. */ - /* The following field (first_reg_move) is the ps_insn id of the first - register-move instruction added to handle the modulo-variable-expansion - of the register defined by this node. This register-move copies the - original register defined by the node. */ - int first_reg_move; - - /* The number of register-move instructions added. */ - int nreg_moves; - int row; /* Holds time % ii. */ int stage; /* Holds time / ii. */ @@ -344,6 +334,17 @@ ps_first_note (partial_schedule_ptr ps, return ps->g->nodes[id].first_note; } +/* Return the number of consecutive stages that are occupied by + partial schedule instruction ID in PS. */ +static int +ps_num_consecutive_stages (partial_schedule_ptr ps, int id) +{ + if (id < ps->g->num_nodes) + return 1; + else + return ps_reg_move (ps, id)->num_consecutive_stages; +} + /* Given HEAD and TAIL which are the first and last insns in a loop; return the register which controls the loop. Return zero if it has more than one occurrence in the loop besides the control part or the @@ -456,6 +457,45 @@ set_node_sched_params (ddg_ptr g) node_sched_param_vec, g->num_nodes); } +/* Make sure that node_sched_param_vec has an entry for every move in PS. */ +static void +extend_node_sched_params (partial_schedule_ptr ps) +{ + VEC_safe_grow_cleared (node_sched_params, heap, node_sched_param_vec, + ps->g->num_nodes + VEC_length (ps_reg_move_info, + ps->reg_moves)); +} + +/* Update the sched_params (time, row and stage) for node U using the II, + the CYCLE of U and MIN_CYCLE. + We're not simply taking the following + SCHED_STAGE (u) = CALC_STAGE_COUNT (SCHED_TIME (u), min_cycle, ii); + because the stages may not be aligned on cycle 0. */ +static void +update_node_sched_params (int u, int ii, int cycle, int min_cycle) +{ + int sc_until_cycle_zero; + int stage; + + SCHED_TIME (u) = cycle; + SCHED_ROW (u) = SMODULO (cycle, ii); + + /* The calculation of stage count is done adding the number + of stages before cycle zero and after cycle zero. */ + sc_until_cycle_zero = CALC_STAGE_COUNT (-1, min_cycle, ii); + + if (SCHED_TIME (u) < 0) + { + stage = CALC_STAGE_COUNT (-1, SCHED_TIME (u), ii); + SCHED_STAGE (u) = sc_until_cycle_zero - stage; + } + else + { + stage = CALC_STAGE_COUNT (SCHED_TIME (u), 0, ii); + SCHED_STAGE (u) = sc_until_cycle_zero + stage - 1; + } +} + static void print_node_sched_params (FILE *file, int num_nodes, partial_schedule_ptr ps) { @@ -466,21 +506,169 @@ print_node_sched_params (FILE *file, int for (i = 0; i < num_nodes; i++) { node_sched_params_ptr nsp = SCHED_PARAMS (i); - int j; fprintf (file, "Node = %d; INSN = %d\n", i, INSN_UID (ps_rtl_insn (ps, i))); fprintf (file, " asap = %d:\n", NODE_ASAP (&ps->g->nodes[i])); fprintf (file, " time = %d:\n", nsp->time); - fprintf (file, " nreg_moves = %d:\n", nsp->nreg_moves); - for (j = 0; j < nsp->nreg_moves; j++) - { - ps_reg_move_info *move = ps_reg_move (ps, nsp->first_reg_move + j); + fprintf (file, " stage = %d:\n", nsp->stage); + } +} + +/* Set SCHED_COLUMN for each instruction in row ROW of PS. */ +static void +set_columns_for_row (partial_schedule_ptr ps, int row) +{ + ps_insn_ptr cur_insn; + int column; + + column = 0; + for (cur_insn = ps->rows[row]; cur_insn; cur_insn = cur_insn->next_in_row) + SCHED_COLUMN (cur_insn->id) = column++; +} + +/* Set SCHED_COLUMN for each instruction in PS. */ +static void +set_columns_for_ps (partial_schedule_ptr ps) +{ + int row; + + for (row = 0; row < ps->ii; row++) + set_columns_for_row (ps, row); +} + +/* Try to schedule the move with ps_insn identifier I_REG_MOVE in PS. + Its single predecessor has already been scheduled, as has its + ddg node successors. (The move may have also another move as its + successor, in which case that successor will be scheduled later.) - fprintf (file, " reg_move = "); - print_rtl_single (file, move->insn); + The move is part of a chain that satisfies register dependencies + between a producing ddg node and various consuming ddg nodes. + If some of these dependencies have a distance of 1 (meaning that + the use is upward-exposoed) then DISTANCE1_USES is nonnull and + contains the set of uses with distance-1 dependencies. + DISTANCE1_USES is null otherwise. + + MUST_FOLLOW is a scratch bitmap that is big enough to hold + all current ps_insn ids. + + Return true on success. */ +static bool +schedule_reg_move (partial_schedule_ptr ps, int i_reg_move, + sbitmap distance1_uses, sbitmap must_follow) +{ + unsigned int u; + int this_time, this_distance, this_start, this_end, this_latency; + int start, end, c, ii; + sbitmap_iterator sbi; + ps_reg_move_info *move; + rtx this_insn; + ps_insn_ptr psi; + + move = ps_reg_move (ps, i_reg_move); + ii = ps->ii; + if (dump_file) + { + fprintf (dump_file, "Scheduling register move INSN %d; ii = %d" + ", min cycle = %d\n\n", INSN_UID (move->insn), ii, + PS_MIN_CYCLE (ps)); + print_rtl_single (dump_file, move->insn); + fprintf (dump_file, "\n%11s %11s %5s\n", "start", "end", "time"); + fprintf (dump_file, "=========== =========== =====\n"); + } + + start = INT_MIN; + end = INT_MAX; + + /* For dependencies of distance 1 between a producer ddg node A + and consumer ddg node B, we have a chain of dependencies: + + A --(T,L1,1)--> M1 --(T,L2,0)--> M2 ... --(T,Ln,0)--> B + + where Mi is the ith move. For dependencies of distance 0 between + a producer ddg node A and consumer ddg node C, we have a chain of + dependencies: + + A --(T,L1',0)--> M1' --(T,L2',0)--> M2' ... --(T,Ln',0)--> C + + where Mi' occupies the same position as Mi but occurs a stage later. + We can only schedule each move once, so if we have both types of + chain, we model the second as: + + A --(T,L1',1)--> M1 --(T,L2',0)--> M2 ... --(T,Ln',-1)--> C + + First handle the dependencies between the previously-scheduled + predecessor and the move. */ + this_insn = ps_rtl_insn (ps, move->def); + this_latency = insn_latency (this_insn, move->insn); + this_distance = distance1_uses && move->def < ps->g->num_nodes ? 1 : 0; + this_time = SCHED_TIME (move->def) - this_distance * ii; + this_start = this_time + this_latency; + this_end = this_time + ii; + if (dump_file) + fprintf (dump_file, "%11d %11d %5d %d --(T,%d,%d)--> %d\n", + this_start, this_end, SCHED_TIME (move->def), + INSN_UID (this_insn), this_latency, this_distance, + INSN_UID (move->insn)); + + if (start < this_start) + start = this_start; + if (end > this_end) + end = this_end; + + /* Handle the dependencies between the move and previously-scheduled + successors. */ + EXECUTE_IF_SET_IN_SBITMAP (move->uses, 0, u, sbi) + { + this_insn = ps_rtl_insn (ps, u); + this_latency = insn_latency (move->insn, this_insn); + if (distance1_uses && !TEST_BIT (distance1_uses, u)) + this_distance = -1; + else + this_distance = 0; + this_time = SCHED_TIME (u) + this_distance * ii; + this_start = this_time - ii; + this_end = this_time - this_latency; + if (dump_file) + fprintf (dump_file, "%11d %11d %5d %d --(T,%d,%d)--> %d\n", + this_start, this_end, SCHED_TIME (u), INSN_UID (move->insn), + this_latency, this_distance, INSN_UID (this_insn)); + + if (start < this_start) + start = this_start; + if (end > this_end) + end = this_end; + } + + if (dump_file) + { + fprintf (dump_file, "----------- ----------- -----\n"); + fprintf (dump_file, "%11d %11d %5s %s\n", start, end, "", "(max, min)"); + } + + sbitmap_zero (must_follow); + SET_BIT (must_follow, move->def); + + start = MAX (start, end - (ii - 1)); + for (c = end; c >= start; c--) + { + psi = ps_add_node_check_conflicts (ps, i_reg_move, c, + move->uses, must_follow); + if (psi) + { + update_node_sched_params (i_reg_move, ii, c, PS_MIN_CYCLE (ps)); + if (dump_file) + fprintf (dump_file, "\nScheduled register move INSN %d at" + " time %d, row %d\n\n", INSN_UID (move->insn), c, + SCHED_ROW (i_reg_move)); + return true; } } + + if (dump_file) + fprintf (dump_file, "\nNo available slot\n\n"); + + return false; } /* @@ -508,6 +696,9 @@ schedule_reg_moves (partial_schedule_ptr int nreg_moves = 0, i_reg_move; rtx prev_reg, old_reg; int first_move; + int distances[2]; + sbitmap must_follow; + sbitmap distance1_uses; rtx set = single_set (u->insn); /* Skip instructions that do not set a register. */ @@ -516,6 +707,7 @@ schedule_reg_moves (partial_schedule_ptr /* Compute the number of reg_moves needed for u, by looking at life ranges started at u (excluding self-loops). */ + distances[0] = distances[1] = false; for (e = u->out; e; e = e->next_out) if (e->type == TRUE_DEP && e->dest != e->src) { @@ -546,6 +738,11 @@ schedule_reg_moves (partial_schedule_ptr gcc_assert (!autoinc_var_is_used_p (u->insn, e->dest->insn)); } + if (nreg_moves4e) + { + gcc_assert (e->distance < 2); + distances[e->distance] = true; + } nreg_moves = MAX (nreg_moves, nreg_moves4e); } @@ -556,11 +753,10 @@ schedule_reg_moves (partial_schedule_ptr first_move = VEC_length (ps_reg_move_info, ps->reg_moves); VEC_safe_grow_cleared (ps_reg_move_info, heap, ps->reg_moves, first_move + nreg_moves); + extend_node_sched_params (ps); /* Record the moves associated with this node. */ first_move += ps->g->num_nodes; - SCHED_FIRST_REG_MOVE (i) = first_move; - SCHED_NREG_MOVES (i) = nreg_moves; /* Generate each move. */ old_reg = prev_reg = SET_DEST (single_set (u->insn)); @@ -569,15 +765,18 @@ schedule_reg_moves (partial_schedule_ptr ps_reg_move_info *move = ps_reg_move (ps, first_move + i_reg_move); move->def = i_reg_move > 0 ? first_move + i_reg_move - 1 : i; - move->uses = sbitmap_alloc (g->num_nodes); + move->uses = sbitmap_alloc (first_move + nreg_moves); move->old_reg = old_reg; move->new_reg = gen_reg_rtx (GET_MODE (prev_reg)); + move->num_consecutive_stages = distances[0] && distances[1] ? 2 : 1; move->insn = gen_move_insn (move->new_reg, copy_rtx (prev_reg)); sbitmap_zero (move->uses); prev_reg = move->new_reg; } + distance1_uses = distances[1] ? sbitmap_alloc (g->num_nodes) : NULL; + /* Every use of the register defined by node may require a different copy of this register, depending on the time the use is scheduled. Record which uses require which move results. */ @@ -601,8 +800,21 @@ schedule_reg_moves (partial_schedule_ptr move = ps_reg_move (ps, first_move + dest_copy - 1); SET_BIT (move->uses, e->dest->cuid); + if (e->distance == 1) + SET_BIT (distance1_uses, e->dest->cuid); } } + + must_follow = sbitmap_alloc (first_move + nreg_moves); + for (i_reg_move = 0; i_reg_move < nreg_moves; i_reg_move++) + if (!schedule_reg_move (ps, first_move + i_reg_move, + distance1_uses, must_follow)) + break; + sbitmap_free (must_follow); + if (distance1_uses) + sbitmap_free (distance1_uses); + if (i_reg_move < nreg_moves) + return false; } return true; } @@ -626,39 +838,6 @@ apply_reg_moves (partial_schedule_ptr ps df_insn_rescan (ps->g->nodes[i_use].insn); } } - - FOR_EACH_VEC_ELT (ps_reg_move_info, ps->reg_moves, i, move) - add_insn_before (move->insn, ps_first_note (ps, move->def), NULL); -} - -/* Update the sched_params (time, row and stage) for node U using the II, - the CYCLE of U and MIN_CYCLE. - We're not simply taking the following - SCHED_STAGE (u) = CALC_STAGE_COUNT (SCHED_TIME (u), min_cycle, ii); - because the stages may not be aligned on cycle 0. */ -static void -update_node_sched_params (int u, int ii, int cycle, int min_cycle) -{ - int sc_until_cycle_zero; - int stage; - - SCHED_TIME (u) = cycle; - SCHED_ROW (u) = SMODULO (cycle, ii); - - /* The calculation of stage count is done adding the number - of stages before cycle zero and after cycle zero. */ - sc_until_cycle_zero = CALC_STAGE_COUNT (-1, min_cycle, ii); - - if (SCHED_TIME (u) < 0) - { - stage = CALC_STAGE_COUNT (-1, SCHED_TIME (u), ii); - SCHED_STAGE (u) = sc_until_cycle_zero - stage; - } - else - { - stage = CALC_STAGE_COUNT (SCHED_TIME (u), 0, ii); - SCHED_STAGE (u) = sc_until_cycle_zero + stage - 1; - } } /* Bump the SCHED_TIMEs of all nodes by AMOUNT. Set the values of @@ -699,22 +878,6 @@ reset_sched_times (partial_schedule_ptr } } -/* Set SCHED_COLUMN of each node according to its position in PS. */ -static void -set_columns_for_ps (partial_schedule_ptr ps) -{ - int row; - - for (row = 0; row < ps->ii; row++) - { - ps_insn_ptr cur_insn = ps->rows[row]; - int column = 0; - - for (; cur_insn; cur_insn = cur_insn->next_in_row) - SCHED_COLUMN (cur_insn->id) = column++; - } -} - /* Permute the insns according to their order in PS, from row 0 to row ii-1, and position them right before LAST. This schedules the insns of the loop kernel. */ @@ -731,8 +894,13 @@ permute_partial_schedule (partial_schedu rtx insn = ps_rtl_insn (ps, ps_ij->id); if (PREV_INSN (last) != insn) - reorder_insns_nobb (ps_first_note (ps, ps_ij->id), insn, - PREV_INSN (last)); + { + if (ps_ij->id < ps->g->num_nodes) + reorder_insns_nobb (ps_first_note (ps, ps_ij->id), insn, + PREV_INSN (last)); + else + add_insn_before (insn, last, NULL); + } } } @@ -935,7 +1103,7 @@ optimize_sc (partial_schedule_ptr ps, dd static void duplicate_insns_of_cycles (partial_schedule_ptr ps, int from_stage, - int to_stage, int for_prolog, rtx count_reg) + int to_stage, rtx count_reg) { int row; ps_insn_ptr ps_ij; @@ -944,7 +1112,7 @@ duplicate_insns_of_cycles (partial_sched for (ps_ij = ps->rows[row]; ps_ij; ps_ij = ps_ij->next_in_row) { int u = ps_ij->id; - int j, i_reg_moves, i_reg_move; + int first_u, last_u; rtx u_insn; /* Do not duplicate any insn which refers to count_reg as it @@ -958,42 +1126,15 @@ duplicate_insns_of_cycles (partial_sched || JUMP_P (u_insn)) continue; - if (for_prolog) + first_u = SCHED_STAGE (u); + last_u = first_u + ps_num_consecutive_stages (ps, u) - 1; + if (from_stage <= last_u && to_stage >= first_u) { - /* SCHED_STAGE (u) >= from_stage == 0. Generate increasing - number of reg_moves starting with the second occurrence of - u, which is generated if its SCHED_STAGE <= to_stage. */ - i_reg_moves = to_stage - SCHED_STAGE (u) + 1; - i_reg_moves = MAX (i_reg_moves, 0); - i_reg_moves = MIN (i_reg_moves, SCHED_NREG_MOVES (u)); - - /* The reg_moves start from the *first* reg_move backwards. */ - i_reg_move = SCHED_FIRST_REG_MOVE (u) + (i_reg_moves - 1); - } - else /* It's for the epilog. */ - { - /* SCHED_STAGE (u) <= to_stage. Generate all reg_moves, - starting to decrease one stage after u no longer occurs; - that is, generate all reg_moves until - SCHED_STAGE (u) == from_stage - 1. */ - i_reg_moves = (SCHED_NREG_MOVES (u) - - (from_stage - SCHED_STAGE (u) - 1)); - i_reg_moves = MAX (i_reg_moves, 0); - i_reg_moves = MIN (i_reg_moves, SCHED_NREG_MOVES (u)); - - /* The reg_moves start from the *last* reg_move forwards. */ - i_reg_move = SCHED_FIRST_REG_MOVE (u) + (SCHED_NREG_MOVES (u) - 1); + if (u < ps->g->num_nodes) + duplicate_insn_chain (ps_first_note (ps, u), u_insn); + else + emit_insn (copy_rtx (PATTERN (u_insn))); } - - for (j = 0; j < i_reg_moves; j++) - { - ps_reg_move_info *move = ps_reg_move (ps, i_reg_move - j); - - emit_insn (copy_rtx (PATTERN (move->insn))); - } - if (SCHED_STAGE (u) >= from_stage - && SCHED_STAGE (u) <= to_stage) - duplicate_insn_chain (ps_first_note (ps, u), u_insn); } } @@ -1027,7 +1168,7 @@ generate_prolog_epilog (partial_schedule } for (i = 0; i < last_stage; i++) - duplicate_insns_of_cycles (ps, 0, i, 1, count_reg); + duplicate_insns_of_cycles (ps, 0, i, count_reg); /* Put the prolog on the entry edge. */ e = loop_preheader_edge (loop); @@ -1039,7 +1180,7 @@ generate_prolog_epilog (partial_schedule start_sequence (); for (i = 0; i < last_stage; i++) - duplicate_insns_of_cycles (ps, i + 1, last_stage, 0, count_reg); + duplicate_insns_of_cycles (ps, i + 1, last_stage, count_reg); /* Put the epilogue on the exit edge. */ gcc_assert (single_exit (loop)); @@ -1375,8 +1516,7 @@ sms_schedule (void) { rtx head, tail; rtx count_reg, count_init; - int mii, rec_mii; - unsigned stage_count; + int mii, rec_mii, stage_count, min_cycle; HOST_WIDEST_INT loop_count = 0; bool opt_sc_p; @@ -1486,13 +1626,12 @@ sms_schedule (void) } gcc_assert (stage_count >= 1); - PS_STAGE_COUNT (ps) = stage_count; } /* The default value of PARAM_SMS_MIN_SC is 2 as stage count of 1 means that there is no interleaving between iterations thus we let the scheduling passes do the job in this case. */ - if (stage_count < (unsigned) PARAM_VALUE (PARAM_SMS_MIN_SC) + if (stage_count < PARAM_VALUE (PARAM_SMS_MIN_SC) || (count_init && (loop_count <= stage_count)) || (flag_branch_probabilities && (trip_count <= stage_count))) { @@ -1520,6 +1659,7 @@ sms_schedule (void) set_columns_for_ps (ps); + min_cycle = PS_MIN_CYCLE (ps) - SMODULO (PS_MIN_CYCLE (ps), ps->ii); if (!schedule_reg_moves (ps)) { mii = ps->ii + 1; @@ -1527,6 +1667,21 @@ sms_schedule (void) continue; } + /* Moves that handle incoming values might have been added + to a new first stage. Bump the stage count if so. + + ??? Perhaps we could consider rotating the schedule here + instead? */ + if (PS_MIN_CYCLE (ps) < min_cycle) + { + reset_sched_times (ps, 0); + stage_count++; + } + + /* The stage count should now be correct without rotation. */ + gcc_checking_assert (stage_count == calculate_stage_count (ps, 0)); + PS_STAGE_COUNT (ps) = stage_count; + canon_loop (loop); if (dump_file)