Message ID | 87egzokglh.fsf@talisman.default |
---|---|
State | New |
Headers | show |
On 05/20/14 02:19, Richard Sandiford wrote: > Following on from (and depending on) the last patch, process_constraints > also shows up high in the profile. This patch caches the recog_op_alt > information by insn code too. It also shrinks the size of the structure > from 1 pointer + 5 ints to 1 pointer + 2 ints: > > - no target should have more than 65536 register classes :-) That could probably be much lower if we needed more bits for other things. > - "reject" is based on a cost of 600 for '!', so 16 bits should be plenty OK. Makes sense. > - "matched" and "matches" are operand numbers and so fit in 8 bits OK. This could also be smaller, don't we have an upper limit of 32 (or 30?) on the number of operands appearing in an insn. It'd be a way to get a few more bits if we need them someday. > Since this code is creating cached data and should only be run once > per insn code or asm statement, I don't think the extra in-loop > lra_static_insn_data assignments really justify having two copies > of such a complicated function. I think it would be better for LRA > to use the generic routine and then fill in the lra_static_insn_data > fields on the result (basically just an OR of "is_address" and > "early_clobber" for each alternative, plus setting up "commutative"). > We could then avoid having two caches of the same data. I'll do > that as a follow-up if it sounds OK. Seems reasonable. I suspect Vlad found the same code to be hot, hence the caching. Once he had a copy adding to it wasn't a big deal. But yes, I think that after the cache is globalized that having IRA walk over things another time shouldn't be a problem. > > On the subject of commutativity, we have: > > static bool > commutative_constraint_p (const char *str) > { > int curr_alt, c; > bool ignore_p; > > for (ignore_p = false, curr_alt = 0;;) > { > c = *str; > if (c == '\0') > break; > str += CONSTRAINT_LEN (c, str); > if (c == '#' || !recog_data.alternative_enabled_p[curr_alt]) > ignore_p = true; > else if (c == ',') > { > curr_alt++; > ignore_p = false; > } > else if (! ignore_p) > { > /* Usually `%' is the first constraint character but the > documentation does not require this. */ > if (c == '%') > return true; > } > } > return false; > } > > Any objections to requiring `%' to be the first constraint character? > Seems wasteful to be searching the constraint string just to support > an odd case. If we're going to change it, then clearly the docs need to change and ideally we'd statically check the port's constraints during the build process to ensure they meet the tighter definition. I'm sure David will be oh-so-happy to see state appearing. But that's something he'll have to deal with in the JIT side. > > The patch gives a further ~3.5% improvement in compile time for > -O0 fold-const.ii, on top of the other patch. > > Tested on x86_64-linux-gnu. OK to install? > > Thanks, > Richard > > > gcc/ > * recog.h (operand_alternative): Convert reg_class, reject, > matched and matches into bitfields. > (preprocess_constraints): Take the insn as parameter. > (recog_op_alt): Change into an array of pointers. > (target_recog): Add x_op_alt. > * recog.c (asm_op_alt_1, asm_op_alt): New variables > (recog_op_alt): Change into an array of pointers. > (preprocess_constraints): Update accordingly. Take the insn as > parameter. Use asm_op_alt_1 and asm_op_alt for asms. Cache other > instructions in this_target_recog. > * ira-lives.c (process_bb_node_lives): Pass the insn to > process_constraints. > * reg-stack.c (check_asm_stack_operands): Likewise. > (subst_asm_stack_regs): Likewise. > * regcprop.c (copyprop_hardreg_forward_1): Likewise. > * regrename.c (build_def_use): Likewise. > * sched-deps.c (sched_analyze_insn): Likewise. > * sel-sched.c (get_reg_class, implicit_clobber_conflict_p): Likewise. > * config/arm/arm.c (xscale_sched_adjust_cost): Likewise. > (note_invalid_constants): Likewise. > * config/i386/i386.c (ix86_legitimate_combined_insn): Likewise. OK. Thanks! Jeff >
Sorry Jeff, while working on the follow-on LRA patch I came across a couple of problems, so I need another round on this. First of all, I mentioned doing a follow-on patch to make LRA and recog use the same cache for operand_alternatives. I hadn't realised until I wrote that patch that there's a significant difference between the LRA and recog_op_alt versions of the information: recog_op_alt groups alternatives together by operand whereas LRA groups operands together by alternative. So in the cached flat array, recog_op_alt uses [op * n_alternatives + alt] whereas LRA uses [alt * n_operands + nop]. The two could only share a cache if they used the same order. I think the LRA ordering makes more sense. Quite a few places are only interested in alternative which_alternative, so grouping operands together by alternative gives a natural subarray. And there are places in IRA (which uses recog_op_alt rather than the LRA information) where the "for each alternative" loop is the outer one. A second difference was that preprocess_constraints skips disabled alternatives while LRA's setup_operand_alternative doesn't; LRA just checks for disabled alternatives when walking the array instead. That should make no difference in practice, but since the LRA approach would also make it easier to precompute the array at build time, I thought it would be a better way to go. It also gives a cleaner interface: we can have a preprocess_insn_constraints function that takes a (define_insn-based) insn and returns the operand_alternative information without any global state. Also, it turns out that sel-sched.c, regcprop.c and regrename.c modify the recog_op_alt structure as a convenient way of handling matching constraints. That obviously isn't a good idea if we want other passes to reuse the data later. I've fixed that and made the types "const" so that new instances shouldn't creep in. Also, I hadn't realised that a define_insn with no constraints at all has 0 alternatives rather than 1. Some passes nevertheless unconditionally access the recog_op_alt information for alternative which_alternative. I could have fixed that by making n_alternatives always be >= 1 in the static insn table. Several places do have fast paths for n_alternatives == 0 though, so I thought it would be better to handle the 0 alternative case in preprocess_constraints as well. All it needs is a MIN (..., 1). So the patch has now grown to 5 subpatches: 1. make recog_op_alt a flat array and order it in the same way as LRA 2. fix the places that modified recog_op_alt locally 3. don't handle the enabled attribute in preprocess_constraints and make sure all consumers check for disabled alternatives explicitly 4. the updated version of the original patch 5. get LRA to use the new cache too (the original follow-on patch) Jeff Law <law@redhat.com> writes: > On 05/20/14 02:19, Richard Sandiford wrote: >> Following on from (and depending on) the last patch, process_constraints >> also shows up high in the profile. This patch caches the recog_op_alt >> information by insn code too. It also shrinks the size of the structure >> from 1 pointer + 5 ints to 1 pointer + 2 ints: >> >> - no target should have more than 65536 register classes :-) > That could probably be much lower if we needed more bits for other things. > >> - "reject" is based on a cost of 600 for '!', so 16 bits should be plenty > OK. Makes sense. > > >> - "matched" and "matches" are operand numbers and so fit in 8 bits > OK. This could also be smaller, don't we have an upper limit of 32 (or > 30?) on the number of operands appearing in an insn. It'd be a way to > get a few more bits if we need them someday. Yeah, we could probably squeeze everything into a single int if we needed to and if we were prepared to have non-byte-sized integer fields. Going from 2 ints to 1 int wouldn't help LP64 hosts as things stand though, since we have a pointer at the beginning of the struct. Boostrapped & regression-tested on x86_64-linux-gnu. Also tested by building gcc.dg, g++.dg and gcc.c-torture for one target per config/ directory and making sure that there were no asm differences. Thanks, Richard
On 05/31/14 03:02, Richard Sandiford wrote: > Sorry Jeff, while working on the follow-on LRA patch I came across > a couple of problems, so I need another round on this. It happens, particularly for conceptual changes like this. I don't consider that a problem at all -- we're already in agreement on the overall direction this work is going, the rest is the implementation details. With your long history with GCC, you get a lot of trust and leeway. > First of all, I mentioned doing a follow-on patch to make LRA and > recog use the same cache for operand_alternatives. I hadn't realised > until I wrote that patch that there's a significant difference between > the LRA and recog_op_alt versions of the information: recog_op_alt > groups alternatives together by operand whereas LRA groups operands > together by alternative. So in the cached flat array, recog_op_alt > uses [op * n_alternatives + alt] whereas LRA uses [alt * n_operands + nop]. > The two could only share a cache if they used the same order. The kind of thing that becomes obvious once you try to make it work :-) > > I think the LRA ordering makes more sense. Quite a few places are > only interested in alternative which_alternative, so grouping operands > together by alternative gives a natural subarray. And there are places > in IRA (which uses recog_op_alt rather than the LRA information) where > the "for each alternative" loop is the outer one. Yea, I would think that generally we're going to be interested in major indexing by which_alternative rather than the operands. > > A second difference was that preprocess_constraints skips disabled > alternatives while LRA's setup_operand_alternative doesn't; LRA just > checks for disabled alternatives when walking the array instead. > That should make no difference in practice, but since the LRA approach > would also make it easier to precompute the array at build time, > I thought it would be a better way to go. It also gives a cleaner > interface: we can have a preprocess_insn_constraints function that > takes a (define_insn-based) insn and returns the operand_alternative > information without any global state. Sounds good. Do you think prebuilding those arrays once at build time is likely to have a measurable impact? I realize you probably haven't done measurements, just looking for your gut instinct here. > > Also, it turns out that sel-sched.c, regcprop.c and regrename.c > modify the recog_op_alt structure as a convenient way of handling > matching constraints. That obviously isn't a good idea if we want > other passes to reuse the data later. I've fixed that and made the > types "const" so that new instances shouldn't creep in. Ick. But glad to see the introduction of const to prevent this from happening again. > > Also, I hadn't realised that a define_insn with no constraints > at all has 0 alternatives rather than 1. Some passes nevertheless > unconditionally access the recog_op_alt information for alternative > which_alternative. > > I could have fixed that by making n_alternatives always be >= 1 > in the static insn table. Several places do have fast paths for > n_alternatives == 0 though, so I thought it would be better to > handle the 0 alternative case in preprocess_constraints as well. > All it needs is a MIN (..., 1). Funny thing is I suspect the n_alternatives == 0 case is rare in practice though. >>> - "matched" and "matches" are operand numbers and so fit in 8 bits >> OK. This could also be smaller, don't we have an upper limit of 32 (or >> 30?) on the number of operands appearing in an insn. It'd be a way to >> get a few more bits if we need them someday. > > Yeah, we could probably squeeze everything into a single int if we needed > to and if we were prepared to have non-byte-sized integer fields. > Going from 2 ints to 1 int wouldn't help LP64 hosts as things stand > though, since we have a pointer at the beginning of the struct. Right and non-LP64 hosts are becoming more and more rare. And the masking necessary may eat up any benefits we'd get, hard to know either way. > > Boostrapped & regression-tested on x86_64-linux-gnu. Also tested by > building gcc.dg, g++.dg and gcc.c-torture for one target per config/ > directory and making sure that there were no asm differences. Hoping to slog through them this afternoon. Time is a bit tight though. jeff
Jeff Law <law@redhat.com> writes: > On 05/31/14 03:02, Richard Sandiford wrote: >> A second difference was that preprocess_constraints skips disabled >> alternatives while LRA's setup_operand_alternative doesn't; LRA just >> checks for disabled alternatives when walking the array instead. >> That should make no difference in practice, but since the LRA approach >> would also make it easier to precompute the array at build time, >> I thought it would be a better way to go. It also gives a cleaner >> interface: we can have a preprocess_insn_constraints function that >> takes a (define_insn-based) insn and returns the operand_alternative >> information without any global state. > Sounds good. Do you think prebuilding those arrays once at build time > is likely to have a measurable impact? I realize you probably haven't > done measurements, just looking for your gut instinct here. I have a local patch that does it, but unfortunately it doesn't seem to make a measurable difference. I'll try measuring it again once other lower-hanging fruit are out of the way. >> Also, I hadn't realised that a define_insn with no constraints >> at all has 0 alternatives rather than 1. Some passes nevertheless >> unconditionally access the recog_op_alt information for alternative >> which_alternative. >> >> I could have fixed that by making n_alternatives always be >= 1 >> in the static insn table. Several places do have fast paths for >> n_alternatives == 0 though, so I thought it would be better to >> handle the 0 alternative case in preprocess_constraints as well. >> All it needs is a MIN (..., 1). > Funny thing is I suspect the n_alternatives == 0 case is rare in > practice though. Yeah, probably. :-) Thanks, Richard
Index: gcc/recog.h =================================================================== --- gcc/recog.h 2014-05-19 20:42:50.830279171 +0100 +++ gcc/recog.h 2014-05-19 21:02:22.926604180 +0100 @@ -46,18 +46,18 @@ struct operand_alternative const char *constraint; /* The register class valid for this alternative (possibly NO_REGS). */ - enum reg_class cl; + ENUM_BITFIELD (reg_class) cl : 16; /* "Badness" of this alternative, computed from number of '?' and '!' characters in the constraint string. */ - unsigned int reject; + unsigned int reject : 16; /* -1 if no matching constraint was found, or an operand number. */ - int matches; + int matches : 8; /* The same information, but reversed: -1 if this operand is not matched by any other, or the operand number of the operand that matches this one. */ - int matched; + int matched : 8; /* Nonzero if '&' was found in the constraint string. */ unsigned int earlyclobber:1; @@ -77,8 +77,9 @@ struct operand_alternative /* Nonzero if 'X' was found in the constraint string, or if the constraint string for this alternative was empty. */ unsigned int anything_ok:1; -}; + unsigned int unused : 8; +}; extern void init_recog (void); extern void init_recog_no_volatile (void); @@ -134,7 +135,7 @@ extern void insn_extract (rtx); extern void extract_insn (rtx); extern void extract_constrain_insn_cached (rtx); extern void extract_insn_cached (rtx); -extern void preprocess_constraints (void); +extern void preprocess_constraints (rtx); extern rtx peep2_next_insn (int); extern int peep2_regno_dead_p (int, int); extern int peep2_reg_dead_p (int, rtx); @@ -258,7 +259,7 @@ struct recog_data_d /* Contains a vector of operand_alternative structures for every operand. Set up by preprocess_constraints. */ -extern struct operand_alternative recog_op_alt[MAX_RECOG_OPERANDS][MAX_RECOG_ALTERNATIVES]; +extern operand_alternative **recog_op_alt; /* A table defined in insn-output.c that give information about each insn-code value. */ @@ -376,6 +377,7 @@ struct insn_data_d /* Target-dependent globals. */ struct target_recog { alternative_mask x_enabled_alternatives[LAST_INSN_CODE]; + operand_alternative **x_op_alt[LAST_INSN_CODE]; }; extern struct target_recog default_target_recog; Index: gcc/recog.c =================================================================== --- gcc/recog.c 2014-05-19 20:43:28.978647678 +0100 +++ gcc/recog.c 2014-05-19 23:26:02.037573014 +0100 @@ -80,7 +80,11 @@ struct recog_data_d recog_data; /* Contains a vector of operand_alternative structures for every operand. Set up by preprocess_constraints. */ -struct operand_alternative recog_op_alt[MAX_RECOG_OPERANDS][MAX_RECOG_ALTERNATIVES]; +operand_alternative **recog_op_alt; + +static operand_alternative asm_op_alt_1[MAX_RECOG_OPERANDS + * MAX_RECOG_ALTERNATIVES]; +static operand_alternative *asm_op_alt[MAX_RECOG_OPERANDS]; /* On return from `constrain_operands', indicate which alternative was satisfied. */ @@ -2326,23 +2330,43 @@ extract_insn (rtx insn) information from the constraint strings into a more usable form. The collected data is stored in recog_op_alt. */ void -preprocess_constraints (void) +preprocess_constraints (rtx insn) { int i; - for (i = 0; i < recog_data.n_operands; i++) - memset (recog_op_alt[i], 0, (recog_data.n_alternatives - * sizeof (struct operand_alternative))); + int code = INSN_CODE (insn); + if (code >= 0 && this_target_recog->x_op_alt[code]) + { + recog_op_alt = this_target_recog->x_op_alt[code]; + return; + } + + int n_alternatives = recog_data.n_alternatives; + int n_operands = recog_data.n_operands; + int n_entries = n_operands * n_alternatives; + + operand_alternative *op_alt; + if (code < 0) + { + memset (asm_op_alt_1, 0, n_entries * sizeof (operand_alternative)); + op_alt = asm_op_alt_1; + recog_op_alt = asm_op_alt; + } + else + { + op_alt = XCNEWVEC (operand_alternative, n_entries); + recog_op_alt = XNEWVEC (operand_alternative *, n_operands); + this_target_recog->x_op_alt[code] = recog_op_alt; + } - for (i = 0; i < recog_data.n_operands; i++) + for (i = 0; i < n_operands; i++, op_alt += n_alternatives) { int j; - struct operand_alternative *op_alt; const char *p = recog_data.constraints[i]; - op_alt = recog_op_alt[i]; + recog_op_alt[i] = op_alt; - for (j = 0; j < recog_data.n_alternatives; j++) + for (j = 0; j < n_alternatives; j++) { op_alt[j].cl = NO_REGS; op_alt[j].constraint = p; Index: gcc/ira-lives.c =================================================================== --- gcc/ira-lives.c 2014-05-19 20:42:50.819279065 +0100 +++ gcc/ira-lives.c 2014-05-19 21:01:12.577922691 +0100 @@ -1238,7 +1238,7 @@ process_bb_node_lives (ira_loop_tree_nod } extract_insn (insn); - preprocess_constraints (); + preprocess_constraints (insn); process_single_reg_class_operands (false, freq); /* See which defined values die here. */ Index: gcc/reg-stack.c =================================================================== --- gcc/reg-stack.c 2014-05-16 09:47:34.336936052 +0100 +++ gcc/reg-stack.c 2014-05-19 21:01:50.349288768 +0100 @@ -473,7 +473,7 @@ check_asm_stack_operands (rtx insn) constrain_operands (1); alt = which_alternative; - preprocess_constraints (); + preprocess_constraints (insn); get_asm_operands_in_out (body, &n_outputs, &n_inputs); @@ -2032,7 +2032,7 @@ subst_asm_stack_regs (rtx insn, stack_pt constrain_operands (1); alt = which_alternative; - preprocess_constraints (); + preprocess_constraints (insn); get_asm_operands_in_out (body, &n_outputs, &n_inputs); Index: gcc/regcprop.c =================================================================== --- gcc/regcprop.c 2014-05-17 07:59:06.436021428 +0100 +++ gcc/regcprop.c 2014-05-19 21:01:29.001081988 +0100 @@ -774,7 +774,7 @@ copyprop_hardreg_forward_1 (basic_block extract_insn (insn); if (! constrain_operands (1)) fatal_insn_not_found (insn); - preprocess_constraints (); + preprocess_constraints (insn); alt = which_alternative; n_ops = recog_data.n_operands; is_asm = asm_noperands (PATTERN (insn)) >= 0; @@ -880,7 +880,7 @@ copyprop_hardreg_forward_1 (basic_block extract_insn (insn); if (! constrain_operands (1)) fatal_insn_not_found (insn); - preprocess_constraints (); + preprocess_constraints (insn); } /* Otherwise, try all valid registers and see if its valid. */ @@ -908,7 +908,7 @@ copyprop_hardreg_forward_1 (basic_block extract_insn (insn); if (! constrain_operands (1)) fatal_insn_not_found (insn); - preprocess_constraints (); + preprocess_constraints (insn); } } } Index: gcc/regrename.c =================================================================== --- gcc/regrename.c 2014-05-06 18:38:47.928200023 +0100 +++ gcc/regrename.c 2014-05-19 21:01:35.724147073 +0100 @@ -1571,7 +1571,7 @@ build_def_use (basic_block bb) extract_insn (insn); if (! constrain_operands (1)) fatal_insn_not_found (insn); - preprocess_constraints (); + preprocess_constraints (insn); alt = which_alternative; n_ops = recog_data.n_operands; untracked_operands = 0; Index: gcc/sched-deps.c =================================================================== --- gcc/sched-deps.c 2014-05-16 09:47:34.347936161 +0100 +++ gcc/sched-deps.c 2014-05-19 21:01:58.526367964 +0100 @@ -2865,7 +2865,7 @@ sched_analyze_insn (struct deps_desc *de HARD_REG_SET temp; extract_insn (insn); - preprocess_constraints (); + preprocess_constraints (insn); ira_implicitly_set_insn_hard_regs (&temp); AND_COMPL_HARD_REG_SET (temp, ira_no_alloc_regs); IOR_HARD_REG_SET (implicit_reg_pending_clobbers, temp); Index: gcc/sel-sched.c =================================================================== --- gcc/sel-sched.c 2014-03-04 21:19:43.120097702 +0000 +++ gcc/sel-sched.c 2014-05-19 21:02:14.471522363 +0100 @@ -1019,7 +1019,7 @@ get_reg_class (rtx insn) extract_insn (insn); if (! constrain_operands (1)) fatal_insn_not_found (insn); - preprocess_constraints (); + preprocess_constraints (insn); alt = which_alternative; n_ops = recog_data.n_operands; @@ -2141,7 +2141,7 @@ implicit_clobber_conflict_p (insn_t thro /* Calculate implicit clobbers. */ extract_insn (insn); - preprocess_constraints (); + preprocess_constraints (insn); ira_implicitly_set_insn_hard_regs (&temp); AND_COMPL_HARD_REG_SET (temp, ira_no_alloc_regs); Index: gcc/config/arm/arm.c =================================================================== --- gcc/config/arm/arm.c 2014-05-19 07:46:29.013179879 +0100 +++ gcc/config/arm/arm.c 2014-05-19 21:05:50.289605508 +0100 @@ -11335,7 +11335,7 @@ xscale_sched_adjust_cost (rtx insn, rtx that overlaps with SHIFTED_OPERAND, then we have increase the cost of this dependency. */ extract_insn (dep); - preprocess_constraints (); + preprocess_constraints (dep); for (opno = 0; opno < recog_data.n_operands; opno++) { /* We can ignore strict inputs. */ @@ -16870,7 +16870,7 @@ note_invalid_constants (rtx insn, HOST_W /* Fill in recog_op_alt with information about the constraints of this insn. */ - preprocess_constraints (); + preprocess_constraints (insn); for (opno = 0; opno < recog_data.n_operands; opno++) { Index: gcc/config/i386/i386.c =================================================================== --- gcc/config/i386/i386.c 2014-05-19 07:46:26.771160339 +0100 +++ gcc/config/i386/i386.c 2014-05-19 21:05:39.461501255 +0100 @@ -5826,7 +5826,7 @@ ix86_legitimate_combined_insn (rtx insn) int i; extract_insn (insn); - preprocess_constraints (); + preprocess_constraints (insn); for (i = 0; i < recog_data.n_operands; i++) {