Message ID | f41501c6-4a9a-6dc0-7224-0f9a721a0765@ventanamicro.com |
---|---|
State | New |
Series | [RFA] Avoid unnecessary load-immediate in coremark |
On Tue, Sep 27, 2022 at 9:54 PM Jeff Law <jlaw@ventanamicro.com> wrote:
>
>
> This is another minor improvement to coremark.  I suspect this only
> improves code size as the load-immediate was likely issuing with the ret
> statement on multi-issue machines.
>
>
> Basically we're failing to utilize conditional equivalences during the
> post-reload CSE pass.  So if a particular block is only reached when a
> certain condition holds (say for example a4 == 0) and the block has an
> assignment like a4 = 0, we would fail to eliminate the unnecessary
> assignment.

conditional equivalences on RTL - ick ;)

I'm not familiar with RTL pattern matching so somebody else has to
comment on that, but

+                      /* If this is not the first time through, then
+                         verify the source and destination match.  */
+                      else if (dest == XEXP (cond, 0) && src == XEXP (cond, 1))
+                        ;

shouldn't you restrict dest/src somehow?  It might be a MEM?
The way you create the fake insn suggests only REG_P dest are OK
(not SUBREGs for example?)?  Should you use rtx_equal_p (not using
that possibly exempts MEM, but being more explicit would be nice).
Should you restrict this to MODE_INT compares?

Richard.

> So the way this works, as we enter each block in reload_cse_regs_1 we
> look at the block's predecessors to see if all of them have the same
> implicit assignment.  If they do, then we create a dummy insn
> representing that implicit assignment.
>
>
> Before processing the first real insn, we enter the implicit assignment
> into the cselib hash tables.  This deferred action is necessary
> because of CODE_LABEL handling in cselib -- when it sees a CODE_LABEL it
> wipes state.  So we have to add the implicit assignment after processing
> the (optional) CODE_LABEL, but before processing real insns.
>
>
> Note we have to walk all the block's predecessors to verify they all
> have the same implicit assignment.  That could potentially be expensive,
> so we limit it to cases where there are only a few predecessors.  For
> reference on x86_64, 81% of the cases where implicit assignments can be
> found are for single predecessor blocks.  96% have two preds, 99.1% have
> 3 preds, 99.6% have 4 preds, 99.8% have 5 preds and so-on.  While there
> were cases where all 19 preds had the same implicit assignment capturing
> those cases just doesn't seem terribly important.  I put the clamp at 3
> preds.  If folks think it's important, I could certainly make that a
> PARAM.
>
>
> Bootstrapped and regression tested on x86.  Bootstrapped on riscv as well.
>
>
> OK for the trunk?
>
>
> Jeff
>
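In short, the mechanism described above synthesizes a dummy assignment and
lets cselib record it before the block's real insns.  The following is a
heavily simplified sketch of that idea; implicit_set_for_all_preds_p is a
made-up stand-in for the predecessor checks, while make_insn_raw,
gen_rtx_SET and cselib_process_insn are the real internal calls, used just
as the patch at the bottom of this page uses them:

    /* Sketch only: if every predecessor reaches BB solely when DEST == SRC
       held, feed a fake "DEST = SRC" insn to cselib so the post-reload CSE
       pass can exploit the conditional equivalence.
       implicit_set_for_all_preds_p is a hypothetical helper; the real
       patch open-codes those checks.  */
    rtx dest, src;
    if (implicit_set_for_all_preds_p (bb, &dest, &src))
      {
        rtx_insn *implicit_set = make_insn_raw (gen_rtx_SET (dest, src));
        cselib_process_insn (implicit_set);
      }

The actual patch defers the cselib_process_insn call until just before the
first real insn in the block, because cselib throws away its tables when it
sees the block's CODE_LABEL.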
Jeff Law <jlaw@ventanamicro.com> writes:
> This is another minor improvement to coremark.  I suspect this only
> improves code size as the load-immediate was likely issuing with the ret
> statement on multi-issue machines.
>
>
> Basically we're failing to utilize conditional equivalences during the
> post-reload CSE pass.  So if a particular block is only reached when a
> certain condition holds (say for example a4 == 0) and the block has an
> assignment like a4 = 0, we would fail to eliminate the unnecessary
> assignment.

I wasn't sure (and was too lazy to try, sorry), but: is the reason
that we fail to catch this earlier because the two uses of r4 are
entirely separate (i.e. not from the same pseudo)?

> +          /* Iterate over each incoming edge and see if they
> +             all have the same implicit set.  */
> +          FOR_EACH_EDGE (e, ei, bb->preds)
> +            {
> +              /* If the predecessor does not end in a conditional
> +                 jump, then it does not have an implicit set.  */
> +              if (e->src != ENTRY_BLOCK_PTR_FOR_FN (cfun)
> +                  && !block_ends_with_condjump_p (e->src))
> +                {
> +                  found = false;
> +                  break;
> +                }
> +
> +              /* We know the predecessor ends with a conditional
> +                 jump.  Now dig into the actal form of the jump
> +                 to potentially extract an implicit set.  */

Very minor, but it looked odd to fall through for the entry block.
How about:

          if (e->src == ENTRY_BLOCK_PTR_FOR_FN (cfun)
              || !block_ends_with_condjump_p (e->src))

?

> +              rtx_insn *condjump = BB_END (e->src);
> +              if (condjump
> +                  && any_condjump_p (condjump)
> +                  && onlyjump_p (condjump))
> +                {
> +                  /* Extract the condition.  */
> +                  rtx pat = PATTERN (condjump);
> +                  rtx i_t_e = SET_SRC (pat);
> +                  gcc_assert (GET_CODE (i_t_e) == IF_THEN_ELSE);
> +                  rtx cond = XEXP (i_t_e, 0);
> +                  if ((GET_CODE (cond) == EQ
> +                       && GET_CODE (XEXP (i_t_e, 1)) == LABEL_REF
> +                       && XEXP (XEXP (i_t_e, 1), 0) == BB_HEAD (bb))
> +                      || (GET_CODE (cond) == NE
> +                          && XEXP (i_t_e, 2) == pc_rtx
> +                          && e->src->next_bb == bb))
> +                    {
> +                      /* If this is the first time through record
> +                         the source and destination.  */
> +                      if (!dest)
> +                        {
> +                          dest = XEXP (cond, 0);
> +                          src = XEXP (cond, 1);
> +                        }
> +                      /* If this is not the first time through, then
> +                         verify the source and destination match.  */
> +                      else if (dest == XEXP (cond, 0) && src == XEXP (cond, 1))
> +                        ;

FWIW, agree with what Richi said about using rtx_equal_p here.
We don't necessarily end up with shared hard regs, especially if
they originated from different pseudos.

Thanks,
Richard
On 9/29/22 01:44, Richard Biener wrote:
> On Tue, Sep 27, 2022 at 9:54 PM Jeff Law <jlaw@ventanamicro.com> wrote:
>>
>> This is another minor improvement to coremark.  I suspect this only
>> improves code size as the load-immediate was likely issuing with the ret
>> statement on multi-issue machines.
>>
>>
>> Basically we're failing to utilize conditional equivalences during the
>> post-reload CSE pass.  So if a particular block is only reached when a
>> certain condition holds (say for example a4 == 0) and the block has an
>> assignment like a4 = 0, we would fail to eliminate the unnecessary
>> assignment.
> conditional equivalences on RTL - ick ;)

That was my first reaction as well.

>
> I'm not familiar with RTL pattern matching so somebody else has to
> comment on that, but
>
> +                      /* If this is not the first time through, then
> +                         verify the source and destination match.  */
> +                      else if (dest == XEXP (cond, 0) && src == XEXP (cond, 1))
> +                        ;
>
> shouldn't you restrict dest/src somehow?  It might be a MEM?
> The way you create the fake insn suggests only REG_P dest are OK
> (not SUBREGs for example?)?

You're absolutely right, as is Richard S WRT unexpected sharing.  I'll
adjust the patch appropriately.

> Should you use rtx_equal_p (not using that possibly exempts MEM,
> but being more explicit would be nice).  Should you restrict this to
> MODE_INT compares?

rtx_equal_p would be better, yes.  I'll adjust that too.

This should work regardless of the mode type though.  The key is the
post-reload cse bits have to check that the pattern matches and that the
constraints are satisfied when a replacement is made.  My only concern
would be MODE_CC stuff.  I'll think a bit more about that case.

Jeff
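A rough sketch of what the tightened check agreed on here might look like
(a guess at the follow-up, not the committed version; cand_dest and
cand_src are made-up names, while REG_P, GET_MODE_CLASS and rtx_equal_p are
the real predicates under discussion):

    /* Sketch only: require a plain REG (no MEM or SUBREG) that is not a
       MODE_CC register before treating the comparison as an implicit set,
       and compare against the other predecessors with rtx_equal_p rather
       than pointer equality.  */
    rtx cand_dest = XEXP (cond, 0);
    rtx cand_src = XEXP (cond, 1);
    if (!REG_P (cand_dest)
        || GET_MODE_CLASS (GET_MODE (cand_dest)) == MODE_CC)
      {
        found = false;
        break;
      }
    else if (!dest)
      {
        /* First predecessor: record the implied assignment.  */
        dest = cand_dest;
        src = cand_src;
      }
    else if (!rtx_equal_p (dest, cand_dest) || !rtx_equal_p (src, cand_src))
      {
        /* Not the same implicit set on every incoming edge.  */
        found = false;
        break;
      }

Whether MODE_CC really needs excluding is exactly the open question above;
the sketch simply errs on the safe side.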
On 9/30/22 04:47, Richard Sandiford wrote:
> Jeff Law <jlaw@ventanamicro.com> writes:
>> This is another minor improvement to coremark.  I suspect this only
>> improves code size as the load-immediate was likely issuing with the ret
>> statement on multi-issue machines.
>>
>>
>> Basically we're failing to utilize conditional equivalences during the
>> post-reload CSE pass.  So if a particular block is only reached when a
>> certain condition holds (say for example a4 == 0) and the block has an
>> assignment like a4 = 0, we would fail to eliminate the unnecessary
>> assignment.
> I wasn't sure (and was too lazy to try, sorry), but: is the reason
> that we fail to catch this earlier because the two uses of r4 are
> entirely separate (i.e. not from the same pseudo)?

Right.  Different pseudos used in the comparison vs the destination of
the assignment.  If they used the same pseudo, then I would have
expected cse or gcse to pick it up.

>
>> +          /* Iterate over each incoming edge and see if they
>> +             all have the same implicit set.  */
>> +          FOR_EACH_EDGE (e, ei, bb->preds)
>> +            {
>> +              /* If the predecessor does not end in a conditional
>> +                 jump, then it does not have an implicit set.  */
>> +              if (e->src != ENTRY_BLOCK_PTR_FOR_FN (cfun)
>> +                  && !block_ends_with_condjump_p (e->src))
>> +                {
>> +                  found = false;
>> +                  break;
>> +                }
>> +
>> +              /* We know the predecessor ends with a conditional
>> +                 jump.  Now dig into the actal form of the jump
>> +                 to potentially extract an implicit set.  */
> Very minor, but it looked odd to fall through for the entry block.
> How about:
>
>           if (e->src == ENTRY_BLOCK_PTR_FOR_FN (cfun)
>               || !block_ends_with_condjump_p (e->src))
>
> ?

Looks like a mistake to me.  We don't want to process anything for a
successor of the entry block.  I'll adjust and retest.

Thanks,

Jeff
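For reference, the adjusted predecessor test being agreed on here would
presumably end up as Richard's suggestion folded into the existing
found/break body (a sketch, not taken from the retested patch):

    /* Treat a successor of the entry block the same as a predecessor that
       does not end in a conditional jump: no implicit set is available.  */
    if (e->src == ENTRY_BLOCK_PTR_FOR_FN (cfun)
        || !block_ends_with_condjump_p (e->src))
      {
        found = false;
        break;
      }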
diff --git a/gcc/postreload.cc b/gcc/postreload.cc
index 41f61d32648..2f155a239ae 100644
--- a/gcc/postreload.cc
+++ b/gcc/postreload.cc
@@ -33,6 +33,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "emit-rtl.h"
 #include "recog.h"
 
+#include "cfghooks.h"
 #include "cfgrtl.h"
 #include "cfgbuild.h"
 #include "cfgcleanup.h"
@@ -221,13 +222,108 @@ reload_cse_regs_1 (void)
   init_alias_analysis ();
 
   FOR_EACH_BB_FN (bb, cfun)
-    FOR_BB_INSNS (bb, insn)
-      {
-        if (INSN_P (insn))
-          cfg_changed |= reload_cse_simplify (insn, testreg);
+    {
+      /* If BB has a small number of predecessors, see if each of them
+         has the same implicit set.  If so, record that implicit set so
+         that we can add it to the cselib tables.  */
+      rtx_insn *implicit_set;
 
-        cselib_process_insn (insn);
-      }
+      implicit_set = NULL;
+      if (EDGE_COUNT (bb->preds) <= 3)
+        {
+          edge e;
+          edge_iterator ei;
+          rtx src = NULL_RTX;
+          rtx dest = NULL_RTX;
+          bool found = true;
+
+          /* Iterate over each incoming edge and see if they
+             all have the same implicit set.  */
+          FOR_EACH_EDGE (e, ei, bb->preds)
+            {
+              /* If the predecessor does not end in a conditional
+                 jump, then it does not have an implicit set.  */
+              if (e->src != ENTRY_BLOCK_PTR_FOR_FN (cfun)
+                  && !block_ends_with_condjump_p (e->src))
+                {
+                  found = false;
+                  break;
+                }
+
+              /* We know the predecessor ends with a conditional
+                 jump.  Now dig into the actual form of the jump
+                 to potentially extract an implicit set.  */
+              rtx_insn *condjump = BB_END (e->src);
+              if (condjump
+                  && any_condjump_p (condjump)
+                  && onlyjump_p (condjump))
+                {
+                  /* Extract the condition.  */
+                  rtx pat = PATTERN (condjump);
+                  rtx i_t_e = SET_SRC (pat);
+                  gcc_assert (GET_CODE (i_t_e) == IF_THEN_ELSE);
+                  rtx cond = XEXP (i_t_e, 0);
+                  if ((GET_CODE (cond) == EQ
+                       && GET_CODE (XEXP (i_t_e, 1)) == LABEL_REF
+                       && XEXP (XEXP (i_t_e, 1), 0) == BB_HEAD (bb))
+                      || (GET_CODE (cond) == NE
+                          && XEXP (i_t_e, 2) == pc_rtx
+                          && e->src->next_bb == bb))
+                    {
+                      /* If this is the first time through record
+                         the source and destination.  */
+                      if (!dest)
+                        {
+                          dest = XEXP (cond, 0);
+                          src = XEXP (cond, 1);
+                        }
+                      /* If this is not the first time through, then
+                         verify the source and destination match.  */
+                      else if (dest == XEXP (cond, 0) && src == XEXP (cond, 1))
+                        ;
+                      else
+                        {
+                          found = false;
+                          break;
+                        }
+                    }
+                }
+              else
+                {
+                  found = false;
+                  break;
+                }
+            }
+
+          /* If all the incoming edges had the same implicit
+             set, then create a dummy insn for that set.
+
+             It will be entered into the cselib tables before
+             we process the first real insn in this block.  */
+          if (dest && found)
+            implicit_set = make_insn_raw (gen_rtx_SET (dest, src));
+        }
+
+      FOR_BB_INSNS (bb, insn)
+        {
+          if (INSN_P (insn))
+            {
+              /* If we recorded an implicit set, enter it
+                 into the tables before the first real insn.
+
+                 We have to do it this way because a CODE_LABEL
+                 will flush the cselib tables.  */
+              if (implicit_set)
+                {
+                  cselib_process_insn (implicit_set);
+                  implicit_set = NULL;
+                }
+              cfg_changed |= reload_cse_simplify (insn, testreg);
+            }
+
+          cselib_process_insn (insn);
+        }
+    }
 
   /* Clean up.  */
   end_alias_analysis ();
diff --git a/gcc/testsuite/gcc.target/riscv/implicit-set.c b/gcc/testsuite/gcc.target/riscv/implicit-set.c
new file mode 100644
index 00000000000..91106bb5d80
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/implicit-set.c
@@ -0,0 +1,40 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -dp" } */
+/* This was extracted from coremark.  */
+
+
+typedef signed short ee_s16;
+typedef struct list_data_s
+{
+    ee_s16 data16;
+    ee_s16 idx;
+} list_data;
+
+typedef struct list_head_s
+{
+    struct list_head_s *next;
+    struct list_data_s *info;
+} list_head;
+
+
+list_head *
+core_list_find(list_head *list, list_data *info)
+{
+    if (info->idx >= 0)
+    {
+        while (list && (list->info->idx != info->idx))
+            list = list->next;
+        return list;
+    }
+    else
+    {
+        while (list && ((list->info->data16 & 0xff) != info->data16))
+            list = list->next;
+        return list;
+    }
+}
+
+/* There was an unnecessary assignment to the return value until
+   recently.  Scan for that in the resulting output.  */
+/* { dg-final { scan-assembler-not "li\\ta0,0" } } */