Message ID | 15d38062-fd33-7de7-5bba-79f17adefc65@redhat.com |
---|---|
State | New |
Headers | show |
On 03/10/2017 04:24 PM, Bernd Schmidt wrote: > In this PR, we have a few insns involved in two instruction combinations: > > insn 16: set r100 > insn 27: some calculation > insn 28: some calculation > insn 32: using r100 > insn 33: using r100 > insn 35: some calculation > > Then we combine insns 27, 28 and 33, producing two output insns, As a > result, insn 28 (i2) now obtains a use of r100. But insn 32, which is > not involved in this combination, has the LOG_LINKS entry for that > register, and we don't know that we need to update it. As a result, the > second combination, involving regs 16, 32 and 35 (based on the remaining > LOG_LINK for r100), produces incorrect code, as we don't realize there's > a use of r100 before insn 32. > > The following fixes it. Bootstrapped and tested on x86_64-linux, ok (on > all branches)? > > > Bernd > > > 79910.diff > > > PR rtl-optimization/79910 > * combine.c (record_used_regs): New static function. > (try_combine): Handle situations where there is an additional > instruction between I2 and I3 which needs to have a LOG_LINK > updated. > > PR rtl-optimization/79910 > * gcc.dg/torture/pr79910.c: New test. What a nasty little problem. I don't like that we have to build these bitmaps due to the compile-time cost. Would it make sense to only build them when i0/i1 exist? We don't do 4->3 combinations, just 4->2 and 3->2, so it's only the i2pattern where we might need to conjure up a LOG_LINK, right? We don't do 4->3 combinations, right? So we only have to care about when there's going to be an newi2pat, right (3->2 or 4->2). We don't ever create a newi1pat (for a 4->3 combination), right? So we only have to worry about testing when there's a newi2pat, right? Do you need to handle 4->3, in which case the new bits could be on newi1pat? > > Index: gcc/combine.c > =================================================================== > --- gcc/combine.c (revision 245685) > +++ gcc/combine.c (working copy) > @@ -2559,6 +2560,57 @@ can_split_parallel_of_n_reg_sets (rtx_in > return true; > } > > @@ -4051,6 +4124,33 @@ try_combine (rtx_insn *i3, rtx_insn *i2, > return 0; > } > > + auto_bitmap new_regs_in_i2; > + if (newi2pat) > + { > + /* We need to discover situations where we introduce a use of a > + register into I2, where none of the existing LOG_LINKS contain > + a reference to it. This can happen if previously I3 referenced > + the reg, and there is an additional use between I2 and I3. We > + must remove the LOG_LINKS entry from that additional use and > + distribute it along with our own ones. */ > + note_uses (&newi2pat, record_used_regs, (bitmap)new_regs_in_i2); > + bitmap_and_compl_into (new_regs_in_i2, i2_regset); > + bitmap_and_compl_into (new_regs_in_i2, links_regset); > + > + /* Here, we first look for situations where a hard register use > + moved, and just give up. This should happen approximately > + never, and it's not worth it to deal with possibilities like > + multi-word registers. Later, when fixing up LOG_LINKS, we > + deal with the case where a pseudo use moved. */ > + if (prev_nonnote_insn (i3) != i2) > + for (unsigned r = 0; r < FIRST_PSEUDO_REGISTER; r++) > + if (bitmap_bit_p (new_regs_in_i2, r)) if (prev_nonnote_insn (i3) != i2 && bitmap_first_set_bit (new_regs_in_i2) < FIRST_PSEUDO_REGISTER) ? Overall it seems reasonable. If possible, let's avoid the calls to create the bitmaps in case where we know we'll never be creating a new use in i2. I'm not wed to using bitmap_first_set_bit. It's clearer and may be faster since you're not doing a ton of calls into bitmap_bit_p. The bitmap_first_set_bit implementation under the hood finds the word within the first bitmap element that is nonzero, then uses ctzl on that to get its bit. Jeff
On 03/15/2017 12:03 AM, Jeff Law wrote: > On 03/10/2017 04:24 PM, Bernd Schmidt wrote: >> >> PR rtl-optimization/79910 >> * combine.c (record_used_regs): New static function. >> (try_combine): Handle situations where there is an additional >> instruction between I2 and I3 which needs to have a LOG_LINK >> updated. >> >> PR rtl-optimization/79910 >> * gcc.dg/torture/pr79910.c: New test. > What a nasty little problem. > > I don't like that we have to build these bitmaps due to the compile-time > cost. Would it make sense to only build them when i0/i1 exist? I suppose at the moment we don't do 2->2 combinations, so we could conditionalize this on having an i1. > We don't do 4->3 combinations, just 4->2 and 3->2, so it's only the > i2pattern where we might need to conjure up a LOG_LINK, right? > > > We don't do 4->3 combinations, right? So we only have to care about > when there's going to be an newi2pat, right (3->2 or 4->2). > We don't ever create a newi1pat (for a 4->3 combination), right? So we > only have to worry about testing when there's a newi2pat, right? Yes to all, there isn't a newi1pat, only 4->2 and 3->2 can be an issue. >> + if (prev_nonnote_insn (i3) != i2) >> + for (unsigned r = 0; r < FIRST_PSEUDO_REGISTER; r++) >> + if (bitmap_bit_p (new_regs_in_i2, r)) > if (prev_nonnote_insn (i3) != i2 > && bitmap_first_set_bit (new_regs_in_i2) < FIRST_PSEUDO_REGISTER) Ah. I had wondered about the loop but only thought in the direction of intersecting this bitmap with one of all hard regs (and I think there isn't such a bitmap, so I kept the loop). I'll retest with your suggestion and with the bitmap creation conditional on i1 being nonnull. Bernd
On Wed, Mar 15, 2017 at 12:09:18AM +0100, Bernd Schmidt wrote: > I suppose at the moment we don't do 2->2 combinations, so we could > conditionalize this on having an i1. You suppose wrong. If one of the resulting insns is set_noop_p then 2->2 is allowed, for example. Also in the hopefully not so far future we will allow 2->2 in general. Segher
On 03/17/2017 04:16 PM, Segher Boessenkool wrote: > On Wed, Mar 15, 2017 at 12:09:18AM +0100, Bernd Schmidt wrote: >> I suppose at the moment we don't do 2->2 combinations, so we could >> conditionalize this on having an i1. > > You suppose wrong. If one of the resulting insns is set_noop_p then > 2->2 is allowed, for example. Also in the hopefully not so far future > we will allow 2->2 in general. But in a 2->2 where one is a nop we're not going to run into problems. jeff
PR rtl-optimization/79910 * combine.c (record_used_regs): New static function. (try_combine): Handle situations where there is an additional instruction between I2 and I3 which needs to have a LOG_LINK updated. PR rtl-optimization/79910 * gcc.dg/torture/pr79910.c: New test. Index: gcc/combine.c =================================================================== --- gcc/combine.c (revision 245685) +++ gcc/combine.c (working copy) @@ -2559,6 +2560,57 @@ can_split_parallel_of_n_reg_sets (rtx_in return true; } +/* Set up a set of registers used in an insn. Called through note_uses, + arguments as described for that function. */ + +static void +record_used_regs (rtx *xptr, void *data) +{ + bitmap set = (bitmap)data; + int i, j; + enum rtx_code code; + const char *fmt; + rtx x = *xptr; + + /* repeat is used to turn tail-recursion into iteration since GCC + can't do it when there's no return value. */ + repeat: + if (x == 0) + return; + + code = GET_CODE (x); + if (REG_P (x)) + { + unsigned regno = REGNO (x); + unsigned end_regno = END_REGNO (x); + while (regno < end_regno) + bitmap_set_bit (set, regno++); + return; + } + + /* Recursively scan the operands of this expression. */ + + for (i = GET_RTX_LENGTH (code) - 1, fmt = GET_RTX_FORMAT (code); i >= 0; i--) + { + if (fmt[i] == 'e') + { + /* If we are about to do the last recursive call + needed at this level, change it into iteration. + This function is called enough to be worth it. */ + if (i == 0) + { + x = XEXP (x, 0); + goto repeat; + } + + record_used_regs (&XEXP (x, i), data); + } + else if (fmt[i] == 'E') + for (j = 0; j < XVECLEN (x, i); j++) + record_used_regs (&XVECEXP (x, i, j), data); + } +} + /* Try to combine the insns I0, I1 and I2 into I3. Here I0, I1 and I2 appear earlier than I3. I0 and I1 can be zero; then we combine just I2 into I3, or I1 and I2 into @@ -2742,6 +2794,21 @@ try_combine (rtx_insn *i3, rtx_insn *i2, added_links_insn = 0; + auto_bitmap i2_regset, i3_regset, links_regset; + note_uses (&PATTERN (i2), record_used_regs, (bitmap)i2_regset); + note_uses (&PATTERN (i3), record_used_regs, (bitmap)i3_regset); + insn_link *ll; + FOR_EACH_LOG_LINK (ll, i3) + bitmap_set_bit (links_regset, ll->regno); + FOR_EACH_LOG_LINK (ll, i2) + bitmap_set_bit (links_regset, ll->regno); + if (i1) + FOR_EACH_LOG_LINK (ll, i1) + bitmap_set_bit (links_regset, ll->regno); + if (i0) + FOR_EACH_LOG_LINK (ll, i0) + bitmap_set_bit (links_regset, ll->regno); + /* First check for one important special case that the code below will not handle. Namely, the case where I1 is zero, I2 is a PARALLEL and I3 is a SET whose SET_SRC is a SET_DEST in I2. In that case, @@ -4051,6 +4124,33 @@ try_combine (rtx_insn *i3, rtx_insn *i2, return 0; } + auto_bitmap new_regs_in_i2; + if (newi2pat) + { + /* We need to discover situations where we introduce a use of a + register into I2, where none of the existing LOG_LINKS contain + a reference to it. This can happen if previously I3 referenced + the reg, and there is an additional use between I2 and I3. We + must remove the LOG_LINKS entry from that additional use and + distribute it along with our own ones. */ + note_uses (&newi2pat, record_used_regs, (bitmap)new_regs_in_i2); + bitmap_and_compl_into (new_regs_in_i2, i2_regset); + bitmap_and_compl_into (new_regs_in_i2, links_regset); + + /* Here, we first look for situations where a hard register use + moved, and just give up. This should happen approximately + never, and it's not worth it to deal with possibilities like + multi-word registers. Later, when fixing up LOG_LINKS, we + deal with the case where a pseudo use moved. */ + if (prev_nonnote_insn (i3) != i2) + for (unsigned r = 0; r < FIRST_PSEUDO_REGISTER; r++) + if (bitmap_bit_p (new_regs_in_i2, r)) + { + undo_all (); + return 0; + } + } + if (MAY_HAVE_DEBUG_INSNS) { struct undo *undo; @@ -4494,6 +4594,45 @@ try_combine (rtx_insn *i3, rtx_insn *i2, NULL_RTX, NULL_RTX, NULL_RTX); } + if (newi2pat) + { + bitmap_iterator iter; + unsigned int i; + + /* See comments above where we calculate the bitmap. */ + EXECUTE_IF_SET_IN_BITMAP ((bitmap)new_regs_in_i2, + LAST_VIRTUAL_REGISTER, i, iter) + { + rtx reg = regno_reg_rtx[i]; + rtx_insn *other; + for (other = NEXT_INSN (i2); other != i3; other = NEXT_INSN (other)) + if (NONDEBUG_INSN_P (other) + && (reg_overlap_mentioned_p (reg, PATTERN (other)) + || (CALL_P (other) && find_reg_fusage (other, USE, reg)))) + { + if (dump_file) + fprintf (dump_file, + "found extra use of reg %d at insn %d\n", i, + INSN_UID (other)); + insn_link **plink; + for (plink = &LOG_LINKS (other); + *plink; + plink = &(*plink)->next) + { + insn_link *link = *plink; + if (link->regno == i) + { + *plink = link->next; + link->next = i3links; + i3links = link; + break; + } + } + break; + } + } + } + distribute_links (i3links); distribute_links (i2links); distribute_links (i1links); Index: gcc/testsuite/gcc.dg/torture/pr79910.c =================================================================== --- gcc/testsuite/gcc.dg/torture/pr79910.c (nonexistent) +++ gcc/testsuite/gcc.dg/torture/pr79910.c (working copy) @@ -0,0 +1,29 @@ +/* { dg-do run } */ +/* { dg-additional-options "-fweb" } */ + +typedef unsigned char u8; +typedef unsigned int u32; +typedef unsigned long long u64; +int a; + +static __attribute__ ((noinline, noclone)) u64 +foo (u8 p1, u32 p2) +{ + u64 b = a <= 0; + p2 = 4; + b >>= a == 0; + p1 %= 0xfffffffff; + p2 >>= b & 31; + p1 += b; + p2 <<= 31; + return p1 + p2 + b; +} + +int +main (void) +{ + u64 x = foo (0, 1); + if (x != 0) + __builtin_abort (); + return 0; +}