Message ID | nycvar.YFH.7.76.1911181042540.5566@zhemvz.fhfr.qr |
---|---|
State | New |
Headers | show |
Series | Fix PR92462 | expand |
On Mon, Nov 18, 2019 at 10:44:14AM +0100, Richard Biener wrote: > The following adjusts the find_base_{term,value} heuristic when > looking through ANDs to the case where it is obviously not > aligning a base (when the LSB is set). What is specific about the LSB? I mean & 2 is obviously not an aligning AND either. Shouldn't we recurse only if the CONST_INT_P operand has some set bits followed by clear bits, i.e. after the != 0 check compute ctz_hwi and verify that INTVAL >> ctz == -1? > Bootstrapped and tested on x86_64-unknown-linux-gnu, applied. > > Richard. > > 2019-11-18 Richard Biener <rguenther@suse.de> > > PR rtl-optimization/92462 > * alias.c (find_base_term): Restrict the look through ANDs. > (find_base_value): Likewise. Jakub
On Mon, 18 Nov 2019, Jakub Jelinek wrote: > On Mon, Nov 18, 2019 at 10:44:14AM +0100, Richard Biener wrote: > > The following adjusts the find_base_{term,value} heuristic when > > looking through ANDs to the case where it is obviously not > > aligning a base (when the LSB is set). > > What is specific about the LSB? I mean & 2 is obviously not an aligning > AND either. It aligns 0x1 and 0x3 ;) > Shouldn't we recurse only if the CONST_INT_P operand has > some set bits followed by clear bits, i.e. after the != 0 check > compute ctz_hwi and verify that INTVAL >> ctz == -1? I thought of more advanced heuristics but I guess that any contiguous set of bits with LSB unset might be aligning if the programmer knows upper bits are zero anyways? So I fear the == -1 test would not work reliable? But since find_base_term/value are broken anyways I guess we can live with that... Richard.
On Mon, 18 Nov 2019, Richard Biener wrote: > On Mon, 18 Nov 2019, Jakub Jelinek wrote: > > > On Mon, Nov 18, 2019 at 10:44:14AM +0100, Richard Biener wrote: > > > The following adjusts the find_base_{term,value} heuristic when > > > looking through ANDs to the case where it is obviously not > > > aligning a base (when the LSB is set). > > > > What is specific about the LSB? I mean & 2 is obviously not an aligning > > AND either. > > It aligns 0x1 and 0x3 ;) Oh, and of course known-zero-bits magic could have turned a & 0x7 mask to a & 0x6 one because LSB is known to be clear already ... :/ So any of these heuristics are - well - just heuristics. > > Shouldn't we recurse only if the CONST_INT_P operand has > > some set bits followed by clear bits, i.e. after the != 0 check > > compute ctz_hwi and verify that INTVAL >> ctz == -1? > > I thought of more advanced heuristics but I guess that > any contiguous set of bits with LSB unset might be aligning > if the programmer knows upper bits are zero anyways? So I fear > the == -1 test would not work reliable? > > But since find_base_term/value are broken anyways I guess we > can live with that... > > Richard.
On Mon, Nov 18, 2019 at 11:07:22AM +0100, Richard Biener wrote: > On Mon, 18 Nov 2019, Jakub Jelinek wrote: > > > On Mon, Nov 18, 2019 at 10:44:14AM +0100, Richard Biener wrote: > > > The following adjusts the find_base_{term,value} heuristic when > > > looking through ANDs to the case where it is obviously not > > > aligning a base (when the LSB is set). > > > > What is specific about the LSB? I mean & 2 is obviously not an aligning > > AND either. > > It aligns 0x1 and 0x3 ;) > > > Shouldn't we recurse only if the CONST_INT_P operand has > > some set bits followed by clear bits, i.e. after the != 0 check > > compute ctz_hwi and verify that INTVAL >> ctz == -1? > > I thought of more advanced heuristics but I guess that > any contiguous set of bits with LSB unset might be aligning > if the programmer knows upper bits are zero anyways? So I fear > the == -1 test would not work reliable? I'd say once a user does & 0x1ff8 or similar, he is doing something weird, and not recursing is the conservatively correct answer (or maybe it isn't? Aren't there cases where we punt if from a binary op we get base terms from both sides and just use one if we get it only from one side? If so, we might need to have some kind of annotated return, this could have a base term but please don't use it). I guess the only non-weird case would be if the target has weird pointer sizes and only has larger or smaller ints, say 24-bit pointer, and 8/16/32 integers, so the aligning then could be & 0xfffff8 etc. Jakub
On Mon, 18 Nov 2019, Jakub Jelinek wrote: > On Mon, Nov 18, 2019 at 11:07:22AM +0100, Richard Biener wrote: > > On Mon, 18 Nov 2019, Jakub Jelinek wrote: > > > > > On Mon, Nov 18, 2019 at 10:44:14AM +0100, Richard Biener wrote: > > > > The following adjusts the find_base_{term,value} heuristic when > > > > looking through ANDs to the case where it is obviously not > > > > aligning a base (when the LSB is set). > > > > > > What is specific about the LSB? I mean & 2 is obviously not an aligning > > > AND either. > > > > It aligns 0x1 and 0x3 ;) > > > > > Shouldn't we recurse only if the CONST_INT_P operand has > > > some set bits followed by clear bits, i.e. after the != 0 check > > > compute ctz_hwi and verify that INTVAL >> ctz == -1? > > > > I thought of more advanced heuristics but I guess that > > any contiguous set of bits with LSB unset might be aligning > > if the programmer knows upper bits are zero anyways? So I fear > > the == -1 test would not work reliable? > > I'd say once a user does & 0x1ff8 or similar, he is doing something weird, > and not recursing is the conservatively correct answer (or maybe it isn't? > Aren't there cases where we punt if from a binary op we get base terms from > both sides and just use one if we get it only from one side? If so, > we might need to have some kind of annotated return, this could have a base > term but please don't use it). Yes, we might run into the "wrong" one via binary op handling, so there isn't really a conservative answer here :/ > I guess the only non-weird case would be if the target has weird pointer > sizes and only has larger or smaller ints, say 24-bit pointer, and 8/16/32 > integers, so the aligning then could be & 0xfffff8 etc. Yeah. Or the weird ones are exposed by nonzero bits "optimizations" of constants. Richard.
On 11/18/19 3:37 AM, Richard Biener wrote: > On Mon, 18 Nov 2019, Jakub Jelinek wrote: > >> On Mon, Nov 18, 2019 at 11:07:22AM +0100, Richard Biener wrote: >>> On Mon, 18 Nov 2019, Jakub Jelinek wrote: >>> >>>> On Mon, Nov 18, 2019 at 10:44:14AM +0100, Richard Biener wrote: >>>>> The following adjusts the find_base_{term,value} heuristic when >>>>> looking through ANDs to the case where it is obviously not >>>>> aligning a base (when the LSB is set). >>>> >>>> What is specific about the LSB? I mean & 2 is obviously not an aligning >>>> AND either. >>> >>> It aligns 0x1 and 0x3 ;) >>> >>>> Shouldn't we recurse only if the CONST_INT_P operand has >>>> some set bits followed by clear bits, i.e. after the != 0 check >>>> compute ctz_hwi and verify that INTVAL >> ctz == -1? >>> >>> I thought of more advanced heuristics but I guess that >>> any contiguous set of bits with LSB unset might be aligning >>> if the programmer knows upper bits are zero anyways? So I fear >>> the == -1 test would not work reliable? >> >> I'd say once a user does & 0x1ff8 or similar, he is doing something weird, >> and not recursing is the conservatively correct answer (or maybe it isn't? >> Aren't there cases where we punt if from a binary op we get base terms from >> both sides and just use one if we get it only from one side? If so, >> we might need to have some kind of annotated return, this could have a base >> term but please don't use it). > > Yes, we might run into the "wrong" one via binary op handling, so > there isn't really a conservative answer here :/ > >> I guess the only non-weird case would be if the target has weird pointer >> sizes and only has larger or smaller ints, say 24-bit pointer, and 8/16/32 >> integers, so the aligning then could be & 0xfffff8 etc. > > Yeah. Or the weird ones are exposed by nonzero bits "optimizations" > of constants. IIRC all the low order bitmask handling for pointers was primarily to benefit the Alpha. I think over time we saw it was helpful for varargs/stdarg, but that's about it. I'd be surprised if it's all that important to dig deeply into addresses of this form. jeff
On Mon, 18 Nov 2019, Jeff Law wrote: > On 11/18/19 3:37 AM, Richard Biener wrote: > > On Mon, 18 Nov 2019, Jakub Jelinek wrote: > > > >> On Mon, Nov 18, 2019 at 11:07:22AM +0100, Richard Biener wrote: > >>> On Mon, 18 Nov 2019, Jakub Jelinek wrote: > >>> > >>>> On Mon, Nov 18, 2019 at 10:44:14AM +0100, Richard Biener wrote: > >>>>> The following adjusts the find_base_{term,value} heuristic when > >>>>> looking through ANDs to the case where it is obviously not > >>>>> aligning a base (when the LSB is set). > >>>> > >>>> What is specific about the LSB? I mean & 2 is obviously not an aligning > >>>> AND either. > >>> > >>> It aligns 0x1 and 0x3 ;) > >>> > >>>> Shouldn't we recurse only if the CONST_INT_P operand has > >>>> some set bits followed by clear bits, i.e. after the != 0 check > >>>> compute ctz_hwi and verify that INTVAL >> ctz == -1? > >>> > >>> I thought of more advanced heuristics but I guess that > >>> any contiguous set of bits with LSB unset might be aligning > >>> if the programmer knows upper bits are zero anyways? So I fear > >>> the == -1 test would not work reliable? > >> > >> I'd say once a user does & 0x1ff8 or similar, he is doing something weird, > >> and not recursing is the conservatively correct answer (or maybe it isn't? > >> Aren't there cases where we punt if from a binary op we get base terms from > >> both sides and just use one if we get it only from one side? If so, > >> we might need to have some kind of annotated return, this could have a base > >> term but please don't use it). > > > > Yes, we might run into the "wrong" one via binary op handling, so > > there isn't really a conservative answer here :/ > > > >> I guess the only non-weird case would be if the target has weird pointer > >> sizes and only has larger or smaller ints, say 24-bit pointer, and 8/16/32 > >> integers, so the aligning then could be & 0xfffff8 etc. > > > > Yeah. Or the weird ones are exposed by nonzero bits "optimizations" > > of constants. > IIRC all the low order bitmask handling for pointers was primarily to > benefit the Alpha. I think over time we saw it was helpful for > varargs/stdarg, but that's about it. I'd be surprised if it's all that > important to dig deeply into addresses of this form. The whole find_base_{value,term} thing is most important for stack accesses which we otherwise can't handle well in aliasing and code effects mostly materialize in DSE. See my analysis in PR49330. But the code is really broken and we're clawing to the extra DSE in exchange for these wrong-code issues... :/ IMHO the appropriate way to go is to for example apply the patch from comment#18 and try to fix code quality regressions in a different way. Unfortunately the way RTL represents stack accesses and implications for aliasing is a mystery to me so I can't help much there - still I have received only push-back in making find_base_{value,term} correct in the last 10 years (well, it feels like 10 years). After all, it's only about one wrong-code bug caused by this per year making it into our bugzilla. Richard.
On 11/19/19 12:42 AM, Richard Biener wrote: > On Mon, 18 Nov 2019, Jeff Law wrote: > >> On 11/18/19 3:37 AM, Richard Biener wrote: >>> On Mon, 18 Nov 2019, Jakub Jelinek wrote: >>> >>>> On Mon, Nov 18, 2019 at 11:07:22AM +0100, Richard Biener wrote: >>>>> On Mon, 18 Nov 2019, Jakub Jelinek wrote: >>>>> >>>>>> On Mon, Nov 18, 2019 at 10:44:14AM +0100, Richard Biener wrote: >>>>>>> The following adjusts the find_base_{term,value} heuristic when >>>>>>> looking through ANDs to the case where it is obviously not >>>>>>> aligning a base (when the LSB is set). >>>>>> >>>>>> What is specific about the LSB? I mean & 2 is obviously not an aligning >>>>>> AND either. >>>>> >>>>> It aligns 0x1 and 0x3 ;) >>>>> >>>>>> Shouldn't we recurse only if the CONST_INT_P operand has >>>>>> some set bits followed by clear bits, i.e. after the != 0 check >>>>>> compute ctz_hwi and verify that INTVAL >> ctz == -1? >>>>> >>>>> I thought of more advanced heuristics but I guess that >>>>> any contiguous set of bits with LSB unset might be aligning >>>>> if the programmer knows upper bits are zero anyways? So I fear >>>>> the == -1 test would not work reliable? >>>> >>>> I'd say once a user does & 0x1ff8 or similar, he is doing something weird, >>>> and not recursing is the conservatively correct answer (or maybe it isn't? >>>> Aren't there cases where we punt if from a binary op we get base terms from >>>> both sides and just use one if we get it only from one side? If so, >>>> we might need to have some kind of annotated return, this could have a base >>>> term but please don't use it). >>> >>> Yes, we might run into the "wrong" one via binary op handling, so >>> there isn't really a conservative answer here :/ >>> >>>> I guess the only non-weird case would be if the target has weird pointer >>>> sizes and only has larger or smaller ints, say 24-bit pointer, and 8/16/32 >>>> integers, so the aligning then could be & 0xfffff8 etc. >>> >>> Yeah. Or the weird ones are exposed by nonzero bits "optimizations" >>> of constants. >> IIRC all the low order bitmask handling for pointers was primarily to >> benefit the Alpha. I think over time we saw it was helpful for >> varargs/stdarg, but that's about it. I'd be surprised if it's all that >> important to dig deeply into addresses of this form. > > The whole find_base_{value,term} thing is most important for > stack accesses which we otherwise can't handle well in aliasing > and code effects mostly materialize in DSE. See my analysis > in PR49330. But the code is really broken and we're clawing > to the extra DSE in exchange for these wrong-code issues... :/ I'd rather fix the wrong-code issues :-) Note that REG_POINTER should already be conservative, if it isn't, then that needs to be fixed. Some backends certainly depend on anything with REG_POINTER actually being a valid pointer. For this RTX in c#18: > (plus:DI (reg:DI 83 [ d.0_2 ]) > (symbol_ref:DI ("y") [flags 0x2] <var_decl 0x7ffff7fefb40 y>)) I don't see how (reg 83) could ever be marked as REG_POINTER. It's an index, not a pointer. I would _not_ expect the result (assuming its stored in a REG) to be marked with REG_POINTER either since we don't know if the addition of (reg 83) creates a result outside the object (without additional information not in the BZ). Jeff
On Wed, 20 Nov 2019, Jeff Law wrote: > On 11/19/19 12:42 AM, Richard Biener wrote: > > On Mon, 18 Nov 2019, Jeff Law wrote: > > > >> On 11/18/19 3:37 AM, Richard Biener wrote: > >>> On Mon, 18 Nov 2019, Jakub Jelinek wrote: > >>> > >>>> On Mon, Nov 18, 2019 at 11:07:22AM +0100, Richard Biener wrote: > >>>>> On Mon, 18 Nov 2019, Jakub Jelinek wrote: > >>>>> > >>>>>> On Mon, Nov 18, 2019 at 10:44:14AM +0100, Richard Biener wrote: > >>>>>>> The following adjusts the find_base_{term,value} heuristic when > >>>>>>> looking through ANDs to the case where it is obviously not > >>>>>>> aligning a base (when the LSB is set). > >>>>>> > >>>>>> What is specific about the LSB? I mean & 2 is obviously not an aligning > >>>>>> AND either. > >>>>> > >>>>> It aligns 0x1 and 0x3 ;) > >>>>> > >>>>>> Shouldn't we recurse only if the CONST_INT_P operand has > >>>>>> some set bits followed by clear bits, i.e. after the != 0 check > >>>>>> compute ctz_hwi and verify that INTVAL >> ctz == -1? > >>>>> > >>>>> I thought of more advanced heuristics but I guess that > >>>>> any contiguous set of bits with LSB unset might be aligning > >>>>> if the programmer knows upper bits are zero anyways? So I fear > >>>>> the == -1 test would not work reliable? > >>>> > >>>> I'd say once a user does & 0x1ff8 or similar, he is doing something weird, > >>>> and not recursing is the conservatively correct answer (or maybe it isn't? > >>>> Aren't there cases where we punt if from a binary op we get base terms from > >>>> both sides and just use one if we get it only from one side? If so, > >>>> we might need to have some kind of annotated return, this could have a base > >>>> term but please don't use it). > >>> > >>> Yes, we might run into the "wrong" one via binary op handling, so > >>> there isn't really a conservative answer here :/ > >>> > >>>> I guess the only non-weird case would be if the target has weird pointer > >>>> sizes and only has larger or smaller ints, say 24-bit pointer, and 8/16/32 > >>>> integers, so the aligning then could be & 0xfffff8 etc. > >>> > >>> Yeah. Or the weird ones are exposed by nonzero bits "optimizations" > >>> of constants. > >> IIRC all the low order bitmask handling for pointers was primarily to > >> benefit the Alpha. I think over time we saw it was helpful for > >> varargs/stdarg, but that's about it. I'd be surprised if it's all that > >> important to dig deeply into addresses of this form. > > > > The whole find_base_{value,term} thing is most important for > > stack accesses which we otherwise can't handle well in aliasing > > and code effects mostly materialize in DSE. See my analysis > > in PR49330. But the code is really broken and we're clawing > > to the extra DSE in exchange for these wrong-code issues... :/ > I'd rather fix the wrong-code issues :-) > > Note that REG_POINTER should already be conservative, if it isn't, then > that needs to be fixed. Some backends certainly depend on anything with > REG_POINTER actually being a valid pointer. > > For this RTX in c#18: > > > (plus:DI (reg:DI 83 [ d.0_2 ]) > > (symbol_ref:DI ("y") [flags 0x2] <var_decl 0x7ffff7fefb40 y>)) > > I don't see how (reg 83) could ever be marked as REG_POINTER. It's an > index, not a pointer. I would _not_ expect the result (assuming its > stored in a REG) to be marked with REG_POINTER either since we don't > know if the addition of (reg 83) creates a result outside the object > (without additional information not in the BZ). Sure, but we're not being called with this but with RTL expanded via cselib usually: (plus:SI (plus:SI (and:SI (symbol_ref:SI ("*.LANCHOR0") [flags 0x182]) (const_int 3 [0x3])) (value:SI 8:8 @0x3089c18/0x30f5e40)) (const_int -8 [0xfffffffffffffff8])) so there are no REGs involved on the "subexpressions" we could check REG_POINTER on ... and in fact the symbol_ref _is_ a pointer! It's just not something we should consider the base of the memory reference. We're talking about sth like *(&a + ((uintptr_t)&b - (uintptr_t)&a)) where it's nearly impossible to tell (on RTL, if the address parts are not split to separate insns and thus REG_POINTER is applicable) what object we are looking at for aliasing purposes. The appropriate way here is to not to too clever things with the address but instead rely on MEM_EXPR here - unfortunately that is non-existent for spill slots which is where most of the regressions with removing the code appear. Richard.
On 11/21/19 1:00 AM, Richard Biener wrote: > On Wed, 20 Nov 2019, Jeff Law wrote: > > Sure, but we're not being called with this but with RTL > expanded via cselib usually: > > (plus:SI (plus:SI (and:SI (symbol_ref:SI ("*.LANCHOR0") [flags 0x182]) > (const_int 3 [0x3])) > (value:SI 8:8 @0x3089c18/0x30f5e40)) > (const_int -8 [0xfffffffffffffff8])) Ugh. I'm suspect all the ANCHOR stuff came in well after the original code to try and optimize the low-order bit-logical masking for addressing. Without knowing something more, I'd claim that can't actually be a pointer. > > so there are no REGs involved on the "subexpressions" we could > check REG_POINTER on ... and in fact the symbol_ref _is_ > a pointer! It's just not something we should consider the > base of the memory reference. We're talking about sth like > > *(&a + ((uintptr_t)&b - (uintptr_t)&a)) > > where it's nearly impossible to tell (on RTL, if the address > parts are not split to separate insns and thus REG_POINTER is > applicable) what object we are looking at for aliasing purposes. Right. And in that case we have to make the most conservative assumptions. > > The appropriate way here is to not to too clever things with > the address but instead rely on MEM_EXPR here - unfortunately > that is non-existent for spill slots which is where most of > the regressions with removing the code appear. That's what doesn't make much sense to me. We ought to be able to analyze stack slots reasonably well. But the RTL DSE bits have changed a lot since Christian's implementation -- it actually used to be easy to follow. jeff
On Fri, 22 Nov 2019, Jeff Law wrote: > On 11/21/19 1:00 AM, Richard Biener wrote: > > On Wed, 20 Nov 2019, Jeff Law wrote: > > > > Sure, but we're not being called with this but with RTL > > expanded via cselib usually: > > > > (plus:SI (plus:SI (and:SI (symbol_ref:SI ("*.LANCHOR0") [flags 0x182]) > > (const_int 3 [0x3])) > > (value:SI 8:8 @0x3089c18/0x30f5e40)) > > (const_int -8 [0xfffffffffffffff8])) > Ugh. I'm suspect all the ANCHOR stuff came in well after the original > code to try and optimize the low-order bit-logical masking for addressing. The PR also shows an example w/o the ANCHOR. > Without knowing something more, I'd claim that can't actually be a pointer. > > > > > so there are no REGs involved on the "subexpressions" we could > > check REG_POINTER on ... and in fact the symbol_ref _is_ > > a pointer! It's just not something we should consider the > > base of the memory reference. We're talking about sth like > > > > *(&a + ((uintptr_t)&b - (uintptr_t)&a)) > > > > where it's nearly impossible to tell (on RTL, if the address > > parts are not split to separate insns and thus REG_POINTER is > > applicable) what object we are looking at for aliasing purposes. > Right. And in that case we have to make the most conservative > assumptions. Right. > > > > > The appropriate way here is to not to too clever things with > > the address but instead rely on MEM_EXPR here - unfortunately > > that is non-existent for spill slots which is where most of > > the regressions with removing the code appear. > That's what doesn't make much sense to me. We ought to be able to > analyze stack slots reasonably well. But the RTL DSE bits have changed > a lot since Christian's implementation -- it actually used to be easy > to follow. So alias analysis of stack slots vs. non-stack slots relies on finding some non-stack "base" in find_base_term, that's basically saying "yeah, this MEM cannot alias [this] stack slot". But we have to get those other MEMs out of the way to DSE stores to spill slots. It looks like MEM_EXPR (mem) == get_spill_slot_decl (false) should handle this somewhat but I guess that spill slot decl is aliased (marked as address taken) and thus that MEM_EXPR doesn't help us very much, plus DSE is happily throwing away MEM_EXPRs in its meta it throws to the alias oracle. Richard.
Index: gcc/alias.c =================================================================== --- gcc/alias.c (revision 278293) +++ gcc/alias.c (working copy) @@ -1464,9 +1464,11 @@ find_base_value (rtx src) return find_base_value (XEXP (src, 1)); case AND: - /* If the second operand is constant set the base - address to the first operand. */ - if (CONST_INT_P (XEXP (src, 1)) && INTVAL (XEXP (src, 1)) != 0) + /* Look through aligning ANDs. And AND with zero or one with + the LSB set isn't one (see for example PR92462). */ + if (CONST_INT_P (XEXP (src, 1)) + && INTVAL (XEXP (src, 1)) != 0 + && (INTVAL (XEXP (src, 1)) & 1) == 0) return find_base_value (XEXP (src, 0)); return 0; @@ -2024,7 +2026,11 @@ find_base_term (rtx x, vec<std::pair<cse } case AND: - if (CONST_INT_P (XEXP (x, 1)) && INTVAL (XEXP (x, 1)) != 0) + /* Look through aligning ANDs. And AND with zero or one with + the LSB set isn't one (see for example PR92462). */ + if (CONST_INT_P (XEXP (x, 1)) + && INTVAL (XEXP (x, 1)) != 0 + && (INTVAL (XEXP (x, 1)) & 1) == 0) return find_base_term (XEXP (x, 0), visited_vals); return 0;