Message ID | 87in5a2odu.fsf@arm.com |
---|---|
State | New |
Headers | show |
Series | Fold pointer range checks with equal spans | expand |
On Fri, 20 Jul 2018, Richard Sandiford wrote: > --- gcc/match.pd 2018-07-18 18:44:22.565914281 +0100 > +++ gcc/match.pd 2018-07-20 11:24:33.692045585 +0100 > @@ -4924,3 +4924,37 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT) > (if (inverse_conditions_p (@0, @2) > && element_precision (type) == element_precision (op_type)) > (view_convert (cond_op @2 @3 @4 @5 (view_convert:op_type @1))))))) > + > +/* For pointers @0 and @2 and unsigned constant offset @1, look for > + expressions like: > + > + A: (@0 + @1 < @2) | (@2 + @1 < @0) > + B: (@0 + @1 <= @2) | (@2 + @1 <= @0) > + > + If pointers are known not to wrap, B checks whether @1 bytes starting at > + @0 and @2 do not overlap, while A tests the same thing for @1 + 1 bytes. > + A is more efficiently tested as: > + > + ((sizetype) @0 - (sizetype) @2 + @1) > (@1 * 2) > + > + as long as @1 * 2 doesn't overflow. B is the same with @1 replaced > + with @1 - 1. */ > +(for ior (truth_orif truth_or bit_ior) > + (for cmp (le lt) > + (simplify > + (ior (cmp (pointer_plus:s @0 INTEGER_CST@1) @2) > + (cmp (pointer_plus:s @2 @1) @0)) Do you want :c on cmp, in case it appears as @2 > @0 + @1 ? (may need some care with "cmp == LE_EXPR" below) Do you want :s on cmp as well? > + (if (!flag_trapv && !TYPE_OVERFLOW_WRAPS (TREE_TYPE (@0))) Don't you want TYPE_OVERFLOW_UNDEFINED? > + /* Convert the B form to the A form. */ > + (with { offset_int off = wi::to_offset (@1) - (cmp == LE_EXPR ? 1 : 0); } > + /* Always fails for negative values. */ > + (if (wi::min_precision (off, UNSIGNED) * 2 <= TYPE_PRECISION (sizetype)) > + /* It doesn't matter whether we use @2 - @0 or @0 - @2, so let > + tree_swap_operands_p pick a canonical order. */ > + (with { tree ptr1 = @0, ptr2 = @2; > + if (tree_swap_operands_p (ptr1, ptr2)) > + std::swap (ptr1, ptr2); } > + (gt (plus (minus (convert:sizetype { ptr1; }) > + (convert:sizetype { ptr2; })) > + { wide_int_to_tree (sizetype, off); }) > + { wide_int_to_tree (sizetype, off * 2); }))))))))
Marc Glisse <marc.glisse@inria.fr> writes: > On Fri, 20 Jul 2018, Richard Sandiford wrote: > >> --- gcc/match.pd 2018-07-18 18:44:22.565914281 +0100 >> +++ gcc/match.pd 2018-07-20 11:24:33.692045585 +0100 >> @@ -4924,3 +4924,37 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT) >> (if (inverse_conditions_p (@0, @2) >> && element_precision (type) == element_precision (op_type)) >> (view_convert (cond_op @2 @3 @4 @5 (view_convert:op_type @1))))))) >> + >> +/* For pointers @0 and @2 and unsigned constant offset @1, look for >> + expressions like: >> + >> + A: (@0 + @1 < @2) | (@2 + @1 < @0) >> + B: (@0 + @1 <= @2) | (@2 + @1 <= @0) >> + >> + If pointers are known not to wrap, B checks whether @1 bytes starting at >> + @0 and @2 do not overlap, while A tests the same thing for @1 + 1 bytes. >> + A is more efficiently tested as: >> + >> + ((sizetype) @0 - (sizetype) @2 + @1) > (@1 * 2) >> + >> + as long as @1 * 2 doesn't overflow. B is the same with @1 replaced >> + with @1 - 1. */ >> +(for ior (truth_orif truth_or bit_ior) >> + (for cmp (le lt) >> + (simplify >> + (ior (cmp (pointer_plus:s @0 INTEGER_CST@1) @2) >> + (cmp (pointer_plus:s @2 @1) @0)) > > Do you want :c on cmp, in case it appears as @2 > @0 + @1 ? (may need some > care with "cmp == LE_EXPR" below) > Do you want :s on cmp as well? > >> + (if (!flag_trapv && !TYPE_OVERFLOW_WRAPS (TREE_TYPE (@0))) > > Don't you want TYPE_OVERFLOW_UNDEFINED? Thanks, fixed below. Think the cmp == LE_EXPR stuff is still ok with :c, since the generated code sets cmp to LE_EXPR when matching GE_EXPR. Tested as before. OK to install? Richard 2018-07-23 Richard Sandiford <richard.sandiford@arm.com> gcc/ * match.pd: Optimise pointer range checks. gcc/testsuite/ * gcc.dg/pointer-range-check-1.c: New test. * gcc.dg/pointer-range-check-2.c: Likewise. Index: gcc/match.pd =================================================================== --- gcc/match.pd 2018-07-23 15:56:47.000000000 +0100 +++ gcc/match.pd 2018-07-23 15:58:33.480269844 +0100 @@ -4924,3 +4924,37 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT) (if (inverse_conditions_p (@0, @2) && element_precision (type) == element_precision (op_type)) (view_convert (cond_op @2 @3 @4 @5 (view_convert:op_type @1))))))) + +/* For pointers @0 and @2 and unsigned constant offset @1, look for + expressions like: + + A: (@0 + @1 < @2) | (@2 + @1 < @0) + B: (@0 + @1 <= @2) | (@2 + @1 <= @0) + + If pointers are known not to wrap, B checks whether @1 bytes starting at + @0 and @2 do not overlap, while A tests the same thing for @1 + 1 bytes. + A is more efficiently tested as: + + ((sizetype) @0 - (sizetype) @2 + @1) > (@1 * 2) + + as long as @1 * 2 doesn't overflow. B is the same with @1 replaced + with @1 - 1. */ +(for ior (truth_orif truth_or bit_ior) + (for cmp (le lt) + (simplify + (ior (cmp:cs (pointer_plus:s @0 INTEGER_CST@1) @2) + (cmp:cs (pointer_plus:s @2 @1) @0)) + (if (!flag_trapv && TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (@0))) + /* Convert the B form to the A form. */ + (with { offset_int off = wi::to_offset (@1) - (cmp == LE_EXPR ? 1 : 0); } + /* Always fails for negative values. */ + (if (wi::min_precision (off, UNSIGNED) * 2 <= TYPE_PRECISION (sizetype)) + /* It doesn't matter whether we use @2 - @0 or @0 - @2, so let + tree_swap_operands_p pick a canonical order. */ + (with { tree ptr1 = @0, ptr2 = @2; + if (tree_swap_operands_p (ptr1, ptr2)) + std::swap (ptr1, ptr2); } + (gt (plus (minus (convert:sizetype { ptr1; }) + (convert:sizetype { ptr2; })) + { wide_int_to_tree (sizetype, off); }) + { wide_int_to_tree (sizetype, off * 2); })))))))) Index: gcc/testsuite/gcc.dg/pointer-range-check-1.c =================================================================== --- /dev/null 2018-07-09 14:52:09.234750850 +0100 +++ gcc/testsuite/gcc.dg/pointer-range-check-1.c 2018-07-23 15:58:33.480269844 +0100 @@ -0,0 +1,37 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -fno-ipa-icf -fdump-tree-optimized" } */ + +/* All four functions should be folded to: + + ((sizetype) a - (sizetype) b + 15) < 30. */ + +_Bool +f1 (char *a, char *b) +{ + return (a + 16 <= b) || (b + 16 <= a); +} + +_Bool +f2 (char *a, char *b) +{ + return (a + 15 < b) || (b + 15 < a); +} + +_Bool +f3 (char *a, char *b) +{ + return (a + 16 <= b) || (b + 16 <= a); +} + +_Bool +f4 (char *a, char *b) +{ + return (a + 15 < b) || (b + 15 < a); +} + +/* { dg-final { scan-tree-dump-times { = [^\n]* - [^\n]*;} 4 "optimized" } } */ +/* { dg-final { scan-tree-dump-times { = [^\n]* \+ [^\n]*;} 4 "optimized" } } */ +/* { dg-final { scan-tree-dump-times { = [^\n]*\ > [^\n]*;} 4 "optimized" } } */ +/* { dg-final { scan-tree-dump-not {=[^\n]*\ < [^\n]*;} "optimized" } } */ +/* { dg-final { scan-tree-dump-times { \+ 15} 4 "optimized" } } */ +/* { dg-final { scan-tree-dump-times { > 30} 4 "optimized" } } */ Index: gcc/testsuite/gcc.dg/pointer-range-check-2.c =================================================================== --- /dev/null 2018-07-09 14:52:09.234750850 +0100 +++ gcc/testsuite/gcc.dg/pointer-range-check-2.c 2018-07-23 15:58:33.480269844 +0100 @@ -0,0 +1,31 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -fno-ipa-icf -fwrapv-pointer -fdump-tree-optimized" } */ + +_Bool +f1 (char *a, char *b) +{ + return (a + 16 <= b) || (b + 16 <= a); +} + +_Bool +f2 (char *a, char *b) +{ + return (a + 15 < b) || (b + 15 < a); +} + +_Bool +f3 (char *a, char *b) +{ + return (a + 16 <= b) || (b + 16 <= a); +} + +_Bool +f4 (char *a, char *b) +{ + return (a + 15 < b) || (b + 15 < a); +} + +/* { dg-final { scan-tree-dump-not { = [^\n]* - [^\n]*;} "optimized" } } */ +/* { dg-final { scan-tree-dump-times { = [^\n]* \+ [^\n]*;} 8 "optimized" } } */ +/* { dg-final { scan-tree-dump-times { \+ 15} 4 "optimized" } } */ +/* { dg-final { scan-tree-dump-times { \+ 16} 4 "optimized" } } */
On Mon, Jul 23, 2018 at 5:05 PM Richard Sandiford <richard.sandiford@arm.com> wrote: > > Marc Glisse <marc.glisse@inria.fr> writes: > > On Fri, 20 Jul 2018, Richard Sandiford wrote: > > > >> --- gcc/match.pd 2018-07-18 18:44:22.565914281 +0100 > >> +++ gcc/match.pd 2018-07-20 11:24:33.692045585 +0100 > >> @@ -4924,3 +4924,37 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT) > >> (if (inverse_conditions_p (@0, @2) > >> && element_precision (type) == element_precision (op_type)) > >> (view_convert (cond_op @2 @3 @4 @5 (view_convert:op_type @1))))))) > >> + > >> +/* For pointers @0 and @2 and unsigned constant offset @1, look for > >> + expressions like: > >> + > >> + A: (@0 + @1 < @2) | (@2 + @1 < @0) > >> + B: (@0 + @1 <= @2) | (@2 + @1 <= @0) > >> + > >> + If pointers are known not to wrap, B checks whether @1 bytes starting at > >> + @0 and @2 do not overlap, while A tests the same thing for @1 + 1 bytes. > >> + A is more efficiently tested as: > >> + > >> + ((sizetype) @0 - (sizetype) @2 + @1) > (@1 * 2) > >> + > >> + as long as @1 * 2 doesn't overflow. B is the same with @1 replaced > >> + with @1 - 1. */ > >> +(for ior (truth_orif truth_or bit_ior) > >> + (for cmp (le lt) > >> + (simplify > >> + (ior (cmp (pointer_plus:s @0 INTEGER_CST@1) @2) > >> + (cmp (pointer_plus:s @2 @1) @0)) > > > > Do you want :c on cmp, in case it appears as @2 > @0 + @1 ? (may need some > > care with "cmp == LE_EXPR" below) > > Do you want :s on cmp as well? > > > >> + (if (!flag_trapv && !TYPE_OVERFLOW_WRAPS (TREE_TYPE (@0))) > > > > Don't you want TYPE_OVERFLOW_UNDEFINED? > > Thanks, fixed below. Think the cmp == LE_EXPR stuff is still ok with :c, > since the generated code sets cmp to LE_EXPR when matching GE_EXPR. > > Tested as before. OK to install? > > Richard > > > 2018-07-23 Richard Sandiford <richard.sandiford@arm.com> > > gcc/ > * match.pd: Optimise pointer range checks. > > gcc/testsuite/ > * gcc.dg/pointer-range-check-1.c: New test. > * gcc.dg/pointer-range-check-2.c: Likewise. > > Index: gcc/match.pd > =================================================================== > --- gcc/match.pd 2018-07-23 15:56:47.000000000 +0100 > +++ gcc/match.pd 2018-07-23 15:58:33.480269844 +0100 > @@ -4924,3 +4924,37 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT) > (if (inverse_conditions_p (@0, @2) > && element_precision (type) == element_precision (op_type)) > (view_convert (cond_op @2 @3 @4 @5 (view_convert:op_type @1))))))) > + > +/* For pointers @0 and @2 and unsigned constant offset @1, look for > + expressions like: > + > + A: (@0 + @1 < @2) | (@2 + @1 < @0) > + B: (@0 + @1 <= @2) | (@2 + @1 <= @0) > + > + If pointers are known not to wrap, B checks whether @1 bytes starting at > + @0 and @2 do not overlap, while A tests the same thing for @1 + 1 bytes. > + A is more efficiently tested as: > + > + ((sizetype) @0 - (sizetype) @2 + @1) > (@1 * 2) > + > + as long as @1 * 2 doesn't overflow. B is the same with @1 replaced > + with @1 - 1. */ > +(for ior (truth_orif truth_or bit_ior) > + (for cmp (le lt) > + (simplify > + (ior (cmp:cs (pointer_plus:s @0 INTEGER_CST@1) @2) > + (cmp:cs (pointer_plus:s @2 @1) @0)) > + (if (!flag_trapv && TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (@0))) no need to check flag_trapv, pointer arithmetic is not covered by -ftrapv. > + /* Convert the B form to the A form. */ > + (with { offset_int off = wi::to_offset (@1) - (cmp == LE_EXPR ? 1 : 0); } > + /* Always fails for negative values. */ > + (if (wi::min_precision (off, UNSIGNED) * 2 <= TYPE_PRECISION (sizetype)) > + /* It doesn't matter whether we use @2 - @0 or @0 - @2, so let > + tree_swap_operands_p pick a canonical order. */ gimple_resimplify takes care of that - well, it doesn't since minus isn't commutative... I guess you get better CSE later when doing this thus ok, but it does look a bit off here ;) I think you shouldn't use 'sizetype' without guarding this whole thing with TYPE_PRECISION (sizetype) == TYPE_PRECISION (TREE_TYPE (@0)). Since the original IL performs an ordered compare of two eventually unrelated pointers (or pointers adjusted in a way that crosses object boundaries) (uh... ;)) I wonder if you can use POINTER_DIFF_EXPR here to avoid the sizetype conversion? Since POINTER_PLUS_EXPR offsets are supposed to be interpreted "signed" that should also handle negative offsets correctly then? Also, when @2 == @0 + (@1+1) then the original condition is true but ((sizetype) @0 - (sizetype) @2 + @1) > (@1 * 2) is not? (sizetype) @0 - (sizetype) (@0 + @1 + 1) + @1 > @1 * 2 -> -1 > @1 * 2 which is false. So I can't really see how you can apply this transform in general (it would be fine for generating alias checks but a bit more pessimistic). But maybe I am missing something? Richard. > + (with { tree ptr1 = @0, ptr2 = @2; > + if (tree_swap_operands_p (ptr1, ptr2)) > + std::swap (ptr1, ptr2); } > + (gt (plus (minus (convert:sizetype { ptr1; }) > + (convert:sizetype { ptr2; })) > + { wide_int_to_tree (sizetype, off); }) > + { wide_int_to_tree (sizetype, off * 2); })))))))) > Index: gcc/testsuite/gcc.dg/pointer-range-check-1.c > =================================================================== > --- /dev/null 2018-07-09 14:52:09.234750850 +0100 > +++ gcc/testsuite/gcc.dg/pointer-range-check-1.c 2018-07-23 15:58:33.480269844 +0100 > @@ -0,0 +1,37 @@ > +/* { dg-do compile } */ > +/* { dg-options "-O2 -fno-ipa-icf -fdump-tree-optimized" } */ > + > +/* All four functions should be folded to: > + > + ((sizetype) a - (sizetype) b + 15) < 30. */ > + > +_Bool > +f1 (char *a, char *b) > +{ > + return (a + 16 <= b) || (b + 16 <= a); > +} > + > +_Bool > +f2 (char *a, char *b) > +{ > + return (a + 15 < b) || (b + 15 < a); > +} > + > +_Bool > +f3 (char *a, char *b) > +{ > + return (a + 16 <= b) || (b + 16 <= a); > +} > + > +_Bool > +f4 (char *a, char *b) > +{ > + return (a + 15 < b) || (b + 15 < a); > +} > + > +/* { dg-final { scan-tree-dump-times { = [^\n]* - [^\n]*;} 4 "optimized" } } */ > +/* { dg-final { scan-tree-dump-times { = [^\n]* \+ [^\n]*;} 4 "optimized" } } */ > +/* { dg-final { scan-tree-dump-times { = [^\n]*\ > [^\n]*;} 4 "optimized" } } */ > +/* { dg-final { scan-tree-dump-not {=[^\n]*\ < [^\n]*;} "optimized" } } */ > +/* { dg-final { scan-tree-dump-times { \+ 15} 4 "optimized" } } */ > +/* { dg-final { scan-tree-dump-times { > 30} 4 "optimized" } } */ > Index: gcc/testsuite/gcc.dg/pointer-range-check-2.c > =================================================================== > --- /dev/null 2018-07-09 14:52:09.234750850 +0100 > +++ gcc/testsuite/gcc.dg/pointer-range-check-2.c 2018-07-23 15:58:33.480269844 +0100 > @@ -0,0 +1,31 @@ > +/* { dg-do compile } */ > +/* { dg-options "-O2 -fno-ipa-icf -fwrapv-pointer -fdump-tree-optimized" } */ > + > +_Bool > +f1 (char *a, char *b) > +{ > + return (a + 16 <= b) || (b + 16 <= a); > +} > + > +_Bool > +f2 (char *a, char *b) > +{ > + return (a + 15 < b) || (b + 15 < a); > +} > + > +_Bool > +f3 (char *a, char *b) > +{ > + return (a + 16 <= b) || (b + 16 <= a); > +} > + > +_Bool > +f4 (char *a, char *b) > +{ > + return (a + 15 < b) || (b + 15 < a); > +} > + > +/* { dg-final { scan-tree-dump-not { = [^\n]* - [^\n]*;} "optimized" } } */ > +/* { dg-final { scan-tree-dump-times { = [^\n]* \+ [^\n]*;} 8 "optimized" } } */ > +/* { dg-final { scan-tree-dump-times { \+ 15} 4 "optimized" } } */ > +/* { dg-final { scan-tree-dump-times { \+ 16} 4 "optimized" } } */
[Sorry, somehow missed this till now] Richard Biener <richard.guenther@gmail.com> writes: > On Mon, Jul 23, 2018 at 5:05 PM Richard Sandiford > <richard.sandiford@arm.com> wrote: >> >> Marc Glisse <marc.glisse@inria.fr> writes: >> > On Fri, 20 Jul 2018, Richard Sandiford wrote: >> > >> >> --- gcc/match.pd 2018-07-18 18:44:22.565914281 +0100 >> >> +++ gcc/match.pd 2018-07-20 11:24:33.692045585 +0100 >> >> @@ -4924,3 +4924,37 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT) >> >> (if (inverse_conditions_p (@0, @2) >> >> && element_precision (type) == element_precision (op_type)) >> >> (view_convert (cond_op @2 @3 @4 @5 (view_convert:op_type @1))))))) >> >> + >> >> +/* For pointers @0 and @2 and unsigned constant offset @1, look for >> >> + expressions like: >> >> + >> >> + A: (@0 + @1 < @2) | (@2 + @1 < @0) >> >> + B: (@0 + @1 <= @2) | (@2 + @1 <= @0) >> >> + >> >> + If pointers are known not to wrap, B checks whether @1 bytes starting at >> >> + @0 and @2 do not overlap, while A tests the same thing for @1 + 1 bytes. >> >> + A is more efficiently tested as: >> >> + >> >> + ((sizetype) @0 - (sizetype) @2 + @1) > (@1 * 2) >> >> + >> >> + as long as @1 * 2 doesn't overflow. B is the same with @1 replaced >> >> + with @1 - 1. */ >> >> +(for ior (truth_orif truth_or bit_ior) >> >> + (for cmp (le lt) >> >> + (simplify >> >> + (ior (cmp (pointer_plus:s @0 INTEGER_CST@1) @2) >> >> + (cmp (pointer_plus:s @2 @1) @0)) >> > >> > Do you want :c on cmp, in case it appears as @2 > @0 + @1 ? (may need some >> > care with "cmp == LE_EXPR" below) >> > Do you want :s on cmp as well? >> > >> >> + (if (!flag_trapv && !TYPE_OVERFLOW_WRAPS (TREE_TYPE (@0))) >> > >> > Don't you want TYPE_OVERFLOW_UNDEFINED? >> >> Thanks, fixed below. Think the cmp == LE_EXPR stuff is still ok with :c, >> since the generated code sets cmp to LE_EXPR when matching GE_EXPR. >> >> Tested as before. OK to install? >> >> Richard >> >> >> 2018-07-23 Richard Sandiford <richard.sandiford@arm.com> >> >> gcc/ >> * match.pd: Optimise pointer range checks. >> >> gcc/testsuite/ >> * gcc.dg/pointer-range-check-1.c: New test. >> * gcc.dg/pointer-range-check-2.c: Likewise. >> >> Index: gcc/match.pd >> =================================================================== >> --- gcc/match.pd 2018-07-23 15:56:47.000000000 +0100 >> +++ gcc/match.pd 2018-07-23 15:58:33.480269844 +0100 >> @@ -4924,3 +4924,37 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT) >> (if (inverse_conditions_p (@0, @2) >> && element_precision (type) == element_precision (op_type)) >> (view_convert (cond_op @2 @3 @4 @5 (view_convert:op_type @1))))))) >> + >> +/* For pointers @0 and @2 and unsigned constant offset @1, look for >> + expressions like: >> + >> + A: (@0 + @1 < @2) | (@2 + @1 < @0) >> + B: (@0 + @1 <= @2) | (@2 + @1 <= @0) >> + >> + If pointers are known not to wrap, B checks whether @1 bytes starting at >> + @0 and @2 do not overlap, while A tests the same thing for @1 + 1 bytes. >> + A is more efficiently tested as: >> + >> + ((sizetype) @0 - (sizetype) @2 + @1) > (@1 * 2) >> + >> + as long as @1 * 2 doesn't overflow. B is the same with @1 replaced >> + with @1 - 1. */ >> +(for ior (truth_orif truth_or bit_ior) >> + (for cmp (le lt) >> + (simplify >> + (ior (cmp:cs (pointer_plus:s @0 INTEGER_CST@1) @2) >> + (cmp:cs (pointer_plus:s @2 @1) @0)) >> + (if (!flag_trapv && TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (@0))) > > no need to check flag_trapv, pointer arithmetic is not covered by -ftrapv. This was there because we're converting pointer arithmetic into integer arithmetic, and -ftrapv could cause that new integer code to trap. TYPE_OVERFLOW_UNDEFINED says that pointer overflow is undefined, but it seemed bad form to make it actually trap, especially when the above does some reassociation too. >> + /* Convert the B form to the A form. */ >> + (with { offset_int off = wi::to_offset (@1) - (cmp == LE_EXPR ? 1 : >> 0); } >> + /* Always fails for negative values. */ >> + (if (wi::min_precision (off, UNSIGNED) * 2 <= TYPE_PRECISION >> (sizetype)) >> + /* It doesn't matter whether we use @2 - @0 or @0 - @2, so let >> + tree_swap_operands_p pick a canonical order. */ > > gimple_resimplify takes care of that - well, it doesn't since minus isn't > commutative... I guess you get better CSE later when doing this thus ok, > but it does look a bit off here ;) > > I think you shouldn't use 'sizetype' without guarding this whole thing > with TYPE_PRECISION (sizetype) == TYPE_PRECISION (TREE_TYPE (@0)). OK, hadn't realised that could be false. Would building the appropriate unsigned type be OK without the condition, or does it need to be sizetype? > Since the original IL performs an ordered compare of two eventually unrelated > pointers (or pointers adjusted in a way that crosses object > boundaries) (uh... ;)) > I wonder if you can use POINTER_DIFF_EXPR here to avoid the sizetype > conversion? Since POINTER_PLUS_EXPR offsets are supposed to be > interpreted "signed" that should also handle negative offsets correctly then? The transform isn't valid when @1 is negative though, so I think we need that check either way. I could use POINTER_DIFF_EXPR and convert to unsigned though, if that seems better. That might also allow us to CSE the retained POINTER_PLUS_EXPRs. > Also, when @2 == @0 + (@1+1) then the original condition is true but > ((sizetype) @0 - (sizetype) @2 + @1) > (@1 * 2) is not? > (sizetype) @0 - (sizetype) (@0 + @1 + 1) + @1 > @1 * 2 > -> -1 > @1 * 2 > > which is false. So I can't really see how you can apply this transform in > general (it would be fine for generating alias checks but a bit more > pessimistic). > > But maybe I am missing something? It relies on sizetype being unsigned: (sizetype)-1 > @1 * 2 is true. Thanks, Richard > Richard. > >> + (with { tree ptr1 = @0, ptr2 = @2; >> + if (tree_swap_operands_p (ptr1, ptr2)) >> + std::swap (ptr1, ptr2); } >> + (gt (plus (minus (convert:sizetype { ptr1; }) >> + (convert:sizetype { ptr2; })) >> + { wide_int_to_tree (sizetype, off); }) >> + { wide_int_to_tree (sizetype, off * 2); })))))))) >> Index: gcc/testsuite/gcc.dg/pointer-range-check-1.c >> =================================================================== >> --- /dev/null 2018-07-09 14:52:09.234750850 +0100 >> +++ gcc/testsuite/gcc.dg/pointer-range-check-1.c 2018-07-23 >> 15:58:33.480269844 +0100 >> @@ -0,0 +1,37 @@ >> +/* { dg-do compile } */ >> +/* { dg-options "-O2 -fno-ipa-icf -fdump-tree-optimized" } */ >> + >> +/* All four functions should be folded to: >> + >> + ((sizetype) a - (sizetype) b + 15) < 30. */ >> + >> +_Bool >> +f1 (char *a, char *b) >> +{ >> + return (a + 16 <= b) || (b + 16 <= a); >> +} >> + >> +_Bool >> +f2 (char *a, char *b) >> +{ >> + return (a + 15 < b) || (b + 15 < a); >> +} >> + >> +_Bool >> +f3 (char *a, char *b) >> +{ >> + return (a + 16 <= b) || (b + 16 <= a); >> +} >> + >> +_Bool >> +f4 (char *a, char *b) >> +{ >> + return (a + 15 < b) || (b + 15 < a); >> +} >> + >> +/* { dg-final { scan-tree-dump-times { = [^\n]* - [^\n]*;} 4 >> "optimized" } } */ >> +/* { dg-final { scan-tree-dump-times { = [^\n]* \+ [^\n]*;} 4 >> "optimized" } } */ >> +/* { dg-final { scan-tree-dump-times { = [^\n]*\ > [^\n]*;} 4 >> "optimized" } } */ >> +/* { dg-final { scan-tree-dump-not {=[^\n]*\ < [^\n]*;} "optimized" } } */ >> +/* { dg-final { scan-tree-dump-times { \+ 15} 4 "optimized" } } */ >> +/* { dg-final { scan-tree-dump-times { > 30} 4 "optimized" } } */ >> Index: gcc/testsuite/gcc.dg/pointer-range-check-2.c >> =================================================================== >> --- /dev/null 2018-07-09 14:52:09.234750850 +0100 >> +++ gcc/testsuite/gcc.dg/pointer-range-check-2.c 2018-07-23 >> 15:58:33.480269844 +0100 >> @@ -0,0 +1,31 @@ >> +/* { dg-do compile } */ >> +/* { dg-options "-O2 -fno-ipa-icf -fwrapv-pointer >> -fdump-tree-optimized" } */ >> + >> +_Bool >> +f1 (char *a, char *b) >> +{ >> + return (a + 16 <= b) || (b + 16 <= a); >> +} >> + >> +_Bool >> +f2 (char *a, char *b) >> +{ >> + return (a + 15 < b) || (b + 15 < a); >> +} >> + >> +_Bool >> +f3 (char *a, char *b) >> +{ >> + return (a + 16 <= b) || (b + 16 <= a); >> +} >> + >> +_Bool >> +f4 (char *a, char *b) >> +{ >> + return (a + 15 < b) || (b + 15 < a); >> +} >> + >> +/* { dg-final { scan-tree-dump-not { = [^\n]* - [^\n]*;} "optimized" } } */ >> +/* { dg-final { scan-tree-dump-times { = [^\n]* \+ [^\n]*;} 8 >> "optimized" } } */ >> +/* { dg-final { scan-tree-dump-times { \+ 15} 4 "optimized" } } */ >> +/* { dg-final { scan-tree-dump-times { \+ 16} 4 "optimized" } } */
On Mon, Jul 30, 2018 at 7:47 PM Richard Sandiford <richard.sandiford@arm.com> wrote: > > [Sorry, somehow missed this till now] > > Richard Biener <richard.guenther@gmail.com> writes: > > On Mon, Jul 23, 2018 at 5:05 PM Richard Sandiford > > <richard.sandiford@arm.com> wrote: > >> > >> Marc Glisse <marc.glisse@inria.fr> writes: > >> > On Fri, 20 Jul 2018, Richard Sandiford wrote: > >> > > >> >> --- gcc/match.pd 2018-07-18 18:44:22.565914281 +0100 > >> >> +++ gcc/match.pd 2018-07-20 11:24:33.692045585 +0100 > >> >> @@ -4924,3 +4924,37 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT) > >> >> (if (inverse_conditions_p (@0, @2) > >> >> && element_precision (type) == element_precision (op_type)) > >> >> (view_convert (cond_op @2 @3 @4 @5 (view_convert:op_type @1))))))) > >> >> + > >> >> +/* For pointers @0 and @2 and unsigned constant offset @1, look for > >> >> + expressions like: > >> >> + > >> >> + A: (@0 + @1 < @2) | (@2 + @1 < @0) > >> >> + B: (@0 + @1 <= @2) | (@2 + @1 <= @0) > >> >> + > >> >> + If pointers are known not to wrap, B checks whether @1 bytes starting at > >> >> + @0 and @2 do not overlap, while A tests the same thing for @1 + 1 bytes. > >> >> + A is more efficiently tested as: > >> >> + > >> >> + ((sizetype) @0 - (sizetype) @2 + @1) > (@1 * 2) > >> >> + > >> >> + as long as @1 * 2 doesn't overflow. B is the same with @1 replaced > >> >> + with @1 - 1. */ > >> >> +(for ior (truth_orif truth_or bit_ior) > >> >> + (for cmp (le lt) > >> >> + (simplify > >> >> + (ior (cmp (pointer_plus:s @0 INTEGER_CST@1) @2) > >> >> + (cmp (pointer_plus:s @2 @1) @0)) > >> > > >> > Do you want :c on cmp, in case it appears as @2 > @0 + @1 ? (may need some > >> > care with "cmp == LE_EXPR" below) > >> > Do you want :s on cmp as well? > >> > > >> >> + (if (!flag_trapv && !TYPE_OVERFLOW_WRAPS (TREE_TYPE (@0))) > >> > > >> > Don't you want TYPE_OVERFLOW_UNDEFINED? > >> > >> Thanks, fixed below. Think the cmp == LE_EXPR stuff is still ok with :c, > >> since the generated code sets cmp to LE_EXPR when matching GE_EXPR. > >> > >> Tested as before. OK to install? > >> > >> Richard > >> > >> > >> 2018-07-23 Richard Sandiford <richard.sandiford@arm.com> > >> > >> gcc/ > >> * match.pd: Optimise pointer range checks. > >> > >> gcc/testsuite/ > >> * gcc.dg/pointer-range-check-1.c: New test. > >> * gcc.dg/pointer-range-check-2.c: Likewise. > >> > >> Index: gcc/match.pd > >> =================================================================== > >> --- gcc/match.pd 2018-07-23 15:56:47.000000000 +0100 > >> +++ gcc/match.pd 2018-07-23 15:58:33.480269844 +0100 > >> @@ -4924,3 +4924,37 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT) > >> (if (inverse_conditions_p (@0, @2) > >> && element_precision (type) == element_precision (op_type)) > >> (view_convert (cond_op @2 @3 @4 @5 (view_convert:op_type @1))))))) > >> + > >> +/* For pointers @0 and @2 and unsigned constant offset @1, look for > >> + expressions like: > >> + > >> + A: (@0 + @1 < @2) | (@2 + @1 < @0) > >> + B: (@0 + @1 <= @2) | (@2 + @1 <= @0) > >> + > >> + If pointers are known not to wrap, B checks whether @1 bytes starting at > >> + @0 and @2 do not overlap, while A tests the same thing for @1 + 1 bytes. > >> + A is more efficiently tested as: > >> + > >> + ((sizetype) @0 - (sizetype) @2 + @1) > (@1 * 2) > >> + > >> + as long as @1 * 2 doesn't overflow. B is the same with @1 replaced > >> + with @1 - 1. */ > >> +(for ior (truth_orif truth_or bit_ior) > >> + (for cmp (le lt) > >> + (simplify > >> + (ior (cmp:cs (pointer_plus:s @0 INTEGER_CST@1) @2) > >> + (cmp:cs (pointer_plus:s @2 @1) @0)) > >> + (if (!flag_trapv && TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (@0))) > > > > no need to check flag_trapv, pointer arithmetic is not covered by -ftrapv. > > This was there because we're converting pointer arithmetic into integer > arithmetic, and -ftrapv could cause that new integer code to trap. > TYPE_OVERFLOW_UNDEFINED says that pointer overflow is undefined, > but it seemed bad form to make it actually trap, especially when the > above does some reassociation too. Ok, but then maybe check TYPE_OVERFLOW_WRAPS () on the type you are doing the arithmetic on (sizetype) instead? > > >> + /* Convert the B form to the A form. */ > >> + (with { offset_int off = wi::to_offset (@1) - (cmp == LE_EXPR ? 1 : > >> 0); } > >> + /* Always fails for negative values. */ > >> + (if (wi::min_precision (off, UNSIGNED) * 2 <= TYPE_PRECISION > >> (sizetype)) > >> + /* It doesn't matter whether we use @2 - @0 or @0 - @2, so let > >> + tree_swap_operands_p pick a canonical order. */ > > > > gimple_resimplify takes care of that - well, it doesn't since minus isn't > > commutative... I guess you get better CSE later when doing this thus ok, > > but it does look a bit off here ;) > > > > I think you shouldn't use 'sizetype' without guarding this whole thing > > with TYPE_PRECISION (sizetype) == TYPE_PRECISION (TREE_TYPE (@0)). > > OK, hadn't realised that could be false. Would building the appropriate > unsigned type be OK without the condition, or does it need to be sizetype? That would probably work but it might be dead slow on those targets that choose to have a lower-precision sizetype (ISTR some have 24bit pointers but 16bit sizetype because there's no arithmetic for 24bit but in offsetted addressing). So the transform is likely not a win for those. > > Since the original IL performs an ordered compare of two eventually unrelated > > pointers (or pointers adjusted in a way that crosses object > > boundaries) (uh... ;)) > > I wonder if you can use POINTER_DIFF_EXPR here to avoid the sizetype > > conversion? Since POINTER_PLUS_EXPR offsets are supposed to be > > interpreted "signed" that should also handle negative offsets correctly then? > > The transform isn't valid when @1 is negative though, so I think we > need that check either way. + /* Always fails for negative values. */ + (if (wi::min_precision (off, UNSIGNED) * 2 <= TYPE_PRECISION (sizetype)) shouldn't that be wi::min_precision (off * 2, UNSIGNED) <= TYPE_PRECISION (sizetype)? > I could use POINTER_DIFF_EXPR and convert to unsigned though, if that > seems better. That might also allow us to CSE the retained > POINTER_PLUS_EXPRs. Yeah, that sounds better then. > > Also, when @2 == @0 + (@1+1) then the original condition is true but > > ((sizetype) @0 - (sizetype) @2 + @1) > (@1 * 2) is not? > > (sizetype) @0 - (sizetype) (@0 + @1 + 1) + @1 > @1 * 2 > > -> -1 > @1 * 2 > > > > which is false. So I can't really see how you can apply this transform in > > general (it would be fine for generating alias checks but a bit more > > pessimistic). > > > > But maybe I am missing something? > > It relies on sizetype being unsigned: (sizetype)-1 > @1 * 2 is true. Hmm, so mathematically this is (@0 - @2) % modreduce + @1 > @1 * 2 then, but I don't want to think too much about this since Marc didn't object here ;) Thanks, Richard. > Thanks, > Richard > > > Richard. > > > >> + (with { tree ptr1 = @0, ptr2 = @2; > >> + if (tree_swap_operands_p (ptr1, ptr2)) > >> + std::swap (ptr1, ptr2); } > >> + (gt (plus (minus (convert:sizetype { ptr1; }) > >> + (convert:sizetype { ptr2; })) > >> + { wide_int_to_tree (sizetype, off); }) > >> + { wide_int_to_tree (sizetype, off * 2); })))))))) > >> Index: gcc/testsuite/gcc.dg/pointer-range-check-1.c > >> =================================================================== > >> --- /dev/null 2018-07-09 14:52:09.234750850 +0100 > >> +++ gcc/testsuite/gcc.dg/pointer-range-check-1.c 2018-07-23 > >> 15:58:33.480269844 +0100 > >> @@ -0,0 +1,37 @@ > >> +/* { dg-do compile } */ > >> +/* { dg-options "-O2 -fno-ipa-icf -fdump-tree-optimized" } */ > >> + > >> +/* All four functions should be folded to: > >> + > >> + ((sizetype) a - (sizetype) b + 15) < 30. */ > >> + > >> +_Bool > >> +f1 (char *a, char *b) > >> +{ > >> + return (a + 16 <= b) || (b + 16 <= a); > >> +} > >> + > >> +_Bool > >> +f2 (char *a, char *b) > >> +{ > >> + return (a + 15 < b) || (b + 15 < a); > >> +} > >> + > >> +_Bool > >> +f3 (char *a, char *b) > >> +{ > >> + return (a + 16 <= b) || (b + 16 <= a); > >> +} > >> + > >> +_Bool > >> +f4 (char *a, char *b) > >> +{ > >> + return (a + 15 < b) || (b + 15 < a); > >> +} > >> + > >> +/* { dg-final { scan-tree-dump-times { = [^\n]* - [^\n]*;} 4 > >> "optimized" } } */ > >> +/* { dg-final { scan-tree-dump-times { = [^\n]* \+ [^\n]*;} 4 > >> "optimized" } } */ > >> +/* { dg-final { scan-tree-dump-times { = [^\n]*\ > [^\n]*;} 4 > >> "optimized" } } */ > >> +/* { dg-final { scan-tree-dump-not {=[^\n]*\ < [^\n]*;} "optimized" } } */ > >> +/* { dg-final { scan-tree-dump-times { \+ 15} 4 "optimized" } } */ > >> +/* { dg-final { scan-tree-dump-times { > 30} 4 "optimized" } } */ > >> Index: gcc/testsuite/gcc.dg/pointer-range-check-2.c > >> =================================================================== > >> --- /dev/null 2018-07-09 14:52:09.234750850 +0100 > >> +++ gcc/testsuite/gcc.dg/pointer-range-check-2.c 2018-07-23 > >> 15:58:33.480269844 +0100 > >> @@ -0,0 +1,31 @@ > >> +/* { dg-do compile } */ > >> +/* { dg-options "-O2 -fno-ipa-icf -fwrapv-pointer > >> -fdump-tree-optimized" } */ > >> + > >> +_Bool > >> +f1 (char *a, char *b) > >> +{ > >> + return (a + 16 <= b) || (b + 16 <= a); > >> +} > >> + > >> +_Bool > >> +f2 (char *a, char *b) > >> +{ > >> + return (a + 15 < b) || (b + 15 < a); > >> +} > >> + > >> +_Bool > >> +f3 (char *a, char *b) > >> +{ > >> + return (a + 16 <= b) || (b + 16 <= a); > >> +} > >> + > >> +_Bool > >> +f4 (char *a, char *b) > >> +{ > >> + return (a + 15 < b) || (b + 15 < a); > >> +} > >> + > >> +/* { dg-final { scan-tree-dump-not { = [^\n]* - [^\n]*;} "optimized" } } */ > >> +/* { dg-final { scan-tree-dump-times { = [^\n]* \+ [^\n]*;} 8 > >> "optimized" } } */ > >> +/* { dg-final { scan-tree-dump-times { \+ 15} 4 "optimized" } } */ > >> +/* { dg-final { scan-tree-dump-times { \+ 16} 4 "optimized" } } */
On Tue, 31 Jul 2018, Richard Biener wrote: >>> Also, when @2 == @0 + (@1+1) then the original condition is true but >>> ((sizetype) @0 - (sizetype) @2 + @1) > (@1 * 2) is not? >>> (sizetype) @0 - (sizetype) (@0 + @1 + 1) + @1 > @1 * 2 >>> -> -1 > @1 * 2 >>> >>> which is false. So I can't really see how you can apply this transform in >>> general (it would be fine for generating alias checks but a bit more >>> pessimistic). >>> >>> But maybe I am missing something? >> >> It relies on sizetype being unsigned: (sizetype)-1 > @1 * 2 is true. > > Hmm, so mathematically this is > > (@0 - @2) % modreduce + @1 > @1 * 2 > > then, but I don't want to think too much about this since Marc didn't > object here ;) We already transform abs(x)<=3 into (unsigned)x+3u<=6u, that's the usual way we do range checking so I didn't pay much attention to that part. (tempted to say: "I didn't want to think too much about this since Richard was going to do it anyway" ;-) Turning multiple comparisons of the form P + cst CMP Q + cst into a range check on P - Q sounds good (we don't really have to restrict to the case where the range is symmetric). Actually, just turning P + cst CMP Q + cst into P - Q CMP cst should do it, we should already have code to handle range checking on integers (modulo the issue of CSE P-Q and Q-P). But I don't know if a couple :s is sufficient to make this transformation a good canonicalization. If we start from a comparison of pointer_plus, I think it would make sense to use pointer_diff. I believe currently we try to use pointer operations (pointer_plus, pointer_diff, lt) only for related pointers and we cast to some integer type for wilder cases (implementation of std::less in C++ for instance). On the other hand, in an alias check, the 2 pointers are possibly unrelated, so maybe the code emitted for an alias check should be changed.
Richard Biener <richard.guenther@gmail.com> writes: > On Mon, Jul 30, 2018 at 7:47 PM Richard Sandiford > <richard.sandiford@arm.com> wrote: >> >> [Sorry, somehow missed this till now] >> >> Richard Biener <richard.guenther@gmail.com> writes: >> > On Mon, Jul 23, 2018 at 5:05 PM Richard Sandiford >> > <richard.sandiford@arm.com> wrote: >> >> >> >> Marc Glisse <marc.glisse@inria.fr> writes: >> >> > On Fri, 20 Jul 2018, Richard Sandiford wrote: >> >> > >> >> >> --- gcc/match.pd 2018-07-18 18:44:22.565914281 +0100 >> >> >> +++ gcc/match.pd 2018-07-20 11:24:33.692045585 +0100 >> >> >> @@ -4924,3 +4924,37 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT) >> >> >> (if (inverse_conditions_p (@0, @2) >> >> >> && element_precision (type) == element_precision (op_type)) >> >> >> (view_convert (cond_op @2 @3 @4 @5 (view_convert:op_type @1))))))) >> >> >> + >> >> >> +/* For pointers @0 and @2 and unsigned constant offset @1, look for >> >> >> + expressions like: >> >> >> + >> >> >> + A: (@0 + @1 < @2) | (@2 + @1 < @0) >> >> >> + B: (@0 + @1 <= @2) | (@2 + @1 <= @0) >> >> >> + >> >> >> + If pointers are known not to wrap, B checks whether @1 bytes starting at >> >> >> + @0 and @2 do not overlap, while A tests the same thing for @1 + 1 bytes. >> >> >> + A is more efficiently tested as: >> >> >> + >> >> >> + ((sizetype) @0 - (sizetype) @2 + @1) > (@1 * 2) >> >> >> + >> >> >> + as long as @1 * 2 doesn't overflow. B is the same with @1 replaced >> >> >> + with @1 - 1. */ >> >> >> +(for ior (truth_orif truth_or bit_ior) >> >> >> + (for cmp (le lt) >> >> >> + (simplify >> >> >> + (ior (cmp (pointer_plus:s @0 INTEGER_CST@1) @2) >> >> >> + (cmp (pointer_plus:s @2 @1) @0)) >> >> > >> >> > Do you want :c on cmp, in case it appears as @2 > @0 + @1 ? (may >> >> > need some >> >> > care with "cmp == LE_EXPR" below) >> >> > Do you want :s on cmp as well? >> >> > >> >> >> + (if (!flag_trapv && !TYPE_OVERFLOW_WRAPS (TREE_TYPE (@0))) >> >> > >> >> > Don't you want TYPE_OVERFLOW_UNDEFINED? >> >> >> >> Thanks, fixed below. Think the cmp == LE_EXPR stuff is still ok with :c, >> >> since the generated code sets cmp to LE_EXPR when matching GE_EXPR. >> >> >> >> Tested as before. OK to install? >> >> >> >> Richard >> >> >> >> >> >> 2018-07-23 Richard Sandiford <richard.sandiford@arm.com> >> >> >> >> gcc/ >> >> * match.pd: Optimise pointer range checks. >> >> >> >> gcc/testsuite/ >> >> * gcc.dg/pointer-range-check-1.c: New test. >> >> * gcc.dg/pointer-range-check-2.c: Likewise. >> >> >> >> Index: gcc/match.pd >> >> =================================================================== >> >> --- gcc/match.pd 2018-07-23 15:56:47.000000000 +0100 >> >> +++ gcc/match.pd 2018-07-23 15:58:33.480269844 +0100 >> >> @@ -4924,3 +4924,37 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT) >> >> (if (inverse_conditions_p (@0, @2) >> >> && element_precision (type) == element_precision (op_type)) >> >> (view_convert (cond_op @2 @3 @4 @5 (view_convert:op_type @1))))))) >> >> + >> >> +/* For pointers @0 and @2 and unsigned constant offset @1, look for >> >> + expressions like: >> >> + >> >> + A: (@0 + @1 < @2) | (@2 + @1 < @0) >> >> + B: (@0 + @1 <= @2) | (@2 + @1 <= @0) >> >> + >> >> + If pointers are known not to wrap, B checks whether @1 bytes starting at >> >> + @0 and @2 do not overlap, while A tests the same thing for @1 + 1 bytes. >> >> + A is more efficiently tested as: >> >> + >> >> + ((sizetype) @0 - (sizetype) @2 + @1) > (@1 * 2) >> >> + >> >> + as long as @1 * 2 doesn't overflow. B is the same with @1 replaced >> >> + with @1 - 1. */ >> >> +(for ior (truth_orif truth_or bit_ior) >> >> + (for cmp (le lt) >> >> + (simplify >> >> + (ior (cmp:cs (pointer_plus:s @0 INTEGER_CST@1) @2) >> >> + (cmp:cs (pointer_plus:s @2 @1) @0)) >> >> + (if (!flag_trapv && TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (@0))) >> > >> > no need to check flag_trapv, pointer arithmetic is not covered by -ftrapv. >> >> This was there because we're converting pointer arithmetic into integer >> arithmetic, and -ftrapv could cause that new integer code to trap. >> TYPE_OVERFLOW_UNDEFINED says that pointer overflow is undefined, >> but it seemed bad form to make it actually trap, especially when the >> above does some reassociation too. > > Ok, but then maybe check TYPE_OVERFLOW_WRAPS () on the type > you are doing the arithmetic on (sizetype) instead? OK, done below. >> >> + /* Convert the B form to the A form. */ >> >> + (with { offset_int off = wi::to_offset (@1) - (cmp == LE_EXPR ? 1 : >> >> 0); } >> >> + /* Always fails for negative values. */ >> >> + (if (wi::min_precision (off, UNSIGNED) * 2 <= TYPE_PRECISION >> >> (sizetype)) >> >> + /* It doesn't matter whether we use @2 - @0 or @0 - @2, so let >> >> + tree_swap_operands_p pick a canonical order. */ >> > >> > gimple_resimplify takes care of that - well, it doesn't since minus isn't >> > commutative... I guess you get better CSE later when doing this thus ok, >> > but it does look a bit off here ;) >> > >> > I think you shouldn't use 'sizetype' without guarding this whole thing >> > with TYPE_PRECISION (sizetype) == TYPE_PRECISION (TREE_TYPE (@0)). >> >> OK, hadn't realised that could be false. Would building the appropriate >> unsigned type be OK without the condition, or does it need to be sizetype? > > That would probably work but it might be dead slow on those targets that choose > to have a lower-precision sizetype (ISTR some have 24bit pointers but 16bit > sizetype because there's no arithmetic for 24bit but in offsetted addressing). > So the transform is likely not a win for those. Ah, yeah. The version below adds the == test. >> > Since the original IL performs an ordered compare of two eventually >> > unrelated >> > pointers (or pointers adjusted in a way that crosses object >> > boundaries) (uh... ;)) >> > I wonder if you can use POINTER_DIFF_EXPR here to avoid the sizetype >> > conversion? Since POINTER_PLUS_EXPR offsets are supposed to be >> > interpreted "signed" that should also handle negative offsets >> > correctly then? >> >> The transform isn't valid when @1 is negative though, so I think we >> need that check either way. > > + /* Always fails for negative values. */ > + (if (wi::min_precision (off, UNSIGNED) * 2 <= TYPE_PRECISION (sizetype)) > > shouldn't that be wi::min_precision (off * 2, UNSIGNED) <= > TYPE_PRECISION (sizetype)? Gah, yes :-( >> I could use POINTER_DIFF_EXPR and convert to unsigned though, if that >> seems better. That might also allow us to CSE the retained >> POINTER_PLUS_EXPRs. > > Yeah, that sounds better then. And the CSEing did happen, so we now get the "ideal" sequence for the original alias checks: add x0, x0, 15 sub x6, x0, x1 sub x5, x0, x2 cmp x6, 30 ccmp x5, 30, 0, hi whereas the original patch had an extra add. >> > Also, when @2 == @0 + (@1+1) then the original condition is true but >> > ((sizetype) @0 - (sizetype) @2 + @1) > (@1 * 2) is not? >> > (sizetype) @0 - (sizetype) (@0 + @1 + 1) + @1 > @1 * 2 >> > -> -1 > @1 * 2 >> > >> > which is false. So I can't really see how you can apply this transform in >> > general (it would be fine for generating alias checks but a bit more >> > pessimistic). >> > >> > But maybe I am missing something? >> >> It relies on sizetype being unsigned: (sizetype)-1 > @1 * 2 is true. > > Hmm, so mathematically this is > > (@0 - @2) % modreduce + @1 > @1 * 2 Probably not % if you operating over Z, since if @0 - @2 = -@1 * 2 (and -@1 * 2 % modreduce == -@1 * 2), the condition above will be false while we want it to be true. But... > then, but I don't want to think too much about this since Marc didn't > object here ;) ...we start out converting the range check for A to: !(-@1 <= @0 - @2 <= @1) i.e. @1 + 1 bytes at @0 and @2 are free from overlap. Then we just apply the usual trick: A <= X <= B -> (unsigned) (X - A) <= (unsigned) (B - A) when A <= B Substituting @0 - @2 for X, -@1 for A and @1 for B gives: !((unsigned) (@0 - @2 + @1) <= (unsigned) (@1 * 2)) -> (unsigned) (@0 - @2 + @1) > (unsigned) (@1 * 2) Then the B case is just an off-by-one variant. Patch tested as before. Thanks, Richard 2018-08-01 Richard Sandiford <richard.sandiford@arm.com> gcc/ * match.pd: Optimise pointer range checks. gcc/testsuite/ * gcc.dg/pointer-range-check-1.c: New test. * gcc.dg/pointer-range-check-2.c: Likewise. Index: gcc/match.pd =================================================================== --- gcc/match.pd 2018-07-24 19:04:14.985363724 +0100 +++ gcc/match.pd 2018-08-01 10:55:10.695766411 +0100 @@ -4937,3 +4937,63 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT) (if (inverse_conditions_p (@0, @2) && element_precision (type) == element_precision (op_type)) (view_convert (cond_op @2 @3 @4 @5 (view_convert:op_type @1))))))) + +/* For pointers @0 and @2 and nonnegative constant offset @1, look for + expressions like: + + A: (@0 + @1 < @2) | (@2 + @1 < @0) + B: (@0 + @1 <= @2) | (@2 + @1 <= @0) + + If pointers are known not to wrap, B checks whether @1 bytes starting + at @0 and @2 do not overlap, while A tests the same thing for @1 + 1 + bytes. A is more efficiently tested as: + + A: (sizetype) (@0 + @1 - @2) > @1 * 2 + + The equivalent expression for B is given by replacing @1 with @1 - 1: + + B: (sizetype) (@0 + (@1 - 1) - @2) > (@1 - 1) * 2 + + @0 and @2 can be swapped in both expressions without changing the result. + + The folds rely on sizetype's being unsigned (which is always true) + and on its being the same width as the pointer (which we have to check). + + The fold replaces two pointer_plus expressions, two comparisons and + an IOR with a pointer_plus, a pointer_diff, and a comparison, so in + the best case it's a saving of two operations. The A fold retains one + of the original pointer_pluses, so is a win even if both pointer_pluses + are used elsewhere. The B fold is a wash if both pointer_pluses are + used elsewhere, since all we end up doing is replacing a comparison with + a pointer_plus. We do still apply the fold under those circumstances + though, in case applying it to other conditions eventually makes one of the + pointer_pluses dead. */ +(for ior (truth_orif truth_or bit_ior) + (for cmp (le lt) + (simplify + (ior (cmp:cs (pointer_plus@3 @0 INTEGER_CST@1) @2) + (cmp:cs (pointer_plus@4 @2 @1) @0)) + (if (TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (@0)) + && TYPE_OVERFLOW_WRAPS (sizetype) + && TYPE_PRECISION (TREE_TYPE (@0)) == TYPE_PRECISION (sizetype)) + /* Calculate the rhs constant. */ + (with { offset_int off = wi::to_offset (@1) - (cmp == LE_EXPR ? 1 : 0); + offset_int rhs = off * 2; } + /* Always fails for negative values. */ + (if (wi::min_precision (rhs, UNSIGNED) <= TYPE_PRECISION (sizetype)) + /* Since the order of @0 and @2 doesn't matter, let tree_swap_operands_p + pick a canonical order. This increases the chances of using the + same pointer_plus in multiple checks. */ + (with { bool swap_p = tree_swap_operands_p (@0, @2); + tree rhs_tree = wide_int_to_tree (sizetype, rhs); } + (if (cmp == LT_EXPR) + (gt (convert:sizetype + (pointer_diff:ssizetype { swap_p ? @4 : @3; } + { swap_p ? @0 : @2; })) + { rhs_tree; }) + (gt (convert:sizetype + (pointer_diff:ssizetype + (pointer_plus { swap_p ? @2 : @0; } + { wide_int_to_tree (sizetype, off); }) + { swap_p ? @0 : @2; })) + { rhs_tree; }))))))))) Index: gcc/testsuite/gcc.dg/pointer-range-check-1.c =================================================================== --- /dev/null 2018-07-26 10:26:13.137955424 +0100 +++ gcc/testsuite/gcc.dg/pointer-range-check-1.c 2018-08-01 10:55:10.695766411 +0100 @@ -0,0 +1,37 @@ +/* { dg-do compile { target { ilp32 || lp64 } } } */ +/* { dg-options "-O2 -fno-ipa-icf -fdump-tree-optimized" } */ + +/* All four functions should be folded to: + + (sizetype) (a + 15 - b) < 30. */ + +_Bool +f1 (char *a, char *b) +{ + return (a + 16 <= b) || (b + 16 <= a); +} + +_Bool +f2 (char *a, char *b) +{ + return (a + 15 < b) || (b + 15 < a); +} + +_Bool +f3 (char *a, char *b) +{ + return (a + 16 <= b) | (b + 16 <= a); +} + +_Bool +f4 (char *a, char *b) +{ + return (a + 15 < b) | (b + 15 < a); +} + +/* { dg-final { scan-tree-dump-times { = [^\n]* - [^\n]*;} 4 "optimized" } } */ +/* { dg-final { scan-tree-dump-times { = [^\n]* \+ [^\n]*;} 4 "optimized" } } */ +/* { dg-final { scan-tree-dump-times { = [^\n]*\ > [^\n]*;} 4 "optimized" } } */ +/* { dg-final { scan-tree-dump-not {=[^\n]*\ < [^\n]*;} "optimized" } } */ +/* { dg-final { scan-tree-dump-times { \+ 15} 4 "optimized" } } */ +/* { dg-final { scan-tree-dump-times { > 30} 4 "optimized" } } */ Index: gcc/testsuite/gcc.dg/pointer-range-check-2.c =================================================================== --- /dev/null 2018-07-26 10:26:13.137955424 +0100 +++ gcc/testsuite/gcc.dg/pointer-range-check-2.c 2018-08-01 10:55:10.695766411 +0100 @@ -0,0 +1,31 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -fno-ipa-icf -fwrapv-pointer -fdump-tree-optimized" } */ + +_Bool +f1 (char *a, char *b) +{ + return (a + 16 <= b) || (b + 16 <= a); +} + +_Bool +f2 (char *a, char *b) +{ + return (a + 15 < b) || (b + 15 < a); +} + +_Bool +f3 (char *a, char *b) +{ + return (a + 16 <= b) | (b + 16 <= a); +} + +_Bool +f4 (char *a, char *b) +{ + return (a + 15 < b) | (b + 15 < a); +} + +/* { dg-final { scan-tree-dump-not { = [^\n]* - [^\n]*;} "optimized" } } */ +/* { dg-final { scan-tree-dump-times { = [^\n]* \+ [^\n]*;} 8 "optimized" } } */ +/* { dg-final { scan-tree-dump-times { \+ 15} 4 "optimized" } } */ +/* { dg-final { scan-tree-dump-times { \+ 16} 4 "optimized" } } */
Marc Glisse <marc.glisse@inria.fr> writes: > On Tue, 31 Jul 2018, Richard Biener wrote: > >>>> Also, when @2 == @0 + (@1+1) then the original condition is true but >>>> ((sizetype) @0 - (sizetype) @2 + @1) > (@1 * 2) is not? >>>> (sizetype) @0 - (sizetype) (@0 + @1 + 1) + @1 > @1 * 2 >>>> -> -1 > @1 * 2 >>>> >>>> which is false. So I can't really see how you can apply this transform in >>>> general (it would be fine for generating alias checks but a bit more >>>> pessimistic). >>>> >>>> But maybe I am missing something? >>> >>> It relies on sizetype being unsigned: (sizetype)-1 > @1 * 2 is true. >> >> Hmm, so mathematically this is >> >> (@0 - @2) % modreduce + @1 > @1 * 2 >> >> then, but I don't want to think too much about this since Marc didn't >> object here ;) > > We already transform abs(x)<=3 into (unsigned)x+3u<=6u, that's the usual > way we do range checking so I didn't pay much attention to that part. > (tempted to say: "I didn't want to think too much about this since > Richard was going to do it anyway" ;-) > > Turning multiple comparisons of the form P + cst CMP Q + cst into a > range check on P - Q sounds good (we don't really have to restrict to > the case where the range is symmetric). Actually, just turning P + cst > CMP Q + cst into P - Q CMP cst should do it, we should already have code > to handle range checking on integers (modulo the issue of CSE P-Q and > Q-P). But I don't know if a couple :s is sufficient to make this > transformation a good canonicalization. Yeah. Like you say, in the cases being handled by the patch, folding the two comparisons separately and then folding the IOR would require either (a) folding: P + cst < Q to the rather unnatural-looking: Q - P > -cst when tree_swap_operands_p (P, Q) or (b) making the range fold handle reversed pointer_diffs (which I guess makes more sense). But the fold in the patch should be either a strict win or a wash even without :s on the pointer_pluses (only realised that when doing today's respin) whereas this canonicalisation really would need the :s. So I think doing the fold on the IOR makes sense. > If we start from a comparison of pointer_plus, I think it would make > sense to use pointer_diff. > > I believe currently we try to use pointer operations (pointer_plus, > pointer_diff, lt) only for related pointers and we cast to some integer > type for wilder cases (implementation of std::less in C++ for instance). > On the other hand, in an alias check, the 2 pointers are possibly > unrelated, so maybe the code emitted for an alias check should be > changed. If we can prove that the pointers are to different objects, it would be valid to fold the check to "no alias", regardless of the constant (although in practice we should have weeded out those cases before generating the check). In that sense, relying on the UBness of comparing pointers to different objects would be fine. If there's a risk of the check being folded to "alias" when the pointers are known to point to different objects then that would be more of a problem (as a missed optimisation). Thanks, Richard
On Wed, Aug 1, 2018 at 12:25 PM Richard Sandiford <richard.sandiford@arm.com> wrote: > > Richard Biener <richard.guenther@gmail.com> writes: > > On Mon, Jul 30, 2018 at 7:47 PM Richard Sandiford > > <richard.sandiford@arm.com> wrote: > >> > >> [Sorry, somehow missed this till now] > >> > >> Richard Biener <richard.guenther@gmail.com> writes: > >> > On Mon, Jul 23, 2018 at 5:05 PM Richard Sandiford > >> > <richard.sandiford@arm.com> wrote: > >> >> > >> >> Marc Glisse <marc.glisse@inria.fr> writes: > >> >> > On Fri, 20 Jul 2018, Richard Sandiford wrote: > >> >> > > >> >> >> --- gcc/match.pd 2018-07-18 18:44:22.565914281 +0100 > >> >> >> +++ gcc/match.pd 2018-07-20 11:24:33.692045585 +0100 > >> >> >> @@ -4924,3 +4924,37 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT) > >> >> >> (if (inverse_conditions_p (@0, @2) > >> >> >> && element_precision (type) == element_precision (op_type)) > >> >> >> (view_convert (cond_op @2 @3 @4 @5 (view_convert:op_type @1))))))) > >> >> >> + > >> >> >> +/* For pointers @0 and @2 and unsigned constant offset @1, look for > >> >> >> + expressions like: > >> >> >> + > >> >> >> + A: (@0 + @1 < @2) | (@2 + @1 < @0) > >> >> >> + B: (@0 + @1 <= @2) | (@2 + @1 <= @0) > >> >> >> + > >> >> >> + If pointers are known not to wrap, B checks whether @1 bytes starting at > >> >> >> + @0 and @2 do not overlap, while A tests the same thing for @1 + 1 bytes. > >> >> >> + A is more efficiently tested as: > >> >> >> + > >> >> >> + ((sizetype) @0 - (sizetype) @2 + @1) > (@1 * 2) > >> >> >> + > >> >> >> + as long as @1 * 2 doesn't overflow. B is the same with @1 replaced > >> >> >> + with @1 - 1. */ > >> >> >> +(for ior (truth_orif truth_or bit_ior) > >> >> >> + (for cmp (le lt) > >> >> >> + (simplify > >> >> >> + (ior (cmp (pointer_plus:s @0 INTEGER_CST@1) @2) > >> >> >> + (cmp (pointer_plus:s @2 @1) @0)) > >> >> > > >> >> > Do you want :c on cmp, in case it appears as @2 > @0 + @1 ? (may > >> >> > need some > >> >> > care with "cmp == LE_EXPR" below) > >> >> > Do you want :s on cmp as well? > >> >> > > >> >> >> + (if (!flag_trapv && !TYPE_OVERFLOW_WRAPS (TREE_TYPE (@0))) > >> >> > > >> >> > Don't you want TYPE_OVERFLOW_UNDEFINED? > >> >> > >> >> Thanks, fixed below. Think the cmp == LE_EXPR stuff is still ok with :c, > >> >> since the generated code sets cmp to LE_EXPR when matching GE_EXPR. > >> >> > >> >> Tested as before. OK to install? > >> >> > >> >> Richard > >> >> > >> >> > >> >> 2018-07-23 Richard Sandiford <richard.sandiford@arm.com> > >> >> > >> >> gcc/ > >> >> * match.pd: Optimise pointer range checks. > >> >> > >> >> gcc/testsuite/ > >> >> * gcc.dg/pointer-range-check-1.c: New test. > >> >> * gcc.dg/pointer-range-check-2.c: Likewise. > >> >> > >> >> Index: gcc/match.pd > >> >> =================================================================== > >> >> --- gcc/match.pd 2018-07-23 15:56:47.000000000 +0100 > >> >> +++ gcc/match.pd 2018-07-23 15:58:33.480269844 +0100 > >> >> @@ -4924,3 +4924,37 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT) > >> >> (if (inverse_conditions_p (@0, @2) > >> >> && element_precision (type) == element_precision (op_type)) > >> >> (view_convert (cond_op @2 @3 @4 @5 (view_convert:op_type @1))))))) > >> >> + > >> >> +/* For pointers @0 and @2 and unsigned constant offset @1, look for > >> >> + expressions like: > >> >> + > >> >> + A: (@0 + @1 < @2) | (@2 + @1 < @0) > >> >> + B: (@0 + @1 <= @2) | (@2 + @1 <= @0) > >> >> + > >> >> + If pointers are known not to wrap, B checks whether @1 bytes starting at > >> >> + @0 and @2 do not overlap, while A tests the same thing for @1 + 1 bytes. > >> >> + A is more efficiently tested as: > >> >> + > >> >> + ((sizetype) @0 - (sizetype) @2 + @1) > (@1 * 2) > >> >> + > >> >> + as long as @1 * 2 doesn't overflow. B is the same with @1 replaced > >> >> + with @1 - 1. */ > >> >> +(for ior (truth_orif truth_or bit_ior) > >> >> + (for cmp (le lt) > >> >> + (simplify > >> >> + (ior (cmp:cs (pointer_plus:s @0 INTEGER_CST@1) @2) > >> >> + (cmp:cs (pointer_plus:s @2 @1) @0)) > >> >> + (if (!flag_trapv && TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (@0))) > >> > > >> > no need to check flag_trapv, pointer arithmetic is not covered by -ftrapv. > >> > >> This was there because we're converting pointer arithmetic into integer > >> arithmetic, and -ftrapv could cause that new integer code to trap. > >> TYPE_OVERFLOW_UNDEFINED says that pointer overflow is undefined, > >> but it seemed bad form to make it actually trap, especially when the > >> above does some reassociation too. > > > > Ok, but then maybe check TYPE_OVERFLOW_WRAPS () on the type > > you are doing the arithmetic on (sizetype) instead? > > OK, done below. > > >> >> + /* Convert the B form to the A form. */ > >> >> + (with { offset_int off = wi::to_offset (@1) - (cmp == LE_EXPR ? 1 : > >> >> 0); } > >> >> + /* Always fails for negative values. */ > >> >> + (if (wi::min_precision (off, UNSIGNED) * 2 <= TYPE_PRECISION > >> >> (sizetype)) > >> >> + /* It doesn't matter whether we use @2 - @0 or @0 - @2, so let > >> >> + tree_swap_operands_p pick a canonical order. */ > >> > > >> > gimple_resimplify takes care of that - well, it doesn't since minus isn't > >> > commutative... I guess you get better CSE later when doing this thus ok, > >> > but it does look a bit off here ;) > >> > > >> > I think you shouldn't use 'sizetype' without guarding this whole thing > >> > with TYPE_PRECISION (sizetype) == TYPE_PRECISION (TREE_TYPE (@0)). > >> > >> OK, hadn't realised that could be false. Would building the appropriate > >> unsigned type be OK without the condition, or does it need to be sizetype? > > > > That would probably work but it might be dead slow on those targets that choose > > to have a lower-precision sizetype (ISTR some have 24bit pointers but 16bit > > sizetype because there's no arithmetic for 24bit but in offsetted addressing). > > So the transform is likely not a win for those. > > Ah, yeah. The version below adds the == test. > > >> > Since the original IL performs an ordered compare of two eventually > >> > unrelated > >> > pointers (or pointers adjusted in a way that crosses object > >> > boundaries) (uh... ;)) > >> > I wonder if you can use POINTER_DIFF_EXPR here to avoid the sizetype > >> > conversion? Since POINTER_PLUS_EXPR offsets are supposed to be > >> > interpreted "signed" that should also handle negative offsets > >> > correctly then? > >> > >> The transform isn't valid when @1 is negative though, so I think we > >> need that check either way. > > > > + /* Always fails for negative values. */ > > + (if (wi::min_precision (off, UNSIGNED) * 2 <= TYPE_PRECISION (sizetype)) > > > > shouldn't that be wi::min_precision (off * 2, UNSIGNED) <= > > TYPE_PRECISION (sizetype)? > > Gah, yes :-( > > >> I could use POINTER_DIFF_EXPR and convert to unsigned though, if that > >> seems better. That might also allow us to CSE the retained > >> POINTER_PLUS_EXPRs. > > > > Yeah, that sounds better then. > > And the CSEing did happen, so we now get the "ideal" sequence for the > original alias checks: > > add x0, x0, 15 > sub x6, x0, x1 > sub x5, x0, x2 > cmp x6, 30 > ccmp x5, 30, 0, hi > > whereas the original patch had an extra add. > > >> > Also, when @2 == @0 + (@1+1) then the original condition is true but > >> > ((sizetype) @0 - (sizetype) @2 + @1) > (@1 * 2) is not? > >> > (sizetype) @0 - (sizetype) (@0 + @1 + 1) + @1 > @1 * 2 > >> > -> -1 > @1 * 2 > >> > > >> > which is false. So I can't really see how you can apply this transform in > >> > general (it would be fine for generating alias checks but a bit more > >> > pessimistic). > >> > > >> > But maybe I am missing something? > >> > >> It relies on sizetype being unsigned: (sizetype)-1 > @1 * 2 is true. > > > > Hmm, so mathematically this is > > > > (@0 - @2) % modreduce + @1 > @1 * 2 > > Probably not % if you operating over Z, since if @0 - @2 = -@1 * 2 > (and -@1 * 2 % modreduce == -@1 * 2), the condition above will be false > while we want it to be true. But... > > > then, but I don't want to think too much about this since Marc didn't > > object here ;) > > ...we start out converting the range check for A to: > > !(-@1 <= @0 - @2 <= @1) > > i.e. @1 + 1 bytes at @0 and @2 are free from overlap. Then we just > apply the usual trick: > > A <= X <= B -> (unsigned) (X - A) <= (unsigned) (B - A) when A <= B > > Substituting @0 - @2 for X, -@1 for A and @1 for B gives: > > !((unsigned) (@0 - @2 + @1) <= (unsigned) (@1 * 2)) > -> (unsigned) (@0 - @2 + @1) > (unsigned) (@1 * 2) > > Then the B case is just an off-by-one variant. > > Patch tested as before. OK. Thanks, Richard. > Thanks, > Richard > > > 2018-08-01 Richard Sandiford <richard.sandiford@arm.com> > > gcc/ > * match.pd: Optimise pointer range checks. > > gcc/testsuite/ > * gcc.dg/pointer-range-check-1.c: New test. > * gcc.dg/pointer-range-check-2.c: Likewise. > > Index: gcc/match.pd > =================================================================== > --- gcc/match.pd 2018-07-24 19:04:14.985363724 +0100 > +++ gcc/match.pd 2018-08-01 10:55:10.695766411 +0100 > @@ -4937,3 +4937,63 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT) > (if (inverse_conditions_p (@0, @2) > && element_precision (type) == element_precision (op_type)) > (view_convert (cond_op @2 @3 @4 @5 (view_convert:op_type @1))))))) > + > +/* For pointers @0 and @2 and nonnegative constant offset @1, look for > + expressions like: > + > + A: (@0 + @1 < @2) | (@2 + @1 < @0) > + B: (@0 + @1 <= @2) | (@2 + @1 <= @0) > + > + If pointers are known not to wrap, B checks whether @1 bytes starting > + at @0 and @2 do not overlap, while A tests the same thing for @1 + 1 > + bytes. A is more efficiently tested as: > + > + A: (sizetype) (@0 + @1 - @2) > @1 * 2 > + > + The equivalent expression for B is given by replacing @1 with @1 - 1: > + > + B: (sizetype) (@0 + (@1 - 1) - @2) > (@1 - 1) * 2 > + > + @0 and @2 can be swapped in both expressions without changing the result. > + > + The folds rely on sizetype's being unsigned (which is always true) > + and on its being the same width as the pointer (which we have to check). > + > + The fold replaces two pointer_plus expressions, two comparisons and > + an IOR with a pointer_plus, a pointer_diff, and a comparison, so in > + the best case it's a saving of two operations. The A fold retains one > + of the original pointer_pluses, so is a win even if both pointer_pluses > + are used elsewhere. The B fold is a wash if both pointer_pluses are > + used elsewhere, since all we end up doing is replacing a comparison with > + a pointer_plus. We do still apply the fold under those circumstances > + though, in case applying it to other conditions eventually makes one of the > + pointer_pluses dead. */ > +(for ior (truth_orif truth_or bit_ior) > + (for cmp (le lt) > + (simplify > + (ior (cmp:cs (pointer_plus@3 @0 INTEGER_CST@1) @2) > + (cmp:cs (pointer_plus@4 @2 @1) @0)) > + (if (TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (@0)) > + && TYPE_OVERFLOW_WRAPS (sizetype) > + && TYPE_PRECISION (TREE_TYPE (@0)) == TYPE_PRECISION (sizetype)) > + /* Calculate the rhs constant. */ > + (with { offset_int off = wi::to_offset (@1) - (cmp == LE_EXPR ? 1 : 0); > + offset_int rhs = off * 2; } > + /* Always fails for negative values. */ > + (if (wi::min_precision (rhs, UNSIGNED) <= TYPE_PRECISION (sizetype)) > + /* Since the order of @0 and @2 doesn't matter, let tree_swap_operands_p > + pick a canonical order. This increases the chances of using the > + same pointer_plus in multiple checks. */ > + (with { bool swap_p = tree_swap_operands_p (@0, @2); > + tree rhs_tree = wide_int_to_tree (sizetype, rhs); } > + (if (cmp == LT_EXPR) > + (gt (convert:sizetype > + (pointer_diff:ssizetype { swap_p ? @4 : @3; } > + { swap_p ? @0 : @2; })) > + { rhs_tree; }) > + (gt (convert:sizetype > + (pointer_diff:ssizetype > + (pointer_plus { swap_p ? @2 : @0; } > + { wide_int_to_tree (sizetype, off); }) > + { swap_p ? @0 : @2; })) > + { rhs_tree; }))))))))) > Index: gcc/testsuite/gcc.dg/pointer-range-check-1.c > =================================================================== > --- /dev/null 2018-07-26 10:26:13.137955424 +0100 > +++ gcc/testsuite/gcc.dg/pointer-range-check-1.c 2018-08-01 10:55:10.695766411 +0100 > @@ -0,0 +1,37 @@ > +/* { dg-do compile { target { ilp32 || lp64 } } } */ > +/* { dg-options "-O2 -fno-ipa-icf -fdump-tree-optimized" } */ > + > +/* All four functions should be folded to: > + > + (sizetype) (a + 15 - b) < 30. */ > + > +_Bool > +f1 (char *a, char *b) > +{ > + return (a + 16 <= b) || (b + 16 <= a); > +} > + > +_Bool > +f2 (char *a, char *b) > +{ > + return (a + 15 < b) || (b + 15 < a); > +} > + > +_Bool > +f3 (char *a, char *b) > +{ > + return (a + 16 <= b) | (b + 16 <= a); > +} > + > +_Bool > +f4 (char *a, char *b) > +{ > + return (a + 15 < b) | (b + 15 < a); > +} > + > +/* { dg-final { scan-tree-dump-times { = [^\n]* - [^\n]*;} 4 "optimized" } } */ > +/* { dg-final { scan-tree-dump-times { = [^\n]* \+ [^\n]*;} 4 "optimized" } } */ > +/* { dg-final { scan-tree-dump-times { = [^\n]*\ > [^\n]*;} 4 "optimized" } } */ > +/* { dg-final { scan-tree-dump-not {=[^\n]*\ < [^\n]*;} "optimized" } } */ > +/* { dg-final { scan-tree-dump-times { \+ 15} 4 "optimized" } } */ > +/* { dg-final { scan-tree-dump-times { > 30} 4 "optimized" } } */ > Index: gcc/testsuite/gcc.dg/pointer-range-check-2.c > =================================================================== > --- /dev/null 2018-07-26 10:26:13.137955424 +0100 > +++ gcc/testsuite/gcc.dg/pointer-range-check-2.c 2018-08-01 10:55:10.695766411 +0100 > @@ -0,0 +1,31 @@ > +/* { dg-do compile } */ > +/* { dg-options "-O2 -fno-ipa-icf -fwrapv-pointer -fdump-tree-optimized" } */ > + > +_Bool > +f1 (char *a, char *b) > +{ > + return (a + 16 <= b) || (b + 16 <= a); > +} > + > +_Bool > +f2 (char *a, char *b) > +{ > + return (a + 15 < b) || (b + 15 < a); > +} > + > +_Bool > +f3 (char *a, char *b) > +{ > + return (a + 16 <= b) | (b + 16 <= a); > +} > + > +_Bool > +f4 (char *a, char *b) > +{ > + return (a + 15 < b) | (b + 15 < a); > +} > + > +/* { dg-final { scan-tree-dump-not { = [^\n]* - [^\n]*;} "optimized" } } */ > +/* { dg-final { scan-tree-dump-times { = [^\n]* \+ [^\n]*;} 8 "optimized" } } */ > +/* { dg-final { scan-tree-dump-times { \+ 15} 4 "optimized" } } */ > +/* { dg-final { scan-tree-dump-times { \+ 16} 4 "optimized" } } */
On Wed, 1 Aug 2018, Richard Sandiford wrote: > +/* For pointers @0 and @2 and nonnegative constant offset @1, look for > + expressions like: > + > + A: (@0 + @1 < @2) | (@2 + @1 < @0) > + B: (@0 + @1 <= @2) | (@2 + @1 <= @0) Once this is in, we may want to consider the opposite: (@0 + @1 > @2) & (@2 + @1 > @0) > + /* Always fails for negative values. */ > + (if (wi::min_precision (rhs, UNSIGNED) <= TYPE_PRECISION (sizetype)) I guess we could simplify to 'true' in the 'else' case but that's less interesting. >> Turning multiple comparisons of the form P + cst CMP Q + cst into a >> range check on P - Q sounds good (we don't really have to restrict to >> the case where the range is symmetric). Actually, just turning P + cst >> CMP Q + cst into P - Q CMP cst should do it, we should already have code >> to handle range checking on integers (modulo the issue of CSE P-Q and >> Q-P). But I don't know if a couple :s is sufficient to make this >> transformation a good canonicalization. > > Yeah. Like you say, in the cases being handled by the patch, folding the > two comparisons separately and then folding the IOR would require either > > (a) folding: > P + cst < Q > to the rather unnatural-looking: > Q - P > -cst > when tree_swap_operands_p (P, Q) or Is it unnatural? If it helps other optimizations, it doesn't look that bad to me ;-) One issue is if we start mixing forms because only one is single_use: @0 + @1 < @2 | @1 < @0 - @2 > (b) making the range fold handle reversed pointer_diffs (which I guess > makes more sense). It would be interesting to have a way to write: (plus @0 (opposite@1 @0)) which would match if @0 is a-b and @1 is b-a or anything similar (I don't want to repeat all the cases of negate, minus, pointer_diff, etc in each transformation), but (match (opposite (minus @0 @1)) (minus @1 @0)) is not a valid syntax. More likely we would write (plus @0 @1) (if (opposite_p (@0, @1)) with opposite_p defined outside of match.pd :-( Maybe there is a way to simulate binary predicates on @0 @1 using unary predicates on (artificial @0 @1). (looks like I forgot to add pointer_diff to negate_expr_p) >> If we start from a comparison of pointer_plus, I think it would make >> sense to use pointer_diff. >> >> I believe currently we try to use pointer operations (pointer_plus, >> pointer_diff, lt) only for related pointers and we cast to some integer >> type for wilder cases (implementation of std::less in C++ for instance). >> On the other hand, in an alias check, the 2 pointers are possibly >> unrelated, so maybe the code emitted for an alias check should be >> changed. > > If we can prove that the pointers are to different objects, it would > be valid to fold the check to "no alias", regardless of the constant > (although in practice we should have weeded out those cases before > generating the check). In that sense, relying on the UBness of > comparing pointers to different objects would be fine. If there's a > risk of the check being folded to "alias" when the pointers are known > to point to different objects then that would be more of a problem > (as a missed optimisation). Assuming they are related, we are also assuming that pointer_diff will not overflow. But for unrelated objects, in particular on 32bit targets, pointer differences cannot all be represented by a 32 bit ptrdiff_t (the differences live in a range twice that size), and doing comparisons on it can have strange effects. In this particular case, I don't really see a way things could break (you would somehow need one pointer close to 0 and the other close to 2^32 so they end up close modulo 2^32, but that would require @2+@1 to overflow which means the loop will never run that far anyway). But we are still lying and taking a risk that some other optimization will trust us. (I am ok with keeping the current alias code, just saying that it involves a small, possibly negligible risk)
Index: gcc/match.pd =================================================================== --- gcc/match.pd 2018-07-18 18:44:22.565914281 +0100 +++ gcc/match.pd 2018-07-20 11:24:33.692045585 +0100 @@ -4924,3 +4924,37 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT) (if (inverse_conditions_p (@0, @2) && element_precision (type) == element_precision (op_type)) (view_convert (cond_op @2 @3 @4 @5 (view_convert:op_type @1))))))) + +/* For pointers @0 and @2 and unsigned constant offset @1, look for + expressions like: + + A: (@0 + @1 < @2) | (@2 + @1 < @0) + B: (@0 + @1 <= @2) | (@2 + @1 <= @0) + + If pointers are known not to wrap, B checks whether @1 bytes starting at + @0 and @2 do not overlap, while A tests the same thing for @1 + 1 bytes. + A is more efficiently tested as: + + ((sizetype) @0 - (sizetype) @2 + @1) > (@1 * 2) + + as long as @1 * 2 doesn't overflow. B is the same with @1 replaced + with @1 - 1. */ +(for ior (truth_orif truth_or bit_ior) + (for cmp (le lt) + (simplify + (ior (cmp (pointer_plus:s @0 INTEGER_CST@1) @2) + (cmp (pointer_plus:s @2 @1) @0)) + (if (!flag_trapv && !TYPE_OVERFLOW_WRAPS (TREE_TYPE (@0))) + /* Convert the B form to the A form. */ + (with { offset_int off = wi::to_offset (@1) - (cmp == LE_EXPR ? 1 : 0); } + /* Always fails for negative values. */ + (if (wi::min_precision (off, UNSIGNED) * 2 <= TYPE_PRECISION (sizetype)) + /* It doesn't matter whether we use @2 - @0 or @0 - @2, so let + tree_swap_operands_p pick a canonical order. */ + (with { tree ptr1 = @0, ptr2 = @2; + if (tree_swap_operands_p (ptr1, ptr2)) + std::swap (ptr1, ptr2); } + (gt (plus (minus (convert:sizetype { ptr1; }) + (convert:sizetype { ptr2; })) + { wide_int_to_tree (sizetype, off); }) + { wide_int_to_tree (sizetype, off * 2); })))))))) Index: gcc/testsuite/gcc.dg/pointer-range-check-1.c =================================================================== --- /dev/null 2018-07-09 14:52:09.234750850 +0100 +++ gcc/testsuite/gcc.dg/pointer-range-check-1.c 2018-07-20 11:24:33.692045585 +0100 @@ -0,0 +1,37 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -fno-ipa-icf -fdump-tree-optimized" } */ + +/* All four functions should be folded to: + + ((sizetype) a - (sizetype) b + 15) < 30. */ + +_Bool +f1 (char *a, char *b) +{ + return (a + 16 <= b) || (b + 16 <= a); +} + +_Bool +f2 (char *a, char *b) +{ + return (a + 15 < b) || (b + 15 < a); +} + +_Bool +f3 (char *a, char *b) +{ + return (a + 16 <= b) || (b + 16 <= a); +} + +_Bool +f4 (char *a, char *b) +{ + return (a + 15 < b) || (b + 15 < a); +} + +/* { dg-final { scan-tree-dump-times { = [^\n]* - [^\n]*;} 4 "optimized" } } */ +/* { dg-final { scan-tree-dump-times { = [^\n]* \+ [^\n]*;} 4 "optimized" } } */ +/* { dg-final { scan-tree-dump-times { = [^\n]*\ > [^\n]*;} 4 "optimized" } } */ +/* { dg-final { scan-tree-dump-not {=[^\n]*\ < [^\n]*;} "optimized" } } */ +/* { dg-final { scan-tree-dump-times { \+ 15} 4 "optimized" } } */ +/* { dg-final { scan-tree-dump-times { > 30} 4 "optimized" } } */ Index: gcc/testsuite/gcc.dg/pointer-range-check-2.c =================================================================== --- /dev/null 2018-07-09 14:52:09.234750850 +0100 +++ gcc/testsuite/gcc.dg/pointer-range-check-2.c 2018-07-20 11:24:33.692045585 +0100 @@ -0,0 +1,31 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -fno-ipa-icf -fwrapv-pointer -fdump-tree-optimized" } */ + +_Bool +f1 (char *a, char *b) +{ + return (a + 16 <= b) || (b + 16 <= a); +} + +_Bool +f2 (char *a, char *b) +{ + return (a + 15 < b) || (b + 15 < a); +} + +_Bool +f3 (char *a, char *b) +{ + return (a + 16 <= b) || (b + 16 <= a); +} + +_Bool +f4 (char *a, char *b) +{ + return (a + 15 < b) || (b + 15 < a); +} + +/* { dg-final { scan-tree-dump-not { = [^\n]* - [^\n]*;} "optimized" } } */ +/* { dg-final { scan-tree-dump-times { = [^\n]* \+ [^\n]*;} 8 "optimized" } } */ +/* { dg-final { scan-tree-dump-times { \+ 15} 4 "optimized" } } */ +/* { dg-final { scan-tree-dump-times { \+ 16} 4 "optimized" } } */