diff mbox series

Fold pointer range checks with equal spans

Message ID 87in5a2odu.fsf@arm.com
State New
Headers show
Series Fold pointer range checks with equal spans | expand

Commit Message

Richard Sandiford July 20, 2018, 10:27 a.m. UTC
When checking whether vectorised accesses at A and B are independent,
the vectoriser falls back to tests of the form:

    A + size <= B || B + size <= A

But in the common case that "size" is just the constant size of a vector
(or a small multiple), it would be more efficient to do:

   ((size_t) A - (size_t) B + size - 1) > (size - 1) * 2

This patch adds folds to do that.  E.g. before the patch, the alias
checks for:

  for (int j = 0; j < n; ++j)
    {
      for (int i = 0; i < 16; ++i)
	a[i] = (b[i] + c[i]) >> 1;
      a += step;
      b += step;
      c += step;
    }

were:

        add     x7, x1, 15
        add     x5, x0, 15
        cmp     x0, x7
        add     x7, x2, 15
        ccmp    x1, x5, 2, ls
        cset    w8, hi
        cmp     x0, x7
        ccmp    x2, x5, 2, ls
        cset    w4, hi
        tst     w8, w4

while after the patch they're:

        sub     x7, x0, x1
        sub     x5, x0, x2
        add     x7, x7, 15
        add     x5, x5, 15
        cmp     x7, 30
        ccmp    x5, 30, 0, hi

The old scheme needs:

[A] one addition per vector pointer
[B] two comparisons and one IOR per range check

The new one needs:

[C] one subtraction, one addition and one comparison per range check

The range checks are then ANDed together, with the same number of
ANDs either way.

With conditional comparisons (such as on AArch64), we're able to remove
the IOR between comparisons in the old scheme, but then need an explicit
AND or branch when combining the range checks, as the example above shows.
With the new scheme we can instead use conditional comparisons for
the AND chain.

So even with conditional comparisons, the new scheme should in practice
be a win in almost all cases.  Without conditional comparisons, the new
scheme removes [A] and replaces [B] with an equivalent number of operations
[C], so should always give fewer operations overall.  Although each [C]
is linear, multiple [C]s are easily parallelisable.

A better implementation of the above would be:

        add     x5, x0, 15
        sub     x7, x5, x1
        sub     x5, x5, x2
        cmp     x7, 30
        ccmp    x5, 30, 0, hi

where we add 15 to "a" before the subtraction.  Unfortunately,
canonicalisation rules mean that even if we try to create it in
that form, it gets folded into the one above instead.

An alternative would be not to do this in match.pd and instead get
tree-data-ref.c to do it itself.  I started out that way but thought
the match.pd approach seemed cleaner.

Tested on aarch64-linux-gnu (with and without SVE), aarch64_be-elf
and x86_64-linux-gnu.  OK to install?

Richard


2018-07-20  Richard Sandiford  <richard.sandiford@arm.com>

gcc/
	* match.pd: Optimise pointer range checks.

gcc/testsuite/
	* gcc.dg/pointer-range-check-1.c: New test.
	* gcc.dg/pointer-range-check-2.c: Likewise.

Comments

Marc Glisse July 20, 2018, 12:10 p.m. UTC | #1
On Fri, 20 Jul 2018, Richard Sandiford wrote:

> --- gcc/match.pd	2018-07-18 18:44:22.565914281 +0100
> +++ gcc/match.pd	2018-07-20 11:24:33.692045585 +0100
> @@ -4924,3 +4924,37 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>    (if (inverse_conditions_p (@0, @2)
>         && element_precision (type) == element_precision (op_type))
>     (view_convert (cond_op @2 @3 @4 @5 (view_convert:op_type @1)))))))
> +
> +/* For pointers @0 and @2 and unsigned constant offset @1, look for
> +   expressions like:
> +
> +   A: (@0 + @1 < @2) | (@2 + @1 < @0)
> +   B: (@0 + @1 <= @2) | (@2 + @1 <= @0)
> +
> +   If pointers are known not to wrap, B checks whether @1 bytes starting at
> +   @0 and @2 do not overlap, while A tests the same thing for @1 + 1 bytes.
> +   A is more efficiently tested as:
> +
> +   ((sizetype) @0 - (sizetype) @2 + @1) > (@1 * 2)
> +
> +   as long as @1 * 2 doesn't overflow.  B is the same with @1 replaced
> +   with @1 - 1.  */
> +(for ior (truth_orif truth_or bit_ior)
> + (for cmp (le lt)
> +  (simplify
> +   (ior (cmp (pointer_plus:s @0 INTEGER_CST@1) @2)
> +	(cmp (pointer_plus:s @2 @1) @0))

Do you want :c on cmp, in case it appears as @2 > @0 + @1 ? (may need some 
care with "cmp == LE_EXPR" below)
Do you want :s on cmp as well?

> +   (if (!flag_trapv && !TYPE_OVERFLOW_WRAPS (TREE_TYPE (@0)))

Don't you want TYPE_OVERFLOW_UNDEFINED?

> +    /* Convert the B form to the A form.  */
> +    (with { offset_int off = wi::to_offset (@1) - (cmp == LE_EXPR ? 1 : 0); }
> +     /* Always fails for negative values.  */
> +     (if (wi::min_precision (off, UNSIGNED) * 2 <= TYPE_PRECISION (sizetype))
> +      /* It doesn't matter whether we use @2 - @0 or @0 - @2, so let
> +	 tree_swap_operands_p pick a canonical order.  */
> +      (with { tree ptr1 = @0, ptr2 = @2;
> +	      if (tree_swap_operands_p (ptr1, ptr2))
> +		std::swap (ptr1, ptr2); }
> +       (gt (plus (minus (convert:sizetype { ptr1; })
> +			(convert:sizetype { ptr2; }))
> +		 { wide_int_to_tree (sizetype, off); })
> +	   { wide_int_to_tree (sizetype, off * 2); }))))))))
Richard Sandiford July 23, 2018, 3:04 p.m. UTC | #2
Marc Glisse <marc.glisse@inria.fr> writes:
> On Fri, 20 Jul 2018, Richard Sandiford wrote:
>
>> --- gcc/match.pd	2018-07-18 18:44:22.565914281 +0100
>> +++ gcc/match.pd	2018-07-20 11:24:33.692045585 +0100
>> @@ -4924,3 +4924,37 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>>    (if (inverse_conditions_p (@0, @2)
>>         && element_precision (type) == element_precision (op_type))
>>     (view_convert (cond_op @2 @3 @4 @5 (view_convert:op_type @1)))))))
>> +
>> +/* For pointers @0 and @2 and unsigned constant offset @1, look for
>> +   expressions like:
>> +
>> +   A: (@0 + @1 < @2) | (@2 + @1 < @0)
>> +   B: (@0 + @1 <= @2) | (@2 + @1 <= @0)
>> +
>> +   If pointers are known not to wrap, B checks whether @1 bytes starting at
>> +   @0 and @2 do not overlap, while A tests the same thing for @1 + 1 bytes.
>> +   A is more efficiently tested as:
>> +
>> +   ((sizetype) @0 - (sizetype) @2 + @1) > (@1 * 2)
>> +
>> +   as long as @1 * 2 doesn't overflow.  B is the same with @1 replaced
>> +   with @1 - 1.  */
>> +(for ior (truth_orif truth_or bit_ior)
>> + (for cmp (le lt)
>> +  (simplify
>> +   (ior (cmp (pointer_plus:s @0 INTEGER_CST@1) @2)
>> +	(cmp (pointer_plus:s @2 @1) @0))
>
> Do you want :c on cmp, in case it appears as @2 > @0 + @1 ? (may need some 
> care with "cmp == LE_EXPR" below)
> Do you want :s on cmp as well?
>
>> +   (if (!flag_trapv && !TYPE_OVERFLOW_WRAPS (TREE_TYPE (@0)))
>
> Don't you want TYPE_OVERFLOW_UNDEFINED?

Thanks, fixed below.  Think the cmp == LE_EXPR stuff is still ok with :c,
since the generated code sets cmp to LE_EXPR when matching GE_EXPR.

Tested as before.  OK to install?

Richard


2018-07-23  Richard Sandiford  <richard.sandiford@arm.com>

gcc/
	* match.pd: Optimise pointer range checks.

gcc/testsuite/
	* gcc.dg/pointer-range-check-1.c: New test.
	* gcc.dg/pointer-range-check-2.c: Likewise.

Index: gcc/match.pd
===================================================================
--- gcc/match.pd	2018-07-23 15:56:47.000000000 +0100
+++ gcc/match.pd	2018-07-23 15:58:33.480269844 +0100
@@ -4924,3 +4924,37 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
    (if (inverse_conditions_p (@0, @2)
         && element_precision (type) == element_precision (op_type))
     (view_convert (cond_op @2 @3 @4 @5 (view_convert:op_type @1)))))))
+
+/* For pointers @0 and @2 and unsigned constant offset @1, look for
+   expressions like:
+
+   A: (@0 + @1 < @2) | (@2 + @1 < @0)
+   B: (@0 + @1 <= @2) | (@2 + @1 <= @0)
+
+   If pointers are known not to wrap, B checks whether @1 bytes starting at
+   @0 and @2 do not overlap, while A tests the same thing for @1 + 1 bytes.
+   A is more efficiently tested as:
+
+   ((sizetype) @0 - (sizetype) @2 + @1) > (@1 * 2)
+
+   as long as @1 * 2 doesn't overflow.  B is the same with @1 replaced
+   with @1 - 1.  */
+(for ior (truth_orif truth_or bit_ior)
+ (for cmp (le lt)
+  (simplify
+   (ior (cmp:cs (pointer_plus:s @0 INTEGER_CST@1) @2)
+	(cmp:cs (pointer_plus:s @2 @1) @0))
+   (if (!flag_trapv && TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (@0)))
+    /* Convert the B form to the A form.  */
+    (with { offset_int off = wi::to_offset (@1) - (cmp == LE_EXPR ? 1 : 0); }
+     /* Always fails for negative values.  */
+     (if (wi::min_precision (off, UNSIGNED) * 2 <= TYPE_PRECISION (sizetype))
+      /* It doesn't matter whether we use @2 - @0 or @0 - @2, so let
+	 tree_swap_operands_p pick a canonical order.  */
+      (with { tree ptr1 = @0, ptr2 = @2;
+	      if (tree_swap_operands_p (ptr1, ptr2))
+		std::swap (ptr1, ptr2); }
+       (gt (plus (minus (convert:sizetype { ptr1; })
+			(convert:sizetype { ptr2; }))
+		 { wide_int_to_tree (sizetype, off); })
+	   { wide_int_to_tree (sizetype, off * 2); }))))))))
Index: gcc/testsuite/gcc.dg/pointer-range-check-1.c
===================================================================
--- /dev/null	2018-07-09 14:52:09.234750850 +0100
+++ gcc/testsuite/gcc.dg/pointer-range-check-1.c	2018-07-23 15:58:33.480269844 +0100
@@ -0,0 +1,37 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fno-ipa-icf -fdump-tree-optimized" } */
+
+/* All four functions should be folded to:
+
+   ((sizetype) a - (sizetype) b + 15) < 30.  */
+
+_Bool
+f1 (char *a, char *b)
+{
+  return (a + 16 <= b) || (b + 16 <= a);
+}
+
+_Bool
+f2 (char *a, char *b)
+{
+  return (a + 15 < b) || (b + 15 < a);
+}
+
+_Bool
+f3 (char *a, char *b)
+{
+  return (a + 16 <= b) || (b + 16 <= a);
+}
+
+_Bool
+f4 (char *a, char *b)
+{
+  return (a + 15 < b) || (b + 15 < a);
+}
+
+/* { dg-final { scan-tree-dump-times { = [^\n]* - [^\n]*;} 4 "optimized" } } */
+/* { dg-final { scan-tree-dump-times { = [^\n]* \+ [^\n]*;} 4 "optimized" } } */
+/* { dg-final { scan-tree-dump-times { = [^\n]*\ > [^\n]*;} 4 "optimized" } } */
+/* { dg-final { scan-tree-dump-not {=[^\n]*\ < [^\n]*;} "optimized" } } */
+/* { dg-final { scan-tree-dump-times { \+ 15} 4 "optimized" } } */
+/* { dg-final { scan-tree-dump-times { > 30} 4 "optimized" } } */
Index: gcc/testsuite/gcc.dg/pointer-range-check-2.c
===================================================================
--- /dev/null	2018-07-09 14:52:09.234750850 +0100
+++ gcc/testsuite/gcc.dg/pointer-range-check-2.c	2018-07-23 15:58:33.480269844 +0100
@@ -0,0 +1,31 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fno-ipa-icf -fwrapv-pointer -fdump-tree-optimized" } */
+
+_Bool
+f1 (char *a, char *b)
+{
+  return (a + 16 <= b) || (b + 16 <= a);
+}
+
+_Bool
+f2 (char *a, char *b)
+{
+  return (a + 15 < b) || (b + 15 < a);
+}
+
+_Bool
+f3 (char *a, char *b)
+{
+  return (a + 16 <= b) || (b + 16 <= a);
+}
+
+_Bool
+f4 (char *a, char *b)
+{
+  return (a + 15 < b) || (b + 15 < a);
+}
+
+/* { dg-final { scan-tree-dump-not { = [^\n]* - [^\n]*;} "optimized" } } */
+/* { dg-final { scan-tree-dump-times { = [^\n]* \+ [^\n]*;} 8 "optimized" } } */
+/* { dg-final { scan-tree-dump-times { \+ 15} 4 "optimized" } } */
+/* { dg-final { scan-tree-dump-times { \+ 16} 4 "optimized" } } */
Richard Biener July 25, 2018, 9:07 a.m. UTC | #3
On Mon, Jul 23, 2018 at 5:05 PM Richard Sandiford
<richard.sandiford@arm.com> wrote:
>
> Marc Glisse <marc.glisse@inria.fr> writes:
> > On Fri, 20 Jul 2018, Richard Sandiford wrote:
> >
> >> --- gcc/match.pd     2018-07-18 18:44:22.565914281 +0100
> >> +++ gcc/match.pd     2018-07-20 11:24:33.692045585 +0100
> >> @@ -4924,3 +4924,37 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> >>    (if (inverse_conditions_p (@0, @2)
> >>         && element_precision (type) == element_precision (op_type))
> >>     (view_convert (cond_op @2 @3 @4 @5 (view_convert:op_type @1)))))))
> >> +
> >> +/* For pointers @0 and @2 and unsigned constant offset @1, look for
> >> +   expressions like:
> >> +
> >> +   A: (@0 + @1 < @2) | (@2 + @1 < @0)
> >> +   B: (@0 + @1 <= @2) | (@2 + @1 <= @0)
> >> +
> >> +   If pointers are known not to wrap, B checks whether @1 bytes starting at
> >> +   @0 and @2 do not overlap, while A tests the same thing for @1 + 1 bytes.
> >> +   A is more efficiently tested as:
> >> +
> >> +   ((sizetype) @0 - (sizetype) @2 + @1) > (@1 * 2)
> >> +
> >> +   as long as @1 * 2 doesn't overflow.  B is the same with @1 replaced
> >> +   with @1 - 1.  */
> >> +(for ior (truth_orif truth_or bit_ior)
> >> + (for cmp (le lt)
> >> +  (simplify
> >> +   (ior (cmp (pointer_plus:s @0 INTEGER_CST@1) @2)
> >> +    (cmp (pointer_plus:s @2 @1) @0))
> >
> > Do you want :c on cmp, in case it appears as @2 > @0 + @1 ? (may need some
> > care with "cmp == LE_EXPR" below)
> > Do you want :s on cmp as well?
> >
> >> +   (if (!flag_trapv && !TYPE_OVERFLOW_WRAPS (TREE_TYPE (@0)))
> >
> > Don't you want TYPE_OVERFLOW_UNDEFINED?
>
> Thanks, fixed below.  Think the cmp == LE_EXPR stuff is still ok with :c,
> since the generated code sets cmp to LE_EXPR when matching GE_EXPR.
>
> Tested as before.  OK to install?
>
> Richard
>
>
> 2018-07-23  Richard Sandiford  <richard.sandiford@arm.com>
>
> gcc/
>         * match.pd: Optimise pointer range checks.
>
> gcc/testsuite/
>         * gcc.dg/pointer-range-check-1.c: New test.
>         * gcc.dg/pointer-range-check-2.c: Likewise.
>
> Index: gcc/match.pd
> ===================================================================
> --- gcc/match.pd        2018-07-23 15:56:47.000000000 +0100
> +++ gcc/match.pd        2018-07-23 15:58:33.480269844 +0100
> @@ -4924,3 +4924,37 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>     (if (inverse_conditions_p (@0, @2)
>          && element_precision (type) == element_precision (op_type))
>      (view_convert (cond_op @2 @3 @4 @5 (view_convert:op_type @1)))))))
> +
> +/* For pointers @0 and @2 and unsigned constant offset @1, look for
> +   expressions like:
> +
> +   A: (@0 + @1 < @2) | (@2 + @1 < @0)
> +   B: (@0 + @1 <= @2) | (@2 + @1 <= @0)
> +
> +   If pointers are known not to wrap, B checks whether @1 bytes starting at
> +   @0 and @2 do not overlap, while A tests the same thing for @1 + 1 bytes.
> +   A is more efficiently tested as:
> +
> +   ((sizetype) @0 - (sizetype) @2 + @1) > (@1 * 2)
> +
> +   as long as @1 * 2 doesn't overflow.  B is the same with @1 replaced
> +   with @1 - 1.  */
> +(for ior (truth_orif truth_or bit_ior)
> + (for cmp (le lt)
> +  (simplify
> +   (ior (cmp:cs (pointer_plus:s @0 INTEGER_CST@1) @2)
> +       (cmp:cs (pointer_plus:s @2 @1) @0))
> +   (if (!flag_trapv && TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (@0)))

no need to check flag_trapv, pointer arithmetic is not covered by -ftrapv.

> +    /* Convert the B form to the A form.  */
> +    (with { offset_int off = wi::to_offset (@1) - (cmp == LE_EXPR ? 1 : 0); }
> +     /* Always fails for negative values.  */
> +     (if (wi::min_precision (off, UNSIGNED) * 2 <= TYPE_PRECISION (sizetype))
> +      /* It doesn't matter whether we use @2 - @0 or @0 - @2, so let
> +        tree_swap_operands_p pick a canonical order.  */

gimple_resimplify takes care of that - well, it doesn't since minus isn't
commutative...  I guess you get better CSE later when doing this thus ok,
but it does look a bit off here ;)

I think you shouldn't use 'sizetype' without guarding this whole thing
with TYPE_PRECISION (sizetype) == TYPE_PRECISION (TREE_TYPE (@0)).

Since the original IL performs an ordered compare of two eventually unrelated
pointers (or pointers adjusted in a way that crosses object
boundaries) (uh... ;))
I wonder if you can use POINTER_DIFF_EXPR here to avoid the sizetype
conversion?  Since POINTER_PLUS_EXPR offsets are supposed to be
interpreted "signed" that should also handle negative offsets correctly then?

Also, when @2 == @0 + (@1+1) then the original condition is true but
((sizetype) @0 - (sizetype) @2 + @1) > (@1 * 2) is not?
   (sizetype) @0 - (sizetype) (@0 + @1 + 1) + @1 > @1 * 2
-> -1 > @1 * 2

which is false.  So I can't really see how you can apply this transform in
general (it would be fine for generating alias checks but a bit more
pessimistic).

But maybe I am missing something?

Richard.

> +      (with { tree ptr1 = @0, ptr2 = @2;
> +             if (tree_swap_operands_p (ptr1, ptr2))
> +               std::swap (ptr1, ptr2); }
> +       (gt (plus (minus (convert:sizetype { ptr1; })
> +                       (convert:sizetype { ptr2; }))
> +                { wide_int_to_tree (sizetype, off); })
> +          { wide_int_to_tree (sizetype, off * 2); }))))))))
> Index: gcc/testsuite/gcc.dg/pointer-range-check-1.c
> ===================================================================
> --- /dev/null   2018-07-09 14:52:09.234750850 +0100
> +++ gcc/testsuite/gcc.dg/pointer-range-check-1.c        2018-07-23 15:58:33.480269844 +0100
> @@ -0,0 +1,37 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fno-ipa-icf -fdump-tree-optimized" } */
> +
> +/* All four functions should be folded to:
> +
> +   ((sizetype) a - (sizetype) b + 15) < 30.  */
> +
> +_Bool
> +f1 (char *a, char *b)
> +{
> +  return (a + 16 <= b) || (b + 16 <= a);
> +}
> +
> +_Bool
> +f2 (char *a, char *b)
> +{
> +  return (a + 15 < b) || (b + 15 < a);
> +}
> +
> +_Bool
> +f3 (char *a, char *b)
> +{
> +  return (a + 16 <= b) || (b + 16 <= a);
> +}
> +
> +_Bool
> +f4 (char *a, char *b)
> +{
> +  return (a + 15 < b) || (b + 15 < a);
> +}
> +
> +/* { dg-final { scan-tree-dump-times { = [^\n]* - [^\n]*;} 4 "optimized" } } */
> +/* { dg-final { scan-tree-dump-times { = [^\n]* \+ [^\n]*;} 4 "optimized" } } */
> +/* { dg-final { scan-tree-dump-times { = [^\n]*\ > [^\n]*;} 4 "optimized" } } */
> +/* { dg-final { scan-tree-dump-not {=[^\n]*\ < [^\n]*;} "optimized" } } */
> +/* { dg-final { scan-tree-dump-times { \+ 15} 4 "optimized" } } */
> +/* { dg-final { scan-tree-dump-times { > 30} 4 "optimized" } } */
> Index: gcc/testsuite/gcc.dg/pointer-range-check-2.c
> ===================================================================
> --- /dev/null   2018-07-09 14:52:09.234750850 +0100
> +++ gcc/testsuite/gcc.dg/pointer-range-check-2.c        2018-07-23 15:58:33.480269844 +0100
> @@ -0,0 +1,31 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fno-ipa-icf -fwrapv-pointer -fdump-tree-optimized" } */
> +
> +_Bool
> +f1 (char *a, char *b)
> +{
> +  return (a + 16 <= b) || (b + 16 <= a);
> +}
> +
> +_Bool
> +f2 (char *a, char *b)
> +{
> +  return (a + 15 < b) || (b + 15 < a);
> +}
> +
> +_Bool
> +f3 (char *a, char *b)
> +{
> +  return (a + 16 <= b) || (b + 16 <= a);
> +}
> +
> +_Bool
> +f4 (char *a, char *b)
> +{
> +  return (a + 15 < b) || (b + 15 < a);
> +}
> +
> +/* { dg-final { scan-tree-dump-not { = [^\n]* - [^\n]*;} "optimized" } } */
> +/* { dg-final { scan-tree-dump-times { = [^\n]* \+ [^\n]*;} 8 "optimized" } } */
> +/* { dg-final { scan-tree-dump-times { \+ 15} 4 "optimized" } } */
> +/* { dg-final { scan-tree-dump-times { \+ 16} 4 "optimized" } } */
Richard Sandiford July 30, 2018, 5:47 p.m. UTC | #4
[Sorry, somehow missed this till now]

Richard Biener <richard.guenther@gmail.com> writes:
> On Mon, Jul 23, 2018 at 5:05 PM Richard Sandiford
> <richard.sandiford@arm.com> wrote:
>>
>> Marc Glisse <marc.glisse@inria.fr> writes:
>> > On Fri, 20 Jul 2018, Richard Sandiford wrote:
>> >
>> >> --- gcc/match.pd     2018-07-18 18:44:22.565914281 +0100
>> >> +++ gcc/match.pd     2018-07-20 11:24:33.692045585 +0100
>> >> @@ -4924,3 +4924,37 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>> >>    (if (inverse_conditions_p (@0, @2)
>> >>         && element_precision (type) == element_precision (op_type))
>> >>     (view_convert (cond_op @2 @3 @4 @5 (view_convert:op_type @1)))))))
>> >> +
>> >> +/* For pointers @0 and @2 and unsigned constant offset @1, look for
>> >> +   expressions like:
>> >> +
>> >> +   A: (@0 + @1 < @2) | (@2 + @1 < @0)
>> >> +   B: (@0 + @1 <= @2) | (@2 + @1 <= @0)
>> >> +
>> >> +   If pointers are known not to wrap, B checks whether @1 bytes starting at
>> >> +   @0 and @2 do not overlap, while A tests the same thing for @1 + 1 bytes.
>> >> +   A is more efficiently tested as:
>> >> +
>> >> +   ((sizetype) @0 - (sizetype) @2 + @1) > (@1 * 2)
>> >> +
>> >> +   as long as @1 * 2 doesn't overflow.  B is the same with @1 replaced
>> >> +   with @1 - 1.  */
>> >> +(for ior (truth_orif truth_or bit_ior)
>> >> + (for cmp (le lt)
>> >> +  (simplify
>> >> +   (ior (cmp (pointer_plus:s @0 INTEGER_CST@1) @2)
>> >> +    (cmp (pointer_plus:s @2 @1) @0))
>> >
>> > Do you want :c on cmp, in case it appears as @2 > @0 + @1 ? (may need some
>> > care with "cmp == LE_EXPR" below)
>> > Do you want :s on cmp as well?
>> >
>> >> +   (if (!flag_trapv && !TYPE_OVERFLOW_WRAPS (TREE_TYPE (@0)))
>> >
>> > Don't you want TYPE_OVERFLOW_UNDEFINED?
>>
>> Thanks, fixed below.  Think the cmp == LE_EXPR stuff is still ok with :c,
>> since the generated code sets cmp to LE_EXPR when matching GE_EXPR.
>>
>> Tested as before.  OK to install?
>>
>> Richard
>>
>>
>> 2018-07-23  Richard Sandiford  <richard.sandiford@arm.com>
>>
>> gcc/
>>         * match.pd: Optimise pointer range checks.
>>
>> gcc/testsuite/
>>         * gcc.dg/pointer-range-check-1.c: New test.
>>         * gcc.dg/pointer-range-check-2.c: Likewise.
>>
>> Index: gcc/match.pd
>> ===================================================================
>> --- gcc/match.pd        2018-07-23 15:56:47.000000000 +0100
>> +++ gcc/match.pd        2018-07-23 15:58:33.480269844 +0100
>> @@ -4924,3 +4924,37 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>>     (if (inverse_conditions_p (@0, @2)
>>          && element_precision (type) == element_precision (op_type))
>>      (view_convert (cond_op @2 @3 @4 @5 (view_convert:op_type @1)))))))
>> +
>> +/* For pointers @0 and @2 and unsigned constant offset @1, look for
>> +   expressions like:
>> +
>> +   A: (@0 + @1 < @2) | (@2 + @1 < @0)
>> +   B: (@0 + @1 <= @2) | (@2 + @1 <= @0)
>> +
>> +   If pointers are known not to wrap, B checks whether @1 bytes starting at
>> +   @0 and @2 do not overlap, while A tests the same thing for @1 + 1 bytes.
>> +   A is more efficiently tested as:
>> +
>> +   ((sizetype) @0 - (sizetype) @2 + @1) > (@1 * 2)
>> +
>> +   as long as @1 * 2 doesn't overflow.  B is the same with @1 replaced
>> +   with @1 - 1.  */
>> +(for ior (truth_orif truth_or bit_ior)
>> + (for cmp (le lt)
>> +  (simplify
>> +   (ior (cmp:cs (pointer_plus:s @0 INTEGER_CST@1) @2)
>> +       (cmp:cs (pointer_plus:s @2 @1) @0))
>> +   (if (!flag_trapv && TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (@0)))
>
> no need to check flag_trapv, pointer arithmetic is not covered by -ftrapv.

This was there because we're converting pointer arithmetic into integer
arithmetic, and -ftrapv could cause that new integer code to trap.
TYPE_OVERFLOW_UNDEFINED says that pointer overflow is undefined,
but it seemed bad form to make it actually trap, especially when the
above does some reassociation too.

>> +    /* Convert the B form to the A form.  */
>> + (with { offset_int off = wi::to_offset (@1) - (cmp == LE_EXPR ? 1 :
>> 0); }
>> +     /* Always fails for negative values.  */
>> + (if (wi::min_precision (off, UNSIGNED) * 2 <= TYPE_PRECISION
>> (sizetype))
>> +      /* It doesn't matter whether we use @2 - @0 or @0 - @2, so let
>> +        tree_swap_operands_p pick a canonical order.  */
>
> gimple_resimplify takes care of that - well, it doesn't since minus isn't
> commutative...  I guess you get better CSE later when doing this thus ok,
> but it does look a bit off here ;)
>
> I think you shouldn't use 'sizetype' without guarding this whole thing
> with TYPE_PRECISION (sizetype) == TYPE_PRECISION (TREE_TYPE (@0)).

OK, hadn't realised that could be false.  Would building the appropriate
unsigned type be OK without the condition, or does it need to be sizetype?

> Since the original IL performs an ordered compare of two eventually unrelated
> pointers (or pointers adjusted in a way that crosses object
> boundaries) (uh... ;))
> I wonder if you can use POINTER_DIFF_EXPR here to avoid the sizetype
> conversion?  Since POINTER_PLUS_EXPR offsets are supposed to be
> interpreted "signed" that should also handle negative offsets correctly then?

The transform isn't valid when @1 is negative though, so I think we
need that check either way.

I could use POINTER_DIFF_EXPR and convert to unsigned though, if that
seems better.  That might also allow us to CSE the retained
POINTER_PLUS_EXPRs.

> Also, when @2 == @0 + (@1+1) then the original condition is true but
> ((sizetype) @0 - (sizetype) @2 + @1) > (@1 * 2) is not?
>    (sizetype) @0 - (sizetype) (@0 + @1 + 1) + @1 > @1 * 2
> -> -1 > @1 * 2
>
> which is false.  So I can't really see how you can apply this transform in
> general (it would be fine for generating alias checks but a bit more
> pessimistic).
>
> But maybe I am missing something?

It relies on sizetype being unsigned: (sizetype)-1 > @1 * 2 is true.

Thanks,
Richard

> Richard.
>
>> +      (with { tree ptr1 = @0, ptr2 = @2;
>> +             if (tree_swap_operands_p (ptr1, ptr2))
>> +               std::swap (ptr1, ptr2); }
>> +       (gt (plus (minus (convert:sizetype { ptr1; })
>> +                       (convert:sizetype { ptr2; }))
>> +                { wide_int_to_tree (sizetype, off); })
>> +          { wide_int_to_tree (sizetype, off * 2); }))))))))
>> Index: gcc/testsuite/gcc.dg/pointer-range-check-1.c
>> ===================================================================
>> --- /dev/null   2018-07-09 14:52:09.234750850 +0100
>> +++ gcc/testsuite/gcc.dg/pointer-range-check-1.c 2018-07-23
>> 15:58:33.480269844 +0100
>> @@ -0,0 +1,37 @@
>> +/* { dg-do compile } */
>> +/* { dg-options "-O2 -fno-ipa-icf -fdump-tree-optimized" } */
>> +
>> +/* All four functions should be folded to:
>> +
>> +   ((sizetype) a - (sizetype) b + 15) < 30.  */
>> +
>> +_Bool
>> +f1 (char *a, char *b)
>> +{
>> +  return (a + 16 <= b) || (b + 16 <= a);
>> +}
>> +
>> +_Bool
>> +f2 (char *a, char *b)
>> +{
>> +  return (a + 15 < b) || (b + 15 < a);
>> +}
>> +
>> +_Bool
>> +f3 (char *a, char *b)
>> +{
>> +  return (a + 16 <= b) || (b + 16 <= a);
>> +}
>> +
>> +_Bool
>> +f4 (char *a, char *b)
>> +{
>> +  return (a + 15 < b) || (b + 15 < a);
>> +}
>> +
>> +/* { dg-final { scan-tree-dump-times { = [^\n]* - [^\n]*;} 4
>> "optimized" } } */
>> +/* { dg-final { scan-tree-dump-times { = [^\n]* \+ [^\n]*;} 4
>> "optimized" } } */
>> +/* { dg-final { scan-tree-dump-times { = [^\n]*\ > [^\n]*;} 4
>> "optimized" } } */
>> +/* { dg-final { scan-tree-dump-not {=[^\n]*\ < [^\n]*;} "optimized" } } */
>> +/* { dg-final { scan-tree-dump-times { \+ 15} 4 "optimized" } } */
>> +/* { dg-final { scan-tree-dump-times { > 30} 4 "optimized" } } */
>> Index: gcc/testsuite/gcc.dg/pointer-range-check-2.c
>> ===================================================================
>> --- /dev/null   2018-07-09 14:52:09.234750850 +0100
>> +++ gcc/testsuite/gcc.dg/pointer-range-check-2.c 2018-07-23
>> 15:58:33.480269844 +0100
>> @@ -0,0 +1,31 @@
>> +/* { dg-do compile } */
>> +/* { dg-options "-O2 -fno-ipa-icf -fwrapv-pointer
>> -fdump-tree-optimized" } */
>> +
>> +_Bool
>> +f1 (char *a, char *b)
>> +{
>> +  return (a + 16 <= b) || (b + 16 <= a);
>> +}
>> +
>> +_Bool
>> +f2 (char *a, char *b)
>> +{
>> +  return (a + 15 < b) || (b + 15 < a);
>> +}
>> +
>> +_Bool
>> +f3 (char *a, char *b)
>> +{
>> +  return (a + 16 <= b) || (b + 16 <= a);
>> +}
>> +
>> +_Bool
>> +f4 (char *a, char *b)
>> +{
>> +  return (a + 15 < b) || (b + 15 < a);
>> +}
>> +
>> +/* { dg-final { scan-tree-dump-not { = [^\n]* - [^\n]*;} "optimized" } } */
>> +/* { dg-final { scan-tree-dump-times { = [^\n]* \+ [^\n]*;} 8
>> "optimized" } } */
>> +/* { dg-final { scan-tree-dump-times { \+ 15} 4 "optimized" } } */
>> +/* { dg-final { scan-tree-dump-times { \+ 16} 4 "optimized" } } */
Richard Biener July 31, 2018, 10:55 a.m. UTC | #5
On Mon, Jul 30, 2018 at 7:47 PM Richard Sandiford
<richard.sandiford@arm.com> wrote:
>
> [Sorry, somehow missed this till now]
>
> Richard Biener <richard.guenther@gmail.com> writes:
> > On Mon, Jul 23, 2018 at 5:05 PM Richard Sandiford
> > <richard.sandiford@arm.com> wrote:
> >>
> >> Marc Glisse <marc.glisse@inria.fr> writes:
> >> > On Fri, 20 Jul 2018, Richard Sandiford wrote:
> >> >
> >> >> --- gcc/match.pd     2018-07-18 18:44:22.565914281 +0100
> >> >> +++ gcc/match.pd     2018-07-20 11:24:33.692045585 +0100
> >> >> @@ -4924,3 +4924,37 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> >> >>    (if (inverse_conditions_p (@0, @2)
> >> >>         && element_precision (type) == element_precision (op_type))
> >> >>     (view_convert (cond_op @2 @3 @4 @5 (view_convert:op_type @1)))))))
> >> >> +
> >> >> +/* For pointers @0 and @2 and unsigned constant offset @1, look for
> >> >> +   expressions like:
> >> >> +
> >> >> +   A: (@0 + @1 < @2) | (@2 + @1 < @0)
> >> >> +   B: (@0 + @1 <= @2) | (@2 + @1 <= @0)
> >> >> +
> >> >> +   If pointers are known not to wrap, B checks whether @1 bytes starting at
> >> >> +   @0 and @2 do not overlap, while A tests the same thing for @1 + 1 bytes.
> >> >> +   A is more efficiently tested as:
> >> >> +
> >> >> +   ((sizetype) @0 - (sizetype) @2 + @1) > (@1 * 2)
> >> >> +
> >> >> +   as long as @1 * 2 doesn't overflow.  B is the same with @1 replaced
> >> >> +   with @1 - 1.  */
> >> >> +(for ior (truth_orif truth_or bit_ior)
> >> >> + (for cmp (le lt)
> >> >> +  (simplify
> >> >> +   (ior (cmp (pointer_plus:s @0 INTEGER_CST@1) @2)
> >> >> +    (cmp (pointer_plus:s @2 @1) @0))
> >> >
> >> > Do you want :c on cmp, in case it appears as @2 > @0 + @1 ? (may need some
> >> > care with "cmp == LE_EXPR" below)
> >> > Do you want :s on cmp as well?
> >> >
> >> >> +   (if (!flag_trapv && !TYPE_OVERFLOW_WRAPS (TREE_TYPE (@0)))
> >> >
> >> > Don't you want TYPE_OVERFLOW_UNDEFINED?
> >>
> >> Thanks, fixed below.  Think the cmp == LE_EXPR stuff is still ok with :c,
> >> since the generated code sets cmp to LE_EXPR when matching GE_EXPR.
> >>
> >> Tested as before.  OK to install?
> >>
> >> Richard
> >>
> >>
> >> 2018-07-23  Richard Sandiford  <richard.sandiford@arm.com>
> >>
> >> gcc/
> >>         * match.pd: Optimise pointer range checks.
> >>
> >> gcc/testsuite/
> >>         * gcc.dg/pointer-range-check-1.c: New test.
> >>         * gcc.dg/pointer-range-check-2.c: Likewise.
> >>
> >> Index: gcc/match.pd
> >> ===================================================================
> >> --- gcc/match.pd        2018-07-23 15:56:47.000000000 +0100
> >> +++ gcc/match.pd        2018-07-23 15:58:33.480269844 +0100
> >> @@ -4924,3 +4924,37 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> >>     (if (inverse_conditions_p (@0, @2)
> >>          && element_precision (type) == element_precision (op_type))
> >>      (view_convert (cond_op @2 @3 @4 @5 (view_convert:op_type @1)))))))
> >> +
> >> +/* For pointers @0 and @2 and unsigned constant offset @1, look for
> >> +   expressions like:
> >> +
> >> +   A: (@0 + @1 < @2) | (@2 + @1 < @0)
> >> +   B: (@0 + @1 <= @2) | (@2 + @1 <= @0)
> >> +
> >> +   If pointers are known not to wrap, B checks whether @1 bytes starting at
> >> +   @0 and @2 do not overlap, while A tests the same thing for @1 + 1 bytes.
> >> +   A is more efficiently tested as:
> >> +
> >> +   ((sizetype) @0 - (sizetype) @2 + @1) > (@1 * 2)
> >> +
> >> +   as long as @1 * 2 doesn't overflow.  B is the same with @1 replaced
> >> +   with @1 - 1.  */
> >> +(for ior (truth_orif truth_or bit_ior)
> >> + (for cmp (le lt)
> >> +  (simplify
> >> +   (ior (cmp:cs (pointer_plus:s @0 INTEGER_CST@1) @2)
> >> +       (cmp:cs (pointer_plus:s @2 @1) @0))
> >> +   (if (!flag_trapv && TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (@0)))
> >
> > no need to check flag_trapv, pointer arithmetic is not covered by -ftrapv.
>
> This was there because we're converting pointer arithmetic into integer
> arithmetic, and -ftrapv could cause that new integer code to trap.
> TYPE_OVERFLOW_UNDEFINED says that pointer overflow is undefined,
> but it seemed bad form to make it actually trap, especially when the
> above does some reassociation too.

Ok, but then maybe check TYPE_OVERFLOW_WRAPS () on the type
you are doing the arithmetic on (sizetype) instead?

>
> >> +    /* Convert the B form to the A form.  */
> >> + (with { offset_int off = wi::to_offset (@1) - (cmp == LE_EXPR ? 1 :
> >> 0); }
> >> +     /* Always fails for negative values.  */
> >> + (if (wi::min_precision (off, UNSIGNED) * 2 <= TYPE_PRECISION
> >> (sizetype))
> >> +      /* It doesn't matter whether we use @2 - @0 or @0 - @2, so let
> >> +        tree_swap_operands_p pick a canonical order.  */
> >
> > gimple_resimplify takes care of that - well, it doesn't since minus isn't
> > commutative...  I guess you get better CSE later when doing this thus ok,
> > but it does look a bit off here ;)
> >
> > I think you shouldn't use 'sizetype' without guarding this whole thing
> > with TYPE_PRECISION (sizetype) == TYPE_PRECISION (TREE_TYPE (@0)).
>
> OK, hadn't realised that could be false.  Would building the appropriate
> unsigned type be OK without the condition, or does it need to be sizetype?

That would probably work but it might be dead slow on those targets that choose
to have a lower-precision sizetype (ISTR some have 24bit pointers but 16bit
sizetype because there's no arithmetic for 24bit but in offsetted addressing).
So the transform is likely not a win for those.

> > Since the original IL performs an ordered compare of two eventually unrelated
> > pointers (or pointers adjusted in a way that crosses object
> > boundaries) (uh... ;))
> > I wonder if you can use POINTER_DIFF_EXPR here to avoid the sizetype
> > conversion?  Since POINTER_PLUS_EXPR offsets are supposed to be
> > interpreted "signed" that should also handle negative offsets correctly then?
>
> The transform isn't valid when @1 is negative though, so I think we
> need that check either way.

+     /* Always fails for negative values.  */
+     (if (wi::min_precision (off, UNSIGNED) * 2 <= TYPE_PRECISION (sizetype))

shouldn't that be wi::min_precision (off * 2, UNSIGNED) <=
TYPE_PRECISION (sizetype)?

> I could use POINTER_DIFF_EXPR and convert to unsigned though, if that
> seems better.  That might also allow us to CSE the retained
> POINTER_PLUS_EXPRs.

Yeah, that sounds better then.

> > Also, when @2 == @0 + (@1+1) then the original condition is true but
> > ((sizetype) @0 - (sizetype) @2 + @1) > (@1 * 2) is not?
> >    (sizetype) @0 - (sizetype) (@0 + @1 + 1) + @1 > @1 * 2
> > -> -1 > @1 * 2
> >
> > which is false.  So I can't really see how you can apply this transform in
> > general (it would be fine for generating alias checks but a bit more
> > pessimistic).
> >
> > But maybe I am missing something?
>
> It relies on sizetype being unsigned: (sizetype)-1 > @1 * 2 is true.

Hmm, so mathematically this is

  (@0 - @2) % modreduce + @1 > @1 * 2

then, but I don't want to think too much about this since Marc didn't
object here ;)

Thanks,
Richard.

> Thanks,
> Richard
>
> > Richard.
> >
> >> +      (with { tree ptr1 = @0, ptr2 = @2;
> >> +             if (tree_swap_operands_p (ptr1, ptr2))
> >> +               std::swap (ptr1, ptr2); }
> >> +       (gt (plus (minus (convert:sizetype { ptr1; })
> >> +                       (convert:sizetype { ptr2; }))
> >> +                { wide_int_to_tree (sizetype, off); })
> >> +          { wide_int_to_tree (sizetype, off * 2); }))))))))
> >> Index: gcc/testsuite/gcc.dg/pointer-range-check-1.c
> >> ===================================================================
> >> --- /dev/null   2018-07-09 14:52:09.234750850 +0100
> >> +++ gcc/testsuite/gcc.dg/pointer-range-check-1.c 2018-07-23
> >> 15:58:33.480269844 +0100
> >> @@ -0,0 +1,37 @@
> >> +/* { dg-do compile } */
> >> +/* { dg-options "-O2 -fno-ipa-icf -fdump-tree-optimized" } */
> >> +
> >> +/* All four functions should be folded to:
> >> +
> >> +   ((sizetype) a - (sizetype) b + 15) < 30.  */
> >> +
> >> +_Bool
> >> +f1 (char *a, char *b)
> >> +{
> >> +  return (a + 16 <= b) || (b + 16 <= a);
> >> +}
> >> +
> >> +_Bool
> >> +f2 (char *a, char *b)
> >> +{
> >> +  return (a + 15 < b) || (b + 15 < a);
> >> +}
> >> +
> >> +_Bool
> >> +f3 (char *a, char *b)
> >> +{
> >> +  return (a + 16 <= b) || (b + 16 <= a);
> >> +}
> >> +
> >> +_Bool
> >> +f4 (char *a, char *b)
> >> +{
> >> +  return (a + 15 < b) || (b + 15 < a);
> >> +}
> >> +
> >> +/* { dg-final { scan-tree-dump-times { = [^\n]* - [^\n]*;} 4
> >> "optimized" } } */
> >> +/* { dg-final { scan-tree-dump-times { = [^\n]* \+ [^\n]*;} 4
> >> "optimized" } } */
> >> +/* { dg-final { scan-tree-dump-times { = [^\n]*\ > [^\n]*;} 4
> >> "optimized" } } */
> >> +/* { dg-final { scan-tree-dump-not {=[^\n]*\ < [^\n]*;} "optimized" } } */
> >> +/* { dg-final { scan-tree-dump-times { \+ 15} 4 "optimized" } } */
> >> +/* { dg-final { scan-tree-dump-times { > 30} 4 "optimized" } } */
> >> Index: gcc/testsuite/gcc.dg/pointer-range-check-2.c
> >> ===================================================================
> >> --- /dev/null   2018-07-09 14:52:09.234750850 +0100
> >> +++ gcc/testsuite/gcc.dg/pointer-range-check-2.c 2018-07-23
> >> 15:58:33.480269844 +0100
> >> @@ -0,0 +1,31 @@
> >> +/* { dg-do compile } */
> >> +/* { dg-options "-O2 -fno-ipa-icf -fwrapv-pointer
> >> -fdump-tree-optimized" } */
> >> +
> >> +_Bool
> >> +f1 (char *a, char *b)
> >> +{
> >> +  return (a + 16 <= b) || (b + 16 <= a);
> >> +}
> >> +
> >> +_Bool
> >> +f2 (char *a, char *b)
> >> +{
> >> +  return (a + 15 < b) || (b + 15 < a);
> >> +}
> >> +
> >> +_Bool
> >> +f3 (char *a, char *b)
> >> +{
> >> +  return (a + 16 <= b) || (b + 16 <= a);
> >> +}
> >> +
> >> +_Bool
> >> +f4 (char *a, char *b)
> >> +{
> >> +  return (a + 15 < b) || (b + 15 < a);
> >> +}
> >> +
> >> +/* { dg-final { scan-tree-dump-not { = [^\n]* - [^\n]*;} "optimized" } } */
> >> +/* { dg-final { scan-tree-dump-times { = [^\n]* \+ [^\n]*;} 8
> >> "optimized" } } */
> >> +/* { dg-final { scan-tree-dump-times { \+ 15} 4 "optimized" } } */
> >> +/* { dg-final { scan-tree-dump-times { \+ 16} 4 "optimized" } } */
Marc Glisse July 31, 2018, 8:53 p.m. UTC | #6
On Tue, 31 Jul 2018, Richard Biener wrote:

>>> Also, when @2 == @0 + (@1+1) then the original condition is true but
>>> ((sizetype) @0 - (sizetype) @2 + @1) > (@1 * 2) is not?
>>>    (sizetype) @0 - (sizetype) (@0 + @1 + 1) + @1 > @1 * 2
>>> -> -1 > @1 * 2
>>>
>>> which is false.  So I can't really see how you can apply this transform in
>>> general (it would be fine for generating alias checks but a bit more
>>> pessimistic).
>>>
>>> But maybe I am missing something?
>>
>> It relies on sizetype being unsigned: (sizetype)-1 > @1 * 2 is true.
>
> Hmm, so mathematically this is
>
>  (@0 - @2) % modreduce + @1 > @1 * 2
>
> then, but I don't want to think too much about this since Marc didn't
> object here ;)

We already transform abs(x)<=3 into (unsigned)x+3u<=6u, that's the usual
way we do range checking so I didn't pay much attention to that part.
(tempted to say: "I didn't want to think too much about this since
Richard was going to do it anyway" ;-)

Turning multiple comparisons of the form P + cst CMP Q + cst into a
range check on P - Q sounds good (we don't really have to restrict to
the case where the range is symmetric). Actually, just turning P + cst
CMP Q + cst into P - Q CMP cst should do it, we should already have code
to handle range checking on integers (modulo the issue of CSE P-Q and
Q-P). But I don't know if a couple :s is sufficient to make this
transformation a good canonicalization.

If we start from a comparison of pointer_plus, I think it would make
sense to use pointer_diff.

I believe currently we try to use pointer operations (pointer_plus,
pointer_diff, lt) only for related pointers and we cast to some integer
type for wilder cases (implementation of std::less in C++ for instance).
On the other hand, in an alias check, the 2 pointers are possibly
unrelated, so maybe the code emitted for an alias check should be
changed.
Richard Sandiford Aug. 1, 2018, 10:25 a.m. UTC | #7
Richard Biener <richard.guenther@gmail.com> writes:
> On Mon, Jul 30, 2018 at 7:47 PM Richard Sandiford
> <richard.sandiford@arm.com> wrote:
>>
>> [Sorry, somehow missed this till now]
>>
>> Richard Biener <richard.guenther@gmail.com> writes:
>> > On Mon, Jul 23, 2018 at 5:05 PM Richard Sandiford
>> > <richard.sandiford@arm.com> wrote:
>> >>
>> >> Marc Glisse <marc.glisse@inria.fr> writes:
>> >> > On Fri, 20 Jul 2018, Richard Sandiford wrote:
>> >> >
>> >> >> --- gcc/match.pd     2018-07-18 18:44:22.565914281 +0100
>> >> >> +++ gcc/match.pd     2018-07-20 11:24:33.692045585 +0100
>> >> >> @@ -4924,3 +4924,37 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>> >> >>    (if (inverse_conditions_p (@0, @2)
>> >> >>         && element_precision (type) == element_precision (op_type))
>> >> >>     (view_convert (cond_op @2 @3 @4 @5 (view_convert:op_type @1)))))))
>> >> >> +
>> >> >> +/* For pointers @0 and @2 and unsigned constant offset @1, look for
>> >> >> +   expressions like:
>> >> >> +
>> >> >> +   A: (@0 + @1 < @2) | (@2 + @1 < @0)
>> >> >> +   B: (@0 + @1 <= @2) | (@2 + @1 <= @0)
>> >> >> +
>> >> >> +   If pointers are known not to wrap, B checks whether @1 bytes starting at
>> >> >> +   @0 and @2 do not overlap, while A tests the same thing for @1 + 1 bytes.
>> >> >> +   A is more efficiently tested as:
>> >> >> +
>> >> >> +   ((sizetype) @0 - (sizetype) @2 + @1) > (@1 * 2)
>> >> >> +
>> >> >> +   as long as @1 * 2 doesn't overflow.  B is the same with @1 replaced
>> >> >> +   with @1 - 1.  */
>> >> >> +(for ior (truth_orif truth_or bit_ior)
>> >> >> + (for cmp (le lt)
>> >> >> +  (simplify
>> >> >> +   (ior (cmp (pointer_plus:s @0 INTEGER_CST@1) @2)
>> >> >> +    (cmp (pointer_plus:s @2 @1) @0))
>> >> >
>> >> > Do you want :c on cmp, in case it appears as @2 > @0 + @1 ? (may
>> >> > need some
>> >> > care with "cmp == LE_EXPR" below)
>> >> > Do you want :s on cmp as well?
>> >> >
>> >> >> +   (if (!flag_trapv && !TYPE_OVERFLOW_WRAPS (TREE_TYPE (@0)))
>> >> >
>> >> > Don't you want TYPE_OVERFLOW_UNDEFINED?
>> >>
>> >> Thanks, fixed below.  Think the cmp == LE_EXPR stuff is still ok with :c,
>> >> since the generated code sets cmp to LE_EXPR when matching GE_EXPR.
>> >>
>> >> Tested as before.  OK to install?
>> >>
>> >> Richard
>> >>
>> >>
>> >> 2018-07-23  Richard Sandiford  <richard.sandiford@arm.com>
>> >>
>> >> gcc/
>> >>         * match.pd: Optimise pointer range checks.
>> >>
>> >> gcc/testsuite/
>> >>         * gcc.dg/pointer-range-check-1.c: New test.
>> >>         * gcc.dg/pointer-range-check-2.c: Likewise.
>> >>
>> >> Index: gcc/match.pd
>> >> ===================================================================
>> >> --- gcc/match.pd        2018-07-23 15:56:47.000000000 +0100
>> >> +++ gcc/match.pd        2018-07-23 15:58:33.480269844 +0100
>> >> @@ -4924,3 +4924,37 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>> >>     (if (inverse_conditions_p (@0, @2)
>> >>          && element_precision (type) == element_precision (op_type))
>> >>      (view_convert (cond_op @2 @3 @4 @5 (view_convert:op_type @1)))))))
>> >> +
>> >> +/* For pointers @0 and @2 and unsigned constant offset @1, look for
>> >> +   expressions like:
>> >> +
>> >> +   A: (@0 + @1 < @2) | (@2 + @1 < @0)
>> >> +   B: (@0 + @1 <= @2) | (@2 + @1 <= @0)
>> >> +
>> >> +   If pointers are known not to wrap, B checks whether @1 bytes starting at
>> >> +   @0 and @2 do not overlap, while A tests the same thing for @1 + 1 bytes.
>> >> +   A is more efficiently tested as:
>> >> +
>> >> +   ((sizetype) @0 - (sizetype) @2 + @1) > (@1 * 2)
>> >> +
>> >> +   as long as @1 * 2 doesn't overflow.  B is the same with @1 replaced
>> >> +   with @1 - 1.  */
>> >> +(for ior (truth_orif truth_or bit_ior)
>> >> + (for cmp (le lt)
>> >> +  (simplify
>> >> +   (ior (cmp:cs (pointer_plus:s @0 INTEGER_CST@1) @2)
>> >> +       (cmp:cs (pointer_plus:s @2 @1) @0))
>> >> +   (if (!flag_trapv && TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (@0)))
>> >
>> > no need to check flag_trapv, pointer arithmetic is not covered by -ftrapv.
>>
>> This was there because we're converting pointer arithmetic into integer
>> arithmetic, and -ftrapv could cause that new integer code to trap.
>> TYPE_OVERFLOW_UNDEFINED says that pointer overflow is undefined,
>> but it seemed bad form to make it actually trap, especially when the
>> above does some reassociation too.
>
> Ok, but then maybe check TYPE_OVERFLOW_WRAPS () on the type
> you are doing the arithmetic on (sizetype) instead?

OK, done below.

>> >> +    /* Convert the B form to the A form.  */
>> >> + (with { offset_int off = wi::to_offset (@1) - (cmp == LE_EXPR ? 1 :
>> >> 0); }
>> >> +     /* Always fails for negative values.  */
>> >> + (if (wi::min_precision (off, UNSIGNED) * 2 <= TYPE_PRECISION
>> >> (sizetype))
>> >> +      /* It doesn't matter whether we use @2 - @0 or @0 - @2, so let
>> >> +        tree_swap_operands_p pick a canonical order.  */
>> >
>> > gimple_resimplify takes care of that - well, it doesn't since minus isn't
>> > commutative...  I guess you get better CSE later when doing this thus ok,
>> > but it does look a bit off here ;)
>> >
>> > I think you shouldn't use 'sizetype' without guarding this whole thing
>> > with TYPE_PRECISION (sizetype) == TYPE_PRECISION (TREE_TYPE (@0)).
>>
>> OK, hadn't realised that could be false.  Would building the appropriate
>> unsigned type be OK without the condition, or does it need to be sizetype?
>
> That would probably work but it might be dead slow on those targets that choose
> to have a lower-precision sizetype (ISTR some have 24bit pointers but 16bit
> sizetype because there's no arithmetic for 24bit but in offsetted addressing).
> So the transform is likely not a win for those.

Ah, yeah.  The version below adds the == test.

>> > Since the original IL performs an ordered compare of two eventually
>> > unrelated
>> > pointers (or pointers adjusted in a way that crosses object
>> > boundaries) (uh... ;))
>> > I wonder if you can use POINTER_DIFF_EXPR here to avoid the sizetype
>> > conversion?  Since POINTER_PLUS_EXPR offsets are supposed to be
>> > interpreted "signed" that should also handle negative offsets
>> > correctly then?
>>
>> The transform isn't valid when @1 is negative though, so I think we
>> need that check either way.
>
> +     /* Always fails for negative values.  */
> +     (if (wi::min_precision (off, UNSIGNED) * 2 <= TYPE_PRECISION (sizetype))
>
> shouldn't that be wi::min_precision (off * 2, UNSIGNED) <=
> TYPE_PRECISION (sizetype)?

Gah, yes :-(

>> I could use POINTER_DIFF_EXPR and convert to unsigned though, if that
>> seems better.  That might also allow us to CSE the retained
>> POINTER_PLUS_EXPRs.
>
> Yeah, that sounds better then.

And the CSEing did happen, so we now get the "ideal" sequence for the
original alias checks:

	add     x0, x0, 15
	sub     x6, x0, x1
	sub     x5, x0, x2
	cmp     x6, 30
	ccmp    x5, 30, 0, hi

whereas the original patch had an extra add.

>> > Also, when @2 == @0 + (@1+1) then the original condition is true but
>> > ((sizetype) @0 - (sizetype) @2 + @1) > (@1 * 2) is not?
>> >    (sizetype) @0 - (sizetype) (@0 + @1 + 1) + @1 > @1 * 2
>> > -> -1 > @1 * 2
>> >
>> > which is false.  So I can't really see how you can apply this transform in
>> > general (it would be fine for generating alias checks but a bit more
>> > pessimistic).
>> >
>> > But maybe I am missing something?
>>
>> It relies on sizetype being unsigned: (sizetype)-1 > @1 * 2 is true.
>
> Hmm, so mathematically this is
>
>   (@0 - @2) % modreduce + @1 > @1 * 2

Probably not % if you operating over Z, since if @0 - @2 = -@1 * 2
(and -@1 * 2 % modreduce == -@1 * 2), the condition above will be false
while we want it to be true.  But...

> then, but I don't want to think too much about this since Marc didn't
> object here ;)

...we start out converting the range check for A to:

   !(-@1 <= @0 - @2 <= @1)

i.e. @1 + 1 bytes at @0 and @2 are free from overlap.  Then we just
apply the usual trick:

   A <= X <= B -> (unsigned) (X - A) <= (unsigned) (B - A) when A <= B

Substituting @0 - @2 for X, -@1 for A and @1 for B gives:

   !((unsigned) (@0 - @2 + @1) <= (unsigned) (@1 * 2))
->   (unsigned) (@0 - @2 + @1) >  (unsigned) (@1 * 2)

Then the B case is just an off-by-one variant.

Patch tested as before.

Thanks,
Richard


2018-08-01  Richard Sandiford  <richard.sandiford@arm.com>

gcc/
	* match.pd: Optimise pointer range checks.

gcc/testsuite/
	* gcc.dg/pointer-range-check-1.c: New test.
	* gcc.dg/pointer-range-check-2.c: Likewise.

Index: gcc/match.pd
===================================================================
--- gcc/match.pd	2018-07-24 19:04:14.985363724 +0100
+++ gcc/match.pd	2018-08-01 10:55:10.695766411 +0100
@@ -4937,3 +4937,63 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
    (if (inverse_conditions_p (@0, @2)
         && element_precision (type) == element_precision (op_type))
     (view_convert (cond_op @2 @3 @4 @5 (view_convert:op_type @1)))))))
+
+/* For pointers @0 and @2 and nonnegative constant offset @1, look for
+   expressions like:
+
+   A: (@0 + @1 < @2) | (@2 + @1 < @0)
+   B: (@0 + @1 <= @2) | (@2 + @1 <= @0)
+
+   If pointers are known not to wrap, B checks whether @1 bytes starting
+   at @0 and @2 do not overlap, while A tests the same thing for @1 + 1
+   bytes.  A is more efficiently tested as:
+
+   A: (sizetype) (@0 + @1 - @2) > @1 * 2
+
+   The equivalent expression for B is given by replacing @1 with @1 - 1:
+
+   B: (sizetype) (@0 + (@1 - 1) - @2) > (@1 - 1) * 2
+
+   @0 and @2 can be swapped in both expressions without changing the result.
+
+   The folds rely on sizetype's being unsigned (which is always true)
+   and on its being the same width as the pointer (which we have to check).
+
+   The fold replaces two pointer_plus expressions, two comparisons and
+   an IOR with a pointer_plus, a pointer_diff, and a comparison, so in
+   the best case it's a saving of two operations.  The A fold retains one
+   of the original pointer_pluses, so is a win even if both pointer_pluses
+   are used elsewhere.  The B fold is a wash if both pointer_pluses are
+   used elsewhere, since all we end up doing is replacing a comparison with
+   a pointer_plus.  We do still apply the fold under those circumstances
+   though, in case applying it to other conditions eventually makes one of the
+   pointer_pluses dead.  */
+(for ior (truth_orif truth_or bit_ior)
+ (for cmp (le lt)
+  (simplify
+   (ior (cmp:cs (pointer_plus@3 @0 INTEGER_CST@1) @2)
+	(cmp:cs (pointer_plus@4 @2 @1) @0))
+   (if (TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (@0))
+	&& TYPE_OVERFLOW_WRAPS (sizetype)
+	&& TYPE_PRECISION (TREE_TYPE (@0)) == TYPE_PRECISION (sizetype))
+    /* Calculate the rhs constant.  */
+    (with { offset_int off = wi::to_offset (@1) - (cmp == LE_EXPR ? 1 : 0);
+	    offset_int rhs = off * 2; }
+     /* Always fails for negative values.  */
+     (if (wi::min_precision (rhs, UNSIGNED) <= TYPE_PRECISION (sizetype))
+      /* Since the order of @0 and @2 doesn't matter, let tree_swap_operands_p
+	 pick a canonical order.  This increases the chances of using the
+	 same pointer_plus in multiple checks.  */
+      (with { bool swap_p = tree_swap_operands_p (@0, @2);
+	      tree rhs_tree = wide_int_to_tree (sizetype, rhs); }
+       (if (cmp == LT_EXPR)
+	(gt (convert:sizetype
+	     (pointer_diff:ssizetype { swap_p ? @4 : @3; }
+				     { swap_p ? @0 : @2; }))
+	    { rhs_tree; })
+	(gt (convert:sizetype
+	     (pointer_diff:ssizetype
+	      (pointer_plus { swap_p ? @2 : @0; }
+			    { wide_int_to_tree (sizetype, off); })
+	      { swap_p ? @0 : @2; }))
+	    { rhs_tree; })))))))))
Index: gcc/testsuite/gcc.dg/pointer-range-check-1.c
===================================================================
--- /dev/null	2018-07-26 10:26:13.137955424 +0100
+++ gcc/testsuite/gcc.dg/pointer-range-check-1.c	2018-08-01 10:55:10.695766411 +0100
@@ -0,0 +1,37 @@
+/* { dg-do compile { target { ilp32 || lp64 } } } */
+/* { dg-options "-O2 -fno-ipa-icf -fdump-tree-optimized" } */
+
+/* All four functions should be folded to:
+
+   (sizetype) (a + 15 - b) < 30.  */
+
+_Bool
+f1 (char *a, char *b)
+{
+  return (a + 16 <= b) || (b + 16 <= a);
+}
+
+_Bool
+f2 (char *a, char *b)
+{
+  return (a + 15 < b) || (b + 15 < a);
+}
+
+_Bool
+f3 (char *a, char *b)
+{
+  return (a + 16 <= b) | (b + 16 <= a);
+}
+
+_Bool
+f4 (char *a, char *b)
+{
+  return (a + 15 < b) | (b + 15 < a);
+}
+
+/* { dg-final { scan-tree-dump-times { = [^\n]* - [^\n]*;} 4 "optimized" } } */
+/* { dg-final { scan-tree-dump-times { = [^\n]* \+ [^\n]*;} 4 "optimized" } } */
+/* { dg-final { scan-tree-dump-times { = [^\n]*\ > [^\n]*;} 4 "optimized" } } */
+/* { dg-final { scan-tree-dump-not {=[^\n]*\ < [^\n]*;} "optimized" } } */
+/* { dg-final { scan-tree-dump-times { \+ 15} 4 "optimized" } } */
+/* { dg-final { scan-tree-dump-times { > 30} 4 "optimized" } } */
Index: gcc/testsuite/gcc.dg/pointer-range-check-2.c
===================================================================
--- /dev/null	2018-07-26 10:26:13.137955424 +0100
+++ gcc/testsuite/gcc.dg/pointer-range-check-2.c	2018-08-01 10:55:10.695766411 +0100
@@ -0,0 +1,31 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fno-ipa-icf -fwrapv-pointer -fdump-tree-optimized" } */
+
+_Bool
+f1 (char *a, char *b)
+{
+  return (a + 16 <= b) || (b + 16 <= a);
+}
+
+_Bool
+f2 (char *a, char *b)
+{
+  return (a + 15 < b) || (b + 15 < a);
+}
+
+_Bool
+f3 (char *a, char *b)
+{
+  return (a + 16 <= b) | (b + 16 <= a);
+}
+
+_Bool
+f4 (char *a, char *b)
+{
+  return (a + 15 < b) | (b + 15 < a);
+}
+
+/* { dg-final { scan-tree-dump-not { = [^\n]* - [^\n]*;} "optimized" } } */
+/* { dg-final { scan-tree-dump-times { = [^\n]* \+ [^\n]*;} 8 "optimized" } } */
+/* { dg-final { scan-tree-dump-times { \+ 15} 4 "optimized" } } */
+/* { dg-final { scan-tree-dump-times { \+ 16} 4 "optimized" } } */
Richard Sandiford Aug. 1, 2018, 10:59 a.m. UTC | #8
Marc Glisse <marc.glisse@inria.fr> writes:
> On Tue, 31 Jul 2018, Richard Biener wrote:
>
>>>> Also, when @2 == @0 + (@1+1) then the original condition is true but
>>>> ((sizetype) @0 - (sizetype) @2 + @1) > (@1 * 2) is not?
>>>>    (sizetype) @0 - (sizetype) (@0 + @1 + 1) + @1 > @1 * 2
>>>> -> -1 > @1 * 2
>>>>
>>>> which is false.  So I can't really see how you can apply this transform in
>>>> general (it would be fine for generating alias checks but a bit more
>>>> pessimistic).
>>>>
>>>> But maybe I am missing something?
>>>
>>> It relies on sizetype being unsigned: (sizetype)-1 > @1 * 2 is true.
>>
>> Hmm, so mathematically this is
>>
>>  (@0 - @2) % modreduce + @1 > @1 * 2
>>
>> then, but I don't want to think too much about this since Marc didn't
>> object here ;)
>
> We already transform abs(x)<=3 into (unsigned)x+3u<=6u, that's the usual
> way we do range checking so I didn't pay much attention to that part.
> (tempted to say: "I didn't want to think too much about this since
> Richard was going to do it anyway" ;-)
>
> Turning multiple comparisons of the form P + cst CMP Q + cst into a
> range check on P - Q sounds good (we don't really have to restrict to
> the case where the range is symmetric). Actually, just turning P + cst
> CMP Q + cst into P - Q CMP cst should do it, we should already have code
> to handle range checking on integers (modulo the issue of CSE P-Q and
> Q-P). But I don't know if a couple :s is sufficient to make this
> transformation a good canonicalization.

Yeah.  Like you say, in the cases being handled by the patch, folding the
two comparisons separately and then folding the IOR would require either

(a) folding:
       P + cst < Q
    to the rather unnatural-looking:
       Q - P > -cst
    when tree_swap_operands_p (P, Q) or

(b) making the range fold handle reversed pointer_diffs (which I guess
    makes more sense).

But the fold in the patch should be either a strict win or a wash even
without :s on the pointer_pluses (only realised that when doing today's
respin) whereas this canonicalisation really would need the :s.
So I think doing the fold on the IOR makes sense.

> If we start from a comparison of pointer_plus, I think it would make
> sense to use pointer_diff.
>
> I believe currently we try to use pointer operations (pointer_plus,
> pointer_diff, lt) only for related pointers and we cast to some integer
> type for wilder cases (implementation of std::less in C++ for instance).
> On the other hand, in an alias check, the 2 pointers are possibly
> unrelated, so maybe the code emitted for an alias check should be
> changed.

If we can prove that the pointers are to different objects, it would
be valid to fold the check to "no alias", regardless of the constant
(although in practice we should have weeded out those cases before
generating the check).  In that sense, relying on the UBness of
comparing pointers to different objects would be fine.  If there's a
risk of the check being folded to "alias" when the pointers are known
to point to different objects then that would be more of a problem
(as a missed optimisation).

Thanks,
Richard
Richard Biener Aug. 1, 2018, 11:58 a.m. UTC | #9
On Wed, Aug 1, 2018 at 12:25 PM Richard Sandiford
<richard.sandiford@arm.com> wrote:
>
> Richard Biener <richard.guenther@gmail.com> writes:
> > On Mon, Jul 30, 2018 at 7:47 PM Richard Sandiford
> > <richard.sandiford@arm.com> wrote:
> >>
> >> [Sorry, somehow missed this till now]
> >>
> >> Richard Biener <richard.guenther@gmail.com> writes:
> >> > On Mon, Jul 23, 2018 at 5:05 PM Richard Sandiford
> >> > <richard.sandiford@arm.com> wrote:
> >> >>
> >> >> Marc Glisse <marc.glisse@inria.fr> writes:
> >> >> > On Fri, 20 Jul 2018, Richard Sandiford wrote:
> >> >> >
> >> >> >> --- gcc/match.pd     2018-07-18 18:44:22.565914281 +0100
> >> >> >> +++ gcc/match.pd     2018-07-20 11:24:33.692045585 +0100
> >> >> >> @@ -4924,3 +4924,37 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> >> >> >>    (if (inverse_conditions_p (@0, @2)
> >> >> >>         && element_precision (type) == element_precision (op_type))
> >> >> >>     (view_convert (cond_op @2 @3 @4 @5 (view_convert:op_type @1)))))))
> >> >> >> +
> >> >> >> +/* For pointers @0 and @2 and unsigned constant offset @1, look for
> >> >> >> +   expressions like:
> >> >> >> +
> >> >> >> +   A: (@0 + @1 < @2) | (@2 + @1 < @0)
> >> >> >> +   B: (@0 + @1 <= @2) | (@2 + @1 <= @0)
> >> >> >> +
> >> >> >> +   If pointers are known not to wrap, B checks whether @1 bytes starting at
> >> >> >> +   @0 and @2 do not overlap, while A tests the same thing for @1 + 1 bytes.
> >> >> >> +   A is more efficiently tested as:
> >> >> >> +
> >> >> >> +   ((sizetype) @0 - (sizetype) @2 + @1) > (@1 * 2)
> >> >> >> +
> >> >> >> +   as long as @1 * 2 doesn't overflow.  B is the same with @1 replaced
> >> >> >> +   with @1 - 1.  */
> >> >> >> +(for ior (truth_orif truth_or bit_ior)
> >> >> >> + (for cmp (le lt)
> >> >> >> +  (simplify
> >> >> >> +   (ior (cmp (pointer_plus:s @0 INTEGER_CST@1) @2)
> >> >> >> +    (cmp (pointer_plus:s @2 @1) @0))
> >> >> >
> >> >> > Do you want :c on cmp, in case it appears as @2 > @0 + @1 ? (may
> >> >> > need some
> >> >> > care with "cmp == LE_EXPR" below)
> >> >> > Do you want :s on cmp as well?
> >> >> >
> >> >> >> +   (if (!flag_trapv && !TYPE_OVERFLOW_WRAPS (TREE_TYPE (@0)))
> >> >> >
> >> >> > Don't you want TYPE_OVERFLOW_UNDEFINED?
> >> >>
> >> >> Thanks, fixed below.  Think the cmp == LE_EXPR stuff is still ok with :c,
> >> >> since the generated code sets cmp to LE_EXPR when matching GE_EXPR.
> >> >>
> >> >> Tested as before.  OK to install?
> >> >>
> >> >> Richard
> >> >>
> >> >>
> >> >> 2018-07-23  Richard Sandiford  <richard.sandiford@arm.com>
> >> >>
> >> >> gcc/
> >> >>         * match.pd: Optimise pointer range checks.
> >> >>
> >> >> gcc/testsuite/
> >> >>         * gcc.dg/pointer-range-check-1.c: New test.
> >> >>         * gcc.dg/pointer-range-check-2.c: Likewise.
> >> >>
> >> >> Index: gcc/match.pd
> >> >> ===================================================================
> >> >> --- gcc/match.pd        2018-07-23 15:56:47.000000000 +0100
> >> >> +++ gcc/match.pd        2018-07-23 15:58:33.480269844 +0100
> >> >> @@ -4924,3 +4924,37 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> >> >>     (if (inverse_conditions_p (@0, @2)
> >> >>          && element_precision (type) == element_precision (op_type))
> >> >>      (view_convert (cond_op @2 @3 @4 @5 (view_convert:op_type @1)))))))
> >> >> +
> >> >> +/* For pointers @0 and @2 and unsigned constant offset @1, look for
> >> >> +   expressions like:
> >> >> +
> >> >> +   A: (@0 + @1 < @2) | (@2 + @1 < @0)
> >> >> +   B: (@0 + @1 <= @2) | (@2 + @1 <= @0)
> >> >> +
> >> >> +   If pointers are known not to wrap, B checks whether @1 bytes starting at
> >> >> +   @0 and @2 do not overlap, while A tests the same thing for @1 + 1 bytes.
> >> >> +   A is more efficiently tested as:
> >> >> +
> >> >> +   ((sizetype) @0 - (sizetype) @2 + @1) > (@1 * 2)
> >> >> +
> >> >> +   as long as @1 * 2 doesn't overflow.  B is the same with @1 replaced
> >> >> +   with @1 - 1.  */
> >> >> +(for ior (truth_orif truth_or bit_ior)
> >> >> + (for cmp (le lt)
> >> >> +  (simplify
> >> >> +   (ior (cmp:cs (pointer_plus:s @0 INTEGER_CST@1) @2)
> >> >> +       (cmp:cs (pointer_plus:s @2 @1) @0))
> >> >> +   (if (!flag_trapv && TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (@0)))
> >> >
> >> > no need to check flag_trapv, pointer arithmetic is not covered by -ftrapv.
> >>
> >> This was there because we're converting pointer arithmetic into integer
> >> arithmetic, and -ftrapv could cause that new integer code to trap.
> >> TYPE_OVERFLOW_UNDEFINED says that pointer overflow is undefined,
> >> but it seemed bad form to make it actually trap, especially when the
> >> above does some reassociation too.
> >
> > Ok, but then maybe check TYPE_OVERFLOW_WRAPS () on the type
> > you are doing the arithmetic on (sizetype) instead?
>
> OK, done below.
>
> >> >> +    /* Convert the B form to the A form.  */
> >> >> + (with { offset_int off = wi::to_offset (@1) - (cmp == LE_EXPR ? 1 :
> >> >> 0); }
> >> >> +     /* Always fails for negative values.  */
> >> >> + (if (wi::min_precision (off, UNSIGNED) * 2 <= TYPE_PRECISION
> >> >> (sizetype))
> >> >> +      /* It doesn't matter whether we use @2 - @0 or @0 - @2, so let
> >> >> +        tree_swap_operands_p pick a canonical order.  */
> >> >
> >> > gimple_resimplify takes care of that - well, it doesn't since minus isn't
> >> > commutative...  I guess you get better CSE later when doing this thus ok,
> >> > but it does look a bit off here ;)
> >> >
> >> > I think you shouldn't use 'sizetype' without guarding this whole thing
> >> > with TYPE_PRECISION (sizetype) == TYPE_PRECISION (TREE_TYPE (@0)).
> >>
> >> OK, hadn't realised that could be false.  Would building the appropriate
> >> unsigned type be OK without the condition, or does it need to be sizetype?
> >
> > That would probably work but it might be dead slow on those targets that choose
> > to have a lower-precision sizetype (ISTR some have 24bit pointers but 16bit
> > sizetype because there's no arithmetic for 24bit but in offsetted addressing).
> > So the transform is likely not a win for those.
>
> Ah, yeah.  The version below adds the == test.
>
> >> > Since the original IL performs an ordered compare of two eventually
> >> > unrelated
> >> > pointers (or pointers adjusted in a way that crosses object
> >> > boundaries) (uh... ;))
> >> > I wonder if you can use POINTER_DIFF_EXPR here to avoid the sizetype
> >> > conversion?  Since POINTER_PLUS_EXPR offsets are supposed to be
> >> > interpreted "signed" that should also handle negative offsets
> >> > correctly then?
> >>
> >> The transform isn't valid when @1 is negative though, so I think we
> >> need that check either way.
> >
> > +     /* Always fails for negative values.  */
> > +     (if (wi::min_precision (off, UNSIGNED) * 2 <= TYPE_PRECISION (sizetype))
> >
> > shouldn't that be wi::min_precision (off * 2, UNSIGNED) <=
> > TYPE_PRECISION (sizetype)?
>
> Gah, yes :-(
>
> >> I could use POINTER_DIFF_EXPR and convert to unsigned though, if that
> >> seems better.  That might also allow us to CSE the retained
> >> POINTER_PLUS_EXPRs.
> >
> > Yeah, that sounds better then.
>
> And the CSEing did happen, so we now get the "ideal" sequence for the
> original alias checks:
>
>         add     x0, x0, 15
>         sub     x6, x0, x1
>         sub     x5, x0, x2
>         cmp     x6, 30
>         ccmp    x5, 30, 0, hi
>
> whereas the original patch had an extra add.
>
> >> > Also, when @2 == @0 + (@1+1) then the original condition is true but
> >> > ((sizetype) @0 - (sizetype) @2 + @1) > (@1 * 2) is not?
> >> >    (sizetype) @0 - (sizetype) (@0 + @1 + 1) + @1 > @1 * 2
> >> > -> -1 > @1 * 2
> >> >
> >> > which is false.  So I can't really see how you can apply this transform in
> >> > general (it would be fine for generating alias checks but a bit more
> >> > pessimistic).
> >> >
> >> > But maybe I am missing something?
> >>
> >> It relies on sizetype being unsigned: (sizetype)-1 > @1 * 2 is true.
> >
> > Hmm, so mathematically this is
> >
> >   (@0 - @2) % modreduce + @1 > @1 * 2
>
> Probably not % if you operating over Z, since if @0 - @2 = -@1 * 2
> (and -@1 * 2 % modreduce == -@1 * 2), the condition above will be false
> while we want it to be true.  But...
>
> > then, but I don't want to think too much about this since Marc didn't
> > object here ;)
>
> ...we start out converting the range check for A to:
>
>    !(-@1 <= @0 - @2 <= @1)
>
> i.e. @1 + 1 bytes at @0 and @2 are free from overlap.  Then we just
> apply the usual trick:
>
>    A <= X <= B -> (unsigned) (X - A) <= (unsigned) (B - A) when A <= B
>
> Substituting @0 - @2 for X, -@1 for A and @1 for B gives:
>
>    !((unsigned) (@0 - @2 + @1) <= (unsigned) (@1 * 2))
> ->   (unsigned) (@0 - @2 + @1) >  (unsigned) (@1 * 2)
>
> Then the B case is just an off-by-one variant.
>
> Patch tested as before.

OK.

Thanks,
Richard.

> Thanks,
> Richard
>
>
> 2018-08-01  Richard Sandiford  <richard.sandiford@arm.com>
>
> gcc/
>         * match.pd: Optimise pointer range checks.
>
> gcc/testsuite/
>         * gcc.dg/pointer-range-check-1.c: New test.
>         * gcc.dg/pointer-range-check-2.c: Likewise.
>
> Index: gcc/match.pd
> ===================================================================
> --- gcc/match.pd        2018-07-24 19:04:14.985363724 +0100
> +++ gcc/match.pd        2018-08-01 10:55:10.695766411 +0100
> @@ -4937,3 +4937,63 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>     (if (inverse_conditions_p (@0, @2)
>          && element_precision (type) == element_precision (op_type))
>      (view_convert (cond_op @2 @3 @4 @5 (view_convert:op_type @1)))))))
> +
> +/* For pointers @0 and @2 and nonnegative constant offset @1, look for
> +   expressions like:
> +
> +   A: (@0 + @1 < @2) | (@2 + @1 < @0)
> +   B: (@0 + @1 <= @2) | (@2 + @1 <= @0)
> +
> +   If pointers are known not to wrap, B checks whether @1 bytes starting
> +   at @0 and @2 do not overlap, while A tests the same thing for @1 + 1
> +   bytes.  A is more efficiently tested as:
> +
> +   A: (sizetype) (@0 + @1 - @2) > @1 * 2
> +
> +   The equivalent expression for B is given by replacing @1 with @1 - 1:
> +
> +   B: (sizetype) (@0 + (@1 - 1) - @2) > (@1 - 1) * 2
> +
> +   @0 and @2 can be swapped in both expressions without changing the result.
> +
> +   The folds rely on sizetype's being unsigned (which is always true)
> +   and on its being the same width as the pointer (which we have to check).
> +
> +   The fold replaces two pointer_plus expressions, two comparisons and
> +   an IOR with a pointer_plus, a pointer_diff, and a comparison, so in
> +   the best case it's a saving of two operations.  The A fold retains one
> +   of the original pointer_pluses, so is a win even if both pointer_pluses
> +   are used elsewhere.  The B fold is a wash if both pointer_pluses are
> +   used elsewhere, since all we end up doing is replacing a comparison with
> +   a pointer_plus.  We do still apply the fold under those circumstances
> +   though, in case applying it to other conditions eventually makes one of the
> +   pointer_pluses dead.  */
> +(for ior (truth_orif truth_or bit_ior)
> + (for cmp (le lt)
> +  (simplify
> +   (ior (cmp:cs (pointer_plus@3 @0 INTEGER_CST@1) @2)
> +       (cmp:cs (pointer_plus@4 @2 @1) @0))
> +   (if (TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (@0))
> +       && TYPE_OVERFLOW_WRAPS (sizetype)
> +       && TYPE_PRECISION (TREE_TYPE (@0)) == TYPE_PRECISION (sizetype))
> +    /* Calculate the rhs constant.  */
> +    (with { offset_int off = wi::to_offset (@1) - (cmp == LE_EXPR ? 1 : 0);
> +           offset_int rhs = off * 2; }
> +     /* Always fails for negative values.  */
> +     (if (wi::min_precision (rhs, UNSIGNED) <= TYPE_PRECISION (sizetype))
> +      /* Since the order of @0 and @2 doesn't matter, let tree_swap_operands_p
> +        pick a canonical order.  This increases the chances of using the
> +        same pointer_plus in multiple checks.  */
> +      (with { bool swap_p = tree_swap_operands_p (@0, @2);
> +             tree rhs_tree = wide_int_to_tree (sizetype, rhs); }
> +       (if (cmp == LT_EXPR)
> +       (gt (convert:sizetype
> +            (pointer_diff:ssizetype { swap_p ? @4 : @3; }
> +                                    { swap_p ? @0 : @2; }))
> +           { rhs_tree; })
> +       (gt (convert:sizetype
> +            (pointer_diff:ssizetype
> +             (pointer_plus { swap_p ? @2 : @0; }
> +                           { wide_int_to_tree (sizetype, off); })
> +             { swap_p ? @0 : @2; }))
> +           { rhs_tree; })))))))))
> Index: gcc/testsuite/gcc.dg/pointer-range-check-1.c
> ===================================================================
> --- /dev/null   2018-07-26 10:26:13.137955424 +0100
> +++ gcc/testsuite/gcc.dg/pointer-range-check-1.c        2018-08-01 10:55:10.695766411 +0100
> @@ -0,0 +1,37 @@
> +/* { dg-do compile { target { ilp32 || lp64 } } } */
> +/* { dg-options "-O2 -fno-ipa-icf -fdump-tree-optimized" } */
> +
> +/* All four functions should be folded to:
> +
> +   (sizetype) (a + 15 - b) < 30.  */
> +
> +_Bool
> +f1 (char *a, char *b)
> +{
> +  return (a + 16 <= b) || (b + 16 <= a);
> +}
> +
> +_Bool
> +f2 (char *a, char *b)
> +{
> +  return (a + 15 < b) || (b + 15 < a);
> +}
> +
> +_Bool
> +f3 (char *a, char *b)
> +{
> +  return (a + 16 <= b) | (b + 16 <= a);
> +}
> +
> +_Bool
> +f4 (char *a, char *b)
> +{
> +  return (a + 15 < b) | (b + 15 < a);
> +}
> +
> +/* { dg-final { scan-tree-dump-times { = [^\n]* - [^\n]*;} 4 "optimized" } } */
> +/* { dg-final { scan-tree-dump-times { = [^\n]* \+ [^\n]*;} 4 "optimized" } } */
> +/* { dg-final { scan-tree-dump-times { = [^\n]*\ > [^\n]*;} 4 "optimized" } } */
> +/* { dg-final { scan-tree-dump-not {=[^\n]*\ < [^\n]*;} "optimized" } } */
> +/* { dg-final { scan-tree-dump-times { \+ 15} 4 "optimized" } } */
> +/* { dg-final { scan-tree-dump-times { > 30} 4 "optimized" } } */
> Index: gcc/testsuite/gcc.dg/pointer-range-check-2.c
> ===================================================================
> --- /dev/null   2018-07-26 10:26:13.137955424 +0100
> +++ gcc/testsuite/gcc.dg/pointer-range-check-2.c        2018-08-01 10:55:10.695766411 +0100
> @@ -0,0 +1,31 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fno-ipa-icf -fwrapv-pointer -fdump-tree-optimized" } */
> +
> +_Bool
> +f1 (char *a, char *b)
> +{
> +  return (a + 16 <= b) || (b + 16 <= a);
> +}
> +
> +_Bool
> +f2 (char *a, char *b)
> +{
> +  return (a + 15 < b) || (b + 15 < a);
> +}
> +
> +_Bool
> +f3 (char *a, char *b)
> +{
> +  return (a + 16 <= b) | (b + 16 <= a);
> +}
> +
> +_Bool
> +f4 (char *a, char *b)
> +{
> +  return (a + 15 < b) | (b + 15 < a);
> +}
> +
> +/* { dg-final { scan-tree-dump-not { = [^\n]* - [^\n]*;} "optimized" } } */
> +/* { dg-final { scan-tree-dump-times { = [^\n]* \+ [^\n]*;} 8 "optimized" } } */
> +/* { dg-final { scan-tree-dump-times { \+ 15} 4 "optimized" } } */
> +/* { dg-final { scan-tree-dump-times { \+ 16} 4 "optimized" } } */
Marc Glisse Aug. 1, 2018, 12:18 p.m. UTC | #10
On Wed, 1 Aug 2018, Richard Sandiford wrote:

> +/* For pointers @0 and @2 and nonnegative constant offset @1, look for
> +   expressions like:
> +
> +   A: (@0 + @1 < @2) | (@2 + @1 < @0)
> +   B: (@0 + @1 <= @2) | (@2 + @1 <= @0)

Once this is in, we may want to consider the opposite:

(@0 + @1 > @2) & (@2 + @1 > @0)

> +     /* Always fails for negative values.  */
> +     (if (wi::min_precision (rhs, UNSIGNED) <= TYPE_PRECISION (sizetype))

I guess we could simplify to 'true' in the 'else' case but that's less
interesting.

>> Turning multiple comparisons of the form P + cst CMP Q + cst into a
>> range check on P - Q sounds good (we don't really have to restrict to
>> the case where the range is symmetric). Actually, just turning P + cst
>> CMP Q + cst into P - Q CMP cst should do it, we should already have code
>> to handle range checking on integers (modulo the issue of CSE P-Q and
>> Q-P). But I don't know if a couple :s is sufficient to make this
>> transformation a good canonicalization.
>
> Yeah.  Like you say, in the cases being handled by the patch, folding the
> two comparisons separately and then folding the IOR would require either
>
> (a) folding:
>       P + cst < Q
>    to the rather unnatural-looking:
>       Q - P > -cst
>    when tree_swap_operands_p (P, Q) or

Is it unnatural? If it helps other optimizations, it doesn't look that
bad to me ;-)

One issue is if we start mixing forms because only one is single_use:
@0 + @1 < @2 | @1 < @0 - @2

> (b) making the range fold handle reversed pointer_diffs (which I guess
>    makes more sense).

It would be interesting to have a way to write:

(plus @0 (opposite@1 @0))

which would match if @0 is a-b and @1 is b-a or anything similar (I
don't want to repeat all the cases of negate, minus, pointer_diff, etc
in each transformation), but

(match (opposite (minus @0 @1)) (minus @1 @0))

is not a valid syntax. More likely we would write

(plus @0 @1) (if (opposite_p (@0, @1))

with opposite_p defined outside of match.pd :-(

Maybe there is a way to simulate binary predicates on @0 @1 using unary
predicates on (artificial @0 @1).

(looks like I forgot to add pointer_diff to negate_expr_p)

>> If we start from a comparison of pointer_plus, I think it would make
>> sense to use pointer_diff.
>>
>> I believe currently we try to use pointer operations (pointer_plus,
>> pointer_diff, lt) only for related pointers and we cast to some integer
>> type for wilder cases (implementation of std::less in C++ for instance).
>> On the other hand, in an alias check, the 2 pointers are possibly
>> unrelated, so maybe the code emitted for an alias check should be
>> changed.
>
> If we can prove that the pointers are to different objects, it would
> be valid to fold the check to "no alias", regardless of the constant
> (although in practice we should have weeded out those cases before
> generating the check).  In that sense, relying on the UBness of
> comparing pointers to different objects would be fine.  If there's a
> risk of the check being folded to "alias" when the pointers are known
> to point to different objects then that would be more of a problem
> (as a missed optimisation).

Assuming they are related, we are also assuming that pointer_diff will
not overflow. But for unrelated objects, in particular on 32bit targets,
pointer differences cannot all be represented by a 32 bit ptrdiff_t (the
differences live in a range twice that size), and doing comparisons on
it can have strange effects. In this particular case, I don't really see
a way things could break (you would somehow need one pointer close to 0
and the other close to 2^32 so they end up close modulo 2^32, but that
would require @2+@1 to overflow which means the loop will never run that
far anyway).

But we are still lying and taking a risk that some other optimization
will trust us. (I am ok with keeping the current alias code, just saying
that it involves a small, possibly negligible risk)
diff mbox series

Patch

Index: gcc/match.pd
===================================================================
--- gcc/match.pd	2018-07-18 18:44:22.565914281 +0100
+++ gcc/match.pd	2018-07-20 11:24:33.692045585 +0100
@@ -4924,3 +4924,37 @@  DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
    (if (inverse_conditions_p (@0, @2)
         && element_precision (type) == element_precision (op_type))
     (view_convert (cond_op @2 @3 @4 @5 (view_convert:op_type @1)))))))
+
+/* For pointers @0 and @2 and unsigned constant offset @1, look for
+   expressions like:
+
+   A: (@0 + @1 < @2) | (@2 + @1 < @0)
+   B: (@0 + @1 <= @2) | (@2 + @1 <= @0)
+
+   If pointers are known not to wrap, B checks whether @1 bytes starting at
+   @0 and @2 do not overlap, while A tests the same thing for @1 + 1 bytes.
+   A is more efficiently tested as:
+
+   ((sizetype) @0 - (sizetype) @2 + @1) > (@1 * 2)
+
+   as long as @1 * 2 doesn't overflow.  B is the same with @1 replaced
+   with @1 - 1.  */
+(for ior (truth_orif truth_or bit_ior)
+ (for cmp (le lt)
+  (simplify
+   (ior (cmp (pointer_plus:s @0 INTEGER_CST@1) @2)
+	(cmp (pointer_plus:s @2 @1) @0))
+   (if (!flag_trapv && !TYPE_OVERFLOW_WRAPS (TREE_TYPE (@0)))
+    /* Convert the B form to the A form.  */
+    (with { offset_int off = wi::to_offset (@1) - (cmp == LE_EXPR ? 1 : 0); }
+     /* Always fails for negative values.  */
+     (if (wi::min_precision (off, UNSIGNED) * 2 <= TYPE_PRECISION (sizetype))
+      /* It doesn't matter whether we use @2 - @0 or @0 - @2, so let
+	 tree_swap_operands_p pick a canonical order.  */
+      (with { tree ptr1 = @0, ptr2 = @2;
+	      if (tree_swap_operands_p (ptr1, ptr2))
+		std::swap (ptr1, ptr2); }
+       (gt (plus (minus (convert:sizetype { ptr1; })
+			(convert:sizetype { ptr2; }))
+		 { wide_int_to_tree (sizetype, off); })
+	   { wide_int_to_tree (sizetype, off * 2); }))))))))
Index: gcc/testsuite/gcc.dg/pointer-range-check-1.c
===================================================================
--- /dev/null	2018-07-09 14:52:09.234750850 +0100
+++ gcc/testsuite/gcc.dg/pointer-range-check-1.c	2018-07-20 11:24:33.692045585 +0100
@@ -0,0 +1,37 @@ 
+/* { dg-do compile } */
+/* { dg-options "-O2 -fno-ipa-icf -fdump-tree-optimized" } */
+
+/* All four functions should be folded to:
+
+   ((sizetype) a - (sizetype) b + 15) < 30.  */
+
+_Bool
+f1 (char *a, char *b)
+{
+  return (a + 16 <= b) || (b + 16 <= a);
+}
+
+_Bool
+f2 (char *a, char *b)
+{
+  return (a + 15 < b) || (b + 15 < a);
+}
+
+_Bool
+f3 (char *a, char *b)
+{
+  return (a + 16 <= b) || (b + 16 <= a);
+}
+
+_Bool
+f4 (char *a, char *b)
+{
+  return (a + 15 < b) || (b + 15 < a);
+}
+
+/* { dg-final { scan-tree-dump-times { = [^\n]* - [^\n]*;} 4 "optimized" } } */
+/* { dg-final { scan-tree-dump-times { = [^\n]* \+ [^\n]*;} 4 "optimized" } } */
+/* { dg-final { scan-tree-dump-times { = [^\n]*\ > [^\n]*;} 4 "optimized" } } */
+/* { dg-final { scan-tree-dump-not {=[^\n]*\ < [^\n]*;} "optimized" } } */
+/* { dg-final { scan-tree-dump-times { \+ 15} 4 "optimized" } } */
+/* { dg-final { scan-tree-dump-times { > 30} 4 "optimized" } } */
Index: gcc/testsuite/gcc.dg/pointer-range-check-2.c
===================================================================
--- /dev/null	2018-07-09 14:52:09.234750850 +0100
+++ gcc/testsuite/gcc.dg/pointer-range-check-2.c	2018-07-20 11:24:33.692045585 +0100
@@ -0,0 +1,31 @@ 
+/* { dg-do compile } */
+/* { dg-options "-O2 -fno-ipa-icf -fwrapv-pointer -fdump-tree-optimized" } */
+
+_Bool
+f1 (char *a, char *b)
+{
+  return (a + 16 <= b) || (b + 16 <= a);
+}
+
+_Bool
+f2 (char *a, char *b)
+{
+  return (a + 15 < b) || (b + 15 < a);
+}
+
+_Bool
+f3 (char *a, char *b)
+{
+  return (a + 16 <= b) || (b + 16 <= a);
+}
+
+_Bool
+f4 (char *a, char *b)
+{
+  return (a + 15 < b) || (b + 15 < a);
+}
+
+/* { dg-final { scan-tree-dump-not { = [^\n]* - [^\n]*;} "optimized" } } */
+/* { dg-final { scan-tree-dump-times { = [^\n]* \+ [^\n]*;} 8 "optimized" } } */
+/* { dg-final { scan-tree-dump-times { \+ 15} 4 "optimized" } } */
+/* { dg-final { scan-tree-dump-times { \+ 16} 4 "optimized" } } */