Message ID: 20170621144001.GR2123@tucnak
State: New
On Wed, Jun 21, 2017 at 04:40:01PM +0200, Jakub Jelinek wrote:
> So, I wrote following patch to do the subtraction in unsigned
> type. It passes bootstrap, but on both x86_64-linux and i686-linux
> regresses:
> +FAIL: gcc.dg/torture/pr66178.c -O* (test for excess errors)
> +FAIL: gcc.dg/tree-ssa/cmpexactdiv-2.c scan-tree-dump-not optimized "minus_expr"
> +FAIL: g++.dg/tree-ssa/pr21082.C -std=gnu++* (test for excess errors)

Another option is to do what the patch does only when sanitizing, and in that case accept less efficient code and the rejection of weird corner-case testcases like the first one. We risk miscompilation of the pointer differences, but I haven't managed to come up with a testcase where it would show (I guess it is more likely when we propagate constants into the pointers).

	Jakub
On Wed, 21 Jun 2017, Jakub Jelinek wrote:
> So, I wrote following patch to do the subtraction in unsigned
> type. It passes bootstrap, but on both x86_64-linux and i686-linux
> regresses:
> +FAIL: gcc.dg/torture/pr66178.c -O* (test for excess errors)
> +FAIL: gcc.dg/tree-ssa/cmpexactdiv-2.c scan-tree-dump-not optimized "minus_expr"
> +FAIL: g++.dg/tree-ssa/pr21082.C -std=gnu++* (test for excess errors)
>
> E.g. in the first testcase we have in the test:
> static uintptr_t a = ((char *)&&l2-(char *)&&l3)+((char *)&&l1-(char *)&&l2);
> Without the patch, we ended up with:
> static uintptr_t a = (uintptr_t) (((long int) &l2 - (long int) &l3) + ((long int) &l1 - (long int) &l2));
> but with the patch with (the negation in signed type sounds like a folding
> bug), which is too difficult for the initializer_constant_valid_p* handling:
> (uintptr_t) (((long unsigned int) -(long int) &l3 - (long unsigned int) &l2) + ((long unsigned int) &l2 + (long unsigned int) &l1));
> Shall we just xfail that test, or make sure we don't reassociate such
> subtractions, something different?

Adding to match.pd a few more simple reassoc transformations (like (c-b)+(b-a) for instance) that work for both signed and unsigned is on my TODO-list, though that may not be enough. Maybe together with fixing whatever produced the negation it would suffice?

> The second failure is on:
> int f (long *a, long *b, long *c) {
>   __PTRDIFF_TYPE__ l1 = b - a;
>   __PTRDIFF_TYPE__ l2 = c - a;
>   return l1 < l2;
> }
> where without the patch during forwprop2 we optimize it
> using match.pd:
> /* X - Z < Y - Z is the same as X < Y when there is no overflow.  */
> because we had:
> b.0_1 = (long int) b_8(D);
> a.1_2 = (long int) a_9(D);
> _3 = b.0_1 - a.1_2;
> c.2_4 = (long int) c_11(D);
> a.3_5 = (long int) a_9(D);
> _6 = c.2_4 - a.3_5;
> _7 = _3 < _6;
> But with the patch we have:
> b.0_1 = (long unsigned int) b_9(D);
> a.1_2 = (long unsigned int) a_10(D);
> _3 = b.0_1 - a.1_2;
> _4 = (long int) _3;
> c.2_5 = (long unsigned int) c_11(D);
> _6 = c.2_5 - a.1_2;
> _7 = (long int) _6;
> _8 = _4 < _7;
> instead. But that is something we can't generally optimize.
> So do we need to introduce POINTER_DIFF (where we could still
> optimize this) or remove the test? If we rely on largest possible
> array to be half of the VA size - 1 (i.e. where for x > y both being
> pointers into the same array x - y > 0), then it is a valid optimization
> of the 2 pointer subtractions, but it is not a valid optimization on
> comparison of unsigned subtractions cast to signed type.

(this testcase was meant as a simpler version of
vector.size() < vector.capacity() )

It does indeed seem impossible to do this optimization with the unsigned pointer subtraction. If we consider pointers as unsigned, with a subtraction that has a signed result with the constraint that overflow is undefined, we cannot model that optimally with just the usual signed/unsigned operations, so I am in favor of POINTER_DIFF, at least in the long run (together with having a signed second argument for POINTER_PLUS, etc). For 64-bit platforms it might have been easier to declare that the upper half (3/4?) of the address space doesn't exist...

> The third one is
> if (&a[b] - &a[c] != b - c)
>   link_error();
> where fold already during generic folding used to be able to cope with it,
> but now we have:
> (long int) (((long unsigned int) b - (long unsigned int) c) * 4) /[ex] 4 != b - c
> which we don't fold.

Once we have this last expression, we have lost; we need to know that the multiplication cannot overflow for this. When the size multiplications are done in a signed type in the future (?), it might help.

On the other hand, is this an important optimization? I am surprised we are only doing this transformation in generic (so some hack in the front-end could still work); it shouldn't be hard to implement some subset of fold_addr_of_array_ref_difference in match.pd (it is recursive, so a complete move may be harder). But that would make your patch break even more stuff :-(
On Wed, 21 Jun 2017, Marc Glisse wrote:
> On Wed, 21 Jun 2017, Jakub Jelinek wrote:
>
>> So, I wrote following patch to do the subtraction in unsigned
>> type. It passes bootstrap, but on both x86_64-linux and i686-linux
>> regresses:
>> +FAIL: gcc.dg/torture/pr66178.c -O* (test for excess errors)
>> +FAIL: gcc.dg/tree-ssa/cmpexactdiv-2.c scan-tree-dump-not optimized "minus_expr"
>> +FAIL: g++.dg/tree-ssa/pr21082.C -std=gnu++* (test for excess errors)
>>
>> E.g. in the first testcase we have in the test:
>> static uintptr_t a = ((char *)&&l2-(char *)&&l3)+((char *)&&l1-(char *)&&l2);
>> Without the patch, we ended up with:
>> static uintptr_t a = (uintptr_t) (((long int) &l2 - (long int) &l3) + ((long int) &l1 - (long int) &l2));
>> but with the patch with (the negation in signed type sounds like a folding
>> bug), which is too difficult for the initializer_constant_valid_p* handling:
>> (uintptr_t) (((long unsigned int) -(long int) &l3 - (long unsigned int) &l2) + ((long unsigned int) &l2 + (long unsigned int) &l1));
>> Shall we just xfail that test, or make sure we don't reassociate such
>> subtractions, something different?
>
> Adding to match.pd a few more simple reassoc transformations (like
> (c-b)+(b-a) for instance) that work for both signed and unsigned is on
> my TODO-list, though that may not be enough. Maybe together with fixing
> whatever produced the negation would suffice?

I guess try to fix the negation and see if that magically fixes things...

>> The second failure is on:
>> int f (long *a, long *b, long *c) {
>>   __PTRDIFF_TYPE__ l1 = b - a;
>>   __PTRDIFF_TYPE__ l2 = c - a;
>>   return l1 < l2;
>> }
>> where without the patch during forwprop2 we optimize it
>> using match.pd:
>> /* X - Z < Y - Z is the same as X < Y when there is no overflow.  */
>> because we had:
>> b.0_1 = (long int) b_8(D);
>> a.1_2 = (long int) a_9(D);
>> _3 = b.0_1 - a.1_2;
>> c.2_4 = (long int) c_11(D);
>> a.3_5 = (long int) a_9(D);
>> _6 = c.2_4 - a.3_5;
>> _7 = _3 < _6;
>> But with the patch we have:
>> b.0_1 = (long unsigned int) b_9(D);
>> a.1_2 = (long unsigned int) a_10(D);
>> _3 = b.0_1 - a.1_2;
>> _4 = (long int) _3;
>> c.2_5 = (long unsigned int) c_11(D);
>> _6 = c.2_5 - a.1_2;
>> _7 = (long int) _6;
>> _8 = _4 < _7;
>> instead. But that is something we can't generally optimize.
>> So do we need to introduce POINTER_DIFF (where we could still
>> optimize this) or remove the test? If we rely on largest possible
>> array to be half of the VA size - 1 (i.e. where for x > y both being
>> pointers into the same array x - y > 0), then it is a valid optimization
>> of the 2 pointer subtractions, but it is not a valid optimization on
>> comparison of unsigned subtractions cast to signed type.
>
> (this testcase was meant as a simpler version of
> vector.size() < vector.capacity() )
>
> It does indeed seem impossible to do this optimization with the unsigned
> pointer subtraction.

Yep :/ This is the issue with all the non-pointer integer folding fixes as well -- as soon as we introduce unsigned ops for fear of introducing undefined overflow, we lose on followup optimization opportunities.

> If we consider pointers as unsigned, with a subtraction that has a signed
> result with the constraint that overflow is undefined, we cannot model that
> optimally with just the usual signed/unsigned operations, so I am in favor of
> POINTER_DIFF, at least in the long run (together with having a signed second
> argument for POINTER_PLUS, etc). For 64-bit platforms it might have been
> easier to declare that the upper half (3/4?) of the address space doesn't
> exist...

I repeatedly thought of POINTER_DIFF_EXPR, but adding such a basic tree code is quite a big job.

So we'd have POINTER_DIFF_EXPR take two pointer typed args and produce ptrdiff_t. What's the advantage of having this?

And yes, I agree that POINTER_PLUS_EXPR should take ptrdiff_t rather than sizetype offset -- changing one without the other will lead to awkwardness in required patterns involving both, like (p - q) + q.

As said, it's a big job with likely all sorts of (testsuite) fallout.

>> The third one is
>> if (&a[b] - &a[c] != b - c)
>>   link_error();
>> where fold already during generic folding used to be able to cope with it,
>> but now we have:
>> (long int) (((long unsigned int) b - (long unsigned int) c) * 4) /[ex] 4 != b - c
>> which we don't fold.
>
> Once we have this last expression, we have lost, we need to know that the
> multiplication cannot overflow for this. When the size multiplications are
> done in a signed type in the future (?), it might help.

Not sure where the unsigned multiply comes from -- I guess we fold it inside the cast ...

> On the other hand, is this an important optimization? I am surprised we are
> only doing this transformation in generic (so some hack in the front-end could
> still work), it shouldn't be hard to implement some subset of
> fold_addr_of_array_ref_difference in match.pd (it is recursive so a complete
> move may be harder). But that would make your patch break even more stuff :-(

It's always the question whether the above testsuite regressions have any real-world impact, of course. I don't like the idea of making foldings conditional on sanitization too much.

Richard.
On Thu, 22 Jun 2017, Richard Biener wrote:

>> If we consider pointers as unsigned, with a subtraction that has a signed
>> result with the constraint that overflow is undefined, we cannot model that
>> optimally with just the usual signed/unsigned operations, so I am in favor of
>> POINTER_DIFF, at least in the long run (together with having a signed second
>> argument for POINTER_PLUS, etc). For 64-bit platforms it might have been
>> easier to declare that the upper half (3/4?) of the address space doesn't
>> exist...
>
> I repeatedly thought of POINTER_DIFF_EXPR but adding such a basic tree
> code is quite a big job.

Yes :-( It is probably not realistic to introduce it just to avoid a couple of regressions while fixing a bug.

> So we'd have POINTER_DIFF_EXPR take two pointer typed args and produce
> ptrdiff_t. What's the advantage of having this?

It represents q-p with one statement instead of three ((long)q-(long)p) or four ((long)((ulong)q-(ulong)p)). It allows us to stay in the pointer world, so (q-p)>0 is equivalent to p<q, not just (long)p<(long)q. It properly models what (undefined) overflow means for pointers.

Of course it is hard to know in advance whether that's significant or negligible; maybe size_t finds its way into too many places anyway.

> And yes, I agree that POINTER_PLUS_EXPR should take
> ptrdiff_t rather than sizetype offset -- changing one without the
> other will lead to awkwardness in required patterns involving
> both like (p - q) + q.
>
> As said, it's a big job with likely all sorts of (testsuite) fallout.
>
>>> The third one is
>>> if (&a[b] - &a[c] != b - c)
>>> link_error();
>>> where fold already during generic folding used to be able to cope with it,
>>> but now we have:
>>> (long int) (((long unsigned int) b - (long unsigned int) c) * 4) /[ex] 4 != b - c
>>> which we don't fold.
>>
>> Once we have this last expression, we have lost, we need to know that the
>> multiplication cannot overflow for this. When the size multiplications are
>> done in a signed type in the future (?), it might help.
>
> Not sure where the unsigned multiply comes from -- I guess we fold
> it inside the cast ...

We usually do those multiplications in an unsigned type. I experimented with changing one such place in https://gcc.gnu.org/ml/gcc-patches/2017-05/msg01641.html ; there is probably at least another one in the middle-end.
On Thu, 22 Jun 2017, Marc Glisse wrote:
> On Thu, 22 Jun 2017, Richard Biener wrote:
>
>>> If we consider pointers as unsigned, with a subtraction that has a signed
>>> result with the constraint that overflow is undefined, we cannot model that
>>> optimally with just the usual signed/unsigned operations, so I am in favor of
>>> POINTER_DIFF, at least in the long run (together with having a signed second
>>> argument for POINTER_PLUS, etc). For 64-bit platforms it might have been
>>> easier to declare that the upper half (3/4?) of the address space doesn't
>>> exist...
>>
>> I repeatedly thought of POINTER_DIFF_EXPR but adding such a basic tree
>> code is quite a big job.
>
> Yes :-(
> It is probably not realistic to introduce it just to avoid a couple of
> regressions while fixing a bug.
>
>> So we'd have POINTER_DIFF_EXPR take two pointer typed args and produce
>> ptrdiff_t. What's the advantage of having this?
>
> It represents q-p with one statement instead of three ((long)q-(long)p) or
> four ((long)((ulong)q-(ulong)p)). It allows us to stay in the pointer world,
> so (q-p)>0 is equivalent to p<q, not just (long)p<(long)q. It properly models
> what (undefined) overflow means for pointers.
>
> Of course it is hard to know in advance if that's significant or
> negligible, maybe size_t finds its way in too many places anyway.

As with all those experiments ... Well, if I were selling this as a consultant to somebody, I'd estimate three man-months for this work, which realistically means you have to start now, otherwise you won't make it this stage 1.

A smaller job would be to make POINTER_PLUS_EXPR take ptrdiff_t as offset operand. But the fallout is likely similar. A lame(?) half-way transition would allow for both unsigned and signed ptrdiff_t (note sizetype -> [u]ptrdiff_t is already a transition for some embedded archs eventually). Maybe allowing both signed and unsigned offsets is desirable (you of course get interesting effects when combining those in GIMPLE, where signedness matters).

Richard.

>> And yes, I agree that POINTER_PLUS_EXPR should take
>> ptrdiff_t rather than sizetype offset -- changing one without the
>> other will lead to awkwardness in required patterns involving
>> both like (p - q) + q.
>>
>> As said, it's a big job with likely all sorts of (testsuite) fallout.
>>
>>>> The third one is
>>>> if (&a[b] - &a[c] != b - c)
>>>> link_error();
>>>> where fold already during generic folding used to be able to cope with it,
>>>> but now we have:
>>>> (long int) (((long unsigned int) b - (long unsigned int) c) * 4) /[ex] 4 != b - c
>>>> which we don't fold.
>>>
>>> Once we have this last expression, we have lost, we need to know that the
>>> multiplication cannot overflow for this. When the size multiplications are
>>> done in a signed type in the future (?), it might help.
>>
>> Not sure where the unsigned multiply comes from -- I guess we fold
>> it inside the cast ...
>
> We usually do those multiplications in an unsigned type. I experimented
> with changing one such place in
> https://gcc.gnu.org/ml/gcc-patches/2017-05/msg01641.html , there is
> probably at least another one in the middle-end.
--- gcc/c/c-typeck.c.jj	2017-06-21 12:46:10.130822026 +0200
+++ gcc/c/c-typeck.c	2017-06-21 13:22:45.868310030 +0200
@@ -3781,7 +3781,7 @@ static tree
 pointer_diff (location_t loc, tree op0, tree op1)
 {
   tree restype = ptrdiff_type_node;
-  tree result, inttype;
+  tree result, inttype, uinttype;
 
   addr_space_t as0 = TYPE_ADDR_SPACE (TREE_TYPE (TREE_TYPE (op0)));
   addr_space_t as1 = TYPE_ADDR_SPACE (TREE_TYPE (TREE_TYPE (op1)));
@@ -3809,11 +3809,21 @@ pointer_diff (location_t loc, tree op0,
   /* Determine integer type to perform computations in.  This will usually
      be the same as the result type (ptrdiff_t), but may need to be a wider
-     type if pointers for the address space are wider than ptrdiff_t.  */
+     type if pointers for the address space are wider than ptrdiff_t.
+     The subtraction should be performed in the unsigned type, because for
+     the case where one pointer is above the middle of the address space
+     and the other pointer is below the middle of the address space
+     we'd otherwise introduce undefined behavior during the subtraction.  */
   if (TYPE_PRECISION (restype) < TYPE_PRECISION (TREE_TYPE (op0)))
-    inttype = c_common_type_for_size (TYPE_PRECISION (TREE_TYPE (op0)), 0);
+    {
+      inttype = c_common_type_for_size (TYPE_PRECISION (TREE_TYPE (op0)), 0);
+      uinttype = c_common_type_for_size (TYPE_PRECISION (TREE_TYPE (op0)), 1);
+    }
   else
-    inttype = restype;
+    {
+      inttype = restype;
+      uinttype = c_common_type_for_size (TYPE_PRECISION (restype), 1);
+    }
 
   if (TREE_CODE (target_type) == VOID_TYPE)
     pedwarn (loc, OPT_Wpointer_arith,
@@ -3822,14 +3832,15 @@ pointer_diff (location_t loc, tree op0,
     pedwarn (loc, OPT_Wpointer_arith,
	     "pointer to a function used in subtraction");
 
-  /* First do the subtraction as integers;
+  /* First do the subtraction as unsigned integers;
+     then convert to signed integer and
      then drop through to build the divide operator.
      Do not do default conversions on the minus operator
      in case restype is a short type.  */
 
-  op0 = build_binary_op (loc,
-			 MINUS_EXPR, convert (inttype, op0),
-			 convert (inttype, op1), 0);
+  op0 = build_binary_op (loc, MINUS_EXPR, convert (uinttype, op0),
+			 convert (uinttype, op1), 0);
+  op0 = convert (inttype, op0);
 
   /* This generates an error if op1 is pointer to incomplete type.  */
   if (!COMPLETE_OR_VOID_TYPE_P (TREE_TYPE (TREE_TYPE (orig_op1))))
     error_at (loc, "arithmetic on pointer to an incomplete type");
--- gcc/cp/typeck.c.jj	2017-06-19 08:28:07.000000000 +0200
+++ gcc/cp/typeck.c	2017-06-21 13:26:17.120829891 +0200
@@ -5398,14 +5398,18 @@ pointer_diff (location_t loc, tree op0,
       return error_mark_node;
     }
 
-  /* First do the subtraction as integers;
+  tree uinttype = c_common_type_for_size (TYPE_PRECISION (restype), 1);
+
+  /* First do the subtraction as unsigned integers;
+     then cast to restype and then
      drop through to build the divide operator.  */
   op0 = cp_build_binary_op (loc, MINUS_EXPR,
-			    cp_convert (restype, op0, complain),
-			    cp_convert (restype, op1, complain),
+			    cp_convert (uinttype, op0, complain),
+			    cp_convert (uinttype, op1, complain),
			    complain);
+  op0 = cp_convert (restype, op0, complain);
 
   /* This generates an error if op1 is a pointer to an incomplete type.  */
   if (!COMPLETE_TYPE_P (TREE_TYPE (TREE_TYPE (op1))))