Message ID | cb770c02-a88c-4aa3-b834-a64e175ae1f4@redhat.com |
---|---|
State | New |
Headers | show |
Series | Handle strncpy in tree-ssa-dse.c | expand |
On Fri, Jul 19, 2019 at 7:04 PM Jeff Law <law@redhat.com> wrote: > > > While looking at BZ 80576 I realized a few things. > > First for STRNCPY we know the exact count of bytes written and we can > treat it just like MEMCPY and others, both in terms of removing/trimming > them and in terms of using them to allow removal of other stores. > > This patch adds support for those routines in DSE. We test that > subsequent statements can make those calls dead and vice versa and that > we can trim from the head or tail appropriately. > > While writing that code I also stumbled over a blob of code that I think > I copied from tree-ssa-alias.c that isn't necessary. In the relevant > code the byte count is always found in the same place. There's no need > to check the number of operands to the call to figure out where the > count would be. So that little blob of code is simplified ever so slightly. > > Finally, while writing the tests for strncpy I stumbled over a case that > we're still not handling well. > > In particular something like this: > > > > void h (char *s) > { > extern char a[8]; > __builtin_memset (a, 0, sizeof a); > __builtin_strncpy (a, s, sizeof a); > frob (a); > } > > In this case ref_maybe_used_by_stmt_p returns true for the "a" array at > the strncpy call. AFAICT that appears to happen because "a" and "s" > could alias each other. > > strncpy is documented as not allowing overlap between the source and > destination objects. So in theory we could consider them not aliasing > for this call. I haven't implemented this, but I've got some ideas > here. But it does allow things like strncpy (&a[0], &a[n+1], n) for example? I think this kind of specialities should be handled in ref_maybe_used_by_call_p_1 directly, but I'm not exactly sure how - it's probably another case of a missing must-alias query, with that we could do /* If REF overlaps with the strncpy destination then the source argument may not alias it. */ if (refs_must_alias_p (ref_for_strncpy_dest, ref)) return false; hmm, OTOH for REF covering &a[n/2] to &a[3*n/2] (half overlapping source and destination) and the above strncpy we have a must-alias (not kill!) of REF but also are using it. So it's not so easy and would cover only very specific cases. > Anyway, I've included an xfailed test for this case in this patch. > > Bootstrapped and regression tested on x86_64, ppc64, ppc64le, aarch64 & > sparc64. Installing on the trunk momentarily. > > We could in theory handle stpncpy too, we just have to be more careful > with its return value. > > Jeff
On 7/22/19 2:01 AM, Richard Biener wrote: > On Fri, Jul 19, 2019 at 7:04 PM Jeff Law <law@redhat.com> wrote: >> >> >> While looking at BZ 80576 I realized a few things. >> >> First for STRNCPY we know the exact count of bytes written and we can >> treat it just like MEMCPY and others, both in terms of removing/trimming >> them and in terms of using them to allow removal of other stores. >> >> This patch adds support for those routines in DSE. We test that >> subsequent statements can make those calls dead and vice versa and that >> we can trim from the head or tail appropriately. >> >> While writing that code I also stumbled over a blob of code that I think >> I copied from tree-ssa-alias.c that isn't necessary. In the relevant >> code the byte count is always found in the same place. There's no need >> to check the number of operands to the call to figure out where the >> count would be. So that little blob of code is simplified ever so slightly. >> >> Finally, while writing the tests for strncpy I stumbled over a case that >> we're still not handling well. >> >> In particular something like this: >> >> >> >> void h (char *s) >> { >> extern char a[8]; >> __builtin_memset (a, 0, sizeof a); >> __builtin_strncpy (a, s, sizeof a); >> frob (a); >> } >> >> In this case ref_maybe_used_by_stmt_p returns true for the "a" array at >> the strncpy call. AFAICT that appears to happen because "a" and "s" >> could alias each other. >> >> strncpy is documented as not allowing overlap between the source and >> destination objects. So in theory we could consider them not aliasing >> for this call. I haven't implemented this, but I've got some ideas >> here. > > But it does allow things like strncpy (&a[0], &a[n+1], n) for example? Given that they're not allowed to overlap, I would think not. If that were allowed then the code which aggressively transforms strncpy to memcpy would need to be disabled (or at least throttled back) as well. > > I think this kind of specialities should be handled in > ref_maybe_used_by_call_p_1 > directly, but I'm not exactly sure how - it's probably another case of > a missing must-alias query, with that we could do > > /* If REF overlaps with the strncpy destination then the source argument > may not alias it. */ > if (refs_must_alias_p (ref_for_strncpy_dest, ref)) > return false; > > hmm, OTOH for REF covering &a[n/2] to &a[3*n/2] (half overlapping > source and destination) and the above strncpy we have a must-alias > (not kill!) of REF but also are using it. > > So it's not so easy and would cover only very specific cases. I'd been working under the assumption there would be nothing we could do from a global standpoint in the alias oracle. Except for trivial straightline functions, the call is always going to be in some kind of control context. Thus the ability to exploit would be context dependent on the control path leading to the call and not globally true. With that in mind I was thinking that we could tackle in DSE. Essentially asking if there's an overlap between the object we're tracking for DSE and the destination of the str[n]cpy just before the call to ref_maybe_used_by_stmt_p. Obviously any such check has to avoid doing the wrong thing for memmove and any other call where the source/destination are allowed to overlap. That work would be a few patches deep in the queue of things I'm looking at. First up is handling strcpy in tree-ssa-dse.c as well as non-constant write sizes, neither of which are particularly complex. Jeff
On 7/22/19 8:55 AM, Jeff Law wrote: > On 7/22/19 2:01 AM, Richard Biener wrote: >> On Fri, Jul 19, 2019 at 7:04 PM Jeff Law <law@redhat.com> wrote: >>> >>> >>> While looking at BZ 80576 I realized a few things. >>> >>> First for STRNCPY we know the exact count of bytes written and we can >>> treat it just like MEMCPY and others, both in terms of removing/trimming >>> them and in terms of using them to allow removal of other stores. >>> >>> This patch adds support for those routines in DSE. We test that >>> subsequent statements can make those calls dead and vice versa and that >>> we can trim from the head or tail appropriately. >>> >>> While writing that code I also stumbled over a blob of code that I think >>> I copied from tree-ssa-alias.c that isn't necessary. In the relevant >>> code the byte count is always found in the same place. There's no need >>> to check the number of operands to the call to figure out where the >>> count would be. So that little blob of code is simplified ever so slightly. >>> >>> Finally, while writing the tests for strncpy I stumbled over a case that >>> we're still not handling well. >>> >>> In particular something like this: >>> >>> >>> >>> void h (char *s) >>> { >>> extern char a[8]; >>> __builtin_memset (a, 0, sizeof a); >>> __builtin_strncpy (a, s, sizeof a); >>> frob (a); >>> } >>> >>> In this case ref_maybe_used_by_stmt_p returns true for the "a" array at >>> the strncpy call. AFAICT that appears to happen because "a" and "s" >>> could alias each other. >>> >>> strncpy is documented as not allowing overlap between the source and >>> destination objects. So in theory we could consider them not aliasing >>> for this call. I haven't implemented this, but I've got some ideas >>> here. >> >> But it does allow things like strncpy (&a[0], &a[n+1], n) for example? > Given that they're not allowed to overlap, I would think not. If that > were allowed then the code which aggressively transforms strncpy to > memcpy would need to be disabled (or at least throttled back) as well. I think there's some (maybe too much) room for interpretation here as to exactly what the sentence If copying takes place between objects that overlap, the behavior is undefined. means. Taken to an extreme, one might say that the following violates the requirement: char a[4] = "123\0"; strcpy (a, a + 2); because the call copies bytes within the same object (the array a) which inevitably overlaps itself. But I'm pretty sure that's not the intended interpretation -- the object itself does overlap but not the bytes that are copied. (This is also the view -Wrestrict takes.) If it were undefined, then so would be the following: memcpy (a, a + 2, 2); With that in mind, I would be inclined to expect the following not to violate the requirement either: strncpy (a, a + 2, 4); because the bytes that are copied do not overlap. With a set to "123\0" as done above it's equivalent to: memcpy (a, a + 2, 2); memset (a + 2, 0, 2); Admittedly, the examples aren't exactly the same which makes this question interesting. Is it worth raising an interpretation request with WG14? Martin >> I think this kind of specialities should be handled in >> ref_maybe_used_by_call_p_1 >> directly, but I'm not exactly sure how - it's probably another case of >> a missing must-alias query, with that we could do >> >> /* If REF overlaps with the strncpy destination then the source argument >> may not alias it. */ >> if (refs_must_alias_p (ref_for_strncpy_dest, ref)) >> return false; >> >> hmm, OTOH for REF covering &a[n/2] to &a[3*n/2] (half overlapping >> source and destination) and the above strncpy we have a must-alias >> (not kill!) of REF but also are using it. >> >> So it's not so easy and would cover only very specific cases. > I'd been working under the assumption there would be nothing we could do > from a global standpoint in the alias oracle. Except for trivial > straightline functions, the call is always going to be in some kind of > control context. Thus the ability to exploit would be context dependent > on the control path leading to the call and not globally true. > > With that in mind I was thinking that we could tackle in DSE. > Essentially asking if there's an overlap between the object we're > tracking for DSE and the destination of the str[n]cpy just before the > call to ref_maybe_used_by_stmt_p. > > Obviously any such check has to avoid doing the wrong thing for memmove > and any other call where the source/destination are allowed to overlap. > > That work would be a few patches deep in the queue of things I'm looking > at. First up is handling strcpy in tree-ssa-dse.c as well as > non-constant write sizes, neither of which are particularly complex. > > > > Jeff >
On 7/22/19 9:40 AM, Martin Sebor wrote: >> Given that they're not allowed to overlap, I would think not. If that >> were allowed then the code which aggressively transforms strncpy to >> memcpy would need to be disabled (or at least throttled back) as well. > > I think there's some (maybe too much) room for interpretation here > as to exactly what the sentence > > If copying takes place between objects that overlap, the behavior > is undefined. > > means. Taken to an extreme, one might say that the following > violates the requirement: > > char a[4] = "123\0"; > strcpy (a, a + 2); > > because the call copies bytes within the same object (the array a) > which inevitably overlaps itself. But I'm pretty sure that's not > the intended interpretation -- the object itself does overlap but > not the bytes that are copied. (This is also the view -Wrestrict > takes.) If it were undefined, then so would be the following: > > memcpy (a, a + 2, 2); > > With that in mind, I would be inclined to expect the following not > to violate the requirement either: > > strncpy (a, a + 2, 4); > > because the bytes that are copied do not overlap. With a set to > "123\0" as done above it's equivalent to: > > memcpy (a, a + 2, 2); > memset (a + 2, 0, 2); > > Admittedly, the examples aren't exactly the same which makes this > question interesting. Is it worth raising an interpretation request > with WG14? Hmm, I'd tend to go with the more conservative interpretation. ie, something like your first example is OK since there is no overlap between the two regions accessed within the array. If I confused things by dismissing Richi's example too quickly, then I apologize. Can't hurt to get clarification though. Jeff
diff --git a/gcc/ChangeLog b/gcc/ChangeLog index 08f91ed32db..d8f60042ac1 100644 --- a/gcc/ChangeLog +++ b/gcc/ChangeLog @@ -1,3 +1,9 @@ +2019-07-19 Jeff Law <law@redhat.com> + + * tree-ssa-dse.c (initialize_ao_ref_for_dse): Handle + strncpy. Drop some trivial dead code. + (maybe_trim_memstar_call): Handle strncpy. + 2019-07-19 Richard Biener <rguenther@suse.de> PR tree-optimization/91211 diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog index 98fb40ddd96..ce8e3c781b9 100644 --- a/gcc/testsuite/ChangeLog +++ b/gcc/testsuite/ChangeLog @@ -1,3 +1,8 @@ +2019-07-19 Jeff Law <law@redhat.com> + + * gcc.dg/tree-ssa/ssa-dse-37.c: New test. + * gcc.dg/tree-ssa/ssa-dse-38.c: New test. + 2019-07-19 Richard Biener <rguenther@suse.de> PR tree-optimization/91211 diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-37.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-37.c new file mode 100644 index 00000000000..56251fc340f --- /dev/null +++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-37.c @@ -0,0 +1,60 @@ +/* { dg-options "-O2 -fdump-tree-dse-details -fno-tree-fre" } */ + + +#ifndef SCOPE +#define SCOPE +#endif + +extern void frob (char *); + +void g (char *s) +{ + SCOPE char a[8]; + __builtin_strncpy (a, s, sizeof a); + __builtin_memset (a, 0, sizeof a); + frob (a); +} + +void h (char *s) +{ + SCOPE char a[8]; + __builtin_memset (a, 0, sizeof a); + __builtin_strncpy (a, s, sizeof a); + frob (a); +} + +void i (char *s) +{ + SCOPE char a[8]; + __builtin_strncpy (a, s, sizeof a); + __builtin_memset (a, 0, sizeof a - 5); + frob (a); +} + +void j (char *s) +{ + SCOPE char a[8]; + __builtin_memset (a, 0, sizeof a); + __builtin_strncpy (a, s, sizeof a - 5); + frob (a); +} + +void l (char *s) +{ + SCOPE char a[8]; + __builtin_strncpy (a, s, sizeof a); + __builtin_memset (a + 2, 0, sizeof a - 2); + frob (a); +} + +void m (char *s) +{ + SCOPE char a[8]; + __builtin_memset (a, 0, sizeof a); + __builtin_strncpy (a + 2, s, sizeof a - 2); + frob (a); +} + +/* { dg-final { scan-tree-dump-times "Deleted dead call" 2 "dse1" } } */ +/* { dg-final { scan-tree-dump-times "Trimming statement " 4 "dse1" } } */ + diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-38.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-38.c new file mode 100644 index 00000000000..7ae33bfd169 --- /dev/null +++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-38.c @@ -0,0 +1,12 @@ +/* { dg-options "-O2 -fdump-tree-dse-details -fno-tree-fre" } */ + + +/* This changes the scope of the destination object and exposes + missed optimizations in DSE. */ +#define SCOPE extern +#include "ssa-dse-37.c" + +/* { dg-final { scan-tree-dump-times "Deleted dead call" 2 "dse1" { xfail *-*-* } } } */ +/* { dg-final { scan-tree-dump-times "Trimming statement " 4 "dse1" { xfail *-*-* } } } */ + + diff --git a/gcc/tree-ssa-dse.c b/gcc/tree-ssa-dse.c index 9bdcf9ae6af..5b7c4fc6d1a 100644 --- a/gcc/tree-ssa-dse.c +++ b/gcc/tree-ssa-dse.c @@ -113,10 +113,10 @@ initialize_ao_ref_for_dse (gimple *stmt, ao_ref *write) case BUILT_IN_MEMCPY_CHK: case BUILT_IN_MEMMOVE_CHK: case BUILT_IN_MEMSET_CHK: + case BUILT_IN_STRNCPY: + case BUILT_IN_STRNCPY_CHK: { - tree size = NULL_TREE; - if (gimple_call_num_args (stmt) == 3) - size = gimple_call_arg (stmt, 2); + tree size = gimple_call_arg (stmt, 2); tree ptr = gimple_call_arg (stmt, 0); ao_ref_init_from_ptr_and_size (write, ptr, size); return true; @@ -469,8 +469,10 @@ maybe_trim_memstar_call (ao_ref *ref, sbitmap live, gimple *stmt) { case BUILT_IN_MEMCPY: case BUILT_IN_MEMMOVE: + case BUILT_IN_STRNCPY: case BUILT_IN_MEMCPY_CHK: case BUILT_IN_MEMMOVE_CHK: + case BUILT_IN_STRNCPY_CHK: { int head_trim, tail_trim; compute_trims (ref, live, &head_trim, &tail_trim, stmt); @@ -966,9 +968,11 @@ dse_dom_walker::dse_optimize_stmt (gimple_stmt_iterator *gsi) { case BUILT_IN_MEMCPY: case BUILT_IN_MEMMOVE: + case BUILT_IN_STRNCPY: case BUILT_IN_MEMSET: case BUILT_IN_MEMCPY_CHK: case BUILT_IN_MEMMOVE_CHK: + case BUILT_IN_STRNCPY_CHK: case BUILT_IN_MEMSET_CHK: { /* Occasionally calls with an explicit length of zero