Handle strncpy in tree-ssa-dse.c
diff mbox series

Message ID cb770c02-a88c-4aa3-b834-a64e175ae1f4@redhat.com
State New
Headers show
Series
  • Handle strncpy in tree-ssa-dse.c
Related show

Commit Message

Jeff Law July 19, 2019, 5:04 p.m. UTC
While looking at BZ 80576 I realized a few things.

First for STRNCPY we know the exact count of bytes written and we can
treat it just like MEMCPY and others, both in terms of removing/trimming
them and in terms of using them to allow removal of other stores.

This patch adds support for those routines in DSE.  We test that
subsequent statements can make those calls dead and vice versa and that
we can trim from the head or tail appropriately.

While writing that code I also stumbled over a blob of code that I think
I copied from tree-ssa-alias.c that isn't necessary.  In the relevant
code the byte count is always found in the same place.  There's no need
to check the number of operands to the call to figure out where the
count would be.  So that little blob of code is simplified ever so slightly.

Finally, while writing the tests for strncpy I stumbled over a case that
we're still not handling well.

In particular something like this:



void h (char *s)
{
  extern char a[8];
  __builtin_memset (a, 0, sizeof a);
  __builtin_strncpy (a, s, sizeof a);
  frob (a);
}

In this case ref_maybe_used_by_stmt_p returns true for the "a" array at
the strncpy call.  AFAICT that appears to happen because  "a" and "s"
could alias each other.

strncpy is documented as not allowing overlap between the source and
destination objects.  So in theory we could consider them not aliasing
for this call.  I haven't implemented this, but I've got some ideas
here.  Anyway, I've included an xfailed test for this case in this patch.

Bootstrapped and regression tested on x86_64, ppc64, ppc64le, aarch64 &
sparc64.  Installing on the trunk momentarily.

We could in theory handle stpncpy too, we just have to be more careful
with its return value.

Jeff
commit 844df9c9ed48c2c0e80b633eb4f513d1228ef62d
Author: Jeff Law <law@redhat.com>
Date:   Fri Jul 19 11:03:10 2019 -0600

            * tree-ssa-dse.c (initialize_ao_ref_for_dse): Handle
            strncpy.  Drop some trivial dead code.
            (maybe_trim_memstar_call): Handle strncpy.
    
            * gcc.dg/tree-ssa/ssa-dse-37.c: New test.
            * gcc.dg/tree-ssa/ssa-dse-38.c: New test.

Comments

Richard Biener July 22, 2019, 8:01 a.m. UTC | #1
On Fri, Jul 19, 2019 at 7:04 PM Jeff Law <law@redhat.com> wrote:
>
>
> While looking at BZ 80576 I realized a few things.
>
> First for STRNCPY we know the exact count of bytes written and we can
> treat it just like MEMCPY and others, both in terms of removing/trimming
> them and in terms of using them to allow removal of other stores.
>
> This patch adds support for those routines in DSE.  We test that
> subsequent statements can make those calls dead and vice versa and that
> we can trim from the head or tail appropriately.
>
> While writing that code I also stumbled over a blob of code that I think
> I copied from tree-ssa-alias.c that isn't necessary.  In the relevant
> code the byte count is always found in the same place.  There's no need
> to check the number of operands to the call to figure out where the
> count would be.  So that little blob of code is simplified ever so slightly.
>
> Finally, while writing the tests for strncpy I stumbled over a case that
> we're still not handling well.
>
> In particular something like this:
>
>
>
> void h (char *s)
> {
>   extern char a[8];
>   __builtin_memset (a, 0, sizeof a);
>   __builtin_strncpy (a, s, sizeof a);
>   frob (a);
> }
>
> In this case ref_maybe_used_by_stmt_p returns true for the "a" array at
> the strncpy call.  AFAICT that appears to happen because  "a" and "s"
> could alias each other.
>
> strncpy is documented as not allowing overlap between the source and
> destination objects.  So in theory we could consider them not aliasing
> for this call.  I haven't implemented this, but I've got some ideas
> here.

But it does allow things like strncpy (&a[0], &a[n+1], n) for example?

I think this kind of specialities should be handled in
ref_maybe_used_by_call_p_1
directly, but I'm not exactly sure how - it's probably another case of
a missing must-alias query, with that we could do

  /* If REF overlaps with the strncpy destination then the source argument
     may not alias it.  */
  if (refs_must_alias_p (ref_for_strncpy_dest, ref))
    return false;

hmm, OTOH for REF covering &a[n/2] to &a[3*n/2] (half overlapping
source and destination) and the above strncpy we have a must-alias
(not kill!) of REF but also are using it.

So it's not so easy and would cover only very specific cases.

>  Anyway, I've included an xfailed test for this case in this patch.
>
> Bootstrapped and regression tested on x86_64, ppc64, ppc64le, aarch64 &
> sparc64.  Installing on the trunk momentarily.
>
> We could in theory handle stpncpy too, we just have to be more careful
> with its return value.
>
> Jeff
Jeff Law July 22, 2019, 2:55 p.m. UTC | #2
On 7/22/19 2:01 AM, Richard Biener wrote:
> On Fri, Jul 19, 2019 at 7:04 PM Jeff Law <law@redhat.com> wrote:
>>
>>
>> While looking at BZ 80576 I realized a few things.
>>
>> First for STRNCPY we know the exact count of bytes written and we can
>> treat it just like MEMCPY and others, both in terms of removing/trimming
>> them and in terms of using them to allow removal of other stores.
>>
>> This patch adds support for those routines in DSE.  We test that
>> subsequent statements can make those calls dead and vice versa and that
>> we can trim from the head or tail appropriately.
>>
>> While writing that code I also stumbled over a blob of code that I think
>> I copied from tree-ssa-alias.c that isn't necessary.  In the relevant
>> code the byte count is always found in the same place.  There's no need
>> to check the number of operands to the call to figure out where the
>> count would be.  So that little blob of code is simplified ever so slightly.
>>
>> Finally, while writing the tests for strncpy I stumbled over a case that
>> we're still not handling well.
>>
>> In particular something like this:
>>
>>
>>
>> void h (char *s)
>> {
>>   extern char a[8];
>>   __builtin_memset (a, 0, sizeof a);
>>   __builtin_strncpy (a, s, sizeof a);
>>   frob (a);
>> }
>>
>> In this case ref_maybe_used_by_stmt_p returns true for the "a" array at
>> the strncpy call.  AFAICT that appears to happen because  "a" and "s"
>> could alias each other.
>>
>> strncpy is documented as not allowing overlap between the source and
>> destination objects.  So in theory we could consider them not aliasing
>> for this call.  I haven't implemented this, but I've got some ideas
>> here.
> 
> But it does allow things like strncpy (&a[0], &a[n+1], n) for example?
Given that they're not allowed to overlap, I would think not.  If that
were allowed then the code which aggressively transforms strncpy to
memcpy would need to be disabled (or at least throttled back) as well.


> 
> I think this kind of specialities should be handled in
> ref_maybe_used_by_call_p_1
> directly, but I'm not exactly sure how - it's probably another case of
> a missing must-alias query, with that we could do
> 
>   /* If REF overlaps with the strncpy destination then the source argument
>      may not alias it.  */
>   if (refs_must_alias_p (ref_for_strncpy_dest, ref))
>     return false;
> 
> hmm, OTOH for REF covering &a[n/2] to &a[3*n/2] (half overlapping
> source and destination) and the above strncpy we have a must-alias
> (not kill!) of REF but also are using it.
> 
> So it's not so easy and would cover only very specific cases.
I'd been working under the assumption there would be nothing we could do
from a global standpoint in the alias oracle.  Except for trivial
straightline functions, the call is always going to be in some kind of
control context.  Thus the ability to exploit would be context dependent
on the control path leading to the call and not globally true.

With that in mind I was thinking that we could tackle in DSE.
Essentially asking if there's an overlap between the object we're
tracking for DSE and the destination of the str[n]cpy just before the
call to ref_maybe_used_by_stmt_p.

Obviously any such check has to avoid doing the wrong thing for memmove
and any other call where the source/destination are allowed to overlap.

That work would be a few patches deep in the queue of things I'm looking
at.  First up is handling strcpy in tree-ssa-dse.c as well as
non-constant write sizes, neither of which are particularly complex.



Jeff
Martin Sebor July 22, 2019, 3:40 p.m. UTC | #3
On 7/22/19 8:55 AM, Jeff Law wrote:
> On 7/22/19 2:01 AM, Richard Biener wrote:
>> On Fri, Jul 19, 2019 at 7:04 PM Jeff Law <law@redhat.com> wrote:
>>>
>>>
>>> While looking at BZ 80576 I realized a few things.
>>>
>>> First for STRNCPY we know the exact count of bytes written and we can
>>> treat it just like MEMCPY and others, both in terms of removing/trimming
>>> them and in terms of using them to allow removal of other stores.
>>>
>>> This patch adds support for those routines in DSE.  We test that
>>> subsequent statements can make those calls dead and vice versa and that
>>> we can trim from the head or tail appropriately.
>>>
>>> While writing that code I also stumbled over a blob of code that I think
>>> I copied from tree-ssa-alias.c that isn't necessary.  In the relevant
>>> code the byte count is always found in the same place.  There's no need
>>> to check the number of operands to the call to figure out where the
>>> count would be.  So that little blob of code is simplified ever so slightly.
>>>
>>> Finally, while writing the tests for strncpy I stumbled over a case that
>>> we're still not handling well.
>>>
>>> In particular something like this:
>>>
>>>
>>>
>>> void h (char *s)
>>> {
>>>    extern char a[8];
>>>    __builtin_memset (a, 0, sizeof a);
>>>    __builtin_strncpy (a, s, sizeof a);
>>>    frob (a);
>>> }
>>>
>>> In this case ref_maybe_used_by_stmt_p returns true for the "a" array at
>>> the strncpy call.  AFAICT that appears to happen because  "a" and "s"
>>> could alias each other.
>>>
>>> strncpy is documented as not allowing overlap between the source and
>>> destination objects.  So in theory we could consider them not aliasing
>>> for this call.  I haven't implemented this, but I've got some ideas
>>> here.
>>
>> But it does allow things like strncpy (&a[0], &a[n+1], n) for example?
> Given that they're not allowed to overlap, I would think not.  If that
> were allowed then the code which aggressively transforms strncpy to
> memcpy would need to be disabled (or at least throttled back) as well.

I think there's some (maybe too much) room for interpretation here
as to exactly what the sentence

   If copying takes place between objects that overlap, the behavior
   is undefined.

means.  Taken to an extreme, one might say that the following
violates the requirement:

   char a[4] = "123\0";
   strcpy (a, a + 2);

because the call copies bytes within the same object (the array a)
which inevitably overlaps itself.  But I'm pretty sure that's not
the intended  interpretation -- the object itself does overlap but
not the bytes that are copied.  (This is also the view -Wrestrict 
takes.)  If it were undefined, then so would be the following:

   memcpy (a, a + 2, 2);

With that in mind, I would be inclined to expect the following not
to violate the requirement either:

   strncpy (a, a + 2, 4);

because the bytes that are copied do not overlap.  With a set to
"123\0" as done above it's equivalent to:

   memcpy (a, a + 2, 2);
   memset (a + 2, 0, 2);

Admittedly, the examples aren't exactly the same which makes this
question interesting.  Is it worth raising an interpretation request
with WG14?

Martin


>> I think this kind of specialities should be handled in
>> ref_maybe_used_by_call_p_1
>> directly, but I'm not exactly sure how - it's probably another case of
>> a missing must-alias query, with that we could do
>>
>>    /* If REF overlaps with the strncpy destination then the source argument
>>       may not alias it.  */
>>    if (refs_must_alias_p (ref_for_strncpy_dest, ref))
>>      return false;
>>
>> hmm, OTOH for REF covering &a[n/2] to &a[3*n/2] (half overlapping
>> source and destination) and the above strncpy we have a must-alias
>> (not kill!) of REF but also are using it.
>>
>> So it's not so easy and would cover only very specific cases.
> I'd been working under the assumption there would be nothing we could do
> from a global standpoint in the alias oracle.  Except for trivial
> straightline functions, the call is always going to be in some kind of
> control context.  Thus the ability to exploit would be context dependent
> on the control path leading to the call and not globally true.
> 
> With that in mind I was thinking that we could tackle in DSE.
> Essentially asking if there's an overlap between the object we're
> tracking for DSE and the destination of the str[n]cpy just before the
> call to ref_maybe_used_by_stmt_p.
> 
> Obviously any such check has to avoid doing the wrong thing for memmove
> and any other call where the source/destination are allowed to overlap.
> 
> That work would be a few patches deep in the queue of things I'm looking
> at.  First up is handling strcpy in tree-ssa-dse.c as well as
> non-constant write sizes, neither of which are particularly complex.
> 
> 
> 
> Jeff
>
Jeff Law July 22, 2019, 11:22 p.m. UTC | #4
On 7/22/19 9:40 AM, Martin Sebor wrote:
>> Given that they're not allowed to overlap, I would think not.  If that
>> were allowed then the code which aggressively transforms strncpy to
>> memcpy would need to be disabled (or at least throttled back) as well.
> 
> I think there's some (maybe too much) room for interpretation here
> as to exactly what the sentence
> 
>   If copying takes place between objects that overlap, the behavior
>   is undefined.
> 
> means.  Taken to an extreme, one might say that the following
> violates the requirement:
> 
>   char a[4] = "123\0";
>   strcpy (a, a + 2);
> 
> because the call copies bytes within the same object (the array a)
> which inevitably overlaps itself.  But I'm pretty sure that's not
> the intended  interpretation -- the object itself does overlap but
> not the bytes that are copied.  (This is also the view -Wrestrict
> takes.)  If it were undefined, then so would be the following:
> 
>   memcpy (a, a + 2, 2);
> 
> With that in mind, I would be inclined to expect the following not
> to violate the requirement either:
> 
>   strncpy (a, a + 2, 4);
> 
> because the bytes that are copied do not overlap.  With a set to
> "123\0" as done above it's equivalent to:
> 
>   memcpy (a, a + 2, 2);
>   memset (a + 2, 0, 2);
> 
> Admittedly, the examples aren't exactly the same which makes this
> question interesting.  Is it worth raising an interpretation request
> with WG14?
Hmm, I'd tend to go with the more conservative interpretation.  ie,
something like your first example is OK since there is no overlap
between the two regions accessed within the array.

If I confused things by dismissing Richi's example too quickly, then I
apologize.

Can't hurt to get clarification though.

Jeff

Patch
diff mbox series

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 08f91ed32db..d8f60042ac1 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,9 @@ 
+2019-07-19  Jeff Law  <law@redhat.com>
+
+	* tree-ssa-dse.c (initialize_ao_ref_for_dse): Handle
+	strncpy.  Drop some trivial dead code.
+	(maybe_trim_memstar_call): Handle strncpy.
+
 2019-07-19  Richard Biener  <rguenther@suse.de>
 
 	PR tree-optimization/91211
diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog
index 98fb40ddd96..ce8e3c781b9 100644
--- a/gcc/testsuite/ChangeLog
+++ b/gcc/testsuite/ChangeLog
@@ -1,3 +1,8 @@ 
+2019-07-19  Jeff Law  <law@redhat.com>
+
+	* gcc.dg/tree-ssa/ssa-dse-37.c: New test.
+	* gcc.dg/tree-ssa/ssa-dse-38.c: New test.
+
 2019-07-19  Richard Biener  <rguenther@suse.de>
 
 	PR tree-optimization/91211
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-37.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-37.c
new file mode 100644
index 00000000000..56251fc340f
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-37.c
@@ -0,0 +1,60 @@ 
+/* { dg-options "-O2 -fdump-tree-dse-details -fno-tree-fre" } */
+
+
+#ifndef SCOPE
+#define SCOPE
+#endif
+
+extern void frob (char *);
+
+void g (char *s)
+{
+  SCOPE char a[8];
+  __builtin_strncpy (a, s, sizeof a);
+  __builtin_memset (a, 0, sizeof a); 
+  frob (a);
+}
+
+void h (char *s)
+{
+  SCOPE char a[8];
+  __builtin_memset (a, 0, sizeof a); 
+  __builtin_strncpy (a, s, sizeof a);
+  frob (a);
+}
+
+void i (char *s)
+{
+  SCOPE char a[8];
+  __builtin_strncpy (a, s, sizeof a);
+  __builtin_memset (a, 0, sizeof a - 5); 
+  frob (a);
+}
+
+void j (char *s)
+{
+  SCOPE char a[8];
+  __builtin_memset (a, 0, sizeof a); 
+  __builtin_strncpy (a, s, sizeof a - 5);
+  frob (a);
+}
+
+void l (char *s)
+{
+  SCOPE char a[8];
+  __builtin_strncpy (a, s, sizeof a);
+  __builtin_memset (a + 2, 0, sizeof a - 2); 
+  frob (a);
+}
+
+void m (char *s)
+{
+  SCOPE char a[8];
+  __builtin_memset (a, 0, sizeof a); 
+  __builtin_strncpy (a + 2, s, sizeof a - 2);
+  frob (a);
+}
+
+/* { dg-final { scan-tree-dump-times "Deleted dead call" 2 "dse1" } } */
+/* { dg-final { scan-tree-dump-times "Trimming statement " 4 "dse1" } } */
+
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-38.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-38.c
new file mode 100644
index 00000000000..7ae33bfd169
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-38.c
@@ -0,0 +1,12 @@ 
+/* { dg-options "-O2 -fdump-tree-dse-details -fno-tree-fre" } */
+
+
+/* This changes the scope of the destination object and exposes
+   missed optimizations in DSE.  */
+#define SCOPE extern
+#include "ssa-dse-37.c"
+
+/* { dg-final { scan-tree-dump-times "Deleted dead call" 2 "dse1" { xfail *-*-* } } } */
+/* { dg-final { scan-tree-dump-times "Trimming statement " 4 "dse1" { xfail *-*-* } } } */
+
+
diff --git a/gcc/tree-ssa-dse.c b/gcc/tree-ssa-dse.c
index 9bdcf9ae6af..5b7c4fc6d1a 100644
--- a/gcc/tree-ssa-dse.c
+++ b/gcc/tree-ssa-dse.c
@@ -113,10 +113,10 @@  initialize_ao_ref_for_dse (gimple *stmt, ao_ref *write)
 	case BUILT_IN_MEMCPY_CHK:
 	case BUILT_IN_MEMMOVE_CHK:
 	case BUILT_IN_MEMSET_CHK:
+	case BUILT_IN_STRNCPY:
+	case BUILT_IN_STRNCPY_CHK:
 	  {
-	    tree size = NULL_TREE;
-	    if (gimple_call_num_args (stmt) == 3)
-	      size = gimple_call_arg (stmt, 2);
+	    tree size = gimple_call_arg (stmt, 2);
 	    tree ptr = gimple_call_arg (stmt, 0);
 	    ao_ref_init_from_ptr_and_size (write, ptr, size);
 	    return true;
@@ -469,8 +469,10 @@  maybe_trim_memstar_call (ao_ref *ref, sbitmap live, gimple *stmt)
     {
     case BUILT_IN_MEMCPY:
     case BUILT_IN_MEMMOVE:
+    case BUILT_IN_STRNCPY:
     case BUILT_IN_MEMCPY_CHK:
     case BUILT_IN_MEMMOVE_CHK:
+    case BUILT_IN_STRNCPY_CHK:
       {
 	int head_trim, tail_trim;
 	compute_trims (ref, live, &head_trim, &tail_trim, stmt);
@@ -966,9 +968,11 @@  dse_dom_walker::dse_optimize_stmt (gimple_stmt_iterator *gsi)
 	{
 	case BUILT_IN_MEMCPY:
 	case BUILT_IN_MEMMOVE:
+	case BUILT_IN_STRNCPY:
 	case BUILT_IN_MEMSET:
 	case BUILT_IN_MEMCPY_CHK:
 	case BUILT_IN_MEMMOVE_CHK:
+	case BUILT_IN_STRNCPY_CHK:
 	case BUILT_IN_MEMSET_CHK:
 	  {
 	    /* Occasionally calls with an explicit length of zero