handle local aggregate initialization in strlen (PR 83821)

Message ID 3c35d53d-41e6-2813-d447-c38931ddd535@gmail.com
State New
Headers show
Series
  • handle local aggregate initialization in strlen (PR 83821)
Related show

Commit Message

Martin Sebor Jan. 12, 2018, 9:30 p.m.
A failure in a test for the recently enhanced -Warray-bounds
warning exposed an unnecessarily broad restriction in the strlen
pass that prevents it from tracking the length of a member string
of locally defined and initialized struct:

   void f (void)
   {
     struct { char s[8]; int i } a = { "1234", 5 };

     if (strlen (a.s) != 4)   // not folded
       abort ();
    }

IIUC, the restriction was in place to account for writes into
an array changing or invalidating the length of a string stored
in its initial elements.  This would happen if the write either
changed the string's terminating nul byte, or if it reset one
of the prior non-nul bytes.

To reflect just this intent the restriction can be tightened
up to improve the pass' ability to track even the lengths of
string members of locally initialized aggregates.  Besides
leading to better code this change also clears up the test
failure.

Tested on x86_64-linux.

Martin

Comments

Martin Sebor Jan. 18, 2018, 7:13 p.m. | #1
Ping: https://gcc.gnu.org/ml/gcc-patches/2018-01/msg01131.html

This was submitted in stage 3 but if fixing xfailed assertions
in tests by enhancing optimizations is out of scope for the
current stage let me know so I can schedule this change for
GCC 9.

On 01/12/2018 02:30 PM, Martin Sebor wrote:
> A failure in a test for the recently enhanced -Warray-bounds
> warning exposed an unnecessarily broad restriction in the strlen
> pass that prevents it from tracking the length of a member string
> of locally defined and initialized struct:
>
>   void f (void)
>   {
>     struct { char s[8]; int i } a = { "1234", 5 };
>
>     if (strlen (a.s) != 4)   // not folded
>       abort ();
>    }
>
> IIUC, the restriction was in place to account for writes into
> an array changing or invalidating the length of a string stored
> in its initial elements.  This would happen if the write either
> changed the string's terminating nul byte, or if it reset one
> of the prior non-nul bytes.
>
> To reflect just this intent the restriction can be tightened
> up to improve the pass' ability to track even the lengths of
> string members of locally initialized aggregates.  Besides
> leading to better code this change also clears up the test
> failure.
>
> Tested on x86_64-linux.
>
> Martin
>
Jeff Law April 30, 2018, 5:50 p.m. | #2
On 01/12/2018 02:30 PM, Martin Sebor wrote:
> A failure in a test for the recently enhanced -Warray-bounds
> warning exposed an unnecessarily broad restriction in the strlen
> pass that prevents it from tracking the length of a member string
> of locally defined and initialized struct:
> 
>   void f (void)
>   {
>     struct { char s[8]; int i } a = { "1234", 5 };
> 
>     if (strlen (a.s) != 4)   // not folded
>       abort ();
>    }
> 
> IIUC, the restriction was in place to account for writes into
> an array changing or invalidating the length of a string stored
> in its initial elements.  This would happen if the write either
> changed the string's terminating nul byte, or if it reset one
> of the prior non-nul bytes.
> 
> To reflect just this intent the restriction can be tightened
> up to improve the pass' ability to track even the lengths of
> string members of locally initialized aggregates.  Besides
> leading to better code this change also clears up the test
> failure.
> 
> Tested on x86_64-linux.
> 
> Martin
> 
> 
> gcc-83821.diff
> 
> 
> PR tree-optimization/83821 - local aggregate initialization defeats strlen optimization
> 
> gcc/ChangeLog:
> 
> 	PR tree-optimization/83821
> 	* tree-ssa-strlen.c (maybe_invalidate): Consider the length of
> 	a string when available.
> 	(handle_char_store): Reset calloc statement on a non-nul store.
> 
> gcc/testsuite/ChangeLog:
> 
> 	PR tree-optimization/83821
> 	* c-c++-common/Warray-bounds-4.c: Remove XFAIL.
> 	* gcc.dg/strlenopt-43.c: New test.
> 	* gcc.dg/strlenopt-44.c: Same.
> 	* gcc.dg/tree-ssa/calloc-4.c: Same.
I see what you're trying to do.  But I'm really struggling to understand
Marc G's comment "Do not use si->nonzero_chars" since that's precisely
what your patch does.

Your patch seems reasonable on the surface, but I fear there's something
I'm missing.  Can you reach out to Marc G. to see if he recalls the
rational behind the comment.

The comment in its original form was introduced here:

commit 9f15ed6e5c148ded6e7942e75595d91151792c9b
Author: glisse <glisse@138bc75d-0d04-0410-961f-82ee72b054a4>
Date:   Tue Jun 24 18:50:00 2014 +0000

    2014-06-24  Marc Glisse  <marc.glisse@inria.fr>

            PR tree-optimization/57742
    gcc/
            * tree-ssa-strlen.c (get_string_length): Ignore malloc.
            (handle_builtin_malloc, handle_builtin_memset): New functions.
            (strlen_optimize_stmt): Call them.
            * passes.def: Move strlen after loop+dom but before vrp.
    gcc/testsuite/
            * g++.dg/tree-ssa/calloc.C: New testcase.
            * gcc.dg/tree-ssa/calloc-1.c: Likewise.
            * gcc.dg/tree-ssa/calloc-2.c: Likewise.
            * gcc.dg/strlenopt-9.c: Adapt.


    git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@211956
138bc75d-0d04-0410-961f-82ee72b054a4


Jeff
Martin Sebor May 10, 2018, 9:16 p.m. | #3
Hi Marc,

Can you please comment/respond to Jeff's question below and
confirm whether my understanding of the restriction (below)
is correct?

Thanks
Martin

On 04/30/2018 11:50 AM, Jeff Law wrote:
> On 01/12/2018 02:30 PM, Martin Sebor wrote:
>> A failure in a test for the recently enhanced -Warray-bounds
>> warning exposed an unnecessarily broad restriction in the strlen
>> pass that prevents it from tracking the length of a member string
>> of locally defined and initialized struct:
>>
>>   void f (void)
>>   {
>>     struct { char s[8]; int i } a = { "1234", 5 };
>>
>>     if (strlen (a.s) != 4)   // not folded
>>       abort ();
>>    }
>>
>> IIUC, the restriction was in place to account for writes into
>> an array changing or invalidating the length of a string stored
>> in its initial elements.  This would happen if the write either
>> changed the string's terminating nul byte, or if it reset one
>> of the prior non-nul bytes.
>>
>> To reflect just this intent the restriction can be tightened
>> up to improve the pass' ability to track even the lengths of
>> string members of locally initialized aggregates.  Besides
>> leading to better code this change also clears up the test
>> failure.
>>
>> Tested on x86_64-linux.
>>
>> Martin
>>
>>
>> gcc-83821.diff
>>
>>
>> PR tree-optimization/83821 - local aggregate initialization defeats strlen optimization
>>
>> gcc/ChangeLog:
>>
>> 	PR tree-optimization/83821
>> 	* tree-ssa-strlen.c (maybe_invalidate): Consider the length of
>> 	a string when available.
>> 	(handle_char_store): Reset calloc statement on a non-nul store.
>>
>> gcc/testsuite/ChangeLog:
>>
>> 	PR tree-optimization/83821
>> 	* c-c++-common/Warray-bounds-4.c: Remove XFAIL.
>> 	* gcc.dg/strlenopt-43.c: New test.
>> 	* gcc.dg/strlenopt-44.c: Same.
>> 	* gcc.dg/tree-ssa/calloc-4.c: Same.
> I see what you're trying to do.  But I'm really struggling to understand
> Marc G's comment "Do not use si->nonzero_chars" since that's precisely
> what your patch does.
>
> Your patch seems reasonable on the surface, but I fear there's something
> I'm missing.  Can you reach out to Marc G. to see if he recalls the
> rational behind the comment.
>
> The comment in its original form was introduced here:
>
> commit 9f15ed6e5c148ded6e7942e75595d91151792c9b
> Author: glisse <glisse@138bc75d-0d04-0410-961f-82ee72b054a4>
> Date:   Tue Jun 24 18:50:00 2014 +0000
>
>     2014-06-24  Marc Glisse  <marc.glisse@inria.fr>
>
>             PR tree-optimization/57742
>     gcc/
>             * tree-ssa-strlen.c (get_string_length): Ignore malloc.
>             (handle_builtin_malloc, handle_builtin_memset): New functions.
>             (strlen_optimize_stmt): Call them.
>             * passes.def: Move strlen after loop+dom but before vrp.
>     gcc/testsuite/
>             * g++.dg/tree-ssa/calloc.C: New testcase.
>             * gcc.dg/tree-ssa/calloc-1.c: Likewise.
>             * gcc.dg/tree-ssa/calloc-2.c: Likewise.
>             * gcc.dg/strlenopt-9.c: Adapt.
>
>
>     git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@211956
> 138bc75d-0d04-0410-961f-82ee72b054a4
>
>
> Jeff
>
Marc Glisse May 10, 2018, 10:05 p.m. | #4
On Thu, 10 May 2018, Martin Sebor wrote:

> Can you please comment/respond to Jeff's question below and
> confirm whether my understanding of the restriction (below)
> is correct?

I don't remember it at all, I really should have expanded that comment...

The documentation of nonzero_chars seems to indicate that, unless 
full_string_p, it is only a lower bound on the length of the string, so 
not suitable for this kind of alias check. I don't know if we also have 
easy access to some upper bound.

(I noticed while looking at this pass that it could probably use 
POINTER_DIFF_EXPR more)
Jeff Law May 23, 2018, 2:57 p.m. | #5
On 05/10/2018 04:05 PM, Marc Glisse wrote:
> On Thu, 10 May 2018, Martin Sebor wrote:
> 
>> Can you please comment/respond to Jeff's question below and
>> confirm whether my understanding of the restriction (below)
>> is correct?
> 
> I don't remember it at all, I really should have expanded that comment...
> 
> The documentation of nonzero_chars seems to indicate that, unless
> full_string_p, it is only a lower bound on the length of the string, so
> not suitable for this kind of alias check. I don't know if we also have
> easy access to some upper bound.
> 
> (I noticed while looking at this pass that it could probably use
> POINTER_DIFF_EXPR more)
So ISTM that we'd need to guard the code that uses si->nonzero_chars in
maybe_invalidate to also check FULL_STRING_P since it appears we're
using si->nonzero_chars as a string length.

jeff
Martin Sebor May 23, 2018, 7:28 p.m. | #6
On 05/23/2018 08:57 AM, Jeff Law wrote:
> On 05/10/2018 04:05 PM, Marc Glisse wrote:
>> On Thu, 10 May 2018, Martin Sebor wrote:
>>
>>> Can you please comment/respond to Jeff's question below and
>>> confirm whether my understanding of the restriction (below)
>>> is correct?
>>
>> I don't remember it at all, I really should have expanded that comment...
>>
>> The documentation of nonzero_chars seems to indicate that, unless
>> full_string_p, it is only a lower bound on the length of the string, so
>> not suitable for this kind of alias check. I don't know if we also have
>> easy access to some upper bound.
>>
>> (I noticed while looking at this pass that it could probably use
>> POINTER_DIFF_EXPR more)
> So ISTM that we'd need to guard the code that uses si->nonzero_chars in
> maybe_invalidate to also check FULL_STRING_P since it appears we're
> using si->nonzero_chars as a string length.

I'm not sure I see why.  Can you explain?

Here's my explanation of the current approach.  si->nonzero_chars
is the lower bound on si's length.  maybe_invalidate() invalidates
the string length which is only necessary when something overwrites
one of the first si->nonzero_chars of the array.  It doesn't matter
whether si is nul-terminated at that point.

The difference can be seen on the following test case which gets
optimized as I would expect only if full_string_p is not considered,
else the (minimum) string length is invalidated by the assignment
to a.b because full_string_p is false.

struct A {
   char a[9];
   int b;
};

void f (void)
{
   struct A a;

   __builtin_memcpy (a.a, "1234", 4);
   a.b = 0;                            // <<< maybe_invalidate()
   a.a[4] = 0;

   if (__builtin_strlen (a.a) != 4)
     __builtin_abort ();
}

Martin
Jeff Law May 24, 2018, 9:23 p.m. | #7
On 05/23/2018 01:28 PM, Martin Sebor wrote:
> On 05/23/2018 08:57 AM, Jeff Law wrote:
>> On 05/10/2018 04:05 PM, Marc Glisse wrote:
>>> On Thu, 10 May 2018, Martin Sebor wrote:
>>>
>>>> Can you please comment/respond to Jeff's question below and
>>>> confirm whether my understanding of the restriction (below)
>>>> is correct?
>>>
>>> I don't remember it at all, I really should have expanded that
>>> comment...
>>>
>>> The documentation of nonzero_chars seems to indicate that, unless
>>> full_string_p, it is only a lower bound on the length of the string, so
>>> not suitable for this kind of alias check. I don't know if we also have
>>> easy access to some upper bound.
>>>
>>> (I noticed while looking at this pass that it could probably use
>>> POINTER_DIFF_EXPR more)
>> So ISTM that we'd need to guard the code that uses si->nonzero_chars in
>> maybe_invalidate to also check FULL_STRING_P since it appears we're
>> using si->nonzero_chars as a string length.
> 
> I'm not sure I see why.  Can you explain?
> 
> Here's my explanation of the current approach.  si->nonzero_chars
> is the lower bound on si's length.  maybe_invalidate() invalidates
> the string length which is only necessary when something overwrites
> one of the first si->nonzero_chars of the array.  It doesn't matter
> whether si is nul-terminated at that point.
> 
> The difference can be seen on the following test case which gets
> optimized as I would expect only if full_string_p is not considered,
> else the (minimum) string length is invalidated by the assignment
> to a.b because full_string_p is false.
My bad.  I was thinking in terms of maximum string length which
si->nonzero_chars does not represent unless it's also FULL_STRING_P.  A
a minimum string length it should be OK.

So OK with a quick comment clarifying that we're using it to compute a
minimum string length rather than a maximum string length.

Sorry for the goof on my part.

jeff
Marc Glisse May 24, 2018, 9:40 p.m. | #8
On Wed, 23 May 2018, Martin Sebor wrote:

> On 05/23/2018 08:57 AM, Jeff Law wrote:
>> On 05/10/2018 04:05 PM, Marc Glisse wrote:
>>> On Thu, 10 May 2018, Martin Sebor wrote:
>>> 
>>>> Can you please comment/respond to Jeff's question below and
>>>> confirm whether my understanding of the restriction (below)
>>>> is correct?
>>> 
>>> I don't remember it at all, I really should have expanded that comment...
>>> 
>>> The documentation of nonzero_chars seems to indicate that, unless
>>> full_string_p, it is only a lower bound on the length of the string, so
>>> not suitable for this kind of alias check. I don't know if we also have
>>> easy access to some upper bound.
>>> 
>>> (I noticed while looking at this pass that it could probably use
>>> POINTER_DIFF_EXPR more)
>> So ISTM that we'd need to guard the code that uses si->nonzero_chars in
>> maybe_invalidate to also check FULL_STRING_P since it appears we're
>> using si->nonzero_chars as a string length.
>
> I'm not sure I see why.  Can you explain?
>
> Here's my explanation of the current approach.  si->nonzero_chars
> is the lower bound on si's length.  maybe_invalidate() invalidates
> the string length which is only necessary when something overwrites
> one of the first si->nonzero_chars of the array.  It doesn't matter
> whether si is nul-terminated at that point.

When you say "invalidates the string length", does that mean it only 
invalidates nonzero_chars (the lower bound), or does that also determine 
if you can CSE a call to strlen before and after this instruction? Your 
argument only applies in the first case.
Martin Sebor May 24, 2018, 11:33 p.m. | #9
On 05/24/2018 03:40 PM, Marc Glisse wrote:
> On Wed, 23 May 2018, Martin Sebor wrote:
>
>> On 05/23/2018 08:57 AM, Jeff Law wrote:
>>> On 05/10/2018 04:05 PM, Marc Glisse wrote:
>>>> On Thu, 10 May 2018, Martin Sebor wrote:
>>>>
>>>>> Can you please comment/respond to Jeff's question below and
>>>>> confirm whether my understanding of the restriction (below)
>>>>> is correct?
>>>>
>>>> I don't remember it at all, I really should have expanded that
>>>> comment...
>>>>
>>>> The documentation of nonzero_chars seems to indicate that, unless
>>>> full_string_p, it is only a lower bound on the length of the string, so
>>>> not suitable for this kind of alias check. I don't know if we also have
>>>> easy access to some upper bound.
>>>>
>>>> (I noticed while looking at this pass that it could probably use
>>>> POINTER_DIFF_EXPR more)
>>> So ISTM that we'd need to guard the code that uses si->nonzero_chars in
>>> maybe_invalidate to also check FULL_STRING_P since it appears we're
>>> using si->nonzero_chars as a string length.
>>
>> I'm not sure I see why.  Can you explain?
>>
>> Here's my explanation of the current approach.  si->nonzero_chars
>> is the lower bound on si's length.  maybe_invalidate() invalidates
>> the string length which is only necessary when something overwrites
>> one of the first si->nonzero_chars of the array.  It doesn't matter
>> whether si is nul-terminated at that point.
>
> When you say "invalidates the string length", does that mean it only
> invalidates nonzero_chars (the lower bound), or does that also determine
> if you can CSE a call to strlen before and after this instruction? Your
> argument only applies in the first case.

"invalidates the string length" means that maybe_invalidate(STMT)
removes the length information record for a string whose length
might be modified by STMT.

If the exact string length is known (full_string_p is true) then
its length can only be modified by writing a nul into one of
the leading nonzero_chars.  If the exact string length is not
known (i.e., full_string_p is false), then nonzero_chars is not
constant and writes into the string are assumed to change its
length, and thus invalidate it.

But I'm not sure I understand from your brief description what
use case you have in mind.  If you have an example I'll test it.

Martin
Marc Glisse May 25, 2018, 6:06 a.m. | #10
On Thu, 24 May 2018, Martin Sebor wrote:

> On 05/24/2018 03:40 PM, Marc Glisse wrote:
>> On Wed, 23 May 2018, Martin Sebor wrote:
>> 
>>> On 05/23/2018 08:57 AM, Jeff Law wrote:
>>>> On 05/10/2018 04:05 PM, Marc Glisse wrote:
>>>>> On Thu, 10 May 2018, Martin Sebor wrote:
>>>>> 
>>>>>> Can you please comment/respond to Jeff's question below and
>>>>>> confirm whether my understanding of the restriction (below)
>>>>>> is correct?
>>>>> 
>>>>> I don't remember it at all, I really should have expanded that
>>>>> comment...
>>>>> 
>>>>> The documentation of nonzero_chars seems to indicate that, unless
>>>>> full_string_p, it is only a lower bound on the length of the string, so
>>>>> not suitable for this kind of alias check. I don't know if we also have
>>>>> easy access to some upper bound.
>>>>> 
>>>>> (I noticed while looking at this pass that it could probably use
>>>>> POINTER_DIFF_EXPR more)
>>>> So ISTM that we'd need to guard the code that uses si->nonzero_chars in
>>>> maybe_invalidate to also check FULL_STRING_P since it appears we're
>>>> using si->nonzero_chars as a string length.
>>> 
>>> I'm not sure I see why.  Can you explain?
>>> 
>>> Here's my explanation of the current approach.  si->nonzero_chars
>>> is the lower bound on si's length.  maybe_invalidate() invalidates
>>> the string length which is only necessary when something overwrites
>>> one of the first si->nonzero_chars of the array.  It doesn't matter
>>> whether si is nul-terminated at that point.
>> 
>> When you say "invalidates the string length", does that mean it only
>> invalidates nonzero_chars (the lower bound), or does that also determine
>> if you can CSE a call to strlen before and after this instruction? Your
>> argument only applies in the first case.
>
> "invalidates the string length" means that maybe_invalidate(STMT)
> removes the length information record for a string whose length
> might be modified by STMT.
>
> If the exact string length is known (full_string_p is true) then
> its length can only be modified by writing a nul into one of
> the leading nonzero_chars.  If the exact string length is not
> known (i.e., full_string_p is false), then nonzero_chars is not
> constant

Why couldn't nonzero_chars be constant in that case?

void f(const char*s){
   s[0]='a';
   // I know that strlen(s) is at least 1 here
}

> and writes into the string are assumed to change its
> length, and thus invalidate it.
>
> But I'm not sure I understand from your brief description what
> use case you have in mind.  If you have an example I'll test it.

The example may be completely off, but just to try and describe what I am 
wondering about:

int f(const char*s,char*restrict t,char c){
// we know nothing about s for now.
s[0]='a'; // nonzero_chars should now be 1
strcpy(t,s);
s[2]=c; // this is after nonzero_chars
strcpy(t,s);
}

If we assume that writing to s[2] is after the end of the original string 
s, we can remove one of the strcpy. But that's not a valid transformation 
in general.

I originally wanted to see if we could CSE calls to strlen(s) before and 
after the write, but the lhs of the first call to strlen would replace 
nonzero_chars. And I don't know so well what we use strinfo for and thus 
how dangerous it is not to invalidate it.
Martin Sebor May 25, 2018, 3:43 p.m. | #11
On 05/25/2018 12:06 AM, Marc Glisse wrote:
> On Thu, 24 May 2018, Martin Sebor wrote:
>
>> On 05/24/2018 03:40 PM, Marc Glisse wrote:
>>> On Wed, 23 May 2018, Martin Sebor wrote:
>>>
>>>> On 05/23/2018 08:57 AM, Jeff Law wrote:
>>>>> On 05/10/2018 04:05 PM, Marc Glisse wrote:
>>>>>> On Thu, 10 May 2018, Martin Sebor wrote:
>>>>>>
>>>>>>> Can you please comment/respond to Jeff's question below and
>>>>>>> confirm whether my understanding of the restriction (below)
>>>>>>> is correct?
>>>>>>
>>>>>> I don't remember it at all, I really should have expanded that
>>>>>> comment...
>>>>>>
>>>>>> The documentation of nonzero_chars seems to indicate that, unless
>>>>>> full_string_p, it is only a lower bound on the length of the
>>>>>> string, so
>>>>>> not suitable for this kind of alias check. I don't know if we also
>>>>>> have
>>>>>> easy access to some upper bound.
>>>>>>
>>>>>> (I noticed while looking at this pass that it could probably use
>>>>>> POINTER_DIFF_EXPR more)
>>>>> So ISTM that we'd need to guard the code that uses
>>>>> si->nonzero_chars in
>>>>> maybe_invalidate to also check FULL_STRING_P since it appears we're
>>>>> using si->nonzero_chars as a string length.
>>>>
>>>> I'm not sure I see why.  Can you explain?
>>>>
>>>> Here's my explanation of the current approach.  si->nonzero_chars
>>>> is the lower bound on si's length.  maybe_invalidate() invalidates
>>>> the string length which is only necessary when something overwrites
>>>> one of the first si->nonzero_chars of the array.  It doesn't matter
>>>> whether si is nul-terminated at that point.
>>>
>>> When you say "invalidates the string length", does that mean it only
>>> invalidates nonzero_chars (the lower bound), or does that also determine
>>> if you can CSE a call to strlen before and after this instruction? Your
>>> argument only applies in the first case.
>>
>> "invalidates the string length" means that maybe_invalidate(STMT)
>> removes the length information record for a string whose length
>> might be modified by STMT.
>>
>> If the exact string length is known (full_string_p is true) then
>> its length can only be modified by writing a nul into one of
>> the leading nonzero_chars.  If the exact string length is not
>> known (i.e., full_string_p is false), then nonzero_chars is not
>> constant
>
> Why couldn't nonzero_chars be constant in that case?
>
> void f(const char*s){
>   s[0]='a';
>   // I know that strlen(s) is at least 1 here
> }

I was responding specifically to your question about the strlen()
CSE.  Above there is no call to strlen().  What I understood you
were asking about is something like:

   int f (char *s)
   {
     int n0 = strlen (s);
     s[7]='a';
     int n1 = strlen (s);   // can this be replaced by n0?
     return n0 == n1;
   }

At the point of the assignment, nonzero_chars for s is known but
not constant and so s's strinfo is invalidated/removed.

I should correct one thing: when a non-constant string length
has been computed and stored in nonzerro_chars full_string_p
is set to true (by handle_builtin_strlen).  But because
nonzero_chars is non-constant, writing into the string at
any offset invalidates the string length (the non-constant
nonzero_chars is effectively treated as PTRDIFF_MAX).

>> and writes into the string are assumed to change its
>> length, and thus invalidate it.
>>
>> But I'm not sure I understand from your brief description what
>> use case you have in mind.  If you have an example I'll test it.
>
> The example may be completely off, but just to try and describe what I
> am wondering about:
>
> int f(const char*s,char*restrict t,char c){
> // we know nothing about s for now.
> s[0]='a'; // nonzero_chars should now be 1
> strcpy(t,s);
> s[2]=c; // this is after nonzero_chars
> strcpy(t,s);
> }
>
> If we assume that writing to s[2] is after the end of the original
> string s, we can remove one of the strcpy. But that's not a valid
> transformation in general.

Above, s[0] creates strinfo for s with nonzero_chars == 1 and
full_string_p == false.  Then, after the first call to strcpy,
that same strinfo is invalidated/removed because the length of
s is unknown.  The next assignment to s[2] doesn't do anything
(in the strlen pass) because the right operand is not constant).
In the second strcpy there is no strinfo for s.

>
> I originally wanted to see if we could CSE calls to strlen(s) before and
> after the write, but the lhs of the first call to strlen would replace
> nonzero_chars. And I don't know so well what we use strinfo for and thus
> how dangerous it is not to invalidate it.

The pass does track non-constant strlen() results so I think
the optimization you describe should be possible.  For instance,
here it should be safe to eliminate the second strlen:

   int f (char *s)
   {
     s[0] = 'a';
     s[1] = 'b';
     int n0 = __builtin_strlen (s);

     s[0] = 'x';
     int n1 = __builtin_strlen (s);

     return n0 == n1;
   }

It isn't optimized today because (as I explained above) the first
strlen() replaces the constant nonzero_chars == 2 with the non-
constant result of strlen(s).

I have been thinking about enhancing the strlen pass to deal
with ranges (not just the VRP kind but ranges of string lengths).
With that, it would be possible to optimize the above because
in addition to replacing nonzero_chars with strlen(s), the first
strlen would also set s's length range to [2, PTRDIFF_MAX].

Martin
Marc Glisse May 25, 2018, 9:55 p.m. | #12
On Fri, 25 May 2018, Martin Sebor wrote:

>> Why couldn't nonzero_chars be constant in that case?
>> 
>> void f(const char*s){
>>   s[0]='a';
>>   // I know that strlen(s) is at least 1 here
>> }
>
> I was responding specifically to your question about the strlen()
> CSE.  Above there is no call to strlen().  What I understood you
> were asking about is something like:
>
>  int f (char *s)
>  {
>    int n0 = strlen (s);
>    s[7]='a';
>    int n1 = strlen (s);   // can this be replaced by n0?
>    return n0 == n1;
>  }

Yes, but I realized afterwards that calling strlen causes strinfo to be 
updated, replacing a non-tight nonzero_chars with the lhs of the call. So 
I need a different use of strinfo to demonstrate the issue.

>> The example may be completely off, but just to try and describe what I
>> am wondering about:
>> 
>> int f(const char*s,char*restrict t,char c){

The const shouldn't have been there.

>> // we know nothing about s for now.
>> s[0]='a'; // nonzero_chars should now be 1
>> strcpy(t,s);
>> s[2]=c; // this is after nonzero_chars
>> strcpy(t,s);
>> }
>> 
>> If we assume that writing to s[2] is after the end of the original
>> string s, we can remove one of the strcpy. But that's not a valid
>> transformation in general.
>
> Above, s[0] creates strinfo for s with nonzero_chars == 1 and
> full_string_p == false.  Then, after the first call to strcpy,
> that same strinfo is invalidated/removed because the length of
> s is unknown.

strcpy writes to t, which cannot alias s, so I don't see why this should 
invalidate the strinfo for s.

> The next assignment to s[2] doesn't do anything
> (in the strlen pass) because the right operand is not constant).

Well, it should at least invalidate stuff.

> In the second strcpy there is no strinfo for s.

This was a bad example again, because we don't have the relevant 
optimization anyway (2 identical calls to strcpy(t,s) in a row are not 
simplified to a single one).

>> I originally wanted to see if we could CSE calls to strlen(s) before and
>> after the write, but the lhs of the first call to strlen would replace
>> nonzero_chars. And I don't know so well what we use strinfo for and thus
>> how dangerous it is not to invalidate it.
>
> The pass does track non-constant strlen() results so I think
> the optimization you describe should be possible.  For instance,
> here it should be safe to eliminate the second strlen:
>
>  int f (char *s)
>  {
>    s[0] = 'a';
>    s[1] = 'b';
>    int n0 = __builtin_strlen (s);
>
>    s[0] = 'x';
>    int n1 = __builtin_strlen (s);
>
>    return n0 == n1;
>  }
>
> It isn't optimized today because (as I explained above) the first
> strlen() replaces the constant nonzero_chars == 2 with the non-
> constant result of strlen(s).

Let me try a completely different example

char*f(){
   char*p=__builtin_calloc(12,1);
   __builtin_memset(p+5,1,2);
   __builtin_memset(p,0,12);
   return p;
}

With your patch from 12/1 and -fno-tree-dse, this is miscompiled and 
misses the final memset.


I also still fear it may be possible to come up with a more strlen-like 
example along the lines of

s[0]='a'; // nonzero_chars=1
- do something with s that relies on its length and doesn't invalidate
s[2]='a'; // patch prevents it from invalidating
- do something else with s

where the fact that s has a different length in lines 2 and 4 is hidden by 
the patch. But maybe all the transformations are carefully written to 
avoid this problem...
Martin Sebor May 25, 2018, 11 p.m. | #13
On 05/25/2018 03:55 PM, Marc Glisse wrote:
> On Fri, 25 May 2018, Martin Sebor wrote:
>
>>> Why couldn't nonzero_chars be constant in that case?
>>>
>>> void f(const char*s){
>>>   s[0]='a';
>>>   // I know that strlen(s) is at least 1 here
>>> }
>>
>> I was responding specifically to your question about the strlen()
>> CSE.  Above there is no call to strlen().  What I understood you
>> were asking about is something like:
>>
>>  int f (char *s)
>>  {
>>    int n0 = strlen (s);
>>    s[7]='a';
>>    int n1 = strlen (s);   // can this be replaced by n0?
>>    return n0 == n1;
>>  }
>
> Yes, but I realized afterwards that calling strlen causes strinfo to be
> updated, replacing a non-tight nonzero_chars with the lhs of the call.
> So I need a different use of strinfo to demonstrate the issue.
>
>>> The example may be completely off, but just to try and describe what I
>>> am wondering about:
>>>
>>> int f(const char*s,char*restrict t,char c){
>
> The const shouldn't have been there.
>
>>> // we know nothing about s for now.
>>> s[0]='a'; // nonzero_chars should now be 1
>>> strcpy(t,s);
>>> s[2]=c; // this is after nonzero_chars
>>> strcpy(t,s);
>>> }
>>>
>>> If we assume that writing to s[2] is after the end of the original
>>> string s, we can remove one of the strcpy. But that's not a valid
>>> transformation in general.
>>
>> Above, s[0] creates strinfo for s with nonzero_chars == 1 and
>> full_string_p == false.  Then, after the first call to strcpy,
>> that same strinfo is invalidated/removed because the length of
>> s is unknown.
>
> strcpy writes to t, which cannot alias s, so I don't see why this should
> invalidate the strinfo for s.

That might be a question for Richard.  After processing the strcpy
call statement the pass ends up calling maybe_invalidate() where
it initializes an ao_ref with the source string and calls
stmt_may_clobber_ref_p_1() to determine if the statement may
clobber the reference.  The function returns true.

>
>> The next assignment to s[2] doesn't do anything
>> (in the strlen pass) because the right operand is not constant).
>
> Well, it should at least invalidate stuff.

It does without the patch, but not with it because the offset
is greater than nonzero_chars.

>> In the second strcpy there is no strinfo for s.
>
> This was a bad example again, because we don't have the relevant
> optimization anyway (2 identical calls to strcpy(t,s) in a row are not
> simplified to a single one).
>
>>> I originally wanted to see if we could CSE calls to strlen(s) before and
>>> after the write, but the lhs of the first call to strlen would replace
>>> nonzero_chars. And I don't know so well what we use strinfo for and thus
>>> how dangerous it is not to invalidate it.
>>
>> The pass does track non-constant strlen() results so I think
>> the optimization you describe should be possible.  For instance,
>> here it should be safe to eliminate the second strlen:
>>
>>  int f (char *s)
>>  {
>>    s[0] = 'a';
>>    s[1] = 'b';
>>    int n0 = __builtin_strlen (s);
>>
>>    s[0] = 'x';
>>    int n1 = __builtin_strlen (s);
>>
>>    return n0 == n1;
>>  }
>>
>> It isn't optimized today because (as I explained above) the first
>> strlen() replaces the constant nonzero_chars == 2 with the non-
>> constant result of strlen(s).
>
> Let me try a completely different example
>
> char*f(){
>   char*p=__builtin_calloc(12,1);
>   __builtin_memset(p+5,1,2);
>   __builtin_memset(p,0,12);
>   return p;
> }
>
> With your patch from 12/1 and -fno-tree-dse, this is miscompiled and
> misses the final memset.

So it is.  Thanks for the example.  It looks like it's an artifact
of the second memset() call not resetting the strinfo::stmt member,
and the final memset relying on the member to see if the memory
is all zeros.  I missed/forgot about that part of the pass.

> I also still fear it may be possible to come up with a more strlen-like
> example along the lines of
>
> s[0]='a'; // nonzero_chars=1
> - do something with s that relies on its length and doesn't invalidate
> s[2]='a'; // patch prevents it from invalidating
> - do something else with s
>
> where the fact that s has a different length in lines 2 and 4 is hidden
> by the patch. But maybe all the transformations are carefully written to
> avoid this problem...

Let me do some more testing once I handle the calloc issue above.
If you think of more examples please post them.  This last one was 
obviously quite helpful.

Martin

Patch

PR tree-optimization/83821 - local aggregate initialization defeats strlen optimization

gcc/ChangeLog:

	PR tree-optimization/83821
	* tree-ssa-strlen.c (maybe_invalidate): Consider the length of
	a string when available.
	(handle_char_store): Reset calloc statement on a non-nul store.

gcc/testsuite/ChangeLog:

	PR tree-optimization/83821
	* c-c++-common/Warray-bounds-4.c: Remove XFAIL.
	* gcc.dg/strlenopt-43.c: New test.
	* gcc.dg/strlenopt-44.c: Same.
	* gcc.dg/tree-ssa/calloc-4.c: Same.

Index: gcc/testsuite/gcc.dg/strlenopt-43.c
===================================================================
--- gcc/testsuite/gcc.dg/strlenopt-43.c	(nonexistent)
+++ gcc/testsuite/gcc.dg/strlenopt-43.c	(working copy)
@@ -0,0 +1,178 @@ 
+/* PR tree-optimization/83821 - local aggregate initialization defeats
+   strlen optimization
+   { dg-do compile }
+   { dg-options "-O2 -Wall -fdump-tree-optimized" } */
+
+#include "strlenopt.h"
+
+#define CAT(x, y) x ## y
+#define CONCAT(x, y) CAT (x, y)
+#define FAILNAME(name) CONCAT (call_ ## name ##_on_line_, __LINE__)
+
+#define FAIL(name) do {				\
+    extern void FAILNAME (name) (void);		\
+    FAILNAME (name)();				\
+  } while (0)
+
+/* Macro to emit a call to funcation named
+     call_in_true_branch_not_eliminated_on_line_NNN()
+   for each call that's expected to be eliminated.  The dg-final
+   scan-tree-dump-time directive at the bottom of the test verifies
+   that no such call appears in output.  */
+#define ELIM(expr) \
+  if (!(expr)) FAIL (in_true_branch_not_eliminated); else (void)0
+
+/* Macro to emit a call to a function named
+     call_made_in_{true,false}_branch_on_line_NNN()
+   for each call that's expected to be retained.  The dg-final
+   scan-tree-dump-time directive at the bottom of the test verifies
+   that the expected number of both kinds of calls appears in output
+   (a pair for each line with the invocation of the KEEP() macro.  */
+#define KEEP(expr)				\
+  if (expr)					\
+    FAIL (made_in_true_branch);			\
+  else						\
+    FAIL (made_in_false_branch)
+
+#define STR10 "0123456789"
+#define STR20 STR10 STR10
+#define STR30 STR20 STR10
+#define STR40 STR20 STR20
+
+struct Consec
+{
+  char s1[sizeof STR40];
+  char s2[sizeof STR40];
+  const char *p1;
+  const char *p2;
+};
+
+void elim_init_consecutive (void)
+{
+  struct Consec a = { STR10, STR10, STR10, STR10 };
+
+  ELIM (strlen (a.s1) == sizeof STR10 - 1);
+  ELIM (strlen (a.s2) == sizeof STR10 - 1);
+  ELIM (strlen (a.p1) == sizeof STR10 - 1);
+  ELIM (strlen (a.p2) == sizeof STR10 - 1);
+}
+
+
+void elim_array_init_consecutive (void)
+{
+  struct Consec a[2] = {
+    { STR10, STR20, STR30, STR40 },
+    { STR40, STR30, STR20, STR10 }
+  };
+
+  ELIM (strlen (a[0].s1) == sizeof STR10 - 1);
+  ELIM (strlen (a[0].s2) == sizeof STR20 - 1);
+  ELIM (strlen (a[0].p1) == sizeof STR30 - 1);
+  ELIM (strlen (a[0].p2) == sizeof STR40 - 1);
+
+  ELIM (strlen (a[1].s1) == sizeof STR40 - 1);
+  ELIM (strlen (a[1].s2) == sizeof STR30 - 1);
+  ELIM (strlen (a[1].p1) == sizeof STR20 - 1);
+  ELIM (strlen (a[1].p2) == sizeof STR10 - 1);
+}
+
+
+struct NonConsec
+{
+  char s1[sizeof STR40];
+  int i1;
+  char s2[sizeof STR40];
+  int i2;
+  const char *p1;
+  int i3;
+  const char *p2;
+  int i4;
+};
+
+void elim_init_nonconsecutive (void)
+{
+  struct NonConsec b = { STR10, 123, STR20, 456, b.s1, 789, b.s2, 123 };
+
+  ELIM (strlen (b.s1) == sizeof STR10 - 1);
+  ELIM (strlen (b.s2) == sizeof STR20 - 1);
+  ELIM (strlen (b.p1) == sizeof STR10 - 1);
+  ELIM (strlen (b.p2) == sizeof STR20 - 1);
+}
+
+void elim_assign_tmp_nonconsecutive (void)
+{
+  struct NonConsec b = { "a", 1, "b", 2, "c", 3, "d", 4 };
+
+  b = (struct NonConsec){ STR10, 123, STR20, 456, STR30, 789, STR40, 123 };
+
+  ELIM (strlen (b.s1) == sizeof STR10 - 1);
+  ELIM (strlen (b.s2) == sizeof STR20 - 1);
+  ELIM (strlen (b.p1) == sizeof STR30 - 1);
+  ELIM (strlen (b.p2) == sizeof STR40 - 1);
+}
+
+const struct NonConsec bcst = {
+  STR40, -1, STR30, -2, STR20, -3, STR10, -4
+};
+
+void elim_assign_cst_nonconsecutive (void)
+{
+  struct NonConsec b = { "a", 1, "b", 2, "c", 3, "d" };
+
+  b = bcst;
+
+  ELIM (strlen (b.s1) == sizeof STR40 - 1);
+  ELIM (strlen (b.s2) == sizeof STR30 - 1);
+  ELIM (strlen (b.p1) == sizeof STR20 - 1);
+  ELIM (strlen (b.p2) == sizeof STR10 - 1);
+}
+
+void elim_copy_cst_nonconsecutive (void)
+{
+  struct NonConsec b = { "a", 1, "b", 2, "c", 3, "d" };
+  memcpy (&b, &bcst, sizeof b);
+
+  /* ELIM (strlen (b.s1) == sizeof STR40 - 1);
+     ELIM (strlen (b.s2) == sizeof STR30 - 1); */
+  ELIM (strlen (b.p1) == sizeof STR20 - 1);
+  ELIM (strlen (b.p2) == sizeof STR10 - 1);
+}
+
+
+#line 1000
+
+int sink (void*);
+
+void keep_init_nonconsecutive (void)
+{
+  struct NonConsec b = {
+    STR10, 123, STR20, 456, b.s1, 789, b.s2,
+    sink (&b)
+  };
+
+  KEEP (strlen (b.s1) == sizeof STR10 - 1);
+  KEEP (strlen (b.s2) == sizeof STR10 - 1);
+  KEEP (strlen (b.p1) == sizeof STR10 - 1);
+  KEEP (strlen (b.p2) == sizeof STR10 - 1);
+}
+
+void keep_assign_tmp_nonconsecutive (void)
+{
+  struct NonConsec b = { "a", 1, "b", 2, "c", 3, "d", 4 };
+
+  b = (struct NonConsec){
+    STR10, 123, STR20, 456, STR30, 789, STR40,
+    sink (&b)
+  };
+
+  KEEP (strlen (b.s1) == sizeof STR10 - 1);
+  KEEP (strlen (b.s2) == sizeof STR20 - 1);
+  KEEP (strlen (b.p1) == sizeof STR30 - 1);
+  KEEP (strlen (b.p2) == sizeof STR40 - 1);
+}
+
+/* { dg-final { scan-tree-dump-times "call_in_true_branch_not_eliminated_" 0 "optimized" } }
+
+   { dg-final { scan-tree-dump-times "call_made_in_true_branch_on_line_1\[0-9\]\[0-9\]\[0-9\]" 8 "optimized" } }
+   { dg-final { scan-tree-dump-times "call_made_in_false_branch_on_line_1\[0-9\]\[0-9\]\[0-9\]" 8 "optimized" } } */
+
Index: gcc/testsuite/gcc.dg/strlenopt-44.c
===================================================================
--- gcc/testsuite/gcc.dg/strlenopt-44.c	(nonexistent)
+++ gcc/testsuite/gcc.dg/strlenopt-44.c	(working copy)
@@ -0,0 +1,41 @@ 
+/* PR tree-optimization/83821 - local aggregate initialization defeats
+   strlen optimization
+   Verify that a strlen() call is eliminated for a pointer to a region
+   of memory allocated by calloc() even if one or more nul bytes are
+   written into it.
+   { dg-do compile }
+   { dg-options "-O2 -fdump-tree-optimized" } */
+
+unsigned n;
+
+void* elim_strlen_calloc_store_memset_1 (unsigned a, unsigned b)
+{
+  char *p = __builtin_calloc (a, 1);
+
+  p[1] = '\0';
+
+  __builtin_memset (p, 0, b);
+
+  n = __builtin_strlen (p);
+
+  return p;
+}
+
+void* elim_strlen_calloc_store_memset_2 (unsigned a, unsigned b, unsigned c)
+{
+  char *p = __builtin_calloc (a, 1);
+
+  p[1] = '\0';
+  __builtin_memset (p, 0, b);
+
+  n = __builtin_strlen (p);
+
+  p[3] = 0;
+  __builtin_memset (p, 0, c);
+
+  n = __builtin_strlen (p);
+
+  return p;
+}
+
+/* { dg-final { scan-tree-dump-not "strlen" "optimized" } } */
Index: gcc/testsuite/gcc.dg/tree-ssa/calloc-4.c
===================================================================
--- gcc/testsuite/gcc.dg/tree-ssa/calloc-4.c	(nonexistent)
+++ gcc/testsuite/gcc.dg/tree-ssa/calloc-4.c	(working copy)
@@ -0,0 +1,39 @@ 
+/* PR tree-optimization/83821 - local aggregate initialization defeats
+   strlen optimization
+   Verify that a memset() call to zero out a subregion of memory
+   allocated by calloc() is eliminated even if a zero byte is written
+   into it in between the two calls.  See the calloc-2.c test that
+   verifies that the memset() calls isn't eliminated if the written
+   value is non-zero.
+   { dg-do compile }
+   { dg-options "-O2 -fdump-tree-optimized" } */
+
+void* elim_calloc_store_memset_1 (unsigned a, unsigned b)
+{
+  char *p = __builtin_calloc (a, 1);
+
+  p[1] = '\0';
+
+  __builtin_memset (p, 0, b);
+
+  n = __builtin_strlen (p);
+
+  return p;
+}
+
+void* elim_calloc_store_memset_2 (unsigned a, unsigned b, unsigned c)
+{
+  char *p = __builtin_calloc (a, 1);
+
+  p[1] = '\0';
+  __builtin_memset (p, 0, b);
+
+  p[3] = 0;
+  __builtin_memset (p, 0, c);
+
+  return p;
+}
+
+/* { dg-final { scan-tree-dump-not "malloc" "optimized" } }
+   { dg-final { scan-tree-dump-times "calloc" 2 "optimized" } }
+   { dg-final { scan-tree-dump-not "memset" "optimized" } } */
Index: gcc/tree-ssa-strlen.c
===================================================================
--- gcc/tree-ssa-strlen.c	(revision 256590)
+++ gcc/tree-ssa-strlen.c	(working copy)
@@ -693,8 +693,16 @@  maybe_invalidate (gimple *stmt)
 	if (!si->dont_invalidate)
 	  {
 	    ao_ref r;
-	    /* Do not use si->nonzero_chars.  */
-	    ao_ref_init_from_ptr_and_size (&r, si->ptr, NULL_TREE);
+	    tree size = NULL_TREE;
+	    if (si->nonzero_chars)
+	      {
+		/* Include the terminating nul in the size of the string
+		   to consider when determining possible clobber.  */
+		tree type = TREE_TYPE (si->nonzero_chars);
+		size = fold_build2 (PLUS_EXPR, type, si->nonzero_chars,
+				    build_int_cst (type, 1));
+	      }
+	    ao_ref_init_from_ptr_and_size (&r, si->ptr, size);
 	    if (stmt_may_clobber_ref_p_1 (stmt, &r))
 	      {
 		set_strinfo (i, NULL);
@@ -2839,8 +2847,23 @@  handle_char_store (gimple_stmt_iterator *gsi)
 	    si = get_strinfo (idx);
 	  if (offset == 0)
 	    ssaname = TREE_OPERAND (lhs, 0);
-	  else if (si == NULL || compare_nonzero_chars (si, offset) < 0)
-	    return true;
+	  else if (si == NULL)
+            return true;
+	  else if (compare_nonzero_chars (si, offset) < 0)
+	    {
+	      if (si->stmt && !initializer_zerop (rhs))
+		{
+		  /* Reset the calloc statement that created the string
+		     if a non-zero value is being written into it, even
+		     if it's past the leading nul byte, so that subsequent
+		     memset calls aren't eliminated that may clear that
+		     byte.  */
+		  tree fn = gimple_call_fndecl (si->stmt);
+		  if (DECL_FUNCTION_CODE (fn) == BUILT_IN_CALLOC)
+		    si->stmt = NULL;
+		}
+	      return true;
+	    }
 	}
     }
   else