avoid warning on constant strncpy until next statement is reachable (PR 87028)

Message ID a86f07e3-ca84-59f3-c827-adfe6d1ddb0b@gmail.com
State New

Commit Message

Martin Sebor Aug. 24, 2018, 3:58 p.m. UTC
The warning suppression for -Wstringop-truncation looks for
the next statement after a truncating strncpy to see if it
adds a terminating nul.  This only works when the next
statement can be reached using the Gimple statement iterator
which isn't until after gimplification.  As a result, strncpy
calls that truncate their constant argument that are being
folded to memcpy this early get diagnosed even if they are
followed by the nul assignment:

   const char s[] = "12345";
   char d[3];

   void f (void)
   {
     strncpy (d, s, sizeof d - 1);   // -Wstringop-truncation
     d[sizeof d - 1] = 0;
   }
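
For reference, here is a minimal self-contained sketch of the idiom the
suppression is meant to recognize.  The helper name copy_truncating is made
up for illustration and is not part of the patch or its tests:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy at most DST_SIZE - 1 characters of SRC into DST and always
   nul-terminate the result (hypothetical helper, for illustration).  */
static char *
copy_truncating (char *dst, size_t dst_size, const char *src)
{
  assert (dst != NULL && src != NULL && dst_size > 0);
  strncpy (dst, src, dst_size - 1);   /* may truncate SRC */
  dst[dst_size - 1] = '\0';           /* the store the warning looks for */
  return dst;
}
```

Because the nul store immediately follows the call and targets the byte just
past the copied range, the truncation is deliberate and harmless here.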

To avoid the warning I propose to defer folding strncpy to
memcpy until the pointer to the basic block the strncpy call
is in can be used to try to reach the next statement (this
happens as early as ccp1).  I'm aware of the preference to
fold things early but in the case of strncpy (a relatively
rarely used function that is often misused), getting
the warning right while folding a bit later but still fairly
early on seems like a reasonable compromise.  I fear that
otherwise, the false positives will drive users to adopt
other unsafe solutions (like memcpy) where these kinds of
bugs cannot be as readily detected.
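
As background on why the fold is valid in the first place: as far as I can
tell, gimple_fold_builtin_strncpy only folds when the bound does not exceed
the size of the constant source including its nul.  In that case strncpy
writes exactly the bytes memcpy would.  A small sketch of that equivalence
follows; same_as_memcpy and the 32-byte buffers are assumptions made for the
example, not anything in GCC:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Return nonzero if strncpy (dst, SRC, N) and memcpy (dst, SRC, N) leave
   identical bytes in their destinations.  Only meaningful when
   N <= strlen (SRC) + 1, i.e. when strncpy has no zero padding to add.  */
static int
same_as_memcpy (const char *src, size_t n)
{
  char a[32], b[32];
  assert (n <= sizeof a && n <= strlen (src) + 1);
  memset (a, 'x', sizeof a);
  memset (b, 'x', sizeof b);
  strncpy (a, src, n);
  memcpy (b, src, n);
  return memcmp (a, b, sizeof a) == 0;
}
```

When the bound is larger than the source, strncpy pads with zeros and stops
reading at the nul, while memcpy would read past the end of the source, so
the two are no longer interchangeable.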

Tested on x86_64-linux.

Martin

PS There still are outstanding cases where the warning can
be avoided.  I xfailed them in the test for now but will
still try to get them to work for GCC 9.

Comments

Jeff Law Aug. 26, 2018, 5:24 a.m. UTC | #1
On 08/24/2018 09:58 AM, Martin Sebor wrote:
> gcc-87028.diff
> 
> 
> PR tree-optimization/87028 - false positive -Wstringop-truncation strncpy with global variable source string
> gcc/ChangeLog:
> 
> 	PR tree-optimization/87028
> 	* gimple-fold.c (gimple_fold_builtin_strncpy): Avoid folding when
> 	statement doesn't belong to a basic block.
> 	* tree-ssa-strlen.c (maybe_diag_stxncpy_trunc): Handle MEM_REF on
> 	the left hand side of assignment.
> 
> gcc/testsuite/ChangeLog:
> 
> 	PR tree-optimization/87028
> 	* c-c++-common/Wstringop-truncation.c: Remove xfails.
> 	* gcc.dg/Wstringop-truncation-5.c: New test.
> 
> diff --git a/gcc/gimple-fold.c b/gcc/gimple-fold.c
> index 07341eb..284c2fb 100644
> --- a/gcc/gimple-fold.c
> +++ b/gcc/gimple-fold.c
> @@ -1702,6 +1702,11 @@ gimple_fold_builtin_strncpy (gimple_stmt_iterator *gsi,
>    if (tree_int_cst_lt (ssize, len))
>      return false;
>  
> +  /* Defer warning (and folding) until the next statement in the basic
> +     block is reachable.  */
> +  if (!gimple_bb (stmt))
> +    return false;
I think you want cfun->cfg as the test here.  They should be equivalent
in practice.


> diff --git a/gcc/tree-ssa-strlen.c b/gcc/tree-ssa-strlen.c
> index d0792aa..f1988f6 100644
> --- a/gcc/tree-ssa-strlen.c
> +++ b/gcc/tree-ssa-strlen.c
> @@ -1981,6 +1981,23 @@ maybe_diag_stxncpy_trunc (gimple_stmt_iterator gsi, tree src, tree cnt)
>  	  && known_eq (dstoff, lhsoff)
>  	  && operand_equal_p (dstbase, lhsbase, 0))
>  	return false;
> +
> +      if (code == MEM_REF
> +	  && TREE_CODE (lhsbase) == SSA_NAME
> +	  && known_eq (dstoff, lhsoff))
> +	{
> +	  /* Extract the referenced variable from something like
> +	       MEM[(char *)d_3(D) + 3B] = 0;  */
> +	  gimple *def = SSA_NAME_DEF_STMT (lhsbase);
> +	  if (gimple_nop_p (def))
> +	    {
> +	      lhsbase = SSA_NAME_VAR (lhsbase);
> +	      if (lhsbase
> +		  && dstbase
> +		  && operand_equal_p (dstbase, lhsbase, 0))
> +		return false;
> +	    }
> +	}
If you find yourself looking at SSA_NAME_VAR, you're usually barking up
the wrong tree.  It'd be easier to suggest something here if I could see
the gimple (with virtual operands).  But at some level what you really
want to do is make sure the base of the MEM_REF is the same as what got
passed as the destination of the strncpy.  You'd want to be testing
SSA_NAMEs in that case.

Jeff

Richard Biener Aug. 27, 2018, 8:29 a.m. UTC | #2
On Sun, Aug 26, 2018 at 7:26 AM Jeff Law <law@redhat.com> wrote:
>
> On 08/24/2018 09:58 AM, Martin Sebor wrote:
> > gcc-87028.diff
> >
> >
> > PR tree-optimization/87028 - false positive -Wstringop-truncation strncpy with global variable source string
> > gcc/ChangeLog:
> >
> >       PR tree-optimization/87028
> >       * gimple-fold.c (gimple_fold_builtin_strncpy): Avoid folding when
> >       statement doesn't belong to a basic block.
> >       * tree-ssa-strlen.c (maybe_diag_stxncpy_trunc): Handle MEM_REF on
> >       the left hand side of assignment.
> >
> > gcc/testsuite/ChangeLog:
> >
> >       PR tree-optimization/87028
> >       * c-c++-common/Wstringop-truncation.c: Remove xfails.
> >       * gcc.dg/Wstringop-truncation-5.c: New test.
> >
> > diff --git a/gcc/gimple-fold.c b/gcc/gimple-fold.c
> > index 07341eb..284c2fb 100644
> > --- a/gcc/gimple-fold.c
> > +++ b/gcc/gimple-fold.c
> > @@ -1702,6 +1702,11 @@ gimple_fold_builtin_strncpy (gimple_stmt_iterator *gsi,
> >    if (tree_int_cst_lt (ssize, len))
> >      return false;
> >
> > +  /* Defer warning (and folding) until the next statement in the basic
> > +     block is reachable.  */
> > +  if (!gimple_bb (stmt))
> > +    return false;
> I think you want cfun->cfg as the test here.  They should be equivalent
> in practice.

Please do not add 'cfun' references.  Note that the next stmt is also accessible
when there is no CFG.  I guess the issue is that we fold this during
gimplification where the next stmt is not yet "there" (but still in GENERIC)?

We generally do not want to have unfolded stmts in the IL when we can avoid that
which is why we fold most stmts during gimplification.  We also do that because
we now do less folding on GENERIC.

It may be possible to refactor gimplification-time folding to work the way it
does during inlining: queue the stmts we want to fold and do all the folding
afterwards.  This of course means longer compile time due to cache effects.
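
To make the ordering problem concrete, here is a toy model (nothing in it is
GCC code; the statement kinds, the fixed-size sequence and both counting
functions are assumptions for illustration).  It contrasts inspecting a
strncpy call as soon as it is appended with inspecting it after the whole
sequence has been built:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy statement kinds; illustrative only, not GCC's gimple codes.  */
enum stmt_kind { STRNCPY_CALL, NUL_STORE, OTHER_STMT };

#define MAX_STMTS 16

struct stmt_seq
{
  enum stmt_kind stmts[MAX_STMTS];
  size_t len;
};

/* True if the statement after index I exists and stores a nul.  */
static bool
next_is_nul_store (const struct stmt_seq *seq, size_t i)
{
  return i + 1 < seq->len && seq->stmts[i + 1] == NUL_STORE;
}

/* Inspect each strncpy as soon as it is appended, before its successor
   exists.  Returns how many calls were seen followed by a nul store.  */
static size_t
count_terminated_eager (const enum stmt_kind *in, size_t n)
{
  struct stmt_seq seq = { .len = 0 };
  size_t found = 0;
  assert (n <= MAX_STMTS);
  for (size_t i = 0; i < n; i++)
    {
      seq.stmts[seq.len++] = in[i];
      if (in[i] == STRNCPY_CALL && next_is_nul_store (&seq, seq.len - 1))
        found++;
    }
  return found;
}

/* Append everything first, then inspect: the successor is now visible.  */
static size_t
count_terminated_deferred (const enum stmt_kind *in, size_t n)
{
  struct stmt_seq seq = { .len = 0 };
  size_t found = 0;
  assert (n <= MAX_STMTS);
  for (size_t i = 0; i < n; i++)
    seq.stmts[seq.len++] = in[i];
  for (size_t i = 0; i < seq.len; i++)
    if (seq.stmts[i] == STRNCPY_CALL && next_is_nul_store (&seq, i))
      found++;
  return found;
}
```

For a strncpy call directly followed by a nul store, the eager variant never
sees the store while the deferred one does, at the price of a second walk
over the statements.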

>
> > diff --git a/gcc/tree-ssa-strlen.c b/gcc/tree-ssa-strlen.c
> > index d0792aa..f1988f6 100644
> > --- a/gcc/tree-ssa-strlen.c
> > +++ b/gcc/tree-ssa-strlen.c
> > @@ -1981,6 +1981,23 @@ maybe_diag_stxncpy_trunc (gimple_stmt_iterator gsi, tree src, tree cnt)
> >         && known_eq (dstoff, lhsoff)
> >         && operand_equal_p (dstbase, lhsbase, 0))
> >       return false;
> > +
> > +      if (code == MEM_REF
> > +       && TREE_CODE (lhsbase) == SSA_NAME
> > +       && known_eq (dstoff, lhsoff))
> > +     {
> > +       /* Extract the referenced variable from something like
> > +            MEM[(char *)d_3(D) + 3B] = 0;  */
> > +       gimple *def = SSA_NAME_DEF_STMT (lhsbase);
> > +       if (gimple_nop_p (def))
> > +         {
> > +           lhsbase = SSA_NAME_VAR (lhsbase);
> > +           if (lhsbase
> > +               && dstbase
> > +               && operand_equal_p (dstbase, lhsbase, 0))
> > +             return false;
> > +         }
> > +     }
> If you find yourself looking at SSA_NAME_VAR, you're usually barking up
> the wrong tree.  It'd be easier to suggest something here if I could see
> the gimple (with virtual operands).  BUt at some level what you really
> want to do is make sure the base of the MEM_REF is the same as what got
> passed as the destination of the strncpy.  You'd want to be testing
> SSA_NAMEs in that case.

Yes.  Why not simply compare the SSA names?  Why would it be
not OK to do that when !lhsbase?
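
A toy model of what comparing the names (rather than the underlying
variable) buys.  The types below are hypothetical stand-ins, not GCC's tree
representation; in GCC itself I would expect a direct comparison of the two
SSA_NAME trees to serve the same purpose:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for an SSA name: underlying variable plus
   version, so d_3 is { var = d, version = 3 }.  */
struct toy_ssa_name { int var; int version; };

/* Hypothetical stand-in for MEM[base + offset].  */
struct toy_mem_ref { struct toy_ssa_name base; long offset; };

static bool
same_ssa_name (struct toy_ssa_name a, struct toy_ssa_name b)
{
  return a.var == b.var && a.version == b.version;
}

/* True if a store through REF writes the byte at DST + OFF.  */
static bool
stores_at_dest_offset (struct toy_mem_ref ref, struct toy_ssa_name dst,
                       long off)
{
  assert (off >= 0);
  return same_ssa_name (ref.base, dst) && ref.offset == off;
}
```

Two versions of the same variable (say d_3 and d_4) can hold different
pointers, so matching on the variable alone could suppress the warning for a
store that does not actually terminate the copied string.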

Richard.

>
> Jeff
>
> Jeff
Jeff Law Aug. 27, 2018, 3:32 p.m. UTC | #3
On 08/27/2018 02:29 AM, Richard Biener wrote:
> On Sun, Aug 26, 2018 at 7:26 AM Jeff Law <law@redhat.com> wrote:
>>
>> On 08/24/2018 09:58 AM, Martin Sebor wrote:
>>> gcc-87028.diff
>>>
>>>
>>> PR tree-optimization/87028 - false positive -Wstringop-truncation strncpy with global variable source string
>>> gcc/ChangeLog:
>>>
>>>       PR tree-optimization/87028
>>>       * gimple-fold.c (gimple_fold_builtin_strncpy): Avoid folding when
>>>       statement doesn't belong to a basic block.
>>>       * tree-ssa-strlen.c (maybe_diag_stxncpy_trunc): Handle MEM_REF on
>>>       the left hand side of assignment.
>>>
>>> gcc/testsuite/ChangeLog:
>>>
>>>       PR tree-optimization/87028
>>>       * c-c++-common/Wstringop-truncation.c: Remove xfails.
>>>       * gcc.dg/Wstringop-truncation-5.c: New test.
>>>
>>> diff --git a/gcc/gimple-fold.c b/gcc/gimple-fold.c
>>> index 07341eb..284c2fb 100644
>>> --- a/gcc/gimple-fold.c
>>> +++ b/gcc/gimple-fold.c
>>> @@ -1702,6 +1702,11 @@ gimple_fold_builtin_strncpy (gimple_stmt_iterator *gsi,
>>>    if (tree_int_cst_lt (ssize, len))
>>>      return false;
>>>
>>> +  /* Defer warning (and folding) until the next statement in the basic
>>> +     block is reachable.  */
>>> +  if (!gimple_bb (stmt))
>>> +    return false;
>> I think you want cfun->cfg as the test here.  They should be equivalent
>> in practice.
> 
> Please do not add 'cfun' references.  Note that the next stmt is also accessible
> when there is no CFG.  I guess the issue is that we fold this during
> gimplification where the next stmt is not yet "there" (but still in GENERIC)?
That was my assumption.  I almost suggested peeking at gsi_next and
avoiding in that case.

> 
> We generally do not want to have unfolded stmts in the IL when we can avoid that
> which is why we fold most stmts during gimplification.  We also do that because
> we now do less folding on GENERIC.
But an unfolded call in the IL should always be safe and we've got
plenty of opportunities to fold it later.

Jeff
Richard Biener Aug. 27, 2018, 3:42 p.m. UTC | #4
On Mon, Aug 27, 2018 at 5:32 PM Jeff Law <law@redhat.com> wrote:
>
> On 08/27/2018 02:29 AM, Richard Biener wrote:
> > On Sun, Aug 26, 2018 at 7:26 AM Jeff Law <law@redhat.com> wrote:
> >>
> >> On 08/24/2018 09:58 AM, Martin Sebor wrote:
> >>> gcc-87028.diff
> >>>
> >>>
> >>> PR tree-optimization/87028 - false positive -Wstringop-truncation strncpy with global variable source string
> >>> gcc/ChangeLog:
> >>>
> >>>       PR tree-optimization/87028
> >>>       * gimple-fold.c (gimple_fold_builtin_strncpy): Avoid folding when
> >>>       statement doesn't belong to a basic block.
> >>>       * tree-ssa-strlen.c (maybe_diag_stxncpy_trunc): Handle MEM_REF on
> >>>       the left hand side of assignment.
> >>>
> >>> gcc/testsuite/ChangeLog:
> >>>
> >>>       PR tree-optimization/87028
> >>>       * c-c++-common/Wstringop-truncation.c: Remove xfails.
> >>>       * gcc.dg/Wstringop-truncation-5.c: New test.
> >>>
> >>> diff --git a/gcc/gimple-fold.c b/gcc/gimple-fold.c
> >>> index 07341eb..284c2fb 100644
> >>> --- a/gcc/gimple-fold.c
> >>> +++ b/gcc/gimple-fold.c
> >>> @@ -1702,6 +1702,11 @@ gimple_fold_builtin_strncpy (gimple_stmt_iterator *gsi,
> >>>    if (tree_int_cst_lt (ssize, len))
> >>>      return false;
> >>>
> >>> +  /* Defer warning (and folding) until the next statement in the basic
> >>> +     block is reachable.  */
> >>> +  if (!gimple_bb (stmt))
> >>> +    return false;
> >> I think you want cfun->cfg as the test here.  They should be equivalent
> >> in practice.
> >
> > Please do not add 'cfun' references.  Note that the next stmt is also accessible
> > when there is no CFG.  I guess the issue is that we fold this during
> > gimplification where the next stmt is not yet "there" (but still in GENERIC)?
> That was my assumption.  I almost suggested peeking at gsi_next and
> avoiding in that case.

So I'd rather add guards to maybe_fold_stmt in the gimplifier then.

> >
> > We generally do not want to have unfolded stmts in the IL when we can avoid that
> > which is why we fold most stmts during gimplification.  We also do that because
> > we now do less folding on GENERIC.
> But an unfolded call in the IL should always be safe and we've got
> plenty of opportunities to fold it later.

Well - we do.  The very first one is forwprop though, which means we'll fail to
rewrite some memcpy parts into SSA:

          NEXT_PASS (pass_ccp, false /* nonzero_p */);
          /* After CCP we rewrite no longer addressed locals into SSA
             form if possible.  */
          NEXT_PASS (pass_forwprop);

likewise early object-size will be confused by memcpy calls that just exist
to avoid TBAA issues (another of our recommendations besides using unions).

We do fold mem* early for a reason ;)

"We can always do warnings earlier" would be a similar true sentence.

Both come at a cost.  You know I'm usually declaring GCC to be an optimizing
compiler and not a static analysis engine ;)  So I'm not too convinced when I
see folding disabled or delayed here and there to catch some false negatives
for -Wxyz.

We need to work out a plan rather than throwing sticks here and there.

Richard.

>
> Jeff
Martin Sebor Aug. 27, 2018, 4:27 p.m. UTC | #5
On 08/27/2018 02:29 AM, Richard Biener wrote:
> On Sun, Aug 26, 2018 at 7:26 AM Jeff Law <law@redhat.com> wrote:
>>
>> On 08/24/2018 09:58 AM, Martin Sebor wrote:
>>> gcc-87028.diff
>>>
>>>
>>> PR tree-optimization/87028 - false positive -Wstringop-truncation strncpy with global variable source string
>>> gcc/ChangeLog:
>>>
>>>       PR tree-optimization/87028
>>>       * gimple-fold.c (gimple_fold_builtin_strncpy): Avoid folding when
>>>       statement doesn't belong to a basic block.
>>>       * tree-ssa-strlen.c (maybe_diag_stxncpy_trunc): Handle MEM_REF on
>>>       the left hand side of assignment.
>>>
>>> gcc/testsuite/ChangeLog:
>>>
>>>       PR tree-optimization/87028
>>>       * c-c++-common/Wstringop-truncation.c: Remove xfails.
>>>       * gcc.dg/Wstringop-truncation-5.c: New test.
>>>
>>> diff --git a/gcc/gimple-fold.c b/gcc/gimple-fold.c
>>> index 07341eb..284c2fb 100644
>>> --- a/gcc/gimple-fold.c
>>> +++ b/gcc/gimple-fold.c
>>> @@ -1702,6 +1702,11 @@ gimple_fold_builtin_strncpy (gimple_stmt_iterator *gsi,
>>>    if (tree_int_cst_lt (ssize, len))
>>>      return false;
>>>
>>> +  /* Defer warning (and folding) until the next statement in the basic
>>> +     block is reachable.  */
>>> +  if (!gimple_bb (stmt))
>>> +    return false;
>> I think you want cfun->cfg as the test here.  They should be equivalent
>> in practice.
>
> Please do not add 'cfun' references.  Note that the next stmt is also accessible
> when there is no CFG.  I guess the issue is that we fold this during
> gimplification
> where the next stmt is not yet "there" (but still in GENERIC)?
>
> We generally do not want to have unfolded stmts in the IL when we can avoid that
> which is why we fold most stmts during gimplification.  We also do that because
> we now do less folding on GENERIC.
>
> There may be the possibility to refactor gimplification time folding to what we
> do during inlining - queue stmts we want to fold and perform all
> folding delayed.
> This of course means bigger compile-time due to cache effects.
>
>>
>>> diff --git a/gcc/tree-ssa-strlen.c b/gcc/tree-ssa-strlen.c
>>> index d0792aa..f1988f6 100644
>>> --- a/gcc/tree-ssa-strlen.c
>>> +++ b/gcc/tree-ssa-strlen.c
>>> @@ -1981,6 +1981,23 @@ maybe_diag_stxncpy_trunc (gimple_stmt_iterator gsi, tree src, tree cnt)
>>>         && known_eq (dstoff, lhsoff)
>>>         && operand_equal_p (dstbase, lhsbase, 0))
>>>       return false;
>>> +
>>> +      if (code == MEM_REF
>>> +       && TREE_CODE (lhsbase) == SSA_NAME
>>> +       && known_eq (dstoff, lhsoff))
>>> +     {
>>> +       /* Extract the referenced variable from something like
>>> +            MEM[(char *)d_3(D) + 3B] = 0;  */
>>> +       gimple *def = SSA_NAME_DEF_STMT (lhsbase);
>>> +       if (gimple_nop_p (def))
>>> +         {
>>> +           lhsbase = SSA_NAME_VAR (lhsbase);
>>> +           if (lhsbase
>>> +               && dstbase
>>> +               && operand_equal_p (dstbase, lhsbase, 0))
>>> +             return false;
>>> +         }
>>> +     }
>> If you find yourself looking at SSA_NAME_VAR, you're usually barking up
>> the wrong tree.  It'd be easier to suggest something here if I could see
>> the gimple (with virtual operands).  BUt at some level what you really
>> want to do is make sure the base of the MEM_REF is the same as what got
>> passed as the destination of the strncpy.  You'd want to be testing
>> SSA_NAMEs in that case.
>
> Yes.  Why not simply compare the SSA names?  Why would it be
> not OK to do that when !lhsbase?

The added code handles this case:

   void f (char *d)
   {
     __builtin_strncpy (d, "12345", 4);
     d[3] = 0;
   }

where during forwprop we see:

   __builtin_strncpy (d_3(D), "12345", 4);
   MEM[(char *)d_3(D) + 3B] = 0;

The next statement after the strncpy is the assignment whose
lhs is the MEM_REF with a GIMPLE_NOP as an operand.  There
is no other information in the GIMPLE_NOP that I can see to
tell that the operand is d_3(D) or that it's the same as
the strncpy argument (i.e., the PARAM_DECL d).  Having to
open-code this all the time seems so cumbersome -- is
there some API that would do this for me?  (I thought
get_addr_base_and_unit_offset was that API but clearly in
this case it doesn't do what I expect -- it just returns
the argument.)

Martin
Martin Sebor Aug. 27, 2018, 8:30 p.m. UTC | #6
On 08/25/2018 11:24 PM, Jeff Law wrote:
> On 08/24/2018 09:58 AM, Martin Sebor wrote:
>> gcc-87028.diff
>>
>>
>> PR tree-optimization/87028 - false positive -Wstringop-truncation strncpy with global variable source string
>> gcc/ChangeLog:
>>
>> 	PR tree-optimization/87028
>> 	* gimple-fold.c (gimple_fold_builtin_strncpy): Avoid folding when
>> 	statement doesn't belong to a basic block.
>> 	* tree-ssa-strlen.c (maybe_diag_stxncpy_trunc): Handle MEM_REF on
>> 	the left hand side of assignment.
>>
>> gcc/testsuite/ChangeLog:
>>
>> 	PR tree-optimization/87028
>> 	* c-c++-common/Wstringop-truncation.c: Remove xfails.
>> 	* gcc.dg/Wstringop-truncation-5.c: New test.
>>
>> diff --git a/gcc/gimple-fold.c b/gcc/gimple-fold.c
>> index 07341eb..284c2fb 100644
>> --- a/gcc/gimple-fold.c
>> +++ b/gcc/gimple-fold.c
>> @@ -1702,6 +1702,11 @@ gimple_fold_builtin_strncpy (gimple_stmt_iterator *gsi,
>>    if (tree_int_cst_lt (ssize, len))
>>      return false;
>>
>> +  /* Defer warning (and folding) until the next statement in the basic
>> +     block is reachable.  */
>> +  if (!gimple_bb (stmt))
>> +    return false;
> I think you want cfun->cfg as the test here.  They should be equivalent
> in practice.
>
>
>> diff --git a/gcc/tree-ssa-strlen.c b/gcc/tree-ssa-strlen.c
>> index d0792aa..f1988f6 100644
>> --- a/gcc/tree-ssa-strlen.c
>> +++ b/gcc/tree-ssa-strlen.c
>> @@ -1981,6 +1981,23 @@ maybe_diag_stxncpy_trunc (gimple_stmt_iterator gsi, tree src, tree cnt)
>>  	  && known_eq (dstoff, lhsoff)
>>  	  && operand_equal_p (dstbase, lhsbase, 0))
>>  	return false;
>> +
>> +      if (code == MEM_REF
>> +	  && TREE_CODE (lhsbase) == SSA_NAME
>> +	  && known_eq (dstoff, lhsoff))
>> +	{
>> +	  /* Extract the referenced variable from something like
>> +	       MEM[(char *)d_3(D) + 3B] = 0;  */
>> +	  gimple *def = SSA_NAME_DEF_STMT (lhsbase);
>> +	  if (gimple_nop_p (def))
>> +	    {
>> +	      lhsbase = SSA_NAME_VAR (lhsbase);
>> +	      if (lhsbase
>> +		  && dstbase
>> +		  && operand_equal_p (dstbase, lhsbase, 0))
>> +		return false;
>> +	    }
>> +	}
> If you find yourself looking at SSA_NAME_VAR, you're usually barking up
> the wrong tree.  It'd be easier to suggest something here if I could see
> the gimple (with virtual operands).  BUt at some level what you really
> want to do is make sure the base of the MEM_REF is the same as what got
> passed as the destination of the strncpy.  You'd want to be testing
> SSA_NAMEs in that case.

I replied to Richard with the code that this hunk handles:

   https://gcc.gnu.org/ml/gcc-patches/2018-08/msg01697.html

I couldn't find any other way to determine that d_3(D) in

   MEM[(char *)d_3(D) + 3B] = 0;

is the same as the first argument in:

   __builtin_strncpy (d_3(D), "12345", 4);

The MEM_REF operand is an SSA_NAME whose DEF_STMT is
a GIMPLE_NOP and whose SSA_NAME_VAR is the PARAM_DECL d.
Where else can I get the variable from?

Martin
Jeff Law Aug. 28, 2018, 4:27 a.m. UTC | #7
On 08/27/2018 10:27 AM, Martin Sebor wrote:
> On 08/27/2018 02:29 AM, Richard Biener wrote:
>> On Sun, Aug 26, 2018 at 7:26 AM Jeff Law <law@redhat.com> wrote:
>>>
>>> On 08/24/2018 09:58 AM, Martin Sebor wrote:
>>>> gcc-87028.diff
>>>>
>>>>
>>>> PR tree-optimization/87028 - false positive -Wstringop-truncation
>>>> strncpy with global variable source string
>>>> gcc/ChangeLog:
>>>>
>>>>       PR tree-optimization/87028
>>>>       * gimple-fold.c (gimple_fold_builtin_strncpy): Avoid folding when
>>>>       statement doesn't belong to a basic block.
>>>>       * tree-ssa-strlen.c (maybe_diag_stxncpy_trunc): Handle MEM_REF on
>>>>       the left hand side of assignment.
>>>>
>>>> gcc/testsuite/ChangeLog:
>>>>
>>>>       PR tree-optimization/87028
>>>>       * c-c++-common/Wstringop-truncation.c: Remove xfails.
>>>>       * gcc.dg/Wstringop-truncation-5.c: New test.
>>>>
>>>> diff --git a/gcc/gimple-fold.c b/gcc/gimple-fold.c
>>>> index 07341eb..284c2fb 100644
>>>> --- a/gcc/gimple-fold.c
>>>> +++ b/gcc/gimple-fold.c
>>>> @@ -1702,6 +1702,11 @@ gimple_fold_builtin_strncpy
>>>> (gimple_stmt_iterator *gsi,
>>>>    if (tree_int_cst_lt (ssize, len))
>>>>      return false;
>>>>
>>>> +  /* Defer warning (and folding) until the next statement in the basic
>>>> +     block is reachable.  */
>>>> +  if (!gimple_bb (stmt))
>>>> +    return false;
>>> I think you want cfun->cfg as the test here.  They should be equivalent
>>> in practice.
>>
>> Please do not add 'cfun' references.  Note that the next stmt is also
>> accessible
>> when there is no CFG.  I guess the issue is that we fold this during
>> gimplification
>> where the next stmt is not yet "there" (but still in GENERIC)?
>>
>> We generally do not want to have unfolded stmts in the IL when we can
>> avoid that
>> which is why we fold most stmts during gimplification.  We also do
>> that because
>> we now do less folding on GENERIC.
>>
>> There may be the possibility to refactor gimplification time folding
>> to what we
>> do during inlining - queue stmts we want to fold and perform all
>> folding delayed.
>> This of course means bigger compile-time due to cache effects.
>>
>>>
>>>> diff --git a/gcc/tree-ssa-strlen.c b/gcc/tree-ssa-strlen.c
>>>> index d0792aa..f1988f6 100644
>>>> --- a/gcc/tree-ssa-strlen.c
>>>> +++ b/gcc/tree-ssa-strlen.c
>>>> @@ -1981,6 +1981,23 @@ maybe_diag_stxncpy_trunc
>>>> (gimple_stmt_iterator gsi, tree src, tree cnt)
>>>>         && known_eq (dstoff, lhsoff)
>>>>         && operand_equal_p (dstbase, lhsbase, 0))
>>>>       return false;
>>>> +
>>>> +      if (code == MEM_REF
>>>> +       && TREE_CODE (lhsbase) == SSA_NAME
>>>> +       && known_eq (dstoff, lhsoff))
>>>> +     {
>>>> +       /* Extract the referenced variable from something like
>>>> +            MEM[(char *)d_3(D) + 3B] = 0;  */
>>>> +       gimple *def = SSA_NAME_DEF_STMT (lhsbase);
>>>> +       if (gimple_nop_p (def))
>>>> +         {
>>>> +           lhsbase = SSA_NAME_VAR (lhsbase);
>>>> +           if (lhsbase
>>>> +               && dstbase
>>>> +               && operand_equal_p (dstbase, lhsbase, 0))
>>>> +             return false;
>>>> +         }
>>>> +     }
>>> If you find yourself looking at SSA_NAME_VAR, you're usually barking up
>>> the wrong tree.  It'd be easier to suggest something here if I could see
>>> the gimple (with virtual operands).  BUt at some level what you really
>>> want to do is make sure the base of the MEM_REF is the same as what got
>>> passed as the destination of the strncpy.  You'd want to be testing
>>> SSA_NAMEs in that case.
>>
>> Yes.  Why not simply compare the SSA names?  Why would it be
>> not OK to do that when !lhsbase?
> 
> The added code handles this case:
> 
>   void f (char *d)
>   {
>     __builtin_strncpy (d, "12345", 4);
>     d[3] = 0;
>   }
> 
> where during forwprop we see:
> 
>   __builtin_strncpy (d_3(D), "12345", 4);
>   MEM[(char *)d_3(D) + 3B] = 0;
> 
> The next statement after the strncpy is the assignment whose
> lhs is the MEM_REF with a GIMPLE_NOP as an operand.  There
> is no other information in the GIMPLE_NOP that I can see to
> tell that the operand is d_3(D) or that it's the same as
> the strncpy argument (i.e., the PARAM_DECl d).  Having to
> do open-code this all the time seems so cumbersome -- is
> there some API that would do this for me?  (I thought
> get_addr_base_and_unit_offset was that API but clearly in
> this case it doesn't do what I expect -- it just returns
> the argument.)

I think you need to look harder at that MEM_REF.  It references d_3.
That's what you need to be checking.  The base (d_3) is the first
operand of the MEM_REF, the offset is the second operand of the MEM_REF.

(gdb) p debug_gimple_stmt ($2)
# .MEM_5 = VDEF <.MEM_4>
MEM[(char *)d_3(D) + 3B] = 0;


(gdb) p gimple_assign_lhs ($2)
$5 = (tree_node *) 0x7ffff01a6208

(gdb) p debug_tree ($5)
 <mem_ref 0x7ffff01a6208
    type <integer_type 0x7ffff00723f0 char public string-flag QI
        size <integer_cst 0x7ffff0059d80 constant 8>
        unit-size <integer_cst 0x7ffff0059d98 constant 1>
        align:8 warn_if_not_align:0 symtab:0 alias-set -1 canonical-type
0x7ffff00723f0 precision:8 min <integer_cst 0x7ffff0059dc8 -128> max
<integer_cst 0x7ffff0059df8 127>
        pointer_to_this <pointer_type 0x7ffff007de70>>

    arg:0 <ssa_name 0x7ffff0063dc8
        type <pointer_type 0x7ffff007de70 type <integer_type
0x7ffff00723f0 char>
            public unsigned DI
            size <integer_cst 0x7ffff0059c90 constant 64>
            unit-size <integer_cst 0x7ffff0059ca8 constant 8>
            align:64 warn_if_not_align:0 symtab:0 alias-set -1
canonical-type 0x7ffff007de70 reference_to_this <reference_type
0x7ffff017d738>>
        visited var <parm_decl 0x7ffff01a5000 d>
        def_stmt GIMPLE_NOP
        version:3>
    arg:1 <integer_cst 0x7ffff018ae40 type <pointer_type 0x7ffff007de70>
constant 3>
    j.c:4:6 start: j.c:4:5 finish: j.c:4:8>


Note arg:0 is the SSA_NAME d_3.  And, not surprisingly, that's lhsbase:

(gdb) p debug_tree (lhsbase)
<ssa_name 0x7ffff0063dc8
    type <pointer_type 0x7ffff007de70
        type <integer_type 0x7ffff00723f0 char public string-flag QI
            size <integer_cst 0x7ffff0059d80 constant 8>
            unit-size <integer_cst 0x7ffff0059d98 constant 1>
            align:8 warn_if_not_align:0 symtab:0 alias-set -1
canonical-type 0x7ffff00723f0 precision:8 min <integer_cst
0x7ffff0059dc8 -128> max <integer_cst 0x7ffff0059df8 127>
            pointer_to_this <pointer_type 0x7ffff007de70>>
        public unsigned DI
        size <integer_cst 0x7ffff0059c90 constant 64>
        unit-size <integer_cst 0x7ffff0059ca8 constant 8>
        align:64 warn_if_not_align:0 symtab:0 alias-set -1
canonical-type 0x7ffff007de70 reference_to_this <reference_type
0x7ffff017d738>>
    visited var <parm_decl 0x7ffff01a5000 d>
    def_stmt GIMPLE_NOP
    version:3>


Sadly, dstbase is the PARM_DECL for d.  That's where things are going
"wrong".  Not sure why you're getting the PARM_DECL in that case.  I'd
debug get_addr_base_and_unit_offset to understand what's going on.
Essentially you're getting different results from
get_addr_base_and_unit_offset in a case where they arguably should be
the same.

Jeff
Richard Biener Aug. 28, 2018, 9:55 a.m. UTC | #8
On Tue, Aug 28, 2018 at 6:27 AM Jeff Law <law@redhat.com> wrote:
>
> On 08/27/2018 10:27 AM, Martin Sebor wrote:
> > On 08/27/2018 02:29 AM, Richard Biener wrote:
> >> On Sun, Aug 26, 2018 at 7:26 AM Jeff Law <law@redhat.com> wrote:
> >>>
> >>> On 08/24/2018 09:58 AM, Martin Sebor wrote:
> >>>> The warning suppression for -Wstringop-truncation looks for
> >>>> the next statement after a truncating strncpy to see if it
> >>>> adds a terminating nul.  This only works when the next
> >>>> statement can be reached using the Gimple statement iterator
> >>>> which isn't until after gimplification.  As a result, strncpy
> >>>> calls that truncate their constant argument that are being
> >>>> folded to memcpy this early get diagnosed even if they are
> >>>> followed by the nul assignment:
> >>>>
> >>>>   const char s[] = "12345";
> >>>>   char d[3];
> >>>>
> >>>>   void f (void)
> >>>>   {
> >>>>     strncpy (d, s, sizeof d - 1);   // -Wstringop-truncation
> >>>>     d[sizeof d - 1] = 0;
> >>>>   }
> >>>>
> >>>> To avoid the warning I propose to defer folding strncpy to
> >>>> memcpy until the pointer to the basic block the strnpy call
> >>>> is in can be used to try to reach the next statement (this
> >>>> happens as early as ccp1).  I'm aware of the preference to
> >>>> fold things early but in the case of strncpy (a relatively
> >>>> rarely used function that is often misused), getting
> >>>> the warning right while folding a bit later but still fairly
> >>>> early on seems like a reasonable compromise.  I fear that
> >>>> otherwise, the false positives will drive users to adopt
> >>>> other unsafe solutions (like memcpy) where these kinds of
> >>>> bugs cannot be as readily detected.
> >>>>
> >>>> Tested on x86_64-linux.
> >>>>
> >>>> Martin
> >>>>
> >>>> PS There still are outstanding cases where the warning can
> >>>> be avoided.  I xfailed them in the test for now but will
> >>>> still try to get them to work for GCC 9.
> >>>>
> >>>> gcc-87028.diff
> >>>>
> >>>>
> >>>> PR tree-optimization/87028 - false positive -Wstringop-truncation
> >>>> strncpy with global variable source string
> >>>> gcc/ChangeLog:
> >>>>
> >>>>       PR tree-optimization/87028
> >>>>       * gimple-fold.c (gimple_fold_builtin_strncpy): Avoid folding when
> >>>>       statement doesn't belong to a basic block.
> >>>>       * tree-ssa-strlen.c (maybe_diag_stxncpy_trunc): Handle MEM_REF on
> >>>>       the left hand side of assignment.
> >>>>
> >>>> gcc/testsuite/ChangeLog:
> >>>>
> >>>>       PR tree-optimization/87028
> >>>>       * c-c++-common/Wstringop-truncation.c: Remove xfails.
> >>>>       * gcc.dg/Wstringop-truncation-5.c: New test.
> >>>>
> >>>> diff --git a/gcc/gimple-fold.c b/gcc/gimple-fold.c
> >>>> index 07341eb..284c2fb 100644
> >>>> --- a/gcc/gimple-fold.c
> >>>> +++ b/gcc/gimple-fold.c
> >>>> @@ -1702,6 +1702,11 @@ gimple_fold_builtin_strncpy
> >>>> (gimple_stmt_iterator *gsi,
> >>>>    if (tree_int_cst_lt (ssize, len))
> >>>>      return false;
> >>>>
> >>>> +  /* Defer warning (and folding) until the next statement in the basic
> >>>> +     block is reachable.  */
> >>>> +  if (!gimple_bb (stmt))
> >>>> +    return false;
> >>> I think you want cfun->cfg as the test here.  They should be equivalent
> >>> in practice.
> >>
> >> Please do not add 'cfun' references.  Note that the next stmt is also
> >> accessible
> >> when there is no CFG.  I guess the issue is that we fold this during
> >> gimplification
> >> where the next stmt is not yet "there" (but still in GENERIC)?
> >>
> >> We generally do not want to have unfolded stmts in the IL when we can
> >> avoid that
> >> which is why we fold most stmts during gimplification.  We also do
> >> that because
> >> we now do less folding on GENERIC.
> >>
> >> There may be the possibility to refactor gimplification time folding
> >> to what we
> >> do during inlining - queue stmts we want to fold and perform all
> >> folding delayed.
> >> This of course means bigger compile-time due to cache effects.
> >>
> >>>
> >>>> diff --git a/gcc/tree-ssa-strlen.c b/gcc/tree-ssa-strlen.c
> >>>> index d0792aa..f1988f6 100644
> >>>> --- a/gcc/tree-ssa-strlen.c
> >>>> +++ b/gcc/tree-ssa-strlen.c
> >>>> @@ -1981,6 +1981,23 @@ maybe_diag_stxncpy_trunc
> >>>> (gimple_stmt_iterator gsi, tree src, tree cnt)
> >>>>         && known_eq (dstoff, lhsoff)
> >>>>         && operand_equal_p (dstbase, lhsbase, 0))
> >>>>       return false;
> >>>> +
> >>>> +      if (code == MEM_REF
> >>>> +       && TREE_CODE (lhsbase) == SSA_NAME
> >>>> +       && known_eq (dstoff, lhsoff))
> >>>> +     {
> >>>> +       /* Extract the referenced variable from something like
> >>>> +            MEM[(char *)d_3(D) + 3B] = 0;  */
> >>>> +       gimple *def = SSA_NAME_DEF_STMT (lhsbase);
> >>>> +       if (gimple_nop_p (def))
> >>>> +         {
> >>>> +           lhsbase = SSA_NAME_VAR (lhsbase);
> >>>> +           if (lhsbase
> >>>> +               && dstbase
> >>>> +               && operand_equal_p (dstbase, lhsbase, 0))
> >>>> +             return false;
> >>>> +         }
> >>>> +     }
> >>> If you find yourself looking at SSA_NAME_VAR, you're usually barking up
> >>> the wrong tree.  It'd be easier to suggest something here if I could see
> >>> the gimple (with virtual operands).  BUt at some level what you really
> >>> want to do is make sure the base of the MEM_REF is the same as what got
> >>> passed as the destination of the strncpy.  You'd want to be testing
> >>> SSA_NAMEs in that case.
> >>
> >> Yes.  Why not simply compare the SSA names?  Why would it be
> >> not OK to do that when !lhsbase?
> >
> > The added code handles this case:
> >
> >   void f (char *d)
> >   {
> >     __builtin_strncpy (d, "12345", 4);
> >     d[3] = 0;
> >   }
> >
> > where during forwprop we see:
> >
> >   __builtin_strncpy (d_3(D), "12345", 4);
> >   MEM[(char *)d_3(D) + 3B] = 0;
> >
> > The next statement after the strncpy is the assignment whose
> > lhs is the MEM_REF with a GIMPLE_NOP as an operand.  There
> > is no other information in the GIMPLE_NOP that I can see to
> > tell that the operand is d_3(D) or that it's the same as
> > the strncpy argument (i.e., the PARAM_DECl d).  Having to
> > do open-code this all the time seems so cumbersome -- is
> > there some API that would do this for me?  (I thought
> > get_addr_base_and_unit_offset was that API but clearly in
> > this case it doesn't do what I expect -- it just returns
> > the argument.)
>
> I think you need to look harder at that MEM_REF.  It references d_3.
> That's what you need to be checking.  The base (d_3) is the first
> operand of the MEM_REF, the offset is the second operand of the MEM_REF.
>
> (gdb) p debug_gimple_stmt ($2)
> # .MEM_5 = VDEF <.MEM_4>
> MEM[(char *)d_3(D) + 3B] = 0;
>
>
> (gdb) p gimple_assign_lhs ($2)
> $5 = (tree_node *) 0x7ffff01a6208
>
> (gdb) p debug_tree ($5)
>  <mem_ref 0x7ffff01a6208
>     type <integer_type 0x7ffff00723f0 char public string-flag QI
>         size <integer_cst 0x7ffff0059d80 constant 8>
>         unit-size <integer_cst 0x7ffff0059d98 constant 1>
>         align:8 warn_if_not_align:0 symtab:0 alias-set -1 canonical-type
> 0x7ffff00723f0 precision:8 min <integer_cst 0x7ffff0059dc8 -128> max
> <integer_cst 0x7ffff0059df8 127>
>         pointer_to_this <pointer_type 0x7ffff007de70>>
>
>     arg:0 <ssa_name 0x7ffff0063dc8
>         type <pointer_type 0x7ffff007de70 type <integer_type
> 0x7ffff00723f0 char>
>             public unsigned DI
>             size <integer_cst 0x7ffff0059c90 constant 64>
>             unit-size <integer_cst 0x7ffff0059ca8 constant 8>
>             align:64 warn_if_not_align:0 symtab:0 alias-set -1
> canonical-type 0x7ffff007de70 reference_to_this <reference_type
> 0x7ffff017d738>>
>         visited var <parm_decl 0x7ffff01a5000 d>
>         def_stmt GIMPLE_NOP
>         version:3>
>     arg:1 <integer_cst 0x7ffff018ae40 type <pointer_type 0x7ffff007de70>
> constant 3>
>     j.c:4:6 start: j.c:4:5 finish: j.c:4:8>
>
>
> Note arg:0 is the SSA_NAME d_3.  And not surprising that's lhsbase:
>
> (gdb) p debug_tree (lhsbase)
> <ssa_name 0x7ffff0063dc8
>     type <pointer_type 0x7ffff007de70
>         type <integer_type 0x7ffff00723f0 char public string-flag QI
>             size <integer_cst 0x7ffff0059d80 constant 8>
>             unit-size <integer_cst 0x7ffff0059d98 constant 1>
>             align:8 warn_if_not_align:0 symtab:0 alias-set -1
> canonical-type 0x7ffff00723f0 precision:8 min <integer_cst
> 0x7ffff0059dc8 -128> max <integer_cst 0x7ffff0059df8 127>
>             pointer_to_this <pointer_type 0x7ffff007de70>>
>         public unsigned DI
>         size <integer_cst 0x7ffff0059c90 constant 64>
>         unit-size <integer_cst 0x7ffff0059ca8 constant 8>
>         align:64 warn_if_not_align:0 symtab:0 alias-set -1
> canonical-type 0x7ffff007de70 reference_to_this <reference_type
> 0x7ffff017d738>>
>     visited var <parm_decl 0x7ffff01a5000 d>
>     def_stmt GIMPLE_NOP
>     version:3>
>
>
> Sadly, dstbase is the PARM_DECL for d.  That's where things are going
> "wrong".  Not sure why you're getting the PARM_DECL in that case.  I'd
> debug get_addr_base_and_unit_offset to understand what's going on.
> Essentially you're getting different results of
> get_addr_base_and_unit_offset in a case where they arguably should be
> the same.

Probably get_attr_nonstring_decl has the same "mistake" and returns
the PARM_DECL instead of the SSA name pointer.  So we're comparing
apples and oranges here.

Yeah:

/* If EXPR refers to a character array or pointer declared attribute
   nonstring return a decl for that array or pointer and set *REF to
   the referenced enclosing object or pointer.  Otherwise returns
   null.  */

tree
get_attr_nonstring_decl (tree expr, tree *ref)
{
  tree decl = expr;
  if (TREE_CODE (decl) == SSA_NAME)
    {
      gimple *def = SSA_NAME_DEF_STMT (decl);

      if (is_gimple_assign (def))
        {
          tree_code code = gimple_assign_rhs_code (def);
          if (code == ADDR_EXPR
              || code == COMPONENT_REF
              || code == VAR_DECL)
            decl = gimple_assign_rhs1 (def);
        }
      else if (tree var = SSA_NAME_VAR (decl))
        decl = var;
    }

  if (TREE_CODE (decl) == ADDR_EXPR)
    decl = TREE_OPERAND (decl, 0);

  if (ref)
    *ref = decl;

I see a lot of "magic" here again in the attempt to "propagate"
a nonstring attribute.  Note

void foo (char *p __attribute__ ((nonstring)))
{
  p = "bar";
  strlen (p); // or whatever is necessary to call get_attr_nonstring_decl
}

is perfectly valid and p as passed to strlen is _not_ nonstring(?).

I think in your code comparing bases you want to look at the _original_
argument to the string function rather than what get_attr_nonstring_decl
returned as ref.
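
To make the mismatch concrete, here is a toy model of the comparison.
It is only a sketch: the toy_* types and helpers are hypothetical
stand-ins invented for illustration, not GCC's tree or gimple API.
toy_ref_for mimics only the SSA_NAME_VAR branch of the function
quoted above.

```c
/* Toy model only: these types and helpers are hypothetical stand-ins,
   not GCC's tree or gimple API.  */
#include <stdbool.h>
#include <stddef.h>

enum toy_code { TOY_PARM_DECL, TOY_SSA_NAME };

struct toy_tree
{
  enum toy_code code;
  const char *name;
  int version;              /* SSA version; unused for decls.  */
  struct toy_tree *var;     /* For an SSA name, the underlying decl.  */
};

/* Mimic the branch of get_attr_nonstring_decl quoted above that
   replaces an SSA name with its underlying decl.  */
static struct toy_tree *
toy_ref_for (struct toy_tree *expr)
{
  if (expr->code == TOY_SSA_NAME && expr->var)
    return expr->var;
  return expr;
}

/* Stand-in for the base comparison: two bases match only if they are
   the same node.  */
static bool
toy_same_base (const struct toy_tree *a, const struct toy_tree *b)
{
  return a == b;
}

static struct toy_tree parm_d = { TOY_PARM_DECL, "d", 0, NULL };
static struct toy_tree ssa_d_3 = { TOY_SSA_NAME, "d", 3, &parm_d };

/* Comparing the decl-based ref against the MEM_REF base fails...  */
static bool
toy_compare_ref (void)
{
  struct toy_tree *dstbase = toy_ref_for (&ssa_d_3);  /* PARM_DECL d */
  struct toy_tree *lhsbase = &ssa_d_3;                /* SSA name d_3 */
  return toy_same_base (dstbase, lhsbase);
}

/* ...while comparing the original call argument succeeds.  */
static bool
toy_compare_original (void)
{
  struct toy_tree *dstbase = &ssa_d_3;
  struct toy_tree *lhsbase = &ssa_d_3;
  return toy_same_base (dstbase, lhsbase);
}
```

In this model toy_compare_ref returns false because one side has
been mapped to the decl and the other has not, while
toy_compare_original returns true because both sides are the same
SSA name.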

Richard.

> Jeff
>
> Jeff
Richard Biener Aug. 28, 2018, 9:57 a.m. UTC | #9
On Tue, Aug 28, 2018 at 11:55 AM Richard Biener
<richard.guenther@gmail.com> wrote:
>
> On Tue, Aug 28, 2018 at 6:27 AM Jeff Law <law@redhat.com> wrote:
> >
> > On 08/27/2018 10:27 AM, Martin Sebor wrote:
> > > On 08/27/2018 02:29 AM, Richard Biener wrote:
> > >> On Sun, Aug 26, 2018 at 7:26 AM Jeff Law <law@redhat.com> wrote:
> > >>>
> > >>> On 08/24/2018 09:58 AM, Martin Sebor wrote:
> > >>>> The warning suppression for -Wstringop-truncation looks for
> > >>>> the next statement after a truncating strncpy to see if it
> > >>>> adds a terminating nul.  This only works when the next
> > >>>> statement can be reached using the Gimple statement iterator
> > >>>> which isn't until after gimplification.  As a result, strncpy
> > >>>> calls that truncate their constant argument that are being
> > >>>> folded to memcpy this early get diagnosed even if they are
> > >>>> followed by the nul assignment:
> > >>>>
> > >>>>   const char s[] = "12345";
> > >>>>   char d[3];
> > >>>>
> > >>>>   void f (void)
> > >>>>   {
> > >>>>     strncpy (d, s, sizeof d - 1);   // -Wstringop-truncation
> > >>>>     d[sizeof d - 1] = 0;
> > >>>>   }
> > >>>>
> > >>>> To avoid the warning I propose to defer folding strncpy to
> > >>>> memcpy until the pointer to the basic block the strnpy call
> > >>>> is in can be used to try to reach the next statement (this
> > >>>> happens as early as ccp1).  I'm aware of the preference to
> > >>>> fold things early but in the case of strncpy (a relatively
> > >>>> rarely used function that is often misused), getting
> > >>>> the warning right while folding a bit later but still fairly
> > >>>> early on seems like a reasonable compromise.  I fear that
> > >>>> otherwise, the false positives will drive users to adopt
> > >>>> other unsafe solutions (like memcpy) where these kinds of
> > >>>> bugs cannot be as readily detected.
> > >>>>
> > >>>> Tested on x86_64-linux.
> > >>>>
> > >>>> Martin
> > >>>>
> > >>>> PS There still are outstanding cases where the warning can
> > >>>> be avoided.  I xfailed them in the test for now but will
> > >>>> still try to get them to work for GCC 9.
> > >>>>
> > >>>> gcc-87028.diff
> > >>>>
> > >>>>
> > >>>> PR tree-optimization/87028 - false positive -Wstringop-truncation
> > >>>> strncpy with global variable source string
> > >>>> gcc/ChangeLog:
> > >>>>
> > >>>>       PR tree-optimization/87028
> > >>>>       * gimple-fold.c (gimple_fold_builtin_strncpy): Avoid folding when
> > >>>>       statement doesn't belong to a basic block.
> > >>>>       * tree-ssa-strlen.c (maybe_diag_stxncpy_trunc): Handle MEM_REF on
> > >>>>       the left hand side of assignment.
> > >>>>
> > >>>> gcc/testsuite/ChangeLog:
> > >>>>
> > >>>>       PR tree-optimization/87028
> > >>>>       * c-c++-common/Wstringop-truncation.c: Remove xfails.
> > >>>>       * gcc.dg/Wstringop-truncation-5.c: New test.
> > >>>>
> > >>>> diff --git a/gcc/gimple-fold.c b/gcc/gimple-fold.c
> > >>>> index 07341eb..284c2fb 100644
> > >>>> --- a/gcc/gimple-fold.c
> > >>>> +++ b/gcc/gimple-fold.c
> > >>>> @@ -1702,6 +1702,11 @@ gimple_fold_builtin_strncpy
> > >>>> (gimple_stmt_iterator *gsi,
> > >>>>    if (tree_int_cst_lt (ssize, len))
> > >>>>      return false;
> > >>>>
> > >>>> +  /* Defer warning (and folding) until the next statement in the basic
> > >>>> +     block is reachable.  */
> > >>>> +  if (!gimple_bb (stmt))
> > >>>> +    return false;
> > >>> I think you want cfun->cfg as the test here.  They should be equivalent
> > >>> in practice.
> > >>
> > >> Please do not add 'cfun' references.  Note that the next stmt is also
> > >> accessible
> > >> when there is no CFG.  I guess the issue is that we fold this during
> > >> gimplification
> > >> where the next stmt is not yet "there" (but still in GENERIC)?
> > >>
> > >> We generally do not want to have unfolded stmts in the IL when we can
> > >> avoid that
> > >> which is why we fold most stmts during gimplification.  We also do
> > >> that because
> > >> we now do less folding on GENERIC.
> > >>
> > >> There may be the possibility to refactor gimplification time folding
> > >> to what we
> > >> do during inlining - queue stmts we want to fold and perform all
> > >> folding delayed.
> > >> This of course means bigger compile-time due to cache effects.
> > >>
> > >>>
> > >>>> diff --git a/gcc/tree-ssa-strlen.c b/gcc/tree-ssa-strlen.c
> > >>>> index d0792aa..f1988f6 100644
> > >>>> --- a/gcc/tree-ssa-strlen.c
> > >>>> +++ b/gcc/tree-ssa-strlen.c
> > >>>> @@ -1981,6 +1981,23 @@ maybe_diag_stxncpy_trunc
> > >>>> (gimple_stmt_iterator gsi, tree src, tree cnt)
> > >>>>         && known_eq (dstoff, lhsoff)
> > >>>>         && operand_equal_p (dstbase, lhsbase, 0))
> > >>>>       return false;
> > >>>> +
> > >>>> +      if (code == MEM_REF
> > >>>> +       && TREE_CODE (lhsbase) == SSA_NAME
> > >>>> +       && known_eq (dstoff, lhsoff))
> > >>>> +     {
> > >>>> +       /* Extract the referenced variable from something like
> > >>>> +            MEM[(char *)d_3(D) + 3B] = 0;  */
> > >>>> +       gimple *def = SSA_NAME_DEF_STMT (lhsbase);
> > >>>> +       if (gimple_nop_p (def))
> > >>>> +         {
> > >>>> +           lhsbase = SSA_NAME_VAR (lhsbase);
> > >>>> +           if (lhsbase
> > >>>> +               && dstbase
> > >>>> +               && operand_equal_p (dstbase, lhsbase, 0))
> > >>>> +             return false;
> > >>>> +         }
> > >>>> +     }
> > >>> If you find yourself looking at SSA_NAME_VAR, you're usually barking up
> > >>> the wrong tree.  It'd be easier to suggest something here if I could see
> > >>> the gimple (with virtual operands).  BUt at some level what you really
> > >>> want to do is make sure the base of the MEM_REF is the same as what got
> > >>> passed as the destination of the strncpy.  You'd want to be testing
> > >>> SSA_NAMEs in that case.
> > >>
> > >> Yes.  Why not simply compare the SSA names?  Why would it be
> > >> not OK to do that when !lhsbase?
> > >
> > > The added code handles this case:
> > >
> > >   void f (char *d)
> > >   {
> > >     __builtin_strncpy (d, "12345", 4);
> > >     d[3] = 0;
> > >   }
> > >
> > > where during forwprop we see:
> > >
> > >   __builtin_strncpy (d_3(D), "12345", 4);
> > >   MEM[(char *)d_3(D) + 3B] = 0;
> > >
> > > The next statement after the strncpy is the assignment whose
> > > lhs is the MEM_REF with a GIMPLE_NOP as an operand.  There
> > > is no other information in the GIMPLE_NOP that I can see to
> > > tell that the operand is d_3(D) or that it's the same as
> > > the strncpy argument (i.e., the PARAM_DECl d).  Having to
> > > do open-code this all the time seems so cumbersome -- is
> > > there some API that would do this for me?  (I thought
> > > get_addr_base_and_unit_offset was that API but clearly in
> > > this case it doesn't do what I expect -- it just returns
> > > the argument.)
> >
> > I think you need to look harder at that MEM_REF.  It references d_3.
> > That's what you need to be checking.  The base (d_3) is the first
> > operand of the MEM_REF, the offset is the second operand of the MEM_REF.
> >
> > (gdb) p debug_gimple_stmt ($2)
> > # .MEM_5 = VDEF <.MEM_4>
> > MEM[(char *)d_3(D) + 3B] = 0;
> >
> >
> > (gdb) p gimple_assign_lhs ($2)
> > $5 = (tree_node *) 0x7ffff01a6208
> >
> > (gdb) p debug_tree ($5)
> >  <mem_ref 0x7ffff01a6208
> >     type <integer_type 0x7ffff00723f0 char public string-flag QI
> >         size <integer_cst 0x7ffff0059d80 constant 8>
> >         unit-size <integer_cst 0x7ffff0059d98 constant 1>
> >         align:8 warn_if_not_align:0 symtab:0 alias-set -1 canonical-type
> > 0x7ffff00723f0 precision:8 min <integer_cst 0x7ffff0059dc8 -128> max
> > <integer_cst 0x7ffff0059df8 127>
> >         pointer_to_this <pointer_type 0x7ffff007de70>>
> >
> >     arg:0 <ssa_name 0x7ffff0063dc8
> >         type <pointer_type 0x7ffff007de70 type <integer_type
> > 0x7ffff00723f0 char>
> >             public unsigned DI
> >             size <integer_cst 0x7ffff0059c90 constant 64>
> >             unit-size <integer_cst 0x7ffff0059ca8 constant 8>
> >             align:64 warn_if_not_align:0 symtab:0 alias-set -1
> > canonical-type 0x7ffff007de70 reference_to_this <reference_type
> > 0x7ffff017d738>>
> >         visited var <parm_decl 0x7ffff01a5000 d>
> >         def_stmt GIMPLE_NOP
> >         version:3>
> >     arg:1 <integer_cst 0x7ffff018ae40 type <pointer_type 0x7ffff007de70>
> > constant 3>
> >     j.c:4:6 start: j.c:4:5 finish: j.c:4:8>
> >
> >
> > Note arg:0 is the SSA_NAME d_3.  And not surprising that's lhsbase:
> >
> > (gdb) p debug_tree (lhsbase)
> > <ssa_name 0x7ffff0063dc8
> >     type <pointer_type 0x7ffff007de70
> >         type <integer_type 0x7ffff00723f0 char public string-flag QI
> >             size <integer_cst 0x7ffff0059d80 constant 8>
> >             unit-size <integer_cst 0x7ffff0059d98 constant 1>
> >             align:8 warn_if_not_align:0 symtab:0 alias-set -1
> > canonical-type 0x7ffff00723f0 precision:8 min <integer_cst
> > 0x7ffff0059dc8 -128> max <integer_cst 0x7ffff0059df8 127>
> >             pointer_to_this <pointer_type 0x7ffff007de70>>
> >         public unsigned DI
> >         size <integer_cst 0x7ffff0059c90 constant 64>
> >         unit-size <integer_cst 0x7ffff0059ca8 constant 8>
> >         align:64 warn_if_not_align:0 symtab:0 alias-set -1
> > canonical-type 0x7ffff007de70 reference_to_this <reference_type
> > 0x7ffff017d738>>
> >     visited var <parm_decl 0x7ffff01a5000 d>
> >     def_stmt GIMPLE_NOP
> >     version:3>
> >
> >
> > Sadly, dstbase is the PARM_DECL for d.  That's where things are going
> > "wrong".  Not sure why you're getting the PARM_DECL in that case.  I'd
> > debug get_addr_base_and_unit_offset to understand what's going on.
> > Essentially you're getting different results of
> > get_addr_base_and_unit_offset in a case where they arguably should be
> > the same.
>
> Probably get_attr_nonstring_decl has the same "mistake" and returns
> the PARM_DECL instead of the SSA name pointer.  So we're comparing
> apples and oranges here.
>
> Yeah:
>
> /* If EXPR refers to a character array or pointer declared attribute
>    nonstring return a decl for that array or pointer and set *REF to
>    the referenced enclosing object or pointer.  Otherwise returns
>    null.  */
>
> tree
> get_attr_nonstring_decl (tree expr, tree *ref)
> {
>   tree decl = expr;
>   if (TREE_CODE (decl) == SSA_NAME)
>     {
>       gimple *def = SSA_NAME_DEF_STMT (decl);
>
>       if (is_gimple_assign (def))
>         {
>           tree_code code = gimple_assign_rhs_code (def);
>           if (code == ADDR_EXPR
>               || code == COMPONENT_REF
>               || code == VAR_DECL)
>             decl = gimple_assign_rhs1 (def);
>         }
>       else if (tree var = SSA_NAME_VAR (decl))
>         decl = var;
>     }
>
>   if (TREE_CODE (decl) == ADDR_EXPR)
>     decl = TREE_OPERAND (decl, 0);
>
>   if (ref)
>     *ref = decl;
>
> I see a lot of "magic" here again in the attempt to "propagate"
> a nonstring attribute.  Note
>
> foo (char *p __attribute__(("nonstring")))
> {
>   p = "bar";
>   strlen (p); // or whatever is necessary to call get_attr_nonstring_decl
> }
>
> is perfectly valid and p as passed to strlen is _not_ nonstring(?).
>
> I think in your code comparing bases you want to look at the _original_
> argument to the string function rather than what get_attr_nonstring_decl
> returned as ref.

Oh, and this 'nonstring' feels like something that could be propagated
by points-to analysis.

> Richard.
>
> > Jeff
> >
> > Jeff
Martin Sebor Aug. 28, 2018, 8:43 p.m. UTC | #10
On 08/27/2018 10:27 PM, Jeff Law wrote:
> On 08/27/2018 10:27 AM, Martin Sebor wrote:
>> On 08/27/2018 02:29 AM, Richard Biener wrote:
>>> On Sun, Aug 26, 2018 at 7:26 AM Jeff Law <law@redhat.com> wrote:
>>>>
>>>> On 08/24/2018 09:58 AM, Martin Sebor wrote:
>>>>> The warning suppression for -Wstringop-truncation looks for
>>>>> the next statement after a truncating strncpy to see if it
>>>>> adds a terminating nul.  This only works when the next
>>>>> statement can be reached using the Gimple statement iterator
>>>>> which isn't until after gimplification.  As a result, strncpy
>>>>> calls that truncate their constant argument that are being
>>>>> folded to memcpy this early get diagnosed even if they are
>>>>> followed by the nul assignment:
>>>>>
>>>>>   const char s[] = "12345";
>>>>>   char d[3];
>>>>>
>>>>>   void f (void)
>>>>>   {
>>>>>     strncpy (d, s, sizeof d - 1);   // -Wstringop-truncation
>>>>>     d[sizeof d - 1] = 0;
>>>>>   }
>>>>>
>>>>> To avoid the warning I propose to defer folding strncpy to
>>>>> memcpy until the pointer to the basic block the strnpy call
>>>>> is in can be used to try to reach the next statement (this
>>>>> happens as early as ccp1).  I'm aware of the preference to
>>>>> fold things early but in the case of strncpy (a relatively
>>>>> rarely used function that is often misused), getting
>>>>> the warning right while folding a bit later but still fairly
>>>>> early on seems like a reasonable compromise.  I fear that
>>>>> otherwise, the false positives will drive users to adopt
>>>>> other unsafe solutions (like memcpy) where these kinds of
>>>>> bugs cannot be as readily detected.
>>>>>
>>>>> Tested on x86_64-linux.
>>>>>
>>>>> Martin
>>>>>
>>>>> PS There still are outstanding cases where the warning can
>>>>> be avoided.  I xfailed them in the test for now but will
>>>>> still try to get them to work for GCC 9.
>>>>>
>>>>> gcc-87028.diff
>>>>>
>>>>>
>>>>> PR tree-optimization/87028 - false positive -Wstringop-truncation
>>>>> strncpy with global variable source string
>>>>> gcc/ChangeLog:
>>>>>
>>>>>       PR tree-optimization/87028
>>>>>       * gimple-fold.c (gimple_fold_builtin_strncpy): Avoid folding when
>>>>>       statement doesn't belong to a basic block.
>>>>>       * tree-ssa-strlen.c (maybe_diag_stxncpy_trunc): Handle MEM_REF on
>>>>>       the left hand side of assignment.
>>>>>
>>>>> gcc/testsuite/ChangeLog:
>>>>>
>>>>>       PR tree-optimization/87028
>>>>>       * c-c++-common/Wstringop-truncation.c: Remove xfails.
>>>>>       * gcc.dg/Wstringop-truncation-5.c: New test.
>>>>>
>>>>> diff --git a/gcc/gimple-fold.c b/gcc/gimple-fold.c
>>>>> index 07341eb..284c2fb 100644
>>>>> --- a/gcc/gimple-fold.c
>>>>> +++ b/gcc/gimple-fold.c
>>>>> @@ -1702,6 +1702,11 @@ gimple_fold_builtin_strncpy
>>>>> (gimple_stmt_iterator *gsi,
>>>>>    if (tree_int_cst_lt (ssize, len))
>>>>>      return false;
>>>>>
>>>>> +  /* Defer warning (and folding) until the next statement in the basic
>>>>> +     block is reachable.  */
>>>>> +  if (!gimple_bb (stmt))
>>>>> +    return false;
>>>> I think you want cfun->cfg as the test here.  They should be equivalent
>>>> in practice.
>>>
>>> Please do not add 'cfun' references.  Note that the next stmt is also
>>> accessible
>>> when there is no CFG.  I guess the issue is that we fold this during
>>> gimplification
>>> where the next stmt is not yet "there" (but still in GENERIC)?
>>>
>>> We generally do not want to have unfolded stmts in the IL when we can
>>> avoid that
>>> which is why we fold most stmts during gimplification.  We also do
>>> that because
>>> we now do less folding on GENERIC.
>>>
>>> There may be the possibility to refactor gimplification time folding
>>> to what we
>>> do during inlining - queue stmts we want to fold and perform all
>>> folding delayed.
>>> This of course means bigger compile-time due to cache effects.
>>>
>>>>
>>>>> diff --git a/gcc/tree-ssa-strlen.c b/gcc/tree-ssa-strlen.c
>>>>> index d0792aa..f1988f6 100644
>>>>> --- a/gcc/tree-ssa-strlen.c
>>>>> +++ b/gcc/tree-ssa-strlen.c
>>>>> @@ -1981,6 +1981,23 @@ maybe_diag_stxncpy_trunc
>>>>> (gimple_stmt_iterator gsi, tree src, tree cnt)
>>>>>         && known_eq (dstoff, lhsoff)
>>>>>         && operand_equal_p (dstbase, lhsbase, 0))
>>>>>       return false;
>>>>> +
>>>>> +      if (code == MEM_REF
>>>>> +       && TREE_CODE (lhsbase) == SSA_NAME
>>>>> +       && known_eq (dstoff, lhsoff))
>>>>> +     {
>>>>> +       /* Extract the referenced variable from something like
>>>>> +            MEM[(char *)d_3(D) + 3B] = 0;  */
>>>>> +       gimple *def = SSA_NAME_DEF_STMT (lhsbase);
>>>>> +       if (gimple_nop_p (def))
>>>>> +         {
>>>>> +           lhsbase = SSA_NAME_VAR (lhsbase);
>>>>> +           if (lhsbase
>>>>> +               && dstbase
>>>>> +               && operand_equal_p (dstbase, lhsbase, 0))
>>>>> +             return false;
>>>>> +         }
>>>>> +     }
>>>> If you find yourself looking at SSA_NAME_VAR, you're usually barking up
>>>> the wrong tree.  It'd be easier to suggest something here if I could see
>>>> the gimple (with virtual operands).  But at some level what you really
>>>> want to do is make sure the base of the MEM_REF is the same as what got
>>>> passed as the destination of the strncpy.  You'd want to be testing
>>>> SSA_NAMEs in that case.
>>>
>>> Yes.  Why not simply compare the SSA names?  Why would it be
>>> not OK to do that when !lhsbase?
>>
>> The added code handles this case:
>>
>>   void f (char *d)
>>   {
>>     __builtin_strncpy (d, "12345", 4);
>>     d[3] = 0;
>>   }
>>
>> where during forwprop we see:
>>
>>   __builtin_strncpy (d_3(D), "12345", 4);
>>   MEM[(char *)d_3(D) + 3B] = 0;
>>
>> The next statement after the strncpy is the assignment whose
>> lhs is the MEM_REF with a GIMPLE_NOP as an operand.  There
>> is no other information in the GIMPLE_NOP that I can see to
>> tell that the operand is d_3(D) or that it's the same as
>> the strncpy argument (i.e., the PARM_DECL d).  Having to
>> open-code this all the time seems so cumbersome -- is
>> there some API that would do this for me?  (I thought
>> get_addr_base_and_unit_offset was that API but clearly in
>> this case it doesn't do what I expect -- it just returns
>> the argument.)
>
> I think you need to look harder at that MEM_REF.  It references d_3.
> That's what you need to be checking.  The base (d_3) is the first
> operand of the MEM_REF, the offset is the second operand of the MEM_REF.
>
> (gdb) p debug_gimple_stmt ($2)
> # .MEM_5 = VDEF <.MEM_4>
> MEM[(char *)d_3(D) + 3B] = 0;
>
>
> (gdb) p gimple_assign_lhs ($2)
> $5 = (tree_node *) 0x7ffff01a6208
>
> (gdb) p debug_tree ($5)
>  <mem_ref 0x7ffff01a6208
>     type <integer_type 0x7ffff00723f0 char public string-flag QI
>         size <integer_cst 0x7ffff0059d80 constant 8>
>         unit-size <integer_cst 0x7ffff0059d98 constant 1>
>         align:8 warn_if_not_align:0 symtab:0 alias-set -1 canonical-type
> 0x7ffff00723f0 precision:8 min <integer_cst 0x7ffff0059dc8 -128> max
> <integer_cst 0x7ffff0059df8 127>
>         pointer_to_this <pointer_type 0x7ffff007de70>>
>
>     arg:0 <ssa_name 0x7ffff0063dc8
>         type <pointer_type 0x7ffff007de70 type <integer_type
> 0x7ffff00723f0 char>
>             public unsigned DI
>             size <integer_cst 0x7ffff0059c90 constant 64>
>             unit-size <integer_cst 0x7ffff0059ca8 constant 8>
>             align:64 warn_if_not_align:0 symtab:0 alias-set -1
> canonical-type 0x7ffff007de70 reference_to_this <reference_type
> 0x7ffff017d738>>
>         visited var <parm_decl 0x7ffff01a5000 d>
>         def_stmt GIMPLE_NOP
>         version:3>
>     arg:1 <integer_cst 0x7ffff018ae40 type <pointer_type 0x7ffff007de70>
> constant 3>
>     j.c:4:6 start: j.c:4:5 finish: j.c:4:8>
>
>
> Note arg:0 is the SSA_NAME d_3.  And, not surprisingly, that's lhsbase:

The d in the MEM_REF you see in the dump above is the SSA_NAME's
SSA_NAME_VAR:

           visited var <parm_decl 0x7ffff01a5000 d>

Here's the print_node() code that prints it:

	  print_node_brief (file, "var", SSA_NAME_VAR (node), indent + 4);

There is nothing else in the MEM_REF operand that tells me that.
Why is it wrong to look at the SSA_NAME_VAR?

> (gdb) p debug_tree (lhsbase)
> <ssa_name 0x7ffff0063dc8
>     type <pointer_type 0x7ffff007de70
>         type <integer_type 0x7ffff00723f0 char public string-flag QI
>             size <integer_cst 0x7ffff0059d80 constant 8>
>             unit-size <integer_cst 0x7ffff0059d98 constant 1>
>             align:8 warn_if_not_align:0 symtab:0 alias-set -1
> canonical-type 0x7ffff00723f0 precision:8 min <integer_cst
> 0x7ffff0059dc8 -128> max <integer_cst 0x7ffff0059df8 127>
>             pointer_to_this <pointer_type 0x7ffff007de70>>
>         public unsigned DI
>         size <integer_cst 0x7ffff0059c90 constant 64>
>         unit-size <integer_cst 0x7ffff0059ca8 constant 8>
>         align:64 warn_if_not_align:0 symtab:0 alias-set -1
> canonical-type 0x7ffff007de70 reference_to_this <reference_type
> 0x7ffff017d738>>
>     visited var <parm_decl 0x7ffff01a5000 d>
>     def_stmt GIMPLE_NOP
>     version:3>
> Sadly, dstbase is the PARM_DECL for d.  That's where things are going
> "wrong".

As Richard observed, that's because get_attr_nonstring_decl()
returns the DECL that the expression refers to.  It does that
because that's where it looks for attribute nonstring, and so
that the warning can mention the DECL with the attribute.

I suppose since I'm not supposed to be using SSA_NAME_VAR
(I still don't understand why it's taboo) I'll have to avoid
using the get_attr_nonstring_decl() return value and instead
look into comparing the SSA_NAMEs.

Martin

> Not sure why you're getting the PARM_DECL in that case.  I'd
> debug get_addr_base_and_unit_offset to understand what's going on.
> Essentially you're getting different results of
> get_addr_base_and_unit_offset in a case where they arguably should be
> the same.
>
> Jeff
>
Jeff Law Aug. 28, 2018, 10:17 p.m. UTC | #11
On 08/28/2018 02:43 PM, Martin Sebor wrote:
> On 08/27/2018 10:27 PM, Jeff Law wrote:
>> On 08/27/2018 10:27 AM, Martin Sebor wrote:
>>> On 08/27/2018 02:29 AM, Richard Biener wrote:
>>>> On Sun, Aug 26, 2018 at 7:26 AM Jeff Law <law@redhat.com> wrote:
>>>>>
>>>>> On 08/24/2018 09:58 AM, Martin Sebor wrote:
>>>>>> The warning suppression for -Wstringop-truncation looks for
>>>>>> the next statement after a truncating strncpy to see if it
>>>>>> adds a terminating nul.  This only works when the next
>>>>>> statement can be reached using the Gimple statement iterator
>>>>>> which isn't until after gimplification.  As a result, strncpy
>>>>>> calls that truncate their constant argument that are being
>>>>>> folded to memcpy this early get diagnosed even if they are
>>>>>> followed by the nul assignment:
>>>>>>
>>>>>>   const char s[] = "12345";
>>>>>>   char d[3];
>>>>>>
>>>>>>   void f (void)
>>>>>>   {
>>>>>>     strncpy (d, s, sizeof d - 1);   // -Wstringop-truncation
>>>>>>     d[sizeof d - 1] = 0;
>>>>>>   }
>>>>>>
>>>>>> To avoid the warning I propose to defer folding strncpy to
>>>>>> memcpy until the pointer to the basic block the strncpy call
>>>>>> is in can be used to try to reach the next statement (this
>>>>>> happens as early as ccp1).  I'm aware of the preference to
>>>>>> fold things early but in the case of strncpy (a relatively
>>>>>> rarely used function that is often misused), getting
>>>>>> the warning right while folding a bit later but still fairly
>>>>>> early on seems like a reasonable compromise.  I fear that
>>>>>> otherwise, the false positives will drive users to adopt
>>>>>> other unsafe solutions (like memcpy) where these kinds of
>>>>>> bugs cannot be as readily detected.
>>>>>>
>>>>>> Tested on x86_64-linux.
>>>>>>
>>>>>> Martin
>>>>>>
>>>>>> PS There still are outstanding cases where the warning can
>>>>>> be avoided.  I xfailed them in the test for now but will
>>>>>> still try to get them to work for GCC 9.
>>>>>>
>>>>>> gcc-87028.diff
>>>>>>
>>>>>>
>>>>>> PR tree-optimization/87028 - false positive -Wstringop-truncation
>>>>>> strncpy with global variable source string
>>>>>> gcc/ChangeLog:
>>>>>>
>>>>>>       PR tree-optimization/87028
>>>>>>       * gimple-fold.c (gimple_fold_builtin_strncpy): Avoid folding
>>>>>> when
>>>>>>       statement doesn't belong to a basic block.
>>>>>>       * tree-ssa-strlen.c (maybe_diag_stxncpy_trunc): Handle
>>>>>> MEM_REF on
>>>>>>       the left hand side of assignment.
>>>>>>
>>>>>> gcc/testsuite/ChangeLog:
>>>>>>
>>>>>>       PR tree-optimization/87028
>>>>>>       * c-c++-common/Wstringop-truncation.c: Remove xfails.
>>>>>>       * gcc.dg/Wstringop-truncation-5.c: New test.
>>>>>>
>>>>>> diff --git a/gcc/gimple-fold.c b/gcc/gimple-fold.c
>>>>>> index 07341eb..284c2fb 100644
>>>>>> --- a/gcc/gimple-fold.c
>>>>>> +++ b/gcc/gimple-fold.c
>>>>>> @@ -1702,6 +1702,11 @@ gimple_fold_builtin_strncpy
>>>>>> (gimple_stmt_iterator *gsi,
>>>>>>    if (tree_int_cst_lt (ssize, len))
>>>>>>      return false;
>>>>>>
>>>>>> +  /* Defer warning (and folding) until the next statement in the
>>>>>> basic
>>>>>> +     block is reachable.  */
>>>>>> +  if (!gimple_bb (stmt))
>>>>>> +    return false;
>>>>> I think you want cfun->cfg as the test here.  They should be
>>>>> equivalent
>>>>> in practice.
>>>>
>>>> Please do not add 'cfun' references.  Note that the next stmt is also
>>>> accessible
>>>> when there is no CFG.  I guess the issue is that we fold this during
>>>> gimplification
>>>> where the next stmt is not yet "there" (but still in GENERIC)?
>>>>
>>>> We generally do not want to have unfolded stmts in the IL when we can
>>>> avoid that
>>>> which is why we fold most stmts during gimplification.  We also do
>>>> that because
>>>> we now do less folding on GENERIC.
>>>>
>>>> There may be the possibility to refactor gimplification time folding
>>>> to what we
>>>> do during inlining - queue stmts we want to fold and perform all
>>>> folding delayed.
>>>> This of course means bigger compile-time due to cache effects.
>>>>
>>>>>
>>>>>> diff --git a/gcc/tree-ssa-strlen.c b/gcc/tree-ssa-strlen.c
>>>>>> index d0792aa..f1988f6 100644
>>>>>> --- a/gcc/tree-ssa-strlen.c
>>>>>> +++ b/gcc/tree-ssa-strlen.c
>>>>>> @@ -1981,6 +1981,23 @@ maybe_diag_stxncpy_trunc
>>>>>> (gimple_stmt_iterator gsi, tree src, tree cnt)
>>>>>>         && known_eq (dstoff, lhsoff)
>>>>>>         && operand_equal_p (dstbase, lhsbase, 0))
>>>>>>       return false;
>>>>>> +
>>>>>> +      if (code == MEM_REF
>>>>>> +       && TREE_CODE (lhsbase) == SSA_NAME
>>>>>> +       && known_eq (dstoff, lhsoff))
>>>>>> +     {
>>>>>> +       /* Extract the referenced variable from something like
>>>>>> +            MEM[(char *)d_3(D) + 3B] = 0;  */
>>>>>> +       gimple *def = SSA_NAME_DEF_STMT (lhsbase);
>>>>>> +       if (gimple_nop_p (def))
>>>>>> +         {
>>>>>> +           lhsbase = SSA_NAME_VAR (lhsbase);
>>>>>> +           if (lhsbase
>>>>>> +               && dstbase
>>>>>> +               && operand_equal_p (dstbase, lhsbase, 0))
>>>>>> +             return false;
>>>>>> +         }
>>>>>> +     }
>>>>> If you find yourself looking at SSA_NAME_VAR, you're usually
>>>>> barking up
>>>>> the wrong tree.  It'd be easier to suggest something here if I
>>>>> could see
>>>>> the gimple (with virtual operands).  But at some level what you really
>>>>> want to do is make sure the base of the MEM_REF is the same as what
>>>>> got
>>>>> passed as the destination of the strncpy.  You'd want to be testing
>>>>> SSA_NAMEs in that case.
>>>>
>>>> Yes.  Why not simply compare the SSA names?  Why would it be
>>>> not OK to do that when !lhsbase?
>>>
>>> The added code handles this case:
>>>
>>>   void f (char *d)
>>>   {
>>>     __builtin_strncpy (d, "12345", 4);
>>>     d[3] = 0;
>>>   }
>>>
>>> where during forwprop we see:
>>>
>>>   __builtin_strncpy (d_3(D), "12345", 4);
>>>   MEM[(char *)d_3(D) + 3B] = 0;
>>>
>>> The next statement after the strncpy is the assignment whose
>>> lhs is the MEM_REF with a GIMPLE_NOP as an operand.  There
>>> is no other information in the GIMPLE_NOP that I can see to
>>> tell that the operand is d_3(D) or that it's the same as
>>> the strncpy argument (i.e., the PARM_DECL d).  Having to
>>> open-code this all the time seems so cumbersome -- is
>>> there some API that would do this for me?  (I thought
>>> get_addr_base_and_unit_offset was that API but clearly in
>>> this case it doesn't do what I expect -- it just returns
>>> the argument.)
>>
>> I think you need to look harder at that MEM_REF.  It references d_3.
>> That's what you need to be checking.  The base (d_3) is the first
>> operand of the MEM_REF, the offset is the second operand of the MEM_REF.
>>
>> (gdb) p debug_gimple_stmt ($2)
>> # .MEM_5 = VDEF <.MEM_4>
>> MEM[(char *)d_3(D) + 3B] = 0;
>>
>>
>> (gdb) p gimple_assign_lhs ($2)
>> $5 = (tree_node *) 0x7ffff01a6208
>>
>> (gdb) p debug_tree ($5)
>>  <mem_ref 0x7ffff01a6208
>>     type <integer_type 0x7ffff00723f0 char public string-flag QI
>>         size <integer_cst 0x7ffff0059d80 constant 8>
>>         unit-size <integer_cst 0x7ffff0059d98 constant 1>
>>         align:8 warn_if_not_align:0 symtab:0 alias-set -1 canonical-type
>> 0x7ffff00723f0 precision:8 min <integer_cst 0x7ffff0059dc8 -128> max
>> <integer_cst 0x7ffff0059df8 127>
>>         pointer_to_this <pointer_type 0x7ffff007de70>>
>>
>>     arg:0 <ssa_name 0x7ffff0063dc8
>>         type <pointer_type 0x7ffff007de70 type <integer_type
>> 0x7ffff00723f0 char>
>>             public unsigned DI
>>             size <integer_cst 0x7ffff0059c90 constant 64>
>>             unit-size <integer_cst 0x7ffff0059ca8 constant 8>
>>             align:64 warn_if_not_align:0 symtab:0 alias-set -1
>> canonical-type 0x7ffff007de70 reference_to_this <reference_type
>> 0x7ffff017d738>>
>>         visited var <parm_decl 0x7ffff01a5000 d>
>>         def_stmt GIMPLE_NOP
>>         version:3>
>>     arg:1 <integer_cst 0x7ffff018ae40 type <pointer_type 0x7ffff007de70>
>> constant 3>
>>     j.c:4:6 start: j.c:4:5 finish: j.c:4:8>
>>
>>
>> Note arg:0 is the SSA_NAME d_3.  And, not surprisingly, that's lhsbase:
> 
> The d in the MEM_REF you see in the dump above is the SSA_NAME's
> SSA_NAME_VAR:
> 
>           visited var <parm_decl 0x7ffff01a5000 d>
> 
> Here's the print_node() code that prints it:
> 
>       print_node_brief (file, "var", SSA_NAME_VAR (node), indent + 4);
> 
> There is nothing else in the MEM_REF operand that tells me that.
> Why is it wrong to look at the SSA_NAME_VAR?
> 
>> (gdb) p debug_tree (lhsbase)
>> <ssa_name 0x7ffff0063dc8
>>     type <pointer_type 0x7ffff007de70
>>         type <integer_type 0x7ffff00723f0 char public string-flag QI
>>             size <integer_cst 0x7ffff0059d80 constant 8>
>>             unit-size <integer_cst 0x7ffff0059d98 constant 1>
>>             align:8 warn_if_not_align:0 symtab:0 alias-set -1
>> canonical-type 0x7ffff00723f0 precision:8 min <integer_cst
>> 0x7ffff0059dc8 -128> max <integer_cst 0x7ffff0059df8 127>
>>             pointer_to_this <pointer_type 0x7ffff007de70>>
>>         public unsigned DI
>>         size <integer_cst 0x7ffff0059c90 constant 64>
>>         unit-size <integer_cst 0x7ffff0059ca8 constant 8>
>>         align:64 warn_if_not_align:0 symtab:0 alias-set -1
>> canonical-type 0x7ffff007de70 reference_to_this <reference_type
>> 0x7ffff017d738>>
>>     visited var <parm_decl 0x7ffff01a5000 d>
>>     def_stmt GIMPLE_NOP
>>     version:3>
>> Sadly, dstbase is the PARM_DECL for d.  That's where things are going
>> "wrong".
> 
> As Richard observed, that's because get_attr_nonstring_decl()
> returns the DECL that the expression refers to.  It does that
> because that's where it looks for attribute nonstring, and so
> that the warning can mention the DECL with the attribute.
> 
> I suppose since I'm not supposed to be using SSA_NAME_VAR
> (I still don't understand why it's taboo) I'll have to avoid
> using the get_attr_nonstring_decl() return value and instead
> look into comparing the SSA_NAMEs.
It's not generally useful because it has no dataflow information
associated with it.  SSA_NAMEs are what carry dataflow information and
what you need to check if you want to know if two objects are the same.

SSA_NAME_VAR's primary use is for diagnostic messages and debugging.  We
do hang attributes off the _DECL node it refers to, so you can take an
SSA_NAME and query its SSA_NAME_VAR if you need to check whether the
SSA_NAME has a particular attribute property.  But if you're trying to
see if two objects in the IL are the same, you need to be looking at the
SSA_NAME.
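
To make the distinction concrete, here is a toy model in plain C.  It
is only a sketch: the toy_* types and functions are made up for
illustration and are not GCC's tree or gimple API.  The assumption it
encodes is the one described above: the variable behind an SSA name
answers attribute queries, while only the SSA name itself identifies
a value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a _DECL node: carries the name and attributes.  */
struct toy_decl
{
  const char *name;
  bool nonstring;
};

/* Toy stand-in for an SSA_NAME: one version of a value, optionally
   associated with a user variable (VAR is null for temporaries).  */
struct toy_ssa_name
{
  const struct toy_decl *var;
  unsigned version;
};

/* True if A and B are the same value.  Only the SSA name itself can
   answer this question.  */
bool
toy_same_value (const struct toy_ssa_name *a, const struct toy_ssa_name *b)
{
  assert (a && b);
  return a == b;
}

/* True if A and B were created for the same user variable.  This says
   nothing about whether they hold the same value.  */
bool
toy_same_var (const struct toy_ssa_name *a, const struct toy_ssa_name *b)
{
  assert (a && b);
  return a->var != NULL && a->var == b->var;
}

/* Attribute query: the job the underlying variable is suited for.  */
bool
toy_declared_nonstring (const struct toy_ssa_name *n)
{
  assert (n);
  return n->var != NULL && n->var->nonstring;
}
```

Two versions of the same variable (say d_3 and d_5) pass toy_same_var
but fail toy_same_value, which is why comparing the variables cannot
prove that a store writes to the destination of the strncpy call.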

jeff
Martin Sebor Aug. 29, 2018, 12:12 a.m. UTC | #12
>> Sadly, dstbase is the PARM_DECL for d.  That's where things are going
>> "wrong".  Not sure why you're getting the PARM_DECL in that case.  I'd
>> debug get_addr_base_and_unit_offset to understand what's going on.
>> Essentially you're getting different results of
>> get_addr_base_and_unit_offset in a case where they arguably should be
>> the same.
>
> Probably get_attr_nonstring_decl has the same "mistake" and returns
> the PARM_DECL instead of the SSA name pointer.  So we're comparing
> apples and oranges here.

Returning the SSA_NAME_VAR from get_attr_nonstring_decl() is
intentional but the function need not (perhaps should not)
also set *REF to it.

>
> Yeah:
>
> /* If EXPR refers to a character array or pointer declared attribute
>    nonstring return a decl for that array or pointer and set *REF to
>    the referenced enclosing object or pointer.  Otherwise returns
>    null.  */
>
> tree
> get_attr_nonstring_decl (tree expr, tree *ref)
> {
>   tree decl = expr;
>   if (TREE_CODE (decl) == SSA_NAME)
>     {
>       gimple *def = SSA_NAME_DEF_STMT (decl);
>
>       if (is_gimple_assign (def))
>         {
>           tree_code code = gimple_assign_rhs_code (def);
>           if (code == ADDR_EXPR
>               || code == COMPONENT_REF
>               || code == VAR_DECL)
>             decl = gimple_assign_rhs1 (def);
>         }
>       else if (tree var = SSA_NAME_VAR (decl))
>         decl = var;
>     }
>
>   if (TREE_CODE (decl) == ADDR_EXPR)
>     decl = TREE_OPERAND (decl, 0);
>
>   if (ref)
>     *ref = decl;
>
> I see a lot of "magic" here again in the attempt to "propagate"
> a nonstring attribute.

That's the function's purpose: to look for the attribute.  Is
there a better way to do this?

> Note
>
> void foo (char *p __attribute__((nonstring)))
> {
>   p = "bar";
>   strlen (p); // or whatever is necessary to call get_attr_nonstring_decl
> }
>
> is perfectly valid and p as passed to strlen is _not_ nonstring(?).

I don't know if you're saying that it should get a warning or
shouldn't.  Right now it doesn't because the strlen() call is
folded before we check for nonstring.

I could see an argument for diagnosing it but I suspect you
wouldn't like it because it would mean more warnings from
the folder.  I could also see an argument against it because,
as you said, it's safe.

If you take the assignment to p away then a warning is issued,
and that's because p is declared with attribute nonstring.
That's also why get_attr_nonstring_decl looks at SSA_NAME_VAR.
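
For reference, a minimal sketch of a pointer parameter declared
nonstring and used the way the attribute intends, i.e., as a buffer
that is not assumed to be nul-terminated.  The function name
bounded_len is hypothetical; it is not part of the patch or the test
suite.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Return the number of bytes before the first nul in the N-byte
   buffer P, or N if there is none.  P is declared nonstring, so it
   is never passed to a function that expects a terminated string.  */
size_t
bounded_len (const char *p __attribute__ ((nonstring)), size_t n)
{
  assert (p != NULL);
  const char *nul = memchr (p, 0, n);
  return nul ? (size_t) (nul - p) : n;
}
```

Here the only place the attribute is recorded is the declaration of
the parameter, which is what the SSA_NAME_VAR lookup is meant to find.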

> I think in your code comparing bases you want to look at the _original_
> argument to the string function rather than what get_attr_nonstring_decl
> returned as ref.

I've adjusted get_attr_nonstring_decl() to avoid setting *REF
to SSA_NAME_VAR.  That let me remove the GIMPLE_NOP code from
the patch.  I've also updated the comment above SSA_NAME_VAR
to clarify its purpose per Jeff's comments.

Attached is an updated revision with these changes.

Martin
PR tree-optimization/87028 - false positive -Wstringop-truncation strncpy with global variable source string
gcc/ChangeLog:

	PR tree-optimization/87028
	* calls.c (get_attr_nonstring_decl): Avoid setting *REF to
	SSA_NAME_VAR.
	* gimple-fold.c (gimple_fold_builtin_strncpy): Avoid folding
	when statement doesn't belong to a basic block.
	* tree.h (SSA_NAME_VAR): Update comment.
	* tree-ssa-strlen.c (maybe_diag_stxncpy_trunc): Simplify.

gcc/testsuite/ChangeLog:

	PR tree-optimization/87028
	* c-c++-common/Wstringop-truncation.c: Remove xfails.
	* gcc.dg/Wstringop-truncation-5.c: New test.

Index: gcc/tree.h
===================================================================
--- gcc/tree.h	(revision 263925)
+++ gcc/tree.h	(working copy)
@@ -1697,7 +1697,10 @@ extern tree maybe_wrap_with_location (tree, locati
    : NULL_TREE)
 
 /* Returns the variable being referenced.  This can be NULL_TREE for
-   temporaries not associated with any user variable.
+   temporaries not associated with any user variable.  The result
+   is mainly useful for debugging, diagnostics, or as the target
+   declaration referenced by an SSA_NAME.  Otherwise, because
+   it has no dataflow information, it should not be used.
    Once released, this is the only field that can be relied upon.  */
 #define SSA_NAME_VAR(NODE)					\
   (SSA_NAME_CHECK (NODE)->ssa_name.var == NULL_TREE		\
Index: gcc/calls.c
===================================================================
--- gcc/calls.c	(revision 263928)
+++ gcc/calls.c	(working copy)
@@ -1503,6 +1503,7 @@ tree
 get_attr_nonstring_decl (tree expr, tree *ref)
 {
   tree decl = expr;
+  tree var = NULL_TREE;
   if (TREE_CODE (decl) == SSA_NAME)
     {
       gimple *def = SSA_NAME_DEF_STMT (decl);
@@ -1515,17 +1516,25 @@ get_attr_nonstring_decl (tree expr, tree *ref)
 	      || code == VAR_DECL)
 	    decl = gimple_assign_rhs1 (def);
 	}
-      else if (tree var = SSA_NAME_VAR (decl))
-	decl = var;
+      else
+	var = SSA_NAME_VAR (decl);
     }
 
   if (TREE_CODE (decl) == ADDR_EXPR)
     decl = TREE_OPERAND (decl, 0);
 
+  /* To simplify calling code, store the referenced DECL regardless of
+     the attribute determined below, but avoid storing the SSA_NAME_VAR
+     obtained above (it's not useful for dataflow purposes).  */
   if (ref)
     *ref = decl;
 
-  if (TREE_CODE (decl) == ARRAY_REF)
+  /* Use the SSA_NAME_VAR that was determined above to see if it's
+     declared nonstring.  Otherwise drill down into the referenced
+     DECL.  */
+  if (var)
+    decl = var;
+  else if (TREE_CODE (decl) == ARRAY_REF)
     decl = TREE_OPERAND (decl, 0);
   else if (TREE_CODE (decl) == COMPONENT_REF)
     decl = TREE_OPERAND (decl, 1);
Index: gcc/gimple-fold.c
===================================================================
--- gcc/gimple-fold.c	(revision 263925)
+++ gcc/gimple-fold.c	(working copy)
@@ -1702,6 +1702,11 @@ gimple_fold_builtin_strncpy (gimple_stmt_iterator
   if (tree_int_cst_lt (ssize, len))
     return false;
 
+  /* Defer warning (and folding) until the next statement in the basic
+     block is reachable.  */
+  if (!gimple_bb (stmt))
+    return false;
+
   /* Diagnose truncation that leaves the copy unterminated.  */
   maybe_diag_stxncpy_trunc (*gsi, src, len);
 
Index: gcc/tree-ssa-strlen.c
===================================================================
--- gcc/tree-ssa-strlen.c	(revision 263925)
+++ gcc/tree-ssa-strlen.c	(working copy)
@@ -1904,8 +1904,6 @@ maybe_diag_stxncpy_trunc (gimple_stmt_iterator gsi
   if (TREE_CODE (dstdecl) == ADDR_EXPR)
     dstdecl = TREE_OPERAND (dstdecl, 0);
 
-  tree ref = NULL_TREE;
-
   if (!sidx)
     {
       /* If the source is a non-string return early to avoid warning
@@ -1914,12 +1912,14 @@ maybe_diag_stxncpy_trunc (gimple_stmt_iterator gsi
       tree srcdecl = gimple_call_arg (stmt, 1);
       if (TREE_CODE (srcdecl) == ADDR_EXPR)
 	srcdecl = TREE_OPERAND (srcdecl, 0);
-      if (get_attr_nonstring_decl (srcdecl, &ref))
+      if (get_attr_nonstring_decl (srcdecl, NULL))
 	return false;
     }
 
-  /* Likewise, if the destination refers to a an array/pointer declared
-     nonstring return early.  */
+  /* Likewise, if the destination refers to an array/pointer declared
+     nonstring return early.  REF will be set to the referenced enclosing
+     object or pointer either way.  */
+  tree ref;
   if (get_attr_nonstring_decl (dstdecl, &ref))
     return false;
 
Index: gcc/testsuite/c-c++-common/Wstringop-truncation.c
===================================================================
--- gcc/testsuite/c-c++-common/Wstringop-truncation.c	(revision 263925)
+++ gcc/testsuite/c-c++-common/Wstringop-truncation.c	(working copy)
@@ -329,9 +329,8 @@ void test_strncpy_array (Dest *pd, int i, const ch
      of the array to NUL is not diagnosed.  */
   {
     /* This might be better written using memcpy() but it's safe so
-       it probably shouldn't be diagnosed.  It currently triggers
-       a warning because of bug 81704.  */
-    strncpy (dst7, "0123456", sizeof dst7);   /* { dg-bogus "\\\[-Wstringop-truncation]" "bug 81704" { xfail *-*-* } } */
+       it isn't diagnosed.  See pr81704 and pr87028.  */
+    strncpy (dst7, "0123456", sizeof dst7);   /* { dg-bogus "\\\[-Wstringop-truncation]" } */
     dst7[sizeof dst7 - 1] = '\0';
     sink (dst7);
   }
@@ -350,7 +349,7 @@ void test_strncpy_array (Dest *pd, int i, const ch
   }
 
   {
-    strncpy (pd->a5, "01234", sizeof pd->a5);   /* { dg-bogus "\\\[-Wstringop-truncation]" "bug 81704" { xfail *-*-* } } */
+    strncpy (pd->a5, "01234", sizeof pd->a5);   /* { dg-bogus "\\\[-Wstringop-truncation]" } */
     pd->a5[sizeof pd->a5 - 1] = '\0';
     sink (pd);
   }
Index: gcc/testsuite/gcc.dg/Wstringop-truncation-5.c
===================================================================
--- gcc/testsuite/gcc.dg/Wstringop-truncation-5.c	(nonexistent)
+++ gcc/testsuite/gcc.dg/Wstringop-truncation-5.c	(working copy)
@@ -0,0 +1,64 @@
+/* PR tree-optimization/87028 - false positive -Wstringop-truncation
+   strncpy with global variable source string
+   { dg-do compile }
+   { dg-options "-O2 -Wstringop-truncation" } */
+
+char *strncpy (char *, const char *, __SIZE_TYPE__);
+
+#define STR   "1234567890"
+
+struct S
+{
+  char a[5], b[5];
+};
+
+const char arr[] = STR;
+const char* const ptr = STR;
+
+const char arr2[][10] = { "123", STR };
+
+void test_literal (struct S *s)
+{
+  strncpy (s->a, STR, sizeof s->a - 1);     /* { dg-bogus "\\\[-Wstringop-truncation]" } */
+  s->a[sizeof s->a - 1] = '\0';
+}
+
+void test_global_arr (struct S *s)
+{
+  strncpy (s->a, arr, sizeof s->a - 1);     /* { dg-bogus "\\\[-Wstringop-truncation]" } */
+  s->a [sizeof s->a - 1] = '\0';
+}
+
+void test_global_arr2 (struct S *s)
+{
+  strncpy (s->a, arr2[1], sizeof s->a - 1); /* { dg-bogus "\\\[-Wstringop-truncation]" } */
+  s->a [sizeof s->a - 1] = '\0';
+
+  strncpy (s->b, arr2[0], sizeof s->a - 1);
+}
+
+void test_global_ptr (struct S *s)
+{
+  strncpy (s->a, ptr, sizeof s->a - 1);     /* { dg-bogus "\\\[-Wstringop-truncation]" } */
+  s->a [sizeof s->a - 1] = '\0';
+}
+
+void test_local_arr (struct S *s)
+{
+  const char arr[] = STR;
+  strncpy (s->a, arr, sizeof s->a - 1);
+  s->a [sizeof s->a - 1] = '\0';
+}
+
+void test_local_ptr (struct S *s)
+{
+  const char* const ptr = STR;
+  strncpy (s->a, ptr, sizeof s->a - 1);     /* { dg-bogus "\\\[-Wstringop-truncation]" } */
+  s->a [sizeof s->a - 1] = '\0';
+}
+
+void test_compound_literal (struct S *s)
+{
+  strncpy (s->a, (char[]){ STR }, sizeof s->a - 1);
+  s->a [sizeof s->a - 1] = '\0';
+}
Richard Biener Aug. 29, 2018, 7:29 a.m. UTC | #13
On Wed, Aug 29, 2018 at 2:12 AM Martin Sebor <msebor@gmail.com> wrote:
>
> >> Sadly, dstbase is the PARM_DECL for d.  That's where things are going
> >> "wrong".  Not sure why you're getting the PARM_DECL in that case.  I'd
> >> debug get_addr_base_and_unit_offset to understand what's going on.
> >> Essentially you're getting different results of
> >> get_addr_base_and_unit_offset in a case where they arguably should be
> >> the same.
> >
> > Probably get_attr_nonstring_decl has the same "mistake" and returns
> > the PARM_DECL instead of the SSA name pointer.  So we're comparing
> > apples and oranges here.
>
> Returning the SSA_NAME_VAR from get_attr_nonstring_decl() is
> intentional but the function need not (perhaps should not)
> also set *REF to it.
>
> >
> > Yeah:
> >
> > /* If EXPR refers to a character array or pointer declared attribute
> >    nonstring return a decl for that array or pointer and set *REF to
> >    the referenced enclosing object or pointer.  Otherwise returns
> >    null.  */
> >
> > tree
> > get_attr_nonstring_decl (tree expr, tree *ref)
> > {
> >   tree decl = expr;
> >   if (TREE_CODE (decl) == SSA_NAME)
> >     {
> >       gimple *def = SSA_NAME_DEF_STMT (decl);
> >
> >       if (is_gimple_assign (def))
> >         {
> >           tree_code code = gimple_assign_rhs_code (def);
> >           if (code == ADDR_EXPR
> >               || code == COMPONENT_REF
> >               || code == VAR_DECL)
> >             decl = gimple_assign_rhs1 (def);
> >         }
> >       else if (tree var = SSA_NAME_VAR (decl))
> >         decl = var;
> >     }
> >
> >   if (TREE_CODE (decl) == ADDR_EXPR)
> >     decl = TREE_OPERAND (decl, 0);
> >
> >   if (ref)
> >     *ref = decl;
> >
> > I see a lot of "magic" here again in the attempt to "propagate"
> > a nonstring attribute.
>
> That's the function's purpose: to look for the attribute.  Is
> there a better way to do this?

Well, the question is what "nonstring" means, semantically.  I read it
as something like __restrict - a pointer with the "nonstring" attribute
points to a non-string.  So I suspect your function either computes
"may expr point to a nonstring" or "must expr point to a nonstring"
if it gets a pointer argument.  If it gets a (string) object it checks
whether that object is declared "nonstring" (thus, if you'd built
a pointer to expr, whether that pointer _must_ point to a nonstring).
So I guess the first one is "must".  Clearly looking at SSA_NAME_VAR
isn't good here, it would be semantically correct only for
SSA_NAME_IS_DEFAULT_DEF and SSA_NAME_VAR being a PARM_DECL.

I guess it would be nice to clearly separate the pointer vs. object case
by documentation in the function - all of the quoted parts above seem
to be for the address case so a
gcc_assert (POINTER_TYPE_P (TREE_TYPE (decl)))
inside the if (TREE_CODE (decl) == SSA_NAME) path should never trigger?

> > Note
> >
> > void foo (char *p __attribute__((nonstring)))
> > {
> >   p = "bar";
> >   strlen (p); // or whatever is necessary to call get_attr_nonstring_decl
> > }
> >
> > is perfectly valid and p as passed to strlen is _not_ nonstring(?).
>
> I don't know if you're saying that it should get a warning or
> shouldn't.  Right now it doesn't because the strlen() call is
> folded before we check for nonstring.

I say it shouldn't because I assign "bar" to p and after that p isn't
the original parameter anymore?
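
A small runnable variant of the same point, as a sketch only: the
function name two_values is made up, and the SSA names in the
comments are illustrative rather than taken from an actual dump.
The attribute is deliberately left off so the example compiles
without the warning under discussion; the point is just that one
parameter can stand for more than one value.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One user variable P, two different values over its lifetime.  */
size_t
two_values (char *p, int replace)
{
  assert (p != NULL);
  /* Uses the incoming value of the parameter, e.g. p_1(D).  */
  size_t n = strlen (p);
  if (replace)
    /* Defines a new value, e.g. p_2, unrelated to the caller's
       argument even though the variable is still P.  */
    p = "bar";
  /* Uses whichever value reaches here, e.g. PHI <p_1(D), p_2>.  */
  return n + strlen (p);
}
```

Anything declared on the parameter describes what the caller passed
in, so it can only be relied on for the default definition.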

> I could see an argument for diagnosing it but I suspect you
> wouldn't like it because it would mean more warnings from
> the folder.  I could also see an argument against it because,
> as you said, it's safe.
>
> If you take the assignment to p away then a warning is issued,
> and that's because p is declared with attribute nonstring.
> That's also why get_attr_nonstring_decl looks at SSA_NAME_VAR.
>
> > I think in your code comparing bases you want to look at the _original_
> > argument to the string function rather than what get_attr_nonstring_decl
> > returned as ref.
>
> I've adjusted get_attr_nonstring_decl() to avoid setting *REF
> to SSA_NAME_VAR.  That let me remove the GIMPLE_NOP code from
> the patch.  I've also updated the comment above SSA_NAME_VAR
> to clarify its purpose per Jeff's comments.
>
> Attached is an updated revision with these changes.
>
> Martin
Martin Sebor Aug. 29, 2018, 3:43 p.m. UTC | #14
On 08/29/2018 01:29 AM, Richard Biener wrote:
> On Wed, Aug 29, 2018 at 2:12 AM Martin Sebor <msebor@gmail.com> wrote:
>>
>>>> Sadly, dstbase is the PARM_DECL for d.  That's where things are going
>>>> "wrong".  Not sure why you're getting the PARM_DECL in that case.  I'd
>>>> debug get_addr_base_and_unit_offset to understand what's going on.
>>>> Essentially you're getting different results of
>>>> get_addr_base_and_unit_offset in a case where they arguably should be
>>>> the same.
>>>
>>> Probably get_attr_nonstring_decl has the same "mistake" and returns
>>> the PARM_DECL instead of the SSA name pointer.  So we're comparing
>>> apples and oranges here.
>>
>> Returning the SSA_NAME_VAR from get_attr_nonstring_decl() is
>> intentional but the function need not (perhaps should not)
>> also set *REF to it.
>>
>>>
>>> Yeah:
>>>
>>> /* If EXPR refers to a character array or pointer declared attribute
>>>    nonstring return a decl for that array or pointer and set *REF to
>>>    the referenced enclosing object or pointer.  Otherwise returns
>>>    null.  */
>>>
>>> tree
>>> get_attr_nonstring_decl (tree expr, tree *ref)
>>> {
>>>   tree decl = expr;
>>>   if (TREE_CODE (decl) == SSA_NAME)
>>>     {
>>>       gimple *def = SSA_NAME_DEF_STMT (decl);
>>>
>>>       if (is_gimple_assign (def))
>>>         {
>>>           tree_code code = gimple_assign_rhs_code (def);
>>>           if (code == ADDR_EXPR
>>>               || code == COMPONENT_REF
>>>               || code == VAR_DECL)
>>>             decl = gimple_assign_rhs1 (def);
>>>         }
>>>       else if (tree var = SSA_NAME_VAR (decl))
>>>         decl = var;
>>>     }
>>>
>>>   if (TREE_CODE (decl) == ADDR_EXPR)
>>>     decl = TREE_OPERAND (decl, 0);
>>>
>>>   if (ref)
>>>     *ref = decl;
>>>
>>> I see a lot of "magic" here again in the attempt to "propagate"
>>> a nonstring attribute.
>>
>> That's the function's purpose: to look for the attribute.  Is
>> there a better way to do this?
>
> Well, the question is what "nonstring" is, semantically.  I read it
> as something like __restrict - a pointer with the "nonstring" attribute
> points to a non-string.  So I suspect your function either computes
> "may expr point to a nonstring" or "must expr point to a nonstring"
> if it gets a pointer argument.  If it gets a (string) object it checks whether
> that object is declared "nonstring" (thus, if you'd built a pointer to expr,
> whether that pointer _must_ point to a nonstring).  So I guess the first
> one is "must".  Clearly looking at SSA_NAME_VAR isn't good here;
> it would be semantically correct only for SSA_NAME_IS_DEFAULT_DEF
> and SSA_NAME_VAR being a PARM_DECL.
>
> I guess it would be nice to clearly separate the pointer vs. object case
> by documentation in the function - all of the quoted parts above seem
> to be for the address case so a gcc_assert (POINTER_TYPE_P (TREE_TYPE (decl)))
> inside the if (TREE_CODE (decl) == SSA_NAME) path should never trigger?

Attribute nonstring on either an array or a pointer decl means
"it need not be a nul-terminated string."  I.e., it's just
a sequence of bytes.  If it happens to have a nul in it then it
is a string.   I don't think of the pointer case as different
from the array.

The get_attr_nonstring_decl() function isn't a predicate telling
us whether or not an expression refers to a string.  It returns
non-null if the expression refers to an object declared nonstring.
Whether what the object contains/points to is in fact a string is
determined somewhere else.
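
To make the "just a sequence of bytes" reading concrete, here is
a minimal sketch, not taken from the patch.  The struct and function
names are made up for illustration; only the attribute itself is
GCC's.  The field is filled without a guaranteed terminating nul and
is only ever read with an explicit bound.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical record with a fixed-width name field that need not
   be nul-terminated.  */
struct rec
{
  char name[4] __attribute__ ((nonstring));
};

/* Fill NAME from S, truncating or zero-padding to the field width.
   No terminating nul is stored when S has 4 or more characters.  */
void
set_name (struct rec *r, const char *s)
{
  assert (r != NULL && s != NULL);
  strncpy (r->name, s, sizeof r->name);
}

/* Return the number of leading non-nul bytes in NAME without ever
   reading past the end of the array.  */
size_t
name_len (const struct rec *r)
{
  assert (r != NULL);
  const char *nul = memchr (r->name, 0, sizeof r->name);
  return nul ? (size_t) (nul - r->name) : sizeof r->name;
}
```

Whether the field happens to hold a string at a given moment depends
on what was stored, which is why the bounded memchr() is used instead
of strlen().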

>
>>> Note
>>>
>>> foo (char *p __attribute__(("nonstring")))
>>> {
>>>   p = "bar";
>>>   strlen (p); // or whatever is necessary to call get_attr_nonstring_decl
>>> }
>>>
>>> is perfectly valid and p as passed to strlen is _not_ nonstring(?).
>>
>> I don't know if you're saying that it should get a warning or
>> shouldn't.  Right now it doesn't because the strlen() call is
>> folded before we check for nonstring.
>
> I say it shouldn't because I assign "bar" to p and after that p isn't
> the original parameter anymore?

I agree with not warning here, but I don't think of p's nonstring
property as changing with an assignment.  It's still nonstring,
we just know that what it points to at the moment is a string.
If the code were instead:

   extern char a[];
   p = a;
   return strlen (a) + strlen (p);

a warning would be expected for strlen (p) because p is declared
to point to what need not be a string.  A warning would not be
expected for strlen (a) because it is not declared nonstring so
when we don't know, the assumption is that it is a string.

Does that make sense?
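
The rule described above can be sketched as a toy model.  This is
not GCC code: the toy_* types and the function are hypothetical
stand-ins, and the known_string flag is an assumption standing in for
whatever the compiler has learned about the current value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a declaration; not GCC's tree type.  */
struct toy_decl
{
  const char *name;
  bool nonstring;      /* Declared with attribute nonstring?  */
};

/* Toy stand-in for a string-function argument: the declaration it
   is spelled with, plus what is known about its current value.  */
struct toy_arg
{
  const struct toy_decl *decl;
  bool known_string;   /* E.g., just assigned a string literal.  */
};

/* Declaration-based rule: diagnose when the argument's declaration
   is nonstring, unless its current value is known to be a string.
   When nothing is known, a plain declaration is assumed to hold
   a string.  */
bool
toy_warn_strlen (struct toy_arg arg)
{
  assert (arg.decl != NULL);
  return arg.decl->nonstring && !arg.known_string;
}
```

With p declared nonstring and a declared without the attribute, the
model warns for strlen (p) after p = a, stays quiet for strlen (p)
after p = "bar", and stays quiet for strlen (a).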

Martin

PS Since restrict is a property of a pointer and part of the type
system, while nonstring is a property of what the pointer points to
and not part of the type system, I don't think of them as similar.  In
my mind, nonstring is analogous to the notions of object constness
and volatility (but not the const and volatile qualifiers).  It's
okay to assign the address of a const object to a non-const pointer,
but it's an error to try to modify the object through the pointer.
(It would be nice to add a warning to detect these kinds of errors
as well.)

>
>> I could see an argument for diagnosing it but I suspect you
>> wouldn't like it because it would mean more warning from
>> the folder.  I could also see an argument against it because,
>> as you said, it's safe.
>>
>> If you take the assignment to p away then a warning is issued,
>> and that's because p is declared with attribute nonstring.
>> That's also why get_attr_nonstring_decl looks at SSA_NAME_VAR.
>>
>>> I think in your code comparing bases you want to look at the _original_
>>> argument to the string function rather than what get_attr_nonstring_decl
>>> returned as ref.
>>
>> I've adjusted get_attr_nonstring_decl() to avoid setting *REF
>> to SSA_NAME_VAR.  That let me remove the GIMPLE_NOP code from
>> the patch.  I've also updated the comment above SSA_NAME_VAR
>> to clarify its purpose per Jeff's comments.
>>
>> Attached is an updated revision with these changes.
>>
>> Martin
Jeff Law Aug. 30, 2018, 12:27 a.m. UTC | #15
On 08/28/2018 06:12 PM, Martin Sebor wrote:
>>> Sadly, dstbase is the PARM_DECL for d.  That's where things are going
>>> "wrong".  Not sure why you're getting the PARM_DECL in that case.  I'd
>>> debug get_addr_base_and_unit_offset to understand what's going on.
>>> Essentially you're getting different results of
>>> get_addr_base_and_unit_offset in a case where they arguably should be
>>> the same.
>>
>> Probably get_attr_nonstring_decl has the same "mistake" and returns
>> the PARM_DECL instead of the SSA name pointer.  So we're comparing
>> apples and oranges here.
> 
> Returning the SSA_NAME_VAR from get_attr_nonstring_decl() is
> intentional but the function need not (perhaps should not)
> also set *REF to it.
> 
>>
>> Yeah:
>>
>> /* If EXPR refers to a character array or pointer declared attribute
>>    nonstring return a decl for that array or pointer and set *REF to
>>    the referenced enclosing object or pointer.  Otherwise returns
>>    null.  */
>>
>> tree
>> get_attr_nonstring_decl (tree expr, tree *ref)
>> {
>>   tree decl = expr;
>>   if (TREE_CODE (decl) == SSA_NAME)
>>     {
>>       gimple *def = SSA_NAME_DEF_STMT (decl);
>>
>>       if (is_gimple_assign (def))
>>         {
>>           tree_code code = gimple_assign_rhs_code (def);
>>           if (code == ADDR_EXPR
>>               || code == COMPONENT_REF
>>               || code == VAR_DECL)
>>             decl = gimple_assign_rhs1 (def);
>>         }
>>       else if (tree var = SSA_NAME_VAR (decl))
>>         decl = var;
>>     }
>>
>>   if (TREE_CODE (decl) == ADDR_EXPR)
>>     decl = TREE_OPERAND (decl, 0);
>>
>>   if (ref)
>>     *ref = decl;
>>
>> I see a lot of "magic" here again in the attempt to "propagate"
>> a nonstring attribute.
> 
> That's the function's purpose: to look for the attribute.  Is
> there a better way to do this?
Well, there's a distinction between looking for the attribute (which
will be on the _DECL node) and determining if the current instance (an
SSA_NAME) has that attribute.

What I think Richard is implying is that it might be better to propagate
the state of the attribute to instances rather than going from an
SSA_NAME backwards through the use-def chains or SSA_NAME_VAR to get to
a potentially related _DECL node.

This could be built into the alias oracle, or via a propagation engine.
In either approach you should be able to cut down on false positives as
well as false negatives.

> 
>> Note
>>
>> foo (char *p __attribute__(("nonstring")))
>> {
>>   p = "bar";
>>   strlen (p); // or whatever is necessary to call get_attr_nonstring_decl
>> }
>>
>> is perfectly valid and p as passed to strlen is _not_ nonstring(?).
> 
> I don't know if you're saying that it should get a warning or
> shouldn't.  Right now it doesn't because the strlen() call is
> folded before we check for nonstring.
> 
> I could see an argument for diagnosing it but I suspect you
> wouldn't like it because it would mean more warning from
> the folder.  I could also see an argument against it because,
> as you said, it's safe.
Well, this is where propagating the bit would help.  The assignment p =
"bar" would clobber the nonstring property because we know "bar" is
properly terminated. Pointer arithmetic, casts and the like would
preserve the property and so on.

If it were done via the aliasing oracle, the instance of P in the strlen
call would be known to point to a proper string and thus the call is safe.
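
As a rough sketch of what such propagation could look like, here
is a toy transfer function.  It is not GCC code; the enum names and
the set of operations are assumptions chosen to mirror the cases
mentioned above (a literal assignment clobbers the property, pointer
arithmetic and casts preserve it).

```c
#include <assert.h>
#include <stdbool.h>

/* What is known about the bytes a pointer instance refers to.  */
enum ns_state { NS_UNKNOWN, NS_STRING, NS_NONSTRING };

/* Hypothetical operations that define a new pointer instance.  */
enum ns_op { OP_ASSIGN_LITERAL, OP_PTR_PLUS, OP_CAST, OP_COPY };

/* Compute the state of the result of OP given the state IN of its
   pointer operand.  */
enum ns_state
ns_transfer (enum ns_op op, enum ns_state in)
{
  switch (op)
    {
    case OP_ASSIGN_LITERAL:
      /* A string literal is known to be nul-terminated, so the
	 previous state is clobbered.  */
      return NS_STRING;
    case OP_PTR_PLUS:
    case OP_CAST:
    case OP_COPY:
      /* These preserve whatever was known about the operand.  */
      return in;
    }
  return in;
}

/* Diagnose a strlen-like call only on an instance known to refer
   to a nonstring.  */
bool
ns_warn_strlen (enum ns_state s)
{
  return s == NS_NONSTRING;
}
```

In the foo example, the parameter starts out as NS_NONSTRING, the
assignment of "bar" moves the instance to NS_STRING, and the strlen
call is then not diagnosed.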

Hope this helps...


Jeff
Richard Biener Aug. 30, 2018, 8:47 a.m. UTC | #16
On Thu, Aug 30, 2018 at 2:27 AM Jeff Law <law@redhat.com> wrote:
>
> On 08/28/2018 06:12 PM, Martin Sebor wrote:
> >>> Sadly, dstbase is the PARM_DECL for d.  That's where things are going
> >>> "wrong".  Not sure why you're getting the PARM_DECL in that case.  I'd
> >>> debug get_addr_base_and_unit_offset to understand what's going on.
> >>> Essentially you're getting different results of
> >>> get_addr_base_and_unit_offset in a case where they arguably should be
> >>> the same.
> >>
> >> Probably get_attr_nonstring_decl has the same "mistake" and returns
> >> the PARM_DECL instead of the SSA name pointer.  So we're comparing
> >> apples and oranges here.
> >
> > Returning the SSA_NAME_VAR from get_attr_nonstring_decl() is
> > intentional but the function need not (perhaps should not)
> > also set *REF to it.
> >
> >>
> >> Yeah:
> >>
> >> /* If EXPR refers to a character array or pointer declared attribute
> >>    nonstring return a decl for that array or pointer and set *REF to
> >>    the referenced enclosing object or pointer.  Otherwise returns
> >>    null.  */
> >>
> >> tree
> >> get_attr_nonstring_decl (tree expr, tree *ref)
> >> {
> >>   tree decl = expr;
> >>   if (TREE_CODE (decl) == SSA_NAME)
> >>     {
> >>       gimple *def = SSA_NAME_DEF_STMT (decl);
> >>
> >>       if (is_gimple_assign (def))
> >>         {
> >>           tree_code code = gimple_assign_rhs_code (def);
> >>           if (code == ADDR_EXPR
> >>               || code == COMPONENT_REF
> >>               || code == VAR_DECL)
> >>             decl = gimple_assign_rhs1 (def);
> >>         }
> >>       else if (tree var = SSA_NAME_VAR (decl))
> >>         decl = var;
> >>     }
> >>
> >>   if (TREE_CODE (decl) == ADDR_EXPR)
> >>     decl = TREE_OPERAND (decl, 0);
> >>
> >>   if (ref)
> >>     *ref = decl;
> >>
> >> I see a lot of "magic" here again in the attempt to "propagate"
> >> a nonstring attribute.
> >
> > That's the function's purpose: to look for the attribute.  Is
> > there a better way to do this?
> Well, there's a distinction between looking for the attribute (which
> will be on the _DECL node) and determining if the current instance (an
> SSA_NAME) has that attribute.
>
> What I think Richard is implying is that it might be better to propagate
> the state of the attribute to instances rather than going from an
> SSA_NAME backwards through the use-def chains or SSA_NAME_VAR to get to
> a potentially related _DECL node.
>
> This could be built into the alias oracle, or via a propagation engine.
> In either approach you should be able to cut down on false positives as
> well as false negatives.

It's more like the underlying decl of a SSA name doesn't guarantee you
the entity was originally related to that decl.

Maybe we should be more strict here because we use the underlying
decl for debug info purposes.

Given there are really no semantics to the attribute and it just suppresses
warnings I'm OK with looking at the underlying decl.  Yes, propagating
would eventually improve things but it might be overkill at the same time
(just costing compile-time).

> >
> >> Note
> >>
> >> foo (char *p __attribute__(("nonstring")))
> >> {
> >>   p = "bar";
> >>   strlen (p); // or whatever is necessary to call get_attr_nonstring_decl
> >> }
> >>
> >> is perfectly valid and p as passed to strlen is _not_ nonstring(?).
> >
> > I don't know if you're saying that it should get a warning or
> > shouldn't.  Right now it doesn't because the strlen() call is
> > folded before we check for nonstring.
> >
> > I could see an argument for diagnosing it but I suspect you
> > wouldn't like it because it would mean more warning from
> > the folder.  I could also see an argument against it because,
> > as you said, it's safe.
> Well, this is where propagating the bit would help.  The assignment p =
> "bar" would clobber the nonstring property because we know "bar" is
> properly terminated. Pointer arithmetic, casts and the like would
> preserve the property and so on.
>
> If it were done via the aliasing oracle, the instance of P in the strlen
> call would be known to point to a proper string and thus the call is safe.
>
> Hope this helps...

So to elaborate a bit here - to propagate this kind of attribute
in PTA (for example) you'd need to introduce fake
pointed-to objects (just special ids like nonlocal), nonstring
and string and have "sources" of those generate constraints.
After propagation finished you could then see whether an
SSA name points to either string or nonstring exclusively or
to both and set a bit in the pointer-info according to that
result.

It comes at the cost of increasing points-to bitmaps and
more constraints during propagation.

If you can do with just knowing whether any nonstring source
can be possibly pointed-to the effect on code not using that
attribute would be none.  Just be aware that with points-to
analysis this stuff leaks quite a bit since it is conservative
propagation (may point to nonstring) - separately tracking
may point to string allows you to get an idea of
"must point to nonstring".  But that comes at a cost.

A "must point to" propagator would be a useful thing to have
as well I guess.  That would fit in a value-numbering kind
of framework.
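
A toy illustration of the two fake pointed-to ids and the
may/must distinction follows.  It is only a sketch: the names are
made up, a real points-to set contains many other objects, and
"must" here means nothing more than "only one of the two ids was
ever seen".

```c
#include <assert.h>
#include <stdbool.h>

/* Two fake pointed-to ids, stored as bits of a tiny points-to set.  */
enum { PT_STRING = 1u << 0, PT_NONSTRING = 1u << 1 };
typedef unsigned pt_set;

/* Conservative propagation: at a merge the sets are unioned.  */
pt_set
pt_join (pt_set a, pt_set b)
{
  return a | b;
}

enum pt_verdict
{
  PT_NO_INFO,          /* Neither id reached this pointer.  */
  PT_MUST_STRING,      /* Only the string id did.  */
  PT_MUST_NONSTRING,   /* Only the nonstring id did.  */
  PT_MAY_EITHER        /* Both did.  */
};

/* Summarize a solved set as the bit one might record in the
   pointer info.  */
enum pt_verdict
pt_classify (pt_set s)
{
  assert ((s & ~(unsigned) (PT_STRING | PT_NONSTRING)) == 0);
  bool str = (s & PT_STRING) != 0;
  bool non = (s & PT_NONSTRING) != 0;
  if (str && non)
    return PT_MAY_EITHER;
  if (non)
    return PT_MUST_NONSTRING;
  if (str)
    return PT_MUST_STRING;
  return PT_NO_INFO;
}
```

Tracking only PT_NONSTRING would be cheaper but could answer no more
than "may point to nonstring"; the second bit is what makes the
PT_MUST_NONSTRING verdict possible.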

Richard.

>
>
> Jeff
Martin Sebor Sept. 12, 2018, 3:50 p.m. UTC | #17
PING: https://gcc.gnu.org/ml/gcc-patches/2018-08/msg01818.html

There have been follow up comments in this thread suggesting
alternate designs for the nonstring attribute but (AFAICT) no
objections to the bug fix.  I don't expect to have the time
to redesign and reimplement the attribute for GCC 9 in terms
of the alias oracle as was suggested but I would like to avoid
the warning in the report.

Is the final patch okay to commit?

Martin

On 08/28/2018 06:12 PM, Martin Sebor wrote:
>>> Sadly, dstbase is the PARM_DECL for d.  That's where things are going
>>> "wrong".  Not sure why you're getting the PARM_DECL in that case.  I'd
>>> debug get_addr_base_and_unit_offset to understand what's going on.
>>> Essentially you're getting different results of
>>> get_addr_base_and_unit_offset in a case where they arguably should be
>>> the same.
>>
>> Probably get_attr_nonstring_decl has the same "mistake" and returns
>> the PARM_DECL instead of the SSA name pointer.  So we're comparing
>> apples and oranges here.
>
> Returning the SSA_NAME_VAR from get_attr_nonstring_decl() is
> intentional but the function need not (perhaps should not)
> also set *REF to it.
>
>>
>> Yeah:
>>
>> /* If EXPR refers to a character array or pointer declared attribute
>>    nonstring return a decl for that array or pointer and set *REF to
>>    the referenced enclosing object or pointer.  Otherwise returns
>>    null.  */
>>
>> tree
>> get_attr_nonstring_decl (tree expr, tree *ref)
>> {
>>   tree decl = expr;
>>   if (TREE_CODE (decl) == SSA_NAME)
>>     {
>>       gimple *def = SSA_NAME_DEF_STMT (decl);
>>
>>       if (is_gimple_assign (def))
>>         {
>>           tree_code code = gimple_assign_rhs_code (def);
>>           if (code == ADDR_EXPR
>>               || code == COMPONENT_REF
>>               || code == VAR_DECL)
>>             decl = gimple_assign_rhs1 (def);
>>         }
>>       else if (tree var = SSA_NAME_VAR (decl))
>>         decl = var;
>>     }
>>
>>   if (TREE_CODE (decl) == ADDR_EXPR)
>>     decl = TREE_OPERAND (decl, 0);
>>
>>   if (ref)
>>     *ref = decl;
>>
>> I see a lot of "magic" here again in the attempt to "propagate"
>> a nonstring attribute.
>
> That's the function's purpose: to look for the attribute.  Is
> there a better way to do this?
>
>> Note
>>
>> foo (char *p __attribute__(("nonstring")))
>> {
>>   p = "bar";
>>   strlen (p); // or whatever is necessary to call get_attr_nonstring_decl
>> }
>>
>> is perfectly valid and p as passed to strlen is _not_ nonstring(?).
>
> I don't know if you're saying that it should get a warning or
> shouldn't.  Right now it doesn't because the strlen() call is
> folded before we check for nonstring.
>
> I could see an argument for diagnosing it but I suspect you
> wouldn't like it because it would mean more warning from
> the folder.  I could also see an argument against it because,
> as you said, it's safe.
>
> If you take the assignment to p away then a warning is issued,
> and that's because p is declared with attribute nonstring.
> That's also why get_attr_nonstring_decl looks at SSA_NAME_VAR.
>
>> I think in your code comparing bases you want to look at the _original_
>> argument to the string function rather than what get_attr_nonstring_decl
>> returned as ref.
>
> I've adjusted get_attr_nonstring_decl() to avoid setting *REF
> to SSA_NAME_VAR.  That let me remove the GIMPLE_NOP code from
> the patch.  I've also updated the comment above SSA_NAME_VAR
> to clarify its purpose per Jeff's comments.
>
> Attached is an updated revision with these changes.
>
> Martin
Jeff Law Sept. 18, 2018, 1:30 a.m. UTC | #18
On 8/28/18 6:12 PM, Martin Sebor wrote:
>>> Sadly, dstbase is the PARM_DECL for d.  That's where things are going
>>> "wrong".  Not sure why you're getting the PARM_DECL in that case.  I'd
>>> debug get_addr_base_and_unit_offset to understand what's going on.
>>> Essentially you're getting different results of
>>> get_addr_base_and_unit_offset in a case where they arguably should be
>>> the same.
>>
>> Probably get_attr_nonstring_decl has the same "mistake" and returns
>> the PARM_DECL instead of the SSA name pointer.  So we're comparing
>> apples and oranges here.
> 
> Returning the SSA_NAME_VAR from get_attr_nonstring_decl() is
> intentional but the function need not (perhaps should not)
> also set *REF to it.
> 
>>
>> Yeah:
>>
>> /* If EXPR refers to a character array or pointer declared attribute
>>    nonstring return a decl for that array or pointer and set *REF to
>>    the referenced enclosing object or pointer.  Otherwise returns
>>    null.  */
>>
>> tree
>> get_attr_nonstring_decl (tree expr, tree *ref)
>> {
>>   tree decl = expr;
>>   if (TREE_CODE (decl) == SSA_NAME)
>>     {
>>       gimple *def = SSA_NAME_DEF_STMT (decl);
>>
>>       if (is_gimple_assign (def))
>>         {
>>           tree_code code = gimple_assign_rhs_code (def);
>>           if (code == ADDR_EXPR
>>               || code == COMPONENT_REF
>>               || code == VAR_DECL)
>>             decl = gimple_assign_rhs1 (def);
>>         }
>>       else if (tree var = SSA_NAME_VAR (decl))
>>         decl = var;
>>     }
>>
>>   if (TREE_CODE (decl) == ADDR_EXPR)
>>     decl = TREE_OPERAND (decl, 0);
>>
>>   if (ref)
>>     *ref = decl;
>>
>> I see a lot of "magic" here again in the attempt to "propagate"
>> a nonstring attribute.
> 
> That's the function's purpose: to look for the attribute.  Is
> there a better way to do this?
> 
>> Note
>>
>> foo (char *p __attribute__(("nonstring")))
>> {
>>   p = "bar";
>>   strlen (p); // or whatever is necessary to call get_attr_nonstring_decl
>> }
>>
>> is perfectly valid and p as passed to strlen is _not_ nonstring(?).
> 
> I don't know if you're saying that it should get a warning or
> shouldn't.  Right now it doesn't because the strlen() call is
> folded before we check for nonstring.
> 
> I could see an argument for diagnosing it but I suspect you
> wouldn't like it because it would mean more warning from
> the folder.  I could also see an argument against it because,
> as you said, it's safe.
> 
> If you take the assignment to p away then a warning is issued,
> and that's because p is declared with attribute nonstring.
> That's also why get_attr_nonstring_decl looks at SSA_NAME_VAR.
> 
>> I think in your code comparing bases you want to look at the _original_
>> argument to the string function rather than what get_attr_nonstring_decl
>> returned as ref.
> 
> I've adjusted get_attr_nonstring_decl() to avoid setting *REF
> to SSA_NAME_VAR.  That let me remove the GIMPLE_NOP code from
> the patch.  I've also updated the comment above SSA_NAME_VAR
> to clarify its purpose per Jeff's comments.
> 
> Attached is an updated revision with these changes.
> 
> Martin
> 
> gcc-87028.diff
> 
> PR tree-optimization/87028 - false positive -Wstringop-truncation strncpy with global variable source string
> gcc/ChangeLog:
> 
> 	PR tree-optimization/87028
> 	* calls.c (get_attr_nonstring_decl): Avoid setting *REF to
> 	SSA_NAME_VAR.
> 	* gimple-fold.c (gimple_fold_builtin_strncpy): Avoid folding
> 	when statement doesn't belong to a basic block.
> 	* tree.h (SSA_NAME_VAR): Update comment.
> 	* tree-ssa-strlen.c (maybe_diag_stxncpy_trunc): Simplify.
> 
> gcc/testsuite/ChangeLog:
> 
> 	PR tree-optimization/87028
> 	* c-c++-common/Wstringop-truncation.c: Remove xfails.
> 	* gcc.dg/Wstringop-truncation-5.c: New test.
> 

> Index: gcc/calls.c
> ===================================================================
> --- gcc/calls.c	(revision 263928)
> +++ gcc/calls.c	(working copy)
> @@ -1503,6 +1503,7 @@ tree
>  get_attr_nonstring_decl (tree expr, tree *ref)
>  {
>    tree decl = expr;
> +  tree var = NULL_TREE;
>    if (TREE_CODE (decl) == SSA_NAME)
>      {
>        gimple *def = SSA_NAME_DEF_STMT (decl);
> @@ -1515,17 +1516,25 @@ get_attr_nonstring_decl (tree expr, tree *ref)
>  	      || code == VAR_DECL)
>  	    decl = gimple_assign_rhs1 (def);
>  	}
> -      else if (tree var = SSA_NAME_VAR (decl))
> -	decl = var;
> +      else
> +	var = SSA_NAME_VAR (decl);
>      }
>  
>    if (TREE_CODE (decl) == ADDR_EXPR)
>      decl = TREE_OPERAND (decl, 0);
>  
> +  /* To simplify calling code, store the referenced DECL regardless of
> +     the attribute determined below, but avoid storing the SSA_NAME_VAR
> +     obtained above (it's not useful for dataflow purposes).  */
>    if (ref)
>      *ref = decl;
>  
> -  if (TREE_CODE (decl) == ARRAY_REF)
> +  /* Use the SSA_NAME_VAR that was determined above to see if it's
> +     declared nonstring.  Otherwise drill down into the referenced
> +     DECL.  */
> +  if (var)
> +    decl = var;
> +  else if (TREE_CODE (decl) == ARRAY_REF)
>      decl = TREE_OPERAND (decl, 0);
>    else if (TREE_CODE (decl) == COMPONENT_REF)
>      decl = TREE_OPERAND (decl, 1);
The more I look at this the more I think what we really want to be doing
is real propagation of the property either via the alias oracle or a
propagation engine.   You can't even guarantee that if you've got an
SSA_NAME that the value it holds has any relation to its underlying
SSA_NAME_VAR -- the value in the SSA_NAME could well have been copied
from some other SSA_NAME with a different underlying SSA_NAME_VAR.

I'm not going to insist on it, but I think if we find ourselves
extending this again in a way that is really working around lack of
propagation of the property then we should go back and fix the
propagation problem.
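
The mismatch can be shown with a toy model.  None of this is
GCC's data structures: toy_decl and toy_ssa are hypothetical, and the
copied_from link is an assumption standing in for a copy such as
p_2 = q_1(D).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a declaration.  */
struct toy_decl
{
  const char *name;
  bool nonstring;
};

/* Toy stand-in for an SSA name: the decl it is named after and,
   if its value is a copy, the name the value came from.  */
struct toy_ssa
{
  const struct toy_decl *var;
  const struct toy_ssa *copied_from;   /* NULL for a default def.  */
};

/* Answer based on the underlying decl only.  */
bool
toy_nonstring_by_var (const struct toy_ssa *n)
{
  assert (n != NULL);
  return n->var != NULL && n->var->nonstring;
}

/* Answer based on where the value originally came from.  */
bool
toy_nonstring_by_origin (const struct toy_ssa *n)
{
  assert (n != NULL);
  while (n->copied_from != NULL)
    n = n->copied_from;
  return n->var != NULL && n->var->nonstring;
}
```

For a name whose underlying decl is a nonstring p but whose value was
copied from a plain q, the two functions disagree, which is the gap
that real propagation would close.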



> Index: gcc/gimple-fold.c
> ===================================================================
> --- gcc/gimple-fold.c	(revision 263925)
> +++ gcc/gimple-fold.c	(working copy)
> @@ -1702,6 +1702,11 @@ gimple_fold_builtin_strncpy (gimple_stmt_iterator
>    if (tree_int_cst_lt (ssize, len))
>      return false;
>  
> +  /* Defer warning (and folding) until the next statement in the basic
> +     block is reachable.  */
> +  if (!gimple_bb (stmt))
> +    return false;
> +
>    /* Diagnose truncation that leaves the copy unterminated.  */
>    maybe_diag_stxncpy_trunc (*gsi, src, len);
I thought Richi wanted the guard earlier (maybe_fold_stmt) -- it wasn't
entirely clear to me if the subsequent comments about needing to fold
early were meant to raise issues with guarding earlier or not.
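
For reference, the pattern the guard is meant to handle is
a truncating strncpy immediately followed by a nul store, as in the
test case from the original submission.  A self-contained variant
(the function name is made up for illustration):

```c
#include <assert.h>
#include <string.h>

const char s[] = "12345";
char d[3];

/* Copy at most sizeof d - 1 bytes, then terminate explicitly.  The
   store on the second line is the "next statement" the warning
   code needs to be able to reach.  */
void
copy_truncated (void)
{
  strncpy (d, s, sizeof d - 1);
  d[sizeof d - 1] = 0;
}
```

Wherever the guard ends up, the strncpy call has to stay unfolded
until that following store can be found from it.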

Jeff
Martin Sebor Sept. 21, 2018, 5:13 p.m. UTC | #19
On 09/17/2018 07:30 PM, Jeff Law wrote:
> On 8/28/18 6:12 PM, Martin Sebor wrote:
>>>> Sadly, dstbase is the PARM_DECL for d.  That's where things are going
>>>> "wrong".  Not sure why you're getting the PARM_DECL in that case.  I'd
>>>> debug get_addr_base_and_unit_offset to understand what's going on.
>>>> Essentially you're getting different results of
>>>> get_addr_base_and_unit_offset in a case where they arguably should be
>>>> the same.
>>>
>>> Probably get_attr_nonstring_decl has the same "mistake" and returns
>>> the PARM_DECL instead of the SSA name pointer.  So we're comparing
>>> apples and oranges here.
>>
>> Returning the SSA_NAME_VAR from get_attr_nonstring_decl() is
>> intentional but the function need not (perhaps should not)
>> also set *REF to it.
>>
>>>
>>> Yeah:
>>>
>>> /* If EXPR refers to a character array or pointer declared attribute
>>>    nonstring return a decl for that array or pointer and set *REF to
>>>    the referenced enclosing object or pointer.  Otherwise returns
>>>    null.  */
>>>
>>> tree
>>> get_attr_nonstring_decl (tree expr, tree *ref)
>>> {
>>>   tree decl = expr;
>>>   if (TREE_CODE (decl) == SSA_NAME)
>>>     {
>>>       gimple *def = SSA_NAME_DEF_STMT (decl);
>>>
>>>       if (is_gimple_assign (def))
>>>         {
>>>           tree_code code = gimple_assign_rhs_code (def);
>>>           if (code == ADDR_EXPR
>>>               || code == COMPONENT_REF
>>>               || code == VAR_DECL)
>>>             decl = gimple_assign_rhs1 (def);
>>>         }
>>>       else if (tree var = SSA_NAME_VAR (decl))
>>>         decl = var;
>>>     }
>>>
>>>   if (TREE_CODE (decl) == ADDR_EXPR)
>>>     decl = TREE_OPERAND (decl, 0);
>>>
>>>   if (ref)
>>>     *ref = decl;
>>>
>>> I see a lot of "magic" here again in the attempt to "propagate"
>>> a nonstring attribute.
>>
>> That's the function's purpose: to look for the attribute.  Is
>> there a better way to do this?
>>
>>> Note
>>>
>>> foo (char *p __attribute__(("nonstring")))
>>> {
>>>   p = "bar";
>>>   strlen (p); // or whatever is necessary to call get_attr_nonstring_decl
>>> }
>>>
>>> is perfectly valid and p as passed to strlen is _not_ nonstring(?).
>>
>> I don't know if you're saying that it should get a warning or
>> shouldn't.  Right now it doesn't because the strlen() call is
>> folded before we check for nonstring.
>>
>> I could see an argument for diagnosing it but I suspect you
>> wouldn't like it because it would mean more warning from
>> the folder.  I could also see an argument against it because,
>> as you said, it's safe.
>>
>> If you take the assignment to p away then a warning is issued,
>> and that's because p is declared with attribute nonstring.
>> That's also why get_attr_nonstring_decl looks at SSA_NAME_VAR.
>>
>>> I think in your code comparing bases you want to look at the _original_
>>> argument to the string function rather than what get_attr_nonstring_decl
>>> returned as ref.
>>
>> I've adjusted get_attr_nonstring_decl() to avoid setting *REF
>> to SSA_NAME_VAR.  That let me remove the GIMPLE_NOP code from
>> the patch.  I've also updated the comment above SSA_NAME_VAR
>> to clarify its purpose per Jeff's comments.
>>
>> Attached is an updated revision with these changes.
>>
>> Martin
>>
>> gcc-87028.diff
>>
>> PR tree-optimization/87028 - false positive -Wstringop-truncation strncpy with global variable source string
>> gcc/ChangeLog:
>>
>> 	PR tree-optimization/87028
>> 	* calls.c (get_attr_nonstring_decl): Avoid setting *REF to
>> 	SSA_NAME_VAR.
>> 	* gimple-fold.c (gimple_fold_builtin_strncpy): Avoid folding
>> 	when statement doesn't belong to a basic block.
>> 	* tree.h (SSA_NAME_VAR): Update comment.
>> 	* tree-ssa-strlen.c (maybe_diag_stxncpy_trunc): Simplify.
>>
>> gcc/testsuite/ChangeLog:
>>
>> 	PR tree-optimization/87028
>> 	* c-c++-common/Wstringop-truncation.c: Remove xfails.
>> 	* gcc.dg/Wstringop-truncation-5.c: New test.
>>
>
>> Index: gcc/calls.c
>> ===================================================================
>> --- gcc/calls.c	(revision 263928)
>> +++ gcc/calls.c	(working copy)
>> @@ -1503,6 +1503,7 @@ tree
>>  get_attr_nonstring_decl (tree expr, tree *ref)
>>  {
>>    tree decl = expr;
>> +  tree var = NULL_TREE;
>>    if (TREE_CODE (decl) == SSA_NAME)
>>      {
>>        gimple *def = SSA_NAME_DEF_STMT (decl);
>> @@ -1515,17 +1516,25 @@ get_attr_nonstring_decl (tree expr, tree *ref)
>>  	      || code == VAR_DECL)
>>  	    decl = gimple_assign_rhs1 (def);
>>  	}
>> -      else if (tree var = SSA_NAME_VAR (decl))
>> -	decl = var;
>> +      else
>> +	var = SSA_NAME_VAR (decl);
>>      }
>>
>>    if (TREE_CODE (decl) == ADDR_EXPR)
>>      decl = TREE_OPERAND (decl, 0);
>>
>> +  /* To simplify calling code, store the referenced DECL regardless of
>> +     the attribute determined below, but avoid storing the SSA_NAME_VAR
>> +     obtained above (it's not useful for dataflow purposes).  */
>>    if (ref)
>>      *ref = decl;
>>
>> -  if (TREE_CODE (decl) == ARRAY_REF)
>> +  /* Use the SSA_NAME_VAR that was determined above to see if it's
>> +     declared nonstring.  Otherwise drill down into the referenced
>> +     DECL.  */
>> +  if (var)
>> +    decl = var;
>> +  else if (TREE_CODE (decl) == ARRAY_REF)
>>      decl = TREE_OPERAND (decl, 0);
>>    else if (TREE_CODE (decl) == COMPONENT_REF)
>>      decl = TREE_OPERAND (decl, 1);
> The more I look at this the more I think what we really want to be doing
> is real propagation of the property either via the alias oracle or a
> propagation engine.   You can't even guarantee that if you've got an
> SSA_NAME that the value it holds has any relation to its underlying
> SSA_NAME_VAR -- the value in the SSA_NAME could well have been copied
> from some other SSA_NAME with a different underlying SSA_NAME_VAR.
>
> I'm not going to insist on it, but I think if we find ourselves
> extending this again in a way that is really working around lack of
> propagation of the property then we should go back and fix the
> propagation problem.

We talked about improving this back in the GCC 8 cycle.  I've
been collecting input (and test cases) from Miguel Ojeda from
the adoption of the attribute in the Linux kernel.  There are
a number of issues I was hoping to get to in stage 1 but that
has been derailed by all the strlen back and forth.  I'm still
hoping to be able to fix some of the false positives here in
stage 3 but, IIUC the constraints, a redesign along the lines
you suggest would be considered overly intrusive.  (If not,
I'm willing to look into it.)

That said, I had the impression from Richard's comments that
implementing the propagation in points-to analysis would come
at a cost and have its own downsides:

   https://gcc.gnu.org/ml/gcc-patches/2018-08/msg01954.html

So I wasn't sure it was necessarily an endorsement of
the approach as the ideal solution or just a passing thought.

>> Index: gcc/gimple-fold.c
>> ===================================================================
>> --- gcc/gimple-fold.c	(revision 263925)
>> +++ gcc/gimple-fold.c	(working copy)
>> @@ -1702,6 +1702,11 @@ gimple_fold_builtin_strncpy (gimple_stmt_iterator
>>    if (tree_int_cst_lt (ssize, len))
>>      return false;
>>
>> +  /* Defer warning (and folding) until the next statement in the basic
>> +     block is reachable.  */
>> +  if (!gimple_bb (stmt))
>> +    return false;
>> +
>>    /* Diagnose truncation that leaves the copy unterminated.  */
>>    maybe_diag_stxncpy_trunc (*gsi, src, len);
> I thought Richi wanted the guard earlier (maybe_fold_stmt) -- it wasn't
> entirely clear to me if the subsequent comments about needing to fold
> early were meant to raise issues with guarding earlier or not.

I'm fine with moving it if that's preferable.

Moving the test to maybe_fold_stmt() would, IMO, be the right
change to make in general, at least for library built-ins.
I have been meaning to suggest it independently of this fix
but because of its pervasive impact I've been holding off,
expecting it to be controversial.  If there is consensus I'm
happy to make this change but I would prefer to do it separately
since it causes a number of regressions in tests that expect
built-ins to be folded very early on (i.e., look for evidence
of the folding in the output of -fdump-tree-gimple or
-fdump-tree-ccp1).  Some of the regressions would go away if
maybe_fold_stmt() only avoided folding of library built-in
functions.  Resolving the others would require adjusting
the tests to either use optimization or look for the evidence
of folding in later passes than gimple or ccp1.  I think all
that is reasonable and won't impact the efficiency of
the emitted object code, but it's obviously a much bigger
change than a simple fix for a false positive warning.
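
For reference, the idiom the suppression logic is meant to recognize is a
bounded strncpy followed immediately by a store of the terminating nul.
A minimal sketch of it follows; the helper name copy_trunc is hypothetical
and not part of the patch:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of the patch: copy at most DSTSIZE - 1
   characters of SRC into DST and always nul-terminate the result.  */
void
copy_trunc (char *dst, size_t dstsize, const char *src)
{
  if (dstsize == 0)
    return;

  /* This call may truncate and leave DST unterminated...  */
  strncpy (dst, src, dstsize - 1);
  /* ...so the very next statement stores the terminating nul.  */
  dst[dstsize - 1] = 0;
}
```

The nul store has to be reachable from the strncpy call for the warning
code to see it, which is the reason for the gimple_bb() guard.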

If that sounds reasonable, is the patch acceptable as is?

The latest version is here:

   https://gcc.gnu.org/ml/gcc-patches/2018-08/msg01818.html

Martin
Martin Sebor Oct. 1, 2018, 9:24 p.m. UTC | #20
Ping: https://gcc.gnu.org/ml/gcc-patches/2018-08/msg01818.html

On 09/21/2018 11:13 AM, Martin Sebor wrote:
> On 09/17/2018 07:30 PM, Jeff Law wrote:
>> On 8/28/18 6:12 PM, Martin Sebor wrote:
>>>>> Sadly, dstbase is the PARM_DECL for d.  That's where things are going
>>>>> "wrong".  Not sure why you're getting the PARM_DECL in that case.  I'd
>>>>> debug get_addr_base_and_unit_offset to understand what's going on.
>>>>> Essentially you're getting different results of
>>>>> get_addr_base_and_unit_offset in a case where they arguably should be
>>>>> the same.
>>>>
>>>> Probably get_attr_nonstring_decl has the same "mistake" and returns
>>>> the PARM_DECL instead of the SSA name pointer.  So we're comparing
>>>> apples and oranges here.
>>>
>>> Returning the SSA_NAME_VAR from get_attr_nonstring_decl() is
>>> intentional but the function need not (perhaps should not)
>>> also set *REF to it.
>>>
>>>>
>>>> Yeah:
>>>>
>>>> /* If EXPR refers to a character array or pointer declared attribute
>>>>    nonstring return a decl for that array or pointer and set *REF to
>>>>    the referenced enclosing object or pointer.  Otherwise returns
>>>>    null.  */
>>>>
>>>> tree
>>>> get_attr_nonstring_decl (tree expr, tree *ref)
>>>> {
>>>>   tree decl = expr;
>>>>   if (TREE_CODE (decl) == SSA_NAME)
>>>>     {
>>>>       gimple *def = SSA_NAME_DEF_STMT (decl);
>>>>
>>>>       if (is_gimple_assign (def))
>>>>         {
>>>>           tree_code code = gimple_assign_rhs_code (def);
>>>>           if (code == ADDR_EXPR
>>>>               || code == COMPONENT_REF
>>>>               || code == VAR_DECL)
>>>>             decl = gimple_assign_rhs1 (def);
>>>>         }
>>>>       else if (tree var = SSA_NAME_VAR (decl))
>>>>         decl = var;
>>>>     }
>>>>
>>>>   if (TREE_CODE (decl) == ADDR_EXPR)
>>>>     decl = TREE_OPERAND (decl, 0);
>>>>
>>>>   if (ref)
>>>>     *ref = decl;
>>>>
>>>> I see a lot of "magic" here again in the attempt to "propagate"
>>>> a nonstring attribute.
>>>
>>> That's the function's purpose: to look for the attribute.  Is
>>> there a better way to do this?
>>>
>>>> Note
>>>>
>>>> foo (char *p __attribute__((nonstring)))
>>>> {
>>>>   p = "bar";
>>>>   strlen (p); // or whatever is necessary to call get_attr_nonstring_decl
>>>> }
>>>>
>>>> is perfectly valid and p as passed to strlen is _not_ nonstring(?).
>>>
>>> I don't know if you're saying that it should get a warning or
>>> shouldn't.  Right now it doesn't because the strlen() call is
>>> folded before we check for nonstring.
>>>
>>> I could see an argument for diagnosing it but I suspect you
>>> wouldn't like it because it would mean more warning from
>>> the folder.  I could also see an argument against it because,
>>> as you said, it's safe.
>>>
>>> If you take the assignment to p away then a warning is issued,
>>> and that's because p is declared with attribute nonstring.
>>> That's also why get_attr_nonstring_decl looks at SSA_NAME_VAR.
>>>
>>>> I think in your code comparing bases you want to look at the _original_
>>>> argument to the string function rather than what
>>>> get_attr_nonstring_decl returned as ref.
>>>
>>> I've adjusted get_attr_nonstring_decl() to avoid setting *REF
>>> to SSA_NAME_VAR.  That let me remove the GIMPLE_NOP code from
>>> the patch.  I've also updated the comment above SSA_NAME_VAR
>>> to clarify its purpose per Jeff's comments.
>>>
>>> Attached is an updated revision with these changes.
>>>
>>> Martin
>>>
>>> gcc-87028.diff
>>>
>>> PR tree-optimization/87028 - false positive -Wstringop-truncation
>>> strncpy with global variable source string
>>> gcc/ChangeLog:
>>>
>>>     PR tree-optimization/87028
>>>     * calls.c (get_attr_nonstring_decl): Avoid setting *REF to
>>>     SSA_NAME_VAR.
>>>     * gimple-fold.c (gimple_fold_builtin_strncpy): Avoid folding
>>>     when statement doesn't belong to a basic block.
>>>     * tree.h (SSA_NAME_VAR): Update comment.
>>>     * tree-ssa-strlen.c (maybe_diag_stxncpy_trunc): Simplify.
>>>
>>> gcc/testsuite/ChangeLog:
>>>
>>>     PR tree-optimization/87028
>>>     * c-c++-common/Wstringop-truncation.c: Remove xfails.
>>>     * gcc.dg/Wstringop-truncation-5.c: New test.
>>>
>>
>>> Index: gcc/calls.c
>>> ===================================================================
>>> --- gcc/calls.c    (revision 263928)
>>> +++ gcc/calls.c    (working copy)
>>> @@ -1503,6 +1503,7 @@ tree
>>>  get_attr_nonstring_decl (tree expr, tree *ref)
>>>  {
>>>    tree decl = expr;
>>> +  tree var = NULL_TREE;
>>>    if (TREE_CODE (decl) == SSA_NAME)
>>>      {
>>>        gimple *def = SSA_NAME_DEF_STMT (decl);
>>> @@ -1515,17 +1516,25 @@ get_attr_nonstring_decl (tree expr, tree *ref)
>>>            || code == VAR_DECL)
>>>          decl = gimple_assign_rhs1 (def);
>>>      }
>>> -      else if (tree var = SSA_NAME_VAR (decl))
>>> -    decl = var;
>>> +      else
>>> +    var = SSA_NAME_VAR (decl);
>>>      }
>>>
>>>    if (TREE_CODE (decl) == ADDR_EXPR)
>>>      decl = TREE_OPERAND (decl, 0);
>>>
>>> +  /* To simplify calling code, store the referenced DECL regardless of
>>> +     the attribute determined below, but avoid storing the SSA_NAME_VAR
>>> +     obtained above (it's not useful for dataflow purposes).  */
>>>    if (ref)
>>>      *ref = decl;
>>>
>>> -  if (TREE_CODE (decl) == ARRAY_REF)
>>> +  /* Use the SSA_NAME_VAR that was determined above to see if it's
>>> +     declared nonstring.  Otherwise drill down into the referenced
>>> +     DECL.  */
>>> +  if (var)
>>> +    decl = var;
>>> +  else if (TREE_CODE (decl) == ARRAY_REF)
>>>      decl = TREE_OPERAND (decl, 0);
>>>    else if (TREE_CODE (decl) == COMPONENT_REF)
>>>      decl = TREE_OPERAND (decl, 1);
>> The more I look at this the more I think what we really want to be doing
>> is real propagation of the property either via the alias oracle or a
>> propagation engine.   You can't even guarantee that if you've got an
>> SSA_NAME that the value it holds has any relation to its underlying
>> SSA_NAME_VAR -- the value in the SSA_NAME could well have been copied
>> from some other SSA_NAME with a different underlying SSA_NAME_VAR.
>>
>> I'm not going to insist on it, but I think if we find ourselves
>> extending this again in a way that is really working around lack of
>> propagation of the property then we should go back and fix the
>> propagation problem.
>
> We talked about improving this back in the GCC 8 cycle.  I've
> been collecting input (and test cases) from Miguel Ojeda from
> the adoption of the attribute in the Linux kernel.  There are
> a number of issues I was hoping to get to in stage 1 but that
> has been derailed by all the strlen back and forth.  I'm still
> hoping to be able to fix some of the false positives here in
> stage 3 but, IIUC the constraints, a redesign along the lines
> you suggest would be considered overly intrusive.  (If not,
> I'm willing to look into it.)
>
> That said, I had the impression from Richard's comments that
> implementing the propagation in points-to analysis would come
> at a cost and have its own downsides:
>
>   https://gcc.gnu.org/ml/gcc-patches/2018-08/msg01954.html
>
> So I wasn't sure it was necessarily an endorsement of
> the approach as the ideal solution or just a passing thought.
>
>>> Index: gcc/gimple-fold.c
>>> ===================================================================
>>> --- gcc/gimple-fold.c    (revision 263925)
>>> +++ gcc/gimple-fold.c    (working copy)
>>> @@ -1702,6 +1702,11 @@ gimple_fold_builtin_strncpy (gimple_stmt_iterator
>>>    if (tree_int_cst_lt (ssize, len))
>>>      return false;
>>>
>>> +  /* Defer warning (and folding) until the next statement in the basic
>>> +     block is reachable.  */
>>> +  if (!gimple_bb (stmt))
>>> +    return false;
>>> +
>>>    /* Diagnose truncation that leaves the copy unterminated.  */
>>>    maybe_diag_stxncpy_trunc (*gsi, src, len);
>> I thought Richi wanted the guard earlier (maybe_fold_stmt) -- it wasn't
>> entirely clear to me if the subsequent comments about needing to fold
>> early were meant to raise issues with guarding earlier or not.
>
> I'm fine with moving it if that's preferable.
>
> Moving the test to maybe_fold_stmt() would, IMO, be the right
> change to make in general, at least for library built-ins.
> I have been meaning to suggest it independently of this fix
> but because of its pervasive impact I've been holding off,
> expecting it to be controversial.  If there is consensus I'm
> happy to make this change but I would prefer to do it separately
> since it causes a number of regressions in tests that expect
> built-ins to be folded very early on (i.e., look for evidence
> of the folding in the output of -fdump-tree-gimple or
> -fdump-tree-ccp1).  Some of the regressions would go away if
> maybe_fold_stmt() only avoided folding of library built-in
> functions.  Resolving the others would require adjusting
> the tests to either use optimization or look for the evidence
> of folding in later passes than gimple or ccp1.  I think all
> that is reasonable and won't impact the efficiency of
> the emitted object code, but it's obviously a much bigger
> change than a simple fix for a false positive warning.
>
> If that sounds reasonable, is the patch acceptable as is?
>
> The latest version is here:
>
>   https://gcc.gnu.org/ml/gcc-patches/2018-08/msg01818.html
>
> Martin
Jeff Law Oct. 4, 2018, 2:58 p.m. UTC | #21
On 8/27/18 9:42 AM, Richard Biener wrote:
> On Mon, Aug 27, 2018 at 5:32 PM Jeff Law <law@redhat.com> wrote:
>>
>> On 08/27/2018 02:29 AM, Richard Biener wrote:
>>> On Sun, Aug 26, 2018 at 7:26 AM Jeff Law <law@redhat.com> wrote:
>>>>
>>>> On 08/24/2018 09:58 AM, Martin Sebor wrote:
>>>>> The warning suppression for -Wstringop-truncation looks for
>>>>> the next statement after a truncating strncpy to see if it
>>>>> adds a terminating nul.  This only works when the next
>>>>> statement can be reached using the Gimple statement iterator
>>>>> which isn't until after gimplification.  As a result, strncpy
>>>>> calls that truncate their constant argument that are being
>>>>> folded to memcpy this early get diagnosed even if they are
>>>>> followed by the nul assignment:
>>>>>
>>>>>   const char s[] = "12345";
>>>>>   char d[3];
>>>>>
>>>>>   void f (void)
>>>>>   {
>>>>>     strncpy (d, s, sizeof d - 1);   // -Wstringop-truncation
>>>>>     d[sizeof d - 1] = 0;
>>>>>   }
>>>>>
>>>>> To avoid the warning I propose to defer folding strncpy to
>>>>> memcpy until the pointer to the basic block the strncpy call
>>>>> is in can be used to try to reach the next statement (this
>>>>> happens as early as ccp1).  I'm aware of the preference to
>>>>> fold things early but in the case of strncpy (a relatively
>>>>> rarely used function that is often misused), getting
>>>>> the warning right while folding a bit later but still fairly
>>>>> early on seems like a reasonable compromise.  I fear that
>>>>> otherwise, the false positives will drive users to adopt
>>>>> other unsafe solutions (like memcpy) where these kinds of
>>>>> bugs cannot be as readily detected.
>>>>>
>>>>> Tested on x86_64-linux.
>>>>>
>>>>> Martin
>>>>>
>>>>> PS There still are outstanding cases where the warning can
>>>>> be avoided.  I xfailed them in the test for now but will
>>>>> still try to get them to work for GCC 9.
>>>>>
>>>>> gcc-87028.diff
>>>>>
>>>>>
>>>>> PR tree-optimization/87028 - false positive -Wstringop-truncation strncpy with global variable source string
>>>>> gcc/ChangeLog:
>>>>>
>>>>>       PR tree-optimization/87028
>>>>>       * gimple-fold.c (gimple_fold_builtin_strncpy): Avoid folding when
>>>>>       statement doesn't belong to a basic block.
>>>>>       * tree-ssa-strlen.c (maybe_diag_stxncpy_trunc): Handle MEM_REF on
>>>>>       the left hand side of assignment.
>>>>>
>>>>> gcc/testsuite/ChangeLog:
>>>>>
>>>>>       PR tree-optimization/87028
>>>>>       * c-c++-common/Wstringop-truncation.c: Remove xfails.
>>>>>       * gcc.dg/Wstringop-truncation-5.c: New test.
>>>>>
>>>>> diff --git a/gcc/gimple-fold.c b/gcc/gimple-fold.c
>>>>> index 07341eb..284c2fb 100644
>>>>> --- a/gcc/gimple-fold.c
>>>>> +++ b/gcc/gimple-fold.c
>>>>> @@ -1702,6 +1702,11 @@ gimple_fold_builtin_strncpy (gimple_stmt_iterator *gsi,
>>>>>    if (tree_int_cst_lt (ssize, len))
>>>>>      return false;
>>>>>
>>>>> +  /* Defer warning (and folding) until the next statement in the basic
>>>>> +     block is reachable.  */
>>>>> +  if (!gimple_bb (stmt))
>>>>> +    return false;
>>>> I think you want cfun->cfg as the test here.  They should be equivalent
>>>> in practice.
>>>
>>> Please do not add 'cfun' references.  Note that the next stmt is also accessible
>>> when there is no CFG.  I guess the issue is that we fold this during
>>> gimplification where the next stmt is not yet "there" (but still in GENERIC)?
>> That was my assumption.  I almost suggested peeking at gsi_next and
>> avoiding in that case.
> 
> So I'd rather add guards to maybe_fold_stmt in the gimplifier then.
So I think the concern with adding the guards to maybe_fold_stmt is the
possibility of further fallout.

I guess they could be written to target this case specifically to
minimize fallout, but that feels like we're doing the same thing
(band-aid) just in a different place.



> 
>>>
>>> We generally do not want to have unfolded stmts in the IL when we can avoid that
>>> which is why we fold most stmts during gimplification.  We also do that because
>>> we now do less folding on GENERIC.
>> But an unfolded call in the IL should always be safe and we've got
>> plenty of opportunities to fold it later.
> 
> Well - we do.  The very first one is forwprop though which means we'll miss to
> re-write some memcpy parts into SSA:
> 
>           NEXT_PASS (pass_ccp, false /* nonzero_p */);
>           /* After CCP we rewrite no longer addressed locals into SSA
>              form if possible.  */
>           NEXT_PASS (pass_forwprop);
> 
> likewise early object-size will be confused by memcpy calls that just exist
> to avoid TBAA issues (another of our recommendations besides using unions).
> 
> We do fold mem* early for a reason ;)
> 
> "We can always do warnings earlier" would be a similar true sentence.
I'm not disagreeing at all.  There's a natural tension between the
benefits of folding early to enable more optimizations downstream and
leaving the IL in a state where we can give actionable warnings.

Similarly there's a natural tension between warning early vs warning
late.  Code that triggers the warning may ultimately be proved
unreachable, or we may discover simplifications that either suppress or
expose a warning.

There is no easy answer here.  But I think we can legitimately ask
questions.  ie, does folding strncpy here really improve things
downstream in ways that are measurable?  Does the false positive really
impact the utility of the warning?  etc.

I'd hazard a guess that Martin is particularly sensitive to false
positives based on feedback he's received from our developer community
as well as downstream consumers of his work.

> 
> Both come at a cost.  You know I'm usually declaring GCC to be an
> optimizing compiler
> and not a static analysis engine ;)  So I'm not too much convinced when seeing
> disabling/delaying folding here and there to catch some false
> negatives for -Wxyz.
> 
> We need to work out a plan rather than throwing sticks here and there.
I still lean towards the optimization side in general, but I'm also a
believer that the right place to issue warnings is in the tool that gets
used every day.  Deferring it to some other tool that runs at a later
time which all developers may not have access to and not all projects
use means the static analysis side is orders of magnitude less useful
than it should be.


I think the long term plan in my mind would be to not fold during
gimplification.  Immediately after gimplification we'd do warning
analysis, then we'd commence with folding everything before starting the
gimple optimization pipeline.

With the natural tensions around warning early vs warning late the
warning analysis phase may choose to mark statements in various ways
rather than issuing the warning at that time.  ie, it might choose to
mark the statement with the set of potential issues as well as marking
those which were proven safe.  A later pass could then refine things and
issue the actual warning.  Or something like that.
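
To make the two-phase idea a little more concrete, here is a rough sketch
of what "mark now, warn later" could look like.  Every name in it
(diag_state, stmt_note, note_strncpy, should_warn) is hypothetical;
nothing like this exists in GCC under these names, and a real design
would have to hang the state off the statement itself:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-statement diagnostic state; these names are made up
   for illustration only.  */
enum diag_state
{
  DIAG_NONE,         /* nothing to report */
  DIAG_POTENTIAL,    /* might warrant a warning; decide later */
  DIAG_PROVEN_SAFE   /* analyzed and found safe; never warn */
};

struct stmt_note
{
  enum diag_state truncation;
  bool reachable;    /* cleared by a later pass if the code is dead */
};

/* Phase 1, right after gimplification: record what was seen instead of
   warning immediately.  */
void
note_strncpy (struct stmt_note *note, bool may_truncate, bool nul_stored_next)
{
  note->reachable = true;
  if (!may_truncate)
    note->truncation = DIAG_NONE;
  else if (nul_stored_next)
    note->truncation = DIAG_PROVEN_SAFE;
  else
    note->truncation = DIAG_POTENTIAL;
}

/* Phase 2, after optimization: warn only for statements that are still
   reachable and were never proven safe.  */
bool
should_warn (const struct stmt_note *note)
{
  return note->reachable && note->truncation == DIAG_POTENTIAL;
}
```

The point of keeping DIAG_PROVEN_SAFE distinct from DIAG_NONE is that a
later pass cannot accidentally resurrect a warning the early analysis
already ruled out.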

You'll probably note this mirrors the design I wanted to do for
Wuninitialized to improve its stability while at the same time allowing
us the option of trying to minimize false positives.

But all that seems a bit pie in the sky right now.  I think the question
we should answer is do we tackle 87028 now or defer it to a later date
when we've fleshed this out further?

Jeff
Jeff Law Oct. 4, 2018, 3:24 p.m. UTC | #22
On 8/28/18 6:12 PM, Martin Sebor wrote:
>>> Sadly, dstbase is the PARM_DECL for d.  That's where things are going
>>> "wrong".  Not sure why you're getting the PARM_DECL in that case.  I'd
>>> debug get_addr_base_and_unit_offset to understand what's going on.
>>> Essentially you're getting different results of
>>> get_addr_base_and_unit_offset in a case where they arguably should be
>>> the same.
>>
>> Probably get_attr_nonstring_decl has the same "mistake" and returns
>> the PARM_DECL instead of the SSA name pointer.  So we're comparing
>> apples and oranges here.
> 
> Returning the SSA_NAME_VAR from get_attr_nonstring_decl() is
> intentional but the function need not (perhaps should not)
> also set *REF to it.
Yea.  That does seem wrong.  I wonder if we should immediately carve out
a patch to fix that independent of everything else and see if there's
any undesirable fallout.


>>
>> I see a lot of "magic" here again in the attempt to "propagate"
>> a nonstring attribute.
> 
> That's the function's purpose: to look for the attribute.  Is
> there a better way to do this?
Well, I think the core issue here is what does the attribute mean and
that's brought up elsewhere in the discussion.   We've got the attribute
attached to the DECL node, but then we want to query an SSA_NAME.  But
the value in an SSA_NAME may have no real relationship with the DECL.

> 
>> Note
>>
>> foo (char *p __attribute__((nonstring)))
>> {
>>   p = "bar";
>>   strlen (p); // or whatever is necessary to call get_attr_nonstring_decl
>> }
>>
>> is perfectly valid and p as passed to strlen is _not_ nonstring(?).
This is a good example of the SSA_NAME vs DECL question above (there are
others, of course).  The value held by the SSA_NAME that gets passed to
strlen here has nothing to do with the underlying DECL for p.

As long as we're using the attribute on the DECL rather than tracking
when it applies to the SSA_NAME, then we have problems like above.
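
A compilable variant of the quoted example, as a sketch only.  The
function name and the NONSTRING macro are made up for illustration; the
macro expands to nothing for compilers that do not identify as GCC 8 or
later, on the assumption that they may not know the attribute:

```c
#include <stddef.h>
#include <string.h>

/* Attribute nonstring is a GCC extension (GCC 8 and later); expand to
   nothing elsewhere so the sketch still compiles.  */
#if defined __GNUC__ && __GNUC__ >= 8
#  define NONSTRING __attribute__ ((nonstring))
#else
#  define NONSTRING
#endif

/* Hypothetical function mirroring the quoted example.  The attribute is
   attached to the declaration of P, not to the values P takes on.  */
size_t
len_after_reassign (const char *p NONSTRING)
{
  /* After this assignment P points to a nul-terminated literal, so the
     attribute on the declaration says nothing about the value that is
     passed to strlen below.  */
  p = "bar";
  return strlen (p);
}
```

The call is safe at run time regardless of what the caller passed in,
which is exactly why a check keyed on the DECL alone gives the wrong
answer here.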


> 
> I don't know if you're saying that it should get a warning or
> shouldn't.  Right now it doesn't because the strlen() call is
> folded before we check for nonstring.
Put that aside for now.  It's an implementation detail.

In an ideal world, should we warn for the above code?  I'd lean towards
warning on the assignment rather than the strlen call.  The assignment
effectively drops the nonstring attribute.  It feels a lot like the case
where an assignment drops a const qualifier.

Jeff
Martin Sebor Oct. 4, 2018, 3:51 p.m. UTC | #23
On 10/04/2018 08:58 AM, Jeff Law wrote:
> On 8/27/18 9:42 AM, Richard Biener wrote:
>> On Mon, Aug 27, 2018 at 5:32 PM Jeff Law <law@redhat.com> wrote:
>>>
>>> On 08/27/2018 02:29 AM, Richard Biener wrote:
>>>> On Sun, Aug 26, 2018 at 7:26 AM Jeff Law <law@redhat.com> wrote:
>>>>>
>>>>> On 08/24/2018 09:58 AM, Martin Sebor wrote:
>>>>>> The warning suppression for -Wstringop-truncation looks for
>>>>>> the next statement after a truncating strncpy to see if it
>>>>>> adds a terminating nul.  This only works when the next
>>>>>> statement can be reached using the Gimple statement iterator
>>>>>> which isn't until after gimplification.  As a result, strncpy
>>>>>> calls that truncate their constant argument that are being
>>>>>> folded to memcpy this early get diagnosed even if they are
>>>>>> followed by the nul assignment:
>>>>>>
>>>>>>   const char s[] = "12345";
>>>>>>   char d[3];
>>>>>>
>>>>>>   void f (void)
>>>>>>   {
>>>>>>     strncpy (d, s, sizeof d - 1);   // -Wstringop-truncation
>>>>>>     d[sizeof d - 1] = 0;
>>>>>>   }
>>>>>>
>>>>>> To avoid the warning I propose to defer folding strncpy to
>>>>>> memcpy until the pointer to the basic block the strncpy call
>>>>>> is in can be used to try to reach the next statement (this
>>>>>> happens as early as ccp1).  I'm aware of the preference to
>>>>>> fold things early but in the case of strncpy (a relatively
>>>>>> rarely used function that is often misused), getting
>>>>>> the warning right while folding a bit later but still fairly
>>>>>> early on seems like a reasonable compromise.  I fear that
>>>>>> otherwise, the false positives will drive users to adopt
>>>>>> other unsafe solutions (like memcpy) where these kinds of
>>>>>> bugs cannot be as readily detected.
>>>>>>
>>>>>> Tested on x86_64-linux.
>>>>>>
>>>>>> Martin
>>>>>>
>>>>>> PS There still are outstanding cases where the warning can
>>>>>> be avoided.  I xfailed them in the test for now but will
>>>>>> still try to get them to work for GCC 9.
>>>>>>
>>>>>> gcc-87028.diff
>>>>>>
>>>>>>
>>>>>> PR tree-optimization/87028 - false positive -Wstringop-truncation strncpy with global variable source string
>>>>>> gcc/ChangeLog:
>>>>>>
>>>>>>       PR tree-optimization/87028
>>>>>>       * gimple-fold.c (gimple_fold_builtin_strncpy): Avoid folding when
>>>>>>       statement doesn't belong to a basic block.
>>>>>>       * tree-ssa-strlen.c (maybe_diag_stxncpy_trunc): Handle MEM_REF on
>>>>>>       the left hand side of assignment.
>>>>>>
>>>>>> gcc/testsuite/ChangeLog:
>>>>>>
>>>>>>       PR tree-optimization/87028
>>>>>>       * c-c++-common/Wstringop-truncation.c: Remove xfails.
>>>>>>       * gcc.dg/Wstringop-truncation-5.c: New test.
>>>>>>
>>>>>> diff --git a/gcc/gimple-fold.c b/gcc/gimple-fold.c
>>>>>> index 07341eb..284c2fb 100644
>>>>>> --- a/gcc/gimple-fold.c
>>>>>> +++ b/gcc/gimple-fold.c
>>>>>> @@ -1702,6 +1702,11 @@ gimple_fold_builtin_strncpy (gimple_stmt_iterator *gsi,
>>>>>>    if (tree_int_cst_lt (ssize, len))
>>>>>>      return false;
>>>>>>
>>>>>> +  /* Defer warning (and folding) until the next statement in the basic
>>>>>> +     block is reachable.  */
>>>>>> +  if (!gimple_bb (stmt))
>>>>>> +    return false;
>>>>> I think you want cfun->cfg as the test here.  They should be equivalent
>>>>> in practice.
>>>>
>>>> Please do not add 'cfun' references.  Note that the next stmt is also accessible
>>>> when there is no CFG.  I guess the issue is that we fold this during
>>>> gimplification where the next stmt is not yet "there" (but still in GENERIC)?
>>> That was my assumption.  I almost suggested peeking at gsi_next and
>>> avoiding in that case.
>>
>> So I'd rather add guards to maybe_fold_stmt in the gimplifier then.
> So I think the concern with adding the guards to maybe_fold_stmt is the
> possibility of further fallout.
>
> I guess they could be written to target this case specifically to
> minimize fallout, but that feels like we're doing the same thing
> (band-aid) just in a different place.
>
>
>
>>
>>>>
>>>> We generally do not want to have unfolded stmts in the IL when we can avoid that
>>>> which is why we fold most stmts during gimplification.  We also do that because
>>>> we now do less folding on GENERIC.
>>> But an unfolded call in the IL should always be safe and we've got
>>> plenty of opportunities to fold it later.
>>
>> Well - we do.  The very first one is forwprop though which means we'll miss to
>> re-write some memcpy parts into SSA:
>>
>>           NEXT_PASS (pass_ccp, false /* nonzero_p */);
>>           /* After CCP we rewrite no longer addressed locals into SSA
>>              form if possible.  */
>>           NEXT_PASS (pass_forwprop);
>>
>> likewise early object-size will be confused by memcpy calls that just exist
>> to avoid TBAA issues (another of our recommendations besides using unions).
>>
>> We do fold mem* early for a reason ;)
>>
>> "We can always do warnings earlier" would be a similar true sentence.
> I'm not disagreeing at all.  There's a natural tension between the
> benefits of folding early to enable more optimizations downstream and
> leaving the IL in a state where we can give actionable warnings.

Similar trade-offs between folding early and losing information
as a result also impact high-level optimizations.

For instance, folding the strlen argument below

   void f3 (struct A* p)
   {
     __builtin_strcpy (p->a, "123");

     if (__builtin_strlen (p->a + 1) != 2)   // not folded
       __builtin_abort ();
   }

into

   _2 = &MEM[(void *)p_4(D) + 2B];

early on defeats the strlen optimization because there is no
mechanism to determine what member (void *)p_4(D) + 2B refers
to (this is bug 86955).
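
For anyone who wants to experiment with this, a self-contained variant
might look as follows.  The layout of struct A is an assumption (it is
not shown above), the function name tail_len is hypothetical, and the
standard functions are used in place of the __builtin_ forms:

```c
#include <stddef.h>
#include <string.h>

/* Assumed layout: struct A is not defined in the example above.  */
struct A
{
  char a[8];
  char b[8];
};

/* Same shape as f3 above, but returning the length rather than aborting
   so that a caller can check the result.  */
size_t
tail_len (struct A *p)
{
  strcpy (p->a, "123");
  return strlen (p->a + 1);
}
```

Inspecting the output of -fdump-tree-optimized at -O2 is one way to see
whether the strlen call survives.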

Another example is folding of strlen calls with non-constant
offsets into constant strings like here:

   const char a[] = "123";

   void f (int i)
   {
     if (__builtin_strlen (&a[i]) > 3)
       __builtin_abort ();
   }

into sizeof a - 1 - i, which then prevents the result from
being folded to false  (bug 86434), not to mention the code
it emits for out-of-bounds indices.

There are a number of other similar examples in Bugzilla
that I've filed as I discovered them while testing my
warnings (e.g., 86572).

In my mind, transforming library calls into "lossy" low-level
primitives like MEM_REF would be better done only after higher
level optimizations have had a chance to analyze them.  Ditto
for other similar transformations (like to other library calls).
Having more accurate information helps both optimization and
warnings.  It also makes the warnings more meaningful.
Printing "memcpy overflows a buffer" when the source code
has a call to strncpy is less than ideal.

> Similarly there's a natural tension between warning early vs warning
> late.  Code that triggers the warning may ultimately be proved
> unreachable, or we may discover simplifications that either suppress or
> expose a warning.
>
> There is no easy answer here.  But I think we can legitimately ask
> questions.  ie, does folding strncpy here really improve things
> downstream in ways that are measurable?  Does the false positive really
> impact the utility of the warning?  etc.
>
> I'd hazard a guess that Martin is particularly sensitive to false
> positives based on feedback he's received from our developer community
> as well as downstream consumers of his work.

Yes.  The kernel folks in particular have done a lot of work
cleaning up their code in an effort to adopt the warning and
attribute nonstring.  They have been keeping me in the loop
on their progress (and feeding me back test cases with false
positives and negatives they run into).

Martin

>
>>
>> Both come at a cost.  You know I'm usually declaring GCC to be an
>> optimizing compiler
>> and not a static analysis engine ;)  So I'm not too much convinced when seeing
>> disabling/delaying folding here and there to catch some false
>> negatives for -Wxyz.
>>
>> We need to work out a plan rather than throwing sticks here and there.
> I still lean towards the optimization side in general, but I'm also a
> believer that the right place to issue warnings is in the tool that gets
> used every day.  Deferring it to some other tool that runs at a later
> time which all developers may not have access to and not all projects
> use means the static analysis side is orders of magnitude less useful
> than it should be.
>
>
> I think the long term plan in my mind would be to not fold during
> gimplification.  Immediately after gimplification we'd do warning
> analysis, then we'd commence with folding everything before starting the
> gimple optimization pipeline.
>
> With the natural tensions around warning early vs warning late the
> warning analysis phase may choose to mark statements in various ways
> rather than issuing the warning at that time.  ie, it might choose to
> mark the statement with the set of potential issues as well as marking
> those which were proven safe.  A later pass could then refine things and
> issue the actual warning.  Or something like that.
>
> You'll probably note this mirrors the design I wanted to do for
> Wuninitialized to improve its stability while at the same time allowing
> us the option of trying to minimize false positives.
>
> But all that seems a bit pie in the sky right now.  I think the question
> we should answer is do we tackle 87028 now or defer it to a later date
> when we've fleshed this out further?
>
> Jeff
>
Joseph Myers Oct. 4, 2018, 7:40 p.m. UTC | #24
On Thu, 4 Oct 2018, Jeff Law wrote:

> With the natural tensions around warning early vs warning late the
> warning analysis phase may choose to mark statements in various ways
> rather than issuing the warning at that time.  ie, it might choose to
> mark the statement with the set of potential issues as well as marking
> those which were proven safe.  A later pass could then refine things and
> issue the actual warning.  Or something like that.

Note that could include front ends marking expressions for warning issues 
as well.  (E.g. the signed/unsigned warnings, which fold and do some crude 
checks for cases where they can prove the signed value is never negative.)  
To avoid that front-end folding you want later code outside the front end 
to have the information that there was a (comparison / implicit conversion 
from signed to unsigned) that should be warned about if the later code 
can't prove the conversion was value-preserving.
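
A minimal sketch of the kind of comparison meant here.  The function
names are made up for illustration, and whether a particular compiler
version proves the second case safe is an assumption rather than
something stated in this thread.

```c
#include <assert.h>

/* I is implicitly converted to unsigned for the comparison, so a
   negative I compares as a very large value.  This is the sort of
   expression a signed/unsigned comparison warning is meant to flag.  */
int below_limit (int i, unsigned limit)
{
  return i < limit;
}

/* Here the signed operand is masked into the range [0, 255] first, so
   the conversion to unsigned is value-preserving and a warning would
   be noise.  */
int masked_below_limit (int i, unsigned limit)
{
  return (i & 0xff) < limit;
}

/* Small self-check of the behavior described in the comments above.  */
void check_examples (void)
{
  assert (below_limit (3, 5u) == 1);
  assert (below_limit (-1, 5u) == 0);           /* -1 becomes UINT_MAX */
  assert (masked_below_limit (-1, 0x100u) == 1); /* -1 & 0xff is 255 */
}
```

If the front end only marked the first comparison instead of folding
to decide about it, a later pass with range information could make the
same distinction.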
Richard Biener Oct. 8, 2018, 10:05 a.m. UTC | #25
On Thu, Oct 4, 2018 at 5:51 PM Martin Sebor <msebor@gmail.com> wrote:
>
> On 10/04/2018 08:58 AM, Jeff Law wrote:
> > On 8/27/18 9:42 AM, Richard Biener wrote:
> >> On Mon, Aug 27, 2018 at 5:32 PM Jeff Law <law@redhat.com> wrote:
> >>>
> >>> On 08/27/2018 02:29 AM, Richard Biener wrote:
> >>>> On Sun, Aug 26, 2018 at 7:26 AM Jeff Law <law@redhat.com> wrote:
> >>>>>
> >>>>> On 08/24/2018 09:58 AM, Martin Sebor wrote:
> >>>>>> The warning suppression for -Wstringop-truncation looks for
> >>>>>> the next statement after a truncating strncpy to see if it
> >>>>>> adds a terminating nul.  This only works when the next
> >>>>>> statement can be reached using the Gimple statement iterator
> >>>>>> which isn't until after gimplification.  As a result, strncpy
> >>>>>> calls that truncate their constant argument that are being
> >>>>>> folded to memcpy this early get diagnosed even if they are
> >>>>>> followed by the nul assignment:
> >>>>>>
> >>>>>>   const char s[] = "12345";
> >>>>>>   char d[3];
> >>>>>>
> >>>>>>   void f (void)
> >>>>>>   {
> >>>>>>     strncpy (d, s, sizeof d - 1);   // -Wstringop-truncation
> >>>>>>     d[sizeof d - 1] = 0;
> >>>>>>   }
> >>>>>>
> >>>>>> To avoid the warning I propose to defer folding strncpy to
> >>>>>> memcpy until the pointer to the basic block the strncpy call
> >>>>>> is in can be used to try to reach the next statement (this
> >>>>>> happens as early as ccp1).  I'm aware of the preference to
> >>>>>> fold things early but in the case of strncpy (a relatively
> >>>>>> rarely used function that is often misused), getting
> >>>>>> the warning right while folding a bit later but still fairly
> >>>>>> early on seems like a reasonable compromise.  I fear that
> >>>>>> otherwise, the false positives will drive users to adopt
> >>>>>> other unsafe solutions (like memcpy) where these kinds of
> >>>>>> bugs cannot be as readily detected.
> >>>>>>
> >>>>>> Tested on x86_64-linux.
> >>>>>>
> >>>>>> Martin
> >>>>>>
> >>>>>> PS There still are outstanding cases where the warning can
> >>>>>> be avoided.  I xfailed them in the test for now but will
> >>>>>> still try to get them to work for GCC 9.
> >>>>>>
> >>>>>> gcc-87028.diff
> >>>>>>
> >>>>>>
> >>>>>> PR tree-optimization/87028 - false positive -Wstringop-truncation strncpy with global variable source string
> >>>>>> gcc/ChangeLog:
> >>>>>>
> >>>>>>       PR tree-optimization/87028
> >>>>>>       * gimple-fold.c (gimple_fold_builtin_strncpy): Avoid folding when
> >>>>>>       statement doesn't belong to a basic block.
> >>>>>>       * tree-ssa-strlen.c (maybe_diag_stxncpy_trunc): Handle MEM_REF on
> >>>>>>       the left hand side of assignment.
> >>>>>>
> >>>>>> gcc/testsuite/ChangeLog:
> >>>>>>
> >>>>>>       PR tree-optimization/87028
> >>>>>>       * c-c++-common/Wstringop-truncation.c: Remove xfails.
> >>>>>>       * gcc.dg/Wstringop-truncation-5.c: New test.
> >>>>>>
> >>>>>> diff --git a/gcc/gimple-fold.c b/gcc/gimple-fold.c
> >>>>>> index 07341eb..284c2fb 100644
> >>>>>> --- a/gcc/gimple-fold.c
> >>>>>> +++ b/gcc/gimple-fold.c
> >>>>>> @@ -1702,6 +1702,11 @@ gimple_fold_builtin_strncpy (gimple_stmt_iterator *gsi,
> >>>>>>    if (tree_int_cst_lt (ssize, len))
> >>>>>>      return false;
> >>>>>>
> >>>>>> +  /* Defer warning (and folding) until the next statement in the basic
> >>>>>> +     block is reachable.  */
> >>>>>> +  if (!gimple_bb (stmt))
> >>>>>> +    return false;
> >>>>> I think you want cfun->cfg as the test here.  They should be equivalent
> >>>>> in practice.
> >>>>
> >>>> Please do not add 'cfun' references.  Note that the next stmt is also accessible
> >>>> when there is no CFG.  I guess the issue is that we fold this during
> >>>> gimplification where the next stmt is not yet "there" (but still in GENERIC)?
> >>> That was my assumption.  I almost suggested peeking at gsi_next and
> >>> avoiding in that case.
> >>
> >> So I'd rather add guards to maybe_fold_stmt in the gimplifier then.
> > So I think the concern with adding the guards to maybe_fold_stmt is the
> > possibility of further fallout.
> >
> > I guess they could be written to target this case specifically to
> > minimize fallout, but that feels like we're doing the same thing
> > (band-aid) just in a different place.
> >
> >
> >
> >>
> >>>>
> >>>> We generally do not want to have unfolded stmts in the IL when we can avoid that
> >>>> which is why we fold most stmts during gimplification.  We also do that because
> >>>> we now do less folding on GENERIC.
> >>> But an unfolded call in the IL should always be safe and we've got
> >>> plenty of opportunities to fold it later.
> >>
> >> Well - we do.  The very first one is forwprop though which means we'll miss to
> >> re-write some memcpy parts into SSA:
> >>
> >>           NEXT_PASS (pass_ccp, false /* nonzero_p */);
> >>           /* After CCP we rewrite no longer addressed locals into SSA
> >>              form if possible.  */
> >>           NEXT_PASS (pass_forwprop);
> >>
> >> likewise early object-size will be confused by memcpy calls that just exist
> >> to avoid TBAA issues (another of our recommendations besides using unions).
> >>
> >> We do fold mem* early for a reason ;)
> >>
> >> "We can always do warnings earlier" would be a similar true sentence.
> > I'm not disagreeing at all.  There's a natural tension between the
> > benefits of folding early to enable more optimizations downstream and
> > leaving the IL in a state where we can give actionable warnings.
>
> Similar trade-offs between folding early and losing information
> as a result also impact high-level optimizations.
>
> For instance, folding the strlen argument below
>
>    void f3 (struct A* p)
>    {
>      __builtin_strcpy (p->a, "123");
>
>      if (__builtin_strlen (p->a + 1) != 2)   // not folded
>        __builtin_abort ();
>    }
>
> into
>
>    _2 = &MEM[(void *)p_4(D) + 2B];
>
> early on defeats the strlen optimization because there is no
> mechanism to determine what member (void *)p_4(D) + 2B refers
> to (this is bug 86955).
>
> Another example is folding of strlen calls with non-constant
> offsets into constant strings like here:
>
>    const char a[] = "123";
>
>    void f (int i)
>    {
>      if (__builtin_strlen (&a[i]) > 3)
>        __builtin_abort ();
>    }
>
> into sizeof a - 1 - i, which then prevents the result from
> being folded to false  (bug 86434), not to mention the code
> it emits for out-of-bounds indices.
>
> There are a number of other similar examples in Bugzilla
> that I've filed as I discovered them while testing my
> warnings (e.g., 86572).
>
> In my mind, transforming library calls into "lossy" low-level
> primitives like MEM_REF would be better done only after higher
> level optimizations have had a chance to analyze them.

The issue is mostly inlining heuristics.  Not doing the transformation
might end up not inlining the function which in turn might defeat
having more context for your warning analysis...

So it's a chicken-and-egg issue for diagnostics (you run them
later because you do want inlining and optimization).

And it's an important missed optimization for removing abstraction.
IPA inlining runs very early.

So IMHO the only sensible option is to do your warning analysis
in an early IPA phase where you can also freely clone contexts
(do "virtual" inlining) based on heuristics driven by diagnostic
needs rather than relying on optimization heuristics to match
yours.  The IPA phase would necessarily be the
"all_small_ipa_passes" one, and placement needs to be before
pass_local_optimization_passes.

Richard.

>  Ditto
> for other similar transformations (like to other library calls).
> Having more accurate information helps both optimization and
> warnings.  It also makes the warnings more meaningful.
> Printing "memcpy overflows a buffer" when the source code
> has a call to strncpy is less than ideal.
>
> > Similarly there's a natural tension between warning early vs warning
> > late.  Code that triggers the warning may ultimately be proved
> > unreachable, or we may discover simplifications that either suppress or
> > expose a warning.
> >
> > There is no easy answer here.  But I think we can legitimately ask
> > questions.  ie, does folding strnlen here really improve things
> > downstream in ways that are measurable?  Does the false positive really
> > impact the utility of the warning?  etc.
> >
> > I'd hazard a guess that Martin is particularly sensitive to false
> > positives based on feedback he's received from our developer community
> > as well as downstream consumers of his work.
>
> Yes.  The kernel folks in particular have done a lot of work
> cleaning up their code in an effort to adopt the warning and
> attribute nonstring.  They have been keeping me in the loop
> on their progress (and feeding me back test cases with false
> positives and negatives they run into).
>
> Martin
>
> >
> >>
> >> Both come at a cost.  You know I'm usually declaring GCC to be an
> >> optimizing compiler
> >> and not a static analysis engine ;)  So I'm not too much convinced when seeing
> >> disabling/delaying folding here and there to catch some false
> >> negatives for -Wxyz.
> >>
> >> We need to work out a plan rather than throwing sticks here and there.
> > I still lean towards the optimization side in general, but I'm also a
> > believer that the right place to issue warnings is in the tool that gets
> > used every day.  Deferring it to some other tool that runs at a later
> > time which all developers may not have access to and not all projects
> > use means the static analysis side is orders of magnitude less useful
> > than it should be.
> >
> >
> > I think the long term plan in my mind would be to not fold during
> > gimplification.  Immediately after gimplification we'd do warning
> > analysis, then we'd commence with folding everything before starting the
> > gimple optimization pipeline.
> >
> > With the natural tensions around warning early vs warning late the
> > warning analysis phase may choose to mark statements in various ways
> > rather than issuing the warning at that time.  ie, it might choose to
> > mark the statement with the set of potential issues as well as marking
> > those which were proven safe.  A later pass could then refine things and
> > issue the actual warning.  Or something like that.
> >
> > You'll probably note this mirrors the design I wanted to do for
> > Wuninitialized to improve its stability while at the same time allowing
> > us the option of trying to minimize false positives.
> >
> > But all that seems a bit pie in the sky right now.  I think the question
> > we should answer is do we tackle 87028 now or defer it to a later date
> > when we've fleshed this out further?
> >
> > Jeff
> >
>
Martin Sebor Oct. 8, 2018, 9:30 p.m. UTC | #26
On 10/08/2018 04:05 AM, Richard Biener wrote:
> On Thu, Oct 4, 2018 at 5:51 PM Martin Sebor <msebor@gmail.com> wrote:
>>
>> On 10/04/2018 08:58 AM, Jeff Law wrote:
>>> On 8/27/18 9:42 AM, Richard Biener wrote:
>>>> On Mon, Aug 27, 2018 at 5:32 PM Jeff Law <law@redhat.com> wrote:
>>>>>
>>>>> On 08/27/2018 02:29 AM, Richard Biener wrote:
>>>>>> On Sun, Aug 26, 2018 at 7:26 AM Jeff Law <law@redhat.com> wrote:
>>>>>>>
>>>>>>> On 08/24/2018 09:58 AM, Martin Sebor wrote:
>>>>>>>> The warning suppression for -Wstringop-truncation looks for
>>>>>>>> the next statement after a truncating strncpy to see if it
>>>>>>>> adds a terminating nul.  This only works when the next
>>>>>>>> statement can be reached using the Gimple statement iterator
>>>>>>>> which isn't until after gimplification.  As a result, strncpy
>>>>>>>> calls that truncate their constant argument that are being
>>>>>>>> folded to memcpy this early get diagnosed even if they are
>>>>>>>> followed by the nul assignment:
>>>>>>>>
>>>>>>>>   const char s[] = "12345";
>>>>>>>>   char d[3];
>>>>>>>>
>>>>>>>>   void f (void)
>>>>>>>>   {
>>>>>>>>     strncpy (d, s, sizeof d - 1);   // -Wstringop-truncation
>>>>>>>>     d[sizeof d - 1] = 0;
>>>>>>>>   }
>>>>>>>>
>>>>>>>> To avoid the warning I propose to defer folding strncpy to
>>>>>>>> memcpy until the pointer to the basic block the strncpy call
>>>>>>>> is in can be used to try to reach the next statement (this
>>>>>>>> happens as early as ccp1).  I'm aware of the preference to
>>>>>>>> fold things early but in the case of strncpy (a relatively
>>>>>>>> rarely used function that is often misused), getting
>>>>>>>> the warning right while folding a bit later but still fairly
>>>>>>>> early on seems like a reasonable compromise.  I fear that
>>>>>>>> otherwise, the false positives will drive users to adopt
>>>>>>>> other unsafe solutions (like memcpy) where these kinds of
>>>>>>>> bugs cannot be as readily detected.
>>>>>>>>
>>>>>>>> Tested on x86_64-linux.
>>>>>>>>
>>>>>>>> Martin
>>>>>>>>
>>>>>>>> PS There still are outstanding cases where the warning can
>>>>>>>> be avoided.  I xfailed them in the test for now but will
>>>>>>>> still try to get them to work for GCC 9.
>>>>>>>>
>>>>>>>> gcc-87028.diff
>>>>>>>>
>>>>>>>>
>>>>>>>> PR tree-optimization/87028 - false positive -Wstringop-truncation strncpy with global variable source string
>>>>>>>> gcc/ChangeLog:
>>>>>>>>
>>>>>>>>       PR tree-optimization/87028
>>>>>>>>       * gimple-fold.c (gimple_fold_builtin_strncpy): Avoid folding when
>>>>>>>>       statement doesn't belong to a basic block.
>>>>>>>>       * tree-ssa-strlen.c (maybe_diag_stxncpy_trunc): Handle MEM_REF on
>>>>>>>>       the left hand side of assignment.
>>>>>>>>
>>>>>>>> gcc/testsuite/ChangeLog:
>>>>>>>>
>>>>>>>>       PR tree-optimization/87028
>>>>>>>>       * c-c++-common/Wstringop-truncation.c: Remove xfails.
>>>>>>>>       * gcc.dg/Wstringop-truncation-5.c: New test.
>>>>>>>>
>>>>>>>> diff --git a/gcc/gimple-fold.c b/gcc/gimple-fold.c
>>>>>>>> index 07341eb..284c2fb 100644
>>>>>>>> --- a/gcc/gimple-fold.c
>>>>>>>> +++ b/gcc/gimple-fold.c
>>>>>>>> @@ -1702,6 +1702,11 @@ gimple_fold_builtin_strncpy (gimple_stmt_iterator *gsi,
>>>>>>>>    if (tree_int_cst_lt (ssize, len))
>>>>>>>>      return false;
>>>>>>>>
>>>>>>>> +  /* Defer warning (and folding) until the next statement in the basic
>>>>>>>> +     block is reachable.  */
>>>>>>>> +  if (!gimple_bb (stmt))
>>>>>>>> +    return false;
>>>>>>> I think you want cfun->cfg as the test here.  They should be equivalent
>>>>>>> in practice.
>>>>>>
>>>>>> Please do not add 'cfun' references.  Note that the next stmt is also accessible
>>>>>> when there is no CFG.  I guess the issue is that we fold this during
>>>>>> gimplification where the next stmt is not yet "there" (but still in GENERIC)?
>>>>> That was my assumption.  I almost suggested peeking at gsi_next and
>>>>> avoiding in that case.
>>>>
>>>> So I'd rather add guards to maybe_fold_stmt in the gimplifier then.
>>> So I think the concern with adding the guards to maybe_fold_stmt is the
>>> possibility of further fallout.
>>>
>>> I guess they could be written to target this case specifically to
>>> minimize fallout, but that feels like we're doing the same thing
>>> (band-aid) just in a different place.
>>>
>>>
>>>
>>>>
>>>>>>
>>>>>> We generally do not want to have unfolded stmts in the IL when we can avoid that
>>>>>> which is why we fold most stmts during gimplification.  We also do that because
>>>>>> we now do less folding on GENERIC.
>>>>> But an unfolded call in the IL should always be safe and we've got
>>>>> plenty of opportunities to fold it later.
>>>>
>>>> Well - we do.  The very first one is forwprop though which means we'll miss to
>>>> re-write some memcpy parts into SSA:
>>>>
>>>>           NEXT_PASS (pass_ccp, false /* nonzero_p */);
>>>>           /* After CCP we rewrite no longer addressed locals into SSA
>>>>              form if possible.  */
>>>>           NEXT_PASS (pass_forwprop);
>>>>
>>>> likewise early object-size will be confused by memcpy calls that just exist
>>>> to avoid TBAA issues (another of our recommendations besides using unions).
>>>>
>>>> We do fold mem* early for a reason ;)
>>>>
>>>> "We can always do warnings earlier" would be a similar true sentence.
>>> I'm not disagreeing at all.  There's a natural tension between the
>>> benefits of folding early to enable more optimizations downstream and
>>> leaving the IL in a state where we can give actionable warnings.
>>
>> Similar trade-offs between folding early and losing information
>> as a result also impact high-level optimizations.
>>
>> For instance, folding the strlen argument below
>>
>>    void f3 (struct A* p)
>>    {
>>      __builtin_strcpy (p->a, "123");
>>
>>      if (__builtin_strlen (p->a + 1) != 2)   // not folded
>>        __builtin_abort ();
>>    }
>>
>> into
>>
>>    _2 = &MEM[(void *)p_4(D) + 2B];
>>
>> early on defeats the strlen optimization because there is no
>> mechanism to determine what member (void *)p_4(D) + 2B refers
>> to (this is bug 86955).
>>
>> Another example is folding of strlen calls with non-constant
>> offsets into constant strings like here:
>>
>>    const char a[] = "123";
>>
>>    void f (int i)
>>    {
>>      if (__builtin_strlen (&a[i]) > 3)
>>        __builtin_abort ();
>>    }
>>
>> into sizeof a - 1 - i, which then prevents the result from
>> being folded to false  (bug 86434), not to mention the code
>> it emits for out-of-bounds indices.
>>
>> There are a number of other similar examples in Bugzilla
>> that I've filed as I discovered them while testing my
>> warnings (e.g., 86572).
>>
>> In my mind, transforming library calls into "lossy" low-level
>> primitives like MEM_REF would be better done only after higher
>> level optimizations have had a chance to analyze them.
>
> The issue is mostly inlining heuristics.  Not doing the transformation
> might end up not inlining the function which in turn might defeat
> having more context for your warning analysis...
>
> So it's a chicken-and-egg issue for diagnostics (you run them
> later because you do want inlining and optimization).
>
> And it's an important missed optimization for removing abstraction.
> IPA inlining runs very early.
>
> So IMHO the only sensible option is to do your warning analysis
> in an early IPA phase where you can also freely clone contexts
> (do "virtual" inlining) based on heuristics driven by diagnostic
> needs rather than relying on optimization heuristics to match
> yours.  The IPA phase would necessarily be the
> "all_small_ipa_passes" one, and placement needs to be before
> pass_local_optimization_passes.

That might work for some of the strncpy truncation warnings,
but it won't help with this problem (87028) because the folding
happens during gimplification.  Holding off on the folding until
the CFG has been constructed would help, and it shouldn't have
a noticeable impact -- the calls will still be folded.
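
To make the "still folded" point concrete, here is a rough sketch
built on the test case from the report.  The function names and the
second buffer E are made up for illustration.  It only shows that the
two forms produce the same bytes; it says nothing about which pass
performs the folding.

```c
#include <assert.h>
#include <string.h>

const char s[] = "12345";
char d[3];
char e[3];

/* The pattern from the report: a truncating strncpy immediately
   followed by an explicit terminating nul.  */
void copy_with_strncpy (void)
{
  strncpy (d, s, sizeof d - 1);
  d[sizeof d - 1] = 0;
}

/* What the call amounts to once folded: the bound (2) does not exceed
   the source length (5), so exactly that many bytes are copied and no
   padding is written.  */
void copy_with_memcpy (void)
{
  memcpy (e, s, sizeof e - 1);
  e[sizeof e - 1] = 0;
}

/* Check that both forms leave the same nul-terminated result.  */
void check_equivalent (void)
{
  copy_with_strncpy ();
  copy_with_memcpy ();
  assert (strcmp (d, "12") == 0);
  assert (memcmp (d, e, sizeof d) == 0);
}
```

Since the result is the same either way, deferring the transformation
only changes when it happens, which is the basis for expecting no
noticeable impact.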

Detecting some of the most serious bugs like buffer overflow
in functions like strcpy or sprintf in all but the most trivial
cases depends on the strlen, sprintf, and object size passes.
All those run much later than pass_local_optimization_passes.

Martin
Martin Sebor Oct. 8, 2018, 9:45 p.m. UTC | #27
Ping: https://gcc.gnu.org/ml/gcc-patches/2018-08/msg01818.html

As with the other patch (bug 84561), there may be ways to redesign
the warning, but I don't have the cycles to undertake it before
stage 1 ends.  Unless someone has a simpler suggestion for how
to avoid this false positive now, can we please accept this patch
for GCC 9 and consider the more ambitious approaches for GCC 10?

On 10/01/2018 03:24 PM, Martin Sebor wrote:
> Ping: https://gcc.gnu.org/ml/gcc-patches/2018-08/msg01818.html
>
> On 09/21/2018 11:13 AM, Martin Sebor wrote:
>> On 09/17/2018 07:30 PM, Jeff Law wrote:
>>> On 8/28/18 6:12 PM, Martin Sebor wrote:
>>>>>> Sadly, dstbase is the PARM_DECL for d.  That's where things are going
>>>>>> "wrong".  Not sure why you're getting the PARM_DECL in that case.
>>>>>> I'd
>>>>>> debug get_addr_base_and_unit_offset to understand what's going on.
>>>>>> Essentially you're getting different results of
>>>>>> get_addr_base_and_unit_offset in a case where they arguably should be
>>>>>> the same.
>>>>>
>>>>> Probably get_attr_nonstring_decl has the same "mistake" and returns
>>>>> the PARM_DECL instead of the SSA name pointer.  So we're comparing
>>>>> apples and oranges here.
>>>>
>>>> Returning the SSA_NAME_VAR from get_attr_nonstring_decl() is
>>>> intentional but the function need not (perhaps should not)
>>>> also set *REF to it.
>>>>
>>>>>
>>>>> Yeah:
>>>>>
>>>>> /* If EXPR refers to a character array or pointer declared attribute
>>>>>    nonstring return a decl for that array or pointer and set *REF to
>>>>>    the referenced enclosing object or pointer.  Otherwise returns
>>>>>    null.  */
>>>>>
>>>>> tree
>>>>> get_attr_nonstring_decl (tree expr, tree *ref)
>>>>> {
>>>>>   tree decl = expr;
>>>>>   if (TREE_CODE (decl) == SSA_NAME)
>>>>>     {
>>>>>       gimple *def = SSA_NAME_DEF_STMT (decl);
>>>>>
>>>>>       if (is_gimple_assign (def))
>>>>>         {
>>>>>           tree_code code = gimple_assign_rhs_code (def);
>>>>>           if (code == ADDR_EXPR
>>>>>               || code == COMPONENT_REF
>>>>>               || code == VAR_DECL)
>>>>>             decl = gimple_assign_rhs1 (def);
>>>>>         }
>>>>>       else if (tree var = SSA_NAME_VAR (decl))
>>>>>         decl = var;
>>>>>     }
>>>>>
>>>>>   if (TREE_CODE (decl) == ADDR_EXPR)
>>>>>     decl = TREE_OPERAND (decl, 0);
>>>>>
>>>>>   if (ref)
>>>>>     *ref = decl;
>>>>>
>>>>> I see a lot of "magic" here again in the attempt to "propagate"
>>>>> a nonstring attribute.
>>>>
>>>> That's the function's purpose: to look for the attribute.  Is
>>>> there a better way to do this?
>>>>
>>>>> Note
>>>>>
>>>>> foo (char *p __attribute__(("nonstring")))
>>>>> {
>>>>>   p = "bar";
>>>>>   strlen (p); // or whatever is necessary to call
>>>>> get_attr_nonstring_decl
>>>>> }
>>>>>
>>>>> is perfectly valid and p as passed to strlen is _not_ nonstring(?).
>>>>
>>>> I don't know if you're saying that it should get a warning or
>>>> shouldn't.  Right now it doesn't because the strlen() call is
>>>> folded before we check for nonstring.
>>>>
>>>> I could see an argument for diagnosing it but I suspect you
>>>> wouldn't like it because it would mean more warning from
>>>> the folder.  I could also see an argument against it because,
>>>> as you said, it's safe.
>>>>
>>>> If you take the assignment to p away then a warning is issued,
>>>> and that's because p is declared with attribute nonstring.
>>>> That's also why get_attr_nonstring_decl looks at SSA_NAME_VAR.
>>>>
>>>>> I think in your code comparing bases you want to look at the
>>>>> _original_
>>>>> argument to the string function rather than what
>>>>> get_attr_nonstring_decl
>>>>> returned as ref.
>>>>
>>>> I've adjusted get_attr_nonstring_decl() to avoid setting *REF
>>>> to SSA_NAME_VAR.  That let me remove the GIMPLE_NOP code from
>>>> the patch.  I've also updated the comment above SSA_NAME_VAR
>>>> to clarify its purpose per Jeff's comments.
>>>>
>>>> Attached is an updated revision with these changes.
>>>>
>>>> Martin
>>>>
>>>> gcc-87028.diff
>>>>
>>>> PR tree-optimization/87028 - false positive -Wstringop-truncation
>>>> strncpy with global variable source string
>>>> gcc/ChangeLog:
>>>>
>>>>     PR tree-optimization/87028
>>>>     * calls.c (get_attr_nonstring_decl): Avoid setting *REF to
>>>>     SSA_NAME_VAR.
>>>>     * gimple-fold.c (gimple_fold_builtin_strncpy): Avoid folding
>>>>     when statement doesn't belong to a basic block.
>>>>     * tree.h (SSA_NAME_VAR): Update comment.
>>>>     * tree-ssa-strlen.c (maybe_diag_stxncpy_trunc): Simplify.
>>>>
>>>> gcc/testsuite/ChangeLog:
>>>>
>>>>     PR tree-optimization/87028
>>>>     * c-c++-common/Wstringop-truncation.c: Remove xfails.
>>>>     * gcc.dg/Wstringop-truncation-5.c: New test.
>>>>
>>>
>>>> Index: gcc/calls.c
>>>> ===================================================================
>>>> --- gcc/calls.c    (revision 263928)
>>>> +++ gcc/calls.c    (working copy)
>>>> @@ -1503,6 +1503,7 @@ tree
>>>>  get_attr_nonstring_decl (tree expr, tree *ref)
>>>>  {
>>>>    tree decl = expr;
>>>> +  tree var = NULL_TREE;
>>>>    if (TREE_CODE (decl) == SSA_NAME)
>>>>      {
>>>>        gimple *def = SSA_NAME_DEF_STMT (decl);
>>>> @@ -1515,17 +1516,25 @@ get_attr_nonstring_decl (tree expr, tree *ref)
>>>>            || code == VAR_DECL)
>>>>          decl = gimple_assign_rhs1 (def);
>>>>      }
>>>> -      else if (tree var = SSA_NAME_VAR (decl))
>>>> -    decl = var;
>>>> +      else
>>>> +    var = SSA_NAME_VAR (decl);
>>>>      }
>>>>
>>>>    if (TREE_CODE (decl) == ADDR_EXPR)
>>>>      decl = TREE_OPERAND (decl, 0);
>>>>
>>>> +  /* To simplify calling code, store the referenced DECL regardless of
>>>> +     the attribute determined below, but avoid storing the
>>>> SSA_NAME_VAR
>>>> +     obtained above (it's not useful for dataflow purposes).  */
>>>>    if (ref)
>>>>      *ref = decl;
>>>>
>>>> -  if (TREE_CODE (decl) == ARRAY_REF)
>>>> +  /* Use the SSA_NAME_VAR that was determined above to see if it's
>>>> +     declared nonstring.  Otherwise drill down into the referenced
>>>> +     DECL.  */
>>>> +  if (var)
>>>> +    decl = var;
>>>> +  else if (TREE_CODE (decl) == ARRAY_REF)
>>>>      decl = TREE_OPERAND (decl, 0);
>>>>    else if (TREE_CODE (decl) == COMPONENT_REF)
>>>>      decl = TREE_OPERAND (decl, 1);
>>> The more I look at this the more I think what we really want to be doing
>>> is real propagation of the property either via the alias oracle or a
>>> propagation engine.   You can't even guarantee that if you've got an
>>> SSA_NAME that the value it holds has any relation to its underlying
>>> SSA_NAME_VAR -- the value in the SSA_NAME could well have been copied
>>> from some other SSA_NAME with a different underlying SSA_NAME_VAR.
>>>
>>> I'm not going to insist on it, but I think if we find ourselves
>>> extending this again in a way that is really working around lack of
>>> propagation of the property then we should go back and fix the
>>> propagation problem.
>>
>> We talked about improving this back in the GCC 8 cycle.  I've
>> been collecting input (and test cases) from Miguel Ojeda from
>> the adoption of the attribute in the Linux kernel.  There are
>> a number of issues I was hoping to get to in stage 1 but that
>> has been derailed by all the strlen back and forth.  I'm still
>> hoping to be able to fix some of the false positives here in
>> stage 3 but, IIUC the constraints, a redesign along the lines
>> you suggest would be considered overly intrusive.  (If not,
>> I'm willing to look into it.)
>>
>> That said, I had the impression from Richard's comments that
>> implementing the propagation in points-to analysis would come
>> at a cost and have its own downsides:
>>
>>   https://gcc.gnu.org/ml/gcc-patches/2018-08/msg01954.html
>>
>> So I wasn't sure it was necessarily an endorsement of
>> the approach as the ideal solution or just a passing thought.
>>
>>>> Index: gcc/gimple-fold.c
>>>> ===================================================================
>>>> --- gcc/gimple-fold.c    (revision 263925)
>>>> +++ gcc/gimple-fold.c    (working copy)
>>>> @@ -1702,6 +1702,11 @@ gimple_fold_builtin_strncpy
>>>> (gimple_stmt_iterator
>>>>    if (tree_int_cst_lt (ssize, len))
>>>>      return false;
>>>>
>>>> +  /* Defer warning (and folding) until the next statement in the basic
>>>> +     block is reachable.  */
>>>> +  if (!gimple_bb (stmt))
>>>> +    return false;
>>>> +
>>>>    /* Diagnose truncation that leaves the copy unterminated.  */
>>>>    maybe_diag_stxncpy_trunc (*gsi, src, len);
>>> I thought Richi wanted the guard earlier (maybe_fold_stmt) -- it wasn't
>>> entirely clear to me if the subsequent comments about needing to fold
>>> early were meant to raise issues with guarding earlier or not.
>>
>> I'm fine with moving it if that's preferable.
>>
>> Moving the test to maybe_fold_stmt() would, IMO, be the right
>> change to make in general, at least for library built-ins.
>> I have been meaning to suggest it independently of this fix
>> but because of its pervasive impact I've been holding off,
>> expecting it to be controversial.  If there is consensus I'm
>> happy to make this change but I would prefer to do it separately
>> since it causes a number of regressions in tests that expect
>> built-ins to be folded very early on (i.e., look for evidence
>> of the folding in the output of -fdump-tree-gimple or
>> -fdump-tree-ccp1).  Some of the regressions would go away if
>> maybe_fold_stmt() only avoided folding of library built-in
>> functions.  Resolving the others would require adjusting
>> the tests to either use optimization or look for the evidence
>> of folding in later passes than gimple or ccp1.  I think all
>> that is reasonable and won't impact the efficiency of
>> the emitted object code, but it's obviously a much bigger
>> change than a simple fix for a false positive warning.
>>
>> If that sounds reasonable, is the patch acceptable as is?
>>
>> The latest version is here:
>>
>>   https://gcc.gnu.org/ml/gcc-patches/2018-08/msg01818.html
>>
>> Martin
>
Jeff Law Oct. 16, 2018, 9:21 p.m. UTC | #28
On 10/4/18 9:51 AM, Martin Sebor wrote:
> On 10/04/2018 08:58 AM, Jeff Law wrote:
>> On 8/27/18 9:42 AM, Richard Biener wrote:
>>> On Mon, Aug 27, 2018 at 5:32 PM Jeff Law <law@redhat.com> wrote:
>>>>
>>>> On 08/27/2018 02:29 AM, Richard Biener wrote:
>>>>> On Sun, Aug 26, 2018 at 7:26 AM Jeff Law <law@redhat.com> wrote:
>>>>>>
>>>>>> On 08/24/2018 09:58 AM, Martin Sebor wrote:
>>>>>>> The warning suppression for -Wstringop-truncation looks for
>>>>>>> the next statement after a truncating strncpy to see if it
>>>>>>> adds a terminating nul.  This only works when the next
>>>>>>> statement can be reached using the Gimple statement iterator
>>>>>>> which isn't until after gimplification.  As a result, strncpy
>>>>>>> calls that truncate their constant argument that are being
>>>>>>> folded to memcpy this early get diagnosed even if they are
>>>>>>> followed by the nul assignment:
>>>>>>>
>>>>>>>   const char s[] = "12345";
>>>>>>>   char d[3];
>>>>>>>
>>>>>>>   void f (void)
>>>>>>>   {
>>>>>>>     strncpy (d, s, sizeof d - 1);   // -Wstringop-truncation
>>>>>>>     d[sizeof d - 1] = 0;
>>>>>>>   }
>>>>>>>
>>>>>>> To avoid the warning I propose to defer folding strncpy to
>>>>>>> memcpy until the pointer to the basic block the strncpy call
>>>>>>> is in can be used to try to reach the next statement (this
>>>>>>> happens as early as ccp1).  I'm aware of the preference to
>>>>>>> fold things early but in the case of strncpy (a relatively
>>>>>>> rarely used function that is often misused), getting
>>>>>>> the warning right while folding a bit later but still fairly
>>>>>>> early on seems like a reasonable compromise.  I fear that
>>>>>>> otherwise, the false positives will drive users to adopt
>>>>>>> other unsafe solutions (like memcpy) where these kinds of
>>>>>>> bugs cannot be as readily detected.
>>>>>>>
>>>>>>> Tested on x86_64-linux.
>>>>>>>
>>>>>>> Martin
>>>>>>>
>>>>>>> PS There still are outstanding cases where the warning can
>>>>>>> be avoided.  I xfailed them in the test for now but will
>>>>>>> still try to get them to work for GCC 9.
>>>>>>>
>>>>>>> gcc-87028.diff
>>>>>>>
>>>>>>>
>>>>>>> PR tree-optimization/87028 - false positive -Wstringop-truncation
>>>>>>> strncpy with global variable source string
>>>>>>> gcc/ChangeLog:
>>>>>>>
>>>>>>>       PR tree-optimization/87028
>>>>>>>       * gimple-fold.c (gimple_fold_builtin_strncpy): Avoid
>>>>>>> folding when
>>>>>>>       statement doesn't belong to a basic block.
>>>>>>>       * tree-ssa-strlen.c (maybe_diag_stxncpy_trunc): Handle
>>>>>>> MEM_REF on
>>>>>>>       the left hand side of assignment.
>>>>>>>
>>>>>>> gcc/testsuite/ChangeLog:
>>>>>>>
>>>>>>>       PR tree-optimization/87028
>>>>>>>       * c-c++-common/Wstringop-truncation.c: Remove xfails.
>>>>>>>       * gcc.dg/Wstringop-truncation-5.c: New test.
>>>>>>>
>>>>>>> diff --git a/gcc/gimple-fold.c b/gcc/gimple-fold.c
>>>>>>> index 07341eb..284c2fb 100644
>>>>>>> --- a/gcc/gimple-fold.c
>>>>>>> +++ b/gcc/gimple-fold.c
>>>>>>> @@ -1702,6 +1702,11 @@ gimple_fold_builtin_strncpy
>>>>>>> (gimple_stmt_iterator *gsi,
>>>>>>>    if (tree_int_cst_lt (ssize, len))
>>>>>>>      return false;
>>>>>>>
>>>>>>> +  /* Defer warning (and folding) until the next statement in the
>>>>>>> basic
>>>>>>> +     block is reachable.  */
>>>>>>> +  if (!gimple_bb (stmt))
>>>>>>> +    return false;
>>>>>> I think you want cfun->cfg as the test here.  They should be
>>>>>> equivalent
>>>>>> in practice.
>>>>>
>>>>> Please do not add 'cfun' references.  Note that the next stmt is
>>>>> also accessible
>>>>> when there is no CFG.  I guess the issue is that we fold this during
>>>>> gimplification where the next stmt is not yet "there" (but still in
>>>>> GENERIC)?
>>>> That was my assumption.  I almost suggested peeking at gsi_next and
>>>> avoiding in that case.
>>>
>>> So I'd rather add guards to maybe_fold_stmt in the gimplifier then.
>> So I think the concern with adding the guards to maybe_fold_stmt is the
>> possibility of further fallout.
>>
>> I guess they could be written to target this case specifically to
>> minimize fallout, but that feels like we're doing the same thing
>> (band-aid) just in a different place.
>>
>>
>>
>>>
>>>>>
>>>>> We generally do not want to have unfolded stmts in the IL when we
>>>>> can avoid that
>>>>> which is why we fold most stmts during gimplification.  We also do
>>>>> that because
>>>>> we now do less folding on GENERIC.
>>>> But an unfolded call in the IL should always be safe and we've got
>>>> plenty of opportunities to fold it later.
>>>
>>> Well - we do.  The very first one is forwprop though which means
>>> we'll miss to
>>> re-write some memcpy parts into SSA:
>>>
>>>           NEXT_PASS (pass_ccp, false /* nonzero_p */);
>>>           /* After CCP we rewrite no longer addressed locals into SSA
>>>              form if possible.  */
>>>           NEXT_PASS (pass_forwprop);
>>>
>>> likewise early object-size will be confused by memcpy calls that just
>>> exist
>>> to avoid TBAA issues (another of our recommendations besides using
>>> unions).
>>>
>>> We do fold mem* early for a reason ;)
>>>
>>> "We can always do warnings earlier" would be a similar true sentence.
>> I'm not disagreeing at all.  There's a natural tension between the
>> benefits of folding early to enable more optimizations downstream and
>> leaving the IL in a state where we can give actionable warnings.
> 
> Similar trade-offs between folding early and losing information
> as a result also impact high-level optimizations.
> 
> For instance, folding the strlen argument below
> 
>   void f3 (struct A* p)
>   {
>     __builtin_strcpy (p->a, "123");
> 
>     if (__builtin_strlen (p->a + 1) != 2)   // not folded
>       __builtin_abort ();
>   }
> 
> into
> 
>   _2 = &MEM[(void *)p_4(D) + 2B];
> 
> early on defeats the strlen optimization because there is no
> mechanism to determine what member (void *)p_4(D) + 2B refers
> to (this is bug 86955).
> 
> Another example is folding of strlen calls with non-constant
> offsets into constant strings like here:
> 
>   const char a[] = "123";
> 
>   void f (int i)
>   {
>     if (__builtin_strlen (&a[i]) > 3)
>       __builtin_abort ();
>   }
> 
> into sizeof a - 1 - i, which then prevents the result from
> being folded to false  (bug 86434), not to mention the code
> it emits for out-of-bounds indices.
> 
> There are a number of other similar examples in Bugzilla
> that I've filed as I discovered them during testing my
> warnings (e.g., 86572).
> 
> In my mind, transforming library calls into "lossy" low-level
> primitives like MEM_REF would be better done only after higher
> level optimizations have had a chance to analyze them.  Ditto
> for other similar transformations (like to other library calls).
> Having more accurate information helps both optimization and
> warnings.  It also makes the warnings more meaningful.
> Printing "memcpy overflows a buffer" when the source code
> has a call to strncpy is less than ideal.
> 
>> Similarly there's a natural tension between warning early vs warning
>> late.  Code that triggers the warning may ultimately be proved
>> unreachable, or we may discover simplifications that either suppress or
>> expose a warning.
>>
>> There is no easy answer here.  But I think we can legitimately ask
>> questions.  ie, does folding strnlen here really improve things
>> downstream in ways that are measurable?  Does the false positive really
>> impact the utility of the warning?  etc.
>>
>> I'd hazard a guess that Martin is particularly sensitive to false
>> positives based on feedback he's received from our developer community
>> as well as downstream consumers of his work.
> 
> Yes.  The kernel folks in particular have done a lot of work
> cleaning up their code in an effort to adopt the warning and
> attribute nonstring.  They have been keeping me in the loop
> on their progress (and feeding me back test cases with false
> positives and negatives they run into).
I can't recall seeing further guidance from Richi WRT putting the checks
earlier (maybe_fold_stmt).

If the point here is to avoid false positives by not folding strncpy,
particularly in cases where we don't see the NUL in the copy, but it
appears in a subsequent store, then let's be fairly selective (so as not
to muck up things on the optimization side more than is necessary).

ISTM we can do this by refactoring the warning bits so they're reusable
at different points in the pipeline.  Those bits would always return a
boolean indicating if the given statement might generate a warning or not.

When called early, they would not actually issue any warning.  They
would merely do the best analysis they can and return a status
indicating whether or not the statement would generate a warning given
current context.  The goal here is to leave statements that might
generate a warning as-is in the IL.

When called late (assuming there is a point where we can walk the IL and
issue the appropriate warnings), the routine would actually issue the
warning.
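
To make the two-mode idea concrete, here is a minimal sketch in plain C.
It is not GCC code: struct strncpy_call, enum diag_mode, strncpy_might_warn
and can_fold_early are all hypothetical names invented for illustration,
and the "next statement" information is reduced to two flags.  The same
predicate is used both early (query only, to decide whether to fold) and
late (to actually emit the diagnostic).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical summary of one strncpy call with a constant source
   and a constant bound.  None of these names exist in GCC.  */
struct strncpy_call
{
  size_t src_len;             /* strlen of the constant source string */
  size_t bound;               /* constant third argument */
  bool next_stmt_visible;     /* can the following statement be inspected? */
  bool next_stmt_stores_nul;  /* meaningful only if next_stmt_visible */
};

enum diag_mode { DIAG_QUERY_ONLY, DIAG_EMIT };

/* Return true if the call might deserve a truncation warning given what
   is currently known.  Only DIAG_EMIT mode prints anything.  */
bool
strncpy_might_warn (const struct strncpy_call *c, enum diag_mode mode)
{
  assert (c != NULL);

  /* The terminating nul is copied: no truncation, nothing to warn about.  */
  if (c->bound > c->src_len)
    return false;

  /* Truncation, but the next statement is known to add the nul.  */
  if (c->next_stmt_visible && c->next_stmt_stores_nul)
    return false;

  if (mode == DIAG_EMIT)
    fprintf (stderr, "warning: strncpy output truncated before "
                     "terminating nul\n");
  return true;
}

/* Early caller: fold only when no warning is possible, so that calls
   which might warn stay in the IL unchanged.  */
bool
can_fold_early (const struct strncpy_call *c)
{
  return !strncpy_might_warn (c, DIAG_QUERY_ONLY);
}
```

With this shape, a truncating call whose successor is not yet visible is
left unfolded early, and is folded (or diagnosed) once the later caller
can see whether the next statement stores the nul.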

This kind of structure could potentially work for other builtins where we
may need to look at subsequent statements to avoid false positives, but
early folding hides cases by transforming the call into an undesirable form.

Note that for cases where a call looks problematic early because we
can't see the statement that stores the terminator, but where the
terminator statement ultimately becomes visible, we still get folding,
it just happens later in the pipeline.

Thoughts?

jeff
Martin Sebor Oct. 21, 2018, 12:01 a.m. UTC | #29
On 10/16/2018 03:21 PM, Jeff Law wrote:
> On 10/4/18 9:51 AM, Martin Sebor wrote:
>> On 10/04/2018 08:58 AM, Jeff Law wrote:
>>> On 8/27/18 9:42 AM, Richard Biener wrote:
>>>> On Mon, Aug 27, 2018 at 5:32 PM Jeff Law <law@redhat.com> wrote:
>>>>>
>>>>> On 08/27/2018 02:29 AM, Richard Biener wrote:
>>>>>> On Sun, Aug 26, 2018 at 7:26 AM Jeff Law <law@redhat.com> wrote:
>>>>>>>
>>>>>>> On 08/24/2018 09:58 AM, Martin Sebor wrote:
>>>>>>>> The warning suppression for -Wstringop-truncation looks for
>>>>>>>> the next statement after a truncating strncpy to see if it
>>>>>>>> adds a terminating nul.  This only works when the next
>>>>>>>> statement can be reached using the Gimple statement iterator
>>>>>>>> which isn't until after gimplification.  As a result, strncpy
>>>>>>>> calls that truncate their constant argument that are being
>>>>>>>> folded to memcpy this early get diagnosed even if they are
>>>>>>>> followed by the nul assignment:
>>>>>>>>
>>>>>>>>   const char s[] = "12345";
>>>>>>>>   char d[3];
>>>>>>>>
>>>>>>>>   void f (void)
>>>>>>>>   {
>>>>>>>>     strncpy (d, s, sizeof d - 1);   // -Wstringop-truncation
>>>>>>>>     d[sizeof d - 1] = 0;
>>>>>>>>   }
>>>>>>>>
>>>>>>>> To avoid the warning I propose to defer folding strncpy to
>>>>>>>> memcpy until the pointer to the basic block the strncpy call
>>>>>>>> is in can be used to try to reach the next statement (this
>>>>>>>> happens as early as ccp1).  I'm aware of the preference to
>>>>>>>> fold things early but in the case of strncpy (a relatively
>>>>>>>> rarely used function that is often misused), getting
>>>>>>>> the warning right while folding a bit later but still fairly
>>>>>>>> early on seems like a reasonable compromise.  I fear that
>>>>>>>> otherwise, the false positives will drive users to adopt
>>>>>>>> other unsafe solutions (like memcpy) where these kinds of
>>>>>>>> bugs cannot be as readily detected.
>>>>>>>>
>>>>>>>> Tested on x86_64-linux.
>>>>>>>>
>>>>>>>> Martin
>>>>>>>>
>>>>>>>> PS There still are outstanding cases where the warning can
>>>>>>>> be avoided.  I xfailed them in the test for now but will
>>>>>>>> still try to get them to work for GCC 9.
>>>>>>>>
>>>>>>>> gcc-87028.diff
>>>>>>>>
>>>>>>>>
>>>>>>>> PR tree-optimization/87028 - false positive -Wstringop-truncation
>>>>>>>> strncpy with global variable source string
>>>>>>>> gcc/ChangeLog:
>>>>>>>>
>>>>>>>>       PR tree-optimization/87028
>>>>>>>>       * gimple-fold.c (gimple_fold_builtin_strncpy): Avoid
>>>>>>>> folding when
>>>>>>>>       statement doesn't belong to a basic block.
>>>>>>>>       * tree-ssa-strlen.c (maybe_diag_stxncpy_trunc): Handle
>>>>>>>> MEM_REF on
>>>>>>>>       the left hand side of assignment.
>>>>>>>>
>>>>>>>> gcc/testsuite/ChangeLog:
>>>>>>>>
>>>>>>>>       PR tree-optimization/87028
>>>>>>>>       * c-c++-common/Wstringop-truncation.c: Remove xfails.
>>>>>>>>       * gcc.dg/Wstringop-truncation-5.c: New test.
>>>>>>>>
>>>>>>>> diff --git a/gcc/gimple-fold.c b/gcc/gimple-fold.c
>>>>>>>> index 07341eb..284c2fb 100644
>>>>>>>> --- a/gcc/gimple-fold.c
>>>>>>>> +++ b/gcc/gimple-fold.c
>>>>>>>> @@ -1702,6 +1702,11 @@ gimple_fold_builtin_strncpy
>>>>>>>> (gimple_stmt_iterator *gsi,
>>>>>>>>    if (tree_int_cst_lt (ssize, len))
>>>>>>>>      return false;
>>>>>>>>
>>>>>>>> +  /* Defer warning (and folding) until the next statement in the
>>>>>>>> basic
>>>>>>>> +     block is reachable.  */
>>>>>>>> +  if (!gimple_bb (stmt))
>>>>>>>> +    return false;
>>>>>>> I think you want cfun->cfg as the test here.  They should be
>>>>>>> equivalent
>>>>>>> in practice.
>>>>>>
>>>>>> Please do not add 'cfun' references.  Note that the next stmt is
>>>>>> also accessible
>>>>>> when there is no CFG.  I guess the issue is that we fold this during
>>>>>> gimplification where the next stmt is not yet "there" (but still in
>>>>>> GENERIC)?
>>>>> That was my assumption.  I almost suggested peeking at gsi_next and
>>>>> avoiding in that case.
>>>>
>>>> So I'd rather add guards to maybe_fold_stmt in the gimplifier then.
>>> So I think the concern with adding the guards to maybe_fold_stmt is the
>>> possibility of further fallout.
>>>
>>> I guess they could be written to target this case specifically to
>>> minimize fallout, but that feels like we're doing the same thing
>>> (band-aid) just in a different place.
>>>
>>>
>>>
>>>>
>>>>>>
>>>>>> We generally do not want to have unfolded stmts in the IL when we
>>>>>> can avoid that
>>>>>> which is why we fold most stmts during gimplification.  We also do
>>>>>> that because
>>>>>> we now do less folding on GENERIC.
>>>>> But an unfolded call in the IL should always be safe and we've got
>>>>> plenty of opportunities to fold it later.
>>>>
>>>> Well - we do.  The very first one is forwprop though which means
>>>> we'll miss to
>>>> re-write some memcpy parts into SSA:
>>>>
>>>>           NEXT_PASS (pass_ccp, false /* nonzero_p */);
>>>>           /* After CCP we rewrite no longer addressed locals into SSA
>>>>              form if possible.  */
>>>>           NEXT_PASS (pass_forwprop);
>>>>
>>>> likewise early object-size will be confused by memcpy calls that just
>>>> exist
>>>> to avoid TBAA issues (another of our recommendations besides using
>>>> unions).
>>>>
>>>> We do fold mem* early for a reason ;)
>>>>
>>>> "We can always do warnings earlier" would be a similar true sentence.
>>> I'm not disagreeing at all.  There's a natural tension between the
>>> benefits of folding early to enable more optimizations downstream and
>>> leaving the IL in a state where we can give actionable warnings.
>>
>> Similar trade-offs between folding early and losing information
>> as a result also impact high-level optimizations.
>>
>> For instance, folding the strlen argument below
>>
>>   void f3 (struct A* p)
>>   {
>>     __builtin_strcpy (p->a, "123");
>>
>>     if (__builtin_strlen (p->a + 1) != 2)   // not folded
>>       __builtin_abort ();
>>   }
>>
>> into
>>
>>   _2 = &MEM[(void *)p_4(D) + 2B];
>>
>> early on defeats the strlen optimization because there is no
>> mechanism to determine what member (void *)p_4(D) + 2B refers
>> to (this is bug 86955).
>>
>> Another example is folding of strlen calls with non-constant
>> offsets into constant strings like here:
>>
>>   const char a[] = "123";
>>
>>   void f (int i)
>>   {
>>     if (__builtin_strlen (&a[i]) > 3)
>>       __builtin_abort ();
>>   }
>>
>> into sizeof a - 1 - i, which then prevents the result from
>> being folded to false  (bug 86434), not to mention the code
>> it emits for out-of-bounds indices.
>>
>> There are a number of other similar examples in Bugzilla
>> that I've filed as I discovered them during testing my
>> warnings (e.g., 86572).
>>
>> In my mind, transforming library calls into "lossy" low-level
>> primitives like MEM_REF would be better done only after higher
>> level optimizations have had a chance to analyze them.  Ditto
>> for other similar transformations (like to other library calls).
>> Having more accurate information helps both optimization and
>> warnings.  It also makes the warnings more meaningful.
>> Printing "memcpy overflows a buffer" when the source code
>> has a call to strncpy is less than ideal.
>>
>>> Similarly there's a natural tension between warning early vs warning
>>> late.  Code that triggers the warning may ultimately be proved
>>> unreachable, or we may discover simplifications that either suppress or
>>> expose a warning.
>>>
>>> There is no easy answer here.  But I think we can legitimately ask
>>> questions.  ie, does folding strnlen here really improve things
>>> downstream in ways that are measurable?  Does the false positive really
>>> impact the utility of the warning?  etc.
>>>
>>> I'd hazard a guess that Martin is particularly sensitive to false
>>> positives based on feedback he's received from our developer community
>>> as well as downstream consumers of his work.
>>
>> Yes.  The kernel folks in particular have done a lot of work
>> cleaning up their code in an effort to adopt the warning and
>> attribute nonstring.  They have been keeping me in the loop
>> on their progress (and feeding me back test cases with false
>> positives and negatives they run into).
> I can't recall seeing further guidance from Richi WRT putting the checks
> earlier (maybe_fold_stmt).
>
> If the point here is to avoid false positives by not folding strncpy,
> particularly in cases where we don't see the NUL in the copy, but it
> appears in a subsequent store, then let's be fairly selective (so as not
> to muck up things on the optimization side more than is necessary).
>
> ISTM we can do this by refactoring the warning bits so they're reusable
> at different points in the pipeline.  Those bits would always return a
> boolean indicating if the given statement might generate a warning or not.
>
> When called early, they would not actually issue any warning.  They
> would merely do the best analysis they can and return a status
> indicating whether or not the statement would generate a warning given
> current context.  The goal here is to leave statements that might
> generate a warning as-is in the IL.
>
> When called late (assuming there is a point where we can walk the IL and
> issue the appropriate warnings), the routine would actually issue the
> warning.
>
> This kind of structure could potentially work for other builtins where we
> may need to look at subsequent statements to avoid false positives, but
> early folding hides cases by transforming the call into an undesirable form.
>
> Note that for cases where a call looks problematic early because we
> can't see the statement that stores the terminator, but where the
> terminator statement ultimately becomes visible, we still get folding,
> it just happens later in the pipeline.
>
> Thoughts?

The warning only triggers when the bound is less than or equal
to the length of the constant source string (i.e., when strncpy
truncates).  So IIUC, your suggestion would defer folding only
such strncpy calls and let gimple_fold_builtin_strncpy fold
those with a constant bound that's greater than the length of
the constant source string.  That would be fine with me, but
since strncpy calls with a bound that's greater than the length
of the source are pointless I don't think they are important
enough to worry about folding super early.  The constant ones
that serve any purpose (and that are presumably important to
optimize) are those that truncate.
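
For reference, the condition described above can be checked with a small
standalone C sketch.  The helper names strncpy_truncates and
copy_is_terminated are made up for this illustration and are not part of
GCC; the only library behavior relied on is the standard one, namely that
strncpy writes a nul (and padding) only when the bound exceeds the source
length.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* True if strncpy (dst, src, bound) copies no terminating nul, which is
   the condition described above: bound <= strlen (src).  */
bool
strncpy_truncates (const char *src, size_t bound)
{
  assert (src != NULL);
  return bound <= strlen (src);
}

/* Perform the copy into a buffer prefilled with a marker byte and report
   whether a nul ended up anywhere in the first BOUND bytes.  */
bool
copy_is_terminated (const char *src, size_t bound)
{
  char buf[32];
  assert (src != NULL && bound <= sizeof buf);

  memset (buf, 'x', sizeof buf);
  strncpy (buf, src, bound);
  return memchr (buf, '\0', bound) != NULL;
}
```

For a source of length 4, a bound of 4 leaves the destination
unterminated while a bound of 5 copies the nul, so the two helpers always
disagree: the copy is terminated exactly when strncpy_truncates is false.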

That said, when optimization isn't enabled, I don't think users
expect calls to library functions to be transformed to calls to
other functions, or inlined.  Yet that's just what GCC does.
For example, besides triggering the warning, the following:

   char a[4];

   void f (char *s)
   {
     __builtin_strncpy (a, "1234", sizeof a);
     a[3] = 0;
   }

is transformed, even at -O0, into:

   f (char * s)
   {
     <bb 2> :
     MEM[(char * {ref-all})&a] = MEM[(char * {ref-all})"1234"];
     a[3] = 0;
     return;
   }

That doesn't seem right.  GCC should avoid these transformations
at -O0, and one way to do that is to defer folding until the CFG
is constructed.  The patch does it for strncpy but a more general
solution would do that for all calls, e.g., in maybe_fold_stmt
as Richard suggested (and I subsequently tested).

Martin
Martin Sebor Oct. 31, 2018, 4:33 p.m. UTC | #30
Ping: https://gcc.gnu.org/ml/gcc-patches/2018-08/msg01818.html

Jeff Law Nov. 7, 2018, 9:28 p.m. UTC | #31
On 10/20/18 6:01 PM, Martin Sebor wrote:


> 
> The warning only triggers when the bound is less than or equal
> to the length of the constant source string (i.e., when strncpy
> truncates).  So IIUC, your suggestion would defer folding only
> such strncpy calls and let gimple_fold_builtin_strncpy fold
> those with a constant bound that's greater than the length of
> the constant source string.  That would be fine with me, but
> since strncpy calls with a bound that's greater than the length
> of the source are pointless I don't think they are important
> enough to worry about folding super early.  The constant ones
> that serve any purpose (and that are presumably important to
> optimize) are those that truncate.
I was focused exclusively on the case where we have to look for a
subsequent statement that handled termination.  The idea was to only
leave in the cases that we might need to warn for because we couldn't
search subsequent statements for the termination.

Splitting up was primarily meant to get the warning out of the folder
with a minimal impact on code generation.  But if the common case would
result in deferral of folding, then I'd fully expect Richi to object.

> 
> That said, when optimization isn't enabled, I don't think users
> expect calls to library functions to be transformed to calls to
> other functions, or inlined.  Yet that's just what GCC does.
> For example, besides triggering the warning, the following:
I don't think we should drag this into the issue at hand.  Though I do
generally agree that folding this stuff into low level memory operations
is not what most would expect at -O0.


Jeff
Martin Sebor Nov. 9, 2018, 1:25 a.m. UTC | #32
On 11/07/2018 02:28 PM, Jeff Law wrote:
> On 10/20/18 6:01 PM, Martin Sebor wrote:
>
>
>>
>> The warning only triggers when the bound is less than or equal
>> to the length of the constant source string (i.e., when strncpy
>> truncates).  So IIUC, your suggestion would defer folding only
>> such strncpy calls and let gimple_fold_builtin_strncpy fold
>> those with a constant bound that's greater than the length of
>> the constant source string.  That would be fine with me, but
>> since strncpy calls with a bound that's greater than the length
>> of the source are pointless I don't think they are important
>> enough to worry about folding super early.  The constant ones
>> that serve any purpose (and that are presumably important to
>> optimize) are those that truncate.
> I was focused exclusively on the case where we have to look for a
> subsequent statement that handled termination.  The idea was to only
> leave in the cases that we might need to warn for because we couldn't
> search subsequent statements for the termination.
>
> Splitting up was primarily meant to get the warning out of the folder
> with a minimal impact on code generation.  But if the common case would
> result in deferral of folding, then I'd fully expect Richi to object.

The test case from the bug is:

   struct S {
     char dest[5];
   };

   const char src[] = "1234567890";

   void f (struct S *s)
   {
     strncpy (s->dest, src, sizeof (s->dest) - 1);

     s->dest [sizeof (s->dest) - 1] = '\0';
   }

The strncpy call truncates but it's safe because of the assignment.
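
For reference, the same idiom as a self-contained sketch that can be
compiled and checked on its own.  The function name copy_truncated and
the assert are additions for illustration only and are not part of the
test case in the bug.

```c
#include <assert.h>
#include <string.h>

struct S {
  char dest[5];
};

static const char src[] = "1234567890";

/* Copy at most sizeof dest - 1 bytes, then terminate explicitly.  */
void copy_truncated (struct S *s)
{
  assert (s != NULL);

  /* SRC is longer than the bound, so this copies '1' '2' '3' '4'
     and appends no nul.  */
  strncpy (s->dest, src, sizeof (s->dest) - 1);

  /* The store that makes the truncating call safe.  */
  s->dest[sizeof (s->dest) - 1] = '\0';
}
```

After the call s->dest holds the four-character string "1234"
regardless of what the array contained before.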

This is representative of the use case that the fix is needed for
(one with a constant source string) and that needs folding to be
deferred.  I don't think this use case is a pervasive one, or even
terribly common among all strncpy uses, but it is the only one that
the early folding code handles.  By deferring the folding, this use
case will be transformed to memcpy slightly later than it is now
(during forwprop1 to be precise, so after early folding).

As a data point, in a build of the Linux kernel, where I expect
strncpy is used as intended more than in most other code, of
the nearly 500 distinct strncpy calls, 21 instances are folded
before the CFG is complete.  The effect of the patch would be to
fold those 21 instances later, i.e., just about 4% of all calls.  Most of
these calls (12 out of the 21) are in SCSI drivers, in code
that responds to the INQUIRY command with things like vendor
and product names, hardly something that matters for efficiency.
But at -O1 even they still are ultimately folded to memcpy.

>> That said, when optimization isn't enabled, I don't think users
>> expect calls to library functions to be transformed to calls to
>> other functions, or inlined.  Yet that's just what GCC does.
>> For example, besides triggering the warning, the following:
> I don't think we should drag this into the issue at hand.  Though I do
> generally agree that folding this stuff into low level memory operations
> is not what most would expect at -O0.

Folding calls to library functions so early that the CFG hasn't
been constructed yet is the root cause of the issue.  I'm not
suggesting to prevent it for all functions as part of this fix,
but if we agree that this early folding is a poor choice in general
then we should not be uncomfortable with a patch that defers it
for just a subset of use cases of a single function.

FWIW, I would view it as entirely appropriate to do our due
diligence before making a decision about the early folding of
all library functions, but I'm finding it hard to justify to
myself spending this much time and effort on either the false
positive or on the question of whether something that's by all
measures as inconsequential for efficiency as strncpy should
be transformed to memcpy at point 1 or point 2.

What do you suggest next?

Martin
Martin Sebor Nov. 16, 2018, 3:12 a.m. UTC | #33
Ping: https://gcc.gnu.org/ml/gcc-patches/2018-08/msg01818.html

Please let me know if there is something I need to change here
to make the fix acceptable or if I should stop trying.

On 10/31/2018 10:33 AM, Martin Sebor wrote:
> Ping: https://gcc.gnu.org/ml/gcc-patches/2018-08/msg01818.html
>
> On 10/20/2018 06:01 PM, Martin Sebor wrote:
>> On 10/16/2018 03:21 PM, Jeff Law wrote:
>>> On 10/4/18 9:51 AM, Martin Sebor wrote:
>>>> On 10/04/2018 08:58 AM, Jeff Law wrote:
>>>>> On 8/27/18 9:42 AM, Richard Biener wrote:
>>>>>> On Mon, Aug 27, 2018 at 5:32 PM Jeff Law <law@redhat.com> wrote:
>>>>>>>
>>>>>>> On 08/27/2018 02:29 AM, Richard Biener wrote:
>>>>>>>> On Sun, Aug 26, 2018 at 7:26 AM Jeff Law <law@redhat.com> wrote:
>>>>>>>>>
>>>>>>>>> On 08/24/2018 09:58 AM, Martin Sebor wrote:
>>>>>>>>>> The warning suppression for -Wstringop-truncation looks for
>>>>>>>>>> the next statement after a truncating strncpy to see if it
>>>>>>>>>> adds a terminating nul.  This only works when the next
>>>>>>>>>> statement can be reached using the Gimple statement iterator
>>>>>>>>>> which isn't until after gimplification.  As a result, strncpy
>>>>>>>>>> calls that truncate their constant argument that are being
>>>>>>>>>> folded to memcpy this early get diagnosed even if they are
>>>>>>>>>> followed by the nul assignment:
>>>>>>>>>>
>>>>>>>>>>   const char s[] = "12345";
>>>>>>>>>>   char d[3];
>>>>>>>>>>
>>>>>>>>>>   void f (void)
>>>>>>>>>>   {
>>>>>>>>>>     strncpy (d, s, sizeof d - 1);   // -Wstringop-truncation
>>>>>>>>>>     d[sizeof d - 1] = 0;
>>>>>>>>>>   }
>>>>>>>>>>
>>>>>>>>>> To avoid the warning I propose to defer folding strncpy to
>>>>>>>>>> memcpy until the pointer to the basic block the strncpy call
>>>>>>>>>> is in can be used to try to reach the next statement (this
>>>>>>>>>> happens as early as ccp1).  I'm aware of the preference to
>>>>>>>>>> fold things early but in the case of strncpy (a relatively
>>>>>>>>>> rarely used function that is often misused), getting
>>>>>>>>>> the warning right while folding a bit later but still fairly
>>>>>>>>>> early on seems like a reasonable compromise.  I fear that
>>>>>>>>>> otherwise, the false positives will drive users to adopt
>>>>>>>>>> other unsafe solutions (like memcpy) where these kinds of
>>>>>>>>>> bugs cannot be as readily detected.
>>>>>>>>>>
>>>>>>>>>> Tested on x86_64-linux.
>>>>>>>>>>
>>>>>>>>>> Martin
>>>>>>>>>>
>>>>>>>>>> PS There still are outstanding cases where the warning can
>>>>>>>>>> be avoided.  I xfailed them in the test for now but will
>>>>>>>>>> still try to get them to work for GCC 9.
>>>>>>>>>>
>>>>>>>>>> gcc-87028.diff
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> PR tree-optimization/87028 - false positive -Wstringop-truncation
>>>>>>>>>> strncpy with global variable source string
>>>>>>>>>> gcc/ChangeLog:
>>>>>>>>>>
>>>>>>>>>>       PR tree-optimization/87028
>>>>>>>>>>       * gimple-fold.c (gimple_fold_builtin_strncpy): Avoid
>>>>>>>>>> folding when
>>>>>>>>>>       statement doesn't belong to a basic block.
>>>>>>>>>>       * tree-ssa-strlen.c (maybe_diag_stxncpy_trunc): Handle
>>>>>>>>>> MEM_REF on
>>>>>>>>>>       the left hand side of assignment.
>>>>>>>>>>
>>>>>>>>>> gcc/testsuite/ChangeLog:
>>>>>>>>>>
>>>>>>>>>>       PR tree-optimization/87028
>>>>>>>>>>       * c-c++-common/Wstringop-truncation.c: Remove xfails.
>>>>>>>>>>       * gcc.dg/Wstringop-truncation-5.c: New test.
>>>>>>>>>>
>>>>>>>>>> diff --git a/gcc/gimple-fold.c b/gcc/gimple-fold.c
>>>>>>>>>> index 07341eb..284c2fb 100644
>>>>>>>>>> --- a/gcc/gimple-fold.c
>>>>>>>>>> +++ b/gcc/gimple-fold.c
>>>>>>>>>> @@ -1702,6 +1702,11 @@ gimple_fold_builtin_strncpy
>>>>>>>>>> (gimple_stmt_iterator *gsi,
>>>>>>>>>>    if (tree_int_cst_lt (ssize, len))
>>>>>>>>>>      return false;
>>>>>>>>>>
>>>>>>>>>> +  /* Defer warning (and folding) until the next statement in the
>>>>>>>>>> basic
>>>>>>>>>> +     block is reachable.  */
>>>>>>>>>> +  if (!gimple_bb (stmt))
>>>>>>>>>> +    return false;
>>>>>>>>> I think you want cfun->cfg as the test here.  They should be
>>>>>>>>> equivalent
>>>>>>>>> in practice.
>>>>>>>>
>>>>>>>> Please do not add 'cfun' references.  Note that the next stmt is
>>>>>>>> also accessible
>>>>>>>> when there is no CFG.  I guess the issue is that we fold this
>>>>>>>> during
>>>>>>>> gimplification where the next stmt is not yet "there" (but still in
>>>>>>>> GENERIC)?
>>>>>>> That was my assumption.  I almost suggested peeking at gsi_next and
>>>>>>> avoiding in that case.
>>>>>>
>>>>>> So I'd rather add guards to maybe_fold_stmt in the gimplifier then.
>>>>> So I think the concern with adding the guards to maybe_fold_stmt is
>>>>> the
>>>>> possibility of further fallout.
>>>>>
>>>>> I guess they could be written to target this case specifically to
>>>>> minimize fallout, but that feels like we're doing the same thing
>>>>> (band-aid) just in a different place.
>>>>>
>>>>>
>>>>>
>>>>>>
>>>>>>>>
>>>>>>>> We generally do not want to have unfolded stmts in the IL when we
>>>>>>>> can avoid that
>>>>>>>> which is why we fold most stmts during gimplification.  We also do
>>>>>>>> that because
>>>>>>>> we now do less folding on GENERIC.
>>>>>>> But an unfolded call in the IL should always be safe and we've got
>>>>>>> plenty of opportunities to fold it later.
>>>>>>
>>>>>> Well - we do.  The very first one is forwprop though which means
>>>>>> we'll miss to
>>>>>> re-write some memcpy parts into SSA:
>>>>>>
>>>>>>           NEXT_PASS (pass_ccp, false /* nonzero_p */);
>>>>>>           /* After CCP we rewrite no longer addressed locals into SSA
>>>>>>              form if possible.  */
>>>>>>           NEXT_PASS (pass_forwprop);
>>>>>>
>>>>>> likewise early object-size will be confused by memcpy calls that just
>>>>>> exist
>>>>>> to avoid TBAA issues (another of our recommendations besides using
>>>>>> unions).
>>>>>>
>>>>>> We do fold mem* early for a reason ;)
>>>>>>
>>>>>> "We can always do warnings earlier" would be a similar true sentence.
>>>>> I'm not disagreeing at all.  There's a natural tension between the
>>>>> benefits of folding early to enable more optimizations downstream and
>>>>> leaving the IL in a state where we can give actionable warnings.
>>>>
>>>> Similar trade-offs between folding early and losing information
>>>> as a result also impact high-level optimizations.
>>>>
>>>> For instance, folding the strlen argument below
>>>>
>>>>   void f3 (struct A* p)
>>>>   {
>>>>     __builtin_strcpy (p->a, "123");
>>>>
>>>>     if (__builtin_strlen (p->a + 1) != 2)   // not folded
>>>>       __builtin_abort ();
>>>>   }
>>>>
>>>> into
>>>>
>>>>   _2 = &MEM[(void *)p_4(D) + 2B];
>>>>
>>>> early on defeats the strlen optimization because there is no
>>>> mechanism to determine what member (void *)p_4(D) + 2B refers
>>>> to (this is bug 86955).
>>>>
>>>> Another example is folding of strlen calls with non-constant
>>>> offsets into constant strings like here:
>>>>
>>>>   const char a[] = "123";
>>>>
>>>>   void f (int i)
>>>>   {
>>>>     if (__builtin_strlen (&a[i]) > 3)
>>>>       __builtin_abort ();
>>>>   }
>>>>
>>>> into sizeof a - 1 - i, which then prevents the result from
>>>> being folded to false  (bug 86434), not to mention the code
>>>> it emits for out-of-bounds indices.
>>>>
>>>> There are a number of other similar examples in Bugzilla
>>>> that I've filed as I discovered them during testing my
>>>> warnings (e.g., 86572).
>>>>
>>>> In my mind, transforming library calls into "lossy" low-level
>>>> primitives like MEM_REF would be better done only after higher
>>>> level optimizations have had a chance to analyze them.  Ditto
>>>> for other similar transformations (like to other library calls).
>>>> Having more accurate information helps both optimization and
>>>> warnings.  It also makes the warnings more meaningful.
>>>> Printing "memcpy overflows a buffer" when the source code
>>>> has a call to strncpy is less than ideal.
>>>>
>>>>> Similarly there's a natural tension between warning early vs warning
>>>>> late.  Code that triggers the warning may ultimately be proved
>>>>> unreachable, or we may discover simplifications that either
>>>>> suppress or
>>>>> expose a warning.
>>>>>
>>>>> There is no easy answer here.  But I think we can legitimately ask
>>>>> questions.  ie, does folding strnlen here really improve things
>>>>> downstream in ways that are measurable?  Does the false positive
>>>>> really
>>>>> impact the utility of the warning?  etc.
>>>>>
>>>>> I'd hazard a guess that Martin is particularly sensitive to false
>>>>> positives based on feedback he's received from our developer community
>>>>> as well as downstream consumers of his work.
>>>>
>>>> Yes.  The kernel folks in particular have done a lot of work
>>>> cleaning up their code in an effort to adopt the warning and
>>>> attribute nonstring.  They have been keeping me in the loop
>>>> on their progress (and feeding me back test cases with false
>>>> positives and negatives they run into).
>>> I can't recall seeing further guidance from Richi WRT putting the checks
>>> earlier (maybe_fold_stmt).
>>>
>>> If the point here is to avoid false positives by not folding strncpy,
>>> particularly in cases where we don't see the NUL in the copy, but it
>>> appears in a subsequent store, then let's be fairly selective (so as not
>>> to muck up things on the optimization side more than is necessary).
>>>
>>> ISTM we can do this by refactoring the warning bits so they're reusable
>>> at different points in the pipeline.  Those bits would always return a
>>> boolean indicating if the given statement might generate a warning or
>>> not.
>>>
>>> When called early, they would not actually issue any warning.  They
>>> would merely do the best analysis they can and return a status
>>> indicating whether or not the statement would generate a warning given
>>> current context.  The goal here is to leave statements that might
>>> generate a warning as-is in the IL.
>>>
>>> When called late (assuming there is a point where we can walk the IL and
>>> issue the appropriate warnings), the routine would actually issue the
>>> warning.
>>>
>>> This kind of structure could potentially work for other builtins where we
>>> may need to look at subsequent statements to avoid false positives, but
>>> early folding hides cases by transforming the call into an undesirable
>>> form.
>>>
>>> Note that for cases where a call looks problematical early because we
>>> can't see the statement which stores the terminator, but where the
>>> terminator statement ultimately becomes visible, we still get folding,
>>> it just happens later in the pipeline.
>>>
>>> Thoughts?
>>
>> The warning only triggers when the bound is less than or equal
>> to the length of the constant source string (i.e., when strncpy
>> truncates).  So IIUC, your suggestion would defer folding only
>> such strncpy calls and let gimple_fold_builtin_strncpy fold
>> those with a constant bound that's greater than the length of
>> the constant source string.  That would be fine with me, but
>> since strncpy calls with a bound that's greater than the length
>> of the source are pointless I don't think they are important
>> enough to worry about folding super early.  The constant ones
>> that serve any purpose (and that are presumably important to
>> optimize) are those that truncate.
>>
>> That said, when optimization isn't enabled, I don't think users
>> expect calls to library functions to be transformed to calls to
>> other functions, or inlined.  Yet that's just what GCC does.
>> For example, besides triggering the warning, the following:
>>
>>   char a[4];
>>
>>   void f (char *s)
>>   {
>>     __builtin_strncpy (a, "1234", sizeof a);
>>     a[3] = 0;
>>   }
>>
>> is transformed, even at -O0, into:
>>
>>   f (char * s)
>>   {
>>     <bb 2> :
>>     MEM[(char * {ref-all})&a] = MEM[(char * {ref-all})"1234"];
>>     a[3] = 0;
>>     return;
>>   }
>>
>> That doesn't seem right.  GCC should avoid these transformations
>> at -O0, and one way to do that is to defer folding until the CFG
>> is constructed.  The patch does it for strncpy but a more general
>> solution would do that for all calls, e.g., in maybe_fold_stmt
>> as Richard suggested (and I subsequently tested).
>>
>> Martin
>
Richard Biener Nov. 16, 2018, 9:07 a.m. UTC | #34
On Fri, Nov 16, 2018 at 4:12 AM Martin Sebor <msebor@gmail.com> wrote:
>
> Ping: https://gcc.gnu.org/ml/gcc-patches/2018-08/msg01818.html
>
> Please let me know if there is something I need to change here
> to make the fix acceptable or if I should stop trying.

I have one more comment about

+  /* Defer warning (and folding) until the next statement in the basic
+     block is reachable.  */
+  if (!gimple_bb (stmt))
+    return false;
+

it's not about the next statement in the basic-block being "reachable"
(even w/o a CFG you can use gsi_next()) but rather that the next
stmt isn't yet gimplified and thus not inserted into the gimple sequence,
right?  You apply this to gimple_fold_builtin_strncpy but I'd rather
see us not sprinkling this over gimple-fold.c but instead do this
in gimplify.c:maybe_fold_stmt, delaying folding until say lowering.

See the attached (untested).

Richard.



Martin Sebor Nov. 29, 2018, 8:34 p.m. UTC | #35
On 11/16/2018 02:07 AM, Richard Biener wrote:
> On Fri, Nov 16, 2018 at 4:12 AM Martin Sebor <msebor@gmail.com> wrote:
>>
>> Ping: https://gcc.gnu.org/ml/gcc-patches/2018-08/msg01818.html
>>
>> Please let me know if there is something I need to change here
>> to make the fix acceptable or if I should stop trying.
>
> I have one more comment about
>
> +  /* Defer warning (and folding) until the next statement in the basic
> +     block is reachable.  */
> +  if (!gimple_bb (stmt))
> +    return false;
> +
>
> it's not about the next statement in the basic-block being "reachable"
> (even w/o a CFG you can use gsi_next()) but rather that the next
> stmt isn't yet gimplified and thus not inserted into the gimple sequence,
> right?

No, it's about the current statement not being associated with
a basic block yet when the warning code runs for the first time
(during gimplify_expr), and so gsi_next() returning null.
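
To illustrate the ordering problem with a toy model (these are not GCC
data structures or APIs; toy_stmt, toy_seq, toy_append and toy_fold_all
are all made up for this sketch): when a folder runs on each statement
as it is appended, the statement has no successor yet, so the check for
a following nul store cannot succeed until a later walk.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum toy_kind { TOY_STRNCPY, TOY_STORE_NUL, TOY_OTHER };

struct toy_stmt {
  enum toy_kind kind;
  struct toy_stmt *next;   /* null until a successor is appended */
  bool folded;
};

struct toy_seq {
  struct toy_stmt *first, *last;
};

/* Fold a strncpy only when its successor can be seen and stores the nul.  */
static bool toy_try_fold (struct toy_stmt *s)
{
  if (s->kind != TOY_STRNCPY || s->folded)
    return false;
  if (s->next == NULL)
    return false;   /* successor not available yet: defer */
  if (s->next->kind != TOY_STORE_NUL)
    return false;   /* no terminator follows: leave the call alone */
  s->folded = true;
  return true;
}

/* Append S and immediately try to fold it, as an eager folder would.  */
bool toy_append (struct toy_seq *seq, struct toy_stmt *s)
{
  assert (seq != NULL && s != NULL);
  s->next = NULL;
  s->folded = false;
  if (seq->last)
    seq->last->next = s;
  else
    seq->first = s;
  seq->last = s;
  return toy_try_fold (s);
}

/* Later walk over the finished sequence; returns the number folded.  */
int toy_fold_all (struct toy_seq *seq)
{
  int n = 0;
  for (struct toy_stmt *s = seq->first; s; s = s->next)
    n += toy_try_fold (s);
  return n;
}
```

Appending a TOY_STRNCPY followed by a TOY_STORE_NUL folds nothing at
append time, and the later toy_fold_all walk folds the one call.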

> You apply this to gimple_fold_builtin_strncpy but I'd rather
> see us not sprinkling this over gimple-fold.c but instead do this
> in gimplify.c:maybe_fold_stmt, delaying folding until say lowering.
>
> See the attached (untested).

I would also prefer this solution.  I had tested it (in response
to you first mentioning it back in September) and it causes quite
a bit of fallout in tests that look for the folding to take place
very early.  See the end of my reply here:

   https://gcc.gnu.org/ml/gcc-patches/2018-09/msg01248.html

But I'm willing to do the test suite cleanup if you think it's
suitable for GCC 9.  (If you're thinking GCC 10 please let me
know now.)

Thanks
Martin

>
> Richard.
>
>
>
>> On 10/31/2018 10:33 AM, Martin Sebor wrote:
>>> Ping: https://gcc.gnu.org/ml/gcc-patches/2018-08/msg01818.html
>>>
>>> On 10/20/2018 06:01 PM, Martin Sebor wrote:
>>>> On 10/16/2018 03:21 PM, Jeff Law wrote:
>>>>> On 10/4/18 9:51 AM, Martin Sebor wrote:
>>>>>> On 10/04/2018 08:58 AM, Jeff Law wrote:
>>>>>>> On 8/27/18 9:42 AM, Richard Biener wrote:
>>>>>>>> On Mon, Aug 27, 2018 at 5:32 PM Jeff Law <law@redhat.com> wrote:
>>>>>>>>>
>>>>>>>>> On 08/27/2018 02:29 AM, Richard Biener wrote:
>>>>>>>>>> On Sun, Aug 26, 2018 at 7:26 AM Jeff Law <law@redhat.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>> On 08/24/2018 09:58 AM, Martin Sebor wrote:
>>>>>>>>>>>> The warning suppression for -Wstringop-truncation looks for
>>>>>>>>>>>> the next statement after a truncating strncpy to see if it
>>>>>>>>>>>> adds a terminating nul.  This only works when the next
>>>>>>>>>>>> statement can be reached using the Gimple statement iterator
>>>>>>>>>>>> which isn't until after gimplification.  As a result, strncpy
>>>>>>>>>>>> calls that truncate their constant argument that are being
>>>>>>>>>>>> folded to memcpy this early get diagnosed even if they are
>>>>>>>>>>>> followed by the nul assignment:
>>>>>>>>>>>>
>>>>>>>>>>>>   const char s[] = "12345";
>>>>>>>>>>>>   char d[3];
>>>>>>>>>>>>
>>>>>>>>>>>>   void f (void)
>>>>>>>>>>>>   {
>>>>>>>>>>>>     strncpy (d, s, sizeof d - 1);   // -Wstringop-truncation
>>>>>>>>>>>>     d[sizeof d - 1] = 0;
>>>>>>>>>>>>   }
>>>>>>>>>>>>
>>>>>>>>>>>> To avoid the warning I propose to defer folding strncpy to
>>>>>>>>>>>> memcpy until the pointer to the basic block the strncpy call
>>>>>>>>>>>> is in can be used to try to reach the next statement (this
>>>>>>>>>>>> happens as early as ccp1).  I'm aware of the preference to
>>>>>>>>>>>> fold things early but in the case of strncpy (a relatively
>>>>>>>>>>>> rarely used function that is often misused), getting
>>>>>>>>>>>> the warning right while folding a bit later but still fairly
>>>>>>>>>>>> early on seems like a reasonable compromise.  I fear that
>>>>>>>>>>>> otherwise, the false positives will drive users to adopt
>>>>>>>>>>>> other unsafe solutions (like memcpy) where these kinds of
>>>>>>>>>>>> bugs cannot be as readily detected.
>>>>>>>>>>>>
>>>>>>>>>>>> Tested on x86_64-linux.
>>>>>>>>>>>>
>>>>>>>>>>>> Martin
>>>>>>>>>>>>
>>>>>>>>>>>> PS There still are outstanding cases where the warning can
>>>>>>>>>>>> be avoided.  I xfailed them in the test for now but will
>>>>>>>>>>>> still try to get them to work for GCC 9.
>>>>>>>>>>>>
>>>>>>>>>>>> gcc-87028.diff
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> PR tree-optimization/87028 - false positive -Wstringop-truncation
>>>>>>>>>>>> strncpy with global variable source string
>>>>>>>>>>>> gcc/ChangeLog:
>>>>>>>>>>>>
>>>>>>>>>>>>       PR tree-optimization/87028
>>>>>>>>>>>>       * gimple-fold.c (gimple_fold_builtin_strncpy): Avoid
>>>>>>>>>>>> folding when
>>>>>>>>>>>>       statement doesn't belong to a basic block.
>>>>>>>>>>>>       * tree-ssa-strlen.c (maybe_diag_stxncpy_trunc): Handle
>>>>>>>>>>>> MEM_REF on
>>>>>>>>>>>>       the left hand side of assignment.
>>>>>>>>>>>>
>>>>>>>>>>>> gcc/testsuite/ChangeLog:
>>>>>>>>>>>>
>>>>>>>>>>>>       PR tree-optimization/87028
>>>>>>>>>>>>       * c-c++-common/Wstringop-truncation.c: Remove xfails.
>>>>>>>>>>>>       * gcc.dg/Wstringop-truncation-5.c: New test.
>>>>>>>>>>>>
>>>>>>>>>>>> diff --git a/gcc/gimple-fold.c b/gcc/gimple-fold.c
>>>>>>>>>>>> index 07341eb..284c2fb 100644
>>>>>>>>>>>> --- a/gcc/gimple-fold.c
>>>>>>>>>>>> +++ b/gcc/gimple-fold.c
>>>>>>>>>>>> @@ -1702,6 +1702,11 @@ gimple_fold_builtin_strncpy
>>>>>>>>>>>> (gimple_stmt_iterator *gsi,
>>>>>>>>>>>>    if (tree_int_cst_lt (ssize, len))
>>>>>>>>>>>>      return false;
>>>>>>>>>>>>
>>>>>>>>>>>> +  /* Defer warning (and folding) until the next statement in the
>>>>>>>>>>>> basic
>>>>>>>>>>>> +     block is reachable.  */
>>>>>>>>>>>> +  if (!gimple_bb (stmt))
>>>>>>>>>>>> +    return false;
>>>>>>>>>>> I think you want cfun->cfg as the test here.  They should be
>>>>>>>>>>> equivalent
>>>>>>>>>>> in practice.
>>>>>>>>>>
>>>>>>>>>> Please do not add 'cfun' references.  Note that the next stmt is
>>>>>>>>>> also accessible
>>>>>>>>>> when there is no CFG.  I guess the issue is that we fold this
>>>>>>>>>> during
>>>>>>>>>> gimplification where the next stmt is not yet "there" (but still in
>>>>>>>>>> GENERIC)?
>>>>>>>>> That was my assumption.  I almost suggested peeking at gsi_next and
>>>>>>>>> avoiding in that case.
>>>>>>>>
>>>>>>>> So I'd rather add guards to maybe_fold_stmt in the gimplifier then.
>>>>>>> So I think the concern with adding the guards to maybe_fold_stmt is
>>>>>>> the
>>>>>>> possibility of further fallout.
>>>>>>>
>>>>>>> I guess they could be written to target this case specifically to
>>>>>>> minimize fallout, but that feels like we're doing the same thing
>>>>>>> (band-aid) just in a different place.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> We generally do not want to have unfolded stmts in the IL when we
>>>>>>>>>> can avoid that
>>>>>>>>>> which is why we fold most stmts during gimplification.  We also do
>>>>>>>>>> that because
>>>>>>>>>> we now do less folding on GENERIC.
>>>>>>>>> But an unfolded call in the IL should always be safe and we've got
>>>>>>>>> plenty of opportunities to fold it later.
>>>>>>>>
>>>>>>>> Well - we do.  The very first one is forwprop though which means
>>>>>>>> we'll miss to
>>>>>>>> re-write some memcpy parts into SSA:
>>>>>>>>
>>>>>>>>           NEXT_PASS (pass_ccp, false /* nonzero_p */);
>>>>>>>>           /* After CCP we rewrite no longer addressed locals into SSA
>>>>>>>>              form if possible.  */
>>>>>>>>           NEXT_PASS (pass_forwprop);
>>>>>>>>
>>>>>>>> likewise early object-size will be confused by memcpy calls that just
>>>>>>>> exist
>>>>>>>> to avoid TBAA issues (another of our recommendations besides using
>>>>>>>> unions).
>>>>>>>>
>>>>>>>> We do fold mem* early for a reason ;)
>>>>>>>>
>>>>>>>> "We can always do warnings earlier" would be a similar true sentence.
>>>>>>> I'm not disagreeing at all.  There's a natural tension between the
>>>>>>> benefits of folding early to enable more optimizations downstream and
>>>>>>> leaving the IL in a state where we can give actionable warnings.
>>>>>>
>>>>>> Similar trade-offs between folding early and losing information
>>>>>> as a result also impact high-level optimizations.
>>>>>>
>>>>>> For instance, folding the strlen argument below
>>>>>>
>>>>>>   void f3 (struct A* p)
>>>>>>   {
>>>>>>     __builtin_strcpy (p->a, "123");
>>>>>>
>>>>>>     if (__builtin_strlen (p->a + 1) != 2)   // not folded
>>>>>>       __builtin_abort ();
>>>>>>   }
>>>>>>
>>>>>> into
>>>>>>
>>>>>>   _2 = &MEM[(void *)p_4(D) + 2B];
>>>>>>
>>>>>> early on defeats the strlen optimization because there is no
>>>>>> mechanism to determine what member (void *)p_4(D) + 2B refers
>>>>>> to (this is bug 86955).
>>>>>>
>>>>>> Another example is folding of strlen calls with non-constant
>>>>>> offsets into constant strings like here:
>>>>>>
>>>>>>   const char a[] = "123";
>>>>>>
>>>>>>   void f (int i)
>>>>>>   {
>>>>>>     if (__builtin_strlen (&a[i]) > 3)
>>>>>>       __builtin_abort ();
>>>>>>   }
>>>>>>
>>>>>> into sizeof a - 1 - i, which then prevents the result from
>>>>>> being folded to false  (bug 86434), not to mention the code
>>>>>> it emits for out-of-bounds indices.
>>>>>>
>>>>>> There are a number of other similar examples in Bugzilla
>>>>>> that I've filed as I discovered them while testing my
>>>>>> warnings (e.g., 86572).
>>>>>>
>>>>>> In my mind, transforming library calls into "lossy" low-level
>>>>>> primitives like MEM_REF would be better done only after higher
>>>>>> level optimizations have had a chance to analyze them.  Ditto
>>>>>> for other similar transformations (like to other library calls).
>>>>>> Having more accurate information helps both optimization and
>>>>>> warnings.  It also makes the warnings more meaningful.
>>>>>> Printing "memcpy overflows a buffer" when the source code
>>>>>> has a call to strncpy is less than ideal.
>>>>>>
>>>>>>> Similarly there's a natural tension between warning early vs warning
>>>>>>> late.  Code that triggers the warning may ultimately be proved
>>>>>>> unreachable, or we may discover simplifications that either
>>>>>>> suppress or
>>>>>>> expose a warning.
>>>>>>>
>>>>>>> There is no easy answer here.  But I think we can legitimately ask
>>>>>>> questions.  ie, does folding strnlen here really improve things
>>>>>>> downstream in ways that are measurable?  Does the false positive
>>>>>>> really
>>>>>>> impact the utility of the warning?  etc.
>>>>>>>
>>>>>>> I'd hazard a guess that Martin is particularly sensitive to false
>>>>>>> positives based on feedback he's received from our developer community
>>>>>>> as well as downstream consumers of his work.
>>>>>>
>>>>>> Yes.  The kernel folks in particular have done a lot of work
>>>>>> cleaning up their code in an effort to adopt the warning and
>>>>>> attribute nonstring.  They have been keeping me in the loop
>>>>>> on their progress (and feeding me back test cases with false
>>>>>> positives and negatives they run into).
>>>>> I can't recall seeing further guidance from Richi WRT putting the checks
>>>>> earlier (maybe_fold_stmt).
>>>>>
>>>>> If the point here is to avoid false positives by not folding strncpy,
>>>>> particularly in cases where we don't see the NUL in the copy, but it
>>>>> appears in a subsequent store, then let's be fairly selective (so as not
>>>>> to muck up things on the optimization side more than is necessary).
>>>>>
>>>>> ISTM we can do this by refactoring the warning bits so they're reusable
>>>>> at different points in the pipeline.  Those bits would always return a
>>>>> boolean indicating if the given statement might generate a warning or
>>>>> not.
>>>>>
>>>>> When called early, they would not actually issue any warning.  They
>>>>> would merely do the best analysis they can and return a status
>>>>> indicating whether or not the statement would generate a warning given
>>>>> current context.  The goal here is to leave statements that might
>>>>> generate a warning as-is in the IL.
>>>>>
>>>>> When called late (assuming there is a point where we can walk the IL and
>>>>> issue the appropriate warnings), the routine would actually issue the
>>>>> warning.
>>>>>
>>>>> The kind of structure could potentially work for other builtins where we
>>>>> may need to look at subsequent statements to avoid false positives, but
>>>>> early folding hides cases by transforming the call into an undesirable
>>>>> form.
>>>>>
>>>>> Note that for cases where a call looks problematical early because we
>>>>> can't see statement which stores the terminator, but where the
>>>>> terminator statement ultimately becomes visible, we still get folding,
>>>>> it just happens later in the pipeline.
>>>>>
>>>>> Thoughts?
>>>>
>>>> The warning only triggers when the bound is less than or equal
>>>> to the length of the constant source string (i.e, when strncpy
>>>> truncates).  So IIUC, your suggestion would defer folding only
>>>> such strncpy calls and let gimple_fold_builtin_strncpy fold
>>>> those with a constant bound that's greater than the length of
>>>> the constant source string.  That would be fine with me, but
>>>> since strncpy calls with a bound that's greater than the length
>>>> of the source are pointless I don't think they are important
>>>> enough to worry about folding super early.  The constant ones
>>>> that serve any purpose (and that are presumably important to
>>>> optimize) are those that truncate.
>>>>
>>>> That said, when optimization isn't enabled, I don't think users
>>>> expect calls to library functions to be transformed to calls to
>>>> other functions, or inlined.  Yet that's just what GCC does.
>>>> For example, besides triggering the warning, the following:
>>>>
>>>>   char a[4];
>>>>
>>>>   void f (char *s)
>>>>   {
>>>>     __builtin_strncpy (a, "1234", sizeof a);
>>>>     a[3] = 0;
>>>>   }
>>>>
>>>> is transformed, even at -O0, into:
>>>>
>>>>   f (char * s)
>>>>   {
>>>>     <bb 2> :
>>>>     MEM[(char * {ref-all})&a] = MEM[(char * {ref-all})"1234"];
>>>>     a[3] = 0;
>>>>     return;
>>>>   }
>>>>
>>>> That doesn't seem right.  GCC should avoid these transformations
>>>> at -O0, and one way to do that is to defer folding until the CFG
>>>> is constructed.  The patch does it for strncpy but a more general
>>>> solution would do that for all calls, e.g., in maybe_fold_stmt
>>>> as Richard suggested (and I subsequently tested).
>>>>
>>>> Martin
>>>
>>
Jeff Law Nov. 29, 2018, 11:07 p.m. UTC | #36
On 11/29/18 1:34 PM, Martin Sebor wrote:
> On 11/16/2018 02:07 AM, Richard Biener wrote:
>> On Fri, Nov 16, 2018 at 4:12 AM Martin Sebor <msebor@gmail.com> wrote:
>>>
>>> Ping: https://gcc.gnu.org/ml/gcc-patches/2018-08/msg01818.html
>>>
>>> Please let me know if there is something I need to change here
>>> to make the fix acceptable or if I should stop trying.
>>
>> I have one more comment about
>>
>> +  /* Defer warning (and folding) until the next statement in the basic
>> +     block is reachable.  */
>> +  if (!gimple_bb (stmt))
>> +    return false;
>> +
>>
>> it's not about the next statement in the basic-block being "reachable"
>> (even w/o a CFG you can use gsi_next()) but rather that the next
>> stmt isn't yet gimplified and thus not inserted into the gimple sequence,
>> right?
> 
> No, it's about the current statement not being associated with
> a basic block yet when the warning code runs for the first time
> (during gimplify_expr), and so gsi_next() returning null.
> 
>> You apply this to gimple_fold_builtin_strncpy but I'd rather
>> see us not sprinkling this over gimple-fold.c but instead do this
>> in gimplify.c:maybe_fold_stmt, delaying folding until say lowering.
>>
>> See the attached (untested).
> 
> I would also prefer this solution.  I had tested it (in response
> to you first mentioning it back in September) and it causes quite
> a bit of fallout in tests that look for the folding to take place
> very early.  See the end of my reply here:
> 
>   https://gcc.gnu.org/ml/gcc-patches/2018-09/msg01248.html
> 
> But I'm willing to do the test suite cleanup if you think it's
> suitable for GCC 9.  (If you're thinking GCC 10 please let me
> know now.)
The fallout on existing tests is minimal.  What's more concerning is
that it doesn't actually pass the new test from Martin's original
submission.  We get bogus warnings.

At least part of the problem is weakness in maybe_diag_stxncpy_trunc.
It can't handle something like this:

test_literal (char * d, struct S * s)
{
  strncpy (d, "1234567890", 3);
  _1 = d + 3;
  *_1 = 0;
}


Note the pointer arithmetic between the strncpy and storing the NUL
terminator.
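
For reference, a source-level function along these lines would be expected to gimplify to roughly the shape above.  This is a minimal sketch; the name copy_prefix is made up here and the function is not taken from the test suite:

```c
#include <assert.h>
#include <string.h>

/* Copy the first three characters of a longer literal into D and
   terminate the result explicitly.  D must have room for at least
   four chars.  */
void
copy_prefix (char *d)
{
  strncpy (d, "1234567890", 3);   /* truncates: no NUL is copied */
  d[3] = 0;                       /* the store through d + 3 adds it */
}
```

The code is safe as written, so a -Wstringop-truncation warning on the strncpy call would be a false positive.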

jeff
Martin Sebor Nov. 29, 2018, 11:43 p.m. UTC | #37
On 11/29/18 4:07 PM, Jeff Law wrote:
> On 11/29/18 1:34 PM, Martin Sebor wrote:
>> On 11/16/2018 02:07 AM, Richard Biener wrote:
>>> On Fri, Nov 16, 2018 at 4:12 AM Martin Sebor <msebor@gmail.com> wrote:
>>>>
>>>> Ping: https://gcc.gnu.org/ml/gcc-patches/2018-08/msg01818.html
>>>>
>>>> Please let me know if there is something I need to change here
>>>> to make the fix acceptable or if I should stop trying.
>>>
>>> I have one more comment about
>>>
>>> +  /* Defer warning (and folding) until the next statement in the basic
>>> +     block is reachable.  */
>>> +  if (!gimple_bb (stmt))
>>> +    return false;
>>> +
>>>
>>> it's not about the next statement in the basic-block being "reachable"
>>> (even w/o a CFG you can use gsi_next()) but rather that the next
>>> stmt isn't yet gimplified and thus not inserted into the gimple sequence,
>>> right?
>>
>> No, it's about the current statement not being associated with
>> a basic block yet when the warning code runs for the first time
>> (during gimplify_expr), and so gsi_next() returning null.
>>
>>> You apply this to gimple_fold_builtin_strncpy but I'd rather
>>> see us not sprinkling this over gimple-fold.c but instead do this
>>> in gimplify.c:maybe_fold_stmt, delaying folding until say lowering.
>>>
>>> See the attached (untested).
>>
>> I would also prefer this solution.  I had tested it (in response
>> to you first mentioning it back in September) and it causes quite
>> a bit of fallout in tests that look for the folding to take place
>> very early.  See the end of my reply here:
>>
>>    https://gcc.gnu.org/ml/gcc-patches/2018-09/msg01248.html
>>
>> But I'm willing to do the test suite cleanup if you think it's
>> suitable for GCC 9.  (If you're thinking GCC 10 please let me
>> know now.)
> The fallout on existing tests is minimal.  What's more concerning is
> that it doesn't actually pass the new test from Martin's original
> submission.  We get bogus warnings.
> 
> At least part of the problem is weakness in maybe_diag_stxncpy_trunc.
> It can't handle something like this:
> 
> test_literal (char * d, struct S * s)
> {
>    strncpy (d, "1234567890", 3);
>    _1 = d + 3;
>    *_1 = 0;
> }
> 
> 
> Note the pointer arithmetic between the strncpy and storing the NUL
> terminator.

Right.  I'm less concerned about this case because it involves
a literal that's obviously longer than the destination but it
would be nice if the suppression worked here as well in case
the literal comes from macro expansion.  It will require
another tweak.

But the test from my patch passes with the changes to calls.c
from my patch, so that's an improvement.
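
For context on the calls.c part: get_attr_nonstring_decl is the helper that looks for attribute nonstring on the destination.  A minimal sketch of how that attribute is used in source code follows, assuming GCC 8 or later; struct record and the helper functions are made up for this example:

```c
#include <assert.h>
#include <string.h>

struct record
{
  /* Fixed-width field that is not necessarily NUL-terminated.  */
  char tag[4] __attribute__ ((nonstring));
};

/* Fill the whole field; strncpy pads with NULs if S is shorter and
   leaves the field unterminated if S is four or more chars long.  */
void
set_tag (struct record *r, const char *s)
{
  strncpy (r->tag, s, sizeof r->tag);
}

/* Compare the field against S without treating it as a string.  */
int
tag_equals (const struct record *r, const char *s)
{
  size_t n = strlen (s);
  if (n > sizeof r->tag)
    n = sizeof r->tag;

  char padded[sizeof r->tag] = { 0 };
  memcpy (padded, s, n);
  return memcmp (r->tag, padded, sizeof padded) == 0;
}
```

With the attribute, filling tag to its full size is not expected to be diagnosed, because the field is declared as not holding a string.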

I have done the test suite cleanup in the attached patch.  It
was indeed minimal -- not sure why I saw so many failures with
my initial approach.

I can submit a patch to handle the literal case above as
a followup unless you would prefer it done at the same time.

Martin
PR tree-optimization/87028 - false positive -Wstringop-truncation strncpy with global variable source string

gcc/ChangeLog:

	PR tree-optimization/87028
	* calls.c (get_attr_nonstring_decl): Avoid setting *REF to
	SSA_NAME_VAR.
	* gimple-low.c (lower_stmt): Delay folding built-ins.
	* gimplify.c (maybe_fold_stmt): Avoid folding statements that
	don't belong to a basic block.
	* tree.h (SSA_NAME_VAR): Update comment.
	* tree-ssa-strlen.c (maybe_diag_stxncpy_trunc): Simplify.

gcc/testsuite/ChangeLog:

	PR tree-optimization/87028
	* c-c++-common/Wstringop-truncation.c: Remove xfails.
	* gcc.dg/Wstringop-truncation-5.c: New test.
	* gcc.dg/strcmpopt_1.c: Adjust.
	* gcc.dg/tree-ssa/pr79697.c: Same.

Index: gcc/calls.c
===================================================================
--- gcc/calls.c	(revision 266637)
+++ gcc/calls.c	(working copy)
@@ -1503,6 +1503,7 @@ tree
 get_attr_nonstring_decl (tree expr, tree *ref)
 {
   tree decl = expr;
+  tree var = NULL_TREE;
   if (TREE_CODE (decl) == SSA_NAME)
     {
       gimple *def = SSA_NAME_DEF_STMT (decl);
@@ -1515,17 +1516,25 @@ get_attr_nonstring_decl (tree expr, tree *ref)
 	      || code == VAR_DECL)
 	    decl = gimple_assign_rhs1 (def);
 	}
-      else if (tree var = SSA_NAME_VAR (decl))
-	decl = var;
+      else
+	var = SSA_NAME_VAR (decl);
     }
 
   if (TREE_CODE (decl) == ADDR_EXPR)
     decl = TREE_OPERAND (decl, 0);
 
+  /* To simplify calling code, store the referenced DECL regardless of
+     the attribute determined below, but avoid storing the SSA_NAME_VAR
+     obtained above (it's not useful for dataflow purposes).  */
   if (ref)
     *ref = decl;
 
-  if (TREE_CODE (decl) == ARRAY_REF)
+  /* Use the SSA_NAME_VAR that was determined above to see if it's
+     declared nonstring.  Otherwise drill down into the referenced
+     DECL.  */
+  if (var)
+    decl = var;
+  else if (TREE_CODE (decl) == ARRAY_REF)
     decl = TREE_OPERAND (decl, 0);
   else if (TREE_CODE (decl) == COMPONENT_REF)
     decl = TREE_OPERAND (decl, 1);
Index: gcc/gimple-low.c
===================================================================
--- gcc/gimple-low.c	(revision 266637)
+++ gcc/gimple-low.c	(working copy)
@@ -32,6 +32,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "gimple-low.h"
 #include "predict.h"
 #include "gimple-predict.h"
+#include "gimple-fold.h"
 
 /* The differences between High GIMPLE and Low GIMPLE are the
    following:
@@ -378,6 +379,12 @@ lower_stmt (gimple_stmt_iterator *gsi, struct lowe
 	    gsi_next (gsi);
 	    return;
 	  }
+
+	/* We delay folding of built-in calls from gimplification to
+	   here so the IL is in a consistent state for the diagnostic
+	   machinery to do its job.  */
+	if (gimple_call_builtin_p (stmt))
+	  fold_stmt (gsi);
       }
       break;
 
Index: gcc/gimplify.c
===================================================================
--- gcc/gimplify.c	(revision 266637)
+++ gcc/gimplify.c	(working copy)
@@ -3192,6 +3192,10 @@ maybe_fold_stmt (gimple_stmt_iterator *gsi)
       return false;
     else if ((ctx->region_type & ORT_HOST_TEAMS) == ORT_HOST_TEAMS)
       return false;
+  /* Delay folding of builtins until the IL is in a consistent state
+     so the diagnostic machinery can do a better job.  */
+  if (gimple_call_builtin_p (gsi_stmt (*gsi)))
+    return false;
   return fold_stmt (gsi);
 }
 
Index: gcc/testsuite/c-c++-common/Wstringop-truncation.c
===================================================================
--- gcc/testsuite/c-c++-common/Wstringop-truncation.c	(revision 266637)
+++ gcc/testsuite/c-c++-common/Wstringop-truncation.c	(working copy)
@@ -329,9 +329,8 @@ void test_strncpy_array (Dest *pd, int i, const ch
      of the array to NUL is not diagnosed.  */
   {
     /* This might be better written using memcpy() but it's safe so
-       it probably shouldn't be diagnosed.  It currently triggers
-       a warning because of bug 81704.  */
-    strncpy (dst7, "0123456", sizeof dst7);   /* { dg-bogus "\\\[-Wstringop-truncation]" "bug 81704" { xfail *-*-* } } */
+       it shouldn't be diagnosed.  */
+    strncpy (dst7, "0123456", sizeof dst7);   /* { dg-bogus "\\\[-Wstringop-truncation]" } */
     dst7[sizeof dst7 - 1] = '\0';
     sink (dst7);
   }
@@ -350,7 +349,7 @@ void test_strncpy_array (Dest *pd, int i, const ch
   }
 
   {
-    strncpy (pd->a5, "01234", sizeof pd->a5);   /* { dg-bogus "\\\[-Wstringop-truncation]" "bug 81704" { xfail *-*-* } } */
+    strncpy (pd->a5, "01234", sizeof pd->a5);   /* { dg-bogus "\\\[-Wstringop-truncation]" } */
     pd->a5[sizeof pd->a5 - 1] = '\0';
     sink (pd);
   }
Index: gcc/testsuite/gcc.dg/Wstringop-truncation-5.c
===================================================================
--- gcc/testsuite/gcc.dg/Wstringop-truncation-5.c	(nonexistent)
+++ gcc/testsuite/gcc.dg/Wstringop-truncation-5.c	(working copy)
@@ -0,0 +1,64 @@
+/* PR tree-optimization/87028 - false positive -Wstringop-truncation
+   strncpy with global variable source string
+   { dg-do compile }
+   { dg-options "-O2 -Wstringop-truncation" } */
+
+char *strncpy (char *, const char *, __SIZE_TYPE__);
+
+#define STR   "1234567890"
+
+struct S
+{
+  char a[5], b[5];
+};
+
+const char arr[] = STR;
+const char* const ptr = STR;
+
+const char arr2[][10] = { "123", STR };
+
+void test_literal (struct S *s)
+{
+  strncpy (s->a, STR, sizeof s->a - 1);     /* { dg-bogus "\\\[-Wstringop-truncation]" } */
+  s->a[sizeof s->a - 1] = '\0';
+}
+
+void test_global_arr (struct S *s)
+{
+  strncpy (s->a, arr, sizeof s->a - 1);     /* { dg-bogus "\\\[-Wstringop-truncation]" } */
+  s->a [sizeof s->a - 1] = '\0';
+}
+
+void test_global_arr2 (struct S *s)
+{
+  strncpy (s->a, arr2[1], sizeof s->a - 1); /* { dg-bogus "\\\[-Wstringop-truncation]" } */
+  s->a [sizeof s->a - 1] = '\0';
+
+  strncpy (s->b, arr2[0], sizeof s->a - 1);
+}
+
+void test_global_ptr (struct S *s)
+{
+  strncpy (s->a, ptr, sizeof s->a - 1);     /* { dg-bogus "\\\[-Wstringop-truncation]" } */
+  s->a [sizeof s->a - 1] = '\0';
+}
+
+void test_local_arr (struct S *s)
+{
+  const char arr[] = STR;
+  strncpy (s->a, arr, sizeof s->a - 1);
+  s->a [sizeof s->a - 1] = '\0';
+}
+
+void test_local_ptr (struct S *s)
+{
+  const char* const ptr = STR;
+  strncpy (s->a, ptr, sizeof s->a - 1);     /* { dg-bogus "\\\[-Wstringop-truncation]" } */
+  s->a [sizeof s->a - 1] = '\0';
+}
+
+void test_compound_literal (struct S *s)
+{
+  strncpy (s->a, (char[]){ STR }, sizeof s->a - 1);
+  s->a [sizeof s->a - 1] = '\0';
+}
Index: gcc/testsuite/gcc.dg/fold-bcopy.c
===================================================================
--- gcc/testsuite/gcc.dg/fold-bcopy.c	(revision 266637)
+++ gcc/testsuite/gcc.dg/fold-bcopy.c	(working copy)
@@ -1,6 +1,6 @@
 /* PR tree-optimization/80933 - redundant bzero/bcopy calls not eliminated
    { dg-do compile }
-   { dg-options "-O0 -Wall -fdump-tree-gimple" } */
+   { dg-options "-O1 -Wall -fdump-tree-lower" } */
 
 void f0 (void *dst, const void *src, unsigned n)
 {
@@ -46,9 +46,9 @@ void f6 (void *p)
 /* Verify that calls to bcmp, bcopy, and bzero have all been removed
    and one of each replaced with memcmp, memmove, and memset, respectively.
    The remaining three should be eliminated.
-  { dg-final { scan-tree-dump-not "bcmp|bcopy|bzero" "gimple" } }
-  { dg-final { scan-tree-dump-times "memcmp|memmove|memset" 3 "gimple" } }
+  { dg-final { scan-tree-dump-not "bcmp|bcopy|bzero" "lower" } }
+  { dg-final { scan-tree-dump-times "memcmp|memmove|memset" 3 "lower" } }
 
   Verify that the bcopy to memmove transformation correctly transposed
   the source and destination pointer arguments.
-  { dg-final { scan-tree-dump-times "memmove \\(dst, src" 1 "gimple" } }  */
+  { dg-final { scan-tree-dump-times "memmove \\(dst, src" 1 "lower" } }  */
Index: gcc/testsuite/gcc.dg/strcmpopt_1.c
===================================================================
--- gcc/testsuite/gcc.dg/strcmpopt_1.c	(revision 266637)
+++ gcc/testsuite/gcc.dg/strcmpopt_1.c	(working copy)
@@ -1,5 +1,5 @@
 /* { dg-do run } */
-/* { dg-options "-fdump-tree-gimple" } */
+/* { dg-options "-fdump-tree-lower" } */
 
 #include <string.h>
 #include <stdlib.h>
@@ -25,4 +25,4 @@ int main ()
   return 0;
 }
 
-/* { dg-final { scan-tree-dump-times "strcmp \\(" 2 "gimple" } } */
+/* { dg-final { scan-tree-dump-times "strcmp \\(" 2 "lower" } } */
Index: gcc/testsuite/gcc.dg/tree-ssa/pr79697.c
===================================================================
--- gcc/testsuite/gcc.dg/tree-ssa/pr79697.c	(revision 266637)
+++ gcc/testsuite/gcc.dg/tree-ssa/pr79697.c	(working copy)
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -fdump-tree-gimple -fdump-tree-cddce-details -fdump-tree-optimized" } */
+/* { dg-options "-O2 -fdump-tree-lower -fdump-tree-cddce-details -fdump-tree-optimized" } */
 
 void f(void)
 {
@@ -18,4 +18,4 @@ void h(void)
 
 /* { dg-final { scan-tree-dump "Deleting : __builtin_strdup" "cddce1" } } */
 /* { dg-final { scan-tree-dump "Deleting : __builtin_strndup" "cddce1" } } */
-/* { dg-final { scan-tree-dump "__builtin_malloc" "gimple" } } */
+/* { dg-final { scan-tree-dump "__builtin_malloc" "lower" } } */
Jeff Law Nov. 30, 2018, 2:01 a.m. UTC | #38
On 11/29/18 4:43 PM, Martin Sebor wrote:
>> The fallout on existing tests is minimal.  What's more concerning is
>> that it doesn't actually pass the new test from Martin's original
>> submission.  We get bogus warnings.
>>
>> At least part of the problem is weakness in maybe_diag_stxncpy_trunc.
>> It can't handle something like this:
>>
>> test_literal (char * d, struct S * s)
>> {
>>    strncpy (d, "1234567890", 3);
>>    _1 = d + 3;
>>    *_1 = 0;
>> }
>>
>>
>> Note the pointer arithmetic between the strncpy and storing the NUL
>> terminator.
> 
> Right.  I'm less concerned about this case because it involves
> a literal that's obviously longer than the destination but it
> would be nice if the suppression worked here as well in case
> the literal comes from macro expansion.  It will require
> another tweak.
OK.  If this isn't at the core of the regression BZ, then xfailing those
particular cases and coming back to them is fine.

> 
> But the test from my patch passes with the changes to calls.c
> from my patch, so that's an improvement.
> 
> I have done the test suite cleanup in the attached patch.  It
> was indeed minimal -- not sure why I saw so many failures with
> my initial approach.
Richi's patch does the folding as part of gimple lowering.  So it's still
pretty early -- basically it ends up hitting just a few tests that are
looking for folded stuff in the .gimple dump.
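
The effect on those tests can be pictured with a toy two-phase model (again not GCC code; the toy_* names are hypothetical and only mirror the shape of the maybe_fold_stmt and lower_stmt hunks).  It assumes a fold that rewrites bzero to memset, as in the fold-bcopy.c test:

```c
#include <assert.h>
#include <string.h>

enum toy_phase { TOY_GIMPLIFY, TOY_LOWER };

struct toy_call
{
  const char *fn;    /* callee name as it would appear in a dump */
  int is_builtin;
};

/* Toy fold: rewrite bzero to memset.  Return nonzero if C changed.  */
int
toy_fold (struct toy_call *c)
{
  if (strcmp (c->fn, "bzero") == 0)
    {
      c->fn = "memset";
      return 1;
    }
  return 0;
}

/* Skip built-in calls in the first phase and fold them in the
   second one.  */
int
toy_maybe_fold (struct toy_call *c, enum toy_phase phase)
{
  if (phase == TOY_GIMPLIFY && c->is_builtin)
    return 0;
  return toy_fold (c);
}
```

A dump taken after the first phase still shows bzero; only the dump after the second phase shows memset, which is why the scan patterns move from "gimple" to "lower".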

I had actually targeted this patch as one to work through and try to get
resolved today.  Kind of funny that we were poking at it at the same time.


> 
> I can submit a patch to handle the literal case above as
> a followup unless you would prefer it done at the same time.
Follow-up is fine by me.

jeff
Richard Biener Nov. 30, 2018, 7:57 a.m. UTC | #39
On Thu, Nov 29, 2018 at 9:34 PM Martin Sebor <msebor@gmail.com> wrote:
>
> On 11/16/2018 02:07 AM, Richard Biener wrote:
> > On Fri, Nov 16, 2018 at 4:12 AM Martin Sebor <msebor@gmail.com> wrote:
> >>
> >> Ping: https://gcc.gnu.org/ml/gcc-patches/2018-08/msg01818.html
> >>
> >> Please let me know if there is something I need to change here
> >> to make the fix acceptable or if I should stop trying.
> >
> > I have one more comment about
> >
> > +  /* Defer warning (and folding) until the next statement in the basic
> > +     block is reachable.  */
> > +  if (!gimple_bb (stmt))
> > +    return false;
> > +
> >
> > it's not about the next statement in the basic-block being "reachable"
> > (even w/o a CFG you can use gsi_next()) but rather that the next
> > stmt isn't yet gimplified and thus not inserted into the gimple sequence,
> > right?
>
> No, it's about the current statement not being associated with
> a basic block yet when the warning code runs for the first time
> (during gimplify_expr), and so gsi_next() returning null.
>
> > You apply this to gimple_fold_builtin_strncpy but I'd rather
> > see us not sprinkling this over gimple-fold.c but instead do this
> > in gimplify.c:maybe_fold_stmt, delaying folding until say lowering.
> >
> > See the attached (untested).
>
> I would also prefer this solution.  I had tested it (in response
> to you first mentioning it back in September) and it causes quite
> a bit of fallout in tests that look for the folding to take place
> very early.  See the end of my reply here:
>
>    https://gcc.gnu.org/ml/gcc-patches/2018-09/msg01248.html
>
> But I'm willing to do the test suite cleanup if you think it's
> suitable for GCC 9.  (If you're thinking GCC 10 please let me
> know now.)

I very much prefer that to the hacks in gimple-fold.c.  If it doesn't
help now then I'd rather live with some bogus warnings for GCC 9
and fix it up properly for GCC 10.

I expect the fallout to be quite minimal (also considering my
suggestion to do the folding in gimple-low.c).

Richard.

> Thanks
> Martin
>
> >
> > Richard.
> >
> >
> >
> >> On 10/31/2018 10:33 AM, Martin Sebor wrote:
> >>> Ping: https://gcc.gnu.org/ml/gcc-patches/2018-08/msg01818.html
> >>>
> >>> On 10/20/2018 06:01 PM, Martin Sebor wrote:
> >>>> On 10/16/2018 03:21 PM, Jeff Law wrote:
> >>>>> On 10/4/18 9:51 AM, Martin Sebor wrote:
> >>>>>> On 10/04/2018 08:58 AM, Jeff Law wrote:
> >>>>>>> On 8/27/18 9:42 AM, Richard Biener wrote:
> >>>>>>>> On Mon, Aug 27, 2018 at 5:32 PM Jeff Law <law@redhat.com> wrote:
> >>>>>>>>>
> >>>>>>>>> On 08/27/2018 02:29 AM, Richard Biener wrote:
> >>>>>>>>>> On Sun, Aug 26, 2018 at 7:26 AM Jeff Law <law@redhat.com> wrote:
> >>>>>>>>>>>
> >>>>>>>>>>> On 08/24/2018 09:58 AM, Martin Sebor wrote:
> >>>>>>>>>>>> The warning suppression for -Wstringop-truncation looks for
> >>>>>>>>>>>> the next statement after a truncating strncpy to see if it
> >>>>>>>>>>>> adds a terminating nul.  This only works when the next
> >>>>>>>>>>>> statement can be reached using the Gimple statement iterator
> >>>>>>>>>>>> which isn't until after gimplification.  As a result, strncpy
> >>>>>>>>>>>> calls that truncate their constant argument that are being
> >>>>>>>>>>>> folded to memcpy this early get diagnosed even if they are
> >>>>>>>>>>>> followed by the nul assignment:
> >>>>>>>>>>>>
> >>>>>>>>>>>>   const char s[] = "12345";
> >>>>>>>>>>>>   char d[3];
> >>>>>>>>>>>>
> >>>>>>>>>>>>   void f (void)
> >>>>>>>>>>>>   {
> >>>>>>>>>>>>     strncpy (d, s, sizeof d - 1);   // -Wstringop-truncation
> >>>>>>>>>>>>     d[sizeof d - 1] = 0;
> >>>>>>>>>>>>   }
> >>>>>>>>>>>>
> >>>>>>>>>>>> To avoid the warning I propose to defer folding strncpy to
> >>>>>>>>>>>> memcpy until the pointer to the basic block the strncpy call
> >>>>>>>>>>>> is in can be used to try to reach the next statement (this
> >>>>>>>>>>>> happens as early as ccp1).  I'm aware of the preference to
> >>>>>>>>>>>> fold things early but in the case of strncpy (a relatively
> >>>>>>>>>>>> rarely used function that is often misused), getting
> >>>>>>>>>>>> the warning right while folding a bit later but still fairly
> >>>>>>>>>>>> early on seems like a reasonable compromise.  I fear that
> >>>>>>>>>>>> otherwise, the false positives will drive users to adopt
> >>>>>>>>>>>> other unsafe solutions (like memcpy) where these kinds of
> >>>>>>>>>>>> bugs cannot be as readily detected.
> >>>>>>>>>>>>
> >>>>>>>>>>>> Tested on x86_64-linux.
> >>>>>>>>>>>>
> >>>>>>>>>>>> Martin
> >>>>>>>>>>>>
> >>>>>>>>>>>> PS There still are outstanding cases where the warning can
> >>>>>>>>>>>> be avoided.  I xfailed them in the test for now but will
> >>>>>>>>>>>> still try to get them to work for GCC 9.
> >>>>>>>>>>>>
> >>>>>>>>>>>> gcc-87028.diff
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>> PR tree-optimization/87028 - false positive -Wstringop-truncation
> >>>>>>>>>>>> strncpy with global variable source string
> >>>>>>>>>>>> gcc/ChangeLog:
> >>>>>>>>>>>>
> >>>>>>>>>>>>       PR tree-optimization/87028
> >>>>>>>>>>>>       * gimple-fold.c (gimple_fold_builtin_strncpy): Avoid
> >>>>>>>>>>>> folding when
> >>>>>>>>>>>>       statement doesn't belong to a basic block.
> >>>>>>>>>>>>       * tree-ssa-strlen.c (maybe_diag_stxncpy_trunc): Handle
> >>>>>>>>>>>> MEM_REF on
> >>>>>>>>>>>>       the left hand side of assignment.
> >>>>>>>>>>>>
> >>>>>>>>>>>> gcc/testsuite/ChangeLog:
> >>>>>>>>>>>>
> >>>>>>>>>>>>       PR tree-optimization/87028
> >>>>>>>>>>>>       * c-c++-common/Wstringop-truncation.c: Remove xfails.
> >>>>>>>>>>>>       * gcc.dg/Wstringop-truncation-5.c: New test.
> >>>>>>>>>>>>
> >>>>>>>>>>>> diff --git a/gcc/gimple-fold.c b/gcc/gimple-fold.c
> >>>>>>>>>>>> index 07341eb..284c2fb 100644
> >>>>>>>>>>>> --- a/gcc/gimple-fold.c
> >>>>>>>>>>>> +++ b/gcc/gimple-fold.c
> >>>>>>>>>>>> @@ -1702,6 +1702,11 @@ gimple_fold_builtin_strncpy
> >>>>>>>>>>>> (gimple_stmt_iterator *gsi,
> >>>>>>>>>>>>    if (tree_int_cst_lt (ssize, len))
> >>>>>>>>>>>>      return false;
> >>>>>>>>>>>>
> >>>>>>>>>>>> +  /* Defer warning (and folding) until the next statement in the
> >>>>>>>>>>>> basic
> >>>>>>>>>>>> +     block is reachable.  */
> >>>>>>>>>>>> +  if (!gimple_bb (stmt))
> >>>>>>>>>>>> +    return false;
> >>>>>>>>>>> I think you want cfun->cfg as the test here.  They should be
> >>>>>>>>>>> equivalent
> >>>>>>>>>>> in practice.
> >>>>>>>>>>
> >>>>>>>>>> Please do not add 'cfun' references.  Note that the next stmt is
> >>>>>>>>>> also accessible
> >>>>>>>>>> when there is no CFG.  I guess the issue is that we fold this
> >>>>>>>>>> during
> >>>>>>>>>> gimplification where the next stmt is not yet "there" (but still in
> >>>>>>>>>> GENERIC)?
> >>>>>>>>> That was my assumption.  I almost suggested peeking at gsi_next and
> >>>>>>>>> avoiding in that case.
> >>>>>>>>
> >>>>>>>> So I'd rather add guards to maybe_fold_stmt in the gimplifier then.
> >>>>>>> So I think the concern with adding the guards to maybe_fold_stmt is
> >>>>>>> the
> >>>>>>> possibility of further fallout.
> >>>>>>>
> >>>>>>> I guess they could be written to target this case specifically to
> >>>>>>> minimize fallout, but that feels like we're doing the same thing
> >>>>>>> (band-aid) just in a different place.
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> We generally do not want to have unfolded stmts in the IL when we
> >>>>>>>>>> can avoid that
> >>>>>>>>>> which is why we fold most stmts during gimplification.  We also do
> >>>>>>>>>> that because
> >>>>>>>>>> we now do less folding on GENERIC.
> >>>>>>>>> But an unfolded call in the IL should always be safe and we've got
> >>>>>>>>> plenty of opportunities to fold it later.
> >>>>>>>>
> >>>>>>>> Well - we do.  The very first one is forwprop though which means
> >>>>>>>> we'll fail to
> >>>>>>>> re-write some memcpy parts into SSA:
> >>>>>>>>
> >>>>>>>>           NEXT_PASS (pass_ccp, false /* nonzero_p */);
> >>>>>>>>           /* After CCP we rewrite no longer addressed locals into SSA
> >>>>>>>>              form if possible.  */
> >>>>>>>>           NEXT_PASS (pass_forwprop);
> >>>>>>>>
> >>>>>>>> likewise early object-size will be confused by memcpy calls that just
> >>>>>>>> exist
> >>>>>>>> to avoid TBAA issues (another of our recommendations besides using
> >>>>>>>> unions).
> >>>>>>>>
> >>>>>>>> We do fold mem* early for a reason ;)
> >>>>>>>>
> >>>>>>>> "We can always do warnings earlier" would be a similar true sentence.
> >>>>>>> I'm not disagreeing at all.  There's a natural tension between the
> >>>>>>> benefits of folding early to enable more optimizations downstream and
> >>>>>>> leaving the IL in a state where we can give actionable warnings.
> >>>>>>
> >>>>>> Similar trade-offs between folding early and losing information
> >>>>>> as a result also impact high-level optimizations.
> >>>>>>
> >>>>>> For instance, folding the strlen argument below
> >>>>>>
> >>>>>>   void f3 (struct A* p)
> >>>>>>   {
> >>>>>>     __builtin_strcpy (p->a, "123");
> >>>>>>
> >>>>>>     if (__builtin_strlen (p->a + 1) != 2)   // not folded
> >>>>>>       __builtin_abort ();
> >>>>>>   }
> >>>>>>
> >>>>>> into
> >>>>>>
> >>>>>>   _2 = &MEM[(void *)p_4(D) + 2B];
> >>>>>>
> >>>>>> early on defeats the strlen optimization because there is no
> >>>>>> mechanism to determine what member (void *)p_4(D) + 2B refers
> >>>>>> to (this is bug 86955).
> >>>>>>
> >>>>>> Another example is folding of strlen calls with non-constant
> >>>>>> offsets into constant strings like here:
> >>>>>>
> >>>>>>   const char a[] = "123";
> >>>>>>
> >>>>>>   void f (int i)
> >>>>>>   {
> >>>>>>     if (__builtin_strlen (&a[i]) > 3)
> >>>>>>       __builtin_abort ();
> >>>>>>   }
> >>>>>>
> >>>>>> into sizeof a - 1 - i, which then prevents the result from
> >>>>>> being folded to false  (bug 86434), not to mention the code
> >>>>>> it emits for out-of-bounds indices.
> >>>>>>
> >>>>>> There are a number of other similar examples in Bugzilla
> >>>>>> that I've filed as I discovered them while testing my
> >>>>>> warnings (e.g., 86572).
> >>>>>>
> >>>>>> In my mind, transforming library calls into "lossy" low-level
> >>>>>> primitives like MEM_REF would be better done only after higher
> >>>>>> level optimizations have had a chance to analyze them.  Ditto
> >>>>>> for other similar transformations (like to other library calls).
> >>>>>> Having more accurate information helps both optimization and
> >>>>>> warnings.  It also makes the warnings more meaningful.
> >>>>>> Printing "memcpy overflows a buffer" when the source code
> >>>>>> has a call to strncpy is less than ideal.
> >>>>>>
> >>>>>>> Similarly there's a natural tension between warning early vs warning
> >>>>>>> late.  Code that triggers the warning may ultimately be proved
> >>>>>>> unreachable, or we may discover simplifications that either
> >>>>>>> suppress or
> >>>>>>> expose a warning.
> >>>>>>>
> >>>>>>> There is no easy answer here.  But I think we can legitimately ask
> >>>>>>> questions.  i.e., does folding strncpy here really improve things
> >>>>>>> downstream in ways that are measurable?  Does the false positive
> >>>>>>> really
> >>>>>>> impact the utility of the warning?  etc.
> >>>>>>>
> >>>>>>> I'd hazard a guess that Martin is particularly sensitive to false
> >>>>>>> positives based on feedback he's received from our developer community
> >>>>>>> as well as downstream consumers of his work.
> >>>>>>
> >>>>>> Yes.  The kernel folks in particular have done a lot of work
> >>>>>> cleaning up their code in an effort to adopt the warning and
> >>>>>> attribute nonstring.  They have been keeping me in the loop
> >>>>>> on their progress (and feeding me back test cases with false
> >>>>>> positives and negatives they run into).
> >>>>> I can't recall seeing further guidance from Richi WRT putting the checks
> >>>>> earlier (maybe_fold_stmt).
> >>>>>
> >>>>> If the point here is to avoid false positives by not folding strncpy,
> >>>>> particularly in cases where we don't see the NUL in the copy, but it
> >>>>> appears in a subsequent store, then let's be fairly selective (so as not
> >>>>> to muck up things on the optimization side more than is necessary).
> >>>>>
> >>>>> ISTM we can do this by refactoring the warning bits so they're reusable
> >>>>> at different points in the pipeline.  Those bits would always return a
> >>>>> boolean indicating if the given statement might generate a warning or
> >>>>> not.
> >>>>>
> >>>>> When called early, they would not actually issue any warning.  They
> >>>>> would merely do the best analysis they can and return a status
> >>>>> indicating whether or not the statement would generate a warning given
> >>>>> current context.  The goal here is to leave statements that might
> >>>>> generate a warning as-is in the IL.
> >>>>>
> >>>>> When called late (assuming there is a point where we can walk the IL and
> >>>>> issue the appropriate warnings), the routine would actually issue the
> >>>>> warning.
> >>>>>
> >>>>> This kind of structure could potentially work for other builtins where we
> >>>>> may need to look at subsequent statements to avoid false positives, but
> >>>>> early folding hides cases by transforming the call into an undesirable
> >>>>> form.
> >>>>>
> >>>>> Note that for cases where a call looks problematical early because we
> >>>>> can't see the statement which stores the terminator, but where the
> >>>>> terminator statement ultimately becomes visible, we still get folding,
> >>>>> it just happens later in the pipeline.
> >>>>>
> >>>>> Thoughts?
> >>>>
> >>>> The warning only triggers when the bound is less than or equal
> >>>> to the length of the constant source string (i.e., when strncpy
> >>>> truncates).  So IIUC, your suggestion would defer folding only
> >>>> such strncpy calls and let gimple_fold_builtin_strncpy fold
> >>>> those with a constant bound that's greater than the length of
> >>>> the constant source string.  That would be fine with me, but
> >>>> since strncpy calls with a bound that's greater than the length
> >>>> of the source are pointless I don't think they are important
> >>>> enough to worry about folding super early.  The constant ones
> >>>> that serve any purpose (and that are presumably important to
> >>>> optimize) are those that truncate.
> >>>>
> >>>> That said, when optimization isn't enabled, I don't think users
> >>>> expect calls to library functions to be transformed to calls to
> >>>> other functions, or inlined.  Yet that's just what GCC does.
> >>>> For example, besides triggering the warning, the following:
> >>>>
> >>>>   char a[4];
> >>>>
> >>>>   void f (char *s)
> >>>>   {
> >>>>     __builtin_strncpy (a, "1234", sizeof a);
> >>>>     a[3] = 0;
> >>>>   }
> >>>>
> >>>> is transformed, even at -O0, into:
> >>>>
> >>>>   f (char * s)
> >>>>   {
> >>>>     <bb 2> :
> >>>>     MEM[(char * {ref-all})&a] = MEM[(char * {ref-all})"1234"];
> >>>>     a[3] = 0;
> >>>>     return;
> >>>>   }
> >>>>
> >>>> That doesn't seem right.  GCC should avoid these transformations
> >>>> at -O0, and one way to do that is to defer folding until the CFG
> >>>> is constructed.  The patch does it for strncpy but a more general
> >>>> solution would do that for all calls, e.g., in maybe_fold_stmt
> >>>> as Richard suggested (and I subsequently tested).
> >>>>
> >>>> Martin
> >>>
> >>
>
Richard Biener Nov. 30, 2018, 8:04 a.m. UTC | #40
On Fri, Nov 30, 2018 at 3:02 AM Jeff Law <law@redhat.com> wrote:
>
> On 11/29/18 4:43 PM, Martin Sebor wrote:
> >> The fallout on existing tests is minimal.  What's more concerning is
> >> that it doesn't actually pass the new test from Martin's original
> >> submission.  We get bogus warnings.
> >>
> >> At least part of the problem is weakness in maybe_diag_stxncpy_trunc.
> >> It can't handle something like this:
> >>
> >> test_literal (char * d, struct S * s)
> >> {
> >>    strncpy (d, "1234567890", 3);
> >>    _1 = d + 3;
> >>    *_1 = 0;
> >> }
> >>
> >>
> >> Note the pointer arithmetic between the strncpy and storing the NUL
> >> terminator.
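
For reference, a minimal sketch of source code that could produce the GIMPLE quoted above.  The exact test source is not shown in this message, so the function body and the run_literal_demo helper are assumptions made for illustration only:

```c
#include <assert.h>
#include <string.h>

/* Assumed source-level form of the quoted GIMPLE: the bound (3) is
   smaller than the length of the literal, so strncpy copies no
   terminating NUL, and the following store supplies it.  */
void test_literal (char *d)
{
  strncpy (d, "1234567890", 3);
  d[3] = 0;   /* appears in GIMPLE as  _1 = d + 3;  *_1 = 0;  */
}

/* Hypothetical helper: checks that the destination ends up as a
   properly terminated three-character string.  */
int run_literal_demo (void)
{
  char buf[4];
  test_literal (buf);
  assert (strcmp (buf, "123") == 0);
  return 0;
}
```

The point of the sketch is only that the store of the terminator is separated from the strncpy call by the pointer arithmetic, which is the shape the suppression logic has to see through.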

I already said the way the code looks for the next stmt is totally backwards...

Oh, and I also already voiced my concerns about emitting warnings
from folding code, didn't I? ...

So, let's suppose we delay folding of (builtin [string]) calls to some
first special pass that also diagnoses those issues.  One obvious
place to put this pass would be where we now do the
early object-size pass.  In fact we might want to merge
the object-size pass with the strlen pass because they sound so
closely related.  This pass would then fold the calls and set some
cfun->gimple_df->after_call_warnings flag we could test in the
folder (similar to how we have avoid_folding_inline_builtin ()).

This placement ensures that functions have already been early
inlined (albeit in "early optimized" form, but with their diagnostics
already emitted).

This is of course all GCC 10 material.
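
A toy model of that gating, as a rough sketch only.  None of this is GCC code: fn_state, call_stmt, toy_fold_call and toy_warn_and_fold_pass are hypothetical stand-ins, and the after_call_warnings field merely mirrors the flag name suggested above:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for per-function state; in the proposal the
   flag would live somewhere like cfun->gimple_df.  */
struct fn_state
{
  bool after_call_warnings;   /* set once the diagnostic pass has run */
};

enum call_kind { CALL_STRNCPY, CALL_MEMCPY };

struct call_stmt
{
  enum call_kind kind;
  bool diagnosed;
};

/* Toy folder: decline to turn strncpy into memcpy until the warning
   pass has had a chance to inspect the call.  Returns true if it
   folded the call.  */
bool toy_fold_call (const struct fn_state *fn, struct call_stmt *call)
{
  if (call->kind != CALL_STRNCPY || !fn->after_call_warnings)
    return false;
  call->kind = CALL_MEMCPY;
  return true;
}

/* Toy diagnostic pass: inspect the still-unfolded call, set the flag,
   then fold.  */
void toy_warn_and_fold_pass (struct fn_state *fn, struct call_stmt *call)
{
  call->diagnosed = true;   /* warnings would be issued here */
  fn->after_call_warnings = true;
  toy_fold_call (fn, call);
}

/* Hypothetical driver showing the intended order of events.  */
int toy_demo (void)
{
  struct fn_state fn = { false };
  struct call_stmt call = { CALL_STRNCPY, false };

  assert (!toy_fold_call (&fn, &call));   /* too early: left alone */
  assert (call.kind == CALL_STRNCPY);
  toy_warn_and_fold_pass (&fn, &call);    /* diagnose, then fold */
  assert (call.diagnosed && call.kind == CALL_MEMCPY);
  return 0;
}
```

The design choice the sketch tries to capture is that the folder itself never warns; it only checks whether the diagnosing pass has already run.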

> > Right.  I'm less concerned about this case because it involves
> > a literal that's obviously longer than the destination but it
> > would be nice if the suppression worked here as well in case
> > the literal comes from macro expansion.  It will require
> > another tweak.
> OK.  If this isn't at the core of the regression BZ, then xfailing those
> particular cases and coming back to them is fine.
>
> >
> > But the test from my patch passes with the changes to calls.c
> > from my patch, so that's an improvement.
> >
> > I have done the test suite cleanup in the attached patch.  It
> > was indeed minimal -- not sure why I saw so many failures with
> > my initial approach.
> Richi's does the folding as part of gimple lowering.  So it's still
> pretty early -- basically it ends up hitting just a few tests that are
> looking for folded stuff in the .gimple dump.
>
> I had actually targeted this patch as one to work through and try to get
> resolved today.  Kind of funny that we were poking at it at the same time.
>
>
> >
> > I can submit a patch to handle the literal case above as
> > a followup unless you would prefer it done at the same time.
> Follow-up is fine by me.
>
> jeff
Jakub Jelinek Nov. 30, 2018, 8:30 a.m. UTC | #41
On Fri, Nov 30, 2018 at 09:04:56AM +0100, Richard Biener wrote:
> So, let's suppose we delay folding of (builtin [string]) calls to some
> first special pass that also diagnoses those issues.  One obvious
> place to put this pass would be where we now do the
> early object-size pass.  In fact we might want to merge
> the object-size pass with the strlen pass because they sound so
> closely related.  This pass would then fold the calls and set some

I don't see them as related at all, and they actually have nothing in
common in how they are implemented.

That said, the object-size pass is like a fab pass for a single builtin,
so handling in its main loop other builtins is fine.

	Jakub
Martin Sebor Nov. 30, 2018, 3:51 p.m. UTC | #42
On 11/30/18 12:57 AM, Richard Biener wrote:
> On Thu, Nov 29, 2018 at 9:34 PM Martin Sebor <msebor@gmail.com> wrote:
>>
>> On 11/16/2018 02:07 AM, Richard Biener wrote:
>>> On Fri, Nov 16, 2018 at 4:12 AM Martin Sebor <msebor@gmail.com> wrote:
>>>>
>>>> Ping: https://gcc.gnu.org/ml/gcc-patches/2018-08/msg01818.html
>>>>
>>>> Please let me know if there is something I need to change here
>>>> to make the fix acceptable or if I should stop trying.
>>>
>>> I have one more comment about
>>>
>>> +  /* Defer warning (and folding) until the next statement in the basic
>>> +     block is reachable.  */
>>> +  if (!gimple_bb (stmt))
>>> +    return false;
>>> +
>>>
>>> it's not about the next statement in the basic-block being "reachable"
>>> (even w/o a CFG you can use gsi_next()) but rather that the next
>>> stmt isn't yet gimplified and thus not inserted into the gimple sequence,
>>> right?
>>
>> No, it's about the current statement not being associated with
>> a basic block yet when the warning code runs for the first time
>> (during gimplify_expr), and so gsi_next() returning null.
>>
>>> You apply this to gimple_fold_builtin_strncpy but I'd rather
>>> see us not sprinkling this over gimple-fold.c but instead do this
>>> in gimplify.c:maybe_fold_stmt, delaying folding until say lowering.
>>>
>>> See the attached (untested).
>>
>> I would also prefer this solution.  I had tested it (in response
>> to you first mentioning it back in September) and it causes quite
>> a bit of fallout in tests that look for the folding to take place
>> very early.  See the end of my reply here:
>>
>>     https://gcc.gnu.org/ml/gcc-patches/2018-09/msg01248.html
>>
>> But I'm willing to do the test suite cleanup if you think it's
>> suitable for GCC 9.  (If you're thinking GCC 10 please let me
>> know now.)
> 
> I very much prefer that to the hacks in gimple-fold.c; if it doesn't
> help now then I'd rather live with some bogus warnings for GCC 9
> and fix it up properly for GCC 10.
> 
> I expect the fallout to be quite minimal (also considering my
> suggestion to do the folding in gimple-low.c).

Yes, it is.  Please see the full patch in my reply to Jeff and
let me know if that's fine for GCC 9.

As we discussed before, for GCC 10 Jeff and I are already planning
to look into merging the strlen pass with others (sprintf and
perhaps also object size) to improve things.

Martin

Jeff Law Dec. 5, 2018, 11:11 p.m. UTC | #43
On 11/29/18 4:43 PM, Martin Sebor wrote:
> On 11/29/18 4:07 PM, Jeff Law wrote:
>> On 11/29/18 1:34 PM, Martin Sebor wrote:
>>> On 11/16/2018 02:07 AM, Richard Biener wrote:
>>>> On Fri, Nov 16, 2018 at 4:12 AM Martin Sebor <msebor@gmail.com> wrote:
>>>>>
>>>>> Ping: https://gcc.gnu.org/ml/gcc-patches/2018-08/msg01818.html
>>>>>
>>>>> Please let me know if there is something I need to change here
>>>>> to make the fix acceptable or if I should stop trying.
>>>>
>>>> I have one more comment about
>>>>
>>>> +  /* Defer warning (and folding) until the next statement in the basic
>>>> +     block is reachable.  */
>>>> +  if (!gimple_bb (stmt))
>>>> +    return false;
>>>> +
>>>>
>>>> it's not about the next statement in the basic-block being "reachable"
>>>> (even w/o a CFG you can use gsi_next()) but rather that the next
>>>> stmt isn't yet gimplified and thus not inserted into the gimple sequence,
>>>> right?
>>>
>>> No, it's about the current statement not being associated with
>>> a basic block yet when the warning code runs for the first time
>>> (during gimplify_expr), and so gsi_next() returning null.
>>>
>>>> You apply this to gimple_fold_builtin_strncpy but I'd rather
>>>> see us not sprinkling this over gimple-fold.c but instead do this
>>>> in gimplify.c:maybe_fold_stmt, delaying folding until say lowering.
>>>>
>>>> See the attached (untested).
>>>
>>> I would also prefer this solution.  I had tested it (in response
>>> to you first mentioning it back in September) and it causes quite
>>> a bit of fallout in tests that look for the folding to take place
>>> very early.  See the end of my reply here:
>>>
>>>    https://gcc.gnu.org/ml/gcc-patches/2018-09/msg01248.html
>>>
>>> But I'm willing to do the test suite cleanup if you think it's
>>> suitable for GCC 9.  (If you're thinking GCC 10 please let me
>>> know now.)
>> The fallout on existing tests is minimal.  What's more concerning is
>> that it doesn't actually pass the new test from Martin's original
>> submission.  We get bogus warnings.
>>
>> At least part of the problem is weakness in maybe_diag_stxncpy_trunc.
>> It can't handle something like this:
>>
>> test_literal (char * d, struct S * s)
>> {
>>    strncpy (d, "1234567890", 3);
>>    _1 = d + 3;
>>    *_1 = 0;
>> }
>>
>>
>> Note the pointer arithmetic between the strncpy and storing the NUL
>> terminator.
> 
> Right.  I'm less concerned about this case because it involves
> a literal that's obviously longer than the destination but it
> would be nice if the suppression worked here as well in case
> the literal comes from macro expansion.  It will require
> another tweak.
> 
> But the test from my patch passes with the changes to calls.c
> from my patch, so that's an improvement.
> 
> I have done the test suite cleanup in the attached patch.  It
> was indeed minimal -- not sure why I saw so many failures with
> my initial approach.
> 
> I can submit a patch to handle the literal case above as
> a followup unless you would prefer it done at the same time.
> 
> Martin
> 
> gcc-87028.diff
> 
> PR tree-optimization/87028 - false positive -Wstringop-truncation strncpy with global variable source string
> 
> gcc/ChangeLog:
> 
> 	PR tree-optimization/87028
> 	* calls.c (get_attr_nonstring_decl): Avoid setting *REF to
> 	SSA_NAME_VAR.
> 	* gcc/gimple-low.c (lower_stmt): Delay folding built-ins.
> 	* gimplify.c (maybe_fold_stmt): Avoid folding statements that
> 	don't belong to a basic block.
> 	* tree.h (SSA_NAME_VAR): Update comment.
> 	* tree-ssa-strlen.c (maybe_diag_stxncpy_trunc): Simplify.
> 
> gcc/testsuite/ChangeLog:
> 
> 	PR tree-optimization/87028
> 	* c-c++-common/Wstringop-truncation.c: Remove xfails.
> 	* gcc.dg/Wstringop-truncation-5.c: New test.
> 	* gcc.dg/strcmpopt_1.c: Adjust.
> 	* gcc.dg/tree-ssa/pr79697.c: Same.
I fixed up the ChangeLog a little and installed the patch.

Thanks,
jeff
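
Editorial illustration of the two source-level shapes discussed above. This is a minimal sketch, and the function names `trunc_subscript` and `trunc_pointer` are hypothetical. Both store the terminating nul at the same address. The second spells the store with explicit pointer arithmetic, mirroring the `_1 = d + 3; *_1 = 0;` sequence in the quoted GIMPLE. Whether a given GCC version suppresses -Wstringop-truncation for each form is not asserted here.

```c
#include <assert.h>
#include <string.h>

#define STR "1234567890"

/* Array-subscript form of the terminating store.  */
static void trunc_subscript (char *d)
{
  assert (d != NULL);
  strncpy (d, STR, 3);   /* copies "123", no terminating nul */
  d[3] = '\0';
}

/* Pointer-arithmetic form: compute the address first, then store
   through it.  D must point to at least 4 writable bytes.  */
static void trunc_pointer (char *d)
{
  assert (d != NULL);
  strncpy (d, STR, 3);
  char *end = d + 3;
  *end = '\0';
}
```

Both functions leave the string "123" in a 4-byte destination, so any difference in diagnostics comes from how the compiler matches the store, not from what the code does.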
Christophe Lyon Dec. 6, 2018, 1 p.m. UTC | #44
On Thu, 6 Dec 2018 at 00:11, Jeff Law <law@redhat.com> wrote:
> I fixed up the ChangeLog a little and installed the patch.
>

Hi,
The new test (Wstringop-truncation-5.c) fails on at least arm and aarch64:
FAIL: gcc.dg/Wstringop-truncation-5.c (test for excess errors)
Excess errors:
/gcc/testsuite/gcc.dg/Wstringop-truncation-5.c:74:8: error: redefinition of 'struct S'
/gcc/testsuite/gcc.dg/Wstringop-truncation-5.c:79:12: error: redefinition of 'arr'
/gcc/testsuite/gcc.dg/Wstringop-truncation-5.c:80:19: error: redefinition of 'ptr'
/gcc/testsuite/gcc.dg/Wstringop-truncation-5.c:82:12: error: redefinition of 'arr2'
/gcc/testsuite/gcc.dg/Wstringop-truncation-5.c:84:6: error: conflicting types for 'test_literal'
/gcc/testsuite/gcc.dg/Wstringop-truncation-5.c:90:6: error: conflicting types for 'test_global_arr'
/gcc/testsuite/gcc.dg/Wstringop-truncation-5.c:96:6: error: conflicting types for 'test_global_arr2'
/gcc/testsuite/gcc.dg/Wstringop-truncation-5.c:104:6: error: conflicting types for 'test_global_ptr'
/gcc/testsuite/gcc.dg/Wstringop-truncation-5.c:110:6: error: conflicting types for 'test_local_arr'
/gcc/testsuite/gcc.dg/Wstringop-truncation-5.c:117:6: error: conflicting types for 'test_local_ptr'
/gcc/testsuite/gcc.dg/Wstringop-truncation-5.c:124:6: error: conflicting types for 'test_compound_literal'

> Thanks,
> jeff
Jeff Law Dec. 6, 2018, 1:51 p.m. UTC | #45
On 12/6/18 6:00 AM, Christophe Lyon wrote:
> On Thu, 6 Dec 2018 at 00:11, Jeff Law <law@redhat.com> wrote:
>> I fixed up the ChangeLog a little and installed the patch.
>>
> 
> Hi,
> The new test (Wstringop-truncation-5.c) fails on at least arm and aarch64:
> FAIL: gcc.dg/Wstringop-truncation-5.c (test for excess errors)
I must have applied the hunk more than once: the contents of the test
are duplicated, which causes the errors.  I removed the duplicate copy
of the test, which should fix the problem.
jeff

Patch

PR tree-optimization/87028 - false positive -Wstringop-truncation strncpy with global variable source string
gcc/ChangeLog:

	PR tree-optimization/87028
	* gimple-fold.c (gimple_fold_builtin_strncpy): Avoid folding when
	statement doesn't belong to a basic block.
	* tree-ssa-strlen.c (maybe_diag_stxncpy_trunc): Handle MEM_REF on
	the left hand side of assignment.

gcc/testsuite/ChangeLog:

	PR tree-optimization/87028
	* c-c++-common/Wstringop-truncation.c: Remove xfails.
	* gcc.dg/Wstringop-truncation-5.c: New test.

diff --git a/gcc/gimple-fold.c b/gcc/gimple-fold.c
index 07341eb..284c2fb 100644
--- a/gcc/gimple-fold.c
+++ b/gcc/gimple-fold.c
@@ -1702,6 +1702,11 @@  gimple_fold_builtin_strncpy (gimple_stmt_iterator *gsi,
   if (tree_int_cst_lt (ssize, len))
     return false;
 
+  /* Defer warning (and folding) until the next statement in the basic
+     block is reachable.  */
+  if (!gimple_bb (stmt))
+    return false;
+
   /* Diagnose truncation that leaves the copy unterminated.  */
   maybe_diag_stxncpy_trunc (*gsi, src, len);
 
diff --git a/gcc/testsuite/c-c++-common/Wstringop-truncation.c b/gcc/testsuite/c-c++-common/Wstringop-truncation.c
index e78e85e..4ddb9bd 100644
--- a/gcc/testsuite/c-c++-common/Wstringop-truncation.c
+++ b/gcc/testsuite/c-c++-common/Wstringop-truncation.c
@@ -329,9 +329,8 @@  void test_strncpy_array (Dest *pd, int i, const char* s)
      of the array to NUL is not diagnosed.  */
   {
     /* This might be better written using memcpy() but it's safe so
-       it probably shouldn't be diagnosed.  It currently triggers
-       a warning because of bug 81704.  */
-    strncpy (dst7, "0123456", sizeof dst7);   /* { dg-bogus "\\\[-Wstringop-truncation]" "bug 81704" { xfail *-*-* } } */
+       it isn't diagnosed.  See pr81704 and pr87028.  */
+    strncpy (dst7, "0123456", sizeof dst7);   /* { dg-bogus "\\\[-Wstringop-truncation]" } */
     dst7[sizeof dst7 - 1] = '\0';
     sink (dst7);
   }
@@ -350,7 +349,7 @@  void test_strncpy_array (Dest *pd, int i, const char* s)
   }
 
   {
-    strncpy (pd->a5, "01234", sizeof pd->a5);   /* { dg-bogus "\\\[-Wstringop-truncation]" "bug 81704" { xfail *-*-* } } */
+    strncpy (pd->a5, "01234", sizeof pd->a5);   /* { dg-bogus "\\\[-Wstringop-truncation]" } */
     pd->a5[sizeof pd->a5 - 1] = '\0';
     sink (pd);
   }
diff --git a/gcc/testsuite/gcc.dg/Wstringop-truncation-5.c b/gcc/testsuite/gcc.dg/Wstringop-truncation-5.c
new file mode 100644
index 0000000..03bcba9
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/Wstringop-truncation-5.c
@@ -0,0 +1,111 @@ 
+/* PR tree-optimization/87028 - false positive -Wstringop-truncation
+   strncpy with global variable source string
+   { dg-do compile }
+   { dg-options "-O2 -Wstringop-truncation" } */
+
+char *strncpy (char *, const char *, __SIZE_TYPE__);
+
+void sink (char*);
+
+#define STR   "1234567890"
+
+struct S
+{
+  char a[5], b[5], *p;
+};
+
+const char arr[] = STR;
+const char arr2[][10] = { "123", STR };
+
+const char* const ptr = STR;
+
+char a[5];
+
+void test_literal (char *d, struct S *s)
+{
+  strncpy (d, STR, 3);                      /* { dg-bogus "\\\[-Wstringop-truncation]" } */
+  d[3] = '\0';
+
+  strncpy (a, STR, sizeof a - 1);           /* { dg-bogus "\\\[-Wstringop-truncation]" } */
+  a[sizeof a - 1] = '\0';
+
+  strncpy (s->a, STR, sizeof s->a - 1);     /* { dg-bogus "\\\[-Wstringop-truncation]" } */
+  s->a[sizeof s->a - 1] = '\0';
+
+  strncpy (&s->b[0], STR, sizeof s->b - 1); /* { dg-bogus "\\\[-Wstringop-truncation]" } */
+  s->b[sizeof s->b - 1] = '\0';
+
+  strncpy (s->p, STR, 4);                   /* { dg-bogus "\\\[-Wstringop-truncation]" "pr?????" { xfail *-*-* } } */
+  s->p[4] = '\0';
+}
+
+void test_global_arr (char *d, struct S *s)
+{
+  strncpy (d, arr, 4);                      /* { dg-bogus "\\\[-Wstringop-truncation]" } */
+  d[4] = '\0';
+
+  strncpy (a, arr, sizeof a - 1);           /* { dg-bogus "\\\[-Wstringop-truncation]" } */
+  a[sizeof a - 1] = '\0';
+
+  strncpy (s->a, arr, sizeof s->a - 1);     /* { dg-bogus "\\\[-Wstringop-truncation]" } */
+  s->a[sizeof s->a - 1] = '\0';
+
+  strncpy (s->p, arr, 5);                   /* { dg-bogus "\\\[-Wstringop-truncation]" "pr?????" { xfail *-*-* } } */
+  s->p[5] = '\0';
+}
+
+void test_global_arr2 (char *d, struct S *s)
+{
+  strncpy (a, arr2[1], sizeof a - 1);       /* { dg-bogus "\\\[-Wstringop-truncation]" } */
+  a[sizeof a - 1] = '\0';
+
+  strncpy (s->a, arr2[1], sizeof s->a - 1); /* { dg-bogus "\\\[-Wstringop-truncation]" } */
+  s->a[sizeof s->a - 1] = '\0';
+
+  strncpy (s->b, arr2[0], sizeof s->a - 1);
+}
+
+void test_global_ptr (struct S *s)
+{
+  strncpy (a, ptr, sizeof a - 1);           /* { dg-bogus "\\\[-Wstringop-truncation]" } */
+  a[sizeof a - 1] = '\0';
+
+  strncpy (s->a, ptr, sizeof s->a - 1);     /* { dg-bogus "\\\[-Wstringop-truncation]" } */
+  s->a[sizeof s->a - 1] = '\0';
+
+  strncpy (s->p, ptr, 6);                   /* { dg-bogus "\\\[-Wstringop-truncation]" "pr?????" { xfail *-*-* } } */
+  s->p[6] = '\0';
+}
+
+void test_local_arr (struct S *s)
+{
+  const char arr[] = STR;
+
+  strncpy (a, arr, sizeof a - 1);
+  a[sizeof a - 1] = '\0';
+
+  strncpy (s->a, arr, sizeof s->a - 1);
+  s->a[sizeof s->a - 1] = '\0';
+
+  char d[3];
+  strncpy (d, arr, sizeof d - 1);
+  d[sizeof d - 1] = '\0';
+  sink (d);
+}
+
+void test_local_ptr (struct S *s)
+{
+  const char* const ptr = STR;
+
+  strncpy (a, ptr, sizeof a - 1);           /* { dg-bogus "\\\[-Wstringop-truncation]" } */
+  a[sizeof a - 1] = '\0';
+
+  strncpy (s->a, ptr, sizeof s->a - 1);     /* { dg-bogus "\\\[-Wstringop-truncation]" } */
+  s->a[sizeof s->a - 1] = '\0';
+}
+
+void test_compound_literal (struct S *s)
+{
+  strncpy (s->a, (char[]){ STR }, sizeof s->a - 1);
+  s->a[sizeof s->a - 1] = '\0';
+}
diff --git a/gcc/tree-ssa-strlen.c b/gcc/tree-ssa-strlen.c
index d0792aa..f1988f6 100644
--- a/gcc/tree-ssa-strlen.c
+++ b/gcc/tree-ssa-strlen.c
@@ -1981,6 +1981,23 @@  maybe_diag_stxncpy_trunc (gimple_stmt_iterator gsi, tree src, tree cnt)
 	  && known_eq (dstoff, lhsoff)
 	  && operand_equal_p (dstbase, lhsbase, 0))
 	return false;
+
+      if (code == MEM_REF
+	  && TREE_CODE (lhsbase) == SSA_NAME
+	  && known_eq (dstoff, lhsoff))
+	{
+	  /* Extract the referenced variable from something like
+	       MEM[(char *)d_3(D) + 3B] = 0;  */
+	  gimple *def = SSA_NAME_DEF_STMT (lhsbase);
+	  if (gimple_nop_p (def))
+	    {
+	      lhsbase = SSA_NAME_VAR (lhsbase);
+	      if (lhsbase
+		  && dstbase
+		  && operand_equal_p (dstbase, lhsbase, 0))
+		return false;
+	    }
+	}
     }
 
   int prec = TYPE_PRECISION (TREE_TYPE (cnt));