diff mbox series

use pointer size rather than array size when storing the former (PR 93829)

Message ID 864beb08-0073-cde2-ad21-a855a7bf7dc9@gmail.com
State New
Headers show
Series use pointer size rather than array size when storing the former (PR 93829) | expand

Commit Message

Martin Sebor Feb. 20, 2020, 12:26 a.m. UTC
The buffer overflow detection for multi-char stores uses the size
of a source array even when what's actually being accessed (read
and stored) is a pointer to the array.  That leads to incorrect
warnings in some cases.

The attached patch corrects the function that computes the size of
the access to set it to that of a pointer instead if the source is
an address expression.

Tested on x86_64-linux.

Martin

Comments

Bernhard Reutner-Fischer Feb. 20, 2020, 7:03 a.m. UTC | #1
On 20 February 2020 01:26:58 CET, Martin Sebor <msebor@gmail.com> wrote:

+   Sets *NULTREM if the representation contains a zero byte, and sets

s/NULTREM/NULTERM/

thanks,
Jeff Law Feb. 26, 2020, 10:09 p.m. UTC | #2
On Wed, 2020-02-19 at 17:26 -0700, Martin Sebor wrote:
> The buffer overflow detection for multi-char stores uses the size
> of a source array even when what's actually being accessed (read
> and stored) is a pointer to the array.  That leads to incorrect
> warnings in some cases.
> 
> The attached patch corrects the function that computes the size of
> the access to set it to that of a pointer instead if the source is
> an address expression.
> 
> Tested on x86_64-linux.

>    if (TREE_CODE (exp) == ADDR_EXPR)
> -    exp = TREE_OPERAND (exp, 0);
> +    {
> +      /* If the size of the access hasn't been determined yet it's that
> +	 of a pointer.  */
> +      if (!nbytes)
> +	nbytes = tree_to_uhwi (TYPE_SIZE_UNIT (TREE_TYPE (exp)));
> +      exp = TREE_OPERAND (exp, 0);
> +    }
>  
This doesn't make any sense to me.  You're always going to get the size of a
pointer here.  Don't you want the size of the TYPE of the operand?


Jeff
Martin Sebor Feb. 26, 2020, 11:18 p.m. UTC | #3
On 2/26/20 3:09 PM, Jeff Law wrote:
> On Wed, 2020-02-19 at 17:26 -0700, Martin Sebor wrote:
>> The buffer overflow detection for multi-char stores uses the size
>> of a source array even when what's actually being accessed (read
>> and stored) is a pointer to the array.  That leads to incorrect
>> warnings in some cases.
>>
>> The attached patch corrects the function that computes the size of
>> the access to set it to that of a pointer instead if the source is
>> an address expression.
>>
>> Tested on x86_64-linux.
> 
>>     if (TREE_CODE (exp) == ADDR_EXPR)
>> -    exp = TREE_OPERAND (exp, 0);
>> +    {
>> +      /* If the size of the access hasn't been determined yet it's that
>> +	 of a pointer.  */
>> +      if (!nbytes)
>> +	nbytes = tree_to_uhwi (TYPE_SIZE_UNIT (TREE_TYPE (exp)));
>> +      exp = TREE_OPERAND (exp, 0);
>> +    }
>>   
> This doesn't make any sense to me.  You're always going to get the size of a
> pointer here.  Don't you want the size of the TYPE of the operand?

The function correctly handles cases when the expression is an array
because the array is what's being stored.  The bug is in stripping
the ADDR_EXPR and then using the size of the operand when what's
being stored is actually a pointer.

This test case illustrates the difference:

   struct S { char *p, *q, r[8]; } a;

   void f (void)
   {
     struct S b = { 0, "1234567", "abcdefgh" };
     __builtin_memcpy (&a, &b, sizeof a);
   }

The memcpy results in:

   MEM <char *> [(char * {ref-all})&b] = 0B;
   MEM <char *> [(char * {ref-all})&b + 8B] = "1234567";
   MEM <unsigned char[24]> [(char * {ref-all})&a] = MEM <unsigned 
char[24]> [(char * {ref-all})&b];

The RHS of the second MEM_REF is ADDR_EXPR (STRING_CST).  The RHS
of the third MEM_REF is the  array case (that's handled correctly).

I think computing the size of the store could be simplified (it
seems it should be the same as type of either the RHS or the LHS)
but I didn't want to mess with things too much this late.

Martin
Jeff Law Feb. 28, 2020, 9:11 p.m. UTC | #4
On Wed, 2020-02-26 at 16:18 -0700, Martin Sebor wrote:
> On 2/26/20 3:09 PM, Jeff Law wrote:
> > On Wed, 2020-02-19 at 17:26 -0700, Martin Sebor wrote:
> > > The buffer overflow detection for multi-char stores uses the size
> > > of a source array even when what's actually being accessed (read
> > > and stored) is a pointer to the array.  That leads to incorrect
> > > warnings in some cases.
> > > 
> > > The attached patch corrects the function that computes the size of
> > > the access to set it to that of a pointer instead if the source is
> > > an address expression.
> > > 
> > > Tested on x86_64-linux.
> > >     if (TREE_CODE (exp) == ADDR_EXPR)
> > > -    exp = TREE_OPERAND (exp, 0);
> > > +    {
> > > +      /* If the size of the access hasn't been determined yet it's that
> > > +	 of a pointer.  */
> > > +      if (!nbytes)
> > > +	nbytes = tree_to_uhwi (TYPE_SIZE_UNIT (TREE_TYPE (exp)));
> > > +      exp = TREE_OPERAND (exp, 0);
> > > +    }
> > >   
> > This doesn't make any sense to me.  You're always going to get the size of
> > a
> > pointer here.  Don't you want the size of the TYPE of the operand?
> 
> The function correctly handles cases when the expression is an array
> because the array is what's being stored.  The bug is in stripping
> the ADDR_EXPR and then using the size of the operand when what's
> being stored is actually a pointer.
> 
> This test case illustrates the difference:
> 
>    struct S { char *p, *q, r[8]; } a;
> 
>    void f (void)
>    {
>      struct S b = { 0, "1234567", "abcdefgh" };
>      __builtin_memcpy (&a, &b, sizeof a);
>    }
> 
> The memcpy results in:
> 
>    MEM <char *> [(char * {ref-all})&b] = 0B;
>    MEM <char *> [(char * {ref-all})&b + 8B] = "1234567";
>    MEM <unsigned char[24]> [(char * {ref-all})&a] = MEM <unsigned 
> char[24]> [(char * {ref-all})&b];
> 
> The RHS of the second MEM_REF is ADDR_EXPR (STRING_CST).  The RHS
> of the third MEM_REF is the  array case (that's handled correctly).
> 
> I think computing the size of the store could be simplified (it
> seems it should be the same as type of either the RHS or the LHS)
> but I didn't want to mess with things too much this late.
Sorry, I should have just thrown this under the debugger, but I was short of
time.

ISTM like the problem is we have an RHS like:


> 
> >  <addr_expr 0x7fffea93f360
> >     type <pointer_type 0x7fffea939d20
> >         type <array_type 0x7fffea939690 type <integer_type 0x7fffea8193f0
> > char>
> >             BLK
> >             size <integer_cst 0x7fffea81e3a8 constant 328>
> >             unit-size <integer_cst 0x7fffea935f48 constant 41>
> >             align:8 warn_if_not_align:0 symtab:0 alias-set -1 canonical-type
> > 0x7fffea939690 domain <integer_type 0x7fffea9395e8>
> >             pointer_to_this <pointer_type 0x7fffea939d20>>
> >         unsigned DI
> >         size <integer_cst 0x7fffea800cd8 constant 64>
> >         unit-size <integer_cst 0x7fffea800cf0 constant 8>
> >         align:64 warn_if_not_align:0 symtab:0 alias-set -1 canonical-type
> > 0x7fffea939d20>
> >     readonly constant
> >     arg:0 <string_cst 0x7fffea814100 type <array_type 0x7fffea939690>
> >         readonly constant static
> > "0123456789012345678901234567890123456789\000">
> >     j.c:20:10 start: j.c:20:10 finish: j.c:20:10>
> > 

The current code unconditionally strips the ADDR_EXPR at this point and as a
result will use the length of STRING_CST node, which is obviously wrong.

With your change, if we haven't already determined a length, we extract the
length of the pointer then strip the ADDR_EXPR.

Yea, this is all ripe for some cleanup.  For example it's not at all clear that
just because we don't have a size at this point in count_nonzero_bytes and
we've got an ADDR_EXPR that we want the size of the pointer.    That may in
fact be true, but it's not easy to ascertain with the current spaghetti code.

OK for the trunk, but let's queue this area for some cleanup in gcc-11.

jeff






> Martin
>
diff mbox series

Patch

PR middle-end/93829 - bogus -Wstringop-overflow on memcpy of a struct with a pointer member from another with a long string

gcc/testsuite/ChangeLog:

	PR middle-end/93829
	* gcc.dg/Wstringop-overflow-32.c: New test.

gcc/ChangeLog:

	PR middle-end/93829
	* tree-ssa-strlen.c (count_nonzero_bytes): Set the size to that
	  of a pointer in the outermost ADDR_EXPRs.

diff --git a/gcc/tree-ssa-strlen.c b/gcc/tree-ssa-strlen.c
index 9a88a85b07c..1cdb581a51f 100644
--- a/gcc/tree-ssa-strlen.c
+++ b/gcc/tree-ssa-strlen.c
@@ -4587,12 +4587,15 @@  int ssa_name_limit_t::next_ssa_name (tree ssa_name)
 
 /* Determines the minimum and maximum number of leading non-zero bytes
    in the representation of EXP and set LENRANGE[0] and LENRANGE[1]
-   to each.  Sets LENRANGE[2] to the total number of bytes in
-   the representation.  Sets *NULTREM if the representation contains
-   a zero byte, and sets *ALLNUL if all the bytes are zero.
+   to each.
+   Sets LENRANGE[2] to the total size of the access (which may be less
+   than LENRANGE[1] when what's being referenced by EXP is a pointer
+   rather than an array).
+   Sets *NULTREM if the representation contains a zero byte, and sets
+   *ALLNUL if all the bytes are zero.
    OFFSET and NBYTES are the offset into the representation and
-   the size of the access to it determined from a MEM_REF or zero
-   for other expressions.
+   the size of the access to it determined from an ADDR_EXPR (i.e.,
+   a pointer) or MEM_REF or zero for other expressions.
    Uses RVALS to determine range information.
    Avoids recursing deeper than the limits in SNLIM allow.
    Returns true on success and false otherwise.  */
@@ -4692,7 +4695,13 @@  count_nonzero_bytes (tree exp, unsigned HOST_WIDE_INT offset,
     }
 
   if (TREE_CODE (exp) == ADDR_EXPR)
-    exp = TREE_OPERAND (exp, 0);
+    {
+      /* If the size of the access hasn't been determined yet it's that
+	 of a pointer.  */
+      if (!nbytes)
+	nbytes = tree_to_uhwi (TYPE_SIZE_UNIT (TREE_TYPE (exp)));
+      exp = TREE_OPERAND (exp, 0);
+    }
 
   if (TREE_CODE (exp) == SSA_NAME)
     {
@@ -4788,9 +4797,10 @@  count_nonzero_bytes (tree exp, unsigned HOST_WIDE_INT offset,
 	return false;
 
       if (!nbytes)
-	/* If NBYTES hasn't been determined earlier from MEM_REF,
-	   set it here.  It includes all internal nuls, including
-	   the terminating one if the string has one.  */
+	/* If NBYTES hasn't been determined earlier, either from ADDR_EXPR
+	   (i.e., it's the size of a pointer), or from MEM_REF (as the size
+	   of the access), set it here to the size of the string, including
+	   all internal and trailing nuls if the string has any.  */
 	nbytes = nchars - offset;
 
       prep = TREE_STRING_POINTER (exp) + offset;
diff --git a/gcc/testsuite/gcc.dg/Wstringop-overflow-32.c b/gcc/testsuite/gcc.dg/Wstringop-overflow-32.c
new file mode 100644
index 00000000000..e5939567a4d
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/Wstringop-overflow-32.c
@@ -0,0 +1,51 @@ 
+/* PR middle-end/93829 - bogus -Wstringop-overflow on memcpy of a struct
+   with a pointer member from another with a long string
+   { dg-do compile }
+   { dg-options "-O2 -Wall" } */
+
+extern void* memcpy (void*, const void*, __SIZE_TYPE__);
+
+#define S40 "0123456789012345678901234567890123456789"
+
+const char s40[] = S40;
+
+struct S
+{
+  const void *p, *q, *r;
+} s, sa[2];
+
+
+void test_lit_decl (void)
+{
+  struct S t = { 0, S40, 0 };
+
+  memcpy (&s, &t, sizeof t);    // { dg-bogus "-Wstringop-overflow" }
+}
+
+void test_str_decl (void)
+{
+  struct S t = { 0, s40, 0 };
+
+  memcpy (&s, &t, sizeof t);    // { dg-bogus "-Wstringop-overflow" }
+}
+
+
+void test_lit_ssa (int i)
+{
+  if (i < 1)
+    i = 1;
+  struct S *p = &sa[i];
+  struct S t = { 0, S40, 0 };
+
+  memcpy (p, &t, sizeof t);    // { dg-bogus "-Wstringop-overflow" }
+}
+
+void test_str_ssa (int i)
+{
+  if (i < 1)
+    i = 1;
+  struct S *p = &sa[i];
+  struct S t = { 0, s40, 0 };
+
+  memcpy (p, &t, sizeof t);    // { dg-bogus "-Wstringop-overflow" }
+}