diff mbox

[2/6] New warnings -Wstring-plus-{char, int} (PR c++/62181)

Message ID 1497231174.27153.9.camel@stu.xidian.edu.cn
State New
Headers show

Commit Message

Xi Ruoyao June 12, 2017, 1:32 a.m. UTC
This patch adds warning option -Wstring-plus-int for C/C++.

gcc/ChangeLog:

2017-06-12  Xi Ruoyao  <ryxi@stu.xidian.edu.cn>

	* c-family/c.opt: New option -Wstring-plus-int.
	* c-family/c-common.c (pointer_int_sum): Checking for
	-Wstring-plus-int.
---
 gcc/c-family/c-common.c | 25 +++++++++++++++++++++++++
 gcc/c-family/c.opt      |  5 +++++
 2 files changed, 30 insertions(+)

Comments

Martin Sebor June 19, 2017, 4:51 p.m. UTC | #1
On 06/11/2017 07:32 PM, Xi Ruoyao wrote:
> This patch adds warning option -Wstring-plus-int for C/C++.
>
> gcc/ChangeLog:
>
> 2017-06-12  Xi Ruoyao  <ryxi@stu.xidian.edu.cn>
>
> 	* c-family/c.opt: New option -Wstring-plus-int.
> 	* c-family/c-common.c (pointer_int_sum): Checking for
> 	-Wstring-plus-int.

This is a very useful warning but I would suggest to word it
in terms of what it actually does rather than what it might be
intended to do.  E.g., for

   const char *p = "123" + 7;

issue

   warning: offset 7 exceeds the upper bound 3 of the array

rather than

   warning: adding 'int' to a string does not append to the string

(I have trouble envisioning on what grounds someone might expect
the addition to have this effect.)

Given that the warning only triggers when the upper bound of
an array is exceeded I would also suggest to consider including
the warning in -Warray-bounds.  (With that, it would be useful
to also detect exceeding the upper bound of non-literal arrays
as well.)

Martin
Xi Ruoyao June 19, 2017, 5:28 p.m. UTC | #2
On 2017-06-19 10:51 -0600, Martin Sebor wrote:
> On 06/11/2017 07:32 PM, Xi Ruoyao wrote:
> > This patch adds warning option -Wstring-plus-int for C/C++.
> > 
> > gcc/ChangeLog:
> > 
> > 2017-06-12  Xi Ruoyao  <ryxi@stu.xidian.edu.cn>
> > 
> > 	* c-family/c.opt: New option -Wstring-plus-int.
> > 	* c-family/c-common.c (pointer_int_sum): Checking for
> > 	-Wstring-plus-int.
> 
> This is a very useful warning but I would suggest to word it
> in terms of what it actually does rather than what it might be
> intended to do.  E.g., for
> 
>    const char *p = "123" + 7;
> 
> issue
> 
>    warning: offset 7 exceeds the upper bound 3 of the array
> 
> rather than
> 
>    warning: adding 'int' to a string does not append to the string
> 
> (I have trouble envisioning on what grounds someone might expect
> the addition to have this effect.)

How about something like `const char *p = "123" + getchar();` ?

I'd like this for -Wstring-plus-int=1:

    warning: adding 'int' to a string does not append to the string
    [-Wstring-plus-int=]
        const char *p = "123" + 7;
                              ^
    note: offset 7 exceeds the size 4 of the string, using the result
    may lead to undefined behaviour.

(Clang permits "123" + 4 since its result is well defined in standard.
Maybe we could permit "123" + 3 only.)

For level 1 we only warn for such obvious mistakes. And for
-Wstring-plus-int=2:

    warning: adding 'int' to a string does not append to the string
    [-Wstring-plus-int=]
        const char *p = "123" + getchar();
                              ^
    note: the offset may exceed the size of the string.

(Clang also warn while it's impossible to know whether the offset
exceeds.  It seems aggressively so we can make it level 2.)

> Given that the warning only triggers when the upper bound of
> an array is exceeded I would also suggest to consider including
> the warning in -Warray-bounds.  (With that, it would be useful
> to also detect exceeding the upper bound of non-literal arrays
> as well.)

We can let -Warray-bounds enable -Wstring-plus-int=1, but not =2.
Martin Sebor June 19, 2017, 6:44 p.m. UTC | #3
On 06/19/2017 11:28 AM, Xi Ruoyao wrote:
> On 2017-06-19 10:51 -0600, Martin Sebor wrote:
>> On 06/11/2017 07:32 PM, Xi Ruoyao wrote:
>>> This patch adds warning option -Wstring-plus-int for C/C++.
>>>
>>> gcc/ChangeLog:
>>>
>>> 2017-06-12  Xi Ruoyao  <ryxi@stu.xidian.edu.cn>
>>>
>>> 	* c-family/c.opt: New option -Wstring-plus-int.
>>> 	* c-family/c-common.c (pointer_int_sum): Checking for
>>> 	-Wstring-plus-int.
>>
>> This is a very useful warning but I would suggest to word it
>> in terms of what it actually does rather than what it might be
>> intended to do.  E.g., for
>>
>>    const char *p = "123" + 7;
>>
>> issue
>>
>>    warning: offset 7 exceeds the upper bound 3 of the array
>>
>> rather than
>>
>>    warning: adding 'int' to a string does not append to the string
>>
>> (I have trouble envisioning on what grounds someone might expect
>> the addition to have this effect.)
>
> How about something like `const char *p = "123" + getchar();` ?

I'm not sure I correctly understand the question (or whether
it's meant in response to my comment in parentheses) but let
me clarify what I meant.

In my view, the group of C++ (and certainly C) programmers who
might expect "123" + i to append the string representation of
the integer result to a string literal isn't significant enough
to focus the warning on.

Whether or not the addition is valid depends on the value of
the integer operand.  There are three sets of cases where the
addition is or may be invalid:

1) the integer operand is an out of bounds constant
2) the integer operand's non-constant value or the lower bound
    of its range is known to be out of bounds,
3) the lower bound of the operand's range is in bounds but
    the upper bound is out of bounds (as in the getchar example).

(1) can be handled with lexical analysis alone (as in you parch)
but it's prone to a high rate of false negatives.  (3) can also
be handled by lexical analysis alone but it's prone to a high
rate of false positives.  (2) has no false positives but some
false negatives.  It can only be detected with optimization.

With that in mind the warning would serve a greater purpose
by being aimed more broadly and describing the nature of the
error: forming an invalid pointer.  I believe it would best
be implemented analogously to or even integrated into
-Warray-bounds.  I.e., I suggest covering set (2) above.

>
> I'd like this for -Wstring-plus-int=1:
>
>     warning: adding 'int' to a string does not append to the string
>     [-Wstring-plus-int=]
>         const char *p = "123" + 7;
>                               ^
>     note: offset 7 exceeds the size 4 of the string, using the result
>     may lead to undefined behaviour.

The addition itself is undefined, regardless of whether or not
the result is used.

>
> (Clang permits "123" + 4 since its result is well defined in standard.
> Maybe we could permit "123" + 3 only.)

"123" is an array of 4 elements, with "123" + 4 pointing just past
the last (NUL) element.  It's valid to form a pointer past the last
element of an array and warning about it would likely be viewed as
a false positive (certainly if it were an out-of-bounds type of
warning).

Martin
Xi Ruoyao June 19, 2017, 7:36 p.m. UTC | #4
On 2017-06-19 12:44 -0600, Martin Sebor wrote:
> On 06/19/2017 11:28 AM, Xi Ruoyao wrote:
> > On 2017-06-19 10:51 -0600, Martin Sebor wrote:
> > > On 06/11/2017 07:32 PM, Xi Ruoyao wrote:
> > > > This patch adds warning option -Wstring-plus-int for C/C++.
> > > > 
> > > > gcc/ChangeLog:
> > > > 
> > > > 2017-06-12  Xi Ruoyao  <ryxi@stu.xidian.edu.cn>
> > > > 
> > > > 	* c-family/c.opt: New option -Wstring-plus-int.
> > > > 	* c-family/c-common.c (pointer_int_sum): Checking for
> > > > 	-Wstring-plus-int.
> > > 
> > > This is a very useful warning but I would suggest to word it
> > > in terms of what it actually does rather than what it might be
> > > intended to do.  E.g., for
> > > 
> > >    const char *p = "123" + 7;
> > > 
> > > issue
> > > 
> > >    warning: offset 7 exceeds the upper bound 3 of the array
> > > 
> > > rather than
> > > 
> > >    warning: adding 'int' to a string does not append to the string
> > > 
> > > (I have trouble envisioning on what grounds someone might expect
> > > the addition to have this effect.)
> > 
> > How about something like `const char *p = "123" + getchar();` ?
> 
> I'm not sure I correctly understand the question (or whether
> it's meant in response to my comment in parentheses) but let
> me clarify what I meant.
> 
> In my view, the group of C++ (and certainly C) programmers who
> might expect "123" + i to append the string representation of
> the integer result to a string literal isn't significant enough
> to focus the warning on.
> 
> Whether or not the addition is valid depends on the value of
> the integer operand.  There are three sets of cases where the
> addition is or may be invalid:
> 
> 1) the integer operand is an out of bounds constant
> 2) the integer operand's non-constant value or the lower bound
>     of its range is known to be out of bounds,
> 3) the lower bound of the operand's range is in bounds but
>     the upper bound is out of bounds (as in the getchar example).
> 
> (1) can be handled with lexical analysis alone (as in you parch)
> but it's prone to a high rate of false negatives.  (3) can also
> be handled by lexical analysis alone but it's prone to a high
> rate of false positives.  (2) has no false positives but some
> false negatives.  It can only be detected with optimization.
> 
> With that in mind the warning would serve a greater purpose
> by being aimed more broadly and describing the nature of the
> error: forming an invalid pointer.  I believe it would best
> be implemented analogously to or even integrated into
> -Warray-bounds.  I.e., I suggest covering set (2) above.

Now I think I've been cheat by GCC wiki, which states PR 62181
is an "easy-hack" :)

I'll try to improve -Warray-bounds.

> > 
> > I'd like this for -Wstring-plus-int=1:
> > 
> >     warning: adding 'int' to a string does not append to the string
> >     [-Wstring-plus-int=]
> >         const char *p = "123" + 7;
> >                               ^
> >     note: offset 7 exceeds the size 4 of the string, using the result
> >     may lead to undefined behaviour.
> 
> The addition itself is undefined, regardless of whether or not
> the result is used.
> 
> > 
> > (Clang permits "123" + 4 since its result is well defined in standard.
> > Maybe we could permit "123" + 3 only.)
> 
> "123" is an array of 4 elements, with "123" + 4 pointing just past
> the last (NUL) element.  It's valid to form a pointer past the last
> element of an array and warning about it would likely be viewed as
> a false positive (certainly if it were an out-of-bounds type of
> warning).
> 
> Martin
Xi Ruoyao June 22, 2017, 10:26 a.m. UTC | #5
On 2017-06-19 12:44 -0600, Martin Sebor wrote:
> On 06/19/2017 11:28 AM, Xi Ruoyao wrote:
> > On 2017-06-19 10:51 -0600, Martin Sebor wrote:
> > > On 06/11/2017 07:32 PM, Xi Ruoyao wrote:
> > > > This patch adds warning option -Wstring-plus-int for C/C++.
> > > > 
> > > > gcc/ChangeLog:
> > > > 
> > > > 2017-06-12  Xi Ruoyao  <ryxi@stu.xidian.edu.cn>
> > > > 
> > > > 	* c-family/c.opt: New option -Wstring-plus-int.
> > > > 	* c-family/c-common.c (pointer_int_sum): Checking for
> > > > 	-Wstring-plus-int.
> > > 
> > > This is a very useful warning but I would suggest to word it
> > > in terms of what it actually does rather than what it might be
> > > intended to do.  E.g., for
> > > 
> > >    const char *p = "123" + 7;
> > > 
> > > issue
> > > 
> > >    warning: offset 7 exceeds the upper bound 3 of the array
> > > 
> > > rather than
> > > 
> > >    warning: adding 'int' to a string does not append to the string
> > > 
> > > (I have trouble envisioning on what grounds someone might expect
> > > the addition to have this effect.)
> > 
> > How about something like `const char *p = "123" + getchar();` ?
> 
> I'm not sure I correctly understand the question (or whether
> it's meant in response to my comment in parentheses) but let
> me clarify what I meant.
> 
> In my view, the group of C++ (and certainly C) programmers who
> might expect "123" + i to append the string representation of
> the integer result to a string literal isn't significant enough
> to focus the warning on.
> 
> Whether or not the addition is valid depends on the value of
> the integer operand.  There are three sets of cases where the
> addition is or may be invalid:
> 
> 1) the integer operand is an out of bounds constant
> 2) the integer operand's non-constant value or the lower bound
>     of its range is known to be out of bounds,
> 3) the lower bound of the operand's range is in bounds but
>     the upper bound is out of bounds (as in the getchar example).
> 
> (1) can be handled with lexical analysis alone (as in you parch)
> but it's prone to a high rate of false negatives.  (3) can also
> be handled by lexical analysis alone but it's prone to a high
> rate of false positives.  (2) has no false positives but some
> false negatives.  It can only be detected with optimization.
> 
> With that in mind the warning would serve a greater purpose
> by being aimed more broadly and describing the nature of the
> error: forming an invalid pointer.  I believe it would best
> be implemented analogously to or even integrated into
> -Warray-bounds.  I.e., I suggest covering set (2) above.

I created PR 81172.  For const char *p = "123" + 'c' we should have:

    warning: offset 99 is above array bounds, the behaviour is
    undefined [-Warray-bounds]
        const char *p = "123" + 'c';

and perhaps (for the case the pointer operand is a string or a string
pointer):
    note: adding integer to a string does not append to the string.

> > 
> > I'd like this for -Wstring-plus-int=1:
> > 
> >     warning: adding 'int' to a string does not append to the string
> >     [-Wstring-plus-int=]
> >         const char *p = "123" + 7;
> >                               ^
> >     note: offset 7 exceeds the size 4 of the string, using the result
> >     may lead to undefined behaviour.
> 
> The addition itself is undefined, regardless of whether or not
> the result is used.
> 
> > 
> > (Clang permits "123" + 4 since its result is well defined in standard.
> > Maybe we could permit "123" + 3 only.)
> 
> "123" is an array of 4 elements, with "123" + 4 pointing just past
> the last (NUL) element.  It's valid to form a pointer past the last
> element of an array and warning about it would likely be viewed as
> a false positive (certainly if it were an out-of-bounds type of
> warning).
> 
> Martin
Gerald Pfeifer July 15, 2017, 4:33 p.m. UTC | #6
On Thu, 22 Jun 2017, Xi Ruoyao wrote:
> I created PR 81172.  For const char *p = "123" + 'c' we should have:
> 
>     warning: offset 99 is above array bounds, the behaviour is
>     undefined [-Warray-bounds]
>         const char *p = "123" + 'c';
> 
> and perhaps (for the case the pointer operand is a string or a string
> pointer):
>     note: adding integer to a string does not append to the string.

I do think this makes sense.  I'm really not convinced there is a 
lot of code out there that uses the "123" + i idiom to create a pointer, 
even if it is legitimate code, so a warning makes sense to me.

Gerald
diff mbox

Patch

diff --git a/gcc/c-family/c-common.c b/gcc/c-family/c-common.c
index 4395e51..1eee48f 100644
--- a/gcc/c-family/c-common.c
+++ b/gcc/c-family/c-common.c
@@ -3100,6 +3100,31 @@  pointer_int_sum (location_t loc, enum tree_code resultcode,
   else
     size_exp = size_in_bytes_loc (loc, TREE_TYPE (result_type));
 
+  /* Handle -Wstring-plus-int, warn for adding string literals
+     and an integer which may result in a wild pointer.  */
+  if (warn_string_plus_int
+      && resultcode == PLUS_EXPR
+      && char_type_p (TYPE_MAIN_VARIANT (TREE_TYPE (result_type))))
+    {
+      tree orig_ptrop = tree_strip_nop_conversions(ptrop);
+      if (TREE_CODE (orig_ptrop) == ADDR_EXPR)
+        {
+          tree obj = TREE_OPERAND (orig_ptrop, 0);
+          if (TREE_CODE (obj) == STRING_CST)
+            {
+              tree t = TYPE_DOMAIN (TREE_TYPE (obj));
+              if (TREE_CODE (intop) != INTEGER_CST
+                  || tree_int_cst_lt (intop, TYPE_MIN_VALUE (t))
+                  || int_cst_value (intop)
+                     > int_cst_value (TYPE_MAX_VALUE (t)) + 1)
+                warning_at (loc, OPT_Wstring_plus_int,
+                            "add %qT to string does not append to "
+                            "the string",
+                            TREE_TYPE (intop));
+            }
+        }
+    }
+
   /* We are manipulating pointer values, so we don't need to warn
      about relying on undefined signed overflow.  We disable the
      warning here because we use integer types so fold won't know that
diff --git a/gcc/c-family/c.opt b/gcc/c-family/c.opt
index 37bb236..94ba3eb 100644
--- a/gcc/c-family/c.opt
+++ b/gcc/c-family/c.opt
@@ -732,6 +732,11 @@  C ObjC C++ ObjC++ Joined RejectNegative UInteger Var(warn_stringop_overflow) Ini
 Under the control of Object Size type, warn about buffer overflow in string
 manipulation functions like memcpy and strcpy.
 
+Wstring-plus-int
+C ObjC C++ ObjC++ Var(warn_string_plus_int) Warning
+Warn about adding strings and integers, which is likely an ill-formed
+attempt to append the string.
+
 Wsuggest-attribute=format
 C ObjC C++ ObjC++ Var(warn_suggest_attribute_format) Warning
 Warn about functions which might be candidates for format attributes.
-- 
2.7.1