diff mbox

C, C++: New warning for memset without multiply by elt size

Message ID 571DDE39.7060200@redhat.com
State New
Headers show

Commit Message

Bernd Schmidt April 25, 2016, 9:07 a.m. UTC
On 04/22/2016 03:57 PM, Jason Merrill wrote:
> This looks good, but can we move the code into c-common rather than
> duplicate it?

That would be this patch. Also passes testing on x86_64-linux.


Bernd

Comments

Jason Merrill April 25, 2016, 5:55 p.m. UTC | #1
On 04/25/2016 05:07 AM, Bernd Schmidt wrote:
> +		if (TREE_CODE (arg2) == CONST_DECL)
> +		  arg2 = DECL_INITIAL (arg2);
> +		int literal_mask = ((!!integer_zerop (arg1) << 1)
> +				    | (!!integer_zerop (arg2) << 2));

Are you deliberately treating an enumerator as a literal 0?  I'd think 
we should set literal_mask before stripping CONST_DECL.  OK with that 
change.

Jason
Bernd Schmidt April 26, 2016, 1:04 p.m. UTC | #2
On 04/25/2016 07:55 PM, Jason Merrill wrote:
> On 04/25/2016 05:07 AM, Bernd Schmidt wrote:
>> +        if (TREE_CODE (arg2) == CONST_DECL)
>> +          arg2 = DECL_INITIAL (arg2);
>> +        int literal_mask = ((!!integer_zerop (arg1) << 1)
>> +                    | (!!integer_zerop (arg2) << 2));
>
> Are you deliberately treating an enumerator as a literal 0?  I'd think
> we should set literal_mask before stripping CONST_DECL.  OK with that
> change.

Just to be sure, is that OK for everything or just the C++ parts?


Bernd
Jason Merrill April 26, 2016, 3:57 p.m. UTC | #3
On Tue, Apr 26, 2016 at 9:04 AM, Bernd Schmidt <bschmidt@redhat.com> wrote:
> On 04/25/2016 07:55 PM, Jason Merrill wrote:
>>
>> On 04/25/2016 05:07 AM, Bernd Schmidt wrote:
>>>
>>> +        if (TREE_CODE (arg2) == CONST_DECL)
>>> +          arg2 = DECL_INITIAL (arg2);
>>> +        int literal_mask = ((!!integer_zerop (arg1) << 1)
>>> +                    | (!!integer_zerop (arg2) << 2));
>>
>>
>> Are you deliberately treating an enumerator as a literal 0?  I'd think
>> we should set literal_mask before stripping CONST_DECL.  OK with that
>> change.
>
>
> Just to be sure, is that OK for everything or just the C++ parts?

Everything.

Jason
Martin Sebor April 26, 2016, 9:23 p.m. UTC | #4
On 04/25/2016 03:07 AM, Bernd Schmidt wrote:
> On 04/22/2016 03:57 PM, Jason Merrill wrote:
>> This looks good, but can we move the code into c-common rather than
>> duplicate it?
>
> That would be this patch. Also passes testing on x86_64-linux.

Hi Bernd,

I like the idea behind the patch!  So much so in fact, and since
it happens to be in an area where I've been doing some of my own
work, I have a few observations and suggestions.  Please forgive
me for the length of the email.

The documentation for the new option implies that it should warn
for calls to memset where the third argument contains the number
of elements not multiplied by the element size.  But in my (quick)
testing it only warns when the argument is a constant equal to
the number of elements and less than the size of the array.  For
example, neither of the following is diagnosed:

     int a [4];
     __builtin_memset (a, 0, 2 + 2);
     __builtin_memset (a, 0, 4 * 1);
     __builtin_memset (a, 0, 3);
     __builtin_memset (a, 0, 4 * sizeof a);

If it's possible and not too difficult, it would be nice if
the detection logic could be made a bit smarter to also diagnose
these less trivial cases (and matched the documented behavior).

Even beyond that, I also wonder if the warning could also be
issued when writing any constant number of bytes that is not
a multiple of the element size. This would be useful not just
for memset but also for memcpy.  (The premise being that it's
unusual to want to zero out or copy just a few bytes of any
array element and leave the remaining bytes of that element
unchanged.)

I also have a comment on the text and content of the warning:

   memset used with length equal to number of elements without
     multiplication with element size

FWIW, multiplication is typically done "by" a number (not with
one).  More important, I often find it helpful if a diagnostic
spells out the numeric values involved in the computation (e.g.,
array bounds or sizes).  I find this especially helpful in C++
where the values of symbolic constants are often the result of
non-trivial computations involving non-type template parameters
or constexpr variables or function calls, and thus difficult
to derive just by looking at the source code.  Here's an idea
for rewording the diagnostic to include this information:

   warning: memset called to set '3' bytes which is not a positive
     multiple of element size in array 'a' with type int[3]'
   note: array 'a' is declared here

Finally, I would be remiss not to mention that the patch has
an instance of trailing space in it (gasp! ;)  Personally,
I'm not bothered by it but it seems like a good opportunity
to highlight that these things happen even to the most careful
of us, and not necessarily as a result of not being careful or
aware of the coding guidelines.  My point is that no amount of
documentation will or diligence will prevent these kinds of
problems, and dwelling on them in code reviews isn't the best
use of our time.  Let's put in place a tool that takes care of
these nits for us so we can focus on things a tool can't help
us with!

Martin
Bernd Schmidt April 27, 2016, 9:55 a.m. UTC | #5
On 04/26/2016 11:23 PM, Martin Sebor wrote:
> The documentation for the new option implies that it should warn
> for calls to memset where the third argument contains the number
> of elements not multiplied by the element size.  But in my (quick)
> testing it only warns when the argument is a constant equal to
> the number of elements and less than the size of the array.  For
> example, neither of the following is diagnosed:
>
>      int a [4];
>      __builtin_memset (a, 0, 2 + 2);
>      __builtin_memset (a, 0, 4 * 1);
>      __builtin_memset (a, 0, 3);
>      __builtin_memset (a, 0, 4 * sizeof a);
>
> If it's possible and not too difficult, it would be nice if
> the detection logic could be made a bit smarter to also diagnose
> these less trivial cases (and matched the documented behavior).

I've thought about some of these cases. The problem is there are 
legitimate cases of calling memset for only part of an array. I wanted 
to start with something that is unlikely to give false positives.

A multiplication by the wrong sizeof would be a nice thing to spot. 
Would you like to work on followup patches? I probably won't get to it 
in a while.

> Even beyond that, I also wonder if the warning could also be
> issued when writing any constant number of bytes that is not
> a multiple of the element size. This would be useful not just
> for memset but also for memcpy.  (The premise being that it's
> unusual to want to zero out or copy just a few bytes of any
> array element and leave the remaining bytes of that element
> unchanged.)

Probably a good idea.

> I also have a comment on the text and content of the warning:
>
>    memset used with length equal to number of elements without
>      multiplication with element size
>
> FWIW, multiplication is typically done "by" a number (not with
> one).

Fixed.

> Here's an idea
> for rewording the diagnostic to include this information:
>
>    warning: memset called to set '3' bytes which is not a positive
>      multiple of element size in array 'a' with type int[3]'
>    note: array 'a' is declared here

I'm finding this too verbose, but I guess that's a matter of taste.

> Finally, I would be remiss not to mention that the patch has
> an instance of trailing space in it (gasp! ;)

Fixed before committing.

> Personally,
> I'm not bothered by it but it seems like a good opportunity
> to highlight that these things happen even to the most careful
> of us, and not necessarily as a result of not being careful or
> aware of the coding guidelines.  My point is that no amount of
> documentation will or diligence will prevent these kinds of
> problems, and dwelling on them in code reviews isn't the best
> use of our time.

The first part is probably true, but we have code review exactly 
_because_ people make mistakes. Without it, the code base would 
degenerate rapidly. So, well spotted.

> Let's put in place a tool that takes care of
> these nits for us so we can focus on things a tool can't help
> us with!

I don't believe in tools for this, Machines are stupid, and I think the 
problem is AI-complete. In this case the check-GNU-style script has two 
other complaints about the patch which a human can tell are wrong. 
Enforcing patches to pass a mechanical check would be a mistake IMO, 
since it would force people into contortions trying to placate an 
unintelligent program, making code quality worse.


Bernd
Jeff Law April 27, 2016, 2:45 p.m. UTC | #6
On 04/27/2016 03:55 AM, Bernd Schmidt wrote:
> On 04/26/2016 11:23 PM, Martin Sebor wrote:
>> The documentation for the new option implies that it should warn
>> for calls to memset where the third argument contains the number
>> of elements not multiplied by the element size.  But in my (quick)
>> testing it only warns when the argument is a constant equal to
>> the number of elements and less than the size of the array.  For
>> example, neither of the following is diagnosed:
>>
>>      int a [4];
>>      __builtin_memset (a, 0, 2 + 2);
>>      __builtin_memset (a, 0, 4 * 1);
>>      __builtin_memset (a, 0, 3);
>>      __builtin_memset (a, 0, 4 * sizeof a);
>>
>> If it's possible and not too difficult, it would be nice if
>> the detection logic could be made a bit smarter to also diagnose
>> these less trivial cases (and matched the documented behavior).
>
> I've thought about some of these cases. The problem is there are
> legitimate cases of calling memset for only part of an array. I wanted
> to start with something that is unlikely to give false positives.
So I wonder if what we really want is to track which bytes in the object 
are set and which are not -- utilizing both memset and standard stores 
and if the object as a whole is not initialized, then warn.

We've actually got a lot of the code that would be necessary to detect 
this in tree DSE, with more coming in this stage1 as I extend it to 
handle some missing cases.

Clearly a follow-up rather than a requirement for the current patch to 
move forward.

Jeff
Martin Sebor April 27, 2016, 3:39 p.m. UTC | #7
On 04/27/2016 03:55 AM, Bernd Schmidt wrote:
> On 04/26/2016 11:23 PM, Martin Sebor wrote:
>> The documentation for the new option implies that it should warn
>> for calls to memset where the third argument contains the number
>> of elements not multiplied by the element size.  But in my (quick)
>> testing it only warns when the argument is a constant equal to
>> the number of elements and less than the size of the array.  For
>> example, neither of the following is diagnosed:
>>
>>      int a [4];
>>      __builtin_memset (a, 0, 2 + 2);
>>      __builtin_memset (a, 0, 4 * 1);
>>      __builtin_memset (a, 0, 3);
>>      __builtin_memset (a, 0, 4 * sizeof a);
>>
>> If it's possible and not too difficult, it would be nice if
>> the detection logic could be made a bit smarter to also diagnose
>> these less trivial cases (and matched the documented behavior).
>
> I've thought about some of these cases. The problem is there are
> legitimate cases of calling memset for only part of an array. I wanted
> to start with something that is unlikely to give false positives.
>
> A multiplication by the wrong sizeof would be a nice thing to spot.
> Would you like to work on followup patches? I probably won't get to it
> in a while.

Yes, I think enhancing this warning would be in line with
the _FORTIFY_SOURCE improvements I'm starting to look into now.
I agree that minimizing false positives is important.  I'm not
sure there is complete consensus on exactly what qualifies as
a false positive, but that's probably a discussion we can have
once we have a patch and some tests to look at).

Martin
Gerald Pfeifer May 15, 2016, 9:47 p.m. UTC | #8
For the record, this new warning/code found at least three actual 
bugs in Wine (two of which were in the testsuite, but still).

Definitely a useful addition.

Gerald
diff mbox

Patch

	* doc/invoke.texi (Warning Options): Add -Wmemset-elt-size.
	(-Wmemset-elt-size): New item.
c-family/
	* c.opt (Wmemset-elt-size): New option.
	* c-common.c (warn_for_memset): New function.
	* c-common.h (warn_for_memset): Declare.
c/
	* c-parser.c (c_parser_postfix_expression_after_primary): Call
	warn_for_memset instead of warning directly here.
cp/
	* parser.c (cp_parser_postfix_expression): Call
	warn_for_memset instead of warning directly here.
testsuite/
	* c-c++-common/memset-array.c: New test.

Index: gcc/c/c-parser.c
===================================================================
--- gcc/c/c-parser.c	(revision 235384)
+++ gcc/c/c-parser.c	(working copy)
@@ -8282,18 +8282,15 @@  c_parser_postfix_expression_after_primar
 					      expr.value, exprlist,
 					      sizeof_arg,
 					      sizeof_ptr_memacc_comptypes);
-	  if (warn_memset_transposed_args
-	      && TREE_CODE (expr.value) == FUNCTION_DECL
+	  if (TREE_CODE (expr.value) == FUNCTION_DECL
 	      && DECL_BUILT_IN_CLASS (expr.value) == BUILT_IN_NORMAL
 	      && DECL_FUNCTION_CODE (expr.value) == BUILT_IN_MEMSET
-	      && vec_safe_length (exprlist) == 3
-	      && integer_zerop ((*exprlist)[2])
-	      && (literal_zero_mask & (1 << 2)) != 0
-	      && (!integer_zerop ((*exprlist)[1])
-		  || (literal_zero_mask & (1 << 1)) == 0))
-	    warning_at (expr_loc, OPT_Wmemset_transposed_args,
-			"%<memset%> used with constant zero length parameter; "
-			"this could be due to transposed parameters");
+	      && vec_safe_length (exprlist) == 3)
+	    {
+	      tree arg0 = (*exprlist)[0];
+	      tree arg2 = (*exprlist)[2];
+	      warn_for_memset (expr_loc, arg0, arg2, literal_zero_mask);
+	    }
 
 	  start = expr.get_start ();
 	  finish = parser->tokens_buf[0].get_finish ();
Index: gcc/c-family/c-common.c
===================================================================
--- gcc/c-family/c-common.c	(revision 235384)
+++ gcc/c-family/c-common.c	(working copy)
@@ -11767,6 +11767,49 @@  warn_for_div_by_zero (location_t loc, tr
     warning_at (loc, OPT_Wdiv_by_zero, "division by zero");
 }
 
+/* Warn for patterns where memset appears to be used incorrectly.  The
+   warning location should be LOC.  ARG0, and ARG2 are the first and
+   last arguments to the call, while LITERAL_ZERO_MASK has a 1 bit for
+   each argument that was a literal zero.  */
+
+void
+warn_for_memset (location_t loc, tree arg0, tree arg2,
+		 int literal_zero_mask)
+{
+  if (warn_memset_transposed_args
+      && integer_zerop (arg2)
+      && (literal_zero_mask & (1 << 2)) != 0
+      && (literal_zero_mask & (1 << 1)) == 0)
+    warning_at (loc, OPT_Wmemset_transposed_args,
+		"%<memset%> used with constant zero length "
+		"parameter; this could be due to transposed "
+		"parameters");
+
+  if (warn_memset_elt_size && TREE_CODE (arg2) == INTEGER_CST)
+    {
+      STRIP_NOPS (arg0);
+      if (TREE_CODE (arg0) == ADDR_EXPR)
+	arg0 = TREE_OPERAND (arg0, 0);
+      tree type = TREE_TYPE (arg0);
+      if (TREE_CODE (type) == ARRAY_TYPE)
+	{
+	  tree elt_type = TREE_TYPE (type);
+	  tree domain = TYPE_DOMAIN (type);
+	  if (!integer_onep (TYPE_SIZE_UNIT (elt_type))
+	      && TYPE_MAXVAL (domain)
+	      && TYPE_MINVAL (domain)
+	      && integer_zerop (TYPE_MINVAL (domain))
+	      && integer_onep (fold_build2 (MINUS_EXPR, domain,
+					    arg2,
+					    TYPE_MAXVAL (domain))))
+	    warning_at (loc, OPT_Wmemset_elt_size,
+			"%<memset%> used with length equal to "
+			"number of elements without multiplication "
+			"with element size");
+	}
+    }  
+}
+
 /* Subroutine of build_binary_op. Give warnings for comparisons
    between signed and unsigned quantities that may fail. Do the
    checking based on the original operand trees ORIG_OP0 and ORIG_OP1,
Index: gcc/c-family/c-common.h
===================================================================
--- gcc/c-family/c-common.h	(revision 235384)
+++ gcc/c-family/c-common.h	(working copy)
@@ -902,6 +902,7 @@  extern void c_parse_file (void);
 extern void c_parse_final_cleanups (void);
 
 extern void warn_for_omitted_condop (location_t, tree);
+extern void warn_for_memset (location_t, tree, tree, int);
 
 /* These macros provide convenient access to the various _STMT nodes.  */
 
Index: gcc/c-family/c.opt
===================================================================
--- gcc/c-family/c.opt	(revision 235384)
+++ gcc/c-family/c.opt	(working copy)
@@ -565,6 +565,10 @@  Wmemset-transposed-args
 C ObjC C++ ObjC++ Var(warn_memset_transposed_args) Warning LangEnabledBy(C ObjC C++ ObjC++,Wall)
 Warn about suspicious calls to memset where the third argument is constant literal zero and the second is not.
 
+Wmemset-elt-size
+C ObjC C++ ObjC++ Var(warn_memset_elt_size) Warning LangEnabledBy(C ObjC C++ ObjC++,Wall)
+Warn about suspicious calls to memset where the third argument contains the number of elements not multiplied by the element size.
+
 Wmisleading-indentation
 C C++ Common Var(warn_misleading_indentation) Warning LangEnabledBy(C C++,Wall)
 Warn when the indentation of the code does not reflect the block structure.
Index: gcc/cp/parser.c
===================================================================
--- gcc/cp/parser.c	(revision 235384)
+++ gcc/cp/parser.c	(working copy)
@@ -6829,20 +6829,19 @@  cp_parser_postfix_expression (cp_parser
 		  }
 	      }
 
-	    if (warn_memset_transposed_args)
+	    if (TREE_CODE (postfix_expression) == FUNCTION_DECL
+		&& DECL_BUILT_IN_CLASS (postfix_expression) == BUILT_IN_NORMAL
+		&& DECL_FUNCTION_CODE (postfix_expression) == BUILT_IN_MEMSET
+		&& vec_safe_length (args) == 3)
 	      {
-		if (TREE_CODE (postfix_expression) == FUNCTION_DECL
-		    && DECL_BUILT_IN_CLASS (postfix_expression) == BUILT_IN_NORMAL
-		    && DECL_FUNCTION_CODE (postfix_expression) == BUILT_IN_MEMSET
-		    && vec_safe_length (args) == 3
-		    && TREE_CODE ((*args)[2]) == INTEGER_CST
-		    && integer_zerop ((*args)[2])
-		    && !(TREE_CODE ((*args)[1]) == INTEGER_CST
-			 && integer_zerop ((*args)[1])))
-		  warning (OPT_Wmemset_transposed_args,
-			   "%<memset%> used with constant zero length "
-			   "parameter; this could be due to transposed "
-			   "parameters");
+		tree arg0 = (*args)[0];
+		tree arg1 = (*args)[1];
+		tree arg2 = (*args)[2];
+		if (TREE_CODE (arg2) == CONST_DECL)
+		  arg2 = DECL_INITIAL (arg2);
+		int literal_mask = ((!!integer_zerop (arg1) << 1)
+				    | (!!integer_zerop (arg2) << 2));
+		warn_for_memset (input_location, arg0, arg2, literal_mask);
 	      }
 
 	    if (TREE_CODE (postfix_expression) == COMPONENT_REF)
Index: gcc/doc/invoke.texi
===================================================================
--- gcc/doc/invoke.texi	(revision 235384)
+++ gcc/doc/invoke.texi	(working copy)
@@ -273,7 +273,7 @@  Objective-C and Objective-C++ Dialects}.
 -Wno-int-to-pointer-cast -Winvalid-memory-model -Wno-invalid-offsetof @gol
 -Winvalid-pch -Wlarger-than=@var{len} @gol
 -Wlogical-op -Wlogical-not-parentheses -Wlong-long @gol
--Wmain -Wmaybe-uninitialized -Wmemset-transposed-args @gol
+-Wmain -Wmaybe-uninitialized -Wmemset-elt-size -Wmemset-transposed-args @gol
 -Wmisleading-indentation -Wmissing-braces @gol
 -Wmissing-field-initializers -Wmissing-include-dirs @gol
 -Wno-multichar -Wnonnull -Wnonnull-compare @gol
@@ -3547,6 +3547,7 @@  Options} and @ref{Objective-C and Object
 -Wlogical-not-parentheses
 -Wmain @r{(only for C/ObjC and unless} @option{-ffreestanding}@r{)}  @gol
 -Wmaybe-uninitialized @gol
+-Wmemset-elt-size @gol
 -Wmemset-transposed-args @gol
 -Wmisleading-indentation @r{(only for C/C++)} @gol
 -Wmissing-braces @r{(only for C/ObjC)} @gol
@@ -5256,6 +5257,15 @@  Warn when the @code{sizeof} operator is
 declared as an array in a function definition.  This warning is enabled by
 default for C and C++ programs.
 
+@item -Wmemset-elt-size
+@opindex Wmemset-elt-size
+@opindex Wno-memset-elt-size
+Warn for suspicious calls to the @code{memset} built-in function, if the
+first argument references an array, and the third argument is a number
+equal to the number of elements, but not equal to the size of the array
+in memory.  This indicates that the user has omitted a multiplication by
+the element size.  This warning is enabled by @option{-Wall}.
+
 @item -Wmemset-transposed-args
 @opindex Wmemset-transposed-args
 @opindex Wno-memset-transposed-args
Index: gcc/testsuite/c-c++-common/memset-array.c
===================================================================
--- gcc/testsuite/c-c++-common/memset-array.c	(revision 0)
+++ gcc/testsuite/c-c++-common/memset-array.c	(working copy)
@@ -0,0 +1,36 @@ 
+/* { dg-do compile } */
+/* { dg-options "-Wmemset-elt-size" } */
+enum a {
+  a_1,
+  a_2,
+  a_n
+};
+int t1[20];
+int t2[a_n];
+
+struct s
+{
+  int t[20];
+};
+
+void foo (struct s *s)
+{
+  __builtin_memset (t1, 0, 20); /* { dg-warning "element size" } */
+  __builtin_memset (t2, 0, a_n); /* { dg-warning "element size" } */
+  __builtin_memset (s->t, 0, 20); /* { dg-warning "element size" } */
+}
+
+char u1[20];
+char u2[a_n];
+
+struct s2
+{
+  char u[20];
+};
+
+void bar (struct s2 *s)
+{
+  __builtin_memset (u1, 0, 20);
+  __builtin_memset (u2, 0, a_n);
+  __builtin_memset (s->u, 0, 20);
+}