diff mbox

enable -Wformat-length for dynamically allocated buffers (pr 78245)

Message ID 8000f699-dee9-29be-d887-68b57ff0fafc@gmail.com
State New
Headers show

Commit Message

Martin Sebor Nov. 9, 2016, 12:09 a.m. UTC
The -Wformat-length checker relies on the compute_builtin_object_size
function to determine the size of the buffer it checks for overflow.
The function returns either a size computed by the tree-object-size
pass for objects referenced by the __builtin_object_size intrinsic
(if it's used in the program) or it tries to compute it for a small
subset of expressions otherwise.  This subset doesn't include objects
allocated by either malloc or alloca, and so for those the function
returns "unknown" or (size_t)-1 in the case of -Wformat-length.  As
a consequence, -Wformat-length is unable to detect overflows
involving such objects.

The attached patch adds a new function, compute_object_size, that
uses the existing algorithms to compute and return the sizes of
allocated objects as well, as if they were referenced by
__builtin_object_size in the program source, enabling the
-Wformat-length checker to detect more buffer overflows.

Martin

PS The function makes use of the init_function_sizes API that is
otherwise unused outside the tree-object-size pass to initialize
the internal structures, but then calls fini_object_sizes to
release them before returning.  That seems wasteful because
the size of the same object or one related to it might need
to computed again in the context of the same function.  I
experimented with allocating and releasing the structures only
when current_function_decl changes but that led to crashes.
I suspect I'm missing something about the management of memory
allocated for these structures.  Does anyone have any suggestions
how to make this work?  (Do I perhaps need to allocate them using
a special allocator so they don't get garbage collected?)

Comments

Martin Sebor Nov. 16, 2016, 5:33 p.m. UTC | #1
I'm looking for a review of the patch below:

   https://gcc.gnu.org/ml/gcc-patches/2016-11/msg00779.html

Thanks

On 11/08/2016 05:09 PM, Martin Sebor wrote:
> The -Wformat-length checker relies on the compute_builtin_object_size
> function to determine the size of the buffer it checks for overflow.
> The function returns either a size computed by the tree-object-size
> pass for objects referenced by the __builtin_object_size intrinsic
> (if it's used in the program) or it tries to compute it for a small
> subset of expressions otherwise.  This subset doesn't include objects
> allocated by either malloc or alloca, and so for those the function
> returns "unknown" or (size_t)-1 in the case of -Wformat-length.  As
> a consequence, -Wformat-length is unable to detect overflows
> involving such objects.
>
> The attached patch adds a new function, compute_object_size, that
> uses the existing algorithms to compute and return the sizes of
> allocated objects as well, as if they were referenced by
> __builtin_object_size in the program source, enabling the
> -Wformat-length checker to detect more buffer overflows.
>
> Martin
>
> PS The function makes use of the init_function_sizes API that is
> otherwise unused outside the tree-object-size pass to initialize
> the internal structures, but then calls fini_object_sizes to
> release them before returning.  That seems wasteful because
> the size of the same object or one related to it might need
> to computed again in the context of the same function.  I
> experimented with allocating and releasing the structures only
> when current_function_decl changes but that led to crashes.
> I suspect I'm missing something about the management of memory
> allocated for these structures.  Does anyone have any suggestions
> how to make this work?  (Do I perhaps need to allocate them using
> a special allocator so they don't get garbage collected?)
Martin Sebor Nov. 22, 2016, 12:01 a.m. UTC | #2
Ping.  Still looking for a review of the patch below:

On 11/16/2016 10:33 AM, Martin Sebor wrote:
> I'm looking for a review of the patch below:
>
>   https://gcc.gnu.org/ml/gcc-patches/2016-11/msg00779.html
>
> Thanks
>
> On 11/08/2016 05:09 PM, Martin Sebor wrote:
>> The -Wformat-length checker relies on the compute_builtin_object_size
>> function to determine the size of the buffer it checks for overflow.
>> The function returns either a size computed by the tree-object-size
>> pass for objects referenced by the __builtin_object_size intrinsic
>> (if it's used in the program) or it tries to compute it for a small
>> subset of expressions otherwise.  This subset doesn't include objects
>> allocated by either malloc or alloca, and so for those the function
>> returns "unknown" or (size_t)-1 in the case of -Wformat-length.  As
>> a consequence, -Wformat-length is unable to detect overflows
>> involving such objects.
>>
>> The attached patch adds a new function, compute_object_size, that
>> uses the existing algorithms to compute and return the sizes of
>> allocated objects as well, as if they were referenced by
>> __builtin_object_size in the program source, enabling the
>> -Wformat-length checker to detect more buffer overflows.
>>
>> Martin
>>
>> PS The function makes use of the init_function_sizes API that is
>> otherwise unused outside the tree-object-size pass to initialize
>> the internal structures, but then calls fini_object_sizes to
>> release them before returning.  That seems wasteful because
>> the size of the same object or one related to it might need
>> to computed again in the context of the same function.  I
>> experimented with allocating and releasing the structures only
>> when current_function_decl changes but that led to crashes.
>> I suspect I'm missing something about the management of memory
>> allocated for these structures.  Does anyone have any suggestions
>> how to make this work?  (Do I perhaps need to allocate them using
>> a special allocator so they don't get garbage collected?)
>
Jeff Law Nov. 22, 2016, 5 p.m. UTC | #3
On 11/08/2016 05:09 PM, Martin Sebor wrote:
> The -Wformat-length checker relies on the compute_builtin_object_size
> function to determine the size of the buffer it checks for overflow.
> The function returns either a size computed by the tree-object-size
> pass for objects referenced by the __builtin_object_size intrinsic
> (if it's used in the program) or it tries to compute it for a small
> subset of expressions otherwise.  This subset doesn't include objects
> allocated by either malloc or alloca, and so for those the function
> returns "unknown" or (size_t)-1 in the case of -Wformat-length.  As
> a consequence, -Wformat-length is unable to detect overflows
> involving such objects.
>
> The attached patch adds a new function, compute_object_size, that
> uses the existing algorithms to compute and return the sizes of
> allocated objects as well, as if they were referenced by
> __builtin_object_size in the program source, enabling the
> -Wformat-length checker to detect more buffer overflows.
>
> Martin
>
> PS The function makes use of the init_function_sizes API that is
> otherwise unused outside the tree-object-size pass to initialize
> the internal structures, but then calls fini_object_sizes to
> release them before returning.  That seems wasteful because
> the size of the same object or one related to it might need
> to computed again in the context of the same function.  I
> experimented with allocating and releasing the structures only
> when current_function_decl changes but that led to crashes.
> I suspect I'm missing something about the management of memory
> allocated for these structures.  Does anyone have any suggestions
> how to make this work?  (Do I perhaps need to allocate them using
> a special allocator so they don't get garbage collected?)
>
> gcc-78245.diff
>
>
> PR middle-end/78245 - missing -Wformat-length on an overflow of a dynamically allocated buffer
>
> gcc/testsuite/ChangeLog:
>
> 	PR middle-end/78245
> 	* gcc.dg/tree-ssa/builtin-sprintf-warn-3.c: Add tests.
>
> gcc/ChangeLog:
>
> 	PR middle-end/78245
> 	* gimple-ssa-sprintf.c (get_destination_size): Call compute_object_size.
> 	* tree-object-size.c (addr_object_size): Adjust.
> 	(pass_through_call): Adjust.
> 	(compute_object_size, internal_object_size): New functions.
> 	(compute_builtin_object_size): Call internal_object_size.
> 	(pass_object_sizes::execute): Adjust.
> 	* tree-object-size.h (compute_object_size): Declare.
Sorry.  Just not getting to many of the pre-stage1 close patches as fast 
as I'd like.

My only real concern here is that if we call compute_builtin_object_size 
without having initialized the passes, then we initialize, compute, then 
finalize.  Subsequent calls will go through the same process -- the key 
being each one re-computes the internal state which might get expensive.

Wouldn't it just make more sense to pull up the init/fini calls, either 
explicitly (which likely means externalizing the init/fini routines) or 
by wrapping all this stuff in a class and instantiating a suitable object?

I think the answer to your memory management question is that internal 
state is likely not marked as a GC root and thus if you get a GC call 
pieces of internal state are not seen as reachable, but you still can 
reference them.  ie, you end up with dangling pointers.

Usually all you'd have to do is mark them so that gengtype will see 
them.  Bitmaps, trees, rtl, are all good examples.  So marking the 
bitmap would look like:

static GTY (()) bitmap computed[4];

Or something like that.

You might try --enable-checking=yes,extra,gc,gcac

That will be slow, but is often helpful for tracking down cases where 
someone has an object expected to be live across passes, but it isn't 
reachable because someone failed to register a GC root.

Jeff
diff mbox

Patch

PR middle-end/78245 - missing -Wformat-length on an overflow of a dynamically allocated buffer

gcc/testsuite/ChangeLog:

	PR middle-end/78245
	* gcc.dg/tree-ssa/builtin-sprintf-warn-3.c: Add tests.

gcc/ChangeLog:

	PR middle-end/78245
	* gimple-ssa-sprintf.c (get_destination_size): Call compute_object_size.
	* tree-object-size.c (addr_object_size): Adjust.
	(pass_through_call): Adjust.
	(compute_object_size, internal_object_size): New functions.
	(compute_builtin_object_size): Call internal_object_size.
	(pass_object_sizes::execute): Adjust.
	* tree-object-size.h (compute_object_size): Declare.

diff --git a/gcc/gimple-ssa-sprintf.c b/gcc/gimple-ssa-sprintf.c
index 3138ad3..f360711 100644
--- a/gcc/gimple-ssa-sprintf.c
+++ b/gcc/gimple-ssa-sprintf.c
@@ -2471,7 +2471,7 @@  get_destination_size (tree dest)
      object (the function fails without optimization in this type).  */
   int ost = optimize > 0;
   unsigned HOST_WIDE_INT size;
-  if (compute_builtin_object_size (dest, ost, &size))
+  if (compute_object_size (dest, ost, &size))
     return size;
 
   return HOST_WIDE_INT_M1U;
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/builtin-sprintf-warn-3.c b/gcc/testsuite/gcc.dg/tree-ssa/builtin-sprintf-warn-3.c
index 8d97fa8..9874332 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/builtin-sprintf-warn-3.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/builtin-sprintf-warn-3.c
@@ -1,5 +1,10 @@ 
 /* { dg-do compile } */
-/* { dg-options "-std=c99 -O2 -Wformat -Wformat-length=1 -ftrack-macro-expansion=0" } */
+/* { dg-options "-O2 -Wformat -Wformat-length=1 -ftrack-macro-expansion=0" } */
+/* Verify that all sprintf built-ins detect overflow involving directives
+   with non-constant arguments known to be constrained by some range of
+   values, and even when writing into dynamically allocated buffers.
+   -O2 (-ftree-vrp) is necessary for the tests involving ranges to pass,
+   otherwise -O1 is sufficient.  */
 
 #ifndef LINE
 #  define LINE 0
@@ -7,18 +12,26 @@ 
 
 #define bos(x) __builtin_object_size (x, 0)
 
-#define T(bufsize, fmt, ...)						\
-    do {								\
-      if (!LINE || __LINE__ == LINE)					\
-	{								\
-	  char *d = (char *)__builtin_malloc (bufsize);			\
-	  __builtin___sprintf_chk (d, 0, bos (d), fmt, __VA_ARGS__);	\
-	  sink (d);							\
-	}								\
-    } while (0)
+/* Defined (and redefined) to the allocation function to use, either
+   malloc, or alloca, or a VLA.  */
+#define ALLOC(p, n)   (p) = __builtin_malloc (n)
 
-void
-sink (void*);
+/* Defined (and redefined) to the sprintf function to exercise.  */
+#define TEST_SPRINTF(d, maxsize, objsize, fmt, ...)		\
+  __builtin___sprintf_chk (d, 0, objsize, fmt, __VA_ARGS__)
+
+#define T(bufsize, fmt, ...)				\
+  do {							\
+    if (!LINE || __LINE__ == LINE)			\
+      {							\
+	char *d;					\
+	ALLOC (d, bufsize);				\
+	TEST_SPRINTF (d, 0, bos (d), fmt, __VA_ARGS__);	\
+	sink (d);					\
+      }							\
+  } while (0)
+
+void sink (void*);
 
 /* Identity function to verify that the checker figures out the value
    of the operand even when it's not constant (i.e., makes use of
@@ -232,3 +245,88 @@  void test_sprintf_chk_range_sshort (signed short *a, signed short *b)
   T ( 4, "%i",  Ra (998,  999));
   T ( 4, "%i",  Ra (999, 1000)); /* { dg-warning "may write a terminating nul past the end of the destination" } */
 }
+
+/* Exercise ordinary sprintf with malloc.  */
+#undef TEST_SPRINTF
+#define TEST_SPRINTF(d, maxsize, objsize, fmt, ...)	\
+  __builtin_sprintf (d, fmt, __VA_ARGS__)
+
+void test_sprintf_malloc (const char *s, const char *t)
+{
+#define x x ()
+
+  T (1, "%-s", x ? "" : "1");       /* { dg-warning "nul past the end" } */
+  T (1, "%-s", x ? "1" : "");       /* { dg-warning "nul past the end" } */
+  T (1, "%-s", x ? s : "1");        /* { dg-warning "nul past the end" } */
+  T (1, "%-s", x ? "1" : s);        /* { dg-warning "nul past the end" } */
+  T (1, "%-s", x ? s : t);
+
+  T (2, "%-s", x ? "" : "1");
+  T (2, "%-s", x ? "" : s);
+  T (2, "%-s", x ? "1" : "");
+  T (2, "%-s", x ? s : "");
+  T (2, "%-s", x ? "1" : "2");
+  T (2, "%-s", x ? "" : "12");      /* { dg-warning "nul past the end" } */
+  T (2, "%-s", x ? "12" : "");      /* { dg-warning "nul past the end" } */
+
+  T (2, "%-s", x ? "" : "123");     /* { dg-warning "into a region" } */
+  T (2, "%-s", x ? "123" : "");     /* { dg-warning "into a region" } */
+
+#undef x
+}
+
+/* Exercise ordinary sprintf with alloca.  */
+#undef ALLOC
+#define ALLOC(p, n) (p) = __builtin_alloca (n)
+
+void test_sprintf_alloca (const char *s, const char *t)
+{
+#define x x ()
+
+  T (1, "%-s", x ? "" : "1");       /* { dg-warning "nul past the end" } */
+  T (1, "%-s", x ? "1" : "");       /* { dg-warning "nul past the end" } */
+  T (1, "%-s", x ? s : "1");        /* { dg-warning "nul past the end" } */
+  T (1, "%-s", x ? "1" : s);        /* { dg-warning "nul past the end" } */
+  T (1, "%-s", x ? s : t);
+
+  T (2, "%-s", x ? "" : "1");
+  T (2, "%-s", x ? "" : s);
+  T (2, "%-s", x ? "1" : "");
+  T (2, "%-s", x ? s : "");
+  T (2, "%-s", x ? "1" : "2");
+  T (2, "%-s", x ? "" : "12");      /* { dg-warning "nul past the end" } */
+  T (2, "%-s", x ? "12" : "");      /* { dg-warning "nul past the end" } */
+
+  T (2, "%-s", x ? "" : "123");     /* { dg-warning "into a region" } */
+  T (2, "%-s", x ? "123" : "");     /* { dg-warning "into a region" } */
+
+#undef x
+}
+
+/* Exercise ordinary sprintf with a VLA.  */
+#undef ALLOC
+#define ALLOC(p, n) char vla [i (n)]; (p) = vla
+
+void test_sprintf_vla (const char *s, const char *t)
+{
+#define x x ()
+
+  T (1, "%-s", x ? "" : "1");       /* { dg-warning "nul past the end" } */
+  T (1, "%-s", x ? "1" : "");       /* { dg-warning "nul past the end" } */
+  T (1, "%-s", x ? s : "1");        /* { dg-warning "nul past the end" } */
+  T (1, "%-s", x ? "1" : s);        /* { dg-warning "nul past the end" } */
+  T (1, "%-s", x ? s : t);
+
+  T (2, "%-s", x ? "" : "1");
+  T (2, "%-s", x ? "" : s);
+  T (2, "%-s", x ? "1" : "");
+  T (2, "%-s", x ? s : "");
+  T (2, "%-s", x ? "1" : "2");
+  T (2, "%-s", x ? "" : "12");      /* { dg-warning "nul past the end" } */
+  T (2, "%-s", x ? "12" : "");      /* { dg-warning "nul past the end" } */
+
+  T (2, "%-s", x ? "" : "123");     /* { dg-warning "into a region" } */
+  T (2, "%-s", x ? "123" : "");     /* { dg-warning "into a region" } */
+
+#undef x
+}
diff --git a/gcc/tree-object-size.c b/gcc/tree-object-size.c
index 1317ad7..59ff90c 100644
--- a/gcc/tree-object-size.c
+++ b/gcc/tree-object-size.c
@@ -50,6 +50,8 @@  static const unsigned HOST_WIDE_INT unknown[4] = {
   0
 };
 
+static bool internal_object_size (tree, int, unsigned HOST_WIDE_INT *);
+static void fini_object_sizes (void);
 static tree compute_object_offset (const_tree, const_tree);
 static bool addr_object_size (struct object_size_info *,
 			      const_tree, int, unsigned HOST_WIDE_INT *);
@@ -187,7 +189,7 @@  addr_object_size (struct object_size_info *osi, const_tree ptr,
       if (!osi || (object_size_type & 1) != 0
 	  || TREE_CODE (TREE_OPERAND (pt_var, 0)) != SSA_NAME)
 	{
-	  compute_builtin_object_size (TREE_OPERAND (pt_var, 0),
+	  internal_object_size (TREE_OPERAND (pt_var, 0),
 				       object_size_type & ~1, &sz);
 	}
       else
@@ -491,14 +493,14 @@  pass_through_call (const gcall *call)
 }
 
 
-/* Compute __builtin_object_size value for PTR and set *PSIZE to
-   the resulting value.  OBJECT_SIZE_TYPE is the second argument
-   to __builtin_object_size.  Return true on success and false
-   when the object size could not be determined.  */
+/* Compute the size of the object pointed to by PTR and set *PSIZE
+   to the resulting value.  OBJECT_SIZE_TYPE is the second argument
+   to __builtin_object_size.  Return true on success and false when
+   the object size could not be determined.  */
 
 bool
-compute_builtin_object_size (tree ptr, int object_size_type,
-			     unsigned HOST_WIDE_INT *psize)
+internal_object_size (tree ptr, int object_size_type,
+		      unsigned HOST_WIDE_INT *psize)
 {
   gcc_assert (object_size_type >= 0 && object_size_type <= 3);
 
@@ -534,7 +536,7 @@  compute_builtin_object_size (tree ptr, int object_size_type,
 	      ptr = gimple_assign_rhs1 (def);
 
 	      if (cst_and_fits_in_hwi (offset)
-		  && compute_builtin_object_size (ptr, object_size_type, psize))
+		  && internal_object_size (ptr, object_size_type, psize))
 		{
 		  /* Return zero when the offset is out of bounds.  */
 		  unsigned HOST_WIDE_INT off = tree_to_shwi (offset);
@@ -664,6 +666,38 @@  compute_builtin_object_size (tree ptr, int object_size_type,
   return *psize != unknown[object_size_type];
 }
 
+/* Compute __builtin_object_size value for PTR and set *PSIZE to
+   the resulting value.  OBJECT_SIZE_TYPE is the second argument
+   to __builtin_object_size.  Return true on success and false
+   when the object size could not be determined.  */
+
+bool
+compute_builtin_object_size (tree ptr, int object_size_type,
+			     unsigned HOST_WIDE_INT *psize)
+{
+  return internal_object_size (ptr, object_size_type, psize);
+}
+
+
+/* Like compute_builtin_object_size but intended to be called
+   without a corresponding __builtin_object_size in the program.  */
+
+bool
+compute_object_size (tree ptr, int object_size_type,
+		     unsigned HOST_WIDE_INT *psize)
+{
+  bool init = computed[0] == NULL;
+  if (init)
+    init_object_sizes ();
+
+  bool retval = internal_object_size (ptr, object_size_type, psize);
+
+  if (init)
+    fini_object_sizes ();
+
+  return retval;
+}
+
 /* Compute object_sizes for PTR, defined to VALUE, which is not an SSA_NAME.  */
 
 static void
@@ -1325,7 +1359,7 @@  pass_object_sizes::execute (function *fun)
 		    {
 		      tree type = TREE_TYPE (lhs);
 		      unsigned HOST_WIDE_INT bytes;
-		      if (compute_builtin_object_size (ptr, object_size_type,
+		      if (internal_object_size (ptr, object_size_type,
 						       &bytes)
 			  && wi::fits_to_tree_p (bytes, type))
 			{
diff --git a/gcc/tree-object-size.h b/gcc/tree-object-size.h
index 38c3e07..60256e6 100644
--- a/gcc/tree-object-size.h
+++ b/gcc/tree-object-size.h
@@ -21,6 +21,7 @@  along with GCC; see the file COPYING3.  If not see
 #define GCC_TREE_OBJECT_SIZE_H
 
 extern void init_object_sizes (void);
+extern bool compute_object_size (tree, int, unsigned HOST_WIDE_INT *);
 extern bool compute_builtin_object_size (tree, int, unsigned HOST_WIDE_INT *);
 
 #endif  // GCC_TREE_OBJECT_SIZE_H