diff mbox

- improve sprintf buffer overflow detection (middle-end/49905)

Message ID 57B725F6.8000405@gmail.com
State New
Headers show

Commit Message

Martin Sebor Aug. 19, 2016, 3:29 p.m. UTC
> My biggest concern with this iteration is the tight integration between
> the optimization and warning.  We generally avoid that kind of tight
> integration such that enabling the warning does not affect the
> optimization and vice-versa.
>
> So ISTM you have to do the analysis if the optimization or warning has
> been requested.  Then you conditionalize whether or not the warnings are
> emitted by their flag and the optimization based on its flag.

As we discussed in IRC yesterday, the warning and the optimization
are independent of one another, and each controlled by its own option
(-Wformat-length and -fprintf-return-value).  In light of that we've
agreed that submitting both as part of the same patch is sufficient.

>
> I understand you're going to have some further work to do because of
> conflicts with David's patches.  With that in mind, I'd suggest a bit of
> carving things up so things can start moving forward.
>
>
> Patch #1.  All the fixes to static buffer sizes that were inspired by
> your warning.  These are all approved and can go in immediately.

Attached is this patch.

>
> Patch #2. Improvement to __builtin_object_size to handle
> POINTER_PLUS_EXPR on arrays.  This is something that stands on it own
> and ought to be reviewable quickly and doesn't really belong in the
> bowels of the warning/optimization patch you're developing.

Sure.  I'll submit this patch next.

>
> Patch #3. Core infrastructure and possibly the warning.  The reason I
> say possibly the warning is they may be intertwined enough that
> separating them makes more work than it saves.  I think the warning bits
> are largely ready to go and may just need twiddling due to conflicts
> with David's work.
>
> Patch #4. The optimizations you've got now which I'll want to take
> another look at.  Other than the overly tight integration with the
> warning, I don't see anything inherently wrong, but I would like to take
> another look at those once #1-#3 are done and dusted.

As we agreed, these will be submitted as one patch (probably
next week).

>
> Patch #5 and beyond: Further optimization work.

As one of the next steps I'd like to make this feature available
to user-defined sprintf-like functions decorated with attribute
format.  To do that, I'm thinking of adding either a fourth
(optional) argument to attribute format printf indicating which
of the function arguments is the destination buffer (to compute
its size), or perhaps a new attribute under its own name.  I'm
actually leaning toward latter since I think it could be used
in other contexts as well.  I welcome comments and suggestions
on this idea.

Thanks also for the rest of the detailed comments (snipped). I'll
also take care of those requests before I submit the next patch.

Martin

Comments

Jeff Law Aug. 19, 2016, 3:34 p.m. UTC | #1
On 08/19/2016 09:29 AM, Martin Sebor wrote:
>> My biggest concern with this iteration is the tight integration between
>> the optimization and warning.  We generally avoid that kind of tight
>> integration such that enabling the warning does not affect the
>> optimization and vice-versa.
>>
>> So ISTM you have to do the analysis if the optimization or warning has
>> been requested.  Then you conditionalize whether or not the warnings are
>> emitted by their flag and the optimization based on its flag.
>
> As we discussed in IRC yesterday, the warning and the optimization
> are independent of one another, and each controlled by its own option
> (-Wformat-length and -fprintf-return-value).  In light of that we've
> agreed that submitting both as part of the same patch is sufficient.
Right.  I must have mis-read something.  I'll look for the tidbit that 
made me think they were more intertwined than they really are.  It may 
be the case that we just want to tweak a comment.

>
>>
>> I understand you're going to have some further work to do because of
>> conflicts with David's patches.  With that in mind, I'd suggest a bit of
>> carving things up so things can start moving forward.
>>
>>
>> Patch #1.  All the fixes to static buffer sizes that were inspired by
>> your warning.  These are all approved and can go in immediately.
>
> Attached is this patch.
Approved.  Please install.


>
>>
>> Patch #2. Improvement to __builtin_object_size to handle
>> POINTER_PLUS_EXPR on arrays.  This is something that stands on it own
>> and ought to be reviewable quickly and doesn't really belong in the
>> bowels of the warning/optimization patch you're developing.
>
> Sure.  I'll submit this patch next.
Excellent.

>>
>> Patch #5 and beyond: Further optimization work.
>
> As one of the next steps I'd like to make this feature available
> to user-defined sprintf-like functions decorated with attribute
> format.  To do that, I'm thinking of adding either a fourth
> (optional) argument to attribute format printf indicating which
> of the function arguments is the destination buffer (to compute
> its size), or perhaps a new attribute under its own name.  I'm
> actually leaning toward latter since I think it could be used
> in other contexts as well.  I welcome comments and suggestions
> on this idea.
Whichever we think will be easier to use and thus encourage folks to 
annotate their code properly :-)

jeff
Trevor Saunders Aug. 20, 2016, 3:18 a.m. UTC | #2
> > > Patch #5 and beyond: Further optimization work.
> > 
> > As one of the next steps I'd like to make this feature available
> > to user-defined sprintf-like functions decorated with attribute
> > format.  To do that, I'm thinking of adding either a fourth
> > (optional) argument to attribute format printf indicating which
> > of the function arguments is the destination buffer (to compute
> > its size), or perhaps a new attribute under its own name.  I'm
> > actually leaning toward latter since I think it could be used
> > in other contexts as well.  I welcome comments and suggestions
> > on this idea.
> Whichever we think will be easier to use and thus encourage folks to
> annotate their code properly :-)

So, sort of related I've been thinking about writing C++ string building
classes some lately.  Where you'd have a fixed length buffer and a
pointer to the next buffer (so a degenerate rope with only one side of
the tree).  One use case for such a thing is building up strings of
assembly to output in gcc.  however one somewhat awkward bit is that we
often format assembly with printf, which isn't really great for this use
case, because you need to deal with the case the current buffer only has
space for part of the string you are formatting.  So it would be nice to
have something like an interuptable printf that tells you where in the
string it stopped formatting, and allows you to continue from there with
a new buffer.
 
 It also seems worth noting that in C++11 you can actually write a
 printf that is typesafe without the format attributes for example Tom
 has one here https://github.com/tromey/typesafe-printf.  I'm not sure
 putting such a thing in libc for C++ programs is a great idea because
 of compile time costs, but its tempting.

 Trev
Florian Weimer Aug. 22, 2016, 12:47 p.m. UTC | #3
On 08/20/2016 05:18 AM, Trevor Saunders wrote:
>>>> Patch #5 and beyond: Further optimization work.
>>>
>>> As one of the next steps I'd like to make this feature available
>>> to user-defined sprintf-like functions decorated with attribute
>>> format.  To do that, I'm thinking of adding either a fourth
>>> (optional) argument to attribute format printf indicating which
>>> of the function arguments is the destination buffer (to compute
>>> its size), or perhaps a new attribute under its own name.  I'm
>>> actually leaning toward latter since I think it could be used
>>> in other contexts as well.  I welcome comments and suggestions
>>> on this idea.
>> Whichever we think will be easier to use and thus encourage folks to
>> annotate their code properly :-)
>
> So, sort of related I've been thinking about writing C++ string building
> classes some lately.  Where you'd have a fixed length buffer and a
> pointer to the next buffer (so a degenerate rope with only one side of
> the tree).  One use case for such a thing is building up strings of
> assembly to output in gcc.  however one somewhat awkward bit is that we
> often format assembly with printf, which isn't really great for this use
> case, because you need to deal with the case the current buffer only has
> space for part of the string you are formatting.  So it would be nice to
> have something like an interuptable printf that tells you where in the
> string it stopped formatting, and allows you to continue from there with
> a new buffer.

glibc provides fopencookie, which can be used to print directly to a 
custom stream.  I don't know if it would provide any speed gains because 
of the internal complexities of libio.

It could be emulated using snprintf and a temporary buffer for non-glibc 
systems.

Florian
Ian Lance Taylor Sept. 9, 2016, 6:41 p.m. UTC | #4
On Fri, Aug 19, 2016 at 8:29 AM, Martin Sebor <msebor@gmail.com> wrote:
>> My biggest concern with this iteration is the tight integration between
>> the optimization and warning.  We generally avoid that kind of tight
>> integration such that enabling the warning does not affect the
>> optimization and vice-versa.
>>
>> So ISTM you have to do the analysis if the optimization or warning has
>> been requested.  Then you conditionalize whether or not the warnings are
>> emitted by their flag and the optimization based on its flag.
>
>
> As we discussed in IRC yesterday, the warning and the optimization
> are independent of one another, and each controlled by its own option
> (-Wformat-length and -fprintf-return-value).  In light of that we've
> agreed that submitting both as part of the same patch is sufficient.
>
>>
>> I understand you're going to have some further work to do because of
>> conflicts with David's patches.  With that in mind, I'd suggest a bit of
>> carving things up so things can start moving forward.
>>
>>
>> Patch #1.  All the fixes to static buffer sizes that were inspired by
>> your warning.  These are all approved and can go in immediately.
>
>
> Attached is this patch.

Hi, this patch changed the file gcc/go/gofrontend/expressions.cc.  As
described in gcc/go/gofrontend/README, the files in the directory
gcc/go/gofrontend should not be changed in the GCC repository.
Instead, they are mirrored into GCC from a separate repository,
https://go.googlesource.com/gofrontend .  When you need to make
changes in the gofrontend directory, you can either follow the
contribution process described as
https://golang.org/doc/gccgo_contribute.html, or simply ask me to make
the change.  Thanks.

Ian
Martin Sebor Sept. 9, 2016, 8:29 p.m. UTC | #5
>>> Patch #1.  All the fixes to static buffer sizes that were inspired by
>>> your warning.  These are all approved and can go in immediately.
>>
>>
>> Attached is this patch.
>
> Hi, this patch changed the file gcc/go/gofrontend/expressions.cc.  As
> described in gcc/go/gofrontend/README, the files in the directory
> gcc/go/gofrontend should not be changed in the GCC repository.
> Instead, they are mirrored into GCC from a separate repository,
> https://go.googlesource.com/gofrontend .  When you need to make
> changes in the gofrontend directory, you can either follow the
> contribution process described as
> https://golang.org/doc/gccgo_contribute.html, or simply ask me to make
> the change.  Thanks.

Sorry if this caused an inconvenience. I'll keep it in mind in
the future.

Thanks
Martin
diff mbox

Patch

gcc/c-family/ChangeLog:
2016-08-18  Martin Sebor  <msebor@redhat.com>

	* c-ada-spec.c (dump_ada_function_declaration): Increase buffer
	size to guarantee it fits the output of the formatted function
	regardless of its arguments.

gcc/cp/ChangeLog:
2016-08-18  Martin Sebor  <msebor@redhat.com>

	* mangle.c: Increase buffer size to guarantee it fits the output
	of the formatted function regardless of its arguments.

gcc/go/ChangeLog:
2016-08-18  Martin Sebor  <msebor@redhat.com>

	* gofrontend/expressions.cc: Increase buffer size to guarantee
	it fits the output of the formatted function regardless of its
	arguments.

gcc/java/ChangeLog:
2016-08-18  Martin Sebor  <msebor@redhat.com>

	* decl.c (give_name_to_locals): Increase buffer size to guarantee
	it fits the output of the formatted function regardless of its
	arguments.
	* mangle_name.c (append_unicode_mangled_name): Same.

gcc/ChangeLog:
2016-08-18  Martin Sebor  <msebor@redhat.com>

	* genmatch.c (parser::parse_expr): Increase buffer size to guarantee
	it fits the output of the formatted function regardless of its
	arguments.
	* gcc/genmodes.c (parser::parse_expr): Same.
	* gimplify.c (gimplify_asm_expr): Same.
	* passes.c (pass_manager::register_one_dump_file): Same.
	* print-tree.c (print_node): Same.

diff --git a/gcc/c-family/c-ada-spec.c b/gcc/c-family/c-ada-spec.c
index a4e0c38..6a8e04b 100644
--- a/gcc/c-family/c-ada-spec.c
+++ b/gcc/c-family/c-ada-spec.c
@@ -1603,7 +1603,7 @@  dump_ada_function_declaration (pretty_printer *buffer, tree func,
 {
   tree arg;
   const tree node = TREE_TYPE (func);
-  char buf[16];
+  char buf[17];
   int num = 0, num_args = 0, have_args = true, have_ellipsis = false;
 
   /* Compute number of arguments.  */
diff --git a/gcc/cp/mangle.c b/gcc/cp/mangle.c
index d8b5c45..5859d62 100644
--- a/gcc/cp/mangle.c
+++ b/gcc/cp/mangle.c
@@ -1740,7 +1740,9 @@  static void
 write_real_cst (const tree value)
 {
   long target_real[4];  /* largest supported float */
-  char buffer[9];       /* eight hex digits in a 32-bit number */
+  /* Buffer for eight hex digits in a 32-bit number but big enough
+     even for 64-bit long to avoid warnings.  */
+  char buffer[17];
   int i, limit, dir;
 
   tree type = TREE_TYPE (value);
diff --git a/gcc/genmatch.c b/gcc/genmatch.c
index 02e945a..6195a3b 100644
--- a/gcc/genmatch.c
+++ b/gcc/genmatch.c
@@ -4051,7 +4051,8 @@  parser::parse_expr ()
   else if (force_capture)
     {
       unsigned num = capture_ids->elements ();
-      char id[8];
+      /* Big enough for a 32-bit UINT_MAX plus prefix.  */
+      char id[13];
       bool existed;
       sprintf (id, "__%u", num);
       capture_ids->get_or_insert (xstrdup (id), &existed);
diff --git a/gcc/genmodes.c b/gcc/genmodes.c
index 1170d4f..92ca055 100644
--- a/gcc/genmodes.c
+++ b/gcc/genmodes.c
@@ -486,7 +486,8 @@  make_vector_modes (enum mode_class cl, unsigned int width,
 {
   struct mode_data *m;
   struct mode_data *v;
-  char buf[8];
+  /* Big enough for a 32-bit UINT_MAX plus the text.  */
+  char buf[12];
   unsigned int ncomponents;
   enum mode_class vclass = vector_class (cl);
 
diff --git a/gcc/gimplify.c b/gcc/gimplify.c
index 1e43dbb..25cd019 100644
--- a/gcc/gimplify.c
+++ b/gcc/gimplify.c
@@ -5346,7 +5346,8 @@  gimplify_asm_expr (tree *expr_p, gimple_seq *pre_p, gimple_seq *post_p)
 	     flexibility, split it into separate input and output
  	     operands.  */
 	  tree input;
-	  char buf[10];
+	  /* Buffer big enough to format a 32-bit UINT_MAX into.  */
+	  char buf[11];
 
 	  /* Turn the in/out constraint into an output constraint.  */
 	  char *p = xstrdup (constraint);
@@ -5356,7 +5357,7 @@  gimplify_asm_expr (tree *expr_p, gimple_seq *pre_p, gimple_seq *post_p)
 	  /* And add a matching input constraint.  */
 	  if (allows_reg)
 	    {
-	      sprintf (buf, "%d", i);
+	      sprintf (buf, "%u", i);
 
 	      /* If there are multiple alternatives in the constraint,
 		 handle each of them individually.  Those that allow register
diff --git a/gcc/go/gofrontend/expressions.cc b/gcc/go/gofrontend/expressions.cc
index bdc14aa..2c76b47 100644
--- a/gcc/go/gofrontend/expressions.cc
+++ b/gcc/go/gofrontend/expressions.cc
@@ -9050,7 +9050,8 @@  Call_expression::do_flatten(Gogo* gogo, Named_object*,
       Location loc = this->location();
 
       int i = 0;
-      char buf[10];
+      /* Buffer large enough for INT_MAX plus the prefix.  */
+      char buf[14];
       for (Typed_identifier_list::const_iterator p = results->begin();
            p != results->end();
            ++p, ++i)
diff --git a/gcc/java/decl.c b/gcc/java/decl.c
index 36989d3..70eac31 100644
--- a/gcc/java/decl.c
+++ b/gcc/java/decl.c
@@ -1721,7 +1721,8 @@  give_name_to_locals (JCF *jcf)
 	    DECL_NAME (parm) = get_identifier ("this");
 	  else
 	    {
-	      char buffer[12];
+	      /* Buffer large enough for INT_MAX plus prefix.  */
+	      char buffer[15];
 	      sprintf (buffer, "ARG_%d", arg_i);
 	      DECL_NAME (parm) = get_identifier (buffer);
 	    }
diff --git a/gcc/java/mangle_name.c b/gcc/java/mangle_name.c
index 00374db..7627c5d 100644
--- a/gcc/java/mangle_name.c
+++ b/gcc/java/mangle_name.c
@@ -231,7 +231,8 @@  void
 append_gpp_mangled_name (const char *name, int len)
 {
   int encoded_len, needs_escapes;
-  char buf[6];
+  /* Buffer large enough for INT_MIN.  */
+  char buf[9];
 
   MANGLE_CXX_KEYWORDS (name, len);
 
@@ -270,7 +271,8 @@  append_unicode_mangled_name (const char *name, int len)
       /* Everything else needs encoding */
       else
 	{
-	  char buf [9];
+	  /* Buffer large enough for UINT_MAX plus the prefix.  */
+	  char buf [13];
 	  if (ch == '_' || ch == 'U')
 	    {
 	      /* Prepare to recognize __U */
diff --git a/gcc/passes.c b/gcc/passes.c
index c7d7dbe..07ebf8b 100644
--- a/gcc/passes.c
+++ b/gcc/passes.c
@@ -771,7 +771,9 @@  pass_manager::register_one_dump_file (opt_pass *pass)
 {
   char *dot_name, *flag_name, *glob_name;
   const char *name, *full_name, *prefix;
-  char num[10];
+
+  /* Buffer big enough to format a 32-bit UINT_MAX into.  */
+  char num[11];
   int flags, id;
   int optgroup_flags = OPTGROUP_NONE;
   gcc::dump_manager *dumps = m_ctxt->get_dumps ();
@@ -779,7 +781,7 @@  pass_manager::register_one_dump_file (opt_pass *pass)
   /* See below in next_pass_1.  */
   num[0] = '\0';
   if (pass->static_pass_number != -1)
-    sprintf (num, "%d", ((int) pass->static_pass_number < 0
+    sprintf (num, "%u", ((int) pass->static_pass_number < 0
 			 ? 1 : pass->static_pass_number));
 
   /* The name is both used to identify the pass for the purposes of plugins,
diff --git a/gcc/print-tree.c b/gcc/print-tree.c
index 468f1ff..d69bf2a 100644
--- a/gcc/print-tree.c
+++ b/gcc/print-tree.c
@@ -694,8 +694,10 @@  print_node (FILE *file, const char *prefix, tree node, int indent)
 	  i = 0;
 	  FOR_EACH_CALL_EXPR_ARG (arg, iter, node)
 	    {
-	      char temp[10];
-	      sprintf (temp, "arg %d", i);
+	      /* Buffer big enough to format a 32-bit UINT_MAX into, plus
+		 the text.  */
+	      char temp[15];
+	      sprintf (temp, "arg %u", i);
 	      print_node (file, temp, arg, indent + 4);
 	      i++;
 	    }
@@ -706,7 +708,9 @@  print_node (FILE *file, const char *prefix, tree node, int indent)
 
 	  for (i = 0; i < len; i++)
 	    {
-	      char temp[10];
+	      /* Buffer big enough to format a 32-bit UINT_MAX into, plus
+		 the text.  */
+	      char temp[15];
 
 	      sprintf (temp, "arg %d", i);
 	      print_node (file, temp, TREE_OPERAND (node, i), indent + 4);
@@ -814,7 +818,9 @@  print_node (FILE *file, const char *prefix, tree node, int indent)
 	  for (i = 0; i < len; i++)
 	    if (TREE_VEC_ELT (node, i))
 	      {
-		char temp[10];
+	      /* Buffer big enough to format a 32-bit UINT_MAX into, plus
+		 the text.  */
+		char temp[15];
 		sprintf (temp, "elt %d", i);
 		print_node (file, temp, TREE_VEC_ELT (node, i), indent + 4);
 	      }