diff mbox

- improve sprintf buffer overflow detection (middle-end/49905)

Message ID 1474050251.6782.70.camel@redhat.com
State New
Headers show

Commit Message

David Malcolm Sept. 16, 2016, 6:24 p.m. UTC
On Sun, 2016-09-11 at 20:03 -0600, Martin Sebor wrote:
> On 09/08/2016 04:10 PM, Joseph Myers wrote:
> > On Thu, 8 Sep 2016, Martin Sebor wrote:
> > 
> > > diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
> > > index da133a4..4607495 100644
> > > --- a/gcc/doc/tm.texi.in
> > > +++ b/gcc/doc/tm.texi.in
> > > @@ -4081,6 +4081,13 @@ In either case, it remains possible to
> > > select code-generation for the alternate
> > >   scheme, by means of compiler command line switches.
> > >   @end defmac
> > > 
> > > +@deftypefn {Target Hook} {const char *}
> > > TARGET_LIBC_PRINTF_POINTER_FORMAT (tree, const char **@var{flags}
> > > )
> > > +A hook to determine the target @code{printf} implementation
> > > format string
> > > +that the most closely corresponds to the @code{%p} format
> > > directive.
> > > +The object pointed to by the @var{flags} is set to a string
> > > consisting
> > > +of recognized format flags such as the @code{'#'} character.
> > > +@end deftypefn
> > 
> > No, the substance of hook documentation should go in target.def
> > with just
> > an @hook line in tm.texi.in leading to the documentation going in
> > tm.texi
> > automatically.
> > 
> > You appear to be defining a target macro masquerading as a hook. 
> >  Please
> > don't (new target macros should be avoided where possible); use a
> > proper
> > hook.  (Maybe the settings depending on OS rather than architecture
> > means
> > it needs to be one of those whose default is a manual setting in
> > target-def.h rather than automatically generated, but that should
> > be the
> > limit of deviation from the normal workings of hooks.)
> > 
> > > +  const char *pfmt = TARGET_LIBC_PRINTF_POINTER_FORMAT (arg,
> > > &flags);
> > 
> > With a proper hook them you'd call
> > targetm.libc_printf_pointer_format.
> > 
> > > +	inform (callloc,
> > > +		(nbytes + exact == 1
> > > +		 ? "format output %wu byte into a destination of
> > > size %wu"
> > > +		 : "format output %wu bytes into a destination
> > > of size %wu"),
> > > +		nbytes + exact, info.objsize);
> > 
> > You need to use G_() around both format strings in such a case;
> > xgettext
> > doesn't know how to extract them both.
> 
> Attached is revision 8 of the patch (hopefully) adjusted as
> requested above, and with a simple test with of the multiline
> diagnostic directives suggested by David.  This revision also
> enables the -fprintf-return-value option by default.  The libgomp
> test failures I was seeing in my earlier testing must have been
> caused by an older version of GMP or MPFR that I had inadvertently
> use (normally I use in-tree versions downloaded and expanded there
> by the download_prerequisites script but that time I forgot that
> step).
> 
> David, in the way of feedback, I spent hours debugging the simple
> multiline test I added.  It seems that DejaGnu is exquisitely
> sensitive to whitespaces in the multiline output.  I appreciate
> that spacing is essential to lining up the caret and the tildes
> with the text of the program but tests fail even when all the
> expected output is lined up right in the multiline directive but
> off by a space (or tab) with respect to the actual output.  That
> DejaGnu output doesn't make this very clear at all makes debugging
> these types of issues a pain.  I wonder if there's a better to do
> this.

Gah; I'm sorry this was such a pain to do.

I tend to just copy&paste the stuff I want to quote directly from
Emacs's compilation buffer.

However a nit, which I think is why you had problems...



You have two pairs of dg-begin/end-multiline-output, but I think it's
better to have four; use a single begin/end pair for each call to
diagnostic_show_locus.  Hopefully doing that ought to make things a bit
less sensitive to whitespace, and easier to debug: each begin/end can
be handled by itself, whereas by combining them it's harder for
multiline.exp to detect them.  If combined, they can only be detected
if all of the dg-warning/dg-messages work (and are thus pruned by
prune.exp), if any aren't pruned, the multiline outputs will also fail.
 Maybe this exacerbated the problem?

So something like this, grouping the 4 diagnostics together with their
diagnostic_show_locus output by opening and closing comments (line
numbers will need adjusting):

+void test (void)
+{
+  sprintf (dst + 7, "%-s", "1");
+  /* { dg-warning
"writing a terminating nul past the end of the destination" "" { target
*-*-*-* } 10 }
+     { dg-begin-multiline-output "" }
+   sprintf (dst +
7, "%-s", "1");
+                     ^~~~~
+     { dg-end-multiline
-output "" } */
+  /* { dg-message "format output 2 bytes into a
destination of size 1" "" { target *-*-*-* } 10 }
+     { dg-begin
-multiline-output "" }
+   sprintf (dst + 7, "%-s", "1");
+  
 ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+     { dg-end-multiline-output "" } */
+
+
  sprintf (dst + 7, "%-s", "abcd");
+  /* { dg-warning ".%-s. directive
writing 4 bytes into a region of size 1" "" { target *-*-*-* } 20 }
+   
  { dg-begin-multiline-output "" }
+   sprintf (dst + 7, "%-s", "abcd");

+                      ^~~   ~~~~~~
+     { dg-end-multiline-output "" }
*/
+  /* { dg-message "format output 5 bytes into a destination of size
1" "" { target *-*-*-* } 20 }
+     { dg-begin-multiline-output "" }
+  
 sprintf (dst + 7, "%-s", "abcd");
+   ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
     { dg-end-multiline-output "" } */
+}

Another nit, if I may: FWIW I'm not in love with the wording of the messages.  Sorry to bikeshed, but how about:
  warning: buffer overflow will occur when writing terminating NUL
and:
  note: formatted output of 2 bytes into a destination of size 1
or somesuch.

Comments

Martin Sebor Sept. 20, 2016, 6:02 p.m. UTC | #1
>> David, in the way of feedback, I spent hours debugging the simple
>> multiline test I added.  It seems that DejaGnu is exquisitely
>> sensitive to whitespaces in the multiline output.  I appreciate
>> that spacing is essential to lining up the caret and the tildes
>> with the text of the program but tests fail even when all the
>> expected output is lined up right in the multiline directive but
>> off by a space (or tab) with respect to the actual output.  That
>> DejaGnu output doesn't make this very clear at all makes debugging
>> these types of issues a pain.  I wonder if there's a better to do
>> this.
>
> Gah; I'm sorry this was such a pain to do.
>
> I tend to just copy&paste the stuff I want to quote directly from
> Emacs's compilation buffer.

Yes, I did start with that.  The problem is that I have a script
that I run to help massage my patches to the format required by
the GCC Coding Conventions.  The script mainly replaces blocks
of spaces with tabs. So while my initial patch worked, the final
one didn't because of the tabs, and it took me hours to figure
out why.  Because I couldn't see the difference I thought it was
a bug in DejaGnu or some other tool that was different between
the two machines I was using for testing the two patches.  Until
it occurred to me to diff the test itself... (FWIW, I think this
style convention is bad and should be abolished in favor of using
only plain spaces.)

>
> However a nit, which I think is why you had problems...
>
...
>
> You have two pairs of dg-begin/end-multiline-output, but I think it's
> better to have four; use a single begin/end pair for each call to
> diagnostic_show_locus.  Hopefully doing that ought to make things a bit
> less sensitive to whitespace, and easier to debug: each begin/end can
> be handled by itself, whereas by combining them it's harder for
> multiline.exp to detect them.  If combined, they can only be detected
> if all of the dg-warning/dg-messages work (and are thus pruned by
> prune.exp), if any aren't pruned, the multiline outputs will also fail.
>  Maybe this exacerbated the problem?

Maybe.  I remember experimenting with the multiline directives when
I first added the test and was struggling to get it to work.  I think
the problems I was having were unrelated to tabs but probably had
something to do with the exact number of leading spaces.

FWIW, to make the multiline directives less prone to this type of
a problem it seems that instead of counting each leading space and
tab character and treating them as ordinary they could verify that
the first non-whitespace character on each line of the actual output
lines up with the first non-whitespace character of the expected
output with some constant offset.  That way spaces vs tabs shouldn't
matter as long as they were used consistently (or they could be made
not to matter at all if a tab was treated as a single space).  What
do you think about that?

>
> So something like this, grouping the 4 diagnostics together with their
> diagnostic_show_locus output by opening and closing comments (line
> numbers will need adjusting):

Thanks.  Let me apply it and see how it works.

>
> Another nit, if I may: FWIW I'm not in love with the wording of the messages.  Sorry to bikeshed, but how about:
>   warning: buffer overflow will occur when writing terminating NUL
> and:
>   note: formatted output of 2 bytes into a destination of size 1
> or somesuch.

I won't claim the text of the messages is perfect but I did spend
a lot of time tweaking them.  That's not to say they can't be
improved but changing them means quite a bit of work adjusting
the tests.  At this point, I'd like to focus on getting the patch
committed.  After that, if there's still time, I'm happy to take
feedback and tweak the diagnostics based on it.

Thanks again for your help and the suggestions above!

Martin
Gerald Pfeifer Sept. 21, 2016, 7:40 p.m. UTC | #2
I noticed the following bootstrap failure on i?86-unknown-freebsd
that started in the last 24 hours:

/scratch/tmp/gerald/gcc-HEAD/gcc/vec.c: In member function ‘void vec_usage::dump(mem_location*, mem_usage&) const’:
/scratch/tmp/gerald/gcc-HEAD/gcc/vec.c:79:3: error: ‘%s’ directive writing between 0 and 4294967295 bytes into a region of size 4096 [-Werror=format-length=]
   dump (mem_location *loc, mem_usage &total) const
   ^~~~
/scratch/tmp/gerald/gcc-HEAD/gcc/vec.c:83:36: note: format output between 6 and4294967311 bytes into a destination of size 4096
       loc->m_line, loc->m_function);
                                    ^
cc1plus: all warnings being treated as errors
gmake[3]: *** [Makefile:2557: build/vec.o] Error 1
gmake[3]: Leaving directory '/scratch/tmp/gerald/OBJ-0921-1705/gcc'
gmake[2]: *** [Makefile:4612: all-stage2-gcc] Error 2
gmake[2]: Leaving directory '/scratch/tmp/gerald/OBJ-0921-1705'
gmake[1]: *** [Makefile:24365: stage2-bubble] Error 2

Is it possible that is related to your warning patches?

Gerald
Martin Sebor Sept. 21, 2016, 7:55 p.m. UTC | #3
On 09/21/2016 01:40 PM, Gerald Pfeifer wrote:
> I noticed the following bootstrap failure on i?86-unknown-freebsd
> that started in the last 24 hours:
>
> /scratch/tmp/gerald/gcc-HEAD/gcc/vec.c: In member function ‘void vec_usage::dump(mem_location*, mem_usage&) const’:
> /scratch/tmp/gerald/gcc-HEAD/gcc/vec.c:79:3: error: ‘%s’ directive writing between 0 and 4294967295 bytes into a region of size 4096 [-Werror=format-length=]
>    dump (mem_location *loc, mem_usage &total) const
>    ^~~~
> /scratch/tmp/gerald/gcc-HEAD/gcc/vec.c:83:36: note: format output between 6 and4294967311 bytes into a destination of size 4096
>        loc->m_line, loc->m_function);
>                                     ^
> cc1plus: all warnings being treated as errors
> gmake[3]: *** [Makefile:2557: build/vec.o] Error 1
> gmake[3]: Leaving directory '/scratch/tmp/gerald/OBJ-0921-1705/gcc'
> gmake[2]: *** [Makefile:4612: all-stage2-gcc] Error 2
> gmake[2]: Leaving directory '/scratch/tmp/gerald/OBJ-0921-1705'
> gmake[1]: *** [Makefile:24365: stage2-bubble] Error 2
>
> Is it possible that is related to your warning patches?

Yes, this is likely the same bug as mentioned in comment #6 on
pr77676.  The bug in the comment ILP32-specific and only tangentially
related to the PR itself.  I'm testing the patch that's attached to
the PR that should fix both of these problems.  I don't have access
to i?86-unknown-freebsd so if you could help validate it there I'd
be grateful.  (The patch just successfully bootstrapped on
i386-pc-linux-gnu.)

Thanks and sorry for the breakage.
Martin
Jakub Jelinek Sept. 21, 2016, 8:55 p.m. UTC | #4
On Wed, Sep 21, 2016 at 01:55:33PM -0600, Martin Sebor wrote:
> On 09/21/2016 01:40 PM, Gerald Pfeifer wrote:
> >I noticed the following bootstrap failure on i?86-unknown-freebsd
> >that started in the last 24 hours:
> >
> >/scratch/tmp/gerald/gcc-HEAD/gcc/vec.c: In member function ‘void vec_usage::dump(mem_location*, mem_usage&) const’:
> >/scratch/tmp/gerald/gcc-HEAD/gcc/vec.c:79:3: error: ‘%s’ directive writing between 0 and 4294967295 bytes into a region of size 4096 [-Werror=format-length=]
> >   dump (mem_location *loc, mem_usage &total) const
> >   ^~~~
> >/scratch/tmp/gerald/gcc-HEAD/gcc/vec.c:83:36: note: format output between 6 and4294967311 bytes into a destination of size 4096
> >       loc->m_line, loc->m_function);
> >                                    ^
> >cc1plus: all warnings being treated as errors
> >gmake[3]: *** [Makefile:2557: build/vec.o] Error 1
> >gmake[3]: Leaving directory '/scratch/tmp/gerald/OBJ-0921-1705/gcc'
> >gmake[2]: *** [Makefile:4612: all-stage2-gcc] Error 2
> >gmake[2]: Leaving directory '/scratch/tmp/gerald/OBJ-0921-1705'
> >gmake[1]: *** [Makefile:24365: stage2-bubble] Error 2
> >
> >Is it possible that is related to your warning patches?
> 
> Yes, this is likely the same bug as mentioned in comment #6 on
> pr77676.  The bug in the comment ILP32-specific and only tangentially
> related to the PR itself.  I'm testing the patch that's attached to
> the PR that should fix both of these problems.  I don't have access
> to i?86-unknown-freebsd so if you could help validate it there I'd
> be grateful.  (The patch just successfully bootstrapped on
> i386-pc-linux-gnu.)

Looking at target_int_max you are using in the new patch:
static unsigned HOST_WIDE_INT
target_int_max ()
{
  static const unsigned HOST_WIDE_INT int_max
    = HOST_WIDE_INT_M1U >> (sizeof int_max * CHAR_BIT
                            - TYPE_PRECISION (integer_type_node) + 1);
  return int_max;
}

1) sizeof int_max * CHAR_BIT should IMNSHO be HOST_BITS_PER_WIDE_INT
2) why is the var static, subtraction and shift is very cheap, while C++
   local statics are expensive?  It needs a guard variable,
   __cxa_guard_acquire, __cxa_guard_release calls, etc.

	Jakub
Gerald Pfeifer Sept. 21, 2016, 9:08 p.m. UTC | #5
On Wed, 21 Sep 2016, Martin Sebor wrote:
>> Is it possible that is related to your warning patches?
> Yes, this is likely the same bug as mentioned in comment #6 on
> pr77676.  The bug in the comment ILP32-specific and only tangentially
> related to the PR itself.  I'm testing the patch that's attached to
> the PR that should fix both of these problems.  I don't have access
> to i?86-unknown-freebsd so if you could help validate it there I'd
> be grateful.

Yes, with this patch I was able to bootstrap i?86-unknown-freebsd11
again.

Thanks for looking into this so quickly, Martin!

Gerald
diff mbox

Patch

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/builtin-sprintf-warn-4.c b/gcc/testsuite/gcc.dg/tree-ssa/builtin-sprintf-warn-4.c
new file mode 100644
index 0000000..a601ba4
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/builtin-sprintf-warn-4.c
@@ -0,0 +1,29 @@ 
+/* { dg-do compile } */
+/* { dg-options "-Wformat -Wformat-length=1 -fdiagnostics-show-caret -ftrack-macro-expansion=0" } */
+
+extern int sprintf (char*, const char*, ...);
+
+char dst [8];
+
+void test (void)
+{
+  sprintf (dst + 7, "%-s", "1");
+  /* { dg-warning "writing a terminating nul past the end of the destination" "" { target *-*-*-* } 10 }
+     { dg-message "format output 2 bytes into a destination of size 1" "" { target *-*-*-* } 10 }
+    { dg-begin-multiline-output "" }
+   sprintf (dst + 7, "%-s", "1");
+                     ^~~~~
+   sprintf (dst + 7, "%-s", "1");
+   ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+    { dg-end-multiline-output "" } */
+
+  sprintf (dst + 7, "%-s", "abcd");
+  /* { dg-warning ".%-s. directive writing 4 bytes into a region of size 1" "" { target *-*-*-* } 20 }
+     { dg-message "format output 5 bytes into a destination of size 1" "" { target *-*-*-* } 20 }
+    { dg-begin-multiline-output "" }
+   sprintf (dst + 7, "%-s", "abcd");
+                      ^~~   ~~~~~~
+   sprintf (dst + 7, "%-s", "abcd");
+   ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+    { dg-end-multiline-output "" } */
+}