diff mbox

Allocate extra 16 bytes for -fsanitize=address

Message ID 20121123172337.GA26106@gmail.com
State New
Headers show

Commit Message

H.J. Lu Nov. 23, 2012, 5:23 p.m. UTC
Hi,

This patch allocates extra 16 bytes for -fsanitize=address so that
asan won't report read beyond memory buffer. It is used by
bootstrap-asan.  OK to install?

Thanks.


H.J.
---
2012-11-21  H.J. Lu  <hongjiu.lu@intel.com>

	PR bootstrap/55380
	* charset.c (_cpp_convert_input): Allocate extra 16 bytes for
	-fsanitize=address if __SANITIZE_ADDRESS__ is defined.

Comments

Jakub Jelinek Nov. 23, 2012, 5:38 p.m. UTC | #1
On Fri, Nov 23, 2012 at 09:23:37AM -0800, H.J. Lu wrote:
> This patch allocates extra 16 bytes for -fsanitize=address so that
> asan won't report read beyond memory buffer. It is used by
> bootstrap-asan.  OK to install?

As valgrind warns about that too, I'd say we should do that unconditionally,
the additional 16-bytes just aren't a big deal.
But, _cpp_convert_input isn't the only place which needs that, IMHO you want
to also change the caller, read_file_guts, where it does
  buf = XNEWVEC (uchar, size + 1);
and
          buf = XRESIZEVEC (uchar, buf, size + 1);
I'll defer the review to Tom though.

> 2012-11-21  H.J. Lu  <hongjiu.lu@intel.com>
> 
> 	PR bootstrap/55380
> 	* charset.c (_cpp_convert_input): Allocate extra 16 bytes for
> 	-fsanitize=address if __SANITIZE_ADDRESS__ is defined.
> 
> diff --git a/libcpp/charset.c b/libcpp/charset.c
> index cba19a6..dea8bb1 100644
> --- a/libcpp/charset.c
> +++ b/libcpp/charset.c
> @@ -1729,9 +1729,16 @@ _cpp_convert_input (cpp_reader *pfile, const char *input_charset,
>      iconv_close (input_cset.cd);
>  
>    /* Resize buffer if we allocated substantially too much, or if we
> -     haven't enough space for the \n-terminator.  */
> +     haven't enough space for the \n-terminator.  Allocate extra 16
> +     bytes for -fsanitize=address.  */
>    if (to.len + 4096 < to.asize || to.len >= to.asize)
> -    to.text = XRESIZEVEC (uchar, to.text, to.len + 1);
> +    {
> +#ifdef __SANITIZE_ADDRESS__
> +      to.text = XRESIZEVEC (uchar, to.text, to.len + 17);
> +#else
> +      to.text = XRESIZEVEC (uchar, to.text, to.len + 1);
> +#endif
> +    }
>  
>    /* If the file is using old-school Mac line endings (\r only),
>       terminate with another \r, not an \n, so that we do not mistake

	Jakub
H.J. Lu Nov. 23, 2012, 6:08 p.m. UTC | #2
On Fri, Nov 23, 2012 at 9:38 AM, Jakub Jelinek <jakub@redhat.com> wrote:
> On Fri, Nov 23, 2012 at 09:23:37AM -0800, H.J. Lu wrote:
>> This patch allocates extra 16 bytes for -fsanitize=address so that
>> asan won't report read beyond memory buffer. It is used by
>> bootstrap-asan.  OK to install?
>
> As valgrind warns about that too, I'd say we should do that unconditionally,
> the additional 16-bytes just aren't a big deal.

This isn't sufficient for valgrind since valgrind will also report
reading uninitialized data, which requires additional memset
on extra 16 bytes.

> But, _cpp_convert_input isn't the only place which needs that, IMHO you want

This patch is sufficient to compile libcpp with -fsanitize=address.

> to also change the caller, read_file_guts, where it does
>   buf = XNEWVEC (uchar, size + 1);
> and
>           buf = XRESIZEVEC (uchar, buf, size + 1);

I don't think it is necessary since there is no real data in
those extra 16 bytes.  They can have garbage and libcpp still
functions correctly.  They are purely used to silence ASAN.

> I'll defer the review to Tom though.
>
>> 2012-11-21  H.J. Lu  <hongjiu.lu@intel.com>
>>
>>       PR bootstrap/55380
>>       * charset.c (_cpp_convert_input): Allocate extra 16 bytes for
>>       -fsanitize=address if __SANITIZE_ADDRESS__ is defined.
>>
>> diff --git a/libcpp/charset.c b/libcpp/charset.c
>> index cba19a6..dea8bb1 100644
>> --- a/libcpp/charset.c
>> +++ b/libcpp/charset.c
>> @@ -1729,9 +1729,16 @@ _cpp_convert_input (cpp_reader *pfile, const char *input_charset,
>>      iconv_close (input_cset.cd);
>>
>>    /* Resize buffer if we allocated substantially too much, or if we
>> -     haven't enough space for the \n-terminator.  */
>> +     haven't enough space for the \n-terminator.  Allocate extra 16
>> +     bytes for -fsanitize=address.  */
>>    if (to.len + 4096 < to.asize || to.len >= to.asize)
>> -    to.text = XRESIZEVEC (uchar, to.text, to.len + 1);
>> +    {
>> +#ifdef __SANITIZE_ADDRESS__
>> +      to.text = XRESIZEVEC (uchar, to.text, to.len + 17);
>> +#else
>> +      to.text = XRESIZEVEC (uchar, to.text, to.len + 1);
>> +#endif
>> +    }
>>
>>    /* If the file is using old-school Mac line endings (\r only),
>>       terminate with another \r, not an \n, so that we do not mistake
>
>         Jakub
Konstantin Serebryany Nov. 23, 2012, 6:12 p.m. UTC | #3
On Fri, Nov 23, 2012 at 10:08 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
> On Fri, Nov 23, 2012 at 9:38 AM, Jakub Jelinek <jakub@redhat.com> wrote:
>> On Fri, Nov 23, 2012 at 09:23:37AM -0800, H.J. Lu wrote:
>>> This patch allocates extra 16 bytes for -fsanitize=address so that
>>> asan won't report read beyond memory buffer. It is used by
>>> bootstrap-asan.  OK to install?
>>
>> As valgrind warns about that too, I'd say we should do that unconditionally,
>> the additional 16-bytes just aren't a big deal.
>
> This isn't sufficient for valgrind since valgrind will also report
> reading uninitialized data, which requires additional memset
> on extra 16 bytes.

Valgrind should not report reading uninitialized data if that data is
actually not used later (extra bytes are cleared by AND-ing with
zeroes).
If the code was mine, I would simply use (to.len + 17) w/o any conditionals.

--kcc


>
>> But, _cpp_convert_input isn't the only place which needs that, IMHO you want
>
> This patch is sufficient to compile libcpp with -fsanitize=address.
>
>> to also change the caller, read_file_guts, where it does
>>   buf = XNEWVEC (uchar, size + 1);
>> and
>>           buf = XRESIZEVEC (uchar, buf, size + 1);
>
> I don't think it is necessary since there is no real data in
> those extra 16 bytes.  They can have garbage and libcpp still
> functions correctly.  They are purely used to silence ASAN.
>
>> I'll defer the review to Tom though.
>>
>>> 2012-11-21  H.J. Lu  <hongjiu.lu@intel.com>
>>>
>>>       PR bootstrap/55380
>>>       * charset.c (_cpp_convert_input): Allocate extra 16 bytes for
>>>       -fsanitize=address if __SANITIZE_ADDRESS__ is defined.
>>>
>>> diff --git a/libcpp/charset.c b/libcpp/charset.c
>>> index cba19a6..dea8bb1 100644
>>> --- a/libcpp/charset.c
>>> +++ b/libcpp/charset.c
>>> @@ -1729,9 +1729,16 @@ _cpp_convert_input (cpp_reader *pfile, const char *input_charset,
>>>      iconv_close (input_cset.cd);
>>>
>>>    /* Resize buffer if we allocated substantially too much, or if we
>>> -     haven't enough space for the \n-terminator.  */
>>> +     haven't enough space for the \n-terminator.  Allocate extra 16
>>> +     bytes for -fsanitize=address.  */
>>>    if (to.len + 4096 < to.asize || to.len >= to.asize)
>>> -    to.text = XRESIZEVEC (uchar, to.text, to.len + 1);
>>> +    {
>>> +#ifdef __SANITIZE_ADDRESS__
>>> +      to.text = XRESIZEVEC (uchar, to.text, to.len + 17);
>>> +#else
>>> +      to.text = XRESIZEVEC (uchar, to.text, to.len + 1);
>>> +#endif
>>> +    }
>>>
>>>    /* If the file is using old-school Mac line endings (\r only),
>>>       terminate with another \r, not an \n, so that we do not mistake
>>
>>         Jakub
>
>
>
> --
> H.J.
Jakub Jelinek Nov. 23, 2012, 6:12 p.m. UTC | #4
On Fri, Nov 23, 2012 at 10:08:11AM -0800, H.J. Lu wrote:
> > to also change the caller, read_file_guts, where it does
> >   buf = XNEWVEC (uchar, size + 1);
> > and
> >           buf = XRESIZEVEC (uchar, buf, size + 1);
> 
> I don't think it is necessary since there is no real data in
> those extra 16 bytes.  They can have garbage and libcpp still
> functions correctly.  They are purely used to silence ASAN.

The thing is, if the buf from the caller has such size/total
combination that if (to.len + 4096 < to.asize || to.len >= to.asize)
isn't true, there won't be any reallocation and the buffer passed
in from the caller will be used instead.  And, if it doesn't have those
extra 16 bytes, it will still result in asan warning...
Guess you need to read file from stdin instead of a file for that
to trigger that.

For valgrind I bet just clearing those 16 bytes might still be cheap enough
not to worry about it.

> > I'll defer the review to Tom though.
> >
> >> 2012-11-21  H.J. Lu  <hongjiu.lu@intel.com>
> >>
> >>       PR bootstrap/55380
> >>       * charset.c (_cpp_convert_input): Allocate extra 16 bytes for
> >>       -fsanitize=address if __SANITIZE_ADDRESS__ is defined.
> >>
> >> diff --git a/libcpp/charset.c b/libcpp/charset.c
> >> index cba19a6..dea8bb1 100644
> >> --- a/libcpp/charset.c
> >> +++ b/libcpp/charset.c
> >> @@ -1729,9 +1729,16 @@ _cpp_convert_input (cpp_reader *pfile, const char *input_charset,
> >>      iconv_close (input_cset.cd);
> >>
> >>    /* Resize buffer if we allocated substantially too much, or if we
> >> -     haven't enough space for the \n-terminator.  */
> >> +     haven't enough space for the \n-terminator.  Allocate extra 16
> >> +     bytes for -fsanitize=address.  */
> >>    if (to.len + 4096 < to.asize || to.len >= to.asize)
> >> -    to.text = XRESIZEVEC (uchar, to.text, to.len + 1);
> >> +    {
> >> +#ifdef __SANITIZE_ADDRESS__
> >> +      to.text = XRESIZEVEC (uchar, to.text, to.len + 17);
> >> +#else
> >> +      to.text = XRESIZEVEC (uchar, to.text, to.len + 1);
> >> +#endif
> >> +    }
> >>
> >>    /* If the file is using old-school Mac line endings (\r only),
> >>       terminate with another \r, not an \n, so that we do not mistake

	Jakub
H.J. Lu Nov. 23, 2012, 6:17 p.m. UTC | #5
On Fri, Nov 23, 2012 at 10:12 AM, Jakub Jelinek <jakub@redhat.com> wrote:
> On Fri, Nov 23, 2012 at 10:08:11AM -0800, H.J. Lu wrote:
>> > to also change the caller, read_file_guts, where it does
>> >   buf = XNEWVEC (uchar, size + 1);
>> > and
>> >           buf = XRESIZEVEC (uchar, buf, size + 1);
>>
>> I don't think it is necessary since there is no real data in
>> those extra 16 bytes.  They can have garbage and libcpp still
>> functions correctly.  They are purely used to silence ASAN.
>
> The thing is, if the buf from the caller has such size/total
> combination that if (to.len + 4096 < to.asize || to.len >= to.asize)
> isn't true, there won't be any reallocation and the buffer passed
> in from the caller will be used instead.  And, if it doesn't have those
> extra 16 bytes, it will still result in asan warning...
> Guess you need to read file from stdin instead of a file for that
> to trigger that.

I see.  I will double check and update my patch.

> For valgrind I bet just clearing those 16 bytes might still be cheap enough
> not to worry about it.

This is what I did for valgrind:

http://gcc.gnu.org/git/?p=gcc.git;a=blobdiff;f=libcpp/charset.c;h=16a37c36f0cbf7a69c93ae7acb5d79893d57b643;hp=cba19a67178c796dcef1c8c70ac5c43dcbc69071;hb=8ab0e2cd4ae0fae01d0b84d6ccc382acb81ab876;hpb=e5e0d29f8b8ccff799a26fc3e6435af8dbf358fc

   /* Resize buffer if we allocated substantially too much, or if we
-     haven't enough space for the \n-terminator.  */
+     haven't enough space for the \n-terminator.  Allocate and
+     initialize extra 16 bytes for --enable-checking=valgrind.  */
   if (to.len + 4096 < to.asize || to.len >= to.asize)
-    to.text = XRESIZEVEC (uchar, to.text, to.len + 1);
+    {
+#ifdef ENABLE_VALGRIND_CHECKING
+      to.text = XRESIZEVEC (uchar, to.text, to.len + 17);
+      memset (to.text + to.len + 1, 0, 16);
+#else
+      to.text = XRESIZEVEC (uchar, to.text, to.len + 1);
+#endif
+    }

I will check if I need to update it for stdin.

Thanks.
Hans-Peter Nilsson Nov. 24, 2012, 10:52 a.m. UTC | #6
On Fri, 23 Nov 2012, H.J. Lu wrote:
> On Fri, Nov 23, 2012 at 9:38 AM, Jakub Jelinek <jakub@redhat.com> wrote:
> > On Fri, Nov 23, 2012 at 09:23:37AM -0800, H.J. Lu wrote:
> >> This patch allocates extra 16 bytes for -fsanitize=address so that
> >> asan won't report read beyond memory buffer. It is used by
> >> bootstrap-asan.  OK to install?
> >
> > As valgrind warns about that too, I'd say we should do that unconditionally,
> > the additional 16-bytes just aren't a big deal.
>
> This isn't sufficient for valgrind since valgrind will also report
> reading uninitialized data,

Only if that initialized data is actually used in a conditional.

> which requires additional memset
> on extra 16 bytes.

(For plain operations, valgrind just operate on, typically
passing along, the undefinedness.)

Maybe that's what you meant, in which case my comment just
clarifies what I saw as an ambiguity in your statement.

brgds, H-P
diff mbox

Patch

diff --git a/libcpp/charset.c b/libcpp/charset.c
index cba19a6..dea8bb1 100644
--- a/libcpp/charset.c
+++ b/libcpp/charset.c
@@ -1729,9 +1729,16 @@  _cpp_convert_input (cpp_reader *pfile, const char *input_charset,
     iconv_close (input_cset.cd);
 
   /* Resize buffer if we allocated substantially too much, or if we
-     haven't enough space for the \n-terminator.  */
+     haven't enough space for the \n-terminator.  Allocate extra 16
+     bytes for -fsanitize=address.  */
   if (to.len + 4096 < to.asize || to.len >= to.asize)
-    to.text = XRESIZEVEC (uchar, to.text, to.len + 1);
+    {
+#ifdef __SANITIZE_ADDRESS__
+      to.text = XRESIZEVEC (uchar, to.text, to.len + 17);
+#else
+      to.text = XRESIZEVEC (uchar, to.text, to.len + 1);
+#endif
+    }
 
   /* If the file is using old-school Mac line endings (\r only),
      terminate with another \r, not an \n, so that we do not mistake