diff mbox

libsanitizer merge from upstream r173241

Message ID 20130211113800.GZ4385@tucnak.redhat.com
State New
Headers show

Commit Message

Jakub Jelinek Feb. 11, 2013, 11:38 a.m. UTC
On Wed, Jan 23, 2013 at 04:24:01PM +0400, Evgeniy Stepanov wrote:
> > > What if glibc adds a scanf hook (like it has already printf hooks), apps
> > > could then register their own stuff and the above would then break.  It
> > > really should be very conservative, and should be checked e.g. with all
> > > glibc's *scanf tests (e.g. stdio-common/scanf[0-9]*.c,
> > > stdio-common/tst-sscanf.c).
> 
> I'll add support for the missing %specs. About the testing, these
> files seem like GPL, so I'd prefer not to look at them at all. We've
> got a smallish test for the scanf implementation in sanitizer_common,
> but it would be really great to run it on full glibc scanf tests.
> Would you be willing to setup such testing gnu-side?

Seems the code in llvm repo looks much better now to me than it used to,
still it breaks on:

#define _GNU_SOURCE 1
#include <stdio.h>
#include <string.h>

int
main ()
{
  char *p;
  long double q;
  if (sscanf ("abcdefghijklmno 1.0", "%ms %Lf", &p, &q) != 2
      || strcmp (p, "abcdefghijklmno") != 0
      || q != 1.0L)
    return 1;
  return 0;
}

because it mishandles %ms.

Attached patch fixes that and also handles %a when possible.
There are cases where it is unambiguous, e.g.
s%Las is s %La s, reading long double, or %ar is %a r, reading
float, other cases where we can check at least something and keep checking
the rest of the format string, e.g. for:
%as
we know it will either store float, or char * pointer, so just assume
conservatively the smaller size, and other cases where we have to punt,
%a[a%d]
(where % appears in the [...] list).

I've tested various glibc *scanf testcases with the GCC libsanitizer
+ enabling of # define SANITIZER_INTERCEPT_SCANF SI_NOT_WINDOWS
+ llvm svn diff between last merge point to GCC and current llvm trunk
+ this patch, and it looked good.

There is one further issue, I'd say you need to pass down to scanf_common
the result from the intercepted call, and use it to see if it is valid
to call strlen on the ptr, or if just 1 should be assumed for SSS_STRLEN.
Note 'n' doesn't count towards that.  Because, it is unsafe to call strlen
on arguments that haven't been assigned yet.  Testcase that still fails:

#define _GNU_SOURCE 1
#include <stdio.h>
#include <string.h>

int
main ()
{
  int a, b, c, d;
  char e[3], f[10];
  memset (e, ' ', sizeof e);
  memset (f, ' ', sizeof f);
  if (sscanf ("1 2 a", "%d%n%n%d %s %s", &a, &b, &c, &d, e, f) != 3
      || a != 1
      || b != 1
      || c != 1
      || d != 2
      || strcmp (e, "a") != 0)
    return 1;
  return 0;
}

Oh, one more thing, on Linux for recent enough glibc's it would be
desirable to also intercept:
__isoc99_sscanf, __isoc99_scanf, __isoc99_vsscanf, __isoc99_fscanf,
__isoc99_vfscanf, __isoc99_vscanf
functions (guard the interception with
#if defined __GLIBC_PREREQ && __GLIBC_PREREQ (2, 7)
).  All these work exactly like the corresponding non-__isoc99_
functions, just %a followed by s, S or [ always writes float, thus
there is no ambiguity.

	Jakub

Comments

Evgenii Stepanov Feb. 11, 2013, noon UTC | #1
On Mon, Feb 11, 2013 at 3:38 PM, Jakub Jelinek <jakub@redhat.com> wrote:
> On Wed, Jan 23, 2013 at 04:24:01PM +0400, Evgeniy Stepanov wrote:
>> > > What if glibc adds a scanf hook (like it has already printf hooks), apps
>> > > could then register their own stuff and the above would then break.  It
>> > > really should be very conservative, and should be checked e.g. with all
>> > > glibc's *scanf tests (e.g. stdio-common/scanf[0-9]*.c,
>> > > stdio-common/tst-sscanf.c).
>>
>> I'll add support for the missing %specs. About the testing, these
>> files seem like GPL, so I'd prefer not to look at them at all. We've
>> got a smallish test for the scanf implementation in sanitizer_common,
>> but it would be really great to run it on full glibc scanf tests.
>> Would you be willing to setup such testing gnu-side?
>
> Seems the code in llvm repo looks much better now to me than it used to,
> still it breaks on:
>
> #define _GNU_SOURCE 1
> #include <stdio.h>
> #include <string.h>
>
> int
> main ()
> {
>   char *p;
>   long double q;
>   if (sscanf ("abcdefghijklmno 1.0", "%ms %Lf", &p, &q) != 2
>       || strcmp (p, "abcdefghijklmno") != 0
>       || q != 1.0L)
>     return 1;
>   return 0;
> }
>
> because it mishandles %ms.
>
> Attached patch fixes that and also handles %a when possible.
> There are cases where it is unambiguous, e.g.
> s%Las is s %La s, reading long double, or %ar is %a r, reading
> float, other cases where we can check at least something and keep checking
> the rest of the format string, e.g. for:
> %as
> we know it will either store float, or char * pointer, so just assume
> conservatively the smaller size, and other cases where we have to punt,
> %a[a%d]
> (where % appears in the [...] list).
>
> I've tested various glibc *scanf testcases with the GCC libsanitizer
> + enabling of # define SANITIZER_INTERCEPT_SCANF SI_NOT_WINDOWS
> + llvm svn diff between last merge point to GCC and current llvm trunk
> + this patch, and it looked good.

Sounds great. I'll commit the patch to compiler-rt if you are ok with that.

> There is one further issue, I'd say you need to pass down to scanf_common
> the result from the intercepted call, and use it to see if it is valid
> to call strlen on the ptr, or if just 1 should be assumed for SSS_STRLEN.
> Note 'n' doesn't count towards that.  Because, it is unsafe to call strlen
> on arguments that haven't been assigned yet.  Testcase that still fails:

Definitely. Also, I'm undecided on this, but we probably don't want to
report access ranges for the arguments that were not written to. On
the other hand, those are potential writes that did not happen due to,
in most cases, user input - we might want to report them anyway.

>
> #define _GNU_SOURCE 1
> #include <stdio.h>
> #include <string.h>
>
> int
> main ()
> {
>   int a, b, c, d;
>   char e[3], f[10];
>   memset (e, ' ', sizeof e);
>   memset (f, ' ', sizeof f);
>   if (sscanf ("1 2 a", "%d%n%n%d %s %s", &a, &b, &c, &d, e, f) != 3
>       || a != 1
>       || b != 1
>       || c != 1
>       || d != 2
>       || strcmp (e, "a") != 0)
>     return 1;
>   return 0;
> }
>
> Oh, one more thing, on Linux for recent enough glibc's it would be
> desirable to also intercept:
> __isoc99_sscanf, __isoc99_scanf, __isoc99_vsscanf, __isoc99_fscanf,
> __isoc99_vfscanf, __isoc99_vscanf
> functions (guard the interception with
> #if defined __GLIBC_PREREQ && __GLIBC_PREREQ (2, 7)
> ).  All these work exactly like the corresponding non-__isoc99_
> functions, just %a followed by s, S or [ always writes float, thus
> there is no ambiguity.
>
>         Jakub
Jakub Jelinek Feb. 11, 2013, 12:11 p.m. UTC | #2
On Mon, Feb 11, 2013 at 04:00:50PM +0400, Evgeniy Stepanov wrote:
> > I've tested various glibc *scanf testcases with the GCC libsanitizer
> > + enabling of # define SANITIZER_INTERCEPT_SCANF SI_NOT_WINDOWS
> > + llvm svn diff between last merge point to GCC and current llvm trunk
> > + this patch, and it looked good.
> 
> Sounds great. I'll commit the patch to compiler-rt if you are ok with that.

Yes, that is fine, thanks.

> > There is one further issue, I'd say you need to pass down to scanf_common
> > the result from the intercepted call, and use it to see if it is valid
> > to call strlen on the ptr, or if just 1 should be assumed for SSS_STRLEN.
> > Note 'n' doesn't count towards that.  Because, it is unsafe to call strlen
> > on arguments that haven't been assigned yet.  Testcase that still fails:
> 
> Definitely. Also, I'm undecided on this, but we probably don't want to
> report access ranges for the arguments that were not written to. On
> the other hand, those are potential writes that did not happen due to,
> in most cases, user input - we might want to report them anyway.

Yeah.  If used with say sscanf, the user might in theory know that
the extra arguments won't be written to, on the other side, it would be
bad programming practice in that case and potential problem if the string
passed is changed.  But, using strlen is certainly wrong.

> > Oh, one more thing, on Linux for recent enough glibc's it would be
> > desirable to also intercept:
> > __isoc99_sscanf, __isoc99_scanf, __isoc99_vsscanf, __isoc99_fscanf,
> > __isoc99_vfscanf, __isoc99_vscanf
> > functions (guard the interception with
> > #if defined __GLIBC_PREREQ && __GLIBC_PREREQ (2, 7)
> > ).  All these work exactly like the corresponding non-__isoc99_
> > functions, just %a followed by s, S or [ always writes float, thus
> > there is no ambiguity.

Note, these calls are used even in -std=c99 compiled code if -D_GNU_SOURCE
isn't used, so it might appear pretty often in various apps.

	Jakub
Jack Howarth Feb. 11, 2013, 1:55 p.m. UTC | #3
On Mon, Feb 11, 2013 at 12:38:00PM +0100, Jakub Jelinek wrote:
> On Wed, Jan 23, 2013 at 04:24:01PM +0400, Evgeniy Stepanov wrote:
> > > > What if glibc adds a scanf hook (like it has already printf hooks), apps
> > > > could then register their own stuff and the above would then break.  It
> > > > really should be very conservative, and should be checked e.g. with all
> > > > glibc's *scanf tests (e.g. stdio-common/scanf[0-9]*.c,
> > > > stdio-common/tst-sscanf.c).
> > 
> > I'll add support for the missing %specs. About the testing, these
> > files seem like GPL, so I'd prefer not to look at them at all. We've
> > got a smallish test for the scanf implementation in sanitizer_common,
> > but it would be really great to run it on full glibc scanf tests.
> > Would you be willing to setup such testing gnu-side?
> 
> Seems the code in llvm repo looks much better now to me than it used to,
> still it breaks on:

Jakub,
  So there is likely to be at least one more remerge in libsanitizer for the
gcc 4.8 release? I saw that Richard felt that PR56128 justified one. FYI,
I'm interested because it would get us the new allocator support on darwin.
       Jack

> 
> #define _GNU_SOURCE 1
> #include <stdio.h>
> #include <string.h>
> 
> int
> main ()
> {
>   char *p;
>   long double q;
>   if (sscanf ("abcdefghijklmno 1.0", "%ms %Lf", &p, &q) != 2
>       || strcmp (p, "abcdefghijklmno") != 0
>       || q != 1.0L)
>     return 1;
>   return 0;
> }
> 
> because it mishandles %ms.
> 
> Attached patch fixes that and also handles %a when possible.
> There are cases where it is unambiguous, e.g.
> s%Las is s %La s, reading long double, or %ar is %a r, reading
> float, other cases where we can check at least something and keep checking
> the rest of the format string, e.g. for:
> %as
> we know it will either store float, or char * pointer, so just assume
> conservatively the smaller size, and other cases where we have to punt,
> %a[a%d]
> (where % appears in the [...] list).
> 
> I've tested various glibc *scanf testcases with the GCC libsanitizer
> + enabling of # define SANITIZER_INTERCEPT_SCANF SI_NOT_WINDOWS
> + llvm svn diff between last merge point to GCC and current llvm trunk
> + this patch, and it looked good.
> 
> There is one further issue, I'd say you need to pass down to scanf_common
> the result from the intercepted call, and use it to see if it is valid
> to call strlen on the ptr, or if just 1 should be assumed for SSS_STRLEN.
> Note 'n' doesn't count towards that.  Because, it is unsafe to call strlen
> on arguments that haven't been assigned yet.  Testcase that still fails:
> 
> #define _GNU_SOURCE 1
> #include <stdio.h>
> #include <string.h>
> 
> int
> main ()
> {
>   int a, b, c, d;
>   char e[3], f[10];
>   memset (e, ' ', sizeof e);
>   memset (f, ' ', sizeof f);
>   if (sscanf ("1 2 a", "%d%n%n%d %s %s", &a, &b, &c, &d, e, f) != 3
>       || a != 1
>       || b != 1
>       || c != 1
>       || d != 2
>       || strcmp (e, "a") != 0)
>     return 1;
>   return 0;
> }
> 
> Oh, one more thing, on Linux for recent enough glibc's it would be
> desirable to also intercept:
> __isoc99_sscanf, __isoc99_scanf, __isoc99_vsscanf, __isoc99_fscanf,
> __isoc99_vfscanf, __isoc99_vscanf
> functions (guard the interception with
> #if defined __GLIBC_PREREQ && __GLIBC_PREREQ (2, 7)
> ).  All these work exactly like the corresponding non-__isoc99_
> functions, just %a followed by s, S or [ always writes float, thus
> there is no ambiguity.
> 
> 	Jakub

> --- sanitizer_common_interceptors_scanf.inc.jj	2013-02-08 13:21:51.000000000 +0100
> +++ sanitizer_common_interceptors_scanf.inc	2013-02-11 12:15:27.575871424 +0100
> @@ -18,11 +18,12 @@
>  
>  struct ScanfDirective {
>    int argIdx;      // argument index, or -1 of not specified ("%n$")
> -  bool suppressed; // suppress assignment ("*")
>    int fieldWidth;
> +  bool suppressed; // suppress assignment ("*")
>    bool allocate; // allocate space ("m")
>    char lengthModifier[2];
>    char convSpecifier;
> +  bool maybeGnuMalloc;
>  };
>  
>  static const char *parse_number(const char *p, int *out) {
> @@ -119,6 +120,31 @@ static const char *scanf_parse_next(cons
>                    // Consume the closing ']'.
>        ++p;
>      }
> +    // This is unfortunately ambiguous between old GNU extension
> +    // of %as, %aS and %a[...] and newer POSIX %a followed by
> +    // letters s, S or [.
> +    if (dir->convSpecifier == 'a' && !dir->lengthModifier[0]) {
> +      if (*p == 's' || *p == 'S') {
> +	dir->maybeGnuMalloc = true;
> +	++p;
> +      } else if (*p == '[') {
> +	// Watch for %a[h-j%d], if % appears in the
> +	// [...] range, then we need to give up, we don't know
> +	// if scanf will parse it as POSIX %a [h-j %d ] or
> +	// GNU allocation of string with range dh-j plus %.
> +	const char *q = p + 1;
> +	if (*q == '^')
> +	  ++q;
> +	if (*q == ']')
> +	  ++q;
> +	while (*q && *q != ']' && *q != '%')
> +	  ++q;
> +	if (*q == 0 || *q == '%')
> +	  return 0;
> +	p = q + 1; // Consume the closing ']'.
> +	dir->maybeGnuMalloc = true;
> +      }
> +    }
>      break;
>    }
>    return p;
> @@ -131,9 +157,7 @@ static bool scanf_is_integer_conv(char c
>  
>  // Returns true if the character is an floating point conversion specifier.
>  static bool scanf_is_float_conv(char c) {
> -  return char_is_one_of(c, "AeEfFgG");
> -  // NOTE: c == 'a' is ambiguous between POSIX and GNU and, therefore,
> -  // unsupported.
> +  return char_is_one_of(c, "aAeEfFgG");
>  }
>  
>  // Returns string output character size for string-like conversions,
> @@ -168,6 +192,21 @@ enum ScanfStoreSize {
>  // Returns the store size of a scanf directive (if >0), or a value of
>  // ScanfStoreSize.
>  static int scanf_get_store_size(ScanfDirective *dir) {
> +  if (dir->allocate) {
> +    if (!char_is_one_of(dir->convSpecifier, "cCsS["))
> +      return SSS_INVALID;
> +    return sizeof(char *);
> +  }
> +
> +  if (dir->maybeGnuMalloc) {
> +    if (dir->convSpecifier != 'a' || dir->lengthModifier[0])
> +      return SSS_INVALID;
> +    // This is ambiguous, so check the smaller size of char * (if it is
> +    // a GNU extension of %as, %aS or %a[...]) and float (if it is
> +    // POSIX %a followed by s, S or [ letters).
> +    return sizeof(char *) < sizeof(float) ? sizeof(char *) : sizeof(float);
> +  }
> +
>    if (scanf_is_integer_conv(dir->convSpecifier)) {
>      switch (dir->lengthModifier[0]) {
>      case 'h':
> @@ -256,10 +295,6 @@ static void scanf_common(void *ctx, cons
>      }
>      if (dir.suppressed)
>        continue;
> -    if (dir.allocate) {
> -      // Unsupported;
> -      continue;
> -    }
>  
>      int size = scanf_get_store_size(&dir);
>      if (size == SSS_INVALID)
Evgenii Stepanov Feb. 12, 2013, 1:28 p.m. UTC | #4
Hey,

seems like that last of the scanf changes are in.
We're intercepting __isoc99_*scanf irrespective of the glibc version,
because (a) it does not hurt (and with the static runtime, even
interceptor itself is thrown out by the linker), and (b) user program
and tool's runtime can be built with different libc versions.

Thanks for the help with scanf testing, we've got much more confidence
in our implementation now.


On Mon, Feb 11, 2013 at 5:55 PM, Jack Howarth <howarth@bromo.med.uc.edu> wrote:
> On Mon, Feb 11, 2013 at 12:38:00PM +0100, Jakub Jelinek wrote:
>> On Wed, Jan 23, 2013 at 04:24:01PM +0400, Evgeniy Stepanov wrote:
>> > > > What if glibc adds a scanf hook (like it has already printf hooks), apps
>> > > > could then register their own stuff and the above would then break.  It
>> > > > really should be very conservative, and should be checked e.g. with all
>> > > > glibc's *scanf tests (e.g. stdio-common/scanf[0-9]*.c,
>> > > > stdio-common/tst-sscanf.c).
>> >
>> > I'll add support for the missing %specs. About the testing, these
>> > files seem like GPL, so I'd prefer not to look at them at all. We've
>> > got a smallish test for the scanf implementation in sanitizer_common,
>> > but it would be really great to run it on full glibc scanf tests.
>> > Would you be willing to setup such testing gnu-side?
>>
>> Seems the code in llvm repo looks much better now to me than it used to,
>> still it breaks on:
>
> Jakub,
>   So there is likely to be at least one more remerge in libsanitizer for the
> gcc 4.8 release? I saw that Richard felt that PR56128 justified one. FYI,
> I'm interested because it would get us the new allocator support on darwin.
>        Jack
>
>>
>> #define _GNU_SOURCE 1
>> #include <stdio.h>
>> #include <string.h>
>>
>> int
>> main ()
>> {
>>   char *p;
>>   long double q;
>>   if (sscanf ("abcdefghijklmno 1.0", "%ms %Lf", &p, &q) != 2
>>       || strcmp (p, "abcdefghijklmno") != 0
>>       || q != 1.0L)
>>     return 1;
>>   return 0;
>> }
>>
>> because it mishandles %ms.
>>
>> Attached patch fixes that and also handles %a when possible.
>> There are cases where it is unambiguous, e.g.
>> s%Las is s %La s, reading long double, or %ar is %a r, reading
>> float, other cases where we can check at least something and keep checking
>> the rest of the format string, e.g. for:
>> %as
>> we know it will either store float, or char * pointer, so just assume
>> conservatively the smaller size, and other cases where we have to punt,
>> %a[a%d]
>> (where % appears in the [...] list).
>>
>> I've tested various glibc *scanf testcases with the GCC libsanitizer
>> + enabling of # define SANITIZER_INTERCEPT_SCANF SI_NOT_WINDOWS
>> + llvm svn diff between last merge point to GCC and current llvm trunk
>> + this patch, and it looked good.
>>
>> There is one further issue, I'd say you need to pass down to scanf_common
>> the result from the intercepted call, and use it to see if it is valid
>> to call strlen on the ptr, or if just 1 should be assumed for SSS_STRLEN.
>> Note 'n' doesn't count towards that.  Because, it is unsafe to call strlen
>> on arguments that haven't been assigned yet.  Testcase that still fails:
>>
>> #define _GNU_SOURCE 1
>> #include <stdio.h>
>> #include <string.h>
>>
>> int
>> main ()
>> {
>>   int a, b, c, d;
>>   char e[3], f[10];
>>   memset (e, ' ', sizeof e);
>>   memset (f, ' ', sizeof f);
>>   if (sscanf ("1 2 a", "%d%n%n%d %s %s", &a, &b, &c, &d, e, f) != 3
>>       || a != 1
>>       || b != 1
>>       || c != 1
>>       || d != 2
>>       || strcmp (e, "a") != 0)
>>     return 1;
>>   return 0;
>> }
>>
>> Oh, one more thing, on Linux for recent enough glibc's it would be
>> desirable to also intercept:
>> __isoc99_sscanf, __isoc99_scanf, __isoc99_vsscanf, __isoc99_fscanf,
>> __isoc99_vfscanf, __isoc99_vscanf
>> functions (guard the interception with
>> #if defined __GLIBC_PREREQ && __GLIBC_PREREQ (2, 7)
>> ).  All these work exactly like the corresponding non-__isoc99_
>> functions, just %a followed by s, S or [ always writes float, thus
>> there is no ambiguity.
>>
>>       Jakub
>
>> --- sanitizer_common_interceptors_scanf.inc.jj        2013-02-08 13:21:51.000000000 +0100
>> +++ sanitizer_common_interceptors_scanf.inc   2013-02-11 12:15:27.575871424 +0100
>> @@ -18,11 +18,12 @@
>>
>>  struct ScanfDirective {
>>    int argIdx;      // argument index, or -1 of not specified ("%n$")
>> -  bool suppressed; // suppress assignment ("*")
>>    int fieldWidth;
>> +  bool suppressed; // suppress assignment ("*")
>>    bool allocate; // allocate space ("m")
>>    char lengthModifier[2];
>>    char convSpecifier;
>> +  bool maybeGnuMalloc;
>>  };
>>
>>  static const char *parse_number(const char *p, int *out) {
>> @@ -119,6 +120,31 @@ static const char *scanf_parse_next(cons
>>                    // Consume the closing ']'.
>>        ++p;
>>      }
>> +    // This is unfortunately ambiguous between old GNU extension
>> +    // of %as, %aS and %a[...] and newer POSIX %a followed by
>> +    // letters s, S or [.
>> +    if (dir->convSpecifier == 'a' && !dir->lengthModifier[0]) {
>> +      if (*p == 's' || *p == 'S') {
>> +     dir->maybeGnuMalloc = true;
>> +     ++p;
>> +      } else if (*p == '[') {
>> +     // Watch for %a[h-j%d], if % appears in the
>> +     // [...] range, then we need to give up, we don't know
>> +     // if scanf will parse it as POSIX %a [h-j %d ] or
>> +     // GNU allocation of string with range dh-j plus %.
>> +     const char *q = p + 1;
>> +     if (*q == '^')
>> +       ++q;
>> +     if (*q == ']')
>> +       ++q;
>> +     while (*q && *q != ']' && *q != '%')
>> +       ++q;
>> +     if (*q == 0 || *q == '%')
>> +       return 0;
>> +     p = q + 1; // Consume the closing ']'.
>> +     dir->maybeGnuMalloc = true;
>> +      }
>> +    }
>>      break;
>>    }
>>    return p;
>> @@ -131,9 +157,7 @@ static bool scanf_is_integer_conv(char c
>>
>>  // Returns true if the character is an floating point conversion specifier.
>>  static bool scanf_is_float_conv(char c) {
>> -  return char_is_one_of(c, "AeEfFgG");
>> -  // NOTE: c == 'a' is ambiguous between POSIX and GNU and, therefore,
>> -  // unsupported.
>> +  return char_is_one_of(c, "aAeEfFgG");
>>  }
>>
>>  // Returns string output character size for string-like conversions,
>> @@ -168,6 +192,21 @@ enum ScanfStoreSize {
>>  // Returns the store size of a scanf directive (if >0), or a value of
>>  // ScanfStoreSize.
>>  static int scanf_get_store_size(ScanfDirective *dir) {
>> +  if (dir->allocate) {
>> +    if (!char_is_one_of(dir->convSpecifier, "cCsS["))
>> +      return SSS_INVALID;
>> +    return sizeof(char *);
>> +  }
>> +
>> +  if (dir->maybeGnuMalloc) {
>> +    if (dir->convSpecifier != 'a' || dir->lengthModifier[0])
>> +      return SSS_INVALID;
>> +    // This is ambiguous, so check the smaller size of char * (if it is
>> +    // a GNU extension of %as, %aS or %a[...]) and float (if it is
>> +    // POSIX %a followed by s, S or [ letters).
>> +    return sizeof(char *) < sizeof(float) ? sizeof(char *) : sizeof(float);
>> +  }
>> +
>>    if (scanf_is_integer_conv(dir->convSpecifier)) {
>>      switch (dir->lengthModifier[0]) {
>>      case 'h':
>> @@ -256,10 +295,6 @@ static void scanf_common(void *ctx, cons
>>      }
>>      if (dir.suppressed)
>>        continue;
>> -    if (dir.allocate) {
>> -      // Unsupported;
>> -      continue;
>> -    }
>>
>>      int size = scanf_get_store_size(&dir);
>>      if (size == SSS_INVALID)
>
diff mbox

Patch

--- sanitizer_common_interceptors_scanf.inc.jj	2013-02-08 13:21:51.000000000 +0100
+++ sanitizer_common_interceptors_scanf.inc	2013-02-11 12:15:27.575871424 +0100
@@ -18,11 +18,12 @@ 
 
 struct ScanfDirective {
   int argIdx;      // argument index, or -1 of not specified ("%n$")
-  bool suppressed; // suppress assignment ("*")
   int fieldWidth;
+  bool suppressed; // suppress assignment ("*")
   bool allocate; // allocate space ("m")
   char lengthModifier[2];
   char convSpecifier;
+  bool maybeGnuMalloc;
 };
 
 static const char *parse_number(const char *p, int *out) {
@@ -119,6 +120,31 @@  static const char *scanf_parse_next(cons
                   // Consume the closing ']'.
       ++p;
     }
+    // This is unfortunately ambiguous between old GNU extension
+    // of %as, %aS and %a[...] and newer POSIX %a followed by
+    // letters s, S or [.
+    if (dir->convSpecifier == 'a' && !dir->lengthModifier[0]) {
+      if (*p == 's' || *p == 'S') {
+	dir->maybeGnuMalloc = true;
+	++p;
+      } else if (*p == '[') {
+	// Watch for %a[h-j%d], if % appears in the
+	// [...] range, then we need to give up, we don't know
+	// if scanf will parse it as POSIX %a [h-j %d ] or
+	// GNU allocation of string with range dh-j plus %.
+	const char *q = p + 1;
+	if (*q == '^')
+	  ++q;
+	if (*q == ']')
+	  ++q;
+	while (*q && *q != ']' && *q != '%')
+	  ++q;
+	if (*q == 0 || *q == '%')
+	  return 0;
+	p = q + 1; // Consume the closing ']'.
+	dir->maybeGnuMalloc = true;
+      }
+    }
     break;
   }
   return p;
@@ -131,9 +157,7 @@  static bool scanf_is_integer_conv(char c
 
 // Returns true if the character is an floating point conversion specifier.
 static bool scanf_is_float_conv(char c) {
-  return char_is_one_of(c, "AeEfFgG");
-  // NOTE: c == 'a' is ambiguous between POSIX and GNU and, therefore,
-  // unsupported.
+  return char_is_one_of(c, "aAeEfFgG");
 }
 
 // Returns string output character size for string-like conversions,
@@ -168,6 +192,21 @@  enum ScanfStoreSize {
 // Returns the store size of a scanf directive (if >0), or a value of
 // ScanfStoreSize.
 static int scanf_get_store_size(ScanfDirective *dir) {
+  if (dir->allocate) {
+    if (!char_is_one_of(dir->convSpecifier, "cCsS["))
+      return SSS_INVALID;
+    return sizeof(char *);
+  }
+
+  if (dir->maybeGnuMalloc) {
+    if (dir->convSpecifier != 'a' || dir->lengthModifier[0])
+      return SSS_INVALID;
+    // This is ambiguous, so check the smaller size of char * (if it is
+    // a GNU extension of %as, %aS or %a[...]) and float (if it is
+    // POSIX %a followed by s, S or [ letters).
+    return sizeof(char *) < sizeof(float) ? sizeof(char *) : sizeof(float);
+  }
+
   if (scanf_is_integer_conv(dir->convSpecifier)) {
     switch (dir->lengthModifier[0]) {
     case 'h':
@@ -256,10 +295,6 @@  static void scanf_common(void *ctx, cons
     }
     if (dir.suppressed)
       continue;
-    if (dir.allocate) {
-      // Unsupported;
-      continue;
-    }
 
     int size = scanf_get_store_size(&dir);
     if (size == SSS_INVALID)