Message ID | 20130211113800.GZ4385@tucnak.redhat.com |
---|---|
State | New |
Headers | show |
On Mon, Feb 11, 2013 at 3:38 PM, Jakub Jelinek <jakub@redhat.com> wrote: > On Wed, Jan 23, 2013 at 04:24:01PM +0400, Evgeniy Stepanov wrote: >> > > What if glibc adds a scanf hook (like it has already printf hooks), apps >> > > could then register their own stuff and the above would then break. It >> > > really should be very conservative, and should be checked e.g. with all >> > > glibc's *scanf tests (e.g. stdio-common/scanf[0-9]*.c, >> > > stdio-common/tst-sscanf.c). >> >> I'll add support for the missing %specs. About the testing, these >> files seem like GPL, so I'd prefer not to look at them at all. We've >> got a smallish test for the scanf implementation in sanitizer_common, >> but it would be really great to run it on full glibc scanf tests. >> Would you be willing to setup such testing gnu-side? > > Seems the code in llvm repo looks much better now to me than it used to, > still it breaks on: > > #define _GNU_SOURCE 1 > #include <stdio.h> > #include <string.h> > > int > main () > { > char *p; > long double q; > if (sscanf ("abcdefghijklmno 1.0", "%ms %Lf", &p, &q) != 2 > || strcmp (p, "abcdefghijklmno") != 0 > || q != 1.0L) > return 1; > return 0; > } > > because it mishandles %ms. > > Attached patch fixes that and also handles %a when possible. > There are cases where it is unambiguous, e.g. > s%Las is s %La s, reading long double, or %ar is %a r, reading > float, other cases where we can check at least something and keep checking > the rest of the format string, e.g. for: > %as > we know it will either store float, or char * pointer, so just assume > conservatively the smaller size, and other cases where we have to punt, > %a[a%d] > (where % appears in the [...] list). > > I've tested various glibc *scanf testcases with the GCC libsanitizer > + enabling of # define SANITIZER_INTERCEPT_SCANF SI_NOT_WINDOWS > + llvm svn diff between last merge point to GCC and current llvm trunk > + this patch, and it looked good. Sounds great. I'll commit the patch to compiler-rt if you are ok with that. > There is one further issue, I'd say you need to pass down to scanf_common > the result from the intercepted call, and use it to see if it is valid > to call strlen on the ptr, or if just 1 should be assumed for SSS_STRLEN. > Note 'n' doesn't count towards that. Because, it is unsafe to call strlen > on arguments that haven't been assigned yet. Testcase that still fails: Definitely. Also, I'm undecided on this, but we probably don't want to report access ranges for the arguments that were not written to. On the other hand, those are potential writes that did not happen due to, in most cases, user input - we might want to report them anyway. > > #define _GNU_SOURCE 1 > #include <stdio.h> > #include <string.h> > > int > main () > { > int a, b, c, d; > char e[3], f[10]; > memset (e, ' ', sizeof e); > memset (f, ' ', sizeof f); > if (sscanf ("1 2 a", "%d%n%n%d %s %s", &a, &b, &c, &d, e, f) != 3 > || a != 1 > || b != 1 > || c != 1 > || d != 2 > || strcmp (e, "a") != 0) > return 1; > return 0; > } > > Oh, one more thing, on Linux for recent enough glibc's it would be > desirable to also intercept: > __isoc99_sscanf, __isoc99_scanf, __isoc99_vsscanf, __isoc99_fscanf, > __isoc99_vfscanf, __isoc99_vscanf > functions (guard the interception with > #if defined __GLIBC_PREREQ && __GLIBC_PREREQ (2, 7) > ). All these work exactly like the corresponding non-__isoc99_ > functions, just %a followed by s, S or [ always writes float, thus > there is no ambiguity. > > Jakub
On Mon, Feb 11, 2013 at 04:00:50PM +0400, Evgeniy Stepanov wrote: > > I've tested various glibc *scanf testcases with the GCC libsanitizer > > + enabling of # define SANITIZER_INTERCEPT_SCANF SI_NOT_WINDOWS > > + llvm svn diff between last merge point to GCC and current llvm trunk > > + this patch, and it looked good. > > Sounds great. I'll commit the patch to compiler-rt if you are ok with that. Yes, that is fine, thanks. > > There is one further issue, I'd say you need to pass down to scanf_common > > the result from the intercepted call, and use it to see if it is valid > > to call strlen on the ptr, or if just 1 should be assumed for SSS_STRLEN. > > Note 'n' doesn't count towards that. Because, it is unsafe to call strlen > > on arguments that haven't been assigned yet. Testcase that still fails: > > Definitely. Also, I'm undecided on this, but we probably don't want to > report access ranges for the arguments that were not written to. On > the other hand, those are potential writes that did not happen due to, > in most cases, user input - we might want to report them anyway. Yeah. If used with say sscanf, the user might in theory know that the extra arguments won't be written to, on the other side, it would be bad programming practice in that case and potential problem if the string passed is changed. But, using strlen is certainly wrong. > > Oh, one more thing, on Linux for recent enough glibc's it would be > > desirable to also intercept: > > __isoc99_sscanf, __isoc99_scanf, __isoc99_vsscanf, __isoc99_fscanf, > > __isoc99_vfscanf, __isoc99_vscanf > > functions (guard the interception with > > #if defined __GLIBC_PREREQ && __GLIBC_PREREQ (2, 7) > > ). All these work exactly like the corresponding non-__isoc99_ > > functions, just %a followed by s, S or [ always writes float, thus > > there is no ambiguity. Note, these calls are used even in -std=c99 compiled code if -D_GNU_SOURCE isn't used, so it might appear pretty often in various apps. Jakub
On Mon, Feb 11, 2013 at 12:38:00PM +0100, Jakub Jelinek wrote: > On Wed, Jan 23, 2013 at 04:24:01PM +0400, Evgeniy Stepanov wrote: > > > > What if glibc adds a scanf hook (like it has already printf hooks), apps > > > > could then register their own stuff and the above would then break. It > > > > really should be very conservative, and should be checked e.g. with all > > > > glibc's *scanf tests (e.g. stdio-common/scanf[0-9]*.c, > > > > stdio-common/tst-sscanf.c). > > > > I'll add support for the missing %specs. About the testing, these > > files seem like GPL, so I'd prefer not to look at them at all. We've > > got a smallish test for the scanf implementation in sanitizer_common, > > but it would be really great to run it on full glibc scanf tests. > > Would you be willing to setup such testing gnu-side? > > Seems the code in llvm repo looks much better now to me than it used to, > still it breaks on: Jakub, So there is likely to be at least one more remerge in libsanitizer for the gcc 4.8 release? I saw that Richard felt that PR56128 justified one. FYI, I'm interested because it would get us the new allocator support on darwin. Jack > > #define _GNU_SOURCE 1 > #include <stdio.h> > #include <string.h> > > int > main () > { > char *p; > long double q; > if (sscanf ("abcdefghijklmno 1.0", "%ms %Lf", &p, &q) != 2 > || strcmp (p, "abcdefghijklmno") != 0 > || q != 1.0L) > return 1; > return 0; > } > > because it mishandles %ms. > > Attached patch fixes that and also handles %a when possible. > There are cases where it is unambiguous, e.g. > s%Las is s %La s, reading long double, or %ar is %a r, reading > float, other cases where we can check at least something and keep checking > the rest of the format string, e.g. for: > %as > we know it will either store float, or char * pointer, so just assume > conservatively the smaller size, and other cases where we have to punt, > %a[a%d] > (where % appears in the [...] list). > > I've tested various glibc *scanf testcases with the GCC libsanitizer > + enabling of # define SANITIZER_INTERCEPT_SCANF SI_NOT_WINDOWS > + llvm svn diff between last merge point to GCC and current llvm trunk > + this patch, and it looked good. > > There is one further issue, I'd say you need to pass down to scanf_common > the result from the intercepted call, and use it to see if it is valid > to call strlen on the ptr, or if just 1 should be assumed for SSS_STRLEN. > Note 'n' doesn't count towards that. Because, it is unsafe to call strlen > on arguments that haven't been assigned yet. Testcase that still fails: > > #define _GNU_SOURCE 1 > #include <stdio.h> > #include <string.h> > > int > main () > { > int a, b, c, d; > char e[3], f[10]; > memset (e, ' ', sizeof e); > memset (f, ' ', sizeof f); > if (sscanf ("1 2 a", "%d%n%n%d %s %s", &a, &b, &c, &d, e, f) != 3 > || a != 1 > || b != 1 > || c != 1 > || d != 2 > || strcmp (e, "a") != 0) > return 1; > return 0; > } > > Oh, one more thing, on Linux for recent enough glibc's it would be > desirable to also intercept: > __isoc99_sscanf, __isoc99_scanf, __isoc99_vsscanf, __isoc99_fscanf, > __isoc99_vfscanf, __isoc99_vscanf > functions (guard the interception with > #if defined __GLIBC_PREREQ && __GLIBC_PREREQ (2, 7) > ). All these work exactly like the corresponding non-__isoc99_ > functions, just %a followed by s, S or [ always writes float, thus > there is no ambiguity. > > Jakub > --- sanitizer_common_interceptors_scanf.inc.jj 2013-02-08 13:21:51.000000000 +0100 > +++ sanitizer_common_interceptors_scanf.inc 2013-02-11 12:15:27.575871424 +0100 > @@ -18,11 +18,12 @@ > > struct ScanfDirective { > int argIdx; // argument index, or -1 of not specified ("%n$") > - bool suppressed; // suppress assignment ("*") > int fieldWidth; > + bool suppressed; // suppress assignment ("*") > bool allocate; // allocate space ("m") > char lengthModifier[2]; > char convSpecifier; > + bool maybeGnuMalloc; > }; > > static const char *parse_number(const char *p, int *out) { > @@ -119,6 +120,31 @@ static const char *scanf_parse_next(cons > // Consume the closing ']'. > ++p; > } > + // This is unfortunately ambiguous between old GNU extension > + // of %as, %aS and %a[...] and newer POSIX %a followed by > + // letters s, S or [. > + if (dir->convSpecifier == 'a' && !dir->lengthModifier[0]) { > + if (*p == 's' || *p == 'S') { > + dir->maybeGnuMalloc = true; > + ++p; > + } else if (*p == '[') { > + // Watch for %a[h-j%d], if % appears in the > + // [...] range, then we need to give up, we don't know > + // if scanf will parse it as POSIX %a [h-j %d ] or > + // GNU allocation of string with range dh-j plus %. > + const char *q = p + 1; > + if (*q == '^') > + ++q; > + if (*q == ']') > + ++q; > + while (*q && *q != ']' && *q != '%') > + ++q; > + if (*q == 0 || *q == '%') > + return 0; > + p = q + 1; // Consume the closing ']'. > + dir->maybeGnuMalloc = true; > + } > + } > break; > } > return p; > @@ -131,9 +157,7 @@ static bool scanf_is_integer_conv(char c > > // Returns true if the character is an floating point conversion specifier. > static bool scanf_is_float_conv(char c) { > - return char_is_one_of(c, "AeEfFgG"); > - // NOTE: c == 'a' is ambiguous between POSIX and GNU and, therefore, > - // unsupported. > + return char_is_one_of(c, "aAeEfFgG"); > } > > // Returns string output character size for string-like conversions, > @@ -168,6 +192,21 @@ enum ScanfStoreSize { > // Returns the store size of a scanf directive (if >0), or a value of > // ScanfStoreSize. > static int scanf_get_store_size(ScanfDirective *dir) { > + if (dir->allocate) { > + if (!char_is_one_of(dir->convSpecifier, "cCsS[")) > + return SSS_INVALID; > + return sizeof(char *); > + } > + > + if (dir->maybeGnuMalloc) { > + if (dir->convSpecifier != 'a' || dir->lengthModifier[0]) > + return SSS_INVALID; > + // This is ambiguous, so check the smaller size of char * (if it is > + // a GNU extension of %as, %aS or %a[...]) and float (if it is > + // POSIX %a followed by s, S or [ letters). > + return sizeof(char *) < sizeof(float) ? sizeof(char *) : sizeof(float); > + } > + > if (scanf_is_integer_conv(dir->convSpecifier)) { > switch (dir->lengthModifier[0]) { > case 'h': > @@ -256,10 +295,6 @@ static void scanf_common(void *ctx, cons > } > if (dir.suppressed) > continue; > - if (dir.allocate) { > - // Unsupported; > - continue; > - } > > int size = scanf_get_store_size(&dir); > if (size == SSS_INVALID)
Hey, seems like that last of the scanf changes are in. We're intercepting __isoc99_*scanf irrespective of the glibc version, because (a) it does not hurt (and with the static runtime, even interceptor itself is thrown out by the linker), and (b) user program and tool's runtime can be built with different libc versions. Thanks for the help with scanf testing, we've got much more confidence in our implementation now. On Mon, Feb 11, 2013 at 5:55 PM, Jack Howarth <howarth@bromo.med.uc.edu> wrote: > On Mon, Feb 11, 2013 at 12:38:00PM +0100, Jakub Jelinek wrote: >> On Wed, Jan 23, 2013 at 04:24:01PM +0400, Evgeniy Stepanov wrote: >> > > > What if glibc adds a scanf hook (like it has already printf hooks), apps >> > > > could then register their own stuff and the above would then break. It >> > > > really should be very conservative, and should be checked e.g. with all >> > > > glibc's *scanf tests (e.g. stdio-common/scanf[0-9]*.c, >> > > > stdio-common/tst-sscanf.c). >> > >> > I'll add support for the missing %specs. About the testing, these >> > files seem like GPL, so I'd prefer not to look at them at all. We've >> > got a smallish test for the scanf implementation in sanitizer_common, >> > but it would be really great to run it on full glibc scanf tests. >> > Would you be willing to setup such testing gnu-side? >> >> Seems the code in llvm repo looks much better now to me than it used to, >> still it breaks on: > > Jakub, > So there is likely to be at least one more remerge in libsanitizer for the > gcc 4.8 release? I saw that Richard felt that PR56128 justified one. FYI, > I'm interested because it would get us the new allocator support on darwin. > Jack > >> >> #define _GNU_SOURCE 1 >> #include <stdio.h> >> #include <string.h> >> >> int >> main () >> { >> char *p; >> long double q; >> if (sscanf ("abcdefghijklmno 1.0", "%ms %Lf", &p, &q) != 2 >> || strcmp (p, "abcdefghijklmno") != 0 >> || q != 1.0L) >> return 1; >> return 0; >> } >> >> because it mishandles %ms. >> >> Attached patch fixes that and also handles %a when possible. >> There are cases where it is unambiguous, e.g. >> s%Las is s %La s, reading long double, or %ar is %a r, reading >> float, other cases where we can check at least something and keep checking >> the rest of the format string, e.g. for: >> %as >> we know it will either store float, or char * pointer, so just assume >> conservatively the smaller size, and other cases where we have to punt, >> %a[a%d] >> (where % appears in the [...] list). >> >> I've tested various glibc *scanf testcases with the GCC libsanitizer >> + enabling of # define SANITIZER_INTERCEPT_SCANF SI_NOT_WINDOWS >> + llvm svn diff between last merge point to GCC and current llvm trunk >> + this patch, and it looked good. >> >> There is one further issue, I'd say you need to pass down to scanf_common >> the result from the intercepted call, and use it to see if it is valid >> to call strlen on the ptr, or if just 1 should be assumed for SSS_STRLEN. >> Note 'n' doesn't count towards that. Because, it is unsafe to call strlen >> on arguments that haven't been assigned yet. Testcase that still fails: >> >> #define _GNU_SOURCE 1 >> #include <stdio.h> >> #include <string.h> >> >> int >> main () >> { >> int a, b, c, d; >> char e[3], f[10]; >> memset (e, ' ', sizeof e); >> memset (f, ' ', sizeof f); >> if (sscanf ("1 2 a", "%d%n%n%d %s %s", &a, &b, &c, &d, e, f) != 3 >> || a != 1 >> || b != 1 >> || c != 1 >> || d != 2 >> || strcmp (e, "a") != 0) >> return 1; >> return 0; >> } >> >> Oh, one more thing, on Linux for recent enough glibc's it would be >> desirable to also intercept: >> __isoc99_sscanf, __isoc99_scanf, __isoc99_vsscanf, __isoc99_fscanf, >> __isoc99_vfscanf, __isoc99_vscanf >> functions (guard the interception with >> #if defined __GLIBC_PREREQ && __GLIBC_PREREQ (2, 7) >> ). All these work exactly like the corresponding non-__isoc99_ >> functions, just %a followed by s, S or [ always writes float, thus >> there is no ambiguity. >> >> Jakub > >> --- sanitizer_common_interceptors_scanf.inc.jj 2013-02-08 13:21:51.000000000 +0100 >> +++ sanitizer_common_interceptors_scanf.inc 2013-02-11 12:15:27.575871424 +0100 >> @@ -18,11 +18,12 @@ >> >> struct ScanfDirective { >> int argIdx; // argument index, or -1 of not specified ("%n$") >> - bool suppressed; // suppress assignment ("*") >> int fieldWidth; >> + bool suppressed; // suppress assignment ("*") >> bool allocate; // allocate space ("m") >> char lengthModifier[2]; >> char convSpecifier; >> + bool maybeGnuMalloc; >> }; >> >> static const char *parse_number(const char *p, int *out) { >> @@ -119,6 +120,31 @@ static const char *scanf_parse_next(cons >> // Consume the closing ']'. >> ++p; >> } >> + // This is unfortunately ambiguous between old GNU extension >> + // of %as, %aS and %a[...] and newer POSIX %a followed by >> + // letters s, S or [. >> + if (dir->convSpecifier == 'a' && !dir->lengthModifier[0]) { >> + if (*p == 's' || *p == 'S') { >> + dir->maybeGnuMalloc = true; >> + ++p; >> + } else if (*p == '[') { >> + // Watch for %a[h-j%d], if % appears in the >> + // [...] range, then we need to give up, we don't know >> + // if scanf will parse it as POSIX %a [h-j %d ] or >> + // GNU allocation of string with range dh-j plus %. >> + const char *q = p + 1; >> + if (*q == '^') >> + ++q; >> + if (*q == ']') >> + ++q; >> + while (*q && *q != ']' && *q != '%') >> + ++q; >> + if (*q == 0 || *q == '%') >> + return 0; >> + p = q + 1; // Consume the closing ']'. >> + dir->maybeGnuMalloc = true; >> + } >> + } >> break; >> } >> return p; >> @@ -131,9 +157,7 @@ static bool scanf_is_integer_conv(char c >> >> // Returns true if the character is an floating point conversion specifier. >> static bool scanf_is_float_conv(char c) { >> - return char_is_one_of(c, "AeEfFgG"); >> - // NOTE: c == 'a' is ambiguous between POSIX and GNU and, therefore, >> - // unsupported. >> + return char_is_one_of(c, "aAeEfFgG"); >> } >> >> // Returns string output character size for string-like conversions, >> @@ -168,6 +192,21 @@ enum ScanfStoreSize { >> // Returns the store size of a scanf directive (if >0), or a value of >> // ScanfStoreSize. >> static int scanf_get_store_size(ScanfDirective *dir) { >> + if (dir->allocate) { >> + if (!char_is_one_of(dir->convSpecifier, "cCsS[")) >> + return SSS_INVALID; >> + return sizeof(char *); >> + } >> + >> + if (dir->maybeGnuMalloc) { >> + if (dir->convSpecifier != 'a' || dir->lengthModifier[0]) >> + return SSS_INVALID; >> + // This is ambiguous, so check the smaller size of char * (if it is >> + // a GNU extension of %as, %aS or %a[...]) and float (if it is >> + // POSIX %a followed by s, S or [ letters). >> + return sizeof(char *) < sizeof(float) ? sizeof(char *) : sizeof(float); >> + } >> + >> if (scanf_is_integer_conv(dir->convSpecifier)) { >> switch (dir->lengthModifier[0]) { >> case 'h': >> @@ -256,10 +295,6 @@ static void scanf_common(void *ctx, cons >> } >> if (dir.suppressed) >> continue; >> - if (dir.allocate) { >> - // Unsupported; >> - continue; >> - } >> >> int size = scanf_get_store_size(&dir); >> if (size == SSS_INVALID) >
--- sanitizer_common_interceptors_scanf.inc.jj 2013-02-08 13:21:51.000000000 +0100 +++ sanitizer_common_interceptors_scanf.inc 2013-02-11 12:15:27.575871424 +0100 @@ -18,11 +18,12 @@ struct ScanfDirective { int argIdx; // argument index, or -1 of not specified ("%n$") - bool suppressed; // suppress assignment ("*") int fieldWidth; + bool suppressed; // suppress assignment ("*") bool allocate; // allocate space ("m") char lengthModifier[2]; char convSpecifier; + bool maybeGnuMalloc; }; static const char *parse_number(const char *p, int *out) { @@ -119,6 +120,31 @@ static const char *scanf_parse_next(cons // Consume the closing ']'. ++p; } + // This is unfortunately ambiguous between old GNU extension + // of %as, %aS and %a[...] and newer POSIX %a followed by + // letters s, S or [. + if (dir->convSpecifier == 'a' && !dir->lengthModifier[0]) { + if (*p == 's' || *p == 'S') { + dir->maybeGnuMalloc = true; + ++p; + } else if (*p == '[') { + // Watch for %a[h-j%d], if % appears in the + // [...] range, then we need to give up, we don't know + // if scanf will parse it as POSIX %a [h-j %d ] or + // GNU allocation of string with range dh-j plus %. + const char *q = p + 1; + if (*q == '^') + ++q; + if (*q == ']') + ++q; + while (*q && *q != ']' && *q != '%') + ++q; + if (*q == 0 || *q == '%') + return 0; + p = q + 1; // Consume the closing ']'. + dir->maybeGnuMalloc = true; + } + } break; } return p; @@ -131,9 +157,7 @@ static bool scanf_is_integer_conv(char c // Returns true if the character is an floating point conversion specifier. static bool scanf_is_float_conv(char c) { - return char_is_one_of(c, "AeEfFgG"); - // NOTE: c == 'a' is ambiguous between POSIX and GNU and, therefore, - // unsupported. + return char_is_one_of(c, "aAeEfFgG"); } // Returns string output character size for string-like conversions, @@ -168,6 +192,21 @@ enum ScanfStoreSize { // Returns the store size of a scanf directive (if >0), or a value of // ScanfStoreSize. static int scanf_get_store_size(ScanfDirective *dir) { + if (dir->allocate) { + if (!char_is_one_of(dir->convSpecifier, "cCsS[")) + return SSS_INVALID; + return sizeof(char *); + } + + if (dir->maybeGnuMalloc) { + if (dir->convSpecifier != 'a' || dir->lengthModifier[0]) + return SSS_INVALID; + // This is ambiguous, so check the smaller size of char * (if it is + // a GNU extension of %as, %aS or %a[...]) and float (if it is + // POSIX %a followed by s, S or [ letters). + return sizeof(char *) < sizeof(float) ? sizeof(char *) : sizeof(float); + } + if (scanf_is_integer_conv(dir->convSpecifier)) { switch (dir->lengthModifier[0]) { case 'h': @@ -256,10 +295,6 @@ static void scanf_common(void *ctx, cons } if (dir.suppressed) continue; - if (dir.allocate) { - // Unsupported; - continue; - } int size = scanf_get_store_size(&dir); if (size == SSS_INVALID)