From patchwork Mon Feb 11 11:38:00 2013 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jakub Jelinek X-Patchwork-Id: 219585 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Received: from sourceware.org (server1.sourceware.org [209.132.180.131]) by ozlabs.org (Postfix) with SMTP id 25DD22C0293 for ; Mon, 11 Feb 2013 22:38:21 +1100 (EST) Comment: DKIM? See http://www.dkim.org DKIM-Signature: v=1; a=rsa-sha1; c=relaxed/relaxed; d=gcc.gnu.org; s=default; x=1361187502; h=Comment: DomainKey-Signature:Received:Received:Received:Received:Received: Received:Received:Date:From:To:Cc:Subject:Message-ID:Reply-To: References:MIME-Version:Content-Type:Content-Disposition: In-Reply-To:User-Agent:Mailing-List:Precedence:List-Id: List-Unsubscribe:List-Archive:List-Post:List-Help:Sender: Delivered-To; bh=H5ZtL0K9xyjbkrJBHWjj8+J1IcM=; b=HHxSn/vhlPCmpOn IOUBbjpbVHvWQy7FR3fvkB8NvYVIiP6HK+xQfXhyaEHbKhQKEmQjZIPVfLIZ70iv 0Z/bE93rpOIhvtjeTEt/lKdZobcqUQxOOhgN5N22cKYr+kdjQPUz6/9EhE4wBW71 YQ+26A6bZM2qgALxcaasnkHPXv7s= Comment: DomainKeys? See http://antispam.yahoo.com/domainkeys DomainKey-Signature: a=rsa-sha1; q=dns; c=nofws; s=default; d=gcc.gnu.org; h=Received:Received:X-SWARE-Spam-Status:X-Spam-Check-By:Received:Received:Received:Received:Received:Date:From:To:Cc:Subject:Message-ID:Reply-To:References:MIME-Version:Content-Type:Content-Disposition:In-Reply-To:User-Agent:X-IsSubscribed:Mailing-List:Precedence:List-Id:List-Unsubscribe:List-Archive:List-Post:List-Help:Sender:Delivered-To; b=G6xS13tAYHlZSLYbCQm/QywnQrhE30F/FLAxvLJ2UqiAyqaQgTfnh5mVh1NlzD WBC4WGmT3bGFi8qtFRli2MyV7RKlOUz1tSKaK5SR8iZlFHWqexEy8TOUBkgfZD1E N9/v9fbS1XiUULOcCNpkc/maTzZZfDLhill6l4k9Mffyw=; Received: (qmail 20686 invoked by alias); 11 Feb 2013 11:38:16 -0000 Received: (qmail 20662 invoked by uid 22791); 11 Feb 2013 11:38:13 -0000 X-SWARE-Spam-Status: No, hits=-6.2 required=5.0 tests=AWL, BAYES_00, KHOP_RCVD_UNTRUST, KHOP_SPAMHAUS_DROP, RCVD_IN_DNSWL_HI, RCVD_IN_HOSTKARMA_W, RP_MATCHES_RCVD, SPF_HELO_PASS, TW_LV, TW_SV X-Spam-Check-By: sourceware.org Received: from mx1.redhat.com (HELO mx1.redhat.com) (209.132.183.28) by sourceware.org (qpsmtpd/0.43rc1) with ESMTP; Mon, 11 Feb 2013 11:38:04 +0000 Received: from int-mx10.intmail.prod.int.phx2.redhat.com (int-mx10.intmail.prod.int.phx2.redhat.com [10.5.11.23]) by mx1.redhat.com (8.14.4/8.14.4) with ESMTP id r1BBc4eZ020573 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=OK); Mon, 11 Feb 2013 06:38:04 -0500 Received: from zalov.redhat.com (vpn1-4-246.ams2.redhat.com [10.36.4.246]) by int-mx10.intmail.prod.int.phx2.redhat.com (8.14.4/8.14.4) with ESMTP id r1BBc2wE031972 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Mon, 11 Feb 2013 06:38:03 -0500 Received: from zalov.cz (localhost [127.0.0.1]) by zalov.redhat.com (8.14.5/8.14.5) with ESMTP id r1BBc1os014809; Mon, 11 Feb 2013 12:38:01 +0100 Received: (from jakub@localhost) by zalov.cz (8.14.5/8.14.5/Submit) id r1BBc13d014808; Mon, 11 Feb 2013 12:38:01 +0100 Date: Mon, 11 Feb 2013 12:38:00 +0100 From: Jakub Jelinek To: Evgeniy Stepanov Cc: Konstantin Serebryany , GCC Patches , Dodji Seketeli , Dmitry Vyukov Subject: Re: libsanitizer merge from upstream r173241 Message-ID: <20130211113800.GZ4385@tucnak.redhat.com> Reply-To: Jakub Jelinek References: <20130123111352.GD7269@tucnak.redhat.com> MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.5.21 (2010-09-15) X-IsSubscribed: yes Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Unsubscribe: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Delivered-To: mailing list gcc-patches@gcc.gnu.org On Wed, Jan 23, 2013 at 04:24:01PM +0400, Evgeniy Stepanov wrote: > > > What if glibc adds a scanf hook (like it has already printf hooks), apps > > > could then register their own stuff and the above would then break. It > > > really should be very conservative, and should be checked e.g. with all > > > glibc's *scanf tests (e.g. stdio-common/scanf[0-9]*.c, > > > stdio-common/tst-sscanf.c). > > I'll add support for the missing %specs. About the testing, these > files seem like GPL, so I'd prefer not to look at them at all. We've > got a smallish test for the scanf implementation in sanitizer_common, > but it would be really great to run it on full glibc scanf tests. > Would you be willing to setup such testing gnu-side? Seems the code in llvm repo looks much better now to me than it used to, still it breaks on: #define _GNU_SOURCE 1 #include #include int main () { char *p; long double q; if (sscanf ("abcdefghijklmno 1.0", "%ms %Lf", &p, &q) != 2 || strcmp (p, "abcdefghijklmno") != 0 || q != 1.0L) return 1; return 0; } because it mishandles %ms. Attached patch fixes that and also handles %a when possible. There are cases where it is unambiguous, e.g. s%Las is s %La s, reading long double, or %ar is %a r, reading float, other cases where we can check at least something and keep checking the rest of the format string, e.g. for: %as we know it will either store float, or char * pointer, so just assume conservatively the smaller size, and other cases where we have to punt, %a[a%d] (where % appears in the [...] list). I've tested various glibc *scanf testcases with the GCC libsanitizer + enabling of # define SANITIZER_INTERCEPT_SCANF SI_NOT_WINDOWS + llvm svn diff between last merge point to GCC and current llvm trunk + this patch, and it looked good. There is one further issue, I'd say you need to pass down to scanf_common the result from the intercepted call, and use it to see if it is valid to call strlen on the ptr, or if just 1 should be assumed for SSS_STRLEN. Note 'n' doesn't count towards that. Because, it is unsafe to call strlen on arguments that haven't been assigned yet. Testcase that still fails: #define _GNU_SOURCE 1 #include #include int main () { int a, b, c, d; char e[3], f[10]; memset (e, ' ', sizeof e); memset (f, ' ', sizeof f); if (sscanf ("1 2 a", "%d%n%n%d %s %s", &a, &b, &c, &d, e, f) != 3 || a != 1 || b != 1 || c != 1 || d != 2 || strcmp (e, "a") != 0) return 1; return 0; } Oh, one more thing, on Linux for recent enough glibc's it would be desirable to also intercept: __isoc99_sscanf, __isoc99_scanf, __isoc99_vsscanf, __isoc99_fscanf, __isoc99_vfscanf, __isoc99_vscanf functions (guard the interception with #if defined __GLIBC_PREREQ && __GLIBC_PREREQ (2, 7) ). All these work exactly like the corresponding non-__isoc99_ functions, just %a followed by s, S or [ always writes float, thus there is no ambiguity. Jakub --- sanitizer_common_interceptors_scanf.inc.jj 2013-02-08 13:21:51.000000000 +0100 +++ sanitizer_common_interceptors_scanf.inc 2013-02-11 12:15:27.575871424 +0100 @@ -18,11 +18,12 @@ struct ScanfDirective { int argIdx; // argument index, or -1 of not specified ("%n$") - bool suppressed; // suppress assignment ("*") int fieldWidth; + bool suppressed; // suppress assignment ("*") bool allocate; // allocate space ("m") char lengthModifier[2]; char convSpecifier; + bool maybeGnuMalloc; }; static const char *parse_number(const char *p, int *out) { @@ -119,6 +120,31 @@ static const char *scanf_parse_next(cons // Consume the closing ']'. ++p; } + // This is unfortunately ambiguous between old GNU extension + // of %as, %aS and %a[...] and newer POSIX %a followed by + // letters s, S or [. + if (dir->convSpecifier == 'a' && !dir->lengthModifier[0]) { + if (*p == 's' || *p == 'S') { + dir->maybeGnuMalloc = true; + ++p; + } else if (*p == '[') { + // Watch for %a[h-j%d], if % appears in the + // [...] range, then we need to give up, we don't know + // if scanf will parse it as POSIX %a [h-j %d ] or + // GNU allocation of string with range dh-j plus %. + const char *q = p + 1; + if (*q == '^') + ++q; + if (*q == ']') + ++q; + while (*q && *q != ']' && *q != '%') + ++q; + if (*q == 0 || *q == '%') + return 0; + p = q + 1; // Consume the closing ']'. + dir->maybeGnuMalloc = true; + } + } break; } return p; @@ -131,9 +157,7 @@ static bool scanf_is_integer_conv(char c // Returns true if the character is an floating point conversion specifier. static bool scanf_is_float_conv(char c) { - return char_is_one_of(c, "AeEfFgG"); - // NOTE: c == 'a' is ambiguous between POSIX and GNU and, therefore, - // unsupported. + return char_is_one_of(c, "aAeEfFgG"); } // Returns string output character size for string-like conversions, @@ -168,6 +192,21 @@ enum ScanfStoreSize { // Returns the store size of a scanf directive (if >0), or a value of // ScanfStoreSize. static int scanf_get_store_size(ScanfDirective *dir) { + if (dir->allocate) { + if (!char_is_one_of(dir->convSpecifier, "cCsS[")) + return SSS_INVALID; + return sizeof(char *); + } + + if (dir->maybeGnuMalloc) { + if (dir->convSpecifier != 'a' || dir->lengthModifier[0]) + return SSS_INVALID; + // This is ambiguous, so check the smaller size of char * (if it is + // a GNU extension of %as, %aS or %a[...]) and float (if it is + // POSIX %a followed by s, S or [ letters). + return sizeof(char *) < sizeof(float) ? sizeof(char *) : sizeof(float); + } + if (scanf_is_integer_conv(dir->convSpecifier)) { switch (dir->lengthModifier[0]) { case 'h': @@ -256,10 +295,6 @@ static void scanf_common(void *ctx, cons } if (dir.suppressed) continue; - if (dir.allocate) { - // Unsupported; - continue; - } int size = scanf_get_store_size(&dir); if (size == SSS_INVALID)