From patchwork Sat Aug 4 18:46:46 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Martin Sebor X-Patchwork-Id: 953476 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (mailfrom) smtp.mailfrom=gcc.gnu.org (client-ip=209.132.180.131; helo=sourceware.org; envelope-from=gcc-patches-return-483172-incoming=patchwork.ozlabs.org@gcc.gnu.org; receiver=) Authentication-Results: ozlabs.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: ozlabs.org; dkim=pass (1024-bit key; unprotected) header.d=gcc.gnu.org header.i=@gcc.gnu.org header.b="rjbsTqs2"; dkim=pass (2048-bit key; unprotected) header.d=gmail.com header.i=@gmail.com header.b="dRJNAUps"; dkim-atps=neutral Received: from sourceware.org (server1.sourceware.org [209.132.180.131]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 41jXvf6rrfz9sXZ for ; Sun, 5 Aug 2018 04:47:07 +1000 (AEST) DomainKey-Signature: a=rsa-sha1; c=nofws; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender:to :from:subject:message-id:date:mime-version:content-type; q=dns; s=default; b=jA4890BC0dDUIzYF2HAlSuorhTtpAE8kBgiykWwysQerLfC/N4 RaXbe+KUkWBIyG4hVxgnuSENi/IEzyqj10tkZr8zejeBFiwo4fnYNT/TjtUy8qyc vnhaL/BryDm+1vFSFOrzdm+RnlCjWNiMfQXAMkpyQnNlCFNbKR6wFGq7A= DKIM-Signature: v=1; a=rsa-sha1; c=relaxed; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender:to :from:subject:message-id:date:mime-version:content-type; s= default; bh=gBE8gsfOVNTFymFuYebeGEsnlWM=; b=rjbsTqs2Lf4nZQ4dby9J XpsSOySH8XO3SrBZWi84yXqk6TsKw1ciNKJoqAVXU36dzri3seGvpjn3SMnl6NOI yyIXZB6T83aJadfJc4qyb0lKsI21R+UG1+7U8kOimbCpPiH830MlBUnSF/rjtvzZ fYtdy5OMCcUU6k+X2vM2jJw= Received: (qmail 125201 invoked by alias); 4 Aug 2018 18:46:59 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Unsubscribe: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Delivered-To: mailing list gcc-patches@gcc.gnu.org Received: (qmail 125185 invoked by uid 89); 4 Aug 2018 18:46:57 -0000 Authentication-Results: sourceware.org; auth=none X-Spam-SWARE-Status: No, score=-10.8 required=5.0 tests=AWL, BAYES_00, FREEMAIL_FROM, GIT_PATCH_2, GIT_PATCH_3, KAM_ASCII_DIVIDERS, RCVD_IN_DNSWL_NONE, SPF_PASS autolearn=ham version=3.3.2 spammy=l1, lz, L1, warned X-HELO: mail-qk0-f195.google.com Received: from mail-qk0-f195.google.com (HELO mail-qk0-f195.google.com) (209.85.220.195) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Sat, 04 Aug 2018 18:46:54 +0000 Received: by mail-qk0-f195.google.com with SMTP id z74-v6so6268816qkb.10 for ; Sat, 04 Aug 2018 11:46:54 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=to:from:subject:message-id:date:user-agent:mime-version; bh=eOGXNln1c2JH4ZyeSUgsZWFkmWatdvTDvE8rf/gjS1M=; b=dRJNAUpst/71LFfs1DfK/DomZWOrw6KGdiHmMhwpa8H44DBWu55m3zbUkZi/m1x1Cm GTZd2oG+N8MSFjOXy6N+xUddBR0WDJldG5cMrGNH0qP9YMi3cOekijOUo3cNDulXTcqW SFWcc7eFFF8We5kweuchnqYrKlN0GmnGjjYxsq9WFOMZcD1yc/kPkwqCEMgTecd4PCw5 8mulZMlqp0p2m1DnY78YkIfUigfOpTsfFNUpYBUo9W30B25H6NR3N8K/MdKPZjlK5pe9 8guVoJgx+Cs9TK19DmVS6fxIz/Asciq3qZJz2oCQzAwDFtXiZazhC4uveJ+abGlcIzO2 h86g== Received: from localhost.localdomain (97-118-124-30.hlrn.qwest.net. [97.118.124.30]) by smtp.gmail.com with ESMTPSA id s25-v6sm7441882qth.59.2018.08.04.11.46.50 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Sat, 04 Aug 2018 11:46:51 -0700 (PDT) To: Jeff Law , Gcc Patch List From: Martin Sebor Subject: [PATCH] assume sprintf formatting of wide characters may fail (PR 86853) Message-ID: <0e8d373d-16e1-0620-e8e5-590db87033e6@gmail.com> Date: Sat, 4 Aug 2018 12:46:46 -0600 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Thunderbird/45.8.0 MIME-Version: 1.0 X-IsSubscribed: yes The sprintf handling of wide characters neglects to consider that calling the function may fail due to a conversion error (when the wide character is invalid or not representable in the current locale). The handling also misinterprets the POSIX %S wide string directive as a plain narrow %s and doesn't include %C (the POSIX equivalent of %lc). The attached patch corrects these oversights by extending the data structures to indicate when a directive may fail, and extending the UNDER4K member of the format_result structure to also encode calls with such directives. Tested on x86_64-linux. Besides the trunk, since this bug can affect code correctness I'd like to backport this patch to both release branches (7 and 8). Martin PR tree-optimization/86853 - sprintf optimization for wide strings doesn't account for conversion failure gcc/ChangeLog: PR tree-optimization/86853 * gimple-ssa-sprintf.c (struct format_result): Rename member. (struct fmtresult): Add member and initialize it in ctors. (format_character): Handle %C. Extend range to NUL. Set MAYFAIL. (format_string): Handle %S the same as %ls. Set MAYFAIL. (format_directive): Set POSUNDER4K when MAYFAIL is set. (parse_directive): Handle %C same as %c. (sprintf_dom_walker::compute_format_length): Adjust. (is_call_safe): Adjust. gcc/testsuite/ChangeLog: PR tree-optimization/86853 * gcc.dg/tree-ssa/builtin-sprintf-10.c: New test. * gcc.dg/tree-ssa/builtin-sprintf-11.c: New test. * gcc.dg/tree-ssa/builtin-sprintf-warn-18.c: Adjust. Index: gcc/gimple-ssa-sprintf.c =================================================================== --- gcc/gimple-ssa-sprintf.c (revision 263295) +++ gcc/gimple-ssa-sprintf.c (working copy) @@ -211,12 +211,14 @@ struct format_result the return value optimization. */ bool knownrange; - /* True if no individual directive resulted in more than 4095 bytes - of output (the total NUMBER_CHARS_{MIN,MAX} might be greater). - Implementations are not required to handle directives that produce - more than 4K bytes (leading to undefined behavior) and so when one - is found it disables the return value optimization. */ - bool under4k; + /* True if no individual directive could fail or result in more than + 4095 bytes of output (the total NUMBER_CHARS_{MIN,MAX} might be + greater). Implementations are not required to handle directives + that produce more than 4K bytes (leading to undefined behavior) + and so when one is found it disables the return value optimization. + Similarly, directives that can fail (such as wide character + directives) disable the optimization. */ + bool posunder4k; /* True when a floating point directive has been seen in the format string. */ @@ -650,7 +652,7 @@ struct fmtresult fmtresult (unsigned HOST_WIDE_INT min = HOST_WIDE_INT_MAX) : argmin (), argmax (), knownrange (min < HOST_WIDE_INT_MAX), - nullp () + mayfail (), nullp () { range.min = min; range.max = min; @@ -664,7 +666,7 @@ struct fmtresult unsigned HOST_WIDE_INT likely = HOST_WIDE_INT_MAX) : argmin (), argmax (), knownrange (min < HOST_WIDE_INT_MAX && max < HOST_WIDE_INT_MAX), - nullp () + mayfail (), nullp () { range.min = min; range.max = max; @@ -694,6 +696,10 @@ struct fmtresult heuristics that depend on warning levels. */ bool knownrange; + /* True for a directive that may fail (such as wide character + directives). */ + bool mayfail; + /* True when the argument is a null pointer. */ bool nullp; }; @@ -2202,7 +2208,8 @@ format_character (const directive &dir, tree arg, res.knownrange = true; - if (dir.modifier == FMT_LEN_l) + if (dir.specifier == 'C' + || dir.modifier == FMT_LEN_l) { /* A wide character can result in as few as zero bytes. */ res.range.min = 0; @@ -2217,13 +2224,20 @@ format_character (const directive &dir, tree arg, res.range.likely = 0; res.range.unlikely = 0; } - else if (min > 0 && min < 128) + else if (min >= 0 && min < 128) { + /* Be conservative if the target execution character set + is not a 1-to-1 mapping to the source character set or + if the source set is not ASCII. */ + bool one_2_one_ascii + = (target_to_host_charmap[0] == 1 && target_to_host ('a') == 97); + /* A wide character in the ASCII range most likely results in a single byte, and only unlikely in up to MB_LEN_MAX. */ - res.range.max = 1; + res.range.max = one_2_one_ascii ? 1 : target_mb_len_max ();; res.range.likely = 1; res.range.unlikely = target_mb_len_max (); + res.mayfail = !one_2_one_ascii; } else { @@ -2232,6 +2246,8 @@ format_character (const directive &dir, tree arg, res.range.max = target_mb_len_max (); res.range.likely = 2; res.range.unlikely = res.range.max; + /* Converting such a character may fail. */ + res.mayfail = true; } } else @@ -2241,6 +2257,7 @@ format_character (const directive &dir, tree arg, res.range.max = target_mb_len_max (); res.range.likely = 2; res.range.unlikely = res.range.max; + res.mayfail = true; } } else @@ -2276,7 +2293,8 @@ format_string (const directive &dir, tree arg, vr_ /* A '%s' directive with a string argument with constant length. */ res.range = slen.range; - if (dir.modifier == FMT_LEN_l) + if (dir.specifier == 'S' + || dir.modifier == FMT_LEN_l) { /* In the worst case the length of output of a wide string S is bounded by MB_LEN_MAX * wcslen (S). */ @@ -2302,6 +2320,10 @@ format_string (const directive &dir, tree arg, vr_ /* Even a non-empty wide character string need not convert into any bytes. */ res.range.min = 0; + + /* A non-emtpy wide character conversion may fail. */ + if (slen.range.max > 0) + res.mayfail = true; } else { @@ -2340,7 +2362,8 @@ format_string (const directive &dir, tree arg, vr_ at level 2. This result is adjust upward for width (if it's specified). */ - if (dir.modifier == FMT_LEN_l) + if (dir.specifier == 'S' + || dir.modifier == FMT_LEN_l) { /* A wide character converts to as few as zero bytes. */ slen.range.min = 0; @@ -2352,6 +2375,10 @@ format_string (const directive &dir, tree arg, vr_ if (slen.range.likely < target_int_max ()) slen.range.unlikely *= target_mb_len_max (); + + /* A non-emtpy wide character conversion may fail. */ + if (slen.range.max > 0) + res.mayfail = true; } res.range = slen.range; @@ -2904,11 +2931,14 @@ format_directive (const sprintf_dom_walker::call_i of 4095 bytes required to be supported? */ bool minunder4k = fmtres.range.min < 4096; bool maxunder4k = fmtres.range.max < 4096; - /* Clear UNDER4K in the overall result if the maximum has exceeded - the 4k (this is necessary to avoid the return valuye optimization + /* Clear POSUNDER4K in the overall result if the maximum has exceeded + the 4k (this is necessary to avoid the return value optimization that may not be safe in the maximum case). */ if (!maxunder4k) - res->under4k = false; + res->posunder4k = false; + /* Also clear POSUNDER4K if the directive may fail. */ + if (fmtres.mayfail) + res->posunder4k = false; if (!warned /* Only warn at level 2. */ @@ -3354,12 +3384,15 @@ parse_directive (sprintf_dom_walker::call_info &in dir.fmtfunc = format_none; break; + case 'C': case 'c': + /* POSIX wide character and C/POSIX narrow character. */ dir.fmtfunc = format_character; break; case 'S': case 's': + /* POSIX wide string and C/POSIX narrow character string. */ dir.fmtfunc = format_string; break; @@ -3517,10 +3550,10 @@ sprintf_dom_walker::compute_format_length (call_in res->range.min = res->range.max = 0; /* No directive has been seen yet so the length of output is bounded - by the known range [0, 0] (with no conversion producing more than - 4K bytes) until determined otherwise. */ + by the known range [0, 0] (with no conversion resulting in a failure + or producing more than 4K bytes) until determined otherwise. */ res->knownrange = true; - res->under4k = true; + res->posunder4k = true; res->floating = false; res->warned = false; @@ -3588,7 +3621,7 @@ is_call_safe (const sprintf_dom_walker::call_info const format_result &res, bool under4k, unsigned HOST_WIDE_INT retval[2]) { - if (under4k && !res.under4k) + if (under4k && !res.posunder4k) return false; /* The minimum return value. */ Index: gcc/testsuite/gcc.dg/tree-ssa/builtin-sprintf-10.c =================================================================== --- gcc/testsuite/gcc.dg/tree-ssa/builtin-sprintf-10.c (nonexistent) +++ gcc/testsuite/gcc.dg/tree-ssa/builtin-sprintf-10.c (working copy) @@ -0,0 +1,119 @@ +/* PR tree-optimization/86853 - sprintf optimization for wide strings + doesn't account for conversion failure​ + { dg-do compile } + { dg-options "-O2 -Wall -fdump-tree-optimized" } */ + +typedef __SIZE_TYPE__ size_t; +typedef __WCHAR_TYPE__ wchar_t; + +extern int snprintf (char*, size_t, const char*, ...); + +#define CONCAT(x, y) x ## y +#define CAT(x, y) CONCAT (x, y) +#define FAILNAME(name) CAT (call_ ## name ##_on_line_, __LINE__) + +#define FAIL(name) do { \ + extern void FAILNAME (name) (void); \ + FAILNAME (name)(); \ + } while (0) + +/* Macro to emit a call to funcation named + call_in_true_branch_not_eliminated_on_line_NNN() + for each call that's expected to be eliminated. The dg-final + scan-tree-dump-time directive at the bottom of the test verifies + that no such call appears in output. */ +#define ELIM(expr) \ + if (!(expr)) FAIL (in_true_branch_not_eliminated); else (void)0 + +/* Macro to emit a call to a function named + call_made_in_{true,false}_branch_on_line_NNN() + for each call that's expected to be retained. The dg-final + scan-tree-dump-time directive at the bottom of the test verifies + that the expected number of both kinds of calls appears in output + (a pair for each line with the invocation of the KEEP() macro. */ +#define KEEP(expr) \ + if (expr) \ + FAIL (made_in_true_branch); \ + else \ + FAIL (made_in_false_branch) + + +extern wchar_t wc; +extern wchar_t ws[]; + +const wchar_t ws3[] = L"12\xff"; + +/* Verify that the following calls are eliminated. */ + +void elim_wide_char_call (void) +{ + ELIM (snprintf (0, 0, "%lc", L'\0')); + ELIM (snprintf (0, 0, "%lc", L'1')); + ELIM (snprintf (0, 0, "%lc", L'a')); + ELIM (snprintf (0, 0, "%lc", ws3[0])); + ELIM (snprintf (0, 0, "%lc", ws3[1])); + ELIM (snprintf (0, 0, "%lc", ws3[3])); + + ELIM (snprintf (0, 0, "%C", L'\0')); + ELIM (snprintf (0, 0, "%C", L'9')); + ELIM (snprintf (0, 0, "%C", L'z')); + ELIM (snprintf (0, 0, "%C", ws3[0])); + ELIM (snprintf (0, 0, "%C", ws3[1])); + ELIM (snprintf (0, 0, "%C", ws3[3])); + + /* Verify an unknown character value within the ASCII range. */ + if (wc < 1 || 127 < wc) + wc = 0; + + ELIM (snprintf (0, 0, "%C", wc)); + ELIM (snprintf (0, 0, "%C", wc)); +} + +void elim_wide_string_call (void) +{ + ELIM (snprintf (0, 0, "%ls", L"")); +} + + +#line 1000 + + /* Verify that the following calls are not eliminated. */ + +void keep_wide_char_call (void) +{ + KEEP (snprintf (0, 0, "%lc", L'\xff')); + KEEP (snprintf (0, 0, "%lc", L'\xffff')); + KEEP (snprintf (0, 0, "%lc", wc)); + KEEP (snprintf (0, 0, "%lc", ws3[2])); + + KEEP (snprintf (0, 0, "%C", L'\xff')); + KEEP (snprintf (0, 0, "%C", L'\xffff')); + KEEP (snprintf (0, 0, "%C", wc)); + KEEP (snprintf (0, 0, "%C", ws3[2])); + + /* Verify an unknown character value outside the ASCII range + (with 128 being the only one). */ + if (wc < 32 || 128 < wc) + wc = 32; + + KEEP (snprintf (0, 0, "%lc", wc)); + KEEP (snprintf (0, 0, "%C", wc)); +} + +void keep_wide_string_call (void) +{ + KEEP (snprintf (0, 0, "%ls", L"\xff")); + KEEP (snprintf (0, 0, "%ls", L"\xffff")); + KEEP (snprintf (0, 0, "%ls", ws)); + KEEP (snprintf (0, 0, "%ls", ws3)); + + KEEP (snprintf (0, 0, "%S", L"\xff")); + KEEP (snprintf (0, 0, "%S", L"\xffff")); + KEEP (snprintf (0, 0, "%S", ws)); + KEEP (snprintf (0, 0, "%S", ws3)); +} + +/* { dg-final { scan-tree-dump-times "call_made_in_true_branch_not_eliminated" 0 "optimized" } } + + { dg-final { scan-tree-dump-times "call_made_in_true_branch_on_line_1\[0-9\]\[0-9\]\[0-9\]" 18 "optimized" } } + { dg-final { scan-tree-dump-times "call_made_in_false_branch_on_line_1\[0-9\]\[0-9\]\[0-9\]" 18 "optimized" } } */ Index: gcc/testsuite/gcc.dg/tree-ssa/builtin-sprintf-11.c =================================================================== --- gcc/testsuite/gcc.dg/tree-ssa/builtin-sprintf-11.c (nonexistent) +++ gcc/testsuite/gcc.dg/tree-ssa/builtin-sprintf-11.c (working copy) @@ -0,0 +1,65 @@ +/* PR tree-optimization/86853 - sprintf optimization for wide strings + doesn't account for conversion failure​ + Exercise wide character handling in an EBCDIC execution charset. + { dg-do compile } + { dg-require-iconv "IBM1047" } + { dg-options "-O2 -Wall -Wno-format -Wformat-overflow -fexec-charset=IBM1047 -fdump-tree-optimized" } */ + +typedef __WCHAR_TYPE__ wchar_t; + +/* Exercise wide character constants. */ + +void test_lc_cst (void) +{ + /* IBM1047 0x30 maps to ASCII 0x94 which neeed not be representable + in the current locale (and the snprintf() call may fail). Verify + that snprintf() doesn't assume it is. */ + wchar_t wc = 0x30; + + int n = __builtin_snprintf (0, 0, "%lc", wc); + if (n < 0) + __builtin_abort (); +} + +void test_C_cst (void) +{ + /* Same as above but for %C and 0x31 which maps to 0x95. */ + wchar_t wc = 0x31; + + int n = __builtin_snprintf (0, 0, "%C", wc); + if (n < 0) + __builtin_abort (); +} + +/* Exercise wide character values in known ranges. */ + +void test_lc_range (wchar_t wc) +{ + if (wc < 0x40 || 0x49 < wc) + wc = 0x40; + + int n = __builtin_snprintf (0, 0, "%lc", wc); + if (n < 0) + __builtin_abort (); +} + +void test_C_range (wchar_t wc) +{ + if (wc < 0x41 || 0x48 < wc) + wc = 0x41; + + int n = __builtin_snprintf (0, 0, "%C", wc); + if (n < 0) + __builtin_abort (); +} + +/* Exercise unknown wide character values. */ + +void test_var (wchar_t wc) +{ + int n = __builtin_snprintf (0, 0, "%lc", wc); + if (n < 0) + __builtin_abort (); +} + +/* { dg-final { scan-tree-dump-times "abort" 5 "optimized" } } */ Index: gcc/testsuite/gcc.dg/tree-ssa/builtin-sprintf-warn-18.c =================================================================== --- gcc/testsuite/gcc.dg/tree-ssa/builtin-sprintf-warn-18.c (revision 263295) +++ gcc/testsuite/gcc.dg/tree-ssa/builtin-sprintf-warn-18.c (working copy) @@ -18,7 +18,7 @@ void test_characters () T ("%A", 0.0); /* { dg-warning ".%A. directive writing between 6 and 20 " } */ T ("%a", 0.0); /* { dg-warning ".%a. directive writing between 6 and 20 " } */ - T ("%C", 'a'); /* { dg-warning ".%C. directive writing 1 byte" "bug 80537" { xfail *-*-* } } */ + T ("%C", L'a'); /* { dg-warning ".%C. directive writing up to 6 bytes" } */ T ("%c", 'a'); /* { dg-warning ".%c. directive writing 1 byte" } */ T ("%d", 12); /* { dg-warning ".%d. directive writing 2 bytes" } */ @@ -93,7 +93,8 @@ void test_characters () T ("%x", 1234); /* { dg-warning ".%x. directive writing 3 bytes" } */ T ("%#X", 1235); /* { dg-warning ".%#X. directive writing 5 bytes" } */ - T ("%S", L"1"); /* { dg-warning ".%S. directive writing 1 byte" } */ + T ("%S", L"1"); /* { dg-warning ".%S. directive writing up to 6 bytes" } */ + T ("%ls", L"12"); /* { dg-warning ".%ls. directive writing up to 12 bytes" } */ T ("%-s", "1"); /* { dg-warning ".%-s. directive writing 1 byte" } */ /* Verify that characters in the source character set appear in