From patchwork Sat Jul 7 21:47:16 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Martin Sebor X-Patchwork-Id: 940873 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (mailfrom) smtp.mailfrom=gcc.gnu.org (client-ip=209.132.180.131; helo=sourceware.org; envelope-from=gcc-patches-return-481180-incoming=patchwork.ozlabs.org@gcc.gnu.org; receiver=) Authentication-Results: ozlabs.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: ozlabs.org; dkim=pass (1024-bit key; unprotected) header.d=gcc.gnu.org header.i=@gcc.gnu.org header.b="Ocpk8KAD"; dkim=pass (2048-bit key; unprotected) header.d=gmail.com header.i=@gmail.com header.b="Y50sja9G"; dkim-atps=neutral Received: from sourceware.org (server1.sourceware.org [209.132.180.131]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 41NQDj2ngVz9s0w for ; Sun, 8 Jul 2018 07:47:33 +1000 (AEST) DomainKey-Signature: a=rsa-sha1; c=nofws; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender:from :subject:to:message-id:date:mime-version:content-type; q=dns; s= default; b=PN0QaC3f5G35eYRbWH7jXPaf9nPh47N2+6D6TBaJsKi1kr7hMrTHv mcuPLh5EAtiXLk0kI0NXZ4HS+kcXW6tn8IM86U89mrHGGWxK2u9Sbi0CEu0j/P2V kaQzlMSpYJn81r+0q0rltZanDAHsV+/NPd1iNRcuBtuN+d2gtkM7LA= DKIM-Signature: v=1; a=rsa-sha1; c=relaxed; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender:from :subject:to:message-id:date:mime-version:content-type; s= default; bh=37gU6XgTI9Vx0fEzxraL1Qjm0P0=; b=Ocpk8KADno8Tl0JOdIXm ue87zm1PI1JZkp8IRTPKVwMy+AnnZ97WQDi9IxCCh4w8LrEJjtK/shFrOar9osw0 tK+cyX9H6yNuXe5k7q84q5sKBxZhmC+R6IJPxy7h9d+25cbt2e8dLxV/BcKk8EH0 dFUiO6v/BeMQFIwxgzBzGzE= Received: (qmail 6322 invoked by alias); 7 Jul 2018 21:47:25 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Unsubscribe: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Delivered-To: mailing list gcc-patches@gcc.gnu.org Received: (qmail 6307 invoked by uid 89); 7 Jul 2018 21:47:25 -0000 Authentication-Results: sourceware.org; auth=none X-Spam-SWARE-Status: No, score=-10.8 required=5.0 tests=AWL, BAYES_00, FREEMAIL_FROM, GIT_PATCH_2, GIT_PATCH_3, KAM_ASCII_DIVIDERS, RCVD_IN_DNSWL_NONE, SPF_PASS autolearn=ham version=3.3.2 spammy=2004, contribute, 262437, 1992 X-HELO: mail-qt0-f170.google.com Received: from mail-qt0-f170.google.com (HELO mail-qt0-f170.google.com) (209.85.216.170) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Sat, 07 Jul 2018 21:47:23 +0000 Received: by mail-qt0-f170.google.com with SMTP id f18-v6so12718689qtp.10 for ; Sat, 07 Jul 2018 14:47:23 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=from:subject:to:message-id:date:user-agent:mime-version; bh=Rg8pFWijduosZ1j/zaEzPOTlecShOuSPh56LjDpzJSg=; b=Y50sja9G7+lSqVnh315t3g+wFcp+lNc69vUq9Qv6gU3M194t5k4ufpQW1SnHbVRvmm 8Tg6LD88laJ2tcBi+4Nslv43cEB6NC5RCYkGyI5GpnVSrWS2RPUZm+gMZe1EOG4ZEefX D+psw4KzQK5KsuoMeyvShIGP/m9SUJBKS4L5r4T08wMfWM0cbNJTKiHGqlQdW4byR1Fd LKYc2Si4BLthl7leuabdvpvCyiTvvxdAebIhJuz1+2de8C582p2e2PNLTVvstCUsNnIF epmU0x5WERQUYhkTTq0zaykWrZHM+OC8cYEPeETBdbPOI9PdugbZ57+u7vU54zRX7OLB jDfw== Received: from localhost.localdomain (75-166-107-65.hlrn.qwest.net. [75.166.107.65]) by smtp.gmail.com with ESMTPSA id y23-v6sm7219318qto.9.2018.07.07.14.47.18 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Sat, 07 Jul 2018 14:47:19 -0700 (PDT) From: Martin Sebor Subject: [PATCH] fold strings as long as the array they're stored in (PR 86428) To: Gcc Patch List Message-ID: <812effdd-d102-4266-0cee-362ac7cae19f@gmail.com> Date: Sat, 7 Jul 2018 15:47:16 -0600 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Thunderbird/45.8.0 MIME-Version: 1.0 X-IsSubscribed: yes While working on other string folding improvements (PR 77357) I came across another distinct case where GCC could do better: it doesn't consider as candidates strings that have as many elements as the size of the array they are stored in, even if their length is within the bounds of the array. For instance, only the first strlen() call below is folded even though the arguments to both are valid NUL-terminated strings. const char a[4] = "123"; int f (void) { return strlen (a); // folded } const char b[4] = "123\0"; int g (void) { return strlen (b); // not folded } The attached tiny patch adjusts the the string_constant() function to recognize as valid string constants all those whose length (as returned by strlen()) is less than the size of the array they are stored in. Tested on x86_64-linux. Testing of an earlier version of the patch exposed what I think is a deficiency in the (very) early strlen folding: c_strlen() folds expressions of the form strlen(A + I) to sizeof(A) - I when A is an array of known size and I a non-const variable. This not only prevents -Warray-bounds from warning on invalid indices but also defeats other, possibly more profitable optimizations based on the range of the result of the strlen() call. The logs show that the code dates at least as far back as 1992. With VRP and other data flow based optimizations I think we can do better today. I opened bug 86434 to improve things. Martin PR tree-optimization/86428 - strlen of const array initialized with a string of the same length not folded gcc/ChangeLog: PR tree-optimization/86428 * expr.c (string_constant): Handle string literals of length up to the size of the array they are stored in. gcc/testsuite/ChangeLog: PR tree-optimization/86428 * gcc.dg/strlenopt-49.c: New test. * gcc.c-torture/execute/builtins/strlen-3.c: Adjust. Index: gcc/expr.c =================================================================== --- gcc/expr.c (revision 262437) +++ gcc/expr.c (working copy) @@ -11358,7 +11358,6 @@ string_constant (tree arg, tree *ptr_offset) } else if (VAR_P (array) || TREE_CODE (array) == CONST_DECL) { - int length; tree init = ctor_for_folding (array); /* Variables initialized to string literals can be handled too. */ @@ -11367,22 +11366,25 @@ string_constant (tree arg, tree *ptr_offset) || TREE_CODE (init) != STRING_CST) return 0; - /* Avoid const char foo[4] = "abcde"; */ - if (DECL_SIZE_UNIT (array) == NULL_TREE - || TREE_CODE (DECL_SIZE_UNIT (array)) != INTEGER_CST - || (length = TREE_STRING_LENGTH (init)) <= 0 - || compare_tree_int (DECL_SIZE_UNIT (array), length) < 0) - return 0; + tree array_size = DECL_SIZE_UNIT (array); + if (!array_size || TREE_CODE (array_size) != INTEGER_CST) + return NULL_TREE; - /* If variable is bigger than the string literal, OFFSET must be constant - and inside of the bounds of the string literal. */ - offset = fold_convert (sizetype, offset); - if (compare_tree_int (DECL_SIZE_UNIT (array), length) > 0 - && (! tree_fits_uhwi_p (offset) - || compare_tree_int (offset, length) >= 0)) - return 0; + /* Avoid returning a string that doesn't fit in the array + it is stored in, like + const char a[4] = "abcde"; + but do handle those that fit even if they have excess + initializers, such as in + const char a[4] = "abc\000\000"; + The excess elements contribute to TREE_STRING_LENGTH() + but not to strlen(). */ + unsigned HOST_WIDE_INT length = strlen (TREE_STRING_POINTER (init)); + if (compare_tree_int (array_size, length + 1) < 0) + return NULL_TREE; - *ptr_offset = offset; + /* Callers should verify that OFFSET is within the bounds + of the array and warn for out-of-bounds offsets. */ + *ptr_offset = fold_convert (sizetype, offset); return init; } Index: gcc/testsuite/gcc.c-torture/execute/builtins/strlen-3.c =================================================================== --- gcc/testsuite/gcc.c-torture/execute/builtins/strlen-3.c (revision 262437) +++ gcc/testsuite/gcc.c-torture/execute/builtins/strlen-3.c (working copy) @@ -2,8 +2,11 @@ Test strlen on const variables initialized to string literals. - Written by Jakub Jelinek, 9/14/2004. */ + Written by Jakub Jelinek, 9/14/2004. + { dg-do compile } + { dg-options "-O2 -Wall -fdump-tree-optimized" } */ + extern void abort (void); extern __SIZE_TYPE__ strlen (const char *); extern char *strcpy (char *, const char *); @@ -10,7 +13,6 @@ extern char *strcpy (char *, const char *); static const char bar[] = "Hello, World!"; static const char baz[] = "hello, world?"; static const char larger[20] = "short string"; -extern int inside_main; int l1 = 1; int x = 6; @@ -59,12 +61,10 @@ main_test(void) if (strlen (&larger[10]) != 2) abort (); - inside_main = 0; - /* This will result in strlen call, because larger - array is bigger than its initializer. */ if (strlen (larger + (x++ & 7)) != 5) abort (); if (x != 8) abort (); - inside_main = 1; } + +/* { dg-final { scan-tree-dump-not "strlen" "optimized" } } */ Index: gcc/testsuite/gcc.dg/strlenopt-49.c =================================================================== --- gcc/testsuite/gcc.dg/strlenopt-49.c (nonexistent) +++ gcc/testsuite/gcc.dg/strlenopt-49.c (working copy) @@ -0,0 +1,53 @@ +/* PR tree-optimization/86428 - strlen of const array initialized with + a string of the same length not folded + { dg-do compile } + { dg-options "-O0 -Wall -fdump-tree-gimple" } */ + +#include "strlenopt.h" + +const char a1[1] = "\0"; +const char a2[2] = "1\0"; +const char a3[3] = "12\0"; +const char a8[8] = "1234567\0"; +const char a9[9] = "12345678\0"; + +const char ax[9] = "12345678\0\0\0\0"; /* { dg-warning "initializer-string for array of chars is too long" } */ +const char ay[9] = "\00012345678\0\0\0\0"; /* { dg-warning "initializer-string for array of chars is too long" } */ + + +int len1 (void) +{ + size_t len0 = strlen (a1); + return len0; +} + +int len (void) +{ + size_t len = strlen (a2) + strlen (a3) + strlen (a8) + strlen (a9); + return len; +} + +int lenx (void) +{ + size_t lenx = strlen (ax); + return lenx; +} + +int leny (void) +{ + size_t leny = strlen (ay); + return leny; +} + +int cmp88 (void) +{ + int cmp88 = memcmp (a8, "1234567\0", sizeof a8); + return cmp88; +} + +/* { dg-final { scan-tree-dump-times "strlen" 0 "gimple" } } + { dg-final { scan-tree-dump-times "len0 = 0;" 1 "gimple" } } + { dg-final { scan-tree-dump-times "len = 18;" 1 "gimple" } } + { dg-final { scan-tree-dump-times "lenx = 8;" 1 "gimple" } } + { dg-final { scan-tree-dump-times "leny = 0;" 1 "gimple" } } + { dg-final { scan-tree-dump-times "cmp88 = 0;" 1 "gimple" } } */