From patchwork Sat Dec 9 02:45:29 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Alexandre Oliva X-Patchwork-Id: 1874013 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@legolas.ozlabs.org Authentication-Results: legolas.ozlabs.org; dkim=pass (2048-bit key; secure) header.d=adacore.com header.i=@adacore.com header.a=rsa-sha256 header.s=google header.b=kheRV8kA; dkim-atps=neutral Authentication-Results: legolas.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=gcc.gnu.org (client-ip=2620:52:3:1:0:246e:9693:128c; helo=server2.sourceware.org; envelope-from=gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org; receiver=patchwork.ozlabs.org) Received: from server2.sourceware.org (server2.sourceware.org [IPv6:2620:52:3:1:0:246e:9693:128c]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (secp384r1) server-digest SHA384) (No client certificate requested) by legolas.ozlabs.org (Postfix) with ESMTPS id 4SnC793XDlz23nC for ; Sat, 9 Dec 2023 13:45:55 +1100 (AEDT) Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id D63C83858C2A for ; Sat, 9 Dec 2023 02:45:52 +0000 (GMT) X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from mail-pl1-x631.google.com (mail-pl1-x631.google.com [IPv6:2607:f8b0:4864:20::631]) by sourceware.org (Postfix) with ESMTPS id 4B5863858CDB for ; Sat, 9 Dec 2023 02:45:40 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 4B5863858CDB Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=adacore.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=adacore.com ARC-Filter: OpenARC Filter v1.0.0 sourceware.org 4B5863858CDB Authentication-Results: server2.sourceware.org; arc=none smtp.remote-ip=2607:f8b0:4864:20::631 ARC-Seal: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1702089942; cv=none; b=sXwfk5bMwhUi0VuhthiXQxoSOBIqUOxo6GxkVD+ufTqEE3XDpBEdU4UJ9xhGCC8zeMzYebLn00o+2OwN0e+U15wK0AoaUUEDv5tQDA9RM/0enYGSaABsF8Zn14b2SYBOU6q2BwnVnQSIHvvPKPDa34mbUxn05fRgi3sREnhuXxY= ARC-Message-Signature: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1702089942; c=relaxed/simple; bh=WshtLDXFwUuGY1X/agcmuP9jP4V8eo6sHRAjDZUAzo4=; h=DKIM-Signature:From:To:Subject:Date:Message-ID:MIME-Version; b=M6Dtx8zWWRVG3kFJ4joVspEJl9Ynwf8JMMP1ij/TEHFgg/RVsrTzPaSO5oj7zQmg0ofkXV7qxBFlqil2x5ve4otXqXQCZyPmtEJCTIUpJJE+kRzNJDuXU6+/4r1/WOpzxPynz3J5pDcqvS9u1A5VbGOiho2C+MtoUV8/cb/2Qr4= ARC-Authentication-Results: i=1; server2.sourceware.org Received: by mail-pl1-x631.google.com with SMTP id d9443c01a7336-1d0538d9bbcso24990455ad.3 for ; Fri, 08 Dec 2023 18:45:40 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=adacore.com; s=google; t=1702089939; x=1702694739; darn=gcc.gnu.org; h=mime-version:user-agent:message-id:date:organization:subject:to :from:from:to:cc:subject:date:message-id:reply-to; bh=mdRQPqFikACmq9aZSyrKQ6M2jh0vSCM0i4zxvxnYMEw=; b=kheRV8kAaSeyRtk5ZZfpbAHWho5V24efEOgspBLO2cnvHOPPz8GE6Iw1T8/g0tZytr OIgKxP4mg68XIwJeYgrIjKFOLRtsJ05Indeb08B1tuNVcltOFDk0MCLKrWhb0l/N9tz7 5AXSzikEYOq4SbsmEEDCSQVyTmPbjYfibb91xEy3sUq8/OZjayE2Api/20G9PmqvqqYt nsCmBISJiRfpNNXRYKQ9J/HYoJebTmPiauiM745b+2eSSnKNYlqCdvQdZO86VCTUVQdF 2AYN1koaKZaYpnAkhyC4nx6REjS8gOus9nAO1+vbfbO/hL0fHlPzzTYINgWeijaiQk+t ejCQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1702089939; x=1702694739; h=mime-version:user-agent:message-id:date:organization:subject:to :from:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=mdRQPqFikACmq9aZSyrKQ6M2jh0vSCM0i4zxvxnYMEw=; b=bW22qdOc/pdc0H/FsU/0vkINnbWAS5zJ3aU0sh9cwrk/G6LFLC76Y6hLV3bpuI9ein 6WBV9yYNEP9GjJgxicUrSDsQ7Pb9QiRSx9t3Ds9CU62FsFTI3CZeySS2lw0s8jhFMulm P95NYxemQ2sqmDrGJlA7R+sjxpvw/62JoV6VyOmxFM9oAieQDaS1S1Hem6GCcu30IHYo xrg8JI6TWQ7lNPl2raaMJ/0iuwsPxKGWWMGVzwcry0C3DUWZCNXLUjG8T6peFd6wajuu CUHw3rex+VhdoPPq52MrE+4jByF0BF1lzc1dZI2BjUd/r7HTfR7rx25aGlbuPZU8irtf Cu+w== X-Gm-Message-State: AOJu0Yz8wrFx2rWBXbs3yo2E/zIn2SWjHZpvtpwmJG99do9YzQgZDw95 E8qbLBXoqNAda8nex8/VJssqzn7Vb5dNlWCJicjqLA== X-Google-Smtp-Source: AGHT+IGIaU5fVLFZSAvDOvWBfRt8yB3dICOQ8toI0MH3dqDFyeVNXE+PjngckMtbDuLfdWj2KK6ydw== X-Received: by 2002:a17:902:ea0c:b0:1d0:53b1:400 with SMTP id s12-20020a170902ea0c00b001d053b10400mr1233949plg.68.1702089938924; Fri, 08 Dec 2023 18:45:38 -0800 (PST) Received: from free.home ([2804:7f1:2080:b40a:22de:62ca:39d7:fbd4]) by smtp.gmail.com with ESMTPSA id e8-20020a17090301c800b001c7443d0890sm2389219plh.102.2023.12.08.18.45.37 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 08 Dec 2023 18:45:38 -0800 (PST) Received: from livre (livre.home [172.31.160.2]) by free.home (8.15.2/8.15.2) with ESMTPS id 3B92jTQL317990 (version=TLSv1.3 cipher=TLS_AES_256_GCM_SHA384 bits=256 verify=NOT); Fri, 8 Dec 2023 23:45:30 -0300 From: Alexandre Oliva To: gcc-patches@gcc.gnu.org Subject: [PATCH] -finline-stringops: check base blksize for memset [PR112778] Organization: Free thinker, does not speak for AdaCore Date: Fri, 08 Dec 2023 23:45:29 -0300 Message-ID: User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/27.1 (gnu/linux) MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.84 X-Spam-Status: No, score=-12.0 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, GIT_PATCH_0, RCVD_IN_DNSWL_NONE, SPF_HELO_NONE, SPF_PASS, TXREP, T_SCC_BODY_TEXT_LINE, WEIRD_QUOTING autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.30 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org The recently-added logic for -finline-stringops=memset introduced an assumption that doesn't necessarily hold, namely, that can_store_by_pieces of a larger size implies can_store_by_pieces by smaller sizes. Checks for all sizes the by-multiple-pieces machinery might use before committing to an expansion pattern. Regstrapped (and slightly different version) and regstrapping this one on x86_64-linux-gnu. Ok to install? (FWIW, for completeness, I've just launched bootstraps with -finline-stringops on ppc64le-linux-gnu, and aarch64-linux-gnu, and will do so on x86_64-linux-gnu as soon as my retesting completes.) for gcc/ChangeLog PR target/112778 * builtins.cc (can_store_by_multiple_pieces): New. (try_store_by_multiple_pieces): Call it. for gcc/testsuite/ChangeLog PR target/112778 * gcc.dg/inline-mem-cmp-pr112778.c: New. --- gcc/builtins.cc | 57 ++++++++++++++++++++---- gcc/testsuite/gcc.dg/inline-mem-cmp-pr112778.c | 10 ++++ 2 files changed, 58 insertions(+), 9 deletions(-) create mode 100644 gcc/testsuite/gcc.dg/inline-mem-cmp-pr112778.c diff --git a/gcc/builtins.cc b/gcc/builtins.cc index 12a535d313f12..ad8497192a2dd 100644 --- a/gcc/builtins.cc +++ b/gcc/builtins.cc @@ -4284,6 +4284,40 @@ expand_builtin_memset (tree exp, rtx target, machine_mode mode) return expand_builtin_memset_args (dest, val, len, target, mode, exp); } +/* Check that store_by_pieces allows BITS + LEN (so that we don't + expand something too unreasonably long), and every power of 2 in + BITS. It is assumed that LEN has already been tested by + itself. */ +static bool +can_store_by_multiple_pieces (unsigned HOST_WIDE_INT bits, + by_pieces_constfn constfun, + void *constfundata, unsigned int align, + bool memsetp, + unsigned HOST_WIDE_INT len) +{ + if (bits + && !can_store_by_pieces (bits + len, constfun, constfundata, + align, memsetp)) + return false; + + /* Avoid the loop if we're just going to repeat the same single + test. */ + if (!len && popcount_hwi (bits) == 1) + return true; + + for (int i = ctz_hwi (bits); i >= 0; i = ctz_hwi (bits)) + { + unsigned HOST_WIDE_INT bit = 1; + bit <<= i; + bits &= ~bit; + if (!can_store_by_pieces (bit, constfun, constfundata, + align, memsetp)) + return false; + } + + return true; +} + /* Try to store VAL (or, if NULL_RTX, VALC) in LEN bytes starting at TO. Return TRUE if successful, FALSE otherwise. TO is assumed to be aligned at an ALIGN-bits boundary. LEN must be a multiple of @@ -4341,7 +4375,11 @@ try_store_by_multiple_pieces (rtx to, rtx len, unsigned int ctz_len, else /* Huh, max_len < min_len? Punt. See pr100843.c. */ return false; - if (min_len >= blksize) + if (min_len >= blksize + /* ??? Maybe try smaller fixed-prefix blksizes before + punting? */ + && can_store_by_pieces (blksize, builtin_memset_read_str, + &valc, align, true)) { min_len -= blksize; min_bits = floor_log2 (min_len); @@ -4367,8 +4405,9 @@ try_store_by_multiple_pieces (rtx to, rtx len, unsigned int ctz_len, happen because of the way max_bits and blksize are related, but it doesn't hurt to test. */ if (blksize > xlenest - || !can_store_by_pieces (xlenest, builtin_memset_read_str, - &valc, align, true)) + || !can_store_by_multiple_pieces (xlenest - blksize, + builtin_memset_read_str, + &valc, align, true, blksize)) { if (!(flag_inline_stringops & ILSOP_MEMSET)) return false; @@ -4386,17 +4425,17 @@ try_store_by_multiple_pieces (rtx to, rtx len, unsigned int ctz_len, of overflow. */ if (max_bits < orig_max_bits && xlenest + blksize >= xlenest - && can_store_by_pieces (xlenest + blksize, - builtin_memset_read_str, - &valc, align, true)) + && can_store_by_multiple_pieces (xlenest, + builtin_memset_read_str, + &valc, align, true, blksize)) { max_loop = true; break; } if (blksize - && can_store_by_pieces (xlenest, - builtin_memset_read_str, - &valc, align, true)) + && can_store_by_multiple_pieces (xlenest, + builtin_memset_read_str, + &valc, align, true, 0)) { max_len += blksize; min_len += blksize; diff --git a/gcc/testsuite/gcc.dg/inline-mem-cmp-pr112778.c b/gcc/testsuite/gcc.dg/inline-mem-cmp-pr112778.c new file mode 100644 index 0000000000000..fdfc5b6f28c8e --- /dev/null +++ b/gcc/testsuite/gcc.dg/inline-mem-cmp-pr112778.c @@ -0,0 +1,10 @@ +/* { dg-do compile } */ +/* { dg-options "-finline-stringops" } */ + +char buf[3]; + +int +f () +{ + __builtin_memset (buf, 'v', 3); +}