From patchwork Thu May 26 20:43:01 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: "H.J. Lu" X-Patchwork-Id: 1635981 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: bilbo.ozlabs.org; dkim=pass (1024-bit key; unprotected) header.d=gcc.gnu.org header.i=@gcc.gnu.org header.a=rsa-sha256 header.s=default header.b=nXpvvXsT; dkim-atps=neutral Authentication-Results: ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=gcc.gnu.org (client-ip=8.43.85.97; helo=sourceware.org; envelope-from=gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org; receiver=) Received: from sourceware.org (ip-8-43-85-97.sourceware.org [8.43.85.97]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by bilbo.ozlabs.org (Postfix) with ESMTPS id 4L8Kds6KZWz9s0w for ; Fri, 27 May 2022 06:43:28 +1000 (AEST) Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 66F8C3831EF4 for ; Thu, 26 May 2022 20:43:26 +0000 (GMT) DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 66F8C3831EF4 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gcc.gnu.org; s=default; t=1653597806; bh=Rsv5xkahS10jfKvmzf3tWicMxZTHaO0gLOt2gVcnm5s=; h=Date:To:Subject:References:In-Reply-To:List-Id:List-Unsubscribe: List-Archive:List-Post:List-Help:List-Subscribe:From:Reply-To: From; b=nXpvvXsTXLlj+1BwpurZ7usMzhqGYBReIjiaZzPGnSzv71qrhmU9MF5zm7lqp7GCU Oehm1EO7Vlz1OZBk4vJ/OlF1kEsW3PJ4+qkx0lw22vPdalquuPuS4nERQE6bXL/QWo USjAP4sjB/kSpE+wjvNr4mSNe8BgXgolZAPnSrL4= X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from mail-pf1-x434.google.com (mail-pf1-x434.google.com [IPv6:2607:f8b0:4864:20::434]) by sourceware.org (Postfix) with ESMTPS id 3C2983831EDE for ; Thu, 26 May 2022 20:43:05 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 3C2983831EDE Received: by mail-pf1-x434.google.com with SMTP id c14so2649266pfn.2 for ; Thu, 26 May 2022 13:43:05 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:date:from:to:subject:message-id:references :mime-version:content-disposition:content-transfer-encoding :in-reply-to; bh=Rsv5xkahS10jfKvmzf3tWicMxZTHaO0gLOt2gVcnm5s=; b=qbeFF8Edm/GuJpkVlSyqZmlBUgXBl1gRoQiO/lCtSAnFcbobv1RMGwtHxZJuD14Awt wf3KhtGqc3mg6WQm6GYahBX75xjEa0K0SjYA8KmYCDTBPAKX109aVQSOFYwV9selmkLl WLoQKYygwXSVhqJ1ksrnE8ToI4kZWfExRu9GIxItBKqtoV744kk3/OI87JF95QisvlzP tgYL3AJUr1QG4OUkJWzz/p44mLAQn2L/xGs3vH4pHdFWixY2sIH+KtVLuTQRme3VBcFi q0vBNoWxJBK4V5aAKTcYTfNSADmuEGshDcI4K4Ak39/IZNK9JL2hW4dggUq8nMIpEjsT SIXg== X-Gm-Message-State: AOAM532Z2useYQdDT/KD/WyOvBdYEsYtj45HRJhqJ8k7joXpsreICSkT 9PJMf9QdpaQT/lK9DMc050U= X-Google-Smtp-Source: ABdhPJx8yKOl3IU6UsCqLxPYHCfw/GKOJmJDs7+ated297VxMn9ENXsfnSiyjjGJv97xzjKLBHkOAg== X-Received: by 2002:a05:6a00:1582:b0:518:7aa0:d6d8 with SMTP id u2-20020a056a00158200b005187aa0d6d8mr30216363pfk.27.1653597783511; Thu, 26 May 2022 13:43:03 -0700 (PDT) Received: from gnu-tgl-3.localdomain ([172.58.88.122]) by smtp.gmail.com with ESMTPSA id p2-20020a170902c70200b001637704269fsm1930629plp.223.2022.05.26.13.43.03 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 26 May 2022 13:43:03 -0700 (PDT) Received: by gnu-tgl-3.localdomain (Postfix, from userid 1000) id ECFE8C043A; Thu, 26 May 2022 13:43:01 -0700 (PDT) Date: Thu, 26 May 2022 13:43:01 -0700 To: Gcc-patches , Richard Biener , richard.sandiford@arm.com Subject: [PATCH v3] DSE: Use the constant store source if possible Message-ID: References: <20220521030120.1977551-1-hjl.tools@gmail.com> MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: X-Spam-Status: No, score=-3027.6 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, FREEMAIL_FROM, GIT_PATCH_0, KAM_ASCII_DIVIDERS, RCVD_IN_DNSWL_NONE, SPF_HELO_NONE, SPF_PASS, TXREP, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-Patchwork-Original-From: "H.J. Lu via Gcc-patches" From: "H.J. Lu" Reply-To: "H.J. Lu" Errors-To: gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org Sender: "Gcc-patches" On Thu, May 26, 2022 at 04:14:17PM +0100, Richard Sandiford wrote: > "H.J. Lu" writes: > > On Wed, May 25, 2022 at 12:30 AM Richard Sandiford > > wrote: > >> > >> "H.J. Lu via Gcc-patches" writes: > >> > On Mon, May 23, 2022 at 12:38:06PM +0200, Richard Biener wrote: > >> >> On Sat, May 21, 2022 at 5:02 AM H.J. Lu via Gcc-patches > >> >> wrote: > >> >> > > >> >> > When recording store for RTL dead store elimination, check if the source > >> >> > register is set only once to a constant. If yes, record the constant > >> >> > as the store source. It eliminates unrolled zero stores after memset 0 > >> >> > in a loop where a vector register is used as the zero store source. > >> >> > > >> >> > gcc/ > >> >> > > >> >> > PR rtl-optimization/105638 > >> >> > * dse.cc (record_store): Use the constant source if the source > >> >> > register is set only once. > >> >> > > >> >> > gcc/testsuite/ > >> >> > > >> >> > PR rtl-optimization/105638 > >> >> > * g++.target/i386/pr105638.C: New test. > >> >> > --- > >> >> > gcc/dse.cc | 19 ++++++++++ > >> >> > gcc/testsuite/g++.target/i386/pr105638.C | 44 ++++++++++++++++++++++++ > >> >> > 2 files changed, 63 insertions(+) > >> >> > create mode 100644 gcc/testsuite/g++.target/i386/pr105638.C > >> >> > > >> >> > diff --git a/gcc/dse.cc b/gcc/dse.cc > >> >> > index 30c11cee034..0433dd3d846 100644 > >> >> > --- a/gcc/dse.cc > >> >> > +++ b/gcc/dse.cc > >> >> > @@ -1508,6 +1508,25 @@ record_store (rtx body, bb_info_t bb_info) > >> >> > > >> >> > if (tem && CONSTANT_P (tem)) > >> >> > const_rhs = tem; > >> >> > + else > >> >> > + { > >> >> > + /* If RHS is set only once to a constant, set CONST_RHS > >> >> > + to the constant. */ > >> >> > + df_ref def = DF_REG_DEF_CHAIN (REGNO (rhs)); > >> >> > + if (def != nullptr > >> >> > + && !DF_REF_IS_ARTIFICIAL (def) > >> >> > + && !DF_REF_NEXT_REG (def)) > >> >> > + { > >> >> > + rtx_insn *def_insn = DF_REF_INSN (def); > >> >> > + rtx def_body = PATTERN (def_insn); > >> >> > + if (GET_CODE (def_body) == SET) > >> >> > + { > >> >> > + rtx def_src = SET_SRC (def_body); > >> >> > + if (CONSTANT_P (def_src)) > >> >> > + const_rhs = def_src; > >> >> > >> >> doesn't DSE have its own tracking of stored values? Shouldn't we > >> > > >> > It tracks stored values only within the basic block. When RTL loop > >> > invariant motion hoists a constant initialization out of the loop into > >> > a separate basic block, the constant store value becomes unknown > >> > within the original basic block. > >> > > >> >> improve _that_ if it is not enough? I also wonder if you need to > >> > > >> > My patch extends DSE stored value tracking to include the constant which > >> > is set only once in another basic block. > >> > > >> >> verify the SET isn't partial? > >> >> > >> > > >> > Here is the v2 patch to check that the constant is set by a non-partial > >> > unconditional load. > >> > > >> > OK for master? > >> > > >> > Thanks. > >> > > >> > H.J. > >> > --- > >> > RTL DSE tracks redundant constant stores within a basic block. When RTL > >> > loop invariant motion hoists a constant initialization out of the loop > >> > into a separate basic block, the constant store value becomes unknown > >> > within the original basic block. When recording store for RTL DSE, check > >> > if the source register is set only once to a constant by a non-partial > >> > unconditional load. If yes, record the constant as the constant store > >> > source. It eliminates unrolled zero stores after memset 0 in a loop > >> > where a vector register is used as the zero store source. > >> > > >> > gcc/ > >> > > >> > PR rtl-optimization/105638 > >> > * dse.cc (record_store): Use the constant source if the source > >> > register is set only once. > >> > > >> > gcc/testsuite/ > >> > > >> > PR rtl-optimization/105638 > >> > * g++.target/i386/pr105638.C: New test. > >> > --- > >> > gcc/dse.cc | 22 ++++++++++++ > >> > gcc/testsuite/g++.target/i386/pr105638.C | 44 ++++++++++++++++++++++++ > >> > 2 files changed, 66 insertions(+) > >> > create mode 100644 gcc/testsuite/g++.target/i386/pr105638.C > >> > > >> > diff --git a/gcc/dse.cc b/gcc/dse.cc > >> > index 30c11cee034..af8e88dac32 100644 > >> > --- a/gcc/dse.cc > >> > +++ b/gcc/dse.cc > >> > @@ -1508,6 +1508,28 @@ record_store (rtx body, bb_info_t bb_info) > >> > > >> > if (tem && CONSTANT_P (tem)) > >> > const_rhs = tem; > >> > + else > >> > + { > >> > + /* If RHS is set only once to a constant, set CONST_RHS > >> > + to the constant. */ > >> > + df_ref def = DF_REG_DEF_CHAIN (REGNO (rhs)); > >> > + if (def != nullptr > >> > + && !DF_REF_IS_ARTIFICIAL (def) > >> > + && !(DF_REF_FLAGS (def) > >> > + & (DF_REF_PARTIAL | DF_REF_CONDITIONAL)) > >> > + && !DF_REF_NEXT_REG (def)) > >> > >> Can we introduce a helper for this? There are already similar tests > >> in ira and loop-iv, and it seems a bit too complex to have to open-code > >> each time. > > > > I can use find_single_def_src in loop-iv.cc: > > > > /* If REGNO has a single definition, return its known value, otherwise return > > null. */ > > > > rtx > > find_single_def_src (unsigned int regno) > > Yeah, reusing that sounds good. Perhaps we should move it into df-core.cc, > alongside the df_reg_used group of functions. > > I think the mode check in your original patch should be kept though, > so how about we change the parameter to an rtx reg and use rtx_equal in: > > rtx set = single_set (DF_REF_INSN (adef)); > if (set == NULL || !REG_P (SET_DEST (set)) > || REGNO (SET_DEST (set)) != regno) > return NULL_RTX; Fixed. > rather than the existing !REG_P and REGNO checks. We should also add: > > > > > and do > > > > /* If RHS is set only once to a constant, set CONST_RHS > > to the constant. */ > > rtx def_src = find_single_def_src (REGNO (rhs)); > > if (def_src != nullptr && CONSTANT_P (def_src)) > > { > > df_ref def = DF_REG_DEF_CHAIN (REGNO (rhs)); > > if (!(DF_REF_FLAGS (def) > > & (DF_REF_PARTIAL | DF_REF_CONDITIONAL))) > > …this check to the function, since it's needed for correctness even > in the loop-iv.cc use. Fixed. > > Thanks, > Richard Here is the v3 patch. OK for master? Thanks. H.J. --- RTL DSE tracks redundant constant stores within a basic block. When RTL loop invariant motion hoists a constant initialization out of the loop into a separate basic block, the constant store value becomes unknown within the original basic block. When recording store for RTL DSE, check if the source register is set only once to a constant by a non-partial unconditional load. If yes, record the constant as the constant store source. It eliminates unrolled zero stores after memset 0 in a loop where a vector register is used as the zero store source. Extract find_single_def_src from loop-iv.cc and move it to df-core.cc: 1. Rename to df_find_single_def_src. 2. Change the argument to rtx and use rtx_equal_p. 3. Return null for partial or conditional defs. gcc/ PR rtl-optimization/105638 * df-core.cc (df_find_single_def_sr): Moved and renamed from find_single_def_src in loop-iv.cc. Change the argument to rtx and use rtx_equal_p. Return null for partial or conditional defs. * df.h (df_find_single_def_src): New prototype. * dse.cc (record_store): Use the constant source if the source register is set only once. * loop-iv.cc (find_single_def_src): Moved to df-core.cc. (replace_single_def_regs): Replace find_single_def_src with df_find_single_def_src. gcc/testsuite/ PR rtl-optimization/105638 * g++.target/i386/pr105638.C: New test. --- gcc/df-core.cc | 44 +++++++++++++++++++++++ gcc/df.h | 1 + gcc/dse.cc | 14 ++++++++ gcc/loop-iv.cc | 45 +----------------------- gcc/testsuite/g++.target/i386/pr105638.C | 44 +++++++++++++++++++++++ 5 files changed, 104 insertions(+), 44 deletions(-) create mode 100644 gcc/testsuite/g++.target/i386/pr105638.C diff --git a/gcc/df-core.cc b/gcc/df-core.cc index a901b84878f..f9b4de8eb7a 100644 --- a/gcc/df-core.cc +++ b/gcc/df-core.cc @@ -2009,6 +2009,50 @@ df_reg_used (rtx_insn *insn, rtx reg) return df_find_use (insn, reg) != NULL; } +/* If REG has a single definition, return its known value, otherwise return + null. */ + +rtx +df_find_single_def_src (rtx reg) +{ + rtx src = NULL_RTX; + + /* Don't look through unbounded number of single definition REG copies, + there might be loops for sources with uninitialized variables. */ + for (int cnt = 0; cnt < 128; cnt++) + { + df_ref adef = DF_REG_DEF_CHAIN (REGNO (reg)); + if (adef == NULL || DF_REF_NEXT_REG (adef) != NULL + || DF_REF_IS_ARTIFICIAL (adef) + || (DF_REF_FLAGS (adef) + & (DF_REF_PARTIAL | DF_REF_CONDITIONAL))) + return NULL_RTX; + + rtx set = single_set (DF_REF_INSN (adef)); + if (set == NULL || !rtx_equal_p (SET_DEST (set), reg)) + return NULL_RTX; + + rtx note = find_reg_equal_equiv_note (DF_REF_INSN (adef)); + if (note && function_invariant_p (XEXP (note, 0))) + { + src = XEXP (note, 0); + break; + } + src = SET_SRC (set); + + if (REG_P (src)) + { + reg = src; + continue; + } + break; + } + if (!function_invariant_p (src)) + return NULL_RTX; + + return src; +} + /*---------------------------------------------------------------------------- Debugging and printing functions. diff --git a/gcc/df.h b/gcc/df.h index bd329205d08..71e249ad20a 100644 --- a/gcc/df.h +++ b/gcc/df.h @@ -991,6 +991,7 @@ extern df_ref df_find_def (rtx_insn *, rtx); extern bool df_reg_defined (rtx_insn *, rtx); extern df_ref df_find_use (rtx_insn *, rtx); extern bool df_reg_used (rtx_insn *, rtx); +extern rtx df_find_single_def_src (rtx); extern void df_worklist_dataflow (struct dataflow *,bitmap, int *, int); extern void df_print_regset (FILE *file, const_bitmap r); extern void df_print_word_regset (FILE *file, const_bitmap r); diff --git a/gcc/dse.cc b/gcc/dse.cc index 30c11cee034..c915266f025 100644 --- a/gcc/dse.cc +++ b/gcc/dse.cc @@ -1508,6 +1508,20 @@ record_store (rtx body, bb_info_t bb_info) if (tem && CONSTANT_P (tem)) const_rhs = tem; + else + { + /* If RHS is set only once to a constant, set CONST_RHS + to the constant. */ + rtx def_src = df_find_single_def_src (rhs); + if (def_src != nullptr && CONSTANT_P (def_src)) + { + df_ref def = DF_REG_DEF_CHAIN (REGNO (rhs)); + rtx_insn *def_insn = DF_REF_INSN (def); + rtx def_body = single_set (def_insn); + if (rhs == SET_DEST (def_body)) + const_rhs = def_src; + } + } } } diff --git a/gcc/loop-iv.cc b/gcc/loop-iv.cc index 0eafe7d2362..d639336445a 100644 --- a/gcc/loop-iv.cc +++ b/gcc/loop-iv.cc @@ -1378,49 +1378,6 @@ simple_rhs_p (rtx rhs) } } -/* If REGNO has a single definition, return its known value, otherwise return - null. */ - -static rtx -find_single_def_src (unsigned int regno) -{ - rtx src = NULL_RTX; - - /* Don't look through unbounded number of single definition REG copies, - there might be loops for sources with uninitialized variables. */ - for (int cnt = 0; cnt < 128; cnt++) - { - df_ref adef = DF_REG_DEF_CHAIN (regno); - if (adef == NULL || DF_REF_NEXT_REG (adef) != NULL - || DF_REF_IS_ARTIFICIAL (adef)) - return NULL_RTX; - - rtx set = single_set (DF_REF_INSN (adef)); - if (set == NULL || !REG_P (SET_DEST (set)) - || REGNO (SET_DEST (set)) != regno) - return NULL_RTX; - - rtx note = find_reg_equal_equiv_note (DF_REF_INSN (adef)); - if (note && function_invariant_p (XEXP (note, 0))) - { - src = XEXP (note, 0); - break; - } - src = SET_SRC (set); - - if (REG_P (src)) - { - regno = REGNO (src); - continue; - } - break; - } - if (!function_invariant_p (src)) - return NULL_RTX; - - return src; -} - /* If any registers in *EXPR that have a single definition, try to replace them with the known-equivalent values. */ @@ -1433,7 +1390,7 @@ replace_single_def_regs (rtx *expr) { rtx x = *iter; if (REG_P (x)) - if (rtx new_x = find_single_def_src (REGNO (x))) + if (rtx new_x = df_find_single_def_src (x)) { *expr = simplify_replace_rtx (*expr, x, new_x); goto repeat; diff --git a/gcc/testsuite/g++.target/i386/pr105638.C b/gcc/testsuite/g++.target/i386/pr105638.C new file mode 100644 index 00000000000..ff40a459de1 --- /dev/null +++ b/gcc/testsuite/g++.target/i386/pr105638.C @@ -0,0 +1,44 @@ +/* { dg-do compile { target { ! ia32 } } } */ +/* { dg-options "-std=gnu++20 -O2 -march=skylake" } */ +/* { dg-final { scan-assembler-not "vpxor" } } */ + +#include +#include +#include + +class FastBoard { +public: + typedef std::pair movescore_t; + typedef std::tr1::array scoredlist_t; + +protected: + std::vector m_critical; + + int m_boardsize; +}; + +class FastState { +public: + FastBoard board; + + int movenum; +protected: + FastBoard::scoredlist_t scoredmoves; +}; + +class KoState : public FastState { +private: + std::vector ko_hash_history; + std::vector hash_history; +}; + +class GameState : public KoState { +public: + void foo (); +private: + std::vector game_history; +}; + +void GameState::foo() { + game_history.resize(movenum); +}