From patchwork Thu Feb 17 13:46:39 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Richard Biener
X-Patchwork-Id: 1594252
Date: Thu, 17 Feb 2022 14:46:39 +0100 (CET)
To: gcc-patches@gcc.gnu.org
Subject: [PATCH] target/104581 - compile-time regression in mode-switching
Message-Id: <20220217134640.20BA913BF6@imap2.suse-dmz.suse.de>
From: Richard Biener
Reply-To: Richard Biener
Cc: hongtao.liu@intel.com

The x86 backend piggy-backs on mode-switching for insertion of vzeroupper.
A recent improvement there was implemented so that its mode_needed
hook, which is called for each instruction in a basic block during
mode-switching local analysis, may walk the whole basic block for
all DF register definitions.  That makes the local analysis
potentially quadratic in the size of the basic block.

The following mostly reverts this improvement.  It needs to be
re-done in a way more consistent with a local dataflow, which
probably means making targets aware of the state of the local
dataflow analysis.

This improves the compile time of a 538.imagick_r TU from 362s to 16s
with -Ofast -mavx2 -fprofile-generate.

Bootstrapped and tested on x86_64-unknown-linux-gnu, OK?

Thanks,
Richard.

2022-02-17  Richard Biener

	PR target/104581
	* config/i386/i386.cc (ix86_avx_u128_mode_source): Remove.
	(ix86_avx_u128_mode_needed): Return AVX_U128_DIRTY instead
	of calling ix86_avx_u128_mode_source which would eventually
	have returned AVX_U128_ANY in some very special case.

	* gcc.target/i386/pr101456-1.c: XFAIL.
---
 gcc/config/i386/i386.cc                    | 78 +---------------------
 gcc/testsuite/gcc.target/i386/pr101456-1.c |  3 +-
 2 files changed, 5 insertions(+), 76 deletions(-)

diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
index cf246e74e57..e4b42fbba6f 100644
--- a/gcc/config/i386/i386.cc
+++ b/gcc/config/i386/i386.cc
@@ -14377,80 +14377,12 @@ ix86_check_avx_upper_register (const_rtx exp)
 
 static void
 ix86_check_avx_upper_stores (rtx dest, const_rtx, void *data)
- {
-   if (ix86_check_avx_upper_register (dest))
+{
+  if (ix86_check_avx_upper_register (dest))
     {
       bool *used = (bool *) data;
       *used = true;
     }
- }
-
-/* For YMM/ZMM store or YMM/ZMM extract.  Return mode for the source
-   operand of SRC DEFs in the same basic block before INSN.  */
-
-static int
-ix86_avx_u128_mode_source (rtx_insn *insn, const_rtx src)
-{
-  basic_block bb = BLOCK_FOR_INSN (insn);
-  rtx_insn *end = BB_END (bb);
-
-  /* Return AVX_U128_DIRTY if there is no DEF in the same basic
-     block.  */
-  int status = AVX_U128_DIRTY;
-
-  for (df_ref def = DF_REG_DEF_CHAIN (REGNO (src));
-       def; def = DF_REF_NEXT_REG (def))
-    if (DF_REF_BB (def) == bb)
-      {
-        /* Ignore DEF from different basic blocks.  */
-        rtx_insn *def_insn = DF_REF_INSN (def);
-
-        /* Check if DEF_INSN is before INSN.  */
-        rtx_insn *next;
-        for (next = NEXT_INSN (def_insn);
-             next != nullptr && next != end && next != insn;
-             next = NEXT_INSN (next))
-          ;
-
-        /* Skip if DEF_INSN isn't before INSN.  */
-        if (next != insn)
-          continue;
-
-        /* Return AVX_U128_DIRTY if the source operand of DEF_INSN
-           isn't constant zero.  */
-
-        if (CALL_P (def_insn))
-          {
-            bool avx_upper_reg_found = false;
-            note_stores (def_insn,
-                         ix86_check_avx_upper_stores,
-                         &avx_upper_reg_found);
-
-            /* Return AVX_U128_DIRTY if call returns AVX.  */
-            if (avx_upper_reg_found)
-              return AVX_U128_DIRTY;
-
-            continue;
-          }
-
-        rtx set = single_set (def_insn);
-        if (!set)
-          return AVX_U128_DIRTY;
-
-        rtx dest = SET_DEST (set);
-
-        /* Skip if DEF_INSN is not an AVX load.  Return AVX_U128_DIRTY
-           if the source operand isn't constant zero.  */
-        if (ix86_check_avx_upper_register (dest)
-            && standard_sse_constant_p (SET_SRC (set),
-                                        GET_MODE (dest)) != 1)
-          return AVX_U128_DIRTY;
-
-        /* We get here only if all AVX loads are from constant zero.  */
-        status = AVX_U128_ANY;
-      }
-
-  return status;
 }
 
 /* Return needed mode for entity in optimize_mode_switching pass.  */
@@ -14520,11 +14452,7 @@ ix86_avx_u128_mode_needed (rtx_insn *insn)
             {
               FOR_EACH_SUBRTX (iter, array, src, NONCONST)
                 if (ix86_check_avx_upper_register (*iter))
-                  {
-                    int status = ix86_avx_u128_mode_source (insn, *iter);
-                    if (status == AVX_U128_DIRTY)
-                      return status;
-                  }
+                  return AVX_U128_DIRTY;
             }
 
           /* This isn't YMM/ZMM load/store.  */
diff --git a/gcc/testsuite/gcc.target/i386/pr101456-1.c b/gcc/testsuite/gcc.target/i386/pr101456-1.c
index 803fc6e0207..7fb3a3f055c 100644
--- a/gcc/testsuite/gcc.target/i386/pr101456-1.c
+++ b/gcc/testsuite/gcc.target/i386/pr101456-1.c
@@ -30,4 +30,5 @@ foo3 (void)
   bar ();
 }
 
-/* { dg-final { scan-assembler-not "vzeroupper" } } */
+/* See PR104581 for the XFAIL reason.  */
+/* { dg-final { scan-assembler-not "vzeroupper" { xfail *-*-* } } } */
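
Illustration only, not part of the patch: a rough sketch of why a
mode_needed-style hook that rescans its basic block on every call
gets expensive, and what the conservative replacement gives up.
Everything here (ToyInsn, U128Mode, the three functions) is a
hypothetical stand-in and not GCC code; the removed function walked
DF def chains, which this toy does not model.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy stand-ins for illustration only; these are not GCC's types.
enum class U128Mode { Any, Dirty };

struct ToyInsn {
  bool reads_upper;   // insn reads a YMM/ZMM register
  bool zeroes_upper;  // insn sets a YMM/ZMM register to constant zero
};

// Models the removed approach: every query rescans the earlier part of
// the block.  *visited accumulates how many insns were inspected.
U128Mode mode_needed_scan(const std::vector<ToyInsn>& bb, std::size_t idx,
                          std::size_t* visited) {
  assert(idx < bb.size());
  if (!bb[idx].reads_upper)
    return U128Mode::Any;
  U128Mode status = U128Mode::Dirty;  // no earlier def in the block
  for (std::size_t i = 0; i < idx; ++i) {
    ++*visited;
    if (!bb[i].zeroes_upper)
      return U128Mode::Dirty;
    status = U128Mode::Any;  // all defs seen so far are constant zero
  }
  return status;
}

// Models the patched behaviour: constant work per insn, always Dirty
// when an upper register is read.
U128Mode mode_needed_conservative(const std::vector<ToyInsn>& bb,
                                  std::size_t idx) {
  assert(idx < bb.size());
  return bb[idx].reads_upper ? U128Mode::Dirty : U128Mode::Any;
}

// Total insns inspected when the scanning hook runs once per insn.
std::size_t total_scan_work(const std::vector<ToyInsn>& bb) {
  std::size_t visited = 0;
  for (std::size_t idx = 0; idx < bb.size(); ++idx)
    mode_needed_scan(bb, idx, &visited);
  return visited;
}
```

In this toy, a block of n insns that all read and zero an upper
register costs n*(n-1)/2 inspections with the scanning hook and none
with the conservative one.  The conservative hook no longer returns
Any for the all-zero case, which is the kind of precision loss the
pr101456-1.c XFAIL records.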