From patchwork Thu Sep 9 07:54:20 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: liuhongt
X-Patchwork-Id: 1526115
To: gcc-patches@gcc.gnu.org
Subject: [PATCH] [i386] Remove copysign post_reload splitter for scalar modes.
Date: Thu, 9 Sep 2021 15:54:20 +0800
Message-Id: <20210909075420.2442868-1-hongtao.liu@intel.com>
From: liuhongt
Reply-To: liuhongt
Cc: jakub@redhat.com

Hi,

As a follow-up to [1], this patch removes all scalar-mode copysign
post_reload splitters and define_insns, and expands copysign directly
into the sequence below, using paradoxical subregs:

  op3 = op1 & ~mask;
  op4 = op2 & mask;
  dest = op3 | op4;

This can sometimes generate better code, as avx512dq-abs-copysign-1.c
shows.
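For illustration only (this is not part of the patch), the three-step expansion above can be sketched in portable C on the bit pattern of a float. The helper names below are hypothetical and the sketch assumes IEEE-754 binary32. The second half shows why the adjusted scan-assembler patterns can plausibly expect a single vpternlogd/vpternlogq: the ANDNOT + AND + IOR sequence is a bitwise select, which is a three-input boolean function, and vpternlog evaluates any such function from an 8-bit truth table. The operand order used to build the table here is an assumption made for the sketch, not what GCC necessarily emits.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Sign-bit mask for IEEE-754 binary32; plays the role of `mask` above. */
#define SIGN_MASK_F32 UINT32_C(0x80000000)

/* Reinterpret float <-> bits without violating aliasing rules. */
static uint32_t f32_bits(float x) { uint32_t u; memcpy(&u, &x, sizeof u); return u; }
static float bits_f32(uint32_t u) { float x; memcpy(&x, &u, sizeof x); return x; }

/* Hypothetical helper (not GCC code): magnitude of op1, sign of op2. */
static float copysign_bits(float op1, float op2)
{
    assert(sizeof(float) == sizeof(uint32_t));
    uint32_t op3 = f32_bits(op1) & ~SIGN_MASK_F32;  /* op3 = op1 & ~mask */
    uint32_t op4 = f32_bits(op2) & SIGN_MASK_F32;   /* op4 = op2 & mask  */
    return bits_f32(op3 | op4);                     /* dest = op3 | op4  */
}

/* Hypothetical helper: 8-bit truth table of f(a, b, c) = (a & ~c) | (b & c),
   with entry index (a << 2) | (b << 1) | c.  The operand order is an
   assumption made for this sketch. */
static uint8_t select_truth_table(void)
{
    uint8_t imm = 0;
    for (unsigned i = 0; i < 8; i++) {
        unsigned a = (i >> 2) & 1, b = (i >> 1) & 1, c = i & 1;
        unsigned f = (a & (c ^ 1u)) | (b & c);
        imm |= (uint8_t)(f << i);
    }
    return imm;
}

/* Apply a three-input truth table to every bit position of 32-bit words. */
static uint32_t ternlog32(uint32_t a, uint32_t b, uint32_t c, uint8_t imm)
{
    uint32_t r = 0;
    for (unsigned bit = 0; bit < 32; bit++) {
        unsigned idx = (((a >> bit) & 1u) << 2) | (((b >> bit) & 1u) << 1)
                       | ((c >> bit) & 1u);
        r |= (uint32_t)((imm >> idx) & 1) << bit;
    }
    return r;
}
```

Calling copysign_bits (3.0f, -1.0f) gives -3.0f, and ternlog32 applied to the same bit patterns with SIGN_MASK_F32 as the third input and the table from select_truth_table () produces the identical result in one step. Which immediate and operand order the compiler actually picks is not shown in this patch.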
Bootstrapped and regtested on x86_64-linux-gnu{-m32,}.

gcc/ChangeLog:

	* config/i386/i386-expand.c (ix86_expand_copysign): Expand
	right into ANDNOT + AND + IOR, using paradoxical subregs.
	(ix86_split_copysign_const): Remove.
	(ix86_split_copysign_var): Ditto.
	* config/i386/i386-protos.h (ix86_split_copysign_const): Ditto.
	(ix86_split_copysign_var): Ditto.
	* config/i386/i386.md (@copysign3_const): Ditto.
	(@copysign3_var): Ditto.

gcc/testsuite/ChangeLog:

	* gcc.target/i386/avx512dq-abs-copysign-1.c: Adjust testcase.
	* gcc.target/i386/avx512vl-abs-copysign-1.c: Adjust testcase.
---
 gcc/config/i386/i386-expand.c                 | 152 +++---------------
 gcc/config/i386/i386-protos.h                 |   2 -
 gcc/config/i386/i386.md                       |  44 -----
 .../gcc.target/i386/avx512dq-abs-copysign-1.c |   4 +-
 .../gcc.target/i386/avx512vl-abs-copysign-1.c |   4 +-
 5 files changed, 30 insertions(+), 176 deletions(-)

diff --git a/gcc/config/i386/i386-expand.c b/gcc/config/i386/i386-expand.c
index badbacc19d8..a0262a8f47d 100644
--- a/gcc/config/i386/i386-expand.c
+++ b/gcc/config/i386/i386-expand.c
@@ -2115,13 +2115,9 @@ void
 ix86_expand_copysign (rtx operands[])
 {
   machine_mode mode, vmode;
-  rtx dest, op0, op1, mask;
+  rtx dest, op0, op1, mask, op2, op3;
 
-  dest = operands[0];
-  op0 = operands[1];
-  op1 = operands[2];
-
-  mode = GET_MODE (dest);
+  mode = GET_MODE (operands[0]);
 
   if (mode == SFmode)
     vmode = V4SFmode;
@@ -2132,136 +2128,40 @@ ix86_expand_copysign (rtx operands[])
   else
     gcc_unreachable ();
 
-  mask = ix86_build_signbit_mask (vmode, 0, 0);
-
-  if (CONST_DOUBLE_P (op0))
+  if (rtx_equal_p (operands[1], operands[2]))
     {
-      if (real_isneg (CONST_DOUBLE_REAL_VALUE (op0)))
-        op0 = simplify_unary_operation (ABS, mode, op0, mode);
-
-      if (mode == SFmode || mode == DFmode)
-        {
-          if (op0 == CONST0_RTX (mode))
-            op0 = CONST0_RTX (vmode);
-          else
-            {
-              rtx v = ix86_build_const_vector (vmode, false, op0);
-
-              op0 = force_reg (vmode, v);
-            }
-        }
-      else if (op0 != CONST0_RTX (mode))
-        op0 = force_reg (mode, op0);
-
-      emit_insn (gen_copysign3_const (mode, dest, op0, op1, mask));
-    }
-  else
-    {
-      rtx nmask = ix86_build_signbit_mask (vmode, 0, 1);
-
-      emit_insn (gen_copysign3_var
-                 (mode, dest, NULL_RTX, op0, op1, nmask, mask));
-    }
-}
-
-/* Deconstruct a copysign operation into bit masks.  Operand 0 is known to
-   be a constant, and so has already been expanded into a vector constant.  */
-
-void
-ix86_split_copysign_const (rtx operands[])
-{
-  machine_mode mode, vmode;
-  rtx dest, op0, mask, x;
-
-  dest = operands[0];
-  op0 = operands[1];
-  mask = operands[3];
-
-  mode = GET_MODE (dest);
-  vmode = GET_MODE (mask);
-
-  dest = lowpart_subreg (vmode, dest, mode);
-  x = gen_rtx_AND (vmode, dest, mask);
-  emit_insn (gen_rtx_SET (dest, x));
-
-  if (op0 != CONST0_RTX (vmode))
-    {
-      x = gen_rtx_IOR (vmode, dest, op0);
-      emit_insn (gen_rtx_SET (dest, x));
-    }
-}
-
-/* Deconstruct a copysign operation into bit masks.  Operand 0 is variable,
-   so we have to do two masks.  */
-
-void
-ix86_split_copysign_var (rtx operands[])
-{
-  machine_mode mode, vmode;
-  rtx dest, scratch, op0, op1, mask, nmask, x;
-
-  dest = operands[0];
-  scratch = operands[1];
-  op0 = operands[2];
-  op1 = operands[3];
-  nmask = operands[4];
-  mask = operands[5];
-
-  mode = GET_MODE (dest);
-  vmode = GET_MODE (mask);
-
-  if (rtx_equal_p (op0, op1))
-    {
-      /* Shouldn't happen often (it's useless, obviously), but when it does
-         we'd generate incorrect code if we continue below.  */
-      emit_move_insn (dest, op0);
+      emit_move_insn (operands[0], operands[1]);
       return;
     }
 
-  if (REG_P (mask) && REGNO (dest) == REGNO (mask))   /* alternative 0 */
-    {
-      gcc_assert (REGNO (op1) == REGNO (scratch));
-
-      x = gen_rtx_AND (vmode, scratch, mask);
-      emit_insn (gen_rtx_SET (scratch, x));
+  dest = lowpart_subreg (vmode, operands[0], mode);
+  op1 = lowpart_subreg (vmode, operands[2], mode);
+  mask = ix86_build_signbit_mask (vmode, 0, 0);
 
-      dest = mask;
-      op0 = lowpart_subreg (vmode, op0, mode);
-      x = gen_rtx_NOT (vmode, dest);
-      x = gen_rtx_AND (vmode, x, op0);
-      emit_insn (gen_rtx_SET (dest, x));
-    }
-  else
+  if (CONST_DOUBLE_P (operands[1]))
     {
-      if (REGNO (op1) == REGNO (scratch))             /* alternative 1,3 */
-        {
-          x = gen_rtx_AND (vmode, scratch, mask);
-        }
-      else                                            /* alternative 2,4 */
+      op0 = simplify_unary_operation (ABS, mode, operands[1], mode);
+      /* Optimize for 0, simplify b = copy_signf (0.0f, a) to b = mask & a.  */
+      if (op0 == CONST0_RTX (mode))
         {
-          gcc_assert (REGNO (mask) == REGNO (scratch));
-          op1 = lowpart_subreg (vmode, op1, mode);
-          x = gen_rtx_AND (vmode, scratch, op1);
+          emit_move_insn (dest, gen_rtx_AND (vmode, mask, op1));
+          return;
         }
-      emit_insn (gen_rtx_SET (scratch, x));
 
-      if (REGNO (op0) == REGNO (dest))                /* alternative 1,2 */
-        {
-          dest = lowpart_subreg (vmode, op0, mode);
-          x = gen_rtx_AND (vmode, dest, nmask);
-        }
-      else                                            /* alternative 3,4 */
-        {
-          gcc_assert (REGNO (nmask) == REGNO (dest));
-          dest = nmask;
-          op0 = lowpart_subreg (vmode, op0, mode);
-          x = gen_rtx_AND (vmode, dest, op0);
-        }
-      emit_insn (gen_rtx_SET (dest, x));
+      if (GET_MODE_SIZE (mode) < 16)
+        op0 = ix86_build_const_vector (vmode, false, op0);
+      op0 = force_reg (vmode, op0);
     }
-
-  x = gen_rtx_IOR (vmode, dest, scratch);
-  emit_insn (gen_rtx_SET (dest, x));
+  else
+    op0 = lowpart_subreg (vmode, operands[1], mode);
+
+  op2 = gen_reg_rtx (vmode);
+  op3 = gen_reg_rtx (vmode);
+  emit_move_insn (op2, gen_rtx_AND (vmode,
+                                    gen_rtx_NOT (vmode, mask),
+                                    op0));
+  emit_move_insn (op3, gen_rtx_AND (vmode, mask, op1));
+  emit_move_insn (dest, gen_rtx_IOR (vmode, op2, op3));
 }
 
 /* Expand an xorsign operation.  */
diff --git a/gcc/config/i386/i386-protos.h b/gcc/config/i386/i386-protos.h
index 72644e33a92..dcae34b915e 100644
--- a/gcc/config/i386/i386-protos.h
+++ b/gcc/config/i386/i386-protos.h
@@ -135,8 +135,6 @@ extern void ix86_expand_fp_absneg_operator (enum rtx_code, machine_mode,
 extern void ix86_split_fp_absneg_operator (enum rtx_code, machine_mode,
                                            rtx[]);
 extern void ix86_expand_copysign (rtx []);
-extern void ix86_split_copysign_const (rtx []);
-extern void ix86_split_copysign_var (rtx []);
 extern void ix86_expand_xorsign (rtx []);
 extern bool ix86_unary_operator_ok (enum rtx_code, machine_mode, rtx[]);
 extern bool ix86_match_ccmode (rtx, machine_mode);
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 6b4ceb2bce3..ba0058dad81 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -10861,50 +10861,6 @@ (define_expand "copysign3"
    || (TARGET_SSE && (mode == TFmode))"
   "ix86_expand_copysign (operands); DONE;")
 
-(define_insn_and_split "@copysign3_const"
-  [(set (match_operand:SSEMODEF 0 "register_operand" "=Yv")
-        (unspec:SSEMODEF
-          [(match_operand: 1 "nonimm_or_0_operand" "YvmC")
-           (match_operand:SSEMODEF 2 "register_operand" "0")
-           (match_operand: 3 "nonimmediate_operand" "Yvm")]
-          UNSPEC_COPYSIGN))]
-  "(SSE_FLOAT_MODE_P (mode) && TARGET_SSE_MATH)
-   || (TARGET_SSE && (mode == TFmode))"
-  "#"
-  "&& reload_completed"
-  [(const_int 0)]
-  "ix86_split_copysign_const (operands); DONE;")
-
-(define_insn "@copysign3_var"
-  [(set (match_operand:SSEMODEF 0 "register_operand" "=Yv,Yv,Yv,Yv,Yv")
-        (unspec:SSEMODEF
-          [(match_operand:SSEMODEF 2 "register_operand" "Yv,0,0,Yv,Yv")
-           (match_operand:SSEMODEF 3 "register_operand" "1,1,Yv,1,Yv")
-           (match_operand: 4
-             "nonimmediate_operand" "X,Yvm,Yvm,0,0")
-           (match_operand: 5
-             "nonimmediate_operand" "0,Yvm,1,Yvm,1")]
-          UNSPEC_COPYSIGN))
-   (clobber (match_scratch: 1 "=Yv,Yv,Yv,Yv,Yv"))]
-  "(SSE_FLOAT_MODE_P (mode) && TARGET_SSE_MATH)
-   || (TARGET_SSE && (mode == TFmode))"
-  "#")
-
-(define_split
-  [(set (match_operand:SSEMODEF 0 "register_operand")
-        (unspec:SSEMODEF
-          [(match_operand:SSEMODEF 2 "register_operand")
-           (match_operand:SSEMODEF 3 "register_operand")
-           (match_operand: 4)
-           (match_operand: 5)]
-          UNSPEC_COPYSIGN))
-   (clobber (match_scratch: 1))]
-  "((SSE_FLOAT_MODE_P (mode) && TARGET_SSE_MATH)
-    || (TARGET_SSE && (mode == TFmode)))
-   && reload_completed"
-  [(const_int 0)]
-  "ix86_split_copysign_var (operands); DONE;")
-
 (define_expand "xorsign3"
   [(match_operand:MODEF 0 "register_operand")
    (match_operand:MODEF 1 "register_operand")
diff --git a/gcc/testsuite/gcc.target/i386/avx512dq-abs-copysign-1.c b/gcc/testsuite/gcc.target/i386/avx512dq-abs-copysign-1.c
index cb542d09058..0107df7741a 100644
--- a/gcc/testsuite/gcc.target/i386/avx512dq-abs-copysign-1.c
+++ b/gcc/testsuite/gcc.target/i386/avx512dq-abs-copysign-1.c
@@ -64,8 +64,8 @@ f6 (double x)
 }
 
 /* { dg-final { scan-assembler "vandps\[^\n\r\]*xmm16" } } */
-/* { dg-final { scan-assembler "vorps\[^\n\r\]*xmm16" } } */
+/* { dg-final { scan-assembler "vpternlogd\[^\n\r\]*xmm16" } } */
 /* { dg-final { scan-assembler "vxorps\[^\n\r\]*xmm16" } } */
 /* { dg-final { scan-assembler "vandpd\[^\n\r\]*xmm18" } } */
-/* { dg-final { scan-assembler "vorpd\[^\n\r\]*xmm18" } } */
+/* { dg-final { scan-assembler "vpternlogq\[^\n\r\]*xmm18" } } */
 /* { dg-final { scan-assembler "vxorpd\[^\n\r\]*xmm18" } } */
diff --git a/gcc/testsuite/gcc.target/i386/avx512vl-abs-copysign-1.c b/gcc/testsuite/gcc.target/i386/avx512vl-abs-copysign-1.c
index b375c5fad80..b27335b9d99 100644
--- a/gcc/testsuite/gcc.target/i386/avx512vl-abs-copysign-1.c
+++ b/gcc/testsuite/gcc.target/i386/avx512vl-abs-copysign-1.c
@@ -64,8 +64,8 @@ f6 (double x)
 }
 
 /* { dg-final { scan-assembler "vpandd\[^\n\r\]*xmm16" } } */
-/* { dg-final { scan-assembler "vpord\[^\n\r\]*xmm16" } } */
+/* { dg-final { scan-assembler "vpternlogd\[^\n\r\]*xmm16" } } */
 /* { dg-final { scan-assembler "vpxord\[^\n\r\]*xmm16" } } */
 /* { dg-final { scan-assembler "vpandq\[^\n\r\]*xmm18" } } */
-/* { dg-final { scan-assembler "vporq\[^\n\r\]*xmm18" } } */
+/* { dg-final { scan-assembler "vpternlogq\[^\n\r\]*xmm18" } } */
 /* { dg-final { scan-assembler "vpxorq\[^\n\r\]*xmm18" } } */