From: Michael Meissner
Date: Sat, 8 Apr 2023 09:34:51 -0400
To: gcc-patches@gcc.gnu.org, Michael Meissner, Segher Boessenkool, "Kewen.Lin", David Edelsohn, Peter Bergner, Will Schmidt, chip.kerchner@ibm.com
Subject: [PATCH, V3] PR target/70243 - Do not generate vmaddfp or vnmsubfp
X-Patchwork-Id: 1766909

This is version 3 of the patch.
This is essentially version 1 with the removal of changes to altivec.md, and
cleanup of the comments.  Version 2 generated the vmaddfp and vnmsubfp
instructions if -Ofast was used, and those changes are deleted in this patch.

The Altivec instructions vmaddfp and vnmsubfp have different rounding behaviors
than the VSX xvmaddsp and xvnmsubsp instructions.  In particular, generating
these instructions seems to break Eigen on big endian systems.

I have done bootstrap builds on power9 little endian (with both IEEE long
double and IBM long double).  I have also done the builds and tests on a power8
big endian system (testing both 32-bit and 64-bit code generation).  Chip has
verified that it fixes the problem that Eigen encountered.  Can I check this
into the master GCC branch?  After a burn-in period, can I check this patch
into the active GCC branches?

Thanks in advance.

2023-04-07   Michael Meissner

gcc/

        PR target/70243
        * config/rs6000/vsx.md (vsx_fmav4sf4): Do not generate vmaddfp.
        (vsx_nfmsv4sf4): Do not generate vnmsubfp.

gcc/testsuite/

        PR target/70243
        * gcc.target/powerpc/pr70243.c: New test.
---
 gcc/config/rs6000/vsx.md                   | 31 ++++++++--------
 gcc/testsuite/gcc.target/powerpc/pr70243.c | 41 ++++++++++++++++++++++
 2 files changed, 55 insertions(+), 17 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pr70243.c

diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
index 0865608f94a..c4c503cacad 100644
--- a/gcc/config/rs6000/vsx.md
+++ b/gcc/config/rs6000/vsx.md
@@ -2009,22 +2009,20 @@ (define_insn "*vsx_tsqrt2_internal"
   "xtsqrtp %0,%x1"
   [(set_attr "type" "")])
 
-;; Fused vector multiply/add instructions.  Support the classical Altivec
-;; versions of fma, which allows the target to be a separate register from the
-;; 3 inputs.  Under VSX, the target must be either the addend or the first
-;; multiply.
-
+;; Fused vector multiply/add instructions.  Do not generate the Altivec versions
+;; of fma (vmaddfp and vnmsubfp).  These instructions allow the target to be a
+;; separate register from the 3 inputs, but they have different rounding
+;; behaviors than the VSX instructions.
 (define_insn "*vsx_fmav4sf4"
-  [(set (match_operand:V4SF 0 "vsx_register_operand" "=wa,wa,v")
+  [(set (match_operand:V4SF 0 "vsx_register_operand" "=wa,wa")
         (fma:V4SF
-          (match_operand:V4SF 1 "vsx_register_operand" "%wa,wa,v")
-          (match_operand:V4SF 2 "vsx_register_operand" "wa,0,v")
-          (match_operand:V4SF 3 "vsx_register_operand" "0,wa,v")))]
+          (match_operand:V4SF 1 "vsx_register_operand" "%wa,wa")
+          (match_operand:V4SF 2 "vsx_register_operand" "wa,0")
+          (match_operand:V4SF 3 "vsx_register_operand" "0,wa")))]
   "VECTOR_UNIT_VSX_P (V4SFmode)"
   "@
    xvmaddasp %x0,%x1,%x2
-   xvmaddmsp %x0,%x1,%x3
-   vmaddfp %0,%1,%2,%3"
+   xvmaddmsp %x0,%x1,%x3"
   [(set_attr "type" "vecfloat")])
 
 (define_insn "*vsx_fmav2df4"
@@ -2066,18 +2064,17 @@ (define_insn "*vsx_nfma4"
   [(set_attr "type" "")])
 
 (define_insn "*vsx_nfmsv4sf4"
-  [(set (match_operand:V4SF 0 "vsx_register_operand" "=wa,wa,v")
+  [(set (match_operand:V4SF 0 "vsx_register_operand" "=wa,wa")
         (neg:V4SF
          (fma:V4SF
-           (match_operand:V4SF 1 "vsx_register_operand" "%wa,wa,v")
-           (match_operand:V4SF 2 "vsx_register_operand" "wa,0,v")
+           (match_operand:V4SF 1 "vsx_register_operand" "%wa,wa")
+           (match_operand:V4SF 2 "vsx_register_operand" "wa,0")
            (neg:V4SF
-            (match_operand:V4SF 3 "vsx_register_operand" "0,wa,v")))))]
+            (match_operand:V4SF 3 "vsx_register_operand" "0,wa")))))]
   "VECTOR_UNIT_VSX_P (V4SFmode)"
   "@
    xvnmsubasp %x0,%x1,%x2
-   xvnmsubmsp %x0,%x1,%x3
-   vnmsubfp %0,%1,%2,%3"
+   xvnmsubmsp %x0,%x1,%x3"
   [(set_attr "type" "vecfloat")])
 
 (define_insn "*vsx_nfmsv2df4"
diff --git a/gcc/testsuite/gcc.target/powerpc/pr70243.c b/gcc/testsuite/gcc.target/powerpc/pr70243.c
new file mode 100644
index 00000000000..18a5ce78792
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr70243.c
@@ -0,0 +1,41 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target powerpc_vsx_ok } */
+/* { dg-options "-O2 -mvsx" } */
+
+/* PR 70243, Make sure we don't generate vmaddfp or vnmsubfp.  These
+   instructions have different rounding behaviors than the VSX instructions
+   xvmaddsp and xvnmsubsp.  These tests are written where the 3 inputs and
+   target are all separate registers.  Because vmaddfp and vnmsubfp are no
+   longer generated, the compiler will have to generate an xvmaddsp or xvnmsubsp
+   instruction followed by a move operation.  */
+
+#include <altivec.h>
+
+vector float
+do_add1 (vector float dummy, vector float a, vector float b, vector float c)
+{
+  return (a * b) + c;
+}
+
+vector float
+do_nsub1 (vector float dummy, vector float a, vector float b, vector float c)
+{
+  return -((a * b) - c);
+}
+
+vector float
+do_add2 (vector float dummy, vector float a, vector float b, vector float c)
+{
+  return vec_madd (a, b, c);
+}
+
+vector float
+do_nsub2 (vector float dummy, vector float a, vector float b, vector float c)
+{
+  return vec_nmsub (a, b, c);
+}
+
+/* { dg-final { scan-assembler {\mxvmadd[am]sp\M} } } */
+/* { dg-final { scan-assembler {\mxvnmsub[am]sp\M} } } */
+/* { dg-final { scan-assembler-not {\mvmaddfp\M} } } */
+/* { dg-final { scan-assembler-not {\mvnmsubfp\M} } } */
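
For readers less familiar with the builtins used in the test, the following is a rough scalar sketch of what each vector lane computes in do_add1/do_add2 and do_nsub1/do_nsub2.  The names ref_madd, ref_nmsub and ref_self_check are hypothetical and are not part of GCC or of this patch.  The sketch only models the arithmetic; it does not model the rounding difference between the Altivec and VSX instructions, and the inputs are chosen to be exactly representable so that fusing the multiply and add cannot change the result.

```c
#include <assert.h>

/* Hypothetical scalar reference model for a single vector lane.
   These helpers are illustrative only and are not part of GCC.  */

/* One lane of vec_madd (a, b, c), i.e. (a * b) + c.  */
static float
ref_madd (float a, float b, float c)
{
  return (a * b) + c;
}

/* One lane of vec_nmsub (a, b, c), i.e. -((a * b) - c).  */
static float
ref_nmsub (float a, float b, float c)
{
  return -((a * b) - c);
}

/* Small self check using exactly representable values.  */
static void
ref_self_check (void)
{
  assert (ref_madd (2.0f, 3.0f, 4.0f) == 10.0f);   /* 6 + 4  */
  assert (ref_nmsub (2.0f, 3.0f, 4.0f) == -2.0f);  /* -(6 - 4)  */
}
```

Both source forms in the test (the open-coded expression and the builtin) reduce to the same per-lane operation, which is why the test expects an xvmadd[am]sp or xvnmsub[am]sp instruction for each pair.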