From patchwork Mon Feb 21 16:45:57 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Jakub Jelinek
X-Patchwork-Id: 1595683
Date: Mon, 21 Feb 2022 17:45:57 +0100
To: Uros Bizjak, Hongtao Liu
Subject: [PATCH] i386: Fix up copysign/xorsign expansion [PR104612]
Content-Disposition: inline
List-Id: Gcc-patches mailing list
X-Patchwork-Original-From: Jakub Jelinek via Gcc-patches
From: Jakub Jelinek
Reply-To: Jakub Jelinek
Cc: gcc-patches@gcc.gnu.org
Sender: "Gcc-patches"

Hi!

We ICE on the following testcase for -m32 since r12-3435, because
operands[2] is (subreg:SF (reg:DI ...) 0) and
lowpart_subreg (V4SFmode, operands[2], SFmode) returns NULL, and that
is what we use in the AND etc. insns we emit.

The following patch (non-attached) fixes that by calling force_reg for
the input operands, to make sure they are really REGs and so
lowpart_subreg will succeed on them.  Even for theoretical MEMs, using
REGs there seems desirable: we don't want to read the following memory
slots for the paradoxical subreg.
For the outputs, I thought we'd get better code by always computing the
result into a new pseudo and then moving the lowpart of that pseudo
into dest.

I've bootstrapped/regtested this version on x86_64-linux and
i686-linux, unfortunately it regressed
FAIL: gcc.target/i386/pr89984-2.c scan-assembler-not vmovaps
on which the patch changes:
 	vandps	.LC0(%rip), %xmm1, %xmm1
-	vxorps	%xmm0, %xmm1, %xmm0
+	vxorps	%xmm0, %xmm1, %xmm1
+	vmovaps	%xmm1, %xmm0
 	ret
The RA sees:
(insn 8 4 9 2 (set (reg:V4SF 85)
        (and:V4SF (subreg:V4SF (reg:SF 90) 0)
            (mem/u/c:V4SF (symbol_ref/u:DI ("*.LC0") [flags 0x2]) [0 S16 A128]))) "pr89984-2.c":7:12 2838 {*andv4sf3}
     (expr_list:REG_DEAD (reg:SF 90)
        (nil)))
(insn 9 8 10 2 (set (reg:V4SF 87)
        (xor:V4SF (reg:V4SF 85)
            (subreg:V4SF (reg:SF 89) 0))) "pr89984-2.c":7:12 2842 {*xorv4sf3}
     (expr_list:REG_DEAD (reg:SF 89)
        (expr_list:REG_DEAD (reg:V4SF 85)
            (nil))))
(insn 10 9 14 2 (set (reg:SF 82 [ ])
        (subreg:SF (reg:V4SF 87) 0)) "pr89984-2.c":7:12 142 {*movsf_internal}
     (expr_list:REG_DEAD (reg:V4SF 87)
        (nil)))
(insn 14 10 15 2 (set (reg/i:SF 20 xmm0)
        (reg:SF 82 [ ])) "pr89984-2.c":8:1 142 {*movsf_internal}
     (expr_list:REG_DEAD (reg:SF 82 [ ])
        (nil)))
(insn 15 14 0 2 (use (reg/i:SF 20 xmm0)) "pr89984-2.c":8:1 -1
     (nil))
and doesn't know that if it used xmm0 not just for pseudo 82 but also
for pseudo 87, insn 10 would become a noop move and the extra register
copy could be avoided; nothing later on is able to figure that out
either.  I don't know how the RA should know that though.

Anyway, so that we don't regress, I have an alternative patch in the
attachment, which uses the new approach (i.e. a fresh vector pseudo as
destination, then a move of its lowpart to dest) instead of the old one
(i.e. a paradoxical subreg of dest) only if lowpart_subreg returns NULL.

Ok for trunk if the attached version passes bootstrap/regtest?

2022-02-21  Jakub Jelinek

	PR target/104612
	* config/i386/i386-expand.cc (ix86_expand_copysign): Call force_reg
	on input operands before calling lowpart_subreg on them.  For output
	operand, use a vmode pseudo as destination and then move its lowpart
	subreg into operands[0].
	(ix86_expand_xorsign): Likewise.

	* gcc.dg/pr104612.c: New test.

	Jakub

2022-02-21  Jakub Jelinek

	PR target/104612
	* config/i386/i386-expand.cc (ix86_expand_copysign): Call force_reg
	on input operands before calling lowpart_subreg on them.  For output
	operand, use a vmode pseudo as destination and then move its lowpart
	subreg into operands[0] if lowpart_subreg fails on dest.
	(ix86_expand_xorsign): Likewise.

	* gcc.dg/pr104612.c: New test.

--- gcc/config/i386/i386-expand.cc.jj	2022-02-21 16:51:36.639411090 +0100
+++ gcc/config/i386/i386-expand.cc	2022-02-21 17:20:11.655150129 +0100
@@ -2153,7 +2153,7 @@ void
 ix86_expand_copysign (rtx operands[])
 {
   machine_mode mode, vmode;
-  rtx dest, op0, op1, mask, op2, op3;
+  rtx dest, vdest, op0, op1, mask, op2, op3;
 
   mode = GET_MODE (operands[0]);
 
@@ -2174,8 +2174,13 @@ ix86_expand_copysign (rtx operands[])
       return;
     }
 
-  dest = lowpart_subreg (vmode, operands[0], mode);
-  op1 = lowpart_subreg (vmode, operands[2], mode);
+  dest = operands[0];
+  vdest = lowpart_subreg (vmode, dest, mode);
+  if (vdest == NULL_RTX)
+    vdest = gen_reg_rtx (vmode);
+  else
+    dest = NULL_RTX;
+  op1 = lowpart_subreg (vmode, force_reg (mode, operands[2]), mode);
   mask = ix86_build_signbit_mask (vmode, 0, 0);
 
   if (CONST_DOUBLE_P (operands[1]))
@@ -2184,7 +2189,9 @@ ix86_expand_copysign (rtx operands[])
       /* Optimize for 0, simplify b = copy_signf (0.0f, a) to b = mask & a.  */
       if (op0 == CONST0_RTX (mode))
         {
-          emit_move_insn (dest, gen_rtx_AND (vmode, mask, op1));
+          emit_move_insn (vdest, gen_rtx_AND (vmode, mask, op1));
+          if (dest)
+            emit_move_insn (dest, lowpart_subreg (mode, vdest, vmode));
           return;
         }
 
@@ -2193,7 +2200,7 @@ ix86_expand_copysign (rtx operands[])
       op0 = force_reg (vmode, op0);
     }
   else
-    op0 = lowpart_subreg (vmode, operands[1], mode);
+    op0 = lowpart_subreg (vmode, force_reg (mode, operands[1]), mode);
 
   op2 = gen_reg_rtx (vmode);
   op3 = gen_reg_rtx (vmode);
@@ -2201,7 +2208,9 @@ ix86_expand_copysign (rtx operands[])
                                     gen_rtx_NOT (vmode, mask),
                                     op0));
   emit_move_insn (op3, gen_rtx_AND (vmode, mask, op1));
-  emit_move_insn (dest, gen_rtx_IOR (vmode, op2, op3));
+  emit_move_insn (vdest, gen_rtx_IOR (vmode, op2, op3));
+  if (dest)
+    emit_move_insn (dest, lowpart_subreg (mode, vdest, vmode));
 }
 
 /* Expand an xorsign operation.  */
@@ -2210,7 +2219,7 @@ void
 ix86_expand_xorsign (rtx operands[])
 {
   machine_mode mode, vmode;
-  rtx dest, op0, op1, mask, x, temp;
+  rtx dest, vdest, op0, op1, mask, x, temp;
 
   dest = operands[0];
   op0 = operands[1];
@@ -2230,15 +2239,22 @@ ix86_expand_xorsign (rtx operands[])
   temp = gen_reg_rtx (vmode);
   mask = ix86_build_signbit_mask (vmode, 0, 0);
 
-  op1 = lowpart_subreg (vmode, op1, mode);
+  op1 = lowpart_subreg (vmode, force_reg (mode, op1), mode);
   x = gen_rtx_AND (vmode, op1, mask);
   emit_insn (gen_rtx_SET (temp, x));
 
-  op0 = lowpart_subreg (vmode, op0, mode);
+  op0 = lowpart_subreg (vmode, force_reg (mode, op0), mode);
   x = gen_rtx_XOR (vmode, temp, op0);
 
-  dest = lowpart_subreg (vmode, dest, mode);
-  emit_insn (gen_rtx_SET (dest, x));
+  vdest = lowpart_subreg (vmode, dest, mode);
+  if (vdest == NULL_RTX)
+    vdest = gen_reg_rtx (vmode);
+  else
+    dest = NULL_RTX;
+  emit_insn (gen_rtx_SET (vdest, x));
+
+  if (dest)
+    emit_move_insn (dest, lowpart_subreg (mode, vdest, vmode));
 }
 
 static rtx ix86_expand_compare (enum rtx_code code, rtx op0, rtx op1);
--- gcc/testsuite/gcc.dg/pr104612.c.jj	2022-02-21 17:16:32.947140573 +0100
+++ gcc/testsuite/gcc.dg/pr104612.c	2022-02-21 17:16:32.947140573 +0100
@@ -0,0 +1,27 @@
+/* PR target/104612 */
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+/* { dg-additional-options "-msse2 -mfpmath=sse" { target i?86-*-* x86_64-*-* } } */
+
+struct V { float x, y; };
+
+struct V
+foo (struct V v)
+{
+  struct V ret;
+  ret.x = __builtin_copysignf (1.0e+0, v.x);
+  ret.y = __builtin_copysignf (1.0e+0, v.y);
+  return ret;
+}
+
+float
+bar (struct V v)
+{
+  return __builtin_copysignf (v.x, v.y);
+}
+
+float
+baz (struct V v)
+{
+  return v.x * __builtin_copysignf (1.0f, v.y);
+}

--- gcc/config/i386/i386-expand.cc.jj	2022-02-09 20:45:03.463499205 +0100
+++ gcc/config/i386/i386-expand.cc	2022-02-21 13:14:31.756657743 +0100
@@ -2153,7 +2153,7 @@ void
 ix86_expand_copysign (rtx operands[])
 {
   machine_mode mode, vmode;
-  rtx dest, op0, op1, mask, op2, op3;
+  rtx dest, vdest, op0, op1, mask, op2, op3;
 
   mode = GET_MODE (operands[0]);
 
@@ -2174,8 +2174,9 @@ ix86_expand_copysign (rtx operands[])
       return;
     }
 
-  dest = lowpart_subreg (vmode, operands[0], mode);
-  op1 = lowpart_subreg (vmode, operands[2], mode);
+  dest = operands[0];
+  vdest = gen_reg_rtx (vmode);
+  op1 = lowpart_subreg (vmode, force_reg (mode, operands[2]), mode);
   mask = ix86_build_signbit_mask (vmode, 0, 0);
 
   if (CONST_DOUBLE_P (operands[1]))
@@ -2184,7 +2185,8 @@ ix86_expand_copysign (rtx operands[])
       /* Optimize for 0, simplify b = copy_signf (0.0f, a) to b = mask & a.  */
       if (op0 == CONST0_RTX (mode))
         {
-          emit_move_insn (dest, gen_rtx_AND (vmode, mask, op1));
+          emit_move_insn (vdest, gen_rtx_AND (vmode, mask, op1));
+          emit_move_insn (dest, lowpart_subreg (mode, vdest, vmode));
           return;
         }
 
@@ -2193,7 +2195,7 @@ ix86_expand_copysign (rtx operands[])
       op0 = force_reg (vmode, op0);
     }
   else
-    op0 = lowpart_subreg (vmode, operands[1], mode);
+    op0 = lowpart_subreg (vmode, force_reg (mode, operands[1]), mode);
 
   op2 = gen_reg_rtx (vmode);
   op3 = gen_reg_rtx (vmode);
@@ -2201,7 +2203,8 @@ ix86_expand_copysign (rtx operands[])
                                     gen_rtx_NOT (vmode, mask),
                                     op0));
   emit_move_insn (op3, gen_rtx_AND (vmode, mask, op1));
-  emit_move_insn (dest, gen_rtx_IOR (vmode, op2, op3));
+  emit_move_insn (vdest, gen_rtx_IOR (vmode, op2, op3));
+  emit_move_insn (dest, lowpart_subreg (mode, vdest, vmode));
 }
 
 /* Expand an xorsign operation.  */
@@ -2210,7 +2213,7 @@ void
 ix86_expand_xorsign (rtx operands[])
 {
   machine_mode mode, vmode;
-  rtx dest, op0, op1, mask, x, temp;
+  rtx dest, vdest, op0, op1, mask, x, temp;
 
   dest = operands[0];
   op0 = operands[1];
@@ -2230,15 +2233,17 @@ ix86_expand_xorsign (rtx operands[])
   temp = gen_reg_rtx (vmode);
   mask = ix86_build_signbit_mask (vmode, 0, 0);
 
-  op1 = lowpart_subreg (vmode, op1, mode);
+  op1 = lowpart_subreg (vmode, force_reg (mode, op1), mode);
   x = gen_rtx_AND (vmode, op1, mask);
   emit_insn (gen_rtx_SET (temp, x));
 
-  op0 = lowpart_subreg (vmode, op0, mode);
+  op0 = lowpart_subreg (vmode, force_reg (mode, op0), mode);
   x = gen_rtx_XOR (vmode, temp, op0);
 
-  dest = lowpart_subreg (vmode, dest, mode);
-  emit_insn (gen_rtx_SET (dest, x));
+  vdest = gen_reg_rtx (vmode);
+  emit_insn (gen_rtx_SET (vdest, x));
+
+  emit_move_insn (dest, lowpart_subreg (mode, vdest, vmode));
 }
 
 static rtx ix86_expand_compare (enum rtx_code code, rtx op0, rtx op1);
--- gcc/testsuite/gcc.dg/pr104612.c.jj	2022-02-21 13:26:41.134606451 +0100
+++ gcc/testsuite/gcc.dg/pr104612.c	2022-02-21 13:26:18.247922789 +0100
@@ -0,0 +1,27 @@
+/* PR target/104612 */
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+/* { dg-additional-options "-msse2 -mfpmath=sse" { target i?86-*-* x86_64-*-* } } */
+
+struct V { float x, y; };
+
+struct V
+foo (struct V v)
+{
+  struct V ret;
+  ret.x = __builtin_copysignf (1.0e+0, v.x);
+  ret.y = __builtin_copysignf (1.0e+0, v.y);
+  return ret;
+}
+
+float
+bar (struct V v)
+{
+  return __builtin_copysignf (v.x, v.y);
+}
+
+float
+baz (struct V v)
+{
+  return v.x * __builtin_copysignf (1.0f, v.y);
+}
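
For readers less familiar with the two expanders touched above, here is a
minimal scalar sketch in C of the bit operations they emit, assuming IEEE 754
single precision with the sign in bit 31.  It is not GCC code: f2u, u2f,
copysign_bits, xorsign_bits and SIGN_MASK are hypothetical names used only for
illustration, and the real expanders operate on vector-mode RTL, not on
integers.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helpers (not part of GCC): reinterpret the bits of a float
   as a 32-bit integer and back, without violating aliasing rules.  */
static uint32_t
f2u (float f)
{
  uint32_t u;
  memcpy (&u, &f, sizeof u);
  return u;
}

static float
u2f (uint32_t u)
{
  float f;
  memcpy (&f, &u, sizeof f);
  return f;
}

/* Assumption: IEEE 754 binary32, sign bit is bit 31.  */
#define SIGN_MASK 0x80000000u

/* Same shape as the AND/ANDN/IOR sequence in ix86_expand_copysign:
   result = (~mask & mag) | (mask & sgn).  */
static float
copysign_bits (float mag, float sgn)
{
  return u2f ((~SIGN_MASK & f2u (mag)) | (SIGN_MASK & f2u (sgn)));
}

/* Same shape as the AND/XOR sequence in ix86_expand_xorsign:
   result = (mask & y) ^ x, i.e. x * copysign (1.0, y).  */
static float
xorsign_bits (float x, float y)
{
  return u2f ((SIGN_MASK & f2u (y)) ^ f2u (x));
}
```

With these definitions, copysign_bits (3.0f, -2.0f) and
xorsign_bits (3.0f, -2.0f) both yield -3.0f, while
xorsign_bits (-3.0f, -2.0f) yields 3.0f because the sign is flipped
rather than copied.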