From patchwork Thu Feb  8 17:10:22 2018
X-Patchwork-Submitter: Kyrill Tkachov
X-Patchwork-Id: 870973
Message-ID: <5A7C847E.1020704@foss.arm.com>
Date: Thu, 08 Feb 2018 17:10:22 +0000
From: Kyrill Tkachov
To: "Richard Earnshaw (lists)", "gcc-patches@gcc.gnu.org"
CC: Marcus Shawcroft, James Greenhalgh
Subject: [PATCH][AArch64][1/3] PR target/84164: Simplify subreg + redundant AND-immediate (was: PR target/84164: Relax predicate in *aarch64__reg_di3_mask2)
References: <5A747F56.7020403@foss.arm.com>
 <5A748725.4020500@foss.arm.com> <03571583-e3fa-cdc4-8abe-343bb02f6ec6@arm.com>
In-Reply-To: <03571583-e3fa-cdc4-8abe-343bb02f6ec6@arm.com>

On 06/02/18 11:32, Richard Earnshaw (lists) wrote:
> On 02/02/18 15:43, Kyrill Tkachov wrote:
>> Hi Richard,
>>
>> On 02/02/18 15:25, Richard Earnshaw (lists) wrote:
>>> On 02/02/18 15:10, Kyrill Tkachov wrote:
>>>> Hi all,
>>>>
>>>> In this [8 Regression] PR we ICE because we can't recognise the insn:
>>>> (insn 59 58 43 7 (set (reg:DI 124)
>>>>         (rotatert:DI (reg:DI 125 [ c ])
>>>>             (subreg:QI (and:SI (reg:SI 128)
>>>>                     (const_int 65535 [0xffff])) 0)))
>>> Aren't we heading off down the wrong path here?
>>>
>>>   (subreg:QI (and:SI (reg:SI 128) (const_int 65535 [0xffff])) 0))
>>>
>>> can be simplified to
>>>
>>>   (subreg:QI (reg:SI 128) 0)
>>>
>>> since the AND operation is redundant as we're only looking at the bottom
>>> 8 bits.
>> I have tried implementing such a transformation in combine [1]
>> but it was not clear that it was universally beneficial.
>> See the linked thread and PR 70119 for the discussion (the thread
>> continues into the next month).
> Is that really the same thing?  The example there was using a mask that
> was narrower than the subreg and thus not redundant.  The case here is
> where the mask is completely redundant because we are only looking at
> the bottom 8 bits of the result (which are not changed by the AND
> operation).

I think you're right, Richard: we can teach simplify-rtx to handle this.
That makes the fix a bit more involved; the attached patch implements it.
It adds a simplification rule to simplify_subreg that collapses a subreg
of an AND-immediate operation when the mask covers the mode mask of the
outer mode.  This allows us to simplify
(subreg:QI (and:SI (reg:SI) (const_int 0xff)) 0) into (subreg:QI (reg:SI) 0).

We cannot completely remove the creation of SUBREG-AND-immediate RTXes in
the aarch64 splitters because there are cases where we still need that
construct, in particular when the mask covers just the mode size, for
example (subreg:QI (and:SI (reg:SI) (const_int 31 [0x1F])) 0).  In that
case we cannot simplify the expression to (subreg:QI (reg:SI) 0) in the
midend, and we cannot rely on the aarch64-specific behaviour of the
integer LSR, LSL and ROR instructions to truncate the register shift
amount by the mode width, because these patterns may match the integer
SISD alternatives (USHR and friends) that don't perform an implicit
truncation.

This patch passes bootstrap and test on arm-none-linux-gnueabihf and
aarch64-none-linux-gnu.  There is a regression on aarch64 in the
gcc.target/aarch64/bfxil_1.c testcase that I address with a separate
patch, and an i386 regression that I also address separately.

Is this version preferable?  If so, I'll ping the midend maintainers for
the simplify-rtx.c change.

Thanks,
Kyrill

2018-02-08  Kyrylo Tkachov

	PR target/84164
	* simplify-rtx.c (simplify_subreg): Simplify subreg of a masking
	operation when the mask covers the outermode bits.
	* config/aarch64/aarch64.md (*aarch64_reg_3_neg_mask2): Use
	simplify_gen_subreg when creating a SUBREG.
	(*aarch64_reg_3_minus_mask): Likewise.
	(*aarch64__reg_di3_mask2): Use the const_int_operand predicate
	for operand 3.

2018-02-08  Kyrylo Tkachov

	PR target/84164
	* gcc.c-torture/compile/pr84164.c: New test.
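To spell out in C terms what the new rule does and does not allow (an
illustrative sketch only, not part of the patch; the function names below
are made up for the example):

/* Reading only the low 8 bits of a 32-bit value makes an AND whose mask
   covers those 8 bits redundant, which is the case the new
   simplify_subreg rule handles.  */
unsigned char
low_byte_full_mask (unsigned int x)
{
  /* Equivalent to (unsigned char) x: the 0xffff mask covers all of the
     low 8 bits we keep, so the AND is redundant.  */
  return (unsigned char) (x & 0xffff);
}

/* A mask narrower than the outer mode is not redundant and must be kept,
   which is why the aarch64 splitters still need the SUBREG-AND form.  */
unsigned char
low_byte_narrow_mask (unsigned int x)
{
  /* Not equivalent to (unsigned char) x: the mask of 31 clears bits 5-7
     of the 8-bit result.  */
  return (unsigned char) (x & 31);
}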
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 0d13c356d2a9b86c701c15b818e95df1a00abfc5..d9b4a405c9a1e5c09278b2329e5d73f12b0b963d 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -4278,8 +4278,10 @@ (define_insn_and_split "*aarch64_reg_3_neg_mask2"
   emit_insn (gen_negsi2 (tmp, operands[2]));
 
   rtx and_op = gen_rtx_AND (SImode, tmp, operands[3]);
-  rtx subreg_tmp = gen_rtx_SUBREG (GET_MODE (operands[4]), and_op,
-				   SUBREG_BYTE (operands[4]));
+  rtx subreg_tmp = simplify_gen_subreg (GET_MODE (operands[4]), and_op,
+					SImode, SUBREG_BYTE (operands[4]));
+
+  gcc_assert (subreg_tmp);
   emit_insn (gen_3 (operands[0], operands[1], subreg_tmp));
   DONE;
 }
@@ -4305,9 +4307,10 @@ (define_insn_and_split "*aarch64_reg_3_minus_mask"
   emit_insn (gen_negsi2 (tmp, operands[3]));
 
   rtx and_op = gen_rtx_AND (SImode, tmp, operands[4]);
-  rtx subreg_tmp = gen_rtx_SUBREG (GET_MODE (operands[5]), and_op,
-				   SUBREG_BYTE (operands[5]));
+  rtx subreg_tmp = simplify_gen_subreg (GET_MODE (operands[5]), and_op,
+					SImode, SUBREG_BYTE (operands[5]));
+  gcc_assert (subreg_tmp);
   emit_insn (gen_ashl3 (operands[0], operands[1], subreg_tmp));
   DONE;
 }
 
@@ -4318,9 +4321,9 @@ (define_insn "*aarch64__reg_di3_mask2"
 	(SHIFT:DI
 	  (match_operand:DI 1 "register_operand" "r")
 	  (match_operator 4 "subreg_lowpart_operator"
-	   [(and:SI (match_operand:SI 2 "register_operand" "r")
-		    (match_operand 3 "aarch64_shift_imm_di" "Usd"))])))]
-  "((~INTVAL (operands[3]) & (GET_MODE_BITSIZE (DImode)-1)) == 0)"
+	  [(and:SI (match_operand:SI 2 "register_operand" "r")
+		   (match_operand 3 "const_int_operand" "n"))])))]
+  "((~INTVAL (operands[3]) & (GET_MODE_BITSIZE (DImode) - 1)) == 0)"
 {
   rtx xop[3];
   xop[0] = operands[0];
diff --git a/gcc/simplify-rtx.c b/gcc/simplify-rtx.c
index 2e7aa5c12952ab1a9b49b5adaf23710327e577d3..af06d7502cebac03cefc689b2646874b8397e767 100644
--- a/gcc/simplify-rtx.c
+++ b/gcc/simplify-rtx.c
@@ -6474,6 +6474,18 @@ simplify_subreg (machine_mode outermode, rtx op,
       return NULL_RTX;
     }
 
+  /* Simplify (subreg:QI (and:SI (reg:SI) (const_int 0xffff)) 0)
+     into (subreg:QI (reg:SI) 0).  */
+  scalar_int_mode int_outermode, int_innermode;
+  if (!paradoxical_subreg_p (outermode, innermode)
+      && is_a <scalar_int_mode> (outermode, &int_outermode)
+      && is_a <scalar_int_mode> (innermode, &int_innermode)
+      && GET_CODE (op) == AND && CONST_INT_P (XEXP (op, 1))
+      && known_eq (subreg_lowpart_offset (outermode, innermode), byte)
+      && (~INTVAL (XEXP (op, 1)) & GET_MODE_MASK (int_outermode)) == 0
+      && validate_subreg (outermode, innermode, XEXP (op, 0), byte))
+    return gen_rtx_SUBREG (outermode, XEXP (op, 0), byte);
+
   /* A SUBREG resulting from a zero extension may fold to zero if
      it extracts higher bits that the ZERO_EXTEND's source bits.  */
   if (GET_CODE (op) == ZERO_EXTEND && SCALAR_INT_MODE_P (innermode))
@@ -6483,7 +6495,7 @@ simplify_subreg (machine_mode outermode, rtx op,
 	return CONST0_RTX (outermode);
     }
 
-  scalar_int_mode int_outermode, int_innermode;
+
   if (is_a <scalar_int_mode> (outermode, &int_outermode)
       && is_a <scalar_int_mode> (innermode, &int_innermode)
       && known_eq (byte, subreg_lowpart_offset (int_outermode, int_innermode)))
diff --git a/gcc/testsuite/gcc.c-torture/compile/pr84164.c b/gcc/testsuite/gcc.c-torture/compile/pr84164.c
new file mode 100644
index 0000000000000000000000000000000000000000..d663fe5d6649e495f3e956e6a3bc938236a4bf91
--- /dev/null
+++ b/gcc/testsuite/gcc.c-torture/compile/pr84164.c
@@ -0,0 +1,17 @@
+/* PR target/84164.  */
+
+int b, d;
+unsigned long c;
+
+static inline __attribute__ ((always_inline)) void
+bar (int e)
+{
+  e += __builtin_sub_overflow_p (b, d, c);
+  c = c << ((short) ~e & 3) | c >> (-((short) ~e & 3) & 63);
+}
+
+int
+foo (void)
+{
+  bar (~1);
+}
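
For reference, once the patch is applied the new compile test can be run on
its own through the usual c-torture harness from the gcc build directory,
for example (the exact invocation depends on the local build setup and
target):

  make check-gcc RUNTESTFLAGS="compile.exp=pr84164.c"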