From patchwork Thu Mar 19 06:48:05 2020
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Li, Pan2 via Gcc-patches"
X-Patchwork-Id: 1258004
To: gcc-patches@gcc.gnu.org
Subject: [PATCH 6/6] aarch64: Implement TImode comparisons
Date: Wed, 18 Mar 2020 23:48:05 -0700
Message-Id: <20200319064805.17739-7-richard.henderson@linaro.org>
In-Reply-To: <20200319064805.17739-1-richard.henderson@linaro.org>
References: <20200319064805.17739-1-richard.henderson@linaro.org>
X-Patchwork-Original-From: Richard Henderson via Gcc-patches
From: "Li, Pan2 via Gcc-patches"
Reply-To: Richard Henderson
Cc: richard.earnshaw@arm.com, marcus.shawcroft@arm.com

Use ccmp to perform all TImode comparisons branchless.

	* config/aarch64/aarch64.c (aarch64_gen_compare_reg): Expand all of
	the comparisons for TImode, not just NE.
	* config/aarch64/aarch64.md (cbranchti4, cstoreti4): New.
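
For reviewers, the two-half decomposition that the expansions rely on can be
sketched in plain C.  This is only an illustrative model, not code from the
patch: the ti_t type and the ti_* helpers are hypothetical names, and the
sketch assumes the low halves are compared as unsigned, which 128-bit
ordering requires.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of a TImode value as two 64-bit halves.  */
typedef struct { uint64_t lo; int64_t hi; } ti_t;

/* EQ: (x_lo == y_lo) && (x_hi == y_hi).  */
static bool ti_eq (ti_t x, ti_t y)
{
  return x.lo == y.lo && x.hi == y.hi;
}

/* LTU: the borrow out of the double-word subtraction x - y.  */
static bool ti_ltu (ti_t x, ti_t y)
{
  uint64_t xh = (uint64_t) x.hi, yh = (uint64_t) y.hi;
  return xh < yh || (xh == yh && x.lo < y.lo);
}

/* GE: (x_hi >= y_hi) && !(x_hi == y_hi && x_lo < y_lo).  */
static bool ti_ge (ti_t x, ti_t y)
{
  return x.hi >= y.hi && !(x.hi == y.hi && x.lo < y.lo);
}

/* LE: (x_hi <= y_hi) && !(x_hi == y_hi && x_lo > y_lo).  */
static bool ti_le (ti_t x, ti_t y)
{
  return x.hi <= y.hi && !(x.hi == y.hi && x.lo > y.lo);
}

/* Against zero, LT only needs the sign bit of the high half.  */
static bool ti_lt0 (ti_t x)
{
  return x.hi < 0;
}

/* Against zero, GT is (x_hi >= 0) && ((x_hi | x_lo) != 0).  */
static bool ti_gt0 (ti_t x)
{
  return x.hi >= 0 && ((uint64_t) x.hi | x.lo) != 0;
}

/* A few hand-picked values that straddle the 64-bit boundary.  */
void ti_selftest (void)
{
  ti_t zero = { 0, 0 };
  ti_t max_lo = { UINT64_MAX, 0 };   /* 2^64 - 1 */
  ti_t one64 = { 0, 1 };             /* 2^64 */
  ti_t minus1 = { UINT64_MAX, -1 };  /* -1 */

  assert (ti_eq (zero, zero) && !ti_eq (one64, max_lo));
  assert (ti_ltu (max_lo, one64) && ti_ltu (one64, minus1));
  assert (ti_ge (one64, minus1) && ti_ge (max_lo, zero));
  assert (ti_le (minus1, zero) && !ti_le (max_lo, zero));
  assert (ti_lt0 (minus1) && !ti_lt0 (one64));
  assert (ti_gt0 (max_lo) && !ti_gt0 (zero) && !ti_gt0 (minus1));
}
```

Each helper reduces to one compare plus one dependent compare on the other
half, which is the shape that a cmp followed by ccmp can evaluate without a
branch.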
---
 gcc/config/aarch64/aarch64.c  | 182 +++++++++++++++++++++++++++++++---
 gcc/config/aarch64/aarch64.md |  28 ++++++
 2 files changed, 196 insertions(+), 14 deletions(-)

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index d7899dad759..911dc1c91cd 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -2363,32 +2363,186 @@ rtx
 aarch64_gen_compare_reg (RTX_CODE code, rtx x, rtx y)
 {
   machine_mode cmp_mode = GET_MODE (x);
-  machine_mode cc_mode;
   rtx cc_reg;
 
   if (cmp_mode == TImode)
     {
-      gcc_assert (code == NE);
-
-      cc_mode = CCmode;
-      cc_reg = gen_rtx_REG (cc_mode, CC_REGNUM);
-
       rtx x_lo = operand_subword (x, 0, 0, TImode);
-      rtx y_lo = operand_subword (y, 0, 0, TImode);
-      emit_set_insn (cc_reg, gen_rtx_COMPARE (cc_mode, x_lo, y_lo));
-
       rtx x_hi = operand_subword (x, 1, 0, TImode);
-      rtx y_hi = operand_subword (y, 1, 0, TImode);
-      emit_insn (gen_ccmpccdi (cc_reg, x_hi, y_hi,
-                               gen_rtx_EQ (cc_mode, cc_reg, const0_rtx),
-                               GEN_INT (aarch64_nzcv_codes[AARCH64_NE])));
+      rtx y_lo, y_hi, tmp;
+
+      if (y == const0_rtx)
+        {
+          y_lo = y_hi = y;
+          switch (code)
+            {
+            case EQ:
+            case NE:
+              /* For equality, IOR the two halves together.  If this gets
+                 used for a branch, we expect this to fold to cbz/cbnz;
+                 otherwise it's no larger than cmp+ccmp below.  Beware of
+                 the compare-and-swap post-reload split and use cmp+ccmp.  */
+              if (!can_create_pseudo_p ())
+                break;
+              tmp = gen_reg_rtx (DImode);
+              emit_insn (gen_iordi3 (tmp, x_hi, x_lo));
+              emit_insn (gen_cmpdi (tmp, const0_rtx));
+              cc_reg = gen_rtx_REG (CCmode, CC_REGNUM);
+              goto done;
+
+            case LT:
+            case GE:
+              /* Check only the sign bit.  Choose to expose this detail,
+                 lest something later tries to use a COMPARE in a way
+                 that doesn't correspond.  This is "tst".  */
+              cc_reg = gen_rtx_REG (CC_NZmode, CC_REGNUM);
+              tmp = gen_rtx_AND (DImode, x_hi, GEN_INT (INT64_MIN));
+              tmp = gen_rtx_COMPARE (CC_NZmode, tmp, const0_rtx);
+              emit_set_insn (cc_reg, tmp);
+              code = (code == LT ? NE : EQ);
+              goto done;
+
+            case LE:
+            case GT:
+              /* For GT, (x_hi >= 0) && ((x_hi | x_lo) != 0),
+                 and of course the inverse for LE.  */
+              emit_insn (gen_cmpdi (x_hi, const0_rtx));
+
+              tmp = gen_reg_rtx (DImode);
+              emit_insn (gen_iordi3 (tmp, x_hi, x_lo));
+
+              /* Combine the two terms:
+                 (GE ? (compare tmp 0) : EQ),
+                 so that the whole term is true for NE, false for EQ.  */
+              cc_reg = gen_rtx_REG (CCmode, CC_REGNUM);
+              emit_insn (gen_ccmpccdi
+                         (cc_reg, tmp, const0_rtx,
+                          gen_rtx_GE (VOIDmode, cc_reg, const0_rtx),
+                          GEN_INT (aarch64_nzcv_codes[AARCH64_EQ])));
+
+              /* The result is entirely within the Z bit.  */
+              code = (code == GT ? NE : EQ);
+              goto done;
+
+            default:
+              break;
+            }
+        }
+      else
+        {
+          y_lo = operand_subword (y, 0, 0, TImode);
+          y_hi = operand_subword (y, 1, 0, TImode);
+        }
+
+      cc_reg = gen_rtx_REG (CCmode, CC_REGNUM);
+      switch (code)
+        {
+        case EQ:
+        case NE:
+          /* For EQ, (x_lo == y_lo) && (x_hi == y_hi).  */
+          emit_insn (gen_cmpdi (x_lo, y_lo));
+          emit_insn (gen_ccmpccdi (cc_reg, x_hi, y_hi,
+                                   gen_rtx_EQ (VOIDmode, cc_reg, const0_rtx),
+                                   GEN_INT (aarch64_nzcv_codes[AARCH64_NE])));
+          break;
+
+        case LEU:
+        case GTU:
+          std::swap (x_lo, y_lo);
+          std::swap (x_hi, y_hi);
+          code = swap_condition (code);
+          /* fall through */
+
+        case LTU:
+        case GEU:
+          /* For LTU, (x - y), as double-word arithmetic.  */
+          emit_insn (gen_cmpdi (x_lo, y_lo));
+          /* The ucmp*_carryinC pattern uses zero_extend, and so cannot
+             take the constant 0 we allow elsewhere.  Force to reg now
+             and allow combine to eliminate via simplification.  */
+          x_hi = force_reg (DImode, x_hi);
+          y_hi = force_reg (DImode, y_hi);
+          emit_insn (gen_ucmpdi3_carryinC (x_hi, y_hi));
+          /* The result is entirely within the C bit.  */
+          break;
+
+        case LE:
+        case GT:
+          /*
+           * For LE,
+           *    !((x_hi > y_hi) || (x_hi == y_hi && x_lo > y_lo))
+           * -> !(x_hi > y_hi) && !(x_hi == y_hi && x_lo > y_lo)
+           * -> (x_hi <= y_hi) && !(x_hi == y_hi && x_lo > y_lo)
+           */
+
+          /* Compute the first term (x_hi <= y_hi) and save it in tmp.  */
+          tmp = gen_reg_rtx (SImode);
+          emit_insn (gen_cmpdi (x_hi, y_hi));
+          emit_set_insn (tmp, gen_rtx_LE (SImode, cc_reg, const0_rtx));
+
+          /* Compute the second term (x_hi == y_hi && x_lo > y_lo):
+             (EQ ? (compare x_lo y_lo) : LE),
+             so that the whole term is true for GT, false for LE.  */
+          emit_insn (gen_ccmpccdi (cc_reg, x_lo, y_lo,
+                                   gen_rtx_EQ (VOIDmode, cc_reg, const0_rtx),
+                                   GEN_INT (aarch64_nzcv_codes[AARCH64_LE])));
+
+          /* Combine the two terms.  Since we want !(second_term):
+             (LE ? (compare tmp 0) : EQ),
+             so that the whole term is true for NE, false for EQ.  */
+          emit_insn (gen_ccmpccsi (cc_reg, tmp, const0_rtx,
+                                   gen_rtx_LE (VOIDmode, cc_reg, const0_rtx),
+                                   GEN_INT (aarch64_nzcv_codes[AARCH64_EQ])));
+
+          /* The result is entirely within the Z bit.  */
+          code = (code == LE ? NE : EQ);
+          break;
+
+        case LT:
+        case GE:
+          /*
+           * For GE,
+           *    !((x_hi < y_hi) || (x_hi == y_hi && x_lo < y_lo))
+           * -> !(x_hi < y_hi) && !(x_hi == y_hi && x_lo < y_lo)
+           * -> (x_hi >= y_hi) && !(x_hi == y_hi && x_lo < y_lo)
+           * and of course the inverse for LT.
+           */
+
+          /* Compute the first term (x_hi >= y_hi) and save it in tmp.  */
+          tmp = gen_reg_rtx (SImode);
+          emit_insn (gen_cmpdi (x_hi, y_hi));
+          emit_set_insn (tmp, gen_rtx_GE (SImode, cc_reg, const0_rtx));
+
+          /* Compute the second term (x_hi == y_hi && x_lo < y_lo):
+             (EQ ? (compare x_lo y_lo) : GE),
+             so that the whole term is true for LT, false for GE.  */
+          emit_insn (gen_ccmpccdi (cc_reg, x_lo, y_lo,
+                                   gen_rtx_EQ (VOIDmode, cc_reg, const0_rtx),
+                                   GEN_INT (aarch64_nzcv_codes[AARCH64_GE])));
+
+          /* Combine the two terms.  Since we want !(second_term):
+             (GE ? (compare tmp 0) : EQ),
+             so that the whole term is true for NE, false for EQ.  */
+          emit_insn (gen_ccmpccsi (cc_reg, tmp, const0_rtx,
+                                   gen_rtx_GE (VOIDmode, cc_reg, const0_rtx),
+                                   GEN_INT (aarch64_nzcv_codes[AARCH64_EQ])));
+
+          /* The result is entirely within the Z bit.  */
+          code = (code == GE ? NE : EQ);
+          break;
+
+        default:
+          gcc_unreachable ();
+        }
     }
   else
     {
-      cc_mode = SELECT_CC_MODE (code, x, y);
+      machine_mode cc_mode = SELECT_CC_MODE (code, x, y);
       cc_reg = gen_rtx_REG (cc_mode, CC_REGNUM);
       emit_set_insn (cc_reg, gen_rtx_COMPARE (cc_mode, x, y));
     }
+
+ done:
   return gen_rtx_fmt_ee (code, VOIDmode, cc_reg, const0_rtx);
 }
 
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index c789b641e7c..fb076b60e3c 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -471,6 +471,20 @@
   operands[2] = const0_rtx;
 })
 
+(define_expand "cbranchti4"
+  [(set (pc) (if_then_else (match_operator 0 "aarch64_comparison_operator"
+                            [(match_operand:TI 1 "register_operand")
+                             (match_operand:TI 2 "aarch64_reg_or_zero")])
+                           (label_ref (match_operand 3 "" ""))
+                           (pc)))]
+  ""
+{
+  operands[0] = aarch64_gen_compare_reg (GET_CODE (operands[0]), operands[1],
+                                         operands[2]);
+  operands[1] = XEXP (operands[0], 0);
+  operands[2] = const0_rtx;
+})
+
 (define_expand "cbranch<mode>4"
   [(set (pc) (if_then_else (match_operator 0 "aarch64_comparison_operator"
                             [(match_operand:GPF 1 "register_operand")
@@ -4144,6 +4158,20 @@
   operands[3] = const0_rtx;
 })
 
+(define_expand "cstoreti4"
+  [(set (match_operand:SI 0 "register_operand")
+        (match_operator:SI 1 "aarch64_comparison_operator"
+         [(match_operand:TI 2 "register_operand")
+          (match_operand:TI 3 "aarch64_reg_or_zero")]))]
+  ""
+{
+  operands[1] = aarch64_gen_compare_reg (GET_CODE (operands[1]), operands[2],
+                                         operands[3]);
+  PUT_MODE (operands[1], SImode);
+  operands[2] = XEXP (operands[1], 0);
+  operands[3] = const0_rtx;
+})
+
 (define_expand "cstorecc4"
   [(set (match_operand:SI 0 "register_operand")
         (match_operator 1 "aarch64_comparison_operator_mode"