From patchwork Thu Mar 10 17:54:55 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Patrick O'Neill X-Patchwork-Id: 1604053 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: bilbo.ozlabs.org; dkim=fail reason="signature verification failed" (2048-bit key; unprotected) header.d=rivosinc-com.20210112.gappssmtp.com header.i=@rivosinc-com.20210112.gappssmtp.com header.a=rsa-sha256 header.s=20210112 header.b=YFCvkbCr; dkim-atps=neutral Authentication-Results: ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=gcc.gnu.org (client-ip=8.43.85.97; helo=sourceware.org; envelope-from=gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org; receiver=) Received: from sourceware.org (ip-8-43-85-97.sourceware.org [8.43.85.97]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (No client certificate requested) by bilbo.ozlabs.org (Postfix) with ESMTPS id 4KDxZQ73kyz9sGD for ; Fri, 11 Mar 2022 04:56:14 +1100 (AEDT) Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 245A6385803B for ; Thu, 10 Mar 2022 17:56:13 +0000 (GMT) X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from mail-pg1-x533.google.com (mail-pg1-x533.google.com [IPv6:2607:f8b0:4864:20::533]) by sourceware.org (Postfix) with ESMTPS id E5EDC3858C39 for ; Thu, 10 Mar 2022 17:55:28 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org E5EDC3858C39 Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=rivosinc.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=rivosinc.com Received: by mail-pg1-x533.google.com with SMTP id t14so5357387pgr.3 for ; Thu, 10 Mar 2022 09:55:28 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=rivosinc-com.20210112.gappssmtp.com; s=20210112; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=3DMQrWqZjfUQseDipjPFrZTsxuYuSATFNKQdTCceewI=; b=YFCvkbCrJoohdcG8tcVm78yLyuiKLUhNg7rk8GiuN6Q00pUICJPDwlidL2ep9+H7Wd +GjXX3nRvGU44gzwG68xbiuXW4lbs6/KlpELNjCK2j39LFTHa60xlxAOtyVID2IcNupc mZU4CjPdoN3tuIjHHEUQsSpqTOMBoV532qave2DOg4yPMoUF0qX1XfIyN/TgPmlF3xGv K13RKURG5s3YpdvlkIhqVxjwJKpIMpC3nGuXZnOZZYzOhUMVBdSkQB32equco5nUepSX fbfa0M2jcZCRqytAyMETcgdkZxSI+VuqVu2OJ5ZJGjJab42/+PNNVQZHjqHT1DQPIdzB h32g== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=3DMQrWqZjfUQseDipjPFrZTsxuYuSATFNKQdTCceewI=; b=4zfLQtAwOqxswrK9HUvoNabtZKw+7Z8JBG556BW53OQ39El4nj5DmBbZJ+hnvoSYa7 J/pb4PWN3l+G9GTkZNTukvIzaQFbubcMcyNU7iZ9gxq8lx4OHAE1He4gc7AAHTSB2hSD dNw83fOnNlq/08HaJ3smAueZbp+fa+eVi94cDvhBQJlmA7FlczOnIaaL9d4WWAHNe0WX 2key0hdpYt35cxAq+7N33uTsJFIdADRxv+X6N79rFmjby8M1Cw9uJ/inrdab+lOIoGs1 zCATYq5vuL6ZjQmoYPCEE44bIvoc+cf9OlkaXqKRo+AGFtYEIhbi/3Xt0iaAPTF5U1Bq PkGA== X-Gm-Message-State: AOAM530kujHTj0+5W3tbuMsHjS6H4HYoohD/9bvlM3URNzsrKDJdenY+ aCYeWWQTSZ2Et/X2CPuSRQiTK6fu/AF6EyfA X-Google-Smtp-Source: ABdhPJxdmo7DD0guHyYYMh5q71NWuJUBdt4bSfShTBwZ5y1+WKUsG+CkXxLqQ5LrrK8cxPjqotJoGg== X-Received: by 2002:a62:8c44:0:b0:4c4:8072:e588 with SMTP id m65-20020a628c44000000b004c48072e588mr5957311pfd.11.1646934927393; Thu, 10 Mar 2022 09:55:27 -0800 (PST) Received: from patrick-ThinkPad-X1-Carbon-Gen-8.hq.rivosinc.com ([12.3.194.138]) by smtp.gmail.com with ESMTPSA id k14-20020aa7820e000000b004f7134a70cdsm7223435pfi.61.2022.03.10.09.55.25 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 10 Mar 2022 09:55:26 -0800 (PST) From: Patrick O'Neill To: gcc-patches@gcc.gnu.org Subject: [RFC v2] RISCV: Combine Pass Clobber Ops Date: Thu, 10 Mar 2022 09:54:55 -0800 Message-Id: <20220310175455.3178485-1-patrick@rivosinc.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: References: MIME-Version: 1.0 X-Spam-Status: No, score=-12.5 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, GIT_PATCH_0, RCVD_IN_DNSWL_NONE, SPF_HELO_NONE, SPF_PASS, TXREP, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.4 X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Patrick O'Neill Errors-To: gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org Sender: "Gcc-patches" RISC-V's C-extension describes 2-byte instructions with special constraints. One of those constraints is that one of the sources/dest registers are equal (op will clobber one of it's operands). This patch adds support for combining simple sequences: r1 = r2 + r3 (4 bytes) r2 DEAD r4 = r1 + r5 (4 bytes) r1 DEAD Combine pass now generates: r2 = r2 + r3 (2 bytes) r4 = r2 + r5 (4 bytes) r2 DEAD This change results in a ~150 Byte decrease in the linux kernel's compiled size (text: 5327254 Bytes -> 5327102 Bytes). I added this enforcement during the combine pass since it looks at the cost of certian expressions and can rely on the target to tell the pass that clobber-ops are cheaper than regular ops. The main thing holding this RFC back is the combine pass's behavior for sequences like this: b = a << 5; c = b + 2; Normally the combine pass modifies the RTL to be: c = (a << 5) + 2 before expanding it back to the original statement. With my changes, the RTL is prevented from being combined like that and instead results in RTL like this: c = 2 which is clearly wrong. I think that the next step would be to figure out where this re-expansion takes place and implement the same-register constraint there. However, I'm opening the RFC for any input: 1. Are there better ways to enforce same-register constraints during the combine pass other than declaring the source/dest register to be the same in RTL? Specifically, I'm concerned that this addition may restrict subsequent RTL pass optimizations. 2. Are there other concerns with implementing source-dest constraints within the combine pass? 3. Any other thoughts/input you have is welcome! 2022-03-10 Patrick O'Neill * combine.cc: Add register equality replacement. * riscv.cc (riscv_insn_cost): Add in order to tell combine pass that clobber-ops are cheaper. * riscv.h: Add c extension argument macros. Signed-off-by: Patrick O'Neill --- Changelog: v2: - Fix whitespace - Rearrange conditionals to break long lines --- gcc/combine.cc | 78 +++++++++++++++++++++++++++++++++++++++ gcc/config/riscv/riscv.cc | 42 +++++++++++++++++++++ gcc/config/riscv/riscv.h | 7 ++++ 3 files changed, 127 insertions(+) diff --git a/gcc/combine.cc b/gcc/combine.cc index 8f06ee0e54f..1b14e802166 100644 --- a/gcc/combine.cc +++ b/gcc/combine.cc @@ -3280,6 +3280,84 @@ try_combine (rtx_insn *i3, rtx_insn *i2, rtx_insn *i1, rtx_insn *i0, i2_is_used = n_occurrences; } +/* Attempt to replace ops with clobber-ops. + If the target implements clobber ops (set r1 (plus (r1)(r2))) as cheaper, + this pattern allows the combine pass to optimize with that in mind. + NOTE: This conditional is not triggered in most cases. Ideally we would be + able to move it above the if (i2_is_used == 0), but that breaks the + testsuite. + See RFC blurb for more info. */ + if (!i0 && !i1 && i2 && i3 && GET_CODE(PATTERN(i2)) == SET + && GET_CODE(SET_DEST(PATTERN(i2))) == REG + && GET_CODE(PATTERN(i3)) == SET + && (GET_RTX_CLASS(GET_CODE(SET_SRC(PATTERN(i3)))) == RTX_BIN_ARITH + || GET_RTX_CLASS(GET_CODE(SET_SRC(PATTERN(i3)))) == RTX_COMM_ARITH + || GET_RTX_CLASS(GET_CODE(SET_SRC(PATTERN(i3)))) == RTX_UNARY) + && GET_CODE(SET_DEST(PATTERN(i3))) == REG + && REGNO(XEXP(SET_SRC(PATTERN(i3)), 0)) != REGNO(SET_DEST(PATTERN(i3)))) { + + rtx_code insn_class = GET_CODE(SET_SRC (PATTERN(i2))); + + if (GET_RTX_CLASS(insn_class) == RTX_BIN_ARITH + || GET_RTX_CLASS(insn_class) == RTX_COMM_ARITH + || GET_RTX_CLASS(insn_class) == RTX_UNARY) { + + rtx operand1 = XEXP (SET_SRC (PATTERN (i2)), 0); + rtx prior_reg = SET_DEST (PATTERN (i2)); + + if (GET_CODE(operand1) == REG + && GET_MODE(operand1) == GET_MODE(prior_reg) + && find_reg_note (i2, REG_DEAD, operand1) + && regno_use_in (REGNO(prior_reg), PATTERN(i3)) + && find_reg_note (i3, REG_DEAD, SET_DEST (PATTERN(i2)))) { + + // Now we have a dead operand register, and we know where the dest dies. + + // Remove the note declaring the register as dead + rtx note = find_reg_note (i2, REG_DEAD, operand1); + remove_note (i2, note); + + // Overwrite i2 dest with operand1 + rtx i2_dest = copy_rtx(operand1); + SUBST (SET_DEST (PATTERN (i2)), i2_dest); + + // Replace the previous i2 dest register with operand1 in i3 + rtx op1_copy = copy_rtx(operand1); + rtx new_src = simplify_replace_rtx(SET_SRC (PATTERN (i3)), + prior_reg, op1_copy); + SUBST (SET_SRC (PATTERN (i3)), new_src); + + // Move the dest dead note to the new register + note = find_reg_note (i3, REG_DEAD, prior_reg); + if (note) { + remove_note (i3, note); + //add_reg_note (i3, REG_DEAD, op1_copy); + } + + newi2pat = PATTERN (i2); + newpat = PATTERN (i3); + + subst_insn = i3; + subst_low_luid = DF_INSN_LUID (i2); + added_sets_2 = added_sets_1 = added_sets_0 = 0; + i2dest = i2_dest; + i2dest_killed = dead_or_set_p (i2, i2dest); + + changed_i3_dest = false; + + i2_code_number = recog_for_combine (&newi2pat, i2, &new_i2_notes); + + if (i2_code_number >= 0) + insn_code_number = recog_for_combine (&newpat, i3, &new_i3_notes); + + if (insn_code_number >= 0) + swap_i2i3 = true; + + goto validate_replacement; + } + } + } + /* If we already got a failure, don't try to do more. Otherwise, try to substitute I1 if we have it. */ diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc index 7da9d377120..9f0bd59ac41 100644 --- a/gcc/config/riscv/riscv.cc +++ b/gcc/config/riscv/riscv.cc @@ -2160,6 +2160,46 @@ riscv_rtx_costs (rtx x, machine_mode mode, int outer_code, int opno ATTRIBUTE_UN } } +/* Helper for INSN_COST. + + Copied from arc.cc + + Per Segher Boessenkool: rtx_costs computes the cost for any rtx (an + insn, a set, a set source, any random piece of one). set_src_cost, + set_rtx_cost, etc. are helper functions that use that. + + Those functions do not work for parallels. Also, costs are not + additive like this simplified model assumes. Also, more complex + backends tend to miss many cases in their rtx_costs function. + + Many passes that want costs want to know the cost of a full insn. Like + combine. That's why I created insn_cost: it solves all of the above + problems. */ + +static int +riscv_insn_cost (rtx_insn *insn, bool speed) +{ + rtx pat = PATTERN (insn); + + if (TARGET_RVC && !speed) { + if (GET_CODE(pat) == SET && GET_CODE(SET_DEST(pat)) == REG) { + rtx src = SET_SRC(pat); + rtx dest = SET_DEST(pat); + if (GET_CODE(src) == PLUS && GET_CODE(XEXP(src, 0)) == REG && REGNO(XEXP(src, 0)) == REGNO(dest)) { + if (GET_CODE(XEXP(src, 1)) == REG) + return 2; + else if (GET_CODE(XEXP(src, 1)) == CONST_INT && CMPRESD_OPERAND(INTVAL(XEXP(src, 1)))) + return 2; + } else if (GET_CODE(src) == ASHIFT && GET_CODE(XEXP(src, 0)) == REG && REGNO(dest) == REGNO(XEXP(src, 0))) { + if (GET_CODE(XEXP(src, 1)) == CONST_INT && UNSIGNED_CMPRESD_OPERAND(INTVAL(XEXP(src, 1)))) + return 2; + } + } + } + + return pattern_cost (pat, speed); +} + /* Implement TARGET_ADDRESS_COST. */ static int @@ -5618,6 +5658,8 @@ riscv_asan_shadow_offset (void) #define TARGET_RTX_COSTS riscv_rtx_costs #undef TARGET_ADDRESS_COST #define TARGET_ADDRESS_COST riscv_address_cost +#undef TARGET_INSN_COST +#define TARGET_INSN_COST riscv_insn_cost #undef TARGET_ASM_FILE_START #define TARGET_ASM_FILE_START riscv_file_start diff --git a/gcc/config/riscv/riscv.h b/gcc/config/riscv/riscv.h index 8a4d2cf7f85..be5e77b1394 100644 --- a/gcc/config/riscv/riscv.h +++ b/gcc/config/riscv/riscv.h @@ -522,6 +522,13 @@ enum reg_class #define SMALL_OPERAND(VALUE) \ ((unsigned HOST_WIDE_INT) (VALUE) + IMM_REACH/2 < IMM_REACH) +#define CMPRESD_OPERAND(VALUE) \ + (VALUE < 32 && VALUE >= -32) + +/* True if VALUE is an unsigned 5-bit number. */ +#define UNSIGNED_CMPRESD_OPERAND(VALUE) \ + (VALUE < 64 && VALUE >= 0) + /* True if VALUE can be loaded into a register using LUI. */ #define LUI_OPERAND(VALUE) \