From patchwork Fri Oct 11 13:25:35 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Jeffrey Law
X-Patchwork-Id: 1996128
Message-ID: <25c5da43-6da0-41bb-a3e8-9c50f73ad90f@ventanamicro.com>
Date: Fri, 11 Oct 2024 07:25:35 -0600
From: Jeff Law
To: "gcc-patches@gcc.gnu.org"
Subject: [to-be-committed][RISC-V] Slightly improve broadcasting small constants into vectors
List-Id: Gcc-patches mailing list

I probably spent way more time on this than it's worth...

I was looking at the code we generate for vector SAD and noticed that we
were being a bit silly.
Specifically:

        li      a4,0            # 272   [c=4 l=4]  *movsi_internal/1

Followed shortly by:

        vmv.s.x v3,a4           # 261   [c=4 l=4]  *pred_broadcastrvvm1si/6

And no other uses of a4.  We could have used x0 trivially.

First we adjust the expander so that it doesn't force the constant into a
register.  In the matching pattern we change the appropriate source
constraints from "r" to "rJ" and the output template is changed to use %z
for the operand.  The net is we drop the li completely and emit
vmv.s.x v3,x0.

But wait, there's more.  If we're broadcasting a constant in the range
[-16..15] into a vector, we currently load the constant into a register and
use vmv.v.x.  We can instead use vmv.v.i, which avoids loading the constant
into a GPR.  For that case we again avoid forcing the constant into a
register in the expander and adjust the output template to emit vmv.v.x or
vmv.v.i based on whether the appropriate operand is a general purpose
register or a constant.  So again, we'll drop a load immediate into a scalar
for this case.

Whether we should use vmv.v.i or vmv.s.x for loading [-16..15] into the 0th
element is probably uarch dependent.  The tradeoff is loading the GPR vs the
broadcast in the vector unit.  I didn't bother with this case.

Tested in my tester (which tests rv64gcv as a default codegen option).  Will
wait for the pre-commit tester to render a verdict.

Jeff

	* config/riscv/constraints.md (P): New constraint for constant
	integers -16..15.
	* config/riscv/vector.md (pred_broadcast expander): Do not force
	constants into registers quite so aggressively.
	(pred_broadcast insn & splitter): Adjust constraints to allow
	constants in a few cases and adjust output appropriately.

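For readers who don't live in the RISC-V backend, the selection described
above can be sketched in plain C.  This is only an illustration under stated
assumptions: the struct and the function names (scalar_operand, fits_simm5,
emit_broadcast_all, emit_move_elt0) are hypothetical and are not GCC code;
GCC makes these decisions through RTL constraints and output templates.  The
range test mirrors the new "P" constraint, and the element-0 case writes a
constant zero as x0.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical operand model for illustration only; GCC works on RTL,
   not on a struct like this.  */
struct scalar_operand
{
  bool is_const;    /* true: 'value' is an immediate; false: 'reg' is a GPR.  */
  long value;
  const char *reg;
};

/* Same range as the new "P" constraint: IN_RANGE (ival, -16, 15).  */
static bool
fits_simm5 (long ival)
{
  return ival >= -16 && ival <= 15;
}

/* Broadcast to all elements: vmv.v.i for a small constant, vmv.v.x for a
   GPR.  Returns the number of characters written, or -1 when the constant
   is out of range and would first need a load immediate into a GPR.  */
static int
emit_broadcast_all (char *buf, size_t len, const char *vd,
                    const struct scalar_operand *op)
{
  assert (buf != NULL && vd != NULL && op != NULL);
  if (op->is_const)
    {
      if (!fits_simm5 (op->value))
        return -1;
      return snprintf (buf, len, "vmv.v.i\t%s,%ld", vd, op->value);
    }
  return snprintf (buf, len, "vmv.v.x\t%s,%s", vd, op->reg);
}

/* Element-0 move: a constant zero is written as x0, so no li is needed.
   Any other constant must already be in a GPR, so -1 is returned.  */
static int
emit_move_elt0 (char *buf, size_t len, const char *vd,
                const struct scalar_operand *op)
{
  assert (buf != NULL && vd != NULL && op != NULL);
  if (op->is_const)
    {
      if (op->value != 0)
        return -1;
      return snprintf (buf, len, "vmv.s.x\t%s,x0", vd);
    }
  return snprintf (buf, len, "vmv.s.x\t%s,%s", vd, op->reg);
}
```

With this sketch, a constant 0 passed to emit_move_elt0 needs no scalar
register at all, and constants in [-16..15] passed to emit_broadcast_all
select vmv.v.i; everything else falls back to the register form.
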
diff --git a/gcc/config/riscv/constraints.md b/gcc/config/riscv/constraints.md
index 45f8e9602d2..9638942b733 100644
--- a/gcc/config/riscv/constraints.md
+++ b/gcc/config/riscv/constraints.md
@@ -70,6 +70,11 @@ (define_constraint "c08"
   (and (match_code "const_int")
        (match_test "ival == 8")))
 
+(define_constraint "P"
+  "A 5-bit signed immediate for vmv.v.i."
+  (and (match_code "const_int")
+       (match_test "IN_RANGE (ival, -16, 15)")))
+
 (define_constraint "K"
   "A 5-bit unsigned immediate for CSR access instructions."
   (and (match_code "const_int")
diff --git a/gcc/config/riscv/vector.md b/gcc/config/riscv/vector.md
index 7c8780dc7c7..b3038087aa5 100644
--- a/gcc/config/riscv/vector.md
+++ b/gcc/config/riscv/vector.md
@@ -2118,6 +2118,16 @@ (define_expand "@pred_broadcast"
           emit_move_insn (tmp, gen_int_mode (value, Pmode));
           operands[3] = gen_rtx_SIGN_EXTEND (mode, tmp);
         }
+      /* Never load (const_int 0) into a register, that's silly.  */
+      else if (operands[3] == CONST0_RTX (mode))
+        ;
+      /* If we're broadcasting [-16..15] across more than just
+         element 0, then we can use vmv.v.i directly, thus avoiding
+         the load of the constant into a GPR.  */
+      else if (CONST_INT_P (operands[3])
+               && IN_RANGE (INTVAL (operands[3]), -16, 15)
+               && !satisfies_constraint_Wb1 (operands[1]))
+        ;
       else
         operands[3] = force_reg (mode, operands[3]);
 })
@@ -2134,18 +2144,18 @@ (define_insn_and_split "*pred_broadcast"
              (reg:SI VL_REGNUM)
              (reg:SI VTYPE_REGNUM)] UNSPEC_VPREDICATE)
           (vec_duplicate:V_VLSI
-            (match_operand: 3 "direct_broadcast_operand" " r, r,Wdm,Wdm,Wdm,Wdm, r, r"))
-          (match_operand:V_VLSI 2 "vector_merge_operand" "vu, 0, vu, 0, vu, 0, vu, 0")))]
+            (match_operand: 3 "direct_broadcast_operand" "rP,rP,Wdm,Wdm,Wdm,Wdm, rJ, rJ"))
+          (match_operand:V_VLSI 2 "vector_merge_operand" "vu, 0, vu, 0, vu, 0, vu, 0")))]
   "TARGET_VECTOR"
   "@
-   vmv.v.x\t%0,%3
-   vmv.v.x\t%0,%3
+   vmv.v.%o3\t%0,%3
+   vmv.v.%o3\t%0,%3
    vlse.v\t%0,%3,zero,%1.t
    vlse.v\t%0,%3,zero,%1.t
    vlse.v\t%0,%3,zero
    vlse.v\t%0,%3,zero
-   vmv.s.x\t%0,%3
-   vmv.s.x\t%0,%3"
+   vmv.s.x\t%0,%z3
+   vmv.s.x\t%0,%z3"
   "(register_operand (operands[3], mode)
   || CONST_POLY_INT_P (operands[3]))
   && GET_MODE_BITSIZE (mode) > GET_MODE_BITSIZE (Pmode)"