From patchwork Thu May 23 18:45:58 2024
X-Patchwork-Submitter: Jeff Law
X-Patchwork-Id: 1938534
Message-ID: <4c22a73c-4d09-4a17-bbe4-d31e876b1f69@ventanamicro.com>
Date: Thu, 23 May 2024 12:45:58 -0600
From: Jeff Law
To: "gcc-patches@gcc.gnu.org"
Subject: [to-be-committed] [RISC-V] Use bclri in constant synthesis

So this is conceptually similar to how we handled direct generation of
bseti for constant synthesis, but this time for bclr.  In the bclr case,
we already have an expander for AND.
So we just needed to adjust the predicate to accept another class of
constant operands (those with a single bit clear).

With that in place, constant synthesis is adjusted so that it counts the
number of bits clear in the high 33 bits of a 64-bit word.  If that
number is small relative to the current best cost, then we try to
generate the constant with a lui-based sequence for the low half, which
implicitly sets the upper 32 bits as well.  Then we bclri one or more of
those upper 33 bits.

So as an example, this code goes from 4 instructions down to 3:

> unsigned long foo_0xfffffffbfffff7ff(void) { return 0xfffffffbfffff7ffUL; }

Note the use of 33 bits above.  That's meant to capture cases like this:

> unsigned long foo_0xfffdffff7ffff7ff(void) { return 0xfffdffff7ffff7ffUL; }

We can use lui+addi+bclri+bclri to synthesize that in 4 instructions
instead of 5.

I'm including a handful of cases covering the two basic ideas above that
were found by the testing code.

And, no, we're not done yet.  I see at least one more notable idiom
missing before exploring zbkb's potential to improve things.

Tested in my tester and waiting on the Rivos CI system before moving
forward.

jeff

gcc/
	* config/riscv/predicates.md (arith_operand_or_mode_mask): Renamed
	to ...
	(arith_or_mode_mask_or_zbs_operand): New predicate.
	* config/riscv/riscv.md (and<mode>3): Update predicate for operand 2.
	* config/riscv/riscv.cc (riscv_build_integer_1): Use bclri to clear
	bits, particularly bits 31..63 when profitable to do so.

gcc/testsuite/

	* gcc.target/riscv/synthesis-6.c: New test.
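To see why the "turn the upper bits on, then knock them back off" trick
pays, here is a small, purely illustrative Python sketch of the
heuristic (this is not the GCC implementation; the a0 register and the
textual instruction format are invented for the example, and the
lui/addi split is simplified):

```python
# Illustrative model of the new bclri synthesis path: force bits 31..63
# on, materialize that value with lui(+addi), then clear each unwanted
# upper bit with one bclri apiece.
UPPER33 = 0xFFFF_FFFF_8000_0000   # bits 31..63, the "high 33 bits"

def bclri_synthesis(value):
    """Return a plausible instruction sequence for VALUE (64-bit)."""
    value &= 0xFFFF_FFFF_FFFF_FFFF
    nval = value | UPPER33                 # upper 33 bits all set
    # lui materializes bits 12..31 (sign-extended on RV64); addi fixes
    # bits 0..11.  Simplified: the lui carry fixup for lo12 >= 0x800 is
    # folded into a signed addi immediate only.
    seq = [f"lui  a0, {((nval >> 12) & 0xFFFFF):#x}"]
    lo12 = nval & 0xFFF
    if lo12:
        imm = lo12 - 0x1000 if lo12 >= 0x800 else lo12
        seq.append(f"addi a0, a0, {imm:#x}")
    # One bclri per bit that must end up clear in bits 31..63.
    clear = (~value) & UPPER33
    while clear:
        bit = (clear & -clear).bit_length() - 1
        seq.append(f"bclri a0, a0, {bit}")
        clear &= clear - 1
    return seq

# 0xfffffffbfffff7ff: only bit 34 is clear up high, so the sequence is
# lui+addi+bclri (3 insns).  0xfffdffff7ffff7ff has bits 31 and 49
# clear, giving lui+addi+bclri+bclri (4 insns) -- the case that
# motivates looking at 33 bits rather than 32.
```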
diff --git a/gcc/config/riscv/predicates.md b/gcc/config/riscv/predicates.md
index 8948fbfc363..c1c693c7617 100644
--- a/gcc/config/riscv/predicates.md
+++ b/gcc/config/riscv/predicates.md
@@ -27,12 +27,6 @@ (define_predicate "arith_operand"
   (ior (match_operand 0 "const_arith_operand")
        (match_operand 0 "register_operand")))
 
-(define_predicate "arith_operand_or_mode_mask"
-  (ior (match_operand 0 "arith_operand")
-       (and (match_code "const_int")
-            (match_test "UINTVAL (op) == GET_MODE_MASK (HImode)
-                         || UINTVAL (op) == GET_MODE_MASK (SImode)"))))
-
 (define_predicate "lui_operand"
   (and (match_code "const_int")
        (match_test "LUI_OPERAND (INTVAL (op))")))
@@ -398,6 +392,14 @@ (define_predicate "not_single_bit_mask_operand"
   (and (match_code "const_int")
        (match_test "SINGLE_BIT_MASK_OPERAND (~UINTVAL (op))")))
 
+(define_predicate "arith_or_mode_mask_or_zbs_operand"
+  (ior (match_operand 0 "arith_operand")
+       (and (match_test "TARGET_ZBS")
+            (match_operand 0 "not_single_bit_mask_operand"))
+       (and (match_code "const_int")
+            (match_test "UINTVAL (op) == GET_MODE_MASK (HImode)
+                         || UINTVAL (op) == GET_MODE_MASK (SImode)"))))
+
 (define_predicate "const_si_mask_operand"
   (and (match_code "const_int")
        (match_test "(INTVAL (op) & (GET_MODE_BITSIZE (SImode) - 1))
diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 85df5b7ab49..3b32b515fac 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -893,6 +893,40 @@ riscv_build_integer_1 (struct riscv_integer_op codes[RISCV_MAX_INTEGER_OPS],
       codes[1].use_uw = false;
       cost = 2;
     }
+
+  /* If LUI/ADDI are going to set bits 32..63 and we need a small
+     number of them cleared, we might be able to use bclri profitably.
+
+     Note we may allow clearing of bit 31 using bclri.  There's a class
+     of constants with that bit clear where this helps.  */
+  else if (TARGET_64BIT
+           && TARGET_ZBS
+           && (32 - popcount_hwi (value & HOST_WIDE_INT_C (0xffffffff80000000))) + 1 < cost)
+    {
+      /* Turn on all those upper bits and synthesize the result.  */
+      HOST_WIDE_INT nval = value | HOST_WIDE_INT_C (0xffffffff80000000);
+      alt_cost = riscv_build_integer_1 (alt_codes, nval, mode);
+
+      /* Now iterate over the bits we want to clear until the cost is
+         too high or we're done.  */
+      nval = value ^ HOST_WIDE_INT_C (-1);
+      nval &= HOST_WIDE_INT_C (~0x7fffffff);
+      while (nval && alt_cost < cost)
+        {
+          HOST_WIDE_INT bit = ctz_hwi (nval);
+          alt_codes[alt_cost].code = AND;
+          alt_codes[alt_cost].value = ~(1UL << bit);
+          alt_codes[alt_cost].use_uw = false;
+          alt_cost++;
+          nval &= ~(1UL << bit);
+        }
+
+      if (alt_cost <= cost)
+        {
+          memcpy (codes, alt_codes, sizeof (alt_codes));
+          cost = alt_cost;
+        }
+    }
 }
 
 if (cost > 2 && TARGET_64BIT && TARGET_ZBA)
diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
index 78c16adee98..1bef1d67efa 100644
--- a/gcc/config/riscv/riscv.md
+++ b/gcc/config/riscv/riscv.md
@@ -1648,7 +1648,7 @@ (define_insn "smax<mode>3"
 (define_expand "and<mode>3"
   [(set (match_operand:X 0 "register_operand")
         (and:X (match_operand:X 1 "register_operand")
-               (match_operand:X 2 "arith_operand_or_mode_mask")))]
+               (match_operand:X 2 "arith_or_mode_mask_or_zbs_operand")))]
   ""
 {
   /* If the second operand is a mode mask, emit an extension
diff --git a/gcc/testsuite/gcc.target/riscv/synthesis-6.c b/gcc/testsuite/gcc.target/riscv/synthesis-6.c
new file mode 100644
index 00000000000..65cf748f4b5
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/synthesis-6.c
@@ -0,0 +1,95 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target rv64 } */
+/* We aggressively skip as we really just need to test the basic synthesis
+   which shouldn't vary based on the optimization level.  -O1 seems to work
+   and eliminates the usual sources of extraneous dead code that would throw
+   off the counts.  */
+/* { dg-skip-if "" { *-*-* } { "-O0" "-Og" "-O2" "-O3" "-Os" "-Oz" "-flto" } } */
+/* { dg-options "-march=rv64gc_zba_zbb_zbs" } */
+
+/* Rather than test for a specific synthesis of all these constants or
+   having thousands of tests each testing one variant, we just test the
+   total number of instructions.
+
+   This isn't expected to change much and any change is worthy of a look.  */
+/* { dg-final { scan-assembler-times "\\t(add|addi|bseti|bclri|li|ret|sh1add|sh2add|sh3add|slli)" 156 } } */
+
+
+unsigned long foo_0xfffffffbfffff7ff(void) { return 0xfffffffbfffff7ffUL; }
+
+unsigned long foo_0xfffffff7fffff7ff(void) { return 0xfffffff7fffff7ffUL; }
+
+unsigned long foo_0xffffffeffffff7ff(void) { return 0xffffffeffffff7ffUL; }
+
+unsigned long foo_0xffffffdffffff7ff(void) { return 0xffffffdffffff7ffUL; }
+
+unsigned long foo_0xffffffbffffff7ff(void) { return 0xffffffbffffff7ffUL; }
+
+unsigned long foo_0xffffff7ffffff7ff(void) { return 0xffffff7ffffff7ffUL; }
+
+unsigned long foo_0xfffffefffffff7ff(void) { return 0xfffffefffffff7ffUL; }
+
+unsigned long foo_0xfffffdfffffff7ff(void) { return 0xfffffdfffffff7ffUL; }
+
+unsigned long foo_0xfffffbfffffff7ff(void) { return 0xfffffbfffffff7ffUL; }
+
+unsigned long foo_0xfffff7fffffff7ff(void) { return 0xfffff7fffffff7ffUL; }
+
+unsigned long foo_0xffffeffffffff7ff(void) { return 0xffffeffffffff7ffUL; }
+
+unsigned long foo_0xffffdffffffff7ff(void) { return 0xffffdffffffff7ffUL; }
+
+unsigned long foo_0xffffbffffffff7ff(void) { return 0xffffbffffffff7ffUL; }
+
+unsigned long foo_0xffff7ffffffff7ff(void) { return 0xffff7ffffffff7ffUL; }
+
+unsigned long foo_0xfffefffffffff7ff(void) { return 0xfffefffffffff7ffUL; }
+
+unsigned long foo_0xfffdfffffffff7ff(void) { return 0xfffdfffffffff7ffUL; }
+
+unsigned long foo_0xfffbfffffffff7ff(void) { return 0xfffbfffffffff7ffUL; }
+
+unsigned long foo_0xfff7fffffffff7ff(void) { return 0xfff7fffffffff7ffUL; }
+
+unsigned long foo_0xffeffffffffff7ff(void) { return 0xffeffffffffff7ffUL; }
+
+unsigned long foo_0xffdffffffffff7ff(void) { return 0xffdffffffffff7ffUL; }
+
+unsigned long foo_0xffbffffffffff7ff(void) { return 0xffbffffffffff7ffUL; }
+
+unsigned long foo_0xff7ffffffffff7ff(void) { return 0xff7ffffffffff7ffUL; }
+
+unsigned long foo_0xfefffffffffff7ff(void) { return 0xfefffffffffff7ffUL; }
+
+unsigned long foo_0xfdfffffffffff7ff(void) { return 0xfdfffffffffff7ffUL; }
+
+unsigned long foo_0xfbfffffffffff7ff(void) { return 0xfbfffffffffff7ffUL; }
+
+unsigned long foo_0xf7fffffffffff7ff(void) { return 0xf7fffffffffff7ffUL; }
+
+unsigned long foo_0xeffffffffffff7ff(void) { return 0xeffffffffffff7ffUL; }
+
+unsigned long foo_0xdffffffffffff7ff(void) { return 0xdffffffffffff7ffUL; }
+
+unsigned long foo_0xbffffffffffff7ff(void) { return 0xbffffffffffff7ffUL; }
+
+unsigned long foo_0xffffffff7fffd7ff(void) { return 0xffffffff7fffd7ffUL; }
+
+unsigned long foo_0xffffffff7ffdf7ff(void) { return 0xffffffff7ffdf7ffUL; }
+
+unsigned long foo_0xffffffff7fdff7ff(void) { return 0xffffffff7fdff7ffUL; }
+
+unsigned long foo_0xffffffff7dfff7ff(void) { return 0xffffffff7dfff7ffUL; }
+
+unsigned long foo_0xffffffff5ffff7ff(void) { return 0xffffffff5ffff7ffUL; }
+
+unsigned long foo_0xfffff7ff7ffff7ff(void) { return 0xfffff7ff7ffff7ffUL; }
+
+unsigned long foo_0xffffdfff7ffff7ff(void) { return 0xffffdfff7ffff7ffUL; }
+
+unsigned long foo_0xffff7fff7ffff7ff(void) { return 0xffff7fff7ffff7ffUL; }
+
+unsigned long foo_0xfffdffff7ffff7ff(void) { return 0xfffdffff7ffff7ffUL; }