From patchwork Fri May 24 04:44:05 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Jeff Law
X-Patchwork-Id: 1938778
Message-ID: <023c8378-f024-494d-9efc-c9c9399d87a4@gmail.com>
Date: Thu, 23 May 2024 22:44:05 -0600
From: Jeff Law
Subject: [to-be-committed][v2][RISC-V] Use bclri in constant synthesis
To: "gcc-patches@gcc.gnu.org"
List-Id: Gcc-patches mailing list

Testing with Zbs enabled by default showed a minor logic error. After the
loop clearing things with bclri, we can only use the sequence if we were
able to clear all the necessary bits. If any bits are still on, then the
bclr sequence turned out to not be profitable.

---

So this is conceptually similar to how we handled direct generation of
bseti for constant synthesis, but this time for bclr.

In the bclr case, we already have an expander for AND. So we just needed
to adjust the predicate to accept another class of constant operands
(those with a single bit clear).

With that in place, constant synthesis is adjusted so that it counts the
number of bits clear in the high 33 bits of a 64-bit word. If that number
is small relative to the current best cost, then we try to generate the
constant with a lui-based sequence for the low half, which implicitly sets
the upper 32 bits as well (through sign extension). Then we bclri one or
more of those upper 33 bits.

So as an example, this code goes from 4 instructions down to 3:

> unsigned long foo_0xfffffffbfffff7ff(void) { return 0xfffffffbfffff7ffUL; }

Note the use of 33 bits above. That's meant to capture cases like this,
where bit 31 is clear as well:

> unsigned long foo_0xfffdffff7ffff7ff(void) { return 0xfffdffff7ffff7ffUL; }

We can use lui+addi+bclri+bclri to synthesize that in 4 instructions
instead of 5.

I'm including a handful of cases covering the two basic ideas above that
were found by the testing code.

And, no, we're not done yet. I see at least one more notable idiom missing
before exploring zbkb's potential to improve things.

Tested in my tester and waiting on Rivos CI system before moving forward.

gcc/
	* config/riscv/predicates.md (arith_operand_or_mode_mask): Renamed to..
	(arith_or_mode_mask_or_zbs_operand): New predicate.
	* config/riscv/riscv.md (and<mode>3): Update predicate for operand 2.
	* config/riscv/riscv.cc (riscv_build_integer_1): Use bclri to clear bits,
	particularly bits 31..63 when profitable to do so.

gcc/testsuite/
	* gcc.target/riscv/synthesis-6.c: New test.

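For anyone who wants to check the arithmetic on the two examples, here is
a small standalone sketch. It is not part of the patch. The names UPPER33,
bclri_count, bclri_base and bclri_replay are made up for illustration, and
the sketch does not model the cost of the lui/addi prefix or the cost
comparison that riscv_build_integer_1 performs; it only shows which bits
get forced on and then cleared.

```c
#include <assert.h>
#include <stdint.h>

/* Bits 31..63: the 33 bits considered for clearing with bclri.  */
#define UPPER33 UINT64_C (0xffffffff80000000)

/* Hypothetical helper: how many bits in 31..63 are clear in VALUE,
   i.e. how many bclri instructions the sequence would need.  */
int
bclri_count (uint64_t value)
{
  uint64_t todo = ~value & UPPER33;
  int count = 0;

  while (todo)
    {
      todo &= todo - 1;		/* Drop the lowest set bit.  */
      count++;
    }
  return count;
}

/* Hypothetical helper: the constant the prefix sequence has to build,
   with bits 31..63 all turned on.  */
uint64_t
bclri_base (uint64_t value)
{
  return value | UPPER33;
}

/* Replay the bclri steps on the base constant and return the result.
   Each cleared bit stands for one bclri instruction.  */
uint64_t
bclri_replay (uint64_t value)
{
  uint64_t result = bclri_base (value);
  uint64_t todo = ~value & UPPER33;

  for (int bit = 31; bit < 64; bit++)
    if (todo & (UINT64_C (1) << bit))
      result &= ~(UINT64_C (1) << bit);

  /* Clearing exactly the bits we forced on must give VALUE back.  */
  assert (result == value);
  return result;
}
```

For 0xfffffffbfffff7ff the sketch reports one bit to clear on top of a base
of 0xfffffffffffff7ff, which lui+addi can build, giving 3 instructions in
total. For 0xfffdffff7ffff7ff it reports two bits (31 and 49) on the same
base, which lines up with lui+addi+bclri+bclri.
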
diff --git a/gcc/config/riscv/predicates.md b/gcc/config/riscv/predicates.md
index 8948fbfc363..c1c693c7617 100644
--- a/gcc/config/riscv/predicates.md
+++ b/gcc/config/riscv/predicates.md
@@ -27,12 +27,6 @@ (define_predicate "arith_operand"
   (ior (match_operand 0 "const_arith_operand")
        (match_operand 0 "register_operand")))
 
-(define_predicate "arith_operand_or_mode_mask"
-  (ior (match_operand 0 "arith_operand")
-       (and (match_code "const_int")
-            (match_test "UINTVAL (op) == GET_MODE_MASK (HImode)
-                         || UINTVAL (op) == GET_MODE_MASK (SImode)"))))
-
 (define_predicate "lui_operand"
   (and (match_code "const_int")
        (match_test "LUI_OPERAND (INTVAL (op))")))
@@ -398,6 +392,14 @@ (define_predicate "not_single_bit_mask_operand"
   (and (match_code "const_int")
        (match_test "SINGLE_BIT_MASK_OPERAND (~UINTVAL (op))")))
 
+(define_predicate "arith_or_mode_mask_or_zbs_operand"
+  (ior (match_operand 0 "arith_operand")
+       (and (match_test "TARGET_ZBS")
+            (match_operand 0 "not_single_bit_mask_operand"))
+       (and (match_code "const_int")
+            (match_test "UINTVAL (op) == GET_MODE_MASK (HImode)
+                         || UINTVAL (op) == GET_MODE_MASK (SImode)"))))
+
 (define_predicate "const_si_mask_operand"
   (and (match_code "const_int")
        (match_test "(INTVAL (op) & (GET_MODE_BITSIZE (SImode) - 1))
diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 85df5b7ab49..3b32b515fac 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -893,6 +893,40 @@ riscv_build_integer_1 (struct riscv_integer_op codes[RISCV_MAX_INTEGER_OPS],
           codes[1].use_uw = false;
           cost = 2;
         }
+
+      /* If LUI/ADDI are going to set bits 32..63 and we need a small
+         number of them cleared, we might be able to use bclri profitably.
+
+         Note we may allow clearing of bit 31 using bclri.  There's a class
+         of constants with that bit clear where this helps.  */
+      else if (TARGET_64BIT
+               && TARGET_ZBS
+               && (32 - popcount_hwi (value & HOST_WIDE_INT_C (0xffffffff80000000))) + 1 < cost)
+        {
+          /* Turn on all those upper bits and synthesize the result.  */
+          HOST_WIDE_INT nval = value | HOST_WIDE_INT_C (0xffffffff80000000);
+          alt_cost = riscv_build_integer_1 (alt_codes, nval, mode);
+
+          /* Now iterate over the bits we want to clear until the cost is
+             too high or we're done.  */
+          nval = value ^ HOST_WIDE_INT_C (-1);
+          nval &= HOST_WIDE_INT_C (~0x7fffffff);
+          while (nval && alt_cost < cost)
+            {
+              HOST_WIDE_INT bit = ctz_hwi (nval);
+              alt_codes[alt_cost].code = AND;
+              alt_codes[alt_cost].value = ~(1UL << bit);
+              alt_codes[alt_cost].use_uw = false;
+              alt_cost++;
+              nval &= ~(1UL << bit);
+            }
+
+          if (nval == 0 && alt_cost <= cost)
+            {
+              memcpy (codes, alt_codes, sizeof (alt_codes));
+              cost = alt_cost;
+            }
+        }
     }
 
   if (cost > 2 && TARGET_64BIT && TARGET_ZBA)
diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
index 78c16adee98..1bef1d67efa 100644
--- a/gcc/config/riscv/riscv.md
+++ b/gcc/config/riscv/riscv.md
@@ -1648,7 +1648,7 @@ (define_insn "smax<mode>3"
 (define_expand "and<mode>3"
   [(set (match_operand:X 0 "register_operand")
         (and:X (match_operand:X 1 "register_operand")
-               (match_operand:X 2 "arith_operand_or_mode_mask")))]
+               (match_operand:X 2 "arith_or_mode_mask_or_zbs_operand")))]
   ""
 {
   /* If the second operand is a mode mask, emit an extension
diff --git a/gcc/testsuite/gcc.target/riscv/synthesis-6.c b/gcc/testsuite/gcc.target/riscv/synthesis-6.c
new file mode 100644
index 00000000000..65cf748f4b5
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/synthesis-6.c
@@ -0,0 +1,95 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target rv64 } */
+/* We aggressively skip as we really just need to test the basic synthesis
+   which shouldn't vary based on the optimization level.  -O1 seems to work
+   and eliminates the usual sources of extraneous dead code that would throw
+   off the counts.  */
+/* { dg-skip-if "" { *-*-* } { "-O0" "-Og" "-O2" "-O3" "-Os" "-Oz" "-flto" } } */
+/* { dg-options "-march=rv64gc_zba_zbb_zbs" } */
+
+/* Rather than test for a specific synthesis of all these constants or
+   having thousands of tests each testing one variant, we just test the
+   total number of instructions.
+
+   This isn't expected to change much and any change is worthy of a look.  */
+/* { dg-final { scan-assembler-times "\\t(add|addi|bseti|bclri|li|ret|sh1add|sh2add|sh3add|slli)" 156 } } */
+
+
+unsigned long foo_0xfffffffbfffff7ff(void) { return 0xfffffffbfffff7ffUL; }
+
+unsigned long foo_0xfffffff7fffff7ff(void) { return 0xfffffff7fffff7ffUL; }
+
+unsigned long foo_0xffffffeffffff7ff(void) { return 0xffffffeffffff7ffUL; }
+
+unsigned long foo_0xffffffdffffff7ff(void) { return 0xffffffdffffff7ffUL; }
+
+unsigned long foo_0xffffffbffffff7ff(void) { return 0xffffffbffffff7ffUL; }
+
+unsigned long foo_0xffffff7ffffff7ff(void) { return 0xffffff7ffffff7ffUL; }
+
+unsigned long foo_0xfffffefffffff7ff(void) { return 0xfffffefffffff7ffUL; }
+
+unsigned long foo_0xfffffdfffffff7ff(void) { return 0xfffffdfffffff7ffUL; }
+
+unsigned long foo_0xfffffbfffffff7ff(void) { return 0xfffffbfffffff7ffUL; }
+
+unsigned long foo_0xfffff7fffffff7ff(void) { return 0xfffff7fffffff7ffUL; }
+
+unsigned long foo_0xffffeffffffff7ff(void) { return 0xffffeffffffff7ffUL; }
+
+unsigned long foo_0xffffdffffffff7ff(void) { return 0xffffdffffffff7ffUL; }
+
+unsigned long foo_0xffffbffffffff7ff(void) { return 0xffffbffffffff7ffUL; }
+
+unsigned long foo_0xffff7ffffffff7ff(void) { return 0xffff7ffffffff7ffUL; }
+
+unsigned long foo_0xfffefffffffff7ff(void) { return 0xfffefffffffff7ffUL; }
+
+unsigned long foo_0xfffdfffffffff7ff(void) { return 0xfffdfffffffff7ffUL; }
+
+unsigned long foo_0xfffbfffffffff7ff(void) { return 0xfffbfffffffff7ffUL; }
+
+unsigned long foo_0xfff7fffffffff7ff(void) { return 0xfff7fffffffff7ffUL; }
+
+unsigned long foo_0xffeffffffffff7ff(void) { return 0xffeffffffffff7ffUL; }
+
+unsigned long foo_0xffdffffffffff7ff(void) { return 0xffdffffffffff7ffUL; }
+
+unsigned long foo_0xffbffffffffff7ff(void) { return 0xffbffffffffff7ffUL; }
+
+unsigned long foo_0xff7ffffffffff7ff(void) { return 0xff7ffffffffff7ffUL; }
+
+unsigned long foo_0xfefffffffffff7ff(void) { return 0xfefffffffffff7ffUL; }
+
+unsigned long foo_0xfdfffffffffff7ff(void) { return 0xfdfffffffffff7ffUL; }
+
+unsigned long foo_0xfbfffffffffff7ff(void) { return 0xfbfffffffffff7ffUL; }
+
+unsigned long foo_0xf7fffffffffff7ff(void) { return 0xf7fffffffffff7ffUL; }
+
+unsigned long foo_0xeffffffffffff7ff(void) { return 0xeffffffffffff7ffUL; }
+
+unsigned long foo_0xdffffffffffff7ff(void) { return 0xdffffffffffff7ffUL; }
+
+unsigned long foo_0xbffffffffffff7ff(void) { return 0xbffffffffffff7ffUL; }
+
+unsigned long foo_0xffffffff7fffd7ff(void) { return 0xffffffff7fffd7ffUL; }
+
+unsigned long foo_0xffffffff7ffdf7ff(void) { return 0xffffffff7ffdf7ffUL; }
+
+unsigned long foo_0xffffffff7fdff7ff(void) { return 0xffffffff7fdff7ffUL; }
+
+unsigned long foo_0xffffffff7dfff7ff(void) { return 0xffffffff7dfff7ffUL; }
+
+unsigned long foo_0xffffffff5ffff7ff(void) { return 0xffffffff5ffff7ffUL; }
+
+unsigned long foo_0xfffff7ff7ffff7ff(void) { return 0xfffff7ff7ffff7ffUL; }
+
+unsigned long foo_0xffffdfff7ffff7ff(void) { return 0xffffdfff7ffff7ffUL; }
+
+unsigned long foo_0xffff7fff7ffff7ff(void) { return 0xffff7fff7ffff7ffUL; }
+
+unsigned long foo_0xfffdffff7ffff7ff(void) { return 0xfffdffff7ffff7ffUL; }
+
+
+