From patchwork Thu May 23 18:45:58 2024
X-Patchwork-Submitter: Jeff Law
X-Patchwork-Id: 1938534
Message-ID: <4c22a73c-4d09-4a17-bbe4-d31e876b1f69@ventanamicro.com>
Date: Thu, 23 May 2024 12:45:58 -0600
From: Jeff Law
To: "gcc-patches@gcc.gnu.org"
Subject: [to-be-committed] [RISC-V] Use bclri in constant synthesis

So this is conceptually similar to how we handled direct generation of
bseti for constant synthesis, but this time for bclr.  In the bclr case,
we already have an expander for AND.
So we just needed to adjust the predicate to accept another class of
constant operands (those with a single bit clear).

With that in place, constant synthesis is adjusted so that it counts the
number of bits clear in the high 33 bits of a 64-bit word.  If that
number is small relative to the current best cost, then we try to
generate the constant with a lui-based sequence for the low half, which
implicitly sets the upper 32 bits as well.  Then we bclri one or more of
those upper 33 bits.

So as an example, this code goes from 4 instructions down to 3:

> unsigned long foo_0xfffffffbfffff7ff(void) { return 0xfffffffbfffff7ffUL; }

Note the use of 33 bits above.  That's meant to capture cases like this:

> unsigned long foo_0xfffdffff7ffff7ff(void) { return 0xfffdffff7ffff7ffUL; }

We can use lui+addi+bclri+bclri to synthesize that in 4 instructions
instead of 5.

I'm including a handful of cases covering the two basic ideas above that
were found by the testing code.

And, no, we're not done yet.  I see at least one more notable idiom
missing before exploring zbkb's potential to improve things.

Tested in my tester and waiting on the Rivos CI system before moving
forward.

jeff

gcc/
	* config/riscv/predicates.md (arith_operand_or_mode_mask): Renamed
	to ...
	(arith_or_mode_mask_or_zbs_operand): New predicate.
	* config/riscv/riscv.md (and<mode>3): Update predicate for operand 2.
	* config/riscv/riscv.cc (riscv_build_integer_1): Use bclri to clear
	bits, particularly bits 31..63 when profitable to do so.

gcc/testsuite/

	* gcc.target/riscv/synthesis-6.c: New test.
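To see why the "turn the upper bits on, then knock them back off" trick
pays, here is a small, purely illustrative Python sketch of the
heuristic (this is not the GCC implementation; the a0 register and the
textual instruction format are invented for the example, and the
lui/addi split is simplified):

```python
# Illustrative model of the new bclri synthesis path: force bits 31..63
# on, materialize that value with lui(+addi), then clear each unwanted
# upper bit with one bclri apiece.
UPPER33 = 0xFFFF_FFFF_8000_0000   # bits 31..63, the "high 33 bits"

def bclri_synthesis(value):
    """Return a plausible instruction sequence for VALUE (64-bit)."""
    value &= 0xFFFF_FFFF_FFFF_FFFF
    nval = value | UPPER33                 # upper 33 bits all set
    # lui materializes bits 12..31 (sign-extended on RV64); addi fixes
    # bits 0..11.  Simplified: the lui carry fixup for lo12 >= 0x800 is
    # folded into a signed addi immediate only.
    seq = [f"lui  a0, {((nval >> 12) & 0xFFFFF):#x}"]
    lo12 = nval & 0xFFF
    if lo12:
        imm = lo12 - 0x1000 if lo12 >= 0x800 else lo12
        seq.append(f"addi a0, a0, {imm:#x}")
    # One bclri per bit that must end up clear in bits 31..63.
    clear = (~value) & UPPER33
    while clear:
        bit = (clear & -clear).bit_length() - 1
        seq.append(f"bclri a0, a0, {bit}")
        clear &= clear - 1
    return seq

# 0xfffffffbfffff7ff: only bit 34 is clear up high, so the sequence is
# lui+addi+bclri (3 insns).  0xfffdffff7ffff7ff has bits 31 and 49
# clear, giving lui+addi+bclri+bclri (4 insns) -- the case that
# motivates looking at 33 bits rather than 32.
```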
diff --git a/gcc/config/riscv/predicates.md b/gcc/config/riscv/predicates.md
index 8948fbfc363..c1c693c7617 100644
--- a/gcc/config/riscv/predicates.md
+++ b/gcc/config/riscv/predicates.md
@@ -27,12 +27,6 @@ (define_predicate "arith_operand"
   (ior (match_operand 0 "const_arith_operand")
        (match_operand 0 "register_operand")))
 
-(define_predicate "arith_operand_or_mode_mask"
-  (ior (match_operand 0 "arith_operand")
-       (and (match_code "const_int")
-            (match_test "UINTVAL (op) == GET_MODE_MASK (HImode)
-                         || UINTVAL (op) == GET_MODE_MASK (SImode)"))))
-
 (define_predicate "lui_operand"
   (and (match_code "const_int")
        (match_test "LUI_OPERAND (INTVAL (op))")))
@@ -398,6 +392,14 @@ (define_predicate "not_single_bit_mask_operand"
   (and (match_code "const_int")
        (match_test "SINGLE_BIT_MASK_OPERAND (~UINTVAL (op))")))
 
+(define_predicate "arith_or_mode_mask_or_zbs_operand"
+  (ior (match_operand 0 "arith_operand")
+       (and (match_test "TARGET_ZBS")
+            (match_operand 0 "not_single_bit_mask_operand"))
+       (and (match_code "const_int")
+            (match_test "UINTVAL (op) == GET_MODE_MASK (HImode)
+                         || UINTVAL (op) == GET_MODE_MASK (SImode)"))))
+
 (define_predicate "const_si_mask_operand"
   (and (match_code "const_int")
        (match_test "(INTVAL (op) & (GET_MODE_BITSIZE (SImode) - 1))
diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 85df5b7ab49..3b32b515fac 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -893,6 +893,40 @@ riscv_build_integer_1 (struct riscv_integer_op codes[RISCV_MAX_INTEGER_OPS],
       codes[1].use_uw = false;
       cost = 2;
     }
+
+  /* If LUI/ADDI are going to set bits 32..63 and we need a small
+     number of them cleared, we might be able to use bclri profitably.
+
+     Note we may allow clearing of bit 31 using bclri.  There's a class
+     of constants with that bit clear where this helps.  */
+  else if (TARGET_64BIT
+           && TARGET_ZBS
+           && (32 - popcount_hwi (value & HOST_WIDE_INT_C (0xffffffff80000000))) + 1 < cost)
+    {
+      /* Turn on all those upper bits and synthesize the result.  */
+      HOST_WIDE_INT nval = value | HOST_WIDE_INT_C (0xffffffff80000000);
+      alt_cost = riscv_build_integer_1 (alt_codes, nval, mode);
+
+      /* Now iterate over the bits we want to clear until the cost is
+         too high or we're done.  */
+      nval = value ^ HOST_WIDE_INT_C (-1);
+      nval &= HOST_WIDE_INT_C (~0x7fffffff);
+      while (nval && alt_cost < cost)
+        {
+          HOST_WIDE_INT bit = ctz_hwi (nval);
+          alt_codes[alt_cost].code = AND;
+          alt_codes[alt_cost].value = ~(1UL << bit);
+          alt_codes[alt_cost].use_uw = false;
+          alt_cost++;
+          nval &= ~(1UL << bit);
+        }
+
+      if (alt_cost <= cost)
+        {
+          memcpy (codes, alt_codes, sizeof (alt_codes));
+          cost = alt_cost;
+        }
+    }
 }
 
 if (cost > 2 && TARGET_64BIT && TARGET_ZBA)
diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
index 78c16adee98..1bef1d67efa 100644
--- a/gcc/config/riscv/riscv.md
+++ b/gcc/config/riscv/riscv.md
@@ -1648,7 +1648,7 @@ (define_insn "smax<mode>3"
 (define_expand "and<mode>3"
   [(set (match_operand:X 0 "register_operand")
         (and:X (match_operand:X 1 "register_operand")
-               (match_operand:X 2 "arith_operand_or_mode_mask")))]
+               (match_operand:X 2 "arith_or_mode_mask_or_zbs_operand")))]
   ""
 {
   /* If the second operand is a mode mask, emit an extension
diff --git a/gcc/testsuite/gcc.target/riscv/synthesis-6.c b/gcc/testsuite/gcc.target/riscv/synthesis-6.c
new file mode 100644
index 00000000000..65cf748f4b5
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/synthesis-6.c
@@ -0,0 +1,95 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target rv64 } */
+/* We aggressively skip as we really just need to test the basic synthesis
+   which shouldn't vary based on the optimization level.  -O1 seems to work
+   and eliminates the usual sources of extraneous dead code that would throw
+   off the counts.  */
+/* { dg-skip-if "" { *-*-* } { "-O0" "-Og" "-O2" "-O3" "-Os" "-Oz" "-flto" } } */
+/* { dg-options "-march=rv64gc_zba_zbb_zbs" } */
+
+/* Rather than test for a specific synthesis of all these constants or
+   having thousands of tests each testing one variant, we just test the
+   total number of instructions.
+
+   This isn't expected to change much and any change is worthy of a look.  */
+/* { dg-final { scan-assembler-times "\\t(add|addi|bseti|bclri|li|ret|sh1add|sh2add|sh3add|slli)" 156 } } */
+
+
+unsigned long foo_0xfffffffbfffff7ff(void) { return 0xfffffffbfffff7ffUL; }
+
+unsigned long foo_0xfffffff7fffff7ff(void) { return 0xfffffff7fffff7ffUL; }
+
+unsigned long foo_0xffffffeffffff7ff(void) { return 0xffffffeffffff7ffUL; }
+
+unsigned long foo_0xffffffdffffff7ff(void) { return 0xffffffdffffff7ffUL; }
+
+unsigned long foo_0xffffffbffffff7ff(void) { return 0xffffffbffffff7ffUL; }
+
+unsigned long foo_0xffffff7ffffff7ff(void) { return 0xffffff7ffffff7ffUL; }
+
+unsigned long foo_0xfffffefffffff7ff(void) { return 0xfffffefffffff7ffUL; }
+
+unsigned long foo_0xfffffdfffffff7ff(void) { return 0xfffffdfffffff7ffUL; }
+
+unsigned long foo_0xfffffbfffffff7ff(void) { return 0xfffffbfffffff7ffUL; }
+
+unsigned long foo_0xfffff7fffffff7ff(void) { return 0xfffff7fffffff7ffUL; }
+
+unsigned long foo_0xffffeffffffff7ff(void) { return 0xffffeffffffff7ffUL; }
+
+unsigned long foo_0xffffdffffffff7ff(void) { return 0xffffdffffffff7ffUL; }
+
+unsigned long foo_0xffffbffffffff7ff(void) { return 0xffffbffffffff7ffUL; }
+
+unsigned long foo_0xffff7ffffffff7ff(void) { return 0xffff7ffffffff7ffUL; }
+
+unsigned long foo_0xfffefffffffff7ff(void) { return 0xfffefffffffff7ffUL; }
+
+unsigned long foo_0xfffdfffffffff7ff(void) { return 0xfffdfffffffff7ffUL; }
+
+unsigned long foo_0xfffbfffffffff7ff(void) { return 0xfffbfffffffff7ffUL; }
+
+unsigned long foo_0xfff7fffffffff7ff(void) { return 0xfff7fffffffff7ffUL; }
+
+unsigned long foo_0xffeffffffffff7ff(void) { return 0xffeffffffffff7ffUL; }
+
+unsigned long foo_0xffdffffffffff7ff(void) { return 0xffdffffffffff7ffUL; }
+
+unsigned long foo_0xffbffffffffff7ff(void) { return 0xffbffffffffff7ffUL; }
+
+unsigned long foo_0xff7ffffffffff7ff(void) { return 0xff7ffffffffff7ffUL; }
+
+unsigned long foo_0xfefffffffffff7ff(void) { return 0xfefffffffffff7ffUL; }
+
+unsigned long foo_0xfdfffffffffff7ff(void) { return 0xfdfffffffffff7ffUL; }
+
+unsigned long foo_0xfbfffffffffff7ff(void) { return 0xfbfffffffffff7ffUL; }
+
+unsigned long foo_0xf7fffffffffff7ff(void) { return 0xf7fffffffffff7ffUL; }
+
+unsigned long foo_0xeffffffffffff7ff(void) { return 0xeffffffffffff7ffUL; }
+
+unsigned long foo_0xdffffffffffff7ff(void) { return 0xdffffffffffff7ffUL; }
+
+unsigned long foo_0xbffffffffffff7ff(void) { return 0xbffffffffffff7ffUL; }
+
+unsigned long foo_0xffffffff7fffd7ff(void) { return 0xffffffff7fffd7ffUL; }
+
+unsigned long foo_0xffffffff7ffdf7ff(void) { return 0xffffffff7ffdf7ffUL; }
+
+unsigned long foo_0xffffffff7fdff7ff(void) { return 0xffffffff7fdff7ffUL; }
+
+unsigned long foo_0xffffffff7dfff7ff(void) { return 0xffffffff7dfff7ffUL; }
+
+unsigned long foo_0xffffffff5ffff7ff(void) { return 0xffffffff5ffff7ffUL; }
+
+unsigned long foo_0xfffff7ff7ffff7ff(void) { return 0xfffff7ff7ffff7ffUL; }
+
+unsigned long foo_0xffffdfff7ffff7ff(void) { return 0xffffdfff7ffff7ffUL; }
+
+unsigned long foo_0xffff7fff7ffff7ff(void) { return 0xffff7fff7ffff7ffUL; }
+
+unsigned long foo_0xfffdffff7ffff7ff(void) { return 0xfffdffff7ffff7ffUL; }