diff mbox series

[WIP] nvptx: Also allow immediate input operand to 'bitrev<mode>2'

Message ID 87bkehx5x1.fsf@euler.schwinge.homeip.net
State New
Headers show
Series [WIP] nvptx: Also allow immediate input operand to 'bitrev<mode>2' | expand

Commit Message

Thomas Schwinge Sept. 4, 2023, 4:24 p.m. UTC
Hi!

I'm working towards reviewing some (of Roger's) GCC/nvptx patches, and
therefore learning some more GCC/nvptx, and generally RTL etc., and the
conventions around it.  Please bear with me asking "obvious" questions.

For the PTX bit reverse instruction, GCC/nvptx currently ("forever")
defines:

    (define_insn "bitrev<mode>2"
      [(set (match_operand:SDIM 0 "nvptx_register_operand" "=R")
           (unspec:SDIM [(match_operand:SDIM 1 "nvptx_register_operand" "R")]
                        UNSPEC_BITREV))]
      ""
      "%.\\tbrev.b%T0\\t%0, %1;")

..., with:

    (define_predicate "nvptx_register_operand"
      (match_code "reg")
    {
      return register_operand (op, mode);
    })

    (define_constraint "R"
      "A pseudo register."
      (match_code "reg"))

That is, only a register input operand is permitted, not an immediate.
However, I don't see such a restriction in the manual,
<https://docs.nvidia.com/cuda/parallel-thread-execution/#integer-arithmetic-instructions-brev>.

If I change that 'define_insn':

    -       (unspec:SDIM [(match_operand:SDIM 1 "nvptx_register_operand" "R")]
    +       (unspec:SDIM [(match_operand:SDIM 1 "nvptx_nonmemory_operand" "Ri")]

..., with (existing):

    (define_predicate "nvptx_nonmemory_operand"
      (match_code "reg,const_int,const_double")
    {
      return (REG_P (op) ? register_operand (op, mode)
              : immediate_operand (op, mode));
    })

..., then a simple code:

    return __builtin_nvptx_brev(0xe6a2c480) != 0x01234567;

... for '-O1' subsequently changes:

    [...]
     .reg .u32 %r22;
    -.reg .u32 %r25;
     .reg .pred %r29;
    -mov.u32 %r25,-425540480;
    -brev.b32 %r22,%r25;
    +brev.b32 %r22,-425540480;
     setp.ne.u32 %r29,%r22,19088743;
    [...]

(I understand that, in the end, that's probably equivalent, assuming that
the later PTX -> SASS compiler does the same optimization, but I find it
easier to read: one less '.reg' to keep track of textually/mentally.)

Does that make sense to you, too?  I'd then extend my attached
"[WIP] nvptx: Also allow immediate input operand to 'bitrev<mode>2'" with
a test case.

(Similarly then, a number of other GCC/nvptx 'define_insn's to be
reviewed/revised, later on.)

Relatedly, I see that a lot of GCC/nvptx' two input operands instructions
('add<mode>3', etc.) similarly do allow for their second input operand to
be an immediate in addition to a register.  I suppose that only allowing
for the second input operand to be an immediate is sufficient/desirable:
reduces load on the matching; we shouldn't ever end up with 'IMM + IMM',
for example: should've optimized that before.  As I learn from
<https://gcc.gnu.org/onlinedocs/gccint/Insn-Canonicalizations.html>,
"for commutative [...] operators, a constant is always made the second
operand".  (Confirmed to ICE if swapping that around for 'add<mode>3';
so, that's all as expected.)


Grüße
 Thomas


-----------------
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht München, HRB 106955
diff mbox series

Patch

From 4a5138eb61a026ad6bc5470a648ebc596af1b1ed Mon Sep 17 00:00:00 2001
From: Thomas Schwinge <thomas@codesourcery.com>
Date: Mon, 4 Sep 2023 16:48:53 +0200
Subject: [PATCH] [WIP] nvptx: Also allow immediate input operand to
 'bitrev<mode>2'

---
 gcc/config/nvptx/nvptx.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/config/nvptx/nvptx.md b/gcc/config/nvptx/nvptx.md
index 1bb93045403..e1c822f2ea8 100644
--- a/gcc/config/nvptx/nvptx.md
+++ b/gcc/config/nvptx/nvptx.md
@@ -636,7 +636,7 @@ 
 
 (define_insn "bitrev<mode>2"
   [(set (match_operand:SDIM 0 "nvptx_register_operand" "=R")
-	(unspec:SDIM [(match_operand:SDIM 1 "nvptx_register_operand" "R")]
+	(unspec:SDIM [(match_operand:SDIM 1 "nvptx_nonmemory_operand" "Ri")]
 		     UNSPEC_BITREV))]
   ""
   "%.\\tbrev.b%T0\\t%0, %1;")
-- 
2.34.1