diff mbox series

[xstormy16] Add extendhisi2 and zero_extendhisi2 patterns to stormy16.md

Message ID 026001d9755d$19b30030$4d190090$@nextmovesoftware.com
State New
Headers show
Series [xstormy16] Add extendhisi2 and zero_extendhisi2 patterns to stormy16.md | expand

Commit Message

Roger Sayle April 22, 2023, 8:57 p.m. UTC
This patch adds a pair of define_insn patterns to the xstormy16 machine
description that provide extendhisi2 and zero_extendhisi2, i.e. 16-bit
to 32-bit sign- and zero-extension respectively.  This functionality is
already synthesized during RTL expansion, but providing patterns allow
the semantics to be exposed to the RTL optimizers.  To simplify things,
this patch introduces a new %h0 output format, for emitting the high_part
register name of a double-word (SImode) register pair.  The actual
code generated is identical to before.

Whilst there, I also fixed the instruction lengths and formatting of
the zero_extendqihi2 pattern.  Then, mostly for documentation purposes
as the 'T' constraint isn't yet implemented, I've added a "and Rx,#255"
alternative to zero_extendqihi2 that takes advantage of its efficient
instruction encoding.

This patch has been tested by building a cross-compiler to xstormy16-elf
on x86_64-pc-linux-gnu, and confirming that the new test case passes with
"make -k check-gcc".  Ok for mainline?


2023-04-22  Roger Sayle  <roger@nextmovesoftware.com>

gcc/ChangeLog
        * config/stormy16/stormy16.cc (xstormy16_print_operand): Add %h
        format specifier to output high_part register name of SImode reg.
        * config/stormy16/stormy16.md (extendhisi2): New define_insn.
        (zero_extendqihi2): Fix lengths, consistent formatting and add
        "and Rx,#255" alternative, for documentation purposes.
        (zero_extendhisi2): New define_insn.

gcc/testsuite/ChangeLog
        * gcc.target/xstormy16/extendhisi2.c: New test case.
        * gcc.target/xstormy16/zextendhisi2.c: Likewise.


Thanks in advance,
Roger
--

Comments

Jeff Law April 22, 2023, 10:08 p.m. UTC | #1
On 4/22/23 14:57, Roger Sayle wrote:
> 
> This patch adds a pair of define_insn patterns to the xstormy16 machine
> description that provide extendhisi2 and zero_extendhisi2, i.e. 16-bit
> to 32-bit sign- and zero-extension respectively.  This functionality is
> already synthesized during RTL expansion, but providing patterns allow
> the semantics to be exposed to the RTL optimizers.  To simplify things,
> this patch introduces a new %h0 output format, for emitting the high_part
> register name of a double-word (SImode) register pair.  The actual
> code generated is identical to before.
> 
> Whilst there, I also fixed the instruction lengths and formatting of
> the zero_extendqihi2 pattern.  Then, mostly for documentation purposes
> as the 'T' constraint isn't yet implemented, I've added a "and Rx,#255"
> alternative to zero_extendqihi2 that takes advantage of its efficient
> instruction encoding.
> 
> This patch has been tested by building a cross-compiler to xstormy16-elf
> on x86_64-pc-linux-gnu, and confirming that the new test case passes with
> "make -k check-gcc".  Ok for mainline?
> 
> 
> 2023-04-22  Roger Sayle  <roger@nextmovesoftware.com>
> 
> gcc/ChangeLog
>          * config/stormy16/stormy16.cc (xstormy16_print_operand): Add %h
>          format specifier to output high_part register name of SImode reg.
>          * config/stormy16/stormy16.md (extendhisi2): New define_insn.
>          (zero_extendqihi2): Fix lengths, consistent formatting and add
>          "and Rx,#255" alternative, for documentation purposes.
>          (zero_extendhisi2): New define_insn.
> 
> gcc/testsuite/ChangeLog
>          * gcc.target/xstormy16/extendhisi2.c: New test case.
>          * gcc.target/xstormy16/zextendhisi2.c: Likewise.
Does the "T" alternative ever match?  AFAICT its constraint check always 
fails:

> (define_constraint "T"
>   "@internal"
>   ;; For Rx; not implemented yet.
>   (match_test "0"))

No objections, but just not sure what's going on with that T constraint.

jeff
Roger Sayle April 23, 2023, 10:38 a.m. UTC | #2
On 4/33/23, Jeff Law wrote:
> On 4/22/23 14:57, Roger Sayle wrote:
> > Whilst there, I also fixed the instruction lengths and formatting of
> > the zero_extendqihi2 pattern.  Then, mostly for documentation purposes
> > as the 'T' constraint isn't yet implemented, I've added a "and Rx,#255"
> > alternative to zero_extendqihi2 that takes advantage of its efficient
> > instruction encoding.
> >
> > This patch has been tested by building a cross-compiler to
> > xstormy16-elf on x86_64-pc-linux-gnu, and confirming that the new test
> > case passes with "make -k check-gcc".  Ok for mainline?
> >
> >
> > 2023-04-22  Roger Sayle  <roger@nextmovesoftware.com>
> >
> > gcc/ChangeLog
> >          * config/stormy16/stormy16.cc (xstormy16_print_operand): Add %h
> >          format specifier to output high_part register name of SImode reg.
> >          * config/stormy16/stormy16.md (extendhisi2): New define_insn.
> >          (zero_extendqihi2): Fix lengths, consistent formatting and add
> >          "and Rx,#255" alternative, for documentation purposes.
> >          (zero_extendhisi2): New define_insn.
> >
> > gcc/testsuite/ChangeLog
> >          * gcc.target/xstormy16/extendhisi2.c: New test case.
> >          * gcc.target/xstormy16/zextendhisi2.c: Likewise.
> Does the "T" alternative ever match?  AFAICT its constraint check always
> fails:
> 
> > (define_constraint "T"
> >   "@internal"
> >   ;; For Rx; not implemented yet.
> >   (match_test "0"))
> 
> No objections, but just not sure what's going on with that T constraint.

This is an interesting/cool artifact of the xstormy16 architecture/instruction set
that isn't yet (fully) supported in GCC, but much of the infrastructure is in place.
Instructions on xstormy16 are encoded by either one or two 16-bit words.
If an immediate constant is between 0..15, arithmetic instructions can be
encoded in a single word, otherwise they require two words, with a full
16-bit immediate constant in the second word.  The possibly unique feature
of xstormy is an "Rx" addressing mode, that can be used when the destination
register is the same destination as the previous instruction, which by implicitly
encoding DEST, allows 8-bit immediate constants, to be encoded in a single word
instruction.

Handling this dependency between instructions is tricky, with Rx (aka Rpsw) depending
upon the N0..N3 bits in the flags register, and these bits being modified/updated
by almost every instruction.  The 'T' constraint is a placeholder, that currently
always returns false but in theory allows the register allocator to identify/select
this alternative, and the psw_operand attribute on each instruction indicating
how it updates N0..N3 (DEST) bits in the processor status word (PSW).

This feature is particularly useful for zero extension from QI to HI mode.  This
normally requires a shl/shr sequence, but when the register being extended was
modified in the preceding instruction, the single word instruction "and Rx,#255"
can be used.

Currently, for
unsigned char foo(unsigned char x) { return ~x; }
GCC -O2 generates:
foo:    not r2
        shl r2,#8 | shr r2,#8
        ret
but more optimally could use:
foo:    not r2  // Rx now means r2
        and Rx,#255  // shorter than and r2,#255
        ret


I doubt this functionality will be supported by the register allocator and/or scheduler
any time soon, but there's plenty that can be done with "macro instructions", for
example, a hypothetical "*onecmplqi_zexthi" which matches the above RTL,
something  (zero_extend:HI (not:QI (match_operand:QI "register_operand")),
could emit "not %0 | and Rx,#255".

Presumably, the semantics of "Rx" are correctly supported by the xstormy16 simulator?

I hope this helps explain things (as I understand them).
Thanks again for your help.
Roger
--
Jeff Law April 24, 2023, 11:53 p.m. UTC | #3
On 4/23/23 04:38, Roger Sayle wrote:
> On 4/33/23, Jeff Law wrote:
>> On 4/22/23 14:57, Roger Sayle wrote:
>>> Whilst there, I also fixed the instruction lengths and formatting of
>>> the zero_extendqihi2 pattern.  Then, mostly for documentation purposes
>>> as the 'T' constraint isn't yet implemented, I've added a "and Rx,#255"
>>> alternative to zero_extendqihi2 that takes advantage of its efficient
>>> instruction encoding.
>>>
>>> This patch has been tested by building a cross-compiler to
>>> xstormy16-elf on x86_64-pc-linux-gnu, and confirming that the new test
>>> case passes with "make -k check-gcc".  Ok for mainline?
>>>
>>>
>>> 2023-04-22  Roger Sayle  <roger@nextmovesoftware.com>
>>>
>>> gcc/ChangeLog
>>>           * config/stormy16/stormy16.cc (xstormy16_print_operand): Add %h
>>>           format specifier to output high_part register name of SImode reg.
>>>           * config/stormy16/stormy16.md (extendhisi2): New define_insn.
>>>           (zero_extendqihi2): Fix lengths, consistent formatting and add
>>>           "and Rx,#255" alternative, for documentation purposes.
>>>           (zero_extendhisi2): New define_insn.
>>>
>>> gcc/testsuite/ChangeLog
>>>           * gcc.target/xstormy16/extendhisi2.c: New test case.
>>>           * gcc.target/xstormy16/zextendhisi2.c: Likewise.
>> Does the "T" alternative ever match?  AFAICT its constraint check always
>> fails:
>>
>>> (define_constraint "T"
>>>    "@internal"
>>>    ;; For Rx; not implemented yet.
>>>    (match_test "0"))
>>
>> No objections, but just not sure what's going on with that T constraint.
> 
> This is an interesting/cool artifact of the xstormy16 architecture/instruction set
> that isn't yet (fully) supported in GCC, but much of the infrastructure is in place.
> Instructions on xstormy16 are encoded by either one or two 16-bit words.
> If an immediate constant is between 0..15, arithmetic instructions can be
> encoded in a single word, otherwise they require two words, with a full
> 16-bit immediate constant in the second word.  The possibly unique feature
> of xstormy is an "Rx" addressing mode, that can be used when the destination
> register is the same destination as the previous instruction, which by implicitly
> encoding DEST, allows 8-bit immediate constants, to be encoded in a single word
> instruction.
> 
> Handling this dependency between instructions is tricky, with Rx (aka Rpsw) depending
> upon the N0..N3 bits in the flags register, and these bits being modified/updated
> by almost every instruction.  The 'T' constraint is a placeholder, that currently
> always returns false but in theory allows the register allocator to identify/select
> this alternative, and the psw_operand attribute on each instruction indicating
> how it updates N0..N3 (DEST) bits in the processor status word (PSW).
> 
> This feature is particularly useful for zero extension from QI to HI mode.  This
> normally requires a shl/shr sequence, but when the register being extended was
> modified in the preceding instruction, the single word instruction "and Rx,#255"
> can be used.
> 
> Currently, for
> unsigned char foo(unsigned char x) { return ~x; }
> GCC -O2 generates:
> foo:    not r2
>          shl r2,#8 | shr r2,#8
>          ret
> but more optimally could use:
> foo:    not r2  // Rx now means r2
>          and Rx,#255  // shorter than and r2,#255
>          ret
> 
> 
> I doubt this functionality will be supported by the register allocator and/or scheduler
> any time soon, but there's plenty that can be done with "macro instructions", for
> example, a hypothetical "*onecmplqi_zexthi" which matches the above RTL,
> something  (zero_extend:HI (not:QI (match_operand:QI "register_operand")),
> could emit "not %0 | and Rx,#255".
> 
> Presumably, the semantics of "Rx" are correctly supported by the xstormy16 simulator?
> 
> I hope this helps explain things (as I understand them).
That is pretty cool.  THanks for the explanation.


> Thanks again for your help.
Happy to.  Though you might regret asking for it :-)

> Tests that now fail, but worked before (5 tests):
> 
> xstormy16-sim: gcc.c-torture/execute/memset-2.c   -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  (test for excess errors)
> xstormy16-sim: gcc.c-torture/execute/memset-2.c   -O3 -g  (test for excess errors)
> xstormy16-sim: gcc.c-torture/execute/memset-3.c   -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  (test for excess errors)
> xstormy16-sim: gcc.c-torture/execute/memset-3.c   -O3 -g  (test for excess errors)
> xstormy16-sim: gcc.c-torture/execute/va-arg-22.c   -O3 -g  (test for excess errors)
These started after your xstormy16 changes.  Given they're excess 
errors, they could well be ICEs.  I haven't debugged them at all.

jeff
diff mbox series

Patch

diff --git a/gcc/config/stormy16/stormy16.cc b/gcc/config/stormy16/stormy16.cc
index 1ed619a..34118e2 100644
--- a/gcc/config/stormy16/stormy16.cc
+++ b/gcc/config/stormy16/stormy16.cc
@@ -1828,6 +1828,14 @@  xstormy16_print_operand (FILE *file, rtx x, int code)
 	return;
       }
 
+    case 'h':
+      /* Print the highpart register of an SI mode register pair.  */
+      if (REG_P (x) && GET_MODE (x) == SImode)
+        fputs (reg_names [REGNO (x) + 1], file);
+      else
+	output_operand_lossage ("'h' operand is not SImode register");
+      return;
+
     case 0:
       /* Handled below.  */
       break;
diff --git a/gcc/config/stormy16/stormy16.md b/gcc/config/stormy16/stormy16.md
index d27c9fb..fd52588 100644
--- a/gcc/config/stormy16/stormy16.md
+++ b/gcc/config/stormy16/stormy16.md
@@ -268,17 +268,34 @@ 
   ""
   "cbw %0")
 
+(define_insn "extendhisi2"
+  [(set (match_operand:SI 0 "register_operand" "=r")
+	(sign_extend:SI (match_operand:HI 1 "register_operand" "0")))]
+  ""
+  "mov %h0,%0 | asr %h0,#15"
+  [(set_attr "length" "4")
+   (set_attr "psw_operand" "clobber")])
+
 (define_insn "zero_extendqihi2"
-  [(set (match_operand:HI                 0 "register_operand" 	   "=e,r")
-	(zero_extend:HI (match_operand:QI 1 "nonimmediate_operand" "m,0")))]
+  [(set (match_operand:HI                 0 "register_operand" 	   "=e,T,r")
+	(zero_extend:HI (match_operand:QI 1 "nonimmediate_operand" "m,0,0")))]
   ""
   "@
-   mov.b %0, %1
-   shl %0,#8\n\tshr %0,#8"
-  [(set_attr "psw_operand" "nop,0")
+   mov.b %0,%1
+   and Rx,#255
+   shl %0,#8 | shr %0,#8"
+  [(set_attr "psw_operand" "nop,nop,0")
    (set_attr_alternative "length"
-	     [(const_int 4)
-	      (const_int 8)])])
+	     [(const_int 2)
+	      (const_int 2)
+	      (const_int 4)])])
+
+(define_insn "zero_extendhisi2"
+  [(set (match_operand:SI 0 "register_operand" "=r")
+	(zero_extend:SI (match_operand:HI 1 "register_operand" "0")))]
+  ""
+  "mov %h0,#0"
+  [(set_attr "psw_operand" "clobber")])
 
 ;; ::::::::::::::::::::
 ;; ::
diff --git a/gcc/testsuite/gcc.target/xstormy16/extendhisi2.c b/gcc/testsuite/gcc.target/xstormy16/extendhisi2.c
new file mode 100644
index 0000000..bb8a22d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/xstormy16/extendhisi2.c
@@ -0,0 +1,7 @@ 
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+long foo(short x)
+{
+  return x;
+}
+/* { dg-final { scan-assembler "asr r3,#15" } } */
diff --git a/gcc/testsuite/gcc.target/xstormy16/zextendhisi2.c b/gcc/testsuite/gcc.target/xstormy16/zextendhisi2.c
new file mode 100644
index 0000000..9aef32a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/xstormy16/zextendhisi2.c
@@ -0,0 +1,7 @@ 
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+unsigned long foo(unsigned short x)
+{
+  return x;
+}
+/* { dg-final { scan-assembler "mov r3,#0" } } */