diff mbox

[avr] minor tweaks for 8-bit operations

Message ID 14737e83-d5dc-3096-2d12-686c42da1aa9@gjlay.de
State New
Headers show

Commit Message

Georg-Johann Lay July 13, 2016, 11:12 a.m. UTC
This patch contains some unrelated tweaks

- Supplying no-ldregs variant for andqi3, iorqi3 where a const_int mask affects 
only 1 bit

- Some patterns that match situations with zero_extend that can be performed 
with less instructions / register pressure.

- comparing HI against -1

Ok for trunk?

Johann


gcc/
	Minor tweaks for QImode.

	* config/avr/predicates.md (const_m255_to_m1_operand): New.
	* config/avr/constraints.md (Cn8, Ca1, Co1, Yx2): New constraints.
	* config/avr/avr.md (add<mode>3) <ALL1>: Make "r,0,r" more
	expensive.
	(*cmphi.zero-extend.0, *cmphi.zero-extend.1)
	(*usum_widenqihi3, *udiff_widenqihi3)
	(*addhi3_zero_extend.const): New combiner insns.
	(andqi3, iorqi3): Provide "l" (NO_LD_REGS) alternative if
	just 1 bit is affected.
	* config/avr/avr.c (avr_out_bitop) <QImode>: Don't access xop[3].
	(avr_out_compare) [EQ,NE]: Tweak comparing d-regs against -1.

Comments

Senthil Kumar Selvaraj July 14, 2016, 3:55 a.m. UTC | #1
Georg-Johann Lay writes:

> This patch contains some unrelated tweaks
>
> - Supplying no-ldregs variant for andqi3, iorqi3 where a const_int mask affects 
> only 1 bit
>
> - Some patterns that match situations with zero_extend that can be performed 
> with less instructions / register pressure.

From my (admittedly limited) attempts at code size benchmarking, this
one will help a lot I think, considering ints are used everywhere and
they end up always taking 2 regs, even when one would do. Do you have any numbers on the code size improvement this provides?

Regards
Senthil
>
> - comparing HI against -1
>
> Ok for trunk?
>
> Johann
>
>
> gcc/
> 	Minor tweaks for QImode.
>
> 	* config/avr/predicates.md (const_m255_to_m1_operand): New.
> 	* config/avr/constraints.md (Cn8, Ca1, Co1, Yx2): New constraints.
> 	* config/avr/avr.md (add<mode>3) <ALL1>: Make "r,0,r" more
> 	expensive.
> 	(*cmphi.zero-extend.0, *cmphi.zero-extend.1)
> 	(*usum_widenqihi3, *udiff_widenqihi3)
> 	(*addhi3_zero_extend.const): New combiner insns.
> 	(andqi3, iorqi3): Provide "l" (NO_LD_REGS) alternative if
> 	just 1 bit is affected.
> 	* config/avr/avr.c (avr_out_bitop) <QImode>: Don't access xop[3].
> 	(avr_out_compare) [EQ,NE]: Tweak comparing d-regs against -1.
Denis Chertykov July 14, 2016, 6:36 a.m. UTC | #2
2016-07-13 14:12 GMT+03:00 Georg-Johann Lay <avr@gjlay.de>:
>
> This patch contains some unrelated tweaks
>
> - Supplying no-ldregs variant for andqi3, iorqi3 where a const_int mask affects only 1 bit
>
> - Some patterns that match situations with zero_extend that can be performed with less instructions / register pressure.
>
> - comparing HI against -1
>
> Ok for trunk?
>
> Johann
>
>
> gcc/
>         Minor tweaks for QImode.
>
>         * config/avr/predicates.md (const_m255_to_m1_operand): New.
>         * config/avr/constraints.md (Cn8, Ca1, Co1, Yx2): New constraints.
>         * config/avr/avr.md (add<mode>3) <ALL1>: Make "r,0,r" more
>         expensive.
>         (*cmphi.zero-extend.0, *cmphi.zero-extend.1)
>         (*usum_widenqihi3, *udiff_widenqihi3)
>         (*addhi3_zero_extend.const): New combiner insns.
>         (andqi3, iorqi3): Provide "l" (NO_LD_REGS) alternative if
>         just 1 bit is affected.
>         * config/avr/avr.c (avr_out_bitop) <QImode>: Don't access xop[3].
>         (avr_out_compare) [EQ,NE]: Tweak comparing d-regs against -1.



Please apply.
Georg-Johann Lay July 14, 2016, 12:09 p.m. UTC | #3
On 14.07.2016 05:55, Senthil Kumar Selvaraj wrote:
>
> Georg-Johann Lay writes:
>
>> This patch contains some unrelated tweaks
>>
>> [...]
>>
>> - Some patterns that match situations with zero_extend that can be performed
>> with less instructions / register pressure.
>
> From my (admittedly limited) attempts at code size benchmarking, this
> one will help a lot I think, considering ints are used everywhere and
> they end up always taking 2 regs, even when one would do. Do you have any numbers on the code size improvement this provides?
>
> Regards
> Senthil


No, I usually propose such patches when I am coming across respective situation 
in generated asm or in a dump file and it's clear that the change won't reduce 
performance loss in other places.  My expectations on the overall gains are 
low, in particular if some subreg-lowering already split from one HI to two QI 
so that there is no zero_extend any more that could be caught.

For the QImoe plus, and, ior with const_int it might pay to add a "d" clobber 
if the register ends up in "l" regs.  The compiler sometimes reloads the whole 
stuff into d-regs, i.e. instead of

   LDI R20, -42
   ADD R10, R20

we get

   MOV R20, R10
   ADD R20, -42
   MOV R10, R20

I even came across situations where reload reloads the const_int to a "l" 
register which might cost up to 4 instructions for the constant alone!

Johann


>>
>> - comparing HI against -1
>>
>> Ok for trunk?
>>
>> Johann
>>
>>
>> gcc/
>> 	Minor tweaks for QImode.
>>
>> 	* config/avr/predicates.md (const_m255_to_m1_operand): New.
>> 	* config/avr/constraints.md (Cn8, Ca1, Co1, Yx2): New constraints.
>> 	* config/avr/avr.md (add<mode>3) <ALL1>: Make "r,0,r" more
>> 	expensive.
>> 	(*cmphi.zero-extend.0, *cmphi.zero-extend.1)
>> 	(*usum_widenqihi3, *udiff_widenqihi3)
>> 	(*addhi3_zero_extend.const): New combiner insns.
>> 	(andqi3, iorqi3): Provide "l" (NO_LD_REGS) alternative if
>> 	just 1 bit is affected.
>> 	* config/avr/avr.c (avr_out_bitop) <QImode>: Don't access xop[3].
>> 	(avr_out_compare) [EQ,NE]: Tweak comparing d-regs against -1.
diff mbox

Patch

Index: config/avr/avr.c
===================================================================
--- config/avr/avr.c	(revision 237643)
+++ config/avr/avr.c	(working copy)
@@ -5346,6 +5346,34 @@  avr_out_compare (rtx_insn *insn, rtx *xo
         }
     }
 
+  /* Comparisons == -1 and != -1 of a d-register that's used after the
+     comparison.  (If it's unused after we use CPI / SBCI or ADIW sequence
+     from below.)  Instead of  CPI Rlo,-1 / LDI Rx,-1 / CPC Rhi,Rx  we can
+     use  CPI Rlo,-1 / CPC Rhi,Rlo  which is 1 instruction shorter:
+     If CPI is true then Rlo contains -1 and we can use Rlo instead of Rx
+     when CPC'ing the high part.  If CPI is false then CPC cannot render
+     the result to true.  This also works for the more generic case where
+     the constant is of the form 0xabab.  */
+
+  if (n_bytes == 2
+      && xval != 0
+      && test_hard_reg_class (LD_REGS, xreg)
+      && compare_eq_p (insn)
+      && !reg_unused_after (insn, xreg))
+    {
+      rtx xlo8 = simplify_gen_subreg (QImode, xval, mode, 0);
+      rtx xhi8 = simplify_gen_subreg (QImode, xval, mode, 1);
+
+      if (INTVAL (xlo8) == INTVAL (xhi8))
+        {
+          xop[0] = xreg;
+          xop[1] = xlo8;
+
+          return avr_asm_len ("cpi %A0,%1"  CR_TAB
+                              "cpc %B0,%A0", xop, plen, 2);
+        }
+    }
+
   for (i = 0; i < n_bytes; i++)
     {
       /* We compare byte-wise.  */
@@ -7687,11 +7715,11 @@  avr_out_bitop (rtx insn, rtx *xop, int *
 
   /* op[0]: 8-bit destination register
      op[1]: 8-bit const int
-     op[2]: 8-bit clobber register or SCRATCH
+     op[2]: 8-bit clobber register, SCRATCH or NULL_RTX.
      op[3]: 8-bit register containing 0xff or NULL_RTX  */
   rtx op[4];
 
-  op[2] = xop[3];
+  op[2] = QImode == mode ? NULL_RTX : xop[3];
   op[3] = NULL_RTX;
 
   if (plen)
Index: config/avr/avr.md
===================================================================
--- config/avr/avr.md	(revision 237643)
+++ config/avr/avr.md	(working copy)
@@ -1240,6 +1240,33 @@  (define_insn "*addhi3.sign_extend1"
   [(set_attr "length" "5")
    (set_attr "cc" "clobber")])
 
+(define_insn "*addhi3_zero_extend.const"
+  [(set (match_operand:HI 0 "register_operand"                         "=d")
+        (plus:HI (zero_extend:HI (match_operand:QI 1 "register_operand" "0"))
+                 (match_operand:HI 2 "const_m255_to_m1_operand"         "Cn8")))]
+  ""
+  "subi %A0,%n2\;sbc %B0,%B0"
+  [(set_attr "length" "2")
+   (set_attr "cc" "set_czn")])
+
+(define_insn "*usum_widenqihi3"
+  [(set (match_operand:HI 0 "register_operand"                          "=r")
+        (plus:HI (zero_extend:HI (match_operand:QI 1 "register_operand"  "0"))
+                 (zero_extend:HI (match_operand:QI 2 "register_operand"  "r"))))]
+  ""
+  "add %A0,%2\;clr %B0\;rol %B0"
+  [(set_attr "length" "3")
+   (set_attr "cc" "clobber")])
+
+(define_insn "*udiff_widenqihi3"
+  [(set (match_operand:HI 0 "register_operand"                           "=r")
+        (minus:HI (zero_extend:HI (match_operand:QI 1 "register_operand"  "0"))
+                  (zero_extend:HI (match_operand:QI 2 "register_operand"  "r"))))]
+  ""
+  "sub %A0,%2\;sbc %B0,%B0"
+  [(set_attr "length" "2")
+   (set_attr "cc" "set_czn")])
+    
 (define_insn "*addhi3_sp"
   [(set (match_operand:HI 1 "stack_register_operand"           "=q")
         (plus:HI (match_operand:HI 2 "stack_register_operand"   "q")
@@ -3102,15 +3129,16 @@  (define_insn "*udivmodsi4_call"
 ; and
 
 (define_insn "andqi3"
-  [(set (match_operand:QI 0 "register_operand"       "=??r,d")
-        (and:QI (match_operand:QI 1 "register_operand" "%0,0")
-                (match_operand:QI 2 "nonmemory_operand" "r,i")))]
+  [(set (match_operand:QI 0 "register_operand"       "=??r,d,*l")
+        (and:QI (match_operand:QI 1 "register_operand" "%0,0,0")
+                (match_operand:QI 2 "nonmemory_operand" "r,i,Ca1")))]
   ""
   "@
 	and %0,%2
-	andi %0,lo8(%2)"
-  [(set_attr "length" "1,1")
-   (set_attr "cc" "set_zn,set_zn")])
+	andi %0,lo8(%2)
+	* return avr_out_bitop (insn, operands, NULL);"
+  [(set_attr "length" "1,1,2")
+   (set_attr "cc" "set_zn,set_zn,none")])
 
 (define_insn "andhi3"
   [(set (match_operand:HI 0 "register_operand"       "=??r,d,d,r  ,r")
@@ -3184,15 +3212,16 @@  (define_peephole2 ; andi
 ;; ior
 
 (define_insn "iorqi3"
-  [(set (match_operand:QI 0 "register_operand"       "=??r,d")
-        (ior:QI (match_operand:QI 1 "register_operand" "%0,0")
-                (match_operand:QI 2 "nonmemory_operand" "r,i")))]
+  [(set (match_operand:QI 0 "register_operand"       "=??r,d,*l")
+        (ior:QI (match_operand:QI 1 "register_operand" "%0,0,0")
+                (match_operand:QI 2 "nonmemory_operand" "r,i,Co1")))]
   ""
   "@
 	or %0,%2
-	ori %0,lo8(%2)"
-  [(set_attr "length" "1,1")
-   (set_attr "cc" "set_zn,set_zn")])
+	ori %0,lo8(%2)
+        * return avr_out_bitop (insn, operands, NULL);"
+  [(set_attr "length" "1,1,2")
+   (set_attr "cc" "set_zn,set_zn,none")])
 
 (define_insn "iorhi3"
   [(set (match_operand:HI 0 "register_operand"       "=??r,d,d,r  ,r")
@@ -4607,6 +4636,25 @@  (define_insn "*cmpqi_sign_extend"
   [(set_attr "cc" "compare")
    (set_attr "length" "1")])
 
+
+(define_insn "*cmphi.zero-extend.0"
+  [(set (cc0)
+        (compare (zero_extend:HI (match_operand:QI 0 "register_operand" "r"))
+                 (match_operand:HI 1 "register_operand" "r")))]
+  ""
+  "cp %0,%A1\;cpc __zero_reg__,%B1"
+  [(set_attr "cc" "compare")
+   (set_attr "length" "2")])
+
+(define_insn "*cmphi.zero-extend.1"
+  [(set (cc0)
+        (compare (match_operand:HI 0 "register_operand" "r")
+                 (zero_extend:HI (match_operand:QI 1 "register_operand" "r"))))]
+  ""
+  "cp %A0,%1\;cpc %B0,__zero_reg__"
+  [(set_attr "cc" "compare")
+   (set_attr "length" "2")])
+
 ;; "*cmphi"
 ;; "*cmphq" "*cmpuhq"
 ;; "*cmpha" "*cmpuha"
Index: config/avr/constraints.md
===================================================================
--- config/avr/constraints.md	(revision 237643)
+++ config/avr/constraints.md	(working copy)
@@ -133,6 +133,11 @@  (define_constraint "C07"
   (and (match_code "const_int")
        (match_test "ival == 7")))
 
+(define_constraint "Ca1"
+  "Constant 1-byte integer that allows AND by means of CLT + BLD."
+  (and (match_code "const_int")
+       (match_test "avr_popcount_each_byte (op, 1, 1<<7)")))
+
 (define_constraint "Ca2"
   "Constant 2-byte integer that allows AND without clobber register."
   (and (match_code "const_int")
@@ -148,6 +153,11 @@  (define_constraint "Ca4"
   (and (match_code "const_int")
        (match_test "avr_popcount_each_byte (op, 4, (1<<0) | (1<<7) | (1<<8))")))
 
+(define_constraint "Co1"
+  "Constant 1-byte integer that allows AND by means of SET + BLD."
+  (and (match_code "const_int")
+       (match_test "avr_popcount_each_byte (op, 1, 1<<1)")))
+
 (define_constraint "Co2"
   "Constant 2-byte integer that allows OR without clobber register."
   (and (match_code "const_int")
@@ -193,6 +203,11 @@  (define_constraint "C0f"
   (and (match_code "const_int")
        (match_test "!avr_has_nibble_0xf (op)")))
 
+(define_constraint "Cn8"
+  "A negative constant integer in the range @minus{}255 @dots{} @minus{}1."
+  (and (match_code "const_int")
+       (match_test "IN_RANGE (ival, -255, -1)")))
+
 ;; CONST_FIXED is no element of 'n' so cook our own.
 ;; "i" or "s" would match but because the insn uses iterators that cover
 ;; INT_MODE, "i" or "s" is not always possible.
@@ -230,6 +245,12 @@  (define_constraint "Ym2"
             (match_test "-2 == INTVAL (avr_to_int_mode (op))"))
        (match_test "satisfies_constraint_Cm2 (op)")))
 
+(define_constraint "Yx2"
+  "Fixed-point or integer constant not in the range @minus{}2 @dots{} 2"
+  (and (ior (match_code "const_int")
+            (match_code "const_fixed"))
+       (match_test "!IN_RANGE (INTVAL (avr_to_int_mode (op)), -2, 2)")))
+
 ;; Similar to "IJ" used with ADIW/SBIW, but for CONST_FIXED.
 
 (define_constraint "YIJ"
Index: config/avr/predicates.md
===================================================================
--- config/avr/predicates.md	(revision 237643)
+++ config/avr/predicates.md	(working copy)
@@ -114,6 +114,11 @@  (define_predicate "const_2_to_6_operand"
   (and (match_code "const_int")
        (match_test "IN_RANGE (INTVAL (op), 2, 6)")))
 
+;; Return 1 if OP is constant integer -255..-1.
+(define_predicate "const_m255_to_m1_operand"
+  (and (match_code "const_int")
+       (match_test "IN_RANGE (INTVAL (op), -255, -1)")))
+
 ;; Returns true if OP is either the constant zero or a register.
 (define_predicate "reg_or_0_operand"
   (ior (match_operand 0 "register_operand")