diff mbox

[AVR] : QI builtins for parity, popcount, 1<< n

Message ID 4DFA26FE.1000400@gjlay.de
State New
Headers show

Commit Message

Georg-Johann Lay June 16, 2011, 3:53 p.m. UTC
The recent implementation of some asm function in libgcc added
__popcountqi2 and __parityqi2. This patch makes these functions
available as __builtin_avr_popcount8 resp. __builtin_avr_parity8.

Moreover, just out of a mood, I wrote a builtin for 1<<n.
1<<n is sometimes used to set a variable SFR bit. The builtin supplies
a fast, loop-free implementation.

The nice thing is that all these can represented explicitly so that
gcc sees what's going on and can fold if it sees operation on a
compile time constant.

Johann

	* config/avr/avr.c (avr_init_builtins, avr_builtin_id,
	bdesc_1arg): Add support for __builtin_avr_parity8,
	__builtin_avr_popcount8, __builtin_avr_pow2.
	(adjust_insn_length): Filter out "ashlqi2_1" insn in length
	computation.
	* config/avr/avr.c (avr_cpu_cpp_builtins): New builtin defines for
	__BUILTIN_AVR_PARITY8, __BUILTIN_AVR_POPCOUNT8,	__BUILTIN_AVR_POW2.
	* config/avr/avr.md (parityqi2): New expander.
	(popcountqi2): New expander.
	(*parityqihi2.libgcc): New insn.
	(*popcountqi2.libgcc): New insn.
	(ashlqi2_1): New insn.
	* doc/extend.texi (AVR Built-in Functions): Document new builtins
	__builtin_avr_parity8, __builtin_avr_popcount8,
	__builtin_avr_pow2.

Comments

Joseph Myers June 16, 2011, 5:03 p.m. UTC | #1
On Thu, 16 Jun 2011, Georg-Johann Lay wrote:

> The recent implementation of some asm function in libgcc added
> __popcountqi2 and __parityqi2. This patch makes these functions
> available as __builtin_avr_popcount8 resp. __builtin_avr_parity8.
> 
> Moreover, just out of a mood, I wrote a builtin for 1<<n.
> 1<<n is sometimes used to set a variable SFR bit. The builtin supplies
> a fast, loop-free implementation.

I hope this whole patch is just intended as an example of how to add 
built-in functions and not as a serious proposal for an addition to GCC, 
since it doesn't make sense to add machine-specific built-in functions for 
things that can so readily be represented in generic GNU C; instead, you 
should make generic GNU C generate the right code (for example, by 
handling __builtin_parity and __builtin_popcount smartly on zero-extended 
values if QImode patterns are available - if it doesn't already do so).
Georg-Johann Lay June 17, 2011, 10:55 a.m. UTC | #2
Joseph S. Myers schrieb:
> On Thu, 16 Jun 2011, Georg-Johann Lay wrote:
> 
>> The recent implementation of some asm function in libgcc added
>> __popcountqi2 and __parityqi2. This patch makes these functions
>> available as __builtin_avr_popcount8 resp. __builtin_avr_parity8.
>>
>> Moreover, just out of a mood, I wrote a builtin for 1<<n.
>> 1<<n is sometimes used to set a variable SFR bit. The builtin supplies
>> a fast, loop-free implementation.
> 
> I hope this whole patch is just intended as an example of how to add 
> built-in functions and not as a serious proposal for an addition to GCC, 
> since it doesn't make sense to add machine-specific built-in functions for 
> things that can so readily be represented in generic GNU C; instead, you 
> should make generic GNU C generate the right code (for example, by 
> handling __builtin_parity and __builtin_popcount smartly on zero-extended 
> values if QImode patterns are available - if it doesn't already do so).
> 
> Joseph S. Myers

I don't see what's bat with the patch, it's straight forward.

As I don't intend to introduce new register classes and constraints
(hard regnos won't do any more if combine is intended to come up with
new insn, what it actually does) just for that... Dropping the patch.

Johann
Joseph Myers June 17, 2011, 12:26 p.m. UTC | #3
On Fri, 17 Jun 2011, Georg-Johann Lay wrote:

> I don't see what's bat with the patch, it's straight forward.

C is a high-level language, defined in terms of high-level semantics 
rather than machine instructions.  C code should be written where possible 
using machine-independent functionality, falling into machine-dependent 
operations only where the semantics cannot readily be represented in a 
machine-independent way; the compiler should generally be responsible for 
picking optimal instructions for the source code.

C code should write "1 << n" (or "1 << (n & 7)" if that's what it wants) 
for shifts rather than __builtin_avr_pow2.  That way it's more portable 
and more readable to general C programmers who aren't familiar with all 
the machine-specific interfaces.  Similarly, code wanting to add values 
should use the "+" operator rather than each target having its own 
__builtin_<arch>_add.  And since parity and population-count operations 
are widely present and generically useful, GNU C has generic built-in 
functions for those, so code should use __builtin_parity and 
__builtin_popcount on all machines rather than machine-specific variants; 
such machine-specific variants should not exist.

The machine-independent parts of the compiler should know about the 
equivalence of (popcount X) and (popcount (zero_extend X)), (parity X) and 
(parity (zero_extend X)) and (parity X) and (parity (sign_extend X)) if 
the sign-extension is by an even number of bits - in fact, there is 
already code in simplify_unary_operation_1 that does know this (the 
assumption that all sign-extensions are by an even number of bits is 
hardcoded).  The target should just need to describe how to code these 
operations on various modes.  If the existing code doesn't suffice to 
cause popcountqi etc. patterns to be used for the generic built-in 
functions, the appropriate fix is generic rather than adding new 
target-specific built-in functions - just as if the compiler didn't 
generate your target's addition instruction from the '+' operator, the 
right fix is not to add __builtin_<arch>_add to generate that instruction 
but rather to make '+' generate it.
diff mbox

Patch

Index: doc/extend.texi
===================================================================
--- doc/extend.texi	(Revision 175104)
+++ doc/extend.texi	(Arbeitskopie)
@@ -8203,6 +8203,21 @@  int __builtin_avr_fmuls (char, char)
 int __builtin_avr_fmulsu (char, unsigned char)
 @end smallexample
 
+The following built-in functions are a supplements to gcc's
+@code{__builtin_parity*} resp. @code{__builtin_popcount*} built-ins
+for 8-bit values and are implemented as library calls:
+@smallexample
+unsigned char __builtin_avr_parity8 (unsigned char)
+unsigned char __builtin_avr_popcount8 (unsigned char)
+@end smallexample
+
+@smallexample
+unsigned char __builtin_avr_pow2 (unsigned char)
+@end smallexample
+This built-in supplies a fast, loop-free implementation for the @var{N}-th
+power of two as of @code{1 << (N & 7)} that takes about seven ticks resp.
+seven instructions.
+
 In order to delay execution for a specific number of cycles, GCC
 implements
 @smallexample
Index: config/avr/avr.md
===================================================================
--- config/avr/avr.md	(Revision 175104)
+++ config/avr/avr.md	(Arbeitskopie)
@@ -3321,6 +3321,62 @@  (define_insn "delay_cycles_4"
   [(set_attr "length" "9")
    (set_attr "cc" "clobber")])
 
+;; __builtin_avr_parity8
+(define_expand "parityqi2"
+  [(set (reg:QI 24)
+        (match_operand:QI 1 "register_operand" ""))
+   (set (reg:HI 24)
+        (zero_extend:HI (parity:QI (reg:QI 24))))
+   (set (match_operand:QI 0 "register_operand" "")
+        (reg:QI 24))]
+  ""
+  "")
+
+(define_insn "*parityqihi2.libgcc"
+  [(set (reg:HI 24)
+        (zero_extend:HI (parity:QI (reg:QI 24))))]
+  ""
+  "%~call __parityqi2"
+  [(set_attr "type" "xcall")
+   (set_attr "cc" "clobber")])
+
+;; __builtin_avr_popcount8
+(define_expand "popcountqi2"
+  [(set (reg:QI 24)
+        (match_operand:QI 1 "register_operand" ""))
+   (set (reg:QI 24)
+        (popcount:QI (reg:QI 24)))
+   (set (match_operand:QI 0 "register_operand" "")
+        (reg:QI 24))]
+  ""
+  "")
+
+(define_insn "*popcountqi2.libgcc"
+  [(set (reg:QI 24)
+        (popcount:QI (reg:QI 24)))]
+  ""
+  "%~call __popcountqi2"
+  [(set_attr "type" "xcall")
+   (set_attr "cc" "clobber")])
+
+;; __builtin_avr_pow2
+;; $0 = (1 << ($1 & 7))
+(define_insn "ashlqi2_1"
+  [(set (match_operand:QI 0 "register_operand" "=&d")
+        (ashift:QI (const_int 1)
+                   (and:QI (match_operand:QI 1 "register_operand" "r")
+                           (const_int 7))))]
+  ""
+  "ldi %0, 1
+	sbrc %1, 1
+	ldi %0, 4
+	sbrc %1, 0
+	lsl %0
+	sbrc %1, 2
+	swap %0"
+  [(set_attr "length" "7")
+   (set_attr "cc" "clobber")])
+
 ;; CPU instructions
 
 ;; NOP taking 1 or 2 Ticks 
Index: config/avr/avr-c.c
===================================================================
--- config/avr/avr-c.c	(Revision 175104)
+++ config/avr/avr-c.c	(Arbeitskopie)
@@ -93,6 +93,9 @@  avr_cpu_cpp_builtins (struct cpp_reader
   cpp_define (pfile, "__BUILTIN_AVR_SLEEP");
   cpp_define (pfile, "__BUILTIN_AVR_SWAP");
   cpp_define (pfile, "__BUILTIN_AVR_DELAY_CYCLES");
+  cpp_define (pfile, "__BUILTIN_AVR_PARITY8");
+  cpp_define (pfile, "__BUILTIN_AVR_POPCOUNT8");
+  cpp_define (pfile, "__BUILTIN_AVR_POW2");
 
   if (AVR_HAVE_MUL)
     {
Index: config/avr/avr.c
===================================================================
--- config/avr/avr.c	(Revision 175104)
+++ config/avr/avr.c	(Arbeitskopie)
@@ -4740,9 +4740,11 @@  adjust_insn_length (rtx insn, int len)
 	      break;
 	    }
 	}
-      else if (GET_CODE (op[1]) == ASHIFT
-	  || GET_CODE (op[1]) == ASHIFTRT
-	  || GET_CODE (op[1]) == LSHIFTRT)
+      else if ((GET_CODE (op[1]) == ASHIFT
+                || GET_CODE (op[1]) == ASHIFTRT
+                || GET_CODE (op[1]) == LSHIFTRT)
+               /* Filter out "ashlqi2_1" */
+               && AND != GET_CODE (XEXP (op[1], 1)))
 	{
 	  rtx ops[10];
 	  ops[0] = op[0];
@@ -6614,7 +6616,10 @@  enum avr_builtin_id
     AVR_BUILTIN_FMUL,
     AVR_BUILTIN_FMULS,
     AVR_BUILTIN_FMULSU,
-    AVR_BUILTIN_DELAY_CYCLES
+    AVR_BUILTIN_DELAY_CYCLES,
+    AVR_BUILTIN_PARITY8,
+    AVR_BUILTIN_POPCOUNT8,
+    AVR_BUILTIN_POW2
   };
 
 #define DEF_BUILTIN(NAME, TYPE, CODE)                                   \
@@ -6665,6 +6670,12 @@  avr_init_builtins (void)
   DEF_BUILTIN ("__builtin_avr_swap", uchar_ftype_uchar, AVR_BUILTIN_SWAP);
   DEF_BUILTIN ("__builtin_avr_delay_cycles", void_ftype_ulong, 
                AVR_BUILTIN_DELAY_CYCLES);
+  DEF_BUILTIN ("__builtin_avr_parity8", uchar_ftype_uchar,
+               AVR_BUILTIN_PARITY8);
+  DEF_BUILTIN ("__builtin_avr_popcount8", uchar_ftype_uchar,
+               AVR_BUILTIN_POPCOUNT8);
+  DEF_BUILTIN ("__builtin_avr_pow2", uchar_ftype_uchar,
+               AVR_BUILTIN_POW2);
 
   if (AVR_HAVE_MUL)
     {
@@ -6694,6 +6705,9 @@  static const struct avr_builtin_descript
 bdesc_1arg[] =
   {
     { CODE_FOR_rotlqi3_4, "__builtin_avr_swap", AVR_BUILTIN_SWAP }
+    , { CODE_FOR_parityqi2, "__builtin_avr_parity8", AVR_BUILTIN_PARITY8 }
+    , { CODE_FOR_popcountqi2, "__builtin_avr_popcount8", AVR_BUILTIN_POPCOUNT8 }
+    , { CODE_FOR_ashlqi2_1, "__builtin_avr_pow2", AVR_BUILTIN_POW2 }
   };
 
 static const struct avr_builtin_description