From patchwork Fri Jun 17 18:32:57 2011 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Georg-Johann Lay X-Patchwork-Id: 100843 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Received: from sourceware.org (server1.sourceware.org [209.132.180.131]) by ozlabs.org (Postfix) with SMTP id 6CBEAB6FC9 for ; Sat, 18 Jun 2011 04:33:30 +1000 (EST) Received: (qmail 5460 invoked by alias); 17 Jun 2011 18:33:29 -0000 Received: (qmail 5449 invoked by uid 22791); 17 Jun 2011 18:33:27 -0000 X-SWARE-Spam-Status: No, hits=-1.1 required=5.0 tests=AWL, BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, RCVD_IN_DNSWL_NONE, TW_LQ, TW_RJ, TW_TQ X-Spam-Check-By: sourceware.org Received: from mo-p00-ob.rzone.de (HELO mo-p00-ob.rzone.de) (81.169.146.161) by sourceware.org (qpsmtpd/0.43rc1) with ESMTP; Fri, 17 Jun 2011 18:33:11 +0000 X-RZG-AUTH: :LXoWVUeid/7A29J/hMvvT2k715jHQaJercGObUOFkj18odoYNahU4Q== X-RZG-CLASS-ID: mo00 Received: from [192.168.0.22] (business-188-111-022-002.static.arcor-ip.net [188.111.22.2]) by post.strato.de (cohen mo30) (RZmta 25.18) with ESMTPA id n00fe3n5HIBQuP ; Fri, 17 Jun 2011 20:32:57 +0200 (MEST) Message-ID: <4DFB9DD9.9070700@gjlay.de> Date: Fri, 17 Jun 2011 20:32:57 +0200 From: Georg-Johann Lay User-Agent: Thunderbird 2.0.0.24 (X11/20100302) MIME-Version: 1.0 To: "Joseph S. Myers" CC: gcc-patches@gcc.gnu.org, Denis Chertykov , Anatoly Sokolov , "Eric B. Weddington" , Richard Henderson Subject: Re: [Patch, AVR]: QI builtins for parity, popcount, 1<< n References: <4DFA26FE.1000400@gjlay.de> <4DFB32BA.9090009@gjlay.de> In-Reply-To: X-IsSubscribed: yes Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Unsubscribe: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Delivered-To: mailing list gcc-patches@gcc.gnu.org Joseph S. Myers schrieb: > On Fri, 17 Jun 2011, Georg-Johann Lay wrote: > >> I don't see what's bat with the patch, it's straight forward. > > C is a high-level language, defined in terms of high-level > semantics rather than machine instructions. C code should be > written where possible using machine-independent functionality, > falling into machine-dependent operations only where the semantics > cannot readily be represented in a machine-independent way; the > compiler should generally be responsible for picking optimal > instructions for the source code. > > C code should write "1 << n" (or "1 << (n & 7)" if that's what it > wants) for shifts rather than __builtin_avr_pow2. That way it's > more portable and more readable to general C programmers who aren't > familiar with all the machine-specific interfaces. Similarly, code > wanting to add values should use the "+" operator rather than each > target having its own __builtin__add. And since parity and > population-count operations are widely present and generically > useful, GNU C has generic built-in functions for those, so code > should use __builtin_parity and __builtin_popcount on all machines > rather than machine-specific variants; such machine-specific > variants should not exist. > > The machine-independent parts of the compiler should know about the > equivalence of (popcount X) and (popcount (zero_extend X)), > (parity X) and (parity (zero_extend X)) and (parity X) and (parity > (sign_extend X)) if the sign-extension is by an even number of bits > - in fact, there is already code in simplify_unary_operation_1 that > does know this (the assumption that all sign-extensions are by an > even number of bits is hardcoded). The target should just need to > describe how to code these operations on various modes. If the > existing code doesn't suffice to cause popcountqi etc. patterns to > be used for the generic built-in functions, the appropriate fix is > generic rather than adding new target-specific built-in functions - > just as if the compiler didn't generate your target's addition > instruction from the '+' operator, the right fix is not to add > __builtin__add to generate that instruction but rather to > make '+' generate it. > > > Joseph S. Myers Thanks for you detailed explanations. For popcount and parity I agree, the patch is cleaner now and there are no ad-hoc builtins needed to get the QI versions produced. Trying popcountsi2 confused be a bit: Without popcountsi2 in md, optabs expands to a function as HI = popcount (SI) as I expect from the prototype/documentation. For the expander, however, destination (operand0) must be SI in order to get it to work. With HI destination the expander is ignored. For the 1 << n type of shifts the situation on a 8-bit µC is considerably different to a PC: On bare metal with RAM of 128 bytes, 2K for the program and at your option also hard real time constraints, the situation is different and sometimes each instruction/tick/byte counts. In my programs I avoid shifting by variable offsets if possible; but some guys do it. The resulting code is a loop because AVR can only shift by one bit. Together with C spec that results in a bulky 16-bit loop, see PR29560. A typical piece of code might look like this: #define SFR (*((volatile unsigned char*) 0x2b)) void set_port (unsigned char pin) { SFR |= (1 << pin); } which compiles to set_port: in r25,43-0x20 ; 7 *movqi/4 [length = 1] ldi r18,lo8(1) ; 9 *movhi/4 [length = 2] ldi r19,hi8(1) rjmp 2f ; 10 ashlhi3/1 [length = 6] 1: lsl r18 rol r19 2: dec r24 brpl 1b or r25,r18 ; 11 iorqi3/1 [length = 1] out 43-0x20,r25 ; 13 *movqi/3 [length = 1] ret ; 21 return [length = 1] Trying to hack around that and to get at least an 8-bit loop fails. Sprinkling (unsigned char) casts won't help and writing 1 << (pin & 7) will add just another instruction (andqi3). Insn combine does not come up with more than (set (reg:HI) (ashift:HI (const_int 1) (reg:QI))) and even if a suitable pattern would show up it is not nice to clutter a backend with this sorts of code like (set (reg:QI) (subreg:QI (ashift:HI (const_int 1) (and:QI (reg:QI) (const_int 7))) 0)) or whatever. There is a ashlqi3 pattern in avr BE but I never managed to produce it (except with -mint8 which changes ABI and is not really supported any more). For the addition, some users torture poor AVR with long long. I agree that it's overkill and anyone that writes that in C must know what he is doing. Anyway, there are programmers that write such code and expect avr-gcc to come up with code like subi r18, -1 sbci r19, -1 sbci r20, -1 sbci r21, -1 sbci r22, -1 sbci r23, -1 sbci r24, -1 sbci r25, -1 for x=x+1. What they actually will get is mov r14,r18 ; 215 *movqi/1 [length = 1] inc r14 ; 24 addqi3/3 [length = 1] ldi r30,lo8(1) ; 25 *movqi/2 [length = 1] cp r14,r18 ; 26 *cmpqi/2 [length = 1] brlo .L5 ; 27 branch [length = 1] ldi r30,lo8(0) ; 28 *movqi/1 [length = 1] .L5: mov r15,r30 ; 216 *movqi/1 [length = 1] add r15,r19 ; 36 addqi3/1 [length = 1] ldi r16,lo8(1) ; 37 *movqi/2 [length = 1] cp r15,r19 ; 38 *cmpqi/2 [length = 1] brlo .L7 ; 39 branch [length = 1] ldi r16,lo8(0) ; 40 *movqi/1 [length = 1] .L7: add r16,r20 ; 50 addqi3/1 [length = 1] ldi r17,lo8(1) ; 51 *movqi/2 [length = 1] cp r16,r20 ; 52 *cmpqi/2 [length = 1] brlo .L9 ; 53 branch [length = 1] ldi r17,lo8(0) ; 54 *movqi/1 [length = 1] .L9: add r17,r21 ; 64 addqi3/1 [length = 1] ldi r27,lo8(1) ; 65 *movqi/2 [length = 1] cp r17,r21 ; 66 *cmpqi/2 [length = 1] brlo .L11 ; 67 branch [length = 1] ldi r27,lo8(0) ; 68 *movqi/1 [length = 1] .L11: add r27,r22 ; 78 addqi3/1 [length = 1] ldi r26,lo8(1) ; 79 *movqi/2 [length = 1] cp r27,r22 ; 80 *cmpqi/2 [length = 1] brlo .L13 ; 81 branch [length = 1] ldi r26,lo8(0) ; 82 *movqi/1 [length = 1] .L13: add r26,r23 ; 92 addqi3/1 [length = 1] ldi r31,lo8(1) ; 93 *movqi/2 [length = 1] cp r26,r23 ; 94 *cmpqi/2 [length = 1] brlo .L15 ; 95 branch [length = 1] ldi r31,lo8(0) ; 96 *movqi/1 [length = 1] .L15: add r31,r24 ; 106 addqi3/1 [length = 1] ldi r30,lo8(1) ; 107 *movqi/2 [length = 1] cp r31,r24 ; 108 *cmpqi/2 [length = 1] brlo .L17 ; 109 branch [length = 1] ldi r30,lo8(0) ; 110 *movqi/1 [length = 1] .L17: mov r18,r14 ; 138 *movqi/1 [length = 1] mov r19,r15 ; 139 *movqi/1 [length = 1] mov r20,r16 ; 140 *movqi/1 [length = 1] mov r21,r17 ; 141 *movqi/1 [length = 1] mov r22,r27 ; 142 *movqi/1 [length = 1] mov r23,r26 ; 143 *movqi/1 [length = 1] mov r24,r31 ; 144 *movqi/1 [length = 1] add r25,r30 ; 145 addqi3/1 [length = 1] I agree with Denis that 64-bit support in avr is too complex. However, if someone sees such code and asks what can be done about it, there is no straight forward solution (movdi in avr BE is not really straight forward), and typical reactions are complete lack of understanding and comments like "free software = scrap, use another compiler" Guys that write such code would be glad if there was something like builtin_add64... So here is my question: Would it be big deal to teach optabs to expand plus:di, minus:di as libcall? Maybe even compare is feasible? Like so: * Just test if there is a corresponding insn * if not, look if there is a corresponding optab entry * if not, lower to word_mode. To come back to the original topic, here is a tentative patch for better popcount and parity: * config/avr/t-avr (LIB1ASMFUNCS): Rename _loop_ffsqi2 to _ffsqi2_nz. * confif/avr/libgcc.S: Ditto. Rename __loop_ffsqi2 to __ffsqi2_nz. (__ctzsi2, __ctzhi2): Map zero to 255. (__popcounthi2): Use r27 instead of r30. (__popcountdi2): Use r30 instead of r27. * config/avr/avr.md (parityhi2): New expander. (popcounthi2): New expander. (popcountsi2): New expander. (*parityhi2.libgcc): New insn. (*parityqihi2.libgcc): New insn. (*popcounthi2.libgcc): New insn. (*popcountsi2.libgcc): New insn. (*popcountqi2.libgcc): New insn. (*popcountqihi2.libgcc): New insn_and_split. Johann BTW, combine tests for (parity:HI (reg:QI)) it does not try (parity:HI (zero_extend:HI (reg:QI))) Index: doc/extend.texi =================================================================== --- doc/extend.texi (Revision 175104) +++ doc/extend.texi (Arbeitskopie) @@ -8203,6 +8203,21 @@ int __builtin_avr_fmuls (char, char) int __builtin_avr_fmulsu (char, unsigned char) @end smallexample +The following built-in functions are a supplements to gcc's +@code{__builtin_parity*} resp. @code{__builtin_popcount*} built-ins +for 8-bit values and are implemented as library calls: +@smallexample +unsigned char __builtin_avr_parity8 (unsigned char) +unsigned char __builtin_avr_popcount8 (unsigned char) +@end smallexample + +@smallexample +unsigned char __builtin_avr_pow2 (unsigned char) +@end smallexample +This built-in supplies a fast, loop-free implementation for the @var{N}-th +power of two as of @code{1 << (N & 7)} that takes about seven ticks resp. +seven instructions. + In order to delay execution for a specific number of cycles, GCC implements @smallexample Index: config/avr/avr.md =================================================================== --- config/avr/avr.md (Revision 175104) +++ config/avr/avr.md (Arbeitskopie) @@ -3321,6 +3321,62 @@ (define_insn "delay_cycles_4" [(set_attr "length" "9") (set_attr "cc" "clobber")]) +;; __builtin_avr_parity8 +(define_expand "parityqi2" + [(set (reg:QI 24) + (match_operand:QI 1 "register_operand" "")) + (set (reg:HI 24) + (zero_extend:HI (parity:QI (reg:QI 24)))) + (set (match_operand:QI 0 "register_operand" "") + (reg:QI 24))] + "" + "") + +(define_insn "*parityqihi2.libgcc" + [(set (reg:HI 24) + (zero_extend:HI (parity:QI (reg:QI 24))))] + "" + "%~call __parityqi2" + [(set_attr "type" "xcall") + (set_attr "cc" "clobber")]) + +;; __builtin_avr_popcount8 +(define_expand "popcountqi2" + [(set (reg:QI 24) + (match_operand:QI 1 "register_operand" "")) + (set (reg:QI 24) + (popcount:QI (reg:QI 24))) + (set (match_operand:QI 0 "register_operand" "") + (reg:QI 24))] + "" + "") + +(define_insn "*popcountqi2.libgcc" + [(set (reg:QI 24) + (popcount:QI (reg:QI 24)))] + "" + "%~call __popcountqi2" + [(set_attr "type" "xcall") + (set_attr "cc" "clobber")]) + +;; __builtin_avr_pow2 +;; $0 = (1 << ($1 & 7)) +(define_insn "ashlqi2_1" + [(set (match_operand:QI 0 "register_operand" "=&d") + (ashift:QI (const_int 1) + (and:QI (match_operand:QI 1 "register_operand" "r") + (const_int 7))))] + "" + "ldi %0, 1 + sbrc %1, 1 + ldi %0, 4 + sbrc %1, 0 + lsl %0 + sbrc %1, 2 + swap %0" + [(set_attr "length" "7") + (set_attr "cc" "clobber")]) + ;; CPU instructions ;; NOP taking 1 or 2 Ticks Index: config/avr/avr-c.c =================================================================== --- config/avr/avr-c.c (Revision 175104) +++ config/avr/avr-c.c (Arbeitskopie) @@ -93,6 +93,9 @@ avr_cpu_cpp_builtins (struct cpp_reader cpp_define (pfile, "__BUILTIN_AVR_SLEEP"); cpp_define (pfile, "__BUILTIN_AVR_SWAP"); cpp_define (pfile, "__BUILTIN_AVR_DELAY_CYCLES"); + cpp_define (pfile, "__BUILTIN_AVR_PARITY8"); + cpp_define (pfile, "__BUILTIN_AVR_POPCOUNT8"); + cpp_define (pfile, "__BUILTIN_AVR_POW2"); if (AVR_HAVE_MUL) { Index: config/avr/avr.c =================================================================== --- config/avr/avr.c (Revision 175104) +++ config/avr/avr.c (Arbeitskopie) @@ -4740,9 +4740,11 @@ adjust_insn_length (rtx insn, int len) break; } } - else if (GET_CODE (op[1]) == ASHIFT - || GET_CODE (op[1]) == ASHIFTRT - || GET_CODE (op[1]) == LSHIFTRT) + else if ((GET_CODE (op[1]) == ASHIFT + || GET_CODE (op[1]) == ASHIFTRT + || GET_CODE (op[1]) == LSHIFTRT) + /* Filter out "ashlqi2_1" */ + && AND != GET_CODE (XEXP (op[1], 1))) { rtx ops[10]; ops[0] = op[0]; @@ -6614,7 +6616,10 @@ enum avr_builtin_id AVR_BUILTIN_FMUL, AVR_BUILTIN_FMULS, AVR_BUILTIN_FMULSU, - AVR_BUILTIN_DELAY_CYCLES + AVR_BUILTIN_DELAY_CYCLES, + AVR_BUILTIN_PARITY8, + AVR_BUILTIN_POPCOUNT8, + AVR_BUILTIN_POW2 }; #define DEF_BUILTIN(NAME, TYPE, CODE) \ @@ -6665,6 +6670,12 @@ avr_init_builtins (void) DEF_BUILTIN ("__builtin_avr_swap", uchar_ftype_uchar, AVR_BUILTIN_SWAP); DEF_BUILTIN ("__builtin_avr_delay_cycles", void_ftype_ulong, AVR_BUILTIN_DELAY_CYCLES); + DEF_BUILTIN ("__builtin_avr_parity8", uchar_ftype_uchar, + AVR_BUILTIN_PARITY8); + DEF_BUILTIN ("__builtin_avr_popcount8", uchar_ftype_uchar, + AVR_BUILTIN_POPCOUNT8); + DEF_BUILTIN ("__builtin_avr_pow2", uchar_ftype_uchar, + AVR_BUILTIN_POW2); if (AVR_HAVE_MUL) { @@ -6694,6 +6705,9 @@ static const struct avr_builtin_descript bdesc_1arg[] = { { CODE_FOR_rotlqi3_4, "__builtin_avr_swap", AVR_BUILTIN_SWAP } + , { CODE_FOR_parityqi2, "__builtin_avr_parity8", AVR_BUILTIN_PARITY8 } + , { CODE_FOR_popcountqi2, "__builtin_avr_popcount8", AVR_BUILTIN_POPCOUNT8 } + , { CODE_FOR_ashlqi2_1, "__builtin_avr_pow2", AVR_BUILTIN_POW2 } }; static const struct avr_builtin_description