From: Georg-Johann Lay
Date: Wed, 27 Jul 2011 15:21:42 +0200
To: "Weddington, Eric"
CC: gcc-patches@gcc.gnu.org, Anatoly Sokolov, Denis Chertykov, Richard Henderson
Subject: Re: [Patch,AVR]: PR49687 (better widening 32-bit mul)
Message-ID: <4E3010E6.8020402@gjlay.de>
In-Reply-To: <8D64F155F1C88743BFDC71288E8E2DA8032C2159@csomb01.corp.atmel.com>

http://gcc.gnu.org/ml/gcc-patches/2011-07/msg02113.html

Weddington, Eric wrote:
> 
>> -----Original Message-----
>> From: Georg-Johann Lay
>>
>> This means that a pure __mulsi3 will have 30+30+20 = 80 bytes (+18).
>>
>> If all functions are used they occupy 116 bytes (-4), so they actually
>> save a little space if they are used all with the benefit that they also
>> can one-extend, extend 32 = 16*32 as well as 32=16*16 and work for
>> small (17 bit signed) constants.
>>
>> __umulhisi3 reads:
>>
>> DEFUN __umulhisi3
>>     mul   A0, B0
>>     movw  C0, r0
>>     mul   A1, B1
>>     movw  C2, r0
>>     mul   A0, B1
>>     add   C1, r0
>>     adc   C2, r1
>>     clr   __zero_reg__
>>     adc   C3, __zero_reg__
>>     mul   A1, B0
>>     add   C1, r0
>>     adc   C2, r1
>>     clr   __zero_reg__
>>     adc   C3, __zero_reg__
>>     ret
>> ENDF __umulhisi3
>>
>> It could be compressed to the following sequence, i.e.
>> 24 bytes instead of 30, but I think that's too much of
>> quenching the last byte out of the code:
>>
>> DEFUN __umulhisi3
>>     mul   A0, B0
>>     movw  C0, r0
>>     mul   A1, B1
>>     movw  C2, r0
>>     mul   A0, B1
>>     rcall 1f
>>     mul   A1, B0
>> 1:  add   C1, r0
>>     adc   C2, r1
>>     clr   __zero_reg__
>>     adc   C3, __zero_reg__
>>     ret
>> ENDF __umulhisi3
>>
>>
>> In that lack of real-world-code that uses 32-bit arithmetic I trust
>> my intuition that code size will decrease in general ;-)
>>
> 
> Hi Johann,
> 
> I would agree with you that it seems that overall code size will decrease
> in general.
> 
> However, I also like your creative compression in the second sequence
> above, and I think that it would be best to implement that sequence and
> try to find others like that where possible.
> 
> Remember that to AVR users, code size is *everything*. Even saving 6 bytes
> here or there has a positive effect.
> 
> I'll let Richard (or Denis if he's back from vacation) do the actual
> approval of the patch, as they are a lot more technically competent in
> this area. But I'm ok with the general tactic of the code reuse with
> looking at further ways to reduce code size like the example above.
> 
> Eric Weddington

This is a revised patch for review with the changes proposed by Eric, i.e.
__umulhisi3 now calls its own tail.

A pure __mulsi3 will now cost 30+24+20 = 74 bytes (+12).
Using all functions will cost 110 bytes (-10).

__mulsi3 missed a final ENDF __mulsi3; I added it.

The rest of the patch is just technical:

* The emission of the implicit library call is postponed from expand to
  split1, i.e. after combine but (of course) prior to reload.

* The patch covers QI->SI extensions where such extensions are done in two
  steps: first an explicit QI->HI extension expanded inline, and second the
  implicit HI->SI extension as performed by, e.g., __muluhisi3 (32 = 16 * 32).

* There is a bunch of possible HI/QI combinations.  These are handled with
  the help of code iterators; the cross product covers all 16 cases of
  QI->SI resp. HI->SI as signed resp. unsigned extension for operand 1
  resp. operand 2.

* extendhisi2 need not early-clobber the output because a HI value always
  starts in an even register.

Tested without regressions.  Ok to install?
A short C illustration of the covered cases follows the ChangeLog below.

Johann

	PR target/49687
	* config/avr/t-avr (LIB1ASMFUNCS): Remove _xmulhisi3_exit.
	Add _muluhisi3, _mulshisi3, _usmulhisi3.
	* config/avr/libgcc.S (__mulsi3): Rewrite.
	(__mulhisi3): Rewrite.
	(__umulhisi3): Rewrite.
	(__usmulhisi3): New.
	(__muluhisi3): New.
	(__mulshisi3): New.
	(__mulohisi3): New.
	(__mulqi3, __mulqihi3, __umulqihi3, __mulhi3): Use DEFUN/ENDF
	to declare.
	* config/avr/predicates.md (pseudo_register_operand): Rewrite.
	(pseudo_register_or_const_int_operand): New.
	(combine_pseudo_register_operand): New.
	(u16_operand): New.
	(s16_operand): New.
	(o16_operand): New.
	* config/avr/avr.c (avr_rtx_costs): Handle costs for mult:SI.
	* config/avr/avr.md (QIHI, QIHI2): New mode iterators.
	(any_extend, any_extend2): New code iterators.
	(extend_prefix): New code attribute.
	(mulsi3): Rewrite.  Turn insn to expander.
	(mulhisi3): Ditto.
	(umulhisi3): Ditto.
	(usmulhisi3): New expander.
	(*mulsi3): New insn-and-split.
	(mulu<mode>si3): New insn-and-split.
	(muls<mode>si3): New insn-and-split.
	(mulohisi3): New insn-and-split.
	(*uumulqihisi3, *uumulhiqisi3, *uumulhihisi3, *uumulqiqisi3,
	*usmulqihisi3, *usmulhiqisi3, *usmulhihisi3, *usmulqiqisi3,
	*sumulqihisi3, *sumulhiqisi3, *sumulhihisi3, *sumulqiqisi3,
	*ssmulqihisi3, *ssmulhiqisi3, *ssmulhihisi3, *ssmulqiqisi3): New
	insn-and-split.
	(*mulsi3_call): Rewrite.
	(*mulhisi3_call): Rewrite.
	(*umulhisi3_call): Rewrite.
	(*usmulhisi3_call): New insn.
	(*muluhisi3_call): New insn.
	(*mulshisi3_call): New insn.
	(*mulohisi3_call): New insn.
	(extendqihi2): Use combine_pseudo_register_operand as predicate
	for operand 1.
	(extendqisi2): Ditto.
	(zero_extendqihi2): Ditto.
	(zero_extendqisi2): Ditto.
	(zero_extendhisi2): Ditto.
	(extendhisi2): Ditto.  Don't early-clobber operand 0.
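For illustration only (this note and the snippet are not part of the patch):
on AVR, where int is 16 bits and long is 32 bits, these are the kinds of C
expressions I would expect to end up in the new patterns and library
functions.  The function names are made up and the exact mapping depends on
what combine produces.

    /* 32 = 16 * 16, all operands in registers.  */
    long          mul_ss (int a, int b)           { return (long) a * b;          }  /* __mulhisi3   */
    unsigned long mul_uu (unsigned a, unsigned b) { return (unsigned long) a * b; }  /* __umulhisi3  */
    long          mul_su (int a, unsigned b)      { return (long) a * b;          }  /* __usmulhisi3 */

    /* 32 = 16 * 32 and 16-bit constants.  */
    long mul_u32  (unsigned a, long b) { return a * b;       }  /* __muluhisi3               */
    long mul_k16  (long a)             { return a * 60000L;  }  /* u16 operand, __muluhisi3  */
    long mul_ko16 (long a)             { return a * -60000L; }  /* o16 operand, __mulohisi3  */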
Index: config/avr/predicates.md
===================================================================
--- config/avr/predicates.md    (revision 176818)
+++ config/avr/predicates.md    (working copy)
@@ -155,10 +155,34 @@ (define_predicate "call_insn_operand"
        (ior (match_test "register_operand (XEXP (op, 0), mode)")
             (match_test "CONSTANT_ADDRESS_P (XEXP (op, 0))"))))
 
+;; For some insns we must ensure that no hard register is inserted
+;; into their operands because the insns are split and the split
+;; involves hard registers.  An example is divmod insns that are
+;; split into insns that represent implicit library calls.
+
 ;; True for register that is pseudo register.
 (define_predicate "pseudo_register_operand"
-  (and (match_code "reg")
-       (match_test "!HARD_REGISTER_P (op)")))
+  (and (match_operand 0 "register_operand")
+       (not (and (match_code "reg")
+                 (match_test "HARD_REGISTER_P (op)")))))
+
+;; True for operand that is pseudo register or CONST_INT.
+(define_predicate "pseudo_register_or_const_int_operand"
+  (ior (match_operand 0 "const_int_operand")
+       (match_operand 0 "pseudo_register_operand")))
+
+;; We keep combiner from inserting hard registers into the input of sign- and
+;; zero-extends.  A hard register in the input operand is not wanted because
+;; 32-bit multiply patterns clobber some hard registers and extends with a
+;; hard register that overlaps these clobbers won't combine to a widening
+;; multiplication.  There is no need for combine to propagate or insert
+;; hard registers, register allocation can do it just as well.
+
+;; True for operand that is pseudo register at combine time.
+(define_predicate "combine_pseudo_register_operand"
+  (ior (match_operand 0 "pseudo_register_operand")
+       (and (match_operand 0 "register_operand")
+            (match_test "reload_completed || reload_in_progress"))))
 
 ;; Return true if OP is a constant integer that is either
 ;; 8 or 16 or 24.
@@ -189,3 +213,18 @@ (define_predicate "s9_operand"
 (define_predicate "register_or_s9_operand"
   (ior (match_operand 0 "register_operand")
        (match_operand 0 "s9_operand")))
+
+;; Unsigned CONST_INT that fits in 16 bits, i.e. 0..65535.
+(define_predicate "u16_operand"
+  (and (match_code "const_int")
+       (match_test "IN_RANGE (INTVAL (op), 0, (1<<16)-1)")))
+
+;; Signed CONST_INT that fits in 16 bits, i.e. -32768..32767.
+(define_predicate "s16_operand"
+  (and (match_code "const_int")
+       (match_test "IN_RANGE (INTVAL (op), -(1<<15), (1<<15)-1)")))
+
+;; One-extended CONST_INT that fits in 16 bits, i.e. -65536..-1.
+(define_predicate "o16_operand"
+  (and (match_code "const_int")
+       (match_test "IN_RANGE (INTVAL (op), -(1<<16), -1)")))
Index: config/avr/libgcc.S
===================================================================
--- config/avr/libgcc.S (revision 176818)
+++ config/avr/libgcc.S (working copy)
@@ -72,10 +72,11 @@ see the files COPYING3 and COPYING.RUNTI
 	.endm
 
+;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
 /* Note: mulqi3, mulhi3 are open-coded on the enhanced core.
*/ #if !defined (__AVR_HAVE_MUL__) /******************************************************* - Multiplication 8 x 8 + Multiplication 8 x 8 without MUL *******************************************************/ #if defined (L_mulqi3) @@ -83,9 +84,7 @@ see the files COPYING3 and COPYING.RUNTI #define r_arg1 r24 /* multiplier */ #define r_res __tmp_reg__ /* result */ - .global __mulqi3 - .func __mulqi3 -__mulqi3: +DEFUN __mulqi3 clr r_res ; clear result __mulqi3_loop: sbrc r_arg1,0 @@ -97,18 +96,16 @@ __mulqi3_loop: __mulqi3_exit: mov r_arg1,r_res ; result to return register ret +ENDF __mulqi3 #undef r_arg2 #undef r_arg1 #undef r_res -.endfunc #endif /* defined (L_mulqi3) */ #if defined (L_mulqihi3) - .global __mulqihi3 - .func __mulqihi3 -__mulqihi3: +DEFUN __mulqihi3 clr r25 sbrc r24, 7 dec r25 @@ -116,21 +113,19 @@ __mulqihi3: sbrc r22, 7 dec r22 rjmp __mulhi3 - .endfunc +ENDF __mulqihi3: #endif /* defined (L_mulqihi3) */ #if defined (L_umulqihi3) - .global __umulqihi3 - .func __umulqihi3 -__umulqihi3: +DEFUN __umulqihi3 clr r25 clr r23 rjmp __mulhi3 - .endfunc +ENDF __umulqihi3 #endif /* defined (L_umulqihi3) */ /******************************************************* - Multiplication 16 x 16 + Multiplication 16 x 16 without MUL *******************************************************/ #if defined (L_mulhi3) #define r_arg1L r24 /* multiplier Low */ @@ -140,9 +135,7 @@ __umulqihi3: #define r_resL __tmp_reg__ /* result Low */ #define r_resH r21 /* result High */ - .global __mulhi3 - .func __mulhi3 -__mulhi3: +DEFUN __mulhi3 clr r_resH ; clear result clr r_resL ; clear result __mulhi3_loop: @@ -166,6 +159,7 @@ __mulhi3_exit: mov r_arg1H,r_resH ; result to return register mov r_arg1L,r_resL ret +ENDF __mulhi3 #undef r_arg1L #undef r_arg1H @@ -174,168 +168,51 @@ __mulhi3_exit: #undef r_resL #undef r_resH -.endfunc #endif /* defined (L_mulhi3) */ -#endif /* !defined (__AVR_HAVE_MUL__) */ /******************************************************* - Widening Multiplication 32 = 16 x 16 + Widening Multiplication 32 = 16 x 16 without MUL *******************************************************/ - + #if defined (L_mulhisi3) DEFUN __mulhisi3 -#if defined (__AVR_HAVE_MUL__) - -;; r25:r22 = r19:r18 * r21:r20 - -#define A0 18 -#define B0 20 -#define C0 22 - -#define A1 A0+1 -#define B1 B0+1 -#define C1 C0+1 -#define C2 C0+2 -#define C3 C0+3 - - ; C = (signed)A1 * (signed)B1 - muls A1, B1 - movw C2, R0 - - ; C += A0 * B0 - mul A0, B0 - movw C0, R0 - - ; C += (signed)A1 * B0 - mulsu A1, B0 - sbci C3, 0 - add C1, R0 - adc C2, R1 - clr __zero_reg__ - adc C3, __zero_reg__ - - ; C += (signed)B1 * A0 - mulsu B1, A0 - sbci C3, 0 - XJMP __xmulhisi3_exit - -#undef A0 -#undef A1 -#undef B0 -#undef B1 -#undef C0 -#undef C1 -#undef C2 -#undef C3 - -#else /* !__AVR_HAVE_MUL__ */ ;;; FIXME: This is dead code (noone calls it) - mov_l r18, r24 - mov_h r19, r25 - clr r24 - sbrc r23, 7 - dec r24 - mov r25, r24 - clr r20 - sbrc r19, 7 - dec r20 - mov r21, r20 - XJMP __mulsi3 -#endif /* __AVR_HAVE_MUL__ */ + mov_l r18, r24 + mov_h r19, r25 + clr r24 + sbrc r23, 7 + dec r24 + mov r25, r24 + clr r20 + sbrc r19, 7 + dec r20 + mov r21, r20 + XJMP __mulsi3 ENDF __mulhisi3 #endif /* defined (L_mulhisi3) */ #if defined (L_umulhisi3) DEFUN __umulhisi3 -#if defined (__AVR_HAVE_MUL__) - -;; r25:r22 = r19:r18 * r21:r20 - -#define A0 18 -#define B0 20 -#define C0 22 - -#define A1 A0+1 -#define B1 B0+1 -#define C1 C0+1 -#define C2 C0+2 -#define C3 C0+3 - - ; C = A1 * B1 - mul A1, B1 - movw C2, R0 - - ; C += A0 * B0 - mul A0, B0 - movw C0, R0 - 
- ; C += A1 * B0 - mul A1, B0 - add C1, R0 - adc C2, R1 - clr __zero_reg__ - adc C3, __zero_reg__ - - ; C += B1 * A0 - mul B1, A0 - XJMP __xmulhisi3_exit - -#undef A0 -#undef A1 -#undef B0 -#undef B1 -#undef C0 -#undef C1 -#undef C2 -#undef C3 - -#else /* !__AVR_HAVE_MUL__ */ ;;; FIXME: This is dead code (noone calls it) - mov_l r18, r24 - mov_h r19, r25 - clr r24 - clr r25 - clr r20 - clr r21 - XJMP __mulsi3 -#endif /* __AVR_HAVE_MUL__ */ + mov_l r18, r24 + mov_h r19, r25 + clr r24 + clr r25 + mov_l r20, r24 + mov_h r21, r25 + XJMP __mulsi3 ENDF __umulhisi3 #endif /* defined (L_umulhisi3) */ -#if defined (L_xmulhisi3_exit) - -;;; Helper for __mulhisi3 resp. __umulhisi3. - -#define C0 22 -#define C1 C0+1 -#define C2 C0+2 -#define C3 C0+3 - -DEFUN __xmulhisi3_exit - add C1, R0 - adc C2, R1 - clr __zero_reg__ - adc C3, __zero_reg__ - ret -ENDF __xmulhisi3_exit - -#undef C0 -#undef C1 -#undef C2 -#undef C3 - -#endif /* defined (L_xmulhisi3_exit) */ - #if defined (L_mulsi3) /******************************************************* - Multiplication 32 x 32 + Multiplication 32 x 32 without MUL *******************************************************/ #define r_arg1L r22 /* multiplier Low */ #define r_arg1H r23 #define r_arg1HL r24 #define r_arg1HH r25 /* multiplier High */ - #define r_arg2L r18 /* multiplicand Low */ #define r_arg2H r19 #define r_arg2HL r20 @@ -346,43 +223,7 @@ ENDF __xmulhisi3_exit #define r_resHL r30 #define r_resHH r31 /* result High */ - - .global __mulsi3 - .func __mulsi3 -__mulsi3: -#if defined (__AVR_HAVE_MUL__) - mul r_arg1L, r_arg2L - movw r_resL, r0 - mul r_arg1H, r_arg2H - movw r_resHL, r0 - mul r_arg1HL, r_arg2L - add r_resHL, r0 - adc r_resHH, r1 - mul r_arg1L, r_arg2HL - add r_resHL, r0 - adc r_resHH, r1 - mul r_arg1HH, r_arg2L - add r_resHH, r0 - mul r_arg1HL, r_arg2H - add r_resHH, r0 - mul r_arg1H, r_arg2HL - add r_resHH, r0 - mul r_arg1L, r_arg2HH - add r_resHH, r0 - clr r_arg1HH ; use instead of __zero_reg__ to add carry - mul r_arg1H, r_arg2L - add r_resH, r0 - adc r_resHL, r1 - adc r_resHH, r_arg1HH ; add carry - mul r_arg1L, r_arg2H - add r_resH, r0 - adc r_resHL, r1 - adc r_resHH, r_arg1HH ; add carry - movw r_arg1L, r_resL - movw r_arg1HL, r_resHL - clr r1 ; __zero_reg__ clobbered by "mul" - ret -#else +DEFUN __mulsi3 clr r_resHH ; clear result clr r_resHL ; clear result clr r_resH ; clear result @@ -414,13 +255,13 @@ __mulsi3_exit: mov_h r_arg1H,r_resH mov_l r_arg1L,r_resL ret -#endif /* defined (__AVR_HAVE_MUL__) */ +ENDF __mulsi3 + #undef r_arg1L #undef r_arg1H #undef r_arg1HL #undef r_arg1HH - #undef r_arg2L #undef r_arg2H #undef r_arg2HL @@ -431,9 +272,181 @@ __mulsi3_exit: #undef r_resHL #undef r_resHH -.endfunc #endif /* defined (L_mulsi3) */ + +#endif /* !defined (__AVR_HAVE_MUL__) */ +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; + +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +#if defined (__AVR_HAVE_MUL__) +#define A0 26 +#define B0 18 +#define C0 22 + +#define A1 A0+1 + +#define B1 B0+1 +#define B2 B0+2 +#define B3 B0+3 + +#define C1 C0+1 +#define C2 C0+2 +#define C3 C0+3 + +/******************************************************* + Widening Multiplication 32 = 16 x 16 +*******************************************************/ + +#if defined (L_mulhisi3) +;;; R25:R22 = (signed long) R27:R26 * (signed long) R19:R18 +;;; C3:C0 = (signed long) A1:A0 * (signed long) B1:B0 +;;; Clobbers: __tmp_reg__ +DEFUN __mulhisi3 + XCALL __umulhisi3 + ;; Sign-extend B + tst B1 + brpl 1f + sub C2, A0 + sbc C3, A1 +1: ;; Sign-extend A + XJMP 
__usmulhisi3_tail +ENDF __mulhisi3 +#endif /* L_mulhisi3 */ + +#if defined (L_usmulhisi3) +;;; R25:R22 = (signed long) R27:R26 * (unsigned long) R19:R18 +;;; C3:C0 = (signed long) A1:A0 * (unsigned long) B1:B0 +;;; Clobbers: __tmp_reg__ +DEFUN __usmulhisi3 + XCALL __umulhisi3 + ;; FALLTHRU +ENDF __usmulhisi3 + +DEFUN __usmulhisi3_tail + ;; Sign-extend A + sbrs A1, 7 + ret + sub C2, B0 + sbc C3, B1 + ret +ENDF __usmulhisi3_tail +#endif /* L_usmulhisi3 */ + +#if defined (L_umulhisi3) +;;; R25:R22 = (unsigned long) R27:R26 * (unsigned long) R19:R18 +;;; C3:C0 = (unsigned long) A1:A0 * (unsigned long) B1:B0 +;;; Clobbers: __tmp_reg__ +DEFUN __umulhisi3 + mul A0, B0 + movw C0, r0 + mul A1, B1 + movw C2, r0 + mul A0, B1 + rcall 1f + mul A1, B0 +1: add C1, r0 + adc C2, r1 + clr __zero_reg__ + adc C3, __zero_reg__ + ret +ENDF __umulhisi3 +#endif /* L_umulhisi3 */ + +/******************************************************* + Widening Multiplication 32 = 16 x 32 +*******************************************************/ + +#if defined (L_mulshisi3) +;;; R25:R22 = (signed long) R27:R26 * R21:R18 +;;; (C3:C0) = (signed long) A1:A0 * B3:B0 +;;; Clobbers: __tmp_reg__ +DEFUN __mulshisi3 +#ifdef __AVR_HAVE_JMP_CALL__ + ;; Some cores have problem skipping 2-word instruction + tst A1 + brmi __mulohisi3 +#else + sbrs A1, 7 +#endif /* __AVR_HAVE_JMP_CALL__ */ + XJMP __muluhisi3 + ;; FALLTHRU +ENDF __mulshisi3 + +;;; R25:R22 = (one-extended long) R27:R26 * R21:R18 +;;; (C3:C0) = (one-extended long) A1:A0 * B3:B0 +;;; Clobbers: __tmp_reg__ +DEFUN __mulohisi3 + XCALL __muluhisi3 + ;; One-extend R27:R26 (A1:A0) + sub C2, B0 + sbc C3, B1 + ret +ENDF __mulohisi3 +#endif /* L_mulshisi3 */ + +#if defined (L_muluhisi3) +;;; R25:R22 = (unsigned long) R27:R26 * R21:R18 +;;; (C3:C0) = (unsigned long) A1:A0 * B3:B0 +;;; Clobbers: __tmp_reg__ +DEFUN __muluhisi3 + XCALL __umulhisi3 + mul A0, B3 + add C3, r0 + mul A1, B2 + add C3, r0 + mul A0, B2 + add C2, r0 + adc C3, r1 + clr __zero_reg__ + ret +ENDF __muluhisi3 +#endif /* L_muluhisi3 */ + +/******************************************************* + Multiplication 32 x 32 +*******************************************************/ + +#if defined (L_mulsi3) +;;; R25:R22 = R25:R22 * R21:R18 +;;; (C3:C0) = C3:C0 * B3:B0 +;;; Clobbers: R26, R27, __tmp_reg__ +DEFUN __mulsi3 + movw A0, C0 + push C2 + push C3 + XCALL __muluhisi3 + pop A1 + pop A0 + ;; A1:A0 now contains the high word of A + mul A0, B0 + add C2, r0 + adc C3, r1 + mul A0, B1 + add C3, r0 + mul A1, B0 + add C3, r0 + clr __zero_reg__ + ret +ENDF __mulsi3 +#endif /* L_mulsi3 */ + +#undef A0 +#undef A1 + +#undef B0 +#undef B1 +#undef B2 +#undef B3 + +#undef C0 +#undef C1 +#undef C2 +#undef C3 + +#endif /* __AVR_HAVE_MUL__ */ +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; + /******************************************************* Division 8 / 8 => (result + remainder) *******************************************************/ Index: config/avr/avr.md =================================================================== --- config/avr/avr.md (revision 176818) +++ config/avr/avr.md (working copy) @@ -126,12 +126,25 @@ (define_attr "length" "" (const_int 2))] (const_int 2))) -;; Define mode iterator +;; Define mode iterators +(define_mode_iterator QIHI [(QI "") (HI "")]) +(define_mode_iterator QIHI2 [(QI "") (HI "")]) (define_mode_iterator QISI [(QI "") (HI "") (SI "")]) (define_mode_iterator QIDI [(QI "") (HI "") (SI "") (DI "")]) (define_mode_iterator HIDI [(HI "") (SI "") (DI "")]) (define_mode_iterator HISI [(HI "") 
 (SI "")])
 
+;; Define code iterators
+;; Define two incarnations so that we can build the cross product.
+(define_code_iterator any_extend [sign_extend zero_extend])
+(define_code_iterator any_extend2 [sign_extend zero_extend])
+
+;; Define code attributes
+(define_code_attr extend_prefix
+  [(sign_extend "s")
+   (zero_extend "u")])
+
+
 ;;========================================================================
 ;; The following is used by nonlocal_goto and setjmp.
 ;; The receiver pattern will create no instructions since internally
@@ -1349,69 +1362,310 @@ (define_insn "*mulhi3_call"
 ;; Operand 2 (reg:SI 18) not clobbered on the enhanced core.
 ;; All call-used registers clobbered otherwise - normal library call.
 
+;; To support widening multiplication with constant we postpone
+;; expanding to the implicit library call until post combine and
+;; prior to register allocation.  Clobber all hard registers that
+;; might be used by the (widening) multiply until it is split and
+;; its final register footprint is worked out.
+
 (define_expand "mulsi3"
-  [(set (reg:SI 22) (match_operand:SI 1 "register_operand" ""))
-   (set (reg:SI 18) (match_operand:SI 2 "register_operand" ""))
-   (parallel [(set (reg:SI 22) (mult:SI (reg:SI 22) (reg:SI 18)))
-              (clobber (reg:HI 26))
-              (clobber (reg:HI 30))])
-   (set (match_operand:SI 0 "register_operand" "") (reg:SI 22))]
+  [(parallel [(set (match_operand:SI 0 "register_operand" "")
+                   (mult:SI (match_operand:SI 1 "register_operand" "")
+                            (match_operand:SI 2 "nonmemory_operand" "")))
+              (clobber (reg:DI 18))])]
   "AVR_HAVE_MUL"
-  "")
+  {
+    if (u16_operand (operands[2], SImode))
+      {
+        operands[2] = force_reg (HImode, gen_int_mode (INTVAL (operands[2]), HImode));
+        emit_insn (gen_muluhisi3 (operands[0], operands[2], operands[1]));
+        DONE;
+      }
 
-(define_insn "*mulsi3_call"
-  [(set (reg:SI 22) (mult:SI (reg:SI 22) (reg:SI 18)))
-   (clobber (reg:HI 26))
-   (clobber (reg:HI 30))]
-  "AVR_HAVE_MUL"
-  "%~call __mulsi3"
-  [(set_attr "type" "xcall")
-   (set_attr "cc" "clobber")])
+    if (o16_operand (operands[2], SImode))
+      {
+        operands[2] = force_reg (HImode, gen_int_mode (INTVAL (operands[2]), HImode));
+        emit_insn (gen_mulohisi3 (operands[0], operands[2], operands[1]));
+        DONE;
+      }
+  })
 
-(define_expand "mulhisi3"
-  [(set (reg:HI 18)
-        (match_operand:HI 1 "register_operand" ""))
-   (set (reg:HI 20)
-        (match_operand:HI 2 "register_operand" ""))
+(define_insn_and_split "*mulsi3"
+  [(set (match_operand:SI 0 "pseudo_register_operand" "=r")
+        (mult:SI (match_operand:SI 1 "pseudo_register_operand" "r")
+                 (match_operand:SI 2 "pseudo_register_or_const_int_operand" "rn")))
+   (clobber (reg:DI 18))]
+  "AVR_HAVE_MUL && !reload_completed"
+  { gcc_unreachable(); }
+  "&& 1"
+  [(set (reg:SI 18)
+        (match_dup 1))
   (set (reg:SI 22)
-        (mult:SI (sign_extend:SI (reg:HI 18))
-                 (sign_extend:SI (reg:HI 20))))
-   (set (match_operand:SI 0 "register_operand" "")
+        (match_dup 2))
+   (parallel [(set (reg:SI 22)
+                   (mult:SI (reg:SI 22)
+                            (reg:SI 18)))
+              (clobber (reg:HI 26))])
+   (set (match_dup 0)
+        (reg:SI 22))]
+  {
+    if (u16_operand (operands[2], SImode))
+      {
+        operands[2] = force_reg (HImode, gen_int_mode (INTVAL (operands[2]), HImode));
+        emit_insn (gen_muluhisi3 (operands[0], operands[2], operands[1]));
+        DONE;
+      }
+
+    if (o16_operand (operands[2], SImode))
+      {
+        operands[2] = force_reg (HImode, gen_int_mode (INTVAL (operands[2]), HImode));
+        emit_insn (gen_mulohisi3 (operands[0], operands[2], operands[1]));
+        DONE;
+      }
+  })
+
+;; "muluqisi3"
+;; "muluhisi3"
+(define_insn_and_split "mulu<mode>si3"
+  [(set (match_operand:SI 0 "pseudo_register_operand" "=r")
+        (mult:SI (zero_extend:SI (match_operand:QIHI 1 "pseudo_register_operand" "r"))
+                 (match_operand:SI 2 "pseudo_register_or_const_int_operand" "rn")))
+   (clobber (reg:DI 18))]
+  "AVR_HAVE_MUL && !reload_completed"
+  { gcc_unreachable(); }
+  "&& 1"
+  [(set (reg:HI 26)
+        (match_dup 1))
+   (set (reg:SI 18)
+        (match_dup 2))
+   (set (reg:SI 22)
+        (mult:SI (zero_extend:SI (reg:HI 26))
+                 (reg:SI 18)))
+   (set (match_dup 0)
+        (reg:SI 22))]
+  {
+    /* Do the QI -> HI extension explicitly before the multiplication.  */
+    /* Do the HI -> SI extension implicitly and after the multiplication.  */
+
+    if (QImode == <MODE>mode)
+      operands[1] = gen_rtx_ZERO_EXTEND (HImode, operands[1]);
+
+    if (u16_operand (operands[2], SImode))
+      {
+        operands[1] = force_reg (HImode, operands[1]);
+        operands[2] = force_reg (HImode, gen_int_mode (INTVAL (operands[2]), HImode));
+        emit_insn (gen_umulhisi3 (operands[0], operands[1], operands[2]));
+        DONE;
+      }
+  })
+
+;; "mulsqisi3"
+;; "mulshisi3"
+(define_insn_and_split "muls<mode>si3"
+  [(set (match_operand:SI 0 "pseudo_register_operand" "=r")
+        (mult:SI (sign_extend:SI (match_operand:QIHI 1 "pseudo_register_operand" "r"))
+                 (match_operand:SI 2 "pseudo_register_or_const_int_operand" "rn")))
+   (clobber (reg:DI 18))]
+  "AVR_HAVE_MUL && !reload_completed"
+  { gcc_unreachable(); }
+  "&& 1"
+  [(set (reg:HI 26)
+        (match_dup 1))
+   (set (reg:SI 18)
+        (match_dup 2))
+   (set (reg:SI 22)
+        (mult:SI (sign_extend:SI (reg:HI 26))
+                 (reg:SI 18)))
+   (set (match_dup 0)
+        (reg:SI 22))]
+  {
+    /* Do the QI -> HI extension explicitly before the multiplication.  */
+    /* Do the HI -> SI extension implicitly and after the multiplication.  */
+
+    if (QImode == <MODE>mode)
+      operands[1] = gen_rtx_SIGN_EXTEND (HImode, operands[1]);
+
+    if (u16_operand (operands[2], SImode)
+        || s16_operand (operands[2], SImode))
+      {
+        rtx xop2 = force_reg (HImode, gen_int_mode (INTVAL (operands[2]), HImode));
+
+        operands[1] = force_reg (HImode, operands[1]);
+
+        if (u16_operand (operands[2], SImode))
+          emit_insn (gen_usmulhisi3 (operands[0], xop2, operands[1]));
+        else
+          emit_insn (gen_mulhisi3 (operands[0], operands[1], xop2));
+
+        DONE;
+      }
+  })
+
+;; One-extend operand 1
+
+(define_insn_and_split "mulohisi3"
+  [(set (match_operand:SI 0 "pseudo_register_operand" "=r")
+        (mult:SI (not:SI (zero_extend:SI
+                          (not:HI (match_operand:HI 1 "pseudo_register_operand" "r"))))
+                 (match_operand:SI 2 "pseudo_register_or_const_int_operand" "rn")))
+   (clobber (reg:DI 18))]
+  "AVR_HAVE_MUL && !reload_completed"
+  { gcc_unreachable(); }
+  "&& 1"
+  [(set (reg:HI 26)
+        (match_dup 1))
+   (set (reg:SI 18)
+        (match_dup 2))
+   (set (reg:SI 22)
+        (mult:SI (not:SI (zero_extend:SI (not:HI (reg:HI 26))))
+                 (reg:SI 18)))
+   (set (match_dup 0) (reg:SI 22))]
+  "")
+
+(define_expand "mulhisi3"
+  [(parallel [(set (match_operand:SI 0 "register_operand" "")
+                   (mult:SI (sign_extend:SI (match_operand:HI 1 "register_operand" ""))
+                            (sign_extend:SI (match_operand:HI 2 "register_operand" ""))))
+              (clobber (reg:DI 18))])]
   "AVR_HAVE_MUL"
   "")
 
 (define_expand "umulhisi3"
+  [(parallel [(set (match_operand:SI 0 "register_operand" "")
+                   (mult:SI (zero_extend:SI (match_operand:HI 1 "register_operand" ""))
+                            (zero_extend:SI (match_operand:HI 2 "register_operand" ""))))
+              (clobber (reg:DI 18))])]
+  "AVR_HAVE_MUL"
+  "")
+
+(define_expand "usmulhisi3"
+  [(parallel [(set (match_operand:SI 0 "register_operand" "")
+                   (mult:SI (zero_extend:SI (match_operand:HI 1 "register_operand" ""))
+                            (sign_extend:SI (match_operand:HI 2 "register_operand" ""))))
+              (clobber (reg:DI 18))])]
+  "AVR_HAVE_MUL"
+  "")
+
+;; "*uumulqihisi3" "*uumulhiqisi3" "*uumulhihisi3" "*uumulqiqisi3"
+;; "*usmulqihisi3" "*usmulhiqisi3" "*usmulhihisi3" "*usmulqiqisi3"
+;; "*sumulqihisi3" "*sumulhiqisi3" "*sumulhihisi3" "*sumulqiqisi3"
+;; "*ssmulqihisi3" "*ssmulhiqisi3" "*ssmulhihisi3" "*ssmulqiqisi3"
+(define_insn_and_split
+ "*<any_extend:extend_prefix><any_extend2:extend_prefix>mul<QIHI:mode><QIHI2:mode>si3"
+  [(set (match_operand:SI 0 "pseudo_register_operand" "=r")
+        (mult:SI (any_extend:SI (match_operand:QIHI 1 "pseudo_register_operand" "r"))
+                 (any_extend2:SI (match_operand:QIHI2 2 "pseudo_register_operand" "r"))))
+   (clobber (reg:DI 18))]
+  "AVR_HAVE_MUL && !reload_completed"
+  { gcc_unreachable(); }
+  "&& 1"
   [(set (reg:HI 18)
-        (match_operand:HI 1 "register_operand" ""))
-   (set (reg:HI 20)
-        (match_operand:HI 2 "register_operand" ""))
-   (set (reg:SI 22)
-        (mult:SI (zero_extend:SI (reg:HI 18))
-                 (zero_extend:SI (reg:HI 20))))
-   (set (match_operand:SI 0 "register_operand" "")
+        (match_dup 1))
+   (set (reg:HI 26)
+        (match_dup 2))
+   (set (reg:SI 22)
+        (mult:SI (match_dup 3)
+                 (match_dup 4)))
+   (set (match_dup 0)
         (reg:SI 22))]
+  {
+    rtx xop1 = operands[1];
+    rtx xop2 = operands[2];
+
+    /* Do the QI -> HI extension explicitly before the multiplication.  */
+    /* Do the HI -> SI extension implicitly and after the multiplication.  */
+
+    if (QImode == <QIHI:MODE>mode)
+      xop1 = gen_rtx_fmt_e (<any_extend:CODE>, HImode, xop1);
+
+    if (QImode == <QIHI2:MODE>mode)
+      xop2 = gen_rtx_fmt_e (<any_extend2:CODE>, HImode, xop2);
+
+    if (<any_extend:CODE> == <any_extend2:CODE>
+        || <any_extend:CODE> == ZERO_EXTEND)
+      {
+        operands[1] = xop1;
+        operands[2] = xop2;
+        operands[3] = gen_rtx_fmt_e (<any_extend:CODE>, SImode, gen_rtx_REG (HImode, 18));
+        operands[4] = gen_rtx_fmt_e (<any_extend2:CODE>, SImode, gen_rtx_REG (HImode, 26));
+      }
+    else
+      {
+        /* <any_extend:CODE>  = SIGN_EXTEND */
+        /* <any_extend2:CODE> = ZERO_EXTEND */
+
+        operands[1] = xop2;
+        operands[2] = xop1;
+        operands[3] = gen_rtx_ZERO_EXTEND (SImode, gen_rtx_REG (HImode, 18));
+        operands[4] = gen_rtx_SIGN_EXTEND (SImode, gen_rtx_REG (HImode, 26));
+      }
+  })
+
+(define_insn "*mulsi3_call"
+  [(set (reg:SI 22)
+        (mult:SI (reg:SI 22)
+                 (reg:SI 18)))
+   (clobber (reg:HI 26))]
   "AVR_HAVE_MUL"
-  "")
+  "%~call __mulsi3"
+  [(set_attr "type" "xcall")
+   (set_attr "cc" "clobber")])
 
 (define_insn "*mulhisi3_call"
-  [(set (reg:SI 22)
+  [(set (reg:SI 22)
         (mult:SI (sign_extend:SI (reg:HI 18))
-                 (sign_extend:SI (reg:HI 20))))]
+                 (sign_extend:SI (reg:HI 26))))]
   "AVR_HAVE_MUL"
   "%~call __mulhisi3"
   [(set_attr "type" "xcall")
    (set_attr "cc" "clobber")])
 
 (define_insn "*umulhisi3_call"
-  [(set (reg:SI 22)
+  [(set (reg:SI 22)
         (mult:SI (zero_extend:SI (reg:HI 18))
-                 (zero_extend:SI (reg:HI 20))))]
+                 (zero_extend:SI (reg:HI 26))))]
   "AVR_HAVE_MUL"
   "%~call __umulhisi3"
   [(set_attr "type" "xcall")
    (set_attr "cc" "clobber")])
 
+(define_insn "*usmulhisi3_call"
+  [(set (reg:SI 22)
+        (mult:SI (zero_extend:SI (reg:HI 18))
+                 (sign_extend:SI (reg:HI 26))))]
+  "AVR_HAVE_MUL"
+  "%~call __usmulhisi3"
+  [(set_attr "type" "xcall")
+   (set_attr "cc" "clobber")])
+
+(define_insn "*muluhisi3_call"
+  [(set (reg:SI 22)
+        (mult:SI (zero_extend:SI (reg:HI 26))
+                 (reg:SI 18)))]
+  "AVR_HAVE_MUL"
+  "%~call __muluhisi3"
+  [(set_attr "type" "xcall")
+   (set_attr "cc" "clobber")])
+
+(define_insn "*mulshisi3_call"
+  [(set (reg:SI 22)
+        (mult:SI (sign_extend:SI (reg:HI 26))
+                 (reg:SI 18)))]
+  "AVR_HAVE_MUL"
+  "%~call __mulshisi3"
+  [(set_attr "type" "xcall")
+   (set_attr "cc" "clobber")])
+
+(define_insn "*mulohisi3_call"
+  [(set (reg:SI 22)
+        (mult:SI (not:SI (zero_extend:SI (not:HI (reg:HI 26))))
+                 (reg:SI 18)))]
+  "AVR_HAVE_MUL"
+  "%~call __mulohisi3"
+  [(set_attr "type" "xcall")
+   (set_attr "cc" "clobber")])
+
 ; / % / % / % / % / % / % / % / %
/ % / % / % / % / % / % / % / % / % / % / % ; divmod @@ -2399,9 +2653,16 @@ (define_insn "one_cmplsi2" ;; xx<---x xx<---x xx<---x xx<---x xx<---x xx<---x xx<---x xx<---x xx<---x ;; sign extend +;; We keep combiner from inserting hard registers into the input of sign- and +;; zero-extends. A hard register in the input operand is not wanted because +;; 32-bit multiply patterns clobber some hard registers and extends with a +;; hard register that overlaps these clobbers won't be combined to a widening +;; multiplication. There is no need for combine to propagate hard registers, +;; register allocation can do it just as well. + (define_insn "extendqihi2" [(set (match_operand:HI 0 "register_operand" "=r,r") - (sign_extend:HI (match_operand:QI 1 "register_operand" "0,*r")))] + (sign_extend:HI (match_operand:QI 1 "combine_pseudo_register_operand" "0,*r")))] "" "@ clr %B0\;sbrc %0,7\;com %B0 @@ -2411,7 +2672,7 @@ (define_insn "extendqihi2" (define_insn "extendqisi2" [(set (match_operand:SI 0 "register_operand" "=r,r") - (sign_extend:SI (match_operand:QI 1 "register_operand" "0,*r")))] + (sign_extend:SI (match_operand:QI 1 "combine_pseudo_register_operand" "0,*r")))] "" "@ clr %B0\;sbrc %A0,7\;com %B0\;mov %C0,%B0\;mov %D0,%B0 @@ -2420,8 +2681,8 @@ (define_insn "extendqisi2" (set_attr "cc" "set_n,set_n")]) (define_insn "extendhisi2" - [(set (match_operand:SI 0 "register_operand" "=r,&r") - (sign_extend:SI (match_operand:HI 1 "register_operand" "0,*r")))] + [(set (match_operand:SI 0 "register_operand" "=r,r") + (sign_extend:SI (match_operand:HI 1 "combine_pseudo_register_operand" "0,*r")))] "" "@ clr %C0\;sbrc %B0,7\;com %C0\;mov %D0,%C0 @@ -2438,7 +2699,7 @@ (define_insn "extendhisi2" (define_insn_and_split "zero_extendqihi2" [(set (match_operand:HI 0 "register_operand" "=r") - (zero_extend:HI (match_operand:QI 1 "register_operand" "r")))] + (zero_extend:HI (match_operand:QI 1 "combine_pseudo_register_operand" "r")))] "" "#" "reload_completed" @@ -2454,7 +2715,7 @@ (define_insn_and_split "zero_extendqihi2 (define_insn_and_split "zero_extendqisi2" [(set (match_operand:SI 0 "register_operand" "=r") - (zero_extend:SI (match_operand:QI 1 "register_operand" "r")))] + (zero_extend:SI (match_operand:QI 1 "combine_pseudo_register_operand" "r")))] "" "#" "reload_completed" @@ -2469,8 +2730,8 @@ (define_insn_and_split "zero_extendqisi2 }) (define_insn_and_split "zero_extendhisi2" - [(set (match_operand:SI 0 "register_operand" "=r") - (zero_extend:SI (match_operand:HI 1 "register_operand" "r")))] + [(set (match_operand:SI 0 "register_operand" "=r") + (zero_extend:SI (match_operand:HI 1 "combine_pseudo_register_operand" "r")))] "" "#" "reload_completed" Index: config/avr/t-avr =================================================================== --- config/avr/t-avr (revision 176818) +++ config/avr/t-avr (working copy) @@ -41,7 +41,9 @@ LIB1ASMFUNCS = \ _mulhi3 \ _mulhisi3 \ _umulhisi3 \ - _xmulhisi3_exit \ + _usmulhisi3 \ + _muluhisi3 \ + _mulshisi3 \ _mulsi3 \ _udivmodqi4 \ _divmodqi4 \ Index: config/avr/avr.c =================================================================== --- config/avr/avr.c (revision 176818) +++ config/avr/avr.c (working copy) @@ -5515,6 +5515,34 @@ avr_rtx_costs (rtx x, int codearg, int o return false; break; + case SImode: + if (AVR_HAVE_MUL) + { + if (!speed) + { + /* Add some additional costs besides CALL like moves etc. */ + + *total = COSTS_N_INSNS (AVR_HAVE_JMP_CALL ? 5 : 4); + } + else + { + /* Just a rough estimate. Even with -O2 we don't want bulky + code expanded inline. 
*/ + + *total = COSTS_N_INSNS (25); + } + } + else + { + if (speed) + *total = COSTS_N_INSNS (300); + else + /* Add some additional costs besides CALL like moves etc. */ + *total = COSTS_N_INSNS (AVR_HAVE_JMP_CALL ? 5 : 4); + } + + return true; + default: return false; }