From patchwork Thu Jun 24 16:47:33 2010 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: "H.J. Lu" X-Patchwork-Id: 56810 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Received: from sourceware.org (server1.sourceware.org [209.132.180.131]) by ozlabs.org (Postfix) with SMTP id 87C5EB6F16 for ; Fri, 25 Jun 2010 02:47:48 +1000 (EST) Received: (qmail 26460 invoked by alias); 24 Jun 2010 16:47:45 -0000 Received: (qmail 26443 invoked by uid 22791); 24 Jun 2010 16:47:42 -0000 X-SWARE-Spam-Status: No, hits=-1.8 required=5.0 tests=AWL, BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, FREEMAIL_FROM, RCVD_IN_DNSWL_NONE, TW_IV X-Spam-Check-By: sourceware.org Received: from mail-pv0-f175.google.com (HELO mail-pv0-f175.google.com) (74.125.83.175) by sourceware.org (qpsmtpd/0.43rc1) with ESMTP; Thu, 24 Jun 2010 16:47:35 +0000 Received: by pvg3 with SMTP id 3so1048790pvg.20 for ; Thu, 24 Jun 2010 09:47:33 -0700 (PDT) MIME-Version: 1.0 Received: by 10.142.6.33 with SMTP id 33mr9664046wff.135.1277398053321; Thu, 24 Jun 2010 09:47:33 -0700 (PDT) Received: by 10.142.231.8 with HTTP; Thu, 24 Jun 2010 09:47:33 -0700 (PDT) In-Reply-To: <4C229DC0.1050108@gnu.org> References: <20100621193321.GA13780@intel.com> <1277229955.2613.1.camel@localhost> <1277232299.2613.13.camel@localhost> <4C224592.5090100@gnu.org> <4C224EB7.8090900@gnu.org> <4C225D75.2070106@gnu.org> <4C22624E.1050207@gnu.org> <4C228D24.7040407@gnu.org> <4C229DC0.1050108@gnu.org> Date: Thu, 24 Jun 2010 09:47:33 -0700 Message-ID: Subject: Re: PATCH: PR target/44588: Very inefficient 8bit mod/div From: "H.J. Lu" To: Paolo Bonzini Cc: Uros Bizjak , gcc-patches@gcc.gnu.org X-IsSubscribed: yes Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Unsubscribe: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Delivered-To: mailing list gcc-patches@gcc.gnu.org On Wed, Jun 23, 2010 at 4:50 PM, Paolo Bonzini wrote: > On 06/24/2010 01:13 AM, H.J. Lu wrote: >>> >>> I don't understand exactly what the problem was; clearly it couldn't work >>> as >>> long as this pattern was cheating blatantly: >> >> Upper 8bit registers aren't real registers. You can't >> do RA with them. > > Please read what other people write.  I'm saying that you should have used a > HI destination just like you did with your new UNSPEC, which is acceptable. > > The pattern I quoted below used a QI destination; then magically you > attempted to extract bit 8..15 of it with an unspec, or something like that. >  For anything like that the optimizers are going to bite back sooner or > later.  And you'd deserve it. > >>> (define_insn "*divqi3" >>>  [(set (match_operand:QI 0 "register_operand" "=a") >>>        (any_div:QI >>>          (match_operand:HI 1 "register_operand" "0") >>>          (match_operand:QI 2 "nonimmediate_operand" "qm"))) >>> >>> Anyway this patch is IMO much nicer than the first ones, so if Uros is >>> okay >>> I don't think it's useful to pursue a more accurate representation.  Just >>> make sure that REG_EQUAL notes are added to the two extractions. >>> >> >> What are REG_EQUAL notes used for? >> As far as the rest of gcc is concerned, there are no upper 8bit >> registers. But you can access bits 8-15 of HI, SI, >> DI registers via XXX_extract. > > Understood.  Using an unspec is fine for me, even though it's not an > approval.  But that's completely orthogonal to putting a REG_EQUAL note _on > the two regs that you extract out of AX_.  The notes' value should be one of > >   (subreg:QI (udiv:HI (...) (zero_extend (...)))) >   (subreg:QI (div:HI (...) (sign_extend (...)))) >   (subreg:QI (umod:HI (...) (zero_extend (...)))) >   (subreg:QI (mod:HI (...) (sign_extend (...)))) > > But I think that with over ten years of GCC experience you do not need > anyone to tell you this.  In fact, regarding "what are REG_EQUAL notes used > for?" my first thought was RTFM, but anyway: they are used by CSE, fwprop, > GCSE and combine to simplify and eliminate redundant expressions. I am learning new things all the time. Thanks for the pointer. Here is a new patch without UNSPEC. OK for trunk? Thanks. diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md index 1f7369b..6514ee4 100644 --- a/gcc/config/i386/i386.md +++ b/gcc/config/i386/i386.md @@ -760,6 +760,8 @@ ;; Used in signed and unsigned divisions. (define_code_iterator any_div [div udiv]) +(define_code_attr extract_code + [(div "SIGN_EXTRACT") (udiv "ZERO_EXTRACT")]) ;; Instruction prefix for signed and unsigned operations. (define_code_attr sgnprefix [(sign_extend "i") (zero_extend "") @@ -7269,17 +7271,6 @@ ;; Divide instructions -(define_insn "divqi3" - [(set (match_operand:QI 0 "register_operand" "=a") - (any_div:QI - (match_operand:HI 1 "register_operand" "0") - (match_operand:QI 2 "nonimmediate_operand" "qm"))) - (clobber (reg:CC FLAGS_REG))] - "TARGET_QIMODE_MATH" - "div{b}\t%2" - [(set_attr "type" "idiv") - (set_attr "mode" "QI")]) - ;; The patterns that match these are at the end of this file. (define_expand "divxf3" @@ -7316,6 +7307,92 @@ ;; Divmod instructions. +(define_expand "divmodqi4" + [(parallel [(set (match_operand:QI 0 "register_operand" "") + (any_div:QI + (match_operand:QI 1 "register_operand" "") + (match_operand:QI 2 "nonimmediate_operand" ""))) + (set (match_operand:QI 3 "register_operand" "") + (mod:QI (match_dup 1) (match_dup 2))) + (clobber (reg:CC FLAGS_REG))])] + "TARGET_QIMODE_MATH" +{ + rtx div, mod, insn; + rtx operand0, operand1; + + operand0 = gen_reg_rtx (HImode); + operand1 = gen_reg_rtx (HImode); + + /* Extend operands[1] to HImode. Generate 8bit divide. Result is + in AX. */ + if ( == SIGN_EXTRACT) + { + emit_insn (gen_extendqihi2 (operand1, operands[1])); + emit_insn (gen_divmodhiqi3 (operand0, operand1, operands[2])); + + div = gen_rtx_DIV (QImode, operands[1], operands[2]); + mod = gen_rtx_MOD (QImode, operands[1], operands[2]); + + operand1 = gen_rtx_ (QImode, operand0, + GEN_INT (8), GEN_INT (8)); + } + else + { + emit_insn (gen_zero_extendqihi2 (operand1, operands[1])); + emit_insn (gen_udivmodhiqi3 (operand0, operand1, operands[2])); + + div = gen_rtx_UDIV (QImode, operands[1], operands[2]); + mod = gen_rtx_UMOD (QImode, operands[1], operands[2]); + + operand1 = gen_rtx_ (SImode, operand0, + GEN_INT (8), GEN_INT (8)); + operand1 = simplify_gen_subreg (QImode, operand1, SImode, 0); + } + + /* Extract remainder from AH. */ + insn = emit_move_insn (operands[3], operand1); + set_unique_reg_note (insn, REG_EQUAL, mod); + + /* Extract quotient from AL. */ + insn = emit_move_insn (operands[0], gen_lowpart (QImode, operand0)); + set_unique_reg_note (insn, REG_EQUAL, div); + + DONE; +}) + +;; Divide AX by r/m8, with result stored in +;; AL <- Quotient +;; AH <- Remainder +(define_insn "divmodhiqi3" + [(set (match_operand:HI 0 "register_operand" "=a") + (ior:HI + (ashift:HI + (zero_extend:HI + (mod:QI (match_operand:HI 1 "register_operand" "0") + (match_operand:QI 2 "nonimmediate_operand" "qm"))) + (const_int 8)) + (zero_extend:HI (div:QI (match_dup 1) (match_dup 2))))) + (clobber (reg:CC FLAGS_REG))] + "TARGET_QIMODE_MATH" + "idiv{b}\t%2" + [(set_attr "type" "idiv") + (set_attr "mode" "QI")]) + +(define_insn "udivmodhiqi3" + [(set (match_operand:HI 0 "register_operand" "=a") + (ior:HI + (ashift:HI + (zero_extend:HI + (umod:QI (match_operand:HI 1 "register_operand" "0") + (match_operand:QI 2 "nonimmediate_operand" "qm"))) + (const_int 8)) + (zero_extend:HI (udiv:QI (match_dup 1) (match_dup 2))))) + (clobber (reg:CC FLAGS_REG))] + "TARGET_QIMODE_MATH" + "div{b}\t%2" + [(set_attr "type" "idiv") + (set_attr "mode" "QI")]) + (define_expand "divmod4" [(parallel [(set (match_operand:SWIM248 0 "register_operand" "") (div:SWIM248 --- /dev/null 2010-06-16 10:11:06.602750711 -0700 +++ gcc/gcc/testsuite/gcc.target/i386/mod-1.c 2010-06-24 09:41:47.401923084 -0700 @@ -0,0 +1,29 @@ +/* { dg-do compile } */ +/* { dg-options "-Os -mtune=atom" } */ + +typedef struct { + int a; +} VCR; + +typedef struct { + VCR vcr[8]; +} VCRC; + +typedef struct { + char vcr; +} OWN; + +OWN Own[16]; + +void +f (VCRC *x, OWN *own) +{ + x[own->vcr / 8].vcr[own->vcr % 8].a--; + x[own->vcr / 8].vcr[own->vcr % 8].a = x[own->vcr / 8].vcr[own->vcr % 8].a; +} + +/* { dg-final { scan-assembler-times "idivb" 1 } } */ +/* { dg-final { scan-assembler-not "incl" } } */ +/* { dg-final { scan-assembler-not "orl" } } */ +/* { dg-final { scan-assembler-not "andb" } } */ +/* { dg-final { scan-assembler-not "jns" } } */ --- /dev/null 2010-06-16 10:11:06.602750711 -0700 +++ gcc/gcc/testsuite/gcc.target/i386/umod-1.c 2010-06-21 12:01:25.705950180 -0700 @@ -0,0 +1,11 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -mtune=atom" } */ + +unsigned char +foo (unsigned char x, unsigned char y) +{ + return x % y; +} + +/* { dg-final { scan-assembler-times "divb" 1 } } */ +/* { dg-final { scan-assembler-not "divw" } } */ --- /dev/null 2010-06-16 10:11:06.602750711 -0700 +++ gcc/gcc/testsuite/gcc.target/i386/umod-2.c 2010-06-21 12:01:17.857932744 -0700 @@ -0,0 +1,14 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -mtune=atom" } */ + +extern unsigned char z; + +unsigned char +foo (unsigned char x, unsigned char y) +{ + z = x/y; + return x % y; +} + +/* { dg-final { scan-assembler-times "divb" 1 } } */ +/* { dg-final { scan-assembler-not "divw" } } */ --- /dev/null 2010-06-16 10:11:06.602750711 -0700 +++ gcc/gcc/testsuite/gcc.target/i386/umod-3.c 2010-06-21 12:02:36.809962702 -0700 @@ -0,0 +1,21 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -mtune=atom" } */ + +extern void abort (void); +extern void exit (int); + +unsigned char cx = 7; + +int +main () +{ + unsigned char cy; + + cy = cx / 6; if (cy != 1) abort (); + cy = cx % 6; if (cy != 1) abort (); + + exit(0); +} + +/* { dg-final { scan-assembler-times "divb" 1 } } */ +/* { dg-final { scan-assembler-not "divw" } } */