From patchwork Fri Oct 7 07:13:16 2011
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: Uros Bizjak
X-Patchwork-Id: 118219
References: <20111003230055.GA27052@intel.com>
Date: Fri, 7 Oct 2011 09:13:16 +0200
Subject: Re: PATCH: PR target/50603: [x32] Unnecessary lea
From: Uros Bizjak
To: "H.J. Lu"
Cc: gcc-patches@gcc.gnu.org, Jakub Jelinek, Richard Henderson
Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm
Sender: gcc-patches-owner@gcc.gnu.org

On Thu, Oct 6, 2011 at 11:33 PM, H.J. Lu wrote:

>>>>>>> OTOH, x86_64 and i686 targets can also benefit from this change.
>>>>>>> If combine can't create more complex address (covered by lea), then it
>>>>>>> will simply propagate memory operand back into the add insn. It looks
>>>>>>> to me that we can't lose here, so:
>>>>>>>
>>>>>>>  /* Improve address combine.  */
>>>>>>>  if (code == PLUS && MEM_P (src2))
>>>>>>>    src2 = force_reg (mode, src2);
>>>>>>>
>>>>>>> Any opinions?
>>>>>>>
>>>>>>
>>>>>> It doesn't work with 64bit libstdc++:
>>>>>
>>>>> Yeah, yeah. ix86_output_mi_thunk has some ...  issues.
>>>>>
>>>>> Please try attached patch that introduces ix86_emit_binop and uses it
>>>>> in a bunch of places.
>>>
>>>> I tried it on GCC.  There are no regressions.  The bugs are fixed for x32.
>>>> Here is a size comparison with GCC runtime libraries on ia32, x32 and
>>>> x86-64:
>>>
>>>>  884093   18600   27064  929757   e2fdd old libstdc++.so
>>>>  884189   18600   27064  929853   e303d new libs/libstdc++.so
>>>>
>>>> The new code is
>>>>
>>>> mov    0xc(%edi),%eax
>>>> mov    %eax,0x8(%esi)
>>>> mov    -0xc(%eax),%eax
>>>> mov    0x10(%edi),%edx
>>>> lea    0x8(%esi,%eax,1),%eax
>>>>
>>>> The old one is
>>>>
>>>> mov    0xc(%edi),%edx
>>>> lea    0x8(%esi),%eax
>>>> mov    %edx,0x8(%esi)
>>>> add    -0xc(%edx),%eax
>>>> mov    0x10(%edi),%edx
>>>
>>> The new code merged lea+add into one lea, so it looks quite OK to me.
>>>
>>> Do you have some performance numbers?
>>>
>>
>> I will report performance numbers in a few days.
>
> The differences in SPEC CPU 2006 on ia32, x86-64 and
> x32 are within noise range.

Great.

Attached is a slightly updated patch, where we consider only integer-mode
PLUS RTXes.

2011-10-07  Uros Bizjak
            H.J. Lu

        PR target/50603
        * config/i386/i386.c (ix86_fixup_binary_operands): Force src2 of
        integer PLUS RTX to a register to improve address combine.

testsuite/ChangeLog:

2011-10-07  Uros Bizjak
            H.J. Lu

        PR target/50603
        * gcc.target/i386/pr50603.c: New test.

Tested on x86_64-pc-linux-gnu {,-m32}, committed to mainline SVN.

Uros.
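
For readers following the lea+add remark in the quoted discussion, here is a rough C model of the two instruction sequences from the libstdc++ comparison. It is not part of the patch. The function names and the flat uint32_t view of the registers are assumptions made purely for illustration. The only point is that both sequences compute the same address, and that the new one loads the memory operand into a register first so a single lea can perform the whole addition.

```c
#include <assert.h>
#include <stdint.h>

/* Old sequence: the memory operand is added directly.
     lea 0x8(%esi),%eax
     add -0xc(%edx),%eax
   mem_operand stands in for the value read from -0xc(%edx).  */
static uint32_t
address_old (uint32_t esi, uint32_t mem_operand)
{
  uint32_t eax = esi + 0x8;     /* lea 0x8(%esi),%eax */
  eax += mem_operand;           /* add with a memory source */
  return eax;
}

/* New sequence: the memory operand is loaded into a register first.
     mov -0xc(%eax),%eax
     lea 0x8(%esi,%eax,1),%eax  */
static uint32_t
address_new (uint32_t esi, uint32_t mem_operand)
{
  uint32_t eax = mem_operand;   /* mov: operand now lives in a register */
  eax = 0x8 + esi + eax * 1;    /* one lea: disp + base + index * scale */
  return eax;
}

/* Hypothetical self-check: both models agree, including on 32-bit
   wraparound.  */
static void
check_sequences_agree (void)
{
  assert (address_old (0x1000, 0x20) == address_new (0x1000, 0x20));
  assert (address_old (0xfffffff8u, 0x10) == address_new (0xfffffff8u, 0x10));
}
```

Calling check_sequences_agree () from a small main () is enough to exercise the model. It says nothing about instruction timing, which is what the SPEC CPU 2006 runs quoted above were for.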
Index: config/i386/i386.c
===================================================================
--- config/i386/i386.c	(revision 179645)
+++ config/i386/i386.c	(working copy)
@@ -15798,6 +15798,12 @@ ix86_fixup_binary_operands (enum rtx_code code, en
   if (MEM_P (src1) && !rtx_equal_p (dst, src1))
     src1 = force_reg (mode, src1);
 
+  /* Improve address combine.  */
+  if (code == PLUS
+      && GET_MODE_CLASS (mode) == MODE_INT
+      && MEM_P (src2))
+    src2 = force_reg (mode, src2);
+
   operands[1] = src1;
   operands[2] = src2;
   return dst;
Index: testsuite/gcc.target/i386/pr50603.c
===================================================================
--- testsuite/gcc.target/i386/pr50603.c	(revision 0)
+++ testsuite/gcc.target/i386/pr50603.c	(revision 0)
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+extern int *foo;
+
+int
+bar (int x)
+{
+  return foo[x];
+}
+/* { dg-final { scan-assembler-not "lea\[lq\]" } } */
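
As a companion to the new test, the sketch below spells out the access that bar performs. It is an illustration only, not part of the patch. The table array, the definition of foo (the test only declares it extern), and the bar_explicit and check_bar_forms_agree helpers are all assumptions added here. The access is a load of the pointer foo followed by a load from base + index * sizeof (int), which is presumably the shape the test expects the compiler to fold into the load's addressing mode rather than emit as a separate lea.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed backing storage; the real test leaves foo undefined because it
   only compiles to assembly.  */
static int table[4] = { 10, 20, 30, 40 };
int *foo = table;

/* Same body as the function in pr50603.c.  */
int
bar (int x)
{
  return foo[x];
}

/* The same access written as base + index * scale.  */
int
bar_explicit (int x)
{
  const char *base = (const char *) foo;              /* load of foo */
  ptrdiff_t offset = (ptrdiff_t) x * (ptrdiff_t) sizeof (int);
  return *(const int *) (base + offset);              /* scaled-index load */
}

/* Hypothetical self-check: both spellings read the same element.  */
static void
check_bar_forms_agree (void)
{
  for (int i = 0; i < 4; i++)
    assert (bar (i) == bar_explicit (i));
}
```

Calling check_bar_forms_agree () from a small main () confirms that the two forms read the same element. Whether a given compiler emits a lea for either one can only be settled by inspecting the generated assembly, which is what the scan-assembler-not directive in the test does.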