From patchwork Mon Sep 8 22:19:34 2014 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Sriraman Tallam X-Patchwork-Id: 387060 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Received: from sourceware.org (server1.sourceware.org [209.132.180.131]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 4BEAB1401DA for ; Tue, 9 Sep 2014 08:19:46 +1000 (EST) DomainKey-Signature: a=rsa-sha1; c=nofws; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender :mime-version:in-reply-to:references:date:message-id:subject :from:to:cc:content-type; q=dns; s=default; b=RJ43xw4fXlCT4OqUbC xl4ohtCZIoJHkNtzzIqIG5a2vOkN+WRP5whZt/U1HwYVoLZv05xCvdOHB9tUJADT Foj29LDVXkH+fbLcRe/leaqpKjHhswncvihAk/kTE13zPo5CSsTb+fdpFeeKad+9 WPM2/H4RcssmCUIuOZZhsnNo8= DKIM-Signature: v=1; a=rsa-sha1; c=relaxed; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender :mime-version:in-reply-to:references:date:message-id:subject :from:to:cc:content-type; s=default; bh=/PkalZ9bAoOHXrN42iTbKZcs oFQ=; b=V1njmAZRncBvQWS08JUfhPdMHqpNECQez322zoGpMlBGZLsn/k5pMGi+ e2Ft1qBwcNG52oq8fR4kEHaydoEZO/HkimeTklJSjvIm2X6iCLCWOzZavZVy1uNb CROGR1o12hOEASf531tzpDtmFrj8cjN5KYhd5ALKRVJlOxYr9s4= Received: (qmail 22596 invoked by alias); 8 Sep 2014 22:19:40 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Unsubscribe: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Delivered-To: mailing list gcc-patches@gcc.gnu.org Received: (qmail 22587 invoked by uid 89); 8 Sep 2014 22:19:39 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=-3.3 required=5.0 tests=AWL, BAYES_00, KAM_STOCKGEN, RCVD_IN_DNSWL_LOW, RP_MATCHES_RCVD, SPF_PASS autolearn=ham version=3.3.2 X-HELO: mail-qa0-f47.google.com Received: from mail-qa0-f47.google.com (HELO mail-qa0-f47.google.com) (209.85.216.47) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with (AES128-SHA encrypted) ESMTPS; Mon, 08 Sep 2014 22:19:37 +0000 Received: by mail-qa0-f47.google.com with SMTP id x12so14649388qac.6 for ; Mon, 08 Sep 2014 15:19:35 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:mime-version:in-reply-to:references:date :message-id:subject:from:to:cc:content-type; bh=7c/kGkDwWPdyOxaxCZ9cZ7VAdxvov+AFtFU+PX1sa5A=; b=D1GpL8+b9z/pE42oyyo1CjOLgvUCMgzo9oDuIAp71+GnIvXp5/OrKs8H3s5j7R+WZY NRrTlN2d8EvBzJRv89qjt1GiTbbDUGbiM+cSMkebD+NRUt+0cNSkISDg/uOaWa2M635x B+guy/xwWy6hczCigsnfkfLBc/9y8GCt2jTDYpM1Ao93V5Y9xzgO8dG3IMnpuKH5LWMV W255ppe/QVE/f/pZ4Fv7qE4pB00L0uHZY+AP/89mQR1W8tHmreIPy7JprH6tD/6L4hFJ lfhH0SL2hOyBDMWX8T4/O3qo1SRfcHIMRhcyPU7uxN4zQh7KCHzAH+YvTWdjX2ZSYPYB 8lwg== X-Gm-Message-State: ALoCoQmJdqkcTUwXepdtBAE2MWPjr3o/IEoMu7vOR1Y1FIHSknOc9FQGsBP1MjEx4PcJ2G1/AGf+ MIME-Version: 1.0 X-Received: by 10.140.94.233 with SMTP id g96mr18906969qge.21.1410214775074; Mon, 08 Sep 2014 15:19:35 -0700 (PDT) Received: by 10.229.210.2 with HTTP; Mon, 8 Sep 2014 15:19:34 -0700 (PDT) In-Reply-To: <54062B52.9040706@redhat.com> References: <54062B52.9040706@redhat.com> Date: Mon, 8 Sep 2014 15:19:34 -0700 Message-ID: Subject: Re: [PATCH x86_64] Optimize access to globals in "-fpie -pie" builds with copy relocations From: Sriraman Tallam To: Richard Henderson Cc: GCC Patches , David Li , Cary Coutant , Ian Lance Taylor , Paul Pluzhnikov X-IsSubscribed: yes On Tue, Sep 2, 2014 at 1:40 PM, Richard Henderson wrote: > On 06/20/2014 05:17 PM, Sriraman Tallam wrote: >> Index: config/i386/i386.c >> =================================================================== >> --- config/i386/i386.c (revision 211826) >> +++ config/i386/i386.c (working copy) >> @@ -12691,7 +12691,9 @@ legitimate_pic_address_disp_p (rtx disp) >> return true; >> } >> else if (!SYMBOL_REF_FAR_ADDR_P (op0) >> - && SYMBOL_REF_LOCAL_P (op0) >> + && (SYMBOL_REF_LOCAL_P (op0) >> + || (TARGET_64BIT && ix86_copyrelocs && flag_pie >> + && !SYMBOL_REF_FUNCTION_P (op0))) >> && ix86_cmodel != CM_LARGE_PIC) >> return true; >> break; > > This is the wrong place to patch. > > You ought to be adjusting SYMBOL_REF_LOCAL_P, by providing a modified > TARGET_BINDS_LOCAL_P. I have done this in the new attached patch, I added a new function i386_binds_local_p which will check for this and call default_binds_local_p otherwise. > > Note in particular that I believe that you are doing the wrong thing with weak > and COMMON symbols, in that you probably ought not force a copy reloc there. I added an extra check to not do this for WEAK symbols. I also added a check for DECL_EXTERNAL so I believe this will also not be called for COMMON symbols. > > Note the complexity of default_binds_local_p_1, and the fact that all you > really want to modify is > > /* If PIC, then assume that any global name can be overridden by > symbols resolved from other modules. */ > else if (shlib) > local_p = false; > > near the bottom of that function. I did not understand what you mean here? Were you suggesting an alternative way of doing this? Thanks for reviewing Sri > > > r~ Optimize access to globals with -fpie, x86_64 only: Currently, with -fPIE/-fpie, GCC accesses globals that are extern to the module using the GOT. This is two instructions, one to get the address of the global from the GOT and the other to get the value. If it turns out that the global gets defined in the executable at link-time, it still needs to go through the GOT as it is too late then to generate a direct access. Examples: foo.cc ------ int a_glob; int main () { return a_glob; // defined in this file } With -O2 -fpie -pie, the generated code directly accesses the global via PC-relative insn: 5e0
: mov 0x165a(%rip),%eax # 1c40 foo.cc ------ extern int a_glob; int main () { return a_glob; // defined in this file } With -O2 -fpie -pie, the generated code accesses global via GOT using two memory loads: 6f0
: mov 0x1609(%rip),%rax # 1d00 <_DYNAMIC+0x230> mov (%rax),%eax This is true even if in the latter case the global was defined in the executable through a different file. Some experiments on google benchmarks shows that the extra memory loads affects performance by 1% to 5%. Solution - Copy Relocations: When the linker supports copy relocations, GCC can always assume that the global will be defined in the executable. For globals that are truly extern (come from shared objects), the linker will create copy relocations and have them defined in the executable. Result is that no global access needs to go through the GOT and hence improves performance. This patch to the gold linker : https://sourceware.org/ml/binutils/2014-05/msg00092.html submitted recently allows gold to generate copy relocations for -pie mode when necessary. I have added option -mcopyrelocs which when combined with -fpie would do this. Note that the BFD linker does not support pie copyrelocs yet and this option cannot be used there. Please review. ChangeLog: * config/i386/i386.opt (mpie-copyrelocs): New option. * config/i386/i386.c (i386_binds_local_p): New function. (TARGET_BINDS_LOCAL_P): Define. * testsuite/gcc.target/i386/pie-copyrelocs-1.c: New test. * testsuite/gcc.target/i386/pie-copyrelocs-2.c: New test. Index: testsuite/gcc.target/i386/pie-copyrelocs-2.c =================================================================== --- testsuite/gcc.target/i386/pie-copyrelocs-2.c (revision 0) +++ testsuite/gcc.target/i386/pie-copyrelocs-2.c (revision 0) @@ -0,0 +1,13 @@ +/* Test if -mno-copyrelocs does the right thing. */ +/* { dg-do compile } */ +/* { dg-options "-O2 -fpie -mno-copyrelocs" } */ + +extern int glob_a; + +int foo () +{ + return glob_a; +} + +/* glob_a should always be accessed via GOT */ +/* { dg-final { scan-assembler "glob_a\\@GOT" { target { x86_64-*-* } } } } */ Index: testsuite/gcc.target/i386/pie-copyrelocs-1.c =================================================================== --- testsuite/gcc.target/i386/pie-copyrelocs-1.c (revision 0) +++ testsuite/gcc.target/i386/pie-copyrelocs-1.c (revision 0) @@ -0,0 +1,13 @@ +/* Test if -mcopyrelocs does the right thing. */ +/* { dg-do compile } */ +/* { dg-options "-O2 -fpie -mcopyrelocs" } */ + +extern int glob_a; + +int foo () +{ + return glob_a; +} + +/* glob_a should never be accessed with a GOTPCREL */ +/* { dg-final { scan-assembler-not "glob_a\\@GOTPCREL" { target { x86_64-*-* } } } } */ Index: config/i386/i386.c =================================================================== --- config/i386/i386.c (revision 214973) +++ config/i386/i386.c (working copy) @@ -12642,6 +12642,18 @@ legitimate_pic_operand_p (rtx x) } } +bool +i386_binds_local_p (const_tree exp) +{ + /* Globals marked extern are treated as local when linker copy relocations + support is available with -f{pie|PIE}. */ + if (TARGET_64BIT && ix86_copyrelocs && flag_pie + && TREE_CODE (exp) == VAR_DECL + && DECL_EXTERNAL (exp) && !DECL_WEAK (exp)) + return true; + return default_binds_local_p (exp); +} + /* Determine if a given CONST RTX is a valid memory displacement in PIC mode. */ @@ -47157,6 +47169,9 @@ ix86_atomic_assign_expand_fenv (tree *hold, tree * #undef TARGET_MS_BITFIELD_LAYOUT_P #define TARGET_MS_BITFIELD_LAYOUT_P ix86_ms_bitfield_layout_p +#undef TARGET_BINDS_LOCAL_P +#define TARGET_BINDS_LOCAL_P i386_binds_local_p + #if TARGET_MACHO #undef TARGET_BINDS_LOCAL_P #define TARGET_BINDS_LOCAL_P darwin_binds_local_p Index: config/i386/i386.opt =================================================================== --- config/i386/i386.opt (revision 214973) +++ config/i386/i386.opt (working copy) @@ -108,6 +108,10 @@ int x_ix86_dump_tunes TargetSave int x_ix86_force_align_arg_pointer +;; -mcopyrelocs +TargetSave +int x_ix86_copyrelocs + ;; -mforce-drap= TargetSave int x_ix86_force_drap @@ -291,6 +295,10 @@ mfancy-math-387 Target RejectNegative Report InverseMask(NO_FANCY_MATH_387, USE_FANCY_MATH_387) Save Generate sin, cos, sqrt for FPU +mcopyrelocs +Target Report Var(ix86_copyrelocs) Init(0) +Use linker copy relocs for pie + mforce-drap Target Report Var(ix86_force_drap) Always use Dynamic Realigned Argument Pointer (DRAP) to realign stack