From patchwork Thu Dec 18 13:11:50 2014
X-Patchwork-Submitter: "H.J. Lu"
X-Patchwork-Id: 422567
Date: Thu, 18 Dec 2014 05:11:50 -0800
From: "H.J. Lu"
To: gcc-patches@gcc.gnu.org, Rasmus Villemoes, "x86@kernel.org", Andi Kleen
Cc: Uros Bizjak, Ingo Molnar, "H. Peter Anvin", Thomas Gleixner
Subject: [PATCH] X86-64: Add -mskip-rax-setup
Message-ID: <20141218131150.GA32638@intel.com>
Reply-To: "H.J. Lu"

The Linux kernel never passes floating-point arguments around, to vararg
functions or otherwise.  Hence no vector registers are ever used when
calling a vararg function.  But GCC still dutifully emits an
"xor %eax,%eax" before each and every call of a vararg function.  Since
no callee uses it for anything, these instructions are redundant.

This patch adds the -mskip-rax-setup option to skip setting up the RAX
register when SSE is disabled and there are no variable arguments passed
in vector registers.  Since the RAX register is used to avoid
unnecessarily saving vector registers on the stack when passing variable
arguments, the impact of this option is that callees may waste some
stack space, misbehave or jump to a random location.
GCC 4.4 and newer don't have those issues, regardless of the RAX register
value, since they don't check the RAX register value when SSE is disabled:

https://gcc.gnu.org/ml/gcc-patches/2008-09/msg00127.html

I used it on kernel 3.17.7:

    text    data     bss      dec     hex filename
11493571 2271232 5926912 19691715 12c78c3 vmlinux.skip-rax
11517879 2271232 5926912 19716023 12cd7b7 vmlinux.orig

It removed 14309 redundant "xor %eax,%eax" instructions and saved about
24KB of text (24308 bytes).  I am currently running the new kernel
without any problems.  OK for trunk?

Thanks.

H.J.

Acked-by: H. Peter Anvin
---
gcc/

	* config/i386/i386.c (ix86_expand_call): Skip setting up RAX
	register for -mskip-rax-setup when there are no parameters
	passed in vector registers.
	* config/i386/i386.opt (mskip-rax-setup): New option.
	* doc/invoke.texi: Document -mskip-rax-setup.

gcc/testsuite/

	* gcc.target/i386/amd64-abi-7.c: New tests.
	* gcc.target/i386/amd64-abi-8.c: Likewise.
	* gcc.target/i386/amd64-abi-9.c: Likewise.
---
 gcc/ChangeLog                               |  8 +++++
 gcc/config/i386/i386.c                      |  7 ++++-
 gcc/config/i386/i386.opt                    |  4 +++
 gcc/doc/invoke.texi                         | 13 ++++++++
 gcc/testsuite/ChangeLog                     |  6 ++++
 gcc/testsuite/gcc.target/i386/amd64-abi-7.c | 46 +++++++++++++++++++++++++++++
 gcc/testsuite/gcc.target/i386/amd64-abi-8.c | 18 +++++++++++
 gcc/testsuite/gcc.target/i386/amd64-abi-9.c | 18 +++++++++++
 8 files changed, 119 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/amd64-abi-7.c
 create mode 100644 gcc/testsuite/gcc.target/i386/amd64-abi-8.c
 create mode 100644 gcc/testsuite/gcc.target/i386/amd64-abi-9.c

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 24a252a..de7907a 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,11 @@
+2014-12-18  H.J. Lu
+
+	* config/i386/i386.c (ix86_expand_call): Skip setting up RAX
+	register for -mskip-rax-setup when there are no parameters
+	passed in vector registers.
+	* config/i386/i386.opt (mskip-rax-setup): New option.
+	* doc/invoke.texi: Document -mskip-rax-setup.
+
 2014-12-18  Martin Liska
 
 	PR ipa/64146
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 17ef751..122a350 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -25461,7 +25461,12 @@ ix86_expand_call (rtx retval, rtx fnaddr, rtx callarg1,
         }
     }
 
-  if (TARGET_64BIT && INTVAL (callarg2) >= 0)
+  /* Skip setting up RAX register for -mskip-rax-setup when there are no
+     parameters passed in vector registers.  */
+  if (TARGET_64BIT
+      && (INTVAL (callarg2) > 0
+          || (INTVAL (callarg2) == 0
+              && (TARGET_SSE || !flag_skip_rax_setup))))
     {
       rtx al = gen_rtx_REG (QImode, AX_REG);
       emit_move_insn (al, callarg2);
diff --git a/gcc/config/i386/i386.opt b/gcc/config/i386/i386.opt
index 3d54bfa..6dc4da2 100644
--- a/gcc/config/i386/i386.opt
+++ b/gcc/config/i386/i386.opt
@@ -831,6 +831,10 @@ Target Report Var(flag_nop_mcount) Init(0)
 Generate mcount/__fentry__ calls as nops. To activate they need to be
 patched in.
 
+mskip-rax-setup
+Target Report Var(flag_skip_rax_setup) Init(0)
+Skip setting up RAX register when passing variable arguments.
+
 m8bit-idiv
 Target Report Mask(USE_8BIT_IDIV) Save
 Expand 32bit/64bit integer divide into 8bit unsigned integer divide with run-time check
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 15068da..33a7ed2 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -16256,6 +16256,19 @@ the profiling functions as nops. This is useful when they
 should be patched in later dynamically. This is likely only
 useful together with @option{-mrecord-mcount}.
 
+@item -mskip-rax-setup
+@itemx -mno-skip-rax-setup
+@opindex mskip-rax-setup
+When generating code for the x86-64 architecture with SSE extensions
+disabled, @option{-mskip-rax-setup} can be used to skip setting up RAX
+register when there are no variable arguments passed in vector registers.
+
+@strong{Warning:} Since RAX register is used to avoid unnecessarily
+saving vector registers on stack when passing variable arguments, the
+impact of this option is that callees may waste some stack space,
+misbehave or jump to a random location.  GCC 4.4 or newer don't have
+those issues, regardless of the RAX register value.
+
 @item -m8bit-idiv
 @itemx -mno-8bit-idiv
 @opindex 8bit-idiv
diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog
index 025dfce..6c06503 100644
--- a/gcc/testsuite/ChangeLog
+++ b/gcc/testsuite/ChangeLog
@@ -1,3 +1,9 @@
+2014-12-18  H.J. Lu
+
+	* gcc.target/i386/amd64-abi-7.c: New tests.
+	* gcc.target/i386/amd64-abi-8.c: Likewise.
+	* gcc.target/i386/amd64-abi-9.c: Likewise.
+
 2014-12-18  Martin Liska
 
 	* g++.dg/ipa/pr64146.C: New test.
diff --git a/gcc/testsuite/gcc.target/i386/amd64-abi-7.c b/gcc/testsuite/gcc.target/i386/amd64-abi-7.c
new file mode 100644
index 0000000..fcca680
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/amd64-abi-7.c
@@ -0,0 +1,46 @@
+/* { dg-do run { target { ! { ia32 } } } } */
+/* { dg-options "-O2 -mno-sse" } */
+
+#include <stdarg.h>
+#include <assert.h>
+
+int n1 = 30;
+int n2 = 324;
+void *n3 = (void *) &n2;
+int n4 = 407;
+
+int e1;
+int e2;
+void *e3;
+int e4;
+
+static void
+__attribute__((noinline))
+foo (va_list va_arglist)
+{
+  e2 = va_arg (va_arglist, int);
+  e3 = va_arg (va_arglist, void *);
+  e4 = va_arg (va_arglist, int);
+}
+
+static void
+__attribute__((noinline))
+test (int a1, ...)
+{
+  va_list va_arglist;
+  e1 = a1;
+  va_start (va_arglist, a1);
+  foo (va_arglist);
+  va_end (va_arglist);
+}
+
+int
+main ()
+{
+  test (n1, n2, n3, n4);
+  assert (n1 == e1);
+  assert (n2 == e2);
+  assert (n3 == e3);
+  assert (n4 == e4);
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/i386/amd64-abi-8.c b/gcc/testsuite/gcc.target/i386/amd64-abi-8.c
new file mode 100644
index 0000000..b25ceec
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/amd64-abi-8.c
@@ -0,0 +1,18 @@
+/* { dg-do compile { target { ! { ia32 } } } } */
+/* { dg-options "-O2 -mno-sse -mskip-rax-setup" } */
+/* { dg-final { scan-assembler-not "xorl\[\\t \]*\\\%eax,\[\\t \]*%eax" } } */
+
+void foo (const char *, ...);
+
+void
+test1 (void)
+{
+  foo ("%d", 20);
+}
+
+int
+test2 (void)
+{
+  foo ("%d", 20);
+  return 3;
+}
diff --git a/gcc/testsuite/gcc.target/i386/amd64-abi-9.c b/gcc/testsuite/gcc.target/i386/amd64-abi-9.c
new file mode 100644
index 0000000..4707eb7
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/amd64-abi-9.c
@@ -0,0 +1,18 @@
+/* { dg-do compile { target { ! { ia32 } } } } */
+/* { dg-options "-O2 -mno-sse -mno-skip-rax-setup" } */
+/* { dg-final { scan-assembler-times "xorl\[\\t \]*\\\%eax,\[\\t \]*%eax" 2 } } */
+
+void foo (const char *, ...);
+
+void
+test1 (void)
+{
+  foo ("%d", 20);
+}
+
+int
+test2 (void)
+{
+  foo ("%d", 20);
+  return 3;
+}
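
Note on the i386.c hunk: the new condition in ix86_expand_call can be
read as a small truth table.  The sketch below is a stand-alone model of
that condition, not GCC code.  The names needs_al_setup and check_cases
are hypothetical, and the parameters merely stand in for TARGET_64BIT,
INTVAL (callarg2), TARGET_SSE and flag_skip_rax_setup.  It assumes that a
negative callarg2 marks a call that needs no AL setup at all, as the old
">= 0" test implies.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the patched condition: returns true when the
   compiler would still emit the move into AL before the call.  */
bool
needs_al_setup (bool target_64bit, int callarg2, bool target_sse,
                bool skip_rax_setup)
{
  return target_64bit
         && (callarg2 > 0
             || (callarg2 == 0 && (target_sse || !skip_rax_setup)));
}

/* Hypothetical checks walking through the cases the patch describes.  */
void
check_cases (void)
{
  /* -mno-sse -mskip-rax-setup, no vector arguments: AL setup is skipped.  */
  assert (!needs_al_setup (true, 0, false, true));
  /* -mno-sse without the option: AL is still cleared, as before.  */
  assert (needs_al_setup (true, 0, false, false));
  /* SSE enabled: the option has no effect.  */
  assert (needs_al_setup (true, 0, true, true));
  /* Arguments in vector registers: AL must always carry the count.  */
  assert (needs_al_setup (true, 2, false, true));
  /* Negative callarg2 or a 32-bit target: never set up, old or new.  */
  assert (!needs_al_setup (true, -1, true, false));
  assert (!needs_al_setup (false, 0, true, false));
}
```

Only the first case changes behavior relative to the old test; every
other combination gives the same answer as "INTVAL (callarg2) >= 0".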