From: Andi Kleen
To: gcc-patches@gcc.gnu.org
Cc: Andi Kleen
Subject: [PATCH] Add option to force indirect calls for x86
Date: Wed, 8 Nov 2017 06:32:22 -0800
Message-Id: <20171108143222.4212-1-andi@firstfloor.org>

From: Andi Kleen

This patch adds a -mforce-indirect-call option that forces all calls
and tail calls between functions on x86_64 to be indirect.

This is similar to the large code model, but it doesn't affect jumps
inside functions, so it has much less run time overhead.

This is useful with Intel Processor Trace (PT). PT has precise timing
for indirect calls/jumps, but not for direct ones. So if we force calls
to be indirect, every function can be timed relatively accurately
(minus the overhead of the indirect branch).

Without this, short functions often don't see a timing update and
cannot be measured.

The timing requires at least Skylake or Goldmont based CPUs.

I made it an option. Originally I tried to make it a new code model,
but since it can be combined with other code models (medium, pic,
kernel etc.) this turned out to be too many combinations.

For example, with gcc. The first column is a ns time stamp for the
functions.

$ perf record -e intel_pt/noretcomp=1,cyc=1,cyc_thresh=1/u ./cc1 -O3 hello.c
$ perf script --itrace=cr -F callindent,time,sym,addr --ns | sed -n 180000,182000p | less
...
1184596.432756920: build_int_cst => 79c9de c_common_nodes_and_builtins
1184596.432756921: tree_cons => ee2080 tree_cons
1184596.432756938: ggc_internal_alloc => 80f3e0 ggc_internal_alloc
1184596.432756951: memset@plt => 598af0 memset@plt
1184596.432756967: __memset_avx2_unaligned_erms => 80f605 ggc_internal_alloc
1184596.432756969: ggc_internal_alloc => ee20a2 tree_cons
1184596.432756973: tree_cons => 79c9f4 c_common_nodes_and_builtins
1184596.432756974: build_int_cst => ef9a40 build_int_cst
1184596.432756996: wide_int_to_tree => ef93a0 wide_int_to_tree
1184596.432757000: wi::force_to_size => f48f70 wi::force_to_size
1184596.432757005: canonize => ef94de wide_int_to_tree
1184596.432757021: get_int_cst_ext_nunits => ee1960 get_int_cst_ext_nunits
1184596.432757026: get_int_cst_ext_nunits => ef94fe wide_int_to_tree
1184596.432757042: tree_int_cst_elt_check => 83e310 tree_int_cst_elt_check
1184596.432757044: tree_int_cst_elt_check => ef9761 wide_int_to_tree
1184596.432757046: wide_int_to_tree => ef9a9b build_int_cst

Passes bootstrap and the test suite on x86_64. A gcc built with the
option itself also bootstraps.

gcc/:
2017-11-08  Andi Kleen

	* config/i386/i386.opt: Add -mforce-indirect-call.
	* config/i386/predicates.md: Check for flag_force_indirect_call.
	* doc/invoke.texi: Document -mforce-indirect-call.

gcc/testsuite/:
2017-11-08  Andi Kleen

	* gcc.target/i386/force-indirect-call-1.c: New test.
	* gcc.target/i386/force-indirect-call-2.c: New test.
	* gcc.target/i386/force-indirect-call-3.c: New test.
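
As a rough source-level illustration of what "indirect" means here, the
sketch below routes the same calls as the f2() test case through function
pointers. It is only an analogy, not what the option does internally: the
option changes code generation without any source changes. The names
f1_ptr, f3_ptr, f2_indirect and demo are hypothetical and not part of the
patch, and the volatile qualifier is an assumption used to keep an
optimizing compiler from turning the calls back into direct ones.

```c
#include <assert.h>

int x;
int y;

void f1(void) { x++; }
void f3(void) { y++; }

/* Hypothetical pointers, not part of the patch.  volatile forces a load
   of the target address before each call, so the compiler cannot fold
   the call back into a direct one.  */
void (*volatile f1_ptr)(void) = f1;
void (*volatile f3_ptr)(void) = f3;

/* Same call sequence as f2() in force-indirect-call-1.c, but every call
   goes through a pointer instead of naming the callee directly.  */
void f2_indirect(void)
{
  f1_ptr();
  f3_ptr();
  f1_ptr();
}

/* Small self-check: the indirect calls must reach the same callees.  */
void demo(void)
{
  x = 0;
  y = 0;
  f2_indirect();
  assert(x == 2);
  assert(y == 1);
}
```

Calling demo() from a main function runs the check. With the option
enabled, plain f1() and f3() calls in unmodified source are expected to
be emitted as indirect calls, which is what the new tests scan for.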
---
 gcc/config/i386/i386.opt                           |  4 ++++
 gcc/config/i386/predicates.md                      |  3 ++-
 gcc/doc/invoke.texi                                |  8 +++++++-
 .../gcc.target/i386/force-indirect-call-1.c        | 23 ++++++++++++++++++++++
 .../gcc.target/i386/force-indirect-call-2.c        |  5 +++++
 .../gcc.target/i386/force-indirect-call-3.c        |  5 +++++
 6 files changed, 46 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/force-indirect-call-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/force-indirect-call-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/force-indirect-call-3.c

diff --git a/gcc/config/i386/i386.opt b/gcc/config/i386/i386.opt
index 7c9dd471686..df727eee468 100644
--- a/gcc/config/i386/i386.opt
+++ b/gcc/config/i386/i386.opt
@@ -977,3 +977,7 @@ mcet-switch
 Target Report Undocumented Var(flag_cet_switch) Init(0)
 Turn on CET instrumentation for switch statements, which use jump table and
 indirect jump.
+
+mforce-indirect-call
+Target Report Var(flag_force_indirect_call) Init(0)
+Make all function calls indirect.
diff --git a/gcc/config/i386/predicates.md b/gcc/config/i386/predicates.md
index c3f442eb8ac..c6e6e980959 100644
--- a/gcc/config/i386/predicates.md
+++ b/gcc/config/i386/predicates.md
@@ -600,7 +600,8 @@
 (define_predicate "constant_call_address_operand"
   (match_code "symbol_ref")
 {
-  if (ix86_cmodel == CM_LARGE || ix86_cmodel == CM_LARGE_PIC)
+  if (ix86_cmodel == CM_LARGE || ix86_cmodel == CM_LARGE_PIC
+      || flag_force_indirect_call)
     return false;
   if (TARGET_DLLIMPORT_DECL_ATTRIBUTES && SYMBOL_REF_DLLIMPORT_P (op))
     return false;
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 2ef88e081f9..e897d93070a 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -1205,7 +1205,7 @@ See RS/6000 and PowerPC Options.
 -msse4a -m3dnow -m3dnowa -mpopcnt -mabm -mbmi -mtbm -mfma4 -mxop @gol
 -mlzcnt -mbmi2 -mfxsr -mxsave -mxsaveopt -mrtm -mlwp -mmpx @gol
 -mmwaitx -mclzero -mpku -mthreads @gol
--mcet -mibt -mshstk @gol
+-mcet -mibt -mshstk -mforce-indirect-call @gol
 -mms-bitfields -mno-align-stringops -minline-all-stringops @gol
 -minline-stringops-dynamically -mstringop-strategy=@var{alg} @gol
 -mmemcpy-strategy=@var{strategy} -mmemset-strategy=@var{strategy} @gol
@@ -26175,6 +26175,12 @@ You can control this behavior for specific functions by
 using the function attributes @code{ms_abi} and @code{sysv_abi}.
 @xref{Function Attributes}.
 
+@item -mforce-indirect-call
+@opindex mforce-indirect-call
+Force all calls to functions to be indirect. This is useful
+when using Intel Processor Trace where it generates more precise timing
+information for function calls.
+
 @item -mcall-ms2sysv-xlogues
 @opindex mcall-ms2sysv-xlogues
 @opindex mno-call-ms2sysv-xlogues
diff --git a/gcc/testsuite/gcc.target/i386/force-indirect-call-1.c b/gcc/testsuite/gcc.target/i386/force-indirect-call-1.c
new file mode 100644
index 00000000000..be1be2c879e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/force-indirect-call-1.c
@@ -0,0 +1,23 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mforce-indirect-call" } */
+/* { dg-final { scan-assembler-times "call\[ \\t\]+\\*%" 2 } } */
+/* { dg-final { scan-assembler-times "jmp\[ \\t\]+\\*%" 1 } } */
+int x;
+int y;
+
+void __attribute__((noinline)) f1(void)
+{
+  x++;
+}
+
+static __attribute__((noinline)) void f3(void)
+{
+  y++;
+}
+
+void f2()
+{
+  f1();
+  f3();
+  f1();
+}
diff --git a/gcc/testsuite/gcc.target/i386/force-indirect-call-2.c b/gcc/testsuite/gcc.target/i386/force-indirect-call-2.c
new file mode 100644
index 00000000000..dd0df259ab8
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/force-indirect-call-2.c
@@ -0,0 +1,5 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mforce-indirect-call -fPIC" } */
+/* { dg-final { scan-assembler-times "call\[ \\t\]+\\*%" 2 } } */
+/* { dg-final { scan-assembler-times "jmp\[ \\t\]+\\*%" 1 } } */
+#include "force-indirect-call-1.c"
diff --git a/gcc/testsuite/gcc.target/i386/force-indirect-call-3.c b/gcc/testsuite/gcc.target/i386/force-indirect-call-3.c
new file mode 100644
index 00000000000..28d8c98b7b9
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/force-indirect-call-3.c
@@ -0,0 +1,5 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mforce-indirect-call -mcmodel=medium" } */
+/* { dg-final { scan-assembler-times "call\[ \\t\]+\\*%" 2 } } */
+/* { dg-final { scan-assembler-times "jmp\[ \\t\]+\\*%" 1 } } */
+#include "force-indirect-call-1.c"