From patchwork Fri May 10 02:30:01 2013
X-Patchwork-Submitter: Sriraman Tallam
X-Patchwork-Id: 242885
Delivered-To: mailing list gcc-patches@gcc.gnu.org
Date: Thu, 9 May 2013 19:30:01 -0700
Subject: [PATCH] Dynamic dispatch of multiversioned functions and CPU mocks for code coverage.
From: Sriraman Tallam
To: GCC Patches, David Li, Jason Merrill, "H.J. Lu"

Hi,

This patch is an enhancement to the Function Multiversioning feature. It achieves two things:

* Primarily, it makes it easy to test for code coverage of multiversioned functions.
* Secondarily, it makes function multiversioning work when there is no ifunc support.

Because the dispatcher is invoked on every call, it is possible to execute different function versions within the same run. This incurs a performance penalty.
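To illustrate the trade-off just described, here is a minimal sketch. It is not the code this patch generates; it only contrasts a choice that is made once (as an IFUNC resolver would make it) with a choice that is re-made on every call. All names in it (feature_sse, select_foo, foo_resolved_once, foo_dispatch_every_call) are hypothetical, and a plain boolean stands in for a real CPU feature check.

```cpp
#include <cassert>

// Hypothetical stand-ins; none of these names come from the patch.
static bool feature_sse = true;            // pretend CPU feature flag
static int foo_default() { return 0; }     // plays the "default" version
static int foo_sse()     { return 1; }     // plays the "sse" version

// The selection logic that both schemes share.
using foo_fn = int (*)();
static foo_fn select_foo() { return feature_sse ? foo_sse : foo_default; }

// IFUNC-like behaviour: the choice is made on first use and then cached.
static int foo_resolved_once() {
  static const foo_fn chosen = select_foo();
  return chosen();
}

// Dynamic dispatch: the choice is re-made on every call, so a later change
// to the (mocked) feature flag is observed, at the cost of an extra check.
static int foo_dispatch_every_call() { return select_foo()(); }
```

If both functions are called once while feature_sse is true and the flag is then set to false, foo_dispatch_every_call starts returning 0 while foo_resolved_once keeps returning 1. That is why per-call dispatch is what makes mocking possible, and also where the extra cost per call comes from.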
This patch makes it easy to test for code coverage of multiversioned functions. Here is a motivating example:

  __attribute__((target ("default"))) int foo () { ... return 0; }
  __attribute__((target ("sse"))) int foo () { ... return 1; }
  __attribute__((target ("popcnt"))) int foo () { ... return 2; }

  int main ()
  {
    return foo();
  }

Let's say your test CPU supports popcnt. A run of this program will invoke the popcnt version of foo (). How, then, do we test the sse version of foo ()? For the above example, we would need to run this code on a CPU that has sse support but no popcnt support. Otherwise, we would need to comment out the popcnt version and run the example again. This gets painful when there are many versions. The same argument applies to testing the default version of foo.

So, I am introducing the ability to mock a CPU. If the CPU you are testing on supports sse, you should be able to test the sse version.

First, I have introduced a new flag called -fmultiversion-dynamic-dispatch. With this flag, the function version dispatcher is invoked every time a call to foo () is made. Without the flag, the version dispatch happens once at startup time via the IFUNC mechanism.

Also, with -fmultiversion-dynamic-dispatch, the version dispatcher uses the two new builtins "__builtin_mock_cpu_is" and "__builtin_mock_cpu_supports" to check the cpu type and cpu isa.

Then, I plan to add the following hooks to libgcc (in a different patch):

  int set_mock_cpu_is (const char *cpu);
  int set_mock_cpu_supports (const char *isa);
  int init_mock_cpu (); // Clear the values of the mock cpu.

With this support, here is how you can test for code coverage of the "sse" version and the "default" version of foo in the above example:

  int main ()
  {
    // Test SSE version.
    if (__builtin_cpu_supports ("sse"))
      {
        init_mock_cpu();
        set_mock_cpu_supports ("sse");
        assert (foo () == 1);
      }
    // Test default version.
    init_mock_cpu();
    assert (foo () == 0);
  }

Invoking a multiversioned binary several times with appropriate mock cpu values for the various ISAs and CPUs will give the complete code coverage desired. Of course, the underlying platform must be able to support the various features.

Note that the above test will work only with -fmultiversion-dynamic-dispatch, as the dispatcher must be invoked on every multiversioned call to be able to change the version dynamically.

Multiple ISA features can be set in the mock cpu by calling "set_mock_cpu_supports" several times with different ISA names. Calling "init_mock_cpu" will clear all the values. "set_mock_cpu_is" will set the CPU type.

This patch only includes the gcc changes. I will separately prepare a patch for the libgcc changes. Right now, since the libgcc changes are not available, the two new mock cpu builtins check the real CPU, like "__builtin_cpu_is" and "__builtin_cpu_supports".

Patch attached. Please look at mv14_debug_code_coverage.C for an exhaustive example of testing for code coverage in the presence of multiple versions.

This patch was already discussed when it was sent earlier to the google/gcc-4_7 branch:
http://gcc.gnu.org/ml/gcc-patches/2013-03/msg00557.html

Some of the alternatives suggested there are:

* Lazy IFUNC relocation, which got shot down because of bad interactions with other shared libraries.
* Using environment variables to mock CPU architectures. This may still be plausible. For instance:

    LD_CPU_FEATURES=sse,sse2 ./a.out # run as if only sse and sse2 are available

However, dynamic dispatch has the unique advantage of executing different function versions in the same execution.

Comments please.

Thanks
Sri
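The mock-CPU hooks named in this message (set_mock_cpu_is, set_mock_cpu_supports, init_mock_cpu) are planned for libgcc and are not part of this patch, so their implementation is not shown anywhere. As a rough sketch only, under the assumption that the mock state is one CPU name plus a set of ISA names, they might behave as below. The namespace mock_sketch, the two query functions mock_cpu_is and mock_cpu_supports (standing in for the __builtin_mock_cpu_* builtins), and the "0 means success" return value are all assumptions made for illustration.

```cpp
#include <cassert>
#include <set>
#include <string>

namespace mock_sketch {  // hypothetical namespace, not part of the patch

// Assumed state of the mocked CPU: one CPU type plus a set of ISA features.
static std::string mock_cpu;
static std::set<std::string> mock_isas;

// Clear the values of the mock cpu.
int init_mock_cpu() {
  mock_cpu.clear();
  mock_isas.clear();
  return 0;  // assumption: 0 means success
}

// Set the CPU type, e.g. "corei7".
int set_mock_cpu_is(const char *cpu) {
  mock_cpu = cpu;
  return 0;
}

// May be called several times; the ISA features accumulate.
int set_mock_cpu_supports(const char *isa) {
  mock_isas.insert(isa);
  return 0;
}

// Hypothetical query side, playing the role of the mock builtins.
int mock_cpu_is(const char *cpu) { return mock_cpu == cpu; }
int mock_cpu_supports(const char *isa) { return mock_isas.count(isa) != 0; }

}  // namespace mock_sketch
```

With this shape, calling set_mock_cpu_supports for "sse" and then for "popcnt" leaves both features set, and a following init_mock_cpu clears them again, which matches the behaviour the message describes for the planned hooks.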

	* cgraphunit.c (cgraph_analyze_function): Pass value of
	-fmultiversion-dynamic-dispatch when building resolver.
	* common.opt (fmultiversion-dynamic-dispatch): New flag.
	* target.def (generate_version_dispatcher_body): New parameter.
	* doc/tm.texi (TARGET_GENERATE_VERSION_DISPATCHER_BODY): Regenerate.
	* doc/tm.texi.in (TARGET_GENERATE_VERSION_DISPATCHER_BODY): Update.
	* doc/invoke.texi (-fmultiversion-dynamic-dispatch): Document new flag.
	* testsuite/g++.dg/ext/mv1_debug.C: New test.
	* testsuite/g++.dg/ext/mv2_debug.C: New test.
	* testsuite/g++.dg/ext/mv6_debug.C: New test.
	* testsuite/g++.dg/ext/mv14_debug_code_coverage.C: New test.
	* config/i386/i386.c (IX86_BUILTIN_MOCK_CPU_IS): New enum value.
	(IX86_BUILTIN_MOCK_CPU_SUPPORTS): Ditto.
	(add_condition_to_bb): New parameter. Handle code gen when dynamic
	dispatch is needed.
	(get_builtin_code_for_version): New parameter. Handle dynamic dispatch.
	(ix86_compare_version_priority): Call get_builtin_code_for_version with
	updated parameters.
	(dispatch_function_versions): New parameter. Handle dynamic dispatch.
	(make_resolver_func): New parameter. Handle dynamic dispatch.
	(ix86_generate_version_dispatcher_body): Ditto.
	(ix86_init_platform_type_builtins): New builtins.
	(ix86_expand_builtin): Expand new builtins.

Index: cgraphunit.c
===================================================================
--- cgraphunit.c	(revision 198754)
+++ cgraphunit.c	(working copy)
@@ -640,7 +640,13 @@ cgraph_analyze_function (struct cgraph_node *node)
             {
               tree resolver = NULL_TREE;
               gcc_assert (targetm.generate_version_dispatcher_body);
-              resolver = targetm.generate_version_dispatcher_body (node);
+              /* When -fmultiversion-dynamic-dispatch is not turned on, the
+                 dispatcher should be invoked optimally (once using ifunc support).
+                 When -fmultiversion-dynamic-dispatch is on, the dispatcher should
+                 be invoked every time a call to the multiversioned function is
+                 made. */
+              resolver = targetm.generate_version_dispatcher_body (node,
+                             flag_multiversion_dynamic_dispatch);
               gcc_assert (resolver != NULL_TREE);
             }
         }
Index: common.opt
===================================================================
--- common.opt	(revision 198754)
+++ common.opt	(working copy)
@@ -1555,6 +1555,10 @@ fmove-loop-invariants
 Common Report Var(flag_move_loop_invariants) Init(1) Optimization
 Move loop invariant computations out of loops
 
+fmultiversion-dynamic-dispatch
+Common Report Var(flag_multiversion_dynamic_dispatch) Init(0)
+Invoke the function version dispatcher for every multiversioned function call.
+
 fdce
 Common Var(flag_dce) Init(1) Optimization
 Use the RTL dead code elimination pass
Index: target.def
===================================================================
--- target.def	(revision 198754)
+++ target.def	(working copy)
@@ -1323,11 +1323,12 @@ DEFHOOK
 /* Target hook is used to generate the dispatcher logic to invoke the right
    function version at run-time for a given set of function versions.
    ARG points to the callgraph node of the dispatcher function whose body
-   must be generated. */
+   must be generated. The version dispatcher is invoked on every call when
+   debug_mode is 1. */
 DEFHOOK
 (generate_version_dispatcher_body,
  "",
- tree, (void *arg), NULL)
+ tree, (void *arg, int debug_mode), NULL)
 
 /* Target hook is used to get the dispatcher function for a set of function
    versions. The dispatcher function is called to invoke the right function
Index: doc/tm.texi
===================================================================
--- doc/tm.texi	(revision 198754)
+++ doc/tm.texi	(working copy)
@@ -10961,11 +10961,13 @@ version at run-time. @var{decl} is one version fro
 identical versions.
 @end deftypefn
 
-@deftypefn {Target Hook} tree TARGET_GENERATE_VERSION_DISPATCHER_BODY (void *@var{arg})
+@deftypefn {Target Hook} tree TARGET_GENERATE_VERSION_DISPATCHER_BODY (void *@var{arg}, int @var{debug_mode})
 This hook is used to generate the dispatcher logic to invoke the right
 function version at run-time for a given set of function versions.
 @var{arg} points to the callgraph node of the dispatcher function whose
-body must be generated.
+body must be generated. When @var{debug_mode} is 1, the dispatcher
+logic is invoked on every call. Otherwise, the dispatcher is invoked
+only at start up to minimize call overhead.
 @end deftypefn
 
 @deftypefn {Target Hook} {const char *} TARGET_INVALID_WITHIN_DOLOOP (const_rtx @var{insn})
Index: doc/tm.texi.in
===================================================================
--- doc/tm.texi.in	(revision 198754)
+++ doc/tm.texi.in	(working copy)
@@ -10804,7 +10804,9 @@ identical versions.
 This hook is used to generate the dispatcher logic to invoke the right
 function version at run-time for a given set of function versions.
 @var{arg} points to the callgraph node of the dispatcher function whose
-body must be generated.
+body must be generated. When @var{debug_mode} is 1, the dispatcher
+logic is invoked on every call. Otherwise, the dispatcher is invoked
+only at start up to minimize call overhead.
 @end deftypefn
 
 @hook TARGET_INVALID_WITHIN_DOLOOP
Index: doc/invoke.texi
===================================================================
--- doc/invoke.texi	(revision 198754)
+++ doc/invoke.texi	(working copy)
@@ -178,6 +178,7 @@ in the following sections.
 @xref{C++ Dialect Options,,Options Controlling C++ Dialect}.
 @gccoptlist{-fabi-version=@var{n} -fno-access-control -fcheck-new @gol
 -fconstexpr-depth=@var{n} -ffriend-injection @gol
+-fmultiversion-dynamic-dispatch @gol
 -fno-elide-constructors @gol
 -fno-enforce-eh-specs @gol
 -ffor-scope -fno-for-scope -fno-gnu-keywords @gol
@@ -2023,6 +2024,13 @@ earlier releases.
 This option is for compatibility, and may be removed in a future
 release of G++.
 
+@item -fmultiversion-dynamic-dispatch
+@opindex fmultiversion-dynamic-dispatch
+When using function multiversioning, the function versions dispatcher is
+invoked only once at start-up using IFUNC support to minimize call overhead.
+This flag can be used to instead invoke the dispatcher every time a call to
+a multiversioned function is made.
+
 @item -fno-elide-constructors
 @opindex fno-elide-constructors
 The C++ standard allows an implementation to omit creating a temporary
Index: testsuite/g++.dg/ext/mv1_debug.C
===================================================================
--- testsuite/g++.dg/ext/mv1_debug.C	(revision 0)
+++ testsuite/g++.dg/ext/mv1_debug.C	(revision 0)
@@ -0,0 +1,4 @@
+/* Test case to check if mv1.C works with -fmultiversion-dynamic-dispatch additionally added. */
+/* { dg-do run { target i?86-*-* x86_64-*-* } } */
+/* { dg-options "-O2 -fPIC -fmultiversion-dynamic-dispatch" } */
+/* { dg-additional-sources "mv1.C" } */
Index: testsuite/g++.dg/ext/mv14_debug_code_coverage.C
===================================================================
--- testsuite/g++.dg/ext/mv14_debug_code_coverage.C	(revision 0)
+++ testsuite/g++.dg/ext/mv14_debug_code_coverage.C	(revision 0)
@@ -0,0 +1,214 @@
+/* Test case to show how code coverage testing of of a multiversioned function
+   can be done using cpu mocks. */
+/* { dg-do run { target i?86-*-* x86_64-*-* } } */
+/* { dg-options "-O2 -fmultiversion-dynamic-dispatch" } */
+
+#include <assert.h>
+#include <string.h>
+
+/* Temporary code till the libgcc hooks for this are checked in. Override
+   __builtin_mock_cpu_* builtins to change the mock cpu. */
+const char *mock_cpu = NULL;
+int __builtin_mock_cpu_is (const char *cpu)
+{
+  if (strcmp (cpu, mock_cpu) == 0)
+    return 1;
+  return 0;
+}
+
+/* Temporary code till the libgcc hooks for this are checked in.
+   Only mock one ISA type. The libgcc hooks will allow mocking multiple
+   ISA features together, like popcnt and avx2. */
+const char *mock_isa = NULL;
+int __builtin_mock_cpu_supports (const char *isa)
+{
+  if (strcmp (isa, mock_isa) == 0)
+    return 1;
+  return 0;
+}
+/* End of temporary code. */
+
+
+/* Default version. */
+int foo () __attribute__ ((target ("default")));
+
+int foo () __attribute__ ((target ("mmx")));
+int foo () __attribute__ ((target ("sse")));
+int foo () __attribute__ ((target ("sse2")));
+int foo () __attribute__ ((target ("sse3")));
+int foo () __attribute__ ((target ("ssse3")));
+int foo () __attribute__ ((target ("sse4.1")));
+int foo () __attribute__ ((target ("sse4.2")));
+int foo () __attribute__ ((target ("popcnt")));
+int foo () __attribute__ ((target ("avx")));
+int foo () __attribute__ ((target ("avx2")));
+
+int foo () __attribute__ ((target ("arch=corei7")));
+
+int main ()
+{
+  /* Using CPU mocks run each version of foo() when possible and
+     check the return value. */
+
+  /* Run Intel corei7 version if possible. Test if this
+     CPU can mock corei7. It should support SSE4.2 and
+     below, SSSE3 and MMX. */
+  if (__builtin_cpu_supports ("sse4.2")
+      && __builtin_cpu_supports ("ssse3")
+      && __builtin_cpu_supports ("mmx"))
+    {
+      mock_cpu = "corei7";
+      mock_isa = "";
+      assert (foo () == 11);
+    }
+
+  /* Run avx2 version if possible. */
+  if (__builtin_cpu_supports ("avx2"))
+    {
+      mock_cpu = "";
+      mock_isa = "avx2";
+      assert (foo () == 1);
+    }
+  /* Run avx version if possible. */
+  if (__builtin_cpu_supports ("avx"))
+    {
+      mock_cpu = "";
+      mock_isa = "avx";
+      assert (foo () == 2);
+    }
+  /* Run popcnt version if possible. */
+  if (__builtin_cpu_supports ("popcnt"))
+    {
+      mock_cpu = "";
+      mock_isa = "popcnt";
+      assert (foo () == 3);
+    }
+  /* Run sse4.2 version if possible. */
+  if (__builtin_cpu_supports ("sse4.2"))
+    {
+      mock_cpu = "";
+      mock_isa = "sse4.2";
+      assert (foo () == 4);
+    }
+  /* Run sse4.1 version if possible. */
+  if (__builtin_cpu_supports ("sse4.1"))
+    {
+      mock_cpu = "";
+      mock_isa = "sse4.1";
+      assert (foo () == 5);
+    }
+  /* Run ssse3 version if possible. */
+  if (__builtin_cpu_supports ("ssse3"))
+    {
+      mock_cpu = "";
+      mock_isa = "ssse3";
+      assert (foo () == 6);
+    }
+  /* Run sse3 version if possible. */
+  if (__builtin_cpu_supports ("sse3"))
+    {
+      mock_cpu = "";
+      mock_isa = "sse3";
+      assert (foo () == 7);
+    }
+  /* Run sse2 version if possible. */
+  if (__builtin_cpu_supports ("sse2"))
+    {
+      mock_cpu = "";
+      mock_isa = "sse2";
+      assert (foo () == 8);
+    }
+  /* Run sse version if possible. */
+  if (__builtin_cpu_supports ("sse"))
+    {
+      mock_cpu = "";
+      mock_isa = "sse";
+      assert (foo () == 9);
+    }
+  /* Run mmx version if possible. */
+  if (__builtin_cpu_supports ("mmx"))
+    {
+      mock_cpu = "";
+      mock_isa = "mmx";
+      assert (foo () == 10);
+    }
+
+  /* Run the default version. */
+  mock_cpu = "";
+  mock_isa = "";
+  assert (foo () == 0);
+
+  return 0;
+}
+
+int __attribute__ ((target("default")))
+foo ()
+{
+  return 0;
+}
+
+int __attribute__ ((target("arch=corei7")))
+foo ()
+{
+  return 11;
+}
+
+int __attribute__ ((target("mmx")))
+foo ()
+{
+  return 10;
+}
+
+int __attribute__ ((target("sse")))
+foo ()
+{
+  return 9;
+}
+
+int __attribute__ ((target("sse2")))
+foo ()
+{
+  return 8;
+}
+
+int __attribute__ ((target("sse3")))
+foo ()
+{
+  return 7;
+}
+
+int __attribute__ ((target("ssse3")))
+foo ()
+{
+  return 6;
+}
+
+int __attribute__ ((target("sse4.1")))
+foo ()
+{
+  return 5;
+}
+
+int __attribute__ ((target("sse4.2")))
+foo ()
+{
+  return 4;
+}
+
+int __attribute__ ((target("popcnt")))
+foo ()
+{
+  return 3;
+}
+
+int __attribute__ ((target("avx")))
+foo ()
+{
+  return 2;
+}
+
+int __attribute__ ((target("avx2")))
+foo ()
+{
+  return 1;
+}
Index: testsuite/g++.dg/ext/mv2_debug.C
===================================================================
--- testsuite/g++.dg/ext/mv2_debug.C	(revision 0)
+++ testsuite/g++.dg/ext/mv2_debug.C	(revision 0)
@@ -0,0 +1,4 @@
+/* Test case to check if mv2.C works with -fmultiversion-dynamic-dispatch additionally added. */
+/* { dg-do run { target i?86-*-* x86_64-*-* } } */
+/* { dg-options "-O2 -fmultiversion-dynamic-dispatch" } */
+/* { dg-additional-sources "mv2.C" } */
Index: testsuite/g++.dg/ext/mv6_debug.C
===================================================================
--- testsuite/g++.dg/ext/mv6_debug.C	(revision 0)
+++ testsuite/g++.dg/ext/mv6_debug.C	(revision 0)
@@ -0,0 +1,4 @@
+/* Test case to check if mv6.C works with -fmultiversion-dynamic-dispatch additionally added. */
+/* { dg-do run { target i?86-*-* x86_64-*-* } } */
+/* { dg-options "-march=x86-64 -fmultiversion-dynamic-dispatch" } */
+/* { dg-additional-sources "mv6.C" } */
Index: config/i386/i386.c
===================================================================
--- config/i386/i386.c	(revision 198754)
+++ config/i386/i386.c	(working copy)
@@ -26779,6 +26779,11 @@ enum ix86_builtins
   IX86_BUILTIN_CPU_IS,
   IX86_BUILTIN_CPU_SUPPORTS,
 
+  /* Builtins to mock CPU and ISA features, for
+     testing multiversioned functions. */
+  IX86_BUILTIN_MOCK_CPU_IS,
+  IX86_BUILTIN_MOCK_CPU_SUPPORTS,
+
   IX86_BUILTIN_MAX
 };
 
@@ -28631,11 +28636,14 @@ ix86_init_mmx_sse_builtins (void)
    to return a pointer to VERSION_DECL if the outcome of the expression
    formed by PREDICATE_CHAIN is true. This function will be called during
    version dispatch to decide which function version to execute. It returns
-   the basic block at the end, to which more conditions can be added. */
+   the basic block at the end, to which more conditions can be added. When
+   DEBUG_MODE is 1, the version dispatcher is invoked for every call
+   to the multiversioned function. */
 
 static basic_block
 add_condition_to_bb (tree function_decl, tree version_decl,
-                     tree predicate_chain, basic_block new_bb)
+                     tree predicate_chain, basic_block new_bb,
+                     int debug_mode)
 {
   gimple return_stmt;
   tree convert_expr, result_var;
@@ -28656,11 +28664,43 @@ add_condition_to_bb (tree function_decl, tree vers
   gcc_assert (new_bb != NULL);
   gseq = bb_seq (new_bb);
 
+  /* If debug_mode is true, generate a call to the versioned function
+     and return the output of the call. Otherwise, return a pointer to
+     the versioned function. */
 
-  convert_expr = build1 (CONVERT_EXPR, ptr_type_node,
-                         build_fold_addr_expr (version_decl));
-  result_var = create_tmp_var (ptr_type_node, NULL);
-  convert_stmt = gimple_build_assign (result_var, convert_expr);
+  if (debug_mode)
+    {
+      tree arg;
+      tree ret_type = TREE_TYPE (TREE_TYPE (function_decl));
+      vec<tree> tmp_vec = vNULL;
+      tmp_vec.create (2);
+
+      arg = DECL_ARGUMENTS (function_decl);
+
+      while (arg)
+        {
+          tmp_vec.safe_push (arg);
+          arg = DECL_CHAIN (arg);
+        }
+
+      convert_stmt = gimple_build_call_vec (version_decl, tmp_vec);
+      tmp_vec.release ();
+      result_var = NULL;
+
+      if (ret_type != void_type_node)
+        {
+          result_var = DECL_RESULT (function_decl);
+          gimple_call_set_lhs (convert_stmt, result_var);
+        }
+    }
+  else
+    {
+      convert_expr = build1 (CONVERT_EXPR, ptr_type_node,
+                             build_fold_addr_expr (version_decl));
+      result_var = DECL_RESULT (function_decl);
+      convert_stmt = gimple_build_assign (result_var, convert_expr);
+    }
+
   return_stmt = gimple_build_return (result_var);
 
   if (predicate_chain == NULL_TREE)
@@ -28742,10 +28782,11 @@ add_condition_to_bb (tree function_decl, tree vers
    the right builtin to use to match the platform specification.
    It returns the priority value for this version decl. If PREDICATE_LIST
    is not NULL, it stores the list of cpu features that need to be checked
-   before dispatching this function. */
+   before dispatching this function. When debug_mode is 1, use the mock
+   cpu check builtins to do the dispatch. */
 
 static unsigned int
-get_builtin_code_for_version (tree decl, tree *predicate_list)
+get_builtin_code_for_version (tree decl, tree *predicate_list, int debug_mode)
 {
   tree attrs;
   struct cl_target_option cur_target;
@@ -28882,7 +28923,10 @@ static unsigned int
 
       if (predicate_list)
         {
-          predicate_decl = ix86_builtins [(int) IX86_BUILTIN_CPU_IS];
+          if (debug_mode)
+            predicate_decl = ix86_builtins [(int) IX86_BUILTIN_MOCK_CPU_IS];
+          else
+            predicate_decl = ix86_builtins [(int) IX86_BUILTIN_CPU_IS];
           /* For a C string literal the length includes the trailing NULL. */
           predicate_arg = build_string_literal (strlen (arg_str) + 1, arg_str);
           predicate_chain = tree_cons (predicate_decl, predicate_arg,
@@ -28894,8 +28938,12 @@ static unsigned int
   tok_str = (char *) xmalloc (strlen (attrs_str) + 1);
   strcpy (tok_str, attrs_str);
   token = strtok (tok_str, ",");
-  predicate_decl = ix86_builtins [(int) IX86_BUILTIN_CPU_SUPPORTS];
+  if (debug_mode)
+    predicate_decl = ix86_builtins [(int) IX86_BUILTIN_MOCK_CPU_SUPPORTS];
+  else
+    predicate_decl = ix86_builtins [(int) IX86_BUILTIN_CPU_SUPPORTS];
+
 
   while (token != NULL)
     {
       /* Do not process "arch=" */
@@ -28957,8 +29005,8 @@ static unsigned int
 static int
 ix86_compare_version_priority (tree decl1, tree decl2)
 {
-  unsigned int priority1 = get_builtin_code_for_version (decl1, NULL);
-  unsigned int priority2 = get_builtin_code_for_version (decl2, NULL);
+  unsigned int priority1 = get_builtin_code_for_version (decl1, NULL, false);
+  unsigned int priority2 = get_builtin_code_for_version (decl2, NULL, false);
 
   return (int)priority1 - (int)priority2;
 }
@@ -28985,12 +29033,15 @@ feature_compare (const void *v1, const void *v2)
    multi-versioned functions. DISPATCH_DECL is the function which will
    contain the dispatch logic. FNDECLS are the function choices for
    dispatch, and is a tree chain. EMPTY_BB is the basic block pointer
-   in DISPATCH_DECL in which the dispatch code is generated. */
+   in DISPATCH_DECL in which the dispatch code is generated. When
+   DEBUG_MODE is 1, the version dispatcher is invoked for every call
+   to the multiversioned function. */
 
 static int
 dispatch_function_versions (tree dispatch_decl,
                             void *fndecls_p,
-                            basic_block *empty_bb)
+                            basic_block *empty_bb,
+                            int debug_mode)
 {
   tree default_decl;
   gimple ifunc_cpu_init_stmt;
@@ -29048,8 +29099,8 @@ dispatch_function_versions (tree dispatch_decl,
       /* Get attribute string, parse it and find the right predicate decl.
          The predicate function could be a lengthy combination of many
          features, like arch-type and various isa-variants. */
-      priority = get_builtin_code_for_version (version_decl,
-                                               &predicate_chain);
+      priority = get_builtin_code_for_version (version_decl, &predicate_chain,
+                                               debug_mode);
 
       if (predicate_chain == NULL_TREE)
         continue;
@@ -29072,11 +29123,11 @@ dispatch_function_versions (tree dispatch_decl,
     *empty_bb = add_condition_to_bb (dispatch_decl,
                                      function_version_info[i].version_decl,
                                      function_version_info[i].predicate_chain,
-                                     *empty_bb);
+                                     *empty_bb, debug_mode);
 
   /* dispatch default version at the end. */
   *empty_bb = add_condition_to_bb (dispatch_decl, default_decl,
-                                   NULL, *empty_bb);
+                                   NULL, *empty_bb, debug_mode);
 
   free (function_version_info);
   return 0;
@@ -29446,7 +29497,7 @@ ix86_get_function_versions_dispatcher (void *decl)
   default_node = default_version_info->this_node;
 
 #if defined (ASM_OUTPUT_TYPE_DIRECTIVE)
-  if (targetm.has_ifunc_p ())
+  if (targetm.has_ifunc_p () || flag_multiversion_dynamic_dispatch)
     {
       struct cgraph_function_version_info *it_v = NULL;
       struct cgraph_node *dispatcher_node = NULL;
@@ -29475,8 +29526,9 @@ ix86_get_function_versions_dispatcher (void *decl)
 #endif
     {
       error_at (DECL_SOURCE_LOCATION (default_node->symbol.decl),
-                "multiversioning needs ifunc which is not supported "
-                "on this target");
+                "multiversioning needs ifunc"
+                " (or use -fmultiversion-dynamic-dispatch)"
+                " which is not supported on this target");
     }
 
   return dispatch_decl;
@@ -29503,15 +29555,19 @@ make_attribute (const char *name, const char *arg_
 /* Make the resolver function decl to dispatch the versions of
    a multi-versioned function, DEFAULT_DECL. Create an
    empty basic block in the resolver and store the pointer in
-   EMPTY_BB. Return the decl of the resolver function. */
+   EMPTY_BB. Return the decl of the resolver function. When
+   DEBUG_MODE is 1, the resolver function body is not an
+   ifunc resolver; it simply calls the appropriate function
+   version and returns the call output. */
 
 static tree
 make_resolver_func (const tree default_decl,
                     const tree dispatch_decl,
-                    basic_block *empty_bb)
+                    basic_block *empty_bb,
+                    int debug_mode)
 {
   char *resolver_name;
-  tree decl, type, decl_name, t;
+  tree decl, type, decl_name, t = NULL;
   bool is_uniq = false;
 
   /* IFUNC's have to be globally visible. So, if the default_decl is
@@ -29526,8 +29582,19 @@ make_resolver_func (const tree default_decl,
      another module which is based on the same version name. */
   resolver_name = make_name (default_decl, "resolver", is_uniq);
 
-  /* The resolver function should return a (void *). */
-  type = build_function_type_list (ptr_type_node, NULL_TREE);
+  if (debug_mode)
+    {
+      /* In debug_mode, the resolver function calls the appropriate
+         function version. Its type is same as dispatch_decl. */
+      tree fn_type = TREE_TYPE (dispatch_decl);
+      type = build_function_type (TREE_TYPE (fn_type),
+                                  TYPE_ARG_TYPES (fn_type));
+    }
+  else
+    {
+      /* The resolver function should return a (void *). */
+      type = build_function_type_list (ptr_type_node, NULL_TREE);
+    }
 
   decl = build_fn_decl (resolver_name, type);
   decl_name = get_identifier (resolver_name);
@@ -29549,6 +29616,16 @@ make_resolver_func (const tree default_decl,
   DECL_INITIAL (decl) = make_node (BLOCK);
   DECL_STATIC_CONSTRUCTOR (decl) = 0;
 
+  /* In debug_mode, the resolver function is not an ifunc resolver. Its
+     signature is the same as the dispatch_decl or default_decl. */
+  if (debug_mode)
+    {
+      tree arg;
+      DECL_ARGUMENTS (decl) = copy_list (DECL_ARGUMENTS (default_decl));
+      for (arg = DECL_ARGUMENTS (decl); arg ; arg = DECL_CHAIN (arg))
+        DECL_CONTEXT (arg) = decl;
+    }
+
   if (DECL_COMDAT_GROUP (default_decl)
       || TREE_PUBLIC (default_decl))
     {
@@ -29559,7 +29636,9 @@ make_resolver_func (const tree default_decl,
       make_decl_one_only (decl, DECL_ASSEMBLER_NAME (decl));
     }
   /* Build result decl and add to function_decl. */
-  t = build_decl (UNKNOWN_LOCATION, RESULT_DECL, NULL_TREE, ptr_type_node);
+  t = build_decl (UNKNOWN_LOCATION, RESULT_DECL, NULL_TREE,
+                  TREE_TYPE (TREE_TYPE (decl)));
+
   DECL_ARTIFICIAL (t) = 1;
   DECL_IGNORED_P (t) = 1;
   DECL_RESULT (decl) = t;
@@ -29574,9 +29653,17 @@ make_resolver_func (const tree default_decl,
   pop_cfun ();
 
   gcc_assert (dispatch_decl != NULL);
-  /* Mark dispatch_decl as "ifunc" with resolver as resolver_name. */
-  DECL_ATTRIBUTES (dispatch_decl)
-    = make_attribute ("ifunc", resolver_name, DECL_ATTRIBUTES (dispatch_decl));
+
+  /* Mark dispatch_decl as "alias" or "ifunc" with resolver as
+     resolver_name. */
+  if (debug_mode)
+    DECL_ATTRIBUTES (dispatch_decl)
+      = make_attribute ("alias", resolver_name,
+                        DECL_ATTRIBUTES (dispatch_decl));
+  else
+    DECL_ATTRIBUTES (dispatch_decl)
+      = make_attribute ("ifunc", resolver_name,
+                        DECL_ATTRIBUTES (dispatch_decl));
 
   /* Create the alias for dispatch to resolver here. */
   /*cgraph_create_function_alias (dispatch_decl, decl);*/
@@ -29588,10 +29675,13 @@ make_resolver_func (const tree default_decl,
 /* Generate the dispatching code body to dispatch multi-versioned function
    DECL. The target hook is called to process the "target" attributes and
    provide the code to dispatch the right function at run-time. NODE points
-   to the dispatcher decl whose body will be created. */
+   to the dispatcher decl whose body will be created. When DEBUG_MODE is
+   1, the dispatch checks should be made during every call to the versioned
+   function. When DEBUG_MODE is 0, ifunc based dispatching is used to
+   keep the call overhead small. */
 
 static tree
-ix86_generate_version_dispatcher_body (void *node_p)
+ix86_generate_version_dispatcher_body (void *node_p, int debug_mode)
 {
   tree resolver_decl;
   basic_block empty_bb;
@@ -29618,8 +29708,8 @@ static tree
   /* node is going to be an alias, so remove the finalized bit. */
   node->local.finalized = false;
 
-  resolver_decl = make_resolver_func (default_ver_decl,
-                                      node->symbol.decl, &empty_bb);
+  resolver_decl = make_resolver_func (default_ver_decl, node->symbol.decl,
+                                      &empty_bb, debug_mode);
 
   node_version_info->dispatcher_resolver = resolver_decl;
 
@@ -29642,7 +29732,8 @@ static tree
       fn_ver_vec.safe_push (versn->symbol.decl);
     }
 
-  dispatch_function_versions (resolver_decl, &fn_ver_vec, &empty_bb);
+  dispatch_function_versions (resolver_decl, &fn_ver_vec,
+                              &empty_bb, debug_mode);
   fn_ver_vec.release ();
   rebuild_cgraph_edges ();
   pop_cfun ();
@@ -29828,7 +29919,8 @@ fold_builtin_cpu (tree fndecl, tree *args)
 
   gcc_assert (param_string_cst);
 
-  if (fn_code == IX86_BUILTIN_CPU_IS)
+  if (fn_code == IX86_BUILTIN_CPU_IS
+      || fn_code == IX86_BUILTIN_MOCK_CPU_IS)
     {
       tree ref;
       tree field;
@@ -29877,7 +29969,8 @@ fold_builtin_cpu (tree fndecl, tree *args)
                       build_int_cstu (unsigned_type_node, field_val));
       return build1 (CONVERT_EXPR, integer_type_node, final);
     }
-  else if (fn_code == IX86_BUILTIN_CPU_SUPPORTS)
+  else if (fn_code == IX86_BUILTIN_CPU_SUPPORTS
+           || fn_code == IX86_BUILTIN_MOCK_CPU_SUPPORTS)
     {
       tree ref;
       tree array_elt;
@@ -29931,7 +30024,9 @@ ix86_fold_builtin (tree fndecl, int n_args,
       enum ix86_builtins fn_code = (enum ix86_builtins)
                                    DECL_FUNCTION_CODE (fndecl);
       if (fn_code == IX86_BUILTIN_CPU_IS
-          || fn_code == IX86_BUILTIN_CPU_SUPPORTS)
+          || fn_code == IX86_BUILTIN_CPU_SUPPORTS
+          || fn_code == IX86_BUILTIN_MOCK_CPU_IS
+          || fn_code == IX86_BUILTIN_MOCK_CPU_SUPPORTS)
         {
           gcc_assert (n_args == 1);
           return fold_builtin_cpu (fndecl, args);
@@ -29981,6 +30076,13 @@ ix86_init_platform_type_builtins (void)
                          INT_FTYPE_PCCHAR, true);
   make_cpu_type_builtin ("__builtin_cpu_supports", IX86_BUILTIN_CPU_SUPPORTS,
                          INT_FTYPE_PCCHAR, true);
+  /* Create builtins that mock cpu type and isa features. This is meant to
+     be used for code coverage testing of multiversioned functions.
*/ + make_cpu_type_builtin ("__builtin_mock_cpu_is", IX86_BUILTIN_MOCK_CPU_IS, + INT_FTYPE_PCCHAR, false); + make_cpu_type_builtin ("__builtin_mock_cpu_supports", + IX86_BUILTIN_MOCK_CPU_SUPPORTS, + INT_FTYPE_PCCHAR, false); } /* Internal method for ix86_init_builtins. */ @@ -31701,6 +31803,8 @@ ix86_expand_builtin (tree exp, rtx target, rtx sub call_expr = build_call_expr (fndecl, 0); return expand_expr (call_expr, target, mode, EXPAND_NORMAL); } + case IX86_BUILTIN_MOCK_CPU_IS: + case IX86_BUILTIN_MOCK_CPU_SUPPORTS: case IX86_BUILTIN_CPU_IS: case IX86_BUILTIN_CPU_SUPPORTS: {