Patchwork [google,4.7] Using CPU mocks to test code coverage of multiversioned functions

Submitter Sriraman Tallam
Date March 15, 2013, 9:55 p.m.
Message ID <CAAs8HmyUjEf5ecGBerysqSFgOq3SC0esvN3AnEJm5O_kxzrGyQ@mail.gmail.com>
Permalink /patch/228188/
State New

Comments

Sriraman Tallam - March 15, 2013, 9:55 p.m.
Hi,

   This patch is meant for google/gcc-4_7 but I want this to be
considered for trunk when it opens again. This patch makes it easy to
test for code coverage of multiversioned functions. Here is a
motivating example:

__attribute__((target ("default"))) int foo () { ... return 0; }
__attribute__((target ("sse"))) int foo () { ... return 1; }
__attribute__((target ("popcnt"))) int foo () { ... return 2; }

int main ()
{
  return foo();
}

Let's say your test CPU supports popcnt.  A run of this program will
invoke the popcnt version of foo (). Then, how do we test the sse
version of foo ()? To do that for the above example, we need to run
this code on a CPU that has sse support but no popcnt support.
Otherwise, we need to comment out the popcnt version and run this
example. This can get painful when there are many versions. The same
argument applies to testing the default version of foo ().

So, I am introducing the ability to mock a CPU. If the CPU you are
testing on supports sse, you should be able to test the sse version.

First, I have introduced a new flag called -fmv-debug.  With this
flag, the function version dispatcher is invoked every time a call to
foo () is made. Without the flag, the version dispatch happens once at
startup time via the IFUNC mechanism.
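
The difference between the two dispatch modes can be sketched in
portable C++. This is only an illustration, not the code the patch
generates: the namespace, the function names and the current_feature
variable are all hypothetical, and the one-time IFUNC resolution is
approximated with a cached function pointer.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-ins; none of these names come from the patch.
namespace dispatch_sketch {

using FooFn = int (*)();

int foo_default() { return 0; }
int foo_sse()     { return 1; }
int foo_popcnt()  { return 2; }

// Feature the pretend CPU currently reports; a test can change it at run time.
std::string current_feature = "popcnt";

// Pick the version that matches the currently reported feature.
FooFn select_version() {
  if (current_feature == "popcnt") return foo_popcnt;
  if (current_feature == "sse")    return foo_sse;
  return foo_default;
}

// IFUNC-like behaviour: the choice is made once (here, on first call)
// and is never revisited.
int foo_resolved_once() {
  static const FooFn chosen = select_version();
  return chosen();
}

// -fmv-debug-like behaviour: the selection runs again on every call.
int foo_dispatched_every_call() {
  return select_version()();
}

// Shows that only the per-call dispatcher follows a later feature change.
void show_difference() {
  current_feature = "popcnt";
  assert(foo_resolved_once() == 2);
  current_feature = "sse";
  assert(foo_resolved_once() == 2);          // still the cached choice
  assert(foo_dispatched_every_call() == 1);  // follows the change
}

}  // namespace dispatch_sketch
```

With the cached variant, changing the reported feature after the first
call has no effect, which is why mocking only makes sense when the
dispatcher runs on every call.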

Also, with -fmv-debug, the version dispatcher uses the two new
builtins "__builtin_mock_cpu_is" and "__builtin_mock_cpu_supports" to
check the cpu type and cpu isa.

Then, I plan to add the following hooks to libgcc (in a different patch) :

int set_mock_cpu_is (const char *cpu);
int set_mock_cpu_supports (const char *isa);
int init_mock_cpu (); // Clear the values of the mock cpu.

With this support, here is how you can test for code coverage of the
"sse" version and the "default" version of foo in the above example:

int main ()
{
  // Test SSE version.
  if (__builtin_cpu_supports ("sse"))
    {
      init_mock_cpu ();
      set_mock_cpu_supports ("sse");
      assert (foo () == 1);
    }
  // Test default version.
  init_mock_cpu ();
  assert (foo () == 0);
}

Invoking a multiversioned binary several times with appropriate mock
cpu values for the various ISAs and CPUs will give the complete code
coverage desired. Of course, the underlying platform must be able to
support the various features.

Note that the above test will work only with -fmv-debug as the
dispatcher must be invoked on every multiversioned call to be able to
dynamically change the version.

Multiple ISA features can be set in the mock cpu by calling
"set_mock_cpu_supports" several times with different ISA names.
Calling "init_mock_cpu" will clear all the values. "set_mock_cpu_is"
will set the CPU type.
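
A minimal user-space sketch of how such mock state could behave is
shown below. It is not the planned libgcc implementation: the
namespace, the MockCpu structure, the query helpers and the return
value of 0 are assumptions made for illustration, and the dispatcher
is written by hand instead of being generated by the compiler.

```cpp
#include <cassert>
#include <set>
#include <string>

// Hypothetical stand-in for the proposed hooks, kept in its own namespace.
namespace mock_sketch {

struct MockCpu {
  std::string cpu;             // mocked CPU type, empty when unset
  std::set<std::string> isas;  // mocked ISA features
};

MockCpu g_mock;

// Clear every mocked value. Returning 0 is an assumption.
int init_mock_cpu() { g_mock = MockCpu{}; return 0; }

// Set the mocked CPU type.
int set_mock_cpu_is(const char *cpu) { g_mock.cpu = cpu; return 0; }

// Add one ISA feature; repeated calls accumulate features.
int set_mock_cpu_supports(const char *isa) { g_mock.isas.insert(isa); return 0; }

// Queries a per-call dispatcher could use (assumed helpers).
bool mock_cpu_is(const char *cpu) { return g_mock.cpu == cpu; }
bool mock_cpu_supports(const char *isa) { return g_mock.isas.count(isa) != 0; }

// Hand-written dispatcher for the three versions of foo in the example.
int foo() {
  if (mock_cpu_supports("popcnt")) return 2;
  if (mock_cpu_supports("sse"))    return 1;
  return 0;
}

// Covers all three versions in a single run.
void exercise_all_versions() {
  init_mock_cpu();
  assert(foo() == 0);               // default version
  set_mock_cpu_supports("sse");
  assert(foo() == 1);               // sse version
  set_mock_cpu_supports("popcnt");  // sse stays set as well
  assert(foo() == 2);               // popcnt version
  init_mock_cpu();
  assert(foo() == 0);               // cleared again
}

}  // namespace mock_sketch
```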

This patch only includes the gcc changes.  I will separately prepare a
patch for the libgcc changes. Right now, since the libgcc changes are
not available, the two new mock cpu builtins check the real CPU, just
like "__builtin_cpu_is" and "__builtin_cpu_supports".

Patch attached.  Please look at mv14_debug_code_coverage.C for an
exhaustive example of testing for code coverage in the presence of
multiple versions.

Comments please.

Thanks
Sri
Xinliang David Li - March 15, 2013, 10:37 p.m.
On Fri, Mar 15, 2013 at 2:55 PM, Sriraman Tallam <tmsriram@google.com> wrote:
> Hi,
>
>    This patch is meant for google/gcc-4_7 but I want this to be
> considered for trunk when it opens again. This patch makes it easy to
> test for code coverage of multiversioned functions. Here is a
> motivating example:
>
> __attribute__((target ("default"))) int foo () { ... return 0; }
> __attribute__((target ("sse"))) int foo () { ... return 1; }
> __attribute__((target ("popcnt"))) int foo () { ... return 2; }
>
> int main ()
> {
>   return foo();
> }
>
> Lets say your test CPU supports popcnt.  A run of this program will
> invoke the popcnt version of foo (). Then, how do we test the sse
> version of foo()? To do that for the above example, we need to run
> this code on a CPU that has sse support but no popcnt support.
> Otherwise, we need to comment out the popcnt version and run this
> example. This can get painful when there are many versions. The same
> argument applies to testing  the default version of foo.
>
> So, I am introducing the ability to mock a CPU. If the CPU you are
> testing on supports sse, you should be able to test the sse version.
>
> First, I have introduced a new flag called -fmv-debug.  This patch
> invokes the function version dispatcher every time a call to a foo ()
> is made. Without that flag, the version dispatch happens once at
> startup time via the IFUNC mechanism.
>
> Also, with -fmv-debug, the version dispatcher uses the two new
> builtins "__builtin_mock_cpu_is" and "__builtin_mock_cpu_supports" to
> check the cpu type and cpu isa.

With this option, the compiler could probably also define some macros
that the user can use to write overriding hooks.

>
> Then, I plan to add the following hooks to libgcc (in a different patch) :
>
> int set_mock_cpu_is (const char *cpu);
> int set_mock_cpu_supports (const char *isa);
> int init_mock_cpu (); // Clear the values of the mock cpu.
>
> With this support, here is how you can test for code coverage of the
> "sse" version and "default version of foo in the above example:
>
> int main ()
> {
>   // Test SSE version.
>    if (__builtin_cpu_supports ("sse"))
>    {
>      init_mock_cpu();
>      set_mock_cpu_supports ("sse");
>      assert (foo () == 1);
>    }
>   // Test default version.
>   init_mock_cpu();
>   assert (foo () == 0);
> }
>
> Invoking a multiversioned binary several times with appropriate mock
> cpu values for the various ISAs and CPUs will give the complete code
> coverage desired. Ofcourse, the underlying platform should be able to
> support the various features.
>

It is the other way around -- it simplifies unit test writing and
running -- one unit test just needs to be run *ONCE* on the same
hardware (the one with the most hw features) and all the versions can
be covered.



> Note that the above test will work only with -fmv-debug as the
> dispatcher must be invoked on every multiversioned call to be able to
> dynamically change the version.
>
> Multiple ISA features can be set in the mock cpu by calling
> "set_mock_cpu_supports" several times with different ISA names.
> Calling "init_mock_cpu" will clear all the values. "set_mock_cpu_is"
> will set the CPU type.
>


Just thought of another idea. Is it possible for the compiler to
create an alias for each version so that the versions can be accessed
explicitly, just like the use of :: ?

if (__builtin_cpu_supports ("sse"))
   CHECK_RESULT (foo_sse (...));

CHECK_RESULT (foo_default (...));

...

David


> This patch only includes the gcc changes.  I will separately prepare a
> patch for the libgcc changes. Right now, since the libgcc changes are
> not available the two new mock cpu builtins check the real CPU like
> "__builtin_cpu_is" and "__builtin_cpu_supports".
>
> Patch attached.  Please look at mv14_debug_code_coverage.C for an
> exhaustive example of testing for code coverage in the presence of
> multiple versions.
>
> Comments please.
>
> Thanks
> Sri
Sriraman Tallam - March 15, 2013, 10:49 p.m.
On Fri, Mar 15, 2013 at 3:37 PM, Xinliang David Li <davidxl@google.com> wrote:
> On Fri, Mar 15, 2013 at 2:55 PM, Sriraman Tallam <tmsriram@google.com> wrote:
>> Hi,
>>
>>    This patch is meant for google/gcc-4_7 but I want this to be
>> considered for trunk when it opens again. This patch makes it easy to
>> test for code coverage of multiversioned functions. Here is a
>> motivating example:
>>
>> __attribute__((target ("default"))) int foo () { ... return 0; }
>> __attribute__((target ("sse"))) int foo () { ... return 1; }
>> __attribute__((target ("popcnt"))) int foo () { ... return 2; }
>>
>> int main ()
>> {
>>   return foo();
>> }
>>
>> Lets say your test CPU supports popcnt.  A run of this program will
>> invoke the popcnt version of foo (). Then, how do we test the sse
>> version of foo()? To do that for the above example, we need to run
>> this code on a CPU that has sse support but no popcnt support.
>> Otherwise, we need to comment out the popcnt version and run this
>> example. This can get painful when there are many versions. The same
>> argument applies to testing  the default version of foo.
>>
>> So, I am introducing the ability to mock a CPU. If the CPU you are
>> testing on supports sse, you should be able to test the sse version.
>>
>> First, I have introduced a new flag called -fmv-debug.  This patch
>> invokes the function version dispatcher every time a call to a foo ()
>> is made. Without that flag, the version dispatch happens once at
>> startup time via the IFUNC mechanism.
>>
>> Also, with -fmv-debug, the version dispatcher uses the two new
>> builtins "__builtin_mock_cpu_is" and "__builtin_mock_cpu_supports" to
>> check the cpu type and cpu isa.
>
> With this option, compiler probably can also define some macros so
> that if user can use to write overriding hooks.
>
>>
>> Then, I plan to add the following hooks to libgcc (in a different patch) :
>>
>> int set_mock_cpu_is (const char *cpu);
>> int set_mock_cpu_supports (const char *isa);
>> int init_mock_cpu (); // Clear the values of the mock cpu.
>>
>> With this support, here is how you can test for code coverage of the
>> "sse" version and "default version of foo in the above example:
>>
>> int main ()
>> {
>>   // Test SSE version.
>>    if (__builtin_cpu_supports ("sse"))
>>    {
>>      init_mock_cpu();
>>      set_mock_cpu_supports ("sse");
>>      assert (foo () == 1);
>>    }
>>   // Test default version.
>>   init_mock_cpu();
>>   assert (foo () == 0);
>> }
>>
>> Invoking a multiversioned binary several times with appropriate mock
>> cpu values for the various ISAs and CPUs will give the complete code
>> coverage desired. Ofcourse, the underlying platform should be able to
>> support the various features.
>>
>
> It is the other way around -- it simplifies unit test writing and
> running -- one unit test just need to be run on the same hardware
> (with the most hw features) *ONCE* and all the versions can be
> covered.


Yes,  the test needs to run just once, potentially, if the test
platform can support all of the features.

>
>
>
>> Note that the above test will work only with -fmv-debug as the
>> dispatcher must be invoked on every multiversioned call to be able to
>> dynamically change the version.
>>
>> Multiple ISA features can be set in the mock cpu by calling
>> "set_mock_cpu_supports" several times with different ISA names.
>> Calling "init_mock_cpu" will clear all the values. "set_mock_cpu_is"
>> will set the CPU type.
>>
>
>
> Just through about another idea. Is it possible for compiler to create
> some alias for each version so that they can be accessed explicitly,
> just like the use of :: ?
>
> if (__buitin_cpu_supports ("sse"))
>    CHECK_RESULT (foo_sse (...));
>
> CHECK_RESULT (foo_default(...));

This will work for this example. But, in general, this means changing
the call site of every multiversioned call and that can become
infeasible.
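
The trade-off can be sketched as follows. All names here are
hypothetical (the compiler does not provide foo_sse or foo_default
aliases, and use_sse merely stands in for the mock CPU state): explicit
aliases let a unit test reach each version directly, but existing
callers that go through foo () are only covered if the dispatcher
itself can be steered.

```cpp
#include <cassert>

// Hypothetical names throughout; nothing here is generated by GCC.
namespace alias_sketch {

// The two versions, reachable under explicit names.
int foo_default() { return 0; }
int foo_sse()     { return 1; }

// Stand-in for the mock CPU state consulted by the dispatcher.
bool use_sse = false;

// Dispatcher that ordinary call sites use.
int foo() { return use_sse ? foo_sse() : foo_default(); }

// Existing application code with an unmodified call site.
int twice_foo() { return 2 * foo(); }

// Alias style: each version is tested directly, but twice_foo ()
// is never exercised with the sse version.
void check_with_aliases() {
  assert(foo_default() == 0);
  assert(foo_sse() == 1);
}

// Mock style: the call site stays as it is and both paths are covered.
void check_with_mock() {
  use_sse = true;
  assert(twice_foo() == 2);
  use_sse = false;
  assert(twice_foo() == 0);
}

}  // namespace alias_sketch
```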

Thanks
Sri


>
> ...
>
> David
>
>
>> This patch only includes the gcc changes.  I will separately prepare a
>> patch for the libgcc changes. Right now, since the libgcc changes are
>> not available the two new mock cpu builtins check the real CPU like
>> "__builtin_cpu_is" and "__builtin_cpu_supports".
>>
>> Patch attached.  Please look at mv14_debug_code_coverage.C for an
>> exhaustive example of testing for code coverage in the presence of
>> multiple versions.
>>
>> Comments please.
>>
>> Thanks
>> Sri
Xinliang David Li - March 15, 2013, 11:15 p.m.
Ok. If the use case is to enable testing of the same application
binary (not the per-function unit test) with CPU mocking at runtime
(via an environment variable or application-specific flags), the
proposed changes make sense.

David

Richard Guenther - March 18, 2013, 9:02 a.m.
On Fri, Mar 15, 2013 at 10:55 PM, Sriraman Tallam <tmsriram@google.com> wrote:
> Hi,
>
>    This patch is meant for google/gcc-4_7 but I want this to be
> considered for trunk when it opens again. This patch makes it easy to
> test for code coverage of multiversioned functions. Here is a
> motivating example:
>
> __attribute__((target ("default"))) int foo () { ... return 0; }
> __attribute__((target ("sse"))) int foo () { ... return 1; }
> __attribute__((target ("popcnt"))) int foo () { ... return 2; }
>
> int main ()
> {
>   return foo();
> }
>
> Lets say your test CPU supports popcnt.  A run of this program will
> invoke the popcnt version of foo (). Then, how do we test the sse
> version of foo()? To do that for the above example, we need to run
> this code on a CPU that has sse support but no popcnt support.
> Otherwise, we need to comment out the popcnt version and run this
> example. This can get painful when there are many versions. The same
> argument applies to testing  the default version of foo.
>
> So, I am introducing the ability to mock a CPU. If the CPU you are
> testing on supports sse, you should be able to test the sse version.
>
> First, I have introduced a new flag called -fmv-debug.  This patch
> invokes the function version dispatcher every time a call to a foo ()
> is made. Without that flag, the version dispatch happens once at
> startup time via the IFUNC mechanism.
>
> Also, with -fmv-debug, the version dispatcher uses the two new
> builtins "__builtin_mock_cpu_is" and "__builtin_mock_cpu_supports" to
> check the cpu type and cpu isa.
>
> Then, I plan to add the following hooks to libgcc (in a different patch) :
>
> int set_mock_cpu_is (const char *cpu);
> int set_mock_cpu_supports (const char *isa);
> int init_mock_cpu (); // Clear the values of the mock cpu.
>
> With this support, here is how you can test for code coverage of the
> "sse" version and "default version of foo in the above example:
>
> int main ()
> {
>   // Test SSE version.
>    if (__builtin_cpu_supports ("sse"))
>    {
>      init_mock_cpu();
>      set_mock_cpu_supports ("sse");
>      assert (foo () == 1);
>    }
>   // Test default version.
>   init_mock_cpu();
>   assert (foo () == 0);
> }
>
> Invoking a multiversioned binary several times with appropriate mock
> cpu values for the various ISAs and CPUs will give the complete code
> coverage desired. Ofcourse, the underlying platform should be able to
> support the various features.
>
> Note that the above test will work only with -fmv-debug as the
> dispatcher must be invoked on every multiversioned call to be able to
> dynamically change the version.
>
> Multiple ISA features can be set in the mock cpu by calling
> "set_mock_cpu_supports" several times with different ISA names.
> Calling "init_mock_cpu" will clear all the values. "set_mock_cpu_is"
> will set the CPU type.
>
> This patch only includes the gcc changes.  I will separately prepare a
> patch for the libgcc changes. Right now, since the libgcc changes are
> not available the two new mock cpu builtins check the real CPU like
> "__builtin_cpu_is" and "__builtin_cpu_supports".
>
> Patch attached.  Please look at mv14_debug_code_coverage.C for an
> exhaustive example of testing for code coverage in the presence of
> multiple versions.
>
> Comments please.

Err.  As we are using IFUNCs, isn't it simply possible to do this in
the dynamic loader - for example by simply pre-loading a library
with the IFUNC relocators implemented differently?  Thus, shouldn't
we simply provide such a library as a convenience?

Thanks,
Richard.

> Thanks
> Sri
Xinliang David Li - March 18, 2013, 4:05 p.m.
Interesting idea about lazy IFUNC relocation.

David

Paul Pluzhnikov - March 18, 2013, 5:02 p.m.
+cc libc-alpha

On Mon, Mar 18, 2013 at 9:05 AM, Xinliang David Li <davidxl@google.com> wrote:
> Interesting idea about lazy IFUNC relocation.

> On Mon, Mar 18, 2013 at 2:02 AM, Richard Biener
> <richard.guenther@gmail.com> wrote:

>> On Fri, Mar 15, 2013 at 10:55 PM, Sriraman Tallam <tmsriram@google.com> wrote:

>>>    This patch is meant for google/gcc-4_7 but I want this to be
>>> considered for trunk when it opens again. This patch makes it easy to
>>> test for code coverage of multiversioned functions. Here is a
>>> motivating example:

>> Err.  As we are using IFUNCs isn't it simply possible to do this in
>> the dynamic loader - for example by simlply pre-loading a library
>> with the IFUNC relocators implemented differently?  Thus, shouldn't
>> we simply provide such library as a convenience?

A similar need exists in glibc itself: it too has multiversioned functions,
and lack of testing has led to recent bugs in some of them.

HJ added a framework for testing IFUNCs to glibc late last year, but it
would be nice to have a more general IFUNC control, so I could e.g. run
a binary on SSE4-capable machine A as that binary would run on SSE2-only
capable machine B.

(We've had a few bugs recently where the crash would only show on machine
B and not A. These are a pain to debug, as I may not have access to B.)

If such a controller is implemented, I'd think it would have to be part
of GLIBC (or part of the ld-linux itself), and not of libgcc.

  LD_CPU_FEATURES=sse,sse2 ./a.out  # run as if only sse and sse2 are available
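
How such a variable might be interpreted can be sketched like this.
LD_CPU_FEATURES is only the name proposed above, not an existing glibc
variable, and every function name below is hypothetical. The sketch
also assumes that the override can only hide features the hardware
really has and never adds new ones.

```cpp
#include <cassert>
#include <cstdlib>
#include <set>
#include <sstream>
#include <string>

// Hypothetical helpers; not part of glibc or libgcc.
namespace env_sketch {

using FeatureSet = std::set<std::string>;

// Split a comma-separated list such as "sse,sse2" into a set,
// skipping empty items.
FeatureSet parse_feature_list(const std::string &list) {
  FeatureSet out;
  std::istringstream in(list);
  std::string item;
  while (std::getline(in, item, ','))
    if (!item.empty()) out.insert(item);
  return out;
}

// Features visible to dispatch: all detected features when there is no
// override, otherwise only detected features that the override lists.
FeatureSet effective_features(const FeatureSet &detected,
                              const char *override_value) {
  if (override_value == nullptr) return detected;
  const FeatureSet allowed = parse_feature_list(override_value);
  FeatureSet out;
  for (const std::string &feature : detected)
    if (allowed.count(feature) != 0) out.insert(feature);
  return out;
}

// Read the override once, e.g. during start-up.
FeatureSet features_from_environment(const FeatureSet &detected) {
  return effective_features(detected, std::getenv("LD_CPU_FEATURES"));
}

// Example: a machine with popcnt is made to look like an sse/sse2-only one.
void check_override_hides_popcnt() {
  const FeatureSet detected = {"sse", "sse2", "popcnt"};
  const FeatureSet visible = effective_features(detected, "sse,sse2");
  assert(visible.count("popcnt") == 0);
  assert(visible.size() == 2);
}

}  // namespace env_sketch
```

Because the override is read a single time, the cost is paid only at
start-up and not on each dispatched call.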

Thanks,
H.J. Lu - March 18, 2013, 5:10 p.m.
On Mon, Mar 18, 2013 at 10:02 AM, Paul Pluzhnikov
<ppluzhnikov@google.com> wrote:
> +cc libc-alpha
>
> On Mon, Mar 18, 2013 at 9:05 AM, Xinliang David Li <davidxl@google.com> wrote:
>> Interesting idea about lazy IFUNC relocation.
>
>> On Mon, Mar 18, 2013 at 2:02 AM, Richard Biener
>> <richard.guenther@gmail.com> wrote:
>
>>> On Fri, Mar 15, 2013 at 10:55 PM, Sriraman Tallam <tmsriram@google.com> wrote:
>
>>>>    This patch is meant for google/gcc-4_7 but I want this to be
>>>> considered for trunk when it opens again. This patch makes it easy to
>>>> test for code coverage of multiversioned functions. Here is a
>>>> motivating example:
>
>>> Err.  As we are using IFUNCs isn't it simply possible to do this in
>>> the dynamic loader - for example by simlply pre-loading a library
>>> with the IFUNC relocators implemented differently?  Thus, shouldn't
>>> we simply provide such library as a convenience?
>
> A similar need exists in glibc itself: it too has multiversioned functions,
> and lack of testing has led to recent bugs in some of them.
>
> HJ has added a framework to test IFUNCs to glibc late last year, but it
> would be nice to have a more general IFUNC control, so I could e.g. run
> a binary on SSE4-capable machine A as that binary would run on SSE2-only
> capable machine B.
>
> (We've had a few bugs recently, were the crash would only show on machine
> B and not A. These are a pain to debug, as I may not have access to B.)
>
> If such a controller is implemented, I'd think it would have to be part
> of GLIBC (or part of the ld-linux itself), and not of libgcc.
>
>   LD_CPU_FEATURES=sse,sse2 ./a.out  # run as if only sse and sse2 are available
>

We can pass environment variables to the IFUNC selector.  Maybe we can
enable it for debug builds.
Richard Guenther - March 18, 2013, 5:18 p.m.
"H.J. Lu" <hjl.tools@gmail.com> wrote:

>On Mon, Mar 18, 2013 at 10:02 AM, Paul Pluzhnikov
><ppluzhnikov@google.com> wrote:
>> +cc libc-alpha
>>
>> On Mon, Mar 18, 2013 at 9:05 AM, Xinliang David Li <davidxl@google.com> wrote:
>>> Interesting idea about lazy IFUNC relocation.
>>
>>> On Mon, Mar 18, 2013 at 2:02 AM, Richard Biener
>>> <richard.guenther@gmail.com> wrote:
>>
>>>> On Fri, Mar 15, 2013 at 10:55 PM, Sriraman Tallam <tmsriram@google.com> wrote:
>>
>>>>>    This patch is meant for google/gcc-4_7 but I want this to be
>>>>> considered for trunk when it opens again. This patch makes it easy to
>>>>> test for code coverage of multiversioned functions. Here is a
>>>>> motivating example:
>>
>>>> Err.  As we are using IFUNCs isn't it simply possible to do this in
>>>> the dynamic loader - for example by simlply pre-loading a library
>>>> with the IFUNC relocators implemented differently?  Thus, shouldn't
>>>> we simply provide such library as a convenience?
>>
>> A similar need exists in glibc itself: it too has multiversioned functions,
>> and lack of testing has led to recent bugs in some of them.
>>
>> HJ has added a framework to test IFUNCs to glibc late last year, but it
>> would be nice to have a more general IFUNC control, so I could e.g. run
>> a binary on SSE4-capable machine A as that binary would run on SSE2-only
>> capable machine B.
>>
>> (We've had a few bugs recently, were the crash would only show on machine
>> B and not A. These are a pain to debug, as I may not have access to B.)
>>
>> If such a controller is implemented, I'd think it would have to be part
>> of GLIBC (or part of the ld-linux itself), and not of libgcc.
>>
>>   LD_CPU_FEATURES=sse,sse2 ./a.out  # run as if only sse and sse2 are available
>>
>
>We can pass environment variables to IFUNC selector.   Maybe we can
>enable it for debug build.

I was asking for the IFUNC selector to be overridable by LD_PRELOAD
or a similar mechanism at dynamic load time.

Richard.
Paul Pluzhnikov - March 18, 2013, 5:24 p.m.
On Mon, Mar 18, 2013 at 10:18 AM, Richard Biener
<richard.guenther@gmail.com> wrote:
> "H.J. Lu" <hjl.tools@gmail.com> wrote:

>>We can pass environment variables to IFUNC selector.   Maybe we can
>>enable it for debug build.

Enabling this for just debug builds would not cover my use case.

If the environment variable is used at loader initialization time to
override CPUID output, then the runtime cost of that code would be minuscule,
and it can be available in production glibc builds.

> I was asking for the ifunc selector to be
> Overridable by ld_preload or a similar mechanism at dynamic load time.

Yes, that's how I understood you.

I don't believe it would be easy to implement such an interposer (if
possible at all), and it would be very much tied to glibc internals.

Overriding CPUID at loader initialization time sounds simpler (but I
haven't looked at the code yet :-).
Alan Modra - March 19, 2013, 5:44 a.m.
On Mon, Mar 18, 2013 at 06:18:58PM +0100, Richard Biener wrote:
> I was asking for the ifunc selector to be
> Overridable by ld_preload or a similar mechanism at dynamic load time.

Please don't.  Calling an ifunc resolver function in another library
is just asking for trouble with current glibc.  Why?  Well, the other
library containing the resolver function may not have had any dynamic
relocations applied.  So if the resolver makes use of the GOT (to read
some variable), it will use unrelocated addresses.  You'll segfault if
you're lucky.

For anyone playing with ifunc, please test out your great ideas on
i386, ppc32, mips, arm, etc. *NOT* x86_64 or powerpc64 which both
avoid the GOT in many cases.
Sriraman Tallam - March 25, 2013, 9:24 p.m.
Hi,

On Mon, Mar 18, 2013 at 10:44 PM, Alan Modra <amodra@gmail.com> wrote:
> On Mon, Mar 18, 2013 at 06:18:58PM +0100, Richard Biener wrote:
>> I was asking for the ifunc selector to be
>> Overridable by ld_preload or a similar mechanism at dynamic load time.
>
> Please don't.  Calling an ifunc resolver function in another library
> is just asking for trouble with current glibc.  Why?  Well, the other
> library containing the resolver function may not have had any dynamic
> relocations applied.  So if the resolver makes use of the GOT (to read
> some variable), it will use unrelocated addresses.  You'll segfault if
> you're lucky.

Does this also mean that Paul's idea of doing:

LD_CPU_FEATURES=sse,sse2 ./a.out  # run as if only sse and sse2 are available

is fraught with risk when used with IFUNC, particularly on x86_64?

Shouldn't the IFUNC resolver go through the GOT even in this case?
This could work well for the MV testing problem I explained earlier,
but if this is not feasible with IFUNC in play I would like my
original proposal reconsidered.

Thanks
Sri


>
> For anyone playing with ifunc, please test out your great ideas on
> i386, ppc32, mips, arm, etc. *NOT* x86_64 or powerpc64 which both
> avoid the GOT in many cases.
>
> --
> Alan Modra
> Australia Development Lab, IBM
Alan Modra - March 25, 2013, 11:46 p.m.
On Mon, Mar 25, 2013 at 02:24:21PM -0700, Sriraman Tallam wrote:
> Does this also mean that Paul's idea of doing:
> 
> LD_CPU_FEATURES=sse,sse2 ./a.out  # run as if only sse and sse2 are available
> 
> is fraught with risk when used with IFUNC, particularly on x86_64?
> 
> Shouldn't the IFUNC resolver go through the GOT even in this case?
> This could work well for the MV testing problem I explained earlier,
> but if this is not feasible with IFUNC in play I would like my
> original proposal reconsidered.

I haven't been following the thread so can't comment, sorry.  I jumped
in on seeing Richard's suggestion re LD_PRELOAD, which is a bad idea
given glibc's current support for STT_GNU_IFUNC.  IFUNC as it stands
is not a general purpose feature and interacts badly with other
features of ELF shared libraries.  Trivial testcases can easily be
created that

1) won't work on any architecture.
     eg. shared library takes address of ifunc, ifunc resolver in main
     app, resolver uses variable in shared library.
2) only work on x86_64 and powerpc64.
     eg. shared library takes address of ifunc, ifunc resolver in main
     app which is PIC, resolver uses variable in main app.
3) won't work with LD_BIND_NOW=1
     either of the above examples but shared library doesn't take
     address, just calls ifunc.

The reason for these problems is that ld.so makes a single pass over
dynamic relocations.  In the simple case of LD_BIND_NOW=1 and an
application that uses a single shared library, relocations for the
library will be applied first, then relocations for the main app.  So
if the shared library has relocations against symbols that turn out to
be ifunc, and the ifunc resolver lives in the main app, then the
resolver will run *before* the main app has been relocated.  The
resolver had better not have code that requires relocation!

Of course, the obvious fix of making ld.so do two passes over
relocations, applying ifunc relocations on the second pass, is
somewhat counterproductive.  Mostly ifunc is used to gain a speedup
when running on particular hardware.  Two passes would have to slow
down application startup.  Nonetheless, I believe that is the correct
solution if we want to make ifunc generally useful.

What we have at the moment requires quite a lot of care when using
ifunc.  Accidentally writing code that only works on x86_64 or
powerpc64 is very easy, and might lead people to think you own shares
in Intel or IBM.

Patch

Index: cgraphunit.c
===================================================================
--- cgraphunit.c	(revision 196618)
+++ cgraphunit.c	(working copy)
@@ -942,7 +942,12 @@  cgraph_analyze_function (struct cgraph_node *node)
 	{
 	  tree resolver = NULL_TREE;
 	  gcc_assert (targetm.generate_version_dispatcher_body);
-	  resolver = targetm.generate_version_dispatcher_body (node);
+	  /* When flag_mv_debug is 0, the dispatcher should be invoked
+	     optimally (once using ifunc support).  When flag_mv_debug is 1,
+	     the dispatcher should be invoked every time a call to the
+             multiversioned function is made.  */
+	  resolver
+	    = targetm.generate_version_dispatcher_body (node, flag_mv_debug);
 	  gcc_assert (resolver != NULL_TREE);
 	}
     }
Index: common.opt
===================================================================
--- common.opt	(revision 196618)
+++ common.opt	(working copy)
@@ -1600,6 +1600,10 @@  fmove-loop-invariants
 Common Report Var(flag_move_loop_invariants) Init(1) Optimization
 Move loop invariant computations out of loops
 
+fmv-debug
+Common RejectNegative Report Var(flag_mv_debug) Init(0)
+Invoke the function version dispatcher for every multiversioned function call.
+
 ftsan
 Common RejectNegative Report Var(flag_tsan)
 Add ThreadSanitizer instrumentation
Index: doc/tm.texi
===================================================================
--- doc/tm.texi	(revision 196618)
+++ doc/tm.texi	(working copy)
@@ -11032,11 +11032,13 @@  version at run-time. @var{decl} is one version fro
 identical versions.
 @end deftypefn
 
-@deftypefn {Target Hook} tree TARGET_GENERATE_VERSION_DISPATCHER_BODY (void *@var{arg})
+@deftypefn {Target Hook} tree TARGET_GENERATE_VERSION_DISPATCHER_BODY (void *@var{arg}, int @var{debug_mode})
 This hook is used to generate the dispatcher logic to invoke the right
 function version at run-time for a given set of function versions.
 @var{arg} points to the callgraph node of the dispatcher function whose
-body must be generated.
+body must be generated.  When @var{debug_mode} is 1, the dispatcher
+logic is invoked on every call. Otherwise, the dispatcher is invoked
+only at start up to minimize call overhead.
 @end deftypefn
 
 @deftypefn {Target Hook} {const char *} TARGET_INVALID_WITHIN_DOLOOP (const_rtx @var{insn})
Index: doc/tm.texi.in
===================================================================
--- doc/tm.texi.in	(revision 196618)
+++ doc/tm.texi.in	(working copy)
@@ -10908,7 +10908,9 @@  identical versions.
 This hook is used to generate the dispatcher logic to invoke the right
 function version at run-time for a given set of function versions.
 @var{arg} points to the callgraph node of the dispatcher function whose
-body must be generated.
+body must be generated.  When @var{debug_mode} is 1, the dispatcher
+logic is invoked on every call. Otherwise, the dispatcher is invoked
+only at start up to minimize call overhead.
 @end deftypefn
 
 @hook TARGET_INVALID_WITHIN_DOLOOP
Index: testsuite/g++.dg/ext/mv14_debug_code_coverage.C
===================================================================
--- testsuite/g++.dg/ext/mv14_debug_code_coverage.C	(revision 0)
+++ testsuite/g++.dg/ext/mv14_debug_code_coverage.C	(revision 0)
@@ -0,0 +1,213 @@ 
+/* Test case to show how code coverage testing of a multiversioned function
+   can be done using cpu mocks.  */
+/* { dg-do run { target i?86-*-* x86_64-*-* } } */
+/* { dg-options "-O2 -fmv-debug" } */
+
+#include <assert.h>
+#include <string.h>
+
+/* Temporary code till the libgcc hooks for this are checked in. Override
+   __builtin_mock_cpu_* builtins to change the mock cpu.  */
+const char *mock_cpu = NULL;
+int __builtin_mock_cpu_is (const char *cpu)
+{
+  if (strcmp (cpu, mock_cpu) == 0)
+    return 1;
+  return 0;  
+}
+
+/* Only mock one ISA type.  The libgcc hooks will allow mocking multiple
+   ISA features together, like popcnt and avx2.  */
+const char *mock_isa = NULL;
+int __builtin_mock_cpu_supports (const char *isa)
+{
+  if (strcmp (isa, mock_isa) == 0)
+    return 1;
+  return 0;
+}
+/* End of temporary code.  */
+
+
+/* Default version.  */
+int foo () __attribute__ ((target ("default")));
+
+int foo () __attribute__ ((target ("mmx")));
+int foo () __attribute__ ((target ("sse")));
+int foo () __attribute__ ((target ("sse2")));
+int foo () __attribute__ ((target ("sse3")));
+int foo () __attribute__ ((target ("ssse3")));
+int foo () __attribute__ ((target ("sse4.1")));
+int foo () __attribute__ ((target ("sse4.2")));
+int foo () __attribute__ ((target ("popcnt")));
+int foo () __attribute__ ((target ("avx")));
+int foo () __attribute__ ((target ("avx2")));
+
+int foo () __attribute__ ((target ("arch=corei7")));
+
+int main ()
+{
+  /* Using CPU mocks, run each version of foo() when possible and
+     check the return value.  */
+
+  /* Run Intel corei7 version if possible.  Test if this
+     CPU can mock corei7.  It should support SSE4.2 and
+     below, SSSE3 and MMX. */
+  if (__builtin_cpu_supports ("sse4.2")
+      && __builtin_cpu_supports ("ssse3")
+      && __builtin_cpu_supports ("mmx"))
+    {
+      mock_cpu = "corei7";
+      mock_isa = "";
+      assert (foo () == 11);
+    }
+
+  /* Run avx2 version if possible.  */
+  if (__builtin_cpu_supports ("avx2"))
+    {
+      mock_cpu = "";
+      mock_isa = "avx2";
+      assert (foo () == 1);
+    }
+  /* Run avx version if possible.  */
+  if (__builtin_cpu_supports ("avx"))
+    {
+      mock_cpu = "";
+      mock_isa = "avx";
+      assert (foo () == 2);
+    }
+  /* Run popcnt version if possible.  */
+  if (__builtin_cpu_supports ("popcnt"))
+    {
+      mock_cpu = "";
+      mock_isa = "popcnt";
+      assert (foo () == 3);
+    }
+  /* Run sse4.2 version if possible.  */
+  if (__builtin_cpu_supports ("sse4.2"))
+    {
+      mock_cpu = "";
+      mock_isa = "sse4.2";
+      assert (foo () == 4);
+    }
+  /* Run sse4.1 version if possible.  */
+  if (__builtin_cpu_supports ("sse4.1"))
+    {
+      mock_cpu = "";
+      mock_isa = "sse4.1";
+      assert (foo () == 5);
+    }
+  /* Run ssse3 version if possible.  */
+  if (__builtin_cpu_supports ("ssse3"))
+    {
+      mock_cpu = "";
+      mock_isa = "ssse3";
+      assert (foo () == 6);
+    }
+  /* Run sse3 version if possible.  */
+  if (__builtin_cpu_supports ("sse3"))
+    {
+      mock_cpu = "";
+      mock_isa = "sse3";
+      assert (foo () == 7);
+    }
+  /* Run sse2 version if possible.  */
+  if (__builtin_cpu_supports ("sse2"))
+    {
+      mock_cpu = "";
+      mock_isa = "sse2";
+      assert (foo () == 8);
+    }
+  /* Run sse version if possible.  */
+  if (__builtin_cpu_supports ("sse"))
+    {
+      mock_cpu = "";
+      mock_isa = "sse";
+      assert (foo () == 9);
+    }
+  /* Run mmx version if possible.  */
+  if (__builtin_cpu_supports ("mmx"))
+    {
+      mock_cpu = "";
+      mock_isa = "mmx";
+      assert (foo () == 10);
+    }
+
+  /* Run the default version.  */
+  mock_cpu = "";
+  mock_isa = "";
+  assert (foo () == 0);
+
+  return 0;
+}
+
+int __attribute__ ((target("default")))
+foo ()
+{
+  return 0;
+}
+
+int __attribute__ ((target("arch=corei7")))
+foo ()
+{
+  return 11;
+}
+
+int __attribute__ ((target("mmx")))
+foo ()
+{
+  return 10;
+}
+
+int __attribute__ ((target("sse")))
+foo ()
+{
+  return 9;
+}
+
+int __attribute__ ((target("sse2")))
+foo ()
+{
+  return 8;
+}
+
+int __attribute__ ((target("sse3")))
+foo ()
+{
+  return 7;
+}
+
+int __attribute__ ((target("ssse3")))
+foo ()
+{
+  return 6;
+}
+
+int __attribute__ ((target("sse4.1")))
+foo ()
+{
+  return 5;
+}
+
+int __attribute__ ((target("sse4.2")))
+foo ()
+{
+  return 4;
+}
+
+int __attribute__ ((target("popcnt")))
+foo ()
+{
+  return 3;
+}
+
+int __attribute__ ((target("avx")))
+foo ()
+{
+  return 2;
+}
+
+int __attribute__ ((target("avx2")))
+foo ()
+{
+  return 1;
+}
Index: testsuite/g++.dg/ext/mv2_debug.C
===================================================================
--- testsuite/g++.dg/ext/mv2_debug.C	(revision 0)
+++ testsuite/g++.dg/ext/mv2_debug.C	(revision 0)
@@ -0,0 +1,4 @@ 
+/* Test case to check if mv2.C works with -fmv-debug additionally added.  */
+/* { dg-do run { target i?86-*-* x86_64-*-* } } */
+/* { dg-options "-O2 -fmv-debug" } */
+/* { dg-additional-sources "mv2.C" } */
Index: testsuite/g++.dg/ext/mv6_debug.C
===================================================================
--- testsuite/g++.dg/ext/mv6_debug.C	(revision 0)
+++ testsuite/g++.dg/ext/mv6_debug.C	(revision 0)
@@ -0,0 +1,4 @@ 
+/* Test case to check if mv6.C works with -fmv-debug additionally added.  */
+/* { dg-do run { target i?86-*-* x86_64-*-* } } */
+/* { dg-options "-march=x86-64 -fmv-debug" } */
+/* { dg-additional-sources "mv6.C" } */
Index: testsuite/g++.dg/ext/mv1_debug.C
===================================================================
--- testsuite/g++.dg/ext/mv1_debug.C	(revision 0)
+++ testsuite/g++.dg/ext/mv1_debug.C	(revision 0)
@@ -0,0 +1,4 @@ 
+/* Test case to check if mv1.C works with -fmv-debug additionally added.  */
+/* { dg-do run { target i?86-*-* x86_64-*-* } } */
+/* { dg-options "-O2 -fPIC -fmv-debug" } */
+/* { dg-additional-sources "mv1.C" } */
Index: config/i386/i386.c
===================================================================
--- config/i386/i386.c	(revision 196618)
+++ config/i386/i386.c	(working copy)
@@ -26173,6 +26173,11 @@  enum ix86_builtins
   IX86_BUILTIN_CPU_IS,
   IX86_BUILTIN_CPU_SUPPORTS,
 
+  /* Builtins to mock CPU and ISA features, for
+     testing multiversioned functions.  */
+  IX86_BUILTIN_MOCK_CPU_IS,
+  IX86_BUILTIN_MOCK_CPU_SUPPORTS,
+
   IX86_BUILTIN_MAX
 };
 
@@ -28001,11 +28006,14 @@  ix86_slow_unaligned_vector_memop (void)
    to return a pointer to VERSION_DECL if the outcome of the expression
    formed by PREDICATE_CHAIN is true.  This function will be called during
    version dispatch to decide which function version to execute.  It returns
-   the basic block at the end, to which more conditions can be added.  */
+   the basic block at the end, to which more conditions can be added.  When
+   DEBUG_MODE is 1, the version dispatcher is invoked for every call
+   to the multiversioned function.  */
 
 static basic_block
 add_condition_to_bb (tree function_decl, tree version_decl,
-		     tree predicate_chain, basic_block new_bb)
+		     tree predicate_chain, basic_block new_bb,
+		     int debug_mode)
 {
   gimple return_stmt;
   tree convert_expr, result_var;
@@ -28026,11 +28034,43 @@  add_condition_to_bb (tree function_decl, tree vers
   gcc_assert (new_bb != NULL);
   gseq = bb_seq (new_bb);
 
+  /* If debug_mode is true, generate a call to the versioned function
+     and return the output of the call.  Otherwise, return a pointer to
+     the versioned function.  */
 
-  convert_expr = build1 (CONVERT_EXPR, ptr_type_node,
-	     		 build_fold_addr_expr (version_decl));
-  result_var = create_tmp_var (ptr_type_node, NULL);
-  convert_stmt = gimple_build_assign (result_var, convert_expr); 
+  if (debug_mode)
+    {
+      tree arg;
+      tree ret_type = TREE_TYPE (TREE_TYPE (function_decl));
+      VEC (tree, heap) *vec = NULL;
+      vec = VEC_alloc (tree, heap, 2);
+      
+      arg = DECL_ARGUMENTS (function_decl);
+
+      while (arg)
+	{
+	  VEC_safe_push (tree, heap, vec, arg);
+	  arg = DECL_CHAIN (arg);
+	}
+
+      convert_stmt = gimple_build_call_vec (version_decl, vec);
+      VEC_free (tree, heap, vec);
+      result_var = NULL;
+
+      if (ret_type != void_type_node)
+	{
+          result_var = DECL_RESULT (function_decl);
+          gimple_call_set_lhs (convert_stmt, result_var);
+	}
+    }
+  else
+    {
+      convert_expr = build1 (CONVERT_EXPR, ptr_type_node,
+	     		     build_fold_addr_expr (version_decl));
+      result_var = DECL_RESULT (function_decl);
+      convert_stmt = gimple_build_assign (result_var, convert_expr); 
+    }
+
   return_stmt = gimple_build_return (result_var);
 
   if (predicate_chain == NULL_TREE)
@@ -28112,10 +28152,11 @@  add_condition_to_bb (tree function_decl, tree vers
    the right builtin to use to match the platform specification.
    It returns the priority value for this version decl.  If PREDICATE_LIST
    is not NULL, it stores the list of cpu features that need to be checked
-   before dispatching this function.  */
+   before dispatching this function.  When DEBUG_MODE is 1, use the mock
+   cpu check builtins to do the dispatch.  */
 
 static unsigned int
-get_builtin_code_for_version (tree decl, tree *predicate_list)
+get_builtin_code_for_version (tree decl, tree *predicate_list, int debug_mode)
 {
   tree attrs;
   struct cl_target_option cur_target;
@@ -28254,7 +28295,10 @@  static unsigned int
     
       if (predicate_list)
 	{
-          predicate_decl = ix86_builtins [(int) IX86_BUILTIN_CPU_IS];
+	  if (debug_mode)
+            predicate_decl = ix86_builtins [(int) IX86_BUILTIN_MOCK_CPU_IS];
+	  else
+            predicate_decl = ix86_builtins [(int) IX86_BUILTIN_CPU_IS];
           /* For a C string literal the length includes the trailing NULL.  */
           predicate_arg = build_string_literal (strlen (arg_str) + 1, arg_str);
           predicate_chain = tree_cons (predicate_decl, predicate_arg,
@@ -28266,8 +28310,12 @@  static unsigned int
   tok_str =  (char *) xmalloc (strlen (attrs_str) + 1);
   strcpy (tok_str, attrs_str);
   token = strtok (tok_str, ",");
-  predicate_decl = ix86_builtins [(int) IX86_BUILTIN_CPU_SUPPORTS];
 
+  if (debug_mode)
+    predicate_decl = ix86_builtins [(int) IX86_BUILTIN_MOCK_CPU_SUPPORTS];
+  else
+    predicate_decl = ix86_builtins [(int) IX86_BUILTIN_CPU_SUPPORTS];
+
   while (token != NULL)
     {
       /* Do not process "arch="  */
@@ -28329,8 +28377,8 @@  static unsigned int
 static int
 ix86_compare_version_priority (tree decl1, tree decl2)
 {
-  unsigned int priority1 = get_builtin_code_for_version (decl1, NULL);
-  unsigned int priority2 = get_builtin_code_for_version (decl2, NULL);
+  unsigned int priority1 = get_builtin_code_for_version (decl1, NULL, false);
+  unsigned int priority2 = get_builtin_code_for_version (decl2, NULL, false);
 
   return (int)priority1 - (int)priority2;
 }
@@ -28357,12 +28405,15 @@  feature_compare (const void *v1, const void *v2)
    multi-versioned functions.  DISPATCH_DECL is the function which will
    contain the dispatch logic.  FNDECLS are the function choices for
    dispatch, and is a tree chain.  EMPTY_BB is the basic block pointer
-   in DISPATCH_DECL in which the dispatch code is generated.  */
+   in DISPATCH_DECL in which the dispatch code is generated.  When
+   DEBUG_MODE is 1, the version dispatcher is invoked for every call
+   to the multiversioned function.  */
 
 static int
 dispatch_function_versions (tree dispatch_decl,
 			    void *fndecls_p,
-			    basic_block *empty_bb)
+			    basic_block *empty_bb,
+			    int debug_mode)
 {
   tree default_decl;
   gimple ifunc_cpu_init_stmt;
@@ -28420,8 +28471,8 @@  dispatch_function_versions (tree dispatch_decl,
       /* Get attribute string, parse it and find the right predicate decl.
          The predicate function could be a lengthy combination of many
 	 features, like arch-type and various isa-variants.  */
-      priority = get_builtin_code_for_version (version_decl,
-	 			               &predicate_chain);
+      priority = get_builtin_code_for_version (version_decl, &predicate_chain,
+					       debug_mode);
 
       if (predicate_chain == NULL_TREE)
 	continue;
@@ -28444,11 +28495,11 @@  dispatch_function_versions (tree dispatch_decl,
     *empty_bb = add_condition_to_bb (dispatch_decl,
 				     function_version_info[i].version_decl,
 				     function_version_info[i].predicate_chain,
-				     *empty_bb);
+				     *empty_bb, debug_mode);
 
   /* dispatch default version at the end.  */
   *empty_bb = add_condition_to_bb (dispatch_decl, default_decl,
-				   NULL, *empty_bb);
+				   NULL, *empty_bb, debug_mode);
 
   free (function_version_info);
   return 0;
@@ -28813,8 +28864,19 @@  ix86_get_function_versions_dispatcher (void *decl)
 
   default_node = default_version_info->this_node;
 
+
+  /* Right now, dispatching at startup (non-debug mode) is done via ifunc.  */
 #if defined (ASM_OUTPUT_TYPE_DIRECTIVE) && HAVE_GNU_INDIRECT_FUNCTION
-  /* Right now, the dispatching is done via ifunc.  */
+#else
+  if (!debug_mode)
+    {
+      error_at (DECL_SOURCE_LOCATION (default_node->symbol.decl),
+		"multiversioning needs ifunc which is not supported "
+		"in this configuration");
+      return NULL;
+    }
+#endif
+
   dispatch_decl = make_dispatcher_decl (default_node->decl); 
 
   dispatcher_node = cgraph_get_create_node (dispatch_decl);
@@ -28832,11 +28894,7 @@  ix86_get_function_versions_dispatcher (void *decl)
       it_v->dispatcher_resolver = dispatch_decl;
       it_v = it_v->next;
     }
-#else
-  error_at (DECL_SOURCE_LOCATION (default_node->symbol.decl),
-	    "multiversioning needs ifunc which is not supported "
-	    "in this configuration");
-#endif
+
   return dispatch_decl;
 }
 
@@ -28861,15 +28919,19 @@  make_attribute (const char *name, const char *arg_
 /* Make the resolver function decl to dispatch the versions of
    a multi-versioned function,  DEFAULT_DECL.  Create an
    empty basic block in the resolver and store the pointer in
-   EMPTY_BB.  Return the decl of the resolver function.  */
+   EMPTY_BB.  Return the decl of the resolver function.  When
+   DEBUG_MODE is 1, the resolver function body is not an
+   ifunc resolver; it simply calls the appropriate function
+   version and returns the call output.  */
 
 static tree
 make_resolver_func (const tree default_decl,
 		    const tree dispatch_decl,
-		    basic_block *empty_bb)
+		    basic_block *empty_bb,
+		    int debug_mode)
 {
   char *resolver_name;
-  tree decl, type, decl_name, t;
+  tree decl, type, decl_name, t = NULL;
   bool is_uniq = false;
 
   /* IFUNC's have to be globally visible.  So, if the default_decl is
@@ -28884,8 +28946,19 @@  make_resolver_func (const tree default_decl,
      another module which is based on the same version name.  */
   resolver_name = make_name (default_decl, "resolver", is_uniq);
 
-  /* The resolver function should return a (void *). */
-  type = build_function_type_list (ptr_type_node, NULL_TREE);
+  if (debug_mode)
+    {
+      /* In debug_mode, the resolver function calls the appropriate
+	 function version.  Its type is same as dispatch_decl.  */
+      tree fn_type = TREE_TYPE (dispatch_decl);
+      type = build_function_type (TREE_TYPE (fn_type),
+				  TYPE_ARG_TYPES (fn_type));
+    }
+  else
+    {
+      /* The resolver function should return a (void *). */
+      type = build_function_type_list (ptr_type_node, NULL_TREE);
+    }
 
   decl = build_fn_decl (resolver_name, type);
   decl_name = get_identifier (resolver_name);
@@ -28907,6 +28980,16 @@  make_resolver_func (const tree default_decl,
   DECL_INITIAL (decl) = make_node (BLOCK);
   DECL_STATIC_CONSTRUCTOR (decl) = 0;
 
+  /* In debug_mode, the resolver function is not an ifunc resolver.  Its
+     signature is the same as the dispatch_decl or default_decl.  */
+  if (debug_mode)
+    {
+      tree arg;
+      DECL_ARGUMENTS (decl) = copy_list (DECL_ARGUMENTS (default_decl));
+      for (arg = DECL_ARGUMENTS (decl); arg ; arg = DECL_CHAIN (arg))
+	DECL_CONTEXT (arg) = decl;
+    }
+
   if (DECL_COMDAT_GROUP (default_decl)
       || TREE_PUBLIC (default_decl))
     {
@@ -28917,7 +29000,9 @@  make_resolver_func (const tree default_decl,
       make_decl_one_only (decl, DECL_ASSEMBLER_NAME (decl));
     }
   /* Build result decl and add to function_decl. */
-  t = build_decl (UNKNOWN_LOCATION, RESULT_DECL, NULL_TREE, ptr_type_node);
+  t = build_decl (UNKNOWN_LOCATION, RESULT_DECL, NULL_TREE,
+		  TREE_TYPE (TREE_TYPE (decl)));
+
   DECL_ARTIFICIAL (t) = 1;
   DECL_IGNORED_P (t) = 1;
   DECL_RESULT (decl) = t;
@@ -28932,9 +29017,17 @@  make_resolver_func (const tree default_decl,
   pop_cfun ();
 
   gcc_assert (dispatch_decl != NULL);
-  /* Mark dispatch_decl as "ifunc" with resolver as resolver_name.  */
-  DECL_ATTRIBUTES (dispatch_decl) 
-    = make_attribute ("ifunc", resolver_name, DECL_ATTRIBUTES (dispatch_decl));
+ 
+  /* Mark dispatch_decl as "alias" or "ifunc" with resolver as
+     resolver_name.  */
+  if (debug_mode)
+    DECL_ATTRIBUTES (dispatch_decl) 
+      = make_attribute ("alias", resolver_name,
+	   	        DECL_ATTRIBUTES (dispatch_decl));
+  else
+    DECL_ATTRIBUTES (dispatch_decl) 
+      = make_attribute ("ifunc", resolver_name,
+		        DECL_ATTRIBUTES (dispatch_decl));
 
   /* Create the alias for dispatch to resolver here.  */
   /*cgraph_create_function_alias (dispatch_decl, decl);*/
@@ -28946,10 +29039,13 @@  make_resolver_func (const tree default_decl,
 /* Generate the dispatching code body to dispatch multi-versioned function
    DECL.  The target hook is called to process the "target" attributes and
    provide the code to dispatch the right function at run-time.  NODE points
-   to the dispatcher decl whose body will be created.  */
+   to the dispatcher decl whose body will be created.  When DEBUG_MODE is
+   1, the dispatch checks should be made during every call to the versioned
+   function.  When DEBUG_MODE is 0, ifunc based dispatching is used to
+   keep the call overhead small.  */
 
 static tree 
-ix86_generate_version_dispatcher_body (void *node_p)
+ix86_generate_version_dispatcher_body (void *node_p, int debug_mode)
 {
   tree resolver_decl;
   basic_block empty_bb;
@@ -28976,8 +29072,8 @@  static tree
   /* node is going to be an alias, so remove the finalized bit.  */
   node->local.finalized = false;
 
-  resolver_decl = make_resolver_func (default_ver_decl,
-				      node->decl, &empty_bb);
+  resolver_decl = make_resolver_func (default_ver_decl, node->decl,
+				      &empty_bb, debug_mode);
 
   node_version_info->dispatcher_resolver = resolver_decl;
 
@@ -29000,7 +29096,8 @@  static tree
       VEC_safe_push (tree, heap, fn_ver_vec, versn->decl);
     }
 
-  dispatch_function_versions (resolver_decl, fn_ver_vec, &empty_bb);
+  dispatch_function_versions (resolver_decl, fn_ver_vec,
+			      &empty_bb, debug_mode);
   VEC_free (tree, heap, fn_ver_vec);
   rebuild_cgraph_edges (); 
   pop_cfun ();
@@ -29185,7 +29282,8 @@  fold_builtin_cpu (tree fndecl, tree *args)
 
   gcc_assert (param_string_cst);
 
-  if (fn_code == IX86_BUILTIN_CPU_IS)
+  if (fn_code == IX86_BUILTIN_CPU_IS
+      || fn_code == IX86_BUILTIN_MOCK_CPU_IS)
     {
       tree ref;
       tree field;
@@ -29234,7 +29332,8 @@  fold_builtin_cpu (tree fndecl, tree *args)
 		      build_int_cstu (unsigned_type_node, field_val));
       return build1 (CONVERT_EXPR, integer_type_node, final);
     }
-  else if (fn_code == IX86_BUILTIN_CPU_SUPPORTS)
+  else if (fn_code == IX86_BUILTIN_CPU_SUPPORTS
+	   || fn_code == IX86_BUILTIN_MOCK_CPU_SUPPORTS)
     {
       tree ref;
       tree array_elt;
@@ -29288,7 +29387,9 @@  ix86_fold_builtin (tree fndecl, int n_args,
       enum ix86_builtins fn_code = (enum ix86_builtins)
 				   DECL_FUNCTION_CODE (fndecl);
       if (fn_code ==  IX86_BUILTIN_CPU_IS
-	  || fn_code == IX86_BUILTIN_CPU_SUPPORTS)
+	  || fn_code == IX86_BUILTIN_CPU_SUPPORTS
+          || fn_code ==  IX86_BUILTIN_MOCK_CPU_IS
+	  || fn_code == IX86_BUILTIN_MOCK_CPU_SUPPORTS)
 	{
 	  gcc_assert (n_args == 1);
           return fold_builtin_cpu (fndecl, args);
@@ -29334,6 +29435,13 @@  ix86_init_platform_type_builtins (void)
 			 INT_FTYPE_PCCHAR, true);
   make_cpu_type_builtin ("__builtin_cpu_supports", IX86_BUILTIN_CPU_SUPPORTS,
 			 INT_FTYPE_PCCHAR, true);
+  /* Create builtins that mock cpu type and isa features.  This is meant to
+     be used for code coverage testing of multiversioned functions.  */
+  make_cpu_type_builtin ("__builtin_mock_cpu_is", IX86_BUILTIN_MOCK_CPU_IS,
+			 INT_FTYPE_PCCHAR, false);
+  make_cpu_type_builtin ("__builtin_mock_cpu_supports",
+			 IX86_BUILTIN_MOCK_CPU_SUPPORTS,
+			 INT_FTYPE_PCCHAR, false);
 }
 
 /* Internal method for ix86_init_builtins.  */
@@ -31050,6 +31158,8 @@  ix86_expand_builtin (tree exp, rtx target, rtx sub
 	call_expr = build_call_expr (fndecl, 0); 
 	return expand_expr (call_expr, target, mode, EXPAND_NORMAL);
       }
+    case IX86_BUILTIN_MOCK_CPU_IS:
+    case IX86_BUILTIN_MOCK_CPU_SUPPORTS:
     case IX86_BUILTIN_CPU_IS:
     case IX86_BUILTIN_CPU_SUPPORTS:
       {
Index: target.def
===================================================================
--- target.def	(revision 196618)
+++ target.def	(working copy)
@@ -1271,11 +1271,12 @@  DEFHOOK
 /*  Target hook is used to generate the dispatcher logic to invoke the right
     function version at run-time for a given set of function versions.
     ARG points to the callgraph node of the dispatcher function whose body
-    must be generated.  */
+    must be generated.  The version dispatcher is invoked on every call when
+    debug_mode is 1.  */
 DEFHOOK
 (generate_version_dispatcher_body,
  "",
- tree, (void *arg), NULL) 
+ tree, (void *arg, int debug_mode), NULL) 
 
 /* Target hook is used to get the dispatcher function for a set of function
    versions.  The dispatcher function is called to invoke the right function