Patchwork PATCH: Add pause intrinsic

Submitter H.J. Lu
Date May 25, 2011, 5:19 p.m.
Message ID <BANLkTinZQ=k9EKJL7LgBsDLWt6ozRoh_Kw@mail.gmail.com>
Permalink /patch/97384/
State New

Comments

H.J. Lu - May 25, 2011, 5:19 p.m.
On Wed, May 25, 2011 at 9:43 AM, Andrew Haley <aph@redhat.com> wrote:
> On 05/25/2011 04:32 PM, H.J. Lu wrote:
>> On Wed, May 25, 2011 at 8:27 AM, Richard Guenther
>> <richard.guenther@gmail.com> wrote:
>>> On Wed, May 25, 2011 at 5:20 PM, Michael Matz <matz@suse.de> wrote:
>>>> Hi,
>>>>
>>>> On Wed, 25 May 2011, Richard Guenther wrote:
>>>>
>>>>>>> asm volatile ("" : : : "memory") in fact will work as a full memory
>>>>>>> barrier
>>>>>>
>>>>>> How?  You surely need MFENCE or somesuch, unless all you care about is
>>>>>> a compiler barrier.  That's what I think needs to be clarified.
>>>>>
>>>>> Well, yes, I'm talking about the compiler memory barrier.
>>>>
>>>> Something that we conventionally call "optimization barrier" :)  memory
>>>> barrier has a fixed meaning which we shouldn't use in this case, it's
>>>> confusing.
>>>
>>> Sure ;)
>>>
>>> And to keep the info in a suitable thread what I'd like to improve here
>>> is to make us disambiguate memory loads/stores against asms that
>>> have no memory outputs/inputs.
>>>
>>
>> Please let me know how I should improve the document,
>
> "Compiler memory barrier" seems to be well-understood.  I suggest
>
> +Generates the @code{pause} machine instruction with a compiler memory barrier.
>
> It's clear enough.
>
> Andrew.
>

I checked in this.

Thanks.
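
For readers following the terminology dispute above, here is a minimal sketch of the two things being distinguished: a compiler-only barrier (the empty asm with a "memory" clobber quoted above) and a hardware fence (the MFENCE-style barrier Andrew Haley refers to). The names compiler_barrier, hardware_fence, publish and publish_fenced are hypothetical and do not come from the patch; the hardware fence is written with the C11 atomic_thread_fence rather than a literal MFENCE so that the sketch stays portable.

```c
#include <stdatomic.h>

/* Compiler-only barrier: emits no instruction. It only stops the
   compiler from moving memory accesses across this point. */
#define compiler_barrier() __asm__ __volatile__("" : : : "memory")

/* Hardware fence: a C11 sequentially consistent fence, which also
   constrains ordering as observed by other processors. */
#define hardware_fence() atomic_thread_fence(memory_order_seq_cst)

int payload;        /* hypothetical data being published */
atomic_int ready;   /* hypothetical "data is valid" flag */

/* Publish with a compiler barrier only: the compiler keeps the
   payload store ahead of the flag store, but no fence is emitted. */
void publish(int value)
{
    payload = value;
    compiler_barrier();
    atomic_store_explicit(&ready, 1, memory_order_relaxed);
}

/* Publish with a real fence between the two stores. */
void publish_fenced(int value)
{
    payload = value;
    hardware_fence();
    atomic_store_explicit(&ready, 1, memory_order_relaxed);
}
```

Whether the compiler-only version is sufficient depends on the target's memory model, which is exactly why the thread asks for the two terms to be kept apart.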
Andrew Pinski - May 25, 2011, 5:26 p.m.
On Wed, May 25, 2011 at 10:19 AM, H.J. Lu <hjl.tools@gmail.com> wrote:
> --
> H.J.
> ---
> Index: doc/extend.texi
> ===================================================================
> --- doc/extend.texi     (revision 174216)
> +++ doc/extend.texi     (working copy)
> @@ -8699,7 +8699,8 @@ The following built-in function is alway
>
>  @table @code
>  @item void __builtin_ia32_pause (void)
> -Generates the @code{pause} machine instruction with full memory barrier.
> +Generates the @code{pause} machine instruction with a compiler memory
> +barrier.

What does the pause machine instruction do?  How is it different from a
normal nop?

Also, to me "pause" means that it waits for input or an interrupt.

Thanks,
Andrew Pinski
Andrew Haley - May 25, 2011, 5:28 p.m.
On 05/25/2011 06:26 PM, Andrew Pinski wrote:
> On Wed, May 25, 2011 at 10:19 AM, H.J. Lu <hjl.tools@gmail.com> wrote:
>> --
>> H.J.
>> ---
>> Index: doc/extend.texi
>> ===================================================================
>> --- doc/extend.texi     (revision 174216)
>> +++ doc/extend.texi     (working copy)
>> @@ -8699,7 +8699,8 @@ The following built-in function is alway
>>
>>  @table @code
>>  @item void __builtin_ia32_pause (void)
>> -Generates the @code{pause} machine instruction with full memory barrier.
>> +Generates the @code{pause} machine instruction with a compiler memory
>> +barrier.
> 
> What does the pause machine instruction do?

That's documented by Intel in the architecture manual.  Surely
we don't have to explain it all.

Andrew.


PAUSE—Spin Loop Hint

Improves the performance of spin-wait loops. When executing a “spin-wait loop,” a
Pentium 4 or Intel Xeon processor suffers a severe performance penalty when exiting
the loop because it detects a possible memory order violation. The PAUSE instruction
provides a hint to the processor that the code sequence is a spin-wait loop. The
processor uses this hint to avoid the memory order violation in most situations,
which greatly improves processor performance. For this reason, it is recommended
that a PAUSE instruction be placed in all spin-wait loops.

An additional function of the PAUSE instruction is to reduce the power consumed by
a Pentium 4 processor while executing a spin loop. The Pentium 4 processor can
execute a spin-wait loop extremely quickly, causing the processor to consume a lot of
power while it waits for the resource it is spinning on to become available. Inserting
a pause instruction in a spin-wait loop greatly reduces the processor’s power
consumption.

This instruction was introduced in the Pentium 4 processors, but is backward
compatible with all IA-32 processors. In earlier IA-32 processors, the PAUSE instruction
operates like a NOP instruction. The Pentium 4 and Intel Xeon processors implement
the PAUSE instruction as a pre-defined delay. The delay is finite and can be zero for
some processors. This instruction does not change the architectural state of the
processor (that is, it performs essentially a delaying no-op operation).
This instruction’s operation is the same in non-64-bit modes and 64-bit mode.
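
As an illustration of the spin-wait usage the manual text describes, here is a minimal sketch of a test-and-set spinlock that issues the hint while it waits. The type demo_spinlock and the functions spin_hint, demo_trylock, demo_lock and demo_unlock are hypothetical names invented for this example. The only piece taken from the thread is __builtin_ia32_pause, and the sketch assumes a compiler that provides it on x86 targets.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Spin-loop hint: the builtin under discussion on x86, nothing elsewhere. */
static inline void spin_hint(void)
{
#if defined(__i386__) || defined(__x86_64__)
    __builtin_ia32_pause();
#endif
}

typedef struct {
    atomic_flag held;
} demo_spinlock;

/* Returns true if the lock was free and is now held by the caller. */
static inline bool demo_trylock(demo_spinlock *l)
{
    /* test_and_set returns the previous value: false means we took it. */
    return !atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire);
}

/* Spin until the lock is acquired, hinting the CPU on every failed try. */
static inline void demo_lock(demo_spinlock *l)
{
    while (!demo_trylock(l))
        spin_hint();
}

static inline void demo_unlock(demo_spinlock *l)
{
    atomic_flag_clear_explicit(&l->held, memory_order_release);
}
```

Note that the ordering guarantees here come from the C11 atomics, not from the pause hint, so the sketch does not depend on how the barrier question in this thread is resolved.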
Richard Guenther - May 26, 2011, 9:34 a.m.
On Wed, May 25, 2011 at 7:19 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
> On Wed, May 25, 2011 at 9:43 AM, Andrew Haley <aph@redhat.com> wrote:
>> On 05/25/2011 04:32 PM, H.J. Lu wrote:
>>> On Wed, May 25, 2011 at 8:27 AM, Richard Guenther
>>> <richard.guenther@gmail.com> wrote:
>>>> On Wed, May 25, 2011 at 5:20 PM, Michael Matz <matz@suse.de> wrote:
>>>>> Hi,
>>>>>
>>>>> On Wed, 25 May 2011, Richard Guenther wrote:
>>>>>
>>>>>>>> asm volatile ("" : : : "memory") in fact will work as a full memory
>>>>>>>> barrier
>>>>>>>
>>>>>>> How?  You surely need MFENCE or somesuch, unless all you care about is
>>>>>>> a compiler barrier.  That's what I think needs to be clarified.
>>>>>>
>>>>>> Well, yes, I'm talking about the compiler memory barrier.
>>>>>
>>>>> Something that we conventionally call "optimization barrier" :)  memory
>>>>> barrier has a fixed meaning which we shouldn't use in this case, it's
>>>>> confusing.
>>>>
>>>> Sure ;)
>>>>
>>>> And to keep the info in a suitable thread what I'd like to improve here
>>>> is to make us disambiguate memory loads/stores against asms that
>>>> have no memory outputs/inputs.
>>>>
>>>
>>> Please let me know how I should improve the document,
>>
>> "Compiler memory barrier" seems to be well-understood.  I suggest
>>
>> +Generates the @code{pause} machine instruction with a compiler memory barrier.
>>
>> It's clear enough.
>>
>> Andrew.
>>
>
> I checked in this.
>
> Thanks.
>
>
> --
> H.J.
> ---
> Index: doc/extend.texi
> ===================================================================
> --- doc/extend.texi     (revision 174216)
> +++ doc/extend.texi     (working copy)
> @@ -8699,7 +8699,8 @@ The following built-in function is alway
>
>  @table @code
>  @item void __builtin_ia32_pause (void)
> -Generates the @code{pause} machine instruction with full memory barrier.
> +Generates the @code{pause} machine instruction with a compiler memory
> +barrier.
>  @end table

This isn't true.  It is _not_ a compiler memory barrier.

Richard.
Andrew Haley - May 26, 2011, 1:30 p.m.
On 05/26/2011 10:34 AM, Richard Guenther wrote:

>> Index: doc/extend.texi
>> ===================================================================
>> --- doc/extend.texi     (revision 174216)
>> +++ doc/extend.texi     (working copy)
>> @@ -8699,7 +8699,8 @@ The following built-in function is alway
>>
>>  @table @code
>>  @item void __builtin_ia32_pause (void)
>> -Generates the @code{pause} machine instruction with full memory barrier.
>> +Generates the @code{pause} machine instruction with a compiler memory
>> +barrier.
>>  @end table
> 
> This isn't true.  It is _not_ a compiler memory barrier.

Please elucidate.  Please suggest alternative wording.

Andrew.
Richard Guenther - May 26, 2011, 1:51 p.m.
On Thu, May 26, 2011 at 3:30 PM, Andrew Haley <aph@redhat.com> wrote:
> On 05/26/2011 10:34 AM, Richard Guenther wrote:
>
>>> Index: doc/extend.texi
>>> ===================================================================
>>> --- doc/extend.texi     (revision 174216)
>>> +++ doc/extend.texi     (working copy)
>>> @@ -8699,7 +8699,8 @@ The following built-in function is alway
>>>
>>>  @table @code
>>>  @item void __builtin_ia32_pause (void)
>>> -Generates the @code{pause} machine instruction with full memory barrier.
>>> +Generates the @code{pause} machine instruction with a compiler memory
>>> +barrier.
>>>  @end table
>>
>> This isn't true.  It is _not_ a compiler memory barrier.
>
> Please elucidate.  Please suggest alternative wording.

+Generates the @code{pause} machine instruction.

Richard.

> Andrew.
>
Andrew Haley - May 26, 2011, 1:53 p.m.
On 05/26/2011 02:51 PM, Richard Guenther wrote:
> On Thu, May 26, 2011 at 3:30 PM, Andrew Haley <aph@redhat.com> wrote:
>> On 05/26/2011 10:34 AM, Richard Guenther wrote:
>>
>>>> Index: doc/extend.texi
>>>> ===================================================================
>>>> --- doc/extend.texi     (revision 174216)
>>>> +++ doc/extend.texi     (working copy)
>>>> @@ -8699,7 +8699,8 @@ The following built-in function is alway
>>>>
>>>>  @table @code
>>>>  @item void __builtin_ia32_pause (void)
>>>> -Generates the @code{pause} machine instruction with full memory barrier.
>>>> +Generates the @code{pause} machine instruction with a compiler memory
>>>> +barrier.
>>>>  @end table
>>>
>>> This isn't true.  It is _not_ a compiler memory barrier.
>>
>> Please elucidate.  Please suggest alternative wording.
> 
> +Generates the @code{pause} machine instruction.

But that's missing the fact that it generates a compiler memory barrier,
which is important.  And if you think it's not a compiler memory barrier,
please explain

a.  Why it's not a compiler memory barrier,
b.  What you'd call it.

Andrew.
Richard Guenther - May 26, 2011, 2:29 p.m.
On Thu, May 26, 2011 at 3:53 PM, Andrew Haley <aph@redhat.com> wrote:
> On 05/26/2011 02:51 PM, Richard Guenther wrote:
>> On Thu, May 26, 2011 at 3:30 PM, Andrew Haley <aph@redhat.com> wrote:
>>> On 05/26/2011 10:34 AM, Richard Guenther wrote:
>>>
>>>>> Index: doc/extend.texi
>>>>> ===================================================================
>>>>> --- doc/extend.texi     (revision 174216)
>>>>> +++ doc/extend.texi     (working copy)
>>>>> @@ -8699,7 +8699,8 @@ The following built-in function is alway
>>>>>
>>>>>  @table @code
>>>>>  @item void __builtin_ia32_pause (void)
>>>>> -Generates the @code{pause} machine instruction with full memory barrier.
>>>>> +Generates the @code{pause} machine instruction with a compiler memory
>>>>> +barrier.
>>>>>  @end table
>>>>
>>>> This isn't true.  It is _not_ a compiler memory barrier.
>>>
>>> Please elucidate.  Please suggest alternative wording.
>>
>> +Generates the @code{pause} machine instruction.
>
> But that's missing the fact that it generates a compiler memory barrier,
> which is important.  And if you think it's not a compiler memory barrier,
> please explain
>
> a.  Why it's not a compiler memory barrier,

It is not a compiler memory barrier because it is a builtin function call
which is never assumed to be a barrier for local automatic storage
that does not have its address taken.

> b.  What you'd call it.

Not a compiler memory barrier ;)

To make it a compiler memory barrier you have to "expand" the
builtin already in the frontend and present the middle-end with
__asm__ ("...." : : : "memory").  That will serve as a compiler
memory barrier also covering local non-address taken storage
(global and practically most of address-taken local storage
is covered by a builtin function call already).

Richard.

>
> Andrew.
>
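
A small sketch may help with the distinction Richard draws above between storage that a function call already covers and local storage whose address is never taken. All names here (global_counter, opaque_call, across_call) are hypothetical, and opaque_call should be read as if it were defined in another translation unit; it is defined locally only so that the example is self-contained.

```c
int global_counter;

/* Stand-in for a call the compiler cannot see into. */
__attribute__((noinline)) void opaque_call(void)
{
    global_counter++;
}

int across_call(void)
{
    int local = 1;        /* address never taken: no call can read or write it */
    global_counter = 1;   /* visible to the callee, so it must be in memory */
    opaque_call();
    /* global_counter has to be re-read after the call; local may stay
       in a register or be folded to the constant 1. */
    return local + global_counter;
}
```

In this reading, the call already acts as a barrier for global_counter but places no constraint on local, which is the gap the "memory" clobber on an asm statement is said to close.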
Jakub Jelinek - May 26, 2011, 2:34 p.m.
On Thu, May 26, 2011 at 04:29:50PM +0200, Richard Guenther wrote:
> To make it a compiler memory barrier you have to "expand" the
> builtin already in the frontend and present the middle-end with
> __asm__ ("...." : : : "memory").  That will serve as a compiler
> memory barrier also covering local non-address taken storage
> (global and practically most of address-taken local storage
> is covered by a builtin function call already).

But then, what is the point of the builtin when
__asm__ __volatile__ ("rep; nop" : : : "memory");
does all of that already and has been supported for years...

	Jakub
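
For reference, a minimal sketch of how the inline asm form quoted above is typically wrapped and used in a polling loop. "rep; nop" is the byte encoding of PAUSE, so assemblers that predate the mnemonic still accept it. The names pause_with_clobber and wait_for_flag are hypothetical, and the non-x86 branch is an assumption added so that the sketch builds elsewhere.

```c
/* PAUSE plus a compiler barrier, in the inline asm form quoted above. */
static inline void pause_with_clobber(void)
{
#if defined(__i386__) || defined(__x86_64__)
    __asm__ __volatile__("rep; nop" : : : "memory");
#else
    /* Assumed fallback: compiler barrier only. */
    __asm__ __volatile__("" : : : "memory");
#endif
}

/* Poll *flag at most max_spins times; return 1 if it was seen nonzero. */
static inline int wait_for_flag(int *flag, unsigned max_spins)
{
    for (unsigned i = 0; i < max_spins; i++) {
        if (*flag)
            return 1;
        /* The "memory" clobber makes the compiler reload *flag
           on the next iteration instead of reusing a cached value. */
        pause_with_clobber();
    }
    return 0;
}
```

Real concurrent code would normally read the flag through an atomic or volatile access; the plain int is used here only to show what the clobber contributes.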
Richard Guenther - May 26, 2011, 2:36 p.m.
On Thu, May 26, 2011 at 4:34 PM, Jakub Jelinek <jakub@redhat.com> wrote:
> On Thu, May 26, 2011 at 04:29:50PM +0200, Richard Guenther wrote:
>> To make it a compiler memory barrier you have to "expand" the
>> builtin already in the frontend and present the middle-end with
>> __asm__ ("...." : : : "memory").  That will serve as a compiler
>> memory barrier also covering local non-address taken storage
>> (global and practically most of address-taken local storage
>> is covered by a builtin function call already).
>
> But then, what is the point of the builtin when
> __asm__ __volatile__ ("rep; nop" : : : "memory");
> does all of that already and has been supported for years...

Good question ;)

Richard.
Andrew Haley - May 26, 2011, 2:38 p.m.
On 05/26/2011 03:29 PM, Richard Guenther wrote:
> On Thu, May 26, 2011 at 3:53 PM, Andrew Haley <aph@redhat.com> wrote:
>> On 05/26/2011 02:51 PM, Richard Guenther wrote:
>>> On Thu, May 26, 2011 at 3:30 PM, Andrew Haley <aph@redhat.com> wrote:
>>>> On 05/26/2011 10:34 AM, Richard Guenther wrote:
>>>>
>>>>>> Index: doc/extend.texi
>>>>>> ===================================================================
>>>>>> --- doc/extend.texi     (revision 174216)
>>>>>> +++ doc/extend.texi     (working copy)
>>>>>> @@ -8699,7 +8699,8 @@ The following built-in function is alway
>>>>>>
>>>>>>  @table @code
>>>>>>  @item void __builtin_ia32_pause (void)
>>>>>> -Generates the @code{pause} machine instruction with full memory barrier.
>>>>>> +Generates the @code{pause} machine instruction with a compiler memory
>>>>>> +barrier.
>>>>>>  @end table
>>>>>
>>>>> This isn't true.  It is _not_ a compiler memory barrier.
>>>>
>>>> Please elucidate.  Please suggest alternative wording.
>>>
>>> +Generates the @code{pause} machine instruction.
>>
>> But that's missing the fact that it generates a compiler memory barrier,
>> which is important.  And if you think it's not a compiler memory barrier,
>> please explain
>>
>> a.  Why it's not a compiler memory barrier,
> 
> It is not a compiler memory barrier because it is a builtin function call
> which is never assumed to be a barrier for local automatic storage
> that does not have its address taken.

OK.  How would you tell the difference between the kind of barrier
that it is and a real compiler memory barrier?  If an auto does not
have its address taken, it isn't visible anyway.

>> b.  What you'd call it.
> 
> Not a compiler memory barrier ;)

I don't want to know what not to call it, though.

> To make it a compiler memory barrier you have to "expand" the
> builtin already in the frontend and present the middle-end with
> __asm__ ("...." : : : "memory").  That will serve as a compiler
> memory barrier also covering local non-address taken storage
> (global and practically most of address-taken local storage
> is covered by a builtin function call already).

Well, the fact that it's also a memory clobber has to be documented
somehow.  If the present documentation is to be changed, it should
not be changed by deleting a vital piece of information.

Andrew.
Michael Matz - May 26, 2011, 2:47 p.m.
Hi,

On Thu, 26 May 2011, Andrew Haley wrote:

> >>> +Generates the @code{pause} machine instruction.
> >>
> >> But that's missing the fact that it generates a compiler memory 
> >> barrier, which is important.  And if you think it's not a compiler 
> >> memory barrier, please explain
> >>
> >> a.  Why it's not a compiler memory barrier,
> > 
> > It is not a compiler memory barrier because it is a builtin function call
> > which is never assumed to be a barrier for local automatic storage
> > that does not have its address taken.
> 
> OK.  How would you tell the difference between the kind of barrier
> that it is and a real compiler memory barrier?

First we have to determine if this builtin really does what its users 
intend to use it for.  I believe they _do_ want to use it also with 
regard to auto variables (this also includes address-taken ones whose 
address doesn't escape).  A normal builtin call is not a barrier for 
operations on such entities, hence it might very well be that HJ's 
implementation doesn't actually do what he wanted.

I don't have a good word for what function calls are in the barrierness 
part of their pre/post conditions.  "Global memory movement barrier" 
perhaps, with an appropriate definition of global memory (which funnily 
includes address-taken escaped local storage, ugh).

> > To make it a compiler memory barrier you have to "expand" the
> > builtin already in the frontend and present the middle-end with
> > __asm__ ("...." : : : "memory").  That will serve as a compiler
> > memory barrier also covering local non-address taken storage
> > (global and practically most of address-taken local storage
> > is covered by a builtin function call already).
> 
> Well, the fact that it's also a memory clobber has to be documented
> somehow.  If the present documentation is to be changed, it should
> not be changed by deleting a vital piece of information.

It's not only about the docu.  As implemented right now it's neither an 
optimization barrier nor a memory clobber.


Ciao,
Michael.
Andi Kleen - May 26, 2011, 4:10 p.m.
Richard Guenther <richard.guenther@gmail.com> writes:
>
> To make it a compiler memory barrier you have to "expand" the
> builtin already in the frontend and present the middle-end with
> __asm__ ("...." : : : "memory").  That will serve as a compiler

Those are the intended semantics (at least those I asked
for :-). For all practical purposes the same as
asm volatile("pause" ::: "memory")

HJ? Can it be expanded earlier?

As for why having a builtin: one reason would be portability.
Various other architectures have a similar instruction
(e.g. PPC). They could be added later to this as a next 
step.

Then it also seems cleaner to me to cover the instruction
set with builtins like the others.

-Andi
Jakub Jelinek - May 26, 2011, 4:46 p.m.
On Thu, May 26, 2011 at 09:10:32AM -0700, Andi Kleen wrote:
> Richard Guenther <richard.guenther@gmail.com> writes:
> As for why having a builtin: one reason would be portability.

You mean portability to other compilers (I think reasonable amount
of them support gcc-ish inline asm), or to other architectures?
__builtin_ia32_pause () doesn't look like a builtin you would
want to use on PPC.

> Then it also seems cleaner to me to cover the instruction
> set with builtins like the others.

No idea why in this case.  Builtins have the advantage that they
can be better scheduled, but in this case you don't want to move
it around.

	Jakub
Andi Kleen - May 26, 2011, 5:37 p.m.
On Thu, May 26, 2011 at 06:46:39PM +0200, Jakub Jelinek wrote:
> On Thu, May 26, 2011 at 09:10:32AM -0700, Andi Kleen wrote:
> > Richard Guenther <richard.guenther@gmail.com> writes:
> > As for why having a builtin: one reason would be portability.
> 
> You mean portability to other compilers (I think reasonable amount
> of them support gcc-ish inline asm), or to other architectures?

Both.

> __builtin_ia32_pause () doesn't look like a builtin you would
> want to use on PPC.

That's true, it should probably have a different name.

__builtin_pause()? 

The Linux kernel calls it cpu_relax() on all architectures.
The following architectures implement it: ia64, powerpc, x86
On others it just acts like a barrier.

I suppose most CPUs that implement SMT will have some equivalent.

-Andi
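
A minimal sketch of the kind of architecture-neutral wrapper described above, in the spirit of the kernel's cpu_relax(). The names relax_hint and wait_ready are hypothetical and this is not the kernel's implementation. Only the x86 branch is filled in, using the asm form quoted earlier in the thread; ia64 and powerpc would each add their own hint instruction, and every other target falls back to a plain compiler barrier.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical portable spin-wait hint. */
static inline void relax_hint(void)
{
#if defined(__i386__) || defined(__x86_64__)
    __asm__ __volatile__("pause" : : : "memory");
#else
    /* No hint instruction known for this target: barrier only. */
    __asm__ __volatile__("" : : : "memory");
#endif
}

/* Spin until *ready becomes true or max_spins polls have been made. */
static inline bool wait_ready(atomic_bool *ready, unsigned max_spins)
{
    while (max_spins-- > 0) {
        if (atomic_load_explicit(ready, memory_order_acquire))
            return true;
        relax_hint();
    }
    return false;
}
```

Keeping the per-architecture choice inside one function is what lets callers stay identical across targets, which is the portability argument made above.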
Paul Koning - May 26, 2011, 5:48 p.m.
On May 26, 2011, at 1:37 PM, Andi Kleen wrote:

> On Thu, May 26, 2011 at 06:46:39PM +0200, Jakub Jelinek wrote:
>> On Thu, May 26, 2011 at 09:10:32AM -0700, Andi Kleen wrote:
>>> Richard Guenther <richard.guenther@gmail.com> writes:
>>> As for why having a builtin: one reason would be portability.
>> 
>> You mean portability to other compilers (I think reasonable amount
>> of them support gcc-ish inline asm), or to other architectures?
> 
> Both.
> 
>> __builtin_ia32_pause () doesn't look like a builtin you would
>> want to use on PPC.
> 
> That's true, it should probably have a different name.
> 
> __builtin_pause()? 
> 
> The Linux kernel calls it cpu_relax() on all architectures.
> The following architectures implement it: ia64, powerpc, x86
> On others it just acts like a barrier.

Relax?  Weird.  "Pause" is just as weird.  It might be an ia32 instruction, so as an ia32 builtin it is a reasonable name.  But if you want a generic builtin, you need a name that actually has some plausible connection with what it does, and neither "pause" nor "relax" do that.

	paul
Andi Kleen - May 26, 2011, 6:04 p.m.
> Relax?  Weird.  "Pause" is just as weird.  It might be an ia32 instruction, so as an ia32 builtin it is a reasonable name.  But if you want a generic builtin, you need a name that actually has some plausible connection with what it does, and neither "pause" nor "relax" do that.

It's a short pause for the CPU. Both names fit quite well.

-Andi
Basile Starynkevitch - May 26, 2011, 7:37 p.m.
On Thu, 26 May 2011 13:48:13 -0400
Paul Koning <paul_koning@dell.com> wrote:

> Relax?  Weird.  "Pause" is just as weird.  It might be an ia32 instruction, 
> so as an ia32 builtin it is a reasonable name.  But if you want a generic 
> builtin, you need a name that actually has some plausible connection with 
> what it does, and neither "pause" nor "relax" do that.

I still think that having a builtin which does a "compiler flush", that
is, one which spills all registers to memory, is useful, e.g. a
builtin_compiler_flush()

And I even think there is another reason to use it. If you are
debugging a program compiled with -O2 -g, and if you know where there
could be a bug or a fault, temporarily adding a call to that
builtin_compiler_flush () would probably help the gdb debugger a lot.

Regards.
Andrew Haley - May 30, 2011, 9:50 a.m.
On 05/26/2011 08:37 PM, Basile Starynkevitch wrote:
> On Thu, 26 May 2011 13:48:13 -0400
> Paul Koning <paul_koning@dell.com> wrote:
> 
>> Relax?  Weird.  "Pause" is just as weird.  It might be an ia32 instruction, 
>> so as an ia32 builtin it is a reasonable name.  But if you want a generic 
>> builtin, you need a name that actually has some plausible connection with 
>> what it does, and neither "pause" nor "relax" do that.
> 
> I still think that having a builtin which does a "compiler flush", that
> is, one which spills all registers to memory, is useful, e.g. a
> builtin_compiler_flush()

I don't see how it can do that without causing reload failures.  You'd
have to be very careful somehow to identify user variables.

Andrew.

Patch

Index: doc/extend.texi
===================================================================
--- doc/extend.texi	(revision 174216)
+++ doc/extend.texi	(working copy)
@@ -8699,7 +8699,8 @@  The following built-in function is alway

 @table @code
 @item void __builtin_ia32_pause (void)
-Generates the @code{pause} machine instruction with full memory barrier.
+Generates the @code{pause} machine instruction with a compiler memory
+barrier.
 @end table

 The following floating point built-in functions are made available in the
Index: ChangeLog
===================================================================
--- ChangeLog	(revision 174216)
+++ ChangeLog	(working copy)
@@ -1,3 +1,8 @@ 
+2011-05-25  H.J. Lu  <hongjiu.lu@intel.com>
+
+	* doc/extend.texi (X86 Built-in Functions): Update pause
+	intrinsic.
+
 2011-05-25  Bernd Schmidt  <bernds@codesourcery.com>

 	PR bootstrap/49160