diff mbox series

[2/5] x86: Add -mindirect-branch-loop=

Message ID 20180107225904.11535-3-hjl.tools@gmail.com
State New
Headers show
Series x86: CVE-2017-5715, aka Spectre | expand

Commit Message

H.J. Lu Jan. 7, 2018, 10:59 p.m. UTC
Add -mindirect-branch-loop= option to control loop filler in call and
return thunks generated by -mindirect-branch=.  'lfence' uses "lfence"
as loop filler.  'pause' uses "pause" as loop filler.  'nop' uses "nop"
as loop filler.  The default is 'lfence'.

gcc/

	* config/i386/i386-opts.h (indirect_branch_loop): New.
	* config/i386/i386.c (output_indirect_thunk): Support
	-mindirect-branch-loop=.
	* config/i386/i386.opt (mindirect-branch-loop=): New option.
	(indirect_branch_loop): New.
	(lfence): Likewise.
	(pause): Likewise.
	(nop): Likewise.
	* doc/invoke.texi: Document -mindirect-branch-loop= option.

gcc/testsuite/

	* gcc.target/i386/indirect-thunk-loop-1.c: New test.
	* gcc.target/i386/indirect-thunk-loop-2.c: Likewise.
	* gcc.target/i386/indirect-thunk-loop-3.c: Likewise.
	* gcc.target/i386/indirect-thunk-loop-4.c: Likewise.
	* gcc.target/i386/indirect-thunk-loop-5.c: Likewise.
---
 gcc/config/i386/i386-opts.h                           |  6 ++++++
 gcc/config/i386/i386.c                                | 19 +++++++++++++++++--
 gcc/config/i386/i386.opt                              | 17 +++++++++++++++++
 gcc/doc/invoke.texi                                   |  9 ++++++++-
 gcc/testsuite/gcc.target/i386/indirect-thunk-loop-1.c | 19 +++++++++++++++++++
 gcc/testsuite/gcc.target/i386/indirect-thunk-loop-2.c | 19 +++++++++++++++++++
 gcc/testsuite/gcc.target/i386/indirect-thunk-loop-3.c | 19 +++++++++++++++++++
 gcc/testsuite/gcc.target/i386/indirect-thunk-loop-4.c | 19 +++++++++++++++++++
 gcc/testsuite/gcc.target/i386/indirect-thunk-loop-5.c | 19 +++++++++++++++++++
 9 files changed, 143 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/indirect-thunk-loop-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/indirect-thunk-loop-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/indirect-thunk-loop-3.c
 create mode 100644 gcc/testsuite/gcc.target/i386/indirect-thunk-loop-4.c
 create mode 100644 gcc/testsuite/gcc.target/i386/indirect-thunk-loop-5.c

Comments

Florian Weimer Jan. 8, 2018, 8:20 a.m. UTC | #1
* H. J. Lu:

> Add -mindirect-branch-loop= option to control loop filler in call and
> return thunks generated by -mindirect-branch=.  'lfence' uses "lfence"
> as loop filler.  'pause' uses "pause" as loop filler.  'nop' uses "nop"
> as loop filler.  The default is 'lfence'.

Why is the loop needed?  Doesn't ud2 or cpuid stop speculative
execution?
H.J. Lu Jan. 8, 2018, 11:24 a.m. UTC | #2
On Mon, Jan 8, 2018 at 12:20 AM, Florian Weimer <fw@deneb.enyo.de> wrote:
> * H. J. Lu:
>
>> Add -mindirect-branch-loop= option to control loop filler in call and
>> return thunks generated by -mindirect-branch=.  'lfence' uses "lfence"
>> as loop filler.  'pause' uses "pause" as loop filler.  'nop' uses "nop"
>> as loop filler.  The default is 'lfence'.
>
> Why is the loop needed?  Doesn't ud2 or cpuid stop speculative
> execution?

My understanding is that a loop works better.
Florian Weimer Jan. 8, 2018, 6:32 p.m. UTC | #3
* H. J. Lu:

> On Mon, Jan 8, 2018 at 12:20 AM, Florian Weimer <fw@deneb.enyo.de> wrote:
>> * H. J. Lu:
>>
>>> Add -mindirect-branch-loop= option to control loop filler in call and
>>> return thunks generated by -mindirect-branch=.  'lfence' uses "lfence"
>>> as loop filler.  'pause' uses "pause" as loop filler.  'nop' uses "nop"
>>> as loop filler.  The default is 'lfence'.
>>
>> Why is the loop needed?  Doesn't ud2 or cpuid stop speculative
>> execution?
>
> My understanding is that a loop works better.

Better how?

What about far jumps?  I think they prevent some forms of prefetch on
i386, so perhaps long mode is similar in that regard?
H.J. Lu Jan. 8, 2018, 7:16 p.m. UTC | #4
On Mon, Jan 8, 2018 at 10:32 AM, Florian Weimer <fw@deneb.enyo.de> wrote:
> * H. J. Lu:
>
>> On Mon, Jan 8, 2018 at 12:20 AM, Florian Weimer <fw@deneb.enyo.de> wrote:
>>> * H. J. Lu:
>>>
>>>> Add -mindirect-branch-loop= option to control loop filler in call and
>>>> return thunks generated by -mindirect-branch=.  'lfence' uses "lfence"
>>>> as loop filler.  'pause' uses "pause" as loop filler.  'nop' uses "nop"
>>>> as loop filler.  The default is 'lfence'.
>>>
>>> Why is the loop needed?  Doesn't ud2 or cpuid stop speculative
>>> execution?
>>
>> My understanding is that a loop works better.
>
> Better how?
>
> What about far jumps?  I think they prevent some forms of prefetch on
> i386, so perhaps long mode is similar in that regard?

These are more expensive and we can't guarantee that they are
effective, hence the short loop .
David Woodhouse Jan. 8, 2018, 9 p.m. UTC | #5
On Mon, 2018-01-08 at 09:20 +0100, Florian Weimer wrote:
> * H. J. Lu:
> 
> > Add -mindirect-branch-loop= option to control loop filler in call and
> > return thunks generated by -mindirect-branch=.  'lfence' uses "lfence"
> > as loop filler.  'pause' uses "pause" as loop filler.  'nop' uses "nop"
> > as loop filler.  The default is 'lfence'.
> 
> Why is the loop needed?  Doesn't ud2 or cpuid stop speculative
> execution?

The idea is not to stop it per se, but to capture it. We trick the
speculative execution into *thinking* it's going to return back to that
endless loop, which prevents it from doing the branch prediction which
would otherwise have got into trouble.

There has been a fair amount of bikeshedding of precisely what goes in
there already, and '1: pause; jmp 1b' is the best option that hasn't
been shot down in flames by the CPU architects.

HJ, do we still actually need the options for lfence and nop? I thought
those were originally just for testing and could possibly be dropped
now?

Not that I care for Linux since I'm providing my own external thunk
anyway...
H.J. Lu Jan. 8, 2018, 9:07 p.m. UTC | #6
On Mon, Jan 8, 2018 at 1:00 PM, David Woodhouse <dwmw2@infradead.org> wrote:
> On Mon, 2018-01-08 at 09:20 +0100, Florian Weimer wrote:
>> * H. J. Lu:
>>
>> > Add -mindirect-branch-loop= option to control loop filler in call and
>> > return thunks generated by -mindirect-branch=.  'lfence' uses "lfence"
>> > as loop filler.  'pause' uses "pause" as loop filler.  'nop' uses "nop"
>> > as loop filler.  The default is 'lfence'.
>>
>> Why is the loop needed?  Doesn't ud2 or cpuid stop speculative
>> execution?
>
> The idea is not to stop it per se, but to capture it. We trick the
> speculative execution into *thinking* it's going to return back to that
> endless loop, which prevents it from doing the branch prediction which
> would otherwise have got into trouble.
>
> There has been a fair amount of bikeshedding of precisely what goes in
> there already, and '1: pause; jmp 1b' is the best option that hasn't
> been shot down in flames by the CPU architects.
>
> HJ, do we still actually need the options for lfence and nop? I thought
> those were originally just for testing and could possibly be dropped
> now?

This is a trial change.  It may be useful later.  But I can drop it and
hardcode it to "pause".
Jeff Law Jan. 11, 2018, 9:42 p.m. UTC | #7
On 01/07/2018 03:59 PM, H.J. Lu wrote:
> Add -mindirect-branch-loop= option to control loop filler in call and
> return thunks generated by -mindirect-branch=.  'lfence' uses "lfence"
> as loop filler.  'pause' uses "pause" as loop filler.  'nop' uses "nop"
> as loop filler.  The default is 'lfence'.
> 
> gcc/
> 
> 	* config/i386/i386-opts.h (indirect_branch_loop): New.
> 	* config/i386/i386.c (output_indirect_thunk): Support
> 	-mindirect-branch-loop=.
> 	* config/i386/i386.opt (mindirect-branch-loop=): New option.
> 	(indirect_branch_loop): New.
> 	(lfence): Likewise.
> 	(pause): Likewise.
> 	(nop): Likewise.
> 	* doc/invoke.texi: Document -mindirect-branch-loop= option.
> 
> gcc/testsuite/
> 
> 	* gcc.target/i386/indirect-thunk-loop-1.c: New test.
> 	* gcc.target/i386/indirect-thunk-loop-2.c: Likewise.
> 	* gcc.target/i386/indirect-thunk-loop-3.c: Likewise.
> 	* gcc.target/i386/indirect-thunk-loop-4.c: Likewise.
> 	* gcc.target/i386/indirect-thunk-loop-5.c: Likewise.
I think we should drop the ability to change the filler until such time
as we really need it.  Just pick one and go with it.  I think David
suggested that they wanted "pause".  I'm obviously fine with that.


jeff
H.J. Lu Jan. 11, 2018, 9:48 p.m. UTC | #8
On Thu, Jan 11, 2018 at 1:42 PM, Jeff Law <law@redhat.com> wrote:
> On 01/07/2018 03:59 PM, H.J. Lu wrote:
>> Add -mindirect-branch-loop= option to control loop filler in call and
>> return thunks generated by -mindirect-branch=.  'lfence' uses "lfence"
>> as loop filler.  'pause' uses "pause" as loop filler.  'nop' uses "nop"
>> as loop filler.  The default is 'lfence'.
>>
>> gcc/
>>
>>       * config/i386/i386-opts.h (indirect_branch_loop): New.
>>       * config/i386/i386.c (output_indirect_thunk): Support
>>       -mindirect-branch-loop=.
>>       * config/i386/i386.opt (mindirect-branch-loop=): New option.
>>       (indirect_branch_loop): New.
>>       (lfence): Likewise.
>>       (pause): Likewise.
>>       (nop): Likewise.
>>       * doc/invoke.texi: Document -mindirect-branch-loop= option.
>>
>> gcc/testsuite/
>>
>>       * gcc.target/i386/indirect-thunk-loop-1.c: New test.
>>       * gcc.target/i386/indirect-thunk-loop-2.c: Likewise.
>>       * gcc.target/i386/indirect-thunk-loop-3.c: Likewise.
>>       * gcc.target/i386/indirect-thunk-loop-4.c: Likewise.
>>       * gcc.target/i386/indirect-thunk-loop-5.c: Likewise.
> I think we should drop the ability to change the filler until such time
> as we really need it.  Just pick one and go with it.  I think David
> suggested that they wanted "pause".  I'm obviously fine with that.

I will hard code it to "pause".
Martin Jambor Jan. 12, 2018, 12:38 p.m. UTC | #9
Hi,

On Thu, Jan 11 2018, Jeff Law wrote:
> On 01/07/2018 03:59 PM, H.J. Lu wrote:
>> Add -mindirect-branch-loop= option to control loop filler in call and
>> return thunks generated by -mindirect-branch=.  'lfence' uses "lfence"
>> as loop filler.  'pause' uses "pause" as loop filler.  'nop' uses "nop"
>> as loop filler.  The default is 'lfence'.
>> 
>> gcc/
>> 
>> 	* config/i386/i386-opts.h (indirect_branch_loop): New.
>> 	* config/i386/i386.c (output_indirect_thunk): Support
>> 	-mindirect-branch-loop=.
>> 	* config/i386/i386.opt (mindirect-branch-loop=): New option.
>> 	(indirect_branch_loop): New.
>> 	(lfence): Likewise.
>> 	(pause): Likewise.
>> 	(nop): Likewise.
>> 	* doc/invoke.texi: Document -mindirect-branch-loop= option.
>> 
>> gcc/testsuite/
>> 
>> 	* gcc.target/i386/indirect-thunk-loop-1.c: New test.
>> 	* gcc.target/i386/indirect-thunk-loop-2.c: Likewise.
>> 	* gcc.target/i386/indirect-thunk-loop-3.c: Likewise.
>> 	* gcc.target/i386/indirect-thunk-loop-4.c: Likewise.
>> 	* gcc.target/i386/indirect-thunk-loop-5.c: Likewise.
> I think we should drop the ability to change the filler until such time
> as we really need it.  Just pick one and go with it.  I think David
> suggested that they wanted "pause".  I'm obviously fine with that.
>

unless I am mistaken (which is frankly quite possible, I am still not
quite up to speed about the nuances), AMD strongly prefers the lfence
variant.  OTOH, IIUC, in kernel this will be run-time patched but so it
does not matter in the most pressing case and we might want to have a
mechanism doing something similar for protecting userspace later on.
But perhaps it is enough to keep the option?

Muthu and/or Venkat, can you please comment?

Thank you,

Martin
H.J. Lu Jan. 12, 2018, 2:05 p.m. UTC | #10
On Fri, Jan 12, 2018 at 4:38 AM, Martin Jambor <mjambor@suse.cz> wrote:
> Hi,
>
> On Thu, Jan 11 2018, Jeff Law wrote:
>> On 01/07/2018 03:59 PM, H.J. Lu wrote:
>>> Add -mindirect-branch-loop= option to control loop filler in call and
>>> return thunks generated by -mindirect-branch=.  'lfence' uses "lfence"
>>> as loop filler.  'pause' uses "pause" as loop filler.  'nop' uses "nop"
>>> as loop filler.  The default is 'lfence'.
>>>
>>> gcc/
>>>
>>>      * config/i386/i386-opts.h (indirect_branch_loop): New.
>>>      * config/i386/i386.c (output_indirect_thunk): Support
>>>      -mindirect-branch-loop=.
>>>      * config/i386/i386.opt (mindirect-branch-loop=): New option.
>>>      (indirect_branch_loop): New.
>>>      (lfence): Likewise.
>>>      (pause): Likewise.
>>>      (nop): Likewise.
>>>      * doc/invoke.texi: Document -mindirect-branch-loop= option.
>>>
>>> gcc/testsuite/
>>>
>>>      * gcc.target/i386/indirect-thunk-loop-1.c: New test.
>>>      * gcc.target/i386/indirect-thunk-loop-2.c: Likewise.
>>>      * gcc.target/i386/indirect-thunk-loop-3.c: Likewise.
>>>      * gcc.target/i386/indirect-thunk-loop-4.c: Likewise.
>>>      * gcc.target/i386/indirect-thunk-loop-5.c: Likewise.
>> I think we should drop the ability to change the filler until such time
>> as we really need it.  Just pick one and go with it.  I think David
>> suggested that they wanted "pause".  I'm obviously fine with that.
>>
>
> unless I am mistaken (which is frankly quite possible, I am still not
> quite up to speed about the nuances), AMD strongly prefers the lfence
> variant.  OTOH, IIUC, in kernel this will be run-time patched but so it
> does not matter in the most pressing case and we might want to have a
> mechanism doing something similar for protecting userspace later on.
> But perhaps it is enough to keep the option?
>
> Muthu and/or Venkat, can you please comment?

If we do want it, I will submit a separate patch AFTER the current patch
set has been approved and checked into GCC 8.
Kumar, Venkataramanan Jan. 12, 2018, 2:46 p.m. UTC | #11
Hi all,

> -----Original Message-----

> From: H.J. Lu [mailto:hjl.tools@gmail.com]

> Sent: Friday, January 12, 2018 7:36 PM

> To: Martin Jambor <mjambor@suse.cz>

> Cc: Nagarajan, Muthu kumar raj <Muthukumarraj.Nagarajan@amd.com>;

> Kumar, Venkataramanan <Venkataramanan.Kumar@amd.com>; GCC

> Patches <gcc-patches@gcc.gnu.org>; Jeff Law <law@redhat.com>

> Subject: Re: [PATCH 2/5] x86: Add -mindirect-branch-loop=

> 

> On Fri, Jan 12, 2018 at 4:38 AM, Martin Jambor <mjambor@suse.cz> wrote:

> > Hi,

> >

> > On Thu, Jan 11 2018, Jeff Law wrote:

> >> On 01/07/2018 03:59 PM, H.J. Lu wrote:

> >>> Add -mindirect-branch-loop= option to control loop filler in call

> >>> and return thunks generated by -mindirect-branch=.  'lfence' uses

> "lfence"

> >>> as loop filler.  'pause' uses "pause" as loop filler.  'nop' uses "nop"

> >>> as loop filler.  The default is 'lfence'.

> >>>

> >>> gcc/

> >>>

> >>>      * config/i386/i386-opts.h (indirect_branch_loop): New.

> >>>      * config/i386/i386.c (output_indirect_thunk): Support

> >>>      -mindirect-branch-loop=.

> >>>      * config/i386/i386.opt (mindirect-branch-loop=): New option.

> >>>      (indirect_branch_loop): New.

> >>>      (lfence): Likewise.

> >>>      (pause): Likewise.

> >>>      (nop): Likewise.

> >>>      * doc/invoke.texi: Document -mindirect-branch-loop= option.

> >>>

> >>> gcc/testsuite/

> >>>

> >>>      * gcc.target/i386/indirect-thunk-loop-1.c: New test.

> >>>      * gcc.target/i386/indirect-thunk-loop-2.c: Likewise.

> >>>      * gcc.target/i386/indirect-thunk-loop-3.c: Likewise.

> >>>      * gcc.target/i386/indirect-thunk-loop-4.c: Likewise.

> >>>      * gcc.target/i386/indirect-thunk-loop-5.c: Likewise.

> >> I think we should drop the ability to change the filler until such

> >> time as we really need it.  Just pick one and go with it.  I think

> >> David suggested that they wanted "pause".  I'm obviously fine with that.

> >>

> >

> > unless I am mistaken (which is frankly quite possible, I am still not

> > quite up to speed about the nuances), AMD strongly prefers the lfence

> > variant.  OTOH, IIUC, in kernel this will be run-time patched but so

> > it does not matter in the most pressing case and we might want to have

> > a mechanism doing something similar for protecting userspace later on.

> > But perhaps it is enough to keep the option?

> >

> > Muthu and/or Venkat, can you please comment?

> 

> If we do want it, I will submit a separate patch AFTER the current patch set

> has been approved and checked into GCC 8.

> 


As per AMD architects, using “lfence” in “retpoline” is better than “pause” for our targets.
So please allow filler to use "lfence". 

Regards.
Venkat.
> --

> H.J.
Kumar, Venkataramanan Jan. 12, 2018, 3:09 p.m. UTC | #12
Hi all, 

> -----Original Message-----

> From: Kumar, Venkataramanan

> Sent: Friday, January 12, 2018 8:16 PM

> To: 'H.J. Lu' <hjl.tools@gmail.com>; Martin Jambor <mjambor@suse.cz>

> Cc: Nagarajan, Muthu kumar raj <Muthukumarraj.Nagarajan@amd.com>;

> GCC Patches <gcc-patches@gcc.gnu.org>; Jeff Law <law@redhat.com>; Uros

> Bizjak (ubizjak@gmail.com) <ubizjak@gmail.com>; 'Jan Hubicka'

> <jh@suse.de>

> Subject: RE: [PATCH 2/5] x86: Add -mindirect-branch-loop=

> 

> Hi all,

> 

> > -----Original Message-----

> > From: H.J. Lu [mailto:hjl.tools@gmail.com]

> > Sent: Friday, January 12, 2018 7:36 PM

> > To: Martin Jambor <mjambor@suse.cz>

> > Cc: Nagarajan, Muthu kumar raj <Muthukumarraj.Nagarajan@amd.com>;

> > Kumar, Venkataramanan <Venkataramanan.Kumar@amd.com>; GCC

> Patches

> > <gcc-patches@gcc.gnu.org>; Jeff Law <law@redhat.com>

> > Subject: Re: [PATCH 2/5] x86: Add -mindirect-branch-loop=

> >

> > On Fri, Jan 12, 2018 at 4:38 AM, Martin Jambor <mjambor@suse.cz> wrote:

> > > Hi,

> > >

> > > On Thu, Jan 11 2018, Jeff Law wrote:

> > >> On 01/07/2018 03:59 PM, H.J. Lu wrote:

> > >>> Add -mindirect-branch-loop= option to control loop filler in call

> > >>> and return thunks generated by -mindirect-branch=.  'lfence' uses

> > "lfence"

> > >>> as loop filler.  'pause' uses "pause" as loop filler.  'nop' uses "nop"

> > >>> as loop filler.  The default is 'lfence'.

> > >>>

> > >>> gcc/

> > >>>

> > >>>      * config/i386/i386-opts.h (indirect_branch_loop): New.

> > >>>      * config/i386/i386.c (output_indirect_thunk): Support

> > >>>      -mindirect-branch-loop=.

> > >>>      * config/i386/i386.opt (mindirect-branch-loop=): New option.

> > >>>      (indirect_branch_loop): New.

> > >>>      (lfence): Likewise.

> > >>>      (pause): Likewise.

> > >>>      (nop): Likewise.

> > >>>      * doc/invoke.texi: Document -mindirect-branch-loop= option.

> > >>>

> > >>> gcc/testsuite/

> > >>>

> > >>>      * gcc.target/i386/indirect-thunk-loop-1.c: New test.

> > >>>      * gcc.target/i386/indirect-thunk-loop-2.c: Likewise.

> > >>>      * gcc.target/i386/indirect-thunk-loop-3.c: Likewise.

> > >>>      * gcc.target/i386/indirect-thunk-loop-4.c: Likewise.

> > >>>      * gcc.target/i386/indirect-thunk-loop-5.c: Likewise.

> > >> I think we should drop the ability to change the filler until such

> > >> time as we really need it.  Just pick one and go with it.  I think

> > >> David suggested that they wanted "pause".  I'm obviously fine with that.

> > >>

> > >

> > > unless I am mistaken (which is frankly quite possible, I am still

> > > not quite up to speed about the nuances), AMD strongly prefers the

> > > lfence variant.  OTOH, IIUC, in kernel this will be run-time patched

> > > but so it does not matter in the most pressing case and we might

> > > want to have a mechanism doing something similar for protecting

> userspace later on.

> > > But perhaps it is enough to keep the option?

> > >

> > > Muthu and/or Venkat, can you please comment?

> >

> > If we do want it, I will submit a separate patch AFTER the current

> > patch set has been approved and checked into GCC 8.

> >

> 

> As per AMD architects, using “lfence” in “retpoline” is better than “pause” for

> our targets.

> So please allow filler to use "lfence".

> 

We also leant that "lfence" is a dispatch serializing instruction.  The Pause instruction is not serializing on AMD processors and has high latencies. 

 Regards.
 Venkat.
> > --

> > H.J.
Jeff Law Jan. 12, 2018, 4 p.m. UTC | #13
On 01/12/2018 08:09 AM, Kumar, Venkataramanan wrote:
> Hi all, 
> 
>> -----Original Message-----
>> From: Kumar, Venkataramanan
>> Sent: Friday, January 12, 2018 8:16 PM
>> To: 'H.J. Lu' <hjl.tools@gmail.com>; Martin Jambor <mjambor@suse.cz>
>> Cc: Nagarajan, Muthu kumar raj <Muthukumarraj.Nagarajan@amd.com>;
>> GCC Patches <gcc-patches@gcc.gnu.org>; Jeff Law <law@redhat.com>; Uros
>> Bizjak (ubizjak@gmail.com) <ubizjak@gmail.com>; 'Jan Hubicka'
>> <jh@suse.de>
>> Subject: RE: [PATCH 2/5] x86: Add -mindirect-branch-loop=
>>
>> Hi all,
>>
>>> -----Original Message-----
>>> From: H.J. Lu [mailto:hjl.tools@gmail.com]
>>> Sent: Friday, January 12, 2018 7:36 PM
>>> To: Martin Jambor <mjambor@suse.cz>
>>> Cc: Nagarajan, Muthu kumar raj <Muthukumarraj.Nagarajan@amd.com>;
>>> Kumar, Venkataramanan <Venkataramanan.Kumar@amd.com>; GCC
>> Patches
>>> <gcc-patches@gcc.gnu.org>; Jeff Law <law@redhat.com>
>>> Subject: Re: [PATCH 2/5] x86: Add -mindirect-branch-loop=
>>>
>>> On Fri, Jan 12, 2018 at 4:38 AM, Martin Jambor <mjambor@suse.cz> wrote:
>>>> Hi,
>>>>
>>>> On Thu, Jan 11 2018, Jeff Law wrote:
>>>>> On 01/07/2018 03:59 PM, H.J. Lu wrote:
>>>>>> Add -mindirect-branch-loop= option to control loop filler in call
>>>>>> and return thunks generated by -mindirect-branch=.  'lfence' uses
>>> "lfence"
>>>>>> as loop filler.  'pause' uses "pause" as loop filler.  'nop' uses "nop"
>>>>>> as loop filler.  The default is 'lfence'.
>>>>>>
>>>>>> gcc/
>>>>>>
>>>>>>      * config/i386/i386-opts.h (indirect_branch_loop): New.
>>>>>>      * config/i386/i386.c (output_indirect_thunk): Support
>>>>>>      -mindirect-branch-loop=.
>>>>>>      * config/i386/i386.opt (mindirect-branch-loop=): New option.
>>>>>>      (indirect_branch_loop): New.
>>>>>>      (lfence): Likewise.
>>>>>>      (pause): Likewise.
>>>>>>      (nop): Likewise.
>>>>>>      * doc/invoke.texi: Document -mindirect-branch-loop= option.
>>>>>>
>>>>>> gcc/testsuite/
>>>>>>
>>>>>>      * gcc.target/i386/indirect-thunk-loop-1.c: New test.
>>>>>>      * gcc.target/i386/indirect-thunk-loop-2.c: Likewise.
>>>>>>      * gcc.target/i386/indirect-thunk-loop-3.c: Likewise.
>>>>>>      * gcc.target/i386/indirect-thunk-loop-4.c: Likewise.
>>>>>>      * gcc.target/i386/indirect-thunk-loop-5.c: Likewise.
>>>>> I think we should drop the ability to change the filler until such
>>>>> time as we really need it.  Just pick one and go with it.  I think
>>>>> David suggested that they wanted "pause".  I'm obviously fine with that.
>>>>>
>>>>
>>>> unless I am mistaken (which is frankly quite possible, I am still
>>>> not quite up to speed about the nuances), AMD strongly prefers the
>>>> lfence variant.  OTOH, IIUC, in kernel this will be run-time patched
>>>> but so it does not matter in the most pressing case and we might
>>>> want to have a mechanism doing something similar for protecting
>> userspace later on.
>>>> But perhaps it is enough to keep the option?
>>>>
>>>> Muthu and/or Venkat, can you please comment?
>>>
>>> If we do want it, I will submit a separate patch AFTER the current
>>> patch set has been approved and checked into GCC 8.
>>>
>>
>> As per AMD architects, using “lfence” in “retpoline” is better than “pause” for
>> our targets.
>> So please allow filler to use "lfence".
>>
> We also leant that "lfence" is a dispatch serializing instruction.  The Pause instruction is not serializing on AMD processors and has high latencies. 
So the concern I have if we end up wanting different sequences for
different micro-architectures is you either up having to build multiple
kernels, libraries and executables or you end up doing dynamic patching
(ifuncs which are often used for this kind of dynamic selection don't
really help in this particular scenario).

It's unlikely we'll see distributors building multiple kernels,
libraries and executables within a processor family.  The kernel does
have dynamic patching, but we don't have that for anything in user space
(but I think folks will be looking at that in the relatively near future).

So it's difficult to see a scenario where having the compiler make
different codegen choices for these sequences based on the
microarchitecture is all that useful.

ISTM we want to pick a sequence that is reasonably OK, then look to
dynamic patching to micro-optimize if the benefit is significant.
Ideally going forward these issues will be largely addressed at the
hardware level with perhaps some cooperation with kernel and repolines
become a mitigation technique for legacy hardware.

Thoughts Venkat?

Jeff
Jeff Law Jan. 12, 2018, 4:02 p.m. UTC | #14
On 01/12/2018 07:05 AM, H.J. Lu wrote:
> On Fri, Jan 12, 2018 at 4:38 AM, Martin Jambor <mjambor@suse.cz> wrote:
>> Hi,
>>
>> On Thu, Jan 11 2018, Jeff Law wrote:
>>> On 01/07/2018 03:59 PM, H.J. Lu wrote:
>>>> Add -mindirect-branch-loop= option to control loop filler in call and
>>>> return thunks generated by -mindirect-branch=.  'lfence' uses "lfence"
>>>> as loop filler.  'pause' uses "pause" as loop filler.  'nop' uses "nop"
>>>> as loop filler.  The default is 'lfence'.
>>>>
>>>> gcc/
>>>>
>>>>      * config/i386/i386-opts.h (indirect_branch_loop): New.
>>>>      * config/i386/i386.c (output_indirect_thunk): Support
>>>>      -mindirect-branch-loop=.
>>>>      * config/i386/i386.opt (mindirect-branch-loop=): New option.
>>>>      (indirect_branch_loop): New.
>>>>      (lfence): Likewise.
>>>>      (pause): Likewise.
>>>>      (nop): Likewise.
>>>>      * doc/invoke.texi: Document -mindirect-branch-loop= option.
>>>>
>>>> gcc/testsuite/
>>>>
>>>>      * gcc.target/i386/indirect-thunk-loop-1.c: New test.
>>>>      * gcc.target/i386/indirect-thunk-loop-2.c: Likewise.
>>>>      * gcc.target/i386/indirect-thunk-loop-3.c: Likewise.
>>>>      * gcc.target/i386/indirect-thunk-loop-4.c: Likewise.
>>>>      * gcc.target/i386/indirect-thunk-loop-5.c: Likewise.
>>> I think we should drop the ability to change the filler until such time
>>> as we really need it.  Just pick one and go with it.  I think David
>>> suggested that they wanted "pause".  I'm obviously fine with that.
>>>
>>
>> unless I am mistaken (which is frankly quite possible, I am still not
>> quite up to speed about the nuances), AMD strongly prefers the lfence
>> variant.  OTOH, IIUC, in kernel this will be run-time patched but so it
>> does not matter in the most pressing case and we might want to have a
>> mechanism doing something similar for protecting userspace later on.
>> But perhaps it is enough to keep the option?
>>
>> Muthu and/or Venkat, can you please comment?
> 
> If we do want it, I will submit a separate patch AFTER the current patch
> set has been approved and checked into GCC 8.
That would be my preference, if we need it at all.

jeff
Kumar, Venkataramanan Jan. 12, 2018, 6:27 p.m. UTC | #15
Hi HJ, 

> -----Original Message-----

> From: Kumar, Venkataramanan

> Sent: Friday, January 12, 2018 8:39 PM

> To: 'H.J. Lu' <hjl.tools@gmail.com>; 'Martin Jambor' <mjambor@suse.cz>

> Cc: Nagarajan, Muthu kumar raj <Muthukumarraj.Nagarajan@amd.com>;

> 'GCC Patches' <gcc-patches@gcc.gnu.org>; 'Jeff Law' <law@redhat.com>;

> Uros Bizjak (ubizjak@gmail.com) <ubizjak@gmail.com>; 'Jan Hubicka'

> <jh@suse.de>

> Subject: RE: [PATCH 2/5] x86: Add -mindirect-branch-loop=

> 

> Hi all,

> 

> > -----Original Message-----

> > From: Kumar, Venkataramanan

> > Sent: Friday, January 12, 2018 8:16 PM

> > To: 'H.J. Lu' <hjl.tools@gmail.com>; Martin Jambor <mjambor@suse.cz>

> > Cc: Nagarajan, Muthu kumar raj <Muthukumarraj.Nagarajan@amd.com>;

> > GCC Patches <gcc-patches@gcc.gnu.org>; Jeff Law <law@redhat.com>;

> Uros

> > Bizjak (ubizjak@gmail.com) <ubizjak@gmail.com>; 'Jan Hubicka'

> > <jh@suse.de>

> > Subject: RE: [PATCH 2/5] x86: Add -mindirect-branch-loop=

> >

> > Hi all,

> >

> > > -----Original Message-----

> > > From: H.J. Lu [mailto:hjl.tools@gmail.com]

> > > Sent: Friday, January 12, 2018 7:36 PM

> > > To: Martin Jambor <mjambor@suse.cz>

> > > Cc: Nagarajan, Muthu kumar raj

> <Muthukumarraj.Nagarajan@amd.com>;

> > > Kumar, Venkataramanan <Venkataramanan.Kumar@amd.com>; GCC

> > Patches

> > > <gcc-patches@gcc.gnu.org>; Jeff Law <law@redhat.com>

> > > Subject: Re: [PATCH 2/5] x86: Add -mindirect-branch-loop=

> > >

> > > On Fri, Jan 12, 2018 at 4:38 AM, Martin Jambor <mjambor@suse.cz>

> wrote:

> > > > Hi,

> > > >

> > > > On Thu, Jan 11 2018, Jeff Law wrote:

> > > >> On 01/07/2018 03:59 PM, H.J. Lu wrote:

> > > >>> Add -mindirect-branch-loop= option to control loop filler in

> > > >>> call and return thunks generated by -mindirect-branch=.

> > > >>> 'lfence' uses

> > > "lfence"

> > > >>> as loop filler.  'pause' uses "pause" as loop filler.  'nop' uses "nop"

> > > >>> as loop filler.  The default is 'lfence'.

> > > >>>

> > > >>> gcc/

> > > >>>

> > > >>>      * config/i386/i386-opts.h (indirect_branch_loop): New.

> > > >>>      * config/i386/i386.c (output_indirect_thunk): Support

> > > >>>      -mindirect-branch-loop=.

> > > >>>      * config/i386/i386.opt (mindirect-branch-loop=): New option.

> > > >>>      (indirect_branch_loop): New.

> > > >>>      (lfence): Likewise.

> > > >>>      (pause): Likewise.

> > > >>>      (nop): Likewise.

> > > >>>      * doc/invoke.texi: Document -mindirect-branch-loop= option.

> > > >>>

> > > >>> gcc/testsuite/

> > > >>>

> > > >>>      * gcc.target/i386/indirect-thunk-loop-1.c: New test.

> > > >>>      * gcc.target/i386/indirect-thunk-loop-2.c: Likewise.

> > > >>>      * gcc.target/i386/indirect-thunk-loop-3.c: Likewise.

> > > >>>      * gcc.target/i386/indirect-thunk-loop-4.c: Likewise.

> > > >>>      * gcc.target/i386/indirect-thunk-loop-5.c: Likewise.

> > > >> I think we should drop the ability to change the filler until

> > > >> such time as we really need it.  Just pick one and go with it.  I

> > > >> think David suggested that they wanted "pause".  I'm obviously fine

> with that.

> > > >>

> > > >

> > > > unless I am mistaken (which is frankly quite possible, I am still

> > > > not quite up to speed about the nuances), AMD strongly prefers the

> > > > lfence variant.  OTOH, IIUC, in kernel this will be run-time

> > > > patched but so it does not matter in the most pressing case and we

> > > > might want to have a mechanism doing something similar for

> > > > protecting

> > userspace later on.

> > > > But perhaps it is enough to keep the option?

> > > >

> > > > Muthu and/or Venkat, can you please comment?

> > >

> > > If we do want it, I will submit a separate patch AFTER the current

> > > patch set has been approved and checked into GCC 8.

> > >

> >

> > As per AMD architects, using “lfence” in “retpoline” is better than

> > “pause” for our targets.

> > So please allow filler to use "lfence".

> >

> We also leant that "lfence" is a dispatch serializing instruction.  The Puse

> instruction is not serializing on AMD processors and has high latencies.

> 

Any reason why Intel has chosen "pause" over "lfence" as the default loop filler for Retpoline?

>  Regards.

>  Venkat.

> > > --

> > > H.J.
H.J. Lu Jan. 12, 2018, 7:40 p.m. UTC | #16
On Fri, Jan 12, 2018 at 10:27 AM, Kumar, Venkataramanan
<Venkataramanan.Kumar@amd.com> wrote:
> Hi HJ,
>
>> -----Original Message-----
>> From: Kumar, Venkataramanan
>> Sent: Friday, January 12, 2018 8:39 PM
>> To: 'H.J. Lu' <hjl.tools@gmail.com>; 'Martin Jambor' <mjambor@suse.cz>
>> Cc: Nagarajan, Muthu kumar raj <Muthukumarraj.Nagarajan@amd.com>;
>> 'GCC Patches' <gcc-patches@gcc.gnu.org>; 'Jeff Law' <law@redhat.com>;
>> Uros Bizjak (ubizjak@gmail.com) <ubizjak@gmail.com>; 'Jan Hubicka'
>> <jh@suse.de>
>> Subject: RE: [PATCH 2/5] x86: Add -mindirect-branch-loop=
>>
>> Hi all,
>>
>> > -----Original Message-----
>> > From: Kumar, Venkataramanan
>> > Sent: Friday, January 12, 2018 8:16 PM
>> > To: 'H.J. Lu' <hjl.tools@gmail.com>; Martin Jambor <mjambor@suse.cz>
>> > Cc: Nagarajan, Muthu kumar raj <Muthukumarraj.Nagarajan@amd.com>;
>> > GCC Patches <gcc-patches@gcc.gnu.org>; Jeff Law <law@redhat.com>;
>> Uros
>> > Bizjak (ubizjak@gmail.com) <ubizjak@gmail.com>; 'Jan Hubicka'
>> > <jh@suse.de>
>> > Subject: RE: [PATCH 2/5] x86: Add -mindirect-branch-loop=
>> >
>> > Hi all,
>> >
>> > > -----Original Message-----
>> > > From: H.J. Lu [mailto:hjl.tools@gmail.com]
>> > > Sent: Friday, January 12, 2018 7:36 PM
>> > > To: Martin Jambor <mjambor@suse.cz>
>> > > Cc: Nagarajan, Muthu kumar raj
>> <Muthukumarraj.Nagarajan@amd.com>;
>> > > Kumar, Venkataramanan <Venkataramanan.Kumar@amd.com>; GCC
>> > Patches
>> > > <gcc-patches@gcc.gnu.org>; Jeff Law <law@redhat.com>
>> > > Subject: Re: [PATCH 2/5] x86: Add -mindirect-branch-loop=
>> > >
>> > > On Fri, Jan 12, 2018 at 4:38 AM, Martin Jambor <mjambor@suse.cz>
>> wrote:
>> > > > Hi,
>> > > >
>> > > > On Thu, Jan 11 2018, Jeff Law wrote:
>> > > >> On 01/07/2018 03:59 PM, H.J. Lu wrote:
>> > > >>> Add -mindirect-branch-loop= option to control loop filler in
>> > > >>> call and return thunks generated by -mindirect-branch=.
>> > > >>> 'lfence' uses
>> > > "lfence"
>> > > >>> as loop filler.  'pause' uses "pause" as loop filler.  'nop' uses "nop"
>> > > >>> as loop filler.  The default is 'lfence'.
>> > > >>>
>> > > >>> gcc/
>> > > >>>
>> > > >>>      * config/i386/i386-opts.h (indirect_branch_loop): New.
>> > > >>>      * config/i386/i386.c (output_indirect_thunk): Support
>> > > >>>      -mindirect-branch-loop=.
>> > > >>>      * config/i386/i386.opt (mindirect-branch-loop=): New option.
>> > > >>>      (indirect_branch_loop): New.
>> > > >>>      (lfence): Likewise.
>> > > >>>      (pause): Likewise.
>> > > >>>      (nop): Likewise.
>> > > >>>      * doc/invoke.texi: Document -mindirect-branch-loop= option.
>> > > >>>
>> > > >>> gcc/testsuite/
>> > > >>>
>> > > >>>      * gcc.target/i386/indirect-thunk-loop-1.c: New test.
>> > > >>>      * gcc.target/i386/indirect-thunk-loop-2.c: Likewise.
>> > > >>>      * gcc.target/i386/indirect-thunk-loop-3.c: Likewise.
>> > > >>>      * gcc.target/i386/indirect-thunk-loop-4.c: Likewise.
>> > > >>>      * gcc.target/i386/indirect-thunk-loop-5.c: Likewise.
>> > > >> I think we should drop the ability to change the filler until
>> > > >> such time as we really need it.  Just pick one and go with it.  I
>> > > >> think David suggested that they wanted "pause".  I'm obviously fine
>> with that.
>> > > >>
>> > > >
>> > > > unless I am mistaken (which is frankly quite possible, I am still
>> > > > not quite up to speed about the nuances), AMD strongly prefers the
>> > > > lfence variant.  OTOH, IIUC, in kernel this will be run-time
>> > > > patched but so it does not matter in the most pressing case and we
>> > > > might want to have a mechanism doing something similar for
>> > > > protecting
>> > userspace later on.
>> > > > But perhaps it is enough to keep the option?
>> > > >
>> > > > Muthu and/or Venkat, can you please comment?
>> > >
>> > > If we do want it, I will submit a separate patch AFTER the current
>> > > patch set has been approved and checked into GCC 8.
>> > >
>> >
>> > As per AMD architects, using “lfence” in “retpoline” is better than
>> > “pause” for our targets.
>> > So please allow filler to use "lfence".
>> >
>> We also leant that "lfence" is a dispatch serializing instruction.  The Puse
>> instruction is not serializing on AMD processors and has high latencies.
>>
> Any reason why Intel has chosen "pause" over "lfence" as the default loop filler for Retpoline?
>

My original patch uses "lfence".  I was asked to use "pause":

https://gcc.gnu.org/ml/gcc-patches/2018-01/msg00969.html
Kumar, Venkataramanan Jan. 13, 2018, 3:11 a.m. UTC | #17
Hi All, 

> -----Original Message-----

> From: H.J. Lu [mailto:hjl.tools@gmail.com]

> Sent: Saturday, January 13, 2018 1:11 AM

> To: Kumar, Venkataramanan <Venkataramanan.Kumar@amd.com>

> Cc: Nagarajan, Muthu kumar raj <Muthukumarraj.Nagarajan@amd.com>;

> GCC Patches <gcc-patches@gcc.gnu.org>; Martin Jambor

> <mjambor@suse.cz>; Jeff Law <law@redhat.com>; Uros Bizjak

> (ubizjak@gmail.com) <ubizjak@gmail.com>; Jan Hubicka <jh@suse.de>;

> Dharmakan, Rohit arul raj <Rohitarulraj.Dharmakan@amd.com>

> Subject: Re: [PATCH 2/5] x86: Add -mindirect-branch-loop=

> 

> On Fri, Jan 12, 2018 at 10:27 AM, Kumar, Venkataramanan

> <Venkataramanan.Kumar@amd.com> wrote:

> > Hi HJ,

> >

> >> -----Original Message-----

> >> From: Kumar, Venkataramanan

> >> Sent: Friday, January 12, 2018 8:39 PM

> >> To: 'H.J. Lu' <hjl.tools@gmail.com>; 'Martin Jambor'

> >> <mjambor@suse.cz>

> >> Cc: Nagarajan, Muthu kumar raj <Muthukumarraj.Nagarajan@amd.com>;

> >> 'GCC Patches' <gcc-patches@gcc.gnu.org>; 'Jeff Law' <law@redhat.com>;

> >> Uros Bizjak (ubizjak@gmail.com) <ubizjak@gmail.com>; 'Jan Hubicka'

> >> <jh@suse.de>

> >> Subject: RE: [PATCH 2/5] x86: Add -mindirect-branch-loop=

> >>

> >> Hi all,

> >>

> >> > -----Original Message-----

> >> > From: Kumar, Venkataramanan

> >> > Sent: Friday, January 12, 2018 8:16 PM

> >> > To: 'H.J. Lu' <hjl.tools@gmail.com>; Martin Jambor

> >> > <mjambor@suse.cz>

> >> > Cc: Nagarajan, Muthu kumar raj

> <Muthukumarraj.Nagarajan@amd.com>;

> >> > GCC Patches <gcc-patches@gcc.gnu.org>; Jeff Law <law@redhat.com>;

> >> Uros

> >> > Bizjak (ubizjak@gmail.com) <ubizjak@gmail.com>; 'Jan Hubicka'

> >> > <jh@suse.de>

> >> > Subject: RE: [PATCH 2/5] x86: Add -mindirect-branch-loop=

> >> >

> >> > Hi all,

> >> >

> >> > > -----Original Message-----

> >> > > From: H.J. Lu [mailto:hjl.tools@gmail.com]

> >> > > Sent: Friday, January 12, 2018 7:36 PM

> >> > > To: Martin Jambor <mjambor@suse.cz>

> >> > > Cc: Nagarajan, Muthu kumar raj

> >> <Muthukumarraj.Nagarajan@amd.com>;

> >> > > Kumar, Venkataramanan <Venkataramanan.Kumar@amd.com>; GCC

> >> > Patches

> >> > > <gcc-patches@gcc.gnu.org>; Jeff Law <law@redhat.com>

> >> > > Subject: Re: [PATCH 2/5] x86: Add -mindirect-branch-loop=

> >> > >

> >> > > On Fri, Jan 12, 2018 at 4:38 AM, Martin Jambor <mjambor@suse.cz>

> >> wrote:

> >> > > > Hi,

> >> > > >

> >> > > > On Thu, Jan 11 2018, Jeff Law wrote:

> >> > > >> On 01/07/2018 03:59 PM, H.J. Lu wrote:

> >> > > >>> Add -mindirect-branch-loop= option to control loop filler in

> >> > > >>> call and return thunks generated by -mindirect-branch=.

> >> > > >>> 'lfence' uses

> >> > > "lfence"

> >> > > >>> as loop filler.  'pause' uses "pause" as loop filler.  'nop' uses "nop"

> >> > > >>> as loop filler.  The default is 'lfence'.

> >> > > >>>

> >> > > >>> gcc/

> >> > > >>>

> >> > > >>>      * config/i386/i386-opts.h (indirect_branch_loop): New.

> >> > > >>>      * config/i386/i386.c (output_indirect_thunk): Support

> >> > > >>>      -mindirect-branch-loop=.

> >> > > >>>      * config/i386/i386.opt (mindirect-branch-loop=): New option.

> >> > > >>>      (indirect_branch_loop): New.

> >> > > >>>      (lfence): Likewise.

> >> > > >>>      (pause): Likewise.

> >> > > >>>      (nop): Likewise.

> >> > > >>>      * doc/invoke.texi: Document -mindirect-branch-loop= option.

> >> > > >>>

> >> > > >>> gcc/testsuite/

> >> > > >>>

> >> > > >>>      * gcc.target/i386/indirect-thunk-loop-1.c: New test.

> >> > > >>>      * gcc.target/i386/indirect-thunk-loop-2.c: Likewise.

> >> > > >>>      * gcc.target/i386/indirect-thunk-loop-3.c: Likewise.

> >> > > >>>      * gcc.target/i386/indirect-thunk-loop-4.c: Likewise.

> >> > > >>>      * gcc.target/i386/indirect-thunk-loop-5.c: Likewise.

> >> > > >> I think we should drop the ability to change the filler until

> >> > > >> such time as we really need it.  Just pick one and go with it.

> >> > > >> I think David suggested that they wanted "pause".  I'm

> >> > > >> obviously fine

> >> with that.

> >> > > >>

> >> > > >

> >> > > > unless I am mistaken (which is frankly quite possible, I am

> >> > > > still not quite up to speed about the nuances), AMD strongly

> >> > > > prefers the lfence variant.  OTOH, IIUC, in kernel this will be

> >> > > > run-time patched but so it does not matter in the most pressing

> >> > > > case and we might want to have a mechanism doing something

> >> > > > similar for protecting

> >> > userspace later on.

> >> > > > But perhaps it is enough to keep the option?

> >> > > >

> >> > > > Muthu and/or Venkat, can you please comment?

> >> > >

> >> > > If we do want it, I will submit a separate patch AFTER the

> >> > > current patch set has been approved and checked into GCC 8.

> >> > >

> >> >

> >> > As per AMD architects, using “lfence” in “retpoline” is better than

> >> > “pause” for our targets.

> >> > So please allow filler to use "lfence".

> >> >

> >> We also leant that "lfence" is a dispatch serializing instruction.

> >> The Puse instruction is not serializing on AMD processors and has high

> latencies.

> >>

> > Any reason why Intel has chosen "pause" over "lfence" as the default loop

> filler for Retpoline?

> >

> 

> My original patch uses "lfence".  I was asked to use "pause":

> 

> https://gcc.gnu.org/ml/gcc-patches/2018-01/msg00969.html


If everyone is ok, my suggestion is to use  "lfence" as the default loop filler for retpoline.

Please confirm.

> 

> --

> H.J.
David Woodhouse Jan. 13, 2018, 8:58 a.m. UTC | #18
On Sat, 2018-01-13 at 03:11 +0000, Kumar, Venkataramanan wrote:
> 
> > My original patch uses "lfence".  I was asked to use "pause":
> > 
> > https://gcc.gnu.org/ml/gcc-patches/2018-01/msg00969.html
> 
> If everyone is ok, my suggestion is to use  "lfence" as the default
> loop filler for retpoline.
> 
> Please confirm.

I have had the same request for the kernel patches. I'm happy with it
but would like confirmation from Intel and from Paul Turner (whose idea
this is, and who has overseen most of the coherent analysis).

FWIW I haven't actually *changed* the kernel patch yet, awaiting that
confirmation. I understand this is a power optimisation only;
preventing the CPU from spinning in that loop when it's mispredicted a
return to it.
David Woodhouse Jan. 13, 2018, 4:34 p.m. UTC | #19
On Sat, 2018-01-13 at 03:11 +0000, Kumar, Venkataramanan wrote:
> 
> > My original patch uses "lfence".  I was asked to use "pause":
> > 
> > https://gcc.gnu.org/ml/gcc-patches/2018-01/msg00969.html
> 
> If everyone is ok, my suggestion is to use  "lfence" as the default
> loop filler for retpoline.
> 
> Please confirm.

I have the same response to this as I did to precisely the same request
in the kernel implementation: I'm OK with it but I'd like an explicit
approval from Intel, and from Paul Turner at Google who architected the
retpoline concept in the first place.

cf. https://marc.info/?l=linux-kernel&m=151585246530355
H.J. Lu Jan. 13, 2018, 4:37 p.m. UTC | #20
On Sat, Jan 13, 2018 at 8:34 AM, David Woodhouse <dwmw2@infradead.org> wrote:
> On Sat, 2018-01-13 at 03:11 +0000, Kumar, Venkataramanan wrote:
>>
>> > My original patch uses "lfence".  I was asked to use "pause":
>> >
>> > https://gcc.gnu.org/ml/gcc-patches/2018-01/msg00969.html
>>
>> If everyone is ok, my suggestion is to use  "lfence" as the default
>> loop filler for retpoline.
>>
>> Please confirm.
>
> I have the same response to this as I did to precisely the same request
> in the kernel implementation: I'm OK with it but I'd like an explicit
> approval from Intel, and from Paul Turner at Google who architected the
> retpoline concept in the first place.
>
> cf. https://marc.info/?l=linux-kernel&m=151585246530355
>

There is no urgency on this.  We can take another look AFTER
my current patch set is approved and checked in.
Van De Ven, Arjan Jan. 13, 2018, 4:46 p.m. UTC | #21
> > If everyone is ok, my suggestion is to use  "lfence" as the default

> > loop filler for retpoline.


can we do BOTH a pause and lfence.
(that way on cpu's where pause is the power stop, it works, and on cpus where it's a fallthrough (AMD) it goes to the lfence)
Kumar, Venkataramanan Jan. 14, 2018, 9:02 a.m. UTC | #22
Hi Arjan, 

> -----Original Message-----

> From: Van De Ven, Arjan [mailto:arjan.van.de.ven@intel.com]

> Sent: Saturday, January 13, 2018 10:16 PM

> To: David Woodhouse <dwmw2@infradead.org>; Kumar, Venkataramanan

> <Venkataramanan.Kumar@amd.com>; H.J. Lu <hjl.tools@gmail.com>; Jeff

> Law <law@redhat.com>; Paul Turner <pjt@google.com>; Mallick, Asit K

> <asit.k.mallick@intel.com>

> Cc: Nagarajan, Muthu kumar raj <Muthukumarraj.Nagarajan@amd.com>;

> GCC Patches <gcc-patches@gcc.gnu.org>; Martin Jambor

> <mjambor@suse.cz>; Uros Bizjak (ubizjak@gmail.com)

> <ubizjak@gmail.com>; Jan Hubicka <jh@suse.de>; Dharmakan, Rohit arul raj

> <Rohitarulraj.Dharmakan@amd.com>

> Subject: RE: [PATCH 2/5] x86: Add -mindirect-branch-loop=

> 

> > > If everyone is ok, my suggestion is to use  "lfence" as the default

> > > loop filler for retpoline.

> 

> can we do BOTH a pause and lfence.

> (that way on cpu's where pause is the power stop, it works, and on cpus

> where it's a fallthrough (AMD) it goes to the lfence)

> 


I checked with our Architect. Having just "pause" is the concern. 
It should also be fine for AMD to use "pause" followed by "lfence" in the loop of retpoline.

Regards,
Venkat.
diff mbox series

Patch

diff --git a/gcc/config/i386/i386-opts.h b/gcc/config/i386/i386-opts.h
index f14cbeee7a1..52f7cb979cc 100644
--- a/gcc/config/i386/i386-opts.h
+++ b/gcc/config/i386/i386-opts.h
@@ -114,4 +114,10 @@  enum indirect_branch {
   indirect_branch_thunk_extern
 };
 
+enum indirect_branch_loop {
+  indirect_branch_loop_lfence,
+  indirect_branch_loop_pause,
+  indirect_branch_loop_nop
+};
+
 #endif
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index ac4d1f62f50..f327a6b1b62 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -10753,8 +10753,23 @@  output_indirect_thunk (bool need_bnd_p, int regno)
 
   ASM_OUTPUT_INTERNAL_LABEL (asm_out_file, indirectlabel1);
 
-  /* lfence .  */
-  fprintf (asm_out_file, "\tlfence\n");
+  switch (ix86_indirect_branch_loop)
+    {
+    case indirect_branch_loop_lfence:
+      /* lfence.  */
+      fprintf (asm_out_file, "\tlfence\n");
+      break;
+    case indirect_branch_loop_pause:
+      /* pause.  */
+      fprintf (asm_out_file, "\tpause\n");
+      break;
+    case indirect_branch_loop_nop:
+      /* nop.  */
+      fprintf (asm_out_file, "\tnop\n");
+      break;
+    default:
+      gcc_unreachable ();
+    }
 
   /* Jump.  */
   fputs ("\tjmp\t", asm_out_file);
diff --git a/gcc/config/i386/i386.opt b/gcc/config/i386/i386.opt
index 22c806206e4..f3e43da0aed 100644
--- a/gcc/config/i386/i386.opt
+++ b/gcc/config/i386/i386.opt
@@ -1041,3 +1041,20 @@  Enum(indirect_branch) String(thunk-inline) Value(indirect_branch_thunk_inline)
 
 EnumValue
 Enum(indirect_branch) String(thunk-extern) Value(indirect_branch_thunk_extern)
+
+mindirect-branch-loop=
+Target Report RejectNegative Joined Enum(indirect_branch_loop) Var(ix86_indirect_branch_loop) Init(indirect_branch_loop_lfence)
+Control loop filler in call and return thunk for indirect call and jump.
+
+Enum
+Name(indirect_branch_loop) Type(enum indirect_branch_loop)
+Known looop choices (for use with the -mindirect-branch-loop= option):
+
+EnumValue
+Enum(indirect_branch_loop) String(lfence) Value(indirect_branch_loop_lfence)
+
+EnumValue
+Enum(indirect_branch_loop) String(pause) Value(indirect_branch_loop_pause)
+
+EnumValue
+Enum(indirect_branch_loop) String(nop) Value(indirect_branch_loop_nop)
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 46461d1ada3..ee5248aba59 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -1227,7 +1227,7 @@  See RS/6000 and PowerPC Options.
 -mstack-protector-guard-offset=@var{offset} @gol
 -mstack-protector-guard-symbol=@var{symbol} -mmitigate-rop @gol
 -mgeneral-regs-only -mcall-ms2sysv-xlogues @gol
--mindirect-branch=@var{choice}}
+-mindirect-branch=@var{choice} -mindirect-branch-loop=@var{choice}}
 
 @emph{x86 Windows Options}
 @gccoptlist{-mconsole  -mcygwin  -mno-cygwin  -mdll @gol
@@ -26776,6 +26776,13 @@  to external call and return thunk provided in a separate object file.
 You can control this behavior for a specific function by using the
 function attribute @code{indirect_branch}.  @xref{Function Attributes}.
 
+@item -mindirect-branch-loop=@var{choice}
+@opindex -mindirect-branch-boop
+Control loop filler in call and return thunk for indirect call and jump.
+@samp{lfence} uses @code{lfence} as loop filler.  @samp{pause} uses
+@code{pause} as loop filler.  @samp{nop} uses @code{nop} as loop filler.
+The default is @samp{lfence}.
+
 @end table
 
 These @samp{-m} switches are supported in addition to the above
diff --git a/gcc/testsuite/gcc.target/i386/indirect-thunk-loop-1.c b/gcc/testsuite/gcc.target/i386/indirect-thunk-loop-1.c
new file mode 100644
index 00000000000..1b0e2c58775
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/indirect-thunk-loop-1.c
@@ -0,0 +1,19 @@ 
+/* { dg-do compile } */
+/* { dg-options "-O2 -mindirect-branch=thunk -mindirect-branch-loop=pause -fno-pic" } */
+
+typedef void (*dispatch_t)(long offset);
+
+dispatch_t dispatch;
+
+void
+male_indirect_jump (long offset)
+{
+  dispatch(offset);
+}
+
+/* { dg-final { scan-assembler "push(?:l|q)\[ \t\]*_?dispatch" { target { ! x32 } } } } */
+/* { dg-final { scan-assembler "jmp\[ \t\]*__x86_indirect_thunk" { target { ! x32 } } } } */
+/* { dg-final { scan-assembler "jmp\[ \t\]*__x86_indirect_thunk_(r|e)ax" { target x32 } } } */
+/* { dg-final { scan-assembler "jmp\[ \t\]*\.LIND" } } */
+/* { dg-final { scan-assembler "call\[ \t\]*\.LIND" } } */
+/* { dg-final { scan-assembler {\tpause} } } */
diff --git a/gcc/testsuite/gcc.target/i386/indirect-thunk-loop-2.c b/gcc/testsuite/gcc.target/i386/indirect-thunk-loop-2.c
new file mode 100644
index 00000000000..feace47a765
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/indirect-thunk-loop-2.c
@@ -0,0 +1,19 @@ 
+/* { dg-do compile } */
+/* { dg-options "-O2 -mindirect-branch=thunk -mindirect-branch-loop=nop -fno-pic" } */
+
+typedef void (*dispatch_t)(long offset);
+
+dispatch_t dispatch[256];
+
+void
+male_indirect_jump (long offset)
+{
+  dispatch[offset](offset);
+}
+
+/* { dg-final { scan-assembler "push(?:l|q)\[ \t\]*_?dispatch" { target { ! x32 } } } } */
+/* { dg-final { scan-assembler "jmp\[ \t\]*__x86_indirect_thunk" { target { ! x32 } } } } */
+/* { dg-final { scan-assembler "jmp\[ \t\]*__x86_indirect_thunk_(r|e)ax" { target x32 } } } */
+/* { dg-final { scan-assembler "jmp\[ \t\]*\.LIND" } } */
+/* { dg-final { scan-assembler "call\[ \t\]*\.LIND" } } */
+/* { dg-final { scan-assembler {\tnop} } } */
diff --git a/gcc/testsuite/gcc.target/i386/indirect-thunk-loop-3.c b/gcc/testsuite/gcc.target/i386/indirect-thunk-loop-3.c
new file mode 100644
index 00000000000..ad2165fa7aa
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/indirect-thunk-loop-3.c
@@ -0,0 +1,19 @@ 
+/* { dg-do compile } */
+/* { dg-options "-O2 -mindirect-branch=thunk -mindirect-branch-loop=lfence -fno-pic" } */
+
+typedef void (*dispatch_t)(long offset);
+
+dispatch_t dispatch;
+
+void
+male_indirect_jump (long offset)
+{
+  dispatch(offset);
+}
+
+/* { dg-final { scan-assembler "push(?:l|q)\[ \t\]*_?dispatch" { target { ! x32 } } } } */
+/* { dg-final { scan-assembler "jmp\[ \t\]*__x86_indirect_thunk" { target { ! x32 } } } } */
+/* { dg-final { scan-assembler "jmp\[ \t\]*__x86_indirect_thunk_(r|e)ax" { target x32 } } } */
+/* { dg-final { scan-assembler "jmp\[ \t\]*\.LIND" } } */
+/* { dg-final { scan-assembler "call\[ \t\]*\.LIND" } } */
+/* { dg-final { scan-assembler {\tlfence} } } */
diff --git a/gcc/testsuite/gcc.target/i386/indirect-thunk-loop-4.c b/gcc/testsuite/gcc.target/i386/indirect-thunk-loop-4.c
new file mode 100644
index 00000000000..4ba997da966
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/indirect-thunk-loop-4.c
@@ -0,0 +1,19 @@ 
+/* { dg-do compile } */
+/* { dg-options "-O2 -mindirect-branch=thunk-inline -mindirect-branch-loop=pause -fno-pic" } */
+
+typedef void (*dispatch_t)(long offset);
+
+dispatch_t dispatch;
+
+void
+male_indirect_jump (long offset)
+{
+  dispatch(offset);
+}
+
+/* { dg-final { scan-assembler "push(?:l|q)\[ \t\]*_?dispatch" { target { ! x32 } } } } */
+/* { dg-final { scan-assembler "jmp\[ \t\]*\.LIND" } } */
+/* { dg-final { scan-assembler "call\[ \t\]*\.LIND" } } */
+/* { dg-final { scan-assembler {\tpause} } } */
+/* { dg-final { scan-assembler-not "__x86_indirect_thunk" } } */
+/* { dg-final { scan-assembler-not "pushq\[ \t\]%rax" { target x32 } } } */
diff --git a/gcc/testsuite/gcc.target/i386/indirect-thunk-loop-5.c b/gcc/testsuite/gcc.target/i386/indirect-thunk-loop-5.c
new file mode 100644
index 00000000000..10fb2193f5e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/indirect-thunk-loop-5.c
@@ -0,0 +1,19 @@ 
+/* { dg-do compile } */
+/* { dg-options "-O2 -mindirect-branch=thunk-extern -mindirect-branch-loop=pause -fno-pic" } */
+
+typedef void (*dispatch_t)(long offset);
+
+dispatch_t dispatch;
+
+void
+male_indirect_jump (long offset)
+{
+  dispatch(offset);
+}
+
+/* { dg-final { scan-assembler "push(?:l|q)\[ \t\]*_?dispatch" { target { ! x32 } } } } */
+/* { dg-final { scan-assembler "jmp\[ \t\]*__x86_indirect_thunk" { target { ! x32 } } } } */
+/* { dg-final { scan-assembler "jmp\[ \t\]*__x86_indirect_thunk_(r|e)ax" { target x32 } } } */
+/* { dg-final { scan-assembler-not {\t(lfence|pause|nop)} } } */
+/* { dg-final { scan-assembler-not "jmp\[ \t\]*\.LIND" } } */
+/* { dg-final { scan-assembler-not "call\[ \t\]*\.LIND" } } */