[v2,00/19] prevent bounds-check bypass via speculative execution

Message ID 151571798296.27429.7166552848688034184.stgit@dwillia2-desk3.amr.corp.intel.com

Message

Dan Williams Jan. 12, 2018, 12:46 a.m. UTC
Changes since v1 [1]:
* fixup the ifence definition to use alternative_2 per recent AMD
  changes in tip/x86/pti (Tom)

* drop 'nospec_ptr' (Linus, Mark)

* rename 'nospec_array_ptr' to 'array_ptr' (Alexei)

* rename 'nospec_barrier' to 'ifence' (Peter, Ingo)

* clean up occasions of 'variable assignment in if()' (Sergei, Stephen)

* make 'array_ptr' use a mask instead of an architectural ifence by
  default (Linus, Alexei)

* provide a command line and compile-time opt-in to the ifence
  mechanism, if an architecture provides 'ifence_array_ptr'.

* provide an optimized mask generation helper, 'array_ptr_mask', for
  x86 (Linus)

* move 'get_user' hardening from '__range_not_ok' to '__uaccess_begin'
  (Linus)

* drop "Thermal/int340x: prevent bounds-check..." since userspace does
  not have arbitrary control over the 'trip' index (Srinivas)

* update the changelog of "net: mpls: prevent bounds-check..." and keep
  it in the series to continue the debate about Spectre hygiene patches.
  (Eric).

* record a reviewed-by from Laurent on "[media] uvcvideo: prevent
  bounds-check..."

* update the cover letter

[1]: https://lwn.net/Articles/743376/

---

Quoting Mark's original RFC:

"Recently, Google Project Zero discovered several classes of attack
against speculative execution. One of these, known as variant-1, allows
explicit bounds checks to be bypassed under speculation, providing an
arbitrary read gadget. Further details can be found on the GPZ blog [2]
and the Documentation patch in this series."

This series incorporates Mark Rutland's latest ARM changes and adds
the x86 specific implementation of 'ifence_array_ptr'. That ifence
based approach is provided as an opt-in fallback, but the default
mitigation, '__array_ptr', uses a 'mask' approach that removes
conditional branch instructions, and otherwise aims to redirect
speculation to use a NULL pointer rather than a user controlled value.

The mask is generated by the following expression, from Alexei and Linus:

    mask = ~(long)(_i | (_s - 1 - _i)) >> (BITS_PER_LONG - 1);

...and Linus provided an optimized mask generation helper for x86:

    asm ("cmpq %1,%2; sbbq %0,%0;"
		:"=r" (mask)
		:"r"(sz),"r" (idx)
		:"cc");

The 'array_ptr' mechanism can be switched between 'mask' and 'ifence'
via the spectre_v1={mask,ifence} command line option, and the
compile-time default is set by selecting either CONFIG_SPECTRE1_MASK or
CONFIG_SPECTRE1_IFENCE.

The 'array_ptr' infrastructure is the primary focus of this patch set. The
individual patches that perform 'array_ptr' conversions are a point-in-time
(i.e. earlier kernel, early analysis tooling, x86 only, etc.) start at
finding some of these gadgets.

Another consideration for reviewing these patches is the 'hygiene'
argument. When a patch refers to hygiene it is concerned with stopping
speculation on an unconstrained or insufficiently constrained pointer
value under userspace control. That by itself is not sufficient for
attack (per current understanding) [3], but it is a necessary
pre-condition.  So 'hygiene' refers to cleaning up those suspect
pointers regardless of whether they are usable as a gadget.

These patches are also available via the 'nospec-v2' git branch
here:

    git://git.kernel.org/pub/scm/linux/kernel/git/djbw/linux nospec-v2

Note that the BPF fix for Spectre variant 1 is merged in the bpf.git
tree [4], and is not included in this branch.

[2]: https://googleprojectzero.blogspot.co.uk/2018/01/reading-privileged-memory-with-side.html
[3]: https://spectreattack.com/spectre.pdf
[4]: https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf.git/commit/?id=b2157399cc98

---

Dan Williams (16):
      x86: implement ifence()
      x86: implement ifence_array_ptr() and array_ptr_mask()
      asm-generic/barrier: mask speculative execution flows
      x86: introduce __uaccess_begin_nospec and ASM_IFENCE
      x86: use __uaccess_begin_nospec and ASM_IFENCE in get_user paths
      ipv6: prevent bounds-check bypass via speculative execution
      ipv4: prevent bounds-check bypass via speculative execution
      vfs, fdtable: prevent bounds-check bypass via speculative execution
      userns: prevent bounds-check bypass via speculative execution
      udf: prevent bounds-check bypass via speculative execution
      [media] uvcvideo: prevent bounds-check bypass via speculative execution
      carl9170: prevent bounds-check bypass via speculative execution
      p54: prevent bounds-check bypass via speculative execution
      qla2xxx: prevent bounds-check bypass via speculative execution
      cw1200: prevent bounds-check bypass via speculative execution
      net: mpls: prevent bounds-check bypass via speculative execution

Mark Rutland (3):
      Documentation: document array_ptr
      arm64: implement ifence_array_ptr()
      arm: implement ifence_array_ptr()

 Documentation/speculation.txt            |  142 ++++++++++++++++++++++++++++++
 arch/arm/Kconfig                         |    1 
 arch/arm/include/asm/barrier.h           |   24 +++++
 arch/arm64/Kconfig                       |    1 
 arch/arm64/include/asm/barrier.h         |   24 +++++
 arch/x86/Kconfig                         |    3 +
 arch/x86/include/asm/barrier.h           |   46 ++++++++++
 arch/x86/include/asm/msr.h               |    3 -
 arch/x86/include/asm/smap.h              |    4 +
 arch/x86/include/asm/uaccess.h           |   16 +++
 arch/x86/include/asm/uaccess_32.h        |    6 +
 arch/x86/include/asm/uaccess_64.h        |   12 +--
 arch/x86/lib/copy_user_64.S              |    3 +
 arch/x86/lib/usercopy_32.c               |    8 +-
 drivers/media/usb/uvc/uvc_v4l2.c         |    9 +-
 drivers/net/wireless/ath/carl9170/main.c |    7 +
 drivers/net/wireless/intersil/p54/main.c |    9 +-
 drivers/net/wireless/st/cw1200/sta.c     |   11 +-
 drivers/net/wireless/st/cw1200/wsm.h     |    4 -
 drivers/scsi/qla2xxx/qla_mr.c            |   17 ++--
 fs/udf/misc.c                            |   40 +++++---
 include/linux/fdtable.h                  |    7 +
 include/linux/nospec.h                   |   71 +++++++++++++++
 kernel/Kconfig.nospec                    |   31 +++++++
 kernel/Makefile                          |    1 
 kernel/nospec.c                          |   52 +++++++++++
 kernel/user_namespace.c                  |   11 +-
 lib/Kconfig                              |    3 +
 net/ipv4/raw.c                           |   10 +-
 net/ipv6/raw.c                           |   10 +-
 net/mpls/af_mpls.c                       |   12 +--
 31 files changed, 521 insertions(+), 77 deletions(-)
 create mode 100644 Documentation/speculation.txt
 create mode 100644 include/linux/nospec.h
 create mode 100644 kernel/Kconfig.nospec
 create mode 100644 kernel/nospec.c

Comments

Linus Torvalds Jan. 12, 2018, 1:19 a.m. UTC | #1
On Thu, Jan 11, 2018 at 4:46 PM, Dan Williams <dan.j.williams@intel.com> wrote:
>
> This series incorporates Mark Rutland's latest ARM changes and adds
> the x86 specific implementation of 'ifence_array_ptr'. That ifence
> based approach is provided as an opt-in fallback, but the default
> mitigation, '__array_ptr', uses a 'mask' approach that removes
> conditional branches instructions, and otherwise aims to redirect
> speculation to use a NULL pointer rather than a user controlled value.

Do you have any performance numbers and perhaps example code
generation? Is this noticeable? Are there any microbenchmarks showing
the difference between lfence use and the masking model?

Having both seems good for testing, but wouldn't we want to pick one in the end?

Also, I do think that there is one particular array load that would
seem to be pretty obvious: the system call function pointer array.

Yes, yes, the actual call is now behind a retpoline, but that protects
against a speculative BTB access; it's not obvious that it protects
against the mispredict of the __NR_syscall_max comparison in
arch/x86/entry/entry_64.S.

The act of fetching code is a kind of read too. And retpoline protects
against BTB stuffing etc, but what if the _actual_ system call
function address is wrong (due to mis-prediction of the system call
index check)?

Should the array access in entry_SYSCALL_64_fastpath be made to use
the masking approach?

                Linus
Dan Williams Jan. 12, 2018, 1:41 a.m. UTC | #2
On Thu, Jan 11, 2018 at 5:19 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> On Thu, Jan 11, 2018 at 4:46 PM, Dan Williams <dan.j.williams@intel.com> wrote:
>>
>> This series incorporates Mark Rutland's latest ARM changes and adds
>> the x86 specific implementation of 'ifence_array_ptr'. That ifence
>> based approach is provided as an opt-in fallback, but the default
>> mitigation, '__array_ptr', uses a 'mask' approach that removes
>> conditional branches instructions, and otherwise aims to redirect
>> speculation to use a NULL pointer rather than a user controlled value.
>
> Do you have any performance numbers and perhaps example code
> generation? Is this noticeable? Are there any microbenchmarks showing
> the difference between lfence use and the masking model?

I don't have performance numbers, but here's a sample of the code generated
for __fcheck_files, where the 'and; lea; and' sequence is the portion of
array_ptr() after the mask generation with 'sbb'.

        fdp = array_ptr(fdt->fd, fd, fdt->max_fds);
     8e7:       8b 02                   mov    (%rdx),%eax
     8e9:       48 39 c7                cmp    %rax,%rdi
     8ec:       48 19 c9                sbb    %rcx,%rcx
     8ef:       48 8b 42 08             mov    0x8(%rdx),%rax
     8f3:       48 89 fe                mov    %rdi,%rsi
     8f6:       48 21 ce                and    %rcx,%rsi
     8f9:       48 8d 04 f0             lea    (%rax,%rsi,8),%rax
     8fd:       48 21 c8                and    %rcx,%rax
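
For context, the conversion that produces the code above looks roughly like
the following (a paraphrase, not the exact hunk from the fdtable patch):

        static inline struct file *__fcheck_files(struct files_struct *files,
                                                  unsigned int fd)
        {
                struct fdtable *fdt = rcu_dereference_raw(files->fdt);
                struct file __rcu **fdp;

                /* NULL when fd >= max_fds; index clamped under speculation */
                fdp = array_ptr(fdt->fd, fd, fdt->max_fds);
                if (fdp)
                        return rcu_dereference_raw(*fdp);
                return NULL;
        }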


> Having both seems good for testing, but wouldn't we want to pick one in the end?

I was thinking we'd keep it as a 'just in case' sort of thing, at
least until the 'probably safe' assumption of the 'mask' approach has
more time to settle out.

>
> Also, I do think that there is one particular array load that would
> seem to be pretty obvious: the system call function pointer array.
>
> Yes, yes, the actual call is now behind a retpoline, but that protects
> against a speculative BTB access, it's not obvious that it  protects
> against the mispredict of the __NR_syscall_max comparison in
> arch/x86/entry/entry_64.S.
>
> The act of fetching code is a kind of read too. And retpoline protects
> against BTB stuffing etc, but what if the _actual_ system call
> function address is wrong (due to mis-prediction of the system call
> index check)?
>
> Should the array access in entry_SYSCALL_64_fastpath be made to use
> the masking approach?

I'll take a look. I'm firmly in the 'patch first / worry later' stance
on these investigations.
Russell King (Oracle) Jan. 12, 2018, 10:02 a.m. UTC | #3
Do you think that the appropriate patches could be copied to the
appropriate people please?

On Thu, Jan 11, 2018 at 04:46:24PM -0800, Dan Williams wrote:
> Changes since v1 [1]:
>
> [...]
Tony Luck Jan. 13, 2018, 12:15 a.m. UTC | #4
On Thu, Jan 11, 2018 at 5:19 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> Should the array access in entry_SYSCALL_64_fastpath be made to use
> the masking approach?

That one has a bounds check for an inline constant.

     cmpq    $__NR_syscall_max, %rax

so should be safe.

The classic Spectre variant #1 code sequence is:

int array_size;

       if (x < array_size) {
               something with array[x]
       }

which runs into problems because the array_size variable may not
be in cache, and while the CPU core is waiting for the value it
speculates inside the "if" body.

The syscall entry is more like:

#define ARRAY_SIZE 10

     if (x < ARRAY_SIZE) {
          something with array[x]
     }

Here there isn't any reason for speculation. The core has the
value of 'x' in a register and the upper bound encoded into the
"cmp" instruction.  Both are right there, no waiting, no speculation.

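Spelled out as compilable C, purely for illustration (the function names are
invented):

    /* variant-1 shape: the bound is a load that may miss in cache */
    extern unsigned int array_size;
    extern unsigned char array[];

    unsigned char load_bounded(unsigned int x)
    {
            /* array_size may be uncached; the core can run ahead into the body */
            if (x < array_size)
                    return array[x];
            return 0;
    }

    /* syscall-entry shape: the bound is an immediate in the compare */
    #define ARRAY_SIZE 10

    unsigned char const_bounded(unsigned int x)
    {
            /* both operands are immediately available, per the argument above */
            if (x < ARRAY_SIZE)
                    return array[x];
            return 0;
    }
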
-Tony
Linus Torvalds Jan. 13, 2018, 6:51 p.m. UTC | #5
On Fri, Jan 12, 2018 at 4:15 PM, Tony Luck <tony.luck@gmail.com> wrote:
>
> Here there isn't any reason for speculation. The core has the
> value of 'x' in a register and the upper bound encoded into the
> "cmp" instruction.  Both are right there, no waiting, no speculation.

So this is an argument I haven't seen before (although it was brought
up in private long ago), but that is very relevant: the actual scope
and depth of speculation.

Your argument basically depends on just what gets speculated, and on
the _actual_ order of execution.

So your argument depends on "the uarch will actually run the code in
order if there are no events that block the pipeline".

Or at least it depends on a certain latency of the killing of any OoO
execution being low enough that the cache access doesn't even begin.

I realize that that is very much a particular microarchitectural
detail, but it's actually a *big* deal. Do we have a set of rules for
what is not a worry, simply because the speculated accesses get killed
early enough?

Apparently "test a register value against a constant" is good enough,
assuming that register is also needed for the address of the access.

            Linus
Tony Luck Jan. 16, 2018, 7:21 p.m. UTC | #6
On Sat, Jan 13, 2018 at 10:51 AM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> On Fri, Jan 12, 2018 at 4:15 PM, Tony Luck <tony.luck@gmail.com> wrote:
> So your argument depends on "the uarch will actually run the code in
> order if there are no events that block the pipeline".

And might be bogus ... I'm a software person not a u-arch expert. That
sounded good in my head, but the level of parallelism may be greater
than I can imagine.

> Or at least it depends on a certain latency of the killing of any OoO
> execution being low enough that the cache access doesn't even begin.
>
> I realize that that is very much a particular microarchitectural
> detail, but it's actually a *big* deal. Do we have a set of rules for
> what is not a worry, simply because the speculated accesses get killed
> early enough?
>
> Apparently "test a register value against a constant" is good enough,
> assuming that register is also needed for the address of the access.

People who do understand this are working on what can be guaranteed.
For now don't make big plans based on my ramblings.

-Tony
Will Deacon Jan. 18, 2018, 1:18 p.m. UTC | #7
Hi Dan, Linus,

On Thu, Jan 11, 2018 at 05:41:08PM -0800, Dan Williams wrote:
> On Thu, Jan 11, 2018 at 5:19 PM, Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
> > On Thu, Jan 11, 2018 at 4:46 PM, Dan Williams <dan.j.williams@intel.com> wrote:
> >>
> >> This series incorporates Mark Rutland's latest ARM changes and adds
> >> the x86 specific implementation of 'ifence_array_ptr'. That ifence
> >> based approach is provided as an opt-in fallback, but the default
> >> mitigation, '__array_ptr', uses a 'mask' approach that removes
> >> conditional branches instructions, and otherwise aims to redirect
> >> speculation to use a NULL pointer rather than a user controlled value.
> >
> > Do you have any performance numbers and perhaps example code
> > generation? Is this noticeable? Are there any microbenchmarks showing
> > the difference between lfence use and the masking model?
> 
> I don't have performance numbers, but here's a sample code generation
> from __fcheck_files, where the 'and; lea; and' sequence is portion of
> array_ptr() after the mask generation with 'sbb'.
> 
>         fdp = array_ptr(fdt->fd, fd, fdt->max_fds);
>      8e7:       8b 02                   mov    (%rdx),%eax
>      8e9:       48 39 c7                cmp    %rax,%rdi
>      8ec:       48 19 c9                sbb    %rcx,%rcx
>      8ef:       48 8b 42 08             mov    0x8(%rdx),%rax
>      8f3:       48 89 fe                mov    %rdi,%rsi
>      8f6:       48 21 ce                and    %rcx,%rsi
>      8f9:       48 8d 04 f0             lea    (%rax,%rsi,8),%rax
>      8fd:       48 21 c8                and    %rcx,%rax
> 
> 
> > Having both seems good for testing, but wouldn't we want to pick one in the end?
> 
> I was thinking we'd keep it as a 'just in case' sort of thing, at
> least until the 'probably safe' assumption of the 'mask' approach has
> more time to settle out.

From the arm64 side, the only concern I have (and this actually applies to
our CSDB sequence as well) is the calculation of the array size by the
caller. As Linus mentioned at the end of [1], if the determination of the
size argument is based on a conditional branch, then masking doesn't help
because you bound within the wrong range under speculation.

We ran into this when trying to use masking to protect our uaccess routines
where the conditional bound is either KERNEL_DS or USER_DS. It's possible
that a prior conditional set_fs(KERNEL_DS) could defeat the masking and so
we'd need to throw some heavy barriers in set_fs to make it robust.

The question then is whether or not we're likely to run into this elsewhere,
and I don't have a good feel for that.

Will

[1] http://lkml.kernel.org/r/CA+55aFz0tsreoa=5Ud2noFCpng-dizLBhT9WU9asyhpLfjdcYA@mail.gmail.com
Dan Williams Jan. 18, 2018, 4:58 p.m. UTC | #8
On Thu, Jan 18, 2018 at 5:18 AM, Will Deacon <will.deacon@arm.com> wrote:
> Hi Dan, Linus,
>
> On Thu, Jan 11, 2018 at 05:41:08PM -0800, Dan Williams wrote:
>> On Thu, Jan 11, 2018 at 5:19 PM, Linus Torvalds
>> <torvalds@linux-foundation.org> wrote:
>> > On Thu, Jan 11, 2018 at 4:46 PM, Dan Williams <dan.j.williams@intel.com> wrote:
>> >>
>> >> This series incorporates Mark Rutland's latest ARM changes and adds
>> >> the x86 specific implementation of 'ifence_array_ptr'. That ifence
>> >> based approach is provided as an opt-in fallback, but the default
>> >> mitigation, '__array_ptr', uses a 'mask' approach that removes
>> >> conditional branches instructions, and otherwise aims to redirect
>> >> speculation to use a NULL pointer rather than a user controlled value.
>> >
>> > Do you have any performance numbers and perhaps example code
>> > generation? Is this noticeable? Are there any microbenchmarks showing
>> > the difference between lfence use and the masking model?
>>
>> I don't have performance numbers, but here's a sample code generation
>> from __fcheck_files, where the 'and; lea; and' sequence is portion of
>> array_ptr() after the mask generation with 'sbb'.
>>
>>         fdp = array_ptr(fdt->fd, fd, fdt->max_fds);
>>      8e7:       8b 02                   mov    (%rdx),%eax
>>      8e9:       48 39 c7                cmp    %rax,%rdi
>>      8ec:       48 19 c9                sbb    %rcx,%rcx
>>      8ef:       48 8b 42 08             mov    0x8(%rdx),%rax
>>      8f3:       48 89 fe                mov    %rdi,%rsi
>>      8f6:       48 21 ce                and    %rcx,%rsi
>>      8f9:       48 8d 04 f0             lea    (%rax,%rsi,8),%rax
>>      8fd:       48 21 c8                and    %rcx,%rax
>>
>>
>> > Having both seems good for testing, but wouldn't we want to pick one in the end?
>>
>> I was thinking we'd keep it as a 'just in case' sort of thing, at
>> least until the 'probably safe' assumption of the 'mask' approach has
>> more time to settle out.
>
> From the arm64 side, the only concern I have (and this actually applies to
> our CSDB sequence as well) is the calculation of the array size by the
> caller. As Linus mentioned at the end of [1], if the determination of the
> size argument is based on a conditional branch, then masking doesn't help
> because you bound within the wrong range under speculation.
>
> We ran into this when trying to use masking to protect our uaccess routines
> where the conditional bound is either KERNEL_DS or USER_DS. It's possible
> that a prior conditional set_fs(KERNEL_DS) could defeat the masking and so
> we'd need to throw some heavy barriers in set_fs to make it robust.

At least in the conditional mask case near set_fs() usage the approach
we are taking is to use a barrier. I.e. the following guidance from
Linus:

"Basically, the rule is trivial: find all 'stac' users, and use address
masking if those users already integrate the limit check, and lfence
they don't."

...which translates to narrowing the pointer for get_user() and using a
barrier for __get_user().
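
Concretely, the '__get_user' half of that rule ends up looking roughly like
the following (a paraphrased sketch, not the exact hunk from the series):

    /*
     * Paraphrased sketch: __get_user() does not integrate a limit check,
     * so a fence sits between opening the user-access window and the
     * access itself.  get_user(), which does integrate the check, narrows
     * the pointer with a mask instead.
     */
    #define __uaccess_begin_nospec()            \
    ({                                          \
            stac();                             \
            ifence();                           \
    })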
Will Deacon Jan. 18, 2018, 5:05 p.m. UTC | #9
On Thu, Jan 18, 2018 at 08:58:08AM -0800, Dan Williams wrote:
> On Thu, Jan 18, 2018 at 5:18 AM, Will Deacon <will.deacon@arm.com> wrote:
> > On Thu, Jan 11, 2018 at 05:41:08PM -0800, Dan Williams wrote:
> >> On Thu, Jan 11, 2018 at 5:19 PM, Linus Torvalds
> >> <torvalds@linux-foundation.org> wrote:
> >> > On Thu, Jan 11, 2018 at 4:46 PM, Dan Williams <dan.j.williams@intel.com> wrote:
> >> >>
> >> >> This series incorporates Mark Rutland's latest ARM changes and adds
> >> >> the x86 specific implementation of 'ifence_array_ptr'. That ifence
> >> >> based approach is provided as an opt-in fallback, but the default
> >> >> mitigation, '__array_ptr', uses a 'mask' approach that removes
> >> >> conditional branches instructions, and otherwise aims to redirect
> >> >> speculation to use a NULL pointer rather than a user controlled value.
> >> >
> >> > Do you have any performance numbers and perhaps example code
> >> > generation? Is this noticeable? Are there any microbenchmarks showing
> >> > the difference between lfence use and the masking model?
> >>
> >> I don't have performance numbers, but here's a sample code generation
> >> from __fcheck_files, where the 'and; lea; and' sequence is portion of
> >> array_ptr() after the mask generation with 'sbb'.
> >>
> >>         fdp = array_ptr(fdt->fd, fd, fdt->max_fds);
> >>      8e7:       8b 02                   mov    (%rdx),%eax
> >>      8e9:       48 39 c7                cmp    %rax,%rdi
> >>      8ec:       48 19 c9                sbb    %rcx,%rcx
> >>      8ef:       48 8b 42 08             mov    0x8(%rdx),%rax
> >>      8f3:       48 89 fe                mov    %rdi,%rsi
> >>      8f6:       48 21 ce                and    %rcx,%rsi
> >>      8f9:       48 8d 04 f0             lea    (%rax,%rsi,8),%rax
> >>      8fd:       48 21 c8                and    %rcx,%rax
> >>
> >>
> >> > Having both seems good for testing, but wouldn't we want to pick one in the end?
> >>
> >> I was thinking we'd keep it as a 'just in case' sort of thing, at
> >> least until the 'probably safe' assumption of the 'mask' approach has
> >> more time to settle out.
> >
> > From the arm64 side, the only concern I have (and this actually applies to
> > our CSDB sequence as well) is the calculation of the array size by the
> > caller. As Linus mentioned at the end of [1], if the determination of the
> > size argument is based on a conditional branch, then masking doesn't help
> > because you bound within the wrong range under speculation.
> >
> > We ran into this when trying to use masking to protect our uaccess routines
> > where the conditional bound is either KERNEL_DS or USER_DS. It's possible
> > that a prior conditional set_fs(KERNEL_DS) could defeat the masking and so
> > we'd need to throw some heavy barriers in set_fs to make it robust.
> 
> At least in the conditional mask case near set_fs() usage the approach
> we are taking is to use a barrier. I.e. the following guidance from
> Linus:
> 
> "Basically, the rule is trivial: find all 'stac' users, and use address
> masking if those users already integrate the limit check, and lfence
> they don't."
> 
> ...which translates to narrow the pointer for get_user() and use a
> barrier  for __get_user().

Great, that matches my thinking re set_fs but I'm still worried about
finding all the places where the bound is conditional for other users
of the macro. Then again, finding the places that need this macro in the
first place is tough enough so perhaps analysing the bound calculation
doesn't make it much worse.

Will
Laurent Pinchart Jan. 18, 2018, 9:41 p.m. UTC | #10
Hi Will,

On Thursday, 18 January 2018 19:05:47 EET Will Deacon wrote:
> On Thu, Jan 18, 2018 at 08:58:08AM -0800, Dan Williams wrote:
> > On Thu, Jan 18, 2018 at 5:18 AM, Will Deacon wrote:
> >> On Thu, Jan 11, 2018 at 05:41:08PM -0800, Dan Williams wrote:
> >>> On Thu, Jan 11, 2018 at 5:19 PM, Linus Torvalds wrote:
> >>>> On Thu, Jan 11, 2018 at 4:46 PM, Dan Williams wrote:
> >>>>> This series incorporates Mark Rutland's latest ARM changes and adds
> >>>>> the x86 specific implementation of 'ifence_array_ptr'. That ifence
> >>>>> based approach is provided as an opt-in fallback, but the default
> >>>>> mitigation, '__array_ptr', uses a 'mask' approach that removes
> >>>>> conditional branches instructions, and otherwise aims to redirect
> >>>>> speculation to use a NULL pointer rather than a user controlled
> >>>>> value.
> >>>> 
> >>>> Do you have any performance numbers and perhaps example code
> >>>> generation? Is this noticeable? Are there any microbenchmarks showing
> >>>> the difference between lfence use and the masking model?
> >>> 
> >>> I don't have performance numbers, but here's a sample code generation
> >>> from __fcheck_files, where the 'and; lea; and' sequence is portion of
> >>> array_ptr() after the mask generation with 'sbb'.
> >>> 
> >>>         fdp = array_ptr(fdt->fd, fd, fdt->max_fds);
> >>>      
> >>>      8e7:       8b 02                   mov    (%rdx),%eax
> >>>      8e9:       48 39 c7                cmp    %rax,%rdi
> >>>      8ec:       48 19 c9                sbb    %rcx,%rcx
> >>>      8ef:       48 8b 42 08             mov    0x8(%rdx),%rax
> >>>      8f3:       48 89 fe                mov    %rdi,%rsi
> >>>      8f6:       48 21 ce                and    %rcx,%rsi
> >>>      8f9:       48 8d 04 f0             lea    (%rax,%rsi,8),%rax
> >>>      8fd:       48 21 c8                and    %rcx,%rax
> >>>> 
> >>>> Having both seems good for testing, but wouldn't we want to pick one
> >>>> in the end?
> >>> 
> >>> I was thinking we'd keep it as a 'just in case' sort of thing, at
> >>> least until the 'probably safe' assumption of the 'mask' approach has
> >>> more time to settle out.
> >> 
> >> From the arm64 side, the only concern I have (and this actually applies
> >> to our CSDB sequence as well) is the calculation of the array size by
> >> the caller. As Linus mentioned at the end of [1], if the determination
> >> of the size argument is based on a conditional branch, then masking
> >> doesn't help because you bound within the wrong range under speculation.
> >> 
> >> We ran into this when trying to use masking to protect our uaccess
> >> routines where the conditional bound is either KERNEL_DS or USER_DS.
> >> It's possible that a prior conditional set_fs(KERNEL_DS) could defeat
> >> the masking and so we'd need to throw some heavy barriers in set_fs to
> >> make it robust.
> > 
> > At least in the conditional mask case near set_fs() usage the approach
> > we are taking is to use a barrier. I.e. the following guidance from
> > Linus:
> > 
> > "Basically, the rule is trivial: find all 'stac' users, and use address
> > masking if those users already integrate the limit check, and lfence
> > they don't."
> > 
> > ...which translates to narrow the pointer for get_user() and use a
> > barrier  for __get_user().
> 
> Great, that matches my thinking re set_fs but I'm still worried about
> finding all the places where the bound is conditional for other users
> of the macro. Then again, finding the places that need this macro in the
> first place is tough enough so perhaps analysing the bound calculation
> doesn't make it much worse.

It might not now, but if the bound calculation changes later, I'm pretty sure 
we'll forget to update the speculation barrier macro at least in some cases. 
Without the help of static (or possibly dynamic) code analysis I think we're 
bound to reintroduce problems over time, but that's true for finding places 
where the barrier is needed, not just for barrier selection based on bound 
calculation.