[SRU,Bionic,1/1] x86/speculation/l1tf: Exempt zeroed PTEs from inversion

Message ID 20190205005115.2215-2-khalid.elmously@canonical.com
State New
Series
  • Fix for LP #1799237 (mprotect() failure)

Commit Message

Khaled Elmously Feb. 5, 2019, 12:51 a.m.
From: Sean Christopherson <sean.j.christopherson@intel.com>

BugLink: http://bugs.launchpad.net/bugs/1799237

It turns out that we should *not* invert all not-present mappings,
because the all zeroes case is obviously special.

clear_page() does not undergo the XOR logic to invert the address bits,
i.e. PTE, PMD and PUD entries that have not been individually written
will have val=0 and so will trigger __pte_needs_invert(). As a result,
{pte,pmd,pud}_pfn() will return the wrong PFN value, i.e. all ones
(adjusted by the max PFN mask) instead of zero. Leaving a zeroed entry
uninverted is safe because the page at physical address 0 is reserved
early in boot specifically to mitigate L1TF, so explicitly exempt zeroed
entries from the inversion when reading the PFN.

This manifested as an unexpected mprotect(..., PROT_NONE) failure when called
on a VMA that has VM_PFNMAP and was mmap'd as something other than
PROT_NONE but never used. mprotect() sends the PROT_NONE request down
prot_none_walk(), which walks the PTEs to check the PFNs.
prot_none_pte_entry() gets the bogus PFN from pte_pfn() and returns
-EACCES because it thinks mprotect() is trying to adjust a high MMIO
address.

[ This is a very modified version of Sean's original patch, but all
  credit goes to Sean for doing this and also pointing out that
  sometimes the __pte_needs_invert() function only gets the protection
  bits, not the full eventual pte.  But zero remains special even in
  just protection bits, so that's ok.   - Linus ]

Fixes: f22cc87f6c1f ("x86/speculation/l1tf: Invert all not present mappings")
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Acked-by: Andi Kleen <ak@linux.intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit f19f5c49bbc3ffcc9126cc245fc1b24cc29f4a37)
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
---
 arch/x86/include/asm/pgtable-invert.h | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

Comments

Stefan Bader Feb. 5, 2019, 7:04 a.m. | #1
On 05.02.19 01:51, Khalid Elmously wrote:
Acked-by: Stefan Bader <stefan.bader@canonical.com>

This is related to L1TF, and from what I saw we have it in Xenial via
stable, so it makes sense.

Kleber Souza Feb. 5, 2019, 9:54 a.m. | #2
On 2/5/19 1:51 AM, Khalid Elmously wrote:

I was able to reproduce the issue with 4.15.0-44-generic and confirm
that this patch fixes the issue.

Acked-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>

Kleber Souza Feb. 5, 2019, 9:56 a.m. | #3
On 2/5/19 1:51 AM, Khalid Elmously wrote:

Applied to bionic/master-next branch.

Thanks,
Kleber

Patch

diff --git a/arch/x86/include/asm/pgtable-invert.h b/arch/x86/include/asm/pgtable-invert.h
index 44b1203ece12..a0c1525f1b6f 100644
--- a/arch/x86/include/asm/pgtable-invert.h
+++ b/arch/x86/include/asm/pgtable-invert.h
@@ -4,9 +4,18 @@ 
 
 #ifndef __ASSEMBLY__
 
+/*
+ * A clear pte value is special, and doesn't get inverted.
+ *
+ * Note that even users that only pass a pgprot_t (rather
+ * than a full pte) won't trigger the special zero case,
+ * because even PAGE_NONE has _PAGE_PROTNONE | _PAGE_ACCESSED
+ * set. So the all zero case really is limited to just the
+ * cleared page table entry case.
+ */
 static inline bool __pte_needs_invert(u64 val)
 {
-	return !(val & _PAGE_PRESENT);
+	return val && !(val & _PAGE_PRESENT);
 }
 
 /* Get a mask to xor with the page table entry to get the correct pfn. */