[CVE-2017-1000405,zesty] mm, thp: Do not make page table dirty unconditionally in touch_p[mu]d()

Message ID 20171201201138.8952-1-cascardo@canonical.com
State New

Commit Message

Thadeu Lima de Souza Cascardo Dec. 1, 2017, 8:11 p.m.
From: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>

Currently, we unconditionally make page table dirty in touch_pmd().
It may result in false-positive can_follow_write_pmd().

We may avoid the situation, if we would only make the page table entry
dirty if caller asks for write access -- FOLL_WRITE.

The patch also changes touch_pud() in the same way.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(backported from commit a8f97366452ed491d13cf1e44241bc0b5740b1f0)
[cascardo: dropped touch_pud parts]
CVE-2017-1000405
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
---

Reproducer has been tested. It "exploits" without the patch. With the patch, it
fails.

---
 mm/huge_memory.c | 19 +++++++------------
 1 file changed, 7 insertions(+), 12 deletions(-)

Comments

Stefan Bader Dec. 4, 2017, 10:19 a.m. | #1
On 01.12.2017 21:11, Thadeu Lima de Souza Cascardo wrote:
> From: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
> 
> Currently, we unconditionally make page table dirty in touch_pmd().
> It may result in false-positive can_follow_write_pmd().
> 
> We may avoid the situation, if we would only make the page table entry
> dirty if caller asks for write access -- FOLL_WRITE.
> 
> The patch also changes touch_pud() in the same way.
> 
> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> Cc: Michal Hocko <mhocko@suse.com>
> Cc: Hugh Dickins <hughd@google.com>
> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
> (backported from commit a8f97366452ed491d13cf1e44241bc0b5740b1f0)
> [cascardo: dropped touch_pud parts]
> CVE-2017-1000405
> Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Acked-by: Stefan Bader <stefan.bader@canonical.com>

> ---
> 
> Reproducer has been tested. It "exploits" without the patch. With the patch, it
> fails.

PUD-level transparent huge pages were only introduced with 4.11, so the
backport looks good to me. Tested as well.

-stefan
Colin King Dec. 4, 2017, 10:36 a.m. | #2
Backport looks OK; no PUD handling for THP is required here. Test results
are positive.

Acked-by: Colin Ian King <colin.king@canonical.com>

Thadeu Lima de Souza Cascardo Dec. 4, 2017, 10:48 a.m. | #3
Applied to zesty master-next branch.

Thanks.
Cascardo.

Applied-to: zesty/master-next

Patch

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 49cb70b5993d..65f19bb6ae58 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -756,20 +756,15 @@  int vmf_insert_pfn_pmd(struct vm_area_struct *vma, unsigned long addr,
 EXPORT_SYMBOL_GPL(vmf_insert_pfn_pmd);
 
 static void touch_pmd(struct vm_area_struct *vma, unsigned long addr,
-		pmd_t *pmd)
+		pmd_t *pmd, int flags)
 {
 	pmd_t _pmd;
 
-	/*
-	 * We should set the dirty bit only for FOLL_WRITE but for now
-	 * the dirty bit in the pmd is meaningless.  And if the dirty
-	 * bit will become meaningful and we'll only set it with
-	 * FOLL_WRITE, an atomic set_bit will be required on the pmd to
-	 * set the young bit, instead of the current set_pmd_at.
-	 */
-	_pmd = pmd_mkyoung(pmd_mkdirty(*pmd));
+	_pmd = pmd_mkyoung(*pmd);
+	if (flags & FOLL_WRITE)
+		_pmd = pmd_mkdirty(_pmd);
 	if (pmdp_set_access_flags(vma, addr & HPAGE_PMD_MASK,
-				pmd, _pmd,  1))
+				pmd, _pmd, flags & FOLL_WRITE))
 		update_mmu_cache_pmd(vma, addr, pmd);
 }
 
@@ -798,7 +793,7 @@  struct page *follow_devmap_pmd(struct vm_area_struct *vma, unsigned long addr,
 		return NULL;
 
 	if (flags & FOLL_TOUCH)
-		touch_pmd(vma, addr, pmd);
+		touch_pmd(vma, addr, pmd, flags);
 
 	/*
 	 * device mapped pages can only be returned if the
@@ -1168,7 +1163,7 @@  struct page *follow_trans_huge_pmd(struct vm_area_struct *vma,
 	page = pmd_page(*pmd);
 	VM_BUG_ON_PAGE(!PageHead(page) && !is_zone_device_page(page), page);
 	if (flags & FOLL_TOUCH)
-		touch_pmd(vma, addr, pmd);
+		touch_pmd(vma, addr, pmd, flags);
 	if ((flags & FOLL_MLOCK) && (vma->vm_flags & VM_LOCKED)) {
 		/*
 		 * We don't mlock() pte-mapped THPs. This way we can avoid