Patchwork [1/2] powerpc: use smp_rmb when looking at deposisted pgtable to store hash index

login
register
mail settings
Submitter Aneesh Kumar K.V
Date May 17, 2013, 8:15 a.m.
Message ID <1368778503-23230-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com>
Download mbox | patch
Permalink /patch/244526/
State Superseded
Headers show

Comments

Aneesh Kumar K.V - May 17, 2013, 8:15 a.m.
From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>

We need to use smb_rmb when looking at hpte slot array. Otherwise we could
reorder the hpte_slot array load bfore even we marked the pmd trans huge.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/pgtable-ppc64.h | 15 +++++++++++++++
 arch/powerpc/mm/hugepage-hash64.c        |  6 +-----
 arch/powerpc/mm/pgtable_64.c             |  6 +-----
 3 files changed, 17 insertions(+), 10 deletions(-)
Michael Neuling - May 20, 2013, 1:18 a.m.
Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> wrote:

> From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
> 
> We need to use smb_rmb when looking at hpte slot array. Otherwise we could
> reorder the hpte_slot array load bfore even we marked the pmd trans huge.

Does this need to go back into the stable series?

Mikey

> 
> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
> ---
>  arch/powerpc/include/asm/pgtable-ppc64.h | 15 +++++++++++++++
>  arch/powerpc/mm/hugepage-hash64.c        |  6 +-----
>  arch/powerpc/mm/pgtable_64.c             |  6 +-----
>  3 files changed, 17 insertions(+), 10 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/pgtable-ppc64.h b/arch/powerpc/include/asm/pgtable-ppc64.h
> index d046289..46db094 100644
> --- a/arch/powerpc/include/asm/pgtable-ppc64.h
> +++ b/arch/powerpc/include/asm/pgtable-ppc64.h
> @@ -10,6 +10,7 @@
>  #else
>  #include <asm/pgtable-ppc64-4k.h>
>  #endif
> +#include <asm/barrier.h>
>  
>  #define FIRST_USER_ADDRESS	0
>  
> @@ -393,6 +394,20 @@ static inline void mark_hpte_slot_valid(unsigned char *hpte_slot_array,
>  	hpte_slot_array[index] = hidx << 4 | 0x1 << 3;
>  }
>  
> +static inline char *get_hpte_slot_array(pmd_t *pmdp)
> +{
> +	/*
> +	 * The hpte hindex is stored in the pgtable whose address is in the
> +	 * second half of the PMD
> +	 *
> +	 * Order this load with the test for pmd_trans_huge in the caller
> +	 */
> +	smp_rmb();
> +	return *(char **)(pmdp + PTRS_PER_PMD);
> +
> +
> +}
> +
>  extern void hpte_do_hugepage_flush(struct mm_struct *mm, unsigned long addr,
>  				   pmd_t *pmdp);
>  #ifdef CONFIG_TRANSPARENT_HUGEPAGE
> diff --git a/arch/powerpc/mm/hugepage-hash64.c b/arch/powerpc/mm/hugepage-hash64.c
> index e430766..c3ba3d5 100644
> --- a/arch/powerpc/mm/hugepage-hash64.c
> +++ b/arch/powerpc/mm/hugepage-hash64.c
> @@ -84,11 +84,7 @@ int __hash_page_thp(unsigned long ea, unsigned long access, unsigned long vsid,
>  
>  	vpn = hpt_vpn(ea, vsid, ssize);
>  	hash = hpt_hash(vpn, shift, ssize);
> -	/*
> -	 * The hpte hindex are stored in the pgtable whose address is in the
> -	 * second half of the PMD
> -	 */
> -	hpte_slot_array = *(char **)(pmdp + PTRS_PER_PMD);
> +	hpte_slot_array = get_hpte_slot_array(pmdp);
>  
>  	valid = hpte_valid(hpte_slot_array, index);
>  	if (valid) {
> diff --git a/arch/powerpc/mm/pgtable_64.c b/arch/powerpc/mm/pgtable_64.c
> index 8dd7c83..19d6734 100644
> --- a/arch/powerpc/mm/pgtable_64.c
> +++ b/arch/powerpc/mm/pgtable_64.c
> @@ -701,11 +701,7 @@ void hpte_do_hugepage_flush(struct mm_struct *mm, unsigned long addr,
>  	 * Flush all the hptes mapping this hugepage
>  	 */
>  	s_addr = addr & HPAGE_PMD_MASK;
> -	/*
> -	 * The hpte hindex are stored in the pgtable whose address is in the
> -	 * second half of the PMD
> -	 */
> -	hpte_slot_array = *(char **)(pmdp + PTRS_PER_PMD);
> +	hpte_slot_array = get_hpte_slot_array(pmdp);
>  	/*
>  	 * IF we try to do a HUGE PTE update after a withdraw is done.
>  	 * we will find the below NULL. This happens when we do
> -- 
> 1.8.1.2
> 
> _______________________________________________
> Linuxppc-dev mailing list
> Linuxppc-dev@lists.ozlabs.org
> https://lists.ozlabs.org/listinfo/linuxppc-dev
>
Aneesh Kumar K.V - May 20, 2013, 4:27 a.m.
Michael Neuling <mikey@neuling.org> writes:

> Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> wrote:
>
>> From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
>> 
>> We need to use smb_rmb when looking at hpte slot array. Otherwise we could
>> reorder the hpte_slot array load bfore even we marked the pmd trans huge.
>
> Does this need to go back into the stable series?
>

No the changes are not yet upstream. hpte_slot_array changes are in the
THP series. I didn't want to post the entire series again with the 
above two patches. 

-aneesh
Benjamin Herrenschmidt - May 20, 2013, 6:28 a.m.
On Mon, 2013-05-20 at 09:57 +0530, Aneesh Kumar K.V wrote:
> Michael Neuling <mikey@neuling.org> writes:
> 
> > Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> wrote:
> >
> >> From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
> >> 
> >> We need to use smb_rmb when looking at hpte slot array. Otherwise we could
> >> reorder the hpte_slot array load bfore even we marked the pmd trans huge.
> >
> > Does this need to go back into the stable series?
> >
> 
> No the changes are not yet upstream. hpte_slot_array changes are in the
> THP series. I didn't want to post the entire series again with the 
> above two patches. 

Note that any patch that adds such a rmb should have a very clear
description of what is the corresponding wmb. It's almost never right to
have one without the other. IE. What is it that you are ordering
against.

Cheers,
Ben.
Aneesh Kumar K.V - May 20, 2013, 9:26 a.m.
Benjamin Herrenschmidt <benh@kernel.crashing.org> writes:

> On Mon, 2013-05-20 at 09:57 +0530, Aneesh Kumar K.V wrote:
>> Michael Neuling <mikey@neuling.org> writes:
>> 
>> > Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> wrote:
>> >
>> >> From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
>> >> 
>> >> We need to use smb_rmb when looking at hpte slot array. Otherwise we could
>> >> reorder the hpte_slot array load bfore even we marked the pmd trans huge.
>> >
>> > Does this need to go back into the stable series?
>> >
>> 
>> No the changes are not yet upstream. hpte_slot_array changes are in the
>> THP series. I didn't want to post the entire series again with the 
>> above two patches. 
>
> Note that any patch that adds such a rmb should have a very clear
> description of what is the corresponding wmb. It's almost never right to
> have one without the other. IE. What is it that you are ordering
> against.

Will update the commit message. Some of that smb_wmb() are in the
core THP, and the others in ppc64 code. Will update with more info.

When we deposit pgtable we use pgtable_trans_huge_deposit that calls
related smb_wmb(). When we read pgtable via pgtable_trans_huge_withdraw,
core THP code does the related smb_wmb() after setting the right
PTE information in the pgtable.

-aneesh

Patch

diff --git a/arch/powerpc/include/asm/pgtable-ppc64.h b/arch/powerpc/include/asm/pgtable-ppc64.h
index d046289..46db094 100644
--- a/arch/powerpc/include/asm/pgtable-ppc64.h
+++ b/arch/powerpc/include/asm/pgtable-ppc64.h
@@ -10,6 +10,7 @@ 
 #else
 #include <asm/pgtable-ppc64-4k.h>
 #endif
+#include <asm/barrier.h>
 
 #define FIRST_USER_ADDRESS	0
 
@@ -393,6 +394,20 @@  static inline void mark_hpte_slot_valid(unsigned char *hpte_slot_array,
 	hpte_slot_array[index] = hidx << 4 | 0x1 << 3;
 }
 
+static inline char *get_hpte_slot_array(pmd_t *pmdp)
+{
+	/*
+	 * The hpte hindex is stored in the pgtable whose address is in the
+	 * second half of the PMD
+	 *
+	 * Order this load with the test for pmd_trans_huge in the caller
+	 */
+	smp_rmb();
+	return *(char **)(pmdp + PTRS_PER_PMD);
+
+
+}
+
 extern void hpte_do_hugepage_flush(struct mm_struct *mm, unsigned long addr,
 				   pmd_t *pmdp);
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
diff --git a/arch/powerpc/mm/hugepage-hash64.c b/arch/powerpc/mm/hugepage-hash64.c
index e430766..c3ba3d5 100644
--- a/arch/powerpc/mm/hugepage-hash64.c
+++ b/arch/powerpc/mm/hugepage-hash64.c
@@ -84,11 +84,7 @@  int __hash_page_thp(unsigned long ea, unsigned long access, unsigned long vsid,
 
 	vpn = hpt_vpn(ea, vsid, ssize);
 	hash = hpt_hash(vpn, shift, ssize);
-	/*
-	 * The hpte hindex are stored in the pgtable whose address is in the
-	 * second half of the PMD
-	 */
-	hpte_slot_array = *(char **)(pmdp + PTRS_PER_PMD);
+	hpte_slot_array = get_hpte_slot_array(pmdp);
 
 	valid = hpte_valid(hpte_slot_array, index);
 	if (valid) {
diff --git a/arch/powerpc/mm/pgtable_64.c b/arch/powerpc/mm/pgtable_64.c
index 8dd7c83..19d6734 100644
--- a/arch/powerpc/mm/pgtable_64.c
+++ b/arch/powerpc/mm/pgtable_64.c
@@ -701,11 +701,7 @@  void hpte_do_hugepage_flush(struct mm_struct *mm, unsigned long addr,
 	 * Flush all the hptes mapping this hugepage
 	 */
 	s_addr = addr & HPAGE_PMD_MASK;
-	/*
-	 * The hpte hindex are stored in the pgtable whose address is in the
-	 * second half of the PMD
-	 */
-	hpte_slot_array = *(char **)(pmdp + PTRS_PER_PMD);
+	hpte_slot_array = get_hpte_slot_array(pmdp);
 	/*
 	 * IF we try to do a HUGE PTE update after a withdraw is done.
 	 * we will find the below NULL. This happens when we do