[v6,10/11] powerpc/mm: Adds counting method to track lockless pagetable walks

Message ID 20200206030900.147032-11-leonardo@linux.ibm.com
State New
Series
  • Introduces new functions for tracking lockless pagetable walks

Commit Message

Leonardo Bras Feb. 6, 2020, 3:08 a.m. UTC
Implements an additional feature to track lockless pagetable walks,
using a per-cpu counter: lockless_pgtbl_walk_counter.

Before a lockless pagetable walk, preemption is disabled and the
current cpu's counter is incremented.
When the lockless pagetable walk finishes, the current cpu's counter
is decremented and preemption is re-enabled.

With that, it is possible to know on which cpus lockless pagetable
walks are in progress, and to optimize serialize_against_pte_lookup().

Implementation notes:
- Each counter can be changed only by its own CPU
- It makes use of the original memory barrier in the functions
- Any counter can be read by any CPU

Since it uses neither locks nor atomic variables, the impact on the
lockless pagetable walk is intended to be minimal.

Signed-off-by: Leonardo Bras <leonardo@linux.ibm.com>
---
 arch/powerpc/mm/book3s64/pgtable.c | 18 ++++++++++++++++++
 1 file changed, 18 insertions(+)
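
The counting scheme described in the commit message can be modelled in plain userspace C. This is only a sketch under stated assumptions, not the kernel code: the per-cpu variable is replaced by an ordinary array indexed by a CPU number, MODEL_NR_CPUS and all model_* names are hypothetical, and memory barriers and preemption are not modelled. It shows how a reader of the counters could work out which CPUs currently have a walk in progress, which is the information serialize_against_pte_lookup() could use.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_NR_CPUS 8 /* hypothetical CPU count, for illustration only */

/* One slot per CPU; stands in for the per-cpu lockless_pgtbl_walk_counter. */
static int model_walk_counter[MODEL_NR_CPUS];

/* Called by 'cpu' when it starts a lockless walk; only that CPU writes its slot. */
void model_begin_walk(int cpu)
{
	model_walk_counter[cpu]++;
}

/* Called by the same 'cpu' when the walk finishes. */
void model_end_walk(int cpu)
{
	assert(model_walk_counter[cpu] > 0);
	model_walk_counter[cpu]--;
}

/* Any CPU may read every slot: returns a bitmask of CPUs with a walk in progress. */
uint32_t model_cpus_in_walk(void)
{
	uint32_t mask = 0;

	for (int cpu = 0; cpu < MODEL_NR_CPUS; cpu++)
		if (model_walk_counter[cpu] != 0)
			mask |= UINT32_C(1) << cpu;
	return mask;
}
```

A counter rather than a flag is used so that nested walks on the same CPU keep that CPU marked until the outermost walk ends.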

Comments

Christophe Leroy Feb. 6, 2020, 6:23 a.m. UTC | #1
On 06/02/2020 at 04:08, Leonardo Bras wrote:
> Implements an additional feature to track lockless pagetable walks,
> using a per-cpu counter: lockless_pgtbl_walk_counter.
> 
> Before a lockless pagetable walk, preemption is disabled and the
> current cpu's counter is incremented.
> When the lockless pagetable walk finishes, the current cpu's counter
> is decremented and preemption is re-enabled.
> 
> With that, it is possible to know on which cpus lockless pagetable
> walks are in progress, and to optimize serialize_against_pte_lookup().
> 
> Implementation notes:
> - Each counter can be changed only by its own CPU
> - It makes use of the original memory barrier in the functions
> - Any counter can be read by any CPU
> 
> Since it uses neither locks nor atomic variables, the impact on the
> lockless pagetable walk is intended to be minimal.

Atomic variables have a lot less impact than preempt_enable/disable.

preempt_disable forces a rescheduling, so it really has an impact. Why not
use atomic variables instead?

Christophe

> 
> Signed-off-by: Leonardo Bras <leonardo@linux.ibm.com>
> ---
>   arch/powerpc/mm/book3s64/pgtable.c | 18 ++++++++++++++++++
>   1 file changed, 18 insertions(+)
> 
> diff --git a/arch/powerpc/mm/book3s64/pgtable.c b/arch/powerpc/mm/book3s64/pgtable.c
> index 535613030363..bb138b628f86 100644
> --- a/arch/powerpc/mm/book3s64/pgtable.c
> +++ b/arch/powerpc/mm/book3s64/pgtable.c
> @@ -83,6 +83,7 @@ static void do_nothing(void *unused)
>   
>   }
>   
> +static DEFINE_PER_CPU(int, lockless_pgtbl_walk_counter);
>   /*
>    * Serialize against find_current_mm_pte which does lock-less
>    * lookup in page tables with local interrupts disabled. For huge pages
> @@ -120,6 +121,15 @@ unsigned long __begin_lockless_pgtbl_walk(bool disable_irq)
>   	if (disable_irq)
>   		local_irq_save(irq_mask);
>   
> +	/*
> +	 * Counts this instance of lockless pagetable walk for this cpu.
> +	 * Disables preempt to make sure there is no cpu change between
> +	 * begin/end lockless pagetable walk, so that percpu counting
> +	 * works fine.
> +	 */
> +	preempt_disable();
> +	(*this_cpu_ptr(&lockless_pgtbl_walk_counter))++;
> +
>   	/*
>   	 * This memory barrier pairs with any code that is either trying to
>   	 * delete page tables, or split huge pages. Without this barrier,
> @@ -158,6 +168,14 @@ inline void __end_lockless_pgtbl_walk(unsigned long irq_mask, bool enable_irq)
>   	 */
>   	smp_mb();
>   
> +	/*
> +	 * Removes this instance of lockless pagetable walk for this cpu.
> +	 * Enables preempt only after end lockless pagetable walk,
> +	 * so that percpu counting works fine.
> +	 */
> +	(*this_cpu_ptr(&lockless_pgtbl_walk_counter))--;
> +	preempt_enable();
> +
>   	/*
>   	 * Interrupts must be disabled during the lockless page table walk.
>   	 * That's because the deleting or splitting involves flushing TLBs,
>
Leonardo Bras Feb. 7, 2020, 1:56 a.m. UTC | #2
Hello Christophe, thanks for the feedback!

On Thu, 2020-02-06 at 07:23 +0100, Christophe Leroy wrote:
> > Since it uses neither locks nor atomic variables, the impact on the
> > lockless pagetable walk is intended to be minimal.
> 
> Atomic variables have a lot less impact than preempt_enable/disable.
> 
> preempt_disable forces a rescheduling, so it really has an impact. Why not
> use atomic variables instead?

In fact, v5 of this patch used atomic variables. However, that seemed
to cause contention on a single exclusive cacheline, and it performed
no better than locking.
(discussion here: http://patchwork.ozlabs.org/patch/1171012/)
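
A minimal userspace sketch of the layout difference, under assumptions that are not confirmed by this thread: the atomic approach is modelled as one counter shared by all CPUs (my reading of "a single exclusive cacheline"), the per-cpu approach is modelled as an array with one cacheline-sized slot per CPU, and LAYOUT_CACHELINE is assumed to be 128 bytes. All names are hypothetical. In the kernel, DEFINE_PER_CPU places each CPU's copy in that CPU's own per-cpu area rather than in a padded array, so the padding here only imitates that separation.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdatomic.h>
#include <stdint.h>

#define LAYOUT_NR_CPUS   4   /* hypothetical CPU count */
#define LAYOUT_CACHELINE 128 /* assumed cacheline size in bytes */

/* Shared-counter model: every CPU does a read-modify-write on the same line. */
static atomic_int shared_walk_counter;

void shared_begin_walk(void) { atomic_fetch_add(&shared_walk_counter, 1); }
void shared_end_walk(void)   { atomic_fetch_sub(&shared_walk_counter, 1); }
int  shared_walks(void)      { return atomic_load(&shared_walk_counter); }

/* Per-cpu model: each CPU owns one slot, padded out to a full cacheline. */
struct percpu_slot {
	alignas(LAYOUT_CACHELINE) int counter;
};

static struct percpu_slot percpu_walk_counter[LAYOUT_NR_CPUS];

/* Plain increments: only the owning CPU ever writes its own slot. */
void percpu_begin_walk(int cpu) { percpu_walk_counter[cpu].counter++; }
void percpu_end_walk(int cpu)   { percpu_walk_counter[cpu].counter--; }

/* Returns 1 if any two slots fall on the same cacheline, 0 otherwise. */
int percpu_slots_share_a_line(void)
{
	for (int i = 0; i < LAYOUT_NR_CPUS; i++)
		for (int j = i + 1; j < LAYOUT_NR_CPUS; j++) {
			uintptr_t a = (uintptr_t)&percpu_walk_counter[i].counter;
			uintptr_t b = (uintptr_t)&percpu_walk_counter[j].counter;

			if (a / LAYOUT_CACHELINE == b / LAYOUT_CACHELINE)
				return 1;
		}
	return 0;
}
```

With the shared counter, each begin/end has to take the line exclusively away from whichever CPU touched it last; with one slot per CPU, a writer never competes with another writer for its line.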

When I try to understand the effect of preempt_disable(), all I can
see is a barrier() and possibly a preempt_count_inc(), which updates a
member of the current thread struct if CONFIG_PREEMPT_COUNT is enabled.

If CONFIG_PREEMPTION is also enabled, preempt_enable() can run a
__preempt_schedule() on unlikely(__preempt_count_dec_and_test()).

On most available configs, CONFIG_PREEMPTION is not set; it is replaced
either by CONFIG_PREEMPT_NONE (kernel defconfigs) or by
CONFIG_PREEMPT_VOLUNTARY (most supported distros). With that, most
probably CONFIG_PREEMPT_COUNT will also not be set, and
preempt_{en,dis}able() each reduce to a barrier().
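
The behaviour described above can be put into a small userspace model. This is a simplified sketch, not the kernel macros: the two config options become runtime flags in a hypothetical struct preempt_model, the compiler barrier is left out because it has no runtime effect to model, and __preempt_schedule() is represented by a call counter. It assumes that cfg_preemption is only ever set together with cfg_preempt_count.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model state; none of these names exist in the kernel. */
struct preempt_model {
	bool cfg_preempt_count; /* stands in for CONFIG_PREEMPT_COUNT */
	bool cfg_preemption;    /* stands in for CONFIG_PREEMPTION */
	bool need_resched;      /* a reschedule became pending meanwhile */
	int  preempt_count;     /* models the per-thread preempt count */
	int  schedule_calls;    /* times the model "ran" __preempt_schedule() */
};

void model_preempt_disable(struct preempt_model *m)
{
	/* Without the count config, only a compiler barrier remains. */
	if (m->cfg_preempt_count)
		m->preempt_count++;
}

void model_preempt_enable(struct preempt_model *m)
{
	if (!m->cfg_preempt_count)
		return; /* barrier only, nothing to undo */

	m->preempt_count--;

	/* Only a preemptible kernel may reschedule here, and only when the
	 * count drops back to zero with a reschedule pending. */
	if (m->cfg_preemption && m->preempt_count == 0 && m->need_resched) {
		m->need_resched = false;
		m->schedule_calls++;
	}
}
```

In the model, the two flags off gives a pair of no-ops around the counter update, while both flags on adds one increment, one decrement and a rarely taken branch.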

Using the preempt_disable approach, I intend to get better performance
for the most common cases.

What do you think of it?

I am still new to this subject, and I am still trying to better
understand how it works. If you notice something I am missing, please
let me know.

Best regards,
Leonardo Bras

Patch

diff --git a/arch/powerpc/mm/book3s64/pgtable.c b/arch/powerpc/mm/book3s64/pgtable.c
index 535613030363..bb138b628f86 100644
--- a/arch/powerpc/mm/book3s64/pgtable.c
+++ b/arch/powerpc/mm/book3s64/pgtable.c
@@ -83,6 +83,7 @@  static void do_nothing(void *unused)
 
 }
 
+static DEFINE_PER_CPU(int, lockless_pgtbl_walk_counter);
 /*
  * Serialize against find_current_mm_pte which does lock-less
  * lookup in page tables with local interrupts disabled. For huge pages
@@ -120,6 +121,15 @@  unsigned long __begin_lockless_pgtbl_walk(bool disable_irq)
 	if (disable_irq)
 		local_irq_save(irq_mask);
 
+	/*
+	 * Counts this instance of lockless pagetable walk for this cpu.
+	 * Disables preempt to make sure there is no cpu change between
+	 * begin/end lockless pagetable walk, so that percpu counting
+	 * works fine.
+	 */
+	preempt_disable();
+	(*this_cpu_ptr(&lockless_pgtbl_walk_counter))++;
+
 	/*
 	 * This memory barrier pairs with any code that is either trying to
 	 * delete page tables, or split huge pages. Without this barrier,
@@ -158,6 +168,14 @@  inline void __end_lockless_pgtbl_walk(unsigned long irq_mask, bool enable_irq)
 	 */
 	smp_mb();
 
+	/*
+	 * Removes this instance of lockless pagetable walk for this cpu.
+	 * Enables preempt only after end lockless pagetable walk,
+	 * so that percpu counting works fine.
+	 */
+	(*this_cpu_ptr(&lockless_pgtbl_walk_counter))--;
+	preempt_enable();
+
 	/*
 	 * Interrupts must be disabled during the lockless page table walk.
 	 * That's because the deleting or splitting involves flushing TLBs,