diff mbox series

[v6,01/11] asm-generic/pgtable: Adds generic functions to track lockless pgtable walks

Message ID 20200206030900.147032-2-leonardo@linux.ibm.com
State Not Applicable
Headers show
Series Introduces new functions for tracking lockless pagetable walks | expand

Commit Message

Leonardo Bras Feb. 6, 2020, 3:08 a.m. UTC
It's necessary to track lockless pagetable walks, in order to avoid doing
THP splitting/collapsing during them.

The default solution is to disable irq before lockless pagetable walks and
enable it after it's finished.

On code, this means you can find local_irq_disable() and local_irq_enable()
around some pieces of code, usually without comments on why it is needed.

This patch proposes a set of generic functions to be called before starting
and after finishing a lockless pagetable walk. It is supposed to make clear
that a lockless pagetable walk happens there, and also carries information
on why the irq disable/enable is needed.

begin_lockless_pgtbl_walk()
        Insert before starting any lockless pgtable walk
end_lockless_pgtbl_walk()
        Insert after the end of any lockless pgtable walk
        (Mostly after the ptep is last used)

A memory barrier was also added just to make sure there is no speculative
read outside the interrupt disabled area. Other than that, it is not
supposed to have any change of behavior from current code.

It is planned to allow arch-specific versions, so that additional steps can
be added while keeping the code clean.

Signed-off-by: Leonardo Bras <leonardo@linux.ibm.com>
---
 include/asm-generic/pgtable.h | 51 +++++++++++++++++++++++++++++++++++
 1 file changed, 51 insertions(+)

Comments

Christophe Leroy Feb. 6, 2020, 5:54 a.m. UTC | #1
Le 06/02/2020 à 04:08, Leonardo Bras a écrit :
> It's necessary to track lockless pagetable walks, in order to avoid doing
> THP splitting/collapsing during them.
> 
> The default solution is to disable irq before lockless pagetable walks and
> enable it after it's finished.
> 
> On code, this means you can find local_irq_disable() and local_irq_enable()
> around some pieces of code, usually without comments on why it is needed.
> 
> This patch proposes a set of generic functions to be called before starting
> and after finishing a lockless pagetable walk. It is supposed to make clear
> that a lockless pagetable walk happens there, and also carries information
> on why the irq disable/enable is needed.
> 
> begin_lockless_pgtbl_walk()
>          Insert before starting any lockless pgtable walk
> end_lockless_pgtbl_walk()
>          Insert after the end of any lockless pgtable walk
>          (Mostly after the ptep is last used)
> 
> A memory barrier was also added just to make sure there is no speculative
> read outside the interrupt disabled area. Other than that, it is not
> supposed to have any change of behavior from current code.

Is that speculative barrier necessary for all architectures ? Does it 
impact performance ? Shouldn't this be another patch ?

> 
> It is planned to allow arch-specific versions, so that additional steps can
> be added while keeping the code clean.
> 
> Signed-off-by: Leonardo Bras <leonardo@linux.ibm.com>
> ---
>   include/asm-generic/pgtable.h | 51 +++++++++++++++++++++++++++++++++++
>   1 file changed, 51 insertions(+)
> 
> diff --git a/include/asm-generic/pgtable.h b/include/asm-generic/pgtable.h
> index e2e2bef07dd2..8d368d3c0974 100644
> --- a/include/asm-generic/pgtable.h
> +++ b/include/asm-generic/pgtable.h
> @@ -1222,6 +1222,57 @@ static inline bool arch_has_pfn_modify_check(void)
>   #endif
>   #endif
>   
> +#ifndef __HAVE_ARCH_LOCKLESS_PGTBL_WALK_CONTROL
> +/*
> + * begin_lockless_pgtbl_walk: Must be inserted before a function call that does
> + *   lockless pagetable walks, such as __find_linux_pte()
> + */
> +static inline
> +unsigned long begin_lockless_pgtbl_walk(void)

What about keeping the same syntax as local_irq_save(), something like:

#define begin_lockless_pgtbl_walk(flags) \
do {
	local_irq_save(flags);
	smp_mb();
} while (0)

> +{
> +	unsigned long irq_mask;
> +
> +	/*
> +	 * Interrupts must be disabled during the lockless page table walk.
> +	 * That's because the deleting or splitting involves flushing TLBs,
> +	 * which in turn issues interrupts, that will block when disabled.
> +	 */
> +	local_irq_save(irq_mask);
> +
> +	/*
> +	 * This memory barrier pairs with any code that is either trying to
> +	 * delete page tables, or split huge pages. Without this barrier,
> +	 * the page tables could be read speculatively outside of interrupt
> +	 * disabling.
> +	 */
> +	smp_mb();
> +
> +	return irq_mask;
> +}
> +
> +/*
> + * end_lockless_pgtbl_walk: Must be inserted after the last use of a pointer
> + *   returned by a lockless pagetable walk, such as __find_linux_pte()
> + */
> +static inline void end_lockless_pgtbl_walk(unsigned long irq_mask)

Same

#define end_lockless_pgtbl_walk(flags) \
do {
	smp_mb();
	local_irq_restore(flags);
} while (0);

> +{
> +	/*
> +	 * This memory barrier pairs with any code that is either trying to
> +	 * delete page tables, or split huge pages. Without this barrier,
> +	 * the page tables could be read speculatively outside of interrupt
> +	 * disabling.
> +	 */
> +	smp_mb();
> +
> +	/*
> +	 * Interrupts must be disabled during the lockless page table walk.
> +	 * That's because the deleting or splitting involves flushing TLBs,
> +	 * which in turn issues interrupts, that will block when disabled.
> +	 */
> +	local_irq_restore(irq_mask);
> +}
> +#endif
> +
>   /*
>    * On some architectures it depends on the mm if the p4d/pud or pmd
>    * layer of the page table hierarchy is folded or not.
> 

Christophe
Leonardo Bras Feb. 7, 2020, 2:19 a.m. UTC | #2
Hello Christophe, thanks for the feedback!

On Thu, 2020-02-06 at 06:54 +0100, Christophe Leroy wrote:
> > A memory barrier was also added just to make sure there is no speculative
> > read outside the interrupt disabled area. Other than that, it is not
> > supposed to have any change of behavior from current code.
> 
> Is that speculative barrier necessary for all architectures ? Does it 
> impact performance ? Shouldn't this be another patch ?

It makes sense, better keep the code as much as possible as it was. If
any arch finds this barrier needed, it can implement it's own version
of this function (or another patch to add this to generic, if proved to
be needed in every arch).

> > +#ifndef __HAVE_ARCH_LOCKLESS_PGTBL_WALK_CONTROL
> > +/*
> > + * begin_lockless_pgtbl_walk: Must be inserted before a function call that does
> > + *   lockless pagetable walks, such as __find_linux_pte()
> > + */
> > +static inline
> > +unsigned long begin_lockless_pgtbl_walk(void)
> 
> What about keeping the same syntax as local_irq_save(), something like:
> 
> #define begin_lockless_pgtbl_walk(flags) \
> do {
> 	local_irq_save(flags);
> 	smp_mb();
> } while (0)
> 

Makes sense. But wouldn't inlining have the same code output? 

Best regards,
Leonardo Bras
kernel test robot Feb. 7, 2020, 5:39 a.m. UTC | #3
Hi Leonardo,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on powerpc/next]
[also build test ERROR on asm-generic/master paulus-powerpc/kvm-ppc-next linus/master v5.5 next-20200207]
[cannot apply to kvm-ppc/kvm-ppc-next]
[if your patch is applied to the wrong git tree, please drop us a note to help
improve the system. BTW, we also suggest to use '--base' option to specify the
base tree in git format-patch, please see https://stackoverflow.com/a/37406982]

url:    https://github.com/0day-ci/linux/commits/Leonardo-Bras/Introduces-new-functions-for-tracking-lockless-pagetable-walks/20200207-071035
base:   https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git next
config: xtensa-common_defconfig (attached as .config)
compiler: xtensa-linux-gcc (GCC) 7.5.0
reproduce:
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        GCC_VERSION=7.5.0 make.cross ARCH=xtensa 

If you fix the issue, kindly add following tag
Reported-by: kbuild test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

   include/asm-generic/pgtable.h: Assembler messages:
   include/asm-generic/pgtable.h:1230: Error: unknown opcode or format name 'static'
   include/asm-generic/pgtable.h:1231: Error: unknown opcode or format name 'unsigned'
   include/asm-generic/pgtable.h:1233: Error: unknown opcode or format name 'unsigned'
   include/asm-generic/pgtable.h:1240: Error: unknown opcode or format name 'local_irq_save'
   include/asm-generic/pgtable.h:1248: Error: unknown opcode or format name 'smp_mb'
   include/asm-generic/pgtable.h:1250: Error: unknown opcode or format name 'return'
>> include/asm-generic/pgtable.h:1251: Error: couldn't find a valid instruction format
       ops were: 
   include/asm-generic/pgtable.h:1257: Error: unknown opcode or format name 'static'
   include/asm-generic/pgtable.h:1265: Error: unknown opcode or format name 'smp_mb'
   include/asm-generic/pgtable.h:1272: Error: unknown opcode or format name 'local_irq_restore'
   include/asm-generic/pgtable.h:1273: Error: couldn't find a valid instruction format
       ops were: 

vim +1251 include/asm-generic/pgtable.h

  1224	
  1225	#ifndef __HAVE_ARCH_LOCKLESS_PGTBL_WALK_CONTROL
  1226	/*
  1227	 * begin_lockless_pgtbl_walk: Must be inserted before a function call that does
  1228	 *   lockless pagetable walks, such as __find_linux_pte()
  1229	 */
  1230	static inline
  1231	unsigned long begin_lockless_pgtbl_walk(void)
  1232	{
  1233		unsigned long irq_mask;
  1234	
  1235		/*
  1236		 * Interrupts must be disabled during the lockless page table walk.
  1237		 * That's because the deleting or splitting involves flushing TLBs,
  1238		 * which in turn issues interrupts, that will block when disabled.
  1239		 */
  1240		local_irq_save(irq_mask);
  1241	
  1242		/*
  1243		 * This memory barrier pairs with any code that is either trying to
  1244		 * delete page tables, or split huge pages. Without this barrier,
  1245		 * the page tables could be read speculatively outside of interrupt
  1246		 * disabling.
  1247		 */
  1248		smp_mb();
  1249	
  1250		return irq_mask;
> 1251	}
  1252	

---
0-DAY kernel test infrastructure                 Open Source Technology Center
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org Intel Corporation
diff mbox series

Patch

diff --git a/include/asm-generic/pgtable.h b/include/asm-generic/pgtable.h
index e2e2bef07dd2..8d368d3c0974 100644
--- a/include/asm-generic/pgtable.h
+++ b/include/asm-generic/pgtable.h
@@ -1222,6 +1222,57 @@  static inline bool arch_has_pfn_modify_check(void)
 #endif
 #endif
 
+#ifndef __HAVE_ARCH_LOCKLESS_PGTBL_WALK_CONTROL
+/*
+ * begin_lockless_pgtbl_walk: Must be inserted before a function call that does
+ *   lockless pagetable walks, such as __find_linux_pte()
+ */
+static inline
+unsigned long begin_lockless_pgtbl_walk(void)
+{
+	unsigned long irq_mask;
+
+	/*
+	 * Interrupts must be disabled during the lockless page table walk.
+	 * That's because the deleting or splitting involves flushing TLBs,
+	 * which in turn issues interrupts, that will block when disabled.
+	 */
+	local_irq_save(irq_mask);
+
+	/*
+	 * This memory barrier pairs with any code that is either trying to
+	 * delete page tables, or split huge pages. Without this barrier,
+	 * the page tables could be read speculatively outside of interrupt
+	 * disabling.
+	 */
+	smp_mb();
+
+	return irq_mask;
+}
+
+/*
+ * end_lockless_pgtbl_walk: Must be inserted after the last use of a pointer
+ *   returned by a lockless pagetable walk, such as __find_linux_pte()
+ */
+static inline void end_lockless_pgtbl_walk(unsigned long irq_mask)
+{
+	/*
+	 * This memory barrier pairs with any code that is either trying to
+	 * delete page tables, or split huge pages. Without this barrier,
+	 * the page tables could be read speculatively outside of interrupt
+	 * disabling.
+	 */
+	smp_mb();
+
+	/*
+	 * Interrupts must be disabled during the lockless page table walk.
+	 * That's because the deleting or splitting involves flushing TLBs,
+	 * which in turn issues interrupts, that will block when disabled.
+	 */
+	local_irq_restore(irq_mask);
+}
+#endif
+
 /*
  * On some architectures it depends on the mm if the p4d/pud or pmd
  * layer of the page table hierarchy is folded or not.