diff mbox series

[RESEND,v4,10/11] powerpc: Protect patching_mm with a lock

Message ID 20210506043452.9674-11-cmr@linux.ibm.com (mailing list archive)
State Superseded
Headers show
Series Use per-CPU temporary mappings for patching | expand
Related show

Checks

Context Check Description
snowpatch_ozlabs/apply_patch success Successfully applied on branch powerpc/merge (7619d98e5041d5c25aba5428704dba6121237a9a)
snowpatch_ozlabs/checkpatch success total: 0 errors, 0 warnings, 0 checks, 184 lines checked
snowpatch_ozlabs/needsstable success Patch has no Fixes tags

Commit Message

Christopher M. Riedl May 6, 2021, 4:34 a.m. UTC
Powerpc allows for multiple CPUs to patch concurrently. When patching
with STRICT_KERNEL_RWX a single patching_mm is allocated for use by all
CPUs for the few times that patching occurs. Use a spinlock to protect
the patching_mm from concurrent use.

Modify patch_instruction() to acquire the lock, perform the patch op,
and then release the lock.

Also introduce {lock,unlock}_patching() along with
patch_instruction_unlocked() to avoid per-iteration lock overhead when
patch_instruction() is called in a loop. A follow-up patch converts some
uses of patch_instruction() to use patch_instruction_unlocked() instead.

Signed-off-by: Christopher M. Riedl <cmr@linux.ibm.com>

---

v4:  * New to series.
---
 arch/powerpc/include/asm/code-patching.h |  4 ++
 arch/powerpc/lib/code-patching.c         | 85 +++++++++++++++++++++---
 2 files changed, 79 insertions(+), 10 deletions(-)

Comments

Peter Zijlstra May 6, 2021, 10:51 a.m. UTC | #1
On Wed, May 05, 2021 at 11:34:51PM -0500, Christopher M. Riedl wrote:
> Powerpc allows for multiple CPUs to patch concurrently. When patching
> with STRICT_KERNEL_RWX a single patching_mm is allocated for use by all
> CPUs for the few times that patching occurs. Use a spinlock to protect
> the patching_mm from concurrent use.
> 
> Modify patch_instruction() to acquire the lock, perform the patch op,
> and then release the lock.
> 
> Also introduce {lock,unlock}_patching() along with
> patch_instruction_unlocked() to avoid per-iteration lock overhead when
> patch_instruction() is called in a loop. A follow-up patch converts some
> uses of patch_instruction() to use patch_instruction_unlocked() instead.

x86 uses text_mutex for all this, why not do the same?
Christopher M. Riedl May 7, 2021, 8:03 p.m. UTC | #2
On Thu May 6, 2021 at 5:51 AM CDT, Peter Zijlstra wrote:
> On Wed, May 05, 2021 at 11:34:51PM -0500, Christopher M. Riedl wrote:
> > Powerpc allows for multiple CPUs to patch concurrently. When patching
> > with STRICT_KERNEL_RWX a single patching_mm is allocated for use by all
> > CPUs for the few times that patching occurs. Use a spinlock to protect
> > the patching_mm from concurrent use.
> > 
> > Modify patch_instruction() to acquire the lock, perform the patch op,
> > and then release the lock.
> > 
> > Also introduce {lock,unlock}_patching() along with
> > patch_instruction_unlocked() to avoid per-iteration lock overhead when
> > patch_instruction() is called in a loop. A follow-up patch converts some
> > uses of patch_instruction() to use patch_instruction_unlocked() instead.
>
> x86 uses text_mutex for all this, why not do the same?

I wasn't entirely sure if there is a problem with potentially going to
sleep in some of the places where patch_instruction() is called - the
spinlock avoids that (hypothetical) problem.

I just tried switching to text_mutex and at least on a P9 machine the
series boots w/ the Hash and Radix MMUs (with some lockdep errors). I
can rework this in the next version to use text_mutex if I don't find
any new problems with more extensive testing. It does mean more changes
to use patch_instruction_unlocked() in kprobe/optprobe/ftace in
arch/powerpc since iirc those are called with text_mutex already held.

Thanks!
Chris R.
Peter Zijlstra May 7, 2021, 10:26 p.m. UTC | #3
On Fri, May 07, 2021 at 03:03:51PM -0500, Christopher M. Riedl wrote:
> On Thu May 6, 2021 at 5:51 AM CDT, Peter Zijlstra wrote:
> > On Wed, May 05, 2021 at 11:34:51PM -0500, Christopher M. Riedl wrote:
> > > Powerpc allows for multiple CPUs to patch concurrently. When patching
> > > with STRICT_KERNEL_RWX a single patching_mm is allocated for use by all
> > > CPUs for the few times that patching occurs. Use a spinlock to protect
> > > the patching_mm from concurrent use.
> > > 
> > > Modify patch_instruction() to acquire the lock, perform the patch op,
> > > and then release the lock.
> > > 
> > > Also introduce {lock,unlock}_patching() along with
> > > patch_instruction_unlocked() to avoid per-iteration lock overhead when
> > > patch_instruction() is called in a loop. A follow-up patch converts some
> > > uses of patch_instruction() to use patch_instruction_unlocked() instead.
> >
> > x86 uses text_mutex for all this, why not do the same?
> 
> I wasn't entirely sure if there is a problem with potentially going to
> sleep in some of the places where patch_instruction() is called - the
> spinlock avoids that (hypothetical) problem.

So I'm not saying you like have to do this; but I did wonder if there's
a reason not to, and given you didn't mention it, I had to ask.

> I just tried switching to text_mutex and at least on a P9 machine the
> series boots w/ the Hash and Radix MMUs (with some lockdep errors). I
> can rework this in the next version to use text_mutex if I don't find
> any new problems with more extensive testing. It does mean more changes
> to use patch_instruction_unlocked() in kprobe/optprobe/ftace in
> arch/powerpc since iirc those are called with text_mutex already held.

The x86 text_poke() has a lockdep_assert_held(&text_mutex) in to make
sure nobody 'forgets' :-)
diff mbox series

Patch

diff --git a/arch/powerpc/include/asm/code-patching.h b/arch/powerpc/include/asm/code-patching.h
index e51c81e4a9bda..2efa11b68cd8f 100644
--- a/arch/powerpc/include/asm/code-patching.h
+++ b/arch/powerpc/include/asm/code-patching.h
@@ -28,8 +28,12 @@  int create_branch(struct ppc_inst *instr, const struct ppc_inst *addr,
 int create_cond_branch(struct ppc_inst *instr, const struct ppc_inst *addr,
 		       unsigned long target, int flags);
 int patch_branch(struct ppc_inst *addr, unsigned long target, int flags);
+int patch_branch_unlocked(struct ppc_inst *addr, unsigned long target, int flags);
 int patch_instruction(struct ppc_inst *addr, struct ppc_inst instr);
+int patch_instruction_unlocked(struct ppc_inst *addr, struct ppc_inst instr);
 int raw_patch_instruction(struct ppc_inst *addr, struct ppc_inst instr);
+unsigned long lock_patching(void);
+void unlock_patching(unsigned long flags);
 
 static inline unsigned long patch_site_addr(s32 *site)
 {
diff --git a/arch/powerpc/lib/code-patching.c b/arch/powerpc/lib/code-patching.c
index 7e15abc09ec04..0a496bb52bbf4 100644
--- a/arch/powerpc/lib/code-patching.c
+++ b/arch/powerpc/lib/code-patching.c
@@ -52,13 +52,17 @@  int raw_patch_instruction(struct ppc_inst *addr, struct ppc_inst instr)
 
 #ifdef CONFIG_STRICT_KERNEL_RWX
 
+static DEFINE_SPINLOCK(patching_lock);
+
 struct temp_mm {
 	struct mm_struct *temp;
 	struct mm_struct *prev;
 	struct arch_hw_breakpoint brk[HBP_NUM_MAX];
+	spinlock_t *lock; /* protect access to the temporary mm */
 };
 
-static inline void init_temp_mm(struct temp_mm *temp_mm, struct mm_struct *mm)
+static inline void init_temp_mm(struct temp_mm *temp_mm, struct mm_struct *mm,
+				spinlock_t *lock)
 {
 	/* Do not preload SLB entries from the thread_info struct */
 	if (IS_ENABLED(CONFIG_PPC_BOOK3S_64) && !radix_enabled())
@@ -66,12 +70,14 @@  static inline void init_temp_mm(struct temp_mm *temp_mm, struct mm_struct *mm)
 
 	temp_mm->temp = mm;
 	temp_mm->prev = NULL;
+	temp_mm->lock = lock;
 	memset(&temp_mm->brk, 0, sizeof(temp_mm->brk));
 }
 
 static inline void use_temporary_mm(struct temp_mm *temp_mm)
 {
 	lockdep_assert_irqs_disabled();
+	lockdep_assert_held(temp_mm->lock);
 
 	temp_mm->prev = current->active_mm;
 	switch_mm_irqs_off(temp_mm->prev, temp_mm->temp, current);
@@ -93,11 +99,13 @@  static inline void use_temporary_mm(struct temp_mm *temp_mm)
 static inline void unuse_temporary_mm(struct temp_mm *temp_mm)
 {
 	lockdep_assert_irqs_disabled();
+	lockdep_assert_held(temp_mm->lock);
 
 	switch_mm_irqs_off(temp_mm->temp, temp_mm->prev, current);
 
 	/*
-	 * On book3s64 the active_cpus counter increments in
+	 * The temporary mm can only be in use on a single CPU at a time due to
+	 * the temp_mm->lock. On book3s64 the active_cpus counter increments in
 	 * switch_mm_irqs_off(). With the Hash MMU this counter affects if TLB
 	 * flushes are local. We have to manually decrement that counter here
 	 * along with removing our current CPU from the mm's cpumask so that in
@@ -230,7 +238,7 @@  static int map_patch(const void *addr, struct patch_mapping *patch_mapping)
 	pte = pte_mkdirty(pte);
 	set_pte_at(patching_mm, patching_addr, patch_mapping->ptep, pte);
 
-	init_temp_mm(&patch_mapping->temp_mm, patching_mm);
+	init_temp_mm(&patch_mapping->temp_mm, patching_mm, &patching_lock);
 	use_temporary_mm(&patch_mapping->temp_mm);
 
 	/*
@@ -258,7 +266,6 @@  static int do_patch_instruction(struct ppc_inst *addr, struct ppc_inst instr)
 {
 	int err;
 	struct ppc_inst *patch_addr = NULL;
-	unsigned long flags;
 	struct patch_mapping patch_mapping;
 
 	/*
@@ -269,11 +276,12 @@  static int do_patch_instruction(struct ppc_inst *addr, struct ppc_inst instr)
 	if (!patching_mm)
 		return raw_patch_instruction(addr, instr);
 
-	local_irq_save(flags);
+	lockdep_assert_held(&patching_lock);
+	lockdep_assert_irqs_disabled();
 
 	err = map_patch(addr, &patch_mapping);
 	if (err)
-		goto out;
+		return err;
 
 	patch_addr = (struct ppc_inst *)(patching_addr | offset_in_page(addr));
 
@@ -287,11 +295,33 @@  static int do_patch_instruction(struct ppc_inst *addr, struct ppc_inst instr)
 
 	WARN_ON(!ppc_inst_equal(ppc_inst_read(addr), instr));
 
-out:
-	local_irq_restore(flags);
-
 	return err;
 }
+
+unsigned long lock_patching(void)
+{
+	unsigned long flags;
+
+	/* We don't need the lock if we're not using the patching_mm. */
+	if (!patching_mm)
+		return 0;
+
+	spin_lock_irqsave(&patching_lock, flags);
+	return flags;
+}
+
+void unlock_patching(const unsigned long flags)
+{
+	/* We never held the lock if we're not using the patching_mm. */
+	if (!patching_mm)
+		return;
+
+	lockdep_assert_held(&patching_lock);
+	lockdep_assert_irqs_disabled();
+
+	spin_unlock_irqrestore(&patching_lock, flags);
+}
+
 #else /* !CONFIG_STRICT_KERNEL_RWX */
 
 static int do_patch_instruction(struct ppc_inst *addr, struct ppc_inst instr)
@@ -299,19 +329,46 @@  static int do_patch_instruction(struct ppc_inst *addr, struct ppc_inst instr)
 	return raw_patch_instruction(addr, instr);
 }
 
+unsigned long lock_patching(void)
+{
+	return 0;
+}
+
+void unlock_patching(const unsigned long flags) {}
+
 #endif /* CONFIG_STRICT_KERNEL_RWX */
 
 int patch_instruction(struct ppc_inst *addr, struct ppc_inst instr)
 {
+	int err;
+	unsigned long flags;
+
 	/* Make sure we aren't patching a freed init section */
 	if (init_mem_is_free && init_section_contains(addr, 4)) {
 		pr_debug("Skipping init section patching addr: 0x%px\n", addr);
 		return 0;
 	}
-	return do_patch_instruction(addr, instr);
+
+	flags = lock_patching();
+	err = do_patch_instruction(addr, instr);
+	unlock_patching(flags);
+
+	return err;
 }
 NOKPROBE_SYMBOL(patch_instruction);
 
+int patch_instruction_unlocked(struct ppc_inst *addr, struct ppc_inst instr)
+{
+	/* Make sure we aren't patching a freed init section */
+	if (init_mem_is_free && init_section_contains(addr, 4)) {
+		pr_debug("Skipping init section patching addr: 0x%p\n", addr);
+		return 0;
+	}
+
+	return do_patch_instruction(addr, instr);
+}
+NOKPROBE_SYMBOL(patch_instruction_unlocked);
+
 int patch_branch(struct ppc_inst *addr, unsigned long target, int flags)
 {
 	struct ppc_inst instr;
@@ -320,6 +377,14 @@  int patch_branch(struct ppc_inst *addr, unsigned long target, int flags)
 	return patch_instruction(addr, instr);
 }
 
+int patch_branch_unlocked(struct ppc_inst *addr, unsigned long target, int flags)
+{
+	struct ppc_inst instr;
+
+	create_branch(&instr, addr, target, flags);
+	return patch_instruction_unlocked(addr, instr);
+}
+
 bool is_offset_in_branch_range(long offset)
 {
 	/*