mbox series

[v3,0/4] shoot lazy tlbs

Message ID 20210601062303.3932513-1-npiggin@gmail.com (mailing list archive)
Headers show
Series shoot lazy tlbs | expand

Message

Nicholas Piggin June 1, 2021, 6:22 a.m. UTC
There haven't been objections to the series since last posting, this
is just a rebase and tidies up a few comments minor patch rearranging.

Thanks,
Nick

Nicholas Piggin (4):
  lazy tlb: introduce lazy mm refcount helper functions
  lazy tlb: allow lazy tlb mm switching to be configurable
  lazy tlb: shoot lazies, a non-refcounting lazy tlb option
  powerpc/64s: enable MMU_LAZY_TLB_SHOOTDOWN

 arch/Kconfig                         | 38 ++++++++++++
 arch/arm/mach-rpc/ecard.c            |  2 +-
 arch/powerpc/Kconfig                 |  1 +
 arch/powerpc/kernel/smp.c            |  2 +-
 arch/powerpc/mm/book3s64/radix_tlb.c |  4 +-
 fs/exec.c                            |  4 +-
 include/linux/sched/mm.h             | 20 +++++++
 kernel/cpu.c                         |  2 +-
 kernel/exit.c                        |  2 +-
 kernel/fork.c                        | 52 ++++++++++++++++
 kernel/kthread.c                     | 11 ++--
 kernel/sched/core.c                  | 88 ++++++++++++++++++++--------
 kernel/sched/sched.h                 |  4 +-
 13 files changed, 192 insertions(+), 38 deletions(-)

Comments

Andy Lutomirski June 4, 2021, 4:54 p.m. UTC | #1
On 5/31/21 11:22 PM, Nicholas Piggin wrote:
> There haven't been objections to the series since last posting, this
> is just a rebase and tidies up a few comments minor patch rearranging.
> 

I continue to object to having too many modes.  I like my more generic
improvements better.  Let me try to find some time to email again.
Andy Lutomirski June 4, 2021, 5:05 p.m. UTC | #2
On 6/4/21 9:54 AM, Andy Lutomirski wrote:
> On 5/31/21 11:22 PM, Nicholas Piggin wrote:
>> There haven't been objections to the series since last posting, this
>> is just a rebase and tidies up a few comments minor patch rearranging.
>>
> 
> I continue to object to having too many modes.  I like my more generic
> improvements better.  Let me try to find some time to email again.
> 

Specifically, this:

https://git.kernel.org/pub/scm/linux/kernel/git/luto/linux.git/commit/?h=x86/mm

I, or someone, needs to dust off my membarrier series before any of
these kinds of changes get made.  The barrier situation in the scheduler
is too confusing otherwise.
Nicholas Piggin June 5, 2021, 12:17 a.m. UTC | #3
Excerpts from Andy Lutomirski's message of June 5, 2021 3:05 am:
> On 6/4/21 9:54 AM, Andy Lutomirski wrote:
>> On 5/31/21 11:22 PM, Nicholas Piggin wrote:
>>> There haven't been objections to the series since last posting, this
>>> is just a rebase and tidies up a few comments minor patch rearranging.
>>>
>> 
>> I continue to object to having too many modes.  I like my more generic
>> improvements better.  Let me try to find some time to email again.
>> 
> 
> Specifically, this:
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/luto/linux.git/commit/?h=x86/mm

That's worse than what powerpc does with the shoot lazies code so 
we wouldn't use it anyway.

The fact is mm-cpumask and lazy mm is very architecture specific, so I 
don't really see that another "mode" is such a problem, it's for the 
most part "this is what powerpc does" -> "this is what powerpc does".
The only mode in the context switch is just "take a ref on the lazy mm"
or "don't take a ref". Surely that's not too onerous to add!?

Actually the bigger part of it is actually the no-lazy mmu mode which
is not yet used, I thought it was a neat little demonstrator of how code
works with/without lazy but I will get rid of that for submission.


> I, or someone, needs to dust off my membarrier series before any of
> these kinds of changes get made.  The barrier situation in the scheduler
> is too confusing otherwise.
> 

I disagree, I've disentangled the changes from membarrier stuff now, 
they can be done concurrently.

Thanks,
Nick
Nicholas Piggin June 5, 2021, 12:26 a.m. UTC | #4
Excerpts from Nicholas Piggin's message of June 5, 2021 10:17 am:
> Excerpts from Andy Lutomirski's message of June 5, 2021 3:05 am:
>> On 6/4/21 9:54 AM, Andy Lutomirski wrote:
>>> On 5/31/21 11:22 PM, Nicholas Piggin wrote:
>>>> There haven't been objections to the series since last posting, this
>>>> is just a rebase and tidies up a few comments minor patch rearranging.
>>>>
>>> 
>>> I continue to object to having too many modes.  I like my more generic
>>> improvements better.  Let me try to find some time to email again.
>>> 
>> 
>> Specifically, this:
>> 
>> https://git.kernel.org/pub/scm/linux/kernel/git/luto/linux.git/commit/?h=x86/mm
> 
> That's worse than what powerpc does with the shoot lazies code so 
> we wouldn't use it anyway.
> 
> The fact is mm-cpumask and lazy mm is very architecture specific, so I 
> don't really see that another "mode" is such a problem, it's for the 
> most part "this is what powerpc does" -> "this is what powerpc does".
> The only mode in the context switch is just "take a ref on the lazy mm"
> or "don't take a ref". Surely that's not too onerous to add!?
> 
> Actually the bigger part of it is actually the no-lazy mmu mode which
> is not yet used, I thought it was a neat little demonstrator of how code
> works with/without lazy but I will get rid of that for submission.

I admit that does add a bit more churn than necessary maybe that was
the main objection.

Here is the entire kernel/sched/core.c change after that is removed.
Pretty simple now. I'll resubmit.

Thanks,
Nick


diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index e359c76ea2e2..1be0b97e12ec 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -4171,7 +4171,7 @@ static struct rq *finish_task_switch(struct task_struct *prev)
 	__releases(rq->lock)
 {
 	struct rq *rq = this_rq();
-	struct mm_struct *mm = rq->prev_mm;
+	struct mm_struct *mm = NULL;
 	long prev_state;
 
 	/*
@@ -4190,7 +4190,10 @@ static struct rq *finish_task_switch(struct task_struct *prev)
 		      current->comm, current->pid, preempt_count()))
 		preempt_count_set(FORK_PREEMPT_COUNT);
 
-	rq->prev_mm = NULL;
+#ifdef CONFIG_MMU_LAZY_TLB_REFCOUNT
+	mm = rq->prev_lazy_mm;
+	rq->prev_lazy_mm = NULL;
+#endif
 
 	/*
 	 * A task struct has one reference for the use as "current".
@@ -4326,9 +4329,21 @@ context_switch(struct rq *rq, struct task_struct *prev,
 		switch_mm_irqs_off(prev->active_mm, next->mm, next);
 
 		if (!prev->mm) {                        // from kernel
-			/* will mmdrop_lazy_tlb() in finish_task_switch(). */
-			rq->prev_mm = prev->active_mm;
+#ifdef CONFIG_MMU_LAZY_TLB_REFCOUNT
+			/* Will mmdrop_lazy_tlb() in finish_task_switch(). */
+			rq->prev_lazy_mm = prev->active_mm;
 			prev->active_mm = NULL;
+#else
+			/*
+			 * Without MMU_LAZY_TLB_REFCOUNT there is no lazy
+			 * tracking (because no rq->prev_lazy_mm) in
+			 * finish_task_switch, so no mmdrop_lazy_tlb(),
+			 * so no memory barrier for membarrier (see the
+			 * membarrier comment in finish_task_switch()).
+			 * Do it here.
+			 */
+			smp_mb();
+#endif
 		}
 	}
 
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index a189bec13729..0729cf19a987 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -961,7 +961,9 @@ struct rq {
 	struct task_struct	*idle;
 	struct task_struct	*stop;
 	unsigned long		next_balance;
-	struct mm_struct	*prev_mm;
+#ifdef CONFIG_MMU_LAZY_TLB_REFCOUNT
+	struct mm_struct	*prev_lazy_mm;
+#endif
 
 	unsigned int		clock_update_flags;
 	u64			clock;
Nicholas Piggin June 5, 2021, 2:52 a.m. UTC | #5
Excerpts from Nicholas Piggin's message of June 5, 2021 10:26 am:
> Excerpts from Nicholas Piggin's message of June 5, 2021 10:17 am:
>> Excerpts from Andy Lutomirski's message of June 5, 2021 3:05 am:
>>> On 6/4/21 9:54 AM, Andy Lutomirski wrote:
>>>> On 5/31/21 11:22 PM, Nicholas Piggin wrote:
>>>>> There haven't been objections to the series since last posting, this
>>>>> is just a rebase and tidies up a few comments minor patch rearranging.
>>>>>
>>>> 
>>>> I continue to object to having too many modes.  I like my more generic
>>>> improvements better.  Let me try to find some time to email again.
>>>> 
>>> 
>>> Specifically, this:
>>> 
>>> https://git.kernel.org/pub/scm/linux/kernel/git/luto/linux.git/commit/?h=x86/mm
>> 
>> That's worse than what powerpc does with the shoot lazies code so 
>> we wouldn't use it anyway.
>> 
>> The fact is mm-cpumask and lazy mm is very architecture specific, so I 
>> don't really see that another "mode" is such a problem, it's for the 
>> most part "this is what powerpc does" -> "this is what powerpc does".
>> The only mode in the context switch is just "take a ref on the lazy mm"
>> or "don't take a ref". Surely that's not too onerous to add!?
>> 
>> Actually the bigger part of it is actually the no-lazy mmu mode which
>> is not yet used, I thought it was a neat little demonstrator of how code
>> works with/without lazy but I will get rid of that for submission.
> 
> I admit that does add a bit more churn than necessary maybe that was
> the main objection.
> 
> Here is the entire kernel/sched/core.c change after that is removed.
> Pretty simple now. I'll resubmit.

If it gives you some concerns about a great complex new mode, I'll
put it a different way -- all this allows is the arch to say that it
does not use lazy tlb mms beyond their refcounted lifetime, so there
is no need to refcount the lazy tlb reference.

That's all it is. One implementation of that is shoot lazies, and that
could be done entirely in arch/powerpc via destroy_context (I just put 
it in mm/ in case it is useful to others, but that's no real 
difference).

So you see it's really just about management of lazies, the refcounting
is just a bit on the side. And lazy management is highly arch specific,
x86 being one of the really different complex ones there including
very complex and unique interactions with membarrier ordering, so that
can't be a fair objection.

Thanks,
Nick