diff mbox

[05/17] mm: allow PF_MEMALLOC from softirq context

Message ID 1336657510-24378-6-git-send-email-mgorman@suse.de
State Changes Requested, archived
Delegated to: David Miller
Headers show

Commit Message

Mel Gorman May 10, 2012, 1:44 p.m. UTC
This is needed to allow network softirq packet processing to make
use of PF_MEMALLOC.

Currently softirq context cannot use PF_MEMALLOC due to it not being
associated with a task, and therefore not having task flags to fiddle
with - thus the gfp to alloc flag mapping ignores the task flags when
in interrupts (hard or soft) context.

Allowing softirqs to make use of PF_MEMALLOC therefore requires some
trickery.  We basically borrow the task flags from whatever process
happens to be preempted by the softirq.

So we modify the gfp to alloc flags mapping to not exclude task flags
in softirq context, and modify the softirq code to save, clear and
restore the PF_MEMALLOC flag.

The save and clear, ensures the preempted task's PF_MEMALLOC flag
doesn't leak into the softirq. The restore ensures a softirq's
PF_MEMALLOC flag cannot leak back into the preempted process.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Mel Gorman <mgorman@suse.de>
---
 include/linux/sched.h |    7 +++++++
 kernel/softirq.c      |    9 +++++++++
 mm/page_alloc.c       |    6 +++++-
 3 files changed, 21 insertions(+), 1 deletion(-)

Comments

David Miller May 11, 2012, 4:39 a.m. UTC | #1
From: Mel Gorman <mgorman@suse.de>
Date: Thu, 10 May 2012 14:44:58 +0100

> This is needed to allow network softirq packet processing to make
> use of PF_MEMALLOC.
> 
> Currently softirq context cannot use PF_MEMALLOC due to it not being
> associated with a task, and therefore not having task flags to fiddle
> with - thus the gfp to alloc flag mapping ignores the task flags when
> in interrupts (hard or soft) context.
> 
> Allowing softirqs to make use of PF_MEMALLOC therefore requires some
> trickery.  We basically borrow the task flags from whatever process
> happens to be preempted by the softirq.
> 
> So we modify the gfp to alloc flags mapping to not exclude task flags
> in softirq context, and modify the softirq code to save, clear and
> restore the PF_MEMALLOC flag.
> 
> The save and clear, ensures the preempted task's PF_MEMALLOC flag
> doesn't leak into the softirq. The restore ensures a softirq's
> PF_MEMALLOC flag cannot leak back into the preempted process.
> 
> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
> Signed-off-by: Mel Gorman <mgorman@suse.de>

We're now making changes to task->flags from both base and
softirq context, but with non-atomic operations and no other
kind of synchronization.

As far as I can tell, this has to be racy.

If this works via some magic combination of invariants, you
absolutely have to document this, verbosely.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Mel Gorman May 14, 2012, 10:02 a.m. UTC | #2
On Fri, May 11, 2012 at 12:39:51AM -0400, David Miller wrote:
> From: Mel Gorman <mgorman@suse.de>
> Date: Thu, 10 May 2012 14:44:58 +0100
> 
> > This is needed to allow network softirq packet processing to make
> > use of PF_MEMALLOC.
> > 
> > Currently softirq context cannot use PF_MEMALLOC due to it not being
> > associated with a task, and therefore not having task flags to fiddle
> > with - thus the gfp to alloc flag mapping ignores the task flags when
> > in interrupts (hard or soft) context.
> > 
> > Allowing softirqs to make use of PF_MEMALLOC therefore requires some
> > trickery.  We basically borrow the task flags from whatever process
> > happens to be preempted by the softirq.
> > 
> > So we modify the gfp to alloc flags mapping to not exclude task flags
> > in softirq context, and modify the softirq code to save, clear and
> > restore the PF_MEMALLOC flag.
> > 
> > The save and clear, ensures the preempted task's PF_MEMALLOC flag
> > doesn't leak into the softirq. The restore ensures a softirq's
> > PF_MEMALLOC flag cannot leak back into the preempted process.
> > 
> > Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
> > Signed-off-by: Mel Gorman <mgorman@suse.de>
> 
> We're now making changes to task->flags from both base and
> softirq context, but with non-atomic operations and no other
> kind of synchronization.
> 
> As far as I can tell, this has to be racy.
> 

I'm not seeing the race you are thinking of.

Softirqs can run on multiple CPUs sure but the same task should not be
	executing the same softirq code. Interrupts are disabled and the
	executing process cannot sleep in softirq context so the task flags
	cannot "leak" nor can they be concurrently modified.

Softirqs are not execued from hard interrupt context so there are no
	races with hardirqs.

If the softirq is deferred to ksoftirq then its flags may be used
	instead of a normal tasks but as the softirq cannot be preempted,
	the PF_MEMALLOC flag does not leak to other code by accident.

When __do_softirq() is finished, care is taken to restore the
	PF_MEMALLOC flag to the value when __do_softirq() started. They
	should not be accidentally clearing the flag.

I'm not seeing how current->flags can be modified while the softirq handler
is running in such a way that information is lost or misused. There
would be a problem if softirqs used GFP_KERNEL because the presense of
the PF_MEMALLOC flag would prevent the use of direct reclaim but softirqs
cannot use direct reclaim anyway.

> If this works via some magic combination of invariants, you
> absolutely have to document this, verbosely.

Did I miss a race you are thinking of or should I just add the above
explanation to the changelog?
Mel Gorman May 15, 2012, 1:07 p.m. UTC | #3
On Mon, May 14, 2012 at 11:02:29AM +0100, Mel Gorman wrote:
> Softirqs can run on multiple CPUs sure but the same task should not be
> 	executing the same softirq code. Interrupts are disabled and the
> 	executing process cannot sleep in softirq context so the task flags
> 	cannot "leak" nor can they be concurrently modified.
> 

This comment about hardirq is obviously wrong as __do_softirq() enables
interrupts and can be preempted by a hardirq. I've updated the changelog
now to include the following;

Softirqs can run on multiple CPUs sure but the same task should not be
        executing the same softirq code. Neither should the softirq
        handler be preempted by any other softirq handler so the flags
        should not leak to an unrelated softirq.

Softirqs re-enable hardware interrupts in __do_softirq() so can be
        preempted by hardware interrupts so PF_MEMALLOC is inherited
        by the hard IRQ. However, this is similar to a process in
        reclaim being preempted by a hardirq. While PF_MEMALLOC is
        set, gfp_to_alloc_flags() distinguishes between hard and
        soft irqs and avoids giving a hardirq the ALLOC_NO_WATERMARKS
        flag.

If the softirq is deferred to ksoftirq then its flags may be used
        instead of a normal tasks but as the softirq cannot be preempted,
        the PF_MEMALLOC flag does not leak to other code by accident.
diff mbox

Patch

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 81a173c..b5efaf4 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1913,6 +1913,13 @@  static inline void rcu_copy_process(struct task_struct *p)
 
 #endif
 
+static inline void tsk_restore_flags(struct task_struct *task,
+				unsigned long orig_flags, unsigned long flags)
+{
+	task->flags &= ~flags;
+	task->flags |= orig_flags & flags;
+}
+
 #ifdef CONFIG_SMP
 extern void do_set_cpus_allowed(struct task_struct *p,
 			       const struct cpumask *new_mask);
diff --git a/kernel/softirq.c b/kernel/softirq.c
index 671f959..b73e681 100644
--- a/kernel/softirq.c
+++ b/kernel/softirq.c
@@ -210,6 +210,14 @@  asmlinkage void __do_softirq(void)
 	__u32 pending;
 	int max_restart = MAX_SOFTIRQ_RESTART;
 	int cpu;
+	unsigned long old_flags = current->flags;
+
+	/*
+	 * Mask out PF_MEMALLOC s current task context is borrowed for the
+	 * softirq. A softirq handled such as network RX might set PF_MEMALLOC
+	 * again if the socket is related to swap
+	 */
+	current->flags &= ~PF_MEMALLOC;
 
 	pending = local_softirq_pending();
 	account_system_vtime(current);
@@ -265,6 +273,7 @@  restart:
 
 	account_system_vtime(current);
 	__local_bh_enable(SOFTIRQ_OFFSET);
+	tsk_restore_flags(current, old_flags, PF_MEMALLOC);
 }
 
 #ifndef __ARCH_HAS_DO_SOFTIRQ
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 6040520..3738596 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2211,7 +2211,11 @@  gfp_to_alloc_flags(gfp_t gfp_mask)
 	if (likely(!(gfp_mask & __GFP_NOMEMALLOC))) {
 		if (gfp_mask & __GFP_MEMALLOC)
 			alloc_flags |= ALLOC_NO_WATERMARKS;
-		else if (likely(!(gfp_mask & __GFP_NOMEMALLOC)) && !in_interrupt())
+		else if (in_serving_softirq() && (current->flags & PF_MEMALLOC))
+			alloc_flags |= ALLOC_NO_WATERMARKS;
+		else if (!in_interrupt() &&
+				((current->flags & PF_MEMALLOC) ||
+				 unlikely(test_thread_flag(TIF_MEMDIE))))
 			alloc_flags |= ALLOC_NO_WATERMARKS;
 	}