Patchwork [2/3] mm: slub: Do not take expensive steps for SLUBs speculative high-order allocations

login
register
mail settings
Submitter Mel Gorman
Date May 11, 2011, 3:29 p.m.
Message ID <1305127773-10570-3-git-send-email-mgorman@suse.de>
Download mbox | patch
Permalink /patch/95162/
State Not Applicable
Headers show

Comments

Mel Gorman - May 11, 2011, 3:29 p.m.
To avoid locking and per-cpu overhead, SLUB optimisically uses
high-order allocations and falls back to lower allocations if they
fail. However, by simply trying to allocate, the caller can enter
compaction or reclaim - both of which are likely to cost more than the
benefit of using high-order pages in SLUB. On a desktop system, two
users report that the system is getting stalled with kswapd using large
amounts of CPU.

This patch prevents SLUB taking any expensive steps when trying to
use high-order allocations. Instead, it is expected to fall back to
smaller orders more aggressively. Testing from users was somewhat
inconclusive on how much this helped but local tests showed it made
a positive difference. It makes sense that falling back to order-0
allocations is faster than entering compaction or direct reclaim.

Signed-off-yet: Mel Gorman <mgorman@suse.de>
---
 mm/page_alloc.c |    3 ++-
 mm/slub.c       |    3 ++-
 2 files changed, 4 insertions(+), 2 deletions(-)
David Rientjes - May 11, 2011, 8:38 p.m.
On Wed, 11 May 2011, Mel Gorman wrote:

> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 9f8a97b..057f1e2 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -1972,6 +1972,7 @@ gfp_to_alloc_flags(gfp_t gfp_mask)
>  {
>  	int alloc_flags = ALLOC_WMARK_MIN | ALLOC_CPUSET;
>  	const gfp_t wait = gfp_mask & __GFP_WAIT;
> +	const gfp_t can_wake_kswapd = !(gfp_mask & __GFP_NO_KSWAPD);
>  
>  	/* __GFP_HIGH is assumed to be the same as ALLOC_HIGH to save a branch. */
>  	BUILD_BUG_ON(__GFP_HIGH != (__force gfp_t) ALLOC_HIGH);
> @@ -1984,7 +1985,7 @@ gfp_to_alloc_flags(gfp_t gfp_mask)
>  	 */
>  	alloc_flags |= (__force int) (gfp_mask & __GFP_HIGH);
>  
> -	if (!wait) {
> +	if (!wait && can_wake_kswapd) {
>  		/*
>  		 * Not worth trying to allocate harder for
>  		 * __GFP_NOMEMALLOC even if it can't schedule.
> diff --git a/mm/slub.c b/mm/slub.c
> index 98c358d..1071723 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -1170,7 +1170,8 @@ static struct page *allocate_slab(struct kmem_cache *s, gfp_t flags, int node)
>  	 * Let the initial higher-order allocation fail under memory pressure
>  	 * so we fall-back to the minimum order allocation.
>  	 */
> -	alloc_gfp = (flags | __GFP_NOWARN | __GFP_NORETRY | __GFP_NO_KSWAPD) & ~__GFP_NOFAIL;
> +	alloc_gfp = (flags | __GFP_NOWARN | __GFP_NORETRY | __GFP_NO_KSWAPD) &
> +			~(__GFP_NOFAIL | __GFP_WAIT);

__GFP_NORETRY is a no-op without __GFP_WAIT.

>  
>  	page = alloc_slab_page(alloc_gfp, node, oo);
>  	if (unlikely(!page)) {
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Mel Gorman - May 11, 2011, 9:10 p.m.
On Wed, May 11, 2011 at 01:38:44PM -0700, David Rientjes wrote:
> On Wed, 11 May 2011, Mel Gorman wrote:
> 
> > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > index 9f8a97b..057f1e2 100644
> > --- a/mm/page_alloc.c
> > +++ b/mm/page_alloc.c
> > @@ -1972,6 +1972,7 @@ gfp_to_alloc_flags(gfp_t gfp_mask)
> >  {
> >  	int alloc_flags = ALLOC_WMARK_MIN | ALLOC_CPUSET;
> >  	const gfp_t wait = gfp_mask & __GFP_WAIT;
> > +	const gfp_t can_wake_kswapd = !(gfp_mask & __GFP_NO_KSWAPD);
> >  
> >  	/* __GFP_HIGH is assumed to be the same as ALLOC_HIGH to save a branch. */
> >  	BUILD_BUG_ON(__GFP_HIGH != (__force gfp_t) ALLOC_HIGH);
> > @@ -1984,7 +1985,7 @@ gfp_to_alloc_flags(gfp_t gfp_mask)
> >  	 */
> >  	alloc_flags |= (__force int) (gfp_mask & __GFP_HIGH);
> >  
> > -	if (!wait) {
> > +	if (!wait && can_wake_kswapd) {
> >  		/*
> >  		 * Not worth trying to allocate harder for
> >  		 * __GFP_NOMEMALLOC even if it can't schedule.
> > diff --git a/mm/slub.c b/mm/slub.c
> > index 98c358d..1071723 100644
> > --- a/mm/slub.c
> > +++ b/mm/slub.c
> > @@ -1170,7 +1170,8 @@ static struct page *allocate_slab(struct kmem_cache *s, gfp_t flags, int node)
> >  	 * Let the initial higher-order allocation fail under memory pressure
> >  	 * so we fall-back to the minimum order allocation.
> >  	 */
> > -	alloc_gfp = (flags | __GFP_NOWARN | __GFP_NORETRY | __GFP_NO_KSWAPD) & ~__GFP_NOFAIL;
> > +	alloc_gfp = (flags | __GFP_NOWARN | __GFP_NORETRY | __GFP_NO_KSWAPD) &
> > +			~(__GFP_NOFAIL | __GFP_WAIT);
> 
> __GFP_NORETRY is a no-op without __GFP_WAIT.
> 

True. I'll remove it in a V2 but I won't respin just yet.

> >  
> >  	page = alloc_slab_page(alloc_gfp, node, oo);
> >  	if (unlikely(!page)) {
Andrea Arcangeli - May 12, 2011, 5:25 p.m.
Hi,

On Wed, May 11, 2011 at 10:10:43PM +0100, Mel Gorman wrote:
> > > diff --git a/mm/slub.c b/mm/slub.c
> > > index 98c358d..1071723 100644
> > > --- a/mm/slub.c
> > > +++ b/mm/slub.c
> > > @@ -1170,7 +1170,8 @@ static struct page *allocate_slab(struct kmem_cache *s, gfp_t flags, int node)
> > >  	 * Let the initial higher-order allocation fail under memory pressure
> > >  	 * so we fall-back to the minimum order allocation.
> > >  	 */
> > > -	alloc_gfp = (flags | __GFP_NOWARN | __GFP_NORETRY | __GFP_NO_KSWAPD) & ~__GFP_NOFAIL;
> > > +	alloc_gfp = (flags | __GFP_NOWARN | __GFP_NORETRY | __GFP_NO_KSWAPD) &
> > > +			~(__GFP_NOFAIL | __GFP_WAIT);
> > 
> > __GFP_NORETRY is a no-op without __GFP_WAIT.
> > 
> 
> True. I'll remove it in a V2 but I won't respin just yet.

Nothing wrong and no performance difference with clearing
__GFP_NORETRY too, if something it doesn't make sense for a caller to
use __GFP_NOFAIL without __GFP_WAIT so the original version above
looks cleaner. I like this change overall to only poll the buddy
allocator without spinning kswapd and without invoking lumpy reclaim.

Like you noted in the first mail, compaction was disabled, and very
bad behavior is expected without it unless GFP_ATOMIC|__GFP_NO_KSWAPD
is set (that was the way I had to use before disabling lumpy
compaction when first developing THP too for the same reasons).

But when compaction enabled slub could try to only clear __GFP_NOFAIL
and leave __GFP_WAIT and no bad behavior should happen... but it's
probably slower so I prefer to clear __GFP_WAIT too (for THP
compaction is worth it because the allocation is generally long lived,
but for slub allocations like tiny skb the allocation can be extremely
short lived so it's unlikely to be worth it). So this way compaction
is then invoked only by the minimal order allocation later if needed.
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Patch

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 9f8a97b..057f1e2 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1972,6 +1972,7 @@  gfp_to_alloc_flags(gfp_t gfp_mask)
 {
 	int alloc_flags = ALLOC_WMARK_MIN | ALLOC_CPUSET;
 	const gfp_t wait = gfp_mask & __GFP_WAIT;
+	const gfp_t can_wake_kswapd = !(gfp_mask & __GFP_NO_KSWAPD);
 
 	/* __GFP_HIGH is assumed to be the same as ALLOC_HIGH to save a branch. */
 	BUILD_BUG_ON(__GFP_HIGH != (__force gfp_t) ALLOC_HIGH);
@@ -1984,7 +1985,7 @@  gfp_to_alloc_flags(gfp_t gfp_mask)
 	 */
 	alloc_flags |= (__force int) (gfp_mask & __GFP_HIGH);
 
-	if (!wait) {
+	if (!wait && can_wake_kswapd) {
 		/*
 		 * Not worth trying to allocate harder for
 		 * __GFP_NOMEMALLOC even if it can't schedule.
diff --git a/mm/slub.c b/mm/slub.c
index 98c358d..1071723 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -1170,7 +1170,8 @@  static struct page *allocate_slab(struct kmem_cache *s, gfp_t flags, int node)
 	 * Let the initial higher-order allocation fail under memory pressure
 	 * so we fall-back to the minimum order allocation.
 	 */
-	alloc_gfp = (flags | __GFP_NOWARN | __GFP_NORETRY | __GFP_NO_KSWAPD) & ~__GFP_NOFAIL;
+	alloc_gfp = (flags | __GFP_NOWARN | __GFP_NORETRY | __GFP_NO_KSWAPD) &
+			~(__GFP_NOFAIL | __GFP_WAIT);
 
 	page = alloc_slab_page(alloc_gfp, node, oo);
 	if (unlikely(!page)) {