Message ID | 1305295404-12129-4-git-send-email-mgorman@suse.de |
---|---|
State | Not Applicable, archived |
Headers | show |
On Fri, 13 May 2011, Mel Gorman wrote: > diff --git a/mm/page_alloc.c b/mm/page_alloc.c > index 9f8a97b..057f1e2 100644 > --- a/mm/page_alloc.c > +++ b/mm/page_alloc.c > @@ -1972,6 +1972,7 @@ gfp_to_alloc_flags(gfp_t gfp_mask) > { > int alloc_flags = ALLOC_WMARK_MIN | ALLOC_CPUSET; > const gfp_t wait = gfp_mask & __GFP_WAIT; > + const gfp_t can_wake_kswapd = !(gfp_mask & __GFP_NO_KSWAPD); > > /* __GFP_HIGH is assumed to be the same as ALLOC_HIGH to save a branch. */ > BUILD_BUG_ON(__GFP_HIGH != (__force gfp_t) ALLOC_HIGH); > @@ -1984,7 +1985,7 @@ gfp_to_alloc_flags(gfp_t gfp_mask) > */ > alloc_flags |= (__force int) (gfp_mask & __GFP_HIGH); > > - if (!wait) { > + if (!wait && can_wake_kswapd) { > /* > * Not worth trying to allocate harder for > * __GFP_NOMEMALLOC even if it can't schedule. > diff --git a/mm/slub.c b/mm/slub.c > index 98c358d..c5797ab 100644 > --- a/mm/slub.c > +++ b/mm/slub.c > @@ -1170,7 +1170,8 @@ static struct page *allocate_slab(struct kmem_cache *s, gfp_t flags, int node) > * Let the initial higher-order allocation fail under memory pressure > * so we fall-back to the minimum order allocation. > */ > - alloc_gfp = (flags | __GFP_NOWARN | __GFP_NORETRY | __GFP_NO_KSWAPD) & ~__GFP_NOFAIL; > + alloc_gfp = (flags | __GFP_NOWARN | __GFP_NO_KSWAPD) & > + ~(__GFP_NOFAIL | __GFP_WAIT | __GFP_REPEAT); > > page = alloc_slab_page(alloc_gfp, node, oo); > if (unlikely(!page)) { It's unnecessary to clear __GFP_REPEAT, these !__GFP_NOFAIL allocations will immediately fail. alloc_gfp would probably benefit from having a comment about why __GFP_WAIT should be masked off here: that we don't want to do compaction or direct reclaim or retry the allocation more than once (so both __GFP_NORETRY and __GFP_REPEAT are no-ops). -- To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Mon, May 16, 2011 at 02:16:46PM -0700, David Rientjes wrote: > On Fri, 13 May 2011, Mel Gorman wrote: > > > diff --git a/mm/page_alloc.c b/mm/page_alloc.c > > index 9f8a97b..057f1e2 100644 > > --- a/mm/page_alloc.c > > +++ b/mm/page_alloc.c > > @@ -1972,6 +1972,7 @@ gfp_to_alloc_flags(gfp_t gfp_mask) > > { > > int alloc_flags = ALLOC_WMARK_MIN | ALLOC_CPUSET; > > const gfp_t wait = gfp_mask & __GFP_WAIT; > > + const gfp_t can_wake_kswapd = !(gfp_mask & __GFP_NO_KSWAPD); > > > > /* __GFP_HIGH is assumed to be the same as ALLOC_HIGH to save a branch. */ > > BUILD_BUG_ON(__GFP_HIGH != (__force gfp_t) ALLOC_HIGH); > > @@ -1984,7 +1985,7 @@ gfp_to_alloc_flags(gfp_t gfp_mask) > > */ > > alloc_flags |= (__force int) (gfp_mask & __GFP_HIGH); > > > > - if (!wait) { > > + if (!wait && can_wake_kswapd) { > > /* > > * Not worth trying to allocate harder for > > * __GFP_NOMEMALLOC even if it can't schedule. > > diff --git a/mm/slub.c b/mm/slub.c > > index 98c358d..c5797ab 100644 > > --- a/mm/slub.c > > +++ b/mm/slub.c > > @@ -1170,7 +1170,8 @@ static struct page *allocate_slab(struct kmem_cache *s, gfp_t flags, int node) > > * Let the initial higher-order allocation fail under memory pressure > > * so we fall-back to the minimum order allocation. > > */ > > - alloc_gfp = (flags | __GFP_NOWARN | __GFP_NORETRY | __GFP_NO_KSWAPD) & ~__GFP_NOFAIL; > > + alloc_gfp = (flags | __GFP_NOWARN | __GFP_NO_KSWAPD) & > > + ~(__GFP_NOFAIL | __GFP_WAIT | __GFP_REPEAT); > > > > page = alloc_slab_page(alloc_gfp, node, oo); > > if (unlikely(!page)) { > > It's unnecessary to clear __GFP_REPEAT, these !__GFP_NOFAIL allocations > will immediately fail. > We can enter enter direct compaction or direct reclaim at least once. If compaction is enabled and we enter reclaim/compaction, the presense of __GFP_REPEAT makes a difference in should_continue_reclaim(). With compaction disabled, the presense of the flag is relevant in should_alloc_retry() with it being possible to loop in the allocator instead of failing the SLUB allocation and dropping back. Maybe you meant !__GFP_WAIT instead of !__GFP_NOFAIL which makes more sense. In that case, we clear both flags because __GFP_REPEAT && !_GFP_WAIT is a senseless combination of flags. If for whatever reason the __GFP_WAIT was re-added, the presense of __GFP_REPEAT could cause problems in reclaim that would be hard to spot again. > alloc_gfp would probably benefit from having a comment about why > __GFP_WAIT should be masked off here: that we don't want to do compaction > or direct reclaim or retry the allocation more than once (so both > __GFP_NORETRY and __GFP_REPEAT are no-ops). That would have been helpful all right. I should have caught that and explained it properly. In the event there is a new version of the patch, I'll add one. For the moment, I'm dropping this patch entirely. Christoph wants to maintain historic behaviour of SLUB to maximise the number of high-order pages it uses and at the end of the day, which option performs better depends entirely on the workload and machine configuration.
On Tue, 17 May 2011, Mel Gorman wrote: > entirely. Christoph wants to maintain historic behaviour of SLUB to > maximise the number of high-order pages it uses and at the end of the > day, which option performs better depends entirely on the workload > and machine configuration. That is not what I meant. I would like more higher order allocations to succeed. That does not mean that slubs allocation methods and flags passed have to stay the same. You can change the slub behavior if it helps. I am just suspicious of compaction. If these mods are needed to reduce the amount of higher order pages then compaction does not have the beneficial effect that it should have. It does not actually increase the available higher order pages. Fix that first. -- To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, May 17, 2011 at 08:51:47AM -0500, Christoph Lameter wrote: > On Tue, 17 May 2011, Mel Gorman wrote: > > > entirely. Christoph wants to maintain historic behaviour of SLUB to > > maximise the number of high-order pages it uses and at the end of the > > day, which option performs better depends entirely on the workload > > and machine configuration. > > That is not what I meant. I would like more higher order allocations to > succeed. That does not mean that slubs allocation methods and flags passed > have to stay the same. You can change the slub behavior if it helps. > In this particular patch, the success rate for high order allocations would likely decrease in low memory conditions albeit the latency when calling the page allocator will be lower and the disruption to the system will be less (no copying or reclaim of pages). My expectation would be that it's cheaper for SLUB to fall back than compact memory or reclaim pages even if this means a slab page is smaller until more memory is free. However, if the "goodness" criteria is high order allocation success rate, the patch shouldn't be merged. > I am just suspicious of compaction. If these mods are needed to reduce the > amount of higher order pages then compaction does not have the > beneficial effect that it should have. It does not actually > increase the available higher order pages. Fix that first. > The problem being addressed was the machine being hung at worst and in other cases having kswapd pinned at 99-100% CPU. It's now been shown that modifying SLUB is not necessary to fix this because the bug was in page reclaim. The high-order allocation success rate didn't come into it.
On Tue, 17 May 2011, Mel Gorman wrote: > > That is not what I meant. I would like more higher order allocations to > > succeed. That does not mean that slubs allocation methods and flags passed > > have to stay the same. You can change the slub behavior if it helps. > > > > In this particular patch, the success rate for high order allocations > would likely decrease in low memory conditions albeit the latency when > calling the page allocator will be lower and the disruption to the > system will be less (no copying or reclaim of pages). My expectation > would be that it's cheaper for SLUB to fall back than compact memory > or reclaim pages even if this means a slab page is smaller until more > memory is free. However, if the "goodness" criteria is high order > allocation success rate, the patch shouldn't be merged. The criteria is certainly overall system performance and not a high order allocation rate. -- To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, 17 May 2011, Mel Gorman wrote: > > > diff --git a/mm/page_alloc.c b/mm/page_alloc.c > > > index 9f8a97b..057f1e2 100644 > > > --- a/mm/page_alloc.c > > > +++ b/mm/page_alloc.c > > > @@ -1972,6 +1972,7 @@ gfp_to_alloc_flags(gfp_t gfp_mask) > > > { > > > int alloc_flags = ALLOC_WMARK_MIN | ALLOC_CPUSET; > > > const gfp_t wait = gfp_mask & __GFP_WAIT; > > > + const gfp_t can_wake_kswapd = !(gfp_mask & __GFP_NO_KSWAPD); > > > > > > /* __GFP_HIGH is assumed to be the same as ALLOC_HIGH to save a branch. */ > > > BUILD_BUG_ON(__GFP_HIGH != (__force gfp_t) ALLOC_HIGH); > > > @@ -1984,7 +1985,7 @@ gfp_to_alloc_flags(gfp_t gfp_mask) > > > */ > > > alloc_flags |= (__force int) (gfp_mask & __GFP_HIGH); > > > > > > - if (!wait) { > > > + if (!wait && can_wake_kswapd) { > > > /* > > > * Not worth trying to allocate harder for > > > * __GFP_NOMEMALLOC even if it can't schedule. > > > diff --git a/mm/slub.c b/mm/slub.c > > > index 98c358d..c5797ab 100644 > > > --- a/mm/slub.c > > > +++ b/mm/slub.c > > > @@ -1170,7 +1170,8 @@ static struct page *allocate_slab(struct kmem_cache *s, gfp_t flags, int node) > > > * Let the initial higher-order allocation fail under memory pressure > > > * so we fall-back to the minimum order allocation. > > > */ > > > - alloc_gfp = (flags | __GFP_NOWARN | __GFP_NORETRY | __GFP_NO_KSWAPD) & ~__GFP_NOFAIL; > > > + alloc_gfp = (flags | __GFP_NOWARN | __GFP_NO_KSWAPD) & > > > + ~(__GFP_NOFAIL | __GFP_WAIT | __GFP_REPEAT); > > > > > > page = alloc_slab_page(alloc_gfp, node, oo); > > > if (unlikely(!page)) { > > > > It's unnecessary to clear __GFP_REPEAT, these !__GFP_NOFAIL allocations > > will immediately fail. > > > > We can enter enter direct compaction or direct reclaim > at least once. If compaction is enabled and we enter > reclaim/compaction, the presense of __GFP_REPEAT makes a difference > in should_continue_reclaim(). With compaction disabled, the presense > of the flag is relevant in should_alloc_retry() with it being possible > to loop in the allocator instead of failing the SLUB allocation and > dropping back. > You've cleared __GFP_WAIT, so it cannot enter direct compaction or direct reclaim, so clearing __GFP_REPEAT here doesn't actually do anything. That's why I suggested adding a comment about why you're clearing __GFP_WAIT: to make it obvious that these allocations will immediately fail if the alloc is unsuccessful and we don't need to add __GFP_NORETRY or remove __GFP_REPEAT. > Maybe you meant !__GFP_WAIT instead of !__GFP_NOFAIL which makes > more sense. No, I meant !__GFP_NOFAIL since the high priority allocations (if PF_MEMALLOC or TIF_MEMDIE) will not loop forever looking for a page without that bit. That allows this !__GFP_WAIT allocation to immediately fail. __GFP_NORETRY and __GFP_REPEAT are no-ops unless you have __GFP_WAIT. -- To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, 17 May 2011, Christoph Lameter wrote: > > In this particular patch, the success rate for high order allocations > > would likely decrease in low memory conditions albeit the latency when > > calling the page allocator will be lower and the disruption to the > > system will be less (no copying or reclaim of pages). My expectation > > would be that it's cheaper for SLUB to fall back than compact memory > > or reclaim pages even if this means a slab page is smaller until more > > memory is free. However, if the "goodness" criteria is high order > > allocation success rate, the patch shouldn't be merged. > > The criteria is certainly overall system performance and not a high order > allocation rate. > SLUB definitely depends on these higher order allocations being successful for performance, dropping back to the min order is a last resort as opposed to failing the kmalloc(). If it's the last resort, then it makes sense that we'd want to try both compaction and reclaim while we're already in the page allocator as we go down the slub slowpath. Why not try just a little harder (compaction and/or reclaim) to alloc the cache's preferred order? -- To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 9f8a97b..057f1e2 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -1972,6 +1972,7 @@ gfp_to_alloc_flags(gfp_t gfp_mask) { int alloc_flags = ALLOC_WMARK_MIN | ALLOC_CPUSET; const gfp_t wait = gfp_mask & __GFP_WAIT; + const gfp_t can_wake_kswapd = !(gfp_mask & __GFP_NO_KSWAPD); /* __GFP_HIGH is assumed to be the same as ALLOC_HIGH to save a branch. */ BUILD_BUG_ON(__GFP_HIGH != (__force gfp_t) ALLOC_HIGH); @@ -1984,7 +1985,7 @@ gfp_to_alloc_flags(gfp_t gfp_mask) */ alloc_flags |= (__force int) (gfp_mask & __GFP_HIGH); - if (!wait) { + if (!wait && can_wake_kswapd) { /* * Not worth trying to allocate harder for * __GFP_NOMEMALLOC even if it can't schedule. diff --git a/mm/slub.c b/mm/slub.c index 98c358d..c5797ab 100644 --- a/mm/slub.c +++ b/mm/slub.c @@ -1170,7 +1170,8 @@ static struct page *allocate_slab(struct kmem_cache *s, gfp_t flags, int node) * Let the initial higher-order allocation fail under memory pressure * so we fall-back to the minimum order allocation. */ - alloc_gfp = (flags | __GFP_NOWARN | __GFP_NORETRY | __GFP_NO_KSWAPD) & ~__GFP_NOFAIL; + alloc_gfp = (flags | __GFP_NOWARN | __GFP_NO_KSWAPD) & + ~(__GFP_NOFAIL | __GFP_WAIT | __GFP_REPEAT); page = alloc_slab_page(alloc_gfp, node, oo); if (unlikely(!page)) {
To avoid locking and per-cpu overhead, SLUB optimisically uses high-order allocations and falls back to lower allocations if they fail. However, by simply trying to allocate, the caller can enter compaction or reclaim - both of which are likely to cost more than the benefit of using high-order pages in SLUB. On a desktop system, two users report that the system is getting stalled with kswapd using large amounts of CPU. This patch prevents SLUB taking any expensive steps when trying to use high-order allocations. Instead, it is expected to fall back to smaller orders more aggressively. Testing was somewhat inconclusive on how much this helped but it makes sense that falling back to order-0 allocations is faster than entering compaction or direct reclaim. Signed-off-by: Mel Gorman <mgorman@suse.de> --- mm/page_alloc.c | 3 ++- mm/slub.c | 3 ++- 2 files changed, 4 insertions(+), 2 deletions(-)