Message ID | 20150221011907.2d26c979.akpm@linux-foundation.org |
---|---|
State | Not Applicable, archived |
Headers | show |
Andrew Morton wrote: > On Fri, 20 Feb 2015 22:20:00 -0500 "Theodore Ts'o" <tytso@mit.edu> wrote: > > > +akpm > > I was hoping not to have to read this thread ;) Sorry for getting so complicated. > What I'm not really understanding is why the pre-3.19 implementation > actually worked. We've exhausted the free pages, we're not succeeding > at reclaiming anything, we aren't able to oom-kill anyone. Yet it > *does* work - we eventually find that memory and everything proceeds. > > How come? Where did that memory come from? > Even without __GFP_NOFAIL, GFP_NOFS / GFP_NOIO allocations retried forever (without invoking the OOM killer) if order <= PAGE_ALLOC_COSTLY_ORDER and TIF_MEMDIE is not set. Somebody else volunteered that memory while retrying. This implies silent hang-up forever if nobody volunteers memory. > And yes, I agree that sites such as xfs's kmem_alloc() should be > passing __GFP_NOFAIL to tell the page allocator what's going on. I > don't think it matters a lot whether kmem_alloc() retains its retry > loop. If __GFP_NOFAIL is working correctly then it will never loop > anyway... Commit 9879de7373fc ("mm: page_alloc: embed OOM killing naturally into allocation slowpath") inadvertently changed GFP_NOFS / GFP_NOIO allocations not to retry unless __GFP_NOFAIL is specified. Therefore, either applying Johannes's akpm-doesnt-know-why-it-works patch or passing __GFP_NOFAIL will restore the pre-3.19 behavior (with possibility of silent hang-up). -- To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Sat, Feb 21, 2015 at 01:19:07AM -0800, Andrew Morton wrote: > On Fri, 20 Feb 2015 22:20:00 -0500 "Theodore Ts'o" <tytso@mit.edu> wrote: > > > +akpm > > I was hoping not to have to read this thread ;) ditto.... > And yes, I agree that sites such as xfs's kmem_alloc() should be > passing __GFP_NOFAIL to tell the page allocator what's going on. I > don't think it matters a lot whether kmem_alloc() retains its retry > loop. If __GFP_NOFAIL is working correctly then it will never loop > anyway... I'm not about to change behaviour "just because". Any sort of change like this requires a *lot* of low memory regression testing because we'd be replacing long standing known behaviour with behaviour that changes without warning. e.g the ext4 low memory failures starting because of changes made in 3.19-rc6 due to changes in oom-killer behaviour. Those changes *did not affect XFS* and that's the way I'd like things to remain. Put simply: right now I don't trust the mm subsystem to get low memory behaviour right, and this thread has done nothing to convince me that it's going to improve any time soon. > Also, this: > > On Wed, 18 Feb 2015 09:54:30 +1100 Dave Chinner <david@fromorbit.com> wrote: > > > Right now, the oom killer is a liability. Over the past 6 months > > I've slowly had to exclude filesystem regression tests from running > > on small memory machines because the OOM killer is now so unreliable > > that it kills the test harness regularly rather than the process > > generating memory pressure. > > David, I did not know this! If you've been telling us about this then > perhaps it wasn't loud enough. IME, such bug reports get ignored. Instead, over the past few months I have been pointing out bugs and problems in the oom-killer in threads like this because it seems to be the only way to get any attention to the issues I'm seeing. Bug reports simply get ignored. From this process, I've managed to learn that low order memory allocation now never fails (contrary to documentation and long standing behavioural expectations) and pointed out bugs that cause the oom killer to get invoked when the filesystem is saying "I can handle ENOMEM!" (commit 45f87de ("mm: get rid of radix tree gfp mask for pagecache_get_page"). And yes, I've definitely mentioned in these discussions that, for example, xfstests::generic/224 is triggering the oom killer far more often than it used to on my 1GB RAM vm. The only fix that has been made recently that's made any difference is 45f87de, so it's a slow process of raising awareness and trying to ensure things don't get worse before they get better.... Cheers, Dave.
On Sat, Feb 21, 2015 at 01:19:07AM -0800, Andrew Morton wrote: > Short term, we need to fix 3.19.x and 3.20 and that appears to be by > applying Johannes's akpm-doesnt-know-why-it-works patch: > > --- a/mm/page_alloc.c > +++ b/mm/page_alloc.c > @@ -2382,8 +2382,15 @@ __alloc_pages_may_oom(gfp_t gfp_mask, unsigned int order, > if (high_zoneidx < ZONE_NORMAL) > goto out; > /* The OOM killer does not compensate for light reclaim */ > - if (!(gfp_mask & __GFP_FS)) > + if (!(gfp_mask & __GFP_FS)) { > + /* > + * XXX: Page reclaim didn't yield anything, > + * and the OOM killer can't be invoked, but > + * keep looping as per should_alloc_retry(). > + */ > + *did_some_progress = 1; > goto out; > + } > /* > * GFP_THISNODE contains __GFP_NORETRY and we never hit this. > * Sanity check for bare calls of __GFP_THISNODE, not real OOM. > > Have people adequately confirmed that this gets us out of trouble? I'd be interested in this too. Who is seeing these failures? Andrew, can you please use the following changelog for this patch? --- From: Johannes Weiner <hannes@cmpxchg.org> mm: page_alloc: revert inadvertent !__GFP_FS retry behavior change Historically, !__GFP_FS allocations were not allowed to invoke the OOM killer once reclaim had failed, but nevertheless kept looping in the allocator. 9879de7373fc ("mm: page_alloc: embed OOM killing naturally into allocation slowpath"), which should have been a simple cleanup patch, accidentally changed the behavior to aborting the allocation at that point. This creates problems with filesystem callers (?) that currently rely on the allocator waiting for other tasks to intervene. Revert the behavior as it shouldn't have been changed as part of a cleanup patch. Fixes: 9879de7373fc ("mm: page_alloc: embed OOM killing naturally into allocation slowpath") Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> --- -- To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Sat 21-02-15 19:20:58, Johannes Weiner wrote: > On Sat, Feb 21, 2015 at 01:19:07AM -0800, Andrew Morton wrote: > > Short term, we need to fix 3.19.x and 3.20 and that appears to be by > > applying Johannes's akpm-doesnt-know-why-it-works patch: > > > > --- a/mm/page_alloc.c > > +++ b/mm/page_alloc.c > > @@ -2382,8 +2382,15 @@ __alloc_pages_may_oom(gfp_t gfp_mask, unsigned int order, > > if (high_zoneidx < ZONE_NORMAL) > > goto out; > > /* The OOM killer does not compensate for light reclaim */ > > - if (!(gfp_mask & __GFP_FS)) > > + if (!(gfp_mask & __GFP_FS)) { > > + /* > > + * XXX: Page reclaim didn't yield anything, > > + * and the OOM killer can't be invoked, but > > + * keep looping as per should_alloc_retry(). > > + */ > > + *did_some_progress = 1; > > goto out; > > + } > > /* > > * GFP_THISNODE contains __GFP_NORETRY and we never hit this. > > * Sanity check for bare calls of __GFP_THISNODE, not real OOM. > > > > Have people adequately confirmed that this gets us out of trouble? > > I'd be interested in this too. Who is seeing these failures? > > Andrew, can you please use the following changelog for this patch? > > --- > From: Johannes Weiner <hannes@cmpxchg.org> > > mm: page_alloc: revert inadvertent !__GFP_FS retry behavior change > > Historically, !__GFP_FS allocations were not allowed to invoke the OOM > killer once reclaim had failed, but nevertheless kept looping in the > allocator. 9879de7373fc ("mm: page_alloc: embed OOM killing naturally > into allocation slowpath"), which should have been a simple cleanup > patch, accidentally changed the behavior to aborting the allocation at > that point. This creates problems with filesystem callers (?) that > currently rely on the allocator waiting for other tasks to intervene. > > Revert the behavior as it shouldn't have been changed as part of a > cleanup patch. OK, if this a _short term_ change. I really think that all the requests except for __GFP_NOFAIL should be able to fail. I would argue that it should be the caller who should be fixed but it is true that the patch was introduced too late (rc7) and so it caught other subsystems unprepared so backporting to stable makes sense to me. But can we please move on and stop pretending that allocations do not fail for the upcoming release? > Fixes: 9879de7373fc ("mm: page_alloc: embed OOM killing naturally into allocation slowpath") > Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Michal Hocko <mhocko@suse.cz>
On Sat, 21 Feb 2015, Johannes Weiner wrote: > From: Johannes Weiner <hannes@cmpxchg.org> > > mm: page_alloc: revert inadvertent !__GFP_FS retry behavior change > > Historically, !__GFP_FS allocations were not allowed to invoke the OOM > killer once reclaim had failed, but nevertheless kept looping in the > allocator. 9879de7373fc ("mm: page_alloc: embed OOM killing naturally > into allocation slowpath"), which should have been a simple cleanup > patch, accidentally changed the behavior to aborting the allocation at > that point. This creates problems with filesystem callers (?) that > currently rely on the allocator waiting for other tasks to intervene. > > Revert the behavior as it shouldn't have been changed as part of a > cleanup patch. > > Fixes: 9879de7373fc ("mm: page_alloc: embed OOM killing naturally into allocation slowpath") > Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Cc: stable@vger.kernel.org [3.19] Acked-by: David Rientjes <rientjes@google.com> -- To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
--- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -2382,8 +2382,15 @@ __alloc_pages_may_oom(gfp_t gfp_mask, unsigned int order, if (high_zoneidx < ZONE_NORMAL) goto out; /* The OOM killer does not compensate for light reclaim */ - if (!(gfp_mask & __GFP_FS)) + if (!(gfp_mask & __GFP_FS)) { + /* + * XXX: Page reclaim didn't yield anything, + * and the OOM killer can't be invoked, but + * keep looping as per should_alloc_retry(). + */ + *did_some_progress = 1; goto out; + } /* * GFP_THISNODE contains __GFP_NORETRY and we never hit this. * Sanity check for bare calls of __GFP_THISNODE, not real OOM.