Message ID | 20190225222611.29564-1-mfo@canonical.com |
---|---|
State | New |
Headers | show |
Series | [X/B] mm: do not stall register_shrinker() | expand |
On 2019-02-25 19:26:11 , Mauricio Faria de Oliveira wrote: > From: Minchan Kim <minchan@kernel.org> > > BugLink: https://bugs.launchpad.net/bugs/1817628 > > Shakeel Butt reported he has observed in production systems that the job > loader gets stuck for 10s of seconds while doing a mount operation. It > turns out that it was stuck in register_shrinker() because some > unrelated job was under memory pressure and was spending time in > shrink_slab(). Machines have a lot of shrinkers registered and jobs > under memory pressure have to traverse all of those memcg-aware > shrinkers and affect unrelated jobs which want to register their own > shrinkers. > > To solve the issue, this patch simply bails out slab shrinking if it is > found that someone wants to register a shrinker in parallel. A downside > is it could cause unfair shrinking between shrinkers. However, it > should be rare and we can add compilcated logic if we find it's not > enough. > > [akpm@linux-foundation.org: tweak code comment] > Link: http://lkml.kernel.org/r/20171115005602.GB23810@bbox > Link: http://lkml.kernel.org/r/1511481899-20335-1-git-send-email-minchan@kernel.org > Signed-off-by: Minchan Kim <minchan@kernel.org> > Signed-off-by: Shakeel Butt <shakeelb@google.com> > Reported-by: Shakeel Butt <shakeelb@google.com> > Tested-by: Shakeel Butt <shakeelb@google.com> > Acked-by: Johannes Weiner <hannes@cmpxchg.org> > Acked-by: Michal Hocko <mhocko@suse.com> > Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> > Cc: Anshuman Khandual <khandual@linux.vnet.ibm.com> > Signed-off-by: Andrew Morton <akpm@linux-foundation.org> > Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> > (backported from commit e496612c5130567fc9d5f1969ca4b86665aa3cbb) > [mfo: refresh one context line for do_shrink_slab() arguments] > Signed-off-by: Mauricio Faria de Oliveira <mfo@canonical.com> > --- > mm/vmscan.c | 9 +++++++++ > 1 file changed, 9 insertions(+) > > diff --git a/mm/vmscan.c b/mm/vmscan.c > index f53dcdccdb83..decc5f99b85c 100644 > --- a/mm/vmscan.c > +++ b/mm/vmscan.c > @@ -453,6 +453,15 @@ static unsigned long shrink_slab(gfp_t gfp_mask, int nid, > sc.nid = 0; > > freed += do_shrink_slab(&sc, shrinker, nr_scanned, nr_eligible); > + /* > + * Bail out if someone want to register a new shrinker to > + * prevent the regsitration from being stalled for long periods > + * by parallel ongoing shrinking. > + */ > + if (rwsem_is_contended(&shrinker_rwsem)) { > + freed = freed ? : 1; > + break; > + } > } > > up_read(&shrinker_rwsem); Acked-by: Khalid Elmously <khalid.elmously@canonical.com>
On 25.02.19 23:26, Mauricio Faria de Oliveira wrote: > From: Minchan Kim <minchan@kernel.org> > > BugLink: https://bugs.launchpad.net/bugs/1817628 > > Shakeel Butt reported he has observed in production systems that the job > loader gets stuck for 10s of seconds while doing a mount operation. It > turns out that it was stuck in register_shrinker() because some > unrelated job was under memory pressure and was spending time in > shrink_slab(). Machines have a lot of shrinkers registered and jobs > under memory pressure have to traverse all of those memcg-aware > shrinkers and affect unrelated jobs which want to register their own > shrinkers. > > To solve the issue, this patch simply bails out slab shrinking if it is > found that someone wants to register a shrinker in parallel. A downside > is it could cause unfair shrinking between shrinkers. However, it > should be rare and we can add compilcated logic if we find it's not > enough. > > [akpm@linux-foundation.org: tweak code comment] > Link: http://lkml.kernel.org/r/20171115005602.GB23810@bbox > Link: http://lkml.kernel.org/r/1511481899-20335-1-git-send-email-minchan@kernel.org > Signed-off-by: Minchan Kim <minchan@kernel.org> > Signed-off-by: Shakeel Butt <shakeelb@google.com> > Reported-by: Shakeel Butt <shakeelb@google.com> > Tested-by: Shakeel Butt <shakeelb@google.com> > Acked-by: Johannes Weiner <hannes@cmpxchg.org> > Acked-by: Michal Hocko <mhocko@suse.com> > Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> > Cc: Anshuman Khandual <khandual@linux.vnet.ibm.com> > Signed-off-by: Andrew Morton <akpm@linux-foundation.org> > Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> > (backported from commit e496612c5130567fc9d5f1969ca4b86665aa3cbb) > [mfo: refresh one context line for do_shrink_slab() arguments] > Signed-off-by: Mauricio Faria de Oliveira <mfo@canonical.com> Acked-by: Stefan Bader <stefan.bader@canonical.com> > --- > mm/vmscan.c | 9 +++++++++ > 1 file changed, 9 insertions(+) > > diff --git a/mm/vmscan.c b/mm/vmscan.c > index f53dcdccdb83..decc5f99b85c 100644 > --- a/mm/vmscan.c > +++ b/mm/vmscan.c > @@ -453,6 +453,15 @@ static unsigned long shrink_slab(gfp_t gfp_mask, int nid, > sc.nid = 0; > > freed += do_shrink_slab(&sc, shrinker, nr_scanned, nr_eligible); > + /* > + * Bail out if someone want to register a new shrinker to > + * prevent the regsitration from being stalled for long periods > + * by parallel ongoing shrinking. > + */ > + if (rwsem_is_contended(&shrinker_rwsem)) { > + freed = freed ? : 1; > + break; > + } > } > > up_read(&shrinker_rwsem); >
On 2019-02-25 19:26:11 , Mauricio Faria de Oliveira wrote: > From: Minchan Kim <minchan@kernel.org> > > BugLink: https://bugs.launchpad.net/bugs/1817628 > > Shakeel Butt reported he has observed in production systems that the job > loader gets stuck for 10s of seconds while doing a mount operation. It > turns out that it was stuck in register_shrinker() because some > unrelated job was under memory pressure and was spending time in > shrink_slab(). Machines have a lot of shrinkers registered and jobs > under memory pressure have to traverse all of those memcg-aware > shrinkers and affect unrelated jobs which want to register their own > shrinkers. > > To solve the issue, this patch simply bails out slab shrinking if it is > found that someone wants to register a shrinker in parallel. A downside > is it could cause unfair shrinking between shrinkers. However, it > should be rare and we can add compilcated logic if we find it's not > enough. > > [akpm@linux-foundation.org: tweak code comment] > Link: http://lkml.kernel.org/r/20171115005602.GB23810@bbox > Link: http://lkml.kernel.org/r/1511481899-20335-1-git-send-email-minchan@kernel.org > Signed-off-by: Minchan Kim <minchan@kernel.org> > Signed-off-by: Shakeel Butt <shakeelb@google.com> > Reported-by: Shakeel Butt <shakeelb@google.com> > Tested-by: Shakeel Butt <shakeelb@google.com> > Acked-by: Johannes Weiner <hannes@cmpxchg.org> > Acked-by: Michal Hocko <mhocko@suse.com> > Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> > Cc: Anshuman Khandual <khandual@linux.vnet.ibm.com> > Signed-off-by: Andrew Morton <akpm@linux-foundation.org> > Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> > (backported from commit e496612c5130567fc9d5f1969ca4b86665aa3cbb) > [mfo: refresh one context line for do_shrink_slab() arguments] > Signed-off-by: Mauricio Faria de Oliveira <mfo@canonical.com> > --- > mm/vmscan.c | 9 +++++++++ > 1 file changed, 9 insertions(+) > > diff --git a/mm/vmscan.c b/mm/vmscan.c > index f53dcdccdb83..decc5f99b85c 100644 > --- a/mm/vmscan.c > +++ b/mm/vmscan.c > @@ -453,6 +453,15 @@ static unsigned long shrink_slab(gfp_t gfp_mask, int nid, > sc.nid = 0; > > freed += do_shrink_slab(&sc, shrinker, nr_scanned, nr_eligible); > + /* > + * Bail out if someone want to register a new shrinker to > + * prevent the regsitration from being stalled for long periods > + * by parallel ongoing shrinking. > + */ > + if (rwsem_is_contended(&shrinker_rwsem)) { > + freed = freed ? : 1; > + break; > + } > } > > up_read(&shrinker_rwsem); > -- > 2.17.1 > > > -- > kernel-team mailing list > kernel-team@lists.ubuntu.com > https://lists.ubuntu.com/mailman/listinfo/kernel-team
diff --git a/mm/vmscan.c b/mm/vmscan.c index f53dcdccdb83..decc5f99b85c 100644 --- a/mm/vmscan.c +++ b/mm/vmscan.c @@ -453,6 +453,15 @@ static unsigned long shrink_slab(gfp_t gfp_mask, int nid, sc.nid = 0; freed += do_shrink_slab(&sc, shrinker, nr_scanned, nr_eligible); + /* + * Bail out if someone want to register a new shrinker to + * prevent the regsitration from being stalled for long periods + * by parallel ongoing shrinking. + */ + if (rwsem_is_contended(&shrinker_rwsem)) { + freed = freed ? : 1; + break; + } } up_read(&shrinker_rwsem);