diff mbox series

[X/B] mm: do not stall register_shrinker()

Message ID 20190225222611.29564-1-mfo@canonical.com
State New
Headers show
Series [X/B] mm: do not stall register_shrinker() | expand

Commit Message

Mauricio Faria de Oliveira Feb. 25, 2019, 10:26 p.m. UTC
From: Minchan Kim <minchan@kernel.org>

BugLink: https://bugs.launchpad.net/bugs/1817628

Shakeel Butt reported he has observed in production systems that the job
loader gets stuck for 10s of seconds while doing a mount operation.  It
turns out that it was stuck in register_shrinker() because some
unrelated job was under memory pressure and was spending time in
shrink_slab().  Machines have a lot of shrinkers registered and jobs
under memory pressure have to traverse all of those memcg-aware
shrinkers and affect unrelated jobs which want to register their own
shrinkers.

To solve the issue, this patch simply bails out slab shrinking if it is
found that someone wants to register a shrinker in parallel.  A downside
is it could cause unfair shrinking between shrinkers.  However, it
should be rare and we can add compilcated logic if we find it's not
enough.

[akpm@linux-foundation.org: tweak code comment]
Link: http://lkml.kernel.org/r/20171115005602.GB23810@bbox
Link: http://lkml.kernel.org/r/1511481899-20335-1-git-send-email-minchan@kernel.org
Signed-off-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Shakeel Butt <shakeelb@google.com>
Reported-by: Shakeel Butt <shakeelb@google.com>
Tested-by: Shakeel Butt <shakeelb@google.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Anshuman Khandual <khandual@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(backported from commit e496612c5130567fc9d5f1969ca4b86665aa3cbb)
[mfo: refresh one context line for do_shrink_slab() arguments]
Signed-off-by: Mauricio Faria de Oliveira <mfo@canonical.com>
---
 mm/vmscan.c | 9 +++++++++
 1 file changed, 9 insertions(+)

Comments

Khalid Elmously Feb. 26, 2019, 1:58 a.m. UTC | #1
On 2019-02-25 19:26:11 , Mauricio Faria de Oliveira wrote:
> From: Minchan Kim <minchan@kernel.org>
> 
> BugLink: https://bugs.launchpad.net/bugs/1817628
> 
> Shakeel Butt reported he has observed in production systems that the job
> loader gets stuck for 10s of seconds while doing a mount operation.  It
> turns out that it was stuck in register_shrinker() because some
> unrelated job was under memory pressure and was spending time in
> shrink_slab().  Machines have a lot of shrinkers registered and jobs
> under memory pressure have to traverse all of those memcg-aware
> shrinkers and affect unrelated jobs which want to register their own
> shrinkers.
> 
> To solve the issue, this patch simply bails out slab shrinking if it is
> found that someone wants to register a shrinker in parallel.  A downside
> is it could cause unfair shrinking between shrinkers.  However, it
> should be rare and we can add compilcated logic if we find it's not
> enough.
> 
> [akpm@linux-foundation.org: tweak code comment]
> Link: http://lkml.kernel.org/r/20171115005602.GB23810@bbox
> Link: http://lkml.kernel.org/r/1511481899-20335-1-git-send-email-minchan@kernel.org
> Signed-off-by: Minchan Kim <minchan@kernel.org>
> Signed-off-by: Shakeel Butt <shakeelb@google.com>
> Reported-by: Shakeel Butt <shakeelb@google.com>
> Tested-by: Shakeel Butt <shakeelb@google.com>
> Acked-by: Johannes Weiner <hannes@cmpxchg.org>
> Acked-by: Michal Hocko <mhocko@suse.com>
> Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
> Cc: Anshuman Khandual <khandual@linux.vnet.ibm.com>
> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
> (backported from commit e496612c5130567fc9d5f1969ca4b86665aa3cbb)
> [mfo: refresh one context line for do_shrink_slab() arguments]
> Signed-off-by: Mauricio Faria de Oliveira <mfo@canonical.com>
> ---
>  mm/vmscan.c | 9 +++++++++
>  1 file changed, 9 insertions(+)
> 
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index f53dcdccdb83..decc5f99b85c 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -453,6 +453,15 @@ static unsigned long shrink_slab(gfp_t gfp_mask, int nid,
>  			sc.nid = 0;
>  
>  		freed += do_shrink_slab(&sc, shrinker, nr_scanned, nr_eligible);
> +		/*
> +		 * Bail out if someone want to register a new shrinker to
> +		 * prevent the regsitration from being stalled for long periods
> +		 * by parallel ongoing shrinking.
> +		 */
> +		if (rwsem_is_contended(&shrinker_rwsem)) {
> +			freed = freed ? : 1;
> +			break;
> +		}
>  	}
>  
>  	up_read(&shrinker_rwsem);

Acked-by: Khalid Elmously <khalid.elmously@canonical.com>
Stefan Bader Feb. 27, 2019, 9:32 a.m. UTC | #2
On 25.02.19 23:26, Mauricio Faria de Oliveira wrote:
> From: Minchan Kim <minchan@kernel.org>
> 
> BugLink: https://bugs.launchpad.net/bugs/1817628
> 
> Shakeel Butt reported he has observed in production systems that the job
> loader gets stuck for 10s of seconds while doing a mount operation.  It
> turns out that it was stuck in register_shrinker() because some
> unrelated job was under memory pressure and was spending time in
> shrink_slab().  Machines have a lot of shrinkers registered and jobs
> under memory pressure have to traverse all of those memcg-aware
> shrinkers and affect unrelated jobs which want to register their own
> shrinkers.
> 
> To solve the issue, this patch simply bails out slab shrinking if it is
> found that someone wants to register a shrinker in parallel.  A downside
> is it could cause unfair shrinking between shrinkers.  However, it
> should be rare and we can add compilcated logic if we find it's not
> enough.
> 
> [akpm@linux-foundation.org: tweak code comment]
> Link: http://lkml.kernel.org/r/20171115005602.GB23810@bbox
> Link: http://lkml.kernel.org/r/1511481899-20335-1-git-send-email-minchan@kernel.org
> Signed-off-by: Minchan Kim <minchan@kernel.org>
> Signed-off-by: Shakeel Butt <shakeelb@google.com>
> Reported-by: Shakeel Butt <shakeelb@google.com>
> Tested-by: Shakeel Butt <shakeelb@google.com>
> Acked-by: Johannes Weiner <hannes@cmpxchg.org>
> Acked-by: Michal Hocko <mhocko@suse.com>
> Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
> Cc: Anshuman Khandual <khandual@linux.vnet.ibm.com>
> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
> (backported from commit e496612c5130567fc9d5f1969ca4b86665aa3cbb)
> [mfo: refresh one context line for do_shrink_slab() arguments]
> Signed-off-by: Mauricio Faria de Oliveira <mfo@canonical.com>
Acked-by: Stefan Bader <stefan.bader@canonical.com>
> ---
>  mm/vmscan.c | 9 +++++++++
>  1 file changed, 9 insertions(+)
> 
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index f53dcdccdb83..decc5f99b85c 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -453,6 +453,15 @@ static unsigned long shrink_slab(gfp_t gfp_mask, int nid,
>  			sc.nid = 0;
>  
>  		freed += do_shrink_slab(&sc, shrinker, nr_scanned, nr_eligible);
> +		/*
> +		 * Bail out if someone want to register a new shrinker to
> +		 * prevent the regsitration from being stalled for long periods
> +		 * by parallel ongoing shrinking.
> +		 */
> +		if (rwsem_is_contended(&shrinker_rwsem)) {
> +			freed = freed ? : 1;
> +			break;
> +		}
>  	}
>  
>  	up_read(&shrinker_rwsem);
>
Khalid Elmously Feb. 28, 2019, 11:04 p.m. UTC | #3
On 2019-02-25 19:26:11 , Mauricio Faria de Oliveira wrote:
> From: Minchan Kim <minchan@kernel.org>
> 
> BugLink: https://bugs.launchpad.net/bugs/1817628
> 
> Shakeel Butt reported he has observed in production systems that the job
> loader gets stuck for 10s of seconds while doing a mount operation.  It
> turns out that it was stuck in register_shrinker() because some
> unrelated job was under memory pressure and was spending time in
> shrink_slab().  Machines have a lot of shrinkers registered and jobs
> under memory pressure have to traverse all of those memcg-aware
> shrinkers and affect unrelated jobs which want to register their own
> shrinkers.
> 
> To solve the issue, this patch simply bails out slab shrinking if it is
> found that someone wants to register a shrinker in parallel.  A downside
> is it could cause unfair shrinking between shrinkers.  However, it
> should be rare and we can add compilcated logic if we find it's not
> enough.
> 
> [akpm@linux-foundation.org: tweak code comment]
> Link: http://lkml.kernel.org/r/20171115005602.GB23810@bbox
> Link: http://lkml.kernel.org/r/1511481899-20335-1-git-send-email-minchan@kernel.org
> Signed-off-by: Minchan Kim <minchan@kernel.org>
> Signed-off-by: Shakeel Butt <shakeelb@google.com>
> Reported-by: Shakeel Butt <shakeelb@google.com>
> Tested-by: Shakeel Butt <shakeelb@google.com>
> Acked-by: Johannes Weiner <hannes@cmpxchg.org>
> Acked-by: Michal Hocko <mhocko@suse.com>
> Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
> Cc: Anshuman Khandual <khandual@linux.vnet.ibm.com>
> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
> (backported from commit e496612c5130567fc9d5f1969ca4b86665aa3cbb)
> [mfo: refresh one context line for do_shrink_slab() arguments]
> Signed-off-by: Mauricio Faria de Oliveira <mfo@canonical.com>
> ---
>  mm/vmscan.c | 9 +++++++++
>  1 file changed, 9 insertions(+)
> 
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index f53dcdccdb83..decc5f99b85c 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -453,6 +453,15 @@ static unsigned long shrink_slab(gfp_t gfp_mask, int nid,
>  			sc.nid = 0;
>  
>  		freed += do_shrink_slab(&sc, shrinker, nr_scanned, nr_eligible);
> +		/*
> +		 * Bail out if someone want to register a new shrinker to
> +		 * prevent the regsitration from being stalled for long periods
> +		 * by parallel ongoing shrinking.
> +		 */
> +		if (rwsem_is_contended(&shrinker_rwsem)) {
> +			freed = freed ? : 1;
> +			break;
> +		}
>  	}
>  
>  	up_read(&shrinker_rwsem);
> -- 
> 2.17.1
> 
> 
> -- 
> kernel-team mailing list
> kernel-team@lists.ubuntu.com
> https://lists.ubuntu.com/mailman/listinfo/kernel-team
diff mbox series

Patch

diff --git a/mm/vmscan.c b/mm/vmscan.c
index f53dcdccdb83..decc5f99b85c 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -453,6 +453,15 @@  static unsigned long shrink_slab(gfp_t gfp_mask, int nid,
 			sc.nid = 0;
 
 		freed += do_shrink_slab(&sc, shrinker, nr_scanned, nr_eligible);
+		/*
+		 * Bail out if someone want to register a new shrinker to
+		 * prevent the regsitration from being stalled for long periods
+		 * by parallel ongoing shrinking.
+		 */
+		if (rwsem_is_contended(&shrinker_rwsem)) {
+			freed = freed ? : 1;
+			break;
+		}
 	}
 
 	up_read(&shrinker_rwsem);