
[v3,4/5] controllers/memcg: increase memory limit in subgroup charge

Message ID 20210702125338.43248-5-krzysztof.kozlowski@canonical.com
State Superseded
Series controllers/memcg: fixes for newer kernels

Commit Message

Krzysztof Kozlowski July 2, 2021, 12:53 p.m. UTC
The memcg_subgroup_charge test was failing on kernel v5.8 in around 10%
of cases with:

    memcg_subgroup_charge 1 TINFO: Running memcg_process --mmap-anon -s 135168
    memcg_subgroup_charge 1 TINFO: Warming up pid: 19289
    memcg_subgroup_charge 1 TINFO: Process is still here after warm up: 19289
    memcg_subgroup_charge 1 TFAIL: rss is 0, 135168 expected
    memcg_subgroup_charge 1 TPASS: rss is 0 as expected

In dmesg one could see that the OOM killer killed the process even
though the group memory limit matched the usage:

    memcg_process invoked oom-killer: gfp_mask=0xcc0(GFP_KERNEL), order=0, oom_score_adj=0
    CPU: 4 PID: 19289 Comm: memcg_process Not tainted 5.8.0-1031-oracle #32~20.04.2-Ubuntu
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.4.1 12/03/2020
    ...
    memory: usage 132kB, limit 132kB, failcnt 9
    memory+swap: usage 132kB, limit 9007199254740988kB, failcnt 0
    kmem: usage 4kB, limit 9007199254740988kB, failcnt 0
    ...
    Tasks state (memory values in pages):
    [  pid  ]   uid  tgid total_vm      rss pgtables_bytes swapents oom_score_adj name
    [  19289]     0 19289      669      389    40960        0             0 memcg_process
    oom-kill:constraint=CONSTRAINT_MEMCG,nodemask=(null),cpuset=/,mems_allowed=0,oom_memcg=/ltp_19257,task_memcg=/ltp_19257,task=memcg_process,pid=19289,uid=0
    Memory cgroup out of memory: Killed process 19289 (memcg_process) total-vm:2676kB, anon-rss:84kB, file-rss:1468kB, shmem-rss:4kB, UID:0 pgtables:40kB oom_score_adj:0
    oom_reaper: reaped process 19289 (memcg_process), now anon-rss:0kB, file-rss:0kB, shmem-rss:4kB

It seems that using 100% of the memory assigned to a given group might
trigger the OOM killer, so always leave headroom of at least one page.
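
For illustration, the numbers above add up as follows (a back-of-the-envelope
sketch, not part of the patch, assuming the 4096-byte page size seen in the
log):

    # The test maps 135168 B = 33 * 4096 B = 132 kB and the group limit is
    # set to the same value -- exactly the "usage 132kB, limit 132kB" above.
    # The "kmem: usage 4kB" line suggests kernel memory is charged against
    # the same counter, so the 33-page mapping plus that overhead cannot fit
    # under a 33-page limit, which likely triggers the OOM kill.
    PAGESIZE=4096
    limit=135168                                        # 33 pages
    echo $((limit + PAGESIZE)) > memory.limit_in_bytes  # one page of slack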

Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@canonical.com>
---
 .../memcg/functional/memcg_subgroup_charge.sh    | 16 ++++++++++++++--
 1 file changed, 14 insertions(+), 2 deletions(-)

Comments

Krzysztof Kozlowski July 2, 2021, 1:05 p.m. UTC | #1
On 02/07/2021 14:53, Krzysztof Kozlowski wrote:
> The memcg_subgroup_charge test was failing on kernel v5.8 in around 10%
> of cases with:
> 
>     memcg_subgroup_charge 1 TINFO: Running memcg_process --mmap-anon -s 135168
>     memcg_subgroup_charge 1 TINFO: Warming up pid: 19289
>     memcg_subgroup_charge 1 TINFO: Process is still here after warm up: 19289
>     memcg_subgroup_charge 1 TFAIL: rss is 0, 135168 expected
>     memcg_subgroup_charge 1 TPASS: rss is 0 as expected
> 
> In dmesg one could see that the OOM killer killed the process even
> though the group memory limit matched the usage:
> 
>     memcg_process invoked oom-killer: gfp_mask=0xcc0(GFP_KERNEL), order=0, oom_score_adj=0
>     CPU: 4 PID: 19289 Comm: memcg_process Not tainted 5.8.0-1031-oracle #32~20.04.2-Ubuntu
>     Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.4.1 12/03/2020
>     ...
>     memory: usage 132kB, limit 132kB, failcnt 9
>     memory+swap: usage 132kB, limit 9007199254740988kB, failcnt 0
>     kmem: usage 4kB, limit 9007199254740988kB, failcnt 0
>     ...
>     Tasks state (memory values in pages):
>     [  pid  ]   uid  tgid total_vm      rss pgtables_bytes swapents oom_score_adj name
>     [  19289]     0 19289      669      389    40960        0             0 memcg_process
>     oom-kill:constraint=CONSTRAINT_MEMCG,nodemask=(null),cpuset=/,mems_allowed=0,oom_memcg=/ltp_19257,task_memcg=/ltp_19257,task=memcg_process,pid=19289,uid=0
>     Memory cgroup out of memory: Killed process 19289 (memcg_process) total-vm:2676kB, anon-rss:84kB, file-rss:1468kB, shmem-rss:4kB, UID:0 pgtables:40kB oom_score_adj:0
>     oom_reaper: reaped process 19289 (memcg_process), now anon-rss:0kB, file-rss:0kB, shmem-rss:4kB
> 
> It seems that using 100% of the memory assigned to a given group might
> trigger the OOM killer, so always leave headroom of at least one page.
> 
> Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@canonical.com>
> ---
>  .../memcg/functional/memcg_subgroup_charge.sh    | 16 ++++++++++++++--
>  1 file changed, 14 insertions(+), 2 deletions(-)
> 
> diff --git a/testcases/kernel/controllers/memcg/functional/memcg_subgroup_charge.sh b/testcases/kernel/controllers/memcg/functional/memcg_subgroup_charge.sh
> index 9b23177a4dc5..88ddbabf7fa9 100755
> --- a/testcases/kernel/controllers/memcg/functional/memcg_subgroup_charge.sh
> +++ b/testcases/kernel/controllers/memcg/functional/memcg_subgroup_charge.sh
> @@ -19,9 +19,21 @@ TST_CNT=3
>  # $2 - memory.limit_in_bytes in sub group
>  test_subgroup()
>  {
> +	local limit_parent=$1
> +	local limit_subgroup=$2
> +
> +	# OOM might start killing if memory usage is 100%, so give it
> +	# always one page size more:
> +	if [ $limit_parent -ne 0 ]; then
> +		limit_parent=$((limit_parent + PAGESIZE))

This patch is independent of the other usage_in_bytes checks. I
included it here, but maybe that's just noise...

Anyway, on v5.11 I just saw that increasing the limit by one page is
not enough, probably due to kernel memory being accounted to the group:

[23868.177525] memory: usage 140kB, limit 140kB, failcnt 19
[23868.177527] memory+swap: usage 140kB, limit 9007199254740988kB, failcnt 0
[23868.177529] kmem: usage 16kB, limit 9007199254740988kB, failcnt 0

I am thinking of increasing this to 4 pages, although even that might not be enough in the future.
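
A rough sketch of that idea, for reference (the MEM_EXTRA name is only
illustrative and not existing test code):

    # Give each non-zero limit a few pages of headroom instead of exactly
    # one; the factor is a guess and may need further tuning on new kernels.
    MEM_EXTRA=$((4 * PAGESIZE))

    if [ $limit_parent -ne 0 ]; then
        limit_parent=$((limit_parent + MEM_EXTRA))
    fi
    if [ $limit_subgroup -ne 0 ]; then
        limit_subgroup=$((limit_subgroup + MEM_EXTRA))
    fi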

> +	fi
> +	if [ $limit_subgroup -ne 0 ]; then
> +		limit_subgroup=$((limit_subgroup + PAGESIZE))
> +	fi
> +
>  	mkdir subgroup
> -	echo $1 > memory.limit_in_bytes
> -	echo $2 > subgroup/memory.limit_in_bytes
> +	echo $limit_parent > memory.limit_in_bytes
> +	echo $limit_subgroup > subgroup/memory.limit_in_bytes
>  
>  	start_memcg_process --mmap-anon -s $PAGESIZES
>  
> 


Best regards,
Krzysztof

Patch

diff --git a/testcases/kernel/controllers/memcg/functional/memcg_subgroup_charge.sh b/testcases/kernel/controllers/memcg/functional/memcg_subgroup_charge.sh
index 9b23177a4dc5..88ddbabf7fa9 100755
--- a/testcases/kernel/controllers/memcg/functional/memcg_subgroup_charge.sh
+++ b/testcases/kernel/controllers/memcg/functional/memcg_subgroup_charge.sh
@@ -19,9 +19,21 @@  TST_CNT=3
 # $2 - memory.limit_in_bytes in sub group
 test_subgroup()
 {
+	local limit_parent=$1
+	local limit_subgroup=$2
+
+	# OOM might start killing if memory usage is 100%, so give it
+	# always one page size more:
+	if [ $limit_parent -ne 0 ]; then
+		limit_parent=$((limit_parent + PAGESIZE))
+	fi
+	if [ $limit_subgroup -ne 0 ]; then
+		limit_subgroup=$((limit_subgroup + PAGESIZE))
+	fi
+
 	mkdir subgroup
-	echo $1 > memory.limit_in_bytes
-	echo $2 > subgroup/memory.limit_in_bytes
+	echo $limit_parent > memory.limit_in_bytes
+	echo $limit_subgroup > subgroup/memory.limit_in_bytes
 
 	start_memcg_process --mmap-anon -s $PAGESIZES