Message ID | 20210702125338.43248-5-krzysztof.kozlowski@canonical.com |
---|---|
State | Superseded |
Series | controllers/memcg: fixes for newer kernels |
On 02/07/2021 14:53, Krzysztof Kozlowski wrote:
> The memcg_subgroup_charge was failing on kernel v5.8 in around 10% cases
> with:
>
> memcg_subgroup_charge 1 TINFO: Running memcg_process --mmap-anon -s 135168
> memcg_subgroup_charge 1 TINFO: Warming up pid: 19289
> memcg_subgroup_charge 1 TINFO: Process is still here after warm up: 19289
> memcg_subgroup_charge 1 TFAIL: rss is 0, 135168 expected
> memcg_subgroup_charge 1 TPASS: rss is 0 as expected
>
> In dmesg one could see that OOM killer killed the process even though
> group memory limit was matching the usage:
>
> memcg_process invoked oom-killer: gfp_mask=0xcc0(GFP_KERNEL), order=0, oom_score_adj=0
> CPU: 4 PID: 19289 Comm: memcg_process Not tainted 5.8.0-1031-oracle #32~20.04.2-Ubuntu
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.4.1 12/03/2020
> ...
> memory: usage 132kB, limit 132kB, failcnt 9
> memory+swap: usage 132kB, limit 9007199254740988kB, failcnt 0
> kmem: usage 4kB, limit 9007199254740988kB, failcnt 0
> ...
> Tasks state (memory values in pages):
> [  pid  ]   uid  tgid total_vm      rss pgtables_bytes swapents oom_score_adj name
> [  19289]     0 19289      669      389          40960        0             0 memcg_process
> oom-kill:constraint=CONSTRAINT_MEMCG,nodemask=(null),cpuset=/,mems_allowed=0,oom_memcg=/ltp_19257,task_memcg=/ltp_19257,task=memcg_process,pid=19289,uid=0
> Memory cgroup out of memory: Killed process 19289 (memcg_process) total-vm:2676kB, anon-rss:84kB, file-rss:1468kB, shmem-rss:4kB, UID:0 pgtables:40kB oom_score_adj:0
> oom_reaper: reaped process 19289 (memcg_process), now anon-rss:0kB, file-rss:0kB, shmem-rss:4kB
>
> It seems using 100% of memory assigned to given group might trigger OOM,
> so add a space of at least one page.
>
> Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@canonical.com>
> ---
>  .../memcg/functional/memcg_subgroup_charge.sh | 16 ++++++++++++++--
>  1 file changed, 14 insertions(+), 2 deletions(-)
>
> diff --git a/testcases/kernel/controllers/memcg/functional/memcg_subgroup_charge.sh b/testcases/kernel/controllers/memcg/functional/memcg_subgroup_charge.sh
> index 9b23177a4dc5..88ddbabf7fa9 100755
> --- a/testcases/kernel/controllers/memcg/functional/memcg_subgroup_charge.sh
> +++ b/testcases/kernel/controllers/memcg/functional/memcg_subgroup_charge.sh
> @@ -19,9 +19,21 @@ TST_CNT=3
>  # $2 - memory.limit_in_bytes in sub group
>  test_subgroup()
>  {
> +	local limit_parent=$1
> +	local limit_subgroup=$2
> +
> +	# OOM might start killing if memory usage is 100%, so give it
> +	# always one page size more:
> +	if [ $limit_parent -ne 0 ]; then
> +		limit_parent=$((limit_parent + PAGESIZE))

This patch is independent from the other usage_in_bytes checks. I included
it here but maybe that's just noise...

Anyway, for v5.11 I just saw that increasing the limit by one page is not
enough. Probably due to kernel memory accounted to the group:

[23868.177525] memory: usage 140kB, limit 140kB, failcnt 19
[23868.177527] memory+swap: usage 140kB, limit 9007199254740988kB, failcnt 0
[23868.177529] kmem: usage 16kB, limit 9007199254740988kB, failcnt 0

I am thinking of increasing this to 4 pages, although in the future it
might not be enough.

> +	fi
> +	if [ $limit_subgroup -ne 0 ]; then
> +		limit_subgroup=$((limit_subgroup + PAGESIZE))
> +	fi
> +
>  	mkdir subgroup
> -	echo $1 > memory.limit_in_bytes
> -	echo $2 > subgroup/memory.limit_in_bytes
> +	echo $limit_parent > memory.limit_in_bytes
> +	echo $limit_subgroup > subgroup/memory.limit_in_bytes
>
>  	start_memcg_process --mmap-anon -s $PAGESIZES

Best regards,
Krzysztof
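A minimal sketch of the 4-page idea floated above, kept separate from the actual LTP code: add_margin and MARGIN_PAGES are hypothetical names, the value 4 is only the guess from this thread rather than a verified bound, and PAGESIZE is read with getconf instead of the LTP shell library.

#!/bin/sh
# Sketch only: add a margin of a few pages on top of the requested memcg
# limit, so per-group kernel memory (kmem) charges do not push usage to
# 100% of the limit. MARGIN_PAGES=4 is just the value discussed above.
PAGESIZE=$(getconf PAGESIZE)
MARGIN_PAGES=4

add_margin()
{
	local limit=$1

	# A limit of 0 means "no limit requested", keep it as-is.
	if [ "$limit" -ne 0 ]; then
		limit=$((limit + MARGIN_PAGES * PAGESIZE))
	fi
	echo "$limit"
}

# Example: 33 pages (135168 bytes with 4 KiB pages) requested
# -> 37 pages worth of bytes would be written to memory.limit_in_bytes.
add_margin $((33 * PAGESIZE))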
diff --git a/testcases/kernel/controllers/memcg/functional/memcg_subgroup_charge.sh b/testcases/kernel/controllers/memcg/functional/memcg_subgroup_charge.sh
index 9b23177a4dc5..88ddbabf7fa9 100755
--- a/testcases/kernel/controllers/memcg/functional/memcg_subgroup_charge.sh
+++ b/testcases/kernel/controllers/memcg/functional/memcg_subgroup_charge.sh
@@ -19,9 +19,21 @@ TST_CNT=3
 # $2 - memory.limit_in_bytes in sub group
 test_subgroup()
 {
+	local limit_parent=$1
+	local limit_subgroup=$2
+
+	# OOM might start killing if memory usage is 100%, so give it
+	# always one page size more:
+	if [ $limit_parent -ne 0 ]; then
+		limit_parent=$((limit_parent + PAGESIZE))
+	fi
+	if [ $limit_subgroup -ne 0 ]; then
+		limit_subgroup=$((limit_subgroup + PAGESIZE))
+	fi
+
 	mkdir subgroup
-	echo $1 > memory.limit_in_bytes
-	echo $2 > subgroup/memory.limit_in_bytes
+	echo $limit_parent > memory.limit_in_bytes
+	echo $limit_subgroup > subgroup/memory.limit_in_bytes
 
 	start_memcg_process --mmap-anon -s $PAGESIZES
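To see why a single extra page can still be too little, a rough way to watch the kernel-memory charge that counts against the same limit is to read the v1 memcg files directly. This is only an illustrative sketch: the group path below is made up (the test itself creates ltp_<pid> groups under the memory controller mount, as in the OOM report above).

#!/bin/sh
# Sketch only: dump limit, total usage and kmem usage of a v1 memory cgroup.
# /sys/fs/cgroup/memory/ltp_test is an example path, not one the test uses.
cg=/sys/fs/cgroup/memory/ltp_test

for f in memory.limit_in_bytes memory.usage_in_bytes memory.kmem.usage_in_bytes; do
	printf '%-28s %s\n' "$f" "$(cat "$cg/$f")"
done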
The memcg_subgroup_charge test was failing on kernel v5.8 in around 10% of
runs with:

memcg_subgroup_charge 1 TINFO: Running memcg_process --mmap-anon -s 135168
memcg_subgroup_charge 1 TINFO: Warming up pid: 19289
memcg_subgroup_charge 1 TINFO: Process is still here after warm up: 19289
memcg_subgroup_charge 1 TFAIL: rss is 0, 135168 expected
memcg_subgroup_charge 1 TPASS: rss is 0 as expected

In dmesg one could see that the OOM killer killed the process even though
the group's memory usage matched its limit:

memcg_process invoked oom-killer: gfp_mask=0xcc0(GFP_KERNEL), order=0, oom_score_adj=0
CPU: 4 PID: 19289 Comm: memcg_process Not tainted 5.8.0-1031-oracle #32~20.04.2-Ubuntu
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.4.1 12/03/2020
...
memory: usage 132kB, limit 132kB, failcnt 9
memory+swap: usage 132kB, limit 9007199254740988kB, failcnt 0
kmem: usage 4kB, limit 9007199254740988kB, failcnt 0
...
Tasks state (memory values in pages):
[  pid  ]   uid  tgid total_vm      rss pgtables_bytes swapents oom_score_adj name
[  19289]     0 19289      669      389          40960        0             0 memcg_process
oom-kill:constraint=CONSTRAINT_MEMCG,nodemask=(null),cpuset=/,mems_allowed=0,oom_memcg=/ltp_19257,task_memcg=/ltp_19257,task=memcg_process,pid=19289,uid=0
Memory cgroup out of memory: Killed process 19289 (memcg_process) total-vm:2676kB, anon-rss:84kB, file-rss:1468kB, shmem-rss:4kB, UID:0 pgtables:40kB oom_score_adj:0
oom_reaper: reaped process 19289 (memcg_process), now anon-rss:0kB, file-rss:0kB, shmem-rss:4kB

It seems that using 100% of the memory assigned to a given group might
trigger the OOM killer, so always leave headroom of at least one page.

Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@canonical.com>
---
 .../memcg/functional/memcg_subgroup_charge.sh | 16 ++++++++++++++--
 1 file changed, 14 insertions(+), 2 deletions(-)
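For reference, the numbers in the report line up exactly: memcg_process maps 135168 bytes and the limit was set to the same value, so the 4 kB of kmem shown in the log is presumably charged against the same 132 kB limit and leaves no room for the last anonymous page. A quick sanity check of that arithmetic, assuming the usual 4096-byte page size:

#!/bin/sh
# Quick check of the figures from the log above (4096-byte pages assumed).
echo $((135168 / 4096))   # 33 pages mapped by memcg_process
echo $((135168 / 1024))   # 132 kB -> matches "usage 132kB, limit 132kB"
echo $((135168 - 4096))   # bytes left for anon once one page of kmem is charged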