Message ID | 20211020091353.90731-1-krzysztof.kozlowski@canonical.com |
---|---|
State | Superseded |
Headers | show |
Series | [v3] lib: memutils: respect minimum memory watermark when polluting memory | expand |
On Wed, Oct 20, 2021 at 5:14 PM Krzysztof Kozlowski < krzysztof.kozlowski@canonical.com> wrote: > Previous fix for an out-of-memory killer killing ioctl_sg01 process > in commit 4d2e3d44fad5 ("lib: memutils: don't pollute > entire system memory to avoid OoM") was not fully effective. While it > covers most of the cases, an ARM64 machine with 128 GB of memory, 64 kB > page size and v5.11 kernel hit it again. Polluting the memory fails > with OoM: > > LTP: starting ioctl_sg01 > ioctl_sg01 invoked oom-killer: > gfp_mask=0x100dca(GFP_HIGHUSER_MOVABLE|__GFP_ZERO), order=0, oom_score_adj=0 > ... > Mem-Info: > active_anon:309 inactive_anon:1964781 isolated_anon:0 > active_file:94 inactive_file:0 isolated_file:0 > unevictable:305 dirty:0 writeback:0 > slab_reclaimable:1510 slab_unreclaimable:5012 > mapped:115 shmem:339 pagetables:463 bounce:0 > free:112043 free_pcp:1 free_cma:3159 > Node 0 active_anon:19776kB inactive_anon:125745984kB active_file:6016kB > inactive_file:0kB unevictable:19520kB ... > Node 0 DMA free:710656kB min:205120kB low:256384kB high:307648kB > reserved_highatomic:0KB active_anon:0kB inactive_anon:3332032kB ... > lowmem_reserve[]: 0 0 7908 7908 7908 > Node 0 Normal free:6460096kB min:6463168kB low:8078912kB high:9694656kB > reserved_highatomic:0KB active_anon:19776kB inactive_anon:122413952kB ... > lowmem_reserve[]: 0 0 0 0 0 > > The important part are details of memory on Node 0 in Normal zone: > 1. free memory: 6460096 kB > 2. min (minimum watermark): 6463168 kB > > Parse the /proc/sys/vm/min_free_kbytes which contains the free > memory level used as minimum watermark (triggering OoM killer). > > Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@canonical.com> > > --- > > Changes since v2: > 1. Use /proc/sys/vm/min_free_kbytes instead of parsing zoneinfo, thanks > tgo Liu Xinpeng. > > Changes since v1: > 1. Add static and rename to count_min_pages(). > --- > lib/tst_memutils.c | 8 +++++++- > 1 file changed, 7 insertions(+), 1 deletion(-) > > diff --git a/lib/tst_memutils.c b/lib/tst_memutils.c > index af132bcc6c24..df53c542d239 100644 > --- a/lib/tst_memutils.c > +++ b/lib/tst_memutils.c > @@ -16,12 +16,18 @@ > void tst_pollute_memory(size_t maxsize, int fillchar) > { > size_t i, map_count = 0, safety = 0, blocksize = BLOCKSIZE; > + unsigned long min_free; > void **map_blocks; > struct sysinfo info; > > + SAFE_FILE_SCANF("/proc/sys/vm/min_free_kbytes", "%lu", &min_free); > + min_free *= 1024; > + /* Apply a margin because we cannot get below "min" watermark */ > + min_free += min_free / 10; > + > SAFE_SYSINFO(&info); > safety = MAX(4096 * SAFE_SYSCONF(_SC_PAGESIZE), 128 * 1024 * 1024); > - safety = MAX(safety, (info.freeram / 64)); > + safety = MAX(safety, min_free); > Therically this is correct, and I believe it will work on your tested machine. But my concern is ioctl_sg01 still fails on the special system which MemAvai < MemFree. Just like the one Xinpeng mentioned before: https://lists.linux.it/pipermail/ltp/2021-January/020817.html [root@test-env-nm05-compute-14e5e72e38 ~]# cat /proc/meminfo MemTotal: 526997420 kB MemFree: 520224908 kB MemAvailable: 519936744 kB... [root@test-env-nm05-compute-14e5e72e38 ~]# cat /proc/sys/vm/min_free_kbytes 90112 There even reserve the safety to the 128MB, still less than the gap between MemFree and MemAvailable. MemFree (520224908 kB) - MemAvailable (520224908 kB) = 288164 kB > safety
On 21/10/2021 09:21, Li Wang wrote: > > > On Wed, Oct 20, 2021 at 5:14 PM Krzysztof Kozlowski > <krzysztof.kozlowski@canonical.com > <mailto:krzysztof.kozlowski@canonical.com>> wrote: > > Previous fix for an out-of-memory killer killing ioctl_sg01 process > in commit 4d2e3d44fad5 ("lib: memutils: don't pollute > entire system memory to avoid OoM") was not fully effective. While it > covers most of the cases, an ARM64 machine with 128 GB of memory, 64 kB > page size and v5.11 kernel hit it again. Polluting the memory fails > with OoM: > > LTP: starting ioctl_sg01 > ioctl_sg01 invoked oom-killer: > gfp_mask=0x100dca(GFP_HIGHUSER_MOVABLE|__GFP_ZERO), order=0, > oom_score_adj=0 > ... > Mem-Info: > active_anon:309 inactive_anon:1964781 isolated_anon:0 > active_file:94 inactive_file:0 isolated_file:0 > unevictable:305 dirty:0 writeback:0 > slab_reclaimable:1510 slab_unreclaimable:5012 > mapped:115 shmem:339 pagetables:463 bounce:0 > free:112043 free_pcp:1 free_cma:3159 > Node 0 active_anon:19776kB inactive_anon:125745984kB > active_file:6016kB inactive_file:0kB unevictable:19520kB ... > Node 0 DMA free:710656kB min:205120kB low:256384kB high:307648kB > reserved_highatomic:0KB active_anon:0kB inactive_anon:3332032kB ... > lowmem_reserve[]: 0 0 7908 7908 7908 > Node 0 Normal free:6460096kB min:6463168kB low:8078912kB > high:9694656kB reserved_highatomic:0KB active_anon:19776kB > inactive_anon:122413952kB ... > lowmem_reserve[]: 0 0 0 0 0 > > The important part are details of memory on Node 0 in Normal zone: > 1. free memory: 6460096 kB > 2. min (minimum watermark): 6463168 kB > > Parse the /proc/sys/vm/min_free_kbytes which contains the free > memory level used as minimum watermark (triggering OoM killer). > > Signed-off-by: Krzysztof Kozlowski > <krzysztof.kozlowski@canonical.com > <mailto:krzysztof.kozlowski@canonical.com>> > > --- > > Changes since v2: > 1. Use /proc/sys/vm/min_free_kbytes instead of parsing zoneinfo, thanks > tgo Liu Xinpeng. > > Changes since v1: > 1. Add static and rename to count_min_pages(). > --- > lib/tst_memutils.c | 8 +++++++- > 1 file changed, 7 insertions(+), 1 deletion(-) > > diff --git a/lib/tst_memutils.c b/lib/tst_memutils.c > index af132bcc6c24..df53c542d239 100644 > --- a/lib/tst_memutils.c > +++ b/lib/tst_memutils.c > @@ -16,12 +16,18 @@ > void tst_pollute_memory(size_t maxsize, int fillchar) > { > size_t i, map_count = 0, safety = 0, blocksize = BLOCKSIZE; > + unsigned long min_free; > void **map_blocks; > struct sysinfo info; > > + SAFE_FILE_SCANF("/proc/sys/vm/min_free_kbytes", "%lu", > &min_free); > + min_free *= 1024; > + /* Apply a margin because we cannot get below "min" watermark */ > + min_free += min_free / 10; > + > SAFE_SYSINFO(&info); > safety = MAX(4096 * SAFE_SYSCONF(_SC_PAGESIZE), 128 * 1024 * > 1024); > - safety = MAX(safety, (info.freeram / 64)); > + safety = MAX(safety, min_free); > > > Therically this is correct, and I believe it will work on your tested > machine. > > But my concern is ioctl_sg01 still fails on the special system which > MemAvai < MemFree. > > Just like the one Xinpeng mentioned before: > https://lists.linux.it/pipermail/ltp/2021-January/020817.html > <https://lists.linux.it/pipermail/ltp/2021-January/020817.html> > > [root@test-env-nm05-compute-14e5e72e38 > <mailto:root@test-env-nm05-compute-14e5e72e38>~]# cat /proc/meminfo > > MemTotal: 526997420 kB > MemFree: 520224908 kB > MemAvailable: 519936744 kB > ... > > [root@test-env-nm05-compute-14e5e72e38 <mailto:root@test-env-nm05-compute-14e5e72e38> ~]# cat /proc/sys/vm/min_free_kbytes > 90112 > > > There even reserve the safety to the 128MB, still less than the gap > between MemFree and MemAvailable. > > MemFree (520224908 kB) - MemAvailable (520224908 kB) = 288164 kB > safety I don't have such case and I am not sure it is reasonable. As mentioned in the thread there it looks unusual to have less available memory than free. Maybe the system has some weird memory accounting because MemAvailable is counted from MemFree by adding memory which can be reclaimed. When adding a non-negative number, you should not end up with lower MemAvailable than MemFree. :) Maybe that's the reason why that patch was not accepted - the system is not vanilla, not common? Best regards, Krzysztof
On 21/10/2021 09:36, Krzysztof Kozlowski wrote: > On 21/10/2021 09:21, Li Wang wrote: >> >> >> On Wed, Oct 20, 2021 at 5:14 PM Krzysztof Kozlowski >> <krzysztof.kozlowski@canonical.com >> <mailto:krzysztof.kozlowski@canonical.com>> wrote: >> >> Previous fix for an out-of-memory killer killing ioctl_sg01 process >> in commit 4d2e3d44fad5 ("lib: memutils: don't pollute >> entire system memory to avoid OoM") was not fully effective. While it >> covers most of the cases, an ARM64 machine with 128 GB of memory, 64 kB >> page size and v5.11 kernel hit it again. Polluting the memory fails >> with OoM: >> >> LTP: starting ioctl_sg01 >> ioctl_sg01 invoked oom-killer: >> gfp_mask=0x100dca(GFP_HIGHUSER_MOVABLE|__GFP_ZERO), order=0, >> oom_score_adj=0 >> ... >> Mem-Info: >> active_anon:309 inactive_anon:1964781 isolated_anon:0 >> active_file:94 inactive_file:0 isolated_file:0 >> unevictable:305 dirty:0 writeback:0 >> slab_reclaimable:1510 slab_unreclaimable:5012 >> mapped:115 shmem:339 pagetables:463 bounce:0 >> free:112043 free_pcp:1 free_cma:3159 >> Node 0 active_anon:19776kB inactive_anon:125745984kB >> active_file:6016kB inactive_file:0kB unevictable:19520kB ... >> Node 0 DMA free:710656kB min:205120kB low:256384kB high:307648kB >> reserved_highatomic:0KB active_anon:0kB inactive_anon:3332032kB ... >> lowmem_reserve[]: 0 0 7908 7908 7908 >> Node 0 Normal free:6460096kB min:6463168kB low:8078912kB >> high:9694656kB reserved_highatomic:0KB active_anon:19776kB >> inactive_anon:122413952kB ... >> lowmem_reserve[]: 0 0 0 0 0 >> >> The important part are details of memory on Node 0 in Normal zone: >> 1. free memory: 6460096 kB >> 2. min (minimum watermark): 6463168 kB >> >> Parse the /proc/sys/vm/min_free_kbytes which contains the free >> memory level used as minimum watermark (triggering OoM killer). >> >> Signed-off-by: Krzysztof Kozlowski >> <krzysztof.kozlowski@canonical.com >> <mailto:krzysztof.kozlowski@canonical.com>> >> >> --- >> >> Changes since v2: >> 1. Use /proc/sys/vm/min_free_kbytes instead of parsing zoneinfo, thanks >> tgo Liu Xinpeng. >> >> Changes since v1: >> 1. Add static and rename to count_min_pages(). >> --- >> lib/tst_memutils.c | 8 +++++++- >> 1 file changed, 7 insertions(+), 1 deletion(-) >> >> diff --git a/lib/tst_memutils.c b/lib/tst_memutils.c >> index af132bcc6c24..df53c542d239 100644 >> --- a/lib/tst_memutils.c >> +++ b/lib/tst_memutils.c >> @@ -16,12 +16,18 @@ >> void tst_pollute_memory(size_t maxsize, int fillchar) >> { >> size_t i, map_count = 0, safety = 0, blocksize = BLOCKSIZE; >> + unsigned long min_free; >> void **map_blocks; >> struct sysinfo info; >> >> + SAFE_FILE_SCANF("/proc/sys/vm/min_free_kbytes", "%lu", >> &min_free); >> + min_free *= 1024; >> + /* Apply a margin because we cannot get below "min" watermark */ >> + min_free += min_free / 10; >> + >> SAFE_SYSINFO(&info); >> safety = MAX(4096 * SAFE_SYSCONF(_SC_PAGESIZE), 128 * 1024 * >> 1024); >> - safety = MAX(safety, (info.freeram / 64)); >> + safety = MAX(safety, min_free); >> >> >> Therically this is correct, and I believe it will work on your tested >> machine. >> >> But my concern is ioctl_sg01 still fails on the special system which >> MemAvai < MemFree. >> >> Just like the one Xinpeng mentioned before: >> https://lists.linux.it/pipermail/ltp/2021-January/020817.html >> <https://lists.linux.it/pipermail/ltp/2021-January/020817.html> >> >> [root@test-env-nm05-compute-14e5e72e38 >> <mailto:root@test-env-nm05-compute-14e5e72e38>~]# cat /proc/meminfo >> >> MemTotal: 526997420 kB >> MemFree: 520224908 kB >> MemAvailable: 519936744 kB >> ... >> >> [root@test-env-nm05-compute-14e5e72e38 <mailto:root@test-env-nm05-compute-14e5e72e38> ~]# cat /proc/sys/vm/min_free_kbytes >> 90112 >> >> >> There even reserve the safety to the 128MB, still less than the gap >> between MemFree and MemAvailable. >> >> MemFree (520224908 kB) - MemAvailable (520224908 kB) = 288164 kB > safety > > I don't have such case and I am not sure it is reasonable. > > As mentioned in the thread there it looks unusual to have less available > memory than free. Maybe the system has some weird memory accounting > because MemAvailable is counted from MemFree by adding memory which can > be reclaimed. When adding a non-negative number, you should not end up > with lower MemAvailable than MemFree. :) > > Maybe that's the reason why that patch was not accepted - the system is > not vanilla, not common? OK, I found a possible explanation (on vanilla kernel) - the totalreserve_pages. This is the only subtraction from free memory when counting available. This could happen if someone was setting sysctl lowmem_reserve_ratio or min_free_kbytes. When setting min_free_kbytes, this will be reflected in /proc/sys/vm/min_free_kbytes, so we are good. When setting vm.lowmem_reserve_ratio, this will be missed by my patch and MemAvailable could be lower than MemFree. I'll send a v4. Best regards, Krzysztof
> >> Therically this is correct, and I believe it will work on your tested > >> machine. > >> > >> But my concern is ioctl_sg01 still fails on the special system which > >> MemAvai < MemFree. > >> > >> Just like the one Xinpeng mentioned before: > >> https://lists.linux.it/pipermail/ltp/2021-January/020817.html > >> <https://lists.linux.it/pipermail/ltp/2021-January/020817.html> > >> > >> [root@test-env-nm05-compute-14e5e72e38 > >> <mailto:root@test-env-nm05-compute-14e5e72e38>~]# cat /proc/meminfo > >> > >> MemTotal: 526997420 kB > >> MemFree: 520224908 kB > >> MemAvailable: 519936744 kB > >> ... > >> > >> [root@test-env-nm05-compute-14e5e72e38 <mailto: > root@test-env-nm05-compute-14e5e72e38> ~]# cat > /proc/sys/vm/min_free_kbytes > >> 90112 > >> > >> > >> There even reserve the safety to the 128MB, still less than the gap > >> between MemFree and MemAvailable. > >> > >> MemFree (520224908 kB) - MemAvailable (520224908 kB) = 288164 kB > > safety > > > > I don't have such case and I am not sure it is reasonable. > > > > As mentioned in the thread there it looks unusual to have less available > > memory than free. Maybe the system has some weird memory accounting > > because MemAvailable is counted from MemFree by adding memory which can > > be reclaimed. When adding a non-negative number, you should not end up > > with lower MemAvailable than MemFree. :) > > > > Maybe that's the reason why that patch was not accepted - the system is > > not vanilla, not common? > > OK, I found a possible explanation (on vanilla kernel) - the > totalreserve_pages. This is the only subtraction from free memory when > counting available. This could happen if someone was setting sysctl > lowmem_reserve_ratio or min_free_kbytes. > That's exactly, beside the two controllers, you could also get such a system with enabling smaller swap space on aarch64/x86_64. (I did that and found that 'MemFree > MemAvail' is common to see) > When setting min_free_kbytes, this will be reflected in > /proc/sys/vm/min_free_kbytes, so we are good. > > When setting vm.lowmem_reserve_ratio, this will be missed by my patch > and MemAvailable could be lower than MemFree. > Yes, we have to face different kinds of system configuration in testing. Maybe we'd better keep the (info.freeram/64) in MAX() as well, Or, do a comparison between MemFree and MemAvail to choose the smaller one as baseline.
diff --git a/lib/tst_memutils.c b/lib/tst_memutils.c index af132bcc6c24..df53c542d239 100644 --- a/lib/tst_memutils.c +++ b/lib/tst_memutils.c @@ -16,12 +16,18 @@ void tst_pollute_memory(size_t maxsize, int fillchar) { size_t i, map_count = 0, safety = 0, blocksize = BLOCKSIZE; + unsigned long min_free; void **map_blocks; struct sysinfo info; + SAFE_FILE_SCANF("/proc/sys/vm/min_free_kbytes", "%lu", &min_free); + min_free *= 1024; + /* Apply a margin because we cannot get below "min" watermark */ + min_free += min_free / 10; + SAFE_SYSINFO(&info); safety = MAX(4096 * SAFE_SYSCONF(_SC_PAGESIZE), 128 * 1024 * 1024); - safety = MAX(safety, (info.freeram / 64)); + safety = MAX(safety, min_free); safety /= info.mem_unit; if (info.freeswap > safety)
Previous fix for an out-of-memory killer killing ioctl_sg01 process in commit 4d2e3d44fad5 ("lib: memutils: don't pollute entire system memory to avoid OoM") was not fully effective. While it covers most of the cases, an ARM64 machine with 128 GB of memory, 64 kB page size and v5.11 kernel hit it again. Polluting the memory fails with OoM: LTP: starting ioctl_sg01 ioctl_sg01 invoked oom-killer: gfp_mask=0x100dca(GFP_HIGHUSER_MOVABLE|__GFP_ZERO), order=0, oom_score_adj=0 ... Mem-Info: active_anon:309 inactive_anon:1964781 isolated_anon:0 active_file:94 inactive_file:0 isolated_file:0 unevictable:305 dirty:0 writeback:0 slab_reclaimable:1510 slab_unreclaimable:5012 mapped:115 shmem:339 pagetables:463 bounce:0 free:112043 free_pcp:1 free_cma:3159 Node 0 active_anon:19776kB inactive_anon:125745984kB active_file:6016kB inactive_file:0kB unevictable:19520kB ... Node 0 DMA free:710656kB min:205120kB low:256384kB high:307648kB reserved_highatomic:0KB active_anon:0kB inactive_anon:3332032kB ... lowmem_reserve[]: 0 0 7908 7908 7908 Node 0 Normal free:6460096kB min:6463168kB low:8078912kB high:9694656kB reserved_highatomic:0KB active_anon:19776kB inactive_anon:122413952kB ... lowmem_reserve[]: 0 0 0 0 0 The important part are details of memory on Node 0 in Normal zone: 1. free memory: 6460096 kB 2. min (minimum watermark): 6463168 kB Parse the /proc/sys/vm/min_free_kbytes which contains the free memory level used as minimum watermark (triggering OoM killer). Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@canonical.com> --- Changes since v2: 1. Use /proc/sys/vm/min_free_kbytes instead of parsing zoneinfo, thanks tgo Liu Xinpeng. Changes since v1: 1. Add static and rename to count_min_pages(). --- lib/tst_memutils.c | 8 +++++++- 1 file changed, 7 insertions(+), 1 deletion(-)