Message ID | 20210118163240.29558-1-mdoucha@suse.cz |
---|---|
State | Accepted |
Headers | show |
Series | tst_pollute_memory(): Set minimal safety margin to 64MB | expand |
Hi! > SAFE_SYSINFO(&info); > - safety = 4096 * SAFE_SYSCONF(_SC_PAGESIZE) / info.mem_unit; > + safety = MAX(4096 * SAFE_SYSCONF(_SC_PAGESIZE), 64 * 1024 * 1024); > + safety /= info.mem_unit; I guess that this is safe enough for the release, since it will only increase the safety margin. Naresh can you please test this patch ASAP?
On Tue, 19 Jan 2021 at 18:12, Cyril Hrubis <chrubis@suse.cz> wrote: > > Hi! > > SAFE_SYSINFO(&info); > > - safety = 4096 * SAFE_SYSCONF(_SC_PAGESIZE) / info.mem_unit; > > + safety = MAX(4096 * SAFE_SYSCONF(_SC_PAGESIZE), 64 * 1024 * 1024); > > + safety /= info.mem_unit; > > I guess that this is safe enough for the release, since it will only > increase the safety margin. > > Naresh can you please test this patch ASAP? I have applied your patch and rebuilt completely and retested ioctl_sg01 test case in a loop on three different devices. 1 PASS out of 20 runs with overcommit_memory 0 on x86_64. 1 PASS out of 20 runs with overcommit_memory 1 on x86_64. Which means 19 times the test case triggered oom-killer and the test was broken. - Naresh
Hi! > > > - safety = 4096 * SAFE_SYSCONF(_SC_PAGESIZE) / info.mem_unit; > > > + safety = MAX(4096 * SAFE_SYSCONF(_SC_PAGESIZE), 64 * 1024 * 1024); > > > + safety /= info.mem_unit; > > > > I guess that this is safe enough for the release, since it will only > > increase the safety margin. > > > > Naresh can you please test this patch ASAP? > > I have applied your patch and rebuilt completely and retested > ioctl_sg01 test case in a loop on three different devices. > > 1 PASS out of 20 runs with overcommit_memory 0 on x86_64. > 1 PASS out of 20 runs with overcommit_memory 1 on x86_64. > > Which means 19 times the test case triggered oom-killer and the test was broken. So it looks like 64MB is not enough in your case. Does it work with 128MB or 256MB? i.e.: safety = MAX(4096 * SAFE_SYSCONF(_SC_PAGESIZE), 128 * 1024 * 1024); safety = MAX(4096 * SAFE_SYSCONF(_SC_PAGESIZE), 256 * 1024 * 1024);
On Wed, 20 Jan 2021 at 20:23, Cyril Hrubis <chrubis@suse.cz> wrote: > > Hi! > > > > - safety = 4096 * SAFE_SYSCONF(_SC_PAGESIZE) / info.mem_unit; > > > > + safety = MAX(4096 * SAFE_SYSCONF(_SC_PAGESIZE), 64 * 1024 * 1024); > > > > + safety /= info.mem_unit; > > > > > > I guess that this is safe enough for the release, since it will only > > > increase the safety margin. > > > > > > Naresh can you please test this patch ASAP? > > > > I have applied your patch and rebuilt completely and retested > > ioctl_sg01 test case in a loop on three different devices. > > > > 1 PASS out of 20 runs with overcommit_memory 0 on x86_64. > > 1 PASS out of 20 runs with overcommit_memory 1 on x86_64. > > > > Which means 19 times the test case triggered oom-killer and the test was broken. > > So it looks like 64MB is not enough in your case. > > Does it work with 128MB or 256MB? > > i.e.: > > safety = MAX(4096 * SAFE_SYSCONF(_SC_PAGESIZE), 128 * 1024 * 1024); > safety = MAX(4096 * SAFE_SYSCONF(_SC_PAGESIZE), 256 * 1024 * 1024); I will test with these changes and get back to you. Meanwhile, 20 PASS out of 20 runs with overcommit_memory 2 on x86_64. - Naresh
Hi! > > > I have applied your patch and rebuilt completely and retested > > > ioctl_sg01 test case in a loop on three different devices. > > > > > > 1 PASS out of 20 runs with overcommit_memory 0 on x86_64. > > > 1 PASS out of 20 runs with overcommit_memory 1 on x86_64. > > > > > > Which means 19 times the test case triggered oom-killer and the test was broken. > > > > So it looks like 64MB is not enough in your case. > > > > Does it work with 128MB or 256MB? > > > > i.e.: > > > > safety = MAX(4096 * SAFE_SYSCONF(_SC_PAGESIZE), 128 * 1024 * 1024); > > safety = MAX(4096 * SAFE_SYSCONF(_SC_PAGESIZE), 256 * 1024 * 1024); > > > I will test with these changes and get back to you. > > Meanwhile, > > 20 PASS out of 20 runs with overcommit_memory 2 on x86_64. That is to be expected, since with overcommit_memory=2 we fail the mmap() when the kernel does not think there is enough free memory. See the part where we do: if (map_blocks[i] == MAP_FAILED) { map_count = i; break; } It would be interesting to print the map_count and the value of i in this fucntion and see where it breaks with overcommit_memory=2. i.e. if (map_blocks[i] == MAP_FAILED) { tst_res(TINFO, "mmap() failed at %zu expected %zu", i, map_count); map_count = i; break; }
On Wed, 20 Jan 2021 at 20:35, Cyril Hrubis <chrubis@suse.cz> wrote: > > Hi! > > > > I have applied your patch and rebuilt completely and retested > > > > ioctl_sg01 test case in a loop on three different devices. > > > > > > > > 1 PASS out of 20 runs with overcommit_memory 0 on x86_64. > > > > 1 PASS out of 20 runs with overcommit_memory 1 on x86_64. > > > > > > > > Which means 19 times the test case triggered oom-killer and the test was broken. > > > > > > So it looks like 64MB is not enough in your case. > > > > > > Does it work with 128MB or 256MB? > > > > > > i.e.: > > > > > > safety = MAX(4096 * SAFE_SYSCONF(_SC_PAGESIZE), 128 * 1024 * 1024); After changing to 128MB the test got PASS with overcommit_memory 0 on x86_64. - Naresh
On 20. 01. 21 18:04, Naresh Kamboju wrote: > On Wed, 20 Jan 2021 at 20:35, Cyril Hrubis <chrubis@suse.cz> wrote: >>>> >>>> So it looks like 64MB is not enough in your case. >>>> >>>> Does it work with 128MB or 256MB? >>>> >>>> i.e.: >>>> >>>> safety = MAX(4096 * SAFE_SYSCONF(_SC_PAGESIZE), 128 * 1024 * 1024); > > After changing to 128MB the test got PASS with overcommit_memory 0 on x86_64. OK, let's go with 128MB then.
Hi!
> After changing to 128MB the test got PASS with overcommit_memory 0 on x86_64.
Thanks a lot, fix pushed.
diff --git a/lib/tst_memutils.c b/lib/tst_memutils.c index f134d90c9..c20c94a00 100644 --- a/lib/tst_memutils.c +++ b/lib/tst_memutils.c @@ -20,7 +20,8 @@ void tst_pollute_memory(size_t maxsize, int fillchar) struct sysinfo info; SAFE_SYSINFO(&info); - safety = 4096 * SAFE_SYSCONF(_SC_PAGESIZE) / info.mem_unit; + safety = MAX(4096 * SAFE_SYSCONF(_SC_PAGESIZE), 64 * 1024 * 1024); + safety /= info.mem_unit; if (info.freeswap > safety) safety = 0;
4096 pages amounts to 16MB on x86_64 or 256MB on PPC64. 16MB is not enough to avoid OOM killer on some systems so set the safety margin to either 64MB or 4096 pages, whichever is higher. Signed-off-by: Martin Doucha <mdoucha@suse.cz> --- Here's my proposed fix for the OOM issues with ioctl_sg01. I can't reproduce the issue on my x86_64 VMs/machines so please confirm that it works. Don't forget to run `make clean` after applying this patch because ioctl_sg01 will not be rebuilt automatically due to weak dependency on libltp.a. lib/tst_memutils.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)