diff mbox series

tst_pollute_memory(): Set minimal safety margin to 64MB

Message ID 20210118163240.29558-1-mdoucha@suse.cz
State Accepted
Headers show
Series tst_pollute_memory(): Set minimal safety margin to 64MB | expand

Commit Message

Martin Doucha Jan. 18, 2021, 4:32 p.m. UTC
4096 pages amounts to 16MB on x86_64 or 256MB on PPC64. 16MB is not enough
to avoid OOM killer on some systems so set the safety margin to either 64MB
or 4096 pages, whichever is higher.

Signed-off-by: Martin Doucha <mdoucha@suse.cz>
---

Here's my proposed fix for the OOM issues with ioctl_sg01. I can't reproduce
the issue on my x86_64 VMs/machines so please confirm that it works.

Don't forget to run `make clean` after applying this patch because ioctl_sg01
will not be rebuilt automatically due to weak dependency on libltp.a.

 lib/tst_memutils.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

Comments

Cyril Hrubis Jan. 19, 2021, 12:43 p.m. UTC | #1
Hi!
>  	SAFE_SYSINFO(&info);
> -	safety = 4096 * SAFE_SYSCONF(_SC_PAGESIZE) / info.mem_unit;
> +	safety = MAX(4096 * SAFE_SYSCONF(_SC_PAGESIZE), 64 * 1024 * 1024);
> +	safety /= info.mem_unit;

I guess that this is safe enough for the release, since it will only
increase the safety margin.

Naresh can you please test this patch ASAP?
Naresh Kamboju Jan. 20, 2021, 2:47 p.m. UTC | #2
On Tue, 19 Jan 2021 at 18:12, Cyril Hrubis <chrubis@suse.cz> wrote:
>
> Hi!
> >       SAFE_SYSINFO(&info);
> > -     safety = 4096 * SAFE_SYSCONF(_SC_PAGESIZE) / info.mem_unit;
> > +     safety = MAX(4096 * SAFE_SYSCONF(_SC_PAGESIZE), 64 * 1024 * 1024);
> > +     safety /= info.mem_unit;
>
> I guess that this is safe enough for the release, since it will only
> increase the safety margin.
>
> Naresh can you please test this patch ASAP?

I have applied your patch and rebuilt completely and retested
ioctl_sg01 test case in a loop on three different devices.

1 PASS out of 20 runs with overcommit_memory 0 on x86_64.
1 PASS out of 20 runs with overcommit_memory 1 on x86_64.

Which means 19 times the test case triggered oom-killer and the test was broken.

- Naresh
Cyril Hrubis Jan. 20, 2021, 2:54 p.m. UTC | #3
Hi!
> > > -     safety = 4096 * SAFE_SYSCONF(_SC_PAGESIZE) / info.mem_unit;
> > > +     safety = MAX(4096 * SAFE_SYSCONF(_SC_PAGESIZE), 64 * 1024 * 1024);
> > > +     safety /= info.mem_unit;
> >
> > I guess that this is safe enough for the release, since it will only
> > increase the safety margin.
> >
> > Naresh can you please test this patch ASAP?
> 
> I have applied your patch and rebuilt completely and retested
> ioctl_sg01 test case in a loop on three different devices.
> 
> 1 PASS out of 20 runs with overcommit_memory 0 on x86_64.
> 1 PASS out of 20 runs with overcommit_memory 1 on x86_64.
> 
> Which means 19 times the test case triggered oom-killer and the test was broken.

So it looks like 64MB is not enough in your case.

Does it work with 128MB or 256MB?

i.e.:

safety = MAX(4096 * SAFE_SYSCONF(_SC_PAGESIZE), 128 * 1024 * 1024);
safety = MAX(4096 * SAFE_SYSCONF(_SC_PAGESIZE), 256 * 1024 * 1024);
Naresh Kamboju Jan. 20, 2021, 2:58 p.m. UTC | #4
On Wed, 20 Jan 2021 at 20:23, Cyril Hrubis <chrubis@suse.cz> wrote:
>
> Hi!
> > > > -     safety = 4096 * SAFE_SYSCONF(_SC_PAGESIZE) / info.mem_unit;
> > > > +     safety = MAX(4096 * SAFE_SYSCONF(_SC_PAGESIZE), 64 * 1024 * 1024);
> > > > +     safety /= info.mem_unit;
> > >
> > > I guess that this is safe enough for the release, since it will only
> > > increase the safety margin.
> > >
> > > Naresh can you please test this patch ASAP?
> >
> > I have applied your patch and rebuilt completely and retested
> > ioctl_sg01 test case in a loop on three different devices.
> >
> > 1 PASS out of 20 runs with overcommit_memory 0 on x86_64.
> > 1 PASS out of 20 runs with overcommit_memory 1 on x86_64.
> >
> > Which means 19 times the test case triggered oom-killer and the test was broken.
>
> So it looks like 64MB is not enough in your case.
>
> Does it work with 128MB or 256MB?
>
> i.e.:
>
> safety = MAX(4096 * SAFE_SYSCONF(_SC_PAGESIZE), 128 * 1024 * 1024);
> safety = MAX(4096 * SAFE_SYSCONF(_SC_PAGESIZE), 256 * 1024 * 1024);


I will test with these changes and get back to you.

Meanwhile,

20 PASS out of 20 runs with overcommit_memory 2 on x86_64.

- Naresh
Cyril Hrubis Jan. 20, 2021, 3:06 p.m. UTC | #5
Hi!
> > > I have applied your patch and rebuilt completely and retested
> > > ioctl_sg01 test case in a loop on three different devices.
> > >
> > > 1 PASS out of 20 runs with overcommit_memory 0 on x86_64.
> > > 1 PASS out of 20 runs with overcommit_memory 1 on x86_64.
> > >
> > > Which means 19 times the test case triggered oom-killer and the test was broken.
> >
> > So it looks like 64MB is not enough in your case.
> >
> > Does it work with 128MB or 256MB?
> >
> > i.e.:
> >
> > safety = MAX(4096 * SAFE_SYSCONF(_SC_PAGESIZE), 128 * 1024 * 1024);
> > safety = MAX(4096 * SAFE_SYSCONF(_SC_PAGESIZE), 256 * 1024 * 1024);
> 
> 
> I will test with these changes and get back to you.
> 
> Meanwhile,
> 
> 20 PASS out of 20 runs with overcommit_memory 2 on x86_64.

That is to be expected, since with overcommit_memory=2 we fail the
mmap() when the kernel does  not think there is enough free memory. See
the part where we do:

	if (map_blocks[i] == MAP_FAILED) {
		map_count = i;
		break;
	}

It would be interesting to print the map_count and the value of i in
this fucntion and see where it breaks with overcommit_memory=2.

i.e.

	if (map_blocks[i] == MAP_FAILED) {
		tst_res(TINFO, "mmap() failed at %zu expected %zu",
		        i, map_count);
		map_count = i;
		break;
	}
Naresh Kamboju Jan. 20, 2021, 5:04 p.m. UTC | #6
On Wed, 20 Jan 2021 at 20:35, Cyril Hrubis <chrubis@suse.cz> wrote:
>
> Hi!
> > > > I have applied your patch and rebuilt completely and retested
> > > > ioctl_sg01 test case in a loop on three different devices.
> > > >
> > > > 1 PASS out of 20 runs with overcommit_memory 0 on x86_64.
> > > > 1 PASS out of 20 runs with overcommit_memory 1 on x86_64.
> > > >
> > > > Which means 19 times the test case triggered oom-killer and the test was broken.
> > >
> > > So it looks like 64MB is not enough in your case.
> > >
> > > Does it work with 128MB or 256MB?
> > >
> > > i.e.:
> > >
> > > safety = MAX(4096 * SAFE_SYSCONF(_SC_PAGESIZE), 128 * 1024 * 1024);

After changing to 128MB the test got PASS with overcommit_memory 0 on x86_64.

- Naresh
Martin Doucha Jan. 20, 2021, 5:07 p.m. UTC | #7
On 20. 01. 21 18:04, Naresh Kamboju wrote:
> On Wed, 20 Jan 2021 at 20:35, Cyril Hrubis <chrubis@suse.cz> wrote:
>>>>
>>>> So it looks like 64MB is not enough in your case.
>>>>
>>>> Does it work with 128MB or 256MB?
>>>>
>>>> i.e.:
>>>>
>>>> safety = MAX(4096 * SAFE_SYSCONF(_SC_PAGESIZE), 128 * 1024 * 1024);
> 
> After changing to 128MB the test got PASS with overcommit_memory 0 on x86_64.

OK, let's go with 128MB then.
Cyril Hrubis Jan. 21, 2021, 9:45 a.m. UTC | #8
Hi!
> After changing to 128MB the test got PASS with overcommit_memory 0 on x86_64.

Thanks a lot, fix pushed.
diff mbox series

Patch

diff --git a/lib/tst_memutils.c b/lib/tst_memutils.c
index f134d90c9..c20c94a00 100644
--- a/lib/tst_memutils.c
+++ b/lib/tst_memutils.c
@@ -20,7 +20,8 @@  void tst_pollute_memory(size_t maxsize, int fillchar)
 	struct sysinfo info;
 
 	SAFE_SYSINFO(&info);
-	safety = 4096 * SAFE_SYSCONF(_SC_PAGESIZE) / info.mem_unit;
+	safety = MAX(4096 * SAFE_SYSCONF(_SC_PAGESIZE), 64 * 1024 * 1024);
+	safety /= info.mem_unit;
 
 	if (info.freeswap > safety)
 		safety = 0;