diff mbox series

[RFC] madvise06: shrink to 1 MADV_WILLNEED page to stabilize the test

Message ID 20220615090648.405100-1-liwang@redhat.com
State RFC

Commit Message

Li Wang June 15, 2022, 9:06 a.m. UTC
Paul Bunyan reports that the madvise06 test fails intermittently on many
LTS kernels. After checking with the mm developers, we think this is
a test issue rather than a kernel bug:

   madvise06.c:231: TFAIL: 4 pages were faulted out of 2 max

This patch aims to reduce the false positives in three ways:

  1. Add a do-while loop to give madvise_willneed() more time to read
     the memory in asynchronously
  2. Raise the value of `loops` so the test waits longer if the swap cache
     has not yet reached the expected size
  3. Shrink the MADV_WILLNEED check to a single page so that the hint
     takes effect more easily
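
The first two points amount to a bounded poll: after the hint, re-read a
counter until it has grown enough or the retry budget runs out. A minimal
sketch of that pattern in plain C, outside the LTP framework. The names
wait_for_growth(), read_kb_fn and fake_counter() are hypothetical and only
for illustration; the patch itself re-reads SwapCached from /proc/meminfo
inside the loop.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical callback type: returns the current counter value in kB. */
typedef long (*read_kb_fn)(void *ctx);

/*
 * Poll read_kb() until it reports at least start_kb + need_kb, sleeping
 * sleep_us microseconds before each read. Gives up after max_loops reads.
 * Returns 1 if the target was reached, 0 on timeout.
 */
static int wait_for_growth(read_kb_fn read_kb, void *ctx, long start_kb,
			   long need_kb, int max_loops, unsigned int sleep_us)
{
	long now_kb;

	assert(read_kb != NULL && max_loops > 0);

	do {
		max_loops--;
		usleep(sleep_us);
		now_kb = read_kb(ctx);
	} while (now_kb < start_kb + need_kb && max_loops > 0);

	return now_kb >= start_kb + need_kb;
}

/* Fake counter for demonstration only: grows by 4 kB on every read. */
static long fake_counter(void *ctx)
{
	long *val = ctx;

	*val += 4;
	return *val;
}
```

With loops = 100 and a 100 ms sleep, as in the patch, such a loop waits at
most about 10 seconds before giving up.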

From Rafael Aquini:

  The problem here is that MADV_WILLNEED is an asynchronous, non-blocking
  hint: it tells the kernel to start read-ahead for the hinted memory
  chunk, but does not wait for the read-ahead to finish. So it is
  possible that when the dirty_pages() call starts re-dirtying the pages
  in the target area, it is racing against a scheduled swap-in read-ahead
  that hasn't finished yet. Expecting only 2 faulted pages out of 102400
  also seems too strict for a PASS threshold.
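
To make the race concrete: any page touched before its swap-in has finished
shows up as a major fault for the process. A minimal sketch of counting
major faults around a dirtying pass, assuming getrusage() as a stand-in
counter (the test's own get_page_fault_num() body is not shown in this
thread); major_faults() and faults_while_dirtying() are hypothetical names.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/resource.h>

/* Major faults (those that required I/O) taken by this process so far. */
static long major_faults(void)
{
	struct rusage ru;

	if (getrusage(RUSAGE_SELF, &ru) != 0)
		return -1;
	return ru.ru_majflt;
}

/*
 * Write one byte to every page in [buf, buf + len) and return how many
 * major faults that caused, or -1 if the counter could not be read.
 */
static long faults_while_dirtying(char *buf, size_t len, size_t page_size)
{
	long before, after;
	size_t off;

	assert(buf != NULL && page_size > 0);

	before = major_faults();
	for (off = 0; off < len; off += page_size)
		buf[off] = 'a';
	after = major_faults();

	if (before < 0 || after < 0)
		return -1;
	return after - before;
}
```

If the read-ahead has completed before the pass starts, the returned delta
should be zero or close to it; how close is exactly what this thread is
debating.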

Note:
  As Rafael suggested, another possible approach to this failure is to
  tally up the major faults and loosen the threshold to more than 2
  after the call to madvise() with MADV_WILLNEED.
  But in my testing the number of faulted pages varied significantly
  across platforms, so I didn't go this way.

Btw, this patch passed more than 1000 times on my two systems where the
failure is easy to reproduce.

Signed-off-by: Li Wang <liwang@redhat.com>
Cc: Rafael Aquini <aquini@redhat.com>
Cc: Paul Bunyan <pbunyan@redhat.com>
Cc: Richard Palethorpe <rpalethorpe@suse.com>
---
 testcases/kernel/syscalls/madvise/madvise06.c | 21 +++++++++++++------
 1 file changed, 15 insertions(+), 6 deletions(-)

Comments

Richard Palethorpe June 16, 2022, 7:21 a.m. UTC | #1
Hello Li,

Li Wang <liwang@redhat.com> writes:

> diff --git a/testcases/kernel/syscalls/madvise/madvise06.c b/testcases/kernel/syscalls/madvise/madvise06.c
> index 6d218801c..bfca894f4 100644
> --- a/testcases/kernel/syscalls/madvise/madvise06.c
> +++ b/testcases/kernel/syscalls/madvise/madvise06.c
> @@ -164,7 +164,7 @@ static int get_page_fault_num(void)
>  
>  static void test_advice_willneed(void)
>  {
> -	int loops = 50, res;
> +	int loops = 100, res;
>  	char *target;
>  	long swapcached_start, swapcached;
>  	int page_fault_num_1, page_fault_num_2;
> @@ -202,23 +202,32 @@ static void test_advice_willneed(void)
>  		"%s than %ld Kb were moved to the swap cache",
>  		res ? "more" : "less", PASS_THRESHOLD_KB);
>  
> -
> -	TEST(madvise(target, PASS_THRESHOLD, MADV_WILLNEED));
> +	loops = 100;
> +	SAFE_FILE_LINES_SCANF("/proc/meminfo", "SwapCached: %ld", &swapcached_start);
> +	TEST(madvise(target, pg_sz, MADV_WILLNEED));
>  	if (TST_RET == -1)
>  		tst_brk(TBROK | TTERRNO, "madvise failed");
> +	do {
> +		loops--;
> +		usleep(100000);
> +		if (stat_refresh_sup)
> +			SAFE_FILE_PRINTF("/proc/sys/vm/stat_refresh", "1");
> +		SAFE_FILE_LINES_SCANF("/proc/meminfo", "SwapCached: %ld",
> +				&swapcached);
> +	} while (swapcached < swapcached_start + pg_sz/1024 && loops > 0);
>  
>  	page_fault_num_1 = get_page_fault_num();
>  	tst_res(TINFO, "PageFault(madvice / no mem access): %d",
>  			page_fault_num_1);
> -	dirty_pages(target, PASS_THRESHOLD);
> +	dirty_pages(target, pg_sz);

Adding the loop makes sense to me. However I don't understand why you
have also switched from PASS_THRESHOLD to only a single page?

I guess calling MADV_WILLNEED on a single page is the least realistic
scenario.

If there is an issue with PASS_THRESHOLD perhaps we could scale it based
on page size?
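
A minimal sketch of what scaling by page size could look like, assuming the
range is expressed as a page count and converted to bytes at run time. The
helper names pages_to_bytes() and pages_to_bytes_here() are hypothetical and
not part of madvise06.c.

```c
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

/* Byte length of `pages` whole pages for a given page size. */
static size_t pages_to_bytes(size_t pages, size_t page_size)
{
	assert(page_size > 0);
	return pages * page_size;
}

/* Same, using the running system's page size; returns 0 on failure. */
static size_t pages_to_bytes_here(size_t pages)
{
	long pg_sz = sysconf(_SC_PAGESIZE);

	return pg_sz > 0 ? pages_to_bytes(pages, (size_t)pg_sz) : 0;
}
```

The same page count then covers 12 KiB with 4 KiB pages and 192 KiB with
64 KiB pages, so the number of pages traversed stays constant across
architectures.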

>  	page_fault_num_2 = get_page_fault_num();
>  	tst_res(TINFO, "PageFault(madvice / mem access): %d",
>  			page_fault_num_2);
>  	meminfo_diag("After page access");
>  
>  	res = page_fault_num_2 - page_fault_num_1;
> -	tst_res(res < 3 ? TPASS : TFAIL,
> -		"%d pages were faulted out of 2 max", res);
> +	tst_res(res == 0 ? TPASS : TFAIL,
> +		"%d pages were faulted out of 1 max", res);
>  
>  	SAFE_MUNMAP(target, CHUNK_SZ);
>  }
Li Wang June 17, 2022, 9:42 a.m. UTC | #2
Hi Richard,

Richard Palethorpe <rpalethorpe@suse.de> wrote:


> > --- a/testcases/kernel/syscalls/madvise/madvise06.c
> > +++ b/testcases/kernel/syscalls/madvise/madvise06.c
> > @@ -164,7 +164,7 @@ static int get_page_fault_num(void)
> >
> >  static void test_advice_willneed(void)
> >  {
> > -     int loops = 50, res;
> > +     int loops = 100, res;
> >       char *target;
> >       long swapcached_start, swapcached;
> >       int page_fault_num_1, page_fault_num_2;
> > @@ -202,23 +202,32 @@ static void test_advice_willneed(void)
> >               "%s than %ld Kb were moved to the swap cache",
> >               res ? "more" : "less", PASS_THRESHOLD_KB);
> >
> > -
> > -     TEST(madvise(target, PASS_THRESHOLD, MADV_WILLNEED));
> > +     loops = 100;
> > +     SAFE_FILE_LINES_SCANF("/proc/meminfo", "SwapCached: %ld", &swapcached_start);
> > +     TEST(madvise(target, pg_sz, MADV_WILLNEED));
> >       if (TST_RET == -1)
> >               tst_brk(TBROK | TTERRNO, "madvise failed");
> > +     do {
> > +             loops--;
> > +             usleep(100000);
> > +             if (stat_refresh_sup)
> > +                     SAFE_FILE_PRINTF("/proc/sys/vm/stat_refresh", "1");
> > +             SAFE_FILE_LINES_SCANF("/proc/meminfo", "SwapCached: %ld",
> > +                             &swapcached);
> > +     } while (swapcached < swapcached_start + pg_sz/1024 && loops > 0);
> >
> >       page_fault_num_1 = get_page_fault_num();
> >       tst_res(TINFO, "PageFault(madvice / no mem access): %d",
> >                       page_fault_num_1);
> > -     dirty_pages(target, PASS_THRESHOLD);
> > +     dirty_pages(target, pg_sz);
>
> Adding the loop makes sense to me. However I don't understand why you
> have also switched from PASS_THRESHOLD to only a single page?
>

The test combines two checks to confirm that the bug reproduces:

  1. the swap cache grows by less than PASS_THRESHOLD_KB
  2. the number of page faults is larger than expected

Check 2 fails more easily on some platforms, and it is hard to work out
an average value to tolerate. So maybe we can just reduce the range to
one page, which would not affect the final result, because we only
assume a bug when both checks fail at the same time.
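
A minimal sketch of that reasoning as a three-way verdict. This is only one
reading of the description above, not the test's code (in the patch each
check is reported separately through tst_res()); enum verdict and judge()
are hypothetical names.

```c
#include <assert.h>

enum verdict { LOOKS_OK, INCONCLUSIVE, LIKELY_BUG };

/*
 * Combine the two observations: how much the swap cache grew and how
 * many faults the access pass took. Only both failing together is
 * treated as a likely kernel bug.
 */
static enum verdict judge(long swapcached_delta_kb, long threshold_kb,
			  long faults, long max_faults)
{
	int too_little_cached = swapcached_delta_kb < threshold_kb;
	int too_many_faults = faults > max_faults;

	assert(threshold_kb >= 0 && max_faults >= 0);

	if (too_little_cached && too_many_faults)
		return LIKELY_BUG;
	if (too_little_cached || too_many_faults)
		return INCONCLUSIVE;
	return LOOKS_OK;
}
```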



>
> I guess calling MADV_WILLNEED on a single page is the least realistic
> scenario.
>

Okay, perhaps it's a step backward :).

I was just thinking that it is a regression test, and if 1 page is enough
to reproduce the bug (while larger chunks of memory easily cause false
positives), why not.



>
> If there is an issue with PASS_THRESHOLD perhaps we could scale it based
> on page size?
>

This sounds acceptable too.

How many pages do you think would be appropriate, 100 or more?
And should we loosen the allowed faults to 1/10 of the pages?
Richard Palethorpe June 20, 2022, 7:44 a.m. UTC | #3
Hello Li,

Li Wang <liwang@redhat.com> writes:

> Hi Richard,
>
> Richard Palethorpe <rpalethorpe@suse.de> wrote:
>  
>  > --- a/testcases/kernel/syscalls/madvise/madvise06.c
>  > +++ b/testcases/kernel/syscalls/madvise/madvise06.c
>  > @@ -164,7 +164,7 @@ static int get_page_fault_num(void)
>  >  
>  >  static void test_advice_willneed(void)
>  >  {
>  > -     int loops = 50, res;
>  > +     int loops = 100, res;
>  >       char *target;
>  >       long swapcached_start, swapcached;
>  >       int page_fault_num_1, page_fault_num_2;
>  > @@ -202,23 +202,32 @@ static void test_advice_willneed(void)
>  >               "%s than %ld Kb were moved to the swap cache",
>  >               res ? "more" : "less", PASS_THRESHOLD_KB);
>  >  
>  > -
>  > -     TEST(madvise(target, PASS_THRESHOLD, MADV_WILLNEED));
>  > +     loops = 100;
>  > +     SAFE_FILE_LINES_SCANF("/proc/meminfo", "SwapCached: %ld", &swapcached_start);
>  > +     TEST(madvise(target, pg_sz, MADV_WILLNEED));
>  >       if (TST_RET == -1)
>  >               tst_brk(TBROK | TTERRNO, "madvise failed");
>  > +     do {
>  > +             loops--;
>  > +             usleep(100000);
>  > +             if (stat_refresh_sup)
>  > +                     SAFE_FILE_PRINTF("/proc/sys/vm/stat_refresh", "1");
>  > +             SAFE_FILE_LINES_SCANF("/proc/meminfo", "SwapCached: %ld",
>  > +                             &swapcached);
>  > +     } while (swapcached < swapcached_start + pg_sz/1024 && loops > 0);
>  >  
>  >       page_fault_num_1 = get_page_fault_num();
>  >       tst_res(TINFO, "PageFault(madvice / no mem access): %d",
>  >                       page_fault_num_1);
>  > -     dirty_pages(target, PASS_THRESHOLD);
>  > +     dirty_pages(target, pg_sz);
>
>  Adding the loop makes sense to me. However I don't understand why you
>  have also switched from PASS_THRESHOLD to only a single page?
>
> The test combines two checks to confirm that the bug reproduces:
>
>   1. the swap cache grows by less than PASS_THRESHOLD_KB
>   2. the number of page faults is larger than expected
>
> Check 2 fails more easily on some platforms, and it is hard to work out
> an average value to tolerate. So maybe we can just reduce the range to
> one page, which would not affect the final result, because we only
> assume a bug when both checks fail at the same time.
>
>  
>  
>  I guess calling MADV_WILLNEED on a single page is the least realistic
>  scenario.
>
> Okay, perhaps it's a step backward :).
>
> I was just thinking that it is a regression test, and if 1 page is enough
> to reproduce the bug (while larger chunks of memory easily cause false
> positives), why not.

That makes sense, but this test has also found other bugs. I'm not sure
if they are reproducible with only one page.

>
>  
>  
>  If there is an issue with PASS_THRESHOLD perhaps we could scale it based
>  on page size?
>
> This sounds acceptable too.
>
> How many pages do you think would be appropriate, 100 or more?
> And should we loosen the allowed faults to 1/10 of the pages?

I suppose that 100 pages would be too much memory on some systems. I
guess at least 2 or 3 pages are needed so there is some
traversal. Beyond that I don't know what would make a difference.

If there are only max 3 pages and we have a loop, I would not expect any
to be faulted. Although maybe we could allow 1/3 because MADV_WILLNEED
is only an advisory and a lot of time has been spent discussing this
test already.
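
A minimal sketch of such a proportional tolerance, assuming the limit is
"one fault per denom pages" with integer division; allowed_faults() and
faults_ok() are hypothetical helpers, not part of the test.

```c
#include <assert.h>

/*
 * Largest fault count still accepted when touching `pages` pages,
 * allowing one fault per `denom` pages (integer division rounds down).
 */
static long allowed_faults(long pages, long denom)
{
	assert(pages >= 0 && denom > 0);
	return pages / denom;
}

/* Nonzero if the observed fault count is within the tolerance. */
static int faults_ok(long faults, long pages, long denom)
{
	return faults <= allowed_faults(pages, denom);
}
```

With 3 pages and a 1/3 tolerance this accepts 0 or 1 faults; with 100 pages
and 1/10 it accepts up to 10. Because the division rounds down, 2 pages at
1/3 would allow no faults at all.
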
Li Wang June 20, 2022, 8:28 a.m. UTC | #4
Hi Richard,

Richard Palethorpe <rpalethorpe@suse.de> wrote:


> >  Adding the loop makes sense to me. However I don't understand why you
> >  have also switched from PASS_THRESHOLD to only a single page?
> >
> > The test combines two checks to confirm that the bug reproduces:
> >
> >   1. the swap cache grows by less than PASS_THRESHOLD_KB
> >   2. the number of page faults is larger than expected
> >
> > Check 2 fails more easily on some platforms, and it is hard to work out
> > an average value to tolerate. So maybe we can just reduce the range to
> > one page, which would not affect the final result, because we only
> > assume a bug when both checks fail at the same time.
> >
> >
> >
> >  I guess calling MADV_WILLNEED on a single page is the least realistic
> >  scenario.
> >
> > Okay, perhaps it's a step backward :).
> >
> > I was just thinking that it is a regression test, and if 1 page is enough
> > to reproduce the bug (while larger chunks of memory easily cause false
> > positives), why not.
>
> That makes sense, but this test has also found other bugs. I'm not sure
> if they are reproducible with only one page.
>

Indeed.



>
> >
> >
> >
> >  If there is an issue with PASS_THRESHOLD perhaps we could scale it based
> >  on page size?
> >
> > This sounds acceptable too.
> >
> > How many pages do you think would be appropriate, 100 or more?
> > And should we loosen the allowed faults to 1/10 of the pages?
>
> I suppose that 100 pages would be too much memory on some systems. I
> guess at least 2 or 3 pages are needed so there is some
> traversal. Beyond that I don't know what would make a difference.
>
> If there are only max 3 pages and we have a loop, I would not expect any
> to be faulted. Although maybe we could allow 1/3 because MADV_WILLNEED
> is only an advisory and a lot of time has been spent discussing this
> test already.
>

It sounds reasonable. Thanks!

I will give it a try and go with touching 3 pages (expecting 0 page
faults) if that works.

Patch

diff --git a/testcases/kernel/syscalls/madvise/madvise06.c b/testcases/kernel/syscalls/madvise/madvise06.c
index 6d218801c..bfca894f4 100644
--- a/testcases/kernel/syscalls/madvise/madvise06.c
+++ b/testcases/kernel/syscalls/madvise/madvise06.c
@@ -164,7 +164,7 @@  static int get_page_fault_num(void)
 
 static void test_advice_willneed(void)
 {
-	int loops = 50, res;
+	int loops = 100, res;
 	char *target;
 	long swapcached_start, swapcached;
 	int page_fault_num_1, page_fault_num_2;
@@ -202,23 +202,32 @@  static void test_advice_willneed(void)
 		"%s than %ld Kb were moved to the swap cache",
 		res ? "more" : "less", PASS_THRESHOLD_KB);
 
-
-	TEST(madvise(target, PASS_THRESHOLD, MADV_WILLNEED));
+	loops = 100;
+	SAFE_FILE_LINES_SCANF("/proc/meminfo", "SwapCached: %ld", &swapcached_start);
+	TEST(madvise(target, pg_sz, MADV_WILLNEED));
 	if (TST_RET == -1)
 		tst_brk(TBROK | TTERRNO, "madvise failed");
+	do {
+		loops--;
+		usleep(100000);
+		if (stat_refresh_sup)
+			SAFE_FILE_PRINTF("/proc/sys/vm/stat_refresh", "1");
+		SAFE_FILE_LINES_SCANF("/proc/meminfo", "SwapCached: %ld",
+				&swapcached);
+	} while (swapcached < swapcached_start + pg_sz/1024 && loops > 0);
 
 	page_fault_num_1 = get_page_fault_num();
 	tst_res(TINFO, "PageFault(madvice / no mem access): %d",
 			page_fault_num_1);
-	dirty_pages(target, PASS_THRESHOLD);
+	dirty_pages(target, pg_sz);
 	page_fault_num_2 = get_page_fault_num();
 	tst_res(TINFO, "PageFault(madvice / mem access): %d",
 			page_fault_num_2);
 	meminfo_diag("After page access");
 
 	res = page_fault_num_2 - page_fault_num_1;
-	tst_res(res < 3 ? TPASS : TFAIL,
-		"%d pages were faulted out of 2 max", res);
+	tst_res(res == 0 ? TPASS : TFAIL,
+		"%d pages were faulted out of 1 max", res);
 
 	SAFE_MUNMAP(target, CHUNK_SZ);
 }