Message ID | 20220615090648.405100-1-liwang@redhat.com
---|---
State | RFC
Series | [RFC] madvise06: shrink to 1 MADV_WILLNEED page to stabilize the test
Hello Li,

Li Wang <liwang@redhat.com> writes:

> Paul Bunyan reports that the madvise06 test fails intermittently with
> many LTS kernels. After checking with an mm developer, we think this is
> more likely a test issue than a kernel bug:
>
>     madvise06.c:231: TFAIL: 4 pages were faulted out of 2 max
>
> This improvement aims to reduce the false positives in three ways:
>
> 1. Add a while-loop to give the asynchronous madvise_willneed()
>    read-ahead more chances to complete
> 2. Raise the value of `loops` so the test waits longer if SwapCached
>    has not reached the expected value
> 3. Shrink the MADV_WILLNEED verification to a single page so the
>    system can act on it more easily
>
> From Rafael Aquini:
>
>     The problem here is that MADV_WILLNEED is an asynchronous
>     non-blocking hint, which tells the kernel to start doing read-ahead
>     work for the hinted memory chunk, but does not wait for the
>     read-ahead to finish. So, it is possible that when the dirty_pages()
>     call starts re-dirtying the pages in that target area, it is racing
>     against a scheduled swap-in read-ahead that hasn't yet finished.
>     Expecting only 2 pages out of 102400 to fault also seems too strict
>     for a PASS threshold.
>
> Note:
> As Rafael suggested, another possible approach to tackle this failure
> is to tally up the faults and loosen the threshold to more than 2 major
> faults after a call to madvise() with MADV_WILLNEED. But in my testing,
> the number of faulted pages shows significant variance across
> platforms, so I did not take this approach.
> Btw, this patch passed more than 1000 runs on my two systems where the
> failure was easy to reproduce.
>
> Signed-off-by: Li Wang <liwang@redhat.com>
> Cc: Rafael Aquini <aquini@redhat.com>
> Cc: Paul Bunyan <pbunyan@redhat.com>
> Cc: Richard Palethorpe <rpalethorpe@suse.com>
> ---
>  testcases/kernel/syscalls/madvise/madvise06.c | 21 +++++++++++++------
>  1 file changed, 15 insertions(+), 6 deletions(-)
>
> diff --git a/testcases/kernel/syscalls/madvise/madvise06.c b/testcases/kernel/syscalls/madvise/madvise06.c
> index 6d218801c..bfca894f4 100644
> --- a/testcases/kernel/syscalls/madvise/madvise06.c
> +++ b/testcases/kernel/syscalls/madvise/madvise06.c
> @@ -164,7 +164,7 @@ static int get_page_fault_num(void)
>
>  static void test_advice_willneed(void)
>  {
> -	int loops = 50, res;
> +	int loops = 100, res;
>  	char *target;
>  	long swapcached_start, swapcached;
>  	int page_fault_num_1, page_fault_num_2;
> @@ -202,23 +202,32 @@ static void test_advice_willneed(void)
>  		"%s than %ld Kb were moved to the swap cache",
>  		res ? "more" : "less", PASS_THRESHOLD_KB);
>
> -
> -	TEST(madvise(target, PASS_THRESHOLD, MADV_WILLNEED));
> +	loops = 100;
> +	SAFE_FILE_LINES_SCANF("/proc/meminfo", "SwapCached: %ld", &swapcached_start);
> +	TEST(madvise(target, pg_sz, MADV_WILLNEED));
>  	if (TST_RET == -1)
>  		tst_brk(TBROK | TTERRNO, "madvise failed");
> +	do {
> +		loops--;
> +		usleep(100000);
> +		if (stat_refresh_sup)
> +			SAFE_FILE_PRINTF("/proc/sys/vm/stat_refresh", "1");
> +		SAFE_FILE_LINES_SCANF("/proc/meminfo", "SwapCached: %ld",
> +				      &swapcached);
> +	} while (swapcached < swapcached_start + pg_sz/1024 && loops > 0);
>
>  	page_fault_num_1 = get_page_fault_num();
>  	tst_res(TINFO, "PageFault(madvice / no mem access): %d",
>  		page_fault_num_1);
> -	dirty_pages(target, PASS_THRESHOLD);
> +	dirty_pages(target, pg_sz);

Adding the loop makes sense to me. However, I don't understand why you
have also switched from PASS_THRESHOLD to only a single page?

I guess calling MADV_WILLNEED on a single page is the least realistic
scenario. If there is an issue with PASS_THRESHOLD, perhaps we could
scale it based on page size?

>  	page_fault_num_2 = get_page_fault_num();
>  	tst_res(TINFO, "PageFault(madvice / mem access): %d",
>  		page_fault_num_2);
>  	meminfo_diag("After page access");
>
>  	res = page_fault_num_2 - page_fault_num_1;
> -	tst_res(res < 3 ? TPASS : TFAIL,
> -		"%d pages were faulted out of 2 max", res);
> +	tst_res(res == 0 ? TPASS : TFAIL,
> +		"%d pages were faulted out of 1 max", res);
>
>  	SAFE_MUNMAP(target, CHUNK_SZ);
>  }
Hi Richard,

Richard Palethorpe <rpalethorpe@suse.de> wrote:

> > --- a/testcases/kernel/syscalls/madvise/madvise06.c
> > +++ b/testcases/kernel/syscalls/madvise/madvise06.c
> > @@ -164,7 +164,7 @@ static int get_page_fault_num(void)
> >
> >  static void test_advice_willneed(void)
> >  {
> > -	int loops = 50, res;
> > +	int loops = 100, res;
> >  	char *target;
> >  	long swapcached_start, swapcached;
> >  	int page_fault_num_1, page_fault_num_2;
> > @@ -202,23 +202,32 @@ static void test_advice_willneed(void)
> >  		"%s than %ld Kb were moved to the swap cache",
> >  		res ? "more" : "less", PASS_THRESHOLD_KB);
> >
> > -
> > -	TEST(madvise(target, PASS_THRESHOLD, MADV_WILLNEED));
> > +	loops = 100;
> > +	SAFE_FILE_LINES_SCANF("/proc/meminfo", "SwapCached: %ld", &swapcached_start);
> > +	TEST(madvise(target, pg_sz, MADV_WILLNEED));
> >  	if (TST_RET == -1)
> >  		tst_brk(TBROK | TTERRNO, "madvise failed");
> > +	do {
> > +		loops--;
> > +		usleep(100000);
> > +		if (stat_refresh_sup)
> > +			SAFE_FILE_PRINTF("/proc/sys/vm/stat_refresh", "1");
> > +		SAFE_FILE_LINES_SCANF("/proc/meminfo", "SwapCached: %ld",
> > +				      &swapcached);
> > +	} while (swapcached < swapcached_start + pg_sz/1024 && loops > 0);
> >
> >  	page_fault_num_1 = get_page_fault_num();
> >  	tst_res(TINFO, "PageFault(madvice / no mem access): %d",
> >  		page_fault_num_1);
> > -	dirty_pages(target, PASS_THRESHOLD);
> > +	dirty_pages(target, pg_sz);
>
> Adding the loop makes sense to me. However, I don't understand why you
> have also switched from PASS_THRESHOLD to only a single page?

In the test, we use two checks combined to confirm the bug reproduces:

1. SwapCached increases by less than PASS_THRESHOLD_KB
2. the page-fault number is larger than expected

Check 2 fails more easily on some platforms, and it is hard to find an
average value to tolerate. So maybe we can just reduce the area to one
page; that would not affect the final result, because we only assume a
bug when both checks fail at the same time.

> I guess calling MADV_WILLNEED on a single page is the least realistic
> scenario.

Okay, perhaps it's a step backward :).

I was just thinking that this is a regression test, and if 1 page works
to reproduce the bug (while larger chunks of memory easily cause false
positives), why not.

> If there is an issue with PASS_THRESHOLD, perhaps we could scale it
> based on page size?

This sounds acceptable too.

How many pages do you think are proper, 100 or more? And should we
loosen the allowed fault count to 1/10 of the pages?
Hello Li,

Li Wang <liwang@redhat.com> writes:

> Hi Richard,
>
> Richard Palethorpe <rpalethorpe@suse.de> wrote:
>
> > > --- a/testcases/kernel/syscalls/madvise/madvise06.c
> > > +++ b/testcases/kernel/syscalls/madvise/madvise06.c
> > > @@ -164,7 +164,7 @@ static int get_page_fault_num(void)
> > >
> > >  static void test_advice_willneed(void)
> > >  {
> > > -	int loops = 50, res;
> > > +	int loops = 100, res;
> > >  	char *target;
> > >  	long swapcached_start, swapcached;
> > >  	int page_fault_num_1, page_fault_num_2;
> > > @@ -202,23 +202,32 @@ static void test_advice_willneed(void)
> > >  		"%s than %ld Kb were moved to the swap cache",
> > >  		res ? "more" : "less", PASS_THRESHOLD_KB);
> > >
> > > -
> > > -	TEST(madvise(target, PASS_THRESHOLD, MADV_WILLNEED));
> > > +	loops = 100;
> > > +	SAFE_FILE_LINES_SCANF("/proc/meminfo", "SwapCached: %ld", &swapcached_start);
> > > +	TEST(madvise(target, pg_sz, MADV_WILLNEED));
> > >  	if (TST_RET == -1)
> > >  		tst_brk(TBROK | TTERRNO, "madvise failed");
> > > +	do {
> > > +		loops--;
> > > +		usleep(100000);
> > > +		if (stat_refresh_sup)
> > > +			SAFE_FILE_PRINTF("/proc/sys/vm/stat_refresh", "1");
> > > +		SAFE_FILE_LINES_SCANF("/proc/meminfo", "SwapCached: %ld",
> > > +				      &swapcached);
> > > +	} while (swapcached < swapcached_start + pg_sz/1024 && loops > 0);
> > >
> > >  	page_fault_num_1 = get_page_fault_num();
> > >  	tst_res(TINFO, "PageFault(madvice / no mem access): %d",
> > >  		page_fault_num_1);
> > > -	dirty_pages(target, PASS_THRESHOLD);
> > > +	dirty_pages(target, pg_sz);
> >
> > Adding the loop makes sense to me. However, I don't understand why you
> > have also switched from PASS_THRESHOLD to only a single page?
>
> In the test, we use two checks combined to confirm the bug reproduces:
>
> 1. SwapCached increases by less than PASS_THRESHOLD_KB
> 2. the page-fault number is larger than expected
>
> Check 2 fails more easily on some platforms, and it is hard to find an
> average value to tolerate. So maybe we can just reduce the area to one
> page; that would not affect the final result, because we only assume a
> bug when both checks fail at the same time.
>
> > I guess calling MADV_WILLNEED on a single page is the least realistic
> > scenario.
>
> Okay, perhaps it's a step backward :).
>
> I was just thinking that this is a regression test, and if 1 page works
> to reproduce the bug (while larger chunks of memory easily cause false
> positives), why not.

That makes sense, but this test has also found other bugs. I'm not sure
if they are reproducible with only one page.

> > If there is an issue with PASS_THRESHOLD, perhaps we could scale it
> > based on page size?
>
> This sounds acceptable too.
>
> How many pages do you think are proper, 100 or more? And should we
> loosen the allowed fault count to 1/10 of the pages?

I suppose that 100 pages would be too much memory on some systems. I
guess at least 2 or 3 pages are needed so there is some traversal.
Beyond that I don't know what would make a difference.

If there are only 3 pages max and we have a loop, I would not expect any
to be faulted. Although maybe we could allow 1/3, because MADV_WILLNEED
is only advisory and a lot of time has been spent discussing this test
already.
Hi Richard,

Richard Palethorpe <rpalethorpe@suse.de> wrote:

> > Adding the loop makes sense to me. However, I don't understand why
> > you have also switched from PASS_THRESHOLD to only a single page?
>
> > In the test, we use two checks combined to confirm the bug
> > reproduces:
> >
> > 1. SwapCached increases by less than PASS_THRESHOLD_KB
> > 2. the page-fault number is larger than expected
> >
> > Check 2 fails more easily on some platforms, and it is hard to find
> > an average value to tolerate. So maybe we can just reduce the area to
> > one page; that would not affect the final result, because we only
> > assume a bug when both checks fail at the same time.
>
> > I guess calling MADV_WILLNEED on a single page is the least realistic
> > scenario.
>
> > Okay, perhaps it's a step backward :).
> >
> > I was just thinking that this is a regression test, and if 1 page
> > works to reproduce the bug (while larger chunks of memory easily
> > cause false positives), why not.
>
> That makes sense, but this test has also found other bugs. I'm not sure
> if they are reproducible with only one page.

Indeed.

> > If there is an issue with PASS_THRESHOLD, perhaps we could scale it
> > based on page size?
>
> > This sounds acceptable too.
> >
> > How many pages do you think are proper, 100 or more? And should we
> > loosen the allowed fault count to 1/10 of the pages?
>
> I suppose that 100 pages would be too much memory on some systems. I
> guess at least 2 or 3 pages are needed so there is some traversal.
> Beyond that I don't know what would make a difference.
>
> If there are only 3 pages max and we have a loop, I would not expect
> any to be faulted. Although maybe we could allow 1/3, because
> MADV_WILLNEED is only advisory and a lot of time has been spent
> discussing this test already.

That sounds reasonable. Thanks! I will give it a try and touch 3 pages
(expecting 0 page faults) and see if that works.
diff --git a/testcases/kernel/syscalls/madvise/madvise06.c b/testcases/kernel/syscalls/madvise/madvise06.c
index 6d218801c..bfca894f4 100644
--- a/testcases/kernel/syscalls/madvise/madvise06.c
+++ b/testcases/kernel/syscalls/madvise/madvise06.c
@@ -164,7 +164,7 @@ static int get_page_fault_num(void)
 
 static void test_advice_willneed(void)
 {
-	int loops = 50, res;
+	int loops = 100, res;
 	char *target;
 	long swapcached_start, swapcached;
 	int page_fault_num_1, page_fault_num_2;
@@ -202,23 +202,32 @@ static void test_advice_willneed(void)
 		"%s than %ld Kb were moved to the swap cache",
 		res ? "more" : "less", PASS_THRESHOLD_KB);
 
-
-	TEST(madvise(target, PASS_THRESHOLD, MADV_WILLNEED));
+	loops = 100;
+	SAFE_FILE_LINES_SCANF("/proc/meminfo", "SwapCached: %ld", &swapcached_start);
+	TEST(madvise(target, pg_sz, MADV_WILLNEED));
 	if (TST_RET == -1)
 		tst_brk(TBROK | TTERRNO, "madvise failed");
+	do {
+		loops--;
+		usleep(100000);
+		if (stat_refresh_sup)
+			SAFE_FILE_PRINTF("/proc/sys/vm/stat_refresh", "1");
+		SAFE_FILE_LINES_SCANF("/proc/meminfo", "SwapCached: %ld",
+				      &swapcached);
+	} while (swapcached < swapcached_start + pg_sz/1024 && loops > 0);
 
 	page_fault_num_1 = get_page_fault_num();
 	tst_res(TINFO, "PageFault(madvice / no mem access): %d",
 		page_fault_num_1);
-	dirty_pages(target, PASS_THRESHOLD);
+	dirty_pages(target, pg_sz);
 	page_fault_num_2 = get_page_fault_num();
 	tst_res(TINFO, "PageFault(madvice / mem access): %d",
 		page_fault_num_2);
 	meminfo_diag("After page access");
 
 	res = page_fault_num_2 - page_fault_num_1;
-	tst_res(res < 3 ? TPASS : TFAIL,
-		"%d pages were faulted out of 2 max", res);
+	tst_res(res == 0 ? TPASS : TFAIL,
+		"%d pages were faulted out of 1 max", res);
 
 	SAFE_MUNMAP(target, CHUNK_SZ);
 }
Paul Bunyan reports that the madvise06 test fails intermittently with
many LTS kernels. After checking with an mm developer, we think this is
more likely a test issue than a kernel bug:

    madvise06.c:231: TFAIL: 4 pages were faulted out of 2 max

This improvement aims to reduce the false positives in three ways:

1. Add a while-loop to give the asynchronous madvise_willneed()
   read-ahead more chances to complete
2. Raise the value of `loops` so the test waits longer if SwapCached
   has not reached the expected value
3. Shrink the MADV_WILLNEED verification to a single page so the
   system can act on it more easily

From Rafael Aquini:

    The problem here is that MADV_WILLNEED is an asynchronous
    non-blocking hint, which tells the kernel to start doing read-ahead
    work for the hinted memory chunk, but does not wait for the
    read-ahead to finish. So, it is possible that when the dirty_pages()
    call starts re-dirtying the pages in that target area, it is racing
    against a scheduled swap-in read-ahead that hasn't yet finished.
    Expecting only 2 pages out of 102400 to fault also seems too strict
    for a PASS threshold.

Note:
As Rafael suggested, another possible approach to tackle this failure
is to tally up the faults and loosen the threshold to more than 2 major
faults after a call to madvise() with MADV_WILLNEED. But in my testing,
the number of faulted pages shows significant variance across platforms,
so I did not take this approach.

Btw, this patch passed more than 1000 runs on my two systems where the
failure was easy to reproduce.

Signed-off-by: Li Wang <liwang@redhat.com>
Cc: Rafael Aquini <aquini@redhat.com>
Cc: Paul Bunyan <pbunyan@redhat.com>
Cc: Richard Palethorpe <rpalethorpe@suse.com>
---
 testcases/kernel/syscalls/madvise/madvise06.c | 21 +++++++++++++------
 1 file changed, 15 insertions(+), 6 deletions(-)