Message ID | 20230218040919.3548296-1-liwang@redhat.com |
---|---|
State | Superseded |
Headers | show |
Series | madvise06: Raise the bar for judging failure | expand |
Hell Li, Li Wang <liwang@redhat.com> writes: > There is an intermittent failure which we have observed many times whether > on rhel or mainline kernel. But we're unable to stable reproduce it: > > 43 madvise06.c:201: TFAIL: less than 102400 Kb were moved to the swap cache > ... > > However it does not look like a kernel issue, because SwapCached change is > not strictly abiding by the principle of MADV_WILLNEED advice. That means it > all depends on the kernel's specific circumstances. The value of the threshold > is debatable at least from my point of view, its use 1/4 is not guaranteed > 100% safe. > > As MADV_WILLNEED is just advice to the kernel, not a guarantee. The kernel may > choose to ignore the advice, or may prioritize other memory management tasks > over pre-loading the advised pages. > > So this patch is aimed at improving the accuracy and clarity of the test results. > Specifically, the use of two separate variables to track the results of different > comparisons will make it easier to understand what the test is doing. > > Additionally, the change to report a test result of "TINFO" instead of "TFAIL" > when the swap cache size is less than expected would be intended to indicate > that this is an acceptable outcome. > > Finally, the change to the second tst_res call is intended to make the test more > lenient, as it now passes if either no page faults occur or the swap cache size > is larger than expected. Why not skip to making them all TINFO? It's undefined what action will result from MADV_WILLNEED. If it were better for performance *not* to read in pages, then it would be valid for the kernel to ignore it. Yang Xu added a tag for a perf regression that it could reproduce. However looking at the kernel commit this was first found by stress-ng. commit 66383800df9cbdbf3b0c34d5a51bf35bcdb72fd2 Author: Matthew Wilcox (Oracle) <willy@infradead.org> Date: Sat Nov 21 22:17:22 2020 -0800 mm: fix madvise WILLNEED performance problem The calculation of the end page index was incorrect, leading to a regression of 70% when running stress-ng. With this fix, we instead see a performance improvement of 3% I found a bug with this test, but it was causing an Oops. It wouldn't matter if the test printed pass or fail. So I think we are wasting our time by constantly tweaking this test.
Hi Richard, On Mon, Feb 27, 2023 at 8:27 PM Richard Palethorpe <rpalethorpe@suse.de> wrote: > Hell Li, > > Li Wang <liwang@redhat.com> writes: > > > There is an intermittent failure which we have observed many times > whether > > on rhel or mainline kernel. But we're unable to stable reproduce it: > > > > 43 madvise06.c:201: TFAIL: less than 102400 Kb were moved to > the swap cache > > ... > > > > However it does not look like a kernel issue, because SwapCached change > is > > not strictly abiding by the principle of MADV_WILLNEED advice. That > means it > > all depends on the kernel's specific circumstances. The value of the > threshold > > is debatable at least from my point of view, its use 1/4 is not > guaranteed > > 100% safe. > > > > As MADV_WILLNEED is just advice to the kernel, not a guarantee. The > kernel may > > choose to ignore the advice, or may prioritize other memory management > tasks > > over pre-loading the advised pages. > > > > So this patch is aimed at improving the accuracy and clarity of the test > results. > > Specifically, the use of two separate variables to track the results of > different > > comparisons will make it easier to understand what the test is doing. > > > > Additionally, the change to report a test result of "TINFO" instead of > "TFAIL" > > when the swap cache size is less than expected would be intended to > indicate > > that this is an acceptable outcome. > > > > Finally, the change to the second tst_res call is intended to make the > test more > > lenient, as it now passes if either no page faults occur or the swap > cache size > > is larger than expected. > > Why not skip to making them all TINFO? > > It's undefined what action will result from MADV_WILLNEED. If it were > better for performance *not* to read in pages, then it would be valid > for the kernel to ignore it. > Yes, but I didn't do that because madvise06 test checks free_mem/free_swap size at the beginning, it garantee the system at least with 2 * CHUNK_SZ (800MB + 800MB) memory for the test performing, unless there is something happening parallel otherwise kernel will handle MADV_WILLNEED request correctly for most scenarios. And we indeed do not see page-faults failure out of expected anymore since commit 00e769e63515e51, so I just combined the two judgments together in this patch. I believe it's enough and also give a leeway to the kernel. I hope there could be a lenient test for MADV_WILLNEED. I will decisively take your suggestion once the failure appears again next time. > > Yang Xu added a tag for a perf regression that it could > reproduce. However looking at the kernel commit this was first found by > stress-ng. > > commit 66383800df9cbdbf3b0c34d5a51bf35bcdb72fd2 > Author: Matthew Wilcox (Oracle) <willy@infradead.org> > Date: Sat Nov 21 22:17:22 2020 -0800 > > mm: fix madvise WILLNEED performance problem > > The calculation of the end page index was incorrect, leading to a > regression of 70% when running stress-ng. > > With this fix, we instead see a performance improvement of 3% > > I found a bug with this test, but it was causing an Oops. It wouldn't > matter if the test printed pass or fail. > > So I think we are wasting our time by constantly tweaking this test. > > -- > Thank you, > Richard. > >
diff --git a/testcases/kernel/syscalls/madvise/madvise06.c b/testcases/kernel/syscalls/madvise/madvise06.c index c7967ae6f..5bd428bd9 100644 --- a/testcases/kernel/syscalls/madvise/madvise06.c +++ b/testcases/kernel/syscalls/madvise/madvise06.c @@ -164,7 +164,7 @@ static int get_page_fault_num(void) static void test_advice_willneed(void) { - int loops = 100, res; + int loops = 100, res1, res2; char *target; long swapcached_start, swapcached; int page_fault_num_1, page_fault_num_2; @@ -197,10 +197,10 @@ static void test_advice_willneed(void) } while (swapcached < swapcached_start + PASS_THRESHOLD_KB && loops > 0); meminfo_diag("After madvise"); - res = swapcached > swapcached_start + PASS_THRESHOLD_KB; - tst_res(res ? TPASS : TFAIL, + res1 = swapcached > swapcached_start + PASS_THRESHOLD_KB; + tst_res(res1 ? TPASS : TINFO, "%s than %ld Kb were moved to the swap cache", - res ? "more" : "less", PASS_THRESHOLD_KB); + res1 ? "more" : "less", PASS_THRESHOLD_KB); loops = 100; SAFE_FILE_LINES_SCANF("/proc/meminfo", "SwapCached: %ld", &swapcached_start); @@ -225,9 +225,9 @@ static void test_advice_willneed(void) page_fault_num_2); meminfo_diag("After page access"); - res = page_fault_num_2 - page_fault_num_1; - tst_res(res == 0 ? TPASS : TFAIL, - "%d pages were faulted out of 3 max", res); + res2 = page_fault_num_2 - page_fault_num_1; + tst_res(((res2 == 0) || res1) ? TPASS : TFAIL, + "%d pages were faulted out of 3 max", res2); SAFE_MUNMAP(target, CHUNK_SZ); }
There is an intermittent failure which we have observed many times whether on rhel or mainline kernel. But we're unable to stable reproduce it: 43 madvise06.c:201: TFAIL: less than 102400 Kb were moved to the swap cache ... However it does not look like a kernel issue, because SwapCached change is not strictly abiding by the principle of MADV_WILLNEED advice. That means it all depends on the kernel's specific circumstances. The value of the threshold is debatable at least from my point of view, its use 1/4 is not guaranteed 100% safe. As MADV_WILLNEED is just advice to the kernel, not a guarantee. The kernel may choose to ignore the advice, or may prioritize other memory management tasks over pre-loading the advised pages. So this patch is aimed at improving the accuracy and clarity of the test results. Specifically, the use of two separate variables to track the results of different comparisons will make it easier to understand what the test is doing. Additionally, the change to report a test result of "TINFO" instead of "TFAIL" when the swap cache size is less than expected would be intended to indicate that this is an acceptable outcome. Finally, the change to the second tst_res call is intended to make the test more lenient, as it now passes if either no page faults occur or the swap cache size is larger than expected. Reported-by: Paul Bunyan <pbunyan@redhat.com> Signed-off-by: Li Wang <liwang@redhat.com> Cc: Richard Palethorpe <rpalethorpe@suse.de> Cc: Yongqiang Liu <liuyongqiang13@huawei.com> Cc: Eirik Fuller <efuller@redhat.com> --- testcases/kernel/syscalls/madvise/madvise06.c | 14 +++++++------- 1 file changed, 7 insertions(+), 7 deletions(-)