[v2,0/1] Add some memory page soft-offlining control

Message ID

20230127100553.29986-1-william.roche@oracle.com

Headers

From: william.roche@oracle.com
To: ltp@lists.linux.it
Date: Fri, 27 Jan 2023 10:05:52 +0000
Message-Id: <20230127100553.29986-1-william.roche@oracle.com>
In-Reply-To: <87bksklax3.fsf@suse.de>
References: <87bksklax3.fsf@suse.de>
MIME-Version: 1.0
Subject: [LTP] [LTP PATCH v2 0/1] Add some memory page soft-offlining control
Precedence: list
Cc: william.roche@oracle.com
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Errors-To: ltp-bounces+incoming=patchwork.ozlabs.org@lists.linux.it
Sender: "ltp" <ltp-bounces+incoming=patchwork.ozlabs.org@lists.linux.it>

Series

Add some memory page soft-offlining control

Message

William Roche Jan. 27, 2023, 10:05 a.m. UTC

From: William Roche <william.roche@oracle.com>

After a long delay (since August) and many days of work on this topic,
I am back with a new version of this test proposal.
This version still uses a set of threads running the same code and
competing with each other. In a loop, each thread allocates a set of
memory pages, writes a sentinel value into each of them, soft-offlines
them, then verifies the sentinel values and unmaps the pages.
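
For illustration only, the loop described above could be sketched as
follows. This is not the submitted madvise11.c: the names run_race(),
worker(), struct worker_args, SENTINEL and MAX_THREADS are hypothetical,
and the sketch uses plain libc calls rather than the LTP SAFE_* helpers.
MADV_SOFT_OFFLINE needs CAP_SYS_ADMIN and a kernel built with
CONFIG_MEMORY_FAILURE, so madvise() failures are only counted here. Note
that pages which are successfully soft-offlined stay out of use until
they are unpoisoned.

```c
#define _DEFAULT_SOURCE
#include <pthread.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

#ifndef MADV_SOFT_OFFLINE
#define MADV_SOFT_OFFLINE 101	/* value from asm-generic/mman-common.h */
#endif

#define SENTINEL 0xA5		/* hypothetical sentinel byte */
#define MAX_THREADS 16		/* hypothetical upper limit */

struct worker_args {
	int pages;
	int iterations;
	int corrupted;		/* set if a sentinel value was lost */
	int offline_errors;	/* madvise() failures (e.g. EPERM, EINVAL) */
};

static void *worker(void *arg)
{
	struct worker_args *wa = arg;
	size_t psz = (size_t)sysconf(_SC_PAGESIZE);
	size_t len = (size_t)wa->pages * psz;

	for (int it = 0; it < wa->iterations; it++) {
		unsigned char *mem = mmap(NULL, len, PROT_READ | PROT_WRITE,
					  MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
		if (mem == MAP_FAILED)
			continue;

		/* Touch every page so it is actually allocated. */
		for (int p = 0; p < wa->pages; p++)
			mem[(size_t)p * psz] = SENTINEL;

		/* Ask the kernel to migrate and offline the pages. */
		if (madvise(mem, len, MADV_SOFT_OFFLINE) == -1)
			wa->offline_errors++;

		/* Soft-offlining must preserve the page contents. */
		for (int p = 0; p < wa->pages; p++)
			if (mem[(size_t)p * psz] != SENTINEL)
				wa->corrupted = 1;

		munmap(mem, len);
	}
	return NULL;
}

/* Returns 0 if every sentinel survived, 1 on corruption, -1 on setup error. */
static int run_race(int nthreads, int pages, int iterations)
{
	pthread_t tid[MAX_THREADS];
	struct worker_args wa[MAX_THREADS];
	int started = 0, corrupted = 0;

	if (nthreads < 1 || nthreads > MAX_THREADS || pages < 1)
		return -1;

	for (int i = 0; i < nthreads; i++) {
		wa[i] = (struct worker_args){ .pages = pages,
					      .iterations = iterations };
		if (pthread_create(&tid[i], NULL, worker, &wa[i]) != 0)
			break;
		started++;
	}
	for (int i = 0; i < started; i++) {
		pthread_join(tid[i], NULL);
		corrupted |= wa[i].corrupted;
	}
	return started == nthreads ? corrupted : -1;
}
```

A caller would invoke something like run_race(4, 8, 100) and treat a
non-zero return as a failure. The thread, page and iteration counts are
arbitrary here.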

I've tried to address all the feedback I received:

- added madvise11 to the runtest/syscalls file [Petr]
- more complete and compliant Description comment [Petr]
- removed no longer used header files
- removed inline comments [Petr + Richard]
- removed unnecessary comments [Petr]
- number of threads dynamically tuned (with limits) [Richard]
- warn about unexpected mmap errors [Richard]
- lower case (not camel) variable names [Petr + Richard]
- removal of an unneeded temporary "copy" variable [Richard]
- removed unnecessary additional checks of SAFE_* functions [Petr]
- removed the min_kver=2.6.33 [Petr]
- added the commit id into the tst_test structure [Richard]
- "make check-madvise11" is now clean [Petr + Richard]
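
As a rough illustration of the "number of threads dynamically tuned
(with limits)" item, the thread count could be derived from the number
of online CPUs and clamped to a range. The limits MIN_THREADS and
MAX_THREADS and both function names are assumptions made for this
sketch. The submitted test may use different bounds and LTP helpers.

```c
#include <unistd.h>

#define MIN_THREADS 2	/* hypothetical lower limit */
#define MAX_THREADS 16	/* hypothetical upper limit */

/* Clamp a CPU count into [MIN_THREADS, MAX_THREADS]. */
static int clamp_threads(long ncpus)
{
	if (ncpus < MIN_THREADS)
		return MIN_THREADS;
	if (ncpus > MAX_THREADS)
		return MAX_THREADS;
	return (int)ncpus;
}

/* sysconf() returns -1 on error, which clamps to MIN_THREADS. */
static int pick_thread_count(void)
{
	return clamp_threads(sysconf(_SC_NPROCESSORS_ONLN));
}
```

With these limits a single-CPU machine still gets two competing
threads, and a large machine does not start more than sixteen.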

But also:

- separate functions for mmap and madvise (dealing with error cases)
- simplified the page sentinel value setting and verification
- give information about number of threads and memory to be used by an
  iteration of the test
- count the iterations to unpoison the right number of pages in case of
  multiple successful iterations
- moved sigaction setting to setup()
- SAFE_MALLOC() used
- significantly reduced the number of threads used
- significantly reduced the runtime timeout
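
The "moved sigaction setting to setup()" item can be pictured as a
one-time handler installation done before any thread starts. The sketch
below assumes the signal of interest is SIGBUS, which the cover letter
does not state, and the names sigbus_seen, sigbus_handler() and
install_sigbus_handler() are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

/* Set from the handler; sig_atomic_t is safe to write in a handler. */
static volatile sig_atomic_t sigbus_seen;

static void sigbus_handler(int sig)
{
	(void)sig;
	sigbus_seen = 1;
}

/* Meant to be called once from setup(); returns 0 on success. */
static int install_sigbus_handler(void)
{
	struct sigaction sa;

	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = sigbus_handler;
	sigemptyset(&sa.sa_mask);
	return sigaction(SIGBUS, &sa, NULL);
}
```

Installing the handler once in setup() keeps the per-iteration code free
of signal bookkeeping. A real test would also have to decide how to stop
or report when the flag is set, since returning from a handler for a
genuine memory fault re-executes the faulting access.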



Note about the tst_fuzzy_sync framework use:
The largest part of my work went into this aspect, which Richard
raised, as I agree with him about putting the emphasis on the
competing critical sections of code (mmap and madvise). I was
eventually able to create a version of this test using the
tst_fuzzy_sync mechanism that could reproduce the race condition.
But I chose not to use it for the following reasons:
- My fuzzy version was not as reliable as the multithreaded version at
  identifying our race condition -- on a kernel where the race fixed by
  d4ae9916ea29 is still present, the fuzzy version of the test could
  give false positive results on about 10% of the runs, whereas this
  multithreaded version hasn't shown a false positive in my tests.
- The multithreaded version is also generally (in about 80% of the
  cases) much faster at hitting the race condition than the fuzzy
  version.

So I hope you'll find this multithreaded test useful.
Tested on ARM and x86.


William Roche (1):
  madvise11: Add test for memory allocation / Soft-offlining possible
    race

 runtest/syscalls                              |   1 +
 testcases/kernel/syscalls/madvise/.gitignore  |   1 +
 testcases/kernel/syscalls/madvise/Makefile    |   3 +
 testcases/kernel/syscalls/madvise/madvise11.c | 405 ++++++++++++++++++
 4 files changed, 410 insertions(+)
 create mode 100644 testcases/kernel/syscalls/madvise/madvise11.c

Comments

Richard Palethorpe Feb. 13, 2023, 9:34 a.m. UTC | #1

Hello,

william.roche@oracle.com writes:

> From: William Roche <william.roche@oracle.com>
>
> After a long delay (since August) and many days of work on this topic,
> I am back with a new version of this test proposal.
> This version still uses a set of threads running the same code and
> competing with each other. In a loop, each thread allocates a set of
> memory pages, writes a sentinel value into each of them, soft-offlines
> them, then verifies the sentinel values and unmaps the pages.
>
> I've tried to address all the feedback I received:
>
> - added madvise11 to the runtest/syscalls file [Petr]
> - more complete and compliant Description comment [Petr]
> - removed no longer used header files
> - removed inline comments [Petr + Richard]
> - removed unnecessary comments [Petr]
> - number of threads dynamically tuned (with limits) [Richard]
> - warn about unexpected mmap errors [Richard]
> - lower case (not camel) variable names [Petr + Richard]
> - removal of an unneeded temporary "copy" variable [Richard]
> - removed unnecessary additional checks of SAFE_* functions [Petr]
> - removed the min_kver=2.6.33 [Petr]
> - added the commit id into the tst_test structure [Richard]
> - "make check-madvise11" is now clean [Petr + Richard]
>
> But also:
>
> - separate functions for mmap and madvise (dealing with error cases)
> - simplified the page sentinel value setting and verification
> - give information about number of threads and memory to be used by an
>   iteration of the test
> - count the iterations to unpoison the right number of pages in case of
>   multiple successful iterations
> - moved sigaction setting to setup()
> - SAFE_MALLOC() used
> - significantly reduced the number of threads used
> - significantly reduced the runtime timeout
>
>
>
> Note about the tst_fuzzy_sync framework use:
> The largest part of my work went into this aspect, which Richard
> raised, as I agree with him about putting the emphasis on the
> competing critical sections of code (mmap and madvise). I was
> eventually able to create a version of this test using the
> tst_fuzzy_sync mechanism that could reproduce the race condition.
> But I chose not to use it for the following reasons:
> - My fuzzy version was not as reliable as the multithreaded version at
>   identifying our race condition -- on a kernel where the race fixed by
>   d4ae9916ea29 is still present, the fuzzy version of the test could
>   give false positive results on about 10% of the runs, whereas this
>   multithreaded version hasn't shown a false positive in my tests.
> - The multithreaded version is also generally (in about 80% of the
>   cases) much faster at hitting the race condition than the fuzzy
>   version.
>
> So I hope you'll find this multithreaded test useful.
> Tested on ARM and x86.

OK, just looking now. There was a two-week delay because I was focused
on non-LTP stuff.

>
>
> William Roche (1):
>   madvise11: Add test for memory allocation / Soft-offlining possible
>     race
>
>  runtest/syscalls                              |   1 +
>  testcases/kernel/syscalls/madvise/.gitignore  |   1 +
>  testcases/kernel/syscalls/madvise/Makefile    |   3 +
>  testcases/kernel/syscalls/madvise/madvise11.c | 405 ++++++++++++++++++
>  4 files changed, 410 insertions(+)
>  create mode 100644 testcases/kernel/syscalls/madvise/madvise11.c