[RFC] move_pages12: handle errno EBUSY for madvise(..., MADV_SOFT_OFFLINE)

Message ID 20190607095213.13372-1-liwang@redhat.com
State Superseded

Commit Message

Li Wang June 7, 2019, 9:52 a.m. UTC
Test #2 simulates the race condition where move_pages() and soft offline
are called concurrently on a single hugetlb page. However, soft-offlining
a hugepage that is being moved sometimes returns EBUSY, and the test
reports FAIL as a result.

The root cause seems to be that page_huge_active() returns false, so the
soft offline action fails to isolate the hugepage and returns EBUSY, as
shown in the call traces below:

In Parent:
  madvise(..., MADV_SOFT_OFFLINE)
  ...
    soft_offline_page
      soft_offline_in_use_page
        soft_offline_huge_page
          isolate_huge_page
            page_huge_active  --> returns false here

In Child:
  move_pages()
  ...
    do_move_pages
      do_move_pages_to_node
        add_page_for_migration
          isolate_huge_page   --> it has already isolated the hugepage

In this patch, I simply treat the returned EBUSY as a normal situation and
mask it in the error handler. Because move_pages() calls
add_page_for_migration() to isolate the hugepage before doing the migration,
it is quite possible for the two paths to collide on the same page, in which
case madvise() returns EBUSY.

Error log:
----------
move_pages12.c:235: INFO: Free RAM 8386256 kB
move_pages12.c:253: INFO: Increasing 2048kB hugepages pool on node 0 to 4
move_pages12.c:263: INFO: Increasing 2048kB hugepages pool on node 1 to 6
move_pages12.c:179: INFO: Allocating and freeing 4 hugepages on node 0
move_pages12.c:179: INFO: Allocating and freeing 4 hugepages on node 1
move_pages12.c:169: PASS: Bug not reproduced
move_pages12.c:81: FAIL: madvise failed: SUCCESS
move_pages12.c:81: FAIL: madvise failed: SUCCESS
move_pages12.c:143: BROK: mmap((nil),4194304,3,262178,-1,0) failed: ENOMEM
move_pages12.c:114: FAIL: move_pages failed: EINVAL

Dmesg:
------
[165435.492170] soft offline: 0x61c00 hugepage failed to isolate
[165435.590252] soft offline: 0x61c00 hugepage failed to isolate
[165435.725493] soft offline: 0x61400 hugepage failed to isolate

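Two other fixes in this patch:
 * use TERRNO (not TTERRNO) to report the errno from
   madvise(..., MADV_SOFT_OFFLINE); TTERRNO prints TST_ERR, which is only
   set by the TEST() macro, hence "madvise failed: SUCCESS" in the log above
 * exit the test when hugepage allocation fails with ENOMEM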

Signed-off-by: Li Wang <liwang@redhat.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Xiao Yang <yangx.jy@cn.fujitsu.com>
Cc: Yang Xu <xuyang2018.jy@cn.fujitsu.com>
---
 .../kernel/syscalls/move_pages/move_pages12.c | 33 ++++++++++++++-----
 1 file changed, 24 insertions(+), 9 deletions(-)

Comments

Naoya Horiguchi June 10, 2019, 3:27 a.m. UTC | #1
Hi Li Wang,

Thank you for maintaining the testcase.

Recently (since 4.19) the semantics of the return value of
madvise(MADV_SOFT_OFFLINE) changed, and we now see -EBUSY when hugepage
migration succeeded but error containment failed:

  commit 6bc9b56433b76e40d11099338d27fbc5cd2935ca
  Author: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
  Date:   Thu Aug 23 17:00:38 2018 -0700
  
      mm: fix race on soft-offlining free huge pages

So we don't have to treat this EBUSY as an error; it is useful information
for the application. Your change matches that kernel change.

Feel free to add my ack:

Acked-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>

Thanks,
- Naoya

On Fri, Jun 07, 2019 at 05:52:13PM +0800, Li Wang wrote:
> diff --git a/testcases/kernel/syscalls/move_pages/move_pages12.c b/testcases/kernel/syscalls/move_pages/move_pages12.c
> index 964b712fb..c446396dc 100644
> --- a/testcases/kernel/syscalls/move_pages/move_pages12.c
> +++ b/testcases/kernel/syscalls/move_pages/move_pages12.c
> @@ -77,8 +77,8 @@ static void *addr;
>  static int do_soft_offline(int tpgs)
>  {
>  	if (madvise(addr, tpgs * hpsz, MADV_SOFT_OFFLINE) == -1) {
> -		if (errno != EINVAL)
> -			tst_res(TFAIL | TTERRNO, "madvise failed");
> +		if (errno != EINVAL && errno != EBUSY)
> +			tst_res(TFAIL | TERRNO, "madvise failed");
>  		return errno;
>  	}
>  	return 0;
> @@ -121,7 +121,8 @@ static void do_child(int tpgs)
>  
>  static void do_test(unsigned int n)
>  {
> -	int i;
> +	int i, ret;
> +	void *ptr;
>  	pid_t cpid = -1;
>  	int status;
>  	unsigned int twenty_percent = (tst_timeout_remaining() / 5);
> @@ -136,24 +137,37 @@ static void do_test(unsigned int n)
>  		do_child(tcases[n].tpages);
>  
>  	for (i = 0; i < LOOPS; i++) {
> -		void *ptr;
> +		ptr = mmap(NULL, tcases[n].tpages * hpsz,
> +				PROT_READ | PROT_WRITE,
> +				MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
> +		if (ptr == MAP_FAILED) {
> +			if (errno == ENOMEM) {
> +				tst_res(TCONF,
> +					"Cannot allocate hugepage, memory too fragmented?");
> +				goto out;
> +			}
> +
> +			tst_brk(TBROK | TERRNO, "Cannot allocate hugepage");
> +		}
>  
> -		ptr = SAFE_MMAP(NULL, tcases[n].tpages * hpsz,
> -			PROT_READ | PROT_WRITE,
> -			MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
>  		if (ptr != addr)
>  			tst_brk(TBROK, "Failed to mmap at desired addr");
>  
>  		memset(addr, 0, tcases[n].tpages * hpsz);
>  
>  		if (tcases[n].offline) {
> -			if (do_soft_offline(tcases[n].tpages) == EINVAL) {
> +			ret = do_soft_offline(tcases[n].tpages);
> +
> +			if (ret == EINVAL) {
>  				SAFE_KILL(cpid, SIGKILL);
>  				SAFE_WAITPID(cpid, &status, 0);
>  				SAFE_MUNMAP(addr, tcases[n].tpages * hpsz);
>  				tst_res(TCONF,
>  					"madvise() didn't support MADV_SOFT_OFFLINE");
>  				return;
> +			} else if (ret == EBUSY) {
> +				SAFE_MUNMAP(addr, tcases[n].tpages * hpsz);
> +				goto out;
>  			}
>  		}
>  
> @@ -163,9 +177,10 @@ static void do_test(unsigned int n)
>  			break;
>  	}
>  
> +out:
>  	SAFE_KILL(cpid, SIGKILL);
>  	SAFE_WAITPID(cpid, &status, 0);
> -	if (!WIFEXITED(status))
> +	if (!WIFEXITED(status) && ptr != MAP_FAILED)
>  		tst_res(TPASS, "Bug not reproduced");
>  }
>  
> -- 
> 2.20.1
> 
>
Yang Xu June 21, 2019, 5:58 a.m. UTC | #2
Hi Li

Your patch handles the EBUSY errno from soft offline correctly.
However, the process may be killed by SIGBUS because of an MCE when we soft
offline concurrently.
That causes move_pages() to fail with ESRCH. move_pages() may also fail with
ENOMEM.
Have you noticed this?

I think the ESRCH error can be treated as the soft offline bug not being
reproduced, because it doesn't trigger a crash.
What do you think?

err_log:
tst_test.c:1096: INFO: Timeout per run is 0h 05m 00s
move_pages12.c:236: INFO: Free RAM 119568 kB
move_pages12.c:254: INFO: Increasing 2048kB hugepages pool on node 0 to 83
move_pages12.c:264: INFO: Increasing 2048kB hugepages pool on node 1 to 94
move_pages12.c:180: INFO: Allocating and freeing 4 hugepages on node 0
move_pages12.c:180: INFO: Allocating and freeing 4 hugepages on node 1
move_pages12.c:170: PASS: Bug not reproduced
tst_test.c:1141: BROK: Test killed by SIGBUS!

Summary:
passed   1
failed   0
skipped  0
warnings 0

move_pages12.c:114: FAIL: move_pages failed: ESRCH

dmesg
[ 9868.180669] MCE: Killing move_pages12:29616 due to hardware memory corruption fault at 2aaaaac00018
[ 9990.049875] Soft offlining page 50e00 at 2aaaaac00000
[ 9990.052218] Soft offlining page 50c00 at 2aaaaae00000
[ 9990.060395] Soft offlining page 51000 at 2aaaaac00000

Kind Regards,
Yang Xu

Li Wang June 24, 2019, 2:43 a.m. UTC | #3
Hi Xu Yang,

On Fri, Jun 21, 2019 at 1:58 PM Yang Xu <xuyang2018.jy@cn.fujitsu.com> wrote:

> Hi Li
>
> Your patch can handle EBUSY errno correctly for soft offline.
> But move page  may be killed by SIGBUS because of  MCE  when we soft
> offline concurrently.
> That leads to move_page failed with ESRCH.   Also, move page may fails
> with ENOMEM .
> Do you notice it ?
>

I haven't seen this failure, and it doesn't seem to be related to this
patch. Two questions:

1. Which kernel version did you test?
2. Can you reproduce this without my patch?



>
> I think ESRCH error can represent the soft offline bug not reproduce
> because it don't trigger a crash.
> What do you think about it?
>

Maybe, but that needs to be checked in detail against your kernel.

Yang Xu June 27, 2019, 2:50 a.m. UTC | #4
on 2019/06/24 10:43, Li Wang wrote:

> Hi Xu Yang,
>
> On Fri, Jun 21, 2019 at 1:58 PM Yang Xu <xuyang2018.jy@cn.fujitsu.com> wrote:
>
>
>     Hi Li
>
>     Your patch can handle EBUSY errno correctly for soft offline.
>     But move page  may be killed by SIGBUS because of  MCE  when we
>     soft offline concurrently.
>     That leads to move_page failed with ESRCH.   Also, move page may
>     fails with ENOMEM .
>     Do you notice it ?
>
>
> I didn't get this failure, it seems not related to this patch. Two 
> questions:
>
> 1. which kernel version do you test?
> 2. can you reproduce this without my patch?
Hi Li

I tested it on a 3.10.0-957.el7.x86_64 KVM guest (my machine does not
support NUMA, so I enabled it in KVM), as below:
<cpu mode='custom' match='exact' check='full'>
  <model fallback='forbid'>Penryn</model>
  <feature policy='require' name='x2apic'/>
  <feature policy='require' name='hypervisor'/>
  <numa>
    <cell id='0' cpus='0' memory='1048576' unit='KiB'/>
    <cell id='1' cpus='1' memory='1048576' unit='KiB'/>
  </numa>
</cpu>

Does this happen only on KVM and not on a physical machine? I don't have a
physical machine that supports NUMA.

The fix patch has been merged since 3.10.0-957.el7.x86_64.
Yes, I can reproduce this without your patch, because the MCE kills the
child process and move_pages() gets an ESRCH error.


>     >>                              SAFE_WAITPID(cpid, &status, 0);
>     >>                              SAFE_MUNMAP(addr, tcases[n].tpages
>     * hpsz);
>     >>                              tst_res(TCONF,
>     >>                                      "madvise() didn't support
>     MADV_SOFT_OFFLINE");
>     >>                              return;
>     >> +                    } else if (ret == EBUSY) {
>     >> +                            SAFE_MUNMAP(addr, tcases[n].tpages
>     * hpsz);
>     >> +                            goto out;
>     >>                      }
>     >>              }
>     >>
>     >> @@ -163,9 +177,10 @@ static void do_test(unsigned int n)
>     >>                      break;
>     >>      }
>     >>
>     >> +out:
>     >>      SAFE_KILL(cpid, SIGKILL);
>     >>      SAFE_WAITPID(cpid, &status, 0);
>     >> -    if (!WIFEXITED(status))
>     >> +    if (!WIFEXITED(status) && ptr != MAP_FAILED)
>     >>              tst_res(TPASS, "Bug not reproduced");
>     >>  }
>     >>
>     >> --
>     >> 2.20.1
>     >>
>     >>
>     >
>     > .
>     >
>
>
>
>
>
> -- 
> Regards,
> Li Wang
Cyril Hrubis July 3, 2019, 1:10 p.m. UTC | #5
Hi!
> +			if (ret == EINVAL) {
>  				SAFE_KILL(cpid, SIGKILL);
>  				SAFE_WAITPID(cpid, &status, 0);
>  				SAFE_MUNMAP(addr, tcases[n].tpages * hpsz);
>  				tst_res(TCONF,
>  					"madvise() didn't support MADV_SOFT_OFFLINE");
>  				return;
> +			} else if (ret == EBUSY) {
> +				SAFE_MUNMAP(addr, tcases[n].tpages * hpsz);
> +				goto out;

Shouldn't we continue with the test here rather than exit?

I guess that there is no harm in doing a few more iterations if we
manage to hit EBUSY, or is there a good reason to exit the test here?

Otherwise the patch looks good.
Li Wang July 4, 2019, 3:29 a.m. UTC | #6
Hi Xu,

On Thu, Jun 27, 2019 at 10:50 AM Yang Xu <xuyang2018.jy@cn.fujitsu.com>
wrote:

> ...
>> Hi Li
>>
>> Your patch handles the EBUSY errno correctly for soft offline.
>> But move_pages may be killed by SIGBUS because of an MCE when we soft
>> offline concurrently.
>> That leads to move_pages failing with ESRCH. Also, move_pages may fail
>> with ENOMEM.
>> Did you notice it?
>>
>
> I didn't get this failure, it seems not related to this patch. Two
> questions:
>
> 1. which kernel version do you test?
> 2. can you reproduce this without my patch?
>
> Hi Li
>
> I tested it on 3.10.0-957.el7.x86_64 under KVM (my machine does not support
> NUMA, so I enabled it in KVM as below):
>  <cpu mode='custom' match='exact' check='full'>
>     <model fallback='forbid'>Penryn</model>
>     <feature policy='require' name='x2apic'/>
>     <feature policy='require' name='hypervisor'/>
>     <numa>
>       <cell id='0' cpus='0' memory='1048576' unit='KiB'/>
>       <cell id='1' cpus='1' memory='1048576' unit='KiB'/>
>     </numa>
>   </cpu>
>
> Does it only exist on KVM and not on a physical machine? I don't
> have a physical machine that supports NUMA.
>

I can reproduce your problem on bare metal too. It seems you hit the bug
described in commit 6bc9b56433b ("mm: fix race on soft-offlining free huge
pages"), which Naoya pointed out before:

See:

+               /*
+                * We set PG_hwpoison only when the migration source hugepage
+                * was successfully dissolved, because otherwise hwpoisoned
+                * hugepage remains on free hugepage list, then userspace will
+                * find it as SIGBUS by allocation failure. That's not expected
+                * in soft-offlining.
+                */
+               ret = dissolve_free_huge_page(page);
+               if (!ret) {
+                       if (set_hwpoison_free_buddy_page(page))
+                               num_poisoned_pages_inc();
+               }

This bug still exists in the latest RHEL7 kernel, so I will open a bug
against the RHEL7 product.
Li Wang July 4, 2019, 5:48 a.m. UTC | #7
On Wed, Jul 3, 2019 at 9:10 PM Cyril Hrubis <chrubis@suse.cz> wrote:

> Hi!
> > +                     if (ret == EINVAL) {
> >                               SAFE_KILL(cpid, SIGKILL);
> >                               SAFE_WAITPID(cpid, &status, 0);
> >                               SAFE_MUNMAP(addr, tcases[n].tpages * hpsz);
> >                               tst_res(TCONF,
> >                                       "madvise() didn't support
> MADV_SOFT_OFFLINE");
> >                               return;
> > +                     } else if (ret == EBUSY) {
> > +                             SAFE_MUNMAP(addr, tcases[n].tpages * hpsz);
> > +                             goto out;
>
> Shouldn't we continue with the test here rather than exit?
>
> I guess that there is no harm in doing a few more iterations if we
> manage to hit EBUSY, or is there a good reason to exit the test here?
>

Yes, we can do more iterations then, but it probably makes no sense.

My guess at the reason is that if we get EBUSY on the hugepage offline,
the page is already being isolated by move_pages() in the child at that
moment and we can't really release it. So in the next iteration, the
mmap() will fail with ENOMEM (since we only have 1 huge page in
/proc/.../nr_hugepages).

To confirm that, I changed the code to continue after getting EBUSY, but
it could not continue:

# ./move_pages12
tst_test.c:1100: INFO: Timeout per run is 0h 05m 00s
move_pages12.c:251: INFO: Free RAM 30860672 kB
move_pages12.c:269: INFO: Increasing 2048kB hugepages pool on node 0 to 4
move_pages12.c:279: INFO: Increasing 2048kB hugepages pool on node 1 to 5
move_pages12.c:195: INFO: Allocating and freeing 4 hugepages on node 0
move_pages12.c:195: INFO: Allocating and freeing 4 hugepages on node 1
move_pages12.c:185: PASS: Bug not reproduced
move_pages12.c:146: CONF: Cannot allocate hugepage, memory too fragmented?

>
> Otherwise the patch looks good.
>

Thanks for the review.
Li Wang July 4, 2019, 6:23 a.m. UTC | #8
> iteration, the mmap() will be failed with ENOMEM(since we only have 1 huge
> page in /proc/.../nr_hugepages).
>

Sentence correction:
    It is not "only have 1 huge page in nr_hugepages"; I mixed this test
up with another case, sorry about that.

But the justification is the same: we don't have enough memory for the
parent to do mmap(..., MAP_HUGETLB) in a new loop.
Cyril Hrubis July 10, 2019, 12:59 p.m. UTC | #9
Hi!
> > iteration, the mmap() will be failed with ENOMEM(since we only have 1 huge
> > page in /proc/.../nr_hugepages).
> >
> 
> Sentence correction:
>     It is not "only have 1 huge page in nr_hugepages", I mixed this test
> with another case, sorry about that.
> 
> But the justification is the same, we don't have enough memory for the
> parent does mmap(..., MAP_HUGETLB) in a new loop.

I guess I get it now. If we attempt to continue after EBUSY, we munmap()
the memory, but that munmap() will happen asynchronously because the
migration is in progress, and we hit ENOMEM in the very next iteration of
the loop.

Should we then attempt to retry the mmap() on ENOMEM as well, ideally
with exponential backoff?

Unfortunately we cannot reuse TST_RETRY_FUNC() as it is, because it
exits the test with TBROK on failure. We need a function that actually
returns the last function return value on timeout.
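
As a rough illustration of the helper described above, here is a minimal
sketch of a retry loop with exponential backoff that hands the last return
value back to the caller instead of exiting the test. The names
retry_with_backoff() and countdown() are hypothetical and are not part of
the LTP library; the retry count and delays are arbitrary assumptions.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <time.h>

/*
 * Hypothetical helper (not LTP API): call fn(arg) until it returns 0 or
 * max_tries attempts have been made. The sleep between attempts starts at
 * first_delay_us and doubles each time. The last return value of fn() is
 * returned, so the caller decides how to report a failure.
 */
static int retry_with_backoff(int (*fn)(void *), void *arg,
			      unsigned int max_tries, long first_delay_us)
{
	long delay_us = first_delay_us;
	int ret = -1;
	unsigned int i;

	for (i = 0; i < max_tries; i++) {
		ret = fn(arg);
		if (ret == 0)
			break;

		/* Do not sleep after the final attempt. */
		if (i + 1 < max_tries) {
			struct timespec ts = {
				.tv_sec = delay_us / 1000000,
				.tv_nsec = (delay_us % 1000000) * 1000,
			};

			nanosleep(&ts, NULL);
			delay_us *= 2;
		}
	}

	return ret;
}

/*
 * Illustrative callback only: fails with ENOMEM until the counter that
 * arg points to reaches zero, then succeeds.
 */
static int countdown(void *arg)
{
	int *left = arg;

	if (*left > 0) {
		(*left)--;
		errno = ENOMEM;
		return -1;
	}

	return 0;
}
```

In the test, the callback would wrap the mmap(..., MAP_HUGETLB) call, and the
caller would report TCONF only once the retries are exhausted.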
Li Wang July 15, 2019, 9:33 a.m. UTC | #10
On Wed, Jul 10, 2019 at 9:00 PM Cyril Hrubis <chrubis@suse.cz> wrote:

> Hi!
> > > iteration, the mmap() will be failed with ENOMEM(since we only have 1
> huge
> > > page in /proc/.../nr_hugepages).
> > >
> >
> > Sentence correction:
> >     It is not "only have 1 huge page in nr_hugepages", I mixed this test
> > with another case, sorry about that.
> >
> > But the justification is the same, we don't have enough memory for the
> > parent does mmap(..., MAP_HUGETLB) in a new loop.
>
> I guess I get it now, if we attempt to continue after EBUSY we unmap()
> the memory but that unmap() will happen asynchronously because the
> migration is in progress and we hit ENOMEM just in the next iteration of
> the loop.
>
> Should we then attempt to retry the mmap() on ENOMEM as well, ideally
> with exponential backoff?
>

I'm not sure that is worth doing.

>
> Unfortunately we cannot reuse the TST_RETRY_FUNC() as it is because it
> exits the test with TBROK on failure, we need a function that actually
> returns the last function return value on timeout.
>

Yes, we could define a new TST_WAIT_FUNC() that returns mmap()'s return value
on timeout, but it seems hard to give an expected return (ERET) value for
that function. In this case we could define ERET as addr since we know
it, but in most situations we can't be sure which address will be
returned. If the returned address is not equal to ERET, it will retry
the mmap() without munmap()ing the previous memory. That would be terrible.
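
One way around the ERET problem might be to retry only when mmap() itself
fails with ENOMEM, rather than comparing against an expected address. The
sketch below is only an assumption about how that could look;
mmap_retry_enomem() is a hypothetical name, not an existing LTP function,
and the initial 1 ms delay is arbitrary.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper (not LTP API): retry an anonymous mmap() only while
 * it fails with ENOMEM, doubling the delay between attempts. A successful
 * mapping is returned immediately and never retried, so no mapping is
 * leaked. On failure MAP_FAILED is returned with errno from the last
 * mmap() call.
 */
static void *mmap_retry_enomem(size_t len, int flags, unsigned int max_tries)
{
	useconds_t delay_us = 1000;
	void *ptr = MAP_FAILED;
	unsigned int i;

	for (i = 0; i < max_tries; i++) {
		ptr = mmap(NULL, len, PROT_READ | PROT_WRITE, flags, -1, 0);

		/* Stop on success, on any other error, or on the last try. */
		if (ptr != MAP_FAILED || errno != ENOMEM || i + 1 == max_tries)
			break;

		usleep(delay_us);
		delay_us *= 2;
	}

	return ptr;
}
```

The test would pass MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB as flags and
would still have to check ptr != addr itself afterwards; this sketch does not
handle a successful mapping at an unexpected address.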
Li Wang July 24, 2019, 9:33 a.m. UTC | #11
Hi Cyril,

Would you mind if I apply this patch? Or do you have other thoughts
besides retrying mmap() on ENOMEM?
Cyril Hrubis July 24, 2019, 10:46 a.m. UTC | #12
Hi!
> Would you mind if I apply this patch? or do you have other thoughts
> besides retry mmap on ENOMEM?

My only concern is that we may exit the test too soon if we do not
attempt to retry.

Patch

diff --git a/testcases/kernel/syscalls/move_pages/move_pages12.c b/testcases/kernel/syscalls/move_pages/move_pages12.c
index 964b712fb..c446396dc 100644
--- a/testcases/kernel/syscalls/move_pages/move_pages12.c
+++ b/testcases/kernel/syscalls/move_pages/move_pages12.c
@@ -77,8 +77,8 @@  static void *addr;
 static int do_soft_offline(int tpgs)
 {
 	if (madvise(addr, tpgs * hpsz, MADV_SOFT_OFFLINE) == -1) {
-		if (errno != EINVAL)
-			tst_res(TFAIL | TTERRNO, "madvise failed");
+		if (errno != EINVAL && errno != EBUSY)
+			tst_res(TFAIL | TERRNO, "madvise failed");
 		return errno;
 	}
 	return 0;
@@ -121,7 +121,8 @@  static void do_child(int tpgs)
 
 static void do_test(unsigned int n)
 {
-	int i;
+	int i, ret;
+	void *ptr;
 	pid_t cpid = -1;
 	int status;
 	unsigned int twenty_percent = (tst_timeout_remaining() / 5);
@@ -136,24 +137,37 @@  static void do_test(unsigned int n)
 		do_child(tcases[n].tpages);
 
 	for (i = 0; i < LOOPS; i++) {
-		void *ptr;
+		ptr = mmap(NULL, tcases[n].tpages * hpsz,
+				PROT_READ | PROT_WRITE,
+				MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
+		if (ptr == MAP_FAILED) {
+			if (errno == ENOMEM) {
+				tst_res(TCONF,
+					"Cannot allocate hugepage, memory too fragmented?");
+				goto out;
+			}
+
+			tst_brk(TBROK | TERRNO, "Cannot allocate hugepage");
+		}
 
-		ptr = SAFE_MMAP(NULL, tcases[n].tpages * hpsz,
-			PROT_READ | PROT_WRITE,
-			MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
 		if (ptr != addr)
 			tst_brk(TBROK, "Failed to mmap at desired addr");
 
 		memset(addr, 0, tcases[n].tpages * hpsz);
 
 		if (tcases[n].offline) {
-			if (do_soft_offline(tcases[n].tpages) == EINVAL) {
+			ret = do_soft_offline(tcases[n].tpages);
+
+			if (ret == EINVAL) {
 				SAFE_KILL(cpid, SIGKILL);
 				SAFE_WAITPID(cpid, &status, 0);
 				SAFE_MUNMAP(addr, tcases[n].tpages * hpsz);
 				tst_res(TCONF,
 					"madvise() didn't support MADV_SOFT_OFFLINE");
 				return;
+			} else if (ret == EBUSY) {
+				SAFE_MUNMAP(addr, tcases[n].tpages * hpsz);
+				goto out;
 			}
 		}
 
@@ -163,9 +177,10 @@  static void do_test(unsigned int n)
 			break;
 	}
 
+out:
 	SAFE_KILL(cpid, SIGKILL);
 	SAFE_WAITPID(cpid, &status, 0);
-	if (!WIFEXITED(status))
+	if (!WIFEXITED(status) && ptr != MAP_FAILED)
 		tst_res(TPASS, "Bug not reproduced");
 }