[RFC] mm: correct status code which move_pages() returns for zero page
diff mbox series

Message ID 20180417110615.16043-1-liwang@redhat.com
State Superseded
Delegated to: Cyril Hrubis
Headers show
Series
  • [RFC] mm: correct status code which move_pages() returns for zero page
Related show

Commit Message

Li Wang April 17, 2018, 11:06 a.m. UTC
move_pages(2) declears that status code for zero page is supposed to
be -EFAULT. But now it (LTP/move_pages04 test) gets -EPERM, the root
cause is that not goto out_flush after store_status() saves the err
which add_page_for_migration() returns for zero page.

LTP move_pages04:
   TFAIL  :  move_pages04.c:143: status[1] is EPERM, expected EFAULT

Signed-off-by: Li Wang <liwang@redhat.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Zi Yan <zi.yan@cs.rutgers.edu>
---
 mm/migrate.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Michal Hocko April 17, 2018, 1:03 p.m. UTC | #1
On Tue 17-04-18 19:06:15, Li Wang wrote:
[...]
> diff --git a/mm/migrate.c b/mm/migrate.c
> index f65dd69..2b315fc 100644
> --- a/mm/migrate.c
> +++ b/mm/migrate.c
> @@ -1608,7 +1608,7 @@ static int do_pages_move(struct mm_struct *mm, nodemask_t task_nodes,
>  			continue;
>  
>  		err = store_status(status, i, err, 1);
> -		if (err)
> +		if (!err)
>  			goto out_flush;

This change just doesn't make any sense to me. Why should we bail out if
the store_status is successul? I am trying to wrap my head around the
test case. 6b9d757ecafc ("mm, numa: rework do_pages_move") tried to
explain that move_pages has some semantic issues and the new
implementation might be not 100% replacement. Anyway I am studying the
test case to come up with a proper fix.

>  
>  		err = do_move_pages_to_node(mm, &pagelist, current_node);
> -- 
> 2.9.5
>
Michal Hocko April 17, 2018, 2:14 p.m. UTC | #2
On Tue 17-04-18 15:03:00, Michal Hocko wrote:
> On Tue 17-04-18 19:06:15, Li Wang wrote:
> [...]
> > diff --git a/mm/migrate.c b/mm/migrate.c
> > index f65dd69..2b315fc 100644
> > --- a/mm/migrate.c
> > +++ b/mm/migrate.c
> > @@ -1608,7 +1608,7 @@ static int do_pages_move(struct mm_struct *mm, nodemask_t task_nodes,
> >  			continue;
> >  
> >  		err = store_status(status, i, err, 1);
> > -		if (err)
> > +		if (!err)
> >  			goto out_flush;
> 
> This change just doesn't make any sense to me. Why should we bail out if
> the store_status is successul? I am trying to wrap my head around the
> test case. 6b9d757ecafc ("mm, numa: rework do_pages_move") tried to
> explain that move_pages has some semantic issues and the new
> implementation might be not 100% replacement. Anyway I am studying the
> test case to come up with a proper fix.

OK, I get what the test cases does. I've failed to see the subtle
difference between alloc_pages_on_node and numa_alloc_onnode. The later
doesn't faul in anything.

Why are we getting EPERM is quite not yet clear to me.
add_page_for_migration uses FOLL_DUMP which should return EFAULT on
zero pages (no_page_table()).

	err = PTR_ERR(page);
	if (IS_ERR(page))
		goto out;

therefore bails out from add_page_for_migration and store_status should
store that value. There shouldn't be any EPERM on the way.

Let me try to reproduce and see what is going on. Btw. which kernel do
you try this on?
Li Wang April 17, 2018, 2:28 p.m. UTC | #3
On Tue, Apr 17, 2018 at 10:14 PM, Michal Hocko <mhocko@suse.com> wrote:

> On Tue 17-04-18 15:03:00, Michal Hocko wrote:
> > On Tue 17-04-18 19:06:15, Li Wang wrote:
> > [...]
> > > diff --git a/mm/migrate.c b/mm/migrate.c
> > > index f65dd69..2b315fc 100644
> > > --- a/mm/migrate.c
> > > +++ b/mm/migrate.c
> > > @@ -1608,7 +1608,7 @@ static int do_pages_move(struct mm_struct *mm,
> nodemask_t task_nodes,
> > >                     continue;
> > >
> > >             err = store_status(status, i, err, 1);
> > > -           if (err)
> > > +           if (!err)
> > >                     goto out_flush;
> >
> > This change just doesn't make any sense to me. Why should we bail out if
> > the store_status is successul? I am trying to wrap my head around the
> > test case. 6b9d757ecafc ("mm, numa: rework do_pages_move") tried to
> > explain that move_pages has some semantic issues and the new
> > implementation might be not 100% replacement. Anyway I am studying the
> > test case to come up with a proper fix.
>
> OK, I get what the test cases does. I've failed to see the subtle
> difference between alloc_pages_on_node and numa_alloc_onnode. The later
> doesn't faul in anything.
>
> Why are we getting EPERM is quite not yet clear to me.
> add_page_for_migration uses FOLL_DUMP which should return EFAULT on
> zero pages (no_page_table()).
>
>         err = PTR_ERR(page);
>         if (IS_ERR(page))
>                 goto out;
>
> therefore bails out from add_page_for_migration and store_status should
> store that value. There shouldn't be any EPERM on the way.
>

Yes, I print the the return value and confirmed the
add_page_for_migration()​
do right things for zero page. and after store_status(...) the status saves
-EFAULT.
So I did the change above.



>
> Let me try to reproduce and see what is going on. Btw. which kernel do
> you try this on?
>

​The latest mainline kernel-4.17-rc1.
Michal Hocko April 17, 2018, 7 p.m. UTC | #4
On Tue 17-04-18 22:28:33, Li Wang wrote:
> On Tue, Apr 17, 2018 at 10:14 PM, Michal Hocko <mhocko@suse.com> wrote:
> 
> > On Tue 17-04-18 15:03:00, Michal Hocko wrote:
> > > On Tue 17-04-18 19:06:15, Li Wang wrote:
> > > [...]
> > > > diff --git a/mm/migrate.c b/mm/migrate.c
> > > > index f65dd69..2b315fc 100644
> > > > --- a/mm/migrate.c
> > > > +++ b/mm/migrate.c
> > > > @@ -1608,7 +1608,7 @@ static int do_pages_move(struct mm_struct *mm,
> > nodemask_t task_nodes,
> > > >                     continue;
> > > >
> > > >             err = store_status(status, i, err, 1);
> > > > -           if (err)
> > > > +           if (!err)
> > > >                     goto out_flush;
> > >
> > > This change just doesn't make any sense to me. Why should we bail out if
> > > the store_status is successul? I am trying to wrap my head around the
> > > test case. 6b9d757ecafc ("mm, numa: rework do_pages_move") tried to
> > > explain that move_pages has some semantic issues and the new
> > > implementation might be not 100% replacement. Anyway I am studying the
> > > test case to come up with a proper fix.
> >
> > OK, I get what the test cases does. I've failed to see the subtle
> > difference between alloc_pages_on_node and numa_alloc_onnode. The later
> > doesn't faul in anything.
> >
> > Why are we getting EPERM is quite not yet clear to me.
> > add_page_for_migration uses FOLL_DUMP which should return EFAULT on
> > zero pages (no_page_table()).
> >
> >         err = PTR_ERR(page);
> >         if (IS_ERR(page))
> >                 goto out;
> >
> > therefore bails out from add_page_for_migration and store_status should
> > store that value. There shouldn't be any EPERM on the way.
> >
> 
> Yes, I print the the return value and confirmed the
> add_page_for_migration()​
> do right things for zero page. and after store_status(...) the status saves
> -EFAULT.
> So I did the change above.

OK, I guess I knnow what is going on. I must be overwriting the status
on the way out by

out_flush:
	/* Make sure we do not overwrite the existing error */
	err1 = do_move_pages_to_node(mm, &pagelist, current_node);
	if (!err1)
		err1 = store_status(status, start, current_node, i - start);

This error handling is rather fragile and I was quite unhappy about it
at the time I was developing it. I have to remember all the details why
I've done it that way but I would bet my hat this is it. More on this
tomorrow.
Zi Yan April 17, 2018, 8:09 p.m. UTC | #5
On 17 Apr 2018, at 15:00, Michal Hocko wrote:

> On Tue 17-04-18 22:28:33, Li Wang wrote:
>> On Tue, Apr 17, 2018 at 10:14 PM, Michal Hocko <mhocko@suse.com> wrote:
>>
>>> On Tue 17-04-18 15:03:00, Michal Hocko wrote:
>>>> On Tue 17-04-18 19:06:15, Li Wang wrote:
>>>> [...]
>>>>> diff --git a/mm/migrate.c b/mm/migrate.c
>>>>> index f65dd69..2b315fc 100644
>>>>> --- a/mm/migrate.c
>>>>> +++ b/mm/migrate.c
>>>>> @@ -1608,7 +1608,7 @@ static int do_pages_move(struct mm_struct *mm,
>>> nodemask_t task_nodes,
>>>>>                     continue;
>>>>>
>>>>>             err = store_status(status, i, err, 1);
>>>>> -           if (err)
>>>>> +           if (!err)
>>>>>                     goto out_flush;
>>>>
>>>> This change just doesn't make any sense to me. Why should we bail out if
>>>> the store_status is successul? I am trying to wrap my head around the
>>>> test case. 6b9d757ecafc ("mm, numa: rework do_pages_move") tried to
>>>> explain that move_pages has some semantic issues and the new
>>>> implementation might be not 100% replacement. Anyway I am studying the
>>>> test case to come up with a proper fix.
>>>
>>> OK, I get what the test cases does. I've failed to see the subtle
>>> difference between alloc_pages_on_node and numa_alloc_onnode. The later
>>> doesn't faul in anything.
>>>
>>> Why are we getting EPERM is quite not yet clear to me.
>>> add_page_for_migration uses FOLL_DUMP which should return EFAULT on
>>> zero pages (no_page_table()).
>>>
>>>         err = PTR_ERR(page);
>>>         if (IS_ERR(page))
>>>                 goto out;
>>>
>>> therefore bails out from add_page_for_migration and store_status should
>>> store that value. There shouldn't be any EPERM on the way.
>>>
>>
>> Yes, I print the the return value and confirmed the
>> add_page_for_migration()​
>> do right things for zero page. and after store_status(...) the status saves
>> -EFAULT.
>> So I did the change above.
>
> OK, I guess I knnow what is going on. I must be overwriting the status
> on the way out by
>
> out_flush:
> 	/* Make sure we do not overwrite the existing error */
> 	err1 = do_move_pages_to_node(mm, &pagelist, current_node);
> 	if (!err1)
> 		err1 = store_status(status, start, current_node, i - start);
>
> This error handling is rather fragile and I was quite unhappy about it
> at the time I was developing it. I have to remember all the details why
> I've done it that way but I would bet my hat this is it. More on this
> tomorrow.

Hi Michal and Li,

The problem is that the variable start is not set properly after store_status(),
like the "start = i;" after the first store_status().

The following patch should fix the problem (it has passed all move_pages test cases from ltp
on my machine):

diff --git a/mm/migrate.c b/mm/migrate.c
index f65dd69e1fd1..32afa4723e7f 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -1619,6 +1619,8 @@ static int do_pages_move(struct mm_struct *mm, nodemask_t task_nodes,
                        if (err)
                                goto out;
                }
+               /* Move to next page (i+1), after we have saved page status (until i) */
+               start = i + 1;
                current_node = NUMA_NO_NODE;
        }
 out_flush:

Feel free to check it by yourselves.

--
Best Regards
Yan Zi
Michal Hocko April 18, 2018, 9:07 a.m. UTC | #6
On Tue 17-04-18 16:09:33, Zi Yan wrote:
> On 17 Apr 2018, at 15:00, Michal Hocko wrote:
> 
> > On Tue 17-04-18 22:28:33, Li Wang wrote:
> >> On Tue, Apr 17, 2018 at 10:14 PM, Michal Hocko <mhocko@suse.com> wrote:
> >>
> >>> On Tue 17-04-18 15:03:00, Michal Hocko wrote:
> >>>> On Tue 17-04-18 19:06:15, Li Wang wrote:
> >>>> [...]
> >>>>> diff --git a/mm/migrate.c b/mm/migrate.c
> >>>>> index f65dd69..2b315fc 100644
> >>>>> --- a/mm/migrate.c
> >>>>> +++ b/mm/migrate.c
> >>>>> @@ -1608,7 +1608,7 @@ static int do_pages_move(struct mm_struct *mm,
> >>> nodemask_t task_nodes,
> >>>>>                     continue;
> >>>>>
> >>>>>             err = store_status(status, i, err, 1);
> >>>>> -           if (err)
> >>>>> +           if (!err)
> >>>>>                     goto out_flush;
> >>>>
> >>>> This change just doesn't make any sense to me. Why should we bail out if
> >>>> the store_status is successul? I am trying to wrap my head around the
> >>>> test case. 6b9d757ecafc ("mm, numa: rework do_pages_move") tried to
> >>>> explain that move_pages has some semantic issues and the new
> >>>> implementation might be not 100% replacement. Anyway I am studying the
> >>>> test case to come up with a proper fix.
> >>>
> >>> OK, I get what the test cases does. I've failed to see the subtle
> >>> difference between alloc_pages_on_node and numa_alloc_onnode. The later
> >>> doesn't faul in anything.
> >>>
> >>> Why are we getting EPERM is quite not yet clear to me.
> >>> add_page_for_migration uses FOLL_DUMP which should return EFAULT on
> >>> zero pages (no_page_table()).
> >>>
> >>>         err = PTR_ERR(page);
> >>>         if (IS_ERR(page))
> >>>                 goto out;
> >>>
> >>> therefore bails out from add_page_for_migration and store_status should
> >>> store that value. There shouldn't be any EPERM on the way.
> >>>
> >>
> >> Yes, I print the the return value and confirmed the
> >> add_page_for_migration()​
> >> do right things for zero page. and after store_status(...) the status saves
> >> -EFAULT.
> >> So I did the change above.
> >
> > OK, I guess I knnow what is going on. I must be overwriting the status
> > on the way out by
> >
> > out_flush:
> > 	/* Make sure we do not overwrite the existing error */
> > 	err1 = do_move_pages_to_node(mm, &pagelist, current_node);
> > 	if (!err1)
> > 		err1 = store_status(status, start, current_node, i - start);
> >
> > This error handling is rather fragile and I was quite unhappy about it
> > at the time I was developing it. I have to remember all the details why
> > I've done it that way but I would bet my hat this is it. More on this
> > tomorrow.
> 
> Hi Michal and Li,
> 
> The problem is that the variable start is not set properly after store_status(),
> like the "start = i;" after the first store_status().
> 
> The following patch should fix the problem (it has passed all move_pages test cases from ltp
> on my machine):
> 
> diff --git a/mm/migrate.c b/mm/migrate.c
> index f65dd69e1fd1..32afa4723e7f 100644
> --- a/mm/migrate.c
> +++ b/mm/migrate.c
> @@ -1619,6 +1619,8 @@ static int do_pages_move(struct mm_struct *mm, nodemask_t task_nodes,
>                         if (err)
>                                 goto out;
>                 }
> +               /* Move to next page (i+1), after we have saved page status (until i) */
> +               start = i + 1;
>                 current_node = NUMA_NO_NODE;
>         }
>  out_flush:
> 
> Feel free to check it by yourselves.

Yes, you are right. I never update start if the last page in the range
fails and so we overwrite the whole [start, i] range. I wish the code
wasn't that ugly and subtle but considering how we can fail in different
ways and that we want to batch as much as possible I do not see an easy
way.

Care to send the patch? I would just drop the comment.

Thanks!
Michal Hocko April 18, 2018, 9:19 a.m. UTC | #7
On Wed 18-04-18 11:07:22, Michal Hocko wrote:
> On Tue 17-04-18 16:09:33, Zi Yan wrote:
[...]
> > diff --git a/mm/migrate.c b/mm/migrate.c
> > index f65dd69e1fd1..32afa4723e7f 100644
> > --- a/mm/migrate.c
> > +++ b/mm/migrate.c
> > @@ -1619,6 +1619,8 @@ static int do_pages_move(struct mm_struct *mm, nodemask_t task_nodes,
> >                         if (err)
> >                                 goto out;
> >                 }
> > +               /* Move to next page (i+1), after we have saved page status (until i) */
> > +               start = i + 1;
> >                 current_node = NUMA_NO_NODE;
> >         }
> >  out_flush:
> > 
> > Feel free to check it by yourselves.
> 
> Yes, you are right. I never update start if the last page in the range
> fails and so we overwrite the whole [start, i] range. I wish the code
> wasn't that ugly and subtle but considering how we can fail in different
> ways and that we want to batch as much as possible I do not see an easy
> way.
> 
> Care to send the patch? I would just drop the comment.

Hmm, thinking about it some more. An alternative would be to check for
list_empty on the page list. It is a bit larger diff but maybe that
would be tiny bit cleaner because there is simply no point to call
do_move_pages_to_node on an empty list in the first place.
---
diff --git a/mm/migrate.c b/mm/migrate.c
index 507cf9ba21bf..46f93b9ba724 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -1634,12 +1634,14 @@ static int do_pages_move(struct mm_struct *mm, nodemask_t task_nodes,
 		current_node = NUMA_NO_NODE;
 	}
 out_flush:
-	/* Make sure we do not overwrite the existing error */
-	err1 = do_move_pages_to_node(mm, &pagelist, current_node);
-	if (!err1)
-		err1 = store_status(status, start, current_node, i - start);
-	if (!err)
-		err = err1;
+	if (!list_empty(&pagelist)) {
+		/* Make sure we do not overwrite the existing error */
+		err1 = do_move_pages_to_node(mm, &pagelist, current_node);
+		if (!err1)
+			err1 = store_status(status, start, current_node, i - start);
+		if (!err)
+			err = err1;
+	}
 out:
 	return err;
 }
Li Wang April 18, 2018, 10:39 a.m. UTC | #8
On Wed, Apr 18, 2018 at 5:19 PM, Michal Hocko <mhocko@suse.com> wrote:

> On Wed 18-04-18 11:07:22, Michal Hocko wrote:
> > On Tue 17-04-18 16:09:33, Zi Yan wrote:
> [...]
> > > diff --git a/mm/migrate.c b/mm/migrate.c
> > > index f65dd69e1fd1..32afa4723e7f 100644
> > > --- a/mm/migrate.c
> > > +++ b/mm/migrate.c
> > > @@ -1619,6 +1619,8 @@ static int do_pages_move(struct mm_struct *mm,
> nodemask_t task_nodes,
> > >                         if (err)
> > >                                 goto out;
> > >                 }
> > > +               /* Move to next page (i+1), after we have saved page
> status (until i) */
> > > +               start = i + 1;
> > >                 current_node = NUMA_NO_NODE;
> > >         }
> > >  out_flush:
> > >
> > > Feel free to check it by yourselves.
> >
> > Yes, you are right. I never update start if the last page in the range
> > fails and so we overwrite the whole [start, i] range. I wish the code
> > wasn't that ugly and subtle but considering how we can fail in different
> > ways and that we want to batch as much as possible I do not see an easy
> > way.
> >
> > Care to send the patch? I would just drop the comment.
>
> Hmm, thinking about it some more. An alternative would be to check for
> list_empty on the page list. It is a bit larger diff but maybe that
> would be tiny bit cleaner because there is simply no point to call
> do_move_pages_to_node on an empty list in the first place.
>

​Hi Michal, Zi

I tried your patch separately, both of them works fine to me.
Michal Hocko April 18, 2018, 11:29 a.m. UTC | #9
On Wed 18-04-18 18:39:19, Li Wang wrote:
> On Wed, Apr 18, 2018 at 5:19 PM, Michal Hocko <mhocko@suse.com> wrote:
> 
> > On Wed 18-04-18 11:07:22, Michal Hocko wrote:
> > > On Tue 17-04-18 16:09:33, Zi Yan wrote:
> > [...]
> > > > diff --git a/mm/migrate.c b/mm/migrate.c
> > > > index f65dd69e1fd1..32afa4723e7f 100644
> > > > --- a/mm/migrate.c
> > > > +++ b/mm/migrate.c
> > > > @@ -1619,6 +1619,8 @@ static int do_pages_move(struct mm_struct *mm,
> > nodemask_t task_nodes,
> > > >                         if (err)
> > > >                                 goto out;
> > > >                 }
> > > > +               /* Move to next page (i+1), after we have saved page
> > status (until i) */
> > > > +               start = i + 1;
> > > >                 current_node = NUMA_NO_NODE;
> > > >         }
> > > >  out_flush:
> > > >
> > > > Feel free to check it by yourselves.
> > >
> > > Yes, you are right. I never update start if the last page in the range
> > > fails and so we overwrite the whole [start, i] range. I wish the code
> > > wasn't that ugly and subtle but considering how we can fail in different
> > > ways and that we want to batch as much as possible I do not see an easy
> > > way.
> > >
> > > Care to send the patch? I would just drop the comment.
> >
> > Hmm, thinking about it some more. An alternative would be to check for
> > list_empty on the page list. It is a bit larger diff but maybe that
> > would be tiny bit cleaner because there is simply no point to call
> > do_move_pages_to_node on an empty list in the first place.
> >
> 
> ​Hi Michal, Zi
> 
> I tried your patch separately, both of them works fine to me.

Thanks for retesting! Do you plan to post a patch with the changelog or
should I do it?
Li Wang April 18, 2018, 11:46 a.m. UTC | #10
On Wed, Apr 18, 2018, 19:29 Michal Hocko <mhocko@suse.com> wrote:

> On Wed 18-04-18 18:39:19, Li Wang wrote:
> > On Wed, Apr 18, 2018 at 5:19 PM, Michal Hocko <mhocko@suse.com> wrote:
> >
> > > On Wed 18-04-18 11:07:22, Michal Hocko wrote:
> > > > On Tue 17-04-18 16:09:33, Zi Yan wrote:
> > > [...]
> > > > > diff --git a/mm/migrate.c b/mm/migrate.c
> > > > > index f65dd69e1fd1..32afa4723e7f 100644
> > > > > --- a/mm/migrate.c
> > > > > +++ b/mm/migrate.c
> > > > > @@ -1619,6 +1619,8 @@ static int do_pages_move(struct mm_struct
> *mm,
> > > nodemask_t task_nodes,
> > > > >                         if (err)
> > > > >                                 goto out;
> > > > >                 }
> > > > > +               /* Move to next page (i+1), after we have saved
> page
> > > status (until i) */
> > > > > +               start = i + 1;
> > > > >                 current_node = NUMA_NO_NODE;
> > > > >         }
> > > > >  out_flush:
> > > > >
> > > > > Feel free to check it by yourselves.
> > > >
> > > > Yes, you are right. I never update start if the last page in the
> range
> > > > fails and so we overwrite the whole [start, i] range. I wish the code
> > > > wasn't that ugly and subtle but considering how we can fail in
> different
> > > > ways and that we want to batch as much as possible I do not see an
> easy
> > > > way.
> > > >
> > > > Care to send the patch? I would just drop the comment.
> > >
> > > Hmm, thinking about it some more. An alternative would be to check for
> > > list_empty on the page list. It is a bit larger diff but maybe that
> > > would be tiny bit cleaner because there is simply no point to call
> > > do_move_pages_to_node on an empty list in the first place.
> > >
> >
> > ​Hi Michal, Zi
> >
> > I tried your patch separately, both of them works fine to me.
>
> Thanks for retesting! Do you plan to post a patch with the changelog or
> should I do it?
>

You better.

Li Wang
<div dir="auto"><div><br><br><div class="gmail_quote"><div dir="ltr">On Wed, Apr 18, 2018, 19:29 Michal Hocko &lt;<a href="mailto:mhocko@suse.com" target="_blank" rel="noreferrer">mhocko@suse.com</a>&gt; wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">On Wed 18-04-18 18:39:19, Li Wang wrote:<br>
&gt; On Wed, Apr 18, 2018 at 5:19 PM, Michal Hocko &lt;<a href="mailto:mhocko@suse.com" rel="noreferrer noreferrer" target="_blank">mhocko@suse.com</a>&gt; wrote:<br>
&gt; <br>
&gt; &gt; On Wed 18-04-18 11:07:22, Michal Hocko wrote:<br>
&gt; &gt; &gt; On Tue 17-04-18 16:09:33, Zi Yan wrote:<br>
&gt; &gt; [...]<br>
&gt; &gt; &gt; &gt; diff --git a/mm/migrate.c b/mm/migrate.c<br>
&gt; &gt; &gt; &gt; index f65dd69e1fd1..32afa4723e7f 100644<br>
&gt; &gt; &gt; &gt; --- a/mm/migrate.c<br>
&gt; &gt; &gt; &gt; +++ b/mm/migrate.c<br>
&gt; &gt; &gt; &gt; @@ -1619,6 +1619,8 @@ static int do_pages_move(struct mm_struct *mm,<br>
&gt; &gt; nodemask_t task_nodes,<br>
&gt; &gt; &gt; &gt;                         if (err)<br>
&gt; &gt; &gt; &gt;                                 goto out;<br>
&gt; &gt; &gt; &gt;                 }<br>
&gt; &gt; &gt; &gt; +               /* Move to next page (i+1), after we have saved page<br>
&gt; &gt; status (until i) */<br>
&gt; &gt; &gt; &gt; +               start = i + 1;<br>
&gt; &gt; &gt; &gt;                 current_node = NUMA_NO_NODE;<br>
&gt; &gt; &gt; &gt;         }<br>
&gt; &gt; &gt; &gt;  out_flush:<br>
&gt; &gt; &gt; &gt;<br>
&gt; &gt; &gt; &gt; Feel free to check it by yourselves.<br>
&gt; &gt; &gt;<br>
&gt; &gt; &gt; Yes, you are right. I never update start if the last page in the range<br>
&gt; &gt; &gt; fails and so we overwrite the whole [start, i] range. I wish the code<br>
&gt; &gt; &gt; wasn&#39;t that ugly and subtle but considering how we can fail in different<br>
&gt; &gt; &gt; ways and that we want to batch as much as possible I do not see an easy<br>
&gt; &gt; &gt; way.<br>
&gt; &gt; &gt;<br>
&gt; &gt; &gt; Care to send the patch? I would just drop the comment.<br>
&gt; &gt;<br>
&gt; &gt; Hmm, thinking about it some more. An alternative would be to check for<br>
&gt; &gt; list_empty on the page list. It is a bit larger diff but maybe that<br>
&gt; &gt; would be tiny bit cleaner because there is simply no point to call<br>
&gt; &gt; do_move_pages_to_node on an empty list in the first place.<br>
&gt; &gt;<br>
&gt; <br>
&gt; ​Hi Michal, Zi<br>
&gt; <br>
&gt; I tried your patch separately, both of them works fine to me.<br>
<br>
Thanks for retesting! Do you plan to post a patch with the changelog or<br>
should I do it?<br></blockquote></div></div><div dir="auto"><br></div><div dir="auto">You better.</div><div dir="auto"><br></div><div dir="auto">Li Wang</div></div>

Patch
diff mbox series

diff --git a/mm/migrate.c b/mm/migrate.c
index f65dd69..2b315fc 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -1608,7 +1608,7 @@  static int do_pages_move(struct mm_struct *mm, nodemask_t task_nodes,
 			continue;
 
 		err = store_status(status, i, err, 1);
-		if (err)
+		if (!err)
 			goto out_flush;
 
 		err = do_move_pages_to_node(mm, &pagelist, current_node);