hugetlbfs: fix hugetlb page migration/fault race causing SIGBUS

Message ID 20190808000533.7701-1-mike.kravetz@oracle.com
State Not Applicable

Commit Message

Mike Kravetz Aug. 8, 2019, 12:05 a.m. UTC
Li Wang discovered that LTP/move_page12 V2 sometimes triggers SIGBUS
in the kernel-v5.2.3 testing.  This is caused by a race between hugetlb
page migration and page fault.

If a hugetlb page can not be allocated to satisfy a page fault, the task
is sent SIGBUS.  This is normal hugetlbfs behavior.  A hugetlb fault
mutex exists to prevent two tasks from trying to instantiate the same
page.  This protects against the situation where there is only one
hugetlb page, and both tasks would try to allocate.  Without the mutex,
one would fail and SIGBUS even though the other fault would be successful.

There is a similar race between hugetlb page migration and fault.
Migration code will allocate a page for the target of the migration.
It will then unmap the original page from all page tables.  It does
this unmap by first clearing the pte and then writing a migration
entry.  The page table lock is held for the duration of this clear and
write operation.  However, the beginning of the hugetlb page fault
code optimistically checks the pte without taking the page table lock.
If clear (as it can be during the migration unmap operation), a hugetlb
page allocation is attempted to satisfy the fault.  Note that the page
which will eventually satisfy this fault was already allocated by the
migration code.  However, the allocation within the fault path could
fail which would result in the task incorrectly being sent SIGBUS.
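
The interleaving described above can be replayed in a small user-space model. This is only a sketch, not kernel code: the pte is reduced to a three-state enum, the hugetlb pool to a counter, and every name in it (struct model, fault_unfixed, replay_race and so on) is hypothetical. The empty pool is an assumption chosen to make the allocation fail.

```c
/* User-space model of the race; not kernel code. All names are hypothetical. */
#include <assert.h>
#include <stdbool.h>

enum pte_state { PTE_NONE, PTE_PRESENT, PTE_MIGRATION };
enum fault_result { FAULT_HANDLED, FAULT_SIGBUS };

struct model {
    enum pte_state pte;   /* the single huge pte being raced on */
    int free_huge_pages;  /* pages left in the modeled hugetlb pool */
};

/* Migration unmap, step 1: clear the pte (page table lock held). */
static void migrate_clear_pte(struct model *m)
{
    assert(m->pte == PTE_PRESENT);
    m->pte = PTE_NONE;
}

/* Migration unmap, step 2: write the migration entry (lock still held). */
static void migrate_write_entry(struct model *m)
{
    assert(m->pte == PTE_NONE);
    m->pte = PTE_MIGRATION;
}

/* Pool allocation: fails when the pool is empty. */
static bool alloc_huge_page_model(struct model *m)
{
    if (m->free_huge_pages == 0)
        return false;
    m->free_huge_pages--;
    return true;
}

/* Pre-fix fault path: lockless pte check, then allocate, error on failure. */
static enum fault_result fault_unfixed(struct model *m)
{
    if (m->pte != PTE_NONE)
        return FAULT_HANDLED;   /* simplification: other paths deal with it */
    if (!alloc_huge_page_model(m))
        return FAULT_SIGBUS;    /* spurious when migration is mid-unmap */
    m->pte = PTE_PRESENT;       /* simplification of instantiating the page */
    return FAULT_HANDLED;
}

/* Replays the bad interleaving: the fault runs between the two unmap steps. */
static enum fault_result replay_race(void)
{
    /* Assumption: the pool is empty while the migration is in flight. */
    struct model m = { .pte = PTE_PRESENT, .free_huge_pages = 0 };
    enum fault_result r;

    migrate_clear_pte(&m);
    r = fault_unfixed(&m);      /* sees PTE_NONE without the lock */
    migrate_write_entry(&m);
    return r;
}
```

In this model replay_race() ends with the task being signalled even though the pte holds a migration entry a moment later, which is the incorrect SIGBUS reported by the test.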

Ideally, we could take the hugetlb fault mutex in the migration code
when modifying the page tables.  However, locks must be taken in the
order of hugetlb fault mutex, page lock, page table lock.  This would
require significant rework of the migration code.  Instead, the issue
is addressed in the hugetlb fault code.  After failing to allocate a
huge page, take the page table lock and check for huge_pte_none before
returning an error.  This is the same check that must be made further
in the code even if page allocation is successful.
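
The recheck can be sketched in the same kind of user-space model. Again this is an illustration under assumptions rather than the kernel implementation: the page table lock is a boolean flag, the two threads are replayed sequentially, and all names (alloc_failed_path, FAULT_RETRY and the rest) are hypothetical. FAULT_RETRY stands in for the patch setting ret = 0, after which the access is expected to fault again and find the migration entry.

```c
/* User-space model of the recheck; not kernel code. All names are hypothetical. */
#include <assert.h>
#include <stdbool.h>

enum pte_state { PTE_NONE, PTE_PRESENT, PTE_MIGRATION };
enum fault_result { FAULT_RETRY, FAULT_SIGBUS };

struct model {
    enum pte_state pte;  /* the huge pte the fault is for */
    bool ptl_held;       /* models the page table lock */
};

static void ptl_lock(struct model *m)
{
    assert(!m->ptl_held);  /* a real lock would block here instead */
    m->ptl_held = true;
}

static void ptl_unlock(struct model *m)
{
    assert(m->ptl_held);
    m->ptl_held = false;
}

/* Error path taken after a failed allocation, modeled on the patch. */
static enum fault_result alloc_failed_path(struct model *m)
{
    enum fault_result r = FAULT_SIGBUS;

    ptl_lock(m);
    if (m->pte != PTE_NONE)
        r = FAULT_RETRY;   /* the patch sets ret = 0 in this case */
    ptl_unlock(m);
    return r;
}

/* Scenario 1: the fault raced with a migration unmap. */
static enum fault_result recheck_during_migration(void)
{
    struct model m = { .pte = PTE_PRESENT, .ptl_held = false };

    ptl_lock(&m);            /* migration takes the lock */
    m.pte = PTE_NONE;        /* clear: the lockless check in the fault sees this */
    /* fault path: allocation fails; it would now wait for the lock */
    m.pte = PTE_MIGRATION;   /* migration writes its entry */
    ptl_unlock(&m);          /* only now can the fault path take the lock */

    return alloc_failed_path(&m);
}

/* Scenario 2: no migration in flight, the pte really is empty. */
static enum fault_result recheck_truly_empty(void)
{
    struct model m = { .pte = PTE_NONE, .ptl_held = false };

    return alloc_failed_path(&m);
}
```

Because the clear and the write happen under one lock hold, a recheck made under that lock can observe either the original pte or the migration entry but never the transient cleared state, so the error is returned only when the pte is genuinely empty.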

Reported-by: Li Wang <liwang@redhat.com>
Fixes: 290408d4a250 ("hugetlb: hugepage migration core")
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Tested-by: Li Wang <liwang@redhat.com>
---
 mm/hugetlb.c | 19 +++++++++++++++++++
 1 file changed, 19 insertions(+)

Comments

Naoya Horiguchi Aug. 8, 2019, 3:36 a.m. UTC | #1
On Wed, Aug 07, 2019 at 05:05:33PM -0700, Mike Kravetz wrote:
> Li Wang discovered that LTP/move_page12 V2 sometimes triggers SIGBUS
> in the kernel-v5.2.3 testing.  This is caused by a race between hugetlb
> page migration and page fault.
> 
> If a hugetlb page can not be allocated to satisfy a page fault, the task
> is sent SIGBUS.  This is normal hugetlbfs behavior.  A hugetlb fault
> mutex exists to prevent two tasks from trying to instantiate the same
> page.  This protects against the situation where there is only one
> hugetlb page, and both tasks would try to allocate.  Without the mutex,
> one would fail and SIGBUS even though the other fault would be successful.
> 
> There is a similar race between hugetlb page migration and fault.
> Migration code will allocate a page for the target of the migration.
> It will then unmap the original page from all page tables.  It does
> this unmap by first clearing the pte and then writing a migration
> entry.  The page table lock is held for the duration of this clear and
> write operation.  However, the beginnings of the hugetlb page fault
> code optimistically checks the pte without taking the page table lock.
> If clear (as it can be during the migration unmap operation), a hugetlb
> page allocation is attempted to satisfy the fault.  Note that the page
> which will eventually satisfy this fault was already allocated by the
> migration code.  However, the allocation within the fault path could
> fail which would result in the task incorrectly being sent SIGBUS.
> 
> Ideally, we could take the hugetlb fault mutex in the migration code
> when modifying the page tables.  However, locks must be taken in the
> order of hugetlb fault mutex, page lock, page table lock.  This would
> require significant rework of the migration code.  Instead, the issue
> is addressed in the hugetlb fault code.  After failing to allocate a
> huge page, take the page table lock and check for huge_pte_none before
> returning an error.  This is the same check that must be made further
> in the code even if page allocation is successful.
> 
> Reported-by: Li Wang <liwang@redhat.com>
> Fixes: 290408d4a250 ("hugetlb: hugepage migration core")
> Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
> Tested-by: Li Wang <liwang@redhat.com>

Thanks for the work and nice description.

Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>

> ---
>  mm/hugetlb.c | 19 +++++++++++++++++++
>  1 file changed, 19 insertions(+)
> 
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index ede7e7f5d1ab..6d7296dd11b8 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -3856,6 +3856,25 @@ static vm_fault_t hugetlb_no_page(struct mm_struct *mm,
>  
>  		page = alloc_huge_page(vma, haddr, 0);
>  		if (IS_ERR(page)) {
> +			/*
> +			 * Returning error will result in faulting task being
> +			 * sent SIGBUS.  The hugetlb fault mutex prevents two
> +			 * tasks from racing to fault in the same page which
> +			 * could result in false unable to allocate errors.
> +			 * Page migration does not take the fault mutex, but
> +			 * does a clear then write of pte's under page table
> +			 * lock.  Page fault code could race with migration,
> +			 * notice the clear pte and try to allocate a page
> +			 * here.  Before returning error, get ptl and make
> +			 * sure there really is no pte entry.
> +			 */
> +			ptl = huge_pte_lock(h, mm, ptep);
> +			if (!huge_pte_none(huge_ptep_get(ptep))) {
> +				ret = 0;
> +				spin_unlock(ptl);
> +				goto out;
> +			}
> +			spin_unlock(ptl);
>  			ret = vmf_error(PTR_ERR(page));
>  			goto out;
>  		}
> -- 
> 2.20.1
> 
>
Michal Hocko Aug. 8, 2019, 7:46 a.m. UTC | #2
On Wed 07-08-19 17:05:33, Mike Kravetz wrote:
> Li Wang discovered that LTP/move_page12 V2 sometimes triggers SIGBUS
> in the kernel-v5.2.3 testing.  This is caused by a race between hugetlb
> page migration and page fault.
> 
> If a hugetlb page can not be allocated to satisfy a page fault, the task
> is sent SIGBUS.  This is normal hugetlbfs behavior.  A hugetlb fault
> mutex exists to prevent two tasks from trying to instantiate the same
> page.  This protects against the situation where there is only one
> hugetlb page, and both tasks would try to allocate.  Without the mutex,
> one would fail and SIGBUS even though the other fault would be successful.
> 
> There is a similar race between hugetlb page migration and fault.
> Migration code will allocate a page for the target of the migration.
> It will then unmap the original page from all page tables.  It does
> this unmap by first clearing the pte and then writing a migration
> entry.  The page table lock is held for the duration of this clear and
> write operation.  However, the beginnings of the hugetlb page fault
> code optimistically checks the pte without taking the page table lock.
> If clear (as it can be during the migration unmap operation), a hugetlb
> page allocation is attempted to satisfy the fault.  Note that the page
> which will eventually satisfy this fault was already allocated by the
> migration code.  However, the allocation within the fault path could
> fail which would result in the task incorrectly being sent SIGBUS.
> 
> Ideally, we could take the hugetlb fault mutex in the migration code
> when modifying the page tables.  However, locks must be taken in the
> order of hugetlb fault mutex, page lock, page table lock.  This would
> require significant rework of the migration code.  Instead, the issue
> is addressed in the hugetlb fault code.  After failing to allocate a
> huge page, take the page table lock and check for huge_pte_none before
> returning an error.  This is the same check that must be made further
> in the code even if page allocation is successful.
> 
> Reported-by: Li Wang <liwang@redhat.com>
> Fixes: 290408d4a250 ("hugetlb: hugepage migration core")
> Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
> Tested-by: Li Wang <liwang@redhat.com>

Acked-by: Michal Hocko <mhocko@suse.com>

Thanks!

> ---
>  mm/hugetlb.c | 19 +++++++++++++++++++
>  1 file changed, 19 insertions(+)
> 
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index ede7e7f5d1ab..6d7296dd11b8 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -3856,6 +3856,25 @@ static vm_fault_t hugetlb_no_page(struct mm_struct *mm,
>  
>  		page = alloc_huge_page(vma, haddr, 0);
>  		if (IS_ERR(page)) {
> +			/*
> +			 * Returning error will result in faulting task being
> +			 * sent SIGBUS.  The hugetlb fault mutex prevents two
> +			 * tasks from racing to fault in the same page which
> +			 * could result in false unable to allocate errors.
> +			 * Page migration does not take the fault mutex, but
> +			 * does a clear then write of pte's under page table
> +			 * lock.  Page fault code could race with migration,
> +			 * notice the clear pte and try to allocate a page
> +			 * here.  Before returning error, get ptl and make
> +			 * sure there really is no pte entry.
> +			 */
> +			ptl = huge_pte_lock(h, mm, ptep);
> +			if (!huge_pte_none(huge_ptep_get(ptep))) {
> +				ret = 0;
> +				spin_unlock(ptl);
> +				goto out;
> +			}
> +			spin_unlock(ptl);
>  			ret = vmf_error(PTR_ERR(page));
>  			goto out;
>  		}
> -- 
> 2.20.1
Michal Hocko Aug. 8, 2019, 7:47 a.m. UTC | #3
On Thu 08-08-19 09:46:07, Michal Hocko wrote:
> On Wed 07-08-19 17:05:33, Mike Kravetz wrote:
> > Li Wang discovered that LTP/move_page12 V2 sometimes triggers SIGBUS
> > in the kernel-v5.2.3 testing.  This is caused by a race between hugetlb
> > page migration and page fault.
> > 
> > If a hugetlb page can not be allocated to satisfy a page fault, the task
> > is sent SIGBUS.  This is normal hugetlbfs behavior.  A hugetlb fault
> > mutex exists to prevent two tasks from trying to instantiate the same
> > page.  This protects against the situation where there is only one
> > hugetlb page, and both tasks would try to allocate.  Without the mutex,
> > one would fail and SIGBUS even though the other fault would be successful.
> > 
> > There is a similar race between hugetlb page migration and fault.
> > Migration code will allocate a page for the target of the migration.
> > It will then unmap the original page from all page tables.  It does
> > this unmap by first clearing the pte and then writing a migration
> > entry.  The page table lock is held for the duration of this clear and
> > write operation.  However, the beginnings of the hugetlb page fault
> > code optimistically checks the pte without taking the page table lock.
> > If clear (as it can be during the migration unmap operation), a hugetlb
> > page allocation is attempted to satisfy the fault.  Note that the page
> > which will eventually satisfy this fault was already allocated by the
> > migration code.  However, the allocation within the fault path could
> > fail which would result in the task incorrectly being sent SIGBUS.
> > 
> > Ideally, we could take the hugetlb fault mutex in the migration code
> > when modifying the page tables.  However, locks must be taken in the
> > order of hugetlb fault mutex, page lock, page table lock.  This would
> > require significant rework of the migration code.  Instead, the issue
> > is addressed in the hugetlb fault code.  After failing to allocate a
> > huge page, take the page table lock and check for huge_pte_none before
> > returning an error.  This is the same check that must be made further
> > in the code even if page allocation is successful.
> > 
> > Reported-by: Li Wang <liwang@redhat.com>
> > Fixes: 290408d4a250 ("hugetlb: hugepage migration core")
> > Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
> > Tested-by: Li Wang <liwang@redhat.com>
> 
> Acked-by: Michal Hocko <mhocko@suse.com>

Btw. is this worth marking for stable? I haven't seen it triggering
anywhere but artificial tests. On the other hand the patch is quite
straightforward so it shouldn't hurt in general.
Mike Kravetz Aug. 8, 2019, 4:55 p.m. UTC | #4
On 8/8/19 12:47 AM, Michal Hocko wrote:
> On Thu 08-08-19 09:46:07, Michal Hocko wrote:
>> On Wed 07-08-19 17:05:33, Mike Kravetz wrote:
>>> Li Wang discovered that LTP/move_page12 V2 sometimes triggers SIGBUS
>>> in the kernel-v5.2.3 testing.  This is caused by a race between hugetlb
>>> page migration and page fault.
<snip>
>>> Reported-by: Li Wang <liwang@redhat.com>
>>> Fixes: 290408d4a250 ("hugetlb: hugepage migration core")
>>> Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
>>> Tested-by: Li Wang <liwang@redhat.com>
>>
>> Acked-by: Michal Hocko <mhocko@suse.com>
> 
> Btw. is this worth marking for stable? I haven't seen it triggering
> anywhere but artificial tests. On the other hand the patch is quite
> straightforward so it shouldn't hurt in general.

I don't think this really is material for stable.  I added the tag as the
stable AI logic seems to pick up patches whether marked for stable or not.
For example, here is one I explicitly said did not need to go to stable.

https://lkml.org/lkml/2019/6/1/165

Ironic to find that commit message in a stable backport.

I'm happy to drop the Fixes tag.

Andrew, can you drop the tag?  Or would you like me to resend?
Michal Hocko Aug. 8, 2019, 6:53 p.m. UTC | #5
On Thu 08-08-19 09:55:45, Mike Kravetz wrote:
> On 8/8/19 12:47 AM, Michal Hocko wrote:
> > On Thu 08-08-19 09:46:07, Michal Hocko wrote:
> >> On Wed 07-08-19 17:05:33, Mike Kravetz wrote:
> >>> Li Wang discovered that LTP/move_page12 V2 sometimes triggers SIGBUS
> >>> in the kernel-v5.2.3 testing.  This is caused by a race between hugetlb
> >>> page migration and page fault.
> <snip>
> >>> Reported-by: Li Wang <liwang@redhat.com>
> >>> Fixes: 290408d4a250 ("hugetlb: hugepage migration core")
> >>> Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
> >>> Tested-by: Li Wang <liwang@redhat.com>
> >>
> >> Acked-by: Michal Hocko <mhocko@suse.com>
> > 
> > Btw. is this worth marking for stable? I haven't seen it triggering
> > anywhere but artificial tests. On the other hand the patch is quite
> > straightforward so it shouldn't hurt in general.
> 
> I don't think this really is material for stable.  I added the tag as the
> stable AI logic seems to pick up patches whether marked for stable or not.
> For example, here is one I explicitly said did not need to go to stable.
> 
> https://lkml.org/lkml/2019/6/1/165
> 
> Ironic to find that commit message in a stable backport.
> 
> I'm happy to drop the Fixes tag.

No, please do not drop the Fixes tag. That is very _useful_
information. If the stable tree maintainers want to abuse it so be it.
They are responsible for their tree. If you do not think this is
stable material then fine with me. I tend to agree but that doesn't mean
that we should obfuscate Fixes.
Andrew Morton Aug. 8, 2019, 11:39 p.m. UTC | #6
On Thu, 8 Aug 2019 20:53:13 +0200 Michal Hocko <mhocko@kernel.org> wrote:

> > https://lkml.org/lkml/2019/6/1/165
> > 
> > Ironic to find that commit message in a stable backport.
> > 
> > I'm happy to drop the Fixes tag.
> 
> No, please do not drop the Fixes tag. That is a very _useful_
> information. If the stable tree maintainers want to abuse it so be it.
> They are responsible for their tree. If you do not think this is a
> stable material then fine with me. I tend to agree but that doesn't mean
> that we should obfuscate Fixes.

Well, we're responsible for stable trees too.  And yes, I find it
irksome.  I/we evaluate *every* fix for -stable inclusion and if I/we
decide "no" then dangit, it shouldn't be backported.

Maybe we should introduce the Fixes-no-stable: tag.  That should get
their attention.
Michal Hocko Aug. 9, 2019, 6:46 a.m. UTC | #7
On Thu 08-08-19 16:39:28, Andrew Morton wrote:
> On Thu, 8 Aug 2019 20:53:13 +0200 Michal Hocko <mhocko@kernel.org> wrote:
> 
> > > https://lkml.org/lkml/2019/6/1/165
> > > 
> > > Ironic to find that commit message in a stable backport.
> > > 
> > > I'm happy to drop the Fixes tag.
> > 
> > No, please do not drop the Fixes tag. That is a very _useful_
> > information. If the stable tree maintainers want to abuse it so be it.
> > They are responsible for their tree. If you do not think this is a
> > stable material then fine with me. I tend to agree but that doesn't mean
> > that we should obfuscate Fixes.
> 
> Well, we're responsible for stable trees too.

We are only responsible as far as to consider whether a patch is worth
backporting to stable trees and my view is that we are doing that
responsibly. What stable maintainers do in the end is their business.

> And yes, I find it
> irksome.  I/we evaluate *every* fix for -stable inclusion and if I/we
> decide "no" then dangit, it should be backported.

Exactly

> Maybe we should introduce the Fixes-no-stable: tag.  That should get
> their attention.

No please, Fixes shouldn't really be tied to any stable tree rules. It
is a very useful indication of which commit has introduced the bug/problem
or whatever that the patch follows up on. We in SUSE are using this tag
to evaluate potential fixes as stable is not reliable. We could live
with Fixes-no-stable or whatever other name but does it really make
sense to complicate the existing state when stable maintainers are doing
whatever they want anyway? Does a tag like that prevent AI from selecting
a patch? I am not really convinced.
Andrew Morton Aug. 9, 2019, 10:17 p.m. UTC | #8
On Fri, 9 Aug 2019 08:46:33 +0200 Michal Hocko <mhocko@kernel.org> wrote:

> > Maybe we should introduce the Fixes-no-stable: tag.  That should get
> > their attention.
> 
> No please, Fixes shouldn't be really tight to any stable tree rules. It
> is a very useful indication of which commit has introduced bug/problem
> or whatever that the patch follows up to. We in Suse are using this tag
> to evaluate potential fixes as the stable is not reliable. We could live
> with Fixes-no-stable or whatever other name but does it really makes
> sense to complicate the existing state when stable maintainers are doing
> whatever they want anyway? Does a tag like that force AI from selecting
> a patch? I am not really convinced.

It should work if we ask stable tree maintainers not to backport
such patches.

Sasha, please don't backport patches which are marked Fixes-no-stable:
and which lack a cc:stable tag.
Sasha Levin Aug. 11, 2019, 11:46 p.m. UTC | #9
On Fri, Aug 09, 2019 at 03:17:18PM -0700, Andrew Morton wrote:
>On Fri, 9 Aug 2019 08:46:33 +0200 Michal Hocko <mhocko@kernel.org> wrote:
>
>> > Maybe we should introduce the Fixes-no-stable: tag.  That should get
>> > their attention.
>>
>> No please, Fixes shouldn't be really tight to any stable tree rules. It
>> is a very useful indication of which commit has introduced bug/problem
>> or whatever that the patch follows up to. We in Suse are using this tag
>> to evaluate potential fixes as the stable is not reliable. We could live
>> with Fixes-no-stable or whatever other name but does it really makes
>> sense to complicate the existing state when stable maintainers are doing
>> whatever they want anyway? Does a tag like that force AI from selecting
>> a patch? I am not really convinced.
>
>It should work if we ask stable trees maintainers not to backport
>such patches.
>
>Sasha, please don't backport patches which are marked Fixes-no-stable:
>and which lack a cc:stable tag.

I'll add it to my filter, thank you!

--
Thanks,
Sasha
Michal Hocko Aug. 12, 2019, 8:45 a.m. UTC | #10
On Sun 11-08-19 19:46:14, Sasha Levin wrote:
> On Fri, Aug 09, 2019 at 03:17:18PM -0700, Andrew Morton wrote:
> > On Fri, 9 Aug 2019 08:46:33 +0200 Michal Hocko <mhocko@kernel.org> wrote:
> > 
> > > > Maybe we should introduce the Fixes-no-stable: tag.  That should get
> > > > their attention.
> > > 
> > > No please, Fixes shouldn't be really tight to any stable tree rules. It
> > > is a very useful indication of which commit has introduced bug/problem
> > > or whatever that the patch follows up to. We in Suse are using this tag
> > > to evaluate potential fixes as the stable is not reliable. We could live
> > > with Fixes-no-stable or whatever other name but does it really makes
> > > sense to complicate the existing state when stable maintainers are doing
> > > whatever they want anyway? Does a tag like that force AI from selecting
> > > a patch? I am not really convinced.
> > 
> > It should work if we ask stable trees maintainers not to backport
> > such patches.
> > 
> > Sasha, please don't backport patches which are marked Fixes-no-stable:
> > and which lack a cc:stable tag.
> 
> I'll add it to my filter, thank you!

I would really prefer to stick with Fixes: tag and stable only picking
up cc: stable patches. I really hate to see workarounds for sensible
workflows (marking the Fixes) just because we are trying to hide
something from stable maintainers. Seriously, if stable maintainers have
a different idea about what should be backported, it is their call. They
are the ones to deal with regressions and the backporting effort in
those cases of disagreement.
Vlastimil Babka Aug. 12, 2019, 1:14 p.m. UTC | #11
On 8/12/19 10:45 AM, Michal Hocko wrote:
> On Sun 11-08-19 19:46:14, Sasha Levin wrote:
>> On Fri, Aug 09, 2019 at 03:17:18PM -0700, Andrew Morton wrote:
>>> On Fri, 9 Aug 2019 08:46:33 +0200 Michal Hocko <mhocko@kernel.org> wrote:
>>>
>>> It should work if we ask stable trees maintainers not to backport
>>> such patches.
>>>
>>> Sasha, please don't backport patches which are marked Fixes-no-stable:
>>> and which lack a cc:stable tag.
>>
>> I'll add it to my filter, thank you!
> 
> I would really prefer to stick with Fixes: tag and stable only picking
> up cc: stable patches. I really hate to see workarounds for sensible
> workflows (marking the Fixes) just because we are trying to hide
> something from stable maintainers. Seriously, if stable maintainers have
> a different idea about what should be backported, it is their call. They
> are the ones to deal with regressions and the backporting effort in
> those cases of disagreement.

+1 on not replacing Fixes: tag with some other name, as there might be
automation (not just at SUSE) relying on it.
As a compromise, we can use something else to convey the "maintainers
really don't recommend a stable backport", that Sasha can add to his filter.
Perhaps counter-intuitively, but it could even look like this:
Cc: stable@vger.kernel.org # not recommended at all by maintainer
Michal Hocko Aug. 12, 2019, 1:22 p.m. UTC | #12
On Mon 12-08-19 15:14:12, Vlastimil Babka wrote:
> On 8/12/19 10:45 AM, Michal Hocko wrote:
> > On Sun 11-08-19 19:46:14, Sasha Levin wrote:
> >> On Fri, Aug 09, 2019 at 03:17:18PM -0700, Andrew Morton wrote:
> >>> On Fri, 9 Aug 2019 08:46:33 +0200 Michal Hocko <mhocko@kernel.org> wrote:
> >>>
> >>> It should work if we ask stable trees maintainers not to backport
> >>> such patches.
> >>>
> >>> Sasha, please don't backport patches which are marked Fixes-no-stable:
> >>> and which lack a cc:stable tag.
> >>
> >> I'll add it to my filter, thank you!
> > 
> > I would really prefer to stick with Fixes: tag and stable only picking
> > up cc: stable patches. I really hate to see workarounds for sensible
> > workflows (marking the Fixes) just because we are trying to hide
> > something from stable maintainers. Seriously, if stable maintainers have
> > a different idea about what should be backported, it is their call. They
> > are the ones to deal with regressions and the backporting effort in
> > those cases of disagreement.
> 
> +1 on not replacing Fixes: tag with some other name, as there might be
> automation (not just at SUSE) relying on it.
> As a compromise, we can use something else to convey the "maintainers
> really don't recommend a stable backport", that Sasha can add to his filter.
> Perhaps counter-intuitively, but it could even look like this:
> Cc: stable@vger.kernel.org # not recommended at all by maintainer

I thought that absence of the Cc is the indication :P. Anyway, I really
do not understand why we should bother, really. I have tried to explain
that stable maintainers should follow Cc: stable because we bother to
consider that part and we are quite good at not forgetting (Thanks
Andrew for persistence). Sasha has told me that MM will be blacklisted
from the automagic selection procedure.

I really do not know what more we can do and I really have strong doubts
we should care at all. What is the worst that can happen? A potentially
dangerous commit gets to the stable tree and that blows up? That is
something inherent when relying on AI and an
applies-it-must-be-ok workflow.
Sasha Levin Aug. 12, 2019, 3:33 p.m. UTC | #13
On Mon, Aug 12, 2019 at 03:22:26PM +0200, Michal Hocko wrote:
>On Mon 12-08-19 15:14:12, Vlastimil Babka wrote:
>> On 8/12/19 10:45 AM, Michal Hocko wrote:
>> > On Sun 11-08-19 19:46:14, Sasha Levin wrote:
>> >> On Fri, Aug 09, 2019 at 03:17:18PM -0700, Andrew Morton wrote:
>> >>> On Fri, 9 Aug 2019 08:46:33 +0200 Michal Hocko <mhocko@kernel.org> wrote:
>> >>>
>> >>> It should work if we ask stable trees maintainers not to backport
>> >>> such patches.
>> >>>
>> >>> Sasha, please don't backport patches which are marked Fixes-no-stable:
>> >>> and which lack a cc:stable tag.
>> >>
>> >> I'll add it to my filter, thank you!
>> >
>> > I would really prefer to stick with Fixes: tag and stable only picking
>> > up cc: stable patches. I really hate to see workarounds for sensible
>> > workflows (marking the Fixes) just because we are trying to hide
>> > something from stable maintainers. Seriously, if stable maintainers have
>> > a different idea about what should be backported, it is their call. They
>> > are the ones to deal with regressions and the backporting effort in
>> > those cases of disagreement.
>>
>> +1 on not replacing Fixes: tag with some other name, as there might be
>> automation (not just at SUSE) relying on it.
>> As a compromise, we can use something else to convey the "maintainers
>> really don't recommend a stable backport", that Sasha can add to his filter.
>> Perhaps counter-intuitively, but it could even look like this:
>> Cc: stable@vger.kernel.org # not recommended at all by maintainer
>
>I thought that absence of the Cc is the indication :P. Anyway, I really
>do not understand why should we bother, really. I have tried to explain
>that stable maintainers should follow Cc: stable because we bother to
>consider that part and we are quite good at not forgetting (Thanks
>Andrew for persistence). Sasha has told me that MM will be blacklisted
>from automagic selection procedure.

I'll add mm/ to the ignore list for AUTOSEL patches.

>I really do not know much more we can do and I really have strong doubts
>we should care at all. What is the worst that can happen? A potentially
>dangerous commit gets to the stable tree and that blows up? That is
>something that is something inherent when relying on AI and
>aplies-it-must-be-ok workflow.

The issue I see here is that there's no way to validate the patches that
go in mm/. I'd happily run whatever test suite you use to validate these
patches, but it doesn't exist.

I can run xfstests for fs/, I can run blktests for block/, I can run
kselftests for quite a few other subsystems in the kernel. What can I
run for mm?

I'd be happy to run whatever validation/regression suite for mm/ you
would suggest.

I've heard the "every patch is a snowflake" story quite a few times, and
I understand that most mm/ patches are complex, but we agree that
manually testing every patch isn't scalable, right? Even for patches
that mm/ tags for stable, are they actually tested on every stable tree?
How is it different from the "applies-it-must-be-ok workflow"?

--
Thanks,
Sasha
Qian Cai Aug. 12, 2019, 4:09 p.m. UTC | #14
On Mon, 2019-08-12 at 11:33 -0400, Sasha Levin wrote:
> On Mon, Aug 12, 2019 at 03:22:26PM +0200, Michal Hocko wrote:
> > On Mon 12-08-19 15:14:12, Vlastimil Babka wrote:
> > > On 8/12/19 10:45 AM, Michal Hocko wrote:
> > > > On Sun 11-08-19 19:46:14, Sasha Levin wrote:
> > > > > On Fri, Aug 09, 2019 at 03:17:18PM -0700, Andrew Morton wrote:
> > > > > > On Fri, 9 Aug 2019 08:46:33 +0200 Michal Hocko <mhocko@kernel.org>
> > > > > > wrote:
> > > > > > 
> > > > > > It should work if we ask stable trees maintainers not to backport
> > > > > > such patches.
> > > > > > 
> > > > > > Sasha, please don't backport patches which are marked Fixes-no-
> > > > > > stable:
> > > > > > and which lack a cc:stable tag.
> > > > > 
> > > > > I'll add it to my filter, thank you!
> > > > 
> > > > I would really prefer to stick with Fixes: tag and stable only picking
> > > > up cc: stable patches. I really hate to see workarounds for sensible
> > > > workflows (marking the Fixes) just because we are trying to hide
> > > > something from stable maintainers. Seriously, if stable maintainers have
> > > > a different idea about what should be backported, it is their call. They
> > > > are the ones to deal with regressions and the backporting effort in
> > > > those cases of disagreement.
> > > 
> > > +1 on not replacing Fixes: tag with some other name, as there might be
> > > automation (not just at SUSE) relying on it.
> > > As a compromise, we can use something else to convey the "maintainers
> > > really don't recommend a stable backport", that Sasha can add to his
> > > filter.
> > > Perhaps counter-intuitively, but it could even look like this:
> > > Cc: stable@vger.kernel.org # not recommended at all by maintainer
> > 
> > I thought that absence of the Cc is the indication :P. Anyway, I really
> > do not understand why should we bother, really. I have tried to explain
> > that stable maintainers should follow Cc: stable because we bother to
> > consider that part and we are quite good at not forgetting (Thanks
> > Andrew for persistence). Sasha has told me that MM will be blacklisted
> > from automagic selection procedure.
> 
> I'll add mm/ to the ignore list for AUTOSEL patches.
> 
> > I really do not know much more we can do and I really have strong doubts
> > we should care at all. What is the worst that can happen? A potentially
> > dangerous commit gets to the stable tree and that blows up? That is
> > something that is something inherent when relying on AI and
> > aplies-it-must-be-ok workflow.
> 
> The issue I see here is that there's no way to validate the patches that
> go in mm/. I'd happily run whatever test suite you use to validate these
> patches, but it doesn't exist.
> 
> I can run xfstests for fs/, I can run blktests for block/, I can run
> kselftests for quite a few other subsystems in the kernel. What can I
> run for mm?

I have been running this for linux-next daily.

https://github.com/cailca/linux-mm

"test.sh" will give you some ideas. The .config files have almost all the MM
debugging options turned on, but they might need some modifications to run on
QEMU etc.

"compile.sh" will have some additional MM debugging command-line options, and
some keywords to catch compilation warnings for MM.

> 
> I'd be happy to run whatever validation/regression suite for mm/ you
> would suggest.
> 
> I've heard the "every patch is a snowflake" story quite a few times, and
> I understand that most mm/ patches are complex, but we agree that
> manually testing every patch isn't scalable, right? Even for patches
> that mm/ tags for stable, are they actually tested on every stable tree?
> How is it different from the "aplies-it-must-be-ok workflow"?
> 
> --
> Thanks,
> Sasha
>
Andrew Morton Aug. 12, 2019, 9:37 p.m. UTC | #15
On Mon, 12 Aug 2019 11:33:26 -0400 Sasha Levin <sashal@kernel.org> wrote:

> >I thought that absence of the Cc is the indication :P. Anyway, I really
> >do not understand why should we bother, really. I have tried to explain
> >that stable maintainers should follow Cc: stable because we bother to
> >consider that part and we are quite good at not forgetting (Thanks
> >Andrew for persistence). Sasha has told me that MM will be blacklisted
> >from automagic selection procedure.
> 
> I'll add mm/ to the ignore list for AUTOSEL patches.

Thanks, I'm OK with that.  I'll undo Fixes-no-stable.

Although I'd prefer that "akpm" was ignored, rather than "./mm/". 
Plenty of "mm" patches don't touch mm/, such as drivers/base/memory.c,
include/linux/blah, fs/, etc.  And I am diligent about considering
-stable for all the other code I look after.

This doesn't mean that I'm correct all the time, by any means - I'd
like to hear about patches which autosel thinks should be backported
but which don't include the cc:stable tag.
Michal Hocko Aug. 13, 2019, 8:43 a.m. UTC | #16
On Mon 12-08-19 11:33:26, Sasha Levin wrote:
[...]
> I'd be happy to run whatever validation/regression suite for mm/ you
> would suggest.

You would have to develop one first and I am afraid that won't be really
simple and useful.

> I've heard the "every patch is a snowflake" story quite a few times, and
> I understand that most mm/ patches are complex, but we agree that
> manually testing every patch isn't scalable, right? Even for patches
> that mm/ tags for stable, are they actually tested on every stable tree?
> How is it different from the "aplies-it-must-be-ok workflow"?

There is a human brain put in to process each patch and make sure that
the change makes sense and we won't break any of the many workloads that
people care about. Even if you run your patch through mm tests, which is
by far the most comprehensive test suite I know of, we do regress from
time to time. We simply do not have realistic testing coverage because
workloads differ quite a lot and they are not really trivial to isolate
to a self-contained test case. A lot of functionality doesn't have a
direct interface to test for because it triggers when the system gets
into some state.

Ideal? Not at all and I am happy to hear some better ideas. Until then
we simply have to rely on gut feeling and understanding of the code
and experience from workloads we have seen in the past.

Patch

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index ede7e7f5d1ab..6d7296dd11b8 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -3856,6 +3856,25 @@  static vm_fault_t hugetlb_no_page(struct mm_struct *mm,
 
 		page = alloc_huge_page(vma, haddr, 0);
 		if (IS_ERR(page)) {
+			/*
+			 * Returning error will result in faulting task being
+			 * sent SIGBUS.  The hugetlb fault mutex prevents two
+			 * tasks from racing to fault in the same page which
+			 * could result in false unable to allocate errors.
+			 * Page migration does not take the fault mutex, but
+			 * does a clear then write of pte's under page table
+			 * lock.  Page fault code could race with migration,
+			 * notice the clear pte and try to allocate a page
+			 * here.  Before returning error, get ptl and make
+			 * sure there really is no pte entry.
+			 */
+			ptl = huge_pte_lock(h, mm, ptep);
+			if (!huge_pte_none(huge_ptep_get(ptep))) {
+				ret = 0;
+				spin_unlock(ptl);
+				goto out;
+			}
+			spin_unlock(ptl);
 			ret = vmf_error(PTR_ERR(page));
 			goto out;
 		}