diff mbox series

[v4,4/7] mm/x86: Make pud_leaf() only care about PSE bit

Message ID 20240807194812.819412-5-peterx@redhat.com (mailing list archive)
State Handled Elsewhere, archived
Headers show
Series mm/mprotect: Fix dax puds | expand

Commit Message

Peter Xu Aug. 7, 2024, 7:48 p.m. UTC
An entry should be reported as PUD leaf even if it's PROT_NONE, in which
case PRESENT bit isn't there. I hit bad pud without this when testing dax
1G on zapping a PROT_NONE PUD.

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: x86@kernel.org
Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
---
 arch/x86/include/asm/pgtable.h | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

Comments

Thomas Gleixner Aug. 7, 2024, 10:22 p.m. UTC | #1
On Wed, Aug 07 2024 at 15:48, Peter Xu wrote:
> An entry should be reported as PUD leaf even if it's PROT_NONE, in which
> case PRESENT bit isn't there. I hit bad pud without this when testing dax
> 1G on zapping a PROT_NONE PUD.

That does not qualify as a change log. What you hit is irrelevant unless
you explain the actual underlying problem. See Documentation/process/
including the TIP documentation.

> diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
> index e39311a89bf4..a2a3bd4c1bda 100644
> --- a/arch/x86/include/asm/pgtable.h
> +++ b/arch/x86/include/asm/pgtable.h
> @@ -1078,8 +1078,7 @@ static inline pmd_t *pud_pgtable(pud_t pud)
>  #define pud_leaf pud_leaf
>  static inline bool pud_leaf(pud_t pud)
>  {
> -	return (pud_val(pud) & (_PAGE_PSE | _PAGE_PRESENT)) ==
> -		(_PAGE_PSE | _PAGE_PRESENT);
> +	return pud_val(pud) & _PAGE_PSE;
>  }

And the changelog does not explain why this change is not affecting any
existing user of pud_leaf().

Thanks,

        tglx
Peter Xu Aug. 8, 2024, 2:54 p.m. UTC | #2
On Thu, Aug 08, 2024 at 12:22:38AM +0200, Thomas Gleixner wrote:
> On Wed, Aug 07 2024 at 15:48, Peter Xu wrote:
> > An entry should be reported as PUD leaf even if it's PROT_NONE, in which
> > case PRESENT bit isn't there. I hit bad pud without this when testing dax
> > 1G on zapping a PROT_NONE PUD.
> 
> That does not qualify as a change log. What you hit is irrelevant unless
> you explain the actual underlying problem. See Documentation/process/
> including the TIP documentation.

Firstly, thanks a lot for the reviews.

I thought the commit message explained exactly what is the underlying
problem, no?

The problem is even if PROT_NONE, as long as the PSE bit is set on the PUD
it should be treated as a PUD leaf.  Currently, the code will return
pud_leaf()==false for those PROT_NONE PUD entries, and IMHO that is wrong.
This patch wants to make it right.  I still think that's mostly what I put
there in the commit message..

Would you please suggest something so I can try to make it better,
otherwise?  Or it'll be helpful too if you could point out which part of
the two documentations I should reference.

> 
> > diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
> > index e39311a89bf4..a2a3bd4c1bda 100644
> > --- a/arch/x86/include/asm/pgtable.h
> > +++ b/arch/x86/include/asm/pgtable.h
> > @@ -1078,8 +1078,7 @@ static inline pmd_t *pud_pgtable(pud_t pud)
> >  #define pud_leaf pud_leaf
> >  static inline bool pud_leaf(pud_t pud)
> >  {
> > -	return (pud_val(pud) & (_PAGE_PSE | _PAGE_PRESENT)) ==
> > -		(_PAGE_PSE | _PAGE_PRESENT);
> > +	return pud_val(pud) & _PAGE_PSE;
> >  }
> 
> And the changelog does not explain why this change is not affecting any
> existing user of pud_leaf().

That's what I want to do: I want to affect them..

And IMHO it's mostly fine before because mprotect() is broken with 1g
anyway, and I guess nobody managed to populate any pud entry with PROT_NONE
on dax 1g before, and that's what this whole series is trying to fix.

Thanks,
Thomas Gleixner Aug. 9, 2024, 12:08 p.m. UTC | #3
On Thu, Aug 08 2024 at 10:54, Peter Xu wrote:
> On Thu, Aug 08, 2024 at 12:22:38AM +0200, Thomas Gleixner wrote:
>> On Wed, Aug 07 2024 at 15:48, Peter Xu wrote:
>> > An entry should be reported as PUD leaf even if it's PROT_NONE, in which
>> > case PRESENT bit isn't there. I hit bad pud without this when testing dax
>> > 1G on zapping a PROT_NONE PUD.
>> 
>> That does not qualify as a change log. What you hit is irrelevant unless
>> you explain the actual underlying problem. See Documentation/process/
>> including the TIP documentation.
>
> Firstly, thanks a lot for the reviews.
>
> I thought the commit message explained exactly what is the underlying
> problem, no?
>
> The problem is even if PROT_NONE, as long as the PSE bit is set on the PUD
> it should be treated as a PUD leaf.

Sure. But why should it be treated so.

> Currently, the code will return pud_leaf()==false for those PROT_NONE
> PUD entries, and IMHO that is wrong.

Your humble opinion is fine, but hardly a technical argument.

> This patch wants to make it right.  I still think that's mostly what I put
> there in the commit message..
>
> Would you please suggest something so I can try to make it better,
> otherwise?  Or it'll be helpful too if you could point out which part of
> the two documentations I should reference.

https://www.kernel.org/doc/html/latest/process/maintainer-tip.html#changelog

  A good structure is to explain the context, the problem and the
  solution in separate paragraphs and this order

>> > diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
>> > index e39311a89bf4..a2a3bd4c1bda 100644
>> > --- a/arch/x86/include/asm/pgtable.h
>> > +++ b/arch/x86/include/asm/pgtable.h
>> > @@ -1078,8 +1078,7 @@ static inline pmd_t *pud_pgtable(pud_t pud)
>> >  #define pud_leaf pud_leaf
>> >  static inline bool pud_leaf(pud_t pud)
>> >  {
>> > -	return (pud_val(pud) & (_PAGE_PSE | _PAGE_PRESENT)) ==
>> > -		(_PAGE_PSE | _PAGE_PRESENT);
>> > +	return pud_val(pud) & _PAGE_PSE;
>> >  }
>> 
>> And the changelog does not explain why this change is not affecting any
>> existing user of pud_leaf().
>
> That's what I want to do: I want to affect them..

Fine. Just the change log does not tell me what the actual problem is
("I hit something" does not qualify) and "should be reported" is not
helpful either as it does not explain anything

> And IMHO it's mostly fine before because mprotect() is broken with 1g
> anyway, and I guess nobody managed to populate any pud entry with PROT_NONE
> on dax 1g before, and that's what this whole series is trying to fix.

Again your humble opinion matters, but technical facts and analysis
matter way more.

Thanks,

        tglx
Peter Xu Aug. 9, 2024, 1:53 p.m. UTC | #4
On Fri, Aug 09, 2024 at 02:08:28PM +0200, Thomas Gleixner wrote:
> https://www.kernel.org/doc/html/latest/process/maintainer-tip.html#changelog
> 
>   A good structure is to explain the context, the problem and the
>   solution in separate paragraphs and this order

I'll try to follow, thanks.

[...]

> > And IMHO it's mostly fine before because mprotect() is broken with 1g
> > anyway, and I guess nobody managed to populate any pud entry with PROT_NONE
> > on dax 1g before, and that's what this whole series is trying to fix.
> 
> Again your humble opinion matters, but technical facts and analysis
> matter way more.

All the rest comments in the reply were about "why it's a PUD leaf".  So
let me reply in one shot.

Referring to pXd_leaf() documentation in linux/pgtable.h:

/*
 * pXd_leaf() is the API to check whether a pgtable entry is a huge page
 * mapping.  It should work globally across all archs, without any
 * dependency on CONFIG_* options.  For architectures that do not support
 * huge mappings on specific levels, below fallbacks will be used.
 *
 * A leaf pgtable entry should always imply the following:
 *
 * - It is a "present" entry.  IOW, before using this API, please check it
 *   with pXd_present() first. NOTE: it may not always mean the "present
 *   bit" is set.  For example, PROT_NONE entries are always "present".
 *
 * - It should _never_ be a swap entry of any type.  Above "present" check
 *   should have guarded this, but let's be crystal clear on this.
 *
 * - It should contain a huge PFN, which points to a huge page larger than
 *   PAGE_SIZE of the platform.  The PFN format isn't important here.
 *
 * - It should cover all kinds of huge mappings (e.g., pXd_trans_huge(),
 *   pXd_devmap(), or hugetlb mappings).
 */

It explicitly stated that PROT_NONE should be treated as a present entry,
and also a leaf. The document is for pXd_leaf(), so it should cover puds
too.  In this specific case of the zapping path, it's only possible it's a
DAX 1G thp.  But pud_leaf() should work for hugetlb too, for example, when
PROT_NONE applied on top of a 1G hugetlb with PSE set.

Unfortunately, I wrote this document in 64078b3d57.. so that's also another
way of saying "my humble opinion".. it's just nobody disagreed so far, and
please shoot if you see any issue out of it.

IOW, I don't think we must define pXd_leaf() like this - we used to define
pXd_leaf() to cover migration entries at least on x86, for example. But per
my own past mm experience, the current way is the right thing to do to make
everything much easier and less error prone.  Sorry, I can't get rid of
"IMHO" here.

Another example of "we can define pXd_leaf() in other ways" is I believe
for PPC 8XX series it's possible to make special use of pmd_leaf() by
allowing pmd_leaf() to return true even for two continuous pte pgtable
covering 8MB memory.  But that will be an extremely special use of
pmd_leaf() even if it comes, maybe worth an update above when it happens,
and it'll only be used by powerpc not any other arch.  It won't happen if
we want to drop 8MB support, though.

So in short, I don't think there's a 100% correct "technical" answer of
saying "how to define pxx_leaf()"; things just keep evolving, and "humble
opinions" keeps coming with some good reasons.  Hope that answers the
question to some extent.

Taking all things into account, I wonder whether below enriched commit
message would get me closer to your ACK on this, trying to follow the rule
you referenced on the order of how context/problem/solution should be
ordered and addressed:

    When working on mprotect() on 1G dax entries, I hit an zap bad pud
    error when zapping a huge pud that is with PROT_NONE permission.

    Here the problem is x86's pud_leaf() requires both PRESENT and PSE bits
    set to report a pud entry as a leaf, but that doesn't look right, as
    it's not following the pXd_leaf() definition that we stick with so far,
    where PROT_NONE entries should be reported as leaves.

    To fix it, change x86's pud_leaf() implementation to only check against
    PSE bit to report a leaf, irrelevant of whether PRESENT bit is set.

Thanks,
diff mbox series

Patch

diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index e39311a89bf4..a2a3bd4c1bda 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -1078,8 +1078,7 @@  static inline pmd_t *pud_pgtable(pud_t pud)
 #define pud_leaf pud_leaf
 static inline bool pud_leaf(pud_t pud)
 {
-	return (pud_val(pud) & (_PAGE_PSE | _PAGE_PRESENT)) ==
-		(_PAGE_PSE | _PAGE_PRESENT);
+	return pud_val(pud) & _PAGE_PSE;
 }
 
 static inline int pud_bad(pud_t pud)