[RFC,00/13] mm/treewide: Remove pXd_huge() API

Message ID 20240306104147.193052-1-peterx@redhat.com

Message

Peter Xu March 6, 2024, 10:41 a.m. UTC
From: Peter Xu <peterx@redhat.com>

[based on akpm/mm-unstable latest commit a7f399ae964e]

In previous work [1], we removed the pXd_large() API, which is arch
specific.  This patchset further removes the hugetlb pXd_huge() API.

Hugetlb has never been special in how it creates huge mappings compared
with other huge mappings.  Having a standalone API just to detect such
pgtable entries is more or less redundant, especially now that the
pXd_leaf() API set is available with or without CONFIG_HUGETLB_PAGE.

Looking at this problem also exposed a few issues: we don't have a clear
definition of the *_huge() API variants.  This patchset starts by cleaning
up these issues, then converts all *_huge() users to *_leaf(), then drops
all the *_huge() code.

On x86/sparc, swap entries are reported as "true" by pXd_huge(), while on
all the other archs they are reported as "false".  This inconsistency is
addressed in patches 1-5, in which I suspect patch 1 can be seen as a bug
fix, but I'll leave that to the HMM experts to decide.
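
The difference can be pictured with a small userspace model.  This is only
a sketch: the type, the bit positions, and every model_* name below are
invented for illustration and are not the kernel's definitions, and the
"strict" variant is merely one way such a check could exclude swap
entries, not necessarily what patches 4-5 do.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative flag bits for a made-up entry format (assumption). */
#define MODEL_PRESENT (UINT64_C(1) << 0) /* entry is valid for the MMU */
#define MODEL_HUGE    (UINT64_C(1) << 7) /* entry maps a huge page directly */

typedef struct { uint64_t val; } model_pmd_t;

static inline bool model_pmd_none(model_pmd_t pmd)
{
	return pmd.val == 0;
}

/*
 * Loose check: true for every non-empty entry that is not a plain
 * "present, not huge" pointer to a lower-level table.  A swap entry
 * is non-empty and has MODEL_PRESENT clear, so it passes as well.
 */
static inline bool model_pmd_huge_loose(model_pmd_t pmd)
{
	return !model_pmd_none(pmd) &&
	       (pmd.val & (MODEL_PRESENT | MODEL_HUGE)) != MODEL_PRESENT;
}

/* Strict check: true only for a present entry with the huge bit set. */
static inline bool model_pmd_huge_strict(model_pmd_t pmd)
{
	return (pmd.val & (MODEL_PRESENT | MODEL_HUGE)) ==
	       (MODEL_PRESENT | MODEL_HUGE);
}

/* Runs the four interesting entry kinds through both checks. */
static inline void model_selfcheck(void)
{
	model_pmd_t none  = { 0 };
	model_pmd_t table = { MODEL_PRESENT | UINT64_C(0x5000) };
	model_pmd_t huge  = { MODEL_PRESENT | MODEL_HUGE | UINT64_C(0x200000) };
	model_pmd_t swap  = { UINT64_C(0x1234) << 12 }; /* non-empty, not present */

	assert(!model_pmd_huge_loose(none)  && !model_pmd_huge_strict(none));
	assert(!model_pmd_huge_loose(table) && !model_pmd_huge_strict(table));
	assert(model_pmd_huge_loose(huge)   && model_pmd_huge_strict(huge));
	/* The two checks disagree only on the swap entry. */
	assert(model_pmd_huge_loose(swap)   && !model_pmd_huge_strict(swap));
}
```

In this model the two predicates agree on empty entries, table pointers,
and present huge mappings; the swap entry is the only case where a caller
would see a different answer depending on which definition it got.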

Besides, three archs (arm, arm64, powerpc) have slightly different
definitions for the *_huge() vs. *_leaf() variants.  I tackled them
separately so that it'll be easier for arch experts to chime in when
necessary.  This part is done in patches 6-9.

The final patches 10-13 do the actual removal.  Since *_leaf() will be the
ultimate API in the future, and since there seems to be quite some
confusion about how the *_huge() APIs should be defined, the last patch
adds a rich comment for the *_leaf() API set to define it properly and
avoid future misuse.  Hopefully that will also help new archs start
supporting huge mappings and avoid traps (such as swap entries or
PROT_NONE entry checks).
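
To make the two traps concrete, here is a hedged userspace sketch of a leaf
check for an imaginary arch.  Everything in it is an assumption made for
illustration: the sk_* names, the bit layout, and the reading that a
PROT_NONE huge mapping should still count as a leaf while a swap entry
should not.  It is not the comment or code added by this series.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Made-up bit layout for an imaginary arch (assumption). */
#define SK_HW_PRESENT (UINT64_C(1) << 0) /* hardware may walk the entry */
#define SK_LEAF       (UINT64_C(1) << 7) /* entry maps a huge page */
#define SK_PROTNONE   (UINT64_C(1) << 8) /* mapped, but access is revoked */

typedef struct { uint64_t val; } sk_pmd_t;

/*
 * Software notion of "present": a PROT_NONE mapping has the hardware
 * bit cleared but is still a mapping.  This model assumes swap entries
 * keep both bits clear.
 */
static inline bool sk_pmd_present(sk_pmd_t pmd)
{
	return (pmd.val & (SK_HW_PRESENT | SK_PROTNONE)) != 0;
}

/* Trap-prone variant: looks only at the leaf bit. */
static inline bool sk_pmd_leaf_naive(sk_pmd_t pmd)
{
	return (pmd.val & SK_LEAF) != 0;
}

/* Variant that handles both traps: present in software terms, and a leaf. */
static inline bool sk_pmd_leaf(sk_pmd_t pmd)
{
	return sk_pmd_present(pmd) && (pmd.val & SK_LEAF) != 0;
}

/* Checks the swap-entry and PROT_NONE cases against both variants. */
static inline void sk_selfcheck(void)
{
	/* Swap entry whose payload happens to overlap the leaf bit. */
	sk_pmd_t swap     = { SK_LEAF | (UINT64_C(0x42) << 12) };
	sk_pmd_t protnone = { SK_PROTNONE | SK_LEAF | UINT64_C(0x200000) };
	sk_pmd_t table    = { SK_HW_PRESENT | UINT64_C(0x5000) };

	assert(sk_pmd_leaf_naive(swap));   /* trap: swap entry seen as leaf */
	assert(!sk_pmd_leaf(swap));
	assert(sk_pmd_leaf(protnone));     /* PROT_NONE mapping is still a leaf */
	assert(!sk_pmd_leaf(table));
}
```

A check that tested only SK_HW_PRESENT instead of sk_pmd_present() would
fall into the opposite trap in this model and miss the PROT_NONE mapping.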

The whole series is only lightly tested on x86; as usual, I don't have the
capability to test all the archs that it touches.

Marking this series RFC as of now.

[1] https://lore.kernel.org/r/20240305043750.93762-1-peterx@redhat.com

Peter Xu (13):
  mm/hmm: Process pud swap entry without pud_huge()
  mm/gup: Cache p4d in follow_p4d_mask()
  mm/gup: Check p4d presence before going on
  mm/x86: Change pXd_huge() behavior to exclude swap entries
  mm/sparc: Change pXd_huge() behavior to exclude swap entries
  mm/arm: Use macros to define pmd/pud helpers
  mm/arm: Redefine pmd_huge() with pmd_leaf()
  mm/arm64: Merge pXd_huge() and pXd_leaf() definitions
  mm/powerpc: Redefine pXd_huge() with pXd_leaf()
  mm/gup: Merge pXd huge mapping checks
  mm/treewide: Replace pXd_huge() with pXd_leaf()
  mm/treewide: Remove pXd_huge()
  mm: Document pXd_leaf() API

 arch/arm/include/asm/pgtable-2level.h         |  4 +--
 arch/arm/include/asm/pgtable-3level-hwdef.h   |  1 +
 arch/arm/include/asm/pgtable-3level.h         |  6 ++--
 arch/arm/mm/Makefile                          |  1 -
 arch/arm/mm/hugetlbpage.c                     | 34 -------------------
 arch/arm64/include/asm/pgtable.h              |  6 +++-
 arch/arm64/mm/hugetlbpage.c                   | 18 ++--------
 arch/loongarch/mm/hugetlbpage.c               | 12 +------
 arch/mips/include/asm/pgtable-32.h            |  2 +-
 arch/mips/include/asm/pgtable-64.h            |  2 +-
 arch/mips/mm/hugetlbpage.c                    | 10 ------
 arch/mips/mm/tlb-r4k.c                        |  2 +-
 arch/parisc/mm/hugetlbpage.c                  | 11 ------
 .../include/asm/book3s/64/pgtable-4k.h        | 20 -----------
 .../include/asm/book3s/64/pgtable-64k.h       | 25 --------------
 arch/powerpc/include/asm/book3s/64/pgtable.h  |  3 --
 arch/powerpc/include/asm/nohash/pgtable.h     | 10 ------
 arch/powerpc/mm/pgtable_64.c                  |  6 ++--
 arch/riscv/mm/hugetlbpage.c                   | 10 ------
 arch/s390/mm/hugetlbpage.c                    | 10 ------
 arch/sh/mm/hugetlbpage.c                      | 10 ------
 arch/sparc/mm/hugetlbpage.c                   | 12 -------
 arch/x86/mm/hugetlbpage.c                     | 26 --------------
 arch/x86/mm/pgtable.c                         |  4 +--
 include/linux/hugetlb.h                       | 24 -------------
 include/linux/pgtable.h                       | 24 ++++++++++---
 mm/gup.c                                      | 24 ++++++-------
 mm/hmm.c                                      |  9 ++---
 mm/memory.c                                   |  2 +-
 29 files changed, 56 insertions(+), 272 deletions(-)
 delete mode 100644 arch/arm/mm/hugetlbpage.c

Comments

Christophe Leroy March 11, 2024, 9:58 a.m. UTC | #1
On 06/03/2024 at 11:41, peterx@redhat.com wrote:
> From: Peter Xu <peterx@redhat.com>
> 
> [based on akpm/mm-unstable latest commit a7f399ae964e]
> 
> In previous work [1], we removed the pXd_large() API, which is arch
> specific.  This patchset further removes the hugetlb pXd_huge() API.
> 
> Hugetlb was never special on creating huge mappings when compared with
> other huge mappings.  Having a standalone API just to detect such pgtable
> entries is more or less redundant, especially after the pXd_leaf() API set
> is introduced with/without CONFIG_HUGETLB_PAGE.
> 
> When looking at this problem, a few issues are also exposed that we don't
> have a clear definition of the *_huge() variance API.  This patchset
> started by cleaning these issues first, then replace all *_huge() users to
> use *_leaf(), then drop all *_huge() code.
> 
> On x86/sparc, swap entries will be reported "true" in pXd_huge(), while for
> all the rest archs they're reported "false" instead.  This part is done in
> patch 1-5, in which I suspect patch 1 can be seen as a bug fix, but I'll
> leave that to hmm experts to decide.
> 
> Besides, there are three archs (arm, arm64, powerpc) that have slightly
> different definitions between the *_huge() v.s. *_leaf() variances.  I
> tackled them separately so that it'll be easier for arch experts to chim in
> when necessary.  This part is done in patch 6-9.
> 
> The final patches 10-13 do the rest on the final removal, since *_leaf()
> will be the ultimate API in the future, and we seem to have quite some
> confusions on how *_huge() APIs can be defined, provide a rich comment for
> *_leaf() API set to define them properly to avoid future misuse, and
> hopefully that'll also help new archs to start support huge mappings and
> avoid traps (like either swap entries, or PROT_NONE entry checks).
> 
> The whole series is only lightly tested on x86, while as usual I don't have
> the capability to test all archs that it touches.
> 
> Marking this series RFC as of now.
> 
> [1] https://lore.kernel.org/r/20240305043750.93762-1-peterx@redhat.com
> 

Hi Peter, nice job cleaning up things around the _huge stuff.

One thing that might be worth looking at also at some point is the mess 
around pmd_clear_huge() and pud_clear_huge().

I tried to clean things up with commit c742199a014d ("mm/pgtable: add 
stubs for {pmd/pub}_{set/clear}_huge") but it was reverted because of 
arm64 by commit d8a719059b9d ("Revert "mm/pgtable: add stubs for 
{pmd/pub}_{set/clear}_huge"")

So now powerpc/8xx has to implement pmd_clear_huge() and
pud_clear_huge() although the 8xx page hierarchy only has 2 levels.

Christophe
Peter Xu March 12, 2024, 8:01 p.m. UTC | #2
Hi, Christophe,

On Mon, Mar 11, 2024 at 09:58:47AM +0000, Christophe Leroy wrote:
> Hi Peter, and nice job you are doing in cleaning up things around _huge 
> stuff.

Thanks.  I appreciate your help along the way on Power.

> 
> One thing that might be worth looking at also at some point is the mess 
> around pmd_clear_huge() and pud_clear_huge().
> 
> I tried to clean things up with commit c742199a014d ("mm/pgtable: add 
> stubs for {pmd/pub}_{set/clear}_huge") but it was reverted because of 
> arm64 by commit d8a719059b9d ("Revert "mm/pgtable: add stubs for 
> {pmd/pub}_{set/clear}_huge"")
> 
> So now powerpc/8xx has to implement pmd_clear_huge() and
> pud_clear_huge() although the 8xx page hierarchy only has 2 levels.

Those have so far been off my radar, as my focus right now is still more
on the hugetlbfs side of things, and kernel mappings are not yet directly
involved in hugetlbfs, even though they're still huge mappings.

It's a pity that it broke arm64 and got reverted, as it looks like a good
cleanup if it is at all possible.  I tend to agree with you that for 3lvl
it seems we should define pgd_huge*() instead of pud_huge*(), so it looks
like the only way to provide such a clean treewide API is to properly
define those APIs for aarch64, and to define different pud helpers for
3 and 4 levels.  But I confess I don't think I've fully digested all the
bits.

Thanks,