diff mbox series

[6/7] modules: Add CONFIG_ARCH_WANTS_MODULES_DATA_IN_VMALLOC

Message ID 848d857871f457f4df37e80fad468d615b237c24.1643015752.git.christophe.leroy@csgroup.eu (mailing list archive)
State Superseded
Headers show
Series Allocate module text and data separately | expand

Commit Message

Christophe Leroy Jan. 24, 2022, 9:22 a.m. UTC
Add CONFIG_ARCH_WANTS_MODULES_DATA_IN_VMALLOC to allow architectures
to request having modules data in vmalloc area instead of module area.

This is required on powerpc book3s/32 in order to set data non
executable, because it is not possible to set executability on page
basis, this is done per 256 Mbytes segments. The module area has exec
right, vmalloc area has noexec.

This can also be useful on other powerpc/32 in order to maximize the
chance of code being close enough to kernel core to avoid branch
trampolines.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Jason Wessel <jason.wessel@windriver.com>
Cc: Daniel Thompson <daniel.thompson@linaro.org>
Cc: Douglas Anderson <dianders@chromium.org>
---
 arch/Kconfig                |  6 +++
 include/linux/module.h      |  8 ++++
 kernel/debug/kdb/kdb_main.c | 10 ++++-
 kernel/module.c             | 73 ++++++++++++++++++++++++++++++++++++-
 4 files changed, 93 insertions(+), 4 deletions(-)

Comments

Doug Anderson Jan. 24, 2022, 9:43 p.m. UTC | #1
Hi,

On Mon, Jan 24, 2022 at 1:22 AM Christophe Leroy
<christophe.leroy@csgroup.eu> wrote:
>
> --- a/kernel/debug/kdb/kdb_main.c
> +++ b/kernel/debug/kdb/kdb_main.c
> @@ -2022,8 +2022,11 @@ static int kdb_lsmod(int argc, const char **argv)
>                 if (mod->state == MODULE_STATE_UNFORMED)
>                         continue;
>
> -               kdb_printf("%-20s%8u  0x%px ", mod->name,
> -                          mod->core_layout.size, (void *)mod);
> +               kdb_printf("%-20s%8u", mod->name, mod->core_layout.size);
> +#ifdef CONFIG_ARCH_WANTS_MODULES_DATA_IN_VMALLOC
> +               kdb_printf("/%8u  0x%px ", mod->data_layout.size);

Just counting percentages and arguments, it seems like something's
wrong in the above print statement.

-Doug
Christophe Leroy Jan. 25, 2022, 5:43 a.m. UTC | #2
Le 24/01/2022 à 22:43, Doug Anderson a écrit :
> Hi,
> 
> On Mon, Jan 24, 2022 at 1:22 AM Christophe Leroy
> <christophe.leroy@csgroup.eu> wrote:
>>
>> --- a/kernel/debug/kdb/kdb_main.c
>> +++ b/kernel/debug/kdb/kdb_main.c
>> @@ -2022,8 +2022,11 @@ static int kdb_lsmod(int argc, const char **argv)
>>                  if (mod->state == MODULE_STATE_UNFORMED)
>>                          continue;
>>
>> -               kdb_printf("%-20s%8u  0x%px ", mod->name,
>> -                          mod->core_layout.size, (void *)mod);
>> +               kdb_printf("%-20s%8u", mod->name, mod->core_layout.size);
>> +#ifdef CONFIG_ARCH_WANTS_MODULES_DATA_IN_VMALLOC
>> +               kdb_printf("/%8u  0x%px ", mod->data_layout.size);
> 
> Just counting percentages and arguments, it seems like something's
> wrong in the above print statement.
> 

Yes it seems, the build robot reported something here as well.

Thanks
Christophe
Luis Chamberlain Jan. 25, 2022, 9:10 p.m. UTC | #3
On Mon, Jan 24, 2022 at 09:22:34AM +0000, Christophe Leroy wrote:
> This can also be useful on other powerpc/32 in order to maximize the
> chance of code being close enough to kernel core to avoid branch
> trampolines.

Curious about all this branch trampoline talk. Do you have data to show
negative impact with things as-is?

Also, was powerpc/32 broken then without this? The commit log seems to
suggest so, but I don't think that's the case. How was this issue noticed?

Are there other future CPU families being planned where this is all true for
as well? Are they goin to be 32-bit as well?

  Luis
Christophe Leroy Jan. 26, 2022, 6:38 a.m. UTC | #4
Le 25/01/2022 à 22:10, Luis Chamberlain a écrit :
> On Mon, Jan 24, 2022 at 09:22:34AM +0000, Christophe Leroy wrote:
>> This can also be useful on other powerpc/32 in order to maximize the
>> chance of code being close enough to kernel core to avoid branch
>> trampolines.
> 
> Curious about all this branch trampoline talk. Do you have data to show
> negative impact with things as-is?

See 
https://github.com/linuxppc/linux/commit/2ec13df167040cd153c25c4d96d0ffc573ac4c40

Or 
https://github.com/linuxppc/linux/commit/7d485f647c1f4a6976264c90447fb0dbf07b111d


> 
> Also, was powerpc/32 broken then without this? The commit log seems to
> suggest so, but I don't think that's the case. How was this issue noticed?


Your question is related to the trampoline topic or the exec/noexec 
flagging ?

Regarding trampoline, everything is working OK. That's just cherry on 
the cake, when putting data away you can have more code closer to the 
kernel. But that would not have been a reason in itself for this series.

Regarding the exec/noexec discussion, it's a real issue. powerpc/32 
doesn't honor page exec flag, so when you select STRICT_MODULES_RWX and 
flag module data as no-exec, it remains executable. That's because 
powerpc/32 MMU doesn't have a per page exec flag but only a per 
256Mbytes segment exec flag.

Typical PPC32 layount:
0xf0000000-0xffffffff : VMALLOC AREA ==> NO EXEC
0xc0000000-0xefffffff : Linear kernel memory mapping
0xb0000000-0xbfffffff : MODULES AREA ==> EXEC
0x00000000-0xafffffff : User space ==> EXEC

So STRICT_MODULES_RWX is broken on some powerpc/32

> 
> Are there other future CPU families being planned where this is all true for
> as well? Are they goin to be 32-bit as well?

Future I don't know.

Regarding the trampoline stuff, I see at least the following existing 
architectures with a similar constraint:

ARM:

https://elixir.bootlin.com/linux/v5.16/source/arch/arm/include/asm/memory.h#L55

ARM even has a config item to allow trampolines or not. I might add the 
same to powerpc to reduce number of pages used by modules.

https://elixir.bootlin.com/linux/v5.16/source/arch/arm/Kconfig#L1514

NDS32 has the constraint

https://elixir.bootlin.com/linux/v5.16/source/arch/nds32/include/asm/memory.h#L41

NIOS2 has the constraint, allthough they handled it in a different way:

https://elixir.bootlin.com/linux/v5.16/source/arch/nios2/kernel/module.c#L30



Even ARM64 benefits from modules closer to kernel:

https://elixir.bootlin.com/linux/v5.16/source/arch/arm64/Kconfig#L1848


Another future opportunity with the ability to allocate module parts 
separately is the possibility to then use huge vmalloc mappings.

Today huge vmalloc mappings cannot be used for modules, see recent 
discussion at 
https://patchwork.ozlabs.org/project/linuxppc-dev/patch/20211227145903.187152-4-wangkefeng.wang@huawei.com/

Christophe
Miroslav Benes Jan. 27, 2022, 4:05 p.m. UTC | #5
> @@ -195,6 +208,9 @@ static void mod_tree_remove(struct module *mod)
>  {
>  	__mod_tree_remove(&mod->core_layout.mtn, &mod_tree);
>  	mod_tree_remove_init(mod);
> +#ifdef CONFIG_ARCH_WANTS_MODULES_DATA_IN_VMALLOC
> +	__mod_tree_remove(&mod->core_layout.mtn, &mod_data_tree);

s/core_layout/data_layout/ ?

> +#endif
>  }

Miroslav
Christophe Leroy Jan. 27, 2022, 6:04 p.m. UTC | #6
Le 27/01/2022 à 17:05, Miroslav Benes a écrit :
>> @@ -195,6 +208,9 @@ static void mod_tree_remove(struct module *mod)
>>   {
>>   	__mod_tree_remove(&mod->core_layout.mtn, &mod_tree);
>>   	mod_tree_remove_init(mod);
>> +#ifdef CONFIG_ARCH_WANTS_MODULES_DATA_IN_VMALLOC
>> +	__mod_tree_remove(&mod->core_layout.mtn, &mod_data_tree);
> 
> s/core_layout/data_layout/ ?

Oops, you are right. I should have awaited a few more hours before 
sending v2.

Thanks

Christophe
Luis Chamberlain Feb. 2, 2022, 11:34 p.m. UTC | #7
On Wed, Jan 26, 2022 at 06:38:30AM +0000, Christophe Leroy wrote:
> 
> 
> Le 25/01/2022 à 22:10, Luis Chamberlain a écrit :
> > On Mon, Jan 24, 2022 at 09:22:34AM +0000, Christophe Leroy wrote:
> >> This can also be useful on other powerpc/32 in order to maximize the
> >> chance of code being close enough to kernel core to avoid branch
> >> trampolines.
> > 
> > Curious about all this branch trampoline talk. Do you have data to show
> > negative impact with things as-is?
> 
> See 
> https://github.com/linuxppc/linux/commit/2ec13df167040cd153c25c4d96d0ffc573ac4c40
> 
> Or 
> https://github.com/linuxppc/linux/commit/7d485f647c1f4a6976264c90447fb0dbf07b111d


This was useful and fun to read, thanks.

> > Also, was powerpc/32 broken then without this? The commit log seems to
> > suggest so, but I don't think that's the case. How was this issue noticed?
> 
> 
> Your question is related to the trampoline topic or the exec/noexec 
> flagging ?
> 
> Regarding trampoline, everything is working OK. That's just cherry on 
> the cake, when putting data away you can have more code closer to the 
> kernel. But that would not have been a reason in itself for this series.
> 
> Regarding the exec/noexec discussion, it's a real issue. powerpc/32 
> doesn't honor page exec flag, so when you select STRICT_MODULES_RWX and 
> flag module data as no-exec, it remains executable. That's because 
> powerpc/32 MMU doesn't have a per page exec flag but only a per 
> 256Mbytes segment exec flag.
> 
> Typical PPC32 layount:
> 0xf0000000-0xffffffff : VMALLOC AREA ==> NO EXEC
> 0xc0000000-0xefffffff : Linear kernel memory mapping
> 0xb0000000-0xbfffffff : MODULES AREA ==> EXEC
> 0x00000000-0xafffffff : User space ==> EXEC
> 
> So STRICT_MODULES_RWX is broken on some powerpc/32

You know, this is the sort of information that I think would be
very useful for the commit log. Can you ammend?

> > 
> > Are there other future CPU families being planned where this is all true for
> > as well? Are they goin to be 32-bit as well?
> 
> Future I don't know.
> 
> Regarding the trampoline stuff, I see at least the following existing 
> architectures with a similar constraint:
> 
> ARM:
> 
> https://elixir.bootlin.com/linux/v5.16/source/arch/arm/include/asm/memory.h#L55
> 
> ARM even has a config item to allow trampolines or not. I might add the 
> same to powerpc to reduce number of pages used by modules.
> 
> https://elixir.bootlin.com/linux/v5.16/source/arch/arm/Kconfig#L1514
> 
> NDS32 has the constraint
> 
> https://elixir.bootlin.com/linux/v5.16/source/arch/nds32/include/asm/memory.h#L41
> 
> NIOS2 has the constraint, allthough they handled it in a different way:
> 
> https://elixir.bootlin.com/linux/v5.16/source/arch/nios2/kernel/module.c#L30
> 
> 
> 
> Even ARM64 benefits from modules closer to kernel:
> 
> https://elixir.bootlin.com/linux/v5.16/source/arch/arm64/Kconfig#L1848
> 
> 
> Another future opportunity with the ability to allocate module parts 
> separately is the possibility to then use huge vmalloc mappings.
> 
> Today huge vmalloc mappings cannot be used for modules, see recent 
> discussion at 
> https://patchwork.ozlabs.org/project/linuxppc-dev/patch/20211227145903.187152-4-wangkefeng.wang@huawei.com/

Alrighty, this is sufficient information, thanks!

  Luis
diff mbox series

Patch

diff --git a/arch/Kconfig b/arch/Kconfig
index 847fde3d22cd..ed6a5ab8796d 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -883,6 +883,12 @@  config MODULES_USE_ELF_REL
 	  Modules only use ELF REL relocations.  Modules with ELF RELA
 	  relocations will give an error.
 
+config ARCH_WANTS_MODULES_DATA_IN_VMALLOC
+	bool
+	help
+	  For architectures like powerpc/32 which have constraints on module
+	  allocation and need to allocate module data outside of module area.
+
 config HAVE_IRQ_EXIT_ON_IRQ_STACK
 	bool
 	help
diff --git a/include/linux/module.h b/include/linux/module.h
index fc7adb110a81..3d908bb7ed08 100644
--- a/include/linux/module.h
+++ b/include/linux/module.h
@@ -421,6 +421,9 @@  struct module {
 	/* Core layout: rbtree is accessed frequently, so keep together. */
 	struct module_layout core_layout __module_layout_align;
 	struct module_layout init_layout;
+#ifdef CONFIG_ARCH_WANTS_MODULES_DATA_IN_VMALLOC
+	struct module_layout data_layout;
+#endif
 
 	/* Arch-specific module values */
 	struct mod_arch_specific arch;
@@ -592,7 +595,12 @@  static inline bool within_module_layout(unsigned long addr,
 static inline bool within_module_core(unsigned long addr,
 				      const struct module *mod)
 {
+#ifdef CONFIG_ARCH_WANTS_MODULES_DATA_IN_VMALLOC
+	return within_module_layout(addr, &mod->core_layout) ||
+	       within_module_layout(addr, &mod->data_layout);
+#else
 	return within_module_layout(addr, &mod->core_layout);
+#endif
 }
 
 static inline bool within_module_init(unsigned long addr,
diff --git a/kernel/debug/kdb/kdb_main.c b/kernel/debug/kdb/kdb_main.c
index 0852a537dad4..b09e92f2c78d 100644
--- a/kernel/debug/kdb/kdb_main.c
+++ b/kernel/debug/kdb/kdb_main.c
@@ -2022,8 +2022,11 @@  static int kdb_lsmod(int argc, const char **argv)
 		if (mod->state == MODULE_STATE_UNFORMED)
 			continue;
 
-		kdb_printf("%-20s%8u  0x%px ", mod->name,
-			   mod->core_layout.size, (void *)mod);
+		kdb_printf("%-20s%8u", mod->name, mod->core_layout.size);
+#ifdef CONFIG_ARCH_WANTS_MODULES_DATA_IN_VMALLOC
+		kdb_printf("/%8u  0x%px ", mod->data_layout.size);
+#endif
+		kdb_printf("  0x%px ", (void *)mod);
 #ifdef CONFIG_MODULE_UNLOAD
 		kdb_printf("%4d ", module_refcount(mod));
 #endif
@@ -2034,6 +2037,9 @@  static int kdb_lsmod(int argc, const char **argv)
 		else
 			kdb_printf(" (Live)");
 		kdb_printf(" 0x%px", mod->core_layout.base);
+#ifdef CONFIG_ARCH_WANTS_MODULES_DATA_IN_VMALLOC
+		kdb_printf("/0x%px", mod->data_layout.base);
+#endif
 
 #ifdef CONFIG_MODULE_UNLOAD
 		{
diff --git a/kernel/module.c b/kernel/module.c
index de1a9de6a544..53486a65750e 100644
--- a/kernel/module.c
+++ b/kernel/module.c
@@ -81,7 +81,9 @@ 
 /* If this is set, the section belongs in the init part of the module */
 #define INIT_OFFSET_MASK (1UL << (BITS_PER_LONG-1))
 
+#ifndef CONFIG_ARCH_WANTS_MODULES_DATA_IN_VMALLOC
 #define	data_layout core_layout
+#endif
 
 /*
  * Mutex protects:
@@ -108,6 +110,12 @@  static struct mod_tree_root {
 	.addr_min = -1UL,
 };
 
+#ifdef CONFIG_ARCH_WANTS_MODULES_DATA_IN_VMALLOC
+static struct mod_tree_root mod_data_tree __cacheline_aligned = {
+	.addr_min = -1UL,
+};
+#endif
+
 #ifdef CONFIG_MODULES_TREE_LOOKUP
 
 /*
@@ -183,6 +191,11 @@  static void mod_tree_insert(struct module *mod)
 	__mod_tree_insert(&mod->core_layout.mtn, &mod_tree);
 	if (mod->init_layout.size)
 		__mod_tree_insert(&mod->init_layout.mtn, &mod_tree);
+
+#ifdef CONFIG_ARCH_WANTS_MODULES_DATA_IN_VMALLOC
+	mod->data_layout.mtn.mod = mod;
+	__mod_tree_insert(&mod->data_layout.mtn, &mod_data_tree);
+#endif
 }
 
 static void mod_tree_remove_init(struct module *mod)
@@ -195,6 +208,9 @@  static void mod_tree_remove(struct module *mod)
 {
 	__mod_tree_remove(&mod->core_layout.mtn, &mod_tree);
 	mod_tree_remove_init(mod);
+#ifdef CONFIG_ARCH_WANTS_MODULES_DATA_IN_VMALLOC
+	__mod_tree_remove(&mod->core_layout.mtn, &mod_data_tree);
+#endif
 }
 
 static struct module *mod_find(unsigned long addr, struct mod_tree_root *tree)
@@ -249,6 +265,9 @@  static void mod_update_bounds(struct module *mod)
 	__mod_update_bounds(mod->core_layout.base, mod->core_layout.size, &mod_tree);
 	if (mod->init_layout.size)
 		__mod_update_bounds(mod->init_layout.base, mod->init_layout.size, &mod_tree);
+#ifdef CONFIG_ARCH_WANTS_MODULES_DATA_IN_VMALLOC
+	__mod_update_bounds(mod->data_layout.base, mod->data_layout.size, &mod_data_tree);
+#endif
 }
 
 #ifdef CONFIG_KGDB_KDB
@@ -1179,6 +1198,17 @@  static ssize_t show_coresize(struct module_attribute *mattr,
 static struct module_attribute modinfo_coresize =
 	__ATTR(coresize, 0444, show_coresize, NULL);
 
+#ifdef CONFIG_ARCH_WANTS_MODULES_DATA_IN_VMALLOC
+static ssize_t show_datasize(struct module_attribute *mattr,
+			     struct module_kobject *mk, char *buffer)
+{
+	return sprintf(buffer, "%u\n", mk->mod->data_layout.size);
+}
+
+static struct module_attribute modinfo_datasize =
+	__ATTR(datasize, 0444, show_datasize, NULL);
+#endif
+
 static ssize_t show_initsize(struct module_attribute *mattr,
 			     struct module_kobject *mk, char *buffer)
 {
@@ -1207,6 +1237,9 @@  static struct module_attribute *modinfo_attrs[] = {
 	&modinfo_srcversion,
 	&modinfo_initstate,
 	&modinfo_coresize,
+#ifdef CONFIG_ARCH_WANTS_MODULES_DATA_IN_VMALLOC
+	&modinfo_datasize,
+#endif
 	&modinfo_initsize,
 	&modinfo_taint,
 #ifdef CONFIG_MODULE_UNLOAD
@@ -2209,6 +2242,9 @@  static void free_module(struct module *mod)
 
 	/* Finally, free the core (containing the module structure) */
 	module_memfree(mod->core_layout.base);
+#ifdef CONFIG_ARCH_WANTS_MODULES_DATA_IN_VMALLOC
+	vfree(mod->data_layout.base);
+#endif
 }
 
 void *__symbol_get(const char *symbol)
@@ -3456,6 +3492,24 @@  static int move_module(struct module *mod, struct load_info *info)
 	} else
 		mod->init_layout.base = NULL;
 
+#ifdef CONFIG_ARCH_WANTS_MODULES_DATA_IN_VMALLOC
+	/* Do the allocs. */
+	ptr = vmalloc(mod->data_layout.size);
+	/*
+	 * The pointer to this block is stored in the module structure
+	 * which is inside the block. Just mark it as not being a
+	 * leak.
+	 */
+	kmemleak_not_leak(ptr);
+	if (!ptr) {
+		module_memfree(mod->core_layout.base);
+		module_memfree(mod->init_layout.base);
+		return -ENOMEM;
+	}
+
+	memset(ptr, 0, mod->data_layout.size);
+	mod->data_layout.base = ptr;
+#endif
 	/* Transfer each section which specifies SHF_ALLOC */
 	pr_debug("final section addresses:\n");
 	for (i = 0; i < info->hdr->e_shnum; i++) {
@@ -3631,6 +3685,9 @@  static void module_deallocate(struct module *mod, struct load_info *info)
 	module_arch_freeing_init(mod);
 	module_memfree(mod->init_layout.base);
 	module_memfree(mod->core_layout.base);
+#ifdef CONFIG_ARCH_WANTS_MODULES_DATA_IN_VMALLOC
+	vfree(mod->data_layout.base);
+#endif
 }
 
 int __weak module_finalize(const Elf_Ehdr *hdr,
@@ -4597,8 +4654,13 @@  static int m_show(struct seq_file *m, void *p)
 	if (mod->state == MODULE_STATE_UNFORMED)
 		return 0;
 
+#ifdef CONFIG_ARCH_WANTS_MODULES_DATA_IN_VMALLOC
+	seq_printf(m, "%s %u", mod->name,
+		   mod->init_layout.size + mod->core_layout.size + mod->data_layout.size);
+#else
 	seq_printf(m, "%s %u",
 		   mod->name, mod->init_layout.size + mod->core_layout.size);
+#endif
 	print_unload_info(m, mod);
 
 	/* Informative for users. */
@@ -4721,13 +4783,20 @@  bool is_module_address(unsigned long addr)
 struct module *__module_address(unsigned long addr)
 {
 	struct module *mod;
+	struct mod_tree_root *tree;
 
-	if (addr < mod_tree.addr_min || addr > mod_tree.addr_max)
+	if (addr >= mod_tree.addr_min && addr <= mod_tree.addr_max)
+		tree = &mod_tree;
+#ifdef CONFIG_ARCH_WANTS_MODULES_DATA_IN_VMALLOC
+	else if (addr >= mod_data_tree.addr_min && addr <= mod_data_tree.addr_max)
+		tree = &mod_data_tree;
+#endif
+	else
 		return NULL;
 
 	module_assert_mutex_or_preempt();
 
-	mod = mod_find(addr, &mod_tree);
+	mod = mod_find(addr, tree);
 	if (mod) {
 		BUG_ON(!within_module(addr, mod));
 		if (mod->state == MODULE_STATE_UNFORMED)