powerpc allyesconfig / allmodconfig linux-next next-20160729 - next-20160729 build failures

Message ID 20160804214713.4baa832e@roar.ozlabs.ibm.com (mailing list archive)
State Not Applicable

Commit Message

Nicholas Piggin Aug. 4, 2016, 11:47 a.m. UTC
On Thu, 04 Aug 2016 12:37:41 +0200
Arnd Bergmann <arnd@arndb.de> wrote:

> On Thursday, August 4, 2016 11:00:49 AM CEST Arnd Bergmann wrote:
> > I tried this
> > 
> > diff --git a/scripts/link-vmlinux.sh b/scripts/link-vmlinux.sh
> > index b5e40ed86e60..89bca1a25916 100755
> > --- a/scripts/link-vmlinux.sh
> > +++ b/scripts/link-vmlinux.sh
> > @@ -44,7 +44,7 @@ modpost_link()
> >         local objects
> >  
> >         if [ -n "${CONFIG_THIN_ARCHIVES}" ]; then
> > -               objects="--whole-archive ${KBUILD_VMLINUX_INIT} ${KBUILD_VMLINUX_MAIN} --no-whole-archive"
> > +               objects="${KBUILD_VMLINUX_INIT} ${KBUILD_VMLINUX_MAIN}"
> >         else
> >                 objects="${KBUILD_VMLINUX_INIT} --start-group ${KBUILD_VMLINUX_MAIN} --end-group"
> >         fi
> > 
> > but that did not seem to change anything, the extra symbols are
> > still there. I have not tried to understand what that actually
> > does, so maybe I misunderstood your suggestion.
> >   
> 
> On a second attempt, I did the same change for vmlinux instead of the
> module (d'oh), and got a link failure instead:
> 
> 
> arch/arm/mm/proc-xscale.o: In function `cpu_xscale_do_resume':
> (.text+0x3d4): undefined reference to `cpu_resume_mmu'
> arch/arm/kernel/setup.o: In function `setup_arch':
> setup.c:(.init.text+0x910): undefined reference to `init_uts_ns'
> kernel/nsproxy.o:(.data+0x4): undefined reference to `init_uts_ns'
> kernel/sched/core.o: In function `update_rq_clock':
> core.c:(.text+0x6d8): undefined reference to `paravirt_steal_rq_enabled'
> core.c:(.text+0x6dc): undefined reference to `pv_time_ops'
> kernel/sched/cputime.o: In function `account_process_tick':
> cputime.c:(.text+0x794): undefined reference to `paravirt_steal_enabled'
> cputime.c:(.text+0x7a0): undefined reference to `pv_time_ops'
> kernel/locking/lockdep.o: In function `save_trace':
> lockdep.c:(.text+0xfe8): undefined reference to `save_stack_trace'
> kernel/module.o: In function `load_module':
> module.c:(.text+0x1b54): undefined reference to `elf_check_arch'
> module.c:(.text+0x2024): undefined reference to `apply_relocate'
> kernel/debug/debug_core.o: In function `kgdb_unregister_io_module':
> debug_core.c:(.text+0x2e4): undefined reference to `kgdb_arch_exit'
> kernel/debug/debug_core.o: In function `kgdb_arch_set_breakpoint':
> debug_core.c:(.text+0x3bc): undefined reference to `arch_kgdb_ops'
> kernel/debug/debug_core.o: In function `dbg_remove_all_break':
> debug_core.c:(.text+0x6d0): undefined reference to `arch_kgdb_ops'
> ...
> 
> However, I also see a link failure in some rare configurations
> with just your patch:
> 
> arch/arm/lib/lib.a(io-acorn.o): In function `outsl':
> (.text+0x38): undefined reference to `printk'
> 
> The problem being a file in a library object that is not referenced,
> but that references another symbol that is not defined
> (CONFIG_PRINTK=n).

The first problem is the existing link system is buggy. I think an
unconditional switch to --whole-archive (at least for modular kernels)
should probably be done anyway. For example, on powerpc when building
with --whole-archive, I have:

+dma_noop_alloc
+dma_noop_free
+dma_noop_map_page
+dma_noop_mapping_error
+dma_noop_map_sg
+dma_noop_ops
+dma_noop_supported
+fdt_add_reservemap_entry
+fdt_begin_node
+fdt_create
+fdt_create_empty_tree
+fdt_end_node
+fdt_errtable
+find_cpio_data
+ioremap_page_range

find_cpio_data is unnecessary and it's a codesize regression to link it.
But dma_noop_ops and ioremap_page_range are exported symbols. If I
reference dma_noop_ops from some random module with otherwise unpatched
kernel:

ERROR: "dma_noop_ops" [drivers/char/bsr.ko] undefined!

The real problem is that our linkage requirements are like a shared
library when we build modular.

We could build a list of exports and make it link objects with those
symbols, to solve this, but IMO that's just wasting lipstick on a pig.
But I will propose a patch to always use --whole-archive, thin
archives or not, and transition all archs over to it in a few release
cycles. It just works by luck right now.

Why is it a pig? Because having the linker notice no external
references and just skipping the .o completely is trying to use a hammer
as a scalpel. It's just not a very effective way to eliminate dead code
--  I pulled in only a handful of unneeded functions by switching it.

I mean it is a quick simple feature that probably works well enough with
simple build systems. But not an advanced one that builds almost
everything on demand and also has loadable modules and must act like a
shared library.

Real linker DCE is a valid optimisation that can't be replaced by the
build system of course, but we need to do it properly. Here's what I'm
working on.

It applies on top of the previous patch I sent, plus some powerpc stuff
I'm working on that you should be able to just ignore for another arch.
It's a WIP, but if you can see if it works for arm that would be cool.

It doesn't actually build allyesconfig after this,
ld: .tmp_vmlinux1: Too many sections: 220655 (>= 65280)

But on a more reasonable configuration (ppc64le)
    text      data       bss        dec   filename
11191672   1183536   1923820   14299028   vmlinux
10625528    861895   1919707   13407130   vmlinux.thin+gc

10M-552K   1M-314K         ~   13M-870K

And it actually boots too, which is fairly astounding considering that
it lost half a meg of code and 1/3 of its data. I'm not completely sure
I've not done something wrong...

Thanks,
Nick

Comments

Arnd Bergmann Aug. 4, 2016, 12:09 p.m. UTC | #1
On Thursday, August 4, 2016 9:47:13 PM CEST Nicholas Piggin wrote:
> On Thu, 04 Aug 2016 12:37:41 +0200 Arnd Bergmann <arnd@arndb.de> wrote:
> > On Thursday, August 4, 2016 11:00:49 AM CEST Arnd Bergmann wrote:
> > > I tried this
> > > 
> > > diff --git a/scripts/link-vmlinux.sh b/scripts/link-vmlinux.sh
> > > index b5e40ed86e60..89bca1a25916 100755
> > > --- a/scripts/link-vmlinux.sh
> > > +++ b/scripts/link-vmlinux.sh
> > > @@ -44,7 +44,7 @@ modpost_link()
> > >         local objects
> > >  
> > >         if [ -n "${CONFIG_THIN_ARCHIVES}" ]; then
> > > -               objects="--whole-archive ${KBUILD_VMLINUX_INIT} ${KBUILD_VMLINUX_MAIN} --no-whole-archive"
> > > +               objects="${KBUILD_VMLINUX_INIT} ${KBUILD_VMLINUX_MAIN}"
> > >         else
> > >                 objects="${KBUILD_VMLINUX_INIT} --start-group ${KBUILD_VMLINUX_MAIN} --end-group"
> > >         fi
> > > 
> > > but that did not seem to change anything, the extra symbols are
> > > still there. I have not tried to understand what that actually
> > > does, so maybe I misunderstood your suggestion.
> > >   
> > 
> > On a second attempt, I did the same change for vmlinux instead of the
> > module (d'oh), and got a link failure instead:
> > 
> > 
> > arch/arm/mm/proc-xscale.o: In function `cpu_xscale_do_resume':
> > (.text+0x3d4): undefined reference to `cpu_resume_mmu'
> > arch/arm/kernel/setup.o: In function `setup_arch':
> > ...
> > 
> > However, I also see a link failure in some rare configurations
> > with just your patch:
> > 
> > arch/arm/lib/lib.a(io-acorn.o): In function `outsl':
> > (.text+0x38): undefined reference to `printk'
> > 
> > The problem being a file in a library object that is not referenced,
> > but that references another symbol that is not defined
> > (CONFIG_PRINTK=n).
> 
> The first problem is the existing link system is buggy. I think an
> unconditional switch to --whole-archive (at least for modular kernels)
> should probably be done anyway. For example, on powerpc when building
> with --whole-archive, I have:
> 
> +dma_noop_alloc
> +dma_noop_free
> +dma_noop_map_page
> +dma_noop_mapping_error
> +dma_noop_map_sg
> +dma_noop_ops
> +dma_noop_supported
> +fdt_add_reservemap_entry
> +fdt_begin_node
> +fdt_create
> +fdt_create_empty_tree
> +fdt_end_node
> +fdt_errtable
> +find_cpio_data
> +ioremap_page_range
> 
> find_cpio_data is unnecessary and it's a codesize regression to link it.
> But dma_noop_ops and ioremap_page_range are exported symbols. If I
> reference dma_noop_ops from some random module with otherwise unpatched
> kernel:
> 
> ERROR: "dma_noop_ops" [drivers/char/bsr.ko] undefined!

Right, but only on s390, which is the one architecture using this.
I think we should just have a Kconfig symbol for this file that
gets selected by any architecture that needs it.

This is also what we have ended up doing for almost all other
files in lib/

> The real problem is that our linkage requirements are like a shared
> library when we build modular.
> 
> We could build a list of exports and make it link objects with those
> symbols, to solve this, but IMO that's just wasting lipstick on a pig.
> But I will propose a patch to always use --whole-archive, thin
> archives or not, and transition all archs over to it in a few release
> cycles. It just works by luck right now.
>
> Why is it a pig? Because having the linker notice no external
> references and just skipping the .o completely is trying to use a hammer
> as a scalpel. It's just not a very effective way to eliminate dead code
> --  I pulled in only a handful of unneeded functions by switching it.

If we do that, we may just as well get rid of $(lib-y) in the process and
always use $(obj-y).

> I mean it is a quick simple feature that probably works well enough with
> simple build systems. But not an advanced one that builds almost
> everything on demand and also has loadable modules and must act like a
> shared library.
> 
> Real linker DCE is a valid optimisation that can't be replaced by the
> build system of course, but we need to do it properly. Here's what I'm
> working on.
> 
> It applies on top of the previous patch I sent, plus some powerpc stuff
> I'm working on that you should be able to just ignore for another arch.
> It's a WIP, but if you can see if it works for arm that would be cool.
> 
> It doesn't actually build allyesconfig after this,
> ld: .tmp_vmlinux1: Too many sections: 220655 (>= 65280)
> 
> But on a more reasonable configuration (ppc64le)
>     text      data       bss        dec   filename
> 11191672   1183536   1923820   14299028   vmlinux
> 10625528    861895   1919707   13407130   vmlinux.thin+gc
> 
> 10M-552K   1M-314K         ~   13M-870K

Nice!

> And it actually boots too, which is fairly astounding considering that
> it lost half a meg of code and 1/3 of its data. I'm not completely sure
> I've not done something wrong...

Nicolas Pitre has done some related work, adding him to Cc. IIRC we have
actually had multiple implementations of -ffunction-sections/--gc-sections
in the past that people have used in production, but none of them
ever made it upstream.

One question is whether we should bother with --gc-sections at all,
or use full LTO instead.

	Arnd
---
(full patch quoted below for Nico, no further comments)

> diff --git a/arch/powerpc/Makefile b/arch/powerpc/Makefile
> index e75e17c..1594072 100644
> --- a/arch/powerpc/Makefile
> +++ b/arch/powerpc/Makefile
> @@ -104,6 +104,10 @@ LDFLAGS_vmlinux	:= $(LDFLAGS_vmlinux-y)
>  LDFLAGS_vmlinux	+= --emit-relocs
>  KBUILD_LDFLAGS_MODULE += --emit-relocs
>  
> +KBUILD_CFLAGS	+= -ffunction-sections -fdata-sections
> +LDFLAGS_vmlinux	+= --gc-sections
> +
> +
>  ifeq ($(CONFIG_PPC64),y)
>  ifeq ($(call cc-option-yn,-mcmodel=medium),y)
>  	# -mcmodel=medium breaks modules because it uses 32bit offsets from
> @@ -234,6 +238,8 @@ KBUILD_CFLAGS += $(cpu-as-y)
>  archscripts: scripts_basic
>  	$(Q)$(MAKE) $(build)=arch/powerpc/tools
>  
> +CFLAGS_head_$(CONFIG_WORD_SIZE).o = -fno-function-sections
> +
>  head-y				:= arch/powerpc/kernel/head_$(CONFIG_WORD_SIZE).o
>  head-$(CONFIG_8xx)		:= arch/powerpc/kernel/head_8xx.o
>  head-$(CONFIG_40x)		:= arch/powerpc/kernel/head_40x.o
> @@ -245,6 +251,7 @@ head-$(CONFIG_PPC_FPU)		+= arch/powerpc/kernel/fpu.o
>  head-$(CONFIG_ALTIVEC)		+= arch/powerpc/kernel/vector.o
>  head-$(CONFIG_PPC_OF_BOOT_TRAMPOLINE)  += arch/powerpc/kernel/prom_init.o
>  
> +
>  core-y				+= arch/powerpc/kernel/ \
>  				   arch/powerpc/mm/ \
>  				   arch/powerpc/lib/ \
> diff --git a/arch/powerpc/kernel/Makefile b/arch/powerpc/kernel/Makefile
> index 2da380f..b356e59 100644
> --- a/arch/powerpc/kernel/Makefile
> +++ b/arch/powerpc/kernel/Makefile
> @@ -4,7 +4,10 @@
>  
>  CFLAGS_ptrace.o		+= -DUTS_MACHINE='"$(UTS_MACHINE)"'
>  
> +ccflags-y		+= -fno-function-sections -fno-data-sections
> +
>  subdir-ccflags-$(CONFIG_PPC_WERROR) := -Werror
> +subdir-ccflags-y	+= -fno-function-sections -fno-data-sections
>  
>  ifeq ($(CONFIG_PPC64),y)
>  CFLAGS_prom_init.o	+= $(NO_MINIMAL_TOC)
> diff --git a/arch/powerpc/kernel/vmlinux.lds.S b/arch/powerpc/kernel/vmlinux.lds.S
> index 959c131..0856d62 100644
> --- a/arch/powerpc/kernel/vmlinux.lds.S
> +++ b/arch/powerpc/kernel/vmlinux.lds.S
> @@ -56,16 +56,16 @@ SECTIONS
>  	 * in order to optimize stub generation.
>  	 */
>  	.head.text : AT(ADDR(.head.text) - LOAD_OFFSET) {
> -		*(.head.text.first_256B);
> +		KEEP(*(.head.text.first_256B));
>  #ifndef CONFIG_PPC_BOOK3S
>  		. = 0x100;
>  #else
> -		*(.head.text.real_vectors);
> -		*(.head.text.real_trampolines);
> -		*(.head.text.virt_vectors);
> -		*(.head.text.virt_trampolines);
> +		KEEP(*(.head.text.real_vectors));
> +		KEEP(*(.head.text.real_trampolines));
> +		KEEP(*(.head.text.virt_vectors));
> +		KEEP(*(.head.text.virt_trampolines));
>  #if defined(CONFIG_PPC_PSERIES) || defined(CONFIG_PPC_POWERNV)
> -		*(.head.data.fwnmi_page);
> +		KEEP(*(.head.data.fwnmi_page));
>  		. = 0x8000;
>  #else
>  		. = 0x7000;
> diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h
> index 6a67ab9..3a35719 100644
> --- a/include/asm-generic/vmlinux.lds.h
> +++ b/include/asm-generic/vmlinux.lds.h
> @@ -312,76 +312,76 @@
>  	/* Kernel symbol table: Normal symbols */			\
>  	__ksymtab         : AT(ADDR(__ksymtab) - LOAD_OFFSET) {		\
>  		VMLINUX_SYMBOL(__start___ksymtab) = .;			\
> -		*(SORT(___ksymtab+*))					\
> +		KEEP(*(SORT(___ksymtab+*)))				\
>  		VMLINUX_SYMBOL(__stop___ksymtab) = .;			\
>  	}								\
>  									\
>  	/* Kernel symbol table: GPL-only symbols */			\
>  	__ksymtab_gpl     : AT(ADDR(__ksymtab_gpl) - LOAD_OFFSET) {	\
>  		VMLINUX_SYMBOL(__start___ksymtab_gpl) = .;		\
> -		*(SORT(___ksymtab_gpl+*))				\
> +		KEEP(*(SORT(___ksymtab_gpl+*)))				\
>  		VMLINUX_SYMBOL(__stop___ksymtab_gpl) = .;		\
>  	}								\
>  									\
>  	/* Kernel symbol table: Normal unused symbols */		\
>  	__ksymtab_unused  : AT(ADDR(__ksymtab_unused) - LOAD_OFFSET) {	\
>  		VMLINUX_SYMBOL(__start___ksymtab_unused) = .;		\
> -		*(SORT(___ksymtab_unused+*))				\
> +		KEEP(*(SORT(___ksymtab_unused+*)))			\
>  		VMLINUX_SYMBOL(__stop___ksymtab_unused) = .;		\
>  	}								\
>  									\
>  	/* Kernel symbol table: GPL-only unused symbols */		\
>  	__ksymtab_unused_gpl : AT(ADDR(__ksymtab_unused_gpl) - LOAD_OFFSET) { \
>  		VMLINUX_SYMBOL(__start___ksymtab_unused_gpl) = .;	\
> -		*(SORT(___ksymtab_unused_gpl+*))			\
> +		KEEP(*(SORT(___ksymtab_unused_gpl+*)))			\
>  		VMLINUX_SYMBOL(__stop___ksymtab_unused_gpl) = .;	\
>  	}								\
>  									\
>  	/* Kernel symbol table: GPL-future-only symbols */		\
>  	__ksymtab_gpl_future : AT(ADDR(__ksymtab_gpl_future) - LOAD_OFFSET) { \
>  		VMLINUX_SYMBOL(__start___ksymtab_gpl_future) = .;	\
> -		*(SORT(___ksymtab_gpl_future+*))			\
> +		KEEP(*(SORT(___ksymtab_gpl_future+*)))			\
>  		VMLINUX_SYMBOL(__stop___ksymtab_gpl_future) = .;	\
>  	}								\
>  									\
>  	/* Kernel symbol table: Normal symbols */			\
>  	__kcrctab         : AT(ADDR(__kcrctab) - LOAD_OFFSET) {		\
>  		VMLINUX_SYMBOL(__start___kcrctab) = .;			\
> -		*(SORT(___kcrctab+*))					\
> +		KEEP(*(SORT(___kcrctab+*)))				\
>  		VMLINUX_SYMBOL(__stop___kcrctab) = .;			\
>  	}								\
>  									\
>  	/* Kernel symbol table: GPL-only symbols */			\
>  	__kcrctab_gpl     : AT(ADDR(__kcrctab_gpl) - LOAD_OFFSET) {	\
>  		VMLINUX_SYMBOL(__start___kcrctab_gpl) = .;		\
> -		*(SORT(___kcrctab_gpl+*))				\
> +		KEEP(*(SORT(___kcrctab_gpl+*)))				\
>  		VMLINUX_SYMBOL(__stop___kcrctab_gpl) = .;		\
>  	}								\
>  									\
>  	/* Kernel symbol table: Normal unused symbols */		\
>  	__kcrctab_unused  : AT(ADDR(__kcrctab_unused) - LOAD_OFFSET) {	\
>  		VMLINUX_SYMBOL(__start___kcrctab_unused) = .;		\
> -		*(SORT(___kcrctab_unused+*))				\
> +		KEEP(*(SORT(___kcrctab_unused+*)))			\
>  		VMLINUX_SYMBOL(__stop___kcrctab_unused) = .;		\
>  	}								\
>  									\
>  	/* Kernel symbol table: GPL-only unused symbols */		\
>  	__kcrctab_unused_gpl : AT(ADDR(__kcrctab_unused_gpl) - LOAD_OFFSET) { \
>  		VMLINUX_SYMBOL(__start___kcrctab_unused_gpl) = .;	\
> -		*(SORT(___kcrctab_unused_gpl+*))			\
> +		KEEP(*(SORT(___kcrctab_unused_gpl+*)))			\
>  		VMLINUX_SYMBOL(__stop___kcrctab_unused_gpl) = .;	\
>  	}								\
>  									\
>  	/* Kernel symbol table: GPL-future-only symbols */		\
>  	__kcrctab_gpl_future : AT(ADDR(__kcrctab_gpl_future) - LOAD_OFFSET) { \
>  		VMLINUX_SYMBOL(__start___kcrctab_gpl_future) = .;	\
> -		*(SORT(___kcrctab_gpl_future+*))			\
> +		KEEP(*(SORT(___kcrctab_gpl_future+*)))			\
>  		VMLINUX_SYMBOL(__stop___kcrctab_gpl_future) = .;	\
>  	}								\
>  									\
>  	/* Kernel symbol table: strings */				\
>          __ksymtab_strings : AT(ADDR(__ksymtab_strings) - LOAD_OFFSET) {	\
> -		*(__ksymtab_strings)					\
> +		KEEP(*(__ksymtab_strings))				\
>  	}								\
>  									\
>  	/* __*init sections */						\
> @@ -519,6 +519,7 @@
>  
>  /* init and exit section handling */
>  #define INIT_DATA							\
> +	KEEP(*(SORT(___kentry+*)))					\
>  	*(.init.data)							\
>  	MEM_DISCARD(init.data)						\
>  	KERNEL_CTORS()							\
> @@ -695,9 +696,9 @@
>  #define INIT_RAM_FS							\
>  	. = ALIGN(4);							\
>  	VMLINUX_SYMBOL(__initramfs_start) = .;				\
> -	*(.init.ramfs)							\
> +	KEEP(*(.init.ramfs))						\
>  	. = ALIGN(8);							\
> -	*(.init.ramfs.info)
> +	KEEP(*(.init.ramfs.info))
>  #else
>  #define INIT_RAM_FS
>  #endif
> diff --git a/include/linux/export.h b/include/linux/export.h
> index 2f9ccbe..a921862 100644
> --- a/include/linux/export.h
> +++ b/include/linux/export.h
> @@ -46,7 +46,7 @@ extern struct module __this_module;
>  	extern __visible void *__crc_##sym __attribute__((weak));		\
>  	static const unsigned long __kcrctab_##sym		\
>  	__used							\
> -	__attribute__((section("___kcrctab" sec "+" #sym), unused))	\
> +	__attribute__((section("___kcrctab" sec "+" #sym ",\"a\",@note #"), used))	\
>  	= (unsigned long) &__crc_##sym;
>  #else
>  #define __CRC_SYMBOL(sym, sec)
> @@ -57,12 +57,12 @@ extern struct module __this_module;
>  	extern typeof(sym) sym;					\
>  	__CRC_SYMBOL(sym, sec)					\
>  	static const char __kstrtab_##sym[]			\
> -	__attribute__((section("__ksymtab_strings"), aligned(1))) \
> +	__attribute__((section("__ksymtab_strings" ",\"a\",@note #"), aligned(1))) \
>  	= VMLINUX_SYMBOL_STR(sym);				\
>  	extern const struct kernel_symbol __ksymtab_##sym;	\
>  	__visible const struct kernel_symbol __ksymtab_##sym	\
>  	__used							\
> -	__attribute__((section("___ksymtab" sec "+" #sym), unused))	\
> +	__attribute__((section("___ksymtab" sec "+" #sym ",\"a\",@note #"), used))	\
>  	= { (unsigned long)&sym, __kstrtab_##sym }
>  
>  #if defined(__KSYM_DEPS__)
> diff --git a/include/linux/init.h b/include/linux/init.h
> index aedb254..51393f4 100644
> --- a/include/linux/init.h
> +++ b/include/linux/init.h
> @@ -156,19 +156,20 @@ extern bool initcall_debug;
>  
>  #ifndef __ASSEMBLY__
>  
> -#ifdef CONFIG_LTO
> +#if 1
>  /* Work around a LTO gcc problem: when there is no reference to a variable
>   * in a module it will be moved to the end of the program. This causes
>   * reordering of initcalls which the kernel does not like.
>   * Add a dummy reference function to avoid this. The function is
>   * deleted by the linker.
>   */
> -#define LTO_REFERENCE_INITCALL(x) \
> -	; /* yes this is needed */			\
> -	static __used __exit void *reference_##x(void)	\
> -	{						\
> -		return &x;				\
> -	}
> +#define LTO_REFERENCE_INITCALL(sym) \
> +	extern typeof(sym) sym;					\
> +	/* extern const unsigned long __kentry_##sym; */		\
> +	static /* __visible */ const unsigned long __kentry_##sym		\
> +	__used							\
> +	__attribute__((section("___kentry" "+" #sym ",\"a\",@note #"), used)) \
> +	= (unsigned long)&sym;
>  #else
>  #define LTO_REFERENCE_INITCALL(x)
>  #endif
> @@ -222,16 +223,18 @@ extern bool initcall_debug;
>  
>  #define __initcall(fn) device_initcall(fn)
>  
> -#define __exitcall(fn) \
> -	static exitcall_t __exitcall_##fn __exit_call = fn
> +#define __exitcall(fn)						\
> +	static exitcall_t __exitcall_##fn __exit_call = fn;	\
>  
> -#define console_initcall(fn) \
> -	static initcall_t __initcall_##fn \
> -	__used __section(.con_initcall.init) = fn
> +#define console_initcall(fn)					\
> +	static initcall_t __initcall_##fn			\
> +	__used __section(.con_initcall.init) = fn;		\
> +	LTO_REFERENCE_INITCALL(__initcall_##fn)
>  
> -#define security_initcall(fn) \
> -	static initcall_t __initcall_##fn \
> -	__used __section(.security_initcall.init) = fn
> +#define security_initcall(fn)					\
> +	static initcall_t __initcall_##fn			\
> +	__used __section(.security_initcall.init) = fn;		\
> +	LTO_REFERENCE_INITCALL(__initcall_##fn)
>  
>  struct obs_kernel_param {
>  	const char *str;
> diff --git a/init/Makefile b/init/Makefile
> index 7bc47ee..c4fb455 100644
> --- a/init/Makefile
> +++ b/init/Makefile
> @@ -2,6 +2,8 @@
>  # Makefile for the linux kernel.
>  #
>  
> +ccflags-y := -fno-function-sections -fno-data-sections
> +
>  obj-y                          := main.o version.o mounts.o
>  ifneq ($(CONFIG_BLK_DEV_INITRD),y)
>  obj-y                          += noinitramfs.o
> diff --git a/scripts/link-vmlinux.sh b/scripts/link-vmlinux.sh
> index ef4658f..fb848af 100755
> --- a/scripts/link-vmlinux.sh
> +++ b/scripts/link-vmlinux.sh
> @@ -37,17 +37,22 @@ info()
>  	fi
>  }
>  
> +# Grab all the EXPORT_SYMBOL symbols in the vmlinux build
> +# ${1} - output file
> +exports_extract()
> +{
> +	${NM} -g ${KBUILD_VMLINUX_INIT} ${KBUILD_VMLINUX_MAIN} |
> +		grep "R __ksymtab_" |
> +		sed 's/.*__ksymtab_\(.*\)$/\1/' > ${1}
> +}
> +
>  # Link of vmlinux.o used for section mismatch analysis
>  # ${1} output file
>  modpost_link()
>  {
>  	local objects
>  
> -	if [ -n "${CONFIG_THIN_ARCHIVES}" ]; then
> -		objects="--whole-archive ${KBUILD_VMLINUX_INIT} ${KBUILD_VMLINUX_MAIN} --no-whole-archive"
> -	else
> -		objects="${KBUILD_VMLINUX_INIT} --start-group ${KBUILD_VMLINUX_MAIN} --end-group"
> -	fi
> +	objects="--whole-archive ${KBUILD_VMLINUX_INIT} ${KBUILD_VMLINUX_MAIN}"
>  	${LD} ${LDFLAGS} -r -o ${1} ${objects}
>  }
>  
> @@ -60,11 +65,7 @@ vmlinux_link()
>  	local objects
>  
>  	if [ "${SRCARCH}" != "um" ]; then
> -		if [ -n "${CONFIG_THIN_ARCHIVES}" ]; then
> -			objects="--whole-archive ${KBUILD_VMLINUX_INIT} ${KBUILD_VMLINUX_MAIN} --no-whole-archive"
> -		else
> -			objects="${KBUILD_VMLINUX_INIT} --start-group ${KBUILD_VMLINUX_MAIN} --end-group"
> -		fi
> +		objects="--whole-archive ${KBUILD_VMLINUX_INIT} ${KBUILD_VMLINUX_MAIN}"
>  		${LD} ${LDFLAGS} ${LDFLAGS_vmlinux} -o ${2}                  \
>  			-T ${lds} ${objects} ${1}
>  	else
>
Nicholas Piggin Aug. 4, 2016, 12:31 p.m. UTC | #2
On Thu, 04 Aug 2016 14:09:02 +0200
Arnd Bergmann <arnd@arndb.de> wrote:

> On Thursday, August 4, 2016 9:47:13 PM CEST Nicholas Piggin wrote:
> > On Thu, 04 Aug 2016 12:37:41 +0200 Arnd Bergmann <arnd@arndb.de> wrote:  
> > > On Thursday, August 4, 2016 11:00:49 AM CEST Arnd Bergmann wrote:  
> > > > I tried this
> > > > 
> > > > diff --git a/scripts/link-vmlinux.sh b/scripts/link-vmlinux.sh
> > > > index b5e40ed86e60..89bca1a25916 100755
> > > > --- a/scripts/link-vmlinux.sh
> > > > +++ b/scripts/link-vmlinux.sh
> > > > @@ -44,7 +44,7 @@ modpost_link()
> > > >         local objects
> > > >  
> > > >         if [ -n "${CONFIG_THIN_ARCHIVES}" ]; then
> > > > -               objects="--whole-archive ${KBUILD_VMLINUX_INIT} ${KBUILD_VMLINUX_MAIN} --no-whole-archive"
> > > > +               objects="${KBUILD_VMLINUX_INIT} ${KBUILD_VMLINUX_MAIN}"
> > > >         else
> > > >                 objects="${KBUILD_VMLINUX_INIT} --start-group ${KBUILD_VMLINUX_MAIN} --end-group"
> > > >         fi
> > > > 
> > > > but that did not seem to change anything, the extra symbols are
> > > > still there. I have not tried to understand what that actually
> > > > does, so maybe I misunderstood your suggestion.
> > > >     
> > > 
> > > On a second attempt, I did the same change for vmlinux instead of the
> > > module (d'oh), and got a link failure instead:
> > > 
> > > 
> > > arch/arm/mm/proc-xscale.o: In function `cpu_xscale_do_resume':
> > > (.text+0x3d4): undefined reference to `cpu_resume_mmu'
> > > arch/arm/kernel/setup.o: In function `setup_arch':
> > > ...
> > > 
> > > However, I also see a link failure in some rare configurations
> > > with just your patch:
> > > 
> > > arch/arm/lib/lib.a(io-acorn.o): In function `outsl':
> > > (.text+0x38): undefined reference to `printk'
> > > 
> > > The problem being a file in a library object that is not referenced,
> > > but that references another symbol that is not defined
> > > (CONFIG_PRINTK=n).  
> > 
> > The first problem is the existing link system is buggy. I think an
> > unconditional switch to --whole-archive (at least for modular kernels)
> > should probably be done anyway. For example, on powerpc when building
> > with --whole-archive, I have:
> > 
> > +dma_noop_alloc
> > +dma_noop_free
> > +dma_noop_map_page
> > +dma_noop_mapping_error
> > +dma_noop_map_sg
> > +dma_noop_ops
> > +dma_noop_supported
> > +fdt_add_reservemap_entry
> > +fdt_begin_node
> > +fdt_create
> > +fdt_create_empty_tree
> > +fdt_end_node
> > +fdt_errtable
> > +find_cpio_data
> > +ioremap_page_range
> > 
> > find_cpio_data is unnecessary and it's a codesize regression to link it.
> > But dma_noop_ops and ioremap_page_range are exported symbols. If I
> > reference dma_noop_ops from some random module with otherwise unpatched
> > kernel:
> > 
> > ERROR: "dma_noop_ops" [drivers/char/bsr.ko] undefined!  
> 
> Right, but only on s390, which is the one architecture using this.
> I think we should just have a Kconfig symbol for this file that
> gets selected by any architecture that needs it.

No, the problem is that the module is being selected and built
but it is missing from the vmlinux despite being exported.


> This is also what we have ended up doing for almost all other
> files in lib/
> 
> > The real problem is that our linkage requirements are like a shared
> > library when we build modular.
> > 
> > We could build a list of exports and make it link objects with those
> > symbols, to solve this, but IMO that's just wasting lipstick on a pig.
> > > But I will propose a patch to always use --whole-archive, thin
> > archives or not, and transition all archs over to it in a few release
> > cycles. It just works by luck right now.
> >
> > > Why is it a pig? Because having the linker notice no external
> > references and just skipping the .o completely is trying to use a hammer
> > as a scalpel. It's just not a very effective way to eliminate dead code
> > --  I pulled in only a handful of unneeded functions by switching it.  
> 
> If we do that, we may just as well get rid of $(lib-y) in the process and
> always use $(obj-y).

Sure, after we switch everybody over.


> > I mean it is a quick simple feature that probably works well enough with
> > simple build systems. But not an advanced one that builds almost
> > everything on demand and also has loadable modules and must act like a
> > shared library.
> > 
> > Real linker DCE is a valid optimisation that can't be replaced by the
> > build system of course, but we need to do it properly. Here's what I'm
> > working on.
> > 
> > It applies on top of the previous patch I sent, plus some powerpc stuff
> > I'm working on that you should be able to just ignore for another arch.
> > > It's a WIP, but if you can see if it works for arm that would be cool.
> > 
> > It doesn't actually build allyesconfig after this,
> > ld: .tmp_vmlinux1: Too many sections: 220655 (>= 65280)
> > 
> > But on a more reasonable configuration (ppc64le)
> >     text      data       bss        dec   filename
> > 11191672   1183536   1923820   14299028   vmlinux
> > 10625528    861895   1919707   13407130   vmlinux.thin+gc
> > 
> > 10M-552K   1M-314K         ~   13M-870K  
> 
> Nice!
> 
> > And it actually boots too, which is fairly astounding considering that
> > it lost half a meg of code and 1/3 of its data. I'm not completely sure
> > I've not done something wrong...  
> 
> Nicolas Pitre has done some related work, adding him to Cc. IIRC we have
> actually had multiple implementations of -ffunction-sections/--gc-sections
> in the past that people have used in production, but none of them
> ever made it upstream.

Well, I'll try to get it upstream for powerpc so that Stephen's thin ar
patch does not cause a regression. I don't see the problem, except
with huge configs (that don't build with mainline powerpc anyway), but
it could be an option for build testers who want to do all(yes|mod)config.

 
> One question is whether we should bother with --gc-sections at all,
> or use full LTO instead.

It's no bother. I'm not even sure LTO is a complete superset of
-ffunction-sections/--gc-sections, but either way it is a huge change to
the build and toolchain, whereas --gc-sections is relatively unremarkable.
LTO is very interesting, but I think it will take a big effort to
implement and prove itself.

Thanks,
Nick
Nicholas Piggin Aug. 4, 2016, 1:54 p.m. UTC | #3
On Thu, 4 Aug 2016 22:31:39 +1000
Nicholas Piggin <npiggin@gmail.com> wrote:
> On Thu, 04 Aug 2016 14:09:02 +0200
> Arnd Bergmann <arnd@arndb.de> wrote:
> > Nicolas Pitre has done some related work, adding him to Cc. IIRC we have
> > actually had multiple implementations of -ffunction-sections/--gc-sections
> > in the past that people have used in production, but none of them
> > ever made it upstream.  

After some googling around it seems LTO has been difficult to
get in, and it was agreed that --gc-sections should be done first
anyway (LTO may indeed provide a superset of DCE, but it's
always going to be more costly and complicated). LTO would
have the same issue with liveness of entry points, which is
really the only thing you need to change in the kernel as far as I
can see.

I didn't really see what problems people were having with it
though, so maybe it's architecture specific or something I
haven't run into yet.

Thanks,
Nick
Arnd Bergmann Aug. 4, 2016, 3:43 p.m. UTC | #4
On Thursday, August 4, 2016 11:54:18 PM CEST Nicholas Piggin wrote:
> On Thu, 4 Aug 2016 22:31:39 +1000
> Nicholas Piggin <npiggin@gmail.com> wrote:
> > On Thu, 04 Aug 2016 14:09:02 +0200
> > Arnd Bergmann <arnd@arndb.de> wrote:
> > > Nicolas Pitre has done some related work, adding him to Cc. IIRC we have
> > > actually had multiple implementations of -ffunction-sections/--gc-sections
> > > in the past that people have used in production, but none of them
> > > ever made it upstream.  
> 
> After some googling around it seems LTO has been difficult to
> get in, and it was agreed that --gc-sections should be done first
> anyway (LTO may indeed provide a superset of DCE, but it's
> always going to be more costly and complicated). LTO would
> have the same issue with liveness of entry points, which is
> really the only thing you need to change in the kernel as far as I
> can see.

Ok, good.

> I didn't really see what problems people were having with it
> though, so maybe it's architecture specific or something I
> haven't run into yet.

I remember trying it a few years ago without success; it's possible
that old binutils versions were more problematic.

I'm happy to test your patches on ARM. With my randconfig builder
I tend to find obscure bugs in corner cases that you might not
normally find with just defconfig/allmodconfig builds.

	Arnd
Arnd Bergmann Aug. 4, 2016, 4:10 p.m. UTC | #5
On Thursday, August 4, 2016 9:47:13 PM CEST Nicholas Piggin wrote:

> +	__used							\
> +	__attribute__((section("___kentry" "+" #sym ",\"a\",@note #"), used)) \


I've just started testing this, but the first problem I ran into
is that @ and # are special characters that have an architecture
specific meaning to the assembler. On ARM, you need "%note @" instead
of "@note #".

	Arnd
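
As a small illustration of Arnd's point: the string appended to the
section name has to use the section-type prefix and the comment
character of the target assembler. As I understand the trick, the
trailing comment character hides whatever the compiler appends after
the section name. The helper below is hypothetical and only covers the
two spellings mentioned in this thread; other architectures are
deliberately left unhandled.

```shell
#!/bin/sh
# Hypothetical helper: print the suffix appended to a section name for
# the "comment trick", for the two architectures discussed here.
section_suffix() {
	case "$1" in
	powerpc)
		# '@' introduces the section type, '#' starts a comment.
		printf '%s\n' ',"a",@note #'
		;;
	arm)
		# On ARM '@' starts a comment, so the type is spelled with '%'.
		printf '%s\n' ',"a",%note @'
		;;
	*)
		echo "section_suffix: no known spelling for '$1'" >&2
		return 1
		;;
	esac
}

section_suffix powerpc
section_suffix arm
```

Having to maintain a per-architecture table like this is one reason the
trick is fragile.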
Segher Boessenkool Aug. 4, 2016, 5:06 p.m. UTC | #6
On Thu, Aug 04, 2016 at 06:10:57PM +0200, Arnd Bergmann wrote:
> On Thursday, August 4, 2016 9:47:13 PM CEST Nicholas Piggin wrote:
> 
> > +	__used							\
> > +	__attribute__((section("___kentry" "+" #sym ",\"a\",@note #"), used)) \
> 
> 
> I've just started testing this, but the first problem I ran into
> is that @ and # are special characters that have an architecture
> specific meaning to the assembler. On ARM, you need "%note @" instead
> of "@note #".

That comment trick (I still feel guilty about it) causes more problems
than it solves.  Please don't try to use it :-)


Segher
Nicholas Piggin Aug. 5, 2016, 8:41 a.m. UTC | #7
On Thu, 4 Aug 2016 12:06:41 -0500
Segher Boessenkool <segher@kernel.crashing.org> wrote:

> On Thu, Aug 04, 2016 at 06:10:57PM +0200, Arnd Bergmann wrote:
> > On Thursday, August 4, 2016 9:47:13 PM CEST Nicholas Piggin wrote:
> >   
> > > +	__used							\
> > > +	__attribute__((section("___kentry" "+" #sym ",\"a\",@note #"), used)) \  
> > 
> > 
> > I've just started testing this, but the first problem I ran into
> > is that @ and # are special characters that have an architecture
> > specific meaning to the assembler. On ARM, you need "%note @" instead
> > of "@note #".  
> 
> That comment trick (I still feel guilty about it) causes more problems
> than it solves.  Please don't try to use it :-)

Yeah that's a funny hack. I don't think it's required though, but I'm just
running through some more tests.

I think I found an improvement with the thin archives as well -- we were
still building a symbol table after removing the "s" option (that only
avoids the index). The "S" option is required to not build a symbol table.

I'll send out an RFC on a slightly more polished patch series shortly.

Thanks,
Nick
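
The difference between leaving out "s" and passing "S" can be checked
with a short script. This is a sketch assuming GNU binutils (as, ar, nm)
on PATH; the file names are made up, and regular rather than thin
archives are used to keep it small, on the assumption that the "S"
modifier behaves the same way for both.

```shell
#!/bin/sh
# Sketch only: assumes GNU as/ar/nm; file names are hypothetical.

has_index() {
	# nm -s (--print-armap) prints an "Archive index:" header when the
	# archive carries a symbol table.
	if nm -s "$1" 2>/dev/null | grep -q 'Archive index'; then
		echo yes
	else
		echo no
	fi
}

index_default=unavailable
index_S=unavailable
if command -v as >/dev/null && command -v ar >/dev/null &&
   command -v nm >/dev/null; then
	tmp=$(mktemp -d)
	# One global symbol is enough to give ar something to index.
	printf '\t.globl foo\nfoo:\n' | as -o "$tmp/foo.o"
	ar rc  "$tmp/plain.a"   "$tmp/foo.o"	# no "s" modifier given
	ar rcS "$tmp/noindex.a" "$tmp/foo.o"	# "S": do not build the table
	index_default=$(has_index "$tmp/plain.a")
	index_S=$(has_index "$tmp/noindex.a")
	rm -rf "$tmp"
fi
echo "rc: $index_default, rcS: $index_S"
```

If the first archive reports an index and the second does not, that
matches the observation above that dropping "s" alone is not enough.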

Patch

diff --git a/arch/powerpc/Makefile b/arch/powerpc/Makefile
index e75e17c..1594072 100644
--- a/arch/powerpc/Makefile
+++ b/arch/powerpc/Makefile
@@ -104,6 +104,10 @@  LDFLAGS_vmlinux	:= $(LDFLAGS_vmlinux-y)
 LDFLAGS_vmlinux	+= --emit-relocs
 KBUILD_LDFLAGS_MODULE += --emit-relocs
 
+KBUILD_CFLAGS	+= -ffunction-sections -fdata-sections
+LDFLAGS_vmlinux	+= --gc-sections
+
+
 ifeq ($(CONFIG_PPC64),y)
 ifeq ($(call cc-option-yn,-mcmodel=medium),y)
 	# -mcmodel=medium breaks modules because it uses 32bit offsets from
@@ -234,6 +238,8 @@  KBUILD_CFLAGS += $(cpu-as-y)
 archscripts: scripts_basic
 	$(Q)$(MAKE) $(build)=arch/powerpc/tools
 
+CFLAGS_head_$(CONFIG_WORD_SIZE).o = -fno-function-sections
+
 head-y				:= arch/powerpc/kernel/head_$(CONFIG_WORD_SIZE).o
 head-$(CONFIG_8xx)		:= arch/powerpc/kernel/head_8xx.o
 head-$(CONFIG_40x)		:= arch/powerpc/kernel/head_40x.o
@@ -245,6 +251,7 @@  head-$(CONFIG_PPC_FPU)		+= arch/powerpc/kernel/fpu.o
 head-$(CONFIG_ALTIVEC)		+= arch/powerpc/kernel/vector.o
 head-$(CONFIG_PPC_OF_BOOT_TRAMPOLINE)  += arch/powerpc/kernel/prom_init.o
 
+
 core-y				+= arch/powerpc/kernel/ \
 				   arch/powerpc/mm/ \
 				   arch/powerpc/lib/ \
diff --git a/arch/powerpc/kernel/Makefile b/arch/powerpc/kernel/Makefile
index 2da380f..b356e59 100644
--- a/arch/powerpc/kernel/Makefile
+++ b/arch/powerpc/kernel/Makefile
@@ -4,7 +4,10 @@ 
 
 CFLAGS_ptrace.o		+= -DUTS_MACHINE='"$(UTS_MACHINE)"'
 
+ccflags-y		+= -fno-function-sections -fno-data-sections
+
 subdir-ccflags-$(CONFIG_PPC_WERROR) := -Werror
+subdir-ccflags-y	+= -fno-function-sections -fno-data-sections
 
 ifeq ($(CONFIG_PPC64),y)
 CFLAGS_prom_init.o	+= $(NO_MINIMAL_TOC)
diff --git a/arch/powerpc/kernel/vmlinux.lds.S b/arch/powerpc/kernel/vmlinux.lds.S
index 959c131..0856d62 100644
--- a/arch/powerpc/kernel/vmlinux.lds.S
+++ b/arch/powerpc/kernel/vmlinux.lds.S
@@ -56,16 +56,16 @@  SECTIONS
 	 * in order to optimize stub generation.
 	 */
 	.head.text : AT(ADDR(.head.text) - LOAD_OFFSET) {
-		*(.head.text.first_256B);
+		KEEP(*(.head.text.first_256B));
 #ifndef CONFIG_PPC_BOOK3S
 		. = 0x100;
 #else
-		*(.head.text.real_vectors);
-		*(.head.text.real_trampolines);
-		*(.head.text.virt_vectors);
-		*(.head.text.virt_trampolines);
+		KEEP(*(.head.text.real_vectors));
+		KEEP(*(.head.text.real_trampolines));
+		KEEP(*(.head.text.virt_vectors));
+		KEEP(*(.head.text.virt_trampolines));
 #if defined(CONFIG_PPC_PSERIES) || defined(CONFIG_PPC_POWERNV)
-		*(.head.data.fwnmi_page);
+		KEEP(*(.head.data.fwnmi_page));
 		. = 0x8000;
 #else
 		. = 0x7000;
diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h
index 6a67ab9..3a35719 100644
--- a/include/asm-generic/vmlinux.lds.h
+++ b/include/asm-generic/vmlinux.lds.h
@@ -312,76 +312,76 @@ 
 	/* Kernel symbol table: Normal symbols */			\
 	__ksymtab         : AT(ADDR(__ksymtab) - LOAD_OFFSET) {		\
 		VMLINUX_SYMBOL(__start___ksymtab) = .;			\
-		*(SORT(___ksymtab+*))					\
+		KEEP(*(SORT(___ksymtab+*)))				\
 		VMLINUX_SYMBOL(__stop___ksymtab) = .;			\
 	}								\
 									\
 	/* Kernel symbol table: GPL-only symbols */			\
 	__ksymtab_gpl     : AT(ADDR(__ksymtab_gpl) - LOAD_OFFSET) {	\
 		VMLINUX_SYMBOL(__start___ksymtab_gpl) = .;		\
-		*(SORT(___ksymtab_gpl+*))				\
+		KEEP(*(SORT(___ksymtab_gpl+*)))				\
 		VMLINUX_SYMBOL(__stop___ksymtab_gpl) = .;		\
 	}								\
 									\
 	/* Kernel symbol table: Normal unused symbols */		\
 	__ksymtab_unused  : AT(ADDR(__ksymtab_unused) - LOAD_OFFSET) {	\
 		VMLINUX_SYMBOL(__start___ksymtab_unused) = .;		\
-		*(SORT(___ksymtab_unused+*))				\
+		KEEP(*(SORT(___ksymtab_unused+*)))			\
 		VMLINUX_SYMBOL(__stop___ksymtab_unused) = .;		\
 	}								\
 									\
 	/* Kernel symbol table: GPL-only unused symbols */		\
 	__ksymtab_unused_gpl : AT(ADDR(__ksymtab_unused_gpl) - LOAD_OFFSET) { \
 		VMLINUX_SYMBOL(__start___ksymtab_unused_gpl) = .;	\
-		*(SORT(___ksymtab_unused_gpl+*))			\
+		KEEP(*(SORT(___ksymtab_unused_gpl+*)))			\
 		VMLINUX_SYMBOL(__stop___ksymtab_unused_gpl) = .;	\
 	}								\
 									\
 	/* Kernel symbol table: GPL-future-only symbols */		\
 	__ksymtab_gpl_future : AT(ADDR(__ksymtab_gpl_future) - LOAD_OFFSET) { \
 		VMLINUX_SYMBOL(__start___ksymtab_gpl_future) = .;	\
-		*(SORT(___ksymtab_gpl_future+*))			\
+		KEEP(*(SORT(___ksymtab_gpl_future+*)))			\
 		VMLINUX_SYMBOL(__stop___ksymtab_gpl_future) = .;	\
 	}								\
 									\
 	/* Kernel symbol table: Normal symbols */			\
 	__kcrctab         : AT(ADDR(__kcrctab) - LOAD_OFFSET) {		\
 		VMLINUX_SYMBOL(__start___kcrctab) = .;			\
-		*(SORT(___kcrctab+*))					\
+		KEEP(*(SORT(___kcrctab+*)))				\
 		VMLINUX_SYMBOL(__stop___kcrctab) = .;			\
 	}								\
 									\
 	/* Kernel symbol table: GPL-only symbols */			\
 	__kcrctab_gpl     : AT(ADDR(__kcrctab_gpl) - LOAD_OFFSET) {	\
 		VMLINUX_SYMBOL(__start___kcrctab_gpl) = .;		\
-		*(SORT(___kcrctab_gpl+*))				\
+		KEEP(*(SORT(___kcrctab_gpl+*)))				\
 		VMLINUX_SYMBOL(__stop___kcrctab_gpl) = .;		\
 	}								\
 									\
 	/* Kernel symbol table: Normal unused symbols */		\
 	__kcrctab_unused  : AT(ADDR(__kcrctab_unused) - LOAD_OFFSET) {	\
 		VMLINUX_SYMBOL(__start___kcrctab_unused) = .;		\
-		*(SORT(___kcrctab_unused+*))				\
+		KEEP(*(SORT(___kcrctab_unused+*)))			\
 		VMLINUX_SYMBOL(__stop___kcrctab_unused) = .;		\
 	}								\
 									\
 	/* Kernel symbol table: GPL-only unused symbols */		\
 	__kcrctab_unused_gpl : AT(ADDR(__kcrctab_unused_gpl) - LOAD_OFFSET) { \
 		VMLINUX_SYMBOL(__start___kcrctab_unused_gpl) = .;	\
-		*(SORT(___kcrctab_unused_gpl+*))			\
+		KEEP(*(SORT(___kcrctab_unused_gpl+*)))			\
 		VMLINUX_SYMBOL(__stop___kcrctab_unused_gpl) = .;	\
 	}								\
 									\
 	/* Kernel symbol table: GPL-future-only symbols */		\
 	__kcrctab_gpl_future : AT(ADDR(__kcrctab_gpl_future) - LOAD_OFFSET) { \
 		VMLINUX_SYMBOL(__start___kcrctab_gpl_future) = .;	\
-		*(SORT(___kcrctab_gpl_future+*))			\
+		KEEP(*(SORT(___kcrctab_gpl_future+*)))			\
 		VMLINUX_SYMBOL(__stop___kcrctab_gpl_future) = .;	\
 	}								\
 									\
 	/* Kernel symbol table: strings */				\
         __ksymtab_strings : AT(ADDR(__ksymtab_strings) - LOAD_OFFSET) {	\
-		*(__ksymtab_strings)					\
+		KEEP(*(__ksymtab_strings))				\
 	}								\
 									\
 	/* __*init sections */						\
@@ -519,6 +519,7 @@ 
 
 /* init and exit section handling */
 #define INIT_DATA							\
+	KEEP(*(SORT(___kentry+*)))					\
 	*(.init.data)							\
 	MEM_DISCARD(init.data)						\
 	KERNEL_CTORS()							\
@@ -695,9 +696,9 @@ 
 #define INIT_RAM_FS							\
 	. = ALIGN(4);							\
 	VMLINUX_SYMBOL(__initramfs_start) = .;				\
-	*(.init.ramfs)							\
+	KEEP(*(.init.ramfs))						\
 	. = ALIGN(8);							\
-	*(.init.ramfs.info)
+	KEEP(*(.init.ramfs.info))
 #else
 #define INIT_RAM_FS
 #endif
diff --git a/include/linux/export.h b/include/linux/export.h
index 2f9ccbe..a921862 100644
--- a/include/linux/export.h
+++ b/include/linux/export.h
@@ -46,7 +46,7 @@  extern struct module __this_module;
 	extern __visible void *__crc_##sym __attribute__((weak));		\
 	static const unsigned long __kcrctab_##sym		\
 	__used							\
-	__attribute__((section("___kcrctab" sec "+" #sym), unused))	\
+	__attribute__((section("___kcrctab" sec "+" #sym ",\"a\",@note #"), used))	\
 	= (unsigned long) &__crc_##sym;
 #else
 #define __CRC_SYMBOL(sym, sec)
@@ -57,12 +57,12 @@  extern struct module __this_module;
 	extern typeof(sym) sym;					\
 	__CRC_SYMBOL(sym, sec)					\
 	static const char __kstrtab_##sym[]			\
-	__attribute__((section("__ksymtab_strings"), aligned(1))) \
+	__attribute__((section("__ksymtab_strings" ",\"a\",@note #"), aligned(1))) \
 	= VMLINUX_SYMBOL_STR(sym);				\
 	extern const struct kernel_symbol __ksymtab_##sym;	\
 	__visible const struct kernel_symbol __ksymtab_##sym	\
 	__used							\
-	__attribute__((section("___ksymtab" sec "+" #sym), unused))	\
+	__attribute__((section("___ksymtab" sec "+" #sym ",\"a\",@note #"), used))	\
 	= { (unsigned long)&sym, __kstrtab_##sym }
 
 #if defined(__KSYM_DEPS__)
diff --git a/include/linux/init.h b/include/linux/init.h
index aedb254..51393f4 100644
--- a/include/linux/init.h
+++ b/include/linux/init.h
@@ -156,19 +156,20 @@  extern bool initcall_debug;
 
 #ifndef __ASSEMBLY__
 
-#ifdef CONFIG_LTO
+#if 1
 /* Work around a LTO gcc problem: when there is no reference to a variable
  * in a module it will be moved to the end of the program. This causes
  * reordering of initcalls which the kernel does not like.
  * Add a dummy reference function to avoid this. The function is
  * deleted by the linker.
  */
-#define LTO_REFERENCE_INITCALL(x) \
-	; /* yes this is needed */			\
-	static __used __exit void *reference_##x(void)	\
-	{						\
-		return &x;				\
-	}
+#define LTO_REFERENCE_INITCALL(sym) \
+	extern typeof(sym) sym;					\
+	/* extern const unsigned long __kentry_##sym; */		\
+	static /* __visible */ const unsigned long __kentry_##sym		\
+	__used							\
+	__attribute__((section("___kentry" "+" #sym ",\"a\",@note #"), used)) \
+	= (unsigned long)&sym;
 #else
 #define LTO_REFERENCE_INITCALL(x)
 #endif
@@ -222,16 +223,18 @@  extern bool initcall_debug;
 
 #define __initcall(fn) device_initcall(fn)
 
-#define __exitcall(fn) \
-	static exitcall_t __exitcall_##fn __exit_call = fn
+#define __exitcall(fn)						\
+	static exitcall_t __exitcall_##fn __exit_call = fn;	\
 
-#define console_initcall(fn) \
-	static initcall_t __initcall_##fn \
-	__used __section(.con_initcall.init) = fn
+#define console_initcall(fn)					\
+	static initcall_t __initcall_##fn			\
+	__used __section(.con_initcall.init) = fn;		\
+	LTO_REFERENCE_INITCALL(__initcall_##fn)
 
-#define security_initcall(fn) \
-	static initcall_t __initcall_##fn \
-	__used __section(.security_initcall.init) = fn
+#define security_initcall(fn)					\
+	static initcall_t __initcall_##fn			\
+	__used __section(.security_initcall.init) = fn;		\
+	LTO_REFERENCE_INITCALL(__initcall_##fn)
 
 struct obs_kernel_param {
 	const char *str;
diff --git a/init/Makefile b/init/Makefile
index 7bc47ee..c4fb455 100644
--- a/init/Makefile
+++ b/init/Makefile
@@ -2,6 +2,8 @@ 
 # Makefile for the linux kernel.
 #
 
+ccflags-y := -fno-function-sections -fno-data-sections
+
 obj-y                          := main.o version.o mounts.o
 ifneq ($(CONFIG_BLK_DEV_INITRD),y)
 obj-y                          += noinitramfs.o
diff --git a/scripts/link-vmlinux.sh b/scripts/link-vmlinux.sh
index ef4658f..fb848af 100755
--- a/scripts/link-vmlinux.sh
+++ b/scripts/link-vmlinux.sh
@@ -37,17 +37,22 @@  info()
 	fi
 }
 
+# Grab all the EXPORT_SYMBOL symbols in the vmlinux build
+# ${1} - output file
+exports_extract()
+{
+	${NM} -g ${KBUILD_VMLINUX_INIT} ${KBUILD_VMLINUX_MAIN} |
+		grep "R __ksymtab_" |
+		sed 's/.*__ksymtab_\(.*\)$/\1/' > ${1}
+}
+
 # Link of vmlinux.o used for section mismatch analysis
 # ${1} output file
 modpost_link()
 {
 	local objects
 
-	if [ -n "${CONFIG_THIN_ARCHIVES}" ]; then
-		objects="--whole-archive ${KBUILD_VMLINUX_INIT} ${KBUILD_VMLINUX_MAIN} --no-whole-archive"
-	else
-		objects="${KBUILD_VMLINUX_INIT} --start-group ${KBUILD_VMLINUX_MAIN} --end-group"
-	fi
+	objects="--whole-archive ${KBUILD_VMLINUX_INIT} ${KBUILD_VMLINUX_MAIN}"
 	${LD} ${LDFLAGS} -r -o ${1} ${objects}
 }
 
@@ -60,11 +65,7 @@  vmlinux_link()
 	local objects
 
 	if [ "${SRCARCH}" != "um" ]; then
-		if [ -n "${CONFIG_THIN_ARCHIVES}" ]; then
-			objects="--whole-archive ${KBUILD_VMLINUX_INIT} ${KBUILD_VMLINUX_MAIN} --no-whole-archive"
-		else
-			objects="${KBUILD_VMLINUX_INIT} --start-group ${KBUILD_VMLINUX_MAIN} --end-group"
-		fi
+		objects="--whole-archive ${KBUILD_VMLINUX_INIT} ${KBUILD_VMLINUX_MAIN}"
 		${LD} ${LDFLAGS} ${LDFLAGS_vmlinux} -o ${2}                  \
 			-T ${lds} ${objects} ${1}
 	else