powerpc allyesconfig / allmodconfig linux-next next-20160729 - next-20160729 build failures

Message ID 2852406.SOgyPXcJfO@wuerfel (mailing list archive)
State Not Applicable

Commit Message

Arnd Bergmann Aug. 5, 2016, 4:01 p.m. UTC
On Friday, August 5, 2016 10:26:25 PM CEST Nicholas Piggin wrote:
> On Fri, 05 Aug 2016 12:17:27 +0200
> Arnd Bergmann <arnd@arndb.de> wrote:

> > and I also get link errors for the .text.fixup section
> > for any users of __put_user() in really large kernels:
> > net/batman-adv/batman-adv.o:(.text.fixup+0x4): relocation truncated to fit: R_ARM_JUMP24 against `.text.batadv_log_read'
> 
> This may be fixed by fixing the linker script to bring in the new
> sections properly (see new patchset).
> 
> If not, then if you can combine the sections rather than have them
> consecutive in the output, e.g.,:
> 
>     *(.text .text.fixup)
> 
> Rather than
> 
>     *(.text)            
>     *(.text.fixup)
> 
> Then the linker has more freedom to rearrange them. I realize it's
> not that simple with ARM's .text.fixup, but maybe that helps you
> get it to work.

This did the trick:



It also got much faster again, the link time for an allyesconfig
kernel is now 18 minutes instead of 10 hours, but it's still
much worse than the 2 minutes I had earlier or the four minutes
with the previous patch.

	Arnd

Comments

Nicholas Piggin Aug. 5, 2016, 4:16 p.m. UTC | #1
On Fri, 05 Aug 2016 18:01:13 +0200
Arnd Bergmann <arnd@arndb.de> wrote:

> On Friday, August 5, 2016 10:26:25 PM CEST Nicholas Piggin wrote:
> > On Fri, 05 Aug 2016 12:17:27 +0200
> > Arnd Bergmann <arnd@arndb.de> wrote:  
> 
> > > and I also get link errors for the .text.fixup section
> > > for any users of __put_user() in really large kernels:
> > > net/batman-adv/batman-adv.o:(.text.fixup+0x4): relocation truncated to fit: R_ARM_JUMP24 against `.text.batadv_log_read'  
> > 
> > This may be fixed by fixing the linker script to bring in the new
> > sections properly (see new patchset).
> > 
> > If not, then if you can combine the sections rather than have them
> > consecutive in the output, e.g.,:
> > 
> >     *(.text .text.fixup)
> > 
> > Rather than
> > 
> >     *(.text)            
> >     *(.text.fixup)
> > 
> > Then the linker has more freedom to rearrange them. I realize it's
> > not that simple with ARM's .text.fixup, but maybe that helps you
> > get it to work.  
> 
> This did the trick:
> 
> diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h
> index 0ec807d69f18..7a3ad269fa23 100644
> --- a/include/asm-generic/vmlinux.lds.h
> +++ b/include/asm-generic/vmlinux.lds.h
> @@ -433,7 +433,7 @@
>   * during second ld run in second ld pass when generating System.map */
>  #define TEXT_TEXT							\
>  		ALIGN_FUNCTION();					\
> -		*(.text.hot .text .text.fixup .text.unlikely)		\
> +		*(.text.hot .text .text.* .text.fixup .text.unlikely)	\
>  		*(.ref.text)						\
>  	MEM_KEEP(init.text)						\
>  	MEM_KEEP(exit.text)						\
> 
> 
> It also got much faster again, the link time for an allyesconfig
> kernel is now 18 minutes instead of 10 hours, but it's still
> much worse than the 2 minutes I had earlier or the four minutes
> with the previous patch.

Are you using the patches I just sent? Either way, you also need
to do the same for data and bss sections as you are using
-fdata-sections too.
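
As a rough illustration of "the same for data and bss": a minimal sketch, assuming -fdata-sections emits input sections named .data.<symbol> and .bss.<symbol>. The macro names EXAMPLE_DATA and EXAMPLE_BSS are hypothetical, and this is not the change from the patch set itself.

```c
/*
 * Hypothetical illustration only, written in the style of
 * include/asm-generic/vmlinux.lds.h: list the per-symbol sections in
 * the same input statement as their parent section, so the linker is
 * free to arrange them together.
 */
#define EXAMPLE_DATA							\
		*(.data .data.*)

#define EXAMPLE_BSS							\
		*(.bss .bss.*)
```

One caveat with this sketch: a bare .data.* glob also matches names such as .data..read_mostly that the kernel places elsewhere, so a real change would likely need a narrower pattern or careful statement ordering.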

I've found virtually no build time regression on powerpc or x86
when those are taken care of properly (the x86 number I sent had a
typo: it's 5m02, not 5m20).

Thanks,
Nick

Arnd Bergmann Aug. 5, 2016, 7:16 p.m. UTC | #2
On Saturday, August 6, 2016 2:16:42 AM CEST Nicholas Piggin wrote:
> > 
> > diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h
> > index 0ec807d69f18..7a3ad269fa23 100644
> > --- a/include/asm-generic/vmlinux.lds.h
> > +++ b/include/asm-generic/vmlinux.lds.h
> > @@ -433,7 +433,7 @@
> >   * during second ld run in second ld pass when generating System.map */
> >  #define TEXT_TEXT                                                    \
> >               ALIGN_FUNCTION();                                       \
> > -             *(.text.hot .text .text.fixup .text.unlikely)           \
> > +             *(.text.hot .text .text.* .text.fixup .text.unlikely)   \
> >               *(.ref.text)                                            \
> >       MEM_KEEP(init.text)                                             \
> >       MEM_KEEP(exit.text)                                             \
> > 
> > 
> > It also got much faster again, the link time for an allyesconfig
> > kernel is now 18 minutes instead of 10 hours, but it's still
> > much worse than the 2 minutes I had earlier or the four minutes
> > with the previous patch.
> 
> Are you using the patches I just sent?

Not yet, I was still busy with the older version, and trying to
figure out exactly what went wrong in ld.bfd. FWIW, I first tried
to see if the hash tables were just too small, but as it turned
out that was not the problem. When I tried to change the default
hash table sizes, making them bigger only made things slower.

I also found the --hash-size=xxx option, which has a significant
impact on runtime speed. Interestingly again, using sizes less
than the default made things faster in practice. If we can
work out the optimum size for the kernel build, that might
shave a few minutes off the total build time.

> Either way, you also need
> to do the same for data and bss sections as you are using
> -fdata-sections too.

Right.

> I've found virtually no build time regression on powerpc or x86
> when those are taken care of properly (x86 numbers I sent are typo,
> it's not 5m20, it's 5m02).

Interesting. I wonder if it's got something to do with the
generation of the branch trampolines on ARM, as we have a lot
of them on an allyesconfig.

Is the 5m20 the total build time for the kernel, the time for
rebuilding after a trivial change, or the time to call 'ld.bfd'
once?

Are you using ld.bfd on x86 or ld.gold? For me ld.gold either
works and is really fast, or it crashes, depending on the
configuration. I also don't think it supports big-endian ARM
(which is what allyesconfig ends up using).

	Arnd

Nicholas Piggin Aug. 6, 2016, 4:17 a.m. UTC | #3
On Fri, 05 Aug 2016 21:16:00 +0200
Arnd Bergmann <arnd@arndb.de> wrote:

> On Saturday, August 6, 2016 2:16:42 AM CEST Nicholas Piggin wrote:
> > > 
> > > diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h
> > > index 0ec807d69f18..7a3ad269fa23 100644
> > > --- a/include/asm-generic/vmlinux.lds.h
> > > +++ b/include/asm-generic/vmlinux.lds.h
> > > @@ -433,7 +433,7 @@
> > >   * during second ld run in second ld pass when generating System.map */
> > >  #define TEXT_TEXT                                                    \
> > >               ALIGN_FUNCTION();                                       \
> > > -             *(.text.hot .text .text.fixup .text.unlikely)           \
> > > +             *(.text.hot .text .text.* .text.fixup .text.unlikely)   \
> > >               *(.ref.text)                                            \
> > >       MEM_KEEP(init.text)                                             \
> > >       MEM_KEEP(exit.text)                                             \
> > > 
> > > 
> > > It also got much faster again, the link time for an allyesconfig
> > > kernel is now 18 minutes instead of 10 hours, but it's still
> > > much worse than the 2 minutes I had earlier or the four minutes
> > > with the previous patch.  
> > 
> > Are you using the patches I just sent?  
> 
> Not yet, I was still busy with the older version, and trying to
> figure out exactly what went wrong in ld.bfd. FWIW, I first tried
> to see if the hash tables were just too small, but as it turned
> out that was not the problem. When I tried to change the default
> hash table sizes, making them bigger only made things slower.
> 
> I also found the --hash-size=xxx option, which has a significant
> impact on runtime speed. Interestingly again, using sizes less
> than the default made things faster in practice. If we can
> work out the optimum size for the kernel build, that might
> shave a few minutes off the total build time.
> 
> > Either way, you also need
> > to do the same for data and bss sections as you are using
> > -fdata-sections too.  
> 
> Right.
> 
> > I've found virtually no build time regression on powerpc or x86
> > when those are taken care of properly (x86 numbers I sent are typo,
> > it's not 5m20, it's 5m02).  
> 
> Interesting. I wonder if it's got something to do with the
> generation of the branch trampolines on ARM, as we have a lot
> of them on an allyesconfig.

Powerpc generates quite a few branch trampolines as well, so
I'm not sure if that would be the issue. Can you get a profile
of the link?

Are you linking with archives? Do your input archives have a
symbol index built?


> Is the 5m20 the total build time for the kernel, the time for
> rebuilding after a trivial change, or the time to call 'ld.bfd'
> once?

5m02 was the total time for x86 defconfig. With the powerpc
allyesconfig build, the final link:

$ time ld -EL -m elf64lppc -pie --emit-relocs --build-id --gc-sections -X -o vmlinux -T ./arch/powerpc/kernel/vmlinux.lds --whole-archive built-in.o .tmp_kallsyms2.o

real	0m15.556s
user	0m13.288s
sys	0m2.240s

$ ls -lh vmlinux
-rwxrwxr-x 1 npiggin npiggin 279M Aug  6 14:02 vmlinux

Without -pie --emit-relocs it's 11.8s and 150M but I'm using
emit-relocs for a post-link step.


> Are you using ld.bfd on x86 or ld.gold? For me ld.gold either
> works and is really fast, or it crashes, depending on the
> configuration. I also don't think it supports big-endian ARM
> (which is what allyesconfig ends up using).

ld.bfd on both. Gold crashed on powerpc and I didn't try it on x86.

Thanks,
Nick

Arnd Bergmann Aug. 6, 2016, 9:13 p.m. UTC | #4
On Saturday, August 6, 2016 2:17:16 PM CEST Nicholas Piggin wrote:
> On Fri, 05 Aug 2016 21:16:00 +0200
> Arnd Bergmann <arnd@arndb.de> wrote:
> 
> > On Saturday, August 6, 2016 2:16:42 AM CEST Nicholas Piggin wrote:
> > > > 
> > > > diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h
> > > > index 0ec807d69f18..7a3ad269fa23 100644
> > > > --- a/include/asm-generic/vmlinux.lds.h
> > > > +++ b/include/asm-generic/vmlinux.lds.h
> > > > @@ -433,7 +433,7 @@
> > > >   * during second ld run in second ld pass when generating System.map */
> > > >  #define TEXT_TEXT                                                    \
> > > >               ALIGN_FUNCTION();                                       \
> > > > -             *(.text.hot .text .text.fixup .text.unlikely)           \
> > > > +             *(.text.hot .text .text.* .text.fixup .text.unlikely)   \
> > > >               *(.ref.text)                                            \
> > > >       MEM_KEEP(init.text)                                             \
> > > >       MEM_KEEP(exit.text)                                             \
> > > > 
> > > > 
> > > > It also got much faster again, the link time for an allyesconfig
> > > > kernel is now 18 minutes instead of 10 hours, but it's still
> > > > much worse than the 2 minutes I had earlier or the four minutes
> > > > with the previous patch.  
> > > 
> > > Are you using the patches I just sent?  
> > 
> > Not yet, I was still busy with the older version, and trying to
> > figure out exactly what went wrong in ld.bfd. FWIW, I first tried
> > to see if the hash tables were just too small, but as it turned
> > out that was not the problem. When I tried to change the default
> > hash table sizes, making them bigger only made things slower.
> > 
> > I also found the --hash-size=xxx option, which has a significant
> > impact on runtime speed. Interestingly again, using sizes less
> > than the default made things faster in practice. If we can
> > work out the optimum size for the kernel build, that might
> > shave a few minutes off the total build time.
> > 
> > > Either way, you also need
> > > to do the same for data and bss sections as you are using
> > > -fdata-sections too.  
> > 
> > Right.
> > 
> > > I've found virtually no build time regression on powerpc or x86
> > > when those are taken care of properly (x86 numbers I sent are typo,
> > > it's not 5m20, it's 5m02).  
> > 
> > Interesting. I wonder if it's got something to do with the
> > generation of the branch trampolines on ARM, as we have a lot
> > of them on an allyesconfig.
> 
> Powerpc generates quite a few branch trampolines as well, so
> I'm not sure if that would be the issue. Can you get a profile
> of the link?


CPU: AMD64 family15h, speed 2600 MHz (estimated)
Counted CPU_CLK_UNHALTED events (CPU Clocks not Halted) with a unit mask of 0x00 (No unit mask) count 100000
samples  %        image name               symbol name
1212556  63.6990  ld-new                   bfd_hash_lookup
416050   21.8563  ld-new                   bfd_hash_hash
64861     3.4073  no-vmlinux               /no-vmlinux
59038     3.1014  ld-new                   bfd_hash_traverse
13873     0.7288  ld-new                   bfd_get_next_section_by_name
9880      0.5190  ld-new                   strrevcmp

I've manually marked bfd_hash_hash as __attribute__((noinline))
to see it separately from bfd_hash_lookup.
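
For anyone who wants to repeat the experiment, the trick looks roughly like this. It is a minimal sketch, assuming GCC or Clang; demo_hash, demo_insert, and demo_lookup are hypothetical stand-ins and not the binutils code. Without the attribute the compiler may inline the hash into its caller, and a sampling profiler then charges all of the samples to the caller.

```c
#include <stddef.h>
#include <string.h>

#define DEMO_BUCKETS 8

struct demo_entry {
	const char *key;
	struct demo_entry *next;
};

static struct demo_entry *demo_table[DEMO_BUCKETS];

/*
 * Hypothetical string hash, not the binutils implementation.
 * noinline keeps it a separate function, so a sampling profiler
 * can report its samples under demo_hash instead of folding them
 * into demo_lookup.
 */
static __attribute__((noinline)) unsigned long demo_hash(const char *s)
{
	unsigned long h = 0;

	while (*s)
		h = h * 31 + (unsigned char)*s++;
	return h;
}

/* Add an entry at the head of its bucket chain. */
static void demo_insert(struct demo_entry *e)
{
	unsigned long i = demo_hash(e->key) % DEMO_BUCKETS;

	e->next = demo_table[i];
	demo_table[i] = e;
}

/* Return the entry for key, or NULL if it is not in the table. */
static struct demo_entry *demo_lookup(const char *key)
{
	struct demo_entry *e = demo_table[demo_hash(key) % DEMO_BUCKETS];

	while (e && strcmp(e->key, key) != 0)
		e = e->next;
	return e;
}
```

Every call to demo_lookup() or demo_insert() goes through demo_hash(), so the two show up as separate rows in a profile, much like bfd_hash_lookup and bfd_hash_hash in the table above.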

The vast majority of these calls seem to come from _bfd_elf_strtab_add
and from bfd_get_section_by_name/bfd_get_next_section_by_name.

While I first thought the hash tables were too slow, investigating
further showed that most of the hash tables are really small
(and appropriately sized), we just do a lot of lookups on them.

> Are you linking with archives? Do your input archives have a
> symbol index built?

Yes to archives; I don't know about the symbol index. I've moved on
to your new patches now and will see how that goes.

> > Is the 5m20 the total build time for the kernel, the time for
> > rebuilding after a trivial change, or the time to call 'ld.bfd'
> > once?
> 
> 5m02 was the total time for x86 defconfig. With the powerpc
> allyesconfig build, the final link:
> 
> $ time ld -EL -m elf64lppc -pie --emit-relocs --build-id --gc-sections -X -o vmlinux -T ./arch/powerpc/kernel/vmlinux.lds --whole-archive built-in.o .tmp_kallsyms2.o
> 
> real	0m15.556s
> user	0m13.288s
> sys	0m2.240s
> 
> $ ls -lh vmlinux
> -rwxrwxr-x 1 npiggin npiggin 279M Aug  6 14:02 vmlinux
> 
> Without -pie --emit-relocs it's 11.8s and 150M but I'm using
> emit-relocs for a post-link step.

Interesting, that does sound more like an ARM specific bug in ld
then. 

> > Are you using ld.bfd on x86 or ld.gold? For me ld.gold either
> > works and is really fast, or it crashes, depending on the
> > configuration. I also don't think it supports big-endian ARM
> > (which is what allyesconfig ends up using).
> 
> ld.bfd on both. Gold crashed on powerpc and I didn't try it on x86.

Ok.

	Arnd
Patch

diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h
index 0ec807d69f18..7a3ad269fa23 100644
--- a/include/asm-generic/vmlinux.lds.h
+++ b/include/asm-generic/vmlinux.lds.h
@@ -433,7 +433,7 @@ 
  * during second ld run in second ld pass when generating System.map */
 #define TEXT_TEXT							\
 		ALIGN_FUNCTION();					\
-		*(.text.hot .text .text.fixup .text.unlikely)		\
+		*(.text.hot .text .text.* .text.fixup .text.unlikely)	\
 		*(.ref.text)						\
 	MEM_KEEP(init.text)						\
 	MEM_KEEP(exit.text)						\