Message ID | 2852406.SOgyPXcJfO@wuerfel (mailing list archive) |
---|---|
State | Not Applicable |
On Fri, 05 Aug 2016 18:01:13 +0200
Arnd Bergmann <arnd@arndb.de> wrote:

> On Friday, August 5, 2016 10:26:25 PM CEST Nicholas Piggin wrote:
> > On Fri, 05 Aug 2016 12:17:27 +0200
> > Arnd Bergmann <arnd@arndb.de> wrote:
> >
> > > and I also get link errors for the .text.fixup section
> > > for any users of __put_user() in really large kernels:
> > > net/batman-adv/batman-adv.o:(.text.fixup+0x4): relocation truncated to fit: R_ARM_JUMP24 against `.text.batadv_log_read'
> >
> > This may be fixed by fixing the linker script to bring in the new
> > sections properly (see new patchset).
> >
> > If not, then if you can combine the sections rather than have them
> > consecutive in the output, e.g.,:
> >
> > *(.text .text.fixup)
> >
> > Rather than
> >
> > *(.text)
> > *(.text.fixup)
> >
> > Then the linker has more freedom to rearrange them. I realize it's
> > not that simple with ARM's .text.fixup, but maybe that helps you
> > get it to work.
>
> This did the trick:
>
> diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h
> index 0ec807d69f18..7a3ad269fa23 100644
> --- a/include/asm-generic/vmlinux.lds.h
> +++ b/include/asm-generic/vmlinux.lds.h
> @@ -433,7 +433,7 @@
>   * during second ld run in second ld pass when generating System.map */
>  #define TEXT_TEXT \
>  ALIGN_FUNCTION(); \
> - *(.text.hot .text .text.fixup .text.unlikely) \
> + *(.text.hot .text .text.* .text.fixup .text.unlikely) \
>  *(.ref.text) \
>  MEM_KEEP(init.text) \
>  MEM_KEEP(exit.text) \
>
>
> It also got much faster again, the link time for an allyesconfig
> kernel is now 18 minutes instead of 10 hours, but it's still
> much worse than the 2 minutes I had earlier or the four minutes
> with the previous patch.

Are you using the patches I just sent? Either way, you also need
to do the same for data and bss sections as you are using
-fdata-sections too.

I've found virtually no build time regression on powerpc or x86
when those are taken care of properly (x86 numbers I sent are typo,
it's not 5m20, it's 5m02).

Thanks,
Nick
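Editor's sketch (not part of the thread): the effect of adding `.text.*` to the TEXT_TEXT pattern list can be checked by matching section names against the old and new lists. The sketch assumes Python's `fnmatch` is a close enough stand-in for ld's glob matching for these simple patterns; the helper name `matching_patterns` and the section name `.text.some_function` are made up for illustration.

```python
from fnmatch import fnmatchcase

# Input-section patterns from TEXT_TEXT before and after the one-line change.
OLD_PATTERNS = [".text.hot", ".text", ".text.fixup", ".text.unlikely"]
NEW_PATTERNS = [".text.hot", ".text", ".text.*", ".text.fixup", ".text.unlikely"]


def matching_patterns(section, patterns):
    """Return every pattern in the list that matches the section name."""
    return [p for p in patterns if fnmatchcase(section, p)]


# .text.fixup and .text.batadv_log_read come from the link error quoted above;
# .text.some_function is a hypothetical per-function section name.
SECTIONS = [".text", ".text.fixup", ".text.batadv_log_read", ".text.some_function"]

for name in SECTIONS:
    old = matching_patterns(name, OLD_PATTERNS)
    new = matching_patterns(name, NEW_PATTERNS)
    print(f"{name:24} old={old} new={new}")
```

Under these assumptions the old list matches no per-function section such as `.text.batadv_log_read`, so that section is not gathered into the same input statement as `.text.fixup`; the new list picks it up through `.text.*`, which also makes the explicit `.text.fixup` and `.text.unlikely` entries redundant.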
On Saturday, August 6, 2016 2:16:42 AM CEST Nicholas Piggin wrote:
> >
> > diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h
> > index 0ec807d69f18..7a3ad269fa23 100644
> > --- a/include/asm-generic/vmlinux.lds.h
> > +++ b/include/asm-generic/vmlinux.lds.h
> > @@ -433,7 +433,7 @@
> >   * during second ld run in second ld pass when generating System.map */
> >  #define TEXT_TEXT \
> >  ALIGN_FUNCTION(); \
> > - *(.text.hot .text .text.fixup .text.unlikely) \
> > + *(.text.hot .text .text.* .text.fixup .text.unlikely) \
> >  *(.ref.text) \
> >  MEM_KEEP(init.text) \
> >  MEM_KEEP(exit.text) \
> >
> >
> > It also got much faster again, the link time for an allyesconfig
> > kernel is now 18 minutes instead of 10 hours, but it's still
> > much worse than the 2 minutes I had earlier or the four minutes
> > with the previous patch.
>
> Are you using the patches I just sent?

Not yet, I was still busy with the older version, and trying to
figure out exactly what went wrong in ld.bfd. FWIW, I first tried
to see if the hash tables were just too small, but as it turned
out that was not the problem. When I tried to change the default
hash table sizes, making them bigger only made things slower.

I also found the --hash-size=xxx option, which has a significant
impact on runtime speed. Interestingly again, using sizes less
than the default made things faster in practice. If we can
work out the optimum size for the kernel build, that might
shave a few minutes off the total build time.

> Either way, you also need
> to do the same for data and bss sections as you are using
> -fdata-sections too.

Right.

> I've found virtually no build time regression on powerpc or x86
> when those are taken care of properly (x86 numbers I sent are typo,
> it's not 5m20, it's 5m02).

Interesting. I wonder if it's got something to do with the
generation of the branch trampolines on ARM, as we have a lot
of them on an allyesconfig.

Is the 5m20 the total build time for the kernel, the time for
rebuilding after a trivial change, or the time to call 'ld.bfd'
once?

Are you using ld.bfd on x86 or ld.gold? For me ld.gold either
works and is really fast, or it crashes, depending on the
configuration. I also don't think it supports big-endian ARM
(which is what allyesconfig ends up using).

	Arnd
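Editor's sketch (not part of the thread): one way to look for the "optimum size" mentioned above is to time the same link with several `--hash-size` values. This is a minimal sketch only; the link command line and the candidate sizes are hypothetical placeholders, and the helper names (`link_command`, `time_link`, `sweep`) are invented for illustration.

```python
import shlex
import subprocess
import time


def link_command(base_cmd, hash_size):
    """Return a copy of base_cmd with ld's --hash-size option appended."""
    return list(base_cmd) + [f"--hash-size={hash_size}"]


def time_link(cmd):
    """Run one link and return wall-clock seconds; raises if the link fails."""
    start = time.perf_counter()
    subprocess.run(cmd, check=True)
    return time.perf_counter() - start


def sweep(base_cmd, sizes, runner=time_link):
    """Time the link once per hash size and return {size: seconds}."""
    return {size: runner(link_command(base_cmd, size)) for size in sizes}


if __name__ == "__main__":
    # Hypothetical link line and arbitrary candidate sizes; substitute real ones.
    base = shlex.split("ld.bfd -o vmlinux -T vmlinux.lds built-in.o")
    for size in (257, 1021, 4051):
        # Only print the commands here; call sweep(base, sizes) to run them.
        print(shlex.join(link_command(base, size)))
```

A single run per size is noisy, so repeating each measurement and comparing medians would give a more trustworthy answer than the one-shot timing sketched here.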
On Fri, 05 Aug 2016 21:16:00 +0200
Arnd Bergmann <arnd@arndb.de> wrote:

> On Saturday, August 6, 2016 2:16:42 AM CEST Nicholas Piggin wrote:
> > >
> > > diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h
> > > index 0ec807d69f18..7a3ad269fa23 100644
> > > --- a/include/asm-generic/vmlinux.lds.h
> > > +++ b/include/asm-generic/vmlinux.lds.h
> > > @@ -433,7 +433,7 @@
> > >   * during second ld run in second ld pass when generating System.map */
> > >  #define TEXT_TEXT \
> > >  ALIGN_FUNCTION(); \
> > > - *(.text.hot .text .text.fixup .text.unlikely) \
> > > + *(.text.hot .text .text.* .text.fixup .text.unlikely) \
> > >  *(.ref.text) \
> > >  MEM_KEEP(init.text) \
> > >  MEM_KEEP(exit.text) \
> > >
> > >
> > > It also got much faster again, the link time for an allyesconfig
> > > kernel is now 18 minutes instead of 10 hours, but it's still
> > > much worse than the 2 minutes I had earlier or the four minutes
> > > with the previous patch.
> >
> > Are you using the patches I just sent?
>
> Not yet, I was still busy with the older version, and trying to
> figure out exactly what went wrong in ld.bfd. FWIW, I first tried
> to see if the hash tables were just too small, but as it turned
> out that was not the problem. When I tried to change the default
> hash table sizes, making them bigger only made things slower.
>
> I also found the --hash-size=xxx option, which has a significant
> impact on runtime speed. Interestingly again, using sizes less
> than the default made things faster in practice. If we can
> work out the optimum size for the kernel build, that might
> shave a few minutes off the total build time.
>
> > Either way, you also need
> > to do the same for data and bss sections as you are using
> > -fdata-sections too.
>
> Right.
>
> > I've found virtually no build time regression on powerpc or x86
> > when those are taken care of properly (x86 numbers I sent are typo,
> > it's not 5m20, it's 5m02).
>
> Interesting. I wonder if it's got something to do with the
> generation of the branch trampolines on ARM, as we have a lot
> of them on an allyesconfig.

Powerpc generates quite a few branch trampolines as well, so
I'm not sure if that would be the issue. Can you get a profile
of the link?

Are you linking with archives? Do your input archives have a
symbol index built?

> Is the 5m20 the total build time for the kernel, the time for
> rebuilding after a trivial change, or the time to call 'ld.bfd'
> once?

5m02 was the total time for x86 defconfig. With the powerpc
allyesconfig build, the final link:

$ time ld -EL -m elf64lppc -pie --emit-relocs --build-id --gc-sections -X -o vmlinux -T ./arch/powerpc/kernel/vmlinux.lds --whole-archive built-in.o .tmp_kallsyms2.o

real 0m15.556s
user 0m13.288s
sys 0m2.240s

$ ls -lh vmlinux
-rwxrwxr-x 1 npiggin npiggin 279M Aug 6 14:02 vmlinux

Without -pie --emit-relocs it's 11.8s and 150M but I'm using
emit-relocs for a post-link step.

> Are you using ld.bfd on x86 or ld.gold? For me ld.gold either
> works and is really fast, or it crashes, depending on the
> configuration. I also don't think it supports big-endian ARM
> (which is what allyesconfig ends up using).

ld.bfd on both. Gold crashed on powerpc and I didn't try it on x86.

Thanks,
Nick
On Saturday, August 6, 2016 2:17:16 PM CEST Nicholas Piggin wrote:
> On Fri, 05 Aug 2016 21:16:00 +0200
> Arnd Bergmann <arnd@arndb.de> wrote:
>
> > On Saturday, August 6, 2016 2:16:42 AM CEST Nicholas Piggin wrote:
> > > >
> > > > diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h
> > > > index 0ec807d69f18..7a3ad269fa23 100644
> > > > --- a/include/asm-generic/vmlinux.lds.h
> > > > +++ b/include/asm-generic/vmlinux.lds.h
> > > > @@ -433,7 +433,7 @@
> > > >   * during second ld run in second ld pass when generating System.map */
> > > >  #define TEXT_TEXT \
> > > >  ALIGN_FUNCTION(); \
> > > > - *(.text.hot .text .text.fixup .text.unlikely) \
> > > > + *(.text.hot .text .text.* .text.fixup .text.unlikely) \
> > > >  *(.ref.text) \
> > > >  MEM_KEEP(init.text) \
> > > >  MEM_KEEP(exit.text) \
> > > >
> > > >
> > > > It also got much faster again, the link time for an allyesconfig
> > > > kernel is now 18 minutes instead of 10 hours, but it's still
> > > > much worse than the 2 minutes I had earlier or the four minutes
> > > > with the previous patch.
> > >
> > > Are you using the patches I just sent?
> >
> > Not yet, I was still busy with the older version, and trying to
> > figure out exactly what went wrong in ld.bfd. FWIW, I first tried
> > to see if the hash tables were just too small, but as it turned
> > out that was not the problem. When I tried to change the default
> > hash table sizes, making them bigger only made things slower.
> >
> > I also found the --hash-size=xxx option, which has a significant
> > impact on runtime speed. Interestingly again, using sizes less
> > than the default made things faster in practice. If we can
> > work out the optimum size for the kernel build, that might
> > shave a few minutes off the total build time.
> >
> > > Either way, you also need
> > > to do the same for data and bss sections as you are using
> > > -fdata-sections too.
> >
> > Right.
> >
> > > I've found virtually no build time regression on powerpc or x86
> > > when those are taken care of properly (x86 numbers I sent are typo,
> > > it's not 5m20, it's 5m02).
> >
> > Interesting. I wonder if it's got something to do with the
> > generation of the branch trampolines on ARM, as we have a lot
> > of them on an allyesconfig.
>
> Powerpc generates quite a few branch trampolines as well, so
> I'm not sure if that would be the issue. Can you get a profile
> of the link?

CPU: AMD64 family15h, speed 2600 MHz (estimated)
Counted CPU_CLK_UNHALTED events (CPU Clocks not Halted) with a unit mask of 0x00 (No unit mask) count 100000
samples  %        image name  symbol name
1212556  63.6990  ld-new      bfd_hash_lookup
416050   21.8563  ld-new      bfd_hash_hash
64861     3.4073  no-vmlinux  /no-vmlinux
59038     3.1014  ld-new      bfd_hash_traverse
13873     0.7288  ld-new      bfd_get_next_section_by_name
9880      0.5190  ld-new      strrevcmp

I've manually marked bfd_hash_hash as __attribute__((noinline)) to
see it separately from bfd_hash_lookup. The vast majority of these
calls seem to come from _bfd_elf_strtab_add and from
bfd_get_section_by_name/bfd_get_next_section_by_name.

While I first thought the hash tables were too slow, investigating
further showed that most of the hash tables are really small (and
appropriately sized), we just do a lot of lookups on them.

> Are you linking with archives? Do your input archives have a
> symbol index built?

yes, and don't know. I've moved on to your new patches now, will see
how that goes.

> > Is the 5m20 the total build time for the kernel, the time for
> > rebuilding after a trivial change, or the time to call 'ld.bfd'
> > once?
>
> 5m02 was the total time for x86 defconfig. With the powerpc
> allyesconfig build, the final link:
>
> $ time ld -EL -m elf64lppc -pie --emit-relocs --build-id --gc-sections -X -o vmlinux -T ./arch/powerpc/kernel/vmlinux.lds --whole-archive built-in.o .tmp_kallsyms2.o
>
> real 0m15.556s
> user 0m13.288s
> sys 0m2.240s
>
> $ ls -lh vmlinux
> -rwxrwxr-x 1 npiggin npiggin 279M Aug 6 14:02 vmlinux
>
> Without -pie --emit-relocs it's 11.8s and 150M but I'm using
> emit-relocs for a post-link step.

Interesting, that does sound more like an ARM specific bug in ld then.

> > Are you using ld.bfd on x86 or ld.gold? For me ld.gold either
> > works and is really fast, or it crashes, depending on the
> > configuration. I also don't think it supports big-endian ARM
> > (which is what allyesconfig ends up using).
>
> ld.bfd on both. Gold crashed on powerpc and I didn't try it on x86.

Ok.

	Arnd
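Editor's sketch (not part of the thread): the oprofile listing in the message above can be summed to see how much of the sampled time lands in the BFD hash-table routines. The rows are copied from the listing; the helper names `parse` and `share` are made up for illustration.

```python
# Rows copied from the oprofile listing: samples, percent, image, symbol.
PROFILE = """\
1212556 63.6990 ld-new bfd_hash_lookup
416050 21.8563 ld-new bfd_hash_hash
64861 3.4073 no-vmlinux /no-vmlinux
59038 3.1014 ld-new bfd_hash_traverse
13873 0.7288 ld-new bfd_get_next_section_by_name
9880 0.5190 ld-new strrevcmp
"""


def parse(text):
    """Turn the whitespace-separated listing into (samples, percent, image, symbol) tuples."""
    rows = []
    for line in text.splitlines():
        samples, percent, image, symbol = line.split()
        rows.append((int(samples), float(percent), image, symbol))
    return rows


def share(rows, prefix):
    """Sum the percentage column over symbols whose name starts with prefix."""
    return sum(percent for _, percent, _, symbol in rows if symbol.startswith(prefix))


rows = parse(PROFILE)
print(f"bfd_hash_* share of samples: {share(rows, 'bfd_hash_'):.2f}%")
```

By this sum the three `bfd_hash_*` symbols account for about 88.7% of the samples, with `bfd_hash_lookup` and `bfd_hash_hash` alone at about 85.6%, which fits the observation that the cost comes from the sheer number of lookups.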
diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h
index 0ec807d69f18..7a3ad269fa23 100644
--- a/include/asm-generic/vmlinux.lds.h
+++ b/include/asm-generic/vmlinux.lds.h
@@ -433,7 +433,7 @@
  * during second ld run in second ld pass when generating System.map */
 #define TEXT_TEXT \
 ALIGN_FUNCTION(); \
- *(.text.hot .text .text.fixup .text.unlikely) \
+ *(.text.hot .text .text.* .text.fixup .text.unlikely) \
 *(.ref.text) \
 MEM_KEEP(init.text) \
 MEM_KEEP(exit.text) \