Patchwork [8/9,HACK] ARM: imx: work around v7_cpu_resume link error

login
register
mail settings
Submitter Arnd Bergmann
Date Feb. 14, 2013, 10:47 p.m.
Message ID <1360882071-4072668-9-git-send-email-arnd@arndb.de>
Download mbox | patch
Permalink /patch/220541/
State New
Headers show

Comments

Arnd Bergmann - Feb. 14, 2013, 10:47 p.m.
Patch c08e20d24 "arm: Add v7_invalidate_l1 to cache-v7.S"
moves the v7_invalidate_l1 symbol out of imx/headsmp.S,
which seems to cause a link error because it is now
too far away from v7_cpu_resume when building an
allyesconfig kernel.

If we move the v7_cpu_resume function from the .data
section to .text, that creates another link error
for the reference to phys_l2x0_saved_regs, but we
can move all of the above to .text.

I believe that this is not a correct bug fix but just
a bad workaround, so I'm open to ideas from people
who understand the bigger picture.

Without this patch, building allyesconfig results in:

arch/arm/mach-imx/built-in.o: In function `v7_cpu_resume':
arch/arm/mach-imx/headsmp.S:55:(.data+0x87f8): relocation truncated to fit: R_ARM_CALL against symbol `v7_invalidate_l1' defined in .text section in arch/arm/mm/built-in.o

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Cc: Shawn Guo <shawn.guo@linaro.org>
Cc: Sascha Hauer <s.hauer@pengutronix.de>
Cc: Dinh Nguyen <dinguyen@altera.com>
Cc: Pavel Machek <pavel@denx.de>
Cc: Stephen Warren <swarren@nvidia.com>
Cc: Simon Horman <horms+renesas@verge.net.au>
---
 arch/arm/mach-imx/headsmp.S | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)
Stephen Warren - Feb. 14, 2013, 11:45 p.m.
On 02/14/2013 03:47 PM, Arnd Bergmann wrote:
> Patch c08e20d24 "arm: Add v7_invalidate_l1 to cache-v7.S"
> moves the v7_invalidate_l1 symbol out of imx/headsmp.S,
> which seems to cause a link error because it is now
> too far away from v7_cpu_resume when building an
> allyesconfig kernel.

Is the problem from the following in arch/arm/mach-imx/headsmp.S:

ENTRY(v7_cpu_resume)
        bl      v7_invalidate_l1

Isn't the range of bl +/- 32MiB (or +/- 16MibB in Thumb 2). Is the
kernel really that big? Sorry, I'm having trouble understanding what
causes the problem.
Arnd Bergmann - Feb. 15, 2013, 11:05 a.m.
On Thursday 14 February 2013, Stephen Warren wrote:
> On 02/14/2013 03:47 PM, Arnd Bergmann wrote:
> > Patch c08e20d24 "arm: Add v7_invalidate_l1 to cache-v7.S"
> > moves the v7_invalidate_l1 symbol out of imx/headsmp.S,
> > which seems to cause a link error because it is now
> > too far away from v7_cpu_resume when building an
> > allyesconfig kernel.
> 
> Is the problem from the following in arch/arm/mach-imx/headsmp.S:
> 
> ENTRY(v7_cpu_resume)
>         bl      v7_invalidate_l1
> 
> Isn't the range of bl +/- 32MiB (or +/- 16MibB in Thumb 2). Is the
> kernel really that big? Sorry, I'm having trouble understanding what
> causes the problem.


Well, it is an "allyesconfig" kernel, so things can get pretty big:

$ size  obj-tmp/vmlinux -A
obj-tmp/vmlinux  :
section                  size         addr
.head.text                504   3221258240
.text                32707336   3221258752
.text.head                  8   3253966088
.rodata              14028722   3253968896
__bug_table            127764   3267997624
.builtin_fw               684   3268125388
__ksymtab               53424   3268126072
__ksymtab_gpl           43560   3268179496
__kcrctab               26712   3268223056
__kcrctab_gpl           21780   3268249768
__ksymtab_strings      233706   3268271548
__param                 33104   3268505256
__modver                 4104   3268538360
__ex_table               4112   3268542464
.ARM.unwind_idx        967784   3268546576
.ARM.unwind_tab       1452168   3269514360
.notes                     36   3270966528
.init.text             677840   3270967296
.exit.text             125672   3271645136
.init.arch.info          5396   3271770808
.init.tagtable             72   3271776204
.init.smpalt              928   3271776276
.init.pv_table           1704   3271777204
.init.data             678108   3271778912
.exit.data                119   3272457020
.data..percpu         1460032   3272458240
.data                 3370068   3273924608
.tcm_start                940   3277294676
.bss                  8007724   3277295616
.comment                   43            0
.ARM.attributes            50            0
.debug_line          15780363            0
.debug_info          57192143            0
.debug_abbrev         5747374            0
.debug_aranges         299608            0
.debug_ranges         5414592            0
.debug_frame          4801748            0
.debug_str            7003282            0
.debug_loc           36237476            0
Total               196510790

THUMB2 support is obviously enabled (allyesconfig), and from the start of the
.head.text section to the end of .bss, it is 64,045,100 bytes, using yesterday's
linux-next kernel with my fixes. It will get bigger as we add more stuff
to multiplatform. The .text section alone is just short of 32MB.

	Arnd
Russell King - ARM Linux - Feb. 15, 2013, 11:07 a.m.
On Thu, Feb 14, 2013 at 11:47:50PM +0100, Arnd Bergmann wrote:
> Patch c08e20d24 "arm: Add v7_invalidate_l1 to cache-v7.S"
> moves the v7_invalidate_l1 symbol out of imx/headsmp.S,
> which seems to cause a link error because it is now
> too far away from v7_cpu_resume when building an
> allyesconfig kernel.
> 
> If we move the v7_cpu_resume function from the .data
> section to .text, that creates another link error
> for the reference to phys_l2x0_saved_regs, but we
> can move all of the above to .text.
> 
> I believe that this is not a correct bug fix but just
> a bad workaround, so I'm open to ideas from people
> who understand the bigger picture.

Unfortunately, this ends up with writable data in the .text section, which
is supposed to be read-only.  We should try to preserve that status, even
though we don't enforce it.

I guess the problem is that we end up with the .data segment soo far away
from the .text segment that these branches no longer work (and remember
that this code executes with the MMU off.)

The only solution I can think is that we need to do quite a bit of magic
here to jump to an absolute address, but taking account of the V:P offset.
That's not going to be particularly nice, and it's going to then also have
to jump back the other way - to the cpu_resume code which is also in the
.data section.
Russell King - ARM Linux - Feb. 15, 2013, 11:13 a.m.
On Fri, Feb 15, 2013 at 11:05:14AM +0000, Arnd Bergmann wrote:
> $ size  obj-tmp/vmlinux -A
> obj-tmp/vmlinux  :
> section                  size         addr
> .head.text                504   3221258240
> .text                32707336   3221258752
> .text.head                  8   3253966088

Interesting... I wonder if that should be .head.text, or maybe just .text.
Looking at iMX, it's just the secondary startup, which is calling into
.head.text, so it would be much safer given the size of the .text segment
for it to be in .head.text.

> The .text section alone is just short of 32MB.

I suspect when it does go over that, we'll see a lot more link time
failures due to the 'bl' instructions failing to encode their PC relative
jumps.  The only solution then will be to switch everything to use the
less efficient long jumps (load address from literal pool, bx) which'll
also need much of the asm changed.  That also brings up the question
whether we want to penalize the kernel performance just to make
allyesconfig work... I guess we can make it a compile time option, which
of course allyesconfig will automatically enable.  I suspect we'll see
a lot of people mistakenly enabling it too though.
Arnd Bergmann - Feb. 15, 2013, 3:49 p.m.
On Friday 15 February 2013, Russell King - ARM Linux wrote:
> > The .text section alone is just short of 32MB.
> 
> I suspect when it does go over that, we'll see a lot more link time
> failures due to the 'bl' instructions failing to encode their PC relative
> jumps.  The only solution then will be to switch everything to use the
> less efficient long jumps (load address from literal pool, bx) which'll
> also need much of the asm changed.  That also brings up the question
> whether we want to penalize the kernel performance just to make
> allyesconfig work... I guess we can make it a compile time option, which
> of course allyesconfig will automatically enable.  I suspect we'll see
> a lot of people mistakenly enabling it too though.

I'm not too worried about people accidentally enabling such a build option,
at least if it's appropriately labeled as something that nobody should
need.

Changing the inline assembly however sounds like a big pain to do, which
would cause performance and size overhead as well as regressions in
rarely executed code when we get it wrong. I've tried rebulding
the allyesconfig kernel with -O3 -funroll-all-loops to make it even
bigger on purpose, and got these errors:

arch/arm/kernel/built-in.o:(.fixup+0x8): relocation truncated to fit: R_ARM_JUMP24 against `.text'
arch/arm/kernel/built-in.o:(.fixup+0x10): relocation truncated to fit: R_ARM_JUMP24 against `.text'
arch/arm/kernel/built-in.o:(.fixup+0x18): relocation truncated to fit: R_ARM_JUMP24 against `.text'
arch/arm/kernel/built-in.o:(.fixup+0x20): relocation truncated to fit: R_ARM_JUMP24 against `.text'
arch/arm/kernel/built-in.o:(.fixup+0x28): relocation truncated to fit: R_ARM_JUMP24 against `.text'
arch/arm/kernel/built-in.o:(.fixup+0x30): relocation truncated to fit: R_ARM_JUMP24 against `.text'
arch/arm/kernel/built-in.o:(.fixup+0x38): relocation truncated to fit: R_ARM_JUMP24 against `.text'
arch/arm/kernel/built-in.o:(.fixup+0x40): relocation truncated to fit: R_ARM_JUMP24 against `.text'
arch/arm/kernel/built-in.o:(.fixup+0x48): relocation truncated to fit: R_ARM_JUMP24 against `.text'
arch/arm/kernel/built-in.o:(.fixup+0x50): relocation truncated to fit: R_ARM_JUMP24 against `.text'
arch/arm/kernel/built-in.o:(.fixup+0x58): additional relocation overflows omitted from the output

so the .fixup section is completely out of reach of the start of the .text section.

The size output for the partial link of vmlinux.o is

section                     size         addr
.head.text                   492   3221258240
.text                   43889568   3221258752
.text.head                     8            0
.rodata                 11020608         4096
__bug_table               138636     11024704
.pci_fixup                     0     11163340
.builtin_fw                  684     11163340
.rio_ops                       0     11164024
__ksymtab                  53424     11164024
__ksymtab_gpl              43560     11217448
__ksymtab_unused               0     11261008
__ksymtab_unused_gpl           0     11261008
__ksymtab_gpl_future           0     11261008
__kcrctab                  26712     11261008
__kcrctab_gpl              21780     11287720
__kcrctab_unused               0     11309500
__kcrctab_unused_gpl           0     11309500
__kcrctab_gpl_future           0     11309500
__ksymtab_strings         251888     11309500
__param                    33104     11561388
__modver                    1284     11594492
__ex_table                  6288     11595776
.ARM.unwind_idx           907888     11602064
.ARM.unwind_tab          1362276     12509952
.notes                        36     13872228
.init.text                685436     13873152
.exit.text                124324     14558588
.init.arch.info             5396     14682912
.init.tagtable                72     14688308
.init.smpalt                 952     14688380
.init.pv_table              2232     14689332
.init.data                649164     14691568
.exit.data                   120     15340732
.data..percpu            1460032     15343616
.data                    3370676     16809984
.tcm_start                   332     20180660
.text_itcm                     0   4294836224
.dtcm_start                    0     20180992
.data_dtcm                     0   4294868992
.tcm_end                       0     20180992
.bss                     8007852     20180992
.comment                  314424            0
.ARM.attributes               50            0
.debug_line             19185565            0
.debug_info             61757649            0
.debug_abbrev            5835457            0
.debug_aranges            299416            0
.debug_ranges           10482064            0
.debug_frame             4601768            0
.debug_str              37433732            0
.debug_loc              45578206            0
Total                  257553155

I had the idea that turning off THUMB2 support in allyesconfig would help,
but then I found that it's already disabled unlike what I claimed earlier,
because ARMv6 support is enabled alongside ARMv7, so we are still screwed.

Also, with the size of the kernel binary growing, it may just be a
matter of time until this is not just a problem for artificial cases
like allyesconfig, but also for something that users will actually
want to run.

IIRC, 10 years ago, half a megabyte seemed like a reasonable size
for a kernel binary, while today 5 megabyte are not out of the
ordinary on ARM. If the trend continues the same way, we might
have users complaining about 32 megabyte kernels in 5-7 years from
now, unless everybody is using 64-bit by then ;-)

	Arnd

Patch

diff --git a/arch/arm/mach-imx/headsmp.S b/arch/arm/mach-imx/headsmp.S
index 921fc15..0de76cc 100644
--- a/arch/arm/mach-imx/headsmp.S
+++ b/arch/arm/mach-imx/headsmp.S
@@ -30,7 +30,7 @@  ENDPROC(v7_secondary_startup)
  * allow phys_l2x0_saved_regs to be accessed with a relative load
  * as we are running on physical address here.
  */
-	.data
+	.text
 	.align
 
 #ifdef CONFIG_CACHE_L2X0
@@ -51,6 +51,8 @@  phys_l2x0_saved_regs:
 	.endm
 #endif
 
+	.text
+
 ENTRY(v7_cpu_resume)
 	bl	v7_invalidate_l1
 	pl310_resume