[OpenWrt-Devel,RFC,v2] mvebu: compile the kernel in Thumb-2 mode for ARMv7 targets

Message ID 20200618131644.3781-1-rsalvaterra@gmail.com
State Not Applicable
Delegated to: Petr Štetiar

Commit Message

Rui Salvaterra June 18, 2020, 1:16 p.m. UTC
(Sending as RFC due to the note below.)

The Thumb-2 instruction set generates denser code, allowing for more efficient
use of the cache and consequently higher execution performance.

Vmlinux (uncompressed) size comparison for my personal configuration (Linux
5.4.46, compiled with gcc 9.3.0 and binutils 2.34):

Pure ARM:
24243392 bytes

Thumb-2:
22102716 bytes
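
For reference, the two figures above work out to a saving of about 2.04 MiB, or roughly 8.8 % of the pure ARM image. A minimal sketch of that arithmetic follows; the variable names are illustrative only, and the sizes are the ones quoted in this message.

```python
# Uncompressed vmlinux sizes quoted above, in bytes.
arm_size = 24243392      # pure ARM build
thumb2_size = 22102716   # Thumb-2 build

# Absolute and relative saving of the Thumb-2 build over the ARM build.
saved = arm_size - thumb2_size
saved_pct = 100.0 * saved / arm_size

print(f"saved {saved} bytes ({saved / 2**20:.2f} MiB, {saved_pct:.2f} %)")
```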

NOTE: This requires enabling a linker bug workaround to avoid the emission of
R_ARM_THM_JUMP11 relocations [1] in modules, which the kernel doesn't support.
Since this effectively implies -fno-optimize-sibling-calls [2], we're generating
suboptimal code. While compat (and in-tree) modules load and run correctly
without this workaround, WireGuard fails to load with an unknown relocation 102
error.

[1] https://static.docs.arm.com/ihi0044/e/IHI0044E_aaelf.pdf (page 28)
[2] https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/arch/arm/Makefile?h=linux-5.4.y#n129
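
As a reading aid for the note above: the "102" in the WireGuard load error is the numeric relocation type that the ARM ELF ABI assigns to R_ARM_THM_JUMP11, the relocation used for the 16-bit Thumb unconditional branch. That encoding carries a signed 11-bit immediate shifted left by one bit, so it only reaches targets from -2048 to +2046 bytes away. The sketch below just illustrates those two facts; the helper names are hypothetical and are not kernel or binutils code.

```python
# A few relocation type numbers from the ARM ELF ABI (also found in <elf.h>).
ARM_RELOCS = {
    10: "R_ARM_THM_CALL",
    30: "R_ARM_THM_JUMP24",
    102: "R_ARM_THM_JUMP11",
}


def describe_reloc(number):
    """Map a relocation type number to its name (hypothetical helper)."""
    return ARM_RELOCS.get(number, f"unknown relocation {number}")


def fits_thm_jump11(offset):
    """True if a byte offset fits the 16-bit Thumb `B` encoding.

    The encoding holds a signed 11-bit immediate shifted left by one,
    so the offset must be even and within [-2048, +2046].
    """
    return offset % 2 == 0 and -2048 <= offset <= 2046


print(describe_reloc(102))
print(fits_thm_jump11(2046), fits_thm_jump11(2048))
```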

Signed-off-by: Rui Salvaterra <rsalvaterra@gmail.com>
---
 target/linux/mvebu/cortexa9/config-5.4 | 2 ++
 1 file changed, 2 insertions(+)
 create mode 100644 target/linux/mvebu/cortexa9/config-5.4

Comments

Hauke Mehrtens June 30, 2020, 8:32 p.m. UTC | #1
On 6/18/20 3:16 PM, Rui Salvaterra wrote:
> (Sending as RFC due to the note below.)
> 
> The Thumb-2 instruction set generates denser code, allowing for more efficient
> use of the cache and consequently higher execution performance.

Did you run a benchmark to test how much faster it is?

> Vmlinux (uncompressed) size comparison for my personal configuration (Linux
> 5.4.46, compiled with gcc 9.3.0 and binutils 2.34):

Did you also test this with the current default toolchain?
I would like to use binutils 2.34 as default soon, but gcc will probably
not be updated soon.

> 
> Pure ARM:
> 24243392 bytes
> 
> Thumb-2:
> 22102716 bytes
> 
> NOTE: This requires enabling a linker bug workaround to avoid the emission of
> R_ARM_THM_JUMP11 relocations [1] in modules, which the kernel doesn't support.
> Since this effectively implies -fno-optimize-sibling-calls [2], we're generating
> suboptimal code. While compat (and in-tree) modules load and run correctly
> without this workaround, WireGuard fails to load with an unknown relocation 102
> error.

Do you know if there is a fix for GNU gas available?

> 
> [1] https://static.docs.arm.com/ihi0044/e/IHI0044E_aaelf.pdf (page 28)
> [2] https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/arch/arm/Makefile?h=linux-5.4.y#n129
> 
> Signed-off-by: Rui Salvaterra <rsalvaterra@gmail.com>
> ---
>  target/linux/mvebu/cortexa9/config-5.4 | 2 ++
>  1 file changed, 2 insertions(+)
>  create mode 100644 target/linux/mvebu/cortexa9/config-5.4
> 
> diff --git a/target/linux/mvebu/cortexa9/config-5.4 b/target/linux/mvebu/cortexa9/config-5.4
> new file mode 100644
> index 0000000000..6aff77fda7
> --- /dev/null
> +++ b/target/linux/mvebu/cortexa9/config-5.4
> @@ -0,0 +1,2 @@
> +CONFIG_THUMB2_AVOID_R_ARM_THM_JUMP11=y
> +CONFIG_THUMB2_KERNEL=y
>

Rui Salvaterra June 30, 2020, 10:30 p.m. UTC | #2
Hi, Hauke,

On Tue, 30 Jun 2020 at 21:32, Hauke Mehrtens <hauke@hauke-m.de> wrote:
>
> Did you run a benchmark to test how much faster it is?

I haven't done any performance testing, but I expect it to be a wash
(improvements/regressions within around 2-3 % across the board). The
big win here is in the memory/icache footprint.
How do we usually do performance testing in OpenWrt, by the way? I
could try and see if there's any meaningful difference in, say,
OpenSSL speed, but that won't tell us anything about networking
performance.

> Did you also test this with the current default toolchain?
> I would like to use binutils 2.34 as default soon, but gcc will probably
> not be updated soon.

There's no reason gcc 8 won't work fine too, the only snag we hit was
with gas, and that's already fixed (see below). I'll test with gcc 8
too, though, just to make sure (and 10, as soon as it hits master :)).

> Do you know if there is a fix for GNU gas available?

Yes, I do. Jason filed an upstream bug and it's already fixed [1]. I
don't know if it'll be backported to other binutils versions, though.
In any case, the only problematic module was WireGuard (due to an old
hack), which Jason already fixed in the latest compat backport [2]
(note that with this fix in place, the kernel R_ARM_THM_JUMP11
workaround isn't required at all).

[1] https://sourceware.org/bugzilla/show_bug.cgi?id=26141
[2] https://git.openwrt.org/?p=openwrt/openwrt.git;a=commit;h=ea5192e6c505c53b0565f63841ff22dbe51c302c

Thanks,
Rui

Rui Salvaterra July 1, 2020, 11:43 a.m. UTC | #3
Hi again, Hauke,

On Tue, 30 Jun 2020 at 23:30, Rui Salvaterra <rsalvaterra@gmail.com> wrote:
>
> There's no reason gcc 8 won't work fine too, the only snag we hit was
> with gas, and that's already fixed (see below). I'll test with gcc 8
> too, though, just to make sure (and 10, as soon as it hits master :)).

I just built a new image with gcc 8 and it's working correctly, as I expected.

Cheers,
Rui

Hauke Mehrtens July 1, 2020, 9:34 p.m. UTC | #4
On 7/1/20 12:30 AM, Rui Salvaterra wrote:
> Hi, Hauke,
> 
> On Tue, 30 Jun 2020 at 21:32, Hauke Mehrtens <hauke@hauke-m.de> wrote:
>>
>> Did you run a benchmark to test how much faster it is?
> 
> I haven't done any performance testing, but I expect it to be a wash
> (improvements/regressions within around 2-3 % across the board). The
> big win here is in the memory/icache footprint.
> How do we usually do performance testing in OpenWrt, by the way? I
> could try and see if there's any meaningful difference in, say,
> OpenSSL speed, but that won't tell us anything about networking
> performance.

I would like to know if these changes are worth the effort. If you see
a performance improvement somewhere, for example VPN bandwidth that is
now 3% higher, it would be nice to know.
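
One way to frame such a comparison is to compute the relative change between an ARM build and a Thumb-2 build and check it against the 2-3 % band mentioned earlier in the thread. The sketch below is only an illustration: the function names are hypothetical and the throughput numbers are made up, not measurements from this thread.

```python
def relative_change_pct(baseline, candidate):
    """Relative change of candidate versus baseline, in percent."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return 100.0 * (candidate - baseline) / baseline


def is_wash(baseline, candidate, noise_pct=3.0):
    """True if the change stays within the assumed +/- noise band."""
    return abs(relative_change_pct(baseline, candidate)) <= noise_pct


# Made-up throughput figures (Mbit/s), purely to show the usage.
arm_mbps = 400.0
thumb2_mbps = 408.0
print(f"{relative_change_pct(arm_mbps, thumb2_mbps):+.1f} %",
      "wash" if is_wash(arm_mbps, thumb2_mbps) else "significant")
```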

>> Did you also tested this with the current default toolchain?
>> I would like to use binutils 2.34 as default soon, but gcc will probably
>> not be updated soon.
> 
> There's no reason gcc 8 won't work fine too, the only snag we hit was
> with gas, and that's already fixed (see below). I'll test with gcc 8
> too, though, just to make sure (and 10, as soon as it hits master :)).
> 
>> Do you know if there is a fix for GNU gas available?
> 
> Yes, I do. Jason filed an upstream bug and it's already fixed [1]. I
> don't know if it'll be backported to other binutils versions, though.
> In any case, the only problematic module was WireGuard (due to an old
> hack), which Jason already fixed in the latest compat backport [2]
> (note that with this fix in place, the kernel R_ARM_THM_JUMP11
> workaround isn't required at all).

We could also backport this fix to the OpenWrt binutils version and then
we do not need the workaround any more. The fix is small and we can get
rid of the patch when we update binutils to the next major version.

Hauke

Patch

diff --git a/target/linux/mvebu/cortexa9/config-5.4 b/target/linux/mvebu/cortexa9/config-5.4
new file mode 100644
index 0000000000..6aff77fda7
--- /dev/null
+++ b/target/linux/mvebu/cortexa9/config-5.4
@@ -0,0 +1,2 @@ 
+CONFIG_THUMB2_AVOID_R_ARM_THM_JUMP11=y
+CONFIG_THUMB2_KERNEL=y
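
To check that a built kernel configuration actually picked up the two symbols from the fragment above, something along these lines could be used. This is a minimal sketch: `parse_kconfig` is a hypothetical helper, and the fragment text is inlined here rather than read from a real `.config` file.

```python
def parse_kconfig(text):
    """Parse `CONFIG_FOO=value` lines and `# CONFIG_FOO is not set` comments."""
    suffix = " is not set"
    symbols = {}
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("# CONFIG_") and line.endswith(suffix):
            # Disabled symbols are recorded as "n".
            symbols[line[2:-len(suffix)]] = "n"
        elif line.startswith("CONFIG_") and "=" in line:
            name, value = line.split("=", 1)
            symbols[name] = value
    return symbols


# The two lines added by this patch, inlined for illustration.
FRAGMENT = """\
CONFIG_THUMB2_AVOID_R_ARM_THM_JUMP11=y
CONFIG_THUMB2_KERNEL=y
"""

cfg = parse_kconfig(FRAGMENT)
for symbol in ("CONFIG_THUMB2_KERNEL", "CONFIG_THUMB2_AVOID_R_ARM_THM_JUMP11"):
    print(symbol, cfg.get(symbol, "missing"))
```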