Message ID | 20200618131644.3781-1-rsalvaterra@gmail.com |
---|---|
State | Not Applicable |
Delegated to: | Petr Štetiar |
Series | [OpenWrt-Devel,RFC,v2] mvebu: compile the kernel in Thumb-2 mode for ARMv7 targets |
On 6/18/20 3:16 PM, Rui Salvaterra wrote:
> (Sending as RFC due to the note below.)
>
> The Thumb-2 instruction set generates denser code, allowing for more efficient
> use of the cache and consequently higher execution performance.

Did you run a benchmark to test how much faster it is?

> Vmlinux (uncompressed) size comparison for my personal configuration (Linux
> 5.4.46, compiled with gcc 9.3.0 and binutils 2.34):

Did you also test this with the current default toolchain?
I would like to use binutils 2.34 as default soon, but gcc will probably
not be updated soon.

>
> Pure ARM:
> 24243392 bytes
>
> Thumb-2:
> 22102716 bytes
>
> NOTE: This requires enabling a linker bug workaround to avoid the emission of
> R_ARM_THM_JUMP11 relocations [1] in modules, which the kernel doesn't support.
> Since this effectively implies -fno-optimize-sibling-calls [2], we're generating
> suboptimal code. While compat (and in-tree) modules load and run correctly
> without this workaround, WireGuard fails to load with an unknown relocation 102
> error.

Do you know if there is a fix for GNU gas available?

>
> [1] https://static.docs.arm.com/ihi0044/e/IHI0044E_aaelf.pdf (page 28)
> [2] https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/arch/arm/Makefile?h=linux-5.4.y#n129
>
> Signed-off-by: Rui Salvaterra <rsalvaterra@gmail.com>
> ---
> target/linux/mvebu/cortexa9/config-5.4 | 2 ++
> 1 file changed, 2 insertions(+)
> create mode 100644 target/linux/mvebu/cortexa9/config-5.4
>
> diff --git a/target/linux/mvebu/cortexa9/config-5.4 b/target/linux/mvebu/cortexa9/config-5.4
> new file mode 100644
> index 0000000000..6aff77fda7
> --- /dev/null
> +++ b/target/linux/mvebu/cortexa9/config-5.4
> @@ -0,0 +1,2 @@
> +CONFIG_THUMB2_AVOID_R_ARM_THM_JUMP11=y
> +CONFIG_THUMB2_KERNEL=y
>
Hi, Hauke,

On Tue, 30 Jun 2020 at 21:32, Hauke Mehrtens <hauke@hauke-m.de> wrote:
>
> Did you run a benchmark to test how much faster it is?

I haven't done any performance testing, but I expect it to be a wash
(improvements/regressions within around 2-3 % across the board). The
big win here is in the memory/icache footprint.
How do we usually do performance testing in OpenWrt, by the way? I
could try and see if there's any meaningful difference in, say,
OpenSSL speed, but that won't tell us anything about networking
performance.

> Did you also test this with the current default toolchain?
> I would like to use binutils 2.34 as default soon, but gcc will probably
> not be updated soon.

There's no reason gcc 8 won't work fine too, the only snag we hit was
with gas, and that's already fixed (see below). I'll test with gcc 8
too, though, just to make sure (and 10, as soon as it hits master :)).

> Do you know if there is a fix for GNU gas available?

Yes, I do. Jason filed an upstream bug and it's already fixed [1]. I
don't know if it'll be backported to other binutils versions, though.
In any case, the only problematic module was WireGuard (due to an old
hack), which Jason already fixed in the latest compat backport [2]
(note that with this fix in place, the kernel R_ARM_THM_JUMP11
workaround isn't required at all).

[1] https://sourceware.org/bugzilla/show_bug.cgi?id=26141
[2] https://git.openwrt.org/?p=openwrt/openwrt.git;a=commit;h=ea5192e6c505c53b0565f63841ff22dbe51c302c

Thanks,
Rui
Hi again, Hauke,

On Tue, 30 Jun 2020 at 23:30, Rui Salvaterra <rsalvaterra@gmail.com> wrote:
>
> There's no reason gcc 8 won't work fine too, the only snag we hit was
> with gas, and that's already fixed (see below). I'll test with gcc 8
> too, though, just to make sure (and 10, as soon as it hits master :)).

I just built a new image with gcc 8 and it's working correctly, as I expected.

Cheers,
Rui
On 7/1/20 12:30 AM, Rui Salvaterra wrote:
> Hi, Hauke,
>
> On Tue, 30 Jun 2020 at 21:32, Hauke Mehrtens <hauke@hauke-m.de> wrote:
>>
>> Did you run a benchmark to test how much faster it is?
>
> I haven't done any performance testing, but I expect it to be a wash
> (improvements/regressions within around 2-3 % across the board). The
> big win here is in the memory/icache footprint.
> How do we usually do performance testing in OpenWrt, by the way? I
> could try and see if there's any meaningful difference in, say,
> OpenSSL speed, but that won't tell us anything about networking
> performance.

I would like to know if these changes are worth the effort. If you see
a performance improvement somewhere, for example VPN bandwidth that is
now 3% higher, it would be nice to know.

>> Did you also test this with the current default toolchain?
>> I would like to use binutils 2.34 as default soon, but gcc will probably
>> not be updated soon.
>
> There's no reason gcc 8 won't work fine too, the only snag we hit was
> with gas, and that's already fixed (see below). I'll test with gcc 8
> too, though, just to make sure (and 10, as soon as it hits master :)).
>
>> Do you know if there is a fix for GNU gas available?
>
> Yes, I do. Jason filed an upstream bug and it's already fixed [1]. I
> don't know if it'll be backported to other binutils versions, though.
> In any case, the only problematic module was WireGuard (due to an old
> hack), which Jason already fixed in the latest compat backport [2]
> (note that with this fix in place, the kernel R_ARM_THM_JUMP11
> workaround isn't required at all).

We could also backport this fix to the OpenWrt binutils version, and then
we would not need the workaround any more. The fix is small and we can get
rid of the patch when we update binutils to the next major version.

Hauke
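
Whether a given module still needs the workaround can be checked by looking for the relocation directly. The following is only a rough sketch, not something posted in this thread: it assumes binutils' `readelf` is on the PATH and that its `-r` output names the relocation type as `R_ARM_THM_JUMP11`. The helper names `count_reloc` and `scan_module` are made up for illustration.

```python
import subprocess

# Relocation type 102 in the ARM ELF ABI; the kernel module loader rejects it.
RELOC = "R_ARM_THM_JUMP11"


def count_reloc(readelf_output, reloc=RELOC):
    """Count relocation entries of the given type in `readelf -r` text."""
    return sum(1 for line in readelf_output.splitlines()
               if reloc in line.split())


def scan_module(path):
    """Run `readelf -r` on a module (hypothetical helper) and count matches."""
    result = subprocess.run(["readelf", "-r", path], check=True,
                            capture_output=True, text=True)
    return count_reloc(result.stdout)
```

Calling `scan_module()` on a built `.ko` file and getting zero would suggest that the module loads without `CONFIG_THUMB2_AVOID_R_ARM_THM_JUMP11`; a non-zero count points at the "unknown relocation 102" failure described above.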
diff --git a/target/linux/mvebu/cortexa9/config-5.4 b/target/linux/mvebu/cortexa9/config-5.4
new file mode 100644
index 0000000000..6aff77fda7
--- /dev/null
+++ b/target/linux/mvebu/cortexa9/config-5.4
@@ -0,0 +1,2 @@
+CONFIG_THUMB2_AVOID_R_ARM_THM_JUMP11=y
+CONFIG_THUMB2_KERNEL=y
(Sending as RFC due to the note below.)

The Thumb-2 instruction set generates denser code, allowing for more efficient
use of the cache and consequently higher execution performance.

Vmlinux (uncompressed) size comparison for my personal configuration (Linux
5.4.46, compiled with gcc 9.3.0 and binutils 2.34):

Pure ARM:
24243392 bytes

Thumb-2:
22102716 bytes

NOTE: This requires enabling a linker bug workaround to avoid the emission of
R_ARM_THM_JUMP11 relocations [1] in modules, which the kernel doesn't support.
Since this effectively implies -fno-optimize-sibling-calls [2], we're generating
suboptimal code. While compat (and in-tree) modules load and run correctly
without this workaround, WireGuard fails to load with an unknown relocation 102
error.

[1] https://static.docs.arm.com/ihi0044/e/IHI0044E_aaelf.pdf (page 28)
[2] https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/arch/arm/Makefile?h=linux-5.4.y#n129

Signed-off-by: Rui Salvaterra <rsalvaterra@gmail.com>
---
 target/linux/mvebu/cortexa9/config-5.4 | 2 ++
 1 file changed, 2 insertions(+)
 create mode 100644 target/linux/mvebu/cortexa9/config-5.4
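
To put the two vmlinux sizes from the commit message in perspective, the saving can be worked out directly. This is just a small arithmetic sketch using the figures quoted above; the function name `size_reduction` is illustrative and not part of the patch.

```python
# Uncompressed vmlinux sizes reported in the commit message, in bytes.
ARM_BYTES = 24243392
THUMB2_BYTES = 22102716


def size_reduction(before, after):
    """Return (bytes saved, percentage saved relative to `before`)."""
    saved = before - after
    return saved, 100.0 * saved / before


saved, percent = size_reduction(ARM_BYTES, THUMB2_BYTES)
print(f"saved {saved} bytes, {percent:.2f} % smaller than the ARM build")
```

The difference is 2140676 bytes, or roughly 8.8 % of the pure ARM image, for this one configuration and toolchain. Other configurations may differ.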