Message ID | trinity-9a34f671-de0c-4525-a66e-7dcbe02484ae-1477217711975@3capp-gmx-bs60 |
---|---|
State | RFC |
On 23 October 2016 at 12:15, <p.wassi@gmx.at> wrote:
> As one of LEDE's TODOs is "Bump all kernels to v4.4", I've done
> some testing on two brcm47xx devices (brcm47xx is still on kernel 4.1).
>
> The devices I tested on + bootlog:
> Linksys WRT54GL - https://pwassi.privatedns.org/lede/brcm47xx/#wrt54gl
> ASUS WL500gP V2 - https://pwassi.privatedns.org/lede/brcm47xx/#wl500gpv2
>
> I know that there are many more devices on brcm47xx which are still untested.
> However, these are the ones I have at home to play with, and everything seems
> to work fine there. So what do you think about the following 'patch'?

It breaks the Linksys WRT300N V1, and that's why I still haven't bumped it.
Last time I talked with Felix, we suggested some offset in lzma-loader;
I hope to finally find time to try that...
On 23 October 2016 at 19:49, Rafał Miłecki <zajec5@gmail.com> wrote:
> On 23 October 2016 at 12:15, <p.wassi@gmx.at> wrote:
>> As one of LEDE's TODOs is "Bump all kernels to v4.4", I've done
>> some testing on two brcm47xx devices (brcm47xx is still on kernel 4.1).
>>
>> The devices I tested on + bootlog:
>> Linksys WRT54GL - https://pwassi.privatedns.org/lede/brcm47xx/#wrt54gl
>> ASUS WL500gP V2 - https://pwassi.privatedns.org/lede/brcm47xx/#wl500gpv2
>>
>> I know that there are many more devices on brcm47xx which are still untested.
>> However, these are the ones I have at home to play with, and everything seems
>> to work fine there. So what do you think about the following 'patch'?
>
> It breaks the Linksys WRT300N V1, and that's why I still haven't bumped it.
> Last time I talked with Felix, we suggested some offset in lzma-loader;
> I hope to finally find time to try that...

I started working on this today. I could reproduce the issue during a
lot of tests until it magically disappeared. Then LEDE started booting
on my unit with kernel 4.4.

I bumped the kernel with a lengthy description (for future reference, just in case):
https://git.lede-project.org/?p=source.git;a=commitdiff;h=06405df7a8da24b7d735b32454c7d3b1f2ebaabc
> I started working on this today. I could reproduce the issue during a
> lot of tests until it magically disappeared. Then LEDE started booting
> on my unit with kernel 4.4.

Oh, that rings a bell.
May I point you to a discussion we had earlier this year?
http://lists.infradead.org/pipermail/lede-dev/2016-May/000980.html

I had this issue on all my WRT54GLs:
trunk images stopped booting at "Starting program at 0x80001000".
However, if I compiled the images myself (with the exact same configuration
as the buildbot), the images would boot fine.
I later checked again: the buildbot's images did not boot.
(It was kernel 4.1 back then, but it's exactly the same behaviour you describe
in the commit message.)
Additionally, the buildbot's images worked out of the box on an ASUS WL500gP V2.

Best regards,
P. Wassi
On 24 October 2016 at 18:43, <p.wassi@gmx.at> wrote:
>> I started working on this today. I could reproduce the issue during a
>> lot of tests until it magically disappeared. Then LEDE started booting
>> on my unit with kernel 4.4.
>
> Oh, that rings a bell.
> May I point you to a discussion we had earlier this year?
> http://lists.infradead.org/pipermail/lede-dev/2016-May/000980.html
>
> I had this issue on all my WRT54GLs:
> trunk images stopped booting at "Starting program at 0x80001000".
> However, if I compiled the images myself (with the exact same configuration
> as the buildbot), the images would boot fine.
> I later checked again: the buildbot's images did not boot.
> (It was kernel 4.1 back then, but it's exactly the same behaviour you describe
> in the commit message.)
> Additionally, the buildbot's images worked out of the box on an ASUS WL500gP V2.

It seems I'm experiencing the same crazy problem. Local builds work
for me (as explained in the commit message), but the image from the buildbot
(https://downloads.lede-project.org/snapshots/targets/brcm47xx/legacy/)
doesn't boot. This is some totally crazy thing :|
> It seems I'm experiencing the same crazy problem. Local builds work
> for me (as explained in the commit message), but the image from the buildbot
> doesn't boot. This is some totally crazy thing :|

I'd opt for staying at 4.4 and not reverting, since the issue already occurred
on kernel 4.1 for the WRT54GL (and probably other devices as well?).
So it's _not_ a 4.1 -> 4.4 issue! The last kernel that booted fine
on my devices (with buildbot images) was OpenWrt 15.05.1 (kernel 3.18.23).
As you've experienced yourself, local images also work fine, with both 4.1 and 4.4.

Best regards,
P. Wassi
On 25.10.2016 9:13, p.wassi@gmx.at wrote:
>> It seems I'm experiencing the same crazy problem. Local builds work
>> for me (as explained in the commit message), but the image from the buildbot
>> doesn't boot. This is some totally crazy thing :|
> I'd opt for staying at 4.4 and not reverting, since the issue already occurred
> on kernel 4.1 for the WRT54GL (and probably other devices as well?).
> So it's _not_ a 4.1 -> 4.4 issue! The last kernel that booted fine
> on my devices (with buildbot images) was OpenWrt 15.05.1 (kernel 3.18.23).
> As you've experienced yourself, local images also work fine, with both 4.1 and 4.4.

This freezing after "Starting program at 0x80001000" reminds me of an old
issue on ar71xx/WNDR3700:

This might be related to the bootloader decompressing the image.
Kernel and firmware size growth may be exposing some
compression/decompression problem that now gets triggered semi-arbitrarily
(now that image sizes have grown due to kernel growth?). It is
possible that the old bootloader fails to decompress the image but prints no
proper error message about it.

Something rather similar happened four years ago with the ar71xx-based
WNDR3700/WNDR3800 series. There, the problem was fixed by decreasing the
compression dictionary size used for WNDR3700/3800 devices. I debugged that
after receiving a hint from Rafal (who had had somewhat similar problems with
the WNDR4500).

Rafal's last message of a long thread:
https://lists.openwrt.org/pipermail/openwrt-devel/2012-June/015846.html

For reference, the ar71xx WNDR3700 debugging:
https://forum.openwrt.org/viewtopic.php?id=40565
https://lists.openwrt.org/pipermail/openwrt-devel/2012-November/thread.html#17445
https://dev.openwrt.org/ticket/12454#comment:12
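The compression dictionary size mentioned above is recorded in the 13-byte header of a legacy .lzma ("LZMA alone") stream. The header holds one properties byte, a 4-byte little-endian dictionary size, and an 8-byte uncompressed size. Below is a minimal Python sketch, not taken from the thread, that uses the standard `lzma` module to compress a buffer with an explicit dictionary size and then read that value back from the header. The helper names `compress_alone` and `dict_size_of` are hypothetical. It also assumes that a given image (for example vmlinux.lzma) actually uses this header format, so verify that before applying the helper to a real file.

```python
import lzma
import struct


def compress_alone(data: bytes, dict_size: int) -> bytes:
    """Compress data as a legacy .lzma ("LZMA alone") stream with an explicit dictionary size."""
    # FORMAT_ALONE requires exactly one LZMA1 filter; unspecified options use preset defaults.
    filters = [{"id": lzma.FILTER_LZMA1, "dict_size": dict_size}]
    return lzma.compress(data, format=lzma.FORMAT_ALONE, filters=filters)


def dict_size_of(stream: bytes) -> int:
    """Read the dictionary size from an LZMA-alone header (hypothetical helper).

    Header layout: 1 properties byte, 4-byte little-endian dictionary size,
    8-byte little-endian uncompressed size (all 0xFF when unknown).
    """
    if len(stream) < 13:
        raise ValueError("too short for an LZMA-alone header")
    (size,) = struct.unpack_from("<I", stream, 1)
    return size


if __name__ == "__main__":
    payload = bytes(range(256)) * 256  # 64 KiB stand-in for a kernel image
    # Powers of two are used because liblzma may round other values when writing the header.
    for dict_size in (1 << 20, 1 << 23):  # 1 MiB and 8 MiB dictionaries
        stream = compress_alone(payload, dict_size)
        # Round-trip check: the stream must decompress back to the payload.
        assert lzma.decompress(stream, format=lzma.FORMAT_ALONE) == payload
        print(f"requested {dict_size} bytes, header says {dict_size_of(stream)} bytes")
```

A smaller dictionary usually costs some compression ratio. The sketch only shows where the parameter lives; it does not show why a particular bootloader or lzma-loader fails with larger values.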
Hi Rafal,

did you ever test local builds with all additional kmods (including all
kmods from feeds) enabled as <m>? I guess this will bump the kernel
size somewhat due to additional subsystems which are getting enabled.

~ Jo
On 25 October 2016 at 08:13, <p.wassi@gmx.at> wrote:
>> It seems I'm experiencing the same crazy problem. Local builds work
>> for me (as explained in the commit message), but the image from the buildbot
>> doesn't boot. This is some totally crazy thing :|
>
> I'd opt for staying at 4.4 and not reverting, since the issue already occurred
> on kernel 4.1 for the WRT54GL (and probably other devices as well?).
> So it's _not_ a 4.1 -> 4.4 issue! The last kernel that booted fine
> on my devices (with buildbot images) was OpenWrt 15.05.1 (kernel 3.18.23).
> As you've experienced yourself, local images also work fine, with both 4.1 and 4.4.

I agree, it doesn't make much sense to revert.

Can you test whether setting or unsetting CONFIG_KERNEL_KALLSYMS makes a
difference for your local builds? Please remember to compile without -j N,
as it isn't reliable; see:
[LEDE-DEV] make -j 4: race between Image/Prepare and device images?
http://lists.infradead.org/pipermail/lede-dev/2016-October/003632.html
On 25 October 2016 at 09:55, Hannu Nyman <hannu.nyman@iki.fi> wrote:
> On 25.10.2016 9:13, p.wassi@gmx.at wrote:
>>> It seems I'm experiencing the same crazy problem. Local builds work
>>> for me (as explained in the commit message), but the image from the buildbot
>>> doesn't boot. This is some totally crazy thing :|
>>
>> I'd opt for staying at 4.4 and not reverting, since the issue already occurred
>> on kernel 4.1 for the WRT54GL (and probably other devices as well?).
>> So it's _not_ a 4.1 -> 4.4 issue! The last kernel that booted fine
>> on my devices (with buildbot images) was OpenWrt 15.05.1 (kernel 3.18.23).
>> As you've experienced yourself, local images also work fine, with both 4.1 and 4.4.
>
> This freezing after "Starting program at 0x80001000" reminds me of an old
> issue on ar71xx/WNDR3700:
>
> This might be related to the bootloader decompressing the image.
> Kernel and firmware size growth may be exposing some
> compression/decompression problem that now gets triggered semi-arbitrarily
> (now that image sizes have grown due to kernel growth?). It is
> possible that the old bootloader fails to decompress the image but prints no
> proper error message about it.

For the Linksys WRT300N v1 we use lzma-loader compressed with gzip. It
didn't change between 4.1 and 4.4, so CFE's decompression doesn't matter, as
it always deals with the same loader. If there is some decompression bug,
it may be inside lzma-loader.
On 25 October 2016 at 10:00, Jo-Philipp Wich <jo@mein.io> wrote:
> did you ever test local builds with all additional kmods (including all
> kmods from feeds) enabled as <m>? I guess this will bump the kernel
> size somewhat due to additional subsystems which are getting enabled.

No, I never expected extra kmod packages to affect the kernel size.

Anyway, I finally traced this local vs. buildbot difference to
CONFIG_KERNEL_KALLSYMS. Images from the buildbot have this symbol enabled,
which slightly increases the kernel size: enough to stop it from booting
on the WRT300N v1.

The reason it took me so long to realize it's related to
CONFIG_KERNEL_KALLSYMS is a bug in the LEDE build system I just spotted:
[LEDE-DEV] make -j 4: race between Image/Prepare and device images?
http://lists.infradead.org/pipermail/lede-dev/2016-October/003632.html
> Anyway, I finally traced this local vs. buildbot difference to
> CONFIG_KERNEL_KALLSYMS. Images from the buildbot have this symbol enabled,
> which slightly increases the kernel size: enough to stop it from booting
> on the WRT300N v1.

There must be something more...
What I had on the WRT54GL:
http://lists.infradead.org/pipermail/lede-dev/2016-June/001162.html

The buildbot's images did _not_ boot (assuming they have KALLSYMS enabled, as you stated above).
However, my local images did boot fine (with KALLSYMS enabled).
So although KALLSYMS increased the kernel size, a local image booted while the
buildbot's image did not (both using the same config, as jow already showed me here:
http://lists.infradead.org/pipermail/lede-dev/2016-June/001153.html ).

Anyway, I'm just compiling a new set of images and will report back.
Ok, here's some news on this topic.
I've built some images for the WRT54GL to test; here are the results:
-) Buildbot's image: does NOT boot (as expected)
-) Local image without KALLSYMS: works fine
-) Local image with KALLSYMS: does NOT boot (which is unexpected, as such an image booted without issues back in June)

Here are my vmlinux.lzma sizes:
1140026 bytes - without KALLSYMS - works
1229451 bytes - with KALLSYMS - does not work

To further investigate where the 'magic border' is (assuming it really depends on the vmlinux.lzma size), I disabled KALLSYMS and added a new char array with random contents to the kernel. This way I could more or less choose what size the kernel should have. Here are the results:

size (bytes) | status | note |
---|---|---|
1140026 | Ok | untouched kernel without KALLSYMS |
1195918 | Ok | |
1206415 | Ok | |
1227370 | Ok | |
1229451 | FAIL | untouched kernel with KALLSYMS enabled |
1237671 | Ok | |
1279313 | Ok | |

So there must be more to it than just the bare vmlinux size.

Best regards,
P. Wassi
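The conclusion that size alone does not explain the failure can be checked mechanically. A pure size limit would require every image that booted to be smaller than every image that failed. Below is a minimal Python sketch over the vmlinux.lzma sizes reported above; the function name `explained_by_size_limit` is illustrative and does not come from the thread.

```python
# vmlinux.lzma sizes (bytes) and boot results on the WRT54GL, as reported above.
RESULTS = [
    (1140026, True),   # untouched kernel without KALLSYMS
    (1195918, True),
    (1206415, True),
    (1227370, True),
    (1229451, False),  # untouched kernel with KALLSYMS enabled
    (1237671, True),
    (1279313, True),
]


def explained_by_size_limit(results):
    """Return True if a single size limit separates booting from failing images,
    i.e. every image that booted is smaller than every image that failed."""
    booted = [size for size, ok in results if ok]
    failed = [size for size, ok in results if not ok]
    if not booted or not failed:
        return True  # nothing to separate
    return max(booted) < min(failed)


if __name__ == "__main__":
    # False: 1237671 and 1279313 boot, but 1229451 does not.
    print(explained_by_size_limit(RESULTS))
```

The check only confirms that the results are inconsistent with a simple threshold. It says nothing about what the other factor is.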
Hi Rafal,

> On 25 October 2016 at 9:41, Rafał Miłecki <zajec5@gmail.com> wrote:
> Anyway, I finally traced this local vs. buildbot difference to
> CONFIG_KERNEL_KALLSYMS. Images from the buildbot have this symbol enabled,
> which slightly increases the kernel size: enough to stop it from booting
> on the WRT300N v1.

Yesterday, I had the serial console attached to a WRT54GL for other things I had
to do, and tried the buildbot's brcm47xx image for this device. It did not boot.
So I'm again using my local builds without CONFIG_KERNEL_KALLSYMS.

Since some kind of release seems to be coming, I think we can't leave it in
this state. Either disable image generation for that device, or get the image
built without KALLSYMS. What do you think about this?

Best regards,
P. Wassi
On 22 November 2016 at 18:36, <p.wassi@gmx.at> wrote:
>> On 25 October 2016 at 9:41, Rafał Miłecki <zajec5@gmail.com> wrote:
>> Anyway, I finally traced this local vs. buildbot difference to
>> CONFIG_KERNEL_KALLSYMS. Images from the buildbot have this symbol enabled,
>> which slightly increases the kernel size: enough to stop it from booting
>> on the WRT300N v1.
>
> Yesterday, I had the serial console attached to a WRT54GL for other things I had
> to do, and tried the buildbot's brcm47xx image for this device. It did not boot.
> So I'm again using my local builds without CONFIG_KERNEL_KALLSYMS.
>
> Since some kind of release seems to be coming, I think we can't leave it in
> this state. Either disable image generation for that device, or get the image
> built without KALLSYMS. What do you think about this?

I think OpenWrt releases have always been built without KALLSYMS for
brcm47xx/legacy, or even for all targets. I expect to do the same for LEDE.
diff --git a/target/linux/brcm47xx/Makefile b/target/linux/brcm47xx/Makefile
--- a/target/linux/brcm47xx/Makefile
+++ b/target/linux/brcm47xx/Makefile
@@ -13,7 +13,7 @@ FEATURES:=squashfs usb
 SUBTARGETS:=generic mips74k legacy
 MAINTAINER:=Hauke Mehrtens <hauke@hauke-m.de>
 
-KERNEL_PATCHVER:=4.1
+KERNEL_PATCHVER:=4.4
 
 define Target/Description
 	Build firmware images for Broadcom based BCM47xx/53xx routers with MIPS CPU, *not* ARM.