diff mbox series

[2/2] ath79: mikrotik: Change the moment of routerboot partition parser init

Message ID 20211109144607.7571-3-denis281089@gmail.com
State Superseded
Headers show
Series ath79: add a support for reset key on MikroTik RB912 | expand

Commit Message

Denis K Nov. 9, 2021, 2:46 p.m. UTC
If kernel invokes routerboot partion parser module
(drivers/platform/mikrotik/routerboot.c) init function too early, when SPI flash
hasn't been found yet, its rb_hardconfig_init() failed with -ENODEV (since no
mtd device with name "hard_config" was found). In this case when kernel calls
ath9k probe function, it tries a sysfs fallback for loading calibration data,
but hotplug script fails to load it, since "/sys/firmware/mikrotik/hard_config"
is missing (whole "/sys/firmware/mikrotik" dir is empty, since
rb_hardconfig_init() has failed). As a result we see no phy0 and no wlan0, and
in logs we found:

root@OpenWrt:/dev# logread | grep ath9k
kern.warn kernel: [   13.818737] ath9k 18100000.wmac: Direct firmware load
  for ath9k-eeprom-ahb-18100000.wmac.bin failed with error -2
kern.warn kernel: [   13.829463] ath9k 18100000.wmac: Falling back to sysfs
  fallback for: ath9k-eeprom-ahb-18100000.wmac.bin
kern.err kernel: [   14.838766] ath: phy0: Unable to load EEPROM file
  ath9k-eeprom-ahb-18100000.wmac.bin
kern.err kernel: [   14.846909] ath9k 18100000.wmac: failed to initialize device
kern.warn kernel: [   14.852870] ath9k: probe of 18100000.wmac failed with error -22

We managed to fix this by changing module_init() to late_initcall()
in routerboot module:

root@OpenWrt:~# logread | grep ath
kern.warn kernel: [   13.661861] ath9k 18100000.wmac: Direct firmware load for
  ath9k-eeprom-ahb-18100000.wmac.bin failed with error -2
kern.warn kernel: [   13.672583] ath9k 18100000.wmac: Falling back to sysfs for:
  ath9k-eeprom-ahb-18100000.wmac.bin
kern.debug kernel: [   14.569014] ath: EEPROM regdomain sanitized
kern.debug kernel: [   14.569038] ath: EEPROM regdomain: 0x64
kern.debug kernel: [   14.569044] ath: EEPROM indicates we should expect a
  direct regpair map
kern.debug kernel: [   14.569073] ath: Country alpha2 being used: 00
kern.debug kernel: [   14.569079] ath: Regpair used: 0x64

root@OpenWrt:~# iw dev
phy#0
	Interface wlan0
		ifindex 3
		wdev 0x1
		addr xx:xx:xx:xx:xx:xx
		type managed
		txpower 20.00 dBm

Signed-off-by: Denis Kalashnikov <denis281089@gmail.com>
---
 .../linux/generic/files/drivers/platform/mikrotik/routerboot.c  | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Robert Marko Nov. 9, 2021, 5:06 p.m. UTC | #1
On Tue, 9 Nov 2021 at 15:47, Denis Kalashnikov <denis281089@gmail.com> wrote:
>
> If kernel invokes routerboot partion parser module
> (drivers/platform/mikrotik/routerboot.c) init function too early, when SPI flash
> hasn't been found yet, its rb_hardconfig_init() failed with -ENODEV (since no
> mtd device with name "hard_config" was found). In this case when kernel calls
> ath9k probe function, it tries a sysfs fallback for loading calibration data,
> but hotplug script fails to load it, since "/sys/firmware/mikrotik/hard_config"
> is missing (whole "/sys/firmware/mikrotik" dir is empty, since
> rb_hardconfig_init() has failed). As a result we see no phy0 and no wlan0, and
> in logs we found:
>
> root@OpenWrt:/dev# logread | grep ath9k
> kern.warn kernel: [   13.818737] ath9k 18100000.wmac: Direct firmware load
>   for ath9k-eeprom-ahb-18100000.wmac.bin failed with error -2
> kern.warn kernel: [   13.829463] ath9k 18100000.wmac: Falling back to sysfs
>   fallback for: ath9k-eeprom-ahb-18100000.wmac.bin
> kern.err kernel: [   14.838766] ath: phy0: Unable to load EEPROM file
>   ath9k-eeprom-ahb-18100000.wmac.bin
> kern.err kernel: [   14.846909] ath9k 18100000.wmac: failed to initialize device
> kern.warn kernel: [   14.852870] ath9k: probe of 18100000.wmac failed with error -22
>
> We managed to fix this by changing module_init() to late_initcall()
> in routerboot module:

Hi, wouldn't it be better for the rb_hardconfig_init to return
-EPROBE_DEFER so the
the core can try probing later?

Regards,
Robert
>
> root@OpenWrt:~# logread | grep ath
> kern.warn kernel: [   13.661861] ath9k 18100000.wmac: Direct firmware load for
>   ath9k-eeprom-ahb-18100000.wmac.bin failed with error -2
> kern.warn kernel: [   13.672583] ath9k 18100000.wmac: Falling back to sysfs for:
>   ath9k-eeprom-ahb-18100000.wmac.bin
> kern.debug kernel: [   14.569014] ath: EEPROM regdomain sanitized
> kern.debug kernel: [   14.569038] ath: EEPROM regdomain: 0x64
> kern.debug kernel: [   14.569044] ath: EEPROM indicates we should expect a
>   direct regpair map
> kern.debug kernel: [   14.569073] ath: Country alpha2 being used: 00
> kern.debug kernel: [   14.569079] ath: Regpair used: 0x64
>
> root@OpenWrt:~# iw dev
> phy#0
>         Interface wlan0
>                 ifindex 3
>                 wdev 0x1
>                 addr xx:xx:xx:xx:xx:xx
>                 type managed
>                 txpower 20.00 dBm
>
> Signed-off-by: Denis Kalashnikov <denis281089@gmail.com>
> ---
>  .../linux/generic/files/drivers/platform/mikrotik/routerboot.c  | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/target/linux/generic/files/drivers/platform/mikrotik/routerboot.c b/target/linux/generic/files/drivers/platform/mikrotik/routerboot.c
> index 4c8c0bfac5..e5745c88a9 100644
> --- a/target/linux/generic/files/drivers/platform/mikrotik/routerboot.c
> +++ b/target/linux/generic/files/drivers/platform/mikrotik/routerboot.c
> @@ -210,7 +210,7 @@ ssize_t routerboot_tag_show_u32s(const u8 *pld, u16 pld_len, char *buf)
>         return out - buf;
>  }
>
> -module_init(routerboot_init);
> +late_initcall(routerboot_init);
>  module_exit(routerboot_exit);
>
>  MODULE_LICENSE("GPL v2");
> --
> 2.31.1
>
>
> _______________________________________________
> openwrt-devel mailing list
> openwrt-devel@lists.openwrt.org
> https://lists.openwrt.org/mailman/listinfo/openwrt-devel
Denis K Nov. 10, 2021, 9:45 a.m. UTC | #2
> Hi, wouldn't it be better for the rb_hardconfig_init to return
> -EPROBE_DEFER so the
> the core can try probing later?
Hello, Robert! I've tried what you suggested (return -EPROBE_DEFER
from rb_hardconfig_init and routerboot_init). It doesn't work.
do_one_initcall() from linux-5.4.158/init/main.c calls module init
function and it doesn't check return code and recall it later.
Robert Marko Nov. 10, 2021, 9:48 a.m. UTC | #3
On Wed, 10 Nov 2021 at 10:45, Denis K <denis281089@gmail.com> wrote:
>
> > Hi, wouldn't it be better for the rb_hardconfig_init to return
> > -EPROBE_DEFER so the
> > the core can try probing later?
> Hello, Robert! I've tried what you suggested (return -EPROBE_DEFER
> from rb_hardconfig_init and routerboot_init). It doesn't work.
> do_one_initcall() from linux-5.4.158/init/main.c calls module init
> function and it doesn't check return code and recall it later.

Ahh, yeah since it's not a platform driver.
Ideally, it should be converted to one so it has a probe method, and
then deferring will work.

Adding Thibaut as he is the original author, maybe there is a reason
why the platform driver approach won't work.

Regards,
Robert
Thibaut Nov. 10, 2021, 10:20 a.m. UTC | #4
> Le 10 nov. 2021 à 10:48, Robert Marko <robimarko@gmail.com> a écrit :
> 
> On Wed, 10 Nov 2021 at 10:45, Denis K <denis281089@gmail.com> wrote:
>> 
>>> Hi, wouldn't it be better for the rb_hardconfig_init to return
>>> -EPROBE_DEFER so the
>>> the core can try probing later?
>> Hello, Robert! I've tried what you suggested (return -EPROBE_DEFER
>> from rb_hardconfig_init and routerboot_init). It doesn't work.
>> do_one_initcall() from linux-5.4.158/init/main.c calls module init
>> function and it doesn't check return code and recall it later.
> 
> Ahh, yeah since it's not a platform driver.
> Ideally, it should be converted to one so it has a probe method, and
> then deferring will work.
> 
> Adding Thibaut as he is the original author,

Thanks, I don’t follow openwrt-devel.

> maybe there is a reason
> why the platform driver approach won't work.

Quite simply because there is no platform device to probe() for.

Can I get a full boot log (dmesg output) with the problem? It seems none was provided with the original patch here[1]
I’m not sure I can make sense of this: because the driver depends on SPI routines, it should be linked and loaded after SPI. This was the case so far.

Thanks,
Thibaut

[1] https://patchwork.ozlabs.org/project/openwrt/patch/20211109144607.7571-3-denis281089@gmail.com/
Denis K Nov. 10, 2021, 12:06 p.m. UTC | #5
Hello, Thibaut! Here is a dmesg output as an attachment.

I've added printk into rb_hardconfig_init:
[    0.806327] rb_hardconfig_init: get_mtd_device_nm returns error: -19
Thibaut Nov. 11, 2021, 11:43 a.m. UTC | #6
Hi,

> Le 10 nov. 2021 à 13:06, Denis K <denis281089@gmail.com> a écrit :
> 
> Hello, Thibaut! Here is a dmesg output as an attachment.

I’m afraid this dmesg makes no sense to me.

rb_hardconfig should never be loaded where it is: platform drivers (where it is currently hooked) are linked much later in kernel so something looks amiss here.

Unfortunately I don’t have much time to investigate right now, but I’m not (yet) convinced there’s a problem with the driver’s code at this point.

I see this is using kernel 5.10 but afaict master is still using 5.4 for ath79. Maybe this is related?

HTH,
Thibaut
Denis K Nov. 11, 2021, 3:45 p.m. UTC | #7
Hello, Thibaut!

> because the driver depends on SPI routines, it should be linked and loaded after SPI
> rb_hardconfig should never be loaded where it is: platform drivers (where it is currently hooked)
are linked much later in kernel

Maybe the order of init functions of SPI and routerboot depends on a link
order. But SPI probe function can be called in any time: before or after
routerboot init. Adding one more driver -- a new rb91x-key somehow
shifts init timings so SPI _probe_ (not init) now is called after routerboot
init. I suppose that SPI probe is called later now due to several deferred
probes (spi, gpio-latch and rb91x-nand), you can see this in the attached
dmesg:

[    0.348321] gpio-latch gpio_latch: failed to get gpio 7: -517
[    0.355504] gpio-rb91x-key gpio_key: probe ok
...
[    0.405835] rb91x-nand nand_gpio: failed to get gpios: -517
...
[    0.806348] rb_hardconfig_init: get_mtd_device_nm returns error: -19
...
[    0.909300] gpio-latch gpio_latch: probe ok
[    0.927785] spi-nor spi0.0: w25x05 (64 Kbytes)
[    0.932811] 1 routerbootpart partitions found on MTD device spi0.0
[    0.939189] Creating 1 MTD partitions on "spi0.0":
[    0.944193] 0x000000000000-0x000000010000 : "partitions"
[    0.952617] 4 routerbootpart partitions found on MTD device partitions
[    0.959369] Creating 4 MTD partitions on "partitions":
[    0.964750] 0x000000000000-0x00000000c000 : "routerboot"
[    0.971481] 0x00000000c000-0x00000000d000 : "hard_config"
[    0.978215] 0x00000000d000-0x00000000e000 : "bios"
[    0.984340] 0x00000000e000-0x00000000f000 : "soft_config"
[    0.991016] spi-nor spi0.0: probe ok
...
[    1.118306] rb91x-nand nand_gpio: probe ok

> I see this is using kernel 5.10 but afaict master is still using 5.4 for ath79.
> Maybe this is related?
The same thing is when kernel 5.4 is used (see the attachment).

I see two ways to eliminate this race condition:
1) late_initcall of routerboot module, only a single line need to be changed,
2) Somehow add a probe function to routerboot module. May be it should be
called from routerboot_partitions_parse function (from
mtd/parsers/routerbootpart.c).

Regards, Denis.
Thibaut Nov. 11, 2021, 4:41 p.m. UTC | #8
> Le 11 nov. 2021 à 16:45, Denis K <denis281089@gmail.com> a écrit :
> 
> I see two ways to eliminate this race condition:
> 1) late_initcall of routerboot module, only a single line need to be changed,

OK, let’s do that as a workaround and hope that deferred probes won’t overlap past late_initcall()

> 2) Somehow add a probe function to routerboot module. May be it should be
> called from routerboot_partitions_parse function (from
> mtd/parsers/routerbootpart.c).

No. There’s no reason why one should depend on the other (hard/soft config could be defined statically without using the parser, and the driver should still work then).

HTH
T
Denis K Nov. 12, 2021, 3:25 p.m. UTC | #9
> OK, let’s do that as a workaround and hope that deferred probes won’t overlap past late_initcall()
What about using register_mtd_user function: routerboot can call
rb_hardconfig_init and rb_softconfig_init when mtd subsystem notifies
it about a new mtd device? It works.

---
 .../drivers/platform/mikrotik/rb_hardconfig.c | 12 ++++------
 .../drivers/platform/mikrotik/rb_softconfig.c | 12 ++++------
 .../drivers/platform/mikrotik/routerboot.c    | 24 +++++++++++++------
 .../drivers/platform/mikrotik/routerboot.h    |  5 ++--
 4 files changed, 30 insertions(+), 23 deletions(-)

diff --git a/target/linux/generic/files/drivers/platform/mikrotik/rb_hardconfig.c
b/target/linux/generic/files/drivers/platform/mikrotik/rb_hardconfig.c
index e6a6928896..724851474e 100644
--- a/target/linux/generic/files/drivers/platform/mikrotik/rb_hardconfig.c
+++ b/target/linux/generic/files/drivers/platform/mikrotik/rb_hardconfig.c
@@ -676,10 +676,9 @@ static ssize_t hc_wlan_data_bin_read(struct file
*filp, struct kobject *kobj,
     return count;
 }

-int __init rb_hardconfig_init(struct kobject *rb_kobj)
+int __init rb_hardconfig_init(struct kobject *rb_kobj, struct mtd_info *mtd)
 {
     struct kobject *hc_wlan_kobj;
-    struct mtd_info *mtd;
     size_t bytes_read, buflen, outlen;
     const u8 *buf;
     void *outbuf;
@@ -690,20 +689,19 @@ int __init rb_hardconfig_init(struct kobject *rb_kobj)
     hc_kobj = NULL;
     hc_wlan_kobj = NULL;

-    // TODO allow override
-    mtd = get_mtd_device_nm(RB_MTD_HARD_CONFIG);
-    if (IS_ERR(mtd))
+    ret = __get_mtd_device(mtd);
+    if (IS_ERR(ret))
         return -ENODEV;

     hc_buflen = mtd->size;
     hc_buf = kmalloc(hc_buflen, GFP_KERNEL);
     if (!hc_buf) {
-        put_mtd_device(mtd);
+        __put_mtd_device(mtd);
         return -ENOMEM;
     }

     ret = mtd_read(mtd, 0, hc_buflen, &bytes_read, hc_buf);
-    put_mtd_device(mtd);
+    __put_mtd_device(mtd);

     if (ret)
         goto fail;
diff --git a/target/linux/generic/files/drivers/platform/mikrotik/rb_softconfig.c
b/target/linux/generic/files/drivers/platform/mikrotik/rb_softconfig.c
index 070bd32d5a..59c42e4cf4 100644
--- a/target/linux/generic/files/drivers/platform/mikrotik/rb_softconfig.c
+++ b/target/linux/generic/files/drivers/platform/mikrotik/rb_softconfig.c
@@ -705,9 +705,8 @@ mtdfail:

 static struct kobj_attribute sc_kattrcommit = __ATTR(commit,
RB_SC_RMODE|RB_SC_WMODE, sc_commit_show, sc_commit_store);

-int __init rb_softconfig_init(struct kobject *rb_kobj)
+int __init rb_softconfig_init(struct kobject *rb_kobj, struct mtd_info *mtd)
 {
-    struct mtd_info *mtd;
     size_t bytes_read, buflen;
     const u8 *buf;
     int i, ret;
@@ -716,20 +715,19 @@ int __init rb_softconfig_init(struct kobject *rb_kobj)
     sc_buf = NULL;
     sc_kobj = NULL;

-    // TODO allow override
-    mtd = get_mtd_device_nm(RB_MTD_SOFT_CONFIG);
-    if (IS_ERR(mtd))
+    ret = __get_mtd_device(mtd);
+    if (IS_ERR(ret))
         return -ENODEV;

     sc_buflen = mtd->size;
     sc_buf = kmalloc(sc_buflen, GFP_KERNEL);
     if (!sc_buf) {
-        put_mtd_device(mtd);
+        __put_mtd_device(mtd);
         return -ENOMEM;
     }

     ret = mtd_read(mtd, 0, sc_buflen, &bytes_read, sc_buf);
-    put_mtd_device(mtd);
+    __put_mtd_device(mtd);

     if (ret)
         goto fail;
diff --git a/target/linux/generic/files/drivers/platform/mikrotik/routerboot.c
b/target/linux/generic/files/drivers/platform/mikrotik/routerboot.c
index 4c8c0bfac5..77f39709ce 100644
--- a/target/linux/generic/files/drivers/platform/mikrotik/routerboot.c
+++ b/target/linux/generic/files/drivers/platform/mikrotik/routerboot.c
@@ -160,25 +160,35 @@ fail:
     return ret;
 }

+static void routerboot_mtd_notifier_add(struct mtd_info *mtd)
+{
+    if (mtd->type != MTD_NORFLASH)
+        return;
+    if (!strcmp(mtd->name, RB_MTD_HARD_CONFIG)) {
+        rb_hardconfig_init(rb_kobj, mtd);
+    } else if (!strcmp(mtd->name, RB_MTD_SOFT_CONFIG)) {
+        rb_softconfig_init(rb_kobj, mtd);
+    }
+}
+
+static struct mtd_notifier routerboot_mtd_notifier = {
+        .add = routerboot_mtd_notifier_add,
+};
+
 static int __init routerboot_init(void)
 {
     rb_kobj = kobject_create_and_add("mikrotik", firmware_kobj);
     if (!rb_kobj)
         return -ENOMEM;

-    /*
-     * We ignore the following return values and always register.
-     * These init() routines are designed so that their failed state is
-     * always manageable by the corresponding exit() calls.
-     */
-    rb_hardconfig_init(rb_kobj);
-    rb_softconfig_init(rb_kobj);
+    register_mtd_user(&routerboot_mtd_notifier);

     return 0;
 }

 static void __exit routerboot_exit(void)
 {
+    unregister_mtd_user(&routerboot_mtd_notifier);
     rb_softconfig_exit();
     rb_hardconfig_exit();
     kobject_put(rb_kobj);    // recursive afaict
diff --git a/target/linux/generic/files/drivers/platform/mikrotik/routerboot.h
b/target/linux/generic/files/drivers/platform/mikrotik/routerboot.h
index 67d89808d5..5277a3cd94 100644
--- a/target/linux/generic/files/drivers/platform/mikrotik/routerboot.h
+++ b/target/linux/generic/files/drivers/platform/mikrotik/routerboot.h
@@ -10,6 +10,7 @@
 #define _ROUTERBOOT_H_

 #include <linux/types.h>
+#include <linux/mtd/mtd.h>

 // these magic values are stored in cpu-endianness on flash
 #define RB_MAGIC_HARD    (('H') | ('a' << 8) | ('r' << 16) | ('d' << 24))
@@ -25,10 +26,10 @@
 int routerboot_tag_find(const u8 *bufhead, const size_t buflen, const
u16 tag_id, u16 *pld_ofs, u16 *pld_len);
 int routerboot_rle_decode(const u8 *in, size_t inlen, u8 *out, size_t *outlen);

-int __init rb_hardconfig_init(struct kobject *rb_kobj);
+int __init rb_hardconfig_init(struct kobject *rb_kobj, struct mtd_info *mtd);
 void __exit rb_hardconfig_exit(void);

-int __init rb_softconfig_init(struct kobject *rb_kobj);
+int __init rb_softconfig_init(struct kobject *rb_kobj, struct mtd_info *mtd);
 void __exit rb_softconfig_exit(void);

 ssize_t routerboot_tag_show_string(const u8 *pld, u16 pld_len, char *buf);
--

Regards, Denis
Thibaut Nov. 12, 2021, 5:37 p.m. UTC | #10
> Le 12 nov. 2021 à 16:25, Denis K <denis281089@gmail.com> a écrit :
> 
>> OK, let’s do that as a workaround and hope that deferred probes won’t overlap past late_initcall()
> What about using register_mtd_user function: routerboot can call
> rb_hardconfig_init and rb_softconfig_init when mtd subsystem notifies
> it about a new mtd device? It works.

Looks like a good idea, I should have thought about it ;)
My only nit is that the _init() calls will be executed with the MTD table mutex held, but I suppose it’s acceptable.

However please do not remove the comments I left in the code. Furthermore, for the sake of consistency the module should also react to the (arguably unlikely) removal of the target MTD partitions: a .remove() callback will deal with that. Finally the inclusion of mtd/mtd.h is not necessary in the routerboot.h header (we’re using pointers and that header has no dependency on mtd).

I have amended your patch to reflect the above comments, and adjusted the commit message.

Can you confirm this still works for you? I cannot test it myself now.

Thanks,
T
Denis K Nov. 15, 2021, 1:39 p.m. UTC | #11
I've tested it (on 5.4 and 5.10). On reboot I've had this kernel panic:

[ 1028.460043] Reserved instruction in kernel code[#1]:
[ 1028.465191] CPU: 0 PID: 2406 Comm: procd Not tainted 5.4.158 #0
[ 1028.471300] $ 0   : 00000000 00000001 00000000 00000000
[ 1028.476702] $ 4   : 8342118c 806adf50 00000000 00000000
[ 1028.482095] $ 8   : 83812ab0 00000000 83812ad4 00000002
[ 1028.487490] $12   : fffffffd 00000402 80731904 00000040
[ 1028.492883] $16   : 8073503c 83419000 807318ec 80730000
[ 1028.498278] $20   : 80730000 808a0000 806a9efc 806a9eec
[ 1028.503671] $24   : 00000000 00000000
[ 1028.509065] $28   : 83c42000 83c43b80 83cefc00 803c1fd0
[ 1028.514459] Hi    : 00474bff
[ 1028.517426] Lo    : b47b3346
[ 1028.520400] epc   : 80764000 0x80764000
[ 1028.524371] ra    : 803c1fd0 del_mtd_device+0x68/0x100
[ 1028.529673] Status: 1100dc03    KERNEL EXL IE
[ 1028.533994] Cause : 00800028 (ExcCode 0a)
[ 1028.538135] PrId  : 0001974c (MIPS 74Kc)
[ 1028.542181] Modules linked in: ath9k ath9k_common pppoe ppp_async
iptable_nat ath9k_hw ath xt_state xt_nat xt_conntrack xt_REDIRECT
xt_MASQUERADE xt_FLOWOFFLOAD pppox ppp_generic nf_nat nf_flow_table_hw
nf_flow_table nf_conntrack mac80211 ipt_REJECT cfg80211 xt_time
xt_tcpudp xt_multiport xt_mark xt_mac xt_limit xt_comment xt_TCPMSS
xt_LOG slhc nf_reject_ipv4 nf_log_ipv4 nf_defrag_ipv6 nf_defrag_ipv4
iptable_mangle iptable_filter ip_tables crc_ccitt compat evdev
input_core nf_log_ipv6 nf_log_common ip6table_mangle ip6table_filter
ip6_tables ip6t_REJECT x_tables nf_reject_ipv6 sha256_generic
libsha256 seqiv jitterentropy_rng drbg hmac ghash_generic gf128mul gcm
ctr cmac ccm fsl_mph_dr_of ehci_platform ehci_fsl ehci_hcd
gpio_button_hotplug usbcore nls_base usb_common aead cryptomgr
crypto_null crypto_hash
[ 1028.615764] Process procd (pid: 2406, threadinfo=2c3273bc,
task=f31e3962, tls=77ef0dcc)
[ 1028.624021] Stack : 00000000 802183b4 80680000 8066ecb0 80731768
83419000 80731978 00000000
[ 1028.632649]         80730000 803c4a5c 80730000 00000001 83c09400
802150e8 000b0054 83418400
[ 1028.641278]         83ff4000 80731978 00000000 803c4a1c 83ff1360
00000000 00000001 808a0000
[ 1028.649906]         83ff1360 80731768 83d99880 80731978 00000000
803c63a4 806a9efc 806a9eec
[ 1028.658535]         83cefc00 83d01d40 83d99880 80733b18 00000000
80733b18 00000044 803c2248
[ 1028.667163]         ...
[ 1028.669683] Call Trace:
[ 1028.669685]
[ 1028.673757] [<802183b4>] sysfs_remove_files+0x38/0x5c
[ 1028.679019] [<803c4a5c>] __mtd_del_partition+0xa8/0x100
[ 1028.684439] [<802150e8>] __kernfs_remove.part.0+0x1e4/0x318
[ 1028.690204] [<803c4a1c>] __mtd_del_partition+0x68/0x100
[ 1028.695604] [<803c63a4>] del_mtd_partitions+0x78/0xf0
[ 1028.700826] [<803c2248>] mtd_device_unregister+0x28/0x5c
[ 1028.706314] [<803a1c80>] __device_release_driver+0x178/0x214
[ 1028.712159] [<8039fdf0>] klist_devices_put+0x0/0x8
[ 1028.717105] [<803a1d48>] device_release_driver+0x2c/0x44
[ 1028.722589] [<803a0f68>] bus_remove_device+0x154/0x168
[ 1028.727905] [<803b7944>] __unregister+0x0/0x20
[ 1028.732522] [<8039cd80>] device_del+0x15c/0x458
[ 1028.737228] [<803428e0>] spi_sync_transfer.constprop.0+0x60/0x6c
[ 1028.743438] [<803b7348>] spi_complete+0x0/0x8
[ 1028.747936] [<803b7944>] __unregister+0x0/0x20
[ 1028.752524] [<803b7910>] spi_unregister_device+0x40/0x74
[ 1028.758007] [<803b7954>] __unregister+0x10/0x20
[ 1028.762685] [<803b7944>] __unregister+0x0/0x20
[ 1028.767277] [<8039bd78>] device_for_each_child+0x50/0xa4
[ 1028.772778] [<803b903c>] spi_unregister_controller+0x3c/0x170
[ 1028.778724] [<803bdb90>] ath79_spi_remove+0x1c/0x78
[ 1028.783766] [<8039f658>] device_shutdown+0x13c/0x1f4
[ 1028.788897] [<800a4814>] blocking_notifier_call_chain+0x74/0xe8
[ 1028.795019] [<800a5d6c>] kernel_restart+0x40/0xac
[ 1028.799877] [<800a5ed8>] __do_sys_reboot+0x100/0x214
[ 1028.805037] [<8012fdc8>] filemap_map_pages+0x3a0/0x3d0
[ 1028.810352] [<801617e4>] handle_mm_fault+0x89c/0xcf0
[ 1028.815520] [<8006fdec>] do_page_fault+0xb4/0x4b8
[ 1028.820379] [<8006e18c>] syscall_common+0x34/0x58
[ 1028.825233]
[ 1028.826769] Code: 00000000  00000000  00000000 <7f454c46> 01020100
00000000  00000000  00030008  00000001
[ 1028.836839]
[ 1028.838458] ---[ end trace 37f524f87b727ac2 ]---
[ 1028.843240] Kernel panic - not syncing: Fatal exception
[ 1028.848639] Rebooting in 3 seconds..

If I comment out calls of rb_hardconfig_exit and rb_softconfig_exit in
routerboot_mtd_notifier_remove, there is no panic. I'm trying to
determine why this is happening.

Regards, Denis
Denis K Nov. 16, 2021, 9:46 a.m. UTC | #12
I got it! _exit macro of rb_hardconfig_init seems to cause this kernel
panic. Here is a patch.

Regards, Denis.
Thibaut Nov. 16, 2021, 10:20 a.m. UTC | #13
Hi,

> Le 16 nov. 2021 à 10:46, Denis K <denis281089@gmail.com> a écrit :
> 
> I got it! _exit macro of rb_hardconfig_init seems to cause this kernel
> panic. Here is a patch.

Thanks; indeed that’s logical: functions marked as __exit are discarded when the module is built into the kernel: they’re no longer available when the mtd_notifier executes the remove() callback. I missed that. Since these are very small, I agree it’s ok to remove the __exit qualifier.

I see the same rationale for removing the __init qualifier from the init functions since the mtd notifier could be called at any point during kernel execution, but that bothers me a bit more because these are somewhat large functions. Then again it’s a tradeoff between doing the right thing (using a notifier) vs using a workaround (late_initcall()). I think it’s acceptable.

Anyway, I’ve refactored the patch with your changes and I added a note and edited the commit message about the point above; see attachment below. Hopefully this one is good to go :)

You are the author of this patch and you did all the heavy lifting (thanks again for that), so if you have no further changes/comments I’d suggest you submit this as a v2. I’ll ACK it then (please CC-me - this should be automatic with my SoB-line added).

Thanks,
Thibaut
Denis K Nov. 16, 2021, 4:17 p.m. UTC | #14
Thank you for your help, Thibaut. I think we've found the right way of
initializing routerboot. I've sent a v2 of the patch.

Denis

вт, 16 нояб. 2021 г. в 13:20, Thibaut <hacks@slashdirt.org>:
>
> Hi,
>
> > Le 16 nov. 2021 à 10:46, Denis K <denis281089@gmail.com> a écrit :
> >
> > I got it! _exit macro of rb_hardconfig_init seems to cause this kernel
> > panic. Here is a patch.
>
> Thanks; indeed that’s logical: functions marked as __exit are discarded when the module is built into the kernel: they’re no longer available when the mtd_notifier executes the remove() callback. I missed that. Since these are very small, I agree it’s ok to remove the __exit qualifier.
>
> I see the same rationale for removing the __init qualifier from the init functions since the mtd notifier could be called at any point during kernel execution, but that bothers me a bit more because these are somewhat large functions. Then again it’s a tradeoff between doing the right thing (using a notifier) vs using a workaround (late_initcall()). I think it’s acceptable.
>
> Anyway, I’ve refactored the patch with your changes and I added a note and edited the commit message about the point above; see attachment below. Hopefully this one is good to go :)
>
> You are the author of this patch and you did all the heavy lifting (thanks again for that), so if you have no further changes/comments I’d suggest you submit this as a v2. I’ll ACK it then (please CC-me - this should be automatic with my SoB-line added).
>
> Thanks,
> Thibaut
>
diff mbox series

Patch

diff --git a/target/linux/generic/files/drivers/platform/mikrotik/routerboot.c b/target/linux/generic/files/drivers/platform/mikrotik/routerboot.c
index 4c8c0bfac5..e5745c88a9 100644
--- a/target/linux/generic/files/drivers/platform/mikrotik/routerboot.c
+++ b/target/linux/generic/files/drivers/platform/mikrotik/routerboot.c
@@ -210,7 +210,7 @@  ssize_t routerboot_tag_show_u32s(const u8 *pld, u16 pld_len, char *buf)
 	return out - buf;
 }
 
-module_init(routerboot_init);
+late_initcall(routerboot_init);
 module_exit(routerboot_exit);
 
 MODULE_LICENSE("GPL v2");