diff mbox

[LEDE-DEV] libubus: initialize list member for ubus_auto_conn's timer

Message ID 1481614887-18236-1-git-send-email-ardeleanalex@gmail.com
State Rejected
Headers show

Commit Message

Alexandru Ardelean Dec. 13, 2016, 7:41 a.m. UTC
Every once in a while, we'll get stacktrace:
```
(gdb) bt
\#0  0xb7bc4668 in _list_del (entry=0x10015688 <conn+116>) at /home/sandu/work/Wrt/openwrt/build_dir/target-powerpc_8540_uClibc-0.9.33.2_ewok/libubox-2015-06-29/list.h:83
\#1  list_del (entry=0x10015688 <conn+116>) at /home/sandu/work/Wrt/openwrt/build_dir/target-powerpc_8540_uClibc-0.9.33.2_ewok/libubox-2015-06-29/list.h:90
\#2  uloop_timeout_cancel (timeout=timeout@entry=0x10015688 <conn+116>) at /home/sandu/work/Wrt/openwrt/build_dir/target-powerpc_8540_uClibc-0.9.33.2_ewok/libubox-2015-06-29/uloop.c:474
\#3  0x10003794 in ubus_auto_shutdown (conn=0x10015614 <conn>) at /home/sandu/work/Wrt/openwrt/staging_dir/target-powerpc_8540_uClibc-0.9.33.2_ewok/usr/include/libubus.h:249
\#4  module_ubus_fini () at ubus.c:64
\#5  0x100013fc in main (argc=<optimized out>, argv=<optimized out>) at module.c:128
```

In our code, `ubus_auto_connect()` is called, then due to some logic,
the module has to quickly stop, calling ubus_auto_shutdown().

It seems that there is case in where the `timer` timeout, is not yet
registered with the internal list of timeouts from uloop, which
ends up trying to delete an invalid list.

So, one solution is to init the list element of the `timer` of
the `ubus_auto_conn` struct.

Signed-off-by: Alexandru Ardelean <ardeleanalex@gmail.com>
---
 libubus.c | 1 +
 1 file changed, 1 insertion(+)

Comments

Felix Fietkau Dec. 13, 2016, 7:52 a.m. UTC | #1
On 2016-12-13 08:41, Alexandru Ardelean wrote:
> Every once in a while, we'll get stacktrace:
> ```
> (gdb) bt
> \#0  0xb7bc4668 in _list_del (entry=0x10015688 <conn+116>) at /home/sandu/work/Wrt/openwrt/build_dir/target-powerpc_8540_uClibc-0.9.33.2_ewok/libubox-2015-06-29/list.h:83
> \#1  list_del (entry=0x10015688 <conn+116>) at /home/sandu/work/Wrt/openwrt/build_dir/target-powerpc_8540_uClibc-0.9.33.2_ewok/libubox-2015-06-29/list.h:90
> \#2  uloop_timeout_cancel (timeout=timeout@entry=0x10015688 <conn+116>) at /home/sandu/work/Wrt/openwrt/build_dir/target-powerpc_8540_uClibc-0.9.33.2_ewok/libubox-2015-06-29/uloop.c:474
> \#3  0x10003794 in ubus_auto_shutdown (conn=0x10015614 <conn>) at /home/sandu/work/Wrt/openwrt/staging_dir/target-powerpc_8540_uClibc-0.9.33.2_ewok/usr/include/libubus.h:249
> \#4  module_ubus_fini () at ubus.c:64
> \#5  0x100013fc in main (argc=<optimized out>, argv=<optimized out>) at module.c:128
> ```
> 
> In our code, `ubus_auto_connect()` is called, then due to some logic,
> the module has to quickly stop, calling ubus_auto_shutdown().
> 
> It seems that there is case in where the `timer` timeout, is not yet
> registered with the internal list of timeouts from uloop, which
> ends up trying to delete an invalid list.
> 
> So, one solution is to init the list element of the `timer` of
> the `ubus_auto_conn` struct.
> 
> Signed-off-by: Alexandru Ardelean <ardeleanalex@gmail.com>
I think this patch is a workaround that simply hides the real issue.
uloop_timeout_cancel will only call list_del if the timer 'pending' flag
is set.

It seems to me that you might be using ubus_auto_connect on
uninitialized memory. If that's the case, you might run into issues in
other places as well, and you should just add a memset to your application.

- Felix
Alexandru Ardelean Dec. 13, 2016, 8:04 a.m. UTC | #2
On Tue, Dec 13, 2016 at 9:52 AM, Felix Fietkau <nbd@nbd.name> wrote:
> On 2016-12-13 08:41, Alexandru Ardelean wrote:
>> Every once in a while, we'll get stacktrace:
>> ```
>> (gdb) bt
>> \#0  0xb7bc4668 in _list_del (entry=0x10015688 <conn+116>) at /home/sandu/work/Wrt/openwrt/build_dir/target-powerpc_8540_uClibc-0.9.33.2_ewok/libubox-2015-06-29/list.h:83
>> \#1  list_del (entry=0x10015688 <conn+116>) at /home/sandu/work/Wrt/openwrt/build_dir/target-powerpc_8540_uClibc-0.9.33.2_ewok/libubox-2015-06-29/list.h:90
>> \#2  uloop_timeout_cancel (timeout=timeout@entry=0x10015688 <conn+116>) at /home/sandu/work/Wrt/openwrt/build_dir/target-powerpc_8540_uClibc-0.9.33.2_ewok/libubox-2015-06-29/uloop.c:474
>> \#3  0x10003794 in ubus_auto_shutdown (conn=0x10015614 <conn>) at /home/sandu/work/Wrt/openwrt/staging_dir/target-powerpc_8540_uClibc-0.9.33.2_ewok/usr/include/libubus.h:249
>> \#4  module_ubus_fini () at ubus.c:64
>> \#5  0x100013fc in main (argc=<optimized out>, argv=<optimized out>) at module.c:128
>> ```
>>
>> In our code, `ubus_auto_connect()` is called, then due to some logic,
>> the module has to quickly stop, calling ubus_auto_shutdown().
>>
>> It seems that there is case in where the `timer` timeout, is not yet
>> registered with the internal list of timeouts from uloop, which
>> ends up trying to delete an invalid list.
>>
>> So, one solution is to init the list element of the `timer` of
>> the `ubus_auto_conn` struct.
>>
>> Signed-off-by: Alexandru Ardelean <ardeleanalex@gmail.com>
> I think this patch is a workaround that simply hides the real issue.
> uloop_timeout_cancel will only call list_del if the timer 'pending' flag
> is set.
>
> It seems to me that you might be using ubus_auto_connect on
> uninitialized memory. If that's the case, you might run into issues in
> other places as well, and you should just add a memset to your application.
>
> - Felix

Hmmm, you're right.
It is only one module I'm seeing this in.
Maybe it's some weird memory corruption, or race ; since that
ubus_auto_conn struct is init-ed to 0 [since it's static].

Will dig deeper into that.

Sorry for the noise.
diff mbox

Patch

diff --git a/libubus.c b/libubus.c
index 8163ff7..faa30d3 100644
--- a/libubus.c
+++ b/libubus.c
@@ -337,6 +337,7 @@  static void ubus_auto_connect_cb(struct uloop_timeout *timeout)
 void ubus_auto_connect(struct ubus_auto_conn *conn)
 {
 	conn->timer.cb = ubus_auto_connect_cb;
+	INIT_LIST_HEAD(&conn->timer.list);
 	ubus_auto_connect_cb(&conn->timer);
 }