diff mbox

net: Fix potential NULL pointer dereference in __skb_try_recv_datagram

Message ID 1451921108.8255.74.camel@edumazet-glaptop2.roam.corp.google.com
State RFC, archived
Delegated to: David Miller
Headers show

Commit Message

Eric Dumazet Jan. 4, 2016, 3:25 p.m. UTC
On Mon, 2016-01-04 at 10:10 +0100, Jacob Siverskog wrote:
> On Wed, Dec 30, 2015 at 11:30 PM, Cong Wang <xiyou.wangcong@gmail.com> wrote:
> > On Wed, Dec 30, 2015 at 6:30 AM, Jacob Siverskog
> > <jacob@teenage.engineering> wrote:
> >> On Wed, Dec 30, 2015 at 2:26 PM, Eric Dumazet <edumazet@google.com> wrote:
> >>> How often can you trigger this bug ?
> >>
> >> Ok. I don't have a good repro to trigger it unfortunately, I've seen it just a
> >> few times when bringing up/down network interfaces. Does the trace
> >> give any clue?
> >>
> >
> > A little bit. You need to help people to narrow down the problem
> > because there are too many places using skb->next and skb->prev.
> >
> > Since you mentioned it seems related to network interface flip,
> > what network interfaces are you using? What's is your TC setup?
> >
> > Thanks.
> 
> The system contains only one physical network interface (TI WL1837,
> wl18xx module).
> The state prior to the crash was as follows:
> - One virtual network interface active (as STA, associated with access point)
> - Bluetooth (BLE only) active (same physical chip, co-existence,
> btwilink/st_drv modules)
> 
> Actions made around the time of the crash:
> - Bluetooth disabled
> - One additional virtual network interface brought up (also as STA)
> 
> I believe the crash occurred between these two actions. I just saw
> that there are some interesting events in the log prior to the crash:
> kernel: Bluetooth: Unable to push skb to HCI core(-6)
> kernel: (stc):  proto stack 4's ->recv failed
> kernel: (stc): remove_channel_from_table: id 3
> kernel: (stc): remove_channel_from_table: id 2
> kernel: (stc): remove_channel_from_table: id 4
> kernel: (stc):  all chnl_ids unregistered
> kernel: (stk) :ldisc_install = 0(stc): st_tty_close
> 
> The first print is from btwilink.c. However, I can't see the
> connection between Bluetooth (BLE) and UDP/IPv6 (we're not using
> 6LoWPAN or anything similar).
> 
> Thanks, Jacob

Definitely these details are useful ;)

Could you try :



--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Rainer Weikusat Jan. 4, 2016, 4:14 p.m. UTC | #1
Eric Dumazet <eric.dumazet@gmail.com> writes:
> On Mon, 2016-01-04 at 10:10 +0100, Jacob Siverskog wrote:

[...]

>> I believe the crash occurred between these two actions. I just saw
>> that there are some interesting events in the log prior to the crash:
>> kernel: Bluetooth: Unable to push skb to HCI core(-6)
>> kernel: (stc):  proto stack 4's ->recv failed
>> kernel: (stc): remove_channel_from_table: id 3
>> kernel: (stc): remove_channel_from_table: id 2
>> kernel: (stc): remove_channel_from_table: id 4
>> kernel: (stc):  all chnl_ids unregistered
>> kernel: (stk) :ldisc_install = 0(stc): st_tty_close
>> 
>> The first print is from btwilink.c. However, I can't see the
>> connection between Bluetooth (BLE) and UDP/IPv6 (we're not using
>> 6LoWPAN or anything similar).
>> 
>> Thanks, Jacob
>
> Definitely these details are useful ;)
>
> Could you try :
>
> diff --git a/drivers/misc/ti-st/st_core.c b/drivers/misc/ti-st/st_core.c
> index 6e3af8b42cdd..0c99a74fb895 100644
> --- a/drivers/misc/ti-st/st_core.c
> +++ b/drivers/misc/ti-st/st_core.c
> @@ -912,7 +912,9 @@ void st_core_exit(struct st_data_s *st_gdata)
>  		skb_queue_purge(&st_gdata->txq);
>  		skb_queue_purge(&st_gdata->tx_waitq);
>  		kfree_skb(st_gdata->rx_skb);
> +		st_gdata->rx_skb = NULL;
>  		kfree_skb(st_gdata->tx_skb);
> +		st_gdata->tx_skb = NULL;
>  		/* TTY ldisc cleanup */
>  		err = tty_unregister_ldisc(N_TI_WL);
>  		if (err)

Hmm ... the code continues with

		err = tty_unregister_ldisc(N_TI_WL);
		if (err)
			pr_err("unable to un-register ldisc %ld", err);
		/* free the global data pointer */
		kfree(st_gdata);

So who would ever see that the rx_skb and tx_skb pointers were cleared
prior to freeing the data structure containing them?
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eric Dumazet Jan. 4, 2016, 4:45 p.m. UTC | #2
On Mon, 2016-01-04 at 16:14 +0000, Rainer Weikusat wrote:
> Eric Dumazet <eric.dumazet@gmail.com> writes:
> > On Mon, 2016-01-04 at 10:10 +0100, Jacob Siverskog wrote:
> 
> [...]
> 
> >> I believe the crash occurred between these two actions. I just saw
> >> that there are some interesting events in the log prior to the crash:
> >> kernel: Bluetooth: Unable to push skb to HCI core(-6)
> >> kernel: (stc):  proto stack 4's ->recv failed
> >> kernel: (stc): remove_channel_from_table: id 3
> >> kernel: (stc): remove_channel_from_table: id 2
> >> kernel: (stc): remove_channel_from_table: id 4
> >> kernel: (stc):  all chnl_ids unregistered
> >> kernel: (stk) :ldisc_install = 0(stc): st_tty_close
> >> 
> >> The first print is from btwilink.c. However, I can't see the
> >> connection between Bluetooth (BLE) and UDP/IPv6 (we're not using
> >> 6LoWPAN or anything similar).
> >> 
> >> Thanks, Jacob
> >
> > Definitely these details are useful ;)
> >
> > Could you try :
> >
> > diff --git a/drivers/misc/ti-st/st_core.c b/drivers/misc/ti-st/st_core.c
> > index 6e3af8b42cdd..0c99a74fb895 100644
> > --- a/drivers/misc/ti-st/st_core.c
> > +++ b/drivers/misc/ti-st/st_core.c
> > @@ -912,7 +912,9 @@ void st_core_exit(struct st_data_s *st_gdata)
> >  		skb_queue_purge(&st_gdata->txq);
> >  		skb_queue_purge(&st_gdata->tx_waitq);
> >  		kfree_skb(st_gdata->rx_skb);
> > +		st_gdata->rx_skb = NULL;
> >  		kfree_skb(st_gdata->tx_skb);
> > +		st_gdata->tx_skb = NULL;
> >  		/* TTY ldisc cleanup */
> >  		err = tty_unregister_ldisc(N_TI_WL);
> >  		if (err)
> 
> Hmm ... the code continues with
> 
> 		err = tty_unregister_ldisc(N_TI_WL);
> 		if (err)
> 			pr_err("unable to un-register ldisc %ld", err);
> 		/* free the global data pointer */
> 		kfree(st_gdata);
> 
> So who would ever see that the rx_skb and tx_skb pointers were cleared
> prior to freeing the data structure containing them?

This is the theory, but I suspect a use after free.

kfree(st_gdata) does not clear all content with 0, unless you use
special SLUB/SLAB debugging features.



--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jacob Siverskog Jan. 5, 2016, 11:07 a.m. UTC | #3
On Mon, Jan 4, 2016 at 4:25 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Mon, 2016-01-04 at 10:10 +0100, Jacob Siverskog wrote:
>> On Wed, Dec 30, 2015 at 11:30 PM, Cong Wang <xiyou.wangcong@gmail.com> wrote:
>> > On Wed, Dec 30, 2015 at 6:30 AM, Jacob Siverskog
>> > <jacob@teenage.engineering> wrote:
>> >> On Wed, Dec 30, 2015 at 2:26 PM, Eric Dumazet <edumazet@google.com> wrote:
>> >>> How often can you trigger this bug ?
>> >>
>> >> Ok. I don't have a good repro to trigger it unfortunately, I've seen it just a
>> >> few times when bringing up/down network interfaces. Does the trace
>> >> give any clue?
>> >>
>> >
>> > A little bit. You need to help people to narrow down the problem
>> > because there are too many places using skb->next and skb->prev.
>> >
>> > Since you mentioned it seems related to network interface flip,
>> > what network interfaces are you using? What's is your TC setup?
>> >
>> > Thanks.
>>
>> The system contains only one physical network interface (TI WL1837,
>> wl18xx module).
>> The state prior to the crash was as follows:
>> - One virtual network interface active (as STA, associated with access point)
>> - Bluetooth (BLE only) active (same physical chip, co-existence,
>> btwilink/st_drv modules)
>>
>> Actions made around the time of the crash:
>> - Bluetooth disabled
>> - One additional virtual network interface brought up (also as STA)
>>
>> I believe the crash occurred between these two actions. I just saw
>> that there are some interesting events in the log prior to the crash:
>> kernel: Bluetooth: Unable to push skb to HCI core(-6)
>> kernel: (stc):  proto stack 4's ->recv failed
>> kernel: (stc): remove_channel_from_table: id 3
>> kernel: (stc): remove_channel_from_table: id 2
>> kernel: (stc): remove_channel_from_table: id 4
>> kernel: (stc):  all chnl_ids unregistered
>> kernel: (stk) :ldisc_install = 0(stc): st_tty_close
>>
>> The first print is from btwilink.c. However, I can't see the
>> connection between Bluetooth (BLE) and UDP/IPv6 (we're not using
>> 6LoWPAN or anything similar).
>>
>> Thanks, Jacob
>
> Definitely these details are useful ;)
>
> Could you try :
>
> diff --git a/drivers/misc/ti-st/st_core.c b/drivers/misc/ti-st/st_core.c
> index 6e3af8b42cdd..0c99a74fb895 100644
> --- a/drivers/misc/ti-st/st_core.c
> +++ b/drivers/misc/ti-st/st_core.c
> @@ -912,7 +912,9 @@ void st_core_exit(struct st_data_s *st_gdata)
>                 skb_queue_purge(&st_gdata->txq);
>                 skb_queue_purge(&st_gdata->tx_waitq);
>                 kfree_skb(st_gdata->rx_skb);
> +               st_gdata->rx_skb = NULL;
>                 kfree_skb(st_gdata->tx_skb);
> +               st_gdata->tx_skb = NULL;
>                 /* TTY ldisc cleanup */
>                 err = tty_unregister_ldisc(N_TI_WL);
>                 if (err)
>
>

Sure. Since I don't have a good way to trigger the initial issue, I
can't really know if there is a difference with your patch. However,
normal usage seems to work as expected with your patch. I've tried to
reproduce the initial issue with and without your patch repeatedly for
hours and have not seen any crash in any of the runs so far.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eric Dumazet Jan. 5, 2016, 2:14 p.m. UTC | #4
On Tue, 2016-01-05 at 12:07 +0100, Jacob Siverskog wrote:
> On Mon, Jan 4, 2016 at 4:25 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> > On Mon, 2016-01-04 at 10:10 +0100, Jacob Siverskog wrote:
> >> On Wed, Dec 30, 2015 at 11:30 PM, Cong Wang <xiyou.wangcong@gmail.com> wrote:
> >> > On Wed, Dec 30, 2015 at 6:30 AM, Jacob Siverskog
> >> > <jacob@teenage.engineering> wrote:
> >> >> On Wed, Dec 30, 2015 at 2:26 PM, Eric Dumazet <edumazet@google.com> wrote:
> >> >>> How often can you trigger this bug ?
> >> >>
> >> >> Ok. I don't have a good repro to trigger it unfortunately, I've seen it just a
> >> >> few times when bringing up/down network interfaces. Does the trace
> >> >> give any clue?
> >> >>
> >> >
> >> > A little bit. You need to help people to narrow down the problem
> >> > because there are too many places using skb->next and skb->prev.
> >> >
> >> > Since you mentioned it seems related to network interface flip,
> >> > what network interfaces are you using? What's is your TC setup?
> >> >
> >> > Thanks.
> >>
> >> The system contains only one physical network interface (TI WL1837,
> >> wl18xx module).
> >> The state prior to the crash was as follows:
> >> - One virtual network interface active (as STA, associated with access point)
> >> - Bluetooth (BLE only) active (same physical chip, co-existence,
> >> btwilink/st_drv modules)
> >>
> >> Actions made around the time of the crash:
> >> - Bluetooth disabled
> >> - One additional virtual network interface brought up (also as STA)
> >>
> >> I believe the crash occurred between these two actions. I just saw
> >> that there are some interesting events in the log prior to the crash:
> >> kernel: Bluetooth: Unable to push skb to HCI core(-6)
> >> kernel: (stc):  proto stack 4's ->recv failed
> >> kernel: (stc): remove_channel_from_table: id 3
> >> kernel: (stc): remove_channel_from_table: id 2
> >> kernel: (stc): remove_channel_from_table: id 4
> >> kernel: (stc):  all chnl_ids unregistered
> >> kernel: (stk) :ldisc_install = 0(stc): st_tty_close
> >>
> >> The first print is from btwilink.c. However, I can't see the
> >> connection between Bluetooth (BLE) and UDP/IPv6 (we're not using
> >> 6LoWPAN or anything similar).
> >>
> >> Thanks, Jacob
> >
> > Definitely these details are useful ;)
> >
> > Could you try :
> >
> > diff --git a/drivers/misc/ti-st/st_core.c b/drivers/misc/ti-st/st_core.c
> > index 6e3af8b42cdd..0c99a74fb895 100644
> > --- a/drivers/misc/ti-st/st_core.c
> > +++ b/drivers/misc/ti-st/st_core.c
> > @@ -912,7 +912,9 @@ void st_core_exit(struct st_data_s *st_gdata)
> >                 skb_queue_purge(&st_gdata->txq);
> >                 skb_queue_purge(&st_gdata->tx_waitq);
> >                 kfree_skb(st_gdata->rx_skb);
> > +               st_gdata->rx_skb = NULL;
> >                 kfree_skb(st_gdata->tx_skb);
> > +               st_gdata->tx_skb = NULL;
> >                 /* TTY ldisc cleanup */
> >                 err = tty_unregister_ldisc(N_TI_WL);
> >                 if (err)
> >
> >
> 
> Sure. Since I don't have a good way to trigger the initial issue, I
> can't really know if there is a difference with your patch. However,
> normal usage seems to work as expected with your patch. I've tried to
> reproduce the initial issue with and without your patch repeatedly for
> hours and have not seen any crash in any of the runs so far.
> --

You might build a kernel with KASAN support to get maybe more chances to
trigger the bug.

( https://www.kernel.org/doc/Documentation/kasan.txt )



--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jacob Siverskog Jan. 5, 2016, 2:34 p.m. UTC | #5
On Tue, Jan 5, 2016 at 3:14 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Tue, 2016-01-05 at 12:07 +0100, Jacob Siverskog wrote:
>> On Mon, Jan 4, 2016 at 4:25 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>> > On Mon, 2016-01-04 at 10:10 +0100, Jacob Siverskog wrote:
>> >> On Wed, Dec 30, 2015 at 11:30 PM, Cong Wang <xiyou.wangcong@gmail.com> wrote:
>> >> > On Wed, Dec 30, 2015 at 6:30 AM, Jacob Siverskog
>> >> > <jacob@teenage.engineering> wrote:
>> >> >> On Wed, Dec 30, 2015 at 2:26 PM, Eric Dumazet <edumazet@google.com> wrote:
>> >> >>> How often can you trigger this bug ?
>> >> >>
>> >> >> Ok. I don't have a good repro to trigger it unfortunately, I've seen it just a
>> >> >> few times when bringing up/down network interfaces. Does the trace
>> >> >> give any clue?
>> >> >>
>> >> >
>> >> > A little bit. You need to help people to narrow down the problem
>> >> > because there are too many places using skb->next and skb->prev.
>> >> >
>> >> > Since you mentioned it seems related to network interface flip,
>> >> > what network interfaces are you using? What's is your TC setup?
>> >> >
>> >> > Thanks.
>> >>
>> >> The system contains only one physical network interface (TI WL1837,
>> >> wl18xx module).
>> >> The state prior to the crash was as follows:
>> >> - One virtual network interface active (as STA, associated with access point)
>> >> - Bluetooth (BLE only) active (same physical chip, co-existence,
>> >> btwilink/st_drv modules)
>> >>
>> >> Actions made around the time of the crash:
>> >> - Bluetooth disabled
>> >> - One additional virtual network interface brought up (also as STA)
>> >>
>> >> I believe the crash occurred between these two actions. I just saw
>> >> that there are some interesting events in the log prior to the crash:
>> >> kernel: Bluetooth: Unable to push skb to HCI core(-6)
>> >> kernel: (stc):  proto stack 4's ->recv failed
>> >> kernel: (stc): remove_channel_from_table: id 3
>> >> kernel: (stc): remove_channel_from_table: id 2
>> >> kernel: (stc): remove_channel_from_table: id 4
>> >> kernel: (stc):  all chnl_ids unregistered
>> >> kernel: (stk) :ldisc_install = 0(stc): st_tty_close
>> >>
>> >> The first print is from btwilink.c. However, I can't see the
>> >> connection between Bluetooth (BLE) and UDP/IPv6 (we're not using
>> >> 6LoWPAN or anything similar).
>> >>
>> >> Thanks, Jacob
>> >
>> > Definitely these details are useful ;)
>> >
>> > Could you try :
>> >
>> > diff --git a/drivers/misc/ti-st/st_core.c b/drivers/misc/ti-st/st_core.c
>> > index 6e3af8b42cdd..0c99a74fb895 100644
>> > --- a/drivers/misc/ti-st/st_core.c
>> > +++ b/drivers/misc/ti-st/st_core.c
>> > @@ -912,7 +912,9 @@ void st_core_exit(struct st_data_s *st_gdata)
>> >                 skb_queue_purge(&st_gdata->txq);
>> >                 skb_queue_purge(&st_gdata->tx_waitq);
>> >                 kfree_skb(st_gdata->rx_skb);
>> > +               st_gdata->rx_skb = NULL;
>> >                 kfree_skb(st_gdata->tx_skb);
>> > +               st_gdata->tx_skb = NULL;
>> >                 /* TTY ldisc cleanup */
>> >                 err = tty_unregister_ldisc(N_TI_WL);
>> >                 if (err)
>> >
>> >
>>
>> Sure. Since I don't have a good way to trigger the initial issue, I
>> can't really know if there is a difference with your patch. However,
>> normal usage seems to work as expected with your patch. I've tried to
>> reproduce the initial issue with and without your patch repeatedly for
>> hours and have not seen any crash in any of the runs so far.
>> --
>
> You might build a kernel with KASAN support to get maybe more chances to
> trigger the bug.
>
> ( https://www.kernel.org/doc/Documentation/kasan.txt )
>

Ah. Doesn't seem to be supported on arm(32) unfortunately.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eric Dumazet Jan. 5, 2016, 2:39 p.m. UTC | #6
On Tue, 2016-01-05 at 15:34 +0100, Jacob Siverskog wrote:
> On Tue, Jan 5, 2016 at 3:14 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:

> >
> > You might build a kernel with KASAN support to get maybe more chances to
> > trigger the bug.
> >
> > ( https://www.kernel.org/doc/Documentation/kasan.txt )
> >
> 
> Ah. Doesn't seem to be supported on arm(32) unfortunately.

Then you could at least use standard debugging features :

CONFIG_SLAB=y
CONFIG_SLABINFO=y
CONFIG_DEBUG_SLAB=y
CONFIG_DEBUG_SLAB_LEAK=y

(Or equivalent SLUB options)

and

CONFIG_DEBUG_PAGEALLOC=y

(If arm(32) has CONFIG_ARCH_SUPPORTS_DEBUG_PAGEALLOC=y)



--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jacob Siverskog Jan. 20, 2016, 3:06 p.m. UTC | #7
On Tue, Jan 5, 2016 at 3:39 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Tue, 2016-01-05 at 15:34 +0100, Jacob Siverskog wrote:
>> On Tue, Jan 5, 2016 at 3:14 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>
>> >
>> > You might build a kernel with KASAN support to get maybe more chances to
>> > trigger the bug.
>> >
>> > ( https://www.kernel.org/doc/Documentation/kasan.txt )
>> >
>>
>> Ah. Doesn't seem to be supported on arm(32) unfortunately.
>
> Then you could at least use standard debugging features :
>
> CONFIG_SLAB=y
> CONFIG_SLABINFO=y
> CONFIG_DEBUG_SLAB=y
> CONFIG_DEBUG_SLAB_LEAK=y
>
> (Or equivalent SLUB options)
>
> and
>
> CONFIG_DEBUG_PAGEALLOC=y
>
> (If arm(32) has CONFIG_ARCH_SUPPORTS_DEBUG_PAGEALLOC=y)

I tried with those enabled and while toggling power on the Bluetooth
interface I usually get this after a few iterations:
kernel: Bluetooth: Unable to push skb to HCI core(-6)
kernel: (stc):  proto stack 4's ->recv failed
kernel: Slab corruption (Not tainted): skbuff_head_cache start=c08a8a00, len=176
kernel: 0a0: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6a 6b 6b a5  kkkkkkkkkkkkjkk.
kernel: Prev obj: start=c08a8940, len=176
kernel: 000: 00 00 00 00 00 00 00 00 31 73 52 00 43 17 2b 14  ........1sR.C.+.
kernel: 010: 00 00 00 00 00 00 00 00 04 00 00 00 01 00 00 00  ................
kernel: Next obj: start=c08a8ac0, len=176
kernel: 000: 00 00 00 00 00 00 00 00 01 42 f6 50 36 17 2b 14  .........B.P6.+.
kernel: 010: 00 00 00 00 00 00 00 00 04 00 00 00 01 00 00 00  ................

The "Unable to push skb" and "recv failed" lines always appear before
the corruption.

Unfortunately, the corruptions occur also with your patch.
Peter Hurley Jan. 20, 2016, 4:38 p.m. UTC | #8
Hi Jacob,

On 01/05/2016 06:34 AM, Jacob Siverskog wrote:
> On Tue, Jan 5, 2016 at 3:14 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>> On Tue, 2016-01-05 at 12:07 +0100, Jacob Siverskog wrote:
>>> On Mon, Jan 4, 2016 at 4:25 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>>>> On Mon, 2016-01-04 at 10:10 +0100, Jacob Siverskog wrote:
>>>>> On Wed, Dec 30, 2015 at 11:30 PM, Cong Wang <xiyou.wangcong@gmail.com> wrote:
>>>>>> On Wed, Dec 30, 2015 at 6:30 AM, Jacob Siverskog
>>>>>> <jacob@teenage.engineering> wrote:
>>>>>>> On Wed, Dec 30, 2015 at 2:26 PM, Eric Dumazet <edumazet@google.com> wrote:
>>>>>>>> How often can you trigger this bug ?
>>>>>>>
>>>>>>> Ok. I don't have a good repro to trigger it unfortunately, I've seen it just a
>>>>>>> few times when bringing up/down network interfaces. Does the trace
>>>>>>> give any clue?
>>>>>>>
>>>>>>
>>>>>> A little bit. You need to help people to narrow down the problem
>>>>>> because there are too many places using skb->next and skb->prev.
>>>>>>
>>>>>> Since you mentioned it seems related to network interface flip,
>>>>>> what network interfaces are you using? What's is your TC setup?
>>>>>>
>>>>>> Thanks.
>>>>>
>>>>> The system contains only one physical network interface (TI WL1837,
>>>>> wl18xx module).
>>>>> The state prior to the crash was as follows:
>>>>> - One virtual network interface active (as STA, associated with access point)
>>>>> - Bluetooth (BLE only) active (same physical chip, co-existence,
>>>>> btwilink/st_drv modules)
>>>>>
>>>>> Actions made around the time of the crash:
>>>>> - Bluetooth disabled
>>>>> - One additional virtual network interface brought up (also as STA)
>>>>>
>>>>> I believe the crash occurred between these two actions. I just saw
>>>>> that there are some interesting events in the log prior to the crash:
>>>>> kernel: Bluetooth: Unable to push skb to HCI core(-6)
>>>>> kernel: (stc):  proto stack 4's ->recv failed
>>>>> kernel: (stc): remove_channel_from_table: id 3
>>>>> kernel: (stc): remove_channel_from_table: id 2
>>>>> kernel: (stc): remove_channel_from_table: id 4
>>>>> kernel: (stc):  all chnl_ids unregistered
>>>>> kernel: (stk) :ldisc_install = 0(stc): st_tty_close
>>>>>
>>>>> The first print is from btwilink.c. However, I can't see the
>>>>> connection between Bluetooth (BLE) and UDP/IPv6 (we're not using
>>>>> 6LoWPAN or anything similar).
>>>>>
>>>>> Thanks, Jacob
>>>>
>>>> Definitely these details are useful ;)
>>>>
>>>> Could you try :
>>>>
>>>> diff --git a/drivers/misc/ti-st/st_core.c b/drivers/misc/ti-st/st_core.c
>>>> index 6e3af8b42cdd..0c99a74fb895 100644
>>>> --- a/drivers/misc/ti-st/st_core.c
>>>> +++ b/drivers/misc/ti-st/st_core.c
>>>> @@ -912,7 +912,9 @@ void st_core_exit(struct st_data_s *st_gdata)
>>>>                 skb_queue_purge(&st_gdata->txq);
>>>>                 skb_queue_purge(&st_gdata->tx_waitq);
>>>>                 kfree_skb(st_gdata->rx_skb);
>>>> +               st_gdata->rx_skb = NULL;
>>>>                 kfree_skb(st_gdata->tx_skb);
>>>> +               st_gdata->tx_skb = NULL;
>>>>                 /* TTY ldisc cleanup */
>>>>                 err = tty_unregister_ldisc(N_TI_WL);
>>>>                 if (err)

FWIW,

You don't need that ti-st junk to get the WL1837 working; the WL1837 only
has BT channels. Unfortunately, that's really all I can say about it; sorry.

Regards,
Peter Hurley
diff mbox

Patch

diff --git a/drivers/misc/ti-st/st_core.c b/drivers/misc/ti-st/st_core.c
index 6e3af8b42cdd..0c99a74fb895 100644
--- a/drivers/misc/ti-st/st_core.c
+++ b/drivers/misc/ti-st/st_core.c
@@ -912,7 +912,9 @@  void st_core_exit(struct st_data_s *st_gdata)
 		skb_queue_purge(&st_gdata->txq);
 		skb_queue_purge(&st_gdata->tx_waitq);
 		kfree_skb(st_gdata->rx_skb);
+		st_gdata->rx_skb = NULL;
 		kfree_skb(st_gdata->tx_skb);
+		st_gdata->tx_skb = NULL;
 		/* TTY ldisc cleanup */
 		err = tty_unregister_ldisc(N_TI_WL);
 		if (err)