
Problem: BUG_ON hit in ppp_pernet() when re-connect after changing shared key on LAC

Message ID CAM_iQpUEKrcuQvcXO_Rembgk_ZKc-v9Pe2ZEuZ5W_QxdcYDDNQ@mail.gmail.com
State RFC, archived
Delegated to: David Miller

Commit Message

Cong Wang July 5, 2016, 8:36 p.m. UTC
On Tue, Jul 5, 2016 at 10:59 AM, Cong Wang <xiyou.wangcong@gmail.com> wrote:
> On Mon, Jul 4, 2016 at 7:50 PM, Matt Bennett
> <Matt.Bennett@alliedtelesis.co.nz> wrote:
>> Using printk I have confirmed that ppp_pernet() is called from
>> ppp_connect_channel() when the BUG occurs (i.e. pch->chan_net is NULL).
>>
>> This behavior appears to have been introduced in commit 1f461dc ("ppp:
>> take reference on channels netns").
>
> We have some race condition here, where a parallel ppp_unregister_channel()
> could happen while we are in ppp_connect_channel().
>
> We need some synchronization for them. I am not sure what is the right lock
> here since ppp locking looks crazy.

Matt, could you try if the attached patch helps?

Thanks!
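
For readers following the thread, the ordering described in the quoted text can be replayed deterministically in user space. This is only a minimal sketch under stated assumptions: `fake_net`, `fake_channel`, `old_unregister()` and `connect_lookup()` are hypothetical stand-ins, not the kernel's types or functions, and the two "CPUs" are simulated by calling the steps in sequence. It shows a pre-patch style unregister clearing `chan_net` while the channel is still reachable, so a later lookup sees NULL (the condition that trips the BUG_ON in ppp_pernet()).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel's struct net and struct channel. */
struct fake_net {
    int refcnt;
};

struct fake_channel {
    struct fake_net *chan_net;
};

/* Pre-patch behaviour as described in the thread: unregister drops the
 * namespace reference and clears the pointer, although the channel object
 * itself can still be reached by other code paths. */
static void old_unregister(struct fake_channel *pch)
{
    pch->chan_net->refcnt--;
    pch->chan_net = NULL;
}

/* Stand-in for the namespace lookup done on the connect path.
 * Returns -1 where the kernel would hit the BUG_ON, 0 otherwise. */
static int connect_lookup(const struct fake_channel *pch)
{
    if (pch->chan_net == NULL)
        return -1;
    return 0;
}

/* Replays the interleaving: the connect path has found the channel,
 * unregister runs, then the connect path resumes and does the lookup. */
static int replay_unregister_then_connect(void)
{
    struct fake_net net = { .refcnt = 2 };
    struct fake_channel pch = { .chan_net = &net };

    /* Before unregister runs, the lookup succeeds. */
    assert(connect_lookup(&pch) == 0);

    /* "CPU B": unregister clears chan_net. */
    old_unregister(&pch);

    /* "CPU A": connect resumes and finds a NULL chan_net. */
    return connect_lookup(&pch);
}
```

Calling `replay_unregister_then_connect()` from a small `main()` returns -1 in this model, which corresponds to the NULL `pch->chan_net` that Matt observed with printk.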

Comments

Matt Bennett July 6, 2016, 12:05 a.m. UTC | #1
On 07/06/2016 08:37 AM, Cong Wang wrote:
> On Tue, Jul 5, 2016 at 10:59 AM, Cong Wang <xiyou.wangcong@gmail.com> wrote:
>> On Mon, Jul 4, 2016 at 7:50 PM, Matt Bennett
>> <Matt.Bennett@alliedtelesis.co.nz> wrote:
>>> Using printk I have confirmed that ppp_pernet() is called from
>>> ppp_connect_channel() when the BUG occurs (i.e. pch->chan_net is NULL).
>>>
>>> This behavior appears to have been introduced in commit 1f461dc ("ppp:
>>> take reference on channels netns").
>>
>> We have some race condition here, where a parallel ppp_unregister_channel()
>> could happen while we are in ppp_connect_channel().
>>
>> We need some synchronization for them. I am not sure what is the right lock
>> here since ppp locking looks crazy.
>
> Matt, could you try if the attached patch helps?
>
> Thanks!
>
I have given that patch a good amount of testing and the BUG_ON() no 
longer is hit. Whether that is the best fix or not I am unsure?

Either way, the following comment in ppp_unregister_channel() seems 
incorrect to me and should probably be deleted unless it is fixed?

/*
  * This ensures that we have returned from any calls into the
  * the channel's start_xmit or ioctl routine before we proceed.
  */

It appears that mutex_lock(&ppp_mutex) is what locks ppp_ioctl. ppp_xmit uses
ppp_xmit_lock(ppp) in ppp_xmit_process.

Cong Wang July 6, 2016, 2:02 a.m. UTC | #2
On Tue, Jul 5, 2016 at 5:05 PM, Matt Bennett
<Matt.Bennett@alliedtelesis.co.nz> wrote:
> On 07/06/2016 08:37 AM, Cong Wang wrote:
>> On Tue, Jul 5, 2016 at 10:59 AM, Cong Wang <xiyou.wangcong@gmail.com> wrote:
>>> On Mon, Jul 4, 2016 at 7:50 PM, Matt Bennett
>>> <Matt.Bennett@alliedtelesis.co.nz> wrote:
>>>> Using printk I have confirmed that ppp_pernet() is called from
>>>> ppp_connect_channel() when the BUG occurs (i.e. pch->chan_net is NULL).
>>>>
>>>> This behavior appears to have been introduced in commit 1f461dc ("ppp:
>>>> take reference on channels netns").
>>>
>>> We have some race condition here, where a parallel ppp_unregister_channel()
>>> could happen while we are in ppp_connect_channel().
>>>
>>> We need some synchronization for them. I am not sure what is the right lock
>>> here since ppp locking looks crazy.
>>
>> Matt, could you try if the attached patch helps?
>>
>> Thanks!
>>
> I have given that patch a good amount of testing and the BUG_ON() no
> longer is hit. Whether that is the best fix or not I am unsure?

At least my patch keeps the net refcnt in sync with the pch lifetime:
we grab a net refcnt when we allocate a pch, and release it when
we are about to destroy the pch. Does that make sense to you?
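
The lifetime rule stated above can be sketched in user space as follows. This is an illustration only, under stated assumptions: every `demo_*` name is hypothetical, the refcount is a plain int rather than the kernel's atomic counter, and no locking is modelled. The point it tries to show is that the reference is taken at allocation and dropped only at destruction, so `chan_net` stays valid between unregister and destroy.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for struct net and struct channel. */
struct demo_net {
    int refcnt;
};

struct demo_channel {
    struct demo_net *chan_net;
    int dead;
};

static struct demo_net *demo_get_net(struct demo_net *net)
{
    net->refcnt++;
    return net;
}

static void demo_put_net(struct demo_net *net)
{
    net->refcnt--;
}

/* The reference is taken when the channel is allocated. */
static struct demo_channel *demo_channel_alloc(struct demo_net *net)
{
    struct demo_channel *pch = calloc(1, sizeof(*pch));

    if (!pch)
        return NULL;
    pch->chan_net = demo_get_net(net);
    return pch;
}

/* Unregister only marks the channel dead; it leaves chan_net alone. */
static void demo_channel_unregister(struct demo_channel *pch)
{
    pch->dead = 1;
}

/* The reference is released only when the channel is destroyed. */
static void demo_channel_destroy(struct demo_channel *pch)
{
    if (pch->chan_net) {
        demo_put_net(pch->chan_net);
        pch->chan_net = NULL;
    }
    free(pch);
}

/* Records the refcount after alloc, after unregister and after destroy.
 * Returns 0 on success, -1 if the allocation failed. */
static int demo_lifetime(int counts[3])
{
    struct demo_net net = { .refcnt = 1 };
    struct demo_channel *pch = demo_channel_alloc(&net);

    if (!pch)
        return -1;
    counts[0] = net.refcnt;

    demo_channel_unregister(pch);
    assert(pch->chan_net == &net); /* still valid after unregister */
    counts[1] = net.refcnt;

    demo_channel_destroy(pch);
    counts[2] = net.refcnt;
    return 0;
}
```

In this model the count goes from 1 to 2 at allocation, stays at 2 across unregister, and returns to 1 only at destroy, mirroring the move of put_net() from ppp_unregister_channel() to ppp_destroy_channel() in the patch.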

>
> Either way, the following comment in ppp_unregister_channel() seems
> incorrect to me and should probably be deleted unless it is fixed?
>
> /*
>   * This ensures that we have returned from any calls into the
>   * the channel's start_xmit or ioctl routine before we proceed.
>   */

This comment is pretty old; I think it refers to the pch->ppp
check in ppp_connect_channel().

Thanks.

Patch

diff --git a/drivers/net/ppp/ppp_generic.c b/drivers/net/ppp/ppp_generic.c
index 8dedafa..07f0e49 100644
--- a/drivers/net/ppp/ppp_generic.c
+++ b/drivers/net/ppp/ppp_generic.c
@@ -2601,8 +2601,6 @@  ppp_unregister_channel(struct ppp_channel *chan)
 	spin_lock_bh(&pn->all_channels_lock);
 	list_del(&pch->list);
 	spin_unlock_bh(&pn->all_channels_lock);
-	put_net(pch->chan_net);
-	pch->chan_net = NULL;
 
 	pch->file.dead = 1;
 	wake_up_interruptible(&pch->file.rwait);
@@ -3136,6 +3134,11 @@  ppp_disconnect_channel(struct channel *pch)
  */
 static void ppp_destroy_channel(struct channel *pch)
 {
+	if (pch->chan_net) {
+		put_net(pch->chan_net);
+		pch->chan_net = NULL;
+	}
+
 	atomic_dec(&channel_count);
 
 	if (!pch->file.dead) {