[net,1/3] slip/slcan: added locking in wakeup function

Message ID 1379093833-4949-2-git-send-email-nautsch2@gmail.com
State Accepted, archived
Delegated to: David Miller

Commit Message

Andre Naujoks Sept. 13, 2013, 5:37 p.m. UTC
The locking is needed because the internal buffer for the CAN frames is
changed during the wakeup call. This could cause buffer inconsistencies
under high loads, especially for the outgoing short CAN packet skbuffs.

The needed locks led to deadlocks before commit
5ede52538ee2b2202d9dff5b06c33bfde421e6e4 ("tty: Remove extra wakeup from pty
write() path"), which removed the direct callback to the wakeup function from
the tty layer.

As slcan.c is based on slip.c, the same issue is fixed in the original slip.c
code, too.

Signed-off-by: Andre Naujoks <nautsch2@gmail.com>
---
 drivers/net/can/slcan.c | 3 +++
 drivers/net/slip/slip.c | 3 +++
 2 files changed, 6 insertions(+)
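
The bookkeeping that the commit message refers to can be modelled in user space. The following is only a sketch under stated assumptions: `struct chan`, `struct fake_tty`, `chan_xmit()` and `chan_write_wakeup()` are hypothetical names, a pthread mutex stands in for `sl->lock`, and the fake tty accepts a limited number of bytes per write. It is not the driver code; it only shows why `xhead` and `xleft` must be updated under the same lock in both the transmit and the wakeup path.

```c
#include <assert.h>
#include <pthread.h>
#include <string.h>

#define XBUF_SIZE 64
#define SINK_SIZE 256

/* Hypothetical stand-in for the tty: takes at most `chunk` bytes per write. */
struct fake_tty {
	unsigned char out[SINK_SIZE];
	int used;
	int chunk;
};

/* Hypothetical user-space model of the driver's transmit state. */
struct chan {
	pthread_mutex_t lock;        /* plays the role of sl->lock */
	unsigned char xbuff[XBUF_SIZE];
	unsigned char *xhead;        /* next byte still to be written */
	int xleft;                   /* number of bytes still to be written */
	struct fake_tty *tty;
};

static int fake_tty_write(struct fake_tty *t, const unsigned char *p, int len)
{
	int n = len < t->chunk ? len : t->chunk;

	if (n > SINK_SIZE - t->used)
		n = SINK_SIZE - t->used;
	memcpy(t->out + t->used, p, (size_t)n);
	t->used += n;
	return n;
}

static void chan_init(struct chan *c, struct fake_tty *t)
{
	pthread_mutex_init(&c->lock, NULL);
	c->xhead = c->xbuff;
	c->xleft = 0;
	c->tty = t;
}

/* Transmit path: load a new frame and push the first chunk.
 * Returns 0 on success, -1 if a frame is still pending or len is invalid. */
static int chan_xmit(struct chan *c, const unsigned char *frame, int len)
{
	int actual;

	if (len < 0 || len > XBUF_SIZE)
		return -1;
	pthread_mutex_lock(&c->lock);
	if (c->xleft > 0) {
		pthread_mutex_unlock(&c->lock);
		return -1;
	}
	memcpy(c->xbuff, frame, (size_t)len);
	actual = fake_tty_write(c->tty, c->xbuff, len);
	c->xleft = len - actual;
	c->xhead = c->xbuff + actual;
	pthread_mutex_unlock(&c->lock);
	return 0;
}

/* Wakeup path: returns 1 once the frame is fully sent, 0 if it wrote more. */
static int chan_write_wakeup(struct chan *c)
{
	int actual;

	pthread_mutex_lock(&c->lock);
	if (c->xleft <= 0) {
		pthread_mutex_unlock(&c->lock);
		return 1;
	}
	actual = fake_tty_write(c->tty, c->xhead, c->xleft);
	c->xleft -= actual;
	c->xhead += actual;
	assert(c->xleft >= 0 && c->xhead <= c->xbuff + XBUF_SIZE);
	pthread_mutex_unlock(&c->lock);
	return 0;
}
```

In this model a caller would invoke `chan_xmit()` once per frame and then `chan_write_wakeup()` repeatedly until it returns 1. Without the lock in the wakeup path, a concurrent `chan_xmit()` could overwrite `xbuff` while `xhead` and `xleft` still describe the previous frame.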

Comments

Oliver Hartkopp Sept. 13, 2013, 6:45 p.m. UTC | #1
On 13.09.2013 19:37, Andre Naujoks wrote:
> The locking is needed, since the internal buffer for the CAN frames is
> changed during the wakeup call. This could cause buffer inconsistencies
> under high loads, especially for the outgoing short CAN packet skbuffs.
> 
> The needed locks led to deadlocks before commit
> "5ede52538ee2b2202d9dff5b06c33bfde421e6e4 tty: Remove extra wakeup from pty
> write() path", which removed the direct callback to the wakeup function from the
> tty layer.
> 
> As slcan.c is based on slip.c the issue in the original code is fixed, too.
> 
> Signed-off-by: Andre Naujoks <nautsch2@gmail.com>

At least for slcan.c:

Acked-by: Oliver Hartkopp <socketcan@hartkopp.net>

Thanks for figuring that out with your heavy load testing.

Best regards,
Oliver

> ---
>  drivers/net/can/slcan.c | 3 +++
>  drivers/net/slip/slip.c | 3 +++
>  2 files changed, 6 insertions(+)
> 
> diff --git a/drivers/net/can/slcan.c b/drivers/net/can/slcan.c
> index 874188b..d571e2e 100644
> --- a/drivers/net/can/slcan.c
> +++ b/drivers/net/can/slcan.c
> @@ -286,11 +286,13 @@ static void slcan_write_wakeup(struct tty_struct *tty)
>  	if (!sl || sl->magic != SLCAN_MAGIC || !netif_running(sl->dev))
>  		return;
>  
> +	spin_lock(&sl->lock);
>  	if (sl->xleft <= 0)  {
>  		/* Now serial buffer is almost free & we can start
>  		 * transmission of another packet */
>  		sl->dev->stats.tx_packets++;
>  		clear_bit(TTY_DO_WRITE_WAKEUP, &tty->flags);
> +		spin_unlock(&sl->lock);
>  		netif_wake_queue(sl->dev);
>  		return;
>  	}
> @@ -298,6 +300,7 @@ static void slcan_write_wakeup(struct tty_struct *tty)
>  	actual = tty->ops->write(tty, sl->xhead, sl->xleft);
>  	sl->xleft -= actual;
>  	sl->xhead += actual;
> +	spin_unlock(&sl->lock);
>  }
>  
>  /* Send a can_frame to a TTY queue. */
> diff --git a/drivers/net/slip/slip.c b/drivers/net/slip/slip.c
> index a34d6bf..cc70ecf 100644
> --- a/drivers/net/slip/slip.c
> +++ b/drivers/net/slip/slip.c
> @@ -429,11 +429,13 @@ static void slip_write_wakeup(struct tty_struct *tty)
>  	if (!sl || sl->magic != SLIP_MAGIC || !netif_running(sl->dev))
>  		return;
>  
> +	spin_lock(&sl->lock);
>  	if (sl->xleft <= 0)  {
>  		/* Now serial buffer is almost free & we can start
>  		 * transmission of another packet */
>  		sl->dev->stats.tx_packets++;
>  		clear_bit(TTY_DO_WRITE_WAKEUP, &tty->flags);
> +		spin_unlock(&sl->lock);
>  		sl_unlock(sl);
>  		return;
>  	}
> @@ -441,6 +443,7 @@ static void slip_write_wakeup(struct tty_struct *tty)
>  	actual = tty->ops->write(tty, sl->xhead, sl->xleft);
>  	sl->xleft -= actual;
>  	sl->xhead += actual;
> +	spin_unlock(&sl->lock);
>  }
>  
>  static void sl_tx_timeout(struct net_device *dev)
> 

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Marc Kleine-Budde Sept. 19, 2013, 9:36 a.m. UTC | #2
On 09/13/2013 07:37 PM, Andre Naujoks wrote:
> The locking is needed, since the internal buffer for the CAN frames is
> changed during the wakeup call. This could cause buffer inconsistencies
> under high loads, especially for the outgoing short CAN packet skbuffs.
> 
> The needed locks led to deadlocks before commit
> "5ede52538ee2b2202d9dff5b06c33bfde421e6e4 tty: Remove extra wakeup from pty
> write() path", which removed the direct callback to the wakeup function from the
> tty layer.

What does that mean for older kernels?
(< 5ede52538ee2b2202d9dff5b06c33bfde421e6e4)

> As slcan.c is based on slip.c the issue in the original code is fixed, too.
> 
> Signed-off-by: Andre Naujoks <nautsch2@gmail.com>
Acked-by: Marc Kleine-Budde  <mkl@pengutronix.de>

Marc
Andre Naujoks Sept. 19, 2013, 10:29 a.m. UTC | #3
On 19.09.2013 11:36, Marc Kleine-Budde wrote:
> On 09/13/2013 07:37 PM, Andre Naujoks wrote:
>> The locking is needed, since the internal buffer for the CAN
>> frames is changed during the wakeup call. This could cause buffer
>> inconsistencies under high loads, especially for the outgoing
>> short CAN packet skbuffs.
>> 
>> The needed locks led to deadlocks before commit 
>> "5ede52538ee2b2202d9dff5b06c33bfde421e6e4 tty: Remove extra
>> wakeup from pty write() path", which removed the direct callback
>> to the wakeup function from the tty layer.
> 
> What does that mean for older kernels? (<
> 5ede52538ee2b2202d9dff5b06c33bfde421e6e4)

It seems the slcan (and slip) driver is broken for older kernels. See
this thread for a discussion about the patch in pty.c.

http://marc.info/?l=linux-kernel&m=137269017002789&w=2

The patch from Peter Hurley was actually already in the queue, when I
ran into the problem, and is now in kernel 3.12.

Without the pty patch, and with slow CAN traffic, the driver works because
the wakeup is called directly from the pty driver. That is also the
reason why there was no locking: it would just deadlock.

When the pty driver defers the wakeup, we ran into synchronisation
problems (which should be fixed by the locking) and eventually into a
kernel panic because of a recursive loop (which should be fixed by the
pty.c patch).

Maybe it is possible to get both patches back into the stable branches?

Regards
  Andre

> 
>> As slcan.c is based on slip.c the issue in the original code is
>> fixed, too.
>> 
>> Signed-off-by: Andre Naujoks <nautsch2@gmail.com>
> Acked-by: Marc Kleine-Budde  <mkl@pengutronix.de>
> 
> Marc
> 
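
The deadlock mentioned in this reply can be illustrated with a small user-space model. This is a sketch, not kernel code, and it rests on assumptions: the transmit path is taken to hold the lock while it calls the tty write routine, `struct model` and the `model_*` functions are hypothetical names, and `pthread_mutex_trylock()` is used so that the re-entry is detected instead of actually hanging. In the kernel, a second `spin_lock()` on the same lock would spin rather than return.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical model: one non-recursive lock shared by both paths. */
struct model {
	pthread_mutex_t lock;
	int direct_wakeup;    /* 1: write() calls the wakeup synchronously */
	int reentry_blocked;  /* set when the wakeup found the lock held */
	int wakeups_run;      /* wakeups that got the lock and ran */
};

static void model_init(struct model *m, int direct_wakeup)
{
	int rc = pthread_mutex_init(&m->lock, NULL);

	assert(rc == 0);
	(void)rc;
	m->direct_wakeup = direct_wakeup;
	m->reentry_blocked = 0;
	m->wakeups_run = 0;
}

/* Wakeup path with locking, as added by the patch. trylock fails with
 * EBUSY when the lock is already held, where a plain lock would block. */
static void model_write_wakeup(struct model *m)
{
	if (pthread_mutex_trylock(&m->lock) != 0) {
		m->reentry_blocked = 1;
		return;
	}
	m->wakeups_run++;
	pthread_mutex_unlock(&m->lock);
}

/* Stand-in for the tty write routine. With direct_wakeup set it calls
 * straight back into the wakeup, modelling the old pty behaviour. */
static void model_tty_write(struct model *m)
{
	if (m->direct_wakeup)
		model_write_wakeup(m);
}

/* Transmit path: holds the lock across the write (an assumption here). */
static void model_xmit(struct model *m)
{
	pthread_mutex_lock(&m->lock);
	model_tty_write(m);
	pthread_mutex_unlock(&m->lock);

	/* Deferred case: the wakeup runs only after the lock is released. */
	if (!m->direct_wakeup)
		model_write_wakeup(m);
}
```

With `direct_wakeup` set, `model_xmit()` leaves `reentry_blocked` at 1 and no wakeup runs; with it cleared, the wakeup takes the lock normally. That mirrors the ordering described above: the locking is only safe once the direct callback is gone.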

Marc Kleine-Budde Sept. 19, 2013, 10:35 a.m. UTC | #4
On 09/19/2013 12:29 PM, Andre Naujoks wrote:
> On 19.09.2013 11:36, Marc Kleine-Budde wrote:
>> On 09/13/2013 07:37 PM, Andre Naujoks wrote:
>>> The locking is needed, since the internal buffer for the CAN
>>> frames is changed during the wakeup call. This could cause buffer
>>> inconsistencies under high loads, especially for the outgoing
>>> short CAN packet skbuffs.
>>>
>>> The needed locks led to deadlocks before commit 
>>> "5ede52538ee2b2202d9dff5b06c33bfde421e6e4 tty: Remove extra
>>> wakeup from pty write() path", which removed the direct callback
>>> to the wakeup function from the tty layer.
>>
>> What does that mean for older kernels? (<
>> 5ede52538ee2b2202d9dff5b06c33bfde421e6e4)
> 
> It seems the slcan (and slip) driver is broken for older kernels. See
> this thread for a discussion about the patch in pty.c.
> 
> http://marc.info/?l=linux-kernel&m=137269017002789&w=2

Thanks for the info.

> The patch from Peter Hurley was actually already in the queue, when I
> ran into the problem, and is now in kernel 3.12.
> 
> Without the pty patch and slow CAN traffic, the driver works, because
> the wakeup is called directly from the pty driver. That is also the
> reason why there was no locking. It would just deadlock.
> 
> When the pty driver defers the wakeup, we ran into synchronisation
> problems (which should be fixed by the locking) and eventually into a
> kernel panic because of a recursive loop (which should be fixed by the
> pty.c patch).
> 
> Maybe it is possible to get both patches back into the stable branches?

Sounds reasonable. You might get in touch with Peter Hurley, if his
patch is scheduled for stable. Documentation/stable_kernel_rules.txt
suggests a procedure if your patch depends on others to be cherry picked.

Marc
Peter Hurley Sept. 19, 2013, 10:43 a.m. UTC | #5
[ +cc Greg Kroah-Hartman]

On 09/19/2013 06:35 AM, Marc Kleine-Budde wrote:
> On 09/19/2013 12:29 PM, Andre Naujoks wrote:
>> On 19.09.2013 11:36, Marc Kleine-Budde wrote:
>>> On 09/13/2013 07:37 PM, Andre Naujoks wrote:
>>>> The locking is needed, since the internal buffer for the CAN
>>>> frames is changed during the wakeup call. This could cause buffer
>>>> inconsistencies under high loads, especially for the outgoing
>>>> short CAN packet skbuffs.
>>>>
>>>> The needed locks led to deadlocks before commit
>>>> "5ede52538ee2b2202d9dff5b06c33bfde421e6e4 tty: Remove extra
>>>> wakeup from pty write() path", which removed the direct callback
>>>> to the wakeup function from the tty layer.
>>>
>>> What does that mean for older kernels? (<
>>> 5ede52538ee2b2202d9dff5b06c33bfde421e6e4)
>>
>> It seems the slcan (and slip) driver is broken for older kernels. See
>> this thread for a discussion about the patch in pty.c.
>>
>> http://marc.info/?l=linux-kernel&m=137269017002789&w=2
>
> Thanks for the info.
>
>> The patch from Peter Hurley was actually already in the queue, when I
>> ran into the problem, and is now in kernel 3.12.
>>
>> Without the pty patch and slow CAN traffic, the driver works, because
>> the wakeup is called directly from the pty driver. That is also the
>> reason why there was no locking. It would just deadlock.
>>
>> When the pty driver defers the wakeup, we ran into synchronisation
>> problems (which should be fixed by the locking) and eventually into a
>> kernel panic because of a recursive loop (which should be fixed by the
>> pty.c patch).
>>
>> Maybe it is possible to get both patches back into the stable branches?
>
> Sounds reasonable. You might get in touch with Peter Hurley, if his
> patch is scheduled for stable. Documentation/stable_kernel_rules.txt
> suggests a procedure if your patch depends on others to be cherry picked.

Already following along.

I'd like to wait for 3.12 release before the pty patch goes to -stable
(so that it gets more in-the-wild testing).

Regards,
Peter Hurley

Patch

diff --git a/drivers/net/can/slcan.c b/drivers/net/can/slcan.c
index 874188b..d571e2e 100644
--- a/drivers/net/can/slcan.c
+++ b/drivers/net/can/slcan.c
@@ -286,11 +286,13 @@  static void slcan_write_wakeup(struct tty_struct *tty)
 	if (!sl || sl->magic != SLCAN_MAGIC || !netif_running(sl->dev))
 		return;
 
+	spin_lock(&sl->lock);
 	if (sl->xleft <= 0)  {
 		/* Now serial buffer is almost free & we can start
 		 * transmission of another packet */
 		sl->dev->stats.tx_packets++;
 		clear_bit(TTY_DO_WRITE_WAKEUP, &tty->flags);
+		spin_unlock(&sl->lock);
 		netif_wake_queue(sl->dev);
 		return;
 	}
@@ -298,6 +300,7 @@  static void slcan_write_wakeup(struct tty_struct *tty)
 	actual = tty->ops->write(tty, sl->xhead, sl->xleft);
 	sl->xleft -= actual;
 	sl->xhead += actual;
+	spin_unlock(&sl->lock);
 }
 
 /* Send a can_frame to a TTY queue. */
diff --git a/drivers/net/slip/slip.c b/drivers/net/slip/slip.c
index a34d6bf..cc70ecf 100644
--- a/drivers/net/slip/slip.c
+++ b/drivers/net/slip/slip.c
@@ -429,11 +429,13 @@  static void slip_write_wakeup(struct tty_struct *tty)
 	if (!sl || sl->magic != SLIP_MAGIC || !netif_running(sl->dev))
 		return;
 
+	spin_lock(&sl->lock);
 	if (sl->xleft <= 0)  {
 		/* Now serial buffer is almost free & we can start
 		 * transmission of another packet */
 		sl->dev->stats.tx_packets++;
 		clear_bit(TTY_DO_WRITE_WAKEUP, &tty->flags);
+		spin_unlock(&sl->lock);
 		sl_unlock(sl);
 		return;
 	}
@@ -441,6 +443,7 @@  static void slip_write_wakeup(struct tty_struct *tty)
 	actual = tty->ops->write(tty, sl->xhead, sl->xleft);
 	sl->xleft -= actual;
 	sl->xhead += actual;
+	spin_unlock(&sl->lock);
 }
 
 static void sl_tx_timeout(struct net_device *dev)