Patchwork Fix SJA1000 command register writes on SMP systems

login
register
mail settings
Submitter Oliver Hartkopp
Date May 17, 2010, 11:06 a.m.
Message ID <4BF12321.6080506@hartkopp.net>
Download mbox | patch
Permalink /patch/52784/
State Changes Requested
Delegated to: David Miller
Headers show

Comments

Oliver Hartkopp - May 17, 2010, 11:06 a.m.
The SJA1000 command register is concurrently written in the rx-path to free
the receive buffer _and_ in the tx-path to start the transmission.
On SMP systems this leads to a write stall in the tx-path, which can be
solved by adding some locking for the command register in the SMP case.

Thanks to Klaus Hitschler for the original fix and detailed problem
description.

This patch applies on net-2.6 and (with some small offsets) on net-next-2.6.
It is also a candidate for the stable series.

Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
Acked-by: Wolfgang Grandegger <wg@grandegger.com>

---



--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Wolfgang Grandegger - May 17, 2010, 2:29 p.m.
Hi Oliver,

On 05/17/2010 01:06 PM, Oliver Hartkopp wrote:
> The SJA1000 command register is concurrently written in the rx-path to free
> the receive buffer _and_ in the tx-path to start the transmission.
> On SMP systems this leads to a write stall in the tx-path, which can be
> solved by adding some locking for the command register in the SMP case.

We should explain why a write stall can happen. Here is the relavant
part from the SJA1000 data sheet, 6.4.4 COMMAND REGISTER (CMR):

"Between two commands at least one internal clock cycle is needed in
order to proceed. The internal clock is half of the external oscillator
frequency."

Wolfgang.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Oliver Hartkopp - May 17, 2010, 7:47 p.m.
On 17.05.2010 16:29, Wolfgang Grandegger wrote:
> Hi Oliver,
> 
> On 05/17/2010 01:06 PM, Oliver Hartkopp wrote:
>> The SJA1000 command register is concurrently written in the rx-path to free
>> the receive buffer _and_ in the tx-path to start the transmission.
>> On SMP systems this leads to a write stall in the tx-path, which can be
>> solved by adding some locking for the command register in the SMP case.
> 
> We should explain why a write stall can happen. Here is the relavant
> part from the SJA1000 data sheet, 6.4.4 COMMAND REGISTER (CMR):
> 
> "Between two commands at least one internal clock cycle is needed in
> order to proceed. The internal clock is half of the external oscillator
> frequency."

The delay directly after the register access can only be guaranteed when there
is some locking around the command register write access.

In the end it boils down to a SMP issue again as this is (from all known
environments) the only case, where the problem appears in reality.
This was also what i've taken from the discussion on the SocketCAN ML.

I don't stick on the patch description. Would you like to produce a different
one? My Acked-by for the code remains sure :-)

Regards,
Oliver
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Wolfgang Grandegger - May 17, 2010, 8:04 p.m.
On 05/17/2010 09:47 PM, Oliver Hartkopp wrote:
> On 17.05.2010 16:29, Wolfgang Grandegger wrote:
>> Hi Oliver,
>>
>> On 05/17/2010 01:06 PM, Oliver Hartkopp wrote:
>>> The SJA1000 command register is concurrently written in the rx-path to free
>>> the receive buffer _and_ in the tx-path to start the transmission.
>>> On SMP systems this leads to a write stall in the tx-path, which can be
>>> solved by adding some locking for the command register in the SMP case.
>>
>> We should explain why a write stall can happen. Here is the relavant
>> part from the SJA1000 data sheet, 6.4.4 COMMAND REGISTER (CMR):
>>
>> "Between two commands at least one internal clock cycle is needed in
>> order to proceed. The internal clock is half of the external oscillator
>> frequency."
> 
> The delay directly after the register access can only be guaranteed when there
> is some locking around the command register write access.

Of course.

> In the end it boils down to a SMP issue again as this is (from all known
> environments) the only case, where the problem appears in reality.
> This was also what i've taken from the discussion on the SocketCAN ML.

I know.

> I don't stick on the patch description. Would you like to produce a different
> one? My Acked-by for the code remains sure :-)

I just suggested to mention the hardware requirements in the patch
description so people can understand why we need it. Feel free to add
what I wrote above.

Wolfgang.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Patch

diff --git a/drivers/net/can/sja1000/sja1000.c b/drivers/net/can/sja1000/sja1000.c
index 145b1a7..2760085 100644
--- a/drivers/net/can/sja1000/sja1000.c
+++ b/drivers/net/can/sja1000/sja1000.c
@@ -84,6 +84,27 @@  static struct can_bittiming_const sja1000_bittiming_const = {
 	.brp_inc = 1,
 };
 
+static void sja1000_write_cmdreg(struct sja1000_priv *priv, u8 val)
+{
+	/* the command register needs some locking on SMP systems */
+
+#ifdef CONFIG_SMP
+
+	unsigned long flags;
+
+	spin_lock_irqsave(&priv->cmdreg_lock, flags);
+	priv->write_reg(priv, REG_CMR, val);
+	priv->read_reg(priv, REG_SR);
+	spin_unlock_irqrestore(&priv->cmdreg_lock, flags);
+
+#else
+
+	/* write to the command register without locking */
+	priv->write_reg(priv, REG_CMR, val);
+
+#endif
+}
+
 static int sja1000_probe_chip(struct net_device *dev)
 {
 	struct sja1000_priv *priv = netdev_priv(dev);
@@ -297,7 +318,7 @@  static netdev_tx_t sja1000_start_xmit(struct sk_buff *skb,
 
 	can_put_echo_skb(skb, dev, 0);
 
-	priv->write_reg(priv, REG_CMR, CMD_TR);
+	sja1000_write_cmdreg(priv, CMD_TR);
 
 	return NETDEV_TX_OK;
 }
@@ -346,7 +367,7 @@  static void sja1000_rx(struct net_device *dev)
 	cf->can_id = id;
 
 	/* release receive buffer */
-	priv->write_reg(priv, REG_CMR, CMD_RRB);
+	sja1000_write_cmdreg(priv, CMD_RRB);
 
 	netif_rx(skb);
 
@@ -374,7 +395,7 @@  static int sja1000_err(struct net_device *dev, uint8_t isrc, uint8_t status)
 		cf->data[1] = CAN_ERR_CRTL_RX_OVERFLOW;
 		stats->rx_over_errors++;
 		stats->rx_errors++;
-		priv->write_reg(priv, REG_CMR, CMD_CDO);	/* clear bit */
+		sja1000_write_cmdreg(priv, CMD_CDO);	/* clear bit */
 	}
 
 	if (isrc & IRQ_EI) {
diff --git a/drivers/net/can/sja1000/sja1000.h b/drivers/net/can/sja1000/sja1000.h
index 97a622b..4527d95 100644
--- a/drivers/net/can/sja1000/sja1000.h
+++ b/drivers/net/can/sja1000/sja1000.h
@@ -168,6 +168,10 @@  struct sja1000_priv {
 	void __iomem *reg_base;	 /* ioremap'ed address to registers */
 	unsigned long irq_flags; /* for request_irq() */
 
+#ifdef CONFIG_SMP
+	spinlock_t cmdreg_lock; /* lock for concurrent cmd register writes */
+#endif
+
 	u16 flags;		/* custom mode flags */
 	u8 ocr;			/* output control register */
 	u8 cdr;			/* clock divider register */