diff mbox series

[24/29] can: ti_hecc: add fifo underflow error reporting

Message ID 20191010121750.27237-25-mkl@pengutronix.de
State RFC
Delegated to: David Miller
Headers show
Series [01/29] can: dev: add missing of_node_put() after calling of_get_child_by_name() | expand

Commit Message

Marc Kleine-Budde Oct. 10, 2019, 12:17 p.m. UTC
From: Jeroen Hofstee <jhofstee@victronenergy.com>

When the rx fifo overflows the ti_hecc would silently drop them since
the overwrite protection is enabled for all mailboxes. So disable it
for the lowest priority mailbox and increment the rx_fifo_errors when
receive message lost is set. Drop the message itself in that case,
since it might be partially updated.

Signed-off-by: Jeroen Hofstee <jhofstee@victronenergy.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
---
 drivers/net/can/ti_hecc.c | 21 +++++++++++++++++----
 1 file changed, 17 insertions(+), 4 deletions(-)

Comments

Marc Kleine-Budde Oct. 10, 2019, 3:52 p.m. UTC | #1
On 10/10/19 2:17 PM, Marc Kleine-Budde wrote:
> From: Jeroen Hofstee <jhofstee@victronenergy.com>
> 
> When the rx fifo overflows the ti_hecc would silently drop them since
> the overwrite protection is enabled for all mailboxes. So disable it
> for the lowest priority mailbox and increment the rx_fifo_errors when
> receive message lost is set. Drop the message itself in that case,
> since it might be partially updated.

Is that your observation or does the data sheet say anything to this
situation?

> 
> Signed-off-by: Jeroen Hofstee <jhofstee@victronenergy.com>
> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
> ---
>  drivers/net/can/ti_hecc.c | 21 +++++++++++++++++----
>  1 file changed, 17 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/net/can/ti_hecc.c b/drivers/net/can/ti_hecc.c
> index 6ea29126c60b..c2d83ada203a 100644
> --- a/drivers/net/can/ti_hecc.c
> +++ b/drivers/net/can/ti_hecc.c
> @@ -82,7 +82,7 @@ MODULE_VERSION(HECC_MODULE_VERSION);
>  #define HECC_CANTA		0x10	/* Transmission acknowledge */
>  #define HECC_CANAA		0x14	/* Abort acknowledge */
>  #define HECC_CANRMP		0x18	/* Receive message pending */
> -#define HECC_CANRML		0x1C	/* Remote message lost */
> +#define HECC_CANRML		0x1C	/* Receive message lost */
>  #define HECC_CANRFP		0x20	/* Remote frame pending */
>  #define HECC_CANGAM		0x24	/* SECC only:Global acceptance mask */
>  #define HECC_CANMC		0x28	/* Master control */
> @@ -385,8 +385,17 @@ static void ti_hecc_start(struct net_device *ndev)
>  	/* Enable tx interrupts */
>  	hecc_set_bit(priv, HECC_CANMIM, BIT(HECC_MAX_TX_MBOX) - 1);
>  
> -	/* Prevent message over-write & Enable interrupts */
> -	hecc_write(priv, HECC_CANOPC, HECC_SET_REG);
> +	/* Prevent message over-write to create a rx fifo, but not for
> +	 * the lowest priority mailbox, since that allows detecting
> +	 * overflows instead of the hardware silently dropping the
> +	 * messages. The lowest rx mailbox is one above the tx ones,
> +	 * hence its mbxno is the number of tx mailboxes.
> +	 */
> +	mbxno = HECC_MAX_TX_MBOX;
> +	mbx_mask = ~BIT(mbxno);
> +	hecc_write(priv, HECC_CANOPC, mbx_mask);
> +
> +	/* Enable interrupts */
>  	if (priv->use_hecc1int) {
>  		hecc_write(priv, HECC_CANMIL, HECC_SET_REG);
>  		hecc_write(priv, HECC_CANGIM, HECC_CANGIM_DEF_MASK |
> @@ -531,6 +540,7 @@ static unsigned int ti_hecc_mailbox_read(struct can_rx_offload *offload,
>  {
>  	struct ti_hecc_priv *priv = rx_offload_to_priv(offload);
>  	u32 data, mbx_mask;
> +	int lost;
>  
>  	mbx_mask = BIT(mbxno);
>  	data = hecc_read_mbx(priv, mbxno, HECC_CANMID);
> @@ -552,9 +562,12 @@ static unsigned int ti_hecc_mailbox_read(struct can_rx_offload *offload,
>  	}
>  
>  	*timestamp = hecc_read_stamp(priv, mbxno);
> +	lost = hecc_read(priv, HECC_CANRML) & mbx_mask;
> +	if (unlikely(lost))
> +		priv->offload.dev->stats.rx_fifo_errors++;

In the flexcan and at91_can driver we're incrementing the following errors:
			dev->stats.rx_over_errors++;
			dev->stats.rx_errors++;

You can save the register access if you only check for overflows if
reading from the lowest prio mailbox.

If you're discarding the data if the mailbox is marked as overflow
there's no need to read the data in the first place.

>  	hecc_write(priv, HECC_CANRMP, mbx_mask);
>  
> -	return 1;
> +	return !lost;
>  }
>  
>  static int ti_hecc_error(struct net_device *ndev, int int_status,
> 

I'll send a v2 that addresses these findings.

Marc
Jeroen Hofstee Oct. 10, 2019, 6:04 p.m. UTC | #2
Attempt 2, now as plain text... (vger doesn't like html)

On 10/10/19 5:52 PM, Marc Kleine-Budde wrote:
> On 10/10/19 2:17 PM, Marc Kleine-Budde wrote:
>> From: Jeroen Hofstee <jhofstee@victronenergy.com>
>>
>> When the rx fifo overflows the ti_hecc would silently drop them since
>> the overwrite protection is enabled for all mailboxes. So disable it
>> for the lowest priority mailbox and increment the rx_fifo_errors when
>> receive message lost is set. Drop the message itself in that case,
>> since it might be partially updated.
> Is that your observation or does the data sheet say anything to this
> situation?


I couldn't find in the data sheet, so I simply tested it, by allowing
the highest mailbox to be overwritten and send a stream alternating
with messages will all bits set and all cleared. That does end with
canids from one message combined with data from another.


>> Signed-off-by: Jeroen Hofstee <jhofstee@victronenergy.com>
>> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
>> ---
>>   drivers/net/can/ti_hecc.c | 21 +++++++++++++++++----
>>   1 file changed, 17 insertions(+), 4 deletions(-)
>>
>> diff --git a/drivers/net/can/ti_hecc.c b/drivers/net/can/ti_hecc.c
>> index 6ea29126c60b..c2d83ada203a 100644
>> --- a/drivers/net/can/ti_hecc.c
>> +++ b/drivers/net/can/ti_hecc.c
>> @@ -82,7 +82,7 @@ MODULE_VERSION(HECC_MODULE_VERSION);
>>   #define HECC_CANTA		0x10	/* Transmission acknowledge */
>>   #define HECC_CANAA		0x14	/* Abort acknowledge */
>>   #define HECC_CANRMP		0x18	/* Receive message pending */
>> -#define HECC_CANRML		0x1C	/* Remote message lost */
>> +#define HECC_CANRML		0x1C	/* Receive message lost */
>>   #define HECC_CANRFP		0x20	/* Remote frame pending */
>>   #define HECC_CANGAM		0x24	/* SECC only:Global acceptance mask */
>>   #define HECC_CANMC		0x28	/* Master control */
>> @@ -385,8 +385,17 @@ static void ti_hecc_start(struct net_device *ndev)
>>   	/* Enable tx interrupts */
>>   	hecc_set_bit(priv, HECC_CANMIM, BIT(HECC_MAX_TX_MBOX) - 1);
>>   
>> -	/* Prevent message over-write & Enable interrupts */
>> -	hecc_write(priv, HECC_CANOPC, HECC_SET_REG);
>> +	/* Prevent message over-write to create a rx fifo, but not for
>> +	 * the lowest priority mailbox, since that allows detecting
>> +	 * overflows instead of the hardware silently dropping the
>> +	 * messages. The lowest rx mailbox is one above the tx ones,
>> +	 * hence its mbxno is the number of tx mailboxes.
>> +	 */
>> +	mbxno = HECC_MAX_TX_MBOX;
>> +	mbx_mask = ~BIT(mbxno);
>> +	hecc_write(priv, HECC_CANOPC, mbx_mask);
>> +
>> +	/* Enable interrupts */
>>   	if (priv->use_hecc1int) {
>>   		hecc_write(priv, HECC_CANMIL, HECC_SET_REG);
>>   		hecc_write(priv, HECC_CANGIM, HECC_CANGIM_DEF_MASK |
>> @@ -531,6 +540,7 @@ static unsigned int ti_hecc_mailbox_read(struct can_rx_offload *offload,
>>   {
>>   	struct ti_hecc_priv *priv = rx_offload_to_priv(offload);
>>   	u32 data, mbx_mask;
>> +	int lost;
>>   
>>   	mbx_mask = BIT(mbxno);
>>   	data = hecc_read_mbx(priv, mbxno, HECC_CANMID);
>> @@ -552,9 +562,12 @@ static unsigned int ti_hecc_mailbox_read(struct can_rx_offload *offload,
>>   	}
>>   
>>   	*timestamp = hecc_read_stamp(priv, mbxno);
>> +	lost = hecc_read(priv, HECC_CANRML) & mbx_mask;
>> +	if (unlikely(lost))
>> +		priv->offload.dev->stats.rx_fifo_errors++;
> In the flexcan and at91_can driver we're incrementing the following errors:
> 			dev->stats.rx_over_errors++;
> 			dev->stats.rx_errors++;


I understood it as follows, see[1] e.g.:

rx_errors -> link level errors, not really applicable to CAN
(perhaps in single shot mode or if you want (and can) report 
retransmissions)

rx_over_errors -> the hardware itself cannot keep up.
Not applicable for CAN.

rx_fifo_errors -> the software driver cannot keep up.
So I picked that one.

rx_dropped -> software is dropping on purpose based on limits etc.

But I might be wrong.


> You can save the register access if you only check for overflows if
> reading from the lowest prio mailbox.
>
> If you're discarding the data if the mailbox is marked as overflow
> there's no need to read the data in the first place.
>
>>   	hecc_write(priv, HECC_CANRMP, mbx_mask);
>>   
>> -	return 1;
>> +	return !lost;
>>   }
>>   
>>   static int ti_hecc_error(struct net_device *ndev, int int_status,
>>

Mind it that you don't cause a race! The bit can become set
during reading of the data, it should be check _after_ we
have a copy of the mailbox. You can do a double check, one
before one after, but since there should be no fifo overflow
anyway, there is no reason to optimize for that path. (@250k
I cannot get more then 3 messages in the fifo...).

Regards,

Jeroen

[1] 
https://community.mellanox.com/s/article/counters-troubleshooting-for-linux-driver
Marc Kleine-Budde Oct. 10, 2019, 8:29 p.m. UTC | #3
On 10/10/19 7:51 PM, Jeroen Hofstee wrote:
>>> When the rx fifo overflows the ti_hecc would silently drop them since
>>> the overwrite protection is enabled for all mailboxes. So disable it
>>> for the lowest priority mailbox and increment the rx_fifo_errors when
>>> receive message lost is set. Drop the message itself in that case,
>>> since it might be partially updated.
>> Is that your observation or does the data sheet say anything to this
>> situation?
> 
> I couldn't find in the data sheet, so I simply tested it, by allowing
> the highest mailbox to be overwritten and send a stream alternating
> with messages will all bits set and all cleared. That does end with
> canids from one message combined with data from another.

I see. This is why the register is called overwrite _protection_ control.

>>> Signed-off-by: Jeroen Hofstee <jhofstee@victronenergy.com>
>>> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
>>> ---
>>>  drivers/net/can/ti_hecc.c | 21 +++++++++++++++++----
>>>  1 file changed, 17 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/drivers/net/can/ti_hecc.c b/drivers/net/can/ti_hecc.c
>>> index 6ea29126c60b..c2d83ada203a 100644
>>> --- a/drivers/net/can/ti_hecc.c
>>> +++ b/drivers/net/can/ti_hecc.c
>>> @@ -82,7 +82,7 @@ MODULE_VERSION(HECC_MODULE_VERSION);
>>>  #define HECC_CANTA		0x10	/* Transmission acknowledge */
>>>  #define HECC_CANAA		0x14	/* Abort acknowledge */
>>>  #define HECC_CANRMP		0x18	/* Receive message pending */
>>> -#define HECC_CANRML		0x1C	/* Remote message lost */
>>> +#define HECC_CANRML		0x1C	/* Receive message lost */
>>>  #define HECC_CANRFP		0x20	/* Remote frame pending */
>>>  #define HECC_CANGAM		0x24	/* SECC only:Global acceptance mask */
>>>  #define HECC_CANMC		0x28	/* Master control */
>>> @@ -385,8 +385,17 @@ static void ti_hecc_start(struct net_device *ndev)
>>>  	/* Enable tx interrupts */
>>>  	hecc_set_bit(priv, HECC_CANMIM, BIT(HECC_MAX_TX_MBOX) - 1);
>>>  
>>> -	/* Prevent message over-write & Enable interrupts */
>>> -	hecc_write(priv, HECC_CANOPC, HECC_SET_REG);
>>> +	/* Prevent message over-write to create a rx fifo, but not for
>>> +	 * the lowest priority mailbox, since that allows detecting
>>> +	 * overflows instead of the hardware silently dropping the
>>> +	 * messages. The lowest rx mailbox is one above the tx ones,
>>> +	 * hence its mbxno is the number of tx mailboxes.
>>> +	 */
>>> +	mbxno = HECC_MAX_TX_MBOX;
>>> +	mbx_mask = ~BIT(mbxno);
>>> +	hecc_write(priv, HECC_CANOPC, mbx_mask);
>>> +
>>> +	/* Enable interrupts */
>>>  	if (priv->use_hecc1int) {
>>>  		hecc_write(priv, HECC_CANMIL, HECC_SET_REG);
>>>  		hecc_write(priv, HECC_CANGIM, HECC_CANGIM_DEF_MASK |
>>> @@ -531,6 +540,7 @@ static unsigned int ti_hecc_mailbox_read(struct can_rx_offload *offload,
>>>  {
>>>  	struct ti_hecc_priv *priv = rx_offload_to_priv(offload);
>>>  	u32 data, mbx_mask;
>>> +	int lost;
>>>  
>>>  	mbx_mask = BIT(mbxno);
>>>  	data = hecc_read_mbx(priv, mbxno, HECC_CANMID);
>>> @@ -552,9 +562,12 @@ static unsigned int ti_hecc_mailbox_read(struct can_rx_offload *offload,
>>>  	}
>>>  
>>>  	*timestamp = hecc_read_stamp(priv, mbxno);
>>> +	lost = hecc_read(priv, HECC_CANRML) & mbx_mask;
>>> +	if (unlikely(lost))
>>> +		priv->offload.dev->stats.rx_fifo_errors++;
>> In the flexcan and at91_can driver we're incrementing the following errors:
>> 			dev->stats.rx_over_errors++;
>> 			dev->stats.rx_errors++;
> 
> I understood it as follows, see[1] e.g.:
>
> rx_errors -> link level errors, not really applicable to CAN
> (perhaps in single shot mode or if you want)

I increment this for CRC, bit stuffing and all the other bus errors. As
well as on HW FIFO overflows.

> rx_over_errors -> the hardware itself cannot keep up.
> Not applicable for CAN.

If the HW FIFO overflows for whatever reason, I increment this.

> rx_fifo_errors -> the software driver cannot keep up.
> So I picked that one.

If the rx-offload queue reaches it's limit I increment this.

> rx_dropped -> software is dropping on purpose based on limits etc.
> 
> But I might be wrong.

rx-offload used this if the skb cannot be allocated.

Basically the kernel doc gives a general description of these values but
says: look at the driver for exact meaning :)

I wanted to keep it similar with the CAN drivers.

>> You can save the register access if you only check for overflows if
>> reading from the lowest prio mailbox.
>>
>> If you're discarding the data if the mailbox is marked as overflow
>> there's no need to read the data in the first place.
>>
> 
> Mind it that you don't cause a race! The bit can become set
> during reading of the data, it should be check _after_ we
> have a copy of the mailbox.

Right. My understanding of the bit was wrong.

In the flexcan HW there is a similar bit. It says there was an overflow
in this mailbox. But a coherence mechanism guarantees that the mailbox
is not changed by the CAN core, while the ARM accesses it.

> You can do a double check, one
> before one after, but since there should be no fifo overflow
> anyway, there is no reason to optimize for that path. (@250k
> I cannot get more then 3 messages in the fifo...).
Thanks for the explanations,
Marc
diff mbox series

Patch

diff --git a/drivers/net/can/ti_hecc.c b/drivers/net/can/ti_hecc.c
index 6ea29126c60b..c2d83ada203a 100644
--- a/drivers/net/can/ti_hecc.c
+++ b/drivers/net/can/ti_hecc.c
@@ -82,7 +82,7 @@  MODULE_VERSION(HECC_MODULE_VERSION);
 #define HECC_CANTA		0x10	/* Transmission acknowledge */
 #define HECC_CANAA		0x14	/* Abort acknowledge */
 #define HECC_CANRMP		0x18	/* Receive message pending */
-#define HECC_CANRML		0x1C	/* Remote message lost */
+#define HECC_CANRML		0x1C	/* Receive message lost */
 #define HECC_CANRFP		0x20	/* Remote frame pending */
 #define HECC_CANGAM		0x24	/* SECC only:Global acceptance mask */
 #define HECC_CANMC		0x28	/* Master control */
@@ -385,8 +385,17 @@  static void ti_hecc_start(struct net_device *ndev)
 	/* Enable tx interrupts */
 	hecc_set_bit(priv, HECC_CANMIM, BIT(HECC_MAX_TX_MBOX) - 1);
 
-	/* Prevent message over-write & Enable interrupts */
-	hecc_write(priv, HECC_CANOPC, HECC_SET_REG);
+	/* Prevent message over-write to create a rx fifo, but not for
+	 * the lowest priority mailbox, since that allows detecting
+	 * overflows instead of the hardware silently dropping the
+	 * messages. The lowest rx mailbox is one above the tx ones,
+	 * hence its mbxno is the number of tx mailboxes.
+	 */
+	mbxno = HECC_MAX_TX_MBOX;
+	mbx_mask = ~BIT(mbxno);
+	hecc_write(priv, HECC_CANOPC, mbx_mask);
+
+	/* Enable interrupts */
 	if (priv->use_hecc1int) {
 		hecc_write(priv, HECC_CANMIL, HECC_SET_REG);
 		hecc_write(priv, HECC_CANGIM, HECC_CANGIM_DEF_MASK |
@@ -531,6 +540,7 @@  static unsigned int ti_hecc_mailbox_read(struct can_rx_offload *offload,
 {
 	struct ti_hecc_priv *priv = rx_offload_to_priv(offload);
 	u32 data, mbx_mask;
+	int lost;
 
 	mbx_mask = BIT(mbxno);
 	data = hecc_read_mbx(priv, mbxno, HECC_CANMID);
@@ -552,9 +562,12 @@  static unsigned int ti_hecc_mailbox_read(struct can_rx_offload *offload,
 	}
 
 	*timestamp = hecc_read_stamp(priv, mbxno);
+	lost = hecc_read(priv, HECC_CANRML) & mbx_mask;
+	if (unlikely(lost))
+		priv->offload.dev->stats.rx_fifo_errors++;
 	hecc_write(priv, HECC_CANRMP, mbx_mask);
 
-	return 1;
+	return !lost;
 }
 
 static int ti_hecc_error(struct net_device *ndev, int int_status,