Patchwork Gianfar: RX Recycle skb->len error

Submitter ben@bigfootnetworks.com
Date March 20, 2010, 7:54 p.m.
Message ID <A6A1774AFD79E346AE6D49A33CB294530DC19EB5@EX-BE-017-SFO.shared.themessagecenter.com>
Permalink /patch/48215/
State Changes Requested
Delegated to: David Miller

Comments

ben@bigfootnetworks.com - March 20, 2010, 7:54 p.m.
We are seeing some random skb data length errors on RX after long-running, full-gigabit traffic.  First, my debugging and solution are based on the following invariant assumption:
(skb->tail - skb->data) == skb->len

If this is wrong, please educate.

After some tracing, here is where the error packets seem to originate:
1.  We are cleaning rx, in gfar_clean_rx_ring;
2.  A new RX skb is drawn from the rx_recycle queue, and obeys the above invariant (so, in gfar_new_skb(), __skb_dequeue returns an skb);
3.  At this point skb_reserve is called, which moves data and tail by the same calculated alignamount;
4.  So, newskb is not NULL.  However, (!(bdp->status & RXBD_LAST) || (bdp->status & RXBD_ERR)) evaluates to true;
5.  Since newskb is not NULL, we arrive at the else if (skb), which is true;
6.  skb->data = skb->head + NET_SKB_PAD is applied, and then the skb is requeued for recycling.

At this point, skb->data != skb->tail, but skb->len == 0.  When this skb is used for the next RX, it causes problems later, when we skb_put trailers and then trust skb->len.
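
To make the sequence above concrete, here is a small userspace model in C. It is only a sketch: struct model_skb, MODEL_SKB_PAD and the model_* helpers are hypothetical stand-ins for struct sk_buff, NET_SKB_PAD, skb_reserve() and skb_put(), not kernel code, and the pad and buffer sizes are arbitrary. It also assumes a pointer-based tail and small arguments that stay inside the model buffer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for struct sk_buff (pointer-based tail). */
#define MODEL_SKB_PAD  32   /* stand-in for NET_SKB_PAD; value is arbitrary */
#define MODEL_BUF_SIZE 256  /* arbitrary; callers must stay within it */

struct model_skb {
	unsigned char buf[MODEL_BUF_SIZE];
	unsigned char *head;
	unsigned char *data;
	unsigned char *tail;
	unsigned int len;
};

/* Fresh buffer: data and tail both sit just past the headroom pad. */
static void model_init(struct model_skb *skb)
{
	skb->head = skb->buf;
	skb->data = skb->head + MODEL_SKB_PAD;
	skb->tail = skb->data;
	skb->len = 0;
}

/* Mirrors skb_reserve(): data and tail move together. */
static void model_reserve(struct model_skb *skb, unsigned int n)
{
	skb->data += n;
	skb->tail += n;
}

/* Mirrors skb_put(): tail and len grow together. */
static unsigned char *model_put(struct model_skb *skb, unsigned int n)
{
	unsigned char *old_tail = skb->tail;

	skb->tail += n;
	skb->len += n;
	return old_tail;
}

/* The invariant from the report: (tail - data) == len. */
static int model_invariant_holds(const struct model_skb *skb)
{
	return (size_t)(skb->tail - skb->data) == skb->len;
}

/*
 * Replays steps 2-6, then one skb_put on the recycled buffer.
 * Returns how far (tail - data) exceeds len afterwards.
 */
ptrdiff_t model_buggy_recycle_mismatch(unsigned int alignamount,
				       unsigned int pkt_len)
{
	struct model_skb skb;

	model_init(&skb);                    /* step 2: invariant holds */
	assert(model_invariant_holds(&skb));
	model_reserve(&skb, alignamount);    /* step 3: still holds */
	assert(model_invariant_holds(&skb));
	skb.data = skb.head + MODEL_SKB_PAD; /* step 6: only data is reset */
	(void)model_put(&skb, pkt_len);      /* next RX trusts len */
	return (skb.tail - skb.data) - (ptrdiff_t)skb.len;
}
```

In this model the mismatch equals whatever alignamount was reserved before the buffer was requeued, and it is zero only when no alignment shift was applied.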

I would propose something like:


Ben Menchaca
Bigfoot Networks

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Miller - March 22, 2010, 4:46 a.m.
From: "Ben Menchaca (ben@bigfootnetworks.com)" <ben@bigfootnetworks.com>
Date: Sat, 20 Mar 2010 12:54:59 -0700

> We are seeing some random skb data length errors on RX after long-running, full-gigabit traffic.  First, my debugging and solution are based on the following invariant assumption:
> (skb->tail - skb->data) == skb->len
> 
> If this is wrong, please educate.
> 
> After some tracing, here is where the error packets seem to originate:
> 1.  We are cleaning rx, in gfar_clean_rx_ring;
> 2.  A new RX skb is drawn from the rx_recycle queue, and obeys the above invariant (so, in gfar_new_skb(), __skb_dequeue returns an skb);
> 3.  At this point skb_reserve is called, which moves data and tail by the same calculated alignamount;
> 4.  So, newskb is not NULL.  However, (!(bdp->status & RXBD_LAST) || (bdp->status & RXBD_ERR)) evaluates to true;
> 5.  Since newskb is not NULL, we arrive at the else if (skb), which is true;
> 6.  skb->data = skb->head + NET_SKB_PAD is applied, and then the skb is requeued for recycling.
> 
> At this point, skb->data != skb->tail, but skb->len == 0.  When this skb is used for the next RX, it causes problems later, when we skb_put trailers and then trust skb->len.
> 
> I would propose something like:

Thanks for debugging this, some gianfar developers CC:'d.

> @@ -2540,6 +2540,7 @@ 
> 				 * recycle list.
>  				 */
>  				skb->data = skb->head + NET_SKB_PAD;
> +				skb_reset_tail_pointer(skb);
> 				__skb_queue_head(&priv->rx_recycle, skb);
> 			}
> 		} else {

This code is essentially trying to undo skb_reserve()
but as you found it's doing so in a buggy manner.

skb_reserve() adjusts both the 'data' and 'tail' pointers,
but this attempt at a reversal is only modifying 'data'.

Your fix is fine, but really any by-hand modification of
skb->data is a bug, and we should provide an skb_unreserve()
or similar to hide such details away, and use it here.

Anton?
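
The skb_unreserve() idea suggested above could look roughly like the following userspace sketch. No such helper appears in this thread's patch, so model_unreserve() and everything else here (struct model_skb, MODEL_SKB_PAD, the other model_* functions) are hypothetical names for illustration, with arbitrary sizes and a pointer-based tail assumed. The helper resets data and tail together, which is the same effect as the two statements in the proposed fix.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for struct sk_buff (pointer-based tail). */
#define MODEL_SKB_PAD  32   /* stand-in for NET_SKB_PAD; value is arbitrary */
#define MODEL_BUF_SIZE 256  /* arbitrary; callers must stay within it */

struct model_skb {
	unsigned char buf[MODEL_BUF_SIZE];
	unsigned char *head;
	unsigned char *data;
	unsigned char *tail;
	unsigned int len;
};

static void model_init(struct model_skb *skb)
{
	skb->head = skb->buf;
	skb->data = skb->head + MODEL_SKB_PAD;
	skb->tail = skb->data;
	skb->len = 0;
}

/* Mirrors skb_reserve(): data and tail move together. */
static void model_reserve(struct model_skb *skb, unsigned int n)
{
	skb->data += n;
	skb->tail += n;
}

/* Mirrors skb_put(): tail and len grow together. */
static void model_put(struct model_skb *skb, unsigned int n)
{
	skb->tail += n;
	skb->len += n;
}

/*
 * Hypothetical skb_unreserve(): undo a reserve on an empty buffer by
 * resetting data AND tail, so (tail - data) == len == 0 again.
 */
static void model_unreserve(struct model_skb *skb)
{
	assert(skb->len == 0);               /* only valid for empty buffers */
	skb->data = skb->head + MODEL_SKB_PAD;
	skb->tail = skb->data;
}

/*
 * Reserve, recycle through the helper, then reuse the buffer.
 * Returns 1 if (tail - data) == len still holds at the end.
 */
int model_recycle_roundtrip_ok(unsigned int alignamount, unsigned int pkt_len)
{
	struct model_skb skb;

	model_init(&skb);
	model_reserve(&skb, alignamount);    /* first use: align the buffer */
	model_unreserve(&skb);               /* requeue path, fixed */
	model_reserve(&skb, alignamount);    /* second use after dequeue */
	model_put(&skb, pkt_len);            /* received data */
	return (size_t)(skb.tail - skb.data) == skb.len;
}
```

Keeping both pointer updates inside one helper means a driver cannot reset one and forget the other, which is the failure described in this thread.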
Anton Vorontsov - March 22, 2010, 5:24 p.m.
On Sun, Mar 21, 2010 at 09:46:42PM -0700, David Miller wrote:
[...]
> > 				 * recycle list.
> >  				 */
> >  				skb->data = skb->head + NET_SKB_PAD;
> > +				skb_reset_tail_pointer(skb);
> > 				__skb_queue_head(&priv->rx_recycle, skb);
> > 			}
> > 		} else {
> 
> This code is essentially trying to undo skb_reserve()
> but as you found it's doing so in a buggy manner.
> 
> skb_reserve() adjusts both the 'data' and 'tail' pointers,
> but this attempt at a reversal is only modifying 'data'.
> 
> Your fix is fine, but really any by-hand modification of
> skb->data is a bug, and we should provide an skb_unreserve()
> or similar to hide such details away, and use it here.
> 
> Anton?

Yes, skb_unreserve() (or skb_reset_reserved() for naming consistency?)
would be great.

Ben, note that ucc_geth.c driver is also affected by that bug,
so I guess it needs a similar fix.

Thanks,

Patch

--- a/drivers/net/gianfar.c
+++ b/drivers/net/gianfar.c
@@ -2540,6 +2540,7 @@
 				 * recycle list.
 				 */
 				skb->data = skb->head + NET_SKB_PAD;
+				skb_reset_tail_pointer(skb);
 				__skb_queue_head(&priv->rx_recycle, skb);
 			}
 		} else {