Message ID | 1358991906.12374.1356.camel@edumazet-glaptop |
---|---|
State | RFC, archived |
Delegated to: | David Miller |
Headers | show |
On 01/23/2013 05:45 PM, Eric Dumazet wrote: > On Wed, 2013-01-23 at 17:10 -0800, Eric Dumazet wrote: > >> Excellent, thats the bug. >> >> I'll send a fix asap, thanks ! > > loopback device doesnt have a qdisc, and unsets IFF_XMIT_DST_RELEASE > > Its hard to believe such an old bug never hit us in the past. > > Probably because most of the time, the packet given to netif_rx() is > immediately processed (and loopback device is hard wired ?) I'll test this in the morning...one thing, maybe it's worth adding a WARN_ON in netif_rx() in case there are other bugs of this nature? The dst objects are also garbage collected, so they have a fairly long valid-ish lifetime after they are logically freed. That might be part of why it was so hard to reproduce the bug... Thanks again for all the help on this bug! Hopefully it will now be well and truly squashed. Ben
On Wed, 2013-01-23 at 20:26 -0800, Ben Greear wrote: > I'll test this in the morning...one thing, maybe it's worth adding > a WARN_ON in netif_rx() in case there are other bugs of > this nature? Quite frankly we should do the skb_dst_force() in netif_rx(), and remove it from callers. This will be a net-next patch. Thanks ! -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 01/23/2013 05:45 PM, Eric Dumazet wrote: > On Wed, 2013-01-23 at 17:10 -0800, Eric Dumazet wrote: > >> Excellent, thats the bug. >> >> I'll send a fix asap, thanks ! > > loopback device doesnt have a qdisc, and unsets IFF_XMIT_DST_RELEASE > > Its hard to believe such an old bug never hit us in the past. > > Probably because most of the time, the packet given to netif_rx() is > immediately processed (and loopback device is hard wired ?) I have not seen any crashes since this change went into my test bed, so I believe this one is finally fixed! I'm running with your macvlan fixes as well..and even if they did not fix the problem I saw, they at least appear to cause no harm. If you wish, feel free to add: Tested-By: Ben Greear <greearb@candelatech.com> Thanks, Ben > > > diff --git a/drivers/net/loopback.c b/drivers/net/loopback.c > index 81f8f9e..fcbf680 100644 > --- a/drivers/net/loopback.c > +++ b/drivers/net/loopback.c > @@ -77,6 +77,11 @@ static netdev_tx_t loopback_xmit(struct sk_buff *skb, > > skb_orphan(skb); > > + /* Before queueing this packet to netif_rx(), > + * make sure dst is refcounted. > + */ > + skb_dst_force(skb); > + > skb->protocol = eth_type_trans(skb, dev); > > /* it's OK to use per_cpu_ptr() because BHs are off */ > > > -- > To unsubscribe from this list: send the line "unsubscribe netdev" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html >
On Thu, 2013-01-24 at 12:03 -0800, Ben Greear wrote: > iately processed (and loopback device is hard wired ?) > > I have not seen any crashes since this change went into my > test bed, so I believe this one is finally fixed! I'm running > with your macvlan fixes as well..and even if they did not > fix the problem I saw, they at least appear to cause no harm. > > If you wish, feel free to add: > > Tested-By: Ben Greear <greearb@candelatech.com> > > Thanks, > Ben Just to make sure, you still have the WARN_ON_ONCE(skb_dst_is_noref(skb)) in netif_rx() ? -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 01/24/2013 12:59 PM, Eric Dumazet wrote: > On Thu, 2013-01-24 at 12:03 -0800, Ben Greear wrote: >> iately processed (and loopback device is hard wired ?) >> >> I have not seen any crashes since this change went into my >> test bed, so I believe this one is finally fixed! I'm running >> with your macvlan fixes as well..and even if they did not >> fix the problem I saw, they at least appear to cause no harm. >> >> If you wish, feel free to add: >> >> Tested-By: Ben Greear <greearb@candelatech.com> >> >> Thanks, >> Ben > > Just to make sure, you still have the > > WARN_ON_ONCE(skb_dst_is_noref(skb)) > > in netif_rx() ? > I have something similar, and I have not seen any splats: int netif_rx(struct sk_buff *skb) { int ret; WARN_ON(skb->_skb_refdst & SKB_DST_NOREF); I'll keep this in our kernels so when we do .1q vlans and other sorts of testing we'll get some more coverage. Thanks, Ben
diff --git a/drivers/net/loopback.c b/drivers/net/loopback.c index 81f8f9e..fcbf680 100644 --- a/drivers/net/loopback.c +++ b/drivers/net/loopback.c @@ -77,6 +77,11 @@ static netdev_tx_t loopback_xmit(struct sk_buff *skb, skb_orphan(skb); + /* Before queueing this packet to netif_rx(), + * make sure dst is refcounted. + */ + skb_dst_force(skb); + skb->protocol = eth_type_trans(skb, dev); /* it's OK to use per_cpu_ptr() because BHs are off */