diff mbox

3.7.3+: Bad paging request in ip_rcv_finish while running NFS traffic.

Message ID 1358991906.12374.1356.camel@edumazet-glaptop
State RFC, archived
Delegated to: David Miller
Headers show

Commit Message

Eric Dumazet Jan. 24, 2013, 1:45 a.m. UTC
On Wed, 2013-01-23 at 17:10 -0800, Eric Dumazet wrote:

> Excellent, thats the bug.
> 
> I'll send a fix asap, thanks !

loopback device doesnt have a qdisc, and unsets IFF_XMIT_DST_RELEASE

Its hard to believe such an old bug never hit us in the past.

Probably because most of the time, the packet given to netif_rx() is
immediately processed (and loopback device is hard wired ?)




--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Ben Greear Jan. 24, 2013, 4:26 a.m. UTC | #1
On 01/23/2013 05:45 PM, Eric Dumazet wrote:
> On Wed, 2013-01-23 at 17:10 -0800, Eric Dumazet wrote:
>
>> Excellent, thats the bug.
>>
>> I'll send a fix asap, thanks !
>
> loopback device doesnt have a qdisc, and unsets IFF_XMIT_DST_RELEASE
>
> Its hard to believe such an old bug never hit us in the past.
>
> Probably because most of the time, the packet given to netif_rx() is
> immediately processed (and loopback device is hard wired ?)

I'll test this in the morning...one thing, maybe it's worth adding
a WARN_ON in netif_rx() in case there are other bugs of
this nature?

The dst objects are also garbage collected, so they have a fairly
long valid-ish lifetime after they are logically freed.  That might
be part of why it was so hard to reproduce the bug...

Thanks again for all the help on this bug!  Hopefully it will now
be well and truly squashed.

Ben
Eric Dumazet Jan. 24, 2013, 5:39 a.m. UTC | #2
On Wed, 2013-01-23 at 20:26 -0800, Ben Greear wrote:

> I'll test this in the morning...one thing, maybe it's worth adding
> a WARN_ON in netif_rx() in case there are other bugs of
> this nature?

Quite frankly we should do the skb_dst_force() in netif_rx(), and remove
it from callers. This will be a net-next patch.

Thanks !


--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Ben Greear Jan. 24, 2013, 8:03 p.m. UTC | #3
On 01/23/2013 05:45 PM, Eric Dumazet wrote:
> On Wed, 2013-01-23 at 17:10 -0800, Eric Dumazet wrote:
>
>> Excellent, thats the bug.
>>
>> I'll send a fix asap, thanks !
>
> loopback device doesnt have a qdisc, and unsets IFF_XMIT_DST_RELEASE
>
> Its hard to believe such an old bug never hit us in the past.
>
> Probably because most of the time, the packet given to netif_rx() is
> immediately processed (and loopback device is hard wired ?)

I have not seen any crashes since this change went into my
test bed, so I believe this one is finally fixed!  I'm running
with your macvlan fixes as well..and even if they did not
fix the problem I saw, they at least appear to cause no harm.

If you wish, feel free to add:

Tested-By:  Ben Greear <greearb@candelatech.com>

Thanks,
Ben

>
>
> diff --git a/drivers/net/loopback.c b/drivers/net/loopback.c
> index 81f8f9e..fcbf680 100644
> --- a/drivers/net/loopback.c
> +++ b/drivers/net/loopback.c
> @@ -77,6 +77,11 @@ static netdev_tx_t loopback_xmit(struct sk_buff *skb,
>
>   	skb_orphan(skb);
>
> +	/* Before queueing this packet to netif_rx(),
> +	 * make sure dst is refcounted.
> +	 */
> +	skb_dst_force(skb);
> +
>   	skb->protocol = eth_type_trans(skb, dev);
>
>   	/* it's OK to use per_cpu_ptr() because BHs are off */
>
>
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>
Eric Dumazet Jan. 24, 2013, 8:59 p.m. UTC | #4
On Thu, 2013-01-24 at 12:03 -0800, Ben Greear wrote:
> iately processed (and loopback device is hard wired ?)
> 
> I have not seen any crashes since this change went into my
> test bed, so I believe this one is finally fixed!  I'm running
> with your macvlan fixes as well..and even if they did not
> fix the problem I saw, they at least appear to cause no harm.
> 
> If you wish, feel free to add:
> 
> Tested-By:  Ben Greear <greearb@candelatech.com>
> 
> Thanks,
> Ben

Just to make sure, you still have the

WARN_ON_ONCE(skb_dst_is_noref(skb))

in netif_rx() ?


--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Ben Greear Jan. 24, 2013, 9:01 p.m. UTC | #5
On 01/24/2013 12:59 PM, Eric Dumazet wrote:
> On Thu, 2013-01-24 at 12:03 -0800, Ben Greear wrote:
>> iately processed (and loopback device is hard wired ?)
>>
>> I have not seen any crashes since this change went into my
>> test bed, so I believe this one is finally fixed!  I'm running
>> with your macvlan fixes as well..and even if they did not
>> fix the problem I saw, they at least appear to cause no harm.
>>
>> If you wish, feel free to add:
>>
>> Tested-By:  Ben Greear <greearb@candelatech.com>
>>
>> Thanks,
>> Ben
>
> Just to make sure, you still have the
>
> WARN_ON_ONCE(skb_dst_is_noref(skb))
>
> in netif_rx() ?
>

I have something similar, and I have not seen any splats:

int netif_rx(struct sk_buff *skb)
{
	int ret;

	WARN_ON(skb->_skb_refdst & SKB_DST_NOREF);


I'll keep this in our kernels so when we do .1q vlans and other
sorts of testing we'll get some more coverage.

Thanks,
Ben
diff mbox

Patch

diff --git a/drivers/net/loopback.c b/drivers/net/loopback.c
index 81f8f9e..fcbf680 100644
--- a/drivers/net/loopback.c
+++ b/drivers/net/loopback.c
@@ -77,6 +77,11 @@  static netdev_tx_t loopback_xmit(struct sk_buff *skb,
 
 	skb_orphan(skb);
 
+	/* Before queueing this packet to netif_rx(),
+	 * make sure dst is refcounted.
+	 */
+	skb_dst_force(skb);
+
 	skb->protocol = eth_type_trans(skb, dev);
 
 	/* it's OK to use per_cpu_ptr() because BHs are off */