diff mbox

NULL pointer dereference panic in stable (2.6.33.2), amd64

Message ID 1271056697.16881.7.camel@edumazet-laptop
State Accepted, archived
Delegated to: David Miller
Headers show

Commit Message

Eric Dumazet April 12, 2010, 7:18 a.m. UTC
Le lundi 12 avril 2010 à 08:01 +0200, Eric Dumazet a écrit :

> Problem is when you reset sk->sk_tx_queue_mapping at the very moment
> route (or destination) changes, we might have old packets queued in tx
> queues, of the old ethernet device (eth0 : multi queue compatable)
> 
> 2) Application does a sendmsg() or connect() call and sk->sk_dst_cache
> is rebuild, it points to a dst_entry referring a new device (eth1 : non
> multiqueue)
> 
> 3) When one old packet finally is transmitted, we do :
> 
> 	queue_index = 1; // any value > 0
> 
> 	if (sk && sk->sk_dst_cache)
> 		sk_tx_queue_set(sk, queue_index); // remember a >0 value
> 
> 4) application does a sendmsg(), enqueues a new skb on eth1
> 
> 5) We re-enter dev_pick_tx(), and consider cached value in 3) is valid.
>    we pick a non existent txq for eth1 device.
> 
> 6) We crash.
> 
> 
> > The following might be better to prove the panic is due to this, since
> > your suggestion will hide a panic that happens somewhat rare (according
> > to Denys):
> > 
> >       if (sk_tx_queue_recorded(sk)) {
> >             queue_index = sk_tx_queue_get(sk);
> > +           queue_index = dev_cap_txqueue(dev, queue_index);
> >       } else {
> > 
> 
> Sure, but I thought I was clear enough to prove this commit was wrong,
> and we have to find a fix.

Here the patch I cooked, this is a stable candidate, once tested and
acknowledged.

[PATCH] net: dev_pick_tx() fix

When dev_pick_tx() caches tx queue_index on a socket, we must check
socket dst_entry matches skb one, or risk a crash later, as reported by
Denys Fedorysychenko, if old packets are in flight during a route
change, involving devices with different number of queues.

Bug introduced by commit a4ee3ce3
(net: Use sk_tx_queue_mapping for connected sockets)

Reported-by: Denys Fedorysychenko <nuclearcat@nuclearcat.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---


--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

David Miller April 12, 2010, 7:36 a.m. UTC | #1
From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Mon, 12 Apr 2010 09:18:17 +0200

> Here the patch I cooked, this is a stable candidate, once tested and
> acknowledged.
> 
> [PATCH] net: dev_pick_tx() fix
> 
> When dev_pick_tx() caches tx queue_index on a socket, we must check
> socket dst_entry matches skb one, or risk a crash later, as reported by
> Denys Fedorysychenko, if old packets are in flight during a route
> change, involving devices with different number of queues.
> 
> Bug introduced by commit a4ee3ce3
> (net: Use sk_tx_queue_mapping for connected sockets)
> 
> Reported-by: Denys Fedorysychenko <nuclearcat@nuclearcat.com>
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>

Looks good Eric.  I'll integrate this into net-2.6 and also queue it
up for -stable in a day or two unless some problem is discovered.

Thanks!
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Miller April 15, 2010, 6:52 a.m. UTC | #2
From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Mon, 12 Apr 2010 09:18:17 +0200

> [PATCH] net: dev_pick_tx() fix
> 
> When dev_pick_tx() caches tx queue_index on a socket, we must check
> socket dst_entry matches skb one, or risk a crash later, as reported by
> Denys Fedorysychenko, if old packets are in flight during a route
> change, involving devices with different number of queues.
> 
> Bug introduced by commit a4ee3ce3
> (net: Use sk_tx_queue_mapping for connected sockets)
> 
> Reported-by: Denys Fedorysychenko <nuclearcat@nuclearcat.com>
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>

It looks like Denys is still getting crashes even with this patch
applied.  And I also think there is some meric to some of Krishna's
analysis.

To me it seems to make more sense to validate the SKB's queue against
the real actual choosen device's range.

The socket queue index will catch up and eventually become valid
because the dst reset will invalidate the queue setting, and we'll
thus recompute it as needed, as Krishna stated.

So I'm tossing this patch for now since it doesn't even aparently
fix the bug.

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eric Dumazet April 15, 2010, 8:02 a.m. UTC | #3
Le mercredi 14 avril 2010 à 23:52 -0700, David Miller a écrit :
> From: Eric Dumazet <eric.dumazet@gmail.com>
> Date: Mon, 12 Apr 2010 09:18:17 +0200
> 
> > [PATCH] net: dev_pick_tx() fix
> > 
> > When dev_pick_tx() caches tx queue_index on a socket, we must check
> > socket dst_entry matches skb one, or risk a crash later, as reported by
> > Denys Fedorysychenko, if old packets are in flight during a route
> > change, involving devices with different number of queues.
> > 
> > Bug introduced by commit a4ee3ce3
> > (net: Use sk_tx_queue_mapping for connected sockets)
> > 
> > Reported-by: Denys Fedorysychenko <nuclearcat@nuclearcat.com>
> > Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
> 
> It looks like Denys is still getting crashes even with this patch
> applied.  And I also think there is some meric to some of Krishna's
> analysis.
> 


> To me it seems to make more sense to validate the SKB's queue against
> the real actual choosen device's range.
> 
> The socket queue index will catch up and eventually become valid
> because the dst reset will invalidate the queue setting, and we'll
> thus recompute it as needed, as Krishna stated.
> 

??? 


> So I'm tossing this patch for now since it doesn't even aparently
> fix the bug.

I am a bit lost here David.

Denys got a crash that we cannot explain yet. He said he has no
multiqueue devices, so obviously my patch cant help him.

But this patch was fixing a real issue, I believe I pointed it twice
already...

I'll try to setup an environment to trigger this bug for real, but this
will take time, my dev machines are not multiqueue.



--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Miller April 15, 2010, 8:26 a.m. UTC | #4
From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Thu, 15 Apr 2010 10:02:33 +0200

> Denys got a crash that we cannot explain yet. He said he has no
> multiqueue devices, so obviously my patch cant help him.
> 
> But this patch was fixing a real issue, I believe I pointed it twice
> already...

Ok, I thought your patch was specifically meant to fix Denys's bug.

The confusion comes from the fact that you mention Denys's crash in
your commit message:

--------------------
When dev_pick_tx() caches tx queue_index on a socket, we must check
socket dst_entry matches skb one, or risk a crash later, as reported by
Denys Fedorysychenko, if old packets are in flight during a route
change, involving devices with different number of queues.
--------------------

Anyways, I studied your patch once more and read the thread discussion
with Krishna again and your patch looks fine.  I'll apply it to
net-2.6, thanks!

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eric Dumazet April 15, 2010, 8:51 a.m. UTC | #5
Le jeudi 15 avril 2010 à 01:26 -0700, David Miller a écrit :
> From: Eric Dumazet <eric.dumazet@gmail.com>
> Date: Thu, 15 Apr 2010 10:02:33 +0200
> 
> > Denys got a crash that we cannot explain yet. He said he has no
> > multiqueue devices, so obviously my patch cant help him.
> > 
> > But this patch was fixing a real issue, I believe I pointed it twice
> > already...
> 
> Ok, I thought your patch was specifically meant to fix Denys's bug.
> 
> The confusion comes from the fact that you mention Denys's crash in
> your commit message:
> 
> --------------------
> When dev_pick_tx() caches tx queue_index on a socket, we must check
> socket dst_entry matches skb one, or risk a crash later, as reported by
> Denys Fedorysychenko, if old packets are in flight during a route
> change, involving devices with different number of queues.
> --------------------
> 
> Anyways, I studied your patch once more and read the thread discussion
> with Krishna again and your patch looks fine.  I'll apply it to
> net-2.6, thanks!
> 

In any case, I think there is a fundamental problem with this sk
caching. Because one packet can travel in many stacked devices before
hitting the wire.

(bonding, vlan, ethernet) for example.

Socket cache is meaningfull for one level only...



--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Miller April 15, 2010, 9:06 a.m. UTC | #6
From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Thu, 15 Apr 2010 10:51:47 +0200

> In any case, I think there is a fundamental problem with this sk
> caching. Because one packet can travel in many stacked devices before
> hitting the wire.
> 
> (bonding, vlan, ethernet) for example.
> 
> Socket cache is meaningfull for one level only...

We were talking the other day about that 'tun' change to orphan the
SKB on TX, and I mentioned the possibility of just doing this in some
generic location before we give the packet to the device ->xmit()
method.

Such a scheme could help with this problem too.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Denys Fedoryshchenko April 15, 2010, 9:11 a.m. UTC | #7
Btw i have application using tun.

On Thursday 15 April 2010 12:06:19 David Miller wrote:
> From: Eric Dumazet <eric.dumazet@gmail.com>
> Date: Thu, 15 Apr 2010 10:51:47 +0200
> 
> > In any case, I think there is a fundamental problem with this sk
> > caching. Because one packet can travel in many stacked devices before
> > hitting the wire.
> >
> > (bonding, vlan, ethernet) for example.
> >
> > Socket cache is meaningfull for one level only...
> 
> We were talking the other day about that 'tun' change to orphan the
> SKB on TX, and I mentioned the possibility of just doing this in some
> generic location before we give the packet to the device ->xmit()
> method.
> 
> Such a scheme could help with this problem too.
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/net/core/dev.c b/net/core/dev.c
index 1c8a0ce..92584bf 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -1989,8 +1989,12 @@  static struct netdev_queue *dev_pick_tx(struct net_device *dev,
 			if (dev->real_num_tx_queues > 1)
 				queue_index = skb_tx_hash(dev, skb);
 
-			if (sk && sk->sk_dst_cache)
-				sk_tx_queue_set(sk, queue_index);
+			if (sk) {
+				struct dst_entry *dst = rcu_dereference(sk->sk_dst_cache);
+
+				if (dst && skb_dst(skb) == dst)
+					sk_tx_queue_set(sk, queue_index);
+			}
 		}
 	}