diff mbox

[net-next] vxlan: keep original skb ownership

Message ID 1389030871.12212.203.camel@edumazet-glaptop2.roam.corp.google.com
State Accepted, archived
Delegated to: David Miller
Headers show

Commit Message

Eric Dumazet Jan. 6, 2014, 5:54 p.m. UTC
From: Eric Dumazet <edumazet@google.com>

Sathya Perla posted a patch trying to address following problem :

<quote>
 The vxlan driver sets itself as the socket owner for all the TX flows
 it encapsulates (using vxlan_set_owner()) and assigns it's own skb
 destructor. This causes all tunneled traffic to land up on only one TXQ
 as all encapsulated skbs refer to the vxlan socket and not the original
 socket.  Also, the vxlan skb destructor breaks some functionality for
 tunneled traffic like wmem accounting and as TCP small queues and
 FQ/pacing packet scheduler.
</quote>

I reworked Sathya patch and added some explanations.

vxlan_xmit() can avoid one skb_clone()/dev_kfree_skb() pair
and gain better drop monitor accuracy, by calling kfree_skb() when
appropriate.

The UDP socket used by vxlan to perform encapsulation of xmit packets
do not need to be alive while packets leave vxlan code. Its better
to keep original socket ownership to get proper feedback from qdisc and
NIC layers.

We use skb->sk to 

A) control amount of bytes/packets queued on behalf of a socket, but
prior vxlan code did the skb->sk transfert without any limit/control
on vxlan socket sk_sndbuf.

B) security purposes (as selinux) or netfilter uses, and I do not think
anything is prepared to handle vxlan stacked case in this area.

By not changing ownership, vxlan tunnels behave like other tunnels.
As Stephen mentioned, we might do the same change in L2TP.

Reported-by: Sathya Perla <sathya.perla@emulex.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Stephen Hemminger <stephen@networkplumber.org>
---
 drivers/net/vxlan.c |   31 ++++++++++---------------------
 1 file changed, 10 insertions(+), 21 deletions(-)



--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

David Miller Jan. 6, 2014, 9:41 p.m. UTC | #1
From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Mon, 06 Jan 2014 09:54:31 -0800

> From: Eric Dumazet <edumazet@google.com>
> 
> Sathya Perla posted a patch trying to address following problem :
> 
> <quote>
>  The vxlan driver sets itself as the socket owner for all the TX flows
>  it encapsulates (using vxlan_set_owner()) and assigns it's own skb
>  destructor. This causes all tunneled traffic to land up on only one TXQ
>  as all encapsulated skbs refer to the vxlan socket and not the original
>  socket.  Also, the vxlan skb destructor breaks some functionality for
>  tunneled traffic like wmem accounting and as TCP small queues and
>  FQ/pacing packet scheduler.
> </quote>
> 
> I reworked Sathya patch and added some explanations.
> 
> vxlan_xmit() can avoid one skb_clone()/dev_kfree_skb() pair
> and gain better drop monitor accuracy, by calling kfree_skb() when
> appropriate.
> 
> The UDP socket used by vxlan to perform encapsulation of xmit packets
> do not need to be alive while packets leave vxlan code. Its better
> to keep original socket ownership to get proper feedback from qdisc and
> NIC layers.
> 
> We use skb->sk to 
> 
> A) control amount of bytes/packets queued on behalf of a socket, but
> prior vxlan code did the skb->sk transfert without any limit/control
> on vxlan socket sk_sndbuf.
> 
> B) security purposes (as selinux) or netfilter uses, and I do not think
> anything is prepared to handle vxlan stacked case in this area.
> 
> By not changing ownership, vxlan tunnels behave like other tunnels.
> As Stephen mentioned, we might do the same change in L2TP.
> 
> Reported-by: Sathya Perla <sathya.perla@emulex.com>
> Signed-off-by: Eric Dumazet <edumazet@google.com>

Applied, thanks a lot Eric.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index 474a99ed0222..ab2e92eec949 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -1381,20 +1381,6 @@  static bool route_shortcircuit(struct net_device *dev, struct sk_buff *skb)
 	return false;
 }
 
-static void vxlan_sock_put(struct sk_buff *skb)
-{
-	sock_put(skb->sk);
-}
-
-/* On transmit, associate with the tunnel socket */
-static void vxlan_set_owner(struct sock *sk, struct sk_buff *skb)
-{
-	skb_orphan(skb);
-	sock_hold(sk);
-	skb->sk = sk;
-	skb->destructor = vxlan_sock_put;
-}
-
 /* Compute source port for outgoing packet
  *   first choice to use L4 flow hash since it will spread
  *     better and maybe available from hardware
@@ -1514,8 +1500,6 @@  static int vxlan6_xmit_skb(struct vxlan_sock *vs,
 	ip6h->daddr	  = *daddr;
 	ip6h->saddr	  = *saddr;
 
-	vxlan_set_owner(vs->sock->sk, skb);
-
 	err = handle_offloads(skb);
 	if (err)
 		return err;
@@ -1572,8 +1556,6 @@  int vxlan_xmit_skb(struct vxlan_sock *vs,
 	uh->len = htons(skb->len);
 	uh->check = 0;
 
-	vxlan_set_owner(vs->sock->sk, skb);
-
 	err = handle_offloads(skb);
 	if (err)
 		return err;
@@ -1786,7 +1768,7 @@  static netdev_tx_t vxlan_xmit(struct sk_buff *skb, struct net_device *dev)
 	struct vxlan_dev *vxlan = netdev_priv(dev);
 	struct ethhdr *eth;
 	bool did_rsc = false;
-	struct vxlan_rdst *rdst;
+	struct vxlan_rdst *rdst, *fdst = NULL;
 	struct vxlan_fdb *f;
 
 	skb_reset_mac_header(skb);
@@ -1828,7 +1810,7 @@  static netdev_tx_t vxlan_xmit(struct sk_buff *skb, struct net_device *dev)
 				vxlan_fdb_miss(vxlan, eth->h_dest);
 
 			dev->stats.tx_dropped++;
-			dev_kfree_skb(skb);
+			kfree_skb(skb);
 			return NETDEV_TX_OK;
 		}
 	}
@@ -1836,12 +1818,19 @@  static netdev_tx_t vxlan_xmit(struct sk_buff *skb, struct net_device *dev)
 	list_for_each_entry_rcu(rdst, &f->remotes, list) {
 		struct sk_buff *skb1;
 
+		if (!fdst) {
+			fdst = rdst;
+			continue;
+		}
 		skb1 = skb_clone(skb, GFP_ATOMIC);
 		if (skb1)
 			vxlan_xmit_one(skb1, dev, rdst, did_rsc);
 	}
 
-	dev_kfree_skb(skb);
+	if (fdst)
+		vxlan_xmit_one(skb, dev, fdst, did_rsc);
+	else
+		kfree_skb(skb);
 	return NETDEV_TX_OK;
 }