diff mbox

[net-next] vxlan: distribute vxlan tunneled traffic across multiple TXQs

Message ID 1387910346.12212.15.camel@edumazet-glaptop2.roam.corp.google.com
State RFC, archived
Delegated to: David Miller
Headers show

Commit Message

Eric Dumazet Dec. 24, 2013, 6:39 p.m. UTC
On Mon, 2013-12-23 at 11:28 -0800, Stephen Hemminger wrote:

> The idea is good, but without the destructor there is nothing to keep
> the UDP socket from being destroyed while packet is being sent on another
> CPU.

I see no requirement of holding a reference on the vxlan UDP socket in
transmit path.

At the time vxlan_set_owner() is called, nothing requires access to the
socket anymore. If you believe its needed, then its already too late.

Sathya, your patch is a step in the right direction, but the skb_clone()
thing should be done a bit differently.

The trick would be to avoid the skb_clone() for the last
vxlan_xmit_one() call.



--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Stephen Hemminger Dec. 24, 2013, 11:05 p.m. UTC | #1
On Tue, 24 Dec 2013 10:39:06 -0800
Eric Dumazet <eric.dumazet@gmail.com> wrote:

> On Mon, 2013-12-23 at 11:28 -0800, Stephen Hemminger wrote:
> 
> > The idea is good, but without the destructor there is nothing to keep
> > the UDP socket from being destroyed while packet is being sent on another
> > CPU.
> 
> I see no requirement of holding a reference on the vxlan UDP socket in
> transmit path.
> 
> At the time vxlan_set_owner() is called, nothing requires access to the
> socket anymore. If you believe its needed, then its already too late.
> 
> Sathya, your patch is a step in the right direction, but the skb_clone()
> thing should be done a bit differently.
> 
> The trick would be to avoid the skb_clone() for the last
> vxlan_xmit_one() call.
> 

That code was cloned from L2TP, I assumed that that it was required
to hold reference when calling UDP, since that is what socket layer does.


--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Miller Dec. 31, 2013, 6:56 p.m. UTC | #2
From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Tue, 24 Dec 2013 10:39:06 -0800

> On Mon, 2013-12-23 at 11:28 -0800, Stephen Hemminger wrote:
> 
>> The idea is good, but without the destructor there is nothing to keep
>> the UDP socket from being destroyed while packet is being sent on another
>> CPU.
> 
> I see no requirement of holding a reference on the vxlan UDP socket in
> transmit path.

I'm trying to figure out how leaving a dangling socket attached to
skb->sk, as in this original patch, can be OK.

If skb->sk is there, anyone can reference it, and meanwhile anyone can
destroy and free it.

That's Stephens' objection.

Are you saying that we have something that allows this to be valid?
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eric Dumazet Jan. 2, 2014, 5:56 a.m. UTC | #3
On Tue, 2013-12-31 at 13:56 -0500, David Miller wrote:
> From: Eric Dumazet <eric.dumazet@gmail.com>
> Date: Tue, 24 Dec 2013 10:39:06 -0800
> 
> > On Mon, 2013-12-23 at 11:28 -0800, Stephen Hemminger wrote:
> > 
> >> The idea is good, but without the destructor there is nothing to keep
> >> the UDP socket from being destroyed while packet is being sent on another
> >> CPU.
> > 
> > I see no requirement of holding a reference on the vxlan UDP socket in
> > transmit path.
> 
> I'm trying to figure out how leaving a dangling socket attached to
> skb->sk, as in this original patch, can be OK.

Sorry, I lost track here (vacation time...), what patch are you
referring to ?

> 
> If skb->sk is there, anyone can reference it, and meanwhile anyone can
> destroy and free it.
> 
> That's Stephens' objection.

Not really. Stephen told us he copied code from L2TP.
(But IPIP, GRE tunnels do not do that..._

He thought it was needed to hold a reference on vxlan socket, while it
is not needed for any valid reason. RCU locking is more than enough to
be able to build the encapsulation.

> 
> Are you saying that we have something that allows this to be valid?


Once we pass the tunnel, we can either :

1) Leave skb->sk set to the original socket sk1 (Say TCP or UDP
producer)

2) Assign skb->sk to the 'socket' used in vxlan (after orphaning and
releasing reference on socket sk1)

  Current vxlan code chose 2), but it makes no sense because :

We use skb->sk to 

A) control amount of bytes/packets queued on behalf a socket, but
current vxlan code does the skb->sk transfert without any limit/control
on vxlan socket sk_sndbuf.

B) security puposes (as selinux) or netfilter uses, and I do not think
anything is prepared to handle vxlan stacked case in this area.


If we chose 1), it makes more sense, because each producer will be
effectively limited by the proper sk->sk_sndbuf limit.

And a socket cannot be destroyed anyway as long as at least one skb is
in flight (with skb->sk set to this socket)

Really, I do not think vxlan should behave in a different way than other
tunnels (GRE, IPIP, SIT, ...), which chose 1) 



--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Miller Jan. 2, 2014, 6:36 a.m. UTC | #4
From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Wed, 01 Jan 2014 21:56:46 -0800

> On Tue, 2013-12-31 at 13:56 -0500, David Miller wrote:
>> From: Eric Dumazet <eric.dumazet@gmail.com>
>> Date: Tue, 24 Dec 2013 10:39:06 -0800
>> 
>> > On Mon, 2013-12-23 at 11:28 -0800, Stephen Hemminger wrote:
>> > 
>> >> The idea is good, but without the destructor there is nothing to keep
>> >> the UDP socket from being destroyed while packet is being sent on another
>> >> CPU.
>> > 
>> > I see no requirement of holding a reference on the vxlan UDP socket in
>> > transmit path.
>> 
>> I'm trying to figure out how leaving a dangling socket attached to
>> skb->sk, as in this original patch, can be OK.
> 
> Sorry, I lost track here (vacation time...), what patch are you
> referring to ?

http://patchwork.ozlabs.org/patch/304767/

> Once we pass the tunnel, we can either :
> 
> 1) Leave skb->sk set to the original socket sk1 (Say TCP or UDP
> producer)
> 
> 2) Assign skb->sk to the 'socket' used in vxlan (after orphaning and
> releasing reference on socket sk1)
 ...
> If we chose 1), it makes more sense, because each producer will be
> effectively limited by the proper sk->sk_sndbuf limit.
> 
> And a socket cannot be destroyed anyway as long as at least one skb is
> in flight (with skb->sk set to this socket)
> 
> Really, I do not think vxlan should behave in a different way than other
> tunnels (GRE, IPIP, SIT, ...), which chose 1) 

Ok, agreed.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index 249e01c5600c..3a1e2cee7c6e 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -1366,20 +1366,6 @@  static bool route_shortcircuit(struct net_device *dev, struct sk_buff *skb)
 	return false;
 }
 
-static void vxlan_sock_put(struct sk_buff *skb)
-{
-	sock_put(skb->sk);
-}
-
-/* On transmit, associate with the tunnel socket */
-static void vxlan_set_owner(struct sock *sk, struct sk_buff *skb)
-{
-	skb_orphan(skb);
-	sock_hold(sk);
-	skb->sk = sk;
-	skb->destructor = vxlan_sock_put;
-}
-
 /* Compute source port for outgoing packet
  *   first choice to use L4 flow hash since it will spread
  *     better and maybe available from hardware
@@ -1499,8 +1485,6 @@  static int vxlan6_xmit_skb(struct vxlan_sock *vs,
 	ip6h->daddr	  = *daddr;
 	ip6h->saddr	  = *saddr;
 
-	vxlan_set_owner(vs->sock->sk, skb);
-
 	err = handle_offloads(skb);
 	if (err)
 		return err;
@@ -1557,8 +1541,6 @@  int vxlan_xmit_skb(struct vxlan_sock *vs,
 	uh->len = htons(skb->len);
 	uh->check = 0;
 
-	vxlan_set_owner(vs->sock->sk, skb);
-
 	err = handle_offloads(skb);
 	if (err)
 		return err;
@@ -1770,7 +1752,7 @@  static netdev_tx_t vxlan_xmit(struct sk_buff *skb, struct net_device *dev)
 	struct vxlan_dev *vxlan = netdev_priv(dev);
 	struct ethhdr *eth;
 	bool did_rsc = false;
-	struct vxlan_rdst *rdst;
+	struct vxlan_rdst *rdst, *fdst = NULL;
 	struct vxlan_fdb *f;
 
 	skb_reset_mac_header(skb);
@@ -1812,7 +1794,7 @@  static netdev_tx_t vxlan_xmit(struct sk_buff *skb, struct net_device *dev)
 				vxlan_fdb_miss(vxlan, eth->h_dest);
 
 			dev->stats.tx_dropped++;
-			dev_kfree_skb(skb);
+			kfree_skb(skb);
 			return NETDEV_TX_OK;
 		}
 	}
@@ -1820,12 +1802,19 @@  static netdev_tx_t vxlan_xmit(struct sk_buff *skb, struct net_device *dev)
 	list_for_each_entry_rcu(rdst, &f->remotes, list) {
 		struct sk_buff *skb1;
 
+		if (!fdst) {
+			fdst = rdst;
+			continue;
+		}
 		skb1 = skb_clone(skb, GFP_ATOMIC);
 		if (skb1)
 			vxlan_xmit_one(skb1, dev, rdst, did_rsc);
 	}
 
-	dev_kfree_skb(skb);
+	if (fdst)
+		vxlan_xmit_one(skb, dev, fdst, did_rsc);
+	else
+		kfree_skb(skb);
 	return NETDEV_TX_OK;
 }