Message ID | 1387910346.12212.15.camel@edumazet-glaptop2.roam.corp.google.com |
---|---|
State | RFC, archived |
Delegated to: | David Miller |
Headers | show |
On Tue, 24 Dec 2013 10:39:06 -0800 Eric Dumazet <eric.dumazet@gmail.com> wrote: > On Mon, 2013-12-23 at 11:28 -0800, Stephen Hemminger wrote: > > > The idea is good, but without the destructor there is nothing to keep > > the UDP socket from being destroyed while packet is being sent on another > > CPU. > > I see no requirement of holding a reference on the vxlan UDP socket in > transmit path. > > At the time vxlan_set_owner() is called, nothing requires access to the > socket anymore. If you believe its needed, then its already too late. > > Sathya, your patch is a step in the right direction, but the skb_clone() > thing should be done a bit differently. > > The trick would be to avoid the skb_clone() for the last > vxlan_xmit_one() call. > That code was cloned from L2TP, I assumed that that it was required to hold reference when calling UDP, since that is what socket layer does. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
From: Eric Dumazet <eric.dumazet@gmail.com> Date: Tue, 24 Dec 2013 10:39:06 -0800 > On Mon, 2013-12-23 at 11:28 -0800, Stephen Hemminger wrote: > >> The idea is good, but without the destructor there is nothing to keep >> the UDP socket from being destroyed while packet is being sent on another >> CPU. > > I see no requirement of holding a reference on the vxlan UDP socket in > transmit path. I'm trying to figure out how leaving a dangling socket attached to skb->sk, as in this original patch, can be OK. If skb->sk is there, anyone can reference it, and meanwhile anyone can destroy and free it. That's Stephens' objection. Are you saying that we have something that allows this to be valid? -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, 2013-12-31 at 13:56 -0500, David Miller wrote: > From: Eric Dumazet <eric.dumazet@gmail.com> > Date: Tue, 24 Dec 2013 10:39:06 -0800 > > > On Mon, 2013-12-23 at 11:28 -0800, Stephen Hemminger wrote: > > > >> The idea is good, but without the destructor there is nothing to keep > >> the UDP socket from being destroyed while packet is being sent on another > >> CPU. > > > > I see no requirement of holding a reference on the vxlan UDP socket in > > transmit path. > > I'm trying to figure out how leaving a dangling socket attached to > skb->sk, as in this original patch, can be OK. Sorry, I lost track here (vacation time...), what patch are you referring to ? > > If skb->sk is there, anyone can reference it, and meanwhile anyone can > destroy and free it. > > That's Stephens' objection. Not really. Stephen told us he copied code from L2TP. (But IPIP, GRE tunnels do not do that..._ He thought it was needed to hold a reference on vxlan socket, while it is not needed for any valid reason. RCU locking is more than enough to be able to build the encapsulation. > > Are you saying that we have something that allows this to be valid? Once we pass the tunnel, we can either : 1) Leave skb->sk set to the original socket sk1 (Say TCP or UDP producer) 2) Assign skb->sk to the 'socket' used in vxlan (after orphaning and releasing reference on socket sk1) Current vxlan code chose 2), but it makes no sense because : We use skb->sk to A) control amount of bytes/packets queued on behalf a socket, but current vxlan code does the skb->sk transfert without any limit/control on vxlan socket sk_sndbuf. B) security puposes (as selinux) or netfilter uses, and I do not think anything is prepared to handle vxlan stacked case in this area. If we chose 1), it makes more sense, because each producer will be effectively limited by the proper sk->sk_sndbuf limit. And a socket cannot be destroyed anyway as long as at least one skb is in flight (with skb->sk set to this socket) Really, I do not think vxlan should behave in a different way than other tunnels (GRE, IPIP, SIT, ...), which chose 1) -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
From: Eric Dumazet <eric.dumazet@gmail.com> Date: Wed, 01 Jan 2014 21:56:46 -0800 > On Tue, 2013-12-31 at 13:56 -0500, David Miller wrote: >> From: Eric Dumazet <eric.dumazet@gmail.com> >> Date: Tue, 24 Dec 2013 10:39:06 -0800 >> >> > On Mon, 2013-12-23 at 11:28 -0800, Stephen Hemminger wrote: >> > >> >> The idea is good, but without the destructor there is nothing to keep >> >> the UDP socket from being destroyed while packet is being sent on another >> >> CPU. >> > >> > I see no requirement of holding a reference on the vxlan UDP socket in >> > transmit path. >> >> I'm trying to figure out how leaving a dangling socket attached to >> skb->sk, as in this original patch, can be OK. > > Sorry, I lost track here (vacation time...), what patch are you > referring to ? http://patchwork.ozlabs.org/patch/304767/ > Once we pass the tunnel, we can either : > > 1) Leave skb->sk set to the original socket sk1 (Say TCP or UDP > producer) > > 2) Assign skb->sk to the 'socket' used in vxlan (after orphaning and > releasing reference on socket sk1) ... > If we chose 1), it makes more sense, because each producer will be > effectively limited by the proper sk->sk_sndbuf limit. > > And a socket cannot be destroyed anyway as long as at least one skb is > in flight (with skb->sk set to this socket) > > Really, I do not think vxlan should behave in a different way than other > tunnels (GRE, IPIP, SIT, ...), which chose 1) Ok, agreed. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c index 249e01c5600c..3a1e2cee7c6e 100644 --- a/drivers/net/vxlan.c +++ b/drivers/net/vxlan.c @@ -1366,20 +1366,6 @@ static bool route_shortcircuit(struct net_device *dev, struct sk_buff *skb) return false; } -static void vxlan_sock_put(struct sk_buff *skb) -{ - sock_put(skb->sk); -} - -/* On transmit, associate with the tunnel socket */ -static void vxlan_set_owner(struct sock *sk, struct sk_buff *skb) -{ - skb_orphan(skb); - sock_hold(sk); - skb->sk = sk; - skb->destructor = vxlan_sock_put; -} - /* Compute source port for outgoing packet * first choice to use L4 flow hash since it will spread * better and maybe available from hardware @@ -1499,8 +1485,6 @@ static int vxlan6_xmit_skb(struct vxlan_sock *vs, ip6h->daddr = *daddr; ip6h->saddr = *saddr; - vxlan_set_owner(vs->sock->sk, skb); - err = handle_offloads(skb); if (err) return err; @@ -1557,8 +1541,6 @@ int vxlan_xmit_skb(struct vxlan_sock *vs, uh->len = htons(skb->len); uh->check = 0; - vxlan_set_owner(vs->sock->sk, skb); - err = handle_offloads(skb); if (err) return err; @@ -1770,7 +1752,7 @@ static netdev_tx_t vxlan_xmit(struct sk_buff *skb, struct net_device *dev) struct vxlan_dev *vxlan = netdev_priv(dev); struct ethhdr *eth; bool did_rsc = false; - struct vxlan_rdst *rdst; + struct vxlan_rdst *rdst, *fdst = NULL; struct vxlan_fdb *f; skb_reset_mac_header(skb); @@ -1812,7 +1794,7 @@ static netdev_tx_t vxlan_xmit(struct sk_buff *skb, struct net_device *dev) vxlan_fdb_miss(vxlan, eth->h_dest); dev->stats.tx_dropped++; - dev_kfree_skb(skb); + kfree_skb(skb); return NETDEV_TX_OK; } } @@ -1820,12 +1802,19 @@ static netdev_tx_t vxlan_xmit(struct sk_buff *skb, struct net_device *dev) list_for_each_entry_rcu(rdst, &f->remotes, list) { struct sk_buff *skb1; + if (!fdst) { + fdst = rdst; + continue; + } skb1 = skb_clone(skb, GFP_ATOMIC); if (skb1) vxlan_xmit_one(skb1, dev, rdst, did_rsc); } - dev_kfree_skb(skb); + if (fdst) + vxlan_xmit_one(skb, dev, fdst, did_rsc); + else + kfree_skb(skb); return NETDEV_TX_OK; }