Message ID | 1285664747.2607.48.camel@edumazet-laptop |
---|---|
State | Accepted, archived |
Delegated to: | David Miller |
Headers | show |
NETIF_F_LLTX is marked as deprecated: include/linux/netdevice.h: #define NETIF_F_LLTX 4096 /* LockLess TX - deprecated. Please */ /* do not use LLTX in new drivers */ Is it right to use it? Regards, Nicolas Eric Dumazet wrote: > GRE tunnels can benefit from lockless xmits, using NETIF_F_LLTX > > Note: If tunnels are created with the "oseq" option, LLTX is not > enabled : > > Even using an atomic_t o_seq, we would increase chance for packets being > out of order at receiver. > > Bench on a 16 cpus machine (dual E5540 cpus), 16 threads sending > 10000000 UDP frames via one gre tunnel (size:200 bytes per frame) > > Before patch : > real 3m0.094s > user 0m9.365s > sys 47m50.103s > > After patch: > real 0m29.756s > user 0m11.097s > sys 7m33.012s > > Last problem to solve is the contention on dst : > > > 38660.00 21.4% __ip_route_output_key vmlinux > 20786.00 11.5% dst_release vmlinux > 14191.00 7.8% __xfrm_lookup vmlinux > 12410.00 6.9% ip_finish_output vmlinux > 4540.00 2.5% ip_push_pending_frames vmlinux > 4427.00 2.4% ip_append_data vmlinux > 4265.00 2.4% __alloc_skb vmlinux > 4140.00 2.3% __ip_local_out vmlinux > 3991.00 2.2% dev_queue_xmit vmlinux > > Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> > --- > net/ipv4/ip_gre.c | 4 ++++ > 1 files changed, 4 insertions(+) > > diff --git a/net/ipv4/ip_gre.c b/net/ipv4/ip_gre.c > index a1b5d5e..035db63 100644 > --- a/net/ipv4/ip_gre.c > +++ b/net/ipv4/ip_gre.c > @@ -1557,6 +1557,10 @@ static int ipgre_newlink(struct net *src_net, struct net_device *dev, struct nla > if (!tb[IFLA_MTU]) > dev->mtu = mtu; > > + /* Can use a lockless transmit, unless we generate output sequences */ > + if (!(nt->parms.o_flags & GRE_SEQ)) > + dev->features |= NETIF_F_LLTX; > + > err = register_netdevice(dev); > if (err) > goto out; > > > -- > To unsubscribe from this list: send the line "unsubscribe netdev" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Le mercredi 29 septembre 2010 à 10:10 +0200, Nicolas Dichtel a écrit : > NETIF_F_LLTX is marked as deprecated: > > include/linux/netdevice.h: > #define NETIF_F_LLTX 4096 /* LockLess TX - deprecated. > Please */ > /* do not use LLTX in new > drivers */ > > Is it right to use it? > In this particular case (and drivers/net/loopback.c), yes. This is the only way to avoid the locking in core network (net/core/dev.c) What is deprecated is to assert NETIF_F_LLTX and yet, use a lock in the ndo_xmit() driver method. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
From: Eric Dumazet <eric.dumazet@gmail.com> Date: Wed, 29 Sep 2010 10:18:08 +0200 > What is deprecated is to assert NETIF_F_LLTX and yet, use a lock in the > ndo_xmit() driver method. Also we've discussed recently to do away with the NETIF_F_LLTX flag entirely for queue-less devices such as loopback and sw tunnels. Then all will be left are the truly "deprecated" cases Eric mentions. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, Sep 28, 2010 at 2:05 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote: > GRE tunnels can benefit from lockless xmits, using NETIF_F_LLTX > > Note: If tunnels are created with the "oseq" option, LLTX is not > enabled : > > Even using an atomic_t o_seq, we would increase chance for packets being > out of order at receiver. > > Bench on a 16 cpus machine (dual E5540 cpus), 16 threads sending > 10000000 UDP frames via one gre tunnel (size:200 bytes per frame) > > Before patch : > real 3m0.094s > user 0m9.365s > sys 47m50.103s > > After patch: > real 0m29.756s > user 0m11.097s > sys 7m33.012s > > Last problem to solve is the contention on dst : > > > 38660.00 21.4% __ip_route_output_key vmlinux > 20786.00 11.5% dst_release vmlinux > 14191.00 7.8% __xfrm_lookup vmlinux > 12410.00 6.9% ip_finish_output vmlinux > 4540.00 2.5% ip_push_pending_frames vmlinux > 4427.00 2.4% ip_append_data vmlinux > 4265.00 2.4% __alloc_skb vmlinux > 4140.00 2.3% __ip_local_out vmlinux > 3991.00 2.2% dev_queue_xmit vmlinux > > Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> The tx lock has another use here: to break local loops. With this change, a misconfigured tunnel can bring down the machine with a stack overflow. There are clearly other ways to fix this that don't require a lock that restricts parallelism, such as a loop counter, but that's the way it is now. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Le mercredi 29 septembre 2010 à 10:33 -0700, Jesse Gross a écrit : > The tx lock has another use here: to break local loops. With this > change, a misconfigured tunnel can bring down the machine with a stack > overflow. There are clearly other ways to fix this that don't require > a lock that restricts parallelism, such as a loop counter, but that's > the way it is now. Thats a very good point ! We could use a loop counter in the skb, but this use a bit of ram, or percpu counters in tunnel drivers, to avoid a given level of recursion. /* this should be shared by all tunnels */ DEFINE_PER_CPU(tunnel_xmit_count); tunnel_xmit() { if (__this_cpu_read(tunnel_xmit_count) >= LIMIT) goto tx_error; __this_cpu_inc(tunnel_xmit_count); .... __IPTUNNEL_XMIT(tstats, &dev->stats); __this_cpu_dec(tunnel_xmit_count), return NETDEV_TX_OK; } -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, Sep 29, 2010 at 11:43 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote: > Le mercredi 29 septembre 2010 à 10:33 -0700, Jesse Gross a écrit : > >> The tx lock has another use here: to break local loops. With this >> change, a misconfigured tunnel can bring down the machine with a stack >> overflow. There are clearly other ways to fix this that don't require >> a lock that restricts parallelism, such as a loop counter, but that's >> the way it is now. > > Thats a very good point ! > > We could use a loop counter in the skb, but this use a bit of ram, > or percpu counters in tunnel drivers, to avoid a given level of > recursion. I agree, a percpu loop counter is the way to go. I implemented something nearly identical to deal with this problem in Open vSwitch. There are actually a number of optimizations in the Open vSwitch tunneling stack that you may be interested in taking a look at as well (for example, it has had this sort of lockless transmit for a while): http://openvswitch.org/cgi-bin/gitweb.cgi?p=openvswitch;a=blob;f=datapath/tunnel.c;hb=HEAD The plan is to upstream all of the kernel code (or at least offer it), just trying to get the userspace interfaces settled down first. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/net/ipv4/ip_gre.c b/net/ipv4/ip_gre.c index a1b5d5e..035db63 100644 --- a/net/ipv4/ip_gre.c +++ b/net/ipv4/ip_gre.c @@ -1557,6 +1557,10 @@ static int ipgre_newlink(struct net *src_net, struct net_device *dev, struct nla if (!tb[IFLA_MTU]) dev->mtu = mtu; + /* Can use a lockless transmit, unless we generate output sequences */ + if (!(nt->parms.o_flags & GRE_SEQ)) + dev->features |= NETIF_F_LLTX; + err = register_netdevice(dev); if (err) goto out;
GRE tunnels can benefit from lockless xmits, using NETIF_F_LLTX Note: If tunnels are created with the "oseq" option, LLTX is not enabled : Even using an atomic_t o_seq, we would increase chance for packets being out of order at receiver. Bench on a 16 cpus machine (dual E5540 cpus), 16 threads sending 10000000 UDP frames via one gre tunnel (size:200 bytes per frame) Before patch : real 3m0.094s user 0m9.365s sys 47m50.103s After patch: real 0m29.756s user 0m11.097s sys 7m33.012s Last problem to solve is the contention on dst : 38660.00 21.4% __ip_route_output_key vmlinux 20786.00 11.5% dst_release vmlinux 14191.00 7.8% __xfrm_lookup vmlinux 12410.00 6.9% ip_finish_output vmlinux 4540.00 2.5% ip_push_pending_frames vmlinux 4427.00 2.4% ip_append_data vmlinux 4265.00 2.4% __alloc_skb vmlinux 4140.00 2.3% __ip_local_out vmlinux 3991.00 2.2% dev_queue_xmit vmlinux Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> --- net/ipv4/ip_gre.c | 4 ++++ 1 files changed, 4 insertions(+) -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html