
[net-next,1/3] net: make default tx_queue_len configurable

Message ID 1438203103-27013-2-git-send-email-phil@nwl.cc
State Changes Requested, archived
Delegated to: David Miller

Commit Message

Phil Sutter July 29, 2015, 8:51 p.m. UTC
Signed-off-by: Phil Sutter <phil@nwl.cc>
---
 net/Kconfig        | 12 ++++++++++++
 net/ethernet/eth.c |  2 +-
 2 files changed, 13 insertions(+), 1 deletion(-)

Comments

Florian Westphal July 29, 2015, 9:06 p.m. UTC | #1
Phil Sutter <phil@nwl.cc> wrote:
> Signed-off-by: Phil Sutter <phil@nwl.cc>
> ---
>  net/Kconfig        | 12 ++++++++++++
>  net/ethernet/eth.c |  2 +-
>  2 files changed, 13 insertions(+), 1 deletion(-)
> 
> diff --git a/net/Kconfig b/net/Kconfig
> index 7021c1b..21c164f 100644
> --- a/net/Kconfig
> +++ b/net/Kconfig
> @@ -48,6 +48,18 @@ config COMPAT_NETLINK_MESSAGES
>  config NET_INGRESS
>  	bool
>  
> +config DEFAULT_TX_QUEUE_LEN
> +	prompt "Default TX queue length (in packets)" if EXPERT
> +	int
> +	default 1000	# Ethernet wants good queues
> +	help
> +	  Set the default value of tx_queue_len for newly created network
> +	  interfaces. It is used by queueing disciplines to determine how many
> +	  packets to keep in backlog before starting to drop new ones.
> +
> +	  The default value of 1000 packets has been in place for a very long
> +	  time and, in combination with GSO, is way too big.
> +

I can't see how this could be used in a meaningful way.

No distro is going to touch this.

I don't think sysctl value would help either.
Phil Sutter July 29, 2015, 9:34 p.m. UTC | #2
On Wed, Jul 29, 2015 at 11:06:18PM +0200, Florian Westphal wrote:
> Phil Sutter <phil@nwl.cc> wrote:
[...]
> > +config DEFAULT_TX_QUEUE_LEN
> > +	prompt "Default TX queue length (in packets)" if EXPERT
> > +	int
> > +	default 1000	# Ethernet wants good queues
> > +	help
> > +	  Set the default value of tx_queue_len for newly created network
> > +	  interfaces. It is used by queueing disciplines to determine how many
> > +	  packets to keep in backlog before starting to drop new ones.
> > +
> > +	  The default value of 1000 packets has been in place for a very long
> > +	  time and, in combination with GSO, is way too big.
> > +
> 
> I can't see how this could be used in a meaningful way.
> 
> No distro is going to touch this.
> 
> I don't think sysctl value would help either.

I just didn't want to introduce yet another magic value assignment. It's
merely a #define with a little flexibility, with a subtle note attached
that the default should be changed.

Cheers, Phil
David Miller July 29, 2015, 9:37 p.m. UTC | #3
From: Phil Sutter <phil@nwl.cc>
Date: Wed, 29 Jul 2015 23:34:28 +0200

> On Wed, Jul 29, 2015 at 11:06:18PM +0200, Florian Westphal wrote:
>> Phil Sutter <phil@nwl.cc> wrote:
> [...]
>> > +config DEFAULT_TX_QUEUE_LEN
>> > +	prompt "Default TX queue length (in packets)" if EXPERT
>> > +	int
>> > +	default 1000	# Ethernet wants good queues
>> > +	help
>> > +	  Set the default value of tx_queue_len for newly created network
>> > +	  interfaces. It is used by queueing disciplines to determine how many
>> > +	  packets to keep in backlog before starting to drop new ones.
>> > +
>> > +	  The default value of 1000 packets has been in place for a very long
>> > +	  time and, in combination with GSO, is way too big.
>> > +
>> 
>> I can't see how this could be used in a meaningful way.
>> 
>> No distro is going to touch this.
>> 
>> I don't think sysctl value would help either.
> 
> I just didn't want to introduce yet another magic value assignment. It's
> merely a #define with a little flexibility, with a subtle note attached
> that the default should be changed.

Like others have mentioned, fix the _REAL_ issue.

Which is that there are devices (virtual or whatever) which don't want
a qdisc attached no matter what.  Flag those devices as such and
adjust the qdisc attachment logic to check that new flag.

Anything is better than hacking the queue len.
Jesper Dangaard Brouer Aug. 11, 2015, 3:48 p.m. UTC | #4
On Wed, 29 Jul 2015 14:37:31 -0700 (PDT) David Miller <davem@davemloft.net> wrote:
[...]
> Which is that there are devices (virtual or whatever) which don't want
> a qdisc attached no matter what.  Flag those devices as such and
> adjust the qdisc attachment logic to check that new flag.

I agree with the approach DaveM is suggesting.

But virtual devices must support getting a qdisc attached.  I know of
many companies depending on this behavior.  Sometimes people just get
hit by these "strange" zero-length issues when they happen to use
pfifo_fast as a leaf node.


> Anything is better than hacking the queue len.

The whole problem comes from the double meaning of the queue len, i.e.
that the value 0 has special meaning, but only while assigning the
default qdisc.  And pfifo_fast will use a queue len of zero if assigned.

(proposed solution:)

As DaveM also suggested, I would likely use a device flag to indicate
the device does not require any qdisc, and not assign any qdisc
(actually "noqueue") in case the default qdisc is chosen for this
device.

This should solve the problem for veth. And then we should clean up all
the virtual devices, adding this flag and changing dev->tx_queue_len to
the default value (i.e. removing the code that sets it to zero).
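
As an illustration (not from any posted patch or email in this thread), the
driver-side half of this proposal could look roughly like the following for
veth. This is a sketch only: the flag name IFF_NO_QUEUE is taken from Phil's
plan further down in the thread, and the body of veth_setup() is abridged.

#include <linux/etherdevice.h>
#include <linux/netdevice.h>

/* Hypothetical sketch: veth opts out of a default qdisc via a priv_flag
 * instead of forcing tx_queue_len to zero. */
static void veth_setup(struct net_device *dev)
{
	ether_setup(dev);	/* tx_queue_len keeps the normal Ethernet default */

	/* Replaces the old "dev->tx_queue_len = 0;" convention: the flag says
	 * "no default qdisc wanted" without overloading the queue length. */
	dev->priv_flags |= IFF_NO_QUEUE;

	/* ... the rest of veth_setup() stays unchanged ... */
}

Keeping tx_queue_len at its Ethernet default also means that a qdisc attached
later by hand (e.g. htb) still sees a sane queue length.
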
Phil Sutter Aug. 11, 2015, 4:23 p.m. UTC | #5
On Tue, Aug 11, 2015 at 05:48:07PM +0200, Jesper Dangaard Brouer wrote:
> 
> On Wed, 29 Jul 2015 14:37:31 -0700 (PDT) David Miller <davem@davemloft.net> wrote:
> [...]
> > Which is that there are devices (virtual or whatever) which don't want
> > a qdisc attached no matter what.  Flag those devices as such and
> > adjust the qdisc attachment logic to check that new flag.
> 
> I agree with the approach DaveM is suggesting.
> 
> But virtual devices must support getting a qdisc attached.  I know of
> many companies depending on this behavior.  Sometimes people just get
> hit by these "strange" zero-length issues when they happen to use
> pfifo_fast as a leaf node.

That was just how I (purposely mis-)interpreted David's suggestion.

> > Anything is better than hacking the queue len.
> 
> The whole problem comes from the double meaning of the queue len, i.e.
> that the value 0 has special meaning, but only while assigning the
> default qdisc.  And pfifo_fast will use a queue len of zero if assigned.
> 
> (proposed solution:)
> 
> As DaveM also suggested, I would likely use a device flag to indicate
> the device does not require any qdisc, and not assign any qdisc
> (actually "noqueue") in case the default qdisc is chosen for this
> device.
> 
> This should solve the problem for veth. And then we should clean up all
> the virtual devices, adding this flag and changing dev->tx_queue_len to
> the default value (i.e. removing the code that sets it to zero).

I have an unfinished solution in the oven, but am being kept busy with
other things for now. The action plan is as follows:

1) Introduce IFF_NO_QUEUE net_device->priv_flag.
2) Have attach_default_qdiscs() and attach_one_default_qdisc() treat
   IFF_NO_QUEUE as alternative to tx_queue_len == 0.
3) Add warning to register_netdevice() if tx_queue_len == 0.
4) Change virtual NIC drivers to set IFF_NO_QUEUE and leave tx_queue_len
   alone.
5) Eventually drop all special handling for tx_queue_len == 0.

I am currently somewhere in 2) and need to implement 4) for veth as a PoC
to check whether 2) suffices in all situations we want. Not sure if 3) is
desirable at all, or whether there are valid cases for a TX queue length
of literally zero.

Cheers, Phil
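
As an illustration of step 2 of the plan above (again only a sketch, not an
actual patch): attach_one_default_qdisc() in net/sched/sch_generic.c could
treat the proposed IFF_NO_QUEUE flag as an alternative to tx_queue_len == 0
roughly as follows. The surrounding function is modelled on the code of that
time; the details are assumptions.

static void attach_one_default_qdisc(struct net_device *dev,
				     struct netdev_queue *dev_queue,
				     void *_unused)
{
	struct Qdisc *qdisc = &noqueue_qdisc;

	/* The legacy "tx_queue_len == 0" convention and the proposed
	 * IFF_NO_QUEUE flag both mean: attach no real queueing discipline. */
	if (dev->tx_queue_len && !(dev->priv_flags & IFF_NO_QUEUE)) {
		qdisc = qdisc_create_dflt(dev_queue, default_qdisc_ops,
					  TC_H_ROOT);
		if (!qdisc) {
			netdev_info(dev, "activation failed\n");
			return;
		}
		if (!netif_is_multiqueue(dev))
			qdisc->flags |= TCQ_F_ONETXQUEUE;
	}
	dev_queue->qdisc_sleeping = qdisc;
}

Once step 4 has converted the drivers and step 5 drops the special handling,
the tx_queue_len check in this sketch would go away and only the flag would
remain.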
Alexei Starovoitov Aug. 12, 2015, 1:13 a.m. UTC | #6
On Tue, Aug 11, 2015 at 06:23:35PM +0200, Phil Sutter wrote:
> 
> I have an unfinished solution in the oven, but am being kept busy with
> other things for now. The action plan is as follows:
> 
> 1) Introduce IFF_NO_QUEUE net_device->priv_flag.
> 2) Have attach_default_qdiscs() and attach_one_default_qdisc() treat
>    IFF_NO_QUEUE as alternative to tx_queue_len == 0.
> 3) Add warning to register_netdevice() if tx_queue_len == 0.
> 4) Change virtual NIC drivers to set IFF_NO_QUEUE and leave tx_queue_len
>    alone.
> 5) Eventually drop all special handling for tx_queue_len == 0.
> 
> I am currently somewhere in 2) and need to implement 4) for veth as a PoC
> to check whether 2) suffices in all situations we want. Not sure if 3) is
> desirable at all, or whether there are valid cases for a TX queue length
> of literally zero.

Sounds like you want to change the default qdisc from pfifo_fast to noqueue
for veth, right?
In general 'changing the default' may be an acceptable thing, but then
it needs to be strongly justified. How much performance does it bring?
Also, why introduce the flag? Why not just add 'tx_queue_len = 0;'
to veth_setup() like a whole bunch of devices do?

Eric Dumazet Aug. 12, 2015, 2:55 p.m. UTC | #7
On Tue, 2015-08-11 at 18:13 -0700, Alexei Starovoitov wrote:
> Also, why introduce the flag? Why not just add 'tx_queue_len = 0;'
> to veth_setup() like a whole bunch of devices do?

Sigh.

Because some people install htb or pfifo on veth, leaving tx_queue_len
unchanged.

If you install htb while tx_queue_len is 0, the pfifo created on an htb
class can only queue one packet, because fifo_init() then falls back to a
limit of 1:


static int fifo_init(struct Qdisc *sch, struct nlattr *opt)
{
        bool bypass;
        bool is_bfifo = sch->ops == &bfifo_qdisc_ops;

        if (opt == NULL) {
                u32 limit = qdisc_dev(sch)->tx_queue_len ? : 1;

                if (is_bfifo)
                        limit *= psched_mtu(qdisc_dev(sch));

                sch->limit = limit;
        }
        /* ... rest of fifo_init() not shown ... */

Changing veth txqueuelen is too late.


Phil Sutter Aug. 13, 2015, 1:13 a.m. UTC | #8
On Tue, Aug 11, 2015 at 06:13:49PM -0700, Alexei Starovoitov wrote:
> On Tue, Aug 11, 2015 at 06:23:35PM +0200, Phil Sutter wrote:
> > 
> > I have an unfinished solution in the oven, but am being kept busy with
> > other things for now. The action plan is as follows:
> > 
> > 1) Introduce IFF_NO_QUEUE net_device->priv_flag.
> > 2) Have attach_default_qdiscs() and attach_one_default_qdisc() treat
> >    IFF_NO_QUEUE as alternative to tx_queue_len == 0.
> > 3) Add warning to register_netdevice() if tx_queue_len == 0.
> > 4) Change virtual NIC drivers to set IFF_NO_QUEUE and leave tx_queue_len
> >    alone.
> > 5) Eventually drop all special handling for tx_queue_len == 0.
> > 
> > I am currently somewhere in 2) and need to implement 4) for veth as a PoC
> > to check whether 2) suffices in all situations we want. Not sure if 3) is
> > desirable at all, or whether there are valid cases for a TX queue length
> > of literally zero.
> 
> Sounds like you want to change the default qdisc from pfifo_fast to noqueue
> for veth, right?
> In general 'changing the default' may be an acceptable thing, but then
> it needs to be strongly justified. How much performance does it bring?
> Also, why introduce the flag? Why not just add 'tx_queue_len = 0;'
> to veth_setup() like a whole bunch of devices do?

In a quick test on my local VM with veth and netperf (netserver and veth
peer in different netns), I see an increase of about 5% in throughput
when using noqueue instead of the default pfifo_fast.

Cheers, Phil
Jesper Dangaard Brouer Aug. 13, 2015, 1:10 p.m. UTC | #9
On Thu, 13 Aug 2015 03:13:40 +0200 Phil Sutter <phil@nwl.cc> wrote:

> On Tue, Aug 11, 2015 at 06:13:49PM -0700, Alexei Starovoitov wrote:

> > In general 'changing the default' may be an acceptable thing, but then
> > it needs to be strongly justified. How much performance does it bring?
> 
> > In a quick test on my local VM with veth and netperf (netserver and veth
> > peer in different netns), I see an increase of about 5% in throughput
> > when using noqueue instead of the default pfifo_fast.

Good that you can show a 5% improvement with a single netperf flow.  We
are saving approx. 6 atomic operations by avoiding the qdisc code path.

This fixes a scalability issue with veth. Thus, the real performance
boost will happen with multiple flows and multiple CPU cores in
action.  You can try with a multi-core VM and use super_netperf.

https://github.com/borkmann/stuff/blob/master/super_netperf
Phil Sutter Aug. 13, 2015, 3:06 p.m. UTC | #10
On Thu, Aug 13, 2015 at 03:10:33PM +0200, Jesper Dangaard Brouer wrote:
> 
> On Thu, 13 Aug 2015 03:13:40 +0200 Phil Sutter <phil@nwl.cc> wrote:
> 
> > On Tue, Aug 11, 2015 at 06:13:49PM -0700, Alexei Starovoitov wrote:
> 
> > > In general 'changing the default' may be an acceptable thing, but then
> > > it needs to be strongly justified. How much performance does it bring?
> > 
> > In a quick test on my local VM with veth and netperf (netserver and veth
> > peer in different netns), I see an increase of about 5% in throughput
> > when using noqueue instead of the default pfifo_fast.
> 
> Good that you can show a 5% improvement with a single netperf flow.  We
> are saving approx. 6 atomic operations by avoiding the qdisc code path.
>
> This fixes a scalability issue with veth. Thus, the real performance
> boost will happen with multiple flows and multiple CPU cores in
> action.  You can try with a multi-core VM and use super_netperf.
> 
> https://github.com/borkmann/stuff/blob/master/super_netperf

I actually used that on my VM as well, but the difference between a
single stream and ten streams in parallel was negligible. In order to
avoid skewing the results, I tested again on a physical system with four
cores, ran each benchmark ten times and averaged the results. This
showed an increase in throughput of about 35% with a single stream and
about 10% with ten streams in parallel. Not sure, though, why the
improvement is bigger in the first case if there really is a scalability
problem as you say.

Cheers, Phil

Patch

diff --git a/net/Kconfig b/net/Kconfig
index 7021c1b..21c164f 100644
--- a/net/Kconfig
+++ b/net/Kconfig
@@ -48,6 +48,18 @@ config COMPAT_NETLINK_MESSAGES
 config NET_INGRESS
 	bool
 
+config DEFAULT_TX_QUEUE_LEN
+	prompt "Default TX queue length (in packets)" if EXPERT
+	int
+	default 1000	# Ethernet wants good queues
+	help
+	  Set the default value of tx_queue_len for newly created network
+	  interfaces. It is used by queueing disciplines to determine how many
+	  packets to keep in backlog before starting to drop new ones.
+
+	  The default value of 1000 packets has been in place for a very long
+	  time and, in combination with GSO, is way too big.
+
 menu "Networking options"
 
 source "net/packet/Kconfig"
diff --git a/net/ethernet/eth.c b/net/ethernet/eth.c
index 77e0f0e..b778586 100644
--- a/net/ethernet/eth.c
+++ b/net/ethernet/eth.c
@@ -355,7 +355,7 @@ void ether_setup(struct net_device *dev)
 	dev->hard_header_len 	= ETH_HLEN;
 	dev->mtu		= ETH_DATA_LEN;
 	dev->addr_len		= ETH_ALEN;
-	dev->tx_queue_len	= 1000;	/* Ethernet wants good queues */
+	dev->tx_queue_len	= CONFIG_DEFAULT_TX_QUEUE_LEN;
 	dev->flags		= IFF_BROADCAST|IFF_MULTICAST;
 	dev->priv_flags		|= IFF_TX_SKB_SHARING;