
[net-next,1/3] mlx4_en: TX ring size default to 1024

Message ID: 4F46404D.10509@mellanox.co.il
State: Rejected, archived
Delegated to: David Miller

Commit Message

Yevgeny Petrilin Feb. 23, 2012, 1:34 p.m. UTC
Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
---
 drivers/net/ethernet/mellanox/mlx4/mlx4_en.h |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

Comments

David Miller Feb. 23, 2012, 7:45 p.m. UTC | #1
From: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Date: Thu, 23 Feb 2012 15:34:05 +0200

> Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>

This is ridiculous as a default, yes even for 10Gb.

Do you have any idea how high latency is going to be for packets
trying to get into the transmit queue if there are already a
thousand other frames in there?
Eric Dumazet Feb. 23, 2012, 7:54 p.m. UTC | #2
On Thursday, 23 February 2012 at 14:45 -0500, David Miller wrote:
> From: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
> Date: Thu, 23 Feb 2012 15:34:05 +0200
> 
> > Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
> 
> This is ridiculous as a default, yes even for 10Gb.
> 
> Do you have any idea how high latency is going to be for packets
> trying to get into the transmit queue if there are already a
> thousand other frames in there?

Before increasing TX ring sizes, a driver should implement BQL as a
prereq.
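
For reference, the BQL hooks boil down to three calls into the generic
netdev_tx_* helpers from <linux/netdevice.h>.  A minimal sketch of how a
driver would wire them up (the demo_* names below are invented purely for
illustration; they are not the mlx4_en structures):

#include <linux/netdevice.h>
#include <linux/skbuff.h>

struct demo_tx_ring {			/* hypothetical driver ring state */
	struct net_device	*dev;
	unsigned int		queue_index;
};

/* xmit path: account the bytes handed to the hardware ring */
static void demo_tx_sent(struct demo_tx_ring *ring, struct sk_buff *skb)
{
	struct netdev_queue *txq = netdev_get_tx_queue(ring->dev,
						       ring->queue_index);

	netdev_tx_sent_queue(txq, skb->len);
}

/* TX completion path: report the packets/bytes the hardware has sent */
static void demo_tx_completed(struct demo_tx_ring *ring,
			      unsigned int pkts, unsigned int bytes)
{
	struct netdev_queue *txq = netdev_get_tx_queue(ring->dev,
						       ring->queue_index);

	netdev_tx_completed_queue(txq, pkts, bytes);
}

/* ring (re)init: the BQL state must be reset along with the ring */
static void demo_tx_ring_reset(struct demo_tx_ring *ring)
{
	netdev_tx_reset_queue(netdev_get_tx_queue(ring->dev,
						  ring->queue_index));
}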


Yevgeny Petrilin Feb. 24, 2012, 7:35 p.m. UTC | #3
> > Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
> 
> This is ridiculous as a default, yes even for 10Gb.
> 
> Do you have any idea how high latency is going to be for packets
> trying to get into the transmit queue if there are already a
> thousand other frames in there?

On the other hand, with a smaller queue, 1000 in-flight packets would mean the queue gets stopped,
so how is that better?
Having a bigger TX ring helps deal with bursts of TX packets without the overhead of stopping and starting the queue.
It also makes sense to have the same size for the TX and RX queues, for example in the case of traffic being forwarded from TX to RX.

I did find a number of 10Gb vendors that use 1024 or more as the default TX queue size.

Thanks,
Yevgeny
David Miller Feb. 24, 2012, 8:14 p.m. UTC | #4
From: Yevgeny Petrilin <yevgenyp@mellanox.com>
Date: Fri, 24 Feb 2012 19:35:45 +0000

> On the other hand, with a smaller queue, 1000 in-flight packets
> would mean the queue gets stopped, so how is that better?

It's a thousand times better.

Because if a high priority packet gets queued up it won't have to wait
for 1024 packets to hit the wire before it can go out.

You need to support byte queue limits before you jack things up this
high; otherwise, high priority packets are absolutely pointless
and unusable.
Eric Dumazet Feb. 24, 2012, 8:17 p.m. UTC | #5
On Friday, 24 February 2012 at 19:35 +0000, Yevgeny Petrilin wrote:
> > > Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
> > 
> > This is ridiculous as a default, yes even for 10Gb.
> > 
> > Do you have any idea how high latency is going to be for packets
> > trying to get into the transmit queue if there are already a
> > thousand other frames in there?
> 
> On the other hand, with a smaller queue, 1000 in-flight packets would mean the queue gets stopped,
> so how is that better?

It's better because you can have any kind of Qdisc setup to properly
classify packets, with 100,000 total packets in the queues if you wish.

TX ring is a single FIFO, and that is just horrible, especially with big packets...

> Having a bigger TX ring helps deal with bursts of TX packets without the overhead of stopping and starting the queue.
> It also makes sense to have the same size for the TX and RX queues, for example in the case of traffic being forwarded from TX to RX.
> 

Really I doubt people using forwarding setups use default qdiscs.

Instead of bigger TX rings, they need appropriate Qdiscs.

> I did find a number of 10Gb vendors that use 1024 or more as the default TX queue size.

That's a shame.



Bill Fink Feb. 25, 2012, 6:51 a.m. UTC | #6
On Fri, 24 Feb 2012, Eric Dumazet wrote:

> On Friday, 24 February 2012 at 19:35 +0000, Yevgeny Petrilin wrote:
> > > > Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
> > > 
> > > This is ridiculous as a default, yes even for 10Gb.
> > > 
> > > Do you have any idea how high latency is going to be for packets
> > > trying to get into the transmit queue if there are already a
> > > thousand other frames in there?

For a GigE NIC with a typical ring size of 256, the serialization delay
for 256 1500 byte packets is:

	1500*8*256/10^9 = ~3.1 msec

For a 10-GigE NIC with a ring size of 1024, the serialization delay
for 1024 1500 byte packets is:

	1500*8*1024/10^10 = ~1.2 msec

So it's not immediately clear that a ring size of 1024 is unreasonable
for 10-GigE.
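
As a cross-check, the same arithmetic in a small throwaway C program
(purely illustrative, not driver code):

#include <stdio.h>

/* Time to drain a full TX ring, assuming one packet per slot. */
static double ring_drain_msec(unsigned int slots, unsigned int pkt_bytes,
			      double line_rate_bps)
{
	return (double)slots * pkt_bytes * 8.0 / line_rate_bps * 1000.0;
}

int main(void)
{
	/* GigE, 256 slots, 1500 byte packets: prints ~3.1 msec */
	printf("%.1f msec\n", ring_drain_msec(256, 1500, 1e9));
	/* 10-GigE, 1024 slots, 1500 byte packets: prints ~1.2 msec */
	printf("%.1f msec\n", ring_drain_msec(1024, 1500, 1e10));
	return 0;
}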

It probably boils down to whether the default setting should
be biased more toward low latency applications or high throughput
bulk data applications.  Finding the right happy medium is
best done by appropriate benchmark testing.  Of course,
anyone can change the settings to suit their purpose, so it's
really just a question of what's best for the "usual" case.

> > On the other hand, with a smaller queue, 1000 in-flight packets would mean the queue gets stopped,
> > so how is that better?
> 
> It's better because you can have any kind of Qdisc setup to properly
> classify packets, with 100,000 total packets in the queues if you wish.

Not everyone wants to deal with the convoluted, arcane, and poorly
documented qdisc machinery, especially with its current limitations
at 10-GigE (or faster) line rates.

> TX ring is a single FIFO, and that is just horrible, especially with big packets...
> 
> > Having a bigger TX ring helps deal with bursts of TX packets without the overhead of stopping and starting the queue.
> > It also makes sense to have the same size for the TX and RX queues, for example in the case of traffic being forwarded from TX to RX.
> 
> Really I doubt people using forwarding setups use default qdiscs.

I don't think it's necessarily that uncommon, such as a simple
10-GigE firewall setup.

> Instead of bigger TX rings, they need appropriate Qdiscs.
> 
> > I did find a number of 10Gb vendors that use 1024 or more as the default TX queue size.
> 
> That's a shame.

						-Bill
Eric Dumazet Feb. 25, 2012, 8:22 a.m. UTC | #7
On Saturday, 25 February 2012 at 01:51 -0500, Bill Fink wrote:

> For a GigE NIC with a typical ring size of 256, the serialization delay
> for 256 1500 byte packets is:
> 
> 	1500*8*256/10^9 = ~3.1 msec
> 
> For a 10-GigE NIC with a ring size of 1024, the serialization delay
> for 1024 1500 byte packets is:
> 
> 	1500*8*1024/10^10 = ~1.2 msec
> 
> So it's not immediately clear that a ring size of 1024 is unreasonable
> for 10-GigE.
> 

It's clear when you take into account packets of 64 Kbytes (TSO).

With current hardware and the current state of Linux software, you no
longer need very big NIC queues, since they bring known drawbacks.

That was true in the past, with UP systems and some timer handlers that
could hog the CPU for long periods of time, and when TSO didn't exist.

Hopefully all these cpu hogs are not running in softirq handlers
anymore.

If your workload needs more than ~500 slots, then something is wrong
elsewhere and should be fixed. No more workarounds please.

Now that BQL (Byte Queue Limits) is available, a driver should implement it
first, before considering big TX rings.  That's a 20-minute change.
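
To put a rough number on the TSO point: redoing the same serialization
arithmetic with 64 Kbyte TSO frames, and assuming (as an upper bound) that
each of the 1024 slots held one full TSO frame, gives

	65536*8*1024/10^10 = ~53.7 msec

of head-of-line delay behind a full ring.  In practice a TSO frame spans
several descriptors, so the real worst case is smaller, but it is tens of
milliseconds rather than ~1 msec.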




Patch

diff --git a/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h b/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
index d60335f..174dc38 100644
--- a/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
+++ b/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
@@ -110,7 +110,7 @@  enum {
 #define MLX4_EN_NUM_TX_RINGS		8
 #define MLX4_EN_NUM_PPP_RINGS		8
 #define MAX_TX_RINGS			(MLX4_EN_NUM_TX_RINGS + MLX4_EN_NUM_PPP_RINGS)
-#define MLX4_EN_DEF_TX_RING_SIZE	512
+#define MLX4_EN_DEF_TX_RING_SIZE	1024
 #define MLX4_EN_DEF_RX_RING_SIZE  	1024
 
 /* Target number of packets to coalesce with interrupt moderation */