
[net-next,1/3] mlx4_en: TX ring size default to 1024

Message ID: 4F46404D.10509@mellanox.co.il
State: Rejected, archived
Delegated to: David Miller

Commit Message

Yevgeny Petrilin Feb. 23, 2012, 1:34 p.m. UTC
Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
---
 drivers/net/ethernet/mellanox/mlx4/mlx4_en.h |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

Comments

David Miller Feb. 23, 2012, 7:45 p.m. UTC | #1
From: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Date: Thu, 23 Feb 2012 15:34:05 +0200

> Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>

This is ridiculous as a default, yes even for 10Gb.

Do you have any idea how high latency is going to be for packets
trying to get into the transmit queue if there are already a
thousand other frames in there?
Eric Dumazet Feb. 23, 2012, 7:54 p.m. UTC | #2
On Thursday, 23 February 2012 at 14:45 -0500, David Miller wrote:
> From: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
> Date: Thu, 23 Feb 2012 15:34:05 +0200
> 
> > Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
> 
> This is ridiculous as a default, yes even for 10Gb.
> 
> Do you have any idea how high latency is going to be for packets
> trying to get into the transmit queue if there are already a
> thousand other frames in there?

Before increasing TX ring sizes, a driver should implement BQL as a
prereq.
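
For reference, the BQL hooks boil down to three calls into the generic
netdev_tx_* helpers from <linux/netdevice.h>.  A minimal sketch of how a
driver would wire them up (the demo_* names below are invented purely for
illustration; they are not the mlx4_en structures):

#include <linux/netdevice.h>
#include <linux/skbuff.h>

struct demo_tx_ring {			/* hypothetical driver ring state */
	struct net_device	*dev;
	unsigned int		queue_index;
};

/* xmit path: account the bytes handed to the hardware ring */
static void demo_tx_sent(struct demo_tx_ring *ring, struct sk_buff *skb)
{
	struct netdev_queue *txq = netdev_get_tx_queue(ring->dev,
						       ring->queue_index);

	netdev_tx_sent_queue(txq, skb->len);
}

/* TX completion path: report the packets/bytes the hardware has sent */
static void demo_tx_completed(struct demo_tx_ring *ring,
			      unsigned int pkts, unsigned int bytes)
{
	struct netdev_queue *txq = netdev_get_tx_queue(ring->dev,
						       ring->queue_index);

	netdev_tx_completed_queue(txq, pkts, bytes);
}

/* ring (re)init: the BQL state must be reset along with the ring */
static void demo_tx_ring_reset(struct demo_tx_ring *ring)
{
	netdev_tx_reset_queue(netdev_get_tx_queue(ring->dev,
						  ring->queue_index));
}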


Yevgeny Petrilin Feb. 24, 2012, 7:35 p.m. UTC | #3
> > Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
> 
> This is ridiculous as a default, yes even for 10Gb.
> 
> Do you have any idea how high latency is going to be for packets
> trying to get into the transmit queue if there are already a
> thousand other frames in there?

On the other hand, with a smaller queue, 1000 in-flight packets would mean the queue gets stopped,
so how is that better?
Having a bigger TX ring helps deal with bursts of TX packets without the overhead of stopping and starting the queue.
It also makes sense to have the same size for the TX and RX queues, for example in the case of traffic being forwarded from TX to RX.

I did find a number of 10Gb vendors that use 1024 or more as the default TX queue size.

Thanks,
Yevgeny
David Miller Feb. 24, 2012, 8:14 p.m. UTC | #4
From: Yevgeny Petrilin <yevgenyp@mellanox.com>
Date: Fri, 24 Feb 2012 19:35:45 +0000

> On the other hand, with a smaller queue, 1000 in-flight packets
> would mean the queue gets stopped, so how is that better?

It's a thousand times better.

Because if a high priority packet gets queued up it won't have to wait
for 1024 packets to hit the wire before it can go out.

You need to support byte queue limits before you jack things up this
high; otherwise, high priority packets are absolutely pointless
and unusable.
Eric Dumazet Feb. 24, 2012, 8:17 p.m. UTC | #5
On Friday, 24 February 2012 at 19:35 +0000, Yevgeny Petrilin wrote:
> > > Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
> > 
> > This is ridiculous as a default, yes even for 10Gb.
> > 
> > Do you have any idea how high latency is going to be for packets
> > trying to get into the transmit queue if there are already a
> > thousand other frames in there?
> 
> On the other hand, with a smaller queue, 1000 in-flight packets would mean the queue gets stopped,
> so how is that better?

It's better because you can have any kind of Qdisc setup to properly
classify packets, with 100,000 total packets in the queues if you wish.

TX ring is a single FIFO, and that is just horrible, especially with big packets...

> Having a bigger TX ring helps deal with bursts of TX packets without the overhead of stopping and starting the queue.
> It also makes sense to have the same size for the TX and RX queues, for example in the case of traffic being forwarded from TX to RX.
> 

Really I doubt people using forwarding setups use default qdiscs.

Instead of bigger TX rings, they need appropriate Qdiscs.

> I did find a number of 10Gb vendors that use 1024 or more as the default TX queue size.

That's a shame.



Bill Fink Feb. 25, 2012, 6:51 a.m. UTC | #6
On Fri, 24 Feb 2012, Eric Dumazet wrote:

> On Friday, 24 February 2012 at 19:35 +0000, Yevgeny Petrilin wrote:
> > > > Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
> > > 
> > > This is ridiculous as a default, yes even for 10Gb.
> > > 
> > > Do you have any idea how high latency is going to be for packets
> > > trying to get into the transmit queue if there are already a
> > > thousand other frames in there?

For a GigE NIC with a typical ring size of 256, the serialization delay
for 256 1500 byte packets is:

	1500*8*256/10^9 = ~3.1 msec

For a 10-GigE NIC with a ring size of 1024, the serialization delay
for 1024 1500 byte packets is:

	1500*8*1024/10^10 = ~1.2 msec

So it's not immediately clear that a ring size of 1024 is unreasonable
for 10-GigE.
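
As a cross-check, the same arithmetic in a small throwaway C program
(purely illustrative, not driver code):

#include <stdio.h>

/* Time to drain a full TX ring, assuming one packet per slot. */
static double ring_drain_msec(unsigned int slots, unsigned int pkt_bytes,
			      double line_rate_bps)
{
	return (double)slots * pkt_bytes * 8.0 / line_rate_bps * 1000.0;
}

int main(void)
{
	/* GigE, 256 slots, 1500 byte packets: prints ~3.1 msec */
	printf("%.1f msec\n", ring_drain_msec(256, 1500, 1e9));
	/* 10-GigE, 1024 slots, 1500 byte packets: prints ~1.2 msec */
	printf("%.1f msec\n", ring_drain_msec(1024, 1500, 1e10));
	return 0;
}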

It probably boils down to whether the default setting should
be biased more toward low latency applications or high throughput
bulk data applications.  Finding the right happy medium is
best done by appropriate benchmark testing.  Of course,
anyone can change the settings to suit their purpose, so it's
really just a question of what's best for the "usual" case.

> > On the other hand, with a smaller queue, 1000 in-flight packets would mean the queue gets stopped,
> > so how is that better?
> 
> It's better because you can have any kind of Qdisc setup to properly
> classify packets, with 100,000 total packets in the queues if you wish.

Not everyone wants to deal with the convoluted, arcane, and poorly
documented qdisc machinery, especially with its current limitations
at 10-GigE (or faster) line rates.

> TX ring is a single FIFO, and that is just horrible, especially with big packets...
> 
> > Having a bigger TX ring helps deal with bursts of TX packets without the overhead of stopping and starting the queue.
> > It also makes sense to have the same size for the TX and RX queues, for example in the case of traffic being forwarded from TX to RX.
> 
> Really I doubt people using forwarding setups use default qdiscs.

I don't think it's necessarily that uncommon, such as a simple
10-GigE firewall setup.

> Instead of bigger TX rings, they need appropriate Qdiscs.
> 
> > I did find a number of 10Gb vendors that use 1024 or more as the default TX queue size.
> 
> That's a shame.

						-Bill
Eric Dumazet Feb. 25, 2012, 8:22 a.m. UTC | #7
On Saturday, 25 February 2012 at 01:51 -0500, Bill Fink wrote:

> For a GigE NIC with a typical ring size of 256, the serialization delay
> for 256 1500 byte packets is:
> 
> 	1500*8*256/10^9 = ~3.1 msec
> 
> For a 10-GigE NIC with a ring size of 1024, the serialization delay
> for 1024 1500 byte packets is:
> 
> 	1500*8*1024/10^10 = ~1.2 msec
> 
> So it's not immediately clear that a ring size of 1024 is unreasonable
> for 10-GigE.
> 

It's clear when you take into account packets of 64 Kbytes (TSO).

With current hardware and the current state of Linux software, you no
longer need very big NIC queues, since they bring known drawbacks.

That was true in the past, with UP systems and some timer handlers that
could hog the CPU for long periods of time, and when TSO didn't exist.

Hopefully all these cpu hogs are not running in softirq handlers
anymore.

If your workload needs more than ~500 slots, then something is wrong
elsewhere and should be fixed. No more workarounds please.

Now that BQL (Byte Queue Limits) is available, a driver should implement it
first, before considering big TX rings.  That's a 20-minute change.
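
To put a rough number on the TSO point: redoing the same serialization
arithmetic with 64 Kbyte TSO frames, and assuming (as an upper bound) that
each of the 1024 slots held one full TSO frame, gives

	65536*8*1024/10^10 = ~53.7 msec

of head-of-line delay behind a full ring.  In practice a TSO frame spans
several descriptors, so the real worst case is smaller, but it is tens of
milliseconds rather than ~1 msec.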




Patch

diff --git a/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h b/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
index d60335f..174dc38 100644
--- a/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
+++ b/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
@@ -110,7 +110,7 @@  enum {
 #define MLX4_EN_NUM_TX_RINGS		8
 #define MLX4_EN_NUM_PPP_RINGS		8
 #define MAX_TX_RINGS			(MLX4_EN_NUM_TX_RINGS + MLX4_EN_NUM_PPP_RINGS)
-#define MLX4_EN_DEF_TX_RING_SIZE	512
+#define MLX4_EN_DEF_TX_RING_SIZE	1024
 #define MLX4_EN_DEF_RX_RING_SIZE  	1024
 
 /* Target number of packets to coalesce with interrupt moderation */