
[net-next,2/2] net/mlx4: Revert "mlx4: set maximal number of default RSS queues"

Message ID: 1388581537-4810-3-git-send-email-amirv@mellanox.com
State: Changes Requested, archived
Delegated to: David Miller

Commit Message

Amir Vadai Jan. 1, 2014, 1:05 p.m. UTC
From: Eugenia Emantayev <eugenia@mellanox.com>

This reverts commit 90b1ebe "mlx4: set maximal number of default RSS
queues".
Limiting the number of receive rings to the default (8) reduces the overall
packet rate by 15% in a multiple TCP stream application:

          Packet rate
8  rings  967691
16 rings  1142070

Results were obtained using netperf:

S=200 ; ( for i in $(seq 1 $S) ; do ( \
  netperf -H 11.7.13.55 -t TCP_RR -l 30 &) ; \
  wait ; done | grep "1        1" | awk '{SUM+=$6} END {print SUM}' )


CC: Yuval Mintz <yuvalmin@broadcom.com>
Signed-off-by: Ido Shamay <idos@mellanox.com>
Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx4/main.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

Comments

Yuval Mintz Jan. 1, 2014, 6:46 p.m. UTC | #1
> From: Eugenia Emantayev <eugenia@mellanox.com>
> 
> This reverts commit 90b1ebe "mlx4: set maximal number of default RSS
> queues".
> Limiting the number of receive rings to default (8) reduces overall
> packet rate by 15% in multiple TCP streams application:
> 
>           Packet rate
> 8  rings  967691
> 16 rings  1142070
> 
> Results were obtained using netperf:
> 
> S=200 ; ( for i in $(seq 1 $S) ; do ( \
>   netperf -H 11.7.13.55 -t TCP_RR -l 30 &) ; \
>   wait ; done | grep "1        1" | awk '{SUM+=$6} END {print SUM}' )

Seems odd - well, to be more exact, nothing is odd about the results;
the whole notion of the original patch was to set an upper limit on the
number of interrupt vectors that multi-queue devices request by default,
not to improve the default performance.

If you believe this is a better default (or that some relaxed value would be,
e.g., 16 instead of 8), why not set it as the default for ALL multi-queue
networking drivers?

Thanks,
Yuval
Or Gerlitz Jan. 1, 2014, 9:50 p.m. UTC | #2
On Wed, Jan 1, 2014 at 8:46 PM, Yuval Mintz <yuvalmin@broadcom.com> wrote:
>
> [...] If you believe this is a better default (or some relaxation will be, e.g.,  16 instead of 8),  why not set it as default for ALL multi-queue networking drivers?

Going back to your original commit 16917b87a "net-next: Add
netif_get_num_default_rss_queues", I am still not clear on two points:

1. Why do we want a common default for all MQ devices?

2. Why does this default have to be hard-coded and not derived, e.g., from
the number of cores or a similar attribute of the system?

Can you please elaborate?

Or.
Yuval Mintz Jan. 2, 2014, 6:04 a.m. UTC | #3
> > [...] If you believe this is a better default (or some relaxation will be, e.g.,  16
> instead of 8),  why not set it as default for ALL multi-queue networking
> drivers?
> 
> Going back to your original commit 16917b87a "net-next: Add
> netif_get_num_default_rss_queues" I am still not clear why we want
> 
> 1. why we want a common default to all MQ devices?

Although networking benefits from multiple interrupt vectors
(enabling more rings, better performance, etc.), bounding this
number only by the number of CPUs is unreasonable as it strains
system resources; e.g., consider a 40-CPU server - we might wish
to have 40 vectors per device, but that means that connecting
several devices to the same server might cause other functions
to fail to probe, as they will no longer be able to acquire interrupt
vectors of their own.

Since networking has an API allowing the user to manually set the
number of channels, the default is upper-bounded.

> 2. why this default has to be hard coded and not derived e.g from the
> number of cores or alike attribute of the system?

This is not entirely correct; the default number is derived from
the number of online CPUs - it is only upper-bounded by a
hard-coded value.
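
For reference, the helper in question is roughly the following (a sketch
of net/core/dev.c from that period; treat the exact constant name as
approximate):

	/* Hard cap applied to the default only - not to what the user can
	 * later request explicitly via ethtool.
	 */
	#define DEFAULT_MAX_NUM_RSS_QUEUES	(8)

	int netif_get_num_default_rss_queues(void)
	{
		/* Derived from the number of online CPUs, upper-bounded by
		 * the hard-coded cap above.
		 */
		return min_t(int, DEFAULT_MAX_NUM_RSS_QUEUES,
			     num_online_cpus());
	}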

Cheers,
Yuval
Or Gerlitz Jan. 2, 2014, 9:35 a.m. UTC | #4
On 02/01/2014 08:04, Yuval Mintz wrote:
>>> [...] If you believe this is a better default (or some relaxation will be, e.g.,  16
>> instead of 8),  why not set it as default for ALL multi-queue networking
>> drivers?
>>
>> Going back to your original commit 16917b87a "net-next: Add
>> netif_get_num_default_rss_queues" I am still not clear why we want
>>
>> 1. why we want a common default to all MQ devices?
> Although networking benefits from multiple Interrupt vectors
> (enabling more rings, better performance, etc.), bounding this
> number only to the number of cpus is unreasonable as it strains
> system resources; e.g., consider a 40-cpu server - we might wish
> to have 40 vectors per device, but that means that connecting
> several devices to the same server might cause other functions
> to fail probe as they will no longer be able to acquire interrupt
> vectors of their own.

Modern servers with tens of CPUs typically have thousands of MSI-X
vectors, which means you should easily be able to plug four cards into a
server with 64 cores and consume only 256 out of the 1-4K vectors out
there. Anyway, let me continue your approach - how about raising the
default hard limit to 16, or deriving it from the number of cores at the
NUMA node where the card is plugged?
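
A rough sketch of what a NUMA-based default could look like (a hypothetical
helper, not existing code; dev_to_node()/nr_cpus_node() are the real
primitives):

	/* Hypothetical: derive the default ring count from the number of
	 * CPUs on the NUMA node the PCI device is attached to, instead of
	 * using a global hard-coded cap.
	 */
	static int default_rss_queues_for_dev(struct pci_dev *pdev)
	{
		int node = dev_to_node(&pdev->dev);

		/* Fall back to all online CPUs when no NUMA info exists */
		if (node == NUMA_NO_NODE)
			return num_online_cpus();

		return max_t(int, 1, nr_cpus_node(node));
	}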

Or.


>
> Since networking has an API allowing the user to manually set the
> number of channels, the default is upper-bounded.
>
>> 2. why this default has to be hard coded and not derived e.g from the
>> number of cores or alike attribute of the system?
> This is not entirely correct; The default number is derived from
> the number of online cpus - it's only upper bounded by some
> hard-coded value.
>
> Cheers,
> Yuval

Yuval Mintz Jan. 2, 2014, 10:27 a.m. UTC | #5
> >> Going back to your original commit 16917b87a "net-next: Add
> >> netif_get_num_default_rss_queues" I am still not clear why we want
> >>
> >> 1. why we want a common default to all MQ devices?
> > Although networking benefits from multiple Interrupt vectors
> > (enabling more rings, better performance, etc.), bounding this
> > number only to the number of cpus is unreasonable as it strains
> > system resources; e.g., consider a 40-cpu server - we might wish
> > to have 40 vectors per device, but that means that connecting
> > several devices to the same server might cause other functions
> > to fail probe as they will no longer be able to acquire interrupt
> > vectors of their own.
> 
> Modern servers which have tens of CPUs typically have thousands of MSI-X
> vectors which means you should be easily able to plug four cards into a
> server with 64 cores which will consume 256 out of the 1-4K vectors out
> there. Anyway, let me continue your approach - how about raising the
> default hard limit to 16 or having it as the number of cores @ the numa
> node where the card is plugged?

I think an additional issue was memory consumption -
additional interrupts --> additional allocated memory (for Rx rings).
And I do know the issues were real - we've had complaints about devices
failing to load due to lack of resources (not all servers in the world are
state of the art).

Anyway, I believe 8/16 are simply strict limitations without any true meaning;
to judge what's more important - default `slimness' or default performance -
is beyond me.
Perhaps the NUMA approach will prove beneficial (and will make some sense).

Thanks,
Yuval
Ido Shamai Jan. 15, 2014, 12:15 p.m. UTC | #6
On 1/2/2014 12:27 PM, Yuval Mintz wrote:
>>>> Going back to your original commit 16917b87a "net-next: Add
>>>> netif_get_num_default_rss_queues" I am still not clear why we want
>>>>
>>>> 1. why we want a common default to all MQ devices?
>>> Although networking benefits from multiple Interrupt vectors
>>> (enabling more rings, better performance, etc.), bounding this
>>> number only to the number of cpus is unreasonable as it strains
>>> system resources; e.g., consider a 40-cpu server - we might wish
>>> to have 40 vectors per device, but that means that connecting
>>> several devices to the same server might cause other functions
>>> to fail probe as they will no longer be able to acquire interrupt
>>> vectors of their own.
>>
>> Modern servers which have tens of CPUs typically have thousands of MSI-X
>> vectors which means you should be easily able to plug four cards into a
>> server with 64 cores which will consume 256 out of the 1-4K vectors out
>> there. Anyway, let me continue your approach - how about raising the
>> default hard limit to 16 or having it as the number of cores @ the numa
>> node where the card is plugged?
>
> I think an additional issue was memory consumption -
> additional interrupts --> additional allocated memory (for Rx rings).
> And I do know the issues were real - we've had complains about devices
> failing to load due to lack of resources (not all servers in the world are
> top of the art).
>
> Anyway, I believe 8/16 are simply strict limitations without any true meaning;
> To judge what's more important, default `slimness' or default performance
> is beyond me.
> Perhaps the numa approach will prove beneficial (and will make some sense).

After reviewing all that was said, I feel there is no need to impose this
strict limitation, which has no true meaning, on vendors.

The commit being reverted here forces the driver to use at most 8 rings at
all times, with no possibility of changing that at runtime using ethtool,
since the limit is enforced in the PCI driver at module init (restarting
the en driver with a different number of requested rings has no effect).
So this is crucial for performance-oriented applications using mlx4_en.

Going through the other Ethernet vendors, I don't see this limitation
enforced, so it has no true meaning (and no fairness).
I think this patch should go in as is.
Ethernet vendors should apply this limitation when they see fit.
Ido
> Thanks,
> Yuval

Sathya Perla Jan. 15, 2014, 12:46 p.m. UTC | #7
> -----Original Message-----
> From: netdev-owner@vger.kernel.org [mailto:netdev-owner@vger.kernel.org] On Behalf
> Of Ido Shamai
> 
> On 1/2/2014 12:27 PM, Yuval Mintz wrote:
> >>>> Going back to your original commit 16917b87a "net-next: Add
> >>>> netif_get_num_default_rss_queues" I am still not clear why we want
> >>>>
> >>>> 1. why we want a common default to all MQ devices?
> >>> Although networking benefits from multiple Interrupt vectors
> >>> (enabling more rings, better performance, etc.), bounding this
> >>> number only to the number of cpus is unreasonable as it strains
> >>> system resources; e.g., consider a 40-cpu server - we might wish
> >>> to have 40 vectors per device, but that means that connecting
> >>> several devices to the same server might cause other functions
> >>> to fail probe as they will no longer be able to acquire interrupt
> >>> vectors of their own.
> >>
> >> Modern servers which have tens of CPUs typically have thousands of MSI-X
> >> vectors which means you should be easily able to plug four cards into a
> >> server with 64 cores which will consume 256 out of the 1-4K vectors out
> >> there. Anyway, let me continue your approach - how about raising the
> >> default hard limit to 16 or having it as the number of cores @ the numa
> >> node where the card is plugged?
> >
> > I think an additional issue was memory consumption -
> > additional interrupts --> additional allocated memory (for Rx rings).
> > And I do know the issues were real - we've had complains about devices
> > failing to load due to lack of resources (not all servers in the world are
> > top of the art).
> >
> > Anyway, I believe 8/16 are simply strict limitations without any true meaning;
> > To judge what's more important, default `slimness' or default performance
> > is beyond me.
> > Perhaps the numa approach will prove beneficial (and will make some sense).
> 
> After reviewing all that was said, I feel there is no need to enforce
> vendors with this strict limitation without any true meaning.
> 
> The reverted commit you applied forces the driver to use 8 rings at max
> at all time, without the possibility to change in flight using ethtool,
> as it's enforced on the PCI driver at module init (restarting the en
> driver with different of requested rings will not affect).
> So it's crucial for performance oriented applications using mlx4_en.

The number of RSS/RX rings used by a driver can be increased (up to the
HW-supported value) at runtime using the set-channels ethtool interface.
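
(For example, assuming the driver exposes rx/tx channel parameters, that
would be along the lines of "ethtool -L eth2 rx 16", with "ethtool -l eth2"
showing the current and maximum supported counts.)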
Ido Shamai Jan. 15, 2014, 12:49 p.m. UTC | #8
On 1/15/2014 2:46 PM, Sathya Perla wrote:
>> -----Original Message-----
>> From: netdev-owner@vger.kernel.org [mailto:netdev-owner@vger.kernel.org] On Behalf
>> Of Ido Shamai
>>
>> On 1/2/2014 12:27 PM, Yuval Mintz wrote:
>>>>>> Going back to your original commit 16917b87a "net-next: Add
>>>>>> netif_get_num_default_rss_queues" I am still not clear why we want
>>>>>>
>>>>>> 1. why we want a common default to all MQ devices?
>>>>> Although networking benefits from multiple Interrupt vectors
>>>>> (enabling more rings, better performance, etc.), bounding this
>>>>> number only to the number of cpus is unreasonable as it strains
>>>>> system resources; e.g., consider a 40-cpu server - we might wish
>>>>> to have 40 vectors per device, but that means that connecting
>>>>> several devices to the same server might cause other functions
>>>>> to fail probe as they will no longer be able to acquire interrupt
>>>>> vectors of their own.
>>>>
>>>> Modern servers which have tens of CPUs typically have thousands of MSI-X
>>>> vectors which means you should be easily able to plug four cards into a
>>>> server with 64 cores which will consume 256 out of the 1-4K vectors out
>>>> there. Anyway, let me continue your approach - how about raising the
>>>> default hard limit to 16 or having it as the number of cores @ the numa
>>>> node where the card is plugged?
>>>
>>> I think an additional issue was memory consumption -
>>> additional interrupts --> additional allocated memory (for Rx rings).
>>> And I do know the issues were real - we've had complains about devices
>>> failing to load due to lack of resources (not all servers in the world are
>>> top of the art).
>>>
>>> Anyway, I believe 8/16 are simply strict limitations without any true meaning;
>>> To judge what's more important, default `slimness' or default performance
>>> is beyond me.
>>> Perhaps the numa approach will prove beneficial (and will make some sense).
>>
>> After reviewing all that was said, I feel there is no need to enforce
>> vendors with this strict limitation without any true meaning.
>>
>> The reverted commit you applied forces the driver to use 8 rings at max
>> at all time, without the possibility to change in flight using ethtool,
>> as it's enforced on the PCI driver at module init (restarting the en
>> driver with different of requested rings will not affect).
>> So it's crucial for performance oriented applications using mlx4_en.
>
> The number of RSS/RX rings used by a driver can be increased (up to the HW supported value)
> at runtime using set-channels ethtool interface.
Not in this case - see my comment above: the limit is enforced in the PCI
driver at module init.
In our case the set-channels interface cannot lift this limitation; it only
allows changes up to it.



Yuval Mintz Jan. 15, 2014, 12:54 p.m. UTC | #9
> > Anyway, I believe 8/16 are simply strict limitations without any true
> meaning;
> > To judge what's more important, default `slimness' or default performance
> > is beyond me.
> > Perhaps the numa approach will prove beneficial (and will make some
> sense).
> 
> After reviewing all that was said, I feel there is no need to enforce
> vendors with this strict limitation without any true meaning.
> 
> The reverted commit you applied forces the driver to use 8 rings at max
> at all time, without the possibility to change in flight using ethtool,
> as it's enforced on the PCI driver at module init (restarting the en
> driver with different of requested rings will not affect).
> So it's crucial for performance oriented applications using mlx4_en.

The rationale is to prevent misuse of resources by default, be they IRQ
vectors or memory for rings.
Notice this is just the default - you can always re-request interrupts if the
user explicitly requests a large number of rings;
although, if the driver is incapable of that (e.g., HW limitations), perhaps you
should allocate a larger number of IRQ vectors during init and apply the
limitation only to your default number of rings
(that's assuming that IRQ vectors are really inexpensive on all machines).

> Going through all Ethernet vendors I don't see this limitation enforced,
> so this limitation has no true meaning (no fairness).
> I think this patch should go in as is.
> Ethernet vendors should use it this limitation when they desire.

Might be true, but two wrongs don't make a right.
I believe that either this limitation should be mandatory, or the functionality
should not be included in the kernel as common code and each driver
should do as it pleases.

Ido Shamai Jan. 15, 2014, 1:15 p.m. UTC | #10
On 1/15/2014 2:54 PM, Yuval Mintz wrote:
>>> Anyway, I believe 8/16 are simply strict limitations without any true
>> meaning;
>>> To judge what's more important, default `slimness' or default performance
>>> is beyond me.
>>> Perhaps the numa approach will prove beneficial (and will make some
>> sense).
>>
>> After reviewing all that was said, I feel there is no need to enforce
>> vendors with this strict limitation without any true meaning.
>>
>> The reverted commit you applied forces the driver to use 8 rings at max
>> at all time, without the possibility to change in flight using ethtool,
>> as it's enforced on the PCI driver at module init (restarting the en
>> driver with different of requested rings will not affect).
>> So it's crucial for performance oriented applications using mlx4_en.
>
> The rational is to prevent default misusage of resources, be them irq vectors
> memories for rings.
> Notice this is just the default - You can always re-request interrupts if the
> user explicitly requests a large number of rings;
> Although, if the driver is incapable of that (e.g., hw limitations), perhaps you
> should allocate a larger number of irq vectors during init and use a limitation
> on your default number of rings
> (that's assuming that irq vectors are really inexpensive on all machines).

I fully agree with you on that.
We will send a new patch that limits the default number of rings to this
value, which can then be changed using the set-channels ethtool
interface. The number of IRQ vectors will be (at most) the number of CPUs.
Yuval, thanks for clarifying.
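
Roughly, the direction would be something like the following (a sketch with
illustrative names, not the actual mlx4_en code):

	/* The PCI driver keeps requesting MSI-X vectors based on the number
	 * of online CPUs, so the ceiling stays high; only the default number
	 * of RX rings opened by the en driver is capped, and ethtool -L can
	 * later raise it up to the number of available vectors.
	 */
	int def_rx_rings = min_t(int, num_available_comp_vectors,
				 netif_get_num_default_rss_queues());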

>> Going through all Ethernet vendors I don't see this limitation enforced,
>> so this limitation has no true meaning (no fairness).
>> I think this patch should go in as is.
>> Ethernet vendors should use it this limitation when they desire.
>
> Might be true, but two wrongs don't make a right.
> I believe that either this limitation should be mandatory, or the functionality
> Should Not be included in the kernel as communal code and each driver
> should do as it pleases.
I agree, but this is a different discussion, which I hope will be held
sometime in the near future.

Patch

diff --git a/drivers/net/ethernet/mellanox/mlx4/main.c b/drivers/net/ethernet/mellanox/mlx4/main.c
index 417a595..31ff812 100644
--- a/drivers/net/ethernet/mellanox/mlx4/main.c
+++ b/drivers/net/ethernet/mellanox/mlx4/main.c
@@ -41,7 +41,6 @@ 
 #include <linux/slab.h>
 #include <linux/io-mapping.h>
 #include <linux/delay.h>
-#include <linux/netdevice.h>
 #include <linux/kmod.h>
 
 #include <linux/mlx4/device.h>
@@ -1974,8 +1973,8 @@  static void mlx4_enable_msi_x(struct mlx4_dev *dev)
 	struct mlx4_priv *priv = mlx4_priv(dev);
 	struct msix_entry *entries;
 	int nreq = min_t(int, dev->caps.num_ports *
-			 min_t(int, netif_get_num_default_rss_queues() + 1,
-			       MAX_MSIX_P_PORT) + MSIX_LEGACY_SZ, MAX_MSIX);
+			 min_t(int, num_online_cpus() + 1, MAX_MSIX_P_PORT)
+				+ MSIX_LEGACY_SZ, MAX_MSIX);
 	int err;
 	int i;
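
For a feel of the numbers (using the mlx4 constants of that era - roughly
MSIX_LEGACY_SZ = 4, MAX_MSIX_P_PORT = 17 and MAX_MSIX = 64; treat the exact
values as approximate), a dual-port card on a 16-CPU host would request

	nreq = min(2 * min(16 + 1, 17) + 4, 64) = 38 vectors

with this revert, versus

	nreq = min(2 * min(8 + 1, 17) + 4, 64) = 22 vectors

with the netif_get_num_default_rss_queues() cap of 8 in place.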