
[RFC] multiqueue changes

Message ID 4ACD9255.4020008@gmail.com
State RFC, archived
Delegated to: David Miller

Commit Message

Eric Dumazet Oct. 8, 2009, 7:18 a.m. UTC
Say I have a non-multiqueue device like this:

03:00.0 Ethernet controller: Broadcom Corporation NetXtreme II BCM5708S Gigabit Ethernet (rev 12)

Driver bnx2

This driver does an alloc_etherdev_mq(sizeof(*bp), TX_MAX_RINGS),
regardless of the real capabilities of the NIC.

Then, a bit later, it does:

bp->dev->real_num_tx_queues = bp->num_tx_rings;  

(one in my non-multiqueue case)
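
In condensed form, the pattern in question looks roughly like this (a
paraphrased sketch, not the literal bnx2 code; the value 8 for
TX_MAX_RINGS is inferred from the eight mq classes shown below):

#include <linux/netdevice.h>
#include <linux/etherdevice.h>

#define TX_MAX_RINGS 8			/* inferred, see the mq classes below */

struct sketch_priv { int placeholder; };	/* stands in for struct bnx2 */

static struct net_device *sketch_bnx2_alloc(void)
{
	struct net_device *dev;

	/* the netdev is always allocated with the maximum ring count, so
	 * dev->num_tx_queues is 8 even on single-queue hardware */
	dev = alloc_etherdev_mq(sizeof(struct sketch_priv), TX_MAX_RINGS);
	if (!dev)
		return NULL;

	/* later, once the interrupt mode is known, only the active count
	 * is lowered; the default root qdisc is still chosen based on
	 * num_tx_queues, hence mq instead of pfifo_fast */
	dev->real_num_tx_queues = 1;

	return dev;
}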

Now I have:

# tc -s -d qdisc show dev eth0
qdisc mq 0: root
 Sent 117693091 bytes 1542188 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
# tc -s -d class show dev eth0
class mq :1 root
 Sent 113935509 bytes 1492598 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
class mq :2 root
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
class mq :3 root
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
class mq :4 root
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
class mq :5 root
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
class mq :6 root
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
class mq :7 root
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
class mq :8 root
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0

While in previous kernels I had:

# tc -s -d qdisc show dev eth0
qdisc pfifo_fast 0: root bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
 Sent 26292963818 bytes 347139141 pkts (dropped 0, overlimits 0 requeues 0)
# tc -s -d class show dev eth0


So I lost the default pfifo_fast classification.

Just wondering if it could hurt some people.

Anyway, should we change the bnx2/tg3 drivers so that single-queue devices
get the same default qdisc/class as before?

e.g. (see the Patch section at the end of this page):


Comments

Jarek Poplawski Oct. 8, 2009, 9:03 a.m. UTC | #1
On Thu, Oct 08, 2009 at 09:18:45AM +0200, Eric Dumazet wrote:
> Say I have such non multiqueue device :
> 
> 03:00.0 Ethernet controller: Broadcom Corporation NetXtreme II BCM5708S Gigabit Ethernet (rev 12)
> 
> Driver bnx2
> 
> This drivers does an alloc_etherdev_mq(sizeof(*bp), TX_MAX_RINGS),
> regardless of real capabilities of the NIC.
> 
> Then, a bit later it does :
> 
> bp->dev->real_num_tx_queues = bp->num_tx_rings;  
> 
> (one in my non multiqueue case)
> 
> Now I have :
> 
> # tc -s -d qdisc show dev eth0
> qdisc mq 0: root
>  Sent 117693091 bytes 1542188 pkt (dropped 0, overlimits 0 requeues 0)
>  backlog 0b 0p requeues 0
> # tc -s -d class show dev eth0
> class mq :1 root
>  Sent 113935509 bytes 1492598 pkt (dropped 0, overlimits 0 requeues 0)
>  backlog 0b 0p requeues 0
> class mq :2 root
>  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
>  backlog 0b 0p requeues 0
> class mq :3 root
>  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
>  backlog 0b 0p requeues 0
> class mq :4 root
>  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
>  backlog 0b 0p requeues 0
> class mq :5 root
>  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
>  backlog 0b 0p requeues 0
> class mq :6 root
>  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
>  backlog 0b 0p requeues 0
> class mq :7 root
>  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
>  backlog 0b 0p requeues 0
> class mq :8 root
>  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
>  backlog 0b 0p requeues 0
> 
> While in previous kernels I had :
> 
> # tc -s -d qdisc show dev eth0
> qdisc pfifo_fast 0: root bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
>  Sent 26292963818 bytes 347139141 pkts (dropped 0, overlimits 0 requeues 0)
> # tc -s -d class show dev eth0
> 
> 
> So I lost the default pfifo_fast classification.

IMHO it (pfifo_fast qdiscs under the mq root) could/should come back.

> 
> Just wondering if it could hurt some people.
> 
> Anyway, should we change bnx2/tg3 drivers so that single queue devices
> have same default qdisc/class than before ?
> 
> eg :
> 
> diff --git a/drivers/net/bnx2.c b/drivers/net/bnx2.c
> index 08cddb6..7cac205 100644
> --- a/drivers/net/bnx2.c
> +++ b/drivers/net/bnx2.c
> @@ -6152,6 +6152,7 @@ bnx2_setup_int_mode(struct bnx2 *bp, int dis_msi)
>  
>  	bp->num_tx_rings = rounddown_pow_of_two(bp->irq_nvecs);
>  	bp->dev->real_num_tx_queues = bp->num_tx_rings;
> +	bp->dev->num_tx_queues = bp->dev->real_num_tx_queues;
>  
>  	bp->num_rx_rings = bp->irq_nvecs;
>  }

It doesn't look consistent to me wrt. the comment in netdevice.h on
num_tx_queues. But it seems we should rather use real_num_tx_queues
more often in scheduler code, such as dumps, and maybe more
(like e.g. sch_multiq does).

Jarek P.
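
A minimal sketch of the kind of change suggested above: an mq_dump-style
statistics walk that visits only the queues actually in use. The shape
mirrors sch_mq's dump loop but is purely illustrative, not a proposed
patch.

#include <linux/string.h>
#include <linux/netdevice.h>
#include <net/sch_generic.h>

static int sketch_mq_dump(struct Qdisc *sch, struct sk_buff *skb)
{
	struct net_device *dev = qdisc_dev(sch);
	struct Qdisc *qdisc;
	unsigned int ntx;

	sch->q.qlen = 0;
	memset(&sch->bstats, 0, sizeof(sch->bstats));
	memset(&sch->qstats, 0, sizeof(sch->qstats));

	/* real_num_tx_queues instead of num_tx_queues: a single-queue
	 * bnx2 would then dump one child, much as pfifo_fast did */
	for (ntx = 0; ntx < dev->real_num_tx_queues; ntx++) {
		qdisc = netdev_get_tx_queue(dev, ntx)->qdisc_sleeping;
		sch->q.qlen		+= qdisc->q.qlen;
		sch->bstats.bytes	+= qdisc->bstats.bytes;
		sch->bstats.packets	+= qdisc->bstats.packets;
		sch->qstats.backlog	+= qdisc->qstats.backlog;
		sch->qstats.drops	+= qdisc->qstats.drops;
		sch->qstats.requeues	+= qdisc->qstats.requeues;
		sch->qstats.overlimits	+= qdisc->qstats.overlimits;
	}
	return skb->len;
}

As David points out further down the thread, the cost of summing only over
real_num_tx_queues is that statistics from queues that were enabled earlier
but are disabled now would no longer be reported.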
Jarek Poplawski Oct. 8, 2009, noon UTC | #2
On Thu, Oct 08, 2009 at 09:03:44AM +0000, Jarek Poplawski wrote:
...
> num_tx_queues. But it seems we should rather use more often
> real_num_tx_queue in schedulers code like dumps and maybe more
> (like e.g. sch_multiq does).

...i.e. probably everywhere between dev_activate and dev_deactivate
all qdisc operations could use real_num_tx_queues (including a test
like: netif_is_real_multique), unless I'm missing something.

Jarek P.
Eric Dumazet Oct. 8, 2009, 12:13 p.m. UTC | #3
Jarek Poplawski a écrit :
> On Thu, Oct 08, 2009 at 09:03:44AM +0000, Jarek Poplawski wrote:
> ...
>> num_tx_queues. But it seems we should rather use more often
>> real_num_tx_queue in schedulers code like dumps and maybe more
>> (like e.g. sch_multiq does).
> 
> ...i.e. probably everywhere between dev_activate and dev_deactivate
> all qdisc operations could use real_num_tx_queues (including a test
> like: netif_is_real_multique), unless I miss something.

I am not sure David's intent was to allow dynamically adjusting
real_num_tx_queues between 1 and num_tx_queues.

For low/moderate traffic, it's better to use one queue, to lower IRQ
activations and let some CPUs sleep longer.
Jarek Poplawski Oct. 8, 2009, 12:53 p.m. UTC | #4
On Thu, Oct 08, 2009 at 02:13:22PM +0200, Eric Dumazet wrote:
> Jarek Poplawski a écrit :
> > On Thu, Oct 08, 2009 at 09:03:44AM +0000, Jarek Poplawski wrote:
> > ...
> >> num_tx_queues. But it seems we should rather use more often
> >> real_num_tx_queue in schedulers code like dumps and maybe more
> >> (like e.g. sch_multiq does).
> > 
> > ...i.e. probably everywhere between dev_activate and dev_deactivate
> > all qdisc operations could use real_num_tx_queues (including a test
> > like: netif_is_real_multique), unless I miss something.
> 
> I am not sure David intent was being able to dynamically adjust real_num_tx_queue
> between 1 and num_tx_queue.

If so, then it looks like some drivers could misuse this intent.

Thanks for the explanation,
Jarek P.
David Miller Oct. 9, 2009, 7:58 a.m. UTC | #5
From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Thu, 08 Oct 2009 14:13:22 +0200

> I am not sure David intent was being able to dynamically adjust
> real_num_tx_queue between 1 and num_tx_queue.

The idea was to allow semi-dynamic adjustment.

Meaning that you could change the number of TX queues, but only in
some quiescent state, such as when the device is down or frozen while
up in some way.

dev->num_tx_queues tracks how many total were allocated.  This is
necessary so that even if the real_num_tx_queues is modified while the
device is up, we can still see the correct statistics by gathering
from queues that are now disabled but were enabled beforehand.
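
For illustration, a minimal sketch of that semi-dynamic model; the
helper name is invented, not existing API:

#include <linux/errno.h>
#include <linux/netdevice.h>
#include <linux/rtnetlink.h>

static int sketch_change_tx_rings(struct net_device *dev, unsigned int rings)
{
	ASSERT_RTNL();

	if (netif_running(dev))
		return -EBUSY;		/* only in a quiescent state */

	if (rings < 1 || rings > dev->num_tx_queues)
		return -EINVAL;		/* never beyond what was allocated */

	/* queues above the new count stay allocated, so the statistics
	 * they gathered while enabled remain visible to dumps */
	dev->real_num_tx_queues = rings;
	return 0;
}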
Jarek Poplawski Oct. 9, 2009, 8:51 a.m. UTC | #6
On Thu, Oct 08, 2009 at 09:03:44AM +0000, Jarek Poplawski wrote:
> On Thu, Oct 08, 2009 at 09:18:45AM +0200, Eric Dumazet wrote:
...
> > Just wondering if it could hurt some people.
> > 
> > Anyway, should we change bnx2/tg3 drivers so that single queue devices
> > have same default qdisc/class than before ?
> > 
> > eg :
> > 
> > diff --git a/drivers/net/bnx2.c b/drivers/net/bnx2.c
> > index 08cddb6..7cac205 100644
> > --- a/drivers/net/bnx2.c
> > +++ b/drivers/net/bnx2.c
> > @@ -6152,6 +6152,7 @@ bnx2_setup_int_mode(struct bnx2 *bp, int dis_msi)
> >  
> >  	bp->num_tx_rings = rounddown_pow_of_two(bp->irq_nvecs);
> >  	bp->dev->real_num_tx_queues = bp->num_tx_rings;
> > +	bp->dev->num_tx_queues = bp->dev->real_num_tx_queues;
> >  
> >  	bp->num_rx_rings = bp->irq_nvecs;
> >  }
> 
> It doesn't look consistent to me wrt. the comment in netdevice.h on
> num_tx_queues. But it seems we should rather use more often
> real_num_tx_queue in schedulers code like dumps and maybe more
> (like e.g. sch_multiq does).

So, according to my current understanding, we should probably let
drivers reset the tx_queues allocation with a new num_tx_queues
in a place like this (ndo_open).

Jarek P.
Jarek Poplawski Oct. 9, 2009, 9:40 a.m. UTC | #7
On Fri, Oct 09, 2009 at 08:51:07AM +0000, Jarek Poplawski wrote:
> On Thu, Oct 08, 2009 at 09:03:44AM +0000, Jarek Poplawski wrote:
> > On Thu, Oct 08, 2009 at 09:18:45AM +0200, Eric Dumazet wrote:
> ...
> > > Just wondering if it could hurt some people.
> > > 
> > > Anyway, should we change bnx2/tg3 drivers so that single queue devices
> > > have same default qdisc/class than before ?
> > > 
> > > eg :
> > > 
> > > diff --git a/drivers/net/bnx2.c b/drivers/net/bnx2.c
> > > index 08cddb6..7cac205 100644
> > > --- a/drivers/net/bnx2.c
> > > +++ b/drivers/net/bnx2.c
> > > @@ -6152,6 +6152,7 @@ bnx2_setup_int_mode(struct bnx2 *bp, int dis_msi)
> > >  
> > >  	bp->num_tx_rings = rounddown_pow_of_two(bp->irq_nvecs);
> > >  	bp->dev->real_num_tx_queues = bp->num_tx_rings;
> > > +	bp->dev->num_tx_queues = bp->dev->real_num_tx_queues;
> > >  
> > >  	bp->num_rx_rings = bp->irq_nvecs;
> > >  }
> > 
> > It doesn't look consistent to me wrt. the comment in netdevice.h on
> > num_tx_queues. But it seems we should rather use more often
> > real_num_tx_queue in schedulers code like dumps and maybe more
> > (like e.g. sch_multiq does).
> 
> So, according to my current understanding, we should probably let
> drivers to reset the tx_queues allocation with a new num_tx_queues
> in a place like this (ndo_open).

Actually, it should be only for the first ndo_open.

Jarek P.
Patrick McHardy Oct. 28, 2009, 5:27 p.m. UTC | #8
Jarek Poplawski wrote:
> On Thu, Oct 08, 2009 at 09:03:44AM +0000, Jarek Poplawski wrote:
> ...
>> num_tx_queues. But it seems we should rather use more often
>> real_num_tx_queue in schedulers code like dumps and maybe more
>> (like e.g. sch_multiq does).
> 
> ...i.e. probably everywhere between dev_activate and dev_deactivate
> all qdisc operations could use real_num_tx_queues (including a test
> like: netif_is_real_multique), unless I miss something.

We don't seem to support changing real_num_tx_queues for registered
devices currently (at least I couldn't find it), so I guess it depends
on how this would be implemented.

Simply changing the dev->real_num_tx_queues value while the device is
down would require qdisc operations to operate on all possible queues,
since the number of queues in use could be changed after the qdisc is
created/configured but before the device is set up. This approach has
more complications, like switching between mq and non-mq root qdiscs,
taking care of non-default root qdiscs (cloning them to the new
queues), etc.

A simpler alternative would be to destroy the existing root
qdisc on any change to real_num_tx_queues and have dev_activate()
set it up from scratch. In this case, we could (as you suggested)
use real_num_tx_queues, which should fix the problem Eric reported.
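
A rough sketch of that simpler alternative; the helper name is invented,
the call is assumed to run under RTNL, and details such as preserving a
non-default root qdisc configuration are deliberately ignored:

#include <linux/netdevice.h>
#include <linux/rtnetlink.h>
#include <net/pkt_sched.h>

static void sketch_set_real_num_tx_queues(struct net_device *dev,
					  unsigned int txq)
{
	ASSERT_RTNL();

	if (txq == dev->real_num_tx_queues)
		return;

	if (dev->flags & IFF_UP)
		dev_deactivate(dev);	/* quiesce TX and detach the qdiscs */

	dev_shutdown(dev);		/* destroy the existing root qdisc */
	dev->real_num_tx_queues = txq;

	if (dev->flags & IFF_UP)
		dev_activate(dev);	/* rebuild the default qdisc from scratch */
}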
Jarek Poplawski Oct. 28, 2009, 9:23 p.m. UTC | #9
On Wed, Oct 28, 2009 at 06:27:10PM +0100, Patrick McHardy wrote:
> Jarek Poplawski wrote:
> > On Thu, Oct 08, 2009 at 09:03:44AM +0000, Jarek Poplawski wrote:
> > ...
> >> num_tx_queues. But it seems we should rather use more often
> >> real_num_tx_queue in schedulers code like dumps and maybe more
> >> (like e.g. sch_multiq does).
> > 
> > ...i.e. probably everywhere between dev_activate and dev_deactivate
> > all qdisc operations could use real_num_tx_queues (including a test
> > like: netif_is_real_multique), unless I miss something.
> 
> We don't seem to be supporting changing real_num_tx_queues for
> registered devices currently (at least I couldn't find it).
> So I guess it depends on how this would be implemented.
> 
> Simply changing the dev->real_num_tx_queues value while the
> device is down would require qdisc operations to operate on
> all possible queues since the amount of queues in use could
> be changed after the qdisc is created/configured, but before
> the device is set up. This approach has more complications
> like switching between mq and non-mq root qdiscs, taking care
> of non-default root qdisc (cloning them to the new queues), etc.
> 
> A simpler alternative would be to destroy the existing root
> qdisc on any change to real_num_tx_queues and have dev_activate()
> set it up from scratch. In this case, we could (as you suggested)
> use real_num_tx_queues, which should fix the problem Eric reported.

Actually, I changed my mind after Eric's and especially David's
explanations. Probably some changes will be needed in handling
real_num_tx_queues, but there is no reason to misuse it for masking
a totally useless num_tx_queues value, like in this case. So, IMHO,
it's mainly about the driver(s) (and maybe a bit of an API change)
here.

Jarek P.
Patrick McHardy Oct. 29, 2009, 4:37 p.m. UTC | #10
Jarek Poplawski wrote:
> On Wed, Oct 28, 2009 at 06:27:10PM +0100, Patrick McHardy wrote:
>> We don't seem to be supporting changing real_num_tx_queues for
>> registered devices currently (at least I couldn't find it).
>> So I guess it depends on how this would be implemented.
>>
>> Simply changing the dev->real_num_tx_queues value while the
>> device is down would require qdisc operations to operate on
>> all possible queues since the amount of queues in use could
>> be changed after the qdisc is created/configured, but before
>> the device is set up. This approach has more complications
>> like switching between mq and non-mq root qdiscs, taking care
>> of non-default root qdisc (cloning them to the new queues), etc.
>>
>> A simpler alternative would be to destroy the existing root
>> qdisc on any change to real_num_tx_queues and have dev_activate()
>> set it up from scratch. In this case, we could (as you suggested)
>> use real_num_tx_queues, which should fix the problem Eric reported.
> 
> Actually, I changed my mind after Eric's and especially David's
> explanations. Probably there will be needed some changes in handling
> the real_num_tx_queues, but there is no reason to misuse them for
> masking a totally useless num_tx_queues value, like in this case. So,
> IMHO, its mainly about the driver(s) (and maybe a bit of API change)
> here.

Well, we do need both values to support changes to the actually used
number of TX queues. If I understood Dave's explanation correctly,
this is also what's intended. What bnx2 is doing also doesn't seem
unreasonable.

But getting back to the problem Eric reported - so you're suggesting
that bnx2.c should also adjust num_tx_queues in case the hardware
doesn't support multiqueue? That seems reasonable as well.

Jarek Poplawski Oct. 29, 2009, 9:15 p.m. UTC | #11
On Thu, Oct 29, 2009 at 05:37:23PM +0100, Patrick McHardy wrote:
...
> Well, we do need both values for supporting changes to the actually
> used numbers of TX queues. If I understood Dave's explanation correctly,
> this is also what's intended. It also doesn't seem unreasonable
> what bnx2 is doing.

Exactly. With a growing number of cores, both available and powered
off, these values will soon be treated more carefully than they are now.

> But getting back to the problem Eric reported - so you're suggesting
> that bnx2.c should also adjust num_tx_queues in case the hardware
> doesn't support multiqueue? That seems reasonable as well.

Currently, declaring num_tx_queues with alloc_netdev_mq() looks too
early for some drivers. It seems they should be able to do it
separately, later during .probe. There is a question whether .ndo_open
should be considered too.

Jarek P.
Patrick McHardy Oct. 29, 2009, 10:12 p.m. UTC | #12
Jarek Poplawski wrote:
> On Thu, Oct 29, 2009 at 05:37:23PM +0100, Patrick McHardy wrote:
> ...
>> Well, we do need both values for supporting changes to the actually
>> used numbers of TX queues. If I understood Dave's explanation correctly,
>> this is also what's intended. It also doesn't seem unreasonable
>> what bnx2 is doing.
> 
> Exactly. With a growing number of cores, both available and powered
> off, these values will be soon treated more carefully than now.
> 
>> But getting back to the problem Eric reported - so you're suggesting
>> that bnx2.c should also adjust num_tx_queues in case the hardware
>> doesn't support multiqueue? That seems reasonable as well.
> 
> Currently, declaring num_tx_queues with alloc_netdev_mq() looks like
> too soon for some drivers. It seems they should be able to do it
> separately later during the .probe.

The value passed into alloc_netdev_mq() is just used for allocation
purposes; from what I can tell there's no downside to reducing it
before the dev_activate() call.

> There is a question if .ndo_open should be considered too.

I currently can't see any purpose in decreasing num_tx_queues after
registration instead of real_num_tx_queues. But it depends on how
exactly this will be implemented and how it interacts with qdiscs
(hence my previous mail, where I tried to point out possible
inconsistencies from using real_num_tx_queues).
Jarek Poplawski Oct. 30, 2009, 10 a.m. UTC | #13
On Thu, Oct 29, 2009 at 11:12:39PM +0100, Patrick McHardy wrote:
> Jarek Poplawski wrote:
> > On Thu, Oct 29, 2009 at 05:37:23PM +0100, Patrick McHardy wrote:
> > ...
> >> Well, we do need both values for supporting changes to the actually
> >> used numbers of TX queues. If I understood Dave's explanation correctly,
> >> this is also what's intended. It also doesn't seem unreasonable
> >> what bnx2 is doing.
> > 
> > Exactly. With a growing number of cores, both available and powered
> > off, these values will be soon treated more carefully than now.
> > 
> >> But getting back to the problem Eric reported - so you're suggesting
> >> that bnx2.c should also adjust num_tx_queues in case the hardware
> >> doesn't support multiqueue? That seems reasonable as well.
> > 
> > Currently, declaring num_tx_queues with alloc_netdev_mq() looks like
> > too soon for some drivers. It seems they should be able to do it
> > separately later during the .probe.
> 
> The value passed into alloc_netdev_mq() is just used for allocation
> purposes, from what I can tell there's no downside in reducing it
> before the dev_activate() call.

Right, but IMHO this reduction (or reallocation) should be done through
some API. Simply overwriting num_tx_queues as Eric proposed, even if it
has no downside now, seems to be asking for problems in the future.

> > There is a question if .ndo_open should be considered too.
> 
> I currently can't see any purpose in decreasing num_tx_queues after
> registration instead of real_num_tx_queues.

I agree, but since Eric's example shows some drivers do it (almost)
like this, I'd prefer the authors/maintainers to answer this question.

Jarek P.
Michael Chan Oct. 31, 2009, 5:25 p.m. UTC | #14
On Fri, 2009-10-30 at 03:00 -0700, Jarek Poplawski wrote:
> On Thu, Oct 29, 2009 at 11:12:39PM +0100, Patrick McHardy wrote:
> > Jarek Poplawski wrote:
> > > On Thu, Oct 29, 2009 at 05:37:23PM +0100, Patrick McHardy wrote:
> > > ...
> > >> Well, we do need both values for supporting changes to the actually
> > >> used numbers of TX queues. If I understood Dave's explanation correctly,
> > >> this is also what's intended. It also doesn't seem unreasonable
> > >> what bnx2 is doing.
> > > 
> > > Exactly. With a growing number of cores, both available and powered
> > > off, these values will be soon treated more carefully than now.
> > > 
> > >> But getting back to the problem Eric reported - so you're suggesting
> > >> that bnx2.c should also adjust num_tx_queues in case the hardware
> > >> doesn't support multiqueue? That seems reasonable as well.
> > > 
> > > Currently, declaring num_tx_queues with alloc_netdev_mq() looks like
> > > too soon for some drivers. It seems they should be able to do it
> > > separately later during the .probe.
> > 
> > The value passed into alloc_netdev_mq() is just used for allocation
> > purposes, from what I can tell there's no downside in reducing it
> > before the dev_activate() call.
> 
> Right, but IMHO this reducing (or reallocation) should be done with
> some API. Simple overwriting of num_tx_queues proposed by Eric, even
> if no downside now, seems to be asking for problems in the future.
> 
> > > There is a question if .ndo_open should be considered too.
> > 
> > I currently can't see any purpose in decreasing num_tx_queues after
> > registration instead of real_num_tx_queues.
> 
> I agree, but since Eric's example shows some drivers do it (almost)
> like this, I'd prefer authors/maintainers answer this question.
> 

We need the netdev and the private structure at the beginning of
->probe() to store values as we probe the device.  Later on in
->probe(), we'll know whether the device supports MSI-X and multiqueue.
If we successfully enable MSI-X in ->ndo_open(), TX multiqueue will be
used.

So if we can separate the allocation of the netdev and the private
structure from the TX queues, we can allocate and set the exact
num_tx_queues value in ->ndo_open().


Jarek Poplawski Nov. 1, 2009, 1:20 p.m. UTC | #15
On Sat, Oct 31, 2009 at 10:25:52AM -0700, Michael Chan wrote:
> 
> On Fri, 2009-10-30 at 03:00 -0700, Jarek Poplawski wrote:
> > On Thu, Oct 29, 2009 at 11:12:39PM +0100, Patrick McHardy wrote:
...
> > > I currently can't see any purpose in decreasing num_tx_queues after
> > > registration instead of real_num_tx_queues.
> > 
> > I agree, but since Eric's example shows some drivers do it (almost)
> > like this, I'd prefer authors/maintainers answer this question.
> > 
> 
> We need the netdev and the private structure at the beginning of
> ->probe() to store values as we probe the device.  Later on in
> ->probe(), we'll know whether the device supports MSI-X and multiqueue.
> If we successfully enable MSI-X in ->ndo_open(), tx multiqueue will be
> used.
> 
> So if we can separate the allocation of the netdev and the private
> structure from the tx queues, we can allocate and set the exact number
> of num_tx_queues in ->ndo_open().

There is a question whether we can predict in ->probe() that MSI-X will
be successfully enabled in ->ndo_open() for the probed hardware. If so,
it could go e.g. like this:
->probe()
	...
	dev = alloc_etherdev_mq(sizeof(*bp), 1)
	...
	if (MSI-X_available && device_supports_MSI-X_and_multiqueue)
		realloc_netdev_mq(dev, TX_MAX_RINGS)
	register_netdev(dev)

->ndo_open()
	if (!enabled_MSI-X)
		/* something gone wrong but it's an exception */
		dev->real_num_tx_queues = 1

Otherwise, (re)allocation in ->ndo_open() would need to answer some
questions about reinitializing the scheduler vs. preserving qdisc stats
between opens.

Jarek P.
David Miller Nov. 2, 2009, 11:35 a.m. UTC | #16
From: Jarek Poplawski <jarkao2@gmail.com>
Date: Sun, 1 Nov 2009 14:20:17 +0100

> There is a question if we can predict in ->probe() MSI-X should be
> successfully enabled in ->ndo_open() for probed hardware. If so,
> then it could go e.g. like this:

We never can know this.

Another device driver can eat up all the MSI-X vectors in the PCI
domain before we make the request_irq() calls in ->open().
Jarek Poplawski Nov. 2, 2009, 12:30 p.m. UTC | #17
On Mon, Nov 02, 2009 at 03:35:33AM -0800, David Miller wrote:
> From: Jarek Poplawski <jarkao2@gmail.com>
> Date: Sun, 1 Nov 2009 14:20:17 +0100
> 
> > There is a question if we can predict in ->probe() MSI-X should be
> > successfully enabled in ->ndo_open() for probed hardware. If so,
> > then it could go e.g. like this:
> 
> We never can know this.
> 
> Another device driver can eat up all the MSI-X vectors in the PCI
> domain before we make the request_irq() calls in ->open().

Right, but it's not a 50% chance, I guess? A user most of the time
gets consistently multiqueue or non-multiqueue behavior after open,
unless I'm missing something. Then such an exceptional state could be
handled by real_num_tx_queues (just like in the case of powered-off
CPUs). The main difference is holding in num_tx_queues something that
is really available vs. the max possible value for all configs.

Jarek P. 
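
A sketch of what that split could look like in a driver; every name here
(sketch_priv, sketch_enable_msix, HW_MAX_TX_RINGS) is invented for
illustration:

#include <linux/kernel.h>
#include <linux/errno.h>
#include <linux/cpumask.h>
#include <linux/netdevice.h>
#include <linux/etherdevice.h>

#define HW_MAX_TX_RINGS 8		/* invented hardware limit */

struct sketch_priv { int placeholder; };

static int sketch_enable_msix(struct sketch_priv *priv)
{
	return -ENOSYS;			/* stand-in for the real MSI-X setup */
}

static struct net_device *sketch_probe_alloc(void)
{
	unsigned int rings = min_t(unsigned int, num_online_cpus(),
				   HW_MAX_TX_RINGS);

	/* num_tx_queues reflects what this config can really use */
	return alloc_etherdev_mq(sizeof(struct sketch_priv), rings);
}

static int sketch_open(struct net_device *dev)
{
	struct sketch_priv *priv = netdev_priv(dev);

	if (sketch_enable_msix(priv) < 0)
		dev->real_num_tx_queues = 1;	/* exceptional fallback */
	else
		dev->real_num_tx_queues = dev->num_tx_queues;

	return 0;
}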
David Miller Nov. 2, 2009, 12:39 p.m. UTC | #18
From: Jarek Poplawski <jarkao2@gmail.com>
Date: Mon, 2 Nov 2009 12:30:29 +0000

> Right, but it's not a 50% chance, I guess? A user most of the time
> gets consistently multiqueue or non-multiqueue behavior after open,
> unless I miss something. Then such an exceptional state could be
> handled by real_num_tx_queues (just like in case of powered of cpus).
> The main difference is to hold in num_tx_queues something that is
> really available vs max possible value for all configs.

I see your point, yes this would seem to be a reasonable way
to start handling num_tx_queues and real_num_tx_queues.
Jarek Poplawski Nov. 2, 2009, 1:02 p.m. UTC | #19
On Mon, Nov 02, 2009 at 04:39:07AM -0800, David Miller wrote:
> From: Jarek Poplawski <jarkao2@gmail.com>
> Date: Mon, 2 Nov 2009 12:30:29 +0000
> 
> > Right, but it's not a 50% chance, I guess? A user most of the time
> > gets consistently multiqueue or non-multiqueue behavior after open,
> > unless I miss something. Then such an exceptional state could be
> > handled by real_num_tx_queues (just like in case of powered of cpus).
> > The main difference is to hold in num_tx_queues something that is
> > really available vs max possible value for all configs.
> 
> I see your point, yes this would seem to be a reasonable way
> to start handling num_tx_queues and real_num_tx_queues.

Very nice! So I hope Eric is satisfied with the requested comments
already :-)

Thanks everybody,
Jarek P.
Eric Dumazet Nov. 2, 2009, 1:03 p.m. UTC | #20
Jarek Poplawski a écrit :
> On Mon, Nov 02, 2009 at 04:39:07AM -0800, David Miller wrote:
>> From: Jarek Poplawski <jarkao2@gmail.com>
>> Date: Mon, 2 Nov 2009 12:30:29 +0000
>>
>>> Right, but it's not a 50% chance, I guess? A user most of the time
>>> gets consistently multiqueue or non-multiqueue behavior after open,
>>> unless I miss something. Then such an exceptional state could be
>>> handled by real_num_tx_queues (just like in case of powered of cpus).
>>> The main difference is to hold in num_tx_queues something that is
>>> really available vs max possible value for all configs.
>> I see your point, yes this would seem to be a reasonable way
>> to start handling num_tx_queues and real_num_tx_queues.
> 
> Very nice! So, I hope Eric should be satisfied with these requested
> comments already :-)
> 
Sure, but I prefer a patch from you ;)

Jarek Poplawski Nov. 2, 2009, 1:09 p.m. UTC | #21
On Mon, Nov 02, 2009 at 02:03:21PM +0100, Eric Dumazet wrote:
...
> Sure, but I prefer a patch from you ;)
> 
Wrong code! (RFP? ;-)

Jarek P.

Patch

diff --git a/drivers/net/bnx2.c b/drivers/net/bnx2.c
index 08cddb6..7cac205 100644
--- a/drivers/net/bnx2.c
+++ b/drivers/net/bnx2.c
@@ -6152,6 +6152,7 @@  bnx2_setup_int_mode(struct bnx2 *bp, int dis_msi)
 
 	bp->num_tx_rings = rounddown_pow_of_two(bp->irq_nvecs);
 	bp->dev->real_num_tx_queues = bp->num_tx_rings;
+	bp->dev->num_tx_queues = bp->dev->real_num_tx_queues;
 
 	bp->num_rx_rings = bp->irq_nvecs;
 }