[take,2] pkt_sched: Fix qdisc_watchdog() vs. dev_deactivate() race

From: Jarek Poplawski <jarkao2@gmail.com>

On Sat, Sep 20, 2008 at 12:21:37AM -0700, David Miller wrote:
...
> Let's look at what actually matters for cpu utilization.  These
> __qdisc_run() things are invoked in two situations where we might
> block on the hw queue being stopped:
> 
> 1) When feeding packets into the qdisc in dev_queue_xmit().
> 
>    Guess what?  We _know_ the queue this packet is going to
>    hit.
> 
>    The only new thing we can possible trigger and be interested
>    in at this specific point is if _this_ packet can be sent at
>    this time.
> 
>    And we can check that queue mapping after the qdisc_enqueue_root()
>    call, so that multiq aware qdiscs can have made their changes.
> 
> 2) When waking up a queue.  And here we should schedule the qdisc_run
>    _unconditionally_.
> 
>    If the queue was full, it is extremely likely that new packets
>    are bound for that device queue.  There is no real savings to
>    be had by doing this peek/requeue/dequeue stuff.
> 
> The cpu utilization savings exist for case #1 only, and we can
> implement the bypass logic _perfectly_ as described above.
> 
> For #2 there is nothing to check, just do it and see what comes
> out of the qdisc.

Right, unless __netif_schedule() wasn't done when waking up. I've
thought about this because of another thread/patch around this
problem, and got misled by dev_requeue_skb() scheduling. Now, I think
this could be the main reason for this high load. Anyway, if we want
to skip this check for #2 I think something like the patch below is
needed.

> I would suggest adding an skb pointer argument to qdisc_run().
> If it's NULL, unconditionally schedule __qdisc_run().  Else,
> only schedule if the TX queue indicated by skb_queue_mapping()
> is not stopped.
> 
> dev_queue_xmit() will use the "pass the skb" case, but only if
> qdisc_enqueue_root()'s return value doesn't indicate that there
> is a potential drop.  On potential drop, we'll pass NULL to
> make sure we don't potentially reference a free'd SKB.
> 
> The other case in net_tx_action() can always pass NULL to qdisc_run().

I'm not convinced this #1 is useful for us: this could be an skb #1000
in a queue; the tx status could change many times before this packet
would be #1; why worry? This adds additional checks on the fast path
for something which is unlikely even if this skb would be #1, but for
any later skbs it's only a guess. IMHO, if we can't check for the next
skb to be xmitted it's better to skip this test entirely (which seems
to be safe with the patch below).

Jarek P.

--------------->
pkt_sched: dev_requeue_skb: Don't schedule if a queue is stopped

Doing __netif_schedule() while requeuing because of a stopped tx queue
and skipping such a test in qdisc_run() can cause a requeuing loop with
high cpu use until the queue is awaken.

Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
---

 net/sched/sch_generic.c |   23 +++++++++++++++--------
 1 files changed, 15 insertions(+), 8 deletions(-)

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Message ID	20080920234843.GA2531@ami.dom.local
State	Rejected, archived
Delegated to:	David Miller
Headers	show Return-Path: <netdev-owner@vger.kernel.org> DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=date:from:to:cc:subject:message-id:references:mime-version :content-type:content-disposition:in-reply-to:user-agent; b=CsqjESlFWrGEmEpJLCciVC5B2zw1MIVSbhGQWghHlx7/lf5+P2jpj7VhqZFrrdVBhY qfbzLpgwrnZjOl53kKQskfVlM7SV96CwPAzD+0jZq2qpQ5+6ipeCUMmct/YZWiUxJgnq yUrJCEGidPci0lk0iuadYWAxHt27oTwc8YnPY= Date: Sun, 21 Sep 2008 01:48:43 +0200 From: Jarek Poplawski <jarkao2@gmail.com> To: David Miller <davem@davemloft.net> Cc: herbert@gondor.apana.org.au, netdev@vger.kernel.org, kaber@trash.net Subject: Re: [PATCH take 2] pkt_sched: Fix qdisc_watchdog() vs. dev_deactivate() race Message-ID: <20080920234843.GA2531@ami.dom.local> References: <20080913205408.GA2545@ami.dom.local> <20080914061610.GA20571@gondor.apana.org.au> <20080914202715.GA2540@ami.dom.local> <20080920.002137.108837580.davem@davemloft.net> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20080920.002137.108837580.davem@davemloft.net> User-Agent: Mutt/1.5.18 (2008-05-17) Sender: netdev-owner@vger.kernel.org Precedence: bulk

[take,2] pkt_sched: Fix qdisc_watchdog() vs. dev_deactivate() race

Commit Message

Comments

Patch