From patchwork Sat Sep 20 23:48:43 2008
X-Patchwork-Submitter: Jarek Poplawski
X-Patchwork-Id: 761
X-Patchwork-Delegate: davem@davemloft.net
Date: Sun, 21 Sep 2008 01:48:43 +0200
From: Jarek Poplawski
To: David Miller
Cc: herbert@gondor.apana.org.au, netdev@vger.kernel.org, kaber@trash.net
Subject: Re: [PATCH take 2] pkt_sched: Fix qdisc_watchdog() vs. dev_deactivate() race
Message-ID: <20080920234843.GA2531@ami.dom.local>
References: <20080913205408.GA2545@ami.dom.local>
	<20080914061610.GA20571@gondor.apana.org.au>
	<20080914202715.GA2540@ami.dom.local>
	<20080920.002137.108837580.davem@davemloft.net>
In-Reply-To: <20080920.002137.108837580.davem@davemloft.net>
X-Mailing-List: netdev@vger.kernel.org

On Sat, Sep 20, 2008 at 12:21:37AM -0700, David Miller wrote:
...
> Let's look at what actually matters for cpu utilization.  These
> __qdisc_run() things are invoked in two situations where we might
> block on the hw queue being stopped:
>
> 1) When feeding packets into the qdisc in dev_queue_xmit().
>
>    Guess what?  We _know_ the queue this packet is going to
>    hit.
>
>    The only new thing we can possibly trigger and be interested
>    in at this specific point is if _this_ packet can be sent at
>    this time.
>
>    And we can check that queue mapping after the qdisc_enqueue_root()
>    call, so that multiq aware qdiscs can have made their changes.
>
> 2) When waking up a queue.  And here we should schedule the qdisc_run
>    _unconditionally_.
>
>    If the queue was full, it is extremely likely that new packets
>    are bound for that device queue.  There is no real savings to
>    be had by doing this peek/requeue/dequeue stuff.
>
>    The cpu utilization savings exist for case #1 only, and we can
>    implement the bypass logic _perfectly_ as described above.
>
>    For #2 there is nothing to check, just do it and see what comes
>    out of the qdisc.

Right, unless __netif_schedule() wasn't done when waking up. I've
thought about this because of another thread/patch around this
problem, and was misled by the scheduling in dev_requeue_skb(). Now I
think this could be the main reason for the high load. Anyway, if we
want to skip this check for #2, I think something like the patch
below is needed.

> I would suggest adding an skb pointer argument to qdisc_run().
> If it's NULL, unconditionally schedule __qdisc_run().  Else,
> only schedule if the TX queue indicated by skb_queue_mapping()
> is not stopped.
>
> dev_queue_xmit() will use the "pass the skb" case, but only if
> qdisc_enqueue_root()'s return value doesn't indicate that there
> is a potential drop.  On potential drop, we'll pass NULL to
> make sure we don't potentially reference a free'd SKB.
>
> The other case in net_tx_action() can always pass NULL to qdisc_run().

I'm not convinced #1 is useful for us: the skb could be #1000 in the
queue, and the tx status could change many times before this packet
reached #1, so why worry? This adds checks on the fast path for
something which is unlikely even if this skb were #1, and for any
later skb it's only a guess. IMHO, if we can't check the next skb to
be xmitted, it's better to skip this test entirely (which seems to be
safe with the patch below).

Jarek P.
--------------->

pkt_sched: dev_requeue_skb: Don't schedule if a queue is stopped

Doing __netif_schedule() while requeuing because of a stopped tx
queue, while skipping such a test in qdisc_run(), can cause a
requeuing loop with high cpu use until the queue is awakened.

Signed-off-by: Jarek Poplawski
---
 net/sched/sch_generic.c |   23 +++++++++++++++--------
 1 files changed, 15 insertions(+), 8 deletions(-)

diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
index ec0a083..bae2eb8 100644
--- a/net/sched/sch_generic.c
+++ b/net/sched/sch_generic.c
@@ -42,14 +42,17 @@ static inline int qdisc_qlen(struct Qdisc *q)
 	return q->q.qlen;
 }
 
-static inline int dev_requeue_skb(struct sk_buff *skb, struct Qdisc *q)
+static inline int dev_requeue_skb(struct sk_buff *skb, struct Qdisc *q,
+				  bool stopped)
 {
 	if (unlikely(skb->next))
 		q->gso_skb = skb;
 	else
 		q->ops->requeue(skb, q);
 
-	__netif_schedule(q);
+	if (!stopped)
+		__netif_schedule(q);
+
 	return 0;
 }
 
@@ -89,7 +92,7 @@ static inline int handle_dev_cpu_collision(struct sk_buff *skb,
 		 * some time.
 		 */
 		__get_cpu_var(netdev_rx_stat).cpu_collision++;
-		ret = dev_requeue_skb(skb, q);
+		ret = dev_requeue_skb(skb, q, false);
 	}
 
 	return ret;
@@ -121,6 +124,7 @@ static inline int qdisc_restart(struct Qdisc *q)
 	struct net_device *dev;
 	spinlock_t *root_lock;
 	struct sk_buff *skb;
+	bool stopped;
 
 	/* Dequeue packet */
 	if (unlikely((skb = dequeue_skb(q)) == NULL))
@@ -135,9 +139,13 @@ static inline int qdisc_restart(struct Qdisc *q)
 	txq = netdev_get_tx_queue(dev, skb_get_queue_mapping(skb));
 
 	HARD_TX_LOCK(dev, txq, smp_processor_id());
-	if (!netif_tx_queue_stopped(txq) &&
-	    !netif_tx_queue_frozen(txq))
+	if (!netif_tx_queue_stopped(txq) && !netif_tx_queue_frozen(txq)) {
 		ret = dev_hard_start_xmit(skb, dev, txq);
+		stopped = netif_tx_queue_stopped(txq) ||
+			  netif_tx_queue_frozen(txq);
+	} else {
+		stopped = true;
+	}
 	HARD_TX_UNLOCK(dev, txq);
 
 	spin_lock(root_lock);
@@ -159,12 +167,11 @@ static inline int qdisc_restart(struct Qdisc *q)
 			printk(KERN_WARNING "BUG %s code %d qlen %d\n",
 			       dev->name, ret, q->q.qlen);
 
-		ret = dev_requeue_skb(skb, q);
+		ret = dev_requeue_skb(skb, q, stopped);
 		break;
 	}
 
-	if (ret && (netif_tx_queue_stopped(txq) ||
-	    netif_tx_queue_frozen(txq)))
+	if (ret && stopped)
 		ret = 0;
 
 	return ret;
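
---------------

For illustration, a minimal, hypothetical sketch of the qdisc_run()
interface David suggests above, written against 2.6.27-era code: a
NULL skb (the net_tx_action() and queue-wakeup cases) schedules
__qdisc_run() unconditionally, while a non-NULL skb lets us skip the
run when its own tx queue is stopped or frozen. The extra dev argument
and the caller change are assumptions about the shape, not code from
this thread:

/* Hypothetical sketch only -- not a merged patch. */
static inline void qdisc_run(struct net_device *dev, struct Qdisc *q,
			     struct sk_buff *skb)
{
	if (skb) {
		struct netdev_queue *txq;

		/* We know exactly which tx queue this skb maps to,
		 * so there is no point running the qdisc when that
		 * queue cannot transmit right now anyway.
		 */
		txq = netdev_get_tx_queue(dev, skb_get_queue_mapping(skb));
		if (netif_tx_queue_stopped(txq) ||
		    netif_tx_queue_frozen(txq))
			return;
	}

	if (!test_and_set_bit(__QDISC_STATE_RUNNING, &q->state))
		__qdisc_run(q);
}

A caller in dev_queue_xmit() would then pass the skb only when the
enqueue did not indicate a potential drop, so a possibly freed skb is
never dereferenced:

	rc = qdisc_enqueue_root(skb, q);
	qdisc_run(dev, q, rc == NET_XMIT_SUCCESS ? skb : NULL);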