From patchwork Sun Nov 25 22:06:52 2012 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Woodhouse X-Patchwork-Id: 201575 X-Patchwork-Delegate: davem@davemloft.net Return-Path: X-Original-To: patchwork-incoming@ozlabs.org Delivered-To: patchwork-incoming@ozlabs.org Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by ozlabs.org (Postfix) with ESMTP id 869542C0082 for ; Mon, 26 Nov 2012 09:07:03 +1100 (EST) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753504Ab2KYWG5 (ORCPT ); Sun, 25 Nov 2012 17:06:57 -0500 Received: from merlin.infradead.org ([205.233.59.134]:45559 "EHLO merlin.infradead.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753466Ab2KYWG4 (ORCPT ); Sun, 25 Nov 2012 17:06:56 -0500 Received: from shinybook.infradead.org ([2001:8b0:10b:1:e6ce:8fff:fe1f:f2c0]) by merlin.infradead.org with esmtpsa (Exim 4.76 #1 (Red Hat Linux)) id 1TckLm-0005gY-40; Sun, 25 Nov 2012 22:06:54 +0000 Message-ID: <1353881212.26346.303.camel@shinybook.infradead.org> Subject: [PATCH v2] atm: br2684: Fix excessive queue bloat From: David Woodhouse To: netdev@vger.kernel.org, John Crispin , Dave =?ISO-8859-1?Q?T=E4ht?= , "Chas Williams (CONTRACTOR)" Date: Sun, 25 Nov 2012 22:06:52 +0000 X-Mailer: Evolution 3.6.1 (3.6.1-1.fc18) Mime-Version: 1.0 Sender: netdev-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org There's really no excuse for an additional wmem_default of buffering between the netdev queue and the ATM device. Two packets (one in-flight, and one ready to send) ought to be fine. It's not as if it should take long to get another from the netdev queue when we need it. If necessary we can make the queue space configurable later, but I don't think it's likely to be necessary. cf. commit 9d02daf754238adac48fa075ee79e7edd3d79ed3 (pppoatm: Fix excessive queue bloat) which did something very similar for PPPoATM. Note that there is a tremendously unlikely race condition which may result in qspace temporarily going negative. If a CPU running the br2684_pop() function goes off into the weeds for a long period of time after incrementing qspace to 1, but before calling netdev_wake_queue()... and another CPU ends up calling br2684_start_xmit() and *stopping* the queue again before the first CPU comes back, the netdev queue could end up being woken when qspace has already reached zero. An alternative approach to coping with this race would be to check in br2684_start_xmit() for qspace==0 and return NETDEV_TX_BUSY, but just using '> 0' and '< 1' for comparison instead of '== 0' and '!= 0' is simpler. It just warranted a mention of *why* we do it that way... Move the call to atmvcc->send() to happen *after* the accounting and potentially stopping the netdev queue, in br2684_xmit_vcc(). This matters if the ->send() call suffers an immediate failure, because it'll call br2684_pop() with the offending skb before returning. We want that to happen *after* we've done the initial accounting for the packet in question. Also make it return an appropriate success/failure indication while we're at it. Tested by running 'ping -l 1000 bottomless.aaisp.net.uk' from within my network, with only a single PPPoE-over-BR2684 link running. And after setting txqueuelen on the nas0 interface to something low (5, in fact). Before the patch, we'd see about 15 packets being queued and a resulting latency of ~56ms being reached. After the patch, we see only about 8, which is fairly much what we expect. And a max latency of ~36ms. On this OpenWRT box, wmem_default is 163840. Signed-off-by: David Woodhouse Reviewed-by: Krzysztof Mazur --- [v2: Comment format fixed] net/atm/br2684.c | 36 ++++++++++++++++++++++-------------- 1 file changed, 22 insertions(+), 14 deletions(-) diff --git a/net/atm/br2684.c b/net/atm/br2684.c index 4819d315..8eb6fbe 100644 --- a/net/atm/br2684.c +++ b/net/atm/br2684.c @@ -74,6 +74,7 @@ struct br2684_vcc { struct br2684_filter filter; #endif /* CONFIG_ATM_BR2684_IPFILTER */ unsigned int copies_needed, copies_failed; + atomic_t qspace; }; struct br2684_dev { @@ -181,18 +182,15 @@ static struct notifier_block atm_dev_notifier = { static void br2684_pop(struct atm_vcc *vcc, struct sk_buff *skb) { struct br2684_vcc *brvcc = BR2684_VCC(vcc); - struct net_device *net_dev = skb->dev; - pr_debug("(vcc %p ; net_dev %p )\n", vcc, net_dev); + pr_debug("(vcc %p ; net_dev %p )\n", vcc, brvcc->device); brvcc->old_pop(vcc, skb); - if (!net_dev) - return; - - if (atm_may_send(vcc, 0)) - netif_wake_queue(net_dev); - + /* If the queue space just went up from zero, wake */ + if (atomic_inc_return(&brvcc->qspace) == 1) + netif_wake_queue(brvcc->device); } + /* * Send a packet out a particular vcc. Not to useful right now, but paves * the way for multiple vcc's per itf. Returns true if we can send, @@ -256,16 +254,19 @@ static int br2684_xmit_vcc(struct sk_buff *skb, struct net_device *dev, ATM_SKB(skb)->atm_options = atmvcc->atm_options; dev->stats.tx_packets++; dev->stats.tx_bytes += skb->len; - atmvcc->send(atmvcc, skb); - if (!atm_may_send(atmvcc, 0)) { + if (atomic_dec_return(&brvcc->qspace) < 1) { + /* No more please! */ netif_stop_queue(brvcc->device); - /*check for race with br2684_pop*/ - if (atm_may_send(atmvcc, 0)) - netif_start_queue(brvcc->device); + /* We might have raced with br2684_pop() */ + if (unlikely(atomic_read(&brvcc->qspace) > 0)) + netif_wake_queue(brvcc->device); } - return 1; + /* If this fails immediately, the skb will be freed and br2684_pop() + will wake the queue if appropriate. Just return an error so that + the stats are updated correctly */ + return !atmvcc->send(atmvcc, skb); } static inline struct br2684_vcc *pick_outgoing_vcc(const struct sk_buff *skb, @@ -504,6 +505,13 @@ static int br2684_regvcc(struct atm_vcc *atmvcc, void __user * arg) brvcc = kzalloc(sizeof(struct br2684_vcc), GFP_KERNEL); if (!brvcc) return -ENOMEM; + /* + * Allow two packets in the ATM queue. One actually being sent, and one + * for the ATM 'TX done' handler to send. It shouldn't take long to get + * the next one from the netdev queue, when we need it. More than that + * would be bufferbloat. + */ + atomic_set(&brvcc->qspace, 2); write_lock_irq(&devs_lock); net_dev = br2684_find_dev(&be.ifspec); if (net_dev == NULL) {