From patchwork Fri Dec 16 04:27:56 2011 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Eric Dumazet X-Patchwork-Id: 131779 X-Patchwork-Delegate: davem@davemloft.net Return-Path: X-Original-To: patchwork-incoming@ozlabs.org Delivered-To: patchwork-incoming@ozlabs.org Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by ozlabs.org (Postfix) with ESMTP id 5D16A1007D4 for ; Fri, 16 Dec 2011 15:28:08 +1100 (EST) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1758761Ab1LPE2B (ORCPT ); Thu, 15 Dec 2011 23:28:01 -0500 Received: from mail-we0-f174.google.com ([74.125.82.174]:44157 "EHLO mail-we0-f174.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1754042Ab1LPE2A (ORCPT ); Thu, 15 Dec 2011 23:28:00 -0500 Received: by werm1 with SMTP id m1so274785wer.19 for ; Thu, 15 Dec 2011 20:27:59 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=message-id:subject:from:to:cc:date:in-reply-to:references :content-type:x-mailer:content-transfer-encoding:mime-version; bh=hq08VWIF3odXEcVDW6uLN6aYjEnlXg3myhsoQ3baq8o=; b=s8U2WfiQjUVD0boZXG16oPsiGD/KTzzSwMn0EJFPwfhILF5Lk3egQ3+TcXZOUvzHrA rbyWX8pxCTrK6zIqATqmGvNQnFitEVXg1H/QNHSdEv3LjBKdrfiq36fkKVxdjkBfPt/n sR4N9uUjbeSJXgLhvZPWCayckufR9juKTbJys= Received: by 10.216.139.140 with SMTP id c12mr2598963wej.26.1324009679258; Thu, 15 Dec 2011 20:27:59 -0800 (PST) Received: from [192.168.1.21] (94.144.72.86.rev.sfr.net. [86.72.144.94]) by mx.google.com with ESMTPS id di5sm11425146wib.3.2011.12.15.20.27.57 (version=SSLv3 cipher=OTHER); Thu, 15 Dec 2011 20:27:58 -0800 (PST) Message-ID: <1324009676.2562.9.camel@edumazet-laptop> Subject: Re: twice past the taps, thence out to net? From: Eric Dumazet To: Rick Jones Cc: Stephen Hemminger , Vijay Subramanian , tcpdump-workers@lists.tcpdump.org, netdev@vger.kernel.org, Matthew Vick , Jeff Kirsher Date: Fri, 16 Dec 2011 05:27:56 +0100 In-Reply-To: <4EEA730E.5010405@hp.com> References: <4EE8F884.1010304@hp.com> <1323970998.2769.18.camel@edumazet-HP-Compaq-6005-Pro-SFF-PC> <4EEA3D58.5010101@hp.com> <20111215104440.1eef9e47@s6510.linuxnetplumber.net> <1323975606.2769.24.camel@edumazet-HP-Compaq-6005-Pro-SFF-PC> <4EEA730E.5010405@hp.com> X-Mailer: Evolution 3.2.1- Mime-Version: 1.0 Sender: netdev-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org Le jeudi 15 décembre 2011 à 14:22 -0800, Rick Jones a écrit : > On 12/15/2011 11:00 AM, Eric Dumazet wrote: > >> Device's work better if the driver proactively manages stop_queue/wake_queue. > >> Old devices used TX_BUSY, but newer devices tend to manage the queue > >> themselves. > >> > > > > Some 'new' drivers like igb can be fooled in case skb is gso segmented ? > > > > Because igb_xmit_frame_ring() needs skb_shinfo(skb)->nr_frags + 4 > > descriptors, igb should stop its queue not at MAX_SKB_FRAGS + 4, but > > MAX_SKB_FRAGS*4 > > > > diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c > > index 89d576c..989da36 100644 > > --- a/drivers/net/ethernet/intel/igb/igb_main.c > > +++ b/drivers/net/ethernet/intel/igb/igb_main.c > > @@ -4370,7 +4370,7 @@ netdev_tx_t igb_xmit_frame_ring(struct sk_buff *skb, > > igb_tx_map(tx_ring, first, hdr_len); > > > > /* Make sure there is space in the ring for the next send. */ > > - igb_maybe_stop_tx(tx_ring, MAX_SKB_FRAGS + 4); > > + igb_maybe_stop_tx(tx_ring, MAX_SKB_FRAGS * 4); > > > > return NETDEV_TX_OK; > > > Is there a minimum transmit queue length here? I get the impression > that MAX_SKB_FRAGS is at least 16 and is 18 on a system with 4096 byte > pages. The previous addition then would be OK so long as the TX queue > was always at least 22 entries in size, but now it would have to always > be at least 72? > > I guess things are "OK" at the moment: > > raj@tardy:~/net-next/drivers/net/ethernet/intel/igb$ grep IGB_MIN_TXD *.[ch] > igb_ethtool.c: new_tx_count = max_t(u16, new_tx_count, IGB_MIN_TXD); > igb.h:#define IGB_MIN_TXD 80 > > but is that getting a little close? > > rick jones Sure ! I only pointed out a possible problem, and not gave a full patch, since we also need to change the opposite threshold (when we XON the queue at TX completion) You can see its not even consistent with the minimum for a single TSO frame ! Most probably your high requeue numbers come from this too low value given the real requirements of the hardware (4 + nr_frags descriptors per skb) /* How many Tx Descriptors do we need to call netif_wake_queue ? */ #define IGB_TX_QUEUE_WAKE 16 Maybe we should CC Intel guys Could you try following patch ? Thanks ! --- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html diff --git a/drivers/net/ethernet/intel/igb/igb.h b/drivers/net/ethernet/intel/igb/igb.h index c69feeb..93ce118 100644 --- a/drivers/net/ethernet/intel/igb/igb.h +++ b/drivers/net/ethernet/intel/igb/igb.h @@ -51,8 +51,8 @@ struct igb_adapter; /* TX/RX descriptor defines */ #define IGB_DEFAULT_TXD 256 #define IGB_DEFAULT_TX_WORK 128 -#define IGB_MIN_TXD 80 -#define IGB_MAX_TXD 4096 +#define IGB_MIN_TXD max_t(unsigned, 80U, IGB_TX_QUEUE_WAKE * 2) +#define IGB_MAX_TXD 4096 #define IGB_DEFAULT_RXD 256 #define IGB_MIN_RXD 80 @@ -121,8 +121,11 @@ struct vf_data_storage { #define IGB_RXBUFFER_16384 16384 #define IGB_RX_HDR_LEN IGB_RXBUFFER_512 -/* How many Tx Descriptors do we need to call netif_wake_queue ? */ -#define IGB_TX_QUEUE_WAKE 16 +/* How many Tx Descriptors should be available + * before calling netif_wake_subqueue() ? + */ +#define IGB_TX_QUEUE_WAKE (MAX_SKB_FRAGS * 4) + /* How many Rx Buffers do we bundle into one write to the hardware ? */ #define IGB_RX_BUFFER_WRITE 16 /* Must be power of 2 */