From patchwork Thu Sep 7 12:05:51 2017 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Michael, Alice" X-Patchwork-Id: 811127 X-Patchwork-Delegate: jeffrey.t.kirsher@intel.com Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (mailfrom) smtp.mailfrom=osuosl.org (client-ip=140.211.166.136; helo=silver.osuosl.org; envelope-from=intel-wired-lan-bounces@osuosl.org; receiver=) Received: from silver.osuosl.org (smtp3.osuosl.org [140.211.166.136]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 3xpBRD2DpPz9sMN for ; Fri, 8 Sep 2017 06:11:04 +1000 (AEST) Received: from localhost (localhost [127.0.0.1]) by silver.osuosl.org (Postfix) with ESMTP id DAC1330574; Thu, 7 Sep 2017 20:11:02 +0000 (UTC) X-Virus-Scanned: amavisd-new at osuosl.org Received: from silver.osuosl.org ([127.0.0.1]) by localhost (.osuosl.org [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id H8DEOdwSPntb; Thu, 7 Sep 2017 20:11:01 +0000 (UTC) Received: from ash.osuosl.org (ash.osuosl.org [140.211.166.34]) by silver.osuosl.org (Postfix) with ESMTP id E71FD3057B; Thu, 7 Sep 2017 20:11:01 +0000 (UTC) X-Original-To: intel-wired-lan@lists.osuosl.org Delivered-To: intel-wired-lan@lists.osuosl.org Received: from fraxinus.osuosl.org (smtp4.osuosl.org [140.211.166.137]) by ash.osuosl.org (Postfix) with ESMTP id ADFE21CEB4A for ; Thu, 7 Sep 2017 20:11:00 +0000 (UTC) Received: from localhost (localhost [127.0.0.1]) by fraxinus.osuosl.org (Postfix) with ESMTP id A5D5C85910 for ; Thu, 7 Sep 2017 20:11:00 +0000 (UTC) X-Virus-Scanned: amavisd-new at osuosl.org Received: from fraxinus.osuosl.org ([127.0.0.1]) by localhost (.osuosl.org [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id cIdBkYVHSziL for ; Thu, 7 Sep 2017 20:10:59 +0000 (UTC) X-Greylist: domain auto-whitelisted by SQLgrey-1.7.6 Received: from mga05.intel.com (mga05.intel.com [192.55.52.43]) by fraxinus.osuosl.org (Postfix) with ESMTPS id 58F4987A44 for ; Thu, 7 Sep 2017 20:10:59 +0000 (UTC) Received: from orsmga001.jf.intel.com ([10.7.209.18]) by fmsmga105.fm.intel.com with ESMTP; 07 Sep 2017 13:10:58 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos; i="5.42,360,1500966000"; d="scan'208"; a="1170126580" Received: from unknown (HELO localhost.jf.intel.com) ([10.166.16.121]) by orsmga001.jf.intel.com with ESMTP; 07 Sep 2017 13:10:58 -0700 From: Alice Michael To: alice.michael@intel.com, intel-wired-lan@lists.osuosl.org Date: Thu, 7 Sep 2017 08:05:51 -0400 Message-Id: <20170907120556.45699-6-alice.michael@intel.com> X-Mailer: git-send-email 2.9.4 In-Reply-To: <20170907120556.45699-1-alice.michael@intel.com> References: <20170907120556.45699-1-alice.michael@intel.com> Subject: [Intel-wired-lan] [next PATCH S80-V3 06/11] i40e/i40evf: bump tail only in multiples of 8 X-BeenThere: intel-wired-lan@osuosl.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Intel Wired Ethernet Linux Kernel Driver Development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , MIME-Version: 1.0 Errors-To: intel-wired-lan-bounces@osuosl.org Sender: "Intel-wired-lan" From: Jacob Keller Hardware only fetches descriptors on cachelines of 8, essentially ignoring the lower 3 bits of the tail register. Thus, it is pointless to bump tail by an unaligned access as the hardware will ignore some of the new descriptors we allocated. Thus, it's ideal if we can ensure tail writes are always aligned to 8. At first, it seems like we'd already do this, since we allocate descriptors in batches which are a multiple of 8. Since we'd always increment by a multiple of 8, it seems like the value should always be aligned. However, this ignores allocation failures. If we fail to allocate a buffer, our tail register will become unaligned. Once it has become unaligned it will essentially be stuck unaligned until a buffer allocation happens to fail at the exact amount necessary to re-align it. We can do better, by simply rounding down the number of buffers we're about to allocate (cleaned_count) such that "next_to_clean + cleaned_count" is rounded to the nearest multiple of 8. We do this by calculating how far off that value is and subtracting it from the cleaned_count. This essentially defers allocation of buffers if they're going to be ignored by hardware anyways, and re-aligns our next_to_use and tail values after a failure to allocate a descriptor. This calculation ensures that we always align the tail writes in a way the hardware expects and don't unnecessarily allocate buffers which won't be fetched immediately. Signed-off-by: Jacob Keller Tested-by: Andrew Bowers --- drivers/net/ethernet/intel/i40e/i40e_txrx.c | 9 +++++++++ drivers/net/ethernet/intel/i40evf/i40e_txrx.c | 9 +++++++++ 2 files changed, 18 insertions(+) diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.c b/drivers/net/ethernet/intel/i40e/i40e_txrx.c index 0e9a910..94311e3 100644 --- a/drivers/net/ethernet/intel/i40e/i40e_txrx.c +++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.c @@ -1372,6 +1372,15 @@ bool i40e_alloc_rx_buffers(struct i40e_ring *rx_ring, u16 cleaned_count) union i40e_rx_desc *rx_desc; struct i40e_rx_buffer *bi; + /* Hardware only fetches new descriptors in cache lines of 8, + * essentially ignoring the lower 3 bits of the tail register. We want + * to ensure our tail writes are aligned to avoid unnecessary work. We + * can't simply round down the cleaned count, since we might fail to + * allocate some buffers. What we really want is to ensure that + * next_to_used + cleaned_count produces an aligned value. + */ + cleaned_count -= (ntu + cleaned_count) & 0x7; + /* do nothing if no valid netdev defined */ if (!rx_ring->netdev || !cleaned_count) return false; diff --git a/drivers/net/ethernet/intel/i40evf/i40e_txrx.c b/drivers/net/ethernet/intel/i40evf/i40e_txrx.c index 4ae054d..212fc1f 100644 --- a/drivers/net/ethernet/intel/i40evf/i40e_txrx.c +++ b/drivers/net/ethernet/intel/i40evf/i40e_txrx.c @@ -711,6 +711,15 @@ bool i40evf_alloc_rx_buffers(struct i40e_ring *rx_ring, u16 cleaned_count) union i40e_rx_desc *rx_desc; struct i40e_rx_buffer *bi; + /* Hardware only fetches new descriptors in cache lines of 8, + * essentially ignoring the lower 3 bits of the tail register. We want + * to ensure our tail writes are aligned to avoid unnecessary work. We + * can't simply round down the cleaned count, since we might fail to + * allocate some buffers. What we really want is to ensure that + * next_to_used + cleaned_count produces an aligned value. + */ + cleaned_count -= (ntu + cleaned_count) & 0x7; + /* do nothing if no valid netdev defined */ if (!rx_ring->netdev || !cleaned_count) return false;