[net-next,V3,0/3] Introduce adaptive TX interrupt moderation to net DIM


Message ID

1524566163-41563-1-git-send-email-talgi@mellanox.com

Headers

From: Tal Gilboa <talgi@mellanox.com>
To: David Miller <davem@davemloft.net>
Cc: netdev@vger.kernel.org, Tariq Toukan <tariqt@mellanox.com>,
	Tal Gilboa <talgi@mellanox.com>, Saeed Mahameed <saeedm@mellanox.com>,
	Florian Fainelli <f.fainelli@gmail.com>,
	Andy Gospodarek <andrew.gospodarek@broadcom.com>
Subject: [PATCH net-next V3 0/3] Introduce adaptive TX interrupt moderation
	to net DIM
Date: Tue, 24 Apr 2018 13:36:00 +0300
Message-Id: <1524566163-41563-1-git-send-email-talgi@mellanox.com>
Sender: netdev-owner@vger.kernel.org
Precedence: bulk

Series

Introduce adaptive TX interrupt moderation to net DIM

Message

Tal Gilboa April 24, 2018, 10:36 a.m. UTC

Net DIM is a library designed for dynamic interrupt moderation. It was
implemented and optimized with receive-side interrupts in mind, since these
are usually the CPU-expensive ones. This patch set introduces adaptive transmit
interrupt moderation to net DIM, complete with a usage in the mlx5e driver.
Using adaptive TX behavior reduces the interrupt rate in multiple scenarios.
Furthermore, it is essential for increasing bandwidth in cases where payload
aggregation is required.

v3: Remove "inline" from functions in .c files (requested by DaveM). Revert
the addition of the "enabled" field to struct net_dim and apply the mlx5e
structural changes (suggested by SaeedM).

v2: Rebase over proper tree.

v1: Fix compilation issues due to missed function renaming.

Tal Gilboa (3):
  net/dim: Rename *_get_profile() functions to *_get_rx_moderation()
  net/dim: Support adaptive TX moderation
  net/mlx5e: Enable adaptive-TX moderation

 drivers/net/ethernet/broadcom/bcmsysport.c         |  6 +-
 drivers/net/ethernet/broadcom/bnxt/bnxt_dim.c      |  8 +--
 drivers/net/ethernet/broadcom/genet/bcmgenet.c     |  6 +-
 drivers/net/ethernet/mellanox/mlx5/core/en.h       |  4 ++
 drivers/net/ethernet/mellanox/mlx5/core/en_dim.c   | 28 ++++++--
 .../net/ethernet/mellanox/mlx5/core/en_ethtool.c   | 35 +++++++---
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c  | 79 ++++++++++++++--------
 drivers/net/ethernet/mellanox/mlx5/core/en_txrx.c  | 37 +++++++---
 include/linux/net_dim.h                            | 69 ++++++++++++++-----
 9 files changed, 190 insertions(+), 82 deletions(-)
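
The adaptive behavior described in the cover letter can be pictured as a
small hill-climbing loop: sample the traffic rates, compare them with the
previous sample, and step the moderation level up or down. The following is
only a minimal sketch under stated assumptions, not the code from this
series. All names (sample_rates, tuner_step, tx_levels), the level values,
and the 10% noise threshold are hypothetical and chosen for illustration.

```c
#include <stdlib.h>

/* Hypothetical per-interval rates; names are illustrative only. */
struct sample_rates {
	int bytes_per_ms;
	int pkts_per_ms;
	int events_per_ms;	/* completion interrupts per ms */
};

enum verdict { VERDICT_SAME, VERDICT_BETTER, VERDICT_WORSE };

/* Illustrative TX moderation levels: {timer in usec, max frames}. */
struct moderation {
	int usec;
	int max_frames;
};

static const struct moderation tx_levels[] = {
	{ 1, 128 }, { 8, 128 }, { 32, 128 }, { 64, 128 }, { 128, 128 },
};

#define NUM_LEVELS ((int)(sizeof(tx_levels) / sizeof(tx_levels[0])))

/* Assumed noise threshold: changes of 10% or less are ignored. */
static int is_significant(int cur, int prev)
{
	if (prev == 0)
		return cur != 0;
	return (100L * labs((long)cur - prev)) / prev > 10;
}

/* More bytes or packets is better; at equal throughput, fewer interrupts. */
static enum verdict compare_samples(const struct sample_rates *prev,
				    const struct sample_rates *cur)
{
	if (is_significant(cur->bytes_per_ms, prev->bytes_per_ms))
		return cur->bytes_per_ms > prev->bytes_per_ms ?
			VERDICT_BETTER : VERDICT_WORSE;
	if (is_significant(cur->pkts_per_ms, prev->pkts_per_ms))
		return cur->pkts_per_ms > prev->pkts_per_ms ?
			VERDICT_BETTER : VERDICT_WORSE;
	if (is_significant(cur->events_per_ms, prev->events_per_ms))
		return cur->events_per_ms < prev->events_per_ms ?
			VERDICT_BETTER : VERDICT_WORSE;
	return VERDICT_SAME;
}

struct tuner {
	int level;		/* index into tx_levels */
	int dir;		/* +1: more moderation, -1: less */
	struct sample_rates prev;
};

/* Take one sample, possibly move one level, and return the new level. */
static int tuner_step(struct tuner *t, struct sample_rates cur)
{
	enum verdict v = compare_samples(&t->prev, &cur);

	if (v == VERDICT_WORSE)
		t->dir = -t->dir;	/* last move hurt: turn around */
	if (v != VERDICT_SAME) {
		int next = t->level + t->dir;

		if (next >= 0 && next < NUM_LEVELS)
			t->level = next;
	}
	t->prev = cur;
	return t->level;
}
```

A caller would keep one struct tuner per queue, feed it a fresh sample on
each measurement interval, and program tx_levels[level] into the hardware
whenever the returned level changes.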

Comments

David Miller April 24, 2018, 2:18 p.m. UTC | #1

From: Tal Gilboa <talgi@mellanox.com>
Date: Tue, 24 Apr 2018 13:36:00 +0300

> Net DIM is a library designed for dynamic interrupt moderation. It was
> implemented and optimized with receive-side interrupts in mind, since these
> are usually the CPU-expensive ones. This patch set introduces adaptive transmit
> interrupt moderation to net DIM, complete with a usage in the mlx5e driver.
> Using adaptive TX behavior reduces the interrupt rate in multiple scenarios.
> Furthermore, it is essential for increasing bandwidth in cases where payload
> aggregation is required.
> 
> v3: Remove "inline" from functions in .c files (requested by DaveM). Revert
> the addition of the "enabled" field to struct net_dim and apply the mlx5e
> structural changes (suggested by SaeedM).
> 
> v2: Rebase over proper tree.
> 
> v1: Fix compilation issues due to missed function renaming.

I have no problem with this, series applied, thanks.

Although I have to say that I've always been suspicious of adaptive moderation
schemes, especially if implemented in software.

My thinking was that at these kinds of link speeds, the conditions of the link
change so fast that whatever state you've measured changes by the time you
commit new settings to the chip.

It obviously helps, so I must be missing some piece of the puzzle in my mental
analysis :-)

Andy Gospodarek April 24, 2018, 3:02 p.m. UTC | #2

On Tue, Apr 24, 2018 at 10:18:09AM -0400, David Miller wrote:
> From: Tal Gilboa <talgi@mellanox.com>
> Date: Tue, 24 Apr 2018 13:36:00 +0300
> 
> > Net DIM is a library designed for dynamic interrupt moderation. It was
> > implemented and optimized with receive-side interrupts in mind, since these
> > are usually the CPU-expensive ones. This patch set introduces adaptive transmit
> > interrupt moderation to net DIM, complete with a usage in the mlx5e driver.
> > Using adaptive TX behavior reduces the interrupt rate in multiple scenarios.
> > Furthermore, it is essential for increasing bandwidth in cases where payload
> > aggregation is required.
> > 
> > v3: Remove "inline" from functions in .c files (requested by DaveM). Revert
> > the addition of the "enabled" field to struct net_dim and apply the mlx5e
> > structural changes (suggested by SaeedM).
> > 
> > v2: Rebase over proper tree.
> > 
> > v1: Fix compilation issues due to missed function renaming.
> 
> I have no problem with this, series applied, thanks.
> 
> Although I have to say that I've always been suspicious of adaptive moderation
> schemes, especially if implemented in software.
> 
> My thinking was that at these kinds of link speeds, the conditions of the link
> change so fast that whatever state you've measured changes by the time you
> commit new settings to the chip.
> 
> It obviously helps, so I must be missing some piece of the puzzle in my mental
> analysis :-)

You are definitely correct that there are many cases where sessions are
so short that, by the time a measurement is made and the settings are
modified, conditions can change.

What I found when adding this to the bnxt_en driver was that for
longer-running sessions/transfers (flows lasting seconds, not
milliseconds) the adjustment can happen pretty quickly, and you get a nice
reduction in CPU utilization for the duration of that transfer.
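
A rough way to see why flow duration matters is to compare the time the
tuner needs to converge with the lifetime of the flow. The helper below is
only a back-of-the-envelope sketch. The function name and its parameters
(number of samples to converge, length of one sample) are assumptions made
for illustration, not measurements from bnxt_en or mlx5e.

```c
/*
 * Share of a flow's lifetime, in permille, spent at the tuned setting,
 * assuming the tuner needs `steps` samples of `sample_ms` each to
 * converge. Returns 0 when the flow ends before convergence.
 */
static unsigned long tuned_permille(unsigned long flow_ms,
				    unsigned long steps,
				    unsigned long sample_ms)
{
	unsigned long converge_ms = steps * sample_ms;

	if (flow_ms <= converge_ms)
		return 0;
	return (flow_ms - converge_ms) * 1000UL / flow_ms;
}
```

With a hypothetical 4 samples of 1 ms each, a 5-second flow spends nearly
all of its lifetime at the tuned setting (tuned_permille(5000, 4, 1) gives
999), while a 2 ms flow ends before the tuner settles
(tuned_permille(2, 4, 1) gives 0).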

There is also an advantage: since this is done on a per-queue basis, one
queue that may be handling a bulk transfer can have its coalescing
parameters adjusted while others stay at a setting that keeps traffic
flowing at low latency.  This is helpful when a system is receiving a
large amount of traffic on one queue but also sending data on another
queue, and quick processing of acks keeps data flowing at a high rate
with low CPU utilization in both directions.

David Miller April 24, 2018, 3:08 p.m. UTC | #3

From: Andy Gospodarek <andrew.gospodarek@broadcom.com>
Date: Tue, 24 Apr 2018 11:02:46 -0400

> There is also an advantage: since this is done on a per-queue basis, one
> queue that may be handling a bulk transfer can have its coalescing
> parameters adjusted while others stay at a setting that keeps traffic
> flowing at low latency.  This is helpful when a system is receiving a
> large amount of traffic on one queue but also sending data on another
> queue, and quick processing of acks keeps data flowing at a high rate
> with low CPU utilization in both directions.

Ok, that's the missing piece on my end.  My original analysis of this
problem space was on uni-queue NICs, and the problem there is that the
sampling algorithm is exposed to the traffic anomalies of the entire
link rather than a specific sub-class of traffic as is the case with
multiqueue NICs.
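
The difference between sampling the whole link and sampling each queue can
be shown with a toy model. This is a sketch under stated assumptions: the
rate thresholds, the timer values, and the function names are all
hypothetical, and a real tuner would adapt over time rather than map a rate
directly to a timer.

```c
#include <stddef.h>

/* Illustrative mapping from packet rate to an interrupt timer (usec). */
static unsigned int usec_for_rate(unsigned int pkts_per_ms)
{
	if (pkts_per_ms >= 500)
		return 128;	/* bulk traffic: coalesce aggressively */
	if (pkts_per_ms >= 50)
		return 32;
	return 1;		/* sparse traffic: keep latency low */
}

/* Uni-queue NIC: one sampler sees the sum of every flow on the link. */
static unsigned int uniqueue_usec(const unsigned int *flow_rates, size_t n)
{
	unsigned int total = 0;
	size_t i;

	for (i = 0; i < n; i++)
		total += flow_rates[i];
	return usec_for_rate(total);
}

/* Multiqueue NIC: each queue is sampled and tuned on its own. */
static void multiqueue_usec(const unsigned int *queue_rates, size_t n,
			    unsigned int *usec_out)
{
	size_t i;

	for (i = 0; i < n; i++)
		usec_out[i] = usec_for_rate(queue_rates[i]);
}
```

Given a bulk flow at 800 packets/ms and an ack flow at 10 packets/ms, the
uni-queue model picks the 128 usec timer for both, delaying the acks. The
per-queue model gives the bulk queue 128 usec and leaves the ack queue at
1 usec.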