[net-next,V3,0/3] Introduce adaptive TX interrupt moderation to net DIM

Message ID 1524566163-41563-1-git-send-email-talgi@mellanox.com

Tal Gilboa April 24, 2018, 10:36 a.m. UTC
Net DIM is a library designed for dynamic interrupt moderation. It was
implemented and optimized with receive-side interrupts in mind, since these
are usually the CPU-expensive ones. This patch-set introduces adaptive transmit
interrupt moderation to net DIM, together with its use in the mlx5e driver.
Using adaptive TX behavior would reduce the interrupt rate in multiple scenarios.
Furthermore, it is essential for increasing bandwidth in cases where payload
aggregation is required.

v3: Remove "inline" from functions in .c files (requested by DaveM). Revert
the addition of the "enabled" field to struct net_dim and apply the mlx5e
structural changes suggested by SaeedM.

v2: Rebase over proper tree.

v1: Fix compilation issues due to missed function renaming.

Tal Gilboa (3):
  net/dim: Rename *_get_profile() functions to *_get_rx_moderation()
  net/dim: Support adaptive TX moderation
  net/mlx5e: Enable adaptive-TX moderation

 drivers/net/ethernet/broadcom/bcmsysport.c         |  6 +-
 drivers/net/ethernet/broadcom/bnxt/bnxt_dim.c      |  8 +--
 drivers/net/ethernet/broadcom/genet/bcmgenet.c     |  6 +-
 drivers/net/ethernet/mellanox/mlx5/core/en.h       |  4 ++
 drivers/net/ethernet/mellanox/mlx5/core/en_dim.c   | 28 ++++++--
 .../net/ethernet/mellanox/mlx5/core/en_ethtool.c   | 35 +++++++---
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c  | 79 ++++++++++++++--------
 drivers/net/ethernet/mellanox/mlx5/core/en_txrx.c  | 37 +++++++---
 include/linux/net_dim.h                            | 69 ++++++++++++++-----
 9 files changed, 190 insertions(+), 82 deletions(-)

Comments

David Miller April 24, 2018, 2:18 p.m. UTC | #1
From: Tal Gilboa <talgi@mellanox.com>
Date: Tue, 24 Apr 2018 13:36:00 +0300

> Net DIM is a library designed for dynamic interrupt moderation. It was
> implemented and optimized with receive-side interrupts in mind, since these
> are usually the CPU-expensive ones. This patch-set introduces adaptive transmit
> interrupt moderation to net DIM, together with its use in the mlx5e driver.
> Using adaptive TX behavior would reduce the interrupt rate in multiple scenarios.
> Furthermore, it is essential for increasing bandwidth in cases where payload
> aggregation is required.
> 
> v3: Remove "inline" from functions in .c files (requested by DaveM). Revert
> the addition of the "enabled" field to struct net_dim and apply the mlx5e
> structural changes suggested by SaeedM.
> 
> v2: Rebase over proper tree.
> 
> v1: Fix compilation issues due to missed function renaming.

I have no problem with this, series applied, thanks.

Although I have to say that I've always been suspicious of adaptive moderation
schemes, especially if implemented in software.

My thinking was that at these kinds of link speeds, the conditions of the link
change so fast that whatever state you've measured changes by the time you
commit new settings to the chip.

It obviously helps, so I must be missing some piece of the puzzle in my mental
analysis :-)

Andy Gospodarek April 24, 2018, 3:02 p.m. UTC | #2
On Tue, Apr 24, 2018 at 10:18:09AM -0400, David Miller wrote:
> From: Tal Gilboa <talgi@mellanox.com>
> Date: Tue, 24 Apr 2018 13:36:00 +0300
> 
> > Net DIM is a library designed for dynamic interrupt moderation. It was
> > implemented and optimized with receive-side interrupts in mind, since these
> > are usually the CPU-expensive ones. This patch-set introduces adaptive transmit
> > interrupt moderation to net DIM, together with its use in the mlx5e driver.
> > Using adaptive TX behavior would reduce the interrupt rate in multiple scenarios.
> > Furthermore, it is essential for increasing bandwidth in cases where payload
> > aggregation is required.
> > 
> > v3: Remove "inline" from functions in .c files (requested by DaveM). Revert
> > the addition of the "enabled" field to struct net_dim and apply the mlx5e
> > structural changes suggested by SaeedM.
> > 
> > v2: Rebase over proper tree.
> > 
> > v1: Fix compilation issues due to missed function renaming.
> 
> I have no problem with this, series applied, thanks.
> 
> Although I have to say that I've always been suspicious of adaptive moderation
> schemes, especially if implemented in software.
> 
> My thinking was that at these kinds of link speeds, the conditions of the link
> change so fast that whatever state you've measured changes by the time you
> commit new settings to the chip.
> 
> It obviously helps, so I must be missing some piece of the puzzle in my mental
> analysis :-)

You are definitely correct that there are many cases where sessions are
so short that conditions can change by the time a measurement is made
and the settings are modified.

What I found when adding this to the bnxt_en driver was that for
longer-running sessions/transfers (flows lasting seconds, not
milliseconds) the adjustment can happen pretty quickly, and you get a
nice reduction in CPU utilization for the duration of that transfer.

There is also an advantage in that, since this is done on a per-queue
basis, one queue that may be handling a bulk transfer can have its
coalescing parameters adjusted while others stay at a setting that keeps
traffic flowing at low latency.  This is helpful when a system is
receiving a large amount of traffic on one queue but also sending data on
another queue, where quick processing of acks keeps data flowing at a
high rate with low CPU utilization in both directions.

David Miller April 24, 2018, 3:08 p.m. UTC | #3
From: Andy Gospodarek <andrew.gospodarek@broadcom.com>
Date: Tue, 24 Apr 2018 11:02:46 -0400

> There is also an advantage in that, since this is done on a per-queue
> basis, one queue that may be handling a bulk transfer can have its
> coalescing parameters adjusted while others stay at a setting that keeps
> traffic flowing at low latency.  This is helpful when a system is
> receiving a large amount of traffic on one queue but also sending data on
> another queue, where quick processing of acks keeps data flowing at a
> high rate with low CPU utilization in both directions.

Ok, that's the missing piece on my end.  My original analysis of this
problem space was on uni-queue NICs, and the problem there is that the
sampling algorithm is exposed to the traffic anomalies of the entire
link rather than a specific sub-class of traffic as is the case with
multiqueue NICs.
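
The per-queue point can be illustrated with a minimal sketch in C. This is
not the net DIM algorithm, which compares successive samples and walks
through its profile table; here a simple threshold rule stands in for it.
The profile values, the thresholds, and all names (mod_profile, queue_dim,
dim_step) are hypothetical and chosen only for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical moderation profiles, ordered from low latency (index 0)
 * to heavy aggregation. The values are illustrative only. */
struct mod_profile {
	uint16_t usec;        /* interrupt delay */
	uint16_t max_frames;  /* frames coalesced per interrupt */
};

static const struct mod_profile profiles[] = {
	{ 1, 1 }, { 8, 16 }, { 32, 64 }, { 64, 128 }, { 128, 256 },
};
#define NUM_PROFILES (sizeof(profiles) / sizeof(profiles[0]))

/* Assumed thresholds, in bytes per millisecond of sampling window. */
#define BULK_BYTES_PER_MS 100000u
#define IDLE_BYTES_PER_MS 10000u

/* Moderation state kept per sampling domain: one per queue on a
 * multiqueue NIC, or a single one for the whole link otherwise. */
struct queue_dim {
	size_t profile_ix;
};

/* Feed one sample window into the state and move at most one profile
 * step: up for bulk traffic, down for near-idle traffic. */
static size_t dim_step(struct queue_dim *q, uint64_t bytes_per_ms)
{
	if (bytes_per_ms >= BULK_BYTES_PER_MS) {
		if (q->profile_ix + 1 < NUM_PROFILES)
			q->profile_ix++;
	} else if (bytes_per_ms < IDLE_BYTES_PER_MS) {
		if (q->profile_ix > 0)
			q->profile_ix--;
	}
	return q->profile_ix;
}
```

With one queue_dim per queue, a queue carrying a bulk transfer climbs
toward the aggregating profiles while a queue carrying only acks stays at
profile 0. With a single queue_dim fed the sum of both rates, the ack
traffic inherits the bulk setting, which is the single-queue situation
described above.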