[pull,request,net-next,00/11] Mellanox, mlx5 tc flow handling for concurrent execution (Part 3/3)

Message ID 20190821232806.21847-1-saeedm@mellanox.com
State Accepted
Delegated to: David Miller

Pull-request

git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux.git tags/mlx5-updates-2019-08-21

Message

Saeed Mahameed Aug. 21, 2019, 11:28 p.m. UTC
Hi Dave,

This series, mostly from Vlad, is the third and last part of a 3-part
series that improves mlx5 tc flow handling by removing the dependency on
rtnl_lock and providing more fine-grained locking and RCU-safe data
structures, so that tc flows can be handled concurrently.

1) In this part Vlad handles mlx5e neigh offloads for concurrent
execution.

2) Vlad, with Dmytro's help, adds 3 new mlx5 tracepoints to track mlx5
 tc flower requests and neigh updates.

3) Added mlx5 documentation for the new tracepoints.

For more information please see tag log below.

Please pull and let me know if there is any problem.

Thanks,
Saeed.

---
The following changes since commit 2b9b5e74507fe8e6146b048c0dadbe2fe7b298e5:

  net: stmmac: dwc-qos: use devm_platform_ioremap_resource() to simplify code (2019-08-21 13:52:34 -0700)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux.git tags/mlx5-updates-2019-08-21

for you to fetch changes up to 5970882a2510e8bffaef518a82ea207798187a93:

  net/mlx5e: Add trace point for neigh update (2019-08-21 15:55:18 -0700)

----------------------------------------------------------------
mlx5 tc flow handling for concurrent execution (Part 3)

This series includes updates to mlx5 ethernet and core driver:

Vlad submits part 3 of 3 part series to allow TC flow handling
for concurrent execution.

Vlad says:
==========

Structure mlx5e_neigh_hash_entry and the code that uses it are
refactored in the following ways:

- Extend neigh_hash_entry with rcu and modify its users to always take a
  reference to the structure when using it (neigh_hash_entry already had
  an atomic reference counter, but it was only used when scheduling neigh
  update on a workqueue from the atomic context of the neigh update
  netevent).

- Always use mlx5e_neigh_update_table->encap_lock when modifying the
  neigh update hash table and list. Originally, this lock was only used
  to synchronize with the netevent handler function, which is called
  from bh context and cannot use rtnl lock for synchronization. Use rcu
  read lock instead of encap_lock to look up nhe in the atomic context
  of the netevent event handler function. Convert encap_lock to a mutex
  to allow creating new neigh hash entries while holding it, which is
  safe to do because the lock is no longer used in atomic context.

- Rcu-ify mlx5e_neigh_hash_entry->encap_list by changing operations on
  encap list to their rcu counterparts and extending encap structure
  with rcu_head to free the encap instances after rcu grace period. This
  allows fast traversal of list of encaps attached to nhe under rcu read
  lock protection.

- Take encap_table_lock when accessing encap entries in neigh update and
  neigh stats update code to protect from concurrent encap entry
  insertion or removal.
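
The "always take a reference" rule in the first item can be illustrated
with a small user-space sketch. It is not the driver's code: struct
entry, entry_hold(), entry_put() and entry_lookup() are hypothetical
names, C11 atomics stand in for the kernel's reference counter, and the
rcu read-side protection and deferred free are only hinted at by the
"freed" flag. The point it shows is that a lookup hands out an entry
only if it can raise a non-zero count, so an entry that is already being
torn down is never revived.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a neigh hash entry; not the driver's struct. */
struct entry {
    atomic_int refcnt;  /* 0 means the entry is being torn down */
    int key;
    bool freed;         /* stands in for the free after a grace period */
};

/* Take a reference only if the entry is still live (count > 0). */
static bool entry_hold(struct entry *e)
{
    int old = atomic_load(&e->refcnt);

    while (old > 0) {
        if (atomic_compare_exchange_weak(&e->refcnt, &old, old + 1))
            return true;
        /* A failed exchange reloaded 'old'; loop and re-check it. */
    }
    return false;
}

/* Drop a reference; the final put marks the entry as released. */
static void entry_put(struct entry *e)
{
    if (atomic_fetch_sub(&e->refcnt, 1) == 1)
        e->freed = true;
}

/* Return the matching entry with a reference held, or NULL. */
static struct entry *entry_lookup(struct entry *tbl, size_t n, int key)
{
    for (size_t i = 0; i < n; i++)
        if (tbl[i].key == key && entry_hold(&tbl[i]))
            return &tbl[i];
    return NULL;
}
```

Every successful entry_lookup() must be paired with one entry_put() once
the caller is done with the entry.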

The refactoring above opens potential race conditions: neigh update and
neigh stats update code can access encap and flow entries that are not
fully initialized or are being destroyed, and a neigh can change state
without updating encaps that are created concurrently. Prevent these
issues with the following changes in flow and encap initialization:

- Extend mlx5e_tc_flow with 'init_done' completion. Modify neigh update
  to wait for both encap and flow completions to prevent concurrent
  access to a structure that is being initialized by tc.

- Skip structures that failed during initialization: encaps with
  encap_id<0 and flows that don't have OFFLOADED flag set.

- To ensure that no new flows are added to encap when it is being
  accessed by neigh update or neigh stats update, take encap_table_lock
  mutex.

- To prevent concurrent deletion by tc, ensure that neigh update and
  neigh stats update hold references to encap and flow instances while
  using them.
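
A rough user-space sketch of the first two items (wait for
initialization to finish, then skip structures that failed) is given
below. It assumes pthreads as a stand-in for the kernel completion API
(init_completion(), complete_all(), wait_for_completion()); struct
completion_sketch, struct flow_sketch, struct encap_sketch and all
function names here are hypothetical and do not come from the driver.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Minimal user-space analogue of a kernel completion (illustrative). */
struct completion_sketch {
    pthread_mutex_t lock;
    pthread_cond_t cond;
    bool done;
};

static void completion_init(struct completion_sketch *c)
{
    pthread_mutex_init(&c->lock, NULL);
    pthread_cond_init(&c->cond, NULL);
    c->done = false;
}

/* Wake every current and future waiter. */
static void completion_signal_all(struct completion_sketch *c)
{
    pthread_mutex_lock(&c->lock);
    c->done = true;
    pthread_cond_broadcast(&c->cond);
    pthread_mutex_unlock(&c->lock);
}

/* Block until the initializer has signalled completion. */
static void completion_wait(struct completion_sketch *c)
{
    pthread_mutex_lock(&c->lock);
    while (!c->done)
        pthread_cond_wait(&c->cond, &c->lock);
    pthread_mutex_unlock(&c->lock);
}

/* Hypothetical flow: 'offloaded' models the OFFLOADED flag. */
struct flow_sketch {
    struct completion_sketch init_done;
    bool offloaded;
};

/* Hypothetical encap: a negative encap_id marks failed initialization. */
struct encap_sketch {
    int encap_id;
};

/* Wait for tc to finish initializing the flow, then report usability. */
static bool flow_ready(struct flow_sketch *f)
{
    completion_wait(&f->init_done);
    return f->offloaded;
}

static bool encap_valid(const struct encap_sketch *e)
{
    return e->encap_id >= 0;
}
```

The initializer signals the completion on both the success and the
failure path, so a waiter never blocks forever; the flag and the id tell
the waiter which of the two paths was taken.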

With the changes presented in this patch set it is now safe to execute
tc concurrently with neigh update and neigh stats update. However, these
two workqueue tasks both use the same flow "tmp_list" field to keep
flows whose references they hold on a temporary list, so that the
references can be released after the update operation finishes.
Therefore they should not be executed concurrently with each other.
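
The temporary-list pattern can be sketched as follows. This is a
single-threaded illustration only: struct flow_item, its tmp_next link
(standing in for the "tmp_list" field), the plain integer reference
count, and both function names are assumptions made for the example, not
the driver's definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flow with one scratch link shared by all collectors. */
struct flow_item {
    int refcnt;
    struct flow_item *tmp_next;  /* models the shared "tmp_list" field */
};

/* Take a reference on each flow and chain it on a temporary list. */
static struct flow_item *collect_flows(struct flow_item **flows, size_t n)
{
    struct flow_item *head = NULL;

    /* Walk backwards so the list keeps the original order. */
    for (size_t i = n; i-- > 0; ) {
        flows[i]->refcnt++;
        flows[i]->tmp_next = head;
        head = flows[i];
    }
    return head;
}

/* After the update finishes, unlink every flow and drop its reference. */
static void release_flows(struct flow_item *head)
{
    while (head) {
        struct flow_item *next = head->tmp_next;

        head->tmp_next = NULL;
        head->refcnt--;
        head = next;
    }
}
```

Because each flow has a single scratch link, a second collector running
at the same time would overwrite the first one's chain, which is why the
two tasks must not overlap.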

Last 3 patches of this series provide 3 new mlx5 trace points to track
mlx5 tc requests and mlx5 neigh updates.

----------------------------------------------------------------
Dmytro Linkin (1):
      net/mlx5e: Add tc flower tracepoints

Vlad Buslov (10):
      net/mlx5e: Extract code that queues neigh update work into function
      net/mlx5e: Always take reference to neigh entry
      net/mlx5e: Extend neigh hash entry with rcu
      net/mlx5e: Refactor mlx5e_neigh_update_table->encap_lock
      net/mlx5e: Protect neigh hash encap list with spinlock and rcu
      net/mlx5e: Refactor neigh used value update for concurrent execution
      net/mlx5e: Refactor neigh update for concurrent execution
      net/mlx5e: Only access fully initialized flows in neigh update
      net/mlx5e: Add trace point for neigh used value update
      net/mlx5e: Add trace point for neigh update

 .../networking/device_drivers/mellanox/mlx5.rst    |  46 +++++
 drivers/net/ethernet/mellanox/mlx5/core/Makefile   |   2 +-
 .../mellanox/mlx5/core/diag/en_rep_tracepoint.h    |  54 +++++
 .../mellanox/mlx5/core/diag/en_tc_tracepoint.c     |  58 ++++++
 .../mellanox/mlx5/core/diag/en_tc_tracepoint.h     | 114 +++++++++++
 drivers/net/ethernet/mellanox/mlx5/core/en_rep.c   | 224 +++++++++++++--------
 drivers/net/ethernet/mellanox/mlx5/core/en_rep.h   |  11 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_tc.c    | 152 ++++++++++----
 drivers/net/ethernet/mellanox/mlx5/core/en_tc.h    |   9 +-
 include/net/flow_offload.h                         |   1 +
 10 files changed, 545 insertions(+), 126 deletions(-)
 create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/diag/en_rep_tracepoint.h
 create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/diag/en_tc_tracepoint.c
 create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/diag/en_tc_tracepoint.h

Comments

David Miller Aug. 22, 2019, 3:23 a.m. UTC | #1
From: Saeed Mahameed <saeedm@mellanox.com>
Date: Wed, 21 Aug 2019 23:28:31 +0000

> This series, mostly from Vlad, is the 3rd and last part of 3 part series
> to improve mlx5 tc flow handling by removing dependency on rtnl_lock and
> providing a more fine-grained locking and rcu safe data structures to
> allow tc flow handling for concurrent execution.
> 
> 1) In this part Vlad handles mlx5e neigh offloads for concurrent
> execution.
> 
> 2) Vlad, with Dmytro's help, adds 3 new mlx5 tracepoints to track mlx5
>  tc flower requests and neigh updates.
> 
> 3) Added mlx5 documentation for the new tracepoints.
> 
> For more information please see tag log below.
> 
> Please pull and let me know if there is any problem.

I reviewed this a few times, looks good.

Pulled, thanks.