[net-next,0/9] Simplify IPv4 route offload API

Message ID 20191210172402.463397-1-idosch@idosch.org

Ido Schimmel Dec. 10, 2019, 5:23 p.m. UTC
From: Ido Schimmel <idosch@mellanox.com>

Motivation
==========

The aim of this patch set is to simplify the IPv4 route offload API by
making the stack a bit smarter about the notifications it is generating.
This allows driver authors to focus on programming the underlying device
instead of having to duplicate the IPv4 route insertion logic in their
driver, which is error-prone.

This is the first of four patch sets. Subsequent patch sets will
simplify the IPv6 API, add offload/trap indication to routes and add
tests for all the code paths (including error paths). The full series is
available here [1].

Details
=======

Today, whenever an IPv4 route is added or deleted, a notification is
sent in the FIB notification chain and it is up to offload drivers to
decide whether the route should be programmed to the hardware. This is
not an easy task because hardware keys routes by {prefix, prefix length,
table ID}, whereas the kernel can store multiple such routes that
differ only in metric / TOS / nexthop info.

This series makes sure that only routes that are actually used in the
data path are notified to offload drivers. This greatly simplifies the
work these drivers need to do, as they are now only concerned with
programming the hardware and do not need to replicate the IPv4 route
insertion logic and store multiple identical routes.

The route that is notified is the first FIB alias in the FIB node with
the given {prefix, prefix length, table ID}. In case the route is
deleted and there is another route with the same key, a replace
notification is emitted. Otherwise, a delete notification is emitted.
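
The deletion rule above can be sketched in plain C. This is only an
illustration under stated assumptions, not the code in fib_trie.c: the
struct, the enum and every sketch_* name are hypothetical, the FIB node
is modelled as a flat array in list order, and the "no notification"
case for a route that was never the first alias is my reading of "only
routes that are actually used in the data path are notified".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins; not the kernel's struct fib_alias or events. */
enum sketch_event {
	SKETCH_EVENT_NONE,	/* deleted route was not the notified one */
	SKETCH_EVENT_REPLACE,	/* next route with the same key takes over */
	SKETCH_EVENT_DEL,	/* no route with this key remains */
};

struct sketch_alias {
	uint8_t prefix_len;
	uint32_t table_id;
	uint8_t tos;
	uint32_t metric;
};

/* All aliases of one FIB node share the prefix, so the key reduces to
 * {prefix length, table ID}.
 */
static bool sketch_same_key(const struct sketch_alias *a,
			    const struct sketch_alias *b)
{
	return a->prefix_len == b->prefix_len && a->table_id == b->table_id;
}

/* Index of the first alias that shares a key with aliases[idx]. */
static size_t sketch_first_with_key(const struct sketch_alias *aliases,
				    size_t n, size_t idx)
{
	size_t i;

	assert(idx < n);
	for (i = 0; i < idx; i++)
		if (sketch_same_key(&aliases[i], &aliases[idx]))
			return i;
	return idx;
}

/* Decide what to notify when aliases[del] is about to be deleted.
 * On SKETCH_EVENT_REPLACE, *replacement is the index of the route that
 * should be programmed instead. It is left untouched otherwise.
 */
static enum sketch_event
sketch_event_on_delete(const struct sketch_alias *aliases, size_t n,
		       size_t del, size_t *replacement)
{
	size_t i;

	/* Only the first alias with a given key was ever notified. */
	if (sketch_first_with_key(aliases, n, del) != del)
		return SKETCH_EVENT_NONE;

	for (i = del + 1; i < n; i++) {
		if (sketch_same_key(&aliases[i], &aliases[del])) {
			*replacement = i;
			return SKETCH_EVENT_REPLACE;
		}
	}
	return SKETCH_EVENT_DEL;
}
```

With two /24 routes in the same table that differ only in metric,
deleting the first one yields a replace event pointing at the second,
while deleting the second one yields no event at all.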

The above means that in the case of multiple routes with the same key,
but different TOS, only the route with the highest TOS is notified.
While the kernel can route a packet based on its TOS, this is not
supported by any hardware devices I am familiar with. Moreover, from
what I could see, it is not supported by IPv6, BIRD or FRR either. Offload
drivers should therefore use the presence of a non-zero TOS as an
indication to trap packets matching the route and let the kernel route
them instead. mlxsw has been doing it for the past two years.
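
As a rough illustration of the TOS rule above, a driver could derive
the hardware action from the notified route as sketched below. This is
not mlxsw code: the struct, the enum and the sketch_hw_* names are
hypothetical, and a real driver would also have to resolve nexthops.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical hardware actions; real devices expose their own set. */
enum sketch_hw_action {
	SKETCH_HW_FORWARD,	/* route in hardware */
	SKETCH_HW_TRAP,		/* send matching packets to the kernel */
};

/* Hypothetical hardware entry, keyed by {prefix, prefix length,
 * table ID} only. TOS is not part of the key.
 */
struct sketch_hw_route {
	uint32_t prefix;
	uint8_t prefix_len;
	uint32_t table_id;
	enum sketch_hw_action action;
};

/* Build the hardware entry for a notified IPv4 route. A non-zero TOS
 * means the kernel may pick a different route per packet, so the
 * entry traps instead of forwarding.
 */
static struct sketch_hw_route
sketch_hw_route_build(uint32_t prefix, uint8_t prefix_len,
		      uint32_t table_id, uint8_t tos)
{
	struct sketch_hw_route route;

	assert(prefix_len <= 32);	/* IPv4 prefix length */

	route.prefix = prefix;
	route.prefix_len = prefix_len;
	route.table_id = table_id;
	route.action = tos ? SKETCH_HW_TRAP : SKETCH_HW_FORWARD;
	return route;
}
```

For example, 192.0.2.0/24 in table 254 with TOS 0 gets a forward
action, whereas the same key notified with TOS 0x10 gets a trap action.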

Testing
=======

To ensure there is no degradation in route insertion rates, I averaged
the insertion rate of 512k routes (/24 and /32) over 50 runs. I did not
observe any degradation.

Functional tests are available here [1]. They rely on route trap
indication, which is only added in the last patch set.

In addition, I have been running syzkaller for the past week with all
four patch sets and debug options enabled. I did not observe any
problems.

Patch set overview
==================

Patches #1-#7 gradually introduce the new FIB notifications
Patch #8 converts mlxsw to use the new notifications
Patch #9 converts the remaining listeners and removes the old
notifications

RFC: https://patchwork.ozlabs.org/cover/1170530/

[1] https://github.com/idosch/linux/tree/fib-notifier

Ido Schimmel (9):
  net: fib_notifier: Add temporary events to the FIB notification chain
  ipv4: Notify route after insertion to the routing table
  ipv4: Notify route if replacing currently offloaded one
  ipv4: Notify newly added route if should be offloaded
  ipv4: Handle route deletion notification
  ipv4: Handle route deletion notification during flush
  ipv4: Only Replay routes of interest to new listeners
  mlxsw: spectrum_router: Start using new IPv4 route notifications
  ipv4: Remove old route notifications and convert listeners

 .../net/ethernet/mellanox/mlx5/core/lag_mp.c  |   4 -
 .../ethernet/mellanox/mlxsw/spectrum_router.c | 136 +++---------------
 drivers/net/ethernet/rocker/rocker_main.c     |   4 +-
 drivers/net/netdevsim/fib.c                   |   4 +-
 net/ipv4/fib_trie.c                           | 131 ++++++++++++-----
 5 files changed, 117 insertions(+), 162 deletions(-)