mbox series

[RFC,net-next,00/15] Simplify IPv4 route offload API

Message ID 20191002084103.12138-1-idosch@idosch.org
Headers show
Series Simplify IPv4 route offload API | expand

Message

Ido Schimmel Oct. 2, 2019, 8:40 a.m. UTC
From: Ido Schimmel <idosch@mellanox.com>

Today, whenever an IPv4 route is added or deleted a notification is sent
in the FIB notification chain and it is up to offload drivers to decide
if the route should be programmed to the hardware or not. This is not an
easy task as in hardware routes are keyed by {prefix, prefix length,
table id}, whereas the kernel can store multiple such routes that only
differ in metric / TOS / nexthop info.

This series makes sure that only routes that are actually used in the
data path are notified to offload drivers. This greatly simplifies the
work these drivers need to do, as they are now only concerned with
programming the hardware and do not need to replicate the IPv4 route
insertion logic and store multiple identical routes.

The route that is notified is the first FIB alias in the FIB node with
the given {prefix, prefix length, table ID}. In case the route is
deleted and there is another route with the same key, a replace
notification is emitted. Otherwise, a delete notification is emitted.

The above means that in the case of multiple routes with the same key,
but different TOS, only the route with the highest TOS is notified.
While the kernel can route a packet based on its TOS, this is not
supported by any hardware devices I'm familiar with. Moreover, this is
not supported by IPv6 nor by BIRD/FRR from what I could see. Offload
drivers should therefore use the presence of a non-zero TOS as an
indication to trap packets matching the route and let the kernel route
them instead. mlxsw has been doing it for the past two years.

The series also adds an "in hardware" indication to routes, in addition
to the offload indication we already have on nexthops today. Besides
being long overdue, the reason this is done in this series is that it
makes it possible to easily test the new FIB notification API over
netdevsim.

To ensure there is no degradation in route insertion rates, I used
Vincent Bernat's script [1][2] from [3] to inject 500,000 routes from an
MRT dump from a router with a full view. On a system with Intel(R)
Xeon(R) CPU D-1527 @ 2.20GHz I measured 8.184 seconds, averaged over 10
runs and saw no degradation compared to net-next from today.

Patchset overview:
Patches #1-#7 introduce the new FIB notifications
Patches #8-#9 convert listeners to make use of the new notifications
Patches #10-#14 add "in hardware" indication for IPv4 routes, including
a dummy FIB offload implementation in netdevsim
Patch #15 adds a selftest for the new FIB notifications API over
netdevsim

The series is based on Jiri's "devlink: allow devlink instances to
change network namespace" series [4]. The patches can be found here [5]
and patched iproute2 with the "in hardware" indication can be found here
[6].

IPv6 is next on my TODO list.

[1] https://github.com/vincentbernat/network-lab/blob/master/common/helpers/lab-routes-ipvX/insert-from-bgp
[2] https://gist.github.com/idosch/2eb96efe50eb5234d205e964f0814859
[3] https://vincent.bernat.ch/en/blog/2017-ipv4-route-lookup-linux
[4] https://patchwork.ozlabs.org/cover/1162295/
[5] https://github.com/idosch/linux/tree/fib-notifier
[6] https://github.com/idosch/iproute2/tree/fib-notifier

Ido Schimmel (15):
  ipv4: Add temporary events to the FIB notification chain
  ipv4: Notify route after insertion to the routing table
  ipv4: Notify route if replacing currently offloaded one
  ipv4: Notify newly added route if should be offloaded
  ipv4: Handle route deletion notification
  ipv4: Handle route deletion notification during flush
  ipv4: Only Replay routes of interest to new listeners
  mlxsw: spectrum_router: Start using new IPv4 route notifications
  ipv4: Remove old route notifications and convert listeners
  ipv4: Replace route in list before notifying
  ipv4: Encapsulate function arguments in a struct
  ipv4: Add "in hardware" indication to routes
  mlxsw: spectrum_router: Mark routes as "in hardware"
  netdevsim: fib: Mark routes as "in hardware"
  selftests: netdevsim: Add test for route offload API

 .../net/ethernet/mellanox/mlx5/core/lag_mp.c  |   4 -
 .../ethernet/mellanox/mlxsw/spectrum_router.c | 152 ++-----
 drivers/net/ethernet/rocker/rocker_main.c     |   4 +-
 drivers/net/netdevsim/fib.c                   | 263 ++++++++++-
 include/net/ip_fib.h                          |   5 +
 include/uapi/linux/rtnetlink.h                |   1 +
 net/ipv4/fib_lookup.h                         |  18 +-
 net/ipv4/fib_semantics.c                      |  30 +-
 net/ipv4/fib_trie.c                           | 223 ++++++++--
 net/ipv4/route.c                              |  12 +-
 .../drivers/net/netdevsim/fib_notifier.sh     | 411 ++++++++++++++++++
 11 files changed, 938 insertions(+), 185 deletions(-)
 create mode 100755 tools/testing/selftests/drivers/net/netdevsim/fib_notifier.sh

Comments

Jiri Pirko Oct. 2, 2019, 6:17 p.m. UTC | #1
Wed, Oct 02, 2019 at 10:40:48AM CEST, idosch@idosch.org wrote:
>From: Ido Schimmel <idosch@mellanox.com>
>
>Today, whenever an IPv4 route is added or deleted a notification is sent
>in the FIB notification chain and it is up to offload drivers to decide
>if the route should be programmed to the hardware or not. This is not an
>easy task as in hardware routes are keyed by {prefix, prefix length,
>table id}, whereas the kernel can store multiple such routes that only
>differ in metric / TOS / nexthop info.
>
>This series makes sure that only routes that are actually used in the
>data path are notified to offload drivers. This greatly simplifies the
>work these drivers need to do, as they are now only concerned with
>programming the hardware and do not need to replicate the IPv4 route
>insertion logic and store multiple identical routes.
>
>The route that is notified is the first FIB alias in the FIB node with
>the given {prefix, prefix length, table ID}. In case the route is
>deleted and there is another route with the same key, a replace
>notification is emitted. Otherwise, a delete notification is emitted.
>
>The above means that in the case of multiple routes with the same key,
>but different TOS, only the route with the highest TOS is notified.
>While the kernel can route a packet based on its TOS, this is not
>supported by any hardware devices I'm familiar with. Moreover, this is
>not supported by IPv6 nor by BIRD/FRR from what I could see. Offload
>drivers should therefore use the presence of a non-zero TOS as an
>indication to trap packets matching the route and let the kernel route
>them instead. mlxsw has been doing it for the past two years.
>
>The series also adds an "in hardware" indication to routes, in addition

I think this might be a separate patchset. I mean patch "ipv4: Replace
route in list before notifying" and above.


>to the offload indication we already have on nexthops today. Besides
>being long overdue, the reason this is done in this series is that it
>makes it possible to easily test the new FIB notification API over
>netdevsim.
>
>To ensure there is no degradation in route insertion rates, I used
>Vincent Bernat's script [1][2] from [3] to inject 500,000 routes from an
>MRT dump from a router with a full view. On a system with Intel(R)
>Xeon(R) CPU D-1527 @ 2.20GHz I measured 8.184 seconds, averaged over 10
>runs and saw no degradation compared to net-next from today.
>
>Patchset overview:
>Patches #1-#7 introduce the new FIB notifications
>Patches #8-#9 convert listeners to make use of the new notifications
>Patches #10-#14 add "in hardware" indication for IPv4 routes, including
>a dummy FIB offload implementation in netdevsim
>Patch #15 adds a selftest for the new FIB notifications API over
>netdevsim
>
>The series is based on Jiri's "devlink: allow devlink instances to
>change network namespace" series [4]. The patches can be found here [5]
>and patched iproute2 with the "in hardware" indication can be found here
>[6].
>
>IPv6 is next on my TODO list.
>
>[1] https://github.com/vincentbernat/network-lab/blob/master/common/helpers/lab-routes-ipvX/insert-from-bgp
>[2] https://gist.github.com/idosch/2eb96efe50eb5234d205e964f0814859
>[3] https://vincent.bernat.ch/en/blog/2017-ipv4-route-lookup-linux
>[4] https://patchwork.ozlabs.org/cover/1162295/
>[5] https://github.com/idosch/linux/tree/fib-notifier
>[6] https://github.com/idosch/iproute2/tree/fib-notifier
>
>Ido Schimmel (15):
>  ipv4: Add temporary events to the FIB notification chain
>  ipv4: Notify route after insertion to the routing table
>  ipv4: Notify route if replacing currently offloaded one
>  ipv4: Notify newly added route if should be offloaded
>  ipv4: Handle route deletion notification
>  ipv4: Handle route deletion notification during flush
>  ipv4: Only Replay routes of interest to new listeners
>  mlxsw: spectrum_router: Start using new IPv4 route notifications
>  ipv4: Remove old route notifications and convert listeners
>  ipv4: Replace route in list before notifying
>  ipv4: Encapsulate function arguments in a struct
>  ipv4: Add "in hardware" indication to routes
>  mlxsw: spectrum_router: Mark routes as "in hardware"
>  netdevsim: fib: Mark routes as "in hardware"
>  selftests: netdevsim: Add test for route offload API
>
> .../net/ethernet/mellanox/mlx5/core/lag_mp.c  |   4 -
> .../ethernet/mellanox/mlxsw/spectrum_router.c | 152 ++-----
> drivers/net/ethernet/rocker/rocker_main.c     |   4 +-
> drivers/net/netdevsim/fib.c                   | 263 ++++++++++-
> include/net/ip_fib.h                          |   5 +
> include/uapi/linux/rtnetlink.h                |   1 +
> net/ipv4/fib_lookup.h                         |  18 +-
> net/ipv4/fib_semantics.c                      |  30 +-
> net/ipv4/fib_trie.c                           | 223 ++++++++--
> net/ipv4/route.c                              |  12 +-
> .../drivers/net/netdevsim/fib_notifier.sh     | 411 ++++++++++++++++++
> 11 files changed, 938 insertions(+), 185 deletions(-)
> create mode 100755 tools/testing/selftests/drivers/net/netdevsim/fib_notifier.sh
>
>-- 
>2.21.0
>
Ido Schimmel Oct. 3, 2019, 5:18 a.m. UTC | #2
On Wed, Oct 02, 2019 at 08:17:59PM +0200, Jiri Pirko wrote:
> Wed, Oct 02, 2019 at 10:40:48AM CEST, idosch@idosch.org wrote:
> >From: Ido Schimmel <idosch@mellanox.com>
> >
> >Today, whenever an IPv4 route is added or deleted a notification is sent
> >in the FIB notification chain and it is up to offload drivers to decide
> >if the route should be programmed to the hardware or not. This is not an
> >easy task as in hardware routes are keyed by {prefix, prefix length,
> >table id}, whereas the kernel can store multiple such routes that only
> >differ in metric / TOS / nexthop info.
> >
> >This series makes sure that only routes that are actually used in the
> >data path are notified to offload drivers. This greatly simplifies the
> >work these drivers need to do, as they are now only concerned with
> >programming the hardware and do not need to replicate the IPv4 route
> >insertion logic and store multiple identical routes.
> >
> >The route that is notified is the first FIB alias in the FIB node with
> >the given {prefix, prefix length, table ID}. In case the route is
> >deleted and there is another route with the same key, a replace
> >notification is emitted. Otherwise, a delete notification is emitted.
> >
> >The above means that in the case of multiple routes with the same key,
> >but different TOS, only the route with the highest TOS is notified.
> >While the kernel can route a packet based on its TOS, this is not
> >supported by any hardware devices I'm familiar with. Moreover, this is
> >not supported by IPv6 nor by BIRD/FRR from what I could see. Offload
> >drivers should therefore use the presence of a non-zero TOS as an
> >indication to trap packets matching the route and let the kernel route
> >them instead. mlxsw has been doing it for the past two years.
> >
> >The series also adds an "in hardware" indication to routes, in addition
> 
> I think this might be a separate patchset. I mean patch "ipv4: Replace
> route in list before notifying" and above.

OK. I mainly wanted to have it together in order to submit the tests
with the patchset itself. I can split it into two.