mbox series

[net-next,00/18] ipv6: Align nexthop behaviour with IPv4

Message ID 20180107104518.31693-1-idosch@mellanox.com
Headers show
Series ipv6: Align nexthop behaviour with IPv4 | expand

Message

Ido Schimmel Jan. 7, 2018, 10:45 a.m. UTC
This set tries to eliminate some differences between IPv4's and IPv6's
treatment of nexthops. These differences are most likely a side effect
of IPv6's data structures (specifically 'rt6_info') that incorporate
both the route and the nexthop and the late addition of ECMP support in
commit 51ebd3181572 ("ipv6: add support of equal cost multipath
(ECMP)").

IPv4 and IPv6 do not react the same to certain netdev events. For
example, upon carrier change affected IPv4 nexthops are marked using the
RTNH_F_LINKDOWN flag and the nexthop group is rebalanced accordingly.
IPv6 on the other hand, does nothing which forces us to perform a
carrier check during route lookup and dump. This makes it difficult to
introduce features such as non-equal-cost multipath that are built on
top of this set [1].

In addition, when a netdev is put administratively down IPv4 nexthops
are marked using the RTNH_F_DEAD flag, whereas IPv6 simply flushes all
the routes using these nexthops. To be consistent with IPv4, multipath
routes should only be flushed when all nexthops in the group are
considered dead.

The first 12 patches introduce non-functional changes that store the
RTNH_F_DEAD and RTNH_F_LINKDOWN flags in IPv6 routes based on netdev
events, in a similar fashion to IPv4. This allows us to remove the
carrier check performed during route lookup and dump.

The next three patches make sure we only flush a multipath route when
all of its nexthops are dead.

Last three patches add test cases for IPv4/IPv6 FIB. These verify that
both address families react similarly to netdev events.

Finally, this series also serves as a good first step towards David
Ahern's goal of treating nexthops as standalone objects [2], as it makes
the code more in line with IPv4 where the nexthop and the nexthop group
are separate objects from the route itself.

1. https://github.com/idosch/linux/tree/ipv6-nexthops
2. http://vger.kernel.org/netconf2017_files/nexthop-objects.pdf

Changes since RFC (feedback from David Ahern):
* Remove redundant declaration of rt6_ifdown() in patch 4 and adjust
comment referencing it accordingly
* Drop patch to flush multipath routes upon NETDEV_UNREGISTER. Reword
cover letter accordingly
* Use a temporary variable to make code more readable in patch 15

Ido Schimmel (18):
  ipv6: Remove redundant route flushing during namespace dismantle
  ipv6: Mark dead nexthops with appropriate flags
  ipv6: Clear nexthop flags upon netdev up
  ipv6: Prepare to handle multiple netdev events
  ipv6: Set nexthop flags upon carrier change
  ipv6: Set nexthop flags during route creation
  ipv6: Check nexthop flags during route lookup instead of carrier
  ipv6: Check nexthop flags in route dump instead of carrier
  ipv6: Ignore dead routes during lookup
  ipv6: Report dead flag during route dump
  ipv6: Add explicit flush indication to routes
  ipv6: Teach tree walker to skip multipath routes
  ipv6: Export sernum update function
  ipv6: Take table lock outside of sernum update function
  ipv6: Flush multipath routes when all siblings are dead
  selftests: fib_tests: Add test cases for IPv4/IPv6 FIB
  selftests: fib_tests: Add test cases for netdev down
  selftests: fib_tests: Add test cases for netdev carrier change

 include/net/ip6_fib.h                    |   4 +-
 include/net/ip6_route.h                  |   4 +-
 net/ipv6/addrconf.c                      |   9 +-
 net/ipv6/ip6_fib.c                       |  28 +-
 net/ipv6/route.c                         | 185 +++++++++++--
 tools/testing/selftests/net/Makefile     |   1 +
 tools/testing/selftests/net/fib_tests.sh | 429 +++++++++++++++++++++++++++++++
 7 files changed, 618 insertions(+), 42 deletions(-)
 create mode 100755 tools/testing/selftests/net/fib_tests.sh

Comments

David Ahern Jan. 7, 2018, 5:20 p.m. UTC | #1
On 1/7/18 3:45 AM, Ido Schimmel wrote:
> This set tries to eliminate some differences between IPv4's and IPv6's
> treatment of nexthops. These differences are most likely a side effect
> of IPv6's data structures (specifically 'rt6_info') that incorporate
> both the route and the nexthop and the late addition of ECMP support in
> commit 51ebd3181572 ("ipv6: add support of equal cost multipath
> (ECMP)").
> 
> IPv4 and IPv6 do not react the same to certain netdev events. For
> example, upon carrier change affected IPv4 nexthops are marked using the
> RTNH_F_LINKDOWN flag and the nexthop group is rebalanced accordingly.
> IPv6 on the other hand, does nothing which forces us to perform a
> carrier check during route lookup and dump. This makes it difficult to
> introduce features such as non-equal-cost multipath that are built on
> top of this set [1].
> 
> In addition, when a netdev is put administratively down IPv4 nexthops
> are marked using the RTNH_F_DEAD flag, whereas IPv6 simply flushes all
> the routes using these nexthops. To be consistent with IPv4, multipath
> routes should only be flushed when all nexthops in the group are
> considered dead.
> 
> The first 12 patches introduce non-functional changes that store the
> RTNH_F_DEAD and RTNH_F_LINKDOWN flags in IPv6 routes based on netdev
> events, in a similar fashion to IPv4. This allows us to remove the
> carrier check performed during route lookup and dump.
> 
> The next three patches make sure we only flush a multipath route when
> all of its nexthops are dead.
> 
> Last three patches add test cases for IPv4/IPv6 FIB. These verify that
> both address families react similarly to netdev events.
> 
> Finally, this series also serves as a good first step towards David
> Ahern's goal of treating nexthops as standalone objects [2], as it makes
> the code more in line with IPv4 where the nexthop and the nexthop group
> are separate objects from the route itself.
> 
> 1. https://github.com/idosch/linux/tree/ipv6-nexthops
> 2. http://vger.kernel.org/netconf2017_files/nexthop-objects.pdf
> 

Thanks for working on this - and creating the test cases.

One of many follow on changes that would be beneficial is to remove the
idev dereference in the hot path to check the
ignore_routes_with_linkdown sysctl.
David Miller Jan. 8, 2018, 2:30 a.m. UTC | #2
From: Ido Schimmel <idosch@mellanox.com>
Date: Sun,  7 Jan 2018 12:45:00 +0200

> This set tries to eliminate some differences between IPv4's and IPv6's
> treatment of nexthops. These differences are most likely a side effect
> of IPv6's data structures (specifically 'rt6_info') that incorporate
> both the route and the nexthop and the late addition of ECMP support in
> commit 51ebd3181572 ("ipv6: add support of equal cost multipath
> (ECMP)").
 ...
> Finally, this series also serves as a good first step towards David
> Ahern's goal of treating nexthops as standalone objects [2], as it makes
> the code more in line with IPv4 where the nexthop and the nexthop group
> are separate objects from the route itself.
 ...

Looks great, series applied, thanks Ido!
Ido Schimmel Jan. 9, 2018, 8:04 a.m. UTC | #3
On Sun, Jan 07, 2018 at 10:20:11AM -0700, David Ahern wrote:
> One of many follow on changes that would be beneficial is to remove the
> idev dereference in the hot path to check the
> ignore_routes_with_linkdown sysctl.

When a netdev loses its carrier we can set the RTNH_F_DEAD flag for all
the nexthops using it as their nexthop device, in case the sysctl is
set. Similarly, when the sysctl is toggled we can walk the routing
tables and toggle the flag where appropriate.

You have a different idea?

Thanks for reviewing!
David Ahern Jan. 10, 2018, 4:41 a.m. UTC | #4
On 1/9/18 1:04 AM, Ido Schimmel wrote:
> On Sun, Jan 07, 2018 at 10:20:11AM -0700, David Ahern wrote:
>> One of many follow on changes that would be beneficial is to remove the
>> idev dereference in the hot path to check the
>> ignore_routes_with_linkdown sysctl.
> 
> When a netdev loses its carrier we can set the RTNH_F_DEAD flag for all
> the nexthops using it as their nexthop device, in case the sysctl is
> set. Similarly, when the sysctl is toggled we can walk the routing
> tables and toggle the flag where appropriate.
> 
> You have a different idea?

that would work.