Message ID | 20180107104518.31693-1-idosch@mellanox.com |
---|---|
Series | ipv6: Align nexthop behaviour with IPv4 |
On 1/7/18 3:45 AM, Ido Schimmel wrote:
> This set tries to eliminate some differences between IPv4's and IPv6's
> treatment of nexthops. These differences are most likely a side effect
> of IPv6's data structures (specifically 'rt6_info') that incorporate
> both the route and the nexthop and the late addition of ECMP support in
> commit 51ebd3181572 ("ipv6: add support of equal cost multipath
> (ECMP)").
>
> IPv4 and IPv6 do not react the same to certain netdev events. For
> example, upon carrier change affected IPv4 nexthops are marked using the
> RTNH_F_LINKDOWN flag and the nexthop group is rebalanced accordingly.
> IPv6 on the other hand, does nothing which forces us to perform a
> carrier check during route lookup and dump. This makes it difficult to
> introduce features such as non-equal-cost multipath that are built on
> top of this set [1].
>
> In addition, when a netdev is put administratively down IPv4 nexthops
> are marked using the RTNH_F_DEAD flag, whereas IPv6 simply flushes all
> the routes using these nexthops. To be consistent with IPv4, multipath
> routes should only be flushed when all nexthops in the group are
> considered dead.
>
> The first 12 patches introduce non-functional changes that store the
> RTNH_F_DEAD and RTNH_F_LINKDOWN flags in IPv6 routes based on netdev
> events, in a similar fashion to IPv4. This allows us to remove the
> carrier check performed during route lookup and dump.
>
> The next three patches make sure we only flush a multipath route when
> all of its nexthops are dead.
>
> Last three patches add test cases for IPv4/IPv6 FIB. These verify that
> both address families react similarly to netdev events.
>
> Finally, this series also serves as a good first step towards David
> Ahern's goal of treating nexthops as standalone objects [2], as it makes
> the code more in line with IPv4 where the nexthop and the nexthop group
> are separate objects from the route itself.
>
> 1. https://github.com/idosch/linux/tree/ipv6-nexthops
> 2. http://vger.kernel.org/netconf2017_files/nexthop-objects.pdf
>

Thanks for working on this - and creating the test cases.

One of many follow on changes that would be beneficial is to remove the
idev dereference in the hot path to check the
ignore_routes_with_linkdown sysctl.

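The event handling described in the cover letter can be illustrated with a small userspace sketch. It is not the kernel code from this series: the `sk_` types and the function name are hypothetical stand-ins, and only the two flag values are taken from `<linux/rtnetlink.h>`. It assumes a carrier change toggles RTNH_F_LINKDOWN on the affected nexthops, an administrative down sets RTNH_F_DEAD, and the route is flushed only once every nexthop in the group is dead.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Values mirror RTNH_F_DEAD and RTNH_F_LINKDOWN in <linux/rtnetlink.h>. */
#define SK_RTNH_F_DEAD     1u
#define SK_RTNH_F_LINKDOWN 16u

/* Hypothetical stand-ins for illustration; not kernel structures. */
struct sk_nexthop {
    int ifindex;          /* nexthop device */
    unsigned int flags;   /* SK_RTNH_F_* bits */
};

struct sk_route {
    struct sk_nexthop *nh;
    size_t nh_count;
};

enum sk_event { SK_EV_CARRIER_DOWN, SK_EV_CARRIER_UP, SK_EV_ADMIN_DOWN };

/*
 * Apply a netdev event to every nexthop of the route that uses ifindex.
 * Returns true only when all nexthops are dead, i.e. when the whole
 * multipath route should be flushed.
 */
bool sk_route_handle_event(struct sk_route *rt, int ifindex, enum sk_event ev)
{
    size_t dead = 0;

    assert(rt != NULL);
    for (size_t i = 0; i < rt->nh_count; i++) {
        struct sk_nexthop *nh = &rt->nh[i];

        if (nh->ifindex == ifindex) {
            switch (ev) {
            case SK_EV_CARRIER_DOWN:
                nh->flags |= SK_RTNH_F_LINKDOWN;
                break;
            case SK_EV_CARRIER_UP:
                nh->flags &= ~SK_RTNH_F_LINKDOWN;
                break;
            case SK_EV_ADMIN_DOWN:
                nh->flags |= SK_RTNH_F_DEAD;
                break;
            }
        }
        if (nh->flags & SK_RTNH_F_DEAD)
            dead++;
    }
    return rt->nh_count > 0 && dead == rt->nh_count;
}
```

With a two-nexthop route, putting only one of the devices administratively down leaves the route in place with one dead nexthop; the function reports a flush only after the second device goes down as well.
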
From: Ido Schimmel <idosch@mellanox.com>
Date: Sun, 7 Jan 2018 12:45:00 +0200

> This set tries to eliminate some differences between IPv4's and IPv6's
> treatment of nexthops. These differences are most likely a side effect
> of IPv6's data structures (specifically 'rt6_info') that incorporate
> both the route and the nexthop and the late addition of ECMP support in
> commit 51ebd3181572 ("ipv6: add support of equal cost multipath
> (ECMP)").
 ...
> Finally, this series also serves as a good first step towards David
> Ahern's goal of treating nexthops as standalone objects [2], as it makes
> the code more in line with IPv4 where the nexthop and the nexthop group
> are separate objects from the route itself.
 ...

Looks great, series applied, thanks Ido!

On Sun, Jan 07, 2018 at 10:20:11AM -0700, David Ahern wrote:
> One of many follow on changes that would be beneficial is to remove the
> idev dereference in the hot path to check the
> ignore_routes_with_linkdown sysctl.

When a netdev loses its carrier we can set the RTNH_F_DEAD flag for all
the nexthops using it as their nexthop device, in case the sysctl is
set. Similarly, when the sysctl is toggled we can walk the routing
tables and toggle the flag where appropriate.

You have a different idea?

Thanks for reviewing!

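A rough userspace sketch of this proposal, under stated assumptions: the `ex_` structures and function names are hypothetical, the sysctl is modelled as a per-device boolean, and the "routing table" is a flat array. The idea shown is that the dead flag is recomputed whenever the carrier state or the sysctl changes, so the lookup path only has to test a flag on the nexthop and never dereferences the device.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Values mirror RTNH_F_DEAD and RTNH_F_LINKDOWN in <linux/rtnetlink.h>. */
#define EX_RTNH_F_DEAD     1u
#define EX_RTNH_F_LINKDOWN 16u

/* Hypothetical device state; ignore_linkdown models the sysctl. */
struct ex_dev {
    bool admin_up;
    bool carrier_ok;
    bool ignore_linkdown;
};

/* Hypothetical nexthop entry in a flat table. */
struct ex_nh {
    const struct ex_dev *dev;
    unsigned int flags;
};

/*
 * Slow path: call after a carrier change or after the sysctl is toggled.
 * Walks the table and recomputes both flags for nexthops using dev.
 */
void ex_sync_nexthops(struct ex_nh *tbl, size_t n, const struct ex_dev *dev)
{
    assert(dev != NULL);
    assert(tbl != NULL || n == 0);

    for (size_t i = 0; i < n; i++) {
        if (tbl[i].dev != dev)
            continue;

        unsigned int flags = tbl[i].flags &
                             ~(EX_RTNH_F_DEAD | EX_RTNH_F_LINKDOWN);

        if (!dev->carrier_ok)
            flags |= EX_RTNH_F_LINKDOWN;
        if (!dev->admin_up || (!dev->carrier_ok && dev->ignore_linkdown))
            flags |= EX_RTNH_F_DEAD;
        tbl[i].flags = flags;
    }
}

/* Hot path: a single flag test, no device or sysctl lookup. */
bool ex_nh_usable(const struct ex_nh *nh)
{
    return !(nh->flags & EX_RTNH_F_DEAD);
}
```

The trade-off in this sketch is a table walk on rare events (carrier change, sysctl write) in exchange for a cheaper per-packet check.
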
On 1/9/18 1:04 AM, Ido Schimmel wrote:
> On Sun, Jan 07, 2018 at 10:20:11AM -0700, David Ahern wrote:
>> One of many follow on changes that would be beneficial is to remove the
>> idev dereference in the hot path to check the
>> ignore_routes_with_linkdown sysctl.
>
> When a netdev loses its carrier we can set the RTNH_F_DEAD flag for all
> the nexthops using it as their nexthop device, in case the sysctl is
> set. Similarly, when the sysctl is toggled we can walk the routing
> tables and toggle the flag where appropriate.
>
> You have a different idea?

That would work.