diff mbox series

[net-next,v1] net: evaluate net.conf.ipvX.all.ignore_routes_with_linkdown

Message ID 20201017125011.2655391-1-vincent@bernat.ch
State Deferred
Delegated to: David Miller
Headers show
Series [net-next,v1] net: evaluate net.conf.ipvX.all.ignore_routes_with_linkdown | expand

Commit Message

Vincent Bernat Oct. 17, 2020, 12:50 p.m. UTC
Introduced in 0eeb075fad73, the "ignore_routes_with_linkdown" sysctl
ignores a route whose interface is down. It is provided as a
per-interface sysctl. However, while a "all" variant is exposed, it
was a noop since it was never evaluated. We use the usual "or" logic
for this kind of sysctls.

Tested with:

    ip link add type veth # veth0 + veth1
    ip link add type veth # veth1 + veth2
    ip link set up dev veth0
    ip link set up dev veth1 # link-status paired with veth0
    ip link set up dev veth2
    ip link set up dev veth3 # link-status paired with veth2

    # First available path
    ip -4 addr add 203.0.113.${uts#H}/24 dev veth0
    ip -6 addr add 2001:db8:1::${uts#H}/64 dev veth0

    # Second available path
    ip -4 addr add 192.0.2.${uts#H}/24 dev veth2
    ip -6 addr add 2001:db8:2::${uts#H}/64 dev veth2

    # More specific route through first path
    ip -4 route add 198.51.100.0/25 via 203.0.113.254 # via veth0
    ip -6 route add 2001:db8:3::/56 via 2001:db8:1::ff # via veth0

    # Less specific route through second path
    ip -4 route add 198.51.100.0/24 via 192.0.2.254 # via veth2
    ip -6 route add 2001:db8:3::/48 via 2001:db8:2::ff # via veth2

    # H1: enable on "all"
    # H2: enable on "veth0"
    for v in ipv4 ipv6; do
      case $uts in
        H1)
          sysctl -qw net.${v}.conf.all.ignore_routes_with_linkdown=1
          ;;
        H2)
          sysctl -qw net.${v}.conf.veth0.ignore_routes_with_linkdown=1
          ;;
      esac
    done

    set -xe
    # When veth0 is up, best route is through veth0
    ip -o route get 198.51.100.1 | grep -Fw veth0
    ip -o route get 2001:db8:3::1 | grep -Fw veth0

    # When veth0 is down, best route should be through veth2 on H1/H2,
    # but on veth0 on H2
    ip link set down dev veth1 # down veth0
    ip route show
    [ $uts != H3 ] || ip -o route get 198.51.100.1 | grep -Fw veth0
    [ $uts != H3 ] || ip -o route get 2001:db8:3::1 | grep -Fw veth0
    [ $uts = H3 ] || ip -o route get 198.51.100.1 | grep -Fw veth2
    [ $uts = H3 ] || ip -o route get 2001:db8:3::1 | grep -Fw veth2

Without this patch, the two last lines would fail on H1 (the one using
the "all" sysctl). With the patch, everything succeeds as expected.

Also document the sysctl in `ip-sysctl.rst`.

Fixes: 0eeb075fad73 ("net: ipv4 sysctl option to ignore routes when nexthop link is down")
Signed-off-by: Vincent Bernat <vincent@bernat.ch>
---
 Documentation/networking/ip-sysctl.rst | 3 +++
 include/linux/inetdevice.h             | 2 +-
 2 files changed, 4 insertions(+), 1 deletion(-)

Comments

Jakub Kicinski Oct. 20, 2020, 12:53 a.m. UTC | #1
On Sat, 17 Oct 2020 14:50:11 +0200 Vincent Bernat wrote:
> Introduced in 0eeb075fad73, the "ignore_routes_with_linkdown" sysctl
> ignores a route whose interface is down. It is provided as a
> per-interface sysctl. However, while a "all" variant is exposed, it
> was a noop since it was never evaluated. We use the usual "or" logic
> for this kind of sysctls.

> Without this patch, the two last lines would fail on H1 (the one using
> the "all" sysctl). With the patch, everything succeeds as expected.
> 
> Also document the sysctl in `ip-sysctl.rst`.
> 
> Fixes: 0eeb075fad73 ("net: ipv4 sysctl option to ignore routes when nexthop link is down")
> Signed-off-by: Vincent Bernat <vincent@bernat.ch>

I'm not hearing any objections, but I have two questions:
 - do you intend to merge it for 5.10 or 5.11? Because it has a fixes
   tag, yet it's marked for net-next. If we put it in 5.10 it may get
   pulled into stable immediately, knowing how things work lately.
 - we have other sysctls that use IN_DEV_CONF_GET(), 
   e.g. "proxy_arp_pvlan" should those also be converted?
David Ahern Oct. 20, 2020, 2:56 a.m. UTC | #2
On 10/19/20 6:53 PM, Jakub Kicinski wrote:
> On Sat, 17 Oct 2020 14:50:11 +0200 Vincent Bernat wrote:
>> Introduced in 0eeb075fad73, the "ignore_routes_with_linkdown" sysctl
>> ignores a route whose interface is down. It is provided as a
>> per-interface sysctl. However, while a "all" variant is exposed, it
>> was a noop since it was never evaluated. We use the usual "or" logic
>> for this kind of sysctls.
> 
>> Without this patch, the two last lines would fail on H1 (the one using
>> the "all" sysctl). With the patch, everything succeeds as expected.
>>
>> Also document the sysctl in `ip-sysctl.rst`.
>>
>> Fixes: 0eeb075fad73 ("net: ipv4 sysctl option to ignore routes when nexthop link is down")
>> Signed-off-by: Vincent Bernat <vincent@bernat.ch>
> 
> I'm not hearing any objections, but I have two questions:
>  - do you intend to merge it for 5.10 or 5.11? Because it has a fixes
>    tag, yet it's marked for net-next. If we put it in 5.10 it may get
>    pulled into stable immediately, knowing how things work lately.
>  - we have other sysctls that use IN_DEV_CONF_GET(), 
>    e.g. "proxy_arp_pvlan" should those also be converted?
> 

The inconsistency with 'all' has been a major pain. In this case, I
think it makes sense. Blindly changing all of them I suspect will lead
to trouble. It is something reviewers should keep an eye on as sysctl
settings get added.
David Ahern Oct. 20, 2020, 2:57 a.m. UTC | #3
[ fix Andy's address ]

On 10/17/20 6:50 AM, Vincent Bernat wrote:
> Introduced in 0eeb075fad73, the "ignore_routes_with_linkdown" sysctl
> ignores a route whose interface is down. It is provided as a
> per-interface sysctl. However, while a "all" variant is exposed, it
> was a noop since it was never evaluated. We use the usual "or" logic
> for this kind of sysctls.
> 
> Tested with:
> 
>     ip link add type veth # veth0 + veth1
>     ip link add type veth # veth1 + veth2
>     ip link set up dev veth0
>     ip link set up dev veth1 # link-status paired with veth0
>     ip link set up dev veth2
>     ip link set up dev veth3 # link-status paired with veth2
> 
>     # First available path
>     ip -4 addr add 203.0.113.${uts#H}/24 dev veth0
>     ip -6 addr add 2001:db8:1::${uts#H}/64 dev veth0
> 
>     # Second available path
>     ip -4 addr add 192.0.2.${uts#H}/24 dev veth2
>     ip -6 addr add 2001:db8:2::${uts#H}/64 dev veth2
> 
>     # More specific route through first path
>     ip -4 route add 198.51.100.0/25 via 203.0.113.254 # via veth0
>     ip -6 route add 2001:db8:3::/56 via 2001:db8:1::ff # via veth0
> 
>     # Less specific route through second path
>     ip -4 route add 198.51.100.0/24 via 192.0.2.254 # via veth2
>     ip -6 route add 2001:db8:3::/48 via 2001:db8:2::ff # via veth2
> 
>     # H1: enable on "all"
>     # H2: enable on "veth0"
>     for v in ipv4 ipv6; do
>       case $uts in
>         H1)
>           sysctl -qw net.${v}.conf.all.ignore_routes_with_linkdown=1
>           ;;
>         H2)
>           sysctl -qw net.${v}.conf.veth0.ignore_routes_with_linkdown=1
>           ;;
>       esac
>     done
> 
>     set -xe
>     # When veth0 is up, best route is through veth0
>     ip -o route get 198.51.100.1 | grep -Fw veth0
>     ip -o route get 2001:db8:3::1 | grep -Fw veth0
> 
>     # When veth0 is down, best route should be through veth2 on H1/H2,
>     # but on veth0 on H2
>     ip link set down dev veth1 # down veth0
>     ip route show
>     [ $uts != H3 ] || ip -o route get 198.51.100.1 | grep -Fw veth0
>     [ $uts != H3 ] || ip -o route get 2001:db8:3::1 | grep -Fw veth0
>     [ $uts = H3 ] || ip -o route get 198.51.100.1 | grep -Fw veth2
>     [ $uts = H3 ] || ip -o route get 2001:db8:3::1 | grep -Fw veth2
> 
> Without this patch, the two last lines would fail on H1 (the one using
> the "all" sysctl). With the patch, everything succeeds as expected.
> 
> Also document the sysctl in `ip-sysctl.rst`.
> 
> Fixes: 0eeb075fad73 ("net: ipv4 sysctl option to ignore routes when nexthop link is down")
> Signed-off-by: Vincent Bernat <vincent@bernat.ch>
> ---
>  Documentation/networking/ip-sysctl.rst | 3 +++
>  include/linux/inetdevice.h             | 2 +-
>  2 files changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/Documentation/networking/ip-sysctl.rst b/Documentation/networking/ip-sysctl.rst
> index 837d51f9e1fa..fb6e4658fd4f 100644
> --- a/Documentation/networking/ip-sysctl.rst
> +++ b/Documentation/networking/ip-sysctl.rst
> @@ -1552,6 +1552,9 @@ igmpv3_unsolicited_report_interval - INTEGER
>  
>  	Default: 1000 (1 seconds)
>  
> +ignore_routes_with_linkdown - BOOLEAN
> +        Ignore routes whose link is down when performing a FIB lookup.
> +
>  promote_secondaries - BOOLEAN
>  	When a primary IP address is removed from this interface
>  	promote a corresponding secondary IP address instead of
> diff --git a/include/linux/inetdevice.h b/include/linux/inetdevice.h
> index 3515ca64e638..3bbcddd22df8 100644
> --- a/include/linux/inetdevice.h
> +++ b/include/linux/inetdevice.h
> @@ -126,7 +126,7 @@ static inline void ipv4_devconf_setall(struct in_device *in_dev)
>  	  IN_DEV_ORCONF((in_dev), ACCEPT_REDIRECTS)))
>  
>  #define IN_DEV_IGNORE_ROUTES_WITH_LINKDOWN(in_dev) \
> -	IN_DEV_CONF_GET((in_dev), IGNORE_ROUTES_WITH_LINKDOWN)
> +	IN_DEV_ORCONF((in_dev), IGNORE_ROUTES_WITH_LINKDOWN)
>  
>  #define IN_DEV_ARPFILTER(in_dev)	IN_DEV_ORCONF((in_dev), ARPFILTER)
>  #define IN_DEV_ARP_ACCEPT(in_dev)	IN_DEV_ORCONF((in_dev), ARP_ACCEPT)
>
Jakub Kicinski Oct. 20, 2020, 3:15 a.m. UTC | #4
On Mon, 19 Oct 2020 20:56:36 -0600 David Ahern wrote:
> On 10/19/20 6:53 PM, Jakub Kicinski wrote:
> > On Sat, 17 Oct 2020 14:50:11 +0200 Vincent Bernat wrote:  
> >> Introduced in 0eeb075fad73, the "ignore_routes_with_linkdown" sysctl
> >> ignores a route whose interface is down. It is provided as a
> >> per-interface sysctl. However, while a "all" variant is exposed, it
> >> was a noop since it was never evaluated. We use the usual "or" logic
> >> for this kind of sysctls.  
> >   
> >> Without this patch, the two last lines would fail on H1 (the one using
> >> the "all" sysctl). With the patch, everything succeeds as expected.
> >>
> >> Also document the sysctl in `ip-sysctl.rst`.
> >>
> >> Fixes: 0eeb075fad73 ("net: ipv4 sysctl option to ignore routes when nexthop link is down")
> >> Signed-off-by: Vincent Bernat <vincent@bernat.ch>  
> > 
> > I'm not hearing any objections, but I have two questions:
> >  - do you intend to merge it for 5.10 or 5.11? Because it has a fixes
> >    tag, yet it's marked for net-next. If we put it in 5.10 it may get
> >    pulled into stable immediately, knowing how things work lately.
> >  - we have other sysctls that use IN_DEV_CONF_GET(), 
> >    e.g. "proxy_arp_pvlan" should those also be converted?
> 
> The inconsistency with 'all' has been a major pain. In this case, I
> think it makes sense. Blindly changing all of them I suspect will lead
> to trouble. It is something reviewers should keep an eye on as sysctl
> settings get added.

Just saying.. if Vincent had the time to clean them all up _carefully_,
it'd be less likely we'll see another one added through copy & paste :)
Vincent Bernat Oct. 20, 2020, 6:20 a.m. UTC | #5
❦ 19 octobre 2020 17:53 -07, Jakub Kicinski:

> I'm not hearing any objections, but I have two questions:
>  - do you intend to merge it for 5.10 or 5.11? Because it has a fixes
>    tag, yet it's marked for net-next. If we put it in 5.10 it may get
>    pulled into stable immediately, knowing how things work lately.

I have never been super-familiar with net/net-next. I am always using
net-next and let people sort it out.

>  - we have other sysctls that use IN_DEV_CONF_GET(), e.g.
> "proxy_arp_pvlan" should those also be converted?

I can check them and do that.
Jakub Kicinski Oct. 20, 2020, 10:55 p.m. UTC | #6
On Tue, 20 Oct 2020 08:20:43 +0200 Vincent Bernat wrote:
>  ❦ 19 octobre 2020 17:53 -07, Jakub Kicinski:
> > I'm not hearing any objections, but I have two questions:
> >  - do you intend to merge it for 5.10 or 5.11? Because it has a fixes
> >    tag, yet it's marked for net-next. If we put it in 5.10 it may get
> >    pulled into stable immediately, knowing how things work lately.  
> 
> I have never been super-familiar with net/net-next. I am always using
> net-next and let people sort it out.

Since there is some breakage potential I'd prefer to merge this change
into net-next and give it more testing. But net-next is currently
closed (don't pay attention to the graphic at vger).

If you wouldn't mind - please repost next week.

You can split out and repost the documentation change separately if you
want, that's a fix that can go into net right away.

> >  - we have other sysctls that use IN_DEV_CONF_GET(), e.g.
> > "proxy_arp_pvlan" should those also be converted?  
> 
> I can check them and do that.
diff mbox series

Patch

diff --git a/Documentation/networking/ip-sysctl.rst b/Documentation/networking/ip-sysctl.rst
index 837d51f9e1fa..fb6e4658fd4f 100644
--- a/Documentation/networking/ip-sysctl.rst
+++ b/Documentation/networking/ip-sysctl.rst
@@ -1552,6 +1552,9 @@  igmpv3_unsolicited_report_interval - INTEGER
 
 	Default: 1000 (1 seconds)
 
+ignore_routes_with_linkdown - BOOLEAN
+        Ignore routes whose link is down when performing a FIB lookup.
+
 promote_secondaries - BOOLEAN
 	When a primary IP address is removed from this interface
 	promote a corresponding secondary IP address instead of
diff --git a/include/linux/inetdevice.h b/include/linux/inetdevice.h
index 3515ca64e638..3bbcddd22df8 100644
--- a/include/linux/inetdevice.h
+++ b/include/linux/inetdevice.h
@@ -126,7 +126,7 @@  static inline void ipv4_devconf_setall(struct in_device *in_dev)
 	  IN_DEV_ORCONF((in_dev), ACCEPT_REDIRECTS)))
 
 #define IN_DEV_IGNORE_ROUTES_WITH_LINKDOWN(in_dev) \
-	IN_DEV_CONF_GET((in_dev), IGNORE_ROUTES_WITH_LINKDOWN)
+	IN_DEV_ORCONF((in_dev), IGNORE_ROUTES_WITH_LINKDOWN)
 
 #define IN_DEV_ARPFILTER(in_dev)	IN_DEV_ORCONF((in_dev), ARPFILTER)
 #define IN_DEV_ARP_ACCEPT(in_dev)	IN_DEV_ORCONF((in_dev), ARP_ACCEPT)