Message ID | 20190405233041.30775-9-dsahern@kernel.org |
---|---|
State | Accepted |
Delegated to: | David Miller |
Headers | show |
Series | ipv4: Enable support for IPv6 gateway with IPv4 routes | expand |
On Fri, Apr 5, 2019 at 4:32 PM David Ahern <dsahern@kernel.org> wrote: > > From: David Ahern <dsahern@gmail.com> > > fib_check_nh is currently huge covering multiple uses cases - device only, > device + gateway, and device + gateway with ONLINK. The next patch adds > validation checks for IPv6 which only further complicates it. So, break > fib_check_nh into 2 helpers - one for gateway validation and one for device > only. > > Signed-off-by: David Ahern <dsahern@gmail.com> > Reviewed-by: Ido Schimmel <idosch@mellanox.com> With the latest net-next I am having issue with network traffic. git bisect points to this commit as the first bad commit. (448d7248191706cbbd7761e3bc72c2985c4d38a7 ipv4: Refactor fib_check_nh) I do not understand this part of kernel, from what I see, this patch rejects the default route added by NetworkManager. [gvaradar@240m5avmarch ~]$ sudo ip route add 192.168.0.0/24 via 0.0.0.0 dev bm0 proto kernel scope link src 192.168.0.1 metric 101 Error: Nexthop has invalid gateway. Is this request valid or is this an issue with NetworkManager? This used to work before, [gvaradar@240m5avmarch linux]$ ip route default via 10.193.164.254 dev e0 proto dhcp metric 100 10.193.164.0/24 via 0.0.0.0 dev e0 proto kernel scope link src 10.193.164.12 metric 100 192.168.0.0/24 via 0.0.0.0 dev bm0 proto kernel scope link src 192.168.0.1 metric 101 I do not know why NetworkManager is adding "via 0.0.0.0". 192.168.0.0/24 is a local subnet. Either way, this seems to break odd behavior of NetworkManager. -- Govind Network details: [NOT WORKING]: HEAD of net-next [gvaradar@240m5avmarch ~]$ ip -d a 1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000 link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00 promiscuity 0 minmtu 0 maxmtu 0 numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535 inet 127.0.0.1/8 scope host lo valid_lft forever preferred_lft forever inet6 ::1/128 scope host valid_lft forever preferred_lft forever 2: e0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000 link/ether 52:54:00:64:86:b6 brd ff:ff:ff:ff:ff:ff promiscuity 0 minmtu 68 maxmtu 65535 numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535 inet 10.193.164.12/24 brd 10.193.164.255 scope global dynamic noprefixroute e0 valid_lft 21430sec preferred_lft 21430sec inet6 fe80::1f95:bfca:8e05:9448/64 scope link noprefixroute valid_lft forever preferred_lft forever 3: bm0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000 link/ether 00:fc:ba:a2:9c:d5 brd ff:ff:ff:ff:ff:ff promiscuity 0 minmtu 68 maxmtu 9000 numtxqueues 8 numrxqueues 8 gso_max_size 65536 gso_max_segs 65535 inet 192.168.0.1/24 brd 192.168.0.255 scope global noprefixroute bm0 valid_lft forever preferred_lft forever inet6 fe80::5988:7d2:a671:6c7d/64 scope link noprefixroute valid_lft forever preferred_lft forever [gvaradar@240m5avmarch ~]$ ip route [gvaradar@240m5avmarch ~]$ ping 192.168.0.2 -c2 connect: Network is unreachable [gvaradar@240m5avmarch ~]$ sudo ip route add 192.168.0.0/24 dev bm0 [gvaradar@240m5avmarch ~]$ ip route 192.168.0.0/24 dev bm0 scope link [gvaradar@240m5avmarch ~]$ ping 192.168.0.2 -c2 PING 192.168.0.2 (192.168.0.2) 56(84) bytes of data. 64 bytes from 192.168.0.2: icmp_seq=1 ttl=64 time=0.121 ms 64 bytes from 192.168.0.2: icmp_seq=2 ttl=64 time=0.065 ms [gvaradar@240m5avmarch ~]$ sudo ip route add 192.168.0.0/24 via 0.0.0.0 dev bm0 proto kernel scope link src 192.168.0.1 metric 101 Error: Nexthop has invalid gateway. [root@240m5avmarch ~]# systemctl stop NetworkManager [root@240m5avmarch ~]# netctl start bm0 [root@240m5avmarch ~]# ip route 192.168.0.0/24 dev bm0 proto kernel scope link src 192.168.0.1 [root@240m5avmarch ~]# ping 192.168.0.2 -c2 PING 192.168.0.2 (192.168.0.2) 56(84) bytes of data. 64 bytes from 192.168.0.2: icmp_seq=1 ttl=64 time=0.184 ms 64 bytes from 192.168.0.2: icmp_seq=2 ttl=64 time=0.050 ms [Working]: #First working commit [gvaradar@240m5avmarch linux]$ git log -1 commit a4ea5d43c807be28545625c1e0641905022fa0d1 (HEAD, refs/bisect/good-a4ea5d43c807be28545625c1e0641905022fa0d1) Author: David Ahern <dsahern@gmail.com> Date: Fri Apr 5 16:30:30 2019 -0700 ipv4: Add support to fib_config for IPv6 gateway Add support for an IPv6 gateway to fib_config. Since a gateway is either IPv4 or IPv6, make it a union with fc_gw4 where fc_gw_family decides which address is in use. Update current checks on family and gw4 to handle ipv6 as well. Signed-off-by: David Ahern <dsahern@gmail.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net> [gvaradar@240m5avmarch linux]$ ip route default via 10.193.164.254 dev e0 proto dhcp metric 100 10.193.164.0/24 via 0.0.0.0 dev e0 proto kernel scope link src 10.193.164.12 metric 100 192.168.0.0/24 via 0.0.0.0 dev bm0 proto kernel scope link src 192.168.0.1 metric 101 [gvaradar@240m5avmarch linux]$ ping 192.168.0.2 -c4 PING 192.168.0.2 (192.168.0.2) 56(84) bytes of data. 64 bytes from 192.168.0.2: icmp_seq=1 ttl=64 time=0.075 ms 64 bytes from 192.168.0.2: icmp_seq=2 ttl=64 time=0.047 ms 64 bytes from 192.168.0.2: icmp_seq=3 ttl=64 time=0.046 ms 64 bytes from 192.168.0.2: icmp_seq=4 ttl=64 time=0.071 ms ... [gvaradar@240m5avmarch linux]$ sudo ip route del 192.168.0.0/24 [gvaradar@240m5avmarch linux]$ ip route default via 10.193.164.254 dev e0 proto dhcp metric 100 10.193.164.0/24 via 0.0.0.0 dev e0 proto kernel scope link src 10.193.164.12 metric 100 [gvaradar@240m5avmarch linux]$ sudo ip route add 192.168.0.0/24 via 0.0.0.0 dev bm0 proto kernel scope link src 192.168.0.1 metric 101 [gvaradar@240m5avmarch linux]$ ip route default via 10.193.164.254 dev e0 proto dhcp metric 100 10.193.164.0/24 via 0.0.0.0 dev e0 proto kernel scope link src 10.193.164.12 metric 100 192.168.0.0/24 via 0.0.0.0 dev bm0 proto kernel scope link src 192.168.0.1 metric 101
On 4/9/19 5:08 PM, Govindarajulu Varadarajan wrote: > On Fri, Apr 5, 2019 at 4:32 PM David Ahern <dsahern@kernel.org> wrote: >> >> From: David Ahern <dsahern@gmail.com> >> >> fib_check_nh is currently huge covering multiple uses cases - device only, >> device + gateway, and device + gateway with ONLINK. The next patch adds >> validation checks for IPv6 which only further complicates it. So, break >> fib_check_nh into 2 helpers - one for gateway validation and one for device >> only. >> >> Signed-off-by: David Ahern <dsahern@gmail.com> >> Reviewed-by: Ido Schimmel <idosch@mellanox.com> > > With the latest net-next I am having issue with network traffic. > git bisect points to this commit as the first bad commit. > (448d7248191706cbbd7761e3bc72c2985c4d38a7 ipv4: Refactor fib_check_nh) > Thanks for the report. I'll take a look tonight.
On 4/9/19 5:08 PM, Govindarajulu Varadarajan wrote: > On Fri, Apr 5, 2019 at 4:32 PM David Ahern <dsahern@kernel.org> wrote: >> >> From: David Ahern <dsahern@gmail.com> >> >> fib_check_nh is currently huge covering multiple uses cases - device only, >> device + gateway, and device + gateway with ONLINK. The next patch adds >> validation checks for IPv6 which only further complicates it. So, break >> fib_check_nh into 2 helpers - one for gateway validation and one for device >> only. >> >> Signed-off-by: David Ahern <dsahern@gmail.com> >> Reviewed-by: Ido Schimmel <idosch@mellanox.com> > > With the latest net-next I am having issue with network traffic. > git bisect points to this commit as the first bad commit. > (448d7248191706cbbd7761e3bc72c2985c4d38a7 ipv4: Refactor fib_check_nh) > Can you see if this fixes it? whitespace damaged, but fairly simple to apply manually: diff --git a/net/ipv4/fib_semantics.c b/net/ipv4/fib_semantics.c index 017273885eee..779d2be2b135 100644 --- a/net/ipv4/fib_semantics.c +++ b/net/ipv4/fib_semantics.c @@ -616,8 +616,9 @@ static int fib_get_nhs(struct fib_info *fi, struct rtnexthop *rtnh, return -EINVAL; } if (nla) { - fib_cfg.fc_gw_family = AF_INET; fib_cfg.fc_gw4 = nla_get_in_addr(nla); + if (fib_cfg.fc_gw4) + fib_cfg.fc_gw_family = AF_INET; } else if (nlav) { ret = fib_gw_from_via(&fib_cfg, nlav, extack); if (ret)
On Tue, Apr 9, 2019 at 7:13 PM David Ahern <dsahern@gmail.com> wrote: > > On 4/9/19 5:08 PM, Govindarajulu Varadarajan wrote: > > On Fri, Apr 5, 2019 at 4:32 PM David Ahern <dsahern@kernel.org> wrote: > >> > >> From: David Ahern <dsahern@gmail.com> > >> > >> fib_check_nh is currently huge covering multiple uses cases - device only, > >> device + gateway, and device + gateway with ONLINK. The next patch adds > >> validation checks for IPv6 which only further complicates it. So, break > >> fib_check_nh into 2 helpers - one for gateway validation and one for device > >> only. > >> > >> Signed-off-by: David Ahern <dsahern@gmail.com> > >> Reviewed-by: Ido Schimmel <idosch@mellanox.com> > > > > With the latest net-next I am having issue with network traffic. > > git bisect points to this commit as the first bad commit. > > (448d7248191706cbbd7761e3bc72c2985c4d38a7 ipv4: Refactor fib_check_nh) > > > > Can you see if this fixes it? whitespace damaged, but fairly simple to > apply manually: > > diff --git a/net/ipv4/fib_semantics.c b/net/ipv4/fib_semantics.c > index 017273885eee..779d2be2b135 100644 > --- a/net/ipv4/fib_semantics.c > +++ b/net/ipv4/fib_semantics.c > @@ -616,8 +616,9 @@ static int fib_get_nhs(struct fib_info *fi, struct > rtnexthop *rtnh, > return -EINVAL; > } > if (nla) { > - fib_cfg.fc_gw_family = AF_INET; > fib_cfg.fc_gw4 = nla_get_in_addr(nla); > + if (fib_cfg.fc_gw4) > + fib_cfg.fc_gw_family = AF_INET; > } else if (nlav) { > ret = fib_gw_from_via(&fib_cfg, nlav, > extack); > if (ret) This did not work. However this worked @@ -1096,7 +1096,7 @@ static int fib_check_nh(struct fib_config *cfg, struct fib_nh *nh, u32 table = cfg->fc_table; int err; - if (nh->fib_nh_gw_family == AF_INET) + if (nh->fib_nh_gw4) err = fib_check_nh_v4_gw(net, nh, table, cfg->fc_scope, extack); else if (nh->fib_nh_gw_family == AF_INET6) err = fib_check_nh_v6_gw(net, nh, table, extack);
On 4/9/19 10:31 PM, Govindarajulu Varadarajan wrote: > However this worked > > @@ -1096,7 +1096,7 @@ static int fib_check_nh(struct fib_config *cfg, > struct fib_nh *nh, > u32 table = cfg->fc_table; > int err; > > - if (nh->fib_nh_gw_family == AF_INET) > + if (nh->fib_nh_gw4) > err = fib_check_nh_v4_gw(net, nh, table, cfg->fc_scope, extack); > else if (nh->fib_nh_gw_family == AF_INET6) > err = fib_check_nh_v6_gw(net, nh, table, extack); > since fib_nh_gw4 is a union that cannot be used without checking the family first. You need this part as well (previous hunk was for mpath routes): diff --git a/net/ipv4/fib_frontend.c b/net/ipv4/fib_frontend.c index 310060e67790..d4b63f94f7be 100644 --- a/net/ipv4/fib_frontend.c +++ b/net/ipv4/fib_frontend.c @@ -755,8 +755,9 @@ static int rtm_to_fib_config(struct net *net, struct sk_buff *skb, break; case RTA_GATEWAY: has_gw = true; - cfg->fc_gw_family = AF_INET; cfg->fc_gw4 = nla_get_be32(attr); + if (cfg->fc_gw4) + cfg->fc_gw_family = AF_INET; break; case RTA_VIA: has_via = true;
diff --git a/net/ipv4/fib_semantics.c b/net/ipv4/fib_semantics.c index 680b5a9a911a..32ce6e6202d2 100644 --- a/net/ipv4/fib_semantics.c +++ b/net/ipv4/fib_semantics.c @@ -885,134 +885,150 @@ bool fib_metrics_match(struct fib_config *cfg, struct fib_info *fi) * | * |-> {local prefix} (terminal node) */ -static int fib_check_nh(struct fib_config *cfg, struct fib_nh *nh, - struct netlink_ext_ack *extack) +static int fib_check_nh_v4_gw(struct net *net, struct fib_nh *nh, u32 table, + u8 scope, struct netlink_ext_ack *extack) { - int err = 0; - struct net *net; struct net_device *dev; + struct fib_result res; + int err; - net = cfg->fc_nlinfo.nl_net; - if (nh->fib_nh_gw4) { - struct fib_result res; - - if (nh->fib_nh_flags & RTNH_F_ONLINK) { - unsigned int addr_type; + if (nh->fib_nh_flags & RTNH_F_ONLINK) { + unsigned int addr_type; - if (cfg->fc_scope >= RT_SCOPE_LINK) { - NL_SET_ERR_MSG(extack, - "Nexthop has invalid scope"); - return -EINVAL; - } - dev = __dev_get_by_index(net, nh->fib_nh_oif); - if (!dev) { - NL_SET_ERR_MSG(extack, "Nexthop device required for onlink"); - return -ENODEV; - } - if (!(dev->flags & IFF_UP)) { - NL_SET_ERR_MSG(extack, - "Nexthop device is not up"); - return -ENETDOWN; - } - addr_type = inet_addr_type_dev_table(net, dev, - nh->fib_nh_gw4); - if (addr_type != RTN_UNICAST) { - NL_SET_ERR_MSG(extack, - "Nexthop has invalid gateway"); - return -EINVAL; - } - if (!netif_carrier_ok(dev)) - nh->fib_nh_flags |= RTNH_F_LINKDOWN; - nh->fib_nh_dev = dev; - dev_hold(dev); - nh->fib_nh_scope = RT_SCOPE_LINK; - return 0; + if (scope >= RT_SCOPE_LINK) { + NL_SET_ERR_MSG(extack, "Nexthop has invalid scope"); + return -EINVAL; } - rcu_read_lock(); - { - struct fib_table *tbl = NULL; - struct flowi4 fl4 = { - .daddr = nh->fib_nh_gw4, - .flowi4_scope = cfg->fc_scope + 1, - .flowi4_oif = nh->fib_nh_oif, - .flowi4_iif = LOOPBACK_IFINDEX, - }; - - /* It is not necessary, but requires a bit of thinking */ - if (fl4.flowi4_scope < RT_SCOPE_LINK) - fl4.flowi4_scope = RT_SCOPE_LINK; - - if (cfg->fc_table) - tbl = fib_get_table(net, cfg->fc_table); - - if (tbl) - err = fib_table_lookup(tbl, &fl4, &res, - FIB_LOOKUP_IGNORE_LINKSTATE | - FIB_LOOKUP_NOREF); - - /* on error or if no table given do full lookup. This - * is needed for example when nexthops are in the local - * table rather than the given table - */ - if (!tbl || err) { - err = fib_lookup(net, &fl4, &res, - FIB_LOOKUP_IGNORE_LINKSTATE); - } - - if (err) { - NL_SET_ERR_MSG(extack, - "Nexthop has invalid gateway"); - rcu_read_unlock(); - return err; - } + dev = __dev_get_by_index(net, nh->fib_nh_oif); + if (!dev) { + NL_SET_ERR_MSG(extack, "Nexthop device required for onlink"); + return -ENODEV; } - err = -EINVAL; - if (res.type != RTN_UNICAST && res.type != RTN_LOCAL) { - NL_SET_ERR_MSG(extack, "Nexthop has invalid gateway"); - goto out; + if (!(dev->flags & IFF_UP)) { + NL_SET_ERR_MSG(extack, "Nexthop device is not up"); + return -ENETDOWN; } - nh->fib_nh_scope = res.scope; - nh->fib_nh_oif = FIB_RES_OIF(res); - nh->fib_nh_dev = dev = FIB_RES_DEV(res); - if (!dev) { - NL_SET_ERR_MSG(extack, - "No egress device for nexthop gateway"); - goto out; + addr_type = inet_addr_type_dev_table(net, dev, nh->fib_nh_gw4); + if (addr_type != RTN_UNICAST) { + NL_SET_ERR_MSG(extack, "Nexthop has invalid gateway"); + return -EINVAL; } - dev_hold(dev); if (!netif_carrier_ok(dev)) nh->fib_nh_flags |= RTNH_F_LINKDOWN; - err = (dev->flags & IFF_UP) ? 0 : -ENETDOWN; - } else { - struct in_device *in_dev; - - if (nh->fib_nh_flags & (RTNH_F_PERVASIVE | RTNH_F_ONLINK)) { - NL_SET_ERR_MSG(extack, - "Invalid flags for nexthop - PERVASIVE and ONLINK can not be set"); - return -EINVAL; + nh->fib_nh_dev = dev; + dev_hold(dev); + nh->fib_nh_scope = RT_SCOPE_LINK; + return 0; + } + rcu_read_lock(); + { + struct fib_table *tbl = NULL; + struct flowi4 fl4 = { + .daddr = nh->fib_nh_gw4, + .flowi4_scope = scope + 1, + .flowi4_oif = nh->fib_nh_oif, + .flowi4_iif = LOOPBACK_IFINDEX, + }; + + /* It is not necessary, but requires a bit of thinking */ + if (fl4.flowi4_scope < RT_SCOPE_LINK) + fl4.flowi4_scope = RT_SCOPE_LINK; + + if (table) + tbl = fib_get_table(net, table); + + if (tbl) + err = fib_table_lookup(tbl, &fl4, &res, + FIB_LOOKUP_IGNORE_LINKSTATE | + FIB_LOOKUP_NOREF); + + /* on error or if no table given do full lookup. This + * is needed for example when nexthops are in the local + * table rather than the given table + */ + if (!tbl || err) { + err = fib_lookup(net, &fl4, &res, + FIB_LOOKUP_IGNORE_LINKSTATE); } - rcu_read_lock(); - err = -ENODEV; - in_dev = inetdev_by_index(net, nh->fib_nh_oif); - if (!in_dev) - goto out; - err = -ENETDOWN; - if (!(in_dev->dev->flags & IFF_UP)) { - NL_SET_ERR_MSG(extack, "Device for nexthop is not up"); + + if (err) { + NL_SET_ERR_MSG(extack, "Nexthop has invalid gateway"); goto out; } - nh->fib_nh_dev = in_dev->dev; - dev_hold(nh->fib_nh_dev); - nh->fib_nh_scope = RT_SCOPE_HOST; - if (!netif_carrier_ok(nh->fib_nh_dev)) - nh->fib_nh_flags |= RTNH_F_LINKDOWN; - err = 0; } + + err = -EINVAL; + if (res.type != RTN_UNICAST && res.type != RTN_LOCAL) { + NL_SET_ERR_MSG(extack, "Nexthop has invalid gateway"); + goto out; + } + nh->fib_nh_scope = res.scope; + nh->fib_nh_oif = FIB_RES_OIF(res); + nh->fib_nh_dev = dev = FIB_RES_DEV(res); + if (!dev) { + NL_SET_ERR_MSG(extack, + "No egress device for nexthop gateway"); + goto out; + } + dev_hold(dev); + if (!netif_carrier_ok(dev)) + nh->fib_nh_flags |= RTNH_F_LINKDOWN; + err = (dev->flags & IFF_UP) ? 0 : -ENETDOWN; out: rcu_read_unlock(); return err; } +static int fib_check_nh_nongw(struct net *net, struct fib_nh *nh, + struct netlink_ext_ack *extack) +{ + struct in_device *in_dev; + int err; + + if (nh->fib_nh_flags & (RTNH_F_PERVASIVE | RTNH_F_ONLINK)) { + NL_SET_ERR_MSG(extack, + "Invalid flags for nexthop - PERVASIVE and ONLINK can not be set"); + return -EINVAL; + } + + rcu_read_lock(); + + err = -ENODEV; + in_dev = inetdev_by_index(net, nh->fib_nh_oif); + if (!in_dev) + goto out; + err = -ENETDOWN; + if (!(in_dev->dev->flags & IFF_UP)) { + NL_SET_ERR_MSG(extack, "Device for nexthop is not up"); + goto out; + } + + nh->fib_nh_dev = in_dev->dev; + dev_hold(nh->fib_nh_dev); + nh->fib_nh_scope = RT_SCOPE_HOST; + if (!netif_carrier_ok(nh->fib_nh_dev)) + nh->fib_nh_flags |= RTNH_F_LINKDOWN; + err = 0; +out: + rcu_read_unlock(); + return err; +} + +static int fib_check_nh(struct fib_config *cfg, struct fib_nh *nh, + struct netlink_ext_ack *extack) +{ + struct net *net = cfg->fc_nlinfo.nl_net; + u32 table = cfg->fc_table; + int err; + + if (nh->fib_nh_gw_family == AF_INET) + err = fib_check_nh_v4_gw(net, nh, table, cfg->fc_scope, extack); + else + err = fib_check_nh_nongw(net, nh, extack); + + return err; +} + static inline unsigned int fib_laddr_hashfn(__be32 val) { unsigned int mask = (fib_info_hash_size - 1);