Message ID | 20160914124025.13417-1-vincent@bernat.im |
---|---|
State | Rejected, archived |
Delegated to: | David Miller |
Headers | show |
❦ 14 septembre 2016 14:40 CEST, Vincent Bernat <vincent@bernat.im> : > Each interface can be assigned to a numeric group using IFLA_GROUP. This > commit enables a user to reference such a group into an IP rule. Here is > an example of output of iproute2: > > $ ip rule show > 0: from all lookup local > 32764: from all iifgroup 2 lookup 2 > 32765: from all iifgroup 1 lookup 1 > 32766: from all lookup main > 32767: from all lookup default The patch for iproute2 is available here (didn't post it inline to avoid confuse patchwork): http://paste.debian.net/821247/
On 9/14/16 6:40 AM, Vincent Bernat wrote: > When a user wants to assign a routing table to a group of incoming > interfaces, the current solutions are: > > - one IP rule for each interface (scalability problems) > - use of fwmark and devgroup matcher (don't work with internal route > lookups, used for example by RPF) > - use of VRF devices (more complex) Why do you believe that? A VRF is a formalized grouping of interfaces that includes an API for locally generated traffic to specify which VRF/group to use. And, with the l3mdev rule you only need 1 rule for all VRFs regardless of the number which is the best solution to the scalability problem of adding rules per device/group/VRF. What use case are trying to solve?
❦ 14 septembre 2016 16:15 CEST, David Ahern <dsa@cumulusnetworks.com> : >> When a user wants to assign a routing table to a group of incoming >> interfaces, the current solutions are: >> >> - one IP rule for each interface (scalability problems) >> - use of fwmark and devgroup matcher (don't work with internal route >> lookups, used for example by RPF) >> - use of VRF devices (more complex) > > Why do you believe that? A VRF is a formalized grouping of interfaces > that includes an API for locally generated traffic to specify which > VRF/group to use. And, with the l3mdev rule you only need 1 rule for > all VRFs regardless of the number which is the best solution to the > scalability problem of adding rules per device/group/VRF. > > What use case are trying to solve? Local processes have to be made aware of the VRF by binding to the pseudo-device. Some processes may be tricked by LD_PRELOAD but some won't (like stuff written in Go). Maybe I should just find a better way to bind a process to a VRF without its cooperation.
On 9/14/16 8:25 AM, Vincent Bernat wrote: > ❦ 14 septembre 2016 16:15 CEST, David Ahern <dsa@cumulusnetworks.com> : > >>> When a user wants to assign a routing table to a group of incoming >>> interfaces, the current solutions are: >>> >>> - one IP rule for each interface (scalability problems) >>> - use of fwmark and devgroup matcher (don't work with internal route >>> lookups, used for example by RPF) >>> - use of VRF devices (more complex) >> >> Why do you believe that? A VRF is a formalized grouping of interfaces >> that includes an API for locally generated traffic to specify which >> VRF/group to use. And, with the l3mdev rule you only need 1 rule for >> all VRFs regardless of the number which is the best solution to the >> scalability problem of adding rules per device/group/VRF. >> >> What use case are trying to solve? > > Local processes have to be made aware of the VRF by binding to the > pseudo-device. Some processes may be tricked by LD_PRELOAD but some > won't (like stuff written in Go). Maybe I should just find a better way > to bind a process to a VRF without its cooperation. > What API are you using for interface groups? How does an app tell the kernel to use interface group 1 versus group 2? LD_PRELOAD and overloading socket is an ad-hoc hack at best with many holes - as you have found. We (Cumulus Linux) are using this cgroups patch: http://www.mail-archive.com/netdev@vger.kernel.org/msg93408.html I want something formal like the cgroups patch or even the first idea of adding a default sk_bound_dev_if to the task struct: https://github.com/dsahern/linux/commit/b3e5ccc291505c8a503edb20ea2c2b5e86bed96f Parent-to-child inheritance of the setting is a requirement as is the setting getting applied to all IPv4/v6 sockets without action by the process itself. Still have some work to do to get a solution into the kernel.
❦ 14 septembre 2016 16:39 CEST, David Ahern <dsa@cumulusnetworks.com> : >>>> When a user wants to assign a routing table to a group of incoming >>>> interfaces, the current solutions are: >>>> >>>> - one IP rule for each interface (scalability problems) >>>> - use of fwmark and devgroup matcher (don't work with internal route >>>> lookups, used for example by RPF) >>>> - use of VRF devices (more complex) >>> >>> Why do you believe that? A VRF is a formalized grouping of interfaces >>> that includes an API for locally generated traffic to specify which >>> VRF/group to use. And, with the l3mdev rule you only need 1 rule for >>> all VRFs regardless of the number which is the best solution to the >>> scalability problem of adding rules per device/group/VRF. >>> >>> What use case are trying to solve? >> >> Local processes have to be made aware of the VRF by binding to the >> pseudo-device. Some processes may be tricked by LD_PRELOAD but some >> won't (like stuff written in Go). Maybe I should just find a better way >> to bind a process to a VRF without its cooperation. > > What API are you using for interface groups? How does an app tell the > kernel to use interface group 1 versus group 2? In my testbed, I have only one local application which is dnsmasq as a DHCP server. It sends back the answer to the physical interface (with sendmsg() and auxillary data). So it makes my argument a bit moot as the situation is in fact worse without VRF. :-/ My testbed is here (with use of VRF, more recent commits just use plain ip rules): https://github.com/vincentbernat/network-lab/blob/d86e9ed658863ef0f51d7b853d0dc9f8b7427b21/lab-l3-hyperv/setup I could just give more time to VRF. I also had some concerns over performance with the way Netfilter integration is done, but I understand that I could just stay away from POSTROUTING rules which is the only hook executed twice?
On 9/14/16 9:14 AM, Vincent Bernat wrote: > I could just give more time to VRF. I also had some concerns over > performance with the way Netfilter integration is done, but I understand > that I could just stay away from POSTROUTING rules which is the only > hook executed twice? > With the changes that were committed this past weekend, the VRF code is now setup where I can set a flag on a per VRF basis to disable the extra rx and tx processing - ie., no network taps, no netfilter, no qdisc, etc. Drops the overhead of VRF to ~3% maybe a bit less. I need to think about the user api a bit more and formalize the patch. Given my other commitments that probably won't happen until mid-October. But in terms of a building block, the overhead of VRF is continuing to drop.
❦ 14 septembre 2016 17:25 CEST, David Ahern <dsa@cumulusnetworks.com> : >> I could just give more time to VRF. I also had some concerns over >> performance with the way Netfilter integration is done, but I understand >> that I could just stay away from POSTROUTING rules which is the only >> hook executed twice? > With the changes that were committed this past weekend, the VRF code > is now setup where I can set a flag on a per VRF basis to disable the > extra rx and tx processing - ie., no network taps, no netfilter, no > qdisc, etc. Drops the overhead of VRF to ~3% maybe a bit less. I need > to think about the user api a bit more and formalize the patch. Given > my other commitments that probably won't happen until mid-October. But > in terms of a building block, the overhead of VRF is continuing to > drop. Fine by me. We can drop my patch. Thanks!
diff --git a/include/net/fib_rules.h b/include/net/fib_rules.h index 456e4a6006ab..a96b186ccd02 100644 --- a/include/net/fib_rules.h +++ b/include/net/fib_rules.h @@ -28,6 +28,8 @@ struct fib_rule { u32 pref; int suppress_ifgroup; int suppress_prefixlen; + int iifgroup; + int oifgroup; char iifname[IFNAMSIZ]; char oifname[IFNAMSIZ]; struct rcu_head rcu; @@ -92,7 +94,9 @@ struct fib_rules_ops { [FRA_SUPPRESS_PREFIXLEN] = { .type = NLA_U32 }, \ [FRA_SUPPRESS_IFGROUP] = { .type = NLA_U32 }, \ [FRA_GOTO] = { .type = NLA_U32 }, \ - [FRA_L3MDEV] = { .type = NLA_U8 } + [FRA_L3MDEV] = { .type = NLA_U8 }, \ + [FRA_IIFGROUP] = { .type = NLA_U32 }, \ + [FRA_OIFGROUP] = { .type = NLA_U32 } static inline void fib_rule_get(struct fib_rule *rule) { diff --git a/include/uapi/linux/fib_rules.h b/include/uapi/linux/fib_rules.h index 14404b3ebb89..0bf5a5e94d9a 100644 --- a/include/uapi/linux/fib_rules.h +++ b/include/uapi/linux/fib_rules.h @@ -51,6 +51,8 @@ enum { FRA_OIFNAME, FRA_PAD, FRA_L3MDEV, /* iif or oif is l3mdev goto its table */ + FRA_IIFGROUP, /* interface group */ + FRA_OIFGROUP, __FRA_MAX }; diff --git a/net/core/fib_rules.c b/net/core/fib_rules.c index be4629c344a6..f8ed6ba85c72 100644 --- a/net/core/fib_rules.c +++ b/net/core/fib_rules.c @@ -37,6 +37,9 @@ int fib_default_rule_add(struct fib_rules_ops *ops, r->suppress_prefixlen = -1; r->suppress_ifgroup = -1; + r->iifgroup = -1; + r->oifgroup = -1; + /* The lock is not required here, the list in unreacheable * at the moment this function is called */ list_add_tail(&r->list, &ops->rules_list); @@ -193,6 +196,30 @@ static int fib_rule_match(struct fib_rule *rule, struct fib_rules_ops *ops, if (rule->l3mdev && !l3mdev_fib_rule_match(rule->fr_net, fl, arg)) goto out; + if (rule->iifgroup != -1) { + struct net_device *dev; + + rcu_read_lock(); + dev = dev_get_by_index_rcu(rule->fr_net, fl->flowi_iif); + if (!dev || dev->group != rule->iifgroup) { + rcu_read_unlock(); + goto out; + } + rcu_read_unlock(); + } + + if (rule->oifgroup != -1) { + struct net_device *dev; + + rcu_read_lock(); + dev = dev_get_by_index_rcu(rule->fr_net, fl->flowi_oif); + if (!dev || dev->group != rule->oifgroup) { + rcu_read_unlock(); + goto out; + } + rcu_read_unlock(); + } + ret = ops->match(rule, fl, flags); out: return (rule->flags & FIB_RULE_INVERT) ? !ret : ret; @@ -305,6 +332,12 @@ static int rule_exists(struct fib_rules_ops *ops, struct fib_rule_hdr *frh, if (r->l3mdev != rule->l3mdev) continue; + if (r->iifgroup != rule->iifgroup) + continue; + + if (r->oifgroup != rule->oifgroup) + continue; + if (!ops->compare(r, frh, tb)) continue; return 1; @@ -391,6 +424,16 @@ int fib_nl_newrule(struct sk_buff *skb, struct nlmsghdr *nlh) goto errout_free; } + if (tb[FRA_IIFGROUP]) + rule->iifgroup = nla_get_u32(tb[FRA_IIFGROUP]); + else + rule->iifgroup = -1; + + if (tb[FRA_OIFGROUP]) + rule->oifgroup = nla_get_u32(tb[FRA_OIFGROUP]); + else + rule->oifgroup = -1; + rule->action = frh->action; rule->flags = frh->flags; rule->table = frh_get_table(frh, tb); @@ -552,6 +595,14 @@ int fib_nl_delrule(struct sk_buff *skb, struct nlmsghdr *nlh) (rule->l3mdev != nla_get_u8(tb[FRA_L3MDEV]))) continue; + if (tb[FRA_IIFGROUP] && + (rule->iifgroup != nla_get_u32(tb[FRA_IIFGROUP]))) + continue; + + if (tb[FRA_OIFGROUP] && + (rule->oifgroup != nla_get_u32(tb[FRA_OIFGROUP]))) + continue; + if (!ops->compare(rule, frh, tb)) continue; @@ -679,7 +730,11 @@ static int fib_nl_fill_rule(struct sk_buff *skb, struct fib_rule *rule, (rule->tun_id && nla_put_be64(skb, FRA_TUN_ID, rule->tun_id, FRA_PAD)) || (rule->l3mdev && - nla_put_u8(skb, FRA_L3MDEV, rule->l3mdev))) + nla_put_u8(skb, FRA_L3MDEV, rule->l3mdev)) || + ((rule->iifgroup != -1) && + nla_put_u32(skb, FRA_IIFGROUP, rule->iifgroup)) || + ((rule->oifgroup != -1) && + nla_put_u32(skb, FRA_OIFGROUP, rule->oifgroup))) goto nla_put_failure; if (rule->suppress_ifgroup != -1) {
When a user wants to assign a routing table to a group of incoming interfaces, the current solutions are: - one IP rule for each interface (scalability problems) - use of fwmark and devgroup matcher (don't work with internal route lookups, used for example by RPF) - use of VRF devices (more complex) Each interface can be assigned to a numeric group using IFLA_GROUP. This commit enables a user to reference such a group into an IP rule. Here is an example of output of iproute2: $ ip rule show 0: from all lookup local 32764: from all iifgroup 2 lookup 2 32765: from all iifgroup 1 lookup 1 32766: from all lookup main 32767: from all lookup default Signed-off-by: Vincent Bernat <vincent@bernat.im> --- include/net/fib_rules.h | 6 ++++- include/uapi/linux/fib_rules.h | 2 ++ net/core/fib_rules.c | 57 +++++++++++++++++++++++++++++++++++++++++- 3 files changed, 63 insertions(+), 2 deletions(-)