Message ID | 20200307205916.15646-1-sharpd@cumulusnetworks.com |
---|---|
State | Changes Requested |
Delegated to: | stephen hemminger |
Headers | show |
Series | ip link: Prevent duplication of table id for vrf tables | expand |
On 3/7/20 1:59 PM, Donald Sharp wrote: > Creation of different vrf's with duplicate table id's creates > a situation where two different routing entities believe > they have exclusive access to a particular table. This > leads to situations where different routing processes > clash for control of a route due to inadvertent table > id overlap. Prevent end user from making this mistake > on accident. I get the pain, but it is a user management problem and ip is but one tool. I think at most ip warns the user about the table duplication; it can't fail the create.
David - I'm more than a bit confused about this stance. I've been repeatedly told by the likes of you, Roopa, and Nikolay that we cannot modify the kernel behavior. I get that, so that leaves me with user space responses. I went this route because not allowing the end user to make this mistake would have saved us a stupid amount of time from having to debug/understand/rectify ( rinse repeat for every incident ). A warning wouldn't have saved us here since this was all automated and a warning won't generate any actionable return codes from using `ip link add...`. If the argument is that other people are doing it wrong too, point me at them and I'll submit patches there too. In other words a user management problem that the kernel/iproute2 hog ties me from being actually able to stop mistakes when they happen is an interesting response. Part of this is that the routing stack considers vrf completely independent and we don't have duplicate labels to identify the same table( nor can I think of a good use case where this would be even advisable and if you can please let me know as that I want to understand this ). We have a set of actions we perform when we receive routing data from the kernel and if we don't act on the right vrf we've broken routing. This routing data sent over the netlink bus is the tableid, if we can't stop users from making mistakes, can we modify the netlink code actually send us disambiguous data then and include the label as well as part of the route update? thanks! donald On Sun, Mar 8, 2020 at 10:22 PM David Ahern <dsahern@gmail.com> wrote: > > On 3/7/20 1:59 PM, Donald Sharp wrote: > > Creation of different vrf's with duplicate table id's creates > > a situation where two different routing entities believe > > they have exclusive access to a particular table. This > > leads to situations where different routing processes > > clash for control of a route due to inadvertent table > > id overlap. Prevent end user from making this mistake > > on accident. > > I get the pain, but it is a user management problem and ip is but one > tool. I think at most ip warns the user about the table duplication; it > can't fail the create.
On 03/08/2020 07:22 PM, David Ahern wrote: > On 3/7/20 1:59 PM, Donald Sharp wrote: >> Creation of different vrf's with duplicate table id's creates >> a situation where two different routing entities believe >> they have exclusive access to a particular table. This >> leads to situations where different routing processes >> clash for control of a route due to inadvertent table >> id overlap. Prevent end user from making this mistake >> on accident. > > I get the pain, but it is a user management problem and ip is but one > tool. I think at most ip warns the user about the table duplication; it > can't fail the create. You could default 'ip' to use the safe mode, and allow a --force option if there is some reason why it should be allowed? Thanks, Ben
On 3/9/20 6:16 AM, Donald Sharp wrote: > David - > > I'm more than a bit confused about this stance. I've been repeatedly > told by the likes of you, Roopa, and Nikolay that we cannot modify the > kernel behavior. I get that, so that leaves me with user space > responses. I went this route because not allowing the end user to > make this mistake would have saved us a stupid amount of time from > having to debug/understand/rectify ( rinse repeat for every incident > ). A warning wouldn't have saved us here since this was all automated > and a warning won't generate any actionable return codes from using > `ip link add...`. If the argument is that other people are doing it > wrong too, point me at them and I'll submit patches there too. In > other words a user management problem that the kernel/iproute2 hog > ties me from being actually able to stop mistakes when they happen is > an interesting response. > > Part of this is that the routing stack considers vrf completely > independent and we don't have duplicate labels to identify the same > table( nor can I think of a good use case where this would be even > advisable and if you can please let me know as that I want to > understand this ). We have a set of actions we perform when we > receive routing data from the kernel and if we don't act on the right > vrf we've broken routing. This routing data sent over the netlink bus > is the tableid, if we can't stop users from making mistakes, can we > modify the netlink code actually send us disambiguous data then and > include the label as well as part of the route update? As you know multiple tables existed long before VRF devices. The VRF device provides the programmatic API for an end-to-end solution. To name a few: - moving packets (ingress via device A which is in VRF B to look up in table C), - applications (bind this socket to VRF B, a name for the table id), and - assorted notifications like device enslave / removal. I did think about the multiple devices pointing to the same table during development, and at that time I did not see a reason to prevent it. Ultimately, the kernel does not care which is why I say it is a userspace problem. Yes, people make mistakes and tests will typically use ip for scripting. Existing behavior allows it, and we can't break that. Given where we are any change needs to be opt in. For example, your patch can generate a 'WARNING: you are doing something unexpected or wrong' message. From there what comes to mind is the compiler option Werror - make warnings fatal. e.g., for iproute2, it could a new top-level option (ip -Werror) so that you can alias it (alias ip='ip -Werror') and then iproute2 has a warning function to display warnings to the user that when Werror is passed that function does exit(1) to cause the command to fail.
diff --git a/ip/ip_common.h b/ip/ip_common.h index 879287e3..4cc825e9 100644 --- a/ip/ip_common.h +++ b/ip/ip_common.h @@ -148,6 +148,8 @@ void xdp_dump(FILE *fp, struct rtattr *tb, bool link, bool details); /* iplink_vrf.c */ __u32 ipvrf_get_table(const char *name); +__u32 vrf_table_linkinfo(struct rtattr *li[]); +int ipvrf_filter_req(struct nlmsghdr *nhl, int reqlen); int name_is_vrf(const char *name); #ifndef INFINITY_LIFE_TIME diff --git a/ip/iplink_vrf.c b/ip/iplink_vrf.c index 5d20f29d..968593f6 100644 --- a/ip/iplink_vrf.c +++ b/ip/iplink_vrf.c @@ -29,17 +29,64 @@ static void explain(void) vrf_explain(stderr); } +static bool vrf_find_table(struct nlmsghdr *n, __u32 table) +{ + struct ifinfomsg *ifi = NLMSG_DATA(n); + struct rtattr *tb[IFLA_MAX + 1]; + struct rtattr *li[IFLA_INFO_MAX + 1]; + int len = n->nlmsg_len; + __u32 vrf_table; + + len -= NLMSG_LENGTH(sizeof(*ifi)); + if (len < 0) + return false; + + parse_rtattr(tb, IFLA_MAX, IFLA_RTA(ifi), len); + + if (!tb[IFLA_LINKINFO]) + return false; + + parse_rtattr_nested(li, IFLA_INFO_MAX, tb[IFLA_LINKINFO]); + + if (!li[IFLA_INFO_KIND]) + return false; + + vrf_table = vrf_table_linkinfo(li); + + if (vrf_table == table) + return true; + + return false; +} + static int vrf_parse_opt(struct link_util *lu, int argc, char **argv, struct nlmsghdr *n) { while (argc > 0) { if (matches(*argv, "table") == 0) { __u32 table; + bool found = false; + struct nlmsg_chain linfo = { NULL, NULL }; NEXT_ARG(); if (rtnl_rttable_a2n(&table, *argv)) invarg("invalid table ID\n", *argv); + + if (ip_link_list(ipvrf_filter_req, &linfo) == 0) { + struct nlmsg_list *l; + + for (l = linfo.head; l; l = l->next) { + if (vrf_find_table(&l->h, table)) { + found = true; + break; + } + } + } + if (found) + invarg("table specified is already being used\n", + *argv); + addattr32(n, 1024, IFLA_VRF_TABLE, table); } else if (matches(*argv, "help") == 0) { explain(); diff --git a/ip/ipvrf.c b/ip/ipvrf.c index b9a43675..ff0c492c 100644 --- a/ip/ipvrf.c +++ b/ip/ipvrf.c @@ -477,7 +477,7 @@ void vrf_reset(void) vrf_switch("default"); } -static int ipvrf_filter_req(struct nlmsghdr *nlh, int reqlen) +int ipvrf_filter_req(struct nlmsghdr *nlh, int reqlen) { struct rtattr *linkinfo; int err; @@ -497,7 +497,7 @@ static int ipvrf_filter_req(struct nlmsghdr *nlh, int reqlen) } /* input arg is linkinfo */ -static __u32 vrf_table_linkinfo(struct rtattr *li[]) +__u32 vrf_table_linkinfo(struct rtattr *li[]) { struct rtattr *attr[IFLA_VRF_MAX + 1];
Creation of different vrf's with duplicate table id's creates a situation where two different routing entities believe they have exclusive access to a particular table. This leads to situations where different routing processes clash for control of a route due to inadvertent table id overlap. Prevent end user from making this mistake on accident. sharpd@eva ~/i/ip (master)> ip vrf show Name Table ----------------------- BLUE 1300 GREEN 1301 sharpd@eva ~/i/ip (master)> sudo ./ip link add ORANGE type vrf table 1300 Error: argument "1300" is wrong: table specified is already being used sharpd@eva ~/i/ip (master) [255]> sudo ./ip link add ORANGE type vrf table 1302 sharpd@eva ~/i/ip (master)> ip vrf show Name Table ----------------------- BLUE 1300 GREEN 1301 ORANGE 1302 Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com> --- ip/ip_common.h | 2 ++ ip/iplink_vrf.c | 47 +++++++++++++++++++++++++++++++++++++++++++++++ ip/ipvrf.c | 4 ++-- 3 files changed, 51 insertions(+), 2 deletions(-)