Message ID | 1507816314-2896-1-git-send-email-mrv@mojatatu.com |
---|---|
State | Changes Requested, archived |
Delegated to: | David Miller |
Headers | show |
Series | [net-next,v2,1/1] bridge: return error code when deleting Vlan | expand |
On 10/12/17 7:51 AM, Roman Mashak wrote: > v2: > Return err immediately if nbp_vlan_delete() fails (pointed by David Ahern) > > Signed-off-by: Roman Mashak <mrv@mojatatu.com> > --- > net/bridge/br_netlink.c | 8 +++++--- > 1 file changed, 5 insertions(+), 3 deletions(-) > > diff --git a/net/bridge/br_netlink.c b/net/bridge/br_netlink.c > index f0e8268..1efdd48 100644 > --- a/net/bridge/br_netlink.c > +++ b/net/bridge/br_netlink.c > @@ -527,11 +527,13 @@ static int br_vlan_info(struct net_bridge *br, struct net_bridge_port *p, > > case RTM_DELLINK: > if (p) { > - nbp_vlan_delete(p, vinfo->vid); > + err = nbp_vlan_delete(p, vinfo->vid); > + if (err) > + break; I'm not sure a break is the right thing to do. Seems like you leave it in a half configured state. > if (vinfo->flags & BRIDGE_VLAN_INFO_MASTER) > - br_vlan_delete(p->br, vinfo->vid); > + err = br_vlan_delete(p->br, vinfo->vid); > } else { > - br_vlan_delete(br, vinfo->vid); > + err = br_vlan_delete(br, vinfo->vid); > } > break; > } > Why do you want to return the error code here? Walking the code paths seems like ENOENT or err from switchdev_port_obj_del are the 2 error possibilities.
On Thu, Oct 12, 2017 at 10:19 AM, David Ahern <dsahern@gmail.com> wrote: > On 10/12/17 7:51 AM, Roman Mashak wrote: >> v2: >> Return err immediately if nbp_vlan_delete() fails (pointed by David Ahern) >> >> Signed-off-by: Roman Mashak <mrv@mojatatu.com> >> --- >> net/bridge/br_netlink.c | 8 +++++--- >> 1 file changed, 5 insertions(+), 3 deletions(-) >> >> diff --git a/net/bridge/br_netlink.c b/net/bridge/br_netlink.c >> index f0e8268..1efdd48 100644 >> --- a/net/bridge/br_netlink.c >> +++ b/net/bridge/br_netlink.c >> @@ -527,11 +527,13 @@ static int br_vlan_info(struct net_bridge *br, struct net_bridge_port *p, >> >> case RTM_DELLINK: >> if (p) { >> - nbp_vlan_delete(p, vinfo->vid); >> + err = nbp_vlan_delete(p, vinfo->vid); >> + if (err) >> + break; > > I'm not sure a break is the right thing to do. Seems like you leave it > in a half configured state. > >> if (vinfo->flags & BRIDGE_VLAN_INFO_MASTER) >> - br_vlan_delete(p->br, vinfo->vid); >> + err = br_vlan_delete(p->br, vinfo->vid); >> } else { >> - br_vlan_delete(br, vinfo->vid); >> + err = br_vlan_delete(br, vinfo->vid); >> } >> break; >> } >> > > Why do you want to return the error code here? Walking the code paths > seems like ENOENT or err from switchdev_port_obj_del are the 2 error > possibilities. For example, if you attempt to delete a non-existing vlan on a port, the current code succeeds and also sends event : rtnetlink_rcv_msg rtnl_bridge_dellink br_dellink br_afspec br_vlan_info int br_dellink(..) { ... err = br_afspec() if (err == 0) br_ifinfo_notify(RTM_NEWLINK, p); } This is misleading, so a proper errcode has to be produced.
On 12/10/17 21:07, Roman Mashak wrote: > On Thu, Oct 12, 2017 at 10:19 AM, David Ahern <dsahern@gmail.com> wrote: >> On 10/12/17 7:51 AM, Roman Mashak wrote: >>> v2: >>> Return err immediately if nbp_vlan_delete() fails (pointed by David Ahern) >>> >>> Signed-off-by: Roman Mashak <mrv@mojatatu.com> >>> --- >>> net/bridge/br_netlink.c | 8 +++++--- >>> 1 file changed, 5 insertions(+), 3 deletions(-) >>> >>> diff --git a/net/bridge/br_netlink.c b/net/bridge/br_netlink.c >>> index f0e8268..1efdd48 100644 >>> --- a/net/bridge/br_netlink.c >>> +++ b/net/bridge/br_netlink.c >>> @@ -527,11 +527,13 @@ static int br_vlan_info(struct net_bridge *br, struct net_bridge_port *p, >>> >>> case RTM_DELLINK: >>> if (p) { >>> - nbp_vlan_delete(p, vinfo->vid); >>> + err = nbp_vlan_delete(p, vinfo->vid); >>> + if (err) >>> + break; >> >> I'm not sure a break is the right thing to do. Seems like you leave it >> in a half configured state. >> >>> if (vinfo->flags & BRIDGE_VLAN_INFO_MASTER) >>> - br_vlan_delete(p->br, vinfo->vid); >>> + err = br_vlan_delete(p->br, vinfo->vid); >>> } else { >>> - br_vlan_delete(br, vinfo->vid); >>> + err = br_vlan_delete(br, vinfo->vid); >>> } >>> break; >>> } >>> >> >> Why do you want to return the error code here? Walking the code paths >> seems like ENOENT or err from switchdev_port_obj_del are the 2 error >> possibilities. > > For example, if you attempt to delete a non-existing vlan on a port, > the current code succeeds and also sends event : > > rtnetlink_rcv_msg > rtnl_bridge_dellink > br_dellink > br_afspec > br_vlan_info > > int br_dellink(..) > { > ... > err = br_afspec() > if (err == 0) > br_ifinfo_notify(RTM_NEWLINK, p); > } > > This is misleading, so a proper errcode has to be produced. > True, but you also change the expected behaviour because now a user can clear all vlans with one request (1 - 4094), and after the change that will fail with a partial delete if some vlan was missing. This has been the behaviour forever and some script might depend on it. Also IMO, and as David also mentioned, doing a partial delete is not good.
On 17-10-12 02:12 PM, Nikolay Aleksandrov wrote: > On 12/10/17 21:07, Roman Mashak wrote: >> For example, if you attempt to delete a non-existing vlan on a port, >> the current code succeeds and also sends event : >> >> rtnetlink_rcv_msg >> rtnl_bridge_dellink >> br_dellink >> br_afspec >> br_vlan_info >> >> int br_dellink(..) >> { >> ... >> err = br_afspec() >> if (err == 0) >> br_ifinfo_notify(RTM_NEWLINK, p); >> } >> >> This is misleading, so a proper errcode has to be produced. >> > > True, but you also change the expected behaviour because now a user can > clear all vlans with one request (1 - 4094), and after the change that > will fail with a partial delete if some vlan was missing. > The issue is more subtle (per Roman above): Try to delete a vlan (that doesnt exist). 1) It says "success". 2) Worse: Another process listening (bridge monitor?) gets an _event_ that the vlan has been deleted (when it never existed in the first place). > This has been the behaviour forever and some script might depend on it. > Also IMO, and as David also mentioned, doing a partial delete is not good. > I think this is a bug (especially the event part). cheers, jamal
On 13.10.2017 05:03, Jamal Hadi Salim wrote: > On 17-10-12 02:12 PM, Nikolay Aleksandrov wrote: >> On 12/10/17 21:07, Roman Mashak wrote: > >>> For example, if you attempt to delete a non-existing vlan on a port, >>> the current code succeeds and also sends event : >>> >>> rtnetlink_rcv_msg >>> rtnl_bridge_dellink >>> br_dellink >>> br_afspec >>> br_vlan_info >>> >>> int br_dellink(..) >>> { >>> ... >>> err = br_afspec() >>> if (err == 0) >>> br_ifinfo_notify(RTM_NEWLINK, p); >>> } >>> >>> This is misleading, so a proper errcode has to be produced. >>> >> > > > >> True, but you also change the expected behaviour because now a user can >> clear all vlans with one request (1 - 4094), and after the change that >> will fail with a partial delete if some vlan was missing. >> > > The issue is more subtle (per Roman above): > Try to delete a vlan (that doesnt exist). > 1) It says "success". > 2) Worse: Another process listening (bridge monitor?) gets an _event_ > that the vlan has been deleted (when it never existed in the first > place). > >> This has been the behaviour forever and some script might depend on it. >> Also IMO, and as David also mentioned, doing a partial delete is not >> good. >> > > I think this is a bug (especially the event part). > > cheers, > jamal > Fair enough, but after the patch you get the opposite effect too - you delete a couple of vlans but you don't generate an event because of an error in the middle. That at least can be taken care of. I do agree it's a bug, but there might be scripts that rely on it and don't check the return value when clearing vlans. They will end up with a partial clear and wrongly assumed state, so maybe leave the opportunistic delete but count if anything was actually deleted and send an event only then ? That should make everyone happy :-)
On 10/12/17 8:15 PM, Nikolay Aleksandrov wrote: > I do agree it's a bug, but there might be scripts that rely on it and > don't check the return value when clearing vlans. They will end up with > a partial clear and wrongly assumed state, so maybe leave the > opportunistic delete but count if anything was actually deleted and send > an event only then ? +1
Nikolay Aleksandrov <nikolay@cumulusnetworks.com> writes: [...] >>> Why do you want to return the error code here? Walking the code paths >>> seems like ENOENT or err from switchdev_port_obj_del are the 2 error >>> possibilities. >> >> For example, if you attempt to delete a non-existing vlan on a port, >> the current code succeeds and also sends event : >> >> rtnetlink_rcv_msg >> rtnl_bridge_dellink >> br_dellink >> br_afspec >> br_vlan_info >> >> int br_dellink(..) >> { >> ... >> err = br_afspec() >> if (err == 0) >> br_ifinfo_notify(RTM_NEWLINK, p); >> } >> >> This is misleading, so a proper errcode has to be produced. >> > > True, but you also change the expected behaviour because now a user can > clear all vlans with one request (1 - 4094), and after the change that > will fail with a partial delete if some vlan was missing. Nikolay, would you like to have a crack at fixing this? > This has been the behaviour forever and some script might depend on it. > Also IMO, and as David also mentioned, doing a partial delete is not good.
On 13/10/17 19:00, Roman Mashak wrote: > Nikolay Aleksandrov <nikolay@cumulusnetworks.com> writes: > > > [...] > >>>> Why do you want to return the error code here? Walking the code paths >>>> seems like ENOENT or err from switchdev_port_obj_del are the 2 error >>>> possibilities. >>> >>> For example, if you attempt to delete a non-existing vlan on a port, >>> the current code succeeds and also sends event : >>> >>> rtnetlink_rcv_msg >>> rtnl_bridge_dellink >>> br_dellink >>> br_afspec >>> br_vlan_info >>> >>> int br_dellink(..) >>> { >>> ... >>> err = br_afspec() >>> if (err == 0) >>> br_ifinfo_notify(RTM_NEWLINK, p); >>> } >>> >>> This is misleading, so a proper errcode has to be produced. >>> >> >> True, but you also change the expected behaviour because now a user can >> clear all vlans with one request (1 - 4094), and after the change that >> will fail with a partial delete if some vlan was missing. > > Nikolay, would you like to have a crack at fixing this? > Sure, need to finish something and will cook up a patch next week. Thanks, Nik
diff --git a/net/bridge/br_netlink.c b/net/bridge/br_netlink.c index f0e8268..1efdd48 100644 --- a/net/bridge/br_netlink.c +++ b/net/bridge/br_netlink.c @@ -527,11 +527,13 @@ static int br_vlan_info(struct net_bridge *br, struct net_bridge_port *p, case RTM_DELLINK: if (p) { - nbp_vlan_delete(p, vinfo->vid); + err = nbp_vlan_delete(p, vinfo->vid); + if (err) + break; if (vinfo->flags & BRIDGE_VLAN_INFO_MASTER) - br_vlan_delete(p->br, vinfo->vid); + err = br_vlan_delete(p->br, vinfo->vid); } else { - br_vlan_delete(br, vinfo->vid); + err = br_vlan_delete(br, vinfo->vid); } break; }
v2: Return err immediately if nbp_vlan_delete() fails (pointed by David Ahern) Signed-off-by: Roman Mashak <mrv@mojatatu.com> --- net/bridge/br_netlink.c | 8 +++++--- 1 file changed, 5 insertions(+), 3 deletions(-)