[net-next,v2,1/1] bridge: return error code when deleting Vlan

Message ID 1507816314-2896-1-git-send-email-mrv@mojatatu.com
State Changes Requested
Delegated to: David Miller
Headers show
Series
  • [net-next,v2,1/1] bridge: return error code when deleting Vlan
Related show

Commit Message

Roman Mashak Oct. 12, 2017, 1:51 p.m.
v2:
 Return err immediately if nbp_vlan_delete() fails (pointed by David Ahern)

Signed-off-by: Roman Mashak <mrv@mojatatu.com>
---
 net/bridge/br_netlink.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

Comments

David Ahern Oct. 12, 2017, 2:19 p.m. | #1
On 10/12/17 7:51 AM, Roman Mashak wrote:
> v2:
>  Return err immediately if nbp_vlan_delete() fails (pointed by David Ahern)
> 
> Signed-off-by: Roman Mashak <mrv@mojatatu.com>
> ---
>  net/bridge/br_netlink.c | 8 +++++---
>  1 file changed, 5 insertions(+), 3 deletions(-)
> 
> diff --git a/net/bridge/br_netlink.c b/net/bridge/br_netlink.c
> index f0e8268..1efdd48 100644
> --- a/net/bridge/br_netlink.c
> +++ b/net/bridge/br_netlink.c
> @@ -527,11 +527,13 @@ static int br_vlan_info(struct net_bridge *br, struct net_bridge_port *p,
>  
>  	case RTM_DELLINK:
>  		if (p) {
> -			nbp_vlan_delete(p, vinfo->vid);
> +			err = nbp_vlan_delete(p, vinfo->vid);
> +			if (err)
> +				break;

I'm not sure a break is the right thing to do. Seems like you leave it
in a half configured state.

>  			if (vinfo->flags & BRIDGE_VLAN_INFO_MASTER)
> -				br_vlan_delete(p->br, vinfo->vid);
> +				err = br_vlan_delete(p->br, vinfo->vid);
>  		} else {
> -			br_vlan_delete(br, vinfo->vid);
> +			err = br_vlan_delete(br, vinfo->vid);
>  		}
>  		break;
>  	}
> 

Why do you want to return the error code here? Walking the code paths
seems like ENOENT or err from switchdev_port_obj_del are the 2 error
possibilities.
Roman Mashak Oct. 12, 2017, 6:07 p.m. | #2
On Thu, Oct 12, 2017 at 10:19 AM, David Ahern <dsahern@gmail.com> wrote:
> On 10/12/17 7:51 AM, Roman Mashak wrote:
>> v2:
>>  Return err immediately if nbp_vlan_delete() fails (pointed by David Ahern)
>>
>> Signed-off-by: Roman Mashak <mrv@mojatatu.com>
>> ---
>>  net/bridge/br_netlink.c | 8 +++++---
>>  1 file changed, 5 insertions(+), 3 deletions(-)
>>
>> diff --git a/net/bridge/br_netlink.c b/net/bridge/br_netlink.c
>> index f0e8268..1efdd48 100644
>> --- a/net/bridge/br_netlink.c
>> +++ b/net/bridge/br_netlink.c
>> @@ -527,11 +527,13 @@ static int br_vlan_info(struct net_bridge *br, struct net_bridge_port *p,
>>
>>       case RTM_DELLINK:
>>               if (p) {
>> -                     nbp_vlan_delete(p, vinfo->vid);
>> +                     err = nbp_vlan_delete(p, vinfo->vid);
>> +                     if (err)
>> +                             break;
>
> I'm not sure a break is the right thing to do. Seems like you leave it
> in a half configured state.
>
>>                       if (vinfo->flags & BRIDGE_VLAN_INFO_MASTER)
>> -                             br_vlan_delete(p->br, vinfo->vid);
>> +                             err = br_vlan_delete(p->br, vinfo->vid);
>>               } else {
>> -                     br_vlan_delete(br, vinfo->vid);
>> +                     err = br_vlan_delete(br, vinfo->vid);
>>               }
>>               break;
>>       }
>>
>
> Why do you want to return the error code here? Walking the code paths
> seems like ENOENT or err from switchdev_port_obj_del are the 2 error
> possibilities.

For example, if you attempt to delete a non-existing vlan on a port,
the current code succeeds and also sends event :

rtnetlink_rcv_msg
    rtnl_bridge_dellink
       br_dellink
          br_afspec
             br_vlan_info

int br_dellink(..)
{
  ...
  err = br_afspec()
  if (err == 0)
      br_ifinfo_notify(RTM_NEWLINK, p);
}

This is misleading, so a proper errcode has to be produced.
Nikolay Aleksandrov Oct. 12, 2017, 6:12 p.m. | #3
On 12/10/17 21:07, Roman Mashak wrote:
> On Thu, Oct 12, 2017 at 10:19 AM, David Ahern <dsahern@gmail.com> wrote:
>> On 10/12/17 7:51 AM, Roman Mashak wrote:
>>> v2:
>>>  Return err immediately if nbp_vlan_delete() fails (pointed by David Ahern)
>>>
>>> Signed-off-by: Roman Mashak <mrv@mojatatu.com>
>>> ---
>>>  net/bridge/br_netlink.c | 8 +++++---
>>>  1 file changed, 5 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/net/bridge/br_netlink.c b/net/bridge/br_netlink.c
>>> index f0e8268..1efdd48 100644
>>> --- a/net/bridge/br_netlink.c
>>> +++ b/net/bridge/br_netlink.c
>>> @@ -527,11 +527,13 @@ static int br_vlan_info(struct net_bridge *br, struct net_bridge_port *p,
>>>
>>>       case RTM_DELLINK:
>>>               if (p) {
>>> -                     nbp_vlan_delete(p, vinfo->vid);
>>> +                     err = nbp_vlan_delete(p, vinfo->vid);
>>> +                     if (err)
>>> +                             break;
>>
>> I'm not sure a break is the right thing to do. Seems like you leave it
>> in a half configured state.
>>
>>>                       if (vinfo->flags & BRIDGE_VLAN_INFO_MASTER)
>>> -                             br_vlan_delete(p->br, vinfo->vid);
>>> +                             err = br_vlan_delete(p->br, vinfo->vid);
>>>               } else {
>>> -                     br_vlan_delete(br, vinfo->vid);
>>> +                     err = br_vlan_delete(br, vinfo->vid);
>>>               }
>>>               break;
>>>       }
>>>
>>
>> Why do you want to return the error code here? Walking the code paths
>> seems like ENOENT or err from switchdev_port_obj_del are the 2 error
>> possibilities.
> 
> For example, if you attempt to delete a non-existing vlan on a port,
> the current code succeeds and also sends event :
> 
> rtnetlink_rcv_msg
>     rtnl_bridge_dellink
>        br_dellink
>           br_afspec
>              br_vlan_info
> 
> int br_dellink(..)
> {
>   ...
>   err = br_afspec()
>   if (err == 0)
>       br_ifinfo_notify(RTM_NEWLINK, p);
> }
> 
> This is misleading, so a proper errcode has to be produced.
> 

True, but you also change the expected behaviour because now a user can
clear all vlans with one request (1 - 4094), and after the change that
will fail with a partial delete if some vlan was missing.

This has been the behaviour forever and some script might depend on it.
Also IMO, and as David also mentioned, doing a partial delete is not good.
Jamal Hadi Salim Oct. 13, 2017, 2:03 a.m. | #4
On 17-10-12 02:12 PM, Nikolay Aleksandrov wrote:
> On 12/10/17 21:07, Roman Mashak wrote:

>> For example, if you attempt to delete a non-existing vlan on a port,
>> the current code succeeds and also sends event :
>>
>> rtnetlink_rcv_msg
>>      rtnl_bridge_dellink
>>         br_dellink
>>            br_afspec
>>               br_vlan_info
>>
>> int br_dellink(..)
>> {
>>    ...
>>    err = br_afspec()
>>    if (err == 0)
>>        br_ifinfo_notify(RTM_NEWLINK, p);
>> }
>>
>> This is misleading, so a proper errcode has to be produced.
>>
>



> True, but you also change the expected behaviour because now a user can
> clear all vlans with one request (1 - 4094), and after the change that
> will fail with a partial delete if some vlan was missing.
> 

The issue is more subtle (per Roman above):
Try to delete a vlan (that doesnt  exist).
1) It says "success".
2) Worse: Another process listening (bridge monitor?) gets an _event_
  that  the vlan has been deleted (when it never existed in the first
  place).

> This has been the behaviour forever and some script might depend on it.
> Also IMO, and as David also mentioned, doing a partial delete is not good.
>

I think this is a bug (especially the event part).

cheers,
jamal
Nikolay Aleksandrov Oct. 13, 2017, 2:15 a.m. | #5
On 13.10.2017 05:03, Jamal Hadi Salim wrote:
> On 17-10-12 02:12 PM, Nikolay Aleksandrov wrote:
>> On 12/10/17 21:07, Roman Mashak wrote:
> 
>>> For example, if you attempt to delete a non-existing vlan on a port,
>>> the current code succeeds and also sends event :
>>>
>>> rtnetlink_rcv_msg
>>>      rtnl_bridge_dellink
>>>         br_dellink
>>>            br_afspec
>>>               br_vlan_info
>>>
>>> int br_dellink(..)
>>> {
>>>    ...
>>>    err = br_afspec()
>>>    if (err == 0)
>>>        br_ifinfo_notify(RTM_NEWLINK, p);
>>> }
>>>
>>> This is misleading, so a proper errcode has to be produced.
>>>
>>
> 
> 
> 
>> True, but you also change the expected behaviour because now a user can
>> clear all vlans with one request (1 - 4094), and after the change that
>> will fail with a partial delete if some vlan was missing.
>>
> 
> The issue is more subtle (per Roman above):
> Try to delete a vlan (that doesnt  exist).
> 1) It says "success".
> 2) Worse: Another process listening (bridge monitor?) gets an _event_
>  that  the vlan has been deleted (when it never existed in the first
>  place).
> 
>> This has been the behaviour forever and some script might depend on it.
>> Also IMO, and as David also mentioned, doing a partial delete is not
>> good.
>>
> 
> I think this is a bug (especially the event part).
> 
> cheers,
> jamal
> 

Fair enough, but after the patch you get the opposite effect too - you
delete a couple of vlans but you don't generate an event because of an
error in the middle. That at least can be taken care of.

I do agree it's a bug, but there might be scripts that rely on it and
don't check the return value when clearing vlans. They will end up with
a partial clear and wrongly assumed state, so maybe leave the
opportunistic delete but count if anything was actually deleted and send
an event only then ?
That should make everyone happy :-)
David Ahern Oct. 13, 2017, 2:31 p.m. | #6
On 10/12/17 8:15 PM, Nikolay Aleksandrov wrote:
> I do agree it's a bug, but there might be scripts that rely on it and
> don't check the return value when clearing vlans. They will end up with
> a partial clear and wrongly assumed state, so maybe leave the
> opportunistic delete but count if anything was actually deleted and send
> an event only then ?

+1
Roman Mashak Oct. 13, 2017, 4 p.m. | #7
Nikolay Aleksandrov <nikolay@cumulusnetworks.com> writes:


[...]

>>> Why do you want to return the error code here? Walking the code paths
>>> seems like ENOENT or err from switchdev_port_obj_del are the 2 error
>>> possibilities.
>> 
>> For example, if you attempt to delete a non-existing vlan on a port,
>> the current code succeeds and also sends event :
>> 
>> rtnetlink_rcv_msg
>>     rtnl_bridge_dellink
>>        br_dellink
>>           br_afspec
>>              br_vlan_info
>> 
>> int br_dellink(..)
>> {
>>   ...
>>   err = br_afspec()
>>   if (err == 0)
>>       br_ifinfo_notify(RTM_NEWLINK, p);
>> }
>> 
>> This is misleading, so a proper errcode has to be produced.
>> 
>
> True, but you also change the expected behaviour because now a user can
> clear all vlans with one request (1 - 4094), and after the change that
> will fail with a partial delete if some vlan was missing.

Nikolay, would you like to have a crack at fixing this?

> This has been the behaviour forever and some script might depend on it.
> Also IMO, and as David also mentioned, doing a partial delete is not good.
Nikolay Aleksandrov Oct. 14, 2017, 10:52 a.m. | #8
On 13/10/17 19:00, Roman Mashak wrote:
> Nikolay Aleksandrov <nikolay@cumulusnetworks.com> writes:
> 
> 
> [...]
> 
>>>> Why do you want to return the error code here? Walking the code paths
>>>> seems like ENOENT or err from switchdev_port_obj_del are the 2 error
>>>> possibilities.
>>>
>>> For example, if you attempt to delete a non-existing vlan on a port,
>>> the current code succeeds and also sends event :
>>>
>>> rtnetlink_rcv_msg
>>>     rtnl_bridge_dellink
>>>        br_dellink
>>>           br_afspec
>>>              br_vlan_info
>>>
>>> int br_dellink(..)
>>> {
>>>   ...
>>>   err = br_afspec()
>>>   if (err == 0)
>>>       br_ifinfo_notify(RTM_NEWLINK, p);
>>> }
>>>
>>> This is misleading, so a proper errcode has to be produced.
>>>
>>
>> True, but you also change the expected behaviour because now a user can
>> clear all vlans with one request (1 - 4094), and after the change that
>> will fail with a partial delete if some vlan was missing.
> 
> Nikolay, would you like to have a crack at fixing this?
> 

Sure, need to finish something and will cook up a patch next week.

Thanks,
 Nik

Patch

diff --git a/net/bridge/br_netlink.c b/net/bridge/br_netlink.c
index f0e8268..1efdd48 100644
--- a/net/bridge/br_netlink.c
+++ b/net/bridge/br_netlink.c
@@ -527,11 +527,13 @@  static int br_vlan_info(struct net_bridge *br, struct net_bridge_port *p,
 
 	case RTM_DELLINK:
 		if (p) {
-			nbp_vlan_delete(p, vinfo->vid);
+			err = nbp_vlan_delete(p, vinfo->vid);
+			if (err)
+				break;
 			if (vinfo->flags & BRIDGE_VLAN_INFO_MASTER)
-				br_vlan_delete(p->br, vinfo->vid);
+				err = br_vlan_delete(p->br, vinfo->vid);
 		} else {
-			br_vlan_delete(br, vinfo->vid);
+			err = br_vlan_delete(br, vinfo->vid);
 		}
 		break;
 	}