Patchwork 8021q: fix vlan 0 inconsistencies

login
register
mail settings
Submitter Nikolay Aleksandrov
Date June 20, 2013, 12:24 p.m.
Message ID <1371731078-12531-1-git-send-email-nikolay@redhat.com>
Download mbox | patch
Permalink /patch/252891/
State Rejected
Delegated to: David Miller
Headers show

Comments

Nikolay Aleksandrov - June 20, 2013, 12:24 p.m.
From: Nikolay Aleksandrov <nikolay@redhat.com>

The first part of the patch stops the addition of VLAN 0 to bonding
devices because we use an internal vlan_list to keep the added vlans and
after that when checking if we're using vlans on the bond
(bond_vlan_used) it evaluates to true always, which leads to different
problems. Since this is intended for HW vlan filters, it's not needed
for the bonding, and for its slaves it will still get added upon
NETDEV_UP.
The second part that does unconditional vlan_vid_del is needed because
when we add vlan 0 to a bonding device, it can never be removed
completely (it will always stay in the local vlan_list). Since there's
refcounting, I don't think this will change the behaviour because if a
real device is UP then vlan 0 will have at least refcnt == 1 so
ndo_vlan_rx_kill_vid won't get called until the device is down, but in
the bonding case we need it while the device is up so we can cleanup
properly after vlan 0 removal.
As an addition I'd like to say that I tried many different fixes of this
issue from inside the bonding, but they all have shortcomings and fixing
the root cause would be much better. For example I can't filter out vlan
0 in the bond's ndo_vlan_rx_add_vid because bond_has_this_ip() (and others)
rely on being able to check the vlan devices on top through the local
vlan_list. Also there's no way to differentiate between addition of vlan 0
from vlan_device_event and from register_vlan_dev.

Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
---
 net/8021q/vlan.c | 9 ++++-----
 1 file changed, 4 insertions(+), 5 deletions(-)
Nikolay Aleksandrov - June 20, 2013, 2:08 p.m.
On 06/20/2013 02:24 PM, nikolay@redhat.com wrote:
> From: Nikolay Aleksandrov <nikolay@redhat.com>
> 
> The first part of the patch stops the addition of VLAN 0 to bonding
> devices because we use an internal vlan_list to keep the added vlans and
> after that when checking if we're using vlans on the bond
> (bond_vlan_used) it evaluates to true always, which leads to different
> problems. Since this is intended for HW vlan filters, it's not needed
> for the bonding, and for its slaves it will still get added upon
> NETDEV_UP.
> The second part that does unconditional vlan_vid_del is needed because
> when we add vlan 0 to a bonding device, it can never be removed
> completely (it will always stay in the local vlan_list). Since there's
> refcounting, I don't think this will change the behaviour because if a
> real device is UP then vlan 0 will have at least refcnt == 1 so
> ndo_vlan_rx_kill_vid won't get called until the device is down, but in
> the bonding case we need it while the device is up so we can cleanup
> properly after vlan 0 removal.
> As an addition I'd like to say that I tried many different fixes of this
> issue from inside the bonding, but they all have shortcomings and fixing
> the root cause would be much better. For example I can't filter out vlan
> 0 in the bond's ndo_vlan_rx_add_vid because bond_has_this_ip() (and others)
> rely on being able to check the vlan devices on top through the local
> vlan_list. Also there's no way to differentiate between addition of vlan 0
> from vlan_device_event and from register_vlan_dev.
> 
> Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
In fact I think there's a deeper issue with vlan 0 because if you add it to any
device its refcount will only be incremented (unconditional vlan_vid_add in
register_vlan_dev) and never decremented. And this issue is also fixed by this
patch.

Nik
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Miller - June 28, 2013, 5:27 a.m.
From: Nikolay Aleksandrov <nikolay@redhat.com>
Date: Thu, 20 Jun 2013 16:08:34 +0200

> On 06/20/2013 02:24 PM, nikolay@redhat.com wrote:
>> From: Nikolay Aleksandrov <nikolay@redhat.com>
>> 
>> The first part of the patch stops the addition of VLAN 0 to bonding
>> devices because we use an internal vlan_list to keep the added vlans and
>> after that when checking if we're using vlans on the bond
>> (bond_vlan_used) it evaluates to true always, which leads to different
>> problems. Since this is intended for HW vlan filters, it's not needed
>> for the bonding, and for its slaves it will still get added upon
>> NETDEV_UP.
>> The second part that does unconditional vlan_vid_del is needed because
>> when we add vlan 0 to a bonding device, it can never be removed
>> completely (it will always stay in the local vlan_list). Since there's
>> refcounting, I don't think this will change the behaviour because if a
>> real device is UP then vlan 0 will have at least refcnt == 1 so
>> ndo_vlan_rx_kill_vid won't get called until the device is down, but in
>> the bonding case we need it while the device is up so we can cleanup
>> properly after vlan 0 removal.
>> As an addition I'd like to say that I tried many different fixes of this
>> issue from inside the bonding, but they all have shortcomings and fixing
>> the root cause would be much better. For example I can't filter out vlan
>> 0 in the bond's ndo_vlan_rx_add_vid because bond_has_this_ip() (and others)
>> rely on being able to check the vlan devices on top through the local
>> vlan_list. Also there's no way to differentiate between addition of vlan 0
>> from vlan_device_event and from register_vlan_dev.
>> 
>> Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
> In fact I think there's a deeper issue with vlan 0 because if you add it to any
> device its refcount will only be incremented (unconditional vlan_vid_add in
> register_vlan_dev) and never decremented. And this issue is also fixed by this
> patch.

I don't think I can apply this patch, it seems to revert very much intentional
behavior.

If you have the 8021q module available, and you bring a device up, it gets
VLAN 0 by default, and if necessary programmed into the HW filters of the
device.

This VLAN 0 entry is not treated like a real VLAN, it is just there to be
decapsulated for the sake of 802.1p Priority Code Points (QoS).

If the user explicitly configures other VLAN entries, then removes them all,
that conditional check on vlan_id in the delete path retains this default
VLAN 0 configuration and is very much intended to behave that way.

Your patch breaks this, so I cannot apply it.

If bonding is so broken that it cannot cope with this default 802.1p behavior,
that is really bonding's problem.  It seemingly needs logic to handle 802.1p,
and that default VID 0, properly.

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Nikolay Aleksandrov - June 28, 2013, 8:20 a.m.
On 06/28/2013 07:27 AM, David Miller wrote:
> From: Nikolay Aleksandrov <nikolay@redhat.com>
> Date: Thu, 20 Jun 2013 16:08:34 +0200
> 
>> On 06/20/2013 02:24 PM, nikolay@redhat.com wrote:
>>> From: Nikolay Aleksandrov <nikolay@redhat.com>
>>>
>>> The first part of the patch stops the addition of VLAN 0 to bonding
>>> devices because we use an internal vlan_list to keep the added vlans and
>>> after that when checking if we're using vlans on the bond
>>> (bond_vlan_used) it evaluates to true always, which leads to different
>>> problems. Since this is intended for HW vlan filters, it's not needed
>>> for the bonding, and for its slaves it will still get added upon
>>> NETDEV_UP.
>>> The second part that does unconditional vlan_vid_del is needed because
>>> when we add vlan 0 to a bonding device, it can never be removed
>>> completely (it will always stay in the local vlan_list). Since there's
>>> refcounting, I don't think this will change the behaviour because if a
>>> real device is UP then vlan 0 will have at least refcnt == 1 so
>>> ndo_vlan_rx_kill_vid won't get called until the device is down, but in
>>> the bonding case we need it while the device is up so we can cleanup
>>> properly after vlan 0 removal.
>>> As an addition I'd like to say that I tried many different fixes of this
>>> issue from inside the bonding, but they all have shortcomings and fixing
>>> the root cause would be much better. For example I can't filter out vlan
>>> 0 in the bond's ndo_vlan_rx_add_vid because bond_has_this_ip() (and others)
>>> rely on being able to check the vlan devices on top through the local
>>> vlan_list. Also there's no way to differentiate between addition of vlan 0
>>> from vlan_device_event and from register_vlan_dev.
>>>
>>> Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
>> In fact I think there's a deeper issue with vlan 0 because if you add it to any
>> device its refcount will only be incremented (unconditional vlan_vid_add in
>> register_vlan_dev) and never decremented. And this issue is also fixed by this
>> patch.
> 
> I don't think I can apply this patch, it seems to revert very much intentional
> behavior.
> 
> If you have the 8021q module available, and you bring a device up, it gets
> VLAN 0 by default, and if necessary programmed into the HW filters of the
> device.
> 
> This VLAN 0 entry is not treated like a real VLAN, it is just there to be
> decapsulated for the sake of 802.1p Priority Code Points (QoS).
> 
> If the user explicitly configures other VLAN entries, then removes them all,
> that conditional check on vlan_id in the delete path retains this default
> VLAN 0 configuration and is very much intended to behave that way.
> 
> Your patch breaks this, so I cannot apply it.
> 
> If bonding is so broken that it cannot cope with this default 802.1p behavior,
> that is really bonding's problem.  It seemingly needs logic to handle 802.1p,
> and that default VID 0, properly.
> 
Hi Dave,
Thank you for the review, but I think I didn't explain myself well :-)
I know about this behavior and was trying to preserve it, this patch will
not destroy that. The reason is that now vlan_vid_add/del are used and in
the case of refcount > 0 in vlan_vid_del the vlan doesn't get deleted from
the HW filters. So when a device is opened VLAN 0 gets added
unconditionally so refcnt = 1, but if I add VLAN 0 (e.g. through vconfig)
additionally its refcnt will get = 2, but since in unregister_vlan_dev
vlan_vid_del is not called for VLAN 0 its refcnt will stay at 2 (if it was
called the only thing that would happen is its refcnt going down without
being removed from the HW filter of the card so the behavior you speak of
is preserved).
Now an example with prints added to vlan_vid_add and vlan_vid_del:
ifconfig eth1 up
Jun 28 10:04:00 localhost kernel: [   90.856548] vlan_vid_add: VID 0 REF 1 CR 0
-----
vconfig add eth1 0
Jun 28 10:04:21 localhost kernel: [  112.033976] vlan_vid_add: VID 0 REF 2 CR 0
-----
vconfig rem eth1.0
(no message, since vlan_vid_del doesn't get called - vid 0 refcnt still = 2)
-----
ifconfig eth1 down
Jun 28 10:04:30 localhost kernel: [  120.792305] vlan_vid_del: VID 0 REF 1
^ - this is the only place vlan_vid_del is called for vlan 0

Now if I were to add and delete vlan 0 once again, its refcnt will only go
up, eg:
series of vconfig add eth1 0, vconfig rem eth1.0
Jun 28 10:15:40 localhost kernel: [  790.349967] vlan_vid_add: VID 0 REF 3 CR 0
Jun 28 10:15:42 localhost kernel: [  792.450245] vlan_vid_add: VID 0 REF 4 CR 0


Nik

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Nikolay Aleksandrov - June 29, 2013, 11:46 a.m.
On 06/28/2013 07:27 AM, David Miller wrote:
> I don't think I can apply this patch, it seems to revert very much intentional
> behavior.
> 
> If you have the 8021q module available, and you bring a device up, it gets
> VLAN 0 by default, and if necessary programmed into the HW filters of the
> device.
> 
> This VLAN 0 entry is not treated like a real VLAN, it is just there to be
> decapsulated for the sake of 802.1p Priority Code Points (QoS).
> 
> If the user explicitly configures other VLAN entries, then removes them all,
> that conditional check on vlan_id in the delete path retains this default
> VLAN 0 configuration and is very much intended to behave that way.
> 
> Your patch breaks this, so I cannot apply it.
> 
> If bonding is so broken that it cannot cope with this default 802.1p behavior,
> that is really bonding's problem.  It seemingly needs logic to handle 802.1p,
> and that default VID 0, properly.
> 
Another analysis of this problem by commits:
before commit 5b9ea6e022e9ba0fe39cb349ac40361f78d5da5b ("vlan: introduce
vid list with reference counting") ndo_vlan_rx_kill_vid was called directly
which would've broken this functionality as you said. But after that commit
(and the beginning of refcounting) that is not the case, since when a
device is opened VLAN 0 is added and its refcount always has at least +1
until the device is closed (now ndo_vlan_rx_kill_vid is called only when
refcount == 0). But by creating/destroying VLAN 0 on top you can bump that
to whatever value you'd like which leads to:
commit efc73f4bbc238d4f579fb612c04c8e1dd8a82979 ("net: Fix memory leak -
vlan_info struct"), which added vlan_vid_del in the 8021q netdev notifier
which is called upon NETDEV_DOWN which is intended to remove the VLAN 0
that was added upon NETDEV_UP. But via ruining the refcount you can again
leak memory that way (since vlan 0's refcnt will be > 1, so it will not get
deleted/freed).

Nik
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Patch

diff --git a/net/8021q/vlan.c b/net/8021q/vlan.c
index 9424f37..dbabaa5 100644
--- a/net/8021q/vlan.c
+++ b/net/8021q/vlan.c
@@ -112,11 +112,9 @@  void unregister_vlan_dev(struct net_device *dev, struct list_head *head)
 	}
 
 	/* Take it out of our own structures, but be sure to interlock with
-	 * HW accelerating devices or SW vlan input packet processing if
-	 * VLAN is not 0 (leave it there for 802.1p).
+	 * HW accelerating devices or SW vlan input packet processing
 	 */
-	if (vlan_id)
-		vlan_vid_del(real_dev, vlan->vlan_proto, vlan_id);
+	vlan_vid_del(real_dev, vlan->vlan_proto, vlan_id);
 
 	/* Get rid of the vlan's reference to real_dev */
 	dev_put(real_dev);
@@ -354,7 +352,8 @@  static int vlan_device_event(struct notifier_block *unused, unsigned long event,
 		__vlan_device_event(dev, event);
 
 	if ((event == NETDEV_UP) &&
-	    (dev->features & NETIF_F_HW_VLAN_CTAG_FILTER)) {
+	    (dev->features & NETIF_F_HW_VLAN_CTAG_FILTER) &&
+	    !netif_is_bond_master(dev)) {
 		pr_info("adding VLAN 0 to HW filter on device %s\n",
 			dev->name);
 		vlan_vid_add(dev, htons(ETH_P_8021Q), 0);