Message ID | 1371731078-12531-1-git-send-email-nikolay@redhat.com |
---|---|
State | Rejected, archived |
Delegated to: | David Miller |
Headers | show |
On 06/20/2013 02:24 PM, nikolay@redhat.com wrote: > From: Nikolay Aleksandrov <nikolay@redhat.com> > > The first part of the patch stops the addition of VLAN 0 to bonding > devices because we use an internal vlan_list to keep the added vlans and > after that when checking if we're using vlans on the bond > (bond_vlan_used) it evaluates to true always, which leads to different > problems. Since this is intended for HW vlan filters, it's not needed > for the bonding, and for its slaves it will still get added upon > NETDEV_UP. > The second part that does unconditional vlan_vid_del is needed because > when we add vlan 0 to a bonding device, it can never be removed > completely (it will always stay in the local vlan_list). Since there's > refcounting, I don't think this will change the behaviour because if a > real device is UP then vlan 0 will have at least refcnt == 1 so > ndo_vlan_rx_kill_vid won't get called until the device is down, but in > the bonding case we need it while the device is up so we can cleanup > properly after vlan 0 removal. > As an addition I'd like to say that I tried many different fixes of this > issue from inside the bonding, but they all have shortcomings and fixing > the root cause would be much better. For example I can't filter out vlan > 0 in the bond's ndo_vlan_rx_add_vid because bond_has_this_ip() (and others) > rely on being able to check the vlan devices on top through the local > vlan_list. Also there's no way to differentiate between addition of vlan 0 > from vlan_device_event and from register_vlan_dev. > > Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com> In fact I think there's a deeper issue with vlan 0 because if you add it to any device its refcount will only be incremented (unconditional vlan_vid_add in register_vlan_dev) and never decremented. And this issue is also fixed by this patch. Nik -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
From: Nikolay Aleksandrov <nikolay@redhat.com> Date: Thu, 20 Jun 2013 16:08:34 +0200 > On 06/20/2013 02:24 PM, nikolay@redhat.com wrote: >> From: Nikolay Aleksandrov <nikolay@redhat.com> >> >> The first part of the patch stops the addition of VLAN 0 to bonding >> devices because we use an internal vlan_list to keep the added vlans and >> after that when checking if we're using vlans on the bond >> (bond_vlan_used) it evaluates to true always, which leads to different >> problems. Since this is intended for HW vlan filters, it's not needed >> for the bonding, and for its slaves it will still get added upon >> NETDEV_UP. >> The second part that does unconditional vlan_vid_del is needed because >> when we add vlan 0 to a bonding device, it can never be removed >> completely (it will always stay in the local vlan_list). Since there's >> refcounting, I don't think this will change the behaviour because if a >> real device is UP then vlan 0 will have at least refcnt == 1 so >> ndo_vlan_rx_kill_vid won't get called until the device is down, but in >> the bonding case we need it while the device is up so we can cleanup >> properly after vlan 0 removal. >> As an addition I'd like to say that I tried many different fixes of this >> issue from inside the bonding, but they all have shortcomings and fixing >> the root cause would be much better. For example I can't filter out vlan >> 0 in the bond's ndo_vlan_rx_add_vid because bond_has_this_ip() (and others) >> rely on being able to check the vlan devices on top through the local >> vlan_list. Also there's no way to differentiate between addition of vlan 0 >> from vlan_device_event and from register_vlan_dev. >> >> Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com> > In fact I think there's a deeper issue with vlan 0 because if you add it to any > device its refcount will only be incremented (unconditional vlan_vid_add in > register_vlan_dev) and never decremented. And this issue is also fixed by this > patch. I don't think I can apply this patch, it seems to revert very much intentional behavior. If you have the 8021q module available, and you bring a device up, it gets VLAN 0 by default, and if necessary programmed into the HW filters of the device. This VLAN 0 entry is not treated like a real VLAN, it is just there to be decapsulated for the sake of 802.1p Priority Code Points (QoS). If the user explicitly configures other VLAN entries, then removes them all, that conditional check on vlan_id in the delete path retains this default VLAN 0 configuration and is very much intended to behave that way. Your patch breaks this, so I cannot apply it. If bonding is so broken that it cannot cope with this default 802.1p behavior, that is really bonding's problem. It seemingly needs logic to handle 802.1p, and that default VID 0, properly. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 06/28/2013 07:27 AM, David Miller wrote: > From: Nikolay Aleksandrov <nikolay@redhat.com> > Date: Thu, 20 Jun 2013 16:08:34 +0200 > >> On 06/20/2013 02:24 PM, nikolay@redhat.com wrote: >>> From: Nikolay Aleksandrov <nikolay@redhat.com> >>> >>> The first part of the patch stops the addition of VLAN 0 to bonding >>> devices because we use an internal vlan_list to keep the added vlans and >>> after that when checking if we're using vlans on the bond >>> (bond_vlan_used) it evaluates to true always, which leads to different >>> problems. Since this is intended for HW vlan filters, it's not needed >>> for the bonding, and for its slaves it will still get added upon >>> NETDEV_UP. >>> The second part that does unconditional vlan_vid_del is needed because >>> when we add vlan 0 to a bonding device, it can never be removed >>> completely (it will always stay in the local vlan_list). Since there's >>> refcounting, I don't think this will change the behaviour because if a >>> real device is UP then vlan 0 will have at least refcnt == 1 so >>> ndo_vlan_rx_kill_vid won't get called until the device is down, but in >>> the bonding case we need it while the device is up so we can cleanup >>> properly after vlan 0 removal. >>> As an addition I'd like to say that I tried many different fixes of this >>> issue from inside the bonding, but they all have shortcomings and fixing >>> the root cause would be much better. For example I can't filter out vlan >>> 0 in the bond's ndo_vlan_rx_add_vid because bond_has_this_ip() (and others) >>> rely on being able to check the vlan devices on top through the local >>> vlan_list. Also there's no way to differentiate between addition of vlan 0 >>> from vlan_device_event and from register_vlan_dev. >>> >>> Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com> >> In fact I think there's a deeper issue with vlan 0 because if you add it to any >> device its refcount will only be incremented (unconditional vlan_vid_add in >> register_vlan_dev) and never decremented. And this issue is also fixed by this >> patch. > > I don't think I can apply this patch, it seems to revert very much intentional > behavior. > > If you have the 8021q module available, and you bring a device up, it gets > VLAN 0 by default, and if necessary programmed into the HW filters of the > device. > > This VLAN 0 entry is not treated like a real VLAN, it is just there to be > decapsulated for the sake of 802.1p Priority Code Points (QoS). > > If the user explicitly configures other VLAN entries, then removes them all, > that conditional check on vlan_id in the delete path retains this default > VLAN 0 configuration and is very much intended to behave that way. > > Your patch breaks this, so I cannot apply it. > > If bonding is so broken that it cannot cope with this default 802.1p behavior, > that is really bonding's problem. It seemingly needs logic to handle 802.1p, > and that default VID 0, properly. > Hi Dave, Thank you for the review, but I think I didn't explain myself well :-) I know about this behavior and was trying to preserve it, this patch will not destroy that. The reason is that now vlan_vid_add/del are used and in the case of refcount > 0 in vlan_vid_del the vlan doesn't get deleted from the HW filters. So when a device is opened VLAN 0 gets added unconditionally so refcnt = 1, but if I add VLAN 0 (e.g. through vconfig) additionally its refcnt will get = 2, but since in unregister_vlan_dev vlan_vid_del is not called for VLAN 0 its refcnt will stay at 2 (if it was called the only thing that would happen is its refcnt going down without being removed from the HW filter of the card so the behavior you speak of is preserved). Now an example with prints added to vlan_vid_add and vlan_vid_del: ifconfig eth1 up Jun 28 10:04:00 localhost kernel: [ 90.856548] vlan_vid_add: VID 0 REF 1 CR 0 ----- vconfig add eth1 0 Jun 28 10:04:21 localhost kernel: [ 112.033976] vlan_vid_add: VID 0 REF 2 CR 0 ----- vconfig rem eth1.0 (no message, since vlan_vid_del doesn't get called - vid 0 refcnt still = 2) ----- ifconfig eth1 down Jun 28 10:04:30 localhost kernel: [ 120.792305] vlan_vid_del: VID 0 REF 1 ^ - this is the only place vlan_vid_del is called for vlan 0 Now if I were to add and delete vlan 0 once again, its refcnt will only go up, eg: series of vconfig add eth1 0, vconfig rem eth1.0 Jun 28 10:15:40 localhost kernel: [ 790.349967] vlan_vid_add: VID 0 REF 3 CR 0 Jun 28 10:15:42 localhost kernel: [ 792.450245] vlan_vid_add: VID 0 REF 4 CR 0 Nik -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 06/28/2013 07:27 AM, David Miller wrote: > I don't think I can apply this patch, it seems to revert very much intentional > behavior. > > If you have the 8021q module available, and you bring a device up, it gets > VLAN 0 by default, and if necessary programmed into the HW filters of the > device. > > This VLAN 0 entry is not treated like a real VLAN, it is just there to be > decapsulated for the sake of 802.1p Priority Code Points (QoS). > > If the user explicitly configures other VLAN entries, then removes them all, > that conditional check on vlan_id in the delete path retains this default > VLAN 0 configuration and is very much intended to behave that way. > > Your patch breaks this, so I cannot apply it. > > If bonding is so broken that it cannot cope with this default 802.1p behavior, > that is really bonding's problem. It seemingly needs logic to handle 802.1p, > and that default VID 0, properly. > Another analysis of this problem by commits: before commit 5b9ea6e022e9ba0fe39cb349ac40361f78d5da5b ("vlan: introduce vid list with reference counting") ndo_vlan_rx_kill_vid was called directly which would've broken this functionality as you said. But after that commit (and the beginning of refcounting) that is not the case, since when a device is opened VLAN 0 is added and its refcount always has at least +1 until the device is closed (now ndo_vlan_rx_kill_vid is called only when refcount == 0). But by creating/destroying VLAN 0 on top you can bump that to whatever value you'd like which leads to: commit efc73f4bbc238d4f579fb612c04c8e1dd8a82979 ("net: Fix memory leak - vlan_info struct"), which added vlan_vid_del in the 8021q netdev notifier which is called upon NETDEV_DOWN which is intended to remove the VLAN 0 that was added upon NETDEV_UP. But via ruining the refcount you can again leak memory that way (since vlan 0's refcnt will be > 1, so it will not get deleted/freed). Nik -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/net/8021q/vlan.c b/net/8021q/vlan.c index 9424f37..dbabaa5 100644 --- a/net/8021q/vlan.c +++ b/net/8021q/vlan.c @@ -112,11 +112,9 @@ void unregister_vlan_dev(struct net_device *dev, struct list_head *head) } /* Take it out of our own structures, but be sure to interlock with - * HW accelerating devices or SW vlan input packet processing if - * VLAN is not 0 (leave it there for 802.1p). + * HW accelerating devices or SW vlan input packet processing */ - if (vlan_id) - vlan_vid_del(real_dev, vlan->vlan_proto, vlan_id); + vlan_vid_del(real_dev, vlan->vlan_proto, vlan_id); /* Get rid of the vlan's reference to real_dev */ dev_put(real_dev); @@ -354,7 +352,8 @@ static int vlan_device_event(struct notifier_block *unused, unsigned long event, __vlan_device_event(dev, event); if ((event == NETDEV_UP) && - (dev->features & NETIF_F_HW_VLAN_CTAG_FILTER)) { + (dev->features & NETIF_F_HW_VLAN_CTAG_FILTER) && + !netif_is_bond_master(dev)) { pr_info("adding VLAN 0 to HW filter on device %s\n", dev->name); vlan_vid_add(dev, htons(ETH_P_8021Q), 0);