diff mbox

Bisect'ed BUG in VLAN promisc mode (6c78dcbd47)

Message ID 1222437636.7598.14.camel@localhost.localdomain
State Not Applicable, archived
Delegated to: David Miller
Headers show

Commit Message

Jesper Dangaard Brouer Sept. 26, 2008, 2 p.m. UTC
Hi Patrick,

Bisected me down to one of you changes
 commit 6c78dcbd47a68a
 [VLAN]: Fix promiscous/allmulti synchronization races

Description:
------------
 All other VLAN interfaces stop working, if a vlan is taken down
 (ifconfig eth1.1025 down) _while_ there is a tcpdump running on that
 interface.

 The problem is a result of promisc mode being removed on the
 real-device (eth1), when the vlan interface is taken down.  This
 should not happen as other vlan devices exists that still need
 promisc mode on the real-device (eth1).

Setup:
------
  eth1      - the real-device
  eth1.1013 - VLAN device
  eth1.1025 - VLAN device

 Both VLAN devices has assigned ether address 00:11:22:33:44:55 (which
 is different from the real-device).

Reproduce / test:
-----------------
Lookfor "promiscuous mode" changes in kern.log
# tail -n 40 -f /var/log/kern.log | grep "promiscuous mode"

Start a tcpdump on VLAN device
# tcpdump -ni eth1.1025

# LOG:
#   kernel: device eth1.1025 entered promiscuous mode

Take VLAN device down
# ifconfig eth1.1025 down

Tcpdump dies, the problem is that promisc mode is removed from both
the real-device and the VLAN device.  That should NOT happen, as there
are other VLAN devices up (eth1.1013).

# LOG:
#   kernel: device eth1.1025 left promiscuous mode
#   kernel: device eth1 left promiscuous mode

Bisect log: Attached

Commit 6c78dcbd47: Attached


See you in Paris!

Comments

Patrick McHardy Sept. 26, 2008, 4:14 p.m. UTC | #1
Jesper Dangaard Brouer wrote:
> Hi Patrick,
> 
> Bisected me down to one of you changes
>  commit 6c78dcbd47a68a
>  [VLAN]: Fix promiscous/allmulti synchronization races
> 
> Description:
> ------------
>  All other VLAN interfaces stop working, if a vlan is taken down
>  (ifconfig eth1.1025 down) _while_ there is a tcpdump running on that
>  interface.
> 
>  The problem is a result of promisc mode being removed on the
>  real-device (eth1), when the vlan interface is taken down.  This
>  should not happen as other vlan devices exists that still need
>  promisc mode on the real-device (eth1).

I'm pretty sure we fixed this already in commit 0ed21b32.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jesper Dangaard Brouer Sept. 26, 2008, 7:22 p.m. UTC | #2
On Fri, 2008-09-26 at 18:14 +0200, Patrick McHardy wrote:
> Jesper Dangaard Brouer wrote:
> > Hi Patrick,
> > 
> > Bisected me down to one of you changes
> >  commit 6c78dcbd47a68a
> >  [VLAN]: Fix promiscous/allmulti synchronization races
> > 
> > Description:
> > ------------
> >  All other VLAN interfaces stop working, if a vlan is taken down
> >  (ifconfig eth1.1025 down) _while_ there is a tcpdump running on that
> >  interface.
> > 
> >  The problem is a result of promisc mode being removed on the
> >  real-device (eth1), when the vlan interface is taken down.  This
> >  should not happen as other vlan devices exists that still need
> >  promisc mode on the real-device (eth1).
> 
> I'm pretty sure we fixed this already in commit 0ed21b32.

I think that I have tested including this commit.  The first thing I did
was to test with latest DaveM net-2.6 git tree.

I'll test it later... perhaps during the hacking days...

Need to pack.. See you soon!
Patrick McHardy Sept. 26, 2008, 7:24 p.m. UTC | #3
Jesper Dangaard Brouer wrote:
> On Fri, 2008-09-26 at 18:14 +0200, Patrick McHardy wrote:
>> Jesper Dangaard Brouer wrote:
>>> Hi Patrick,
>>>
>>> Bisected me down to one of you changes
>>>  commit 6c78dcbd47a68a
>>>  [VLAN]: Fix promiscous/allmulti synchronization races
>>>
>>> Description:
>>> ------------
>>>  All other VLAN interfaces stop working, if a vlan is taken down
>>>  (ifconfig eth1.1025 down) _while_ there is a tcpdump running on that
>>>  interface.
>>>
>>>  The problem is a result of promisc mode being removed on the
>>>  real-device (eth1), when the vlan interface is taken down.  This
>>>  should not happen as other vlan devices exists that still need
>>>  promisc mode on the real-device (eth1).
>> I'm pretty sure we fixed this already in commit 0ed21b32.
> 
> I think that I have tested including this commit.  The first thing I did
> was to test with latest DaveM net-2.6 git tree.

Let me see if I can reproduce it ...
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Patrick McHardy Sept. 26, 2008, 7:28 p.m. UTC | #4
Patrick McHardy wrote:
> Jesper Dangaard Brouer wrote:
>> On Fri, 2008-09-26 at 18:14 +0200, Patrick McHardy wrote:
>>> Jesper Dangaard Brouer wrote:
>>>> Hi Patrick,
>>>>
>>>> Bisected me down to one of you changes
>>>>  commit 6c78dcbd47a68a
>>>>  [VLAN]: Fix promiscous/allmulti synchronization races
>>>>
>>>> Description:
>>>> ------------
>>>>  All other VLAN interfaces stop working, if a vlan is taken down
>>>>  (ifconfig eth1.1025 down) _while_ there is a tcpdump running on that
>>>>  interface.
>>>>
>>>>  The problem is a result of promisc mode being removed on the
>>>>  real-device (eth1), when the vlan interface is taken down.  This
>>>>  should not happen as other vlan devices exists that still need
>>>>  promisc mode on the real-device (eth1).
>>> I'm pretty sure we fixed this already in commit 0ed21b32.
>>
>> I think that I have tested including this commit.  The first thing I did
>> was to test with latest DaveM net-2.6 git tree.
> 
> Let me see if I can reproduce it ...

Actually - one question: you're saying you're using different MAC
addresses on the VLAN devices, so I guess thats why you're expecting
the underlying device to still be in promiscous mode after you set
eth1.1025 down. For devices that support multiple unicast addresses
in hardware, we don't put the device in promiscous mode anymore.
So the question is: is something actually not working, or did you
just notice that the real device is no longer in promiscous mode?

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jesper Dangaard Brouer Sept. 26, 2008, 7:34 p.m. UTC | #5
On Fri, 2008-09-26 at 21:28 +0200, Patrick McHardy wrote:
> Patrick McHardy wrote:
> > Jesper Dangaard Brouer wrote:
> >> On Fri, 2008-09-26 at 18:14 +0200, Patrick McHardy wrote:
> >>> Jesper Dangaard Brouer wrote:
> >>>> Hi Patrick,
> >>>>
> >>>> Bisected me down to one of you changes
> >>>>  commit 6c78dcbd47a68a
> >>>>  [VLAN]: Fix promiscous/allmulti synchronization races
> >>>>
> >>>> Description:
> >>>> ------------
> >>>>  All other VLAN interfaces stop working, if a vlan is taken down
> >>>>  (ifconfig eth1.1025 down) _while_ there is a tcpdump running on that
> >>>>  interface.
> >>>>
> >>>>  The problem is a result of promisc mode being removed on the
> >>>>  real-device (eth1), when the vlan interface is taken down.  This
> >>>>  should not happen as other vlan devices exists that still need
> >>>>  promisc mode on the real-device (eth1).
> >>> I'm pretty sure we fixed this already in commit 0ed21b32.
> >>
> >> I think that I have tested including this commit.  The first thing I did
> >> was to test with latest DaveM net-2.6 git tree.
> > 
> > Let me see if I can reproduce it ...
> 
> Actually - one question: you're saying you're using different MAC
> addresses on the VLAN devices, so I guess thats why you're expecting
> the underlying device to still be in promiscous mode after you set
> eth1.1025 down. For devices that support multiple unicast addresses
> in hardware, we don't put the device in promiscous mode anymore.
> So the question is: is something actually not working, or did you
> just notice that the real device is no longer in promiscous mode?

It stopped working!  In the test setup I do have a machine connected to
eth1.1013, where I have a ping running, that stop working...

I use the tg3 driver for both netcards. Don't know if it supports it?
Patrick McHardy Sept. 26, 2008, 7:41 p.m. UTC | #6
Jesper Dangaard Brouer wrote:
> On Fri, 2008-09-26 at 21:28 +0200, Patrick McHardy wrote:
>> Actually - one question: you're saying you're using different MAC
>> addresses on the VLAN devices, so I guess thats why you're expecting
>> the underlying device to still be in promiscous mode after you set
>> eth1.1025 down. For devices that support multiple unicast addresses
>> in hardware, we don't put the device in promiscous mode anymore.
>> So the question is: is something actually not working, or did you
>> just notice that the real device is no longer in promiscous mode?
> 
> It stopped working!  In the test setup I do have a machine connected to
> eth1.1013, where I have a ping running, that stop working...

Found it in the bugreport :) OK, I'll try to reproduce it now.

> I use the tg3 driver for both netcards. Don't know if it supports it?

The driver doesn't support it.


--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Patrick McHardy Sept. 26, 2008, 8:10 p.m. UTC | #7
Patrick McHardy wrote:
> Jesper Dangaard Brouer wrote:
>> On Fri, 2008-09-26 at 21:28 +0200, Patrick McHardy wrote:
>>> Actually - one question: you're saying you're using different MAC
>>> addresses on the VLAN devices, so I guess thats why you're expecting
>>> the underlying device to still be in promiscous mode after you set
>>> eth1.1025 down. For devices that support multiple unicast addresses
>>> in hardware, we don't put the device in promiscous mode anymore.
>>> So the question is: is something actually not working, or did you
>>> just notice that the real device is no longer in promiscous mode?
>>
>> It stopped working!  In the test setup I do have a machine connected to
>> eth1.1013, where I have a ping running, that stop working...
> 
> Found it in the bugreport :) OK, I'll try to reproduce it now.

Found it without reproduing, but unfortunately I also have to leave
now, will look at it later again.

Anyways, the problem appears to be that the promiscous count is
decremented twice for the VLAN device, once in vlan_stop() because
the device is still in promiscous mode, once when af_packet takes
the VLAN device out of promisc in the NETDEV_UNREGISTER notifier
chain, which triggers the VLAN ->change_rx_mode callback and
removes another promiscous count from the real device.

I think the correct fix would be to not invoke ->change_rx_flags
while the device is down, similar to ->set_multicast_list and
->set_rx_mode, but I need to check the other drivers first.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

commit 6c78dcbd47a68a7d25d2bee7a6c74b9136cb5fde
Author: Patrick McHardy <kaber@trash.net>
Date:   Sat Jul 14 18:52:56 2007 -0700

    [VLAN]: Fix promiscous/allmulti synchronization races
    
    The set_multicast_list function may be called without holding the rtnl
    mutex, resulting in races when changing the underlying device's promiscous
    and allmulti state. Use the change_rx_mode hook, which is always invoked
    under the rtnl.
    
    Signed-off-by: Patrick McHardy <kaber@trash.net>
    Signed-off-by: David S. Miller <davem@davemloft.net>

diff --git a/include/linux/if_vlan.h b/include/linux/if_vlan.h
index 61a57dc..7f71df4 100644
--- a/include/linux/if_vlan.h
+++ b/include/linux/if_vlan.h
@@ -132,8 +132,6 @@  struct vlan_dev_info {
                                            * made, in order to feed the right changes down
                                            * to the real hardware...
                                            */
-	int old_allmulti;               /* similar to above. */
-	int old_promiscuity;            /* similar to above. */
 	struct net_device *real_dev;    /* the underlying device/interface */
 	unsigned char real_dev_addr[ETH_ALEN];
 	struct proc_dir_entry *dent;    /* Holds the proc data */
diff --git a/net/8021q/vlan.c b/net/8021q/vlan.c
index abb9900..39bdcc2 100644
--- a/net/8021q/vlan.c
+++ b/net/8021q/vlan.c
@@ -373,6 +373,7 @@  void vlan_setup(struct net_device *new_dev)
 	new_dev->open = vlan_dev_open;
 	new_dev->stop = vlan_dev_stop;
 	new_dev->set_multicast_list = vlan_dev_set_multicast_list;
+	new_dev->change_rx_flags = vlan_change_rx_flags;
 	new_dev->destructor = free_netdev;
 	new_dev->do_ioctl = vlan_dev_ioctl;
 
diff --git a/net/8021q/vlan.h b/net/8021q/vlan.h
index 62ce1c5..7df5b29 100644
--- a/net/8021q/vlan.h
+++ b/net/8021q/vlan.h
@@ -69,6 +69,7 @@  int vlan_dev_set_vlan_flag(const struct net_device *dev,
 			   u32 flag, short flag_val);
 void vlan_dev_get_realdev_name(const struct net_device *dev, char *result);
 void vlan_dev_get_vid(const struct net_device *dev, unsigned short *result);
+void vlan_change_rx_flags(struct net_device *dev, int change);
 void vlan_dev_set_multicast_list(struct net_device *vlan_dev);
 
 int vlan_check_real_dev(struct net_device *real_dev, unsigned short vlan_id);
diff --git a/net/8021q/vlan_dev.c b/net/8021q/vlan_dev.c
index d4a62d1..dec7e62 100644
--- a/net/8021q/vlan_dev.c
+++ b/net/8021q/vlan_dev.c
@@ -712,6 +712,11 @@  int vlan_dev_open(struct net_device *dev)
 	}
 	memcpy(vlan->real_dev_addr, real_dev->dev_addr, ETH_ALEN);
 
+	if (dev->flags & IFF_ALLMULTI)
+		dev_set_allmulti(real_dev, 1);
+	if (dev->flags & IFF_PROMISC)
+		dev_set_promiscuity(real_dev, 1);
+
 	return 0;
 }
 
@@ -721,6 +726,11 @@  int vlan_dev_stop(struct net_device *dev)
 
 	vlan_flush_mc_list(dev);
 
+	if (dev->flags & IFF_ALLMULTI)
+		dev_set_allmulti(real_dev, -1);
+	if (dev->flags & IFF_PROMISC)
+		dev_set_promiscuity(real_dev, -1);
+
 	if (compare_ether_addr(dev->dev_addr, real_dev->dev_addr))
 		dev_unicast_delete(real_dev, dev->dev_addr, dev->addr_len);
 
@@ -754,34 +764,26 @@  int vlan_dev_ioctl(struct net_device *dev, struct ifreq *ifr, int cmd)
 	return err;
 }
 
+void vlan_change_rx_flags(struct net_device *dev, int change)
+{
+	struct net_device *real_dev = VLAN_DEV_INFO(dev)->real_dev;
+
+	if (change & IFF_ALLMULTI)
+		dev_set_allmulti(real_dev, dev->flags & IFF_ALLMULTI ? 1 : -1);
+	if (change & IFF_PROMISC)
+		dev_set_promiscuity(real_dev, dev->flags & IFF_PROMISC ? 1 : -1);
+}
+
 /** Taken from Gleb + Lennert's VLAN code, and modified... */
 void vlan_dev_set_multicast_list(struct net_device *vlan_dev)
 {
 	struct dev_mc_list *dmi;
 	struct net_device *real_dev;
-	int inc;
 
 	if (vlan_dev && (vlan_dev->priv_flags & IFF_802_1Q_VLAN)) {
 		/* Then it's a real vlan device, as far as we can tell.. */
 		real_dev = VLAN_DEV_INFO(vlan_dev)->real_dev;
 
-		/* compare the current promiscuity to the last promisc we had.. */
-		inc = vlan_dev->promiscuity - VLAN_DEV_INFO(vlan_dev)->old_promiscuity;
-		if (inc) {
-			printk(KERN_INFO "%s: dev_set_promiscuity(master, %d)\n",
-			       vlan_dev->name, inc);
-			dev_set_promiscuity(real_dev, inc); /* found in dev.c */
-			VLAN_DEV_INFO(vlan_dev)->old_promiscuity = vlan_dev->promiscuity;
-		}
-
-		inc = vlan_dev->allmulti - VLAN_DEV_INFO(vlan_dev)->old_allmulti;
-		if (inc) {
-			printk(KERN_INFO "%s: dev_set_allmulti(master, %d)\n",
-			       vlan_dev->name, inc);
-			dev_set_allmulti(real_dev, inc); /* dev.c */
-			VLAN_DEV_INFO(vlan_dev)->old_allmulti = vlan_dev->allmulti;
-		}
-
 		/* looking for addresses to add to master's list */
 		for (dmi = vlan_dev->mc_list; dmi != NULL; dmi = dmi->next) {
 			if (vlan_should_add_mc(dmi, VLAN_DEV_INFO(vlan_dev)->old_mc_list)) {