diff mbox

[net] bridge: do not expire mdb entry when bridge still uses it

Message ID 1362708423-23932-1-git-send-email-amwang@redhat.com
State Changes Requested, archived
Delegated to: David Miller
Headers show

Commit Message

Amerigo Wang March 8, 2013, 2:07 a.m. UTC
From: Cong Wang <amwang@redhat.com>

This is a long-standing bug and reported several times:
https://bugzilla.redhat.com/show_bug.cgi?id=880035
http://marc.info/?l=linux-netdev&m=136164389416341&w=2

This bug can be observed in virt environment, when a KVM guest
communicates with the host via multicast. After some time (should
be 260 sec, I didn't measure), the multicast traffic suddenly
terminates.

This is due to the mdb entry for bridge itself expires automatically,
it should not expire as long as the bridge still generates multicast
traffic. It should expire when the bridge leaves the multicast group,
OR when there is no multicast traffic on this bridge.

I fix this by adding another bool which is set when there is
multicast traffic goes to the bridge, cleared in the expire timer and
when IGMP leave is received. I ran omping for 15 minutes, everything
looks good now.

Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Stephen Hemminger <stephen@networkplumber.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Adam Baker <linux@baker-net.org.uk>
Signed-off-by: Cong Wang <amwang@redhat.com>

---
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Herbert Xu March 8, 2013, 2:31 a.m. UTC | #1
On Fri, Mar 08, 2013 at 10:07:03AM +0800, Cong Wang wrote:
> From: Cong Wang <amwang@redhat.com>
> 
> This is a long-standing bug and reported several times:
> https://bugzilla.redhat.com/show_bug.cgi?id=880035
> http://marc.info/?l=linux-netdev&m=136164389416341&w=2
> 
> This bug can be observed in virt environment, when a KVM guest
> communicates with the host via multicast. After some time (should
> be 260 sec, I didn't measure), the multicast traffic suddenly
> terminates.
> 
> This is due to the mdb entry for bridge itself expires automatically,
> it should not expire as long as the bridge still generates multicast
> traffic. It should expire when the bridge leaves the multicast group,
> OR when there is no multicast traffic on this bridge.
> 
> I fix this by adding another bool which is set when there is
> multicast traffic goes to the bridge, cleared in the expire timer and
> when IGMP leave is received. I ran omping for 15 minutes, everything
> looks good now.

I gather from the bugzilla entry that this happens where there is
no querier in the network.

So my question is why does this only affect the bridge port and
not other ports on the bridge?

Cheers,
Amerigo Wang March 8, 2013, 2:44 a.m. UTC | #2
On Fri, 2013-03-08 at 10:31 +0800, Herbert Xu wrote:
> On Fri, Mar 08, 2013 at 10:07:03AM +0800, Cong Wang wrote:
> > From: Cong Wang <amwang@redhat.com>
> > 
> > This is a long-standing bug and reported several times:
> > https://bugzilla.redhat.com/show_bug.cgi?id=880035
> > http://marc.info/?l=linux-netdev&m=136164389416341&w=2
> > 
> > This bug can be observed in virt environment, when a KVM guest
> > communicates with the host via multicast. After some time (should
> > be 260 sec, I didn't measure), the multicast traffic suddenly
> > terminates.
> > 
> > This is due to the mdb entry for bridge itself expires automatically,
> > it should not expire as long as the bridge still generates multicast
> > traffic. It should expire when the bridge leaves the multicast group,
> > OR when there is no multicast traffic on this bridge.
> > 
> > I fix this by adding another bool which is set when there is
> > multicast traffic goes to the bridge, cleared in the expire timer and
> > when IGMP leave is received. I ran omping for 15 minutes, everything
> > looks good now.
> 
> I gather from the bugzilla entry that this happens where there is
> no querier in the network.
> 
> So my question is why does this only affect the bridge port and
> not other ports on the bridge?
> 

This is a good question. It is due to inside br_handle_frame_finish() we
use 'skb2' to decide if we deliver packets to bridge itself, and in this
case 'skb2' is non-NULL only when:

                if (mdst || BR_INPUT_SKB_CB_MROUTERS_ONLY(skb)) {
                        if ((mdst && mdst->mglist) ||
                            br_multicast_is_router(br))
                                skb2 = skb; //        <======== HERE
                        br_multicast_forward(mdst, skb, skb2);

For other ports, br_multicast_forward() will always forward 'skb' to
them.

Thanks.

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Herbert Xu March 8, 2013, 2:47 a.m. UTC | #3
On Fri, Mar 08, 2013 at 10:44:06AM +0800, Cong Wang wrote:
>
> This is a good question. It is due to inside br_handle_frame_finish() we
> use 'skb2' to decide if we deliver packets to bridge itself, and in this
> case 'skb2' is non-NULL only when:
> 
>                 if (mdst || BR_INPUT_SKB_CB_MROUTERS_ONLY(skb)) {
>                         if ((mdst && mdst->mglist) ||
>                             br_multicast_is_router(br))
>                                 skb2 = skb; //        <======== HERE
>                         br_multicast_forward(mdst, skb, skb2);
> 
> For other ports, br_multicast_forward() will always forward 'skb' to
> them.

Yes but the point is that other ports should also expire after the
timeout and thus be removed from mdst.  So why is this only happening
for the bridge aka mdst->mglist?

Thanks,
Amerigo Wang March 8, 2013, 4:06 a.m. UTC | #4
On Fri, 2013-03-08 at 10:47 +0800, Herbert Xu wrote:
> On Fri, Mar 08, 2013 at 10:44:06AM +0800, Cong Wang wrote:
> >
> > This is a good question. It is due to inside br_handle_frame_finish() we
> > use 'skb2' to decide if we deliver packets to bridge itself, and in this
> > case 'skb2' is non-NULL only when:
> > 
> >                 if (mdst || BR_INPUT_SKB_CB_MROUTERS_ONLY(skb)) {
> >                         if ((mdst && mdst->mglist) ||
> >                             br_multicast_is_router(br))
> >                                 skb2 = skb; //        <======== HERE
> >                         br_multicast_forward(mdst, skb, skb2);
> > 
> > For other ports, br_multicast_forward() will always forward 'skb' to
> > them.
> 
> Yes but the point is that other ports should also expire after the
> timeout and thus be removed from mdst.  So why is this only happening
> for the bridge aka mdst->mglist?

The mdst may be removed, however the BR_INPUT_SKB_CB_MROUTERS_ONLY(skb)
is still set, because in br_multicast_ipv4_rcv():

        if (iph->protocol != IPPROTO_IGMP) {
                if ((iph->daddr & IGMP_LOCAL_GROUP_MASK) !=
IGMP_LOCAL_GROUP)
                        BR_INPUT_SKB_CB(skb)->mrouters_only = 1;
                return 0;
        }


--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Herbert Xu March 8, 2013, 4:08 a.m. UTC | #5
On Fri, Mar 08, 2013 at 12:06:53PM +0800, Cong Wang wrote:
> On Fri, 2013-03-08 at 10:47 +0800, Herbert Xu wrote:
> > On Fri, Mar 08, 2013 at 10:44:06AM +0800, Cong Wang wrote:
> > >
> > > This is a good question. It is due to inside br_handle_frame_finish() we
> > > use 'skb2' to decide if we deliver packets to bridge itself, and in this
> > > case 'skb2' is non-NULL only when:
> > > 
> > >                 if (mdst || BR_INPUT_SKB_CB_MROUTERS_ONLY(skb)) {
> > >                         if ((mdst && mdst->mglist) ||
> > >                             br_multicast_is_router(br))
> > >                                 skb2 = skb; //        <======== HERE
> > >                         br_multicast_forward(mdst, skb, skb2);
> > > 
> > > For other ports, br_multicast_forward() will always forward 'skb' to
> > > them.
> > 
> > Yes but the point is that other ports should also expire after the
> > timeout and thus be removed from mdst.  So why is this only happening
> > for the bridge aka mdst->mglist?
> 
> The mdst may be removed, however the BR_INPUT_SKB_CB_MROUTERS_ONLY(skb)
> is still set, because in br_multicast_ipv4_rcv():

But those packets are only meant for routers, which is why we
have the br_multicast_is_router test above.

Cheers,
Amerigo Wang March 8, 2013, 7:26 a.m. UTC | #6
On Fri, 2013-03-08 at 12:08 +0800, Herbert Xu wrote:
> On Fri, Mar 08, 2013 at 12:06:53PM +0800, Cong Wang wrote:
> > 
> > The mdst may be removed, however the BR_INPUT_SKB_CB_MROUTERS_ONLY(skb)
> > is still set, because in br_multicast_ipv4_rcv():
> 
> But those packets are only meant for routers, which is why we
> have the br_multicast_is_router test above.
> 

Hmm, what the user experiences is the multicast traffic between the
bridge and some port is broken. Isn't this expected to work by default?

Or we can just change the default value of br->multicast_router to 2?

Thanks!

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Herbert Xu March 8, 2013, 11:24 a.m. UTC | #7
On Fri, Mar 08, 2013 at 03:26:21PM +0800, Cong Wang wrote:
>
> Hmm, what the user experiences is the multicast traffic between the
> bridge and some port is broken. Isn't this expected to work by default?

Oh I'm not arguing that this isn't a bug.  I'm trying to ensure
that we come up with the right fix.

> Or we can just change the default value of br->multicast_router to 2?

No because that will cause all sorts of stuff to be thrown at
the host.

Cheers,
Adam Baker March 9, 2013, 10:46 p.m. UTC | #8
On 08/03/13 02:07, Cong Wang wrote:
> From: Cong Wang <amwang@redhat.com>
>
> This is a long-standing bug and reported several times:
> https://bugzilla.redhat.com/show_bug.cgi?id=880035
> http://marc.info/?l=linux-netdev&m=136164389416341&w=2
>
> This bug can be observed in virt environment, when a KVM guest
> communicates with the host via multicast. After some time (should
> be 260 sec, I didn't measure), the multicast traffic suddenly
> terminates.
>
> This is due to the mdb entry for bridge itself expires automatically,
> it should not expire as long as the bridge still generates multicast
> traffic. It should expire when the bridge leaves the multicast group,
> OR when there is no multicast traffic on this bridge.
>
> I fix this by adding another bool which is set when there is
> multicast traffic goes to the bridge, cleared in the expire timer and
> when IGMP leave is received. I ran omping for 15 minutes, everything
> looks good now.
Despite Herbert's comment that he didn't think this patch was correct I 
decided to test it anyway and can confirm that it doesn't fix the issue 
for my configuration.

When Herbert Xu originally sent the patch series that caused the change 
in behaviour that triggered this issue 
(http://www.spinics.net/lists/netdev/msg194893.html) he noted that for 
IPv6 the bridge sends it's own IP address so it can potentially be 
elected as the active querier which he claimed is incorrect. I note 
however that a randomly selected Cisco page on configuring layer 2 IGMP 
snooping 
(http://www.cisco.com/en/US/docs/routers/asr9000/software/multicast/configuration/guide/mcasr9kigsn.html#wp1039776) 
suggests that when configuring a querier on a bridge it should be 
assigned a valid IP address. If it is believed that the use of 0.0.0.0 
as the IP address is what is causing strange behaviour on other devices 
then is there a good reason that a bridge rather than a router shouldn't 
be the active querier? If not then using the bridge IP address and 
having the querier enabled by default may be a reasonable solution 
(provided that our querier obeys the election rules and shuts up if it 
sees a query from a lower IP address that isn't 0.0.0.0). Just because a 
device is the elected querier for IGMP doesn't appear to mean it is 
required to perform any other routing functions.

Any thought on whether that would work - I don't have a device that 
misbehaves on seeing the bridge initiated queries so am unable to test 
if that solves that issue.

An alternative option would be to have a third state for our querier 
where it is only active if it has not seen a query for the Other Querier 
Present Interval but that still leaves the possibility of other devices 
behaving strangely because they have seen the queries we generate when 
they are first switched on.

Regards

Adam Baker

P.S. Sorry Stephen if you feel this is a now a question rather than a 
patch so, as per your previous mail, shouldn't go to individual 
developers but as it is also a NAK to a patch I wasn't comfortable 
trimming the CC list.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Amerigo Wang March 11, 2013, 1:44 a.m. UTC | #9
On Fri, 2013-03-08 at 19:24 +0800, Herbert Xu wrote:
> On Fri, Mar 08, 2013 at 03:26:21PM +0800, Cong Wang wrote:
> >
> > Hmm, what the user experiences is the multicast traffic between the
> > bridge and some port is broken. Isn't this expected to work by default?
> 
> Oh I'm not arguing that this isn't a bug.  I'm trying to ensure
> that we come up with the right fix.

Do you have any idea of a better fix?

> 
> > Or we can just change the default value of br->multicast_router to 2?
> 
> No because that will cause all sorts of stuff to be thrown at
> the host.
> 

OK.

Thanks!

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Amerigo Wang March 19, 2013, 1:12 a.m. UTC | #10
Herbert, ping... :-)

On Mon, 2013-03-11 at 09:44 +0800, Cong Wang wrote:
> On Fri, 2013-03-08 at 19:24 +0800, Herbert Xu wrote:
> > On Fri, Mar 08, 2013 at 03:26:21PM +0800, Cong Wang wrote:
> > >
> > > Hmm, what the user experiences is the multicast traffic between the
> > > bridge and some port is broken. Isn't this expected to work by default?
> > 
> > Oh I'm not arguing that this isn't a bug.  I'm trying to ensure
> > that we come up with the right fix.
> 
> Do you have any idea of a better fix?
> 
> > 
> > > Or we can just change the default value of br->multicast_router to 2?
> > 
> > No because that will cause all sorts of stuff to be thrown at
> > the host.
> > 
> 
> OK.
> 
> Thanks!



--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/net/bridge/br_input.c b/net/bridge/br_input.c
index 828e2bc..c706619 100644
--- a/net/bridge/br_input.c
+++ b/net/bridge/br_input.c
@@ -99,9 +99,11 @@  int br_handle_frame_finish(struct sk_buff *skb)
 	else if (is_multicast_ether_addr(dest)) {
 		mdst = br_mdb_get(br, skb, vid);
 		if (mdst || BR_INPUT_SKB_CB_MROUTERS_ONLY(skb)) {
-			if ((mdst && mdst->mglist) ||
-			    br_multicast_is_router(br))
+			bool to_br = mdst && mdst->mglist;
+			if (to_br || br_multicast_is_router(br))
 				skb2 = skb;
+			if (to_br)
+				mdst->busy = true;
 			br_multicast_forward(mdst, skb, skb2);
 			skb = NULL;
 			if (!skb2)
diff --git a/net/bridge/br_multicast.c b/net/bridge/br_multicast.c
index 923fbea..4b12a0d 100644
--- a/net/bridge/br_multicast.c
+++ b/net/bridge/br_multicast.c
@@ -228,11 +228,16 @@  static void br_multicast_group_expired(unsigned long data)
 	if (!netif_running(br->dev) || timer_pending(&mp->timer))
 		goto out;
 
-	mp->mglist = false;
-
 	if (mp->ports)
 		goto out;
 
+	if (mp->mglist && mp->busy) {
+		mp->busy = false;
+		goto out;
+	}
+
+	mp->mglist = false;
+
 	mdb = mlock_dereference(br->mdb, br);
 
 	hlist_del_rcu(&mp->hlist[mdb->ver]);
@@ -1280,6 +1285,7 @@  static void br_multicast_leave_group(struct net_bridge *br,
 			mod_timer(&mp->timer, time);
 		}
 
+		mp->busy = false;
 		goto out;
 	}
 
diff --git a/net/bridge/br_private.h b/net/bridge/br_private.h
index 3cbf5be..bb9a3e4 100644
--- a/net/bridge/br_private.h
+++ b/net/bridge/br_private.h
@@ -112,6 +112,7 @@  struct net_bridge_mdb_entry
 	struct timer_list		timer;
 	struct br_ip			addr;
 	bool				mglist;
+	bool				busy; /* update this only when mglist == true */
 };
 
 struct net_bridge_mdb_htable