diff mbox

bridge: update mdb expiration timer upon reports.

Message ID 1381435079-15387-1-git-send-email-vyasevic@redhat.com
State Accepted, archived
Delegated to: David Miller
Headers show

Commit Message

Vlad Yasevich Oct. 10, 2013, 7:57 p.m. UTC
commit 9f00b2e7cf241fa389733d41b615efdaa2cb0f5b
	bridge: only expire the mdb entry when query is received
changed the mdb expiration timer to be armed only when QUERY is
received.  Howerver, this causes issues in an environment where
the multicast server socket comes and goes very fast while a client
is trying to send traffic to it.

The root cause is a race where a sequence of LEAVE followed by REPORT
messages can race against QUERY messages generated in response to LEAVE.
The QUERY ends up starting the expiration timer, and that timer can
potentially expire after the new REPORT message has been received signaling
the new join operation.  This leads to a significant drop in multicast
traffic and possible complete stall. 

The solution is to have REPORT messages update the expiration timer
on entries that already exist.

CC: Cong Wang <xiyou.wangcong@gmail.com>
CC: Herbert Xu <herbert@gondor.apana.org.au>
CC: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
---
 net/bridge/br_multicast.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

Comments

Herbert Xu Oct. 10, 2013, 11:48 p.m. UTC | #1
On Thu, Oct 10, 2013 at 03:57:59PM -0400, Vlad Yasevich wrote:
> commit 9f00b2e7cf241fa389733d41b615efdaa2cb0f5b
> 	bridge: only expire the mdb entry when query is received
> changed the mdb expiration timer to be armed only when QUERY is
> received.  Howerver, this causes issues in an environment where
> the multicast server socket comes and goes very fast while a client
> is trying to send traffic to it.
> 
> The root cause is a race where a sequence of LEAVE followed by REPORT
> messages can race against QUERY messages generated in response to LEAVE.
> The QUERY ends up starting the expiration timer, and that timer can
> potentially expire after the new REPORT message has been received signaling
> the new join operation.  This leads to a significant drop in multicast
> traffic and possible complete stall. 
> 
> The solution is to have REPORT messages update the expiration timer
> on entries that already exist.
> 
> CC: Cong Wang <xiyou.wangcong@gmail.com>
> CC: Herbert Xu <herbert@gondor.apana.org.au>
> CC: Stephen Hemminger <stephen@networkplumber.org>
> Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>

Good catch.  Thanks!

Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
David Miller Oct. 11, 2013, 4:57 a.m. UTC | #2
From: Herbert Xu <herbert@gondor.apana.org.au>
Date: Fri, 11 Oct 2013 07:48:56 +0800

> On Thu, Oct 10, 2013 at 03:57:59PM -0400, Vlad Yasevich wrote:
>> commit 9f00b2e7cf241fa389733d41b615efdaa2cb0f5b
>> 	bridge: only expire the mdb entry when query is received
>> changed the mdb expiration timer to be armed only when QUERY is
>> received.  Howerver, this causes issues in an environment where
>> the multicast server socket comes and goes very fast while a client
>> is trying to send traffic to it.
>> 
>> The root cause is a race where a sequence of LEAVE followed by REPORT
>> messages can race against QUERY messages generated in response to LEAVE.
>> The QUERY ends up starting the expiration timer, and that timer can
>> potentially expire after the new REPORT message has been received signaling
>> the new join operation.  This leads to a significant drop in multicast
>> traffic and possible complete stall. 
>> 
>> The solution is to have REPORT messages update the expiration timer
>> on entries that already exist.
>> 
>> CC: Cong Wang <xiyou.wangcong@gmail.com>
>> CC: Herbert Xu <herbert@gondor.apana.org.au>
>> CC: Stephen Hemminger <stephen@networkplumber.org>
>> Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
> 
> Good catch.  Thanks!
> 
> Acked-by: Herbert Xu <herbert@gondor.apana.org.au>

Applied, and queued up for -stable, thanks everyone.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/net/bridge/br_multicast.c b/net/bridge/br_multicast.c
index d1c5786..1085f21 100644
--- a/net/bridge/br_multicast.c
+++ b/net/bridge/br_multicast.c
@@ -611,6 +611,9 @@  rehash:
 		break;
 
 	default:
+		/* If we have an existing entry, update it's expire timer */
+		mod_timer(&mp->timer,
+			  jiffies + br->multicast_membership_interval);
 		goto out;
 	}
 
@@ -680,8 +683,12 @@  static int br_multicast_add_group(struct net_bridge *br,
 	for (pp = &mp->ports;
 	     (p = mlock_dereference(*pp, br)) != NULL;
 	     pp = &p->next) {
-		if (p->port == port)
+		if (p->port == port) {
+			/* We already have a portgroup, update the timer.  */
+			mod_timer(&p->timer,
+				  jiffies + br->multicast_membership_interval);
 			goto out;
+		}
 		if ((unsigned long)p->port < (unsigned long)port)
 			break;
 	}