Patchwork 2.6.36-rc7: net/bridge causes temporary network I/O lockups [2]

Submitter Herbert Xu
Date Oct. 20, 2010, 6:16 a.m.
Message ID <20101020061631.GA21679@gondor.apana.org.au>
Permalink /patch/68390/
State RFC
Delegated to: David Miller

Comments

Herbert Xu - Oct. 20, 2010, 6:16 a.m.
On Mon, Oct 18, 2010 at 10:37:40PM +0200, Patrick Ringl wrote:
>
> Anything else I could possibly provide? :-)

Yes, testing :)

First of all I'd like to rule out (or in) the IPv6 query code,
which is clearly generating a bogus packet (wrong payload_len).
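For reference, here is a minimal C sketch of how the IPv6 `payload_len` field is meant to be computed. It counts everything after the fixed 40-byte IPv6 header (extension headers plus the upper-layer message) and excludes that header itself. The packet layout is an assumption about what the bridge sends and is not taken from the patch: an 8-byte Hop-by-Hop Options header carrying Router Alert, followed by a 24-byte MLDv1 query. The helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fixed IPv6 header size; it is NOT counted in payload_len. */
#define IPV6_FIXED_HDR_LEN 40u
/* Assumed query layout: Hop-by-Hop header with Router Alert (8 bytes)
 * followed by an MLDv1 message (24 bytes). */
#define HOPOPT_RTALERT_LEN 8u
#define MLDV1_QUERY_LEN 24u

/* payload_len = extension headers + upper-layer message.
 * Returned in host byte order; a real header stores it via htons(). */
static uint16_t ipv6_payload_len(size_t ext_hdrs_len, size_t upper_len)
{
    assert(ext_hdrs_len + upper_len <= UINT16_MAX);
    return (uint16_t)(ext_hdrs_len + upper_len);
}

/* Cross-check a captured packet: the IPv6 total length (link-layer
 * header excluded) must equal 40 + payload_len. */
static int payload_len_matches(uint16_t payload_len, size_t ip_total_len)
{
    return ip_total_len >= IPV6_FIXED_HDR_LEN &&
           (size_t)payload_len == ip_total_len - IPV6_FIXED_HDR_LEN;
}
```

With the assumed layout, a well-formed query would carry payload_len 32 in a 72-byte IPv6 packet. A dump showing a different payload_len for a packet of that size would be the kind of mismatch described above.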

So can you apply this patch and see if it makes the problem
go away? Please take packet dumps so we know that the IPv6 query
is no longer being sent.

Thanks,
Patrick Ringl - Oct. 21, 2010, 10:18 p.m.
On 10/20/2010 08:16 AM, Herbert Xu wrote:
> On Mon, Oct 18, 2010 at 10:37:40PM +0200, Patrick Ringl wrote:
>    
>> Anything else I could possibly provide? :-)
>>      
> Yes, testing :)
>
> First of all I'd like to rule out (or in) the IPv6 query code,
> which is clearly generating a bogus packet (wrong payload_len).
>
> So can you apply this patch and see if it makes the problem
> go away? Please take packet dumps so we know that the IPv6 query
> is no longer being sent.
>    
Hi,

Sorry for the late response. I've been running your patch on 2.6.36 and,
unfortunately, the bogus IPv6 packet is not the cause of the lockups. I
have attached two packet dumps (br0 and eth1) again.


regards,
Patrick
Herbert Xu - Oct. 21, 2010, 11:07 p.m.
On Fri, Oct 22, 2010 at 12:18:38AM +0200, Patrick Ringl wrote:
>
> sorry for the late response. I've been using your patch on 2.6.36 and  
> unfortunately, the bogus ipv6 packet is not the cause of the lockups. I  
> have attached two packet dumps (br0 and eth1) again.

OK, I see. I had thought that your whole system locked up for
20-30 seconds, but it was only the external network responses
that stopped.

I think the problem is your switch.  It appears to purge our
port entry when it receives our general query.

So to work around this, I suggest that you disable the startup
queries through the parameter multicast_startup_query_count.
You can do this either through sysfs or a sufficiently recent
brctl command.
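Here is a minimal C sketch of the sysfs route. It assumes the bridge is br0 (as in the attached dumps) and that the attribute lives at /sys/class/net/br0/bridge/multicast_startup_query_count. Writing 0 is our reading of "disable the startup queries". The helper name write_bridge_param is hypothetical, and writing under /sys requires root.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: write an unsigned value to a bridge sysfs
 * attribute, e.g.
 *   /sys/class/net/br0/bridge/multicast_startup_query_count
 * ("br0" is an assumed bridge name). Returns 0 on success, -1 on error. */
static int write_bridge_param(const char *path, unsigned int value)
{
    assert(path != NULL);

    FILE *f = fopen(path, "w");
    if (f == NULL)
        return -1;          /* missing attribute or insufficient privileges */

    int ok = fprintf(f, "%u\n", value) > 0;

    /* fclose() flushes the buffer; sysfs reports rejected values here. */
    if (fclose(f) != 0)
        ok = 0;

    return ok ? 0 : -1;
}
```

For example, the call would be `write_bridge_param("/sys/class/net/br0/bridge/multicast_startup_query_count", 0)`. Echoing 0 into the same file from a root shell has the same effect.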

BTW, what brand/model is your switch? If this problem is common
enough then we may have to disable general queries by default.

Cheers,

Patch

diff --git a/net/bridge/br_multicast.c b/net/bridge/br_multicast.c
index eb5b256..66f39d7 100644
--- a/net/bridge/br_multicast.c
+++ b/net/bridge/br_multicast.c
@@ -832,11 +832,6 @@  static void br_multicast_send_query(struct net_bridge *br,
 	br_group.proto = htons(ETH_P_IP);
 	__br_multicast_send_query(br, port, &br_group);
 
-#if defined(CONFIG_IPV6) || defined(CONFIG_IPV6_MODULE)
-	br_group.proto = htons(ETH_P_IPV6);
-	__br_multicast_send_query(br, port, &br_group);
-#endif
-
 	time = jiffies;
 	time += sent < br->multicast_startup_query_count ?
 		br->multicast_startup_query_interval :