diff mbox

[IPoIB] Identify multicast packets and fix IGMP breakage V3

Message ID alpine.DEB.2.00.1008261825490.26351@router.home
State Not Applicable, archived
Delegated to: David Miller
Headers show

Commit Message

Christoph Lameter (Ampere) Aug. 26, 2010, 11:26 p.m. UTC
On Thu, 26 Aug 2010, Jason Gunthorpe wrote:

> The 40 bytes at this location are defined by the HW specification to
> be an IB GRH which has an identical layout to an IPv6 header. Roland
> is right, it would be clearer to use ib_grh ->dgid

Ok but then we have no nice function that checks for multicast anymore.



Subject: [IPoIB] Identify multicast packets and fix IGMP breakage V3

IGMP processing is broken because the IPOIB does not set the
skb->pkt_type the right way for Multicast traffic. All incoming
packets are set to PACKET_HOST which means that the igmp_recv()
function will ignore the IGMP broadcasts/multicasts.

This in turn means that the IGMP timers are firing and are sending
information about multicast subscriptions unnecessarily. In a large
private network this can cause traffic spikes.

Signed-off-by: Christoph Lameter <cl@linux.com>

---

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Jason Gunthorpe Aug. 26, 2010, 11:43 p.m. UTC | #1
On Thu, Aug 26, 2010 at 06:26:58PM -0500, Christoph Lameter wrote:
> On Thu, 26 Aug 2010, Jason Gunthorpe wrote:
>
> > The 40 bytes at this location are defined by the HW specification to
> > be an IB GRH which has an identical layout to an IPv6 header. Roland
> > is right, it would be clearer to use ib_grh ->dgid
>
> Ok but then we have no nice function that checks for multicast anymore.

Were you going to try it this way?

     /* First byte of dgid signals multicast/broadcast when 0xff */
     if ((wc->wc_flags & IB_WC_GRH) &&
         ((struct ib_grh *)skb->data)->dgid.raw[0] == 0xff) {
	     if (memcmp(((struct ib_grh *)skb->data)->dgid.raw,
                        dev->broadcast + 4, sizeof(union ib_gid)) == 0)
                     skb->pkt_type = PACKET_BROADCAST;
             else
                     skb->pkt_type = PACKET_MULTICAST;
     }
     else
             skb->pkt_type = PACKET_HOST;

I think doing the memcmp only in the multicast path should be
reasonable overhead wise.

Jason
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Christoph Lameter (Ampere) Aug. 26, 2010, 11:57 p.m. UTC | #2
On Thu, 26 Aug 2010, Jason Gunthorpe wrote:

> I think doing the memcmp only in the multicast path should be
> reasonable overhead wise.

Thats is not always possible. Here the multicast path is the
default path that is taken 99% of the time.

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Yossi Etigin Aug. 27, 2010, 12:02 a.m. UTC | #3
> 
> Were you going to try it this way?
> 
>      /* First byte of dgid signals multicast/broadcast when 0xff */
>      if ((wc->wc_flags & IB_WC_GRH) &&
>          ((struct ib_grh *)skb->data)->dgid.raw[0] == 0xff) {
> 	     if (memcmp(((struct ib_grh *)skb->data)->dgid.raw,
>                         dev->broadcast + 4, sizeof(union ib_gid)) ==
0)
>                      skb->pkt_type = PACKET_BROADCAST;
>              else
>                      skb->pkt_type = PACKET_MULTICAST;
>      }
>      else
>              skb->pkt_type = PACKET_HOST;
> 
> I think doing the memcmp only in the multicast path should be
> reasonable overhead wise.
> 

Shouldn't struct ib_grh be packed to make this really work?
The code looks a little messy to me anyway... 
How about using a local var which is a ptr to packed struct ib_grh? The
compiler will probably eliminate it anyway.

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Miller Aug. 27, 2010, 12:07 a.m. UTC | #4
From: Christoph Lameter <cl@linux.com>
Date: Thu, 26 Aug 2010 18:57:09 -0500 (CDT)

> On Thu, 26 Aug 2010, Jason Gunthorpe wrote:
> 
>> I think doing the memcmp only in the multicast path should be
>> reasonable overhead wise.
> 
> Thats is not always possible. Here the multicast path is the
> default path that is taken 99% of the time.

The highest cost is bringing in that packet header's cache line, which
you've already done by reading the byte and checking for 0xff.

I doubt the memcmp() can possibly matter.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jason Gunthorpe Aug. 27, 2010, 12:17 a.m. UTC | #5
On Fri, Aug 27, 2010 at 03:02:54AM +0300, Yossi Etigin wrote:

> Shouldn't struct ib_grh be packed to make this really work?

No idea what the kernel convention for this is. It looks OK to me, in
that no arch I am familiar with will insert padding.

> The code looks a little messy to me anyway... 
> How about using a local var which is a ptr to packed struct ib_grh? The
> compiler will probably eliminate it anyway.

This bike shed is looking pretty well painted already :)

Jason
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Christoph Lameter (Ampere) Aug. 27, 2010, 2:24 a.m. UTC | #6
On Thu, 26 Aug 2010, David Miller wrote:

> The highest cost is bringing in that packet header's cache line, which
> you've already done by reading the byte and checking for 0xff.

And then you need to bring in the cacheline of the broadcast address....
These are pretty long byte strings in the IB case.

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

Index: linux-2.6/drivers/infiniband/ulp/ipoib/ipoib_ib.c
===================================================================
--- linux-2.6.orig/drivers/infiniband/ulp/ipoib/ipoib_ib.c	2010-08-26 18:24:07.842079559 -0500
+++ linux-2.6/drivers/infiniband/ulp/ipoib/ipoib_ib.c	2010-08-26 18:25:33.859815544 -0500
@@ -271,6 +271,14 @@ 
 	ipoib_ud_dma_unmap_rx(priv, mapping);
 	ipoib_ud_skb_put_frags(priv, skb, wc->byte_len);

+	/* First byte of dgid signals multicast when 0xff */
+	if ((wc->wc_flags & IB_WC_GRH) &&
+		((struct ib_grh *)skb->data)->dgid.raw[0] == 0xff)
+
+		skb->pkt_type = PACKET_MULTICAST;
+	else
+		skb->pkt_type = PACKET_HOST;
+
 	skb_pull(skb, IB_GRH_BYTES);

 	skb->protocol = ((struct ipoib_header *) skb->data)->proto;
@@ -281,9 +289,6 @@ 
 	dev->stats.rx_bytes += skb->len;

 	skb->dev = dev;
-	/* XXX get correct PACKET_ type here */
-	skb->pkt_type = PACKET_HOST;
-
 	if (test_bit(IPOIB_FLAG_CSUM, &priv->flags) && likely(wc->csum_ok))
 		skb->ip_summed = CHECKSUM_UNNECESSARY;