diff mbox

MAX_SKB_FRAGS and GRO

Message ID 20100208100306.GQ32246@kryten
State RFC, archived
Delegated to: David Miller
Headers show

Commit Message

Anton Blanchard Feb. 8, 2010, 10:03 a.m. UTC
Hi Herbert,

I was looking through the hardware GRO support in various drivers and I think
we have a couple of issues with PAGE_SIZE > 4k. For example, if we have a 64kB
page size then MAX_SKB_FRAGS ends up as 3:

#define MAX_SKB_FRAGS (65536/PAGE_SIZE + 2)

This should be fine for hardware and software GSO, but we encounter issues
with hardware GRO (not sure about software GRO).

In the ixgbe case we use MAX_SKB_FRAGS to program the max number of GRO
descriptors, even though we assemble GRO packets using ->frag_list:

#if (MAX_SKB_FRAGS > 16)
                rscctrl |= IXGBE_RSCCTL_MAXDESC_16;
#elif (MAX_SKB_FRAGS > 8)
                rscctrl |= IXGBE_RSCCTL_MAXDESC_8;
#elif (MAX_SKB_FRAGS > 4)
                rscctrl |= IXGBE_RSCCTL_MAXDESC_4;
#else
                rscctrl |= IXGBE_RSCCTL_MAXDESC_1;
#endif

With MAX_SKB_FRAGS = 3 it looks like we only allow 1 GRO descriptor, and since
the card can only do DMAs of 16kB that will be our maximum GRO packet.

I don't have a hardware GRO capable ixgbe to try this out yet, but I think
we want to do something like the attached (completely untested) patch, dividing
GSO_MAX_SIZE by our per packet maximum.

In the cxgb3 case we build GRO packets via ->frags. (FYI I had a look through
the driver but I wasn't able to see where it caps GRO assembly to
MAX_SKB_FRAGS, probably missing something). cxgb3 may be able to do 64kB
DMAs (again not sure), but you could imagine a card with DMA restrictions,
perhaps as bad as 4kB per descriptor. In this case MAX_SKB_FRAGS is sized way
too small.

Thinking out aloud, would setting a pessimistic value for MAX_SKB_FRAGS
be one way to fix this? ie:

#define MAX_SKB_FRAGS (65536/4096 + 2)

I'm not sure if that would break the tx side of some adapters.

Anton

Comments

Herbert Xu Feb. 8, 2010, 12:47 p.m. UTC | #1
On Mon, Feb 08, 2010 at 09:03:07PM +1100, Anton Blanchard wrote:
> 
> I was looking through the hardware GRO support in various drivers and I think
> we have a couple of issues with PAGE_SIZE > 4k. For example, if we have a 64kB
> page size then MAX_SKB_FRAGS ends up as 3:
> 
> #define MAX_SKB_FRAGS (65536/PAGE_SIZE + 2)
> 
> This should be fine for hardware and software GSO, but we encounter issues
> with hardware GRO (not sure about software GRO).
> 
> In the ixgbe case we use MAX_SKB_FRAGS to program the max number of GRO
> descriptors, even though we assemble GRO packets using ->frag_list:
> 
> #if (MAX_SKB_FRAGS > 16)
>                 rscctrl |= IXGBE_RSCCTL_MAXDESC_16;
> #elif (MAX_SKB_FRAGS > 8)
>                 rscctrl |= IXGBE_RSCCTL_MAXDESC_8;
> #elif (MAX_SKB_FRAGS > 4)
>                 rscctrl |= IXGBE_RSCCTL_MAXDESC_4;
> #else
>                 rscctrl |= IXGBE_RSCCTL_MAXDESC_1;
> #endif

First of all this isn't GRO, but RSC.  With GRO we impose extra
restrictions on what packets can be merged while RSC is more
permissive.

In fact I think the ixgbe code may be broken as it is since it's
not marking RSC packets in any way to prevent them from being
forwarded through another interface.

As to your problem with RSC on a 64K page system, I'm sure one
of the Intel developers would be able to help you out.
 
> Thinking out aloud, would setting a pessimistic value for MAX_SKB_FRAGS
> be one way to fix this? ie:
> 
> #define MAX_SKB_FRAGS (65536/4096 + 2)

While I can't think of any serious issues with this, as this is
an entirely ixgbe-specific problem, the fix should probably stay
there.

Cheers,
Duyck, Alexander H Feb. 8, 2010, 5:35 p.m. UTC | #2
Herbert Xu wrote:
> On Mon, Feb 08, 2010 at 09:03:07PM +1100, Anton Blanchard wrote:
>> 
>> I was looking through the hardware GRO support in various drivers
>> and I think we have a couple of issues with PAGE_SIZE > 4k. For
>> example, if we have a 64kB page size then MAX_SKB_FRAGS ends up as 3:
>> 
>> #define MAX_SKB_FRAGS (65536/PAGE_SIZE + 2)
>> 
>> This should be fine for hardware and software GSO, but we encounter
>> issues with hardware GRO (not sure about software GRO).
>> 
>> In the ixgbe case we use MAX_SKB_FRAGS to program the max number of
>> GRO descriptors, even though we assemble GRO packets using
>> ->frag_list: 
>> 
>> #if (MAX_SKB_FRAGS > 16)
>>                 rscctrl |= IXGBE_RSCCTL_MAXDESC_16; #elif
>>                 (MAX_SKB_FRAGS > 8) rscctrl |=
>> IXGBE_RSCCTL_MAXDESC_8; #elif (MAX_SKB_FRAGS > 4)
>>                 rscctrl |= IXGBE_RSCCTL_MAXDESC_4;
>> #else
>>                 rscctrl |= IXGBE_RSCCTL_MAXDESC_1;
>> #endif
> 
> First of all this isn't GRO, but RSC.  With GRO we impose extra
> restrictions on what packets can be merged while RSC is more
> permissive.
> 
> In fact I think the ixgbe code may be broken as it is since it's
> not marking RSC packets in any way to prevent them from being
> forwarded through another interface.
> 
> As to your problem with RSC on a 64K page system, I'm sure one
> of the Intel developers would be able to help you out.
> 
>> Thinking out aloud, would setting a pessimistic value for
>> MAX_SKB_FRAGS 
>> be one way to fix this? ie:
>> 
>> #define MAX_SKB_FRAGS (65536/4096 + 2)
> 
> While I can't think of any serious issues with this, as this is
> an entirely ixgbe-specific problem, the fix should probably stay
> there.
> 
> Cheers,

Herbert is 100% correct, this is HW RSC and not GRO.

When we are using packet split mode we don't use the frag_list, we use frags and just append the buffers as pages.  This is why we use MAX_SKB_FRAGS to determine how many descriptors can be used for a single RSC receive.  The reason for doing it this way is because we can then hand it off to a process like GRO to join up to 3 frames together on systems with 64K pages.

In regards to the bridging/forwarding issue I don't think there is a problem there since we use the LRO flag to determine if HW RSC can be enabled and so typically if you put the port into a forwarding/bridging state HW RSC is automatically disabled.  I will recommend that our validation team double check that though.

If you are running in single buffer mode, which is the mode that uses the frag_list, then you can join up to 16 descriptors as I recall.  However we must avoid joining enough descriptors to result in 64K or more bytes per frame so usually we end up getting a max of 32K per RSC context simply due to the fact that everything is based on powers of 2.

Thanks,

Alex--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jesse Brandeburg Feb. 8, 2010, 5:41 p.m. UTC | #3
On Mon, 8 Feb 2010, Herbert Xu wrote:
> On Mon, Feb 08, 2010 at 09:03:07PM +1100, Anton Blanchard wrote:
> > 
> > I was looking through the hardware GRO support in various drivers and I think
> > we have a couple of issues with PAGE_SIZE > 4k. For example, if we have a 64kB
> > page size then MAX_SKB_FRAGS ends up as 3:
> > 
> > #define MAX_SKB_FRAGS (65536/PAGE_SIZE + 2)
> > 
> > This should be fine for hardware and software GSO, but we encounter issues
> > with hardware GRO (not sure about software GRO).
> > 
> > In the ixgbe case we use MAX_SKB_FRAGS to program the max number of GRO
> > descriptors, even though we assemble GRO packets using ->frag_list:
> > 
> > #if (MAX_SKB_FRAGS > 16)
> >                 rscctrl |= IXGBE_RSCCTL_MAXDESC_16;
> > #elif (MAX_SKB_FRAGS > 8)
> >                 rscctrl |= IXGBE_RSCCTL_MAXDESC_8;
> > #elif (MAX_SKB_FRAGS > 4)
> >                 rscctrl |= IXGBE_RSCCTL_MAXDESC_4;
> > #else
> >                 rscctrl |= IXGBE_RSCCTL_MAXDESC_1;
> > #endif
> 
> First of all this isn't GRO, but RSC.  With GRO we impose extra
> restrictions on what packets can be merged while RSC is more
> permissive.

Herbert is right.  Just for clarity lets not call it "hardware GRO" but 
instead just call it RSC or hardware RSC (receive side coalescing)
 
> In fact I think the ixgbe code may be broken as it is since it's
> not marking RSC packets in any way to prevent them from being
> forwarded through another interface.

Herbert, what are we missing? Is there some interface for us to do what 
you describe?  do we need to set DODGY?
 
> As to your problem with RSC on a 64K page system, I'm sure one
> of the Intel developers would be able to help you out.

I'm here, Anton, what can I do or test specifically for you?
  
> > Thinking out aloud, would setting a pessimistic value for MAX_SKB_FRAGS
> > be one way to fix this? ie:
> > 
> > #define MAX_SKB_FRAGS (65536/4096 + 2)
> 
> While I can't think of any serious issues with this, as this is
> an entirely ixgbe-specific problem, the fix should probably stay
> there.

I've always seen MAX_SKB_FRAGS as being the primary way of making sure 
that an SKB never has more than 64kB of data in it.  If we can stuff > 
64kB of data in an skb then some of the above problem with 64kB pages goes 
away.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Rick Jones Feb. 8, 2010, 5:53 p.m. UTC | #4
>>First of all this isn't GRO, but RSC.  With GRO we impose extra
>>restrictions on what packets can be merged while RSC is more
>>permissive.
> 
> 
> Herbert is right.  Just for clarity lets not call it "hardware GRO" but 
> instead just call it RSC or hardware RSC (receive side coalescing)

I'll paint a stripe on the bikeshed and suggest something even more 
explicit - if it is the hardware, how about HRC (Hardware Receive 
Coalescing) or NRC (Nic Receive Coalescing).

rick jones
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Ben Hutchings Feb. 8, 2010, 7:12 p.m. UTC | #5
On Mon, 2010-02-08 at 09:53 -0800, Rick Jones wrote:
> >>First of all this isn't GRO, but RSC.  With GRO we impose extra
> >>restrictions on what packets can be merged while RSC is more
> >>permissive.
> > 
> > 
> > Herbert is right.  Just for clarity lets not call it "hardware GRO" but 
> > instead just call it RSC or hardware RSC (receive side coalescing)
> 
> I'll paint a stripe on the bikeshed and suggest something even more 
> explicit - if it is the hardware, how about HRC (Hardware Receive 
> Coalescing) or NRC (Nic Receive Coalescing).

The term 'Large Receive Offload' (LRO) has been around for a while;
what's wrong with that?  Qualify it with 'hardware' (HLRO) if necessary.

Any name involving 'coalescing' would easily be confused with interrupt
moderation, which ethtool calls coalescing.  Also, using anything close
to a proprietary name would only reward and encourage the marketing
geniuses who invent redundant and confusing terminology.

Ben.
Herbert Xu Feb. 8, 2010, 10:21 p.m. UTC | #6
On Mon, Feb 08, 2010 at 09:41:58AM -0800, Brandeburg, Jesse wrote:
>  
> > In fact I think the ixgbe code may be broken as it is since it's
> > not marking RSC packets in any way to prevent them from being
> > forwarded through another interface.
> 
> Herbert, what are we missing? Is there some interface for us to do what 
> you describe?  do we need to set DODGY?

The problem is that AFAIK RSC does not impose enough restrictions
on what is merged.  This means that information is lost during the
merge process.  IOW if the merged packet is forwarded we cannot
regenerate the original fragments exactly.

Therefore whenever bridging/routing is used on that interface RSC
must be turned off.

The easiest way to do this is to do what LRO does, which is to
set gso_type to zero and gso_size non-zero.  Of course you'll
also need to use the LRO toggle to control whether RSC is enabled
or not.

Cheers,
diff mbox

Patch

diff --git a/drivers/net/ixgbe/ixgbe_main.c b/drivers/net/ixgbe/ixgbe_main.c
index b5f64ad..283f6cc 100644
--- a/drivers/net/ixgbe/ixgbe_main.c
+++ b/drivers/net/ixgbe/ixgbe_main.c
@@ -644,6 +644,13 @@  static inline void ixgbe_release_rx_desc(struct ixgbe_hw *hw,
 	IXGBE_WRITE_REG(hw, IXGBE_RDT(rx_ring->reg_idx), val);
 }
 
+/* The maximum DMA the card can do is 16kB, so cap it if required */
+#if (PAGE_SIZE / 2) > IXGBE_MAX_RXBUFFER
+#define IXGBE_RXBUFFER_SIZE	IXGBE_MAX_RXBUFFER
+#else
+#define IXGBE_RXBUFFER_SIZE	(PAGE_SIZE / 2)
+#endif
+
 /**
  * ixgbe_alloc_rx_buffers - Replace used receive buffers; packet split
  * @adapter: address of board private structure
@@ -2032,11 +2039,7 @@  static void ixgbe_configure_srrctl(struct ixgbe_adapter *adapter,
 		  IXGBE_SRRCTL_BSIZEHDR_MASK;
 
 	if (rx_ring->flags & IXGBE_RING_RX_PS_ENABLED) {
-#if (PAGE_SIZE / 2) > IXGBE_MAX_RXBUFFER
-		srrctl |= IXGBE_MAX_RXBUFFER >> IXGBE_SRRCTL_BSIZEPKT_SHIFT;
-#else
-		srrctl |= (PAGE_SIZE / 2) >> IXGBE_SRRCTL_BSIZEPKT_SHIFT;
-#endif
+		srrctl |= IXGBE_RXBUFFER_SIZE >> IXGBE_SRRCTL_BSIZEPKT_SHIFT;
 		srrctl |= IXGBE_SRRCTL_DESCTYPE_HDR_SPLIT_ALWAYS;
 	} else {
 		srrctl |= ALIGN(rx_ring->rx_buf_len, 1024) >>
@@ -2101,11 +2104,11 @@  static void ixgbe_configure_rscctl(struct ixgbe_adapter *adapter, int index)
 	 * than 65535
 	 */
 	if (rx_ring->flags & IXGBE_RING_RX_PS_ENABLED) {
-#if (MAX_SKB_FRAGS > 16)
+#if (GSO_MAX_SIZE / IXGBE_RXBUFFER_SIZE) >= 16
 		rscctrl |= IXGBE_RSCCTL_MAXDESC_16;
-#elif (MAX_SKB_FRAGS > 8)
+#elif (GSO_MAX_SIZE / IXGBE_RXBUFFER_SIZE) >= 8
 		rscctrl |= IXGBE_RSCCTL_MAXDESC_8;
-#elif (MAX_SKB_FRAGS > 4)
+#elif (GSO_MAX_SIZE / IXGBE_RXBUFFER_SIZE) >= 4
 		rscctrl |= IXGBE_RSCCTL_MAXDESC_4;
 #else
 		rscctrl |= IXGBE_RSCCTL_MAXDESC_1;