Message ID: 1347537926.13103.1530.camel@edumazet-glaptop
State: RFC, archived
Delegated to: David Miller
On Thu, Sep 13, 2012 at 3:05 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Thu, 2012-09-13 at 12:59 +0300, Or Gerlitz wrote:
>> On Thu, Sep 13, 2012 at 11:11 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>>> MAX_SKB_FRAGS is 16
>>> skb_gro_receive() will return -E2BIG once this limit is hit.
>>> If you use a MSS = 100 (instead of MSS = 1460), then the GRO skb will
>>> contain at most 1700 bytes, but TSO packets can still be 64KB, if
>>> the sender NIC can afford it (some NICs won't work quite well)
>>
>> Addressing this assertion of yours, Shlomo showed that with ixgbe he managed
>> to see GRO aggregating 32KB, which means 20-21 packets, that is > 16 fragments
>> in this notation. Can it be related to the way ixgbe is actually allocating skbs?
>
> Hard to say without knowing the exact kernel version, as things change a lot
> in this area.

As Shlomo wrote earlier on this thread, his testbed is 3.6-rc1.

> You have several kinds of GRO. One fast and one slow.
> The slow one uses a linked list of skbs (pinfo->frag_list), while the
> fast one uses fragments (pinfo->nr_frags).
>
> For example, some drivers (the Mellanox one is in this lot) pull too many
> bytes into skb->head, and this defeats the fast GRO:
> part of the payload is in skb->head, the remaining part in pinfo->frags[0].
>
> skb_gro_receive() then has to allocate a new head skb, to link skbs into
> head->frag_list. The total skb->truesize is not reduced at all, it's
> increased.
>
> So you might think GRO is working, but it's only a hack, as one skb has a
> list of skbs, and this makes TCP read() slower, and defeats TCP
> coalescing as well. What's the point of delivering fat skbs to the TCP stack
> if it slows down the consumer, because of increased cache line misses?

Shlomo is dealing with making the IPoIB driver work well with GRO; thanks for the comments on the Mellanox Ethernet driver, we will look there too (added Yevgeny)...
As for IPoIB, it has two modes: connected, which is irrelevant for this discussion, and datagram, which is in scope here. Its MTU is typically 2044 but can be 4092 as well; the allocation of skbs for this mode is done in ipoib_alloc_rx_skb() -- which you've patched recently...

Following your comment we noted that when using the lower/typical MTU of 2044, which means we are below the ipoib_ud_need_sg() threshold, skbs are allocated in one "form", and when using the 4092 MTU, in another "form". Do you see each of the forms falling into a different GRO flow, e.g. 2044 into the "slow" one and 4092 into the "fast" one?!

Or.
On Thu, 2012-09-13 at 15:47 +0300, Or Gerlitz wrote:
> Shlomo is dealing with making the IPoIB driver work well with GRO,
> thanks for the comments on the Mellanox Ethernet driver, we will look
> there too (added Yevgeny)...
>
> As for IPoIB, it has two modes: connected, which is irrelevant for this
> discussion, and datagram, which is in scope here. Its MTU is typically
> 2044 but can be 4092 as well; the allocation of skbs for this mode is
> done in ipoib_alloc_rx_skb() -- which you've patched recently...
>
> Following your comment we noted that when using the lower/typical MTU
> of 2044, which means we are below the ipoib_ud_need_sg() threshold,
> skbs are allocated in one "form", and when using the 4092 MTU, in
> another "form". Do you see each of the forms falling into a different
> GRO flow, e.g. 2044 into the "slow" one and 4092 into the "fast" one?!

Seems fine to me both ways, because you use dev_alloc_skb(), and you don't pull TCP payload into skb->head.

You might try adding prefetch() as well, to bring the IP/TCP headers into the CPU cache before they are needed in the GRO layers.
diff --git a/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h b/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
index 6c4f935..435c35e 100644
--- a/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
+++ b/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
@@ -96,8 +96,8 @@
 /* Receive fragment sizes; we use at most 4 fragments (for 9600 byte MTU
  * and 4K allocations) */
 enum {
-	FRAG_SZ0 = 512 - NET_IP_ALIGN,
-	FRAG_SZ1 = 1024,
+	FRAG_SZ0 = 1536 - NET_IP_ALIGN,
+	FRAG_SZ1 = 2048,
 	FRAG_SZ2 = 4096,
 	FRAG_SZ3 = MLX4_EN_ALLOC_SIZE
 };