diff mbox

[net] gro: Make GRO aware of lightweight tunnels.

Message ID 1453336048-49406-1-git-send-email-jesse@kernel.org
State Changes Requested, archived
Delegated to: David Miller
Headers show

Commit Message

Jesse Gross Jan. 21, 2016, 12:27 a.m. UTC
GRO is currently not aware of tunnel metadata generated by lightweight
tunnels and stored in the dst. This leads to two possible problems:
 * Incorrectly merging two frames that have different metadata.
 * Leaking of allocated metadata from merged frames.

This avoids those problems by comparing the tunnel information before
merging, similar to how we handle other metadata (such as vlan tags),
and releasing any state when we are done.

Reported-by: John <john.phillips5@hpe.com>
Fixes: 2e15ea39 ("ip_gre: Add support to collect tunnel metadata.")
Signed-off-by: Jesse Gross <jesse@kernel.org>
---
 include/net/dst_metadata.h | 23 +++++++++++++++++++++++
 net/core/dev.c             |  9 +++++++--
 2 files changed, 30 insertions(+), 2 deletions(-)

Comments

Eric Dumazet Jan. 21, 2016, 12:48 a.m. UTC | #1
On Wed, 2016-01-20 at 16:27 -0800, Jesse Gross wrote:
> GRO is currently not aware of tunnel metadata generated by lightweight
> tunnels and stored in the dst. This leads to two possible problems:
>  * Incorrectly merging two frames that have different metadata.
>  * Leaking of allocated metadata from merged frames.
> 
> This avoids those problems by comparing the tunnel information before
> merging, similar to how we handle other metadata (such as vlan tags),
> and releasing any state when we are done.
> 
> Reported-by: John <john.phillips5@hpe.com>
> Fixes: 2e15ea39 ("ip_gre: Add support to collect tunnel metadata.")
> Signed-off-by: Jesse Gross <jesse@kernel.org>
> ---
>  include/net/dst_metadata.h | 23 +++++++++++++++++++++++
>  net/core/dev.c             |  9 +++++++--
>  2 files changed, 30 insertions(+), 2 deletions(-)
> 
> diff --git a/include/net/dst_metadata.h b/include/net/dst_metadata.h
> index 6816f0f..c3de935 100644
> --- a/include/net/dst_metadata.h
> +++ b/include/net/dst_metadata.h
> @@ -44,6 +44,29 @@ static inline bool skb_valid_dst(const struct sk_buff *skb)
>  	return dst && !(dst->flags & DST_METADATA);
>  }
>  
> +static inline int skb_metadata_dst_cmp(struct sk_buff *skb_a,
> +				       struct sk_buff *skb_b)
> +{
> +	const struct metadata_dst *a = skb_metadata_dst(skb_a);
> +	const struct metadata_dst *b = skb_metadata_dst(skb_b);
> +
> +	if (!a != !b)
> +		return 1;
> +
> +	if (!a)
> +		return 0;
> +

It is adding 4 conditional test per flow in GRO engine for the fast
path... 

With up to 8 flows in GRO (per RX queue), it is a total of 32 added
conditional tests.

You should shortcut to one test :

if (!(skb_a->_skb_refdst | skb_b->_skb_refdst)
	return 0;
Jesse Gross Jan. 21, 2016, 1:47 a.m. UTC | #2
On Wed, Jan 20, 2016 at 4:48 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Wed, 2016-01-20 at 16:27 -0800, Jesse Gross wrote:
>> GRO is currently not aware of tunnel metadata generated by lightweight
>> tunnels and stored in the dst. This leads to two possible problems:
>>  * Incorrectly merging two frames that have different metadata.
>>  * Leaking of allocated metadata from merged frames.
>>
>> This avoids those problems by comparing the tunnel information before
>> merging, similar to how we handle other metadata (such as vlan tags),
>> and releasing any state when we are done.
>>
>> Reported-by: John <john.phillips5@hpe.com>
>> Fixes: 2e15ea39 ("ip_gre: Add support to collect tunnel metadata.")
>> Signed-off-by: Jesse Gross <jesse@kernel.org>
>> ---
>>  include/net/dst_metadata.h | 23 +++++++++++++++++++++++
>>  net/core/dev.c             |  9 +++++++--
>>  2 files changed, 30 insertions(+), 2 deletions(-)
>>
>> diff --git a/include/net/dst_metadata.h b/include/net/dst_metadata.h
>> index 6816f0f..c3de935 100644
>> --- a/include/net/dst_metadata.h
>> +++ b/include/net/dst_metadata.h
>> @@ -44,6 +44,29 @@ static inline bool skb_valid_dst(const struct sk_buff *skb)
>>       return dst && !(dst->flags & DST_METADATA);
>>  }
>>
>> +static inline int skb_metadata_dst_cmp(struct sk_buff *skb_a,
>> +                                    struct sk_buff *skb_b)
>> +{
>> +     const struct metadata_dst *a = skb_metadata_dst(skb_a);
>> +     const struct metadata_dst *b = skb_metadata_dst(skb_b);
>> +
>> +     if (!a != !b)
>> +             return 1;
>> +
>> +     if (!a)
>> +             return 0;
>> +
>
> It is adding 4 conditional test per flow in GRO engine for the fast
> path...
>
> With up to 8 flows in GRO (per RX queue), it is a total of 32 added
> conditional tests.
>
> You should shortcut to one test :
>
> if (!(skb_a->_skb_refdst | skb_b->_skb_refdst)
>         return 0;

Thanks, that's a good idea. I'll send out a v2 soon.

Just to merge the two threads together, all of protocols that would be
affected by this also have "normal" GRO handlers that will run when
the packet is first received. There's no argument that that is
preferable if it is available. However, GRO cells do provide a
performance benefit in other situations so it would be nice to avoid
disabling it if possible.
Eric Dumazet Jan. 21, 2016, 2:16 a.m. UTC | #3
On Wed, 2016-01-20 at 17:47 -0800, Jesse Gross wrote:

> Just to merge the two threads together, all of protocols that would be
> affected by this also have "normal" GRO handlers that will run when
> the packet is first received. There's no argument that that is
> preferable if it is available. However, GRO cells do provide a
> performance benefit in other situations so it would be nice to avoid
> disabling it if possible.

Note that having a second stage GRO can introduce packet reorders,
because GRO packets given to the second layer simply bypass GRO engine.

Say sender sends P1,P2,P3

Receiver gets P1 alone, put P1 in the GRO cell (2nd layer)

Then we get P2 and P3, aggregated by first layer.

We decap tunnel header, then give P2-P3 to 2nd GRO engine, P2-P3 is
directly given to upper stack. [1]

Then P1 will follow later.

-> packets received out of order. Slow paths on both senders and
receiver, extra sack, ...

[1]

static enum gro_result dev_gro_receive(struct napi_struct *napi, struct
sk_buff *skb)
{
...
        if (!(skb->dev->features & NETIF_F_GRO))
                goto normal;

        if (skb_is_gso(skb) || skb_has_frag_list(skb) || skb->csum_bad)
                goto normal;
Thomas Graf Jan. 21, 2016, 2:31 a.m. UTC | #4
On 01/20/16 at 05:47pm, Jesse Gross wrote:
> Just to merge the two threads together, all of protocols that would be
> affected by this also have "normal" GRO handlers that will run when
> the packet is first received. There's no argument that that is
> preferable if it is available. However, GRO cells do provide a
> performance benefit in other situations so it would be nice to avoid
> disabling it if possible.

I missed this thread when I replied to the other one.

What are these situations? It seems like there are specific
scenarios where this helps. Is it varying TLVs in the encap header
for otherwise meregable inner headers?
Jesse Gross Jan. 21, 2016, 2:52 a.m. UTC | #5
On Wed, Jan 20, 2016 at 6:31 PM, Thomas Graf <tgraf@suug.ch> wrote:
> On 01/20/16 at 05:47pm, Jesse Gross wrote:
>> Just to merge the two threads together, all of protocols that would be
>> affected by this also have "normal" GRO handlers that will run when
>> the packet is first received. There's no argument that that is
>> preferable if it is available. However, GRO cells do provide a
>> performance benefit in other situations so it would be nice to avoid
>> disabling it if possible.
>
> I missed this thread when I replied to the other one.
>
> What are these situations? It seems like there are specific
> scenarios where this helps. Is it varying TLVs in the encap header
> for otherwise meregable inner headers?

It's nothing really fancy or tunnel type specific.

It's obviously preferable to merge things as soon as the packet is
received but if we don't (for example, if we don't have a checksum
provided by hardware) then we take another crack at it after
decapsulation. There's still enough stack left after decapsulation for
it to make a difference, particularly if you are going up to a VM. I
guess this shouldn't be surprising because it's basically equivalent
to GRO when there is no tunnel at all.

There was some previous discussion about this a little while ago:
https://www.mail-archive.com/netdev@vger.kernel.org/msg68880.html
diff mbox

Patch

diff --git a/include/net/dst_metadata.h b/include/net/dst_metadata.h
index 6816f0f..c3de935 100644
--- a/include/net/dst_metadata.h
+++ b/include/net/dst_metadata.h
@@ -44,6 +44,29 @@  static inline bool skb_valid_dst(const struct sk_buff *skb)
 	return dst && !(dst->flags & DST_METADATA);
 }
 
+static inline int skb_metadata_dst_cmp(struct sk_buff *skb_a,
+				       struct sk_buff *skb_b)
+{
+	const struct metadata_dst *a = skb_metadata_dst(skb_a);
+	const struct metadata_dst *b = skb_metadata_dst(skb_b);
+
+	if (!a != !b)
+		return 1;
+
+	if (!a)
+		return 0;
+
+	if (memcmp(&a->u.tun_info.key, &b->u.tun_info.key,
+		   sizeof(a->u.tun_info.key)))
+		return 1;
+
+	if (a->u.tun_info.options_len != b->u.tun_info.options_len)
+		return 1;
+
+	return memcmp(&a->u.tun_info + 1, &b->u.tun_info + 1,
+		      a->u.tun_info.options_len);
+}
+
 struct metadata_dst *metadata_dst_alloc(u8 optslen, gfp_t flags);
 struct metadata_dst __percpu *metadata_dst_alloc_percpu(u8 optslen, gfp_t flags);
 
diff --git a/net/core/dev.c b/net/core/dev.c
index cc9e365..12cc9bd 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -4358,6 +4358,9 @@  static void gro_list_prepare(struct napi_struct *napi, struct sk_buff *skb)
 			diffs = memcmp(skb_mac_header(p),
 				       skb_mac_header(skb),
 				       maclen);
+		if (!diffs)
+			diffs = skb_metadata_dst_cmp(p, skb);
+
 		NAPI_GRO_CB(p)->same_flow = !diffs;
 	}
 }
@@ -4548,10 +4551,12 @@  static gro_result_t napi_skb_finish(gro_result_t ret, struct sk_buff *skb)
 		break;
 
 	case GRO_MERGED_FREE:
-		if (NAPI_GRO_CB(skb)->free == NAPI_GRO_FREE_STOLEN_HEAD)
+		if (NAPI_GRO_CB(skb)->free == NAPI_GRO_FREE_STOLEN_HEAD) {
+			skb_dst_drop(skb);
 			kmem_cache_free(skbuff_head_cache, skb);
-		else
+		} else {
 			__kfree_skb(skb);
+		}
 		break;
 
 	case GRO_HELD: