
[net-next,1/2] ip6_output: fragment outgoing reassembled skb properly

Message ID 1383756740-7392-2-git-send-email-jiri@resnulli.us
State Awaiting Upstream

Commit Message

Jiri Pirko Nov. 6, 2013, 4:52 p.m. UTC
If a reassembled packet fits into the outdev MTU, it is not fragmented
according to the original frag size; it is sent as a single big packet.

The second case is when the skb is GSO. In that case, fragmentation does not
happen according to the original frag size either.

This patch fixes both cases.

Signed-off-by: Jiri Pirko <jiri@resnulli.us>
---
 net/ipv6/ip6_output.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

Comments

David Miller Nov. 7, 2013, 11:54 p.m. UTC | #1
From: Jiri Pirko <jiri@resnulli.us>
Date: Wed,  6 Nov 2013 17:52:19 +0100

> If a reassembled packet fits into the outdev MTU, it is not fragmented
> according to the original frag size; it is sent as a single big packet.
> 
> The second case is when the skb is GSO. In that case, fragmentation does not
> happen according to the original frag size either.
> 
> This patch fixes both cases.
> 
> Signed-off-by: Jiri Pirko <jiri@resnulli.us>
 ...

>  	if ((skb->len > ip6_skb_dst_mtu(skb) && !skb_is_gso(skb)) ||
> -	    dst_allfrag(skb_dst(skb)))
> +	    dst_allfrag(skb_dst(skb)) ||
> +	    (IP6CB(skb)->frag_max_size && skb->len > IP6CB(skb)->frag_max_size))
>  		return ip6_fragment(skb, ip6_finish_output2);

Jiri are you sure that you don't need to take GSO into account in the
new part you are adding to the test?
--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jiri Pirko Nov. 8, 2013, 7:52 a.m. UTC | #2
Fri, Nov 08, 2013 at 12:54:53AM CET, davem@davemloft.net wrote:
>From: Jiri Pirko <jiri@resnulli.us>
>Date: Wed,  6 Nov 2013 17:52:19 +0100
>
>> If a reassembled packet fits into the outdev MTU, it is not fragmented
>> according to the original frag size; it is sent as a single big packet.
>> 
>> The second case is when the skb is GSO. In that case, fragmentation does not
>> happen according to the original frag size either.
>> 
>> This patch fixes both cases.
>> 
>> Signed-off-by: Jiri Pirko <jiri@resnulli.us>
> ...
>
>>  	if ((skb->len > ip6_skb_dst_mtu(skb) && !skb_is_gso(skb)) ||
>> -	    dst_allfrag(skb_dst(skb)))
>> +	    dst_allfrag(skb_dst(skb)) ||
>> +	    (IP6CB(skb)->frag_max_size && skb->len > IP6CB(skb)->frag_max_size))
>>  		return ip6_fragment(skb, ip6_finish_output2);
>
>Jiri are you sure that you don't need to take GSO into account in the
>new part you are adding to the test?


For a GSO skb, we need to cap outgoing fragments by the original frag size
as well, so I believe this code is correct for that case too.

>--
>To unsubscribe from this list: send the line "unsubscribe netdev" in
>the body of a message to majordomo@vger.kernel.org
>More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Miller Nov. 8, 2013, 7:49 p.m. UTC | #3
From: Jiri Pirko <jiri@resnulli.us>
Date: Fri, 8 Nov 2013 08:52:01 +0100

> Fri, Nov 08, 2013 at 12:54:53AM CET, davem@davemloft.net wrote:
>>From: Jiri Pirko <jiri@resnulli.us>
>>Date: Wed,  6 Nov 2013 17:52:19 +0100
>>
>>> If a reassembled packet fits into the outdev MTU, it is not fragmented
>>> according to the original frag size; it is sent as a single big packet.
>>> 
>>> The second case is when the skb is GSO. In that case, fragmentation does not
>>> happen according to the original frag size either.
>>> 
>>> This patch fixes both cases.
>>> 
>>> Signed-off-by: Jiri Pirko <jiri@resnulli.us>
>> ...
>>
>>>  	if ((skb->len > ip6_skb_dst_mtu(skb) && !skb_is_gso(skb)) ||
>>> -	    dst_allfrag(skb_dst(skb)))
>>> +	    dst_allfrag(skb_dst(skb)) ||
>>> +	    (IP6CB(skb)->frag_max_size && skb->len > IP6CB(skb)->frag_max_size))
>>>  		return ip6_fragment(skb, ip6_finish_output2);
>>
>>Jiri are you sure that you don't need to take GSO into account in the
>>new part you are adding to the test?
> 
> 
> For a GSO skb, we need to cap outgoing fragments by the original frag size
> as well, so I believe this code is correct for that case too.

I'm still not so sure I agree, even after having taken a second look
at this.

Look at ipv4's logic for this same facility:

		if (skb->len > ip_skb_dst_mtu(skb) && !skb_is_gso(skb))
			return ip_fragment(skb, ip_finish_output2);

Strictly, we only call ip_fragment() if skb_is_gso() is false.  And then
in ip_fragment():

	if (unlikely(((iph->frag_off & htons(IP_DF)) && !skb->local_df) ||
		     (IPCB(skb)->frag_max_size &&
		      IPCB(skb)->frag_max_size > dst_mtu(&rt->dst)))) {

And that second branch of this test is what you're trying to duplicate
into ipv6.
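For comparison, the IPv4 condition quoted above can be modeled in user space as follows. This is a hypothetical sketch: `struct ipv4_frag_input` and `ipv4_must_not_fragment()` are invented names, each field stands for the kernel expression named in its comment, and the model only evaluates the quoted test. It does not reproduce what ip_fragment() does once the test is true.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the values the quoted ip_fragment() test reads;
 * this is not a kernel structure. */
struct ipv4_frag_input {
	bool df_set;            /* iph->frag_off & htons(IP_DF) */
	bool local_df;          /* skb->local_df */
	uint16_t frag_max_size; /* IPCB(skb)->frag_max_size, 0 if unset */
	uint32_t dst_mtu;       /* dst_mtu(&rt->dst) */
};

/* True when the quoted condition holds, i.e. the packet is not
 * fragmented and the error branch is taken instead. */
static bool ipv4_must_not_fragment(const struct ipv4_frag_input *in)
{
	bool df_blocks = in->df_set && !in->local_df;
	bool orig_frag_too_big = in->frag_max_size != 0 &&
				 in->frag_max_size > in->dst_mtu;

	return df_blocks || orig_frag_too_big;
}
```

Note that the second clause only rejects packets whose original fragments were larger than the outgoing MTU; it never forces fragmentation of a packet that would otherwise be sent whole.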

Perhaps I don't understand completely the intentions and logic of
dst_allfrag() in the ipv6 case, and maybe you can explain it to me.

Hannes Frederic Sowa Nov. 8, 2013, 10:20 p.m. UTC | #4
On Fri, Nov 08, 2013 at 02:49:15PM -0500, David Miller wrote:
> Perhaps I don't understand completely the intentions and logic of
> dst_allfrag() in the ipv6 case, and maybe you can explain it to me.

dst_allfrag gets active if we receive a Packet Too Big message indicating
an MTU smaller than 1280. Of course IPv6 may not go below 1280, but this
indicates that e.g. an IPv6->IPv4 migration technology is on the path. It
needs the IPv6 fragment header and its fragment id to generate IPv4
fragment ids out of the IPv6 ones, so that it can produce IPv4 fragments
just after this migration router, which could well be smaller than 1280.
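The behaviour described above can be sketched as a route-update rule, modeled loosely on RFC 2460 section 5: a Packet Too Big report below 1280 does not lower the path MTU below 1280, but the sender must then include a Fragment header in every packet so the translating router has an identification value to reuse. `struct route_model` and `route_update_pmtu()` are hypothetical names, the `allfrag` flag stands in for what dst_allfrag() tests, and the real kernel code may differ in detail.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define IPV6_MIN_MTU 1280u /* minimum IPv6 link MTU */

/* Hypothetical, simplified per-route state; not struct dst_entry. */
struct route_model {
	uint32_t mtu;
	bool allfrag; /* stands in for the flag dst_allfrag() checks */
};

/* Apply a Packet Too Big report. The path MTU only shrinks here; a
 * report below 1280 clamps it to 1280 and marks the route so that
 * every packet gets a Fragment header. */
static void route_update_pmtu(struct route_model *rt, uint32_t reported_mtu)
{
	if (reported_mtu >= rt->mtu)
		return;

	if (reported_mtu < IPV6_MIN_MTU) {
		reported_mtu = IPV6_MIN_MTU;
		rt->allfrag = true;
	}
	rt->mtu = reported_mtu;
}
```

Once `allfrag` is set, the dst_allfrag() clause in ip6_finish_output() sends every packet through ip6_fragment(), even packets that fit the 1280-byte MTU.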

Maybe this does help a bit,

  Hannes

Jiri Pirko Nov. 9, 2013, 11 a.m. UTC | #5
Fri, Nov 08, 2013 at 08:49:15PM CET, davem@davemloft.net wrote:
>From: Jiri Pirko <jiri@resnulli.us>
>Date: Fri, 8 Nov 2013 08:52:01 +0100
>
>> Fri, Nov 08, 2013 at 12:54:53AM CET, davem@davemloft.net wrote:
>>>From: Jiri Pirko <jiri@resnulli.us>
>>>Date: Wed,  6 Nov 2013 17:52:19 +0100
>>>
>>>> If a reassembled packet fits into the outdev MTU, it is not fragmented
>>>> according to the original frag size; it is sent as a single big packet.
>>>> 
>>>> The second case is when the skb is GSO. In that case, fragmentation does not
>>>> happen according to the original frag size either.
>>>> 
>>>> This patch fixes both cases.
>>>> 
>>>> Signed-off-by: Jiri Pirko <jiri@resnulli.us>
>>> ...
>>>
>>>>  	if ((skb->len > ip6_skb_dst_mtu(skb) && !skb_is_gso(skb)) ||
>>>> -	    dst_allfrag(skb_dst(skb)))
>>>> +	    dst_allfrag(skb_dst(skb)) ||
>>>> +	    (IP6CB(skb)->frag_max_size && skb->len > IP6CB(skb)->frag_max_size))
>>>>  		return ip6_fragment(skb, ip6_finish_output2);
>>>
>>>Jiri are you sure that you don't need to take GSO into account in the
>>>new part you are adding to the test?
>> 
>> 
>> For a GSO skb, we need to cap outgoing fragments by the original frag size
>> as well, so I believe this code is correct for that case too.
>
>I'm still not so sure I agree, even after having taken a second look
>at this.
>
>Look at ipv4's logic for this same facility:
>
>		if (skb->len > ip_skb_dst_mtu(skb) && !skb_is_gso(skb))
>			return ip_fragment(skb, ip_finish_output2);
>
>Strictly, we only call ip_fragment() if skb_is_gso() is false.  And then
>in ip_fragment():
>
>	if (unlikely(((iph->frag_off & htons(IP_DF)) && !skb->local_df) ||
>		     (IPCB(skb)->frag_max_size &&
>		      IPCB(skb)->frag_max_size > dst_mtu(&rt->dst)))) {
>
>And that second branch of this test is what you're trying to duplicate
>into ipv6.

That is a different check and the same one is already in ip6_fragment().

You cannot compare this to ipv4 directly. In ipv4, if frag skbs are
reassembled into one, they can be forwarded out in different frag sizes
(bigger or smaller) or not fragmented at all. Therefore you can leave
the work to offload.

But for ipv6, the same frags need to go out as they came in. Offload would
not do that, as it would try to grow the frag sizes up to the MTU.
That is exactly why I add the "skb->len > IP6CB(skb)->frag_max_size" check.

Imagine scenario:

hostA-NIC(MTU1400) ------ NIC(MTU1400)-hostB-NIC(MTU1500) ------ NIC(MTU1500)-hostC

And fragmented packets go hostA->hostB->hostC, and we are doing
forwarding on hostB.
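To make the scenario concrete, here is a hypothetical user-space calculation of fragment sizes on hostB. It assumes a plain 40-byte IPv6 header with no extension headers, that IP6CB(skb)->frag_max_size holds the size of the largest received fragment (headers included), and that the output path caps each fragment at min(outgoing MTU, frag_max_size). `ipv6_fragment_sizes()` and `output_cap()` are invented names, not kernel functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define IPV6_HDR_LEN 40u /* fixed IPv6 header, no extension headers */
#define FRAG_HDR_LEN  8u /* IPv6 Fragment header */

/* Hypothetical helper: split payload_len bytes into IPv6 fragments whose
 * total size (header + fragment header + data) never exceeds cap. Stores
 * each fragment's total size in out[] and returns the fragment count,
 * or 0 if out[] is too small (or there is nothing to send). */
static size_t ipv6_fragment_sizes(uint32_t payload_len, uint32_t cap,
				  uint32_t *out, size_t out_len)
{
	size_t n = 0;

	/* Need room for both headers plus at least 8 bytes of data. */
	assert(cap >= IPV6_HDR_LEN + FRAG_HDR_LEN + 8);

	/* Every fragment except the last carries a multiple of 8 data bytes. */
	uint32_t max_data = (cap - IPV6_HDR_LEN - FRAG_HDR_LEN) & ~7u;

	while (payload_len > 0) {
		uint32_t chunk = payload_len > max_data ? max_data : payload_len;

		if (n == out_len)
			return 0;
		out[n++] = IPV6_HDR_LEN + FRAG_HDR_LEN + chunk;
		payload_len -= chunk;
	}
	return n;
}

/* Assumed cap for forwarding a reassembled packet: never exceed the
 * original fragment size; 0 means "not reassembled", so use the MTU. */
static uint32_t output_cap(uint32_t out_mtu, uint32_t frag_max_size)
{
	if (frag_max_size != 0 && frag_max_size < out_mtu)
		return frag_max_size;
	return out_mtu;
}
```

With a 2000-byte payload, hostA (MTU 1400) emits fragments of 1400 and 696 bytes. Fragmenting on hostB by its 1500-byte MTU alone would give 1496 and 600 bytes, while the capped version reproduces 1400 and 696. With a 1420-byte payload the reassembled packet is 1460 bytes and fits hostB's MTU, so it would leave as one packet unless the frag_max_size check forces fragmentation; the capped calculation gives 1400 and 116.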

I hope this clears it up.
David Miller Nov. 11, 2013, 5:47 a.m. UTC | #6
From: Jiri Pirko <jiri@resnulli.us>
Date: Sat, 9 Nov 2013 12:00:53 +0100

> I hope this clears it up.

Thanks for the explanation, I see now.

Patch

diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
index 91fb4e8..5e31a90 100644
--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -125,7 +125,8 @@  static int ip6_finish_output2(struct sk_buff *skb)
 static int ip6_finish_output(struct sk_buff *skb)
 {
 	if ((skb->len > ip6_skb_dst_mtu(skb) && !skb_is_gso(skb)) ||
-	    dst_allfrag(skb_dst(skb)))
+	    dst_allfrag(skb_dst(skb)) ||
+	    (IP6CB(skb)->frag_max_size && skb->len > IP6CB(skb)->frag_max_size))
 		return ip6_fragment(skb, ip6_finish_output2);
 	else
 		return ip6_finish_output2(skb);
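The effect of the patch on the decision in ip6_finish_output() can be modeled with plain booleans, as in the hypothetical sketch below. `struct skb_view`, `should_fragment_old()` and `should_fragment_new()` are illustrative names; each field stands for the kernel expression named in its comment.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical snapshot of what ip6_finish_output() inspects;
 * not kernel code. */
struct skb_view {
	uint32_t len;           /* skb->len */
	uint32_t dst_mtu;       /* ip6_skb_dst_mtu(skb) */
	bool is_gso;            /* skb_is_gso(skb) */
	bool allfrag;           /* dst_allfrag(skb_dst(skb)) */
	uint16_t frag_max_size; /* IP6CB(skb)->frag_max_size, 0 = not reassembled */
};

/* Condition before the patch. */
static bool should_fragment_old(const struct skb_view *s)
{
	return (s->len > s->dst_mtu && !s->is_gso) || s->allfrag;
}

/* Condition after the patch: a reassembled skb longer than its largest
 * original fragment goes to ip6_fragment() even if it fits the MTU or
 * is GSO. */
static bool should_fragment_new(const struct skb_view *s)
{
	return should_fragment_old(s) ||
	       (s->frag_max_size != 0 && s->len > s->frag_max_size);
}
```

A 1460-byte reassembled packet with frag_max_size 1400 and a 1500-byte MTU is sent whole by the old condition but fragmented by the new one, and the same holds for a GSO skb. Packets with frag_max_size 0 (not reassembled) are handled exactly as before.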