ipv6: ip6_append_data_mtu do not handle the mtu of the second fragment properly

Message ID: 1394512309-8823-1-git-send-email-lucien.xin@gmail.com
State: Changes Requested, archived
Delegated to: David Miller

Commit Message

Xin Long March 11, 2014, 4:31 a.m. UTC
In ip6_append_data_mtu(), when the xfrm mode is not tunnel (such as
transport), the IPsec header needs to be added to the first fragment, so the mtu
is decreased to reserve space for it. When the second fragment comes, the mtu
should be restored, as commit 0c1833797a5a6ec23ea9261d979aa18078720b74
says. However, commit a493e60ac4bbe2e977e7129d6d8cbb0dd236be uses
*mtu = min(*mtu, ...) to change the mtu, so the new mtu is always
equal to the first fragment's and is never restored.

When I test with ping6 -c1 -s5000 $ip, I get:
...frag (0|1232) ESP(spi=0x00002000,seq=0xb), length 1232
...frag (1232|1216)
...frag (2448|1216)
...frag (3664|1216)
...frag (4880|164)

which should be:
...frag (0|1232) ESP(spi=0x00001000,seq=0x1), length 1232
...frag (1232|1232)
...frag (2464|1232)
...frag (3696|1232)
...frag (4928|116)

So delete the min() when restoring the mtu.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
---
 net/ipv6/ip6_output.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

Comments

Hannes Frederic Sowa March 11, 2014, 2:49 p.m. UTC | #1
On Tue, Mar 11, 2014 at 12:31:49PM +0800, Xin Long wrote:
> -			*mtu = min(*mtu, pmtuprobe ?
> -				   rt->dst.dev->mtu :
> -				   dst_mtu(rt->dst.path));
> +			*mtu = pmtuprobe ? rt->dst.dev->mtu :
> +				   dst_mtu(rt->dst.path);

Sorry, that is not correct:

The min() protects the mtu from going over np->frag_size (if set). If we
remove the min(), we would fall back to dev->mtu or dst_mtu, which could
lead to a situation where the first fragment respects frag_size but the second
does not. This confuses ip6_append_data and would lead to a crash.

I am thinking about changing this to

min(*mtu + rt->dst.header_len, pmtuprobe ? rt->dst.dev->mtu : dst_mtu(rt->dst.path))

or we pass the np directly and test for frag_size again.

Good catch which should be fixed. Thanks!

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
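
Editorial sketch: the three candidate updates discussed above can be compared in a small user-space model. This is not kernel code. The struct and function names are hypothetical, path_mtu stands in for dst_mtu(rt->dst.path) or rt->dst.dev->mtu, and the numeric values in the usage note are assumptions chosen only for illustration.

```c
#include <assert.h>

/* Simplified user-space model; all names here are hypothetical. */
struct mtu_model {
	unsigned int path_mtu;   /* stands in for dst_mtu(rt->dst.path) or dev->mtu */
	unsigned int frag_size;  /* stands in for np->frag_size, 0 means unset */
	unsigned int header_len; /* stands in for rt->dst.header_len */
};

static unsigned int min_uint(unsigned int a, unsigned int b)
{
	return a < b ? a : b;
}

/* First fragment: clamp to frag_size (if set), then reserve header_len. */
unsigned int first_fragment_mtu(const struct mtu_model *m)
{
	unsigned int mtu = m->path_mtu;

	if (m->frag_size && m->frag_size < mtu)
		mtu = m->frag_size;
	assert(mtu >= m->header_len);
	return mtu - m->header_len;
}

/* Current code: min() keeps the reduced value, so the mtu never recovers. */
unsigned int later_mtu_current(const struct mtu_model *m, unsigned int mtu)
{
	return min_uint(mtu, m->path_mtu);
}

/* Posted patch: without min(), the frag_size clamp is lost. */
unsigned int later_mtu_patch(const struct mtu_model *m, unsigned int mtu)
{
	(void)mtu; /* the previous value is ignored entirely */
	return m->path_mtu;
}

/* Review suggestion: give header_len back, capped by the path mtu. */
unsigned int later_mtu_proposed(const struct mtu_model *m, unsigned int mtu)
{
	return min_uint(mtu + m->header_len, m->path_mtu);
}
```

With an assumed path_mtu of 1500, frag_size of 1000 and header_len of 16, the first fragment gets 984. The current code then stays at 984, the posted patch jumps to 1500 (above frag_size), and the suggested expression returns 1000.
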
Xin Long March 12, 2014, 2:40 a.m. UTC | #2
On Tue, Mar 11, 2014 at 10:49 PM, Hannes Frederic Sowa
<hannes@stressinduktion.org> wrote:
>
> Sorry, that is not correct:
>
> The min() protects the mtu from going over np->frag_size (if set). If we
> remove the min(), we would fall back to dev->mtu or dst_mtu, which could
> lead to a situation where the first fragment respects frag_size but the second
> does not. This confuses ip6_append_data and would lead to a crash.
>
Yes, your analysis is quite right. I overlooked this code:
                if (np->frag_size < mtu) {
                        if (np->frag_size)
                                mtu = np->frag_size;
                }
> I am thinking about changing this to
>
> min(*mtu + rt->dst.header_len, pmtuprobe ? rt->dst.dev->mtu : dst_mtu(rt->dst.path))
>
> or we pass the np directly and test for frag_size again.

But there is one thing I do not understand. The top half of
ip6_append_data() already has the code to get the mtu:
                if (rt->dst.flags & DST_XFRM_TUNNEL)
                        mtu = np->pmtudisc >= IPV6_PMTUDISC_PROBE ?
                              rt->dst.dev->mtu : dst_mtu(&rt->dst);
                else
                        mtu = np->pmtudisc >= IPV6_PMTUDISC_PROBE ?
                              rt->dst.dev->mtu : dst_mtu(rt->dst.path);

Why does it need to calculate the mtu again? Isn't just
"mtu = *mtu + rt->dst.header_len" sufficient?
Hannes Frederic Sowa March 12, 2014, 10:26 a.m. UTC | #3
On Wed, Mar 12, 2014 at 10:40:50AM +0800, lucien xin wrote:
> But there is one thing I do not understand. The top half of
> ip6_append_data() already has the code to get the mtu:
>                 if (rt->dst.flags & DST_XFRM_TUNNEL)
>                         mtu = np->pmtudisc >= IPV6_PMTUDISC_PROBE ?
>                               rt->dst.dev->mtu : dst_mtu(&rt->dst);
>                 else
>                         mtu = np->pmtudisc >= IPV6_PMTUDISC_PROBE ?
>                               rt->dst.dev->mtu : dst_mtu(rt->dst.path);
> 
> Why does it need to calculate the mtu again? Isn't just
> "mtu = *mtu + rt->dst.header_len" sufficient?

It would be possible if we were absolutely sure that we never call
ip6_append_data_mtu a second time, which I have not yet reviewed.

The line I proposed above may also suffer from this problem.

Maybe you already checked that?

Greetings,

  Hannes
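
Editorial sketch: the concern about repeated calls can be seen in a small user-space model. This is not kernel code, and the function names and numbers are hypothetical. restore_mtu() mirrors the expression min(*mtu + rt->dst.header_len, path mtu) from the thread, and restore_mtu_capped() is one possible reading of "test for frag_size again".

```c
/* Add header_len back, capped only by the path mtu. */
unsigned int restore_mtu(unsigned int mtu, unsigned int header_len,
			 unsigned int path_mtu)
{
	unsigned int restored = mtu + header_len;

	return restored < path_mtu ? restored : path_mtu;
}

/* Same, but re-apply the frag_size clamp (0 means unset). */
unsigned int restore_mtu_capped(unsigned int mtu, unsigned int header_len,
				unsigned int path_mtu, unsigned int frag_size)
{
	unsigned int restored = restore_mtu(mtu, header_len, path_mtu);

	if (frag_size && frag_size < restored)
		restored = frag_size;
	return restored;
}
```

Assuming a frag_size of 1000, header_len of 16 and a path mtu of 1500, the first restore_mtu() call takes 984 to 1000, but a second call takes 1000 to 1016 and exceeds frag_size. The capped variant stays at 1000 however often it is applied.
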


Patch

diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
index 2bc1070..dd05067 100644
--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -1113,9 +1113,8 @@  static void ip6_append_data_mtu(unsigned int *mtu,
 			 * this fragment is not first, the headers
 			 * space is regarded as data space.
 			 */
-			*mtu = min(*mtu, pmtuprobe ?
-				   rt->dst.dev->mtu :
-				   dst_mtu(rt->dst.path));
+			*mtu = pmtuprobe ? rt->dst.dev->mtu :
+				   dst_mtu(rt->dst.path);
 		}
 		*maxfraglen = ((*mtu - fragheaderlen) & ~7)
 			      + fragheaderlen - sizeof(struct frag_hdr);
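
Editorial sketch: the maxfraglen expression at the end of the hunk can be worked through in user space. The helper names are hypothetical. The 8-byte constant is the size of the IPv6 fragment header; the mtu of 1280, fragheaderlen of 40 and header_len of 16 used in the note below are assumptions picked to match the fragment sizes in the trace, not values stated in the thread.

```c
#define FRAG_HDR_LEN 8u /* size of the IPv6 fragment header */

/* Mirrors: ((*mtu - fragheaderlen) & ~7) + fragheaderlen - sizeof(struct frag_hdr) */
unsigned int max_frag_len(unsigned int mtu, unsigned int fragheaderlen)
{
	return ((mtu - fragheaderlen) & ~7u) + fragheaderlen - FRAG_HDR_LEN;
}

/* Data bytes carried per fragment once fragheaderlen is subtracted. */
unsigned int frag_payload(unsigned int mtu, unsigned int fragheaderlen)
{
	return max_frag_len(mtu, fragheaderlen) - fragheaderlen;
}
```

Under those assumptions, an mtu of 1280 gives a 1232-byte payload per fragment, while an mtu still reduced by 16 (1264) gives 1216, which is the same 16-byte gap seen between the observed and expected traces.
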