diff mbox

ipv6: udp packets following an UFO enqueued packet need also be handled by UFO

Message ID 20130921042700.GB8070@order.stressinduktion.org
State Accepted, archived
Delegated to: David Miller
Headers show

Commit Message

Hannes Frederic Sowa Sept. 21, 2013, 4:27 a.m. UTC
In the following scenario the socket is corked:
If the first UDP packet is larger then the mtu we try to append it to the
write queue via ip6_ufo_append_data. A following packet, which is smaller
than the mtu would be appended to the already queued up gso-skb via
plain ip6_append_data. This causes random memory corruptions.

In ip6_ufo_append_data we also have to be careful to not queue up the
same skb multiple times. So setup the gso frame only when no first skb
is available.

This also fixes a shortcoming where we add the current packet's length to
cork->length but return early because of a packet > mtu with dontfrag set
(instead of sutracting it again).

Found with trinity.

Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
---

I could only test this with virtualized UFO enabled network cards. Could
someone test this on real hardware?

 net/ipv6/ip6_output.c | 53 +++++++++++++++++++++------------------------------
 1 file changed, 22 insertions(+), 31 deletions(-)

Comments

Hannes Frederic Sowa Sept. 23, 2013, 12:43 a.m. UTC | #1
On Sat, Sep 21, 2013 at 06:27:00AM +0200, Hannes Frederic Sowa wrote:
> In the following scenario the socket is corked:
> If the first UDP packet is larger then the mtu we try to append it to the
> write queue via ip6_ufo_append_data. A following packet, which is smaller
> than the mtu would be appended to the already queued up gso-skb via
> plain ip6_append_data. This causes random memory corruptions.
> 
> In ip6_ufo_append_data we also have to be careful to not queue up the
> same skb multiple times. So setup the gso frame only when no first skb
> is available.
> 
> This also fixes a shortcoming where we add the current packet's length to
> cork->length but return early because of a packet > mtu with dontfrag set
> (instead of sutracting it again).
> 
> Found with trinity.
> 
> Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>

Further analysis showed it is very probable that this fixes the bug Dmitry
reported. So I want to give proper credits:

Reported-by: Dmitry Vyukov <dvyukov@google.com>

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Miller Sept. 24, 2013, 3:43 p.m. UTC | #2
From: Hannes Frederic Sowa <hannes@stressinduktion.org>
Date: Sat, 21 Sep 2013 06:27:00 +0200

> In the following scenario the socket is corked:
> If the first UDP packet is larger then the mtu we try to append it to the
> write queue via ip6_ufo_append_data. A following packet, which is smaller
> than the mtu would be appended to the already queued up gso-skb via
> plain ip6_append_data. This causes random memory corruptions.
> 
> In ip6_ufo_append_data we also have to be careful to not queue up the
> same skb multiple times. So setup the gso frame only when no first skb
> is available.
> 
> This also fixes a shortcoming where we add the current packet's length to
> cork->length but return early because of a packet > mtu with dontfrag set
> (instead of sutracting it again).
> 
> Found with trinity.
> 
> Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>

Applied and queued up for -stable, thanks.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jiri Pirko Sept. 30, 2013, 11:43 a.m. UTC | #3
Sat, Sep 21, 2013 at 06:27:00AM CEST, hannes@stressinduktion.org wrote:
>In the following scenario the socket is corked:
>If the first UDP packet is larger then the mtu we try to append it to the
>write queue via ip6_ufo_append_data. A following packet, which is smaller
>than the mtu would be appended to the already queued up gso-skb via
>plain ip6_append_data. This causes random memory corruptions.
>
>In ip6_ufo_append_data we also have to be careful to not queue up the
>same skb multiple times. So setup the gso frame only when no first skb
>is available.
>
>This also fixes a shortcoming where we add the current packet's length to
>cork->length but return early because of a packet > mtu with dontfrag set
>(instead of sutracting it again).
>
>Found with trinity.
>
>Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
>Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
>---
>
>I could only test this with virtualized UFO enabled network cards. Could
>someone test this on real hardware?
>
> net/ipv6/ip6_output.c | 53 +++++++++++++++++++++------------------------------
> 1 file changed, 22 insertions(+), 31 deletions(-)
>
>diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
>index 3a692d5..a54c45c 100644
>--- a/net/ipv6/ip6_output.c
>+++ b/net/ipv6/ip6_output.c
>@@ -1015,6 +1015,8 @@ static inline int ip6_ufo_append_data(struct sock *sk,
> 	 * udp datagram
> 	 */
> 	if ((skb = skb_peek_tail(&sk->sk_write_queue)) == NULL) {
>+		struct frag_hdr fhdr;
>+
> 		skb = sock_alloc_send_skb(sk,
> 			hh_len + fragheaderlen + transhdrlen + 20,
> 			(flags & MSG_DONTWAIT), &err);
>@@ -1036,12 +1038,6 @@ static inline int ip6_ufo_append_data(struct sock *sk,
> 		skb->protocol = htons(ETH_P_IPV6);
> 		skb->ip_summed = CHECKSUM_PARTIAL;
> 		skb->csum = 0;
>-	}
>-
>-	err = skb_append_datato_frags(sk,skb, getfrag, from,
>-				      (length - transhdrlen));
>-	if (!err) {
>-		struct frag_hdr fhdr;
> 
> 		/* Specify the length of each IPv6 datagram fragment.
> 		 * It has to be a multiple of 8.
>@@ -1052,15 +1048,10 @@ static inline int ip6_ufo_append_data(struct sock *sk,
> 		ipv6_select_ident(&fhdr, rt);
> 		skb_shinfo(skb)->ip6_frag_id = fhdr.identification;
> 		__skb_queue_tail(&sk->sk_write_queue, skb);
>-
>-		return 0;
> 	}
>-	/* There is not enough support do UPD LSO,
>-	 * so follow normal path
>-	 */
>-	kfree_skb(skb);
> 
>-	return err;
>+	return skb_append_datato_frags(sk, skb, getfrag, from,
>+				       (length - transhdrlen));
> }
> 

What if non-ufo-path-created skb is peeked?

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Hannes Frederic Sowa Sept. 30, 2013, 5:23 p.m. UTC | #4
On Mon, Sep 30, 2013 at 01:43:43PM +0200, Jiri Pirko wrote:
> Sat, Sep 21, 2013 at 06:27:00AM CEST, hannes@stressinduktion.org wrote:
> >In the following scenario the socket is corked:
> >If the first UDP packet is larger then the mtu we try to append it to the
> >write queue via ip6_ufo_append_data. A following packet, which is smaller
> >than the mtu would be appended to the already queued up gso-skb via
> >plain ip6_append_data. This causes random memory corruptions.
> >
> >In ip6_ufo_append_data we also have to be careful to not queue up the
> >same skb multiple times. So setup the gso frame only when no first skb
> >is available.
> >
> >This also fixes a shortcoming where we add the current packet's length to
> >cork->length but return early because of a packet > mtu with dontfrag set
> >(instead of sutracting it again).
> >
> >Found with trinity.
> >
> >Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
> >Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
> >---
> >
> >I could only test this with virtualized UFO enabled network cards. Could
> >someone test this on real hardware?
> >
> > net/ipv6/ip6_output.c | 53 +++++++++++++++++++++------------------------------
> > 1 file changed, 22 insertions(+), 31 deletions(-)
> >
> >diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
> >index 3a692d5..a54c45c 100644
> >--- a/net/ipv6/ip6_output.c
> >+++ b/net/ipv6/ip6_output.c
> >@@ -1015,6 +1015,8 @@ static inline int ip6_ufo_append_data(struct sock *sk,
> > 	 * udp datagram
> > 	 */
> > 	if ((skb = skb_peek_tail(&sk->sk_write_queue)) == NULL) {
> >+		struct frag_hdr fhdr;
> >+
> > 		skb = sock_alloc_send_skb(sk,
> > 			hh_len + fragheaderlen + transhdrlen + 20,
> > 			(flags & MSG_DONTWAIT), &err);
> >@@ -1036,12 +1038,6 @@ static inline int ip6_ufo_append_data(struct sock *sk,
> > 		skb->protocol = htons(ETH_P_IPV6);
> > 		skb->ip_summed = CHECKSUM_PARTIAL;
> > 		skb->csum = 0;
> >-	}
> >-
> >-	err = skb_append_datato_frags(sk,skb, getfrag, from,
> >-				      (length - transhdrlen));
> >-	if (!err) {
> >-		struct frag_hdr fhdr;
> > 
> > 		/* Specify the length of each IPv6 datagram fragment.
> > 		 * It has to be a multiple of 8.
> >@@ -1052,15 +1048,10 @@ static inline int ip6_ufo_append_data(struct sock *sk,
> > 		ipv6_select_ident(&fhdr, rt);
> > 		skb_shinfo(skb)->ip6_frag_id = fhdr.identification;
> > 		__skb_queue_tail(&sk->sk_write_queue, skb);
> >-
> >-		return 0;
> > 	}
> >-	/* There is not enough support do UPD LSO,
> >-	 * so follow normal path
> >-	 */
> >-	kfree_skb(skb);
> > 
> >-	return err;
> >+	return skb_append_datato_frags(sk, skb, getfrag, from,
> >+				       (length - transhdrlen));
> > }
> > 
> 
> What if non-ufo-path-created skb is peeked?

You mean like:
first append a frame < mtu
second frame > mtu so it gets handles by ip6_ufo_append?

Currently I don't see a problem with it but I may be wrong. What is
your suspicion?

Greetings,

  Hannes

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
index 3a692d5..a54c45c 100644
--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -1015,6 +1015,8 @@  static inline int ip6_ufo_append_data(struct sock *sk,
 	 * udp datagram
 	 */
 	if ((skb = skb_peek_tail(&sk->sk_write_queue)) == NULL) {
+		struct frag_hdr fhdr;
+
 		skb = sock_alloc_send_skb(sk,
 			hh_len + fragheaderlen + transhdrlen + 20,
 			(flags & MSG_DONTWAIT), &err);
@@ -1036,12 +1038,6 @@  static inline int ip6_ufo_append_data(struct sock *sk,
 		skb->protocol = htons(ETH_P_IPV6);
 		skb->ip_summed = CHECKSUM_PARTIAL;
 		skb->csum = 0;
-	}
-
-	err = skb_append_datato_frags(sk,skb, getfrag, from,
-				      (length - transhdrlen));
-	if (!err) {
-		struct frag_hdr fhdr;
 
 		/* Specify the length of each IPv6 datagram fragment.
 		 * It has to be a multiple of 8.
@@ -1052,15 +1048,10 @@  static inline int ip6_ufo_append_data(struct sock *sk,
 		ipv6_select_ident(&fhdr, rt);
 		skb_shinfo(skb)->ip6_frag_id = fhdr.identification;
 		__skb_queue_tail(&sk->sk_write_queue, skb);
-
-		return 0;
 	}
-	/* There is not enough support do UPD LSO,
-	 * so follow normal path
-	 */
-	kfree_skb(skb);
 
-	return err;
+	return skb_append_datato_frags(sk, skb, getfrag, from,
+				       (length - transhdrlen));
 }
 
 static inline struct ipv6_opt_hdr *ip6_opt_dup(struct ipv6_opt_hdr *src,
@@ -1227,27 +1218,27 @@  int ip6_append_data(struct sock *sk, int getfrag(void *from, char *to,
 	 * --yoshfuji
 	 */
 
-	cork->length += length;
-	if (length > mtu) {
-		int proto = sk->sk_protocol;
-		if (dontfrag && (proto == IPPROTO_UDP || proto == IPPROTO_RAW)){
-			ipv6_local_rxpmtu(sk, fl6, mtu-exthdrlen);
-			return -EMSGSIZE;
-		}
-
-		if (proto == IPPROTO_UDP &&
-		    (rt->dst.dev->features & NETIF_F_UFO)) {
+	if ((length > mtu) && dontfrag && (sk->sk_protocol == IPPROTO_UDP ||
+					   sk->sk_protocol == IPPROTO_RAW)) {
+		ipv6_local_rxpmtu(sk, fl6, mtu-exthdrlen);
+		return -EMSGSIZE;
+	}
 
-			err = ip6_ufo_append_data(sk, getfrag, from, length,
-						  hh_len, fragheaderlen,
-						  transhdrlen, mtu, flags, rt);
-			if (err)
-				goto error;
-			return 0;
-		}
+	skb = skb_peek_tail(&sk->sk_write_queue);
+	cork->length += length;
+	if (((length > mtu) ||
+	     (skb && skb_is_gso(skb))) &&
+	    (sk->sk_protocol == IPPROTO_UDP) &&
+	    (rt->dst.dev->features & NETIF_F_UFO)) {
+		err = ip6_ufo_append_data(sk, getfrag, from, length,
+					  hh_len, fragheaderlen,
+					  transhdrlen, mtu, flags, rt);
+		if (err)
+			goto error;
+		return 0;
 	}
 
-	if ((skb = skb_peek_tail(&sk->sk_write_queue)) == NULL)
+	if (!skb)
 		goto alloc_new_skb;
 
 	while (length > 0) {