diff mbox

ipv6: udp packets following an UFO enqueued packet need also be handled by UFO

Message ID 20131001105837.GA1424@minipsycho.brq.redhat.com
State RFC, archived
Delegated to: David Miller
Headers show

Commit Message

Jiri Pirko Oct. 1, 2013, 10:58 a.m. UTC
>> 
>> What if non-ufo-path-created skb is peeked?
>
>You mean like:
>first append a frame < mtu
>second frame > mtu so it gets handles by ip6_ufo_append?
>
>Currently I don't see a problem with it but I may be wrong. What is
>your suspicion?

Well the skb will have gso_size==0 and ip_summed==CHECKSUM_NONE
Because of that it will be not threated as ufo skb.

Following patch fixes it:

Subject: [patch net] ip6_output: do skb ufo init for peeked non ufo skb as well

Now, if user application does:
sendto len<mtu flag MSG_MORE
sendto len>mtu flag 0
The skb is not treated as fragmented one because it is not initialized
that way. So move the initialization to fix this.

Signed-off-by: Jiri Pirko <jiri@resnulli.us>
---
 net/ipv6/ip6_output.c | 26 ++++++++++++++------------
 1 file changed, 14 insertions(+), 12 deletions(-)

Comments

Hannes Frederic Sowa Oct. 1, 2013, 12:09 p.m. UTC | #1
On Tue, Oct 01, 2013 at 12:58:37PM +0200, Jiri Pirko wrote:
> >> 
> >> What if non-ufo-path-created skb is peeked?
> >
> >You mean like:
> >first append a frame < mtu
> >second frame > mtu so it gets handles by ip6_ufo_append?
> >
> >Currently I don't see a problem with it but I may be wrong. What is
> >your suspicion?
> 
> Well the skb will have gso_size==0 and ip_summed==CHECKSUM_NONE
> Because of that it will be not threated as ufo skb.
> 
> Following patch fixes it:
> 
> Subject: [patch net] ip6_output: do skb ufo init for peeked non ufo skb as well
> 
> Now, if user application does:
> sendto len<mtu flag MSG_MORE
> sendto len>mtu flag 0
> The skb is not treated as fragmented one because it is not initialized
> that way. So move the initialization to fix this.
> 
> Signed-off-by: Jiri Pirko <jiri@resnulli.us>

My understanding is that the frame is only a valid gso frame iff the first skb
queued up is setup as an UFO frame. In other cases we only need to make sure
we append to the fragment list without updating the gso field and the skb will
get linearized (if needed) later in the output path.

This seems not to matter for virtio_net and loopback because we seem to pass
the skb as is. But the remaining ethernet card which sets NETIF_F_UFO is
neterion and we have to verify if this assumption is correct, there.

This is also the way the IPv4 stack deals with UFO.

Either way, this needs a look from Eric Dumazet and Herbert Xu, so I added
them in Cc.

Greetings,

  Hannes

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Hannes Frederic Sowa Oct. 1, 2013, 12:32 p.m. UTC | #2
On Tue, Oct 01, 2013 at 02:09:07PM +0200, Hannes Frederic Sowa wrote:
> On Tue, Oct 01, 2013 at 12:58:37PM +0200, Jiri Pirko wrote:
> > >> 
> > >> What if non-ufo-path-created skb is peeked?
> > >
> > >You mean like:
> > >first append a frame < mtu
> > >second frame > mtu so it gets handles by ip6_ufo_append?
> > >
> > >Currently I don't see a problem with it but I may be wrong. What is
> > >your suspicion?
> > 
> > Well the skb will have gso_size==0 and ip_summed==CHECKSUM_NONE
> > Because of that it will be not threated as ufo skb.
> > 
> > Following patch fixes it:
> > 
> > Subject: [patch net] ip6_output: do skb ufo init for peeked non ufo skb as well
> > 
> > Now, if user application does:
> > sendto len<mtu flag MSG_MORE
> > sendto len>mtu flag 0
> > The skb is not treated as fragmented one because it is not initialized
> > that way. So move the initialization to fix this.
> > 
> > Signed-off-by: Jiri Pirko <jiri@resnulli.us>
> 
> My understanding is that the frame is only a valid gso frame iff the first skb
> queued up is setup as an UFO frame. In other cases we only need to make sure
> we append to the fragment list without updating the gso field and the skb will
> get linearized (if needed) later in the output path.
> 
> This seems not to matter for virtio_net and loopback because we seem to pass
> the skb as is. But the remaining ethernet card which sets NETIF_F_UFO is
> neterion and we have to verify if this assumption is correct, there.

Hm, ip_append_page seems to violate my assumption. I need to read up on the
code later today.

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Hannes Frederic Sowa Oct. 1, 2013, 9:47 p.m. UTC | #3
Hi Jiri,

On Tue, Oct 01, 2013 at 02:32:14PM +0200, Hannes Frederic Sowa wrote:
> On Tue, Oct 01, 2013 at 02:09:07PM +0200, Hannes Frederic Sowa wrote:
> > On Tue, Oct 01, 2013 at 12:58:37PM +0200, Jiri Pirko wrote:
> > > >> 
> > > >> What if non-ufo-path-created skb is peeked?
> > > >
> > > >You mean like:
> > > >first append a frame < mtu
> > > >second frame > mtu so it gets handles by ip6_ufo_append?
> > > >
> > > >Currently I don't see a problem with it but I may be wrong. What is
> > > >your suspicion?
> > > 
> > > Well the skb will have gso_size==0 and ip_summed==CHECKSUM_NONE
> > > Because of that it will be not threated as ufo skb.
> > > 
> > > Following patch fixes it:
> > > 
> > > Subject: [patch net] ip6_output: do skb ufo init for peeked non ufo skb as well
> > > 
> > > Now, if user application does:
> > > sendto len<mtu flag MSG_MORE
> > > sendto len>mtu flag 0
> > > The skb is not treated as fragmented one because it is not initialized
> > > that way. So move the initialization to fix this.
> > > 
> > > Signed-off-by: Jiri Pirko <jiri@resnulli.us>
> > 
> > My understanding is that the frame is only a valid gso frame iff the first skb
> > queued up is setup as an UFO frame. In other cases we only need to make sure
> > we append to the fragment list without updating the gso field and the skb will
> > get linearized (if needed) later in the output path.
> > 
> > This seems not to matter for virtio_net and loopback because we seem to pass
> > the skb as is. But the remaining ethernet card which sets NETIF_F_UFO is
> > neterion and we have to verify if this assumption is correct, there.
> 
> Hm, ip_append_page seems to violate my assumption. I need to read up on the
> code later today.

I still guess my assumption about gso is correct and I will have to review how
the ip_append_page codepath works in detail.

I currently have a tree where I have applied my UFO patch and your dontfrag
patch. When I have an application which does:

socket(PF_INET6, SOCK_DGRAM, IPPROTO_IP) = 3
connect(3, {sa_family=AF_INET6, sin6_port=htons(53), inet_pton(AF_INET6, "::1", &sin6_addr), sin6_flowinfo=0, sin6_scope_id=0}, 28) = 0	// UFO is enabled currently
setsockopt(3, SOL_UDP, 1, [1], 4)       = 0		// this sets UDP_CORK active
setsockopt(3, SOL_IPV6, IPV6_MTU, [1280], 4) = 0	// minimal IPV6 mtu
write(3, " ", 1)                        = 1		// generates non-gso head
write(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 3701) = 3701	// generates gso segment in frags but skb_is_gso(skb) == false
write(3, " ", 1						// and boom: null pointer deref and slab corruption

Without your patch this would not been possible and I am not sure your
patch breaks assumptions for hardware UFO. We should wait with that
patch until we sorted out the details.

My current guess is that we need to replace the skb_is_gso() with
something that checks for already appended frags and continue to use
sbk_append_datato_frags in that case.

The strange thing is that if I don't do the IPV6_MTU setsockopt I don't
get an oops.

IPv4 seems to work without problems, too.

Maybe I miss something? :|

Greetings,

  Hannes

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
index a54c45c..c6cfa2f 100644
--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -1008,6 +1008,7 @@  static inline int ip6_ufo_append_data(struct sock *sk,
 
 {
 	struct sk_buff *skb;
+	struct frag_hdr fhdr;
 	int err;
 
 	/* There is support for UDP large send offload by network
@@ -1015,8 +1016,6 @@  static inline int ip6_ufo_append_data(struct sock *sk,
 	 * udp datagram
 	 */
 	if ((skb = skb_peek_tail(&sk->sk_write_queue)) == NULL) {
-		struct frag_hdr fhdr;
-
 		skb = sock_alloc_send_skb(sk,
 			hh_len + fragheaderlen + transhdrlen + 20,
 			(flags & MSG_DONTWAIT), &err);
@@ -1036,20 +1035,23 @@  static inline int ip6_ufo_append_data(struct sock *sk,
 		skb->transport_header = skb->network_header + fragheaderlen;
 
 		skb->protocol = htons(ETH_P_IPV6);
-		skb->ip_summed = CHECKSUM_PARTIAL;
 		skb->csum = 0;
 
-		/* Specify the length of each IPv6 datagram fragment.
-		 * It has to be a multiple of 8.
-		 */
-		skb_shinfo(skb)->gso_size = (mtu - fragheaderlen -
-					     sizeof(struct frag_hdr)) & ~7;
-		skb_shinfo(skb)->gso_type = SKB_GSO_UDP;
-		ipv6_select_ident(&fhdr, rt);
-		skb_shinfo(skb)->ip6_frag_id = fhdr.identification;
 		__skb_queue_tail(&sk->sk_write_queue, skb);
-	}
+	} else if (skb_is_gso(skb))
+		goto append;
+
+	skb->ip_summed = CHECKSUM_PARTIAL;
+	/* Specify the length of each IPv6 datagram fragment.
+	 * It has to be a multiple of 8.
+	 */
+	skb_shinfo(skb)->gso_size = (mtu - fragheaderlen -
+				     sizeof(struct frag_hdr)) & ~7;
+	skb_shinfo(skb)->gso_type = SKB_GSO_UDP;
+	ipv6_select_ident(&fhdr, rt);
+	skb_shinfo(skb)->ip6_frag_id = fhdr.identification;
 
+append:
 	return skb_append_datato_frags(sk, skb, getfrag, from,
 				       (length - transhdrlen));
 }