[net] ipv6: don't use CHECKSUM_PARTIAL on MSG_MORE/UDP_CORK sockets

Message ID 1445351922-8463-1-git-send-email-hannes@stressinduktion.org
State Changes Requested, archived
Delegated to: David Miller

Commit Message

Hannes Frederic Sowa Oct. 20, 2015, 2:38 p.m. UTC
MSG_MORE might cause the packet to get fragmented in the end when
passed down to the flush function and the transhdrlen check alone is
not sufficient to protect against fragmentation. Instead check if the
socket user intends to add more data to the socket on the first packet.

This broke checksum calculation for UDPv6 for NFS protocols.

Fixes: 32dce968dd987 ("ipv6: Allow for partial checksums on non-ufo packets")
Cc: Vlad Yasevich <vyasevich@gmail.com>
Tested-by: Sabrina Dubroca <sd@quesysnail.net>
Tested-by: Benjamin Coddington <bcodding@redhat.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
---
 net/ipv6/ip6_output.c | 1 +
 1 file changed, 1 insertion(+)

Comments

Vladislav Yasevich Oct. 20, 2015, 2:55 p.m. UTC | #1
On 10/20/2015 10:38 AM, Hannes Frederic Sowa wrote:
> MSG_MORE might cause the packet to get fragmented in the end when
> passed down to the flush function and the transhdrlen check alone is
> not sufficient to protect against fragmentation. Instead check if the
> socket user intends to add more data to the socket on the first packet.
> 
> This broke checksum calculation for UDPv6 for NFS protocols.
> 
> Fixes: 32dce968dd987 ("ipv6: Allow for partial checksums on non-ufo packets")
> Cc: Vlad Yasevich <vyasevich@gmail.com>

Acked-by: Vlad Yasevich <vyasevich@gmail.com>

-vlad

> Tested-by: Sabrina Dubroca <sd@quesysnail.net>
> Tested-by: Benjamin Coddington <bcodding@redhat.com>
> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
> ---
>  net/ipv6/ip6_output.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
> index 61d403e..95c5780 100644
> --- a/net/ipv6/ip6_output.c
> +++ b/net/ipv6/ip6_output.c
> @@ -1317,6 +1317,7 @@ emsgsize:
>  	 * sums only work when transhdrlen is set.
>  	 */
>  	if (transhdrlen && sk->sk_protocol == IPPROTO_UDP &&
> +	    !(flags & MSG_MORE) &&
>  	    length + fragheaderlen < mtu &&
>  	    rt->dst.dev->features & NETIF_F_V6_CSUM &&
>  	    !exthdrlen)
> 

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Vladislav Yasevich Oct. 20, 2015, 9:39 p.m. UTC | #2
On 10/20/2015 10:38 AM, Hannes Frederic Sowa wrote:
> MSG_MORE might cause the packet to get fragmented in the end when
> passed down to the flush function and the transhdrlen check alone is
> not sufficient to protect against fragmentation. Instead check if the
> socket user intends to add more data to the socket on the first packet.
> 
> This broke checksum calculation for UDPv6 for NFS protocols.
> 
> Fixes: 32dce968dd987 ("ipv6: Allow for partial checksums on non-ufo packets")
> Cc: Vlad Yasevich <vyasevich@gmail.com>
> Tested-by: Sabrina Dubroca <sd@quesysnail.net>
> Tested-by: Benjamin Coddington <bcodding@redhat.com>
> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
> ---
>  net/ipv6/ip6_output.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
> index 61d403e..95c5780 100644
> --- a/net/ipv6/ip6_output.c
> +++ b/net/ipv6/ip6_output.c
> @@ -1317,6 +1317,7 @@ emsgsize:
>  	 * sums only work when transhdrlen is set.
>  	 */
>  	if (transhdrlen && sk->sk_protocol == IPPROTO_UDP &&
> +	    !(flags & MSG_MORE) &&
>  	    length + fragheaderlen < mtu &&
>  	    rt->dst.dev->features & NETIF_F_V6_CSUM &&
>  	    !exthdrlen)
> 

Hmm... so while this solves this problem by simply avoiding the combination of
skb #1 having CHECKSUM_PARTIAL and others having CHECKSUM_NONE, I think the actual
problem is a bit deeper.
The above combination seems to work for me since udp6_hwcsum_outgoing() corrects
the checksum.  However, my testing so far has been on nics that have NETIF_F_V6_CSUM,
but without UFO support.

On such systems a simple test of using MSG_MORE on an IPv6 UDP socket, sending 200 bytes
followed by 2000 bytes, works correctly.
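
For readers who want to try a test of this shape, the following is a minimal userspace sketch, not the program used in this thread. The helper name `send_corked_udp6()` and the fill pattern are assumptions made for illustration. It sends a first chunk with MSG_MORE and a second one without it, so both writes end up in one UDP datagram. To reach the code path under discussion the destination has to be routed over an interface with a normal MTU (for example 1500) and NETIF_F_V6_CSUM; over loopback the datagram is not fragmented.

```c
#include <assert.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

#ifndef MSG_MORE
#define MSG_MORE 0x8000   /* Linux value, fallback for older headers */
#endif

/*
 * Hypothetical helper: queue first_len bytes with MSG_MORE, then send
 * second_len bytes without it so the pending data is flushed as a
 * single UDP datagram. Returns the total bytes accepted, or -1 on error.
 */
static ssize_t send_corked_udp6(int fd, const struct sockaddr_in6 *dst,
                                size_t first_len, size_t second_len)
{
    static char buf[65536];
    ssize_t a, b;

    assert(dst != NULL);
    if (first_len > sizeof(buf) || second_len > sizeof(buf))
        return -1;
    memset(buf, 0xab, sizeof(buf));   /* arbitrary payload pattern */

    /* First write: the socket is told that more data will follow. */
    a = sendto(fd, buf, first_len, MSG_MORE,
               (const struct sockaddr *)dst, sizeof(*dst));
    if (a < 0)
        return -1;

    /* Second write without MSG_MORE: pushes the pending frames. */
    b = sendto(fd, buf, second_len, 0,
               (const struct sockaddr *)dst, sizeof(*dst));
    if (b < 0)
        return -1;

    return a + b;
}
```

Calling `send_corked_udp6(fd, &dst, 200, 2000)` on an `AF_INET6`/`SOCK_DGRAM` socket mirrors the sizes mentioned above. Whether the result was fragmented and carries a valid checksum still has to be checked on the wire, for example with a packet capture on the receiver.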

I am now wondering if this might be UFO related instead and looking for a nic that
has UFO support.

-vlad


Hannes Frederic Sowa Oct. 21, 2015, 9:51 a.m. UTC | #3
Hi Vlad,

On Tue, Oct 20, 2015, at 23:39, Vlad Yasevich wrote:
> On 10/20/2015 10:38 AM, Hannes Frederic Sowa wrote:
> > MSG_MORE might cause the packet to get fragmented in the end when
> > passed down to the flush function and the transhdrlen check alone is
> > not sufficient to protect against fragmentation. Instead check if the
> > socket user intends to add more data to the socket on the first packet.
> > 
> > This broke checksum calculation for UDPv6 for NFS protocols.
> > 
> > Fixes: 32dce968dd987 ("ipv6: Allow for partial checksums on non-ufo packets")
> > Cc: Vlad Yasevich <vyasevich@gmail.com>
> > Tested-by: Sabrina Dubroca <sd@quesysnail.net>
> > Tested-by: Benjamin Coddington <bcodding@redhat.com>
> > Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
> > ---
> >  net/ipv6/ip6_output.c | 1 +
> >  1 file changed, 1 insertion(+)
> > 
> > diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
> > index 61d403e..95c5780 100644
> > --- a/net/ipv6/ip6_output.c
> > +++ b/net/ipv6/ip6_output.c
> > @@ -1317,6 +1317,7 @@ emsgsize:
> >  	 * sums only work when transhdrlen is set.
> >  	 */
> >  	if (transhdrlen && sk->sk_protocol == IPPROTO_UDP &&
> > +	    !(flags & MSG_MORE) &&
> >  	    length + fragheaderlen < mtu &&
> >  	    rt->dst.dev->features & NETIF_F_V6_CSUM &&
> >  	    !exthdrlen)
> > 
> 
> Hmm... so while this solves this problem by simply avoiding the combination of
> skb #1 having CHECKSUM_PARTIAL and others having CHECKSUM_NONE, I think the actual
> problem is a bit deeper.
> The above combination seems to work for me since udp6_hwcsum_outgoing() corrects
> the checksum.  However, my testing so far has been on nics that have NETIF_F_V6_CSUM,
> but without UFO support.

With nfs tests we never branch into setting up or extending an UDP UFO
packet, also because on the test system UFO is disabled on all
interfaces.

I thought about relaxing this check in future when we simply make sure
we don't do fragmentation based on the length while taking all
fragments into account.

> On such systems a simple test of using MSG_MORE on an IPv6 udp socket sending 200 bytes
> followed by 2000 bytes works correctly.

Did you make sure it actually fragmented and the checksums are correct?
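
One way to answer the checksum half of that question for a captured, reassembled datagram is to recompute the UDP checksum in userspace. The sketch below uses the standard Internet checksum (RFC 1071) over the IPv6 pseudo-header (RFC 8200, section 8.1). It is not what either participant ran; `csum_add()`, `csum_fold()` and `udp6_csum()` are hypothetical helper names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Add len bytes to a running sum of big-endian 16-bit words.
 * An odd trailing byte is padded with zero, so only the last
 * chunk of a message may have an odd length.
 */
static uint32_t csum_add(uint32_t sum, const uint8_t *p, size_t len)
{
    assert(p != NULL || len == 0);
    while (len > 1) {
        sum += (uint32_t)p[0] << 8 | p[1];
        p += 2;
        len -= 2;
    }
    if (len)
        sum += (uint32_t)p[0] << 8;
    return sum;
}

/* Fold the carries back in and return the one's complement. */
static uint16_t csum_fold(uint32_t sum)
{
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)~sum;
}

/*
 * Checksum over the IPv6 pseudo-header plus the UDP header and payload
 * (udp_len bytes in total). With the checksum field zeroed this yields
 * the value to transmit (0 is sent as 0xffff). With the received field
 * left in place, a valid datagram yields 0.
 */
static uint16_t udp6_csum(const uint8_t src[16], const uint8_t dst[16],
                          const uint8_t *udp, size_t udp_len)
{
    const uint8_t tail[8] = {
        (uint8_t)(udp_len >> 24), (uint8_t)(udp_len >> 16),
        (uint8_t)(udp_len >> 8),  (uint8_t)udp_len,
        0, 0, 0, 17               /* next header: UDP */
    };
    uint32_t sum = 0;

    sum = csum_add(sum, src, 16);
    sum = csum_add(sum, dst, 16);
    sum = csum_add(sum, tail, sizeof(tail));
    sum = csum_add(sum, udp, udp_len);
    return csum_fold(sum);
}
```

For a fragmented datagram the check applies to the reassembled UDP header and payload, not to the individual fragments.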

> I am now wondering if this might be UFO related instead and looking for a nic that
> has UFO support.

So far as I can see it has nothing to do with UFO. I will do more
investigation now. Thanks for bringing this up!

Bye,
Hannes
Hannes Frederic Sowa Oct. 21, 2015, 12:52 p.m. UTC | #4
On Tue, Oct 20, 2015, at 23:39, Vlad Yasevich wrote:
> I am now wondering if this might be UFO related instead and looking for a nic that
> has UFO support.

I doubt that.

We overallocate memory the first time in ip6_append_data because we are in
MSG_MORE mode. Then, in my case, the second write only copies data into
the first skb on the write queue; no skb is appended to frag_list. So
udp6_hwcsum_outgoing doesn't clean up the flags, either.
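
The situation described in this paragraph can be restated as a small toy model. It is only a sketch of the argument, not the kernel's data structures: `struct toy_skb`, `toy_hwcsum_outgoing()` and `toy_partial_will_fragment()` are hypothetical names, and the only behaviour modelled is the one stated here, namely that the partial-checksum request is withdrawn only when further buffers hang off the first one.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum toy_csum { TOY_CHECKSUM_NONE, TOY_CHECKSUM_PARTIAL };

/* Toy stand-in for an skb: only the fields the argument needs. */
struct toy_skb {
    enum toy_csum   ip_summed;
    size_t          len;        /* bytes in this buffer, headers included */
    struct toy_skb *frag_list;  /* further buffers of the same datagram */
};

/*
 * Models the clean-up step: fall back to a software checksum only
 * when more than one buffer makes up the datagram.
 */
static void toy_hwcsum_outgoing(struct toy_skb *skb)
{
    assert(skb != NULL);
    if (skb->frag_list)
        skb->ip_summed = TOY_CHECKSUM_NONE;
    /* otherwise the CHECKSUM_PARTIAL request is left untouched */
}

/*
 * True for the problematic case: a single buffer still marked PARTIAL
 * that is larger than the MTU and will therefore be fragmented later.
 */
static bool toy_partial_will_fragment(const struct toy_skb *skb, size_t mtu)
{
    assert(skb != NULL);
    return skb->ip_summed == TOY_CHECKSUM_PARTIAL &&
           skb->frag_list == NULL && skb->len > mtu;
}
```

With one overallocated buffer holding, say, 40 + 8 + 200 + 2000 bytes and an MTU of 1500 (example numbers, not taken from the thread), the model keeps the PARTIAL mark even though the packet has to be fragmented.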

I agree that we can improve the check for whether we fragment an IPv6 skb
in ip6_append_data in net-next. But I still see this fix as suitable for
the 'net' tree.

We could also improve udp6_hwcsum_outgoing to check if our packets get
fragmented and fall back to the clean-up path. But I think this kind of
optimization should go into net-next, too.

Currently the check made sure we don't use PARTIAL on skbs which could
fragment. MSG_MORE somehow circumvented that check, so I think the fix
is good to go. We certainly can try to improve PARTIAL checksums for
fragmented packets.

What do you think?

Bye,
Hannes

Patch

diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
index 61d403e..95c5780 100644
--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -1317,6 +1317,7 @@  emsgsize:
 	 * sums only work when transhdrlen is set.
 	 */
 	if (transhdrlen && sk->sk_protocol == IPPROTO_UDP &&
+	    !(flags & MSG_MORE) &&
 	    length + fragheaderlen < mtu &&
 	    rt->dst.dev->features & NETIF_F_V6_CSUM &&
 	    !exthdrlen)
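
To make the effect of the added condition concrete, here is a minimal userspace model of the predicate in the hunk above. It is not kernel code: `struct csum_ctx`, `may_use_csum_partial_old()` and `may_use_csum_partial_new()` are hypothetical names, and the numbers used with it (MTU 1500, a 40-byte `fragheaderlen`, a `length` of 208 for a 200-byte first write plus the UDP header) are assumptions for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/socket.h>

#ifndef MSG_MORE
#define MSG_MORE 0x8000   /* Linux value, fallback for older headers */
#endif

/* Hypothetical stand-in for the values tested in the hunk. */
struct csum_ctx {
    size_t transhdrlen;      /* non-zero only on the first append */
    bool   is_udp;           /* sk->sk_protocol == IPPROTO_UDP */
    int    flags;            /* sendmsg() flags of this call */
    size_t length;           /* bytes handed over in this call */
    size_t fragheaderlen;    /* per-fragment header length */
    size_t mtu;
    bool   dev_has_v6_csum;  /* device advertises NETIF_F_V6_CSUM */
    size_t exthdrlen;
};

/* Condition before the patch: only this call's length is considered. */
static bool may_use_csum_partial_old(const struct csum_ctx *c)
{
    assert(c != NULL);
    return c->transhdrlen && c->is_udp &&
           c->length + c->fragheaderlen < c->mtu &&
           c->dev_has_v6_csum && !c->exthdrlen;
}

/* Condition after the patch: MSG_MORE rules out CHECKSUM_PARTIAL. */
static bool may_use_csum_partial_new(const struct csum_ctx *c)
{
    assert(c != NULL);
    return !(c->flags & MSG_MORE) && may_use_csum_partial_old(c);
}
```

With `flags = MSG_MORE`, `length = 208`, `fragheaderlen = 40` and `mtu = 1500`, the old predicate holds because 248 < 1500, although later writes may push the datagram past the MTU. The new predicate rejects that case and still accepts the same write when MSG_MORE is not set.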