[net,v2,1/4] ipv4: no CHECKSUM_PARTIAL on MSG_MORE corked sockets

Message ID 1445958135-19805-2-git-send-email-hannes@stressinduktion.org
State Superseded, archived
Delegated to: David Miller

Commit Message

Hannes Frederic Sowa Oct. 27, 2015, 3:02 p.m. UTC
We cannot reliably calculate packet size on MSG_MORE corked sockets
and thus cannot decide if they are going to be fragmented later on,
so better not use CHECKSUM_PARTIAL in the first place.

Cc: Eric Dumazet <edumazet@google.com>
Cc: Vlad Yasevich <vyasevich@gmail.com>
Cc: Benjamin Coddington <bcodding@redhat.com>
Cc: Tom Herbert <tom@herbertland.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
---
 net/ipv4/ip_output.c | 1 +
 1 file changed, 1 insertion(+)

Comments

Tom Herbert Oct. 27, 2015, 4:04 p.m. UTC | #1
On Tue, Oct 27, 2015 at 8:02 AM, Hannes Frederic Sowa
<hannes@stressinduktion.org> wrote:
> We cannot reliably calculate packet size on MSG_MORE corked sockets
> and thus cannot decide if they are going to be fragmented later on,
> so better not use CHECKSUM_PARTIAL in the first place.
>
MSG_MORE should be independent of checksum offload. If the packet is
fragmented, the fix in ip_output will ensure that skb_checksum_help is
properly called.

> Cc: Eric Dumazet <edumazet@google.com>
> Cc: Vlad Yasevich <vyasevich@gmail.com>
> Cc: Benjamin Coddington <bcodding@redhat.com>
> Cc: Tom Herbert <tom@herbertland.com>
> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
> ---
>  net/ipv4/ip_output.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
> index 50e2973..0b02417 100644
> --- a/net/ipv4/ip_output.c
> +++ b/net/ipv4/ip_output.c
> @@ -911,6 +911,7 @@ static int __ip_append_data(struct sock *sk,
>         if (transhdrlen &&
>             length + fragheaderlen <= mtu &&
>             rt->dst.dev->features & NETIF_F_V4_CSUM &&
> +           !(flags & MSG_MORE) &&
>             !exthdrlen)
>                 csummode = CHECKSUM_PARTIAL;
>
> --
> 2.5.0
>
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Hannes Frederic Sowa Oct. 27, 2015, 4:34 p.m. UTC | #2
On Tue, Oct 27, 2015, at 17:04, Tom Herbert wrote:
> On Tue, Oct 27, 2015 at 8:02 AM, Hannes Frederic Sowa
> <hannes@stressinduktion.org> wrote:
> > We cannot reliably calculate packet size on MSG_MORE corked sockets
> > and thus cannot decide if they are going to be fragmented later on,
> > so better not use CHECKSUM_PARTIAL in the first place.
> >
> MSG_MORE should be independent of checksum offload. If the packet is
> fragmented, the fix in ip_output will ensure that skb_checksum_help is
> properly called.

We will most likely fragment if MSG_MORE is set, because exceeding the
link MTU is quite probable; see e.g. the NFS use case. Why
not simply use the csum functions during copy-in in that case? It makes
much more sense to me.
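
As a rough illustration of checksumming during copy-in, the sketch below
copies a buffer and accumulates the 16-bit ones-complement sum in the same
pass, so no second walk over the data is needed later. The names
sketch_copy_and_csum and sketch_csum_fold are hypothetical and are not the
kernel's helpers; the sketch also assumes every chunk except the last has
an even length.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: copy len bytes from src to dst and add them to
 * sum as big-endian 16-bit words. Assumes that only the final chunk of
 * a message may have an odd length.
 */
uint32_t sketch_copy_and_csum(uint8_t *dst, const uint8_t *src,
			      size_t len, uint32_t sum)
{
	size_t i;

	for (i = 0; i + 1 < len; i += 2) {
		dst[i] = src[i];
		dst[i + 1] = src[i + 1];
		sum += ((uint32_t)src[i] << 8) | src[i + 1];
	}
	if (i < len) {
		/* An odd trailing byte is treated as padded with zero. */
		dst[i] = src[i];
		sum += (uint32_t)src[i] << 8;
	}
	return sum;
}

/* Fold the 32-bit accumulator to 16 bits and take the complement. */
uint16_t sketch_csum_fold(uint32_t sum)
{
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)~sum;
}
```

Each appended chunk would pass the running sum back in, and the fold would
happen once when the datagram is finally pushed out.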

I don't see a reason to test for fragment length at all, then.

Bye,
Hannes
Tom Herbert Oct. 27, 2015, 4:41 p.m. UTC | #3
On Tue, Oct 27, 2015 at 9:34 AM, Hannes Frederic Sowa
<hannes@stressinduktion.org> wrote:
> On Tue, Oct 27, 2015, at 17:04, Tom Herbert wrote:
>> On Tue, Oct 27, 2015 at 8:02 AM, Hannes Frederic Sowa
>> <hannes@stressinduktion.org> wrote:
>> > We cannot reliably calculate packet size on MSG_MORE corked sockets
>> > and thus cannot decide if they are going to be fragmented later on,
>> > so better not use CHECKSUM_PARTIAL in the first place.
>> >
>> MSG_MORE should be independent of checksum offload. If the packet is
>> fragmented, the fix in ip_output will ensure that skb_checksum_help is
>> properly called.
>
> We will most likely fragment if MSG_MORE is set, because exceeding the
> link MTU is quite probable; see e.g. the NFS use case. Why
> not simply use the csum functions during copy-in in that case? It makes
> much more sense to me.
>
For datagram sockets MSG_MORE means that more datagrams will be sent,
it's not used to incrementally add data to a datagram already queued
(SEQPACKET with EOR is for that).
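
For comparison, send(2) documents that on Linux UDP sockets MSG_MORE
packages the data from successive calls into a single datagram, which is
transmitted once a call omits the flag. The sketch below is one way to
observe the behaviour over loopback. The function name
corked_datagram_len is hypothetical, and the sketch assumes a Linux host
with a usable loopback interface.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <unistd.h>

/*
 * Hypothetical probe: send two pieces over UDP, the first one with
 * MSG_MORE, and return the size of the first datagram received
 * (or -1 on error or timeout).
 */
long corked_datagram_len(void)
{
	struct sockaddr_in addr;
	socklen_t alen = sizeof(addr);
	struct timeval tv = { .tv_sec = 1, .tv_usec = 0 };
	char buf[64];
	long n = -1;
	int rx = socket(AF_INET, SOCK_DGRAM, 0);
	int tx = socket(AF_INET, SOCK_DGRAM, 0);

	if (rx < 0 || tx < 0)
		goto out;

	memset(&addr, 0, sizeof(addr));
	addr.sin_family = AF_INET;
	addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
	addr.sin_port = 0;	/* let the kernel pick a free port */

	if (bind(rx, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
	    getsockname(rx, (struct sockaddr *)&addr, &alen) < 0 ||
	    setsockopt(rx, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv)) < 0 ||
	    connect(tx, (struct sockaddr *)&addr, sizeof(addr)) < 0)
		goto out;

	/* The first call corks the socket, the second one flushes it. */
	if (send(tx, "hello ", 6, MSG_MORE) != 6 ||
	    send(tx, "world", 5, 0) != 5)
		goto out;

	n = (long)recv(rx, buf, sizeof(buf), 0);
out:
	if (rx >= 0)
		close(rx);
	if (tx >= 0)
		close(tx);
	return n;
}
```

If the kernel behaves as send(2) describes, the receiver sees one 11-byte
datagram rather than two separate ones.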

> I don't see a reason to test for fragment length at all, then.
>
> Bye,
> Hannes

Patch

diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
index 50e2973..0b02417 100644
--- a/net/ipv4/ip_output.c
+++ b/net/ipv4/ip_output.c
@@ -911,6 +911,7 @@  static int __ip_append_data(struct sock *sk,
 	if (transhdrlen &&
 	    length + fragheaderlen <= mtu &&
 	    rt->dst.dev->features & NETIF_F_V4_CSUM &&
+	    !(flags & MSG_MORE) &&
 	    !exthdrlen)
 		csummode = CHECKSUM_PARTIAL;
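
The patched condition can be modelled outside the kernel as a plain
predicate, which makes it easy to see which inputs still select
CHECKSUM_PARTIAL. This is only a sketch: model_use_csum_partial and
MODEL_NETIF_F_V4_CSUM are hypothetical stand-ins, and the real code reads
these values from the socket, route and device structures.

```c
#include <stdbool.h>
#include <stddef.h>
#include <sys/socket.h>

/* Stand-in bit for the device feature flag; the value is arbitrary. */
#define MODEL_NETIF_F_V4_CSUM 0x1u

/*
 * Mirrors the condition in the hunk above: offload is chosen only for
 * the first chunk (transhdrlen != 0) that fits within the MTU, on a
 * device with IPv4 checksum offload, without MSG_MORE and without
 * extension headers.
 */
bool model_use_csum_partial(size_t transhdrlen, size_t length,
			    size_t fragheaderlen, size_t mtu,
			    unsigned int dev_features, int flags,
			    size_t exthdrlen)
{
	return transhdrlen &&
	       length + fragheaderlen <= mtu &&
	       (dev_features & MODEL_NETIF_F_V4_CSUM) &&
	       !(flags & MSG_MORE) &&
	       !exthdrlen;
}
```

With the added test, a send that would otherwise qualify falls back to
software checksumming as soon as MSG_MORE is set, regardless of its size.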