
[RFC,13/14] sendmsg: don't restart mptcp_sendmsg_frag

Message ID 20191114173225.21199-14-fw@strlen.de
State Superseded, archived
Series [RFC] mptcp: wmem accounting and nonblocking io support

Commit Message

Florian Westphal Nov. 14, 2019, 5:32 p.m. UTC
This function calls do_tcp_sendpages which already has such a loop.

When tcp sendbuffer runs out of space and non-blocking io is used,
do_tcp_sendpages will return early because it can't sleep.
No -EAGAIN is returned, as some data was sent.

When mptcp_sendmsg_frag is called again, next call will either return
-EAGAIN immediately or it will only send a few more bytes.

Simplify this and leave all 'allocate another skb?' logic to tcp.

This would need to be spread over multiple changes; I propose that
I do the squash myself and send a pull request for the updated branch,
if that's fine with you.

Signed-off-by: Florian Westphal <fw@strlen.de>
---
 net/mptcp/protocol.c | 25 ++++++++-----------------
 1 file changed, 8 insertions(+), 17 deletions(-)
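
The behaviour the commit message relies on (a non-blocking send reports
the bytes it queued, and only a later call reports -EAGAIN) can be seen
from userspace. The following is a minimal sketch, not kernel code: it
assumes an AF_UNIX stream socketpair is an acceptable stand-in for a TCP
subflow, and fill_nonblocking() is a hypothetical helper name that does
not appear in the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

/*
 * Hypothetical helper (not part of the patch): keep sending on a stream
 * socket without sleeping until the kernel refuses to queue more data.
 * Returns the total number of bytes queued; *last_errno receives the
 * errno of the call that finally failed.
 */
static ssize_t fill_nonblocking(int fd, const char *buf, size_t len,
				int *last_errno)
{
	ssize_t total = 0;

	assert(len > 0);
	for (;;) {
		/* MSG_DONTWAIT: never sleep waiting for buffer space. */
		ssize_t n = send(fd, buf, len, MSG_DONTWAIT | MSG_NOSIGNAL);

		if (n < 0) {
			/* Nothing was queued by this call: report why. */
			*last_errno = errno;
			return total;
		}
		/* Possibly a short count; no error is reported for it. */
		total += n;
	}
}
```

With nobody reading from the peer, the helper is expected to return a
positive byte count and leave EAGAIN in *last_errno, which mirrors the
"some data was sent, so no -EAGAIN yet" case described above.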

Comments

Paolo Abeni Nov. 18, 2019, 12:11 p.m. UTC | #1
On Thu, 2019-11-14 at 18:32 +0100, Florian Westphal wrote:
> This function calls do_tcp_sendpages which already has such a loop.
> 
> When tcp sendbuffer runs out of space and non-blocking io is used,
> do_tcp_sendpages will return early because it can't sleep.
> No -EAGAIN is returned, as some data was sent.
> 
> When mptcp_sendmsg_frag is called again, next call will either return
> -EAGAIN immediately or it will only send a few more bytes.

If I understand correctly, the goal here is to set the appropriate
return value when the overall sendmsg() spooled a few bytes and then
would block, right? Currently we can erroneously return -EAGAIN instead
of the number of written bytes in some scenarios.

I think that there is a side effect with this change: before, a
blocking, successful write(<large val>) would always return <large
val>.

After this patch it will return min(<large val>, <max skb size>), as
mptcp_sendmsg_frag() is limited to a single skb.

Do we want to preserve the old behavior?

Cheers,

Paolo

Florian Westphal Nov. 18, 2019, 12:17 p.m. UTC | #2
Paolo Abeni <pabeni@redhat.com> wrote:
> On Thu, 2019-11-14 at 18:32 +0100, Florian Westphal wrote:
> > This function calls do_tcp_sendpages which already has such a loop.
> > 
> > When tcp sendbuffer runs out of space and non-blocking io is used,
> > do_tcp_sendpages will return early because it can't sleep.
> > No -EAGAIN is returned, as some data was sent.
> > 
> > When mptcp_sendmsg_frag is called again, next call will either return
> > -EAGAIN immediately or it will only send a few more bytes.
> 
> If I understand correctly, the goal here is to set the appropriate
> return value when the overall sendmsg() spooled a few bytes and then
> would block, right? Currently we can erroneously return -EAGAIN instead
> of the number of written bytes in some scenarios.

No, the goal is to remove useless code.  The other solution would be
to teach mptcp_sendmsg_frag to tell when it hit EAGAIN so that the
loop can exit earlier; for that it would need to be able to return
both errno and the number of queued bytes.  I didn't like adding
another int *err argument.
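
For illustration only, the rejected alternative could look roughly like
the userspace model below. Everything in it is an assumption made for
the sketch: struct fake_sndbuf, send_frag() and send_loop() are
hypothetical names, and the real mptcp_sendmsg_frag() takes different
arguments.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical model of a send buffer with a fixed amount of room. */
struct fake_sndbuf {
	size_t room;
};

/*
 * Hypothetical per-fragment send: queues at most frag_max bytes.
 * Returns the bytes queued and sets *err to -EAGAIN when it ran out
 * of room, so the caller learns both values from a single call.
 */
static size_t send_frag(struct fake_sndbuf *sb, size_t len,
			size_t frag_max, int *err)
{
	size_t n = len < frag_max ? len : frag_max;

	*err = 0;
	if (n > sb->room) {
		n = sb->room;
		*err = -EAGAIN;
	}
	sb->room -= n;
	return n;
}

/* Caller loop: stops as soon as a fragment reports an error. */
static long send_loop(struct fake_sndbuf *sb, size_t len, size_t frag_max)
{
	size_t copied = 0;
	int err = 0;

	assert(frag_max > 0);
	while (copied < len) {
		copied += send_frag(sb, len - copied, frag_max, &err);
		if (err)
			break;
	}
	/* Bytes already queued take precedence over the error. */
	return copied ? (long)copied : err;
}
```

In this model a loop that runs out of room partway returns the partial
count, and only a call that queues nothing returns -EAGAIN. The cost is
the extra out-parameter on every fragment call.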

> I think that there is a side effect with this change: before a
> blocking, successful, write(<large val>) would always return <large
> val>,

Yes.

> After this patch it will return min(<large val>, <max skb size>), as
> mptcp_sendmsg_frag() is limited to a single skb.
> 
> Do we want to preserve the old behavior?

TCP doesn't guarantee that it consumes the entire buffer either, so why
would mptcp try to?
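
For context, the usual userspace answer to short writes is a retry loop
in the application. The following is a minimal sketch of that pattern;
write_all() is a hypothetical helper name and is not part of this patch
or of any kernel API.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: call write() until the whole buffer has been
 * consumed. Returns the number of bytes written, or -1 on error with
 * errno set by write().
 */
static ssize_t write_all(int fd, const void *buf, size_t len)
{
	const char *p = buf;
	size_t done = 0;

	assert(buf != NULL || len == 0);
	while (done < len) {
		ssize_t n = write(fd, p + done, len - done);

		if (n < 0) {
			if (errno == EINTR)
				continue;	/* interrupted, retry */
			return -1;
		}
		if (n == 0)
			break;			/* no progress, give up */
		/* A short write is not an error: advance and retry. */
		done += (size_t)n;
	}
	return (ssize_t)done;
}
```

An application written this way sees no difference between a sendmsg()
that consumes the whole buffer and one that stops after a single skb.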

Paolo Abeni Nov. 18, 2019, 9:35 p.m. UTC | #3
On Mon, 2019-11-18 at 13:17 +0100, Florian Westphal wrote:
> Paolo Abeni <pabeni@redhat.com> wrote:
> > On Thu, 2019-11-14 at 18:32 +0100, Florian Westphal wrote:
> > > This function calls do_tcp_sendpages which already has such a loop.
> > > 
> > > When tcp sendbuffer runs out of space and non-blocking io is used,
> > > do_tcp_sendpages will return early because it can't sleep.
> > > No -EAGAIN is returned, as some data was sent.
> > > 
> > > When mptcp_sendmsg_frag is called again, next call will either return
> > > -EAGAIN immediately or it will only send a few more bytes.
> > 
> > If I understand correctly, the goal here is to set the appropriate
> > return value when the overall sendmsg() spooled a few bytes and then
> > would block, right? Currently we can erroneously return -EAGAIN instead
> > of the number of written bytes in some scenarios.
> 
> No, the goal is to remove useless code.  The other solution would be
> to teach mptcp_sendmsg_frag to tell when it hit EAGAIN so that the
> loop can exit earlier; for that it would need to be able to return
> both errno and the number of queued bytes.  I didn't like adding
> another int *err argument.

I see.

I have no objections to this change.

Cheers,

Paolo

p.s. we will likely have to touch sendmsg_frag() again to fix the issue
reported by Matt.

Florian Westphal Nov. 18, 2019, 9:44 p.m. UTC | #4
Paolo Abeni <pabeni@redhat.com> wrote:
> I have no objections to this change.
> 
> Cheers,
> 
> Paolo
> 
> p.s. we will likely have to touch sendmsg_frag() again to fix the issue
> reported by Matt.

Ok, I will submit a v2 *without* this change.  We
can then probably better figure out how to handle this.

(We can also keep requirements for future .sendpage in mind).

Patch

diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index 7d3bf189b407..fbbff667e07a 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -499,14 +499,10 @@  static int mptcp_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
 	pr_debug("conn_list->subflow=%p", ssk);
 
 	lock_sock(ssk);
-	while (msg_data_left(msg)) {
-		ret = mptcp_sendmsg_frag(sk, ssk, msg, NULL, &timeo, &mss_now,
-					 &size_goal);
-		if (ret < 0)
-			break;
-
-		copied += ret;
-	}
+	ret = mptcp_sendmsg_frag(sk, ssk, msg, NULL, &timeo, &mss_now,
+				 &size_goal);
+	if (ret > 0)
+		copied = ret;
 
 	mptcp_set_timeout(sk, ssk);
 	if (copied) {
@@ -789,7 +785,6 @@  static void mptcp_worker(struct work_struct *work)
 	struct sock *ssk, *sk;
 	struct mptcp_sock *msk;
 	u64 orig_write_seq;
-	size_t copied = 0;
 	struct msghdr msg;
 	long timeo = 0;
 
@@ -816,20 +811,16 @@  static void mptcp_worker(struct work_struct *work)
 	orig_len = dfrag->data_len;
 	orig_offset = dfrag->offset;
 	orig_write_seq = dfrag->data_seq;
-	while (dfrag->data_len > 0) {
-		ret = mptcp_sendmsg_frag(sk, ssk, &msg, dfrag, &timeo, &mss_now,
-					 &size_goal);
-		if (ret < 0)
-			break;
 
+	ret = mptcp_sendmsg_frag(sk, ssk, &msg, dfrag, &timeo, &mss_now,
+				 &size_goal);
+	if (ret > 0) {
 		MPTCP_INC_STATS(sock_net(sk), MPTCP_MIB_RETRANSSEGS);
-		copied += ret;
 		dfrag->data_len -= ret;
 		dfrag->offset += ret;
-	}
-	if (copied)
 		tcp_push(ssk, msg.msg_flags, mss_now, tcp_sk(ssk)->nonagle,
 			 size_goal);
+	}
 
 	dfrag->data_seq = orig_write_seq;
 	dfrag->offset = orig_offset;