diff mbox series

[v2,net-next,2/2] mptcp: move recbuf adjustment to recvmsg path

Message ID 20200525181508.13492-3-fw@strlen.de
State Changes Requested
Delegated to: David Miller
Headers show
Series mptcp: adjust tcp rcvspace on rx | expand

Commit Message

Florian Westphal May 25, 2020, 6:15 p.m. UTC
From: Paolo Abeni <pabeni@redhat.com>

Place receive window tuning in the recvmsg path.
This makes sure the size is only increased when userspace consumes data.

Previously we would grow the sk receive buffer towards tcp_rmem[2], now we
so only if userspace reads data.

Simply adjust the msk rcvbuf size to the largest receive buffer of any of
the existing subflows.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
---
 This patch is new in v2.

 net/mptcp/protocol.c | 32 ++++++++++++++++++++++++--------
 1 file changed, 24 insertions(+), 8 deletions(-)

Comments

Christoph Paasch May 26, 2020, 4:07 p.m. UTC | #1
Hello,

On Mon, May 25, 2020 at 3:19 PM Florian Westphal <fw@strlen.de> wrote:
>
> From: Paolo Abeni <pabeni@redhat.com>
>
> Place receive window tuning in the recvmsg path.
> This makes sure the size is only increased when userspace consumes data.
>
> Previously we would grow the sk receive buffer towards tcp_rmem[2], now we
> so only if userspace reads data.
>
> Simply adjust the msk rcvbuf size to the largest receive buffer of any of
> the existing subflows.
>
> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> Signed-off-by: Florian Westphal <fw@strlen.de>
> ---
>  This patch is new in v2.
>
>  net/mptcp/protocol.c | 32 ++++++++++++++++++++++++--------
>  1 file changed, 24 insertions(+), 8 deletions(-)
>
> diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
> index dbb86cbb9e77..89a35c3fc499 100644
> --- a/net/mptcp/protocol.c
> +++ b/net/mptcp/protocol.c
> @@ -190,13 +190,6 @@ static bool __mptcp_move_skbs_from_subflow(struct mptcp_sock *msk,
>                 return false;
>         }
>
> -       if (!(sk->sk_userlocks & SOCK_RCVBUF_LOCK)) {
> -               int rcvbuf = max(ssk->sk_rcvbuf, sk->sk_rcvbuf);
> -
> -               if (rcvbuf > sk->sk_rcvbuf)
> -                       sk->sk_rcvbuf = rcvbuf;
> -       }
> -
>         tp = tcp_sk(ssk);
>         do {
>                 u32 map_remaining, offset;
> @@ -933,6 +926,25 @@ static bool __mptcp_move_skbs(struct mptcp_sock *msk)
>         return moved > 0;
>  }
>
> +static void mptcp_rcv_space_adjust(struct mptcp_sock *msk)
> +{
> +       const struct mptcp_subflow_context *subflow;
> +       struct sock *sk = (struct sock *)msk;
> +       const struct sock *ssk;
> +       int rcvbuf = 0;
> +
> +       if (sk->sk_userlocks & SOCK_RCVBUF_LOCK)
> +               return;
> +
> +       mptcp_for_each_subflow(msk, subflow) {
> +               ssk = mptcp_subflow_tcp_sock(subflow);
> +               rcvbuf = max(ssk->sk_rcvbuf, rcvbuf);

tcp_rcv_space_adjust is called even when the app is not yet reading,
thus wouldn't this mean that we still end up with an ever-growing
window?
E.g., imagine an app that does not read at all at the beginning. The
call to tcp_rcv_space_adjust in patch 1/2 will make the subflow's
window grow. Now, the app comes and reads one byte. Then, the window
at MPTCP-level will jump regardless of how much the app actually read.

I think what is needed is to size the MPTCP-level window to 2 x the
amount of data read by the application within an RTT (maximum RTT
among all the active subflows). That way if an app reads 1 byte a
second, the window will remain low. While for a bulk-transfer it will
allow all subflows to receive at full speed [1].

Or do you think that kind of tuning can be done in a follow-up patch?


Christoph

[1]: https://www.usenix.org/system/files/conference/nsdi12/nsdi12-final125.pdf
-> Section 4.2
Florian Westphal May 27, 2020, 11:55 a.m. UTC | #2
Christoph Paasch <christoph.paasch@gmail.com> wrote:
> tcp_rcv_space_adjust is called even when the app is not yet reading,
> thus wouldn't this mean that we still end up with an ever-growing
> window?

Window is based on available mptcp sk recvbuf.  When data is moved from
ssk to the mptcp sk, the skb truesize is charged to the mptcp rmem.

> E.g., imagine an app that does not read at all at the beginning. The
> call to tcp_rcv_space_adjust in patch 1/2 will make the subflow's
> window grow. Now, the app comes and reads one byte. Then, the window
> at MPTCP-level will jump regardless of how much the app actually read.

Yes, the rcvbufsz value will jump, regardless homw much the app
actually read.

> I think what is needed is to size the MPTCP-level window to 2 x the
> amount of data read by the application within an RTT (maximum RTT
> among all the active subflows). That way if an app reads 1 byte a
> second, the window will remain low. While for a bulk-transfer it will
> allow all subflows to receive at full speed [1].

Sounds like the idea to move skbs to msk was bad one?
Sorry, I don't see how I can make this work.

Even deferring tcp_rcv_space_adjust() until recv() time won't work,
given data has been pulled to the mptcp socket already.

NOT calling tcp_rcv_space_adjust() at all might work, but that would
require something else entirely.  We would still have to adjust the
subflow rcvbufsz in this case, else we may announce a window that is
larger than the memory limit of the ssk (and we will end up dropping
data at tcp level if the worker can't move the skbs fast enough).

> Or do you think that kind of tuning can be done in a follow-up patch?

This sounds completely different so I don't think that makes sense.
diff mbox series

Patch

diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index dbb86cbb9e77..89a35c3fc499 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -190,13 +190,6 @@  static bool __mptcp_move_skbs_from_subflow(struct mptcp_sock *msk,
 		return false;
 	}
 
-	if (!(sk->sk_userlocks & SOCK_RCVBUF_LOCK)) {
-		int rcvbuf = max(ssk->sk_rcvbuf, sk->sk_rcvbuf);
-
-		if (rcvbuf > sk->sk_rcvbuf)
-			sk->sk_rcvbuf = rcvbuf;
-	}
-
 	tp = tcp_sk(ssk);
 	do {
 		u32 map_remaining, offset;
@@ -933,6 +926,25 @@  static bool __mptcp_move_skbs(struct mptcp_sock *msk)
 	return moved > 0;
 }
 
+static void mptcp_rcv_space_adjust(struct mptcp_sock *msk)
+{
+	const struct mptcp_subflow_context *subflow;
+	struct sock *sk = (struct sock *)msk;
+	const struct sock *ssk;
+	int rcvbuf = 0;
+
+	if (sk->sk_userlocks & SOCK_RCVBUF_LOCK)
+		return;
+
+	mptcp_for_each_subflow(msk, subflow) {
+		ssk = mptcp_subflow_tcp_sock(subflow);
+		rcvbuf = max(ssk->sk_rcvbuf, rcvbuf);
+	}
+
+	if (rcvbuf)
+		sk->sk_rcvbuf = rcvbuf;
+}
+
 static int mptcp_recvmsg(struct sock *sk, struct msghdr *msg, size_t len,
 			 int nonblock, int flags, int *addr_len)
 {
@@ -962,6 +974,8 @@  static int mptcp_recvmsg(struct sock *sk, struct msghdr *msg, size_t len,
 	target = sock_rcvlowat(sk, flags & MSG_WAITALL, len);
 	__mptcp_flush_join_list(msk);
 
+	mptcp_rcv_space_adjust(msk);
+
 	while (len > (size_t)copied) {
 		int bytes_read;
 
@@ -975,8 +989,10 @@  static int mptcp_recvmsg(struct sock *sk, struct msghdr *msg, size_t len,
 		copied += bytes_read;
 
 		if (skb_queue_empty(&sk->sk_receive_queue) &&
-		    __mptcp_move_skbs(msk))
+		    __mptcp_move_skbs(msk)) {
+			mptcp_rcv_space_adjust(msk);
 			continue;
+		}
 
 		/* only the master socket status is relevant here. The exit
 		 * conditions mirror closely tcp_recvmsg()