diff mbox series

[RFC,06/14] sendmsg: block until mptcp sk is writeable

Message ID 20191114173225.21199-7-fw@strlen.de
State Superseded, archived
Series [RFC] mptcp: wmem accounting and nonblocking io support

Commit Message

Florian Westphal Nov. 14, 2019, 5:32 p.m. UTC
This disables transmission of new data until the peer has acked
enough mptcp data to get below the wspace write threshold (that is, more
than half of the wspace upper limit is available again).

Also have poll not report EPOLLOUT in this case; it's not relevant whether a
subflow is writeable.

The latter is a temporary workaround that is needed because mptcp_poll
walks the subflows and calls __tcp_poll on each of them.
Because the subflow ssk is usually writeable, we have to undo that
if the mptcp sndbuf is exhausted.  This won't be needed anymore once
__tcp_poll is removed; I am working on this.

Signed-off-by: Florian Westphal <fw@strlen.de>
---
 net/mptcp/protocol.c | 18 ++++++++++++++++--
 1 file changed, 16 insertions(+), 2 deletions(-)
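
The write threshold described in the commit message can be illustrated with a
small userspace model. This is only a sketch: struct toy_sock and the toy_*
helpers are hypothetical names, and the "more than half of the upper limit is
free" rule is taken literally from the commit message wording. The kernel's own
helpers may use a different threshold.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for the few socket fields involved. */
struct toy_sock {
	int sndbuf;      /* upper limit for queued send data, in bytes */
	int wmem_queued; /* bytes queued and not yet acked by the peer */
};

/* Free send space; can go negative if the queue overshot the limit. */
static int toy_wspace(const struct toy_sock *sk)
{
	return sk->sndbuf - sk->wmem_queued;
}

/*
 * Assumption (from the commit message): the socket counts as writeable
 * again once more than half of the upper limit is available.
 */
static bool toy_is_writeable(const struct toy_sock *sk)
{
	return 2 * toy_wspace(sk) > sk->sndbuf;
}

/* The peer acked 'acked' bytes: release them from the send queue. */
static void toy_clean_una(struct toy_sock *sk, int acked)
{
	if (acked > sk->wmem_queued)
		acked = sk->wmem_queued;
	sk->wmem_queued -= acked;
}
```

With sndbuf = 1000 and a full queue, acking 400 bytes is not enough in this
model (400 bytes free), while acking a further 200 bytes makes the socket
writeable again (600 bytes free).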

Comments

Paolo Abeni Nov. 18, 2019, 11:29 a.m. UTC | #1
On Thu, 2019-11-14 at 18:32 +0100, Florian Westphal wrote:
> diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
> index 2144e80b8704..83be407e1dd6 100644
> --- a/net/mptcp/protocol.c
> +++ b/net/mptcp/protocol.c
> @@ -406,6 +406,18 @@ static int mptcp_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
>  		return ret;
>  	}
>  
> +	timeo = sock_sndtimeo(sk, msg->msg_flags & MSG_DONTWAIT);
> +
> +	mptcp_clean_una(sk);
> +
> +	while (!sk_stream_memory_free(sk)) {
> +		ret = sk_stream_wait_memory(sk, &timeo);
> +		if (ret)
> +			goto out;
> +
> +		mptcp_clean_una(sk);
> +	}
> +

Can we move the above loop to the non-fallback case only, e.g. after
the !mptcp_subflow_get(msk) check below?

If so, we could have a single loop checking for:

!sk_stream_memory_free(sk) || !mptcp_subflow_get_send()

(together with the next patch)

Cheers,

Paolo
Paolo Abeni Nov. 18, 2019, 11:33 a.m. UTC | #2
On Mon, 2019-11-18 at 12:29 +0100, Paolo Abeni wrote:
> Can we move the above loop to the non fallback case only ? e.g. after
> the below !mptcp_subflow_get(msk) checks?
> 
> If so, we could have a single loop checking for:
> 
> !sk_stream_memory_free(sk) || !mptcp_subflow_get_send()
> 
> (together with the next patch)

Dumb me! I meant "with patch 9/14" - where a similar loop is added.

/P
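
The single loop suggested above could look roughly like the following
userspace sketch. Everything here is a hypothetical model: struct toy_msk, the
toy_* helpers, the wait callbacks and TOY_EAGAIN are made up for illustration,
and the "memory free" rule (queued bytes below the send buffer limit) is an
assumption about the real helper. It only shows the control flow of waiting on
both conditions at once.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_EAGAIN 11 /* stand-in error code for "would block" */

/* Hypothetical model of the mptcp socket state the loop looks at. */
struct toy_msk {
	int sndbuf;          /* send buffer limit, in bytes */
	int wmem_queued;     /* bytes queued and not yet acked */
	int usable_subflows; /* subflows currently able to send */
};

/* Assumed rule: memory is free while the queue is below the limit. */
static bool toy_memory_free(const struct toy_msk *m)
{
	return m->wmem_queued < m->sndbuf;
}

static bool toy_subflow_get_send(const struct toy_msk *m)
{
	return m->usable_subflows > 0;
}

/* Wait callback: returns 0 after some event, nonzero on error/timeout. */
typedef int (*toy_wait_fn)(struct toy_msk *m);

/* One loop covering both conditions, as proposed in the review. */
static int toy_wait_for_send(struct toy_msk *m, toy_wait_fn wait)
{
	while (!toy_memory_free(m) || !toy_subflow_get_send(m)) {
		int ret = wait(m);

		if (ret)
			return ret;
	}
	return 0;
}

/* Example event source: acks free up memory first, then a subflow appears. */
static int toy_wait_progress(struct toy_msk *m)
{
	if (!toy_memory_free(m))
		m->wmem_queued = m->wmem_queued > 512 ? m->wmem_queued - 512 : 0;
	else
		m->usable_subflows++;
	return 0;
}

/* Example event source: the wait gives up immediately (nonblocking case). */
static int toy_wait_timeout(struct toy_msk *m)
{
	(void)m;
	return -TOY_EAGAIN;
}
```

Calling toy_wait_for_send() with toy_wait_progress on a full queue with no
usable subflow returns 0 after two wakeups; with toy_wait_timeout it returns
-TOY_EAGAIN without changing the state.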
Florian Westphal Nov. 18, 2019, 12:11 p.m. UTC | #3
Paolo Abeni <pabeni@redhat.com> wrote:
> Can we move the above loop to the non fallback case only ? e.g. after
> the below !mptcp_subflow_get(msk) checks?
> 
> If so, we could have a single loop checking for:
> 
> !sk_stream_memory_free(sk) || !mptcp_subflow_get_send()
> 
> (together with the next patch)

It would be easy to do if I remove

	if (!msg_data_left(msg)) {
		pr_debug("empty send");
		ret = sock_sendmsg(ssk->sk_socket, msg);

Any idea why this is there in the first place?
Paolo Abeni Nov. 18, 2019, 12:19 p.m. UTC | #4
On Mon, 2019-11-18 at 13:11 +0100, Florian Westphal wrote:
> It would be easy to do if I remove
> 
>        if (!msg_data_left(msg)) {
> 	       pr_debug("empty send");
> 	       ret = sock_sendmsg(ssk->sk_socket, msg);
> 
> any idea why this is there in the first place?

Hmm... that looks like a leftover from the initial implementation,
possibly trying to deal with fastopen?

I think it can be dropped. @Peter / @Mat do you know better?

/P
Mat Martineau Nov. 18, 2019, 6:39 p.m. UTC | #5
On Mon, 18 Nov 2019, Paolo Abeni wrote:

> On Mon, 2019-11-18 at 13:11 +0100, Florian Westphal wrote:
>> It would be easy to do if I remove
>>
>>        if (!msg_data_left(msg)) {
>> 	       pr_debug("empty send");
>> 	       ret = sock_sendmsg(ssk->sk_socket, msg);
>>
>> any idea why this is there in the first place?
>
> Uhmm... that looks like a left-over from the initial implementation ?!?
> possibly trying to deal with fastopen?!?
>
> I think it can be dropped. @Peter / @Mat do you know better?
>

Paolo's right, that's a leftover. You're welcome to drop it.

--
Mat Martineau
Intel

Patch

diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index 2144e80b8704..83be407e1dd6 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -406,6 +406,18 @@  static int mptcp_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
 		return ret;
 	}
 
+	timeo = sock_sndtimeo(sk, msg->msg_flags & MSG_DONTWAIT);
+
+	mptcp_clean_una(sk);
+
+	while (!sk_stream_memory_free(sk)) {
+		ret = sk_stream_wait_memory(sk, &timeo);
+		if (ret)
+			goto out;
+
+		mptcp_clean_una(sk);
+	}
+
 	ssk = mptcp_subflow_get(msk);
 	if (!ssk) {
 		release_sock(sk);
@@ -421,8 +433,6 @@  static int mptcp_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
 	pr_debug("conn_list->subflow=%p", ssk);
 
 	lock_sock(ssk);
-	mptcp_clean_una(sk);
-	timeo = sock_sndtimeo(sk, msg->msg_flags & MSG_DONTWAIT);
 	while (msg_data_left(msg)) {
 		ret = mptcp_sendmsg_frag(sk, ssk, msg, NULL, &timeo, &mss_now,
 					 &size_goal);
@@ -1312,6 +1322,10 @@  static __poll_t mptcp_poll(struct file *file, struct socket *sock,
 		tcp_sock = mptcp_subflow_tcp_socket(subflow);
 		ret |= __tcp_poll(tcp_sock->sk);
 	}
+
+	if (!sk_stream_is_writeable(sk))
+		ret &= ~(EPOLLOUT|EPOLLWRNORM);
+
 	release_sock(sk);
 
 	return ret;
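
The effect of the mptcp_poll hunk can be sketched in plain C as follows. This
is a userspace illustration only: toy_mptcp_poll and the TOY_EPOLL* constants
are hypothetical stand-ins, not the kernel definitions. It shows the subflow
masks being OR-ed together and the write bits being cleared when the mptcp
socket itself is not writeable.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in event bits for this sketch only. */
#define TOY_EPOLLIN     0x001u
#define TOY_EPOLLOUT    0x004u
#define TOY_EPOLLWRNORM 0x100u

/*
 * Combine the per-subflow poll masks, then drop the write events if the
 * mptcp-level send buffer is exhausted, regardless of what the subflows say.
 */
static unsigned int toy_mptcp_poll(const unsigned int *subflow_masks,
				   size_t n, bool msk_writeable)
{
	unsigned int ret = 0;
	size_t i;

	for (i = 0; i < n; i++)
		ret |= subflow_masks[i];

	if (!msk_writeable)
		ret &= ~(TOY_EPOLLOUT | TOY_EPOLLWRNORM);

	return ret;
}
```

With two subflows that both report themselves writeable, the result still
carries no write event while the mptcp socket is not writeable; read events
pass through unchanged.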