diff mbox series

[3/4] protocol: use sk_wait_event helper

Message ID 20191105170531.30407-4-fw@strlen.de
State Accepted, archived
Delegated to: Matthieu Baerts
Headers show
Series [1/4] protocol.h: mptcp_subflow_get_map_offset arg can be const | expand

Commit Message

Florian Westphal Nov. 5, 2019, 5:05 p.m. UTC
Avoids a bit of copy&paste code.
squashto:  mptcp: recvmsg() can drain data from multiple subflows

Signed-off-by: Florian Westphal <fw@strlen.de>
---
 net/mptcp/protocol.c | 14 ++------------
 1 file changed, 2 insertions(+), 12 deletions(-)

Comments

Paolo Abeni Nov. 5, 2019, 5:43 p.m. UTC | #1
On Tue, 2019-11-05 at 18:05 +0100, Florian Westphal wrote:
> Avoids a bit of copy&paste code.
> squashto:  mptcp: recvmsg() can drain data from multiple subflows
> 
> Signed-off-by: Florian Westphal <fw@strlen.de>
> ---
>  net/mptcp/protocol.c | 14 ++------------
>  1 file changed, 2 insertions(+), 12 deletions(-)
> 
> diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
> index 3af5f121af9e..8cc9cc5675c2 100644
> --- a/net/mptcp/protocol.c
> +++ b/net/mptcp/protocol.c
> @@ -443,22 +443,12 @@ static void mptcp_wait_data(struct sock *sk, long *timeo)
>  {
>  	DEFINE_WAIT_FUNC(wait, woken_wake_function);
>  	struct mptcp_sock *msk = mptcp_sk(sk);
> -	int data_ready;
>  
>  	add_wait_queue(sk_sleep(sk), &wait);
>  	sk_set_bit(SOCKWQ_ASYNC_WAITDATA, sk);
>  
> -	release_sock(sk);
> -
> -	smp_mb__before_atomic();
> -	data_ready = test_and_clear_bit(MPTCP_DATA_READY, &msk->flags);
> -	smp_mb__after_atomic();
> -
> -	if (!data_ready)
> -		*timeo = wait_woken(&wait, TASK_INTERRUPTIBLE, *timeo);
> -
> -	sched_annotate_sleep();
> -	lock_sock(sk);
> +	sk_wait_event(sk, timeo,
> +		      test_and_clear_bit(MPTCP_DATA_READY, &msk->flags), &wait);
>  
>  	sk_clear_bit(SOCKWQ_ASYNC_WAITDATA, sk);
>  	remove_wait_queue(sk_sleep(sk), &wait);

IIRC, sk_wait_event() was intentionally avoided to allow placing the
smp_mb__{before,after}_atomic() pair around test_and_clear_bit(), which
I thought was required.

Reading again the relevant documentation:

https://elixir.bootlin.com/linux/latest/source/Documentation/core-api/atomic_ops.rst#L485

it looks like test_and_clear_bit() provides already the required
barriers (also grepping inside the kernel sources suggests that).

So the above should be correct, I'm sorry for being cyclothymic ;)

/P
Mat Martineau Nov. 5, 2019, 7:53 p.m. UTC | #2
On Tue, 5 Nov 2019, Paolo Abeni wrote:

> On Tue, 2019-11-05 at 18:05 +0100, Florian Westphal wrote:
>> Avoids a bit of copy&paste code.
>> squashto:  mptcp: recvmsg() can drain data from multiple subflows
>>
>> Signed-off-by: Florian Westphal <fw@strlen.de>
>> ---
>>  net/mptcp/protocol.c | 14 ++------------
>>  1 file changed, 2 insertions(+), 12 deletions(-)
>>
>> diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
>> index 3af5f121af9e..8cc9cc5675c2 100644
>> --- a/net/mptcp/protocol.c
>> +++ b/net/mptcp/protocol.c
>> @@ -443,22 +443,12 @@ static void mptcp_wait_data(struct sock *sk, long *timeo)
>>  {
>>  	DEFINE_WAIT_FUNC(wait, woken_wake_function);
>>  	struct mptcp_sock *msk = mptcp_sk(sk);
>> -	int data_ready;
>>
>>  	add_wait_queue(sk_sleep(sk), &wait);
>>  	sk_set_bit(SOCKWQ_ASYNC_WAITDATA, sk);
>>
>> -	release_sock(sk);
>> -
>> -	smp_mb__before_atomic();
>> -	data_ready = test_and_clear_bit(MPTCP_DATA_READY, &msk->flags);
>> -	smp_mb__after_atomic();
>> -
>> -	if (!data_ready)
>> -		*timeo = wait_woken(&wait, TASK_INTERRUPTIBLE, *timeo);
>> -
>> -	sched_annotate_sleep();
>> -	lock_sock(sk);
>> +	sk_wait_event(sk, timeo,
>> +		      test_and_clear_bit(MPTCP_DATA_READY, &msk->flags), &wait);
>>
>>  	sk_clear_bit(SOCKWQ_ASYNC_WAITDATA, sk);
>>  	remove_wait_queue(sk_sleep(sk), &wait);
>
> IIRC, sk_wait_event() was intentionally avoided to allow placing the
> smp_mb__{before,after}_atomic() pair around test_and_clear_bit(), which
> I thought was required.
>
> Reading again the relevant documentation:
>
> https://elixir.bootlin.com/linux/latest/source/Documentation/core-api/atomic_ops.rst#L485
>
> it looks like test_and_clear_bit() provides already the required
> barriers (also grepping inside the kernel sources suggests that).
>
> So the above should be correct, I'm sorry for being cyclothymic ;)

I agree that the barriers can be removed.

sk_wait_event is a macro that evaluates its third arg twice (but we've 
discussed this before: 
https://lists.01.org/hyperkitty/list/mptcp@lists.01.org/thread/7HAWM43BLQVRBNIPV5II7C2WGTARAMDR/#7S54V7TAZUXZEUXVANQZ3OETQMPQGK6J), 
but in this case the second check of the flag is likely to be an 
inexpensive READ_ONCE() on a hot cache line. So, the extra code was there 
a reason but I'm ok with merging this for maintainability purposes.


--
Mat Martineau
Intel
Florian Westphal Nov. 6, 2019, 3:09 p.m. UTC | #3
Mat Martineau <mathew.j.martineau@linux.intel.com> wrote:
> sk_wait_event is a macro that evaluates its third arg twice (but we've
> discussed this before: https://lists.01.org/hyperkitty/list/mptcp@lists.01.org/thread/7HAWM43BLQVRBNIPV5II7C2WGTARAMDR/#7S54V7TAZUXZEUXVANQZ3OETQMPQGK6J),
> but in this case the second check of the flag is likely to be an inexpensive
> READ_ONCE() on a hot cache line. So, the extra code was there a reason but
> I'm ok with merging this for maintainability purposes.

We can also drop it.  Its likely this needs to be rewritten again in
the future.

I'd like to get rid of __tcp_poll() and make mptcp_poll standalone.

For that to work reliably its needed that the mptcp sk has a reliable
indicator that there is data available.

Right now this isn't the case because the bit is cleared immediately
on mptcp_recvmsg entry.

Removing the clear_bit could get us in a state where the bit is still
set, but the last recvmsg did drain the last byte.

Checking the ssk read queue and clearing the bit on revmsg exit doesn't
work either because we could have data available in another subflow,
but walking the subflow list isn't exactly ideal.
Matthieu Baerts Nov. 6, 2019, 3:58 p.m. UTC | #4
Hi Florian, Mat, Paolo,

On 06/11/2019 16:09, Florian Westphal wrote:
> Mat Martineau <mathew.j.martineau@linux.intel.com> wrote:
>> sk_wait_event is a macro that evaluates its third arg twice (but we've
>> discussed this before: https://lists.01.org/hyperkitty/list/mptcp@lists.01.org/thread/7HAWM43BLQVRBNIPV5II7C2WGTARAMDR/#7S54V7TAZUXZEUXVANQZ3OETQMPQGK6J),
>> but in this case the second check of the flag is likely to be an inexpensive
>> READ_ONCE() on a hot cache line. So, the extra code was there a reason but
>> I'm ok with merging this for maintainability purposes.
> 
> We can also drop it.  Its likely this needs to be rewritten again in
> the future.
> 
> I'd like to get rid of __tcp_poll() and make mptcp_poll standalone.
> 
> For that to work reliably its needed that the mptcp sk has a reliable
> indicator that there is data available.
> 
> Right now this isn't the case because the bit is cleared immediately
> on mptcp_recvmsg entry.
> 
> Removing the clear_bit could get us in a state where the bit is still
> set, but the last recvmsg did drain the last byte.
> 
> Checking the ssk read queue and clearing the bit on revmsg exit doesn't
> work either because we could have data available in another subflow,
> but walking the subflow list isn't exactly ideal.

Thank you for the patches and the reviews.
Is it then OK if I apply this series as it is?

Cheers,
Matt
Florian Westphal Nov. 6, 2019, 11:14 p.m. UTC | #5
Matthieu Baerts <matthieu.baerts@tessares.net> wrote:
> Thank you for the patches and the reviews.
> Is it then OK if I apply this series as it is?

I think so.
Paolo Abeni Nov. 7, 2019, 2:21 p.m. UTC | #6
On Wed, 2019-11-06 at 16:58 +0100, Matthieu Baerts wrote:
> Thank you for the patches and the reviews.
> Is it then OK if I apply this series as it is?

yes, I agree with that.

Cheers,

Paolo
diff mbox series

Patch

diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index 3af5f121af9e..8cc9cc5675c2 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -443,22 +443,12 @@  static void mptcp_wait_data(struct sock *sk, long *timeo)
 {
 	DEFINE_WAIT_FUNC(wait, woken_wake_function);
 	struct mptcp_sock *msk = mptcp_sk(sk);
-	int data_ready;
 
 	add_wait_queue(sk_sleep(sk), &wait);
 	sk_set_bit(SOCKWQ_ASYNC_WAITDATA, sk);
 
-	release_sock(sk);
-
-	smp_mb__before_atomic();
-	data_ready = test_and_clear_bit(MPTCP_DATA_READY, &msk->flags);
-	smp_mb__after_atomic();
-
-	if (!data_ready)
-		*timeo = wait_woken(&wait, TASK_INTERRUPTIBLE, *timeo);
-
-	sched_annotate_sleep();
-	lock_sock(sk);
+	sk_wait_event(sk, timeo,
+		      test_and_clear_bit(MPTCP_DATA_READY, &msk->flags), &wait);
 
 	sk_clear_bit(SOCKWQ_ASYNC_WAITDATA, sk);
 	remove_wait_queue(sk_sleep(sk), &wait);