Message ID | 20191105170531.30407-4-fw@strlen.de |
---|---|
State | Accepted, archived |
Delegated to: | Matthieu Baerts |
Headers | show |
Series | [1/4] protocol.h: mptcp_subflow_get_map_offset arg can be const | expand |
On Tue, 2019-11-05 at 18:05 +0100, Florian Westphal wrote: > Avoids a bit of copy&paste code. > squashto: mptcp: recvmsg() can drain data from multiple subflows > > Signed-off-by: Florian Westphal <fw@strlen.de> > --- > net/mptcp/protocol.c | 14 ++------------ > 1 file changed, 2 insertions(+), 12 deletions(-) > > diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c > index 3af5f121af9e..8cc9cc5675c2 100644 > --- a/net/mptcp/protocol.c > +++ b/net/mptcp/protocol.c > @@ -443,22 +443,12 @@ static void mptcp_wait_data(struct sock *sk, long *timeo) > { > DEFINE_WAIT_FUNC(wait, woken_wake_function); > struct mptcp_sock *msk = mptcp_sk(sk); > - int data_ready; > > add_wait_queue(sk_sleep(sk), &wait); > sk_set_bit(SOCKWQ_ASYNC_WAITDATA, sk); > > - release_sock(sk); > - > - smp_mb__before_atomic(); > - data_ready = test_and_clear_bit(MPTCP_DATA_READY, &msk->flags); > - smp_mb__after_atomic(); > - > - if (!data_ready) > - *timeo = wait_woken(&wait, TASK_INTERRUPTIBLE, *timeo); > - > - sched_annotate_sleep(); > - lock_sock(sk); > + sk_wait_event(sk, timeo, > + test_and_clear_bit(MPTCP_DATA_READY, &msk->flags), &wait); > > sk_clear_bit(SOCKWQ_ASYNC_WAITDATA, sk); > remove_wait_queue(sk_sleep(sk), &wait); IIRC, sk_wait_event() was intentionally avoided to allow placing the smp_mb__{before,after}_atomic() pair around test_and_clear_bit(), which I thought was required. Reading again the relevant documentation: https://elixir.bootlin.com/linux/latest/source/Documentation/core-api/atomic_ops.rst#L485 it looks like test_and_clear_bit() provides already the required barriers (also grepping inside the kernel sources suggests that). So the above should be correct, I'm sorry for being cyclothymic ;) /P
On Tue, 5 Nov 2019, Paolo Abeni wrote: > On Tue, 2019-11-05 at 18:05 +0100, Florian Westphal wrote: >> Avoids a bit of copy&paste code. >> squashto: mptcp: recvmsg() can drain data from multiple subflows >> >> Signed-off-by: Florian Westphal <fw@strlen.de> >> --- >> net/mptcp/protocol.c | 14 ++------------ >> 1 file changed, 2 insertions(+), 12 deletions(-) >> >> diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c >> index 3af5f121af9e..8cc9cc5675c2 100644 >> --- a/net/mptcp/protocol.c >> +++ b/net/mptcp/protocol.c >> @@ -443,22 +443,12 @@ static void mptcp_wait_data(struct sock *sk, long *timeo) >> { >> DEFINE_WAIT_FUNC(wait, woken_wake_function); >> struct mptcp_sock *msk = mptcp_sk(sk); >> - int data_ready; >> >> add_wait_queue(sk_sleep(sk), &wait); >> sk_set_bit(SOCKWQ_ASYNC_WAITDATA, sk); >> >> - release_sock(sk); >> - >> - smp_mb__before_atomic(); >> - data_ready = test_and_clear_bit(MPTCP_DATA_READY, &msk->flags); >> - smp_mb__after_atomic(); >> - >> - if (!data_ready) >> - *timeo = wait_woken(&wait, TASK_INTERRUPTIBLE, *timeo); >> - >> - sched_annotate_sleep(); >> - lock_sock(sk); >> + sk_wait_event(sk, timeo, >> + test_and_clear_bit(MPTCP_DATA_READY, &msk->flags), &wait); >> >> sk_clear_bit(SOCKWQ_ASYNC_WAITDATA, sk); >> remove_wait_queue(sk_sleep(sk), &wait); > > IIRC, sk_wait_event() was intentionally avoided to allow placing the > smp_mb__{before,after}_atomic() pair around test_and_clear_bit(), which > I thought was required. > > Reading again the relevant documentation: > > https://elixir.bootlin.com/linux/latest/source/Documentation/core-api/atomic_ops.rst#L485 > > it looks like test_and_clear_bit() provides already the required > barriers (also grepping inside the kernel sources suggests that). > > So the above should be correct, I'm sorry for being cyclothymic ;) I agree that the barriers can be removed. sk_wait_event is a macro that evaluates its third arg twice (but we've discussed this before: https://lists.01.org/hyperkitty/list/mptcp@lists.01.org/thread/7HAWM43BLQVRBNIPV5II7C2WGTARAMDR/#7S54V7TAZUXZEUXVANQZ3OETQMPQGK6J), but in this case the second check of the flag is likely to be an inexpensive READ_ONCE() on a hot cache line. So, the extra code was there a reason but I'm ok with merging this for maintainability purposes. -- Mat Martineau Intel
Mat Martineau <mathew.j.martineau@linux.intel.com> wrote: > sk_wait_event is a macro that evaluates its third arg twice (but we've > discussed this before: https://lists.01.org/hyperkitty/list/mptcp@lists.01.org/thread/7HAWM43BLQVRBNIPV5II7C2WGTARAMDR/#7S54V7TAZUXZEUXVANQZ3OETQMPQGK6J), > but in this case the second check of the flag is likely to be an inexpensive > READ_ONCE() on a hot cache line. So, the extra code was there a reason but > I'm ok with merging this for maintainability purposes. We can also drop it. Its likely this needs to be rewritten again in the future. I'd like to get rid of __tcp_poll() and make mptcp_poll standalone. For that to work reliably its needed that the mptcp sk has a reliable indicator that there is data available. Right now this isn't the case because the bit is cleared immediately on mptcp_recvmsg entry. Removing the clear_bit could get us in a state where the bit is still set, but the last recvmsg did drain the last byte. Checking the ssk read queue and clearing the bit on revmsg exit doesn't work either because we could have data available in another subflow, but walking the subflow list isn't exactly ideal.
Hi Florian, Mat, Paolo, On 06/11/2019 16:09, Florian Westphal wrote: > Mat Martineau <mathew.j.martineau@linux.intel.com> wrote: >> sk_wait_event is a macro that evaluates its third arg twice (but we've >> discussed this before: https://lists.01.org/hyperkitty/list/mptcp@lists.01.org/thread/7HAWM43BLQVRBNIPV5II7C2WGTARAMDR/#7S54V7TAZUXZEUXVANQZ3OETQMPQGK6J), >> but in this case the second check of the flag is likely to be an inexpensive >> READ_ONCE() on a hot cache line. So, the extra code was there a reason but >> I'm ok with merging this for maintainability purposes. > > We can also drop it. Its likely this needs to be rewritten again in > the future. > > I'd like to get rid of __tcp_poll() and make mptcp_poll standalone. > > For that to work reliably its needed that the mptcp sk has a reliable > indicator that there is data available. > > Right now this isn't the case because the bit is cleared immediately > on mptcp_recvmsg entry. > > Removing the clear_bit could get us in a state where the bit is still > set, but the last recvmsg did drain the last byte. > > Checking the ssk read queue and clearing the bit on revmsg exit doesn't > work either because we could have data available in another subflow, > but walking the subflow list isn't exactly ideal. Thank you for the patches and the reviews. Is it then OK if I apply this series as it is? Cheers, Matt
Matthieu Baerts <matthieu.baerts@tessares.net> wrote: > Thank you for the patches and the reviews. > Is it then OK if I apply this series as it is? I think so.
On Wed, 2019-11-06 at 16:58 +0100, Matthieu Baerts wrote: > Thank you for the patches and the reviews. > Is it then OK if I apply this series as it is? yes, I agree with that. Cheers, Paolo
diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c index 3af5f121af9e..8cc9cc5675c2 100644 --- a/net/mptcp/protocol.c +++ b/net/mptcp/protocol.c @@ -443,22 +443,12 @@ static void mptcp_wait_data(struct sock *sk, long *timeo) { DEFINE_WAIT_FUNC(wait, woken_wake_function); struct mptcp_sock *msk = mptcp_sk(sk); - int data_ready; add_wait_queue(sk_sleep(sk), &wait); sk_set_bit(SOCKWQ_ASYNC_WAITDATA, sk); - release_sock(sk); - - smp_mb__before_atomic(); - data_ready = test_and_clear_bit(MPTCP_DATA_READY, &msk->flags); - smp_mb__after_atomic(); - - if (!data_ready) - *timeo = wait_woken(&wait, TASK_INTERRUPTIBLE, *timeo); - - sched_annotate_sleep(); - lock_sock(sk); + sk_wait_event(sk, timeo, + test_and_clear_bit(MPTCP_DATA_READY, &msk->flags), &wait); sk_clear_bit(SOCKWQ_ASYNC_WAITDATA, sk); remove_wait_queue(sk_sleep(sk), &wait);
Avoids a bit of copy&paste code. squashto: mptcp: recvmsg() can drain data from multiple subflows Signed-off-by: Florian Westphal <fw@strlen.de> --- net/mptcp/protocol.c | 14 ++------------ 1 file changed, 2 insertions(+), 12 deletions(-)