diff mbox series

[1/3] mptcp: schedule worker when subflow is closed

Message ID 20210202174754.6861-2-fw@strlen.de
State Accepted, archived
Commit 95fc52697bf7b16a35103c52ecd72883dcc81f24
Delegated to: Matthieu Baerts
Headers show
Series mptcp: netlink event fixes | expand

Commit Message

Florian Westphal Feb. 2, 2021, 5:47 p.m. UTC
When the remote side closes a subflow, we should schedule the worker to
dispose of the subflow in a timely manner.

Otherwise, the SF_CLOSED event won't be generated until the mptcp
socket itself is closing or the local side is closing another subflow.

Signed-off-by: Florian Westphal <fw@strlen.de>
---
 net/mptcp/subflow.c | 7 +++++++
 1 file changed, 7 insertions(+)

Comments

Paolo Abeni Feb. 8, 2021, 11:33 p.m. UTC | #1
Hello,

On Tue, 2021-02-02 at 18:47 +0100, Florian Westphal wrote:
> When the remote side closes a subflow, we should schedule the worker to
> dispose of the subflow in a timely manner.
> 
> Otherwise, the SF_CLOSED event won't be generated until the mptcp
> socket itself is closing or the local side is closing another subflow.
> 
> Signed-off-by: Florian Westphal <fw@strlen.de>
> ---
>  net/mptcp/subflow.c | 7 +++++++
>  1 file changed, 7 insertions(+)
> 
> diff --git a/net/mptcp/subflow.c b/net/mptcp/subflow.c
> index b8b670cb6116..57f6361eb07c 100644
> --- a/net/mptcp/subflow.c
> +++ b/net/mptcp/subflow.c
> @@ -1413,6 +1413,13 @@ static void subflow_state_change(struct sock *sk)
>  	if (mptcp_subflow_data_available(sk))
>  		mptcp_data_ready(parent, sk);
>  
> +	if (sk->sk_state == TCP_CLOSE &&
> +	    !test_and_set_bit(MPTCP_WORK_CLOSE_SUBFLOW, &mptcp_sk(parent)->flags)) {
> +		sock_hold(parent);
> +		if (!schedule_work(&mptcp_sk(parent)->work))
> +			sock_put(parent);
> +	}
> +

Looks like this is causing issue/154. Reverting this patch, the issue
goes away here. @Matttbe: could you please double check that?

In case of fallback, the subflow can be closed by the peer while there
are data pending in the subflow receive queue. Calling
mptcp_close_subflow() will discard such data and will corrupt the
stream.

I think that scenario is possible even with full msk, if the peer
closes a single subflow while the msk is still alive (before DATA_FIN).

A possible fix could be additionally checking for an empty
ssk->sk_receive_queue, plus adding a similar chunk in
__mptcp_move_skbs_from_subflow().

That means that the subflow_close event could be delayed by user-space,
if the latter does not spool the pending ingress data.

Or the NL event generation could be de-coupled from the actual subflow
close(), but I fear some side effects even there.

Note that calling __mptcp_move_skbs_from_subflow()
from mptcp_close_subflow() will not solve the issue, as the msk receive
queue could be full - I think;)

Side note: it would be nice to use the deferred actions to do the
netlink call, to avoid the workqueue usage. AFAICS this is one of the
few workqueue usages that can be triggered on a server by any client,
regardless of the server configuration (the others being
DATA_FIN/DATA_FIN-ack processing).

Cheers,

Paolo
Florian Westphal Feb. 8, 2021, 11:47 p.m. UTC | #2
Paolo Abeni <pabeni@redhat.com> wrote:
> > diff --git a/net/mptcp/subflow.c b/net/mptcp/subflow.c
> > index b8b670cb6116..57f6361eb07c 100644
> > --- a/net/mptcp/subflow.c
> > +++ b/net/mptcp/subflow.c
> > @@ -1413,6 +1413,13 @@ static void subflow_state_change(struct sock *sk)
> >  	if (mptcp_subflow_data_available(sk))
> >  		mptcp_data_ready(parent, sk);
> >  
> > +	if (sk->sk_state == TCP_CLOSE &&
> > +	    !test_and_set_bit(MPTCP_WORK_CLOSE_SUBFLOW, &mptcp_sk(parent)->flags)) {
> > +		sock_hold(parent);
> > +		if (!schedule_work(&mptcp_sk(parent)->work))
> > +			sock_put(parent);
> > +	}
> > +
> 
> Looks like this is causing issue/154. Reverting this patch, the issue
> goes away here. @Matttbe: could you please double check that?

No need, I am sure you are correct.

> In case of fallback, the subflow can be closed by the peer while there
> are data pending in the subflow receive queue. Calling
> mptcp_close_subflow() will discard such data and will corrupt the
> stream.

This.

> I think that scenario is possible even with full msk, if the peer
> closes a single subflow while the msk is still alive (before DATA_FIN).

You mean because there might be unprocessed data in the subflow rx
queue?

> A possible fix could be additionally checking for an empty
> ssk->sk_receive_queue, plus adding a similar chunk in
> __mptcp_move_skbs_from_subflow().

Yes, not nice, I don't see another solution atm though.

Will look into this tomorrow.

> That means that the subflow_close event could be delayed by user-space,
> if the latter does not spool the pending ingress data.

We can decouple the event from actual close.

> Or the NL event generation could be de-coupled from the actual subflow
> close(), but I fear some side effects even there.

Hmm, why?

> Note that calling __mptcp_move_skbs_from_subflow()
> from mptcp_close_subflow() will not solve the issue, as the msk receive
> queue could be full - I think;)

Yes, it could be full; that's possible.

> Side note: it would be nice using the deferred actions to do the
> netlink call, to avoid the workqueue usage.

Why?  How many events/s do you expect?

Events try to prefer GFP_KERNEL allocations where possible.
Paolo Abeni Feb. 9, 2021, 9:49 a.m. UTC | #3
On Tue, 2021-02-09 at 00:47 +0100, Florian Westphal wrote:
> Paolo Abeni <pabeni@redhat.com> wrote:
> > In case of fallback, the subflow can be closed by the peer while there
> > are data pending in the subflow receive queue. Calling
> > mptcp_close_subflow() will discard such data and will corrupt the
> > stream.
> 
> This.
> 
> > I think that scenario is possible even with full msk, if the peer
> > closes a single subflow while the msk is still alive (before DATA_FIN).
> 
> You mean because there might be unprocessed data in the subflow rx
> queue?

Yes. With the fallback scenario it happens quite easily. I think that
is because the subflow finds the msk receive queue exceeding
msk->sk_rcvbuf, but I really did not investigate that much.

> > Or the NL event generation cold be de-coupled from the actual subflow
> > close() but I fear some side effect even there.
> 
> Hmm, why?

If the actual close is also in charge of subflow accounting (and
subflow limit enforcement), likely there will be no issue. If subflow
accounting is done at NL event generation time, a malicious peer could
open/close a subflow multiple times, "leaking" (not really leaking,
but still causing allocations with no free) a lot of subflows on this
side.

> > Side note: it would be nice using the deferred actions to do the
> > netlink call, to avoid the workqueue usage.
> 
> Why?  How many events/s do you expect?

At least one per msk connection, I think. On a busy server with a high
ingress connection rate that is likely significant. Yes, DATA_FIN (ack)
will weigh twice (or 4x) as much, so I think we should use delegated
actions even for that ;)

> Events try to prefer GFP_KERNEL allocations where possible.

yep, that is relevant. Delegated actions are handled at msk release_cb
time - depending on the lock status when they are fired - even in
non-atomic/process context. We could use GFP_KERNEL when triggering the
event from release_cb and GFP_ATOMIC elsewhere. Would that
inconsistency be acceptable?

I have some related patches "cleaning-up" mptcp_release_cb that I'll
hopefully share soon.

Or perhaps we could check for genl_has_listeners() before scheduling
the workqueue?

/P
Florian Westphal Feb. 9, 2021, 10:16 a.m. UTC | #4
Paolo Abeni <pabeni@redhat.com> wrote:
> On Tue, 2021-02-09 at 00:47 +0100, Florian Westphal wrote:
> If the actual close is also in charge of subflow accounting (and
> subflow limit enforcement), likely there will be no issue. If subflow
> accounting is done at NL event generation time, a malicious peer could
> open/close a subflow multiple times, "leaking" (not really leaking,
> but still causing allocations with no free) a lot of subflows on this
> side.

No, event generation should do just that, emit the mcast event, no
accounting or similar.

> > > Side note: it would be nice using the deferred actions to do the
> > > netlink call, to avoid the workqueue usage.
> > 
> > Why?  How many events/s do you expect?
> 
> At least one per msk connection, I think. On a busy server with a high
> ingress connection rate that is likely significant. Yes, DATA_FIN (ack)
> will weigh twice (or 4x) as much, so I think we should use delegated
> actions even for that ;)

Compared to packet processing they are likely not relevant, but see
below.

> > Events try to prefer GFP_KERNEL allocations where possible.
> 
> yep, that is relevant. Delegated actions are handled at msk release_cb
> time - depending on the lock status when they are fired - even in
> non-atomic/process context. We could use GFP_KERNEL when triggering the
> event from release_cb and GFP_ATOMIC elsewhere. Would that
> inconsistency be acceptable?

The function already provides a gfp_t arg; there are spots where the
event happens from atomic context.

> I have some related patches "cleaning-up" mptcp_release_cb that I'll
> hopefully share soon.

Ok, will have a look.

> Or perhaps we could check for genl_has_listeners() before scheduling
> the workqueue?

I think that's a good idea. However, the idea was not just to kick the
event but also to do the actual socket close, so if we don't schedule
the wq the ssk will stay around until msk close time, no?
Paolo Abeni Feb. 9, 2021, 12:01 p.m. UTC | #5
On Tue, 2021-02-09 at 11:16 +0100, Florian Westphal wrote:
> Paolo Abeni <pabeni@redhat.com> wrote:
> > At least one per msk connection, I think. On a busy server with a high
> > ingress connection rate that is likely significant. Yes, DATA_FIN (ack)
> > will weigh twice (or 4x) as much, so I think we should use delegated
> > actions even for that ;)
> 
> Compared to packet processing they are likely not relevant, 

I'm not sure - for short-lived connections. Looks like the time has
come to collect some connection-rate performance figures ;)

> > I have some related patches "cleaning-up" mptcp_release_cb that I'll
> > hopefully share soon.
> 
> Ok, will have a look.

Just shared ;)
> 
> > Or perhaps we could check for genl_has_listeners() before scheduling
> > the workqueue?
> 
> I think that's a good idea. However, the idea was not just to kick the
> event but also to do the actual socket close, so if we don't schedule
> the wq the ssk will stay around until msk close time, no?

We can use the delegated actions to close the subflow (or do any other
task requiring the msk socket lock), with the following pattern:

// under ssk lock

mptcp_data_lock();
if (!sock_owned_by_user(msk)) {
	if (<target ssk != us>) {
		// mptcp_subflow_delegate to be extended to schedule
		// different actions [1]
		mptcp_subflow_delegate(target_subflow, action_id);
	} else {
		<do the intended task> // [2]
	}
} else {
	<schedule the task for release_cb time> // [3]
}

[1] and [2] must act atomically, while [3] could sleep, using the loop
introduced by "mptcp: fix race in release_cb". 

For CLOSE_SUBFLOW the target ssk will always be the current one, so
there is actually no need for delegated actions; the release_cb will
suffice.

/P

Patch

diff --git a/net/mptcp/subflow.c b/net/mptcp/subflow.c
index b8b670cb6116..57f6361eb07c 100644
--- a/net/mptcp/subflow.c
+++ b/net/mptcp/subflow.c
@@ -1413,6 +1413,13 @@  static void subflow_state_change(struct sock *sk)
 	if (mptcp_subflow_data_available(sk))
 		mptcp_data_ready(parent, sk);
 
+	if (sk->sk_state == TCP_CLOSE &&
+	    !test_and_set_bit(MPTCP_WORK_CLOSE_SUBFLOW, &mptcp_sk(parent)->flags)) {
+		sock_hold(parent);
+		if (!schedule_work(&mptcp_sk(parent)->work))
+			sock_put(parent);
+	}
+
 	if (__mptcp_check_fallback(mptcp_sk(parent)) &&
 	    !subflow->rx_eof && subflow_is_done(sk)) {
 		subflow->rx_eof = 1;