diff mbox series

[mptcp-net] mptcp: Wake up MPTCP worker when DATA_FIN found on a TCP FIN packet

Message ID 20200904003531.291283-1-mathew.j.martineau@linux.intel.com
State Accepted, archived
Delegated to: Matthieu Baerts
Headers show
Series [mptcp-net] mptcp: Wake up MPTCP worker when DATA_FIN found on a TCP FIN packet | expand

Commit Message

Mat Martineau Sept. 4, 2020, 12:35 a.m. UTC
When receiving a DATA_FIN MPTCP option on a TCP FIN packet, the DATA_FIN
information would be stored but the MPTCP worker did not get
scheduled. In turn, the MPTCP socket state would remain in
TCP_ESTABLISHED and no blocked operations would be awakened.

TCP FIN packets are seen by the MPTCP socket when moving skbs out of the
subflow receive queues, so schedule the MPTCP worker when a skb with
DATA_FIN but no data payload is moved from a subflow queue. Other cases
(DATA_FIN on a bare TCP ACK or on a packet with data payload) are
already handled.

Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
---

This patch is targeted at the -net tree, but also applies to net-next
without conflict. If it looks ok, we can carry it in the export branch
until it gets merged to both -net and net-next.

net/mptcp/subflow.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)


base-commit: 556699341efa98243e08e34401b3f601da91f5a3

Comments

Paolo Abeni Sept. 7, 2020, 9:36 a.m. UTC | #1
On Thu, 2020-09-03 at 17:35 -0700, Mat Martineau wrote:
> When receiving a DATA_FIN MPTCP option on a TCP FIN packet, the DATA_FIN
> information would be stored but the MPTCP worker did not get
> scheduled. In turn, the MPTCP socket state would remain in
> TCP_ESTABLISHED and no blocked operations would be awakened.
> 
> TCP FIN packets are seen by the MPTCP socket when moving skbs out of the
> subflow receive queues, so schedule the MPTCP worker when a skb with
> DATA_FIN but no data payload is moved from a subflow queue. Other cases
> (DATA_FIN on a bare TCP ACK or on a packet with data payload) are
> already handled.
> 
> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
> ---
> 
> This patch is targeted at the -net tree, but also applies to net-next
> without conflict. If it looks ok, we can carry it in the export branch
> until it gets merged to both -net and net-next.
> 
> net/mptcp/subflow.c | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/net/mptcp/subflow.c b/net/mptcp/subflow.c
> index e8cac2655c82..3297233dece7 100644
> --- a/net/mptcp/subflow.c
> +++ b/net/mptcp/subflow.c
> @@ -731,7 +731,7 @@ static enum mapping_status get_mapping_status(struct sock *ssk,
>  
>  	if (mpext->data_fin == 1) {
>  		if (data_len == 1) {
> -			mptcp_update_rcv_data_fin(msk, mpext->data_seq);
> +			bool updated = mptcp_update_rcv_data_fin(msk, mpext->data_seq);
>  			pr_debug("DATA_FIN with no payload seq=%llu", mpext->data_seq);
>  			if (subflow->map_valid) {
>  				/* A DATA_FIN might arrive in a DSS
> @@ -742,6 +742,9 @@ static enum mapping_status get_mapping_status(struct sock *ssk,
>  				skb_ext_del(skb, SKB_EXT_MPTCP);
>  				return MAPPING_OK;
>  			} else {
> +				if (updated && schedule_work(&msk->work))
> +					sock_hold((struct sock *)msk);
> +

We already have 'schedule_work' call to handle incoming data fin
in mptcp_incoming_options(), can we instead extend the checks there to
cover also this scenario?

Cheers,

Paolo
Mat Martineau Sept. 8, 2020, 8:33 p.m. UTC | #2
On Mon, 7 Sep 2020, Paolo Abeni wrote:

> On Thu, 2020-09-03 at 17:35 -0700, Mat Martineau wrote:
>> When receiving a DATA_FIN MPTCP option on a TCP FIN packet, the DATA_FIN
>> information would be stored but the MPTCP worker did not get
>> scheduled. In turn, the MPTCP socket state would remain in
>> TCP_ESTABLISHED and no blocked operations would be awakened.
>>
>> TCP FIN packets are seen by the MPTCP socket when moving skbs out of the
>> subflow receive queues, so schedule the MPTCP worker when a skb with
>> DATA_FIN but no data payload is moved from a subflow queue. Other cases
>> (DATA_FIN on a bare TCP ACK or on a packet with data payload) are
>> already handled.
>>
>> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
>> ---
>>
>> This patch is targeted at the -net tree, but also applies to net-next
>> without conflict. If it looks ok, we can carry it in the export branch
>> until it gets merged to both -net and net-next.
>>
>> net/mptcp/subflow.c | 5 ++++-
>>  1 file changed, 4 insertions(+), 1 deletion(-)
>>
>> diff --git a/net/mptcp/subflow.c b/net/mptcp/subflow.c
>> index e8cac2655c82..3297233dece7 100644
>> --- a/net/mptcp/subflow.c
>> +++ b/net/mptcp/subflow.c
>> @@ -731,7 +731,7 @@ static enum mapping_status get_mapping_status(struct sock *ssk,
>>
>>  	if (mpext->data_fin == 1) {
>>  		if (data_len == 1) {
>> -			mptcp_update_rcv_data_fin(msk, mpext->data_seq);
>> +			bool updated = mptcp_update_rcv_data_fin(msk, mpext->data_seq);
>>  			pr_debug("DATA_FIN with no payload seq=%llu", mpext->data_seq);
>>  			if (subflow->map_valid) {
>>  				/* A DATA_FIN might arrive in a DSS
>> @@ -742,6 +742,9 @@ static enum mapping_status get_mapping_status(struct sock *ssk,
>>  				skb_ext_del(skb, SKB_EXT_MPTCP);
>>  				return MAPPING_OK;
>>  			} else {
>> +				if (updated && schedule_work(&msk->work))
>> +					sock_hold((struct sock *)msk);
>> +
>
> We already have 'schedule_work' call to handle incoming data fin
> in mptcp_incoming_options(), can we instead extend the checks there to
> cover also this scenario?
>

I did look at modifying mptcp_incoming_options(), but this patch looked 
like the simpler fix.

The existing checks for empty packets in mptcp_incoming_options() 
accomplish two things:

1. DATA_FIN is handled for ACK packets that don't get in to the subflow 
receive queue (TCP FIN packets do go in to the receive queue, even if 
there's no payload).

2. skb_ext attachment is skipped for ACK packets.


Changing the logic in mptcp_incoming_options() would have to look for 
seq+1 == end_seq and TCP FIN, and then (maybe?) not attach the skb_ext - 
which significantly changes the code paths involved. It might also do an 
extra wake up of the worker before the MPTCP socket is caught up with the 
incoming data.

In comparison, the above patch stores a return value that was previously 
(incorrectly) ignored, and schedules the worker when it is most likely to 
be able to do the work.

Given that, do you think it's better to extend the 
mptcp_incoming_options() checks?

--
Mat Martineau
Intel
Paolo Abeni Sept. 9, 2020, 4:45 p.m. UTC | #3
On Tue, 2020-09-08 at 13:33 -0700, Mat Martineau wrote:
> On Mon, 7 Sep 2020, Paolo Abeni wrote:
> > We already have 'schedule_work' call to handle incoming data fin
> > in mptcp_incoming_options(), can we instead extend the checks there to
> > cover also this scenario?
> > 
> 
> I did look at modifying mptcp_incoming_options(), but this patch looked 
> like the simpler fix.
> 
> The existing checks for empty packets in mptcp_incoming_options() 
> accomplish two things:
> 
> 1. DATA_FIN is handled for ACK packets that don't get in to the subflow 
> receive queue (TCP FIN packets do go in to the receive queue, even if 
> there's no payload).
> 
> 2. skb_ext attachment is skipped for ACK packets.
> 
> 
> Changing the logic in mptcp_incoming_options() would have to look for 
> seq+1 == end_seq and TCP FIN, and then (maybe?) not attach the skb_ext - 
> which significantly changes the code paths involved. It might also do an 
> extra wake up of the worker before the MPTCP socket is caught up with the 
> incoming data.
> 
> In comparison, the above patch stores a return value that was previously 
> (incorrectly) ignored, and schedules the worker when it is most likely to 
> be able to do the work.
> 
> Given that, do you think it's better to extend the 
> mptcp_incoming_options() checks?

Yup, I did not look into the details of the other option. It looks like
it's overcomplicated, so I'm ok with this approach.

Cheers,

Paolo
Matthieu Baerts Sept. 10, 2020, 4:43 p.m. UTC | #4
Hi Mat, Paolo,

On 09/09/2020 18:45, Paolo Abeni wrote:
> On Tue, 2020-09-08 at 13:33 -0700, Mat Martineau wrote:
>> On Mon, 7 Sep 2020, Paolo Abeni wrote:
>>> We already have 'schedule_work' call to handle incoming data fin
>>> in mptcp_incoming_options(), can we instead extend the checks there to
>>> cover also this scenario?
>>>
>>
>> I did look at modifying mptcp_incoming_options(), but this patch looked
>> like the simpler fix.
>>
>> The existing checks for empty packets in mptcp_incoming_options()
>> accomplish two things:
>>
>> 1. DATA_FIN is handled for ACK packets that don't get in to the subflow
>> receive queue (TCP FIN packets do go in to the receive queue, even if
>> there's no payload).
>>
>> 2. skb_ext attachment is skipped for ACK packets.
>>
>>
>> Changing the logic in mptcp_incoming_options() would have to look for
>> seq+1 == end_seq and TCP FIN, and then (maybe?) not attach the skb_ext -
>> which significantly changes the code paths involved. It might also do an
>> extra wake up of the worker before the MPTCP socket is caught up with the
>> incoming data.
>>
>> In comparison, the above patch stores a return value that was previously
>> (incorrectly) ignored, and schedules the worker when it is most likely to
>> be able to do the work.
>>
>> Given that, do you think it's better to extend the
>> mptcp_incoming_options() checks?
> 
> Yup, I did not look into the details of the other option. It looks like
> it's overcomplicated, so I'm ok with this approach.

Thank you for the patch and the review!

I just applied your patch with Paolo's ACK!

Compilation tests are OK, "export" branch has been updated but the CI is 
still validating functional tests in the VM (selftests, packetdrill, etc.).

I will try to report the status later on IRC but if you don't hear from 
me today, I am sure this patch can be sent to netdev!

Cheers,
Matt
Mat Martineau Sept. 11, 2020, 12:12 a.m. UTC | #5
On Thu, 10 Sep 2020, Matthieu Baerts wrote:

> Hi Mat, Paolo,
>
> On 09/09/2020 18:45, Paolo Abeni wrote:
>> On Tue, 2020-09-08 at 13:33 -0700, Mat Martineau wrote:
>>> On Mon, 7 Sep 2020, Paolo Abeni wrote:
>>>> We already have 'schedule_work' call to handle incoming data fin
>>>> in mptcp_incoming_options(), can we instead extend the checks there to
>>>> cover also this scenario?
>>>> 
>>> 
>>> I did look at modifying mptcp_incoming_options(), but this patch looked
>>> like the simpler fix.
>>> 
>>> The existing checks for empty packets in mptcp_incoming_options()
>>> accomplish two things:
>>> 
>>> 1. DATA_FIN is handled for ACK packets that don't get in to the subflow
>>> receive queue (TCP FIN packets do go in to the receive queue, even if
>>> there's no payload).
>>> 
>>> 2. skb_ext attachment is skipped for ACK packets.
>>> 
>>> 
>>> Changing the logic in mptcp_incoming_options() would have to look for
>>> seq+1 == end_seq and TCP FIN, and then (maybe?) not attach the skb_ext -
>>> which significantly changes the code paths involved. It might also do an
>>> extra wake up of the worker before the MPTCP socket is caught up with the
>>> incoming data.
>>> 
>>> In comparison, the above patch stores a return value that was previously
>>> (incorrectly) ignored, and schedules the worker when it is most likely to
>>> be able to do the work.
>>> 
>>> Given that, do you think it's better to extend the
>>> mptcp_incoming_options() checks?
>> 
>> Yup, I did not look into the details of the other option. It looks like
>> it's overcomplicated, so I'm ok with this approach.
>
> Thank you for the patch and the review!
>
> I just applied your patch with Paolo's ACK!
>
> Compilation tests are OK, "export" branch has been updated but the CI is 
> still validating functional tests in the VM (selftests, packetdrill, etc.).
>
> I will try to report the status later on IRC but if you don't hear from me 
> today, I am sure this patch can be sent to netdev!
>

I found an interop case with the multipath-tcp.org kernel where v5.9 is 
not acknowledging the peer's DATA_FIN, so the peer keeps resending the 
DATA_FIN even though the v5.9 socket is already closed.

Working on a patch to squash with this one.

--
Mat Martineau
Intel
Mat Martineau Sept. 18, 2020, 10:52 p.m. UTC | #6
On Thu, 10 Sep 2020, Mat Martineau wrote:

> On Thu, 10 Sep 2020, Matthieu Baerts wrote:
>
>> Hi Mat, Paolo,
>> 
>> On 09/09/2020 18:45, Paolo Abeni wrote:
>>> On Tue, 2020-09-08 at 13:33 -0700, Mat Martineau wrote:
>>>> On Mon, 7 Sep 2020, Paolo Abeni wrote:
>>>>> We already have 'schedule_work' call to handle incoming data fin
>>>>> in mptcp_incoming_options(), can we instead extend the checks there to
>>>>> cover also this scenario?
>>>>> 
>>>> 
>>>> I did look at modifying mptcp_incoming_options(), but this patch looked
>>>> like the simpler fix.
>>>> 
>>>> The existing checks for empty packets in mptcp_incoming_options()
>>>> accomplish two things:
>>>> 
>>>> 1. DATA_FIN is handled for ACK packets that don't get in to the subflow
>>>> receive queue (TCP FIN packets do go in to the receive queue, even if
>>>> there's no payload).
>>>> 
>>>> 2. skb_ext attachment is skipped for ACK packets.
>>>> 
>>>> 
>>>> Changing the logic in mptcp_incoming_options() would have to look for
>>>> seq+1 == end_seq and TCP FIN, and then (maybe?) not attach the skb_ext -
>>>> which significantly changes the code paths involved. It might also do an
>>>> extra wake up of the worker before the MPTCP socket is caught up with the
>>>> incoming data.
>>>> 
>>>> In comparison, the above patch stores a return value that was previously
>>>> (incorrectly) ignored, and schedules the worker when it is most likely to
>>>> be able to do the work.
>>>> 
>>>> Given that, do you think it's better to extend the
>>>> mptcp_incoming_options() checks?
>>> 
>>> Yup, I did not look into the details of the other option. It looks like
>>> it's overcomplicated, so I'm ok with this approach.
>> 
>> Thank you for the patch and the review!
>> 
>> I just applied your patch with Paolo's ACK!
>> 
>> Compilation tests are OK, "export" branch has been updated but the CI is 
>> still validating functional tests in the VM (selftests, packetdrill, etc.).
>> 
>> I will try to report the status later on IRC but if you don't hear from me 
>> today, I am sure this patch can be sent to netdev!
>> 
>
> I found an interop case with the multipath-tcp.org kernel where v5.9 is not 
> acknowledging the peer's DATA_FIN, so the peer keeps resending the DATA_FIN 
> even though the v5.9 socket is already closed.
>
> Working on a patch to squash with this one.
>

I had been thinking the multipath-tcp.org interop problem was a variation 
of the bug fixed by this patch, but it looks like a separate (and very 
different) issue with mptcp_close(). I will send this patch to netdev now 
and open a separate issue on github.

--
Mat Martineau
Intel
Paolo Abeni Sept. 23, 2020, 3:56 p.m. UTC | #7
On Fri, 2020-09-18 at 15:52 -0700, Mat Martineau wrote:
> I had been thinking the multipath-tcp.org interop problem was a
> variation 
> of the bug fixed by this patch, but it looks like a separate (and
> very 
> different) issue with mptcp_close(). I will send this patch to netdev
> now 
> and open a separate issue on github.

https://github.com/multipath-tcp/mptcp_net-next/issues/92

I'm looking at the same problem from a different perspective.

To support correct snd wnd enforcing, we need the mptcp equivalent of
tcp_push_pending_frames(): data must be queued into the msk and pushed
to substream only when the MPTCP-level wnd is open.

As a side effect, at shutdown/close time we may end-up with data queued
into the msk that never reached any subflow. To allow the MPTCP and
later TCP stack to spool them we need to keep the ssk around.

And here we hit some nasty problems: at mptcp_close() time we need to
orphan all subflows sharing the same 'struct socket' as the msk - as
the latter is going to be deleted just after mptcp_close() completes.

But as soon as we orphan them, the tcp stack may async delete the
subflow via tcp_done(). That will free also the related subflow
context. In subflow_ulp_release() we can't acquire the msk socket lock,
so we can't remove the deleted subflow from the conn_list.

TL;DR: UaF in later MPTCP operation (e.g. [re]transmit) touching the
subflows, not easy to solve.

/P
Mat Martineau Sept. 24, 2020, 12:06 a.m. UTC | #8
On Wed, 23 Sep 2020, Paolo Abeni wrote:

> On Fri, 2020-09-18 at 15:52 -0700, Mat Martineau wrote:
>> I had been thinking the multipath-tcp.org interop problem was a
>> variation
>> of the bug fixed by this patch, but it looks like a separate (and
>> very
>> different) issue with mptcp_close(). I will send this patch to netdev
>> now
>> and open a separate issue on github.
>
> https://github.com/multipath-tcp/mptcp_net-next/issues/92
>
> I'm looking at the same problem from a different perspective.
>
> To support correct snd wnd enforcing, we need the mptcp equivalent of
> tcp_push_pending_frames(): data must be queued into the msk and pushed
> to substream only when the MPTCP-level wnd is open.
>
> As a side effect, at shutdown/close time we may end-up with data queued
> into the msk that never reached any subflow. To allow the MPTCP and
> later TCP stack to spool them we need to keep the ssk around.
>
> And here we hit some nasty problems: at mptcp_close() time we need to
> orphan all subflows sharing the same 'struct socket' as the msk - as
> the latter is going to be deleted just after mptcp_close() completes.
>
> But as soon as we orphan them, the tcp stack may async delete the
> subflow via tcp_done(). That will free also the related subflow
> context. In subflow_ulp_release() we can't acquire the msk socket lock,
> so we can't remove the deleted subflow from the conn_list.
>
> TL;DR: UaF in later MPTCP operation (e.g. [re]transmit) touching the
> subflows, not easy to solve.

This is exactly the problem I'm running in to (UaF on later call to 
subflow_state_change), and so far hasn't been easy to solve. Would help to 
talk it over at the Thursday meeting!

--
Mat Martineau
Intel
diff mbox series

Patch

diff --git a/net/mptcp/subflow.c b/net/mptcp/subflow.c
index e8cac2655c82..3297233dece7 100644
--- a/net/mptcp/subflow.c
+++ b/net/mptcp/subflow.c
@@ -731,7 +731,7 @@  static enum mapping_status get_mapping_status(struct sock *ssk,
 
 	if (mpext->data_fin == 1) {
 		if (data_len == 1) {
-			mptcp_update_rcv_data_fin(msk, mpext->data_seq);
+			bool updated = mptcp_update_rcv_data_fin(msk, mpext->data_seq);
 			pr_debug("DATA_FIN with no payload seq=%llu", mpext->data_seq);
 			if (subflow->map_valid) {
 				/* A DATA_FIN might arrive in a DSS
@@ -742,6 +742,9 @@  static enum mapping_status get_mapping_status(struct sock *ssk,
 				skb_ext_del(skb, SKB_EXT_MPTCP);
 				return MAPPING_OK;
 			} else {
+				if (updated && schedule_work(&msk->work))
+					sock_hold((struct sock *)msk);
+
 				return MAPPING_DATA_FIN;
 			}
 		} else {