Message ID: 20200904003531.291283-1-mathew.j.martineau@linux.intel.com
State: Accepted, archived
Delegated to: Matthieu Baerts
Series: [mptcp-net] mptcp: Wake up MPTCP worker when DATA_FIN found on a TCP FIN packet
On Thu, 2020-09-03 at 17:35 -0700, Mat Martineau wrote:
> When receiving a DATA_FIN MPTCP option on a TCP FIN packet, the DATA_FIN
> information would be stored but the MPTCP worker did not get
> scheduled. In turn, the MPTCP socket state would remain in
> TCP_ESTABLISHED and no blocked operations would be awakened.
>
> TCP FIN packets are seen by the MPTCP socket when moving skbs out of the
> subflow receive queues, so schedule the MPTCP worker when a skb with
> DATA_FIN but no data payload is moved from a subflow queue. Other cases
> (DATA_FIN on a bare TCP ACK or on a packet with data payload) are
> already handled.
>
> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
> ---
>
> This patch is targeted at the -net tree, but also applies to net-next
> without conflict. If it looks ok, we can carry it in the export branch
> until it gets merged to both -net and net-next.
>
>  net/mptcp/subflow.c | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/net/mptcp/subflow.c b/net/mptcp/subflow.c
> index e8cac2655c82..3297233dece7 100644
> --- a/net/mptcp/subflow.c
> +++ b/net/mptcp/subflow.c
> @@ -731,7 +731,7 @@ static enum mapping_status get_mapping_status(struct sock *ssk,
>
>  	if (mpext->data_fin == 1) {
>  		if (data_len == 1) {
> -			mptcp_update_rcv_data_fin(msk, mpext->data_seq);
> +			bool updated = mptcp_update_rcv_data_fin(msk, mpext->data_seq);
>  			pr_debug("DATA_FIN with no payload seq=%llu", mpext->data_seq);
>  			if (subflow->map_valid) {
>  				/* A DATA_FIN might arrive in a DSS
> @@ -742,6 +742,9 @@ static enum mapping_status get_mapping_status(struct sock *ssk,
>  				skb_ext_del(skb, SKB_EXT_MPTCP);
>  				return MAPPING_OK;
>  			} else {
> +				if (updated && schedule_work(&msk->work))
> +					sock_hold((struct sock *)msk);
> +

We already have a 'schedule_work' call to handle an incoming DATA_FIN in
mptcp_incoming_options(); can we instead extend the checks there to cover
this scenario as well?

Cheers,

Paolo
On Mon, 7 Sep 2020, Paolo Abeni wrote:

> On Thu, 2020-09-03 at 17:35 -0700, Mat Martineau wrote:
>> When receiving a DATA_FIN MPTCP option on a TCP FIN packet, the DATA_FIN
>> information would be stored but the MPTCP worker did not get
>> scheduled. In turn, the MPTCP socket state would remain in
>> TCP_ESTABLISHED and no blocked operations would be awakened.
[...]
> We already have 'schedule_work' call to handle incoming data fin
> in mptcp_incoming_options(), can we instead extend the checks there to
> cover also this scenario?

I did look at modifying mptcp_incoming_options(), but this patch looked
like the simpler fix.

The existing checks for empty packets in mptcp_incoming_options()
accomplish two things:

1. DATA_FIN is handled for ACK packets that don't get into the subflow
receive queue (TCP FIN packets do go into the receive queue, even if
there's no payload).

2. skb_ext attachment is skipped for ACK packets.

Changing the logic in mptcp_incoming_options() would have to look for
seq+1 == end_seq and TCP FIN, and then (maybe?) not attach the skb_ext -
which significantly changes the code paths involved. It might also do an
extra wake up of the worker before the MPTCP socket is caught up with the
incoming data.

In comparison, the above patch stores a return value that was previously
(incorrectly) ignored, and schedules the worker when it is most likely to
be able to do the work.

Given that, do you think it's better to extend the
mptcp_incoming_options() checks?

--
Mat Martineau
Intel
On Tue, 2020-09-08 at 13:33 -0700, Mat Martineau wrote:
> On Mon, 7 Sep 2020, Paolo Abeni wrote:
>> We already have 'schedule_work' call to handle incoming data fin
>> in mptcp_incoming_options(), can we instead extend the checks there to
>> cover also this scenario?
>
> I did look at modifying mptcp_incoming_options(), but this patch looked
> like the simpler fix.
>
> The existing checks for empty packets in mptcp_incoming_options()
> accomplish two things:
>
> 1. DATA_FIN is handled for ACK packets that don't get in to the subflow
> receive queue (TCP FIN packets do go in to the receive queue, even if
> there's no payload).
>
> 2. skb_ext attachment is skipped for ACK packets.
>
> Changing the logic in mptcp_incoming_options() would have to look for
> seq+1 == end_seq and TCP FIN, and then (maybe?) not attach the skb_ext -
> which significantly changes the code paths involved. It might also do an
> extra wake up of the worker before the MPTCP socket is caught up with the
> incoming data.
>
> In comparison, the above patch stores a return value that was previously
> (incorrectly) ignored, and schedules the worker when it is most likely to
> be able to do the work.
>
> Given that, do you think it's better to extend the
> mptcp_incoming_options() checks?

Yup, I did not look into the details of the other option. It looks like
it's overcomplicated, so I'm ok with this approach.

Cheers,

Paolo
Hi Mat, Paolo,

On 09/09/2020 18:45, Paolo Abeni wrote:
> On Tue, 2020-09-08 at 13:33 -0700, Mat Martineau wrote:
>> On Mon, 7 Sep 2020, Paolo Abeni wrote:
>>> We already have 'schedule_work' call to handle incoming data fin
>>> in mptcp_incoming_options(), can we instead extend the checks there to
>>> cover also this scenario?
[...]
> Yup, I did not look into the details of the other option. It looks like
> it's overcomplicated, so I'm ok with this approach.

Thank you for the patch and the review!

I just applied your patch with Paolo's ACK!

Compilation tests are OK and the "export" branch has been updated, but
the CI is still validating functional tests in the VM (selftests,
packetdrill, etc.).

I will try to report the status later on IRC, but if you don't hear from
me today, I am sure this patch can be sent to netdev!

Cheers,

Matt
On Thu, 10 Sep 2020, Matthieu Baerts wrote:

> Hi Mat, Paolo,
>
> On 09/09/2020 18:45, Paolo Abeni wrote:
[...]
> Thank you for the patch and the review!
>
> I just applied your patch with Paolo's ACK!
>
> Compilation tests are OK, "export" branch has been updated but the CI is
> still validating functional tests in the VM (selftests, packetdrill, etc.).
>
> I will try to report the status later on IRC but if you don't hear from me
> today, I am sure this patch can be sent to netdev!

I found an interop case with the multipath-tcp.org kernel where v5.9 is
not acknowledging the peer's DATA_FIN, so the peer keeps resending the
DATA_FIN even though the v5.9 socket is already closed.

Working on a patch to squash with this one.

--
Mat Martineau
Intel
On Thu, 10 Sep 2020, Mat Martineau wrote:

> On Thu, 10 Sep 2020, Matthieu Baerts wrote:
[...]
> I found an interop case with the multipath-tcp.org kernel where v5.9 is not
> acknowledging the peer's DATA_FIN, so the peer keeps resending the DATA_FIN
> even though the v5.9 socket is already closed.
>
> Working on a patch to squash with this one.

I had been thinking the multipath-tcp.org interop problem was a variation
of the bug fixed by this patch, but it looks like a separate (and very
different) issue with mptcp_close(). I will send this patch to netdev now
and open a separate issue on github.

--
Mat Martineau
Intel
On Fri, 2020-09-18 at 15:52 -0700, Mat Martineau wrote:
> I had been thinking the multipath-tcp.org interop problem was a variation
> of the bug fixed by this patch, but it looks like a separate (and very
> different) issue with mptcp_close(). I will send this patch to netdev now
> and open a separate issue on github.

https://github.com/multipath-tcp/mptcp_net-next/issues/92

I'm looking at the same problem from a different perspective.

To support correct snd wnd enforcement, we need the MPTCP equivalent of
tcp_push_pending_frames(): data must be queued into the msk and pushed
to the subflows only when the MPTCP-level window is open.

As a side effect, at shutdown/close time we may end up with data queued
into the msk that never reached any subflow. To allow the MPTCP and
later the TCP stack to spool it, we need to keep the ssk around.

And here we hit some nasty problems: at mptcp_close() time we need to
orphan all subflows sharing the same 'struct socket' as the msk - as
the latter is going to be deleted just after mptcp_close() completes.

But as soon as we orphan them, the TCP stack may asynchronously delete
a subflow via tcp_done(). That will also free the related subflow
context. In subflow_ulp_release() we can't acquire the msk socket lock,
so we can't remove the deleted subflow from the conn_list.

TL;DR: UaF in later MPTCP operations (e.g. [re]transmit) touching the
subflows; not easy to solve.

/P
On Wed, 23 Sep 2020, Paolo Abeni wrote:

> On Fri, 2020-09-18 at 15:52 -0700, Mat Martineau wrote:
>> I had been thinking the multipath-tcp.org interop problem was a variation
>> of the bug fixed by this patch, but it looks like a separate (and very
>> different) issue with mptcp_close(). I will send this patch to netdev now
>> and open a separate issue on github.
>
> https://github.com/multipath-tcp/mptcp_net-next/issues/92
>
> I'm looking at the same problem from a different perspective.
>
> To support correct snd wnd enforcing, we need the mptcp equivalent of
> tcp_push_pending_frames(): data must be queued into the msk and pushed
> to substream only when the MPTCP-level wnd is open.
>
> As a side effect, at shutdown/close time we may end-up with data queued
> into the msk that never reached any subflow. To allow the MPTCP and
> later TCP stack to spool them we need to keep the ssk around.
>
> And here we hit some nasty problems: at mptcp_close() time we need to
> orphan all subflows sharing the same 'struct socket' as the msk - as
> the latter is going to be deleted just after mptcp_close() completes.
>
> But as soon as we orphan them, the tcp stack may async delete the
> subflow via tcp_done(). That will free also the related subflow
> context. In subflow_ulp_release() we can't acquire the msk socket lock,
> so we can't remove the deleted subflow from the conn_list.
>
> TL;DR: UaF in later MPTCP operation (e.g. [re]transmit) touching the
> subflows, not easy to solve.

This is exactly the problem I'm running into (a UaF on a later call to
subflow_state_change), and so far it hasn't been easy to solve. It would
help to talk it over at the Thursday meeting!

--
Mat Martineau
Intel
diff --git a/net/mptcp/subflow.c b/net/mptcp/subflow.c
index e8cac2655c82..3297233dece7 100644
--- a/net/mptcp/subflow.c
+++ b/net/mptcp/subflow.c
@@ -731,7 +731,7 @@ static enum mapping_status get_mapping_status(struct sock *ssk,
 
 	if (mpext->data_fin == 1) {
 		if (data_len == 1) {
-			mptcp_update_rcv_data_fin(msk, mpext->data_seq);
+			bool updated = mptcp_update_rcv_data_fin(msk, mpext->data_seq);
 			pr_debug("DATA_FIN with no payload seq=%llu", mpext->data_seq);
 			if (subflow->map_valid) {
 				/* A DATA_FIN might arrive in a DSS
@@ -742,6 +742,9 @@ static enum mapping_status get_mapping_status(struct sock *ssk,
 				skb_ext_del(skb, SKB_EXT_MPTCP);
 				return MAPPING_OK;
 			} else {
+				if (updated && schedule_work(&msk->work))
+					sock_hold((struct sock *)msk);
+
 				return MAPPING_DATA_FIN;
 			}
 		} else {
When receiving a DATA_FIN MPTCP option on a TCP FIN packet, the DATA_FIN
information would be stored but the MPTCP worker did not get scheduled.
In turn, the MPTCP socket state would remain in TCP_ESTABLISHED and no
blocked operations would be awakened.

TCP FIN packets are seen by the MPTCP socket when moving skbs out of the
subflow receive queues, so schedule the MPTCP worker when a skb with
DATA_FIN but no data payload is moved from a subflow queue. Other cases
(DATA_FIN on a bare TCP ACK or on a packet with data payload) are
already handled.

Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
---

This patch is targeted at the -net tree, but also applies to net-next
without conflict. If it looks ok, we can carry it in the export branch
until it gets merged to both -net and net-next.

 net/mptcp/subflow.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

base-commit: 556699341efa98243e08e34401b3f601da91f5a3