mbox series

[v2] mptcp: wmem accounting and nonblocking io support

Message ID 20191118214538.21931-1-fw@strlen.de
State Accepted, archived
Delegated to: Matthieu Baerts
Headers show
Series [v2] mptcp: wmem accounting and nonblocking io support | expand

Pull-request

git://git.breakpoint.cc/fw/mptcp-next.git tcp_poll_removal_08

Message

Florian Westphal Nov. 18, 2019, 9:45 p.m. UTC
This is the second iteration of the mptcp poll rework.

Most patches have no changes, the ones with changes have these
highlighted in the changelog (6/9/10).

This addresses comments from Paolo.  I've also axed the 'empty send'
part as per comments from Paolo and Mat.

There were doubts as to the repeated invocations of mptcp_send_frag()
(RFC patch 13/14, "sendmsg: don't restart mptcp_sendmsg_frag"), I have
dropped it from this iteration -- mptcp_sendmsg_frag needs rework which
isn't related to this patch series.

Old covert letter:

The first patch extends the test suite with a mmap-based mode to
check large, blocking writes.  This uncovered a minor problem with the
earlier v2 wmem accounting patch series -- we would happily take a lot
more data than sndbuf allowed, as we only limited based on what the subflow
could accept.  So with a 4k sndbuf we could easily accept 256kb or even more.

This patch doesn't change anything in the test suite behaviour however,
you need to use "-b 4096" and/or "-m mmap" to enable this mode.

Second patch changes test suite to move to nonblocking io, this breaks mptcp
because mptcp_poll can signal EPOLLIN when it shouldn't, so userspace gets
-EAGAIN even though poll told it otherwise.

Patches 3/4/5/6 are an update vs. last wmem accounting series.
Remaining patches fix the nonblocking io behaviour.

mptcp_poll is made to be stand-alone, i.e. it no longer calls
__tcp_poll on the subflow sockets and only considers mptcp_sk state.

After this series the selftest works again and mptcp sk rtx queue is
limited by msk wmem.

The patches can't easily be rebased/merged so I propose that I would
squash this myself and send a pull request when done.

The following changes since commit c925affd3fa9dd18e6c326cff510bd30087f0ac8:

  subflow: wake parent mptcp socket on subflow state change (2019-11-18 02:44:11 +0000)

are available in the Git repository at:

  git://git.breakpoint.cc/fw/mptcp-next.git tcp_poll_removal_08

for you to fetch changes up to 2bbcfcd464c6ff1626800cc2e7c02085e02f35e8:

  sendmsg: truncate source buffer if mptcp sndbuf size was set from userspace (2019-11-18 17:45:44 +0100)

----------------------------------------------------------------
Florian Westphal (13):
      selftest: add mmap-write support
      selftests: make sockets non-blocking for default poll mode
      mptcp: add wmem_queued accounting
      mptcp: allow partial cleaning of rtx head dfrag
      mptcp: add and use mptcp RTX flag
      sendmsg: block until mptcp sk is writeable
      subflow: sk_data_ready: make wakeup on tcp sock conditional
      mptcp: add and use mptcp_subflow_get_retrans
      mptcp: sendmsg: transmit on backup if other subflows have been closed
      recv: make DATA_READY reflect ssk in-sequence state
      sendmsg: clear SEND_SPACE if write caused wmem to grow too large
      mptcp_poll: don't consider subflow socket state anymore
      sendmsg: truncate source buffer if mptcp sndbuf size was set from userspace

 net/mptcp/options.c                                |   2 +-
 net/mptcp/protocol.c                               | 249 ++++++++++++++++---
 net/mptcp/protocol.h                               |   4 +-
 net/mptcp/subflow.c                                |  12 +-
 tools/testing/selftests/net/mptcp/mptcp_connect.c  | 268 ++++++++++++++++++++-
 tools/testing/selftests/net/mptcp/mptcp_connect.sh |  36 ++-
 6 files changed, 518 insertions(+), 53 deletions(-)

Comments

Paolo Abeni Nov. 19, 2019, 11:38 a.m. UTC | #1
On Mon, 2019-11-18 at 22:45 +0100, Florian Westphal wrote:
> This is the second iteration of the mptcp poll rework.
> 
> Most patches have no changes, the ones with changes have these
> highlighted in the changelog (6/9/10).
> 
> This addresses comments from Paolo.  I've also axed the 'empty send'
> part as per comments from Paolo and Mat.
> 
> There were doubts as to the repeated invocations of mptcp_send_frag()
> (RFC patch 13/14, "sendmsg: don't restart mptcp_sendmsg_frag"), I have
> dropped it from this iteration -- mptcp_sendmsg_frag needs rework which
> isn't related to this patch series.
> 
> Old covert letter:
> 
> The first patch extends the test suite with a mmap-based mode to
> check large, blocking writes.  This uncovered a minor problem with the
> earlier v2 wmem accounting patch series -- we would happily take a lot
> more data than sndbuf allowed, as we only limited based on what the subflow
> could accept.  So with a 4k sndbuf we could easily accept 256kb or even more.
> 
> This patch doesn't change anything in the test suite behaviour however,
> you need to use "-b 4096" and/or "-m mmap" to enable this mode.
> 
> Second patch changes test suite to move to nonblocking io, this breaks mptcp
> because mptcp_poll can signal EPOLLIN when it shouldn't, so userspace gets
> -EAGAIN even though poll told it otherwise.
> 
> Patches 3/4/5/6 are an update vs. last wmem accounting series.
> Remaining patches fix the nonblocking io behaviour.
> 
> mptcp_poll is made to be stand-alone, i.e. it no longer calls
> __tcp_poll on the subflow sockets and only considers mptcp_sk state.
> 
> After this series the selftest works again and mptcp sk rtx queue is
> limited by msk wmem.
> 
> The patches can't easily be rebased/merged so I propose that I would
> squash this myself and send a pull request when done.
> 
> The following changes since commit c925affd3fa9dd18e6c326cff510bd30087f0ac8:
> 
>   subflow: wake parent mptcp socket on subflow state change (2019-11-18 02:44:11 +0000)
> 
> are available in the Git repository at:
> 
>   git://git.breakpoint.cc/fw/mptcp-next.git tcp_poll_removal_08
> 
> for you to fetch changes up to 2bbcfcd464c6ff1626800cc2e7c02085e02f35e8:
> 
>   sendmsg: truncate source buffer if mptcp sndbuf size was set from userspace (2019-11-18 17:45:44 +0100)

The series LGTM, thanks Florian!

/P
Mat Martineau Nov. 19, 2019, 10:52 p.m. UTC | #2
On Mon, 18 Nov 2019, Florian Westphal wrote:

> This is the second iteration of the mptcp poll rework.
>
> Most patches have no changes, the ones with changes have these
> highlighted in the changelog (6/9/10).
>
> This addresses comments from Paolo.  I've also axed the 'empty send'
> part as per comments from Paolo and Mat.
>
> There were doubts as to the repeated invocations of mptcp_send_frag()
> (RFC patch 13/14, "sendmsg: don't restart mptcp_sendmsg_frag"), I have
> dropped it from this iteration -- mptcp_sendmsg_frag needs rework which
> isn't related to this patch series.

The series looks good to me, with one minor comment on patch 1.


Mat



>
> Old covert letter:
>
> The first patch extends the test suite with a mmap-based mode to
> check large, blocking writes.  This uncovered a minor problem with the
> earlier v2 wmem accounting patch series -- we would happily take a lot
> more data than sndbuf allowed, as we only limited based on what the subflow
> could accept.  So with a 4k sndbuf we could easily accept 256kb or even more.
>
> This patch doesn't change anything in the test suite behaviour however,
> you need to use "-b 4096" and/or "-m mmap" to enable this mode.
>
> Second patch changes test suite to move to nonblocking io, this breaks mptcp
> because mptcp_poll can signal EPOLLIN when it shouldn't, so userspace gets
> -EAGAIN even though poll told it otherwise.
>
> Patches 3/4/5/6 are an update vs. last wmem accounting series.
> Remaining patches fix the nonblocking io behaviour.
>
> mptcp_poll is made to be stand-alone, i.e. it no longer calls
> __tcp_poll on the subflow sockets and only considers mptcp_sk state.
>
> After this series the selftest works again and mptcp sk rtx queue is
> limited by msk wmem.
>
> The patches can't easily be rebased/merged so I propose that I would
> squash this myself and send a pull request when done.
>
> The following changes since commit c925affd3fa9dd18e6c326cff510bd30087f0ac8:
>
>  subflow: wake parent mptcp socket on subflow state change (2019-11-18 02:44:11 +0000)
>
> are available in the Git repository at:
>
>  git://git.breakpoint.cc/fw/mptcp-next.git tcp_poll_removal_08
>
> for you to fetch changes up to 2bbcfcd464c6ff1626800cc2e7c02085e02f35e8:
>
>  sendmsg: truncate source buffer if mptcp sndbuf size was set from userspace (2019-11-18 17:45:44 +0100)
>
> ----------------------------------------------------------------
> Florian Westphal (13):
>      selftest: add mmap-write support
>      selftests: make sockets non-blocking for default poll mode
>      mptcp: add wmem_queued accounting
>      mptcp: allow partial cleaning of rtx head dfrag
>      mptcp: add and use mptcp RTX flag
>      sendmsg: block until mptcp sk is writeable
>      subflow: sk_data_ready: make wakeup on tcp sock conditional
>      mptcp: add and use mptcp_subflow_get_retrans
>      mptcp: sendmsg: transmit on backup if other subflows have been closed
>      recv: make DATA_READY reflect ssk in-sequence state
>      sendmsg: clear SEND_SPACE if write caused wmem to grow too large
>      mptcp_poll: don't consider subflow socket state anymore
>      sendmsg: truncate source buffer if mptcp sndbuf size was set from userspace
>
> net/mptcp/options.c                                |   2 +-
> net/mptcp/protocol.c                               | 249 ++++++++++++++++---
> net/mptcp/protocol.h                               |   4 +-
> net/mptcp/subflow.c                                |  12 +-
> tools/testing/selftests/net/mptcp/mptcp_connect.c  | 268 ++++++++++++++++++++-
> tools/testing/selftests/net/mptcp/mptcp_connect.sh |  36 ++-
> 6 files changed, 518 insertions(+), 53 deletions(-)
> _______________________________________________
> mptcp mailing list -- mptcp@lists.01.org
> To unsubscribe send an email to mptcp-leave@lists.01.org
>

--
Mat Martineau
Intel
Florian Westphal Nov. 20, 2019, 3:10 p.m. UTC | #3
Mat Martineau <mathew.j.martineau@linux.intel.com> wrote:
> > There were doubts as to the repeated invocations of mptcp_send_frag()
> > (RFC patch 13/14, "sendmsg: don't restart mptcp_sendmsg_frag"), I have
> > dropped it from this iteration -- mptcp_sendmsg_frag needs rework which
> > isn't related to this patch series.
> 
> The series looks good to me, with one minor comment on patch 1.

Thanks, I can fix that up.

How to proceed?  There are several patches in-flight.  I think it might
make sense to apply this series as-is, with no squashing.

We can then do another squash-marathon later on before next upstream
submission.

Otherwise, I could start to squash this series here locally and then
send a pull request.

What do you think?

Thanks!
Paolo Abeni Nov. 20, 2019, 3:25 p.m. UTC | #4
On Wed, 2019-11-20 at 16:10 +0100, Florian Westphal wrote:
> Mat Martineau <mathew.j.martineau@linux.intel.com> wrote:
> > > There were doubts as to the repeated invocations of mptcp_send_frag()
> > > (RFC patch 13/14, "sendmsg: don't restart mptcp_sendmsg_frag"), I have
> > > dropped it from this iteration -- mptcp_sendmsg_frag needs rework which
> > > isn't related to this patch series.
> > 
> > The series looks good to me, with one minor comment on patch 1.
> 
> Thanks, I can fix that up.
> 
> How to proceed?  There are several patches in-flight.  I think it might
> make sense to apply this series as-is, with no squashing.

I'm personally ok with that. 

Can we wait for the series 'Awaiting Upstream' being applied, before
going with this one?

Than v1 support can be rebased on top of the resulting tree (still to
be applied just before self-tests).

> We can then do another squash-marathon later on before next upstream
> submission.

Yep, we will likely need some kind of "code-freeze", and squash/bugfix
only before the submission.

Cheers,

Paolo
Matthieu Baerts Nov. 20, 2019, 3:36 p.m. UTC | #5
Hi Florian, Paolo,

On 20/11/2019 16:25, Paolo Abeni wrote:
> On Wed, 2019-11-20 at 16:10 +0100, Florian Westphal wrote:
>> Mat Martineau <mathew.j.martineau@linux.intel.com> wrote:
>>>> There were doubts as to the repeated invocations of mptcp_send_frag()
>>>> (RFC patch 13/14, "sendmsg: don't restart mptcp_sendmsg_frag"), I have
>>>> dropped it from this iteration -- mptcp_sendmsg_frag needs rework which
>>>> isn't related to this patch series.
>>>
>>> The series looks good to me, with one minor comment on patch 1.
>>
>> Thanks, I can fix that up.
>>
>> How to proceed?  There are several patches in-flight.  I think it might
>> make sense to apply this series as-is, with no squashing.
> 
> I'm personally ok with that.
> 
> Can we wait for the series 'Awaiting Upstream' being applied, before
> going with this one?

I was replying to Florian to say that I was going to do that :)
I would like to apply these two series:

   https://patchwork.ozlabs.org/project/mptcp/list/?state=8&delegate=100661

Ideally, I would also like to move commits after kselftests introduction 
but this can wait if needed.

> Than v1 support can be rebased on top of the resulting tree (still to
> be applied just before self-tests).
> 
>> We can then do another squash-marathon later on before next upstream
>> submission.
> 
> Yep, we will likely need some kind of "code-freeze", and squash/bugfix
> only before the submission.

We just need to be careful that if we add a lot of modifications at the 
end of the series, it can make other new patches difficult to squash 
(based on new code, we need to apply on older code, then resolve 
conflicts, etc.).

But I understand that it takes time to do that and we might want to 
postpone that. As long as it is not too long, otherwise we might have to 
queue a few patches for a future squash :)

If we agree to put these 13 patches at the end (+ the two to include 
MPTCPv1 support?), I can easily do that when I get the go!

Cheers,
Matt
Paolo Abeni Nov. 20, 2019, 3:49 p.m. UTC | #6
On Wed, 2019-11-20 at 16:36 +0100, Matthieu Baerts wrote:
> Hi Florian, Paolo,
> 
> On 20/11/2019 16:25, Paolo Abeni wrote:
> > On Wed, 2019-11-20 at 16:10 +0100, Florian Westphal wrote:
> > > Mat Martineau <mathew.j.martineau@linux.intel.com> wrote:
> > > > > There were doubts as to the repeated invocations of mptcp_send_frag()
> > > > > (RFC patch 13/14, "sendmsg: don't restart mptcp_sendmsg_frag"), I have
> > > > > dropped it from this iteration -- mptcp_sendmsg_frag needs rework which
> > > > > isn't related to this patch series.
> > > > 
> > > > The series looks good to me, with one minor comment on patch 1.
> > > 
> > > Thanks, I can fix that up.
> > > 
> > > How to proceed?  There are several patches in-flight.  I think it might
> > > make sense to apply this series as-is, with no squashing.
> > 
> > I'm personally ok with that.
> > 
> > Can we wait for the series 'Awaiting Upstream' being applied, before
> > going with this one?
> 
> I was replying to Florian to say that I was going to do that :)
> I would like to apply these two series:
> 
>    https://patchwork.ozlabs.org/project/mptcp/list/?state=8&delegate=100661
> 
> Ideally, I would also like to move commits after kselftests introduction 
> but this can wait if needed.

If you could do that, I would appreciate the effort a lot: v1 patches
rebasing could be started just after that.

Note: will cause some conflict on Makefile, due to pm.c being added
after ctrl.c

Cheers,

Paolo
Matthieu Baerts Nov. 20, 2019, 4:04 p.m. UTC | #7
Hi Paolo,

On 20/11/2019 16:49, Paolo Abeni wrote:
> On Wed, 2019-11-20 at 16:36 +0100, Matthieu Baerts wrote:
>> Hi Florian, Paolo,
>>
>> On 20/11/2019 16:25, Paolo Abeni wrote:
>>> On Wed, 2019-11-20 at 16:10 +0100, Florian Westphal wrote:
>>>> Mat Martineau <mathew.j.martineau@linux.intel.com> wrote:
>>>>>> There were doubts as to the repeated invocations of mptcp_send_frag()
>>>>>> (RFC patch 13/14, "sendmsg: don't restart mptcp_sendmsg_frag"), I have
>>>>>> dropped it from this iteration -- mptcp_sendmsg_frag needs rework which
>>>>>> isn't related to this patch series.
>>>>>
>>>>> The series looks good to me, with one minor comment on patch 1.
>>>>
>>>> Thanks, I can fix that up.
>>>>
>>>> How to proceed?  There are several patches in-flight.  I think it might
>>>> make sense to apply this series as-is, with no squashing.
>>>
>>> I'm personally ok with that.
>>>
>>> Can we wait for the series 'Awaiting Upstream' being applied, before
>>> going with this one?
>>
>> I was replying to Florian to say that I was going to do that :)
>> I would like to apply these two series:
>>
>>     https://patchwork.ozlabs.org/project/mptcp/list/?state=8&delegate=100661
>>
>> Ideally, I would also like to move commits after kselftests introduction
>> but this can wait if needed.
> 
> If you could do that, I would appreciate the effort a lot: v1 patches
> rebasing could be started just after that.

Great, I am then going to:

- apply the two series that are waiting to be applied
- then move these 3 commits after "mptcp: add basic kselftest for mptcp":

   mptcp: Add handling of incoming MP_JOIN requests
   mptcp: Add ADD_ADDR handling
   mptcp: Add path manager interface

- then can I add Florian' series at the end?

> Note: will cause some conflict on Makefile, due to pm.c being added
> after ctrl.c

That's nothing :-)
But thank you for the notice!

Cheers,
Matt
Florian Westphal Nov. 20, 2019, 4:12 p.m. UTC | #8
Matthieu Baerts <matthieu.baerts@tessares.net> wrote:
> Great, I am then going to:
> 
> - apply the two series that are waiting to be applied
> - then move these 3 commits after "mptcp: add basic kselftest for mptcp":
> 
>   mptcp: Add handling of incoming MP_JOIN requests
>   mptcp: Add ADD_ADDR handling
>   mptcp: Add path manager interface
> 
> - then can I add Florian' series at the end?

Works for me.  If you like you can also ping me once
you're done with the pending series and I can send you a pull request
with my patches applied on top.

Whatever makes your life easier.  Let me know.
Matthieu Baerts Nov. 20, 2019, 8:10 p.m. UTC | #9
Hi Florian,

On 20/11/2019 17:12, Florian Westphal wrote:
> Matthieu Baerts <matthieu.baerts@tessares.net> wrote:
>> Great, I am then going to:
>>
>> - apply the two series that are waiting to be applied
>> - then move these 3 commits after "mptcp: add basic kselftest for mptcp":
>>
>>    mptcp: Add handling of incoming MP_JOIN requests
>>    mptcp: Add ADD_ADDR handling
>>    mptcp: Add path manager interface
>>
>> - then can I add Florian' series at the end?
> 
> Works for me.  If you like you can also ping me once
> you're done with the pending series and I can send you a pull request
> with my patches applied on top.
> 
> Whatever makes your life easier.  Let me know.

Thank you for the offer!

I only had some conflicts with mptcp_connect.sh due to the modifications 
I recently did and applied. It was then easy for me to resolve this.

I added all the 13 patches at the end of the series. Please tell me if I 
can do more (squash some commits in others) and/or stop the CI to sync 
with the latest net-next.

Cheers,
Matt