diff mbox series

Re: [PATCH mptcp-next 2/2] mptcp: add receive buffer auto-tuning

Message ID 20200616160912.GI3048@MacBook-Pro-64.local
State Superseded, archived
Headers show
Series Re: [PATCH mptcp-next 2/2] mptcp: add receive buffer auto-tuning | expand

Commit Message

Christoph Paasch June 16, 2020, 4:09 p.m. UTC
Hello,

On 06/10/20 - 13:55, Florian Westphal wrote:
> When mptcp is used, userspace doesn't read from the tcp (subflow)
> socket but from the parent (mptcp) socket receive queue.
> 
> skbs are moved from the subflow socket to the mptcp rx queue either from
> 'data_ready' callback (if mptcp socket can be locked), a work queue, or
> the socket receive function.
> 
> This means tcp_rcv_space_adjust() is never called and thus no receive
> buffer size auto-tuning is done.
> 
> An earlier (not merged) patch added tcp_rcv_space_adjust() calls to the
> function that moves skbs from subflow to mptcp socket.
> While this enabled autotuning, it also meant tuning was done even if
> userspace was reading the mptcp socket very slowly.
> 
> This adds mptcp_rcv_space_adjust() and calls it after userspace has
> read data from the mptcp socket rx queue.
> 
> Its very similar to tcp_rcv_space_adjust, with two differences:
> 
> 1. The rtt estimate is the largest one observed on a subflow
> 2. The rcvbuf size and window clamp of all subflows is adjusted
>    to the mptcp-level rcvbuf.
> 
> Otherwise, we get spurious drops at tcp (subflow) socket level if
> the skbs are not moved to the mptcp socket fast enough and reduced
> throughput..
> 
> Before:
> time mptcp_connect.sh -t -f $((4*1024*1024)) -d 300 -l 0.01% -r 0 -e "" -m mmap
> [..]
> ns4 MPTCP -> ns3 (10.0.3.2:10108      ) MPTCP   (duration 40562ms) [ OK ]
> ns4 MPTCP -> ns3 (10.0.3.2:10109      ) TCP     (duration  5415ms) [ OK ]
> ns4 TCP   -> ns3 (10.0.3.2:10110      ) MPTCP   (duration  5413ms) [ OK ]
> ns4 MPTCP -> ns3 (dead:beef:3::2:10111) MPTCP   (duration 41331ms) [ OK ]
> ns4 MPTCP -> ns3 (dead:beef:3::2:10112) TCP     (duration  5415ms) [ OK ]
> ns4 TCP   -> ns3 (dead:beef:3::2:10113) MPTCP   (duration  5714ms) [ OK ]
> Time: 846 seconds
> 
> After:
> ns4 MPTCP -> ns3 (10.0.3.2:10108      ) MPTCP   (duration  5417ms) [ OK ]
> ns4 MPTCP -> ns3 (10.0.3.2:10109      ) TCP     (duration  5429ms) [ OK ]
> ns4 TCP   -> ns3 (10.0.3.2:10110      ) MPTCP   (duration  5418ms) [ OK ]
> ns4 MPTCP -> ns3 (dead:beef:3::2:10111) MPTCP   (duration  5423ms) [ OK ]
> ns4 MPTCP -> ns3 (dead:beef:3::2:10112) TCP     (duration  5715ms) [ OK ]
> ns4 TCP   -> ns3 (dead:beef:3::2:10113) MPTCP   (duration  5415ms) [ OK ]
> Time: 275 seconds
> 
> Signed-off-by: Florian Westphal <fw@strlen.de>
> ---
>  changes since last version:
>   - adjust for 'mptcp fallback' patches
>   - add 'const' qualifier to function argument

syzkaller found another divide-by-zero. I don't yet have a real repro
generated by syzkaller, but from the logs it looks like it is the
simultaneous self-connect case.

In Davide's other patch I think we need to call mptcp_rcv_space_init in this
hunk:




Christoph
diff mbox series

Patch

diff --git a/net/mptcp/subflow.c b/net/mptcp/subflow.c
index fc3be8ab2168..781836f93a68 100644
--- a/net/mptcp/subflow.c
+++ b/net/mptcp/subflow.c
@@ -1114,6 +1114,16 @@  static void subflow_state_change(struct sock *sk)

        __subflow_state_change(sk);

+       if (subflow_simultaneous_connect(sk)) {
+               mptcp_do_fallback(sk);
+               pr_fallback(mptcp_sk(parent));
+               subflow->conn_finished = 1;
+               if (inet_sk_state_load(parent) == TCP_SYN_SENT) {
+                       inet_sk_state_store(parent, TCP_ESTABLISHED);
+                       parent->sk_state_change(parent);
+               }
+       }
+
        /* as recvmsg() does not acquire the subflow socket for ssk selection
         * a fin packet carrying a DSS can be unnoticed if we don't trigger
         * the data available machinery here.