Message ID | 20200312154901.540-1-fw@strlen.de |
---|---|
State | Superseded, archived |
Delegated to: | Paolo Abeni |
Headers | show |
Series | tcp: mptcp: use mptcp receive buffer space to select rcv window | expand |
On Thu, 12 Mar 2020, Florian Westphal wrote: > In MPTCP, the receive windo is shared across all subflows, because it > refers to the mptcp-level sequence space. > > This commit doesn't change choice of initial window for passive or active > connections. > While it would be possible to change those as well, this adds complexity > (especially when handling MP_JOIN requests). > > However, the MPTCP RFC specifically says that a MPTCP sender > 'MUST NOT use the RCV.WND field of a TCP segment at the connection level if > it does not also carry a DSS option with a Data ACK field.' > > SYN/SYNACK packets do not carry a DSS option with a Data ACK field. > > Signed-off-by: Florian Westphal <fw@strlen.de> > --- > Unless there are comments i will post this as RFC to netdev and will > CC Eric Dumazet. > I think this looks ready for netdev RFC. Thanks. -- Mat Martineau Intel
Hi, On Thu, 2020-03-12 at 16:49 +0100, Florian Westphal wrote: > diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c > index 306e25d743e8..b786da1db76a 100644 > --- a/net/ipv4/tcp_output.c > +++ b/net/ipv4/tcp_output.c > @@ -2771,6 +2771,12 @@ u32 __tcp_select_window(struct sock *sk) > int full_space = min_t(int, tp->window_clamp, allowed_space); > int window; I'm wondering if we should move the above assigments (full_space = ..., allowed_space = ...) to an else branch below - no strong opinion, mptcp path will be slower anyway. Additionally, can we have a single helper fetching both mptcp full and allowed space? The mptcp code will have a single back and fourth from TCP to MPTCP code (likely less icache miss) and the TCP code section will be smaller (a single call instead of two). > + if (sk_is_mptcp(sk)) { > + allowed_space = mptcp_full_space(sk); > + free_space = mptcp_space(sk); > + full_space = min_t(int, tp->window_clamp, allowed_space); > + } > + > if (unlikely(mss > full_space)) { > mss = full_space; > if (mss <= 0) > diff --git a/net/mptcp/subflow.c b/net/mptcp/subflow.c > index f9fe60599f79..52d76af75b7f 100644 > --- a/net/mptcp/subflow.c > +++ b/net/mptcp/subflow.c > @@ -738,6 +738,30 @@ bool mptcp_subflow_data_available(struct sock *sk) > return subflow->data_avail; > } > > +/* If ssk has an mptcp parent socket, use the mptcp rcvbuf occupancy, > + * not the ssk one. > + * > + * In mptcp, rwin is about the mptcp-level connection data. > + * > + * Data that is still on the ssk rx queue can thus be ignored, > + * as far as mptcp peer is concerened that data is still inflight. > + */ > +int mptcp_full_space(const struct sock *ssk) > +{ > + const struct mptcp_subflow_context *subflow = mptcp_subflow_ctx(ssk); > + const struct sock *sk = READ_ONCE(subflow->conn); > + > + return sk ? tcp_full_space(sk) : tcp_full_space(ssk); After commit 58b09919626b "mptcp: create msk early" subflow->conn is always != NULL for MPTCP sockets. Cheers, Paolo
diff --git a/include/net/mptcp.h b/include/net/mptcp.h index 7489f9267640..611a4d959470 100644 --- a/include/net/mptcp.h +++ b/include/net/mptcp.h @@ -66,6 +66,9 @@ static inline bool rsk_is_mptcp(const struct request_sock *req) return tcp_rsk(req)->is_mptcp; } +int mptcp_full_space(const struct sock *ssk); +int mptcp_space(const struct sock *ssk); + void mptcp_parse_option(const struct sk_buff *skb, const unsigned char *ptr, int opsize, struct tcp_options_received *opt_rx); bool mptcp_syn_options(struct sock *sk, const struct sk_buff *skb, @@ -195,6 +198,16 @@ static inline bool mptcp_sk_is_subflow(const struct sock *sk) return false; } +static inline int mptcp_space(struct sock *ssk) +{ + return 0; +} + +static inline int mptcp_full_space(struct sock *ssk) +{ + return 0; +} + static inline void mptcp_seq_show(struct seq_file *seq) { } #endif /* CONFIG_MPTCP */ diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c index 306e25d743e8..b786da1db76a 100644 --- a/net/ipv4/tcp_output.c +++ b/net/ipv4/tcp_output.c @@ -2771,6 +2771,12 @@ u32 __tcp_select_window(struct sock *sk) int full_space = min_t(int, tp->window_clamp, allowed_space); int window; + if (sk_is_mptcp(sk)) { + allowed_space = mptcp_full_space(sk); + free_space = mptcp_space(sk); + full_space = min_t(int, tp->window_clamp, allowed_space); + } + if (unlikely(mss > full_space)) { mss = full_space; if (mss <= 0) diff --git a/net/mptcp/subflow.c b/net/mptcp/subflow.c index f9fe60599f79..52d76af75b7f 100644 --- a/net/mptcp/subflow.c +++ b/net/mptcp/subflow.c @@ -738,6 +738,30 @@ bool mptcp_subflow_data_available(struct sock *sk) return subflow->data_avail; } +/* If ssk has an mptcp parent socket, use the mptcp rcvbuf occupancy, + * not the ssk one. + * + * In mptcp, rwin is about the mptcp-level connection data. + * + * Data that is still on the ssk rx queue can thus be ignored, + * as far as mptcp peer is concerened that data is still inflight. + */ +int mptcp_full_space(const struct sock *ssk) +{ + const struct mptcp_subflow_context *subflow = mptcp_subflow_ctx(ssk); + const struct sock *sk = READ_ONCE(subflow->conn); + + return sk ? tcp_full_space(sk) : tcp_full_space(ssk); +} + +int mptcp_space(const struct sock *ssk) +{ + const struct mptcp_subflow_context *subflow = mptcp_subflow_ctx(ssk); + const struct sock *sk = READ_ONCE(subflow->conn); + + return sk ? tcp_space(sk) : tcp_space(ssk); +} + static void subflow_data_ready(struct sock *sk) { struct mptcp_subflow_context *subflow = mptcp_subflow_ctx(sk);
In MPTCP, the receive windo is shared across all subflows, because it refers to the mptcp-level sequence space. This commit doesn't change choice of initial window for passive or active connections. While it would be possible to change those as well, this adds complexity (especially when handling MP_JOIN requests). However, the MPTCP RFC specifically says that a MPTCP sender 'MUST NOT use the RCV.WND field of a TCP segment at the connection level if it does not also carry a DSS option with a Data ACK field.' SYN/SYNACK packets do not carry a DSS option with a Data ACK field. Signed-off-by: Florian Westphal <fw@strlen.de> --- Unless there are comments i will post this as RFC to netdev and will CC Eric Dumazet. include/net/mptcp.h | 13 +++++++++++++ net/ipv4/tcp_output.c | 6 ++++++ net/mptcp/subflow.c | 24 ++++++++++++++++++++++++ 3 files changed, 43 insertions(+)