Message ID | 20191217203807.12579-8-mathew.j.martineau@linux.intel.com |
---|---|
State | Changes Requested |
Delegated to: | David Miller |
Headers | show |
Series | Multipath TCP: Prerequisites | expand |
On 12/17/19 12:38 PM, Mat Martineau wrote: > The MPTCP extension data needs to be preserved as it passes through the > TCP stack. Make sure that these skbs are not appended to others during > coalesce or collapse, so the data remains associated with the payload of > the given skb. > This seems a very pessimistic change to me. Are you planing later to refine this limitation ? Surely if a sender sends TSO packet, we allow all the segments being aggregated at receive side either by GRO or TCP coalescing.
From: Eric Dumazet <eric.dumazet@gmail.com> Date: Wed, 18 Dec 2019 11:50:24 -0800 > On 12/17/19 12:38 PM, Mat Martineau wrote: >> The MPTCP extension data needs to be preserved as it passes through the >> TCP stack. Make sure that these skbs are not appended to others during >> coalesce or collapse, so the data remains associated with the payload of >> the given skb. > > This seems a very pessimistic change to me. > > Are you planing later to refine this limitation ? > > Surely if a sender sends TSO packet, we allow all the segments > being aggregated at receive side either by GRO or TCP coalescing. This turns off absolutely crucial functional elements of our TCP stack, and will avoid all of the machinery that avoids wastage in TCP packets sitting in the various queues. skb->truesize management, etc. I will not apply these patches with such a non-trivial regression in place for MPTCP streams, sorry.
On Wed, 18 Dec 2019, David Miller wrote: > From: Eric Dumazet <eric.dumazet@gmail.com> > Date: Wed, 18 Dec 2019 11:50:24 -0800 > >> On 12/17/19 12:38 PM, Mat Martineau wrote: >>> The MPTCP extension data needs to be preserved as it passes through the >>> TCP stack. Make sure that these skbs are not appended to others during >>> coalesce or collapse, so the data remains associated with the payload of >>> the given skb. >> >> This seems a very pessimistic change to me. >> >> Are you planing later to refine this limitation ? >> >> Surely if a sender sends TSO packet, we allow all the segments >> being aggregated at receive side either by GRO or TCP coalescing. > > This turns off absolutely crucial functional elements of our TCP > stack, and will avoid all of the machinery that avoids wastage in TCP > packets sitting in the various queues. skb->truesize management, etc. > > I will not apply these patches with such a non-trivial regression in > place for MPTCP streams, sorry. > Ok, understood. Not every packet has this MPTCP extension data so coalescing was not always turned off, but given the importance of avoiding this memory waste I'll confirm GRO behavior and work on maintaining coalesce/collapse with identical MPTCP extension data. -- Mat Martineau Intel
> Ok, understood. Not every packet has this MPTCP extension data so > coalescing was not always turned off, but given the importance of > avoiding > > this memory waste I'll confirm GRO behavior and work on maintaining > coalesce/collapse with identical MPTCP extension data. > Hi Mat Are identical MPTCP extensions a common case? AFAIK the data sequence number and the subflow sequence number change per packet even in a single stream scenario.
On Wed, 2019-12-18 at 14:36 -0700, subashab@codeaurora.org wrote: > > Ok, understood. Not every packet has this MPTCP extension data so > > coalescing was not always turned off, but given the importance of > > avoiding > > > > this memory waste I'll confirm GRO behavior and work on maintaining > > coalesce/collapse with identical MPTCP extension data. > > > > Hi Mat > > Are identical MPTCP extensions a common case? Yes, they are. > AFAIK the data sequence number and the subflow sequence number change > per packet even in a single stream scenario. What actually change on per packet basis is the TCP sequence number. The DSS mapping can spawn on multiple packets, and will have constand (base) sequence number and length. Cheers, Paolo
On Wed, 2019-12-18 at 12:45 -0800, David Miller wrote: > From: Eric Dumazet <eric.dumazet@gmail.com> > Date: Wed, 18 Dec 2019 11:50:24 -0800 > > > On 12/17/19 12:38 PM, Mat Martineau wrote: > >> The MPTCP extension data needs to be preserved as it passes through the > >> TCP stack. Make sure that these skbs are not appended to others during > >> coalesce or collapse, so the data remains associated with the payload of > >> the given skb. > > > > This seems a very pessimistic change to me. > > > > Are you planing later to refine this limitation ? > > > > Surely if a sender sends TSO packet, we allow all the segments > > being aggregated at receive side either by GRO or TCP coalescing. > > This turns off absolutely crucial functional elements of our TCP > stack, and will avoid all of the machinery that avoids wastage in TCP > packets sitting in the various queues. skb->truesize management, etc. Thank you for the feedback! Just to clarify, with the code we have currently posted TSO trains of MPTCP packets can be aggregated by the GRO engine almost exactly as currently happens for plain TCP packets. We still have chances to aggregate packets belonging to a MPTCP stream, as not all of them carry a DSS option. We opted to not coalesce at the TCP level for the moment to avoid adding additional hook code inside the coalescing code. If you are ok without such hooks in the initial version, we can handle MPTCP coalescing, too. The real work will likely land in part 2. Would that fit you? Thanks, Paolo
From: Paolo Abeni <pabeni@redhat.com> Date: Wed, 18 Dec 2019 23:15:46 +0100 > Just to clarify, with the code we have currently posted TSO trains of > MPTCP packets can be aggregated by the GRO engine almost exactly as > currently happens for plain TCP packets. > > We still have chances to aggregate packets belonging to a MPTCP stream, > as not all of them carry a DSS option. > > We opted to not coalesce at the TCP level for the moment to avoid > adding additional hook code inside the coalescing code. > > If you are ok without such hooks in the initial version, we can handle > MPTCP coalescing, too. The real work will likely land in part 2. > > Would that fit you? Can't you someone encode the MPTCP metadata so that it will only apply to a range or something like that? Would that help?
diff --git a/include/net/mptcp.h b/include/net/mptcp.h index f9f668ac4339..43ddfdf9e4a3 100644 --- a/include/net/mptcp.h +++ b/include/net/mptcp.h @@ -8,6 +8,7 @@ #ifndef __NET_MPTCP_H #define __NET_MPTCP_H +#include <linux/skbuff.h> #include <linux/types.h> /* MPTCP sk_buff extension data */ @@ -24,4 +25,19 @@ struct mptcp_ext { __unused:2; }; +#ifdef CONFIG_MPTCP + +static inline bool mptcp_skb_ext_exist(const struct sk_buff *skb) +{ + return skb_ext_exist(skb, SKB_EXT_MPTCP); +} + +#else + +static inline bool mptcp_skb_ext_exist(const struct sk_buff *skb) +{ + return false; +} + +#endif /* CONFIG_MPTCP */ #endif /* __NET_MPTCP_H */ diff --git a/include/net/tcp.h b/include/net/tcp.h index c82b2f75d024..c483c73b8d41 100644 --- a/include/net/tcp.h +++ b/include/net/tcp.h @@ -39,6 +39,7 @@ #include <net/tcp_states.h> #include <net/inet_ecn.h> #include <net/dst.h> +#include <net/mptcp.h> #include <linux/seq_file.h> #include <linux/memcontrol.h> @@ -978,6 +979,13 @@ static inline bool tcp_skb_can_collapse_to(const struct sk_buff *skb) return likely(!TCP_SKB_CB(skb)->eor); } +static inline bool tcp_skb_can_collapse(const struct sk_buff *to, + const struct sk_buff *from) +{ + return likely(tcp_skb_can_collapse_to(to) && + !mptcp_skb_ext_exist(from)); +} + /* Events passed to congestion control interface */ enum tcp_ca_event { CA_EVENT_TX_START, /* first transmit when no packets in flight */ diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c index 88b987ca9ebb..55b460a2ece2 100644 --- a/net/ipv4/tcp_input.c +++ b/net/ipv4/tcp_input.c @@ -1422,7 +1422,7 @@ static struct sk_buff *tcp_shift_skb_data(struct sock *sk, struct sk_buff *skb, if ((TCP_SKB_CB(prev)->sacked & TCPCB_TAGBITS) != TCPCB_SACKED_ACKED) goto fallback; - if (!tcp_skb_can_collapse_to(prev)) + if (!tcp_skb_can_collapse(prev, skb)) goto fallback; in_sack = !after(start_seq, TCP_SKB_CB(skb)->seq) && @@ -4420,6 +4420,9 @@ static bool tcp_try_coalesce(struct sock *sk, if (TCP_SKB_CB(from)->seq != TCP_SKB_CB(to)->end_seq) return false; + if (mptcp_skb_ext_exist(from)) + return false; + #ifdef CONFIG_TLS_DEVICE if (from->decrypted != to->decrypted) return false; @@ -4928,10 +4931,12 @@ tcp_collapse(struct sock *sk, struct sk_buff_head *list, struct rb_root *root, /* The first skb to collapse is: * - not SYN/FIN and + * - does not include a MPTCP skb extension * - bloated or contains data before "start" or * overlaps to the next one. */ if (!(TCP_SKB_CB(skb)->tcp_flags & (TCPHDR_SYN | TCPHDR_FIN)) && + !mptcp_skb_ext_exist(skb) && (tcp_win_from_space(sk, skb->truesize) > skb->len || before(TCP_SKB_CB(skb)->seq, start))) { end_of_skbs = false; @@ -4947,7 +4952,7 @@ tcp_collapse(struct sock *sk, struct sk_buff_head *list, struct rb_root *root, /* Decided to skip this, advance start seq. */ start = TCP_SKB_CB(skb)->end_seq; } - if (end_of_skbs || + if (end_of_skbs || mptcp_skb_ext_exist(skb) || (TCP_SKB_CB(skb)->tcp_flags & (TCPHDR_SYN | TCPHDR_FIN))) return; @@ -4990,6 +4995,7 @@ tcp_collapse(struct sock *sk, struct sk_buff_head *list, struct rb_root *root, skb = tcp_collapse_one(sk, skb, list, root); if (!skb || skb == tail || + mptcp_skb_ext_exist(skb) || (TCP_SKB_CB(skb)->tcp_flags & (TCPHDR_SYN | TCPHDR_FIN))) goto end; #ifdef CONFIG_TLS_DEVICE diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c index b184f03d7437..9e04d45bc0e4 100644 --- a/net/ipv4/tcp_output.c +++ b/net/ipv4/tcp_output.c @@ -2854,7 +2854,7 @@ static void tcp_retrans_try_collapse(struct sock *sk, struct sk_buff *to, if (!tcp_can_collapse(sk, skb)) break; - if (!tcp_skb_can_collapse_to(to)) + if (!tcp_skb_can_collapse(to, skb)) break; space -= skb->len;
The MPTCP extension data needs to be preserved as it passes through the TCP stack. Make sure that these skbs are not appended to others during coalesce or collapse, so the data remains associated with the payload of the given skb. Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> --- include/net/mptcp.h | 16 ++++++++++++++++ include/net/tcp.h | 8 ++++++++ net/ipv4/tcp_input.c | 10 ++++++++-- net/ipv4/tcp_output.c | 2 +- 4 files changed, 33 insertions(+), 3 deletions(-)