Message ID | 1571425340-7082-1-git-send-email-jbaron@akamai.com |
---|---|
State | Changes Requested |
Delegated to: | David Miller |
Headers | show |
Series | [net-next] tcp: add TCP_INFO status for failed client TFO | expand |
On Fri, Oct 18, 2019 at 3:03 PM Jason Baron <jbaron@akamai.com> wrote: > > The TCPI_OPT_SYN_DATA bit as part of tcpi_options currently reports whether > or not data-in-SYN was ack'd on both the client and server side. We'd like > to gather more information on the client-side in the failure case in order > to indicate the reason for the failure. This can be useful for not only > debugging TFO, but also for creating TFO socket policies. For example, if > a middle box removes the TFO option or drops a data-in-SYN, we can > can detect this case, and turn off TFO for these connections saving the > extra retransmits. > > The newly added tcpi_fastopen_client_fail status is 2 bits and has 4 > states: > > 1) TFO_STATUS_UNSPEC > > catch-all. > > 2) TFO_NO_COOKIE_SENT > > If TFO_CLIENT_NO_COOKIE mode is off, this state indicates that no cookie > was sent because we don't have one yet, its not in cache or black-holing > may be enabled (already indicated by the global > LINUX_MIB_TCPFASTOPENBLACKHOLE). > > 3) TFO_NO_SYN_DATA > > Data was sent with SYN, we received a SYN/ACK but it did not cover the data > portion. Cookie is not accepted by server because the cookie may be invalid > or the server may be overloaded. > > > 4) TFO_NO_SYN_DATA_TIMEOUT > > Data was sent with SYN, we received a SYN/ACK which did not cover the data > after at least 1 additional SYN was sent (without data). It may be the case > that a middle-box is dropping data-in-SYN packets. Thus, it would be more > efficient to not use TFO on this connection to avoid extra retransmits > during connection establishment. > > These new fields certainly not cover all the cases where TFO may fail, but > other failures, such as SYN/ACK + data being dropped, will result in the > connection not becoming established. And a connection blackhole after > session establishment shows up as a stalled connection. > > Signed-off-by: Jason Baron <jbaron@akamai.com> > Cc: Eric Dumazet <edumazet@google.com> > Cc: Neal Cardwell <ncardwell@google.com> > Cc: Christoph Paasch <cpaasch@apple.com> > --- Thanks for adding this! It would be good to reset tp->fastopen_client_fail to 0 in tcp_disconnect(). > +/* why fastopen failed from client perspective */ > +enum tcp_fastopen_client_fail { > + TFO_STATUS_UNSPEC, /* catch-all */ > + TFO_NO_COOKIE_SENT, /* if not in TFO_CLIENT_NO_COOKIE mode */ > + TFO_NO_SYN_DATA, /* SYN-ACK did not ack SYN data */ I found the "TFO_NO_SYN_DATA" name a little unintuitive; it sounded to me like this means the client didn't send a SYN+DATA. What about "TFO_DATA_NOT_ACKED", or something like that? If you don't mind, it would be great to cc: Yuchung on the next rev. thanks, neal
Thanks for the patch. Detailed comments below On Fri, Oct 18, 2019 at 4:58 PM Neal Cardwell <ncardwell@google.com> wrote: > > On Fri, Oct 18, 2019 at 3:03 PM Jason Baron <jbaron@akamai.com> wrote: > > > > The TCPI_OPT_SYN_DATA bit as part of tcpi_options currently reports whether > > or not data-in-SYN was ack'd on both the client and server side. We'd like > > to gather more information on the client-side in the failure case in order > > to indicate the reason for the failure. This can be useful for not only > > debugging TFO, but also for creating TFO socket policies. For example, if > > a middle box removes the TFO option or drops a data-in-SYN, we can > > can detect this case, and turn off TFO for these connections saving the > > extra retransmits. > > > > The newly added tcpi_fastopen_client_fail status is 2 bits and has 4 > > states: > > > > 1) TFO_STATUS_UNSPEC > > > > catch-all. > > > > 2) TFO_NO_COOKIE_SENT > > > > If TFO_CLIENT_NO_COOKIE mode is off, this state indicates that no cookie > > was sent because we don't have one yet, its not in cache or black-holing > > may be enabled (already indicated by the global > > LINUX_MIB_TCPFASTOPENBLACKHOLE). It'd be useful to separate the two that cookie is available but is prohibited to use due to BH checking. We've seen users internally get confused due to lack of this info (after seeing cookies from ip metrics). > > > > 3) TFO_NO_SYN_DATA > > > > Data was sent with SYN, we received a SYN/ACK but it did not cover the data > > portion. Cookie is not accepted by server because the cookie may be invalid > > or the server may be overloaded. > > > > > > 4) TFO_NO_SYN_DATA_TIMEOUT > > > > Data was sent with SYN, we received a SYN/ACK which did not cover the data > > after at least 1 additional SYN was sent (without data). It may be the case > > that a middle-box is dropping data-in-SYN packets. Thus, it would be more > > efficient to not use TFO on this connection to avoid extra retransmits > > during connection establishment. > > > > These new fields certainly not cover all the cases where TFO may fail, but > > other failures, such as SYN/ACK + data being dropped, will result in the > > connection not becoming established. And a connection blackhole after > > session establishment shows up as a stalled connection. > > > > Signed-off-by: Jason Baron <jbaron@akamai.com> > > Cc: Eric Dumazet <edumazet@google.com> > > Cc: Neal Cardwell <ncardwell@google.com> > > Cc: Christoph Paasch <cpaasch@apple.com> > > --- > > Thanks for adding this! > > It would be good to reset tp->fastopen_client_fail to 0 in tcp_disconnect(). > > > +/* why fastopen failed from client perspective */ > > +enum tcp_fastopen_client_fail { > > + TFO_STATUS_UNSPEC, /* catch-all */ > > + TFO_NO_COOKIE_SENT, /* if not in TFO_CLIENT_NO_COOKIE mode */ > > + TFO_NO_SYN_DATA, /* SYN-ACK did not ack SYN data */ > > I found the "TFO_NO_SYN_DATA" name a little unintuitive; it sounded to > me like this means the client didn't send a SYN+DATA. What about > "TFO_DATA_NOT_ACKED", or something like that? > > If you don't mind, it would be great to cc: Yuchung on the next rev. TFO_DATA_NOT_ACKED is already available from the inverse of TCPI_OPT_SYN_DATA #define TCPI_OPT_SYN_DATA 32 /* SYN-ACK acked data in SYN sent or rcvd */ It occurs (3)(4) are already available indirectly from TCPI_OPT_SYN_DATA and tcpi_total_retrans together, but the socket must query tcpi_total_retrans right after connect/sendto returns which may not be preferred. How about an alternative proposal to the types to catch more TFO issues: TFO_STATUS_UNSPEC TFO_DISABLED_BLACKHOLE_DETECTED TFO_COOKIE_UNAVAILABLE TFO_SYN_RETRANSMITTED // use in conjunction w/ TCPI_OPT_SYN_DATA for (3)(4) > > thanks, > neal
On 10/21/19 2:02 PM, Yuchung Cheng wrote: > Thanks for the patch. Detailed comments below > > On Fri, Oct 18, 2019 at 4:58 PM Neal Cardwell <ncardwell@google.com> wrote: >> >> On Fri, Oct 18, 2019 at 3:03 PM Jason Baron <jbaron@akamai.com> wrote: >>> >>> The TCPI_OPT_SYN_DATA bit as part of tcpi_options currently reports whether >>> or not data-in-SYN was ack'd on both the client and server side. We'd like >>> to gather more information on the client-side in the failure case in order >>> to indicate the reason for the failure. This can be useful for not only >>> debugging TFO, but also for creating TFO socket policies. For example, if >>> a middle box removes the TFO option or drops a data-in-SYN, we can >>> can detect this case, and turn off TFO for these connections saving the >>> extra retransmits. >>> >>> The newly added tcpi_fastopen_client_fail status is 2 bits and has 4 >>> states: >>> >>> 1) TFO_STATUS_UNSPEC >>> >>> catch-all. >>> >>> 2) TFO_NO_COOKIE_SENT >>> >>> If TFO_CLIENT_NO_COOKIE mode is off, this state indicates that no cookie >>> was sent because we don't have one yet, its not in cache or black-holing >>> may be enabled (already indicated by the global >>> LINUX_MIB_TCPFASTOPENBLACKHOLE). > > It'd be useful to separate the two that cookie is available but is > prohibited to use due to BH checking. We've seen users internally get > confused due to lack of this info (after seeing cookies from ip > metrics). > ok, yeah i had been thinking about splitting these out but thought that the LINUX_MIB_TCPFASTOPENBLACKHOLE counter could help differentiate these cases - but I'm ok making it explicit. >>> >>> 3) TFO_NO_SYN_DATA >>> >>> Data was sent with SYN, we received a SYN/ACK but it did not cover the data >>> portion. Cookie is not accepted by server because the cookie may be invalid >>> or the server may be overloaded. >>> >>> >>> 4) TFO_NO_SYN_DATA_TIMEOUT >>> >>> Data was sent with SYN, we received a SYN/ACK which did not cover the data >>> after at least 1 additional SYN was sent (without data). It may be the case >>> that a middle-box is dropping data-in-SYN packets. Thus, it would be more >>> efficient to not use TFO on this connection to avoid extra retransmits >>> during connection establishment. >>> >>> These new fields certainly not cover all the cases where TFO may fail, but >>> other failures, such as SYN/ACK + data being dropped, will result in the >>> connection not becoming established. And a connection blackhole after >>> session establishment shows up as a stalled connection. >>> >>> Signed-off-by: Jason Baron <jbaron@akamai.com> >>> Cc: Eric Dumazet <edumazet@google.com> >>> Cc: Neal Cardwell <ncardwell@google.com> >>> Cc: Christoph Paasch <cpaasch@apple.com> >>> --- >> >> Thanks for adding this! >> >> It would be good to reset tp->fastopen_client_fail to 0 in tcp_disconnect(). >> >>> +/* why fastopen failed from client perspective */ >>> +enum tcp_fastopen_client_fail { >>> + TFO_STATUS_UNSPEC, /* catch-all */ >>> + TFO_NO_COOKIE_SENT, /* if not in TFO_CLIENT_NO_COOKIE mode */ >>> + TFO_NO_SYN_DATA, /* SYN-ACK did not ack SYN data */ >> >> I found the "TFO_NO_SYN_DATA" name a little unintuitive; it sounded to >> me like this means the client didn't send a SYN+DATA. What about >> "TFO_DATA_NOT_ACKED", or something like that? >> >> If you don't mind, it would be great to cc: Yuchung on the next rev. > TFO_DATA_NOT_ACKED is already available from the inverse of TCPI_OPT_SYN_DATA > #define TCPI_OPT_SYN_DATA 32 /* SYN-ACK acked data in SYN sent or rcvd */ > > It occurs (3)(4) are already available indirectly from > TCPI_OPT_SYN_DATA and tcpi_total_retrans together, but the socket must > query tcpi_total_retrans right after connect/sendto returns which may > not be preferred. > > How about an alternative proposal to the types to catch more TFO issues: > > TFO_STATUS_UNSPEC > TFO_DISABLED_BLACKHOLE_DETECTED > TFO_COOKIE_UNAVAILABLE > TFO_SYN_RETRANSMITTED // use in conjunction w/ TCPI_OPT_SYN_DATA for (3)(4) Ok, that set works for me. I will re-spin with these states for v2. Thanks for the suggestion! -Jason
Hello Jason, On Sat, Oct 19, 2019 at 11:10 AM Jason Baron <jbaron@akamai.com> wrote: > The TCPI_OPT_SYN_DATA bit as part of tcpi_options currently reports whether > or not data-in-SYN was ack'd on both the client and server side. We'd like > to gather more information on the client-side in the failure case in order > to indicate the reason for the failure. This can be useful for not only > debugging TFO, but also for creating TFO socket policies. For example, if > a middle box removes the TFO option or drops a data-in-SYN, we can > can detect this case, and turn off TFO for these connections saving the > extra retransmits. > > The newly added tcpi_fastopen_client_fail status is 2 bits and has 4 > states: > > 1) TFO_STATUS_UNSPEC > > catch-all. > > 2) TFO_NO_COOKIE_SENT > > If TFO_CLIENT_NO_COOKIE mode is off, this state indicates that no cookie > was sent because we don't have one yet, its not in cache or black-holing > may be enabled (already indicated by the global > LINUX_MIB_TCPFASTOPENBLACKHOLE). > > 3) TFO_NO_SYN_DATA > > Data was sent with SYN, we received a SYN/ACK but it did not cover the data > portion. Cookie is not accepted by server because the cookie may be invalid > or the server may be overloaded. > > > 4) TFO_NO_SYN_DATA_TIMEOUT > > Data was sent with SYN, we received a SYN/ACK which did not cover the data > after at least 1 additional SYN was sent (without data). It may be the case > that a middle-box is dropping data-in-SYN packets. Thus, it would be more > efficient to not use TFO on this connection to avoid extra retransmits > during connection establishment. > > These new fields certainly not cover all the cases where TFO may fail, but > other failures, such as SYN/ACK + data being dropped, will result in the > connection not becoming established. And a connection blackhole after > session establishment shows up as a stalled connection. I'm curious what would be the arguments compared to creating a new getsockopt() to fetch this data? Thanks,
On 10/21/19 11:49 AM, William Dauchy wrote: > I'm curious what would be the arguments compared to creating a new > getsockopt() to fetch this data? Reporting tsval/tsecr values were not well defined and seemed quite experimental to me. TCP fastopen would use 2 unused bits, not real extra cost here. This is persistent information after the connect(), while your tsval/tsecr report seems quite dynamic stuff can be stale as soon as the info is fetched from the kernel.
Hello Eric, On Mon, Oct 21, 2019 at 9:02 PM Eric Dumazet <eric.dumazet@gmail.com> wrote: > Reporting tsval/tsecr values were not well defined and seemed quite experimental to me. > > TCP fastopen would use 2 unused bits, not real extra cost here. > > This is persistent information after the connect(), while your tsval/tsecr report > seems quite dynamic stuff can be stale as soon as the info is fetched from the kernel. Thanks for your answer. I better understand the context.
On 21/10/19 - 14:27:24, Jason Baron wrote: > > > On 10/21/19 2:02 PM, Yuchung Cheng wrote: > > Thanks for the patch. Detailed comments below > > > > On Fri, Oct 18, 2019 at 4:58 PM Neal Cardwell <ncardwell@google.com> wrote: > >> > >> On Fri, Oct 18, 2019 at 3:03 PM Jason Baron <jbaron@akamai.com> wrote: > >>> > >>> The TCPI_OPT_SYN_DATA bit as part of tcpi_options currently reports whether > >>> or not data-in-SYN was ack'd on both the client and server side. We'd like > >>> to gather more information on the client-side in the failure case in order > >>> to indicate the reason for the failure. This can be useful for not only > >>> debugging TFO, but also for creating TFO socket policies. For example, if > >>> a middle box removes the TFO option or drops a data-in-SYN, we can > >>> can detect this case, and turn off TFO for these connections saving the > >>> extra retransmits. > >>> > >>> The newly added tcpi_fastopen_client_fail status is 2 bits and has 4 > >>> states: > >>> > >>> 1) TFO_STATUS_UNSPEC > >>> > >>> catch-all. > >>> > >>> 2) TFO_NO_COOKIE_SENT > >>> > >>> If TFO_CLIENT_NO_COOKIE mode is off, this state indicates that no cookie > >>> was sent because we don't have one yet, its not in cache or black-holing > >>> may be enabled (already indicated by the global > >>> LINUX_MIB_TCPFASTOPENBLACKHOLE). > > > > It'd be useful to separate the two that cookie is available but is > > prohibited to use due to BH checking. We've seen users internally get > > confused due to lack of this info (after seeing cookies from ip > > metrics). > > > > ok, yeah i had been thinking about splitting these out but thought that > the LINUX_MIB_TCPFASTOPENBLACKHOLE counter could help differentiate > these cases - but I'm ok making it explicit. > > >>> > >>> 3) TFO_NO_SYN_DATA > >>> > >>> Data was sent with SYN, we received a SYN/ACK but it did not cover the data > >>> portion. Cookie is not accepted by server because the cookie may be invalid > >>> or the server may be overloaded. > >>> > >>> > >>> 4) TFO_NO_SYN_DATA_TIMEOUT > >>> > >>> Data was sent with SYN, we received a SYN/ACK which did not cover the data > >>> after at least 1 additional SYN was sent (without data). It may be the case > >>> that a middle-box is dropping data-in-SYN packets. Thus, it would be more > >>> efficient to not use TFO on this connection to avoid extra retransmits > >>> during connection establishment. > >>> > >>> These new fields certainly not cover all the cases where TFO may fail, but > >>> other failures, such as SYN/ACK + data being dropped, will result in the > >>> connection not becoming established. And a connection blackhole after > >>> session establishment shows up as a stalled connection. > >>> > >>> Signed-off-by: Jason Baron <jbaron@akamai.com> > >>> Cc: Eric Dumazet <edumazet@google.com> > >>> Cc: Neal Cardwell <ncardwell@google.com> > >>> Cc: Christoph Paasch <cpaasch@apple.com> > >>> --- > >> > >> Thanks for adding this! > >> > >> It would be good to reset tp->fastopen_client_fail to 0 in tcp_disconnect(). > >> > >>> +/* why fastopen failed from client perspective */ > >>> +enum tcp_fastopen_client_fail { > >>> + TFO_STATUS_UNSPEC, /* catch-all */ > >>> + TFO_NO_COOKIE_SENT, /* if not in TFO_CLIENT_NO_COOKIE mode */ > >>> + TFO_NO_SYN_DATA, /* SYN-ACK did not ack SYN data */ > >> > >> I found the "TFO_NO_SYN_DATA" name a little unintuitive; it sounded to > >> me like this means the client didn't send a SYN+DATA. What about > >> "TFO_DATA_NOT_ACKED", or something like that? > >> > >> If you don't mind, it would be great to cc: Yuchung on the next rev. > > TFO_DATA_NOT_ACKED is already available from the inverse of TCPI_OPT_SYN_DATA > > #define TCPI_OPT_SYN_DATA 32 /* SYN-ACK acked data in SYN sent or rcvd */ > > > > It occurs (3)(4) are already available indirectly from > > TCPI_OPT_SYN_DATA and tcpi_total_retrans together, but the socket must > > query tcpi_total_retrans right after connect/sendto returns which may > > not be preferred. > > > > How about an alternative proposal to the types to catch more TFO issues: > > > > TFO_STATUS_UNSPEC > > TFO_DISABLED_BLACKHOLE_DETECTED > > TFO_COOKIE_UNAVAILABLE > > TFO_SYN_RETRANSMITTED // use in conjunction w/ TCPI_OPT_SYN_DATA for (3)(4) > > Ok, that set works for me. I will re-spin with these states for v2. > Thanks for the suggestion! Actually, longterm I hope we would be able to get rid of the blackhole-detection and fallback heuristics. In a far distant future where these middleboxes have been weeded out ;-) So, do we really want to eternalize this as part of the API in tcp_info ? Christoph
On Mon, Oct 21, 2019 at 12:53 PM Christoph Paasch <cpaasch@apple.com> wrote: > > Actually, longterm I hope we would be able to get rid of the > blackhole-detection and fallback heuristics. In a far distant future where > these middleboxes have been weeded out ;-) > > So, do we really want to eternalize this as part of the API in tcp_info ? > A getsockopt() option won't be available for netlink diag (ss command). So it all depends on plans to use this FASTOPEN information ?
On 10/21/19 4:36 PM, Eric Dumazet wrote: > On Mon, Oct 21, 2019 at 12:53 PM Christoph Paasch <cpaasch@apple.com> wrote: >> > >> Actually, longterm I hope we would be able to get rid of the >> blackhole-detection and fallback heuristics. In a far distant future where >> these middleboxes have been weeded out ;-) >> >> So, do we really want to eternalize this as part of the API in tcp_info ? >> > > A getsockopt() option won't be available for netlink diag (ss command). > > So it all depends on plans to use this FASTOPEN information ? > The original proposal I had 4 states of interest: First, we are interested in knowing when a socket has TFO set but actually requires a retransmit of a non-TFO syn to become established. In this case, we'd be better off not using TFO. A second case is when the server immediately rejects the DATA and just acks the syn (but not the data). Again in that case, we don't want to be sending syn+data. The third case was whether or not we sent a cookie. Perhaps, the server doesn't have TFO enabled in which case, it really doesn't make make sense to enable TFO in the first place. Or if one also controls the server its helpful in understanding if the server is mis-configured. So that was the motivation I had for the original four states that I proposed (final state was a catch-all state). Yuchung suggested dividing the 3rd case into 2 for - no cookie sent because of blackhole or no cookie sent because its not in cache. And then dropping the second state because we already have the TCPI_OPT_SYN_DATA bit. However, the TCPI_OPT_SYN_DATA may not be set because we may fallback in tcp_send_syn_data() due to send failure. So I'm inclined to say that the second state is valuable. And since blackhole is already a global thing via MIB, I didn't see a strong need for it. But it sounded like Yuchung had an interest in it, and I'd obviously like a set of states that is generally useful. Thanks, -Jason
On Mon, Oct 21, 2019 at 5:11 PM Jason Baron <jbaron@akamai.com> wrote: > > > > On 10/21/19 4:36 PM, Eric Dumazet wrote: > > On Mon, Oct 21, 2019 at 12:53 PM Christoph Paasch <cpaasch@apple.com> wrote: > >> > > > >> Actually, longterm I hope we would be able to get rid of the > >> blackhole-detection and fallback heuristics. In a far distant future where > >> these middleboxes have been weeded out ;-) > >> > >> So, do we really want to eternalize this as part of the API in tcp_info ? > >> > > > > A getsockopt() option won't be available for netlink diag (ss command). > > > > So it all depends on plans to use this FASTOPEN information ? > > > > The original proposal I had 4 states of interest: > > First, we are interested in knowing when a socket has TFO set but > actually requires a retransmit of a non-TFO syn to become established. > In this case, we'd be better off not using TFO. > > A second case is when the server immediately rejects the DATA and just > acks the syn (but not the data). Again in that case, we don't want to be > sending syn+data. > > The third case was whether or not we sent a cookie. Perhaps, the server > doesn't have TFO enabled in which case, it really doesn't make make > sense to enable TFO in the first place. Or if one also controls the > server its helpful in understanding if the server is mis-configured. So > that was the motivation I had for the original four states that I > proposed (final state was a catch-all state). > > Yuchung suggested dividing the 3rd case into 2 for - no cookie sent > because of blackhole or no cookie sent because its not in cache. And > then dropping the second state because we already have the > TCPI_OPT_SYN_DATA bit. However, the TCPI_OPT_SYN_DATA may not be set > because we may fallback in tcp_send_syn_data() due to send failure. So > I'm inclined to say that the second state is valuable. And since > blackhole is already a global thing via MIB, I didn't see a strong need > for it. But it sounded like Yuchung had an interest in it, and I'd > obviously like a set of states that is generally useful. I have not kept up with all the proposals in this thread, but would it work to include all of the cases proposed in this thread? It seems like there are enough bits available in holes in tcp_sock and tcp_info to have up to 7 bits of information saved in tcp_sock and exported to tcp_info. That seems like more than enough? The pahole tool shows the following small holes in tcp_sock: u8 compressed_ack; /* 1530 1 */ /* XXX 1 byte hole, try to pack */ u32 chrono_start; /* 1532 4 */ ... u8 bpf_sock_ops_cb_flags; /* 2076 1 */ /* XXX 3 bytes hole, try to pack */ u32 rcv_ooopack; /* 2080 4 */ ...and the following hole in tcp_info: __u8 tcpi_delivery_rate_app_limited:1; /* 7: 7 1 */ /* XXX 7 bits hole, try to pack */ __u32 tcpi_rto; /* 8 4 */ cheers, neal
On Mon, Oct 21, 2019 at 7:14 PM Neal Cardwell <ncardwell@google.com> wrote: > > On Mon, Oct 21, 2019 at 5:11 PM Jason Baron <jbaron@akamai.com> wrote: > > > > > > > > On 10/21/19 4:36 PM, Eric Dumazet wrote: > > > On Mon, Oct 21, 2019 at 12:53 PM Christoph Paasch <cpaasch@apple.com> wrote: > > >> > > > > > >> Actually, longterm I hope we would be able to get rid of the > > >> blackhole-detection and fallback heuristics. In a far distant future where > > >> these middleboxes have been weeded out ;-) > > >> > > >> So, do we really want to eternalize this as part of the API in tcp_info ? > > >> > > > > > > A getsockopt() option won't be available for netlink diag (ss command). > > > > > > So it all depends on plans to use this FASTOPEN information ? > > > > > > > The original proposal I had 4 states of interest: > > > > First, we are interested in knowing when a socket has TFO set but > > actually requires a retransmit of a non-TFO syn to become established. > > In this case, we'd be better off not using TFO. > > > > A second case is when the server immediately rejects the DATA and just > > acks the syn (but not the data). Again in that case, we don't want to be > > sending syn+data. > > > > The third case was whether or not we sent a cookie. Perhaps, the server > > doesn't have TFO enabled in which case, it really doesn't make make > > sense to enable TFO in the first place. Or if one also controls the > > server its helpful in understanding if the server is mis-configured. So > > that was the motivation I had for the original four states that I > > proposed (final state was a catch-all state). > > > > Yuchung suggested dividing the 3rd case into 2 for - no cookie sent > > because of blackhole or no cookie sent because its not in cache. And > > then dropping the second state because we already have the > > TCPI_OPT_SYN_DATA bit. However, the TCPI_OPT_SYN_DATA may not be set > > because we may fallback in tcp_send_syn_data() due to send failure. So but sendto would return -1 w/ EINPROGRESS in this case already so the application shouldn't expect TCPI_OPT_SYN_DATA? > > I'm inclined to say that the second state is valuable. And since > > blackhole is already a global thing via MIB, I didn't see a strong need > > for it. But it sounded like Yuchung had an interest in it, and I'd > > obviously like a set of states that is generally useful. > > I have not kept up with all the proposals in this thread, but would it > work to include all of the cases proposed in this thread? It seems > like there are enough bits available in holes in tcp_sock and tcp_info > to have up to 7 bits of information saved in tcp_sock and exported to > tcp_info. That seems like more than enough? I would rather use only at most 2-bits for TFO errors to be parsimonious on (bloated) tcp sock. I don't mind if the next patch skip my idea of BH detection. my experience is reading host global stats for most applications are a hassle (or sometimes not even feasible). they mostly care about information of their own sockets. > > The pahole tool shows the following small holes in tcp_sock: > > u8 compressed_ack; /* 1530 1 */ > > /* XXX 1 byte hole, try to pack */ > > u32 chrono_start; /* 1532 4 */ > ... > u8 bpf_sock_ops_cb_flags; /* 2076 1 */ > > /* XXX 3 bytes hole, try to pack */ > > u32 rcv_ooopack; /* 2080 4 */ > > ...and the following hole in tcp_info: > __u8 tcpi_delivery_rate_app_limited:1; > /* 7: 7 1 */ > > /* XXX 7 bits hole, try to pack */ > > __u32 tcpi_rto; /* 8 4 */ > > cheers, > neal
On 10/22/19 2:17 PM, Yuchung Cheng wrote: > On Mon, Oct 21, 2019 at 7:14 PM Neal Cardwell <ncardwell@google.com> wrote: >> >> On Mon, Oct 21, 2019 at 5:11 PM Jason Baron <jbaron@akamai.com> wrote: >>> >>> >>> >>> On 10/21/19 4:36 PM, Eric Dumazet wrote: >>>> On Mon, Oct 21, 2019 at 12:53 PM Christoph Paasch <cpaasch@apple.com> wrote: >>>>> >>>> >>>>> Actually, longterm I hope we would be able to get rid of the >>>>> blackhole-detection and fallback heuristics. In a far distant future where >>>>> these middleboxes have been weeded out ;-) >>>>> >>>>> So, do we really want to eternalize this as part of the API in tcp_info ? >>>>> >>>> >>>> A getsockopt() option won't be available for netlink diag (ss command). >>>> >>>> So it all depends on plans to use this FASTOPEN information ? >>>> >>> >>> The original proposal I had 4 states of interest: >>> >>> First, we are interested in knowing when a socket has TFO set but >>> actually requires a retransmit of a non-TFO syn to become established. >>> In this case, we'd be better off not using TFO. >>> >>> A second case is when the server immediately rejects the DATA and just >>> acks the syn (but not the data). Again in that case, we don't want to be >>> sending syn+data. >>> >>> The third case was whether or not we sent a cookie. Perhaps, the server >>> doesn't have TFO enabled in which case, it really doesn't make make >>> sense to enable TFO in the first place. Or if one also controls the >>> server its helpful in understanding if the server is mis-configured. So >>> that was the motivation I had for the original four states that I >>> proposed (final state was a catch-all state). >>> >>> Yuchung suggested dividing the 3rd case into 2 for - no cookie sent >>> because of blackhole or no cookie sent because its not in cache. And >>> then dropping the second state because we already have the >>> TCPI_OPT_SYN_DATA bit. However, the TCPI_OPT_SYN_DATA may not be set >>> because we may fallback in tcp_send_syn_data() due to send failure. So > but sendto would return -1 w/ EINPROGRESS in this case already so the > application shouldn't expect TCPI_OPT_SYN_DATA? Ok, but let's say the sk_stream_alloc_skb() fails in tcp_send_syn_data(), in that case we aren't going to send a TFO cookie (just a regular syn). The user isn't going to get any error and would expect TCPI_OPT_SYN_DATA. Now, TCPI_OPT_SYN_DATA wouldn't be set but we can't assume then that a SYN+data was sent and the SYN_ACK didn't cover the data part. Instead, the reason for failure is really -ENOMEM, which in the proposed states would fall into TFO_STATUS_UNSPEC, but it does mean that I think we shouldn't have a TFO_DATA_NOT_ACKED state otherwise we can't differentiate the two. > > >>> I'm inclined to say that the second state is valuable. And since >>> blackhole is already a global thing via MIB, I didn't see a strong need >>> for it. But it sounded like Yuchung had an interest in it, and I'd >>> obviously like a set of states that is generally useful. >> >> I have not kept up with all the proposals in this thread, but would it >> work to include all of the cases proposed in this thread? It seems >> like there are enough bits available in holes in tcp_sock and tcp_info >> to have up to 7 bits of information saved in tcp_sock and exported to >> tcp_info. That seems like more than enough? > I would rather use only at most 2-bits for TFO errors to be > parsimonious on (bloated) tcp sock. I don't mind if the next patch > skip my idea of BH detection. > my experience is reading host global stats for most applications are a > hassle (or sometimes not even feasible). they mostly care about > information of their own sockets. hmmm, if you are ok without having the BH failure state which Christoph also pushed back on a bit, but we still only want 2 bits as you suggested how about: 1) TFO_STATUS_UNSPEC - includes black hole state and other failures 2) TFO_COOKIE_UNAVAILABLE - we don't have a cookie in the cache 3) TFO_DATA_NOT_ACKED - syn+data sent but no ack for data part 4) TFO_SYN_RETRANSMITTED - same as 3 but at least 1 additional syn sent Thanks, -Jason
On Tue, Oct 22, 2019 at 12:34 PM Jason Baron <jbaron@akamai.com> wrote: > > > > On 10/22/19 2:17 PM, Yuchung Cheng wrote: > > On Mon, Oct 21, 2019 at 7:14 PM Neal Cardwell <ncardwell@google.com> wrote: > >> > >> On Mon, Oct 21, 2019 at 5:11 PM Jason Baron <jbaron@akamai.com> wrote: > >>> > >>> > >>> > >>> On 10/21/19 4:36 PM, Eric Dumazet wrote: > >>>> On Mon, Oct 21, 2019 at 12:53 PM Christoph Paasch <cpaasch@apple.com> wrote: > >>>>> > >>>> > >>>>> Actually, longterm I hope we would be able to get rid of the > >>>>> blackhole-detection and fallback heuristics. In a far distant future where > >>>>> these middleboxes have been weeded out ;-) > >>>>> > >>>>> So, do we really want to eternalize this as part of the API in tcp_info ? > >>>>> > >>>> > >>>> A getsockopt() option won't be available for netlink diag (ss command). > >>>> > >>>> So it all depends on plans to use this FASTOPEN information ? > >>>> > >>> > >>> The original proposal I had 4 states of interest: > >>> > >>> First, we are interested in knowing when a socket has TFO set but > >>> actually requires a retransmit of a non-TFO syn to become established. > >>> In this case, we'd be better off not using TFO. > >>> > >>> A second case is when the server immediately rejects the DATA and just > >>> acks the syn (but not the data). Again in that case, we don't want to be > >>> sending syn+data. > >>> > >>> The third case was whether or not we sent a cookie. Perhaps, the server > >>> doesn't have TFO enabled in which case, it really doesn't make make > >>> sense to enable TFO in the first place. Or if one also controls the > >>> server its helpful in understanding if the server is mis-configured. So > >>> that was the motivation I had for the original four states that I > >>> proposed (final state was a catch-all state). > >>> > >>> Yuchung suggested dividing the 3rd case into 2 for - no cookie sent > >>> because of blackhole or no cookie sent because its not in cache. And > >>> then dropping the second state because we already have the > >>> TCPI_OPT_SYN_DATA bit. However, the TCPI_OPT_SYN_DATA may not be set > >>> because we may fallback in tcp_send_syn_data() due to send failure. So > > but sendto would return -1 w/ EINPROGRESS in this case already so the > > application shouldn't expect TCPI_OPT_SYN_DATA? > > Ok, but let's say the sk_stream_alloc_skb() fails in > tcp_send_syn_data(), in that case we aren't going to send a TFO cookie > (just a regular syn). The user isn't going to get any error and would > expect TCPI_OPT_SYN_DATA. Now, TCPI_OPT_SYN_DATA wouldn't be set but we > can't assume then that a SYN+data was sent and the SYN_ACK didn't cover > the data part. Instead, the reason for failure is really -ENOMEM, which > in the proposed states would fall into TFO_STATUS_UNSPEC, but it does > mean that I think we shouldn't have a TFO_DATA_NOT_ACKED state otherwise > we can't differentiate the two. > > > > > > >>> I'm inclined to say that the second state is valuable. And since > >>> blackhole is already a global thing via MIB, I didn't see a strong need > >>> for it. But it sounded like Yuchung had an interest in it, and I'd > >>> obviously like a set of states that is generally useful. > >> > >> I have not kept up with all the proposals in this thread, but would it > >> work to include all of the cases proposed in this thread? It seems > >> like there are enough bits available in holes in tcp_sock and tcp_info > >> to have up to 7 bits of information saved in tcp_sock and exported to > >> tcp_info. That seems like more than enough? > > I would rather use only at most 2-bits for TFO errors to be > > parsimonious on (bloated) tcp sock. I don't mind if the next patch > > skip my idea of BH detection. > > my experience is reading host global stats for most applications are a > > hassle (or sometimes not even feasible). they mostly care about > > information of their own sockets. > > hmmm, if you are ok without having the BH failure state which Christoph > also pushed back on a bit, but we still only want 2 bits as you > suggested how about: > > 1) TFO_STATUS_UNSPEC - includes black hole state and other failures > 2) TFO_COOKIE_UNAVAILABLE - we don't have a cookie in the cache > 3) TFO_DATA_NOT_ACKED - syn+data sent but no ack for data part > 4) TFO_SYN_RETRANSMITTED - same as 3 but at least 1 additional syn sent sounds good to me. thanks > > Thanks, > > -Jason
diff --git a/include/linux/tcp.h b/include/linux/tcp.h index 99617e5..7790f28 100644 --- a/include/linux/tcp.h +++ b/include/linux/tcp.h @@ -223,7 +223,7 @@ struct tcp_sock { fastopen_connect:1, /* FASTOPEN_CONNECT sockopt */ fastopen_no_cookie:1, /* Allow send/recv SYN+data without a cookie */ is_sack_reneg:1, /* in recovery from loss with SACK reneg? */ - unused:2; + fastopen_client_fail:2; /* reason why fastopen failed */ u8 nonagle : 4,/* Disable Nagle algorithm? */ thin_lto : 1,/* Use linear timeouts for thin streams */ recvmsg_inq : 1,/* Indicate # of bytes in queue upon recvmsg */ diff --git a/include/uapi/linux/tcp.h b/include/uapi/linux/tcp.h index 81e6979..dbee3ed 100644 --- a/include/uapi/linux/tcp.h +++ b/include/uapi/linux/tcp.h @@ -155,6 +155,14 @@ enum { TCP_QUEUES_NR, }; +/* why fastopen failed from client perspective */ +enum tcp_fastopen_client_fail { + TFO_STATUS_UNSPEC, /* catch-all */ + TFO_NO_COOKIE_SENT, /* if not in TFO_CLIENT_NO_COOKIE mode */ + TFO_NO_SYN_DATA, /* SYN-ACK did not ack SYN data */ + TFO_NO_SYN_DATA_TIMEOUT /* SYN-ACK did not ack SYN data after timeout */ +}; + /* for TCP_INFO socket option */ #define TCPI_OPT_TIMESTAMPS 1 #define TCPI_OPT_SACK 2 @@ -211,7 +219,7 @@ struct tcp_info { __u8 tcpi_backoff; __u8 tcpi_options; __u8 tcpi_snd_wscale : 4, tcpi_rcv_wscale : 4; - __u8 tcpi_delivery_rate_app_limited:1; + __u8 tcpi_delivery_rate_app_limited:1, tcpi_fastopen_client_fail:2; __u32 tcpi_rto; __u32 tcpi_ato; diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c index 9f41a76..764f623 100644 --- a/net/ipv4/tcp.c +++ b/net/ipv4/tcp.c @@ -3296,6 +3296,7 @@ void tcp_get_info(struct sock *sk, struct tcp_info *info) info->tcpi_reord_seen = tp->reord_seen; info->tcpi_rcv_ooopack = tp->rcv_ooopack; info->tcpi_snd_wnd = tp->snd_wnd; + info->tcpi_fastopen_client_fail = tp->fastopen_client_fail; unlock_sock_fast(sk, slow); } EXPORT_SYMBOL_GPL(tcp_get_info); diff --git a/net/ipv4/tcp_fastopen.c b/net/ipv4/tcp_fastopen.c index 3fd4512..d88286d 100644 --- a/net/ipv4/tcp_fastopen.c +++ b/net/ipv4/tcp_fastopen.c @@ -413,7 +413,7 @@ bool tcp_fastopen_cookie_check(struct sock *sk, u16 *mss, /* Firewall blackhole issue check */ if (tcp_fastopen_active_should_disable(sk)) { cookie->len = -1; - return false; + goto no_cookie; } dst = __sk_dst_get(sk); @@ -422,7 +422,12 @@ bool tcp_fastopen_cookie_check(struct sock *sk, u16 *mss, cookie->len = -1; return true; } - return cookie->len > 0; + if (cookie->len > 0) + return true; + +no_cookie: + tcp_sk(sk)->fastopen_client_fail = TFO_NO_COOKIE_SENT; + return false; } /* This function checks if we want to defer sending SYN until the first diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c index 3578357a..357f757 100644 --- a/net/ipv4/tcp_input.c +++ b/net/ipv4/tcp_input.c @@ -5819,6 +5819,11 @@ static bool tcp_rcv_fastopen_synack(struct sock *sk, struct sk_buff *synack, tcp_rearm_rto(sk); NET_INC_STATS(sock_net(sk), LINUX_MIB_TCPFASTOPENACTIVEFAIL); + if (syn_drop) + tp->fastopen_client_fail = TFO_NO_SYN_DATA_TIMEOUT; + else + tp->fastopen_client_fail = TFO_NO_SYN_DATA; + return true; } tp->syn_data_acked = tp->syn_data;
The TCPI_OPT_SYN_DATA bit as part of tcpi_options currently reports whether or not data-in-SYN was ack'd on both the client and server side. We'd like to gather more information on the client-side in the failure case in order to indicate the reason for the failure. This can be useful for not only debugging TFO, but also for creating TFO socket policies. For example, if a middle box removes the TFO option or drops a data-in-SYN, we can can detect this case, and turn off TFO for these connections saving the extra retransmits. The newly added tcpi_fastopen_client_fail status is 2 bits and has 4 states: 1) TFO_STATUS_UNSPEC catch-all. 2) TFO_NO_COOKIE_SENT If TFO_CLIENT_NO_COOKIE mode is off, this state indicates that no cookie was sent because we don't have one yet, its not in cache or black-holing may be enabled (already indicated by the global LINUX_MIB_TCPFASTOPENBLACKHOLE). 3) TFO_NO_SYN_DATA Data was sent with SYN, we received a SYN/ACK but it did not cover the data portion. Cookie is not accepted by server because the cookie may be invalid or the server may be overloaded. 4) TFO_NO_SYN_DATA_TIMEOUT Data was sent with SYN, we received a SYN/ACK which did not cover the data after at least 1 additional SYN was sent (without data). It may be the case that a middle-box is dropping data-in-SYN packets. Thus, it would be more efficient to not use TFO on this connection to avoid extra retransmits during connection establishment. These new fields certainly not cover all the cases where TFO may fail, but other failures, such as SYN/ACK + data being dropped, will result in the connection not becoming established. And a connection blackhole after session establishment shows up as a stalled connection. Signed-off-by: Jason Baron <jbaron@akamai.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Neal Cardwell <ncardwell@google.com> Cc: Christoph Paasch <cpaasch@apple.com> --- include/linux/tcp.h | 2 +- include/uapi/linux/tcp.h | 10 +++++++++- net/ipv4/tcp.c | 1 + net/ipv4/tcp_fastopen.c | 9 +++++++-- net/ipv4/tcp_input.c | 5 +++++ 5 files changed, 23 insertions(+), 4 deletions(-)