Message ID | 1351238750-13611-1-git-send-email-subramanian.vijay@gmail.com |
---|---|
State | Changes Requested, archived |
Delegated to: | David Miller |
Headers | show |
On Fri, 2012-10-26 at 01:05 -0700, Vijay Subramanian wrote: > Elliott Hughes <enh@google.com> saw strange behavior when server socket was not > calling accept(). Client was receiving SYN-ACK back even when socket on server > side was not yet available. Eric noted server sockets kept resending SYN_ACKS > and further investigation revealed the following problem. > > If server socket is slow to accept() connections, request_socks can represent > connections for which the three-way handshake is already done. From client's > point of view, the connection is in ESTABLISHED state but on server side, socket > is not in accept_queue or ESTABLISHED state. When the syn-ack timer expires, > because of the order in which tests are performed, server can retransmit the > synack repeatedly. Following patch prevents the server from retransmitting the > synack needlessly (and prevents client from replying with ack). This reduces > traffic when server is slow to accept() connections. > > If the server socket has received the third ack during connection establishment, > this is remembered in inet_rsk(req)->acked. The request_sock will expire in > around 30 seconds and will be dropped if it does not move into accept_queue. > > With help from Eric Dumazet. > > Reported-by: Eric Dumazet <edumazet@google.com> > Acked-by: Neal Cardwell <ncardwell@google.com> > Tested-by: Neal Cardwell <ncardwell@google.com> > Acked-by: Eric Dumazet <edumazet@google.com> > Signed-off-by: Vijay Subramanian <subramanian.vijay@gmail.com> > --- > Changes from V1: Changed Reported-by tag and commit message. Added Acked-by and > Tested-by tags. > > Ignoring "WARNING: line over 80 characters" in the interest of readability. > > net/ipv4/inet_connection_sock.c | 5 ++--- > 1 files changed, 2 insertions(+), 3 deletions(-) > > diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c > index d34ce29..4e8e52e 100644 > --- a/net/ipv4/inet_connection_sock.c > +++ b/net/ipv4/inet_connection_sock.c > @@ -598,9 +598,8 @@ void inet_csk_reqsk_queue_prune(struct sock *parent, > &expire, &resend); > req->rsk_ops->syn_ack_timeout(parent, req); > if (!expire && > - (!resend || > - !req->rsk_ops->rtx_syn_ack(parent, req, NULL) || > - inet_rsk(req)->acked)) { > + (!resend || inet_rsk(req)->acked || > + !req->rsk_ops->rtx_syn_ack(parent, req, NULL))) { > unsigned long timeo; > > if (req->retrans++ == 0) Thanks Vijay ! -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Hello, On Fri, 26 Oct 2012, Vijay Subramanian wrote: > Elliott Hughes <enh@google.com> saw strange behavior when server socket was not > calling accept(). Client was receiving SYN-ACK back even when socket on server > side was not yet available. Eric noted server sockets kept resending SYN_ACKS > and further investigation revealed the following problem. > > If server socket is slow to accept() connections, request_socks can represent > connections for which the three-way handshake is already done. From client's > point of view, the connection is in ESTABLISHED state but on server side, socket > is not in accept_queue or ESTABLISHED state. When the syn-ack timer expires, > because of the order in which tests are performed, server can retransmit the > synack repeatedly. Following patch prevents the server from retransmitting the > synack needlessly (and prevents client from replying with ack). This reduces > traffic when server is slow to accept() connections. > > If the server socket has received the third ack during connection establishment, > this is remembered in inet_rsk(req)->acked. The request_sock will expire in > around 30 seconds and will be dropped if it does not move into accept_queue. > > With help from Eric Dumazet. > > Reported-by: Eric Dumazet <edumazet@google.com> > Acked-by: Neal Cardwell <ncardwell@google.com> > Tested-by: Neal Cardwell <ncardwell@google.com> > Acked-by: Eric Dumazet <edumazet@google.com> > Signed-off-by: Vijay Subramanian <subramanian.vijay@gmail.com> > --- > Changes from V1: Changed Reported-by tag and commit message. Added Acked-by and > Tested-by tags. > > Ignoring "WARNING: line over 80 characters" in the interest of readability. > > net/ipv4/inet_connection_sock.c | 5 ++--- > 1 files changed, 2 insertions(+), 3 deletions(-) > > diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c > index d34ce29..4e8e52e 100644 > --- a/net/ipv4/inet_connection_sock.c > +++ b/net/ipv4/inet_connection_sock.c > @@ -598,9 +598,8 @@ void inet_csk_reqsk_queue_prune(struct sock *parent, > &expire, &resend); > req->rsk_ops->syn_ack_timeout(parent, req); > if (!expire && > - (!resend || > - !req->rsk_ops->rtx_syn_ack(parent, req, NULL) || > - inet_rsk(req)->acked)) { > + (!resend || inet_rsk(req)->acked || Wait a minute, this can cause problem at least for the TCP_DEFER_ACCEPT mode. It is supposed to timeout in SYN_RECV state if after silence period (no retransmissions) and some final retransmissions (until max_retries) client still does not send data - the request should be expired without notifying the listener. So, the logic in syn_ack_recalc() was tuned to resend after the TCP_DEFER_ACCEPT period. This patch stops such resends after TCP_DEFER_ACCEPT period. May be the change should be in syn_ack_recalc() without hurting TCP_DEFER_ACCEPT? Lets analyze the default case without TCP_DEFER_ACCEPT. Think for protocols like SMTP where server sends welcome message. This patch stops SYN-ACK resends, client sends one ACK (which server drops) and enters EST state. Client is waiting for welcome message in EST state while server is waiting silently for ACK message to create child socket. No progress, may but timeout error in client. Is the patch safe for such case? Is there a logic that creates child socket from request if the dropped ACK was the last message from client? It must not do it for TCP_DEFER_ACCEPT. > + !req->rsk_ops->rtx_syn_ack(parent, req, NULL))) { > unsigned long timeo; > > if (req->retrans++ == 0) > -- > 1.7.0.4 > > -- Regards -- Julian Anastasov <ja@ssi.bg> -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Sat, 2012-10-27 at 00:30 +0300, Julian Anastasov wrote: > Hello, > > On Fri, 26 Oct 2012, Vijay Subramanian wrote: > > > Elliott Hughes <enh@google.com> saw strange behavior when server socket was not > > calling accept(). Client was receiving SYN-ACK back even when socket on server > > side was not yet available. Eric noted server sockets kept resending SYN_ACKS > > and further investigation revealed the following problem. > > > > If server socket is slow to accept() connections, request_socks can represent > > connections for which the three-way handshake is already done. From client's > > point of view, the connection is in ESTABLISHED state but on server side, socket > > is not in accept_queue or ESTABLISHED state. When the syn-ack timer expires, > > because of the order in which tests are performed, server can retransmit the > > synack repeatedly. Following patch prevents the server from retransmitting the > > synack needlessly (and prevents client from replying with ack). This reduces > > traffic when server is slow to accept() connections. > > > > If the server socket has received the third ack during connection establishment, > > this is remembered in inet_rsk(req)->acked. The request_sock will expire in > > around 30 seconds and will be dropped if it does not move into accept_queue. > > > > With help from Eric Dumazet. > > > > Reported-by: Eric Dumazet <edumazet@google.com> > > Acked-by: Neal Cardwell <ncardwell@google.com> > > Tested-by: Neal Cardwell <ncardwell@google.com> > > Acked-by: Eric Dumazet <edumazet@google.com> > > Signed-off-by: Vijay Subramanian <subramanian.vijay@gmail.com> > > --- > > Changes from V1: Changed Reported-by tag and commit message. Added Acked-by and > > Tested-by tags. > > > > Ignoring "WARNING: line over 80 characters" in the interest of readability. > > > > net/ipv4/inet_connection_sock.c | 5 ++--- > > 1 files changed, 2 insertions(+), 3 deletions(-) > > > > diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c > > index d34ce29..4e8e52e 100644 > > --- a/net/ipv4/inet_connection_sock.c > > +++ b/net/ipv4/inet_connection_sock.c > > @@ -598,9 +598,8 @@ void inet_csk_reqsk_queue_prune(struct sock *parent, > > &expire, &resend); > > req->rsk_ops->syn_ack_timeout(parent, req); > > if (!expire && > > - (!resend || > > - !req->rsk_ops->rtx_syn_ack(parent, req, NULL) || > > - inet_rsk(req)->acked)) { > > + (!resend || inet_rsk(req)->acked || > > Wait a minute, this can cause problem at least > for the TCP_DEFER_ACCEPT mode. It is supposed to timeout > in SYN_RECV state if after silence period (no retransmissions) > and some final retransmissions (until max_retries) client > still does not send data - the request should be expired > without notifying the listener. > > So, the logic in syn_ack_recalc() was tuned to resend > after the TCP_DEFER_ACCEPT period. This patch stops such > resends after TCP_DEFER_ACCEPT period. May be the change > should be in syn_ack_recalc() without hurting TCP_DEFER_ACCEPT? > > Lets analyze the default case without TCP_DEFER_ACCEPT. > > Think for protocols like SMTP where server sends > welcome message. This patch stops SYN-ACK resends, client > sends one ACK (which server drops) and enters EST state. > Client is waiting for welcome message in EST state while > server is waiting silently for ACK message to create child > socket. No progress, may but timeout error in client. > > Is the patch safe for such case? Is there a logic > that creates child socket from request if the dropped ACK > was the last message from client? It must not do it for > TCP_DEFER_ACCEPT. I see no impact with TCP_DEFER_ACCEPT handling. TCP_DEFER_ACCEPT is quite different from this stuff The 3WHS is completed, and the socket is ready. But its not delivered to the accept() (listener) until we receive a DATA frame (or defer timeout elapsed) We dont resend SYNACK messages for them. We just wait the 4th packet. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Hello, On Fri, 26 Oct 2012, Eric Dumazet wrote: > On Sat, 2012-10-27 at 00:30 +0300, Julian Anastasov wrote: > > > > Wait a minute, this can cause problem at least > > for the TCP_DEFER_ACCEPT mode. It is supposed to timeout > > in SYN_RECV state if after silence period (no retransmissions) > > and some final retransmissions (until max_retries) client > > still does not send data - the request should be expired > > without notifying the listener. > > > > So, the logic in syn_ack_recalc() was tuned to resend > > after the TCP_DEFER_ACCEPT period. This patch stops such > > resends after TCP_DEFER_ACCEPT period. May be the change > > should be in syn_ack_recalc() without hurting TCP_DEFER_ACCEPT? > > > > Lets analyze the default case without TCP_DEFER_ACCEPT. > > > > Think for protocols like SMTP where server sends > > welcome message. This patch stops SYN-ACK resends, client > > sends one ACK (which server drops) and enters EST state. > > Client is waiting for welcome message in EST state while > > server is waiting silently for ACK message to create child > > socket. No progress, may but timeout error in client. > > > > Is the patch safe for such case? Is there a logic > > that creates child socket from request if the dropped ACK > > was the last message from client? It must not do it for > > TCP_DEFER_ACCEPT. > > > I see no impact with TCP_DEFER_ACCEPT handling. > > TCP_DEFER_ACCEPT is quite different from this stuff > > The 3WHS is completed, and the socket is ready. > > But its not delivered to the accept() (listener) until we receive a DATA > frame (or defer timeout elapsed) > > We dont resend SYNACK messages for them. We just wait the 4th packet. We have 3 general cases: 1. HTTP-kind of protocol: client sends first, server without TCP_DEFER_ACCEPT Server should retransmit but anyways client will send packet (ACK) that will move request_sock into child socket. 2. HTTP-kind of protocol: client sends first, server with TCP_DEFER_ACCEPT Server retransmits before 3WHS to get ACK from client. After ACK we keep request_sock because we do not want to wakeup listener without data. During TCP_DEFER_ACCEPT period nothing is retransmitted because we have ACK from client. After TCP_DEFER_ACCEPT period we start retransmissions because the server needs such request_socks to become child sockets after the TCP_DEFER_ACCEPT period and because received ACK is the only way to create child socket. Server wants to accept() them. If TCP_DEFER_ACCEPT is above sysctl_tcp_synack_retries there are no such retransmissions because server does not want to accept() such request_socks without data. So, servers decide what they want with the TCP_DEFER_ACCEPT period value. 3. SMTP-kind of protocol: server sends first, TCP_DEFER_ACCEPT must not be used by server. 3WHS is completed, there is no 4th packet from client. It is the server side that needs to move request_sock to child socket, to accept the connection and to send welcome message. AFAIK, child socket is created only on ACK. Or I'm missing something? Now the question is: how request_sock is moved into child socket if ACK is dropped on syn_recv_sock failure and we do not send SYN-ACK retransmissions to trigger ACK from client? Client does not plan to send new packet in short time. Regards -- Julian Anastasov <ja@ssi.bg> -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Julian, Thanks for the review. > > We have 3 general cases: > > 1. HTTP-kind of protocol: client sends first, server without > TCP_DEFER_ACCEPT > > Server should retransmit but anyways client will > send packet (ACK) that will move request_sock into child socket. In this case, is this retransmit from server and ack from client necessary? I believe that when client finally sends fourth packet i.e. data, server socket (currently request_sock) can move into accept queue and into ESTABLISHED state. So this patch just removes the syn-ack /ack transmissions in between. > > 2. HTTP-kind of protocol: client sends first, server with > TCP_DEFER_ACCEPT > > Server retransmits before 3WHS to get ACK from > client. After ACK we keep request_sock because we do not > want to wakeup listener without data. During TCP_DEFER_ACCEPT > period nothing is retransmitted because we have ACK from > client. After TCP_DEFER_ACCEPT period we start retransmissions > because the server needs such request_socks to become > child sockets after the TCP_DEFER_ACCEPT period and because > received ACK is the only way to create child socket. > Server wants to accept() them. As I mentioned, if client is sending data first, then the data packet from client will also cause the creation of child socket. If server sends the synack again, if client is not ready with data, it will just send an ack back. This is a needless synack/ack. On the other hand if client is ready with data, it will send it and server socket will create child socket if accept queue has space. Even with DEFER_ACCEPT sockets, if a third ack comes in without data, it is remembered in inet_rsk(req)->acked. So, logic is the same. If inet_rsk(req)->acked==1, it means we have received third ack and client is in ESTABLISHED state. So why bother resending needless synacks? > If TCP_DEFER_ACCEPT is > above sysctl_tcp_synack_retries there are no such > retransmissions because server does not want to accept() > such request_socks without data. So, servers decide > what they want with the TCP_DEFER_ACCEPT period value. > > 3. SMTP-kind of protocol: server sends first, > TCP_DEFER_ACCEPT must not be used by server. > 3WHS is completed, there is no 4th packet from > client. It is the server side that needs to move request_sock > to child socket, to accept the connection and to send > welcome message. AFAIK, child socket is created only on > ACK. Or I'm missing something? Now the question is: You raise a good point. As far as I can see, it looks like the request_sock will not become a full socket if server is the one that has to send the data first. The problem it seems is that we always have to wait for the client to send something to create the full socket. Since server side already knows that TWHS has finished, maybe we can find a way to move a request_sock to accept_queue as space opens up without sending synack. > how request_sock is moved into child socket if ACK is > dropped on syn_recv_sock failure and we do not send SYN-ACK > retransmissions to trigger ACK from client? Client does not > plan to send new packet in short time. Agreed. Unless I am missing something too, for cases where server socket sends data first, the patch will break things though for defer-sockets it seems ok. Maybe we need another approach? Thanks for the feedback. I will wait for suggestions as to what to do with the patch and have another look at the code. Regards, Vijay -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Hello, On Fri, 26 Oct 2012, Vijay Subramanian wrote: > Julian, > Thanks for the review. > > > > > We have 3 general cases: > > > > 1. HTTP-kind of protocol: client sends first, server without > > TCP_DEFER_ACCEPT > > > > Server should retransmit but anyways client will > > send packet (ACK) that will move request_sock into child socket. > > In this case, is this retransmit from server and ack from client > necessary? I believe that when client finally sends fourth packet i.e. > data, server socket (currently request_sock) can move into accept > queue and into ESTABLISHED state. So this patch just removes > the syn-ack /ack transmissions in between. I agree that we have the right to stop SYN-ACK retransmits but only if there is a mechanism that converts request_sock into child socket and wakeups listener when there is space for new child. For case 1 this patch works because it is client's job to talk first. But patch can not differentiate case 1 from case 3 where child is not created. > > 2. HTTP-kind of protocol: client sends first, server with > > TCP_DEFER_ACCEPT > > > > Server retransmits before 3WHS to get ACK from > > client. After ACK we keep request_sock because we do not > > want to wakeup listener without data. During TCP_DEFER_ACCEPT > > period nothing is retransmitted because we have ACK from > > client. After TCP_DEFER_ACCEPT period we start retransmissions > > because the server needs such request_socks to become > > child sockets after the TCP_DEFER_ACCEPT period and because > > received ACK is the only way to create child socket. > > Server wants to accept() them. > > As I mentioned, if client is sending data first, then the data packet > from client will also cause the creation of child socket. > If server sends the synack again, if client is not ready with data, it > will just send an ack back. This is a needless synack/ack. Indeed, here TCP_DEFER_ACCEPT can also benefit from such mechanism to convert requests to childs. In this case, child must appear after the TCP_DEFER_ACCEPT period (in case no data is received). So, this mechanism should work depending on the TCP_DEFER_ACCEPT period. We can stop such retransmits only when new mechanism is implemented. > On the other hand if client is ready with data, it will send it and > server socket will create child socket if accept queue has space. > > Even with DEFER_ACCEPT sockets, if a third ack comes in without data, > it is remembered in inet_rsk(req)->acked. So, > logic is the same. If inet_rsk(req)->acked==1, it means we have > received third ack and client is in ESTABLISHED state. > So why bother resending needless synacks? Only to trigger ACK that will create child. We just want the child after the TCP_DEFER_ACCEPT period has expired. Retransmits are the current mechanism to create child, they are not goal. The goal is to create child. > > If TCP_DEFER_ACCEPT is > > above sysctl_tcp_synack_retries there are no such > > retransmissions because server does not want to accept() > > such request_socks without data. So, servers decide > > what they want with the TCP_DEFER_ACCEPT period value. > > > > 3. SMTP-kind of protocol: server sends first, > > TCP_DEFER_ACCEPT must not be used by server. > > 3WHS is completed, there is no 4th packet from > > client. It is the server side that needs to move request_sock > > to child socket, to accept the connection and to send > > welcome message. AFAIK, child socket is created only on > > ACK. Or I'm missing something? Now the question is: > > You raise a good point. As far as I can see, it looks like the > request_sock will not become a full socket > if server is the one that has to send the data first. The problem it > seems is that we always have to wait > for the client to send something to create the full socket. Since > server side already knows that TWHS has finished, > maybe we can find a way to move a request_sock to accept_queue as > space opens up without sending synack. I'm not sure if that is possible, may be ACK comes with more data that is not saved in request_sock, even payload. It is lost. From time to time we get new fields into struct request_sock but I don't know if we have everything that is needed to create child, I doubt it. > > how request_sock is moved into child socket if ACK is > > dropped on syn_recv_sock failure and we do not send SYN-ACK > > retransmissions to trigger ACK from client? Client does not > > plan to send new packet in short time. > > Agreed. Unless I am missing something too, for cases where server > socket sends data first, the patch will break things though for > defer-sockets > it seems ok. Maybe we need another approach? The effect of current patch is: - case 1: saves retransmits, that is good - case 2: server can not accept child after TCP_DEFER_ACCEPT period, clients can hate your service. 3WHS is complete and it is expected longer timers to apply, say 15mins, while server uses SYN_RECV state timeouts which are shorter. Client can get RST if data is delayed above that timers. Without the patch, we will trigger new ACK from client and new child can enter established state. TCP_DEFER_ACCEPT period is just an optimization that avoids wakeups without data for a period but server still has the following options: - send SYN-ACK, on ACK create child and send FIN - send SYN-ACK, on ACK create child and wait 15mins for data - expire them as request_socks if they do not send data, later send RST on delayed data from client With the patch we do not have a way to accept childs. You restrict server using TCP_DEFER_ACCEPT to shorter timeout to get client data because we are left only with the option to expire request_socks, server can prefer small SYN-ACK retry count but long timeout for client data. - case 3: no welcome, client aborts > Thanks for the feedback. I will wait for suggestions as to what to do > with the patch and have another look at the code. I'll think on this problem but my knowledge in this area is limited. May be the TCP gurus know better what can be done. My guess is that child creation on ACK is mandatory, especially when ACK comes with payload. > Regards, > Vijay Regards -- Julian Anastasov <ja@ssi.bg> -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Fri, 2012-10-26 at 17:07 -0700, Vijay Subramanian wrote: > > Thanks for the feedback. I will wait for suggestions as to what to do > with the patch and have another look at the code. It seems to me current code for defer accept should be fixed as well. 'Allowing' to send the last SYNACK hoping this wont be lost somewhere seems really strange. syn_ack_recalc(...) { ... /* * Do not resend while waiting for data after ACK, * start to resend on end of deferring period to give * last chance for data or ACK to create established socket. */ *resend = !inet_rsk(req)->acked || req->num_timeout >= rskq_defer_accept - 1; } DEFER_ACCEPT should really mean : We have the socket, only defer putting it in the accept queue until : - Client sent data. Or - A given amount of time has elapsed Defer accept is typically used by protocols where client sends the first chunk of data. If after XX seconds chunk of data is still not there, sending a SYNACK wont really help : just pass the fd to application and application probably will wait another XX seconds for the client request, then timeout and close the socket. So instead of sending a final SYNACK just to be able to receive an ACK and finally put the request into listener accept queue is a waste of bandwidth. We should put the request into listener accept queue and save one/two messages. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Fri, 2012-10-26 at 01:05 -0700, Vijay Subramanian wrote: > Elliott Hughes <enh@google.com> saw strange behavior when server socket was not > calling accept(). Client was receiving SYN-ACK back even when socket on server > side was not yet available. Eric noted server sockets kept resending SYN_ACKS > and further investigation revealed the following problem. > > If server socket is slow to accept() connections, request_socks can represent > connections for which the three-way handshake is already done. From client's > point of view, the connection is in ESTABLISHED state but on server side, socket > is not in accept_queue or ESTABLISHED state. When the syn-ack timer expires, > because of the order in which tests are performed, server can retransmit the > synack repeatedly. Following patch prevents the server from retransmitting the > synack needlessly (and prevents client from replying with ack). This reduces > traffic when server is slow to accept() connections. > > If the server socket has received the third ack during connection establishment, > this is remembered in inet_rsk(req)->acked. The request_sock will expire in > around 30 seconds and will be dropped if it does not move into accept_queue. > > With help from Eric Dumazet. > > Reported-by: Eric Dumazet <edumazet@google.com> > Acked-by: Neal Cardwell <ncardwell@google.com> > Tested-by: Neal Cardwell <ncardwell@google.com> > Acked-by: Eric Dumazet <edumazet@google.com> > Signed-off-by: Vijay Subramanian <subramanian.vijay@gmail.com> > --- > Changes from V1: Changed Reported-by tag and commit message. Added Acked-by and > Tested-by tags. > > Ignoring "WARNING: line over 80 characters" in the interest of readability. > > net/ipv4/inet_connection_sock.c | 5 ++--- > 1 files changed, 2 insertions(+), 3 deletions(-) > > diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c > index d34ce29..4e8e52e 100644 > --- a/net/ipv4/inet_connection_sock.c > +++ b/net/ipv4/inet_connection_sock.c > @@ -598,9 +598,8 @@ void inet_csk_reqsk_queue_prune(struct sock *parent, > &expire, &resend); > req->rsk_ops->syn_ack_timeout(parent, req); > if (!expire && > - (!resend || > - !req->rsk_ops->rtx_syn_ack(parent, req, NULL) || > - inet_rsk(req)->acked)) { > + (!resend || inet_rsk(req)->acked || > + !req->rsk_ops->rtx_syn_ack(parent, req, NULL))) { > unsigned long timeo; > > if (req->retrans++ == 0) Part of the complexity of this is that req->retrans is the number of timeouts, serving as the exponential backoff base. Unfortunately we have a side effect because number of retransmits is wrong for defer accept. Here is what I suggest : upstream to net-next this patch we use at Google : Author: Eric Dumazet <edumazet@google.com> Date: Tue Oct 2 02:21:12 2012 -0700 net-tcp: better retrans tracking for defer-accept For passive TCP connections using TCP_DEFER_ACCEPT facility, we incorrectly increment req->retrans each time timeout triggers while no SYNACK is sent. Decouple req->retrans field into two fields : num_retrans : number of retransmit num_timeout : number of timeouts (retrans was renamed to make sure we didnt miss an occurrence) introduce inet_rtx_syn_ack() helper to increment num_retrans only if ->rtx_syn_ack() succeeded. Use inet_rtx_syn_ack() from tcp_check_req() to increment num_retrans when we re-send a SYNACK in answer to a SYN. Prior to this patch, we were not counting these retransmits. Change tcp_v[46]_rtx_synack() to increment TCP_MIB_RETRANSSEGS only if a synack packet was successfuly queued. Reported-by: Yuchung Cheng <ycheng@google.com> Then, we could more easily address this silly SYNACK syndrom. What do you think ? -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Hello, On Sat, 27 Oct 2012, Eric Dumazet wrote: > > @@ -598,9 +598,8 @@ void inet_csk_reqsk_queue_prune(struct sock *parent, > > &expire, &resend); > > req->rsk_ops->syn_ack_timeout(parent, req); > > if (!expire && > > - (!resend || > > - !req->rsk_ops->rtx_syn_ack(parent, req, NULL) || > > - inet_rsk(req)->acked)) { > > + (!resend || inet_rsk(req)->acked || > > + !req->rsk_ops->rtx_syn_ack(parent, req, NULL))) { > > unsigned long timeo; > > > > if (req->retrans++ == 0) > > > Part of the complexity of this is that req->retrans is the number of > timeouts, serving as the exponential backoff base. > > Unfortunately we have a side effect because number of retransmits is > wrong for defer accept. > > Here is what I suggest : upstream to net-next this patch we use at > Google : > > Author: Eric Dumazet <edumazet@google.com> > Date: Tue Oct 2 02:21:12 2012 -0700 > > net-tcp: better retrans tracking for defer-accept > > For passive TCP connections using TCP_DEFER_ACCEPT facility, > we incorrectly increment req->retrans each time timeout triggers > while no SYNACK is sent. > > Decouple req->retrans field into two fields : > > num_retrans : number of retransmit > num_timeout : number of timeouts > > (retrans was renamed to make sure we didnt miss an occurrence) > > introduce inet_rtx_syn_ack() helper to increment num_retrans > only if ->rtx_syn_ack() succeeded. This is dangerous, the first of the cases is route failure, what if we just added reject route for some attacker? We will get error forever. May be it is difficult to decide which error should change the counter. IMHO, such reliability is not needed, we can be short of memory too. > Use inet_rtx_syn_ack() from tcp_check_req() to increment num_retrans > when we re-send a SYNACK in answer to a SYN. Prior to this patch, > we were not counting these retransmits. Such change looks correct. Of course, it has side effect on current TCP_DEFER_ACCEPT calculations but it is a TCP_DEFER_ACCEPT implementation problem. > Change tcp_v[46]_rtx_synack() to increment TCP_MIB_RETRANSSEGS > only if a synack packet was successfuly queued. Regards -- Julian Anastasov <ja@ssi.bg> -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Sat, 2012-10-27 at 16:23 +0300, Julian Anastasov wrote: > Hello, > > On Sat, 27 Oct 2012, Eric Dumazet wrote: > > > > > Author: Eric Dumazet <edumazet@google.com> > > Date: Tue Oct 2 02:21:12 2012 -0700 > > > > net-tcp: better retrans tracking for defer-accept > > > > For passive TCP connections using TCP_DEFER_ACCEPT facility, > > we incorrectly increment req->retrans each time timeout triggers > > while no SYNACK is sent. > > > > Decouple req->retrans field into two fields : > > > > num_retrans : number of retransmit > > num_timeout : number of timeouts > > > > (retrans was renamed to make sure we didnt miss an occurrence) > > > > introduce inet_rtx_syn_ack() helper to increment num_retrans > > only if ->rtx_syn_ack() succeeded. > > This is dangerous, the first of the cases is route > failure, what if we just added reject route for some attacker? > We will get error forever. May be it is difficult to decide > which error should change the counter. IMHO, such reliability > is not needed, we can be short of memory too. > We increase num_timeout regardless of success or failure sending a SYNACK (can be a route failure, a memory allocation failure, a full qdisc...) So its not 'forever'. The decision to abort a SYN_RECV is based on num_timeouts only, not anymore on 'number of restransmits' num_retrans is only counting number of SYNACKS that were sent. num_retrans <= num_timeouts (Usually its the same, unless you have errors, or DEFER_ACCEPT mini-sockets) > > Use inet_rtx_syn_ack() from tcp_check_req() to increment num_retrans > > when we re-send a SYNACK in answer to a SYN. Prior to this patch, > > we were not counting these retransmits. > > Such change looks correct. Of course, it has > side effect on current TCP_DEFER_ACCEPT calculations but > it is a TCP_DEFER_ACCEPT implementation problem. Better wait to see the patch, it changes nothing yet for TCP_DEFER_ACCEPT It only changes accounting problems, for more precise tracking of tcp stack behavior. TCP_DEFER_ACCEPT sockets have this strange accounting bug saying that some packets were retransmitted, while its not true : We only were waiting the user request. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c index d34ce29..4e8e52e 100644 --- a/net/ipv4/inet_connection_sock.c +++ b/net/ipv4/inet_connection_sock.c @@ -598,9 +598,8 @@ void inet_csk_reqsk_queue_prune(struct sock *parent, &expire, &resend); req->rsk_ops->syn_ack_timeout(parent, req); if (!expire && - (!resend || - !req->rsk_ops->rtx_syn_ack(parent, req, NULL) || - inet_rsk(req)->acked)) { + (!resend || inet_rsk(req)->acked || + !req->rsk_ops->rtx_syn_ack(parent, req, NULL))) { unsigned long timeo; if (req->retrans++ == 0)