diff mbox

[net-next,V2,1/1] tcp: Prevent needless syn-ack rexmt during TWHS

Message ID 1351238750-13611-1-git-send-email-subramanian.vijay@gmail.com
State Changes Requested, archived
Delegated to: David Miller
Headers show

Commit Message

Vijay Subramanian Oct. 26, 2012, 8:05 a.m. UTC
Elliott Hughes <enh@google.com> saw strange behavior when server socket was not
calling accept(). Client was receiving SYN-ACK back even when socket on server
side was not yet available. Eric noted server sockets kept resending SYN_ACKS
and further investigation revealed the following problem.

If server socket is slow to accept() connections, request_socks can represent
connections for which the three-way handshake is already done.  From client's
point of view, the connection is in ESTABLISHED state but on server side, socket
is not in accept_queue or ESTABLISHED state.  When the syn-ack timer expires,
because of the order in which tests are performed, server can retransmit the
synack repeatedly. Following patch prevents the server from retransmitting the
synack needlessly (and prevents client from replying with ack).  This reduces
traffic when server is slow to accept() connections.

If the server socket has received the third ack during connection establishment,
this is remembered in inet_rsk(req)->acked.  The request_sock will expire in
around 30 seconds and will be dropped if it does not move into accept_queue.

With help from Eric Dumazet.

Reported-by: Eric Dumazet <edumazet@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Tested-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Vijay Subramanian <subramanian.vijay@gmail.com>
---
Changes from V1: Changed Reported-by tag and commit message. Added Acked-by and
Tested-by tags.

Ignoring "WARNING: line over 80 characters" in the interest of readability.

 net/ipv4/inet_connection_sock.c |    5 ++---
 1 files changed, 2 insertions(+), 3 deletions(-)

Comments

Eric Dumazet Oct. 26, 2012, 8:03 a.m. UTC | #1
On Fri, 2012-10-26 at 01:05 -0700, Vijay Subramanian wrote:
> Elliott Hughes <enh@google.com> saw strange behavior when server socket was not
> calling accept(). Client was receiving SYN-ACK back even when socket on server
> side was not yet available. Eric noted server sockets kept resending SYN_ACKS
> and further investigation revealed the following problem.
> 
> If server socket is slow to accept() connections, request_socks can represent
> connections for which the three-way handshake is already done.  From client's
> point of view, the connection is in ESTABLISHED state but on server side, socket
> is not in accept_queue or ESTABLISHED state.  When the syn-ack timer expires,
> because of the order in which tests are performed, server can retransmit the
> synack repeatedly. Following patch prevents the server from retransmitting the
> synack needlessly (and prevents client from replying with ack).  This reduces
> traffic when server is slow to accept() connections.
> 
> If the server socket has received the third ack during connection establishment,
> this is remembered in inet_rsk(req)->acked.  The request_sock will expire in
> around 30 seconds and will be dropped if it does not move into accept_queue.
> 
> With help from Eric Dumazet.
> 
> Reported-by: Eric Dumazet <edumazet@google.com>
> Acked-by: Neal Cardwell <ncardwell@google.com>
> Tested-by: Neal Cardwell <ncardwell@google.com>
> Acked-by: Eric Dumazet <edumazet@google.com>
> Signed-off-by: Vijay Subramanian <subramanian.vijay@gmail.com>
> ---
> Changes from V1: Changed Reported-by tag and commit message. Added Acked-by and
> Tested-by tags.
> 
> Ignoring "WARNING: line over 80 characters" in the interest of readability.
> 
>  net/ipv4/inet_connection_sock.c |    5 ++---
>  1 files changed, 2 insertions(+), 3 deletions(-)
> 
> diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c
> index d34ce29..4e8e52e 100644
> --- a/net/ipv4/inet_connection_sock.c
> +++ b/net/ipv4/inet_connection_sock.c
> @@ -598,9 +598,8 @@ void inet_csk_reqsk_queue_prune(struct sock *parent,
>  					       &expire, &resend);
>  				req->rsk_ops->syn_ack_timeout(parent, req);
>  				if (!expire &&
> -				    (!resend ||
> -				     !req->rsk_ops->rtx_syn_ack(parent, req, NULL) ||
> -				     inet_rsk(req)->acked)) {
> +				    (!resend || inet_rsk(req)->acked ||
> +				     !req->rsk_ops->rtx_syn_ack(parent, req, NULL))) {
>  					unsigned long timeo;
>  
>  					if (req->retrans++ == 0)

Thanks Vijay !


--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Julian Anastasov Oct. 26, 2012, 9:30 p.m. UTC | #2
Hello,

On Fri, 26 Oct 2012, Vijay Subramanian wrote:

> Elliott Hughes <enh@google.com> saw strange behavior when server socket was not
> calling accept(). Client was receiving SYN-ACK back even when socket on server
> side was not yet available. Eric noted server sockets kept resending SYN_ACKS
> and further investigation revealed the following problem.
> 
> If server socket is slow to accept() connections, request_socks can represent
> connections for which the three-way handshake is already done.  From client's
> point of view, the connection is in ESTABLISHED state but on server side, socket
> is not in accept_queue or ESTABLISHED state.  When the syn-ack timer expires,
> because of the order in which tests are performed, server can retransmit the
> synack repeatedly. Following patch prevents the server from retransmitting the
> synack needlessly (and prevents client from replying with ack).  This reduces
> traffic when server is slow to accept() connections.
> 
> If the server socket has received the third ack during connection establishment,
> this is remembered in inet_rsk(req)->acked.  The request_sock will expire in
> around 30 seconds and will be dropped if it does not move into accept_queue.
> 
> With help from Eric Dumazet.
> 
> Reported-by: Eric Dumazet <edumazet@google.com>
> Acked-by: Neal Cardwell <ncardwell@google.com>
> Tested-by: Neal Cardwell <ncardwell@google.com>
> Acked-by: Eric Dumazet <edumazet@google.com>
> Signed-off-by: Vijay Subramanian <subramanian.vijay@gmail.com>
> ---
> Changes from V1: Changed Reported-by tag and commit message. Added Acked-by and
> Tested-by tags.
> 
> Ignoring "WARNING: line over 80 characters" in the interest of readability.
> 
>  net/ipv4/inet_connection_sock.c |    5 ++---
>  1 files changed, 2 insertions(+), 3 deletions(-)
> 
> diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c
> index d34ce29..4e8e52e 100644
> --- a/net/ipv4/inet_connection_sock.c
> +++ b/net/ipv4/inet_connection_sock.c
> @@ -598,9 +598,8 @@ void inet_csk_reqsk_queue_prune(struct sock *parent,
>  					       &expire, &resend);
>  				req->rsk_ops->syn_ack_timeout(parent, req);
>  				if (!expire &&
> -				    (!resend ||
> -				     !req->rsk_ops->rtx_syn_ack(parent, req, NULL) ||
> -				     inet_rsk(req)->acked)) {
> +				    (!resend || inet_rsk(req)->acked ||

	Wait a minute, this can cause problem at least
for the TCP_DEFER_ACCEPT mode. It is supposed to timeout
in SYN_RECV state if after silence period (no retransmissions)
and some final retransmissions (until max_retries) client
still does not send data - the request should be expired
without notifying the listener.

	So, the logic in syn_ack_recalc() was tuned to resend
after the TCP_DEFER_ACCEPT period. This patch stops such
resends after TCP_DEFER_ACCEPT period. May be the change
should be in syn_ack_recalc() without hurting TCP_DEFER_ACCEPT?

	Lets analyze the default case without TCP_DEFER_ACCEPT.

	Think for protocols like SMTP where server sends
welcome message. This patch stops SYN-ACK resends, client
sends one ACK (which server drops) and enters EST state.
Client is waiting for welcome message in EST state while
server is waiting silently for ACK message to create child
socket. No progress, may but timeout error in client.

	Is the patch safe for such case? Is there a logic
that creates child socket from request if the dropped ACK
was the last message from client? It must not do it for
TCP_DEFER_ACCEPT.

> +				     !req->rsk_ops->rtx_syn_ack(parent, req, NULL))) {
>  					unsigned long timeo;
>  
>  					if (req->retrans++ == 0)
> -- 
> 1.7.0.4
> 
> --

Regards

--
Julian Anastasov <ja@ssi.bg>
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eric Dumazet Oct. 26, 2012, 9:42 p.m. UTC | #3
On Sat, 2012-10-27 at 00:30 +0300, Julian Anastasov wrote:
> 	Hello,
> 
> On Fri, 26 Oct 2012, Vijay Subramanian wrote:
> 
> > Elliott Hughes <enh@google.com> saw strange behavior when server socket was not
> > calling accept(). Client was receiving SYN-ACK back even when socket on server
> > side was not yet available. Eric noted server sockets kept resending SYN_ACKS
> > and further investigation revealed the following problem.
> > 
> > If server socket is slow to accept() connections, request_socks can represent
> > connections for which the three-way handshake is already done.  From client's
> > point of view, the connection is in ESTABLISHED state but on server side, socket
> > is not in accept_queue or ESTABLISHED state.  When the syn-ack timer expires,
> > because of the order in which tests are performed, server can retransmit the
> > synack repeatedly. Following patch prevents the server from retransmitting the
> > synack needlessly (and prevents client from replying with ack).  This reduces
> > traffic when server is slow to accept() connections.
> > 
> > If the server socket has received the third ack during connection establishment,
> > this is remembered in inet_rsk(req)->acked.  The request_sock will expire in
> > around 30 seconds and will be dropped if it does not move into accept_queue.
> > 
> > With help from Eric Dumazet.
> > 
> > Reported-by: Eric Dumazet <edumazet@google.com>
> > Acked-by: Neal Cardwell <ncardwell@google.com>
> > Tested-by: Neal Cardwell <ncardwell@google.com>
> > Acked-by: Eric Dumazet <edumazet@google.com>
> > Signed-off-by: Vijay Subramanian <subramanian.vijay@gmail.com>
> > ---
> > Changes from V1: Changed Reported-by tag and commit message. Added Acked-by and
> > Tested-by tags.
> > 
> > Ignoring "WARNING: line over 80 characters" in the interest of readability.
> > 
> >  net/ipv4/inet_connection_sock.c |    5 ++---
> >  1 files changed, 2 insertions(+), 3 deletions(-)
> > 
> > diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c
> > index d34ce29..4e8e52e 100644
> > --- a/net/ipv4/inet_connection_sock.c
> > +++ b/net/ipv4/inet_connection_sock.c
> > @@ -598,9 +598,8 @@ void inet_csk_reqsk_queue_prune(struct sock *parent,
> >  					       &expire, &resend);
> >  				req->rsk_ops->syn_ack_timeout(parent, req);
> >  				if (!expire &&
> > -				    (!resend ||
> > -				     !req->rsk_ops->rtx_syn_ack(parent, req, NULL) ||
> > -				     inet_rsk(req)->acked)) {
> > +				    (!resend || inet_rsk(req)->acked ||
> 
> 	Wait a minute, this can cause problem at least
> for the TCP_DEFER_ACCEPT mode. It is supposed to timeout
> in SYN_RECV state if after silence period (no retransmissions)
> and some final retransmissions (until max_retries) client
> still does not send data - the request should be expired
> without notifying the listener.
> 
> 	So, the logic in syn_ack_recalc() was tuned to resend
> after the TCP_DEFER_ACCEPT period. This patch stops such
> resends after TCP_DEFER_ACCEPT period. May be the change
> should be in syn_ack_recalc() without hurting TCP_DEFER_ACCEPT?
> 
> 	Lets analyze the default case without TCP_DEFER_ACCEPT.
> 
> 	Think for protocols like SMTP where server sends
> welcome message. This patch stops SYN-ACK resends, client
> sends one ACK (which server drops) and enters EST state.
> Client is waiting for welcome message in EST state while
> server is waiting silently for ACK message to create child
> socket. No progress, may but timeout error in client.
> 
> 	Is the patch safe for such case? Is there a logic
> that creates child socket from request if the dropped ACK
> was the last message from client? It must not do it for
> TCP_DEFER_ACCEPT.


I see no impact with TCP_DEFER_ACCEPT handling.

TCP_DEFER_ACCEPT is quite different from this stuff

The 3WHS is completed, and the socket is ready.

But its not delivered to the accept() (listener) until we receive a DATA
frame (or defer timeout elapsed)

We dont resend SYNACK messages for them. We just wait the 4th packet.



--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Julian Anastasov Oct. 26, 2012, 10:52 p.m. UTC | #4
Hello,

On Fri, 26 Oct 2012, Eric Dumazet wrote:

> On Sat, 2012-10-27 at 00:30 +0300, Julian Anastasov wrote:
> > 
> > 	Wait a minute, this can cause problem at least
> > for the TCP_DEFER_ACCEPT mode. It is supposed to timeout
> > in SYN_RECV state if after silence period (no retransmissions)
> > and some final retransmissions (until max_retries) client
> > still does not send data - the request should be expired
> > without notifying the listener.
> > 
> > 	So, the logic in syn_ack_recalc() was tuned to resend
> > after the TCP_DEFER_ACCEPT period. This patch stops such
> > resends after TCP_DEFER_ACCEPT period. May be the change
> > should be in syn_ack_recalc() without hurting TCP_DEFER_ACCEPT?
> > 
> > 	Lets analyze the default case without TCP_DEFER_ACCEPT.
> > 
> > 	Think for protocols like SMTP where server sends
> > welcome message. This patch stops SYN-ACK resends, client
> > sends one ACK (which server drops) and enters EST state.
> > Client is waiting for welcome message in EST state while
> > server is waiting silently for ACK message to create child
> > socket. No progress, may but timeout error in client.
> > 
> > 	Is the patch safe for such case? Is there a logic
> > that creates child socket from request if the dropped ACK
> > was the last message from client? It must not do it for
> > TCP_DEFER_ACCEPT.
> 
> 
> I see no impact with TCP_DEFER_ACCEPT handling.
> 
> TCP_DEFER_ACCEPT is quite different from this stuff
> 
> The 3WHS is completed, and the socket is ready.
> 
> But its not delivered to the accept() (listener) until we receive a DATA
> frame (or defer timeout elapsed)
> 
> We dont resend SYNACK messages for them. We just wait the 4th packet.

	We have 3 general cases:

1. HTTP-kind of protocol: client sends first, server without
TCP_DEFER_ACCEPT

	Server should retransmit but anyways client will
send packet (ACK) that will move request_sock into child socket.

2. HTTP-kind of protocol: client sends first, server with
TCP_DEFER_ACCEPT

	Server retransmits before 3WHS to get ACK from
client. After ACK we keep request_sock because we do not
want to wakeup listener without data. During TCP_DEFER_ACCEPT
period nothing is retransmitted because we have ACK from
client. After TCP_DEFER_ACCEPT period we start retransmissions
because the server needs such request_socks to become
child sockets after the TCP_DEFER_ACCEPT period and because
received ACK is the only way to create child socket.
Server wants to accept() them. If TCP_DEFER_ACCEPT is
above sysctl_tcp_synack_retries there are no such
retransmissions because server does not want to accept()
such request_socks without data. So, servers decide
what they want with the TCP_DEFER_ACCEPT period value.

3. SMTP-kind of protocol: server sends first,
TCP_DEFER_ACCEPT must not be used by server.

	3WHS is completed, there is no 4th packet from
client. It is the server side that needs to move request_sock
to child socket, to accept the connection and to send
welcome message. AFAIK, child socket is created only on
ACK. Or I'm missing something? Now the question is:
how request_sock is moved into child socket if ACK is
dropped on syn_recv_sock failure and we do not send SYN-ACK 
retransmissions to trigger ACK from client? Client does not
plan to send new packet in short time.

Regards

--
Julian Anastasov <ja@ssi.bg>
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Vijay Subramanian Oct. 27, 2012, 12:07 a.m. UTC | #5
Julian,
Thanks for the review.

>
>         We have 3 general cases:
>
> 1. HTTP-kind of protocol: client sends first, server without
> TCP_DEFER_ACCEPT
>
>         Server should retransmit but anyways client will
> send packet (ACK) that will move request_sock into child socket.

In  this case, is this retransmit from server and ack from client
necessary? I believe that when client finally sends fourth packet i.e.
data, server socket (currently request_sock) can move into accept
queue and into ESTABLISHED state. So this patch just removes
the syn-ack /ack transmissions in between.


>
> 2. HTTP-kind of protocol: client sends first, server with
> TCP_DEFER_ACCEPT
>
>         Server retransmits before 3WHS to get ACK from
> client. After ACK we keep request_sock because we do not
> want to wakeup listener without data. During TCP_DEFER_ACCEPT
> period nothing is retransmitted because we have ACK from
> client. After TCP_DEFER_ACCEPT period we start retransmissions
> because the server needs such request_socks to become
> child sockets after the TCP_DEFER_ACCEPT period and because
> received ACK is the only way to create child socket.
> Server wants to accept() them.

As I mentioned, if client is sending data first, then the data packet
from client will also cause the creation of child socket.
If server sends the synack again, if client is not ready with data, it
will just send an ack back. This is a needless synack/ack.
On the other hand if client is ready with data, it will send it and
server socket will create child socket if accept queue has space.

Even with DEFER_ACCEPT sockets, if a third ack comes in without data,
it is remembered in inet_rsk(req)->acked. So,
logic is the same. If inet_rsk(req)->acked==1, it means we have
received third ack and client is in ESTABLISHED state.
So why bother resending needless synacks?


> If TCP_DEFER_ACCEPT is
> above sysctl_tcp_synack_retries there are no such
> retransmissions because server does not want to accept()
> such request_socks without data. So, servers decide
> what they want with the TCP_DEFER_ACCEPT period value.
>
> 3. SMTP-kind of protocol: server sends first,
> TCP_DEFER_ACCEPT must not be used by server.
>         3WHS is completed, there is no 4th packet from
> client. It is the server side that needs to move request_sock
> to child socket, to accept the connection and to send
> welcome message. AFAIK, child socket is created only on
> ACK. Or I'm missing something? Now the question is:

You raise a good point. As far as I can see, it looks like the
request_sock will not become a full socket
if server is the one that has to send the data first. The problem it
seems is that we always have to wait
for the client to send something to create the full socket. Since
server side already knows that TWHS has finished,
maybe we can find a way to move a request_sock to accept_queue as
space opens up without sending synack.


> how request_sock is moved into child socket if ACK is
> dropped on syn_recv_sock failure and we do not send SYN-ACK
> retransmissions to trigger ACK from client? Client does not
> plan to send new packet in short time.

Agreed. Unless I am missing something too, for cases where server
socket sends data first, the patch will break things though for
defer-sockets
it seems ok. Maybe we need another approach?

Thanks for the feedback. I will wait for suggestions as to what to do
with the patch and have another look at the code.

Regards,
Vijay
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Julian Anastasov Oct. 27, 2012, 8:43 a.m. UTC | #6
Hello,

On Fri, 26 Oct 2012, Vijay Subramanian wrote:

> Julian,
> Thanks for the review.
> 
> >
> >         We have 3 general cases:
> >
> > 1. HTTP-kind of protocol: client sends first, server without
> > TCP_DEFER_ACCEPT
> >
> >         Server should retransmit but anyways client will
> > send packet (ACK) that will move request_sock into child socket.
> 
> In  this case, is this retransmit from server and ack from client
> necessary? I believe that when client finally sends fourth packet i.e.
> data, server socket (currently request_sock) can move into accept
> queue and into ESTABLISHED state. So this patch just removes
> the syn-ack /ack transmissions in between.

	I agree that we have the right to stop SYN-ACK
retransmits but only if there is a mechanism that converts
request_sock into child socket and wakeups listener when
there is space for new child. For case 1 this patch works
because it is client's job to talk first. But patch can not
differentiate case 1 from case 3 where child is not created.

> > 2. HTTP-kind of protocol: client sends first, server with
> > TCP_DEFER_ACCEPT
> >
> >         Server retransmits before 3WHS to get ACK from
> > client. After ACK we keep request_sock because we do not
> > want to wakeup listener without data. During TCP_DEFER_ACCEPT
> > period nothing is retransmitted because we have ACK from
> > client. After TCP_DEFER_ACCEPT period we start retransmissions
> > because the server needs such request_socks to become
> > child sockets after the TCP_DEFER_ACCEPT period and because
> > received ACK is the only way to create child socket.
> > Server wants to accept() them.
> 
> As I mentioned, if client is sending data first, then the data packet
> from client will also cause the creation of child socket.
> If server sends the synack again, if client is not ready with data, it
> will just send an ack back. This is a needless synack/ack.

	Indeed, here TCP_DEFER_ACCEPT can also benefit from such 
mechanism to convert requests to childs. In this case, child must
appear after the TCP_DEFER_ACCEPT period (in case no data
is received). So, this mechanism should work depending on
the TCP_DEFER_ACCEPT period. We can stop such retransmits
only when new mechanism is implemented.

> On the other hand if client is ready with data, it will send it and
> server socket will create child socket if accept queue has space.
> 
> Even with DEFER_ACCEPT sockets, if a third ack comes in without data,
> it is remembered in inet_rsk(req)->acked. So,
> logic is the same. If inet_rsk(req)->acked==1, it means we have
> received third ack and client is in ESTABLISHED state.
> So why bother resending needless synacks?

	Only to trigger ACK that will create child. We
just want the child after the TCP_DEFER_ACCEPT period has
expired. Retransmits are the current mechanism to
create child, they are not goal. The goal is to create child.

> > If TCP_DEFER_ACCEPT is
> > above sysctl_tcp_synack_retries there are no such
> > retransmissions because server does not want to accept()
> > such request_socks without data. So, servers decide
> > what they want with the TCP_DEFER_ACCEPT period value.
> >
> > 3. SMTP-kind of protocol: server sends first,
> > TCP_DEFER_ACCEPT must not be used by server.
> >         3WHS is completed, there is no 4th packet from
> > client. It is the server side that needs to move request_sock
> > to child socket, to accept the connection and to send
> > welcome message. AFAIK, child socket is created only on
> > ACK. Or I'm missing something? Now the question is:
> 
> You raise a good point. As far as I can see, it looks like the
> request_sock will not become a full socket
> if server is the one that has to send the data first. The problem it
> seems is that we always have to wait
> for the client to send something to create the full socket. Since
> server side already knows that TWHS has finished,
> maybe we can find a way to move a request_sock to accept_queue as
> space opens up without sending synack.

	I'm not sure if that is possible, may be ACK comes
with more data that is not saved in request_sock, even
payload. It is lost. From time to time we get new fields
into struct request_sock but I don't know if we have
everything that is needed to create child, I doubt it.

> > how request_sock is moved into child socket if ACK is
> > dropped on syn_recv_sock failure and we do not send SYN-ACK
> > retransmissions to trigger ACK from client? Client does not
> > plan to send new packet in short time.
> 
> Agreed. Unless I am missing something too, for cases where server
> socket sends data first, the patch will break things though for
> defer-sockets
> it seems ok. Maybe we need another approach?

	The effect of current patch is:

- case 1: saves retransmits, that is good

- case 2: server can not accept child after TCP_DEFER_ACCEPT
period, clients can hate your service. 3WHS is complete
and it is expected longer timers to apply, say 15mins,
while server uses SYN_RECV state timeouts which are
shorter. Client can get RST if data is delayed above
that timers. Without the patch, we will trigger new ACK
from client and new child can enter established state.
TCP_DEFER_ACCEPT period is just an optimization that
avoids wakeups without data for a period but server
still has the following options:

	- send SYN-ACK, on ACK create child and send FIN
	- send SYN-ACK, on ACK create child and wait 15mins for data
	- expire them as request_socks if they do not send data,
	later send RST on delayed data from client

	With the patch we do not have a way to accept childs.
	You restrict server using TCP_DEFER_ACCEPT to
	shorter timeout to get client data because we
	are left only with the option to expire request_socks,
	server can prefer small SYN-ACK retry count but
	long timeout for client data.

- case 3: no welcome, client aborts

> Thanks for the feedback. I will wait for suggestions as to what to do
> with the patch and have another look at the code.

	I'll think on this problem but my knowledge in this
area is limited. May be the TCP gurus know better what can
be done. My guess is that child creation on ACK is mandatory,
especially when ACK comes with payload.

> Regards,
> Vijay

Regards

--
Julian Anastasov <ja@ssi.bg>
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eric Dumazet Oct. 27, 2012, 8:50 a.m. UTC | #7
On Fri, 2012-10-26 at 17:07 -0700, Vijay Subramanian wrote:

> 
> Thanks for the feedback. I will wait for suggestions as to what to do
> with the patch and have another look at the code.

It seems to me current code for defer accept should be fixed as well.

'Allowing' to send the last SYNACK hoping this wont be lost somewhere
seems really strange.

syn_ack_recalc(...)
{

...
        /*
         * Do not resend while waiting for data after ACK,
         * start to resend on end of deferring period to give
         * last chance for data or ACK to create established socket.
         */
        *resend = !inet_rsk(req)->acked ||
                  req->num_timeout >= rskq_defer_accept - 1;
}

DEFER_ACCEPT should really mean :

We have the socket, only defer putting it in the accept queue until :

- Client sent data.
Or
- A given amount of time has elapsed


Defer accept is typically used by protocols where client sends the first
chunk of data. 

If after XX seconds chunk of data is still not there, sending a SYNACK
wont really help : just pass the fd to application and application
probably will wait another XX seconds for the client request, then
timeout and close the socket.

So instead of sending a final SYNACK just to be able to receive an ACK
and finally put the request into listener accept queue is a waste of
bandwidth. We should put the request into listener accept queue and save
one/two messages.




--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eric Dumazet Oct. 27, 2012, 11:57 a.m. UTC | #8
On Fri, 2012-10-26 at 01:05 -0700, Vijay Subramanian wrote:
> Elliott Hughes <enh@google.com> saw strange behavior when server socket was not
> calling accept(). Client was receiving SYN-ACK back even when socket on server
> side was not yet available. Eric noted server sockets kept resending SYN_ACKS
> and further investigation revealed the following problem.
> 
> If server socket is slow to accept() connections, request_socks can represent
> connections for which the three-way handshake is already done.  From client's
> point of view, the connection is in ESTABLISHED state but on server side, socket
> is not in accept_queue or ESTABLISHED state.  When the syn-ack timer expires,
> because of the order in which tests are performed, server can retransmit the
> synack repeatedly. Following patch prevents the server from retransmitting the
> synack needlessly (and prevents client from replying with ack).  This reduces
> traffic when server is slow to accept() connections.
> 
> If the server socket has received the third ack during connection establishment,
> this is remembered in inet_rsk(req)->acked.  The request_sock will expire in
> around 30 seconds and will be dropped if it does not move into accept_queue.
> 
> With help from Eric Dumazet.
> 
> Reported-by: Eric Dumazet <edumazet@google.com>
> Acked-by: Neal Cardwell <ncardwell@google.com>
> Tested-by: Neal Cardwell <ncardwell@google.com>
> Acked-by: Eric Dumazet <edumazet@google.com>
> Signed-off-by: Vijay Subramanian <subramanian.vijay@gmail.com>
> ---
> Changes from V1: Changed Reported-by tag and commit message. Added Acked-by and
> Tested-by tags.
> 
> Ignoring "WARNING: line over 80 characters" in the interest of readability.
> 
>  net/ipv4/inet_connection_sock.c |    5 ++---
>  1 files changed, 2 insertions(+), 3 deletions(-)
> 
> diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c
> index d34ce29..4e8e52e 100644
> --- a/net/ipv4/inet_connection_sock.c
> +++ b/net/ipv4/inet_connection_sock.c
> @@ -598,9 +598,8 @@ void inet_csk_reqsk_queue_prune(struct sock *parent,
>  					       &expire, &resend);
>  				req->rsk_ops->syn_ack_timeout(parent, req);
>  				if (!expire &&
> -				    (!resend ||
> -				     !req->rsk_ops->rtx_syn_ack(parent, req, NULL) ||
> -				     inet_rsk(req)->acked)) {
> +				    (!resend || inet_rsk(req)->acked ||
> +				     !req->rsk_ops->rtx_syn_ack(parent, req, NULL))) {
>  					unsigned long timeo;
>  
>  					if (req->retrans++ == 0)


Part of the complexity of this is that req->retrans is the number of
timeouts, serving as the exponential backoff base.

Unfortunately we have a side effect because number of retransmits is
wrong for defer accept.

Here is what I suggest : upstream to net-next this patch we use at
Google :

Author: Eric Dumazet <edumazet@google.com>
Date:   Tue Oct 2 02:21:12 2012 -0700

net-tcp: better retrans tracking for defer-accept

For passive TCP connections using TCP_DEFER_ACCEPT facility,
we incorrectly increment req->retrans each time timeout triggers
while no SYNACK is sent.

Decouple req->retrans field into two fields :

num_retrans : number of retransmit
num_timeout : number of timeouts

(retrans was renamed to make sure we didnt miss an occurrence)

introduce inet_rtx_syn_ack() helper to increment num_retrans
only if ->rtx_syn_ack() succeeded.

Use inet_rtx_syn_ack() from tcp_check_req() to increment num_retrans
when we re-send a SYNACK in answer to a SYN. Prior to this patch,
we were not counting these retransmits.

Change tcp_v[46]_rtx_synack() to increment TCP_MIB_RETRANSSEGS
only if a synack packet was successfuly queued.

Reported-by: Yuchung Cheng <ycheng@google.com>
    
Then, we could more easily address this silly SYNACK syndrom.

What do you think ?


--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Julian Anastasov Oct. 27, 2012, 1:23 p.m. UTC | #9
Hello,

On Sat, 27 Oct 2012, Eric Dumazet wrote:

> > @@ -598,9 +598,8 @@ void inet_csk_reqsk_queue_prune(struct sock *parent,
> >  					       &expire, &resend);
> >  				req->rsk_ops->syn_ack_timeout(parent, req);
> >  				if (!expire &&
> > -				    (!resend ||
> > -				     !req->rsk_ops->rtx_syn_ack(parent, req, NULL) ||
> > -				     inet_rsk(req)->acked)) {
> > +				    (!resend || inet_rsk(req)->acked ||
> > +				     !req->rsk_ops->rtx_syn_ack(parent, req, NULL))) {
> >  					unsigned long timeo;
> >  
> >  					if (req->retrans++ == 0)
> 
> 
> Part of the complexity of this is that req->retrans is the number of
> timeouts, serving as the exponential backoff base.
> 
> Unfortunately we have a side effect because number of retransmits is
> wrong for defer accept.
> 
> Here is what I suggest : upstream to net-next this patch we use at
> Google :
> 
> Author: Eric Dumazet <edumazet@google.com>
> Date:   Tue Oct 2 02:21:12 2012 -0700
> 
> net-tcp: better retrans tracking for defer-accept
> 
> For passive TCP connections using TCP_DEFER_ACCEPT facility,
> we incorrectly increment req->retrans each time timeout triggers
> while no SYNACK is sent.
> 
> Decouple req->retrans field into two fields :
> 
> num_retrans : number of retransmit
> num_timeout : number of timeouts
> 
> (retrans was renamed to make sure we didnt miss an occurrence)
> 
> introduce inet_rtx_syn_ack() helper to increment num_retrans
> only if ->rtx_syn_ack() succeeded.

	This is dangerous, the first of the cases is route
failure, what if we just added reject route for some attacker?
We will get error forever. May be it is difficult to decide
which error should change the counter. IMHO, such reliability
is not needed, we can be short of memory too.

> Use inet_rtx_syn_ack() from tcp_check_req() to increment num_retrans
> when we re-send a SYNACK in answer to a SYN. Prior to this patch,
> we were not counting these retransmits.

	Such change looks correct. Of course, it has
side effect on current TCP_DEFER_ACCEPT calculations but
it is a TCP_DEFER_ACCEPT implementation problem.

> Change tcp_v[46]_rtx_synack() to increment TCP_MIB_RETRANSSEGS
> only if a synack packet was successfuly queued.

Regards

--
Julian Anastasov <ja@ssi.bg>
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eric Dumazet Oct. 27, 2012, 1:32 p.m. UTC | #10
On Sat, 2012-10-27 at 16:23 +0300, Julian Anastasov wrote:
> 	Hello,
> 
> On Sat, 27 Oct 2012, Eric Dumazet wrote:
> 

> > 
> > Author: Eric Dumazet <edumazet@google.com>
> > Date:   Tue Oct 2 02:21:12 2012 -0700
> > 
> > net-tcp: better retrans tracking for defer-accept
> > 
> > For passive TCP connections using TCP_DEFER_ACCEPT facility,
> > we incorrectly increment req->retrans each time timeout triggers
> > while no SYNACK is sent.
> > 
> > Decouple req->retrans field into two fields :
> > 
> > num_retrans : number of retransmit
> > num_timeout : number of timeouts
> > 
> > (retrans was renamed to make sure we didnt miss an occurrence)
> > 
> > introduce inet_rtx_syn_ack() helper to increment num_retrans
> > only if ->rtx_syn_ack() succeeded.
> 
> 	This is dangerous, the first of the cases is route
> failure, what if we just added reject route for some attacker?
> We will get error forever. May be it is difficult to decide
> which error should change the counter. IMHO, such reliability
> is not needed, we can be short of memory too.
> 


We increase num_timeout regardless of success or failure sending a
SYNACK (can be a route failure, a memory allocation failure, a full
qdisc...)

So its not 'forever'. The decision to abort a SYN_RECV is based on
num_timeouts only, not anymore on 'number of restransmits'

num_retrans is only counting number of SYNACKS that were sent.

num_retrans <= num_timeouts

(Usually its the same, unless you have errors, or DEFER_ACCEPT
mini-sockets)


> > Use inet_rtx_syn_ack() from tcp_check_req() to increment num_retrans
> > when we re-send a SYNACK in answer to a SYN. Prior to this patch,
> > we were not counting these retransmits.
> 
> 	Such change looks correct. Of course, it has
> side effect on current TCP_DEFER_ACCEPT calculations but
> it is a TCP_DEFER_ACCEPT implementation problem.

Better wait to see the patch, it changes nothing yet for
TCP_DEFER_ACCEPT

It only changes accounting problems, for more precise tracking of tcp
stack behavior.

TCP_DEFER_ACCEPT sockets have this strange accounting bug saying that
some packets were retransmitted, while its not true : We only were
waiting the user request.



--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c
index d34ce29..4e8e52e 100644
--- a/net/ipv4/inet_connection_sock.c
+++ b/net/ipv4/inet_connection_sock.c
@@ -598,9 +598,8 @@  void inet_csk_reqsk_queue_prune(struct sock *parent,
 					       &expire, &resend);
 				req->rsk_ops->syn_ack_timeout(parent, req);
 				if (!expire &&
-				    (!resend ||
-				     !req->rsk_ops->rtx_syn_ack(parent, req, NULL) ||
-				     inet_rsk(req)->acked)) {
+				    (!resend || inet_rsk(req)->acked ||
+				     !req->rsk_ops->rtx_syn_ack(parent, req, NULL))) {
 					unsigned long timeo;
 
 					if (req->retrans++ == 0)