
listen(2) backlog changes in or around Linux 3.1?

Message ID 1350634013.2293.262.camel@edumazet-glaptop
State RFC, archived
Delegated to: David Miller
Headers show

Commit Message

Eric Dumazet Oct. 19, 2012, 8:06 a.m. UTC
On Fri, 2012-10-19 at 08:50 +0200, Eric Dumazet wrote:

> I came to the same analysis as you.
> 
> Current behavior is stupid, because the traffic for such 'sockets' is
> insane:
> 
> As we sent a SYNACK, client sends the 3rd packet (ACK), and we ignore
> it.
> 
> Then we keep retransmitting SYNACKS....
> 
> Oh well.


What about the following patch?

 include/net/sock.h        |    7 ++++++-
 include/uapi/linux/snmp.h |    1 +
 net/ipv4/proc.c           |    1 +
 net/ipv4/tcp_ipv4.c       |    4 +++-
 net/ipv6/tcp_ipv6.c       |    3 ++-
 5 files changed, 13 insertions(+), 3 deletions(-)



--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Vijay Subramanian Oct. 19, 2012, 9:14 a.m. UTC | #1
> diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
> index ef998b0..0404926 100644
> --- a/net/ipv4/tcp_ipv4.c
> +++ b/net/ipv4/tcp_ipv4.c
> @@ -1507,7 +1507,7 @@ int tcp_v4_conn_request(struct sock *sk, struct sk_buff *skb)
>          * clogging syn queue with openreqs with exponentially increasing
>          * timeout.
>          */
> -       if (sk_acceptq_is_full(sk) && inet_csk_reqsk_queue_young(sk) > 1)
> +       if (__sk_acceptq_is_full(sk, inet_csk_reqsk_queue_young(sk)))
>                 goto drop;
>

For what it's worth, I think the changes make sense. But is there any
reason to exclude old request_socks in the call to
__sk_acceptq_is_full()?
as in
      if (__sk_acceptq_is_full(sk, inet_csk_reqsk_queue_len(sk)))
               goto drop;

I am not sure why the current code looks only at young request_socks.
Thanks,
Vijay
Eric Dumazet Oct. 19, 2012, 10:29 a.m. UTC | #2
On Fri, 2012-10-19 at 02:14 -0700, Vijay Subramanian wrote:
> > diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
> > index ef998b0..0404926 100644
> > --- a/net/ipv4/tcp_ipv4.c
> > +++ b/net/ipv4/tcp_ipv4.c
> > @@ -1507,7 +1507,7 @@ int tcp_v4_conn_request(struct sock *sk, struct sk_buff *skb)
> >          * clogging syn queue with openreqs with exponentially increasing
> >          * timeout.
> >          */
> > -       if (sk_acceptq_is_full(sk) && inet_csk_reqsk_queue_young(sk) > 1)
> > +       if (__sk_acceptq_is_full(sk, inet_csk_reqsk_queue_young(sk)))
> >                 goto drop;
> >
> 
> For what it's worth, I think the changes make sense. But is there any
> reason to exclude old request_socks in the call to
> __sk_acceptq_is_full()?
> as in
>       if (__sk_acceptq_is_full(sk, inet_csk_reqsk_queue_len(sk)))
>                goto drop;
> 
> I am not sure why the current code looks only at young request_socks.
> Thanks,
> Vijay

Old requests are assumed to be unlikely to complete (SYN attack).

Young requests are assumed to have a reasonable chance to complete.

Note that we drop the SYN packet, so it's not a 'final' decision.

Some other OSes send RST when the listener queue is full
(I tested FreeBSD 9.0 this morning.)

Note also we probably have a bug elsewhere:

If we send a SYNACK, then receive the ACK from the client, and the
acceptq is full, we should reset the connection. Right now we have a
kind of stupid situation, where we drop the ACK and leave the REQ in
the SYN_RECV state, so we retransmit SYNACKs.

I am working on this part as well.


Eric Dumazet Oct. 19, 2012, 11:39 a.m. UTC | #3
On Fri, 2012-10-19 at 12:29 +0200, Eric Dumazet wrote:
> On Fri, 2012-10-19 at 02:14 -0700, Vijay Subramanian wrote:
> > > diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
> > > index ef998b0..0404926 100644
> > > --- a/net/ipv4/tcp_ipv4.c
> > > +++ b/net/ipv4/tcp_ipv4.c
> > > @@ -1507,7 +1507,7 @@ int tcp_v4_conn_request(struct sock *sk, struct sk_buff *skb)
> > >          * clogging syn queue with openreqs with exponentially increasing
> > >          * timeout.
> > >          */
> > > -       if (sk_acceptq_is_full(sk) && inet_csk_reqsk_queue_young(sk) > 1)
> > > +       if (__sk_acceptq_is_full(sk, inet_csk_reqsk_queue_young(sk)))
> > >                 goto drop;
> > >
> > 
> > For what it's worth, I think the changes make sense. But is there any
> > reason to exclude old request_socks in the call to
> > __sk_acceptq_is_full()?
> > as in
> >       if (__sk_acceptq_is_full(sk, inet_csk_reqsk_queue_len(sk)))
> >                goto drop;
> > 
> > I am not sure why the current code looks only at young request_socks.
> > Thanks,
> > Vijay
> 
> Old requests are assumed to be unlikely to complete (SYN attack).
> 
> Young requests are assumed to have a reasonable chance to complete.
> 
> Note that we drop the SYN packet, so it's not a 'final' decision.
> 
> Some other OSes send RST when the listener queue is full
> (I tested FreeBSD 9.0 this morning.)
> 
> Note also we probably have a bug elsewhere:
> 
> If we send a SYNACK, then receive the ACK from the client, and the
> acceptq is full, we should reset the connection. Right now we have a
> kind of stupid situation, where we drop the ACK and leave the REQ in
> the SYN_RECV state, so we retransmit SYNACKs.
> 
> I am working on this part as well.
> 

Well, it seems to be a documented feature:

tcp_abort_on_overflow - BOOLEAN
        If listening service is too slow to accept new connections,
        reset them. Default state is FALSE. It means that if overflow
        occurred due to a burst, connection will recover. Enable this
        option _only_ if you are really sure that listening daemon
        cannot be tuned to accept connections faster. Enabling this
        option can harm clients of your server.




Patch

diff --git a/include/net/sock.h b/include/net/sock.h
index 0baccb6..d2ecfbe 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -698,9 +698,14 @@  static inline void sk_acceptq_added(struct sock *sk)
 	sk->sk_ack_backlog++;
 }
 
+static inline bool __sk_acceptq_is_full(const struct sock *sk, unsigned int young)
+{
+	return (sk->sk_ack_backlog + young) > sk->sk_max_ack_backlog;
+}
+
 static inline bool sk_acceptq_is_full(const struct sock *sk)
 {
-	return sk->sk_ack_backlog > sk->sk_max_ack_backlog;
+	return __sk_acceptq_is_full(sk, 0);
 }
 
 /*
diff --git a/include/uapi/linux/snmp.h b/include/uapi/linux/snmp.h
index fdfba23..5ff2daf 100644
--- a/include/uapi/linux/snmp.h
+++ b/include/uapi/linux/snmp.h
@@ -245,6 +245,7 @@  enum
 	LINUX_MIB_TCPFASTOPENPASSIVEFAIL,	/* TCPFastOpenPassiveFail */
 	LINUX_MIB_TCPFASTOPENLISTENOVERFLOW,	/* TCPFastOpenListenOverflow */
 	LINUX_MIB_TCPFASTOPENCOOKIEREQD,	/* TCPFastOpenCookieReqd */
+	LINUX_MIB_TCPSYNDROP,			/* TCPSynDrop */
 	__LINUX_MIB_MAX
 };
 
diff --git a/net/ipv4/proc.c b/net/ipv4/proc.c
index 8de53e1..a5f59ab 100644
--- a/net/ipv4/proc.c
+++ b/net/ipv4/proc.c
@@ -267,6 +267,7 @@  static const struct snmp_mib snmp4_net_list[] = {
 	SNMP_MIB_ITEM("TCPFastOpenPassiveFail", LINUX_MIB_TCPFASTOPENPASSIVEFAIL),
 	SNMP_MIB_ITEM("TCPFastOpenListenOverflow", LINUX_MIB_TCPFASTOPENLISTENOVERFLOW),
 	SNMP_MIB_ITEM("TCPFastOpenCookieReqd", LINUX_MIB_TCPFASTOPENCOOKIEREQD),
+	SNMP_MIB_ITEM("TCPSynDrop", LINUX_MIB_TCPSYNDROP),
 	SNMP_MIB_SENTINEL
 };
 
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index ef998b0..0404926 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -1507,7 +1507,7 @@  int tcp_v4_conn_request(struct sock *sk, struct sk_buff *skb)
 	 * clogging syn queue with openreqs with exponentially increasing
 	 * timeout.
 	 */
-	if (sk_acceptq_is_full(sk) && inet_csk_reqsk_queue_young(sk) > 1)
+	if (__sk_acceptq_is_full(sk, inet_csk_reqsk_queue_young(sk)))
 		goto drop;
 
 	req = inet_reqsk_alloc(&tcp_request_sock_ops);
@@ -1673,6 +1673,7 @@  drop_and_release:
 drop_and_free:
 	reqsk_free(req);
 drop:
+	NET_INC_STATS_BH(sock_net(sk), LINUX_MIB_TCPSYNDROP);
 	return 0;
 }
 EXPORT_SYMBOL(tcp_v4_conn_request);
diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c
index 26175bf..39ffc54 100644
--- a/net/ipv6/tcp_ipv6.c
+++ b/net/ipv6/tcp_ipv6.c
@@ -1054,7 +1054,7 @@  static int tcp_v6_conn_request(struct sock *sk, struct sk_buff *skb)
 			goto drop;
 	}
 
-	if (sk_acceptq_is_full(sk) && inet_csk_reqsk_queue_young(sk) > 1)
+	if (__sk_acceptq_is_full(sk, inet_csk_reqsk_queue_young(sk)))
 		goto drop;
 
 	req = inet6_reqsk_alloc(&tcp6_request_sock_ops);
@@ -1204,6 +1204,7 @@  drop_and_release:
 drop_and_free:
 	reqsk_free(req);
 drop:
+	NET_INC_STATS_BH(sock_net(sk), LINUX_MIB_TCPSYNDROP);
 	return 0; /* don't send reset */
 }