[1/2] socket: increase default maximum listen queue length

Message ID 1300662282-24741-1-git-send-email-hagen@jauu.net
State Changes Requested, archived
Delegated to: David Miller

Commit Message

Hagen Paul Pfeifer March 20, 2011, 11:04 p.m. UTC
sysctl_somaxconn (SOMAXCONN: 128) specifies the maximum number of
sockets in state SYN_RECV per listen socket queue. At listen(2) time the
backlog is clamped to this limit if it is larger.

Afterwards in reqsk_queue_alloc() the backlog value is checked again
(nr_table_entries == backlog):

    nr_table_entries = min_t(u32, nr_table_entries, sysctl_max_syn_backlog);
    nr_table_entries = max_t(u32, nr_table_entries, 8);
    nr_table_entries = roundup_pow_of_two(nr_table_entries + 1);

sysctl_max_syn_backlog, on the other hand, is adjusted dynamically,
depending on the memory characteristics of the system. The default is 256,
with 128 for small systems and up to 1024 for bigger systems.
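
For illustration, a minimal userspace sketch of the three sizing steps quoted
above. It is not the kernel code: roundup_pow2() and syn_table_entries() are
hypothetical names, and roundup_pow2() is a simple stand-in for the kernel's
roundup_pow_of_two().

```c
#include <stdint.h>

/* Round n up to the next power of two; a value that already is a power of
 * two is returned unchanged. Stand-in for roundup_pow_of_two(); assumes
 * 1 <= n <= 2^31. */
static uint32_t roundup_pow2(uint32_t n)
{
    uint32_t p = 1;

    while (p < n)
        p <<= 1;
    return p;
}

/* Apply the same three steps as the snippet above: cap at
 * max_syn_backlog, enforce a floor of 8, then round (n + 1) up to a
 * power of two. */
static uint32_t syn_table_entries(uint32_t backlog, uint32_t max_syn_backlog)
{
    uint32_t n = backlog;

    if (n > max_syn_backlog)
        n = max_syn_backlog;
    if (n < 8)
        n = 8;
    return roundup_pow2(n + 1);
}
```

With max_syn_backlog at 256, a backlog of 128 yields 256 table entries and a
backlog of 256 yields 512, because the "+ 1" pushes an exact power of two up
to the next one.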

For real server work the de facto sysctl_somaxconn limit seems inadequate:

    Experiments with real servers show, that it is absolutely not enough
    even at 100conn/sec. 256 cures most of problems.

Increase the default sysctl_somaxconn from 128 to 256 to meet today's
conditions, while nr_table_entries remains limited by sysctl_max_syn_backlog,
which is based on memory conditions
(max(128, (tcp_hashinfo.ehash_mask + 1) / 256)).

Signed-off-by: Hagen Paul Pfeifer <hagen@jauu.net>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
---
 include/linux/socket.h |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

Comments

David Miller March 20, 2011, 11:09 p.m. UTC | #1
From: Hagen Paul Pfeifer <hagen@jauu.net>
Date: Mon, 21 Mar 2011 00:04:41 +0100

> For real server work the de facto sysctl_somaxconn limit seems inadequate:
> 
>     Experiments with real servers show, that it is absolutely not enough
>     even at 100conn/sec. 256 cures most of problems.

Absolutely no context is provided for this number.

What's the RTT?  How fast are the cpus?  etc.

You must tell the whole story in order to justify these changes
properly.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Hagen Paul Pfeifer March 20, 2011, 11:52 p.m. UTC | #2
* David Miller | 2011-03-20 16:09:06 [-0700]:

>Absolutely no context is provided for this number.
>
>What's the RTT?  How fast are the cpus?  etc.
>
>You must tell the whole story in order to justify these changes
>properly.

(you can skip the first paragraphs and read the last one ;)

The number is somewhat magic, like many other values. I grepped
tglx/history.git, but the comment (at that time in tcp_ipv4.c) seems to date
from before 2002.

Providing context is a little bit artificial: I can construct a scenario with
an RTT of 200ms and 1000 connection requests per second, and the table will
overflow. This can happen, sure. On the other hand, there are scenarios with
an RTT of 20ms and 10 connection requests per second - no problem there.
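
The arithmetic behind these two scenarios can be sketched as follows. This is
a rough steady-state estimate only, under the assumption that each request
occupies one queue entry for about one RTT; bursts and retransmitted SYN-ACKs
are ignored. pending_requests() and would_overflow() are hypothetical helper
names.

```c
#include <stdint.h>

/* Estimated number of entries in flight: arrival rate times the time each
 * entry is held (assumed here to be one RTT). */
static uint32_t pending_requests(uint32_t conn_per_sec, uint32_t rtt_ms)
{
    return (uint32_t)(((uint64_t)conn_per_sec * rtt_ms) / 1000);
}

/* Nonzero if the estimate exceeds the given queue length. */
static int would_overflow(uint32_t conn_per_sec, uint32_t rtt_ms,
                          uint32_t queue_len)
{
    return pending_requests(conn_per_sec, rtt_ms) > queue_len;
}
```

Under this estimate, 1000 requests per second at 200ms RTT keep about 200
entries in flight, which exceeds a queue of 128 but fits into 256. 10 requests
per second at 20ms RTT stay below a single entry on average.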

Increasing the number _has_ one essential advantage: it is aligned with
sysctl_max_syn_backlog, which in turn is determined by memory characteristics.


Without patch (sysctl not modified, BUT sysctl_max_syn_backlog depending on memory characteristics):

listen-queue-length = max(8, min(userspace_backlog, min(128, sysctl_max_syn_backlog)))

With patch (sysctl not modified, BUT sysctl_max_syn_backlog depending on memory characteristics):

listen-queue-length = max(8, min(userspace_backlog, min(256, sysctl_max_syn_backlog)))
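
The two formulas above differ only in the somaxconn constant, so they can be
written as one function. A small userspace sketch; listen_queue_length() and
the min/max helpers are hypothetical names, not kernel symbols.

```c
#include <stdint.h>

static uint32_t min_u32(uint32_t a, uint32_t b) { return a < b ? a : b; }
static uint32_t max_u32(uint32_t a, uint32_t b) { return a > b ? a : b; }

/* Effective listen queue length as given by the formulas above:
 * somaxconn is 128 without the patch and 256 with it. */
static uint32_t listen_queue_length(uint32_t userspace_backlog,
                                    uint32_t somaxconn,
                                    uint32_t max_syn_backlog)
{
    return max_u32(8, min_u32(userspace_backlog,
                              min_u32(somaxconn, max_syn_backlog)));
}
```

For a userspace backlog of 1024 and sysctl_max_syn_backlog of 1024, this gives
128 without the patch and 256 with it. On a small system where
sysctl_max_syn_backlog is 128, both variants give 128.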


The point is: sysctl_max_syn_backlog defaults to 256, with 128 for small
systems and up to 1024 for larger systems. But sysctl_somaxconn (128) will
_always_ restrict the queue length to 128 and therefore makes
sysctl_max_syn_backlog de facto ineffective. IMHO sysctl_somaxconn should be
removed; the overhead of the listen queue per listening socket is
insignificant, especially because sysctl_max_syn_backlog already considers
memory characteristics. There are far more connected sockets than these <10
listening sockets, but the performance loss caused by an overflowing listen
queue will always be noticeable:

netstat -s | grep overflowed
    2621 times the listen queue of a socket overflowed

Hagen
Hagen Paul Pfeifer March 20, 2011, 11:57 p.m. UTC | #3
* David Miller | 2011-03-20 16:09:06 [-0700]:

>You must tell the whole story in order to justify these changes
>properly.

BTW: the second patch is independent and provides a new feature, mentioned by
Eric.

Hagen
David Miller March 21, 2011, 12:18 a.m. UTC | #4
From: Hagen Paul Pfeifer <hagen@jauu.net>
Date: Mon, 21 Mar 2011 00:52:53 +0100

> The number is somewhat magic, like many other values. I grepped
> tglx/history.git, but the comment (at that time in tcp_ipv4.c) seems to date
> from before 2002.

Then don't use that number as part of the justification for the
change.

Describe what matters, and only what matters.  Providing magic and
arbitrary numbers doesn't help people reading your commit message.
Patch

diff --git a/include/linux/socket.h b/include/linux/socket.h
index edbb1d0..bf35ce2 100644
--- a/include/linux/socket.h
+++ b/include/linux/socket.h
@@ -237,7 +237,7 @@  struct ucred {
 #define PF_MAX		AF_MAX
 
 /* Maximum queue length specifiable by listen.  */
-#define SOMAXCONN	128
+#define SOMAXCONN	256
 
 /* Flags we can use with send/ and recv. 
    Added those for 1003.1g not all are supported yet