
add a sysctl to disable TCP simultaneous connection opening

Message ID 20081008081109.GA25342@1wt.eu
State Rejected, archived
Delegated to: David Miller

Commit Message

Willy Tarreau Oct. 8, 2008, 8:11 a.m. UTC
Hi David,

I hope you had a pleasant journey in Paris last week. It was nice to meet you.

In 2005, I submitted a patch for 2.6.11 on which we never reached a
decision; it was about the ability to disable TCP simultaneous connection
open.

For the last few years, we've been bothered several times by newbies
reading "TCP/IP for dummies" then trying to make a name for themselves by
discovering the ultimate vulnerability which will get a lot of press
coverage. Of course those attacks are often pointless or just proofs of
concept with no application in real life, but it's nonetheless annoying to
have to deal with the issues, especially to explain to customers why they
shouldn't have to worry.

I would not be surprised if the next one exploits TCP's ability to
perform simultaneous connections between two clients. It's very easy
to trigger, there's no SEQ to guess, just a port, and the effect is
simply a poor DoS on the service trying to connect outside. In other
times we would have found it very minor, but judging by the consideration
given to harder and less effective "attacks" these days, this trivial
one may finally get picked and annoy us again.

As a reminder (especially for those who are not aware of this feature),
it is possible with TCP to connect two clients together if both send
crossed SYNs, then SYN-ACKs, then ACKs. This implies that each side
accepts the sequence number of the other one without any ability to
check that it matches its SYN. So it's trivial for an attacker to
prevent one client from establishing a connection from a known port
to a known address/port by sending it a SYN to that port. The client
will then send a SYN-ACK and will not accept the expected server's
SYN-ACK because the SYN SEQ will be different. The server might also
send an RST on the client's SYN-ACK if it's not firewalled. The
connection will eventually time out in a SYN-RECV state or simply be
aborted.
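
To make the sequence above concrete, here is a minimal userspace sketch of a
client socket receiving a bare SYN while in SYN-SENT. It is not kernel code:
the types, the on_segment() function and the allow_simult_open flag (standing
in for the proposed sysctl) are made up for illustration, and the SYN-RECV
check is simplified to an exact sequence match where a real stack would use a
window check.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified model of one TCP endpoint; not kernel code. */
enum state { SYN_SENT, SYN_RECV, ESTABLISHED };

struct endpoint {
    enum state st;
    uint32_t   snd_isn;   /* our initial sequence number */
    uint32_t   rcv_nxt;   /* next sequence number expected from the peer */
};

struct segment {
    bool     syn, ack;
    uint32_t seq, ack_seq;
};

/*
 * Process one incoming segment on a socket in SYN_SENT or SYN_RECV.
 * Returns true if the segment was accepted, false if it was dropped.
 * allow_simult_open models the proposed tcp_simult_connect switch.
 */
bool on_segment(struct endpoint *ep, const struct segment *seg,
                bool allow_simult_open)
{
    if (ep->st == SYN_SENT) {
        if (seg->syn && seg->ack) {           /* normal SYN-ACK */
            if (seg->ack_seq != ep->snd_isn + 1)
                return false;                 /* does not ack our SYN */
            ep->rcv_nxt = seg->seq + 1;
            ep->st = ESTABLISHED;
            return true;
        }
        if (seg->syn && allow_simult_open) {  /* crossed SYN */
            /* Nothing ties this SYN to ours: any seq is accepted. */
            ep->rcv_nxt = seg->seq + 1;
            ep->st = SYN_RECV;
            return true;
        }
        return false;
    }

    if (ep->st == SYN_RECV) {
        /* The segment must ack our SYN and continue the sequence space
         * learned from the crossed SYN (exact match, a simplification). */
        bool seq_ok = seg->syn ? (seg->seq + 1 == ep->rcv_nxt)
                               : (seg->seq == ep->rcv_nxt);
        if (seg->ack && seg->ack_seq == ep->snd_isn + 1 && seq_ok) {
            ep->st = ESTABLISHED;
            return true;
        }
        return false;
    }

    return false;
}
```

In this model, a forged SYN with an arbitrary sequence number moves the client
to SYN_RECV, after which the real server's SYN-ACK no longer matches and is
dropped. With allow_simult_open set to false the forged SYN is ignored and the
server's SYN-ACK completes the handshake as usual.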

The theoretical DoS effect on some predictable address/port destinations
is easy to understand. Services with very few destination IP/ports such
as software/signature updates, SMTP relaying, DNS clients for zone
transfers, or SSH remote accesses are easy targets. In practice, the
SYN would have to be sent after the client's SYN and before the server's
SYN-ACK, which leaves a small time window limiting the attack to far
remote, unfirewalled communications.

This is very easy to test. I usually do it between two netcats,
preventing the initial RST by unplugging the cable before issuing the
connects. I remember it also worked on Solaris 8, and I don't remember
about BSDs (though I would not be surprised if they support it too).

IMHO this feature is totally useless nowadays, because:
  - if one of the machines is firewalled, the firewall will block it
    (none of the firewalls I've tested among Netfilter, Checkpoint,
     Cisco, Fortinet, Juniper supports simultaneous connect, and it
     would cause a big security issue).

  - if neither machine is firewalled, then the SYN to a closed port
    will immediately trigger an RST, making it very difficult to
    establish a working connection.

For this reason, I'd like us to plan on merging the attached patch
(or any variant) for 2.6.28 or 2.6.29 before a new random junkie comes
around screaming loudly that he uncovered a big DoS hole in the Linux TCP
stack. The patch provides a sysctl allowing the user to enable or disable
the feature. I've been running all my kernels with the code ifdef'ed out
for the last 4-5 years, but I can certainly understand that some people
want to be able to enable it for any reason (even for educational
purposes), hence the sysctl.

I have rediffed the patch for 2.6.27-rc9 and successfully tested it (both
with the feature enabled and disabled). It disables the feature by default,
but I have no problem with leaving it on and expecting that distros will
ship it off.
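
For reference, on a kernel carrying this patch the switch would show up as
/proc/sys/net/ipv4/tcp_simult_connect (this follows from the procname and the
ipv4 table in the patch; the file is absent on unpatched kernels). The sketch
below is one possible way to read such an integer sysctl from C.
read_int_sysctl() is a hypothetical helper name, not an existing API.

```c
#include <stdio.h>

/*
 * Read a single integer from a sysctl-style file.
 * Returns 0 on success and stores the value in *out, or -1 on any error
 * (for example when the kernel does not carry the patch and the file
 * does not exist).
 */
int read_int_sysctl(const char *path, int *out)
{
    FILE *f = fopen(path, "r");
    int ok;

    if (!f)
        return -1;
    ok = (fscanf(f, "%d", out) == 1);
    fclose(f);
    return ok ? 0 : -1;
}
```

A caller would pass "/proc/sys/net/ipv4/tcp_simult_connect" as the path and
treat a return value of -1 as "sysctl not available".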

By the way, during the tests I noticed something strange. While the
socket is in SYN_RECV on the first side to receive the other one's
SYN, it has a huge receive queue:

Active Internet connections (servers and established)
Proto Recv-Q     Send-Q Local Address        Foreign Address      State
tcp   3964220580      1 192.32.189.160:12346 192.32.189.228:4000  SYN_RECV

The value corresponds to the ACK value emitted, which is equal to the
other end's SYN+1. I don't know if this is just an artefact of the way
the queue size is reported (probably because the first ACK has not yet
been considered since we're not in ESTABLISHED state) or if this can
have any further impact (e.g. unexpected memory freeing on termination,
etc.).
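
One possible reading of this number, offered as an assumption and not as a
confirmed diagnosis: if Recv-Q is computed as the next expected sequence
number minus the sequence number already copied to user space, and the latter
is still zero while the socket sits in SYN_RECV, then the column would show
the raw ACK value. The names rcv_nxt and copied_seq follow common TCP
terminology; recv_queue_len() is a hypothetical helper.

```c
#include <stdint.h>

/*
 * Hypothetical helper: queued bytes = next expected seq - seq already read.
 * Unsigned 32-bit arithmetic wraps the same way TCP sequence numbers do.
 */
uint32_t recv_queue_len(uint32_t rcv_nxt, uint32_t copied_seq)
{
    return rcv_nxt - copied_seq;
}
```

Under that assumption, a peer SYN carrying sequence number 3964220579 gives
rcv_nxt = 3964220580, and with copied_seq still at 0 the reported queue is
3964220580, matching the netstat output above. Once copied_seq is set to
rcv_nxt the same computation yields 0.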

Best regards,
Willy


From 61abc5ef6c3bc210c63036b5f36cc96a7802b605 Mon Sep 17 00:00:00 2001
From: Willy Tarreau <w@1wt.eu>
Date: Wed, 8 Oct 2008 10:00:42 +0200
Subject: TCP: add a sysctl to disable simultaneous connection opening.

Strict implementation of RFC793 (TCP) requires support for a feature
called "simultaneous connect", which allows two clients to connect to
each other without anyone entering a listening state.  While almost
never used, and supported by few OSes, Linux supports this feature.

However, it introduces a weakness in the protocol which makes it very
easy for an attacker to prevent a client from connecting to a known
server. The attacker only has to guess the source port to shut down
the client connection during its establishment. The impact is limited,
but it may be used to prevent an antivirus or IPS from fetching updates
and not detecting an attack, or to prevent an SSL gateway from fetching
a CRL for example.

This patch provides a new sysctl "tcp_simult_connect" to enable or disable
support for this useless feature. It comes disabled by default.

Hundreds of systems running with that feature disabled for more than 4 years
have never encountered an application which requires it. It is almost never
supported by firewalls BTW.

Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 Documentation/networking/ip-sysctl.txt |   22 ++++++++++++++++++++++
 include/linux/sysctl.h                 |    1 +
 include/net/tcp.h                      |    1 +
 net/ipv4/sysctl_net_ipv4.c             |    8 ++++++++
 net/ipv4/tcp_input.c                   |    5 ++++-
 5 files changed, 36 insertions(+), 1 deletions(-)

Comments

Michael Kerrisk (man-pages) Oct. 8, 2008, 9:19 a.m. UTC | #1
[CC+=linux-api@vger.kernel.org]

(As very recently noted in SubmittingPatches)

On Wed, Oct 8, 2008 at 10:11 AM, Willy Tarreau <w@1wt.eu> wrote:
> diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt
> index d849326..cefc894 100644
> --- a/Documentation/networking/ip-sysctl.txt
> +++ b/Documentation/networking/ip-sysctl.txt
> @@ -101,6 +101,28 @@ inet_peer_gc_maxtime - INTEGER
>
>  TCP variables:
>
> +tcp_simult_connect - BOOLEAN
> +       Enables TCP simultaneous connect feature conforming to RFC793.
> +       Strict implementation of RFC793 (TCP) requires support for a feature
> +       called "simultaneous connect", which allows two clients to connect to
> +       each other without anyone entering a listening state.  While almost
> +       never used, and supported by few OSes, Linux supports this feature.
> +
> +       However, it introduces a weakness in the protocol which makes it very
> +       easy for an attacker to prevent a client from connecting to a known
> +       server. The attacker only has to guess the source port to shut down
> +       the client connection during its establishment. The impact is limited,
> +       but it may be used to prevent an antivirus or IPS from fetching updates
> +       and not detecting an attack, or to prevent an SSL gateway from fetching
> +       a CRL for example.
> +
> +       If you want absolute compatibility with any possible application,
> +       you should set it to 1. If you prefer to enhance security on your
> +       systems you'd better let it to 0. After four years of usage on
> +       hundreds of systems, no application was ever found to require this
> +       feature, which is not even supported by most firewalls.
> +       Default: 0
> +
>  somaxconn - INTEGER
>        Limit of socket listen() backlog, known in userspace as SOMAXCONN.
>        Defaults to 128.  See also tcp_max_syn_backlog for additional tuning
> diff --git a/include/linux/sysctl.h b/include/linux/sysctl.h
> index d0437f3..0e23062 100644
> --- a/include/linux/sysctl.h
> +++ b/include/linux/sysctl.h
> @@ -435,6 +435,7 @@ enum
>        NET_TCP_ALLOWED_CONG_CONTROL=123,
>        NET_TCP_MAX_SSTHRESH=124,
>        NET_TCP_FRTO_RESPONSE=125,
> +       NET_TCP_SIMULT_CONNECT=126,
>  };
>
>  enum {
> diff --git a/include/net/tcp.h b/include/net/tcp.h
> index 8983386..c61fe3c 100644
> --- a/include/net/tcp.h
> +++ b/include/net/tcp.h
> @@ -236,6 +236,7 @@ extern int sysctl_tcp_base_mss;
>  extern int sysctl_tcp_workaround_signed_windows;
>  extern int sysctl_tcp_slow_start_after_idle;
>  extern int sysctl_tcp_max_ssthresh;
> +extern int sysctl_tcp_simult_connect;
>
>  extern atomic_t tcp_memory_allocated;
>  extern atomic_t tcp_sockets_allocated;
> diff --git a/net/ipv4/sysctl_net_ipv4.c b/net/ipv4/sysctl_net_ipv4.c
> index e0689fd..d2a73ec 100644
> --- a/net/ipv4/sysctl_net_ipv4.c
> +++ b/net/ipv4/sysctl_net_ipv4.c
> @@ -716,6 +716,14 @@ static struct ctl_table ipv4_table[] = {
>                .proc_handler   = &proc_dointvec,
>        },
>        {
> +               .ctl_name       = NET_TCP_SIMULT_CONNECT,
> +               .procname       = "tcp_simult_connect",
> +               .data           = &sysctl_tcp_simult_connect,
> +               .maxlen         = sizeof(int),
> +               .mode           = 0644,
> +               .proc_handler   = &proc_dointvec,
> +       },
> +       {
>                .ctl_name       = CTL_UNNUMBERED,
>                .procname       = "udp_mem",
>                .data           = &sysctl_udp_mem,
> diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
> index 67ccce2..932504e 100644
> --- a/net/ipv4/tcp_input.c
> +++ b/net/ipv4/tcp_input.c
> @@ -87,6 +87,7 @@ int sysctl_tcp_max_orphans __read_mostly = NR_FILE;
>  int sysctl_tcp_frto __read_mostly = 2;
>  int sysctl_tcp_frto_response __read_mostly;
>  int sysctl_tcp_nometrics_save __read_mostly;
> +int sysctl_tcp_simult_connect __read_mostly;
>
>  int sysctl_tcp_moderate_rcvbuf __read_mostly = 1;
>  int sysctl_tcp_abc __read_mostly;
> @@ -5149,10 +5150,12 @@ discard:
>            tcp_paws_check(&tp->rx_opt, 0))
>                goto discard_and_undo;
>
> -       if (th->syn) {
> +       if (th->syn && sysctl_tcp_simult_connect) {
>                /* We see SYN without ACK. It is attempt of
>                 * simultaneous connect with crossed SYNs.
>                 * Particularly, it can be connect to self.
> +                * This feature is disabled by default as it introduces a
> +                * weakness in the protocol. It can be enabled by a sysctl.
>                 */
>                tcp_set_state(sk, TCP_SYN_RECV);
>
> --
> 1.5.3.3
>
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>
Stephen Hemminger Oct. 8, 2008, 11:54 a.m. UTC | #2
Does this break NAT traversal via STUNT used by applications like Skype?
Willy Tarreau Oct. 8, 2008, 12:10 p.m. UTC | #3
Hi Stephen,

On Wed, Oct 08, 2008 at 01:54:02PM +0200, Stephen Hemminger wrote:
> Does this break NAT traversal via STUNT used by applications like Skype?

Not IMHO. I've seen UDP-based protocols used for NAT traversal, since UDP is
stateless and the firewall is not able to perform much control, but TCP
would not work with most firewalls (starting with netfilter) because the
firewall will not accept a SYN in response to a SYN.

Regards,
Willy

Pavel Emelyanov Oct. 8, 2008, 12:16 p.m. UTC | #4
> diff --git a/include/linux/sysctl.h b/include/linux/sysctl.h
> index d0437f3..0e23062 100644
> --- a/include/linux/sysctl.h
> +++ b/include/linux/sysctl.h
> @@ -435,6 +435,7 @@ enum
>  	NET_TCP_ALLOWED_CONG_CONTROL=123,
>  	NET_TCP_MAX_SSTHRESH=124,
>  	NET_TCP_FRTO_RESPONSE=125,
> +	NET_TCP_SIMULT_CONNECT=126,

Minor - the sysctl syscall (as well as numbers and ->strategy callbacks) is
about to be removed at some point, so new sysctls are to be CTL_UNNUMBERED.

>  };
>  
>  enum {
Willy Tarreau Oct. 8, 2008, 12:26 p.m. UTC | #5
Hi Pavel,

On Wed, Oct 08, 2008 at 04:16:44PM +0400, Pavel Emelyanov wrote:
> > diff --git a/include/linux/sysctl.h b/include/linux/sysctl.h
> > index d0437f3..0e23062 100644
> > --- a/include/linux/sysctl.h
> > +++ b/include/linux/sysctl.h
> > @@ -435,6 +435,7 @@ enum
> >  	NET_TCP_ALLOWED_CONG_CONTROL=123,
> >  	NET_TCP_MAX_SSTHRESH=124,
> >  	NET_TCP_FRTO_RESPONSE=125,
> > +	NET_TCP_SIMULT_CONNECT=126,
> 
> > Minor - the sysctl syscall (as well as numbers and ->strategy callbacks) is
> > about to be removed at some point, so new sysctls are to be CTL_UNNUMBERED.

I noticed that one in the two latest UDP sysctls, but was not aware of the
rules related to its use. Given the number of unconverted sysctls, is it
worth changing it now?

Willy

Pavel Emelyanov Oct. 8, 2008, 12:32 p.m. UTC | #6
Willy Tarreau wrote:
> Hi Pavel,
> 
> On Wed, Oct 08, 2008 at 04:16:44PM +0400, Pavel Emelyanov wrote:
>>> diff --git a/include/linux/sysctl.h b/include/linux/sysctl.h
>>> index d0437f3..0e23062 100644
>>> --- a/include/linux/sysctl.h
>>> +++ b/include/linux/sysctl.h
>>> @@ -435,6 +435,7 @@ enum
>>>  	NET_TCP_ALLOWED_CONG_CONTROL=123,
>>>  	NET_TCP_MAX_SSTHRESH=124,
>>>  	NET_TCP_FRTO_RESPONSE=125,
>>> +	NET_TCP_SIMULT_CONNECT=126,
>> Minor - the sysctl syscall (as well as numbers and ->strategy callbacks) is
>> about to be removed at some point, so new sysctls are to be CTL_UNNUMBERED.
> 
> I noticed that one in the two latest UDP sysctls, but was not aware of the
> rules related to its use. Given the number of unconverted sysctls, is it
> worth changing it now?

Sorry, I don't get you here. Sysctls are not supposed to get converted,
the intention is just to a) stop adding new sysctls with strategies and
b) drop all the existing strategies in the future.

Willy Tarreau Oct. 8, 2008, 12:56 p.m. UTC | #7
On Wed, Oct 08, 2008 at 04:32:29PM +0400, Pavel Emelyanov wrote:
> Willy Tarreau wrote:
> > Hi Pavel,
> > 
> > On Wed, Oct 08, 2008 at 04:16:44PM +0400, Pavel Emelyanov wrote:
> >>> diff --git a/include/linux/sysctl.h b/include/linux/sysctl.h
> >>> index d0437f3..0e23062 100644
> >>> --- a/include/linux/sysctl.h
> >>> +++ b/include/linux/sysctl.h
> >>> @@ -435,6 +435,7 @@ enum
> >>>  	NET_TCP_ALLOWED_CONG_CONTROL=123,
> >>>  	NET_TCP_MAX_SSTHRESH=124,
> >>>  	NET_TCP_FRTO_RESPONSE=125,
> >>> +	NET_TCP_SIMULT_CONNECT=126,
> >> Minor - sysctl syscall (as long as numbers and ->strategy callbacks) are
> >> about to be removed some time, so new sysctls are to be CTL_UNNUMBERED.
> > 
> > I noticed that one in the two latest UDP sysctls, but was not aware of the
> > rules related to its use. Given the number of unconverted sysctls, is it
> > worth changing it now ?
> 
> Sorry, I don't get you here. Sysctls are not supposed to get converted,
> the intention is just to a) stop adding new sysctls with strategies and
> b) drop all the existing strategies in the future.

If existing declarations eventually get changed, that's what I call a
conversion :-)

OK I get you here. The patch will be resent with CTL_UNNUMBERED if there is
an agreement to get it merged.

Regards,
Willy

Randy Dunlap Oct. 8, 2008, 3:50 p.m. UTC | #8
On Wed, 8 Oct 2008 10:11:09 +0200 Willy Tarreau wrote:

>  Documentation/networking/ip-sysctl.txt |   22 ++++++++++++++++++++++
> 
> diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt
> index d849326..cefc894 100644
> --- a/Documentation/networking/ip-sysctl.txt
> +++ b/Documentation/networking/ip-sysctl.txt
> @@ -101,6 +101,28 @@ inet_peer_gc_maxtime - INTEGER
>  
>  TCP variables: 
>  
> +tcp_simult_connect - BOOLEAN
> +	Enables TCP simultaneous connect feature conforming to RFC793.
> +	Strict implementation of RFC793 (TCP) requires support for a feature
> +	called "simultaneous connect", which allows two clients to connect to
> +	each other without anyone entering a listening state.  While almost
> +	never used, and supported by few OSes, Linux supports this feature.
> +
> +	However, it introduces a weakness in the protocol which makes it very
> +	easy for an attacker to prevent a client from connecting to a known
> +	server. The attacker only has to guess the source port to shut down
> +	the client connection during its establishment. The impact is limited,
> +	but it may be used to prevent an antivirus or IPS from fetching updates
> +	and not detecting an attack, or to prevent an SSL gateway from fetching
> +	a CRL for example.
> +
> +	If you want absolute compatibility with any possible application,
> +	you should set it to 1. If you prefer to enhance security on your
> +	systems you'd better let it to 0. After four years of usage on

	                     set it to 0.
or did you mean: (?)
	                     let it be 0.

> +	hundreds of systems, no application was ever found to require this
> +	feature, which is not even supported by most firewalls.
> +	Default: 0
> +
>  somaxconn - INTEGER
>  	Limit of socket listen() backlog, known in userspace as SOMAXCONN.
>  	Defaults to 128.  See also tcp_max_syn_backlog for additional tuning


---
~Randy
Willy Tarreau Oct. 8, 2008, 4:04 p.m. UTC | #9
On Wed, Oct 08, 2008 at 08:50:34AM -0700, Randy Dunlap wrote:
> > +	If you want absolute compatibility with any possible application,
> > +	you should set it to 1. If you prefer to enhance security on your
> > +	systems you'd better let it to 0. After four years of usage on
> 
> 	                     set it to 0.
> or did you mean: (?)
> 	                     let it be 0.

I meant "leave it" but I'd prefer "let it be 0".

Thanks Randy,
Willy

David Miller Oct. 8, 2008, 4:42 p.m. UTC | #10
I'm not really interested in applying this.

You can make the same argument for just about any queer corner case
aspect of TCP.

The world hasn't ended even though this patch hasn't applied since
2005 and that's a good argument to not apply "yet another random knob"
to the kernel.
Willy Tarreau Oct. 8, 2008, 5:07 p.m. UTC | #11
On Wed, Oct 08, 2008 at 09:42:12AM -0700, David Miller wrote:
> 
> I'm not really interested in applying this.

OK.

> You can make the same argument for just about any queer corner case
> aspect of TCP.

Not exactly, since there are very few corner cases not conditioned by
guessing a 32-bit sequence number, and this one certainly is one of them.

> The world hasn't ended even though this patch hasn't applied since
> 2005

Oh, don't get me wrong. I'm pretty much convinced this is not critical at
all. It's just that it's a tempting attack vector for an easy DoS proof
of concept relying on a feature not even usable today.

> and that's a good argument to not apply "yet another random knob"
> to the kernel.

OK for the random knob. If you feel comfortable with a patch to simply remove
the feature, I can send you such a patch too. BTW, the strange netstat output
I got also shows that the code paths involved to support this feature are not
much exercised, which would be another argument to disable it.

Anyway, I'm not trying to put any pressure on you. I can continue to live with
the ifdef patch in my trees. I just wanted to get your opinion on this one since
the discussion I initiated about it in 2005 led nowhere.

Regards,
Willy

David Miller Oct. 8, 2008, 6:21 p.m. UTC | #12
From: Willy Tarreau <w@1wt.eu>
Date: Wed, 8 Oct 2008 19:07:39 +0200

> On Wed, Oct 08, 2008 at 09:42:12AM -0700, David Miller wrote:
> > and that's a good argument to not apply "yet another random knob"
> > to the kernel.
> 
> OK for the random knob. If you feel comfortable with a patch to simply remove
> the feature, I can send you such a patch too.

If I don't want the knob, what makes you think I'll accept a patch to
remove or ifdef the code?

I absolutely will not.
Willy Tarreau Oct. 8, 2008, 6:24 p.m. UTC | #13
On Wed, Oct 08, 2008 at 11:21:33AM -0700, David Miller wrote:
> From: Willy Tarreau <w@1wt.eu>
> Date: Wed, 8 Oct 2008 19:07:39 +0200
> 
> > On Wed, Oct 08, 2008 at 09:42:12AM -0700, David Miller wrote:
> > > and that's a good argument to not apply "yet another random knob"
> > > to the kernel.
> > 
> > OK for the random knob. If you feel comfortable with a patch to simply remove
> > the feature, I can send you such a patch too.
> 
> If I don't want the knob, what makes you think I'll accept a patch to
> remove or ifdef the code?
> 
> I absolutely will not.

OK at least now it's clear.

Thanks anyway,
Willy

Andi Kleen Oct. 9, 2008, 3:49 a.m. UTC | #14
Willy Tarreau <w@1wt.eu> writes:
> As a reminder (especially for those who are not aware of this feature),
> it is possible with TCP to connect two clients together if both send
> crossed SYNs, then SYN-ACKs, then ACKs. This implies that each side
> accepts the sequence number of the other one without any ability to
> check that it matches its SYN. So it's trivial for an attacker to
> prevent one client from establishing a connection from a known port
> to a known address/port by sending it a SYN to that port. The client
> will then send a SYN-ACK and will not accept the expected server's
> SYN-ACK because the SYN SEQ will be different. The server might also
> send an RST on the client's SYN-ACK if it's not firewalled. The
> connection will eventually timeout in a SYN-RECV state or simply be
> aborted.

One reasonable tweak for this would be to use a very aggressive
timeout for simultaneously opened connections that didn't get an ACK
yet.

But to be honest it doesn't seem like a very pressing problem to me either.

-Andi
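
[Editorial sketch, not part of the thread] The crossed-SYN path quoted above,
and the shorter timeout Andi suggests for it, can be modeled in a few lines of
user-space C. Everything here is an assumption made for illustration: the
struct and function names are hypothetical, the two timeout values are
arbitrary, sequence numbers are not modeled at all, and none of this is the
kernel's actual code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Subset of the RFC 793 connection states relevant to an active open. */
enum tcp_state { ST_SYN_SENT, ST_SYN_RECV, ST_ESTABLISHED };

/* Hypothetical, stripped-down segment: only the two flags that matter here. */
struct segment {
	bool syn;
	bool ack;
};

/* Hypothetical endpoint model; simult_connect mirrors the proposed sysctl. */
struct endpoint {
	enum tcp_state state;
	bool simult_connect;	/* accept crossed SYNs? */
	bool simult_opened;	/* reached SYN_RECV through a crossed SYN */
};

/* Illustrative values only; neither comes from the thread or the kernel. */
enum { NORMAL_TIMEOUT_MS = 60000, SIMULT_TIMEOUT_MS = 3000 };

/* Feed one incoming segment to the endpoint and return its new state.
 * Sequence and acknowledgment numbers are deliberately not modeled.
 */
enum tcp_state on_segment(struct endpoint *ep, struct segment in)
{
	assert(ep != NULL);

	switch (ep->state) {
	case ST_SYN_SENT:
		if (in.syn && in.ack) {
			/* Regular handshake: the peer answered our SYN. */
			ep->state = ST_ESTABLISHED;
		} else if (in.syn && ep->simult_connect) {
			/* Crossed SYNs: simultaneous open. */
			ep->state = ST_SYN_RECV;
			ep->simult_opened = true;
		}
		/* Anything else is dropped and the endpoint keeps waiting. */
		break;
	case ST_SYN_RECV:
		if (in.ack)
			ep->state = ST_ESTABLISHED;
		break;
	case ST_ESTABLISHED:
		break;
	}
	return ep->state;
}

/* Handshake timeout: shorter for a simultaneous open that is not ACKed yet. */
int handshake_timeout_ms(const struct endpoint *ep)
{
	assert(ep != NULL);

	if (ep->state == ST_SYN_RECV && ep->simult_opened)
		return SIMULT_TIMEOUT_MS;
	return NORMAL_TIMEOUT_MS;
}
```

With simult_connect cleared, a bare SYN arriving in SYN_SENT leaves the
endpoint in SYN_SENT, so the real server's SYN-ACK can still complete the
handshake. With it set, the endpoint moves to SYN_RECV and only the shorter
timeout bounds how long a forged SYN can hold the connection attempt.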
Rémi Denis-Courmont Oct. 10, 2008, 8:44 a.m. UTC | #15
On Friday 10 October 2008 11:10:22 ext Willy Tarreau, you wrote:
> > Duh? If you require a SYN from the outside to the server, before you
> > allow the server to send either SYN or SYN/ACK, I fail to see the
> > problem.
>
> Requiring the firewall to expect a first SYN to come from the internet is
> like doing no check at all.

On ports which are open to the outside you MUST allow inbound SYNs anyway. 
From a security perspective, it does not matter whether the server answers 
with a SYN/ACK as normally or with a SYN-not-ACK as in "simultaneous open".

On ports which the server is using outbound only (if any), you can expect the 
server to send a SYN out first. It again does not matter whether the other 
end answers with a SYN/ACK or a SYN-not-ACK.

On other ports, a plain dumb stateless blackhole will do.

> When your server has been rooted, you can 
> pretty much expect that your guest has no problem sending you a SYN.

And why would (s)he have a problem sending a SYN/ACK? It makes no difference.
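
[Editorial sketch, not part of the thread] The three port classes Rémi
describes can be written down as a small decision function. This is only an
illustration of his argument under stated assumptions: the policy names, the
struct and the function are hypothetical, and a real stateful filter tracks
far more than one boolean per flow.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-port policy, one value per case in the message above. */
enum port_policy {
	PORT_INBOUND_OPEN,	/* service reachable from the outside */
	PORT_OUTBOUND_ONLY,	/* server only connects out from this port */
	PORT_CLOSED		/* everything else: stateless blackhole */
};

/* Hypothetical minimal flow state kept by the filter. */
struct flow {
	enum port_policy policy;
	bool outbound_syn_seen;	/* did the server already send a SYN out? */
};

/* Decide whether an inbound segment carrying SYN may pass.
 * ack_set is accepted but ignored on purpose: the argument is that
 * SYN/ACK and SYN-not-ACK deserve the same treatment.
 */
bool allow_inbound_syn(const struct flow *f, bool ack_set)
{
	assert(f != NULL);
	(void)ack_set;

	switch (f->policy) {
	case PORT_INBOUND_OPEN:
		return true;
	case PORT_OUTBOUND_ONLY:
		return f->outbound_syn_seen;
	case PORT_CLOSED:
	default:
		return false;
	}
}
```

In this model the only state the filter needs for an outbound-only port is
whether the server spoke first, which is the point being made: the ACK bit on
the reply does not change the decision.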

Patch

diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt
index d849326..cefc894 100644
--- a/Documentation/networking/ip-sysctl.txt
+++ b/Documentation/networking/ip-sysctl.txt
@@ -101,6 +101,28 @@  inet_peer_gc_maxtime - INTEGER
 
 TCP variables: 
 
+tcp_simult_connect - BOOLEAN
+	Enables TCP simultaneous connect feature conforming to RFC793.
+	Strict implementation of RFC793 (TCP) requires support for a feature
+	called "simultaneous connect", which allows two clients to connect to
+	each other without anyone entering a listening state.  While almost
+	never used, and supported by few OSes, Linux supports this feature.
+
+	However, it introduces a weakness in the protocol which makes it very
+	easy for an attacker to prevent a client from connecting to a known
+	server. The attacker only has to guess the source port to shut down
+	the client connection during its establishment. The impact is limited,
+	but it may be used to prevent an antivirus or IPS from fetching updates
+	and not detecting an attack, or to prevent an SSL gateway from fetching
+	a CRL for example.
+
+	If you want absolute compatibility with any possible application,
+	you should set it to 1. If you prefer to enhance security on your
+	systems you'd better let it to 0. After four years of usage on
+	hundreds of systems, no application was ever found to require this
+	feature, which is not even supported by most firewalls.
+	Default: 0
+
 somaxconn - INTEGER
 	Limit of socket listen() backlog, known in userspace as SOMAXCONN.
 	Defaults to 128.  See also tcp_max_syn_backlog for additional tuning
diff --git a/include/linux/sysctl.h b/include/linux/sysctl.h
index d0437f3..0e23062 100644
--- a/include/linux/sysctl.h
+++ b/include/linux/sysctl.h
@@ -435,6 +435,7 @@  enum
 	NET_TCP_ALLOWED_CONG_CONTROL=123,
 	NET_TCP_MAX_SSTHRESH=124,
 	NET_TCP_FRTO_RESPONSE=125,
+	NET_TCP_SIMULT_CONNECT=126,
 };
 
 enum {
diff --git a/include/net/tcp.h b/include/net/tcp.h
index 8983386..c61fe3c 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -236,6 +236,7 @@  extern int sysctl_tcp_base_mss;
 extern int sysctl_tcp_workaround_signed_windows;
 extern int sysctl_tcp_slow_start_after_idle;
 extern int sysctl_tcp_max_ssthresh;
+extern int sysctl_tcp_simult_connect;
 
 extern atomic_t tcp_memory_allocated;
 extern atomic_t tcp_sockets_allocated;
diff --git a/net/ipv4/sysctl_net_ipv4.c b/net/ipv4/sysctl_net_ipv4.c
index e0689fd..d2a73ec 100644
--- a/net/ipv4/sysctl_net_ipv4.c
+++ b/net/ipv4/sysctl_net_ipv4.c
@@ -716,6 +716,14 @@  static struct ctl_table ipv4_table[] = {
 		.proc_handler	= &proc_dointvec,
 	},
 	{
+		.ctl_name	= NET_TCP_SIMULT_CONNECT,
+		.procname	= "tcp_simult_connect",
+		.data		= &sysctl_tcp_simult_connect,
+		.maxlen		= sizeof(int),
+		.mode		= 0644,
+		.proc_handler	= &proc_dointvec,
+	},
+	{
 		.ctl_name	= CTL_UNNUMBERED,
 		.procname	= "udp_mem",
 		.data		= &sysctl_udp_mem,
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 67ccce2..932504e 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -87,6 +87,7 @@  int sysctl_tcp_max_orphans __read_mostly = NR_FILE;
 int sysctl_tcp_frto __read_mostly = 2;
 int sysctl_tcp_frto_response __read_mostly;
 int sysctl_tcp_nometrics_save __read_mostly;
+int sysctl_tcp_simult_connect __read_mostly;
 
 int sysctl_tcp_moderate_rcvbuf __read_mostly = 1;
 int sysctl_tcp_abc __read_mostly;
@@ -5149,10 +5150,12 @@  discard:
 	    tcp_paws_check(&tp->rx_opt, 0))
 		goto discard_and_undo;
 
-	if (th->syn) {
+	if (th->syn && sysctl_tcp_simult_connect) {
 		/* We see SYN without ACK. It is attempt of
 		 * simultaneous connect with crossed SYNs.
 		 * Particularly, it can be connect to self.
+		 * This feature is disabled by default as it introduces a
+		 * weakness in the protocol. It can be enabled by a sysctl.
 		 */
 		tcp_set_state(sk, TCP_SYN_RECV);