[RFC,2/2] tcp: Early SYN limit and SYN cookie handling to mitigate SYN floods

Message ID 1338360073.2760.81.camel@edumazet-glaptop
State RFC, archived
Delegated to: David Miller

Commit Message

Eric Dumazet May 30, 2012, 6:41 a.m. UTC
On Tue, 2012-05-29 at 12:37 -0700, Andi Kleen wrote:

> So basically handling syncookies locklessly?
> 
> Makes sense. Syncookies are a bit obsolete these days, of course, due
> to the lack of options. But they may still be useful for this.
> 
> Obviously you'll need to clean up the patch and support IPv6,
> but the basic idea looks good to me.

TCP Fast Open should also be a good way to make SYN floods ineffective.

Yuchung Cheng and Jerry Chu should upstream this code in the very near
future.

Another way to mitigate the SYN scalability issue, before the full RCU
solution I was cooking, is to either:

1) Use a hardware filter (like on Intel NICs) to force all SYN packets
into one queue (so that they are all serviced on one CPU)

2) Tweak RPS (__skb_get_rxhash()) so that a SYN packet's rxhash does
not depend on the source port/address, to get the same effect (all SYN
packets processed by one CPU). Note this only addresses the SYN flood
problem, not the general 3WHS scalability one, since if a real
connection is established, the third packet (the ACK from the client)
will have the 'real' rxhash and will be processed by another CPU.

(Of course, RPS must be enabled to benefit from this)
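
(For completeness, RPS is enabled per receive queue via sysfs; a minimal
sketch, where the device name and CPU mask are only examples:)

  # Let CPUs 0-3 process packets from eth0 rx queue 0
  echo f > /sys/class/net/eth0/queues/rx-0/rps_cpus
  # Optionally size the per-queue RFS flow table (0 disables RFS)
  echo 4096 > /sys/class/net/eth0/queues/rx-0/rps_flow_cnt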

Untested patch to illustrate the idea:

 include/net/flow_keys.h   |    1 +
 net/core/dev.c            |    8 ++++++++
 net/core/flow_dissector.c |    9 +++++++++
 3 files changed, 18 insertions(+)




Comments

Jesper Dangaard Brouer May 30, 2012, 7:45 a.m. UTC | #1
On Wed, 2012-05-30 at 08:41 +0200, Eric Dumazet wrote:
> On Tue, 2012-05-29 at 12:37 -0700, Andi Kleen wrote:
> 
> > So basically handling syncookies locklessly?
> > 
> > Makes sense. Syncookies are a bit obsolete these days, of course, due
> > to the lack of options. But they may still be useful for this.
> > 
> > Obviously you'll need to clean up the patch and support IPv6,
> > but the basic idea looks good to me.
> 
> TCP Fast Open should also be a good way to make SYN floods
> ineffective.

Sounds interesting, but TCP Fast Open is primarily concerned with
enabling data exchange during SYN establishment.  I don't see any
indication that they have implemented parallel SYN handling.

Implementing parallel SYN handling should also benefit their work.
After studying this code path, I also see great performance benefit in
optimizing the normal 3WHS on sockets with sk_state == LISTEN.
Perhaps we should split up the code paths for LISTEN vs. ESTABLISHED, as
they are very entangled at the moment, AFAICS.

> Yuchung Cheng and Jerry Chu should upstream this code in the very near
> future.

Looking forward to seeing the code, and the fallout discussions, on
transferring data in SYN packets.


> Another way to mitigate the SYN scalability issue, before the full RCU
> solution I was cooking, is to either:
> 
> 1) Use a hardware filter (like on Intel NICs) to force all SYN packets
> into one queue (so that they are all serviced on one CPU)
> 
> 2) Tweak RPS (__skb_get_rxhash()) so that a SYN packet's rxhash does
> not depend on the source port/address, to get the same effect (all SYN
> packets processed by one CPU). Note this only addresses the SYN flood
> problem, not the general 3WHS scalability one, since if a real
> connection is established, the third packet (the ACK from the client)
> will have the 'real' rxhash and will be processed by another CPU.

I don't like the idea of overloading one CPU with SYN packets, as the
attacker can still cause a DoS on new connections.

My "unlocked" parallel SYN cookie approach, should favor established
connections, as they are allowed to run under a BH lock, and thus don't
let new SYN packets in (on this CPU), until the establish conn packet is
finished.  Unless I have misunderstood something... I think I have,
established connections have their own/seperate struck sock, and thus
this is another slock spinlock, right?. (Well let Eric bash me for
this ;-))

[...cut...]

Hans Schillstrom May 30, 2012, 8:03 a.m. UTC | #2
On Wednesday 30 May 2012 08:41:13 Eric Dumazet wrote:
> On Tue, 2012-05-29 at 12:37 -0700, Andi Kleen wrote:
> 
> > So basically handling syncookies locklessly?
> > 
> > Makes sense. Syncookies are a bit obsolete these days, of course, due
> > to the lack of options. But they may still be useful for this.
> > 
> > Obviously you'll need to clean up the patch and support IPv6,
> > but the basic idea looks good to me.
> 
> TCP Fast Open should also be a good way to make SYN floods
> ineffective.
> 
> Yuchung Cheng and Jerry Chu should upstream this code in the very near
> future.
> 
> Another way to mitigate the SYN scalability issue, before the full RCU
> solution I was cooking, is to either:
> 
> 1) Use a hardware filter (like on Intel NICs) to force all SYN packets
> into one queue (so that they are all serviced on one CPU)

We have this option running right now, and it gave slightly higher values.
The upside is that only one core is running at 100% load.

To process more SYNs, an attempt was made to spread them with RPS to two
other cores, which gave 60% more SYNs per second:
i.e. the SYN filter in the NIC sending all IRQs to one core gave ~52k SYN pkts/sec;
adding RPS and sending SYNs to two other cores gave ~80k SYN pkts/sec.
Adding more than two cores didn't help much.
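
(Roughly the setup used, as an illustration -- the IRQ number and the
CPU masks are of course box-specific:)

  # Pin the NIC rx queue interrupt to core 0 (IRQ number is an example)
  echo 1 > /proc/irq/73/smp_affinity
  # Let RPS fan the SYNs out to cores 1 and 2 (mask 0x6)
  echo 6 > /sys/class/net/eth0/queues/rx-0/rps_cpus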

> 2) Tweak RPS (__skb_get_rxhash()) so that a SYN packet's rxhash does
> not depend on the source port/address, to get the same effect (all SYN
> packets processed by one CPU). Note this only addresses the SYN flood
> problem, not the general 3WHS scalability one, since if a real
> connection is established, the third packet (the ACK from the client)
> will have the 'real' rxhash and will be processed by another CPU.

Neither the NIC's SYN filter nor this scales that well.

> (Of course, RPS must be enabled to benefit from this)
> 
> Untested patch to illustrate the idea:
> 
>  include/net/flow_keys.h   |    1 +
>  net/core/dev.c            |    8 ++++++++
>  net/core/flow_dissector.c |    9 +++++++++
>  3 files changed, 18 insertions(+)
> 
> diff --git a/include/net/flow_keys.h b/include/net/flow_keys.h
> index 80461c1..b5bae21 100644
> --- a/include/net/flow_keys.h
> +++ b/include/net/flow_keys.h
> @@ -10,6 +10,7 @@ struct flow_keys {
>  		__be16 port16[2];
>  	};
>  	u8 ip_proto;
> +	u8 tcpflags;
>  };
>  
>  extern bool skb_flow_dissect(const struct sk_buff *skb, struct flow_keys *flow);
> diff --git a/net/core/dev.c b/net/core/dev.c
> index cd09819..c9c039e 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -135,6 +135,7 @@
>  #include <linux/net_tstamp.h>
>  #include <linux/static_key.h>
>  #include <net/flow_keys.h>
> +#include <net/tcp.h>
>  
>  #include "net-sysfs.h"
>  
> @@ -2614,6 +2615,12 @@ void __skb_get_rxhash(struct sk_buff *skb)
>  		return;
>  
>  	if (keys.ports) {
> +		if ((keys.tcpflags & (TCPHDR_SYN | TCPHDR_ACK)) == TCPHDR_SYN) {
> +			hash = jhash_2words((__force u32)keys.dst,
> +					    (__force u32)keys.port16[1],
> +					    hashrnd);
> +			goto end;
> +		}
>  		if ((__force u16)keys.port16[1] < (__force u16)keys.port16[0])
>  			swap(keys.port16[0], keys.port16[1]);
>  		skb->l4_rxhash = 1;
> @@ -2626,6 +2633,7 @@ void __skb_get_rxhash(struct sk_buff *skb)
>  	hash = jhash_3words((__force u32)keys.dst,
>  			    (__force u32)keys.src,
>  			    (__force u32)keys.ports, hashrnd);
> +end:
>  	if (!hash)
>  		hash = 1;
>  
> diff --git a/net/core/flow_dissector.c b/net/core/flow_dissector.c
> index a225089..cd4aedf 100644
> --- a/net/core/flow_dissector.c
> +++ b/net/core/flow_dissector.c
> @@ -137,6 +137,15 @@ ipv6:
>  		ports = skb_header_pointer(skb, nhoff, sizeof(_ports), &_ports);
>  		if (ports)
>  			flow->ports = *ports;
> +		if (ip_proto == IPPROTO_TCP) {
> +			__u8 *tcpflags, _tcpflags;
> +
> +			tcpflags = skb_header_pointer(skb, nhoff + 13,
> +						      sizeof(_tcpflags),
> +						      &_tcpflags);
> +			if (tcpflags)
> +				flow->tcpflags = *tcpflags;
> +		}
>  	}
>  
>  	return true;
> 
> 
>
Eric Dumazet May 30, 2012, 8:15 a.m. UTC | #3
On Wed, 2012-05-30 at 09:45 +0200, Jesper Dangaard Brouer wrote:

> Sounds interesting, but TCP Fast Open is primarily concerned with
> enabling data exchange during SYN establishment.  I don't see any
> indication that they have implemented parallel SYN handling.
> 

Not at all; TCP Fast Open's main goal is to allow connection
establishment with a single packet (thus removing one RTT). This also
removes the whole idea of having half-sockets (in SYN_RECV state).

Then, allowing data in the SYN packet is an extra bonus, but only if the
whole request can fit in the packet (which is unlikely for typical HTTP
requests).


> Implementing parallel SYN handling should also benefit their work.

Why do you think I am working on this? Hint: I am their coworker at
Google.

> After studying this code path, I also see great performance benefit in
> optimizing the normal 3WHS on sockets with sk_state == LISTEN.
> Perhaps we should split up the code paths for LISTEN vs. ESTABLISHED, as
> they are very entangled at the moment, AFAICS.
> 
> > Yuchung Cheng and Jerry Chu should upstream this code in the very near
> > future.
> 
> Looking forward to seeing the code, and the fallout discussions, on
> transferring data in SYN packets.
> 

The problem is that this code will be delayed if we change net-next in
this area, because we'll have to rebase and retest everything.

> 
> > Another way to mitigate the SYN scalability issue, before the full RCU
> > solution I was cooking, is to either:
> > 
> > 1) Use a hardware filter (like on Intel NICs) to force all SYN packets
> > into one queue (so that they are all serviced on one CPU)
> > 
> > 2) Tweak RPS (__skb_get_rxhash()) so that a SYN packet's rxhash does
> > not depend on the source port/address, to get the same effect (all SYN
> > packets processed by one CPU). Note this only addresses the SYN flood
> > problem, not the general 3WHS scalability one, since if a real
> > connection is established, the third packet (the ACK from the client)
> > will have the 'real' rxhash and will be processed by another CPU.
> 
> I don't like the idea of overloading one CPU with SYN packets, as the
> attacker can still cause a DoS on new connections.
> 

One CPU can handle more than one million SYNs per second, while 32 CPUs
fighting over the socket lock cannot handle 1% of this load.

If Intel chose to implement this hardware filter in their NICs, it's for
a good reason.


> My "unlocked" parallel SYN cookie approach, should favor established
> connections, as they are allowed to run under a BH lock, and thus don't
> let new SYN packets in (on this CPU), until the establish conn packet is
> finished.  Unless I have misunderstood something... I think I have,
> established connections have their own/seperate struck sock, and thus
> this is another slock spinlock, right?. (Well let Eric bash me for
> this ;-))

It seems you forgot I have patches for full parallelism, not only the
SYNCOOKIE hack.

I am still polishing them; it's a _long_ process, especially if the
network tree changes a lot.

If you believe you can beat me on this, please let me know so that I can
switch to other tasks.



Eric Dumazet May 30, 2012, 8:24 a.m. UTC | #4
On Wed, 2012-05-30 at 10:03 +0200, Hans Schillstrom wrote:

> We have this option running right now, and it gave slightly higher values.
> The upside is that only one core is running at 100% load.
> 
> To process more SYNs, an attempt was made to spread them with RPS to two
> other cores, which gave 60% more SYNs per second:
> i.e. the SYN filter in the NIC sending all IRQs to one core gave ~52k SYN pkts/sec;
> adding RPS and sending SYNs to two other cores gave ~80k SYN pkts/sec.
> Adding more than two cores didn't help much.

When you say 52,000 pkts/s, is that for fully established sockets, or a
SYN flood?

19.23 us (i.e. 1 s / 52,000) to handle _one_ SYN message seems pretty
wrong to me if there is no contention on the listener socket.



Jesper Dangaard Brouer May 30, 2012, 9:24 a.m. UTC | #5
On Wed, 2012-05-30 at 10:15 +0200, Eric Dumazet wrote:
> On Wed, 2012-05-30 at 09:45 +0200, Jesper Dangaard Brouer wrote:
> 
> > Sounds interesting, but TCP Fast Open is primarily concerned with
> > enabling data exchange during SYN establishment.  I don't see any
> > indication that they have implemented parallel SYN handling.
> > 
> 
> Not at all; TCP Fast Open's main goal is to allow connection
> establishment with a single packet (thus removing one RTT). This also
> removes the whole idea of having half-sockets (in SYN_RECV state).
> 
> Then, allowing data in the SYN packet is an extra bonus, but only if the
> whole request can fit in the packet (which is unlikely for typical HTTP
> requests).
> 
> 
> > Implementing parallel SYN handling should also benefit their work.
> 
> Why do you think I am working on this? Hint: I am their coworker at
> Google.

I did know you work for Google, but didn't know you were actively
working on parallel SYN handling.  Your previous quote, "eventually in a
short time", indicated to me that I should solve the issue myself first,
and that we would replace my code with your full solution later.


> > After studying this code path, I also see great performance benefit in
> > optimizing the normal 3WHS on sockets with sk_state == LISTEN.
> > Perhaps we should split up the code paths for LISTEN vs. ESTABLISHED, as
> > they are very entangled at the moment, AFAICS.
> > 
> > > Yuchung Cheng and Jerry Chu should upstream this code in the very near
> > > future.
> > 
> > Looking forward to seeing the code, and the fallout discussions, on
> > transferring data in SYN packets.
> > 
> 
> The problem is that this code will be delayed if we change net-next in
> this area, because we'll have to rebase and retest everything.

Okay, I don't want to delay your work.  We can hold off merging my
cleanup patches, and I can take the pain of rebasing them after your
work is merged.  Then we will see whether my performance patches have
become obsolete.

I'm going to post some updated v2 patches, just because I know some
people who are desperate for a quick solution to their DDoS issues and
are willing to patch their kernels for production.

 
> > > Another way to mitigate the SYN scalability issue, before the full RCU
> > > solution I was cooking, is to either:
> > > 
> > > 1) Use a hardware filter (like on Intel NICs) to force all SYN packets
> > > into one queue (so that they are all serviced on one CPU)
> > > 
> > > 2) Tweak RPS (__skb_get_rxhash()) so that a SYN packet's rxhash does
> > > not depend on the source port/address, to get the same effect (all SYN
> > > packets processed by one CPU). Note this only addresses the SYN flood
> > > problem, not the general 3WHS scalability one, since if a real
> > > connection is established, the third packet (the ACK from the client)
> > > will have the 'real' rxhash and will be processed by another CPU.
> > 
> > I don't like the idea of overloading one CPU with SYN packets, as the
> > attacker can still cause a DoS on new connections.
> > 
> 
> One CPU can handle more than one million SYNs per second, while 32 CPUs
> fighting over the socket lock cannot handle 1% of this load.

I'm not sure one CPU can handle 1 Mpps on this particular path.  And
Hans has some other measurements, although I'm assuming he has small
CPUs.  But if you are working on the real solution, we don't need to
discuss this :-)


> If Intel chose to implement this hardware filter in their NICs, it's
> for a good reason.
> 
> 
> > My "unlocked" parallel SYN cookie approach, should favor established
> > connections, as they are allowed to run under a BH lock, and thus don't
> > let new SYN packets in (on this CPU), until the establish conn packet is
> > finished.  Unless I have misunderstood something... I think I have,
> > established connections have their own/seperate struck sock, and thus
> > this is another slock spinlock, right?. (Well let Eric bash me for
> > this ;-))
> 
> It seems you forgot I have patches for full parallelism, not only the
> SYNCOOKIE hack.

I'm so much looking forward to this :-)

> I am still polishing them; it's a _long_ process, especially if the
> network tree changes a lot.
> 
> If you believe you can beat me on this, please let me know so that I can
> switch to other tasks.

I don't dare go into that battle with the network ninja; I surrender.
DaveM: Eric's patches take precedence over mine...

/me crawling back into my cave, and switching to boring bugzilla cases
of backporting kernel patches instead...
Eric Dumazet May 30, 2012, 9:46 a.m. UTC | #6
On Wed, 2012-05-30 at 11:24 +0200, Jesper Dangaard Brouer wrote:

> I don't dare go into that battle with the network ninja; I surrender.
> DaveM: Eric's patches take precedence over mine...
> 
> /me crawling back into my cave, and switching to boring bugzilla cases
> of backporting kernel patches instead...
> 

Hey, I only wanted to say that we were working on the same area and that
we should expect conflicts.

In the long term, we want a scalable listener solution, but I can
understand if some customers want an immediate solution (SYN flood
mitigation).



Hans Schillstrom May 30, 2012, 11:14 a.m. UTC | #7
On Wednesday 30 May 2012 10:24:48 Eric Dumazet wrote:
> On Wed, 2012-05-30 at 10:03 +0200, Hans Schillstrom wrote:
> 
> > We have this option running right now, and it gave slightly higher values.
> > The upside is that only one core is running at 100% load.
> > 
> > To process more SYNs, an attempt was made to spread them with RPS to two
> > other cores, which gave 60% more SYNs per second:
> > i.e. the SYN filter in the NIC sending all IRQs to one core gave ~52k SYN pkts/sec;
> > adding RPS and sending SYNs to two other cores gave ~80k SYN pkts/sec.
> > Adding more than two cores didn't help much.
> 
> When you say 52,000 pkts/s, is that for fully established sockets, or a
> SYN flood?

A SYN flood with hping3, random source IPs, destination port 5060,
and there is a listener on that port.
(kernel 3.0.13)
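
(Roughly this invocation, reconstructed for the record -- the target
address below is an example:)

  hping3 --flood --rand-source -S -p 5060 192.0.2.1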

> 19.23 us (i.e. 1 s / 52,000) to handle _one_ SYN message seems pretty
> wrong to me if there is no contention on the listener socket.
> 

BTW, I also see a strange behavior during a SYN flood:
the client starts sending data directly in the ACK,
and that first packet is more or less always retransmitted once.

I'll dig into that later, or does anyone have an idea of the reason?
Rick Jones May 30, 2012, 9:20 p.m. UTC | #8
On 05/30/2012 01:24 AM, Eric Dumazet wrote:
> On Wed, 2012-05-30 at 10:03 +0200, Hans Schillstrom wrote:
>
>> We have this option running right now, and it gave slightly higher values.
>> The upside is that only one core is running at 100% load.
>>
>> To process more SYNs, an attempt was made to spread them with RPS to two
>> other cores, which gave 60% more SYNs per second:
>> i.e. the SYN filter in the NIC sending all IRQs to one core gave ~52k SYN pkts/sec;
>> adding RPS and sending SYNs to two other cores gave ~80k SYN pkts/sec.
>> Adding more than two cores didn't help much.
>
> When you say 52,000 pkts/s, is that for fully established sockets, or a
> SYN flood?
>
> 19.23 us (i.e. 1 s / 52,000) to handle _one_ SYN message seems pretty
> wrong to me if there is no contention on the listener socket.

It may still be high, but a very quick netperf TCP_CC test over loopback 
on a W3550 system running a 2.6.38 kernel shows:

raj@tardy:~/netperf2_trunk/src$ ./netperf -t TCP_CC -l 60 -c -C
TCP Connect/Close TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 
localhost.localdomain () port 0 AF_INET
Local /Remote
Socket Size   Request Resp.  Elapsed Trans.   CPU    CPU    S.dem   S.dem
Send   Recv   Size    Size   Time    Rate     local  remote local   remote
bytes  bytes  bytes   bytes  secs.   per sec  %      %      us/Tr   us/Tr

16384  87380  1       1      60.00   21515.29   30.68  30.96  57.042  57.557
16384  87380

57 microseconds per "transaction", which in this case is establishing
and tearing down the connection with nothing else (no data packets),
makes 19 microseconds for a SYN seem perhaps not beyond the realm of
possibility?

rick jones
Eric Dumazet May 31, 2012, 8:28 a.m. UTC | #9
On Wed, 2012-05-30 at 14:20 -0700, Rick Jones wrote:

> It may still be high, but a very quick netperf TCP_CC test over loopback 
> on a W3550 system running a 2.6.38 kernel shows:
> 
> raj@tardy:~/netperf2_trunk/src$ ./netperf -t TCP_CC -l 60 -c -C
> TCP Connect/Close TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 
> localhost.localdomain () port 0 AF_INET
> Local /Remote
> Socket Size   Request Resp.  Elapsed Trans.   CPU    CPU    S.dem   S.dem
> Send   Recv   Size    Size   Time    Rate     local  remote local   remote
> bytes  bytes  bytes   bytes  secs.   per sec  %      %      us/Tr   us/Tr
> 
> 16384  87380  1       1      60.00   21515.29   30.68  30.96  57.042  57.557
> 16384  87380
> 
> 57 microseconds per "transaction", which in this case is establishing
> and tearing down the connection with nothing else (no data packets),
> makes 19 microseconds for a SYN seem perhaps not beyond the realm of
> possibility?

That's a different story, on the loopback device (and without stressing
the IP route cache, by the way).

Your netperf test is full userspace transactions, with 5 frames per
transaction: two socket creations/destructions, process scheduler
activations, and it never enters syncookie mode.

In the case of a SYN flood (syncookies on), we receive a packet and
send one from softirq.

One expensive thing might be the MD5 used to compute the SYNACK
sequence.

I suspect other things:

1) Of course we have to take into account the timer responsible for
SYNACK retransmits of previously queued requests. Its cost depends on
the listen backlog. When this timer runs, the listener socket is locked.

2) IP route cache overflows.
   In the case of a SYN flood, we should not store dst(s) in the route
cache but destroy them immediately.
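
(Both are easy to check under flood, assuming perf is available on the
box -- something like:)

  perf top -C 0    # watch the CPU that receives the SYN packets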



Hans Schillstrom May 31, 2012, 8:45 a.m. UTC | #10
On Thursday 31 May 2012 10:28:37 Eric Dumazet wrote:
> On Wed, 2012-05-30 at 14:20 -0700, Rick Jones wrote:
> 
> > It may still be high, but a very quick netperf TCP_CC test over loopback 
> > on a W3550 system running a 2.6.38 kernel shows:
> > 
> > raj@tardy:~/netperf2_trunk/src$ ./netperf -t TCP_CC -l 60 -c -C
> > TCP Connect/Close TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 
> > localhost.localdomain () port 0 AF_INET
> > Local /Remote
> > Socket Size   Request Resp.  Elapsed Trans.   CPU    CPU    S.dem   S.dem
> > Send   Recv   Size    Size   Time    Rate     local  remote local   remote
> > bytes  bytes  bytes   bytes  secs.   per sec  %      %      us/Tr   us/Tr
> > 
> > 16384  87380  1       1      60.00   21515.29   30.68  30.96  57.042  57.557
> > 16384  87380
> > 
> > 57 microseconds per "transaction", which in this case is establishing
> > and tearing down the connection with nothing else (no data packets),
> > makes 19 microseconds for a SYN seem perhaps not beyond the realm of
> > possibility?
> 
> That's a different story, on the loopback device (and without stressing
> the IP route cache, by the way).
> 
> Your netperf test is full userspace transactions, with 5 frames per
> transaction: two socket creations/destructions, process scheduler
> activations, and it never enters syncookie mode.
> 
> In the case of a SYN flood (syncookies on), we receive a packet and
> send one from softirq.
> 
> One expensive thing might be the MD5 used to compute the SYNACK
> sequence.
> 
> I suspect other things:
> 
> 1) Of course we have to take into account the timer responsible for
> SYNACK retransmits of previously queued requests. Its cost depends on
> the listen backlog. When this timer runs, the listener socket is locked.
> 
> 2) IP route cache overflows.
>    In the case of a SYN flood, we should not store dst(s) in the route
> cache but destroy them immediately.
> 
I can see plenty of "IPv4: dst cache overflow" messages.
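
(That message comes from the route cache garbage collector once the
cache exceeds its limit; the relevant knobs can be inspected with
something like:)

  # Limit on cached routes; "dst cache overflow" fires above it
  cat /proc/sys/net/ipv4/route/max_size
  # Per-CPU route cache statistics
  cat /proc/net/stat/rt_cache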

Eric Dumazet May 31, 2012, 2:09 p.m. UTC | #11
On Thu, 2012-05-31 at 10:45 +0200, Hans Schillstrom wrote:

> I can see plenty of "IPv4: dst cache overflow" messages.
> 

This is probably the most problematic issue in DDoS attacks.

I have a patch for this problem.

The idea is to not cache dst entries in the following cases:

1) Input dst, if the listener queue is full (syncookies possibly engaged)

2) Output dst of SYNACK messages.



Hans Schillstrom May 31, 2012, 3:31 p.m. UTC | #12
On Thursday 31 May 2012 16:09:21 Eric Dumazet wrote:
> On Thu, 2012-05-31 at 10:45 +0200, Hans Schillstrom wrote:
> 
> > I can see plenty of "IPv4: dst cache overflow" messages.
> > 
> 
> This is probably the most problematic issue in DDoS attacks.
> 
> I have a patch for this problem.
> 
> The idea is to not cache dst entries in the following cases:
> 
> 1) Input dst, if the listener queue is full (syncookies possibly engaged)
> 
> 2) Output dst of SYNACK messages.
> 
Sounds like a good idea;
if you need some testing, just send the patches.

Patch

diff --git a/include/net/flow_keys.h b/include/net/flow_keys.h
index 80461c1..b5bae21 100644
--- a/include/net/flow_keys.h
+++ b/include/net/flow_keys.h
@@ -10,6 +10,7 @@ struct flow_keys {
 		__be16 port16[2];
 	};
 	u8 ip_proto;
+	u8 tcpflags;
 };
 
 extern bool skb_flow_dissect(const struct sk_buff *skb, struct flow_keys *flow);
diff --git a/net/core/dev.c b/net/core/dev.c
index cd09819..c9c039e 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -135,6 +135,7 @@ 
 #include <linux/net_tstamp.h>
 #include <linux/static_key.h>
 #include <net/flow_keys.h>
+#include <net/tcp.h>
 
 #include "net-sysfs.h"
 
@@ -2614,6 +2615,12 @@ void __skb_get_rxhash(struct sk_buff *skb)
 		return;
 
 	if (keys.ports) {
+		if ((keys.tcpflags & (TCPHDR_SYN | TCPHDR_ACK)) == TCPHDR_SYN) {
+			hash = jhash_2words((__force u32)keys.dst,
+					    (__force u32)keys.port16[1],
+					    hashrnd);
+			goto end;
+		}
 		if ((__force u16)keys.port16[1] < (__force u16)keys.port16[0])
 			swap(keys.port16[0], keys.port16[1]);
 		skb->l4_rxhash = 1;
@@ -2626,6 +2633,7 @@ void __skb_get_rxhash(struct sk_buff *skb)
 	hash = jhash_3words((__force u32)keys.dst,
 			    (__force u32)keys.src,
 			    (__force u32)keys.ports, hashrnd);
+end:
 	if (!hash)
 		hash = 1;
 
diff --git a/net/core/flow_dissector.c b/net/core/flow_dissector.c
index a225089..cd4aedf 100644
--- a/net/core/flow_dissector.c
+++ b/net/core/flow_dissector.c
@@ -137,6 +137,15 @@ ipv6:
 		ports = skb_header_pointer(skb, nhoff, sizeof(_ports), &_ports);
 		if (ports)
 			flow->ports = *ports;
+		if (ip_proto == IPPROTO_TCP) {
+			__u8 *tcpflags, _tcpflags;
+
+			tcpflags = skb_header_pointer(skb, nhoff + 13,
+						      sizeof(_tcpflags),
+						      &_tcpflags);
+			if (tcpflags)
+				flow->tcpflags = *tcpflags;
+		}
 	}
 
 	return true;