tcp: do not create inetpeer on SYNACK message

Message ID 1338534026.2760.1451.camel@edumazet-glaptop
State Accepted, archived
Delegated to: David Miller

Commit Message

Eric Dumazet June 1, 2012, 7 a.m. UTC
From: Eric Dumazet <edumazet@google.com>

Another problem on SYNFLOOD/DDOS attack is the inetpeer cache getting
larger and larger, using lots of memory and cpu time.

tcp_v4_send_synack()
->inet_csk_route_req()
 ->ip_route_output_flow()
  ->rt_set_nexthop()
   ->rt_init_metrics()
    ->inet_getpeer( create = true)

This is a side effect of commit a4daad6b09230 (net: Pre-COW metrics for
TCP) added in 2.6.39

Possible solution : 

Instruct inet_csk_route_req() to remove FLOWI_FLAG_PRECOW_METRICS 

Before patch :

# grep peer /proc/slabinfo 
inet_peer_cache   4175430 4175430    192   42    2 : tunables    0    0    0 : slabdata  99415  99415      0

Samples: 41K of event 'cycles', Event count (approx.): 30716565122                                          
+  20,24%      ksoftirqd/0  [kernel.kallsyms]           [k] inet_getpeer
+   8,19%      ksoftirqd/0  [kernel.kallsyms]           [k] peer_avl_rebalance.isra.1
+   4,81%      ksoftirqd/0  [kernel.kallsyms]           [k] sha_transform
+   3,64%      ksoftirqd/0  [kernel.kallsyms]           [k] fib_table_lookup
+   2,36%      ksoftirqd/0  [ixgbe]                     [k] ixgbe_poll
+   2,16%      ksoftirqd/0  [kernel.kallsyms]           [k] __ip_route_output_key
+   2,11%      ksoftirqd/0  [kernel.kallsyms]           [k] kernel_map_pages
+   2,11%      ksoftirqd/0  [kernel.kallsyms]           [k] ip_route_input_common
+   2,01%      ksoftirqd/0  [kernel.kallsyms]           [k] __inet_lookup_established
+   1,83%      ksoftirqd/0  [kernel.kallsyms]           [k] md5_transform
+   1,75%      ksoftirqd/0  [kernel.kallsyms]           [k] check_leaf.isra.9
+   1,49%      ksoftirqd/0  [kernel.kallsyms]           [k] ipt_do_table
+   1,46%      ksoftirqd/0  [kernel.kallsyms]           [k] hrtimer_interrupt
+   1,45%      ksoftirqd/0  [kernel.kallsyms]           [k] kmem_cache_alloc
+   1,29%      ksoftirqd/0  [kernel.kallsyms]           [k] inet_csk_search_req
+   1,29%      ksoftirqd/0  [kernel.kallsyms]           [k] __netif_receive_skb
+   1,16%      ksoftirqd/0  [kernel.kallsyms]           [k] copy_user_generic_string
+   1,15%      ksoftirqd/0  [kernel.kallsyms]           [k] kmem_cache_free
+   1,02%      ksoftirqd/0  [kernel.kallsyms]           [k] tcp_make_synack
+   0,93%      ksoftirqd/0  [kernel.kallsyms]           [k] _raw_spin_lock_bh
+   0,87%      ksoftirqd/0  [kernel.kallsyms]           [k] __call_rcu
+   0,84%      ksoftirqd/0  [kernel.kallsyms]           [k] rt_garbage_collect
+   0,84%      ksoftirqd/0  [kernel.kallsyms]           [k] fib_rules_lookup

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Hans Schillstrom <hans.schillstrom@ericsson.com>
Cc: Jesper Dangaard Brouer <brouer@redhat.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Tom Herbert <therbert@google.com>
---
 net/ipv4/inet_connection_sock.c |    3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)



--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

David Miller June 1, 2012, 6:24 p.m. UTC | #1
From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Fri, 01 Jun 2012 09:00:26 +0200

> From: Eric Dumazet <edumazet@google.com>
> 
> Another problem on SYNFLOOD/DDOS attack is the inetpeer cache getting
> larger and larger, using lots of memory and cpu time.
> 
> tcp_v4_send_synack()
> ->inet_csk_route_req()
>  ->ip_route_output_flow()
>   ->rt_set_nexthop()
>    ->rt_init_metrics()
>     ->inet_getpeer( create = true)
> 
> This is a side effect of commit a4daad6b09230 (net: Pre-COW metrics for
> TCP) added in 2.6.39
> 
> Possible solution : 
> 
> Instruct inet_csk_route_req() to remove FLOWI_FLAG_PRECOW_METRICS 
 ...
> Signed-off-by: Eric Dumazet <edumazet@google.com>

This is definitely the right thing to do.

Applied, thanks Eric.
Hans Schillstrom June 1, 2012, 9:34 p.m. UTC | #2
Hi Eric
>Another problem on SYNFLOOD/DDOS attack is the inetpeer cache getting
>larger and larger, using lots of memory and cpu time.
>
>tcp_v4_send_synack()
>->inet_csk_route_req()
> ->ip_route_output_flow()
>  ->rt_set_nexthop()
>   ->rt_init_metrics()
>    ->inet_getpeer( create = true)
>
>This is a side effect of commit a4daad6b09230 (net: Pre-COW metrics for
>TCP) added in 2.6.39
>
>Possible solution :
>
>Instruct inet_csk_route_req() to remove FLOWI_FLAG_PRECOW_METRICS
>

I think we are on the right track now,

Some results from one of our testers:
before applying "reflect SYN queue_mapping into SYNACK"

"(The latest one from Eric is not included. I am building with
that one right now.)
Results were that with the same number of SYN/s, load went down
30% on each of the three Cpus that were handling the SYNs.
Great !!!"

I'm looking forward to seeing the results of the latest patch.

Then I think conntrack needs a little shaping up, like a "mini-conntrack";
it is way too expensive to alloc a full conntrack for every SYN.

I have a bunch of patches and ideas for that...

Thanks Eric for a great job

/Hans
Eric Dumazet June 2, 2012, 6:56 a.m. UTC | #3
On Fri, 2012-06-01 at 23:34 +0200, Hans Schillström wrote:

> I think we are on the right track now,
> 
> Some results from one of our testers:
> before applying "reflect SYN queue_mapping into SYNACK"
> 
> "(The latest one from Eric is not included. I am building with
> that one right now.)
> Results were that with the same number of SYN/s, load went down
> 30% on each of the three Cpus that were handling the SYNs.
> Great !!!"
> 

I am not sure reflecting queue_mapping will help your workload, since
you specifically asked your NIC to queue all SYN packets on one
single queue.

Perhaps rely on skb->rxhash rather than skb->queue_mapping to choose an
outgoing queue for the SYNACKs, so as not to overload a single tx queue?

Then again, it might not be needed if the queue is dedicated to SYN and
SYNACK packets, since net_rx_action/net_tx_action should both dequeue 64
packets each round, in a round robin fashion.

(I had problems in a standard setup, where you can have a single cpu
(CPU0 in my case) servicing all NAPI interrupts, so with 16 queues, the
rx_action/tx_action ratio is 16/1 if all synack go to a single queue,
while SYN are distributed to all 16 rx queues)


> I'm looking forward to seeing the results of the latest patch.
> 
> Then I think conntrack needs a little shaping up, like a "mini-conntrack";
> it is way too expensive to alloc a full conntrack for every SYN.
> 
> I have a bunch of patches and ideas for that...
> 

Cool ! the conntrack issue is a real one for sure.


Given conntrack's current requirement (being protected by a central
lock), I guess your best bet would be the following setup:

One single CPU to handle all SYN packets.

Possibly rely on skb->rxhash rather than skb->queue_mapping to choose an
outgoing queue for the SYNACKs, so as not to overload a single tx queue.
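
The idea of picking the SYNACK tx queue from skb->rxhash can be sketched in
user-space C. This is only an illustration: pick_tx_queue() is a hypothetical
helper, and the multiply-and-shift scaling is one common way to map a 32-bit
hash onto a small range, assumed here rather than taken from this thread or
from any patch.

```c
#include <stdint.h>

/*
 * Hypothetical helper (not a kernel function): scale a 32-bit flow hash
 * into the range [0, num_tx_queues). Flows with different hashes spread
 * over all tx queues instead of all landing on one.
 */
uint16_t pick_tx_queue(uint32_t rxhash, uint16_t num_tx_queues)
{
	if (num_tx_queues == 0)
		return 0;	/* nothing to choose from; fall back to queue 0 */

	/* (hash * n) >> 32 maps the full 32-bit hash space evenly onto n buckets */
	return (uint16_t)(((uint64_t)rxhash * num_tx_queues) >> 32);
}
```

With 16 tx queues, a hash in the lower sixteenth of the 32-bit space maps to
queue 0 and a hash in the upper sixteenth maps to queue 15, so SYNACKs for
distinct flows would not all pile onto the same tx queue.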

> Thanks Eric for a great job
> 

Thanks for giving testing results and ideas !


Patch

diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c
index 95e61596..f9ee741 100644
--- a/net/ipv4/inet_connection_sock.c
+++ b/net/ipv4/inet_connection_sock.c
@@ -377,7 +377,8 @@  struct dst_entry *inet_csk_route_req(struct sock *sk,
 
 	flowi4_init_output(fl4, sk->sk_bound_dev_if, sk->sk_mark,
 			   RT_CONN_FLAGS(sk), RT_SCOPE_UNIVERSE,
-			   sk->sk_protocol, inet_sk_flowi_flags(sk),
+			   sk->sk_protocol,
+			   inet_sk_flowi_flags(sk) & ~FLOWI_FLAG_PRECOW_METRICS,
 			   (opt && opt->opt.srr) ? opt->opt.faddr : ireq->rmt_addr,
 			   ireq->loc_addr, ireq->rmt_port, inet_sk(sk)->inet_sport);
 	security_req_classify_flow(req, flowi4_to_flowi(fl4));
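
The effect of the one-line change above can be looked at in isolation with a
small user-space sketch. The flag values below are illustrative assumptions
(check include/net/flow.h in the tree you are working on), and
synack_route_flags() is a hypothetical name used only for this example; the
real change is the masking expression inside inet_csk_route_req().

```c
#include <stdint.h>

/* Illustrative values only (assumption); see include/net/flow.h for the real ones. */
#define FLOWI_FLAG_ANYSRC		0x01
#define FLOWI_FLAG_PRECOW_METRICS	0x02

/*
 * Hypothetical helper: given the flow flags a listening socket would
 * normally use, return the flags for the SYNACK route lookup, i.e. the
 * same flags with FLOWI_FLAG_PRECOW_METRICS cleared.
 */
uint8_t synack_route_flags(uint8_t sk_flags)
{
	return sk_flags & (uint8_t)~FLOWI_FLAG_PRECOW_METRICS;
}
```

All other flags pass through untouched; only the pre-COW request is dropped,
which, per the call chain in the commit message, is what leads to
inet_getpeer(create = true) for every SYNACK.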