From patchwork Thu Oct  8 21:58:55 2015
X-Patchwork-Submitter: Eric Dumazet
X-Patchwork-Id: 527993
X-Patchwork-Delegate: davem@davemloft.net
From: Eric Dumazet
To: "David S . Miller"
Cc: netdev, Eric Dumazet, Eric Dumazet
Subject: [PATCH v2 net-next 2/4] net: align sk_refcnt on 128 bytes boundary
Date: Thu,  8 Oct 2015 14:58:55 -0700
Message-Id: <1444341537-20945-3-git-send-email-edumazet@google.com>
X-Mailer: git-send-email 2.6.0.rc2.230.g3dd15c0
In-Reply-To: <1444341537-20945-1-git-send-email-edumazet@google.com>
References: <1444341537-20945-1-git-send-email-edumazet@google.com>
X-Mailing-List: netdev@vger.kernel.org

sk->sk_refcnt is dirtied for every incoming TCP/UDP packet. This is a
performance issue if multiple cpus hit a common socket, or if multiple
sockets are chained due to SO_REUSEPORT.

By moving sk_refcnt 8 bytes further, the first 128 bytes of sockets are
mostly read. As they contain the lookup keys, this has a considerable
performance impact, since cpus can keep them in their caches.

These 8 bytes are not wasted; we use them as a placeholder for various
fields, depending on the socket type.
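For readers who want to see the layout trick in isolation, here is a minimal,
self-contained userspace sketch. It is not the kernel code: the structure,
field names and sizes below are made up for illustration, and it assumes a
64-bit build compiled as C11. It shows an 8-byte placeholder union pushing a
hot reference count just past the first 128 read-mostly bytes, with the offset
checked at compile time, which is the same constraint the patch enforces for
offsetof(struct sock, sk_refcnt).

/* Simplified userspace sketch of the alignment idea (not the kernel structs). */
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

struct fake_sock_common {
	char lookup_keys[120];		/* addresses, ports, hashes... (read-mostly) */
	union {				/* 8 bytes of padding, reused per socket type */
		unsigned long flags;
		void *listener;		/* request_sock */
		void *tw_dr;		/* inet_timewait_sock */
	};
	int refcnt;			/* dirtied on every incoming packet */
};

/* Same spirit as forcing offsetof(struct sock, sk_refcnt) == 128 on 64bit arches */
static_assert(offsetof(struct fake_sock_common, refcnt) == 128,
	      "refcnt must start right after the first 128 bytes");

int main(void)
{
	printf("refcnt offset: %zu\n", offsetof(struct fake_sock_common, refcnt));
	return 0;
}

In the real patch the placeholder is typed per socket flavor, which is why
tw_dr and rsk_listener move into sock_common in the diff below.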
Tested:

SYN flood hitting a NIC with 16 RX queues.
TCP listener using 16 sockets with SO_REUSEPORT and SO_INCOMING_CPU
for proper siloing.

Could process 6.0 Mpps of SYN packets instead of 4.2 Mpps.

Kernel profile looked like:
    11.68%  [kernel]  [k] sha_transform
     6.51%  [kernel]  [k] __inet_lookup_listener
     5.07%  [kernel]  [k] __inet_lookup_established
     4.15%  [kernel]  [k] memcpy_erms
     3.46%  [kernel]  [k] ipt_do_table
     2.74%  [kernel]  [k] fib_table_lookup
     2.54%  [kernel]  [k] tcp_make_synack
     2.34%  [kernel]  [k] tcp_conn_request
     2.05%  [kernel]  [k] __netif_receive_skb_core
     2.03%  [kernel]  [k] kmem_cache_alloc
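The listener setup used for this kind of test is roughly the following sketch.
It is not the actual test harness: the port number and listener count are
arbitrary, error handling is omitted, and setting SO_INCOMING_CPU via
setsockopt() requires a kernel that supports it.

/* One listening socket per RX queue, same port via SO_REUSEPORT,
 * each steered to one cpu with SO_INCOMING_CPU.
 */
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

#ifndef SO_INCOMING_CPU
#define SO_INCOMING_CPU 49	/* value from include/uapi/asm-generic/socket.h */
#endif

#define NR_LISTENERS 16

int main(void)
{
	int fds[NR_LISTENERS];
	struct sockaddr_in addr;

	memset(&addr, 0, sizeof(addr));
	addr.sin_family = AF_INET;
	addr.sin_addr.s_addr = htonl(INADDR_ANY);
	addr.sin_port = htons(8080);		/* arbitrary test port */

	for (int cpu = 0; cpu < NR_LISTENERS; cpu++) {
		int one = 1;
		int fd = socket(AF_INET, SOCK_STREAM, 0);

		setsockopt(fd, SOL_SOCKET, SO_REUSEPORT, &one, sizeof(one));
		setsockopt(fd, SOL_SOCKET, SO_INCOMING_CPU, &cpu, sizeof(cpu));
		bind(fd, (struct sockaddr *)&addr, sizeof(addr));
		listen(fd, 1024);
		fds[cpu] = fd;
	}

	/* accept() loops, one per cpu, would drain fds[] here */
	pause();
	return 0;
}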
Signed-off-by: Eric Dumazet
---
 include/net/inet_timewait_sock.h |  2 +-
 include/net/request_sock.h       |  2 +-
 include/net/sock.h               | 17 ++++++++++++++---
 3 files changed, 16 insertions(+), 5 deletions(-)

diff --git a/include/net/inet_timewait_sock.h b/include/net/inet_timewait_sock.h
index 186f3a1e1b1f..e581fc69129d 100644
--- a/include/net/inet_timewait_sock.h
+++ b/include/net/inet_timewait_sock.h
@@ -70,6 +70,7 @@ struct inet_timewait_sock {
 #define tw_dport		__tw_common.skc_dport
 #define tw_num			__tw_common.skc_num
 #define tw_cookie		__tw_common.skc_cookie
+#define tw_dr			__tw_common.skc_tw_dr
 
 	int			tw_timeout;
 	volatile unsigned char	tw_substate;
@@ -88,7 +89,6 @@ struct inet_timewait_sock {
 	kmemcheck_bitfield_end(flags);
 	struct timer_list	tw_timer;
 	struct inet_bind_bucket	*tw_tb;
-	struct inet_timewait_death_row *tw_dr;
 };
 
 #define tw_tclass tw_tos
diff --git a/include/net/request_sock.h b/include/net/request_sock.h
index 95ab5d7aab96..6b818b77d5e5 100644
--- a/include/net/request_sock.h
+++ b/include/net/request_sock.h
@@ -50,9 +50,9 @@ struct request_sock {
 	struct sock_common		__req_common;
 #define rsk_refcnt			__req_common.skc_refcnt
 #define rsk_hash			__req_common.skc_hash
+#define rsk_listener			__req_common.skc_listener
 
 	struct request_sock		*dl_next;
-	struct sock			*rsk_listener;
 	u16				mss;
 	u8				num_retrans; /* number of retransmits */
 	u8				cookie_ts:1; /* syncookie: encode tcpopts in timestamp */
diff --git a/include/net/sock.h b/include/net/sock.h
index 08abffe32236..a7818104a73f 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -150,6 +150,9 @@ typedef __u64 __bitwise __addrpair;
  *	@skc_node: main hash linkage for various protocol lookup tables
  *	@skc_nulls_node: main hash linkage for TCP/UDP/UDP-Lite protocol
  *	@skc_tx_queue_mapping: tx queue number for this connection
+ *	@skc_flags: place holder for sk_flags
+ *		%SO_LINGER (l_onoff), %SO_BROADCAST, %SO_KEEPALIVE,
+ *		%SO_OOBINLINE settings, %SO_TIMESTAMPING settings
  *	@skc_incoming_cpu: record/match cpu processing incoming packets
  *	@skc_refcnt: reference count
  *
@@ -201,6 +204,16 @@ struct sock_common {
 
 	atomic64_t		skc_cookie;
 
+	/* following fields are padding to force
+	 * offset(struct sock, sk_refcnt) == 128 on 64bit arches
+	 * assuming IPV6 is enabled. We use this padding differently
+	 * for different kind of 'sockets'
+	 */
+	union {
+		unsigned long	skc_flags;
+		struct sock	*skc_listener; /* request_sock */
+		struct inet_timewait_death_row *skc_tw_dr; /* inet_timewait_sock */
+	};
 	/*
 	 * fields between dontcopy_begin/dontcopy_end
 	 * are not copied in sock_copy()
@@ -246,8 +259,6 @@ struct cg_proto;
  *	@sk_pacing_rate: Pacing rate (if supported by transport/packet scheduler)
  *	@sk_max_pacing_rate: Maximum pacing rate (%SO_MAX_PACING_RATE)
  *	@sk_sndbuf: size of send buffer in bytes
- *	@sk_flags: %SO_LINGER (l_onoff), %SO_BROADCAST, %SO_KEEPALIVE,
- *		   %SO_OOBINLINE settings, %SO_TIMESTAMPING settings
  *	@sk_no_check_tx: %SO_NO_CHECK setting, set checksum in TX packets
  *	@sk_no_check_rx: allow zero checksum in RX packets
  *	@sk_route_caps: route capabilities (e.g. %NETIF_F_TSO)
@@ -334,6 +345,7 @@ struct sock {
 #define sk_v6_rcv_saddr	__sk_common.skc_v6_rcv_saddr
 #define sk_cookie		__sk_common.skc_cookie
 #define sk_incoming_cpu		__sk_common.skc_incoming_cpu
+#define sk_flags		__sk_common.skc_flags
 
 	socket_lock_t		sk_lock;
 	struct sk_buff_head	sk_receive_queue;
@@ -371,7 +383,6 @@ struct sock {
 #ifdef CONFIG_XFRM
 	struct xfrm_policy	*sk_policy[2];
 #endif
-	unsigned long		sk_flags;
 	struct dst_entry	*sk_rx_dst;
 	struct dst_entry __rcu	*sk_dst_cache;
 	spinlock_t		sk_dst_lock;