diff mbox

[net-next] net: move inet_dport/inet_num in sock_common

Message ID 1354069414.8918.13.camel@joe-AO722
State Superseded, archived
Delegated to: David Miller
Headers show

Commit Message

Joe Perches Nov. 28, 2012, 2:23 a.m. UTC
On Tue, 2012-11-27 at 13:24 -0800, Eric Dumazet wrote:
> On Tue, 2012-11-27 at 09:23 -0800, Joe Perches wrote:
> > On Tue, 2012-11-27 at 07:06 -0800, Eric Dumazet wrote:
> > > From: Eric Dumazet <edumazet@google.com>
> > > 
> > > commit 68835aba4d9b (net: optimize INET input path further)
> > > moved some fields used for tcp/udp sockets lookup in the first cache
> > > line of struct sock_common.
> > []
> > > diff --git a/include/linux/ipv6.h b/include/linux/ipv6.h
> > > index 5e11905..196ede4 100644
> > > --- a/include/linux/ipv6.h
> > > +++ b/include/linux/ipv6.h
> > > @@ -365,19 +365,21 @@ static inline struct raw6_sock *raw6_sk(const struct sock *sk)
> > >  #endif /* IS_ENABLED(CONFIG_IPV6) */
> > >  
> > >  #define INET6_MATCH(__sk, __net, __hash, __saddr, __daddr, __ports, __dif)\
> > > +	(((__sk)->sk_hash == (__hash)) &&					\
> > > +	 ((*((__portpair *)&(inet_sk(__sk)->inet_dport))) == (__ports)) &&	\
> > > +	 ((__sk)->sk_family		== AF_INET6)		&&		\
> > 
> > Perhaps these could be |'d together to avoid the test/jump
> > after each comparison by using some bit operations instead.
> > 
> > > +	 ipv6_addr_equal(&inet6_sk(__sk)->daddr, (__saddr))	&&		\
> > > +	 ipv6_addr_equal(&inet6_sk(__sk)->rcv_saddr, (__daddr))	&&		\
> > > +	 (!((__sk)->sk_bound_dev_if) || ((__sk)->sk_bound_dev_if == (__dif))) && \
> > > +	 net_eq(sock_net(__sk), (__net)))
> > 
> But it would be wrong.

OK, so it's an and not an or.  Duh.

Still, the logical tests that are likely to be in the same
cacheline could be ANDed together to avoid a test and jump.

Perhaps this:

It shrinks the object a trivial bit and could be a tiny bit
faster too.

(allyesconfig x86/32)
$ size net/ipv6/inet6_hashtables.o*
   text	   data	    bss	    dec	    hex	filename
   6277	    962	   1832	   9071	   236f	net/ipv6/inet6_hashtables.o.new
   6381	    962	   1880	   9223	   2407	net/ipv6/inet6_hashtables.o.old



--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Ben Hutchings Nov. 28, 2012, 3:12 a.m. UTC | #1
On Tue, 2012-11-27 at 18:23 -0800, Joe Perches wrote:
> On Tue, 2012-11-27 at 13:24 -0800, Eric Dumazet wrote:
> > On Tue, 2012-11-27 at 09:23 -0800, Joe Perches wrote:
> > > On Tue, 2012-11-27 at 07:06 -0800, Eric Dumazet wrote:
> > > > From: Eric Dumazet <edumazet@google.com>
> > > > 
> > > > commit 68835aba4d9b (net: optimize INET input path further)
> > > > moved some fields used for tcp/udp sockets lookup in the first cache
> > > > line of struct sock_common.
> > > []
> > > > diff --git a/include/linux/ipv6.h b/include/linux/ipv6.h
> > > > index 5e11905..196ede4 100644
> > > > --- a/include/linux/ipv6.h
> > > > +++ b/include/linux/ipv6.h
> > > > @@ -365,19 +365,21 @@ static inline struct raw6_sock *raw6_sk(const struct sock *sk)
> > > >  #endif /* IS_ENABLED(CONFIG_IPV6) */
> > > >  
> > > >  #define INET6_MATCH(__sk, __net, __hash, __saddr, __daddr, __ports, __dif)\
> > > > +	(((__sk)->sk_hash == (__hash)) &&					\
> > > > +	 ((*((__portpair *)&(inet_sk(__sk)->inet_dport))) == (__ports)) &&	\
> > > > +	 ((__sk)->sk_family		== AF_INET6)		&&		\
> > > 
> > > Perhaps these could be |'d together to avoid the test/jump
> > > after each comparison by using some bit operations instead.
> > > 
> > > > +	 ipv6_addr_equal(&inet6_sk(__sk)->daddr, (__saddr))	&&		\
> > > > +	 ipv6_addr_equal(&inet6_sk(__sk)->rcv_saddr, (__daddr))	&&		\
> > > > +	 (!((__sk)->sk_bound_dev_if) || ((__sk)->sk_bound_dev_if == (__dif))) && \
> > > > +	 net_eq(sock_net(__sk), (__net)))
> > > 
> > But it would be wrong.
> 
> OK, so it's an and not an or.  Duh.
[...]

The way to combine these sorts of comparisons is along the lines of:

(((left->a ^ right->a) |
  (left->b ^ right->b) |
  ...) == 0)

But when there are big-endian types involved, sparse is likely to
complain about combining them.

Ben.
Joe Perches Nov. 28, 2012, 3:31 a.m. UTC | #2
On Wed, 2012-11-28 at 03:12 +0000, Ben Hutchings wrote:
> On Tue, 2012-11-27 at 18:23 -0800, Joe Perches wrote:
> > OK, so it's an and not an or.  Duh.
> [...]
> 
> The way to combine these sorts of comparisons is along the lines of:
> 
> (((left->a ^ right->a) |
>   (left->b ^ right->b) |
>   ...) == 0)
> 
> But when there are big-endian types involved, sparse is likely to
> complain about combining them.

I believe there's only the 2 items that could be combined
for cacheline purposes so using 2 logical tests with AND
seems more readable.  Maybe a single combined test would
be faster.  I don't have equipment at hand to test it.

If you prefer I supposed it could be converted.


--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Ben Hutchings Nov. 28, 2012, 3:55 a.m. UTC | #3
On Tue, 2012-11-27 at 19:31 -0800, Joe Perches wrote:
> On Wed, 2012-11-28 at 03:12 +0000, Ben Hutchings wrote:
> > On Tue, 2012-11-27 at 18:23 -0800, Joe Perches wrote:
> > > OK, so it's an and not an or.  Duh.
> > [...]
> > 
> > The way to combine these sorts of comparisons is along the lines of:
> > 
> > (((left->a ^ right->a) |
> >   (left->b ^ right->b) |
> >   ...) == 0)
> > 
> > But when there are big-endian types involved, sparse is likely to
> > complain about combining them.
> 
> I believe there's only the 2 items that could be combined
> for cacheline purposes so using 2 logical tests with AND
> seems more readable.  Maybe a single combined test would
> be faster.  I don't have equipment at hand to test it.
> 
> If you prefer I supposed it could be converted.

I don't particularly care, and I gave up this trick myself because it
didn't seem worth fighting sparse.

Ben.
Eric Dumazet Nov. 28, 2012, 4:11 a.m. UTC | #4
On Tue, 2012-11-27 at 18:23 -0800, Joe Perches wrote:

> Still, the logical tests that are likely to be in the same
> cacheline could be ANDed together to avoid a test and jump.

The point of having the cond jump on sk_hash/hash was that in one
compare, we catch the yes/no status with 99.999999 % success rate.

All the following compares are predicted by the cpu and essentially are
free. Adding the AND or OR will basically have the same cpu cost.

If we wanted to do a full test of all tuple fields and a single
conditional jump, we would not have to include hash test at all.

(If the 4-tuple matches, then sk_hash/hash value _must_ be the same by
definition)

Note its quite different from the optimization we did in
ipv6_addr_equal(), as it allowed fewer memory loads and instructions.

I would say this can come later, as the meat of my patch was about
avoiding a full cache line miss, which is far more expensive than any
tricks we can even think about.

Note it will be hard to actually measure any further gains, since I did
TCP_RR tests (200 threads) and the lookup cost went from 1.4 % to 0.8 %
of the grand total, mostly dominated by the atomic to increase socket
refcount.



--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eric Dumazet Nov. 28, 2012, 11:27 a.m. UTC | #5
On Tue, 2012-11-27 at 20:12 -0800, Eric Dumazet wrote:

> The point of having the cond jump on sk_hash/hash was that in one
> compare, we catch the yes/no status with 99.999999 % success rate.
> 
> All the following compares are predicted by the cpu and essentially are
> free. Adding the AND or OR will basically have the same cpu cost.
> 
> If we wanted to do a full test of all tuple fields and a single
> conditional jump, we would not have to include hash test at all.
> 
> (If the 4-tuple matches, then sk_hash/hash value _must_ be the same by
> definition)

What I am going to do is to remove the hash compare from the macros so
that we can use likely()/unlikely() to explicitly give hints to the
compiler.

The hash compare can be omitted in the validation done after the
atomic_inc_not_zero() [ done to make sure keys didnt change ]

begin:
        sk_nulls_for_each_rcu(sk, node, &head->chain) {
                if (sk->sk_hash != hash)
                        continue;
                if (likely(INET_MATCH(sk, net, acookie,
                                      saddr, daddr, ports, dif))) {
                        if (unlikely(!atomic_inc_not_zero(&sk->sk_refcnt)))
                                goto begintw;
                        if (unlikely(!INET_MATCH(sk, net, acookie,
                                                 saddr, daddr, ports, dif))) {
                                sock_put(sk);
                                goto begin;
                        }
                        goto out;
                }
        }
 

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/include/linux/ipv6.h b/include/linux/ipv6.h
index 196ede4..91870de 100644
--- a/include/linux/ipv6.h
+++ b/include/linux/ipv6.h
@@ -364,22 +364,24 @@  static inline struct raw6_sock *raw6_sk(const struct sock *sk)
 #define inet_v6_ipv6only(__sk)		0
 #endif /* IS_ENABLED(CONFIG_IPV6) */
 
-#define INET6_MATCH(__sk, __net, __hash, __saddr, __daddr, __ports, __dif)\
-	(((__sk)->sk_hash == (__hash)) &&					\
-	 ((*((__portpair *)&(inet_sk(__sk)->inet_dport))) == (__ports)) &&	\
-	 ((__sk)->sk_family		== AF_INET6)		&&		\
-	 ipv6_addr_equal(&inet6_sk(__sk)->daddr, (__saddr))	&&		\
-	 ipv6_addr_equal(&inet6_sk(__sk)->rcv_saddr, (__daddr))	&&		\
-	 (!((__sk)->sk_bound_dev_if) || ((__sk)->sk_bound_dev_if == (__dif))) && \
+#define INET6_MATCH(__sk, __net, __hash, __saddr, __daddr, __ports, __dif) \
+	((((__sk)->sk_hash == (__hash)) &				\
+	  ((__sk)->sk_family == AF_INET6)) &&				\
+	 ((*((__portpair *)&(inet_sk(__sk)->inet_dport))) == (__ports)) && \
+	 ipv6_addr_equal(&inet6_sk(__sk)->daddr, (__saddr)) &&		\
+	 ipv6_addr_equal(&inet6_sk(__sk)->rcv_saddr, (__daddr)) &&	\
+	 (!((__sk)->sk_bound_dev_if) ||					\
+	  ((__sk)->sk_bound_dev_if == (__dif))) &&			\
 	 net_eq(sock_net(__sk), (__net)))
 
 #define INET6_TW_MATCH(__sk, __net, __hash, __saddr, __daddr, __ports, __dif) \
-	(((__sk)->sk_hash == (__hash)) &&					\
-	 (*((__portpair *)&(inet_twsk(__sk)->tw_dport)) == (__ports))	&&	\
-	 ((__sk)->sk_family	       == PF_INET6)			&&	\
-	 (ipv6_addr_equal(&inet6_twsk(__sk)->tw_v6_daddr, (__saddr)))	&&	\
-	 (ipv6_addr_equal(&inet6_twsk(__sk)->tw_v6_rcv_saddr, (__daddr))) &&	\
-	 (!((__sk)->sk_bound_dev_if) || ((__sk)->sk_bound_dev_if == (__dif))) && \
+	((((__sk)->sk_hash == (__hash)) &				\
+	  ((__sk)->sk_family == PF_INET6)) &&				\
+	 (*((__portpair *)&(inet_twsk(__sk)->tw_dport)) == (__ports)) && \
+	 (ipv6_addr_equal(&inet6_twsk(__sk)->tw_v6_daddr, (__saddr))) && \
+	 (ipv6_addr_equal(&inet6_twsk(__sk)->tw_v6_rcv_saddr, (__daddr))) && \
+	 (!((__sk)->sk_bound_dev_if) ||					\
+	  ((__sk)->sk_bound_dev_if == (__dif))) &&			\
 	 net_eq(sock_net(__sk), (__net)))
 
 #endif /* _IPV6_H */