Message ID | 1354069414.8918.13.camel@joe-AO722 |
---|---|
State | Superseded, archived |
Delegated to: | David Miller |
Headers | show |
On Tue, 2012-11-27 at 18:23 -0800, Joe Perches wrote: > On Tue, 2012-11-27 at 13:24 -0800, Eric Dumazet wrote: > > On Tue, 2012-11-27 at 09:23 -0800, Joe Perches wrote: > > > On Tue, 2012-11-27 at 07:06 -0800, Eric Dumazet wrote: > > > > From: Eric Dumazet <edumazet@google.com> > > > > > > > > commit 68835aba4d9b (net: optimize INET input path further) > > > > moved some fields used for tcp/udp sockets lookup in the first cache > > > > line of struct sock_common. > > > [] > > > > diff --git a/include/linux/ipv6.h b/include/linux/ipv6.h > > > > index 5e11905..196ede4 100644 > > > > --- a/include/linux/ipv6.h > > > > +++ b/include/linux/ipv6.h > > > > @@ -365,19 +365,21 @@ static inline struct raw6_sock *raw6_sk(const struct sock *sk) > > > > #endif /* IS_ENABLED(CONFIG_IPV6) */ > > > > > > > > #define INET6_MATCH(__sk, __net, __hash, __saddr, __daddr, __ports, __dif)\ > > > > + (((__sk)->sk_hash == (__hash)) && \ > > > > + ((*((__portpair *)&(inet_sk(__sk)->inet_dport))) == (__ports)) && \ > > > > + ((__sk)->sk_family == AF_INET6) && \ > > > > > > Perhaps these could be |'d together to avoid the test/jump > > > after each comparison by using some bit operations instead. > > > > > > > + ipv6_addr_equal(&inet6_sk(__sk)->daddr, (__saddr)) && \ > > > > + ipv6_addr_equal(&inet6_sk(__sk)->rcv_saddr, (__daddr)) && \ > > > > + (!((__sk)->sk_bound_dev_if) || ((__sk)->sk_bound_dev_if == (__dif))) && \ > > > > + net_eq(sock_net(__sk), (__net))) > > > > > But it would be wrong. > > OK, so it's an and not an or. Duh. [...] The way to combine these sorts of comparisons is along the lines of: (((left->a ^ right->a) | (left->b ^ right->b) | ...) == 0) But when there are big-endian types involved, sparse is likely to complain about combining them. Ben.
On Wed, 2012-11-28 at 03:12 +0000, Ben Hutchings wrote: > On Tue, 2012-11-27 at 18:23 -0800, Joe Perches wrote: > > OK, so it's an and not an or. Duh. > [...] > > The way to combine these sorts of comparisons is along the lines of: > > (((left->a ^ right->a) | > (left->b ^ right->b) | > ...) == 0) > > But when there are big-endian types involved, sparse is likely to > complain about combining them. I believe there's only the 2 items that could be combined for cacheline purposes so using 2 logical tests with AND seems more readable. Maybe a single combined test would be faster. I don't have equipment at hand to test it. If you prefer I supposed it could be converted. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, 2012-11-27 at 19:31 -0800, Joe Perches wrote: > On Wed, 2012-11-28 at 03:12 +0000, Ben Hutchings wrote: > > On Tue, 2012-11-27 at 18:23 -0800, Joe Perches wrote: > > > OK, so it's an and not an or. Duh. > > [...] > > > > The way to combine these sorts of comparisons is along the lines of: > > > > (((left->a ^ right->a) | > > (left->b ^ right->b) | > > ...) == 0) > > > > But when there are big-endian types involved, sparse is likely to > > complain about combining them. > > I believe there's only the 2 items that could be combined > for cacheline purposes so using 2 logical tests with AND > seems more readable. Maybe a single combined test would > be faster. I don't have equipment at hand to test it. > > If you prefer I supposed it could be converted. I don't particularly care, and I gave up this trick myself because it didn't seem worth fighting sparse. Ben.
On Tue, 2012-11-27 at 18:23 -0800, Joe Perches wrote: > Still, the logical tests that are likely to be in the same > cacheline could be ANDed together to avoid a test and jump. The point of having the cond jump on sk_hash/hash was that in one compare, we catch the yes/no status with 99.999999 % success rate. All the following compares are predicted by the cpu and essentially are free. Adding the AND or OR will basically have the same cpu cost. If we wanted to do a full test of all tuple fields and a single conditional jump, we would not have to include hash test at all. (If the 4-tuple matches, then sk_hash/hash value _must_ be the same by definition) Note its quite different from the optimization we did in ipv6_addr_equal(), as it allowed fewer memory loads and instructions. I would say this can come later, as the meat of my patch was about avoiding a full cache line miss, which is far more expensive than any tricks we can even think about. Note it will be hard to actually measure any further gains, since I did TCP_RR tests (200 threads) and the lookup cost went from 1.4 % to 0.8 % of the grand total, mostly dominated by the atomic to increase socket refcount. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, 2012-11-27 at 20:12 -0800, Eric Dumazet wrote: > The point of having the cond jump on sk_hash/hash was that in one > compare, we catch the yes/no status with 99.999999 % success rate. > > All the following compares are predicted by the cpu and essentially are > free. Adding the AND or OR will basically have the same cpu cost. > > If we wanted to do a full test of all tuple fields and a single > conditional jump, we would not have to include hash test at all. > > (If the 4-tuple matches, then sk_hash/hash value _must_ be the same by > definition) What I am going to do is to remove the hash compare from the macros so that we can use likely()/unlikely() to explicitly give hints to the compiler. The hash compare can be omitted in the validation done after the atomic_inc_not_zero() [ done to make sure keys didnt change ] begin: sk_nulls_for_each_rcu(sk, node, &head->chain) { if (sk->sk_hash != hash) continue; if (likely(INET_MATCH(sk, net, acookie, saddr, daddr, ports, dif))) { if (unlikely(!atomic_inc_not_zero(&sk->sk_refcnt))) goto begintw; if (unlikely(!INET_MATCH(sk, net, acookie, saddr, daddr, ports, dif))) { sock_put(sk); goto begin; } goto out; } } -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/include/linux/ipv6.h b/include/linux/ipv6.h index 196ede4..91870de 100644 --- a/include/linux/ipv6.h +++ b/include/linux/ipv6.h @@ -364,22 +364,24 @@ static inline struct raw6_sock *raw6_sk(const struct sock *sk) #define inet_v6_ipv6only(__sk) 0 #endif /* IS_ENABLED(CONFIG_IPV6) */ -#define INET6_MATCH(__sk, __net, __hash, __saddr, __daddr, __ports, __dif)\ - (((__sk)->sk_hash == (__hash)) && \ - ((*((__portpair *)&(inet_sk(__sk)->inet_dport))) == (__ports)) && \ - ((__sk)->sk_family == AF_INET6) && \ - ipv6_addr_equal(&inet6_sk(__sk)->daddr, (__saddr)) && \ - ipv6_addr_equal(&inet6_sk(__sk)->rcv_saddr, (__daddr)) && \ - (!((__sk)->sk_bound_dev_if) || ((__sk)->sk_bound_dev_if == (__dif))) && \ +#define INET6_MATCH(__sk, __net, __hash, __saddr, __daddr, __ports, __dif) \ + ((((__sk)->sk_hash == (__hash)) & \ + ((__sk)->sk_family == AF_INET6)) && \ + ((*((__portpair *)&(inet_sk(__sk)->inet_dport))) == (__ports)) && \ + ipv6_addr_equal(&inet6_sk(__sk)->daddr, (__saddr)) && \ + ipv6_addr_equal(&inet6_sk(__sk)->rcv_saddr, (__daddr)) && \ + (!((__sk)->sk_bound_dev_if) || \ + ((__sk)->sk_bound_dev_if == (__dif))) && \ net_eq(sock_net(__sk), (__net))) #define INET6_TW_MATCH(__sk, __net, __hash, __saddr, __daddr, __ports, __dif) \ - (((__sk)->sk_hash == (__hash)) && \ - (*((__portpair *)&(inet_twsk(__sk)->tw_dport)) == (__ports)) && \ - ((__sk)->sk_family == PF_INET6) && \ - (ipv6_addr_equal(&inet6_twsk(__sk)->tw_v6_daddr, (__saddr))) && \ - (ipv6_addr_equal(&inet6_twsk(__sk)->tw_v6_rcv_saddr, (__daddr))) && \ - (!((__sk)->sk_bound_dev_if) || ((__sk)->sk_bound_dev_if == (__dif))) && \ + ((((__sk)->sk_hash == (__hash)) & \ + ((__sk)->sk_family == PF_INET6)) && \ + (*((__portpair *)&(inet_twsk(__sk)->tw_dport)) == (__ports)) && \ + (ipv6_addr_equal(&inet6_twsk(__sk)->tw_v6_daddr, (__saddr))) && \ + (ipv6_addr_equal(&inet6_twsk(__sk)->tw_v6_rcv_saddr, (__daddr))) && \ + (!((__sk)->sk_bound_dev_if) || \ + ((__sk)->sk_bound_dev_if == (__dif))) && \ net_eq(sock_net(__sk), (__net))) #endif /* _IPV6_H */