Message ID | 006ec2edb2d592ab9b90ccc5c084ac8926e811df.1446053264.git.pabeni@redhat.com |
---|---|
State | Changes Requested, archived |
Delegated to: | David Miller |
Headers | show |
On Wed, 2015-10-28 at 18:39 +0100, Paolo Abeni wrote: > This patch changes how the multipath hash is computed for locally > generated UDP or TCP flows: now the hash comprises also l4 information > (source and destination port). > > This allows better utilization of the available paths when the existing > flows have the same source IP and the same destination IP: with l3 hash, > even when multiple connections are in place simultaneously, a single path > will be used, while with l4 hash we can use all the available paths. > > Signed-off-by: Paolo Abeni <pabeni@redhat.com> Please note that many skb already have a l4 hash. No need to perform yet another dissection. This would work even for tunneled traffic... See commit 4b1b865e4e97e336316f30e32af36d71e98bdabc for an example. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thu, 2015-10-29 at 07:31 -0700, Eric Dumazet wrote: > On Wed, 2015-10-28 at 18:39 +0100, Paolo Abeni wrote: > > This patch changes how the multipath hash is computed for locally > > generated UDP or TCP flows: now the hash comprises also l4 information > > (source and destination port). > > > > This allows better utilization of the available paths when the existing > > flows have the same source IP and the same destination IP: with l3 hash, > > even when multiple connections are in place simultaneously, a single path > > will be used, while with l4 hash we can use all the available paths. > > > > Signed-off-by: Paolo Abeni <pabeni@redhat.com> > > Please note that many skb already have a l4 hash. No need to perform yet > another dissection. This patch do not add dissection code: it use the information provided by the available flowi4 structure. Moreover the skb is not available on the calling site (in __ip_route_output_key_hash) and pushing it all the way will require a lot of intrusive changes. Do you think it's the better option ? Paolo -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thu, 2015-10-29 at 16:00 +0100, Paolo Abeni wrote: > This patch do not add dissection code: it use the information provided > by the available flowi4 structure. Moreover the skb is not available on > the calling site (in __ip_route_output_key_hash) and pushing it all the > way will require a lot of intrusive changes. Do you think it's the > better option ? If skb is provided, then we could use its information. Local TCP flows now provide l4 hash in skb, and automatically shuffles it in case of dst failures. This provides resilience over dead paths, over variety of aggregation-like devices (bonding being an example) # git grep -n4 sk_rethink_txhash include/net/sock.h-1695-{ include/net/sock.h-1696- sk->sk_txhash = net_tx_rndhash(); include/net/sock.h-1697-} include/net/sock.h-1698- include/net/sock.h:1699:static inline void sk_rethink_txhash(struct sock *sk) include/net/sock.h-1700-{ include/net/sock.h-1701- if (sk->sk_txhash) include/net/sock.h-1702- sk_set_txhash(sk); include/net/sock.h-1703-} -- include/net/sock.h-1725-static inline void dst_negative_advice(struct sock *sk) include/net/sock.h-1726-{ include/net/sock.h-1727- struct dst_entry *ndst, *dst = __sk_dst_get(sk); include/net/sock.h-1728- include/net/sock.h:1729: sk_rethink_txhash(sk); include/net/sock.h-1730- include/net/sock.h-1731- if (dst && dst->ops->negative_advice) { include/net/sock.h-1732- ndst = dst->ops->negative_advice(dst); include/net/sock.h-1733- -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thu, Oct 29, 2015 at 8:00 AM, Paolo Abeni <pabeni@redhat.com> wrote: > On Thu, 2015-10-29 at 07:31 -0700, Eric Dumazet wrote: >> On Wed, 2015-10-28 at 18:39 +0100, Paolo Abeni wrote: >> > This patch changes how the multipath hash is computed for locally >> > generated UDP or TCP flows: now the hash comprises also l4 information >> > (source and destination port). >> > >> > This allows better utilization of the available paths when the existing >> > flows have the same source IP and the same destination IP: with l3 hash, >> > even when multiple connections are in place simultaneously, a single path >> > will be used, while with l4 hash we can use all the available paths. >> > >> > Signed-off-by: Paolo Abeni <pabeni@redhat.com> >> >> Please note that many skb already have a l4 hash. No need to perform yet >> another dissection. > > This patch do not add dissection code: it use the information provided > by the available flowi4 structure. Moreover the skb is not available on > the calling site (in __ip_route_output_key_hash) and pushing it all the > way will require a lot of intrusive changes. Do you think it's the > better option ? > If there is an associated skb then skb_get_hash should be called to get the hash. If you only have the flow structure then get_hash_from_flowi4 (or skb_get_hash_flowi6 for IPv6 should be called). Tom > Paolo > > -- > To unsubscribe from this list: send the line "unsubscribe netdev" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thu, 2015-10-29 at 08:28 -0700, Eric Dumazet wrote: > On Thu, 2015-10-29 at 16:00 +0100, Paolo Abeni wrote: > > > This patch do not add dissection code: it use the information provided > > by the available flowi4 structure. Moreover the skb is not available on > > the calling site (in __ip_route_output_key_hash) and pushing it all the > > way will require a lot of intrusive changes. Do you think it's the > > better option ? > > If skb is provided, then we could use its information. I see your point, but providing an skb to __ip_route_output_key_hash() is not very viable: it has a lot of indirect callers which are problematic, i.e.: __ip_route_output_key() ip_route_output_flow() inet_csk_route_req() tcp_v4_send_synack() <- skb available here, but created using dst information. I think it's better to use a flow-based hash. Paolo -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thu, 2015-10-29 at 17:52 +0100, Paolo Abeni wrote: > On Thu, 2015-10-29 at 08:28 -0700, Eric Dumazet wrote: > > On Thu, 2015-10-29 at 16:00 +0100, Paolo Abeni wrote: > > > > > This patch do not add dissection code: it use the information provided > > > by the available flowi4 structure. Moreover the skb is not available on > > > the calling site (in __ip_route_output_key_hash) and pushing it all the > > > way will require a lot of intrusive changes. Do you think it's the > > > better option ? > > > > If skb is provided, then we could use its information. > > I see your point, but providing an skb to __ip_route_output_key_hash() > is not very viable: it has a lot of indirect callers which are > problematic, i.e.: > > __ip_route_output_key() > ip_route_output_flow() > inet_csk_route_req() > tcp_v4_send_synack() <- skb available here, but created using dst > information. > I never said it was trivial. I said : "If skb is provided, then we can use its l4hash" If not, then sure, a flow-based hash fallback is better than nothing. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thu, 2015-10-29 at 11:43 -0700, Eric Dumazet wrote: > On Thu, 2015-10-29 at 17:52 +0100, Paolo Abeni wrote: > > On Thu, 2015-10-29 at 08:28 -0700, Eric Dumazet wrote: > > > On Thu, 2015-10-29 at 16:00 +0100, Paolo Abeni wrote: > > > > > > > This patch do not add dissection code: it use the information provided > > > > by the available flowi4 structure. Moreover the skb is not available on > > > > the calling site (in __ip_route_output_key_hash) and pushing it all the > > > > way will require a lot of intrusive changes. Do you think it's the > > > > better option ? > > > > > > If skb is provided, then we could use its information. > > > > I see your point, but providing an skb to __ip_route_output_key_hash() > > is not very viable: it has a lot of indirect callers which are > > problematic, i.e.: > > > > __ip_route_output_key() > > ip_route_output_flow() > > inet_csk_route_req() > > tcp_v4_send_synack() <- skb available here, but created using dst > > information. > > > > I never said it was trivial. > > I said : "If skb is provided, then we can use its l4hash" > > If not, then sure, a flow-based hash fallback is better than nothing. Thank you for all the feedback. I'll resubmit using the flow-based hash. Paolo -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/include/net/ip_fib.h b/include/net/ip_fib.h index ac5c6e8..56bf68c 100644 --- a/include/net/ip_fib.h +++ b/include/net/ip_fib.h @@ -328,6 +328,18 @@ static inline int fib_multipath_hash(__be32 saddr, __be32 daddr) return jhash_2words(saddr, daddr, fib_multipath_secret) >> 1; } +static inline int fib_multipath_output_hash(const struct flowi4 *fl4) +{ + if ((fl4->flowi4_proto == IPPROTO_TCP) || + (fl4->flowi4_proto == IPPROTO_UDP)) + return jhash_3words(fl4->saddr, fl4->daddr, + *((__u32 *)&fl4->uli.ports), + fib_multipath_secret) >> 1; + + return jhash_2words(fl4->saddr, fl4->daddr, fib_multipath_secret) >> 1; +} + + void fib_select_multipath(struct fib_result *res, int hash); void fib_select_path(struct net *net, struct fib_result *res, struct flowi4 *fl4, int mp_hash); diff --git a/net/ipv4/fib_semantics.c b/net/ipv4/fib_semantics.c index 42778d9..8a18349 100644 --- a/net/ipv4/fib_semantics.c +++ b/net/ipv4/fib_semantics.c @@ -1564,7 +1564,8 @@ void fib_select_path(struct net *net, struct fib_result *res, #ifdef CONFIG_IP_ROUTE_MULTIPATH if (res->fi->fib_nhs > 1 && fl4->flowi4_oif == 0) { if (mp_hash < 0) - mp_hash = fib_multipath_hash(fl4->saddr, fl4->daddr); + mp_hash = fib_multipath_output_hash(fl4); + fib_select_multipath(res, mp_hash); } else
This patch changes how the multipath hash is computed for locally generated UDP or TCP flows: now the hash comprises also l4 information (source and destination port). This allows better utilization of the available paths when the existing flows have the same source IP and the same destination IP: with l3 hash, even when multiple connections are in place simultaneously, a single path will be used, while with l4 hash we can use all the available paths. Signed-off-by: Paolo Abeni <pabeni@redhat.com> --- include/net/ip_fib.h | 12 ++++++++++++ net/ipv4/fib_semantics.c | 3 ++- 2 files changed, 14 insertions(+), 1 deletion(-)