diff mbox

[net-next] ipv4: use l4 hash for locally generated multipath flows

Message ID 006ec2edb2d592ab9b90ccc5c084ac8926e811df.1446053264.git.pabeni@redhat.com
State Changes Requested, archived
Delegated to: David Miller
Headers show

Commit Message

Paolo Abeni Oct. 28, 2015, 5:39 p.m. UTC
This patch changes how the multipath hash is computed for locally
generated UDP or TCP flows: now the hash comprises also l4 information
(source and destination port).

This allows better utilization of the available paths when the existing
flows have the same source IP and the same destination IP: with l3 hash,
even when multiple connections are in place simultaneously, a single path
will be used, while with l4 hash we can use all the available paths.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
 include/net/ip_fib.h     | 12 ++++++++++++
 net/ipv4/fib_semantics.c |  3 ++-
 2 files changed, 14 insertions(+), 1 deletion(-)

Comments

Eric Dumazet Oct. 29, 2015, 2:31 p.m. UTC | #1
On Wed, 2015-10-28 at 18:39 +0100, Paolo Abeni wrote:
> This patch changes how the multipath hash is computed for locally
> generated UDP or TCP flows: now the hash comprises also l4 information
> (source and destination port).
> 
> This allows better utilization of the available paths when the existing
> flows have the same source IP and the same destination IP: with l3 hash,
> even when multiple connections are in place simultaneously, a single path
> will be used, while with l4 hash we can use all the available paths.
> 
> Signed-off-by: Paolo Abeni <pabeni@redhat.com>

Please note that many skb already have a l4 hash. No need to perform yet
another dissection.

This would work even for tunneled traffic...

See commit 4b1b865e4e97e336316f30e32af36d71e98bdabc for an example.


--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Paolo Abeni Oct. 29, 2015, 3 p.m. UTC | #2
On Thu, 2015-10-29 at 07:31 -0700, Eric Dumazet wrote:
> On Wed, 2015-10-28 at 18:39 +0100, Paolo Abeni wrote:
> > This patch changes how the multipath hash is computed for locally
> > generated UDP or TCP flows: now the hash comprises also l4 information
> > (source and destination port).
> > 
> > This allows better utilization of the available paths when the existing
> > flows have the same source IP and the same destination IP: with l3 hash,
> > even when multiple connections are in place simultaneously, a single path
> > will be used, while with l4 hash we can use all the available paths.
> > 
> > Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> 
> Please note that many skb already have a l4 hash. No need to perform yet
> another dissection.

This patch do not add dissection code: it use the information provided
by the available flowi4 structure. Moreover the skb is not available on
the calling site (in __ip_route_output_key_hash) and pushing it all the
way will require a lot of intrusive changes. Do you think it's the
better option ?

Paolo

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eric Dumazet Oct. 29, 2015, 3:28 p.m. UTC | #3
On Thu, 2015-10-29 at 16:00 +0100, Paolo Abeni wrote:

> This patch do not add dissection code: it use the information provided
> by the available flowi4 structure. Moreover the skb is not available on
> the calling site (in __ip_route_output_key_hash) and pushing it all the
> way will require a lot of intrusive changes. Do you think it's the
> better option ?

If skb is provided, then we could use its information.

Local TCP flows now provide l4 hash in skb, and automatically shuffles
it in case of dst failures.

This provides resilience over dead paths, over variety of
aggregation-like devices (bonding being an example)

# git grep -n4 sk_rethink_txhash
include/net/sock.h-1695-{
include/net/sock.h-1696-        sk->sk_txhash = net_tx_rndhash();
include/net/sock.h-1697-}
include/net/sock.h-1698-
include/net/sock.h:1699:static inline void sk_rethink_txhash(struct sock *sk)
include/net/sock.h-1700-{
include/net/sock.h-1701-        if (sk->sk_txhash)
include/net/sock.h-1702-                sk_set_txhash(sk);
include/net/sock.h-1703-}
--
include/net/sock.h-1725-static inline void dst_negative_advice(struct sock *sk)
include/net/sock.h-1726-{
include/net/sock.h-1727-        struct dst_entry *ndst, *dst = __sk_dst_get(sk);
include/net/sock.h-1728-
include/net/sock.h:1729:        sk_rethink_txhash(sk);
include/net/sock.h-1730-
include/net/sock.h-1731-        if (dst && dst->ops->negative_advice) {
include/net/sock.h-1732-                ndst = dst->ops->negative_advice(dst);
include/net/sock.h-1733-


--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Tom Herbert Oct. 29, 2015, 3:37 p.m. UTC | #4
On Thu, Oct 29, 2015 at 8:00 AM, Paolo Abeni <pabeni@redhat.com> wrote:
> On Thu, 2015-10-29 at 07:31 -0700, Eric Dumazet wrote:
>> On Wed, 2015-10-28 at 18:39 +0100, Paolo Abeni wrote:
>> > This patch changes how the multipath hash is computed for locally
>> > generated UDP or TCP flows: now the hash comprises also l4 information
>> > (source and destination port).
>> >
>> > This allows better utilization of the available paths when the existing
>> > flows have the same source IP and the same destination IP: with l3 hash,
>> > even when multiple connections are in place simultaneously, a single path
>> > will be used, while with l4 hash we can use all the available paths.
>> >
>> > Signed-off-by: Paolo Abeni <pabeni@redhat.com>
>>
>> Please note that many skb already have a l4 hash. No need to perform yet
>> another dissection.
>
> This patch do not add dissection code: it use the information provided
> by the available flowi4 structure. Moreover the skb is not available on
> the calling site (in __ip_route_output_key_hash) and pushing it all the
> way will require a lot of intrusive changes. Do you think it's the
> better option ?
>
If there is an associated skb then skb_get_hash should be called to
get the hash. If you only have the flow structure then
get_hash_from_flowi4 (or skb_get_hash_flowi6 for IPv6 should be
called).

Tom

> Paolo
>
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Paolo Abeni Oct. 29, 2015, 4:52 p.m. UTC | #5
On Thu, 2015-10-29 at 08:28 -0700, Eric Dumazet wrote:
> On Thu, 2015-10-29 at 16:00 +0100, Paolo Abeni wrote:
> 
> > This patch do not add dissection code: it use the information provided
> > by the available flowi4 structure. Moreover the skb is not available on
> > the calling site (in __ip_route_output_key_hash) and pushing it all the
> > way will require a lot of intrusive changes. Do you think it's the
> > better option ?
> 
> If skb is provided, then we could use its information.

I see your point, but providing an skb to __ip_route_output_key_hash()
is not very viable: it has a lot of indirect callers which are
problematic, i.e.:

__ip_route_output_key()
ip_route_output_flow()
inet_csk_route_req()
tcp_v4_send_synack()  <- skb available here, but created using dst
information.

I think it's better to use a flow-based hash.

Paolo


--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eric Dumazet Oct. 29, 2015, 6:43 p.m. UTC | #6
On Thu, 2015-10-29 at 17:52 +0100, Paolo Abeni wrote:
> On Thu, 2015-10-29 at 08:28 -0700, Eric Dumazet wrote:
> > On Thu, 2015-10-29 at 16:00 +0100, Paolo Abeni wrote:
> > 
> > > This patch do not add dissection code: it use the information provided
> > > by the available flowi4 structure. Moreover the skb is not available on
> > > the calling site (in __ip_route_output_key_hash) and pushing it all the
> > > way will require a lot of intrusive changes. Do you think it's the
> > > better option ?
> > 
> > If skb is provided, then we could use its information.
> 
> I see your point, but providing an skb to __ip_route_output_key_hash()
> is not very viable: it has a lot of indirect callers which are
> problematic, i.e.:
> 
> __ip_route_output_key()
> ip_route_output_flow()
> inet_csk_route_req()
> tcp_v4_send_synack()  <- skb available here, but created using dst
> information.
> 

I never said it was trivial.

I said : "If skb is provided, then we can use its l4hash"

If not, then sure, a flow-based hash fallback is better than nothing.


--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Paolo Abeni Oct. 29, 2015, 9:12 p.m. UTC | #7
On Thu, 2015-10-29 at 11:43 -0700, Eric Dumazet wrote:
> On Thu, 2015-10-29 at 17:52 +0100, Paolo Abeni wrote:
> > On Thu, 2015-10-29 at 08:28 -0700, Eric Dumazet wrote:
> > > On Thu, 2015-10-29 at 16:00 +0100, Paolo Abeni wrote:
> > > 
> > > > This patch do not add dissection code: it use the information provided
> > > > by the available flowi4 structure. Moreover the skb is not available on
> > > > the calling site (in __ip_route_output_key_hash) and pushing it all the
> > > > way will require a lot of intrusive changes. Do you think it's the
> > > > better option ?
> > > 
> > > If skb is provided, then we could use its information.
> > 
> > I see your point, but providing an skb to __ip_route_output_key_hash()
> > is not very viable: it has a lot of indirect callers which are
> > problematic, i.e.:
> > 
> > __ip_route_output_key()
> > ip_route_output_flow()
> > inet_csk_route_req()
> > tcp_v4_send_synack()  <- skb available here, but created using dst
> > information.
> > 
> 
> I never said it was trivial.
> 
> I said : "If skb is provided, then we can use its l4hash"
> 
> If not, then sure, a flow-based hash fallback is better than nothing.

Thank you for all the feedback. I'll resubmit using the flow-based hash.

Paolo

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/include/net/ip_fib.h b/include/net/ip_fib.h
index ac5c6e8..56bf68c 100644
--- a/include/net/ip_fib.h
+++ b/include/net/ip_fib.h
@@ -328,6 +328,18 @@  static inline int fib_multipath_hash(__be32 saddr, __be32 daddr)
 	return jhash_2words(saddr, daddr, fib_multipath_secret) >> 1;
 }
 
+static inline int fib_multipath_output_hash(const struct flowi4 *fl4)
+{
+	if ((fl4->flowi4_proto == IPPROTO_TCP) ||
+	    (fl4->flowi4_proto == IPPROTO_UDP))
+		return jhash_3words(fl4->saddr, fl4->daddr,
+				    *((__u32 *)&fl4->uli.ports),
+				    fib_multipath_secret) >> 1;
+
+	return jhash_2words(fl4->saddr, fl4->daddr, fib_multipath_secret) >> 1;
+}
+
+
 void fib_select_multipath(struct fib_result *res, int hash);
 void fib_select_path(struct net *net, struct fib_result *res,
 		     struct flowi4 *fl4, int mp_hash);
diff --git a/net/ipv4/fib_semantics.c b/net/ipv4/fib_semantics.c
index 42778d9..8a18349 100644
--- a/net/ipv4/fib_semantics.c
+++ b/net/ipv4/fib_semantics.c
@@ -1564,7 +1564,8 @@  void fib_select_path(struct net *net, struct fib_result *res,
 #ifdef CONFIG_IP_ROUTE_MULTIPATH
 	if (res->fi->fib_nhs > 1 && fl4->flowi4_oif == 0) {
 		if (mp_hash < 0)
-			mp_hash = fib_multipath_hash(fl4->saddr, fl4->daddr);
+			mp_hash = fib_multipath_output_hash(fl4);
+
 		fib_select_multipath(res, mp_hash);
 	}
 	else