diff mbox

net: fix saving TX flow hash in sock for outgoing connections

Message ID 1413994321-20435-1-git-send-email-sathya.perla@emulex.com
State Accepted, archived
Delegated to: David Miller
Headers show

Commit Message

Sathya Perla Oct. 22, 2014, 4:12 p.m. UTC
The commit "net: Save TX flow hash in sock and set in skbuf on xmit"
introduced the inet_set_txhash() and ip6_set_txhash() routines to calculate
and record flow hash(sk_txhash) in the socket structure. sk_txhash is used
to set skb->hash which is used to spread flows across multiple TXQs.

But, the above routines are invoked before the source port of the connection
is created. Because of this all outgoing connections that just differ in the
source port get hashed into the same TXQ.

This patch fixes this problem for IPv4/6 by invoking the the above routines
after the source port is available for the socket.

Fixes: b73c3d0e4("net: Save TX flow hash in sock and set in skbuf on xmit")

Signed-off-by: Sathya Perla <sathya.perla@emulex.com>
---
 net/ipv4/tcp_ipv4.c |    4 ++--
 net/ipv6/tcp_ipv6.c |    4 ++--
 2 files changed, 4 insertions(+), 4 deletions(-)

Comments

Eric Dumazet Oct. 22, 2014, 5:39 p.m. UTC | #1
On Wed, 2014-10-22 at 21:42 +0530, Sathya Perla wrote:
> The commit "net: Save TX flow hash in sock and set in skbuf on xmit"
> introduced the inet_set_txhash() and ip6_set_txhash() routines to calculate
> and record flow hash(sk_txhash) in the socket structure. sk_txhash is used
> to set skb->hash which is used to spread flows across multiple TXQs.
> 
> But, the above routines are invoked before the source port of the connection
> is created. Because of this all outgoing connections that just differ in the
> source port get hashed into the same TXQ.

Acked-by: Eric Dumazet <edumazet@google.com>

Are you really using the socket/flow hash to select a TXQ ?

Even with this patch, you have a good probability of multiple
cpus hitting same TXQ.




--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Sathya Perla Oct. 22, 2014, 6:35 p.m. UTC | #2
> -----Original Message-----

> From: Eric Dumazet [mailto:eric.dumazet@gmail.com]

> 

> On Wed, 2014-10-22 at 21:42 +0530, Sathya Perla wrote:

> > The commit "net: Save TX flow hash in sock and set in skbuf on xmit"

> > introduced the inet_set_txhash() and ip6_set_txhash() routines to

> calculate

> > and record flow hash(sk_txhash) in the socket structure. sk_txhash is used

> > to set skb->hash which is used to spread flows across multiple TXQs.

> >

> > But, the above routines are invoked before the source port of the

> connection

> > is created. Because of this all outgoing connections that just differ in the

> > source port get hashed into the same TXQ.

> 

> Acked-by: Eric Dumazet <edumazet@google.com>

> 

> Are you really using the socket/flow hash to select a TXQ ?

Yes, as I don't have XPS configured on my setup.
netdev_pick_tx() uses the socket/flow hash when XPS is not used.

> 

> Even with this patch, you have a good probability of multiple

> cpus hitting same TXQ.

Agree. Are you suggesting that drivers should automatically
register an XPS configuration? I thought it was upto the user
to enable it...
Eric Dumazet Oct. 22, 2014, 7:09 p.m. UTC | #3
On Wed, 2014-10-22 at 18:35 +0000, Sathya Perla wrote:
> > -----Original Message-----
> > From: Eric Dumazet [mailto:eric.dumazet@gmail.com]

> > 
> > Are you really using the socket/flow hash to select a TXQ ?
> Yes, as I don't have XPS configured on my setup.
> netdev_pick_tx() uses the socket/flow hash when XPS is not used.

Yes, this is the (poor) fallback

> 
> > 
> > Even with this patch, you have a good probability of multiple
> > cpus hitting same TXQ.
> Agree. Are you suggesting that drivers should automatically
> register an XPS configuration? I thought it was upto the user
> to enable it...

Yep, search for netif_set_xps_queue()

(commit 537c00de1c9ba9876b9)

Look at commit d03a68f8217ea0349 for an example of how it can be done,
if user do not override this later.



--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Miller Oct. 22, 2014, 8:14 p.m. UTC | #4
From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Wed, 22 Oct 2014 12:09:56 -0700

> On Wed, 2014-10-22 at 18:35 +0000, Sathya Perla wrote:
>> Agree. Are you suggesting that drivers should automatically
>> register an XPS configuration? I thought it was upto the user
>> to enable it...
> 
> Yep, search for netif_set_xps_queue()
> 
> (commit 537c00de1c9ba9876b9)
> 
> Look at commit d03a68f8217ea0349 for an example of how it can be done,
> if user do not override this later.

Very few people know about this :-/

I only see 4 drivers adjusted to do this, it would be nice if this
was more widespread.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Miller Oct. 22, 2014, 8:15 p.m. UTC | #5
From: Sathya Perla <sathya.perla@emulex.com>
Date: Wed, 22 Oct 2014 21:42:01 +0530

> The commit "net: Save TX flow hash in sock and set in skbuf on xmit"
> introduced the inet_set_txhash() and ip6_set_txhash() routines to calculate
> and record flow hash(sk_txhash) in the socket structure. sk_txhash is used
> to set skb->hash which is used to spread flows across multiple TXQs.
> 
> But, the above routines are invoked before the source port of the connection
> is created. Because of this all outgoing connections that just differ in the
> source port get hashed into the same TXQ.
> 
> This patch fixes this problem for IPv4/6 by invoking the the above routines
> after the source port is available for the socket.
> 
> Fixes: b73c3d0e4("net: Save TX flow hash in sock and set in skbuf on xmit")
> 
> Signed-off-by: Sathya Perla <sathya.perla@emulex.com>

Applied and queued up for -stable, thanks.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Ben Hutchings Nov. 1, 2014, 1:30 a.m. UTC | #6
On Wed, 2014-10-22 at 16:14 -0400, David Miller wrote:
> From: Eric Dumazet <eric.dumazet@gmail.com>
> Date: Wed, 22 Oct 2014 12:09:56 -0700
> 
> > On Wed, 2014-10-22 at 18:35 +0000, Sathya Perla wrote:
> >> Agree. Are you suggesting that drivers should automatically
> >> register an XPS configuration? I thought it was upto the user
> >> to enable it...
> > 
> > Yep, search for netif_set_xps_queue()
> > 
> > (commit 537c00de1c9ba9876b9)
> > 
> > Look at commit d03a68f8217ea0349 for an example of how it can be done,
> > if user do not override this later.
> 
> Very few people know about this :-/
> 
> I only see 4 drivers adjusted to do this, it would be nice if this
> was more widespread.

It seems to require that the driver also sets IRQ affinity hints, which
I've never been very comfortable with.  Drivers either set queue n = CPU
n without regard for topology, or optimise for whichever
micro-architecture the vendor most cares about.  irqbalance then
slavishly follows these hints, so we have each driver (rather than the
administrator) setting policy.

Alternately the driver could create an irq_cpu_rmap for its TX interupts
and that could be used to select queues based on current affinity
(<http://thread.gmane.org/gmane.linux.network/186698>).  But we already
seem to have too many ways to do TX queue selection.

Ben.
diff mbox

Patch

diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index 94d1a77..9c7d762 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -206,8 +206,6 @@  int tcp_v4_connect(struct sock *sk, struct sockaddr *uaddr, int addr_len)
 	inet->inet_dport = usin->sin_port;
 	inet->inet_daddr = daddr;
 
-	inet_set_txhash(sk);
-
 	inet_csk(sk)->icsk_ext_hdr_len = 0;
 	if (inet_opt)
 		inet_csk(sk)->icsk_ext_hdr_len = inet_opt->opt.optlen;
@@ -224,6 +222,8 @@  int tcp_v4_connect(struct sock *sk, struct sockaddr *uaddr, int addr_len)
 	if (err)
 		goto failure;
 
+	inet_set_txhash(sk);
+
 	rt = ip_route_newports(fl4, rt, orig_sport, orig_dport,
 			       inet->inet_sport, inet->inet_dport, sk);
 	if (IS_ERR(rt)) {
diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c
index 8314955..ace29b6 100644
--- a/net/ipv6/tcp_ipv6.c
+++ b/net/ipv6/tcp_ipv6.c
@@ -200,8 +200,6 @@  static int tcp_v6_connect(struct sock *sk, struct sockaddr *uaddr,
 	sk->sk_v6_daddr = usin->sin6_addr;
 	np->flow_label = fl6.flowlabel;
 
-	ip6_set_txhash(sk);
-
 	/*
 	 *	TCP over IPv4
 	 */
@@ -297,6 +295,8 @@  static int tcp_v6_connect(struct sock *sk, struct sockaddr *uaddr,
 	if (err)
 		goto late_failure;
 
+	ip6_set_txhash(sk);
+
 	if (!tp->write_seq && likely(!tp->repair))
 		tp->write_seq = secure_tcpv6_sequence_number(np->saddr.s6_addr32,
 							     sk->sk_v6_daddr.s6_addr32,