Message ID | 1271520633.16881.4754.camel@edumazet-laptop |
---|---|
State | RFC, archived |
Delegated to: | David Miller |
Headers | show |
> > With attached patch, I reached > > Throughput 4465.13 MB/sec 16 procs > > RFS better than no RPS/RFS :) > > So, the old idea to make rxhash consistent (same value in both > directions) is a win for some workloads (Consider connection tracking / > firewalling) > > port1 = ... > port2 = ... > addr1 = ... > addr2 = ... > if (addr1 > addr2) > exchange(addr1, addr2) > if (port1 > port2) > exchange(port, port2) > > hash = jhash(addr1, addr2, (port1<<16)+port2, ...) > > diff --git a/net/core/dev.c b/net/core/dev.c > index 7abf959..6b757ff 100644 > --- a/net/core/dev.c > +++ b/net/core/dev.c > @@ -2280,8 +2280,10 @@ static int get_rps_cpu(struct net_device *dev, > struct sk_buff *skb, > case IPPROTO_AH: > case IPPROTO_SCTP: > case IPPROTO_UDPLITE: > - if (pskb_may_pull(skb, (ihl * 4) + 4)) > - ports = *((u32 *) (skb->data + (ihl * 4))); > + if (pskb_may_pull(skb, (ihl * 4) + 4)) { > + u16 *_ports = (u16 *)(skb->data + (ihl * 4)); > + ports = _ports[0] ^ _ports[1]; > + } > break; > > default: > That's cool!, but I still like the idea that this hash is treated as an opaque value, getting the hash from the device to avoid the jhash or cache misses on the packet can also be a win... Maybe connection tracking/firewall could use the skb->rxhash which provides the consistency and also eliminates the need to do more jhashes. > > -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Sun, Apr 18, 2010 at 1:38 AM, Tom Herbert <therbert@google.com> wrote: > That's cool!, but I still like the idea that this hash is treated as > an opaque value getting the hash from the device to avoid the jhash > or cache misses on the packet can also be a win... Maybe connection > tracking/firewall could use the skb->rxhash which provides the > consistency and also eliminates the need to do more jhashes. > consistent rxhash only adds the risk of the hash collision, and I don't think it is a big problem. For connection tracking/firewall use, I am afraid that we have to recompute this value after defrag. So we have to export the hash function we used in RPS. As NIC's hash function can be changed dynamically, the rxhash isn't consistent, so the rxhash can't be used by connection tracking, socket lookup and others come later.
Changli Gao wrote: > On Sun, Apr 18, 2010 at 1:38 AM, Tom Herbert <therbert@google.com> wrote: > >> That's cool!, but I still like the idea that this hash is treated as >> an opaque value getting the hash from the device to avoid the jhash >> or cache misses on the packet can also be a win... Maybe connection >> tracking/firewall could use the skb->rxhash which provides the >> consistency and also eliminates the need to do more jhashes. >> >> > > consistent rxhash only adds the risk of the hash collision, and I > don't think it is a big problem. For connection tracking/firewall use, > I am afraid that we have to recompute this value after defrag. So we > have to export the hash function we used in RPS. > > As NIC's hash function can be changed dynamically, the rxhash isn't > consistent, so the rxhash can't be used by connection tracking, socket > lookup and others come later. > > I have to agree with Eric and Changli here. It's especially true if you're passively tracking via one NIC, where all traffic is just forwarded. In this scenario, you need to compute consistent hashes. rxhashes by NIC will be different for "incoming" and "outgoing" traffic... Where rxhash by NIC can be used (note: didn't say _useful_) are scenarios with different net ports for incoming and outgoing traffic (in active but also passive traffic scenarios). Here, rxhashes could be used on a per-port basis, but associating two seemingly separate rxhashes with one another to match CPUs is a really annoying task. This would involve computing the corresponding "txhash" and looking it up, which is what we'd be doing with the jhash anyway. For proper flow tracking Eric's suggestion is the way to go. And if there are worries about collisions, why not add IPPROTO_* to the mix. Franco -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
From: Eric Dumazet <eric.dumazet@gmail.com> Date: Sat, 17 Apr 2010 18:10:33 +0200 > Le vendredi 16 avril 2010 à 23:12 +0200, Eric Dumazet a écrit : >> My results are on a "tbench 16" on an dual X5570 @ 2.93GHz. >> (16 logical cpus) >> >> No RPS , no RFS : 4448.14 MB/sec >> RPS : 2298.00 MB/sec (but lot of variation) >> RFS : 2600 MB/sec >> >> Maybe my RFS setup is bad ? >> (8192 flows) >> > > With attached patch, I reached > > Throughput 4465.13 MB/sec 16 procs > > RFS better than no RPS/RFS :) > > So, the old idea to make rxhash consistent (same value in both > directions) is a win for some workloads (Consider connection tracking / > firewalling) Fun :-) I toyed around with this on my 128 cpu machine (2 NUMA nodes, 64 cpus each NUMA node). Vanilla net-next-2.6, no configuration changes: tbench 64: Throughput 1843.43 MB/sec 64 procs tbench 128: Throughput 1889.67 MB/sec 128 procs Vanilla net-next-2.6, rps_cpus="ffffffff,ffffffff,ffffffff,ffffffff" tbench 64: Throughput 1455.89 MB/sec 64 procs tbench 128: Throughput 2009.91 MB/sec 128 procs net-next-2.6 + Eric's port hashing patch, rps_cpus="ffffffff,ffffffff,ffffffff,ffffffff" tbench 64: Throughput 1593.13 MB/sec 64 procs tbench 128: Throughput 2367.27 MB/sec 128 procs -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
From: David Miller <davem@davemloft.net> Date: Mon, 19 Apr 2010 13:09:05 -0700 (PDT) > net-next-2.6 + Eric's port hashing patch, rps_cpus="ffffffff,ffffffff,ffffffff,ffffffff" > > tbench 64: Throughput 1593.13 MB/sec 64 procs > tbench 128: Throughput 2367.27 MB/sec 128 procs Eric, I think there is agreement that your patch is not a bad idea. Your original posting had whitespace damange in the patch plus I want to see a proper commit message and signoff, so could you please submit this formally? Thanks! -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Le lundi 19 avril 2010 à 13:23 -0700, David Miller a écrit : > From: David Miller <davem@davemloft.net> > Date: Mon, 19 Apr 2010 13:09:05 -0700 (PDT) > > > net-next-2.6 + Eric's port hashing patch, rps_cpus="ffffffff,ffffffff,ffffffff,ffffffff" > > > > tbench 64: Throughput 1593.13 MB/sec 64 procs > > tbench 128: Throughput 2367.27 MB/sec 128 procs > > Eric, I think there is agreement that your patch is not a bad idea. > > Your original posting had whitespace damange in the patch plus I want > to see a proper commit message and signoff, so could you please submit > this formally? Hmm, this was not a formal patch, just an information. Problem is if hardware provides rxhash, will it be "consistent" too ? -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
From: Eric Dumazet <eric.dumazet@gmail.com> Date: Mon, 19 Apr 2010 22:32:01 +0200 > Hmm, this was not a formal patch, just an information. > > Problem is if hardware provides rxhash, will it be "consistent" too ? Yes, it is an issue. I am not aware of whether the Toeplitz hash computed by cards is impervious to the order of the input bits of not, probably it is. I was thinking also about how we could compute rxhash in the loopback driver :-) -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, Apr 20, 2010 at 4:32 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote: > > Hmm, this was not a formal patch, just an information. > > Problem is if hardware provides rxhash, will it be "consistent" too ? > > Does this problem has relationship with your patch? No. If the rxhash isn't provided by hardware, we can get more throughput from you patch, and on the other side, we don't lose anything but potential more hash collision.
Le mardi 20 avril 2010 à 07:38 +0800, Changli Gao a écrit : > Does this problem has relationship with your patch? No. If the rxhash > isn't provided by hardware, we can get more throughput from you patch, > and on the other side, we don't lose anything but potential more hash > collision. > I am not sure what you call hash collision. There is no hash chain here. This 32bit hash is a jhash one, and we only need 1 to 12 bits in it, I am pretty sure its OK. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Mon, Apr 19, 2010 at 10:59 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote: > Le mardi 20 avril 2010 à 07:38 +0800, Changli Gao a écrit : > >> Does this problem has relationship with your patch? No. If the rxhash >> isn't provided by hardware, we can get more throughput from you patch, >> and on the other side, we don't lose anything but potential more hash >> collision. >> > > I am not sure what you call hash collision. There is no hash chain here. > > This 32bit hash is a jhash one, and we only need 1 to 12 bits in it, I > am pretty sure its OK. > Maybe for the purposes of RPS, but hash collisions could definitely be an issue in RFS. If two active connections hit the same rps_flow entry this may cause thrashing of those connections between CPUs. I think your patch may increase the probability of this happening. > > > -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Le mardi 20 avril 2010 à 08:04 -0700, Tom Herbert a écrit : > Maybe for the purposes of RPS, but hash collisions could definitely be > an issue in RFS. If two active connections hit the same rps_flow > entry this may cause thrashing of those connections between CPUs. I > think your patch may increase the probability of this happening. > Good point. I'll make a gathering of tcp tuples on a busy server over a day and try to compute number of clashes we can get with and without the addr/port swapping. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Le lundi 19 avril 2010 à 14:19 -0700, David Miller a écrit : > > I was thinking also about how we could compute rxhash in the > loopback driver :-) This would be easy if rxhash was not a "struct inet_sock" field but a "struct sock" one sock_alloc_send_pskb() (or skb_set_owner_w()) skb->rxhash = sk->rxhash; -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/net/core/dev.c b/net/core/dev.c index 7abf959..6b757ff 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -2280,8 +2280,10 @@ static int get_rps_cpu(struct net_device *dev, struct sk_buff *skb, case IPPROTO_AH: case IPPROTO_SCTP: case IPPROTO_UDPLITE: - if (pskb_may_pull(skb, (ihl * 4) + 4)) - ports = *((u32 *) (skb->data + (ihl * 4))); + if (pskb_may_pull(skb, (ihl * 4) + 4)) { + u16 *_ports = (u16 *)(skb->data + (ihl * 4)); + ports = _ports[0] ^ _ports[1]; + } break;