diff mbox

[net-next] net: ovs: use CRC32 accelerated flow hash if available

Message ID 1386669178-14450-1-git-send-email-ffusco@redhat.com
State Changes Requested, archived
Delegated to: David Miller
Headers show

Commit Message

Francesco Fusco Dec. 10, 2013, 9:52 a.m. UTC
Currently OVS uses jhash2() for calculating flow hashes in its
internal flow_hash() function. The performance of the flow_hash() 
function is critical, as the input data can be hundreds of bytes 
long.

One possible direction to achieve higher performance consists
of replacing jhash2() with an architecture specific hash
function pretty much like the Intel folks have proposed in
DPDK. DPDK provides a very fast hash function that leverages
the 32bit crc32l instruction part of the Intel SSE4.2 
instruction set.

OVS is largely deployed in x86_64 based datacenters.
Therefore, we argue that the performance critical fast path
of OVS should exploit underlying CPU features in order to reduce 
the per packet processing costs.

We adapted and integrated Intel's hashing function from DPDK in
order to be used with OVS internally. Our patch greatly reduces 
the hash footprint from ~200 cycles of jhash2() to around ~90 
cycles in case of ovs_flow_hash_crc() (measured with rdtsc over 
maximum length flow keys on an i7 Intel CPU).

Additionally, we wrote a microbenchmark to stress the flow table 
performance. The benchmark inserts random flows into the flow 
hash and then performs lookups. Our hash deployed on a CRC32 
capable CPU reduces the lookup for 1000 flows, 100 masks from 
~10,100us to ~6,700us, for example.

Signed-off-by: Francesco Fusco <ffusco@redhat.com>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: Thomas Graf <tgraf@redhat.com>
---
 net/openvswitch/flow_table.c | 12 ++++++--
 net/openvswitch/flow_table.h | 66 ++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 75 insertions(+), 3 deletions(-)

Comments

David Laight Dec. 10, 2013, 10:20 a.m. UTC | #1
> From: Francesco Fusco
...
> One possible direction to achieve higher performance consists
> of replacing jhash2() with an architecture specific hash
> function pretty much like the Intel folks have proposed in
> DPDK. DPDK provides a very fast hash function that leverages
> the 32bit crc32l instruction part of the Intel SSE4.2
> instruction set.
...

IIRC the cpu can execute multiple crc32l instructions in parallel.
If you use a software lookup table to crc (say) 256 zero bytes
it should be possible to crc chunks of a large buffer in parallel.

> +#ifdef CONFIG_X86_64

Why not also i386?

> +static inline u32 ovs_flow_hash_crc_4b(u32 crc, u32 val)
> +{
> +	asm ("crc32l %[val], %[crc]\n"
> +		: [crc] "+r" (crc)
> +		: [val] "rm" (val));
> +	return crc;
> +}
> +
> +static inline u32 ovs_flow_hash_crc(const u32 *data, u32 len, u32 seed)
> +{
> +	const u32 *p32 = (const u32 *) data;
> +	u32 i, tmp = 0;
> +
> +	for (i = 0; i < len; i++)
> +		seed = ovs_flow_hash_crc_4b(*p32++, seed);

Doesn't that have the arguments if the wrong order ?
Maybe it doesn't affect the result, but the asm pattern works better
the other way around.

> +	switch (3 - ((len * 4) & 0x03)) {

This looked like an obscure way to calculate the remainder,
then I noticed the 'len *4' which means it is always 3
(no residual).

> +	case 0:
> +		tmp |= *((const u8 *) p32 + 2) << 16;
> +		/* Fallthrough */
> +	case 1:
> +		tmp |= *((const u8 *) p32 + 1) << 8;
> +		/* Fallthrough */
> +	case 2:
> +		tmp |= *((const u8 *) p32);
> +		seed = ovs_flow_hash_crc_4b(tmp, seed);
> +	default:
> +		break;
> +	}
> +
> +	return seed;
> +}
> +#endif

	David



--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jesse Gross Dec. 10, 2013, 7:28 p.m. UTC | #2
On Tue, Dec 10, 2013 at 1:52 AM, Francesco Fusco <ffusco@redhat.com> wrote:
> Currently OVS uses jhash2() for calculating flow hashes in its
> internal flow_hash() function. The performance of the flow_hash()
> function is critical, as the input data can be hundreds of bytes
> long.
>
> One possible direction to achieve higher performance consists
> of replacing jhash2() with an architecture specific hash
> function pretty much like the Intel folks have proposed in
> DPDK. DPDK provides a very fast hash function that leverages
> the 32bit crc32l instruction part of the Intel SSE4.2
> instruction set.
>
> OVS is largely deployed in x86_64 based datacenters.
> Therefore, we argue that the performance critical fast path
> of OVS should exploit underlying CPU features in order to reduce
> the per packet processing costs.
>
> We adapted and integrated Intel's hashing function from DPDK in
> order to be used with OVS internally. Our patch greatly reduces
> the hash footprint from ~200 cycles of jhash2() to around ~90
> cycles in case of ovs_flow_hash_crc() (measured with rdtsc over
> maximum length flow keys on an i7 Intel CPU).
>
> Additionally, we wrote a microbenchmark to stress the flow table
> performance. The benchmark inserts random flows into the flow
> hash and then performs lookups. Our hash deployed on a CRC32
> capable CPU reduces the lookup for 1000 flows, 100 masks from
> ~10,100us to ~6,700us, for example.
>
> Signed-off-by: Francesco Fusco <ffusco@redhat.com>
> Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
> Signed-off-by: Thomas Graf <tgraf@redhat.com>

I think this is definitely a good optimization to do given that so
much of the work that OVS does is hashing. However, isn't there a
library where there would be a more appropriate place to put this?
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Miller Dec. 10, 2013, 7:36 p.m. UTC | #3
From: Jesse Gross <jesse@nicira.com>
Date: Tue, 10 Dec 2013 11:28:08 -0800

> I think this is definitely a good optimization to do given that so
> much of the work that OVS does is hashing. However, isn't there a
> library where there would be a more appropriate place to put this?

I also honestly don't see why we want to special case OVS at all
here.  This faster hashing would be useful for socket demux and
other locations in the kernel.

When I see changes like this my only reaction is "sad face".
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Thomas Graf Dec. 10, 2013, 8:47 p.m. UTC | #4
On 12/10/2013 08:36 PM, David Miller wrote:
> From: Jesse Gross <jesse@nicira.com>
> Date: Tue, 10 Dec 2013 11:28:08 -0800
>
>> I think this is definitely a good optimization to do given that so
>> much of the work that OVS does is hashing. However, isn't there a
>> library where there would be a more appropriate place to put this?
>
> I also honestly don't see why we want to special case OVS at all
> here.  This faster hashing would be useful for socket demux and
> other locations in the kernel.
>
> When I see changes like this my only reaction is "sad face".

We discussed this heavily and decided to go with the minimal approach
first to collect some feedback. We were not sure whether the runtime
check, function pointer and hardware dependencies are something other
subsystems that are less x86_64 centric would want to live with.

That said, we are _very_ willing to do the work and move it to lib/
to make it available to other consumers but one should be aware that
crc32 based hashes are not as generally usable as jhash.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Tom Herbert Dec. 10, 2013, 9:27 p.m. UTC | #5
On Tue, Dec 10, 2013 at 11:36 AM, David Miller <davem@davemloft.net> wrote:
> From: Jesse Gross <jesse@nicira.com>
> Date: Tue, 10 Dec 2013 11:28:08 -0800
>
>> I think this is definitely a good optimization to do given that so
>> much of the work that OVS does is hashing. However, isn't there a
>> library where there would be a more appropriate place to put this?
>
> I also honestly don't see why we want to special case OVS at all
> here.  This faster hashing would be useful for socket demux and
> other locations in the kernel.
>

Also, we already do a lot of work to compute flow hashes for packets
in both TX and RX.  Can these be leveraged? For instance, if OVS is
computing a 5-tuple hash on an RX packet it's probably redundant to
skb->rxhash.

> When I see changes like this my only reaction is "sad face".
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jesse Gross Dec. 10, 2013, 9:36 p.m. UTC | #6
On Tue, Dec 10, 2013 at 12:47 PM, Thomas Graf <tgraf@redhat.com> wrote:
> On 12/10/2013 08:36 PM, David Miller wrote:
>>
>> From: Jesse Gross <jesse@nicira.com>
>> Date: Tue, 10 Dec 2013 11:28:08 -0800
>>
>>> I think this is definitely a good optimization to do given that so
>>> much of the work that OVS does is hashing. However, isn't there a
>>> library where there would be a more appropriate place to put this?
>>
>>
>> I also honestly don't see why we want to special case OVS at all
>> here.  This faster hashing would be useful for socket demux and
>> other locations in the kernel.
>>
>> When I see changes like this my only reaction is "sad face".
>
>
> We discussed this heavily and decided to go with the minimal approach
> first to collect some feedback. We were not sure whether the runtime
> check, function pointer and hardware dependencies are something other
> subsystems that are less x86_64 centric would want to live with.
>
> That said, we are _very_ willing to do the work and move it to lib/
> to make it available to other consumers but one should be aware that
> crc32 based hashes are not as generally usable as jhash.

Most hash users probably don't care what the actual hash function is,
as long as it is consistent. Couldn't you have a function that uses
CRC32 if it is available (or whatever is appropriate on a given
architecture) and jhash otherwise? Most users would then be able to
call this function without knowing or caring about the various quirks.

We could also potentially use the static key stuff to get the right
hash function as well.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jesse Gross Dec. 10, 2013, 9:40 p.m. UTC | #7
On Tue, Dec 10, 2013 at 1:27 PM, Tom Herbert <therbert@google.com> wrote:
> On Tue, Dec 10, 2013 at 11:36 AM, David Miller <davem@davemloft.net> wrote:
>> From: Jesse Gross <jesse@nicira.com>
>> Date: Tue, 10 Dec 2013 11:28:08 -0800
>>
>>> I think this is definitely a good optimization to do given that so
>>> much of the work that OVS does is hashing. However, isn't there a
>>> library where there would be a more appropriate place to put this?
>>
>> I also honestly don't see why we want to special case OVS at all
>> here.  This faster hashing would be useful for socket demux and
>> other locations in the kernel.
>>
>
> Also, we already do a lot of work to compute flow hashes for packets
> in both TX and RX.  Can these be leveraged? For instance, if OVS is
> computing a 5-tuple hash on an RX packet it's probably redundant to
> skb->rxhash.

That's true if we care about exactly the 5-tuple. However, the set of
fields that are included in the OVS flow lookup is dynamically
generated down to a bit mask so I don't know if it's worth special
casing. (In fact, I don't think that current OVS userspace will ever
actually emit exactly a 5-tuple.)
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Miller Dec. 10, 2013, 10:38 p.m. UTC | #8
From: Thomas Graf <tgraf@redhat.com>
Date: Tue, 10 Dec 2013 21:47:48 +0100

> We were not sure whether the runtime check, function pointer and
> hardware dependencies are something other subsystems that are less
> x86_64 centric would want to live with.

The things you are listing as implicit overhead can be eliminated
almost entirely using static branches.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/net/openvswitch/flow_table.c b/net/openvswitch/flow_table.c
index e425427..262d7ca 100644
--- a/net/openvswitch/flow_table.c
+++ b/net/openvswitch/flow_table.c
@@ -43,13 +43,16 @@ 
 #include <net/ip.h>
 #include <net/ipv6.h>
 #include <net/ndisc.h>
-
+#ifdef CONFIG_X86
+#include <asm/cpufeature.h>
+#endif
 #include "datapath.h"
 
 #define TBL_MIN_BUCKETS		1024
 #define REHASH_INTERVAL		(10 * 60 * HZ)
 
 static struct kmem_cache *flow_cache;
+static u32 (*__flow_hash)(const u32 *data, u32 len, u32 seed) = jhash2;
 
 static u16 range_n_bytes(const struct sw_flow_key_range *range)
 {
@@ -362,7 +365,7 @@  static u32 flow_hash(const struct sw_flow_key *key, int key_start,
 	/* Make sure number of hash bytes are multiple of u32. */
 	BUILD_BUG_ON(sizeof(long) % sizeof(u32));
 
-	return jhash2(hash_key, hash_u32s, 0);
+	return __flow_hash(hash_key, hash_u32s, 0);
 }
 
 static int flow_key_start(const struct sw_flow_key *key)
@@ -581,7 +584,10 @@  int ovs_flow_init(void)
 					0, NULL);
 	if (flow_cache == NULL)
 		return -ENOMEM;
-
+#ifdef CONFIG_X86_64
+	if (cpu_has_xmm4_2)
+		__flow_hash = ovs_flow_hash_crc;
+#endif
 	return 0;
 }
 
diff --git a/net/openvswitch/flow_table.h b/net/openvswitch/flow_table.h
index fbe45d5..50cfd1e 100644
--- a/net/openvswitch/flow_table.h
+++ b/net/openvswitch/flow_table.h
@@ -14,6 +14,37 @@ 
  * along with this program; if not, write to the Free Software
  * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA
  * 02110-1301, USA
+ *
+ * Some portions derived from code covered by the following notice:
+ *
+ * Copyright (c) 2010-2013 Intel Corporation. All rights reserved.
+ * All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ *
+ *   * Redistributions of source code must retain the above copyright
+ *     notice, this list of conditions and the following disclaimer.
+ *   * Redistributions in binary form must reproduce the above copyright
+ *     notice, this list of conditions and the following disclaimer in
+ *     the documentation and/or other materials provided with the
+ *     distribution.
+ *   * Neither the name of Intel Corporation nor the names of its
+ *     contributors may be used to endorse or promote products derived
+ *     from this software without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
  */
 
 #ifndef FLOW_TABLE_H
@@ -78,4 +109,39 @@  bool ovs_flow_cmp_unmasked_key(const struct sw_flow *flow,
 
 void ovs_flow_mask_key(struct sw_flow_key *dst, const struct sw_flow_key *src,
 		       const struct sw_flow_mask *mask);
+
+#ifdef CONFIG_X86_64
+static inline u32 ovs_flow_hash_crc_4b(u32 crc, u32 val)
+{
+	asm ("crc32l %[val], %[crc]\n"
+		: [crc] "+r" (crc)
+		: [val] "rm" (val));
+	return crc;
+}
+
+static inline u32 ovs_flow_hash_crc(const u32 *data, u32 len, u32 seed)
+{
+	const u32 *p32 = (const u32 *) data;
+	u32 i, tmp = 0;
+
+	for (i = 0; i < len; i++)
+		seed = ovs_flow_hash_crc_4b(*p32++, seed);
+
+	switch (3 - ((len * 4) & 0x03)) {
+	case 0:
+		tmp |= *((const u8 *) p32 + 2) << 16;
+		/* Fallthrough */
+	case 1:
+		tmp |= *((const u8 *) p32 + 1) << 8;
+		/* Fallthrough */
+	case 2:
+		tmp |= *((const u8 *) p32);
+		seed = ovs_flow_hash_crc_4b(tmp, seed);
+	default:
+		break;
+	}
+
+	return seed;
+}
+#endif
 #endif /* flow_table.h */