
[v2,net-next,3/4] secure_seq: use SipHash in place of MD5

Message ID 20170107144057.15432-4-Jason@zx2c4.com
State Superseded, archived
Delegated to: David Miller

Commit Message

Jason A. Donenfeld Jan. 7, 2017, 2:40 p.m. UTC
This gives a clear speed and security improvement. SipHash is both
faster and more solid crypto than the aging MD5.

Rather than manually filling MD5 buffers, for IPv6 we simply create
a layout with an anonymous struct, for which gcc generates rather
efficient code. For IPv4, we pass the values directly to the
short-input convenience functions.
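
For the IPv6 case, the resulting pattern looks roughly like this (a
condensed sketch of the code in this patch; see the full diff below):

	const struct {
		struct in6_addr saddr;
		struct in6_addr daddr;
		__be16 sport;
		__be16 dport;
	} __aligned(SIPHASH_ALIGNMENT) combined = {
		.saddr = *(struct in6_addr *)saddr,
		.daddr = *(struct in6_addr *)daddr,
		.sport = sport,
		.dport = dport
	};
	/* Hash up to and including dport; offsetofend() keeps the
	 * struct's trailing alignment padding out of the hashed length. */
	u64 hash = siphash(&combined, offsetofend(typeof(combined), dport),
			   &net_secret);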

64-bit x86_64:
[    1.683628] secure_tcpv6_sequence_number_md5# cycles: 99563527
[    1.717350] secure_tcp_sequence_number_md5# cycles: 92890502
[    1.741968] secure_tcpv6_sequence_number_siphash# cycles: 67825362
[    1.762048] secure_tcp_sequence_number_siphash# cycles: 67485526

32-bit x86:
[    1.600012] secure_tcpv6_sequence_number_md5# cycles: 103227892
[    1.634219] secure_tcp_sequence_number_md5# cycles: 94732544
[    1.669102] secure_tcpv6_sequence_number_siphash# cycles: 96299384
[    1.700165] secure_tcp_sequence_number_siphash# cycles: 86015473

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: David Miller <davem@davemloft.net>
Cc: David Laight <David.Laight@aculab.com>
Cc: Tom Herbert <tom@herbertland.com>
Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
---
 net/core/secure_seq.c | 145 ++++++++++++++++++++++----------------------------
 1 file changed, 63 insertions(+), 82 deletions(-)

Comments

David Miller Jan. 7, 2017, 9:37 p.m. UTC | #1
From: "Jason A. Donenfeld" <Jason@zx2c4.com>
Date: Sat,  7 Jan 2017 15:40:56 +0100

> This gives a clear speed and security improvement. SipHash is both
> faster and more solid crypto than the aging MD5.
> 
> Rather than manually filling MD5 buffers, for IPv6 we simply create
> a layout with an anonymous struct, for which gcc generates rather
> efficient code. For IPv4, we pass the values directly to the
> short-input convenience functions.
> 
> 64-bit x86_64:
> [    1.683628] secure_tcpv6_sequence_number_md5# cycles: 99563527
> [    1.717350] secure_tcp_sequence_number_md5# cycles: 92890502
> [    1.741968] secure_tcpv6_sequence_number_siphash# cycles: 67825362
> [    1.762048] secure_tcp_sequence_number_siphash# cycles: 67485526
> 
> 32-bit x86:
> [    1.600012] secure_tcpv6_sequence_number_md5# cycles: 103227892
> [    1.634219] secure_tcp_sequence_number_md5# cycles: 94732544
> [    1.669102] secure_tcpv6_sequence_number_siphash# cycles: 96299384
> [    1.700165] secure_tcp_sequence_number_siphash# cycles: 86015473
> 
> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>

This and the next patch are a real shame, performance-wise, on cpus
that have single-instruction SHA1 and MD5 implementations.  Sparc64
has both, and I believe x86_64 can do SHA1 these days.

It took so long to get those instructions into real silicon, and then
have software implemented to make use of them as well.

Who knows when we'll see SipHash widely deployed in any instruction
set, if at all, right?  And by that time we'll possibly find out that
"Oh shit, this SipHash thing has flaws!" and we'll need
DIPPY_DO_DA_HASH and thus be forced back to a software implementation
again.

I understand the reasons why these patches are being proposed, I just
thought I'd mention the issue of cpus that implement secure hash
algorithm instructions.
Eric Biggers Jan. 7, 2017, 10:09 p.m. UTC | #2
Hi David,

On Sat, Jan 07, 2017 at 04:37:36PM -0500, David Miller wrote:
> From: "Jason A. Donenfeld" <Jason@zx2c4.com>
> Date: Sat,  7 Jan 2017 15:40:56 +0100
> 
> > This gives a clear speed and security improvement. SipHash is both
> > faster and more solid crypto than the aging MD5.
[snip]
> 
> This and the next patch are a real shame, performance-wise, on cpus
> that have single-instruction SHA1 and MD5 implementations.  Sparc64
> has both, and I believe x86_64 can do SHA1 these days.
> 
> It took so long to get those instructions into real silicon, and then
> have software implemented to make use of them as well.
> 
> Who knows when we'll see SipHash widely deployed in any instruction
> set, if at all, right?  And by that time we'll possibly find out that
> "Oh shit, this SipHash thing has flaws!" and we'll need
> DIPPY_DO_DA_HASH and thus be forced back to a software implementation
> again.
> 
> I understand the reasons why these patches are being proposed, I just
> thought I'd mention the issue of cpus that implement secure hash
> algorithm instructions.

Well, except those instructions aren't actually used in these places.  Although
x86_64 SHA1-NI accelerated SHA-1 is available in the Linux crypto API, it seems
that in kernel code it remains impractical to use these instructions on small
amounts of data because they use XMM registers, which means the overhead of
kernel_fpu_begin()/kernel_fpu_end() must be incurred.  Furthermore,
kernel_fpu_begin() is not allowed in all contexts so there has to be a fallback.
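
To make that concrete, here is a minimal sketch of the usual x86 pattern
(irq_fpu_usable()/kernel_fpu_begin()/kernel_fpu_end() are the real x86
APIs; the two transform functions are hypothetical stand-ins, not
symbols from this patch):

	if (irq_fpu_usable()) {
		/* Saves the task's FPU/XMM state and disables preemption,
		 * so this cost is paid even for tiny inputs. */
		kernel_fpu_begin();
		sha1_transform_ni(digest, data, nblocks);
		kernel_fpu_end();
	} else {
		/* Contexts where kernel_fpu_begin() is not allowed must
		 * fall back to a pure software implementation. */
		sha1_transform_generic(digest, data, nblocks);
	}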

Out of curiosity, is this actually a solvable problem, e.g. by making the code
using the XMM registers responsible for saving and restoring the ones clobbered,
or by optimizing kernel_fpu_begin()/kernel_fpu_end()?  Or does it in fact remain
impractical for such instructions to be used for applications like this one?

Eric
David Miller Jan. 8, 2017, 1:42 a.m. UTC | #3
From: Eric Biggers <ebiggers3@gmail.com>
Date: Sat, 7 Jan 2017 14:09:11 -0800

> Well, except those instructions aren't actually used in these
> places.  Although x86_64 SHA1-NI accelerated SHA-1 is available in
> the Linux crypto API, it seems that in kernel code it remains
> impractical to use these instructions on small amounts of data
> because they use XMM registers, which means the overhead of
> kernel_fpu_begin()/kernel_fpu_end() must be incurred.  Furthermore,
> kernel_fpu_begin() is not allowed in all contexts so there has to be
> a fallback.
> 
> Out of curiosity, is this actually a solvable problem, e.g. by
> making the code using the XMM registers responsible for saving and
> restoring the ones clobbered, or by optimizing
> kernel_fpu_begin()/kernel_fpu_end()?  Or does it in fact remain
> impractical for such instructions to be used for applications like
> this one?

On x86, making the FPU save more tractable in situations like this is
really hard and will make the code significantly more complex.

It's simpler and cheaper on sparc64, and unlike on x86 there aren't
any fundamental restrictions on where FPU stuff can be used.  This is
because we don't have "save all the FPU state" instructions and have
to do it all by hand anyway.

However, I will note that just like x86, sparc64 doesn't override the
md5_transform() in lib/md5.c like it should.
Jason A. Donenfeld Jan. 8, 2017, 12:23 p.m. UTC | #4
Hi David,

On Sat, Jan 7, 2017 at 10:37 PM, David Miller <davem@davemloft.net> wrote:
> This and the next patch are a real shame, performance-wise, on cpus
> that have single-instruction SHA1 and MD5 implementations.  Sparc64
> has both, and I believe x86_64 can do SHA1 these days.
>
> It took so long to get those instructions into real silicon, and then
> have software implemented to make use of them as well.

Actually, from a performance perspective, these patches are strictly
better than what was already there, since nothing actually used the
special instructions. They're also better security-wise, because the
prior use of these functions was quite dubious. On x86, using the FPU
isn't really an option in these situations, as you well know. On
Sparc64, sure, I guess it's a bummer that silicon is lagging
cryptography. If, after merging these improvements, you want to start
thinking about a special construction just for Sparc64 that would be
faster and have a matching security level, that would of course be
great. But so far, nobody even bothered to do this for the old,
insecure, slow code that this is replacing.

> Who knows when we'll see SipHash widely deployed in any instruction
> set, if at all, right?  And by that time we'll possibly find out that
> "Oh shit, this SipHash thing has flaws!" and we'll need
> DIPPY_DO_DA_HASH and thus be forced back to a software implementation
> again.

The literature and cryptanalyses on SipHash have been quite positive.
And as I mentioned in earlier messages in this patchset, SipHash is
really _not_ some newfangled hipster thing, but rather something
that's been around a while, has been pretty extensively studied, and
is considered quite venerable. I think if you're going to bet on
something, SipHash is one of the safer bets to make.

> I understand the reasons why these patches are being proposed, I just
> thought I'd mention the issue of cpus that implement secure hash
> algorithm instructions.

Yeah, agreed, it's a bummer. Hopefully silicon will catch up someday,
and we'll all be happy. In the meantime, at least these patches
improve the situation on Linux.

I interpret your email's omission of any substantive comments on the
code itself as an indication that things are mostly sane. I'll
follow up with Eric's suggestions to produce a v3, and then hopefully
we can get this merged.

Regards,
Jason
David Laight Jan. 9, 2017, 1:18 p.m. UTC | #5
From: Eric Biggers
> Sent: 07 January 2017 22:09
..
> Out of curiosity, is this actually a solvable problem, e.g. by making the code
> using the XMM registers responsible for saving and restoring the ones clobbered,
> or by optimizing kernel_fpu_begin()/kernel_fpu_end()?  Or does it in fact remain
> impractical for such instructions to be used for applications like this one?

I think it should be possible to fast-save a couple of FP registers
by telling the IPI save code where to save them from (to load them
into a different cpu).
But there might be issues with other parts of the FP state.
This might be OK provided the kernel doesn't actually use FP instructions.

(I thought about this when adding AVX support to NetBSD.)

	David

Patch

diff --git a/net/core/secure_seq.c b/net/core/secure_seq.c
index 88a8e429fc3e..119e3ebe2f30 100644
--- a/net/core/secure_seq.c
+++ b/net/core/secure_seq.c
@@ -1,3 +1,7 @@ 
+/*
+ * Copyright (C) 2016 Jason A. Donenfeld <Jason@zx2c4.com>. All Rights Reserved.
+ */
+
 #include <linux/kernel.h>
 #include <linux/init.h>
 #include <linux/cryptohash.h>
@@ -8,18 +12,18 @@ 
 #include <linux/ktime.h>
 #include <linux/string.h>
 #include <linux/net.h>
-
+#include <linux/siphash.h>
 #include <net/secure_seq.h>
 
 #if IS_ENABLED(CONFIG_IPV6) || IS_ENABLED(CONFIG_INET)
+#include <linux/in6.h>
 #include <net/tcp.h>
-#define NET_SECRET_SIZE (MD5_MESSAGE_BYTES / 4)
 
-static u32 net_secret[NET_SECRET_SIZE] ____cacheline_aligned;
+static siphash_key_t net_secret;
 
 static __always_inline void net_secret_init(void)
 {
-	net_get_random_once(net_secret, sizeof(net_secret));
+	net_get_random_once(&net_secret, sizeof(net_secret));
 }
 #endif
 
@@ -44,80 +48,70 @@  static u32 seq_scale(u32 seq)
 u32 secure_tcpv6_sequence_number(const __be32 *saddr, const __be32 *daddr,
 				 __be16 sport, __be16 dport, u32 *tsoff)
 {
-	u32 secret[MD5_MESSAGE_BYTES / 4];
-	u32 hash[MD5_DIGEST_WORDS];
-	u32 i;
-
+	const struct {
+		struct in6_addr saddr;
+		struct in6_addr daddr;
+		__be16 sport;
+		__be16 dport;
+	} __aligned(SIPHASH_ALIGNMENT) combined = {
+		.saddr = *(struct in6_addr *)saddr,
+		.daddr = *(struct in6_addr *)daddr,
+		.sport = sport,
+		.dport = dport
+	};
+	u64 hash;
 	net_secret_init();
-	memcpy(hash, saddr, 16);
-	for (i = 0; i < 4; i++)
-		secret[i] = net_secret[i] + (__force u32)daddr[i];
-	secret[4] = net_secret[4] +
-		(((__force u16)sport << 16) + (__force u16)dport);
-	for (i = 5; i < MD5_MESSAGE_BYTES / 4; i++)
-		secret[i] = net_secret[i];
-
-	md5_transform(hash, secret);
-
-	*tsoff = sysctl_tcp_timestamps == 1 ? hash[1] : 0;
-	return seq_scale(hash[0]);
+	hash = siphash(&combined, offsetofend(typeof(combined), dport),
+		       &net_secret);
+	*tsoff = sysctl_tcp_timestamps == 1 ? (hash >> 32) : 0;
+	return seq_scale(hash);
 }
 EXPORT_SYMBOL(secure_tcpv6_sequence_number);
 
 u32 secure_ipv6_port_ephemeral(const __be32 *saddr, const __be32 *daddr,
 			       __be16 dport)
 {
-	u32 secret[MD5_MESSAGE_BYTES / 4];
-	u32 hash[MD5_DIGEST_WORDS];
-	u32 i;
-
+	const struct {
+		struct in6_addr saddr;
+		struct in6_addr daddr;
+		__be16 dport;
+	} __aligned(SIPHASH_ALIGNMENT) combined = {
+		.saddr = *(struct in6_addr *)saddr,
+		.daddr = *(struct in6_addr *)daddr,
+		.dport = dport
+	};
 	net_secret_init();
-	memcpy(hash, saddr, 16);
-	for (i = 0; i < 4; i++)
-		secret[i] = net_secret[i] + (__force u32) daddr[i];
-	secret[4] = net_secret[4] + (__force u32)dport;
-	for (i = 5; i < MD5_MESSAGE_BYTES / 4; i++)
-		secret[i] = net_secret[i];
-
-	md5_transform(hash, secret);
-
-	return hash[0];
+	return siphash(&combined, offsetofend(typeof(combined), dport),
+		       &net_secret);
 }
 EXPORT_SYMBOL(secure_ipv6_port_ephemeral);
 #endif
 
 #ifdef CONFIG_INET
 
+/* secure_tcp_sequence_number(a, b, 0, d) == secure_ipv4_port_ephemeral(a, b, d),
+ * but fortunately, `sport' cannot be 0 in any circumstances. If this changes,
+ * it would be easy enough to have the former function use siphash_4u32, passing
+ * the arguments as separate u32.
+ */
+
 u32 secure_tcp_sequence_number(__be32 saddr, __be32 daddr,
 			       __be16 sport, __be16 dport, u32 *tsoff)
 {
-	u32 hash[MD5_DIGEST_WORDS];
-
+	u64 hash;
 	net_secret_init();
-	hash[0] = (__force u32)saddr;
-	hash[1] = (__force u32)daddr;
-	hash[2] = ((__force u16)sport << 16) + (__force u16)dport;
-	hash[3] = net_secret[15];
-
-	md5_transform(hash, net_secret);
-
-	*tsoff = sysctl_tcp_timestamps == 1 ? hash[1] : 0;
-	return seq_scale(hash[0]);
+	hash = siphash_3u32((__force u32)saddr, (__force u32)daddr,
+			    (__force u32)sport << 16 | (__force u32)dport,
+			    &net_secret);
+	*tsoff = sysctl_tcp_timestamps == 1 ? (hash >> 32) : 0;
+	return seq_scale(hash);
 }
 
 u32 secure_ipv4_port_ephemeral(__be32 saddr, __be32 daddr, __be16 dport)
 {
-	u32 hash[MD5_DIGEST_WORDS];
-
 	net_secret_init();
-	hash[0] = (__force u32)saddr;
-	hash[1] = (__force u32)daddr;
-	hash[2] = (__force u32)dport ^ net_secret[14];
-	hash[3] = net_secret[15];
-
-	md5_transform(hash, net_secret);
-
-	return hash[0];
+	return siphash_3u32((__force u32)saddr, (__force u32)daddr,
+			    (__force u16)dport, &net_secret);
 }
 EXPORT_SYMBOL_GPL(secure_ipv4_port_ephemeral);
 #endif
@@ -126,21 +120,11 @@  EXPORT_SYMBOL_GPL(secure_ipv4_port_ephemeral);
 u64 secure_dccp_sequence_number(__be32 saddr, __be32 daddr,
 				__be16 sport, __be16 dport)
 {
-	u32 hash[MD5_DIGEST_WORDS];
 	u64 seq;
-
 	net_secret_init();
-	hash[0] = (__force u32)saddr;
-	hash[1] = (__force u32)daddr;
-	hash[2] = ((__force u16)sport << 16) + (__force u16)dport;
-	hash[3] = net_secret[15];
-
-	md5_transform(hash, net_secret);
-
-	seq = hash[0] | (((u64)hash[1]) << 32);
+	seq = siphash_3u32(saddr, daddr, (u32)sport << 16 | dport, &net_secret);
 	seq += ktime_get_real_ns();
 	seq &= (1ull << 48) - 1;
-
 	return seq;
 }
 EXPORT_SYMBOL(secure_dccp_sequence_number);
@@ -149,26 +133,23 @@  EXPORT_SYMBOL(secure_dccp_sequence_number);
 u64 secure_dccpv6_sequence_number(__be32 *saddr, __be32 *daddr,
 				  __be16 sport, __be16 dport)
 {
-	u32 secret[MD5_MESSAGE_BYTES / 4];
-	u32 hash[MD5_DIGEST_WORDS];
+	const struct {
+		struct in6_addr saddr;
+		struct in6_addr daddr;
+		__be16 sport;
+		__be16 dport;
+	} __aligned(SIPHASH_ALIGNMENT) combined = {
+		.saddr = *(struct in6_addr *)saddr,
+		.daddr = *(struct in6_addr *)daddr,
+		.sport = sport,
+		.dport = dport
+	};
 	u64 seq;
-	u32 i;
-
 	net_secret_init();
-	memcpy(hash, saddr, 16);
-	for (i = 0; i < 4; i++)
-		secret[i] = net_secret[i] + (__force u32)daddr[i];
-	secret[4] = net_secret[4] +
-		(((__force u16)sport << 16) + (__force u16)dport);
-	for (i = 5; i < MD5_MESSAGE_BYTES / 4; i++)
-		secret[i] = net_secret[i];
-
-	md5_transform(hash, secret);
-
-	seq = hash[0] | (((u64)hash[1]) << 32);
+	seq = siphash(&combined, offsetofend(typeof(combined), dport),
+		      &net_secret);
 	seq += ktime_get_real_ns();
 	seq &= (1ull << 48) - 1;
-
 	return seq;
 }
 EXPORT_SYMBOL(secure_dccpv6_sequence_number);