
[v2,3/4] secure_seq: use siphash24 instead of md5_transform

Message ID 20161214035927.30004-3-Jason@zx2c4.com
State Changes Requested, archived
Delegated to: David Miller

Commit Message

Jason A. Donenfeld Dec. 14, 2016, 3:59 a.m. UTC
This gives a clear speed and security improvement. SipHash is both
faster and more solid crypto than the aging MD5.

Rather than manually filling MD5 buffers, we simply create the
layout with a simple anonymous struct, for which gcc generates
rather efficient code.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Cc: Andi Kleen <ak@linux.intel.com>
---
Changes from v1->v2:

  - Rebased on the latest 4.10, and now uses top 32-bits of siphash
    for the optional ts value.

 net/core/secure_seq.c | 160 +++++++++++++++++++++++++-------------------------
 1 file changed, 79 insertions(+), 81 deletions(-)

Comments

Jason A. Donenfeld Dec. 14, 2016, 12:53 p.m. UTC | #1
Hi David,

On Wed, Dec 14, 2016 at 10:51 AM, David Laight <David.Laight@aculab.com> wrote:
> From: Jason A. Donenfeld
>> Sent: 14 December 2016 00:17
>> This gives a clear speed and security improvement. Rather than manually
>> filling MD5 buffers, we simply create a layout by a simple anonymous
>> struct, for which gcc generates rather efficient code.
> ...
>> +     const struct {
>> +             struct in6_addr saddr;
>> +             struct in6_addr daddr;
>> +             __be16 sport;
>> +             __be16 dport;
>> +     } __packed combined = {
>> +             .saddr = *(struct in6_addr *)saddr,
>> +             .daddr = *(struct in6_addr *)daddr,
>> +             .sport = sport,
>> +             .dport = dport
>> +     };
>
> You need to look at the effect of marking this (and the other)
> structures 'packed' on architectures like sparc64.

In all current uses of __packed in the code, I think the impact is
precisely zero, because all structures have members in descending
order of size, with each member being a perfect multiple of the one
below it. The __packed is therefore just there for safety, in case
somebody comes in and screws everything up by sticking a u8 in
between. In that case, it wouldn't be desirable to hash the structure
padding bits. In the worst case, I don't believe the impact would be
worse than a byte-by-byte memcpy, which is what the old code did. But
these structures are already naturally packed anyway, so the
present impact is nil.

Jason
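The "naturally packed" layout Jason describes can be sketched in userspace; the struct and field names below are illustrative stand-ins, not taken from the patch. Because the members come in descending size order and each member's size is a multiple of the alignment of the next, the compiler inserts no padding even without __packed:

```c
/* Userspace sketch (hypothetical names) of the layout in the patch:
 * two 16-byte addresses followed by two 16-bit ports. */
#include <stdint.h>
#include <stddef.h>

struct in6_addr_demo {             /* stand-in for struct in6_addr: 16 bytes, 4-byte alignment */
	uint32_t s6_addr32[4];
};

struct v6_flow {                   /* mirrors the anonymous struct in the patch */
	struct in6_addr_demo saddr;    /* offset  0, 16 bytes */
	struct in6_addr_demo daddr;    /* offset 16, 16 bytes */
	uint16_t sport;                /* offset 32 */
	uint16_t dport;                /* offset 34 */
};

/* Compile-time proof that the sum of the member sizes equals the struct
 * size, i.e. the compiler inserted no interior or tail padding. */
_Static_assert(sizeof(struct v6_flow) ==
	       2 * sizeof(struct in6_addr_demo) + 2 * sizeof(uint16_t),
	       "unexpected padding in v6_flow");
_Static_assert(offsetof(struct v6_flow, dport) == 34,
	       "unexpected interior padding");
```

The struct's overall alignment is 4 (its largest member alignment), and 36 is a multiple of 4, so there is no tail padding either.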
Hannes Frederic Sowa Dec. 14, 2016, 1:16 p.m. UTC | #2
On 14.12.2016 13:53, Jason A. Donenfeld wrote:
> Hi David,
> 
> On Wed, Dec 14, 2016 at 10:51 AM, David Laight <David.Laight@aculab.com> wrote:
>> From: Jason A. Donenfeld
>>> Sent: 14 December 2016 00:17
>>> This gives a clear speed and security improvement. Rather than manually
>>> filling MD5 buffers, we simply create a layout by a simple anonymous
>>> struct, for which gcc generates rather efficient code.
>> ...
>>> +     const struct {
>>> +             struct in6_addr saddr;
>>> +             struct in6_addr daddr;
>>> +             __be16 sport;
>>> +             __be16 dport;
>>> +     } __packed combined = {
>>> +             .saddr = *(struct in6_addr *)saddr,
>>> +             .daddr = *(struct in6_addr *)daddr,
>>> +             .sport = sport,
>>> +             .dport = dport
>>> +     };
>>
>> You need to look at the effect of marking this (and the other)
>> structures 'packed' on architectures like sparc64.
> 
> In all current uses of __packed in the code, I think the impact is
> precisely zero, because all structures have members in descending
> order of size, with each member being a perfect multiple of the one
> below it. The __packed is therefore just there for safety, in case
> somebody comes in and screws everything up by sticking a u8 in
> between. In that case, it wouldn't be desirable to hash the structure
> padding bits. In the worst case, I don't believe the impact would be
> worse than a byte-by-byte memcpy, which is what the old code did. But
> anyway, these structures are already naturally packed anyway, so the
> present impact is nil.

__packed not only removes all padding inside the struct but also changes
the alignment assumptions for the struct itself: the rule that a struct
is aligned to the maximum alignment of its members no longer holds. As a
result, the code accessing this struct will change (not on archs that
can deal efficiently with unaligned access, but on others).

A proper test for not introducing padding is to use something like
BUILD_BUG_ON. Also, gcc clears the padding of the struct, so padding
shouldn't be that bad in there (it is better than byte access on mips).

Btw. I think gcc7 gets support for store merging optimization.

Bye,
Hannes
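The compile-time check Hannes suggests could be sketched like this; BUILD_BUG_ON is the kernel macro, approximated below with C11 _Static_assert, and the struct name is illustrative:

```c
/* Sketch of a BUILD_BUG_ON-style padding check, in userspace terms. */
#include <stdint.h>

#define BUILD_BUG_ON_DEMO(cond) _Static_assert(!(cond), #cond)

struct v4_flow {                   /* illustrative IPv4 4-tuple layout */
	uint32_t saddr;
	uint32_t daddr;
	uint16_t sport;
	uint16_t dport;
};

/* Fails to compile if any padding sneaks in, e.g. if someone later
 * inserts a u8 between the 32-bit and 16-bit members. */
BUILD_BUG_ON_DEMO(sizeof(struct v4_flow) !=
		  2 * sizeof(uint32_t) + 2 * sizeof(uint16_t));
```

With a check like this in place, __packed is no longer needed as a safety net against accidental padding.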
Jason A. Donenfeld Dec. 14, 2016, 1:44 p.m. UTC | #3
Hi Hannes,

Thanks for the feedback.

> __packed not only removes all padding of the struct but also changes the
> alignment assumptions for the whole struct itself. The rule, the struct
> is aligned by its maximum alignment of a member is no longer true. That
> said, the code accessing this struct will change (not on archs that can
> deal efficiently with unaligned access, but on others).

That's interesting. There currently aren't any alignment requirements
in siphash because we use the unaligned helper functions, but as David
pointed out in another thread, maybe that too should change. In that
case, we'd have an aligned-only version of the function that requires
8-byte aligned input. Perhaps the best way to go about that would be
to just mark the struct as __packed __aligned(8). Or, I guess, since
64-bit accesses get split into two on 32-bit, that'd be best described
as __packed __aligned(sizeof(long)). Would that be an acceptable
solution?

Jason
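A rough userspace illustration of the __packed __aligned(8) idea, using the GCC attribute spellings with made-up struct names: packing alone drops the struct's alignment requirement to 1 byte, while adding aligned(8) restores the 8-byte guarantee, at the cost of rounding sizeof back up to a multiple of 8:

```c
#include <stdint.h>

struct packed_only {
	uint64_t a;
	uint16_t b;
} __attribute__((packed));

struct packed_aligned {
	uint64_t a;
	uint16_t b;
} __attribute__((packed, aligned(8)));

/* packed alone: no padding at all, but also no alignment guarantee. */
_Static_assert(_Alignof(struct packed_only) == 1, "packed drops alignment");
_Static_assert(sizeof(struct packed_only) == 10, "no padding at all");

/* packed + aligned(8): alignment restored, and sizeof rounds up to a
 * multiple of the alignment, reintroducing tail padding. */
_Static_assert(_Alignof(struct packed_aligned) == 8, "aligned(8) restores it");
_Static_assert(sizeof(struct packed_aligned) == 16, "size rounds up to alignment");
```

Note the trade-off visible in the asserts: the combined attributes hash 6 bytes of tail padding that packed alone would have avoided.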
David Laight Dec. 14, 2016, 2:47 p.m. UTC | #4
From: Jason A. Donenfeld
> Sent: 14 December 2016 13:44
> To: Hannes Frederic Sowa
> > __packed not only removes all padding of the struct but also changes the
> > alignment assumptions for the whole struct itself. The rule, the struct
> > is aligned by its maximum alignment of a member is no longer true. That
> > said, the code accessing this struct will change (not on archs that can
> > deal efficiently with unaligned access, but on others).
>
> That's interesting. There currently aren't any alignment requirements
> in siphash because we use the unaligned helper functions, but as David
> pointed out in another thread, maybe that too should change. In that
> case, we'd have an aligned-only version of the function that requires
> 8-byte aligned input. Perhaps the best way to go about that would be
> to just mark the struct as __packed __aligned(8). Or, I guess, since
> 64-bit accesses gets split into two on 32-bit, that'd be best descried
> as __packed __aligned(sizeof(long)). Would that be an acceptable
> solution?

Just remove the __packed and ensure that the structure is 'nice'.
This includes ensuring there is no 'tail padding'.
In some cases you'll need to put the port number into a 32bit field.

I'd also require that the key be aligned.
It probably ought to be a named structure type with two 64bit members
(or with an array member that has two elements).

	David
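One possible reading of David's suggestion, sketched with hypothetical names: drop __packed, widen the ports so the struct is naturally padding-free, and keep the key in a named struct of two 64-bit words so its alignment is guaranteed:

```c
#include <stdint.h>

/* Named key type: two 64-bit members, naturally 8-byte aligned. */
struct siphash_key_demo {
	uint64_t key[2];
};

/* A 'nice' struct: no __packed, 16-bit ports widened to 32 bits so
 * there is no interior or tail padding to worry about. */
struct v4_flow_nice {
	uint32_t saddr;
	uint32_t daddr;
	uint32_t sport;            /* widened from 16 bits ... */
	uint32_t dport;            /* ... so sizeof has no hidden padding */
};

_Static_assert(sizeof(struct v4_flow_nice) == 4 * sizeof(uint32_t),
	       "no padding anywhere");
_Static_assert(_Alignof(struct siphash_key_demo) == 8,
	       "key is 8-byte aligned");
```

The cost Jason objects to below is visible here: the widened ports hash 4 bytes of zeros that a packed layout would not.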
Jason A. Donenfeld Dec. 14, 2016, 5:49 p.m. UTC | #5
On Wed, Dec 14, 2016 at 3:47 PM, David Laight <David.Laight@aculab.com> wrote:
> Just remove the __packed and ensure that the structure is 'nice'.
> This includes ensuring there is no 'tail padding'.
> In some cases you'll need to put the port number into a 32bit field.

I'd rather not. There's no point in wasting extra cycles on hashing
useless data, just for some particular syntactic improvement. __packed
__aligned(8) will do what we want perfectly, I think.

> I'd also require that the key be aligned.

Yep, I'll certainly do this for the siphash24_aligned version in the v3.
David Miller Dec. 14, 2016, 5:56 p.m. UTC | #6
From: "Jason A. Donenfeld" <Jason@zx2c4.com>
Date: Wed, 14 Dec 2016 13:53:10 +0100

> In all current uses of __packed in the code, I think the impact is
> precisely zero, because all structures have members in descending
> order of size, with each member being a perfect multiple of the one
> below it. The __packed is therefore just there for safety, in case
> somebody comes in and screws everything up by sticking a u8 in
> between.

Just marking the structure __packed, whether necessary or not, makes
the compiler assume that the members are not aligned and causes
byte-by-byte accesses to be performed for words.

Never, _ever_, use __packed unless absolutely necessary, it pessimizes
the code on cpus that require proper alignment of types.
Jason A. Donenfeld Dec. 14, 2016, 6:06 p.m. UTC | #7
Hi David,

On Wed, Dec 14, 2016 at 6:56 PM, David Miller <davem@davemloft.net> wrote:
> Just marking the structure __packed, whether necessary or not, makes
> the compiler assume that the members are not aligned and causes
> byte-by-byte accesses to be performed for words.
> Never, _ever_, use __packed unless absolutely necessary, it pessimizes
> the code on cpus that require proper alignment of types.

Oh, jimminy cricket, I did not realize that it made assignments
byte-by-byte *always*. So what options am I left with? What
immediately comes to mind are:

1)

struct {
    u64 a;
    u32 b;
    u32 c;
    u16 d;
    u8 end[];
} a = {
    .a = a,
    .b = b,
    .c = c,
    .d = d
};
siphash24(&a, offsetof(typeof(a), end), key);

2)

u8 bytes[sizeof(u64) + sizeof(u32) * 2 + sizeof(u16)];
*(u64 *)&bytes[0] = a;
*(u32 *)&bytes[sizeof(u64)] = b;
*(u32 *)&bytes[sizeof(u64) + sizeof(u32)] = c;
*(u16 *)&bytes[sizeof(u64) + sizeof(u32) * 2] = d;
siphash24(bytes, sizeof(bytes), key);


Personally I find (1) a bit neater than (2). What's your opinion?

Jason
Hannes Frederic Sowa Dec. 14, 2016, 7:22 p.m. UTC | #8
On 14.12.2016 19:06, Jason A. Donenfeld wrote:
> Hi David,
> 
> On Wed, Dec 14, 2016 at 6:56 PM, David Miller <davem@davemloft.net> wrote:
>> Just marking the structure __packed, whether necessary or not, makes
>> the compiler assume that the members are not aligned and causes
>> byte-by-byte accesses to be performed for words.
>> Never, _ever_, use __packed unless absolutely necessary, it pessimizes
>> the code on cpus that require proper alignment of types.
> 
> Oh, jimminy cricket, I did not realize that it made assignments
> byte-by-byte *always*. So what options am I left with? What
> immediately comes to mind are:
> 
> 1)
> 
> struct {
>     u64 a;
>     u32 b;
>     u32 c;
>     u16 d;
>     u8 end[];

I don't think this helps. Did you test it? I don't see why padding
would be left out between `d' and `end' because of the flexible
array member.

Bye,
Hannes
Jason A. Donenfeld Dec. 14, 2016, 7:38 p.m. UTC | #9
Hi Hannes,

On Wed, Dec 14, 2016 at 8:22 PM, Hannes Frederic Sowa
<hannes@stressinduktion.org> wrote:
> I don't think this helps. Did you test it? I don't see reason why
> padding could be left out between `d' and `end' because of the flexible
> array member?

Because the type u8 doesn't have any alignment requirements, it can
nestle right up there cozy with the u16:

zx2c4@thinkpad ~ $ cat a.c
#include <stdint.h>
#include <stdio.h>
#include <stddef.h>
int main()
{
       struct {
               uint64_t a;
               uint32_t b;
               uint32_t c;
               uint16_t d;
               char x[];
       } a;
       printf("%zu\n", sizeof(a));
       printf("%zu\n", offsetof(typeof(a), x));
       return 0;
}
zx2c4@thinkpad ~ $ gcc a.c
zx2c4@thinkpad ~ $ ./a.out
24
18

Jason
Tom Herbert Dec. 14, 2016, 8:12 p.m. UTC | #10
On Wed, Dec 14, 2016 at 4:53 AM, Jason A. Donenfeld <Jason@zx2c4.com> wrote:
> Hi David,
>
> On Wed, Dec 14, 2016 at 10:51 AM, David Laight <David.Laight@aculab.com> wrote:
>> From: Jason A. Donenfeld
>>> Sent: 14 December 2016 00:17
>>> This gives a clear speed and security improvement. Rather than manually
>>> filling MD5 buffers, we simply create a layout by a simple anonymous
>>> struct, for which gcc generates rather efficient code.
>> ...
>>> +     const struct {
>>> +             struct in6_addr saddr;
>>> +             struct in6_addr daddr;
>>> +             __be16 sport;
>>> +             __be16 dport;
>>> +     } __packed combined = {
>>> +             .saddr = *(struct in6_addr *)saddr,
>>> +             .daddr = *(struct in6_addr *)daddr,
>>> +             .sport = sport,
>>> +             .dport = dport
>>> +     };
>>
>> You need to look at the effect of marking this (and the other)
>> structures 'packed' on architectures like sparc64.
>
> In all current uses of __packed in the code, I think the impact is
> precisely zero, because all structures have members in descending
> order of size, with each member being a perfect multiple of the one
> below it. The __packed is therefore just there for safety, in case
> somebody comes in and screws everything up by sticking a u8 in
> between. In that case, it wouldn't be desirable to hash the structure
> padding bits. In the worst case, I don't believe the impact would be
> worse than a byte-by-byte memcpy, which is what the old code did. But
> anyway, these structures are already naturally packed anyway, so the
> present impact is nil.
>
If you pad the data structure to 64 bits then we can call the version
of siphash that only deals in 64 bit words. Writing a zero in the
padding will be cheaper than dealing with odd lengths in siphash24.

Tom

> Jason
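Tom's suggestion might look something like this (a hypothetical layout, not from the patch): round the input up to a whole number of 64-bit words with an explicit zeroed pad field, so a words-only siphash variant never has to handle an odd-length tail:

```c
#include <stdint.h>

struct v4_flow_padded {
	uint32_t saddr;            /* 4 bytes */
	uint32_t daddr;            /* 4 bytes */
	uint16_t sport;            /* 2 bytes */
	uint16_t dport;            /* 2 bytes */
	uint32_t pad;              /* explicit zero: 12 -> 16 bytes total */
};

/* The whole struct is now an exact number of 64-bit words, so a
 * word-at-a-time hash needs no "if (left)" tail branch. */
_Static_assert(sizeof(struct v4_flow_padded) % sizeof(uint64_t) == 0,
	       "whole number of 64-bit words");
```

With a designated initializer such as `{ .saddr = saddr, .daddr = daddr, .sport = sport, .dport = dport }`, the unnamed `pad` member is zero-initialized automatically, so writing the zero costs essentially nothing.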
Hannes Frederic Sowa Dec. 14, 2016, 8:27 p.m. UTC | #11
Hey Jason,

On 14.12.2016 20:38, Jason A. Donenfeld wrote:
> On Wed, Dec 14, 2016 at 8:22 PM, Hannes Frederic Sowa
> <hannes@stressinduktion.org> wrote:
>> I don't think this helps. Did you test it? I don't see reason why
>> padding could be left out between `d' and `end' because of the flexible
>> array member?
> 
> Because the type u8 doesn't require any alignment requirements, it can
> nestle right up there cozy with the u16:
> 
> zx2c4@thinkpad ~ $ cat a.c
> #include <stdint.h>
> #include <stdio.h>
> #include <stddef.h>
> int main()
> {
>        struct {
>                uint64_t a;
>                uint32_t b;
>                uint32_t c;
>                uint16_t d;
>                char x[];
>        } a;
>        printf("%zu\n", sizeof(a));
>        printf("%zu\n", offsetof(typeof(a), x));
>        return 0;
> }
> zx2c4@thinkpad ~ $ gcc a.c
> zx2c4@thinkpad ~ $ ./a.out
> 24
> 18

Sorry, I misread the patch. You are using offsetof. In this case remove
the char x[] and just use offsetofend because it is misleading
otherwise. Should work like that though.

What I don't really understand is that the addition of this complexity
actually reduces the performance, as you have to take the "if (left)"
branch during hashing, and it causes you to do a load_unaligned_zeropad.

Bye,
Hannes
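The offsetofend() idiom Hannes mentions computes the hashable length without needing a flexible-array end marker. The macro below matches how the kernel commonly defines it, applied here to an illustrative struct:

```c
#include <stdint.h>
#include <stddef.h>

/* Offset of the byte just past MEMBER: where real data ends,
 * regardless of any tail padding after it. */
#define offsetofend(TYPE, MEMBER) \
	(offsetof(TYPE, MEMBER) + sizeof(((TYPE *)0)->MEMBER))

struct demo {                      /* same shape as Jason's option (1) */
	uint64_t a;
	uint32_t b;
	uint32_t c;
	uint16_t d;
};

/* 8 + 4 + 4 + 2 = 18 bytes of real data, even though sizeof(struct demo)
 * is 24 because of 8-byte tail padding. */
_Static_assert(offsetofend(struct demo, d) == 18,
	       "hash length excludes tail padding");
```

Hashing `offsetofend(struct demo, d)` bytes gives the same 18-byte length as the `offsetof(..., end)` trick, without the misleading `char end[]` member.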
Jason A. Donenfeld Dec. 14, 2016, 9:01 p.m. UTC | #12
On Wed, Dec 14, 2016 at 9:12 PM, Tom Herbert <tom@herbertland.com> wrote:
> If you pad the data structure to 64 bits then we can call the version
> of siphash that only deals in 64 bit words. Writing a zero in the
> padding will be cheaper than dealing with odd lengths in siphash24.
On Wed, Dec 14, 2016 at 9:27 PM, Hannes Frederic Sowa
<hannes@stressinduktion.org> wrote:
> What I don't really understand is that the addition of this complexity
> actually reduces the performance, as you have to take the "if (left)"
> branch during hashing and causes you to make a load_unaligned_zeropad.

Oh, duh, you guys are right. Fixed in my repo [1]. I'll submit the
next version in a day or so to let some other comments come in.

Thanks again for your reviews.

Jason

[1] https://git.zx2c4.com/linux-dev/log/?h=siphash

Patch

diff --git a/net/core/secure_seq.c b/net/core/secure_seq.c
index 88a8e429fc3e..abadc79cd5d3 100644
--- a/net/core/secure_seq.c
+++ b/net/core/secure_seq.c
@@ -1,3 +1,5 @@ 
+/* Copyright (C) 2016 Jason A. Donenfeld <Jason@zx2c4.com>. All Rights Reserved. */
+
 #include <linux/kernel.h>
 #include <linux/init.h>
 #include <linux/cryptohash.h>
@@ -8,14 +10,14 @@ 
 #include <linux/ktime.h>
 #include <linux/string.h>
 #include <linux/net.h>
-
+#include <linux/siphash.h>
 #include <net/secure_seq.h>
 
 #if IS_ENABLED(CONFIG_IPV6) || IS_ENABLED(CONFIG_INET)
+#include <linux/in6.h>
 #include <net/tcp.h>
-#define NET_SECRET_SIZE (MD5_MESSAGE_BYTES / 4)
 
-static u32 net_secret[NET_SECRET_SIZE] ____cacheline_aligned;
+static u8 net_secret[SIPHASH24_KEY_LEN];
 
 static __always_inline void net_secret_init(void)
 {
@@ -44,44 +46,39 @@  static u32 seq_scale(u32 seq)
 u32 secure_tcpv6_sequence_number(const __be32 *saddr, const __be32 *daddr,
 				 __be16 sport, __be16 dport, u32 *tsoff)
 {
-	u32 secret[MD5_MESSAGE_BYTES / 4];
-	u32 hash[MD5_DIGEST_WORDS];
-	u32 i;
-
+	const struct {
+		struct in6_addr saddr;
+		struct in6_addr daddr;
+		__be16 sport;
+		__be16 dport;
+	} __packed combined = {
+		.saddr = *(struct in6_addr *)saddr,
+		.daddr = *(struct in6_addr *)daddr,
+		.sport = sport,
+		.dport = dport
+	};
+	u64 hash;
 	net_secret_init();
-	memcpy(hash, saddr, 16);
-	for (i = 0; i < 4; i++)
-		secret[i] = net_secret[i] + (__force u32)daddr[i];
-	secret[4] = net_secret[4] +
-		(((__force u16)sport << 16) + (__force u16)dport);
-	for (i = 5; i < MD5_MESSAGE_BYTES / 4; i++)
-		secret[i] = net_secret[i];
-
-	md5_transform(hash, secret);
-
-	*tsoff = sysctl_tcp_timestamps == 1 ? hash[1] : 0;
-	return seq_scale(hash[0]);
+	hash = siphash24((const u8 *)&combined, sizeof(combined), net_secret);
+	*tsoff = sysctl_tcp_timestamps == 1 ? (hash >> 32) : 0;
+	return seq_scale(hash);
 }
 EXPORT_SYMBOL(secure_tcpv6_sequence_number);
 
 u32 secure_ipv6_port_ephemeral(const __be32 *saddr, const __be32 *daddr,
 			       __be16 dport)
 {
-	u32 secret[MD5_MESSAGE_BYTES / 4];
-	u32 hash[MD5_DIGEST_WORDS];
-	u32 i;
-
+	const struct {
+		struct in6_addr saddr;
+		struct in6_addr daddr;
+		__be16 dport;
+	} __packed combined = {
+		.saddr = *(struct in6_addr *)saddr,
+		.daddr = *(struct in6_addr *)daddr,
+		.dport = dport
+	};
 	net_secret_init();
-	memcpy(hash, saddr, 16);
-	for (i = 0; i < 4; i++)
-		secret[i] = net_secret[i] + (__force u32) daddr[i];
-	secret[4] = net_secret[4] + (__force u32)dport;
-	for (i = 5; i < MD5_MESSAGE_BYTES / 4; i++)
-		secret[i] = net_secret[i];
-
-	md5_transform(hash, secret);
-
-	return hash[0];
+	return siphash24((const u8 *)&combined, sizeof(combined), net_secret);
 }
 EXPORT_SYMBOL(secure_ipv6_port_ephemeral);
 #endif
@@ -91,33 +88,37 @@  EXPORT_SYMBOL(secure_ipv6_port_ephemeral);
 u32 secure_tcp_sequence_number(__be32 saddr, __be32 daddr,
 			       __be16 sport, __be16 dport, u32 *tsoff)
 {
-	u32 hash[MD5_DIGEST_WORDS];
-
+	const struct {
+		__be32 saddr;
+		__be32 daddr;
+		__be16 sport;
+		__be16 dport;
+	} __packed combined = {
+		.saddr = saddr,
+		.daddr = daddr,
+		.sport = sport,
+		.dport = dport
+	};
+	u64 hash;
 	net_secret_init();
-	hash[0] = (__force u32)saddr;
-	hash[1] = (__force u32)daddr;
-	hash[2] = ((__force u16)sport << 16) + (__force u16)dport;
-	hash[3] = net_secret[15];
-
-	md5_transform(hash, net_secret);
-
-	*tsoff = sysctl_tcp_timestamps == 1 ? hash[1] : 0;
-	return seq_scale(hash[0]);
+	hash = siphash24((const u8 *)&combined, sizeof(combined), net_secret);
+	*tsoff = sysctl_tcp_timestamps == 1 ? (hash >> 32) : 0;
+	return seq_scale(hash);
 }
 
 u32 secure_ipv4_port_ephemeral(__be32 saddr, __be32 daddr, __be16 dport)
 {
-	u32 hash[MD5_DIGEST_WORDS];
-
+	const struct {
+		__be32 saddr;
+		__be32 daddr;
+		__be16 dport;
+	} __packed combined = {
+		.saddr = saddr,
+		.daddr = daddr,
+		.dport = dport
+	};
 	net_secret_init();
-	hash[0] = (__force u32)saddr;
-	hash[1] = (__force u32)daddr;
-	hash[2] = (__force u32)dport ^ net_secret[14];
-	hash[3] = net_secret[15];
-
-	md5_transform(hash, net_secret);
-
-	return hash[0];
+	return seq_scale(siphash24((const u8 *)&combined, sizeof(combined), net_secret));
 }
 EXPORT_SYMBOL_GPL(secure_ipv4_port_ephemeral);
 #endif
@@ -126,21 +127,22 @@  EXPORT_SYMBOL_GPL(secure_ipv4_port_ephemeral);
 u64 secure_dccp_sequence_number(__be32 saddr, __be32 daddr,
 				__be16 sport, __be16 dport)
 {
-	u32 hash[MD5_DIGEST_WORDS];
+	const struct {
+		__be32 saddr;
+		__be32 daddr;
+		__be16 sport;
+		__be16 dport;
+	} __packed combined = {
+		.saddr = saddr,
+		.daddr = daddr,
+		.sport = sport,
+		.dport = dport
+	};
 	u64 seq;
-
 	net_secret_init();
-	hash[0] = (__force u32)saddr;
-	hash[1] = (__force u32)daddr;
-	hash[2] = ((__force u16)sport << 16) + (__force u16)dport;
-	hash[3] = net_secret[15];
-
-	md5_transform(hash, net_secret);
-
-	seq = hash[0] | (((u64)hash[1]) << 32);
+	seq = siphash24((const u8 *)&combined, sizeof(combined), net_secret);
 	seq += ktime_get_real_ns();
 	seq &= (1ull << 48) - 1;
-
 	return seq;
 }
 EXPORT_SYMBOL(secure_dccp_sequence_number);
@@ -149,26 +151,22 @@  EXPORT_SYMBOL(secure_dccp_sequence_number);
 u64 secure_dccpv6_sequence_number(__be32 *saddr, __be32 *daddr,
 				  __be16 sport, __be16 dport)
 {
-	u32 secret[MD5_MESSAGE_BYTES / 4];
-	u32 hash[MD5_DIGEST_WORDS];
+	const struct {
+		struct in6_addr saddr;
+		struct in6_addr daddr;
+		__be16 sport;
+		__be16 dport;
+	} __packed combined = {
+		.saddr = *(struct in6_addr *)saddr,
+		.daddr = *(struct in6_addr *)daddr,
+		.sport = sport,
+		.dport = dport
+	};
 	u64 seq;
-	u32 i;
-
 	net_secret_init();
-	memcpy(hash, saddr, 16);
-	for (i = 0; i < 4; i++)
-		secret[i] = net_secret[i] + (__force u32)daddr[i];
-	secret[4] = net_secret[4] +
-		(((__force u16)sport << 16) + (__force u16)dport);
-	for (i = 5; i < MD5_MESSAGE_BYTES / 4; i++)
-		secret[i] = net_secret[i];
-
-	md5_transform(hash, secret);
-
-	seq = hash[0] | (((u64)hash[1]) << 32);
+	seq = siphash24((const u8 *)&combined, sizeof(combined), net_secret);
 	seq += ktime_get_real_ns();
 	seq &= (1ull << 48) - 1;
-
 	return seq;
 }
 EXPORT_SYMBOL(secure_dccpv6_sequence_number);