[v7,3/6] random: use SipHash in place of MD5

Message ID: 20161221230216.25341-4-Jason@zx2c4.com
State: Changes Requested, archived
Delegated to: David Miller

Commit Message

Jason A. Donenfeld Dec. 21, 2016, 11:02 p.m. UTC
This duplicates the current algorithm for get_random_int/long, but uses
siphash instead. This comes with several benefits. It's certainly
faster and more cryptographically secure than MD5. This patch also
separates hashed fields into three values instead of one, in order to
increase diffusion.

The previous MD5 algorithm used a per-cpu MD5 state, which caused
successive calls to the function to chain upon each other. While it's
not entirely clear that this kind of chaining is absolutely necessary
when using a secure PRF like siphash, it can't hurt, and the timing of
the call chain does add a degree of natural entropy. So, in keeping with
this design, the massive per-cpu 64-byte MD5 state is replaced by a
per-cpu previously returned value used for chaining.

The speed benefits are substantial:

                | siphash | md5    | speedup |
----------------|---------|--------|---------|
get_random_long | 137130  | 415983 | 3.03x   |
get_random_int  | 86384   | 343323 | 3.97x   |

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Cc: Jean-Philippe Aumasson <jeanphilippe.aumasson@gmail.com>
Cc: Ted Tso <tytso@mit.edu>
---
 drivers/char/random.c  | 84 +++++++++++++++++++++++++++++---------------------
 include/linux/random.h |  1 -
 init/main.c            |  1 -
 3 files changed, 49 insertions(+), 37 deletions(-)

Comments

Jason A. Donenfeld Dec. 21, 2016, 11:13 p.m. UTC | #1
Hi Ted,

On Thu, Dec 22, 2016 at 12:02 AM, Jason A. Donenfeld <Jason@zx2c4.com> wrote:
> This duplicates the current algorithm for get_random_int/long

I should have mentioned this directly in the commit message, which I
forgot to update: this v7 adds the time-based key rotation, which,
while not strictly necessary for ensuring the security of the RNG,
might help alleviate some concerns, as we talked about. Performance is
quite good on both 32-bit and 64-bit -- better than MD5 in both cases.
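
Concretely, the rotation is the check added to get_random_int_secret()
in the hunk further down in this patch; it is essentially:

static struct random_int_secret *get_random_int_secret(void)
{
	struct random_int_secret *secret = &get_cpu_var(random_int_secret);

	/* Rekey lazily: the first caller on this cpu, or any caller more
	 * than SECRET_ROTATION_TIME (five minutes) after the last rekey,
	 * pulls a fresh siphash key from get_random_bytes(). */
	if (unlikely(!secret->initialized ||
		     !time_is_after_jiffies(secret->birthdate +
					    SECRET_ROTATION_TIME))) {
		secret->initialized = true;
		secret->birthdate = jiffies;
		get_random_bytes(secret->secret, sizeof(secret->secret));
	}
	return secret;
}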

If you like this, terrific. If not, I'm happy to take this in whatever
direction you prefer, and implement whatever construction you think
best. There's been a lot of noise on this list about it; we can
continue to discuss more, or you can just tell me whatever you want to
do, and I'll implement it and that'll be the end of it. As you said,
we can always get something decent now and improve it later.

Alternatively, if you've decided in the end you prefer your batched
entropy approach using chacha, I'm happy to implement a polished
version of that here in this patch series (so that we can keep the `rm
lib/md5.c` commit.)

Just let me know how you'd like to proceed.

Thanks,
Jason
Andy Lutomirski Dec. 21, 2016, 11:42 p.m. UTC | #2
On Wed, Dec 21, 2016 at 3:02 PM, Jason A. Donenfeld <Jason@zx2c4.com> wrote:
>  unsigned int get_random_int(void)
>  {
> -       __u32 *hash;
> -       unsigned int ret;
> -
> -       if (arch_get_random_int(&ret))
> -               return ret;
> -
> -       hash = get_cpu_var(get_random_int_hash);
> -
> -       hash[0] += current->pid + jiffies + random_get_entropy();
> -       md5_transform(hash, random_int_secret);
> -       ret = hash[0];
> -       put_cpu_var(get_random_int_hash);
> -
> -       return ret;
> +       unsigned int arch_result;
> +       u64 result;
> +       struct random_int_secret *secret;
> +
> +       if (arch_get_random_int(&arch_result))
> +               return arch_result;
> +
> +       secret = get_random_int_secret();
> +       result = siphash_3u64(secret->chaining, jiffies,
> +                             (u64)random_get_entropy() + current->pid,
> +                             secret->secret);
> +       secret->chaining += result;
> +       put_cpu_var(secret);
> +       return result;
>  }
>  EXPORT_SYMBOL(get_random_int);

Hmm.  I haven't tried to prove anything for real.  But here goes (in
the random oracle model):

Suppose I'm an attacker and I don't know the secret or the chaining
value.  Then, regardless of what the entropy is, I can't predict the
numbers.

Now suppose I do know the secret and the chaining value due to some
leak.  If I want to deduce prior outputs, I think I'm stuck: I'd need
to find a value "result" such that prev_chaining + result = chaining
and result = H(prev_chaining, ..., secret);.  I don't think this can
be done efficiently in the random oracle model regardless of what the
"..." is.

But, if I know the secret and chaining value, I can predict the next
output assuming I can guess the entropy.  What's worse is that, even
if I can't guess the entropy, if I *observe* the next output then I
can calculate the next chaining value.
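
Schematically (attacker's-eye pseudocode, not anything in the patch):

	/* known: secret, chaining; guessed: jiffies, cycles, pid */
	guess = siphash_3u64(chaining, jiffies_guess,
			     cycles_guess + pid_guess, secret);
	/* a correct guess predicts the next output; and merely observing
	 * the next output "out" reveals the next chaining value: */
	chaining += out;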

So this is probably good enough, and making it better is hard.  Changing it to:

u64 entropy = (u64)random_get_entropy() + current->pid;
result = siphash(..., entropy, ...);
secret->chaining += result + entropy;

would reduce this problem by forcing an attacker to brute-force the
entropy on each iteration, which is probably an improvement.
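
Spelled out against this patch's structure, that tweak would look
roughly like the following (an untested sketch reusing the patch's
struct random_int_secret, not a concrete proposal):

unsigned int get_random_int(void)
{
	unsigned int arch_result;
	u64 entropy, result;
	struct random_int_secret *secret;

	if (arch_get_random_int(&arch_result))
		return arch_result;

	secret = get_random_int_secret();
	entropy = (u64)random_get_entropy() + current->pid;
	result = siphash_3u64(secret->chaining, jiffies, entropy,
			      secret->secret);
	/* Fold the per-call entropy into the feedback too, so tracking
	 * the chaining value forward requires brute-forcing the entropy
	 * even when an output is observed. */
	secret->chaining += result + entropy;
	put_cpu_var(secret);
	return result;
}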

To fully fix it, something like "catastrophic reseeding" would be
needed, but that's hard to get right.

(An aside: on x86 at least, using two percpu variables is faster,
because direct percpu access is essentially free, whereas getting
the address of a percpu variable is not.)
Hannes Frederic Sowa Dec. 22, 2016, 2:07 a.m. UTC | #3
On 22.12.2016 00:42, Andy Lutomirski wrote:
> On Wed, Dec 21, 2016 at 3:02 PM, Jason A. Donenfeld <Jason@zx2c4.com> wrote:
>>  unsigned int get_random_int(void)
>>  {
>> -       __u32 *hash;
>> -       unsigned int ret;
>> -
>> -       if (arch_get_random_int(&ret))
>> -               return ret;
>> -
>> -       hash = get_cpu_var(get_random_int_hash);
>> -
>> -       hash[0] += current->pid + jiffies + random_get_entropy();
>> -       md5_transform(hash, random_int_secret);
>> -       ret = hash[0];
>> -       put_cpu_var(get_random_int_hash);
>> -
>> -       return ret;
>> +       unsigned int arch_result;
>> +       u64 result;
>> +       struct random_int_secret *secret;
>> +
>> +       if (arch_get_random_int(&arch_result))
>> +               return arch_result;
>> +
>> +       secret = get_random_int_secret();
>> +       result = siphash_3u64(secret->chaining, jiffies,
>> +                             (u64)random_get_entropy() + current->pid,
>> +                             secret->secret);
>> +       secret->chaining += result;
>> +       put_cpu_var(secret);
>> +       return result;
>>  }
>>  EXPORT_SYMBOL(get_random_int);
> 
> Hmm.  I haven't tried to prove anything for real.  But here goes (in
> the random oracle model):
> 
> Suppose I'm an attacker and I don't know the secret or the chaining
> value.  Then, regardless of what the entropy is, I can't predict the
> numbers.
> 
> Now suppose I do know the secret and the chaining value due to some
> leak.  If I want to deduce prior outputs, I think I'm stuck: I'd need
> to find a value "result" such that prev_chaining + result = chaining
> and result = H(prev_chaining, ..., secret);.  I don't think this can
> be done efficiently in the random oracle model regardless of what the
> "..." is.
> 
> But, if I know the secret and chaining value, I can predict the next
> output assuming I can guess the entropy.  What's worse is that, even
> if I can't guess the entropy, if I *observe* the next output then I
> can calculate the next chaining value.
> 
> So this is probably good enough, and making it better is hard.  Changing it to:
> 
> u64 entropy = (u64)random_get_entropy() + current->pid;
> result = siphash(..., entropy, ...);
> secret->chaining += result + entropy;
> 
> would reduce this problem by forcing an attacker to brute-force the
> entropy on each iteration, which is probably an improvement.
> 
> To fully fix it, something like "catastrophic reseeding" would be
> needed, but that's hard to get right.

I wonder whether Ted's proposal was analyzed further in terms of
performance, in case get_random_int should provide CPRNG-like
properties?

For reference: https://lkml.org/lkml/2016/12/14/351

The proposal made sense to me and would completely solve the
above-mentioned problem, at the cost of repeatedly reseeding from the
crng.

Bye,
Hannes
Andy Lutomirski Dec. 22, 2016, 2:09 a.m. UTC | #4
On Wed, Dec 21, 2016 at 6:07 PM, Hannes Frederic Sowa
<hannes@stressinduktion.org> wrote:
> On 22.12.2016 00:42, Andy Lutomirski wrote:
>> On Wed, Dec 21, 2016 at 3:02 PM, Jason A. Donenfeld <Jason@zx2c4.com> wrote:
>>>  unsigned int get_random_int(void)
>>>  {
>>> -       __u32 *hash;
>>> -       unsigned int ret;
>>> -
>>> -       if (arch_get_random_int(&ret))
>>> -               return ret;
>>> -
>>> -       hash = get_cpu_var(get_random_int_hash);
>>> -
>>> -       hash[0] += current->pid + jiffies + random_get_entropy();
>>> -       md5_transform(hash, random_int_secret);
>>> -       ret = hash[0];
>>> -       put_cpu_var(get_random_int_hash);
>>> -
>>> -       return ret;
>>> +       unsigned int arch_result;
>>> +       u64 result;
>>> +       struct random_int_secret *secret;
>>> +
>>> +       if (arch_get_random_int(&arch_result))
>>> +               return arch_result;
>>> +
>>> +       secret = get_random_int_secret();
>>> +       result = siphash_3u64(secret->chaining, jiffies,
>>> +                             (u64)random_get_entropy() + current->pid,
>>> +                             secret->secret);
>>> +       secret->chaining += result;
>>> +       put_cpu_var(secret);
>>> +       return result;
>>>  }
>>>  EXPORT_SYMBOL(get_random_int);
>>
>> Hmm.  I haven't tried to prove anything for real.  But here goes (in
>> the random oracle model):
>>
>> Suppose I'm an attacker and I don't know the secret or the chaining
>> value.  Then, regardless of what the entropy is, I can't predict the
>> numbers.
>>
>> Now suppose I do know the secret and the chaining value due to some
>> leak.  If I want to deduce prior outputs, I think I'm stuck: I'd need
>> to find a value "result" such that prev_chaining + result = chaining
>> and result = H(prev_chaining, ..., secret);.  I don't think this can
>> be done efficiently in the random oracle model regardless of what the
>> "..." is.
>>
>> But, if I know the secret and chaining value, I can predict the next
>> output assuming I can guess the entropy.  What's worse is that, even
>> if I can't guess the entropy, if I *observe* the next output then I
>> can calculate the next chaining value.
>>
>> So this is probably good enough, and making it better is hard.  Changing it to:
>>
>> u64 entropy = (u64)random_get_entropy() + current->pid;
>> result = siphash(..., entropy, ...);
>> secret->chaining += result + entropy;
>>
>> would reduce this problem by forcing an attacker to brute-force the
>> entropy on each iteration, which is probably an improvement.
>>
>> To fully fix it, something like "catastrophic reseeding" would be
>> needed, but that's hard to get right.
>
> I wonder whether Ted's proposal was analyzed further in terms of
> performance, in case get_random_int should provide CPRNG-like
> properties?
>
> For reference: https://lkml.org/lkml/2016/12/14/351
>
> The proposal made sense to me and would completely solve the
> above-mentioned problem, at the cost of repeatedly reseeding from the
> crng.
>

Unless I've misunderstood it, Ted's proposal causes get_random_int()
to return bytes straight from urandom (effectively), which should make
it very strong.  And if urandom is competitively fast now, I don't see
the problem.  ChaCha20 is designed for speed, after all.
Jason A. Donenfeld Dec. 22, 2016, 2:31 a.m. UTC | #5
Hi Andy,

On Thu, Dec 22, 2016 at 12:42 AM, Andy Lutomirski <luto@amacapital.net> wrote:
> So this is probably good enough, and making it better is hard.  Changing it to:
>
> u64 entropy = (u64)random_get_entropy() + current->pid;
> result = siphash(..., entropy, ...);
> secret->chaining += result + entropy;
>
> would reduce this problem by forcing an attacker to brute-force the
> entropy on each iteration, which is probably an improvement.

Ahh, so that's the reasoning behind a similar suggestion of yours in a
previous email. Makes sense to me. I'll include this in the next merge
if we don't come up with a different idea before then. Your reasoning
seems good for it.

Part of what makes this process a bit goofy is that it's not
altogether clear what the design goals are. Right now we're going for
"not worse than before", which we've nearly achieved. How good an
RNG do we want? I'm willing to examine and analyze the security and
performance of all constructions we can come up with. One thing I
don't want to do, however, is start tweaking the primitives themselves
in ways not endorsed by the designers. So, I believe that precludes
things like carrying over SipHash's internal state (like what was done
with MD5), because there hasn't been a formal security analysis of
this like there has with other uses of SipHash. I also don't want to
change any internals of how SipHash actually works. I mention that
because of some of the suggestions on other threads, which make me
rather uneasy.

So with that said, while writing this reply to you, I was
simultaneously reading some other crypto code and was reminded that
there's a variant of SipHash which outputs an additional 64-bits; it's
part of the siphash reference code, which they call the "128-bit
mode". It has the benefit that we can return 64-bits to the caller and
save 64-bits for the chaining key. That way there's no correlation
between the returned secret and the chaining key, which I think would
completely alleviate all of your concerns, and simplify the analysis a
bit.

Here's what it looks like:
https://git.zx2c4.com/linux-dev/commit/?h=siphash&id=46fbe5b408e66b2d16b4447860f8083480e1c08d

The downside is that it takes 4 extra Sip rounds. Even so, the
performance is still better than MD5, and likely still better
than the other batched-entropy solution. We could optimize this, I
suppose, by giving it only two parameters -- chaining,
jiffies+entropy+pid -- instead of the current three -- chaining,
jiffies, entropy+pid -- which would then shave off 2 Sip rounds. But I
liked the idea of having a bit more spread in the entropy input field.
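
In code, the extra-output variant would look roughly like this, where
siphash_extra_output() is only a stand-in name for the 128-bit-mode
helper in the branch linked above -- it exists neither in this patch
nor in mainline:

unsigned long get_random_long(void)
{
	unsigned long arch_result;
	u64 result, next_chaining;
	struct random_int_secret *secret;

	if (arch_get_random_long(&arch_result))
		return arch_result;

	secret = get_random_int_secret();
	/* 64 bits of the 128-bit output go to the caller; the other 64
	 * bits become the next chaining value, so the returned word and
	 * the chaining state are not correlated. */
	result = siphash_extra_output(secret->chaining, jiffies,
				      (u64)random_get_entropy() + current->pid,
				      secret->secret, &next_chaining);
	secret->chaining = next_chaining;
	put_cpu_var(secret);
	return result;
}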

Anyway, with this in mind, we now have three possibilities:

1. result = siphash(chaining, entropy, key); chaining += result + entropy
2. result = siphash_extra_output(chaining, entropy, key, &chaining);
3. Ted's batched entropy idea using chacha20

The more I think about this, the more I suspect that we should just
use chacha20. It will still be faster than MD5. I don't like the
non-determinism of it (some processes will start slower than others,
if the batched entropy has run out and ASLR demands more), but I guess
I can live with that. But, most importantly, it greatly simplifies
both the security analysis and what we can promise to callers about
the function. Right now in the comment documentation, we're coy with
callers about the security of the RNG. If we moved to a known
construction like chacha20/get_random_bytes_batched, then we could
just be straight up with a promise that the numbers it returns are
high quality.

Thoughts on 2 and 3, and on 1 vs 2 vs 3?

Jason
Jason A. Donenfeld Dec. 22, 2016, 2:49 a.m. UTC | #6
Hi Andy & Hannes,

On Thu, Dec 22, 2016 at 3:07 AM, Hannes Frederic Sowa
<hannes@stressinduktion.org> wrote:
> I wonder whether Ted's proposal was analyzed further in terms of
> performance, in case get_random_int should provide CPRNG-like
> properties?
>
> For reference: https://lkml.org/lkml/2016/12/14/351
>
> The proposal made sense to me and would completely solve the
> above-mentioned problem, at the cost of repeatedly reseeding from the
> crng.

On Thu, Dec 22, 2016 at 3:09 AM, Andy Lutomirski <luto@amacapital.net> wrote:
> Unless I've misunderstood it, Ted's proposal causes get_random_int()
> to return bytes straight from urandom (effectively), which should make
> it very strong.  And if urandom is competitively fast now, I don't see
> the problem.  ChaCha20 is designed for speed, after all.

Funny -- while you guys were sending this back & forth, I was writing
my reply to Andy which essentially arrives at the same conclusion.
Given that we're all arriving at the same thing, and that Ted shot in
this direction long before we all did, I'm leaning toward abandoning
SipHash for the de-MD5-ification of get_random_int/long, and working
on polishing Ted's idea into something shiny for this patchset.

I did have two objections to this. The first was that my SipHash
construction is faster. But in any case, they're both faster than the
current MD5, so it's just extra rice. The second, and the more
important one, was that batching entropy up like this means that 32
calls will be really fast, and then the 33rd will be slow, since
get_random_bytes must be called to generate a whole new ChaCha block
and refill the batch. Since get_random_long is called for every
process startup, I didn't really like there being inconsistent
performance on process startup. And I'm pretty sure that generating
one whole ChaCha block is slower than computing MD5, even though its
output lasts 32 times as long, though I need to measure this. But
maybe that's dumb in
the end? Are these concerns that should point us toward the
determinism (and speed) of SipHash? Are these concerns that don't
matter and so we should roll with the simplicity of reusing ChaCha?
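
For comparison, the batched approach under discussion amounts to
something like the following sketch (the names are invented here, and
Ted's actual proposal at the link above batches differently -- the
32-call figure implies a batch larger than the single ChaCha block
shown):

struct batched_entropy {
	u64 entropy[CHACHA20_BLOCK_SIZE / sizeof(u64)];
	unsigned int position;	/* longs still unread in the batch */
};
static DEFINE_PER_CPU(struct batched_entropy, batched_entropy);

static u64 get_random_u64_batched(void)
{
	u64 ret;
	struct batched_entropy *batch = &get_cpu_var(batched_entropy);

	/* Most calls just hand out the next 8 bytes of the batch; only
	 * when it runs dry (including the very first call on this cpu)
	 * do we pay for a get_random_bytes() refill. */
	if (batch->position == 0) {
		get_random_bytes(batch->entropy, sizeof(batch->entropy));
		batch->position = ARRAY_SIZE(batch->entropy);
	}
	ret = batch->entropy[--batch->position];
	put_cpu_var(batched_entropy);
	return ret;
}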

Jason
Jason A. Donenfeld Dec. 22, 2016, 3:12 a.m. UTC | #7
On Thu, Dec 22, 2016 at 3:49 AM, Jason A. Donenfeld <Jason@zx2c4.com> wrote:
> I did have two objections to this. The first was that my SipHash
> construction is faster. But in any case, they're both faster than the
> current MD5, so it's just extra rice. The second, and the more
> important one, was that batching entropy up like this means that 32
> calls will be really fast, and then the 33rd will be slow, since
> get_random_bytes must be called to generate a whole new ChaCha block
> and refill the batch. Since get_random_long is called for every
> process startup, I didn't really like there being inconsistent
> performance on process startup. And I'm pretty sure that generating
> one whole ChaCha block is slower than computing MD5, even though its
> output lasts 32 times as long, though I need to measure this. But
> maybe that's dumb in
> the end? Are these concerns that should point us toward the
> determinism (and speed) of SipHash? Are these concerns that don't
> matter and so we should roll with the simplicity of reusing ChaCha?

I ran some measurements in order to quantify what I'm talking about.
Repeatedly running md5_transform is about 2.3 times faster than
repeatedly running extract_crng. What does this mean?

One call to extract_crng gives us 32 times as many longs as one call
to md5_transform. This means that spread over 32 process creations,
chacha will be 13.9 times faster. However, every 32nd process will
take 2.3 times as long to generate its ASLR value as it would with the
old md5_transform code.
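
(For reference, a ratio like that can be produced by a crude in-kernel
loop along these lines -- purely illustrative, not the harness actually
used; it has to live in drivers/char/random.c, where the static
extract_crng() is visible, and it makes no attempt to control for
preemption or interrupts:)

static int __init bench_md5_vs_chacha(void)
{
	__u32 hash[MD5_DIGEST_WORDS] = { 0 };
	__u32 msg[MD5_MESSAGE_BYTES / 4] = { 0 };
	__u8 block[CHACHA20_BLOCK_SIZE];
	cycles_t start, md5_cycles, chacha_cycles;
	int i;

	start = get_cycles();
	for (i = 0; i < 100000; i++)
		md5_transform(hash, msg);
	md5_cycles = get_cycles() - start;

	start = get_cycles();
	for (i = 0; i < 100000; i++)
		extract_crng(block);
	chacha_cycles = get_cycles() - start;

	/* Per call, chacha costs more; but each extract_crng() fills a
	 * whole 64-byte block, which is where the amortized 13.9x figure
	 * above comes from. */
	pr_info("md5: %llu cycles, chacha: %llu cycles\n",
		(unsigned long long)md5_cycles,
		(unsigned long long)chacha_cycles);
	return 0;
}
late_initcall(bench_md5_vs_chacha);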

Personally, I don't think that 2.3x every 32nd call is a big deal. And
I really like how much this simplifies the analysis.
But if it's a big deal to you, then we can continue to discuss my
SipHash construction, which gives faster and more consistent
performance, at the cost of a more complicated and probably less
impressive security analysis.

Jason

Patch

diff --git a/drivers/char/random.c b/drivers/char/random.c
index d6876d506220..ea9858d9d8b4 100644
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -262,6 +262,7 @@ 
 #include <linux/syscalls.h>
 #include <linux/completion.h>
 #include <linux/uuid.h>
+#include <linux/siphash.h>
 #include <crypto/chacha20.h>
 
 #include <asm/processor.h>
@@ -2042,17 +2043,31 @@  struct ctl_table random_table[] = {
 };
 #endif 	/* CONFIG_SYSCTL */
 
-static u32 random_int_secret[MD5_MESSAGE_BYTES / 4] ____cacheline_aligned;
 
-int random_int_secret_init(void)
+struct random_int_secret {
+	siphash_key_t secret;
+	u64 chaining;
+	unsigned long birthdate;
+	bool initialized;
+};
+static DEFINE_PER_CPU(struct random_int_secret, random_int_secret);
+
+enum {
+	SECRET_ROTATION_TIME = HZ * 60 * 5
+};
+
+static struct random_int_secret *get_random_int_secret(void)
 {
-	get_random_bytes(random_int_secret, sizeof(random_int_secret));
-	return 0;
+	struct random_int_secret *secret = &get_cpu_var(random_int_secret);
+	if (unlikely(!secret->initialized ||
+		     !time_is_after_jiffies(secret->birthdate + SECRET_ROTATION_TIME))) {
+		secret->initialized = true;
+		secret->birthdate = jiffies;
+		get_random_bytes(secret->secret, sizeof(secret->secret));
+	}
+	return secret;
 }
 
-static DEFINE_PER_CPU(__u32 [MD5_DIGEST_WORDS], get_random_int_hash)
-		__aligned(sizeof(unsigned long));
-
 /*
  * Get a random word for internal kernel use only. Similar to urandom but
  * with the goal of minimal entropy pool depletion. As a result, the random
@@ -2061,20 +2076,20 @@  static DEFINE_PER_CPU(__u32 [MD5_DIGEST_WORDS], get_random_int_hash)
  */
 unsigned int get_random_int(void)
 {
-	__u32 *hash;
-	unsigned int ret;
-
-	if (arch_get_random_int(&ret))
-		return ret;
-
-	hash = get_cpu_var(get_random_int_hash);
-
-	hash[0] += current->pid + jiffies + random_get_entropy();
-	md5_transform(hash, random_int_secret);
-	ret = hash[0];
-	put_cpu_var(get_random_int_hash);
-
-	return ret;
+	unsigned int arch_result;
+	u64 result;
+	struct random_int_secret *secret;
+
+	if (arch_get_random_int(&arch_result))
+		return arch_result;
+
+	secret = get_random_int_secret();
+	result = siphash_3u64(secret->chaining, jiffies,
+			      (u64)random_get_entropy() + current->pid,
+			      secret->secret);
+	secret->chaining += result;
+	put_cpu_var(secret);
+	return result;
 }
 EXPORT_SYMBOL(get_random_int);
 
@@ -2083,20 +2098,19 @@  EXPORT_SYMBOL(get_random_int);
  */
 unsigned long get_random_long(void)
 {
-	__u32 *hash;
-	unsigned long ret;
-
-	if (arch_get_random_long(&ret))
-		return ret;
-
-	hash = get_cpu_var(get_random_int_hash);
-
-	hash[0] += current->pid + jiffies + random_get_entropy();
-	md5_transform(hash, random_int_secret);
-	ret = *(unsigned long *)hash;
-	put_cpu_var(get_random_int_hash);
-
-	return ret;
+	unsigned long arch_result;
+	u64 result;
+	struct random_int_secret *secret;
+
+	if (arch_get_random_long(&arch_result))
+		return arch_result;
+
+	secret = get_random_int_secret();
+	result = siphash_3u64(secret->chaining, jiffies, random_get_entropy() +
+			      current->pid, secret->secret);
+	secret->chaining += result;
+	put_cpu_var(secret);
+	return result;
 }
 EXPORT_SYMBOL(get_random_long);
 
diff --git a/include/linux/random.h b/include/linux/random.h
index 7bd2403e4fef..16ab429735a7 100644
--- a/include/linux/random.h
+++ b/include/linux/random.h
@@ -37,7 +37,6 @@  extern void get_random_bytes(void *buf, int nbytes);
 extern int add_random_ready_callback(struct random_ready_callback *rdy);
 extern void del_random_ready_callback(struct random_ready_callback *rdy);
 extern void get_random_bytes_arch(void *buf, int nbytes);
-extern int random_int_secret_init(void);
 
 #ifndef MODULE
 extern const struct file_operations random_fops, urandom_fops;
diff --git a/init/main.c b/init/main.c
index 23c275cca73a..a3af9296cafd 100644
--- a/init/main.c
+++ b/init/main.c
@@ -879,7 +879,6 @@  static void __init do_basic_setup(void)
 	do_ctors();
 	usermodehelper_enable();
 	do_initcalls();
-	random_int_secret_init();
 }
 
 static void __init do_pre_smp_initcalls(void)