Patchwork get_random_int() should use hash[1]

Submitter George Spelvin
Date Aug. 16, 2011, 9:07 a.m.
Message ID <20110816090723.18492.qmail@science.horizon.com>
Permalink /patch/110161/
State Changes Requested
Delegated to: David Miller

Comments

George Spelvin - Aug. 16, 2011, 9:07 a.m.
Re: commit e997d47bff5a467262ef224b4cf8cbba2d3eceea

As long as you're using MD5, you should know that each round only
modifies one word of the state.  The order is [0], [3], [2], [1],
repeating across all 64 rounds.  Thus, on output, word [1] is the "most hashed"
word.  If you really wanted word [0], you could just skip the last
3 rounds.

It's not really critical, but as long as you're performing the
rounds, you might as well use them...
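
To make the word ordering concrete, here is a small stand-alone C sketch. It is not kernel code, and the helper names are made up for illustration. It assumes the usual a, d, c, b update rotation of MD5 with a = hash[0], b = hash[1], c = hash[2], d = hash[3], and counts how many steps still run after each word was last written.

```c
#include <stdio.h>

#define MD5_STEPS 64

/* Index of the state word written by step `step` (0-based), assuming the
 * a, d, c, b rotation: [0], [3], [2], [1], [0], ... */
static int md5_word_written(int step)
{
	return (4 - (step % 4)) % 4;
}

/* Number of steps that still run after the last write to `word`. */
static int steps_after_last_write(int word)
{
	int last = -1;
	int i;

	for (i = 0; i < MD5_STEPS; i++)
		if (md5_word_written(i) == word)
			last = i;
	return MD5_STEPS - 1 - last;
}

/* Print one line per state word; meant to be called from main(). */
static void print_word_ages(void)
{
	int w;

	for (w = 0; w < 4; w++)
		printf("hash[%d]: %d step(s) after its last update\n",
		       w, steps_after_last_write(w));
}
```

Under that assumption, hash[1] is written by the final step, while hash[0] was last written three steps before the end, which is the "skip the last 3 rounds" remark.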


Me, I'd also put jiffies and get_cycles into different words
just on general principles, but that's up to you:

- 	hash[0] += current->pid + jiffies + get_cycles();
+ 	hash[1] += current->pid + jiffies;
+ 	hash[2] += get_cycles();
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Miller - Aug. 16, 2011, 9:09 a.m.
From: "George Spelvin" <linux@horizon.com>
Date: 16 Aug 2011 05:07:23 -0400

> Re: commit e997d47bff5a467262ef224b4cf8cbba2d3eceea
> 
> As long as you're using MD5, you should know that each round only
> modifies one word of the state.  The order is [0], [3], [2], [1],
> repeating across all 64 rounds.  Thus, on output, word [1] is the "most hashed"
> word.  If you really wanted word [0], you could just skip the last
> 3 rounds.
> 
> It's not really critical, but as long as you're performing the
> rounds, you might as well use them...

Please provide a proper signoff with your change and properly
"-p1" base your patch.

Thanks.
David Miller - Aug. 16, 2011, 9:10 a.m.
From: David Miller <davem@davemloft.net>
Date: Tue, 16 Aug 2011 02:09:35 -0700 (PDT)

> From: "George Spelvin" <linux@horizon.com>
> Date: 16 Aug 2011 05:07:23 -0400
> 
>> Re: commit e997d47bff5a467262ef224b4cf8cbba2d3eceea
>> 
>> As long as you're using MD5, you should know that each round only
>> modifies one word of the state.  The order is [0], [3], [2], [1],
>> repeating across all 64 rounds.  Thus, on output, word [1] is the "most hashed"
>> word.  If you really wanted word [0], you could just skip the last
>> 3 rounds.
>> 
>> It's not really critical, but as long as you're performing the
>> rounds, you might as well use them...
> 
> Please provide a proper signoff with your change and properly
> "-p1" base your patch.
> 
> Thanks.

Also this change is of interest to, and affects, folks outside of
networking.  So probably want to CC: linux-kernel when you respin
this.
George Spelvin - Aug. 16, 2011, 10:30 a.m.
From: David Miller <davem@davemloft.net>
Date: Tue, 16 Aug 2011 02:09:35 -0700 (PDT)

> From: "George Spelvin" <linux@horizon.com>
> Date: 16 Aug 2011 05:07:23 -0400
> 
>> Re: commit e997d47bff5a467262ef224b4cf8cbba2d3eceea
>> 
>> As long as you're using MD5, you should know that each round only
>> modifies one word of the state.  The order is [0], [3], [2], [1],
>> repeating across all 64 rounds.  Thus, on output, word [1] is the "most hashed"
>> word.  If you really wanted word [0], you could just skip the last
>> 3 rounds.
>> 
>> It's not really critical, but as long as you're performing the
>> rounds, you might as well use them...
> 
> Please provide a proper signoff with your change and properly
> "-p1" base your patch.

Not a problem.  This came up in the middle of a rebase operation so I
didn't have a tree immediately at hand to work with.

I'll also get the various uses in net/core/secure_seq.c.

One thing about that commit I'm becoming more concerned by: I notice that
it eliminates the periodic reseeding of the secret.

While the reduction in the number of random bits was a tradeoff (and it
can be increased to 28 or so), it had two great advantages:
- The usefulness of an attack drops off sharply after 5 minutes (you
  can still attack connections established during the attack window,
  but then you have to guess how much data has been sent across them).
- An initial shortage of seed entropy does not become a persistent
  problem.  Note that late_initcall() is still before any device
  activity, much less entropy pool re-seeding from init scripts.

Put together, an attacker has the system uptime to try to guess
the low-entropy boot seed.  That's not clearly a security improvement.

What I *really* wonder is whether such a change is really -stable
material.  Cc: to Matt Mackall for comment.

It seems at least worth figuring out a way to defer seeding until
after /dev/random reseeding.  (E.g. until first non-loopback
connection is made.)
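
The idea of deferring the seeding can be sketched as a secret that is filled on first use rather than at boot. This is a user-space illustration only: `struct lazy_secret`, `lazy_secret_get()` and the `demo_fill()` stand-in are hypothetical names, and the entropy callback plays the role that something like get_random_bytes() would play in the kernel.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SECRET_WORDS 16

/* Hypothetical entropy callback supplied by the caller. */
typedef void (*entropy_fn)(void *buf, size_t len);

struct lazy_secret {
	uint32_t words[SECRET_WORDS];
	bool seeded;
};

/* Return the secret, filling it on first use instead of at boot.
 * Not thread-safe: a real version needs a lock or a once primitive. */
static const uint32_t *lazy_secret_get(struct lazy_secret *s, entropy_fn fill)
{
	if (!s->seeded) {
		fill(s->words, sizeof(s->words));
		s->seeded = true;
	}
	return s->words;
}

/* Stand-in entropy source for demonstration only: it counts calls and
 * writes a fixed pattern. It is NOT random. */
static int demo_fill_calls;

static void demo_fill(void *buf, size_t len)
{
	memset(buf, 0xA5, len);
	demo_fill_calls++;
}
```

The first caller (for example, the first non-loopback connection) pays for the seeding, and later callers reuse the same secret. Whether enough entropy is available by then is exactly the open question raised above.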

(There are also both better and faster algorithms than MD5 for
the job, but that's a separate issue.)


Just for example, as long as you're actually willing to spend more CPU
time, you could do *both*.  Compute both a fixed-secret 32-bit value and a
changing-secret 24-bit value and add them together.  Best of both worlds.
George Spelvin - Aug. 20, 2011, 11:39 p.m.
(Apologies if I'm overbroad on the Cc: list.)

I've been concerned by the recent change to initial sequence number
generation, from a time-varying 24-bit hash of the endpoint addresses
to a fixed 32-bit hash.

First of all, my apologies that I didn't see this when it was posted
for comment August 7; I only noticed when I tried to merge some local
experiments with -stable and found a conflict in drivers/char/random.c.

My concern primarily is that the local secret used to compute the hashes
is generated very early in the boot sequence, before any significant
amount of entropy is accumulated.  And since it's constant for the uptime
of the machine, an attacker has a considerable length of time to find
and exploit the secret value.

While the increase to 32 bits is definitely desirable, and defends
against a much less sophisticated attack, I'm concerned that this is a
case of robbing Peter to pay Paul.

Trying to improve this, I'm working in a few directions:
1) Postpone the seeding as late in the boot process as possible.
   It's quite low-overhead to generate it only when the first TCP
   connection is made, which hopefully is preceded by running
   init and at least a little bit of device driver activity.

2) Do *both*: Use a fixed 32-bit offset *plus* a time-varying one.
   They can be added together and provide the security advantages of
   both.  The only cost is having to compute two hashes per SYN.

   The main problem here is coming up with a hash function fast enough
   that computing both hashes is no slower than one MD5 invocation.

3) Extend the 24-bit time-varying hash to a 28-bit one.
   This can cause the sequence numbers to wrap in 7/8 of the time
   they would with a fixed offset, but that doesn't seem too bad.
   (That's worst case; it's a triangular distribution centered
   on 15/16.)
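
Options 2 and 3 together can be sketched as follows. This is only an illustration under stated assumptions: `toy_keyed_hash()` is a placeholder mixer with no cryptographic strength, standing in for whatever keyed hash would really be used, and `combined_isn_offset()`, `fixed_key`, `varying_key` and `epoch` are hypothetical names. Any clock-driven term added to the sequence number is left out.

```c
#include <stdint.h>

#define VARYING_BITS 28
#define VARYING_MASK ((UINT32_C(1) << VARYING_BITS) - 1)

/* Placeholder mixer, NOT cryptographic. It stands in for a real keyed
 * hash such as MD5 or a Skein variant. */
static uint32_t toy_keyed_hash(const uint32_t tuple[3], uint32_t key)
{
	uint32_t h = key;
	int i;

	for (i = 0; i < 3; i++) {
		h ^= tuple[i];
		h *= UINT32_C(0x9E3779B1);
		h ^= h >> 15;
	}
	return h;
}

/* fixed_key never changes; varying_key is replaced every rekey period,
 * and epoch counts how many periods have elapsed. */
static uint32_t combined_isn_offset(const uint32_t tuple[3],
				    uint32_t fixed_key,
				    uint32_t varying_key,
				    uint32_t epoch)
{
	uint32_t fixed = toy_keyed_hash(tuple, fixed_key);
	uint32_t varying = toy_keyed_hash(tuple, varying_key) & VARYING_MASK;

	/* The epoch sits in the bits above the 28-bit hash; everything is
	 * then added modulo 2^32. */
	return fixed + (epoch << VARYING_BITS) + varying;
}
```

The cost noted in option 2 is visible here: every offset needs two hash invocations, one per key.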

It's relatively easy to hash quickly with 15 64-bit registers, but doing
it with 7 32-bit registers is decidedly trickier.

I'm currently playing with a 36-round 6x32-bit variant of the SHA-3
candidate Skein.  I haven't run the genetic algorithm to select optimal
rotation constants, but they shouldn't affect the timing.
(I'm also going to ask the Skein team to look over my work.)

So far, it is notably faster than MD5 (89 ns/hash vs. 148 on a 2.5 GHz
Phenom), as well as being much smaller (383 bytes as opposed to 1951 for
the core transform).  One limitation is that it only hashes 6 32-bit
words per transform.  Thus, IPv6 would need to use two iterations,
or go back to MD5.

As mentioned, we can use a different algorithm for 64-bit processors.
Or even 32-bit ones with more registers.  So the speed problem only
exists for IPv6 on 32-bit x86.

(For example, on a 64-bit processor, two parallel MD5 transforms
can be computed in barely more time than one.)

A few questions, all related to performance requirements:
* Should I worry about 32-bit x86 performance at all, since it's
  pretty unlikely that a 32-bit machine will be running traffic levels
  (1000+ connections/sec) where it matters?
* Should I worry about 32-bit IPv6 performance, since that's even more
  unlikely to be running heavy loads on 32-bit hardware?
* If yes, is this fast enough to be acceptable, or do I need to work
  harder to find more speed?

Willy, apparently you did some benchmarking of various hash functions.
Is that data available somewhere?  Even if not, just a brief description
of the methodology and assumptions would help to make sure I'm measuring
in a reasonable way.
David Miller - Aug. 20, 2011, 11:44 p.m.
From: "George Spelvin" <linux@horizon.com>
Date: 20 Aug 2011 19:39:51 -0400

> While the increase to 32 bits is definitely desirable, and defends
> against a much less sophisticated attack, I'm concerned that this is a
> case of robbing Peter to pay Paul.

I disagree, attacking this random number selection is much more theoretical
than the brute force attacks possible on 24-bits of entropy.

Show me a usable attack on a real system, then we can talk.

By comparison, real attacks against the 24-bit value have been
demonstrated.

> 2) Do *both*: Use a fixed 32-bit offset *plus* a time-varying one.
>    They can be added together and provide the security advantages of
>    both.  The only cost is having to compute two hashes per SYN.
> 
>    The main problem here is coming up with a hash function fast enough
>    that computing both hashes is no slower than one MD5 invocation.

Doubling the hashing cost is a non-starter.   Going to MD5 itself was
a huge lose, and was right at the brink of acceptable performance loss.

This whole change was nearly nixed because of the cost introduced
merely by going to MD5.

> 3) Extend the 24-bit time-varying hash to a 28-bit one.
>    This can cause the sequence numbers to wrap in 7/8 of the time
>    they would with a fixed offset, but that doesn't seem too bad.
>    (That's worst case; it's a triangular distribution centered
>    on 15/16.)

I want to stay with 32 bits of entropy, thank you very much.
George Spelvin - Aug. 21, 2011, 12:49 a.m.
>> While the increase to 32 bits is definitely desirable, and defends
>> against a much less sophisticated attack, I'm concerned that this is a
>> case of robbing Peter to pay Paul.

> I disagree, attacking this random number selection is much more theoretical
> than the brute force attacks possible on 24-bits of entropy.

Can you explain more precisely what you disagree with?

What you state after the comma appears to be agreeing with what I
wrote (it seems like a restatement of my first two clauses), so I'm
unenlightened as to where the disagreement is.

I'm not saying you didn't address a real problem, just that fixing
one problem exposed another, and it would be nice to address *both*.

> Show me a usable attack on a real system, then we can talk.

If you like.  It's about a week of implementation work.  (And I
don't have 1 week/week of free time, so more than that elapsed.)

> By comparison, real attacks against the 24-bit value have been
> demonstrated.

Anywhere that I can see?

>> 2) Do *both*: Use a fixed 32-bit offset *plus* a time-varying one.
>>    They can be added together and provide the security advantages of
>>    both.  The only cost is having to compute two hashes per SYN.
>> 
>>    The main problem here is coming up with a hash function fast enough
>>    that computing both hashes is no slower than one MD5 invocation.

> Doubling the hashing cost is a non-starter.   Going to MD5 itself was
> a huge lose, and was right at the brink of acceptable performance loss.
> 
> This whole change was nearly nixed because of the cost introduced
> merely by going to MD5.

Okay, I'll make certain a proposed solution is strictly faster than MD5.
I was asking about performance goals, and you've given me an answer.
Thank you very much!

The patch comment was fairly offhand about the performance cost, and
prior discussion was apparently private, so it wasn't clear how much
pain people experienced.

My only other question is whether IPv6 on x86-32 specifically needs to be
faster than MD5.  Is that negotiable, or is that also a hard limit?
(This is challenging because it's trying to hash 288 bits of address material
in 224 bits of available registers.)

Eureka!  The possible source addresses are very limited.  It's possible to
pre-hash them, then you only have 160 bits of per-connection variability,
which can fit in a second hash block.

This requires finding somewhere in the network stack to store the
pre-hashed IPv6 addresses, as well as a fallback to use when spoofing
other source addresses, but that shouldn't be TOO difficult.

>> 3) Extend the 24-bit time-varying hash to a 28-bit one.
>>    This can cause the sequence numbers to wrap in 7/8 of the time
>>    they would with a fixed offset, but that doesn't seem too bad.
>>    (That's worst case; it's a triangular distribution centered
>>    on 15/16.)

> I want to stay with 32 bits of entropy, thank you very much.

My goal is to give you *both*.  32 bits fixed + 28 bits time-varying.
An attacker would have to cryptanalyze the 32 bits (which the 28 bits
makes harder) *and* brute-force the 28 bits.

(It's almost certainly simpler to brute-force 32 bits.)


Thank you for your response!
willy tarreau - Aug. 21, 2011, 1:28 a.m.
Hi George,

On Sat, Aug 20, 2011 at 07:39:51PM -0400, George Spelvin wrote:
(...)
> A few questions, all related to performance requirements:
> * Should I worry about 32-bit x86 performance at all, since it's
>   pretty unlikely that a 32-bit machine will be running traffic levels
>   (1000+ connections/sec) where it matters?

1000 connections per second is a moderately low load even for a
32-bit machine. I'm used to playing in the 10-100k/s range on 32-bit,
depending on the usage pattern; I even reached 300k/s on an anti-DDoS
machine. So yes, performance matters a lot, especially when we risk
slowing down one small operation that is done many times a second.

> * Should I worry about 32-bit IPv6 performance, since that's even more
>   unlikely to be running heavy loads on 32-bit hardware?

On x86 you're probably right, but there are other very fast platforms
such as ARM, which are used to build routers or appliances, and which
are 32-bit and there it may matter.

> * If yes, is this fast enough to be acceptable, or do I need to work
>   harder to find more speed?

I'd suggest that the most important thing is no performance regression.
If you can come up with something that wins back what we lost with MD5,
your work would probably gain interest.

> Willy, apparently you did some benchmarking of various hash functions.
> Is that data available somewhere?  Even if not, just a brief description
> of the methodology and assumptions would help to make sure I'm measuring
> in a reasonable way.

I'm copy-pasting here the memo I exchanged in private after my tests, there
is nothing secret in it, so better post the whole explanation :

-------------------------------------------------------------------------
I did an ugly patch which consists in replacing calls to md5_transform()
with sha_transform() in secure_ip_id(), secure_tcp_sequence_number(),
secure_ipv4_port_ephemeral() on top of David's patches. I kept the same
hashing method, without calling sha_init() and by filling the hash with
net_secret, eg :

@@ -104,28 +107,32 @@ __u32 secure_ipv6_id(const __be32 daddr[4])
 __u32 secure_tcp_sequence_number(__be32 saddr, __be32 daddr,
                                 __be16 sport, __be16 dport)
 {
-       u32 hash[MD5_DIGEST_WORDS];
+       u32 hash[SHA_DIGEST_WORDS];
+       u32 workspace[SHA_WORKSPACE_WORDS];

        hash[0] = (__force u32)saddr;
        hash[1] = (__force u32)daddr;
        hash[2] = ((__force u16)sport << 16) + (__force u16)dport;
-       hash[3] = net_secret[15];
+       hash[3] = net_secret[14];
+       hash[4] = net_secret[15];

-       md5_transform(hash, net_secret);
+       sha_transform(hash, (const char *)net_secret, workspace);

        return seq_scale(hash[0]);
 }


With this I could run tests on mainline (called "MD4" below), David's code
("MD5") and the transform above ("SHA1"). The tests involved connecting
from the test machine to an external HTTP server and retrieving an empty
object. This test was followed by two other series, one on a server which
immediately resets upon accept (to reproduce the SYN, SYN/ACK, ACK, RST
sequence I'm used to encountering when setting up anti-DDoS filters), and
a SYN, RST sequence caused by sending the traffic to a closed port, in
order to more accurately observe the differences.

I switched the test machine to an Atom N450 running in 64-bit mode in order
to benefit from the SHA1 optimizations.

Numbers are in connections per second.

kernel   http   RST server   closed port
-------+------+------------+------------
 MD4     9610      7840       16950
 MD5     9340      7560       16360
 SHA1    9250      7280       15400

In HTTP, performance drops by 2.8% when switching to MD5, and by 3.75%
when using SHA1 instead. With the reset server, MD5 takes a 3.6% hit
and SHA1 7.15%. On the closed port test, which sees only SYN and RST
packets, MD5 takes a 3.5% hit and SHA1 a 9.15% one.

Note that the biggest hit was still the 2.6.35.11 -> 3.0-git upgrade,
because HTTP gives me 10040 cps in 2.6.35.11. I think it's the compiler
and not the kernel : I used to build 2.6.35 with gcc-3.4 but had to
use a more recent toolchain (gcc 4.4) with 3.0 due to cmpxchg16b, and
my experience with gcc has always been a noticeable performance loss
with each new version, so that seems consistent...

All in all, while the SHA1 cost becomes concerning, it could be used
as an alternative to MD5 when we add a sysctl to select between
performance and security.
-------------------------------------------------------------------------
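
The percentages in the memo can be re-derived from its table. A minimal sketch, with `pct_drop()` as a made-up helper name; the inputs are the connections-per-second figures quoted above, with the MD4 row as the baseline.

```c
/* Relative throughput loss, in percent, when going from `base` to `cur`
 * connections per second. */
static double pct_drop(double base, double cur)
{
	return 100.0 * (base - cur) / base;
}

/* Example inputs taken from the table in the memo:
 *   pct_drop(9610, 9340)    MD4 -> MD5,  http
 *   pct_drop(9610, 9250)    MD4 -> SHA1, http
 *   pct_drop(7840, 7560)    MD4 -> MD5,  RST server
 *   pct_drop(7840, 7280)    MD4 -> SHA1, RST server
 *   pct_drop(16950, 16360)  MD4 -> MD5,  closed port
 *   pct_drop(16950, 15400)  MD4 -> SHA1, closed port
 */
```

The results agree with the figures in the memo to within rounding.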

Note that this wasn't the best machine for the test, but it was available
and moreover it required little additional hardware to saturate it ;-)

Best regards,
Willy

George Spelvin - Aug. 21, 2011, 3:04 a.m.
>> * Should I worry about 32-bit IPv6 performance, since that's even more
>>   unlikely to be running heavy loads on 32-bit hardware?

> On x86 you're probably right, but there are other very fast platforms
> such as ARM, which are used to build routers or appliances, and which
> are 32-bit and there it may matter.

Thanks for the feedback.  Your point about routers is well-taken.
It's particularly x86-32 which gives me fits, but I'll keep ARM
performance in mind, too.

>> * If yes, is this fast enough to be acceptable, or do I need to work
>>   harder to find more speed?

> I'd suggest that the most important thing is no performance regression.
> If you can come up with something that wins back what we lost with MD5,
> your work would probably gain interest.

Okay, I'll go back to the drawing board on performance.

Damn, this is going to be tough.

> I'm copy-pasting here the memo I exchanged in private after my tests, there
> is nothing secret in it, so better post the whole explanation :

Thank you very much.  It helps me figure out what the time budget is.
Theodore Ts'o - Aug. 21, 2011, 3:27 a.m.
Here's a random thought --- it won't help on anything other than
modern x86's, but who's to say we have to use the same algorithm on
all platforms?  Does the AES-NI facility provide enough of a speedup
that it's worth using it instead of MD5, at least on modern x86
systems which have this support?

					- Ted
Herbert Xu - Aug. 21, 2011, 4:02 a.m.
On Sat, Aug 20, 2011 at 11:27:53PM -0400, Ted Ts'o wrote:
> Here's a random thought --- it won't help on anything other than
> modern x86's, but who's to say we have to use the same algorithm on
> all platforms?  Does the AES-NI facility provide enough of a speedup
> that it's worth using it instead of MD5, at least on modern x86
> systems which have this support?

It is fast but it also touches SSE state.

Cheers,
George Spelvin - Aug. 22, 2011, 2:06 a.m.
> Here's a random thought --- it won't help on anything other than
> modern x86's, but who's to say we have to use the same algorithm on
> all platforms?

We absolutely do not.  With 15 64-bit registers, far more efficient
algorithms are available than on x86-32.

Here are my current timing experiments trying to find an efficient
hash for that case.  Timings are for 100,000 iterations, so these are
generally in the 100-200 cycles range per operation.
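
For reading the totals reported in this message, a small conversion sketch may help. The helper names are made up, and it assumes the cycle counter ticks at the nominal clock rate given in each heading.

```c
#include <stdint.h>

#define ITERATIONS 100000.0

/* Average cycles for one hash, given the total for 100,000 iterations. */
static double cycles_per_op(uint64_t total_cycles)
{
	return (double)total_cycles / ITERATIONS;
}

/* Same figure in nanoseconds, assuming the counter runs at clock_ghz. */
static double ns_per_op(uint64_t total_cycles, double clock_ghz)
{
	return cycles_per_op(total_cycles) / clock_ghz;
}
```

For example, the hot-cache Phenom total of 36912577 cycles for the first MD5 variant works out to roughly 369 cycles per hash, or about 154 ns at 2.4 GHz; the half-MD4 total of 12321905 is roughly 123 cycles per hash.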

This is 32-bit code, even though it's running on 64-bit machines.


The first 3 are various MD5 implementations (standard, with the k[]
constants in an external array, with the input and K values scheduled
separately so it's a 64-word linear fetch while running).

Then come three similar variants of two-at-a-time MD5 computation.
The lines reporting 0 time are the second outputs.

The last two MD5 timings (9 and 10) are the current half_md4_transform
and twothirdsMD4Transform.  Those are the timing figures to match,
or preferably beat.

ChaCha is 8 rounds, and simply not fast enough.

The Skein192 (to be renamed; it's NOT approved by the designers!) is a
36-round, 6x32-bit variant of the SHA-3 candidate Skein.  0 is C code,
1 and 2 are rolled up assembly variants, 3 and 4 are unrolled assembly,
and 5 is unrolled assembly that also drops the tweak words.  (Which is
hardly important if we're only hashing one block.)

That, as you can see, is about half the time of MD5 and competitive with
the 2/3 MD4.  And probably considerably more secure.  (It's basically a
secure 192-bit hash with zero margin; I could maybe shave another couple
of rounds and still have 128-bit security.)

Shabal is a supposedly-fast SHA-3 candidate that made it to round 2,
just to see what the timing is like.  The second line is a half-size
variant that works on 8 as opposed to 16 words at a time.

2.4 GHz Phenom, hot cache:
MD5 (& half MD4) implementations:
 0:  36912577 cycles  16d174cf 10a7082f e1a2b897 3faddc63
 1:  36937728 cycles  16d174cf 10a7082f e1a2b897 3faddc63
 2:  37354723 cycles  16d174cf 10a7082f e1a2b897 3faddc63
 3:  58866027 cycles  16d174cf 10a7082f e1a2b897 3faddc63
 4:         0 cycles  16d174cf 10a7082f e1a2b897 3faddc63
 5:  96074375 cycles  16d174cf 10a7082f e1a2b897 3faddc63
 6:         0 cycles  16d174cf 10a7082f e1a2b897 3faddc63
 7: 101722571 cycles  16d174cf 10a7082f e1a2b897 3faddc63
 8:         0 cycles  16d174cf 10a7082f e1a2b897 3faddc63
 9:  12321905 cycles  b4b721c6 5635b583 b06c6474 c9871bee
10:  18038018 cycles  336f8820 5565aa9b 0133a23a 0b62780f

ChaCha implementations:
 0:  37345090 cycles  751ddf6a 6977b031 60730a4f c1e2d89e
 1:  30782411 cycles  d5fec2fe a8937844 33da9645 cbff3484

Skein192 implementations:
 0:  30343213 cycles  646ce01c 7f2228dd 229a336d 033748b5 0de3a665 c79cf4f7
 1:  22137253 cycles  646ce01c 7f2228dd 229a336d 033748b5 0de3a665 c79cf4f7
 2:  22439643 cycles  646ce01c 7f2228dd 229a336d 033748b5 0de3a665 c79cf4f7
 3:  18189990 cycles  646ce01c 7f2228dd 229a336d 033748b5 0de3a665 c79cf4f7
 4:  19025781 cycles  646ce01c 7f2228dd 229a336d 033748b5 0de3a665 c79cf4f7
 5:  17335443 cycles  646ce01c 7f2228dd 229a336d 033748b5 0de3a665 c79cf4f7

Shabal implementations:
 0: 177585782 cycles  6c192c71 ed0912f7 ec4513bb c8f03710 8e4e71b2 5200adff f4f40b0c 81ab7e54
 1:  88048716 cycles  51e5dab0 ed0912f7 ec4513bb c8f03710 8e4e71b2 5200adff f4f40b0c 81ab7e54


2.4 GHz Phenom, "cool cache": Run each of the 20 code paths once, then repeat 100,000 times.
MD5 (& half MD4) implementations:
 0:  43746107 cycles  16d174cf 10a7082f e1a2b897 3faddc63
 1:  43682485 cycles  16d174cf 10a7082f e1a2b897 3faddc63
 2:  43554706 cycles  16d174cf 10a7082f e1a2b897 3faddc63
 3:  65376283 cycles  16d174cf 10a7082f e1a2b897 3faddc63
 4:         0 cycles  16d174cf 10a7082f e1a2b897 3faddc63
 5: 105050654 cycles  16d174cf 10a7082f e1a2b897 3faddc63
 6:         0 cycles  16d174cf 10a7082f e1a2b897 3faddc63
 7: 109949761 cycles  16d174cf 10a7082f e1a2b897 3faddc63
 8:         0 cycles  16d174cf 10a7082f e1a2b897 3faddc63
 9:  19284802 cycles  b4b721c6 5635b583 b06c6474 c9871bee
10:  24359312 cycles  336f8820 5565aa9b 0133a23a 0b62780f

ChaCha implementations:
 0:  46379423 cycles  751ddf6a 6977b031 60730a4f c1e2d89e
 1:  38641131 cycles  d5fec2fe a8937844 33da9645 cbff3484

Skein192 implementations:
 0:  38125833 cycles  646ce01c 7f2228dd 229a336d 033748b5 0de3a665 c79cf4f7
 1:  31800936 cycles  646ce01c 7f2228dd 229a336d 033748b5 0de3a665 c79cf4f7
 2:  29010869 cycles  646ce01c 7f2228dd 229a336d 033748b5 0de3a665 c79cf4f7
 3:  25627838 cycles  646ce01c 7f2228dd 229a336d 033748b5 0de3a665 c79cf4f7
 4:  25913924 cycles  646ce01c 7f2228dd 229a336d 033748b5 0de3a665 c79cf4f7
 5:  24318110 cycles  646ce01c 7f2228dd 229a336d 033748b5 0de3a665 c79cf4f7

Shabal implementations:
 0: 185723030 cycles  6c192c71 ed0912f7 ec4513bb c8f03710 8e4e71b2 5200adff f4f40b0c 81ab7e54
 1:  92834811 cycles  51e5dab0 ed0912f7 ec4513bb c8f03710 8e4e71b2 5200adff f4f40b0c 81ab7e54


2.67 GHz i7 (Xeon W3520), hot cache:
MD5 (& half MD4) implementations:
 0:  36919956 cycles  16d174cf 10a7082f e1a2b897 3faddc63
 1:  37156578 cycles  16d174cf 10a7082f e1a2b897 3faddc63
 2:  35044283 cycles  16d174cf 10a7082f e1a2b897 3faddc63
 3:  61506553 cycles  16d174cf 10a7082f e1a2b897 3faddc63
 4:         0 cycles  16d174cf 10a7082f e1a2b897 3faddc63
 5:  97228616 cycles  16d174cf 10a7082f e1a2b897 3faddc63
 6:         0 cycles  16d174cf 10a7082f e1a2b897 3faddc63
 7: 102025434 cycles  16d174cf 10a7082f e1a2b897 3faddc63
 8:         0 cycles  16d174cf 10a7082f e1a2b897 3faddc63
 9:  11890788 cycles  b4b721c6 5635b583 b06c6474 c9871bee
10:  19542853 cycles  336f8820 5565aa9b 0133a23a 0b62780f

ChaCha implementations:
 0:  32864931 cycles  751ddf6a 6977b031 60730a4f c1e2d89e
 1:  28502254 cycles  d5fec2fe a8937844 33da9645 cbff3484

Skein192 implementations:
 0:  28894544 cycles  646ce01c 7f2228dd 229a336d 033748b5 0de3a665 c79cf4f7
 1:  25963843 cycles  646ce01c 7f2228dd 229a336d 033748b5 0de3a665 c79cf4f7
 2:  26394942 cycles  646ce01c 7f2228dd 229a336d 033748b5 0de3a665 c79cf4f7
 3:  22960214 cycles  646ce01c 7f2228dd 229a336d 033748b5 0de3a665 c79cf4f7
 4:  22458045 cycles  646ce01c 7f2228dd 229a336d 033748b5 0de3a665 c79cf4f7
 5:  21020691 cycles  646ce01c 7f2228dd 229a336d 033748b5 0de3a665 c79cf4f7

Shabal implementations:
 0: 107339671 cycles  6c192c71 ed0912f7 ec4513bb c8f03710 8e4e71b2 5200adff f4f40b0c 81ab7e54
 1:  52250187 cycles  51e5dab0 ed0912f7 ec4513bb c8f03710 8e4e71b2 5200adff f4f40b0c 81ab7e54


2.67 GHz i7 (Xeon W3520), cool cache:
MD5 (& half MD4) implementations:
 0:  38680693 cycles  16d174cf 10a7082f e1a2b897 3faddc63
 1:  37812504 cycles  16d174cf 10a7082f e1a2b897 3faddc63
 2:  35577261 cycles  16d174cf 10a7082f e1a2b897 3faddc63
 3:  60584516 cycles  16d174cf 10a7082f e1a2b897 3faddc63
 4:         0 cycles  16d174cf 10a7082f e1a2b897 3faddc63
 5:  96525869 cycles  16d174cf 10a7082f e1a2b897 3faddc63
 6:         0 cycles  16d174cf 10a7082f e1a2b897 3faddc63
 7:  98360974 cycles  16d174cf 10a7082f e1a2b897 3faddc63
 8:         0 cycles  16d174cf 10a7082f e1a2b897 3faddc63
 9:  13622987 cycles  b4b721c6 5635b583 b06c6474 c9871bee
10:  20895171 cycles  336f8820 5565aa9b 0133a23a 0b62780f

ChaCha implementations:
 0:  34210571 cycles  751ddf6a 6977b031 60730a4f c1e2d89e
 1:  29135985 cycles  d5fec2fe a8937844 33da9645 cbff3484

Skein192 implementations:
 0:  29621108 cycles  646ce01c 7f2228dd 229a336d 033748b5 0de3a665 c79cf4f7
 1:  27885244 cycles  646ce01c 7f2228dd 229a336d 033748b5 0de3a665 c79cf4f7
 2:  26022084 cycles  646ce01c 7f2228dd 229a336d 033748b5 0de3a665 c79cf4f7
 3:  22360719 cycles  646ce01c 7f2228dd 229a336d 033748b5 0de3a665 c79cf4f7
 4:  24639107 cycles  646ce01c 7f2228dd 229a336d 033748b5 0de3a665 c79cf4f7
 5:  21072150 cycles  646ce01c 7f2228dd 229a336d 033748b5 0de3a665 c79cf4f7

Shabal implementations:
 0: 105186652 cycles  6c192c71 ed0912f7 ec4513bb c8f03710 8e4e71b2 5200adff f4f40b0c 81ab7e54
 1:  57009351 cycles  51e5dab0 ed0912f7 ec4513bb c8f03710 8e4e71b2 5200adff f4f40b0c 81ab7e54

Patch

--- drivers/char/random.c.1	2011-08-16 05:02:09.000000000 -0400
+++ drivers/char/random.c.2	2011-08-16 05:02:43.000000000 -0400
@@ -1323,7 +1323,7 @@ 
 
 	hash[0] += current->pid + jiffies + get_cycles();
 	md5_transform(hash, random_int_secret);
-	ret = hash[0];
+	ret = hash[1];
 	put_cpu_var(get_random_int_hash);
 
 	return ret;