[RFC,v2] random: optionally block in getrandom(2) when the CRNG is uninitialized

Message ID 20190915052242.GG19710@mit.edu

Commit Message

Theodore Y. Ts'o Sept. 15, 2019, 5:22 a.m. UTC
getrandom() has been created as a new and more secure interface for
pseudorandom data requests.  Unlike /dev/urandom, it unconditionally
blocks until the entropy pool has been properly initialized.

While getrandom() has no guaranteed upper bound for its waiting time,
user-space has been abusing it by issuing the syscall, from shared
libraries no less, during the main system boot sequence.

Thus, on certain setups where there is no hwrng (embedded), or the
hwrng is not trusted by some users (intel RDRAND), or sometimes it's
just broken (amd RDRAND), the system boot can be *reliably* blocked.

The issue is further exacerbated by recent file-system optimizations,
e.g. b03755ad6f33 (ext4: make __ext4_get_inode_loc plug), which
merges the inode table IO issued by directory lookup code, and thus
minimizes the number of disk interrupts and entropy during boot. After that commit,
a blocked boot can be reliably reproduced on a Thinkpad E480 laptop
with standard ArchLinux user-space.

Thus, add an optional configuration option which stops getrandom(2)
from blocking, but instead returns "best efforts" randomness, which
might not be random or secure at all.  This can be controlled via
random.getrandom_block boot command line option, and the
CONFIG_RANDOM_BLOCK can be used to set the default to be blocking.
Since according to the Great Penguin, only incompetent system
designers would value "security" ahead of "usability", the default is
to be non-blocking.

In addition, modify getrandom(2) to complain loudly with a kernel
warning when some userspace process is erroneously calling
getrandom(2) too early during the boot process.

Link: https://lkml.kernel.org/r/CAHk-=wjyH910+JRBdZf_Y9G54c1M=LBF8NKXB6vJcm9XjLnRfg@mail.gmail.com
Link: https://lkml.kernel.org/r/20190912034421.GA2085@darwi-home-pc
Link: https://lkml.kernel.org/r/20190911173624.GI2740@mit.edu
Link: https://lkml.kernel.org/r/20180514003034.GI14763@thunk.org

[ Modified by tytso@mit.edu to make the change of getrandom(2) to be
  non-blocking to be optional. ]

Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ahmed S. Darwish <darwish.07@gmail.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
---

Here's my take on the patch.  I really very strongly believe that the
idea of making getrandom(2) non-blocking and to blindly assume that we
can load up the buffer with "best efforts" randomness to be a
terrible, terrible idea that is going to cause major security problems
that we will potentially regret very badly.  Linus Torvalds believes I
am an incompetent systems designer.

So let's do it both ways, and push the decision on the distributor
and/or product manufacturer.

 drivers/char/Kconfig  | 33 +++++++++++++++++++++++++++++++--
 drivers/char/random.c | 34 +++++++++++++++++++++++++++++-----
 2 files changed, 60 insertions(+), 7 deletions(-)

Comments

Linus Torvalds Sept. 15, 2019, 5:32 p.m. UTC | #1
[ Added Lennart, who was active in the other thread ]

On Sat, Sep 14, 2019 at 10:22 PM Theodore Y. Ts'o <tytso@mit.edu> wrote:
>
> Thus, add an optional configuration option which stops getrandom(2)
> from blocking, but instead returns "best efforts" randomness, which
> might not be random or secure at all.

So I hate having a config option for something like this.

How about this attached patch instead? It only changes the waiting
logic, and I'll quote the comment in full, because I think that
explains not only the rationale, it explains every part of the patch
(and is most of the patch anyway):

 * We refuse to wait very long for a blocking getrandom().
 *
 * The crng may not be ready during boot, but if you ask for
 * blocking random numbers very early, there is no guarantee
 * that you'll ever get any timely entropy.
 *
 * If you are sure you need entropy and that you can generate
 * it, you need to ask for non-blocking random state, and then
 * if that fails you must actively _do_something_ that causes
 * enough system activity, perhaps asking the user to type
 * something on the keyboard.
 *
 * Just asking for blocking random numbers is completely and
 * fundamentally wrong, and the kernel will not play that game.
 *
 * We will block for at most 15 seconds at a time, and if called
 * sequentially will decrease the blocking amount so that we'll
 * block for at most 30s total - and if people continue to ask
 * for blocking, at that point we'll just return whatever random
 * state we have acquired.
 *
 * This will also complain loudly if the timeout happens, to let
 * the distribution or system admin know about the problem.
 *
 * The process that gets the -EAGAIN will hopefully also log the
 * error, to raise awareness that there may be use of random
 * numbers without sufficient entropy.

Hmm? No strange behavior. No odd config variables. A bounded total
boot-time wait of 30s (which is a completely random number, but I
claimed it as the "big red button" time).

And if you only do it once and fall back to something else it will
only wait for 15s, and you'll have your error value so that you can
log it properly.
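
An editorial sketch of the wait schedule described in the quoted comment. The comment gives the numbers (at most 15s per call, at most 30s in total) but the thread does not show how the wait shrinks between calls. The code below assumes the allowed wait simply halves after every timed-out call, which is the "exponential decay" reading used later in this thread. The names `wait_budget`, `wait_budget_init` and `wait_budget_take` are hypothetical and are not taken from the kernel patch.

```c
/* Hypothetical userspace model of the bounded wait; NOT the kernel code.
 * Assumption: the allowed wait halves after each timed-out blocking call. */
#define FIRST_WAIT_MS 15000u

struct wait_budget {
	unsigned int next_ms;	/* longest wait allowed for the next call */
	unsigned int spent_ms;	/* total time already spent waiting */
};

static void wait_budget_init(struct wait_budget *b)
{
	b->next_ms = FIRST_WAIT_MS;
	b->spent_ms = 0;
}

/* Returns how long the next blocking call may wait, in milliseconds.
 * A return of 0 means "stop waiting and hand back whatever state exists". */
static unsigned int wait_budget_take(struct wait_budget *b)
{
	unsigned int wait_ms = b->next_ms;

	b->spent_ms += wait_ms;
	b->next_ms /= 2;	/* 15s, 7.5s, 3.75s, ... */
	return wait_ms;
}
```

Under that halving assumption the waits form a geometric series, so the total stays below 2 x 15s = 30s no matter how often the caller retries.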

Yes, a single boot-time wait of 15s at boot is still "darn annoying",
but it likely

 (a) isn't so long that people consider it a boot failure and give up
(but hopefully annoying enough that they'll report it)

 (b) long enough that *if* the thing that is waiting is not actually
blocking the boot sequence, the non-blocked part of the boot sequence
should have time to do sufficient IO to get better randomness.

So (a) is the "the system is still usable" part. While (b) is the
"give it a chance, and even if it fails and you fall back on urandom
or whatever, you'll actually be getting good randomness even if we
can't perhaps _guarantee_ entropy".

Also, if you have some user that wants to do the old-timey ssh-keygen
thing with user input etc, we now have a documented way to do that:
just do the nonblocking thing, and then make really really sure that
you actually have something that generates more entropy if that
nonblocking thing returns EAGAIN. But it's also very clear that at
that point the program that wants this entropy guarantee has to _work_
for it.
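
A minimal sketch of the "do the nonblocking thing, then work for it" pattern described above, assuming a Linux system whose libc provides `getrandom()` in `<sys/random.h>`. The `stir_fn` callback and the `max_attempts` limit are hypothetical: they stand in for whatever the program does to cause system activity (prompting for keystrokes, for example) and for how long it is willing to keep trying.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/random.h>
#include <sys/types.h>

/* Hypothetical hook: do something that generates system activity.
 * Return 0 to keep trying, nonzero to give up. */
typedef int (*stir_fn)(void *ctx);

/* Fill buf with len random bytes without ever blocking in the kernel.
 * Returns 0 on success, -1 with errno set on failure. errno == EAGAIN
 * means the CRNG never became ready within max_attempts retries. */
static int fill_random_nonblocking(void *buf, size_t len,
				   stir_fn stir, void *ctx, int max_attempts)
{
	unsigned char *p = buf;
	size_t done = 0;
	int attempts = 0;

	while (done < len) {
		ssize_t n = getrandom(p + done, len - done, GRND_NONBLOCK);

		if (n > 0) {
			done += (size_t)n;
			continue;
		}
		if (n < 0 && errno == EINTR)
			continue;	/* interrupted by a signal, retry */
		if (n < 0 && errno == EAGAIN) {
			/* CRNG not ready: the caller has to work for it. */
			if (attempts++ >= max_attempts || stir == NULL ||
			    stir(ctx) != 0) {
				errno = EAGAIN;
				return -1;
			}
			continue;
		}
		if (n == 0)
			errno = EIO;	/* unexpected empty result */
		return -1;
	}
	return 0;
}
```

On a system whose CRNG is already initialized the callback is never invoked and the first call succeeds.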

Because just being lazy and say "block" without any entropy will
return EAGAIN for a (continually decreasing) while, but then at some
point stop and say "you're broken", and just give you the urandom
data.

Because if you really do nothing at all, and there is no activity
what-so-ever for 15s because you blocked the boot, then I claim that
it's better to return an error than to wait forever. And if you ignore
the error and just retry, eventually we'll do the fallback for you.

Of course, if you have something like rdrand, and told us you trust
it, none of this matters at all, since we'll have initialized the pool
long before.

So this is unconditional, but it's basically "unconditionally somewhat
flexibly reasonable". It should only ever trigger for the case where
the boot sequence was fundamentally broken. And it will complain
loudly (both at a kernel level, and hopefully at a systemd journal
level too) if it ever triggers.

And hey, if some distro wants to then revert this because they feel
uncomfortable with this, that's now _their_ problem, not the problem
of the upstream kernel. The upstream kernel tries to do something that
I think is arguably fairly reasonable in all situations.

                 Linus
Willy Tarreau Sept. 15, 2019, 6:32 p.m. UTC | #2
On Sun, Sep 15, 2019 at 10:32:15AM -0700, Linus Torvalds wrote:
>  * We will block for at most 15 seconds at a time, and if called
>  * sequentially will decrease the blocking amount so that we'll
>  * block for at most 30s total - and if people continue to ask
>  * for blocking, at that point we'll just return whatever random
>  * state we have acquired.

I think that the exponential decay will either not be used or
be totally used, so in practice you'll always end up with 0 or
30s depending on the entropy situation, because I really do not
see any valid reason for entropy to suddenly start to appear
after 15s if it didn't prior to this. As such I do think that
a single timeout should be enough.

In addition, since you're leaving the door open to bikeshed around
the timeout value, I'd say that while 30s is usually not huge in a
desktop system's life, it actually is a lot in network environments
when it delays a switchover. It can cause other timeouts to occur
and leave quite a long embarrassing black out. I'd guess that a max
total wait time of 2-3s should be OK though since application timeouts
rarely are lower due to TCP generally starting to retransmit at 3s.
And even in 3s we're supposed to see quite some interrupts or it's
unlikely that much more will happen between 3 and 30s.

If the setting had to be made user-changeable then it could make
sense to let it be overridden on the kernel's command line though
I don't think that it should be necessary with a low enough value.

Thanks,
Willy
Willy Tarreau Sept. 15, 2019, 6:36 p.m. UTC | #3
I also wanted to ask, are we going to enforce the same strategy on
/dev/urandom ? If we don't because we fear application breakage or
whatever, then there will always be some incentive against migrating
to getrandom(). And if we do it, we know we have to take a reasonable
approach making the change transparent enough for applications. That
would also go in favor of a short timeout.

Willy
Linus Torvalds Sept. 15, 2019, 6:59 p.m. UTC | #4
On Sun, Sep 15, 2019 at 11:32 AM Willy Tarreau <w@1wt.eu> wrote:
>
> I think that the exponential decay will either not be used or
> be totally used, so in practice you'll always end up with 0 or
> 30s depending on the entropy situation

According to the systemd random-seed source snippet that Ahmed posted,
it actually just tries once (well, first once non-blocking, then once
blocking) and then falls back to reading urandom if it fails.

So assuming there's just one of those "read much too early" cases, I
think it actually matters.
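
For readers who have not seen the snippet referred to above, the three-stage order (non-blocking, then blocking, then /dev/urandom) can be sketched as follows. This is an illustration only and is not systemd's code; `get_seed`, `read_urandom` and `enum seed_source` are hypothetical names, and requests are assumed to be at most 256 bytes so that a successful `getrandom()` returns the full length.

```c
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/random.h>
#include <sys/types.h>
#include <unistd.h>

enum seed_source { SEED_NONBLOCKING = 1, SEED_BLOCKING, SEED_URANDOM };

/* Read exactly len bytes from /dev/urandom; returns 0 on success, -1 otherwise. */
static int read_urandom(unsigned char *buf, size_t len)
{
	size_t done = 0;
	int fd = open("/dev/urandom", O_RDONLY);

	if (fd < 0)
		return -1;	/* e.g. no /dev inside a chroot */
	while (done < len) {
		ssize_t n = read(fd, buf + done, len - done);

		if (n > 0)
			done += (size_t)n;
		else if (n < 0 && errno == EINTR)
			continue;
		else
			break;
	}
	close(fd);
	return done == len ? 0 : -1;
}

/* Try each stage once, in order, and report which one delivered the bytes. */
static int get_seed(unsigned char *buf, size_t len, enum seed_source *src)
{
	if (getrandom(buf, len, GRND_NONBLOCK) == (ssize_t)len) {
		*src = SEED_NONBLOCKING;
		return 0;
	}
	if (getrandom(buf, len, 0) == (ssize_t)len) {	/* may block */
		*src = SEED_BLOCKING;
		return 0;
	}
	if (read_urandom(buf, len) == 0) {
		*src = SEED_URANDOM;
		return 0;
	}
	return -1;
}
```

With a bounded kernel-side wait, the second stage would fail with an error after the timeout instead of blocking indefinitely, and the caller would reach the third stage knowing that the data may be weak.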

But while I tried to test this, on my F30 install, systemd seems to
always just use urandom().

I can trigger the urandom read warning easily enough (turn off CPU
rdrand trusting and increase the entropy requirement by a factor of
ten, and turn off the ioctl to add entropy from user space), just not
the getrandom() blocking case at all.

So presumably that's because I have a systemd that doesn't use
getrandom() at all, or perhaps uses the 'rdrand' instruction directly.
Or maybe because Arch has some other oddity that just triggers the
problem.

> In addition, since you're leaving the door open to bikeshed around
> the timeout value, I'd say that while 30s is usually not huge in a
> desktop system's life, it actually is a lot in network environments
> when it delays a switchover.

Oh, absolutely.

But in that situation you have a MIS person on call, and somebody who
can fix it.

It's not like switchovers happen in a vacuum. What we should care
about is that updating a kernel _works_. No regressions. But if you
have some five-nines setup with switchover, you'd better have some
competent MIS people there too. You don't just switch kernels without
testing ;)

                 Linus
Linus Torvalds Sept. 15, 2019, 7:08 p.m. UTC | #5
On Sun, Sep 15, 2019 at 11:37 AM Willy Tarreau <w@1wt.eu> wrote:
>
> I also wanted to ask, are we going to enforce the same strategy on
> /dev/urandom ?

Right now the strategy for /dev/urandom is "print a one-line warning,
then do the read".

I don't see why we should change that. The whole point of urandom has
been that it doesn't block, and doesn't use up entropy.

It's the _blocking_ behavior that has always been problematic. It's
why almost nobody uses /dev/random in practice.

getrandom() looks like /dev/urandom in not using up entropy, but had
that blocking behavior of /dev/random that was problematic.

And exactly the same way it was problematic for /dev/random users, it
has now shown itself to be problematic for getrandom().

My suggested patch left the /dev/random blocking behavior, because
hopefully people *know* about the problems there.

And hopefully people understand that getrandom(GRND_RANDOM) has all
the same issues.

If you want that behavior, you can still use GRND_RANDOM or
/dev/random, but they are simply not acceptable for boot-time
scenarios. Never have been,

... exactly the way the "block forever" wasn't acceptable for getrandom().

                Linus
Willy Tarreau Sept. 15, 2019, 7:12 p.m. UTC | #6
On Sun, Sep 15, 2019 at 11:59:41AM -0700, Linus Torvalds wrote:
> > In addition, since you're leaving the door open to bikeshed around
> > the timeout value, I'd say that while 30s is usually not huge in a
> > desktop system's life, it actually is a lot in network environments
> > when it delays a switchover.
> 
> Oh, absolutely.
> 
> But in that situation you have a MIS person on call, and somebody who
> can fix it.
> 
> It's not like switchovers happen in a vacuum. What we should care
> about is that updating a kernel _works_. No regressions. But if you
> have some five-nines setup with switchover, you'd better have some
> competent MIS people there too. You don't just switch kernels without
> testing ;)

I mean maybe I didn't use the right term, but typically in networked
environments you'll have watchdogs on sensitive devices (e.g. the
default gateways and load balancers), which will trigger an instant
reboot of the system if something really bad happens. It can range
from a dirty oops, FS remounted R/O, pure freeze, OOM, missing
process, panic etc. And here the reset which used to take roughly
10s to get the whole services back up for operations suddenly takes
40s. My point is that I won't have issues explaining to users that 10s
or 13s is the same when they rely on five nines, but trying to argue
that 40s is identical to 10s will be a hard position to stand by.

And actually there are other dirty cases. Such systems often work
in active-backup or active-active modes. One typical issue is that
the primary system reboots, the second takes over within one second,
and once the primary system is back *apparently* operating, some
processes which appear to be present and which possibly have already
bound their listening ports are waiting for 30s in getrandom() while
the monitoring systems around see them as ready, thus the primary
machine goes back to its role and cannot reliably run the service
for the first 30 seconds, which roughly multiplies the downtime by
30. That's why I'd like to make it possible to lower this value
(either definitely or by cmdline, as I think it can be fine for
all those who care about down time).

Willy
Willy Tarreau Sept. 15, 2019, 7:18 p.m. UTC | #7
On Sun, Sep 15, 2019 at 12:08:31PM -0700, Linus Torvalds wrote:
> My suggested patch left the /dev/random blocking behavior, because
> hopefully people *know* about the problems there.
> 
> And hopefully people understand that getrandom(GRND_RANDOM) has all
> the same issues.

I think this one doesn't cause any issue to users. It's the only
one that should be used for long-lived crypto keys in my opinion.

> If you want that behavior, you can still use GRND_RANDOM or
> /dev/random, but they are simply not acceptable for boot-time
> scenarios.

Oh no I definitely don't want this behavior at all for urandom, what
I'm saying is that as long as getrandom() will have a lower quality
of service than /dev/urandom for non-important randoms, there will be
compelling reasons to avoid it. And I think that your bounded wait
could actually reconcile both ends of the user spectrum, those
who want excellent randoms to run tetris and those who don't mind
always playing the same game on every boot because they just want
to play. And by making /dev/urandom behave like getrandom() we could
actually tell users "both are now exactly the same, you have no valid
reason anymore not to use the new API". And it forces us to remain
very reasonable in getrandom() so that we don't break old applications
that relied on urandom to be fast.

Willy
Linus Torvalds Sept. 15, 2019, 7:31 p.m. UTC | #8
On Sun, Sep 15, 2019 at 12:18 PM Willy Tarreau <w@1wt.eu> wrote:
>
> Oh no I definitely don't want this behavior at all for urandom, what
> I'm saying is that as long as getrandom() will have a lower quality
> of service than /dev/urandom for non-important randoms

Ahh, here you're talking about the fact that it can block at all being
"lower quality".

I do agree that getrandom() is doing some odd things. It has the
"total blocking mode" of /dev/random (if you pass it GRND_RANDOM), but
it has no mode of replacing /dev/urandom.

So if you want the /dev/urandom behavior, then no, getrandom() simply
has never given you that.

Use /dev/urandom if you want that.

Sad, but there it is. We could have a new flag (GRND_URANDOM) that
actually gives the /dev/urandom behavior. But the ostensible reason
for getrandom() was the blocking for entropy. See commit c6e9d6f38894
("random: introduce getrandom(2) system call") from back in 2014.

The fact that it took five years to hit this problem is probably due
to two reasons:

 (a) we're actually pretty good about initializing the entropy pool
fairly quickly most of the time

 (b) people who started using 'getrandom()' and hit this issue
presumably then backed away from it slowly and just used /dev/urandom
instead.

So it needed an actual "oops, we don't get as much entropy from the
filesystem accesses" situation to actually turn into a problem. And
presumably the people who tried out things like nvdimm filesystems
never used Arch, and never used a sufficiently new systemd to see the
"oh, without disk interrupts you don't get enough randomness to boot".

One option is to just say that GRND_URANDOM is the default (ie never
block, do the one-liner log entry to warn) and add a _new_ flag that
says "block for entropy". But if we do that, then I seriously think
that the new behavior should have that timeout limiter.

For 5.3, I'll just revert the ext4 change, stupid as that is. That
avoids the regression, even if it doesn't avoid the fundamental
problem. And gives us time to discuss it.

                 Linus
Willy Tarreau Sept. 15, 2019, 7:54 p.m. UTC | #9
On Sun, Sep 15, 2019 at 12:31:42PM -0700, Linus Torvalds wrote:
> On Sun, Sep 15, 2019 at 12:18 PM Willy Tarreau <w@1wt.eu> wrote:
> >
> > Oh no I definitely don't want this behavior at all for urandom, what
> > I'm saying is that as long as getrandom() will have a lower quality
> > of service than /dev/urandom for non-important randoms
> 
> Ahh, here you're talking about the fact that it can block at all being
> "lower quality".
> 
> I do agree that getrandom() is doing some odd things. It has the
> "total blocking mode" of /dev/random (if you pass it GRND_RANDOM), but
> it has no mode of replacing /dev/urandom.

Yep but with your change it's getting better.

> So if you want the /dev/urandom behavior, then no, getrandom() simply
> has never given you that.
> 
> Use /dev/urandom if you want that.

It's not available in chroot, which is the main driver for getrandom()
I guess.

> Sad, but there it is. We could have a new flag (GRND_URANDOM) that
> actually gives the /dev/urandom behavior. But the ostensible reason
> for getrandom() was the blocking for entropy. See commit c6e9d6f38894
> ("random: introduce getrandom(2) system call") from back in 2014.

Oh I definitely know it's been a long debate.

> The fact that it took five years to hit this problem is probably due
> to two reasons:
> 
>  (a) we're actually pretty good about initializing the entropy pool
> fairly quickly most of the time
> 
>  (b) people who started using 'getrandom()' and hit this issue
> presumably then backed away from it slowly and just used /dev/urandom
> instead.

We've hit it the hard way more than a year ago already, when openssl
adopted getrandom() instead of urandom for certain low-importance
things in order to work better in chroots and/or avoid fd leaks. And
even openssl had to work around these issues in multiple iterations
(I don't remember how however).

> So it needed an actual "oops, we don't get as much entropy from the
> filesystem accesses" situation to actually turn into a problem. And
> presumably the people who tried out things like nvdimm filesystems
> never used Arch, and never used a sufficiently new systemd to see the
> "oh, without disk interrupts you don't get enough randomness to boot".

In my case the whole system is in the initramfs and the only accesses
to the flash are to read the config. So that's a pretty limited source
of interrupts for a headless system ;-)

> One option is to just say that GRND_URANDOM is the default (ie never
> block, do the one-liner log entry to warn) and add a _new_ flag that
> says "block for entropy". But if we do that, then I seriously think
> that the new behavior should have that timeout limiter.

I think the timeout is a good thing to do, but it would be nice to
let the application know that what was provided was probably not as
good as expected (well if the application wants real random, it
should use GRND_RANDOM).

> For 5.3, I'll just revert the ext4 change, stupid as that is. That
> avoids the regression, even if it doesn't avoid the fundamental
> problem. And gives us time to discuss it.

It's sad to see that being excessive on randomness leads to forcing
totally unrelated subsystem to be less efficient :-(

Willy
Ahmed S. Darwish Sept. 16, 2019, 2:45 a.m. UTC | #10
On Sun, Sep 15, 2019 at 11:59:41AM -0700, Linus Torvalds wrote:
> On Sun, Sep 15, 2019 at 11:32 AM Willy Tarreau <w@1wt.eu> wrote:
> >
> > I think that the exponential decay will either not be used or
> > be totally used, so in practice you'll always end up with 0 or
> > 30s depending on the entropy situation
> 
> According to the systemd random-seed source snippet that Ahmed posted,
> it actually just tries once (well, first once non-blocking, then once
> blocking) and then falls back to reading urandom if it fails.
> 
> So assuming there's just one of those "read much too early" cases, I
> think it actually matters.
>

Just a quick note, the snippet I posted:

    https://lkml.kernel.org/r/20190914150206.GA2270@darwi-home-pc

is not PID 1.

It's just a lowly process called "systemd-random-seed". Its main
reason of existence is to load/restore a random seed file from and to
disk across reboots (just like what sysv scripts did).

The reason I posted it was to show that if we change getrandom() to
silently return weak crypto instead of blocking or an error code,
systemd-random-seed will break: it will save the resulting data to
disk, then even _credit_ it (if asked to) in the next boot cycle
through RNDADDENTROPY.

> But while I tried to test this, on my F30 install, systemd seems to
> always just use urandom().
> 
> I can trigger the urandom read warning easily enough (turn off CPU
> rdrand trusting and increase the entropy requirement by a factor of
> ten, and turn off the ioctl to add entropy from user space), just not
> the getrandom() blocking case at all.
>

Yeah, because the problem was/is not with systemd :)

It is GDM/gnome-session which was blocking the graphical boot process.

Regarding reproducing the issue, through a quick trace_printk, all of
the processes below are calling getrandom() on my Arch system at boot:

    https://lkml.kernel.org/r/20190912034421.GA2085@darwi-home-pc

The fatal call was gnome-session's one, because gnome didn't continue
_its own_ boot due to this blockage.

> So presumably that's because I have a systemd that doesn't use
> getrandom() at all, or perhaps uses the 'rdrand' instruction directly.
> Or maybe because Arch has some other oddity that just triggers the
> problem.
>

It seems Arch is good at triggering this. For example, here is
another Arch user on a Thinkpad (different model than mine), also with
GDM getting blocked on entropy:

    https://bbs.archlinux.org/viewtopic.php?id=248035
    
    "As you can see, the system is literally waiting a half minute for
    something - up until crng init is done"

(The NetworkManager logs are just noise. I also had them, but completely
 disabling NetworkManager didn't do anything .. just made the logs
 cleaner)

thanks,

--
Ahmed Darwish
http://darwish.chasingpointers.com
Lennart Poettering Sept. 16, 2019, 6:08 p.m. UTC | #11
On Sun, 15.09.19 10:32, Linus Torvalds (torvalds@linux-foundation.org) wrote:

> [ Added Lennart, who was active in the other thread ]
>
> On Sat, Sep 14, 2019 at 10:22 PM Theodore Y. Ts'o <tytso@mit.edu> wrote:
> >
> > Thus, add an optional configuration option which stops getrandom(2)
> > from blocking, but instead returns "best efforts" randomness, which
> > might not be random or secure at all.
>
> So I hate having a config option for something like this.
>
> How about this attached patch instead? It only changes the waiting
> logic, and I'll quote the comment in full, because I think that
> explains not only the rationale, it explains every part of the patch
> (and is most of the patch anyway):
>
>  * We refuse to wait very long for a blocking getrandom().
>  *
>  * The crng may not be ready during boot, but if you ask for
>  * blocking random numbers very early, there is no guarantee
>  * that you'll ever get any timely entropy.
>  *
>  * If you are sure you need entropy and that you can generate
>  * it, you need to ask for non-blocking random state, and then
>  * if that fails you must actively _do_something_ that causes
>  * enough system activity, perhaps asking the user to type
>  * something on the keyboard.

You are requesting a UI change here. Maybe the kernel shouldn't be the
one figuring out UI.

I mean, as I understand you are unhappy with behaviour you saw on
systemd systems; we can certainly improve behaviour of systemd in
userspace alone, i.e. abort the getrandom() after a while in userspace
and log about it using typical userspace logging to the console. I am
not sure why you want to do all that in the kernel, the kernel isn't
great at user interaction, and really shouldn't be.

If all you want is abort the getrandom() after 30s and a friendly
message on screen, by all means, let's add that to systemd, I have
zero problem with that. systemd has infrastructure for pushing that to
the user, the kernel doesn't really have that so nicely.

It appears to me you subscribe too much to an idea that userspace
people are not smart enough and couldn't implement something like
this. Turns out we can though, and there's no need to add logic that
appears to follow the logic of "never trust userspace"...

i.e. why not just consider this all just a feature request for the
systemd-random-seed.service, i.e. the service you saw the issue with
to handle this on its own?

> Hmm? No strange behavior. No odd config variables. A bounded total
> boot-time wait of 30s (which is a completely random number, but I
> claimed it as the "big red button" time).

As mentioned, in systemd's case, updating the random seed on disk
is entirely fine to take 5h or so. I don't really think we really need
to bound this in kernel space.

Lennart

--
Lennart Poettering, Berlin
Willy Tarreau Sept. 16, 2019, 7:16 p.m. UTC | #12
On Mon, Sep 16, 2019 at 08:08:01PM +0200, Lennart Poettering wrote:
> I mean, as I understand you are unhappy with behaviour you saw on
> systemd systems; we can certainly improve behaviour of systemd in
> userspace alone, i.e. abort the getrandom() after a while in userspace
> and log about it using typical userspace logging to the console. I am
> not sure why you want to do all that in the kernel, the kernel isn't
> great at user interaction, and really shouldn't be.

Because the syscall will have the option to return what random data
was available in this case, while if you try to fix it only from
within systemd you currently don't even get that data.

> It appears to me you subscribe too much to an idea that userspace
> people are not smart enough and couldn't implement something like
> this. Turns out we can though, and there's no need to add logic that
> appears to follow the logic of "never trust userspace"...

I personally see this very differently. If randoms were placed into a
kernel compared to other operating systems doing everything in userspace,
it's in part because it requires to collect data very widely to gather
some entropy and that no isolated userspace alone can collect as much
as the kernel. Or they each have to reimplement their own method, each
with their own bugs, instead of fixing them all at a single place. All
applications need random, there's no reason for having to force them
all to implement them in detail.

Willy

Patch

diff --git a/drivers/char/Kconfig b/drivers/char/Kconfig
index 3e866885a405..337baeca5ebc 100644
--- a/drivers/char/Kconfig
+++ b/drivers/char/Kconfig
@@ -557,8 +557,6 @@  config ADI
 	  and SSM (Silicon Secured Memory).  Intended consumers of this
 	  driver include crash and makedumpfile.
 
-endmenu
-
 config RANDOM_TRUST_CPU
 	bool "Trust the CPU manufacturer to initialize Linux's CRNG"
 	depends on X86 || S390 || PPC
@@ -573,3 +571,34 @@  config RANDOM_TRUST_CPU
 	has not installed a hidden back door to compromise the CPU's
 	random number generation facilities. This can also be configured
 	at boot with "random.trust_cpu=on/off".
+
+config RANDOM_BLOCK
+	bool "Block if getrandom is called before CRNG is initialized"
+	help
+	  Say Y here if you want userspace programs which call
+	  getrandom(2) before the Cryptographic Random Number
+	  Generator (CRNG) is initialized to block until
+	  secure random numbers are available.
+
+	  Say N if you believe usability is more important than
+	  security, so if getrandom(2) is called before the CRNG is
+	  initialized, it should not block, but instead return "best
+	  effort" randomness which might not be very secure or random
+	  at all; but at least the system boot will not be delayed by
+	  minutes or hours.
+
+	  This can also be controlled at boot with
+	  "random.getrandom_block=on/off".
+
+	  Ideally, systems would be configured with hardware random
+	  number generators, and/or configured to trust CPU-provided
+	  RNG's.  In addition, userspace should generate cryptographic
+	  keys only as late as possible, when they are needed, instead
+	  of during early boot.  (For non-cryptographic use cases,
+	  such as dictionary seeds or MIT Magic Cookies, other
+	  mechanisms such as /dev/urandom or random(3) may be more
+	  appropriate.)  This config option controls what the
+	  kernel should do as a fallback when the non-ideal case
+	  presents itself.
+
+endmenu
diff --git a/drivers/char/random.c b/drivers/char/random.c
index 5d5ea4ce1442..243fb4a4535f 100644
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -511,6 +511,8 @@  static struct ratelimit_state unseeded_warning =
 	RATELIMIT_STATE_INIT("warn_unseeded_randomness", HZ, 3);
 static struct ratelimit_state urandom_warning =
 	RATELIMIT_STATE_INIT("warn_urandom_randomness", HZ, 3);
+static struct ratelimit_state getrandom_warning =
+	RATELIMIT_STATE_INIT("warn_getrandom_randomness", HZ, 3);
 
 static int ratelimit_disable __read_mostly;
 
@@ -854,12 +856,19 @@  static void invalidate_batched_entropy(void);
 static void numa_crng_init(void);
 
 static bool trust_cpu __ro_after_init = IS_ENABLED(CONFIG_RANDOM_TRUST_CPU);
+static bool getrandom_block __ro_after_init = IS_ENABLED(CONFIG_RANDOM_BLOCK);
 static int __init parse_trust_cpu(char *arg)
 {
 	return kstrtobool(arg, &trust_cpu);
 }
 early_param("random.trust_cpu", parse_trust_cpu);
 
+static int __init parse_block(char *arg)
+{
+	return kstrtobool(arg, &getrandom_block);
+}
+early_param("random.getrandom_block", parse_block);
+
 static void crng_initialize(struct crng_state *crng)
 {
 	int		i;
@@ -1045,6 +1054,12 @@  static void crng_reseed(struct crng_state *crng, struct entropy_store *r)
 				  urandom_warning.missed);
 			urandom_warning.missed = 0;
 		}
+		if (getrandom_warning.missed) {
+			pr_notice("random: %d getrandom warning(s) missed "
+				  "due to ratelimiting\n",
+				  getrandom_warning.missed);
+			getrandom_warning.missed = 0;
+		}
 	}
 }
 
@@ -1900,6 +1915,7 @@  int __init rand_initialize(void)
 	crng_global_init_time = jiffies;
 	if (ratelimit_disable) {
 		urandom_warning.interval = 0;
+		getrandom_warning.interval = 0;
 		unseeded_warning.interval = 0;
 	}
 	return 0;
@@ -1969,8 +1985,8 @@  urandom_read(struct file *file, char __user *buf, size_t nbytes, loff_t *ppos)
 	if (!crng_ready() && maxwarn > 0) {
 		maxwarn--;
 		if (__ratelimit(&urandom_warning))
-			printk(KERN_NOTICE "random: %s: uninitialized "
-			       "urandom read (%zd bytes read)\n",
+			pr_err("random: %s: CRNG uninitialized "
+			       "(%zd bytes read)\n",
 			       current->comm, nbytes);
 		spin_lock_irqsave(&primary_crng.lock, flags);
 		crng_init_cnt = 0;
@@ -2135,9 +2151,17 @@  SYSCALL_DEFINE3(getrandom, char __user *, buf, size_t, count,
 	if (!crng_ready()) {
 		if (flags & GRND_NONBLOCK)
 			return -EAGAIN;
-		ret = wait_for_random_bytes();
-		if (unlikely(ret))
-			return ret;
+		WARN_ON_ONCE(1);
+		if (getrandom_block) {
+			if (__ratelimit(&getrandom_warning))
+				pr_err("random: %s: getrandom blocking for CRNG initialization\n",
+				       current->comm);
+			ret = wait_for_random_bytes();
+			if (unlikely(ret))
+				return ret;
+		} else if (__ratelimit(&getrandom_warning))
+			pr_err("random: %s: getrandom called too early\n",
+			       current->comm);
 	}
 	return urandom_read(NULL, buf, count, NULL);
 }
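
To summarize the last hunk, the decision the patched getrandom(2) makes can be modeled in plain C. This is a userspace model for reading along, not kernel code: `model_getrandom`, `enum getrandom_action` and `MODEL_GRND_NONBLOCK` are hypothetical names, and only the branch shown in the hunk is covered (flag validation and GRND_RANDOM handling are outside it).

```c
#include <stdbool.h>

#define MODEL_GRND_NONBLOCK 0x0001u	/* stand-in for GRND_NONBLOCK */

enum getrandom_action {
	ACT_READ,		/* CRNG ready: plain urandom_read() */
	ACT_EAGAIN,		/* return -EAGAIN to the caller */
	ACT_BLOCK_THEN_READ,	/* warn, wait_for_random_bytes(), then read */
	ACT_WARN_THEN_READ	/* warn, then return best-effort data */
};

/* getrandom_block mirrors the value set by CONFIG_RANDOM_BLOCK or by
 * random.getrandom_block= on the kernel command line. */
static enum getrandom_action model_getrandom(bool crng_ready,
					     unsigned int flags,
					     bool getrandom_block)
{
	if (crng_ready)
		return ACT_READ;
	if (flags & MODEL_GRND_NONBLOCK)
		return ACT_EAGAIN;
	return getrandom_block ? ACT_BLOCK_THEN_READ : ACT_WARN_THEN_READ;
}
```

Note that in the patch the WARN_ON_ONCE(1) fires in both of the last two cases, so the kernel warning appears whichever way the option is set.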