Message ID | 20190915081747.GA1058@darwi-home-pc |
---|---|
State | Not Applicable |
Series | [RFC,v3] random: getrandom(2): optionally block when CRNG is uninitialized |
On Sun, 15.09.19 10:17, Ahmed S. Darwish (darwish.07@gmail.com) wrote:

> Thus, don't trust user-space on calling getrandom(2) from the right
> context. Never block, by default, and just return data from the
> urandom source if entropy is not yet available. This is an explicit
> decision not to let user-space work around this through busy loops on
> error-codes.
>
> Note: this lowers the quality of random data returned by getrandom(2)
> to the level of randomness returned by /dev/urandom, with all the
> original security implications coming out of that, as discussed in
> problem "3." at the top of this commit log. If this is not desirable,
> offer users a fallback to old behavior, by CONFIG_RANDOM_BLOCK=y, or
> random.getrandom_block=true bootparam.

This is an awful idea. It just means that all crypto that needs
entropy during early boot will now be using weak keys, and doesn't
even know it.

Yeah, it's a bad situation, but I am very sure that failing loudly in
this case is better than just sticking your head in the sand. Ignoring
the issue without letting userspace know is an exceptionally bad idea.

We live in a world where people run HTTPS, SSH, and all that stuff in
the initrd already. It's where SSH host keys are generated, and plenty
session keys. If Linux lets all that stuff run with awful entropy then
you pretend things were secure while they actually aren't. It's much
better to fail loudly in that case, I am sure.

Quite frankly, I don't think this is something to fix in the
kernel. Let the people putting together systems deal with this. Let
them provide a creditable hw rng, and let them pay the price if they
don't.

Lennart

--
Lennart Poettering, Berlin
On Sun, Sep 15, 2019 at 10:59:07AM +0200, Lennart Poettering wrote:
> We live in a world where people run HTTPS, SSH, and all that stuff in
> the initrd already. It's where SSH host keys are generated, and plenty
> session keys.

It is exactly the type of crap that creates this situation: making
people developing such scripts believe that any random source was OK
to generate these, and as such forcing urandom to produce crypto-solid
randoms!

No, distro developers must know that it's not acceptable to
generate lifetime crypto keys from the early boot when no entropy is
available. At least with this change they will get an error returned
from getrandom() and will be able to ask the user to feed entropy, or
be able to say "it was impossible to generate the SSH key right now,
the daemon will only be started once it's possible", or "the SSH key
we produced will not be saved because it's not safe and is only usable
for this recovery session".

> If Linux lets all that stuff run with awful entropy then
> you pretend things were secure while they actually aren't. It's much
> better to fail loudly in that case, I am sure.

This is precisely what this change permits: fail instead of block
by default, and let applications decide based on the use case.

> Quite frankly, I don't think this is something to fix in the
> kernel.

As long as it offers a single API to return randoms, and that it is
not possible not to block for low-quality randoms, it needs to be
at least addressed there. Then userspace can adapt. For now userspace
does not have this option just due to the kernel's way of exposing
randoms.

Willy
On Sun, Sep 15, 2019 at 11:30:57AM +0200, Willy Tarreau wrote:
> On Sun, Sep 15, 2019 at 10:59:07AM +0200, Lennart Poettering wrote:
> > We live in a world where people run HTTPS, SSH, and all that stuff in
> > the initrd already. It's where SSH host keys are generated, and plenty
> > session keys.
>
> It is exactly the type of crap that creates this situation: making
> people developing such scripts believe that any random source was OK
> to generate these, and as such forcing urandom to produce crypto-solid
> randoms!

Willy, let's tone it down please... the thread is already getting a
bit toxic.

> No, distro developers must know that it's not acceptable to
> generate lifetime crypto keys from the early boot when no entropy is
> available. At least with this change they will get an error returned
> from getrandom() and will be able to ask the user to feed entropy, or
> be able to say "it was impossible to generate the SSH key right now,
> the daemon will only be started once it's possible", or "the SSH key
> we produced will not be saved because it's not safe and is only usable
> for this recovery session".
>
> > If Linux lets all that stuff run with awful entropy then
> > you pretend things were secure while they actually aren't. It's much
> > better to fail loudly in that case, I am sure.
>
> This is precisely what this change permits: fail instead of block
> by default, and let applications decide based on the use case.
>

Unfortunately, not exactly.

Linus didn't want getrandom to return an error code / "to fail" in
that case, but to silently return CRNG-uninitialized /dev/urandom
data, to avoid user-space even working around the error code through
busy-loops.

I understand the rationale behind that, of course, and this is what
I've done so far in the V3 RFC.

Nonetheless, this _will_, for example, make systemd-random-seed(8)
save weak seeds under /var/lib/systemd/random-seed, since the kernel
didn't inform it about such weakness at all.

The situation is so bad now that it's more of "some user-space are
more equal than others". Let's just at least admit this while
discussing the RFC patch in question.

thanks,

> > Quite frankly, I don't think this is something to fix in the
> > kernel.
>
> As long as it offers a single API to return randoms, and that it is
> not possible not to block for low-quality randoms, it needs to be
> at least addressed there. Then userspace can adapt. For now userspace
> does not have this option just due to the kernel's way of exposing
> randoms.
>
> Willy
On Sun, Sep 15, 2019 at 12:02:01PM +0200, Ahmed S. Darwish wrote:
> On Sun, Sep 15, 2019 at 11:30:57AM +0200, Willy Tarreau wrote:
> > On Sun, Sep 15, 2019 at 10:59:07AM +0200, Lennart Poettering wrote:
> > > We live in a world where people run HTTPS, SSH, and all that stuff in
> > > the initrd already. It's where SSH host keys are generated, and plenty
> > > session keys.
> >
> > It is exactly the type of crap that creates this situation: making
> > people developing such scripts believe that any random source was OK
> > to generate these, and as such forcing urandom to produce crypto-solid
> > randoms!
>
> Willy, let's tone it down please... the thread is already getting a
> bit toxic.

I don't see what's wrong in my tone above, I'm sorry if it can be
perceived as such. My point was that things such as creating lifetime
keys while there's no entropy is the wrong thing to do and what
progressively led to this situation.

> > > If Linux lets all that stuff run with awful entropy then
> > > you pretend things were secure while they actually aren't. It's much
> > > better to fail loudly in that case, I am sure.
> >
> > This is precisely what this change permits: fail instead of block
> > by default, and let applications decide based on the use case.
> >
>
> Unfortunately, not exactly.
>
> Linus didn't want getrandom to return an error code / "to fail" in
> that case, but to silently return CRNG-uninitialized /dev/urandom
> data, to avoid user-space even working around the error code through
> busy-loops.

But with this EINVAL you have the information that it only filled
the buffer with whatever it could, right? At least that was the
last point I managed to catch in the discussion. Otherwise if it's
totally silent, I fear that it will reintroduce the problem in a
different form (i.e. libc will say "our randoms are not reliable
anymore, let us work around this and produce blocking, solid randoms
again to help all our users").

> I understand the rationale behind that, of course, and this is what
> I've done so far in the V3 RFC.
>
> Nonetheless, this _will_, for example, make systemd-random-seed(8)
> save weak seeds under /var/lib/systemd/random-seed, since the kernel
> didn't inform it about such weakness at all.

Then I am confused because I understood that the goal was to return
EINVAL or anything equivalent in which case the userspace knows what
it has to deal with :-/

Regards,
Willy
On Sun, Sep 15, 2019 at 12:40:27PM +0200, Willy Tarreau wrote:
> On Sun, Sep 15, 2019 at 12:02:01PM +0200, Ahmed S. Darwish wrote:
> > On Sun, Sep 15, 2019 at 11:30:57AM +0200, Willy Tarreau wrote:
> > > On Sun, Sep 15, 2019 at 10:59:07AM +0200, Lennart Poettering wrote:
[...]
> > > > If Linux lets all that stuff run with awful entropy then
> > > > you pretend things were secure while they actually aren't. It's much
> > > > better to fail loudly in that case, I am sure.
> > >
> > > This is precisely what this change permits: fail instead of block
> > > by default, and let applications decide based on the use case.
> > >
> >
> > Unfortunately, not exactly.
> >
> > Linus didn't want getrandom to return an error code / "to fail" in
> > that case, but to silently return CRNG-uninitialized /dev/urandom
> > data, to avoid user-space even working around the error code through
> > busy-loops.
>
> But with this EINVAL you have the information that it only filled
> the buffer with whatever it could, right? At least that was the
> last point I managed to catch in the discussion. Otherwise if it's
> totally silent, I fear that it will reintroduce the problem in a
> different form (i.e. libc will say "our randoms are not reliable
> anymore, let us work around this and produce blocking, solid randoms
> again to help all our users").
>

V1 of the patch I posted did indeed return -EINVAL. Linus then
suggested that this might still make some user-space act smart and
just busy-loop around that, basically blocking the boot again:

    https://lkml.kernel.org/r/CAHk-=wiB0e_uGpidYHf+dV4eeT+XmG-+rQBx=JJ110R48QFFWw@mail.gmail.com
    https://lkml.kernel.org/r/CAHk-=whSbo=dBiqozLoa6TFmMgbeB8d9krXXvXBKtpRWkG0rMQ@mail.gmail.com

So it was then requested to actually return what /dev/urandom would
return, so that user-space has no way whatsoever of knowing if
getrandom has failed. Then, it's the job of system integrators / BSP
builders to inspect the big fat WARN in the kernel log and fix that.

This is the core of Lennart's critique of V3 above.

> > I understand the rationale behind that, of course, and this is what
> > I've done so far in the V3 RFC.
> >
> > Nonetheless, this _will_, for example, make systemd-random-seed(8)
> > save weak seeds under /var/lib/systemd/random-seed, since the kernel
> > didn't inform it about such weakness at all.
>
> Then I am confused because I understood that the goal was to return
> EINVAL or anything equivalent in which case the userspace knows what
> it has to deal with :-/
>

Yeah, the discussion moved a bit beyond that.

thanks,

--darwi
On Sun, Sep 15, 2019 at 12:55:39PM +0200, Ahmed S. Darwish wrote:
> On Sun, Sep 15, 2019 at 12:40:27PM +0200, Willy Tarreau wrote:
> > On Sun, Sep 15, 2019 at 12:02:01PM +0200, Ahmed S. Darwish wrote:
> > > On Sun, Sep 15, 2019 at 11:30:57AM +0200, Willy Tarreau wrote:
> > > > On Sun, Sep 15, 2019 at 10:59:07AM +0200, Lennart Poettering wrote:
> [...]
> > > > > If Linux lets all that stuff run with awful entropy then
> > > > > you pretend things were secure while they actually aren't. It's much
> > > > > better to fail loudly in that case, I am sure.
> > > >
> > > > This is precisely what this change permits: fail instead of block
> > > > by default, and let applications decide based on the use case.
> > > >
> > >
> > > Unfortunately, not exactly.
> > >
> > > Linus didn't want getrandom to return an error code / "to fail" in
> > > that case, but to silently return CRNG-uninitialized /dev/urandom
> > > data, to avoid user-space even working around the error code through
> > > busy-loops.
> >
> > But with this EINVAL you have the information that it only filled
> > the buffer with whatever it could, right? At least that was the
> > last point I managed to catch in the discussion. Otherwise if it's
> > totally silent, I fear that it will reintroduce the problem in a
> > different form (i.e. libc will say "our randoms are not reliable
> > anymore, let us work around this and produce blocking, solid randoms
> > again to help all our users").
> >
>
> V1 of the patch I posted did indeed return -EINVAL. Linus then
> suggested that this might still make some user-space act smart and
> just busy-loop around that, basically blocking the boot again:
>
>     https://lkml.kernel.org/r/CAHk-=wiB0e_uGpidYHf+dV4eeT+XmG-+rQBx=JJ110R48QFFWw@mail.gmail.com
>     https://lkml.kernel.org/r/CAHk-=whSbo=dBiqozLoa6TFmMgbeB8d9krXXvXBKtpRWkG0rMQ@mail.gmail.com
>
> So it was then requested to actually return what /dev/urandom would
> return, so that user-space has no way whatsoever of knowing if
> getrandom has failed. Then, it's the job of system integrators / BSP
> builders to inspect the big fat WARN in the kernel log and fix
> that.

Then I was indeed a bit confused in the middle of the discussion as
I didn't understand exactly this, thanks for clarifying :-)

But does it still block when called with GRND_RANDOM? If so I guess
I'm fine as it translates exactly the previous behavior of random vs
urandom, and that GRND_NONBLOCK allows the application to fall back
to reliable sources if needed (typically human interactions).

Thanks,
Willy
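The GRND_NONBLOCK path mentioned above is left unchanged by the patch (the hunk in random.c still returns -EAGAIN when the CRNG is not ready), so an application that wants to "fail loudly" can still do so itself. The following is a minimal user-space sketch of that idea, not code from the thread: the helper name `try_get_seed` and the `seed_status` enum are hypothetical, and it assumes a glibc that provides `<sys/random.h>` (2.25 or later).

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/random.h>
#include <sys/types.h>

/* Hypothetical result codes, introduced only for this illustration. */
enum seed_status { SEED_OK, SEED_NOT_READY, SEED_ERROR };

/*
 * Try to fill buf with len bytes without ever blocking.
 * Returns SEED_NOT_READY when the kernel reports (via EAGAIN) that the
 * CRNG is not initialized yet, so the caller can defer key generation,
 * ask the user for input, or refuse to persist the key.
 */
enum seed_status try_get_seed(unsigned char *buf, size_t len)
{
	size_t done = 0;

	assert(buf != NULL);

	while (done < len) {
		ssize_t n = getrandom(buf + done, len - done, GRND_NONBLOCK);

		if (n < 0) {
			if (errno == EINTR)
				continue;		/* interrupted by a signal: retry */
			if (errno == EAGAIN)
				return SEED_NOT_READY;	/* CRNG not initialized yet */
			return SEED_ERROR;
		}
		done += (size_t)n;
	}
	return SEED_OK;
}
```

A caller generating a long-lived key would treat `SEED_NOT_READY` as "try again later" rather than spinning on it, since a busy loop is exactly the workaround the thread is trying to discourage.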
diff --git a/drivers/char/Kconfig b/drivers/char/Kconfig
index 3e866885a405..337baeca5ebc 100644
--- a/drivers/char/Kconfig
+++ b/drivers/char/Kconfig
@@ -557,8 +557,6 @@ config ADI
 	  and SSM (Silicon Secured Memory). Intended consumers of this
 	  driver include crash and makedumpfile.
 
-endmenu
-
 config RANDOM_TRUST_CPU
 	bool "Trust the CPU manufacturer to initialize Linux's CRNG"
 	depends on X86 || S390 || PPC
@@ -573,3 +571,34 @@ config RANDOM_TRUST_CPU
 	has not installed a hidden back door to compromise the CPU's
 	random number generation facilities. This can also be configured
 	at boot with "random.trust_cpu=on/off".
+
+config RANDOM_BLOCK
+	bool "Block if getrandom is called before CRNG is initialized"
+	help
+	  Say Y here if you want userspace programs which call
+	  getrandom(2) before the Cryptographic Random Number
+	  Generator (CRNG) is initialized to block until
+	  secure random numbers are available.
+
+	  Say N if you believe usability is more important than
+	  security, so if getrandom(2) is called before the CRNG is
+	  initialized, it should not block, but instead return "best
+	  effort" randomness which might not be very secure or random
+	  at all; but at least the system boot will not be delayed by
+	  minutes or hours.
+
+	  This can also be controlled at boot with
+	  "random.getrandom_block=on/off".
+
+	  Ideally, systems would be configured with hardware random
+	  number generators, and/or configured to trust CPU-provided
+	  RNG's. In addition, userspace should generate cryptographic
+	  keys only as late as possible, when they are needed, instead
+	  of during early boot. (For non-cryptographic use cases,
+	  such as dictionary seeds or MIT Magic Cookies, other
+	  mechanisms such as /dev/urandom or random(3) may be more
+	  appropriate.) This config option controls what the
+	  kernel should do as a fallback when the non-ideal case
+	  presents itself.
+
+endmenu
diff --git a/drivers/char/random.c b/drivers/char/random.c
index 4a50ee2c230d..689fdb486785 100644
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -511,6 +511,8 @@ static struct ratelimit_state unseeded_warning =
 	RATELIMIT_STATE_INIT("warn_unseeded_randomness", HZ, 3);
 static struct ratelimit_state urandom_warning =
 	RATELIMIT_STATE_INIT("warn_urandom_randomness", HZ, 3);
+static struct ratelimit_state getrandom_warning =
+	RATELIMIT_STATE_INIT("warn_getrandom_randomness", HZ, 3);
 
 static int ratelimit_disable __read_mostly;
 
@@ -854,12 +856,19 @@ static void invalidate_batched_entropy(void);
 static void numa_crng_init(void);
 
 static bool trust_cpu __ro_after_init = IS_ENABLED(CONFIG_RANDOM_TRUST_CPU);
+static bool getrandom_block __ro_after_init = IS_ENABLED(CONFIG_RANDOM_BLOCK);
 static int __init parse_trust_cpu(char *arg)
 {
 	return kstrtobool(arg, &trust_cpu);
 }
 early_param("random.trust_cpu", parse_trust_cpu);
 
+static int __init parse_block(char *arg)
+{
+	return kstrtobool(arg, &getrandom_block);
+}
+early_param("random.getrandom_block", parse_block);
+
 static void crng_initialize(struct crng_state *crng)
 {
 	int i;
@@ -1053,6 +1062,12 @@ static void crng_reseed(struct crng_state *crng, struct entropy_store *r)
 				  urandom_warning.missed);
 			urandom_warning.missed = 0;
 		}
+		if (getrandom_warning.missed) {
+			pr_notice("random: %d getrandom warning(s) missed "
+				  "due to ratelimiting\n",
+				  getrandom_warning.missed);
+			getrandom_warning.missed = 0;
+		}
 	}
 }
 
@@ -1915,6 +1930,7 @@ int __init rand_initialize(void)
 	crng_global_init_time = jiffies;
 	if (ratelimit_disable) {
 		urandom_warning.interval = 0;
+		getrandom_warning.interval = 0;
 		unseeded_warning.interval = 0;
 	}
 	return 0;
@@ -1984,8 +2000,8 @@ urandom_read(struct file *file, char __user *buf, size_t nbytes, loff_t *ppos)
 	if (!crng_ready() && maxwarn > 0) {
 		maxwarn--;
 		if (__ratelimit(&urandom_warning))
-			printk(KERN_NOTICE "random: %s: uninitialized "
-			       "urandom read (%zd bytes read)\n",
+			pr_err("random: %s: CRNG uninitialized "
+			       "(%zd bytes read)\n",
 			       current->comm, nbytes);
 		spin_lock_irqsave(&primary_crng.lock, flags);
 		crng_init_cnt = 0;
@@ -2152,9 +2168,16 @@ SYSCALL_DEFINE3(getrandom, char __user *, buf, size_t, count,
 	if (!crng_ready()) {
 		if (flags & GRND_NONBLOCK)
 			return -EAGAIN;
-		ret = wait_for_random_bytes();
-		if (unlikely(ret))
-			return ret;
+
+		if (__ratelimit(&getrandom_warning))
+			pr_err("random: %s: getrandom (%zd bytes): CRNG not "
+			       "yet initialized", current->comm, count);
+
+		if (getrandom_block) {
+			ret = wait_for_random_bytes();
+			if (unlikely(ret))
+				return ret;
+		}
 	}
 	return urandom_read(NULL, buf, count, NULL);
 }
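To make the control flow of the last random.c hunk easier to follow, here is a small user-space model of the decision it encodes. This is only a sketch under stated assumptions: it covers just the branch shown in the hunk (the GRND_RANDOM path and the ratelimiting are ignored), and the names `model_getrandom`, `MODEL_GRND_NONBLOCK` and the outcome enum are hypothetical, not kernel symbols.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in for GRND_NONBLOCK; the model knows no other flag. */
#define MODEL_GRND_NONBLOCK 0x0001u

/* Hypothetical outcome labels for the patched getrandom(2) path. */
enum getrandom_outcome {
	OUT_RETURN_DATA,		/* CRNG ready: normal urandom_read() path */
	OUT_EAGAIN,			/* GRND_NONBLOCK and CRNG not ready */
	OUT_WARN_AND_BLOCK,		/* log, then wait_for_random_bytes() */
	OUT_WARN_AND_RETURN_UNSEEDED,	/* log, then return unseeded data */
};

/*
 * Model of the branch in the hunk above:
 *   - crng_ready:      result of crng_ready()
 *   - flags:           getrandom(2) flags, restricted to the model flag
 *   - getrandom_block: CONFIG_RANDOM_BLOCK / random.getrandom_block
 */
enum getrandom_outcome
model_getrandom(bool crng_ready, unsigned int flags, bool getrandom_block)
{
	assert((flags & ~MODEL_GRND_NONBLOCK) == 0);

	if (crng_ready)
		return OUT_RETURN_DATA;
	if (flags & MODEL_GRND_NONBLOCK)
		return OUT_EAGAIN;
	/* Both remaining cases first emit the (ratelimited) pr_err(). */
	return getrandom_block ? OUT_WARN_AND_BLOCK
			       : OUT_WARN_AND_RETURN_UNSEEDED;
}
```

The last case is the one the review comments object to: the caller receives data and a success return, and the only trace of the uninitialized CRNG is the kernel log line.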
In Linux v3.17, getrandom() was introduced as a new and more secure
interface for pseudorandom data requests. It attempted to solve three
problems as compared to /dev/urandom:

  1. the need to access filesystem paths, which can fail, e.g. under a
     chroot

  2. the need to open a file descriptor, which can fail under file
     descriptor exhaustion attacks

  3. the possibility to get not-so-random data from /dev/urandom, due
     to an incompletely initialized kernel entropy pool

To solve the third problem, getrandom(2) was made to block until a
proper amount of entropy has been accumulated. This basically made the
system call have no guaranteed upper-bound for its waiting time.

As was said in c6e9d6f38894 (random: introduce getrandom(2) system
call): "Any userspace program which uses this new functionality must
take care to assure that if it is used during the boot process, that
it will not cause the init scripts or other portions of the system
startup to hang indefinitely."

Meanwhile, user-facing Linux documentation, e.g. the urandom(4) and
getrandom(2) manpages, didn't add such explicit warnings. It also
didn't help that glibc, since v2.25, implemented an "OpenBSD-like"
getentropy(3) in terms of getrandom(2). OpenBSD getentropy(2) never
blocked though, while the Linux/glibc version did, possibly
indefinitely. Since that glibc change, even more applications in the
boot path began to implicitly request randomness through getrandom(2);
e.g., for an Xorg/Xwayland MIT cookie.

OpenBSD getentropy(2) never blocked because, as stated in its rnd(4)
manpages, it saves entropy to disk on shutdown and restores it on
boot. Moreover, the NetBSD bootloader, as shown in its boot(8), even
has special commands to load a random seed file and pass it to the
kernel. Meanwhile on a Linux systemd userland, systemd-random-seed(8)
preserved a random seed across reboots at /var/lib/systemd/random-seed,
but it never had the actual code, until very recently at v243, to ask
the kernel to credit such entropy through an RNDADDENTROPY ioctl.

From a mix of the above factors, it began to be common for Embedded
Linux systems to "get stuck at boot" unless a daemon like haveged is
installed, or the BSP provider enables the necessary hwrng driver in
question and credits its entropy; e.g. 62f95ae805fa (hwrng: omap - Set
default quality). Over time, the issue began to even creep into
consumer-level x86 laptops: mainstream distributions, like Debian
buster, began to recommend installing haveged as a workaround.

Thus, on certain setups where there is no hwrng (embedded systems or
VMs on a host lacking virtio-rng), or the hwrng is not trusted by some
users (intel RDRAND), or sometimes it's just broken (amd RDRAND), the
system boot can be *reliably* blocked.

It can therefore be argued that there is no way to use getrandom() on
Linux correctly, especially from shared libraries: GRND_NONBLOCK has
to be used, and a fallback to some other interface like /dev/urandom
is required, thus making the net result no better than just using
/dev/urandom unconditionally.

The issue is further exacerbated by recent file-system optimizations,
e.g. b03755ad6f33 (ext4: make __ext4_get_inode_loc plug), which merges
inode table IO issued from the directory lookup code, and thus
minimizes the number of disk interrupts and entropy during boot. After
that commit, a blocked boot can be reliably reproduced on a Thinkpad
E480 laptop with standard ArchLinux user-space.

Thus, don't trust user-space on calling getrandom(2) from the right
context. Never block, by default, and just return data from the
urandom source if entropy is not yet available. This is an explicit
decision not to let user-space work around this through busy loops on
error-codes.

Note: this lowers the quality of random data returned by getrandom(2)
to the level of randomness returned by /dev/urandom, with all the
original security implications coming out of that, as discussed in
problem "3." at the top of this commit log. If this is not desirable,
offer users a fallback to old behavior, by CONFIG_RANDOM_BLOCK=y, or
random.getrandom_block=true bootparam.

[tytso@mit.edu: make the change to a non-blocking getrandom(2) optional]
Link: https://lkml.kernel.org/r/20190914222432.GC19710@mit.edu
Link: https://lkml.kernel.org/r/20190911173624.GI2740@mit.edu
Link: https://factorable.net ("Widespread Weak Keys in Network Devices")
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lkml.kernel.org/r/CAHk-=wjyH910+JRBdZf_Y9G54c1M=LBF8NKXB6vJcm9XjLnRfg@mail.gmail.com
Reported-by: Ahmed S. Darwish <darwish.07@gmail.com>
Link: https://lkml.kernel.org/r/20190912034421.GA2085@darwi-home-pc
Signed-off-by: Ahmed S. Darwish <darwish.07@gmail.com>
---

Notes:
    changelog-v2:
      - tytso: make blocking optional

    changelog-v3:
      - more detailed commit log + historical context (thanks patrakov)
      - remove WARN_ON_ONCE. It's pretty excessive, and the first caller
        is systemd-random-seed(8), which we know will not change.
        Just print errors in the kernel log.
    $ dmesg | grep random:
    [0.235843] random: get_random_bytes called from start_kernel+0x30f/0x4d7 with crng_init=0
    [0.685682] random: fast init done
    [2.405263] random: lvm: CRNG uninitialized (4 bytes read)
    [2.480686] random: systemd-random-: getrandom (512 bytes): CRNG not yet initialized
    [2.480687] random: systemd-random-: CRNG uninitialized (512 bytes read)
    [3.265201] random: dbus-daemon: CRNG uninitialized (12 bytes read)
    [3.835066] urandom_read: 1 callbacks suppressed
    [3.835068] random: polkitd: CRNG uninitialized (8 bytes read)
    [3.835509] random: polkitd: CRNG uninitialized (8 bytes read)
    [3.835577] random: polkitd: CRNG uninitialized (8 bytes read)
    [4.190653] random: gnome-session-b: getrandom (16 bytes): CRNG not yet initialized
    [4.190658] random: gnome-session-b: getrandom (16 bytes): CRNG not yet initialized
    [4.190662] random: gnome-session-b: getrandom (16 bytes): CRNG not yet initialized
    [4.952299] random: crng init done
    [4.952311] random: 3 urandom warning(s) missed due to ratelimiting
    [4.952314] random: 1 getrandom warning(s) missed due to ratelimiting

 drivers/char/Kconfig  | 33 +++++++++++++++++++++++++++++++--
 drivers/char/random.c | 33 ++++++++++++++++++++++++++++-----
 2 files changed, 59 insertions(+), 7 deletions(-)
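The commit log mentions crediting a saved seed to the kernel through the RNDADDENTROPY ioctl. As a rough illustration of what that involves, the sketch below builds the `struct rand_pool_info` argument from `<linux/random.h>` and passes it to the ioctl. It is not systemd's code: the helper names `make_pool_info` and `credit_seed` are hypothetical, how much entropy to credit is a policy decision left to the caller, and the ioctl itself needs CAP_SYS_ADMIN on a descriptor for /dev/random or /dev/urandom.

```c
#include <assert.h>
#include <limits.h>
#include <linux/random.h>
#include <stdlib.h>
#include <string.h>
#include <sys/ioctl.h>

/*
 * Allocate and fill the RNDADDENTROPY argument.
 *   seed/len:     seed bytes to mix into the pool
 *   entropy_bits: how many bits of entropy the caller claims for them
 * Returns NULL on allocation failure; the caller frees the result.
 */
struct rand_pool_info *make_pool_info(const void *seed, size_t len,
				      int entropy_bits)
{
	struct rand_pool_info *info;

	assert(seed != NULL && len > 0 && len <= INT_MAX);

	info = calloc(1, sizeof(*info) + len);
	if (!info)
		return NULL;

	info->entropy_count = entropy_bits;	/* in bits */
	info->buf_size = (int)len;		/* in bytes */
	memcpy(info->buf, seed, len);
	return info;
}

/* Hand the seed to the kernel; returns the ioctl() result (0 or -1). */
int credit_seed(int random_fd, const struct rand_pool_info *info)
{
	return ioctl(random_fd, RNDADDENTROPY, info);
}
```

Writing the same bytes to /dev/urandom would mix them into the pool without crediting any entropy, which is why a seed file restored that way does not by itself let the CRNG become initialized.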