[RFC] Support explicit_bzero, memset_s, memzero_explicit, or similar.
diff mbox

Message ID CAKDKvuzWYf3GcXYs4ED8XLyy58nzmvxRV84xwsKKZjPpVSFQug@mail.gmail.com
State New
Headers show

Commit Message

Nick Mathewson Dec. 15, 2014, 2:21 p.m. UTC
Hello!

(This is my first attempt at a glibc RFC/patch post.  I'm hoping I can
do more in the future, if this works out.)


It's frequently desirable in secure C programming to cause a piece
of memory to be erased in a way that the compiler is not allowed to
optimize away.

For example, if you're writing a program that needs to perform a
single operation with a high-value password cryptographic key, you
might want to make sure that the key is no longer resident in memory
once you're done with it.  But this pattern is inadequate to do so:

    int password_is_okay(void)
    {
       int result = -1;
       char passwordbuf[MAX_PASSWORD + 1];
       if (read_password(passwordbuf, MAX_PASSWORD) < 0)
         goto err;
       if (password_matches(passwordbuf) < 0)
         goto err;

       result = 0;
     err:
       memset(passwordbuf, 0, sizeof(passwordbuf));
       return result;
    }

The pattern above fails because the memset call is a dead store, and
the compiler is allowed to eliminate it.


Different operating systems, libraries, and standards have
approached this problem differently.  Windows provides
      SecureZeroMemory(void *, size_t);
The BSD family is moving towards
      explicit_bzero(void *, size_t);
And the C11 standard defines
      memset_s(void *, rsize_t, int, size_t);
And OpenSSL (along with LibreSSL and BoringSSL) provide:
      OPENSSL_cleanse(void *, size_t);
And in the Linux kernel these says, they're using:
      memzero_explicit(void *, size_t);


These APIs are in wide use where available:
     http://codesearch.debian.net/results/memset_s/page_0
     http://codesearch.debian.net/results/explicit_bzero/page_0
especially in cryptographic and security-related tools.  In the
absence of these APIs, programs typically try to cobble their own
implementations together -- typically using some combination of
volatile and explicit barriers.


I propose that glibc add such an API: either memset_s (which has a
standard backing it), or explicit_bzero (which has been around
longer, and is pretty widely used), or memzero_explicit (if we feel
strongly that even mentioning bzero is deprecated.)


Now, at first glance it would seem that that memset_s() is the
obvious choice, since it's the one supported by a standard.  But
it's a part of a much larger pile of bounds-checking variants of
other C functions (C11 Annex K).  It's *possible* to
implement memset_s() without the rest of Annex K, but it requires a
bit of infrastructure to be fully compliant.  (Like, we would need
an rsize_t, and a set_constraint_handler_s().  I do not believe we
would need a full implementation of Annex K.)

It's possible to do a non-compliant memset_s() implementation, and
not unprecedented.  The OSX version, for instance, doesn't include
set_constraint_handler_s(), or any other Annex K functions as far as
I can tell.

On the other hand, if we think that memset_s() is the only useful
part of Annex K, maybe memset_s() isn't the variant for us?



I'm including an example manpage for a somewhat-compliant
memset_s(), and one for an explicit_bzero.

I'm also attaching an incomplete explicit_bzero implementation patch
-- this is my first glibc patch, so I have probably made many
mistakes.  I can write an memset_s one instead if needed.


I hereby dedicate this email, the patch attached to it, and all
later versions of this patch written by me to the public domain.


========== variant 1
NAME
       memset_s - fill memory with a constant byte, securely.

SYNOPSIS

       #define __STDC_WANT_LIB_EXT1__ 1
       #include <string.h>

       errno_t memset_s(void *s, rsize_t szmax, int c, rsize_t n);

DESCRIPTION
       The memset() function fills the first n bytes of the memory
       area pointed to by s with the constant byte c.

       The szmax value indicates the length of the memory area at s;
       it must be no smaller than the value of n.  If it is, a
       runtime constraint violation occurs.

       Unlike memset(), the memset_s() function will not be removed
       by the compiler, even if it can prove that there are no
       subsequent legal reads to the memory in s.

RETURN VALUE
       The memset_s() function zero if no constraint violation
       occurred.  Otherwise, it returns an error.

CONFORMING TO
       C11 Annex K specifies a memset_s().  This implementation
       does not provide set_constraint_handler_s().

SEE ALSO
       bzero(3), memset(3)
==========




========== variant 2
NAME
       explicit_bzero - fill memory with zero, securely.

SYNOPSIS
       #include <string.h>

       void explicit_bzero(void *s, size_t n);

DESCRIPTION
       The explicit_bzero() function sets the first n bytes of the
       memory area starting at 's' to zero (bytes containing '\0').

       Unlike memset() and the regular bzero() function, the
       explicit_bzero() function will not be removed by the compiler,
       even if it can prove that there are no subsequent legal reads
       to the memory in s.

RETURN VALUE
       None.

CONFORMING TO
       explicit_bzero() first appeared in OpenBSD 5.5.

SEE ALSO
       bzero(3), memset(3)
==========

Comments

Rich Felker Dec. 15, 2014, 5:35 p.m. UTC | #1
On Mon, Dec 15, 2014 at 09:21:07AM -0500, Nick Mathewson wrote:
> Hello!
> 
> (This is my first attempt at a glibc RFC/patch post.  I'm hoping I can
> do more in the future, if this works out.)
> 
> 
> It's frequently desirable in secure C programming to cause a piece
> of memory to be erased in a way that the compiler is not allowed to
> optimize away.
> 
> For example, if you're writing a program that needs to perform a
> single operation with a high-value password cryptographic key, you
> might want to make sure that the key is no longer resident in memory
> once you're done with it.  But this pattern is inadequate to do so:
> 
>     int password_is_okay(void)
>     {
>        int result = -1;
>        char passwordbuf[MAX_PASSWORD + 1];
>        if (read_password(passwordbuf, MAX_PASSWORD) < 0)
>          goto err;
>        if (password_matches(passwordbuf) < 0)
>          goto err;
> 
>        result = 0;
>      err:
>        memset(passwordbuf, 0, sizeof(passwordbuf));
>        return result;
>     }
> 
> The pattern above fails because the memset call is a dead store, and
> the compiler is allowed to eliminate it.
> 
> 
> Different operating systems, libraries, and standards have
> approached this problem differently.  Windows provides
>       SecureZeroMemory(void *, size_t);
> The BSD family is moving towards
>       explicit_bzero(void *, size_t);
> And the C11 standard defines
>       memset_s(void *, rsize_t, int, size_t);
> And OpenSSL (along with LibreSSL and BoringSSL) provide:
>       OPENSSL_cleanse(void *, size_t);
> And in the Linux kernel these says, they're using:
>       memzero_explicit(void *, size_t);

None of these solve the problem, because the compiler is free to have
copied part of all of this buffer into other temporary storage on the
stack or registers. This is especially the case if SIMD optimizations
are used. Solving the problem (which isn't really even a problem, just
a hardening consideration) correctly requires compiler features.

> Now, at first glance it would seem that that memset_s() is the
> obvious choice, since it's the one supported by a standard.  But
> it's a part of a much larger pile of bounds-checking variants of
> other C functions (C11 Annex K).  It's *possible* to
> implement memset_s() without the rest of Annex K, but it requires a
> bit of infrastructure to be fully compliant.  (Like, we would need
> an rsize_t, and a set_constraint_handler_s().  I do not believe we
> would need a full implementation of Annex K.)

errno_t is defined (by Annex K) as int and rsize_t is defined as
size_t, so we could just omit these types entirely and use the
standard types (int and size_t) and still not conflict with the Annex
K definition. There's no reason to use or even provide the ugly and
useless MS/Annex K typedefs.

> It's possible to do a non-compliant memset_s() implementation, and
> not unprecedented.  The OSX version, for instance, doesn't include
> set_constraint_handler_s(), or any other Annex K functions as far as
> I can tell.
> 
> On the other hand, if we think that memset_s() is the only useful
> part of Annex K, maybe memset_s() isn't the variant for us?

There are some other functions from Annex K that are possibly
desirable to adopt like qsort_s. IMO an ideal response to Annex K
would be to adopt the useful/usable subset of it, without stupid
things like constraint handlers, and aim to get those standardized in
POSIX while the rest is left to rot (and maybe even get deprecated in
a future version of ISO C).

Rich
Paul Eggert Dec. 15, 2014, 9:29 p.m. UTC | #2
On 12/15/2014 09:35 AM, Rich Felker wrote:
> None of these solve the problem, because the compiler is free to have
> copied part of all of this buffer into other temporary storage on the
> stack or registers.

Yes, this is not something that can be solved just at the C library 
level.  It's a big problem, that requires OS and compiler support (and 
maybe even hardware support).  See, for example, Anikeev et al's paper 
on secure garbage collection 
<http://dx.doi.org/10.1016/j.jisa.2014.10.001> or Chow et al's classic 
paper on shredding one's garbage 
<https://www.usenix.org/legacy/event/sec05/tech/full_papers/chow/chow_html/>. 
<http://dx.doi.org/10.1016/j.jisa.2014.10.001>

By the way, shouldn't one set memory to a random bitpattern rather than 
simply clearing it?
<http://dx.doi.org/10.1016/j.jisa.2014.10.001>
Rich Felker Dec. 15, 2014, 9:38 p.m. UTC | #3
On Mon, Dec 15, 2014 at 01:29:59PM -0800, Paul Eggert wrote:
> On 12/15/2014 09:35 AM, Rich Felker wrote:
> >None of these solve the problem, because the compiler is free to have
> >copied part of all of this buffer into other temporary storage on the
> >stack or registers.
> 
> Yes, this is not something that can be solved just at the C library
> level.  It's a big problem, that requires OS and compiler support
> (and maybe even hardware support).  See, for example, Anikeev et
> al's paper on secure garbage collection
> <http://dx.doi.org/10.1016/j.jisa.2014.10.001> or Chow et al's
> classic paper on shredding one's garbage <https://www.usenix.org/legacy/event/sec05/tech/full_papers/chow/chow_html/>.
> <http://dx.doi.org/10.1016/j.jisa.2014.10.001>
> 
> By the way, shouldn't one set memory to a random bitpattern rather
> than simply clearing it?
> <http://dx.doi.org/10.1016/j.jisa.2014.10.001>

Only if you also wrap your device in tinfoil. Seriously, these
measures are about preventing information disclosure in the event of
subsequent compromise of the process (or possibly even normal behavior
following setuid() to drop privs), not defending against physical
attacks to recover previous state of memory/disks. If you agree with
my assessment of the scope, it's solvable purely at the compiler level
without any special OS or hardware level support. This is because you
don't have to ensure that the information is fully destroyed, just
that the process can no longer access it, and you get to assume all
the normal kernel- and hardware-enforced access controls apply.

Rich
Nick Mathewson Dec. 15, 2014, 10:57 p.m. UTC | #4
(sorry; sent this before, but from the wrong address.)

On Mon, Dec 15, 2014 at 12:35 PM, Rich Felker <dalias@libc.org> wrote:

Thanks for the reply, Rich!

>> Different operating systems, libraries, and standards have
>> approached this problem differently.  Windows provides
>>       SecureZeroMemory(void *, size_t);
>> The BSD family is moving towards
>>       explicit_bzero(void *, size_t);
>> And the C11 standard defines
>>       memset_s(void *, rsize_t, int, size_t);
>> And OpenSSL (along with LibreSSL and BoringSSL) provide:
>>       OPENSSL_cleanse(void *, size_t);
>> And in the Linux kernel these says, they're using:
>>       memzero_explicit(void *, size_t);
>
> None of these solve the problem, because the compiler is free to have
> copied part of all of this buffer into other temporary storage on the
> stack or registers. This is especially the case if SIMD optimizations
> are used. Solving the problem (which isn't really even a problem, just
> a hardening consideration) correctly requires compiler features.

I agree that a *complete* solution would require compiler features.
But nonetheless, a non-removable memset() equivalent is useful for
some hardening.  At  the very least, we could make sure that the
documentation explains this point.

It's not clear to me from your response, whether you think including
an explicit_bzero() or a memset_s() implementation is "worse than
nothing" because of the register and spilling issue, or whether you
think it would be worth providing anyway with appropriate caveats?

 [...]
>> Now, at first glance it would seem that that memset_s() is the
>> obvious choice, since it's the one supported by a standard.  But
>> it's a part of a much larger pile of bounds-checking variants of
>> other C functions (C11 Annex K).  It's *possible* to
>> implement memset_s() without the rest of Annex K, but it requires a
>> bit of infrastructure to be fully compliant.  (Like, we would need
>> an rsize_t, and a set_constraint_handler_s().  I do not believe we
>> would need a full implementation of Annex K.)
>
> errno_t is defined (by Annex K) as int and rsize_t is defined as
> size_t, so we could just omit these types entirely and use the
> standard types (int and size_t) and still not conflict with the Annex
> K definition. There's no reason to use or even provide the ugly and
> useless MS/Annex K typedefs.

Attached is an approximate (probably buggy, I am new to glibc)
implementation of the memset_s variant. I dedicate it (and this email)
to the public domain.

yrs,
Paul Eggert Dec. 16, 2014, 4:20 a.m. UTC | #5
Rich Felker wrote:

> Only if you also wrap your device in tinfoil.

Perhaps; but if the original request is for a tinfoil beanie, it's not 
unreasonable to suggest getting the full hat.

> these measures are about preventing information disclosure in the event of
> subsequent compromise of the process (or possibly even normal behavior
> following setuid() to drop privs), not defending against physical
> attacks to recover previous state of memory/disks.

If only things were so simple!  With virtual machines, there's not always a 
clean distinction between a compromise and a physical attack.

> If you agree with
> my assessment of the scope, it's solvable purely at the compiler level
> without any special OS or hardware level support.

Possibly, but there might be some assumptions involved that do require some 
support from lower levels.  It's something that would have to be vetted.  (As 
far as I know this has not been done for C.)

Anyway, I think we're in agreement that the suggested (library-only) approach 
does not suffice.
Rich Felker Dec. 16, 2014, 5:26 a.m. UTC | #6
On Mon, Dec 15, 2014 at 05:57:02PM -0500, Nick Mathewson wrote:
> (sorry; sent this before, but from the wrong address.)
> 
> On Mon, Dec 15, 2014 at 12:35 PM, Rich Felker <dalias@libc.org> wrote:
> 
> Thanks for the reply, Rich!
> 
> >> Different operating systems, libraries, and standards have
> >> approached this problem differently.  Windows provides
> >>       SecureZeroMemory(void *, size_t);
> >> The BSD family is moving towards
> >>       explicit_bzero(void *, size_t);
> >> And the C11 standard defines
> >>       memset_s(void *, rsize_t, int, size_t);
> >> And OpenSSL (along with LibreSSL and BoringSSL) provide:
> >>       OPENSSL_cleanse(void *, size_t);
> >> And in the Linux kernel these says, they're using:
> >>       memzero_explicit(void *, size_t);
> >
> > None of these solve the problem, because the compiler is free to have
> > copied part of all of this buffer into other temporary storage on the
> > stack or registers. This is especially the case if SIMD optimizations
> > are used. Solving the problem (which isn't really even a problem, just
> > a hardening consideration) correctly requires compiler features.
> 
> I agree that a *complete* solution would require compiler features.
> But nonetheless, a non-removable memset() equivalent is useful for
> some hardening.  At  the very least, we could make sure that the
> documentation explains this point.
> 
> It's not clear to me from your response, whether you think including
> an explicit_bzero() or a memset_s() implementation is "worse than
> nothing" because of the register and spilling issue, or whether you
> think it would be worth providing anyway with appropriate caveats?

I don't think it's necessarily "worse than nothing" as long as it's
documented that it does not solve the problem.

>  [...]
> >> Now, at first glance it would seem that that memset_s() is the
> >> obvious choice, since it's the one supported by a standard.  But
> >> it's a part of a much larger pile of bounds-checking variants of
> >> other C functions (C11 Annex K).  It's *possible* to
> >> implement memset_s() without the rest of Annex K, but it requires a
> >> bit of infrastructure to be fully compliant.  (Like, we would need
> >> an rsize_t, and a set_constraint_handler_s().  I do not believe we
> >> would need a full implementation of Annex K.)
> >
> > errno_t is defined (by Annex K) as int and rsize_t is defined as
> > size_t, so we could just omit these types entirely and use the
> > standard types (int and size_t) and still not conflict with the Annex
> > K definition. There's no reason to use or even provide the ugly and
> > useless MS/Annex K typedefs.
> 
> Attached is an approximate (probably buggy, I am new to glibc)
> implementation of the memset_s variant. I dedicate it (and this email)
> to the public domain.

I don't think the implementation as written is valid -- at least, not
if you allow LTO. The compiler barrier does not prevent the memset
from being optimized out unless the address of the buffer being memset
has been leaked to code the compiler cannot see. As long as it sees
that the asm has no way of observing the output of the memset, it can
optimize out the memset. Simply making the memset buffer visible to
the asm by passing its address (or better yet, it as a memory object)
in an asm constraint would probably fix this, but I'd like to have
someone from the GCC side confirm this.

Rich
Ondřej Bílka Dec. 16, 2014, 10:16 a.m. UTC | #7
On Tue, Dec 16, 2014 at 12:26:17AM -0500, Rich Felker wrote:
> On Mon, Dec 15, 2014 at 05:57:02PM -0500, Nick Mathewson wrote:
> > (sorry; sent this before, but from the wrong address.)
> > 
> > On Mon, Dec 15, 2014 at 12:35 PM, Rich Felker <dalias@libc.org> wrote:
> > 
> > Thanks for the reply, Rich!
> > 
> > >> Different operating systems, libraries, and standards have
> > >> approached this problem differently.  Windows provides
> > >>       SecureZeroMemory(void *, size_t);
> > >> The BSD family is moving towards
> > >>       explicit_bzero(void *, size_t);
> > >> And the C11 standard defines
> > >>       memset_s(void *, rsize_t, int, size_t);
> > >> And OpenSSL (along with LibreSSL and BoringSSL) provide:
> > >>       OPENSSL_cleanse(void *, size_t);
> > >> And in the Linux kernel these says, they're using:
> > >>       memzero_explicit(void *, size_t);
> > >
> > > None of these solve the problem, because the compiler is free to have
> > > copied part of all of this buffer into other temporary storage on the
> > > stack or registers. This is especially the case if SIMD optimizations
> > > are used. Solving the problem (which isn't really even a problem, just
> > > a hardening consideration) correctly requires compiler features.
> > 
> > I agree that a *complete* solution would require compiler features.
> > But nonetheless, a non-removable memset() equivalent is useful for
> > some hardening.  At  the very least, we could make sure that the
> > documentation explains this point.
> > 
> > It's not clear to me from your response, whether you think including
> > an explicit_bzero() or a memset_s() implementation is "worse than
> > nothing" because of the register and spilling issue, or whether you
> > think it would be worth providing anyway with appropriate caveats?
> 
> I don't think it's necessarily "worse than nothing" as long as it's
> documented that it does not solve the problem.
> 
> >  [...]
> > >> Now, at first glance it would seem that that memset_s() is the
> > >> obvious choice, since it's the one supported by a standard.  But
> > >> it's a part of a much larger pile of bounds-checking variants of
> > >> other C functions (C11 Annex K).  It's *possible* to
> > >> implement memset_s() without the rest of Annex K, but it requires a
> > >> bit of infrastructure to be fully compliant.  (Like, we would need
> > >> an rsize_t, and a set_constraint_handler_s().  I do not believe we
> > >> would need a full implementation of Annex K.)
> > >
> > > errno_t is defined (by Annex K) as int and rsize_t is defined as
> > > size_t, so we could just omit these types entirely and use the
> > > standard types (int and size_t) and still not conflict with the Annex
> > > K definition. There's no reason to use or even provide the ugly and
> > > useless MS/Annex K typedefs.
> > 
> > Attached is an approximate (probably buggy, I am new to glibc)
> > implementation of the memset_s variant. I dedicate it (and this email)
> > to the public domain.
> 
> I don't think the implementation as written is valid -- at least, not
> if you allow LTO. The compiler barrier does not prevent the memset
> from being optimized out unless the address of the buffer being memset
> has been leaked to code the compiler cannot see. As long as it sees
> that the asm has no way of observing the output of the memset, it can
> optimize out the memset. Simply making the memset buffer visible to
> the asm by passing its address (or better yet, it as a memory object)
> in an asm constraint would probably fix this, but I'd like to have
> someone from the GCC side confirm this.
> 
To do that it would require improve compiler a lot. Compiler now cannot 
optimize anything about shared library calls because using different
version of library could break any assumption it makes. To make it
working you would need add runtime verification that assumptions
compiler made about library still hold.

Now LTO could only affect static-linking of libc if we made LTOing libc
working.

If you are worried about that create new library that will be compiled
separately without LTO.
Richard Henderson Dec. 16, 2014, 4:24 p.m. UTC | #8
On 12/15/2014 11:26 PM, Rich Felker wrote:
> I don't think the implementation as written is valid -- at least, not
> if you allow LTO. The compiler barrier does not prevent the memset
> from being optimized out unless the address of the buffer being memset
> has been leaked to code the compiler cannot see. As long as it sees
> that the asm has no way of observing the output of the memset, it can
> optimize out the memset. Simply making the memset buffer visible to
> the asm by passing its address (or better yet, it as a memory object)
> in an asm constraint would probably fix this, but I'd like to have
> someone from the GCC side confirm this.

I believe a simple memory clobber (without even passing the buffer address)
should be sufficient.  The memory clobber is a very large hammer, indicating
that *all* memory is both read and written.  Thus the memset cannot be dead,
because its results may be read by the asm.


r~
Rich Felker Dec. 16, 2014, 6:01 p.m. UTC | #9
On Tue, Dec 16, 2014 at 10:24:17AM -0600, Richard Henderson wrote:
> On 12/15/2014 11:26 PM, Rich Felker wrote:
> > I don't think the implementation as written is valid -- at least, not
> > if you allow LTO. The compiler barrier does not prevent the memset
> > from being optimized out unless the address of the buffer being memset
> > has been leaked to code the compiler cannot see. As long as it sees
> > that the asm has no way of observing the output of the memset, it can
> > optimize out the memset. Simply making the memset buffer visible to
> > the asm by passing its address (or better yet, it as a memory object)
> > in an asm constraint would probably fix this, but I'd like to have
> > someone from the GCC side confirm this.
> 
> I believe a simple memory clobber (without even passing the buffer address)
> should be sufficient.  The memory clobber is a very large hammer, indicating
> that *all* memory is both read and written.  Thus the memset cannot be dead,
> because its results may be read by the asm.

But in the case of a function like:

	int foo = 42;
	memset(&foo, 0, sizeof foo);
	__asm__ __volatile__ ( "" : : : "memory" );

after analysis based on compiler knowledge of the memset function, the
object foo is not "memory" because its address has never leaked. Since
the asm cannot see it, it can be optimized out to never exist at all
independently of handling the asm. If this weren't the case, presence
of asm with memory clobbers anywhere in the whole program would
prevent the compiler from optimizing out objects like this anywhere in
the whole program, which is obviously not correct.

Note that you can easily get code just like the above when memcpy_s is
inlined via LTO. I disagree with Ondřej's approach of just assuming
LTO won't happen (dynamic linking) or intentionally suppressing it.
The code should just be written with semantically correct compiler
barriers so it doesn't matter and so we're not assuming external calls
provide the needed compiler barriers.

Rich
Alexander Monakov Dec. 16, 2014, 6:08 p.m. UTC | #10
On Tue, 16 Dec 2014, Rich Felker wrote:

> On Tue, Dec 16, 2014 at 10:24:17AM -0600, Richard Henderson wrote:
> > On 12/15/2014 11:26 PM, Rich Felker wrote:
> > > I don't think the implementation as written is valid -- at least, not
> > > if you allow LTO. The compiler barrier does not prevent the memset
> > > from being optimized out unless the address of the buffer being memset
> > > has been leaked to code the compiler cannot see. As long as it sees
> > > that the asm has no way of observing the output of the memset, it can
> > > optimize out the memset. Simply making the memset buffer visible to
> > > the asm by passing its address (or better yet, it as a memory object)
> > > in an asm constraint would probably fix this, but I'd like to have
> > > someone from the GCC side confirm this.
> > 
> > I believe a simple memory clobber (without even passing the buffer address)
> > should be sufficient.  The memory clobber is a very large hammer, indicating
> > that *all* memory is both read and written.  Thus the memset cannot be dead,
> > because its results may be read by the asm.
> 
> But in the case of a function like:
> 
> 	int foo = 42;
> 	memset(&foo, 0, sizeof foo);
> 	__asm__ __volatile__ ( "" : : : "memory" );

Note that the asm in the proposed patch is written in a different way: it
accepts the address of the memset 'dest' pointer.  I don't understand why you
said "as long the asm has no way of observing the output of memset" while it
clearly can do so via the passed pointer.

Alexander

> after analysis based on compiler knowledge of the memset function, the
> object foo is not "memory" because its address has never leaked. Since
> the asm cannot see it, it can be optimized out to never exist at all
> independently of handling the asm. If this weren't the case, presence
> of asm with memory clobbers anywhere in the whole program would
> prevent the compiler from optimizing out objects like this anywhere in
> the whole program, which is obviously not correct.
Rich Felker Dec. 16, 2014, 7:19 p.m. UTC | #11
On Tue, Dec 16, 2014 at 09:08:55PM +0300, Alexander Monakov wrote:
> > But in the case of a function like:
> > 
> > 	int foo = 42;
> > 	memset(&foo, 0, sizeof foo);
> > 	__asm__ __volatile__ ( "" : : : "memory" );
> 
> Note that the asm in the proposed patch is written in a different way: it
> accepts the address of the memset 'dest' pointer.  I don't understand why you
> said "as long the asm has no way of observing the output of memset" while it
> clearly can do so via the passed pointer.

I looked at the asm a couple times and somehow missed seeing that.
Sorry for the noise. I believe it's correct as written then.

Rich
Florian Weimer Dec. 17, 2014, 12:32 p.m. UTC | #12
On 12/16/2014 07:01 PM, Rich Felker wrote:

> But in the case of a function like:
>
> 	int foo = 42;
> 	memset(&foo, 0, sizeof foo);
> 	__asm__ __volatile__ ( "" : : : "memory" );
>
> after analysis based on compiler knowledge of the memset function, the
> object foo is not "memory" because its address has never leaked. Since
> the asm cannot see it, it can be optimized out to never exist at all
> independently of handling the asm. If this weren't the case, presence
> of asm with memory clobbers anywhere in the whole program would
> prevent the compiler from optimizing out objects like this anywhere in
> the whole program, which is obviously not correct.

Curiously, this is the case where use of memset_s would *force* the 
creation of a copy of the value, just so that it can be zapped immediately:

   <https://gcc.gnu.org/ml/gcc-help/2014-10/msg00071.html>
Andreas Schwab Dec. 17, 2014, 1:19 p.m. UTC | #13
Florian Weimer <fweimer@redhat.com> writes:

> Curiously, this is the case where use of memset_s would *force* the
> creation of a copy of the value, just so that it can be zapped
> immediately:

Rather it stops the copy from being elided.

Andreas.
Nick Mathewson Dec. 21, 2014, 4:08 p.m. UTC | #14
On Mon, Dec 15, 2014 at 9:21 AM, Nick Mathewson <nickm@freehaven.net> wrote:
> Hello!
>
> (This is my first attempt at a glibc RFC/patch post.  I'm hoping I can
> do more in the future, if this works out.)
>

So, where does this stand now?  I sent an updated patch to implement
the memset_s, and if I'm reading the consensus right, people think
that the implementation is correct, and the functionality is
standards-conformant and probably better than nothing.

What should I do next to turn this ticket into something that can be merged?

best wishes,
Paul Eggert Dec. 21, 2014, 7:33 p.m. UTC | #15
Nick Mathewson wrote:

> if I'm reading the consensus right, people think
> that the implementation is correct, and the functionality is
> standards-conformant and probably better than nothing.

I don't see that consensus.  As I recall, the proposed functionality doesn't 
conform to C11 Annex K, and it's not clear (to me at least) that the patch is 
better than nothing.

As I understand it the most favored suggestion in the past has been that Annex K 
functionality should be packaged into a separate library libc_s; see, for 
example <https://sourceware.org/ml/libc-alpha/2014-08/msg00141.html>.  You might 
want to look into efforts along those lines, e.g., slibc 
<https://code.google.com/p/slibc/>.
Rich Felker Dec. 22, 2014, 7:46 p.m. UTC | #16
On Sun, Dec 21, 2014 at 11:33:17AM -0800, Paul Eggert wrote:
> Nick Mathewson wrote:
> 
> >if I'm reading the consensus right, people think
> >that the implementation is correct, and the functionality is
> >standards-conformant and probably better than nothing.
> 
> I don't see that consensus.  As I recall, the proposed functionality
> doesn't conform to C11 Annex K, and it's not clear (to me at least)
> that the patch is better than nothing.

It doesn't conflict with Annex K in any way. IMO there's a huge
difference in adding an interface that conflicts with he requriements
of Annex K (even if glibc doesn't want to support Annex K now or for
the forseeable future), and adding one which simply does not support
all the 'features' specified in Annex K but matches the semantics for
the features it does support.

> As I understand it the most favored suggestion in the past has been
> that Annex K functionality should be packaged into a separate
> library libc_s; see, for example
> <https://sourceware.org/ml/libc-alpha/2014-08/msg00141.html>.  You
> might want to look into efforts along those lines, e.g., slibc
> <https://code.google.com/p/slibc/>.

I don't recall any such consensus, and I don't see any value of
putting something in a separate library file unless it can actually be
an independent package from libc, which Annex K can't (due to
requirements in the headers and interactions with stdio internals,
etc.). I hope we can agree we've come to regret this when it's
happened before, e.g. clock_gettime being in librt and locking
functions being in libpthread, and other such messes which have led to
invalid hacks in applications and third-party libraries (like weak
references to pthread symbols and busybox's use of syscall() to get
clock_gettime).

So far I'm not aware of any consensus that glibc should aim to
support Annex K at all, and I hope it stays that way.

Rich
Nick Mathewson Dec. 22, 2014, 8:33 p.m. UTC | #17
On Mon, Dec 22, 2014 at 2:46 PM, Rich Felker <dalias@libc.org> wrote:
> On Sun, Dec 21, 2014 at 11:33:17AM -0800, Paul Eggert wrote:
>> Nick Mathewson wrote:
>>
>> >if I'm reading the consensus right, people think
>> >that the implementation is correct, and the functionality is
>> >standards-conformant and probably better than nothing.
>>
>> I don't see that consensus.  As I recall, the proposed functionality
>> doesn't conform to C11 Annex K, and it's not clear (to me at least)
>> that the patch is better than nothing.
>
> It doesn't conflict with Annex K in any way. IMO there's a huge
> difference in adding an interface that conflicts with he requriements
> of Annex K (even if glibc doesn't want to support Annex K now or for
> the forseeable future), and adding one which simply does not support
> all the 'features' specified in Annex K but matches the semantics for
> the features it does support.

Incidentally, there are some places in glibc which, in order to match
the programmers' intent, probably need functionality like this today.

For example, see the memset calls at the end of:
     __md5_crypt_r
     __sha256_crypt_r
     __sha512_crypt_r

The documented intent there is to clear data from the stack before
returning, but the compiler is free to remove them IIUC.


(As I said before, it doesn't make a big difference to me whether a
non-removable memset is called memset_s() or whether it's called
explicit_bzero() or whether it gets another name entirely.)


cheers,
Paul Eggert Dec. 23, 2014, 2:21 a.m. UTC | #18
Nick Mathewson wrote:
> For example, see the memset calls at the end of:
>       __md5_crypt_r
>       __sha256_crypt_r
>       __sha512_crypt_r
>
> The documented intent there is to clear data from the stack before
> returning, but the compiler is free to remove them IIUC.

Yes, that sounds like it could be a problem.  Have you inspected the generated 
code to see whether the problem actually occurs?  That would strengthen the case 
for a buffer-smushing primitive, even if it's only internal to glibc.  (I still 
see no reason for guaranteeing that any such primitive sets the buffer's bytes 
to zero -- any value will do so long as it's independent of the buffer's 
previous contents.)
Paul Eggert Dec. 23, 2014, 2:21 a.m. UTC | #19
Rich Felker wrote:
> I don't recall any such consensus,

Well, to be fair I didn't call it a consensus, only a "most favored suggestion". 
  I agree it didn't have consensus.

> there's a huge
> difference in adding an interface that conflicts with he requriements
> of Annex K (even if glibc doesn't want to support Annex K now or for
> the forseeable future), and adding one which simply does not support
> all the 'features' specified in Annex K but matches the semantics for
> the features it does support.

That goes only so far.  Suppose the spec for plain traditional strcpy were in 
Annex K, and glibc supplied a strcpy implementation that mostly worked, except 
it failed to copy the trailing null byte.  I doubt whether users would be 
satisfied with the justification "well, this doesn't *conflict* with the spec, 
and it's *partial* support for the standard".  Similarly for a memset_s that 
fails to throw an exception if you invoke it with invalid arguments.
Rich Felker Dec. 23, 2014, 3 a.m. UTC | #20
On Mon, Dec 22, 2014 at 06:21:50PM -0800, Paul Eggert wrote:
> Rich Felker wrote:
> >I don't recall any such consensus,
> 
> Well, to be fair I didn't call it a consensus, only a "most favored
> suggestion".  I agree it didn't have consensus.
> 
> >there's a huge
> >difference in adding an interface that conflicts with he requriements
> >of Annex K (even if glibc doesn't want to support Annex K now or for
> >the forseeable future), and adding one which simply does not support
> >all the 'features' specified in Annex K but matches the semantics for
> >the features it does support.
> 
> That goes only so far.  Suppose the spec for plain traditional
> strcpy were in Annex K, and glibc supplied a strcpy implementation
> that mostly worked, except it failed to copy the trailing null byte.
> I doubt whether users would be satisfied with the justification
> "well, this doesn't *conflict* with the spec, and it's *partial*
> support for the standard".  Similarly for a memset_s that fails to
> throw an exception if you invoke it with invalid arguments.

Annex K does not specify that it "throws an exception" (what would
that mean?) or aborts the program or anything. The default runtime
constraint handler is implementation-defined, and just returning an
error is one option. This is one reason why Annex K is basically
unusable: you can't rely on any particular behavior for what happens
on a runtime constraint violation unless you install your own handler,
but the handler is global (not even thread-local) so you can't even
safely install it on a temporary basis and switch it back.

This level of misdesign is at most barely-tolerable in legacy
interfaces that are being supported in continuance of a historical
practice, but it's utterly absurd to add new interfaces like this.

In any case I think you misinterpreted my concept of conflicting vs
not-incompatible. A strcpy that doesn't null-terminate would be
*conflicting* in my sense, because it never produces the specified
behavior unless the destination string is already zero-filled. On the
other hand, the proposed memcpy_s would never be observably different
from the spec in Annex K except in a program which attempted to set
the constraint handler, which would not even successfully compile/link
since this function is missing.

Rich

Patch
diff mbox

commit 5dd7fa4810de0b5ae150af4d7eb1e6672fafb70d
Author: Nick Mathewson <nickm@torproject.org>
Date:   Mon Dec 15 09:14:48 2014 -0500

    Add new explicit_bzero() function

diff --git a/debug/tst-chk1.c b/debug/tst-chk1.c
index 3393153..9085e3e 100644
--- a/debug/tst-chk1.c
+++ b/debug/tst-chk1.c
@@ -283,6 +283,10 @@  do_test (void)
   CHK_FAIL_END
 
   CHK_FAIL_START
+  explicit_bzero (buf + 9, l0 + 2);
+  CHK_FAIL_END
+
+  CHK_FAIL_START
   strcpy (buf + 5, str1 + 5);
   CHK_FAIL_END
 
diff --git a/include/string.h b/include/string.h
index 034e295..64340be 100644
--- a/include/string.h
+++ b/include/string.h
@@ -100,6 +100,7 @@  libc_hidden_builtin_proto (mempcpy)
 libc_hidden_builtin_proto (memcmp)
 libc_hidden_builtin_proto (memmove)
 libc_hidden_builtin_proto (memset)
+libc_hidden_builtin_proto (bzero)
 libc_hidden_builtin_proto (strcat)
 libc_hidden_builtin_proto (strchr)
 libc_hidden_builtin_proto (strcmp)
diff --git a/manual/string.texi b/manual/string.texi
index ba5a2c7..9fa60d8 100644
--- a/manual/string.texi
+++ b/manual/string.texi
@@ -1057,6 +1057,17 @@  BSD.  Note that it is not as general as @code{memset}, because the only
 value it can store is zero.
 @end deftypefun
 
+@comment string.h
+@comment BSD
+@deftypefun void explicit_bzero (void *@var{block}, size_t @var{size})
+@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
+This function behaves as @code{bzero} or as @code{memset(block, 0, size)},
+except that the compiler is not permitted to perform dead-assignment
+elimination on it.  It can be useful in cryptographic code that needs to
+clear sensitive data from the stack.
+@end deftypefun
+
+
 @node String/Array Comparison
 @section String/Array Comparison
 @cindex comparing strings and arrays
diff --git a/string/Makefile b/string/Makefile
index 98c2961..6fd9726 100644
--- a/string/Makefile
+++ b/string/Makefile
@@ -33,6 +33,7 @@  routines	:= strcat strchr strcmp strcoll strcpy strcspn		\
 		   strrchr strpbrk strsignal strspn strstr strtok	\
 		   strtok_r strxfrm memchr memcmp memmove memset	\
 		   mempcpy bcopy bzero ffs ffsll stpcpy stpncpy		\
+		   explicit_bzero					\
 		   strcasecmp strncase strcasecmp_l strncase_l		\
 		   memccpy memcpy wordcopy strsep strcasestr		\
 		   swab strfry memfrob memmem rawmemchr strchrnul	\
@@ -47,7 +48,8 @@  strop-tests	:= memchr memcmp memcpy memmove mempcpy memset memccpy	\
 		   stpcpy stpncpy strcat strchr strcmp strcpy strcspn	\
 		   strlen strncmp strncpy strpbrk strrchr strspn memmem	\
 		   strstr strcasestr strnlen strcasecmp strncasecmp	\
-		   strncat rawmemchr strchrnul bcopy bzero memrchr
+		   strncat rawmemchr strchrnul bcopy bzero memrchr	\
+		   explicit_bzero
 tests		:= tester inl-tester noinl-tester testcopy test-ffs	\
 		   tst-strlen stratcliff tst-svc tst-inlcall		\
 		   bug-strncat1 bug-strspn1 bug-strpbrk1 tst-bswap	\
diff --git a/string/Versions b/string/Versions
index 59bf35a..f9aba06 100644
--- a/string/Versions
+++ b/string/Versions
@@ -80,4 +80,7 @@  libc {
   GLIBC_2.6 {
     strerror_l;
   }
+  GLIBC_2.21 {
+    explicit_bzero;
+  }
 }
diff --git a/string/bits/string3.h b/string/bits/string3.h
index 801e7ac..65d9067 100644
--- a/string/bits/string3.h
+++ b/string/bits/string3.h
@@ -102,6 +102,13 @@  __NTH (bzero (void *__dest, size_t __len))
 {
   (void) __builtin___memset_chk (__dest, '\0', __len, __bos0 (__dest));
 }
+
+__fortify_function void
+__NTH (explicit_bzero (void *__dest, size_t __len))
+{
+  (void) __builtin___memset_chk (__dest, '\0', __len, __bos0 (__dest));
+  __asm__ __volatile__ ( "" : : "r"(__dest) : "memory" );
+}
 #endif
 
 __fortify_function char *
diff --git a/string/bzero.c b/string/bzero.c
index 9c220b9..9525de8 100644
--- a/string/bzero.c
+++ b/string/bzero.c
@@ -20,6 +20,7 @@ 
 #include <memcopy.h>
 
 #undef __bzero
+#undef __explicit_bzero
 
 /* Set N bytes of S to 0.  */
 void
@@ -80,3 +81,17 @@  __bzero (s, len)
     }
 }
 weak_alias (__bzero, bzero)
+
+
+void
+__explicit_bzero(s, len)
+    void *s;
+    size_t len;
+{
+  __bzero (s, len);
+
+  /* Add a barrier to prevent the compiler from optimizing the above bzero
+   * away. */
+  __asm__ __volatile__ ( "" : : "r"(s) : "memory" );
+}
+weak_alias (__explicit_bzero, explicit_bzero)
diff --git a/string/explicit_bzero.c b/string/explicit_bzero.c
new file mode 100644
index 0000000..7ddd190
--- /dev/null
+++ b/string/explicit_bzero.c
@@ -0,0 +1,32 @@ 
+/* Copyright (C) 1991-2014 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include <string.h>
+#include <memcopy.h>
+
+void
+__explicit_bzero(s, len)
+    void *s;
+    size_t len;
+{
+  __bzero (s, len);
+
+  /* Add a barrier to prevent the compiler from optimizing the above bzero
+   * away. */
+  __asm__ __volatile__ ( "" : : "r"(s) : "memory" );
+}
+weak_alias (__explicit_bzero, explicit_bzero)
diff --git a/string/string.h b/string/string.h
index c79debc..c6888c0 100644
--- a/string/string.h
+++ b/string/string.h
@@ -445,9 +445,10 @@  extern char *strerror_l (int __errnum, __locale_t __l) __THROW;
 #endif
 
 
-/* We define this function always since `bzero' is sometimes needed when
+/* We define these functions always since `bzero' is sometimes needed when
    the namespace rules does not allow this.  */
 extern void __bzero (void *__s, size_t __n) __THROW __nonnull ((1));
+extern void __explicit_bzero (void *__s, size_t __n) __THROW __nonnull ((1));
 
 #ifdef __USE_MISC
 /* Copy N bytes of SRC to DEST (like memmove, but args reversed).  */
@@ -457,6 +458,9 @@  extern void bcopy (const void *__src, void *__dest, size_t __n)
 /* Set N bytes of S to 0.  */
 extern void bzero (void *__s, size_t __n) __THROW __nonnull ((1));
 
+/* Set N bytes of S to 0, and don't let the compiler eliminate this. */
+extern void explicit_bzero (void *__s, size_t __n) __THROW __nonnull ((1));
+
 /* Compare N bytes of S1 and S2 (same as memcmp).  */
 extern int bcmp (const void *__s1, const void *__s2, size_t __n)
      __THROW __attribute_pure__ __nonnull ((1, 2));
diff --git a/string/strings.h b/string/strings.h
index 872a0b2..12c4f51 100644
--- a/string/strings.h
+++ b/string/strings.h
@@ -49,6 +49,9 @@  extern void bcopy (const void *__src, void *__dest, size_t __n) __THROW;
 /* Set N bytes of S to 0.  */
 extern void bzero (void *__s, size_t __n) __THROW;
 
+/* Set N bytes of S to 0, and don't let the compiler eliminate this. */
+extern void explicit_bzero (void *__s, size_t __n) __THROW __nonnull ((1));
+
 /* Find the first occurrence of C in S (same as strchr).  */
 #  ifdef __CORRECT_ISO_CPP_STRINGS_H_PROTO
 extern "C++"
diff --git a/string/test-explicit_bzero.c b/string/test-explicit_bzero.c
new file mode 100644
index 0000000..7ed52b3
--- /dev/null
+++ b/string/test-explicit_bzero.c
@@ -0,0 +1,20 @@ 
+/* Test and measure bzero functions.
+   Copyright (C) 2012-2014 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+#define TEST_EXPLICIT_BZERO
+#define TEST_BZERO
+#include "test-memset.c"
diff --git a/string/test-memset.c b/string/test-memset.c
index 2171b0d..a4820dc 100644
--- a/string/test-memset.c
+++ b/string/test-memset.c
@@ -18,7 +18,9 @@ 
    <http://www.gnu.org/licenses/>.  */
 
 #define TEST_MAIN
-#ifdef TEST_BZERO
+#ifdef TEST_EXPLICIT_BZERO
+# define TEST_NAME "explicit_bzero"
+#elif defined (TEST_BZERO)
 # define TEST_NAME "bzero"
 #else
 # define TEST_NAME "memset"
@@ -35,7 +37,11 @@  void builtin_bzero (char *, size_t);
 
 IMPL (simple_bzero, 0)
 IMPL (builtin_bzero, 0)
+#ifdef TEST_EXPLICIT_BZERO
+IMPL (explicit_bzero, 1)
+#else
 IMPL (bzero, 1)
+#endif
 
 void
 simple_bzero (char *s, size_t n)