diff mbox

[v17,01/15] Add PR_{GET,SET}_NO_NEW_PRIVS to prevent execve from granting privs

Message ID 1333051320-30872-2-git-send-email-wad@chromium.org
State Not Applicable, archived
Delegated to: David Miller
Headers show

Commit Message

Will Drewry March 29, 2012, 8:01 p.m. UTC
From: Andy Lutomirski <luto@amacapital.net>

With this set, a lot of dangerous operations (chroot, unshare, etc)
become a lot less dangerous because there is no possibility of
subverting privileged binaries.

This patch completely breaks apparmor.  Someone who understands (and
uses) apparmor should fix it or at least give me a hint.

Signed-off-by: Andy Lutomirski <luto@amacapital.net>

(rebased onto -linus, bumping prctl # -wad@chromium.org)
---
 fs/exec.c                  |   10 +++++++++-
 include/linux/prctl.h      |   15 +++++++++++++++
 include/linux/sched.h      |    2 ++
 include/linux/security.h   |    1 +
 kernel/sys.c               |   10 ++++++++++
 security/apparmor/domain.c |    4 ++++
 security/commoncap.c       |    7 +++++--
 security/selinux/hooks.c   |   10 +++++++++-
 8 files changed, 55 insertions(+), 4 deletions(-)

Comments

Andrew Morton April 6, 2012, 7:49 p.m. UTC | #1
On Thu, 29 Mar 2012 15:01:46 -0500
Will Drewry <wad@chromium.org> wrote:

> From: Andy Lutomirski <luto@amacapital.net>
> 
> With this set, a lot of dangerous operations (chroot, unshare, etc)
> become a lot less dangerous because there is no possibility of
> subverting privileged binaries.
> 
> This patch completely breaks apparmor.  Someone who understands (and
> uses) apparmor should fix it or at least give me a hint.

So [patch 2/15] fixes all this up?

I guess we should join the two patches into one, to avoid a silly
breakage window.  That means that John loses a brownie point, but we
can mention him in the changelog, include his signed-off-by:

> Signed-off-by: Andy Lutomirski <luto@amacapital.net>

Several of these patches are missing your signed-off-by:.  They should
all have your SOB, because you sent them. 
Documentation/SubmittingPatches explains this.

I'm trying to find a way to merge all this code without reviewing it ;)
Alas, this is against my rules.  Given the length of time for which
this patchset has been floating around, I'm a little surprised by the
lack of acked-by's and reviewed-by's.  Have you been gathering them all
up?  Are the networking guys all happy about this patchset?


--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Andy Lutomirski April 6, 2012, 7:55 p.m. UTC | #2
On Fri, Apr 6, 2012 at 12:49 PM, Andrew Morton
<akpm@linux-foundation.org> wrote:
> On Thu, 29 Mar 2012 15:01:46 -0500
> Will Drewry <wad@chromium.org> wrote:
>
>> From: Andy Lutomirski <luto@amacapital.net>
>>
>> With this set, a lot of dangerous operations (chroot, unshare, etc)
>> become a lot less dangerous because there is no possibility of
>> subverting privileged binaries.
>>
>> This patch completely breaks apparmor.  Someone who understands (and
>> uses) apparmor should fix it or at least give me a hint.
>
> So [patch 2/15] fixes all this up?
>
> I guess we should join the two patches into one, to avoid a silly
> breakage window.  That means that John loses a brownie point, but we
> can mention him in the changelog, include his signed-off-by:

Or just fix the commit message.  It no longer completely breaks
AppArmor.  It just causes execve to fail when PR_SET_NO_NEW_PRIVS is
set and AppArmor is in use.

--Andy
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Andrew Morton April 6, 2012, 7:55 p.m. UTC | #3
On Thu, 29 Mar 2012 15:01:46 -0500
Will Drewry <wad@chromium.org> wrote:

> From: Andy Lutomirski <luto@amacapital.net>
> 
> With this set, a lot of dangerous operations (chroot, unshare, etc)
> become a lot less dangerous because there is no possibility of
> subverting privileged binaries.

The changelog doesn't explain the semantics of the new syscall. 
There's a comment way-down-there which I guess suffices, if you hunt
for it.

And the changelog doesn't explain why this is being added.  Presumably
seccomp_filter wants/needs this feature but whowhatwherewhenwhy?  Spell
it all out, please.

The new syscall mode will be documented in the prctl manpage.  Please
cc linux-man@vger.kernel.org and work with Michael on getting this
done?

>
> ...
>
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Andrew Lutomirski April 6, 2012, 8:01 p.m. UTC | #4
On Fri, Apr 6, 2012 at 12:55 PM, Andrew Morton
<akpm@linux-foundation.org> wrote:
> On Thu, 29 Mar 2012 15:01:46 -0500
> Will Drewry <wad@chromium.org> wrote:
>
>> From: Andy Lutomirski <luto@amacapital.net>
>>
>> With this set, a lot of dangerous operations (chroot, unshare, etc)
>> become a lot less dangerous because there is no possibility of
>> subverting privileged binaries.
>
> The changelog doesn't explain the semantics of the new syscall.
> There's a comment way-down-there which I guess suffices, if you hunt
> for it.
>
> And the changelog doesn't explain why this is being added.  Presumably
> seccomp_filter wants/needs this feature but whowhatwherewhenwhy?  Spell
> it all out, please.
>
> The new syscall mode will be documented in the prctl manpage.  Please
> cc linux-man@vger.kernel.org and work with Michael on getting this
> done?

This has been bugging me for awhile.  Is there any interest in moving
the manpages into the kernel source tree?  Then there could be a
general requirement that new APIs get documented when they're written.

(There are plenty of barely- or incompletely-documented syscalls.
futex and relatives come to mind.)

--Andy
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jonathan Corbet April 6, 2012, 8:28 p.m. UTC | #5
On Fri, 6 Apr 2012 13:01:17 -0700
Andrew Lutomirski <luto@mit.edu> wrote:

> This has been bugging me for awhile.  Is there any interest in moving
> the manpages into the kernel source tree?  Then there could be a
> general requirement that new APIs get documented when they're written.

Man page (or other documentation) requirements for patch acceptance are a
regular kernel summit feature.  People seem to think it's a good idea, but
actual enforcement of such requirements always seems to be lacking.  Lots
of people have kind of given up trying.  I don't really see that adding
the man pages to the tree would help, but I could be wrong...

jon
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Andrew Lutomirski April 6, 2012, 8:37 p.m. UTC | #6
On Fri, Apr 6, 2012 at 1:28 PM, Jonathan Corbet <corbet@lwn.net> wrote:
> On Fri, 6 Apr 2012 13:01:17 -0700
> Andrew Lutomirski <luto@mit.edu> wrote:
>
>> This has been bugging me for awhile.  Is there any interest in moving
>> the manpages into the kernel source tree?  Then there could be a
>> general requirement that new APIs get documented when they're written.
>
> Man page (or other documentation) requirements for patch acceptance are a
> regular kernel summit feature.  People seem to think it's a good idea, but
> actual enforcement of such requirements always seems to be lacking.  Lots
> of people have kind of given up trying.  I don't really see that adding
> the man pages to the tree would help, but I could be wrong...
>

If it's in the source, then I can send it with git format-patch.  If
it's out of tree, I have to find the tree, clone the tree, figure out
how to submit, and send separate emails.  And then whoever checks that
I documented it has to figure out where I sent it and how to read it
and then try to decide which documentation submission matches which
patch submission.

(Also, if it's in-tree, then I can build the docs from a kernel tree
and have the latest ones.  That could be nice.)

--Andy
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Markus Gutschke April 6, 2012, 8:47 p.m. UTC | #7
On Fri, Apr 6, 2012 at 12:49, Andrew Morton <akpm@linux-foundation.org> wrote:
> On Thu, 29 Mar 2012 15:01:46 -0500
> Will Drewry <wad@chromium.org> wrote:
>> From: Andy Lutomirski <luto@amacapital.net>
>> With this set, a lot of dangerous operations (chroot, unshare, etc)
>> become a lot less dangerous because there is no possibility of
>> subverting privileged binaries.

I don't want to derail things. So, tell me to go away, if I can't have
what I want.

Having said that, it would be great if NO_NEW_PRIVS also gave access
to the restricted clone() flags. Such as CLONE_NEWIPC, CLONE_NEWNET
and CLONE_NEWPID.


Markus
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Andrew Lutomirski April 6, 2012, 8:54 p.m. UTC | #8
On Fri, Apr 6, 2012 at 1:47 PM, Markus Gutschke <markus@chromium.org> wrote:
> On Fri, Apr 6, 2012 at 12:49, Andrew Morton <akpm@linux-foundation.org> wrote:
>> On Thu, 29 Mar 2012 15:01:46 -0500
>> Will Drewry <wad@chromium.org> wrote:
>>> From: Andy Lutomirski <luto@amacapital.net>
>>> With this set, a lot of dangerous operations (chroot, unshare, etc)
>>> become a lot less dangerous because there is no possibility of
>>> subverting privileged binaries.
>
> I don't want to derail things. So, tell me to go away, if I can't have
> what I want.
>
> Having said that, it would be great if NO_NEW_PRIVS also gave access
> to the restricted clone() flags. Such as CLONE_NEWIPC, CLONE_NEWNET
> and CLONE_NEWPID.

I decided to hold off on extra controversy for awhile.  However:

https://git.kernel.org/?p=linux/kernel/git/luto/linux.git;a=commit;h=9a520b74ad5dc14a3d6950b6d63a64714adbdd7d
and
http://web.mit.edu/luto/www/linux/nnp/newns.c

I fully intend to resurrect both of those once nnp lands.

(FWIW, I think that CLONE_NEWPID interacts badly with unix socket
credentials and should be fixed as a prerequisite for making it easier
to access.)

--Andy
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Markus Gutschke April 6, 2012, 9:04 p.m. UTC | #9
On Fri, Apr 6, 2012 at 13:54, Andrew Lutomirski <luto@mit.edu> wrote:
> (FWIW, I think that CLONE_NEWPID interacts badly with unix socket
> credentials and should be fixed as a prerequisite for making it easier
> to access.)

sendmsg() is a big hair ball. And yes, I have all sorts of things on
my wish list that touch sendmsg(). I don't think, I'll get them
fullfilled anytime soon, though:

 - sendmsg() does lots of different things: sending on unconnected
sockets, sending file descriptors, sending unix decriptors, and
plain-old sending data. These operations have very different security
properties and implications. It would be awesome if BPF filters could
filter these different types of operations selectively.

 - ancillary data is a cool concept in general. We should use it more.
If I could send a memory mapping from one process to another, that
would solve so many problems. But I know, I am dreaming; I don't
expect to see this feature any time soon.


Markus
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Andrew Lutomirski April 6, 2012, 9:15 p.m. UTC | #10
On Fri, Apr 6, 2012 at 2:04 PM, Markus Gutschke <markus@chromium.org> wrote:
> On Fri, Apr 6, 2012 at 13:54, Andrew Lutomirski <luto@mit.edu> wrote:
>> (FWIW, I think that CLONE_NEWPID interacts badly with unix socket
>> credentials and should be fixed as a prerequisite for making it easier
>> to access.)
>
> sendmsg() is a big hair ball. And yes, I have all sorts of things on
> my wish list that touch sendmsg(). I don't think, I'll get them
> fullfilled anytime soon, though:
>
>  - sendmsg() does lots of different things: sending on unconnected
> sockets, sending file descriptors, sending unix decriptors, and
> plain-old sending data. These operations have very different security
> properties and implications. It would be awesome if BPF filters could
> filter these different types of operations selectively.
>
>  - ancillary data is a cool concept in general. We should use it more.
> If I could send a memory mapping from one process to another, that
> would solve so many problems. But I know, I am dreaming; I don't
> expect to see this feature any time soon.
>

Agreed, but I'm talking about something totally different: if I can
use CLONE_NEWPID, then I can send an unexpected pid with SCM_CREDS.
The SCM_CREDS receive code should remap pids.

--Andy
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Markus Gutschke April 6, 2012, 9:32 p.m. UTC | #11
On Fri, Apr 6, 2012 at 14:15, Andrew Lutomirski <luto@mit.edu> wrote:
> Agreed, but I'm talking about something totally different: if I can
> use CLONE_NEWPID, then I can send an unexpected pid with SCM_CREDS.
> The SCM_CREDS receive code should remap pids.

Yes, I know. It's broken. And so is the view of the /proc filesystem
when inside a pid namespace. And things behave funny if you don't set
up a new "init" process inside of the pid namespace. And I am sure, a
few other things are broken that we just haven't run into.

CLONE_NEWPID is tricky. I can understand, if you want to fix it first.
Looking forward to seeing some patches in the future; please cc me, if
you want feedback from an actual user of this code.

The SCM_CREDS issue is the most serious one of the above, but it
doesn't bother me personally, as I would just set up my sandbox policy
to disallow all of SCM_CREDS (*). But that obviously not a good excuse
for leaving a kernel bug around.

Overall, I like both NO_NEW_PRIVS and BPF filters for seccomp though;
they are a great way to reduce the attack surface of the kernel.
Kernel bugs become a lot less of a headache, if I have a way to filter
out the buggy parts of the kernel. It isn't a panacea, but it's a
great new tool to harden applications.


Markus


*) this is currently difficult to filter SCM_CREDS, if we still want
to allow SCM_RIGHTS. See my earlier complaint about sendmsg().
Currently, filtering of sendmsg() probably requires the use of a
helper process.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Will Drewry April 10, 2012, 7:03 p.m. UTC | #12
On Fri, Apr 6, 2012 at 2:55 PM, Andrew Morton <akpm@linux-foundation.org> wrote:
> On Thu, 29 Mar 2012 15:01:46 -0500
> Will Drewry <wad@chromium.org> wrote:
>
>> From: Andy Lutomirski <luto@amacapital.net>
>>
>> With this set, a lot of dangerous operations (chroot, unshare, etc)
>> become a lot less dangerous because there is no possibility of
>> subverting privileged binaries.
>
> The changelog doesn't explain the semantics of the new syscall.
> There's a comment way-down-there which I guess suffices, if you hunt
> for it.

I'll bubble up luto's comment into the changelog when I resend the
grand-unified-patchset.

> And the changelog doesn't explain why this is being added.  Presumably
> seccomp_filter wants/needs this feature but whowhatwherewhenwhy?  Spell
> it all out, please.

I'll try my hand at that and luto@ can yell at me if I misrepresent.
Seem reasonable?

> The new syscall mode will be documented in the prctl manpage.  Please
> cc linux-man@vger.kernel.org and work with Michael on getting this
> done?

I'll add linux-man to the patch series since this applies to both
no_new_privs and seccomp filter.

Thanks!

>>
>> ...
>>
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Will Drewry April 10, 2012, 7:12 p.m. UTC | #13
On Fri, Apr 6, 2012 at 2:49 PM, Andrew Morton <akpm@linux-foundation.org> wrote:
> On Thu, 29 Mar 2012 15:01:46 -0500
> Will Drewry <wad@chromium.org> wrote:
>
>> From: Andy Lutomirski <luto@amacapital.net>
>>
>> With this set, a lot of dangerous operations (chroot, unshare, etc)
>> become a lot less dangerous because there is no possibility of
>> subverting privileged binaries.
>>
>> This patch completely breaks apparmor.  Someone who understands (and
>> uses) apparmor should fix it or at least give me a hint.
>
> So [patch 2/15] fixes all this up?
>
> I guess we should join the two patches into one, to avoid a silly
> breakage window.  That means that John loses a brownie point, but we
> can mention him in the changelog, include his signed-off-by:
>
>> Signed-off-by: Andy Lutomirski <luto@amacapital.net>
>
> Several of these patches are missing your signed-off-by:.  They should
> all have your SOB, because you sent them.
> Documentation/SubmittingPatches explains this.

Oops - I'll add them!

> I'm trying to find a way to merge all this code without reviewing it ;)
> Alas, this is against my rules.  Given the length of time for which
> this patchset has been floating around, I'm a little surprised by the
> lack of acked-by's and reviewed-by's.  Have you been gathering them all
> up?  Are the networking guys all happy about this patchset?

eric.dumazet@gmail.com acked the networking ones, and I have a
smattering of others for the other patches. Given the review and
feedback, I don't have a huge number of acked/reviewed-bys. I tried
not to lose any after the first couple of revs, but I know I did some
things wrong early on.

I can prod some others who've contributed to add their tags, unless
there is a good reason for them not too.  I suspect it was just
because of partial/drive-by reviewing, but I don't know.

thanks!
will
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Rob Landley April 10, 2012, 8:37 p.m. UTC | #14
On 04/06/2012 03:01 PM, Andrew Lutomirski wrote:
> On Fri, Apr 6, 2012 at 12:55 PM, Andrew Morton
> <akpm@linux-foundation.org> wrote:
>> On Thu, 29 Mar 2012 15:01:46 -0500
>> Will Drewry <wad@chromium.org> wrote:
>>
>>> From: Andy Lutomirski <luto@amacapital.net>
>>>
>>> With this set, a lot of dangerous operations (chroot, unshare, etc)
>>> become a lot less dangerous because there is no possibility of
>>> subverting privileged binaries.
>>
>> The changelog doesn't explain the semantics of the new syscall.
>> There's a comment way-down-there which I guess suffices, if you hunt
>> for it.
>>
>> And the changelog doesn't explain why this is being added.  Presumably
>> seccomp_filter wants/needs this feature but whowhatwherewhenwhy?  Spell
>> it all out, please.
>>
>> The new syscall mode will be documented in the prctl manpage.  Please
>> cc linux-man@vger.kernel.org and work with Michael on getting this
>> done?
> 
> This has been bugging me for awhile.  Is there any interest in moving
> the manpages into the kernel source tree?

Not that I know of. I'm pretty sure if the guy maintaining it (Michael
Kerrisk) wanted to do that, he could have raised the issue at any time
over the past several years.

> Then there could be a
> general requirement that new APIs get documented when they're written.

Because having a Documentation directory, javadoc in the source itself
(some of which is combined with the Documentation/DocBook xml files to
form the make htmldocs output), menuconfig help text, and a whole buch
of scattered readmes does _not_ get new APIs documented as they're written.

That isn't even counting git commit comments and mailing list messages
in various web archives. Are you going to suck the linux weekly news
kernel articles into the tree (http://lwn.net/Kernel/Index)? How about
Linux Journal's complete archives going back to 2004
(http://www.linuxjournal.com/magazine)? Or the h-online and
kernelnewbies writeups? How about wikipedia pages on interesting kernel
topics? The sourceforge pages for userspace projects like lxc.sf.net or
i2c-utils? How about that device driver writing tutorial Greg KH
recorded in 2008, that's only a 2.8 gigabyte video file.  Rusty's
Unreliable Guides?  Greg KH's blog? (Heck, http://kernelplanet.org).

Speaking of videos, here's the 2011 LinuxCon Japan talks:

  http://video.linux.com/categories/2011-linuxcon-japan

And here are videos for the Consumer Electronic Linux Forum:

  http://free-electrons.com/blog/elc-2012-videos/

(and you can get 2011, 2010, 2006...)

Here are Ottawa Linux Symposium papers:

  http://kernel.org/doc/ols

Don't forget IBM Developerworks' library:

http://www.ibm.com/developerworks/linux/library/l-linux-kernel/

Have some standards documents:

http://www.opengroup.org/onlinepubs/9699919799/
http://busybox.net/~landley/c99-draft.html
http://www.unix.org/whitepapers/64bit.html
http://refspecs.linuxfoundation.org
http://t10.org/scsi-3.htm

Here's a random blog post about booting a bare metal "hello world"
program on qemu for ARM:

http://balau82.wordpress.com/2010/02/28/hello-world-for-bare-metal-arm-using-qemu/

Let's pick a topic, like the ELF loader. Here's the best introduction of
how ELF files _really_ work I've seen:

http://muppetlabs.com/~breadbox/software/tiny/teensy.html

Although http://linuxjournal.com/article/1059 is pretty good too, as
were http://linuxjournal.com/article/1060 and
http://linuxjournal.com/article/80 as well.  And if you want the
_details_, here's an extremely dry online book:

http;//www.iecc.com/linker/

And here's the first entry in the blog series the guy who wrote "gold"
did about writing his new linker:

http://www.airs.com/blog/archives/38

And so on, and so forth...

> (There are plenty of barely- or incompletely-documented syscalls.
> futex and relatives come to mind.)

Your proposal does not address this problem.

speaking of syscalls, I do note that ever since I tried to add Hexagon
support to strace (less fun than it sounds), I've wanted a way to beat
proper syscall information out of the kernel headers so I could get not
just syscall numbers but how many arguments and a brief stab at the type
of each argument.

Of course you _can_ get argument type and count, but not from the
headers: you have to use moderately horrible sed on the kernel's source
code, ala:

find . -name "*.c" -print0 | \
xargs -n1 -0 sed -n -e 's/.*\(SYSCALL_DEFINE[0-9](\)/\1/' \
  -e 't got;d;:got;s/).*/)/p;t;N;b got'

> --Andy

Rob
Michael Kerrisk \(man-pages\) April 11, 2012, 7:31 p.m. UTC | #15
On Sat, Apr 7, 2012 at 8:28 AM, Jonathan Corbet <corbet@lwn.net> wrote:
> On Fri, 6 Apr 2012 13:01:17 -0700
> Andrew Lutomirski <luto@mit.edu> wrote:
>
>> This has been bugging me for awhile.  Is there any interest in moving
>> the manpages into the kernel source tree?  Then there could be a
>> general requirement that new APIs get documented when they're written.
>
> Man page (or other documentation) requirements for patch acceptance are a
> regular kernel summit feature.  People seem to think it's a good idea, but
> actual enforcement of such requirements always seems to be lacking.  Lots
> of people have kind of given up trying.  I don't really see that adding
> the man pages to the tree would help, but I could be wrong...

I largely consider this (moving man pages to kernel.org) a technical
solution to what is fundamentally a social problem (developers
reluctant to write documentation), and doubt that the technical
solution would make much difference. I'd love to be proved wrong, but
the experiment would require significant start-up effort. (My
collected thoughts on this can be found here:
http://www.kernel.org/doc/man-pages/todo.html#migrate_to_kernel_source.
Note the alternative idea of patch tags mentioned at the end of that
text.)

Unless, or until there's a paid maintainer, I don't expect things to
get significantly better than what they currently are. The quite
significant improvements in man-pages since 2004, when I became
maintainer were in small part due to the fact that I was for a short
period paid to do the work, but in much larger part due to a huge
private effort over those years which over the last couple of years is
no longer unsustainable for me (man-pages is in competition with
requirements for my attention from family, working life, and
(seriously!) seismic events),

Cheers,

Michael
Michael Kerrisk \(man-pages\) April 12, 2012, 12:15 a.m. UTC | #16
> no longer unsustainable for me (man-pages is in competition with

^no longer unsustainable^no longer sustainable
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Andrew Lutomirski April 12, 2012, 12:50 a.m. UTC | #17
On Wed, Apr 11, 2012 at 12:31 PM, Michael Kerrisk (man-pages)
<mtk.manpages@gmail.com> wrote:
> On Sat, Apr 7, 2012 at 8:28 AM, Jonathan Corbet <corbet@lwn.net> wrote:
>> On Fri, 6 Apr 2012 13:01:17 -0700
>> Andrew Lutomirski <luto@mit.edu> wrote:
>>
>>> This has been bugging me for awhile.  Is there any interest in moving
>>> the manpages into the kernel source tree?  Then there could be a
>>> general requirement that new APIs get documented when they're written.
>>
>> Man page (or other documentation) requirements for patch acceptance are a
>> regular kernel summit feature.  People seem to think it's a good idea, but
>> actual enforcement of such requirements always seems to be lacking.  Lots
>> of people have kind of given up trying.  I don't really see that adding
>> the man pages to the tree would help, but I could be wrong...
>
> I largely consider this (moving man pages to kernel.org) a technical
> solution to what is fundamentally a social problem (developers
> reluctant to write documentation), and doubt that the technical
> solution would make much difference. I'd love to be proved wrong, but
> the experiment would require significant start-up effort. (My
> collected thoughts on this can be found here:
> http://www.kernel.org/doc/man-pages/todo.html#migrate_to_kernel_source.
> Note the alternative idea of patch tags mentioned at the end of that
> text.)
>
> Unless, or until there's a paid maintainer, I don't expect things to
> get significantly better than what they currently are. The quite
> significant improvements in man-pages since 2004, when I became
> maintainer were in small part due to the fact that I was for a short
> period paid to do the work, but in much larger part due to a huge
> private effort over those years which over the last couple of years is
> no longer unsustainable for me (man-pages is in competition with
> requirements for my attention from family, working life, and
> (seriously!) seismic events),

Hrm.  Maybe someone could convince Andrew and Linus not to pull new
syscalls or major ABI features unless the patchset includes full docs.

Anyway, I'll write up a detailed description of PR_SET_NO_NEW_PRIVS,
stick it in the changelog, and cc linux-doc.

--Andy
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Rob Landley April 16, 2012, 7:11 p.m. UTC | #18
On 04/11/2012 02:31 PM, Michael Kerrisk (man-pages) wrote:
> On Sat, Apr 7, 2012 at 8:28 AM, Jonathan Corbet <corbet@lwn.net> wrote:
>> On Fri, 6 Apr 2012 13:01:17 -0700
>> Andrew Lutomirski <luto@mit.edu> wrote:
>>
>>> This has been bugging me for awhile.  Is there any interest in moving
>>> the manpages into the kernel source tree?  Then there could be a
>>> general requirement that new APIs get documented when they're written.
>>
>> Man page (or other documentation) requirements for patch acceptance are a
>> regular kernel summit feature.  People seem to think it's a good idea, but
>> actual enforcement of such requirements always seems to be lacking.  Lots
>> of people have kind of given up trying.  I don't really see that adding
>> the man pages to the tree would help, but I could be wrong...
> 
> I largely consider this (moving man pages to kernel.org) a technical
> solution to what is fundamentally a social problem (developers
> reluctant to write documentation), and doubt that the technical
> solution would make much difference.

*nod* *nod*

> I'd love to be proved wrong, but
> the experiment would require significant start-up effort. (My
> collected thoughts on this can be found here:
> http://www.kernel.org/doc/man-pages/todo.html#migrate_to_kernel_source.
> Note the alternative idea of patch tags mentioned at the end of that
> text.)
> 
> Unless, or until there's a paid maintainer, I don't expect things to
> get significantly better than what they currently are.

Maintainer of which, the man pages or the kernel Documentation directory?

I just got handed the Documentation ball (right as relatives were
visiting and a couple days before buying a new house, so I've just put
_tons_ of time into it so far). I have grandiose plans for cleaning it
up, but first I need to get my kernel.org account working again.

That said, I think having man-pages in the kernel directory is a bad
idea, for reasons I already posted to this thread.

> The quite
> significant improvements in man-pages since 2004, when I became
> maintainer were in small part due to the fact that I was for a short
> period paid to do the work, but in much larger part due to a huge
> private effort over those years which over the last couple of years is
> no longer unsustainable for me (man-pages is in competition with
> requirements for my attention from family, working life, and
> (seriously!) seismic events),

Heh, I know the feeling. :)

Circa 2007 I was paid to work on documentation for half a year (hence
the http://kernel.org/doc directory I stopped being able to update when
kernel.org got hacked). These days it competes with my toybox and
aboriginal linux projects, and with my day job.

The way I'm looking at it is I'm _curating_ documentation.  I'm acting
as some kind of of librarian, and my first goal is reshuffling the files
in Documentation into some semblance of order so you can see what's
there.  (I've posted about that before here, moving
architecture-specific stuff under an arch subdirectory and so on.)

I do sometimes write new documentation, but no human being knows
everything there is to know about the kernel. Building expertise is
enormously time consuming,

That said, if anything was going to move into the kernel moving the
syscall info into javadoc might make sense.

Something that might help you is the syscall mining script snippet I
posted last time:

find . -name "*.c" -print0 | \
xargs -n1 -0 sed -n -e 's/.*\(SYSCALL_DEFINE[0-9](\)/\1/' \
  -e 't got;d;:got;s/).*/)/p;t;N;b got'

I might be able to build a script around that which would look up the
system call number, figure out which architectures implement this call,
find any javadoc at the call site, and so on. That way we could automate
this a bit.  For example, kernel/fork.c has two syscalls:

SYSCALL_DEFINE1(set_tid_address, int __user *, tidptr)

/*
 * unshare allows a process to 'unshare' part of the process
 * context which was originally shared using clone.  copy_*
 * functions used by do_fork() cannot be used here directly
 * because they modify an inactive task_struct that is being
 * constructed. Here we are modifying the current, active,
 * task_struct.
 */
SYSCALL_DEFINE1(unshare, unsigned long, unshare_flags)

Both of these have man pages which provide way more info than the
comments (if any). Is there any sort of javadoc comment before the
syscall that might provide useful information you could automatically
harvest?  Some sort of standard header briefly defining the syscall?

(P.S. Speaking of man 2 unshare, what's with the #define _GNU_SOURCE for
a new linux kernel syscall? What the heck does the FSF have to do with
anything? This didn't used to be needed in ubuntu 10.04 but then the
headers changed to match the man page, which I found sad...)

> Cheers,
> 
> Michael

Rob
diff mbox

Patch

diff --git a/fs/exec.c b/fs/exec.c
index c8b63d1..a8451ec 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1245,6 +1245,13 @@  static int check_unsafe_exec(struct linux_binprm *bprm)
 			bprm->unsafe |= LSM_UNSAFE_PTRACE;
 	}
 
+	/*
+	 * This isn't strictly necessary, but it makes it harder for LSMs to
+	 * mess up.
+	 */
+	if (current->no_new_privs)
+		bprm->unsafe |= LSM_UNSAFE_NO_NEW_PRIVS;
+
 	n_fs = 1;
 	spin_lock(&p->fs->lock);
 	rcu_read_lock();
@@ -1288,7 +1295,8 @@  int prepare_binprm(struct linux_binprm *bprm)
 	bprm->cred->euid = current_euid();
 	bprm->cred->egid = current_egid();
 
-	if (!(bprm->file->f_path.mnt->mnt_flags & MNT_NOSUID)) {
+	if (!(bprm->file->f_path.mnt->mnt_flags & MNT_NOSUID) &&
+	    !current->no_new_privs) {
 		/* Set-uid? */
 		if (mode & S_ISUID) {
 			bprm->per_clear |= PER_CLEAR_ON_SETID;
diff --git a/include/linux/prctl.h b/include/linux/prctl.h
index e0cfec2..78b76e2 100644
--- a/include/linux/prctl.h
+++ b/include/linux/prctl.h
@@ -124,4 +124,19 @@ 
 #define PR_SET_CHILD_SUBREAPER 36
 #define PR_GET_CHILD_SUBREAPER 37
 
+/*
+ * If no_new_privs is set, then operations that grant new privileges (i.e.
+ * execve) will either fail or not grant them.  This affects suid/sgid,
+ * file capabilities, and LSMs.
+ *
+ * Operations that merely manipulate or drop existing privileges (setresuid,
+ * capset, etc.) will still work.  Drop those privileges if you want them gone.
+ *
+ * Changing LSM security domain is considered a new privilege.  So, for example,
+ * asking selinux for a specific new context (e.g. with runcon) will result
+ * in execve returning -EPERM.
+ */
+#define PR_SET_NO_NEW_PRIVS 38
+#define PR_GET_NO_NEW_PRIVS 39
+
 #endif /* _LINUX_PRCTL_H */
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 81a173c..ba60897 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1341,6 +1341,8 @@  struct task_struct {
 				 * execve */
 	unsigned in_iowait:1;
 
+	/* task may not gain privileges */
+	unsigned no_new_privs:1;
 
 	/* Revert to default priority/policy when forking */
 	unsigned sched_reset_on_fork:1;
diff --git a/include/linux/security.h b/include/linux/security.h
index 673afbb..6e1dea9 100644
--- a/include/linux/security.h
+++ b/include/linux/security.h
@@ -144,6 +144,7 @@  struct request_sock;
 #define LSM_UNSAFE_SHARE	1
 #define LSM_UNSAFE_PTRACE	2
 #define LSM_UNSAFE_PTRACE_CAP	4
+#define LSM_UNSAFE_NO_NEW_PRIVS	8
 
 #ifdef CONFIG_MMU
 extern int mmap_min_addr_handler(struct ctl_table *table, int write,
diff --git a/kernel/sys.c b/kernel/sys.c
index e7006eb..b82568b 100644
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -1979,6 +1979,16 @@  SYSCALL_DEFINE5(prctl, int, option, unsigned long, arg2, unsigned long, arg3,
 			error = put_user(me->signal->is_child_subreaper,
 					 (int __user *) arg2);
 			break;
+		case PR_SET_NO_NEW_PRIVS:
+			if (arg2 != 1 || arg3 || arg4 || arg5)
+				return -EINVAL;
+
+			current->no_new_privs = 1;
+			break;
+		case PR_GET_NO_NEW_PRIVS:
+			if (arg2 || arg3 || arg4 || arg5)
+				return -EINVAL;
+			return current->no_new_privs ? 1 : 0;
 		default:
 			error = -EINVAL;
 			break;
diff --git a/security/apparmor/domain.c b/security/apparmor/domain.c
index 6327685..18c88d0 100644
--- a/security/apparmor/domain.c
+++ b/security/apparmor/domain.c
@@ -360,6 +360,10 @@  int apparmor_bprm_set_creds(struct linux_binprm *bprm)
 	if (bprm->cred_prepared)
 		return 0;
 
+	/* XXX: no_new_privs is not usable with AppArmor yet */
+	if (bprm->unsafe & LSM_UNSAFE_NO_NEW_PRIVS)
+		return -EPERM;
+
 	cxt = bprm->cred->security;
 	BUG_ON(!cxt);
 
diff --git a/security/commoncap.c b/security/commoncap.c
index 0cf4b53..edd3918 100644
--- a/security/commoncap.c
+++ b/security/commoncap.c
@@ -506,14 +506,17 @@  int cap_bprm_set_creds(struct linux_binprm *bprm)
 skip:
 
 	/* Don't let someone trace a set[ug]id/setpcap binary with the revised
-	 * credentials unless they have the appropriate permit
+	 * credentials unless they have the appropriate permit.
+	 *
+	 * In addition, if NO_NEW_PRIVS, then ensure we get no new privs.
 	 */
 	if ((new->euid != old->uid ||
 	     new->egid != old->gid ||
 	     !cap_issubset(new->cap_permitted, old->cap_permitted)) &&
 	    bprm->unsafe & ~LSM_UNSAFE_PTRACE_CAP) {
 		/* downgrade; they get no more than they had, and maybe less */
-		if (!capable(CAP_SETUID)) {
+		if (!capable(CAP_SETUID) ||
+		    (bprm->unsafe & LSM_UNSAFE_NO_NEW_PRIVS)) {
 			new->euid = new->uid;
 			new->egid = new->gid;
 		}
diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
index 3049299..be99c84 100644
--- a/security/selinux/hooks.c
+++ b/security/selinux/hooks.c
@@ -2000,6 +2000,13 @@  static int selinux_bprm_set_creds(struct linux_binprm *bprm)
 		new_tsec->sid = old_tsec->exec_sid;
 		/* Reset exec SID on execve. */
 		new_tsec->exec_sid = 0;
+
+		/*
+		 * Minimize confusion: if no_new_privs and a transition is
+		 * explicitly requested, then fail the exec.
+		 */
+		if (bprm->unsafe & LSM_UNSAFE_NO_NEW_PRIVS)
+			return -EPERM;
 	} else {
 		/* Check for a default transition on this program. */
 		rc = security_transition_sid(old_tsec->sid, isec->sid,
@@ -2012,7 +2019,8 @@  static int selinux_bprm_set_creds(struct linux_binprm *bprm)
 	COMMON_AUDIT_DATA_INIT(&ad, PATH);
 	ad.u.path = bprm->file->f_path;
 
-	if (bprm->file->f_path.mnt->mnt_flags & MNT_NOSUID)
+	if ((bprm->file->f_path.mnt->mnt_flags & MNT_NOSUID) ||
+	    (bprm->unsafe & LSM_UNSAFE_NO_NEW_PRIVS))
 		new_tsec->sid = old_tsec->sid;
 
 	if (new_tsec->sid == old_tsec->sid) {