Message ID | 1333051320-30872-2-git-send-email-wad@chromium.org |
---|---|
State | Not Applicable, archived |
Delegated to: | David Miller |
Headers | show |
On Thu, 29 Mar 2012 15:01:46 -0500 Will Drewry <wad@chromium.org> wrote: > From: Andy Lutomirski <luto@amacapital.net> > > With this set, a lot of dangerous operations (chroot, unshare, etc) > become a lot less dangerous because there is no possibility of > subverting privileged binaries. > > This patch completely breaks apparmor. Someone who understands (and > uses) apparmor should fix it or at least give me a hint. So [patch 2/15] fixes all this up? I guess we should join the two patches into one, to avoid a silly breakage window. That means that John loses a brownie point, but we can mention him in the changelog, include his signed-off-by: > Signed-off-by: Andy Lutomirski <luto@amacapital.net> Several of these patches are missing your signed-off-by:. They should all have your SOB, because you sent them. Documentation/SubmittingPatches explains this. I'm trying to find a way to merge all this code without reviewing it ;) Alas, this is against my rules. Given the length of time for which this patchset has been floating around, I'm a little surprised by the lack of acked-by's and reviewed-by's. Have you been gathering them all up? Are the networking guys all happy about this patchset? -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Fri, Apr 6, 2012 at 12:49 PM, Andrew Morton <akpm@linux-foundation.org> wrote: > On Thu, 29 Mar 2012 15:01:46 -0500 > Will Drewry <wad@chromium.org> wrote: > >> From: Andy Lutomirski <luto@amacapital.net> >> >> With this set, a lot of dangerous operations (chroot, unshare, etc) >> become a lot less dangerous because there is no possibility of >> subverting privileged binaries. >> >> This patch completely breaks apparmor. Someone who understands (and >> uses) apparmor should fix it or at least give me a hint. > > So [patch 2/15] fixes all this up? > > I guess we should join the two patches into one, to avoid a silly > breakage window. That means that John loses a brownie point, but we > can mention him in the changelog, include his signed-off-by: Or just fix the commit message. It no longer completely breaks AppArmor. It just causes execve to fail when PR_SET_NO_NEW_PRIVS is set and AppArmor is in use. --Andy -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thu, 29 Mar 2012 15:01:46 -0500 Will Drewry <wad@chromium.org> wrote: > From: Andy Lutomirski <luto@amacapital.net> > > With this set, a lot of dangerous operations (chroot, unshare, etc) > become a lot less dangerous because there is no possibility of > subverting privileged binaries. The changelog doesn't explain the semantics of the new syscall. There's a comment way-down-there which I guess suffices, if you hunt for it. And the changelog doesn't explain why this is being added. Presumably seccomp_filter wants/needs this feature but whowhatwherewhenwhy? Spell it all out, please. The new syscall mode will be documented in the prctl manpage. Please cc linux-man@vger.kernel.org and work with Michael on getting this done? > > ... > -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Fri, Apr 6, 2012 at 12:55 PM, Andrew Morton <akpm@linux-foundation.org> wrote: > On Thu, 29 Mar 2012 15:01:46 -0500 > Will Drewry <wad@chromium.org> wrote: > >> From: Andy Lutomirski <luto@amacapital.net> >> >> With this set, a lot of dangerous operations (chroot, unshare, etc) >> become a lot less dangerous because there is no possibility of >> subverting privileged binaries. > > The changelog doesn't explain the semantics of the new syscall. > There's a comment way-down-there which I guess suffices, if you hunt > for it. > > And the changelog doesn't explain why this is being added. Presumably > seccomp_filter wants/needs this feature but whowhatwherewhenwhy? Spell > it all out, please. > > The new syscall mode will be documented in the prctl manpage. Please > cc linux-man@vger.kernel.org and work with Michael on getting this > done? This has been bugging me for awhile. Is there any interest in moving the manpages into the kernel source tree? Then there could be a general requirement that new APIs get documented when they're written. (There are plenty of barely- or incompletely-documented syscalls. futex and relatives come to mind.) --Andy -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Fri, 6 Apr 2012 13:01:17 -0700 Andrew Lutomirski <luto@mit.edu> wrote: > This has been bugging me for awhile. Is there any interest in moving > the manpages into the kernel source tree? Then there could be a > general requirement that new APIs get documented when they're written. Man page (or other documentation) requirements for patch acceptance are a regular kernel summit feature. People seem to think it's a good idea, but actual enforcement of such requirements always seems to be lacking. Lots of people have kind of given up trying. I don't really see that adding the man pages to the tree would help, but I could be wrong... jon -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Fri, Apr 6, 2012 at 1:28 PM, Jonathan Corbet <corbet@lwn.net> wrote: > On Fri, 6 Apr 2012 13:01:17 -0700 > Andrew Lutomirski <luto@mit.edu> wrote: > >> This has been bugging me for awhile. Is there any interest in moving >> the manpages into the kernel source tree? Then there could be a >> general requirement that new APIs get documented when they're written. > > Man page (or other documentation) requirements for patch acceptance are a > regular kernel summit feature. People seem to think it's a good idea, but > actual enforcement of such requirements always seems to be lacking. Lots > of people have kind of given up trying. I don't really see that adding > the man pages to the tree would help, but I could be wrong... > If it's in the source, then I can send it with git format-patch. If it's out of tree, I have to find the tree, clone the tree, figure out how to submit, and send separate emails. And then whoever checks that I documented it has to figure out where I sent it and how to read it and then try to decide which documentation submission matches which patch submission. (Also, if it's in-tree, then I can build the docs from a kernel tree and have the latest ones. That could be nice.) --Andy -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Fri, Apr 6, 2012 at 12:49, Andrew Morton <akpm@linux-foundation.org> wrote: > On Thu, 29 Mar 2012 15:01:46 -0500 > Will Drewry <wad@chromium.org> wrote: >> From: Andy Lutomirski <luto@amacapital.net> >> With this set, a lot of dangerous operations (chroot, unshare, etc) >> become a lot less dangerous because there is no possibility of >> subverting privileged binaries. I don't want to derail things. So, tell me to go away, if I can't have what I want. Having said that, it would be great if NO_NEW_PRIVS also gave access to the restricted clone() flags. Such as CLONE_NEWIPC, CLONE_NEWNET and CLONE_NEWPID. Markus -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Fri, Apr 6, 2012 at 1:47 PM, Markus Gutschke <markus@chromium.org> wrote: > On Fri, Apr 6, 2012 at 12:49, Andrew Morton <akpm@linux-foundation.org> wrote: >> On Thu, 29 Mar 2012 15:01:46 -0500 >> Will Drewry <wad@chromium.org> wrote: >>> From: Andy Lutomirski <luto@amacapital.net> >>> With this set, a lot of dangerous operations (chroot, unshare, etc) >>> become a lot less dangerous because there is no possibility of >>> subverting privileged binaries. > > I don't want to derail things. So, tell me to go away, if I can't have > what I want. > > Having said that, it would be great if NO_NEW_PRIVS also gave access > to the restricted clone() flags. Such as CLONE_NEWIPC, CLONE_NEWNET > and CLONE_NEWPID. I decided to hold off on extra controversy for awhile. However: https://git.kernel.org/?p=linux/kernel/git/luto/linux.git;a=commit;h=9a520b74ad5dc14a3d6950b6d63a64714adbdd7d and http://web.mit.edu/luto/www/linux/nnp/newns.c I fully intend to resurrect both of those once nnp lands. (FWIW, I think that CLONE_NEWPID interacts badly with unix socket credentials and should be fixed as a prerequisite for making it easier to access.) --Andy -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Fri, Apr 6, 2012 at 13:54, Andrew Lutomirski <luto@mit.edu> wrote: > (FWIW, I think that CLONE_NEWPID interacts badly with unix socket > credentials and should be fixed as a prerequisite for making it easier > to access.) sendmsg() is a big hair ball. And yes, I have all sorts of things on my wish list that touch sendmsg(). I don't think, I'll get them fullfilled anytime soon, though: - sendmsg() does lots of different things: sending on unconnected sockets, sending file descriptors, sending unix decriptors, and plain-old sending data. These operations have very different security properties and implications. It would be awesome if BPF filters could filter these different types of operations selectively. - ancillary data is a cool concept in general. We should use it more. If I could send a memory mapping from one process to another, that would solve so many problems. But I know, I am dreaming; I don't expect to see this feature any time soon. Markus -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Fri, Apr 6, 2012 at 2:04 PM, Markus Gutschke <markus@chromium.org> wrote: > On Fri, Apr 6, 2012 at 13:54, Andrew Lutomirski <luto@mit.edu> wrote: >> (FWIW, I think that CLONE_NEWPID interacts badly with unix socket >> credentials and should be fixed as a prerequisite for making it easier >> to access.) > > sendmsg() is a big hair ball. And yes, I have all sorts of things on > my wish list that touch sendmsg(). I don't think, I'll get them > fullfilled anytime soon, though: > > - sendmsg() does lots of different things: sending on unconnected > sockets, sending file descriptors, sending unix decriptors, and > plain-old sending data. These operations have very different security > properties and implications. It would be awesome if BPF filters could > filter these different types of operations selectively. > > - ancillary data is a cool concept in general. We should use it more. > If I could send a memory mapping from one process to another, that > would solve so many problems. But I know, I am dreaming; I don't > expect to see this feature any time soon. > Agreed, but I'm talking about something totally different: if I can use CLONE_NEWPID, then I can send an unexpected pid with SCM_CREDS. The SCM_CREDS receive code should remap pids. --Andy -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Fri, Apr 6, 2012 at 14:15, Andrew Lutomirski <luto@mit.edu> wrote: > Agreed, but I'm talking about something totally different: if I can > use CLONE_NEWPID, then I can send an unexpected pid with SCM_CREDS. > The SCM_CREDS receive code should remap pids. Yes, I know. It's broken. And so is the view of the /proc filesystem when inside a pid namespace. And things behave funny if you don't set up a new "init" process inside of the pid namespace. And I am sure, a few other things are broken that we just haven't run into. CLONE_NEWPID is tricky. I can understand, if you want to fix it first. Looking forward to seeing some patches in the future; please cc me, if you want feedback from an actual user of this code. The SCM_CREDS issue is the most serious one of the above, but it doesn't bother me personally, as I would just set up my sandbox policy to disallow all of SCM_CREDS (*). But that obviously not a good excuse for leaving a kernel bug around. Overall, I like both NO_NEW_PRIVS and BPF filters for seccomp though; they are a great way to reduce the attack surface of the kernel. Kernel bugs become a lot less of a headache, if I have a way to filter out the buggy parts of the kernel. It isn't a panacea, but it's a great new tool to harden applications. Markus *) this is currently difficult to filter SCM_CREDS, if we still want to allow SCM_RIGHTS. See my earlier complaint about sendmsg(). Currently, filtering of sendmsg() probably requires the use of a helper process. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Fri, Apr 6, 2012 at 2:55 PM, Andrew Morton <akpm@linux-foundation.org> wrote: > On Thu, 29 Mar 2012 15:01:46 -0500 > Will Drewry <wad@chromium.org> wrote: > >> From: Andy Lutomirski <luto@amacapital.net> >> >> With this set, a lot of dangerous operations (chroot, unshare, etc) >> become a lot less dangerous because there is no possibility of >> subverting privileged binaries. > > The changelog doesn't explain the semantics of the new syscall. > There's a comment way-down-there which I guess suffices, if you hunt > for it. I'll bubble up luto's comment into the changelog when I resend the grand-unified-patchset. > And the changelog doesn't explain why this is being added. Presumably > seccomp_filter wants/needs this feature but whowhatwherewhenwhy? Spell > it all out, please. I'll try my hand at that and luto@ can yell at me if I misrepresent. Seem reasonable? > The new syscall mode will be documented in the prctl manpage. Please > cc linux-man@vger.kernel.org and work with Michael on getting this > done? I'll add linux-man to the patch series since this applies to both no_new_privs and seccomp filter. Thanks! >> >> ... >> -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Fri, Apr 6, 2012 at 2:49 PM, Andrew Morton <akpm@linux-foundation.org> wrote: > On Thu, 29 Mar 2012 15:01:46 -0500 > Will Drewry <wad@chromium.org> wrote: > >> From: Andy Lutomirski <luto@amacapital.net> >> >> With this set, a lot of dangerous operations (chroot, unshare, etc) >> become a lot less dangerous because there is no possibility of >> subverting privileged binaries. >> >> This patch completely breaks apparmor. Someone who understands (and >> uses) apparmor should fix it or at least give me a hint. > > So [patch 2/15] fixes all this up? > > I guess we should join the two patches into one, to avoid a silly > breakage window. That means that John loses a brownie point, but we > can mention him in the changelog, include his signed-off-by: > >> Signed-off-by: Andy Lutomirski <luto@amacapital.net> > > Several of these patches are missing your signed-off-by:. They should > all have your SOB, because you sent them. > Documentation/SubmittingPatches explains this. Oops - I'll add them! > I'm trying to find a way to merge all this code without reviewing it ;) > Alas, this is against my rules. Given the length of time for which > this patchset has been floating around, I'm a little surprised by the > lack of acked-by's and reviewed-by's. Have you been gathering them all > up? Are the networking guys all happy about this patchset? eric.dumazet@gmail.com acked the networking ones, and I have a smattering of others for the other patches. Given the review and feedback, I don't have a huge number of acked/reviewed-bys. I tried not to lose any after the first couple of revs, but I know I did some things wrong early on. I can prod some others who've contributed to add their tags, unless there is a good reason for them not too. I suspect it was just because of partial/drive-by reviewing, but I don't know. thanks! will -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 04/06/2012 03:01 PM, Andrew Lutomirski wrote: > On Fri, Apr 6, 2012 at 12:55 PM, Andrew Morton > <akpm@linux-foundation.org> wrote: >> On Thu, 29 Mar 2012 15:01:46 -0500 >> Will Drewry <wad@chromium.org> wrote: >> >>> From: Andy Lutomirski <luto@amacapital.net> >>> >>> With this set, a lot of dangerous operations (chroot, unshare, etc) >>> become a lot less dangerous because there is no possibility of >>> subverting privileged binaries. >> >> The changelog doesn't explain the semantics of the new syscall. >> There's a comment way-down-there which I guess suffices, if you hunt >> for it. >> >> And the changelog doesn't explain why this is being added. Presumably >> seccomp_filter wants/needs this feature but whowhatwherewhenwhy? Spell >> it all out, please. >> >> The new syscall mode will be documented in the prctl manpage. Please >> cc linux-man@vger.kernel.org and work with Michael on getting this >> done? > > This has been bugging me for awhile. Is there any interest in moving > the manpages into the kernel source tree? Not that I know of. I'm pretty sure if the guy maintaining it (Michael Kerrisk) wanted to do that, he could have raised the issue at any time over the past several years. > Then there could be a > general requirement that new APIs get documented when they're written. Because having a Documentation directory, javadoc in the source itself (some of which is combined with the Documentation/DocBook xml files to form the make htmldocs output), menuconfig help text, and a whole buch of scattered readmes does _not_ get new APIs documented as they're written. That isn't even counting git commit comments and mailing list messages in various web archives. Are you going to suck the linux weekly news kernel articles into the tree (http://lwn.net/Kernel/Index)? How about Linux Journal's complete archives going back to 2004 (http://www.linuxjournal.com/magazine)? Or the h-online and kernelnewbies writeups? How about wikipedia pages on interesting kernel topics? The sourceforge pages for userspace projects like lxc.sf.net or i2c-utils? How about that device driver writing tutorial Greg KH recorded in 2008, that's only a 2.8 gigabyte video file. Rusty's Unreliable Guides? Greg KH's blog? (Heck, http://kernelplanet.org). Speaking of videos, here's the 2011 LinuxCon Japan talks: http://video.linux.com/categories/2011-linuxcon-japan And here are videos for the Consumer Electronic Linux Forum: http://free-electrons.com/blog/elc-2012-videos/ (and you can get 2011, 2010, 2006...) Here are Ottawa Linux Symposium papers: http://kernel.org/doc/ols Don't forget IBM Developerworks' library: http://www.ibm.com/developerworks/linux/library/l-linux-kernel/ Have some standards documents: http://www.opengroup.org/onlinepubs/9699919799/ http://busybox.net/~landley/c99-draft.html http://www.unix.org/whitepapers/64bit.html http://refspecs.linuxfoundation.org http://t10.org/scsi-3.htm Here's a random blog post about booting a bare metal "hello world" program on qemu for ARM: http://balau82.wordpress.com/2010/02/28/hello-world-for-bare-metal-arm-using-qemu/ Let's pick a topic, like the ELF loader. Here's the best introduction of how ELF files _really_ work I've seen: http://muppetlabs.com/~breadbox/software/tiny/teensy.html Although http://linuxjournal.com/article/1059 is pretty good too, as were http://linuxjournal.com/article/1060 and http://linuxjournal.com/article/80 as well. And if you want the _details_, here's an extremely dry online book: http;//www.iecc.com/linker/ And here's the first entry in the blog series the guy who wrote "gold" did about writing his new linker: http://www.airs.com/blog/archives/38 And so on, and so forth... > (There are plenty of barely- or incompletely-documented syscalls. > futex and relatives come to mind.) Your proposal does not address this problem. speaking of syscalls, I do note that ever since I tried to add Hexagon support to strace (less fun than it sounds), I've wanted a way to beat proper syscall information out of the kernel headers so I could get not just syscall numbers but how many arguments and a brief stab at the type of each argument. Of course you _can_ get argument type and count, but not from the headers: you have to use moderately horrible sed on the kernel's source code, ala: find . -name "*.c" -print0 | \ xargs -n1 -0 sed -n -e 's/.*\(SYSCALL_DEFINE[0-9](\)/\1/' \ -e 't got;d;:got;s/).*/)/p;t;N;b got' > --Andy Rob
On Sat, Apr 7, 2012 at 8:28 AM, Jonathan Corbet <corbet@lwn.net> wrote: > On Fri, 6 Apr 2012 13:01:17 -0700 > Andrew Lutomirski <luto@mit.edu> wrote: > >> This has been bugging me for awhile. Is there any interest in moving >> the manpages into the kernel source tree? Then there could be a >> general requirement that new APIs get documented when they're written. > > Man page (or other documentation) requirements for patch acceptance are a > regular kernel summit feature. People seem to think it's a good idea, but > actual enforcement of such requirements always seems to be lacking. Lots > of people have kind of given up trying. I don't really see that adding > the man pages to the tree would help, but I could be wrong... I largely consider this (moving man pages to kernel.org) a technical solution to what is fundamentally a social problem (developers reluctant to write documentation), and doubt that the technical solution would make much difference. I'd love to be proved wrong, but the experiment would require significant start-up effort. (My collected thoughts on this can be found here: http://www.kernel.org/doc/man-pages/todo.html#migrate_to_kernel_source. Note the alternative idea of patch tags mentioned at the end of that text.) Unless, or until there's a paid maintainer, I don't expect things to get significantly better than what they currently are. The quite significant improvements in man-pages since 2004, when I became maintainer were in small part due to the fact that I was for a short period paid to do the work, but in much larger part due to a huge private effort over those years which over the last couple of years is no longer unsustainable for me (man-pages is in competition with requirements for my attention from family, working life, and (seriously!) seismic events), Cheers, Michael
> no longer unsustainable for me (man-pages is in competition with
^no longer unsustainable^no longer sustainable
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, Apr 11, 2012 at 12:31 PM, Michael Kerrisk (man-pages) <mtk.manpages@gmail.com> wrote: > On Sat, Apr 7, 2012 at 8:28 AM, Jonathan Corbet <corbet@lwn.net> wrote: >> On Fri, 6 Apr 2012 13:01:17 -0700 >> Andrew Lutomirski <luto@mit.edu> wrote: >> >>> This has been bugging me for awhile. Is there any interest in moving >>> the manpages into the kernel source tree? Then there could be a >>> general requirement that new APIs get documented when they're written. >> >> Man page (or other documentation) requirements for patch acceptance are a >> regular kernel summit feature. People seem to think it's a good idea, but >> actual enforcement of such requirements always seems to be lacking. Lots >> of people have kind of given up trying. I don't really see that adding >> the man pages to the tree would help, but I could be wrong... > > I largely consider this (moving man pages to kernel.org) a technical > solution to what is fundamentally a social problem (developers > reluctant to write documentation), and doubt that the technical > solution would make much difference. I'd love to be proved wrong, but > the experiment would require significant start-up effort. (My > collected thoughts on this can be found here: > http://www.kernel.org/doc/man-pages/todo.html#migrate_to_kernel_source. > Note the alternative idea of patch tags mentioned at the end of that > text.) > > Unless, or until there's a paid maintainer, I don't expect things to > get significantly better than what they currently are. The quite > significant improvements in man-pages since 2004, when I became > maintainer were in small part due to the fact that I was for a short > period paid to do the work, but in much larger part due to a huge > private effort over those years which over the last couple of years is > no longer unsustainable for me (man-pages is in competition with > requirements for my attention from family, working life, and > (seriously!) seismic events), Hrm. Maybe someone could convince Andrew and Linus not to pull new syscalls or major ABI features unless the patchset includes full docs. Anyway, I'll write up a detailed description of PR_SET_NO_NEW_PRIVS, stick it in the changelog, and cc linux-doc. --Andy -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 04/11/2012 02:31 PM, Michael Kerrisk (man-pages) wrote: > On Sat, Apr 7, 2012 at 8:28 AM, Jonathan Corbet <corbet@lwn.net> wrote: >> On Fri, 6 Apr 2012 13:01:17 -0700 >> Andrew Lutomirski <luto@mit.edu> wrote: >> >>> This has been bugging me for awhile. Is there any interest in moving >>> the manpages into the kernel source tree? Then there could be a >>> general requirement that new APIs get documented when they're written. >> >> Man page (or other documentation) requirements for patch acceptance are a >> regular kernel summit feature. People seem to think it's a good idea, but >> actual enforcement of such requirements always seems to be lacking. Lots >> of people have kind of given up trying. I don't really see that adding >> the man pages to the tree would help, but I could be wrong... > > I largely consider this (moving man pages to kernel.org) a technical > solution to what is fundamentally a social problem (developers > reluctant to write documentation), and doubt that the technical > solution would make much difference. *nod* *nod* > I'd love to be proved wrong, but > the experiment would require significant start-up effort. (My > collected thoughts on this can be found here: > http://www.kernel.org/doc/man-pages/todo.html#migrate_to_kernel_source. > Note the alternative idea of patch tags mentioned at the end of that > text.) > > Unless, or until there's a paid maintainer, I don't expect things to > get significantly better than what they currently are. Maintainer of which, the man pages or the kernel Documentation directory? I just got handed the Documentation ball (right as relatives were visiting and a couple days before buying a new house, so I've just put _tons_ of time into it so far). I have grandiose plans for cleaning it up, but first I need to get my kernel.org account working again. That said, I think having man-pages in the kernel directory is a bad idea, for reasons I already posted to this thread. > The quite > significant improvements in man-pages since 2004, when I became > maintainer were in small part due to the fact that I was for a short > period paid to do the work, but in much larger part due to a huge > private effort over those years which over the last couple of years is > no longer unsustainable for me (man-pages is in competition with > requirements for my attention from family, working life, and > (seriously!) seismic events), Heh, I know the feeling. :) Circa 2007 I was paid to work on documentation for half a year (hence the http://kernel.org/doc directory I stopped being able to update when kernel.org got hacked). These days it competes with my toybox and aboriginal linux projects, and with my day job. The way I'm looking at it is I'm _curating_ documentation. I'm acting as some kind of of librarian, and my first goal is reshuffling the files in Documentation into some semblance of order so you can see what's there. (I've posted about that before here, moving architecture-specific stuff under an arch subdirectory and so on.) I do sometimes write new documentation, but no human being knows everything there is to know about the kernel. Building expertise is enormously time consuming, That said, if anything was going to move into the kernel moving the syscall info into javadoc might make sense. Something that might help you is the syscall mining script snippet I posted last time: find . -name "*.c" -print0 | \ xargs -n1 -0 sed -n -e 's/.*\(SYSCALL_DEFINE[0-9](\)/\1/' \ -e 't got;d;:got;s/).*/)/p;t;N;b got' I might be able to build a script around that which would look up the system call number, figure out which architectures implement this call, find any javadoc at the call site, and so on. That way we could automate this a bit. For example, kernel/fork.c has two syscalls: SYSCALL_DEFINE1(set_tid_address, int __user *, tidptr) /* * unshare allows a process to 'unshare' part of the process * context which was originally shared using clone. copy_* * functions used by do_fork() cannot be used here directly * because they modify an inactive task_struct that is being * constructed. Here we are modifying the current, active, * task_struct. */ SYSCALL_DEFINE1(unshare, unsigned long, unshare_flags) Both of these have man pages which provide way more info than the comments (if any). Is there any sort of javadoc comment before the syscall that might provide useful information you could automatically harvest? Some sort of standard header briefly defining the syscall? (P.S. Speaking of man 2 unshare, what's with the #define _GNU_SOURCE for a new linux kernel syscall? What the heck does the FSF have to do with anything? This didn't used to be needed in ubuntu 10.04 but then the headers changed to match the man page, which I found sad...) > Cheers, > > Michael Rob
diff --git a/fs/exec.c b/fs/exec.c index c8b63d1..a8451ec 100644 --- a/fs/exec.c +++ b/fs/exec.c @@ -1245,6 +1245,13 @@ static int check_unsafe_exec(struct linux_binprm *bprm) bprm->unsafe |= LSM_UNSAFE_PTRACE; } + /* + * This isn't strictly necessary, but it makes it harder for LSMs to + * mess up. + */ + if (current->no_new_privs) + bprm->unsafe |= LSM_UNSAFE_NO_NEW_PRIVS; + n_fs = 1; spin_lock(&p->fs->lock); rcu_read_lock(); @@ -1288,7 +1295,8 @@ int prepare_binprm(struct linux_binprm *bprm) bprm->cred->euid = current_euid(); bprm->cred->egid = current_egid(); - if (!(bprm->file->f_path.mnt->mnt_flags & MNT_NOSUID)) { + if (!(bprm->file->f_path.mnt->mnt_flags & MNT_NOSUID) && + !current->no_new_privs) { /* Set-uid? */ if (mode & S_ISUID) { bprm->per_clear |= PER_CLEAR_ON_SETID; diff --git a/include/linux/prctl.h b/include/linux/prctl.h index e0cfec2..78b76e2 100644 --- a/include/linux/prctl.h +++ b/include/linux/prctl.h @@ -124,4 +124,19 @@ #define PR_SET_CHILD_SUBREAPER 36 #define PR_GET_CHILD_SUBREAPER 37 +/* + * If no_new_privs is set, then operations that grant new privileges (i.e. + * execve) will either fail or not grant them. This affects suid/sgid, + * file capabilities, and LSMs. + * + * Operations that merely manipulate or drop existing privileges (setresuid, + * capset, etc.) will still work. Drop those privileges if you want them gone. + * + * Changing LSM security domain is considered a new privilege. So, for example, + * asking selinux for a specific new context (e.g. with runcon) will result + * in execve returning -EPERM. + */ +#define PR_SET_NO_NEW_PRIVS 38 +#define PR_GET_NO_NEW_PRIVS 39 + #endif /* _LINUX_PRCTL_H */ diff --git a/include/linux/sched.h b/include/linux/sched.h index 81a173c..ba60897 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -1341,6 +1341,8 @@ struct task_struct { * execve */ unsigned in_iowait:1; + /* task may not gain privileges */ + unsigned no_new_privs:1; /* Revert to default priority/policy when forking */ unsigned sched_reset_on_fork:1; diff --git a/include/linux/security.h b/include/linux/security.h index 673afbb..6e1dea9 100644 --- a/include/linux/security.h +++ b/include/linux/security.h @@ -144,6 +144,7 @@ struct request_sock; #define LSM_UNSAFE_SHARE 1 #define LSM_UNSAFE_PTRACE 2 #define LSM_UNSAFE_PTRACE_CAP 4 +#define LSM_UNSAFE_NO_NEW_PRIVS 8 #ifdef CONFIG_MMU extern int mmap_min_addr_handler(struct ctl_table *table, int write, diff --git a/kernel/sys.c b/kernel/sys.c index e7006eb..b82568b 100644 --- a/kernel/sys.c +++ b/kernel/sys.c @@ -1979,6 +1979,16 @@ SYSCALL_DEFINE5(prctl, int, option, unsigned long, arg2, unsigned long, arg3, error = put_user(me->signal->is_child_subreaper, (int __user *) arg2); break; + case PR_SET_NO_NEW_PRIVS: + if (arg2 != 1 || arg3 || arg4 || arg5) + return -EINVAL; + + current->no_new_privs = 1; + break; + case PR_GET_NO_NEW_PRIVS: + if (arg2 || arg3 || arg4 || arg5) + return -EINVAL; + return current->no_new_privs ? 1 : 0; default: error = -EINVAL; break; diff --git a/security/apparmor/domain.c b/security/apparmor/domain.c index 6327685..18c88d0 100644 --- a/security/apparmor/domain.c +++ b/security/apparmor/domain.c @@ -360,6 +360,10 @@ int apparmor_bprm_set_creds(struct linux_binprm *bprm) if (bprm->cred_prepared) return 0; + /* XXX: no_new_privs is not usable with AppArmor yet */ + if (bprm->unsafe & LSM_UNSAFE_NO_NEW_PRIVS) + return -EPERM; + cxt = bprm->cred->security; BUG_ON(!cxt); diff --git a/security/commoncap.c b/security/commoncap.c index 0cf4b53..edd3918 100644 --- a/security/commoncap.c +++ b/security/commoncap.c @@ -506,14 +506,17 @@ int cap_bprm_set_creds(struct linux_binprm *bprm) skip: /* Don't let someone trace a set[ug]id/setpcap binary with the revised - * credentials unless they have the appropriate permit + * credentials unless they have the appropriate permit. + * + * In addition, if NO_NEW_PRIVS, then ensure we get no new privs. */ if ((new->euid != old->uid || new->egid != old->gid || !cap_issubset(new->cap_permitted, old->cap_permitted)) && bprm->unsafe & ~LSM_UNSAFE_PTRACE_CAP) { /* downgrade; they get no more than they had, and maybe less */ - if (!capable(CAP_SETUID)) { + if (!capable(CAP_SETUID) || + (bprm->unsafe & LSM_UNSAFE_NO_NEW_PRIVS)) { new->euid = new->uid; new->egid = new->gid; } diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c index 3049299..be99c84 100644 --- a/security/selinux/hooks.c +++ b/security/selinux/hooks.c @@ -2000,6 +2000,13 @@ static int selinux_bprm_set_creds(struct linux_binprm *bprm) new_tsec->sid = old_tsec->exec_sid; /* Reset exec SID on execve. */ new_tsec->exec_sid = 0; + + /* + * Minimize confusion: if no_new_privs and a transition is + * explicitly requested, then fail the exec. + */ + if (bprm->unsafe & LSM_UNSAFE_NO_NEW_PRIVS) + return -EPERM; } else { /* Check for a default transition on this program. */ rc = security_transition_sid(old_tsec->sid, isec->sid, @@ -2012,7 +2019,8 @@ static int selinux_bprm_set_creds(struct linux_binprm *bprm) COMMON_AUDIT_DATA_INIT(&ad, PATH); ad.u.path = bprm->file->f_path; - if (bprm->file->f_path.mnt->mnt_flags & MNT_NOSUID) + if ((bprm->file->f_path.mnt->mnt_flags & MNT_NOSUID) || + (bprm->unsafe & LSM_UNSAFE_NO_NEW_PRIVS)) new_tsec->sid = old_tsec->sid; if (new_tsec->sid == old_tsec->sid) {