Message ID | m1ocj6qljj.fsf@fess.ebiederm.org |
---|---|
State | Not Applicable, archived |
Delegated to: | David Miller |
Headers | show |
Quoting Eric W. Biederman (ebiederm@xmission.com): > Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com> writes: > > > Eric W. Biederman [ebiederm@xmission.com] wrote: > > | > > | I think replacing a struct pid for another struct pid allocated in > > | descendant pid_namespace (but has all of the same struct upid values > > | as the first struct pid) is a disastrous idea. It destroys the > > > > True. Sorry, I did not mean we would need a new 'struct pid' for an > > existing process. I think we talked earlier of finding a way of attaching > > additional pid numbers to the same struct pid. > > I just played with this and if you make the semantics of unshare(CLONE_NEWPID) > to be that you become the idle task aka pid 0, and not the init task pid 1 the > implementation is trivial. Heh, and then (browsing through your copy_process() patch hunks) the next forked task becomes the child reaper for the new pidns? <shrug> why not I guess. Now if that child reaper then gets killed, will the idle task get killed too? And if not, then idle task can just re-populating the new pidns with new idle tasks... If this brought us a step closer to entering an existing pidns that would be one thing, but is there actually any advantage to being able to unshare a new pidns? Oh, I guess there is - PAM can then use it at login, which might be neat. -serge -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
> I just played with this and if you make the semantics of unshare(CLONE_NEWPID) > to be that you become the idle task aka pid 0, and not the init task pid 1 the > implementation is trivial. This is not ... handy - if after enter you have pid 0 you obviously can't perform 2 parallel enters. The way I see it: As far as the numbers reported to the userspace are concerned: 1. task, that enters is still visible by its old parent by old pid 2. task, that enters gets some pid within the entering namespace and reports its parent pid to have pid 1 (init obviously doesn't care) 3. we _can_ try to allocate new pid equal to the old one so that glibc stays happy As far as the pointers are concerned: 1. parent pointer doesn't change 2. task_pid(tsk) one (i.e. struct pid * one) _can_ change if a) we don't allow threads enter (de_thread problem is handeled) b) we don't allow leave the group/session, i.e. check, that there is the only one task that enters lives in its pgid/sid c) we wait for the quiescent state to pass by before destroying the old pid to handle race with sys_kill() Thoughts/questions? ("This is a nasty problem" answer is not acceptable, the real code problems/races please) -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
"Serge E. Hallyn" <serue@us.ibm.com> writes: > Quoting Eric W. Biederman (ebiederm@xmission.com): >> Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com> writes: >> >> > Eric W. Biederman [ebiederm@xmission.com] wrote: >> > | >> > | I think replacing a struct pid for another struct pid allocated in >> > | descendant pid_namespace (but has all of the same struct upid values >> > | as the first struct pid) is a disastrous idea. It destroys the >> > >> > True. Sorry, I did not mean we would need a new 'struct pid' for an >> > existing process. I think we talked earlier of finding a way of attaching >> > additional pid numbers to the same struct pid. >> >> I just played with this and if you make the semantics of unshare(CLONE_NEWPID) >> to be that you become the idle task aka pid 0, and not the init task pid 1 the >> implementation is trivial. > > Heh, and then (browsing through your copy_process() patch hunks) the next > forked task becomes the child reaper for the new pidns? <shrug> why not > I guess. > > Now if that child reaper then gets killed, will the idle task get killed too? No. > And if not, then idle task can just re-populating the new pidns with new > idle tasks... After zap_pid_namespace interesting... > If this brought us a step closer to entering an existing pidns that would > be one thing, but is there actually any advantage to being able to > unshare a new pidns? Oh, I guess there is - PAM can then use it at > login, which might be neat. I have to say that the semantics of my patch are unworkable for unshare. Unless I am mistaken for PAM to use it requires that the current process fully change and become what it needs to be. Requiring an extra fork to fully complete the process is a problem. Scratch one bright idea. Eric -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Pavel Emelyanov <xemul@parallels.com> writes: >> I just played with this and if you make the semantics of unshare(CLONE_NEWPID) >> to be that you become the idle task aka pid 0, and not the init task pid 1 the >> implementation is trivial. > > This is not ... handy - if after enter you have pid 0 you obviously > can't perform 2 parallel enters. 2 parallel enters? I meant you have pid 0 in the entered pid namespace. You have pid 0 because your pid simply does not map. There is nothing that makes to parallel enters impossible in that. Even today we have one thread per cpu that has task->pid == &init_struct_pid which is pid 0. For the case of unshare where we are designed to be used with PAM I don't think my proposed semantics work. For a join needed an extra fork before you are really in the pid namespace should be minor. > The way I see it: > > As far as the numbers reported to the userspace are concerned: > 1. task, that enters is still visible by its old parent by old pid > 2. task, that enters gets some pid within the entering namespace > and reports its parent pid to have pid 1 (init obviously doesn't > care) > 3. we _can_ try to allocate new pid equal to the old one so that > glibc stays happy > > > As far as the pointers are concerned: > 1. parent pointer doesn't change > 2. task_pid(tsk) one (i.e. struct pid * one) _can_ change if > a) we don't allow threads enter (de_thread problem is handeled) > b) we don't allow leave the group/session, i.e. check, that there > is the only one task that enters lives in its pgid/sid > c) we wait for the quiescent state to pass by before destroying > the old pid to handle race with sys_kill() That doesn't handle the case of cached struct pids. A good example is waitpid, where it waits for a specific struct pid. Which means that allocating a new struct pid and changing task->pid will cause waitpid(pid) to wait forever... To change struct pid would require the refcount on struct pid to show no references from anywhere except the task_struct. At the cost of a little memory we can solve that problem for unshare if we have a an extra upid in struct pid, how we verify there is space in struct pid I'm not certain. I do think that at least until someone calls exec the namespace pids are reported to the process itself should not change. That is kill and waitpid etc. Which suggests an implementation the opposite of what I proposed. With ns_of_pid(task_pid(current)) being used as the pid namespace of children, and current->nsproxy->pid_ns not changing in the case of unshare. Shrug. Or perhaps this is a case where we use we can implement join with an extra process but we can't implement unshare, because the effect cannot be immediate. Eric -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
ebiederm@xmission.com (Eric W. Biederman) writes: > "Serge E. Hallyn" <serue@us.ibm.com> writes: > >> Quoting Eric W. Biederman (ebiederm@xmission.com): >>> Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com> writes: >>> >>> > Eric W. Biederman [ebiederm@xmission.com] wrote: >>> > | >>> > | I think replacing a struct pid for another struct pid allocated in >>> > | descendant pid_namespace (but has all of the same struct upid values >>> > | as the first struct pid) is a disastrous idea. It destroys the >>> > >>> > True. Sorry, I did not mean we would need a new 'struct pid' for an >>> > existing process. I think we talked earlier of finding a way of attaching >>> > additional pid numbers to the same struct pid. >>> >>> I just played with this and if you make the semantics of unshare(CLONE_NEWPID) >>> to be that you become the idle task aka pid 0, and not the init task pid 1 the >>> implementation is trivial. >> >> Heh, and then (browsing through your copy_process() patch hunks) the next >> forked task becomes the child reaper for the new pidns? <shrug> why not >> I guess. >> >> Now if that child reaper then gets killed, will the idle task get killed too? > > No. > >> And if not, then idle task can just re-populating the new pidns with new >> idle tasks... > > After zap_pid_namespace interesting... > >> If this brought us a step closer to entering an existing pidns that would >> be one thing, but is there actually any advantage to being able to >> unshare a new pidns? Oh, I guess there is - PAM can then use it at >> login, which might be neat. > > I have to say that the semantics of my patch are unworkable for > unshare. Unless I am mistaken for PAM to use it requires that the > current process fully change and become what it needs to be. > Requiring an extra fork to fully complete the process is a problem. > > Scratch one bright idea. Maybe not. I just looked and in the vast majority of cases the login process goes like this. { setup stuff include pam child = fork(); if (!child) { setuid() exec /bin/bash } waitpid(child); pam and other cleanup } So an unshare of the pid namespace that doesn't really take effect until we fork may actually be usable from pam, and in fact is probably the preferred implementation. It looks like neither openssh nor login from util-linux-ng will cope properly with getting any pid back from wait() except the pid of their child. It looks like they both with terminate. Which means if you login in a new pid namespace (where the unsharing process becomes pid 1) and call nohup everything will get killed and you will be logged out. Eric -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thursday 2010-03-04 22:45, Eric W. Biederman wrote: > >So an unshare of the pid namespace that doesn't really take effect >until we fork may actually be usable from pam, and in fact is probably >the preferred implementation. It looks like neither openssh nor login >from util-linux-ng will cope properly with getting any pid back from >wait() except the pid of their child. Correct; I can tell from experience with pam_mount. GDM for example is very unhappy if you fork/exit processes in PAM modules and don't hide the fact by bending SIGCHLD from gdm_handler to mypam_handler (which itself is racy, suppose GDM re-set the SIGCHLD handler midway through). (In this particular case however, I'd prefer if login programs like GDM just ignored any PIDs they did not spawn in the first place instead of moaning around.) -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
> 2 parallel enters? I meant you have pid 0 in the entered pid namespace. > You have pid 0 because your pid simply does not map. Oh, I see. > There is nothing that makes to parallel enters impossible in that. > Even today we have one thread per cpu that has task->pid == &init_struct_pid > which is pid 0. How about the forked processes then? Who will be their parent? > For the case of unshare where we are designed to be used with PAM I don't > think my proposed semantics work. For a join needed an extra fork before > you are really in the pid namespace should be minor. Hm... One more proposal - can we adopt the planned new fork_with_pids system call to fork the process right into a new pid namespace? > That doesn't handle the case of cached struct pids. A good example is > waitpid, where it waits for a specific struct pid. Which means that > allocating a new struct pid and changing task->pid will cause > waitpid(pid) to wait forever... OK. Good example. Thanks. > To change struct pid would require the refcount on struct pid to show > no references from anywhere except the task_struct. I think this is OK to return -EBUSY for this. And fix the waitpid respectively not to block this common case. All the others I think can be stayed as is. > At the cost of a little memory we can solve that problem for unshare > if we have a an extra upid in struct pid, how we verify there is space > in struct pid I'm not certain. > > I do think that at least until someone calls exec the namespace pids are > reported to the process itself should not change. That is kill and Wait a second - in that case the wait will be blocked too! No? > waitpid etc. Which suggests an implementation the opposite of what > I proposed. With ns_of_pid(task_pid(current)) being used as the > pid namespace of children, and current->nsproxy->pid_ns not changing > in the case of unshare. > > Shrug. > > Or perhaps this is a case where we use we can implement join with > an extra process but we can't implement unshare, because the effect > cannot be immediate. Well, I'm talking only about the join now. > Eric > -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Pavel Emelyanov <xemul@parallels.com> writes: >> 2 parallel enters? I meant you have pid 0 in the entered pid namespace. >> You have pid 0 because your pid simply does not map. > > Oh, I see. > >> There is nothing that makes to parallel enters impossible in that. >> Even today we have one thread per cpu that has task->pid == &init_struct_pid >> which is pid 0. > > How about the forked processes then? Who will be their parent? The normal rules of parentage apply. So the child will see simply see it's parent as ppid == 0. If that child daemonizes it will become a child of the pid namespaces init. This is a lot like something that gets started from call_usermodehelper. It's parent process is not a descendant of init either. The implementation of the join is to simply change current->nsproxy->pid_ns. Then to use it you simply fork to get a child in the target pid namespace. >> For the case of unshare where we are designed to be used with PAM I don't >> think my proposed semantics work. For a join needed an extra fork before >> you are really in the pid namespace should be minor. > > Hm... One more proposal - can we adopt the planned new fork_with_pids system > call to fork the process right into a new pid namespace? In a lot of ways I like this idea of sys_hijack/sys_cloneat, and I don't think anything I am doing fundamentally undermines it. The use case of doing things in fork is that there is automatic inheritance of everything. All of the namespaces and all of the control groups, and possibly also the parent process. It does have the high cost that the process we are copying from must be stopped because there are no locks that let us take everything. I haven't looked at the recent proposals to see if anyone has solved that problem cleanly. If we can do a sys_hijack/sys_cloneat style of join, that means we can afford a fork. At which point the my proposed pid namespace semantics should be fine. aka: setns(NSTYPE_PID); pid = fork(); if (pid == 0) { getpid() == 2; /* Or whatever the first free pid is joined pid namespace */ getppid() == 0; } else { pid == 6400; /* Or whatever the first free pid is in the original pid namespace */ waitpid(pid); } >> That doesn't handle the case of cached struct pids. A good example is >> waitpid, where it waits for a specific struct pid. Which means that >> allocating a new struct pid and changing task->pid will cause >> waitpid(pid) to wait forever... > > OK. Good example. Thanks. > >> To change struct pid would require the refcount on struct pid to show >> no references from anywhere except the task_struct. > > I think this is OK to return -EBUSY for this. And fix the waitpid > respectively not to block this common case. All the others I think > can be stayed as is. That would probably work. setsid() and setpgrp() have similar sorts of restrictions. That is both more challenging and more limiting than the semantics that come out of my unshare(CLONE_NEWPID) patch. So I would prefer to keep this sort of thing as a last resort. >> At the cost of a little memory we can solve that problem for unshare >> if we have a an extra upid in struct pid, how we verify there is space >> in struct pid I'm not certain. >> >> I do think that at least until someone calls exec the namespace pids are >> reported to the process itself should not change. That is kill and > > Wait a second - in that case the wait will be blocked too! No? If all we do is populate an unused struct upid in struct pid there isn't a chance of a problem. >> waitpid etc. Which suggests an implementation the opposite of what >> I proposed. With ns_of_pid(task_pid(current)) being used as the >> pid namespace of children, and current->nsproxy->pid_ns not changing >> in the case of unshare. >> >> Shrug. >> >> Or perhaps this is a case where we use we can implement join with >> an extra process but we can't implement unshare, because the effect >> cannot be immediate. > > Well, I'm talking only about the join now. Overall it sounds like the semantics I have proposed with unshare(CLONE_NEWPID) are workable, and simple to implement. The extra fork is a bit surprising but it certainly does not look like a show stopper for implementing a pid namespace join. Eric -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Eric W. Biederman wrote: > Pavel Emelyanov <xemul@parallels.com> writes: > > >>> 2 parallel enters? I meant you have pid 0 in the entered pid namespace. >>> You have pid 0 because your pid simply does not map. >>> >> Oh, I see. >> >> >>> There is nothing that makes to parallel enters impossible in that. >>> Even today we have one thread per cpu that has task->pid == &init_struct_pid >>> which is pid 0. >>> >> How about the forked processes then? Who will be their parent? >> > > The normal rules of parentage apply. So the child will see simply > see it's parent as ppid == 0. If that child daemonizes it will become > a child of the pid namespaces init. > > This is a lot like something that gets started from call_usermodehelper. It's > parent process is not a descendant of init either. > > > The implementation of the join is to simply change current->nsproxy->pid_ns. > Then to use it you simply fork to get a child in the target pid namespace. > If the normal rules of parentage apply, that means pid 0 has to wait it's child. If we are in the scenario of pid 0, it's child pid 1234 and we kill the pid 1 of the pid namespace, I suppose pid 1234 will be killed too. The pid 0 will stay in the pid namespace and will able to fork again a new pid 1. I think Serge already reported that... That sounds good :) >>> For the case of unshare where we are designed to be used with PAM I don't >>> think my proposed semantics work. For a join needed an extra fork before >>> you are really in the pid namespace should be minor. >>> >> Hm... One more proposal - can we adopt the planned new fork_with_pids system >> call to fork the process right into a new pid namespace? >> > > In a lot of ways I like this idea of sys_hijack/sys_cloneat, and I > don't think anything I am doing fundamentally undermines it. The use > case of doing things in fork is that there is automatic inheritance of > everything. All of the namespaces and all of the control groups, and > possibly also the parent process. And also the rootfs for executing the command inside the container (eg. shutdown), the uid/gid (if there is a user namespace), the mount points, ... But I suppose we can do the same with setns for all the namespaces and chrooting within the container rootfs. What I see is a problem with the tty. For example, we cloneat the init process of the container which is usually /sbin/init but this one has its tty mapped to /dev/console, so the output of the exec'ed command will go to the console. > It does have the high cost that the > process we are copying from must be stopped because there are no locks > that let us take everything. I haven't looked at the recent proposals > to see if anyone has solved that problem cleanly. > Right. > If we can do a sys_hijack/sys_cloneat style of join, that means we can > afford a fork. At which point the my proposed pid namespace semantics > should be fine. > > aka: > setns(NSTYPE_PID); > pid = fork(); > if (pid == 0) { > getpid() == 2; /* Or whatever the first free pid is joined pid namespace */ > getppid() == 0; > } else { > pid == 6400; /* Or whatever the first free pid is in the original pid namespace */ > waitpid(pid); > } > > >>> That doesn't handle the case of cached struct pids. A good example is >>> waitpid, where it waits for a specific struct pid. Which means that >>> allocating a new struct pid and changing task->pid will cause >>> waitpid(pid) to wait forever... >>> >> OK. Good example. Thanks. >> >> >>> To change struct pid would require the refcount on struct pid to show >>> no references from anywhere except the task_struct. >>> >> I think this is OK to return -EBUSY for this. And fix the waitpid >> respectively not to block this common case. All the others I think >> can be stayed as is. >> > > That would probably work. setsid() and setpgrp() have similar sorts > of restrictions. That is both more challenging and more limiting than > the semantics that come out of my unshare(CLONE_NEWPID) patch. So I > would prefer to keep this sort of thing as a last resort. > > >>> At the cost of a little memory we can solve that problem for unshare >>> if we have a an extra upid in struct pid, how we verify there is space >>> in struct pid I'm not certain. >>> >>> I do think that at least until someone calls exec the namespace pids are >>> reported to the process itself should not change. That is kill and >>> >> Wait a second - in that case the wait will be blocked too! No? >> > > If all we do is populate an unused struct upid in struct pid there > isn't a chance of a problem. > > >>> waitpid etc. Which suggests an implementation the opposite of what >>> I proposed. With ns_of_pid(task_pid(current)) being used as the >>> pid namespace of children, and current->nsproxy->pid_ns not changing >>> in the case of unshare. >>> >>> Shrug. >>> >>> Or perhaps this is a case where we use we can implement join with >>> an extra process but we can't implement unshare, because the effect >>> cannot be immediate. >>> >> Well, I'm talking only about the join now. >> > > Overall it sounds like the semantics I have proposed with > unshare(CLONE_NEWPID) are workable, and simple to implement. The > extra fork is a bit surprising but it certainly does not > look like a show stopper for implementing a pid namespace join. > I agree, it's some kind of "ghost" process. IMO, with a bit of userspace code it would be possible to enter or exec a command inside a container with nsfd, setns. +1 to test your patchset Eric :) Just a mindless suggestion, the "nsopen" / "nsattach" syscall names should be more clear no ? Jumping back, one question about the nsfd and the poll for waiting the end of the namespace. If we have an openened file descriptor on a specific namespace, we grab a reference on this one, so the namespace won't be destroyed until we close the fd which is used to poll the end of the namespace, no ? Did I miss something ? Thanks -- Daniel -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Daniel Lezcano <daniel.lezcano@free.fr> writes: > Eric W. Biederman wrote: > If the normal rules of parentage apply, that means pid 0 has to wait it's child. > If we are in the scenario of pid 0, it's child pid 1234 and we kill the pid 1 of > the pid namespace, I suppose pid 1234 will be killed too. > The pid 0 will stay in the pid namespace and will able to fork again a new pid > 1. > > I think Serge already reported that... > > That sounds good :) I expect zap_pid_ns_processes should also arrange so we cannot allocate any more processes. We certainly need to do something explicit or pid 1 won't be allocated. It might make sense to resurrect a pid namespace after it's death but it is definitely weird. >> In a lot of ways I like this idea of sys_hijack/sys_cloneat, and I >> don't think anything I am doing fundamentally undermines it. The use >> case of doing things in fork is that there is automatic inheritance of >> everything. All of the namespaces and all of the control groups, and >> possibly also the parent process. > And also the rootfs for executing the command inside the container > (eg. shutdown), the uid/gid (if there is a user namespace), the mount points, > ... > But I suppose we can do the same with setns for all the namespaces and chrooting > within the container rootfs. > > What I see is a problem with the tty. For example, we cloneat the init process > of the container which is usually /sbin/init but this one has its tty mapped to > /dev/console, so the output of the exec'ed command will go to the console. My original thinking was that the fd's would come from the caller of sys_cloneat.... >> Overall it sounds like the semantics I have proposed with >> unshare(CLONE_NEWPID) are workable, and simple to implement. The >> extra fork is a bit surprising but it certainly does not >> look like a show stopper for implementing a pid namespace join. >> > I agree, it's some kind of "ghost" process. > IMO, with a bit of userspace code it would be possible to enter or exec a > command inside a container with nsfd, setns. > > +1 to test your patchset Eric :) I will see about reposting sometime soon. > Just a mindless suggestion, the "nsopen" / "nsattach" syscall names should be > more clear no ? Not bad suggestions. I am going to explore a bit more. Given that nsfd is using the same permission checks as a proc file, I think I can just make it a proc file. Something like "/proc/<pid>/ns/net". With a little luck that won't suck too badly. > Jumping back, one question about the nsfd and the poll for waiting the end of > the namespace. > If we have an openened file descriptor on a specific namespace, we grab a > reference on this one, so the namespace won't be destroyed until we close the fd > which is used to poll the end of the namespace, no ? Did I miss something ? Not really. The assumption was that there would be a very similar file descriptor that we could use with poll. Eric -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Eric W. Biederman wrote: > Daniel Lezcano <daniel.lezcano@free.fr> writes: > > >> Eric W. Biederman wrote: >> > > >> If the normal rules of parentage apply, that means pid 0 has to wait it's child. >> If we are in the scenario of pid 0, it's child pid 1234 and we kill the pid 1 of >> the pid namespace, I suppose pid 1234 will be killed too. >> The pid 0 will stay in the pid namespace and will able to fork again a new pid >> 1. >> >> I think Serge already reported that... >> >> That sounds good :) >> > > I expect zap_pid_ns_processes should also arrange so we cannot allocate any > more processes. We certainly need to do something explicit or pid 1 won't > be allocated. It might make sense to resurrect a pid namespace after it's > death but it is definitely weird. > Mmh, yes. But that was just an idea, maybe a bit out of the scope you are aiming. >>> In a lot of ways I like this idea of sys_hijack/sys_cloneat, and I >>> don't think anything I am doing fundamentally undermines it. The use >>> case of doing things in fork is that there is automatic inheritance of >>> everything. All of the namespaces and all of the control groups, and >>> possibly also the parent process. >>> >> And also the rootfs for executing the command inside the container >> (eg. shutdown), the uid/gid (if there is a user namespace), the mount points, >> ... >> But I suppose we can do the same with setns for all the namespaces and chrooting >> within the container rootfs. >> >> What I see is a problem with the tty. For example, we cloneat the init process >> of the container which is usually /sbin/init but this one has its tty mapped to >> /dev/console, so the output of the exec'ed command will go to the console. >> > > My original thinking was that the fd's would come from the caller of sys_cloneat.... Oh, ok :s >>> Overall it sounds like the semantics I have proposed with >>> unshare(CLONE_NEWPID) are workable, and simple to implement. The >>> extra fork is a bit surprising but it certainly does not >>> look like a show stopper for implementing a pid namespace join. >>> >>> >> I agree, it's some kind of "ghost" process. >> IMO, with a bit of userspace code it would be possible to enter or exec a >> command inside a container with nsfd, setns. >> >> +1 to test your patchset Eric :) >> > > I will see about reposting sometime soon. > Great ! thanks. >> Just a mindless suggestion, the "nsopen" / "nsattach" syscall names should be >> more clear no ? >> > > Not bad suggestions. > > I am going to explore a bit more. Given that nsfd is using the same > permission checks as a proc file, I think I can just make it a proc > file. Something like "/proc/<pid>/ns/net". With a little luck that > won't suck too badly. > Ah ! yes. Good idea. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
I have take an snapshot of my development tree and placed it at. git://git.kernel.org/pub/scm/linux/people/ebiederm/linux-2.6.33-nsfd-v5.git >> I am going to explore a bit more. Given that nsfd is using the same >> permission checks as a proc file, I think I can just make it a proc >> file. Something like "/proc/<pid>/ns/net". With a little luck that >> won't suck too badly. >> > Ah ! yes. Good idea. It is a hair more code to use proc files but nothing worth counting. Probably the biggest thing I am aware of right now in my development tree is in getting uids to pass properly between unix domain sockets I would up writing this cred_to_ucred function. Serge can you take a look and check my logic, and do you have any idea of where we should place something like pid_vnr but for the uid namespace? void cred_to_ucred(struct pid *pid, const struct cred *cred, struct ucred *ucred) { ucred->pid = pid_vnr(pid); ucred->uid = ucred->gid = -1; if (cred) { struct user_namespace *cred_ns = cred->user->user_ns; struct user_namespace *current_ns = current_user_ns(); struct user_namespace *tmp; if (likely(cred_ns == current_ns)) { ucred->uid = cred->euid; ucred->gid = cred->egid; } else { /* Is cred in a child user namespace */ tmp = cred_ns; do { tmp = tmp->creator->user_ns; if (tmp == current_ns) { ucred->uid = tmp->creator->uid; ucred->gid = overflowgid; return; } } while (tmp != &init_user_ns); /* Is cred the creator of my user namespace, * or the creator of one of it's parents? */ for( tmp = current_ns; tmp != &init_user_ns; tmp = tmp->creator->user_ns) { if (cred->user == tmp->creator) { ucred->uid = 0; ucred->gid = 0; return; } } /* No user namespace relationship so no mapping */ ucred->uid = overflowuid; ucred->gid = overflowgid; } } } Eric -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Eric W. Biederman wrote: > I have take an snapshot of my development tree and placed it at. > > > git://git.kernel.org/pub/scm/linux/people/ebiederm/linux-2.6.33-nsfd-v5.git > Hi Eric, thanks for the pointer. I tried to boot the kernel under qemu and I got this oops: Loading /lib/kbd/keymaps/i386/azerty/fr.map Creating block device nodes. Creating character device nodes. Making device-mapper control node BUG: unable to handle kernel NULL pointer dereference at (null) IP: [<ffffffff812df7e7>] netlink_broadcast+0x1bd/0x384 PGD 3cfd0067 PUD 3cfc1067 PMD 0 Oops: 0002 [#1] DEBUG_PAGEALLOC last sysfs file: /sys/class/firmware/timeout CPU 0 Pid: 841, comm: modprobe Not tainted 2.6.33 #1 / RIP: 0010:[<ffffffff812df7e7>] [<ffffffff812df7e7>] netlink_broadcast+0x1bd/0x384 RSP: 0018:ffff88003cfc3ca8 EFLAGS: 00010246 RAX: 0000000000000000 RBX: ffff88003ce947f0 RCX: ffff88003cf877f0 RDX: ffff88003f939dd0 RSI: ffff88003f939ef0 RDI: ffff88003f939e00 RBP: ffff88003cfc3d18 R08: ffff88003f939d98 R09: ffff88003f939e88 R10: ffff88003cf87818 R11: 0000000000000286 R12: ffff88003f939d98 R13: ffff88003f939e88 R14: ffff88003ce94818 R15: ffff88003ce94800 FS: 00007f23a90a06f0(0000) GS:ffffffff8161b000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000000 CR3: 000000003cfcd000 CR4: 00000000000006f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Process modprobe (pid: 841, threadinfo ffff88003cfc2000, task ffff88003d1db058) Stack: 0000000000000270 00000000000000d0 ffff88003f9377f0 ffffffff8203d630 <0> ffff88003f939f5c 0000000000000000 0000000000000001 0000000000000000 <0> ffff88003cf03000 0000000000000020 0000000000000004 ffff88003f939e88 Call Trace: [<ffffffff8121c1bd>] kobject_uevent_env+0x414/0x59b [<ffffffff81117432>] ? sysfs_create_file+0x25/0x27 [<ffffffff8105eb13>] ? module_add_modinfo_attrs+0xd6/0xfc [<ffffffff8121c34f>] kobject_uevent+0xb/0xd [<ffffffff8105eba4>] mod_sysfs_setup+0x6b/0x99 [<ffffffff810602aa>] load_module+0x12a2/0x16f1 [<ffffffff81060b34>] sys_init_module+0x60/0x230 [<ffffffff81002928>] system_call_fastpath+0x16/0x1b Code: 00 ff c8 74 2a 8b 75 98 4c 89 ef e8 22 f8 fd ff 48 85 c0 49 89 c4 74 3c 48 8d 50 38 48 8b 40 38 48 85 c0 74 0 2 ff 00 48 8b 42 08 <ff> 00 eb 4f 48 8b 55 b0 ff 02 49 83 7d 10 00 74 08 4c 89 ef e8 RIP [<ffffffff812df7e7>] netlink_broadcast+0x1bd/0x384 RSP <ffff88003cfc3ca8> CR2: 0000000000000000 ---[ end trace 979d62b87f68fab3 ]--- netlink_recvmsg: missing NETLINK_CB proto: 15 pid: 0 dsg_group: 1 destructor = (null) nlmsghdr: len=1080321121 type=27951 flags=646f seq=795176053 pid=1769169779 I will try to investigate further later - kid duty :) -- Daniel -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Quoting Eric W. Biederman (ebiederm@xmission.com): > > I have take an snapshot of my development tree and placed it at. > > > git://git.kernel.org/pub/scm/linux/people/ebiederm/linux-2.6.33-nsfd-v5.git > > > >> I am going to explore a bit more. Given that nsfd is using the same > >> permission checks as a proc file, I think I can just make it a proc > >> file. Something like "/proc/<pid>/ns/net". With a little luck that > >> won't suck too badly. > >> > > Ah ! yes. Good idea. > > It is a hair more code to use proc files but nothing worth counting. > > Probably the biggest thing I am aware of right now in my development > tree is in getting uids to pass properly between unix domain sockets > I would up writing this cred_to_ucred function. > > Serge can you take a look and check my logic, and do you have > any idea of where we should place something like pid_vnr but > for the uid namespace? Well my first thought was user_namespace, but I'm thinking kernel/cred.c is the best place for it. > void cred_to_ucred(struct pid *pid, const struct cred *cred, > struct ucred *ucred) > { > ucred->pid = pid_vnr(pid); > ucred->uid = ucred->gid = -1; > if (cred) { > struct user_namespace *cred_ns = cred->user->user_ns; > struct user_namespace *current_ns = current_user_ns(); > struct user_namespace *tmp; > > if (likely(cred_ns == current_ns)) { > ucred->uid = cred->euid; > ucred->gid = cred->egid; > } else { > /* Is cred in a child user namespace */ > tmp = cred_ns; > do { > tmp = tmp->creator->user_ns; > if (tmp == current_ns) { Hmm, I think you want to catch one level up - so the creator itself is in current_user_ns, so do { if (tmp->creator->user_ns == current_ns) { ucred->uid = tmp->creator->uid; ucred->gid = tmp->creator_gid; return; } tmp = tmp->creator->user_ns; } while (tmp != &init_user_ns); > ucred->uid = tmp->creator->uid; > ucred->gid = overflowgid; should we start recording a user_ns->creator_gid instead? > return; > } > } while (tmp != &init_user_ns); > > /* Is cred the creator of my user namespace, > * or the creator of one of it's parents? > */ > for( tmp = current_ns; tmp != &init_user_ns; > tmp = tmp->creator->user_ns) { > if (cred->user == tmp->creator) { > ucred->uid = 0; > ucred->gid = 0; > return; > } > } That looks right. > /* No user namespace relationship so no mapping */ > ucred->uid = overflowuid; > ucred->gid = overflowgid; > } > } > } > > Eric -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
"Serge E. Hallyn" <serue@us.ibm.com> writes: > Quoting Eric W. Biederman (ebiederm@xmission.com): >> >> I have take an snapshot of my development tree and placed it at. >> >> >> git://git.kernel.org/pub/scm/linux/people/ebiederm/linux-2.6.33-nsfd-v5.git >> >> >> >> I am going to explore a bit more. Given that nsfd is using the same >> >> permission checks as a proc file, I think I can just make it a proc >> >> file. Something like "/proc/<pid>/ns/net". With a little luck that >> >> won't suck too badly. >> >> >> > Ah ! yes. Good idea. >> >> It is a hair more code to use proc files but nothing worth counting. >> >> Probably the biggest thing I am aware of right now in my development >> tree is in getting uids to pass properly between unix domain sockets >> I would up writing this cred_to_ucred function. >> >> Serge can you take a look and check my logic, and do you have >> any idea of where we should place something like pid_vnr but >> for the uid namespace? > > Well my first thought was user_namespace, but I'm thinking kernel/cred.c is > the best place for it. Thanks. >> void cred_to_ucred(struct pid *pid, const struct cred *cred, >> struct ucred *ucred) >> { >> ucred->pid = pid_vnr(pid); >> ucred->uid = ucred->gid = -1; >> if (cred) { >> struct user_namespace *cred_ns = cred->user->user_ns; >> struct user_namespace *current_ns = current_user_ns(); >> struct user_namespace *tmp; >> >> if (likely(cred_ns == current_ns)) { >> ucred->uid = cred->euid; >> ucred->gid = cred->egid; >> } else { >> /* Is cred in a child user namespace */ >> tmp = cred_ns; >> do { >> tmp = tmp->creator->user_ns; >> if (tmp == current_ns) { > > Hmm, I think you want to catch one level up - so the creator itself > is in current_user_ns, so > > do { > if (tmp->creator->user_ns == current_ns) { > ucred->uid = tmp->creator->uid; > ucred->gid = tmp->creator_gid; > return; > } > tmp = tmp->creator->user_ns; > } while (tmp != &init_user_ns); Good catch. >> ucred->uid = tmp->creator->uid; >> ucred->gid = overflowgid; > > should we start recording a user_ns->creator_gid > instead? I had a similar question. Possibly we can just grab the creators cred. >> return; >> } >> } while (tmp != &init_user_ns); >> >> /* Is cred the creator of my user namespace, >> * or the creator of one of it's parents? >> */ >> for( tmp = current_ns; tmp != &init_user_ns; >> tmp = tmp->creator->user_ns) { >> if (cred->user == tmp->creator) { >> ucred->uid = 0; >> ucred->gid = 0; >> return; >> } >> } > > That looks right. > >> /* No user namespace relationship so no mapping */ >> ucred->uid = overflowuid; >> ucred->gid = overflowgid; >> } >> } >> } Eric -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Quoting Eric W. Biederman (ebiederm@xmission.com): > "Serge E. Hallyn" <serue@us.ibm.com> writes: > > > Quoting Eric W. Biederman (ebiederm@xmission.com): > >> > >> I have take an snapshot of my development tree and placed it at. > >> > >> > >> git://git.kernel.org/pub/scm/linux/people/ebiederm/linux-2.6.33-nsfd-v5.git > >> > >> > >> >> I am going to explore a bit more. Given that nsfd is using the same > >> >> permission checks as a proc file, I think I can just make it a proc > >> >> file. Something like "/proc/<pid>/ns/net". With a little luck that > >> >> won't suck too badly. > >> >> > >> > Ah ! yes. Good idea. > >> > >> It is a hair more code to use proc files but nothing worth counting. > >> > >> Probably the biggest thing I am aware of right now in my development > >> tree is in getting uids to pass properly between unix domain sockets > >> I would up writing this cred_to_ucred function. > >> > >> Serge can you take a look and check my logic, and do you have > >> any idea of where we should place something like pid_vnr but > >> for the uid namespace? > > > > Well my first thought was user_namespace, but I'm thinking kernel/cred.c is > > the best place for it. > > Thanks. > > >> void cred_to_ucred(struct pid *pid, const struct cred *cred, > >> struct ucred *ucred) > >> { > >> ucred->pid = pid_vnr(pid); > >> ucred->uid = ucred->gid = -1; > >> if (cred) { > >> struct user_namespace *cred_ns = cred->user->user_ns; > >> struct user_namespace *current_ns = current_user_ns(); > >> struct user_namespace *tmp; > >> > >> if (likely(cred_ns == current_ns)) { > >> ucred->uid = cred->euid; > >> ucred->gid = cred->egid; > >> } else { > >> /* Is cred in a child user namespace */ > >> tmp = cred_ns; > >> do { > >> tmp = tmp->creator->user_ns; > >> if (tmp == current_ns) { > > > > Hmm, I think you want to catch one level up - so the creator itself > > is in current_user_ns, so > > > > > do { > > if (tmp->creator->user_ns == current_ns) { > > ucred->uid = tmp->creator->uid; > > ucred->gid = tmp->creator_gid; > > return; > > } > > tmp = tmp->creator->user_ns; > > } while (tmp != &init_user_ns); > > Good catch. > > >> ucred->uid = tmp->creator->uid; > >> ucred->gid = overflowgid; > > > > should we start recording a user_ns->creator_gid > > instead? > > I had a similar question. Possibly we can just grab the creators cred. Oh, yeah, make user_ns->creator a cred, excellent idea - then we have the LSM and capability fields cached as well. -serge -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/arch/powerpc/platforms/cell/spufs/sched.c b/arch/powerpc/platforms/cell/spufs/sched.c index 4678078..b7f2026 100644 --- a/arch/powerpc/platforms/cell/spufs/sched.c +++ b/arch/powerpc/platforms/cell/spufs/sched.c @@ -1094,7 +1094,7 @@ static int show_spu_loadavg(struct seq_file *s, void *private) LOAD_INT(c), LOAD_FRAC(c), count_active_contexts(), atomic_read(&nr_spu_contexts), - current->nsproxy->pid_ns->last_pid); + task_active_pid_ns(current)->last_pid); return 0; } diff --git a/arch/um/drivers/mconsole_kern.c b/arch/um/drivers/mconsole_kern.c index 3b3c366..4e6985e 100644 --- a/arch/um/drivers/mconsole_kern.c +++ b/arch/um/drivers/mconsole_kern.c @@ -125,7 +125,7 @@ void mconsole_log(struct mc_request *req) void mconsole_proc(struct mc_request *req) { struct nameidata nd; - struct vfsmount *mnt = current->nsproxy->pid_ns->proc_mnt; + struct vfsmount *mnt = task_active_pid_ns(current)->proc_mnt; struct file *file; int n, err; char *ptr = req->request.data, *buf; diff --git a/fs/proc/root.c b/fs/proc/root.c index b080b79..fbcd3f8 100644 --- a/fs/proc/root.c +++ b/fs/proc/root.c @@ -57,7 +57,7 @@ static int proc_get_sb(struct file_system_type *fs_type, if (flags & MS_KERNMOUNT) ns = (struct pid_namespace *)data; else - ns = current->nsproxy->pid_ns; + ns = task_active_pid_ns(current); sb = sget(fs_type, proc_test_super, proc_set_super, ns); if (IS_ERR(sb)) diff --git a/init/main.c b/init/main.c index 4cb47a1..67e40fc 100644 --- a/init/main.c +++ b/init/main.c @@ -851,15 +851,6 @@ static int __init kernel_init(void * unused) * init can run on any cpu. */ set_cpus_allowed_ptr(current, cpu_all_mask); - /* - * Tell the world that we're going to be the grim - * reaper of innocent orphaned children. - * - * We don't want people to have to make incorrect - * assumptions about where in the task array this - * can be found. - */ - init_pid_ns.child_reaper = current; cad_pid = task_pid(current); diff --git a/kernel/cgroup.c b/kernel/cgroup.c index aa3bee5..737d2eb 100644 --- a/kernel/cgroup.c +++ b/kernel/cgroup.c @@ -2453,7 +2453,7 @@ static struct cgroup_pidlist *cgroup_pidlist_find(struct cgroup *cgrp, { struct cgroup_pidlist *l; /* don't need task_nsproxy() if we're looking at ourself */ - struct pid_namespace *ns = get_pid_ns(current->nsproxy->pid_ns); + struct pid_namespace *ns = get_pid_ns(task_active_pid_ns(current)); /* * We can't drop the pidlist_mutex before taking the l->mutex in case * the last ref-holder is trying to remove l from the list at the same diff --git a/kernel/fork.c b/kernel/fork.c index f88bd98..832c035 100644 --- a/kernel/fork.c +++ b/kernel/fork.c @@ -1172,7 +1172,7 @@ static struct task_struct *copy_process(unsigned long clone_flags, if (!pid) goto bad_fork_cleanup_io; - if (clone_flags & CLONE_NEWPID) { + if (pid->numbers[pid->level].nr == 1) { retval = pid_ns_prepare_proc(p->nsproxy->pid_ns); if (retval < 0) goto bad_fork_free_pid; @@ -1279,7 +1279,7 @@ static struct task_struct *copy_process(unsigned long clone_flags, tracehook_finish_clone(p, clone_flags, trace); if (thread_group_leader(p)) { - if (clone_flags & CLONE_NEWPID) + if (pid->numbers[pid->level].nr == 1) p->nsproxy->pid_ns->child_reaper = p; p->signal->leader_pid = pid; @@ -1539,10 +1539,19 @@ static void check_unshare_flags(unsigned long *flags_ptr) *flags_ptr |= CLONE_THREAD; /* + * If unsharing the pid namespace and the task was created + * using CLONE_THREAD, then must unshare the thread. + */ + if ((*flags_ptr & CLONE_NEWPID) && + (atomic_read(¤t->signal->count) > 1)) + *flags_ptr |= CLONE_THREAD; + + /* * If unsharing namespace, must also unshare filesystem information. */ if (*flags_ptr & CLONE_NEWNS) *flags_ptr |= CLONE_FS; + } /* @@ -1647,7 +1656,8 @@ SYSCALL_DEFINE1(unshare, unsigned long, unshare_flags) err = -EINVAL; if (unshare_flags & ~(CLONE_THREAD|CLONE_FS|CLONE_NEWNS|CLONE_SIGHAND| CLONE_VM|CLONE_FILES|CLONE_SYSVSEM| - CLONE_NEWUTS|CLONE_NEWIPC|CLONE_NEWNET)) + CLONE_NEWUTS|CLONE_NEWIPC|CLONE_NEWNET| + CLONE_NEWPID)) goto bad_unshare_out; /* diff --git a/kernel/nsproxy.c b/kernel/nsproxy.c index e3be4ef..1d023d5 100644 --- a/kernel/nsproxy.c +++ b/kernel/nsproxy.c @@ -173,7 +173,7 @@ int unshare_nsproxy_namespaces(unsigned long unshare_flags, int err = 0; if (!(unshare_flags & (CLONE_NEWNS | CLONE_NEWUTS | CLONE_NEWIPC | - CLONE_NEWNET))) + CLONE_NEWNET | CLONE_NEWPID))) return 0; if (!capable(CAP_SYS_ADMIN)) diff --git a/kernel/perf_event.c b/kernel/perf_event.c index 2ae7409..74865cd 100644 --- a/kernel/perf_event.c +++ b/kernel/perf_event.c @@ -4436,7 +4436,7 @@ perf_event_alloc(struct perf_event_attr *attr, event->parent = parent_event; - event->ns = get_pid_ns(current->nsproxy->pid_ns); + event->ns = get_pid_ns(task_active_pid_ns(current)); event->id = atomic64_inc_return(&perf_event_id); event->state = PERF_EVENT_STATE_INACTIVE; diff --git a/kernel/pid.c b/kernel/pid.c index 2e17c9c..6b64a82 100644 --- a/kernel/pid.c +++ b/kernel/pid.c @@ -305,7 +305,7 @@ EXPORT_SYMBOL_GPL(find_pid_ns); struct pid *find_vpid(int nr) { - return find_pid_ns(nr, current->nsproxy->pid_ns); + return find_pid_ns(nr, task_active_pid_ns(current)); } EXPORT_SYMBOL_GPL(find_vpid); @@ -385,7 +385,7 @@ struct task_struct *find_task_by_pid_ns(pid_t nr, struct pid_namespace *ns) struct task_struct *find_task_by_vpid(pid_t vnr) { - return find_task_by_pid_ns(vnr, current->nsproxy->pid_ns); + return find_task_by_pid_ns(vnr, task_active_pid_ns(current)); } struct pid *get_task_pid(struct task_struct *task, enum pid_type type) @@ -437,7 +437,7 @@ pid_t pid_nr_ns(struct pid *pid, struct pid_namespace *ns) pid_t pid_vnr(struct pid *pid) { - return pid_nr_ns(pid, current->nsproxy->pid_ns); + return pid_nr_ns(pid, task_active_pid_ns(current)); } EXPORT_SYMBOL_GPL(pid_vnr); @@ -448,7 +448,7 @@ pid_t __task_pid_nr_ns(struct task_struct *task, enum pid_type type, rcu_read_lock(); if (!ns) - ns = current->nsproxy->pid_ns; + ns = task_active_pid_ns(current); if (likely(pid_alive(task))) { if (type != PIDTYPE_PID) task = task->group_leader; diff --git a/kernel/signal.c b/kernel/signal.c index 934ae5e..885b699 100644 --- a/kernel/signal.c +++ b/kernel/signal.c @@ -1438,16 +1438,15 @@ int do_notify_parent(struct task_struct *tsk, int sig) * we are under tasklist_lock here so our parent is tied to * us and cannot exit and release its namespace. * - * the only it can is to switch its nsproxy with sys_unshare, - * bu uncharing pid namespaces is not allowed, so we'll always - * see relevant namespace + * The only it can is to switch its nsproxy with sys_unshare, + * but we use the pid_namespace for task_pid which never changes. * * write_lock() currently calls preempt_disable() which is the * same as rcu_read_lock(), but according to Oleg, this is not * correct to rely on this */ rcu_read_lock(); - info.si_pid = task_pid_nr_ns(tsk, tsk->parent->nsproxy->pid_ns); + info.si_pid = task_pid_nr_ns(tsk, task_active_pid_ns(tsk->parent)); info.si_uid = __task_cred(tsk)->uid; rcu_read_unlock(); @@ -1518,7 +1517,7 @@ static void do_notify_parent_cldstop(struct task_struct *tsk, int why) * see comment in do_notify_parent() abot the following 3 lines */ rcu_read_lock(); - info.si_pid = task_pid_nr_ns(tsk, parent->nsproxy->pid_ns); + info.si_pid = task_pid_nr_ns(tsk, task_active_pid_ns(parent)); info.si_uid = __task_cred(tsk)->uid; rcu_read_unlock(); diff --git a/kernel/sysctl_binary.c b/kernel/sysctl_binary.c index 8f5d16e..1e4da59 100644 --- a/kernel/sysctl_binary.c +++ b/kernel/sysctl_binary.c @@ -1356,7 +1356,7 @@ static ssize_t binary_sysctl(const int *name, int nlen, goto out_putname; } - mnt = current->nsproxy->pid_ns->proc_mnt; + mnt = task_active_pid_ns(current)->proc_mnt; result = vfs_path_lookup(mnt->mnt_root, mnt, pathname, 0, &nd); if (result) goto out_putname;