diff mbox

[repost] sched: export sched_set/getaffinity to modules

Message ID 1280166688.3375.5.camel@localhost
State Not Applicable, archived
Delegated to: David Miller
Headers show

Commit Message

Sridhar Samudrala July 26, 2010, 5:51 p.m. UTC
On Mon, 2010-07-26 at 20:12 +0300, Michael S. Tsirkin wrote:
> On Fri, Jul 02, 2010 at 11:06:37PM +0200, Oleg Nesterov wrote:
> > On 07/02, Peter Zijlstra wrote:
> > >
> > > On Fri, 2010-07-02 at 11:01 -0700, Sridhar Samudrala wrote:
> > > >
> > > >  Does  it (Tejun's kthread_clone() patch) also  inherit the
> > > > cgroup of the caller?
> > >
> > > Of course, its a simple do_fork() which inherits everything just as you
> > > would expect from a similar sys_clone()/sys_fork() call.
> > 
> > Yes. And I'm afraid it can inherit more than we want. IIUC, this is called
> > from ioctl(), right?
> > 
> > Then the new thread becomes the natural child of the caller, and it shares
> > ->mm with the parent. And files, dup_fd() without CLONE_FS.
> > 
> > Signals. Say, if you send SIGKILL to this new thread, it can't sleep in
> > TASK_INTERRUPTIBLE or KILLABLE after that. And this SIGKILL can be sent
> > just because the parent gets SIGQUIT or abother coredumpable signal.
> > Or the new thread can recieve SIGSTOP via ^Z.
> > 
> > Perhaps this is OK, I do not know. Just to remind that kernel_thread()
> > is merely clone(CLONE_VM).
> > 
> > Oleg.
> 
> With some machinery to stop it later, yes.
> Oleg, how does the below look to you?
> 
> Here I explicitly drop the fds so we don't share them.
> CLONE_VM takes care of sharing the mm I think.
> About signals - for the vhost-net use this is OK as we use
> uninterruptible sleep anyway (like the new kthread_worker does).
> 
> This code seems to work fine for me so far - any comments?
> 
> ---
> 
> diff --git a/include/linux/kthread.h b/include/linux/kthread.h
> index aabc8a1..72c7b17 100644
> --- a/include/linux/kthread.h
> +++ b/include/linux/kthread.h
> @@ -9,6 +9,11 @@ struct task_struct *kthread_create(int (*threadfn)(void *data),
>  				   const char namefmt[], ...)
>  	__attribute__((format(printf, 3, 4)));
> 
> +struct task_struct *kthread_create_inherit(int (*threadfn)(void *data),
> +					   void *data,
> +					   const char namefmt[], ...)
> +	__attribute__((format(printf, 3, 4)));
> +
>  /**
>   * kthread_run - create and wake a thread.
>   * @threadfn: the function to run until signal_pending(current).
> diff --git a/kernel/kthread.c b/kernel/kthread.c
> index 83911c7..b81588c 100644
> --- a/kernel/kthread.c
> +++ b/kernel/kthread.c
> @@ -149,6 +149,38 @@ struct task_struct *kthread_create(int (*threadfn)(void *data),
>  }
>  EXPORT_SYMBOL(kthread_create);
> 
> +/* Same as kthread_create, but inherit attributes (cgroups, priority, CPU mask)
> + * from current. */
> +struct task_struct *kthread_create_inherit(int (*threadfn)(void *data),
> +					   void *data,
> +					   const char namefmt[],
> +					   ...)
> +{
> +	struct kthread_create_info create;
> +
> +	create.threadfn = threadfn;
> +	create.data = data;
> +	init_completion(&create.done);
> +
> +	create_kthread(&create);
> +	wait_for_completion(&create.done);
> +
> +	if (!IS_ERR(create.result)) {
> +		va_list args;
> +
> +		/* Don't share files with parent as drivers use release for
> +		 * close on exit, etc. */
> +		exit_files(create.result);
> +
> +		va_start(args, namefmt);
> +		vsnprintf(create.result->comm, sizeof(create.result->comm),
> +			  namefmt, args);
> +		va_end(args);
> +	}
> +	return create.result;
> +}
> +EXPORT_SYMBOL(kthread_create_inherit);
> +
>  /**
>   * kthread_bind - bind a just-created kthread to a cpu.
>   * @p: thread created by kthread_create().


I have been testing out a similar patch that uses kernel_thread() without CLONE_FILES
flag rather than create_kthread() and then closing the files.
Either version should be fine.






--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Oleg Nesterov July 26, 2010, 6:08 p.m. UTC | #1
On 07/26, Sridhar Samudrala wrote:
>
> I have been testing out a similar patch that uses kernel_thread() without CLONE_FILES
> flag rather than create_kthread() and then closing the files.

!CLONE_FILES can't help. copy_files() does dup_fd() in this case.
The child still inherits the files.

> Either version should be fine.

I think neither version is fine ;)

exit_files() is not enough too. How about the signals, reparenting?


I already forgot all details, probably I missed somethig. But it
seems to me that it is better to just export get/set affinity and
forget about all complications.

Oleg.

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Michael S. Tsirkin July 26, 2010, 7:55 p.m. UTC | #2
On Mon, Jul 26, 2010 at 08:08:34PM +0200, Oleg Nesterov wrote:
> On 07/26, Sridhar Samudrala wrote:
> >
> > I have been testing out a similar patch that uses kernel_thread() without CLONE_FILES
> > flag rather than create_kthread() and then closing the files.
> 
> !CLONE_FILES can't help. copy_files() does dup_fd() in this case.
> The child still inherits the files.
> 
> > Either version should be fine.
> 
> I think neither version is fine ;)
> 
> exit_files() is not enough too. How about the signals,

As I said, signals are unimportant as we are using this
thread to base a worker on - it sleeps uninterruptibly.

> reparenting?

That's actually a feature: it lets us find out which process
owns the device using the thread by looking at the parent.

> 
> I already forgot all details, probably I missed somethig. But it
> seems to me that it is better to just export get/set affinity and
> forget about all complications.
> 
> Oleg.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Michael S. Tsirkin July 26, 2010, 8:27 p.m. UTC | #3
On Mon, Jul 26, 2010 at 08:08:34PM +0200, Oleg Nesterov wrote:
> On 07/26, Sridhar Samudrala wrote:
> >
> > I have been testing out a similar patch that uses kernel_thread() without CLONE_FILES
> > flag rather than create_kthread() and then closing the files.
> 
> !CLONE_FILES can't help. copy_files() does dup_fd() in this case.
> The child still inherits the files.
> 
> > Either version should be fine.
> 
> I think neither version is fine ;)
> 
> exit_files() is not enough too. How about the signals, reparenting?
> 
> 
> I already forgot all details, probably I missed somethig. But it
> seems to me that it is better to just export get/set affinity and
> forget about all complications.
> 
> Oleg.

Almost forgot, we need the same for priority.
Michael S. Tsirkin July 27, 2010, 4:55 a.m. UTC | #4
On Mon, Jul 26, 2010 at 08:08:34PM +0200, Oleg Nesterov wrote:
> On 07/26, Sridhar Samudrala wrote:
> >
> > I have been testing out a similar patch that uses kernel_thread() without CLONE_FILES
> > flag rather than create_kthread() and then closing the files.
> 
> !CLONE_FILES can't help. copy_files() does dup_fd() in this case.
> The child still inherits the files.
> 
> > Either version should be fine.
> 
> I think neither version is fine ;)
> 
> exit_files() is not enough too. How about the signals, reparenting?
> 
> 
> I already forgot all details, probably I missed somethig. But it
> seems to me that it is better to just export get/set affinity and
> forget about all complications.
> 
> Oleg.


Peter, could you please indicate whether you think this is the way to
go, too?
Michael S. Tsirkin July 27, 2010, 3:41 p.m. UTC | #5
On Mon, Jul 26, 2010 at 08:08:34PM +0200, Oleg Nesterov wrote:
> On 07/26, Sridhar Samudrala wrote:
> >
> > I have been testing out a similar patch that uses kernel_thread() without CLONE_FILES
> > flag rather than create_kthread() and then closing the files.
> 
> !CLONE_FILES can't help. copy_files() does dup_fd() in this case.
> The child still inherits the files.
> 
> > Either version should be fine.
> 
> I think neither version is fine ;)
> 
> exit_files() is not enough too. How about the signals, reparenting?
> 
> 
> I already forgot all details, probably I missed somethig. But it
> seems to me that it is better to just export get/set affinity and
> forget about all complications.
> 
> Oleg.

Oleg, so can I attach your Ack to the patch in question, and merge
it all through net-next?
Oleg Nesterov July 30, 2010, 2:19 p.m. UTC | #6
Sorry for the delay, I can't be responsive these days...

On 07/27, Michael S. Tsirkin wrote:
>
> On Mon, Jul 26, 2010 at 08:08:34PM +0200, Oleg Nesterov wrote:
> > On 07/26, Sridhar Samudrala wrote:
> > >
> > > I have been testing out a similar patch that uses kernel_thread() without CLONE_FILES
> > > flag rather than create_kthread() and then closing the files.
> >
> > !CLONE_FILES can't help. copy_files() does dup_fd() in this case.
> > The child still inherits the files.
> >
> > > Either version should be fine.
> >
> > I think neither version is fine ;)
> >
> > exit_files() is not enough too. How about the signals, reparenting?
> >
> >
> > I already forgot all details, probably I missed somethig. But it
> > seems to me that it is better to just export get/set affinity and
> > forget about all complications.
> >
> > Oleg.
>
> Oleg, so can I attach your Ack to the patch in question, and merge
> it all through net-next?

Well, I do not think you need my ack ;)


But I must admit, I personally dislike this idea. A kernel thread which
is the child of the user-space process, and in fact it is not the "real"
kernel thread. I think this is against the common case. If you do not
care the signals/reparenting, why can't you fork the user-space process
which does all the work via ioctl's ? OK, I do not understand the problem
domain, probably this can't work.

Anyway, the patch looks buggy to me. Starting from

	create_kthread(&create);
	wait_for_completion(&create.done);

At least you should check create_kthread() suceeds, otherwise
wait_for_completion() will hang forever. OTOH, if it suceeds then
wait_for_completion() is not needed. But this is minor.

create_kthread()->kernel_thread() uses CLONE_VM, this means that the
child will share ->mm. And this means that if the parent recieves
the coredumping signal it will hang forever in kernel space waiting
until this child exits.

This is just the immediate surprise I can see with this approach,
I am afraid there is something else.

And once again. We are doing this hacks only because we lack a
couples of exports (iiuk). This is, well, a bit strange ;)

Oleg.

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Tejun Heo July 30, 2010, 2:31 p.m. UTC | #7
Hello,

On 07/30/2010 04:19 PM, Oleg Nesterov wrote:
> But I must admit, I personally dislike this idea. A kernel thread which
> is the child of the user-space process, and in fact it is not the "real"
> kernel thread. I think this is against the common case. If you do not
> care the signals/reparenting, why can't you fork the user-space process
> which does all the work via ioctl's ? OK, I do not understand the problem
> domain, probably this can't work.

Having kernel threads which are children of user process is plain
scary considering the many things parent/children relationship implies
and various bugs and security vulnerabilities in the area.  I can't
pinpoint any problem but I think we really shouldn't be adding
something like this for this specific use case.  If we can get away
with exporting a few symbols, let's go that way.

Thanks.
Michael S. Tsirkin Aug. 1, 2010, 8:50 a.m. UTC | #8
On Fri, Jul 30, 2010 at 04:19:01PM +0200, Oleg Nesterov wrote:
> Sorry for the delay, I can't be responsive these days...
> 
> On 07/27, Michael S. Tsirkin wrote:
> >
> > On Mon, Jul 26, 2010 at 08:08:34PM +0200, Oleg Nesterov wrote:
> > > On 07/26, Sridhar Samudrala wrote:
> > > >
> > > > I have been testing out a similar patch that uses kernel_thread() without CLONE_FILES
> > > > flag rather than create_kthread() and then closing the files.
> > >
> > > !CLONE_FILES can't help. copy_files() does dup_fd() in this case.
> > > The child still inherits the files.
> > >
> > > > Either version should be fine.
> > >
> > > I think neither version is fine ;)
> > >
> > > exit_files() is not enough too. How about the signals, reparenting?
> > >
> > >
> > > I already forgot all details, probably I missed somethig. But it
> > > seems to me that it is better to just export get/set affinity and
> > > forget about all complications.
> > >
> > > Oleg.
> >
> > Oleg, so can I attach your Ack to the patch in question, and merge
> > it all through net-next?
> 
> Well, I do not think you need my ack ;)
> 
> 
> But I must admit, I personally dislike this idea. A kernel thread which
> is the child of the user-space process, and in fact it is not the "real"
> kernel thread. I think this is against the common case. If you do not
> care the signals/reparenting, why can't you fork the user-space process
> which does all the work via ioctl's ? OK, I do not understand the problem
> domain, probably this can't work.
> 
> Anyway, the patch looks buggy to me. Starting from
> 
> 	create_kthread(&create);
> 	wait_for_completion(&create.done);
> 
> At least you should check create_kthread() suceeds, otherwise
> wait_for_completion() will hang forever. OTOH, if it suceeds then
> wait_for_completion() is not needed. But this is minor.
> 
> create_kthread()->kernel_thread() uses CLONE_VM, this means that the
> child will share ->mm. And this means that if the parent recieves
> the coredumping signal it will hang forever in kernel space waiting
> until this child exits.
> 
> This is just the immediate surprise I can see with this approach,
> I am afraid there is something else.
> 
> And once again. We are doing this hacks only because we lack a
> couples of exports (iiuk). This is, well, a bit strange ;)
> 
> Oleg.


Oleg, I mean Ack the exporting of get/set affinity.

Thanks!
Oleg Nesterov Aug. 2, 2010, 3:02 p.m. UTC | #9
On 08/01, Michael S. Tsirkin wrote:
>
> Oleg, I mean Ack the exporting of get/set affinity.

Ah, I misunderstood.

Yes, I believe the exporting is the lesser evil. Please feel free
to add my ack.

Oleg.

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Peter Zijlstra Aug. 4, 2010, 10:45 a.m. UTC | #10
On Tue, 2010-07-27 at 07:55 +0300, Michael S. Tsirkin wrote:

> Peter, could you please indicate whether you think this is the way to
> go, too?

I really dislike it, as you indicated, you now want priority too..

It seems the problem is that we normally don't consider work done by
kernel threads for user processes part of that process.

I'm not sure what work you're doing, but I'm pretty sure there's similar
things already in the kernel -- think about the work done by encryption
threads for encrypted sockets and stuff.

If you want proper containment of work caused by a process, I'd suggest
you start by looking at curing the general problem, instead of special
casing this one case.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/include/linux/kthread.h b/include/linux/kthread.h
index aabc8a1..634eaf7 100644
--- a/include/linux/kthread.h
+++ b/include/linux/kthread.h
@@ -9,6 +9,11 @@  struct task_struct *kthread_create(int (*threadfn)(void *data),
 				   const char namefmt[], ...)
 	__attribute__((format(printf, 3, 4)));
 
+struct task_struct *kthread_clone(int (*threadfn)(void *data),
+				  void *data,
+				  const char namefmt[], ...)
+	__attribute__((format(printf, 3, 4)));
+
 /**
  * kthread_run - create and wake a thread.
  * @threadfn: the function to run until signal_pending(current).
diff --git a/kernel/kthread.c b/kernel/kthread.c
index 83911c7..806dae5 100644
--- a/kernel/kthread.c
+++ b/kernel/kthread.c
@@ -149,6 +149,38 @@  struct task_struct *kthread_create(int (*threadfn)(void *data),
 }
 EXPORT_SYMBOL(kthread_create);
 
+struct task_struct *kthread_clone(int (*threadfn)(void *data),
+				  void *data,
+				  const char namefmt[],
+				  ...)
+{
+	struct kthread_create_info create;
+	int pid;
+
+	create.threadfn = threadfn;
+	create.data = data;
+	init_completion(&create.done);
+	INIT_LIST_HEAD(&create.list);
+
+	pid = kernel_thread(kthread, &create, CLONE_FS);
+	if (pid < 0) {
+		create.result = ERR_PTR(pid);
+		complete(&create.done);
+	}
+	wait_for_completion(&create.done);
+
+	if (!IS_ERR(create.result)) {
+		va_list args;
+		va_start(args, namefmt);
+		vsnprintf(create.result->comm, sizeof(create.result->comm),
+			  namefmt, args);
+		va_end(args);
+	}
+
+	return create.result;
+}
+EXPORT_SYMBOL(kthread_clone);
+
 /**
  * kthread_bind - bind a just-created kthread to a cpu.
  * @p: thread created by kthread_create().