diff mbox

[BUG] Bug in netprio_cgroup and netcls_cgroup ?

Message ID 20130122100938.GA26820@candlejack.bmw-carit.intra
State RFC, archived
Delegated to: David Miller
Headers show

Commit Message

Daniel Wagner Jan. 22, 2013, 10:09 a.m. UTC
On Mon, Jan 21, 2013 at 09:18:38AM -0800, John Fastabend wrote:
> On 01/21/2013 01:57 AM, Li Zefan wrote:
> > On 2013/1/21 17:27, Daniel Wagner wrote:
> >> On 21.01.2013 10:01, Li Zefan wrote:
> >>> On 2013/1/21 16:50, Daniel Wagner wrote:
> >>>> Hi Li,
> >>>>
> >>>> On 21.01.2013 07:08, Li Zefan wrote:
> >>>>> I'm not a network developer, so correct me if I'm wrong.
> >>>>>
> >>>>> Since commit 7955490f732c2b8
> >>>>> ("net: netprio_cgroup: rework update socket logic"), sock->sk->sk_cgrp_prioidx
> >>>>> is set when the socket is created, and won't be updated unless the task is
> >>>>> moved to another cgroup.
> >>>>>
> >>>>> Now the problem is, a socket can be _shared_ by multiple processes (fork, SCM_RIGHT).
> >>>>> If we place those processes in different cgroups, and each cgroup has
> >>>>> different configs, but all of the processes will send data via this socket
> >>>>> with the same network priority.
> >>>>
> >>>> Wouldn't that be addressed by 48a87cc26c13b68f6cce4e9d769fcb17a6b3e4b8
> >>>>
> >>>>      net: netprio: fd passed in SCM_RIGHTS datagram not set correctly
> >>>>
> >>>>       A socket fd passed in a SCM_RIGHTS datagram was not getting
> >>>>       updated with the new tasks cgrp prioidx. This leaves IO on
> >>>>       the socket tagged with the old tasks priority.
> >>>>
> >>>>       To fix this add a check in the scm recvmsg path to update the
> >>>>       sock cgrp prioidx with the new tasks value.
> >>>>
> >>>> As I read this this should work for net_prio.
> >>>>
> >>>
> >>> But after process A passed the socket fd to B, both A and B can use the
> >>> same socket to send data, right? Then if A and B were placed in different
> >>> cgroups with differnt configs, A's config won't take effect anymore.
> >>>
> >>> Am I missing something?
> >>
> >> I don't know. I guess at one point the socket resources are shared and then
> >> one configuration is taking preference. As you can see I am far away of
> >> being
> >> an expert in this field. Hopefully someone who understands this bits
> >> can chip in.
> >>
> >> BTW, isn't this a similar to what should happen with the block io cgroup?
> >> What is the behavior with a fd writing to a file in the scenario you
> >> describe above?
> >>
> > 
> > It forbids task moving in this case:
> > 
> > /*
> >   * We cannot support shared io contexts, as we have no mean to support
> >   * two tasks with the same ioc in two different groups without major rework
> >   * of the main cic data structures.  For now we allow a task to change
> >   * its cgroup only if it's the only owner of its ioc.
> >   */
> > static int blkcg_can_attach(struct cgroup *cgrp, struct cgroup_taskset *tset)
> > 
> 
> OK, I guess we should do something similar in the netprio, netcls
> cgroups and
> yes document it as you noted in your last comment.

Here is my attempt to add such a check. I really don't know if this is the
correct way to do so. To test this I have written a test program, which
seems to test the right thing. Please have a look and let me know if
it is correct: http://www.monom.org/misc/scm_rights.c

And here a dirty first version of the patch:


From 49a78d907eaf31c16673025e7e3b4844e419e416 Mon Sep 17 00:00:00 2001
From: Daniel Wagner <daniel.wagner@bmw-carit.de>
Date: Tue, 22 Jan 2013 11:08:22 +0100
Subject: [PATCH] net: net_prio: Block attach if a socket is shared

---
 net/core/netprio_cgroup.c | 30 ++++++++++++++++++++++++++++++
 1 file changed, 30 insertions(+)

Comments

John Fastabend Jan. 23, 2013, 12:02 a.m. UTC | #1
[...]

>>
>> OK, I guess we should do something similar in the netprio, netcls
>> cgroups and
>> yes document it as you noted in your last comment.
>
> Here is my attempt to add such a check. I really don't know if this is the
> correct way to do so. To test this I have written a test program, which
> seems to test the right thing. Please have a look and let me know if
> it is correct: http://www.monom.org/misc/scm_rights.c
>
> And here a dirty first version of the patch:
>
>
>  From 49a78d907eaf31c16673025e7e3b4844e419e416 Mon Sep 17 00:00:00 2001
> From: Daniel Wagner <daniel.wagner@bmw-carit.de>
> Date: Tue, 22 Jan 2013 11:08:22 +0100
> Subject: [PATCH] net: net_prio: Block attach if a socket is shared
>
> ---
>   net/core/netprio_cgroup.c | 30 ++++++++++++++++++++++++++++++
>   1 file changed, 30 insertions(+)
>
> diff --git a/net/core/netprio_cgroup.c b/net/core/netprio_cgroup.c
> index 847c02b..de4e6c5 100644
> --- a/net/core/netprio_cgroup.c
> +++ b/net/core/netprio_cgroup.c
> @@ -274,9 +274,39 @@ static struct cftype ss_files[] = {
>   	{ }	/* terminate */
>   };
>
> +static int check_cnt(const void *v, struct file *file, unsigned n)
> +{
> +	unsigned *flag = (unsigned *)v;
> +	int err;
> +
> +	struct socket *sock = sock_from_file(file, &err);
> +	if (sock && file_count(file) > 1)
> +		*flag = 1;
> +

I think this check will catch a lot of cases that are not necessarily
sharing a socket across tasks though. For example iscsid passes a file
descriptor to the kernel which does a sockfd_lookup() incrementing
f_count. Similarly look at dup/clone/etc.

In many of these cases I believe it should be OK to move the task
around when the sockets are not shared between multiple tasks.

.John
Daniel Wagner Jan. 23, 2013, 9:24 a.m. UTC | #2
On 23.01.2013 01:02, John Fastabend wrote:
> [...]
>
>>>
>>> OK, I guess we should do something similar in the netprio, netcls
>>> cgroups and
>>> yes document it as you noted in your last comment.
>>
>> Here is my attempt to add such a check. I really don't know if this is
>> the
>> correct way to do so. To test this I have written a test program, which
>> seems to test the right thing. Please have a look and let me know if
>> it is correct: http://www.monom.org/misc/scm_rights.c
>>
>> And here a dirty first version of the patch:
>>
>>
>>  From 49a78d907eaf31c16673025e7e3b4844e419e416 Mon Sep 17 00:00:00 2001
>> From: Daniel Wagner <daniel.wagner@bmw-carit.de>
>> Date: Tue, 22 Jan 2013 11:08:22 +0100
>> Subject: [PATCH] net: net_prio: Block attach if a socket is shared
>>
>> ---
>>   net/core/netprio_cgroup.c | 30 ++++++++++++++++++++++++++++++
>>   1 file changed, 30 insertions(+)
>>
>> diff --git a/net/core/netprio_cgroup.c b/net/core/netprio_cgroup.c
>> index 847c02b..de4e6c5 100644
>> --- a/net/core/netprio_cgroup.c
>> +++ b/net/core/netprio_cgroup.c
>> @@ -274,9 +274,39 @@ static struct cftype ss_files[] = {
>>       { }    /* terminate */
>>   };
>>
>> +static int check_cnt(const void *v, struct file *file, unsigned n)
>> +{
>> +    unsigned *flag = (unsigned *)v;
>> +    int err;
>> +
>> +    struct socket *sock = sock_from_file(file, &err);
>> +    if (sock && file_count(file) > 1)
>> +        *flag = 1;
>> +
>
> I think this check will catch a lot of cases that are not necessarily
> sharing a socket across tasks though. For example iscsid passes a file
> descriptor to the kernel which does a sockfd_lookup() incrementing
> f_count. Similarly look at dup/clone/etc.

Yep, I expected that this patch was too simple.

> In many of these cases I believe it should be OK to move the task
> around when the sockets are not shared between multiple tasks.

Do you know of a different way to identify the shared sockets?

thanks,
daniel

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Zefan Li Jan. 25, 2013, 8:39 a.m. UTC | #3
>>>> BTW, isn't this a similar to what should happen with the block io cgroup?
>>>> What is the behavior with a fd writing to a file in the scenario you
>>>> describe above?
>>>>
>>>
>>> It forbids task moving in this case:
>>>
>>> /*
>>>   * We cannot support shared io contexts, as we have no mean to support
>>>   * two tasks with the same ioc in two different groups without major rework
>>>   * of the main cic data structures.  For now we allow a task to change
>>>   * its cgroup only if it's the only owner of its ioc.
>>>   */
>>> static int blkcg_can_attach(struct cgroup *cgrp, struct cgroup_taskset *tset)
>>>
>>
>> OK, I guess we should do something similar in the netprio, netcls
>> cgroups and
>> yes document it as you noted in your last comment.
> 
> Here is my attempt to add such a check. I really don't know if this is the
> correct way to do so. To test this I have written a test program, which
> seems to test the right thing. Please have a look and let me know if
> it is correct: http://www.monom.org/misc/scm_rights.c
> 
> And here a dirty first version of the patch:
> 

Adding new can_attach() is discouraged. It's considered bad, because if cgroup
is mounted with multi subsystems binded, and if one of them can forbid task
moving for some reason, this will add burden on users to use cgroup properly.

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/net/core/netprio_cgroup.c b/net/core/netprio_cgroup.c
index 847c02b..de4e6c5 100644
--- a/net/core/netprio_cgroup.c
+++ b/net/core/netprio_cgroup.c
@@ -274,9 +274,39 @@  static struct cftype ss_files[] = {
 	{ }	/* terminate */
 };
 
+static int check_cnt(const void *v, struct file *file, unsigned n)
+{
+	unsigned *flag = (unsigned *)v;
+	int err;
+
+	struct socket *sock = sock_from_file(file, &err);
+	if (sock && file_count(file) > 1)
+		*flag = 1;
+
+	return 0;
+}
+
+static int cgrp_can_attach(struct cgroup *cgrp, struct cgroup_taskset *tset)
+{
+	struct task_struct *p;
+	unsigned flag = 0;
+
+	cgroup_taskset_for_each(p, cgrp, tset) {
+		task_lock(p);
+		iterate_fd(p->files, 0, check_cnt, &flag);
+		task_unlock(p);
+	}
+
+	if (flag != 0)
+		return -EINVAL;
+
+	return 0;
+}
+
 struct cgroup_subsys net_prio_subsys = {
 	.name		= "net_prio",
 	.create		= cgrp_create,
+	.can_attach	= cgrp_can_attach,
 	.destroy	= cgrp_destroy,
 	.attach		= net_prio_attach,
 	.subsys_id	= net_prio_subsys_id,