diff mbox

[nf-next,v3] netfilter: xtables: lightweight process control group matching

Message ID 7d845623366ba0d435d5c82bf318455871929e90.1387815795.git.dborkman@redhat.com
State Changes Requested
Headers show

Commit Message

Daniel Borkmann Dec. 23, 2013, 5:41 p.m. UTC
It would be useful e.g. in a server or desktop environment to have
a facility in the notion of fine-grained "per application" or "per
application group" firewall policies. Probably, users in the mobile,
embedded area (e.g. Android based) with different security policy
requirements for application groups could have great benefit from
that as well. For example, with a little bit of configuration effort,
an admin could whitelist well-known applications, and thus block
otherwise unwanted "hard-to-track" applications like [1] from a
user's machine. Blocking is just one example, but it is not limited
to that, meaning we can have much different scenarios/policies that
netfilter allows us than just blocking, e.g. fine grained settings
where applications are allowed to connect/send traffic to, application
traffic marking/conntracking, application-specific packet mangling,
and so on.

Implementation of PID-based matching would not be appropriate
as they frequently change, and child tracking would make that
even more complex and ugly. Cgroups would be a perfect candidate
for accomplishing that as they associate a set of tasks with a
set of parameters for one or more subsystems, in our case the
netfilter subsystem, which, of course, can be combined with other
cgroup subsystems into something more complex if needed.

As mentioned, to overcome this constraint, such processes could
be placed into one or multiple cgroups where different fine-grained
rules can be defined depending on the application scenario, while
e.g. everything else that is not part of that could be dropped (or
vice versa), thus making life harder for unwanted processes to
communicate to the outside world. So, we make use of cgroups here
to track jobs and limit their resources in terms of iptables
policies; in other words, limiting, tracking, etc what they are
allowed to communicate.

In our case we're working on outgoing traffic based on which local
socket that originated from. Also, one doesn't even need to have
an a-prio knowledge of the application internals regarding their
particular use of ports or protocols. Matching is *extremly*
lightweight as we just test for the sk_classid marker of sockets,
originating from net_cls. net_cls and netfilter do not contradict
each other; in fact, each construct can live as standalone or they
can be used in combination with each other, which is perfectly fine,
plus it serves Tejun's requirement to not introduce a new cgroups
subsystem. Through this, we result in a very minimal and efficient
module, and don't add anything except netfilter code.

One possible, minimal usage example (many other iptables options
can be applied obviously):

 1) Configuring cgroups if not already done, e.g.:

  mkdir /sys/fs/cgroup/net_cls
  mount -t cgroup -o net_cls net_cls /sys/fs/cgroup/net_cls
  mkdir /sys/fs/cgroup/net_cls/0
  echo 1 > /sys/fs/cgroup/net_cls/0/net_cls.classid
  (resp. a real flow handle id for tc)

 2) Configuring netfilter (iptables-nftables), e.g.:

  iptables -A OUTPUT -m cgroup ! --cgroup 1 -j DROP

 3) Running applications, e.g.:

  ping 208.67.222.222  <pid:1799>
  echo 1799 > /sys/fs/cgroup/net_cls/0/tasks
  64 bytes from 208.67.222.222: icmp_seq=44 ttl=49 time=11.9 ms
  [...]
  ping 208.67.220.220  <pid:1804>
  ping: sendmsg: Operation not permitted
  [...]
  echo 1804 > /sys/fs/cgroup/net_cls/0/tasks
  64 bytes from 208.67.220.220: icmp_seq=89 ttl=56 time=19.0 ms
  [...]

Of course, real-world deployments would make use of cgroups user
space toolsuite, or own custom policy daemons dynamically moving
applications from/to various cgroups.

  [1] http://www.blackhat.com/presentations/bh-europe-06/bh-eu-06-biondi/bh-eu-06-biondi-up.pdf

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: cgroups@vger.kernel.org
---
  Previous discussions, design considerations etc can be found in:
   - v1: http://patchwork.ozlabs.org/patch/280687/
   - v1/alt: http://patchwork.ozlabs.org/patch/282477/
   - v2: http://patchwork.ozlabs.org/patch/284582/
  v1->v2:
   - Updated commit message, rebased
   - Applied Gao Feng's feedback
  v2->v3:
   - After discussions w/ Tejun, let's not add any cgroups code here,
     thus we _only_ add code in netfilter area, nowhere else, that's
     even more simple and cleaner than proposed.

 Documentation/cgroups/net_cls.txt        |  5 +++
 include/uapi/linux/netfilter/Kbuild      |  1 +
 include/uapi/linux/netfilter/xt_cgroup.h | 11 +++++
 net/netfilter/Kconfig                    | 10 +++++
 net/netfilter/Makefile                   |  1 +
 net/netfilter/xt_cgroup.c                | 71 ++++++++++++++++++++++++++++++++
 6 files changed, 99 insertions(+)
 create mode 100644 include/uapi/linux/netfilter/xt_cgroup.h
 create mode 100644 net/netfilter/xt_cgroup.c

Comments

Zefan Li Dec. 27, 2013, 3:13 a.m. UTC | #1
On 2013/12/24 1:41, Daniel Borkmann wrote:
> It would be useful e.g. in a server or desktop environment to have
> a facility in the notion of fine-grained "per application" or "per
> application group" firewall policies. Probably, users in the mobile,
> embedded area (e.g. Android based) with different security policy
> requirements for application groups could have great benefit from
> that as well. For example, with a little bit of configuration effort,
> an admin could whitelist well-known applications, and thus block
> otherwise unwanted "hard-to-track" applications like [1] from a
> user's machine. Blocking is just one example, but it is not limited
> to that, meaning we can have much different scenarios/policies that
> netfilter allows us than just blocking, e.g. fine grained settings
> where applications are allowed to connect/send traffic to, application
> traffic marking/conntracking, application-specific packet mangling,
> and so on.
> 
> Implementation of PID-based matching would not be appropriate
> as they frequently change, and child tracking would make that
> even more complex and ugly. Cgroups would be a perfect candidate
> for accomplishing that as they associate a set of tasks with a
> set of parameters for one or more subsystems, in our case the
> netfilter subsystem, which, of course, can be combined with other
> cgroup subsystems into something more complex if needed.
> 
> As mentioned, to overcome this constraint, such processes could
> be placed into one or multiple cgroups where different fine-grained
> rules can be defined depending on the application scenario, while
> e.g. everything else that is not part of that could be dropped (or
> vice versa), thus making life harder for unwanted processes to
> communicate to the outside world. So, we make use of cgroups here
> to track jobs and limit their resources in terms of iptables
> policies; in other words, limiting, tracking, etc what they are
> allowed to communicate.
> 
> In our case we're working on outgoing traffic based on which local
> socket that originated from. Also, one doesn't even need to have
> an a-prio knowledge of the application internals regarding their
> particular use of ports or protocols. Matching is *extremly*
> lightweight as we just test for the sk_classid marker of sockets,
> originating from net_cls. net_cls and netfilter do not contradict
> each other; in fact, each construct can live as standalone or they
> can be used in combination with each other, which is perfectly fine,
> plus it serves Tejun's requirement to not introduce a new cgroups
> subsystem. Through this, we result in a very minimal and efficient
> module, and don't add anything except netfilter code.
> 

I'd suggest splitting cls_cgroup code into 2 parts. The first part
is to manage cgroupfs and classid, and should be put into net/core/
and add a new config like NET_CGROUP_CLASSID for it. The second part
is specific cls_cgroup code.

--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Daniel Borkmann Dec. 27, 2013, 8:05 a.m. UTC | #2
On 12/27/2013 04:13 AM, Li Zefan wrote:
> On 2013/12/24 1:41, Daniel Borkmann wrote:
>> It would be useful e.g. in a server or desktop environment to have
>> a facility in the notion of fine-grained "per application" or "per
>> application group" firewall policies. Probably, users in the mobile,
>> embedded area (e.g. Android based) with different security policy
>> requirements for application groups could have great benefit from
>> that as well. For example, with a little bit of configuration effort,
>> an admin could whitelist well-known applications, and thus block
>> otherwise unwanted "hard-to-track" applications like [1] from a
>> user's machine. Blocking is just one example, but it is not limited
>> to that, meaning we can have much different scenarios/policies that
>> netfilter allows us than just blocking, e.g. fine grained settings
>> where applications are allowed to connect/send traffic to, application
>> traffic marking/conntracking, application-specific packet mangling,
>> and so on.
>>
>> Implementation of PID-based matching would not be appropriate
>> as they frequently change, and child tracking would make that
>> even more complex and ugly. Cgroups would be a perfect candidate
>> for accomplishing that as they associate a set of tasks with a
>> set of parameters for one or more subsystems, in our case the
>> netfilter subsystem, which, of course, can be combined with other
>> cgroup subsystems into something more complex if needed.
>>
>> As mentioned, to overcome this constraint, such processes could
>> be placed into one or multiple cgroups where different fine-grained
>> rules can be defined depending on the application scenario, while
>> e.g. everything else that is not part of that could be dropped (or
>> vice versa), thus making life harder for unwanted processes to
>> communicate to the outside world. So, we make use of cgroups here
>> to track jobs and limit their resources in terms of iptables
>> policies; in other words, limiting, tracking, etc what they are
>> allowed to communicate.
>>
>> In our case we're working on outgoing traffic based on which local
>> socket that originated from. Also, one doesn't even need to have
>> an a-prio knowledge of the application internals regarding their
>> particular use of ports or protocols. Matching is *extremly*
>> lightweight as we just test for the sk_classid marker of sockets,
>> originating from net_cls. net_cls and netfilter do not contradict
>> each other; in fact, each construct can live as standalone or they
>> can be used in combination with each other, which is perfectly fine,
>> plus it serves Tejun's requirement to not introduce a new cgroups
>> subsystem. Through this, we result in a very minimal and efficient
>> module, and don't add anything except netfilter code.
>>
> 
> I'd suggest splitting cls_cgroup code into 2 parts. The first part
> is to manage cgroupfs and classid, and should be put into net/core/
> and add a new config like NET_CGROUP_CLASSID for it. The second part
> is specific cls_cgroup code.

Sure, if this is wished, I'd do this as a follow-up as it doesn't affect
any of this code in netfilter here.

Thanks,
Daniel
--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Zefan Li Dec. 28, 2013, 1:33 a.m. UTC | #3
On 2013/12/27 16:05, Daniel Borkmann wrote:
> On 12/27/2013 04:13 AM, Li Zefan wrote:
>> On 2013/12/24 1:41, Daniel Borkmann wrote:
>>> It would be useful e.g. in a server or desktop environment to have
>>> a facility in the notion of fine-grained "per application" or "per
>>> application group" firewall policies. Probably, users in the mobile,
>>> embedded area (e.g. Android based) with different security policy
>>> requirements for application groups could have great benefit from
>>> that as well. For example, with a little bit of configuration effort,
>>> an admin could whitelist well-known applications, and thus block
>>> otherwise unwanted "hard-to-track" applications like [1] from a
>>> user's machine. Blocking is just one example, but it is not limited
>>> to that, meaning we can have much different scenarios/policies that
>>> netfilter allows us than just blocking, e.g. fine grained settings
>>> where applications are allowed to connect/send traffic to, application
>>> traffic marking/conntracking, application-specific packet mangling,
>>> and so on.
>>>
>>> Implementation of PID-based matching would not be appropriate
>>> as they frequently change, and child tracking would make that
>>> even more complex and ugly. Cgroups would be a perfect candidate
>>> for accomplishing that as they associate a set of tasks with a
>>> set of parameters for one or more subsystems, in our case the
>>> netfilter subsystem, which, of course, can be combined with other
>>> cgroup subsystems into something more complex if needed.
>>>
>>> As mentioned, to overcome this constraint, such processes could
>>> be placed into one or multiple cgroups where different fine-grained
>>> rules can be defined depending on the application scenario, while
>>> e.g. everything else that is not part of that could be dropped (or
>>> vice versa), thus making life harder for unwanted processes to
>>> communicate to the outside world. So, we make use of cgroups here
>>> to track jobs and limit their resources in terms of iptables
>>> policies; in other words, limiting, tracking, etc what they are
>>> allowed to communicate.
>>>
>>> In our case we're working on outgoing traffic based on which local
>>> socket that originated from. Also, one doesn't even need to have
>>> an a-prio knowledge of the application internals regarding their
>>> particular use of ports or protocols. Matching is *extremly*
>>> lightweight as we just test for the sk_classid marker of sockets,
>>> originating from net_cls. net_cls and netfilter do not contradict
>>> each other; in fact, each construct can live as standalone or they
>>> can be used in combination with each other, which is perfectly fine,
>>> plus it serves Tejun's requirement to not introduce a new cgroups
>>> subsystem. Through this, we result in a very minimal and efficient
>>> module, and don't add anything except netfilter code.
>>>
>>
>> I'd suggest splitting cls_cgroup code into 2 parts. The first part
>> is to manage cgroupfs and classid, and should be put into net/core/
>> and add a new config like NET_CGROUP_CLASSID for it. The second part
>> is specific cls_cgroup code.
> 
> Sure, if this is wished, I'd do this as a follow-up as it doesn't affect
> any of this code in netfilter here.
> 

We should clean up the code before introducing a new feature, not the
other way.

And I wish by touching the code outside netfilter we'll get more eyes
on whether it's ok to reuse cls cgroup for netfilter.

--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Daniel Borkmann Dec. 28, 2013, 8:17 a.m. UTC | #4
On 12/28/2013 02:33 AM, Li Zefan wrote:
> On 2013/12/27 16:05, Daniel Borkmann wrote:
>> On 12/27/2013 04:13 AM, Li Zefan wrote:
...
>>> I'd suggest splitting cls_cgroup code into 2 parts. The first part
>>> is to manage cgroupfs and classid, and should be put into net/core/
>>> and add a new config like NET_CGROUP_CLASSID for it. The second part
>>> is specific cls_cgroup code.
>>
>> Sure, if this is wished, I'd do this as a follow-up as it doesn't affect
>> any of this code in netfilter here.
> 
> We should clean up the code before introducing a new feature, not the
> other way.

Hehe, quite honestly, I think this is YOUR opinion of a "cleanup", which
makes the code actually more complicated, and it's not strictly needed.

If you are so desperate about this separation, fine, I will do this, but
again, I don't think it's *strictly* required for this.
--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Daniel Borkmann Dec. 28, 2013, 8:44 a.m. UTC | #5
On 12/28/2013 09:17 AM, Daniel Borkmann wrote:
> On 12/28/2013 02:33 AM, Li Zefan wrote:
>> On 2013/12/27 16:05, Daniel Borkmann wrote:
>>> On 12/27/2013 04:13 AM, Li Zefan wrote:
> ...
>>>> I'd suggest splitting cls_cgroup code into 2 parts. The first part
>>>> is to manage cgroupfs and classid, and should be put into net/core/
>>>> and add a new config like NET_CGROUP_CLASSID for it. The second part
>>>> is specific cls_cgroup code.
>>>
>>> Sure, if this is wished, I'd do this as a follow-up as it doesn't affect
>>> any of this code in netfilter here.
>>
>> We should clean up the code before introducing a new feature, not the
>> other way.
> 
> Hehe, quite honestly, I think this is YOUR opinion of a "cleanup", which
> makes the code actually more complicated, and it's not strictly needed.
> 
> If you are so desperate about this separation, fine, I will do this, but
> again, I don't think it's *strictly* required for this.

Thinking about this further a bit, with this separation into net/core/, you
would need another Kconfig, where people choose between built-in or
module (or none). With built-in you would disallow people to load/unload the
cgroup part during runtime. With the module, you would introduce a module
dependency _for each_, even if you only want to use net_cls, which sort of
is not what we want from a simple "cleanup".
--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Zefan Li Dec. 28, 2013, 9:20 a.m. UTC | #6
On 2013/12/28 16:44, Daniel Borkmann wrote:
> On 12/28/2013 09:17 AM, Daniel Borkmann wrote:
>> On 12/28/2013 02:33 AM, Li Zefan wrote:
>>> On 2013/12/27 16:05, Daniel Borkmann wrote:
>>>> On 12/27/2013 04:13 AM, Li Zefan wrote:
>> ...
>>>>> I'd suggest splitting cls_cgroup code into 2 parts. The first part
>>>>> is to manage cgroupfs and classid, and should be put into net/core/
>>>>> and add a new config like NET_CGROUP_CLASSID for it. The second part
>>>>> is specific cls_cgroup code.
>>>>
>>>> Sure, if this is wished, I'd do this as a follow-up as it doesn't affect
>>>> any of this code in netfilter here.
>>>
>>> We should clean up the code before introducing a new feature, not the
>>> other way.
>>
>> Hehe, quite honestly, I think this is YOUR opinion of a "cleanup", which
>> makes the code actually more complicated, and it's not strictly needed.
>>
>> If you are so desperate about this separation, fine, I will do this, but
>> again, I don't think it's *strictly* required for this.
> 

Currently we have:

	cls_cgroup

With your patch, it's changed to:

                               __  cls_cgroup
                             /
	net_cgroup_classid -
                             \ __  netfilter_cgroup

The code should be re-organized.

> Thinking about this further a bit, with this separation into net/core/, you
> would need another Kconfig, where people choose between built-in or
> module (or none).

Just disable module support for net_cgroup_classid. I never like the cgroup
feature that allows modular subsystems. It will be just about 100 lines of
code, so we gain little by making it modular. Futhermore with this change,
we can simplify include/linux/cls_cgroup.h

Moreoever cls_cgroup is the only modular cgroup subsystem, so I may even remove
this feature from cgroup core after the change to cls_cgroup.

> With built-in you would disallow people to load/unload the
> cgroup part during runtime. With the module, you would introduce a module
> dependency _for each_, even if you only want to use net_cls, which sort of
> is not what we want from a simple "cleanup".
> 

--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Zefan Li Dec. 28, 2013, 9:54 a.m. UTC | #7
>> Thinking about this further a bit, with this separation into net/core/, you
>> would need another Kconfig, where people choose between built-in or
>> module (or none).
> 
> Just disable module support for net_cgroup_classid. I never like the cgroup
> feature that allows modular subsystems. It will be just about 100 lines of
> code, so we gain little by making it modular. Futhermore with this change,
> we can simplify include/linux/cls_cgroup.h
> 
> Moreoever cls_cgroup is the only modular cgroup subsystem, so I may even remove
> this feature from cgroup core after the change to cls_cgroup.
> 
>> With built-in you would disallow people to load/unload the
>> cgroup part during runtime. With the module, you would introduce a module
>> dependency _for each_, even if you only want to use net_cls, which sort of
>> is not what we want from a simple "cleanup".
>>

Even if it's a module, I don't think there's any dependency here, as
the only interface that cls_cgroup and netfilter_cgroup need is
task_cls_classid(), which is an inline function and doesn't involve any
EXPORT symbol.

Therefore it works even if CONFIG_CLS_CGROUP doesn't depend of
CONFIG_NET_CGROUP_CLASSID.

--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Daniel Borkmann Dec. 28, 2013, 6:04 p.m. UTC | #8
On 12/28/2013 10:20 AM, Li Zefan wrote:
> On 2013/12/28 16:44, Daniel Borkmann wrote:
>> On 12/28/2013 09:17 AM, Daniel Borkmann wrote:
>>> On 12/28/2013 02:33 AM, Li Zefan wrote:
>>>> On 2013/12/27 16:05, Daniel Borkmann wrote:
>>>>> On 12/27/2013 04:13 AM, Li Zefan wrote:
>>> ...
>>>>>> I'd suggest splitting cls_cgroup code into 2 parts. The first part
>>>>>> is to manage cgroupfs and classid, and should be put into net/core/
>>>>>> and add a new config like NET_CGROUP_CLASSID for it. The second part
>>>>>> is specific cls_cgroup code.
>>>>>
>>>>> Sure, if this is wished, I'd do this as a follow-up as it doesn't affect
>>>>> any of this code in netfilter here.
>>>>
>>>> We should clean up the code before introducing a new feature, not the
>>>> other way.
>>>
>>> Hehe, quite honestly, I think this is YOUR opinion of a "cleanup", which
>>> makes the code actually more complicated, and it's not strictly needed.
>>>
>>> If you are so desperate about this separation, fine, I will do this, but
>>> again, I don't think it's *strictly* required for this.
>>
> 
> Currently we have:
> 
> 	cls_cgroup
> 
> With your patch, it's changed to:
> 
>                                 __  cls_cgroup
>                               /
> 	net_cgroup_classid -
>                               \ __  netfilter_cgroup
> 
> The code should be re-organized.
> 
>> Thinking about this further a bit, with this separation into net/core/, you
>> would need another Kconfig, where people choose between built-in or
>> module (or none).
> 
> Just disable module support for net_cgroup_classid. I never like the cgroup
> feature that allows modular subsystems. It will be just about 100 lines of
> code, so we gain little by making it modular. Futhermore with this change,
> we can simplify include/linux/cls_cgroup.h
> 
> Moreoever cls_cgroup is the only modular cgroup subsystem, so I may even remove
> this feature from cgroup core after the change to cls_cgroup.

Ok, will send a v4 with this change included, thanks.

>> With built-in you would disallow people to load/unload the
>> cgroup part during runtime. With the module, you would introduce a module
>> dependency _for each_, even if you only want to use net_cls, which sort of
>> is not what we want from a simple "cleanup".
--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Daniel Borkmann Dec. 28, 2013, 6:37 p.m. UTC | #9
On 12/28/2013 07:04 PM, Daniel Borkmann wrote:
> On 12/28/2013 10:20 AM, Li Zefan wrote:
>> On 2013/12/28 16:44, Daniel Borkmann wrote:
>>> On 12/28/2013 09:17 AM, Daniel Borkmann wrote:
>>>> On 12/28/2013 02:33 AM, Li Zefan wrote:
>>>>> On 2013/12/27 16:05, Daniel Borkmann wrote:
>>>>>> On 12/27/2013 04:13 AM, Li Zefan wrote:
>>>> ...
>>>>>>> I'd suggest splitting cls_cgroup code into 2 parts. The first part
>>>>>>> is to manage cgroupfs and classid, and should be put into net/core/
>>>>>>> and add a new config like NET_CGROUP_CLASSID for it. The second part
>>>>>>> is specific cls_cgroup code.
>>>>>>
>>>>>> Sure, if this is wished, I'd do this as a follow-up as it doesn't affect
>>>>>> any of this code in netfilter here.
>>>>>
>>>>> We should clean up the code before introducing a new feature, not the
>>>>> other way.
>>>>
>>>> Hehe, quite honestly, I think this is YOUR opinion of a "cleanup", which
>>>> makes the code actually more complicated, and it's not strictly needed.
>>>>
>>>> If you are so desperate about this separation, fine, I will do this, but
>>>> again, I don't think it's *strictly* required for this.
>>>
>>
>> Currently we have:
>>
>> 	cls_cgroup
>>
>> With your patch, it's changed to:
>>
>>                                  __  cls_cgroup
>>                                /
>> 	net_cgroup_classid -
>>                                \ __  netfilter_cgroup
>>
>> The code should be re-organized.
>>
>>> Thinking about this further a bit, with this separation into net/core/, you
>>> would need another Kconfig, where people choose between built-in or
>>> module (or none).
>>
>> Just disable module support for net_cgroup_classid. I never like the cgroup
>> feature that allows modular subsystems. It will be just about 100 lines of
>> code, so we gain little by making it modular. Futhermore with this change,
>> we can simplify include/linux/cls_cgroup.h
>>
>> Moreoever cls_cgroup is the only modular cgroup subsystem, so I may even remove
>> this feature from cgroup core after the change to cls_cgroup.

Btw, if you plan to remove modular support at some point in time, there's still
netprio_cgroup on todo.

> Ok, will send a v4 with this change included, thanks.
> 
>>> With built-in you would disallow people to load/unload the
>>> cgroup part during runtime. With the module, you would introduce a module
>>> dependency _for each_, even if you only want to use net_cls, which sort of
>>> is not what we want from a simple "cleanup".
> --
> To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/Documentation/cgroups/net_cls.txt b/Documentation/cgroups/net_cls.txt
index 9face6b..ec18234 100644
--- a/Documentation/cgroups/net_cls.txt
+++ b/Documentation/cgroups/net_cls.txt
@@ -6,6 +6,8 @@  tag network packets with a class identifier (classid).
 
 The Traffic Controller (tc) can be used to assign
 different priorities to packets from different cgroups.
+Also, Netfilter (iptables) can use this tag to perform
+actions on such packets.
 
 Creating a net_cls cgroups instance creates a net_cls.classid file.
 This net_cls.classid value is initialized to 0.
@@ -32,3 +34,6 @@  tc class add dev eth0 parent 10: classid 10:1 htb rate 40mbit
  - creating traffic class 10:1
 
 tc filter add dev eth0 parent 10: protocol ip prio 10 handle 1: cgroup
+
+configuring iptables, basic example:
+iptables -A OUTPUT -m cgroup ! --cgroup 0x100001 -j DROP
diff --git a/include/uapi/linux/netfilter/Kbuild b/include/uapi/linux/netfilter/Kbuild
index 17c3af2..2e6b9e4 100644
--- a/include/uapi/linux/netfilter/Kbuild
+++ b/include/uapi/linux/netfilter/Kbuild
@@ -39,6 +39,7 @@  header-y += xt_TEE.h
 header-y += xt_TPROXY.h
 header-y += xt_addrtype.h
 header-y += xt_bpf.h
+header-y += xt_cgroup.h
 header-y += xt_cluster.h
 header-y += xt_comment.h
 header-y += xt_connbytes.h
diff --git a/include/uapi/linux/netfilter/xt_cgroup.h b/include/uapi/linux/netfilter/xt_cgroup.h
new file mode 100644
index 0000000..43acb7e
--- /dev/null
+++ b/include/uapi/linux/netfilter/xt_cgroup.h
@@ -0,0 +1,11 @@ 
+#ifndef _UAPI_XT_CGROUP_H
+#define _UAPI_XT_CGROUP_H
+
+#include <linux/types.h>
+
+struct xt_cgroup_info {
+	__u32 id;
+	__u32 invert;
+};
+
+#endif /* _UAPI_XT_CGROUP_H */
diff --git a/net/netfilter/Kconfig b/net/netfilter/Kconfig
index c3398cd..bbdab88 100644
--- a/net/netfilter/Kconfig
+++ b/net/netfilter/Kconfig
@@ -858,6 +858,16 @@  config NETFILTER_XT_MATCH_BPF
 
 	  To compile it as a module, choose M here.  If unsure, say N.
 
+config NETFILTER_XT_MATCH_CGROUP
+	tristate '"control group" match support'
+	depends on NETFILTER_ADVANCED
+	depends on CGROUPS
+	select NET_CLS_CGROUP
+	---help---
+	Socket/process control group matching allows you to match locally
+	generated packets based on which net_cls control group processes
+	belong to.
+
 config NETFILTER_XT_MATCH_CLUSTER
 	tristate '"cluster" match support'
 	depends on NF_CONNTRACK
diff --git a/net/netfilter/Makefile b/net/netfilter/Makefile
index 394483b..9345365 100644
--- a/net/netfilter/Makefile
+++ b/net/netfilter/Makefile
@@ -142,6 +142,7 @@  obj-$(CONFIG_NETFILTER_XT_MATCH_MULTIPORT) += xt_multiport.o
 obj-$(CONFIG_NETFILTER_XT_MATCH_NFACCT) += xt_nfacct.o
 obj-$(CONFIG_NETFILTER_XT_MATCH_OSF) += xt_osf.o
 obj-$(CONFIG_NETFILTER_XT_MATCH_OWNER) += xt_owner.o
+obj-$(CONFIG_NETFILTER_XT_MATCH_CGROUP) += xt_cgroup.o
 obj-$(CONFIG_NETFILTER_XT_MATCH_PHYSDEV) += xt_physdev.o
 obj-$(CONFIG_NETFILTER_XT_MATCH_PKTTYPE) += xt_pkttype.o
 obj-$(CONFIG_NETFILTER_XT_MATCH_POLICY) += xt_policy.o
diff --git a/net/netfilter/xt_cgroup.c b/net/netfilter/xt_cgroup.c
new file mode 100644
index 0000000..66107a7
--- /dev/null
+++ b/net/netfilter/xt_cgroup.c
@@ -0,0 +1,71 @@ 
+/*
+ * Xtables module to match the process control group.
+ *
+ * Might be used to implement individual "per-application" firewall
+ * policies in contrast to global policies based on control groups.
+ * Matching is based upon processes tagged to net_cls' classid marker.
+ *
+ * (C) 2013 Daniel Borkmann <dborkman@redhat.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#include <linux/skbuff.h>
+#include <linux/module.h>
+#include <linux/netfilter/x_tables.h>
+#include <linux/netfilter/xt_cgroup.h>
+#include <net/sock.h>
+
+MODULE_LICENSE("GPL");
+MODULE_AUTHOR("Daniel Borkmann <dborkman@redhat.com>");
+MODULE_DESCRIPTION("Xtables: process control group matching");
+MODULE_ALIAS("ipt_cgroup");
+MODULE_ALIAS("ip6t_cgroup");
+
+static int cgroup_mt_check(const struct xt_mtchk_param *par)
+{
+	struct xt_cgroup_info *info = par->matchinfo;
+
+	if (info->invert & ~1)
+		return -EINVAL;
+
+	return info->id ? 0 : -EINVAL;
+}
+
+static bool
+cgroup_mt(const struct sk_buff *skb, struct xt_action_param *par)
+{
+	const struct xt_cgroup_info *info = par->matchinfo;
+
+	if (skb->sk == NULL)
+		return false;
+
+	return (info->id == skb->sk->sk_classid) ^ info->invert;
+}
+
+static struct xt_match cgroup_mt_reg __read_mostly = {
+	.name       = "cgroup",
+	.revision   = 0,
+	.family     = NFPROTO_UNSPEC,
+	.checkentry = cgroup_mt_check,
+	.match      = cgroup_mt,
+	.matchsize  = sizeof(struct xt_cgroup_info),
+	.me         = THIS_MODULE,
+	.hooks      = (1 << NF_INET_LOCAL_OUT) |
+		      (1 << NF_INET_POST_ROUTING),
+};
+
+static int __init cgroup_mt_init(void)
+{
+	return xt_register_match(&cgroup_mt_reg);
+}
+
+static void __exit cgroup_mt_exit(void)
+{
+	xt_unregister_match(&cgroup_mt_reg);
+}
+
+module_init(cgroup_mt_init);
+module_exit(cgroup_mt_exit);