diff mbox series

[RFC,net-next] net: sch_clsact: add support for global per-netns classifier mode

Message ID 1504615701-20912-1-git-send-email-nikolay@cumulusnetworks.com
State RFC, archived
Delegated to: David Miller
Headers show
Series [RFC,net-next] net: sch_clsact: add support for global per-netns classifier mode | expand

Commit Message

Nikolay Aleksandrov Sept. 5, 2017, 12:48 p.m. UTC
Hi all,
This RFC adds a new mode for clsact which designates a device's egress
classifier as global per netns. The packets that are not classified for
a particular device will be classified using the global classifier.
We have needed a global classifier for some time now for various
purposes and setting the single bridge or loopback/vrf device as the
global classifier device is acceptable for us. Doing it this way avoids
the act/cls device and queue dependencies.

This is strictly an RFC patch just to show the intent, if we agree on
the details the proposed patch will have support for both ingress and
egress, and will be using a static key to avoid the fast path test when no
global classifier has been configured.

Example (need a modified tc that adds TCA_OPTIONS when using q_clsact):
$ tc qdisc add dev lo clsact global
$ tc filter add dev lo egress protocol ip u32 match ip dst 4.3.2.1/32 action drop

the last filter will be global for all devices that don't have a
specific egress_cl_list (i.e. have clsact configured).

Any comments and thoughts would be greatly appreciated.

Thanks!

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
---
 include/linux/rtnetlink.h   |  1 +
 include/net/net_namespace.h |  3 +++
 include/net/sch_generic.h   |  1 +
 net/core/dev.c              | 36 ++++++++++++++++++++++++++++++++++--
 net/sched/sch_ingress.c     | 15 +++++++++++++--
 5 files changed, 52 insertions(+), 4 deletions(-)

Comments

Jiri Pirko Sept. 5, 2017, 2:07 p.m. UTC | #1
Tue, Sep 05, 2017 at 02:48:21PM CEST, nikolay@cumulusnetworks.com wrote:
>Hi all,
>This RFC adds a new mode for clsact which designates a device's egress
>classifier as global per netns. The packets that are not classified for
>a particular device will be classified using the global classifier.
>We have needed a global classifier for some time now for various
>purposes and setting the single bridge or loopback/vrf device as the
>global classifier device is acceptable for us. Doing it this way avoids
>the act/cls device and queue dependencies.
>
>This is strictly an RFC patch just to show the intent, if we agree on
>the details the proposed patch will have support for both ingress and
>egress, and will be using a static key to avoid the fast path test when no
>global classifier has been configured.
>
>Example (need a modified tc that adds TCA_OPTIONS when using q_clsact):
>$ tc qdisc add dev lo clsact global
>$ tc filter add dev lo egress protocol ip u32 match ip dst 4.3.2.1/32 action drop
>
>the last filter will be global for all devices that don't have a
>specific egress_cl_list (i.e. have clsact configured).
>
>Any comments and thoughts would be greatly appreciated.

Did you see my shared blocks work? I believe that it should resolve your
usecase, in a generic way. You just have to bind the devices you need to
the shared block. Please see the RFC:

https://www.spinics.net/lists/netdev/msg444067.html
Jiri Pirko Sept. 5, 2017, 2:23 p.m. UTC | #2
Tue, Sep 05, 2017 at 04:07:51PM CEST, jiri@resnulli.us wrote:
>Tue, Sep 05, 2017 at 02:48:21PM CEST, nikolay@cumulusnetworks.com wrote:
>>Hi all,
>>This RFC adds a new mode for clsact which designates a device's egress
>>classifier as global per netns. The packets that are not classified for
>>a particular device will be classified using the global classifier.
>>We have needed a global classifier for some time now for various
>>purposes and setting the single bridge or loopback/vrf device as the
>>global classifier device is acceptable for us. Doing it this way avoids
>>the act/cls device and queue dependencies.
>>
>>This is strictly an RFC patch just to show the intent, if we agree on
>>the details the proposed patch will have support for both ingress and
>>egress, and will be using a static key to avoid the fast path test when no
>>global classifier has been configured.
>>
>>Example (need a modified tc that adds TCA_OPTIONS when using q_clsact):
>>$ tc qdisc add dev lo clsact global
>>$ tc filter add dev lo egress protocol ip u32 match ip dst 4.3.2.1/32 action drop
>>
>>the last filter will be global for all devices that don't have a
>>specific egress_cl_list (i.e. have clsact configured).
>>
>>Any comments and thoughts would be greatly appreciated.

For the record, I think this "global" thing is a hack similar to
cls_u32 shared hashlists.


>
>Did you see my shared blocks work? I believe that it should resolve your
>usecase, in a generic way. You just have to bind the devices you need to
>the shared block. Please see the RFC:
>
>https://www.spinics.net/lists/netdev/msg444067.html
Roopa Prabhu Sept. 5, 2017, 3:17 p.m. UTC | #3
On Tue, Sep 5, 2017 at 7:07 AM, Jiri Pirko <jiri@resnulli.us> wrote:
> Tue, Sep 05, 2017 at 02:48:21PM CEST, nikolay@cumulusnetworks.com wrote:
>>Hi all,
>>This RFC adds a new mode for clsact which designates a device's egress
>>classifier as global per netns. The packets that are not classified for
>>a particular device will be classified using the global classifier.
>>We have needed a global classifier for some time now for various
>>purposes and setting the single bridge or loopback/vrf device as the
>>global classifier device is acceptable for us. Doing it this way avoids
>>the act/cls device and queue dependencies.
>>
>>This is strictly an RFC patch just to show the intent, if we agree on
>>the details the proposed patch will have support for both ingress and
>>egress, and will be using a static key to avoid the fast path test when no
>>global classifier has been configured.
>>
>>Example (need a modified tc that adds TCA_OPTIONS when using q_clsact):
>>$ tc qdisc add dev lo clsact global
>>$ tc filter add dev lo egress protocol ip u32 match ip dst 4.3.2.1/32 action drop
>>
>>the last filter will be global for all devices that don't have a
>>specific egress_cl_list (i.e. have clsact configured).
>>
>>Any comments and thoughts would be greatly appreciated.
>
> Did you see my shared blocks work? I believe that it should resolve your
> usecase, in a generic way. You just have to bind the devices you need to
> the shared block. Please see the RFC:
>
> https://www.spinics.net/lists/netdev/msg444067.html


Jiri, yes, we have seen this series. This still requires one to make
the association between dev and tc shared block..and
the rules are associated with every device. Your work will help the
case and is needed for tc in general and can co-exist.
It takes us closer but is still not a way to create global tc rules.
imagine thousands of netdevs. We would ideally like the show to also
display a single set of rules.
Given tc has a rich set of classifiers and actions (and very
extensible!), we are trying to see if those can
be easily applied globally than being tied to a device. Maybe there
are other better ways to achieve this...this thread
is to start that discussion. I think solving this once will help the
scale issue for your hardware offload case as well.
Cong Wang Sept. 5, 2017, 6:18 p.m. UTC | #4
On Tue, Sep 5, 2017 at 5:48 AM, Nikolay Aleksandrov
<nikolay@cumulusnetworks.com> wrote:
> Hi all,
> This RFC adds a new mode for clsact which designates a device's egress
> classifier as global per netns. The packets that are not classified for
> a particular device will be classified using the global classifier.
> We have needed a global classifier for some time now for various
> purposes and setting the single bridge or loopback/vrf device as the
> global classifier device is acceptable for us. Doing it this way avoids
> the act/cls device and queue dependencies.
>
> This is strictly an RFC patch just to show the intent, if we agree on
> the details the proposed patch will have support for both ingress and
> egress, and will be using a static key to avoid the fast path test when no
> global classifier has been configured.
>
> Example (need a modified tc that adds TCA_OPTIONS when using q_clsact):
> $ tc qdisc add dev lo clsact global
> $ tc filter add dev lo egress protocol ip u32 match ip dst 4.3.2.1/32 action drop
>
> the last filter will be global for all devices that don't have a
> specific egress_cl_list (i.e. have clsact configured).

Sorry this is too ugly.

netdevice is still implied in your command line even if you treat it
as global. It is essentially hard to bypass netdevice layer since
netdevice is the core of L2 and also where everything begins.

Maybe the best we can do here is make tc filters standalone
as tc actions so that filters can exist before qdisc's and netdevices.
But this probably requires significant works to make it working
with both existing non-standalone and bindings standalones
with qdisc's.

Note I don't look deeply into this, just one thought, at least this
appears less ugly than yours.
Nikolay Aleksandrov Sept. 5, 2017, 6:25 p.m. UTC | #5
On 9/5/17 9:18 PM, Cong Wang wrote:
> On Tue, Sep 5, 2017 at 5:48 AM, Nikolay Aleksandrov
> <nikolay@cumulusnetworks.com> wrote:
>> Hi all,
>> This RFC adds a new mode for clsact which designates a device's egress
>> classifier as global per netns. The packets that are not classified for
>> a particular device will be classified using the global classifier.
>> We have needed a global classifier for some time now for various
>> purposes and setting the single bridge or loopback/vrf device as the
>> global classifier device is acceptable for us. Doing it this way avoids
>> the act/cls device and queue dependencies.
>>
>> This is strictly an RFC patch just to show the intent, if we agree on
>> the details the proposed patch will have support for both ingress and
>> egress, and will be using a static key to avoid the fast path test when no
>> global classifier has been configured.
>>
>> Example (need a modified tc that adds TCA_OPTIONS when using q_clsact):
>> $ tc qdisc add dev lo clsact global
>> $ tc filter add dev lo egress protocol ip u32 match ip dst 4.3.2.1/32 action drop
>>
>> the last filter will be global for all devices that don't have a
>> specific egress_cl_list (i.e. have clsact configured).
> 
> Sorry this is too ugly
> 
> netdevice is still implied in your command line even if you treat it
> as global. It is essentially hard to bypass netdevice layer since
> netdevice is the core of L2 and also where everything begins.
> 

This is only a quick RFC, that can be removed entirely if we limit it to the 
netns and its loopback device. Then we can drop the "device" keyword altogether.

> Maybe the best we can do here is make tc filters standalone
> as tc actions so that filters can exist before qdisc's and netdevices.
> But this probably requires significant works to make it working
> with both existing non-standalone and bindings standalones
> with qdisc's.
>

We've actually been discussing this option internally as well.
I think we'll look into doing that regardless of this patch.
> Note I don't look deeply into this, just one thought, at least this
> appears less ugly than yours.
> 

What I did was aimed at simplicity and is merely a mode of clsact which
doesn't have an impact if not configured. Every other solution requires
a much more invasive change, note that doesn't mean I don't agree. :-)
Roopa Prabhu Sept. 5, 2017, 10:01 p.m. UTC | #6
On Tue, Sep 5, 2017 at 11:18 AM, Cong Wang <xiyou.wangcong@gmail.com> wrote:
> On Tue, Sep 5, 2017 at 5:48 AM, Nikolay Aleksandrov
> <nikolay@cumulusnetworks.com> wrote:
>> Hi all,
>> This RFC adds a new mode for clsact which designates a device's egress
>> classifier as global per netns. The packets that are not classified for
>> a particular device will be classified using the global classifier.
>> We have needed a global classifier for some time now for various
>> purposes and setting the single bridge or loopback/vrf device as the
>> global classifier device is acceptable for us. Doing it this way avoids
>> the act/cls device and queue dependencies.
>>
>> This is strictly an RFC patch just to show the intent, if we agree on
>> the details the proposed patch will have support for both ingress and
>> egress, and will be using a static key to avoid the fast path test when no
>> global classifier has been configured.
>>
>> Example (need a modified tc that adds TCA_OPTIONS when using q_clsact):
>> $ tc qdisc add dev lo clsact global
>> $ tc filter add dev lo egress protocol ip u32 match ip dst 4.3.2.1/32 action drop
>>
>> the last filter will be global for all devices that don't have a
>> specific egress_cl_list (i.e. have clsact configured).
>
> Sorry this is too ugly.
>
> netdevice is still implied in your command line even if you treat it
> as global. It is essentially hard to bypass netdevice layer since
> netdevice is the core of L2 and also where everything begins.
>
> Maybe the best we can do here is make tc filters standalone
> as tc actions so that filters can exist before qdisc's and netdevices.
> But this probably requires significant works to make it working
> with both existing non-standalone and bindings standalones
> with qdisc's.

yes, like Nikolay says we have been discussing this as well. Nikolay's
patch is a cleaver and most importantly non-invasive
way today given the anchor point for tc rules is a netdev. we have
also considered a separate implicit tc anchor device.
 lo seemed like a good fit for all global rules given it is already
available. And it is not uncommon to hang off global
network config on the loopback interface.
Jamal Hadi Salim Sept. 5, 2017, 10:25 p.m. UTC | #7
On 17-09-05 06:01 PM, Roopa Prabhu wrote:

> 
> yes, like Nikolay says we have been discussing this as well. Nikolay's
> patch is a cleaver and most importantly non-invasive
> way today given the anchor point for tc rules is a netdev. we have
> also considered a separate implicit tc anchor device.
>   lo seemed like a good fit for all global rules given it is already
> available. And it is not uncommon to hang off global
> network config on the loopback interface.
> 

IMO, Jiri has done all the necessary work already with the concept of
blocks. We dont really need the netdev to be the attachment point.

You can add to a block in many locations in the kernel by
constructing the proper "coordinates" in the tcmsg.
i.e this:
   tcmsg {
         unsigned char   tcm_family;
         unsigned char   tcm__pad1;
         unsigned short  tcm__pad2;
         int             tcm_ifindex;
         __u32           tcm_handle;
         __u32           tcm_parent;
}

If you were to set tcm_ifindex to -1 (since that is not a legit
ifindex) then all we need to do is define a parent for a
different location. Current locations tied to netdevs are:
-----
#define TC_H_ROOT       (0xFFFFFFFFU)
#define TC_H_INGRESS    (0xFFFFFFF1U)
#define TC_H_CLSACT     TC_H_INGRESS

#define TC_H_MIN_INGRESS        0xFFF2U
#define TC_H_MIN_EGRESS         0xFFF3U
-----

You should be able to say add a location which maps to a pre-routing
or post-routing etc; and this would work as well...

cheers,
jamal
Daniel Borkmann Sept. 5, 2017, 10:45 p.m. UTC | #8
On 09/06/2017 12:01 AM, Roopa Prabhu wrote:
> On Tue, Sep 5, 2017 at 11:18 AM, Cong Wang <xiyou.wangcong@gmail.com> wrote:
>> On Tue, Sep 5, 2017 at 5:48 AM, Nikolay Aleksandrov
>> <nikolay@cumulusnetworks.com> wrote:
>>> Hi all,
>>> This RFC adds a new mode for clsact which designates a device's egress
>>> classifier as global per netns. The packets that are not classified for
>>> a particular device will be classified using the global classifier.
>>> We have needed a global classifier for some time now for various
>>> purposes and setting the single bridge or loopback/vrf device as the

Can you elaborate a bit more on the ... "we have needed a global
classifier for some time now for various purposes".

>>> global classifier device is acceptable for us. Doing it this way avoids
>>> the act/cls device and queue dependencies.
>>>
>>> This is strictly an RFC patch just to show the intent, if we agree on
>>> the details the proposed patch will have support for both ingress and
>>> egress, and will be using a static key to avoid the fast path test when no
>>> global classifier has been configured.
>>>
>>> Example (need a modified tc that adds TCA_OPTIONS when using q_clsact):
>>> $ tc qdisc add dev lo clsact global
>>> $ tc filter add dev lo egress protocol ip u32 match ip dst 4.3.2.1/32 action drop
>>>
>>> the last filter will be global for all devices that don't have a
>>> specific egress_cl_list (i.e. have clsact configured).
>>
>> Sorry this is too ugly.

+1

>> netdevice is still implied in your command line even if you treat it
>> as global. It is essentially hard to bypass netdevice layer since
>> netdevice is the core of L2 and also where everything begins.
>>
>> Maybe the best we can do here is make tc filters standalone
>> as tc actions so that filters can exist before qdisc's and netdevices.
>> But this probably requires significant works to make it working
>> with both existing non-standalone and bindings standalones
>> with qdisc's.
>
> yes, like Nikolay says we have been discussing this as well. Nikolay's
> patch is a cleaver and most importantly non-invasive
> way today given the anchor point for tc rules is a netdev. we have
> also considered a separate implicit tc anchor device.

Seems ugly just as well. :( Hmm, why not just having the two list
pointers (ingress, egress list) in the netns struct and when
something configures them to be effectively non-zero, then devices
in that netns could automatically get a clsact and inherit the
lists from there such that sch_handle_ingress() and sch_handle_egress()
require exactly zero changes in fast-path. You could then go and
say that either you would make changes to clsact for individual
devices immutable when they use the 'shared' list pointers, or then
duplicate the configs when being altered from the global one. Would
push the complexity to control path only at least. Just a brief
thought.
Daniel Borkmann Sept. 5, 2017, 11:12 p.m. UTC | #9
On 09/06/2017 12:45 AM, Daniel Borkmann wrote:
> On 09/06/2017 12:01 AM, Roopa Prabhu wrote:
>> On Tue, Sep 5, 2017 at 11:18 AM, Cong Wang <xiyou.wangcong@gmail.com> wrote:
>>> On Tue, Sep 5, 2017 at 5:48 AM, Nikolay Aleksandrov
>>> <nikolay@cumulusnetworks.com> wrote:
>>>> Hi all,
>>>> This RFC adds a new mode for clsact which designates a device's egress
>>>> classifier as global per netns. The packets that are not classified for
>>>> a particular device will be classified using the global classifier.
>>>> We have needed a global classifier for some time now for various
>>>> purposes and setting the single bridge or loopback/vrf device as the
>
> Can you elaborate a bit more on the ... "we have needed a global
> classifier for some time now for various purposes".
>
>>>> global classifier device is acceptable for us. Doing it this way avoids
>>>> the act/cls device and queue dependencies.
>>>>
>>>> This is strictly an RFC patch just to show the intent, if we agree on
>>>> the details the proposed patch will have support for both ingress and
>>>> egress, and will be using a static key to avoid the fast path test when no
>>>> global classifier has been configured.
>>>>
>>>> Example (need a modified tc that adds TCA_OPTIONS when using q_clsact):
>>>> $ tc qdisc add dev lo clsact global
>>>> $ tc filter add dev lo egress protocol ip u32 match ip dst 4.3.2.1/32 action drop
>>>>
>>>> the last filter will be global for all devices that don't have a
>>>> specific egress_cl_list (i.e. have clsact configured).
>>>
>>> Sorry this is too ugly.
>
> +1
>
>>> netdevice is still implied in your command line even if you treat it
>>> as global. It is essentially hard to bypass netdevice layer since
>>> netdevice is the core of L2 and also where everything begins.

One could probably use special wildcard device name, e.g. tcpdump
allows for 'tcpdump -i any' to capture on all devices, so you could
indicate something like 'tc filter add dev any blah' to have similar
semantics in the realm of the netns w/o making it look too hacky
for tc users perhaps.

>>> Maybe the best we can do here is make tc filters standalone
>>> as tc actions so that filters can exist before qdisc's and netdevices.
>>> But this probably requires significant works to make it working
>>> with both existing non-standalone and bindings standalones
>>> with qdisc's.
>>
>> yes, like Nikolay says we have been discussing this as well. Nikolay's
>> patch is a cleaver and most importantly non-invasive
>> way today given the anchor point for tc rules is a netdev. we have
>> also considered a separate implicit tc anchor device.
>
> Seems ugly just as well. :( Hmm, why not just having the two list
> pointers (ingress, egress list) in the netns struct and when
> something configures them to be effectively non-zero, then devices
> in that netns could automatically get a clsact and inherit the
> lists from there such that sch_handle_ingress() and sch_handle_egress()
> require exactly zero changes in fast-path. You could then go and
> say that either you would make changes to clsact for individual
> devices immutable when they use the 'shared' list pointers, or then
> duplicate the configs when being altered from the global one. Would
> push the complexity to control path only at least. Just a brief
> thought.
Roopa Prabhu Sept. 6, 2017, 4:04 a.m. UTC | #10
On Tue, Sep 5, 2017 at 3:45 PM, Daniel Borkmann <daniel@iogearbox.net> wrote:
> On 09/06/2017 12:01 AM, Roopa Prabhu wrote:
>>
>> On Tue, Sep 5, 2017 at 11:18 AM, Cong Wang <xiyou.wangcong@gmail.com>
>> wrote:
>>>
>>> On Tue, Sep 5, 2017 at 5:48 AM, Nikolay Aleksandrov
>>> <nikolay@cumulusnetworks.com> wrote:
>>>>
>>>> Hi all,
>>>> This RFC adds a new mode for clsact which designates a device's egress
>>>> classifier as global per netns. The packets that are not classified for
>>>> a particular device will be classified using the global classifier.
>>>> We have needed a global classifier for some time now for various
>>>> purposes and setting the single bridge or loopback/vrf device as the
>
>
> Can you elaborate a bit more on the ... "we have needed a global
> classifier for some time now for various purposes".

Most of our acl's are global or use a wildcard. eg iptables supports
global rules without an dev. We do end up having hundreds of netdevs.
Another use case for the future is use of tc for policy based routing
which requires global rules.
Roopa Prabhu Sept. 6, 2017, 4:09 a.m. UTC | #11
On Tue, Sep 5, 2017 at 3:25 PM, Jamal Hadi Salim <jhs@mojatatu.com> wrote:
> On 17-09-05 06:01 PM, Roopa Prabhu wrote:
>
>>
>> yes, like Nikolay says we have been discussing this as well. Nikolay's
>> patch is a cleaver and most importantly non-invasive
>> way today given the anchor point for tc rules is a netdev. we have
>> also considered a separate implicit tc anchor device.
>>   lo seemed like a good fit for all global rules given it is already
>> available. And it is not uncommon to hang off global
>> network config on the loopback interface.
>>
>
> IMO, Jiri has done all the necessary work already with the concept of
> blocks. We dont really need the netdev to be the attachment point.
>
> You can add to a block in many locations in the kernel by
> constructing the proper "coordinates" in the tcmsg.
> i.e this:
>   tcmsg {
>         unsigned char   tcm_family;
>         unsigned char   tcm__pad1;
>         unsigned short  tcm__pad2;
>         int             tcm_ifindex;
>         __u32           tcm_handle;
>         __u32           tcm_parent;
> }
>
> If you were to set tcm_ifindex to -1 (since that is not a legit
> ifindex) then all we need to do is define a parent for a
> different location. Current locations tied to netdevs are:
> -----
> #define TC_H_ROOT       (0xFFFFFFFFU)
> #define TC_H_INGRESS    (0xFFFFFFF1U)
> #define TC_H_CLSACT     TC_H_INGRESS
>
> #define TC_H_MIN_INGRESS        0xFFF2U
> #define TC_H_MIN_EGRESS         0xFFF3U
> -----
>
> You should be able to say add a location which maps to a pre-routing
> or post-routing etc; and this would work as well...
>

ok, interesting. Jiri's examples still showed netdev as attachment point.
we will explore some more. thanks.
Jiri Pirko Sept. 6, 2017, 7:24 a.m. UTC | #12
Wed, Sep 06, 2017 at 06:04:17AM CEST, roopa@cumulusnetworks.com wrote:
>On Tue, Sep 5, 2017 at 3:45 PM, Daniel Borkmann <daniel@iogearbox.net> wrote:
>> On 09/06/2017 12:01 AM, Roopa Prabhu wrote:
>>>
>>> On Tue, Sep 5, 2017 at 11:18 AM, Cong Wang <xiyou.wangcong@gmail.com>
>>> wrote:
>>>>
>>>> On Tue, Sep 5, 2017 at 5:48 AM, Nikolay Aleksandrov
>>>> <nikolay@cumulusnetworks.com> wrote:
>>>>>
>>>>> Hi all,
>>>>> This RFC adds a new mode for clsact which designates a device's egress
>>>>> classifier as global per netns. The packets that are not classified for
>>>>> a particular device will be classified using the global classifier.
>>>>> We have needed a global classifier for some time now for various
>>>>> purposes and setting the single bridge or loopback/vrf device as the
>>
>>
>> Can you elaborate a bit more on the ... "we have needed a global
>> classifier for some time now for various purposes".
>
>Most of our acl's are global or use a wildcard. eg iptables supports
>global rules without an dev. We do end up having hundreds of netdevs.
>Another use case for the future is use of tc for policy based routing
>which requires global rules.

That is not how TC works. There are devices, qdiscs, blocks, chains. The
global approach does not fit. The block sharing gets you what you need,
without need for any ugly hack.
Nikolay Aleksandrov Sept. 6, 2017, 10:14 a.m. UTC | #13
On 06/09/17 01:45, Daniel Borkmann wrote:
> On 09/06/2017 12:01 AM, Roopa Prabhu wrote:
>> On Tue, Sep 5, 2017 at 11:18 AM, Cong Wang <xiyou.wangcong@gmail.com> wrote:
>>> On Tue, Sep 5, 2017 at 5:48 AM, Nikolay Aleksandrov
>>> <nikolay@cumulusnetworks.com> wrote:
>>>> Hi all,
>>>> This RFC adds a new mode for clsact which designates a device's egress
>>>> classifier as global per netns. The packets that are not classified for
>>>> a particular device will be classified using the global classifier.
>>>> We have needed a global classifier for some time now for various
>>>> purposes and setting the single bridge or loopback/vrf device as the
> 
> Can you elaborate a bit more on the ... "we have needed a global
> classifier for some time now for various purposes".
> 
>>>> global classifier device is acceptable for us. Doing it this way avoids
>>>> the act/cls device and queue dependencies.
>>>>
>>>> This is strictly an RFC patch just to show the intent, if we agree on
>>>> the details the proposed patch will have support for both ingress and
>>>> egress, and will be using a static key to avoid the fast path test when no
>>>> global classifier has been configured.
>>>>
>>>> Example (need a modified tc that adds TCA_OPTIONS when using q_clsact):
>>>> $ tc qdisc add dev lo clsact global
>>>> $ tc filter add dev lo egress protocol ip u32 match ip dst 4.3.2.1/32 action drop
>>>>
>>>> the last filter will be global for all devices that don't have a
>>>> specific egress_cl_list (i.e. have clsact configured).
>>>
>>> Sorry this is too ugly.
> 
> +1
> 
>>> netdevice is still implied in your command line even if you treat it
>>> as global. It is essentially hard to bypass netdevice layer since
>>> netdevice is the core of L2 and also where everything begins.
>>>
>>> Maybe the best we can do here is make tc filters standalone
>>> as tc actions so that filters can exist before qdisc's and netdevices.
>>> But this probably requires significant works to make it working
>>> with both existing non-standalone and bindings standalones
>>> with qdisc's.
>>
>> yes, like Nikolay says we have been discussing this as well. Nikolay's
>> patch is a cleaver and most importantly non-invasive
>> way today given the anchor point for tc rules is a netdev. we have
>> also considered a separate implicit tc anchor device.
> 
> Seems ugly just as well. :( Hmm, why not just having the two list
> pointers (ingress, egress list) in the netns struct and when
> something configures them to be effectively non-zero, then devices
> in that netns could automatically get a clsact and inherit the
> lists from there such that sch_handle_ingress() and sch_handle_egress()
> require exactly zero changes in fast-path. You could then go and
> say that either you would make changes to clsact for individual
> devices immutable when they use the 'shared' list pointers, or then
> duplicate the configs when being altered from the global one. Would
> push the complexity to control path only at least. Just a brief
> thought.

Sure, this is a nice refinement, if we decide to continue with the idea of a global
clsact filter I'll push it all to the control path.

Thanks,
 Nik
Roopa Prabhu Sept. 6, 2017, 2:19 p.m. UTC | #14
On Wed, Sep 6, 2017 at 12:24 AM, Jiri Pirko <jiri@resnulli.us> wrote:
> Wed, Sep 06, 2017 at 06:04:17AM CEST, roopa@cumulusnetworks.com wrote:
>>On Tue, Sep 5, 2017 at 3:45 PM, Daniel Borkmann <daniel@iogearbox.net> wrote:
>>> On 09/06/2017 12:01 AM, Roopa Prabhu wrote:
>>>>
>>>> On Tue, Sep 5, 2017 at 11:18 AM, Cong Wang <xiyou.wangcong@gmail.com>
>>>> wrote:
>>>>>
>>>>> On Tue, Sep 5, 2017 at 5:48 AM, Nikolay Aleksandrov
>>>>> <nikolay@cumulusnetworks.com> wrote:
>>>>>>
>>>>>> Hi all,
>>>>>> This RFC adds a new mode for clsact which designates a device's egress
>>>>>> classifier as global per netns. The packets that are not classified for
>>>>>> a particular device will be classified using the global classifier.
>>>>>> We have needed a global classifier for some time now for various
>>>>>> purposes and setting the single bridge or loopback/vrf device as the
>>>
>>>
>>> Can you elaborate a bit more on the ... "we have needed a global
>>> classifier for some time now for various purposes".
>>
>>Most of our acl's are global or use a wildcard. eg iptables supports
>>global rules without an dev. We do end up having hundreds of netdevs.
>>Another use case for the future is use of tc for policy based routing
>>which requires global rules.
>
> That is not how TC works. There are devices, qdiscs, blocks, chains. The
> global approach does not fit. The block sharing gets you what you need,
> without need for any ugly hack.

We know how TC works. And we are only trying to see if the classifiers
and actions can be used
in other use-cases. Note that the patch was an RFC to start the
discussion and refine
it. We will explore your block sharing patches when they are in.
diff mbox series

Patch

diff --git a/include/linux/rtnetlink.h b/include/linux/rtnetlink.h
index dea59c8eec54..cb97973b8555 100644
--- a/include/linux/rtnetlink.h
+++ b/include/linux/rtnetlink.h
@@ -88,6 +88,7 @@  void net_dec_ingress_queue(void);
 #ifdef CONFIG_NET_EGRESS
 void net_inc_egress_queue(void);
 void net_dec_egress_queue(void);
+int net_set_global_egress_cls_dev(struct net *net, struct net_device *dev);
 #endif
 
 void rtnetlink_init(void);
diff --git a/include/net/net_namespace.h b/include/net/net_namespace.h
index 57faa375eab9..3fef53dfdbfc 100644
--- a/include/net/net_namespace.h
+++ b/include/net/net_namespace.h
@@ -149,6 +149,9 @@  struct net {
 #endif
 	struct sock		*diag_nlsk;
 	atomic_t		fnhe_genid;
+#ifdef CONFIG_NET_EGRESS
+	struct net_device __rcu *global_egress_dev;
+#endif
 } __randomize_layout;
 
 #include <linux/seq_file_net.h>
diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index 135f5a2dd931..a37c37062446 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -69,6 +69,7 @@  struct Qdisc {
 				      * qdisc_tree_decrease_qlen() should stop.
 				      */
 #define TCQ_F_INVISIBLE		0x80 /* invisible by default in dump */
+#define TCQ_F_GLOBAL		0x100 /* device is used as global clsact dev */
 	u32			limit;
 	const struct Qdisc_ops	*ops;
 	struct qdisc_size_table	__rcu *stab;
diff --git a/net/core/dev.c b/net/core/dev.c
index 6f845e4fec17..cf883b2470a5 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -1718,6 +1718,31 @@  void net_dec_egress_queue(void)
 	static_key_slow_dec(&egress_needed);
 }
 EXPORT_SYMBOL_GPL(net_dec_egress_queue);
+
+int net_set_global_egress_cls_dev(struct net *net, struct net_device *dev)
+{
+	struct net_device *cur_dev;
+
+	ASSERT_RTNL();
+
+	cur_dev = rtnl_dereference(net->global_egress_dev);
+	if (dev) {
+		/* replace not allowed */
+		if (cur_dev)
+			return -EBUSY;
+		/* global cls devices should not change netns */
+		if (!(dev->features & NETIF_F_NETNS_LOCAL))
+			return -EINVAL;
+	}
+
+	/* set or clear based on dev */
+	rcu_assign_pointer(net->global_egress_dev, dev);
+	if (!dev)
+		synchronize_rcu_bh();
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(net_set_global_egress_cls_dev);
 #endif
 
 static struct static_key netstamp_needed __read_mostly;
@@ -3244,8 +3269,15 @@  sch_handle_egress(struct sk_buff *skb, int *ret, struct net_device *dev)
 	struct tcf_proto *cl = rcu_dereference_bh(dev->egress_cl_list);
 	struct tcf_result cl_res;
 
-	if (!cl)
-		return skb;
+	if (!cl) {
+		struct net_device *gdev;
+
+		gdev = rcu_dereference_bh(dev_net(dev)->global_egress_dev);
+		if (gdev)
+			cl = rcu_dereference_bh(gdev->egress_cl_list);
+		if (!cl)
+			return skb;
+	}
 
 	/* qdisc_skb_cb(skb)->pkt_len was already set by the caller. */
 	qdisc_bstats_cpu_update(cl->q, skb);
diff --git a/net/sched/sch_ingress.c b/net/sched/sch_ingress.c
index 44de4ee51ce9..a4871f138904 100644
--- a/net/sched/sch_ingress.c
+++ b/net/sched/sch_ingress.c
@@ -153,6 +153,9 @@  static int clsact_init(struct Qdisc *sch, struct nlattr *opt)
 	struct net_device *dev = qdisc_dev(sch);
 	int err;
 
+	net_inc_ingress_queue();
+	net_inc_egress_queue();
+
 	err = tcf_block_get(&q->ingress_block, &dev->ingress_cl_list);
 	if (err)
 		return err;
@@ -161,8 +164,12 @@  static int clsact_init(struct Qdisc *sch, struct nlattr *opt)
 	if (err)
 		return err;
 
-	net_inc_ingress_queue();
-	net_inc_egress_queue();
+	if (opt) {
+		err = net_set_global_egress_cls_dev(dev_net(dev), dev);
+		if (err)
+			return err;
+		sch->flags |= TCQ_F_GLOBAL;
+	}
 
 	sch->flags |= TCQ_F_CPUSTATS;
 
@@ -172,6 +179,10 @@  static int clsact_init(struct Qdisc *sch, struct nlattr *opt)
 static void clsact_destroy(struct Qdisc *sch)
 {
 	struct clsact_sched_data *q = qdisc_priv(sch);
+	struct net_device *dev = qdisc_dev(sch);
+
+	if (sch->flags & TCQ_F_GLOBAL)
+		WARN_ON_ONCE(net_set_global_egress_cls_dev(dev_net(dev), NULL));
 
 	tcf_block_put(q->egress_block);
 	tcf_block_put(q->ingress_block);