Message ID | 1504615701-20912-1-git-send-email-nikolay@cumulusnetworks.com |
---|---|
State | RFC, archived |
Delegated to: | David Miller |
Headers | show |
Series | [RFC,net-next] net: sch_clsact: add support for global per-netns classifier mode | expand |
Tue, Sep 05, 2017 at 02:48:21PM CEST, nikolay@cumulusnetworks.com wrote: >Hi all, >This RFC adds a new mode for clsact which designates a device's egress >classifier as global per netns. The packets that are not classified for >a particular device will be classified using the global classifier. >We have needed a global classifier for some time now for various >purposes and setting the single bridge or loopback/vrf device as the >global classifier device is acceptable for us. Doing it this way avoids >the act/cls device and queue dependencies. > >This is strictly an RFC patch just to show the intent, if we agree on >the details the proposed patch will have support for both ingress and >egress, and will be using a static key to avoid the fast path test when no >global classifier has been configured. > >Example (need a modified tc that adds TCA_OPTIONS when using q_clsact): >$ tc qdisc add dev lo clsact global >$ tc filter add dev lo egress protocol ip u32 match ip dst 4.3.2.1/32 action drop > >the last filter will be global for all devices that don't have a >specific egress_cl_list (i.e. have clsact configured). > >Any comments and thoughts would be greatly appreciated. Did you see my shared blocks work? I believe that it should resolve your usecase, in a generic way. You just have to bind the devices you need to the shared block. Please see the RFC: https://www.spinics.net/lists/netdev/msg444067.html
Tue, Sep 05, 2017 at 04:07:51PM CEST, jiri@resnulli.us wrote: >Tue, Sep 05, 2017 at 02:48:21PM CEST, nikolay@cumulusnetworks.com wrote: >>Hi all, >>This RFC adds a new mode for clsact which designates a device's egress >>classifier as global per netns. The packets that are not classified for >>a particular device will be classified using the global classifier. >>We have needed a global classifier for some time now for various >>purposes and setting the single bridge or loopback/vrf device as the >>global classifier device is acceptable for us. Doing it this way avoids >>the act/cls device and queue dependencies. >> >>This is strictly an RFC patch just to show the intent, if we agree on >>the details the proposed patch will have support for both ingress and >>egress, and will be using a static key to avoid the fast path test when no >>global classifier has been configured. >> >>Example (need a modified tc that adds TCA_OPTIONS when using q_clsact): >>$ tc qdisc add dev lo clsact global >>$ tc filter add dev lo egress protocol ip u32 match ip dst 4.3.2.1/32 action drop >> >>the last filter will be global for all devices that don't have a >>specific egress_cl_list (i.e. have clsact configured). >> >>Any comments and thoughts would be greatly appreciated. For the record, I think this "global" thing is a hack similar to cls_u32 shared hashlists. > >Did you see my shared blocks work? I believe that it should resolve your >usecase, in a generic way. You just have to bind the devices you need to >the shared block. Please see the RFC: > >https://www.spinics.net/lists/netdev/msg444067.html
On Tue, Sep 5, 2017 at 7:07 AM, Jiri Pirko <jiri@resnulli.us> wrote: > Tue, Sep 05, 2017 at 02:48:21PM CEST, nikolay@cumulusnetworks.com wrote: >>Hi all, >>This RFC adds a new mode for clsact which designates a device's egress >>classifier as global per netns. The packets that are not classified for >>a particular device will be classified using the global classifier. >>We have needed a global classifier for some time now for various >>purposes and setting the single bridge or loopback/vrf device as the >>global classifier device is acceptable for us. Doing it this way avoids >>the act/cls device and queue dependencies. >> >>This is strictly an RFC patch just to show the intent, if we agree on >>the details the proposed patch will have support for both ingress and >>egress, and will be using a static key to avoid the fast path test when no >>global classifier has been configured. >> >>Example (need a modified tc that adds TCA_OPTIONS when using q_clsact): >>$ tc qdisc add dev lo clsact global >>$ tc filter add dev lo egress protocol ip u32 match ip dst 4.3.2.1/32 action drop >> >>the last filter will be global for all devices that don't have a >>specific egress_cl_list (i.e. have clsact configured). >> >>Any comments and thoughts would be greatly appreciated. > > Did you see my shared blocks work? I believe that it should resolve your > usecase, in a generic way. You just have to bind the devices you need to > the shared block. Please see the RFC: > > https://www.spinics.net/lists/netdev/msg444067.html Jiri, yes, we have seen this series. This still requires one to make the association between dev and tc shared block..and the rules are associated with every device. Your work will help the case and is needed for tc in general and can co-exist. It takes us closer but is still not a way to create global tc rules. imagine thousands of netdevs. We would ideally like the show to also display a single set of rules. Given tc has a rich set of classifiers and actions (and very extensible!), we are trying to see if those can be easily applied globally than being tied to a device. Maybe there are other better ways to achieve this...this thread is to start that discussion. I think solving this once will help the scale issue for your hardware offload case as well.
On Tue, Sep 5, 2017 at 5:48 AM, Nikolay Aleksandrov <nikolay@cumulusnetworks.com> wrote: > Hi all, > This RFC adds a new mode for clsact which designates a device's egress > classifier as global per netns. The packets that are not classified for > a particular device will be classified using the global classifier. > We have needed a global classifier for some time now for various > purposes and setting the single bridge or loopback/vrf device as the > global classifier device is acceptable for us. Doing it this way avoids > the act/cls device and queue dependencies. > > This is strictly an RFC patch just to show the intent, if we agree on > the details the proposed patch will have support for both ingress and > egress, and will be using a static key to avoid the fast path test when no > global classifier has been configured. > > Example (need a modified tc that adds TCA_OPTIONS when using q_clsact): > $ tc qdisc add dev lo clsact global > $ tc filter add dev lo egress protocol ip u32 match ip dst 4.3.2.1/32 action drop > > the last filter will be global for all devices that don't have a > specific egress_cl_list (i.e. have clsact configured). Sorry this is too ugly. netdevice is still implied in your command line even if you treat it as global. It is essentially hard to bypass netdevice layer since netdevice is the core of L2 and also where everything begins. Maybe the best we can do here is make tc filters standalone as tc actions so that filters can exist before qdisc's and netdevices. But this probably requires significant works to make it working with both existing non-standalone and bindings standalones with qdisc's. Note I don't look deeply into this, just one thought, at least this appears less ugly than yours.
On 9/5/17 9:18 PM, Cong Wang wrote: > On Tue, Sep 5, 2017 at 5:48 AM, Nikolay Aleksandrov > <nikolay@cumulusnetworks.com> wrote: >> Hi all, >> This RFC adds a new mode for clsact which designates a device's egress >> classifier as global per netns. The packets that are not classified for >> a particular device will be classified using the global classifier. >> We have needed a global classifier for some time now for various >> purposes and setting the single bridge or loopback/vrf device as the >> global classifier device is acceptable for us. Doing it this way avoids >> the act/cls device and queue dependencies. >> >> This is strictly an RFC patch just to show the intent, if we agree on >> the details the proposed patch will have support for both ingress and >> egress, and will be using a static key to avoid the fast path test when no >> global classifier has been configured. >> >> Example (need a modified tc that adds TCA_OPTIONS when using q_clsact): >> $ tc qdisc add dev lo clsact global >> $ tc filter add dev lo egress protocol ip u32 match ip dst 4.3.2.1/32 action drop >> >> the last filter will be global for all devices that don't have a >> specific egress_cl_list (i.e. have clsact configured). > > Sorry this is too ugly > > netdevice is still implied in your command line even if you treat it > as global. It is essentially hard to bypass netdevice layer since > netdevice is the core of L2 and also where everything begins. > This is only a quick RFC, that can be removed entirely if we limit it to the netns and its loopback device. Then we can drop the "device" keyword altogether. > Maybe the best we can do here is make tc filters standalone > as tc actions so that filters can exist before qdisc's and netdevices. > But this probably requires significant works to make it working > with both existing non-standalone and bindings standalones > with qdisc's. > We've actually been discussing this option internally as well. I think we'll look into doing that regardless of this patch. > Note I don't look deeply into this, just one thought, at least this > appears less ugly than yours. > What I did was aimed at simplicity and is merely a mode of clsact which doesn't have an impact if not configured. Every other solution requires a much more invasive change, note that doesn't mean I don't agree. :-)
On Tue, Sep 5, 2017 at 11:18 AM, Cong Wang <xiyou.wangcong@gmail.com> wrote: > On Tue, Sep 5, 2017 at 5:48 AM, Nikolay Aleksandrov > <nikolay@cumulusnetworks.com> wrote: >> Hi all, >> This RFC adds a new mode for clsact which designates a device's egress >> classifier as global per netns. The packets that are not classified for >> a particular device will be classified using the global classifier. >> We have needed a global classifier for some time now for various >> purposes and setting the single bridge or loopback/vrf device as the >> global classifier device is acceptable for us. Doing it this way avoids >> the act/cls device and queue dependencies. >> >> This is strictly an RFC patch just to show the intent, if we agree on >> the details the proposed patch will have support for both ingress and >> egress, and will be using a static key to avoid the fast path test when no >> global classifier has been configured. >> >> Example (need a modified tc that adds TCA_OPTIONS when using q_clsact): >> $ tc qdisc add dev lo clsact global >> $ tc filter add dev lo egress protocol ip u32 match ip dst 4.3.2.1/32 action drop >> >> the last filter will be global for all devices that don't have a >> specific egress_cl_list (i.e. have clsact configured). > > Sorry this is too ugly. > > netdevice is still implied in your command line even if you treat it > as global. It is essentially hard to bypass netdevice layer since > netdevice is the core of L2 and also where everything begins. > > Maybe the best we can do here is make tc filters standalone > as tc actions so that filters can exist before qdisc's and netdevices. > But this probably requires significant works to make it working > with both existing non-standalone and bindings standalones > with qdisc's. yes, like Nikolay says we have been discussing this as well. Nikolay's patch is a cleaver and most importantly non-invasive way today given the anchor point for tc rules is a netdev. we have also considered a separate implicit tc anchor device. lo seemed like a good fit for all global rules given it is already available. And it is not uncommon to hang off global network config on the loopback interface.
On 17-09-05 06:01 PM, Roopa Prabhu wrote: > > yes, like Nikolay says we have been discussing this as well. Nikolay's > patch is a cleaver and most importantly non-invasive > way today given the anchor point for tc rules is a netdev. we have > also considered a separate implicit tc anchor device. > lo seemed like a good fit for all global rules given it is already > available. And it is not uncommon to hang off global > network config on the loopback interface. > IMO, Jiri has done all the necessary work already with the concept of blocks. We dont really need the netdev to be the attachment point. You can add to a block in many locations in the kernel by constructing the proper "coordinates" in the tcmsg. i.e this: tcmsg { unsigned char tcm_family; unsigned char tcm__pad1; unsigned short tcm__pad2; int tcm_ifindex; __u32 tcm_handle; __u32 tcm_parent; } If you were to set tcm_ifindex to -1 (since that is not a legit ifindex) then all we need to do is define a parent for a different location. Current locations tied to netdevs are: ----- #define TC_H_ROOT (0xFFFFFFFFU) #define TC_H_INGRESS (0xFFFFFFF1U) #define TC_H_CLSACT TC_H_INGRESS #define TC_H_MIN_INGRESS 0xFFF2U #define TC_H_MIN_EGRESS 0xFFF3U ----- You should be able to say add a location which maps to a pre-routing or post-routing etc; and this would work as well... cheers, jamal
On 09/06/2017 12:01 AM, Roopa Prabhu wrote: > On Tue, Sep 5, 2017 at 11:18 AM, Cong Wang <xiyou.wangcong@gmail.com> wrote: >> On Tue, Sep 5, 2017 at 5:48 AM, Nikolay Aleksandrov >> <nikolay@cumulusnetworks.com> wrote: >>> Hi all, >>> This RFC adds a new mode for clsact which designates a device's egress >>> classifier as global per netns. The packets that are not classified for >>> a particular device will be classified using the global classifier. >>> We have needed a global classifier for some time now for various >>> purposes and setting the single bridge or loopback/vrf device as the Can you elaborate a bit more on the ... "we have needed a global classifier for some time now for various purposes". >>> global classifier device is acceptable for us. Doing it this way avoids >>> the act/cls device and queue dependencies. >>> >>> This is strictly an RFC patch just to show the intent, if we agree on >>> the details the proposed patch will have support for both ingress and >>> egress, and will be using a static key to avoid the fast path test when no >>> global classifier has been configured. >>> >>> Example (need a modified tc that adds TCA_OPTIONS when using q_clsact): >>> $ tc qdisc add dev lo clsact global >>> $ tc filter add dev lo egress protocol ip u32 match ip dst 4.3.2.1/32 action drop >>> >>> the last filter will be global for all devices that don't have a >>> specific egress_cl_list (i.e. have clsact configured). >> >> Sorry this is too ugly. +1 >> netdevice is still implied in your command line even if you treat it >> as global. It is essentially hard to bypass netdevice layer since >> netdevice is the core of L2 and also where everything begins. >> >> Maybe the best we can do here is make tc filters standalone >> as tc actions so that filters can exist before qdisc's and netdevices. >> But this probably requires significant works to make it working >> with both existing non-standalone and bindings standalones >> with qdisc's. > > yes, like Nikolay says we have been discussing this as well. Nikolay's > patch is a cleaver and most importantly non-invasive > way today given the anchor point for tc rules is a netdev. we have > also considered a separate implicit tc anchor device. Seems ugly just as well. :( Hmm, why not just having the two list pointers (ingress, egress list) in the netns struct and when something configures them to be effectively non-zero, then devices in that netns could automatically get a clsact and inherit the lists from there such that sch_handle_ingress() and sch_handle_egress() require exactly zero changes in fast-path. You could then go and say that either you would make changes to clsact for individual devices immutable when they use the 'shared' list pointers, or then duplicate the configs when being altered from the global one. Would push the complexity to control path only at least. Just a brief thought.
On 09/06/2017 12:45 AM, Daniel Borkmann wrote: > On 09/06/2017 12:01 AM, Roopa Prabhu wrote: >> On Tue, Sep 5, 2017 at 11:18 AM, Cong Wang <xiyou.wangcong@gmail.com> wrote: >>> On Tue, Sep 5, 2017 at 5:48 AM, Nikolay Aleksandrov >>> <nikolay@cumulusnetworks.com> wrote: >>>> Hi all, >>>> This RFC adds a new mode for clsact which designates a device's egress >>>> classifier as global per netns. The packets that are not classified for >>>> a particular device will be classified using the global classifier. >>>> We have needed a global classifier for some time now for various >>>> purposes and setting the single bridge or loopback/vrf device as the > > Can you elaborate a bit more on the ... "we have needed a global > classifier for some time now for various purposes". > >>>> global classifier device is acceptable for us. Doing it this way avoids >>>> the act/cls device and queue dependencies. >>>> >>>> This is strictly an RFC patch just to show the intent, if we agree on >>>> the details the proposed patch will have support for both ingress and >>>> egress, and will be using a static key to avoid the fast path test when no >>>> global classifier has been configured. >>>> >>>> Example (need a modified tc that adds TCA_OPTIONS when using q_clsact): >>>> $ tc qdisc add dev lo clsact global >>>> $ tc filter add dev lo egress protocol ip u32 match ip dst 4.3.2.1/32 action drop >>>> >>>> the last filter will be global for all devices that don't have a >>>> specific egress_cl_list (i.e. have clsact configured). >>> >>> Sorry this is too ugly. > > +1 > >>> netdevice is still implied in your command line even if you treat it >>> as global. It is essentially hard to bypass netdevice layer since >>> netdevice is the core of L2 and also where everything begins. One could probably use special wildcard device name, e.g. tcpdump allows for 'tcpdump -i any' to capture on all devices, so you could indicate something like 'tc filter add dev any blah' to have similar semantics in the realm of the netns w/o making it look too hacky for tc users perhaps. >>> Maybe the best we can do here is make tc filters standalone >>> as tc actions so that filters can exist before qdisc's and netdevices. >>> But this probably requires significant works to make it working >>> with both existing non-standalone and bindings standalones >>> with qdisc's. >> >> yes, like Nikolay says we have been discussing this as well. Nikolay's >> patch is a cleaver and most importantly non-invasive >> way today given the anchor point for tc rules is a netdev. we have >> also considered a separate implicit tc anchor device. > > Seems ugly just as well. :( Hmm, why not just having the two list > pointers (ingress, egress list) in the netns struct and when > something configures them to be effectively non-zero, then devices > in that netns could automatically get a clsact and inherit the > lists from there such that sch_handle_ingress() and sch_handle_egress() > require exactly zero changes in fast-path. You could then go and > say that either you would make changes to clsact for individual > devices immutable when they use the 'shared' list pointers, or then > duplicate the configs when being altered from the global one. Would > push the complexity to control path only at least. Just a brief > thought.
On Tue, Sep 5, 2017 at 3:45 PM, Daniel Borkmann <daniel@iogearbox.net> wrote: > On 09/06/2017 12:01 AM, Roopa Prabhu wrote: >> >> On Tue, Sep 5, 2017 at 11:18 AM, Cong Wang <xiyou.wangcong@gmail.com> >> wrote: >>> >>> On Tue, Sep 5, 2017 at 5:48 AM, Nikolay Aleksandrov >>> <nikolay@cumulusnetworks.com> wrote: >>>> >>>> Hi all, >>>> This RFC adds a new mode for clsact which designates a device's egress >>>> classifier as global per netns. The packets that are not classified for >>>> a particular device will be classified using the global classifier. >>>> We have needed a global classifier for some time now for various >>>> purposes and setting the single bridge or loopback/vrf device as the > > > Can you elaborate a bit more on the ... "we have needed a global > classifier for some time now for various purposes". Most of our acl's are global or use a wildcard. eg iptables supports global rules without an dev. We do end up having hundreds of netdevs. Another use case for the future is use of tc for policy based routing which requires global rules.
On Tue, Sep 5, 2017 at 3:25 PM, Jamal Hadi Salim <jhs@mojatatu.com> wrote: > On 17-09-05 06:01 PM, Roopa Prabhu wrote: > >> >> yes, like Nikolay says we have been discussing this as well. Nikolay's >> patch is a cleaver and most importantly non-invasive >> way today given the anchor point for tc rules is a netdev. we have >> also considered a separate implicit tc anchor device. >> lo seemed like a good fit for all global rules given it is already >> available. And it is not uncommon to hang off global >> network config on the loopback interface. >> > > IMO, Jiri has done all the necessary work already with the concept of > blocks. We dont really need the netdev to be the attachment point. > > You can add to a block in many locations in the kernel by > constructing the proper "coordinates" in the tcmsg. > i.e this: > tcmsg { > unsigned char tcm_family; > unsigned char tcm__pad1; > unsigned short tcm__pad2; > int tcm_ifindex; > __u32 tcm_handle; > __u32 tcm_parent; > } > > If you were to set tcm_ifindex to -1 (since that is not a legit > ifindex) then all we need to do is define a parent for a > different location. Current locations tied to netdevs are: > ----- > #define TC_H_ROOT (0xFFFFFFFFU) > #define TC_H_INGRESS (0xFFFFFFF1U) > #define TC_H_CLSACT TC_H_INGRESS > > #define TC_H_MIN_INGRESS 0xFFF2U > #define TC_H_MIN_EGRESS 0xFFF3U > ----- > > You should be able to say add a location which maps to a pre-routing > or post-routing etc; and this would work as well... > ok, interesting. Jiri's examples still showed netdev as attachment point. we will explore some more. thanks.
Wed, Sep 06, 2017 at 06:04:17AM CEST, roopa@cumulusnetworks.com wrote: >On Tue, Sep 5, 2017 at 3:45 PM, Daniel Borkmann <daniel@iogearbox.net> wrote: >> On 09/06/2017 12:01 AM, Roopa Prabhu wrote: >>> >>> On Tue, Sep 5, 2017 at 11:18 AM, Cong Wang <xiyou.wangcong@gmail.com> >>> wrote: >>>> >>>> On Tue, Sep 5, 2017 at 5:48 AM, Nikolay Aleksandrov >>>> <nikolay@cumulusnetworks.com> wrote: >>>>> >>>>> Hi all, >>>>> This RFC adds a new mode for clsact which designates a device's egress >>>>> classifier as global per netns. The packets that are not classified for >>>>> a particular device will be classified using the global classifier. >>>>> We have needed a global classifier for some time now for various >>>>> purposes and setting the single bridge or loopback/vrf device as the >> >> >> Can you elaborate a bit more on the ... "we have needed a global >> classifier for some time now for various purposes". > >Most of our acl's are global or use a wildcard. eg iptables supports >global rules without an dev. We do end up having hundreds of netdevs. >Another use case for the future is use of tc for policy based routing >which requires global rules. That is not how TC works. There are devices, qdiscs, blocks, chains. The global approach does not fit. The block sharing gets you what you need, without need for any ugly hack.
On 06/09/17 01:45, Daniel Borkmann wrote: > On 09/06/2017 12:01 AM, Roopa Prabhu wrote: >> On Tue, Sep 5, 2017 at 11:18 AM, Cong Wang <xiyou.wangcong@gmail.com> wrote: >>> On Tue, Sep 5, 2017 at 5:48 AM, Nikolay Aleksandrov >>> <nikolay@cumulusnetworks.com> wrote: >>>> Hi all, >>>> This RFC adds a new mode for clsact which designates a device's egress >>>> classifier as global per netns. The packets that are not classified for >>>> a particular device will be classified using the global classifier. >>>> We have needed a global classifier for some time now for various >>>> purposes and setting the single bridge or loopback/vrf device as the > > Can you elaborate a bit more on the ... "we have needed a global > classifier for some time now for various purposes". > >>>> global classifier device is acceptable for us. Doing it this way avoids >>>> the act/cls device and queue dependencies. >>>> >>>> This is strictly an RFC patch just to show the intent, if we agree on >>>> the details the proposed patch will have support for both ingress and >>>> egress, and will be using a static key to avoid the fast path test when no >>>> global classifier has been configured. >>>> >>>> Example (need a modified tc that adds TCA_OPTIONS when using q_clsact): >>>> $ tc qdisc add dev lo clsact global >>>> $ tc filter add dev lo egress protocol ip u32 match ip dst 4.3.2.1/32 action drop >>>> >>>> the last filter will be global for all devices that don't have a >>>> specific egress_cl_list (i.e. have clsact configured). >>> >>> Sorry this is too ugly. > > +1 > >>> netdevice is still implied in your command line even if you treat it >>> as global. It is essentially hard to bypass netdevice layer since >>> netdevice is the core of L2 and also where everything begins. >>> >>> Maybe the best we can do here is make tc filters standalone >>> as tc actions so that filters can exist before qdisc's and netdevices. >>> But this probably requires significant works to make it working >>> with both existing non-standalone and bindings standalones >>> with qdisc's. >> >> yes, like Nikolay says we have been discussing this as well. Nikolay's >> patch is a cleaver and most importantly non-invasive >> way today given the anchor point for tc rules is a netdev. we have >> also considered a separate implicit tc anchor device. > > Seems ugly just as well. :( Hmm, why not just having the two list > pointers (ingress, egress list) in the netns struct and when > something configures them to be effectively non-zero, then devices > in that netns could automatically get a clsact and inherit the > lists from there such that sch_handle_ingress() and sch_handle_egress() > require exactly zero changes in fast-path. You could then go and > say that either you would make changes to clsact for individual > devices immutable when they use the 'shared' list pointers, or then > duplicate the configs when being altered from the global one. Would > push the complexity to control path only at least. Just a brief > thought. Sure, this is a nice refinement, if we decide to continue with the idea of a global clsact filter I'll push it all to the control path. Thanks, Nik
On Wed, Sep 6, 2017 at 12:24 AM, Jiri Pirko <jiri@resnulli.us> wrote: > Wed, Sep 06, 2017 at 06:04:17AM CEST, roopa@cumulusnetworks.com wrote: >>On Tue, Sep 5, 2017 at 3:45 PM, Daniel Borkmann <daniel@iogearbox.net> wrote: >>> On 09/06/2017 12:01 AM, Roopa Prabhu wrote: >>>> >>>> On Tue, Sep 5, 2017 at 11:18 AM, Cong Wang <xiyou.wangcong@gmail.com> >>>> wrote: >>>>> >>>>> On Tue, Sep 5, 2017 at 5:48 AM, Nikolay Aleksandrov >>>>> <nikolay@cumulusnetworks.com> wrote: >>>>>> >>>>>> Hi all, >>>>>> This RFC adds a new mode for clsact which designates a device's egress >>>>>> classifier as global per netns. The packets that are not classified for >>>>>> a particular device will be classified using the global classifier. >>>>>> We have needed a global classifier for some time now for various >>>>>> purposes and setting the single bridge or loopback/vrf device as the >>> >>> >>> Can you elaborate a bit more on the ... "we have needed a global >>> classifier for some time now for various purposes". >> >>Most of our acl's are global or use a wildcard. eg iptables supports >>global rules without an dev. We do end up having hundreds of netdevs. >>Another use case for the future is use of tc for policy based routing >>which requires global rules. > > That is not how TC works. There are devices, qdiscs, blocks, chains. The > global approach does not fit. The block sharing gets you what you need, > without need for any ugly hack. We know how TC works. And we are only trying to see if the classifiers and actions can be used in other use-cases. Note that the patch was an RFC to start the discussion and refine it. We will explore your block sharing patches when they are in.
diff --git a/include/linux/rtnetlink.h b/include/linux/rtnetlink.h index dea59c8eec54..cb97973b8555 100644 --- a/include/linux/rtnetlink.h +++ b/include/linux/rtnetlink.h @@ -88,6 +88,7 @@ void net_dec_ingress_queue(void); #ifdef CONFIG_NET_EGRESS void net_inc_egress_queue(void); void net_dec_egress_queue(void); +int net_set_global_egress_cls_dev(struct net *net, struct net_device *dev); #endif void rtnetlink_init(void); diff --git a/include/net/net_namespace.h b/include/net/net_namespace.h index 57faa375eab9..3fef53dfdbfc 100644 --- a/include/net/net_namespace.h +++ b/include/net/net_namespace.h @@ -149,6 +149,9 @@ struct net { #endif struct sock *diag_nlsk; atomic_t fnhe_genid; +#ifdef CONFIG_NET_EGRESS + struct net_device __rcu *global_egress_dev; +#endif } __randomize_layout; #include <linux/seq_file_net.h> diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h index 135f5a2dd931..a37c37062446 100644 --- a/include/net/sch_generic.h +++ b/include/net/sch_generic.h @@ -69,6 +69,7 @@ struct Qdisc { * qdisc_tree_decrease_qlen() should stop. */ #define TCQ_F_INVISIBLE 0x80 /* invisible by default in dump */ +#define TCQ_F_GLOBAL 0x100 /* device is used as global clsact dev */ u32 limit; const struct Qdisc_ops *ops; struct qdisc_size_table __rcu *stab; diff --git a/net/core/dev.c b/net/core/dev.c index 6f845e4fec17..cf883b2470a5 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -1718,6 +1718,31 @@ void net_dec_egress_queue(void) static_key_slow_dec(&egress_needed); } EXPORT_SYMBOL_GPL(net_dec_egress_queue); + +int net_set_global_egress_cls_dev(struct net *net, struct net_device *dev) +{ + struct net_device *cur_dev; + + ASSERT_RTNL(); + + cur_dev = rtnl_dereference(net->global_egress_dev); + if (dev) { + /* replace not allowed */ + if (cur_dev) + return -EBUSY; + /* global cls devices should not change netns */ + if (!(dev->features & NETIF_F_NETNS_LOCAL)) + return -EINVAL; + } + + /* set or clear based on dev */ + rcu_assign_pointer(net->global_egress_dev, dev); + if (!dev) + synchronize_rcu_bh(); + + return 0; +} +EXPORT_SYMBOL_GPL(net_set_global_egress_cls_dev); #endif static struct static_key netstamp_needed __read_mostly; @@ -3244,8 +3269,15 @@ sch_handle_egress(struct sk_buff *skb, int *ret, struct net_device *dev) struct tcf_proto *cl = rcu_dereference_bh(dev->egress_cl_list); struct tcf_result cl_res; - if (!cl) - return skb; + if (!cl) { + struct net_device *gdev; + + gdev = rcu_dereference_bh(dev_net(dev)->global_egress_dev); + if (gdev) + cl = rcu_dereference_bh(gdev->egress_cl_list); + if (!cl) + return skb; + } /* qdisc_skb_cb(skb)->pkt_len was already set by the caller. */ qdisc_bstats_cpu_update(cl->q, skb); diff --git a/net/sched/sch_ingress.c b/net/sched/sch_ingress.c index 44de4ee51ce9..a4871f138904 100644 --- a/net/sched/sch_ingress.c +++ b/net/sched/sch_ingress.c @@ -153,6 +153,9 @@ static int clsact_init(struct Qdisc *sch, struct nlattr *opt) struct net_device *dev = qdisc_dev(sch); int err; + net_inc_ingress_queue(); + net_inc_egress_queue(); + err = tcf_block_get(&q->ingress_block, &dev->ingress_cl_list); if (err) return err; @@ -161,8 +164,12 @@ static int clsact_init(struct Qdisc *sch, struct nlattr *opt) if (err) return err; - net_inc_ingress_queue(); - net_inc_egress_queue(); + if (opt) { + err = net_set_global_egress_cls_dev(dev_net(dev), dev); + if (err) + return err; + sch->flags |= TCQ_F_GLOBAL; + } sch->flags |= TCQ_F_CPUSTATS; @@ -172,6 +179,10 @@ static int clsact_init(struct Qdisc *sch, struct nlattr *opt) static void clsact_destroy(struct Qdisc *sch) { struct clsact_sched_data *q = qdisc_priv(sch); + struct net_device *dev = qdisc_dev(sch); + + if (sch->flags & TCQ_F_GLOBAL) + WARN_ON_ONCE(net_set_global_egress_cls_dev(dev_net(dev), NULL)); tcf_block_put(q->egress_block); tcf_block_put(q->ingress_block);
Hi all, This RFC adds a new mode for clsact which designates a device's egress classifier as global per netns. The packets that are not classified for a particular device will be classified using the global classifier. We have needed a global classifier for some time now for various purposes and setting the single bridge or loopback/vrf device as the global classifier device is acceptable for us. Doing it this way avoids the act/cls device and queue dependencies. This is strictly an RFC patch just to show the intent, if we agree on the details the proposed patch will have support for both ingress and egress, and will be using a static key to avoid the fast path test when no global classifier has been configured. Example (need a modified tc that adds TCA_OPTIONS when using q_clsact): $ tc qdisc add dev lo clsact global $ tc filter add dev lo egress protocol ip u32 match ip dst 4.3.2.1/32 action drop the last filter will be global for all devices that don't have a specific egress_cl_list (i.e. have clsact configured). Any comments and thoughts would be greatly appreciated. Thanks! Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> --- include/linux/rtnetlink.h | 1 + include/net/net_namespace.h | 3 +++ include/net/sch_generic.h | 1 + net/core/dev.c | 36 ++++++++++++++++++++++++++++++++++-- net/sched/sch_ingress.c | 15 +++++++++++++-- 5 files changed, 52 insertions(+), 4 deletions(-)