Message ID: 20181214181205.28812-1-pablo@netfilter.org
Series: add flow_rule infrastructure
On Fri, 14 Dec 2018 19:11:53 +0100, Pablo Neira Ayuso wrote:

> Hi,
>
> This patchset introduces a kernel intermediate representation (IR) to express ACL hardware offloads; it is heavily based on the existing flow dissector infrastructure and the TC actions. This IR can be used by different frontend ACL interfaces such as ethtool_rxnfc, tc and netfilter to represent ACL hardware offloads.
>
> The main goals of this patchset are:
>
> * Provide a unified representation that can be passed to the driver for translation to HW IR. This consolidates the code to be maintained in the driver and it also simplifies the development of ACL hardware offloads. Driver developers do not need to add one specific parser for each supported ACL frontend; instead each frontend just generates this flow_rule IR and passes it to drivers to populate the hardware IR.

Ack.

> * Do not expose TC software frontend details to the drivers anymore, such as structure layouts. Hence, if TC needs to be updated to support a new software feature, no changes to the drivers will be required.

This one you may need to clarify. TC already transforms the rules and for the most part provides abstract enough accessor helpers to protect from changes in the internal implementation of the SW path.

You could make an argument that if one subsystem already offloads something and the IR can express it, another subsystem can add the same functionality and drivers don't have to change. But why do we have multiple subsystems offloading the same thing in the first place?..

> In handcrafted ascii art, the idea is the following:
>
>     .   ethtool_rxnfc      tc        netfilter
>     |      (ioctl)      (netlink)    (netlink)
>     |         |             |            |       translate native
>  Frontend    |             |            |       interface representation
>     |         |             |            |       to flow_rule IR
>     |         |             |            |
>     .        \/            \/           \/
>     .         ----- flow_rule IR ------
>     |                       |
>  Drivers                    |       parsing of flow_rule IR
>     |                       |       to populate hardware IR
>     |                      \/
>     .               hardware IR (driver)
>
> This patchset only converts tc and ethtool_rxnfc to use this infrastructure. Therefore, this patchset comes with no netfilter changes at this stage.

Let's try to differentiate between cls_flower and TC as a subsystem.

[...]

> P.S: Moving forward, my rough plan is to propose and discuss the following changes to use this infrastructure from netfilter for ACL hardware offload:
>
> * Rename tc_setup ndo to flow_offload (or similar), so it can be used from netfilter too. Otherwise, I'm open for alternatives.

That is the part which really makes me uneasy. Ordering rules, tables and offloads in HW is already a hard question, which keeps coming back (see Jesse's talk from LPC for example). (And if you think you know the answer you probably forgot about legacy SR-IOV :))

The TC offloads themselves today are still quite flaky. I don't think anyone respects rule ordering/priorities at all. Even if we did order cls_flower's rules in HW, what if someone added a cls_basic rule before them? The SW path will no longer match the HW path. And we are about to mix another subsystem into that soup.

I don't really see how you could rename ndo_setup_tc to flow_offload. It offloads block notifications, and Qdiscs. You must realize that actual rules are offloaded via block callbacks..? Are you planning to create fake blocks for netfilter? Create yet another "give me all rules for device X" registration construct?

What level of abstraction are we really going to achieve here? The devices still need to know netfilter is a separate subsystem with its quirks and rules. Say someone adds a PASS rule to flower, and a DROP rule to netfilter. The packet has to be dropped, but the device will most likely PASS because flower's rule will hit first. n-tuple filters don't have pass, so the problem isn't obvious from this patch set.

I'm starting to like Or's idea of "expressing nft rules in cls_flower". I don't mean pretending they are flower rules just for offload, but rather having a TC classifier module that can take nft rules (in the SW path) and which would semantically fit very clearly into the existing TC framework. Mirroring SW behaviour in HW becomes easier, and we don't have to teach drivers about another subsystem.

> * Add a conversion function from native netfilter representation to flow_rule, including extra new glue code from the two-phase commit protocol to integrate this infrastructure.
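[Editor's note] For readers unfamiliar with the IR being debated above, here is a minimal userspace sketch of the idea: a rule couples flow-dissector-style value/mask matches with a list of abstract actions, so any frontend can emit the same representation for drivers to translate. All names and layouts here are illustrative simplifications, not the exact definitions from the patches:

```c
/* Hedged sketch of the flow_rule IR idea. The real IR carries many
 * dissector key types; this mock keeps just IPv4 addresses.
 * All identifiers are hypothetical, not the kernel's. */
#include <stdint.h>

enum flow_action_id {
	FLOW_ACTION_ACCEPT,
	FLOW_ACTION_DROP,
	FLOW_ACTION_REDIRECT,
};

struct flow_action_entry {
	enum flow_action_id id;
	int ifindex;			/* only used by REDIRECT */
};

struct flow_match_ipv4_addrs {
	uint32_t src, src_mask;		/* value/mask pairs, in the */
	uint32_t dst, dst_mask;		/* style of flow dissector keys */
};

struct flow_rule {
	struct flow_match_ipv4_addrs ipv4;
	struct flow_action_entry action[4];
	int num_actions;
};

/* A frontend-agnostic builder: whichever subsystem parsed the user's
 * rule (ethtool_rxnfc, cls_flower, netfilter), the driver only ever
 * sees a struct flow_rule. */
static inline void flow_rule_drop_ipv4_src(struct flow_rule *rule, uint32_t src)
{
	rule->ipv4.src = src;
	rule->ipv4.src_mask = 0xffffffffu;
	rule->action[0].id = FLOW_ACTION_DROP;
	rule->num_actions = 1;
}
```

A driver would then walk rule->action[] and the populated match keys to build its hardware representation, which is the "single parser" consolidation the cover letter argues for.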
Hi Jakub,

On Mon, Dec 17, 2018 at 05:39:29PM -0800, Jakub Kicinski wrote:

> On Fri, 14 Dec 2018 19:11:53 +0100, Pablo Neira Ayuso wrote:
> > * Provide a unified representation that can be passed to the driver for translation to HW IR. [...]
>
> Ack.

Thanks.

> > * Do not expose TC software frontend details to the drivers anymore, such as structure layouts. Hence, if TC needs to be updated to support a new software feature, no changes to the drivers will be required.
>
> This one you may need to clarify. TC already transforms the rules and for the most part provides abstract enough accessor helpers to protect from changes in the internal implementation of the SW path.

Drivers can still modify the TC structure content, they can modify frontend configurations, make tc dump nonsense values to userspace via netlink, or they may crash the software plane at a later stage. It is better if we expose configuration plane data through read-only infrastructure that is consumed by drivers.

> You could make an argument that if one subsystem already offloads something and the IR can express it, another subsystem can add the same functionality and drivers don't have to change. But why do we have multiple subsystems offloading the same thing in the first place?..

Having multiple subsystems that allow you to do the same thing is a decision that was made a long time ago. It allows multiple approaches to compete inside the kernel, and it also allows users to select the subsystem that best fits their requirements.

Anyway, the problem that this patchset addresses _already exists_ with ethtool_rxnfc and cls_flower in place; it would be good to fix it now.

> > This patchset only converts tc and ethtool_rxnfc to use this infrastructure. Therefore, this patchset comes with no netfilter changes at this stage.
>
> Let's try to differentiate between cls_flower and TC as a subsystem.

Right. cls_flower is the most capable subset of tc at this stage.

> [...]
>
> That is the part which really makes me uneasy. Ordering rules, tables and offloads in HW is already a hard question, which keeps coming back (see Jesse's talk from LPC for example). (And if you think you know the answer you probably forgot about legacy SR-IOV :))
>
> The TC offloads themselves today are still quite flaky. [...]

I'm very optimistic about what I'm seeing; patch 1/12 is rather large because we already have a _fairly good number of drivers_ using the existing upstream infrastructure. I don't see anything unfixable in the existing approach; we can incrementally extend what is already upstream to solve the existing limitations.

[...]

> I don't really see how you could rename ndo_setup_tc to flow_offload.

We can start by using a single .ndo and use the enum parameter to route the offload request to the corresponding driver handler, which deals with tc details at this stage. So we use the single .ndo as a multiplexor. By consolidating the parsers for ethtool_rxnfc and cls_flower, we are already saving redundant code in this patchset, and I can see more code in the tree that drivers could simplify.

> What level of abstraction are we really going to achieve here? The devices still need to know netfilter is a separate subsystem with its quirks and rules. Say someone adds a PASS rule to flower, and a DROP rule to netfilter. The packet has to be dropped, but the device will most likely PASS because flower's rule will hit first. n-tuple filters don't have pass, so the problem isn't obvious from this patch set.

In software, tc ingress comes before netfilter, and ethtool_rxnfc comes before both tc ingress and netfilter. Once we have a single .ndo, performing unified resource management for offloads will be easier, i.e. store ownership of rules in the driver ACL database; then, when rebuilding the ACL offload, we make sure we place ethtool rules before tc rules, and tc rules before netfilter rules. So we achieve the same semantics as in software. We cannot solve this problem until we have unified infrastructure to address it.

[...]

> I don't mean pretending they are flower rules just for offload, but rather having a TC classifier module that can take nft rules (in the SW path) and which would semantically fit very clearly into the existing TC framework. Mirroring SW behaviour in HW becomes easier, and we don't have to teach drivers about another subsystem.

Then we'll have to rewrite userspace netfilter to run on top of TC, and moreover add support for conntrack, NAT, logging... And after all that is done, for netlink users, we'll have to translate netfilter netlink messages to TC in the kernel, then route them to TC. It will take ages to rewrite netfilter on top of tc, and there will be semantic differences.

This patchset already deals with the existing diversity in the ecosystem by providing a single unified representation for the hardware, no matter which frontend is used.

Cheers,
Pablo
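[Editor's note] Pablo's "single .ndo as a multiplexor" can be sketched in a few lines. This is a hedged userspace mock with invented names (foo_*, flow_offload_type); the real ndo_setup_tc dispatches similarly on enum tc_setup_type, but this is not its actual signature:

```c
/* Userspace sketch of the "single ndo as multiplexor" idea: one driver
 * entry point switches on an enum and routes the request to the
 * per-frontend handler. All identifiers here are hypothetical. */
enum flow_offload_type {
	FLOW_SETUP_TC_BLOCK,		/* tc block bind/unbind */
	FLOW_SETUP_ETHTOOL_RXNFC,	/* ethtool n-tuple rules */
	FLOW_SETUP_NETFILTER,		/* possible future frontend */
};

#define FOO_EOPNOTSUPP 95		/* stand-in for -EOPNOTSUPP */

static int foo_setup_tc_block(void *type_data)      { (void)type_data; return 0; }
static int foo_setup_ethtool_rxnfc(void *type_data) { (void)type_data; return 0; }

/* The single .ndo: frontends differ, but the driver keeps one entry
 * point and per-type handlers, instead of one parser per subsystem. */
static int foo_setup_flow_offload(enum flow_offload_type type, void *type_data)
{
	switch (type) {
	case FLOW_SETUP_TC_BLOCK:
		return foo_setup_tc_block(type_data);
	case FLOW_SETUP_ETHTOOL_RXNFC:
		return foo_setup_ethtool_rxnfc(type_data);
	default:
		return -FOO_EOPNOTSUPP;	/* frontend not handled yet */
	}
}
```

The design point being argued: adding a new frontend means adding one more case to the switch, not teaching every driver a new structure layout.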
On Tue, 18 Dec 2018 20:57:05 +0100, Pablo Neira Ayuso wrote:

> On Mon, Dec 17, 2018 at 05:39:29PM -0800, Jakub Kicinski wrote:
> > This one you may need to clarify. TC already transforms the rules and for the most part provides abstract enough accessor helpers to protect from changes in the internal implementation of the SW path.
>
> Drivers can still modify the TC structure content, they can modify frontend configurations, make tc dump nonsense values to userspace via netlink, or they may crash the software plane at a later stage. It is better if we expose configuration plane data through read-only infrastructure that is consumed by drivers.

Sounds like inventing a problem to fit the solution :) I don't think we've even had such a bug in TC offloads.

> > You could make an argument that if one subsystem already offloads something and the IR can express it, another subsystem can add the same functionality and drivers don't have to change. But why do we have multiple subsystems offloading the same thing in the first place?..
>
> Having multiple subsystems that allow you to do the same thing is a decision that was made a long time ago. It allows multiple approaches to compete inside the kernel, and it also allows users to select the subsystem that best fits their requirements.

Perhaps we learnt something new from the few years we spent working on this stuff? I'm not sure where we stand. I believe I asked you something like "why do we need this if we already have flower" when you posted your first nf_flow patches, no? So my position hasn't changed :)

> Anyway, the problem that this patchset addresses _already exists_ with ethtool_rxnfc and cls_flower in place; it would be good to fix it now.

That's not the point of contention.

> > Let's try to differentiate between cls_flower and TC as a subsystem.
>
> Right. cls_flower is the most capable subset of tc at this stage.

cls_flower fits fixed-function HW and ACL use cases, but that's slightly beside the point; TC has its pipeline, chains, blocks, templates, qdiscs...

> > I don't really see how you could rename ndo_setup_tc to flow_offload.
>
> We can start by using a single .ndo and use the enum parameter to route the offload request to the corresponding driver handler [...] So we use the single .ndo as a multiplexor. By consolidating the parsers for ethtool_rxnfc and cls_flower, we are already saving redundant code in this patchset, and I can see more code in the tree that drivers could simplify.

ndo_setup_tc does not carry flows. It carries information about changes to the TC pipeline.

> In software, tc ingress comes before netfilter, and ethtool_rxnfc comes before both tc ingress and netfilter. Once we have a single .ndo, performing unified resource management for offloads will be easier [...] So we achieve the same semantics as in software. We cannot solve this problem until we have unified infrastructure to address it.

I'm sorry to say your vision of unified resource management is extremely hazy. These patches are a prep for something for which there isn't even a clear plan. There is no way that having drivers learn the semantics of another subsystem would make things easier. Expressing the rules in terms of the flow dissector is not the hard part.

Did you consider things like tunnels and bonds? There are a lot of problems which have been solved for TC offloads; if you create a separate subsystem offload you'll have to repeat all that work.

> Then we'll have to rewrite userspace netfilter to run on top of TC, and moreover add support for conntrack, NAT, logging... It will take ages to rewrite netfilter on top of tc, and there will be semantic differences.

I'm not suggesting we replace netfilter with TC. I'm suggesting we replace the nf_flow_offload table with something that fits into TC.

You're not going to offload entire netfilter. You want to offload simplistic flows through the nf_flow_table. What I'm saying is: add an equivalent of that table into TC. Make user space "link" netfilter to that.
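[Editor's note] Jakub's distinction above (ndo_setup_tc only announces TC pipeline changes; the actual rules reach the driver through callbacks registered on a block) can be illustrated with a small mock. All names here are invented; this only sketches the register-then-replay shape, loosely following the 2018-era tcf block callback API:

```c
#include <stddef.h>

#define MOCK_EOPNOTSUPP 95	/* stand-in for -EOPNOTSUPP */

/* A driver-provided callback, invoked once per rule on the block */
typedef int (*rule_setup_cb_t)(void *rule, void *cb_priv);

struct mock_block {
	rule_setup_cb_t cb;	/* set when the driver binds the block */
	void *cb_priv;
};

/* Step 1: the ndo only tells the driver a block is being bound;
 * the driver answers by registering a rule callback on it. */
static int mock_block_cb_register(struct mock_block *b,
				  rule_setup_cb_t cb, void *cb_priv)
{
	b->cb = cb;
	b->cb_priv = cb_priv;
	return 0;
}

/* Step 2: the core replays each rule insertion to the registered
 * callback; no rule ever travels through the ndo itself. */
static int mock_block_add_rule(struct mock_block *b, void *rule)
{
	if (!b->cb)
		return -MOCK_EOPNOTSUPP;
	return b->cb(rule, b->cb_priv);
}

/* Example driver callback: just count the offloaded rules */
static int mock_drv_cb(void *rule, void *cb_priv)
{
	(void)rule;
	(*(int *)cb_priv)++;
	return 0;
}
```

This is why "rename ndo_setup_tc to flow_offload" is not a simple rename: a netfilter frontend would need either its own block-like registration construct or fake blocks, which is the objection raised in the thread.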
Dear Jakub,

On Wed, Dec 19, 2018 at 11:57:03AM -0800, Jakub Kicinski wrote:

[...]

> > Anyway, the problem that this patchset addresses _already exists_ with ethtool_rxnfc and cls_flower in place; it would be good to fix it now.
>
> That's not the point of contention.

What is your alternative plan to unify the driver codebase for ethtool_rx_flow_spec and tc/cls_flower to offload ACLs from the ingress path?

[...]

> Did you consider things like tunnels and bonds? There are a lot of problems which have been solved for TC offloads; if you create a separate subsystem offload you'll have to repeat all that work.

Did I repeat any work to unify ethtool_rx_flow_spec and tc/cls_flower? No :-). I built on the existing work and delivered an incremental update.

[...]

> I'm not suggesting we replace netfilter with TC. I'm suggesting we replace the nf_flow_offload table with something that fits into TC.
>
> You're not going to offload entire netfilter. You want to offload simplistic flows through the nf_flow_table. What I'm saying is: add an equivalent of that table into TC. Make user space "link" netfilter to that.

This patchset is not related to the nf_flow_table infrastructure. From what I'm reading, you assume we can use nf_flow_table everywhere, which is probably true if you assume people use your NICs. However, there is a class of hardware where the CPU is far too small to cope with flows in software, where the dataplane is almost entirely offloaded to hardware. In that scenario, nf_flow_table cannot be used.
On Thu, 20 Dec 2018 01:03:13 +0100, Pablo Neira Ayuso wrote:

> On Wed, Dec 19, 2018 at 11:57:03AM -0800, Jakub Kicinski wrote:
> > That's not the point of contention.
>
> What is your alternative plan to unify the driver codebase for ethtool_rx_flow_spec and tc/cls_flower to offload ACLs from the ingress path?

The point of contention for me is the "reuse of ndo_setup_tc" :) I haven't played with your IR, but it looks fairly close to the flower dissector based thing, so it's probably fine.

> > Did you consider things like tunnels and bonds? There are a lot of problems which have been solved for TC offloads; if you create a separate subsystem offload you'll have to repeat all that work.
>
> Did I repeat any work to unify ethtool_rx_flow_spec and tc/cls_flower? No :-). I built on the existing work and delivered an incremental update.

Ethtool exists, so you don't have to teach the driver about it. nft offload does not yet exist in upstream drivers, AFAIK. Besides, if ethtool n-tuples were rich and supported complex use cases, we wouldn't have TC flower offloads, so it's not apples to apples ;)

> > I'm not suggesting we replace netfilter with TC. I'm suggesting we replace the nf_flow_offload table with something that fits into TC.
> >
> > You're not going to offload entire netfilter. You want to offload simplistic flows through the nf_flow_table. What I'm saying is: add an equivalent of that table into TC. Make user space "link" netfilter to that.
>
> This patchset is not related to the nf_flow_table infrastructure.

Let's not let the elephant back into the room, please :)

> From what I'm reading, you assume we can use nf_flow_table everywhere, which is probably true if you assume people use your NICs. However, there is a class of hardware where the CPU is far too small to cope with flows in software, where the dataplane is almost entirely offloaded to hardware. In that scenario, nf_flow_table cannot be used.

I'm confused, could you rephrase? How does your work help such devices? How is tc not suitable for them?
On Wed, Dec 19, 2018 at 04:26:53PM -0800, Jakub Kicinski wrote:

> On Thu, 20 Dec 2018 01:03:13 +0100, Pablo Neira Ayuso wrote:
> > What is your alternative plan to unify the driver codebase for ethtool_rx_flow_spec and tc/cls_flower to offload ACLs from the ingress path?
>
> The point of contention for me is the "reuse of ndo_setup_tc" :)

OK, this patchset is not updating ndo_setup_tc.

> I haven't played with your IR, but it looks fairly close to the flower dissector based thing, so it's probably fine.

OK.

[...]

> I'm confused, could you rephrase? How does your work help such devices? How is tc not suitable for them?

There are two HW offload use cases:

#1 Policy resides in software: the host CPU sees the initial packets of a flow and, based on policy, places the flow into hardware via the nf_flow_table infrastructure. This use case is fine on your NIC, since you assume the host CPU can cope with policy in software for those few initial packets. However, switches usually have a small CPU to run control plane software only; there we _cannot_ use this approach.

#2 Policy resides in hardware: for the use case of switches with a small CPU, the ACL is deployed in hardware. We use the host CPU to run control plane configuration only.

This patchset _is not_ related to #1; this patchset _is_ related to #2.

So far, there is infrastructure in netfilter to do #1; it should be possible to use it from TC too. In TC, there is infrastructure for #2 which can be reused from netfilter.
On Thu, Dec 20, 2018 at 2:35 PM Pablo Neira Ayuso <pablo@netfilter.org> wrote:

> On Wed, Dec 19, 2018 at 04:26:53PM -0800, Jakub Kicinski wrote:
> > I'm confused, could you rephrase? How does your work help such devices? How is tc not suitable for them?
>
> There are two HW offload use cases:
>
> #1 Policy resides in software: the host CPU sees the initial packets of a flow and, based on policy, places the flow into hardware via the nf_flow_table infrastructure. [...]
>
> #2 Policy resides in hardware: for the use case of switches with a small CPU, the ACL is deployed in hardware. We use the host CPU to run control plane configuration only.
>
> This patchset _is not_ related to #1; this patchset _is_ related to #2.

Confused: isn't this patch set related to connection tracking offloads on modern NIC HWs?

> So far, there is infrastructure in netfilter to do #1; it should be possible to use it from TC too. In TC, there is infrastructure for #2 which can be reused from netfilter.
On Thu, Dec 20, 2018 at 03:51:00PM +0200, Or Gerlitz wrote:

> On Thu, Dec 20, 2018 at 2:35 PM Pablo Neira Ayuso <pablo@netfilter.org> wrote:
> > There are two HW offload use cases: [...]
> >
> > This patchset _is not_ related to #1; this patchset _is_ related to #2.
>
> Confused: isn't this patch set related to connection tracking offloads on modern NIC HWs?

This patchset is aiming to unify the ethtool_rxnfc and tc/cls_flower representations to simplify the driver codebase, i.e. to have a single parser to populate the HW IR. My immediate future work is to reuse this new infrastructure to explore #2 for netfilter.
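[Editor's note] The "single parser to populate the HW IR" point can be sketched as follows. This is a hypothetical userspace mock: one translation function consumes a simplified flow_rule-like struct regardless of which frontend built it. The struct and field names are invented for illustration, not taken from the patches:

```c
#include <stdint.h>

/* Simplified stand-ins for the IR (names are illustrative) */
struct ipv4_addrs_match {
	uint32_t src, src_mask;
	uint32_t dst, dst_mask;
};

enum rule_action { RULE_ACCEPT, RULE_DROP };

struct simple_flow_rule {
	struct ipv4_addrs_match ipv4;
	enum rule_action action;
};

/* Fake "hardware IR" entry, as a driver might program into a TCAM */
struct hw_acl_entry {
	uint32_t key_src, mask_src;
	uint32_t key_dst, mask_dst;
	int drop;
};

/* The one parser: ethtool_rxnfc, cls_flower (and later netfilter)
 * would all funnel through here once they emit the shared IR, instead
 * of each frontend needing its own driver-side translation. */
static void hw_acl_from_flow_rule(struct hw_acl_entry *hw,
				  const struct simple_flow_rule *rule)
{
	hw->key_src  = rule->ipv4.src;
	hw->mask_src = rule->ipv4.src_mask;
	hw->key_dst  = rule->ipv4.dst;
	hw->mask_dst = rule->ipv4.dst_mask;
	hw->drop     = (rule->action == RULE_DROP);
}
```

Under use case #2 above, a rule like this would come straight from control plane configuration, with no per-flow software fast path involved.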