Message ID: 20180312175920.9022-1-pablo@netfilter.org
State: Changes Requested
Delegated to: Pablo Neira
From: Pablo Neira Ayuso <pablo@netfilter.org>
Date: Mon, 12 Mar 2018 18:58:50 +0100

> The following patchset contains Netfilter/IPVS updates for your net-next
> tree. This batch comes with more input sanitization for xtables to
> address bug reports from fuzzers, preparation work on the flowtable
> infrastructure and assorted updates. In no particular order, they are:

Sorry, I've seen enough. I'm not pulling this.

What is the story with this flow table stuff? I tried to ask you
about this before, but the response I was given was extremely vague
and did not answer my question at all.

This is a lot of code, and a lot of infrastructure, yet I see
no device using the infrastructure to offload conntrack.

Nor can I see how this can possibly be even useful for such an
application. What conntrack offload needs are things completely
outside of what the flow table stuff provides. Mainly, they
require that the SKB is completely abstracted away from all of
the conntrack code paths, and that the conntrack infrastructure
operates on an abstract packet metadata concept.

If you are targeting one specific piece of hardware with TCAMs
that you are familiar with, I'd like you to stop right there.
Because if that is all that this infrastructure can actually
be used for, it is definitely designed wrong.

This, as has been the case in the past, is what is wrong with the
netfilter approach to supporting offloading. We see all of this
infrastructure before an actual working use case is provided for a
specific piece of hardware for a specific driver in the tree.

Nobody can evaluate whether the approach is good or not without
a clear driver change implementing support for it.

No other area of networking puts the cart before the horse like this.

I do not agree at all with the flow table infrastructure and I
therefore do not want to pull any more flow table changes into my tree
until there is an actual user of this stuff in that pull request which
actually works in a way which is useful for people. It is completely
dead and useless code currently.

If you disagree, you have to not just say it, but show it with a driver
that successfully and cleanly uses this code to offload conntrack.

Meanwhile, remove the flow table commits from this pull request out of
your tree and ask me to pull in the rest.

Thanks.
On 2018-03-12 19:58, David Miller wrote:
> From: Pablo Neira Ayuso <pablo@netfilter.org>
> Date: Mon, 12 Mar 2018 18:58:50 +0100
>
>> The following patchset contains Netfilter/IPVS updates for your net-next
>> tree. This batch comes with more input sanitization for xtables to
>> address bug reports from fuzzers, preparation work on the flowtable
>> infrastructure and assorted updates. In no particular order, they are:
>
> Sorry, I've seen enough. I'm not pulling this.
>
> What is the story with this flow table stuff? I tried to ask you
> about this before, but the response I was given was extremely vague
> and did not answer my question at all.
>
> This is a lot of code, and a lot of infrastructure, yet I see
> no device using the infrastructure to offload conntrack.
>
> Nor can I see how this can possibly be even useful for such an
> application. What conntrack offload needs are things completely
> outside of what the flow table stuff provides. Mainly, they
> require that the SKB is completely abstracted away from all of
> the conntrack code paths, and that the conntrack infrastructure
> operates on an abstract packet metadata concept.
>
> If you are targeting one specific piece of hardware with TCAMs
> that you are familiar with, I'd like you to stop right there.
> Because if that is all that this infrastructure can actually
> be used for, it is definitely designed wrong.
>
> This, as has been the case in the past, is what is wrong with the
> netfilter approach to supporting offloading. We see all of this
> infrastructure before an actual working use case is provided for a
> specific piece of hardware for a specific driver in the tree.
>
> Nobody can evaluate whether the approach is good or not without
> a clear driver change implementing support for it.
>
> No other area of networking puts the cart before the horse like this.
>
> I do not agree at all with the flow table infrastructure and I
> therefore do not want to pull any more flow table changes into my tree
> until there is an actual user of this stuff in that pull request which
> actually works in a way which is useful for people. It is completely
> dead and useless code currently.

It's not dead and useless. In its current state, it has a software fast
path that significantly improves nftables routing/NAT throughput,
especially on embedded devices.
On some devices, I've seen "only" 20% throughput improvement (along
with CPU usage reduction); on others it's quite a bit more. This is
without any extra drivers or patches aside from what's posted.

Within OpenWrt, I'm working on a patch that makes the same available to
legacy netfilter as well. This is the reason for a lot of the core
refactoring that I did.

Hardware offload is still being worked on; I'm not sure when we will
have the first driver ready. But as it stands now, the code is already
very useful and has been backported to OpenWrt for testing. I think
that in a couple of weeks this code will be ready to be enabled by
default in OpenWrt, which means that a lot of users' setups will get a
lot faster with no configuration change at all.

- Felix
From: Felix Fietkau <nbd@nbd.name>
Date: Mon, 12 Mar 2018 20:30:01 +0100

> It's not dead and useless. In its current state, it has a software fast
> path that significantly improves nftables routing/NAT throughput,
> especially on embedded devices.
> On some devices, I've seen "only" 20% throughput improvement (along
> with CPU usage reduction); on others it's quite a bit more. This is
> without any extra drivers or patches aside from what's posted.

I wonder if this software fast path has the exploitability problems
that things like the ipv4 routing cache and the per-cpu flow cache both
had, and the reason for which both were removed.

I don't see how you can avoid this problem.

I'm willing to be shown otherwise :-)
On 2018-03-12 21:01, David Miller wrote:
> From: Felix Fietkau <nbd@nbd.name>
> Date: Mon, 12 Mar 2018 20:30:01 +0100
>
>> It's not dead and useless. In its current state, it has a software fast
>> path that significantly improves nftables routing/NAT throughput,
>> especially on embedded devices.
>> On some devices, I've seen "only" 20% throughput improvement (along
>> with CPU usage reduction); on others it's quite a bit more. This is
>> without any extra drivers or patches aside from what's posted.
>
> I wonder if this software fast path has the exploitability problems
> that things like the ipv4 routing cache and the per-cpu flow cache both
> had, and the reason for which both were removed.
>
> I don't see how you can avoid this problem.
>
> I'm willing to be shown otherwise :-)

I don't think it suffers from the same issues, and if it does, it's a
lot easier to mitigate. The ruleset can easily be configured to only
offload connections that have transferred a certain amount of data,
handling only bulk flows. It's easy to put an upper limit on the number
of offloaded connections, and there's nothing in the code that just
creates an offload entry per packet or per lookup or something like
that.

If you have other concerns, I'm sure we can address them with follow-up
patches, but as it stands, I think the code is already quite useful.

- Felix
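As a concrete illustration of the "upper limit" point: every offloaded
flow is backed by a conntrack entry (as Florian explains below), so the
existing conntrack ceiling already bounds the number of offload
entries. A hedged sketch; the sysctl is real, but the value is purely
illustrative, not a recommendation:

    # Cap the conntrack table; since each flowtable entry is backed by
    # a conntrack entry, this also caps the number of offloaded flows.
    # The value below is illustrative only.
    sysctl -w net.netfilter.nf_conntrack_max=65536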
David Miller <davem@davemloft.net> wrote:
> From: Felix Fietkau <nbd@nbd.name>
> Date: Mon, 12 Mar 2018 20:30:01 +0100
>
> > It's not dead and useless. In its current state, it has a software fast
> > path that significantly improves nftables routing/NAT throughput,
> > especially on embedded devices.
> > On some devices, I've seen "only" 20% throughput improvement (along
> > with CPU usage reduction); on others it's quite a bit more. This is
> > without any extra drivers or patches aside from what's posted.
>
> I wonder if this software fast path has the exploitability problems
> that things like the ipv4 routing cache and the per-cpu flow cache both
> had, and the reason for which both were removed.

No, entries in the flow table are backed by an entry in the conntrack
table, and that has an upper ceiling.

As the decision of when an entry gets placed into the flow table is
configurable via the ruleset (nftables now; iptables will be coming
too), one can tie the 'fastpathing' to almost-arbitrary criteria, e.g.

  'only flows from trusted internal network'
  'only flows that saw two-way communication'
  'only flows that sent more than 100kbyte'

or any combination thereof.

Do you see another problem that needs to be addressed?
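For illustration, each of Florian's criteria maps onto a single
nftables rule. A minimal sketch in nft script syntax: the flowtable @f,
the table/chain names, the address range, and the byte threshold are
all assumptions for the example, and the byte-based rule additionally
assumes conntrack accounting (nf_conntrack_acct) is enabled:

    # assumes a flowtable @f is already declared in table 'filter'
    # only flows from a trusted internal network (illustrative range)
    add rule filter forward ip saddr 192.168.0.0/16 flow offload @f
    # only flows that saw two-way communication
    add rule filter forward ct status assured flow offload @f
    # only bulk flows (threshold illustrative, roughly 100 kbyte)
    add rule filter forward ct original bytes > 100000 flow offload @f

The match expressions could also be combined in one rule so that a flow
must satisfy all three criteria before it is offloaded.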
From: Florian Westphal <fw@strlen.de>
Date: Tue, 13 Mar 2018 14:41:39 +0100

> David Miller <davem@davemloft.net> wrote:
>> From: Felix Fietkau <nbd@nbd.name>
>> Date: Mon, 12 Mar 2018 20:30:01 +0100
>>
>> > It's not dead and useless. In its current state, it has a software fast
>> > path that significantly improves nftables routing/NAT throughput,
>> > especially on embedded devices.
>> > On some devices, I've seen "only" 20% throughput improvement (along
>> > with CPU usage reduction); on others it's quite a bit more. This is
>> > without any extra drivers or patches aside from what's posted.
>>
>> I wonder if this software fast path has the exploitability problems
>> that things like the ipv4 routing cache and the per-cpu flow cache both
>> had, and the reason for which both were removed.
>
> No, entries in the flow table are backed by an entry in the conntrack
> table, and that has an upper ceiling.
>
> As the decision of when an entry gets placed into the flow table is
> configurable via the ruleset (nftables now; iptables will be coming
> too), one can tie the 'fastpathing' to almost-arbitrary criteria, e.g.
>
>   'only flows from trusted internal network'
>   'only flows that saw two-way communication'
>   'only flows that sent more than 100kbyte'
>
> or any combination thereof.
>
> Do you see another problem that needs to be addressed?

Ok, that seems to constrain the exposure.

We should talk at some point about how exposed conntrack itself is.
David Miller <davem@davemloft.net> wrote:

[ flow tables ]

> Ok, that seems to constrain the exposure.
>
> We should talk at some point about how exposed conntrack itself is.

Sure, we can do that.

If you have specific scenarios in mind (synflood, a peer that opens
100k (legitimate) connections, perpetual-fin, etc.), let me know; I do
think that we could still do better in some cases.
Hi David,

Just for the record, this is a summary of what we have discussed so
far:

1) The existing flowtable infrastructure provides a software fast path
   that is proving useful for a good number of use cases; in
   particular, OpenWrt/LEDE developers and users are very enthusiastic
   about this. The reason is that they previously had no choice but to
   load out-of-tree kernel modules to enable fast forwarding paths
   before this infrastructure was mainlined. Fortunately, they now have
   an upstream alternative that can help them get rid of those modules.
   This fast path can be enabled very easily; actually, one single rule
   to select what flows follow the alternative path is sufficient.

2) The software flowtable implementation is not affected by the
   problems that the flow/routing cache used to have. An attacker that
   cycles through all key values by sending forged packets to fill up
   the hashtable will get no entries. The ruleset policy specifies when
   to offload entries into the flowtable; users can arbitrarily decide
   when to push the flow into the flowtable, e.g.

        add rule filter forward ct status assured flow offload @x

   The worst-case scenario is that we need to see two packets, one in
   each direction, to be able to place a flow in the flowtable.

3) There is no hardware offload integration yet. There is a public
   patch, still waiting for a driver, that proposes ndo hooks; this
   patch is not merged upstream. The flowtable design and the hardware
   offload patch have been the result of conversations with many
   vendors that represent a wide range of networking device classes, so
   it is not an individual effort targeting one single device. Stateful
   flowtable offload has been another main topic; the pipeline is going
   to stall a bit if we cannot make incremental progress in that
   direction.

Note that this batch was coming with a patch to reduce the cache
footprint of the flowtable entries, so there is already work in
progress targeted at improving the performance of this new software
fast path. Also, preparatory work to introduce iptables support has
been on the radar while working on this.

We understand we should have spent more time explaining all this on the
mailing list, and we are trying to amend this now. Therefore, we can
probably convince someone here to write design documentation to be
placed in the Documentation/flowtable/ directory in the next pull
request, if that makes it easier for the broader audience to understand
our effort and raise concerns, if any.

Thanks.
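To make point 1 concrete, here is a minimal sketch of a complete setup
in nft script form. The flowtable name matches the rule in point 2, but
the device names, hook priority, and table layout are illustrative
assumptions, not taken from the patchset:

    # illustrative nft -f script: declare a flowtable, then enable the
    # fast path with the single rule from point 2
    add table filter
    add flowtable filter x { hook ingress priority 0; devices = { eth0, eth1 }; }
    add chain filter forward { type filter hook forward priority 0; }
    # offload assured flows (two-way communication seen) into @x
    add rule filter forward ct status assured flow offload @x

Once a ruleset like this is loaded, any flow that conntrack marks as
assured is handed to the flowtable and subsequent packets take the fast
path from the ingress hook.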
From: Pablo Neira Ayuso <pablo@netfilter.org>
Date: Wed, 14 Mar 2018 19:38:48 +0100

> Just for the record, this is a summary of what we have discussed so
> far:
 ...
> Note that this batch was coming with a patch to reduce the cache
> footprint of the flowtable entries, so there is already work in
> progress targeted at improving the performance of this new software
> fast path. Also, preparatory work to introduce iptables support has
> been on the radar while working on this.
>
> We understand we should have spent more time explaining all this on the
> mailing list, and we are trying to amend this now. Therefore, we can
> probably convince someone here to write design documentation to be
> placed in the Documentation/flowtable/ directory in the next pull
> request, if that makes it easier for the broader audience to understand
> our effort and raise concerns, if any.

Noted.

Please resend your pull request.

Thanks.
On 12/03/2018 20:58, David Miller wrote:
> From: Pablo Neira Ayuso <pablo@netfilter.org>
> Date: Mon, 12 Mar 2018 18:58:50 +0100
>
>> The following patchset contains Netfilter/IPVS updates for your net-next
>> tree. This batch comes with more input sanitization for xtables to
>> address bug reports from fuzzers, preparation work on the flowtable
>> infrastructure and assorted updates. In no particular order, they are:
>
> Sorry, I've seen enough. I'm not pulling this.
>
> What is the story with this flow table stuff? I tried to ask you
> about this before, but the response I was given was extremely vague
> and did not answer my question at all.
>
> This is a lot of code, and a lot of infrastructure, yet I see
> no device using the infrastructure to offload conntrack.

Hi David,

Pablo's code is a very welcome addition to the flow tables
infrastructure. We at Mellanox already have customers asking for
offload of connection tracking. While a complete hardware
implementation is yet to arrive, Pablo's contribution is blessed. Using
this infrastructure we are capable of completely offloading connection
tracking (without TCP window validation) and possibly doing a complete
offload once hardware support for TCP window validation shows up.

I'm currently working closely with Pablo to create the first driver
implementation to utilize hardware offloading. Needless to say, prior
to having a hardware implementation, a software infrastructure is
required.

> Nor can I see how this can possibly be even useful for such an
> application. What conntrack offload needs are things completely
> outside of what the flow table stuff provides. Mainly, they
> require that the SKB is completely abstracted away from all of
> the conntrack code paths, and that the conntrack infrastructure
> operates on an abstract packet metadata concept.

Assuming that the software maintains the flow in the system, it is
reasonable to let software do the connection establishment and
termination and let the hardware do all the rest in between (again,
without TCP window validation, unless specialized hardware exists).

> This, as has been the case in the past, is what is wrong with the
> netfilter approach to supporting offloading. We see all of this
> infrastructure before an actual working use case is provided for a
> specific piece of hardware for a specific driver in the tree.
>
> Nobody can evaluate whether the approach is good or not without
> a clear driver change implementing support for it.

I'm speaking on behalf of Mellanox. Would one driver support as
demonstration suffice?

Thanks,
Guy
From: Guy Shattah <sguy@mellanox.com>
Date: Fri, 16 Mar 2018 18:39:03 +0200

> Would one driver support as demonstration suffice?

It would certainly improve the reviewability of the changes.