diff mbox series

[nf] Revert "netfilter: flowtable: Remove redundant hw refresh bit"

Message ID 20210614193440.3813-1-olek2@wp.pl
State Changes Requested
Delegated to: Pablo Neira
Headers show
Series [nf] Revert "netfilter: flowtable: Remove redundant hw refresh bit" | expand

Commit Message

Aleksander Jan Bajkowski June 14, 2021, 7:34 p.m. UTC
This reverts commit c07531c01d8284aedaf95708ea90e76d11af0e21.

The previously mentioned commit significantly reduces NAT performance
in OpenWRT. Another user reports a high ping issue. The results of
IPv4 NAT benchmark on BT Home Hub 5A (with software flow offloading):
* 5.4.124             515 Mb/s
* 5.10.41             570 Mb/s
* 5.10.42             250 Mb/s
* 5.10.42 + revert    580 Mb/s

Reverting this commit fixes this issue.

Fixes: c07531c01d8284aedaf95708ea90e76d11af0e21 ("netfilter: flowtable: Remove redundant hw refresh bit")
Signed-off-by: Aleksander Jan Bajkowski <olek2@wp.pl>
---
 include/net/netfilter/nf_flow_table.h | 1 +
 net/netfilter/nf_flow_table_core.c    | 3 ++-
 net/netfilter/nf_flow_table_offload.c | 7 +++----
 3 files changed, 6 insertions(+), 5 deletions(-)

Comments

Pablo Neira Ayuso June 14, 2021, 9:53 p.m. UTC | #1
Hi,

On Mon, Jun 14, 2021 at 09:34:40PM +0200, Aleksander Jan Bajkowski wrote:
> This reverts commit c07531c01d8284aedaf95708ea90e76d11af0e21.
>
> The previously mentioned commit significantly reduces NAT performance
> in OpenWRT. Another user reports a high ping issue. The results of
> IPv4 NAT benchmark on BT Home Hub 5A (with software flow offloading):
> * 5.4.124             515 Mb/s
> * 5.10.41             570 Mb/s
> * 5.10.42             250 Mb/s
> * 5.10.42 + revert    580 Mb/s
>
> Reverting this commit fixes this issue.

The xt_flowoffload module is inconditionally setting on the hardware
offload flag:

static int __init xt_flowoffload_tg_init(void)
{
       int ret;

       register_netdevice_notifier(&flow_offload_netdev_notifier);

       ret = init_flowtable(&flowtable[0]);
       if (ret)
               return ret;

       ret = init_flowtable(&flowtable[1]);
       if (ret)
               goto cleanup;

       flowtable[1].ft.flags = NF_FLOWTABLE_HW_OFFLOAD;
[...]

which is triggering the slow down because packet path is allocating
work to offload the entry to hardware, however, this driver does not
support for hardware offload.

Probably this module can be updated to unset the flowtable flag if the
harware does not support hardware offload.
Martin Blumenstingl July 11, 2021, 1:02 a.m. UTC | #2
Hi Aleksander,

> The xt_flowoffload module is inconditionally setting on the hardware
> offload flag:
[...]
>
> which is triggering the slow down because packet path is allocating
> work to offload the entry to hardware, however, this driver does not
> support for hardware offload.
> 
> Probably this module can be updated to unset the flowtable flag if the
> harware does not support hardware offload.

yesterday there was a discussion about this on the #openwrt-devel IRC
channel. I am adding the IRC log to the end of this email because I am
not sure if you're using IRC.

I typically don't test with flow offloading enabled (I am testing with
OpenWrt's "default" network configuration, where flow offloading is
disabled by default). Also I am not familiar with the flow offloading
code at all and reading the xt_FLOWOFFLOAD code just raised more
questions for me.

Maybe you can share some info whether your workaround from [0] "fixes"
this issue. I am aware that it will probably break other devices. But
maybe it helps Pablo to confirm whether it's really an xt_FLOWOFFLOAD
bug or rather some generic flow offload issue (as Felix suggested on
IRC).


Best regards,
Martin


[0] https://github.com/abajk/openwrt/commit/ee4d790574c0edd170e1710d7cd4819727b23721


<rsalvaterra> nbd: I saw your flow offloading updates. Just to make sure, this issue hasn't been addressed yet, has it? https://lore.kernel.org/netdev/20210614215351.GA734@salvia/
<nbd> i don't think so
<nbd> can you reproduce it?
<rsalvaterra> nbd: Not really, I don't have the hardware.
<rsalvaterra> It's lantiq, I think (bthh5a).
<rsalvaterra> However, I believe dwmw2_gone has one, iirc.
<xdarklight> nbd: I also have a HH5A. if you have any patch ready you can also send it as RFC on the mailing list and Cc Aleksander
<rsalvaterra> xdarklight: Have you been able to reproduce the flow offloading regression?
<xdarklight> I can try (typically I test with a "default" network configuration, where flow offloading is disabled)
<rsalvaterra> xdarklight: I don't have a lot of details, only this thread: https://github.com/openwrt/openwrt/pull/4225#issuecomment-855454607
<xdarklight> rsalvaterra: this is the workaround that Aleksander has in his tree: https://github.com/abajk/openwrt/commit/ee4d790574c0edd170e1710d7cd4819727b23721
<rsalvaterra> xdarklight: Well, but that basically breaks hw flow offloading everywhere else, if I'm reading correctly. :)
<xdarklight> rsalvaterra: I am not arguing with that :). I wanted to point out that Pablo's finding on the netfilter mailing list seems to be correct
<rsalvaterra> xdarklight: Sure, which is why I pinged nbd, since he's the original author of the xt_FLOWOFFLOAD target.
<rsalvaterra> What it seems is that it isn't such trivial fix. :)
<xdarklight> I looked at the xt_FLOWOFFLOAD code myself and it raised more questions than I had before looking at the code. so I decided to wait for someone with better knowledge to look into that issue :)
<rsalvaterra> Same here. I just went "oh, this requires divine intervention" and set it aside. :P
<nbd> xdarklight: which finding did you mean?
<xdarklight> nbd: "The xt_flowoffload module is inconditionally setting on the hardware offload flag" [...] flowtable[1].ft.flags = NF_FLOWTABLE_HW_OFFLOAD;
<xdarklight> nbd: from this email: https://lore.kernel.org/netdev/20210614215351.GA734@salvia/
<nbd> i actually think that finding is wrong
<nbd> xt_FLOWOFFLOAD registers two flowtables
<nbd> one with hw offload, one without
<nbd> the target code does this:
<nbd> table = &flowtable[!!(info->flags & XT_FLOWOFFLOAD_HW)];
<nbd> so it selects flowtable[1] only if info->flags has XT_FLOWOFFLOAD_HW set
<rsalvaterra> nbd: That's between you and Pablo, I mustn't interfere. :)
<nbd> i did reply to pablo, but never heard back from him
<rsalvaterra> nbd: The merge window is still open, he's probably busy at the moment… maybe ping him on Monday?
<xdarklight> nbd: it seems that your mail also didn't make it to the netdev mailing list (at least I can't find it)
<rsalvaterra> xdarklight: Now that you mention it, neither do I.
<nbd> he wrote to me in private for some reason
<xdarklight> oh okay. so for me to summarize: you're saying that the xt_FLOWOFFLOAD code should be fine. in that case Aleksander's workaround (https://github.com/abajk/openwrt/commit/ee4d790574c0edd170e1710d7cd4819727b23721) is also a no-op and the original problem would still be seen
<rsalvaterra> xdarklight: I don't think it's a no-op, otherwise he wouldn't carry it in his tree… maybe something's wrong in the table selection? table = &flowtable[!!(info->flags & XT_FLOWOFFLOAD_HW)]; does look correct, though.
<nbd> xdarklight: well, it's not a no-op if the issue was reproduced with option flow_offloading_hw 1 in the config
<rsalvaterra> nbd: Uh… If he has option flow_offloading_hw '1' on a system which doesn't support hw flow offloading… he gets to keep the pieces, right?
<nbd> it shouldn't break
<nbd> and normally i'd expect the generic flow offload api to be able to deal with it without attempting to re-enable hw offload over and over again
Aleksander Jan Bajkowski July 11, 2021, 12:33 p.m. UTC | #3
Hi Martin,

Thanks for the IRC log. Today I repeated my previous tests. I think I had to have Hardware flow offloading enabled before, even though Lantiq xRX200 doesn't support it. With only the software flow offloading turned on, I do not see a significant performance drop.

Today's results (IPv6 routing with DSA driver and Burst Length Patch):
Device	Kernel	Flow offload	Upload	Download
HH5A	5.4.128	Disabled	101	170
HH5A	5.10.46	Disabled	95.4	159
HH5A	5.10.42	Disabled	96.6	165
HH5A	5.10.41	Disabled	101	165
HH5A	5.4.128	Soft		471	463
HH5A	5.10.46	Soft		553	472
HH5A	5.10.42	Soft		556	468
HH5A	5.10.41	Soft		558	492
HH5A	5.4.128	Soft+Hard	434	460
HH5A	5.10.46	Soft+Hard	229	181
HH5A	5.10.42	Soft+Hard	228	177
HH5A	5.10.41	Soft+Hard	577	482

It seems my workaround is unnecessary.

Best regards,
Aleksander Jan Bajkowski

On 7/11/21 3:02 AM, Martin Blumenstingl wrote:
> Hi Aleksander,
> 
>> The xt_flowoffload module is inconditionally setting on the hardware
>> offload flag:
> [...]
>>
>> which is triggering the slow down because packet path is allocating
>> work to offload the entry to hardware, however, this driver does not
>> support for hardware offload.
>>
>> Probably this module can be updated to unset the flowtable flag if the
>> harware does not support hardware offload.
> 
> yesterday there was a discussion about this on the #openwrt-devel IRC
> channel. I am adding the IRC log to the end of this email because I am
> not sure if you're using IRC.
> 
> I typically don't test with flow offloading enabled (I am testing with
> OpenWrt's "default" network configuration, where flow offloading is
> disabled by default). Also I am not familiar with the flow offloading
> code at all and reading the xt_FLOWOFFLOAD code just raised more
> questions for me.
> 
> Maybe you can share some info whether your workaround from [0] "fixes"
> this issue. I am aware that it will probably break other devices. But
> maybe it helps Pablo to confirm whether it's really an xt_FLOWOFFLOAD
> bug or rather some generic flow offload issue (as Felix suggested on
> IRC).
> 
> 
> Best regards,
> Martin
> 
> 
> [0] https://github.com/abajk/openwrt/commit/ee4d790574c0edd170e1710d7cd4819727b23721
> 
> 
> <rsalvaterra> nbd: I saw your flow offloading updates. Just to make sure, this issue hasn't been addressed yet, has it? https://lore.kernel.org/netdev/20210614215351.GA734@salvia/
> <nbd> i don't think so
> <nbd> can you reproduce it?
> <rsalvaterra> nbd: Not really, I don't have the hardware.
> <rsalvaterra> It's lantiq, I think (bthh5a).
> <rsalvaterra> However, I believe dwmw2_gone has one, iirc.
> <xdarklight> nbd: I also have a HH5A. if you have any patch ready you can also send it as RFC on the mailing list and Cc Aleksander
> <rsalvaterra> xdarklight: Have you been able to reproduce the flow offloading regression?
> <xdarklight> I can try (typically I test with a "default" network configuration, where flow offloading is disabled)
> <rsalvaterra> xdarklight: I don't have a lot of details, only this thread: https://github.com/openwrt/openwrt/pull/4225#issuecomment-855454607
> <xdarklight> rsalvaterra: this is the workaround that Aleksander has in his tree: https://github.com/abajk/openwrt/commit/ee4d790574c0edd170e1710d7cd4819727b23721
> <rsalvaterra> xdarklight: Well, but that basically breaks hw flow offloading everywhere else, if I'm reading correctly. :)
> <xdarklight> rsalvaterra: I am not arguing with that :). I wanted to point out that Pablo's finding on the netfilter mailing list seems to be correct
> <rsalvaterra> xdarklight: Sure, which is why I pinged nbd, since he's the original author of the xt_FLOWOFFLOAD target.
> <rsalvaterra> What it seems is that it isn't such trivial fix. :)
> <xdarklight> I looked at the xt_FLOWOFFLOAD code myself and it raised more questions than I had before looking at the code. so I decided to wait for someone with better knowledge to look into that issue :)
> <rsalvaterra> Same here. I just went "oh, this requires divine intervention" and set it aside. :P
> <nbd> xdarklight: which finding did you mean?
> <xdarklight> nbd: "The xt_flowoffload module is inconditionally setting on the hardware offload flag" [...] flowtable[1].ft.flags = NF_FLOWTABLE_HW_OFFLOAD;
> <xdarklight> nbd: from this email: https://lore.kernel.org/netdev/20210614215351.GA734@salvia/
> <nbd> i actually think that finding is wrong
> <nbd> xt_FLOWOFFLOAD registers two flowtables
> <nbd> one with hw offload, one without
> <nbd> the target code does this:
> <nbd> table = &flowtable[!!(info->flags & XT_FLOWOFFLOAD_HW)];
> <nbd> so it selects flowtable[1] only if info->flags has XT_FLOWOFFLOAD_HW set
> <rsalvaterra> nbd: That's between you and Pablo, I mustn't interfere. :)
> <nbd> i did reply to pablo, but never heard back from him
> <rsalvaterra> nbd: The merge window is still open, he's probably busy at the moment… maybe ping him on Monday?
> <xdarklight> nbd: it seems that your mail also didn't make it to the netdev mailing list (at least I can't find it)
> <rsalvaterra> xdarklight: Now that you mention it, neither do I.
> <nbd> he wrote to me in private for some reason
> <xdarklight> oh okay. so for me to summarize: you're saying that the xt_FLOWOFFLOAD code should be fine. in that case Aleksander's workaround (https://github.com/abajk/openwrt/commit/ee4d790574c0edd170e1710d7cd4819727b23721) is also a no-op and the original problem would still be seen
> <rsalvaterra> xdarklight: I don't think it's a no-op, otherwise he wouldn't carry it in his tree… maybe something's wrong in the table selection? table = &flowtable[!!(info->flags & XT_FLOWOFFLOAD_HW)]; does look correct, though.
> <nbd> xdarklight: well, it's not a no-op if the issue was reproduced with option flow_offloading_hw 1 in the config
> <rsalvaterra> nbd: Uh… If he has option flow_offloading_hw '1' on a system which doesn't support hw flow offloading… he gets to keep the pieces, right?
> <nbd> it shouldn't break
> <nbd> and normally i'd expect the generic flow offload api to be able to deal with it without attempting to re-enable hw offload over and over again
>
Pablo Neira Ayuso July 12, 2021, 9:46 a.m. UTC | #4
Hi Martin,

On Sun, Jul 11, 2021 at 03:02:44AM +0200, Martin Blumenstingl wrote:
> Hi Aleksander,
> 
> > The xt_flowoffload module is inconditionally setting on the hardware
> > offload flag:
> [...]
> >
> > which is triggering the slow down because packet path is allocating
> > work to offload the entry to hardware, however, this driver does not
> > support for hardware offload.
> > 
> > Probably this module can be updated to unset the flowtable flag if the
> > harware does not support hardware offload.
> 
> yesterday there was a discussion about this on the #openwrt-devel IRC
> channel. I am adding the IRC log to the end of this email because I am
> not sure if you're using IRC.
> 
> I typically don't test with flow offloading enabled (I am testing with
> OpenWrt's "default" network configuration, where flow offloading is
> disabled by default). Also I am not familiar with the flow offloading
> code at all and reading the xt_FLOWOFFLOAD code just raised more
> questions for me.
> 
> Maybe you can share some info whether your workaround from [0] "fixes"
> this issue. I am aware that it will probably break other devices. But
> maybe it helps Pablo to confirm whether it's really an xt_FLOWOFFLOAD
> bug or rather some generic flow offload issue (as Felix suggested on
> IRC).

Maybe the user reporting this issue is enabling the --hw option?
As I said, the patch that is being proposed to be revert is just
amplifying.

The only way to trigger this bug that I can find is:

- NF_FLOWTABLE_HW_OFFLOAD is enabled.
- packets are following the software path.

I don't see yet how this can happen with upstream codebase, nftables
enables NF_FLOWTABLE_HW_OFFLOAD at configuration time, if the driver
does not support for hardware offload, then NF_FLOWTABLE_HW_OFFLOAD is
not set.

Is xt_flowoffload rejecting the rule load if user specifies --hw and
the hardware does not support for hardware offload?

By reading Felix's discussion on the IRC, it seems to me he does not
like that the packet path retries to offload flows. If so, it should
be possible to add a driver flag to disable this behaviour, so driver
developers select what they prefer that flowtable core retries to
offload entries. I can have a look into adding such flag and use it
from the mtk driver.
Felix Fietkau July 12, 2021, 10:18 a.m. UTC | #5
On 2021-07-12 11:46, Pablo Neira Ayuso wrote:
> Maybe the user reporting this issue is enabling the --hw option?
> As I said, the patch that is being proposed to be revert is just
> amplifying.
> 
> The only way to trigger this bug that I can find is:
> 
> - NF_FLOWTABLE_HW_OFFLOAD is enabled.
> - packets are following the software path.
> 
> I don't see yet how this can happen with upstream codebase, nftables
> enables NF_FLOWTABLE_HW_OFFLOAD at configuration time, if the driver
> does not support for hardware offload, then NF_FLOWTABLE_HW_OFFLOAD is
> not set.
> 
> Is xt_flowoffload rejecting the rule load if user specifies --hw and
> the hardware does not support for hardware offload?
> 
> By reading Felix's discussion on the IRC, it seems to me he does not
> like that the packet path retries to offload flows. If so, it should
> be possible to add a driver flag to disable this behaviour, so driver
> developers select what they prefer that flowtable core retries to
> offload entries. I can have a look into adding such flag and use it
> from the mtk driver.
I'd prefer making the retry behavior depend on the error code during
setup. For example, if we get -ENOMEM, -EAGAIN or something like that,
we should definitely retry.
If we get -EOPNOTSUPP or -EINVAL, I don't think a retry makes any sense
on any driver.

- Felix
diff mbox series

Patch

diff --git a/include/net/netfilter/nf_flow_table.h b/include/net/netfilter/nf_flow_table.h
index 48ef7460ff30..51d8eb99764d 100644
--- a/include/net/netfilter/nf_flow_table.h
+++ b/include/net/netfilter/nf_flow_table.h
@@ -157,6 +157,7 @@  enum nf_flow_flags {
 	NF_FLOW_HW,
 	NF_FLOW_HW_DYING,
 	NF_FLOW_HW_DEAD,
+	NF_FLOW_HW_REFRESH,
 	NF_FLOW_HW_PENDING,
 };
 
diff --git a/net/netfilter/nf_flow_table_core.c b/net/netfilter/nf_flow_table_core.c
index 1d02650dd715..39c02d1aeedf 100644
--- a/net/netfilter/nf_flow_table_core.c
+++ b/net/netfilter/nf_flow_table_core.c
@@ -306,7 +306,8 @@  void flow_offload_refresh(struct nf_flowtable *flow_table,
 {
 	flow->timeout = nf_flowtable_time_stamp + NF_FLOW_TIMEOUT;
 
-	if (likely(!nf_flowtable_hw_offload(flow_table)))
+	if (likely(!nf_flowtable_hw_offload(flow_table) ||
+		   !test_and_clear_bit(NF_FLOW_HW_REFRESH, &flow->flags)))
 		return;
 
 	nf_flow_offload_add(flow_table, flow);
diff --git a/net/netfilter/nf_flow_table_offload.c b/net/netfilter/nf_flow_table_offload.c
index 528b2f172684..2af7bdb38407 100644
--- a/net/netfilter/nf_flow_table_offload.c
+++ b/net/netfilter/nf_flow_table_offload.c
@@ -902,11 +902,10 @@  static void flow_offload_work_add(struct flow_offload_work *offload)
 
 	err = flow_offload_rule_add(offload, flow_rule);
 	if (err < 0)
-		goto out;
-
-	set_bit(IPS_HW_OFFLOAD_BIT, &offload->flow->ct->status);
+		set_bit(NF_FLOW_HW_REFRESH, &offload->flow->flags);
+	else
+		set_bit(IPS_HW_OFFLOAD_BIT, &offload->flow->ct->status);
 
-out:
 	nf_flow_offload_destroy(flow_rule);
 }