Message ID | 1528185843-18645-1-git-send-email-paulb@mellanox.com |
---|---|
State | Accepted, archived |
Delegated to: | David Miller |
Headers | show |
Series | [net] net: sched: cls: Fix offloading when ingress dev is vxlan | expand |
From: Paul Blakey <paulb@mellanox.com> Date: Tue, 5 Jun 2018 11:04:03 +0300 > When using a vxlan device as the ingress dev, we count it as a > "no offload dev", so when such a rule comes and err stop is true, > we fail early and don't try the egdev route which can offload it > through the egress device. > > Fix that by not calling the block offload if one of the devices > attached to it is not offload capable, but make sure egress on such case > is capable instead. > > Fixes: caa7260156eb ("net: sched: keep track of offloaded filters [..]") > Reviewed-by: Roi Dayan <roid@mellanox.com> > Acked-by: Jiri Pirko <jiri@mellanox.com> > Signed-off-by: Paul Blakey <paulb@mellanox.com> Applied and queued up for -stable, thanks.
On Tue, 5 Jun 2018 11:04:03 +0300, Paul Blakey wrote: > When using a vxlan device as the ingress dev, we count it as a > "no offload dev", so when such a rule comes and err stop is true, > we fail early and don't try the egdev route which can offload it > through the egress device. > > Fix that by not calling the block offload if one of the devices > attached to it is not offload capable, but make sure egress on such case > is capable instead. > > Fixes: caa7260156eb ("net: sched: keep track of offloaded filters [..]") > Reviewed-by: Roi Dayan <roid@mellanox.com> > Acked-by: Jiri Pirko <jiri@mellanox.com> > Signed-off-by: Paul Blakey <paulb@mellanox.com> Very poor commit message. What you're doing is re-enabling skip_sw filters on tunnel devices which semantically make no sense and shouldn't have been allowed in the first place. This will breaks block sharing between tunnels and HW netdevs (because you skip the tcf_block_cb_call() completely). The entire egdev idea remains badly broken accepting filters like this: # tc filter add dev vxlan0 ingress \ matchall action skip_sw \ mirred egress redirect dev lo \ mirred egress redirect dev sw1np0 Do we still care about correctness and not breaking backward compatibility?
On Tue, 5 Jun 2018 11:57:47 -0700, Jakub Kicinski wrote: > On Tue, 5 Jun 2018 11:04:03 +0300, Paul Blakey wrote: > > When using a vxlan device as the ingress dev, we count it as a > > "no offload dev", so when such a rule comes and err stop is true, > > we fail early and don't try the egdev route which can offload it > > through the egress device. > > > > Fix that by not calling the block offload if one of the devices > > attached to it is not offload capable, but make sure egress on such case > > is capable instead. > > > > Fixes: caa7260156eb ("net: sched: keep track of offloaded filters [..]") > > Reviewed-by: Roi Dayan <roid@mellanox.com> > > Acked-by: Jiri Pirko <jiri@mellanox.com> > > Signed-off-by: Paul Blakey <paulb@mellanox.com> > > Very poor commit message. What you're doing is re-enabling skip_sw > filters on tunnel devices which semantically make no sense and > shouldn't have been allowed in the first place. > > This will breaks block sharing between tunnels and HW netdevs (because > you skip the tcf_block_cb_call() completely). The entire egdev idea > remains badly broken accepting filters like this: > > # tc filter add dev vxlan0 ingress \ > matchall action skip_sw \ > mirred egress redirect dev lo \ > mirred egress redirect dev sw1np0 For above we probably need something like this (untested): diff --git a/net/sched/act_api.c b/net/sched/act_api.c index 3f4cf930f809..71e5eebec31a 100644 --- a/net/sched/act_api.c +++ b/net/sched/act_api.c @@ -1511,7 +1511,7 @@ int tc_setup_cb_egdev_call(const struct net_device *dev, struct tcf_action_egdev *egdev = tcf_action_egdev_lookup(dev); if (!egdev) - return 0; + return err_stop ? -EOPNOTSUPP : 0; return tcf_action_egdev_cb_call(egdev, type, type_data, err_stop); } EXPORT_SYMBOL_GPL(tc_setup_cb_egdev_call); But the correct fix is to remove egdev crutch completely IMO.
From: Jakub Kicinski <kubakici@wp.pl> Date: Tue, 5 Jun 2018 11:57:47 -0700 > Do we still care about correctness and not breaking backward > compatibility? Jakub let me know if you want me to revert this change.
On Tue, 05 Jun 2018 15:06:40 -0400 (EDT), David Miller wrote: > From: Jakub Kicinski <kubakici@wp.pl> > Date: Tue, 5 Jun 2018 11:57:47 -0700 > > > Do we still care about correctness and not breaking backward > > compatibility? > > Jakub let me know if you want me to revert this change. Yes, I think this patch introduces a regression when block is shared between offload capable and in-capable device, therefore it should be reverted. Shared blocks went through a number of review cycles to ensure such cases are handled correctly. Longer story about the actual issue which is never explained in the commit message is this: in kernels 4.10 - 4.14 skip_sw flag was supported on tunnels in cls_flower only: static int fl_hw_replace_filter(struct tcf_proto *tp, [...] if (!tc_can_offload(dev)) { if (tcf_exts_get_dev(dev, &f->exts, &f->hw_dev) || (f->hw_dev && !tc_can_offload(f->hw_dev))) { f->hw_dev = dev; return tc_skip_sw(f->flags) ? -EINVAL : 0; } dev = f->hw_dev; cls_flower.egress_dev = true; } else { f->hw_dev = dev; } In 4.15 - 4.17 with addition of shared blocks egdev mechanism was promoted to a generic TC thing supported on all filters but it no longer works with skip_sw. I'd argue skip_sw is incorrect for tunnels, because the rule will only apply to traffic ingressing on ASIC ports, not all traffic which hits the tunnel netdev. Therefore we should keep the 4.15 - 4.17 behaviour. But that's a side note, I don't think we should be breaking offload on shared blocks whether we decide to support skip_sw on tunnels or not.
On Tue, Jun 5, 2018 at 9:59 PM, Jakub Kicinski <kubakici@wp.pl> wrote: > On Tue, 5 Jun 2018 11:57:47 -0700, Jakub Kicinski wrote: >> On Tue, 5 Jun 2018 11:04:03 +0300, Paul Blakey wrote: >> > When using a vxlan device as the ingress dev, we count it as a >> > "no offload dev", so when such a rule comes and err stop is true, >> > we fail early and don't try the egdev route which can offload it >> > through the egress device. >> > >> > Fix that by not calling the block offload if one of the devices >> > attached to it is not offload capable, but make sure egress on such case >> > is capable instead. >> > >> > Fixes: caa7260156eb ("net: sched: keep track of offloaded filters [..]") >> > Reviewed-by: Roi Dayan <roid@mellanox.com> >> > Acked-by: Jiri Pirko <jiri@mellanox.com> >> > Signed-off-by: Paul Blakey <paulb@mellanox.com> >> >> Very poor commit message. What you're doing is re-enabling skip_sw >> filters on tunnel devices which semantically make no sense and >> shouldn't have been allowed in the first place. >> >> This will breaks block sharing between tunnels and HW netdevs (because >> you skip the tcf_block_cb_call() completely). The entire egdev idea >> remains badly broken accepting filters like this: >> >> # tc filter add dev vxlan0 ingress \ >> matchall action skip_sw \ >> mirred egress redirect dev lo \ >> mirred egress redirect dev sw1np0 > > For above we probably need something like this (untested): > > diff --git a/net/sched/act_api.c b/net/sched/act_api.c > index 3f4cf930f809..71e5eebec31a 100644 > --- a/net/sched/act_api.c > +++ b/net/sched/act_api.c > @@ -1511,7 +1511,7 @@ int tc_setup_cb_egdev_call(const struct net_device *dev, > struct tcf_action_egdev *egdev = tcf_action_egdev_lookup(dev); > > if (!egdev) > - return 0; > + return err_stop ? -EOPNOTSUPP : 0; > return tcf_action_egdev_cb_call(egdev, type, type_data, err_stop); > } > EXPORT_SYMBOL_GPL(tc_setup_cb_egdev_call); Jakub, We will test it out and let you know > But the correct fix is to remove egdev crutch completely IMO. Not against it, sometimes designs should change and be replaced with better ones, happens
On Wed, Jun 6, 2018 at 12:27 AM, Jakub Kicinski <kubakici@wp.pl> wrote: > On Tue, 05 Jun 2018 15:06:40 -0400 (EDT), David Miller wrote: >> From: Jakub Kicinski <kubakici@wp.pl> >> Date: Tue, 5 Jun 2018 11:57:47 -0700 >> >> > Do we still care about correctness and not breaking backward >> > compatibility? >> >> Jakub let me know if you want me to revert this change. > > Yes, I think this patch introduces a regression when block is shared > between offload capable and in-capable device, therefore it should be > reverted. Shared blocks went through a number of review cycles to > ensure such cases are handled correctly. > > > Longer story about the actual issue which is never explained in the > commit message is this: in kernels 4.10 - 4.14 skip_sw flag was > supported on tunnels in cls_flower only: > > static int fl_hw_replace_filter(struct tcf_proto *tp, > [...] > if (!tc_can_offload(dev)) { > if (tcf_exts_get_dev(dev, &f->exts, &f->hw_dev) || > (f->hw_dev && !tc_can_offload(f->hw_dev))) { > f->hw_dev = dev; > return tc_skip_sw(f->flags) ? -EINVAL : 0; > } > dev = f->hw_dev; > cls_flower.egress_dev = true; > } else { > f->hw_dev = dev; > } > > > In 4.15 - 4.17 with addition of shared blocks egdev mechanism was > promoted to a generic TC thing supported on all filters but it no > longer works with skip_sw. > > I'd argue skip_sw is incorrect for tunnels, because the rule will only > apply to traffic ingressing on ASIC ports, not all traffic which hits > the tunnel netdev. This argument makes sense, however, skip_sw for tunnel decap rules **is** allowed since 4.10 and we have some sort of regression here (turns out that before and after the patch..) > Therefore we should keep the 4.15 - 4.17 behaviour. > But that's a side note, I don't think we should be breaking offload on > shared blocks whether we decide to support skip_sw on tunnels or not. skip_sw on tunnels was there before shared-block, newer features should take care not to break existing ones.
On 06/06/2018 00:27, Jakub Kicinski wrote: > On Tue, 05 Jun 2018 15:06:40 -0400 (EDT), David Miller wrote: >> From: Jakub Kicinski <kubakici@wp.pl> >> Date: Tue, 5 Jun 2018 11:57:47 -0700 >> >>> Do we still care about correctness and not breaking backward >>> compatibility? >> >> Jakub let me know if you want me to revert this change. > > Yes, I think this patch introduces a regression when block is shared > between offload capable and in-capable device, therefore it should be > reverted. Shared blocks went through a number of review cycles to > ensure such cases are handled correctly. > > > Longer story about the actual issue which is never explained in the > commit message is this: in kernels 4.10 - 4.14 skip_sw flag was > supported on tunnels in cls_flower only: > > static int fl_hw_replace_filter(struct tcf_proto *tp, > [...] > if (!tc_can_offload(dev)) { > if (tcf_exts_get_dev(dev, &f->exts, &f->hw_dev) || > (f->hw_dev && !tc_can_offload(f->hw_dev))) { > f->hw_dev = dev; > return tc_skip_sw(f->flags) ? -EINVAL : 0; > } > dev = f->hw_dev; > cls_flower.egress_dev = true; > } else { > f->hw_dev = dev; > } > > > In 4.15 - 4.17 with addition of shared blocks egdev mechanism was > promoted to a generic TC thing supported on all filters but it no > longer works with skip_sw. > > I'd argue skip_sw is incorrect for tunnels, because the rule will only > apply to traffic ingressing on ASIC ports, not all traffic which hits > the tunnel netdev. Therefore we should keep the 4.15 - 4.17 behaviour. > > But that's a side note, I don't think we should be breaking offload on > shared blocks whether we decide to support skip_sw on tunnels or not. > Maybe we can apply my patch logic of still trying the egress dev if the block has a single device, and not shared. Is that ok with you? You're patch seems good as an add on, but the egress hw device (sw1np0) would go over the tc actions and see if it can offload such rule (output to software lo device) and would fail anyway.
On Wed, 6 Jun 2018 08:15:27 +0300, Or Gerlitz wrote: > On Wed, Jun 6, 2018 at 12:27 AM, Jakub Kicinski <kubakici@wp.pl> wrote: > > On Tue, 05 Jun 2018 15:06:40 -0400 (EDT), David Miller wrote: > >> From: Jakub Kicinski <kubakici@wp.pl> > >> Date: Tue, 5 Jun 2018 11:57:47 -0700 > >> > >> > Do we still care about correctness and not breaking backward > >> > compatibility? > >> > >> Jakub let me know if you want me to revert this change. > > > > Yes, I think this patch introduces a regression when block is shared > > between offload capable and in-capable device, therefore it should be > > reverted. Shared blocks went through a number of review cycles to > > ensure such cases are handled correctly. > > > > > > Longer story about the actual issue which is never explained in the > > commit message is this: in kernels 4.10 - 4.14 skip_sw flag was > > supported on tunnels in cls_flower only: > > > > static int fl_hw_replace_filter(struct tcf_proto *tp, > > [...] > > if (!tc_can_offload(dev)) { > > if (tcf_exts_get_dev(dev, &f->exts, &f->hw_dev) || > > (f->hw_dev && !tc_can_offload(f->hw_dev))) { > > f->hw_dev = dev; > > return tc_skip_sw(f->flags) ? -EINVAL : 0; > > } > > dev = f->hw_dev; > > cls_flower.egress_dev = true; > > } else { > > f->hw_dev = dev; > > } > > > > > > In 4.15 - 4.17 with addition of shared blocks egdev mechanism was > > promoted to a generic TC thing supported on all filters but it no > > longer works with skip_sw. > > > > I'd argue skip_sw is incorrect for tunnels, because the rule will only > > apply to traffic ingressing on ASIC ports, not all traffic which hits > > the tunnel netdev. > > This argument makes sense, however, skip_sw for tunnel decap rules > **is** allowed since 4.10 and we have some sort of regression here (turns > out that before and after the patch..) As I said it was allowed in 4 releases, which was a mistake, in last 3 it wasn't. I understand your use case, but the semantics of skip_sw are not preserved here so we should find a different solution. > > Therefore we should keep the 4.15 - 4.17 behaviour. > > But that's a side note, I don't think we should be breaking offload on > > shared blocks whether we decide to support skip_sw on tunnels or not. > > skip_sw on tunnels was there before shared-block, newer features should > take care not to break existing ones. Oh, I agree we shouldn't break existing use cases so please don't break the use case I mentioned above. I want to set up shared block between a LAG and its members. Now since the bond will not be offload-capable TC will not even make an attempt to offload to members. I'm gonna test that my reading of the code is correct and send a revert later today, sorry.
On Wed, 6 Jun 2018 08:12:20 +0300, Or Gerlitz wrote: > On Tue, Jun 5, 2018 at 9:59 PM, Jakub Kicinski <kubakici@wp.pl> wrote: > > On Tue, 5 Jun 2018 11:57:47 -0700, Jakub Kicinski wrote: > >> On Tue, 5 Jun 2018 11:04:03 +0300, Paul Blakey wrote: > >> > When using a vxlan device as the ingress dev, we count it as a > >> > "no offload dev", so when such a rule comes and err stop is true, > >> > we fail early and don't try the egdev route which can offload it > >> > through the egress device. > >> > > >> > Fix that by not calling the block offload if one of the devices > >> > attached to it is not offload capable, but make sure egress on such case > >> > is capable instead. > >> > > >> > Fixes: caa7260156eb ("net: sched: keep track of offloaded filters [..]") > >> > Reviewed-by: Roi Dayan <roid@mellanox.com> > >> > Acked-by: Jiri Pirko <jiri@mellanox.com> > >> > Signed-off-by: Paul Blakey <paulb@mellanox.com> > >> > >> Very poor commit message. What you're doing is re-enabling skip_sw > >> filters on tunnel devices which semantically make no sense and > >> shouldn't have been allowed in the first place. > >> > >> This will breaks block sharing between tunnels and HW netdevs (because > >> you skip the tcf_block_cb_call() completely). The entire egdev idea > >> remains badly broken accepting filters like this: > >> > >> # tc filter add dev vxlan0 ingress \ > >> matchall action skip_sw \ > >> mirred egress redirect dev lo \ > >> mirred egress redirect dev sw1np0 > > > > For above we probably need something like this (untested): > > > > diff --git a/net/sched/act_api.c b/net/sched/act_api.c > > index 3f4cf930f809..71e5eebec31a 100644 > > --- a/net/sched/act_api.c > > +++ b/net/sched/act_api.c > > @@ -1511,7 +1511,7 @@ int tc_setup_cb_egdev_call(const struct net_device *dev, > > struct tcf_action_egdev *egdev = tcf_action_egdev_lookup(dev); > > > > if (!egdev) > > - return 0; > > + return err_stop ? -EOPNOTSUPP : 0; > > return tcf_action_egdev_cb_call(egdev, type, type_data, err_stop); > > } > > EXPORT_SYMBOL_GPL(tc_setup_cb_egdev_call); > > Jakub, > > We will test it out and let you know That's probably insufficient, because the second action should be skipped completely. But an improvement nonetheless :) > > But the correct fix is to remove egdev crutch completely IMO. > > Not against it, sometimes designs should change and be replaced > with better ones, happens Cool, I'm not sure when we'll get to it, but perhaps we can go over the ideas I presented at netconf on switchdev call next week (if it's still active)?
On Wed, 6 Jun 2018 10:59:30 +0300, Paul Blakey wrote: > Maybe we can apply my patch logic of still trying the egress dev if the > block has a single device, and not shared. Is that ok with you? I don't remember that patch but sounds pretty bad. > You're patch seems good as an add on, but the egress hw device (sw1np0) > would go over the tc actions and see if it can offload such rule (output > to software lo device) and would fail anyway. Ack, I'm not trying to solve the skip_sw problem. It's a hard one.
From: Jakub Kicinski <kubakici@wp.pl> Date: Tue, 5 Jun 2018 14:27:00 -0700 > On Tue, 05 Jun 2018 15:06:40 -0400 (EDT), David Miller wrote: >> From: Jakub Kicinski <kubakici@wp.pl> >> Date: Tue, 5 Jun 2018 11:57:47 -0700 >> >> > Do we still care about correctness and not breaking backward >> > compatibility? >> >> Jakub let me know if you want me to revert this change. > > Yes, I think this patch introduces a regression when block is shared > between offload capable and in-capable device, therefore it should be > reverted. Ok, I've reverted the change. Please sort out how to fix things properly.
On Wed, Jun 6, 2018 at 6:50 PM, Jakub Kicinski <kubakici@wp.pl> wrote: > On Wed, 6 Jun 2018 08:12:20 +0300, Or Gerlitz wrote: >> On Tue, Jun 5, 2018 at 9:59 PM, Jakub Kicinski <kubakici@wp.pl> wrote: >> > But the correct fix is to remove egdev crutch completely IMO. >> Not against it, sometimes designs should change and be replaced >> with better ones, happens > Cool, I'm not sure when we'll get to it, but perhaps we can go over > the ideas I presented at netconf on switchdev call next week (if it's > still active)? sounds good, I'll check. Where the netconf slides can be found?
diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c index a57e112..2cd579f 100644 --- a/net/sched/cls_api.c +++ b/net/sched/cls_api.c @@ -734,10 +734,6 @@ static int tcf_block_cb_call(struct tcf_block *block, enum tc_setup_type type, int ok_count = 0; int err; - /* Make sure all netdevs sharing this block are offload-capable. */ - if (block->nooffloaddevcnt && err_stop) - return -EOPNOTSUPP; - list_for_each_entry(block_cb, &block->cb_list, list) { err = block_cb->cb(type, type_data, block_cb->cb_priv); if (err) { @@ -1580,21 +1576,31 @@ static int tc_exts_setup_cb_egdev_call(struct tcf_exts *exts, int tc_setup_cb_call(struct tcf_block *block, struct tcf_exts *exts, enum tc_setup_type type, void *type_data, bool err_stop) { - int ok_count; + int ok_count = 0; int ret; - ret = tcf_block_cb_call(block, type, type_data, err_stop); - if (ret < 0) - return ret; - ok_count = ret; + if (!block->nooffloaddevcnt) { + ret = tcf_block_cb_call(block, type, type_data, err_stop); + if (ret < 0) + return ret; + ok_count = ret; + } if (!exts || ok_count) - return ok_count; + goto skip_egress; + ret = tc_exts_setup_cb_egdev_call(exts, type, type_data, err_stop); if (ret < 0) return ret; ok_count += ret; +skip_egress: + /* if one of the netdevs sharing this block are not offload-capable + * make sure we succeeded in egress instead. + */ + if (block->nooffloaddevcnt && !ok_count && err_stop) + return -EOPNOTSUPP; + return ok_count; } EXPORT_SYMBOL(tc_setup_cb_call);