Message ID | 20200702221923.650779-6-saeedm@mellanox.com |
---|---|
State | Changes Requested |
Delegated to: | David Miller |
Headers | show |
Series | [net,01/11] net/mlx5: Fix eeprom support for SFP module | expand |
On Fri, Jul 3, 2020 at 1:24 AM Saeed Mahameed <saeedm@mellanox.com> wrote: > From: Eli Cohen <eli@mellanox.com> > > Net devices might be removed. For example, a vxlan device could be > deleted and its ifnidex would become invalid. Use dev_get_by_index() > instead of __dev_get_by_index() to hold reference on the device while > accessing it and release after done. So if user space app installed a tc rule and then crashed or just exited without uninstalling the rule, the mirred (vxlan, vf rep, etc) device could never be removed? Why this is any better from the current situation?
On Fri, Jul 03, 2020 at 12:33:58PM +0300, Or Gerlitz wrote: > On Fri, Jul 3, 2020 at 1:24 AM Saeed Mahameed <saeedm@mellanox.com> wrote: > > From: Eli Cohen <eli@mellanox.com> > > > > Net devices might be removed. For example, a vxlan device could be > > deleted and its ifnidex would become invalid. Use dev_get_by_index() > > instead of __dev_get_by_index() to hold reference on the device while > > accessing it and release after done. > > So if user space app installed a tc rule and then crashed or just > exited without > uninstalling the rule, the mirred (vxlan, vf rep, etc) device could > never be removed? Why do you think so? I decrease ref count, unconditionally, right after returning from mlx5e_attach_encap().
On Sun, Jul 5, 2020 at 10:19 AM Eli Cohen <eli@mellanox.com> wrote: > > On Fri, Jul 03, 2020 at 12:33:58PM +0300, Or Gerlitz wrote: > > On Fri, Jul 3, 2020 at 1:24 AM Saeed Mahameed <saeedm@mellanox.com> wrote: > > > From: Eli Cohen <eli@mellanox.com> > > > > > > Net devices might be removed. For example, a vxlan device could be > > > deleted and its ifnidex would become invalid. Use dev_get_by_index() > > > instead of __dev_get_by_index() to hold reference on the device while > > > accessing it and release after done. > > > > So if user space app installed a tc rule and then crashed or just > > exited without > > uninstalling the rule, the mirred (vxlan, vf rep, etc) device could > > never be removed? > > Why do you think so? I decrease ref count, unconditionally, right after > returning from mlx5e_attach_encap(). so what are we protecting here against? someone removing the device while the tc rule is being added? why do it in the driver and not higher in the tc stack? if I got you correctly, the same problem can happen for sw only (skip-hw) rules
On Mon, Jul 06, 2020 at 09:13:06AM +0300, Or Gerlitz wrote: > On Sun, Jul 5, 2020 at 10:19 AM Eli Cohen <eli@mellanox.com> wrote: > > so what are we protecting here against? someone removing the device > while the tc rule is being added? > Not necessairly. In case of ecmp, the rule may be copied to another eswitch. At this time, if the mirred device does not exsist anymore, we will crash. I saw this problem at a customer's machine and this solved the problem. It was an older kernel but I think we have the same issue here as well. All you have is the ifindex of the mirred device so what I did here is required.
On Tue, Jul 7, 2020 at 5:57 PM Eli Cohen <eli@mellanox.com> wrote: > On Mon, Jul 06, 2020 at 09:13:06AM +0300, Or Gerlitz wrote: > > On Sun, Jul 5, 2020 at 10:19 AM Eli Cohen <eli@mellanox.com> wrote: > > > > so what are we protecting here against? someone removing the device > > while the tc rule is being added? > > > Not necessairly. In case of ecmp, the rule may be copied to another > eswitch. At this time, if the mirred device does not exsist anymore, we > will crash. > > I saw this problem at a customer's machine and this solved the problem. > It was an older kernel but I think we have the same issue here as well. > All you have is the ifindex of the mirred device so what I did here is > required. Repeating myself, why do it in the driver and not higher in the tc stack? if I got you correctly, the same problem can happen for sw only (skip-hw) rules
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c index 75f169aef1cf..e88f98ab062f 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c @@ -1327,10 +1327,14 @@ mlx5e_tc_add_fdb_flow(struct mlx5e_priv *priv, continue; mirred_ifindex = parse_attr->mirred_ifindex[out_index]; - out_dev = __dev_get_by_index(dev_net(priv->netdev), - mirred_ifindex); + out_dev = dev_get_by_index(dev_net(priv->netdev), + mirred_ifindex); + if (!out_dev) + return -ENODEV; + err = mlx5e_attach_encap(priv, flow, out_dev, out_index, extack, &encap_dev, &encap_valid); + dev_put(out_dev); if (err) return err;