diff mbox series

[net,05/11] net/mlx5e: Hold reference on mirred devices while accessing them

Message ID 20200702221923.650779-6-saeedm@mellanox.com
State Changes Requested
Delegated to: David Miller
Headers show
Series [net,01/11] net/mlx5: Fix eeprom support for SFP module | expand

Commit Message

Saeed Mahameed July 2, 2020, 10:19 p.m. UTC
From: Eli Cohen <eli@mellanox.com>

Net devices might be removed. For example, a vxlan device could be
deleted and its ifnidex would become invalid. Use dev_get_by_index()
instead of __dev_get_by_index() to hold reference on the device while
accessing it and release after done.

Fixes: 3c37745ec614 ("net/mlx5e: Properly deal with encap flows add/del under neigh update")
Signed-off-by: Eli Cohen <eli@mellanox.com>
Reviewed-by: Vlad Buslov <vladbu@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en_tc.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

Comments

Or Gerlitz July 3, 2020, 9:33 a.m. UTC | #1
On Fri, Jul 3, 2020 at 1:24 AM Saeed Mahameed <saeedm@mellanox.com> wrote:
> From: Eli Cohen <eli@mellanox.com>
>
> Net devices might be removed. For example, a vxlan device could be
> deleted and its ifnidex would become invalid. Use dev_get_by_index()
> instead of __dev_get_by_index() to hold reference on the device while
> accessing it and release after done.

So if user space app installed a tc rule and then crashed or just
exited without
uninstalling the rule, the mirred (vxlan, vf rep, etc) device could
never be removed?

Why this is any better from the current situation?
Eli Cohen July 5, 2020, 7:19 a.m. UTC | #2
On Fri, Jul 03, 2020 at 12:33:58PM +0300, Or Gerlitz wrote:
> On Fri, Jul 3, 2020 at 1:24 AM Saeed Mahameed <saeedm@mellanox.com> wrote:
> > From: Eli Cohen <eli@mellanox.com>
> >
> > Net devices might be removed. For example, a vxlan device could be
> > deleted and its ifnidex would become invalid. Use dev_get_by_index()
> > instead of __dev_get_by_index() to hold reference on the device while
> > accessing it and release after done.
> 
> So if user space app installed a tc rule and then crashed or just
> exited without
> uninstalling the rule, the mirred (vxlan, vf rep, etc) device could
> never be removed?

Why do you think so? I decrease ref count, unconditionally, right after
returning from mlx5e_attach_encap().
Or Gerlitz July 6, 2020, 6:13 a.m. UTC | #3
On Sun, Jul 5, 2020 at 10:19 AM Eli Cohen <eli@mellanox.com> wrote:
>
> On Fri, Jul 03, 2020 at 12:33:58PM +0300, Or Gerlitz wrote:
> > On Fri, Jul 3, 2020 at 1:24 AM Saeed Mahameed <saeedm@mellanox.com> wrote:
> > > From: Eli Cohen <eli@mellanox.com>
> > >
> > > Net devices might be removed. For example, a vxlan device could be
> > > deleted and its ifnidex would become invalid. Use dev_get_by_index()
> > > instead of __dev_get_by_index() to hold reference on the device while
> > > accessing it and release after done.
> >
> > So if user space app installed a tc rule and then crashed or just
> > exited without
> > uninstalling the rule, the mirred (vxlan, vf rep, etc) device could
> > never be removed?
>
> Why do you think so? I decrease ref count, unconditionally, right after
> returning from mlx5e_attach_encap().

so what are we protecting here against? someone removing the device
while the tc rule is being added?

why do it in the driver and not higher in the tc stack? if I got you
correctly, the same problem can
happen for sw only (skip-hw) rules
Eli Cohen July 7, 2020, 2:57 p.m. UTC | #4
On Mon, Jul 06, 2020 at 09:13:06AM +0300, Or Gerlitz wrote:
> On Sun, Jul 5, 2020 at 10:19 AM Eli Cohen <eli@mellanox.com> wrote:
> 
> so what are we protecting here against? someone removing the device
> while the tc rule is being added?
>
Not necessairly. In case of ecmp, the rule may be copied to another
eswitch. At this time, if the mirred device does not exsist anymore, we
will crash.

I saw this problem at a customer's machine and this solved the problem.
It was an older kernel but I think we have the same issue here as well.
All you have is the ifindex of the mirred device so what I did here is
required.
Or Gerlitz July 7, 2020, 4:57 p.m. UTC | #5
On Tue, Jul 7, 2020 at 5:57 PM Eli Cohen <eli@mellanox.com> wrote:
> On Mon, Jul 06, 2020 at 09:13:06AM +0300, Or Gerlitz wrote:
> > On Sun, Jul 5, 2020 at 10:19 AM Eli Cohen <eli@mellanox.com> wrote:
> >
> > so what are we protecting here against? someone removing the device
> > while the tc rule is being added?
> >
> Not necessairly. In case of ecmp, the rule may be copied to another
> eswitch. At this time, if the mirred device does not exsist anymore, we
> will crash.
>
> I saw this problem at a customer's machine and this solved the problem.
> It was an older kernel but I think we have the same issue here as well.
> All you have is the ifindex of the mirred device so what I did here is
> required.

Repeating myself, why do it in the driver and not higher in the tc stack?
if I got you correctly, the same problem can happen for sw only (skip-hw) rules
diff mbox series

Patch

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
index 75f169aef1cf..e88f98ab062f 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
@@ -1327,10 +1327,14 @@  mlx5e_tc_add_fdb_flow(struct mlx5e_priv *priv,
 			continue;
 
 		mirred_ifindex = parse_attr->mirred_ifindex[out_index];
-		out_dev = __dev_get_by_index(dev_net(priv->netdev),
-					     mirred_ifindex);
+		out_dev = dev_get_by_index(dev_net(priv->netdev),
+					   mirred_ifindex);
+		if (!out_dev)
+			return -ENODEV;
+
 		err = mlx5e_attach_encap(priv, flow, out_dev, out_index,
 					 extack, &encap_dev, &encap_valid);
+		dev_put(out_dev);
 		if (err)
 			return err;