diff mbox series

[net] ice: avoid bonding causing auxiliary plug/unplug under RTNL lock

Message ID 20230111183145.1497367-1-david.m.ertman@intel.com
State Changes Requested
Delegated to: Anthony Nguyen
Headers show
Series [net] ice: avoid bonding causing auxiliary plug/unplug under RTNL lock | expand

Commit Message

Dave Ertman Jan. 11, 2023, 6:31 p.m. UTC
RDMA is not supported in ice on a PF that has been added to a bonded
interface. To enforce this, when an interface enters a bond, we unplug
the auxiliary device that supports RDMA functionality.  This unplug
currently happens in the context of handling the netdev bonding event.
This event is sent to the ice driver under RTNL context.  This is causing
a deadlock where the RDMA driver is waiting for the RTNL lock to complete
the removal.

Defer the unplugging/re-plugging of the auxiliary device to the service
task so that it is not performed under the RTNL lock context.

Reported-by: Jaroslav Pulchart <jaroslav.pulchart@gooddata.com>
Link: https://lore.kernel.org/linux-rdma/68b14b11-d0c7-65c9-4eeb-0487c95e395d@leemhuis.info/
Fixes: 5cb1ebdbc434 ("ice: Fix race condition during interface enslave")
Fixes: 425c9bd06b7a ("RDMA/irdma: Report the correct link speed")
Signed-off-by: Dave Ertman <david.m.ertman@intel.com>
---
 drivers/net/ethernet/intel/ice/ice.h      | 14 +++++---------
 drivers/net/ethernet/intel/ice/ice_main.c | 17 +++++++----------
 2 files changed, 12 insertions(+), 19 deletions(-)

Comments

Michal Swiatkowski Jan. 12, 2023, 6:43 a.m. UTC | #1
On Wed, Jan 11, 2023 at 10:31:45AM -0800, Dave Ertman wrote:
> RDMA is not supported in ice on a PF that has been added to a bonded
> interface. To enforce this, when an interface enters a bond, we unplug
> the auxiliary device that supports RDMA functionality.  This unplug
> currently happens in the context of handling the netdev bonding event.
> This event is sent to the ice driver under RTNL context.  This is causing
> a deadlock where the RDMA driver is waiting for the RTNL lock to complete
> the removal.
> 
> Defer the unplugging/re-plugging of the auxiliary device to the service
> task so that it is not performed under the RTNL lock context.
> 
> Reported-by: Jaroslav Pulchart <jaroslav.pulchart@gooddata.com>
> Link: https://lore.kernel.org/linux-rdma/68b14b11-d0c7-65c9-4eeb-0487c95e395d@leemhuis.info/
> Fixes: 5cb1ebdbc434 ("ice: Fix race condition during interface enslave")
> Fixes: 425c9bd06b7a ("RDMA/irdma: Report the correct link speed")
> Signed-off-by: Dave Ertman <david.m.ertman@intel.com>
> -- 
Loogs good to me, even cleaner than previous solution.
Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>

Thanks, Michal
[...]
> 2.37.3
> 
> _______________________________________________
> Intel-wired-lan mailing list
> Intel-wired-lan@osuosl.org
> https://lists.osuosl.org/mailman/listinfo/intel-wired-lan
Jay Vosburgh Jan. 13, 2023, 6:52 p.m. UTC | #2
Dave Ertman <david.m.ertman@intel.com> wrote:

>RDMA is not supported in ice on a PF that has been added to a bonded
>interface. To enforce this, when an interface enters a bond, we unplug
>the auxiliary device that supports RDMA functionality.  This unplug
>currently happens in the context of handling the netdev bonding event.
>This event is sent to the ice driver under RTNL context.  This is causing
>a deadlock where the RDMA driver is waiting for the RTNL lock to complete
>the removal.

	Why is RDMA disallowed on interfaces that are members of a bond?
Is this something specific to ice, or a generic problem with RDMA in
general?

	-J

>Defer the unplugging/re-plugging of the auxiliary device to the service
>task so that it is not performed under the RTNL lock context.
>
>Reported-by: Jaroslav Pulchart <jaroslav.pulchart@gooddata.com>
>Link: https://lore.kernel.org/linux-rdma/68b14b11-d0c7-65c9-4eeb-0487c95e395d@leemhuis.info/
>Fixes: 5cb1ebdbc434 ("ice: Fix race condition during interface enslave")
>Fixes: 425c9bd06b7a ("RDMA/irdma: Report the correct link speed")
>Signed-off-by: Dave Ertman <david.m.ertman@intel.com>
>---
> drivers/net/ethernet/intel/ice/ice.h      | 14 +++++---------
> drivers/net/ethernet/intel/ice/ice_main.c | 17 +++++++----------
> 2 files changed, 12 insertions(+), 19 deletions(-)
>
>diff --git a/drivers/net/ethernet/intel/ice/ice.h b/drivers/net/ethernet/intel/ice/ice.h
>index 2f0b604abc5e..0ad9bab84617 100644
>--- a/drivers/net/ethernet/intel/ice/ice.h
>+++ b/drivers/net/ethernet/intel/ice/ice.h
>@@ -506,6 +506,7 @@ enum ice_pf_flags {
> 	ICE_FLAG_VF_VLAN_PRUNING,
> 	ICE_FLAG_LINK_LENIENT_MODE_ENA,
> 	ICE_FLAG_PLUG_AUX_DEV,
>+	ICE_FLAG_UNPLUG_AUX_DEV,
> 	ICE_FLAG_MTU_CHANGED,
> 	ICE_FLAG_GNSS,			/* GNSS successfully initialized */
> 	ICE_PF_FLAGS_NBITS		/* must be last */
>@@ -950,16 +951,11 @@ static inline void ice_set_rdma_cap(struct ice_pf *pf)
>  */
> static inline void ice_clear_rdma_cap(struct ice_pf *pf)
> {
>-	/* We can directly unplug aux device here only if the flag bit
>-	 * ICE_FLAG_PLUG_AUX_DEV is not set because ice_unplug_aux_dev()
>-	 * could race with ice_plug_aux_dev() called from
>-	 * ice_service_task(). In this case we only clear that bit now and
>-	 * aux device will be unplugged later once ice_plug_aux_device()
>-	 * called from ice_service_task() finishes (see ice_service_task()).
>+	/* defer unplug to service task to avoid RTNL lock and
>+	 * clear PLUG bit so that pending plugs don't interfere
> 	 */
>-	if (!test_and_clear_bit(ICE_FLAG_PLUG_AUX_DEV, pf->flags))
>-		ice_unplug_aux_dev(pf);
>-
>+	clear_bit(ICE_FLAG_PLUG_AUX_DEV, pf->flags);
>+	set_bit(ICE_FLAG_UNPLUG_AUX_DEV, pf->flags);
> 	clear_bit(ICE_FLAG_RDMA_ENA, pf->flags);
> }
> #endif /* _ICE_H_ */
>diff --git a/drivers/net/ethernet/intel/ice/ice_main.c b/drivers/net/ethernet/intel/ice/ice_main.c
>index a9a7f8b52140..e2bc1340833e 100644
>--- a/drivers/net/ethernet/intel/ice/ice_main.c
>+++ b/drivers/net/ethernet/intel/ice/ice_main.c
>@@ -2290,18 +2290,15 @@ static void ice_service_task(struct work_struct *work)
> 		}
> 	}
> 
>-	if (test_bit(ICE_FLAG_PLUG_AUX_DEV, pf->flags)) {
>-		/* Plug aux device per request */
>+	/* Plug aux device per request */
>+	if (test_and_clear_bit(ICE_FLAG_PLUG_AUX_DEV, pf->flags))
> 		ice_plug_aux_dev(pf);
> 
>-		/* Mark plugging as done but check whether unplug was
>-		 * requested during ice_plug_aux_dev() call
>-		 * (e.g. from ice_clear_rdma_cap()) and if so then
>-		 * plug aux device.
>-		 */
>-		if (!test_and_clear_bit(ICE_FLAG_PLUG_AUX_DEV, pf->flags))
>-			ice_unplug_aux_dev(pf);
>-	}
>+	/* unplug aux dev per request, if an unplug request came in
>+	 * while processing a plug request, this will handle it
>+	 */
>+	if (test_and_clear_bit(ICE_FLAG_UNPLUG_AUX_DEV, pf->flags))
>+		ice_unplug_aux_dev(pf);
> 
> 	if (test_and_clear_bit(ICE_FLAG_MTU_CHANGED, pf->flags)) {
> 		struct iidc_event *event;
>-- 
>2.37.3
>
>_______________________________________________
>Intel-wired-lan mailing list
>Intel-wired-lan@osuosl.org
>https://lists.osuosl.org/mailman/listinfo/intel-wired-lan
Jaroslav Pulchart Jan. 14, 2023, 1:03 p.m. UTC | #3
Hello

I tested this patch with 6.1.5 and 6.1.6 and the system has a working network.

Best,
Jaroslav P.


st 11. 1. 2023 v 19:31 odesílatel Dave Ertman <david.m.ertman@intel.com> napsal:
>
> RDMA is not supported in ice on a PF that has been added to a bonded
> interface. To enforce this, when an interface enters a bond, we unplug
> the auxiliary device that supports RDMA functionality.  This unplug
> currently happens in the context of handling the netdev bonding event.
> This event is sent to the ice driver under RTNL context.  This is causing
> a deadlock where the RDMA driver is waiting for the RTNL lock to complete
> the removal.
>
> Defer the unplugging/re-plugging of the auxiliary device to the service
> task so that it is not performed under the RTNL lock context.
>
> Reported-by: Jaroslav Pulchart <jaroslav.pulchart@gooddata.com>
> Link: https://lore.kernel.org/linux-rdma/68b14b11-d0c7-65c9-4eeb-0487c95e395d@leemhuis.info/
> Fixes: 5cb1ebdbc434 ("ice: Fix race condition during interface enslave")
> Fixes: 425c9bd06b7a ("RDMA/irdma: Report the correct link speed")
> Signed-off-by: Dave Ertman <david.m.ertman@intel.com>
> ---
>  drivers/net/ethernet/intel/ice/ice.h      | 14 +++++---------
>  drivers/net/ethernet/intel/ice/ice_main.c | 17 +++++++----------
>  2 files changed, 12 insertions(+), 19 deletions(-)
>
> diff --git a/drivers/net/ethernet/intel/ice/ice.h b/drivers/net/ethernet/intel/ice/ice.h
> index 2f0b604abc5e..0ad9bab84617 100644
> --- a/drivers/net/ethernet/intel/ice/ice.h
> +++ b/drivers/net/ethernet/intel/ice/ice.h
> @@ -506,6 +506,7 @@ enum ice_pf_flags {
>         ICE_FLAG_VF_VLAN_PRUNING,
>         ICE_FLAG_LINK_LENIENT_MODE_ENA,
>         ICE_FLAG_PLUG_AUX_DEV,
> +       ICE_FLAG_UNPLUG_AUX_DEV,
>         ICE_FLAG_MTU_CHANGED,
>         ICE_FLAG_GNSS,                  /* GNSS successfully initialized */
>         ICE_PF_FLAGS_NBITS              /* must be last */
> @@ -950,16 +951,11 @@ static inline void ice_set_rdma_cap(struct ice_pf *pf)
>   */
>  static inline void ice_clear_rdma_cap(struct ice_pf *pf)
>  {
> -       /* We can directly unplug aux device here only if the flag bit
> -        * ICE_FLAG_PLUG_AUX_DEV is not set because ice_unplug_aux_dev()
> -        * could race with ice_plug_aux_dev() called from
> -        * ice_service_task(). In this case we only clear that bit now and
> -        * aux device will be unplugged later once ice_plug_aux_device()
> -        * called from ice_service_task() finishes (see ice_service_task()).
> +       /* defer unplug to service task to avoid RTNL lock and
> +        * clear PLUG bit so that pending plugs don't interfere
>          */
> -       if (!test_and_clear_bit(ICE_FLAG_PLUG_AUX_DEV, pf->flags))
> -               ice_unplug_aux_dev(pf);
> -
> +       clear_bit(ICE_FLAG_PLUG_AUX_DEV, pf->flags);
> +       set_bit(ICE_FLAG_UNPLUG_AUX_DEV, pf->flags);
>         clear_bit(ICE_FLAG_RDMA_ENA, pf->flags);
>  }
>  #endif /* _ICE_H_ */
> diff --git a/drivers/net/ethernet/intel/ice/ice_main.c b/drivers/net/ethernet/intel/ice/ice_main.c
> index a9a7f8b52140..e2bc1340833e 100644
> --- a/drivers/net/ethernet/intel/ice/ice_main.c
> +++ b/drivers/net/ethernet/intel/ice/ice_main.c
> @@ -2290,18 +2290,15 @@ static void ice_service_task(struct work_struct *work)
>                 }
>         }
>
> -       if (test_bit(ICE_FLAG_PLUG_AUX_DEV, pf->flags)) {
> -               /* Plug aux device per request */
> +       /* Plug aux device per request */
> +       if (test_and_clear_bit(ICE_FLAG_PLUG_AUX_DEV, pf->flags))
>                 ice_plug_aux_dev(pf);
>
> -               /* Mark plugging as done but check whether unplug was
> -                * requested during ice_plug_aux_dev() call
> -                * (e.g. from ice_clear_rdma_cap()) and if so then
> -                * plug aux device.
> -                */
> -               if (!test_and_clear_bit(ICE_FLAG_PLUG_AUX_DEV, pf->flags))
> -                       ice_unplug_aux_dev(pf);
> -       }
> +       /* unplug aux dev per request, if an unplug request came in
> +        * while processing a plug request, this will handle it
> +        */
> +       if (test_and_clear_bit(ICE_FLAG_UNPLUG_AUX_DEV, pf->flags))
> +               ice_unplug_aux_dev(pf);
>
>         if (test_and_clear_bit(ICE_FLAG_MTU_CHANGED, pf->flags)) {
>                 struct iidc_event *event;
> --
> 2.37.3
>
Dave Ertman Jan. 17, 2023, 6:01 p.m. UTC | #4
> -----Original Message-----
> From: Jay Vosburgh <jay.vosburgh@canonical.com>
> Sent: Friday, January 13, 2023 10:53 AM
> To: Ertman, David M <david.m.ertman@intel.com>
> Cc: intel-wired-lan@lists.osuosl.org; Jaroslav Pulchart
> <jaroslav.pulchart@gooddata.com>
> Subject: Re: [Intel-wired-lan] [PATCH net] ice: avoid bonding causing auxiliary
> plug/unplug under RTNL lock
> 
> Dave Ertman <david.m.ertman@intel.com> wrote:
> 
> >RDMA is not supported in ice on a PF that has been added to a bonded
> >interface. To enforce this, when an interface enters a bond, we unplug
> >the auxiliary device that supports RDMA functionality.  This unplug
> >currently happens in the context of handling the netdev bonding event.
> >This event is sent to the ice driver under RTNL context.  This is causing
> >a deadlock where the RDMA driver is waiting for the RTNL lock to complete
> >the removal.
> 
> 	Why is RDMA disallowed on interfaces that are members of a bond?
> Is this something specific to ice, or a generic problem with RDMA in
> general?
> 
> 	-J

This is a current limitation of the ice driver.

DaveE
G, GurucharanX Jan. 25, 2023, 9:30 a.m. UTC | #5
> -----Original Message-----
> From: Intel-wired-lan <intel-wired-lan-bounces@osuosl.org> On Behalf Of
> Dave Ertman
> Sent: Thursday, January 12, 2023 12:02 AM
> To: intel-wired-lan@lists.osuosl.org
> Cc: Jaroslav Pulchart <jaroslav.pulchart@gooddata.com>
> Subject: [Intel-wired-lan] [PATCH net] ice: avoid bonding causing auxiliary
> plug/unplug under RTNL lock
> 
> RDMA is not supported in ice on a PF that has been added to a bonded
> interface. To enforce this, when an interface enters a bond, we unplug the
> auxiliary device that supports RDMA functionality.  This unplug currently
> happens in the context of handling the netdev bonding event.
> This event is sent to the ice driver under RTNL context.  This is causing a
> deadlock where the RDMA driver is waiting for the RTNL lock to complete the
> removal.
> 
> Defer the unplugging/re-plugging of the auxiliary device to the service task
> so that it is not performed under the RTNL lock context.
> 
> Reported-by: Jaroslav Pulchart <jaroslav.pulchart@gooddata.com>
> Link: https://lore.kernel.org/linux-rdma/68b14b11-d0c7-65c9-4eeb-
> 0487c95e395d@leemhuis.info/
> Fixes: 5cb1ebdbc434 ("ice: Fix race condition during interface enslave")
> Fixes: 425c9bd06b7a ("RDMA/irdma: Report the correct link speed")
> Signed-off-by: Dave Ertman <david.m.ertman@intel.com>
> ---
>  drivers/net/ethernet/intel/ice/ice.h      | 14 +++++---------
>  drivers/net/ethernet/intel/ice/ice_main.c | 17 +++++++----------
>  2 files changed, 12 insertions(+), 19 deletions(-)
> 

Tested-by: Gurucharan G <gurucharanx.g@intel.com> (A Contingent worker at Intel)
diff mbox series

Patch

diff --git a/drivers/net/ethernet/intel/ice/ice.h b/drivers/net/ethernet/intel/ice/ice.h
index 2f0b604abc5e..0ad9bab84617 100644
--- a/drivers/net/ethernet/intel/ice/ice.h
+++ b/drivers/net/ethernet/intel/ice/ice.h
@@ -506,6 +506,7 @@  enum ice_pf_flags {
 	ICE_FLAG_VF_VLAN_PRUNING,
 	ICE_FLAG_LINK_LENIENT_MODE_ENA,
 	ICE_FLAG_PLUG_AUX_DEV,
+	ICE_FLAG_UNPLUG_AUX_DEV,
 	ICE_FLAG_MTU_CHANGED,
 	ICE_FLAG_GNSS,			/* GNSS successfully initialized */
 	ICE_PF_FLAGS_NBITS		/* must be last */
@@ -950,16 +951,11 @@  static inline void ice_set_rdma_cap(struct ice_pf *pf)
  */
 static inline void ice_clear_rdma_cap(struct ice_pf *pf)
 {
-	/* We can directly unplug aux device here only if the flag bit
-	 * ICE_FLAG_PLUG_AUX_DEV is not set because ice_unplug_aux_dev()
-	 * could race with ice_plug_aux_dev() called from
-	 * ice_service_task(). In this case we only clear that bit now and
-	 * aux device will be unplugged later once ice_plug_aux_device()
-	 * called from ice_service_task() finishes (see ice_service_task()).
+	/* defer unplug to service task to avoid RTNL lock and
+	 * clear PLUG bit so that pending plugs don't interfere
 	 */
-	if (!test_and_clear_bit(ICE_FLAG_PLUG_AUX_DEV, pf->flags))
-		ice_unplug_aux_dev(pf);
-
+	clear_bit(ICE_FLAG_PLUG_AUX_DEV, pf->flags);
+	set_bit(ICE_FLAG_UNPLUG_AUX_DEV, pf->flags);
 	clear_bit(ICE_FLAG_RDMA_ENA, pf->flags);
 }
 #endif /* _ICE_H_ */
diff --git a/drivers/net/ethernet/intel/ice/ice_main.c b/drivers/net/ethernet/intel/ice/ice_main.c
index a9a7f8b52140..e2bc1340833e 100644
--- a/drivers/net/ethernet/intel/ice/ice_main.c
+++ b/drivers/net/ethernet/intel/ice/ice_main.c
@@ -2290,18 +2290,15 @@  static void ice_service_task(struct work_struct *work)
 		}
 	}
 
-	if (test_bit(ICE_FLAG_PLUG_AUX_DEV, pf->flags)) {
-		/* Plug aux device per request */
+	/* Plug aux device per request */
+	if (test_and_clear_bit(ICE_FLAG_PLUG_AUX_DEV, pf->flags))
 		ice_plug_aux_dev(pf);
 
-		/* Mark plugging as done but check whether unplug was
-		 * requested during ice_plug_aux_dev() call
-		 * (e.g. from ice_clear_rdma_cap()) and if so then
-		 * plug aux device.
-		 */
-		if (!test_and_clear_bit(ICE_FLAG_PLUG_AUX_DEV, pf->flags))
-			ice_unplug_aux_dev(pf);
-	}
+	/* unplug aux dev per request, if an unplug request came in
+	 * while processing a plug request, this will handle it
+	 */
+	if (test_and_clear_bit(ICE_FLAG_UNPLUG_AUX_DEV, pf->flags))
+		ice_unplug_aux_dev(pf);
 
 	if (test_and_clear_bit(ICE_FLAG_MTU_CHANGED, pf->flags)) {
 		struct iidc_event *event;